For months now, social distancing has forced a shift in our communication practices. In the blink of an eye, downloads and use of video conferencing technologies exploded. While tools such as Zoom deliver intuitive experiences with many benefits, they also expose the challenges of building great software. In this blog, we explore examples of the classic, inherent biases in video conferencing, the importance of building inclusive products, and why it’s so critical to continuously test and learn.
The silencing of Zoom
I’m lucky enough to spend every day in a work environment where I am heard.
Frequently the only woman in the room during a meeting, I am heard. My male colleagues are astute and read the social cues in a meeting room, which means that whilst I’m not physically the biggest or the loudest, my opinion is valued and made space for. I’m not interrupted, and my voice carries weight.
When the “new normal” set in, we were lucky to have incredible technology at our fingertips helping us and our communities, families and businesses stay connected.
We use Zoom as a business and it’s brilliant.
It’s intuitive, accessible, and it works. Over the past few weeks I’ve bounced in and out of every video call technology on the market, speaking with colleagues, clients, and prospects - and I can safely say it’s the best one I’ve used.
That being said, using the tool with such frequency, I have started to pick up on some things that ordinarily I might not have spotted...
So here’s the big caveat.
I don’t think that Zoom was built to be sexist. But I do think that in creating an intuitive tool, it has absorbed some of the classic, inherent biases that get embedded in software.
Selective speaker mode relies on picking up the loudest sound - and in natural volume and timbre, that is frequently a male voice. The problem is that by basing speaker selection so heavily on these characteristics, women are artificially silenced on video calls when they would ordinarily be heard in a physical setting. I have been on countless calls now where I, or a female colleague, was talking and watched speaker selection auto-reselect mid-sentence, purely because of a single word or sound from a deeper or louder voice.
My male colleagues, having become equally aware of this as we’ve used these communication tools more, have been mortified. It’s often not even a purposeful interruption that creates the muting - it might be a single word, a cough, or a laugh.
But the fact is that it happens. In the real world where I have always felt heard, I am now finding myself muted.
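To make the failure mode concrete, here is a deliberately naive sketch, in Python, of what loudness-only speaker selection looks like. To be clear, this is not Zoom’s actual algorithm - the frame sizes, levels, and participant names are all invented for illustration.

```python
# A deliberately naive sketch of loudness-only speaker selection.
# NOT any real product's algorithm - everything here is illustrative.

import numpy as np


def rms_level(frame: np.ndarray) -> float:
    """Root-mean-square energy of one short audio frame."""
    return float(np.sqrt(np.mean(frame ** 2)))


def pick_speaker(frames: dict) -> str:
    """Spotlight whoever is loudest right now - no memory, no tolerance."""
    return max(frames, key=lambda name: rms_level(frames[name]))


# One 20 ms frame per participant: a quieter sustained voice vs. a loud cough.
rng = np.random.default_rng(0)
alice_voice = 0.05 * rng.standard_normal(320)  # mid-sentence, lower amplitude
bob_cough = 0.30 * rng.standard_normal(320)    # a single loud transient

# The transient wins the frame, so the spotlight jumps away mid-sentence.
print(pick_speaker({"Alice": alice_voice, "Bob": bob_cough}))  # -> Bob
```

Nothing about who was speaking a moment ago matters here: the loudest instant wins, which is exactly the mid-sentence reselection described above.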
The issues of gender, voice and AI are not new.
They are also certainly not restricted to Zoom, and they have been increasingly discussed throughout the technology community. As the Harvard Business Review stated: “I do not believe that the creators of these systems set out to build [...] sexist products. It’s doubtful these biases are intentional, but they are still problematic. The fact is that speech recognition understands white male voices well…but what about the rest of us?”

There has been a host of research on speech recognition across gender, highlighting the “inherent technical problem down to the fact that females generally have higher pitched voices [...] tend to be quieter and sound more ‘breathy’”. The focus on pitch in video calling and voice AI is a thread that runs through much of this research and through the problems with current software. Speaker selection may be affected by factors such as the location of the call and the microphone you are talking through, but it is undoubtedly affected by the speaker’s pitch and tone: a higher-pitched voice is easily overtaken by a parallel voice or sound that is deeper or louder.
Building great software is complex, and Zoom is certainly leading the way when it comes to video calls and conferencing.
But times like these remind me how important it is to test these widely used products across all markets and all contexts, and to take extra care to make sure they work for everyone using them. Simple changes could improve the product experience: adding tolerance to the algorithm that assesses who “the speaker” is (a rough sketch of what that might look like follows below); using the same visual cues as speaker selection, perhaps a flashing white square or similar, to indicate a meeting attendee who would like to say something; or, at its very simplest, offering a toggle that lets people turn speaker selection off. I know how valuable it can be to give a single voice a platform during a meeting, to silence the background noise, and to create a feeling of order and formality - but equally, people value the real nature of overlapping conversation and collaboration. And we want to be heard.
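And to show how little the “tolerance” fix might take, here is a minimal sketch building on the naive selector above. The 6 dB margin and half-second hold time are numbers I’ve invented for illustration, not anything from a real product.

```python
# A minimal sketch of the "tolerance" idea: the current speaker keeps the
# floor unless a challenger stays clearly louder for a sustained stretch,
# so a cough, a laugh, or a single word no longer reselects mid-sentence.
# The 6 dB margin and 0.5 s hold time are invented for illustration.


class TolerantSpeakerSelector:
    def __init__(self, margin_db: float = 6.0, hold_frames: int = 25):
        self.margin = 10 ** (margin_db / 20)  # challenger must be ~6 dB louder...
        self.hold_frames = hold_frames        # ...for 25 consecutive 20 ms frames
        self.current = None
        self.challenge = 0

    def update(self, levels: dict) -> str:
        """Feed per-participant loudness for one frame; get the spotlighted speaker."""
        loudest = max(levels, key=levels.get)
        if self.current is None or self.current not in levels:
            self.current, self.challenge = loudest, 0
        elif loudest != self.current and levels[loudest] > self.margin * levels[self.current]:
            self.challenge += 1
            if self.challenge >= self.hold_frames:
                self.current, self.challenge = loudest, 0
        else:
            self.challenge = 0  # the transient passed; the speaker keeps the floor
        return self.current


# A one-frame cough never accumulates 25 consecutive challenges, so the
# quieter voice that currently holds the floor stays on screen.
selector = TolerantSpeakerSelector()
print(selector.update({"Alice": 0.05, "Bob": 0.01}))  # Alice takes the floor
print(selector.update({"Alice": 0.05, "Bob": 0.60}))  # Bob coughs: still Alice
print(selector.update({"Alice": 0.05, "Bob": 0.01}))  # ...and Alice keeps it
```

Paired with a raise-hand visual cue and an off switch, a small amount of hysteresis like this would let the spotlight follow the conversation without letting a stray cough steal it.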
So how can we make sure we’re accounting for inherent biases in the products we build?
On this topic, there is now a host of research and proposed solutions surrounding the lack of female leaders in AI and the profound shift needed to create diverse and accessible products. I know that as people use products and see the day-to-day barriers, if we voice what we see, we’ll see things change. So I’m looking forward to seeing how we spot and explore these issues as we continue to rely on technology to enable our work lives, and how we can change products for the better.