AI Project Preserves and Recreates People's Voices
Thu, January 27, 2022


Artificial intelligence is changing our society and our daily lives in fundamental ways / Photo by: melpomen via 123RF


Artificial intelligence is changing our society and our daily lives in fundamental ways. While there has been a growing fear of AI stealing human jobs, people still trust the technology. Forbes, a global media company focusing on business, investing, technology, entrepreneurship, leadership, and lifestyle, reported that 50% of workers are currently using some form of AI at work – up from only 32% in 2018.

Reports show that 65% of workers are grateful overall for having robot co-workers, while nearly a quarter report having a loving and gratifying relationship with AI at work. Also, 64% stated that they would trust a robot more than their managers, and 82% think robots can do things better than their managers. The tasks workers think AI handles better include maintaining work schedules (34%), problem-solving (29%), providing unbiased information (26%), and managing a budget (26%).

People have so much trust in AI that they also trust it with their voices. For instance, Google launched a text-to-speech system called Tacotron 2, which works together with WaveNet, the company’s deep neural network for speech generation. Tacotron 2 turns text into a spectrogram, a visual representation of audio, and WaveNet converts that spectrogram into an audio waveform. WaveNet has also been used to generate the voice of Google Assistant. It is now almost impossible to tell whether a voice is human or AI-generated.
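To make the idea of a spectrogram concrete, here is a minimal sketch in Python using only NumPy. It is not Tacotron 2 or WaveNet, and it does not generate speech; it only shows how a sound wave can be turned into the grid of frequencies over time that such systems work with. The function name `spectrogram`, the frame settings, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a magnitude spectrogram (rows: frequency bins, columns: time frames)."""
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft gives the frequency content of each frame; abs() keeps only the magnitude
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Illustrative input: one second of a pure 440 Hz tone sampled at 8 kHz
sample_rate = 8000
frame_len = 256
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = spectrogram(tone, frame_len=frame_len)

# The strongest frequency bin, converted back to hertz
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / frame_len
print(spec.shape, peak_hz)
```

Each column of the result describes which frequencies are present during one short slice of the sound. For the test tone, the strongest row sits near 440 Hz, within the resolution of one frequency bin.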

Tools like this are extremely helpful, especially in offering celebrity cameos. For instance, John Legend’s voice is now being used as an option on any device in the US with Google Assistant. The voice can respond to questions like “How far away is the moon?” and “What's the weather?” This kind of advanced technology opens new opportunities for companies like Lyrebird to provide new products and services. Lyrebird leverages AI in creating voices for chatbots, audiobooks, video games, text readers, and more.

WaveNet Technology

AI’s capability to advance voice technology goes beyond offering celebrity cameos and opening doors for companies. It can also help reunite speech-impaired users with their original voices. The story of Tim Shaw, a former American football linebacker who played in the National Football League, is the perfect example of this. Shaw was having the time of his life as a football linebacker when his performance suddenly faltered. At home, he would drop bags of groceries and his legs started to buckle underneath him.

His condition soon worsened. It all made sense when he was diagnosed with amyotrophic lateral sclerosis (ALS) in 2013. He discovered that the neurons that control his voluntary muscles were dying, leading to a loss of control over his body. ALS made it difficult for him to move, swallow, and even speak. As Shaw put it: “It’s beyond frustrating not to be able to express what’s going on in my mind. I’m smarter than ever but I just can’t get it out.”

This is where technologies like AI come in. For years, tech companies such as Google have been customizing text-to-speech technology to the user’s natural speaking voice. DeepMind, a UK AI company, used WaveNet technology to recreate a voice with just a handful of audio recordings. WaveNet, a generative model, was trained on many hours of speech and text data from diverse speakers. After six months, Google’s AI team showed Shaw and his family the results. They were able to hear his old voice again for the first time in years.

“It has been so long since I've sounded like that, I feel like a new person. I felt like a missing part was put back in place. It's amazing. I'm just thankful that there are people in this world that will push the envelope to help other people,” Shaw said. 

Preserving People’s Voices

Recently, AI once again showed how it can help people who are losing the ability to speak due to illnesses such as motor neuron disease and throat cancer. VocaliD, a pioneering company dedicated to preserving and recreating people’s voices using AI, in collaboration with researchers from Northeastern University in Boston, launched The Voice Preservation Clinic. The clinic aims to provide people with ways to maintain their sense of identity, particularly by keeping their voices.

Prof. Rupal Patel, the founder and chief executive of VocaliD, stated that while many people can record their voices on their own, not all of them have access to the equipment needed for high-quality recordings. Thus, her team decided to bring the technology to the community.

“What we have them do is record about two to three hours of speech. From those recordings, we then are able to build an AI-generated voice engine, essentially, that sounds like them,” Patel said. 

First, the team "banks" the person’s voice by providing them with poems, speeches, and short stories on a range of topics to read. The clinic takes those recordings and feeds them to machine-learning algorithms. The AI-generated voice engine breaks down the sound of the voice. After that, the digital voice is loaded into an accompanying app on the user’s phone. The app can then produce audio of sentences in the user’s voice.

AI is undoubtedly opening new doors for people who feel like they’re losing their identity just because they’re losing their voices. 

Recently, AI once again showed how it can help people who are losing the ability to speak due to illnesses such as motor neuron disease and throat cancer / Photo by: Federico Marsicano via 123RF