In a video from a Jan. 25 news broadcast, President Joe Biden talks about tanks. But in an altered version of that video, which has racked up hundreds of thousands of views this week on social media, Biden appears to give a speech attacking transgender people.

Digital forensics experts say the video was created using a new generation of artificial intelligence tools that let anyone generate audio simulating a person’s voice with a few clicks of a button. And while the Biden video didn’t fool most users this time, the episode shows how easy it now is to produce digitally manipulated videos, or deepfakes, that can do real damage in the real world.

“Tools like this are practically going to add fuel to the fire,” said Hafiz Malik, a professor of electrical and computer engineering at the University of Michigan who specializes in multimedia forensics. “The monster is already on the loose.”

The latest such tool arrived last month with the beta release of ElevenLabs’ speech synthesis platform, which lets users generate realistic audio of anyone’s voice by uploading a few minutes of voice samples and typing in any text for the platform to speak.
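In practice, the workflow the company describes boils down to two steps: register the voice samples, then synthesize arbitrary text in the cloned voice. The Python sketch below is illustrative only; the endpoint paths, field names and credential are assumptions for the sake of example, not ElevenLabs’ documented interface.

```python
# Illustrative sketch of the two-step voice-cloning workflow described above.
# Endpoint paths and field names are assumptions, not documented API details.
import requests

API_KEY = "your-api-key"  # hypothetical credential
BASE = "https://api.elevenlabs.io/v1"

# Step 1: upload a few minutes of audio samples to create a cloned voice.
with open("speaker_sample.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/voices/add",
        headers={"xi-api-key": API_KEY},
        data={"name": "cloned-speaker"},
        files={"files": f},
    )
voice_id = resp.json()["voice_id"]

# Step 2: type in any text and get it back spoken in the cloned voice.
audio = requests.post(
    f"{BASE}/text-to-speech/{voice_id}",
    headers={"xi-api-key": API_KEY},
    json={"text": "Any sentence the speaker never actually said."},
)
with open("output.mp3", "wb") as out:
    out.write(audio.content)
```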

The company says the technology was developed to dub audio from movies, audiobooks and games into different languages to preserve the voice and emotions of the original speaker.

Social media users immediately began sharing an AI-generated audio sample of Hillary Clinton reading the same transphobic text used in the altered Biden video, along with fake audio clips of Bill Gates supposedly saying the COVID-19 vaccine causes AIDS and of actress Emma Watson purportedly reading Hitler’s manifesto “Mein Kampf.”

Soon after, ElevenLabs tweeted that it had noticed “an increasing number of cases of voice cloning misuse” and announced that it was exploring safeguards to curb abuse. One of the first steps was to make the feature available only to users who provide payment information. The company also says that, if necessary, it can trace any generated audio back to its creator.

But tracking down creators won’t mitigate the damage the tool can do, said Hany Farid, a professor at the University of California, Berkeley, who specializes in digital forensics and disinformation.

“The damage is done,” he added.

In one scenario Farid laid out, malicious actors could move the stock market with fake audio of a CEO announcing a drop in earnings. And a video already circulating on YouTube used the tool to alter a clip so that Biden appears to say the U.S. would launch a nuclear attack against Russia.

Various free and open-source programs with the same capability have also appeared online, meaning paywalls on commercial tools are no obstacle. Using free online software, The Associated Press generated audio samples in just a few minutes that sound like actors Daniel Craig and Jennifer Lawrence.
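To give a sense of how little that experiment requires, here is a minimal sketch using one open-source option, Coqui’s TTS package and its XTTS voice-cloning model; exact model names and arguments vary between releases, and the reference clip is a hypothetical file.

```python
# Minimal sketch of zero-shot voice cloning with an open-source tool.
# Assumes the Coqui TTS package (pip install TTS) and its XTTS model;
# model names and arguments may differ between releases.
from TTS.api import TTS

# Load a multilingual voice-cloning model (downloads weights on first run).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# A short reference clip of the target speaker is enough to mimic the voice.
tts.tts_to_file(
    text="A sentence the speaker never said.",
    speaker_wav="reference_clip.wav",  # hypothetical sample file
    language="en",
    file_path="cloned_output.wav",
)
```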

When deepfakes first made headlines about five years ago, they were easy to spot: the subjects didn’t blink and the audio sounded robotic. That’s no longer the case as the tools have evolved.

The altered video in which Biden appears to make disparaging remarks about transgender people, for example, paired AI-generated audio with real footage, taken from a CNN live broadcast, of the president announcing on Jan. 25 that the U.S. would send tanks to Ukraine. Biden’s mouth was manipulated in the video to match his movements to the audio. While most users on Twitter recognized that the content wasn’t something Biden would have said, they expressed surprise at how realistic it looked. Others apparently thought it was real, or at least didn’t know what to believe.

Hollywood studios have long been able to distort reality, but access to that technology has been democratized without consideration of the implications, Farid said.

“It’s a combination of the powerful AI-based technology, the ease of use, and the fact that the model seems to be: let’s upload it to the internet and see what happens,” Farid said.

Audio is just one of the areas where AI-generated disinformation poses a threat.

Free AI image generators such as Midjourney and DALL-E can produce realistic images of war or natural disasters in the style of traditional news photos from a simple text prompt. Last month, some U.S. school districts began blocking ChatGPT, which can produce readable text, such as student term papers, on demand.
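That prompt-to-image step takes only a few lines of code. The sketch below uses OpenAI’s Python client as one example; the model name, parameters and credential are illustrative assumptions and may not match current offerings.

```python
# Rough sketch of generating an image from a text prompt via an API.
# Assumes the openai Python package; model name and parameters are
# illustrative and may not match current offerings.
from openai import OpenAI

client = OpenAI(api_key="your-api-key")  # hypothetical credential

result = client.images.generate(
    model="dall-e-3",
    prompt="A photorealistic news-style photo of a flooded city street",
    n=1,
    size="1024x1024",
)
print(result.data[0].url)  # URL of the generated image
```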
