OpenAI Disables AI Voice That Sounded Like Scarlett Johansson

If you’re questioning OpenAI’s commitment to ethics these days – easy to do with the recent disbandment of its safety team – the actress Scarlett Johansson just added more fuel to the fire. When OpenAI released a demo of its new GPT-4o last week, it focused on the near-instantaneous response time, a significant feature of the new version. However, one of the voices used, Sky, sounded suspiciously close to Scarlett Johansson’s voice in the sci-fi movie Her, in which a man falls in love with the voice from his computer. Shortly after the demo, OpenAI CEO Sam Altman tweeted an image of himself with the word “Her” over it.

That didn’t last long.

Altman took down the post, and OpenAI was forced to shut down the voice that sounded like the actress. On Monday, they released a statement saying,

We’ve heard questions about how we chose the voices in ChatGPT, especially Sky . . . We are working to pause the use of Sky while we address them.

It might sound like an innocent mistake by a new startup, but the story gets more complicated when you hear Scarlett Johansson’s side. It turns out that Sam Altman approached her last year about using her voice. She declined, only to receive another request two days before the demo’s release. Before she could respond, the demo, including the Sky voice, was out.

Like many tech companies, OpenAI lives by Grace Hopper’s famous axiom: “It is better to ask forgiveness than permission.” That might work (though it actually backfired) when you’re trying to convince a city to adopt a scooter-share program. It’s a whole other story when it involves the theft of someone’s identity or an essential element of their livelihood.

Scarlett Johansson’s Reaction

Here’s Scarlett Johansson’s statement in the Hollywood Reporter:

Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people.  

When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference. Mr. Altman even insinuated that the similarity was intentional, tweeting a single word “her” – a reference to the film in which I voiced a chat system, Samantha, who forms an intimate relationship with a human.

Two days before the ChatGPT 4.0 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.

As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the “Sky” voice. Consequently, OpenAI reluctantly agreed to take down the “Sky” voice.

In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity. I look forward to resolution in the form of transparency and the passage of appropriate legislation to help ensure that individual rights are protected.  

Johansson suddenly found herself front and center in one of the issues behind last year’s Hollywood writers’ and actors’ strikes – the rights to your own work and control of your identity. If you think this issue is only relevant to celebrities, think again. It will hit us all over the next few years unless – as Johansson says – appropriate legislation is passed.

The Ethical Challenges Ahead

The current issue of an AI platform mimicking the voice of a well-known actress is just one small part of the vast ethical challenges we face with the rapid growth of generative AI, but it reveals the dilemmas of this era. OpenAI’s statement on taking down the Sky voice says:

We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice — Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. . . . To protect their privacy, we cannot share the names of our voice talents.

Taking the AI startup at its word, it’s likely that a professional voice actor was used for the Sky voice. But clearly, Sam Altman wanted Johansson’s voice as he’s fascinated by the movie and initiated discussions with her about licensing her voice.

No Precedent For Where We Are Headed

Currently, there is simply no legal precedent for AI voice cloning. However, there are legal arguments based on misappropriation of a distinctive voice or false endorsement. Both the singer/actress Bette Midler (in a case against Ford Motor Co.) and the singer Tom Waits (in a case against Frito-Lay) faced these issues in the past when companies used imitations of their voices in advertising.

If you want to see how this issue was handled in the pre-AI era, the law firm Gottlieb, Rackman & Reisman has a concise account of these legal cases. The details are useful, but it’s still like navigating the city you live in today with a map from another century.

The Ethical Complications of AI

But as always, AI makes it much more complicated. You’re not just hiring a professional voice actor but tweaking the result to straddle the line between a unique voice and outright appropriation of another’s voice.

Think about it for a moment. What if Sam Altman hadn’t referred to the movie Her? Would the Sky voice have been so quickly associated with Scarlett Johansson’s work? What if it sounded less like her? What if only a few people thought it was the voice of the character in Her? Where does replication cross the line into appropriation? In contrast, the cases against Ford and Frito-Lay were relatively straightforward, since both involved deliberate sound-alike imitations, and both celebrities were awarded damages at trial.

And if that isn’t complicated enough, there are currently no laws against using the voice of a deceased person. This year, we’ve already seen the unsettling use of the voices of children killed by gun violence in a phone campaign targeting the U.S. Congress. The campaign, by The Shotline, raised a host of questions about parents using the voices of their deceased children. Supporters argue it’s ethical because the parents are the ones creating the voices. But it still uses the voices of children no longer here – and who were never asked if they wanted to speak from beyond the grave.

Only The Beginning

How this conflict plays out over the next few weeks will be interesting. No doubt, OpenAI will not release the Sky voice again, or at least not in its current form. But it remains to be seen whether Scarlett Johansson continues to seek a legal remedy. Our laws were designed for another era, when technology didn’t factor into replicating another’s voice. As the use of AI explodes, it becomes increasingly clear that we will need new laws and regulations to deal with the appropriation and misuse of identity.

As Scarlett Johansson said,

I believe these are questions that deserve absolute clarity.