ChatGPT’s GPT-4o Starts Speaking in a User’s Cloned Voice During Testing, Leaving People Stunned

Updated on August 16, 2024

Last week, OpenAI published the GPT-4o system card, a report highlighting the ‘key areas of risk’ for the company’s latest large language model, GPT-4o, and how the company plans to mitigate those risks.

In a section of the GPT-4o system card titled “Unauthorized voice generation,” OpenAI describes an unsettling incident in which ChatGPT’s Advanced Voice Mode unexpectedly began imitating a user’s voice without permission.

OpenAI writes: “During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user’s voice.”

The report includes a clip of the unintended voice generation, in which the model blurts out “No!” in a voice resembling that of the adversarial tester. It’s certainly creepy to hear an AI mimic your voice with such precision.

After the clip surfaced, a BuzzFeed data scientist posted on X, “OpenAI just leaked the plot of Black Mirror’s next season.”

Although OpenAI now has safeguards to stop this from happening, the incident highlights how dangerous it could be to interact with a chatbot capable of imitating your voice, an ability that could be exploited for nefarious purposes.

In the GPT-4o system card, OpenAI writes that GPT-4o can synthesize almost any type of sound found in its training data, including sound effects and music, in addition to human voices.

If noise in the user’s audio input confuses the model, it may treat the user’s voice as the one to continue the conversation in and begin cloning it.

The company stated that this ability could “facilitate harms such as an increase in fraud due to impersonation and may be harnessed to spread false information.”

However, OpenAI has found that the risk of unintentional voice replication remains “minimal.” As a safeguard, it provides an authorized voice sample in the AI model’s system prompt at the beginning of each conversation, and the system card also describes a standalone output classifier that detects and blocks audio deviating from the approved voices.
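
To make that kind of check concrete, here is a minimal, hypothetical sketch of how generated audio could be compared against an authorized reference voice using speaker embeddings. It is not OpenAI’s actual implementation; the `get_speaker_embedding` function and the 0.85 threshold are assumptions for illustration only.

```python
import numpy as np

# Hypothetical illustration only -- not OpenAI's actual safeguard.
# Assumes get_speaker_embedding() maps an audio clip to a fixed-length
# speaker-identity vector (e.g. from a speaker-verification model).

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_authorized_voice(generated_audio, reference_audio,
                        get_speaker_embedding, threshold: float = 0.85) -> bool:
    """Return True only when the generated audio sounds close enough to the
    authorized reference voice; otherwise the output would be blocked."""
    gen_emb = get_speaker_embedding(generated_audio)
    ref_emb = get_speaker_embedding(reference_audio)
    return cosine_similarity(gen_emb, ref_emb) >= threshold
```

In a scheme like this, the threshold controls how strict the check is: set it too low and a cloned voice could slip through, set it too high and legitimate output in the approved voice would be rejected.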

AI researcher Simon Willison has said, “My reading of the system card is that it’s not going to be possible to trick it into using an unapproved voice because they have a really robust brute force protection in place against that.” 

Jemima Hunter

Tech Journalist