In May, when OpenAI debuted an eerily realistic, near-real-time “advanced voice mode” for its AI-powered chatbot platform ChatGPT, the company said the feature would be rolled out to paid ChatGPT users within a few weeks.
Months later, OpenAI says it needs more time.
In a post on its official Discord server, OpenAI said it had planned to start rolling out Advanced Voice Mode in alpha to a small group of ChatGPT Plus users in late June, but that outstanding issues forced it to postpone the launch to sometime in July.
“For example, we are improving the model’s ability to detect and reject specific content,” OpenAI wrote. “We are also working to improve the user experience and prepare our infrastructure to scale to millions while maintaining real-time responses. As part of our iterative deployment strategy, we will begin the alpha phase with a small group of users to collect feedback and scale based on what we learn.”
OpenAI says Advanced Voice Mode may not go live for all ChatGPT Plus customers until the fall, depending on whether the feature meets the company’s internal safety and reliability checks. However, the delay will not affect the rollout of the new video and screen-sharing capabilities that were showcased separately during OpenAI’s spring press event.
These capabilities include solving math problems when shown a picture of them and explaining the various settings menus on a device. They are designed to work across ChatGPT on smartphones as well as desktop clients, such as the macOS app, which became available to all ChatGPT users earlier today.
“ChatGPT’s advanced voice mode can understand and respond to emotions and non-verbal cues, bringing us closer to natural, real-time conversations using AI,” OpenAI wrote. “Our mission is to carefully bring you these new experiences.”
On stage at the launch event, OpenAI staff demonstrated ChatGPT’s almost instantaneous response to requests such as solving a math problem on a piece of paper held in front of a researcher’s smartphone camera.
OpenAI’s advanced voice mode has generated a great deal of controversy over the similarity of the default “Sky” voice to that of actress Scarlett Johansson. Johansson later issued a statement saying that she had hired legal counsel to inquire about the voice and obtain precise details on how it was developed, and that she had declined repeated requests from OpenAI to license her voice for ChatGPT.
OpenAI, while denying that it had used Johansson’s voice or an imitation of it without permission, later removed the voice in question.