OpenAI has delayed the release of its highly anticipated voice assistant feature for ChatGPT, saying it wants to ensure the system can handle requests from millions of users safely and effectively.
The AI startup introduced the voice option during a May product launch event for GPT-4o, an enhanced version of its GPT-4 model capable of processing text, audio, and images in real time. Initially, OpenAI planned to roll out the voice feature to a select group of paid ChatGPT Plus subscribers in late June. However, the company now aims to launch it in late July to meet its standards for safety and effectiveness.
“We’re improving the model’s ability to detect and refuse certain content,” OpenAI stated on Tuesday. “We’re also working on enhancing the user experience and preparing our infrastructure to scale to millions while maintaining real-time responses.”
The delay could be a setback for OpenAI as it faces increasing competition in the AI field. The company had previously introduced a limited voice interaction feature for ChatGPT, but the new version promised to be faster and to add powerful image-recognition capabilities, making the chatbot more useful and dynamic.
During the May launch event, OpenAI employees demonstrated ChatGPT’s ability to respond almost instantly to requests, such as solving a math problem written on a piece of paper held up to a smartphone camera. Some viewers likened the tool to the AI assistant voiced by Scarlett Johansson in the 2013 film “Her,” prompting the actress to request the removal of one of ChatGPT’s voices because it sounded too similar to her own.
OpenAI plans to make the voice feature available to all paid subscribers by fall. The company is also working on releasing the video and screen-sharing features showcased at the May event and said it will share more details on their timing later.
At its initial release, the voice feature may be more limited than what was demonstrated at the event. For instance, it won’t include the computer-vision capability that lets the chatbot give spoken feedback on a user’s dance moves using a smartphone camera.