OpenAI announces new technology for cloning voices from a 15-second audio sample
OpenAI has introduced a new tool called Voice Engine, which can clone a person's voice from a 15-second audio sample.
Voice Engine analyzes a short audio sample and generates natural-sounding speech with “emotional and realistic voices.” The technology, which also powers the preset voices in OpenAI’s existing text-to-speech API, can be useful for a variety of purposes: audiobooks, language translation, and helping people with speech disorders.
OpenAI recognizes the serious risks of using this technology, including the possibility of its misuse by unscrupulous individuals. Therefore, the company is actively working to ensure privacy and security and is implementing a number of measures, such as watermarking and proactive monitoring of system usage.
According to the announcement, Voice Engine remains in preview, but the company has already run pilot programs that demonstrate its potential. One pilot took place at Brown University, where the technology was used to help patients with speech impairments.
OpenAI says it will roll out Voice Engine while collecting feedback from partners and enforcing a policy that prohibits the use of cloned voices without the person’s consent. The company also plans to create a “list of prohibited voices” to prevent abuse.
The estimated cost of using Voice Engine is about $15 per million characters, which corresponds to roughly 162,500 words.
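To put the pricing estimate in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the $15 rate and the 162,500-word figure are the estimates quoted above, not confirmed pricing, and the function and constant names are hypothetical.

```python
# Figures quoted in the article; both are estimates, not confirmed pricing.
PRICE_PER_MILLION_CHARS_USD = 15.0
WORDS_PER_MILLION_CHARS = 162_500

# Average characters per word implied by the two figures above (about 6.15).
CHARS_PER_WORD = 1_000_000 / WORDS_PER_MILLION_CHARS


def estimate_cost_usd(num_chars: float) -> float:
    """Estimated cost in USD of synthesizing `num_chars` characters."""
    return num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


def estimate_cost_for_words(num_words: float) -> float:
    """Estimated cost in USD for a text of `num_words` words,
    using the average word length implied by the article's figures."""
    return estimate_cost_usd(num_words * CHARS_PER_WORD)


if __name__ == "__main__":
    # A hypothetical 80,000-word audiobook manuscript.
    print(f"Implied characters per word: {CHARS_PER_WORD:.2f}")
    print(f"Estimated cost: ${estimate_cost_for_words(80_000):.2f}")
```

Under these assumptions, a book-length text would cost only a few dollars to synthesize. Real costs would depend on the final pricing and on the actual character count of the text.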