ElevenLabs introduced a tool that clones celebrity voices.

A few days ago, the startup ElevenLabs released a beta version of its platform for creating synthetic clones of real people's voices to read text aloud. Just a few days later, deepfakes of celebrity voices reading extremely dubious texts appeared on the Web.

According to the company, there is an "increasing number of voice cloning misuse cases." It says it is already working on the problem by introducing additional safeguards.

The company did not specify what it meant by misuse, but audio recordings of celebrity voices making offensive statements have already appeared on Internet forums.

It is not yet known whether all of these recordings were created with ElevenLabs technology, but a large share of the voice files contain a link to the company's platform. None of this is surprising: the arrival of publicly available machine learning systems has already produced numerous deepfakes of various kinds.

ElevenLabs is collecting feedback

ElevenLabs is now gathering feedback on how to prevent misuse of the technology. So far, the company has proposed nothing unusual beyond additional account verification measures for access to voice cloning.

Ideas include requiring payment information or ID. The company will also consider verifying that users have the right to use the voice they intend to clone, for example by asking for a sample of that voice reading a prescribed text.

Finally, the company is considering dropping the Voice Lab tool altogether and verifying each voice cloning request manually. In the meantime, users have been encouraged to share their ideas with the service's developers.

Microsoft is an ElevenLabs competitor

In the first half of January, Microsoft presented a similar solution. Its VALL-E tool also converts text into speech, needing only a three-second recording of a person's voice as a sample.