Ars Technica: Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

Ars Technica: Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio. “On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person’s voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker’s emotional tone.” Do we have time to vote against this trend of calling everything generative-AI-related whatever-E?
