Adobe VoCo – what on earth?


This morning, I stumbled across this video of some really cool software coming from Adobe, called VoCo. The demo initially sounds rather fake, e.g. the sentence where the voice says “I kissed my wife and my wife”, or the one where “my dogs” and “my wife” swap positions. However, it gets really impressive when the software says words that weren’t in the original audio clip.

One thing that isn’t clear yet is how it works, though I’d guess it has something to do with learning the phonemes spoken by a single person and then sampling/resequencing them. Zeyu Jin, the developer demonstrating VoCo in the video above, states that it needs around 20 minutes of recorded speech before the algorithm can say whatever the user (the programmer? producer? *shrug*) wants.
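To make that guess a bit more concrete, here is a minimal Python sketch of the simplest form of “sampling/resequencing phonemes”, plain concatenative synthesis. This is only an assumption about the general technique, not how VoCo actually works: the phoneme bank, the function names and the default 160-sample crossfade (about 10 ms at 16 kHz) are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical phoneme bank: each phoneme label maps to a list of recorded
# snippets (mono samples as floats) cut from the speaker's training audio.
PhonemeBank = Dict[str, List[List[float]]]


def crossfade_concat(units: Sequence[List[float]], fade_len: int) -> List[float]:
    """Join snippets end to end, blending each boundary with a linear crossfade."""
    out: List[float] = list(units[0])
    for unit in units[1:]:
        n = min(fade_len, len(out), len(unit))
        # Linear ramp: the outgoing snippet fades out while the incoming one fades in.
        blended = [
            out[len(out) - n + i] * (1 - (i + 1) / (n + 1))
            + unit[i] * ((i + 1) / (n + 1))
            for i in range(n)
        ]
        out = out[: len(out) - n] + blended + list(unit[n:])
    return out


def resequence(bank: PhonemeBank, target: Sequence[str], fade_len: int = 160) -> List[float]:
    """Build audio for a new phoneme sequence from the speaker's own recordings."""
    missing = [p for p in target if p not in bank or not bank[p]]
    if missing:
        raise KeyError(f"no recorded snippet for phonemes: {missing}")
    # Naive unit selection (an assumption): always take the first snippet for each phoneme.
    units = [bank[p][0] for p in target]
    return crossfade_concat(units, fade_len)
```

Always choosing the first snippet is the crudest possible unit selection; picking snippets whose pitch and duration match their neighbours is where the real difficulty would likely lie.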

I’m interested in this piece of software because it could revolutionise the way dialogue is produced, particularly for games. A bit of blue-sky thinking is needed here, but it’s not hard to imagine a world where you hire a voice actor, have the software replicate their voice, and then generate as many lines, or variations of those lines, as you need. And we all know how much variation is required when working in game audio.

Obviously, this wouldn’t work flawlessly in practice, and it’s hard to imagine the output of this software being indistinguishable from the real thing (which is a good thing: we can all agree that fake recordings of people speaking could be rather dangerous). It’s also easy to imagine voice actors opposing the use of this software to create new lines, since it would essentially put them out of work. I do, however, doubt that it could replicate complex vocalisations or the emotion that professional voice actors can convey.

There’s no word yet on whether this software will be released to consumers. One would like to think it will ship as part of Audition, and the footage above does appear to show it running inside Audition.

Maybe I’m looking at this wrong, and it should only be used as a tool for editing and improving existing dialogue. Still, I’m excited to learn more about VoCo as information becomes public.