In a few years, you could be listening to an album of new songs featuring a duet between Elvis and Kurt Cobain. No, the two never cut a record together, but engineers and computer programmers are getting closer to being able to “resurrect” any singer’s voice for use in synthesized songs.
Yamaha’s been developing voice synthesizers for years (think Mac’s text-to-speech meets AutoTune) under the brand name Vocaloid. But to build a Vocaloid “voice library,” a singer typically had to sing every possible syllable, one at a time, in the target language. A computer would later stitch the fragments together into songs.
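For the curious, the basic idea of stitching recorded fragments into a longer performance can be sketched in a few lines of Python. This is a toy illustration only, not Yamaha’s method: the `voice_library` dictionary, the `tone` stand-in for a recorded syllable, and the `synthesize` function are all hypothetical names invented for this example, and a real system would also have to handle pitch, timing and timbre.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this toy


def tone(freq_hz, duration_s):
    """Stand-in for a recorded syllable: a plain sine tone (assumption)."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical voice library: one stored fragment per syllable.
voice_library = {
    "la": tone(220.0, 0.10),
    "di": tone(247.0, 0.10),
    "da": tone(262.0, 0.10),
}


def synthesize(syllables, library, overlap=80):
    """Join fragments in order, crossfading `overlap` samples at each seam."""
    out = []
    for syllable in syllables:
        fragment = library[syllable]  # raises KeyError if never recorded
        if not out:
            out.extend(fragment)
            continue
        n = min(overlap, len(out), len(fragment))
        start = len(out) - n
        # Linear crossfade: fade the old tail out while the new head fades in.
        for i in range(n):
            w = (i + 1) / (n + 1)
            out[start + i] = (1 - w) * out[start + i] + w * fragment[i]
        out.extend(fragment[n:])
    return out


song = synthesize(["la", "di", "da", "la"], voice_library)
```

The short crossfade at each seam is there to avoid audible clicks where one fragment ends and the next begins. The `KeyError` on a missing syllable also hints at why the library traditionally had to cover every syllable in the language.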
But now the Vocaloid team has announced that it has succeeded in building a library based on the voice of someone who couldn’t participate in the painstaking process: Hitoshi Ueki, a popular Japanese vocalist who died in 2007. The initial results were revealed on a Japanese video-streaming site earlier this year.
“As far as I know, many viewers were satisfied with the result, and so am I,” said Yamaha researcher Hideki Kenmochi in an e-mail to Wired.com. “It really sounds like him, because the creator [the programmer in charge of the voice library] did a good job.”
If perfected, the technology could result in some very uncanny entertainment, with singers, actors and others whose voices have been extensively recorded seeming to speak from beyond the grave. The “resurrected” voice could be employed anywhere computerized speech is heard, from automated customer service to GPS devices (though Yamaha’s mum on where its proof-of-concept technology will end up).
Kenmochi and his team started their ongoing research on Ueki-loid, as the software’s informally called, last year. They built a computer that could “listen” to isolated vocal tracks from several songs by Ueki and pick out the individual syllables. From there, it will be relatively simple to use the library to build new tracks.
The technology isn’t perfect. In a song created by an English-language Vocaloid, it’s often clear that the voice was made by a computer, but there are moments when it’s possible to forget. The discomfort this near-perfection can cause is known as the “uncanny valley” in English, and “the valley of death” in Japanese, according to Jordi Bonada Sanjaume, part of the music technology team at Pompeu Fabra University in Barcelona, Spain, that helped develop the original Vocaloid.
“When you pretend the synthesis sounds like a real person, any small artifact or unnatural subtle sound will make the whole listening experience frustrating, emphasizing that it sounds synthetic,” Sanjaume said in an e-mail to Wired.com. “Otherwise, if you ‘sell’ it as a synthesizer, all those small artifacts or unnatural sounds can be at some point totally ignored during the listening experience, or even wanted and pleasing.”