What if they can clone your voice?

It’s a Canadian company that specializes in speech synthesis software. They’ve developed software they claim can copy anyone’s voice and make it say anything.

The founders tell me if they can get a high-quality recording of you speaking for just one minute, their software can replicate your voice with very high accuracy.

If they get a recording of you speaking for five minutes, they say it would be difficult to tell the difference between your voice and their computer-generated mimic. That’s where the name Lyrebird comes from: a lyrebird is an Australian bird that’s noted for its mimicry.

Here is the story. As they say, solve for the equilibrium…

Confidential business conversations over the telephone might dwindle, and perhaps we will have Peter Cushing and Humphrey Bogart movies for a long time to come.  What else?

For the pointer I thank Michelle Dawson.

Comments

The low-hanging fruit is replacing voice actors for animation. Donald Duck, Bugs Bunny, Tweety Bird -- all synthetic.

Indeed. I remember when the Simpsons voice actors went on strike... 5-6 figure salaries http://www.today.com/popculture/simpsons-stars-strike-more-doh-wbna4645435

But it is more interesting how intellectual property law deals with the topic. Are voice, inflections and mannerisms intellectual property?

The issue would be whether you maintain a property interest in your voice that has been scanned to make the synthetic version, a derivative version of your voice.

Just as a precaution, I am going to the copyright office to register my voice.

Images, words and sound have long been treated differently.

Sound is not generally copyrighted unless you put it in classical notation, called "sheet music" and enforce copyright. The part of songs that is usually protected is the words, not the sounds.

There are a few exceptions for some very deep pocketed industry players. Basically, that's not how it works.

Which suggests that it might be difficult to protect the right to prevent others from cloning your voice. Among other things, it would be annoying (understatement). Some variety of security risks could also present themselves in many situations.

I'm pretty sure there are rules against appropriating someone's likeness for commercial purposes. In the film "Rogue One," the production company apparently got permission from Peter Cushing's estate heirs to use his computer-generated likeness in the film. Presumably similar laws would apply to using someone's synthesized voice. Then again, I remember that South Park did an episode where they crudely sampled Isaac Hayes' voice for the character Chef.

Just because Rogue One asked doesn't mean they needed permission. It would suck for marketing if Peter Cushing's family complained about it.

South Park used old samples from previous shows; "crudely sampled" is dead on. It was an obvious parody, and you didn't really think Isaac Hayes talked about having sex with children.

Came here to say this. I worked for a speech-synthesis company in the early 2000s (before Nuance bought out every competitor in that space). Creating a new voice involved dozens to hundreds of hours of studio time with the actor, and several companies were of course interested in immortalizing the voices of some of their famous signed actors -- think "large East Coast TelCo with a signature bass voice from an iconic African-American actor" and you can imagine the stakes. Of course, the actors understand the stakes too, and were understandably skittish about signing these agreements. It sounded a lot like the exploitation of African-American musicians in the early 20th century all over again: paid to sing once, then somebody else monetizes your recording for eternity.

I'm skeptical that the tech has advanced to the point these guys claim. The tech press is notoriously credulous of the ridiculous, unsupported claims of VC-hungry startups.

But if it is true... voice actors need to get in front of this shit. And IP lawyers need to start lobbying for right of publicity/personality protection. There is some legal precedent in the Tom Waits / Frito-Lay case, but that's the 9th Circuit, which is kind of like Law on Mars. We could use some standardization of these rules before this tech actually does arrive (if it ever does).

Just run infinite random modifications on anything until you converge towards approximations with sufficient similarity.

Or use some theory to direct the algorithms, and it will be done faster and better.

I imagine the methods in the early 2000s were extremely crude, and are not very related to what can be done with widely accessible skills in the market (or other places) these days. Or maybe related, but instead of having someone listen for how similar it is, you can just run an algorithm that tries some thousands of different modulating algorithms for the individual until one of them meets a threshold. Then the software user just picks from one of the top few (or the software picks it directly).
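As a rough illustration of the loop described above (random modifications, a similarity score, a threshold, then a short list to pick from), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the four-number "voice" vector, the `similarity` score, and the `random_search` helper are hypothetical stand-ins, not anything Lyrebird or any real synthesis tool is known to use.

```python
import random


def similarity(candidate, target):
    """Hypothetical score in (0, 1]: 1.0 means identical parameter vectors."""
    mse = sum((c - t) ** 2 for c, t in zip(candidate, target)) / len(target)
    return 1.0 / (1.0 + mse)


def random_search(target, threshold=0.99, top_k=3, max_trials=20000,
                  step=0.1, seed=0):
    """Randomly perturb the best guess so far until it is similar enough.

    Returns up to top_k (score, parameters) pairs, best first.
    """
    rng = random.Random(seed)
    best = [0.0] * len(target)          # start from a neutral guess
    best_score = similarity(best, target)
    improvements = [(best_score, best)]

    for _ in range(max_trials):
        if best_score >= threshold:     # good enough, stop searching
            break
        trial = [x + rng.gauss(0.0, step) for x in best]
        score = similarity(trial, target)
        if score > best_score:          # keep only modifications that help
            best, best_score = trial, score
            improvements.append((best_score, best))

    # Scores only ever increase, so the last entries are the best ones.
    return improvements[-top_k:][::-1]


# Made-up "voice parameters" standing in for a real acoustic description.
target_voice = [0.8, -0.3, 0.5, 0.1]
shortlist = random_search(target_voice)
for score, params in shortlist:
    print(round(score, 4), [round(p, 2) for p in params])
```

A real system would be searching over far more parameters and would need a much better similarity measure than mean squared error, which is where "some theory to direct the algorithms" would presumably come in.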

Maybe. That was the dominant thinking from the 1960s through the late 90s: everybody assumed a purely synthetic approach would eventually work as well as the brute-force approach of recording hundreds of hours of speech and slicing it up. But just like most wishful thinking in AI, they were wrong. The brute-force approach had lots of refinements applied (modulation, thin slices, multiple phonemes recorded for different acoustic / phonetic contexts) but pure synthesis just got nowhere close to the quality of the brute-force approach. Maybe something has changed fundamentally since then, but if it has I haven't heard about it.

Today's voice actors can likely band together to protect themselves.

But what is to stop studios from, in the future, using completely artificial voices, with no one human source? If that was the voice of Homer Simpson, there would be no need to negotiate with anyone.

How many more decades will pass before visual information can be duplicated, replicated, and manipulated in the same way? In other words, will we eventually create blockbuster-level movies on our laptops using the artificial voices that we can create now?

https://en.m.wikipedia.org/wiki/The_Congress_(2013_film)

Reality already explored in film.

It won't be a blockbuster if anyone can do it, because someone else will be able to do better.

But you can already do a lot for not much money.

You can't believe what you hear.

... what if they could digitally manipulate/falsify photos & videos ?

what if they could digitally copy/forge your legal signature ?

what if they could copy/steal your Social Security number and impersonate you ?

what if all types of private & government official documents could be falsified ?

what if the Notary Public system & Court Reporters were silly anachronisms ?

what if public officials/legislators/executives routinely signed/endorsed official documents/laws that they had never read ?

what if the NSA/FBI/CIA were recording all voice communications in the U.S. and could analyze and manipulate them at will ?

what if government criminal justice forensics labs were often incompetent/corrupt and presented false evidence against citizens in formal court ?

Falsifying official documents is not very useful in places where things are not messed up. Goes with independent courts, rule of law, good (or at least decent) governance, and all that other kind of jazz.

As faces can also be replicated, death of appeal to authority and of expertise. Death of wire as method to prosecute people. Triumph of postmodernism: they were actually right all the time. Occidental dissolve. Return to simpler but more reliable societal and political structures such as family and neighbourhood.

Choose whichever percentage you want for each bet.

What about music? Singers will be able to record far more music if they don't actually have to do any singing. And once life-like artificial voices are available (not based on actual people), the opportunities for composers should explode.

So what does this mean for Queen?

Time spent in the recording studio is not the binding constraint on musician productivity.

On condition of having something worth recording, actually, yes, access to a professional recording studio very often is precisely the constraint.

You need, say, $5000 to get a few solid tracks out. And once you count just the rent on the physical space, the time of a sound engineer, the time of a helper, and the time of someone who keeps it all running, you can see that that doesn't get you very far. You'll have to be extremely prepared in advance to get good value from that size of budget.

Aside from voice recordings made for blackmail or as criminal evidence, I'm not sure much changes.

Just as with documents, a person who wants to authenticate a recording, to later show that "I said this at this time", can do so with digital signature techniques.

Fraudulent voice syntheses will lack power as evidence, for want of a signature.

The only loser is the party who wants to use a covert or hostile recording as evidence.

There's an opportunity here for a party to develop an unimpeachable reputation for not engaging in recording fraud, and to market recording devices that are certified to a high FIPS 140 security level for tamper-resistant cryptography, such that a recording signed by the recording device as having been made at such a time would be trustworthy.

Unless all forms of digital signature are broken, I don't see a problem here.
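To make the signed-recording idea above concrete, here is a minimal sketch in Python using only the standard library. It is an assumption-laden toy, not how any certified device works: it uses a Lamport one-time signature built on SHA-256 (chosen only because it needs no third-party packages), and the function names, the sample audio bytes, and the timestamp are all hypothetical. A real device would more plausibly use a standard scheme such as Ed25519 or ECDSA with a key held in tamper-resistant hardware.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(digest: bytes):
    """Yield the 256 bits of a SHA-256 digest, most significant bit first."""
    for byte in digest:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def keygen():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign_recording(private, audio: bytes, timestamp: str):
    """Sign the timestamp and audio together by revealing one secret per bit.

    A Lamport key must be used for ONE signature only.
    """
    digest = _h(timestamp.encode("utf-8") + b"\x00" + audio)
    return [private[i][bit] for i, bit in enumerate(_bits(digest))]


def verify_recording(public, audio: bytes, timestamp: str, signature) -> bool:
    """Check each revealed secret against the published hash for that bit."""
    if len(signature) != 256:
        return False
    digest = _h(timestamp.encode("utf-8") + b"\x00" + audio)
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(digest))))


# Illustrative values only: fake audio bytes and a made-up timestamp.
private_key, public_key = keygen()
audio = b"fake-pcm-samples"
stamp = "2017-01-01T00:00:00Z"
signature = sign_recording(private_key, audio, stamp)

print(verify_recording(public_key, audio, stamp, signature))
print(verify_recording(public_key, audio + b"!", stamp, signature))
```

The first check passes and the second fails, because changing even one byte of the audio (or the timestamp) changes the digest and therefore which secrets would have had to be revealed. The trust question the comment raises is untouched by the math: the signature only shows that whoever holds the device key signed these bytes, so everything rests on the key staying inside the device.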

When the governing Conservative Party in Canada offered a million dollar bribe (an insurance plan to a dying cancer patient) in order to get his vote in the legislature, it was sufficient that the possibility of falsification could not be ruled out.

They did not prove it in the slightest. Nor is there record of them having tried to prove that the recording was false.

They merely asserted that it was plausibly false, and thus the police lost interest, on the grounds that the recorded conversation about the million dollar insurance plan for a dying cancer patient (aka bribe) could have been fabricated.

No one denied that a million dollar insurance plan was offered to a dying man. They denied the recorded conversation on the basis of plausibility.

So what happened in the 10 years that followed was less surprising to some people than to others. The first media interaction after the election included pre-approved questions only, a practice that continued until the method of pre-approved journalists could have the same effect. (Once, several young women got kicked out of an event because they had once posted a picture on Facebook in which a member of the opposition appeared.)

"Confidential business conversations over the telephone might dwindle". Hardly. If what people write in emails is any indication, people don't care who reads (or hears) confidential (or embarrassing) communication. If I need to communicate something to someone that I don't wish to be published for the world to read or hear, I don't do it in an email; I talk to them. I suppose that the voice replication software may mean I'm not talking to the person I think I'm talking to, but at least I know what I say won't be recorded, not in my state where it's a crime to record a phone conversation without consent. And there's always Skype. Would someone actually disguise herself as someone she is not just to get me to say something to her that I don't want her to hear? Paranoid, are we? At least this latest nonsense didn't come out of Silicon Valley so I can't blame the boy wonders in Silicon Valley. Instead, I can blame the boy wonders in Canada. Why can't the boy wonders invent something useful like flying cars, spaceships to Mars, and talking refrigerators? Oh, wait!

New therapies for those who have lost loved ones. New forms of audio memes literally putting words in a celebrity's mouth (and more strawmen too). Not just movies, the average is over in audiobooks and metro instructions too. A new (overused?) plot device for horror movies, e.g., "Yes honey, I'll meet you in the basement." Customized ads that surprise you in familiar voices, imagine if your mother was telling you to consider Taco Bell for dinner! A great way to cheat on phone interviews.

Obvious applications in porn. Auto-tuned celebrity look alikes in mass media, or custom audio of people you meet one time. Recording audio will be the new creepshot.

Amazon put out a demonstration of this a few months ago.

Can't remember where I read it exactly. On the internet ...

This would not be used to organically synthesize a clone of your voice. It would be used to modulate the digitization of another voice. So, someone speaks a sentence, but the digital output sounds like your voice.

There is software that does this for facial features too. It points the camera at your face, and then maps expressions onto the original content. So you can make Trump grimace when he says dumb stuff, among other things.

Getting all the vocab and mannerisms is far more complicated than cloning a voice, per se, with existing technology. The reason is that you need an extensive "library" of psychological profiling, etc. to make it a convincing representation of the person, as opposed to causing a voice sound that is acoustically similar to that of the individual.

I guess Schwab's shift to voice verification as a biometric marker wouldn't be effective anymore. There aren't many cheap and easy biometric markers left.

David Attenborough will narrate everything.

Ha! Him and Morgan Freeman.

A body of law known as the "right of publicity" (statutory) and/or the tort of appropriation (misappropriation) exists in most U.S. states. The right of publicity and/or (mis)appropriation prohibit the unauthorized use of a person's "likeness."

Voice is well-established to be a "likeness" under some version of the publicity/appropriation law in most states.

Variation from state to state does exist. Not all states have common law precedent or statutory law prohibiting non-commercial use, and some states limit protection to "public figures", but the trend is toward adoption of statutes that protect the "right of publicity" for everyone, celeb or not.

My own state, Alabama, recently adopted a comprehensive statute protecting everyone's right of publicity and prohibiting unauthorized use of likeness (which includes voice) for commercial/non-commercial purposes.

"prohibit the unauthorized use of a person’s “likeness.”"

thanks
