How can you protect your identity when your voice is recorded constantly? Researchers are working towards privacy preservation!

EURECOM Communication
8 min read · May 26, 2021

Voice interactive systems are omnipresent these days, since voice is one of the most natural means of interacting with automated systems. More and more, we have devices at home with smart speakers, always listening to our conversations and potentially recording privacy-sensitive information that could be linked back to us. It is now more essential than ever to consider “voice anonymisation”: removing the identity and suppressing the privacy-sensitive aspects of a speech signal.

EURECOM’s Prof. Nicholas Evans, an expert in speaker recognition, deepfake detection, privacy preservation and biometrics, will guide us through the VoicePrivacy Challenge Initiative, a project dedicated to anonymisation and identity protection in speech signals.

Q. What is the context behind the VoicePrivacy challenge?

NE. The effort within the community to look at privacy preservation in the context of speech and language technologies is quite recent. There is a history of work, but the topic has not really taken off until recently, meaning the last year or so. Given the nature of speech and the degree to which it can contain and reveal privacy-sensitive information, this was something our community had to address.

So, we have relatively recently started exploring the privacy aspect, and in this context we launched the VoicePrivacy Challenge Initiative. It is an international collaboration between French and Japanese partners, namely EURECOM, the University of Avignon, Inria and NII in Japan. EURECOM’s work in VoicePrivacy is supported by an ANR project called HARPOCRATES (ANR-19-DATA-0008) as well as the ANR-JST CREST VoicePersonae project (ANR-18-JSTS-0001). Participants in the challenge had to use standard databases, protocols and metrics to benchmark different solutions for anonymisation.

These kinds of benchmarks and challenges are really the drivers that spearhead progress, because without standard protocols and metrics it is impossible to compare different solutions. Each lab would evaluate its solution on a different database/protocol/metric, and then we would not know whether any claimed improvement in performance comes from the algorithm or simply from evaluating on a different database. So, it is a community-led initiative that takes the form of a challenge: different labs around the world develop solutions to anonymisation and then submit to us their speech data processed with their algorithm. In the end, we evaluate the anonymisation performance using our metrics.

Q. What is the role of EURECOM in VoicePrivacy Challenge?

NE. My lab is the Audio Security and Privacy group, in the Digital Security department at EURECOM. In our group we mainly look at anonymisation from two angles. The first is to try and stop an eavesdropper intercepting your speech data. The second is to try and suppress any privacy-sensitive information from your speech data, without any encryption needed. In other words, we are working on voice anonymisation: the process of suppressing the identity from the recording of a speech signal, removing information such as your gender, your age, your job title, even your emotions while speaking. An obvious requirement for an anonymisation process is that your voice should remain intelligible, meaning that a human or a machine can still understand what you are saying, and natural, meaning that a listener would not be able to tell that your voice has been treated by an anonymisation system.

It would also mean that a speech recognition system trying to understand what you are saying would still perform reliably and still interpret your message or your request, even though the signal has been treated to remove your identity.

Q. How can we define “identity” in a speech signal?

NE. Speech signals are quite rich in terms of the different sources of information they contain, and identity is one of them. These sources of information can reveal, for example, your gender, age, ethnicity or emotional state. Voice is a biometric, and it can be used to determine who is speaking. Humans recognise the voices of people they know quite well, and machines are able to do the same thing and learn to recognise your voice, just as they can recognise your face or your fingerprints.

But let me explain about biometrics and identity. Using samples of DNA, for example, the chances of confusing two people are extremely low. Face, iris, fingerprints and voice are also very distinguishable. We can use these characteristics, called biometrics, to differentiate one person from another and be relatively confident in the result. However, biometric characteristics are not unique; each has a different degree of distinguishability. There is often a misleading narrative in the media about being able to uniquely identify a person through such biometrics, and we have to keep this in mind.

Q. How does anonymisation work?

NE. Regarding anonymisation, in terms of the VoicePrivacy Challenge, we want to develop an algorithm that disentangles the identity information from all the other sources of information within the speech signal. The goal is then to resynthesise the speech signal from all those other components, without the identity. There is an argument that if we are able to suppress the identity from a speech signal there would be no privacy concerns, although some would certainly disagree.
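To make this concrete, here is a minimal conceptual sketch of the disentangle-and-resynthesise idea, in the spirit of x-vector-based anonymisation pipelines. Every component below is a hypothetical stand-in rather than a real model; the point is only the data flow, in which the speaker embedding is discarded and replaced by a pseudo-speaker before resynthesis.

```python
import numpy as np

rng = np.random.default_rng(42)

def extract_linguistic_content(wave):
    """Stand-in for phonetic features, e.g. from an ASR acoustic model."""
    return wave  # placeholder: pretend the waveform itself is the content

def extract_prosody(wave):
    """Stand-in for pitch (F0) and energy contours."""
    return np.abs(wave)

def extract_speaker_embedding(wave):
    """Stand-in for an x-vector-style speaker embedding."""
    return rng.normal(size=128)

def pseudo_speaker_embedding():
    """Stand-in for building a pseudo-speaker, e.g. from a pool of
    embeddings of other speakers."""
    return rng.normal(size=128)

def resynthesise(content, prosody, speaker_embedding):
    """Stand-in for a neural vocoder / waveform model."""
    return content  # placeholder waveform

def anonymise(wave):
    content = extract_linguistic_content(wave)    # what was said: kept
    prosody = extract_prosody(wave)               # how it was said: kept
    _identity = extract_speaker_embedding(wave)   # who said it: discarded
    return resynthesise(content, prosody, pseudo_speaker_embedding())

anonymised = anonymise(rng.normal(size=16000))    # one second of fake audio
```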

Here, I would like to clarify some terminology. We often talk about recognition, identification, authentication and verification, and we have to keep in mind that they all have different meanings. The most general term is recognition; then we have identification on one hand and authentication or verification on the other.

Identification means that you would present your biometric to the system and it has to determine who you are.

Authentication/verification means that you would present your biometric to the system and it replies with a yes or no.

In the case of identification there is no claim of identity, so we have a “1-to-N” comparison. In the case of authentication/verification there is an inherent claim of identity, and so a “1-to-1” comparison.
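A small sketch may help make the 1-to-N versus 1-to-1 distinction concrete. The speaker embeddings, enrolment database and decision threshold below are all illustrative assumptions, not part of any particular system.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe, enrolled):
    """Identification, 1-to-N: compare the probe against every enrolled
    speaker and return the best-matching identity."""
    return max(enrolled, key=lambda name: cosine(probe, enrolled[name]))

def verify(probe, claimed_template, threshold=0.7):
    """Verification, 1-to-1: compare the probe against the template of the
    claimed identity only, and answer yes or no."""
    return cosine(probe, claimed_template) >= threshold

rng = np.random.default_rng(0)
enrolled = {name: rng.normal(size=128) for name in ("alice", "bob", "carol")}
probe = enrolled["bob"] + 0.1 * rng.normal(size=128)  # a noisy sample of "bob"

print(identify(probe, enrolled))          # -> bob (no identity claim needed)
print(verify(probe, enrolled["bob"]))     # -> True  (claim "bob" accepted)
print(verify(probe, enrolled["alice"]))   # -> False (claim "alice" rejected)
```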

[Figure: a sketch of voice data anonymisation]

More generally, we have not yet got as far as looking at specific applications for voice privacy, but we use authentication/verification experiments to determine how successful an anonymisation system is. More specifically, we run experiments with a database of natural voices and judge the performance of a speaker verification system on that database. Then we anonymise all the voices and run the same experiment. Normally, the performance of the speaker verification system will deteriorate. If it deteriorates to the point where the error is around 50%, it means the system is guessing and cannot verify your identity with any confidence.
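As an illustration of this evaluation logic, here is a hedged sketch that computes an equal error rate (EER) from verification scores. The score distributions are synthetic stand-ins, chosen so that anonymisation collapses the target and impostor scores together and pushes the EER towards 50%.

```python
import numpy as np

def eer(target_scores, nontarget_scores):
    """Sweep a decision threshold and return the operating point where the
    false rejection rate (FRR) equals the false acceptance rate (FAR)."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, best_eer = 2.0, 0.5
    for t in thresholds:
        frr = float(np.mean(target_scores < t))      # targets wrongly rejected
        far = float(np.mean(nontarget_scores >= t))  # impostors wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer

rng = np.random.default_rng(1)
# Natural voices: target trials score clearly above impostor trials -> low EER.
eer_natural = eer(rng.normal(2.0, 1.0, 5000), rng.normal(-2.0, 1.0, 5000))
# Anonymised voices: the two score distributions collapse -> EER near 50%.
eer_anon = eer(rng.normal(0.05, 1.0, 5000), rng.normal(0.0, 1.0, 5000))
print(f"EER, natural voices:    {eer_natural:.1%}")  # around 2%
print(f"EER, anonymised voices: {eer_anon:.1%}")     # close to 50% = guessing
```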

Regarding the tools we use, there is a variety of different solutions, ranging from very traditional signal processing tools to very complex deep learning solutions based on neural acoustic and waveform models. Both approaches can distort your speech signal such that speaker recognition systems no longer work. Most people in the community are working on deep learning approaches, but I personally believe that we can still do pretty well with much simpler solutions.
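As an example of the simpler, signal-processing end of the spectrum, here is a hedged sketch in the spirit of the McAdams-coefficient approach used as a baseline in the VoicePrivacy Challenge: estimate the LPC poles of each frame, shift their angles by raising them to a power alpha, and resynthesise from the excitation. The frame handling is deliberately simplified (no overlap-add), and the LPC order, alpha and frame length are illustrative values, not tuned settings.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def mcadams_anonymise(y, order=20, alpha=0.8, frame_len=1024):
    """Shift LPC pole angles theta -> theta**alpha, frame by frame."""
    out = np.zeros_like(y)
    for start in range(0, len(y) - frame_len + 1, frame_len):
        frame = y[start:start + frame_len]
        if not np.any(frame):                    # skip silent frames
            continue
        a = librosa.lpc(frame, order=order)      # LPC coefficients A(z)
        residual = lfilter(a, [1.0], frame)      # inverse filter -> excitation
        new_poles = []
        for p in np.roots(a):
            if abs(p.imag) > 1e-8:               # complex pole: shift its angle
                theta = np.abs(np.angle(p)) ** alpha
                new_poles.append(abs(p) * np.exp(1j * np.sign(np.angle(p)) * theta))
            else:                                # real pole: leave unchanged
                new_poles.append(p)
        a_new = np.real(np.poly(new_poles))      # back to filter coefficients
        out[start:start + frame_len] = lfilter([1.0], a_new, residual)
    return out

# Hypothetical usage:
# y, sr = librosa.load("speech.wav", sr=16000)
# y_anon = mcadams_anonymise(y)
```

Shifting the pole angles displaces the formant frequencies, which changes the perceived speaker characteristics while leaving the excitation, and hence much of the intelligibility, largely intact.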

Q. What would be an example of real life applications for such voice anonymisation systems?

NE. Imagine you attend a party one day with friends and maybe drink a little more than you should. Your activities are recorded and posted online on social media. Now imagine your employer becomes aware of this. If that data had been treated with anonymisation, your face would be masked and your voice treated such that no one could tell it was you. In other words, if you want to publish some data online but protect your identity, anonymisation is a solution. So, publishing content on social media would be one example.

Let’s say now you are using automated voice interactive systems when you call your bank or your insurance company. Anybody who can get a recording of your voice could derive privacy-sensitive information from it. The point is that the GDPR is putting control of privacy into the hands of the individual: you should be able to decide for yourself what information is used by companies, and for what purpose. But if you provide the information without protection, you are trusting tech companies not to misuse your private data.

Q. What are the challenges for succeeding in anonymisation?

NE. One of the big challenges is how to actually measure privacy. In fact, there is no broadly accepted legal definition of privacy. One can wonder how the GDPR came into force when no one can define precisely what privacy means.

Another topic is computational complexity, keeping in mind that many of these systems must operate in real time. You might have an app installed on your phone that anonymises your voice when you call your health insurance provider or your tax office. Some state-of-the-art solutions cannot run on a mobile phone; there is simply too much processing, especially for the deep learning techniques, which are too heavy on computation. Other, computationally lighter methods might not provide enough privacy preservation, but you might be able to get them to run on a mobile device. So how do we strike a good balance between utility and efficiency on personal consumer devices and the level of privacy preservation that is provided? That is a future challenge!

Q. What are the future directions of this field and your message as an expert?

NE. As mentioned before, understanding what privacy means in the context of speech signals is very important. Andreas Nautsch, a research fellow at EURECOM, is an expert on the legal aspects of speech data and on the misconceptions about its legal implications within the speech community, and he has written a very nice article on this topic. How anonymisation solutions should be integrated within existing speech technologies is an open question, as is how existing speech technologies should be adapted given that we may need to include anonymisation: if we anonymise speech, tasks like speech recognition may no longer work so well. We also have a lot of work to do on assessment, to compare different solutions reliably. A central goal of the VoicePrivacy initiative is to bring answers to these questions. The goal is really to raise awareness that privacy is important within speech technology and that we can address the requirements of the GDPR. But it is important to realise that we cannot put all speech technologies into one bucket: different speech applications will require different privacy preservation solutions. Anonymisation is not a silver bullet; the field is much broader.

By Dora Matzakou for EURECOM

References

The VoicePrivacy Challenge: https://www.voiceprivacychallenge.org/

Introducing the VoicePrivacy Initiative: https://arxiv.org/pdf/2005.01387.pdf

Article on the definition of privacy: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2647.pdf
