Sometimes we receive an advertisement for a product or service shortly after discussing it with friends or family at home. Faced with this strange coincidence, we may wonder: are we being spied on by our voice assistant, or by certain applications on our smartphone, computer, or connected watch? While such eavesdropping has not been proven, it is technically possible. A team of three deep-learning researchers at Columbia University has developed an algorithm that generates almost inaudible tones that scramble the audio signal, preventing our own devices from spying on us. They presented their research, entitled “Real-Time Neural Voice Camouflage”, at the latest ICLR (International Conference on Learning Representations, a conference dedicated to deep learning).
Natural language processing (NLP) is a branch of artificial intelligence that enables machines to analyze human speech: transcribing it into text, understanding it, interpreting a query, and allowing a conversational agent such as Siri or Alexa to respond. AI algorithms in this area fall mainly into two groups: recognition and generation. In NLP, recognition analyzes and interprets the sound signal, while generation synthesizes speech. The work of Mia Chiquier, Chengzhi Mao, and Carl Vondrick, computer scientists at Columbia University, touches both areas. Their approach is innovative because it introduces predictive attacks.
The method of neural voice camouflage
Automatic speech recognition models built into almost all smart devices can potentially eavesdrop on conversations. Over the past decade, research has shown that neural network models are vulnerable to small additive perturbations, such as carefully crafted ambient noise. However, streaming audio is particularly difficult to jam, because the computation must be done in real time, and the software developed to date has not been effective enough to thwart eavesdropping.
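The idea of a "small additive perturbation" can be illustrated in a few lines. A real adversarial attack optimizes the perturbation against a specific speech recognition model; in this minimal sketch, bounded random noise merely stands in to show how little the waveform needs to change while remaining nearly inaudible (a sine tone stands in for speech; all names here are illustrative, not from the paper's code):

```python
import numpy as np

def add_bounded_perturbation(waveform, epsilon=0.005, seed=0):
    """Add a small additive perturbation to an audio waveform.

    A real adversarial attack would optimize this perturbation against
    a specific speech-recognition model; bounded random noise is only
    a stand-in to illustrate the tiny magnitude involved.
    """
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-epsilon, epsilon, size=waveform.shape)
    return np.clip(waveform + delta, -1.0, 1.0), delta

# One second of a 220 Hz tone sampled at 16 kHz stands in for speech.
t = np.linspace(0, 1, 16000, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
perturbed, delta = add_bounded_perturbation(speech)

# Signal-to-noise ratio of the perturbation, in dB: a high SNR means
# the added noise is barely audible to a human listener.
snr_db = 10 * np.log10(np.mean(speech**2) / np.mean(delta**2))
```

With these settings the perturbation sits roughly 40 dB below the signal, yet perturbations of comparable scale, when optimized, are known to derail neural recognizers.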
These alterations to the sound signal make it almost impossible for a machine to keep up with a person's speech. The biggest challenges for the team were optimization and speed: to be effective, their algorithm had to predict and adapt in real time to changes in the speaker's tone of voice and rate of speech.
Predictive attacks to prevent eavesdropping
The team introduced predictive attacks, capable of disrupting any word that automatic speech recognition models have been trained to transcribe.
The camouflage is in fact a signal emitted by a computer whose frequencies vary according to the speaker's vocal characteristics, in audio processed at around 16 kHz; according to the researchers, it resembles the sound of a quiet air conditioner running in the background.
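To get a feel for how quiet such a signal is, here is a minimal sketch that scales broadband noise to sit well below the level of the speech it overlays, roughly in the spirit of a soft air-conditioner hum. The actual camouflage is shaped by the speaker's voice; this flat noise and the 20 dB level are illustrative assumptions, not values from the paper:

```python
import numpy as np

def quiet_masking_noise(n_samples, speech_rms, level_db=-20, seed=1):
    """Generate broadband noise `level_db` decibels quieter than the
    speech it overlays. The real camouflage signal is shaped by the
    speaker's vocal characteristics; flat noise is only a stand-in.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    target_rms = speech_rms * 10 ** (level_db / 20)
    return noise / np.sqrt(np.mean(noise**2)) * target_rms

sr = 16000                                      # 16 kHz, typical for ASR audio
t = np.linspace(0, 1, sr, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 220 * t)      # stand-in for speech
speech_rms = float(np.sqrt(np.mean(speech**2)))
noise = quiet_masking_noise(sr, speech_rms)
noise_rms = float(np.sqrt(np.mean(noise**2)))
```

At -20 dB the noise has one tenth of the speech's RMS amplitude, which is why a listener perceives only a faint background hiss.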
The deep learning algorithm, trained on a large labeled speech dataset, predicts what the speaker will say next. A noise model fitted to that prediction is then generated, rendering the upcoming speech unintelligible to an automatic speech recognition tool.
Mia Chiquier, a computer science researcher at Columbia and first author of this study, explains:
“Our algorithm stops a malicious microphone from picking up your words 80% of the time. It works even if we don’t know anything about the malicious microphone, such as its location or even the software running on it.”
The algorithm is still at the prototype stage; the team is continuing to work on it and would like to offer it as a downloadable application in multiple languages.
Article sources:
Real-Time Neural Voice Camouflage
Mia Chiquier, Chengzhi Mao, and Carl Vondrick
Columbia University
ICLR 2022 (Oral)