NexTV Africa & Middle East

Complete News World

Impressive AI software that recreates faces from sound

Images reconstructed from the audio sample (Photo: screenshot/Speech2Face)

Technology continues to advance by leaps and bounds, drawing on different fields to explore new capabilities and features. One of them is the ability to "reconstruct" a person's face from a fragment of their voice.

The Speech2Face study, presented in 2019 at the Conference on Computer Vision and Pattern Recognition (CVPR), showed that artificial intelligence (AI) can infer a person's appearance from short audio clips.

The paper explains that the goal of researchers Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, and Michael Rubinstein of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) is not to reconstruct people's faces exactly, but to create an image with the physical characteristics associated with the analyzed voice.

To achieve this, they designed and trained a deep neural network on millions of YouTube videos of people talking. During training, the model learned to associate voices with faces, allowing it to produce images with physical characteristics similar to those of the speakers, including age, gender and ethnicity.

Faces the program matched to the audio (Photo: Screenshot / Speech2Face)

Training was self-supervised: it relied on the natural pairing of faces and voices in online videos, without the need to model the detailed physical features of the face explicitly.
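The idea of learning from matched voices and faces can be illustrated with a toy sketch. It is not the researchers' network: a simple linear map stands in for the deep model, and the "voice" and "face" feature vectors are synthetic numbers generated for the example. All function names and parameters here are hypothetical. The point is only that the paired data supplies the training signal, with no manually labeled attributes.

```python
import random


def predict(weights, voice):
    # Toy "voice encoder": a linear map from voice features to face features.
    return [sum(w * v for w, v in zip(row, voice)) for row in weights]


def mean_squared_error(weights, pairs):
    # Average squared distance between predicted and actual face features.
    total = 0.0
    for voice, face in pairs:
        pred = predict(weights, voice)
        total += sum((p - f) ** 2 for p, f in zip(pred, face))
    return total / len(pairs)


def train(pairs, voice_dim, face_dim, epochs=50, lr=0.05):
    # Stochastic gradient descent on the squared error, one pair at a time.
    weights = [[0.0] * voice_dim for _ in range(face_dim)]
    for _ in range(epochs):
        for voice, face in pairs:
            pred = predict(weights, voice)
            for i in range(face_dim):
                err = pred[i] - face[i]
                for j in range(voice_dim):
                    weights[i][j] -= lr * err * voice[j]
    return weights


def make_toy_pairs(n, voice_dim, face_dim, seed=0):
    # Synthetic stand-in for "a voice and the face seen with it in a video":
    # each face vector is a hidden linear function of its voice vector.
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(voice_dim)]
              for _ in range(face_dim)]
    pairs = []
    for _ in range(n):
        voice = [rng.uniform(-1, 1) for _ in range(voice_dim)]
        pairs.append((voice, predict(hidden, voice)))
    return pairs


if __name__ == "__main__":
    pairs = make_toy_pairs(n=50, voice_dim=4, face_dim=3)
    untrained = [[0.0] * 4 for _ in range(3)]
    print("error before training:", mean_squared_error(untrained, pairs))
    print("error after training: ", mean_squared_error(train(pairs, 4, 3), pairs))
```

In this toy setup the error falls because the face vectors really are a function of the voice vectors. With real recordings the relationship is only statistical, which is why the reconstructions capture general traits rather than an exact likeness.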

“Our reconstructions, obtained directly from audio, reveal the correlations between faces and voices. We evaluate and numerically quantify how, and in what manner, our Speech2Face reconstructions from audio resemble the true face images of the speakers.”

The researchers note that, because the study touches on sensitive issues such as race and privacy, no specific physical details were added to the facial reconstructions. They also point out that, like any other machine learning system, it improves over time, since each use adds to its knowledge base.

Although the evidence shows that Speech2Face produces a large number of matches between faces and voices, it also has some shortcomings: in some cases the race, age or gender of the reconstructed face does not match the audio sample used.

Examples of errors. The first image corresponds to the person’s face in the audio and the second to the image recreated by the program (Image: screenshot/Speech2Face)

The model is designed to capture statistical correlations between facial features and voice. It should be remembered that the AI learned from YouTube videos, which are not a representative sample of the world's population; for some languages, for example, the results are inconsistent with the training data.

For this reason, the study itself recommends, at the end of its findings, that anyone who decides to explore and extend the system use a broader sample of people and voices, so that the machine learning model has a wider repertoire for matching and recreating faces.

The program was also able to render the reconstructed faces as cartoons, which likewise bear a striking resemblance to the speakers in the analyzed audio clips.

Since this technology could also be used for malicious purposes, the reconstruction only approximates the person and does not produce complete, identifiable faces, as that could pose a problem for people's privacy. Even so, what the technology can do with audio samples is surprising.
