The Tech Behind Deepfakes

I am by no means a computer programmer, let alone an expert in artificial intelligence. But I have done a lot of research on AI and machine learning in the context of deepfakes. In this post, I summarise some of the key concepts underpinning how deepfakes are made.

Artificial Intelligence

When some people think of artificial intelligence, they may imagine a system such as Skynet – the superintelligent computer that stars, albeit off screen, as the antagonist in the Terminator films. Such systems are often portrayed as sentient and self-aware, and as capable of one day replicating or even surpassing human intelligence to form cultures or societies of their own. But this sort of AI, known as ‘general’ or ‘strong’ AI, is – for the time being, at least – purely hypothetical.

On the other hand, ‘narrow’ or ‘weak’ AI is already fairly commonplace. It refers to a goal-oriented system programmed to follow certain rules in order to master a specific task – one that, traditionally, only humans could perform.

To do so, a human programmer will typically code the algorithm as a step-by-step procedure, involving calculations and various rules to follow, which in turn leads to a certain outcome. Examples of narrow AI in the film and television industry include Netflix’s recommendation algorithm, which suggests a series for you to binge watch, and software such as Cinelytic, which can read through a script and offer predictions on its box office success (which I’ve also written about here). Using the Netflix example, if a subscriber repeatedly gives a thumbs up to shows of a certain genre, an outcome of the algorithm may be to push shows labelled with the same genre to the top of that viewer’s recommendations.
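
To make this a little more concrete, here is a toy sketch of what such a hand-coded rule might look like. Everything in it – the show titles, the genres, the threshold of three thumbs-ups – is my own invention for illustration, not Netflix’s actual system:

```python
# A toy rule-based recommender in the spirit of the Netflix example above.
# The rule is written by hand, not learned: if a subscriber has given
# several thumbs-ups to a genre, shows of that genre sort to the top.
from collections import Counter

def recommend(thumbs_up_genres, catalogue, threshold=3):
    """Return the catalogue reordered so favoured genres come first."""
    counts = Counter(thumbs_up_genres)
    favoured = {g for g, n in counts.items() if n >= threshold}
    # Hard-coded rule: shows in a favoured genre sort before everything else.
    return sorted(catalogue, key=lambda show: show["genre"] not in favoured)

catalogue = [
    {"title": "Space Saga", "genre": "sci-fi"},
    {"title": "Baking Duel", "genre": "reality"},
    {"title": "Robot Noir", "genre": "sci-fi"},
]
history = ["sci-fi", "sci-fi", "sci-fi", "reality"]
print([s["title"] for s in recommend(history, catalogue)])
# ['Space Saga', 'Robot Noir', 'Baking Duel']
```

The point is that every step of the “intelligence” here was decided in advance by a programmer; nothing is learned from the data.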

Machine Learning

Machine learning is a subset of narrow AI, but one which does not rely on a set of pre-programmed rules to make decisions or generate an outcome. Instead, machine learning algorithms acquire their own ‘knowledge’, just as people learn through repetition and experience. Just think of flashcards!

With image recognition, for example, a system is shown thousands of labelled images, and thereby trained to identify images based on certain attributes and features. When the system incorrectly identifies an image, it adjusts the weights and biases of its various nodes. Despite the sophistication of machine learning, though, the training process still depends upon extensive input from human programmers: someone has to label the datasets in the first instance, and then correct the algorithm if its identification or suggestion is incorrect.
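
To give a flavour of what ‘adjusting the weights and biases’ actually means, here is a minimal sketch of a single artificial neuron correcting itself on labelled data – the classic perceptron update rule. The numbers are toy data rather than images, and everything here is purely illustrative:

```python
# A single artificial neuron learning from labelled examples.
# When it classifies an input wrongly, its weights and bias are nudged
# towards the correct answer. A real image-recognition system does the
# same thing across millions of weights.
inputs = [(0.2, 0.1), (0.9, 0.8), (0.1, 0.3), (0.8, 0.9)]
labels = [0, 1, 0, 1]  # the human-provided labels described above

w = [0.0, 0.0]  # weights
b = 0.0         # bias
lr = 0.1        # learning rate: how big each correction is

for epoch in range(20):           # repetition, like flashcards
    for (x1, x2), y in zip(inputs, labels):
        prediction = 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
        error = y - prediction    # zero when the guess was right
        # Adjust the weights and bias only when the neuron was wrong.
        w[0] += lr * error * x1
        w[1] += lr * error * x2
        b += lr * error

print(w, b)  # learned parameters that now separate the two classes
```

Notice that the correction only happens when the guess is wrong – exactly the learning-from-mistakes process described above.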

GAN – the game changer!

This changed in 2014, when computer scientists developed a special type of deep learning known as the generative adversarial network, or GAN. Deep learning is a subset of machine learning, so called because it utilises multiple or ‘deep’ layers of a system’s nodes, which work to progressively extract increasingly nuanced features from the datasets. A GAN can essentially teach itself, because two algorithms are pitted against each other as ‘adversaries’ – one generates fakes while the other tries to spot them – so no human needs to step in and correct each mistake.
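
For the technically curious, here is a minimal sketch of that adversarial loop, written with the PyTorch library. Instead of faces, the generator here learns to imitate a simple bell curve of numbers – a deliberately stripped-down stand-in, since real deepfake models use deep convolutional networks trained on images:

```python
# A minimal GAN sketch: the generator learns to mimic a "real" data
# distribution while the discriminator learns to tell real samples
# from generated ones. Illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Real" data: samples from a Gaussian the generator must learn to imitate.
def real_batch(n=64):
    return torch.randn(n, 1) * 1.5 + 4.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    # 1. Train the discriminator: label real samples 1, fakes 0.
    real = real_batch()
    fake = generator(torch.randn(64, 8)).detach()
    d_loss = (loss_fn(discriminator(real), torch.ones(64, 1)) +
              loss_fn(discriminator(fake), torch.zeros(64, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2. Train the generator: try to make the discriminator say "real".
    fake = generator(torch.randn(64, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# The generated samples' mean should drift towards the real mean of 4.0.
print(generator(torch.randn(1000, 8)).mean().item())
```

The two training steps are the ‘teach and learn’ tug-of-war: each network improves only because the other one does.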

In other words, the big picture starts with artificial intelligence. Putting “strong” or “general” AI to one side, we can focus on “narrow” AI. Machine learning is a type of narrow AI, and deep learning is a type of machine learning. The GAN is, in turn, a very special type of deep learning – the one which has made deepfakes possible. (Confused yet?)

A very simple diagram I made, depicting Swedish actress Alicia Vikander. Real images are used to train the algorithm: the generator and discriminator then “teach and learn from each other” (steps 2–4 are repeated) until a believable deepfake is achieved.

The Human Face x Deepfakes

New research suggests that the incredible diversity of human faces is the result of evolutionary pressure to make each of us easily recognisable. Because we are particularly adept at telling different faces apart, we can easily sense when something looks weird or unnatural. That eerie and unnerving sensation you get when looking at a lifelike robot or computer-game character even has its own name: the uncanny valley. But the GAN has made it possible to create incredibly accurate depictions of one of the most difficult images of all: the human face.

The term ‘deepfake’ is now used to describe almost any face-swapping technique whereby images of an individual are used by AI to generate digital doppelgängers (look-alikes), which are then superimposed onto different bodies. Deepfakes generated from only one source image are often obviously fake, but those generated from thousands of images or video clips can be very realistic. In contrast to deepfakes, other forms of audiovisual manipulation which do not utilise artificial intelligence are known as “shallow fakes” or “cheap fakes”.

As a form of entertainment, deepfakes are available for almost anyone to make or enjoy. The software is free to download, hundreds of YouTube tutorials offer guidance on how to use it, and some freelance creators even sell their services for as little as €5 per video on marketplaces such as Fiverr. Mobile apps such as ZAO, Celebrity Face Morph and Deep Art Effects generate fairly realistic face-swapped videos and augment one’s appearance using just a single selfie as their source, and more mainstream apps like Instagram and Snapchat have ‘filters’ which can easily do the same.

Featured image from A Breakdown of Blade Runner 2049’s Oscar-Nominated Visual Effects