I am by no means a computer programmer, let alone an expert in artificial intelligence. But I have done a lot of research on AI and machine learning in the context of deepfakes. In this post, I summarise some of the key concepts underpinning how deepfakes are made.
Artificial intelligence is often associated with a system such as Skynet, the computer with superintelligence that stars, albeit off screen, as the antagonist in the Terminator films. Such computers are often portrayed as sentient and self-aware, and may one day replicate or potentially even surpass human intelligence to form cultures or societies of their own. But this sort of AI, known as ‘general’ or ‘strong’, is – for the time being, at least – purely hypothetical. On the other hand, AI which is ‘narrow’ or ‘weak’ is already fairly commonplace, and refers to a goal-oriented system programmed to follow certain rules and master a specific task.
To do so, a human programmer codes the algorithm with a step-by-step procedure involving calculations and various rules to follow, which in turn leads to a certain outcome. Examples of narrow AI in the film and television industry include Netflix’s recommendation algorithm, which suggests a series for you to binge watch, or even software such as Cinelytic, which can read through scripts and offer predictions about a script’s box office success (which I’ve also written about here). Using the Netflix example, if a subscriber repeatedly gives a thumbs up to shows of a certain genre, an outcome of the algorithm may be to push shows labelled with the same genre to the top of that viewer’s recommendations.
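To make the idea of a hand-written rule concrete, here is a minimal sketch in Python. The function and data names are hypothetical, and this is a deliberate toy, not how Netflix’s actual system works: the “rule” is simply to rank shows by how often the subscriber has liked their genre.

```python
from collections import Counter

def recommend(liked_genres, catalogue):
    # Count how often the subscriber has liked each genre
    likes = Counter(liked_genres)
    # Hand-written rule: rank shows by how often their genre was liked
    return sorted(catalogue, key=lambda show: likes[show["genre"]], reverse=True)

shows = [
    {"title": "Space Saga", "genre": "sci-fi"},
    {"title": "Bake-Off", "genre": "reality"},
    {"title": "Android Dreams", "genre": "sci-fi"},
]

ranked = recommend(["sci-fi", "sci-fi", "reality"], shows)
print([show["title"] for show in ranked])
```

Every behaviour here was decided in advance by the programmer; the system follows the rule, it does not learn one.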
Machine Learning & GAN
Machine learning is a subset of narrow AI, but one which does not rely on a set of pre-programmed rules to make decisions or generate an outcome. Instead, machine learning algorithms acquire their own ‘knowledge’, just as people learn through repetition and experience. With image recognition, a system will be shown thousands of labelled images, and thereby trained to identify pictures based on certain attributes and features. When the system incorrectly identifies an image, it adjusts the weights and biases of its various nodes.
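That adjust-on-error idea can be shown with the simplest possible learner: a single artificial ‘node’, known as a perceptron. This is my own illustrative sketch, far simpler than anything used for image recognition, but the principle is the same: when a guess is wrong, nudge the weights and bias, and repeat over many passes through the data.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    # samples: list of (features, label) pairs, where label is 0 or 1
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # non-zero only when the guess is wrong
            # Adjust the weights and bias in the direction that reduces the error
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

# Teach it a trivial pattern: output 1 only when both inputs are 1
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data)
```

Nobody writes a rule like “both inputs must be 1”; the rule emerges from repeated correction, which is the sense in which the system acquires its own ‘knowledge’.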
Despite the sophistication of machine learning, the training process and coding of features long depended upon extensive input from human programmers. This changed in the early 2010s, when computer scientists developed a special type of deep learning known as the generative adversarial network, or GAN. Deep learning, so-called because it utilises multiple or ‘deep’ layers of a system’s nodes, works to progressively extract increasingly nuanced features from the datasets. The GAN can essentially teach itself, because two algorithms are pitted against each other as ‘adversaries’ – one generating candidate images, the other judging whether they look real – which greatly reduces the need for human supervision.
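The adversarial loop can be caricatured in a few lines. Real GANs use neural networks and gradient descent; in this deliberately tiny sketch of my own, the ‘generator’ is reduced to a single number (the mean of the fake samples it produces) and the ‘discriminator’ to a threshold, just to show the feedback loop in which each side’s output drives the other’s next move.

```python
import random

random.seed(0)

REAL_MEAN = 10.0   # the "real data" the generator must learn to imitate
gen_mean = 0.0     # the generator's single parameter, starting far from the truth

for step in range(200):
    real = [random.gauss(REAL_MEAN, 1.0) for _ in range(64)]
    fake = [random.gauss(gen_mean, 1.0) for _ in range(64)]
    # "Discriminator": a threshold halfway between the two batch means;
    # it labels anything above the threshold as real
    threshold = (sum(real) / 64 + sum(fake) / 64) / 2
    # "Generator": shift its parameter toward the region currently labelled
    # real, so its next batch is harder to tell apart
    gen_mean += 0.2 * (threshold - gen_mean)

print(round(gen_mean, 1))  # ends up close to REAL_MEAN
```

Neither side is told what realistic output looks like in advance: the generator improves only because the discriminator keeps catching it out, and vice versa.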
New research suggests that the incredible diversity of human faces is the result of evolutionary pressure to make each of us easily recognisable. Because we are particularly adept at distinguishing different faces from each other, we can easily sense when something looks weird or unnatural. That eerie and unnerving sensation you get when looking at a lifelike robot or computer game character even has its own name: the uncanny valley. But the GAN has made it possible to create incredibly accurate depictions of one of the most difficult images of all: the human face.
The term ‘deepfake’ now describes any face-swapping technique whereby images of an individual are used by artificial intelligence technology to generate digital doppelgängers (look-alikes), which are then superimposed onto different bodies. Deepfakes generated with only one source image are often obvious as fakes, but those generated with thousands of images or video clips can be very realistic. In contrast to deepfakes, other forms of audiovisual manipulation which do not utilise artificial intelligence are known as “shallow fakes” or “cheap fakes”.
As a form of entertainment, deepfakes are available for almost anyone to make or enjoy. The software is free to download, hundreds of YouTube tutorials offer guidance on how to use it, and some freelance creators even sell their services for as little as €5 per video on marketplaces such as Fiverr. Mobile apps such as ZAO, Celebrity Face Morph and Deep Art Effects generate fairly realistic face-swapped videos and augment one’s appearance using just one selfie as their source, and more mainstream apps like Instagram and Snapchat have ‘filters’ which can easily do the same.
Featured image from A Breakdown of Blade Runner 2049’s Oscar-Nominated Visual Effects