Deep Learning for Deepfakes Creation and Detection: A Survey

Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Cuong M. Nguyen, Dung Nguyen, Duc Thanh Nguyen, Saeid Nahavandi, Fellow, IEEE

arXiv:1909.11573v3 [cs.CV] 26 Apr 2021

Abstract—Deep learning has been successfully applied to solve various complex problems, ranging from big data analytics to computer vision and human-level control. Advances in deep learning, however, have also been employed to create software that can threaten privacy, democracy and national security. One such recently emerged deep learning-powered application is deepfake. Deepfake algorithms can create fake images and videos that humans cannot distinguish from authentic ones. Technologies that can automatically detect and assess the integrity of digital visual media are therefore indispensable. This paper presents a survey of algorithms used to create deepfakes and, more importantly, of the methods proposed in the literature to date to detect them. We present extensive discussions on challenges, research trends and directions related to deepfake technologies. By reviewing the background of deepfakes and state-of-the-art detection methods, this study provides a comprehensive overview of deepfake techniques and facilitates the development of new, more robust methods to deal with increasingly challenging deepfakes.

Impact Statement—This survey provides a timely overview of deepfake creation and detection methods and presents a broad discussion of challenges, potential trends, and future directions. We conduct the survey with a different perspective and taxonomy compared to existing survey papers on the same topic. Informative graphics are provided to guide readers through the latest developments in deepfake research. The methods surveyed are comprehensive and will be valuable to the artificial intelligence community in tackling the current challenges of deepfakes.

Keywords: deepfakes, face manipulation, artificial intelligence, deep learning, autoencoders, GAN, forensics, survey.

T. T. Nguyen and D. T. Nguyen are with the School of Information Technology, Deakin University, Victoria, Australia. Q. V. H. Nguyen is with the School of Information and Communication Technology, Griffith University, Queensland, Australia. C. M. Nguyen is with LAMIH UMR CNRS 8201, Université Polytechnique Hauts-de-France, 59313 Valenciennes, France. D. Nguyen is with the Faculty of Information Technology, Monash University, Victoria, Australia. S. Nahavandi is with the Institute for Intelligent Systems Research and Innovation, Deakin University, Victoria, Australia. Corresponding e-mail: thanh.nguyen@deakin.edu.au (T. T. Nguyen).

I. INTRODUCTION

In a narrow definition, deepfakes (a portmanteau of "deep learning" and "fake") are created by techniques that superimpose face images of a target person onto a video of a source person to make a video of the target person doing or saying the things the source person does. This constitutes one category of deepfakes, namely face-swap. In a broader definition, deepfakes are artificial intelligence-synthesized content that can also fall into two other categories, i.e., lip-sync and puppet-master. Lip-sync deepfakes are videos modified so that the mouth movements are consistent with an audio recording. Puppet-master deepfakes are videos of a target person (the puppet) who is animated following the facial expressions, eye and head movements of another person (the master) sitting in front of a camera [1].

While some deepfakes can be created by traditional visual effects or computer graphics approaches, the most common underlying mechanism for deepfake creation today is deep learning models such as autoencoders and generative adversarial networks, which have been applied widely in the computer vision domain [2]–[4]. These models are used to examine the facial expressions and movements of a person and to synthesize facial images of another person making analogous expressions and movements [5]. Deepfake methods normally require a large amount of image and video data to train models that produce photo-realistic images and videos. As public figures such as celebrities and politicians may have a large number of videos and images available online, they are initial targets of deepfakes. Deepfakes have been used to swap the faces of celebrities or politicians onto bodies in pornographic images and videos; the first deepfake video emerged in 2017, in which the face of a celebrity was swapped onto the body of a porn actor. Deepfake methods also threaten world security when they are employed to create videos of world leaders giving fake speeches for falsification purposes [6]–[8]. Deepfakes can therefore be abused to cause political or religious tensions between countries, to fool the public and affect the results of election campaigns, or to create chaos in financial markets by creating fake news [9]–[11].
Deepfakes can even be used to generate fake satellite images of the Earth containing objects that do not really exist, in order to confuse military analysts, e.g., creating a fake bridge across a river although there is no such bridge in reality. This could mislead troops who have been guided to cross the bridge in a battle [12], [13].

While the democratization of creating realistic digital humans has positive implications, such as applications in visual effects, digital avatars, Snapchat filters, creating voices for those who have lost theirs, or updating episodes of movies without reshooting them [14], the number of malicious uses of deepfakes largely dominates the positive ones. The development of advanced deep neural networks and the availability of large amounts of data have made forged images and videos almost indistinguishable to humans and even to sophisticated computer algorithms. The process of creating these manipulated images and videos is also much simpler today, as it needs as little as an identity photo or a short video of a target individual; less and less effort is required to produce stunningly convincing tampered footage. Recent advances can even create a deepfake from just a single still image [15]. Deepfakes are therefore a threat not only to public figures but also to ordinary people. For example, a voice deepfake was used to scam a CEO out of $243,000 [16]. A recently released software called DeepNude poses even more disturbing threats, as it can transform a person's photo into non-consensual pornography [17]. Likewise, the Chinese app Zao has gone viral as it lets less-skilled users swap their faces onto the bodies of movie stars and insert themselves into well-known movies and TV clips [18]. These forms of falsification create a huge threat to privacy and identity, and affect many aspects of human lives.

Finding the truth in the digital domain has therefore become increasingly critical. It is even more challenging when dealing with deepfakes, as they are mostly used to serve malicious purposes and almost anyone can create deepfakes these days using existing deepfake tools. Thus far, numerous methods have been proposed to detect deepfakes [19]–[23]. Most of them are based on deep learning, and thus a battle between malicious and positive uses of deep learning methods has been arising. To address the threat of face-swapping technology, the United States Defense Advanced Research Projects Agency (DARPA) initiated a research scheme in media forensics (named Media Forensics or MediFor) to accelerate the development of fake digital visual media detection methods [24]. Recently, Facebook Inc., teaming up with Microsoft Corp and the Partnership on AI coalition, launched the Deepfake Detection Challenge to catalyse more research and development in detecting and preventing deepfakes from being used to mislead viewers [25]. Data obtained from https://app.dimensions.ai at the end of 2020 show that the number of deepfake papers has increased significantly in recent years (Fig. 1). Although the obtained numbers of deepfake papers may be lower than the actual numbers, the research trend of this topic is obviously increasing.

Fig. 1. Number of papers related to deepfakes from 2016 to 2020, obtained from https://app.dimensions.ai at the end of 2020 with the search keyword "deepfake" applied to the full text of scholarly papers. The numbers of such papers in 2018, 2019 and 2020 are 64, 368 and 1268, respectively.

This paper presents a survey of methods for creating as well as detecting deepfakes. Although there are existing survey papers on this topic [26]–[28], we carry out our survey with a different perspective and taxonomy. In Section II, we present the principles of deepfake algorithms and how deep learning has been used to enable such disruptive technologies. Section III reviews different methods for detecting deepfakes as well as their advantages and disadvantages. We discuss challenges, research trends and directions on deepfake detection and multimedia forensics problems in Section IV.
II. DEEPFAKE CREATION

Deepfakes have become popular due to the quality of the tampered videos and also the ease of use of their applications for a wide range of users with various computer skills, from professional to novice. These applications are mostly developed based on deep learning techniques. Deep learning is well known for its capability of representing complex, high-dimensional data. One variant of deep networks with that capability is the deep autoencoder, which has been widely applied for dimensionality reduction and image compression [29]–[31]. The first attempt at deepfake creation was
FakeApp, developed by a Reddit user using an autoencoder-decoder pairing structure [32], [33]. In that method, the autoencoder extracts latent features of face images and the decoder is used to reconstruct the face images.
To swap faces between source images and target images, two encoder-decoder pairs are needed, where each pair is trained on an image set of one face and the encoder's parameters are shared between the two network pairs. In other words, the two pairs use the same encoder network. This strategy enables the common encoder to find and learn the similarity between the two sets of face images, which is relatively unchallenging because faces normally have similar features such as the positions of the eyes, nose and mouth. Fig. 2 shows a deepfake creation process where the feature set of face A is connected to decoder B to reconstruct face B from the original face A. This approach is applied in several works such as DeepFaceLab [34], DFaker [35] and DeepFake_tf (TensorFlow-based deepfakes) [36].

Fig. 2. A deepfake creation model using two encoder-decoder pairs. The two networks use the same encoder but different decoders for the training process (top). An image of face A is encoded with the common encoder and decoded with decoder B to create a deepfake (bottom).
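For readers who want to see the mechanics of this shared-encoder setup, below is a minimal PyTorch sketch of the training loop. It illustrates the principle rather than the implementation of any particular tool: the layer sizes, optimizer settings and the random `faces_a`/`faces_b` tensors (standing in for aligned face crops of two identities) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """One shared encoder paired with a per-identity decoder."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, x):
        return self.decoder(self.encoder(x))

def make_encoder():
    # 64x64 RGB input -> compact latent feature map.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
    )

def make_decoder():
    # Latent feature map -> reconstructed 64x64 RGB face.
    return nn.Sequential(
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
    )

encoder = make_encoder()                       # shared between both identities
net_a = Autoencoder(encoder, make_decoder())   # trained on faces of A
net_b = Autoencoder(encoder, make_decoder())   # trained on faces of B

opt = torch.optim.Adam(
    list(encoder.parameters())
    + list(net_a.decoder.parameters())
    + list(net_b.decoder.parameters()), lr=1e-4)
loss_fn = nn.L1Loss()

# Illustrative stand-ins for real, aligned face crops of each identity.
faces_a = torch.rand(8, 3, 64, 64)
faces_b = torch.rand(8, 3, 64, 64)

for step in range(100):
    # Each decoder learns to reconstruct its own identity from the
    # shared latent space learned by the common encoder.
    loss = loss_fn(net_a(faces_a), faces_a) + loss_fn(net_b(faces_b), faces_b)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Face swap at inference: encode a face of A, decode with B's decoder,
# yielding B's appearance with A's expression and pose.
with torch.no_grad():
    fake_b = net_b(faces_a)
```

Because the encoder never knows which identity it is encoding, it is pushed to capture identity-agnostic attributes (pose, expression, illumination), while each decoder re-imposes its own identity; this is exactly what makes the cross-decoding swap possible.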
By adding adversarial loss and perceptual loss implemented in VGGFace [37] to the encoder-decoder architecture, an improved version of deepfakes based on the generative adversarial network (GAN) [38], namely faceswap-GAN, was proposed in [39]. The VGGFace perceptual loss is added to make eye movements more realistic and consistent with the input faces and to help smooth out artifacts in the segmentation mask, leading to higher-quality output videos. This model supports outputs at 64x64, 128x128 and 256x256 resolutions. In addition, the multi-task convolutional neural network (CNN) from the FaceNet implementation [40] is introduced to make face detection more stable and face alignment more reliable. CycleGAN [41] is utilized for the generative network implementation. Popular deepfake tools and their typical features are summarized in Table I.

TABLE I
SUMMARY OF NOTABLE DEEPFAKE TOOLS

Faceswap (https://github.com/deepfakes/faceswap)
- Uses two encoder-decoder pairs.
- Parameters of the encoder are shared.

Faceswap-GAN (https://github.com/shaoanlu/faceswap-GAN)
- Adversarial loss and perceptual loss (VGGFace) are added to an auto-encoder architecture.

Few-Shot Face Translation (https://github.com/shaoanlu/fewshot-face-translation-GAN)
- Uses a pre-trained face recognition model to extract latent embeddings for GAN processing.
- Incorporates semantic priors obtained by modules from FUNIT [42] and SPADE [43].

DeepFaceLab (https://github.com/iperov/DeepFaceLab)
- Expands the Faceswap method with new models, e.g. H64, H128, LIAEF128, SAE [44].
- Supports multiple face extraction modes, e.g. S3FD, MTCNN, dlib, or manual [44].

DFaker (https://github.com/dfaker/df)
- The DSSIM loss function [45] is used to reconstruct faces.
- Implemented based on the Keras library.

DeepFake_tf (https://github.com/StromWine/DeepFake_tf)
- Similar to DFaker but implemented based on TensorFlow.

AvatarMe (https://github.com/lattas/AvatarMe)
- Reconstructs 3D faces from arbitrary "in-the-wild" images.
- Can reconstruct authentic 4K-by-6K-resolution 3D faces from a single low-resolution image [46].

MarioNETte (https://hyperconnect.github.io/MarioNETte)
- A few-shot face reenactment framework that preserves the target identity.
- No additional fine-tuning phase is needed for identity adaptation [47].

DiscoFaceGAN (https://github.com/microsoft/DiscoFaceGAN)
- Generates face images of virtual people with independent latent variables of identity, expression, pose, and illumination.
- Embeds 3D priors into adversarial learning [48].

StyleRig (https://gvv.mpi-inf.mpg.de/projects/StyleRig)
- Creates portrait images of faces with rig-like control over a pretrained and fixed StyleGAN via 3D morphable face models.
- Self-supervised, without manual annotations [49].

FaceShifter (https://lingzhili.com/FaceShifterPage)
- High-fidelity face swapping by exploiting and integrating the target attributes.
- Can be applied to any new face pair without requiring subject-specific training [50].

FSGAN (https://github.com/YuvalNirkin/fsgan)
- A face swapping and reenactment model that can be applied to pairs of faces without requiring training on those faces.
- Adjusts to both pose and expression variations [51].

Transformable Bottleneck Networks (https://github.com/kyleolsz/TB-Networks)
- A method for fine-grained 3D manipulation of image content.
- Applies spatial transformations in CNN models using a transformable bottleneck framework [52].

"Do as I Do" Motion Transfer (https://github.com/carolineec/EverybodyDanceNow)
- Automatically transfers the motion of a source person to a target person by learning a video-to-video translation.
- Can create a motion-synchronized dancing video with multiple subjects [53].

Neural Voice Puppetry (https://justusthies.github.io/posts/neural-voice-puppetry)
- A method for audio-driven facial video synthesis.
- Synthesizes videos of a talking head from an audio sequence of another person using a 3D face representation [54].
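Returning to the faceswap-GAN objective described before Table I, the sketch below shows how such a compound generator loss can be assembled in PyTorch. The `discriminator` and `feature_extractor` arguments are placeholders for a trained discriminator and a frozen face recognition network (standing in for VGGFace), and the loss weights are illustrative assumptions, not the values used by faceswap-GAN.

```python
import torch
import torch.nn.functional as F

def generator_loss(fake, target, discriminator, feature_extractor,
                   w_rec=1.0, w_adv=0.1, w_per=0.1):
    """Combine reconstruction, adversarial and perceptual terms.

    `discriminator` maps an image batch to realness logits;
    `feature_extractor` is a frozen face-recognition network.
    Both are assumed to be provided by the caller.
    """
    # Pixel-level reconstruction term.
    l_rec = F.l1_loss(fake, target)

    # Non-saturating adversarial term: the generator is rewarded when
    # the discriminator labels its output as real.
    logits = discriminator(fake)
    l_adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # Perceptual term: distance between deep face-recognition features,
    # encouraging identity- and expression-consistent, artifact-free output.
    with torch.no_grad():
        feat_target = feature_extractor(target)
    l_per = F.mse_loss(feature_extractor(fake), feat_target)

    return w_rec * l_rec + w_adv * l_adv + w_per * l_per
```

The perceptual term is what penalizes outputs that are pixel-wise plausible but perceptually wrong (e.g., inconsistent gaze), which is the motivation given for adding the VGGFace loss.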
III. DEEPFAKE DETECTION

Deepfakes are increasingly detrimental to privacy, societal security and democracy [55]. Methods for detecting deepfakes were proposed as soon as this threat was introduced. Early attempts were based on handcrafted features obtained from artifacts and inconsistencies of the fake video synthesis process. Recent methods, on the other hand, apply deep learning to automatically extract salient and discriminative features for deepfake detection [56], [57].

Deepfake detection is normally deemed a binary classification problem in which classifiers distinguish between authentic and tampered videos. This kind of method requires a large database of real and fake videos to train classification models. The number of available fake videos is increasing, but it is still limited in terms of setting a benchmark for validating detection methods. To address this issue, Korshunov and Marcel [58] produced a notable deepfake dataset consisting of 620 videos, based on a GAN model using the open-source Faceswap-GAN code [39]. Videos from the publicly available VidTIMIT database [59] were used to generate low- and high-quality deepfake videos that effectively mimic facial expressions, mouth movements and eye blinking. These videos were then used to test various deepfake detection methods. The test results show that popular face recognition systems based on VGG [60] and FaceNet [40], [61] are unable to detect deepfakes effectively. Other methods, such as lip-syncing approaches [62]–[64] and image quality metrics with a support vector machine (SVM) [65], also produce very high error rates when applied to this newly produced dataset. This raises concerns about the critical need for more robust methods that can tell deepfakes apart from genuine content.

This section presents a survey of deepfake detection methods, which we group into two major categories: fake image detection methods and fake video detection methods (see Fig. 3). The latter is further divided into two smaller groups: methods using visual artifacts within a single video frame and methods using temporal features across frames. Whilst most of the methods based on temporal features use deep learning recurrent classification models, methods using visual artifacts within a video frame can be implemented with either deep or shallow classifiers.

Fig. 3. Categories of reviewed papers relevant to deepfake detection methods, divided into two major groups, i.e. fake image detection and fake video detection.

A. Fake Image Detection

Face swapping has a number of compelling applications in video compositing, transfiguration in portraits, and especially in identity protection, as it can replace faces in photographs with faces from a collection of stock images. However, it is also one of the techniques cyber attackers employ to penetrate identification or authentication systems to gain illegitimate access. The use of deep learning such as CNNs and GANs has made swapped face images more challenging for forensics models, as these methods can preserve the pose, facial expression and lighting of the photographs [66]. Zhang et al. [67] used the bag-of-words method to extract a set of compact features and fed them into various classifiers such as SVM [68], random forest (RF) [69] and multi-layer perceptrons (MLP) [70] to discriminate swapped face images from genuine ones. Among deep learning-generated images, those synthesised by GAN models are probably the most difficult to detect, as they are realistic and of high quality thanks to GANs' capability to learn the distribution of complex input data and generate new outputs with a similar distribution.
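A schematic version of such a bag-of-words pipeline is sketched below using OpenCV and scikit-learn: local descriptors are quantized against a learned visual vocabulary, and the resulting histograms are fed to an SVM. The choice of ORB descriptors, the vocabulary size and the assumed `train_paths`/`labels` inputs are illustrative; [67] may use different descriptors and settings.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def orb_descriptors(image_paths):
    """Collect local ORB descriptors from grayscale face images."""
    orb = cv2.ORB_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 32)))
    return per_image

def bow_histograms(per_image, vocab):
    """Quantize each image's descriptors against the visual vocabulary
    and return one normalized word histogram per image."""
    hists = []
    for desc in per_image:
        hist = np.zeros(vocab.n_clusters)
        if len(desc):
            for w in vocab.predict(desc.astype(np.float32)):
                hist[w] += 1
            hist /= hist.sum()
        hists.append(hist)
    return np.array(hists)

# `train_paths` and `labels` (1 = swapped face, 0 = genuine) are assumed
# inputs provided by the caller.
train_desc = orb_descriptors(train_paths)
vocab = KMeans(n_clusters=64).fit(
    np.vstack(train_desc).astype(np.float32))   # the visual vocabulary
clf = SVC(kernel="rbf").fit(bow_histograms(train_desc, vocab), labels)
```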
Most works on the detection of GAN-generated images, however, do not consider the generalization capability of the detection models, even though the development of GANs is ongoing and many new extensions of GAN are frequently introduced. Xuan et al. [71] used an image preprocessing step, e.g. Gaussian blur and Gaussian noise, to remove low-level, high-frequency clues from GAN images. This increases the pixel-level statistical similarity between real and fake images and forces the forensic classifier to learn more intrinsic, meaningful features, which generalize better than previous image forensics methods [72], [73] or image steganalysis networks [74].
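A minimal version of this preprocessing idea is sketched below; the blur and noise parameters are illustrative assumptions rather than the settings used in [71].

```python
import numpy as np
import cv2

def smooth_clues(image, blur_prob=0.5, noise_prob=0.5,
                 sigma=3.0, noise_std=5.0):
    """Randomly blur and/or add Gaussian noise to an 8-bit RGB image.

    Destroying the low-level, high-frequency GAN fingerprints forces a
    downstream classifier to rely on more intrinsic differences between
    real and fake images. Probabilities and strengths are illustrative.
    """
    out = image.astype(np.float32)
    if np.random.rand() < blur_prob:
        out = cv2.GaussianBlur(out, (5, 5), sigma)
    if np.random.rand() < noise_prob:
        out += np.random.normal(0.0, noise_std, out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applied as a training-time augmentation to both real and fake images, this makes the two classes statistically closer at the pixel level, which is precisely what pushes the classifier toward more generalizable features.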
On the other hand, Agarwal and Varshney [75] cast GAN-based deepfake detection as a hypothesis testing problem, introducing a statistical framework based on the information-theoretic study of authentication [76]. The minimum distance between the distribution of legitimate images and the distribution of images generated by a particular GAN, namely the oracle error, is defined. The analytic results show that this distance increases when the GAN is less accurate, in which case it is easier to detect deepfakes. For high-resolution image inputs, an extremely accurate GAN is required to generate fake images that are hard to detect.

Recently, Hsu et al. [77] introduced a two-phase deep learning method for the detection of deepfake images. The first phase is a feature extractor based on the common fake feature network (CFFN), which uses the Siamese network architecture presented in [78]. The CFFN encompasses several dense units, each including a different number of dense blocks [79], to improve its capability to represent fake images. The number of dense units is three or five, depending on whether the validation data are face or general images, and the number of channels in each unit varies up to a few hundred. Discriminative features between fake and real images, i.e. pairwise information, are extracted through the CFFN learning process. These features are then fed into the second phase, a small CNN concatenated to the last convolutional layer of the CFFN, to distinguish deceptive images from genuine ones. The method is validated for both fake face and fake general image detection. On the one hand, the face dataset is obtained from CelebA [80], containing 10,177 identities and 202,599 aligned face images with various poses and background clutter. Five GAN variants are used to generate fake images of size 64x64: deep convolutional GAN (DCGAN) [81], Wasserstein GAN (WGAN) [82], WGAN with gradient penalty (WGAN-GP) [83], least squares GAN [84], and progressive growth of GAN (PGGAN) [85]. A total of 385,198 training images and 10,000 test images of both real and fake types are obtained for validation. On the other hand, the general dataset is extracted from ILSVRC12 [86]. The large-scale GAN training model for high-fidelity natural image synthesis (BIGGAN) [87], self-attention GAN [88] and spectral normalization GAN [89] are used to generate fake images of size 128x128. The training set consists of 600,000 fake and real images, whilst the test set includes 10,000 images of both types. Experimental results show the superior performance of the proposed method against competing methods such as those introduced in [90]–[93].

B. Fake Video Detection

Most image detection methods cannot be used for videos because of the strong degradation of the frame data after video compression [94]. Furthermore, videos have temporal characteristics that vary across sets of frames and are thus challenging for methods designed to detect only still fake images. This subsection focuses on deepfake video detection methods and categorizes them into two smaller groups: methods that employ temporal features and those that explore visual artifacts within frames.

1) Temporal Features across Video Frames: Based on the observation that temporal coherence is not enforced effectively in the synthesis process of deepfakes, Sabir et al. [95] leveraged spatio-temporal features of video streams to detect deepfakes. Video manipulation is carried out on a frame-by-frame basis, so low-level artifacts produced by face manipulations are believed to further manifest themselves as temporal artifacts with inconsistencies across frames. A recurrent convolutional network (RCN) was proposed based on the integration of the convolutional network DenseNet [79] and gated recurrent unit cells [96] to exploit temporal discrepancies across frames (see Fig. 4). The method was tested on the FaceForensics++ dataset, which includes 1,000 videos [97], and shows promising results.

Fig. 4. A two-step process for face manipulation detection, where the preprocessing step detects, crops and aligns faces in a sequence of frames, and the second step distinguishes manipulated from authentic face images by combining a convolutional neural network (CNN) and a recurrent neural network (RNN) [95].
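The sketch below shows the general shape of such a recurrent convolutional detector in PyTorch: a per-frame CNN produces features that a GRU aggregates over the frame sequence. The tiny backbone and layer sizes are illustrative stand-ins; the actual model in [95] uses DenseNet features.

```python
import torch
import torch.nn as nn

class FrameSequenceDetector(nn.Module):
    """Per-frame CNN features fed to a GRU, in the spirit of a
    recurrent convolutional detector; all sizes are illustrative."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(              # tiny stand-in backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)       # real-vs-fake logit

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        # Fold time into the batch for the CNN, then unfold for the GRU.
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, h_n = self.gru(feats)               # final hidden state
        return self.head(h_n[-1])              # (B, 1)

# Example: a batch of two 16-frame clips of 64x64 faces.
logit = FrameSequenceDetector()(torch.rand(2, 16, 3, 64, 64))
```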
Likewise, Guera and Delp [98] highlighted that deepfake videos contain intra-frame inconsistencies and temporal inconsistencies between frames. They then proposed a temporal-aware pipeline that uses a CNN and long short-term memory (LSTM) to detect deepfake videos. The CNN is employed to extract frame-level features, which are then fed into the LSTM to create a temporal sequence descriptor. A fully-connected network is finally used to classify doctored videos from real ones based on the sequence descriptor, as illustrated in Fig. 5.

Fig. 5. A deepfake detection method using a convolutional neural network (CNN) and long short-term memory (LSTM) to extract temporal features of a given video sequence, represented via the sequence descriptor. The detection network, consisting of fully-connected layers, takes the sequence descriptor as input and calculates the probability of the frame sequence belonging to either the authentic or the deepfake class [98].

On the other hand, the use of a physiological signal, eye blinking, to detect deepfakes was proposed in [99], based on the observation that a person in a deepfake blinks far less frequently than in untampered videos. A healthy adult human normally blinks once every 2 to 10 seconds, and each blink takes 0.1 to 0.4 seconds. Deepfake algorithms, however, often use face images available online for training, and these mostly show people with open eyes; very few images published on the internet show people with closed eyes. Thus, without access to images of people blinking, deepfake algorithms cannot generate fake faces that blink normally. In other words, blinking rates in deepfakes are much lower than those in normal videos. To discriminate real from fake videos, Li et al. [99] first decompose the videos into frames, from which face regions and then eye areas are extracted based on six eye landmarks. After a few pre-processing steps, such as aligning faces and extracting and scaling the bounding boxes of the eye landmark points to create new sequences of frames, the cropped eye-area sequences are fed into long-term recurrent convolutional networks (LRCN) [100] for dynamic state prediction. The LRCN consists of a CNN-based feature extractor, sequence learning based on long short-term memory (LSTM), and state prediction based on a fully connected layer that predicts the probability of the eye being open or closed. Eye blinking shows strong temporal dependencies, and the LSTM helps to capture these temporal patterns effectively. The blinking rate is calculated from the prediction results, where a blink is defined as a peak above the threshold of 0.5 with a duration of less than 7 frames. The method is evaluated on a dataset collected from the web, consisting of 49 interview and presentation videos and their corresponding fake videos generated by deepfake algorithms. The experimental results indicate promising performance in detecting fake videos, which could be further improved by considering the dynamic pattern of blinking, e.g. highly frequent blinking may also be a sign of tampering.
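Li et al. learn the open/closed eye state with an LRCN. As a simpler illustration of the same blink-rate signal, the sketch below uses the classical eye-aspect-ratio (EAR) heuristic computed from the six eye landmarks, assuming the landmark sequences are provided by an external face-landmark detector; the thresholds are illustrative, and this is not the LRCN pipeline of [99].

```python
import numpy as np

def eye_aspect_ratio(eye):
    """`eye` is a (6, 2) array of landmarks around one eye.
    The ratio drops sharply when the eyelid closes."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

def blink_rate(eye_sequence, fps, closed_thresh=0.2):
    """Count blinks as runs of consecutive low-EAR frames and return
    blinks per second; the threshold is an illustrative assumption."""
    ears = np.array([eye_aspect_ratio(e) for e in eye_sequence])
    closed = ears < closed_thresh
    # A blink starts wherever `closed` flips from False to True.
    blinks = np.count_nonzero(closed[1:] & ~closed[:-1])
    return blinks * fps / len(ears)
```

Comparing the measured blink rate of a suspect video against the normal human range quoted above gives a crude physiological-plausibility check of the kind the LRCN formalizes.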
The proposed method is evaluated and scaling the bounding boxes of eye landmark points on two deepfake datasets, namely the UADFV and to create new sequences of frames, these cropped eye DeepfakeTIMIT. The UADFV dataset [104] contains 49 area sequences are distributed into long-term recurrent real videos and 49 fake videos with 32,752 frames in convolutional networks (LRCN) [100] for dynamic state total. The DeepfakeTIMIT dataset [64] includes a set of prediction. The LRCN consists of a feature extractor low quality videos of 64 x 64 size and another set of high based on CNN, a sequence learning based on long short quality videos of 128 x 128 with totally 10,537 pristine term memory (LSTM), and a state prediction based on images and 34,023 fabricated images extracted from 320 a fully connected layer to predict probability of eye videos for each quality set. Performance of the proposed open and close state. The eye blinking shows strong method is compared with other prevalent methods such temporal dependencies and thus the implementation of as two deepfake detection MesoNet methods, i.e. Meso- LSTM helps to capture these temporal patterns effec- 4 and MesoInception-4 [94], HeadPose [104], and the tively. The blinking rate is calculated based on the face tampering detection method two-stream NN [105]. prediction results where a blink is defined as a peak Advantage of the proposed method is that it needs not to generate deepfake videos as negative examples before training the detection models. Instead, the negative ex- amples are generated dynamically by extracting the face region of the original image and aligning it into multiple scales before applying Gaussian blur to a scaled image of random pick and warping back to the original image.
Nguyen et al. [106] proposed the use of capsule networks for detecting manipulated images and videos. The capsule network was initially introduced to address limitations of CNNs when applied to inverse-graphics tasks, which aim to find the physical processes used to produce images of the world [107]. The recent development of capsule networks based on a dynamic routing algorithm [108] demonstrates their ability to describe hierarchical pose relationships between object parts. This development is employed as a component of a pipeline for detecting fabricated images and videos, as demonstrated in Fig. 6. A dynamic routing algorithm routes the outputs of the three capsules to the output capsules through a number of iterations to separate fake from real images. The method is evaluated on four datasets covering a wide range of forged image and video attacks: the well-known Idiap Research Institute replay-attack dataset [109], the deepfake face-swapping dataset created by Afchar et al. [94], the facial reenactment FaceForensics dataset [110] produced by the Face2Face method [111], and the fully computer-generated image dataset of Rahmouni et al. [112]. The method yields the best performance compared to its competitors on all of these datasets. This shows the potential of the capsule network for building a general detection system that works effectively against various forged image and video attacks.

Fig. 6. The capsule network takes features obtained from the VGG-19 network [101] to distinguish fake images or videos from real ones (top). The pre-processing step detects the face region and scales it to a size of 128x128 before VGG-19 is used to extract latent features for the capsule network, which comprises three primary capsules and two output capsules, one for real and one for fake images (bottom). Statistical pooling constitutes an important part of the capsule network that deals with forgery detection [106].
b) Shallow classifiers: Deepfake detection methods mostly rely on artifacts or inconsistencies of intrinsic features between fake and real images or videos. Yang et al. [104] proposed a detection method that observes differences between 3D head poses, comprising head orientation and position, estimated from the 68 facial landmarks of the whole face versus the central face region only. The 3D head poses are examined because of a shortcoming in the deepfake face generation pipeline: splicing a synthesized central face onto an original face makes the two pose estimates disagree. The extracted features are fed into an SVM classifier to obtain the detection results. Experiments on two datasets show the strong performance of this approach against competing methods. The first dataset, namely UADFV, consists of 49 deepfake videos and their respective real videos [104]. The second dataset comprises 241 real images and 252 deepfake images, a subset of the data used in the DARPA MediFor GAN Image/Video Challenge [113]. Likewise, a method to exploit artifacts of deepfakes and face manipulations based on visual features of the eyes, teeth and facial contours was studied in [114]. The visual artifacts arise from a lack of global consistency, wrong or imprecise estimation of the incident illumination, or imprecise estimation of the underlying geometry. For deepfake detection, missing reflections and missing details in the eye and teeth areas are exploited, along with texture features extracted from the facial region based on facial landmarks. Accordingly, an eye feature vector, a teeth feature vector and features extracted from the full-face crop are used. After the features are extracted, two classifiers, logistic regression and a small neural network, are employed to classify deepfakes against real videos. Experiments carried out on a video dataset downloaded from YouTube show a best result of 0.851 in terms of the area under the receiver operating characteristic curve. The method however has the disadvantage of requiring images that meet certain prerequisites, such as open eyes or visible teeth.
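Returning to the head-pose cue of Yang et al. [104], the sketch below outlines one plausible realization: estimate a rotation vector from the whole-face landmarks and from the central landmarks via PnP, use their difference as the feature, and classify with an SVM. The `model_3d` reference points, camera matrix and the `features`/`labels` arrays are assumed inputs, and this is a reconstruction of the spirit of [104], not its exact pipeline.

```python
import numpy as np
import cv2
from sklearn.svm import SVC

def head_pose(landmarks_2d, model_3d, camera_matrix):
    """Estimate a rotation vector from 2D landmarks and the matching
    3D reference points via PnP; `model_3d` is a generic 3D face model
    (float32, shape (N, 3)) supplied by the caller."""
    ok, rvec, _ = cv2.solvePnP(model_3d, landmarks_2d, camera_matrix, None)
    return rvec.ravel() if ok else np.zeros(3)

def pose_difference_feature(all_2d, central_2d, model_all, model_central, cam):
    """Feature in the spirit of [104]: the discrepancy between the pose
    estimated from the whole face and from the central region only.
    Splicing a synthesized central face tends to make these disagree."""
    return (head_pose(all_2d, model_all, cam)
            - head_pose(central_2d, model_central, cam))

# `features` stacked over many frames, `labels` with 1 = fake, 0 = real
# (both assumed inputs).
clf = SVC(kernel="rbf").fit(features, labels)
scores = clf.decision_function(features)
```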
The use of photo response non-uniformity (PRNU) analysis to distinguish deepfakes from authentic videos was proposed in [115]. PRNU is a component of sensor pattern noise attributed to manufacturing imperfections of silicon wafers and the inconsistent sensitivity of pixels to light caused by variations in the physical characteristics of the wafers. When a photo is taken, the sensor imperfection is introduced into the high-frequency bands of the content in the form of invisible noise. Because the imperfection is not uniform across a silicon wafer, even sensors made from the same wafer produce unique PRNU. PRNU is therefore often considered the fingerprint that a digital camera leaves in its images [116]. The analysis is widely used in image forensics [117]–[120] and is advocated in [115] because a swapped face is supposed to alter the local PRNU pattern in the facial area of video frames. The videos are converted into frames, which are cropped to the questioned facial region. The cropped frames are then separated sequentially into eight groups, and an average PRNU pattern is computed for each group. Normalised cross-correlation scores are calculated to compare the PRNU patterns among these groups. The authors in [115] created a test dataset consisting of 10 authentic videos and 16 manipulated videos, where the fake videos were produced from the genuine ones with the DeepFaceLab tool [34]. The analysis shows a significant statistical difference in mean normalised cross-correlation scores between deepfakes and genuine videos. It therefore suggests that PRNU has potential for deepfake detection, although a larger dataset would need to be tested.
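The comparison protocol can be illustrated as below. The Gaussian-blur residual is a crude stand-in for the wavelet-based denoising used in real PRNU extraction, and equal-sized grayscale face crops are assumed.

```python
import numpy as np
import cv2

def noise_residual(gray):
    """Crude noise residual: image minus its denoised version. Real PRNU
    pipelines use wavelet-based denoising; this is only illustrative."""
    denoised = cv2.GaussianBlur(gray, (3, 3), 0)
    return gray.astype(np.float64) - denoised.astype(np.float64)

def normalized_cross_correlation(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum()
                 / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def group_prnu_scores(face_crops, n_groups=8):
    """Split a video's face crops into sequential groups, average the
    residual per group, and score all group pairs, mirroring the
    eight-group comparison protocol described above."""
    groups = np.array_split(np.asarray(face_crops, dtype=np.float64),
                            n_groups)
    patterns = [np.mean([noise_residual(f) for f in g], axis=0)
                for g in groups]
    return [normalized_cross_correlation(p, q)
            for i, p in enumerate(patterns) for q in patterns[i + 1:]]
```

Low average pairwise correlation within a video would indicate that the facial region does not carry a consistent sensor fingerprint, the signature that [115] associates with face swapping.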
When viewing a suspicious video or image, users normally want to search for its origin; however, no feasible tool currently exists for this. Hasan and Salah [121] proposed the use of blockchain and smart contracts to help users detect deepfake videos, based on the assumption that videos are only real when their sources are traceable. Each video is associated with a smart contract that links to its parent video, and each parent video links to its children in a hierarchical structure. Through this chain, users can credibly trace back to the original smart contract associated with the pristine video, even if the video has been copied multiple times. An important attribute of the smart contract is the unique hash of the interplanetary file system, which is used to store the video and its metadata in a decentralized, content-addressable manner [122]. The smart contract's key features and functionalities are tested against several common security challenges, such as distributed denial of service, replay and man-in-the-middle attacks, to ensure the solution meets security requirements. The approach is generic and can be extended to other types of digital content, e.g., images, audio and manuscripts.

IV. DISCUSSIONS AND FUTURE RESEARCH DIRECTIONS

Deepfakes have begun to erode people's trust in media content, as seeing it is no longer commensurate with believing in it. They can cause distress and negative effects on those targeted, heighten disinformation and hate speech, and even stimulate political tension, inflame the public, or incite violence or war. This is especially critical nowadays, as the technologies for creating deepfakes are increasingly accessible and social media platforms can spread fake content quickly [123]. Sometimes deepfakes do not need to reach a massive audience to cause detrimental effects. People who create deepfakes for malicious purposes only need to deliver them to their target audiences as part of a sabotage strategy, without using social media. For example, this approach can be utilized by intelligence services trying to influence decisions made by important people such as politicians, leading to national and international security threats [124]. In response to this alarming problem, the research community has focused on developing deepfake detection algorithms, and numerous results have been reported. This paper has reviewed the state-of-the-art methods, and a summary of typical approaches is provided in Table II. It is noticeable that a battle is growing between those who use advanced machine learning to create deepfakes and those who make the effort to detect them.

The quality of deepfakes has been increasing, and the performance of detection methods needs to improve accordingly. The inspiration is that what AI has broken can be fixed by AI as well [125]. Detection methods are still at an early stage: various methods have been proposed and evaluated, but on fragmented datasets. One approach to improving the performance of detection methods is to create a growing, regularly updated benchmark dataset of deepfakes to validate their ongoing development. This would facilitate the training of detection models, especially those based on deep learning, which require large training sets [126].

On the other hand, current detection methods mostly focus on the drawbacks of deepfake generation pipelines, i.e. finding the weaknesses of the competitor in order to attack them. This kind of information and knowledge is not always available in adversarial environments, where attackers commonly attempt not to reveal their deepfake creation technologies. Recent works on adversarial perturbation attacks that fool DNN-based detectors make the deepfake detection task even more difficult [127]–[131]. These are real challenges for detection method development, and future research needs to focus on introducing more robust, scalable and generalizable methods.

Another research direction is to integrate detection methods into distribution platforms, such as social media, to increase their effectiveness in dealing with the widespread impact of deepfakes. A screening or filtering mechanism using effective detection methods could be implemented on these platforms to ease deepfake detection [124]. Legal requirements can be placed on the tech companies that own these platforms to remove deepfakes quickly and reduce their impact. In addition, watermarking tools can be integrated into the devices people use to create digital content, producing immutable metadata that stores originality details, such as the time and location of multimedia content, as well as its untampered attestation [124]. This integration is difficult to implement, but a solution could be the use of the disruptive blockchain technology. Blockchain has been used effectively in many areas, yet very few studies so far address the deepfake detection problem with this technology. As it can create a chain of unique, unchangeable blocks of metadata, it is a great tool for digital provenance. The integration of blockchain technologies into this problem has demonstrated certain results [121], but this research direction is far from mature.
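To illustrate the provenance idea in a few lines, the sketch below builds an in-memory hash chain in which each record stores the content hash of a video and the hash of its parent record. It is a didactic stand-in for the on-chain smart contracts of [121], not a blockchain implementation; the field names and the `ledger` dictionary are assumptions.

```python
import hashlib
import json
import time

def content_hash(video_bytes):
    """Content address of the video itself (IPFS uses a similar idea)."""
    return hashlib.sha256(video_bytes).hexdigest()

def make_record(video_bytes, parent_record_hash=None, creator="unknown"):
    """One link in a provenance chain, loosely mirroring the
    parent-child contract structure of [121]. Returns the record's own
    hash (its identifier) and the record itself."""
    record = {
        "video_hash": content_hash(video_bytes),
        "parent": parent_record_hash,   # None for the pristine original
        "creator": creator,
        "timestamp": time.time(),
    }
    record_bytes = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(record_bytes).hexdigest(), record

def trace_to_origin(record_hash, ledger):
    """Walk parent links back to the original record; any edit to a
    record changes its hash and breaks the chain, which is what makes
    the lineage tamper-evident."""
    chain = []
    while record_hash is not None:
        record = ledger[record_hash]
        chain.append(record)
        record_hash = record["parent"]
    return chain
```

A video whose record cannot be traced back to a trusted original, or whose content hash does not match the stored one, would fail this provenance check.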
TABLE II
SUMMARY OF PROMINENT DEEPFAKE DETECTION METHODS

Eye blinking [99]
- Classifiers/techniques: LRCN.
- Key features: uses LRCN to learn the temporal patterns of eye blinking; based on the observation that the blinking frequency in deepfakes is much lower than normal.
- Deals with: videos.
- Datasets: 49 interview and presentation videos, and their corresponding generated deepfakes.

Intra-frame and temporal inconsistencies [98]
- Classifiers/techniques: CNN and LSTM.
- Key features: a CNN extracts frame-level features, which are passed to an LSTM to construct a sequence descriptor used for classification.
- Deals with: videos.
- Datasets: a collection of 600 videos obtained from multiple websites.

Face warping artifacts [103]
- Classifiers/techniques: VGG16 [101], ResNet50, ResNet101 or ResNet152 [102].
- Key features: artifacts are discovered using CNN models based on the resolution inconsistency between the warped face area and the surrounding context.
- Deals with: videos.
- Datasets: UADFV [104], containing 49 real and 49 fake videos with 32,752 frames in total; DeepfakeTIMIT [64].

MesoNet [94]
- Classifiers/techniques: CNN.
- Key features: two deep networks, Meso-4 and MesoInception-4, examine deepfake videos at a mesoscopic level of analysis; accuracies on the deepfake and FaceForensics datasets are 98% and 95%, respectively.
- Deals with: videos.
- Datasets: a deepfake dataset constituted from online videos, and the FaceForensics dataset created by the Face2Face approach [111].

Capsule-forensics [106]
- Classifiers/techniques: capsule networks.
- Key features: latent features extracted by the VGG-19 network [101] are fed into a capsule network for classification; a dynamic routing algorithm [108] routes the outputs of three convolutional capsules to two output capsules, one for fake and one for real images, over a number of iterations.
- Deals with: videos/images.
- Datasets: the Idiap Research Institute replay-attack dataset [109], the deepfake face-swapping dataset of [94], the facial reenactment FaceForensics dataset [110], and a fully computer-generated image set using [112].

Head poses [104]
- Classifiers/techniques: SVM.
- Key features: features are extracted using 68 landmarks of the face region; an SVM classifies the extracted features.
- Deals with: videos/images.
- Datasets: UADFV, consisting of 49 deepfake videos and their respective real videos; 241 real and 252 deepfake images from the DARPA MediFor GAN Image/Video Challenge.

Eye, teeth and facial texture [114]
- Classifiers/techniques: logistic regression and neural network.
- Key features: exploits facial texture differences and missing reflections and details in the eye and teeth areas of deepfakes; logistic regression and a small neural network are used for classification.
- Deals with: videos.
- Datasets: a video dataset downloaded from YouTube.

Spatio-temporal features with RCN [95]
- Classifiers/techniques: RCN.
- Key features: temporal discrepancies across frames are explored using an RCN that integrates the DenseNet convolutional network [79] and gated recurrent unit cells [96].
- Deals with: videos.
- Datasets: the FaceForensics++ dataset, including 1,000 videos [97].

Spatio-temporal features with LSTM [140]
- Classifiers/techniques: convolutional bidirectional recurrent LSTM network.
- Key features: an XceptionNet CNN is used for facial feature extraction, while audio embeddings are obtained by stacking multiple convolution modules; two loss functions, cross-entropy and Kullback-Leibler divergence, are used.
- Deals with: videos.
- Datasets: FaceForensics++ [97] and Celeb-DF (5,639 deepfake videos) [141], plus the ASVSpoof 2019 Logical Access audio dataset [142].

Analysis of PRNU [115]
- Classifiers/techniques: PRNU.
- Key features: analyses the noise patterns of the light-sensitive sensors of digital cameras due to their factory defects; explores the differences in PRNU patterns between authentic and deepfake videos, because face swapping is believed to alter the local PRNU patterns.
- Deals with: videos.
- Datasets: created by the authors, including 10 authentic and 16 deepfake videos made with DeepFaceLab [34].

Phoneme-viseme mismatches [133]
- Classifiers/techniques: CNN.
- Key features: exploits the mismatches between the dynamics of the mouth shape, i.e. visemes, and a spoken phoneme; focuses on sounds associated with the M, B and P phonemes, as they require complete mouth closure, which deepfakes often synthesize incorrectly.
- Deals with: videos.
- Datasets: four in-the-wild lip-sync deepfakes from Instagram and YouTube (www.instagram.com/bill_posters_uk and youtu.be/VWMEDacz3L4), and others created using synthesis techniques, i.e. Audio-to-Video (A2V) [63] and Text-to-Video (T2V) [134].

Attribution-based confidence (ABC) metric [136]
- Classifiers/techniques: ResNet50 [102], pre-trained on VGGFace2 [135].
- Key features: the ABC metric [137] is used to detect deepfake videos without access to the training data; ABC values obtained for original videos are greater than 0.94, while deepfakes yield low ABC values.
- Deals with: videos.
- Datasets: VidTIMIT and two other original video datasets obtained from COHFACE (https://www.idiap.ch/dataset/cohface) and from YouTube; the COHFACE [138] and YouTube datasets are used to generate two deepfake datasets via a commercial website (https://deepfakesweb.com), and another deepfake dataset is DeepfakeTIMIT [139].

Emotion audio-visual affective cues [143]
- Classifiers/techniques: Siamese network [78].
- Key features: modality and emotion embedding vectors for the face and speech are extracted for deepfake detection.
- Deals with: videos.
- Datasets: DeepfakeTIMIT [139] and DFDC [144].

Appearance and behaviour [145]
- Classifiers/techniques: rules defined based on facial expressions and head movements.
- Key features: a temporal, behavioural biometric based on facial expressions and head movements is learned using ResNet-101 [102], while a static facial biometric is obtained using VGG [60].
- Deals with: videos.
- Datasets: the world leaders dataset [1], FaceForensics++ [97], the Google/Jigsaw deepfake detection dataset [146], DFDC [144] and Celeb-DF [141].

FakeCatcher [147]
- Classifiers/techniques: CNN.
- Key features: extracts biological signals from portrait videos and uses them as an implicit descriptor of authenticity, because they are not spatially and temporally well preserved in deepfakes.
- Deals with: videos.
- Datasets: UADFV [104], FaceForensics [110], FaceForensics++ [97], Celeb-DF [141], and a new dataset of 142 videos, independent of the generative model, resolution, compression, content and context.

Preprocessing combined with a deep network [71]
- Classifiers/techniques: DCGAN, WGAN-GP and PGGAN.
- Key features: enhances the generalization ability of deep learning models for detecting GAN-generated images; removes low-level features of fake images and forces deep networks to focus more on pixel-level similarity between fake and real images.
- Deals with: images.
- Datasets: real images from CelebA-HQ [85], which includes high-quality face images at 1024x1024 resolution; fake images generated by DCGAN [81], WGAN-GP [83] and PGGAN [85].

Bag of words and shallow classifiers [67]
- Classifiers/techniques: SVM, RF, MLP.
- Key features: extracts discriminant features using the bag-of-words method and feeds them into SVM, RF and MLP classifiers for binary classification: innocent vs. fabricated.
- Deals with: images.
- Datasets: the well-known LFW face database [148], containing 13,233 images at 250x250 resolution.

Pairwise learning [77]
- Classifiers/techniques: CNN concatenated to CFFN.
- Key features: a two-phase procedure: feature extraction using the CFFN based on the Siamese network architecture [78], followed by classification using a CNN.
- Deals with: images.
- Datasets: face images: real ones from CelebA [80] and fake ones generated by DCGAN [81], WGAN [82], WGAN-GP [83], least squares GAN [84] and PGGAN [85]; general images: real ones from ILSVRC12 [86] and fake ones generated by BIGGAN [87], self-attention GAN [88] and spectral normalization GAN [89].

Defenses against adversarial perturbations in deepfakes [127]
- Key features: introduces adversarial perturbations to enhance deepfakes and fool deepfake detectors; improves the accuracy of deepfake detectors using Lipschitz regularization and deep image prior techniques.
- Deals with: images.
- Datasets: 5,000 real images from CelebA [80] and 5,000 fake images created by the "Few-Shot Face Translation GAN" method [149].

Analyzing convolutional traces [150]
- Classifiers/techniques: KNN, SVM and linear discriminant analysis.
- Key features: uses an expectation-maximization algorithm to extract local features pertaining to the convolutional generative process of GAN-based image deepfake generators.
- Deals with: images.
- Datasets: authentic images from CelebA and corresponding deepfakes created by five different GANs (group-wise deep whitening-and-coloring transformation, GDWCT [151], StarGAN [152], AttGAN [153], StyleGAN [154] and StyleGAN2 [155]).

Face X-ray [156]
- Classifiers/techniques: CNN.
- Key features: tries to locate the blending boundary between the target and original faces instead of capturing the artifacts of specific manipulations; can be trained without fake images.
- Deals with: images.
- Datasets: FaceForensics++ [97], DeepfakeDetection (DFD) [146], DFDC [144] and Celeb-DF [141].

Common artifacts of CNN-generated images [157]
- Classifiers/techniques: ResNet-50 [102] pre-trained on ImageNet [86].
- Key features: trains the classifier on a large number of fake images generated by a high-performing unconditional GAN model, i.e., ProGAN [85], and evaluates how well the classifier generalizes to other CNN-synthesized images.
- Deals with: images.
- Datasets: a new dataset of CNN-generated images, namely ForenSynths, consisting of synthesized images from 11 models, such as StyleGAN [154], super-resolution methods [158] and FaceForensics++ [97].

Using detection methods to spot deepfakes is crucial, but understanding the real intent of those publishing deepfakes is even more important. This requires users' judgement based on the social context in which a deepfake is discovered, e.g. who distributed it and what they said about it [132]. This is critical because deepfakes are becoming more and more photorealistic, and it is highly anticipated that detection software will lag behind deepfake creation technology. A study on the social context of deepfakes, to assist users in making such judgements, is thus worth conducting.

Videos and photographs have been widely used as evidence in police investigations and court cases. They may be introduced as evidence in a court of law by digital media forensics experts, who have a background in computing or law enforcement and experience in collecting, examining and analysing digital information. However, machine learning and AI technologies might have been used to modify this digital content, and the experts' opinions may then not suffice to authenticate such evidence, because even experts are unable to discern manipulated content. This aspect needs to be taken into account in courtrooms nowadays, when images and videos are used as evidence to convict perpetrators, because of the existence of a wide range of digital manipulation methods [159]. Digital media forensics results must therefore be proved valid and reliable before they can be used in court.
11 requires careful documentation for each step of the foren- [11] Guo, B., Ding, Y., Yao, L., Liang, Y., and Yu, Z. (2020). The sics process and how the results are reached. Machine future of false information detection on social media: new learning and AI algorithms can be used to support the perspectives and trends. ACM Computing Surveys (CSUR), determination of the authenticity of digital media and 53(4), 1-36. have obtained accurate and reliable results, e.g. [160], [161], but most of these algorithms are unexplainable. [12] Tucker, P. (2019, March 31). The newest AI-enabled This creates a huge hurdle for the applications of AI weapon: ‘Deep-Faking’ photos of the earth. Available at in forensics problems because not only the forensics https://www.defenseone.com/technology/2019/03/next-phase- experts oftentimes do not have expertise in computer ai-deep-faking-whole-world-and-china-ahead/155944/ algorithms, but the computer professionals also cannot explain the results properly as most of these algorithms [13] Fish, T. (2019, April 4). Deep fakes: AI-manipulated are black box models [162]. This is more critical as media will be ‘weaponised’ to trick military. Avail- the most recent models with the most accurate results able at https://www.express.co.uk/news/science/1109783/deep- are based on deep learning methods consisting of many fakes-ai-artificial-intelligence-photos-video-weaponised-china neural network parameters. Explainable AI in computer vision therefore is a research direction that is needed to [14] Marr, B. (2019, July 22). The best (and scariest) promote and utilize the advances and advantages of AI examples of AI-enabled deepfakes. Available at and machine learning in digital media forensics. https://www.forbes.com/sites/bernardmarr/2019/07/22/the- best-and-scariest-examples-of-ai-enabled-deepfakes/ REFERENCES [15] Zakharov, E., Shysheya, A., Burkov, E., and Lempitsky, V. [1] Agarwal, S., Farid, H., Gu, Y., He, M., Nagano, K., and Li, (2019). Few-shot adversarial learning of realistic neural talking H. (2019, June). Protecting world leaders against deep fakes. head models. arXiv preprint arXiv:1905.08233. In Computer Vision and Pattern Recognition Workshops (pp. 38-45). [16] Damiani, J. (2019, September 3). A voice deepfake was used to scam a CEO out of $243,000. Available at [2] Tewari, A., Zollhoefer, M., Bernard, F., Garrido, P., Kim, H., https://www.forbes.com/sites/jessedamiani/2019/09/03/a- Perez, P., and Theobalt, C. (2020). High-fidelity monocular voice-deepfake-was-used-to-scam-a-ceo-out-of-243000/ face reconstruction based on an unsupervised model-based face autoencoder. IEEE Transactions on Pattern Analysis and [17] Samuel, S. (2019, June 27). A guy made a deepfake app to Machine Intelligence, 42(2), 357-370. turn photos of women into nudes. It didn’t go well. Available at https://www.vox.com/2019/6/27/18761639/ai-deepfake- [3] Lin, J., Li, Y., & Yang, G. (2021). FPGAN: Face de- deepnude-app-nude-women-porn identification method with generative adversarial networks for social robots. Neural Networks, 133, 132-147. [18] The Guardian (2019, September 2). Chinese deepfake app Zao sparks privacy row after going viral. Available at [4] Liu, M. Y., Huang, X., Yu, J., Wang, T. C., & Mallya, A. https://www.theguardian.com/technology/2019/sep/02/chinese- (2021). Generative adversarial networks for image and video face-swap-app-zao-triggers-privacy-fears-viral synthesis: Algorithms and applications. Proceedings of the IEEE, DOI: 10.1109/JPROC.2021.3049196. [19] Lyu, S. 
(2020, July). Deepfake detection: current challenges and next steps. In IEEE International Conference on Multime- [5] Lyu, S. (2018, August 29). Detecting ‘deepfake’ dia and Expo Workshops (ICMEW) (pp. 1-6). IEEE. videos in the blink of an eye. Available at http://theconversation.com/detecting-deepfake-videos-in- [20] Guarnera, L., Giudice, O., Nastasi, C., and Battiato, S. (2020). the-blink-of-an-eye-101072 Preliminary forensics analysis of deepfake images. arXiv preprint arXiv:2004.12626. [6] Bloomberg (2018, September 11). How faking videos became easy and why that’s so scary. Available at [21] Jafar, M. T., Ababneh, M., Al-Zoube, M., and Elhassan, A. https://fortune.com/2018/09/11/deep-fakes-obama-video/ (2020, April). Forensics and analysis of deepfake videos. In The 11th International Conference on Information and [7] Chesney, R., and Citron, D. (2019). Deepfakes and the new Communication Systems (ICICS) (pp. 053-058). IEEE. disinformation war: The coming age of post-truth geopolitics. Foreign Affairs, 98, 147. [22] Trinh, L., Tsang, M., Rambhatla, S., and Liu, Y. (2020). Interpretable deepfake detection via dynamic prototypes. arXiv [8] Hwang, T. (2020). Deepfakes: A Grounded Threat Assessment. preprint arXiv:2006.15473. Centre for Security and Emerging Technologies, Georgetown University. [23] Younus, M. A., and Hasan, T. M. (2020, April). Effective and fast deepfake detection method based on Haar wavelet [9] Zhou, X., and Zafarani, R. (2020). A survey of fake transform. In 2020 International Conference on Computer news: fundamental theories, detection methods, and Science and Software Engineering (CSASE) (pp. 186-190). opportunities. ACM Computing Surveys (CSUR), DOI: IEEE. https://doi.org/10.1145/3395046. [24] Turek, M. (2019). Media Forensics (MediFor). Available at [10] Kaliyar, R. K., Goswami, A., and Narang, P. (2020). Deepfake: https://www.darpa.mil/program/media-forensics improving fake news detection using tensor decomposition based deep neural network. Journal of Supercomputing, DOI: [25] Schroepfer, M. (2019, September 5). Creating a https://doi.org/10.1007/s11227-020-03294-y. data set and a challenge for deepfakes. Available at https://ai.facebook.com/blog/deepfake-detection-challenge [26] Tolosana, R., Vera-Rodriguez, R., Fierrez, J., Morales, A., & Ortega-Garcia, J. (2020). Deepfakes and beyond: A survey of face manipulation and fake detection. Information Fusion, 64, 131-148. [27] Verdoliva, L. (2020). Media forensics and deepfakes: an overview. IEEE Journal of Selected Topics in Signal Process- ing, 14(5), 910-932. [28] Mirsky, Y., & Lee, W. (2021). The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1), 1-41. [29] Punnappurath, A., and Brown, M. S. (2019). Learning raw image reconstruction-aware deep image compressors. IEEE
[30] Cheng, Z., Sun, H., Takeuchi, M., and Katto, J. (2019). Energy compaction-based image compression using convolutional autoencoder. IEEE Transactions on Multimedia, DOI: 10.1109/TMM.2019.2938345.
[31] Chorowski, J., Weiss, R. J., Bengio, S., and Oord, A. V. D. (2019). Unsupervised speech representation learning using wavenet autoencoders. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(12), 2041-2053.
[32] Faceswap: Deepfakes software for all. Available at https://github.com/deepfakes/faceswap
[33] FakeApp 2.2.0. Available at https://www.malavida.com/en/soft/fakeapp/
[34] DeepFaceLab. Available at https://github.com/iperov/DeepFaceLab
[35] DFaker. Available at https://github.com/dfaker/df
[36] DeepFake_tf: Deepfake based on tensorflow. Available at https://github.com/StromWine/DeepFake_tf
[37] Keras-VGGFace: VGGFace implementation with Keras framework. Available at https://github.com/rcmalli/keras-vggface
[38] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672-2680).
[39] Faceswap-GAN. Available at https://github.com/shaoanlu/faceswap-GAN.
[40] FaceNet. Available at https://github.com/davidsandberg/facenet.
[41] CycleGAN. Available at https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.
[42] Liu, M. Y., Huang, X., Mallya, A., Karras, T., Aila, T., Lehtinen, J., and Kautz, J. (2019). Few-shot unsupervised image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 10551-10560).
[43] Park, T., Liu, M. Y., Wang, T. C., and Zhu, J. Y. (2019). Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2337-2346).
[44] DeepFaceLab: Explained and usage tutorial. Available at https://mrdeepfakes.com/forums/thread-deepfacelab-explained-and-usage-tutorial.
[45] DSSIM. Available at https://github.com/keras-team/keras-contrib/blob/master/keras_contrib/losses/dssim.py.
[46] Lattas, A., Moschoglou, S., Gecer, B., Ploumpis, S., Triantafyllou, V., Ghosh, A., and Zafeiriou, S. (2020). AvatarMe: realistically renderable 3D facial reconstruction “in-the-wild”. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 760-769).
[47] Ha, S., Kersner, M., Kim, B., Seo, S., and Kim, D. (2020, April). MarioNETte: few-shot face reenactment preserving identity of unseen targets. In Proceedings of the AAAI Conference on Artificial Intelligence (vol. 34, no. 07, pp. 10893-10900).
[48] Deng, Y., Yang, J., Chen, D., Wen, F., and Tong, X. (2020). Disentangled and controllable face image generation via 3D imitative-contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5154-5163).
[49] Tewari, A., Elgharib, M., Bharaj, G., Bernard, F., Seidel, H. P., Pérez, P., ... and Theobalt, C. (2020). StyleRig: Rigging StyleGAN for 3D control over portrait images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6142-6151).
[50] Li, L., Bao, J., Yang, H., Chen, D., and Wen, F. (2019). FaceShifter: Towards high fidelity and occlusion aware face swapping. arXiv preprint arXiv:1912.13457.
[51] Nirkin, Y., Keller, Y., and Hassner, T. (2019). FSGAN: subject agnostic face swapping and reenactment. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 7184-7193).
[52] Olszewski, K., Tulyakov, S., Woodford, O., Li, H., and Luo, L. (2019). Transformable bottleneck networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 7648-7657).
[53] Chan, C., Ginosar, S., Zhou, T., and Efros, A. A. (2019). Everybody dance now. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 5933-5942).
[54] Thies, J., Elgharib, M., Tewari, A., Theobalt, C., and Nießner, M. (2020, August). Neural voice puppetry: Audio-driven facial reenactment. In European Conference on Computer Vision (pp. 716-731). Springer, Cham.
[55] Chesney, R., and Citron, D. K. (2018). Deep fakes: a looming challenge for privacy, democracy, and national security. https://dx.doi.org/10.2139/ssrn.3213954.
[56] de Lima, O., Franklin, S., Basu, S., Karwoski, B., and George, A. (2020). Deepfake detection using spatiotemporal convolutional networks. arXiv preprint arXiv:2006.14749.
[57] Amerini, I., and Caldelli, R. (2020, June). Exploiting prediction error inconsistencies through LSTM-based classifiers to detect deepfake videos. In Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security (pp. 97-102).
[58] Korshunov, P., and Marcel, S. (2019). Vulnerability assessment and detection of deepfake videos. In The 12th IAPR International Conference on Biometrics (ICB) (pp. 1-6).
[59] VidTIMIT database. Available at http://conradsanderson.id.au/vidtimit/
[60] Parkhi, O. M., Vedaldi, A., and Zisserman, A. (2015, September). Deep face recognition. In Proceedings of the British Machine Vision Conference (BMVC) (pp. 41.1-41.12).
[61] Schroff, F., Kalenichenko, D., and Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 815-823).
[62] Chung, J. S., Senior, A., Vinyals, O., and Zisserman, A. (2017, July). Lip reading sentences in the wild. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3444-3453).
[63] Suwajanakorn, S., Seitz, S. M., and Kemelmacher-Shlizerman, I. (2017). Synthesizing Obama: learning lip sync from audio. ACM Transactions on Graphics (TOG), 36(4), 1-13.
[64] Korshunov, P., and Marcel, S. (2018, September). Speaker inconsistency detection in tampered video. In 2018 26th European Signal Processing Conference (EUSIPCO) (pp. 2375-2379). IEEE.
[65] Galbally, J., and Marcel, S. (2014, August). Face anti-spoofing based on general image quality assessment. In 2014 22nd International Conference on Pattern Recognition (pp. 1173-1178). IEEE.
[66] Korshunova, I., Shi, W., Dambre, J., and Theis, L. (2017). Fast face-swap using convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3677-3685).
[67] Zhang, Y., Zheng, L., and Thing, V. L. (2017, August). Automated face swapping and its detection. In 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP) (pp. 15-19). IEEE.
[68] Wang, X., Thome, N., and Cord, M. (2017). Gaze latent support vector machine for image classification improved by weakly supervised region selection. Pattern Recognition, 72, 59-71.
[69] Bai, S. (2017). Growing random forest on deep convolutional neural networks for scene categorization. Expert Systems with Applications, 71, 279-287.
[70] Zheng, L., Duffner, S., Idrissi, K., Garcia, C., and Baskurt, A. (2016). Siamese multi-layer perceptrons for dimensionality reduction and face identification. Multimedia Tools and Applications, 75(9), 5055-5073.
[71] Xuan, X., Peng, B., Dong, J., and Wang, W. (2019). On the generalization of GAN image forensics. arXiv preprint arXiv:1902.11153.
[72] Yang, P., Ni, R., and Zhao, Y. (2016, September). Recapture image forensics based on Laplacian convolutional neural networks. In International Workshop on Digital Watermarking (pp. 119-128).
[73] Bayar, B., and Stamm, M. C. (2016, June). A deep learning approach to universal image manipulation detection using a new convolutional layer. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security (pp. 5-10). ACM.
[74] Qian, Y., Dong, J., Wang, W., and Tan, T. (2015, March). Deep learning for steganalysis via convolutional neural networks. In Media Watermarking, Security, and Forensics 2015 (Vol. 9409, p. 94090J).
[75] Agarwal, S., and Varshney, L. R. (2019). Limits of deepfake detection: A robust estimation viewpoint. arXiv preprint arXiv:1905.03493.
[76] Maurer, U. M. (2000). Authentication theory and hypothesis testing. IEEE Transactions on Information Theory, 46(4), 1350-1356.
[77] Hsu, C. C., Zhuang, Y. X., and Lee, C. Y. (2020). Deep fake image detection based on pairwise learning. Applied Sciences, 10(1), 370.
[78] Chopra, S. (2005). Learning a similarity metric discriminatively, with application to face verification. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 539-546).
[79] Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700-4708).
[80] Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3730-3738).
[81] Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
[82] Arjovsky, M., Chintala, S., and Bottou, L. (2017, July). Wasserstein generative adversarial networks. In International Conference on Machine Learning (pp. 214-223).
[83] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. (2017). Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems (pp. 5767-5777).
[84] Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., and Paul Smolley, S. (2017). Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2794-2802).
[85] Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
[86] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... and Berg, A. C. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252.
[87] Brock, A., Donahue, J., and Simonyan, K. (2018). Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096.
[88] Zhang, H., Goodfellow, I., Metaxas, D., and Odena, A. (2018). Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318.
[89] Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (2018). Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957.
[90] Farid, H. (2009). Image forgery detection. IEEE Signal Processing Magazine, 26(2), 16-25.
[91] Mo, H., Chen, B., and Luo, W. (2018, June). Fake faces identification via convolutional neural network. In Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security (pp. 43-47).
[92] Marra, F., Gragnaniello, D., Cozzolino, D., and Verdoliva, L. (2018, April). Detection of GAN-generated fake images over social networks. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) (pp. 384-389). IEEE.
[93] Hsu, C. C., Lee, C. Y., and Zhuang, Y. X. (2018, December). Learning to detect fake face images in the wild. In 2018 International Symposium on Computer, Consumer and Control (IS3C) (pp. 388-391). IEEE.
[94] Afchar, D., Nozick, V., Yamagishi, J., and Echizen, I. (2018, December). MesoNet: a compact facial video forgery detection network. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS) (pp. 1-7). IEEE.
[95] Sabir, E., Cheng, J., Jaiswal, A., AbdAlmageed, W., Masi, I., and Natarajan, P. (2019). Recurrent convolutional strategies for face manipulation detection in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 80-87).
[96] Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014, October). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1724-1734).
[97] Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nießner, M. (2019). FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1-11).
[98] Guera, D., and Delp, E. J. (2018, November). Deepfake video detection using recurrent neural networks. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1-6). IEEE.
[99] Li, Y., Chang, M. C., and Lyu, S. (2018, December). In ictu oculi: Exposing AI created fake videos by detecting eye blinking. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS) (pp. 1-7). IEEE.
[100] Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2625-2634).
[101] Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[102] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
[103] Li, Y., and Lyu, S. (2019). Exposing deepfake videos by detecting face warping artifacts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 46-52).
[104] Yang, X., Li, Y., and Lyu, S. (2019, May). Exposing deep fakes using inconsistent head poses. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8261-8265). IEEE.
[105] Zhou, P., Han, X., Morariu, V. I., and Davis, L. S. (2017, July). Two-stream neural networks for tampered face detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 1831-1839). IEEE.
[106] Nguyen, H. H., Yamagishi, J., and Echizen, I. (2019, May). Capsule-forensics: Using capsule networks to detect forged images and videos. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2307-2311). IEEE.
[107] Hinton, G. E., Krizhevsky, A., and Wang, S. D. (2011, June). Transforming auto-encoders. In International Conference on Artificial Neural Networks (pp. 44-51). Springer, Berlin, Heidelberg.
[108] Sabour, S., Frosst, N., and Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems (pp. 3856-3866).
[109] Chingovska, I., Anjos, A., and Marcel, S. (2012, September). On the effectiveness of local binary patterns in face anti-spoofing. In Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG) (pp. 1-7). IEEE.
[110] Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nießner, M. (2018). FaceForensics: A large-scale video dataset for forgery detection in human faces. arXiv preprint arXiv:1803.09179.
[111] Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., and Nießner, M. (2016). Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2387-2395).
[112] Rahmouni, N., Nozick, V., Yamagishi, J., and Echizen, I. (2017, December). Distinguishing computer graphics from natural images using convolution neural networks. In 2017 IEEE Workshop on Information Forensics and Security (WIFS) (pp. 1-6). IEEE.
[113] Guan, H., Kozak, M., Robertson, E., Lee, Y., Yates, A. N., Delgado, A., ... and Fiscus, J. (2019, January). MFC datasets: Large-scale benchmark datasets for media forensic challenge evaluation. In 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW) (pp. 63-72).
[114] Matern, F., Riess, C., and Stamminger, M. (2019, January). Exploiting visual artifacts to expose deepfakes and face manipulations. In 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW) (pp. 83-92). IEEE.
[115] Koopman, M., Rodriguez, A. M., and Geradts, Z. (2018). Detection of deepfake video manipulation. In The 20th Irish Machine Vision and Image Processing Conference (IMVIP) (pp. 133-136).
[116] Rosenfeld, K., and Sencar, H. T. (2009, February). A study of the robustness of PRNU-based camera identification. In Media Forensics and Security (Vol. 7254, p. 72540M). International Society for Optics and Photonics.
[117] Li, C. T., and Li, Y. (2012). Color-decoupled photo response non-uniformity for digital image forensics. IEEE Transactions on Circuits and Systems for Video Technology, 22(2), 260-271.
[118] Lin, X., and Li, C. T. (2017). Large-scale image clustering based on camera fingerprints. IEEE Transactions on Information Forensics and Security, 12(4), 793-808.
[119] Scherhag, U., Debiasi, L., Rathgeb, C., Busch, C., and Uhl, A. (2019). Detection of face morphing attacks based on PRNU analysis. IEEE Transactions on Biometrics, Behavior, and Identity Science, 1(4), 302-317.
[120] Phan, Q. T., Boato, G., and De Natale, F. G. (2019). Accurate and scalable image clustering based on sparse representation of camera fingerprint. IEEE Transactions on Information Forensics and Security, 14(7), 1902-1916.
[121] Hasan, H. R., and Salah, K. (2019). Combating deepfake videos using blockchain and smart contracts. IEEE Access, 7, 41596-41606.
[122] IPFS powers the Distributed Web. Available at https://ipfs.io/
[123] Zubiaga, A., Aker, A., Bontcheva, K., Liakata, M., and Procter, R. (2018). Detection and resolution of rumours in social media: A survey. ACM Computing Surveys (CSUR), 51(2), 1-36.
[124] Chesney, R., and Citron, D. K. (2018, October 16). Disinformation on steroids: The threat of deep fakes. Available at https://www.cfr.org/report/deep-fake-disinformation-steroids.
[125] Floridi, L. (2018). Artificial intelligence, deepfakes and a future of ectypes. Philosophy and Technology, 31(3), 317-321.
[126] Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., and Ferrer, C. C. (2020). The deepfake detection challenge dataset. arXiv preprint arXiv:2006.07397.
[127] Gandhi, A., and Jain, S. (2020). Adversarial perturbations fool deepfake detectors. arXiv preprint arXiv:2003.10596.
[128] Neekhara, P., Hussain, S., Jere, M., Koushanfar, F., and McAuley, J. (2020). Adversarial deepfakes: evaluating vulnerability of deepfake detectors to adversarial examples. arXiv preprint arXiv:2002.12749.
[129] Carlini, N., and Farid, H. (2020). Evading deepfake-image detectors with white- and black-box attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 658-659).
[130] Yang, C., Ding, L., Chen, Y., and Li, H. (2020). Defending against GAN-based deepfake attacks via transformation-aware adversarial faces. arXiv preprint arXiv:2006.07421.
[131] Yeh, C. Y., Chen, H. W., Tsai, S. L., and Wang, S. D. (2020). Disrupting image-translation-based deepfake algorithms with adversarial attacks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops (pp. 53-62).
[132] Read, M. (2019, June 27). Can you spot a deepfake? Does it matter? Available at http://nymag.com/intelligencer/2019/06/how-do-you-spot-a-deepfake-it-might-not-matter.html.
[133] Agarwal, S., Farid, H., Fried, O., and Agrawala, M. (2020). Detecting deep-fake videos from phoneme-viseme mismatches. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 660-661).
[134] Fried, O., Tewari, A., Zollhöfer, M., Finkelstein, A., Shechtman, E., Goldman, D. B., ... and Agrawala, M. (2019). Text-based editing of talking-head video. ACM Transactions on Graphics (TOG), 38(4), 1-14.
[135] Fernandes, S., Raj, S., Ewetz, R., Singh Pannu, J., Kumar Jha, S., Ortiz, E., ... and Salter, M. (2020). Detecting deepfake videos using attribution-based confidence metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 308-309).
[136] Cao, Q., Shen, L., Xie, W., Parkhi, O. M., and Zisserman, A. (2018, May). VGGFace2: A dataset for recognising faces across pose and age. In 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 67-74). IEEE.
[137] Jha, S., Raj, S., Fernandes, S., Jha, S. K., Jha, S., Jalaian, B., ... and Swami, A. (2019). Attribution-based confidence metric for deep neural networks. In Advances in Neural Information Processing Systems (pp. 11826-11837).
[138] Fernandes, S., Raj, S., Ortiz, E., Vintila, I., Salter, M., Urosevic, G., and Jha, S. (2019, October). Predicting heart rate variations of deepfake videos using neural ODE. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) (pp. 1721-1729). IEEE.
[139] Korshunov, P., and Marcel, S. (2018). Deepfakes: a new threat to face recognition? assessment and detection. arXiv preprint arXiv:1812.08685.
[140] Chintha, A., Thai, B., Sohrawardi, S. J., Bhatt, K. M., Hickerson, A., Wright, M., and Ptucha, R. (2020). Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE Journal of Selected Topics in Signal Processing, DOI: 10.1109/JSTSP.2020.2999185.
[141] Li, Y., Yang, X., Sun, P., Qi, H., and Lyu, S. (2020). Celeb-DF: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3207-3216).
[142] Todisco, M., Wang, X., Vestman, V., Sahidullah, M., Delgado, H., Nautsch, A., ... and Lee, K. A. (2019). ASVspoof 2019: Future horizons in spoofed and fake audio detection. arXiv preprint arXiv:1904.05441.
[143] Mittal, T., Bhattacharya, U., Chandra, R., Bera, A., and Manocha, D. (2020). Emotions don’t lie: A deepfake detection method using audio-visual affective cues. arXiv preprint arXiv:2003.06711.
[144] Dolhansky, B., Howes, R., Pflaum, B., Baram, N., and Ferrer, C. C. (2019). The deepfake detection challenge (DFDC) preview dataset. arXiv preprint arXiv:1910.08854.
[145] Agarwal, S., El-Gaaly, T., Farid, H., and Lim, S. N. (2020). Detecting deep-fake videos from appearance and behavior. arXiv preprint arXiv:2004.14491.
[146] Dufour, N., and Gully, A. (2019). Contributing data to deepfake detection research. Available at https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html.
[147] Ciftci, U. A., Demir, I., and Yin, L. (2020). FakeCatcher: Detection of synthetic portrait videos using biological signals. IEEE Transactions on Pattern Analysis and Machine Intelligence, DOI: 10.1109/TPAMI.2020.3009287.
[148] Huang, G. B., Mattar, M., Berg, T., and Learned-Miller, E. (2007, October). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, http://vis-www.cs.umass.edu/lfw/.
[149] Shaoanlu’s GitHub. (2019). Few-Shot Face Translation GAN. Available at https://github.com/shaoanlu/fewshot-face-translation-GAN.
[150] Guarnera, L., Giudice, O., and Battiato, S. (2020). Deepfake detection by analyzing convolutional traces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 666-667).
[151] Cho, W., Choi, S., Park, D. K., Shin, I., and Choo, J. (2019). Image-to-image translation via group-wise deep whitening-and-coloring transformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 10639-10647).
[152] Choi, Y., Choi, M., Kim, M., Ha, J. W., Kim, S., and Choo, J. (2018). StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8789-8797).
[153] He, Z., Zuo, W., Kan, M., Shan, S., and Chen, X. (2019). AttGAN: Facial attribute editing by only changing what you want. IEEE Transactions on Image Processing, 28(11), 5464-5478.
[154] Karras, T., Laine, S., and Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4401-4410).
[155] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., and Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8110-8119).
[156] Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., and Guo, B. (2020). Face X-ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5001-5010).
[157] Wang, S. Y., Wang, O., Zhang, R., Owens, A., and Efros, A. A. (2020). CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8695-8704).
[158] Dai, T., Cai, J., Zhang, Y., Xia, S. T., and Zhang, L. (2019). Second-order attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11065-11074).
[159] Maras, M. H., and Alexandrou, A. (2019). Determining authenticity of video evidence in the age of artificial intelligence and in the wake of deepfake videos. The International Journal of Evidence and Proof, 23(3), 255-262.
[160] Su, L., Li, C., Lai, Y., and Yang, J. (2017). A fast forgery detection algorithm based on exponential-Fourier moments for video region duplication. IEEE Transactions on Multimedia, 20(4), 825-840.
[161] Iuliani, M., Shullani, D., Fontani, M., Meucci, S., and Piva, A. (2018). A video forensic framework for the unsupervised analysis of MP4-like file container. IEEE Transactions on Information Forensics and Security, 14(3), 635-645.
[162] Malolan, B., Parekh, A., and Kazi, F. (2020, March). Explainable deep-fake detection using visual interpretability methods. In The 3rd International Conference on Information and Computer Technologies (ICICT) (pp. 289-293). IEEE.

Thanh Thi Nguyen was a Visiting Scholar with the Computer Science Department at Stanford University, California, USA in 2015 and the Edge Computing Lab, Harvard University, Massachusetts, USA in 2019. He received a European-Pacific Partnership for ICT Expert Exchange Program Award from the European Commission in 2018, and an Australia–India Strategic Research Fund Early- and Mid-Career Fellowship awarded by the Australian Academy of Science in 2020. Dr. Nguyen obtained a PhD in Mathematics and Statistics from Monash University, Australia and has expertise in various areas, including AI, deep learning, reinforcement learning, computer vision, cyber security, IoT, and data science. He is currently a Senior Lecturer in the School of Information Technology, Deakin University, Victoria, Australia.
Quoc Viet Hung Nguyen received a PhD degree from EPFL, Switzerland. He is currently a senior lecturer with Griffith University, Australia. He has published several articles in top-tier venues, such as SIGMOD, VLDB, SIGIR, KDD, AAAI, ICDE, IJCAI, JVLDB, TKDE, TOIS, and TIST. His research interests include data mining, data integration, data quality, information retrieval, trust management, recommender systems, machine learning, and big data visualization.

Cuong M. Nguyen received the B.Sc. and M.Sc. degrees in Mathematics from Vietnam National University, Hanoi, Vietnam. In 2017, he received the Ph.D. degree from the School of Engineering, Deakin University, Australia, where he worked as a postdoctoral researcher and sessional lecturer for several years. He is currently a postdoc in Autonomous Vehicles at LAMIH UMR CNRS 8201, Université Polytechnique Hauts-de-France, France. His research interests lie in the areas of Optimization, Machine Learning, and Control Systems.

Dung Tien Nguyen received the B.Eng. and M.Eng. degrees from the People's Security Academy and the University of Engineering and Technology, Vietnam National University, in 2006 and 2013, respectively. He received his PhD degree in the area of multimodal emotion recognition using deep learning techniques from Queensland University of Technology in Brisbane, Australia in 2019. He is currently working as a research fellow at Deakin University in computer vision, machine learning, deep learning, image processing, and affective computing.

Duc Thanh Nguyen was awarded a PhD in Computer Science from the University of Wollongong, Australia in 2012. Currently, he is a lecturer in the School of Information Technology, Deakin University, Australia. His research interests include computer vision and pattern recognition. He has published his work in highly ranked publication venues in Computer Vision and Pattern Recognition such as the Journal of Pattern Recognition, CVPR, ICCV, and ECCV. He has also served as a technical program committee member for many premium conferences such as CVPR, ICCV, ECCV, AAAI, ICIP, PAKDD, and as a reviewer for the IEEE Trans. Intell. Transp. Syst., the IEEE Trans. Image Process., the IEEE Signal Processing Letters, Image and Vision Computing, Pattern Recognition, and Scientific Reports.

Saeid Nahavandi received a Ph.D. from Durham University, U.K. in 1991. He is an Alfred Deakin Professor, Pro Vice-Chancellor (Defence Technologies), Chair of Engineering, and the Director for the Institute for Intelligent Systems Research and Innovation at Deakin University, Victoria, Australia. His research interests include modelling of complex systems, machine learning, robotics and haptics. He is a Fellow of Engineers Australia (FIEAust), the Institution of Engineering and Technology (FIET) and IEEE (FIEEE). He is the Co-Editor-in-Chief of the IEEE Systems Journal, Associate Editor of the IEEE/ASME Transactions on Mechatronics, Associate Editor of the IEEE Transactions on Systems, Man and Cybernetics: Systems, and an IEEE Access Editorial Board member.