The rapid advancement of AI has made it possible to create highly realistic synthetic videos, commonly known as deepfakes. These manipulated videos can closely mimic real human faces and voices, making it increasingly difficult to distinguish authentic from fake content.
This project focuses on building a deepfake video detection system using deep learning. The system classifies videos as REAL or FAKE by analyzing visual patterns extracted from video frames.
The approach leverages preprocessing (frame extraction + face detection) followed by feature learning using neural network models, contributing to trust and authenticity in digital content.
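The preprocessing stage described above could be sketched as follows. This is a minimal illustration rather than the project's actual pipeline: the helper names `sample_frame_indices` and `extract_face_crops`, the use of OpenCV's bundled Haar cascade as the face detector, the 20-frame sample count, and the 224x224 crop size are all assumptions.

```python
def sample_frame_indices(total_frames, num_samples):
    """Return up to `num_samples` evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def extract_face_crops(video_path, num_samples=20, size=(224, 224)):
    """Read evenly spaced frames and return a resized face crop for each frame with a detected face."""
    import cv2  # imported here so the index helper above works without OpenCV

    # Assumption: OpenCV's bundled frontal-face Haar cascade stands in for the face detector.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    crops = []
    for idx in sample_frame_indices(total, num_samples):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue
        # Keep the largest detected face (by bounding-box area).
        x, y, w, h = max(faces, key=lambda box: box[2] * box[3])
        crops.append(cv2.resize(frame[y:y + h, x:x + w], size))
    cap.release()
    return crops
```

Sampling a fixed number of evenly spaced frames keeps the input length constant across videos of different durations, which simplifies batching for a sequence model.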
| Tool | Repository | Key Features |
|---|---|---|
| Faceswap | github.com/deepfakes/faceswap | Two encoder-decoder pairs; shared encoder parameters |
| Faceswap-GAN | github.com/shaoanlu/faceswap-GAN | Adversarial loss + perceptual loss (VGGface) on auto-encoder architecture |
| Few-Shot Face Translation | github.com/shaoanlu/fewshot-facetranslation-GAN | Pre-trained face recognition for latent embedding; FUNIT + SPADE semantic priors |
| DeepFaceLab | github.com/iperov/DeepFaceLab | Extended Faceswap with H64, H128, LIAEF128, SAE models; S3FD, MTCNN, dlib extraction |
A dedicated CNN model compares generated faces with their surrounding regions to detect artifacts. Current deepfake (DF) algorithms generate limited-resolution images that must be transformed to match the faces in the source video, which creates detectable warping artifacts.
Uncovers fake face videos by detecting eye-blinking patterns, a physiological signal that is not well reproduced in synthesized videos. Tested on benchmark datasets with promising results on DNN-generated videos. The absence of blinking is used as the primary detection cue.
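To illustrate the idea of using blinking as a cue, the sketch below counts blinks in a per-frame eye-openness signal and flags clips with an unusually low blink rate. It is a simplified thresholding illustration, not the detector summarized above; the function names, the 0.2 openness threshold, and the minimum of 4 blinks per minute are hypothetical values chosen for the example.

```python
def count_blinks(eye_openness, closed_threshold=0.2):
    """Count open-to-closed transitions in per-frame eye-openness scores in [0, 1]."""
    blinks = 0
    eye_closed = False
    for score in eye_openness:
        if score < closed_threshold and not eye_closed:
            blinks += 1          # a new blink starts when the eye closes
            eye_closed = True
        elif score >= closed_threshold:
            eye_closed = False   # eye reopened; ready for the next blink
    return blinks


def blink_rate_is_suspicious(eye_openness, fps, min_blinks_per_minute=4.0):
    """Return True when the blink rate falls below the assumed minimum."""
    if fps <= 0 or not eye_openness:
        raise ValueError("need a positive fps and at least one frame")
    minutes = len(eye_openness) / fps / 60.0
    return count_blinks(eye_openness) / minutes < min_blinks_per_minute
```

In practice the eye-openness scores would come from a separate eye-state estimator, which is outside the scope of this sketch.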
Detects manipulated or forged video and image data across various situations, including replay attacks and computer-generated videos. Random noise was used in training, which is an undesirable practice. Our method proposes a noiseless, real-time dataset for improved robustness.
Detects fake portrait videos using biological signals (PPG maps) extracted from genuine/fake video pairs. Trains a probabilistic SVM and a CNN that exploit spatial coherence and temporal consistency. Achieves high accuracy regardless of generator, content, or goal.
Design and develop a deep learning algorithm to classify a video as deepfake or pristine. The model predicts the probability that a video is fake, which makes this a binary classification problem.
The model is trained by minimizing the binary cross-entropy loss over the training samples, where p(Y=i|V) is the probability that the network labels the video V as class i.
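The formula itself does not appear in the text. A standard form of the binary cross-entropy that is consistent with the definition above, assuming $y \in \{0, 1\}$ denotes the ground-truth label of video $V$ (a symbol introduced here for illustration), would be:

$$
\mathcal{L}(V, y) = -\sum_{i \in \{0, 1\}} \mathbb{1}[y = i] \, \log p(Y = i \mid V)
$$

Only the term for the true class is non-zero, so for a single video the loss reduces to $-\log p(Y = y \mid V)$.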
Many tools exist to create deepfakes, but few can reliably detect them. Our approach aims to detect deepfakes of all types.
| Category | Technology |
|---|---|
| Programming Language | Python |
| Deep Learning Framework | TensorFlow / Keras |
| Computer Vision | OpenCV (cv2) |
| Deep Learning Model | ResNet50 |
| Sequence Model | LSTM |
| Data Processing | NumPy, Pandas |
| Visualization | Matplotlib, Seaborn |
| ML Utilities | Scikit-learn |
| Dev Environment | Jupyter Notebook |
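
    import cv2
    import numpy as np
    from tensorflow.keras import Sequential
    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.layers import Dense, Input, LSTM, TimeDistributed

    # Feature extractor: ImageNet-pretrained ResNet50, average-pooled to one vector per frame
    resnet = ResNet50(weights='imagenet', include_top=False, pooling='avg')

    # Sequence model: ResNet50 is applied to every frame, then the LSTM aggregates over time
    model = Sequential([
        Input(shape=(20, 224, 224, 3)),  # assumed: 20 frames of 224x224 RGB face crops
        TimeDistributed(resnet),         # LSTM needs a (frames, features) sequence per video
        LSTM(256),
        Dense(1, activation='sigmoid'),
    ])

    # Binary classification
    model.compile(
        loss='binary_crossentropy',
        optimizer='adam',
        metrics=['accuracy'],
    )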
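The sigmoid output of a model like the one above is a single probability per video. The sketch below shows how that probability could be turned into a REAL/FAKE label and how the per-sample binary cross-entropy is computed. The helper names, the 0.5 threshold, and the convention that label 1 means FAKE are assumptions, not values taken from the project.

```python
import math


def label_video(prob_fake, threshold=0.5):
    """Map a predicted fake-probability in [0, 1] to a 'FAKE' or 'REAL' label."""
    if not 0.0 <= prob_fake <= 1.0:
        raise ValueError("prob_fake must be a probability in [0, 1]")
    return "FAKE" if prob_fake >= threshold else "REAL"


def binary_cross_entropy(prob_fake, is_fake, eps=1e-7):
    """Per-sample binary cross-entropy; `is_fake` is 1 for FAKE and 0 for REAL."""
    p = min(max(prob_fake, eps), 1.0 - eps)  # clip to avoid log(0)
    return -math.log(p) if is_fake else -math.log(1.0 - p)


print(label_video(0.83))  # FAKE
print(label_video(0.12))  # REAL
```

A confident correct prediction yields a loss near zero, while a confident wrong prediction is penalized heavily, which is why the threshold can be tuned separately from training.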
| Model | Train Accuracy | Test Accuracy | Status |
|---|---|---|---|
| Custom Model | 0.8923 | 0.8027 | Baseline |
| ResNet50 + LSTM ⭐ | ~0.92 | ~0.80 | Our Model |
| MesoNet | 0.9568 | 0.8997 | Comparison |
| DenseNet121 | 0.9699 | 0.8881 | Comparison |
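One way to read the table above is through the gap between train and test accuracy, which gives a rough indication of overfitting. The short sketch below computes that gap from the reported numbers; the approximate "~" entries are treated as exact values for the arithmetic, and the helper name `generalization_gap` is introduced only for this example.

```python
# Train/test accuracies copied from the results table ("~" values taken as given).
results = {
    "Custom Model": (0.8923, 0.8027),
    "ResNet50 + LSTM": (0.92, 0.80),
    "MesoNet": (0.9568, 0.8997),
    "DenseNet121": (0.9699, 0.8881),
}


def generalization_gap(train_acc, test_acc):
    """Difference between train and test accuracy, rounded to 4 decimals."""
    return round(train_acc - test_acc, 4)


gaps = {name: generalization_gap(tr, te) for name, (tr, te) in results.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.4f}")
```

On these numbers MesoNet shows the smallest gap and ResNet50 + LSTM the largest, which is consistent with the later emphasis on larger, more diverse training data.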
Enhance detection precision through advanced architectures and fine-tuning strategies.
Develop low-latency inference pipelines for live video stream analysis.
Expand training data with diverse demographics, lighting, and manipulation techniques.
Optimize model for deployment on mobile devices and edge computing hardware.
Integrate XAI techniques to visualize and explain model detection decisions.
Combine audio + visual signals for more robust cross-modal deepfake detection.
Continuously adapt to emerging deepfake generation methods and adversarial attacks.
A deepfake video detection system was developed using deep learning on a large-scale video dataset. After preprocessing (frame extraction + face cropping), the system distinguishes real from manipulated videos by learning spatial and temporal patterns from the extracted frames.
The proposed approach identifies deepfake content with a test accuracy of about 0.80, demonstrating the potential of combining convolutional and sequential models. Careful preprocessing and balanced dataset preparation play a key role in achieving reliable results.
This project contributes toward addressing the growing challenge of digital misinformation caused by synthetic media, providing a foundation for advanced deepfake detection in security, media verification, and online content moderation.
Joshua Brockschmidt, Jiacheng Shang, and Jie Wu. On the Generality of Facial Forgery Detection. IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW), pp. 43–47. IEEE, 2019.
Yuezun Li, Ming-Ching Chang, and Siwei Lyu. In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking. arXiv:1806.02877v2, 2018.
TackHyun Jung, SangWon Kim, and KeeCheon Kim. Deep-Vision: Deepfakes Detection Using Human Eye Blinking Pattern. IEEE Access, 8:83144–83154, 2020.
Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. Realistic Speech-Driven Facial Animation with GANs. International Journal of Computer Vision, 128:1398–1413, 2020.
Hai X. Pham, Yuting Wang, and Vladimir Pavlovic. Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network. arXiv:1803.07716, 2018.
Yuezun Li, Siwei Lyu. Exposing DF Videos By Detecting Face Warping Artifacts. arXiv:1811.00656v3.
Yuezun Li, Ming-Ching Chang and Siwei Lyu. Exposing AI Created Fake Videos by Detecting Eye Blinking. arXiv.
Huy H. Nguyen, Junichi Yamagishi, and Isao Echizen. Using Capsule Networks to Detect Forged Images and Videos.
Umur Aybars Ciftci, İlke Demir, and Lijun Yin. Detection of Synthetic Portrait Videos using Biological Signals. arXiv:1901.02212v2.
Liu, M. Y., Huang, X., Mallya, A., Karras, T., Aila, T., Lehtinen, J., and Kautz, J. Few-shot unsupervised image-to-image translation. Proceedings of the IEEE International Conference on Computer Vision, pp. 10551–10560, 2019.