| Graduate Student: | 黃博鴻 Bo-Hong Huang |
|---|---|
| Thesis Title: | 基於區塊一致性評估之影像竄改與深偽視訊偵測 (Detecting Forged Images and DeepFake Videos via Block Consistency Evaluation) |
| Advisor: | 蘇柏齊 |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | Chinese |
| Number of Pages: | 60 |
| Keywords (Chinese): | 影像竄改、深度偽造、孿生網路、深度學習 |
| Keywords (English): | Image manipulation, DeepFake, Siamese network, deep learning |
Digital image editing tools can easily alter image and even video content while preserving very high visual quality. The emergence of DeepFake has had an even greater impact: driven by various malicious purposes, such content manipulations pose threats and challenges to the authenticity of digital images and videos. In recent years, many methods for detecting content manipulation have been proposed, most relying on machine learning or deep learning techniques. However, the diversity and continual evolution of manipulation methods make it difficult or impractical to collect every type of manipulated data for supervised training; even if such data could be collected exhaustively, the resulting dataset may be so large that training demands substantially more resources.

This study approaches the problem from a different angle and proposes a deep learning method based on block similarity, which identifies forged or affected regions in an image or video by evaluating the consistency of block content. The approach aims to avoid collecting all kinds of manipulated data for training; instead, we use original, unmodified image blocks for recognition and detection. We train a convolutional neural network to extract features from image blocks and use a Siamese network to compare the similarity between block pairs, thereby locating regions in the frame that may have been manipulated. For image manipulation detection, we further introduce a segmentation network to refine the detected manipulated regions. For DeepFake video detection, we first locate the facial region and then judge the authenticity of the video by comparing the similarity of facial regions in consecutive frames. We test and validate the proposed method on public datasets containing diverse types of images and videos covering multiple content manipulation operations, confirming its feasibility. Comparisons with other methods demonstrate the superior accuracy and stability of the proposed scheme.
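The block-consistency idea described in the abstract can be illustrated with a minimal, self-contained sketch. The thesis uses a trained CNN and a Siamese network for feature extraction and pairwise comparison; purely as an illustrative stand-in, the sketch below replaces the learned features with simple intensity statistics and compares every block against all others with cosine similarity. The function names, feature choice, and threshold are assumptions for illustration, not the thesis's actual design.

```python
import math

def block_features(block):
    # Illustrative stand-in for the learned CNN features in the thesis:
    # mean intensity, variance, and horizontal/vertical gradient energy.
    h, w = len(block), len(block[0])
    n = h * w
    mean = sum(sum(row) for row in block) / n
    var = sum((v - mean) ** 2 for row in block for v in row) / n
    gx = sum(abs(row[i + 1] - row[i]) for row in block for i in range(w - 1))
    gy = sum(abs(block[j + 1][i] - block[j][i])
             for j in range(h - 1) for i in range(w))
    return [mean, var, gx, gy]

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)

def flag_inconsistent_blocks(blocks, threshold=0.5):
    # Siamese-style pairwise comparison: a block whose average similarity
    # to all other blocks falls below the threshold is flagged as a
    # potentially manipulated region.
    feats = [block_features(b) for b in blocks]
    flags = []
    for i, fi in enumerate(feats):
        sims = [cosine(fi, fj) for j, fj in enumerate(feats) if j != i]
        flags.append(sum(sims) / len(sims) < threshold)
    return flags
```

For example, four uniform 2x2 blocks plus one high-contrast checkerboard block yield `[False, False, False, False, True]` under these toy features; in the thesis the same comparison is driven by Siamese-network features rather than hand-crafted statistics.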
Digital image editing tools enable effortless manipulation of images and video content while maintaining high visual quality. The emergence of DeepFake has introduced significant challenges to the authenticity of digital media. Various methods for detecting such content manipulations have been proposed, primarily relying on machine learning or deep learning techniques. However, the constantly evolving nature of manipulation methods makes it impractical to collect all types of manipulated data for supervised training, and handling the resulting large datasets can be resource-intensive. In this study, we propose a deep learning-based method that utilizes block similarity to identify forged or manipulated regions within images or DeepFake videos by evaluating the consistency of block content. Our approach aims to avoid the need for collecting various types of manipulated data for training; instead, we opt to use original, unmodified blocks for forgery detection. We train a convolutional neural network to extract features from image blocks and employ a Siamese network to compare block similarity. For image manipulation detection, we introduce a segmentation network to further refine the detection of manipulated regions. In the case of DeepFake video detection, we first locate facial regions and then determine the video's authenticity by comparing facial region similarity between consecutive frames. We conduct tests on publicly available datasets encompassing images and videos with various content manipulation operations. The experimental results demonstrate superior accuracy and stability compared to other existing methods.
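The consecutive-frame comparison used for DeepFake video detection can likewise be sketched. In the thesis, facial regions are located first and their Siamese-network features are compared across adjacent frames; the sketch below assumes per-frame face feature vectors are already available, and the cosine metric, function names, and threshold are illustrative assumptions rather than the thesis's exact formulation.

```python
import math

def cosine(a, b):
    # Cosine similarity between two face feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def temporal_consistency(face_feats):
    # Average similarity between face features of consecutive frames;
    # genuine videos tend to score high, DeepFakes lower.
    sims = [cosine(face_feats[i], face_feats[i + 1])
            for i in range(len(face_feats) - 1)]
    return sum(sims) / len(sims)

def looks_real(face_feats, threshold=0.9):
    # Classify the whole video from its sequence of face features.
    return temporal_consistency(face_feats) >= threshold
```

With identical features in every frame the consistency score is 1.0 (classified as real); a sequence that alternates between orthogonal feature vectors scores 0.0 (classified as fake).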