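| Graduate student: | 儲健峰 Chien-Feng Chu |
|---|---|
| Thesis title: | 基於AI技術之對抗部分遮蔽的即時臉部辨識系統 (A Real-Time Face Recognition System Robust to Partial Covering, Based on AI Techniques) |
| Advisor: | 王文俊 Wen-June Wang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 82 |
| Chinese keywords: | 深度學習、物件追蹤、臉部偵測、臉部辨識、臉部遮蔽、注意力機制 |
| English keywords: | deep learning, object tracking, face detection, face recognition, partially covered faces, attention mechanism |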
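This thesis uses deep learning together with a tracking algorithm to build a real-time face recognition system that is robust to partial covering. Even when a face is partially covered, for example by a mask, a hat, or sunglasses, the proposed system can still perform recognition, and it runs on both live and offline video. Before recognition, a person only needs to register an identity through the enrollment system; the system can then recognize that person immediately, with no retraining. During recognition the system also records the times at which each identity enters and leaves the frame, which is convenient for scenarios such as video search and record keeping.
Recognition of covered faces currently faces several problems. First, covering conditions that the network has not learned reduce its recognition accuracy. Some methods also work only on frontal faces, and their accuracy drops sharply for faces at other angles. In addition, some studies propose face restoration, but that approach needs a large training set of local faces to restore the covered region. Asian face datasets are far smaller than European and American ones, so restored Asian faces may show European or American facial features.
When people recognize someone whose face is covered, they generally pay more attention to the uncovered regions of the face. This thesis builds on that observation: it adds an attention module to the network to improve the backbone of the face recognition model SeesawFaceNet, and it uses a covering module during training to augment the training data. As a result, when the network extracts features from a partially covered face, it automatically attends to the uncovered regions and strengthens the features extracted from them, which reduces the influence of the covering object on the features. The approach also avoids the need for a large Asian face dataset and improves recognition of covered faces at different angles, so the system achieves better recognition accuracy across different covering conditions.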
This thesis combines deep learning with a tracking algorithm to implement a real-time system that recognizes people whose faces are partially covered. Even when a face is partially covered by a mask, hat, or sunglasses, the system can still recognize the person in both real-time and offline video. Before recognition, a person only needs to complete identity enrollment through the enrollment system; the system can then recognize the enrolled person immediately, without retraining, even if his or her face is partially covered. While recognizing, the system also records the time points at which the enrolled person enters and leaves the frame, which makes it convenient to search for a specific person in a long video.
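The enrollment and time-logging behaviour described above can be illustrated with a minimal sketch. It assumes, without confirmation from the abstract, that recognition compares a face embedding against stored enrolled embeddings by cosine similarity, and that a track is closed after a short gap with no sightings. The names `Gallery`, `PresenceLog`, `threshold`, and `max_gap` are hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class Gallery:
    """Stores one reference embedding per enrolled identity (assumed design)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed acceptance threshold
        self.embeddings = {}

    def enroll(self, name, embedding):
        # Enrollment only stores a vector, so no network retraining is needed.
        self.embeddings[name] = list(embedding)

    def identify(self, embedding):
        # Return the best-matching enrolled name, or None if below threshold.
        best_name, best_score = None, self.threshold
        for name, reference in self.embeddings.items():
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


class PresenceLog:
    """Records [enter, leave] time spans for each recognized identity."""

    def __init__(self, max_gap=1.0):
        self.max_gap = max_gap  # assumed gap (seconds) that ends a span
        self.intervals = {}

    def observe(self, name, timestamp):
        spans = self.intervals.setdefault(name, [])
        if spans and timestamp - spans[-1][1] <= self.max_gap:
            spans[-1][1] = timestamp  # still in frame: extend the span
        else:
            spans.append([timestamp, timestamp])  # new entry into the frame
```

In such a design, enrolling a new person amounts to one `enroll` call with the embedding produced by the recognition network, and the recorded spans can later be searched to find when a given person appears in a long video.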
Recognition of partially covered faces still has several problems to overcome. First, if a type of covering was not seen during training, the recognition accuracy of the network suffers. Second, some methods work only on frontal faces, and their accuracy drops sharply for faces at other angles. Furthermore, some studies use face repair to recover the covered part of the face, but this requires a large set of local people's faces as training data. Because datasets of Asian faces are much smaller than those of European and American faces, repaired Asian faces may end up with European or American facial features.
When people recognize a partially covered face, they usually pay more attention to its uncovered areas. This thesis starts from that observation and adds an attention mechanism module to the network to improve the backbone architecture of the face recognition network SeesawFaceNet. During training, a covering module is used to augment the training data set. When the network extracts features from a partially covered face, it automatically focuses on the uncovered areas and strengthens the features extracted from them, which reduces the influence of the covering object on the features. Two further contributions of this study are that it does not require a large data set of Asian faces and that it improves recognition of partially covered faces at different angles.
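As an illustration of covering-based data augmentation, the sketch below blanks out one random rectangle of a training image. This is only an assumption about what such a module might do; the abstract does not describe the thesis's covering module, and the function name `random_cover` and its parameters `min_frac`, `max_frac`, and `fill` are hypothetical. The image is represented as a plain list of pixel rows to keep the example dependency-free.

```python
import random


def random_cover(image, rng, min_frac=0.2, max_frac=0.5, fill=0):
    """Return a copy of `image` with one random rectangle set to `fill`.

    `image` is a list of rows of pixel values. The rectangle's height and
    width are random fractions (between min_frac and max_frac) of the image
    size. Also returns the box as (top, left, box_height, box_width).
    """
    height, width = len(image), len(image[0])
    box_h = max(1, int(height * rng.uniform(min_frac, max_frac)))
    box_w = max(1, int(width * rng.uniform(min_frac, max_frac)))
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)

    covered = [row[:] for row in image]  # copy so the input stays unchanged
    for y in range(top, top + box_h):
        for x in range(left, left + box_w):
            covered[y][x] = fill
    return covered, (top, left, box_h, box_w)
```

Applying a function like this to each training face with a fresh random box exposes the network to many covering positions, which is one plausible way to encourage it to rely on whichever regions remain visible.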