
Student: 莊函潔 (Han-Cheih Chuang)
Thesis title: 結合影像與音訊之身份認證系統
(Identity Authentication System Using Audio-visual Information)
Advisor: 范國清 (Kuo-Chin Fan)
Oral defense committee: (not listed)
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Graduation academic year: 96 (ROC calendar, i.e., 2007)
Language: Chinese
Pages: 60
Chinese keywords: 生物認證 (biometric authentication)
English keywords: Biometrics Recognition
    In today's society, people care about personal security and privacy more than ever before. With continuing technological progress and the falling prices of hard disks and camera equipment, digital audio-visual information has become much easier to obtain, and automated identity recognition has become an important need. Biometric features such as the face, fingerprints, voiceprints, and signatures are unique and convenient, so biometric authentication has become a popular research topic. Recently, many bimodal authentication methods have been proposed that use two or more biometric features to increase the reliability of the authentication system. This thesis proposes a video-based identity authentication system that combines audio and facial images for authentication. By asking the speaker to answer with randomly chosen words, the system prevents attacks with pre-recorded speech, and speech recognition is used to verify the pass phrase. For facial feature extraction, both static and dynamic features are used: static information is easy to forge with a photograph, whereas the changes of certain facial features while a person speaks are hard to imitate. Therefore, in addition to static facial information, this thesis also uses dynamic information that is difficult to forge, such as lip-shape changes and mouth opening and closing, to identify the speaker.
    In the overall system, the audio data is first used for keyword verification, deciding whether the speaker has uttered the correct keyword, with lip information as auxiliary evidence. Once the keyword is confirmed, speaker identification proceeds: static features of the facial organs and dynamic features such as the movement of the lip and chin regions are extracted and classified separately. Finally, the static and dynamic recognition results are combined to determine and verify the speaker's identity.
    The proposed method achieves 92.96% accuracy in keyword recognition and 98.89% accuracy in identity recognition. The experimental results confirm that the proposed system is effective.


    In modern society, security and privacy are two of the issues that people care about most. Owing to rapid technological progress, the prices of hard disks and photographic equipment have dropped dramatically, so digital video information can be acquired more easily, and there is a growing need for automatic identity recognition systems to prevent the misuse of personal information. Biometric recognition, which uses biometric features such as facial features, fingerprints, voice, and signatures, is an attractive way to address this problem because of its uniqueness and convenience. Many bimodal identity authentication systems have been proposed recently that use two or more biometric features to improve the reliability of authentication. In this thesis, we propose a video-based identity authentication system that uses both audio and visual information. By asking the user a random question, the system can operate without relying on pre-recorded voice patterns and resists replay attacks. Since static features can often be spoofed with photographs, the system uses both static and dynamic features extracted from face images, because the dynamic variations of a speaker's face are hard to forge. In our work, lip movement and mouth-opening patterns are adopted as the dynamic features for recognizing the speaker's identity.
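The two-stage flow described above, a random-word password check followed by identity recognition, can be sketched as below. All function names, the stub keyword check, and the acceptance threshold are illustrative assumptions, not the thesis implementation (the real system classifies spoken words from audio and lip features).

```python
def verify_keyword(recognized_word, expected_word):
    # Stand-in for the audio + lip-assisted keyword recognizer described
    # in the abstract; here we just compare the recognized word directly.
    return recognized_word == expected_word

def authenticate(recognized_word, expected_word, identity_scores, threshold=0.7):
    """Reject immediately if the random password is wrong; otherwise
    accept the best-scoring enrolled identity above a threshold."""
    if not verify_keyword(recognized_word, expected_word):
        return None  # wrong random word: possible replay attack
    best_id = max(identity_scores, key=identity_scores.get)
    return best_id if identity_scores[best_id] >= threshold else None

scores = {"alice": 0.91, "bob": 0.40}
print(authenticate("seven", "seven", scores))  # alice
print(authenticate("three", "seven", scores))  # None (wrong keyword)
```

The early rejection mirrors the system's ordering: identity recognition is only attempted after the keyword is confirmed.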
    First, the audio information and the lip information are used to check whether the user has said the correct password. After the password has been verified, the user's identity is checked: static features such as facial features and dynamic features such as the movement of the lips and the jaw are used in identity recognition. Finally, the two recognition results are fused to determine the identity.
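A minimal sketch of fusing the static-feature and dynamic-feature recognition results at the score level. The weighted-sum rule and the weight value are assumptions for illustration; the abstract states only that the two results are fused, not the exact rule.

```python
def fuse_scores(static_scores, dynamic_scores, w_static=0.6):
    # Weighted-sum fusion: combine per-identity scores from the
    # static-feature and dynamic-feature classifiers into one score.
    return {person: w_static * static_scores[person]
                    + (1.0 - w_static) * dynamic_scores.get(person, 0.0)
            for person in static_scores}

static = {"alice": 0.80, "bob": 0.55}
dynamic = {"alice": 0.90, "bob": 0.20}
fused = fuse_scores(static, dynamic)
best = max(fused, key=fused.get)
print(best, round(fused[best], 2))  # alice 0.84
```

Giving the two modalities different weights lets the harder-to-forge dynamic features offset spoofable static features.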
    The proposed system achieves 92.96% accuracy in password checking and 98.89% accuracy in face recognition. Experimental results verify the validity of the proposed system for identity authentication.

    Table of Contents
    Chinese Abstract
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1 Motivation
      1.2 Related Work
      1.3 System Flow
      1.4 Thesis Organization
    Chapter 2  Video Preprocessing
      2.1 Face Detection
      2.2 Feature Point Detection
        2.2.1 Building the Active Appearance Model
        2.2.2 Active Appearance Model Search
    Chapter 3  Feature Extraction
      3.1 Visual Feature Extraction
        3.1.1 Static Feature Extraction
        3.1.2 Dynamic Feature Extraction
      3.2 Speech Feature Extraction
    Chapter 4  Keyword and Identity Recognition
      4.1 Support Vector Machine
      4.2 Recognition Flow
        4.2.1 Keyword Recognition
        4.2.2 Identity Recognition
        4.2.3 Decision Fusion
    Chapter 5  Experimental Results
      5.1 Video Database
      5.2 Experimental Results
        5.2.1 Keyword Recognition
        5.2.2 Identity Recognition
        5.2.3 Overall Recognition
    Chapter 6  Conclusion and Future Work
      6.1 Conclusion
      6.2 Future Work
    References

