
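Author: 蔡炎興
Yan-Hsing Tsai
Thesis title: 關鍵詞萃取及語者辨識系統之研製
A System for Keyword Spotting and Speaker Recognition
Advisor: 莊堯棠
Yau-Tarng Juang
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Academic year of graduation: 91 (ROC calendar)
Language: Chinese
Number of pages: 83
Chinese keywords: 語者識別 (speaker identification), 關鍵詞萃取 (keyword spotting), 語音辨識 (speech recognition)
Foreign-language keywords: Speaker Recognition, Keyword Spotting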
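  • This thesis improves on earlier keyword spotting and keyword verification techniques and combines them with speaker recognition to build a complete system. The work has three parts. For keyword spotting, both the keyword models and the filler (non-keyword) models are built from subsyllable models, so that the resulting system is more portable. Besides finding the best pairing of mixture counts for the keyword and filler models, we apply the GCS (generalized confidence score) recognition algorithm to keyword spotting, which gives the recognizer a partial rejection capability during recognition itself, and we then use a concept similar to beam search to speed up the system. For keyword verification, we likewise use subsyllable models for hypothesis testing, and we propose a method that does not require training a threshold for each subsyllable, so that future verification systems can be built more quickly. Finally, we briefly describe how we use speaker recognition: we incorporate an earlier speaker identification technique [27] to build the system, and we wrap our speech recognition core with the MFC and SDK of Visual C++ to provide a windowed user interface, so that our methods can be tested online in real time.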
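
The abstract mentions subsyllable-based hypothesis testing that avoids training one threshold per subsyllable. As a rough illustration only, the following is a minimal sketch of one common way such a test can be set up: a duration-normalized log-likelihood ratio per subsyllable segment, averaged over the keyword and compared against a single shared threshold. The class `SubsyllableSegment`, the functions `segment_llr` and `verify_keyword`, the anti-model scores, and the averaging rule are all assumptions made for this example; the record does not describe the thesis's actual formulation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubsyllableSegment:
    """One subsyllable segment of a hypothesized keyword (hypothetical structure)."""
    frames: int            # number of frames aligned to this segment
    target_loglik: float   # log-likelihood under the subsyllable (target) model
    anti_loglik: float     # log-likelihood under a competing (anti) model


def segment_llr(seg: SubsyllableSegment) -> float:
    """Per-frame log-likelihood ratio, so long and short segments are comparable."""
    if seg.frames <= 0:
        raise ValueError("segment must contain at least one frame")
    return (seg.target_loglik - seg.anti_loglik) / seg.frames


def verify_keyword(segments: List[SubsyllableSegment],
                   threshold: float = 0.0) -> Tuple[bool, float]:
    """Accept the keyword if the mean normalized LLR reaches one shared threshold.

    Assumption: a single global threshold stands in for per-subsyllable thresholds.
    """
    if not segments:
        raise ValueError("keyword hypothesis has no segments")
    confidence = sum(segment_llr(s) for s in segments) / len(segments)
    return confidence >= threshold, confidence


# Example with made-up scores: one well-matched and one poorly matched segment.
hypothesis = [
    SubsyllableSegment(frames=10, target_loglik=-50.0, anti_loglik=-60.0),
    SubsyllableSegment(frames=20, target_loglik=-100.0, anti_loglik=-90.0),
]
accepted, score = verify_keyword(hypothesis, threshold=0.0)
```

Normalizing by segment duration is one way to make a single threshold usable across subsyllables of different lengths; other normalizations are equally plausible.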


    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research motivation
      1.2 Research objectives
      1.3 Thesis outline
    Chapter 2 Basic Speech Recognition Techniques
      2.1 Feature extraction
      2.2 Hidden Markov models
      2.3 Acoustic models
      2.4 Model training and parameter estimation
        2.4.1 Training algorithm
        2.4.2 Training flowchart
    Chapter 3 Keyword Spotting and Verification
      3.1 Overview
      3.2 Keyword spotting architecture
        3.2.1 Keyword models
        3.2.2 Filler (non-keyword) models
        3.2.3 Arrangement of the recognition modules
      3.3 Recognition algorithm
      3.4 Recognition procedure
      3.5 Generalized confidence score
      3.6 Keyword verification
        3.6.1 Verification procedure
        3.6.2 Subsyllable hypothesis testing
        3.6.3 Error rate calculation
      3.7 System speed-up
    Chapter 4 Speaker Recognition and Verification
      4.1 Speaker recognition
      4.2 Speaker verification
        4.2.1 Speaker model
        4.2.2 Global Speaker Model
    Chapter 5 Experiments and Results
      5.1 Experimental environment
      5.2 Keyword spotting experiments
        5.2.1 Effect of the number of mixtures on recognition rate
        5.2.2 Generalized confidence score (GCS)
      5.3 Keyword verification experiments
        5.3.1 […] 值
        5.3.2 Keyword verification
      5.4 System speed-up
      5.5 System implementation
    Chapter 6 Conclusions and Future Work
      6.1 Conclusions
      6.2 Future work
    References

    [1] M.-W. Koo, C.-H. Lee, and B.-H. Juang, “Speech Recognition and Utterance Verification Based on a Generalized Confidence Score,” IEEE Trans. on Speech and Audio Processing, vol. 9, no. 8, Nov. 2001.
    [2] Chi-Min Liu, Chin-Chih Chiu, and Hung-Yuan Chang, “Design of Vocabulary-Independent Mandarin Keyword Spotters,” IEEE Trans. on Speech and Audio Processing, vol. 8, no. 4, July 2000.
    [3] Qi Li, B.-H. Juang, Qiru Zhou, and C.-H. Lee, “Automatic Verbal Information Verification for User Authentication,” IEEE Trans. on Speech and Audio Processing, vol. 8, no. 5, Sep. 2000.
    [4] T. Kawahara, C.-H. Lee, and B.-H. Juang, “Flexible Speech Understanding Based on Combined Key-Phrase Detection and Verification,” IEEE Trans. on Speech and Audio Processing, vol. 6, no. 6, Nov. 1998.
    [5] B. H. Juang, “The past, present, and future of speech processing,” IEEE Trans. on Signal Processing, pp. 24-28, May 1998.
    [6] M. G. Rahim, C.-H. Lee, and B.-H. Juang, “Discriminative Utterance Verification for Connected Digits Recognition,” IEEE Trans. on Speech and Audio Processing, vol. 5, no. 3, May 1997.
    [7] D. Burshtein, “Robust parametric modeling of duration in hidden Markov models,” IEEE Trans. on Speech and Audio Processing, vol. 4, pp. 240-242, May 1996.
    [8] H. Ney, “The use of a one stage dynamic programming algorithm for connected word recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, April 1984.
    [9] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, “An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition,” The Bell System Technical Journal, vol. 62, no. 4, April 1983.
    [10] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” Phil. Trans. R. Soc. A, vol. 231, pp. 289-337, 1933.
    [11] Lin Xin and Bing-Xi Wang, “Utterance Verification For Spontaneous Mandarin Speech Keyword Spotting,” in Proc. IEEE ICII 2001, Beijing, vol. 3, pp. 397-401.
    [12] Myoung-Wan Koo and Sun-Jeong Lee, “An Utterance Verification System Based on Subword Modeling For A Vocabulary Independent Speech,” Eurospeech, 1999.
    [13] N. Moreau and D. Jouvet, “Use of A Confidence Measure Based on Frame Level Likelihood Ratios for The Rejection of Incorrect Data,” Eurospeech, 1999.
    [14] Tatsuya Kawahara, C.-H. Lee, and B.-H. Juang, “Combining Key-Phrase Detection and Subword-Based Verification For Flexible Speech Understanding,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Munich, Germany, May 1997, pp. 1159-1162.
    [15] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, no. 2, Feb. 1989.
    [16] L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition,” Prentice Hall, New Jersey, 1993.
    [17] John R. Deller, Jr., John G. Proakis, and John H. L. Hansen, “Discrete-Time Processing of Speech Signals,” 1987.
    [18] L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals,” Prentice-Hall, 1978.
    [19] J. T. Tou, “Pattern Recognition Principles,” Addison-Wesley, 1974.
    [20] J. Neyman and E. S. Pearson, “On the use and interpretation of certain test criteria for purposes of statistical inference,” Biometrika, pt. I, vol. 20A, pp. 175-240, 1928.
    [21] Y. Zhang, D. Zhang, and X. Zhu, “A novel text-independent speaker verification method based on the global speaker model,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 30, no. 5, pp. 598-602, 2000.
    [22] Chi-Shi Liu, Hsiao-Chuan Wang, and Chin-Hui Lee, “Speaker verification using normalized log-likelihood score,” IEEE Trans. on Speech and Audio Processing, pp. 57-60, Jan. 1996.
    [23] A. E. Rosenberg, J. DeLong, C. H. Lee, B. H. Juang, and F. K. Soong, “The Use of Cohort Normalized Scores for Speaker Recognition,” in Proc. ICSLP 92, Banff, pp. 599-602, Oct. 1992.
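    [24] 蔡永琪, “Keyword Recognition Based on Subsyllable Units” (in Chinese), M.S. thesis, National Central University, June 1995.
    [25] 黃國彰, “A Study of Keyword Spotting and Verification” (in Chinese), M.S. thesis, National Central University, June 1996.
    [26] 王維邦, “Research and Development of a Keyword Spotting System for Continuous Mandarin Speech” (in Chinese), M.S. thesis, National Central University, June 1997.
    [27] 吳金池, “A Study of Speaker Recognition Systems” (in Chinese), M.S. thesis, National Central University, May 2001.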
