Graduate student: 黃夢晨 (Meng-chen Huang)
Thesis title: The research of competitive speakers on MCE for speaker identification (最小錯誤鑑別式應用於語者辨識之競爭語者探討)
Advisor: 莊堯棠 (Yau-Tarng Juang)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Academic year of graduation: 96 (ROC calendar, 2007-2008)
Language: Chinese
Number of pages: 67
Keywords: Support Vector Machine, Minimum Classification Error, Gaussian mixture model
    In this thesis, we retrain speaker models using the Minimum Classification Error (MCE) method. The main difficulty in MCE training of speaker models is deciding by what criterion the set of competitive speakers should be selected. To address this problem, we propose four methods for selecting competitive speakers: the ranking method, the threshold method, the score classification method, and the model classification method. The score classification and model classification methods both feed speaker parameters into a Support Vector Machine (SVM) for classification: the score classification method uses each speaker's maximum likelihood score as input, while the model classification method uses each speaker's model parameters. The SVM's classification ability is then used to find a more suitable set of competitive speakers in the corpus and thereby raise the speaker identification rate. In experiments on the TIMIT corpus, the score classification method achieves a 42.27% error reduction over a conventional Gaussian mixture model (GMM) speaker identification system.
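    The abstract names the ranking and threshold methods without giving their formulas, so the following is only a minimal sketch of how such selection rules are commonly written, not the thesis's actual procedure. The function names, the `margin` parameter, and the example scores are all hypothetical. The loss function uses the standard MCE misclassification measure (a soft maximum over competitor scores minus the target score, passed through a sigmoid); whether the thesis uses exactly this form is an assumption.

```python
import math


def select_by_rank(scores, target, n_competitors):
    """Assumed ranking rule: keep the n best-scoring non-target speakers."""
    order = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return [s for s in order if s != target][:n_competitors]


def select_by_threshold(scores, target, margin):
    """Assumed threshold rule: keep non-target speakers whose score lies
    within `margin` of the target speaker's score."""
    return [s for s in range(len(scores))
            if s != target and scores[s] >= scores[target] - margin]


def mce_loss(scores, target, competitors, eta=1.0, gamma=1.0):
    """Standard MCE loss for one utterance.

    d = -g_target + (1/eta) * log(mean_j exp(eta * g_j)), j in competitors
    loss = sigmoid(gamma * d); d > 0 means the utterance is misclassified.
    """
    g = [scores[j] for j in competitors]
    m = max(g)  # subtract the maximum before exponentiating for stability
    soft_max = m + math.log(sum(math.exp(eta * (x - m)) for x in g) / len(g)) / eta
    d = soft_max - scores[target]
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Hypothetical log-likelihood scores of one utterance under four speaker models.
scores = [-10.0, -12.0, -11.0, -20.0]
competitors = select_by_rank(scores, target=0, n_competitors=2)
print(competitors, select_by_threshold(scores, target=0, margin=1.5))
print(mce_loss(scores, target=0, competitors=competitors))
```

    The choice of competitor set changes the soft maximum in `mce_loss`, and therefore the gradient used to retrain the models, which is why the selection criterion matters. The two SVM-based methods would replace the selection functions with a trained classifier.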

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research motivation
        1.2 Overview of speaker recognition
        1.3 Overview of speaker adaptation techniques
        1.4 Research direction
        1.5 Chapter outline
    Chapter 2 Basic Techniques of Speaker Identification
        2.1 Feature extraction
        2.2 Speaker model construction
            2.2.1 Gaussian mixture model
            2.2.2 Speaker model training procedure
            2.2.3 Vector quantization
            2.2.4 EM algorithm
        2.3 Speaker model adaptation techniques
            2.3.1 Bayesian adaptation
        2.4 Speaker identification
    Chapter 3 System Architecture
        3.1 Minimum classification error
            3.1.1 Discriminant function
            3.1.2 Misclassification criterion
            3.1.3 Generalized probabilistic descent algorithm
            3.1.4 Application of minimum classification error
        3.2 Support vector machine
    Chapter 4 Experiments and Discussion
        4.1 TIMIT speech database
        4.2 Model training and testing
        4.3 Experimental results
            4.3.1 Experiment 1: Ranking method
            4.3.2 Experiment 2: Threshold method
            4.3.3 Experiment 3: Model classification method
            4.3.4 Experiment 4: Score classification method
    Chapter 5 Conclusions and Future Work
        5.1 Conclusions
        5.2 Future work
    References

