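| Graduate student: | Meng-chen Huang (黃夢晨) |
|---|---|
| Thesis title: | The research of competitive speakers on MCE for speaker identification (最小錯誤鑑別式應用於語者辨識之競爭語者探討) |
| Advisor: | Yau-Tarng Juang (莊堯棠) |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Academic year of graduation: | 96 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 67 |
| Chinese keywords: | Gaussian mixture model, support vector machine, minimum classification error (高斯混合模型、支撐向量機、最小錯誤鑑別式) |
| Foreign-language keywords: | Support Vector Machine, Minimum Classification Error, Gaussian mixture model |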
In this thesis, we retrain speaker models with the Minimum Classification Error (MCE) criterion. The main difficulty in MCE training of speaker models is deciding by what criterion the set of competitive speakers should be selected. To address it, we propose four selection methods: the ranking method, the threshold method, the score classification method, and the model classification method. The score and model classification methods both feed speaker parameters into a Support Vector Machine (SVM) for classification: the score classification method uses each speaker's maximum likelihood score as input, while the model classification method uses each speaker's model parameters. Relying on the SVM's classification ability, we aim to find a more suitable set of competitive speakers in the corpus and thereby raise the speaker identification rate. In experiments on the TIMIT corpus, the score classification method achieves a 42.27% error reduction relative to a conventional Gaussian mixture model (GMM) speaker identification system.
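The abstract names the ranking and threshold methods without giving their exact rules, so the following is only a minimal sketch of how such selections might look. It assumes each speaker model has already produced a log-likelihood score for the target speaker's training data. The function names, the `margin` parameter, and the example scores are hypothetical and do not come from the thesis.

```python
from typing import Dict, List


def select_by_rank(scores: Dict[str, float], target: str, n: int) -> List[str]:
    """Ranking-style selection (assumed rule): keep the n non-target
    speakers whose models score highest on the target's data."""
    rivals = [(spk, val) for spk, val in scores.items() if spk != target]
    rivals.sort(key=lambda item: item[1], reverse=True)
    return [spk for spk, _ in rivals[:n]]


def select_by_threshold(scores: Dict[str, float], target: str, margin: float) -> List[str]:
    """Threshold-style selection (assumed rule): keep every non-target
    speaker whose score is within `margin` of the target's own score."""
    floor = scores[target] - margin
    return sorted(spk for spk, val in scores.items() if spk != target and val >= floor)


# Hypothetical log-likelihood scores of spk_a's data under each speaker model.
scores = {"spk_a": -41.2, "spk_b": -43.0, "spk_c": -47.5, "spk_d": -42.1}

print(select_by_rank(scores, "spk_a", 2))
print(select_by_threshold(scores, "spk_a", 2.0))
```

A fixed rank always yields the same number of competitors per target, whereas a threshold lets that number vary with how confusable the target is. Either set would then supply the competing terms in MCE retraining.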