| Field | Value |
|---|---|
| Graduate student | Ying-Lin Chu (朱映霖) |
| Thesis title | 利用支撐向量機改善最小錯誤鑑別式之語者辨識方法 (Speaker Identification Based on an Improved Minimum Classification Error Method) |
| Advisor | Yau-Tarng Juang (莊堯棠) |
| Oral examination committee | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Academic year of graduation | 95 (ROC calendar, i.e. 2006-2007) |
| Language | Chinese |
| Number of pages | 74 |
| Keywords | Support Vector Machines, Speaker Identification, Minimum Classification Error |
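In speaker identification, effective training data are crucial: the speaker models are built from them, so they strongly affect identification performance. Conventional speaker models are trained under the maximum-likelihood (ML) criterion. They work well when training data are plentiful but poorly when training data are very scarce. Moreover, ML estimation trains each speaker's model only on that speaker's own training data, independently of the other speakers' data. Because this training ignores the relationships among the speaker models that matter at identification time, speakers are easily confused during identification. For this reason, discriminative acoustic-model training methods have been proposed in recent years; instead of maximizing the likelihood of the acoustic training data, they aim to minimize the classification error.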
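To make the contrast between the two criteria concrete, here is a minimal sketch. It assumes diagonal-covariance GMM speaker models, which are the usual choice for text-independent speaker identification; the abstract itself does not state the model type. It also assumes the commonly used MCE formulation. The score g_j(X) is the average per-frame log-likelihood of utterance X under speaker model j, and the ML system simply picks the speaker with the largest g_j(X). MCE instead forms, for the true speaker k, the misclassification measure d_k(X) = -g_k(X) + (1/η) log[(1/(M-1)) Σ_{j≠k} exp(η g_j(X))] and passes it through a sigmoid, so the loss approaches 1 when competitors outscore the true speaker. The function names and the values of η (`eta`) and γ (`gamma`) are illustrative assumptions, not the thesis's settings. The gradient-based model update that MCE training performs on this loss is not shown.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """log p(x | lambda) of one feature frame under a diagonal-covariance GMM."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        comps.append(ll)
    top = max(comps)  # log-sum-exp over mixture components for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))


def speaker_score(frames, gmm):
    """g_j(X): average per-frame log-likelihood of utterance X under model j."""
    return sum(gmm_log_likelihood(f, *gmm) for f in frames) / len(frames)


def identify(frames, models):
    """ML decision rule: return the best-scoring speaker and all scores."""
    scores = {spk: speaker_score(frames, gmm) for spk, gmm in models.items()}
    return max(scores, key=scores.get), scores


def mce_loss(scores, true_spk, eta=1.0, gamma=1.0):
    """Sigmoid-smoothed MCE loss of one utterance whose true speaker is true_spk.

    Returns (loss, d). d > 0 means the competitors' soft-max score beats the
    true speaker's score, i.e. the utterance is likely misclassified.
    """
    others = [eta * g for spk, g in scores.items() if spk != true_spk]
    top = max(others)
    anti = (top + math.log(sum(math.exp(o - top) for o in others) / len(others))) / eta
    d = -scores[true_spk] + anti
    z = gamma * d
    if z >= 0:  # numerically stable logistic function
        loss = 1.0 / (1.0 + math.exp(-z))
    else:
        loss = math.exp(z) / (1.0 + math.exp(z))
    return loss, d


# Toy example: two one-component, one-dimensional "speaker models" (weights, means, variances).
models = {"A": ([1.0], [[0.0]], [[1.0]]), "B": ([1.0], [[3.0]], [[1.0]])}
best, scores = identify([[0.1], [-0.2], [0.3]], models)
loss, d = mce_loss(scores, "A")
print(best, round(d, 3))  # A -4.3
```

The ML rule only looks at which score is largest. The MCE loss depends on the margin between the true speaker and its competitors, which is what discriminative training pushes apart.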
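In this thesis, we retrain the speaker models with minimum classification error (MCE) training and use support vector machines (SVMs) to improve MCE. Standard MCE is not robust to the choice of the number of competing speakers. To address this, we take the scores that the speaker models assign to the adaptation data, attach class labels to them, and use them to train an SVM. The competing speakers are then selected from the SVM's support vectors. This makes competitor selection more robust than in conventional MCE and also yields higher speaker identification accuracy.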
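The abstract does not specify how the scores are arranged into SVM training examples, so the following is a sketch of one possible reading, under explicit assumptions. For a target speaker, each adaptation utterance (from any speaker) becomes one example. Its feature vector holds the scores the speaker models assign to it, and it is labeled +1 if it belongs to the target speaker and -1 otherwise. Impostor utterances that end up as support vectors (nonzero dual weight) lie on or inside the margin, so their speakers are taken as the competitors. The linear SVM is trained by dual coordinate descent for the hinge loss (the method used by LIBLINEAR's dual solvers), with the bias folded in as a constant feature. It stands in for whatever SVM package and kernel the thesis actually used, and `train_linear_svm` and `select_competitors` are hypothetical names.

```python
def train_linear_svm(features, labels, c=10.0, epochs=500):
    """Hinge-loss linear SVM trained by dual coordinate descent.

    The bias is handled by appending a constant 1.0 feature, so it is regularized
    together with the weights. Returns (weights, alphas); weights[-1] is the bias.
    """
    xs = [list(x) + [1.0] for x in features]
    dim = len(xs[0])
    alpha = [0.0] * len(xs)
    w = [0.0] * dim
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, labels)):
            q = sum(v * v for v in x)  # diagonal entry of the dual Hessian
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q, 0.0), c)  # project onto [0, C]
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                for j in range(dim):
                    w[j] += delta * y * x[j]  # maintain w = sum_i alpha_i * y_i * x_i
    return w, alpha


def select_competitors(target, utterances, c=10.0, epochs=500, tol=1e-8):
    """Hypothetical competitor selection for speaker `target`.

    `utterances` is a list of (speaker_id, score_vector) pairs, where score_vector
    holds the scores the speaker models assign to one adaptation utterance.
    Returns the set of non-target speakers owning at least one support vector.
    """
    features = [vec for _, vec in utterances]
    labels = [1 if spk == target else -1 for spk, _ in utterances]
    _, alpha = train_linear_svm(features, labels, c, epochs)
    return {
        spk
        for (spk, _), a, y in zip(utterances, alpha, labels)
        if y < 0 and a > tol  # impostor example with a nonzero dual weight
    }


# Toy data: one score per utterance (the target model's score); "B" is confusable, "C" is not.
utts = [("A", [2.0]), ("A", [2.5]), ("B", [0.5]), ("B", [0.0]), ("C", [-5.0]), ("C", [-6.0])]
print(select_competitors("A", utts))  # {'B'}
```

Unlike a fixed top-N list, the size of the selected set here follows from how many impostor utterances actually sit near the decision boundary. The selected speakers would then form the competing-speaker term of the MCE misclassification measure for that target speaker.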