| Graduate student: | 李信廷 Shin-Ting Li |
|---|---|
| Thesis title: | 改善最小錯誤鑑別式之語者辨認方法 Improved Minimum Classification Error Method for Speaker Identification |
| Advisor: | 莊堯棠 Yau-Tarng Juang |
| Committee members: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 電機工程學系 Department of Electrical Engineering |
| Academic year of graduation: | 94 |
| Language: | Chinese |
| Number of pages: | 55 |
| Chinese keywords: | 最小錯誤鑑別式, 語者辨認 |
| Foreign-language keywords: | Speaker Identification, Minimum Classification Error |
In speaker identification, making effective use of the training data is very important, because it strongly affects the identification rate. To date, conventional speaker models are still trained under the maximum likelihood criterion. This works well when a large amount of training data is available, but not when training data is very scarce. Moreover, maximum likelihood estimation trains each speaker's model using only that speaker's training data, independently of the other speakers' data. Such training does not take into account the relationships among the models at identification time. After the parameters have been trained, the likelihood of a feature vector may be high under both the correct acoustic model and unrelated models at the same time, which causes confusion in identification. For this reason, discriminative acoustic model training methods have been proposed over the past decade or so; their goal is not to maximize the likelihood of the training data but to minimize the classification (or identification) error.
In this thesis, we retrain the speaker models with the minimum classification error (MCE) criterion and propose three methods that improve on conventional MCE training. In addition, we apply MCE to eigenvoice adaptation, because MCE is less affected by poorly approximated models than maximum likelihood is. We therefore propose a method that combines MCE with eigenvoice adaptation, which increases robustness when very little data is available and reduces the impact of poorly approximated models when the acoustic space is constructed.
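To make the contrast between maximum likelihood and MCE concrete, the following is a minimal sketch of the smoothed MCE loss in its common textbook form, not the improved variants proposed in the thesis (the abstract does not specify them). The function name `mce_loss`, the parameters `eta` and `gamma`, and the example scores are assumptions chosen for illustration; `scores` stands for per-speaker discriminant values such as average log-likelihoods from each speaker model.

```python
import math

def mce_loss(scores, true_idx, eta=1.0, gamma=1.0):
    """Smoothed MCE loss for one utterance (illustrative sketch).

    scores   : per-speaker discriminant values g_j, e.g. average log-likelihoods
    true_idx : index of the correct speaker
    eta      : larger values let the strongest competitor dominate
    gamma    : slope of the sigmoid that smooths the 0/1 error count
    """
    g_true = scores[true_idx]
    rivals = [g for j, g in enumerate(scores) if j != true_idx]
    # Soft maximum over the competing speakers:
    #   (1/eta) * log( mean_j exp(eta * g_j) ), computed stably by shifting by the max.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (g - m)) for g in rivals) / len(rivals)
    soft_max = m + math.log(mean_exp) / eta
    # Misclassification measure: positive when competitors outscore the true speaker.
    d = -g_true + soft_max
    # Sigmoid turns d into a differentiable approximation of the error indicator.
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Hypothetical log-likelihood scores for three enrolled speakers.
example_scores = [-10.0, -12.0, -15.0]
loss_if_speaker0 = mce_loss(example_scores, true_idx=0)  # true speaker scores highest
loss_if_speaker1 = mce_loss(example_scores, true_idx=1)  # a rival scores higher
```

Unlike a maximum likelihood objective, this loss depends on the scores of all speaker models at once, so lowering it pushes the correct model's score up relative to its competitors rather than in isolation.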