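| Graduate student: | 丁英智 Ing-Jr Ding |
|---|---|
| Thesis title: | 語者調適演算法及其應用於線上之研究 (A Study of Speaker Adaptation Algorithms and Their Online Application) |
| Advisor: | 莊堯棠 Yau-Tarng Juang |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 電機工程學系 College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Academic year of graduation: | 89 |
| Language: | Chinese |
| Number of pages: | 83 |
| Keywords (Chinese): | 隱藏式馬可夫模型 (hidden Markov model), 語者調適 (speaker adaptation) |
| Keywords (English): | HMM, speaker adaptation |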
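This thesis studies speaker adaptation algorithms in depth. The algorithms examined are Bayesian adaptation (maximum a posteriori, MAP), maximum likelihood linear regression (MLLR), modified maximum likelihood linear regression, estimation of transformation parameters by maximum likelihood, and estimation of transformation parameters by Bayesian theory. Experiments show that the last three algorithms, all of which are parameter-transformation methods, adapt well in unsupervised adaptation with a small amount of adaptation data. Bayesian adaptation is a fine-tuning method and therefore gives no significant improvement under unsupervised adaptation. MLLR without the modification can still degrade the speech model when the adaptation data is extremely scarce (only one or two sentences, or very little data). Although Bayesian adaptation is not well suited to unsupervised speaker adaptation, we find that in self-supervised adaptation, combined with a properly designed fuzzy controller, it gives more stable adaptation performance, and that with sufficient adaptation data it approaches perfect recognition.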
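The study first evaluates the performance of each adaptation algorithm offline and then tests speaker adaptation online. The online tests also include a simple method for verifying the adaptation data.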
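To illustrate why the abstract describes Bayesian (MAP) adaptation as a fine-tuning method, the following is a minimal sketch of the textbook MAP update for a single Gaussian mean. It is not the thesis's implementation: the function name `map_mean_update`, the prior weight `tau`, and the toy data are all assumptions made for illustration. With only a couple of frames the estimate stays close to the speaker-independent prior, and it moves toward the speaker's data only as the amount of data grows.

```python
def map_mean_update(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of one Gaussian mean (illustrative sketch).

    prior_mean: speaker-independent mean, a list of floats
    frames:     adaptation feature vectors, each a list of floats
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight; the default value is a hypothetical choice
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)          # soft count of frames for this Gaussian
    weighted_sum = [0.0] * dim
    for x, g in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += g * x[d]
    # Interpolate between the prior mean and the data, weighted by tau and occupancy.
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


# Toy data (hypothetical): the new speaker's frames all sit at [2.0, -2.0].
prior = [0.0, 0.0]
speaker_frames = [[2.0, -2.0]] * 100

few = map_mean_update(prior, speaker_frames[:2], [1.0] * 2)    # two frames
many = map_mean_update(prior, speaker_frames, [1.0] * 100)     # one hundred frames
```

In this sketch a Gaussian that receives no adaptation frames keeps its prior mean unchanged, which is one way to read the abstract's observation that MAP needs ample data, whereas transformation-based methods such as MLLR share one transform across many Gaussians.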