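| Graduate student: | 廖家慶 Chia-Ching Liau |
|---|---|
| Thesis title: | 語者調適之應用研究 The Research of Speaker Adaptation |
| Advisor: | 莊堯棠 Yau-Tarng Juang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Academic year of graduation: | 90 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 58 |
| Keywords (Chinese): | 語者調適 |
| Keywords (English): | speaker adaptation |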
Abstract
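In speech recognition, speaker-dependent systems offer high recognition accuracy, but applying them to a new speaker requires a large amount of training speech and time. Speaker-independent or multi-speaker systems need no new training speech for a new speaker beyond the data used to build the system, but their recognition accuracy is generally lower. A speaker-adaptive system uses the knowledge in a well-trained reference system and, after training on a small amount of speech from the new speaker, can approach the accuracy of a speaker-dependent system. This thesis therefore studies speaker-adaptive systems.
The thesis has two main research themes. The first is how to improve the adaptation algorithms when only a small amount of adaptation data is available, thereby raising recognition accuracy and adaptation performance. The second is applying the improved adaptation algorithms to online recognition and adaptation.
The first theme focuses on how to weight the contributions of the initial model and maximum likelihood linear regression (MLLR), finding the best balance between the two to improve adaptation performance. We then consider vector field smoothing (VFS), which adapts by means of a transfer vector field: models for which no adaptation data were observed are adjusted by reference to models that do have adaptation data, and we study the adaptation performance of this property combined with the weighted MLLR method. Next, speaker-dependent and speaker-independent models are used to construct an eigenvector space; the point representing the speaker is located in this space and used to adjust the system's model parameters. In the second theme, the developed algorithms, which can adapt the system with only a small amount of adaptation data, are applied to an online system so that the speaker can experience the changes in recognition and adaptation in real time.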
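The weighting between the initial model and MLLR described above can be illustrated with a small sketch. The abstract does not give the thesis's actual weighting formula, so this assumes a plain linear interpolation with a single hypothetical `weight` parameter, and it takes the MLLR transform (`A`, `b`) as already estimated rather than showing how it is computed.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mllr_transform_mean(A: Matrix, b: Vector, mu: Vector) -> Vector:
    """Apply an affine MLLR mean transform: mu_hat = A * mu + b."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]


def interpolate_mean(mu_init: Vector, mu_mllr: Vector, weight: float) -> Vector:
    """Blend the initial mean with the MLLR-adapted mean.

    weight is the share given to MLLR (assumed to lie in [0, 1]);
    weight = 0 keeps the initial model, weight = 1 is plain MLLR.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [(1.0 - weight) * m0 + weight * m1
            for m0, m1 in zip(mu_init, mu_mllr)]


# Toy values (illustrative only, not from the thesis).
A = [[1.0, 0.0], [0.0, 1.0]]   # identity rotation/scaling
b = [2.0, -2.0]                # bias term of the transform
mu_init = [1.0, 3.0]           # Gaussian mean in the initial model

mu_mllr = mllr_transform_mean(A, b, mu_init)          # [3.0, 1.0]
mu_adapted = interpolate_mean(mu_init, mu_mllr, 0.5)  # [2.0, 2.0]
print(mu_mllr, mu_adapted)
```

With very little adaptation data the estimated transform is unreliable, so a smaller `weight` keeps the adapted mean closer to the initial model; how the best balance is chosen is the subject of the thesis and is not reproduced here.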
Speaker adaptation is applied in speech recognition to obtain a speaker-dependent system with good performance. Most adaptation techniques take the initial model as a starting point and then introduce speaker-specific information. Using the adapted parameters can significantly improve recognition performance.
In this thesis, we present a variation of maximum likelihood linear regression (MLLR) that improves its performance when little adaptation data is available: the transformed Gaussian means are interpolated with the means of the initial model. The vector field smoothing (VFS) algorithm proceeds in the following steps. First, the transfer vectors are estimated. Then, interpolation and smoothing are performed using the transfer vectors. We also apply the idea of eigenvoices, a set of orthogonal basis vectors derived from the parameters of speaker-dependent models trained on reference speakers.
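The VFS steps listed above (estimate transfer vectors, then interpolate and smooth) can be sketched as follows. This is a simplified illustration, not the thesis's implementation: the neighbour count `k`, the exponential distance kernel, and the `fuzziness` parameter are assumptions, and the adapted means are taken as given rather than estimated from speech.

```python
import math
from typing import Dict, List

Vector = List[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_vectors(initial: Dict[str, Vector],
                     adapted: Dict[str, Vector]) -> Dict[str, Vector]:
    """Step 1: transfer vector = adapted mean - initial mean (observed means only)."""
    return {name: [a - i for a, i in zip(adapted[name], initial[name])]
            for name in adapted}


def vfs_adapt(initial: Dict[str, Vector], adapted: Dict[str, Vector],
              k: int = 2, fuzziness: float = 1.0) -> Dict[str, Vector]:
    """Steps 2-3: interpolate vectors for unobserved means, smooth observed ones.

    Every mean is shifted by a distance-weighted average of the transfer
    vectors of its k nearest observed neighbours in the initial model space.
    """
    vectors = transfer_vectors(initial, adapted)
    result: Dict[str, Vector] = {}
    for name, mu in initial.items():
        nearest = sorted(vectors, key=lambda n: _dist(mu, initial[n]))[:k]
        weights = [math.exp(-_dist(mu, initial[n]) / fuzziness) for n in nearest]
        total = sum(weights)
        shift = [sum(w * vectors[n][d] for w, n in zip(weights, nearest)) / total
                 for d in range(len(mu))]
        result[name] = [m + s for m, s in zip(mu, shift)]
    return result


# Toy values (illustrative only): "c" has no adaptation data.
initial = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [1.0, 0.0]}
adapted = {"a": [1.0, 0.0], "b": [3.0, 0.0]}

print(vfs_adapt(initial, adapted))
```

In this toy case both observed means shift by the same vector, so the unobserved mean `"c"` is moved by that shift as well, which is the behaviour the interpolation step is meant to provide.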