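Graduate student: Ing-Jr Ding (丁英智)
Thesis title: A Study of Speaker Adaptation Algorithms and Their Online Application (語者調適演算法及其應用於線上之研究)
Advisor: Yau-Tarng Juang (莊堯棠)
Oral examination committee: (not listed)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Academic year of graduation: 89 (ROC calendar; 2000-2001)
Language: Chinese
Number of pages: 83
Chinese keywords: 隱藏式馬可夫模型 (hidden Markov model), 語者調適 (speaker adaptation)
English keywords: HMM, speaker adaptation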

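  • This thesis studies speaker adaptation algorithms in depth. The algorithms examined are Bayesian adaptation (maximum a posteriori, MAP), Maximum Likelihood Linear Regression (MLLR), Modified Maximum Likelihood Linear Regression, estimation of transformation parameters under the maximum likelihood criterion, and estimation of transformation parameters under the Bayesian criterion. Experiments show that the last three, all of which are parameter-transformation methods, adapt well in unsupervised adaptation with a small amount of data. Bayesian adaptation, being a fine-tuning method, shows no significant improvement under unsupervised adaptation. In addition, MLLR without the modification can still degrade the speech models when the adaptation data are extremely scarce (only one or two utterances, or very little data). Furthermore, although Bayesian adaptation is not suited to unsupervised speaker adaptation, in self-supervised adaptation combined with a suitably designed fuzzy controller it gives more stable adaptation performance, and when the adaptation data are sufficient its recognition accuracy approaches perfect.
    The work first evaluates the performance of each adaptation algorithm offline and then tests speaker adaptation online. The online tests also include a simple method for verifying the adaptation data.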
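    The contrast drawn above between fine-tuning adaptation (MAP) and parameter-transformation adaptation (MLLR) can be illustrated with a minimal sketch. It uses the textbook forms of the two mean updates, not necessarily the exact formulations in the thesis: the MAP mean is an interpolation between the prior mean and the observed data, and the MLLR mean is an affine transform `A * mean + b` shared by many Gaussians. The function names, the prior weight `tau`, and the toy numbers are all assumptions made for illustration.

```python
# Illustrative sketch only: textbook MAP and MLLR mean updates in pure Python.
# Function names, `tau`, and the toy values are hypothetical, not from the thesis.

def map_update_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP re-estimate of one Gaussian mean.

    prior_mean  : speaker-independent mean (list of floats)
    frames      : adaptation feature vectors aligned to this Gaussian
    occupancies : occupation probability of this Gaussian for each frame
    tau         : prior weight; a larger tau trusts the prior mean more
    """
    total = sum(occupancies)
    dim = len(prior_mean)
    weighted = [
        sum(g * x[d] for g, x in zip(occupancies, frames)) for d in range(dim)
    ]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + total) for d in range(dim)]


def mllr_transform_mean(mean, A, b):
    """Apply a shared affine MLLR transform: new_mean = A * mean + b."""
    return [
        sum(A[i][j] * mean[j] for j in range(len(mean))) + b[i]
        for i in range(len(b))
    ]


if __name__ == "__main__":
    seen_mean = [0.0, 0.0]    # Gaussian that received adaptation frames
    unseen_mean = [4.0, 4.0]  # Gaussian with no adaptation frames

    # MAP: only the Gaussian with data moves; the unseen one keeps its prior.
    print(map_update_mean(seen_mean, [[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0], tau=2.0))
    print(map_update_mean(unseen_mean, [], [], tau=2.0))

    # MLLR: one transform (here identity plus a bias) moves every tied Gaussian.
    A = [[1.0, 0.0], [0.0, 1.0]]
    b = [1.0, 1.0]
    print(mllr_transform_mean(seen_mean, A, b))
    print(mllr_transform_mean(unseen_mean, A, b))
```

    In this toy setting MAP leaves the unseen Gaussian untouched, while the shared transform shifts both means. This is consistent with the observation above that transformation-based methods cope better with very little adaptation data, although the sketch does not estimate `A` and `b` from data.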


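    Table of Contents
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Background
      1.2 Research motivation
      1.3 Research direction and objectives
      1.4 Thesis outline
    Chapter 2 Speaker Adaptation Techniques
      2.1 Bayesian adaptation (MAP)
      2.2 Bayesian adaptation with a fuzzy controller
        2.2.1 Overview of fuzzy theory
        2.2.2 Modified Bayesian adaptation with a fuzzy controller
      2.3 Maximum Likelihood Linear Regression (MLLR)
        2.3.1 MLLR theory
        2.3.2 Estimation of the MLLR transformation matrices for Gaussian distributions
        2.3.3 Derivation of MLLR diagonalization
      2.4 Vector Field Smoothing (VFS)
      2.5 Modified MLLR adaptation with weighting
      2.6 Combined use of the algorithms
      2.7 Estimating transformation parameters with the maximum likelihood criterion
      2.8 Estimating transformation parameters with the maximum a posteriori criterion
      2.9 Verification techniques for adaptation data
    Chapter 3 System Architecture
      3.1 Experimental environment
        3.1.1 Experimental equipment
        3.1.2 System settings
        3.1.3 Training, adaptation, and test data
      3.2 Initial models (right-context-dependent sub-syllable models)
      3.3 Composition and arrangement of the recognition modules
      3.4 Adaptation experiment framework
        3.4.1 Initial models for the adaptation experiments
        3.4.2 Supervised batch adaptation framework (SB)
        3.4.3 Unsupervised incremental adaptation framework
        3.4.4 Online unsupervised incremental adaptation framework
    Chapter 4 Implementation and Results
      4.1 Speaker-independent experimental results
      4.2 MAP self-adaptation experiments with a fuzzy controller
      4.3 Weighted MLLR adaptation experiments
      4.4 Weighted MLLR+VFS adaptation experiments
      4.5 Adaptation experiments estimating transformation parameters with the ML criterion
      4.6 Adaptation experiments estimating transformation parameters with the MAP criterion
      4.7 Summary of experimental results
        4.7.1 Adaptation performance with very little adaptation data (a single utterance)
        4.7.2(a) Adaptation performance in incremental adaptation experiments, part 1
        4.7.2(b) Adaptation performance in incremental adaptation experiments, part 2
        4.7.3 Adaptation experiments in which the adaptation and test data have the same content
        4.7.4 Rank changes of misrecognized data in adaptation experiments
        4.7.5 Comparison of adaptation time
        4.7.6 Experimental results on adaptation-data verification
        4.7.7 Summary
      4.8 Interfaces for online recognition and adaptation
        4.8.1 UI_1 online adaptation interface
        4.8.2 UI_2 online adaptation interface
    Chapter 5 Conclusions and Future Directions
      5.1 Conclusions
      5.2 Future research directions
    References
    Appendix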

