| Graduate student: | 陳俊傑 Chun-Chieh Chen |
|---|---|
| Thesis title: | 結構化語者模型之研究 The study of structural speaker model |
| Advisor: | 莊堯棠 Yau-Tarng Juang |
| Oral examination committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 電機工程學系 Department of Electrical Engineering |
| Graduation academic year: | 92 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 83 |
| Chinese keywords: | 語者調適, 語者確認, 語者識別, 語者辨識 |
| English keywords: | speaker recognition, speaker verification, speaker identification, speaker adaptation |
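In this thesis, we propose a text-independent speaker adaptation system based on tree-structured Gaussian densities. A well-trained universal background model (UBM) is first organized into a tree, which yields a structural background model that describes the acoustic space at several resolutions. Speaker-specific models adapted from it with structural speaker adaptation therefore also carry voiceprint characteristics at multiple resolutions. Applying the tree structure to both the speaker adaptation technique and the speaker model gives good speaker recognition accuracy.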
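As an illustration of how a coarser tree node can summarize finer Gaussians, the sketch below merges a group of diagonal-covariance components into one parent Gaussian by moment matching. This is an assumption made for illustration: the abstract does not say how node densities are computed, and the function name `merge_gaussians` is hypothetical.

```python
def merge_gaussians(components):
    """Merge diagonal-covariance Gaussians into one parent Gaussian.

    components: list of (weight, mean, var) tuples, where mean and var are
    equal-length lists. Moment matching is an assumed choice here, not
    necessarily the node definition used in the thesis.
    """
    total = sum(w for w, _, _ in components)
    dim = len(components[0][1])
    # Parent mean: weight-averaged child means.
    mean = [sum(w * m[d] for w, m, _ in components) / total for d in range(dim)]
    # Parent variance: weight-averaged second moment minus squared parent mean.
    var = [
        sum(w * (v[d] + m[d] ** 2) for w, m, v in components) / total - mean[d] ** 2
        for d in range(dim)
    ]
    return total, mean, var
```

Applying the merge recursively, from the leaf Gaussians of the background model up to a single root, would give one possible multi-resolution tree.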
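We also compare the effect of the tree structure across different speaker adaptation methods. When little adaptation data is available, Gaussian components that receive no adaptation data degrade recognition performance. For these unadapted Gaussians, this thesis proposes a vector field smoothing algorithm that incorporates the tree structure. It addresses the shortcomings of conventional vector field smoothing and further improves the recognition performance of the system.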
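For orientation, the interpolation step of conventional vector field smoothing can be sketched as follows: each unadapted mean is shifted by a distance-weighted average of the transfer vectors (adapted mean minus original mean) of its nearest adapted neighbours. This is a minimal sketch of the conventional method only, not the structural variant proposed in the thesis. The function name, the exponential weighting, and the parameters `k` and `fuzziness` are illustrative assumptions.

```python
import math


def smooth_unadapted_means(means, adapted, k=2, fuzziness=1.0):
    """Estimate means for Gaussians that received no adaptation data.

    means:   list of original (speaker-independent) mean vectors
    adapted: dict {component index: adapted mean vector}
    k, fuzziness: assumed neighbourhood size and weighting scale
    """
    # Transfer vector of each adapted component: adapted mean - original mean.
    transfer = {
        i: [a - m for a, m in zip(adapted[i], means[i])] for i in adapted
    }
    if not transfer:
        # Nothing was adapted, so there is nothing to interpolate from.
        return [list(m) for m in means]

    result = []
    for j, mean in enumerate(means):
        if j in adapted:
            result.append(list(adapted[j]))
            continue
        # k nearest adapted components, measured in the original space.
        nearest = sorted(transfer, key=lambda i: math.dist(mean, means[i]))[:k]
        weights = [
            math.exp(-math.dist(mean, means[i]) / fuzziness) for i in nearest
        ]
        total = sum(weights)
        # Weighted average of the neighbours' transfer vectors.
        shift = [
            sum(w * transfer[i][d] for w, i in zip(weights, nearest)) / total
            for d in range(len(mean))
        ]
        result.append([m + s for m, s in zip(mean, shift)])
    return result
```

With very little data the nearest adapted neighbours may be far away in the acoustic space, which is one plausible reading of the shortcoming the structural variant is meant to address.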
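For speaker verification, every resolution level of the tree is effective to some degree. This thesis therefore also tries combining the scores from multiple resolution levels, so that the strengths of the different levels complement one another and the equal error rate (EER) of the speaker verification system is reduced.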
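One simple way to combine per-level scores and measure the resulting equal error rate is sketched below. The weighted-sum fusion is an assumption: the abstract does not state which combination rule the thesis uses. Both function names are hypothetical.

```python
def fuse_scores(level_scores, weights=None):
    """Combine one verification score per tree level into a single score.

    A weighted sum is an assumed fusion rule; equal weights by default.
    """
    if weights is None:
        weights = [1.0 / len(level_scores)] * len(level_scores)
    if len(weights) != len(level_scores):
        raise ValueError("need exactly one weight per level score")
    return sum(w * s for w, s in zip(weights, level_scores))


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over observed scores.

    genuine:  fused scores of true-speaker trials
    impostor: fused scores of impostor trials
    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)      # false rejections
        far = sum(s >= t for s in impostor) / len(impostor)   # false acceptances
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In practice the fusion weights would be tuned on held-out trials, and the EER of the fused scores compared with the EER of each level on its own.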