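| Graduate student: | 郭又禎 Yo-zhen Kuo |
|---|---|
| Thesis title: | 改良式梅爾倒頻譜參數應用於關鍵字萃取 Improved Mel-scale Frequency Cepstral Coefficients for Keyword Spotting Technique |
| Advisor: | 莊堯棠 Yau-Tarng Juang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | Chinese |
| Number of pages: | 65 |
| Chinese keywords: | 梅爾倒頻譜系數 (Mel-frequency cepstral coefficients), 粒子群演算法 (particle swarm optimization), 關鍵詞萃取 (keyword spotting) |
| Foreign-language keywords: | MFCC, PSO, keyword spotting |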
In speech recognition systems, Mel-frequency cepstral coefficients (MFCCs) are widely used as feature parameters, and their widespread use in audio signal processing has prompted many proposals for improving them. This thesis adjusts the weights applied to the energies of the triangular band-pass filter bank, using particle swarm optimization (PSO) to search for the optimal filter-bank weights. The fitness function is the difference between the energy statistics curve of the speech training corpus and the envelope curve of the filter bank, so that the filter bank better matches the sensitivity of human hearing and recognition improves. Experimental results show that the improved MFCCs achieve better recognition performance than conventional MFCCs and are also more robust to high-frequency noise.
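The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names (`triangular_bank`, `envelope`, `fitness`, `pso`), the sum-of-squared-differences form of the fitness, the upper-envelope definition, the basic inertia-weight PSO update, all parameter values, and the exponential `target` curve are assumptions chosen for illustration. In particular, `target` is a placeholder standing in for a corpus energy statistics curve, not real data.

```python
import math
import random

def mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_bank(n_filters, n_bins, fs):
    """Unit-height triangular filters with edges equally spaced on the mel scale."""
    top = mel(fs / 2.0)
    edges = [inv_mel(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    freqs = [k * (fs / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def envelope(bank, weights):
    """Assumed envelope: maximum over the weighted filters at each frequency bin."""
    n_bins = len(bank[0])
    return [max(w * row[k] for w, row in zip(weights, bank)) for k in range(n_bins)]

def fitness(weights, bank, target):
    """Assumed fitness: sum of squared differences between envelope and target curve."""
    return sum((e - t) ** 2 for e, t in zip(envelope(bank, weights), target))

def pso(bank, target, n_particles=15, n_iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic inertia-weight PSO over filter weights constrained to [0, 1]."""
    rng = random.Random(seed)
    dim = len(bank)
    pos = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [fitness(p, bank, target) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = []
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each weight to the allowed range after the move.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            cost = fitness(pos[i], bank, target)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
        history.append(gbest_cost)  # best cost found so far, per iteration
    return gbest, history

# Usage with a placeholder target curve (NOT measured corpus statistics).
bank = triangular_bank(n_filters=12, n_bins=48, fs=16000.0)
target = [math.exp(-k / 16.0) for k in range(48)]
best_weights, history = pso(bank, target)
```

In a real system the optimized weights would scale the filter-bank energies before the logarithm and discrete cosine transform of the MFCC pipeline; how the thesis defines the two curves and the exact difference measure is not stated in the abstract.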