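Graduate student: 郭又禎 (Yo-zhen Kuo)
Thesis title: Improved Mel-scale Frequency Cepstral Coefficients for Keyword Spotting Technique (改良式梅爾倒頻譜參數應用於關鍵字萃取)
Advisor: 莊堯棠 (Yau-Tarng Juang)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar, i.e. 2013-2014)
Language: Chinese
Number of pages: 65
Keywords: Mel-frequency cepstral coefficients (MFCC), particle swarm optimization (PSO), keyword spotting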
  • Mel-frequency cepstral coefficients (MFCCs) are among the most widely used feature parameters in speech recognition, and their wide use has prompted many proposals for improving them. This thesis adjusts the weights applied to the energies of the triangular bandpass filter bank, using particle swarm optimization (PSO) to search for the best set of filter weights. The fitness function is the difference between the energy statistics curve of the training corpus and the envelope of the filter bank, so that the filter bank better matches the sensitivity of the human ear and recognition improves. Experimental results show that the improved MFCCs achieve better recognition performance than conventional MFCCs and are also more robust to high-frequency noise.
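
The weight search described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the exact fitness formula, so the code below assumes a sum of squared differences, takes the filter bank envelope to be the pointwise maximum of the weighted triangular filters, and uses a synthetic decaying curve in place of the corpus energy statistics. All function names and parameter values (`triangular_filters`, `envelope`, `fitness`, `pso`, the inertia and acceleration constants) are hypothetical choices for illustration.

```python
import math
import random

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filters(n_filters, n_bins, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    freqs = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in freqs:
            if lo < f <= mid:            # rising edge, peak of 1 at the center
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:           # falling edge
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def envelope(bank, weights):
    """Assumed envelope: pointwise maximum over the weighted filters."""
    return [max(w * row[k] for w, row in zip(weights, bank))
            for k in range(len(bank[0]))]

def fitness(weights, bank, target):
    """Assumed fitness: squared difference to the target curve (lower is better)."""
    return sum((e - t) ** 2 for e, t in zip(envelope(bank, weights), target))

def pso(bank, target, n_particles=20, n_iter=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Inertia-weight PSO over one weight per filter, each clamped to [0, 1]."""
    rng = random.Random(seed)
    dim = len(bank)
    pos = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [fitness(p, bank, target) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            cost = fitness(pos[i], bank, target)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost

# Toy setup: 8 filters, 65 frequency bins, 8 kHz sampling (all assumed values).
bank = triangular_filters(n_filters=8, n_bins=65, sample_rate=8000)
# Synthetic stand-in for the corpus energy curve: decays toward high frequencies.
target = [math.exp(-3.0 * k / 64) for k in range(65)]
uniform_cost = fitness([1.0] * 8, bank, target)  # conventional unit-height filters
weights, cost = pso(bank, target)
```

In this toy setup the optimized weights shrink the high-frequency filters toward the decaying target curve, which is one plausible reading of why a weighted filter bank would be less sensitive to high-frequency noise. With real data, `target` would be replaced by the measured energy statistics of the training corpus.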

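    Table of contents

    Chinese Abstract
    English Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    List of Appendices
    Chapter 1 Introduction
      1-1 Research Motivation
      1-2 Literature Review
      1-3 Thesis Organization
    Chapter 2 Speech Recognition
      2-1 Preprocessing
      2-2 Feature Extraction
      2-3 Hidden Markov Models
      2-4 Acoustic Models and Model Training
        2-4-1 Acoustic Models
        2-4-2 Model Training and Parameter Re-estimation
      2-5 Keyword Spotting
        2-5-1 Keyword Spotting Architecture
        2-5-2 Keyword Recognition Procedure
        2-5-3 One-Stage Dynamic Programming Algorithm
    Chapter 3 Particle Swarm Optimization Applied to the Filter Bank
      3-1 Particle Swarm Optimization
      3-2 Mel Filter Bank Weights
        3-2-1 Masking Effect
        3-2-2 Variable Settings and Fitness Function
        3-2-3 Adjusting the Mel Filter Bank Weights
    Chapter 4 Experimental Results
      4-1 Experimental Environment
      4-1 Effect of the Number of Mixtures on the Recognition Rate
      4-2 Mel Filter Weight Adjustment Experiments
        4-2-1 Adjusting the Weights of the Triangular Bandpass Filters
        4-2-2 Adjusting Both the Center Frequencies and the Weights of the Triangular Bandpass Filters
      4-3 Noisy Environment Experiments
    Chapter 5 Conclusions and Future Work
      5-1 Conclusions
      5-2 Future Work
    References
    Appendix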

