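Graduate student: Chih-Chieh Kao (高志杰)
Thesis title: 粒子群演算法應用於梅爾濾波器組之研究 (A Study of Applying Particle Swarm Optimization to the Mel Filterbank)
English title: PSO Algorithm for Mel-Filterbank
Advisor: Y.-T. Juang (莊堯棠)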
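Degree: Master's
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2013
Graduation academic year: 101 (ROC calendar, i.e., 2012-2013)
Language: Chinese
Number of pages: 61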
Keywords: Mel filterbank, particle swarm optimization (PSO), Mel-frequency cepstral coefficients (MFCC), keyword spotting
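  • This thesis studies the Mel filterbank used in Mel-frequency cepstral coefficient (MFCC) feature extraction. Particle swarm optimization (PSO) is used to optimize the center and edge frequencies of the filterbank. Unlike the common practice of using recognition rate as the fitness function, the optimization uses the similarity between a statistical curve and the filterbank envelope as the fitness function. Following the characteristics of speech in the energy spectrum, an energy statistics chart and an energy-difference statistics chart serve as the basis for two sets of optimized results, which are then tested on keyword recognition and under three common noise conditions. The experimental results show that the method improves feature extraction, raises the recognition rate of the keyword spotting system, and provides noise robustness in certain environments.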
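For readers unfamiliar with the filterbank stage, here is a minimal Python sketch of conventional MFCC computation from one frame's power spectrum. It is not the thesis's code. Each triangular filter is written as a (left edge, center, right edge) triple in Hz, which are the kinds of parameters the PSO search adjusts. The mel formula 2595 log10(1 + f/700), unit-height triangles, the log floor, and the unnormalized DCT-II are assumptions, and all function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Mel scale in the common 2595 * log10(1 + f / 700) form (assumed)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_min, f_max):
    """Conventional layout: (left, center, right) triples equally spaced on the
    mel scale, where each filter's edges are its neighbours' centers."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    pts = [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
    return [(pts[i], pts[i + 1], pts[i + 2]) for i in range(n_filters)]


def triangle(f, left, center, right):
    """Gain of a unit-height triangular filter at frequency f."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)


def mfcc_from_power_spectrum(power, sample_rate, edges, n_ceps):
    """MFCCs of one frame; `power` holds the n_fft // 2 + 1 one-sided FFT power bins."""
    n_fft = 2 * (len(power) - 1)
    freqs = [k * sample_rate / n_fft for k in range(len(power))]
    # Log energy collected by each triangular filter (floored to avoid log(0))
    log_e = []
    for left, center, right in edges:
        e = sum(p * triangle(f, left, center, right) for p, f in zip(power, freqs))
        log_e.append(math.log(max(e, 1e-10)))
    # Unnormalized DCT-II of the log energies gives the cepstral coefficients
    m = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * n * (j + 0.5) / m) for j in range(m))
            for n in range(n_ceps)]
```

For example, `mfcc_from_power_spectrum(power, 8000, mel_filter_edges(20, 0.0, 4000.0), 13)` gives 13 coefficients for a 512-point FFT frame at 8 kHz. Swapping `mel_filter_edges(...)` for any other list of triples, such as one produced by an optimizer, leaves the rest of the pipeline unchanged, which is what makes the edge and center frequencies a convenient search space.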


    This thesis studies the filterbank used to compute mel-frequency cepstral coefficients (MFCC) for feature extraction. We propose using particle swarm optimization (PSO) to optimize the parameters of the MFCC filterbank, namely the center and edge frequencies of its filters. Instead of recognition rate, the PSO fitness function measures the similarity between a statistical curve and the filterbank's envelope. Using energy and energy-difference statistical charts, which reflect the characteristics of speech in the energy spectrum, PSO yields two optimized filterbanks. These are tested on keyword recognition and in three noisy environments. The experiments show that the proposed method improves the recognition rate of the keyword spotting system and its robustness in the tested noisy environments.
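The optimization loop can be sketched as a global-best PSO with a linearly decreasing inertia weight. This record does not define the statistical curve, the envelope, or the similarity measure precisely, so the sketch rests on stated assumptions: each particle holds the filters' center frequencies; each filter's edges are its neighbours' centers (the thesis may tune edges independently); the envelope is the upper envelope of equal-area triangles, so it rises where filters are packed densely; and similarity is cosine similarity. `target` stands in for an energy or energy-difference statistics curve sampled at `freqs`. All names and defaults (w from 0.9 to 0.4, c1 = c2 = 2, velocity clamp at 10% of the band) are hypothetical.

```python
import math
import random


def triangle(f, left, center, right):
    """Gain of a unit-height triangular filter at frequency f."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)


def envelope(centers, f_min, f_max, freqs):
    """Assumed envelope: upper envelope of equal-area triangles whose edges are the
    neighbouring centers, so the curve rises where filters are packed densely."""
    pts = [f_min] + sorted(centers) + [f_max]
    env = []
    for f in freqs:
        best = 0.0
        for i in range(1, len(pts) - 1):
            width = pts[i + 1] - pts[i - 1]
            if width > 0.0:
                gain = 2.0 / width * triangle(f, pts[i - 1], pts[i], pts[i + 1])
                best = max(best, gain)
        env.append(best)
    return env


def cosine_similarity(a, b):
    """Assumed similarity measure between the statistics curve and the envelope."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def pso_filterbank(target, freqs, n_filters, f_min, f_max, n_particles=20,
                   iters=50, w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Global-best PSO over filter center frequencies; returns (centers, fitness)."""
    rng = random.Random(seed)
    v_max = 0.1 * (f_max - f_min)  # velocity clamp (hypothetical choice)

    def fitness(x):
        return cosine_similarity(target, envelope(x, f_min, f_max, freqs))

    pos = [[rng.uniform(f_min, f_max) for _ in range(n_filters)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_filters for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for t in range(iters):
        # Inertia weight decreases linearly from w_max to w_min over the run
        w = w_max - (w_max - w_min) * t / max(iters - 1, 1)
        for i in range(n_particles):
            for d in range(n_filters):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(f_min, min(f_max, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return sorted(gbest), gbest_fit


if __name__ == "__main__":
    freqs = [k * 8000.0 / 256 for k in range(129)]   # 256-point FFT bins at 8 kHz
    target = [math.exp(-f / 1000.0) for f in freqs]  # made-up stand-in statistics curve
    centers, fit = pso_filterbank(target, freqs, 12, 0.0, 4000.0)
    print([round(c) for c in centers], round(fit, 3))
```

A curve-matching fitness like this needs only the filter layout and a precomputed statistics curve, so each particle is cheap to score; a recognition-rate fitness would instead require running the recognizer for every particle evaluation.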

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    List of Appendices
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Literature Review
      1.3 Thesis Organization
    Chapter 2 Background
      2.1 Feature Extraction
        2.1.1 MFCC
        2.1.2 LPCC
      2.2 Feature Compensation
        2.2.1 Cepstral Mean Subtraction (CMS)
        2.2.2 Cepstral Mean and Variance Normalization (CMVN)
      2.3 Hidden Markov Models
      2.4 Acoustic Models
    Chapter 3 Applying Particle Swarm Optimization to the Filterbank
      3.1 Particle Swarm Optimization
        3.1.1 The PSO Model
        3.1.2 Inertia Weight
      3.2 Optimizing the Filterbank with PSO
        3.2.1 Variable Settings
        3.2.2 Fitness Function
    Chapter 4 Experimental Results
      4.1 Keyword Spotting
        4.1.1 Keyword Spotting Architecture
        4.1.2 Recognition Procedure
      4.2 Experimental Setup
      4.3 Channel Effect Experiments
      4.4 PSO Filterbank Optimization Experiments
      4.5 Noisy Environment Experiments
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

