Graduate student: 許時懷
Shih-huai Hsu
Thesis title: 語音特徵參數擷取之濾波器改良
Improved Filter-bank of Speech Feature Coefficient Extraction
Advisor: 莊堯棠
Yau-tarng Juang
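Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Pages: 64
Chinese keywords: 梅爾濾波器組, 語音特徵, 關鍵詞萃取
English keywords: mel-filterbank, speech feature, keyword spotting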
  •   This thesis aims to improve the feature extraction stage of a speech keyword recognition system. Within the overall keyword recognition framework, feature extraction serves to bring out the distinguishing characteristics of each speech segment while also reducing the amount of data. Many researchers have proposed different methods for extracting speech features, or improvements to existing extraction methods.
      This thesis compares several modified filter banks for mel-frequency cepstral coefficients (MFCC) and replaces the original mel triangular filter bank with the best-performing one. Experimental results show that the modified filter bank improves the recognition rate of the keyword spotting system, indicating that it effectively strengthens the extracted speech features.
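      As a rough illustration of what replacing the triangular filter bank can look like, the sketch below builds mel-spaced filter banks with four shapes (triangle, rectangle, trapezoid, Gaussian) and computes log filter-bank energies, the quantity that feeds the cepstral transform in MFCC. The shape names follow the chapter titles of the thesis, but this record does not give the thesis's exact definitions, so every parameter here is an assumption: the 20 filters, 512-point FFT, 8000 Hz sample rate, the flat top of the trapezoid, and the Gaussian width are all hypothetical choices, as are the function names.

```python
import math

SHAPES = ("triangle", "rectangle", "trapezoid", "gaussian")


def hz_to_mel(f_hz):
    """Convert Hz to mels using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_edges(n_filters, n_fft, sample_rate):
    """Return n_filters + 2 fractional FFT-bin positions equally spaced in mels."""
    top = hz_to_mel(sample_rate / 2.0)
    return [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
            for i in range(n_filters + 2)]


def filter_weight(shape, k, left, center, right):
    """Weight of FFT bin k for one filter spanning (left, right), peaking at center."""
    if shape not in SHAPES:
        raise ValueError("unknown filter shape: %r" % (shape,))
    if shape == "gaussian":
        # Assumed width: the band edges sit two standard deviations from the peak.
        sigma = (right - left) / 4.0
        return math.exp(-0.5 * ((k - center) / sigma) ** 2)
    if k <= left or k >= right:
        return 0.0
    if shape == "rectangle":
        return 1.0
    # Linear ramp: 0 at the band edges, 1 at the center.
    if k <= center:
        ramp = (k - left) / (center - left)
    else:
        ramp = (right - k) / (right - center)
    if shape == "triangle":
        return ramp
    # Trapezoid (assumed form): a ramp twice as steep, clipped to a flat top.
    return min(1.0, 2.0 * ramp)


def build_filter_bank(shape, n_filters=20, n_fft=512, sample_rate=8000):
    """Return n_filters rows of n_fft // 2 + 1 weights each."""
    edges = mel_edges(n_filters, n_fft, sample_rate)
    n_bins = n_fft // 2 + 1
    return [[filter_weight(shape, k, edges[m], edges[m + 1], edges[m + 2])
             for k in range(n_bins)]
            for m in range(n_filters)]


def log_filter_bank_energies(power_spectrum, bank, floor=1e-10):
    """Log of the weighted power in each band; floor avoids log(0)."""
    return [math.log(max(floor, sum(w * p for w, p in zip(row, power_spectrum))))
            for row in bank]
```

      Swapping the shape string in `build_filter_bank` is the only change needed to compare filter banks on the same power spectrum; the rest of an MFCC pipeline (framing, windowing, FFT, discrete cosine transform) is outside this sketch.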

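    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Motivation
        1.2 Literature Review
        1.3 Chapter Overview
    Chapter 2 Speech Processing
        2.1 Speech Feature Extraction
        2.2 Feature Compensation
        2.3 Hidden Markov Models
        2.4 Acoustic Models
        2.5 Model Training
    Chapter 3 Multiple Mel Filter Banks
        3.1 Masking Effect
        3.2 Conventional MFCC Triangular Filter Bank
        3.3 Alternative Mel Filter Banks
            3.3.1 Rectangular Filter Bank (Rectangle filter)
            3.3.2 Trapezoidal Filter Bank (Trapezoid filter)
            3.3.3 Gaussian Filter Bank (Gaussian filter)
    Chapter 4 Keyword Spotting
        4.1 Keyword Spotting System Architecture
        4.2 One-Stage Dynamic Programming System
        4.3 Keyword Recognition Procedure
    Chapter 5 Experimental Results and Analysis
        5.1 Experimental Environment
        5.2 Experimental Results
    Chapter 6 Conclusions and Future Work
        6.1 Conclusions
        6.2 Future Work
    References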

