Graduate student: 林品宏
Ping-Hung Lin
Thesis title: 關鍵詞萃取系統及語音聲控車之應用
A Keyword Spotting Technique and Its Application to a Voice-Activated Car
Advisor: 莊堯棠
Yau-Tarng Juang
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Academic year of graduation: 100
Language: Chinese
Number of pages: 49
Chinese keywords: 梅爾倒頻譜係數, 關鍵詞萃取, 語音聲控
English keywords: keyword spotting, Mel-frequency cepstral coefficients, voice-activated
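  • This thesis improves the feature-extraction stage of an earlier keyword spotting system by replacing the LPC method used in that work with MFCC, and combines the result with a speech recognition system to build a voice-controlled car. The thesis has two main parts. In the keyword spotting part, the keyword and non-keyword (filler) models are built from sub-syllable models so that the system is more portable. In the second part, the trained models are used in the Visual Basic 6 development environment, together with a one-stage dynamic recognition algorithm, to turn the recognition technique into a windowed human-machine interface that recognizes speech in real time. Based on the recognition result, the interface is coupled to a commercially available remote-control car so that the car moves in the direction the user speaks.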


    This thesis modifies the feature-extraction stage of keyword spotting: we replace linear prediction coefficients (LPC) with Mel-frequency cepstral coefficients (MFCC), and we build a voice-activated car on top of the resulting speech recognizer.
    The thesis has two parts. The first part focuses on keyword spotting. The keyword models and garbage models are built from sub-syllable models, with the advantage that the system saves a lot of time. In the second part, we use Visual Basic 6 to build a human-machine interface for real-time recognition, and we combine this interface with a remote-control car to make a voice-activated car.
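The abstract names MFCC as the feature that replaces LPC but does not give the extraction settings. As a rough illustration of what MFCC extraction usually involves (pre-emphasis, framing, Hamming window, power spectrum, Mel filterbank, log, DCT), here is a minimal Python sketch. Every parameter value (8 kHz sample rate, 256-sample frames, 128-sample hop, 20 filters, 12 coefficients, 0.97 pre-emphasis) and every function name is an assumption made for this example and is not taken from the thesis, which was implemented in Visual Basic 6.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge, peak of 1.0 at mid
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=256, hop=128,
         num_filters=20, num_ceps=12):
    """Return one list of num_ceps cepstral coefficients per frame."""
    # Pre-emphasis: first-order high-pass with an assumed factor of 0.97.
    emphasized = [signal[0]] + [signal[i] - 0.97 * signal[i - 1]
                                for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]  # Hamming window
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [emphasized[start + i] * window[i] for i in range(frame_len)]
        spec = power_spectrum(frame)
        # Log filterbank energies, floored to avoid log(0).
        log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                 for row in bank]
        # DCT-II of the log energies; c0 is skipped here.
        ceps = [sum(log_e[j] * math.cos(math.pi * n * (j + 0.5) / num_filters)
                    for j in range(num_filters))
                for n in range(1, num_ceps + 1)]
        features.append(ceps)
    return features


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]
    feats = mfcc(tone, sample_rate=sr)
    print(len(feats), len(feats[0]))
```

With these assumed settings, a 1024-sample input yields 7 frames of 12 coefficients each. A practical front end would normally use an FFT and often appends energy and delta terms; the naive DFT here only keeps the sketch free of dependencies.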

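    Chapter 1 Introduction
    1.1 Research motivation
    1.2 Literature review
    1.3 Thesis outline
    Chapter 2 Speech signal processing
    2.1 Short-time speech processing [41]
    2.1.1 Framing
    2.1.3 Energy computation
    2.2 Feature extraction
    2.2.1 Mel cepstrum
    2.3 Hidden Markov models
    2.4 Acoustic models
    2.5 Model training and parameter re-estimation
    Chapter 3 Keyword spotting
    3.1 Keyword spotting architecture
    3.1.1 Keyword models
    3.1.2 Non-keyword (filler) models
    3.2 Recognition procedure
    3.2.1 Arrangement of the recognition modules
    3.2.2 Recognition algorithm
    Chapter 4 Experiments and results
    4.1 Experimental environment
    4.2 Keyword spotting experiments
    Chapter 5 System application
    5.1 Recognition procedure
    5.2 System overview
    6.1 Conclusions
    6.2 Future work
    References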

    [1] 蔡佳君,國語發音和方法,台灣學生書局,1993.
    [2] L. R. Rabiner, R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Inc., 1978.
    [3] S. Furui, Digital Speech Processing, Synthesis, and Recognition, Marcel Dekker, Inc., 1989.
    [4] B. H. Juang, “Speech recognition in adverse environment,” Computer Speech and Language, vol. 5, pp. 275-294, 1991.
    [5] A. V. Oppenheim, R. W. Schafer, J. R. Buck, Discrete-Time Signal Processing, 曾建誠, 陳常侃, 王鵬華, 丁建均, 第二版, 離散時間訊號處理, 全華科技圖書股份有限公司, 2004.
    [6] B. H. Juang, L. R. Rabiner, J. G. Wilpon, “On the Use of Bandpass Liftering in Speech Recognition,” IEEE Trans. on ASSP, vol. 35, no. 7, pp. 947-954, July 1987.
    [7] Y. Tohkura, “Weighted Cepstral Distance Measure for Speech Recognition,” IEEE Trans. on ASSP, vol. 35, no. 10, pp. 1414-1422, Oct. 1987.
    [8] F. K. Soong, M. Mohan, “A Frequency-Weighted Itakura Spectral Distortion Measure and Its Application to Speech Recognition in Noise,” IEEE Trans. on ASSP, vol. 36, no. 1, Jan. 1988.
    [9] K. K. Paliwal, M. M. Sondhi, “Recognition of Noisy Speech Using Cumulant-Based Linear Prediction Analysis,” Proc. ICASSP, pp. 429-432, 1990.
    [10] S. Imai, “Cepstral Analysis Synthesis on the Mel Frequency Scale,” Proc. ICASSP, pp. 93-96, 1983.
    [11] D. Mansour, B. H. Juang, “The Short-Time Modified Coherence Representation and Noisy Speech Recognition,” IEEE Trans. on ASSP, vol. 37, no. 6, pp. 795-804, June 1989.
    [12] H. Singer, T. Umezaki, F. Itakura, “Low Bit Quantization of the Smoothed Group Delay Spectrum for Speech Recognition,” Proc. ICASSP, pp. 761-765, 1990.
    [13] J. G. Wilpon et al., “Automatic Recognition of Keywords in Unconstrained Speech Using Hidden Markov Models,” IEEE Trans. on ASSP, vol. 38, no. 11, pp. 1870-1878, Nov. 1990.
    [14] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
    [15] R. P. Lippmann, “An Introduction to Computing with Neural Nets,” IEEE ASSP Mag., vol. 4, pp. 4-22, 1987.
    [16] D. E. Rumelhart, B. Widrow, M. A. Lehr, “The Basic Ideas in Neural Networks,” Communications of the ACM, vol. 37, no. 3, March 1994.
    [17] J. R. Rohlicek et al., “Continuous Hidden Markov Modeling for Speaker-Independent Word Spotting,” Proc. Int. Conf. on ASSP, pp. 627-630, 1989.
    [18] R. C. Rose, D. B. Paul, “A Hidden Markov Model Based Keyword Recognition System,” Proc. Int. Conf. on ASSP, vol. 1, pp. 129-132, 1990.
    [19] R. C. Rose, “Discriminant Wordspotting Techniques for Rejecting Non-Vocabulary Utterances in Unconstrained Speech,” Proc. Int. Conf. on ASSP, vol. 2, pp. 105-108, 1992.
    [20] L. Bahl et al., “Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition,” Proc. Int. Conf. on ASSP, vol. 11, pp. 49-52, 1986.
    [21] C. Torre, A. Acero, “Discriminative Training of Garbage Model for Non-Vocabulary Utterance Rejection,” Proc. Int. Conf. on Spoken Language Processing, June 1994.
    [22] Moreno et al., “Rejection Techniques in Continuous Speech Recognition Using Hidden Markov Model,” Proc. European Conf. on Signal Processing, pp. 1383-1386, 1990.
    [23] M. W. Feng, B. Mazor, “Continuous Word Spotting for Applications in Telecommunication,” Proc. Int. Conf. on Spoken Language Processing, pp. 21-24, 1992.
    [24] R. W. Christiansen, C. K. Rushforth, “Detecting and Locating Key Words in Continuous Speech Using Linear Predictive Coding,” IEEE Trans. on ASSP, vol. 25, no. 5, pp. 361-367, Oct. 1977.
    [25] A. Higgins, R. Wohlford, “Keyword recognition using template concatenation,” Proc. IEEE Int. Conf. on ASSP, vol. 10, pp. 1233-1236, 1985.
    [26] H. W. Hon, K. F. Lee, “CMU Robust Vocabulary-Independent Speech Recognition System,” Proc. IEEE Int. Conf. on ASSP, vol. 2, pp. 889-892, May 1991.
    [27] J. R. Bellegarda, D. Nahamoo, “Tied Mixture Continuous Parameter Modeling for Speech Recognition,” IEEE Trans. on ASSP, vol. 38, pp. 2033-2045, 1990.
    [28] B. H. Juang, L. Rabiner, “Mixture Autoregressive Hidden Markov Models for Speech Signals,” IEEE Trans. on ASSP, vol. 33, pp. 1404-1412, Dec. 1985.
    [29] X. D. Huang, M. A. Jack, “Semi-continuous hidden Markov models for speech signals,” Computer Speech and Language, vol. 3, pp. 239-257, 1989.
    [30] L. F. Lamel, S. Seneff, “Speech Database Development: Design and Analysis of the Acoustic-Phonetic Corpus,” Proc. MIT Speech Recognition Workshop, July 1986.
    [31] R. Schwartz, Y. L. Chow, “The N-best algorithms: an efficient and exact procedure for finding the N most likely sentence hypotheses,” Proc. ICASSP, pp. 81-84, 1990.
    [32] S. R. Young, W. H. Ward, “Recognition Confidence measures for spontaneous spoken dialog,” Proc. European Conf. on Speech Communications, pp. 1177-1179, 1993.
    [33] R. A. Sukkar, J. G. Wilpon, “A two pass classifier for utterance rejection in keyword spotting,” Proc. Int. Conf. on ASSP, vol. 2, pp. 451-454, April 1993.
    [34] W. Chou, B. H. Juang, C. H. Lee, “Segmental GPD training of HMM based speech recognizer,” Proc. Int. Conf. on ASSP, vol. 1, pp. 473-476, 1992.
    [35] L. Villarrubia, A. Acero, “Rejection techniques for digit recognition in telecommunication applications,” Proc. Int. Conf. on ASSP, vol. 2, pp. 455-458, 1993.
    [36] J. G. Wilpon, D. M. DeMarco, R. P. Mikkilineni, “Isolated Word Recognition Over the DDD Telephone Network Results of Two Extensive Field Studies,” Proc. IEEE Int. Conf. on ASSP, vol. 1, pp. 55-58, 1988.
    [37] B. Chigier, “Rejection and Keyword Spotting Algorithms for a Directory Assistance City Name Recognition Application,” Proc. ICASSP, pp. 93-96, 1992.
    [38] Y. Gao et al., “Tangerine: a large vocabulary Mandarin dictation system,” Proc. ICASSP, vol. 1, pp. 77-80, 1995.
    [39] C. E. Mokbel, G. F. A. Chollet, “Automatic Word Recognition in Cars,” IEEE Trans. on ASSP, vol. 3, no. 5, pp. 346-356, Sept. 1995.
    [40] 廖弘源, 吳宗憲教授 便利生活的多媒體人機通訊發明, Feb 2011,第122期國科會工程科技E-paper.
    [41] 王小川, 語音訊號處理,修訂二版, 全華圖書股份有限公司, 2009年2月.
    [42] R. Vergin, D. O’Shaughnessy, A. Farhat, “Generalized Mel Frequency Cepstral Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition,” IEEE Trans. on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, 1999.
    [43] C. Ai et al., “Pipeline Damage and Leak Detection Based on Sound Spectrum LPCC and HMM,” Intelligent Systems Design and Applications, pp. 829-833, 2006.
    [44] R. M. Nickel, “Feature - Automatic Speech Character Identification,” IEEE Circuits and Systems Magazine, vol. 6, pp. 10-31, 2006.
    [45] 黃國彰, 「關鍵詞萃取與確認之研究」,國立中央大學碩士論文,中華民國八十五年六月.
    [46] 蔡炎興, 「關鍵詞萃取即語者辨識系統之研製」,國立中央大學碩士論文,中華民國九十二年六月.
    [47] L. Gu, S. A. Zahorian, “A New Robust Algorithm for Isolated Word Endpoint Detection,” IV-4161 International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, pp.13-17, May 2002.
    [48] 黃明哲,易習Visual Basic 6 程式語言基礎入門, 經緯國際股份有限公司,2009年3月.
    [49] 黃明哲,易習Visual Basic 6 程式語言進階應用, 經緯國際股份有限公司,2009年3月.
    [50] 陳永達,詹可文, 微電腦控制 : 專題製作 : VB串並列埠控制, 全華圖書股份有限公司, 初版,2004年.
