
Author: 溫家誠 (Chia-chen Wen)
Thesis Title: 多媒體應用之語音辨識系統 (Multimedia Applications for Speech Recognition System)
Advisor: 莊堯棠 (Yau-Tarng Juang)
Committee Members:
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Graduation Academic Year: 96 (ROC calendar, i.e., the 2007-2008 academic year)
Language: Chinese
Pages: 47
Chinese Keywords: 語音辨識系統 (speech recognition system)
Foreign Keywords: Speech Recognition System
Abstract:

    With the rapid development of vehicle electronic multimedia systems, in-car multimedia services offer immense possibilities. Bluetooth has become a new area of wireless communication technology through which these applications can be integrated, letting users access such services more conveniently; keyword-spotting speech recognition plays a crucial role in this kind of interface.
    In this thesis, we design a speech recognition system for multimedia applications in the car environment, simulating how a driver and passengers use the multimedia system. The proposed services are based on the control modes most commonly used in the car, including listening to music, making phone calls, and operating the navigation system. A question-and-answer style of human-machine interaction makes the interface user-friendly, and speech synthesis is adopted to simulate a human voice in the system's responses.
    A recognizer based on keyword spotting improves the portability and scalability of the system, and a hierarchical design increases recognition reliability in various environments. Environmental noise and interference remain a challenge, however, so we carry out robust speech recognition, adopting robust features and model adaptation to reduce the influence of the testing environment. Finally, we extend the system with a personalized design: speaker recognition provides user-specific services, and speaker model adaptation further strengthens the system's recognition performance.
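The robust features referred to above are obtained with MVA processing (Section 3.2 of the outline; reference [16]): mean subtraction, variance normalization, and ARMA filtering applied to cepstral features such as MFCCs. The following is a minimal NumPy sketch of the idea, not the thesis's implementation; the function name, the epsilon guard, and the handling of utterance edges are our assumptions.

```python
import numpy as np

def mva(features, arma_order=2):
    """MVA post-processing of one utterance's cepstral features:
    Mean subtraction, Variance normalization, then ARMA filtering.
    `features` is a (num_frames, num_coeffs) array, e.g. MFCCs."""
    # Mean subtraction (CMN) and variance normalization, per coefficient
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    # ARMA smoothing: each output frame averages the previous M smoothed
    # frames with the current and next M normalized input frames
    m = arma_order
    n = len(z)
    out = np.zeros_like(z)
    for t in range(n):
        past = out[max(0, t - m):t]        # previously smoothed frames
        future = z[t:min(n, t + m + 1)]    # current and upcoming input frames
        out[t] = (past.sum(axis=0) + future.sum(axis=0)) / (len(past) + len(future))
    return out
```

The normalization steps remove channel and gain mismatch per utterance, while the low-order ARMA filter suppresses frame-to-frame jitter that noise introduces into the cepstral trajectories.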

Table of Contents

Chapter 1: Introduction
    1.1 Motivation
    1.2 Objectives
    1.3 Chapter Overview
Chapter 2: Fundamentals of Speech Recognition
    2.1 Feature Extraction
    2.2 Hidden Markov Models
    2.3 Acoustic Models
    2.4 Model Training and Parameter Estimation
        2.4.1 Training Algorithm
        2.4.2 Training Flowchart
Chapter 3: System Methods
    3.1 Keyword Spotting
        3.1.1 Keyword Models
        3.1.2 Filler Models
        3.1.3 Arrangement of Recognition Models
        3.1.4 Recognition Algorithm
        3.1.5 Recognition Procedure
    3.2 MVA Feature Processing
        3.2.1 MVA Procedure
        3.2.2 MVA Processing Results
    3.3 Speaker Adaptation
        3.3.1 Overview of Speaker Adaptation
        3.3.2 Maximum Likelihood Linear Regression
Chapter 4: System Architecture
    4.1 System Environment
    4.2 Basic System Architecture
    4.3 Hierarchical Design
    4.4 System Robustness
Chapter 5: System Demonstration
    5.1 Demonstration of System Functions
    5.2 Function Descriptions
Chapter 6: Conclusions and Future Work
    6.1 Conclusions
    6.2 Future Work
References
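Section 3.3.2 of the outline covers maximum likelihood linear regression (MLLR), the model adaptation method used for the robust and speaker-personalized recognition described in the abstract. In MLLR each Gaussian mean is adapted by an affine transform, mu_hat = W xi, where xi = [1, mu] is the extended mean vector and W is a d x (d+1) regression matrix estimated from adaptation data (references [13], [14]). The sketch below only applies a given transform; estimating W, which is the substantive step, is omitted, and all names are illustrative.

```python
import numpy as np

def mllr_mean_adapt(W, mu):
    """Apply an MLLR transform to one Gaussian mean: mu_hat = W @ xi,
    where xi = [1, mu] is the extended mean vector and W is the
    d x (d+1) regression matrix estimated from adaptation data."""
    xi = np.concatenate(([1.0], np.asarray(mu, dtype=float)))  # prepend bias term
    return W @ xi

# Choosing W = [0 | I] leaves the mean unchanged (no adaptation),
# which is the usual starting point before any adaptation data arrives.
```

Because one W is typically shared by a whole regression class of Gaussians, a few adaptation utterances are enough to shift many model means toward the new speaker or environment.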

    [1] M.-W. Koo, C.-H. Lee, and B.-H. Juang, "Speech recognition and utterance verification based on a generalized confidence score," IEEE Trans. on Speech and Audio Processing, vol. 9, no. 8, Nov. 2001.
    [2] C.-M. Liu, C.-C. Chiu, and H.-Y. Chang, "Design of vocabulary-independent Mandarin keyword spotters," IEEE Trans. on Speech and Audio Processing, vol. 8, no. 4, July 2000.
    [3] B. H. Juang, "The past, present, and future of speech processing," IEEE Signal Processing Magazine, pp. 24-28, May 1998.
    [4] K.-C. Huang, Y.-T. Juang, and W.-C. Chang, "Robust integration for speech features," Signal Processing, vol. 86, no. 9, pp. 2282-2288, Sep. 2006.
    [5] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Prentice Hall, New Jersey, 1993.
    [6] R. Vergin, D. O'Shaughnessy, and A. Farhat, "Generalized Mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition," IEEE Trans. on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, Sep. 1999.
    [7] J. R. Deller, Jr., J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals. Macmillan, 1993.
    [8] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," The Bell System Technical Journal, vol. 62, no. 4, April 1983.
    [9] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, Feb. 1989.
    [10] D. Burshtein, "Robust parametric modeling of durations in hidden Markov models," IEEE Trans. on Speech and Audio Processing, vol. 4, pp. 240-242, May 1996.
    [11] H. Ney, "The use of a one-stage dynamic programming algorithm for connected word recognition," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, April 1984.
    [12] L. Xin and B.-X. Wang, "Utterance verification for spontaneous Mandarin speech keyword spotting," in Proc. IEEE ICII 2001, Beijing, vol. 3, pp. 397-401, 2001.
    [13] M. J. F. Gales and P. C. Woodland, "Mean and variance adaptation within the MLLR framework," Computer Speech and Language, vol. 10, pp. 249-264, 1996.
    [14] C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, pp. 171-185, 1995.
    [15] N. J.-C. Wang, S. S.-M. Lee, F. Seide, and L.-S. Lee, "Rapid speaker adaptation using a priori knowledge by eigenspace analysis of MLLR parameters," in Proc. IEEE ICASSP 2001, vol. 1, pp. 345-348, May 2001.
    [16] C.-P. Chen and J. A. Bilmes, "MVA processing of speech features," IEEE Trans. on Audio, Speech, and Language Processing, vol. 15, no. 1, Jan. 2007.
    [17] R. M. Stern, A. Acero, F.-H. Liu, and Y. Ohshima, "Signal processing for robust speech recognition," in Speech Recognition, C.-H. Lee and F. Soong, Eds. Boston, MA: Kluwer, 1996, pp. 351-378.
    [18] C.-P. Chen, J. Bilmes, and D. Ellis, "Blind MVA speech feature processing on Aurora 2.0," Dept. Elect. Eng., Univ. of Washington, Seattle, WA, Tech. Rep. UWEETR-2004-0017, 2004. [Online]. Available: http://www.ee.washington.edu/techsite/papers
    [19] 黃國彰, "A Study on Robust Mandarin Speech Recognition" (in Chinese), Ph.D. dissertation, Department of Electrical Engineering, National Central University, 2003.
    [20] 陳文杰, "Application of Empirical Mode Decomposition to Speech Recognition in Noisy Environments" (in Chinese), Master's thesis, Department of Electrical Engineering, National Central University, 2006.
    [21] 蔡炎興, "Design and Implementation of a Keyword Spotting and Speaker Recognition System" (in Chinese), Master's thesis, National Central University, June 2003.
    [22] 張文杰, "A Speaker Recognition System with Model Adaptation" (in Chinese), Master's thesis, National Central University, June 2005.
