An Interactive Speech Guidance System (互動式語音導覽系統) | National Central University Electronic Thesis and Dissertation System


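Graduate student:	You-ji Lin (林佑輯)
Thesis title:	An Interactive Speech Guidance System (互動式語音導覽系統)
Advisor:	Yau-tarng Juang (莊堯棠)
Committee members:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Graduation academic year:	98 (ROC calendar)
Language:	Chinese
Number of pages:	63
Keywords (Chinese):	語音活動偵測 (voice activity detection), 關鍵詞萃取 (keyword spotting), 語音導覽系統 (speech guidance system)
Keywords (English):	speech guidance system, keyword spotting, voice activity detection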

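This thesis designs an interactive speech guidance system. We simulate a visitor using a multimedia system in a museum. The system provides introductions to place names, people, industries, and scenic spots. A question-and-answer style of human-computer interaction gives it a friendly user interface, and speech synthesis imitates a human voice for the responses.
Keyword spotting based on sub-syllable units improves the system's interchangeability and portability. For endpoint detection, we add threshold constraints on the leading and trailing segments of the speech to raise detection accuracy. Using the correlations in the vocabulary structure of the keywords, all keywords are classified into a hierarchical structure, which both reduces misrecognition of unrelated vocabulary and greatly shortens recognition time. We test the recognition rate on 50 keywords with speech data from 8 male and 8 female speakers. The experiments yield a recognition rate of 95.7%, with an average of 0.25 seconds needed to recognize one sentence.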
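The abstract does not spell out how the leading and trailing threshold constraints are applied. As a minimal sketch of one plausible reading, assuming short-time energy as the detection feature, the code below uses a stricter threshold to find the start of speech and a looser one to find its end, and requires several consecutive frames above the threshold so that isolated clicks are ignored. The function names, the frame length, the threshold values, and `min_run` are all illustrative assumptions, not the thesis's actual settings.

```python
def frame_energies(samples, frame_len=160):
    """Mean squared amplitude of each non-overlapping frame (assumed feature)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def _first_run(values, thresh, min_run):
    """Index where the first run of min_run consecutive values above thresh begins."""
    run = 0
    for i, value in enumerate(values):
        run = run + 1 if value > thresh else 0
        if run == min_run:
            return i - min_run + 1
    return None


def detect_endpoints(energies, start_thresh, end_thresh, min_run=3):
    """Return (first_frame, last_frame) of speech, or None if no speech is found.

    The start is searched from the front with start_thresh; the end is searched
    from the back with end_thresh. Both thresholds are hypothetical parameters.
    """
    start = _first_run(energies, start_thresh, min_run)
    if start is None:
        return None
    from_back = _first_run(energies[::-1], end_thresh, min_run)
    if from_back is None:
        return None
    end = len(energies) - 1 - from_back
    return (start, end) if end >= start else None


# Made-up frame energies: a short burst at frame 2 and a click at frame 12
# surround the actual utterance.
energies = [0.01, 0.02, 0.9, 0.01, 0.8, 0.9, 0.7, 0.6,
            0.3, 0.25, 0.2, 0.01, 0.5, 0.01]
print(detect_endpoints(energies, start_thresh=0.5, end_thresh=0.15))  # (4, 10)
```

Using a lower threshold at the tail keeps the decaying end of the utterance, while the run-length requirement rejects the isolated burst and click.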

This thesis deals with the design of an interactive speech guidance system for Dasi and Longtan. We use a hierarchical structure for keyword spotting to improve the recognition capability of the system. A series of questions and answers provides a user-friendly interface. The system offers information about Dasi and Longtan, including geographical names, famous persons, industries, and scenic spots.
In our experiments, over 800 utterances spoken by 8 male and 8 female speakers are used to test system performance. On average, identifying a keyword takes 0.25 seconds, and the system achieves a recognition rate of 95.7%.
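The effect of the hierarchical keyword structure can be illustrated with a small sketch. It assumes the dialog has already narrowed the query to one category, so only that category's keywords compete. The category names follow the abstract; apart from Dasi and Longtan, every keyword is a hypothetical placeholder, and the scores stand in for acoustic likelihoods that a real recognizer would compute.

```python
# Hypothetical hierarchy: category -> keywords. Only "Dasi" and "Longtan"
# come from the abstract; the other entries are placeholders.
HIERARCHY = {
    "place": ["Dasi", "Longtan"],
    "person": ["person_a", "person_b"],
    "industry": ["industry_a", "industry_b"],
    "scenic_spot": ["spot_a", "spot_b"],
}


def candidates(category=None):
    """Keywords that compete: one category's list, or the flat vocabulary."""
    if category is not None:
        return list(HIERARCHY[category])
    return [kw for kws in HIERARCHY.values() for kw in kws]


def recognize(scores, category=None):
    """Return the best-scoring candidate keyword.

    scores maps keyword -> score (higher is better); it is a stand-in for
    the likelihoods produced by the acoustic models.
    """
    return max(candidates(category), key=lambda kw: scores.get(kw, float("-inf")))


# Made-up scores in which an unrelated scenic spot slightly outscores "Dasi".
scores = {"Dasi": -12.0, "Longtan": -15.0, "spot_a": -11.5, "person_a": -20.0}
print(recognize(scores))           # spot_a (flat search picks the unrelated word)
print(recognize(scores, "place"))  # Dasi (search restricted to place names)
print(len(candidates()), len(candidates("place")))  # 8 2
```

Restricting the search to one branch removes unrelated words from the competition and shrinks the number of models to score, which is consistent with the reduced misrecognition and shorter recognition time reported in the abstract.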

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Chapter Overview
Chapter 2 Speech Processing Techniques
2.1 Feature Parameter Extraction
2.2 Hidden Markov Models
2.3 Acoustic Models
2.4 Model Training Algorithms
2.4.1 Training Flowchart
2.4.2 Viterbi Algorithm
Chapter 3 Keyword Spotting Techniques
3.1 Overview
3.2 Keyword Spotting Architecture
3.2.1 Keyword Module
3.2.2 Non-Keyword (Filler) Module
3.3 One-Stage Dynamic Programming Algorithm
3.4 Keyword Recognition Procedure
3.5 Hierarchical Keyword Spotting Architecture
Chapter 4 Speech Guidance System Architecture
4.1 Audio Recording and Processing
4.2 Voice Activity Detection
4.3 Real-Time Speech Recognition System
4.3.1 Basic Concepts of the Windows API
4.3.2 Basic System Architecture
4.4 System Functions and Demonstration
Chapter 5 Experiments and Results
5.1 Experimental Environment
5.2 Keyword Spotting Experiments
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References

                                

