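| Graduate student: | 楊鎮光 Zhen-Guang Yang |
|---|---|
| Thesis title: | Application of a Fast Algorithm to Large-Vocabulary Keyword Spotting (快速演算法在大字彙關鍵詞萃取上的應用) |
| Advisor: | 莊堯棠 Yau-Tarng Juang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Academic year of graduation: | 89 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 43 |
| Keywords: | CMS, tree structure, keyword spotting, fast algorithm, Cepstrum Weighting |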
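In conventional whole word based keyword spotting systems, performance degrades as the keyword vocabulary grows: the recognition rate drops and the recognition time increases. The fast algorithm exploits the structural correlation among keywords to classify and organize them, so that a tree-structured search can greatly reduce recognition time while the recognition rate stays at a steady level as the vocabulary grows. This is the purpose of applying the fast algorithm to large-vocabulary keyword spotting.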
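In practice, we first divide each keyword into several sub-parts (subsets). The sub-parts of different keywords contain shared common subwords, which branch out like a tree. Once the N best common subwords have been recognized, the search range can be narrowed: keywords that cannot be selected are discarded, and final verification is performed only on the keywords with higher similarity, which is what makes the search fast.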
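The pruning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy lexicon, the made-up subword scores, and the function name `prune_keywords` are all hypothetical, and a real system would obtain the subword scores from an acoustic recognizer.

```python
from collections import defaultdict


def prune_keywords(lexicon, subword_scores, n_best):
    """Keep only keywords whose leading subword is among the N best.

    lexicon: dict mapping keyword -> tuple of subwords (hypothetical split).
    subword_scores: dict mapping subword -> score, higher is better
        (assumed to come from an acoustic recognizer in a real system).
    n_best: number of top-scoring common subwords to keep.
    """
    # Group keywords by their first subword: the first level of the tree.
    branches = defaultdict(list)
    for keyword, subwords in lexicon.items():
        branches[subwords[0]].append(keyword)

    # Rank only the subwords that actually head a branch.
    ranked = sorted(
        branches,
        key=lambda s: subword_scores.get(s, float("-inf")),
        reverse=True,
    )

    # Survivors are the keywords under the N best branches; the rest are dropped.
    survivors = []
    for subword in ranked[:n_best]:
        survivors.extend(branches[subword])
    return survivors


# Toy lexicon: keywords sharing a first subword fall under the same branch.
lexicon = {
    "taipei": ("tai", "pei"),
    "taichung": ("tai", "chung"),
    "tainan": ("tai", "nan"),
    "kaohsiung": ("kao", "hsiung"),
    "hsinchu": ("hsin", "chu"),
}
# Made-up log-likelihood-style scores for the first-level subwords.
subword_scores = {"tai": -10.0, "kao": -25.0, "hsin": -12.0}

candidates = prune_keywords(lexicon, subword_scores, n_best=2)
print(candidates)
```

Only the surviving candidates would then go through the more expensive final verification, so the cost of that step depends on the size of the N best branches rather than on the full vocabulary.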
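Beyond the algorithm itself, the thesis also experiments with several schemes for improving the recognition rate. The first applies a reducing weight to the probability of the filler (non-keyword) model given the speech features, so that the segmentation boundaries of the keyword become more accurate. The second uses a dynamic weight, so that each test utterance gets its own best reducing weight. In addition, because the test and training corpora were collected in different environments (telephone and microphone recordings, respectively), we process both the training and the test corpora with cepstral mean subtraction (CMS) plus Cepstrum weighting and retrain the subsyllable models. Finally, the probability values obtained before and after processing (that is, without and with CMS and Cepstrum weighting) are combined, and the best mixing ratio is found experimentally. The results show that dynamic weighting and probability mixing, used together, give the best Top1 recognition rate of 91.32%, while using only a single weight performs worst, with a Top1 rate of 83.67%.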
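The feature processing and score mixing mentioned above might look like the sketch below. Several details are assumptions, since the abstract does not specify them: the raised-sine lifter with `L=22` is one common form of cepstrum weighting, the mixing is shown as a linear interpolation of log-scores with a hypothetical ratio `alpha`, and all function names are illustrative only.

```python
import math


def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient (CMS)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def lifter_weights(dim, L=22):
    """Raised-sine weights w_k = 1 + (L/2) * sin(pi * k / L), k = 1..dim.

    Assumed form of cepstrum weighting; the thesis may use another one.
    """
    return [1.0 + (L / 2.0) * math.sin(math.pi * k / L) for k in range(1, dim + 1)]


def apply_cepstrum_weighting(frames, weights):
    """Scale every cepstral coefficient by its weight, frame by frame."""
    return [[c * w for c, w in zip(f, weights)] for f in frames]


def mix_scores(score_raw, score_processed, alpha):
    """Interpolate the scores obtained without and with CMS + weighting.

    alpha is a hypothetical mixing ratio in [0, 1], tuned experimentally.
    """
    return alpha * score_processed + (1.0 - alpha) * score_raw


# Two toy frames of 2-dimensional cepstra (made-up values).
frames = [[1.0, 2.0], [3.0, 6.0]]
cms = cepstral_mean_subtraction(frames)
weighted = apply_cepstrum_weighting(cms, lifter_weights(2))

# Made-up log-scores from the unprocessed and the processed models.
mixed = mix_scores(-100.0, -90.0, alpha=0.5)
print(cms, weighted, mixed)
```

Subtracting the per-utterance mean removes a constant offset in the cepstral domain, which is why CMS is a common choice when the training and test channels differ.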
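To make the keyword spotting system more complete, a keyword rejection capability needs to be added. In the experiments, the accuracy after adding keyword rejection is 81.51%.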