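Graduate student: 楊鎮光 (Zhen-Guang Yang)
Thesis title: Application of a Fast Algorithm to Large-Vocabulary Keyword Spotting
Advisor: 莊堯棠 (Yau-Tarng Juang)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Academic year of graduation: 89 (ROC calendar)
Language: Chinese
Pages: 43
Chinese keywords: CMS, tree structure, keyword spotting, fast algorithm, Cepstrum Weighting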
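  • In conventional whole-word-based keyword spotting systems, performance typically degrades as the keyword vocabulary grows: the recognition rate drops and the recognition time increases. The fast algorithm exploits structural similarities among the keywords, classifying them and organizing them so that they can be searched through a tree-structured search. This greatly reduces recognition time, and the recognition rate holds at a stable level as the vocabulary grows. That is the purpose of applying the fast algorithm to large-vocabulary keyword spotting.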
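    In practice, each keyword is first divided into several subsets, and the subsets of different keywords contain the same common subwords, so the keywords branch out like a tree. Once the N best common subwords have been recognized, the search space can be narrowed: keywords that cannot be selected are discarded, and only the keywords with higher similarity go through a final verification. This is what makes the search fast.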
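The pruning step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes that keywords are grouped by their first subword only (a single tree level), that first-pass subword scores are log-likelihoods supplied by some recognizer, and that `full_score` is a hypothetical callable returning the detailed score of a whole keyword. All function and variable names are made up for the example.

```python
from collections import defaultdict


def build_subword_index(keywords):
    """Group keywords by their first subword (one level of the tree).

    keywords: dict mapping keyword -> list of subword units (assumed format).
    """
    index = defaultdict(list)
    for word, subwords in keywords.items():
        index[subwords[0]].append(word)
    return index


def prune_candidates(index, subword_scores, n_best):
    """Keep only keywords whose shared subword is among the N best scores."""
    top = sorted(subword_scores, key=subword_scores.get, reverse=True)[:n_best]
    return [word for sub in top for word in index.get(sub, [])]


def spot_keyword(keywords, subword_scores, full_score, n_best=2):
    """First pass: prune by common subword. Second pass: verify survivors."""
    index = build_subword_index(keywords)
    candidates = prune_candidates(index, subword_scores, n_best)
    return max(candidates, key=full_score) if candidates else None


# Toy data (hypothetical): four keywords, three distinct leading subwords.
keywords = {
    "taipei": ["tai", "pei"],
    "tainan": ["tai", "nan"],
    "taoyuan": ["tao", "yuan"],
    "hsinchu": ["hsin", "chu"],
}
first_pass = {"tai": -10.0, "tao": -12.0, "hsin": -30.0}
second_pass = {"taipei": -25.0, "tainan": -21.0, "taoyuan": -28.0, "hsinchu": -5.0}

best = spot_keyword(keywords, first_pass, second_pass.get, n_best=2)
```

With `n_best=2`, only the keywords under `tai` and `tao` reach the second pass, so the detailed scoring runs on three keywords instead of four. The saving grows with the vocabulary, at the cost of losing any keyword whose common subword falls outside the N best.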
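    Beyond the algorithm itself, the thesis also experiments with several schemes for improving the recognition rate. The first applies a shrinking weight to the probability that the filler (non-keyword) models assign to the speech features, so that the keyword segment boundaries become more accurate. The second makes this weight dynamic, so that each test utterance gets its own best shrinking weight. In addition, because the test and training speech were collected in different environments (telephone and microphone recordings, respectively), both the training and test speech are processed with Cepstrum Mean Subtraction (CMS) plus Cepstrum Weighting, and the sub-syllable models are retrained. Finally, the probabilities obtained before and after this processing (that is, without and with CMS and Cepstrum Weighting) are combined, and the best mixing ratio is found experimentally. The results show that the dynamic weight and the probability mixing, used together, give the best recognition rate, a Top-1 of 91.32%. Using only a single weight performs worst, with a Top-1 of 83.67%.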
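The channel compensation and score mixing described above can be sketched as below. Several details are assumptions, since the abstract does not give them: the weighting is taken to be the common raised-sine lifter w_k = 1 + (L/2) sin(πk/L), the mixing is taken to be a linear interpolation of two log-likelihoods, and `alpha` stands in for the experimentally tuned mixing ratio. All function names are hypothetical.

```python
import math


def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    frames: non-empty list of equal-length lists of cepstral coefficients.
    """
    n = len(frames)
    dims = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n for k in range(dims)]
    return [[frame[k] - mean[k] for k in range(dims)] for frame in frames]


def raised_sine_lifter(dims):
    """Weights w_k = 1 + (L/2) * sin(pi * k / L) for k = 1..L (assumed form)."""
    return [1.0 + (dims / 2.0) * math.sin(math.pi * k / dims)
            for k in range(1, dims + 1)]


def apply_weighting(frames, weights):
    """Scale every coefficient of every frame by its cepstral weight."""
    return [[c * w for c, w in zip(frame, weights)] for frame in frames]


def mix_log_scores(score_plain, score_compensated, alpha):
    """Interpolate the scores of the unprocessed and the compensated features.

    alpha = 0 uses only the plain score, alpha = 1 only the compensated one.
    """
    return alpha * score_compensated + (1.0 - alpha) * score_plain


# Toy utterance: two frames of two cepstral coefficients each.
frames = [[1.0, 2.0], [3.0, 6.0]]
compensated = apply_weighting(cepstral_mean_subtraction(frames),
                              raised_sine_lifter(2))
```

Subtracting the utterance mean removes a constant cepstral offset, which is how a fixed channel difference such as telephone versus microphone shows up in the cepstral domain. Keeping both score streams and interpolating them lets the mixing ratio be chosen on held-out data rather than committing to one feature set.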
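    To make the keyword spotting system more complete, it also needs the ability to reject keywords. In the experiments, the accuracy after adding keyword rejection is 81.51%.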
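A rejection test of the kind mentioned above is often built as a likelihood-ratio check against an anti-model with a per-keyword threshold. The sketch below assumes that form, with a duration-normalized log-likelihood ratio. The abstract does not state the exact decision rule or error definitions used in the thesis, so the functions, their parameters, and the two error rates are illustrative assumptions.

```python
def accept_keyword(logp_keyword, logp_anti, threshold, n_frames):
    """Accept when the per-frame log-likelihood ratio reaches the threshold.

    logp_keyword: log-likelihood of the segment under the keyword model.
    logp_anti:    log-likelihood of the same segment under its anti-model.
    threshold:    per-keyword threshold (assumed to be trained beforehand).
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    llr = (logp_keyword - logp_anti) / n_frames
    return llr >= threshold


def error_rates(trials):
    """Compute (false_rejection_rate, false_alarm_rate).

    trials: iterable of (is_keyword, accepted) boolean pairs.
    """
    trials = list(trials)
    n_keyword = sum(1 for is_kw, _ in trials if is_kw)
    n_other = len(trials) - n_keyword
    false_rej = sum(1 for is_kw, acc in trials if is_kw and not acc)
    false_alarm = sum(1 for is_kw, acc in trials if not is_kw and acc)
    fr_rate = false_rej / n_keyword if n_keyword else 0.0
    fa_rate = false_alarm / n_other if n_other else 0.0
    return fr_rate, fa_rate
```

Raising the threshold trades false alarms for false rejections, so in a setup like this the threshold would be tuned per keyword on held-out data.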


    Chapter 1 Introduction
    1.1 Research Motivation
    1.2 Basic Definition of Keyword Spotting
    1.3 Concept of the Fast Algorithm
    1.4 Technical Review
    1.5 Thesis Outline
    Chapter 2 Basic Techniques of Speech Recognition
    2.1 Overview
    2.2 Extraction of Feature Parameters
    2.3 Hidden Markov Models
    2.3.1 Description of Hidden Markov Models
    Chapter 3 System Architecture
    3.1 Overview
    3.2 Model Parameters
    3.3 Training and Recognition Algorithms
    3.3.1 Training Algorithm
    3.3.2 Recognition Module and Recognition Algorithm
    Chapter 4 Fast Algorithm
    4.1 Overview
    4.2 Fast Algorithm
    4.3 Shrinking Weight on the Filler Module's Feature Probabilities
    4.3.1 Static Shrinking Weight
    4.3.2 Dynamically Adjusted Shrinking Weight
    4.4 Two Feature Processing Methods: Cepstrum Mean Subtraction and Cepstrum Weighting
    4.4.1 Cepstrum Mean Subtraction and Cepstrum Weighting
    4.4.2 Weighted Combination of the Probabilities from Cepstrum+Delta Cepstrum and Cepstrum+Delta Cepstrum+CMS+Cepstrum Weighting
    4.5 Keyword Rejection Capability
    4.5.1 Principle of Keyword Rejection
    4.5.2 Method for Training the Anti-Model
    4.5.3 Method for Training the Threshold τk
    4.5.4 Keyword Rejection Algorithm
    4.5.5 Calculation of Error Rates
    Chapter 5 Experiments and Results
    5.1 Overview
    5.2 Experimental Environment
    5.3 Large-Vocabulary Keyword Spotting Experiments
    Chapter 6 Conclusion
    6.1 Conclusion
    6.2 Future Work
    References

