Simple Search / Detailed Record

Graduate Student: 林志榮 (Zhe-Run Lin)
Thesis Title: Mandarin Speech Recognition Combining Hidden Markov Models and Neural Networks
Advisor: 莊堯棠 (Yau-Tarng Juang)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Graduation Academic Year: 88 (ROC calendar, i.e., the 1999-2000 academic year)
Language: Chinese
Pages: 48
Chinese Keywords: hidden Markov model; neural network model; speaker-dependent system; speaker-independent system

  • For the speaker-dependent system, all three systems achieve recognition rates above 90%, and the HMM-NN-Net and NN-NN-Net state models reach 100%. Moreover, after suitable adjustment of the convergence criterion, the recognition rate of the HMM-NN-Net state model surpasses that of the hidden Markov model by a small margin.
    For the speaker-independent system, the HMM-NN-Net state model leads the other models with a recognition rate of 94.25%, further demonstrating the feasibility of the new method. In addition, a comparison of the HMM-NN-Net and NN-NN-Net state models provides a complete analysis of the neural network convergence problem.


    The hidden Markov model (HMM) is widely used for speech recognition and has proved useful in dealing with the statistical and sequential aspects of the speech signal. However, its discriminative power is weak when it is trained with the maximum likelihood criterion. Neural networks (NNs), on the other hand, have powerful classification capability but are not well suited to time-varying input patterns. In this study, a hybrid HMM-NN speech recognition system that combines the advantages of both models is presented. Three neural net state models, HMM-NN-Net, HMM-HMM-Net, and NN-NN-Net, are developed for the proposed hybrid HMM-NN system. All experimental results are compared with those obtained from a conventional HMM.
    In the speaker-dependent experiment, the recognition rates of all three models are above 90 percent. Moreover, with the exception of the HMM-HMM-Net model, all error rates approach zero after the convergence criterion is adjusted.
    In the speaker-independent case, the HMM-NN-Net model achieves a recognition rate of 94.25 percent, the best performance among the models compared. In addition, the NN-NN-Net model requires less training time than the HMM-NN-Net model, although its recognition capability cannot match that of the HMM-NN-Net model.
    The experimental results indicate that the hybrid HMM-NN recognition system based on the HMM-NN-Net state model improves on the performance of the traditional HMM system. It is also found that the convergence criterion of the neural net state models is related to their recognition capability.
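    The division of labor the abstract describes (an HMM handling temporal alignment, a neural network handling frame classification) can be sketched in miniature. The toy below substitutes a flat-start uniform alignment for a trained HMM's Viterbi alignment and a one-layer softmax classifier for the thesis's back-propagation state networks; all data, dimensions, and names are illustrative assumptions, not the thesis's actual system.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N_STATES, N_WORDS, DIM = 3, 2, 2

    def uniform_align(n_frames, n_states=N_STATES):
        """Flat-start alignment: divide an utterance's frames evenly among the
        HMM states (a stand-in for a trained HMM's Viterbi alignment)."""
        return np.minimum(np.arange(n_frames) * n_states // n_frames, n_states - 1)

    # Hypothetical vocabulary: per-word, per-state feature means.
    WORD_MEANS = np.array([[[0, 0], [1, 0], [2, 0]],       # word 0
                           [[0, 2], [1, 2], [2, 2]]], float)  # word 1

    def make_utterance(word, n_frames=30):
        """Synthesize noisy frames around each aligned state's mean."""
        states = uniform_align(n_frames)
        frames = WORD_MEANS[word, states] + 0.3 * rng.standard_normal((n_frames, DIM))
        return frames, states

    def add_bias(X):
        return np.hstack([X, np.ones((len(X), 1))])

    def log_softmax(z):
        m = z.max(axis=1, keepdims=True)
        return z - m - np.log(np.exp(z - m).sum(axis=1, keepdims=True))

    def train_state_classifier(X, y, n_classes, lr=0.5, epochs=400):
        """Softmax regression over (word, state) classes: a one-layer surrogate
        for the back-propagation state networks described in the thesis."""
        Xb = add_bias(X)
        W = np.zeros((Xb.shape[1], n_classes))
        for _ in range(epochs):
            p = np.exp(log_softmax(Xb @ W))
            p[np.arange(len(y)), y] -= 1.0      # gradient of cross-entropy wrt logits
            W -= lr * Xb.T @ p / len(X)
        return W

    def recognize(W, frames):
        """Score each word by summing frame log-posteriors along its state
        alignment; the best-scoring word wins."""
        lp = log_softmax(add_bias(frames) @ W)
        states = uniform_align(len(frames))
        scores = [lp[np.arange(len(frames)), w * N_STATES + states].sum()
                  for w in range(N_WORDS)]
        return int(np.argmax(scores))

    # Train on a handful of utterances per word.
    X, y = [], []
    for w in range(N_WORDS):
        for _ in range(5):
            frames, states = make_utterance(w)
            X.append(frames)
            y.append(w * N_STATES + states)
    W = train_state_classifier(np.vstack(X), np.concatenate(y), N_WORDS * N_STATES)

    for w in range(N_WORDS):
        frames, _ = make_utterance(w)
        print(recognize(W, frames))  # should print 0, then 1
    ```

    The sketch captures why the hybrid helps: the alignment step handles the time-varying length of the input, while the discriminatively trained classifier supplies the per-frame state posteriors that a maximum-likelihood HMM lacks.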

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1 Motivation
      1.2 Literature Review
      1.3 Research Objectives
      1.4 Overview of the Method
        1.4.1 Building Recognition Models from Training Samples
        1.4.2 Recognizing Input Test Samples
      1.5 Thesis Outline
    Chapter 2  Theoretical Background
      2.1 Feature Extraction
      2.2 Hidden Markov Models
      2.3 Neural Networks
        2.3.1 Definition and Learning Principle of Back-Propagation Networks
        2.3.2 Training Methods for Back-Propagation Networks
    Chapter 3  Speech Recognition System Combining Hidden Markov Models and Neural Network Models
      3.1 Model Training Phase
        3.1.1 HMM Frame-Allocation System
        3.1.2 Self-Supervised Neural Network Frame-Allocation System
        3.1.3 Complete Training Procedure
      3.2 Model Recognition Phase
        3.2.1 HMM Recognition Method
        3.2.2 Neural Net State-Model Recognition Method
    Chapter 4  Experimental Results and Discussion
      4.1 System Setup
      4.2 Speaker-Dependent Recognition System
      4.3 Speaker-Independent Recognition System
    Chapter 5  Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

