Combination of Multiple Speech Features and its Application on Smartphone


Graduate student:	張智傑 Chih-chieh Chang
Thesis title:	多種語音特徵的合併及其在智慧型手機上之應用 Combination of Multiple Speech Features and its Application on Smartphone
Advisor:	莊堯棠 Y.T. Juang
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication:	2014
Academic year of graduation:	102 (ROC calendar)
Language:	Chinese
Number of pages:	84
Keywords:	speech recognition, feature, combination, smartphone, iPhone, keyword spotting

This thesis aims to improve the feature extraction stage of speech recognition. Feature extraction is an important part of speech recognition because it both reduces the amount of data and highlights the characteristics of the voice. Many researchers have proposed different feature parameters, or refinements of existing ones, to emphasize different speech characteristics. This thesis proposes a method for combining feature parameters so that the speech characteristics captured by different extraction methods are joined together. Experimental results show that feature parameters combined in this way effectively improve the recognition rate of a keyword spotting system, demonstrating that the combination strengthens the characteristics of the voice.
In the second part of this thesis, the keyword spotting system is applied in an iPhone app: a small voice-controlled game that performs real-time speech recognition.
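
The abstract does not spell out the combination rule. As a minimal sketch, assuming the combination is a weighted concatenation of per-frame feature vectors (an assumption suggested only by the weight vector and feature dimension experiments listed in the table of contents, not confirmed by this record), the idea could look like the following Python. The function names `normalize_stream` and `combine_features`, the weights, and the toy values are all hypothetical.

```python
from statistics import mean, pstdev


def normalize_stream(frames):
    """Scale each coefficient of one feature stream to zero mean and unit variance."""
    dims = len(frames[0])
    columns = [[frame[d] for frame in frames] for d in range(dims)]
    means = [mean(col) for col in columns]
    # Guard against constant coefficients, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(frame[d] - means[d]) / stds[d] for d in range(dims)] for frame in frames]


def combine_features(streams, weights):
    """Concatenate per-frame vectors from several streams, scaling each stream by its weight.

    Assumed combination rule for illustration only; the thesis may use a different one.
    """
    if len(streams) != len(weights):
        raise ValueError("one weight per feature stream is required")
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must cover the same frames")
    normalized = [normalize_stream(s) for s in streams]
    combined = []
    for t in range(n_frames):
        frame = []
        for stream, w in zip(normalized, weights):
            frame.extend(w * x for x in stream[t])
        combined.append(frame)
    return combined


# Toy values standing in for real LPCC / MFCC / PLPCC output (3 frames each).
lpcc = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.3]]
mfcc = [[1.0, 2.0, 3.0], [2.0, 2.0, 1.0], [3.0, 2.0, 2.0]]
plpcc = [[0.5], [0.7], [0.9]]

combined = combine_features([lpcc, mfcc, plpcc], weights=[0.5, 1.0, 0.8])
print(len(combined), len(combined[0]))  # prints: 3 6
```

Normalizing each stream before weighting keeps one feature type from dominating merely because its coefficients have a larger numeric range; the weights then express how much each stream should contribute to the combined vector.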

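Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1 Research Motivation
2 Research Objectives
3 Literature Review
4 Chapter Overview
Chapter 2 System Overview
1 Feature Extraction
1.1 LPCC
1.2 MFCC
1.3 PLPCC
2 Feature Compensation
3 Hidden Markov Model
4 Acoustic Model
5 Model Training
Chapter 3 Combination of Multiple Feature Parameters
1 Speech Characteristics
1.1 LPCC
1.2 MFCC
1.3 PLPCC
2 Method for Combining Feature Parameters
Chapter 4 Experimental Results and Analysis
1 Keyword Spotting System
1.1 Keyword System Architecture
1.2 Recognition Algorithm
2 Experimental Results
2.1 Experimental Environment
2.2 Single-Feature Experiments
2.3 Combined-Feature Experiments
2.4 Weight Vector Experiments
2.5 Feature Dimension Experiments
Chapter 5 System Application
1 Development Environment
1.1 Development Platform
1.2 Programming Language
2 System Introduction
2.1 Recording Function
2.2 Recognition Function
2.3 Screen Demonstration
Chapter 6 Conclusions and Future Work
1 Conclusions
2 Future Work
References
Appendix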

                                

