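Design and Implementation of a Microphone Array Based Speaker Recognition System | National Central University Thesis and Dissertation System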

Graduate student:	游仁男 Jen-Nan Yu
Thesis title:	Design and Implementation of a Microphone Array Based Speaker Recognition System (基於麥克風陣列的語者辨識系統設計與實作)
Advisor:	陳慶瀚 Ching-Han Chen
Committee members:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Executive Master of Computer Science & Information Engineering
Year of publication:	2017
Academic year of graduation:	105 (ROC calendar)
Language:	Chinese
Number of pages:	77
Keywords:	Speaker Recognition, Microphone Array, Probabilistic Neural Network

This study designs an embedded speaker recognition system based on a microphone array, with the aim of improving on the performance of single-microphone speaker recognition systems. The system consists of four modules: microphone-array sound signal acquisition, beamforming, speaker feature extraction, and speaker recognition. The acquisition module collects the speaker's voice through a circular microphone array built from micro-electro-mechanical systems (MEMS) microphones. The beamforming module uses multi-channel audio processing to enhance the speech signal and suppress surrounding noise. The feature extraction module represents each speaker's voice characteristics with linear predictive cepstral coefficients (LPCC). Finally, a probabilistic neural network (PNN) classifier performs the speaker recognition. To validate the system, we built an experimental speaker voice database of 120 recordings of the same sentence from twelve speakers. During the experiments, the PNN smoothing parameter and the beamforming parameters were trained to optimize the recognition rate. The results show that the microphone-array system reduces the equal error rate (EER) by about 10 percent compared with a single-microphone system.
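The abstract names a PNN classifier whose smoothing parameter is tuned during training. As a rough illustration of what that parameter controls, here is a minimal sketch of a PNN decision rule. It is not the thesis's implementation, which runs on an embedded board. The sketch assumes Gaussian kernels, equal class priors, and a single shared smoothing value; the function name `pnn_classify`, the speaker labels, and the toy two-dimensional vectors standing in for LPCC features are all hypothetical.

```python
import math


def pnn_classify(x, training_set, sigma):
    """Return the label whose Parzen-window score at x is largest.

    training_set maps each speaker label to a list of feature vectors
    (stand-ins for LPCC vectors); sigma is the smoothing parameter.
    Assumes equal priors, so the shared normalization constant is omitted.
    """
    best_label, best_score = None, -math.inf
    for label, patterns in training_set.items():
        # Pattern layer: one Gaussian kernel per stored training vector.
        kernel_sum = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            kernel_sum += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation layer: average the kernel outputs of one class.
        score = kernel_sum / len(patterns)
        # Decision layer: keep the class with the highest score.
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data for two hypothetical speakers (not from the thesis database).
training_set = {
    "speaker_a": [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1]],
    "speaker_b": [[1.0, 1.0], [0.9, 1.2], [1.1, 0.8]],
}
print(pnn_classify([0.15, 0.05], training_set, sigma=0.3))
print(pnn_classify([0.95, 1.05], training_set, sigma=0.3))
```

A small `sigma` makes the decision depend almost entirely on the nearest stored vector, while a large `sigma` blurs the classes together, which is one plausible reason the smoothing parameter needs tuning against the recognition rate.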

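Chapter 1 Introduction
1 Research Motivation
2 Literature Review
3 Thesis Organization
Chapter 2 MEMS Microphone Array Beamforming
1 MEMS Microphones
1.1 Principle of MEMS Microphones
1.2 Types of MEMS Microphones
1.3 Microphone Directivity
2 Microphone Arrays
2.1 Linear Microphone Arrays
2.2 Circular Microphone Arrays
3 Beamforming Algorithms
3.1 Delay-and-Sum Beamformer
3.2 Estimating TDOA (Time Difference of Arrival) with GCC-PHAT
4 Sound Source Direction Estimation Algorithms
4.1 Time Difference of Arrival (TDOA) Sound Source Direction Estimation
5 Feature Extraction
5.1 Preprocessing
5.2 Linear Predictive Cepstral Coefficients (LPCC)
6 Probabilistic Neural Network (PNN) Classifier
6.1 Probabilistic Neural Network Architecture
Chapter 3 Microphone Array Speaker Recognition System
1 System Architecture
1.1 Sound Signal Acquisition
1.2 Beamforming
1.3 Speech Feature Extraction
1.4 Speaker Recognition
2 Discrete-Event System Modeling
2.1 Modeling the Microphone Array Speaker Recognition System
2.2 Modeling Sound Signal Acquisition
2.3 Modeling Beamforming
2.4 Modeling Speech Feature Extraction
2.5 Modeling Speaker Recognition
2.6 Main States and Actions
3 Software Synthesis
3.1 Software Synthesis of the Microphone Array Speaker Recognition System Model
3.2 Software Synthesis of the Sound Signal Acquisition Model
3.3 Software Synthesis of the Beamforming Model
3.4 Software Synthesis of the Speech Feature Extraction Model
3.5 Software Synthesis of the Speaker Recognition Model
3.6 Software Simulation
Chapter 4 System Integration Experiments and Verification
1 Experimental Environment
1.1 Overview of the STM32F429 Discovery Board Specifications
1.2 Overview of the MEMS Microphone Specifications
2 Experiments
2.1 Test Subject Data Collection
2.2 Training of Samples and Parameters for the Microphone Array Speaker Recognition System
3 Speaker Recognition Performance Evaluation
3.1 Speaker Recognition Performance with a Single Microphone
3.2 Speaker Recognition Performance with the Microphone Array
4 Experimental Results and Discussion
Chapter 5 Conclusion
References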

