| Graduate student: | 游仁男 Jen-Nan Yu |
|---|---|
| Thesis title: | 基於麥克風陣列的語者辨識系統設計與實作 Design and Implementation of a Microphone Array Based Speaker Recognition System |
| Advisor: | 陳慶瀚 Ching-Han Chen |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 資訊工程學系在職專班 Executive Master of Computer Science & Information Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | Chinese |
| Number of pages: | 77 |
| Keywords (Chinese): | 語者辨識, 麥克風陣列, 機率神經網路 |
| Keywords (English): | Speaker Recognition, Microphone Array, Probabilistic Neural Network |
This study designs an embedded speaker recognition system based on a microphone array, with the aim of improving on the performance of single-microphone speaker recognition systems. The system consists of four modules: microphone-array sound signal acquisition, beamforming, speaker feature extraction, and speaker recognition. The acquisition module collects the speaker's voice through a circular microphone array built from micro-electro-mechanical systems (MEMS) microphones. The beamforming module applies multi-channel processing to enhance the speech signal and suppress surrounding noise. The feature extraction module uses linear predictive cepstral coefficients (LPCC) to represent the speaker's voice characteristics. Finally, a probabilistic neural network (PNN) classifier performs the recognition. To validate the system, we built an experimental speaker database of 120 recordings of the same sentence spoken by twelve people. During the experiments, the recognition rate was optimized by tuning the PNN smoothing parameter and the beamforming parameters. The results show that the microphone-array-based system reduces the equal error rate (EER) by about 10 percent compared with a single-microphone system.
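To make the role of the PNN smoothing parameter concrete, the following is a minimal sketch of a probabilistic neural network classifier in Python. It is not the thesis implementation: the function name `pnn_classify`, the two-dimensional toy feature vectors, the speaker labels, and the value `sigma=0.3` are all assumptions chosen for illustration. In the system described above the inputs would be LPCC feature vectors. Equal class priors and a single shared `sigma` are also assumed, so the Gaussian normalization constant is omitted because it does not change which class wins.

```python
import math


def pnn_classify(train, x, sigma):
    """Classify feature vector x with a Gaussian-kernel PNN.

    train: dict mapping a speaker label to a list of feature vectors
    x:     feature vector to classify (same length as the training vectors)
    sigma: smoothing parameter (kernel width); assumed shared by all classes
    Returns (best_label, scores), where scores maps each label to the
    average kernel response of that class (its summation-layer output).
    """
    scores = {}
    for label, vectors in train.items():
        total = 0.0
        for v in vectors:
            # Pattern layer: Gaussian kernel centred on one training vector.
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation layer: average over the class's training vectors.
        scores[label] = total / len(vectors)
    # Decision layer: pick the class with the largest response.
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy data standing in for per-speaker LPCC vectors.
train = {
    "speaker_a": [[0.0, 0.0], [0.2, 0.1]],
    "speaker_b": [[1.0, 1.0], [0.9, 1.2]],
}

label, scores = pnn_classify(train, [0.1, 0.0], sigma=0.3)
print(label, scores)
```

A small `sigma` makes the classifier behave like nearest-neighbour matching, while a large `sigma` blurs the class responses together, which is why the smoothing parameter is a natural quantity to tune against the recognition rate.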