| Graduate student: | 女哲藹 Harisma Khoirun Nisa |
|---|---|
| Thesis title: | 應用於人工電子耳編碼策略之H-ELM架構的語音回響消除法 Speech Dereverberation Based on H-ELM Framework for Cochlear Implant Coding Strategy |
| Advisor: | 吳炤民 Wu, Chao-Min |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication year: | 2020 |
| Graduation academic year: | 109 |
| Language: | English |
| Number of pages: | 73 |
| Chinese keywords: | Hierarchical Extreme Learning Machine (HELM), reverberation, mapping target, masking target, feature learning, cochlear implant |
| Foreign-language keywords: | Hierarchical Extreme Learning Machine (HELM), dereverberation, feature learning, mapping target, masking target, CI strategies |
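In real-world environments, human speech is corrupted by background noise and reverberation, and the effect is even more severe for cochlear implant (CI) users, because reverberation degrades the quality and intelligibility of the speech a cochlear implant receives. The purpose of this study is to use deep learning to improve speech intelligibility and quality. The Hierarchical Extreme Learning Machine (HELM) framework comprises the Original HELM and the Highway HELM, both of which can be trained quickly on a variety of reverberant environments to suppress reverberation effectively. The study used a mapping target and ideal ratio masking (IRM) as HELM training targets, and evaluated HELM performance with the Taiwan Mandarin Hearing in Noise Test (TMHINT) corpus and the short-time objective intelligibility (STOI) measure. The experimental results show that, in terms of STOI, the mapping target raised the score from 0.677 to 0.683, whereas with the masking target the score went from 0.677 to 0.641. The two architectures, however, showed no marked difference in reverberation suppression: Original HELM and Highway HELM improved the score from 0.683 to 0.706 and from 0.683 to 0.707, respectively. Speech dereverberated by the HELM framework was then processed by cochlear implant coding strategies, including the advanced combination encoder (ACE), envelope enhancement (EE), and fundamental frequency modulation (F0mod), to simulate the listening performance of CI users. The results show that the HELM framework with the mapping target can effectively improve speech intelligibility for the ACE and EE strategies.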
In real-world conditions, human speech is distorted by background noise and reverberation, which degrade speech intelligibility and speech quality, especially for cochlear implant (CI) users. Environmental noise, particularly in reverberant conditions, is one of the main challenges to speech understanding for CI users in everyday life. The purpose of this study is to improve the intelligibility and perceived quality of speech using machine learning. The Hierarchical Extreme Learning Machine (HELM) framework, in both its original and Highway variants, attenuated reverberation while learning quickly and effectively. Feature learning with two training targets, mapping and ideal ratio masking (IRM), was applied within this framework to evaluate speech enhancement performance. The Taiwan Mandarin Hearing in Noise Test (TMHINT) dataset and the short-time objective intelligibility (STOI) measure were used to evaluate the HELM framework. The experimental results showed that, in terms of average STOI score, the mapping training target (0.677 to 0.683) attenuated the reverberant effect better than the masking training target (0.677 to 0.641). However, the choice between the original HELM (0.683 to 0.706) and the Highway HELM (0.683 to 0.707) had no significant effect on the result. The dereverberated speech produced by the HELM framework was further processed by cochlear implant sound coding strategies, namely Advanced Combination Encoder (ACE), Envelope Enhancement (EE), and Fundamental Frequency Modulation (F0mod), to simulate the listening performance of CI users. The results showed that the HELM framework with the mapping target could improve speech intelligibility with both the ACE and EE strategies.
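The difference between the two training targets named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: it uses a single-layer extreme learning machine (random, fixed hidden weights and a closed-form ridge solution for the output weights) rather than the multi-layer HELM, and the names `mapping_target`, `irm_target`, and `SingleLayerELM`, the log-magnitude features, the additive toy reverberation model, and the particular IRM formula are all assumptions made for illustration.

```python
import numpy as np


def mapping_target(clean_mag):
    """Mapping target: the (log-compressed) clean magnitude spectrum itself."""
    return np.log1p(clean_mag)


def irm_target(clean_mag, residual_mag, eps=1e-8):
    """Ideal ratio mask in [0, 1] per time-frequency bin (assumed definition)."""
    clean_pow = clean_mag ** 2
    return np.sqrt(clean_pow / (clean_pow + residual_mag ** 2 + eps))


class SingleLayerELM:
    """Single-hidden-layer ELM: random hidden layer, ridge-regressed output layer."""

    def __init__(self, n_hidden=64, reg=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.seed = seed

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, T):
        rng = np.random.default_rng(self.seed)
        # Hidden weights are drawn once and never trained.
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Closed-form ridge solution: beta = (H^T H + reg * I)^-1 H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


# Toy data standing in for magnitude spectrograms (frames x frequency bins).
rng = np.random.default_rng(1)
clean = rng.random((200, 16))
residual = 0.5 * rng.random((200, 16))   # hypothetical reverberant residual
reverberant = clean + residual           # toy additive model, an assumption
features = np.log1p(reverberant)

# Mapping: the network predicts the clean log-magnitude directly.
mapper = SingleLayerELM().fit(features, mapping_target(clean))
enhanced_by_mapping = np.expm1(mapper.predict(features))

# Masking: the network predicts a mask that scales the reverberant input.
masker = SingleLayerELM().fit(features, irm_target(clean, residual))
predicted_mask = np.clip(masker.predict(features), 0.0, 1.0)
enhanced_by_masking = predicted_mask * reverberant
```

In the mapping case the network output is the enhanced spectrum itself, whereas in the masking case the output is bounded to [0, 1] and can only attenuate the reverberant input. Either enhanced spectrum would then be resynthesized to a waveform before being scored with an intelligibility measure such as STOI.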