
Graduate Student: 陳昱安 (Yu-An Chen)
Thesis Title: Acoustic Reverberation Cancellation Based on Deep Neural Network (基於深度學習之殘響消除)
Advisor: 王家慶
Oral Examination Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Year of Publication: 2018
Graduating Academic Year: 106
Language: Chinese
Number of Pages: 62
Chinese Keywords: reverberation; deep learning (殘響; 深度學習)
Usage: Views: 7; Downloads: 0
Sound plays an important role in daily life, but most environments contain reverberation, an issue faced by applications such as video conferencing, distance education, and even mobile communication; the clarity of speech is therefore particularly important.
    Deep neural networks (DNNs) have become a popular approach to signal-processing problems. This thesis designs a new architecture, based on deep networks and different from previous work, that combines an autoencoder with a deep recurrent neural network, called the sequence-to-sequence autoencoder (SA). The method applies the short-time Fourier transform and feeds the resulting magnitude spectrogram into the network model; by jointly considering the temporal relationships among the magnitudes and their own structural information, the network outputs an estimated magnitude, which is then combined with the phase information and mapped back to the time domain. Finally, the proposed method is evaluated on the CHiME-4 and REVERB Challenge 2014 data, and the experimental results show that it outperforms other deep neural networks.
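The processing pipeline the abstract describes (STFT, magnitude enhancement, phase reuse, inverse STFT) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `dereverb_magnitude` is a hypothetical placeholder for the sequence-to-sequence autoencoder, shown here as an identity map, and the frame parameters are assumed values.

```python
import numpy as np
from scipy.signal import stft, istft

def dereverb_magnitude(mag):
    # Placeholder for the sequence-to-sequence autoencoder, which would
    # map a reverberant magnitude spectrogram to an estimated clean one.
    # Identity here, so the pipeline reconstructs its input.
    return mag

def enhance(signal, fs=16000, nperseg=512):
    # 1. Short-time Fourier transform of the reverberant signal.
    _, _, spec = stft(signal, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(spec), np.angle(spec)
    # 2. The network estimates the clean magnitude from the reverberant one.
    mag_hat = dereverb_magnitude(mag)
    # 3. Recombine with the original (reverberant) phase and invert
    #    back to the time domain.
    _, out = istft(mag_hat * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return out

x = np.random.randn(16000)
y = enhance(x)
```

With the identity placeholder, the round trip through STFT and inverse STFT recovers the input (the default Hann window at 50% overlap satisfies the COLA constraint), which is a quick sanity check that only the magnitude-estimation step changes the signal.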

    Table of Contents:
    Chinese Abstract; English Abstract; Table of Contents; List of Figures
    Chapter 1: Introduction
      1-1 Background
      1-2 Research Motivation and Objectives
      1-3 Research Methods and Chapter Overview
    Chapter 2: Related Work
      2-1 Audio Features
        2-1-1 Spectrogram
        2-1-2 Linear Predictive Coefficients
        2-1-3 Mel-spectrum
        2-1-4 Mel-Frequency Cepstral Coefficients (MFCCs)
      2-2 Deep Learning
        2-2-1 Development and Concepts of Neural Networks
        2-2-2 The Perceptron
      2-3 Deep Neural Networks for Dereverberation
      2-4 Dereverberation Based on Deep Denoising Autoencoders
      2-5 Dereverberation with Deep Convolutional Neural Networks
      2-6 Dereverberation Based on Recurrent Neural Networks
    Chapter 3: Dereverberation with the Sequence-to-Sequence Autoencoder
      3-1 Model Architecture
      3-2 Forward Propagation
      3-3 Backpropagation
      3-4 Model Configuration
    Chapter 4: Experimental Design and Results
      4-1 Experimental Environment and Deep Neural Network Settings
      4-2 Comparison with Other Methods
        4-2-1 Training-Set Loss Functions
        4-2-2 Test-Set SDR, SAR, and SNR Comparison
        4-2-3 Spectrograms of Example Audio Files
        4-2-4 Computational Efficiency of the Network Models
    Chapter 5: Conclusions and Future Work
    Chapter 6: References

