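| Graduate student: | 劉振宏 Chen-Hung Liu |
|---|---|
| Thesis title: | 實作於微控制器的深度神經網路聲音事件辨識 A Deep Neural Network for Sound Event Recognition Implemented in Microcontroller |
| Advisor: | 陳慶瀚 |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Executive Master of Computer Science & Information Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Number of pages: | 44 |
| Chinese keywords: | deep neural network, sound event recognition, microcontroller, quantization, deep learning, DS-CNN |
| Foreign-language keywords: | DS-CNN |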
Typical deep neural networks require large amounts of memory and fast floating-point arithmetic, which makes them difficult to deploy on microcontroller-based embedded platforms with very limited hardware resources. Deep neural networks have been applied successfully to sound event recognition. To implement deep sound event recognition on a microcontroller platform, this study proposes a quantization strategy that compresses the deep neural network model and balances recognition performance against hardware resource requirements. The sound event recognition model is built on the depthwise separable convolutional neural network (DS-CNN) architecture and is trained on Mel-frequency cepstral coefficients (MFCC) extracted from the audio. After the quantization procedure, the quantized weight parameters are loaded onto an ARM Cortex-M7 microcontroller for verification. The model trained on a PC platform reaches a recognition rate of 82%. After quantization and porting to the microcontroller unit (MCU), the recognition rate drops to 60% while the recognition time stays at 0.2 seconds. These results confirm that a deep neural network trained on a PC can be ported to an MCU platform with the proposed method while retaining acceptable recognition performance. The results can help extend deep learning to the many applications that run on low-resource hardware.
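The abstract does not spell out the quantization procedure, so the following is only a generic sketch of 8-bit fixed-point weight quantization with a power-of-two scale, a scheme commonly used on Cortex-M class devices. It is not the author's method. The function names (`choose_frac_bits`, `quantize`, `dequantize`) and the sample weights are hypothetical and chosen purely for illustration.

```python
import math


def choose_frac_bits(weights, total_bits=8):
    """Pick the number of fractional bits so the largest weight fits in a
    signed fixed-point integer (hypothetical helper, for illustration)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed for the integer part of the largest magnitude (sign excluded).
    int_bits = math.ceil(math.log2(max_abs))
    return total_bits - 1 - int_bits


def quantize(weights, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round to the nearest integer, and clip to the
    signed range of total_bits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(w * scale))) for w in weights]


def dequantize(q_weights, frac_bits):
    """Map the integers back to real values to inspect the rounding error."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_weights]


# Made-up float weights standing in for one layer of a trained model.
weights = [0.5, -0.25, 0.124, -0.9]
frac_bits = choose_frac_bits(weights)      # 7 fractional bits here
q_weights = quantize(weights, frac_bits)
print(q_weights)                           # [64, -32, 16, -115]
print(dequantize(q_weights, frac_bits))
```

Storing each weight as one signed byte instead of a 32-bit float cuts weight storage to a quarter, at the cost of a rounding error of at most half a quantization step per weight. Errors of this kind accumulate across layers, which is one plausible contributor to an accuracy drop after quantization.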