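A Deep Neural Network for Sound Event Recognition Implemented in Microcontroller | National Central University Master's and Doctoral Theses System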


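Graduate student:	Chen-Hung Liu (劉振宏)
Thesis title:	A Deep Neural Network for Sound Event Recognition Implemented in Microcontroller (實作於微控制器的深度神經網路聲音事件辨識)
Advisor:	陳慶瀚
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Executive Master of Computer Science & Information Engineering
Year of publication:	2019
Academic year of graduation:	107 (ROC calendar)
Language:	Chinese
Number of pages:	44
Keywords (Chinese):	deep neural network, sound event recognition, microcontroller, quantization, deep learning, DS-CNN
Keywords (English):	DS-CNN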

Typical deep neural networks require large amounts of memory and fast floating-point arithmetic, which makes them difficult to deploy on microcontroller-based embedded platforms with very limited hardware resources. Deep neural networks have been applied successfully to sound event recognition. To implement deep sound event recognition on a microcontroller, this study proposes a quantization strategy that compresses the deep neural network model and balances recognition performance against hardware resource requirements. The recognition model uses the depthwise separable convolutional neural network (DS-CNN) architecture and is trained on Mel-frequency cepstral coefficients (MFCC) extracted from the audio. After quantization, the weight parameters are loaded onto an ARM Cortex-M7 microcontroller for verification. The model trained on a PC reaches a recognition rate of 82%. After quantization and porting to the microcontroller unit (MCU), the recognition rate drops to 60% while the recognition time remains 0.2 seconds. The results confirm that this method can port a deep neural network trained on a PC to an MCU while maintaining acceptable recognition performance and recognition rate. The approach can extend deep learning AI technology to the many applications with low hardware resource requirements.
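The abstract does not spell out the quantization procedure, so the following is only a minimal sketch of generic Q-format (fixed-point) weight quantization of the kind commonly used when moving a trained model to an 8-bit integer target. The function names, the 8-bit word size, and the sample weights are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the fractional bit count from the largest magnitude in values.

    One bit is reserved for the sign. A maximum that is an exact power of
    two lands one step above the top code and is saturated by quantize().
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.ceil(math.log2(max_abs))  # may be negative for small values
    return total_bits - 1 - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round half up, and saturate to the signed range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    codes = []
    for v in values:
        q = math.floor(v * scale + 0.5)
        codes.append(max(lo, min(hi, q)))
    return codes


def dequantize(codes, frac_bits):
    """Map integer codes back to real values to inspect the rounding error."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in codes]


# Hypothetical weights, not values from the thesis.
weights = [0.9, -0.45, 0.1, -0.02]
n = choose_frac_bits(weights)
codes = quantize(weights, n)
restored = dequantize(codes, n)
print(n, codes)
```

With a shared power-of-two scale per tensor, rescaling on the microcontroller reduces to bit shifts, which is one reason this format suits integer-only inference. The cost is rounding error of up to half a quantization step per weight, which is one plausible contributor to the accuracy drop the abstract reports.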

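Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
Chapter 2 Technical Review
2.1 From Artificial Neural Networks to Deep Learning
2.2 Convolutional Neural Networks
2.3 Depthwise Separable Convolution
2.4 Quantization of Neural Networks
Chapter 3 Quantization and Pruning of the CNN
3.1 Model Architecture Design and Training
3.2 The Concept of Model Quantization
3.3 Quantization of Weight Parameters
3.3.1 Q Format and Quantization
3.3.2 Quantizing the Weights
3.3.3 Quantizing the Activation Data
3.3.4 Deploying the Quantized Model on the Development Board
Chapter 4 System Integration Experiments
4.1 Software and Hardware Implementation Platforms
4.1.1 Model Training Platform
4.1.2 Development Board Hardware Platform
4.2 Data Preprocessing
4.3 Experiments after Model Training and Quantization
4.3.1 Confusion Matrix
4.3.2 Model Training Process and Results
4.3.3 Effect of the Quantization Range Setting on Model Accuracy
4.4 Verification on the Development Board
4.4 Summary of Experimental Results
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Research Directions
References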


                                

