| Author: | 江鴻儀 Hung-Yi Chiang |
|---|---|
| Title: | An Efficient Accelerator Design for Edge AI: A Reconfigurable Structure for Depthwise Separable Convolution |
| Advisor: | 周景揚 Jing-Yang Jou |
| Committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2021 |
| Academic Year: | 109 |
| Language: | English |
| Pages: | 39 |
| Keywords (Chinese): | 人工智慧加速器, 可重組架構, 輕量化網路 |
| Keywords: | AI accelerator, Reconfigurable Structure, MobileNets |
Convolutional neural networks (CNNs) have been widely applied to computer vision tasks. However, standard CNNs require large numbers of operations and parameters, which poses a challenge for embedded devices. MobileNets, a novel CNN architecture, replaces standard convolution with depthwise separable convolution, substantially reducing operations and parameters with only limited loss in accuracy. MobileNets involves two main computation types, pointwise and depthwise; if a conventional accelerator performs both, hardware utilization suffers because the two operations differ in their parameters and dataflows. In addition, quantization is a common way to lower the computational burden of a neural network by reducing the bit width or by using mixed bit widths, but if hardware of a single fixed precision processes data of different bit widths, the potential savings in computation time cannot be fully realized. Building on MobileNets and quantized networks, this thesis proposes a novel computing architecture that efficiently computes quantized MobileNets, achieving both faster computation and area savings.
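The savings the abstract attributes to depthwise separable convolution can be made concrete with a small operation-count sketch. The layer shapes below are illustrative assumptions, not figures from the thesis: a standard K×K convolution costs K·K·C_in·C_out multiplications per output position, while the depthwise separable form costs K·K·C_in (depthwise) plus C_in·C_out (pointwise), a ratio of 1/C_out + 1/K².

```python
# Multiplication counts for a standard convolution vs. a MobileNets-style
# depthwise separable convolution. Shapes are illustrative assumptions.

def standard_conv_mults(k, c_in, c_out, h, w):
    # One KxK filter per (input channel, output channel) pair,
    # applied at every output position of an HxW feature map.
    return k * k * c_in * c_out * h * w

def depthwise_separable_mults(k, c_in, c_out, h, w):
    depthwise = k * k * c_in * h * w   # one KxK filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 convolution mixing channels
    return depthwise + pointwise

if __name__ == "__main__":
    k, c_in, c_out, h, w = 3, 64, 128, 56, 56
    std = standard_conv_mults(k, c_in, c_out, h, w)
    sep = depthwise_separable_mults(k, c_in, c_out, h, w)
    # For a 3x3 kernel and 128 output channels the separable form needs
    # roughly 1/128 + 1/9 ≈ 12% of the standard convolution's multiplies.
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For a 3×3 kernel the ratio is close to 1/9 once the channel count is large, which is the roughly 8–9× reduction in multiply operations that motivates MobileNets.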