
Graduate student: Tzu-Chieh Chen (陳子捷)
Thesis title: A Reconfigurable Hardware Architecture for Spatial-Temporal Graph Convolutional Network in Action Recognition (用於動作辨識中的時空圖卷積網路之可重構硬體架構設計)
Advisor: Tsung-Han Tsai (蔡宗漢)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111
Language: Chinese
Pages: 50
Keywords: graph convolutional network, hardware accelerator, reconfigurable architecture, ASIC, FPGA


    Human action recognition can leverage various data inputs, including RGB images and skeleton data. Traditional approaches use RGB images as input and employ convolutional neural network (CNN) or recurrent neural network (RNN) models. However, these methods suffer from low accuracy due to the various background noises present in the images. To improve accuracy, some studies have explored skeleton data as an alternative input. Nevertheless, CNN and RNN models cannot fully exploit the spatial and temporal relationships inherent in skeleton data, which limits their effectiveness. In recent years, Graph Convolutional Networks (GCNs) have gained significant attention due to their wide applicability in tasks such as social network analysis and recommendation systems. GCNs are particularly suitable for processing non-Euclidean data such as human skeleton joints, which, unlike RGB images, are unaffected by environmental factors. However, the computational complexity and data sparsity of GCNs often result in high latency and low power efficiency when deployed on CPU or GPU platforms. To address these challenges, dedicated hardware accelerators play a crucial role. In this thesis, we propose a highly parallelized and flexible architecture for the Spatial-Temporal Graph Convolutional Network (ST-GCN), a widely used model in human action recognition. Our architecture incorporates general-purpose Processing Elements (PEs) that can be grouped into combination engines and aggregation engines to compute GCN layers; the same PEs can also be interconnected when processing TCN layers. The proposed method also gives the accelerator good scalability. We implemented the design on both ASIC and FPGA platforms. Compared with other hardware designs for ST-GCN, our proposed method achieves up to a 39.5% reduction in latency and up to a 2.23x improvement in power efficiency.
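The split between combination engines and aggregation engines described above can be sketched numerically. A minimal NumPy sketch follows; the joint count, channel widths, kernel size, and the ReLU are illustrative assumptions, not figures from the thesis, and the dense adjacency stands in for whatever normalized adjacency the model actually uses.

```python
import numpy as np

# One GCN layer over the skeleton graph computes roughly:
#   H' = ReLU(A_hat @ H @ W)
# A_hat: normalized adjacency over the joints, H: per-joint features,
# W: learned weights. The hardware splits this into two stages.
V, C_in, C_out = 18, 3, 64          # joints, input/output channels (illustrative)
rng = np.random.default_rng(0)

A_hat = rng.random((V, V))          # stand-in for the normalized adjacency
H = rng.random((V, C_in))           # per-joint features for one frame
W = rng.random((C_in, C_out))       # layer weights

combined = H @ W                    # "combination engine": feature transform
aggregated = A_hat @ combined       # "aggregation engine": neighbor gathering
out = np.maximum(aggregated, 0.0)   # ReLU activation

# A TCN layer is essentially a 1-D convolution along the time axis;
# shown here for a single joint and channel over T frames.
T, K = 8, 3
x_t = rng.random(T)                 # one joint's feature trace over time
w_t = rng.random(K)                 # temporal kernel
y_t = np.convolve(x_t, w_t, mode="valid")   # length T - K + 1

print(out.shape)                    # (18, 64)
print(y_t.shape)                    # (6,)
```

The two-stage split matters for hardware because the combination step is a dense matrix multiply while the aggregation step follows the sparse graph structure; dedicating PE groups to each stage, as the proposed architecture does, lets both proceed in parallel.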

    Abstract (Chinese)
    Abstract (English)
    1. Introduction
       1.1 Research background and motivation
       1.2 Thesis organization
    2. Literature review
       2.1 Spatial-Temporal Graph Convolutional Network (ST-GCN)
       2.2 ST-GCN hardware accelerators
    3. Hardware architecture design
       3.1 Overall hardware architecture
       3.2 Combination engine module design
       3.3 Aggregation engine module design
       3.4 Cooperation between the combination and aggregation engines
       3.5 Handling TCN-layer computation
    4. Implementation results
       4.1 Layer-by-layer analysis
       4.2 Comparison with related ASIC accelerators
       4.3 Comparison with related FPGA accelerators
    5. Conclusion
    References

