| Graduate student: | 沈桓慶 Huan-Ching Shen |
|---|---|
| Thesis title: | 基於聯合嵌入之雙手配對與追蹤系統 A Hands Pairing and Tracking System base on Associative Embedding |
| Advisor: | 范國清 Kuo-Chin Fan |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 (ROC calendar, 2020–2021) |
| Language: | Chinese |
| Number of pages: | 45 |
| Keywords: | deep learning, detection system, gesture tracking, object detection, neural network, associative embedding |
Hand tracking aims to predict the trajectories of multiple hands across an image sequence. It is important for applications such as in-air handwriting, sign language recognition, and gesture recognition, and grouping the two hands that belong to the same person lets these applications support more complex functions.
This thesis proposes a method based on YOLOv3 and associative embedding: a single-stage neural network model and accompanying algorithm that integrate multi-object tracking and keypoint (joint) detection to track both hands of multiple people in real time.
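
To make the grouping idea concrete, the following is a minimal sketch of how hand detections might be paired using associative-embedding tags. It is not the thesis's implementation. It assumes each detection carries a scalar tag, that two hands of the same person receive similar tag values, and that a greedy closest-first match is acceptable. The names `Hand`, `pair_hands`, and `max_tag_gap` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class Hand:
    # Hypothetical detection record: a box as (x1, y1, x2, y2) plus a scalar
    # embedding tag. Both fields are assumptions made for illustration.
    box: Tuple[float, float, float, float]
    tag: float


def pair_hands(
    hands: List[Hand], max_tag_gap: float = 0.5
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Greedily pair detections whose embedding tags are closest.

    Returns (pairs, singles): index pairs judged to belong to one person,
    and indices of hands left unpaired. max_tag_gap is an assumed threshold.
    """
    # Score every candidate pair by the absolute difference of its tags.
    candidates = sorted(
        (abs(a.tag - b.tag), i, j)
        for (i, a), (j, b) in combinations(enumerate(hands), 2)
    )
    used = set()
    pairs = []
    for gap, i, j in candidates:
        if gap > max_tag_gap:
            break  # all remaining candidates are even farther apart
        if i in used or j in used:
            continue  # each hand can join at most one pair
        used.update((i, j))
        pairs.append((i, j))
    singles = [i for i in range(len(hands)) if i not in used]
    return pairs, singles
```

For example, five detections with tags 1.02, 3.10, 0.95, 2.98, and 5.0 would yield the pairs (0, 2) and (1, 3), with detection 4 left unpaired because no other tag lies within the threshold.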