| Graduate student: | 賴慶榮 Lai, Chin-Rong |
|---|---|
| Thesis title: | 基於深度學習之人形偵測以實現空中手寫與行人姿態辨識 Human Body Detection Based on Deep Learning to Facilitate Air Writing and Pedestrian Gait Recognition |
| Advisor: | 范國清 |
| Oral examination committee: | |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2024 |
| Academic year of graduation: | 112 |
| Language: | Chinese |
| Number of pages: | 82 |
| Keywords (Chinese): | 空中手寫, 行人姿態 |
| Keywords (English): | Air Writing, Pedestrian Gait |
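With the rapid development of smart technologies, human gesture recognition has become a popular research field. Gesture recognition is the ability of a computer or smart device to detect and interpret the meaning of human gestures. These gestures, which include movements of the hands or body, facial expressions, and even voice commands, can all be used to control devices or human-machine interfaces. Air-writing is a new way for humans to communicate with smart devices that lets users communicate and issue commands in a natural, continuous manner. Gait recognition is another application area, used in healthcare and security surveillance. Recent machine learning techniques can be applied to research on both technologies and used to analyze and interpret the data they capture.
Compared with other writing methods, air-writing has distinctive characteristics: redundant pen-lifting strokes, multiplicity (a single character can be written in many ways), and confusion (different characters can have similar trajectories). These make it more challenging than other writing methods. We propose a novel reverse time-ordered algorithm that requires no starting trigger action or stroke, efficiently filters out unnecessary pen-lifting strokes, and simplifies the complex stroke-trajectory matching procedure. We then design a three-tier structure that samples the air-writing trajectory at different sampling rates to address multiplicity and confusion. The proposed reverse time-ordered stroke-trajectory recognition method reaches an accuracy above 94%.
For pedestrian gait recognition, we use a deep neural network to achieve automatic detection and recognition. To capture the movement of the pedestrian's skeleton and joints, the input is a sequence of pedestrian color images rather than data from wearable devices. We then use a convolutional neural network (CNN) to locate the pedestrian and extract the pedestrian's dense optical flow as low-level features; together these serve as input to the next stage. Next, a fine-tuned wide residual network (Wide Residual Network) extracts high-level abstract features. In addition, to overcome the difficulty that a two-dimensional (2D) CNN cannot capture local temporal features, we introduce a partial three-dimensional (3D) convolutional structure. This design enables effective feature extraction in physical environments with limited memory and improves the performance of the deep neural network (DNN). Experimental results show that the proposed pedestrian detection and recognition method performs well.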
With the rapid development of intelligent technologies, gesture recognition has become a popular research area. It is the ability of a computer or smart device to detect and interpret human gestures. Such gestures, including movements of the hands or body, facial expressions, and even voice commands, can be used to control devices or interfaces. Air-writing is a new approach to communication between humans and smart devices that lets users write input in a natural, continuous way. Gait recognition is another application, used in healthcare and surveillance. Machine learning can be applied to both applications to analyze and interpret the captured data.
Compared with other writing methods, air-writing is more challenging because of its unique characteristics: redundant lifting strokes, multiplicity (a single character can be written in many ways), and confusion (different characters can have similar trajectories). We propose a novel reverse time-ordered algorithm that needs no starting trigger, efficiently filters out unnecessary lifting strokes, and thereby simplifies the matching procedure. We then propose a three-tier structure that samples the air-writing trajectory at several sampling rates to address the multiplicity and confusion problems. The recognition accuracy of the proposed approach exceeds 94%.
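The abstract does not spell out how the trajectory is sampled at each tier. As a minimal sketch, assume each tier resamples the same stroke to a different number of points spaced evenly along its arc length, after reversing the point order so that the last written point comes first. The function names `resample` and `tiered_samples` and the rates 8, 16, and 32 are hypothetical, and the filtering of lifting strokes is not implemented here.

```python
import math

def resample(points, n):
    """Resample a 2D polyline to n points evenly spaced along its arc length."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two points and n >= 2")
    # Cumulative arc length at each input point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def tiered_samples(points, rates=(8, 16, 32)):
    """Resample the reverse-time-ordered trajectory once per rate (assumed tiers)."""
    reversed_points = points[::-1]  # last written point first
    return {n: resample(reversed_points, n) for n in rates}

# A straight stroke from (0, 0) to (4, 0), sampled coarsely and more finely.
tiers = tiered_samples([(0, 0), (4, 0)], rates=(3, 5))
print(tiers[3])  # [(4.0, 0.0), (2.0, 0.0), (0.0, 0.0)]
```

Under this assumption, a coarse tier keeps only the overall shape of a character, while finer tiers keep the detail needed to separate characters with similar trajectories.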
For gait recognition, we apply a deep neural network (DNN) to achieve gait-based automatic pedestrian detection and recognition. Instead of using wearable devices to precisely capture skeletal and joint movements, we use pedestrian color-image sequences as input. A pretrained convolutional neural network (CNN) first locates the pedestrian, and the pedestrian's dense optical flow is extracted to serve as low-level feature input. A fine-tuned DNN based on the wide residual network then extracts high-level abstract features. In addition, to overcome the difficulty of obtaining local temporal features with a 2D CNN, part of a 3D convolutional structure is introduced into the CNN. This design makes it possible to extract more effective features under limited memory and improves DNN performance. The experimental results show that the proposed method performs well for pedestrian detection and recognition.
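To illustrate why a kernel with temporal extent can capture local temporal features that a per-frame 2D kernel cannot, here is a small pure-Python sketch of a valid 3D cross-correlation. It is not the network from the thesis; the function `conv3d_valid`, the toy clip, and both kernels are illustrative assumptions.

```python
def conv3d_valid(clip, kernel):
    """Valid (no padding) 3D cross-correlation of clip[t][y][x] with kernel[dt][dy][dx]."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# Two 2x3 frames: a bright pixel moves one step to the right.
clip = [
    [[1, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0]],
]

# A kernel with temporal depth 1 behaves like a 2D convolution on each frame.
per_frame = conv3d_valid(clip, [[[1, 1, 1]]])
print(per_frame)  # [[[1], [0]], [[1], [0]]]  (both frames look identical)

# A kernel spanning two frames (next frame minus current frame) exposes the motion.
motion = conv3d_valid(clip, [[[-1]], [[1]]])
print(motion)  # [[[-1, 1, 0], [0, 0, 0]]]
```

The per-frame kernel gives the same response for both frames, so the movement is invisible to it, while the two-frame kernel responds exactly where the pixel left and where it arrived.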