| Graduate Student: | Ashiq Hussain Teeli |
|---|---|
| Thesis Title: | Advancing Human Action Recognition for Precision Assembly Using Vision and Mechanomyography Signals |
| Advisor: | Lin, Chin-Te (林錦德) |
| Committee Members: | |
| Degree: | Master |
| Department: | Department of Mechanical Engineering, College of Engineering |
| Year of Publication: | 2024 |
| Graduation Academic Year: | 113 (ROC calendar) |
| Language: | English |
| Pages: | 68 |
| Keywords: | Human Action Recognition, Precision Assembly, Wearable Sensors, Hand Action Recognition, Deep Learning Model, Transition Instability, Industrial Safety, Human-Robot Collaboration, Fine-Action Recognition |
Abstract (Chinese, translated)
This study focuses on the challenge of continuous human action recognition in precision assembly environments, specifically addressing the limitations of camera-only approaches in recognizing small actions, in transition instability, and in overall accuracy. To this end, a novel multi-module system is proposed, comprising a camera that recognizes body actions, a wristband that recognizes fine hand actions, and a secondary camera that verifies whether hand action recognition meets the activation conditions. These signals are fed to the body and hand action recognition modules, and a decision module finally integrates the two outputs to produce better recognition results. System performance was evaluated through practical tasks, including LEGO car assembly and electronic connector assembly, and three deep learning models were compared: AE+LSTM, LSTM+Attention, and plain LSTM. The results show that the LSTM+Attention model performed best in both hand and body action recognition. Furthermore, the proposed method significantly improves recognition of both large-scale body actions and small hand actions, and markedly outperforms a camera-only system in fine-action recognition. Finally, the decision module effectively manages transition instability and improves the overall reliability of the HAR system. In summary, this research offers a robust solution for precision assembly environments and contributes to the HAR field, with the potential to improve safety, efficiency, and human-robot collaboration in industrial settings. Future work should focus on refining the algorithms to better handle noise, and on ensuring that the HAR system remains user-friendly and effective in dynamic industrial environments.
Abstract
This study addresses the challenges of continuous Human Action Recognition (HAR) in
precision assembly environments, focusing on the limitations of camera-based systems in
recognizing small actions, transition instability, and overall accuracy. To this end, a novel
multi-module system is proposed, including a camera that recognizes body movements, a
bracelet that recognizes precise hand movements, and a secondary camera that confirms
whether hand movement recognition meets the startup conditions. The sensing signals are input
to the body and hand action recognition modules, and finally the decision-making module
integrates their outputs to form better recognition results. This research employed an
experimental approach using LEGO car assembly and electronic connector assembly tasks to
evaluate the performance of the system. Three deep learning models, AE + LSTM, LSTM +
Attention, and LSTM, were compared. The results show that the LSTM + Attention model
demonstrated superior performance in both hand and body action recognition. The proposed method also achieved significant improvements in recognizing both large-scale body movements and small hand actions, with the wearable sensor outperforming the camera-based system in fine-action recognition.
Finally, the decision-making model effectively managed transition instability and enhanced the
overall reliability of the HAR system. This research contributes to the field of HAR by
proposing a robust solution for precision assembly environments, potentially improving safety,
efficiency, and human-robot collaboration in industrial settings. Future work should focus on
refining the algorithms to better handle noise. Additionally, emphasis should be placed on
ensuring that the HAR system is user-friendly and effective in dynamic industrial settings.
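The abstract summarizes the decision-making module only at a high level; its exact rules are given in the thesis body. As an illustration only, the Python sketch below shows two plausible ingredients of such a module: a sliding-window majority vote that suppresses the brief label flicker seen at action transitions, and a gating rule that prefers the wristband's fine hand label when the secondary camera confirms the hand meets the activation condition. The function names and the `window` parameter are hypothetical and not taken from the thesis.

```python
from collections import Counter, deque

def smooth_predictions(frame_labels, window=5):
    """Majority-vote filter over a sliding window of per-frame labels.

    Suppresses short spurious label flips at action transitions; a simple
    stand-in for the transition-instability handling described in the
    abstract, not the thesis's actual algorithm.
    """
    smoothed = []
    buf = deque(maxlen=window)  # keeps only the last `window` labels
    for label in frame_labels:
        buf.append(label)
        # Most frequent label in the current window wins.
        smoothed.append(Counter(buf).most_common(1)[0][0])
    return smoothed

def fuse(body_label, hand_label, hand_active):
    """Hypothetical fusion rule: trust the wristband's fine hand action
    when the secondary camera confirms the activation condition,
    otherwise fall back to the camera's body-action label."""
    return hand_label if hand_active else body_label
```

For example, `smooth_predictions(["A", "A", "B", "A", "A", "A"], window=3)` removes the single-frame flip to `"B"`, returning a constant `"A"` sequence.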
List of Figures
Figure 1 Components used in Assembly 2
Figure 2 Single Layer Neural Network [1] 4
Figure 3 The overall structural design of the CNN model [3] 5
Figure 4 Auto-Encoder architecture diagram [5] 6
Figure 5 Recurrent network system [6] 7
Figure 6 Architecture of the LSTM model [7] 8
Figure 7 Mediapipe hand tracking [10] 9
Figure 8 MMG signals [11] 10
Figure 9 Overview of the human machine interaction system [12] 11
Figure 10 Module diagram of the system [13] 12
Figure 11 Timeline and framework of the system [14] 13
Figure 12 Discrete and continuous action recognition comparison [15] 14
Figure 13 Recognized gestures in [17] 15
Figure 14 System architecture of continuous action recognition 16
Figure 15 ZED 2 binocular vision camera 17
Figure 16 Body key points 18
Figure 17 CoolSo wearable device 18
Figure 18 LOGITECH HD Camera 20
Figure 19 Hand key points in rectangular boundary 20
Figure 20 Relationship between computers 22
Figure 21 Skeleton frame in the sequence 23
Figure 22 Data collection system 25
Figure 23 Gesture data collection system 26
Figure 24 Auto Encoder + LSTM model architecture 30
Figure 25 LSTM + Attention model architecture 31
Figure 26 LSTM model architecture 32
Figure 27 Five hand action recognition for this experiment 34
Figure 28 Four body action recognition for this experiment 35
Figure 29 LEGO Car assembly recipe 37
Figure 30 Effect of Sequence Length 39
Figure 31 Training and validation loss graph, along with a confusion matrix derived from the test dataset for a hand action recognition model 40
Figure 32 Training and validation loss graph, along with a confusion matrix derived from the test dataset for body action recognition model 41
Figure 33 Algorithm of decision module 43
Figure 34 Sequence diagram of prediction results of real time continuous action recognition 44
Figure 35 Tiny parts flex and terminal used in connector assembly 45
Figure 36 Confusion matrix derived from the bracelet data input 46
Figure 37 Confusion matrix derived from the camera data input 47
Figure 38 Action A0 (Pick up Flex) and A2 (Install Flex) 48