
Author: Ashiq Hussain Teeli (泰利)
Title: Advancing Human Action Recognition for Precision Assembly Using Vision and Mechanomyography Signals
Advisor: Lin, Chin-Te (林錦德)
Oral defense committee:
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2024
Academic year of graduation: 113
Language: English
Pages: 68
Chinese keywords: 人體動作辨識、精確組裝、穿戴式感測器、手部動作辨識、深度學習模型、過渡不穩定性、工業安全、人機協作、精細動作辨識
Foreign keywords: Human Action Recognition, Precision Assembly, Wearable Sensors, Hand Action Recognition, Deep Learning Model, Transition Instability, Industrial Safety, Human-Robot Collaboration, Fine-Action Recognition
Hits: 20; Downloads: 0
  • Chinese Abstract
    This study focuses on the challenge of continuous human action recognition in precision
    assembly environments, specifically addressing the limitations of camera-only approaches
    in recognizing small actions, in transition instability, and in overall accuracy. To this
    end, a novel multi-module system is proposed, comprising a camera that recognizes body
    actions, a bracelet that recognizes fine hand actions, and a secondary camera that
    confirms whether hand action recognition meets the activation conditions. These signals
    are fed to the body and hand action recognition modules, and a decision module finally
    integrates their outputs to produce better recognition results. System performance was
    evaluated on practical tasks, including LEGO car assembly and electronic connector
    assembly, and three deep learning models, AE+LSTM, LSTM+Attention, and plain LSTM, were
    compared. The results show that the LSTM+Attention model performed best in both hand and
    body action recognition. The proposed method also significantly improved recognition of
    both large-scale body actions and small hand actions, and clearly outperformed a
    camera-only system in fine-action recognition. Finally, the decision model effectively
    managed transition instability and improved the overall reliability of the HAR system.
    In summary, this research presents a robust solution for precision assembly environments
    and contributes to the HAR field, with the potential to improve safety, efficiency, and
    human-robot collaboration in industrial settings. Future work should focus on refining
    the algorithms to better handle noise, and on ensuring that the HAR system remains
    user-friendly and effective in dynamic industrial environments.
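
    The abstract summarizes the decision module but this record does not specify its
    algorithm. As a minimal illustrative sketch only (the fusion rule, function names, and
    window size are assumptions, not the thesis's actual method), one common way to combine
    two recognizers and damp flicker at action transitions is a gated choice followed by a
    sliding-window majority vote:

    ```python
    from collections import Counter, deque

    def fuse(body_label: str, hand_label: str, hand_active: bool) -> str:
        """Prefer the fine-grained hand prediction when the secondary
        camera confirms the hand meets the activation condition."""
        return hand_label if hand_active else body_label

    class Smoother:
        """Majority vote over the last `window` fused labels, damping
        frame-level instability during action transitions."""
        def __init__(self, window: int = 5):
            self.history = deque(maxlen=window)

        def update(self, label: str) -> str:
            self.history.append(label)
            return Counter(self.history).most_common(1)[0][0]

    smoother = Smoother(window=5)
    stream = ["reach", "reach", "grasp", "reach", "reach", "reach"]
    smoothed = [smoother.update(lbl) for lbl in stream]
    print(smoothed)  # the single spurious "grasp" frame is voted away
    ```

    A real system would also weight each recognizer by its confidence; the vote above is
    just the simplest debouncing strategy consistent with the behavior the abstract describes.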


    Abstract
    This study addresses the challenges of continuous Human Action Recognition (HAR) in
    precision assembly environments, focusing on the limitations of camera-based systems in
    recognizing small actions, transition instability, and overall accuracy. To this end, a novel
    multi-module system is proposed, including a camera that recognizes body movements, a
    bracelet that recognizes precise hand movements, and a secondary camera that confirms
    whether hand movement recognition meets the startup conditions. The sensing signals are input
    to the body and hand action recognition modules, and finally the decision-making module
    integrates their outputs to form better recognition results. This research employed an
    experimental approach using LEGO car assembly and electronic connector assembly tasks to
    evaluate the performance of the system. Three deep learning models, AE + LSTM, LSTM +
    Attention, and LSTM, were compared. The results show that the LSTM + Attention model
    demonstrated superior performance in both hand and body action recognition. Significant
    improvements were also observed in recognizing both large-scale body movements and small
    hand actions, with the wearable sensor outperforming the camera-based system in
    fine-action recognition.
    Finally, the decision-making model effectively managed transition instability and enhanced the
    overall reliability of the HAR system. This research contributes to the field of HAR by
    proposing a robust solution for precision assembly environments, potentially improving safety,
    efficiency, and human-robot collaboration in industrial settings. Future work should focus on
    refining the algorithms to better handle noise. Additionally, emphasis should be placed on
    ensuring that the HAR system is user-friendly and effective in dynamic industrial settings.
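
    This record gives no architectural details for the LSTM + Attention model it compares.
    The following is only a generic sketch of the attention-pooling idea such models use
    (the dimensions, names, and dot-product scoring rule are assumptions): per-timestep LSTM
    outputs are scored, softmax-normalized, and averaged into one context vector for the
    classifier head.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention_pool(hidden, w):
        """Collapse per-timestep LSTM outputs into one context vector.

        hidden : (T, d) hidden states for a T-frame action window
        w      : (d,)   learned scoring vector
        """
        weights = softmax(hidden @ w)      # (T,) attention weights, sum to 1
        return weights @ hidden, weights   # (d,) weighted-average context

    rng = np.random.default_rng(0)
    H = rng.normal(size=(30, 64))          # e.g. 30 frames of 64-dim LSTM states
    w = rng.normal(size=64)
    context, weights = attention_pool(H, w)
    print(context.shape)                   # (64,) context vector for the classifier
    ```

    The appeal over a plain LSTM, consistent with the abstract's result, is that the model
    can emphasize the few informative frames of a short, fine hand action rather than relying
    only on the final hidden state.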

    Abstract
    中文摘要
    Acknowledgment
    Content
    Figures
    Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Objective
      1.3 Thesis Outline
    Chapter 2 Literature Review
      2.1 Deep Learning
        2.1.1 Convolutional Neural Networks
        2.1.2 Recurrent Neural Networks
        2.1.3 Attention Models
        2.1.4 MediaPipe
      2.2 Data Capture Techniques
        2.2.1 Mechanomyography (MMG)
        2.2.2 Body Tracking
      2.3 Industrial Applications
      2.4 Related Studies
      2.5 Synthesis of Literature
    Chapter 3 Methodology
      3.1 Research Design
      3.2 Equipment Used in This Study
        3.2.1 ZED 2 Camera
        3.2.2 CoolSo Bracelet
        3.2.3 Simple Camera
        3.2.4 Computing Environment
      3.3 Dataset
        3.3.1 ZED 2 Camera
        3.3.2 CoolSo Bracelet
        3.3.3 Pre-processing
      3.4 Deep Learning Models
        3.4.1 Auto-Encoder and LSTM Model
        3.4.2 LSTM and Attention Model
        3.4.3 LSTM Model
    Chapter 4 Experiment and Result
      4.1 Dataset
      4.2 Action Sequence and Experimental Setup
      4.3 Training Parameters
      4.4 Experimental Results
        4.4.1 Hand Action Recognition Model
        4.4.2 Body Action Recognition Model
        4.4.3 Decision Making Model
        4.4.4 Results and Analysis of Real-Time Action Recognition
      4.5 Precision Electronic Connector Assembly
    Chapter 5 Discussion
      5.1 Deep Learning Model
      5.2 HAR System
      5.3 Decision Model
      5.4 System Effectiveness and General Applicability
    Chapter 6 Conclusion and Future Work
      6.1 Contribution
      6.2 Future Work
    References

    Figure 1 Components used in Assembly
    Figure 2 Single Layer Neural Network [1]
    Figure 3 The overall structural design of the CNN model [3]
    Figure 4 Auto-Encoder architecture diagram [5]
    Figure 5 Recurrent network system [6]
    Figure 6 Architecture of the LSTM model [7]
    Figure 7 MediaPipe hand tracking [10]
    Figure 8 MMG signals [11]
    Figure 9 Overview of the human-machine interaction system [12]
    Figure 10 Module diagram of the system [13]
    Figure 11 Timeline and framework of the system [14]
    Figure 12 Discrete and continuous action recognition comparison [15]
    Figure 13 Recognized gestures in [17]
    Figure 14 System architecture of continuous action recognition
    Figure 15 ZED 2 binocular vision camera
    Figure 16 Body key points
    Figure 17 CoolSo wearable device
    Figure 18 LOGITECH HD Camera
    Figure 19 Hand key points in rectangular boundary
    Figure 20 Relationship between computers
    Figure 21 Skeleton frame in the sequence
    Figure 22 Data collection system
    Figure 23 Gesture data collection system
    Figure 24 Auto-Encoder + LSTM model architecture
    Figure 25 LSTM + Attention model architecture
    Figure 26 LSTM model architecture
    Figure 27 Five hand actions recognized in this experiment
    Figure 28 Four body actions recognized in this experiment
    Figure 29 LEGO car assembly recipe
    Figure 30 Effect of Sequence Length
    Figure 31 Training and validation loss curves, with a confusion matrix on the test dataset, for the hand action recognition model
    Figure 32 Training and validation loss curves, with a confusion matrix on the test dataset, for the body action recognition model
    Figure 33 Algorithm of the decision module
    Figure 34 Sequence diagram of prediction results for real-time continuous action recognition
    Figure 35 Tiny parts (flex and terminal) used in connector assembly
    Figure 36 Confusion matrix derived from the bracelet data input
    Figure 37 Confusion matrix derived from the camera data input
    Figure 38 Action A0 (Pick up Flex) and A2 (Install Flex)
