
Author: Jhih-Ming Wu
Thesis Title: A reinforcement learning based motion tracking approach for remote humanoid robot manipulation
Advisor: Po-Lei Lee
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2022
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 89
Keywords: IMU, Inverse kinematics, Reinforcement learning, Motion retargeting
This study uses wearable inertial measurement units (IMUs) to obtain joint rotation information. The IMU data are fed into Unity and processed to reconstruct the human skeletal posture, and a motion capture system is designed to record time-series data of human movement. Using motion retargeting, the system controls a Nao V6 humanoid robot through the captured human posture; the operator wears a VR head-mounted display to see through the robot's camera for an immersive experience, and a voice system enables audio communication between the operator side and the robot side. The Nao V6 carries several joint sensors, and their feedback, combined with a linear inverted pendulum dynamic model, achieves stable walking. For the robot's safety, foot control is threshold-triggered: forward stepping, sidestepping, and turning are activated only when the operator's posture exceeds preset thresholds. Because hand gestures serve many everyday needs, hand control must be more precise. This study retargets human arm posture to the robot's arm control using two methods, inverse kinematics and reinforcement learning, and compares their respective strengths and weaknesses. The inverse kinematics method builds a Denavit-Hartenberg (D-H) parameter model for the robot arm, maps the Cartesian coordinates of the current human posture into the robot's coordinate space, and back-calculates the robot's joint angles from the current arm position via the inverse kinematics solution, driving the robot to perform the corresponding motion. The reinforcement learning method adopts an actor-critic network: six target motions are pre-designed on the robot side (cheering, waving, pointing, palms together, saluting, and face wiping), and through a reward-and-penalty mechanism the network learns autonomously to generate robot postures from the subject's posture data. Experiments show that the system recognizes the operator's motions in real time, that the robot can be controlled smoothly and performs the motions correctly, and that the proposed model generalizes to some unlearned motions. By average Fréchet distance analysis, the system's average trajectory error is about 1.9 cm, indicating high stability in retargeting control.
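The D-H based back-calculation described above starts from a forward model: each joint contributes one homogeneous transform, and chaining them gives the end-effector pose that the inverse solution must match. A minimal sketch follows; the `(d, a, alpha)` values used in the test are illustrative placeholders, not the Nao V6 arm's actual D-H table.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one joint in the standard D-H convention."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_params):
    """Chain the per-joint transforms; returns the 4x4 end-effector pose.

    dh_params is a list of (d, a, alpha) link tuples (placeholders here)."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        T = T @ dh_transform(theta, d, a, alpha)
    return T
```

An inverse kinematics solver then searches for the `joint_angles` whose forward pose matches the mapped Cartesian target of the operator's hand.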


    This study uses inertial measurement unit (IMU) data to reconstruct the human skeletal posture in Unity. A self-designed motion capture system records the time-series trajectory data of human motion. In this system, the operator controls a Nao V6 remotely through his or her own posture using a motion retargeting method, and views the scene through the robot's camera via a VR headset, giving the user an immersive experience. An audio system is designed for voice communication between the operator side and the robot side.
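The skeleton reconstruction step above turns each IMU's orientation reading, typically a unit quaternion, into a joint rotation. A minimal sketch of that conversion (the function name is illustrative, not from the thesis):

```python
import numpy as np

def quat_to_matrix(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    n = np.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize against sensor noise
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
```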
    Smooth default foot movement on the Nao V6 is achieved through feedback from the sensors mounted on the robot together with a linear inverted pendulum model. Considering the safety of the robot, foot control actions such as moving forward, moving sideways, and turning are triggered by thresholds. People use many different gestures to meet the varied requirements of daily life, which is why gesture control must be more sophisticated. Two motion retargeting methods are presented and compared in this research: inverse kinematics and reinforcement learning.
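The threshold-triggered foot control can be sketched as a simple mapping from continuous posture readings to discrete gait commands. The threshold values and command names below are illustrative assumptions, not the thesis's tuned settings:

```python
def foot_command(forward_lean, side_lean, torso_yaw,
                 lean_th=0.25, yaw_th=0.6):
    """Map continuous posture readings (radians) to a discrete gait command.

    A command fires only when the reading exceeds its threshold, so small
    posture noise never moves the robot (a safety consideration)."""
    if abs(torso_yaw) > yaw_th:
        return "turn_left" if torso_yaw > 0 else "turn_right"
    if forward_lean > lean_th:
        return "move_forward"
    if abs(side_lean) > lean_th:
        return "move_left" if side_lean > 0 else "move_right"
    return "stand"
```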
    The inverse kinematics method builds a Denavit-Hartenberg parameter model for each robot arm and maps the Cartesian coordinates of the current human posture into the robot's coordinate space. The robot's joint angles are then back-calculated from the current human arm position via the inverse kinematics solution. The reinforcement learning method adopts an actor-critic network. For model training, six target motions are pre-designed on the robot; during training, human gestures generate the robot's gestures, and the model parameters are updated by reward and penalty rules. Experiments show that the proposed system recognizes the operator's motions in real time and that the robot performs them smoothly and correctly; the model also generalizes to some unlearned motions. According to average Fréchet distance analysis, the average trajectory error of the system is about 1.9 cm, showing high stability in motion retargeting control.
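The evaluation metric above, the discrete Fréchet distance between the operator's and the robot's trajectories, is commonly computed with the Eiter-Mannila dynamic program; a minimal sketch:

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two point sequences P and Q.

    ca[i, j] holds the smallest possible maximum pairwise distance over
    all monotone couplings of P[:i+1] and Q[:j+1]."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    ca = np.empty((n, m))
    ca[0, 0] = np.linalg.norm(P[0] - Q[0])
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], np.linalg.norm(P[i] - Q[0]))
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], np.linalg.norm(P[0] - Q[j]))
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           np.linalg.norm(P[i] - Q[j]))
    return ca[-1, -1]
```

Averaging this distance over many retargeted trajectories gives a single stability figure of the kind the thesis reports (about 1.9 cm).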

    Table of Contents: Abstract (Chinese), Abstract (English), Contents, List of Figures, List of Tables
    Chapter 1: Introduction
      1-1 Research Motivation and Objectives
      1-2 Literature Review
        1-2-1 Human Posture Research
        1-2-2 Motion Retargeting
        1-2-3 Robot Motion Retargeting
        1-2-4 Reinforcement Learning
      1-3 Thesis Organization
    Chapter 2: Principles
      2-1 Inertial Measurement Unit
      2-2 Quaternions and Euler Angles
        2-2-1 Quaternions
        2-2-2 Euler Angles and Rotation Matrices
      2-3 The Nao Robot
      2-4 Robot Kinematics
      2-5 Variational Autoencoder
      2-6 Reinforcement Learning
        2-6-1 Introduction to Reinforcement Learning
        2-6-2 Actor-Critic Algorithm
        2-6-3 Proximal Policy Optimization (PPO)
      2-7 Fréchet Distance
    Chapter 3: Research Design and Methods
      3-1 System Architecture
        3-1-1 Operator Remote Control System
        3-1-2 Human Motion Capture System
        3-1-3 Robot System
      3-2 System Data Processing
        3-2-1 IMU-to-Segment (I2S) Calibration
        3-2-2 Data Preprocessing
        3-2-3 Data Transmission
      3-3 System Design
        3-3-1 Hand Control (Reinforcement Learning)
        3-3-2 Hand Control (Inverse Kinematics)
        3-3-3 Audio Control
        3-3-4 Head Control
        3-3-5 Foot Control
    Chapter 4: Results and Discussion
      4-1 Coordinate Axis Transformation
      4-2 Variational Autoencoder
      4-3 Actor-Critic Network
      4-4 Retargeting Performance
      4-5 System Integration and Application
    Chapter 5: Conclusions and Future Work
      5-1 Conclusions
      5-2 Future Work
    Chapter 6: References

