
Graduate Student: Yuan Hung Lee (李沅紘)
Thesis Title: A Novel Multi-Task-Agents Reinforcement Learning with Multi-Dimensional Action Space
Advisor: Timothy K. Shih (施國琛)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2020
Graduation Academic Year: 108
Language: English
Pages: 44
Keywords: machine learning, reinforcement learning, multi-dimensional action space, multi-agent, StarCraft II
Abstract (translated from Chinese): Owing to major advances in hardware, reinforcement learning (RL) has become practical to implement, has brought new breakthrough techniques to the AI field, and has grown into a popular research area. It performs well in environments that are difficult to predict. Most prior RL research focuses on a single agent in a one-dimensional, small action space, whereas multiple agents can handle larger environments with richer interaction. One challenging aspect of RL is interactive behavior across multiple tasks. In this thesis, we propose a new model in which multiple agents cooperate to achieve high scores in multi-task, multi-dimensional action-space environments. We also propose a feasible way to decompose the huge action space, reducing memory requirements and computation time while improving performance. Performance evaluations are conducted on the StarCraft II platform using mini-games to demonstrate effectiveness. Experimental results show that the proposed method significantly outperforms the compared models on all metrics.


    Owing to the great advances in hardware, RL (reinforcement learning) has become practical to implement and a popular technique for interacting with unpredictable environments. Most prior studies of successful RL models focus only on the interaction between a single agent and an environment with a single task and a small action space. However, multiple agents can solve a wider range of problems. One challenging RL task is interaction in a multi-task environment. In this paper, we propose a novel RL model that interacts well with multi-task, multi-dimensional action-space environments through the cooperation of agents. In addition, we propose a feasible way to decompose the action space, which reduces memory size and computation and improves performance. Performance evaluations are conducted on the StarCraft II platform to demonstrate effectiveness on mini-games. The experimental results show that the proposed methods significantly outperform the state-of-the-art models on all metrics.
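    The abstract's core idea of decomposing a large action space can be illustrated with a minimal sketch. The thesis's actual decomposition is not spelled out in this record, so the following assumes a factorized categorical policy: instead of one output head over the full Cartesian product of action dimensions, each dimension gets its own small head, and the joint log-probability is the sum of the per-dimension log-probabilities. The dimension sizes (a function id plus x/y screen coordinates, as in StarCraft II mini-games) and all names are illustrative.

    ```python
    import numpy as np

    # Illustrative per-dimension action counts (e.g. function id, x, y).
    DIM_SIZES = [256, 84, 84]

    joint_size = int(np.prod(DIM_SIZES))    # one head over the product space
    factored_size = int(np.sum(DIM_SIZES))  # one small head per dimension

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def sample_factored(logits_per_dim, rng):
        """Sample each dimension independently; the joint log-probability
        is the sum of the per-dimension log-probabilities."""
        action, logp = [], 0.0
        for logits in logits_per_dim:
            p = softmax(logits)
            a = int(rng.choice(len(p), p=p))
            action.append(a)
            logp += np.log(p[a])
        return action, logp

    rng = np.random.default_rng(0)
    logits = [rng.standard_normal(n) for n in DIM_SIZES]
    action, logp = sample_factored(logits, rng)
    print(joint_size, factored_size)  # 1806336 vs. 424 output units
    ```

    The memory saving is the point: a joint head would need 256 × 84 × 84 = 1,806,336 output units, while the factorized heads need only 256 + 84 + 84 = 424, at the cost of assuming (conditional) independence between dimensions.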

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    List of Tables
    List of Figures
    Symbols
    1. Introduction
    2. Related Work
       2.1 RL in a Single-Dimension Action Space
       2.2 RL in a Multi-Dimensional Action Space
       2.3 RL with Multiple Agents
    3. Preliminary
       3.1 Notation
       3.2 Problem Definition
    4. Proposed RL: R2-PPO
       4.1 Feature Extraction and Reward Function
       4.2 Slave Training Model
           4.2.1 Action Space Decomposition
           4.2.2 Proximal Policy Optimization (PPO)
           4.2.3 Multi-Dimensional Action Space PPO
       4.3 Master Training Model
    5. Performance Evaluation
       5.1 Experiment Setup
       5.2 R2-PPO Performance
       5.3 The Effectiveness of R2-PPO Training Episodes
    6. Conclusion
    References

