
Graduate Student: Dai-Long Lee (李岱龍)
Thesis Title: A Multi-Agent Reinforcement Learning Framework for Datacenter Traffic Optimization
Advisor: Min-Te Sun (孫敏德)
Committee Members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2020
Graduation Academic Year: 108
Language: English
Pages: 74
Chinese Keywords (translated): Multi-Agent Reinforcement Learning, Datacenter Traffic Control
Foreign Keywords: Multi-Agent, Reinforcement Learning, Datacenter, Traffic Control
    Abstract (Chinese, translated): Datacenter traffic optimization has been a popular research topic for years. Traditional optimization algorithms are implemented mainly on the basis of datacenter operators' rules of thumb and their understanding of the network environment. However, as today's network environments grow increasingly complex and change rapidly, traditional algorithms may fail to handle the traffic appropriately. With the recent rapid development of reinforcement learning, many studies have demonstrated the feasibility of applying it to network traffic control. This research proposes a multi-agent reinforcement learning framework for datacenter traffic control. We design simulation environments based on common topologies, use the utility functions frequently employed in network optimization as the agents' reward function, and let the agents learn, via deep neural networks, how to maximize that reward and thereby discover the optimal network control policy. In addition, to improve the agents' exploration efficiency in the environment, we perturb the parameters of each agent's policy network with noise. Our experimental results show two things: 1) the framework loses no performance when agents are implemented with a simple deep network architecture, and 2) the framework performs nearly as well as traditional algorithms without requiring the assumptions those algorithms depend on.
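The utility-based reward described in the abstract can be sketched as follows. This is a minimal illustration under assumed details, not the thesis's actual implementation: it uses an alpha-fair utility over per-flow throughputs (alpha = 1 gives the proportional-fairness log utility) and a hypothetical delay penalty weight `w_delay`.

```python
import numpy as np

def alpha_fair_utility(rates, alpha=1.0):
    # Alpha-fair utility over per-flow throughputs; alpha=1 reduces to
    # proportional fairness, i.e., the sum of log throughputs.
    rates = np.asarray(rates, dtype=float)
    if alpha == 1.0:
        return float(np.sum(np.log(rates)))
    return float(np.sum(rates ** (1.0 - alpha) / (1.0 - alpha)))

def reward(throughputs, delays, w_delay=0.1):
    # Hypothetical agent reward: network utility minus a weighted delay
    # penalty, so maximizing it trades throughput against latency.
    return alpha_fair_utility(throughputs) - w_delay * float(np.sum(delays))
```

With this shape of reward, an agent that raises aggregate throughput without inflating queueing delay sees a strictly larger reward, which is what the framework's agents are trained to maximize.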


    Datacenter traffic optimization has been a popular research domain for years. Traditional approaches to this problem are mainly based on rules crafted from datacenter operators' experience and knowledge of the network environment. However, traffic in a modern datacenter tends to be more complicated and dynamic, which may cause traditional methods to fail. With the rapid development of deep reinforcement learning, a number of studies have demonstrated the feasibility of adopting deep reinforcement learning in the domain of traffic control. In this research, we propose a multi-agent reinforcement learning framework that can be applied to the problem of datacenter traffic control. The simulation environment is carefully designed around popular topologies. With a reward function based on the utility functions often used for traffic optimization, our agents learn an optimal traffic control policy by maximizing the reward with deep neural networks. Additionally, to improve the agents' exploration efficiency in the environment, noise is introduced to perturb the parameters of each agent's policy network. Our experimental results show two points: 1) the performance of our framework does not degrade when agents are implemented with a simple network architecture, and 2) the proposed framework performs nearly as well as popular traffic control schemes without the assumptions those schemes require.
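The parameter-space exploration mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumed details (policy weights held as plain NumPy arrays and a fixed noise scale `stddev`), not the framework's actual code: each actor's weights are perturbed with Gaussian noise before collecting a rollout, so exploration happens in parameter space rather than by adding noise to individual actions.

```python
import numpy as np

def perturb_parameters(weights, stddev, seed=None):
    # Parameter-space noise for exploration: return a perturbed copy of
    # the policy's weight arrays; the original weights are left intact.
    rng = np.random.default_rng(seed)
    return [w + rng.normal(0.0, stddev, size=w.shape) for w in weights]

# Example: a tiny two-layer actor represented as raw weight matrices.
actor = [np.zeros((4, 8)), np.zeros((8, 2))]
noisy_actor = perturb_parameters(actor, stddev=0.1, seed=0)
```

Because the same perturbed weights are used for an entire rollout, the resulting behavior is temporally consistent, which is the usual argument for parameter noise over per-step action noise.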

    1 Introduction
    2 Related Work
      2.1 Traditional Traffic Optimization Schemes
      2.2 Machine Learning for Traffic Optimization
      2.3 Reinforcement Learning for Traffic Optimization
    3 Preliminary
      3.1 Reinforcement Learning
        3.1.1 Deterministic Policy Gradient
        3.1.2 Multi-Agent Deterministic Policy Gradient
    4 Design
      4.1 Environment Configuration
        4.1.1 Network Topology
        4.1.2 Traffic Pattern
      4.2 Proposed Multi-Agent Reinforcement Learning Framework
        4.2.1 Definitions of State Space, Action Space, and Reward Function
        4.2.2 Actor Network and Critic Network Model
        4.2.3 Exploration with Parameter Noise
        4.2.4 Framework Algorithm
        4.2.5 Feature Scaling
        4.2.6 Hybrid Traffic Control Scheme
    5 Performance
      5.1 Experimental Settings
      5.2 Evaluation Metrics
      5.3 Experiment Results of Dumbbell Topology
      5.4 Experiment Results of JellyFish Topology
    6 Conclusion
    Reference

