
Author: YuChi Chang (張佑祺)
Thesis Title: Adaptive QUIC Flow Control Mechanism with Deep Q-Learning (基於 DQN 強化學習之自適應 QUIC 流量控制機制)
Advisor: 胡誌麟
Committee Members:
Degree: Master
Department: College of Information and Electrical Engineering, Department of Communication Engineering
Year of Publication: 2022
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 73
Chinese Keywords: 流量控制 (flow control)
English Keywords: flow control
Hits: 11 views, 0 downloads
    With the growth of global Internet traffic and the increasing number of IoT devices, modern network applications place higher demands on latency, loss rate, and throughput. In recent years, a new transport-layer protocol, QUIC, has been developed to meet these needs. QUIC combines the advantages of the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), greatly reducing latency while retaining high reliability. However, as a new protocol, QUIC's flow control (FC) research is still immature, and performance metrics such as throughput and latency are therefore significantly limited.

    Recent studies design QUIC flow control mechanisms on rule-based principles, so they cannot adapt well to a wide range of network environments or adjust their behavior in dynamic environments. In recent years, many studies have applied machine learning (ML) to various problems in network operation and management. Among these methods, reinforcement learning (RL) can learn how to interact with an environment from experience, without prior knowledge, and gradually find the best policy; it can therefore learn the correct flow control strategy in changing network environments and achieve good transmission performance. Deep Q-Learning (DQN), a common reinforcement learning model, can effectively handle high-dimensional state spaces and reduce the correlation between training samples, improving the stability of the algorithm. Based on the above, this thesis proposes a QUIC flow control method, FC-DQN, which uses a DQN reinforcement learning model to extract end-to-end network features and select an appropriate flow control window, so that it can learn the best flow control policy stably and quickly. Moreover, because FC-DQN performs dynamic rule control according to the environment, it can adapt to dynamic and diverse network scenarios. Experimental results show that FC-DQN outperforms traditional rule-based flow control methods, reducing packet transmission delay while maintaining a low loss rate.


    With the increase in global Internet traffic and the number of IoT devices, modern network applications have higher requirements for latency, packet loss rate, and throughput. To meet these needs, a new transport-layer protocol called QUIC has been proposed. QUIC combines the advantages of the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), significantly reducing delay while maintaining a high degree of reliability. As a new protocol, however, research on QUIC flow control has not yet matured, resulting in significant limitations on performance metrics such as delay and throughput.

    Recent research designs QUIC flow control mechanisms on rule-based principles. As a result, they cannot adapt well to a wide range of network environments or adjust their behavior in dynamic environments. In recent years, many studies have applied machine learning to various problems in network operation and management. As a branch of machine learning, reinforcement learning can learn how to interact with the environment without prior knowledge and gradually find the best policy. Thus, it can learn the correct flow control strategy and achieve better transmission performance in dynamic network environments. Deep Q-Learning (DQN) is a common reinforcement learning model. It can effectively handle high-dimensional state spaces and reduce the correlation between training samples, which makes the algorithm more stable.
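A minimal sketch of the two DQN stabilizers this paragraph mentions: an experience replay buffer, which breaks up the correlation between consecutive samples by drawing training batches uniformly at random, and a periodically synchronized target network. The linear Q-function and all hyperparameters below are illustrative stand-ins, not the thesis's actual model:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Stores transitions; uniform random sampling decorrelates the batch."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)


class LinearDQN:
    """Q(s, .) = s @ W: a linear stand-in for the deep Q-network."""

    def __init__(self, state_dim, n_actions, lr=0.01, gamma=0.9):
        self.W = np.zeros((state_dim, n_actions))
        self.target_W = self.W.copy()  # frozen target network
        self.lr, self.gamma = lr, gamma

    def q_values(self, s, target=False):
        return s @ (self.target_W if target else self.W)

    def act(self, s, epsilon=0.1):
        # epsilon-greedy exploration over the discrete actions
        if random.random() < epsilon:
            return random.randrange(self.W.shape[1])
        return int(np.argmax(self.q_values(s)))

    def train_step(self, batch):
        s, a, r, s_next, done = batch
        # bootstrap target comes from the frozen network, not the live one
        q_next = self.q_values(s_next, target=True).max(axis=1)
        target = r + self.gamma * q_next * (1 - done)
        td_error = target - self.q_values(s)[np.arange(len(a)), a]
        # gradient step on the squared TD error of the taken actions
        for i in range(len(a)):
            self.W[:, a[i]] += self.lr * td_error[i] * s[i]

    def sync_target(self):
        self.target_W = self.W.copy()
```

In the full algorithm the agent alternates between acting with `act`, storing the resulting transition in the buffer, and calling `train_step` on a sampled batch, with `sync_target` invoked every few hundred steps.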

    In this paper, we propose a QUIC flow control mechanism called FC-DQN. It selects an appropriate flow control window based on end-to-end network characteristics extracted by a DQN reinforcement learning model. Since FC-DQN adjusts its control rules dynamically according to the environment, it can adapt to dynamic and diverse network scenarios. We show that FC-DQN outperforms traditional rule-based QUIC flow control mechanisms and can reduce both delay and packet loss rate.
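As a hypothetical illustration of the state/action/reward interface such a learned flow controller could expose: the agent observes end-to-end measurements, picks a discrete window adjustment, and receives a reward trading off throughput against delay and loss. The feature set, action multipliers, window bounds, and reward weights below are assumptions for the sketch, since the abstract does not specify them:

```python
# Discrete actions: multipliers applied to the current flow control window
# (an assumed action space, not the thesis's).
ACTIONS = [0.5, 1.0, 1.5, 2.0]


def observe(rtt_ms, loss_rate, throughput_mbps):
    """Pack end-to-end measurements into a normalized state vector."""
    return (min(rtt_ms / 1000.0, 1.0),
            min(loss_rate, 1.0),
            min(throughput_mbps / 100.0, 1.0))


def apply_action(window, action_index,
                 min_window=16_384, max_window=16 * 1024 * 1024):
    """Scale the window by the chosen multiplier, clamped to sane bounds."""
    new_window = int(window * ACTIONS[action_index])
    return max(min_window, min(new_window, max_window))


def reward(rtt_ms, loss_rate, throughput_mbps, w_t=1.0, w_d=0.5, w_l=2.0):
    """Weighted reward: favor throughput, penalize delay and loss.

    The weights are illustrative; the thesis tunes its own reward
    weights experimentally (Section 5.3.3).
    """
    return w_t * throughput_mbps - w_d * rtt_ms / 100.0 - w_l * loss_rate * 100.0
```

A DQN agent trained over this interface would repeatedly call `observe`, choose an index into `ACTIONS`, advertise the window returned by `apply_action`, and learn from the resulting `reward`.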

    Abstract (Chinese)
    Abstract
    List of Figures
    List of Tables
    1 Introduction
      1.1 Preface
      1.2 Motivation
    2 Background and Related Work
      2.1 Development of the QUIC Protocol
      2.2 QUIC Flow Control
        2.2.1 The RFC 9000 Open Standard
        2.2.2 Related Research on QUIC Flow Control
      2.3 TCP Window Control Mechanisms
        2.3.1 Fixed Window Sizing
        2.3.2 Dynamic Window Sizing
        2.3.3 Reinforcement-Learning-Based Dynamic Window Sizing
      2.4 Reinforcement Learning
        2.4.1 Elements of Reinforcement Learning
        2.4.2 Reinforcement Learning Algorithms
    3 Methodology
      3.1 System Architecture
      3.2 Problem Definition
    4 Reinforcement Learning
      4.1 Deep Q-Learning (DQN)
      4.2 DQN-Based Flow Control (FC-DQN)
    5 Experimental Results and Analysis
      5.1 Experimental Environment and Equipment
      5.2 Experimental Design
        5.2.1 Network Environment Design
        5.2.2 Model Parameter Design
        5.2.3 Calculation of Performance Metrics
        5.2.4 QUIC Flow Control Methods
      5.3 Experimental Results
        5.3.1 Effect of Learning Rate on FC-DQN
        5.3.2 Effect of Decay Rate on FC-DQN
        5.3.3 Weights of the Reward Function
        5.3.4 Window Control under Different Network Bandwidths
        5.3.5 Performance under Different Network Bandwidths
        5.3.6 Window Control under Dynamic Bandwidth
        5.3.7 Performance under Dynamic Bandwidth
    6 Conclusion and Future Work
    References

