
Graduate Student: CHEN, YU-JUI (陳宇睿)
Thesis Title: Meta-Learning Traffic Pattern Adaptation for DRL-Based Radio Resource Management
Advisor: 黃志煒
Committee Members:
Degree: Master
Department: 資訊電機學院 - Department of Communication Engineering
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 39
Chinese Keywords: 無線資源管理 (radio resource management)
Hits: 8 views, 0 downloads
Abstract:
As the fifth-generation (5G) mobile communication technology gradually matures, the demand for communication over mobile networks continues to grow rapidly. Many new application services with strict latency and reliability requirements will emerge, such as watching live streams or playing multiplayer interactive online games in 360-degree 8K ultra-high-definition XR, making the 5G network environment more crowded and complex. Efficiently allocating resources in 5G networks will therefore become an important task, and with the vigorous development of artificial intelligence, much machine learning research has been applied to radio resource management problems. Although traditional deep learning methods are powerful and can achieve excellent performance on a single task, they essentially have to learn from scratch whenever they encounter a substantially different new environment. Meta-learning, a technique that has emerged in machine learning in recent years, is considered an effective way to address this problem. In this research, we therefore propose a deep reinforcement learning algorithm with meta-learning-based adaptability to solve the resource allocation problem in a multi-user interactive environment under 5G MEC and to maximize the bandwidth efficiency of the system. Our experimental results show that meta-learning not only effectively improves the performance of existing deep reinforcement learning algorithms, but also allows our resource allocation model, when deployed in a new environment, to adapt quickly to environments it has never encountered before.
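To make the adaptation idea above concrete, the following is a minimal Python sketch of meta-training a bandwidth-allocation policy across traffic patterns and then adapting it to an unseen pattern with a few gradient steps. It is an illustrative assumption, not the thesis's algorithm: the toy environment, the reward shaping, and the first-order (Reptile-style) meta-update stand in for the model-based, decoupled exploration-and-execution method developed in Chapter 5.

# Illustrative sketch only: a toy bandwidth-allocation environment and a
# first-order meta-update; all names and reward terms are assumptions,
# not the method proposed in this thesis.
import numpy as np

rng = np.random.default_rng(0)
N_USERS, N_LEVELS = 4, 3          # users sharing bandwidth, discrete share levels
FEAT = N_USERS                    # policy input: current per-user traffic demand

def sample_task():
    """A 'task' is a traffic pattern: mean demand per user (e.g. XR vs. web)."""
    return rng.uniform(0.2, 1.0, size=N_USERS)

def rollout(theta, demand_mean, steps=32):
    """Run the softmax policy on one traffic pattern; return states, actions, rewards."""
    S, A, R = [], [], []
    for _ in range(steps):
        s = np.clip(rng.normal(demand_mean, 0.1), 0.0, 1.5)   # observed traffic state
        logits = s @ theta                                     # shape (N_LEVELS,)
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(N_LEVELS, p=p)                          # chosen bandwidth level
        served = min(1.0, (a + 1) / N_LEVELS)                  # offered share
        r = -np.abs(s.mean() - served)                         # reward: match demand
        S.append(s); A.append(a); R.append(r)
    return np.array(S), np.array(A), np.array(R)

def policy_gradient_step(theta, demand_mean, lr=0.05):
    """One REINFORCE step of the softmax policy on a single traffic pattern."""
    S, A, R = rollout(theta, demand_mean)
    adv = R - R.mean()
    grad = np.zeros_like(theta)
    for s, a, adv_t in zip(S, A, adv):
        logits = s @ theta
        p = np.exp(logits - logits.max()); p /= p.sum()
        dlogp = -np.outer(s, p); dlogp[:, a] += s              # d log pi(a|s) / d theta
        grad += adv_t * dlogp
    return theta + lr * grad / len(R)

# Meta-training: nudge the meta-parameters toward each task-adapted solution.
theta_meta = np.zeros((FEAT, N_LEVELS))
for _ in range(200):
    task = sample_task()
    theta_task = theta_meta.copy()
    for _ in range(5):                                  # inner-loop adaptation
        theta_task = policy_gradient_step(theta_task, task)
    theta_meta += 0.1 * (theta_task - theta_meta)       # first-order meta-update

# Deployment: a few gradient steps adapt the meta-policy to an unseen pattern.
new_pattern = sample_task()
_, _, R_before = rollout(theta_meta, new_pattern)
theta_new = theta_meta.copy()
for _ in range(5):
    theta_new = policy_gradient_step(theta_new, new_pattern)
_, _, R_after = rollout(theta_new, new_pattern)
print(f"mean reward before adaptation: {R_before.mean():.3f}, after: {R_after.mean():.3f}")

The same two-level structure (per-pattern inner adaptation, cross-pattern outer update) is what lets a deployed RRM policy recover quickly when the traffic mix changes, rather than retraining from scratch.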

Table of Contents
1 Introduction
  1.1 B5G/6G Application Scenarios
  1.2 Motivation
  1.3 Contribution
  1.4 Framework
2 Background and Related Works
  2.1 RRM for MEC Networks
  2.2 Meta-Learning for DRL
  2.3 Traffic Pattern Adaptation
3 Hybrid Traffic Scenario and Multi-User Interactive Radio Resource Management
  3.1 System Model
  3.2 Problem Formulation
  3.3 Traffic Pattern Adaptation Problem
4 MDP Model for Edge RRM
  4.1 The MDP Model and State-Action-Reward Setting
5 Meta-Learning Based Traffic Pattern Adaptation
  5.1 Decoupled Exploration and Execution Strategy via Meta-Learning
  5.2 Model-Based Meta-Learning
6 Numerical Results
  6.1 Simulation Setup
  6.2 Performance Evaluation
7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
Bibliography

