| Graduate student: | 劉政威 Zheng-Wei Liu |
|---|---|
| Thesis title: | 適用於深度增強式學習之瀑布式排程方法 Waterfall Model for Deep Reinforcement Learning Based Scheduling |
| Advisor: | 黃志煒 Chih-Wei Huang |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 通訊工程學系在職專班 Executive Master of Communication Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 53 |
| Keywords (Chinese): | 排程、強化學習 |
| Keywords (English): | Scheduling, Reinforcement Learning |
Fourth-generation (4G) communication systems can already meet the multimedia application needs of mobile devices. Through the scheduling service provided by the base station, user equipment obtains the data packets it requires on the downlink and thereby receives better application service, so the algorithm that allocates channel resources and schedules the user population is critical. This thesis implements a learning platform for mobile communication scheduling and proposes a method based on the Deep Deterministic Policy Gradient (DDPG) model. Borrowing the waterfall model concept, it decomposes the scheduling algorithm flow into three sequential stages: sorting and selection, resource evaluation, and channel allocation. By learning which stage micro-algorithms to select, the method arrives at a waterfall schedule that, under the current communication environment, delivers more data throughput per unit time and satisfies more users' requirements. The platform is built from six module components: base station and channel resources; reinforcement learning neural network; user equipment attributes; application service types; environment information and reward function; and stage micro-algorithms with dependency injection. Inversion of control and dependency injection reduce the coupling of the platform software, which makes the stage micro-algorithms and the six module components easy to maintain.
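The three-stage decomposition and the role of dependency injection can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `User` fields, the function names, and the greedy allocation rule are all hypothetical, and the DDPG agent that would learn which micro-algorithm to use at each stage is replaced here by a manual choice passed to the constructor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class User:
    uid: int
    demand_bits: int      # hypothetical: data waiting on the downlink
    bits_per_block: int   # hypothetical: bits one resource block can carry


# Hypothetical stage signatures; the thesis does not specify them.
SortSelect = Callable[[List[User]], List[User]]
Evaluate = Callable[[User], int]
Allocate = Callable[[List[Tuple[User, int]], int], Dict[int, int]]


# Stage 1 micro-algorithms: order the candidate users.
def by_best_channel(users: List[User]) -> List[User]:
    return sorted(users, key=lambda u: u.bits_per_block, reverse=True)


def by_largest_demand(users: List[User]) -> List[User]:
    return sorted(users, key=lambda u: u.demand_bits, reverse=True)


# Stage 2 micro-algorithm: blocks needed to clear a user's demand.
def blocks_to_clear_demand(user: User) -> int:
    return -(-user.demand_bits // user.bits_per_block)  # ceiling division


# Stage 3 micro-algorithm: grant blocks in order until none remain.
def greedy_allocate(needs: List[Tuple[User, int]], total_blocks: int) -> Dict[int, int]:
    grants: Dict[int, int] = {}
    remaining = total_blocks
    for user, needed in needs:
        if remaining == 0:
            break
        grant = min(needed, remaining)
        if grant > 0:
            grants[user.uid] = grant
            remaining -= grant
    return grants


class WaterfallScheduler:
    """Runs the three stages in order; each stage is injected, not hard-coded."""

    def __init__(self, sort_select: SortSelect, evaluate: Evaluate, allocate: Allocate):
        self.sort_select = sort_select
        self.evaluate = evaluate
        self.allocate = allocate

    def schedule(self, users: List[User], total_blocks: int) -> Dict[int, int]:
        ordered = self.sort_select(users)                  # stage 1
        needs = [(u, self.evaluate(u)) for u in ordered]   # stage 2
        return self.allocate(needs, total_blocks)          # stage 3


demo_users = [User(1, 1000, 100), User(2, 300, 150), User(3, 500, 50)]
scheduler = WaterfallScheduler(by_best_channel, blocks_to_clear_demand, greedy_allocate)
print(scheduler.schedule(demo_users, total_blocks=10))
```

Because `WaterfallScheduler` only depends on the stage signatures, swapping `by_best_channel` for `by_largest_demand` changes the schedule without touching the scheduler class. This is the kind of low coupling the abstract attributes to inversion of control and dependency injection.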