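Waterfall Model for Deep Reinforcement Learning Based Scheduling | National Central University Electronic Theses and Dissertations System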

Author:	Zheng-Wei Liu (劉政威)
Thesis title:	Waterfall Model for Deep Reinforcement Learning Based Scheduling (適用於深度增強式學習之瀑布式排程方法)
Advisor:	Chih-Wei Huang (黃志煒)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Executive Master of Communication Engineering
Year of publication:	2019
Academic year of graduation:	107 (ROC calendar)
Language:	Chinese
Pages:	53
Keywords:	Scheduling, Reinforcement Learning

Fourth-generation (4G) communication systems can already meet the multimedia application needs of mobile devices. Through the scheduling service provided by the base station, user equipment obtains the data packets it needs on the downlink and thereby receives better application service, so the algorithm that allocates channel resources and schedules users is critical. This thesis implements a mobile communication scheduling learning platform and proposes a scheduling method based on the Deep Deterministic Policy Gradient (DDPG) model. Following the waterfall model concept, the scheduling algorithm flow is decomposed into three sequential stages: sorting and selection, resource evaluation, and channel allocation. For each stage, the model learns to pick the stage micro-algorithm that, under the current communication environment, yields higher data throughput per unit time and satisfies more users' demands. The platform is built from six modular components: base station and channel resources; reinforcement learning neural network; user equipment attributes; application service types; environment information and reward function; and stage micro-algorithms and dependency injection. Inversion of control and dependency injection reduce coupling in the platform software, which makes the stage micro-algorithms and the six modular components easy to maintain.
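The three-stage flow and the injection of stage micro-algorithms described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: every name here (`User`, `WaterfallScheduler`, `sort_by_quality`, `estimate_half`, `allocate_greedy`, and the fields `demand` and `channel_quality`) is hypothetical, and the micro-algorithms are deliberately trivial. The DDPG agent is also omitted; choosing which micro-algorithm to inject at each stage stands in for the agent's action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class User:
    uid: int
    demand: int             # resource blocks wanted (hypothetical unit)
    channel_quality: float  # larger is better (hypothetical metric)


# One callable type per waterfall stage (assumed interfaces).
Sorter = Callable[[List[User]], List[User]]
Estimator = Callable[[User], int]
Allocator = Callable[[List[User], Dict[int, int], int], Dict[int, int]]


# Stage 1 micro-algorithms: sorting and selection.
def sort_by_quality(users: List[User]) -> List[User]:
    return sorted(users, key=lambda u: u.channel_quality, reverse=True)


def sort_by_demand(users: List[User]) -> List[User]:
    return sorted(users, key=lambda u: u.demand, reverse=True)


# Stage 2 micro-algorithms: resource evaluation.
def estimate_full(user: User) -> int:
    return user.demand


def estimate_half(user: User) -> int:
    return max(1, user.demand // 2)


# Stage 3 micro-algorithm: channel allocation.
def allocate_greedy(ordered: List[User], requests: Dict[int, int],
                    total_blocks: int) -> Dict[int, int]:
    grants: Dict[int, int] = {}
    remaining = total_blocks
    for user in ordered:
        give = min(requests[user.uid], remaining)
        grants[user.uid] = give
        remaining -= give
    return grants


class WaterfallScheduler:
    """Runs the three stages in order; stage logic is injected, not hard-coded."""

    def __init__(self, sorter: Sorter, estimator: Estimator,
                 allocator: Allocator) -> None:
        self.sorter = sorter
        self.estimator = estimator
        self.allocator = allocator

    def schedule(self, users: List[User], total_blocks: int) -> Dict[int, int]:
        ordered = self.sorter(users)                            # stage 1
        requests = {u.uid: self.estimator(u) for u in ordered}  # stage 2
        return self.allocator(ordered, requests, total_blocks)  # stage 3


users = [User(1, 4, 0.2), User(2, 6, 0.9), User(3, 3, 0.5)]
scheduler = WaterfallScheduler(sort_by_quality, estimate_full, allocate_greedy)
print(scheduler.schedule(users, total_blocks=8))  # -> {2: 6, 3: 2, 1: 0}
```

Because the scheduler only depends on the three callable interfaces, swapping a micro-algorithm (for example `sort_by_demand` with `estimate_half`) requires no change to `WaterfallScheduler`, which is the decoupling that dependency injection is meant to provide.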

Acknowledgments
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Foreword
1.2 Research Motivation
1.3 Contributions
1.4 Thesis Organization
2. Related Work and Techniques
2.1 Reinforcement Learning
2.2 Actor-Critic
2.3 Deep Deterministic Policy Gradient
2.4 Inversion of Control
2.5 Dependency Injection
3. Waterfall Scheduling Method
3.1 Waterfall Model
3.2 Stage Micro-Algorithms
3.2.1 Sorting and Selection Stage
3.2.2 Resource Evaluation Stage
3.2.3 Channel Allocation Stage
4. Architecture of the Mobile Communication Scheduling Learning Platform
4.1 Base Station and Channel Resources
4.2 Reinforcement Learning Neural Network
4.3 User Equipment Attributes
4.4 Application Service Types
4.5 Environment Information and Reward Function
4.5.1 Environment Information
4.5.2 Reward Function
4.6 Stage Micro-Algorithms and Dependency Injection
5. Experimental Procedure
5.1 Micro-Algorithm Selection Problem
5.2 Micro-Algorithm Pruning Scheme
5.2.1 Sorting and Selection Stage
5.2.2 Resource Evaluation Stage
5.2.3 Channel Allocation Stage
5.3 Eliminating the Selection Problem
6. Conclusion
6.1 Conclusion
6.2 Future Work
References

                                

