3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks Using Multi-Agent Reinforcement Learning


Author:	Kai-Min Liao (廖鍇旻)
Thesis title:	3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks Using Multi-Agent Reinforcement Learning (使用多代理人強化學習於無線快取網路設計空中基地台三維路徑之研究)
Advisor:	Yu-Jia Chen (陳昱嘉)
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Communication Engineering
Year of publication:	2021
Academic year of graduation:	109 (ROC calendar)
Language:	Chinese
Number of pages:	86
Keywords (Chinese):	無人機、路徑設計、無線快取、多代理人強化學習
Keywords (English):	Unmanned aerial vehicles (UAVs), trajectory design, wireless caching, multi-agent reinforcement learning

This thesis studies a wireless device-to-device (D2D) caching network and designs dynamic 3D trajectories for multiple cache-enabled unmanned aerial vehicles (UAVs), with the goal of maximizing long-term network throughput. By storing popular content on nearby mobile user devices, D2D caching improves network throughput and relieves the backhaul burden. Because of their high mobility and flexible deployment, UAVs have recently attracted significant attention as flying base stations. Cache-enabled UAVs can track the mobility patterns of their users and serve them under limited cache storage capacity. However, determining the optimal UAV trajectories is challenging because the network topology changes frequently in a dynamic environment and because aerial and terrestrial caching nodes coexist. In response, we propose a novel framework based on multi-agent reinforcement learning that determines the optimal 3D trajectory of each UAV in a distributed manner, without a central coordinator. In the proposed method, UAVs within a certain distance of one another make flight decisions cooperatively by sharing the experience they have gained. Simulation results show that our algorithm outperforms traditional single-agent and multi-agent Q-learning algorithms. This work confirms the feasibility and effectiveness of cache-enabled UAVs as an important complement to terrestrial D2D caching nodes.
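
The abstract does not spell out the learning rule, so the following is only a minimal sketch of one plausible reading of "sharing experience within a certain proximity": each UAV keeps its own tabular Q-function and, after every step, applies the standard one-step Q-learning update to its own transition and to the transitions of UAVs within a given radius. The grid-world state space, the six-move action set, the names `UavAgent`, `neighbors_within`, and `cooperative_step`, and the caller-supplied `reward_fn` (a stand-in for network throughput) are all assumptions made for illustration and are not taken from the thesis.

```python
import math
import random
from collections import defaultdict

# Assumed action set: unit moves along +/-x, +/-y, +/-z on a 3D grid.
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


class UavAgent:
    """One UAV with its own tabular Q-function over grid positions."""

    def __init__(self, position, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.position = position
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        # Unseen states start with a Q-value of 0 for every action.
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))

    def choose_action(self):
        """Epsilon-greedy selection on the agent's own Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[self.position]
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def learn(self, state, action, reward, next_state):
        """Standard one-step Q-learning update for a single transition."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def neighbors_within(agents, i, radius):
    """Indices of the other agents within `radius` of agent i."""
    return [j for j in range(len(agents))
            if j != i
            and math.dist(agents[i].position, agents[j].position) <= radius]


def cooperative_step(agents, reward_fn, radius, grid_size):
    """All UAVs act, then each learns from its own and nearby transitions."""
    transitions = []
    for agent in agents:
        state = agent.position
        action = agent.choose_action()
        # Clamp the move so the UAV stays inside the cubic grid.
        next_state = tuple(min(max(c + d, 0), grid_size - 1)
                           for c, d in zip(state, ACTIONS[action]))
        transitions.append((state, action, reward_fn(next_state), next_state))
    # Proximity is evaluated on the positions before anyone moves.
    shared = [neighbors_within(agents, i, radius) for i in range(len(agents))]
    for i, agent in enumerate(agents):
        agent.learn(*transitions[i])        # own experience
        for j in shared[i]:
            agent.learn(*transitions[j])    # experience shared by neighbors
        agent.position = transitions[i][3]
```

In this sketch no central coordinator is involved: each agent only reads the transitions of UAVs inside its own radius, so a UAV that is far from the others behaves exactly like an independent Q-learner. A real system would replace `reward_fn` with a throughput measurement and would likely need a richer state than the UAV position alone.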

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Organization
2. Literature Review
3. System Model
3.1 Network Model
3.2 Channel Model
3.2.1 UAV-to-Ground Channel Model
3.2.2 Ground-to-Ground Channel Model
3.3 Caching Model
4. Content Delivery in Aerial-Terrestrial Wireless Caching Networks
4.1 Content Delivery Methods
4.1.1 Successful Transmission Probability of UAV-to-Device Links
4.1.2 Successful Transmission Probability of D2D Links
4.1.3 Successful Transmission Probability of Base Station-to-Device Links
4.2 Problem Formulation
5. 3D UAV Trajectory Design Based on Independent-Agent Reinforcement Learning
5.1 Initial UAV Deployment Based on K-means Clustering
5.2 Trajectory Design Based on Independent Q-learning
6. 3D UAV Trajectory Design Based on Cooperative Multi-Agent Reinforcement Learning
6.1 Algorithm Design
6.2 Analysis of the Cooperative Multi-Agent Reinforcement Learning Algorithm
6.2.1 Convergence Analysis
6.2.2 Complexity Analysis
7. Simulation Results and Analysis
8. Future Research Directions and Issues
8.1 Background
8.2 Satellite Constellation Caching Networks
8.3 Satellite-Assisted Cache-Enabled UAV Networks
8.4 High-Performance Non-Terrestrial Networks with Global Coverage
8.5 Delay-Oriented Satellite-Assisted UAV Multi-Access Edge Computing Systems
9. Conclusions and Contributions
References
Appendix 1

