| Graduate student: | 廖鍇旻 Kai-Min Liao |
|---|---|
| Thesis title: | 使用多代理人強化學習於無線快取網路設計空中基地台三維路徑之研究 3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks Using Multi-Agent Reinforcement Learning |
| Advisor: | 陳昱嘉 Yu-Jia Chen |
| Oral defense committee: | |
| Degree: | Master |
| Department: | 資訊電機學院 - 通訊工程學系 Department of Communication Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 86 |
| Keywords (Chinese): | 無人機、路徑設計、無線快取、多代理人強化學習 |
| Keywords (English): | Unmanned aerial vehicles (UAVs), trajectory design, wireless caching, multi-agent reinforcement learning |
This thesis investigates the dynamic 3D trajectory design of multiple cache-enabled unmanned aerial vehicles (UAVs) in a wireless device-to-device (D2D) caching network, with the goal of maximizing long-term network throughput. By storing popular content on nearby mobile user devices, D2D caching improves network throughput and alleviates the backhaul burden. Because of their high mobility and flexible deployment, UAVs have recently attracted significant attention as cache-enabled flying base stations. Cache-enabled UAVs can track the mobility patterns of their users and serve them under limited cache storage capacity. However, determining the optimal UAV trajectory is challenging because the network topology changes frequently in a dynamic environment and because aerial and terrestrial caching nodes coexist. In response, we propose a novel framework based on multi-agent reinforcement learning that determines the optimal 3D trajectory of each UAV in a distributed manner, without a central coordinator. In the proposed method, UAVs within a certain distance of one another make flight decisions cooperatively by sharing their experiences. Simulation results show that our algorithm outperforms traditional single-agent and multi-agent Q-learning algorithms. This work confirms the feasibility and effectiveness of cache-enabled UAVs as an important complement to terrestrial D2D caching nodes.
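The experience-sharing idea described in the abstract can be illustrated with a toy sketch. This is not the algorithm from the thesis: it assumes plain tabular Q-learning on a small 3D grid, a made-up reward (negative distance to a hypothetical user hotspot), and a fixed sharing radius. Every name and parameter here (`UavAgent`, `step`, `share_radius`, the action set, the hyperparameters) is an illustrative assumption.

```python
import random
from collections import defaultdict

# Assumed action set: six unit moves in a 3D grid plus hovering.
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0),
           (0, 0, 1), (0, 0, -1), (0, 0, 0)]


class UavAgent:
    """Independent tabular Q-learner; hyperparameters are illustrative."""

    def __init__(self, position, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.position = position
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action_index) -> value

    def choose_action(self, rng):
        # Epsilon-greedy selection over the agent's own Q-table.
        if rng.random() < self.epsilon:
            return rng.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)),
                   key=lambda a: self.q[(self.position, a)])

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in range(len(ACTIONS)))
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def distance(p, q):
    """Euclidean distance between two grid points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def neighbors_within(agents, index, radius):
    """Indices of the other agents no farther than `radius` from agent `index`."""
    me = agents[index].position
    return [j for j, other in enumerate(agents)
            if j != index and distance(me, other.position) <= radius]


def step(agents, reward_fn, grid_size, share_radius, rng):
    """One round with no central coordinator: act, learn locally, then share."""
    experiences = []
    for agent in agents:
        state = agent.position
        action = agent.choose_action(rng)
        # Clamp the move so the agent stays inside the grid.
        next_state = tuple(min(max(s + d, 0), grid_size - 1)
                           for s, d in zip(state, ACTIONS[action]))
        reward = reward_fn(next_state)
        agent.learn(state, action, reward, next_state)
        agent.position = next_state
        experiences.append((state, action, reward, next_state))
    # Each agent replays the transitions of peers inside the sharing radius.
    for i, agent in enumerate(agents):
        for j in neighbors_within(agents, i, share_radius):
            agent.learn(*experiences[j])
    return experiences


# Toy usage: three agents and one hypothetical user hotspot.
hotspot = (4, 4, 1)
demo_rng = random.Random(0)
demo_agents = [UavAgent((0, 0, 0)), UavAgent((1, 0, 0)), UavAgent((7, 7, 3))]
for _ in range(200):
    step(demo_agents, lambda p: -distance(p, hotspot),
         grid_size=8, share_radius=3.0, rng=demo_rng)
```

In this sketch, sharing means only that an agent applies its neighbors' transitions to its own Q-table. A cooperative scheme could equally exchange value estimates or policies, and the abstract does not say which form the thesis uses.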