Author: 朱育成 (Yu-Cheng Chu)
Thesis title: Using Stackelberg Game and Multi-Agent Reinforcement Learning to Self-Organize Relaying Groups for Real-Time Video Sharing in Vehicular Networks
Advisor: 胡誌麟 (Chih-Lin Hu)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Communication Engineering
Year of publication: 2023
Graduation academic year: 111 (2022-2023)
Language: Chinese
Number of pages: 102
Keywords: Simulation environment modeling, Edge computing, Vehicular networks, Game theory, Multi-agent reinforcement learning, Vehicular self-organizing network
Views: 31; Downloads: 0
Abstract:

    Rapid urbanization has made traffic conditions increasingly unpredictable. When vehicles in the rear cannot perceive the road conditions ahead, an accident or abnormal situation in front can lead to delayed reactions and rear-end collisions, resulting in severe traffic accidents and safety hazards. Collaborative sharing of visual information among vehicles has therefore become an important issue. With the rapid development of 5G and artificial intelligence, wireless communication enables fast data exchange between vehicular devices, and the collected data can be analyzed and exploited for deployment. Accordingly, this thesis first uses a vehicular traffic simulator to model real-world environments, then employs game theory to describe and define the vehicular environment in detail, and integrates both into a multi-agent reinforcement learning model based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) approach. The objective is to select the optimal transmission path with the lowest latency and highest data transmission rate, thereby enabling vehicles to form a self-organizing network for video transmission and sharing. This study evaluates and compares different vehicular message transmission methods and different limits on the maximum number of relay hops between vehicles, and also compares multi-agent against single-agent reinforcement learning. Experimental results demonstrate that deploying multi-agent reinforcement learning yields better performance and higher efficiency in vehicular message transmission. Finally, the different models are evaluated and analyzed on three metrics: transmission latency, data transmission rate, and power consumption.
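The stated objective — choosing the relay path with the lowest latency and highest data rate while limiting power consumption — can be illustrated with a minimal scoring sketch. The weighted-score form, the weights, and the candidate-path fields below are illustrative assumptions only; the thesis selects paths with a learned MADDPG policy, not a fixed formula.

```python
# Hypothetical sketch (not the thesis's method): rank candidate relay paths
# by a weighted score over min-max-normalized latency, data rate, and power.

def min_max(values):
    """Min-max normalize a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_paths(paths, w_lat=0.4, w_rate=0.4, w_pow=0.2):
    """paths: dicts with 'latency' (ms), 'rate' (Mbps), 'power' (W).
    Returns paths sorted best-first: low latency/power, high rate.
    The weights are illustrative, not taken from the thesis."""
    lat = min_max([p["latency"] for p in paths])
    rate = min_max([p["rate"] for p in paths])
    pw = min_max([p["power"] for p in paths])
    scored = []
    for i, p in enumerate(paths):
        # Higher score is better: reward rate, penalize latency and power.
        score = w_rate * rate[i] - w_lat * lat[i] - w_pow * pw[i]
        scored.append((score, p))
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored]

candidates = [
    {"id": "A", "latency": 20.0, "rate": 50.0, "power": 1.2},
    {"id": "B", "latency": 35.0, "rate": 80.0, "power": 1.5},
    {"id": "C", "latency": 15.0, "rate": 45.0, "power": 1.0},
]
best = rank_paths(candidates)[0]
```

In this toy data, the shortest-latency, lowest-power path wins; a learned policy would instead estimate such trade-offs from experience rather than from hand-set weights.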

Table of Contents

    Abstract (Chinese) / Abstract (English) / Acknowledgements / List of Figures / List of Tables

    1 Introduction
    2 Background and Literature Review
      2.1 Video Bitrate Adaptation and Transmission
        2.1.1 Adaptive Bitrate Adjustment
        2.1.2 Video Streaming
      2.2 Game Theory
      2.3 Multi-Agent Reinforcement Learning
        2.3.1 Machine Learning Background
        2.3.2 Reinforcement Learning
        2.3.3 Multi-Agent Reinforcement Learning
      2.4 Edge Computing
      2.5 Vehicular Self-Organizing Networks
    3 Methodology
      3.1 System Architecture
        3.1.1 Vehicular Environment
        3.1.2 Data Reception Rate Ratio
        3.1.3 Transmission Power Consumption
        3.1.4 Data Normalization
        3.1.5 System Workflow
      3.2 Game Theory
        3.2.1 Leader Utility Function
        3.2.2 Follower Utility Function
      3.3 Reinforcement Learning
        3.3.1 Single-Agent Reinforcement Learning
        3.3.2 Multi-Agent Reinforcement Learning
      3.4 Self-Organizing Multi-Hop Network
      3.5 Algorithms
    4 Experiments and Results Analysis
      4.1 Experimental Environment
        4.1.1 Parameter Table
        4.1.2 Model Parameter Design
      4.2 Introduction to the SUMO Simulation Environment
        4.2.1 Highway Model
        4.2.2 Urban Model
      4.3 Effects of Hyperparameter Tuning
        4.3.1 Effect of the Learning Rate (α)
        4.3.2 Effect of the Discount Factor (γ)
      4.4 Simulation Evaluation
        4.4.1 Comparison of Vehicular Transmission Methods
        4.4.2 Single-Hop Environment Model Evaluation
        4.4.3 K-Hop Limit Calculation
        4.4.4 Multi-Hop Environment Model Evaluation
      4.5 Experimental Results
        4.5.1 Data Transmission Rate Comparison
        4.5.2 Transmission Latency Comparison
        4.5.3 Power Consumption Comparison
    5 Conclusion and Future Work
    References
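Sections 3.2.1 and 3.2.2 of the outline define leader and follower utility functions for a Stackelberg game. Those utilities are not reproduced in this record, so the sketch below only illustrates the generic leader-follower solution pattern — the follower best-responds to the leader's committed strategy, and the leader optimizes while anticipating that response — using made-up quadratic utilities over small strategy grids.

```python
# Generic Stackelberg (leader-follower) backward-induction sketch with
# toy utility functions -- NOT the utilities defined in the thesis.

def follower_best_response(a, b_grid):
    """Follower picks b maximizing its utility given the leader's a.
    Toy utility: the follower wants b as close as possible to a/2."""
    return max(b_grid, key=lambda b: -(b - a / 2.0) ** 2)

def solve_stackelberg(a_grid, b_grid):
    """Leader utility (toy): gains from the follower's induced action,
    pays a quadratic cost for its own strategy a."""
    def leader_utility(a):
        return 2.0 * follower_best_response(a, b_grid) - a ** 2
    a_star = max(a_grid, key=leader_utility)
    return a_star, follower_best_response(a_star, b_grid)

a_grid = [i / 10.0 for i in range(11)]   # leader strategies 0.0 .. 1.0
b_grid = [j / 20.0 for j in range(21)]   # follower strategies 0.0 .. 1.0
a_star, b_star = solve_stackelberg(a_grid, b_grid)
```

The pair (a_star, b_star) is a Stackelberg equilibrium of the toy game: no deviation by the leader improves its utility once the follower's best response is taken into account.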

