

Graduate Student: Zong-You Yang (楊宗祐)
Thesis Title: Using Multi-Agent Reinforcement Learning and Game Theory to Minimize the Task Cost in Vehicular Networks (車載網路中基於多代理人強化學習和賽局理論之最小化任務成本的方法)
Advisor: Chih-Lin Hu (胡誌麟)
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Communication Engineering
Year of Publication: 2024
Academic Year of Graduation: 112 (2023-2024)
Language: Chinese
Number of Pages: 128
Chinese Keywords: 自動駕駛, 車聯網, 計算卸載, 協作計算, 賽局理論, 多代理人強化學習
English Keywords: Autonomous Driving, Vehicular Networks, Computing Offloading, Collaborative Computing, Game Theory, Multi-Agent Reinforcement Learning


With the rapid advancement of autonomous driving technology, vehicular sensors have increasingly enhanced their capability to accurately perceive the vehicle's surrounding environment. However, this advancement has resulted in the generation of substantial amounts of sensor data, which presents significant challenges for autonomous driving systems, particularly increased latency and excessive energy consumption arising from limited on-board computational capacity. To mitigate the burden on vehicular computing resources, offloading computational tasks to external resources through collaboration with cloud computing, edge computing nodes, or nearby vehicular devices has emerged as a viable solution.

This thesis proposes a vehicular computing task offloading strategy grounded in game theory and deep reinforcement learning. First, a comprehensive task cost function is designed that accounts for latency, power consumption, and the cost of renting computational resources; this function is used both to formulate offloading strategies and to evaluate the relative merits of different offloading decisions. Next, the competition for computational resources among vehicles is framed as a game, and the existence of a Nash equilibrium is proved. Finally, the game is integrated into a multi-agent reinforcement learning framework, using the MATD3 architecture for model training to identify the equilibrium strategy of the game.

This method selects the optimal computational task offloading decision based on the current network environment and task characteristics. While satisfying time-tolerance requirements, it significantly improves the completion rate of computational tasks and reduces task completion cost.
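To make the cost-function idea concrete, the following is a minimal sketch of a weighted task cost over latency, energy, and rental cost. The weights `w_t`, `w_e`, `w_r` and all numeric values are illustrative assumptions, not the thesis's actual formulation; the hard deadline cutoff reflects the time-tolerance requirement described above.

```python
# Illustrative sketch only: the thesis's actual cost function and weight
# values are not reproduced here; w_t, w_e, w_r are assumed symbols.

def task_cost(latency_s, energy_j, rent_cost, w_t=0.4, w_e=0.4, w_r=0.2,
              deadline_s=None):
    """Weighted task cost over latency, energy, and resource rental.

    Returns float('inf') when latency exceeds the task's time tolerance,
    so infeasible offloading decisions are never preferred.
    """
    if deadline_s is not None and latency_s > deadline_s:
        return float("inf")
    return w_t * latency_s + w_e * energy_j + w_r * rent_cost

# Comparing two hypothetical decisions for one task: computing locally
# (no rental cost, higher energy) versus offloading to an edge node.
local = task_cost(latency_s=0.8, energy_j=2.0, rent_cost=0.0, deadline_s=1.0)
edge  = task_cost(latency_s=0.3, energy_j=0.5, rent_cost=1.0, deadline_s=1.0)
best = "edge" if edge < local else "local"
```

Under these toy weights the edge decision wins because its latency and energy savings outweigh the rental cost; a different weighting can reverse the choice, which is why the thesis evaluates the effect of different weights (Section 4.7.1).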
Experimental results indicate that, in comparison to previous studies, the proposed method provides a superior evaluation of different offloading decisions and facilitates the formulation of optimal offloading strategies. Consequently, this approach reduces task costs and improves task completion rates, thereby offering a more flexible and efficient solution for applications in autonomous driving.
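The existence of a pure Nash equilibrium in the kind of resource-competition game described above can be illustrated with a toy congestion-style offloading game. This is not the thesis's actual game model: here each vehicle simply chooses local computing or offloading to one shared edge node, the offloading cost grows with the number of offloaders, and the per-vehicle local costs are made-up constants. Because such a finite congestion game is a potential game, iterated best responses terminate at a state where no vehicle can lower its cost unilaterally.

```python
# Toy congestion game (illustrative assumptions, not the thesis's model).
# Each of n vehicles chooses 0 = compute locally or 1 = offload to a
# shared edge node whose cost grows with the number of offloaders.

def costs(actions, local_cost, base_edge=1.0, per_user=1.0):
    k = sum(actions)  # number of vehicles currently offloading
    return [local_cost[i] if a == 0 else base_edge + per_user * k
            for i, a in enumerate(actions)]

def best_response_dynamics(local_cost, max_rounds=100):
    """Iterate unilateral best responses until no vehicle wants to deviate."""
    n = len(local_cost)
    actions = [0] * n  # start with everyone computing locally
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            for a in (0, 1):
                trial = actions.copy()
                trial[i] = a
                # switch only on a strict unilateral improvement
                if costs(trial, local_cost)[i] < costs(actions, local_cost)[i]:
                    actions = trial
                    changed = True
        if not changed:
            return actions  # no profitable deviation: a pure Nash equilibrium
    return actions

# Two "slow" vehicles (high local cost) and two "fast" ones:
eq = best_response_dynamics(local_cost=[5.0, 5.0, 2.0, 2.0])
```

In this toy instance the two high-local-cost vehicles offload and the two low-cost ones stay local; once there, any single vehicle that switches only raises its own cost. The thesis finds such an equilibrium not by exhaustive best response but by training MATD3 agents toward it.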

Table of Contents
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1 Introduction
2 Background and Related Work
  2.1 Collaborative Computing in Vehicular Networks
    2.1.1 Computation Offloading
    2.1.2 Collaborative Computing
  2.2 Game Theory
    2.2.1 Introduction to Game Theory
    2.2.2 Game Theory in Vehicular Networks
  2.3 Reinforcement Learning
    2.3.1 Introduction to Reinforcement Learning
    2.3.2 Reinforcement Learning in Vehicular Networks
3 Methodology
  3.1 System Architecture
  3.2 Environment and Problem Definition
    3.2.1 Local Computing
    3.2.2 Remote Computing
    3.2.3 Problem Statement
  3.3 Game Theory
    3.3.1 Game Formulation
    3.3.2 Nash Equilibrium
    3.3.3 Potential Game
  3.4 Reinforcement Learning
    3.4.1 Single-Agent Reinforcement Learning
    3.4.2 Multi-Agent Reinforcement Learning
  3.5 Algorithm
4 Experiments and Results Analysis
  4.1 Experimental Design
  4.2 Experimental Environment Configuration
  4.3 SUMO Simulation Environment
  4.4 Experimental Parameter Design
    4.4.1 Network Environment Parameters
    4.4.2 Reinforcement Learning Model Parameters
  4.5 Comparison Baselines
  4.6 Effects of Hyperparameter Tuning
    4.6.1 Effect of the Learning Rate (α)
    4.6.2 Effect of the Discount Factor (γ)
  4.7 Experimental Evaluation
    4.7.1 Effect of Different Weights on Task Cost
    4.7.2 Game Theory
    4.7.3 Nash Equilibrium
    4.7.4 Comparison among Reinforcement Learning Algorithms
    4.7.5 Comparison among Different Task Costs
5 Conclusion and Future Work
References

