Author: Meng-Hong Lin (林孟宏)
Thesis title: A Robust Deep Reinforcement Learning System for The Allocation of Epidemic Prevention Materials
Advisor: Min-Te Sun (孫敏德)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar; 2020-2021)
Language: English
Number of pages: 51
Keywords: Supply Chain Management, Reinforcement Learning, Medical-grade Masks, Deep Deterministic Policy Gradient
  • Coronavirus Disease 2019 (COVID-19) has spread rapidly around the world since the end of 2019. As a result, the demand for epidemic prevention materials (e.g., medical-grade masks) has increased drastically. If the supply of masks is not properly controlled, stock shortages and price gouging follow. In Taiwan, since the very early stage of the pandemic, medical-grade masks have been collected and managed by the government and sold to all residents at a fixed price. In this setting, supply chain optimization becomes an important issue. For instance, if the government allocates too many masks to one region, residents of other regions may suffer from shortages. For effective COVID-19 prevention, it is crucial that the number of masks distributed to each region be close to that region's daily consumption.
    In this study, we propose a robust system for the allocation of medical-grade masks. The proposed system adopts a reinforcement learning framework that takes the daily supply and demand of masks as the environment, uses the Deep Deterministic Policy Gradient (DDPG) algorithm for agent updates, and takes the daily shortage as rewards and punishments. The proposed system is compared experimentally with a traditional machine learning approach used for supply chain demand forecasting, and the results indicate that the proposed system obtains more reward in the environment. Moreover, our reinforcement learning framework performs consistently under different total numbers of masks.
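The framework described in the abstract (daily supply and demand as the environment, daily shortage as the reward signal) can be illustrated with a toy environment. This is only a minimal sketch under stated assumptions, not the thesis's actual design: the class name `MaskAllocationEnv`, the state definition (the previous day's demand), the action encoding (non-negative weights normalized into regional shares), and the sample numbers are all hypothetical. A DDPG agent would supply the continuous `action` vector; no agent is implemented here.

```python
import numpy as np


class MaskAllocationEnv:
    """Toy environment: split a fixed daily mask supply across regions.

    Hypothetical illustration only; the thesis's real state, action,
    and reward definitions are not given in this record.
    """

    def __init__(self, demand, total_supply):
        # demand: array-like of shape (days, regions) with assumed daily consumption
        self.demand = np.asarray(demand, dtype=float)
        self.total_supply = float(total_supply)
        self.day = 0

    def reset(self):
        self.day = 0
        return self._state()

    def _state(self):
        # Assumed state: the previous day's demand (zeros before the first day)
        if self.day == 0:
            return np.zeros(self.demand.shape[1])
        return self.demand[self.day - 1]

    def step(self, action):
        # Turn non-negative weights into allocation shares that sum to 1
        weights = np.clip(np.asarray(action, dtype=float), 0.0, None)
        if weights.sum() == 0.0:
            shares = np.full(weights.shape, 1.0 / weights.size)
        else:
            shares = weights / weights.sum()
        allocation = shares * self.total_supply

        # Penalize only unmet demand: reward is the negative total shortage
        shortage = np.maximum(self.demand[self.day] - allocation, 0.0)
        reward = -float(shortage.sum())

        self.day += 1
        done = self.day >= len(self.demand)
        return self._state(), reward, done


# Usage: two regions, two days, 100 masks per day (made-up numbers)
env = MaskAllocationEnv([[30, 70], [50, 50]], total_supply=100)
state = env.reset()
state, reward, done = env.step([0.5, 0.5])
print(reward)  # -20.0: region 2 needed 70 but received 50
```

An even split on the first day leaves the second region 20 masks short, so the reward is negative; an allocation closer to actual consumption would raise it toward zero, which is the behavior the abstract asks the agent to learn.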

    1 Introduction
    2 Related Work
      2.1 Machine Learning-based Customer Demand Forecasting
        2.1.1 Support Vector Machine
        2.1.2 Support Vector Regression
      2.2 Reinforcement Learning
    3 Preliminary
      3.1 Machine learning techniques
        3.1.1 Support Vector Machine
        3.1.2 Reinforcement learning
      3.2 Deep learning
        3.2.1 Deep reinforcement learning
    4 Design
      4.1 Data Collection
      4.2 Data Extraction
      4.3 Reinforcement Learning Framework
        4.3.1 Environment Design
        4.3.2 Actor Network and Critic Network
        4.3.3 Deep Deterministic Policy Gradient Algorithm
        4.3.4 Feature Scaling
    5 Performance
      5.1 Data Description
      5.2 Experimental Settings
      5.3 Performance Evaluation
        5.3.1 Evaluation Metrics
        5.3.2 Experiment Results
    6 Conclusions and Future Works
    Reference

