
Author: 江玟萱 (Wen-Hsuan Chiang)
Thesis title: Neural Network Architecture Optimization Based on Virtual Reward Reinforcement Learning
Advisor: 陳以錚 (Yi-Cheng Chen)
Oral defense committee:
Degree: Master
Department: College of Management, Department of Information Management
Year of publication: 2020
Graduation academic year: 108
Language: English
Number of pages: 41
Chinese keywords: 神經架構搜索, 強化學習, 近端策略優化, 神經網絡優化, 機器學習
English keywords: Neural Architecture Search, Reinforcement Learning, Proximal Policy Optimization, Neural Network Optimization, Machine Learning
    Chinese Abstract (translated): In recent years, machine learning has become increasingly popular, and more and more scholars, practitioners, and engineers conduct related research and applications. If they do not understand the data well, the information may be misinterpreted or the model may be biased, because the features they extract serve as the inputs of machine learning. To avoid these pitfalls of manual feature extraction, we can let a machine construct the neural network itself. Our research uses a predictor to build a virtual map, which is then used to train an agent so that it can find good neural network architectures. Because the reward function is varied in several ways, we propose four models in this study. In the experiments, we analyze the results for each parameter of the four models and recognize the importance of model stability: if a model is unstable, the gap between the accuracies it obtains can be too large. Our models perform well in both accuracy and stability.


    Abstract: In recent years, machine learning has become increasingly popular, prompting more and more scholars, practitioners, and engineers to conduct related research and applications. If they do not understand the data well, the information may be misinterpreted or the model may be biased, because the features they extract serve as the inputs of machine learning. To avoid these pitfalls of manual feature extraction, we can build neural networks through machines. Our research uses a predictor to construct a virtual map, which is then used to train an agent to find good neural network architectures. Because the reward function is varied in several ways, we propose four models in this research. During the experiments, we analyze the results for each parameter of the four models and recognize the importance of model stability: if a model is unstable, the gap between the accuracies it obtains may be too large. Our models achieve good performance in both accuracy and stability.
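    The core idea in the abstract, as far as it can be read from the text alone, is that a learned predictor supplies a "virtual" reward (an estimated accuracy) so the agent never has to fully train each candidate architecture. A minimal, hypothetical sketch of that loop follows; the names (`toy_predictor`, `sample_architecture`, `LAYER_CHOICES`) and the simple REINFORCE-style update are illustrative assumptions, not the thesis's actual VR-PPO method.

    ```python
    import random

    # Illustrative sketch only: a stand-in predictor scores architectures
    # without training them, and the agent's policy is updated with that
    # predicted ("virtual") reward instead of a costly real training run.
    LAYER_CHOICES = [16, 32, 64, 128]   # hidden-layer widths the agent may pick
    NUM_LAYERS = 3

    def toy_predictor(arch):
        # Stand-in for the learned predictor: here, a made-up preference
        # for mid-sized layers, peaking at width 64.
        return sum(1.0 - abs(w - 64) / 64 for w in arch) / len(arch)

    def sample_architecture(probs):
        # probs[i][j] = unnormalized weight of LAYER_CHOICES[j] at layer i.
        return [random.choices(LAYER_CHOICES, weights=p)[0] for p in probs]

    def train_agent(steps=2000, lr=0.1, seed=0):
        random.seed(seed)
        # Start from a uniform policy over layer widths.
        probs = [[1.0] * len(LAYER_CHOICES) for _ in range(NUM_LAYERS)]
        baseline = 0.0
        for _ in range(steps):
            arch = sample_architecture(probs)
            reward = toy_predictor(arch)        # virtual reward, no training
            advantage = reward - baseline
            baseline = 0.9 * baseline + 0.1 * reward
            # REINFORCE-style update: reinforce the sampled choice per layer.
            for layer, width in enumerate(arch):
                j = LAYER_CHOICES.index(width)
                probs[layer][j] = max(1e-3, probs[layer][j] + lr * advantage)
        # Return the most probable width index for each layer.
        return [max(range(len(LAYER_CHOICES)), key=lambda j: probs[i][j])
                for i in range(NUM_LAYERS)]

    best = train_agent()
    print([LAYER_CHOICES[j] for j in best])  # widths favored by the virtual reward
    ```

    The point of the sketch is the substitution: because `toy_predictor` is cheap, the agent can take thousands of policy-update steps at negligible cost, which is the advantage the abstract attributes to training on the virtual map rather than on real accuracies.
    
    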

    Table of Contents
    中文摘要 ii
    Abstract iii
    Table of Contents iv
    List of Figures v
    List of Tables v
    1. Introduction 1
    2. Related Work 5
    2.1 Neural Architecture Search 5
    2.2 Reinforcement Learning 7
    3. Methodology 9
    3.1 Model Architecture 10
    3.2 Data Sampling and Map Construction 11
    3.3 Predictor 12
    3.4 VR-PPO 13
    4. Performance Evaluation 16
    4.1 Accuracy Discussion 17
    4.2 Parameter Setting of Predictor 18
    4.3 Parameter Setting of VR-PPO 20
    4.4 Reward Discussion (Virtual versus Real) 24
    5. Conclusion 31
    References 32

