| Graduate Student: | 蕭仲廷 Chung-Ting Hsiao |
|---|---|
| Thesis Title: | 基於策略式強化學習之神經網路架構搜尋最佳化 A Policy-Based Reinforcement Learning Model for Neural Network Architecture Search |
| Advisor: | 施國琛 Timothy K. Shih |
| Oral Defense Committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 College of Electrical Engineering and Computer Science - 軟體工程研究所 Graduate Institute of Software Engineering |
| Year of Publication: | 2019 |
| Graduation Academic Year: | 107 |
| Language: | English |
| Number of Pages: | 46 |
| Keywords (Chinese): | 深度學習、神經網路、強化學習、機器學習、神經網路架構搜尋 |
| Keywords (English): | Deep Learning, Neural Network, Reinforcement Learning, Machine Learning, Neural Network Architecture Search |
In recent years, deep learning and its core method, the neural network, have played a major role in machine learning. This is due to their high accuracy and ease of implementation, together with rapid advances in computing resources: the acceleration offered by GPUs and TPUs has driven vigorous growth in the field. The most common problems encountered when applying deep learning are how to design the network architecture and how to set hyperparameters such as the choice of activation function. Designing a neural network by hand requires considerable experience and domain knowledge to produce a model that performs relatively well, so many studies aim to have the computer itself, through various algorithms, automatically generate network architectures or automatically optimize their hyperparameters. Several approaches to automated neural network design exist. In this thesis we propose the Hill Climbing Model (HCM), a reinforcement-learning-based method that pairs the agent with a specially designed environment so that it can efficiently find an optimized neural network architecture for the user, while remaining easy to train and requiring only a small amount of computing resources.
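The greedy search idea behind a hill-climbing architecture search can be illustrated with a toy loop. This is only a hypothetical sketch, not the thesis's actual design: the `(layers, units, activation)` encoding, the `neighbors` mutations, and the `fitness` surrogate are all illustrative assumptions. In particular, `fitness` is a closed-form stand-in for validation accuracy, whereas a real system would train and evaluate each candidate network.

```python
# A minimal hill-climbing sketch of architecture search, loosely in the spirit
# of the Hill Climbing Model (HCM) described in the abstract. All names here
# are illustrative assumptions; fitness() is a toy surrogate that peaks at
# 4 layers, 64 units, and relu, standing in for trained validation accuracy.

ACTIVATIONS = ["relu", "tanh", "sigmoid"]

def fitness(arch):
    """Toy surrogate for validation accuracy (assumption, not the thesis's metric)."""
    layers, units, act = arch
    score = -abs(layers - 4) - abs(units - 64) / 32
    return score + {"relu": 1.0, "tanh": 0.5, "sigmoid": 0.0}[act]

def neighbors(arch):
    """Single-step mutations of the current architecture."""
    layers, units, act = arch
    cands = [(layers + 1, units, act), (max(1, layers - 1), units, act),
             (layers, units * 2, act), (layers, max(8, units // 2), act)]
    cands += [(layers, units, a) for a in ACTIVATIONS if a != act]
    return cands

def hill_climb(start, max_steps=50):
    """Greedily move to the best neighbor until no neighbor improves."""
    best = start
    for _ in range(max_steps):
        cand = max(neighbors(best), key=fitness)
        if fitness(cand) <= fitness(best):
            break  # local optimum reached
        best = cand
    return best

print(hill_climb((1, 8, "sigmoid")))  # converges to (4, 64, 'relu') on this toy fitness
```

Each iteration evaluates only a handful of one-step mutations, which is why this style of search can stay cheap in computing resources; the trade-off is that it can stop at a local optimum, which is where the reinforcement-learning agent and designed environment of the thesis come in.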