| Student: | 蔡子涵 Tzu-Han Tsai |
|---|---|
| Thesis Title: | A DQN-Based Reinforcement Learning Model for Neural Network Architecture Search |
| Advisor: | 陳以錚 Yi-Cheng Chen |
| Committee Members: | |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 資訊管理學系 Department of Information Management, College of Management |
| Year of Publication: | 2019 |
| Academic Year: | 107 |
| Language: | English |
| Pages: | 48 |
| Chinese Keywords: | 機器學習、神經網路、強化學習、神經網路架構 |
| Keywords: | Machine learning, Neural network, Reinforcement learning, Neural network architecture |
Machine learning algorithms automatically extract patterns from data and use those patterns to make predictions on unseen data. Machine learning has been widely applied to data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, credit card fraud detection, securities market analysis, and more. The Internet era has driven rapid growth in data volume, yet designing a neural network architecture for a given dataset still demands expert knowledge, time, and computing resources. Every neural network is built either through extensive expert knowledge and repeated careful experimentation, or by modifying the architecture of one of a handful of existing high-performing networks. To accelerate the construction of neural networks, we built a system called the Hill-Climbing Model (HCM), a reinforcement learning based meta-modeling algorithm that, given a learning task, automatically generates high-performing neural network architectures. The agent is trained with a DQN that combines an ε-greedy exploration strategy with experience replay, and uses these experiences and strategies to generate high-performing networks. Reinforcement learning combined with greedy exploration broadens the space of candidate architectures, and the agent iteratively discovers designs with improved performance on the learning task. Even on image classification benchmarks, the agent-designed networks perform as well as existing hand-designed networks while being more efficient.
Designing neural network (NN) architectures requires both human expertise and labor. New architectures are handcrafted by careful experimentation or modified from a handful of existing networks. We introduce the Hill-Climbing Model (HCM), a meta-modeling algorithm based on reinforcement learning that automatically generates high-performing NN architectures for a given learning task. The learning agent is trained to sequentially choose NN layers using a DQN with an ε-greedy exploration strategy and experience replay. The agent explores a large but finite space of possible architectures and iteratively discovers designs with improved performance on the learning task. Even on image classification benchmarks, the agent-designed networks perform as well as existing hand-designed networks while being more efficient. We also outperform existing meta-modeling approaches to network design on image classification and regression tasks.
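The abstract describes a search loop in which a Q-learning agent with ε-greedy exploration and experience replay sequentially chooses network layers and is rewarded by the candidate network's performance. The following is a minimal, self-contained sketch of that loop under stated assumptions: it uses a toy tabular Q-function and a stand-in scoring function instead of actually training each candidate, and all names here (`LAYER_CHOICES`, `proxy_reward`, `QAgent`) are illustrative, not the thesis's implementation.

```python
# Illustrative sketch: epsilon-greedy Q-learning with experience replay
# over a tiny layer-choice space. Not the thesis's actual HCM code.
import random
from collections import deque

LAYER_CHOICES = ["conv3x3", "conv5x5", "maxpool", "fc", "terminate"]
MAX_DEPTH = 4

def proxy_reward(arch):
    # Stand-in for "train the candidate network and return validation
    # accuracy": a toy score that favors conv layers ending in a fc layer.
    score = 0.5
    score += 0.1 * sum(1 for layer in arch if layer.startswith("conv"))
    score += 0.15 if arch and arch[-1] == "fc" else 0.0
    return min(score, 1.0)

class QAgent:
    def __init__(self, eps=1.0, alpha=0.1, gamma=1.0, seed=0):
        self.q = {}                      # (depth, action) -> Q value
        self.replay = deque(maxlen=500)  # experience replay buffer
        self.eps, self.alpha, self.gamma = eps, alpha, gamma
        self.rng = random.Random(seed)

    def act(self, depth):
        if self.rng.random() < self.eps:  # explore with probability eps
            return self.rng.choice(LAYER_CHOICES)
        return max(LAYER_CHOICES, key=lambda a: self.q.get((depth, a), 0.0))

    def update(self, depth, action, reward, next_depth, done):
        self.replay.append((depth, action, reward, next_depth, done))
        # Replay a mini-batch of past transitions (Q-learning update rule).
        batch = self.rng.sample(self.replay, min(8, len(self.replay)))
        for d, a, r, nd, fin in batch:
            target = r if fin else r + self.gamma * max(
                self.q.get((nd, a2), 0.0) for a2 in LAYER_CHOICES)
            old = self.q.get((d, a), 0.0)
            self.q[(d, a)] = old + self.alpha * (target - old)

def search(episodes=300):
    agent, best = QAgent(), ([], 0.0)
    for _ in range(episodes):
        arch, depth = [], 0
        while depth < MAX_DEPTH:
            action = agent.act(depth)
            done = action == "terminate" or depth == MAX_DEPTH - 1
            if action != "terminate":
                arch.append(action)
            reward = proxy_reward(arch) if done else 0.0
            agent.update(depth, action, reward, depth + 1, done)
            depth += 1
            if done:
                break
        score = proxy_reward(arch)
        if score > best[1]:
            best = (arch, score)
        agent.eps = max(0.1, agent.eps * 0.99)  # anneal exploration
    return best
```

In the thesis's setting, `proxy_reward` would be replaced by training the sampled architecture and returning its validation accuracy, and the tabular Q-function by a deep Q-network; the episode structure (choose layers until a terminate action or a depth limit, then receive the trained model's performance as reward) is the part the sketch aims to convey.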
[1] Baker, Bowen, et al. "Designing neural network architectures using reinforcement learning." arXiv preprint arXiv:1611.02167 (2016).
[2] Zhong, Zhao, Junjie Yan, and Cheng-Lin Liu. "Practical network blocks design with q-learning." arXiv preprint arXiv:1708.05552 1.2 (2017): 5.
[3] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
[4] Schaffer, J. David, Darrell Whitley, and Larry J. Eshelman. "Combinations of genetic algorithms and neural networks: A survey of the state of the art." [Proceedings] COGANN-92: International Workshop on Combinations of Genetic Algorithms and Neural Networks. IEEE, 1992.
[5] Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical bayesian optimization of machine learning algorithms." Advances in neural information processing systems. 2012.
[6] Swersky, Kevin, Jasper Snoek, and Ryan P. Adams. "Multi-task bayesian optimization." Advances in neural information processing systems. 2013.
[7] Wan, Li, et al. "Regularization of neural networks using dropconnect." International conference on machine learning. 2013.
[8] Cai, Han, et al. "Efficient architecture search by network transformation." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[9] Stanley, Kenneth O., and Risto Miikkulainen. "Evolving neural networks through augmenting topologies." Evolutionary computation 10.2 (2002): 99-127.
[10] Verbancsics, Phillip, and Josh Harguess. "Generative neuroevolution for deep learning." arXiv preprint arXiv:1312.5355 (2013).
[11] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of bayesian optimization." Proceedings of the IEEE 104.1 (2016): 148-175.
[12] Bergstra, James, Daniel Yamins, and David Daniel Cox. "Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures." (2013).
[13] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
[14] Lin, Long-Ji. Reinforcement learning for robots using neural networks. No. CMU-CS-93-103. Carnegie Mellon University, School of Computer Science, 1993.
[15] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of bayesian optimization." Proceedings of the IEEE 104.1 (2016): 148-175.
[16] Domhan, Tobias, Jost Tobias Springenberg, and Frank Hutter. "Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves." Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.
[17] Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical bayesian optimization of machine learning algorithms." Advances in neural information processing systems. 2012.
[18] Swersky, Kevin, Jasper Snoek, and Ryan P. Adams. "Multi-task bayesian optimization." Advances in neural information processing systems. 2013, pp. 2004–2012.
[19] Bergstra, James S., et al. "Algorithms for hyper-parameter optimization." Advances in neural information processing systems. 2011.
[20] Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. "Reinforcement learning: A survey." Journal of artificial intelligence research 4 (1996): 237-285.
[21] Vilalta, Ricardo, and Youssef Drissi. "A perspective view and survey of meta-learning." Artificial intelligence review 18.2 (2002): 77-95.
[22] Hochreiter, Sepp, A. Steven Younger, and Peter R. Conwell. "Learning to learn using gradient descent." International Conference on Artificial Neural Networks. Springer, Berlin, Heidelberg, 2001.
[23] Andrychowicz, Marcin, et al. "Learning to learn by gradient descent by gradient descent." Advances in Neural Information Processing Systems. 2016.
[24] Vermorel, Joannes, and Mehryar Mohri. "Multi-armed bandit algorithms and empirical evaluation." European conference on machine learning. Springer, Berlin, Heidelberg, 2005.
[25] Tsitsiklis, John N. "Asynchronous stochastic approximation and Q-learning." Machine learning 16.3 (1994): 185-202.
[26] Bertsekas, Dimitri. "Distributed dynamic programming." IEEE transactions on Automatic Control 27.3 (1982): 610-616.
[27] Tomassini, Marco. "Parallel and distributed evolutionary algorithms: A review." (1999).
[28] Koutník, Jan, Jürgen Schmidhuber, and Faustino Gomez. "Evolving deep unsupervised convolutional networks for vision-based reinforcement learning." Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. ACM, 2014.
[29] Galstyan, Aram, Karl Czajkowski, and Kristina Lerman. "Resource allocation in the grid using reinforcement learning." Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 3. IEEE Computer Society, 2004.
[30] Gomes, Eduardo Rodrigues, and Ryszard Kowalczyk. "Learning the IPA market with individual and social rewards." Web Intelligence and Agent Systems: An International Journal 7.2 (2009): 123-138.
[31] Ziogos, N. P., et al. "A reinforcement learning algorithm for market participants in FTR auctions." 2007 IEEE Lausanne Power Tech. IEEE, 2007.
[32] Bertsekas, Dimitri P., and Athena Scientific. Convex optimization algorithms. Belmont: Athena Scientific, 2015.
[33] Watkins, Christopher John Cornish Hellaby. Learning from delayed rewards. Diss. King's College, Cambridge, 1989.
[34] Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in neural information processing systems. 2012.
[35] Gu, Shixiang, et al. "Continuous deep q-learning with model-based acceleration." arXiv preprint arXiv:1603.00748 (2016).
[36] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI. 2016.
[37] Narendra, Kumpati S., Yu Wang, and Snehasis Mukhopadhyay. "Fast Reinforcement Learning using Multiple Models." 2016 Control and Decision Conference, Las Vegas.
[38] Narendra, Kumpati S., Snehasis Mukhopadhyay, and Yu Wang. "Improving the Speed of Response of Learning Algorithms Using Multiple Models: An Introduction." The 17th Yale Workshop on Adaptive and Learning Systems.
[39] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” arXiv:1504.00702 [cs.LG], 2015.
[40] J.-A. M. Assael, N. Wahlström, T. B. Schön, and M. P. Deisenroth, “Data-efficient learning of feedback policies from image pixels using deep dynamical models,” arXiv:1510.02173 [cs.AI], 2015.
[41] J. Ba, V. Mnih, and K. Kavukcuoglu, “Multiple object recognition with visual attention,” arXiv:1412.7755 [cs.LG], 2014.
[42] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
[43] Cai, Han, et al. "Efficient architecture search by network transformation." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[44] Liu, Hanxiao, et al. "Hierarchical representations for efficient architecture search." arXiv preprint arXiv:1711.00436 (2017).
[45] Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
[46] Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
[47] Romero, Adriana, et al. "Fitnets: Hints for thin deep nets." arXiv preprint arXiv:1412.6550 (2014).
[48] Liu, Xu-Ying, Jianxin Wu, and Zhi-Hua Zhou. "Exploratory undersampling for class-imbalance learning." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39.2 (2008): 539-550.