
Author: Jing-Wun Chen (陳靖玟)
Title: Exploring Effects of Optimizer Selection and Their Hyperparameter Tuning on Performance of Deep Neural Networks for Image Recognition
Advisor: Feng-Nan Hwang (黃楓南)
Committee members:
Degree: Master
Department: Department of Mathematics, College of Science
Year of publication: 2019
Graduation academic year: 107 (2018/19)
Language: English
Pages: 56
Chinese keywords: 深度學習 (deep learning)
In recent years, deep learning has flourished, and people have begun to use it to solve problems. Deep neural networks can perform speech recognition, image recognition, object detection, face recognition, autonomous driving, and more. The most basic neural network is the multilayer perceptron (MLP), which consists of several layers of nodes with full connections between adjacent layers. The biggest problem with the MLP is that it ignores the shape or ordering of the data: when image data are fed in, flattening them into one dimension loses the spatial information that is essential to images. The convolutional neural network (CNN) was developed for this reason. Compared with a traditional neural network, a CNN has additional convolution layers and pooling layers, which preserve and extract image features.
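To make the contrast concrete, here is a minimal sketch of the two architectures. The framework (TensorFlow/Keras) and all layer sizes are illustrative assumptions; the thesis does not publish its model code here.

```python
# Minimal sketch (assumed TensorFlow/Keras; layer sizes are illustrative).
# The MLP flattens the image, so every pixel becomes an independent input;
# the CNN keeps the 2-D layout via convolution and pooling.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_mlp(input_shape=(28, 28, 1), num_classes=10):
    # Flattening discards the spatial arrangement of pixels.
    return models.Sequential([
        layers.Flatten(input_shape=input_shape),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    # Convolution processes neighboring pixels together; pooling downsamples
    # while keeping the most salient responses.
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
```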

After we feed data into a neural network, we want its output to be close to the true values, and this is where the optimizer comes in: it minimizes the error between the predicted values and the ground truth. Optimizers in deep learning are usually improvements on gradient descent, and choosing a suitable learning rate is a difficult problem. In our experiments we therefore use three data sets (the MNIST handwritten digits, CIFAR-10, and train route scenes) and two network architectures (an MLP and a CNN), combined with six optimizers (gradient descent, Momentum, the adaptive gradient algorithm Adagrad, Adadelta, root mean square propagation, and Adam), to investigate how the choice of optimizer and of its hyperparameters affects image recognition.
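The baseline update rule and its simplest refinement can be written down in a few lines. The following NumPy sketch is illustrative only (it is not the thesis code), with the learning rate and momentum coefficient chosen arbitrarily:

```python
# Plain-NumPy illustration of two update rules discussed above; hyperparameter
# values (lr, beta) are arbitrary choices for the demo, not the thesis's.
import numpy as np

def gradient_descent_step(w, grad, lr=0.01):
    # Vanilla gradient descent: move against the gradient, scaled by the
    # learning rate lr.
    return w - lr * grad(w)

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Momentum keeps an exponentially decaying "velocity" v of past gradients,
    # damping oscillations and accelerating along consistent descent directions.
    v = beta * v - lr * grad(w)
    return w + v, v

# Toy problem: minimize f(w) = ||w||^2, whose gradient is 2w.
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w, v = momentum_step(w, v, lambda x: 2 * x)
print(w)  # close to the minimizer [0, 0]
```

The adaptive methods (Adagrad, Adadelta, RMSprop, Adam) go one step further and rescale the learning rate per parameter from running statistics of past gradients, which is why they are typically less sensitive to the choice of initial learning rate.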


In recent years, deep learning has flourished, and people have begun to use it to solve problems. Deep neural networks can be used for speech recognition, image recognition, object detection, face recognition, and driverless vehicles. The most basic neural network is the multilayer perceptron (MLP), which consists of multiple layers of nodes with adjacent layers fully connected. One drawback of the MLP is that it ignores the shape of the data, which is important for image data. Compared to traditional neural networks, the convolutional neural network (CNN) has additional convolution and pooling layers, which are used to preserve and capture image features.

The prediction accuracy of a neural network depends on many factors, such as the network architecture, the cost function, and the selection of an optimizer. The goal of this work is to investigate the effects of optimizer selection and their hyperparameter tuning on the performance of deep neural networks for image recognition problems. We use three data sets (MNIST, CIFAR-10, and train route scenes) as test problems and test six optimizers (gradient descent, Momentum, the adaptive gradient algorithm, Adadelta, root mean square propagation, and Adam). Our numerical results show that Adam is a good choice because of its efficiency and robustness.
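The comparison described here boils down to training the same model under each optimizer over a grid of learning rates and recording test accuracy. The sketch below shows the shape of such a sweep on MNIST; the framework (TensorFlow/Keras), the tiny model, the learning-rate grid, and the epoch count are all illustrative assumptions rather than the thesis's actual configuration.

```python
# Sketch of an optimizer / learning-rate sweep in the spirit of the thesis
# experiments. Model, grid, and epochs are assumptions for illustration.
import tensorflow as tf

def make_model():
    # A deliberately small classifier so the sweep runs quickly.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

optimizers = {
    "GD":       lambda lr: tf.keras.optimizers.SGD(learning_rate=lr),
    "Momentum": lambda lr: tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.9),
    "Adagrad":  lambda lr: tf.keras.optimizers.Adagrad(learning_rate=lr),
    "Adadelta": lambda lr: tf.keras.optimizers.Adadelta(learning_rate=lr),
    "RMSprop":  lambda lr: tf.keras.optimizers.RMSprop(learning_rate=lr),
    "Adam":     lambda lr: tf.keras.optimizers.Adam(learning_rate=lr),
}

(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
x_tr, x_te = x_tr / 255.0, x_te / 255.0   # scale pixels to [0, 1]

results = {}
for name, make_opt in optimizers.items():
    for lr in (1e-1, 1e-2, 1e-3):          # illustrative grid
        model = make_model()
        model.compile(optimizer=make_opt(lr),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_tr, y_tr, epochs=3, batch_size=128, verbose=0)
        _, acc = model.evaluate(x_te, y_te, verbose=0)
        results[(name, lr)] = acc

# Inspect which (optimizer, learning rate) pair generalized best.
print(max(results, key=results.get), max(results.values()))
```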

Contents

Tables
Figures
1 Introduction
2 Methodology
  2.1 Artificial neural network (ANN)
    2.1.1 Algorithm
    2.1.2 Multilayer feedforward neural network
    2.1.3 Activation function
  2.2 Deep neural network (DNN)
    2.2.1 Multilayer perceptron (MLP)
    2.2.2 Backpropagation [5]
    2.2.3 Convolutional neural network (CNN)
    2.2.4 Hyperparameter in neural network
  2.3 Loss function in neural network
    2.3.1 Mean square error (MSE)
    2.3.2 Softmax loss
  2.4 The optimizer in neural network
    2.4.1 Gradient descent (GD)
    2.4.2 Momentum
    2.4.3 Adaptive gradient algorithm (Adagrad)
    2.4.4 Adadelta
    2.4.5 Root mean square propagation (RMSprop)
    2.4.6 Adaptive moment estimation (Adam)
3 Experimental setup
  3.1 Dataset
    3.1.1 MNIST [6]
    3.1.2 CIFAR-10 [7]
    3.1.3 The images from Norway's railway
    3.1.4 Data preprocessing
  3.2 Network architectures
  3.3 Experimental process
    3.3.1 The overview of parameters
4 Numerical results and discussions
  4.1 Comparison of models
  4.2 Comparison of optimizers
    4.2.1 Exploring the sensitivity of selection of learning rate
    4.2.2 Exploring the convergence speed
5 Conclusions
References

[1] Matt W Gardner and SR Dorling. Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmospheric Environment, 32:2627–2636, 1998.
[2] Steve Lawrence, C Lee Giles, Ah Chung Tsoi, and Andrew D Back. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8:98–113, 1997.
[3] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
[4] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014.
[5] Robert Hecht-Nielsen. Theory of the backpropagation neural network. In Neural Networks for Perception, pages 65–93. Elsevier, 1992.
[6] Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29:141–142, 2012.
[7] Alex Krizhevsky and Geoff Hinton. Convolutional deep belief networks on CIFAR-10. Unpublished manuscript, 40, 2010.
[8] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278–2324, 1998.
[9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[10] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[11] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[12] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[13] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[14] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
[15] Feng Wang, Jian Cheng, Weiyang Liu, and Haijun Liu. Additive margin softmax for face verification. IEEE Signal Processing Letters, 25:926–930, 2018.
[16] Brody Huval, Tao Wang, Sameep Tandon, Jeff Kiske, Will Song, Joel Pazhayampallil, Mykhaylo Andriluka, Pranav Rajpurkar, Toki Migimatsu, Royce Cheng-Yue, Fernando Mujica, Adam Coates, and Andrew Y. Ng. An empirical evaluation of deep learning on highway driving. arXiv preprint arXiv:1504.01716, 2015.
[17] James S Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554, 2011.
[18] Thomas M Breuel. The effects of hyperparameters on SGD training of neural networks. arXiv preprint arXiv:1508.02788, 2015.
[19] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[20] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, pages 2121–2159, 2011.
[21] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[22] Anil K Jain, Jianchang Mao, and KM Mohiuddin. Artificial neural networks: A tutorial. Computer, pages 31–44, 1996.
[23] Rich Caruana and Alexandru Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, pages 161–168. ACM, 2006.
[24] Horace B Barlow. Unsupervised learning. Neural Computation, 1:295–311, 1989.
[25] Mario A. T. Figueiredo and Anil K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis & Machine Intelligence, pages 381–396, 2002.
[26] Daniel Svozil, Vladimir Kvasnicka, and Jiri Pospichal. Introduction to multi-layer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems, 39:43–62, 1997.
[27] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.
[28] Moshe Leshno, Vladimir Ya Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6:861–867, 1993.
[29] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[30] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29, 2012.
