| Graduate student: | 張哲維 Che-Wei Chang |
|---|---|
| Thesis title: | 深度多模態神經網路──應用於少量樣本數的高效能機器學習模型 Deep Multimodal Neural Network: A Highly Efficient Machine Learning Model Applicable to Low Sample Numbers |
| Advisor: | 陳慶瀚 Ching-han Chen |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 資訊工程學系 Department of Computer Science & Information Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Number of pages: | 116 |
| Chinese keywords: | 深度學習、神經網路、視覺檢測、種子辨識、特徵縮減、多模態 |
| Foreign-language keywords: | deep learning, neural network, vision inspection, seed recognition, feature reduction, multimodal |

In fields such as healthcare, industry, and biometrics, where large datasets are hard to obtain, training a deep neural network effectively for classification from a small amount of data remains an open problem. The interpretability of neural networks is an equally important issue. Existing approaches can achieve effective recognition through few-sample learning, but performing recognition in real time consumes considerable power and computing resources and requires costly hardware. These approaches also raise many optimization problems and lack interpretability and good convergence.

This thesis therefore proposes a general-purpose, interpretable deep multimodal neural network. Using color, texture, and shape as the main features, it applies statistical and quantization methods to extract compact, discriminative multimodal features layer by layer, and then exploits the complementarity among these features to make the classification decision. A genetic algorithm optimizes the network's parameters to improve the recognition rate and to enable incremental learning. In experiments on few-sample, multi-class classification, the recognition rate of the deep multimodal neural network was 12% higher than that of a shallow convolutional neural network and 9.33% higher than that of ResNet50. It also learned and ran much faster than both networks, used far fewer parameters, had a much smaller model size, and showed excellent convergence. The deep multimodal neural network can be implemented on embedded systems that have limited hardware resources and must run in real time, achieving high speed, real-time operation, low power consumption, and low cost.
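
To make the idea of statistical, quantized features from three modalities concrete, the following is a minimal sketch and not the thesis's implementation. The abstract does not name the descriptors, so the choices here are assumptions: a quantized per-channel color histogram, a basic 3x3 local binary pattern histogram for texture, and the first two Hu moment invariants for shape. All function names are hypothetical, and plain concatenation stands in for whatever fusion the thesis actually uses.

```python
import numpy as np


def color_histogram(rgb, bins=8):
    """Quantize each channel into `bins` levels; return normalized histograms."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(rgb[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


def lbp_histogram(gray):
    """Basic 3x3 local binary pattern codes, summarized as a 256-bin histogram."""
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Eight neighbors, visited clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()


def hu_shape(mask):
    """First two Hu moment invariants of a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))
    dx, dy = xs - xs.mean(), ys - ys.mean()
    # Normalized central moments of order 2: eta_pq = mu_pq / m00**2.
    eta20 = (dx ** 2).sum() / m00 ** 2
    eta02 = (dy ** 2).sum() / m00 ** 2
    eta11 = (dx * dy).sum() / m00 ** 2
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return np.array([phi1, phi2])


def multimodal_features(rgb, mask):
    """Concatenate color, texture, and shape features (assumed fusion)."""
    gray = rgb.mean(axis=2)
    return np.concatenate(
        [color_histogram(rgb), lbp_histogram(gray), hu_shape(mask)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    object_mask = np.zeros((32, 32), dtype=bool)
    object_mask[8:20, 10:26] = True
    print(multimodal_features(image, object_mask).shape)
```

Each descriptor is a fixed-length summary whose size does not depend on the number of training images, which is one plausible reason such features suit small datasets. A classifier and a parameter search such as the genetic algorithm mentioned in the abstract would sit on top of this vector.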