Graduate student: 張哲維 (Che-Wei Chang)
Thesis title: Deep Multimodal Neural Network: A Highly Efficient Machine Learning Model Applicable to Low Sample Numbers (深度多模態神經網路──應用於少量樣本數的高效能機器學習模型)
Advisor: 陳慶瀚 (Ching-han Chen)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 116
Keywords (Chinese): 深度學習, 神經網路, 視覺檢測, 種子辨識, 特徵縮減, 多模態
Keywords (English): deep learning, neural network, vision inspection, seed recognition, feature reduction, multimodal
  • In fields such as healthcare, industry, and biometrics, where large datasets are hard to obtain, training a deep neural network to classify effectively from only a few samples remains an open problem. The interpretability of neural networks is an equally important issue. Existing few-sample learning methods can achieve effective recognition, but performing that recognition in real time demands considerable power, computing resources, and hardware cost. These methods also face many optimization problems and lack interpretability and good convergence. This thesis therefore proposes a general-purpose, interpretable deep multimodal neural network. It takes color, texture, and shape as its main features and uses statistical and quantization methods to extract, layer by layer, compact and discriminative multimodal features. The complementarity among those features is then used to make the classification decision. The network's parameters are optimized with a genetic algorithm to improve the recognition rate and to support incremental learning. In experiments on few-sample, multi-class recognition, the recognition rate of the deep multimodal neural network exceeded that of a shallow convolutional neural network by 12% and that of ResNet50 by 9.33%. It also learned and ran much faster than both networks, used far fewer parameters and a much smaller model, and showed excellent convergence. The deep multimodal neural network can be implemented on embedded systems that have limited hardware resources and must run in real time, achieving high speed, real-time operation, low power consumption, and low cost.
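    The abstract names texture as one of the three feature modalities, and the table of contents lists local binary patterns (LBP) among the feature-extraction techniques reviewed. This record does not show how the thesis computes its texture features, so the following is only a minimal sketch of the standard basic 3x3 LBP operator in plain Python. The function names `lbp_code` and `lbp_histogram`, the neighbour ordering, and the 256-bin normalized histogram are assumptions made for illustration, not the author's implementation.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of integer intensities

# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
# The starting point and direction are an arbitrary (assumed) convention.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """Return the basic 8-bit LBP code of the interior pixel at (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Image) -> List[float]:
    """Return the normalized 256-bin histogram of LBP codes.

    Border pixels are skipped because they lack a full 3x3 neighbourhood.
    """
    hist = [0.0] * 256
    rows, cols = len(img), len(img[0])
    count = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1.0
            count += 1
    if count:
        hist = [h / count for h in hist]
    return hist
```

    A histogram like this is one compact statistical summary of texture. In a multimodal setting it would sit alongside separate color and shape descriptors, each contributing complementary evidence to the final decision.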

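    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1.1 Research Background
      1.2 Research Objectives
      1.3 Thesis Organization
    Chapter 2. Literature Review
      2.1 Feature Extraction Techniques
        2.1.1 Local Binary Patterns
        2.1.2 Hu Moment Invariants
        2.1.3 Spatial Color Statistical Vector
      2.2 Neural Networks
        2.2.1 Self-Organizing Map Neural Network
        2.2.2 Probabilistic Neural Network
        2.2.3 Convolutional Neural Network
        2.2.4 Genetic Algorithm
      2.3 MIAT System Design Methodology
        2.3.1 IDEF0 Hierarchical Modular Design
        2.3.2 GRAFCET Discrete-Event Modeling
    Chapter 3. Deep Multimodal Neural Network
      3.1 Deep Multimodal Neural Network Architecture
      3.2 Deep Multimodal Neural Network Model and Signal Propagation
        3.2.1 Segmentation Layer
        3.2.2 Feature Layer
        3.2.3 Quantization Layer
        3.2.4 Decision Layer
      3.3 Deep Multimodal Neural Network Learning and Testing Algorithms
        3.3.1 Network Learning Algorithm
        3.3.2 Network Testing Algorithm
    Chapter 4. Deep Multimodal Neural Network Recognition System Design
      4.1 Deep Multimodal Neural Network Recognition System Modules
        4.1.1 Image Object Segmentation Module
        4.1.2 Multimodal Feature Extraction Module
        4.1.3 Statistical Feature Quantization Module
        4.1.4 Decision Network Module
      4.2 Deep Multimodal Neural Network Recognition System GRAFCET
        4.2.1 Image Object Segmentation GRAFCET
        4.2.2 Multimodal Feature Extraction GRAFCET
        4.2.3 Statistical Feature Quantization GRAFCET
        4.2.4 Decision Network GRAFCET
    Chapter 5. Deep Multimodal Neural Network Recognition System Experiments
      5.1 Experimental Environment
      5.2 Segmentation Layer Experiment
      5.3 Multimodal Feature Experiment
      5.4 Training Sample Count and Class Count Experiment
      5.5 Incremental Learning Experiment
      5.6 Convolutional Neural Network Experiment
      5.7 Overall Comparison of Deep Neural Networks
      5.8 Hardware Circuit Synthesis and Verification
        5.8.1 Quantization Layer Module Synthesis and Verification
        5.8.2 Decision Layer Module Synthesis and Verification
    Chapter 6. Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    References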

