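Graduate student: 涂珮涓
Pei-Chuan Tu
Thesis title: 用於3D物體辨識基於視圖的注意力圖卷積監督式對比學習神經網路 (View-Based Attention Graph Convolutional Neural Network with Supervised Contrastive Learning for 3D Object Recognition)
Advisor: 葉英傑
Yin-Gjie Ye
Oral defense committee:
Degree: Master
Department: College of Management - Graduate Institute of Industrial Management
Year of publication: 2024
Academic year of graduation: 112
Language: Chinese
Number of pages: 39
Chinese keywords: 工業自動化、多視圖三維物體辨識、注意力機制、對比學習
English keywords: automated industry, multi-view 3D object recognition, attention mechanism, contrastive learning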
  • The Industrial Revolution began in the 18th and 19th centuries, when European and American countries replaced manual production with machines. It has since passed through four stages, and we are now in the fourth. This study focuses on automation, the core of the Industrial Revolution, with the goals of raising production efficiency, lowering costs, and improving quality, and it pays particular attention to machine vision systems used in manufacturing. Traditional three-dimensional object recognition methods typically rely on two-dimensional multi-view images, but they do not fully exploit the correlations among the views, and real-world shooting conditions can degrade image quality and make recognition harder. This study therefore proposes a system for recognizing three-dimensional products that combines a view-based graph convolutional neural network, extraction of important image features, and a contrastive learning training method. The specific objectives are to improve recognition performance, to better capture the key regions of each image, and to strengthen robustness under real-world conditions. To this end, the study adopts a view-based graph convolutional neural network that effectively aggregates information across multi-view images, an attention mechanism that extracts important feature information, and supervised contrastive learning to train the network and improve its generalization. These methods are discussed in detail in the subsequent chapters.
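The abstract names supervised contrastive learning as the training method but this record does not reproduce the thesis's formulation. As a rough illustration only, the sketch below computes a supervised contrastive loss in the general form described by Khosla et al. (2020): embeddings that share a class label are pulled together and all others are pushed apart. The function name, the temperature value, and the toy embeddings are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Illustrative supervised contrastive loss for one batch (needs >= 2 samples).

    features: (n, d) array of embeddings, one row per image.
    labels:   (n,) array of class labels.
    temperature: assumed scaling value, not taken from the thesis.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = len(labels)

    # L2-normalize so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # An anchor is never compared with itself.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over every other sample in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other samples carrying the same label as the anchor.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive are skipped
    if not valid.any():
        return 0.0

    # Average log-probability of the positives, negated, per anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(per_anchor.mean())


# Toy embeddings: rows 0-1 point one way, rows 2-3 another.
toy = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
loss_matching = supervised_contrastive_loss(toy, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy, [0, 1, 0, 1])
```

When the labels agree with the geometric clusters (`loss_matching`), the loss is small. When they cut across the clusters (`loss_mismatched`), it is much larger. That gradient signal is what encourages a network to group same-class views together.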

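    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Challenges
        1.3 Research Objectives
        1.4 Research Methods
    Chapter 2 Literature Review
        2.1 Neural Networks
            2.1.1 Residual Networks
            2.1.2 View-Based Graph Convolutional Neural Networks
        2.2 Attention Mechanism
        2.3 Contrastive Learning
    Chapter 3 Methodology
        3.1 View-Based Graph Convolutional Network Model
            3.1.1 ResNet18 Pre-training
            3.1.2 Graph Convolution Layer
            3.1.3 Message Passing Layer
            3.1.4 Selective View Sampling Layer
        3.2 Attention Mechanism
        3.3 Supervised Contrastive Learning
    Chapter 4 Experiments
        4.1 Datasets and Preprocessing
            4.1.1 Real Product Dataset
            4.1.2 Rendered Dataset
            4.1.3 ModelNet40 Dataset
        4.2 Experimental Setup
        4.3 Experimental Results and Analysis
            4.3.1 Experiment 1: Robustness of the Model in Recognizing Real Images
            4.3.2 Experiment 2: Performance of the Model in Learning Features from Multiple Image Types
            4.3.3 Experiment 3: Performance of the Model on a Public Dataset
    Chapter 5 Conclusion
    References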

