| Graduate Student: | 吳芷芳 (Zhi-Fang Wu) |
|---|---|
| Thesis Title: | Defect Recognition on Printed Circuit Boards Using a Deep Learning System (印刷電路板的瑕疵辨識之深度學習系統) |
| Advisor: | 曾定章 (Din-Chang Tseng) |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | Department of Computer Science & Information Engineering, College of Electrical Engineering & Computer Science |
| Year of Publication: | 2022 |
| Graduating Academic Year: | 110 (ROC calendar) |
| Language: | Chinese |
| Pages: | 71 |
| Keywords: | deep learning, classification network, defect recognition |
The printed circuit board (PCB) is known as the mother of electronic system products: it hosts a wide variety of electronic components, mediates the interaction between software and hardware, and is an essential part of most electronic products. With the rise of remote work and distance learning over the past two years, demand for smart products has grown, and faced with year-over-year growth in demand for upstream electronic components, improving product yield has become a key concern for manufacturers. Defect inspection in the traditional PCB manufacturing process relies on trained human inspectors. With the development of deep learning, manufacturers have begun adding automated optical inspection (AOI) and automated visual inspection (AVI) to their production lines, reducing production costs and improving product yield. However, automated inspection equipment can misjudge PCB defects because of factors such as color shift or shooting angle in the captured images, so its output still requires manual screening. Using an image classification network to verify whether the defect images flagged by the automated inspector are true defects can further reduce the defect miss rate.
In recent years, many network architectures combining convolutional neural networks (CNNs) with Transformers have been applied to classification tasks, aiming to retain both the feature-extraction ability of convolution and self-attention's ability to learn the relationships among features. We use the feature maps output by such an architecture as the input to a feature pyramid network (FPN); compared with a conventional CNN, this gives the FPN better feature maps for high/low-level feature fusion and classification prediction.
In this study, CoAtNet-0, an architecture combining Transformer and CNN, is adopted as the backbone network. The main modifications are: (i) adding attention modules to the convolutional blocks; (ii) adding a feature pyramid network, whose feature fusion lets high-level features stabilize low-level features, giving the model richer spatial information at prediction time and strengthening its ability to recognize small or subtle defect features. In addition, we study the performance of different attention modules in the backbone network, the effect of different high/low-level feature-fusion methods on network performance, and adjusting the class weights of the loss function to mitigate the uneven class sizes.
In the experiments, we collected 105,093 PCB images: 61,671 in the normal class and 43,422 in the defect class. The normal class is split into 55,016 training samples and 6,655 validation samples; the defect class into 38,659 training samples and 4,763 validation samples. The results show that the original CoAtNet-0 achieves a validation precision of 99.264%, recall of 99.055%, and accuracy of 99.299%; after our architectural modifications and training-parameter tuning, the final validation precision is 99.140%, recall 99.265%, and accuracy 99.334%.
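The reported precision, recall, and accuracy follow the standard binary-classification definitions over true/false positives and negatives; a minimal sketch (not from the thesis, just the usual formulas):

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Standard binary-classification metrics.
    tp/fp/fn/tn: counts of true positives, false positives,
    false negatives, and true negatives."""
    precision = tp / (tp + fp)            # of predicted defects, how many are real
    recall = tp / (tp + fn)               # of real defects, how many were caught
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy
```

In a defect-screening setting, recall is the figure that bounds the miss rate (missed defects = 1 - recall), which is why the modified model's higher recall matters despite its slightly lower precision.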
The printed circuit board (PCB) is known as the mother of electronic system products: it hosts various electronic components and is an essential part of most electronic products. In the past two years, working from home and distance learning have become increasingly common, driving up demand for smart products. Demand for PCBs is high and still growing, so improving product yield is one of the issues manufacturers focus on. Defect detection in the traditional PCB manufacturing process relies on manual inspection. With the development of deep learning, manufacturers have switched to automated optical inspection (AOI) and automated visual inspection (AVI) for defect detection, reducing production costs and improving product yield. However, automated inspection instruments can misjudge PCB defects because of factors such as chromatic aberration or shooting angle in the captured images, so their output still requires additional manual screening.
In recent years, many network architectures combining convolutional neural networks (CNNs) and Transformers have been applied to classification tasks; this type of architecture retains the advantages of both. We use the feature maps output by such a network as the input of a feature pyramid network (FPN), giving the FPN better feature maps than a CNN alone would provide.
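The core of an FPN is its top-down pathway: starting from the coarsest (most semantic) feature map, each level is upsampled and added to the next finer level, so low-level maps gain high-level context. The NumPy sketch below shows only this upsample-and-add merging; it omits the 1x1 lateral convolutions and the per-level prediction heads of a full FPN, and assumes all maps already share the same channel count:

```python
import numpy as np

def nearest_upsample2x(x):
    """Nearest-neighbour 2x upsampling for a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(features):
    """Top-down pathway of an FPN (simplified sketch).
    features: list of (C, H, W) maps ordered fine -> coarse, with
    matching channel counts and power-of-two spatial sizes."""
    merged = [features[-1]]                     # start from the coarsest map
    for f in reversed(features[:-1]):
        # upsample the previous merged map and add it to the finer one
        merged.append(f + nearest_upsample2x(merged[-1]))
    return list(reversed(merged))               # return fine -> coarse
```

Because CoAtNet's hybrid stages already mix convolutional locality with attention's global context, the maps fed into this pathway carry more semantic content than a plain CNN's, which is the advantage the abstract points to.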
In our experiments, we use CoAtNet-0, an architecture based on Transformer and CNN, as the backbone network. The modifications include: (i) adding an attention module to the depthwise convolution blocks; (ii) adding a feature pyramid network (FPN), whose multiple resolutions let the model learn more feature information and improve performance. In addition, we compare the performance of different attention modules inserted into CoAtNet-0, the impact of different feature-fusion methods on FPN performance, and different class weights in the loss function to address the imbalance in class sizes.
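One family of attention modules commonly inserted into depthwise-convolution blocks is squeeze-and-excitation style channel attention (global pooling, a bottleneck MLP, then a sigmoid gate per channel). The thesis compares several such modules; the NumPy sketch below shows only this generic SE pattern with hypothetical weight shapes, not the thesis's exact module:

```python
import numpy as np

def se_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (sketch).
    feature_map: (C, H, W); w1: (C // r, C) and w2: (C, C // r)
    form a bottleneck MLP with reduction ratio r."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gate in [0, 1]
    s = np.maximum(w1 @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Reweight each channel by its learned importance
    return feature_map * gate[:, None, None]
```

The gate rescales whole channels rather than individual pixels, so the module adds very few parameters while letting the block emphasize channels that respond to defect-like patterns.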
In the experiment, we collected 105,093 images of PCBs: 61,671 in the normal category and 43,422 in the defect category. The normal images are divided into 55,016 training samples and 6,655 validation samples; the defective images into 38,659 training samples and 4,763 validation samples. The experimental results show that on the validation set the original CoAtNet-0 achieves 99.264% precision, 99.055% recall, and 99.299% accuracy. After modifying the network architecture and tuning the training parameters, the final validation precision reaches 99.140%, recall 99.268%, and accuracy 99.334%.
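The abstract mentions adjusting per-class loss weights to counter the normal/defect imbalance but does not state the exact scheme; a common starting heuristic is inverse-frequency weighting, sketched here with the training-set sizes from the experiment (the thesis tunes the weights empirically rather than fixing them by formula):

```python
def inverse_frequency_weights(counts):
    """Per-class loss weights inversely proportional to class frequency,
    normalized so the weights average to 1. A common heuristic, not the
    thesis's exact tuned values."""
    total = sum(counts)
    raw = [total / c for c in counts]       # rarer class -> larger weight
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# Training-set sizes: 55,016 normal vs 38,659 defect samples.
weights = inverse_frequency_weights([55016, 38659])
```

Scaling each class's cross-entropy term by such a weight makes every defect sample contribute more to the gradient, compensating for the defect class being roughly 30% smaller.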