
Author: Cheng-Te Sun (孫承德)
Title: A Defect Recognition System Integrating Multi-resolution, Deformable Convolution, and Self-attention (結合多重解析度、可變形卷積、與自我注意力的瑕疵辨識系統)
Advisor: Din-Chang Tseng (曾定章)
Degree: Master
Department: Executive Master Program of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Graduation academic year: 110 (2021–2022)
Language: Chinese
Pages: 73
Keywords: deep learning, defect detection, convolutional neural network, self-attention mechanism, multi-resolution, deformable convolution
    A printed circuit board (PCB) is the supporting substrate for electronic components and is widely used in electronics products such as televisions, mobile phones, computers, and automobiles. PCBs are typically screened for quality with automated optical inspection (AOI) equipment. However, because the PCB industry demands extremely high yield, inspection is easily disturbed by manufacturing process errors, inaccurate template alignment, and overly strict optical-algorithm settings, which produce excessive false alarms. Large amounts of manual re-inspection are then needed to confirm real defects, preventing effective gains in production capacity.
    In this thesis we construct a defect recognition system based on CoAtNet, which combines a convolutional neural network (CNN) with self-attention. We add a feature pyramid network (FPN) module to build multi-resolution features, fusing low-level scale features with high-level scale features to improve recognition of small-scale defects, and we replace part of CoAtNet's convolution operations with deformable convolution network (DCN) modules that add offsets to the convolution sampling points, strengthening the model's generality under geometric transformations of the samples. We also improve the image preprocessing: i. the PCB image and the corresponding mother-board image are processed and concatenated along the channel axis, increasing the diversity of sample features and improving the network's recognition; ii. image enhancement is applied to strengthen the features of the data so that inconspicuous defects can be detected; iii. data augmentation is used to increase sample diversity and avoid model overfitting.
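The channel-concatenation preprocessing described above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation; the function name, image sizes, and the use of NumPy arrays are assumptions.

```python
import numpy as np

def concat_with_mother_board(board_img: np.ndarray, mother_img: np.ndarray) -> np.ndarray:
    """Stack an inspected-board crop with its aligned mother-board crop along
    the channel axis, so the network sees both views in a single sample.
    Both inputs are H x W x C arrays assumed to be pre-aligned."""
    assert board_img.shape == mother_img.shape, "crops must be aligned and same size"
    return np.concatenate([board_img, mother_img], axis=-1)

# Two hypothetical 224 x 224 RGB crops become one 6-channel network input.
sample = concat_with_mother_board(np.zeros((224, 224, 3)), np.ones((224, 224, 3)))
```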
    In the experiments we compare the effect of each preprocessing method on the system, analyze the benefit of the defect recognition system before and after each improvement module is added, and compare four optimizers (SGD, RMSprop, Adam, AdamW) and four learning-rate schedules (StepLR, CosineAnnealingLR, ExponentialLR, ReduceLROnPlateau). The final improved network architecture reaches a recall of 98.5642% and a precision of 98.7558% on the test set, a 1.5% improvement in recall and a 0.65% improvement in precision over the original architecture, without adding much inference time. Under a low miss rate, it effectively alleviates the production delays caused by excessive AOI false alarms.
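The learning-rate schedules compared above differ only in how they decay the rate over training. As an illustration, two of them have simple closed forms (matching PyTorch's CosineAnnealingLR and StepLR definitions); the base rate, period, and decay factor below are hypothetical values, not the thesis's settings.

```python
import math

def cosine_annealing_lr(epoch: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """CosineAnnealingLR: lr(t) = eta_min + (base_lr - eta_min) * (1 + cos(pi*t/T_max)) / 2,
    decaying smoothly from base_lr at t=0 to eta_min at t=T_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

def step_lr(epoch: int, base_lr: float, step_size: int, gamma: float = 0.1) -> float:
    """StepLR: multiply the rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Hypothetical run: base rate 0.1, cosine period 100 epochs, step decay every 10.
lr_start = cosine_annealing_lr(0, 0.1, 100)    # full base rate
lr_end = cosine_annealing_lr(100, 0.1, 100)    # annealed to eta_min
lr_step = step_lr(30, 0.1, 10)                 # three gamma decays applied
```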


    A printed circuit board (PCB) is the supporting substrate for electronic components and is widely used in electronics products such as televisions, cell phones, computers, and cars. Automated optical inspection (AOI) equipment is generally used to check whether a board is an acceptable product. However, the PCB industry has extremely high yield requirements, and inspection is easily affected by manufacturing process errors, template alignment errors, and overly strict optical-algorithm settings. This produces many false alarms and requires substantial manual effort to verify real defects, making it difficult to improve production capacity.
    In this thesis, we construct a defect recognition system based on CoAtNet, which integrates convolutional neural networks with self-attention. We add a feature pyramid network to build multi-resolution features, consolidating low-level scale features with high-level scale features to improve small-scale defect identification. We replace part of CoAtNet's convolution operations with deformable convolution network modules, which add an offset to each convolution sampling point, to enhance the generality of the model under geometric transformations of the samples. We also improve the image preprocessing: i. concatenate the PCB image with its mother-board image along the channel axis, which improves small-scale defect identification; ii. use image enhancement to strengthen the features of the data so that inconspicuous defects can be detected; iii. use data augmentation to increase sample diversity and avoid model overfitting.
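One top-down FPN merge step, as described above, consists of upsampling the coarse, semantically strong map and adding it to the finer lateral map. The sketch below uses nearest-neighbour upsampling and omits the 1x1 lateral convolutions; both are simplifying assumptions for illustration.

```python
import numpy as np

def upsample2x(feat: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an H x W feature map."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def fpn_merge(high_level: np.ndarray, low_level: np.ndarray) -> np.ndarray:
    """One FPN top-down step: upsample the coarse high-level map and add the
    finer low-level lateral map element-wise (lateral 1x1 convs omitted)."""
    return upsample2x(high_level) + low_level

# A hypothetical 4x4 high-level map merged with an 8x8 low-level map.
merged = fpn_merge(np.ones((4, 4)), np.zeros((8, 8)))
```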
    In the experiments, we compared the impact of each preprocessing method on the system, analyzed the benefit of the defect recognition system before and after each module was integrated, and tested four optimizers (SGD, RMSprop, Adam, AdamW) and four learning-rate schedules (StepLR, CosineAnnealingLR, ExponentialLR, ReduceLROnPlateau). The final improved network architecture reached a recall of 98.5642% and a precision of 98.7558%. Compared with the original architecture, recall increased by 1.5% and precision by 0.65%, without much additional inference time. Under a low miss rate, the system effectively reduces the excessive false alarms from AOI equipment and avoids the production delays they cause.
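The recall and precision figures above follow the standard definitions over confusion-matrix counts, computed as below; the counts in the example are hypothetical, not the thesis's results.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP): the fraction of flagged defects that are real.
    Recall = TP / (TP + FN): the fraction of real defects that were caught,
    i.e. a high recall means a low miss (false-negative) rate."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical test run: 985 defects caught, 12 false alarms, 15 misses.
precision, recall = precision_recall(985, 12, 15)
```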

    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgments iv
    Table of Contents v
    List of Figures vi
    List of Tables ix
    Chapter 1 Introduction 1
      1.1 Motivation 1
      1.2 System architecture 2
      1.3 System features 3
      1.4 Thesis organization 3
    Chapter 2 Related Work 4
      2.1 Related research on CNN recognition systems 4
      2.2 Lightweight convolutional neural networks 8
      2.3 Attention mechanisms for convolutional neural networks 14
    Chapter 3 PCB Defect Recognition Network Architecture 23
      3.1 CoAtNet architecture 23
      3.2 CoAtNet architecture applied to PCB 30
    Chapter 4 Experiments 37
      4.1 Equipment and environment 37
      4.2 Training the convolutional recognition network 37
      4.3 Comparison and evaluation of the convolutional recognition networks 40
    Chapter 5 Conclusions and Future Work 53
    References 54

    [1] Z. Dai, H. Liu, Q. V. Le, and M. Tan, “CoAtNet: marrying convolution and attention for all data sizes,” arXiv:2106.04803.
    [2] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. H. Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Computer Vision, Graphics, and Image Processing, vol.39, no.3, pp.355-368, 1987.
    [3] E D. Cubuk, B. Zoph, D. Mané, V. Vasudevan, and Q. V. Le, “AutoAugment: learning augmentation strategies from data,” arXiv:1805.09501v3.
    [4] E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le, “RandAugment: practical automated data augmentation with a reduced search space,” arXiv:1909.13719v2.
    [5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol.86, no.11, pp.2278-2324, Nov. 1998.
    [6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Neural Information Processing Systems (NIPS), Lake Tahoe, Nevada, Dec.3-8, 2012, pp.1097-1105.
    [7] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556.
    [8] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, MA, Jun.7-12, 2015, pp.1-9.
    [9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition ," in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, Jun.27-30, 2016, pp.770-778.
    [10] M. Lin, Q. Chen, and S. Yan, “Network in network,” in Proc. Int. Conf. Learn. Represent (ICLR), Banff, Canada, Apr.14-16, 2014, pp.274-278.
    [11] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size,” arXiv:1602.07360.
    [12] A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: efficient convolutional neural networks for mobile vision applications,” arXiv:1704.04861.
    [13] F. Chollet, “Xception: deep learning with depthwise separable convolutions,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, Jul.22-25, 2017, pp.1800-1807.
    [14] X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: an extremely efficient convolutional neural network for mobile devices,” arXiv:1707.01083.
    [15] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, Jul.22-25, 2017, pp.4700-4708.
    [16] M. Guo, T. Xu, J. Liu, Z. Liu, P. Jiang, T. Mu, S. Zhang, R. R. Martin, M. Cheng, and S. Hu, “Attention mechanisms in computer vision: a survey,” arXiv:2111.07624.
    [17] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” arXiv:1709.01507v4.
    [18] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, “Recurrent models of visual attention,” arXiv:1406.6247.
    [19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. Neural Information Processing Systems (NIPS), Long Beach, CA, Dec.4-9, 2017, pp.5998-6008.
    [20] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in Proc. Int. Conf. Learn. Represent (ICLR), Vienna, Austria, May.3-7, 2021, pp.1-21.
    [21] S. Woo, J. Park, J. Lee, and I.S. Kweon, “CBAM: convolutional block attention module,” arXiv: 1807.06521v2.
    [22] J. Li, J. Wang, Q. Tian, W. Gao, and S. Zhang, “Global-local temporal representations for video person re-identification,” in Proc. of IEEE/CVF Int. Conf. on Computer Vision (ICCV), Seoul, Korea, Oct.27-Nov.2, 2019, pp.3958-3967.
    [23] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp.1735-1780, 1997.
    [24] Y. Fu, X. Wang, Y. Wei, and T. Huang, “Sta: Spatial temporal attention for large-scale video-based person reidentification,” in Proc. of AAAI Conf. on Artificial Intelligence, Honolulu, Hawaii, Jan.27-Feb.1, 2019, vol.33, pp.8287-8294.
    [25] X. Li, W. Wang, X. Hu, and J. Yang, “Selective kernel networks,” in Proc. of IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, Jun.16- Jun.20, 2019, pp.510-519.
    [26] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.C. Chen, “MobileNetV2: inverted residuals and linear bottlenecks,” arXiv:1801.04381.
    [27] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” arXiv:1606.08415.
    [28] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” arXiv:1612.03144.
    [29] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and J. Yang, “Deformable convolutional networks,” in Proc. of IEEE/CVF Int. Conf. on Computer Vision (ICCV), Venice, Italy, Oct.22-Oct.29, 2017, pp.764-773.
    [30] X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable ConvNets v2: more deformable, better results,” in Proc. of IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, Jun.16-Jun.20, 2019, pp.9308-9316.
    [31] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden, “Pyramid methods in image processing,” RCA engineer, vol.29, no.6, pp.33-41, 1984.
    [32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg, “SSD: single shot multiBox detector,” arXiv:1512.02325.
    [33] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Kerkyra, Greece, Sep.20-Sep.25, 1999, pp.1150-1157.
    [34] Y. L. Boureau, J. Ponce, and Y. LeCun, “A theoretical analysis of feature pooling in visual recognition,” in Proc. International Conference on Machine Learning (ICML), Haifa, Israel, Jun.21-Jun.24, 2010, pp.111-118.
    [35] Q. Li, S. Jin, and J. Yan, “Mimicking very efficient network for object detection,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, Jul.22-Jul.25, 2017, pp.6356-6364.
    [36] Z. Zhang and M. R. Sabuncu, “Generalized cross entropy loss for training deep neural networks with noisy labels,” in Proc. of Neural Information Processing Systems (NIPS), Palais des Congrès de Montréal, Montréal, Dec.2-8, 2018, pp.8778-8788.
    [37] I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv:1608.03983.
    [38] D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980.
    [39] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv:1711.05101.
