| Graduate Student: | 謝芯蓉 Sin-Rong Hsieh |
|---|---|
| Thesis Title: | Synthesis of Defect Images for Electronic Components Using a Generative Adversarial Network |
| Advisor: | 曾定章 Din-Chang Tseng |
| Committee Members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering & Computer Science, Department of Computer Science & Information Engineering |
| Year of Publication: | 2022 |
| Graduation Academic Year: | 110 |
| Language: | Chinese |
| Pages: | 73 |
| Keywords: | deep learning, generative adversarial network, defect image synthesis |
In automated electronics production, combining automated optical inspection (AOI) with deep learning can replace traditional manual visual defect inspection, reducing labor costs while lowering the miss rate and increasing inspection speed. For a deep-learning system, training data is as important to performance as the algorithm itself: with insufficient data the network weights cannot be reliably determined, and the resulting network performs poorly. Collecting training data is labor-intensive, and rare defects yield only a few samples, so the training set is often imbalanced.
To make deep learning better suited to automated optical inspection, this study uses a conditional generative adversarial network (CGAN) to convert non-defective images of printed circuit boards into defective ones. By replicating existing defects to generate additional defect samples, we increase the amount of training data available to other deep-learning systems, acting as a form of image data augmentation and improving inspection performance.
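For reference, the objective optimized by pix2pix-style conditional GANs such as the one used here can be written as follows, where $x$ is the conditioning input (the non-defective image plus mask), $y$ the target defective image, and $z$ a noise vector:

```latex
\mathcal{L}_{cGAN}(G, D) =
  \mathbb{E}_{x,y}\bigl[\log D(x, y)\bigr] +
  \mathbb{E}_{x,z}\bigl[\log\bigl(1 - D(x, G(x, z))\bigr)\bigr],
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\bigl[\lVert y - G(x, z)\rVert_{1}\bigr]
```

with the generator trained as $G^{*} = \arg\min_{G}\max_{D}\,\mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)$. This is the standard formulation from the pix2pix work; the exact loss weighting used in this thesis may differ.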
The training set consists of only 111 image pairs; each pair contains a defective sample and a non-defective sample with the same content. During training, the data are augmented eightfold. We manually annotate the defect locations and render them as masks, which are fed to the generator as additional input. At test time, varying the input mask and latent vector produces corresponding variations of the defect in the image, and changing the input non-defective image transfers the defect onto a specified background.
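The abstract does not specify how the eightfold augmentation is obtained; a common assumption for this factor is the eight dihedral variants of each image (four rotations, each optionally mirrored), sketched below:

```python
import numpy as np

def augment_8x(image: np.ndarray) -> list:
    """Return the 8 dihedral variants of an H x W (x C) image array:
    rotations by 0/90/180/270 degrees, each with a mirrored copy."""
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # horizontally mirrored copy
    return variants
```

When used with paired data, the same transform must of course be applied to the defective image, the non-defective image, and the mask of each pair so they stay aligned.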
We use pix2pix as the base architecture. For convenience in practical deployment, we reduce the number of down-sampling steps in the generator to speed up inference. Generative adversarial networks usually require tens of thousands of training images and otherwise overfit easily; moreover, the two networks can become severely unbalanced during training, and our image pairs are not well aligned, so the synthesized results are often blurry. To address these problems, we raise the ratio of generator to discriminator training steps, which balances the two networks' capabilities, stabilizes training, and reduces the blur in the synthesized images. In addition, using the location information provided by the mask, we compute the loss separately for the defect region and the background, which better preserves the details of the original background and yields sharper synthesized images; this reduces the FID from 68.49 to 49.27. Computing MAE and MSE the same way reduces MAE from 5.08 to 1.44 and MSE from 57.00 to 4.94. Finally, adding a position attention module (PAM) focuses the network on generating the defect region, further reducing MAE, MSE, and FID by 0.02, 0.26, and 0.25, respectively.
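The mask-separated loss described above can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual implementation: the relative weighting of the two regions (`w_defect`, `w_background`) and the use of plain MAE are assumptions for illustration.

```python
import numpy as np

def masked_split_mae(pred, target, mask, w_defect=1.0, w_background=1.0):
    """Mean absolute error computed separately over the defect region
    (mask == 1) and the background (mask == 0), then recombined.
    Returns (combined, defect_err, background_err)."""
    mask = np.asarray(mask).astype(bool)
    defect_err = np.abs(pred[mask] - target[mask]).mean() if mask.any() else 0.0
    bg_err = np.abs(pred[~mask] - target[~mask]).mean() if (~mask).any() else 0.0
    return w_defect * defect_err + w_background * bg_err, defect_err, bg_err
```

Splitting the loss this way lets the background term penalize only deviations from the original background, which is how the details outside the defect region can be preserved more faithfully.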