| Graduate student | 王譽鈞 Yuh-Jiun Wang |
|---|---|
| Thesis title | A Deep Learning-based Algorithm for Shadow Detection and Removal from Document Images (基於深度學習之文件影像陰影偵測及去除演算法) |
| Advisor | 蘇木春 Mu-Chun Su |
| Oral examination committee | |
| Degree | Master |
| Department | Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science |
| Year of publication | 2023 |
| Academic year of graduation | 111 (ROC calendar, 2022-2023) |
| Language | Chinese |
| Number of pages | 58 |
| Keywords | Deep Learning, Shadow Detection, Shadow Removal, Conditional Generative Adversarial Network (cGAN), Optical Character Recognition (OCR) |
With the continuing development of technology, almost everyone now owns a smartphone, which is often used to take photos or to capture documents that record important information. During capture, however, the light is frequently blocked by an object, such as the photographer's hand or the phone itself, which leaves unwanted shadows in the image. These shadows not only degrade the visual quality of the photo but can also make the text hard to read. To avoid them, users must either shoot from a specific angle under well-controlled lighting or edit the image afterwards with photo-editing software, selecting the shadow regions and adjusting their brightness and hue by hand. These steps are time-consuming, and the results are often unsatisfactory. Document images are harder still, because removing the shadows must also preserve the legibility of the text.
This thesis proposes a deep learning-based algorithm that detects and removes shadows from document images. First, a conditional generative adversarial network (cGAN) is trained to locate the shadow regions in an image and produce a shadow mask. Next, the dominant background color is estimated separately for the shadow and non-shadow regions; these colors are combined with the brightness information of the input image and the shadow mask from the first stage, and a second cGAN uses this combined input to generate the shadow-free image. Experimental results show that the proposed method both removes shadows and keeps the text readable. Compared with the unprocessed input images, both the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) increase, and the accuracy of optical character recognition (OCR) is substantially improved.
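The abstract states only that a main background color is estimated for the shadow and non-shadow regions and then combined with the input's brightness and the shadow mask. The following is a hypothetical sketch of how such a second-stage input could be assembled, not the thesis's implementation. It assumes that the most frequent coarsely quantized color approximates the paper background, uses ITU-R BT.601 luma as the brightness channel, and stacks the channels in an arbitrary order. The names `dominant_color`, `luminance`, and `build_removal_input` are illustrative.

```python
from collections import Counter

Pixel = tuple[int, int, int]  # (R, G, B), each channel in 0..255


def dominant_color(pixels: list[Pixel], bin_size: int = 16) -> Pixel:
    """Estimate a region's main background colour (assumed heuristic).

    Pixels are quantized into coarse colour bins; the most populated bin is
    taken to be the paper background, since text strokes usually cover only
    a small fraction of a document. Returns the mean colour of that bin.
    """
    if not pixels:
        raise ValueError("cannot estimate a colour for an empty region")
    keyed = [(tuple(c // bin_size for c in p), p) for p in pixels]
    top_bin, _ = Counter(k for k, _ in keyed).most_common(1)[0]
    members = [p for k, p in keyed if k == top_bin]
    return tuple(round(sum(ch) / len(members)) for ch in zip(*members))


def luminance(p: Pixel) -> float:
    """Brightness in [0, 1] using ITU-R BT.601 luma weights (an assumption)."""
    r, g, b = p
    return (0.299 * r + 0.587 * g + 0.114 * b) / 255.0


def build_removal_input(image: list[list[Pixel]],
                        mask: list[list[int]]) -> list[list[list[float]]]:
    """Stack per-pixel features for a hypothetical second-stage generator.

    image: H x W grid of RGB pixels; mask: H x W grid, 1 = shadow, 0 = lit.
    Each output vector is [brightness, mask, *shadow_bg, *lit_bg] in [0, 1].
    """
    flat = [(p, m) for row, mrow in zip(image, mask) for p, m in zip(row, mrow)]
    shadow_px = [p for p, m in flat if m]
    lit_px = [p for p, m in flat if not m]
    # If one region is empty (e.g. no shadow detected), reuse the other's colour.
    shadow_bg = [c / 255.0 for c in dominant_color(shadow_px or lit_px)]
    lit_bg = [c / 255.0 for c in dominant_color(lit_px or shadow_px)]
    return [
        [[luminance(p), float(m), *shadow_bg, *lit_bg] for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]


if __name__ == "__main__":
    # Toy 2 x 3 page: a lit top row with one dark text pixel, a shadowed bottom row.
    image = [
        [(244, 244, 244), (246, 246, 246), (10, 10, 10)],
        [(120, 118, 115), (121, 119, 116), (122, 120, 117)],
    ]
    mask = [[0, 0, 0], [1, 1, 1]]
    print(dominant_color(image[0]))  # (245, 245, 245): the dark text pixel is ignored
    print(dominant_color(image[1]))  # (121, 119, 116)
    print(len(build_removal_input(image, mask)[0][0]))  # 8 channels per pixel
```

Feeding the two region colors to the generator as constant channels gives it an explicit target for how bright the paper under the shadow should become. A learned or clustering-based estimator could replace the histogram heuristic without changing the surrounding data flow.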
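The abstract reports gains in PSNR, SSIM, and OCR accuracy relative to the raw input but gives no formulas. This hedged sketch computes PSNR on 8-bit grayscale images and a character-level OCR accuracy. It defines that accuracy as 1 minus the edit distance divided by the ground-truth length, which is an assumption because the thesis's exact metric is not stated here. SSIM is omitted for brevity. The function names and the toy pixel values are illustrative.

```python
import math


def psnr(reference: list[list[float]], test: list[list[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two same-sized grayscale images."""
    ref = [v for row in reference for v in row]
    out = [v for row in test for v in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_text: str) -> float:
    """1 - edit_distance / len(ground_truth), clipped at 0 (assumed definition)."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return max(0.0, 1.0 - levenshtein(ground_truth, ocr_text) / len(ground_truth))


if __name__ == "__main__":
    # Hypothetical 2 x 2 patches: ground truth, shadowed input, restored output.
    gt = [[200, 200], [200, 200]]
    shadowed = [[120, 120], [200, 200]]
    restored = [[195, 198], [200, 200]]
    print(f"input PSNR:    {psnr(gt, shadowed):.2f} dB")
    print(f"restored PSNR: {psnr(gt, restored):.2f} dB")
    print(char_accuracy("shadow removal", "shad0w rem0val"))  # 12/14
```

Both input and output are scored against the same shadow-free ground truth, so an improvement appears as a higher PSNR for the restored image than for the shadowed input. The OCR accuracy is computed the same way, on text recognized from each image.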