| Graduate Student: | 謝鎧楠 Kai-Nan Hsieh |
|---|---|
| Thesis Title: | 使用深度與彩色影像的卷積神經網路做倒車障礙物偵測 (Rear Obstacle Detection Using a Deep Convolutional Neural Network with RGB-D Images) |
| Advisor: | 曾定章 |
| Oral Defense Committee: | |
| Degree: | Master (碩士) |
| Department: | 資訊電機學院 — Department of Computer Science & Information Engineering |
| Year of Publication: | 2018 |
| Academic Year of Graduation: | 106 |
| Language: | Chinese |
| Pages: | 75 |
| Chinese Keywords: | 卷積神經網路 (convolutional neural networks), 深度與彩色影像 (RGB-D images), 障礙物偵測 (obstacle detection) |
| English Keywords: | Rear Obstacle Detection |
As automobiles have become a means of transportation that people depend on, many vehicle-related accidents have followed. Collisions while reversing, caused by the driver failing to notice the situation behind the vehicle, are among the most frequent. To reduce such accidents, computer vision detection and recognition techniques can be used to understand the scene behind the vehicle and alert the driver to rear-end safety. Recent advances in convolutional neural networks (CNNs) have made computer-vision detection and recognition more accurate and stable than before. We use deep learning to train a vision system that finds objects that could pose a danger while reversing, and we use depth information to measure the distance between each object and the vehicle, so that the system can judge whether a collision is likely and warn the driver. The depth information, acquired with a 3D camera, serves as auxiliary evidence for identifying whether a solid object that could cause a reversing accident is present in the image.

Because the color camera module and depth camera module of the Kinect 3D camera differ in position and field of view (FOV), we first align the captured color and depth images using the Kinect SDK, so that the bounding boxes drawn when preparing training data do not shift too much between the two images and introduce training error. After the training data are prepared, we modify the input stage of Faster R-CNN (Faster Regions with Convolutional Neural Networks) so that the network accepts four-channel RGB-D input combining the depth and color images. Our experiments compare obstacle detection using different inputs (color images, depth images, and four-channel RGB-D images) and two network architectures for extracting features from the color and depth images. After an obstacle is found, we compute its distance from the vehicle using the depth image. Our final results show that the most effective feature-extraction scheme for RGB-D input extracts feature maps from the color image and the depth image through separate convolutional layers, concatenates the two resulting feature maps, and feeds the concatenated result to the fully connected layers for final detection and recognition.
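The fusion scheme that the abstract reports as best (separate convolutional branches for the color and depth images, with channel-wise concatenation of their feature maps before the fully connected layers) can be sketched as follows. This is a minimal illustration with plain Python lists standing in for the network's feature tensors; the function name and toy channel counts are hypothetical, not from the thesis:

```python
def concat_channels(rgb_feat, depth_feat):
    # Feature maps represented as lists of 2-D channel grids: feat[c][y][x].
    # Channel-wise concatenation simply stacks the two channel lists, so an
    # N-channel RGB map and an M-channel depth map fuse into an (N+M)-channel
    # map with unchanged spatial size, which then feeds the shared classifier.
    return rgb_feat + depth_feat

rgb = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(2)]  # 2 channels, 2x2 spatial
dep = [[[0.5, 0.6], [0.7, 0.8]] for _ in range(3)]  # 3 channels, 2x2 spatial
fused = concat_channels(rgb, dep)
print(len(fused), len(fused[0]), len(fused[0][0]))  # → 5 2 2
```

In a real network the two branches would each be a convolutional stack (e.g. the Faster R-CNN backbone) and the concatenation would happen along the channel axis of the feature tensors; only the stacking operation is shown here.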
Since the automobile became the most popular means of transportation in daily life, car accidents have occurred frequently, costing lives and property through driver negligence. Many automobile manufacturers have therefore invested in developing driving-assistance systems to improve driving safety. Computer vision (CV) has been adopted for its ability to detect and recognize objects, and in recent years convolutional neural networks (CNNs) have developed dramatically, making computer vision much more reliable.

We train our rear-obstacle detection and recognition system with a deep learning model, using color and depth images captured by a Microsoft Kinect v2. Because the fields of view (FOV) of the Kinect v2's color and depth cameras differ, we calibrate the color and depth images using the Kinect SDK to reduce the disparity in pixel positions. Our detection and recognition system is based on Faster R-CNN. The input consists of the two images, and we experiment with two convolutional network architectures for extracting feature maps from them: one with a single feature extractor and a single classifier, and one with two feature extractors and a single classifier. The two-extractor architecture produces the best detection results. We also run experiments using only the color image or only the depth image as input and compare them with the two methods above. Finally, after detecting an obstacle, we use the depth image to estimate the distance between the vehicle and the obstacle.
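The final step above, estimating the vehicle-to-obstacle distance from the depth image, can be sketched as taking a robust statistic of the depth readings inside the detected bounding box. This is a minimal assumption-laden sketch, not the thesis's exact method: the function name, the (left, top, right, bottom) box convention, and the choice of the median over zero-filtered pixels (Kinect reports 0 for pixels with no valid depth) are all illustrative:

```python
from statistics import median

def obstacle_distance_mm(depth_image, box):
    """Estimate obstacle distance as the median valid depth inside a box.

    depth_image: 2-D list of depth readings in millimetres, where 0 marks
    a pixel without a valid measurement (as the Kinect depth stream does).
    box: hypothetical (left, top, right, bottom) pixel coordinates.
    """
    x1, y1, x2, y2 = box
    readings = [depth_image[y][x]
                for y in range(y1, y2)
                for x in range(x1, x2)
                if depth_image[y][x] > 0]
    # The median ignores outliers such as background pixels bleeding
    # into the box; return None when the box has no valid readings.
    return median(readings) if readings else None

depth = [[0, 1500, 1510],
         [1490, 1500, 0],
         [0, 0, 8000]]
print(obstacle_distance_mm(depth, (0, 0, 3, 2)))  # → 1500.0
```

Using the median rather than the mean keeps a single far-background pixel (like the 8000 mm reading outside the box here) from skewing the warning distance.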