| Graduate Student: | 謝鎧楠 Kai-Nan Hsieh |
|---|---|
| Thesis Title: | 使用深度與彩色影像的卷積神經網路做倒車障礙物偵測 (Rear Obstacle Detection Using a Deep Convolutional Neural Network with RGB-D Images) |
| Advisor: | 曾定章 |
| Oral Defense Committee: | |
| Degree: | Master (碩士) |
| Department: | 資訊電機學院 — Department of Computer Science & Information Engineering |
| Year of Publication: | 2018 |
| Academic Year of Graduation: | 106 |
| Language: | Chinese |
| Pages: | 75 |
| Chinese Keywords: | 卷積神經網路 (convolutional neural networks), 深度與彩色影像 (RGB-D images), 障礙物偵測 (obstacle detection) |
| English Keywords: | Rear Obstacle Detection |
As automobiles have become a means of transportation that people depend on, many vehicle-related accidents have followed. Collisions while reversing, caused by the driver failing to notice the situation behind the vehicle, are among the most frequent. To reduce such accidents, computer vision detection and recognition techniques can be used to understand the scene behind the vehicle and alert the driver to rear-end safety. Recent advances in convolutional neural networks (CNNs) have made computer-vision detection and recognition more accurate and stable than before. We use deep learning to train a vision system that finds objects that could pose a danger while reversing, and we use depth information to measure the distance between each object and the vehicle, so that the system can judge whether a collision is likely and warn the driver. The depth information, acquired with a 3D camera, serves as auxiliary evidence for identifying whether a solid object that could cause a reversing accident is present in the image.

Because the color camera module and depth camera module of the Kinect 3D camera differ in position and field of view (FOV), we first align the captured color and depth images using the Kinect SDK, so that the bounding boxes drawn when preparing training data do not shift too much between the two images and introduce training error. After the training data are prepared, we modify the input stage of Faster R-CNN (Faster Regions with Convolutional Neural Networks) so that the network accepts four-channel RGB-D input combining the depth and color images. Our experiments compare obstacle detection using different inputs (color images, depth images, and four-channel RGB-D images) and two network architectures for extracting features from the color and depth images. After an obstacle is found, we compute its distance from the vehicle using the depth image. Our final results show that the most effective feature-extraction scheme for RGB-D input extracts feature maps from the color image and the depth image through separate convolutional layers, concatenates the two resulting feature maps, and feeds the concatenated result to the fully connected layers for final detection and recognition.
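The fusion scheme that the abstract reports as best (separate convolutional branches for the color and depth images, with channel-wise concatenation of their feature maps before the fully connected layers) can be sketched as follows. This is a minimal illustration with plain Python lists standing in for the network's feature tensors; the function name and toy channel counts are hypothetical, not from the thesis:

```python
def concat_channels(rgb_feat, depth_feat):
    # Feature maps represented as lists of 2-D channel grids: feat[c][y][x].
    # Channel-wise concatenation simply stacks the two channel lists, so an
    # N-channel RGB map and an M-channel depth map fuse into an (N+M)-channel
    # map with unchanged spatial size, which then feeds the shared classifier.
    return rgb_feat + depth_feat

rgb = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(2)]  # 2 channels, 2x2 spatial
dep = [[[0.5, 0.6], [0.7, 0.8]] for _ in range(3)]  # 3 channels, 2x2 spatial
fused = concat_channels(rgb, dep)
print(len(fused), len(fused[0]), len(fused[0][0]))  # → 5 2 2
```

In a real network the two branches would each be a convolutional stack (e.g. the Faster R-CNN backbone) and the concatenation would happen along the channel axis of the feature tensors; only the stacking operation is shown here.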
Since the automobile became the most popular means of transportation in daily life, car accidents have occurred frequently, costing lives and property through driver negligence. Many automobile manufacturers have therefore invested in developing driving-assistance systems to improve driving safety. Computer vision (CV) has been adopted for its ability to detect and recognize objects, and in recent years convolutional neural networks (CNNs) have developed dramatically, making computer vision much more reliable.

We train our rear-obstacle detection and recognition system with a deep learning model, using color and depth images captured by a Microsoft Kinect v2. Because the fields of view (FOV) of the Kinect v2's color and depth cameras differ, we calibrate the color and depth images using the Kinect SDK to reduce the disparity in pixel positions. Our detection and recognition system is based on Faster R-CNN. The input consists of the two images, and we experiment with two convolutional network architectures for extracting feature maps from them: one with a single feature extractor and a single classifier, and one with two feature extractors and a single classifier. The two-extractor architecture produces the best detection results. We also run experiments using only the color image or only the depth image as input and compare them with the two methods above. Finally, after detecting an obstacle, we use the depth image to estimate the distance between the vehicle and the obstacle.
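The final step above, estimating the vehicle-to-obstacle distance from the depth image, can be sketched as taking a robust statistic of the depth readings inside the detected bounding box. This is a minimal assumption-laden sketch, not the thesis's exact method: the function name, the (left, top, right, bottom) box convention, and the choice of the median over zero-filtered pixels (Kinect reports 0 for pixels with no valid depth) are all illustrative:

```python
from statistics import median

def obstacle_distance_mm(depth_image, box):
    """Estimate obstacle distance as the median valid depth inside a box.

    depth_image: 2-D list of depth readings in millimetres, where 0 marks
    a pixel without a valid measurement (as the Kinect depth stream does).
    box: hypothetical (left, top, right, bottom) pixel coordinates.
    """
    x1, y1, x2, y2 = box
    readings = [depth_image[y][x]
                for y in range(y1, y2)
                for x in range(x1, x2)
                if depth_image[y][x] > 0]
    # The median ignores outliers such as background pixels bleeding
    # into the box; return None when the box has no valid readings.
    return median(readings) if readings else None

depth = [[0, 1500, 1510],
         [1490, 1500, 0],
         [0, 0, 8000]]
print(obstacle_distance_mm(depth, (0, 0, 3, 2)))  # → 1500.0
```

Using the median rather than the mean keeps a single far-background pixel (like the 8000 mm reading outside the box here) from skewing the warning distance.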