
Author: 張仲良 (Chang-Liang Chung)
Thesis Title: Prototype Design of Smart Helmets
Advisor: 孫敏德 (Min-Te Sun)
Committee Members: (not listed)
Degree: Master's
Department: Executive Master Program of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2022
Graduation Academic Year: 110 (2021–22)
Language: Chinese
Number of Pages: 76
Chinese Keywords: Object Detection, Object Recognition, Traffic Sign Recognition System, Deep Learning
Foreign Keywords: YOLO, Object detection, Traffic Sign Recognition, Deep learning
    Advanced Driver Assistance Systems (ADAS) have been used primarily in 4-wheeled automobiles. An essential part of ADAS is traffic sign recognition, which identifies important road signs to warn the driver of road regulations or other matters to be aware of. Unfortunately, a mature ADAS for 2-wheeled motorcycles is still lacking. In this research, we intend to build a lightweight ADAS for motorcyclists that recognizes an important portion of the road signs, namely speed limit posts and speed limit ground markings. Two lightweight versions of the YOLOv4 model, YOLOv4-tiny and YOLOv4-tiny-3l, are tuned by transfer learning to recognize speed limit signs. To ensure the model is suited for an embedded device (i.e., to be mounted on a motorcyclist's helmet), model pruning is applied to improve efficiency. Finally, the models are deployed on an NVIDIA Jetson Nano and accelerated with TensorRT to assess their performance. The experimental results indicate that one of the models achieves 96.19% mAP@0.50 at 27.72 FPS.
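The mAP@0.50 figure reported above is the mean average precision with predictions counted as correct when they overlap a ground-truth box at IoU ≥ 0.50. The following is a minimal single-class sketch of that computation, not the thesis's exact evaluation code: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and AP is approximated as the area under the raw precision–recall curve rather than the interpolated PASCAL VOC variant.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (confidence, box); gts: list of boxes (one class).
    Each ground-truth box may be matched by at most one prediction."""
    preds = sorted(preds, key=lambda p: -p[0])  # highest confidence first
    matched, tp_fp = set(), []
    for conf, box in preds:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            o = iou(box, g)
            if o > best and j not in matched:
                best, best_j = o, j
        if best >= iou_thr:
            matched.add(best_j)
            tp_fp.append(1)  # true positive
        else:
            tp_fp.append(0)  # false positive
    # Accumulate precision/recall and integrate the PR curve.
    ap, cum_tp, prev_recall = 0.0, 0, 0.0
    for i, t in enumerate(tp_fp, start=1):
        cum_tp += t
        recall = cum_tp / len(gts)
        precision = cum_tp / i
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

mAP@0.50 is then the mean of `average_precision` over all classes; the thesis's two classes would be speed limit posts and speed limit ground markings.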

    1 Introduction
    2 Related Work
      2.1 Traditional Detectors
      2.2 Object Detection with Convolutional Neural Networks
      2.3 Traffic Sign Classification
    3 Preliminary
      3.1 Artificial Neural Network
        3.1.1 Convolutional Neural Networks
      3.2 Techniques to Reduce Overfitting
        3.2.1 Data Augmentation
        3.2.2 Early Stopping
        3.2.3 Dropout and DropBlock
      3.3 Techniques to Reduce Model Size
        3.3.1 L1 and L2 Regularization
        3.3.2 Model Pruning
        3.3.3 YOLOv4-tiny
      3.4 Transfer Learning
      3.5 Model Evaluation Metrics
        3.5.1 Confusion Matrix
      3.6 Darknet
      3.7 NVIDIA Jetson Nano
        3.7.1 JetPack
        3.7.2 Raspberry Pi NoIR Camera Board V2
    4 Design
      4.1 Data Collection
        4.1.1 Videos from Motorcycle Dash Camera
        4.1.2 Open Data
      4.2 Data Preprocessing
        4.2.1 Data Augmentation
        4.2.2 Data Annotations
      4.3 Model Construction
        4.3.1 Transfer Learning
        4.3.2 Model Pruning and Model Deployment
    5 Experiment
      5.1 Experiment Configuration
      5.2 Evaluation Metrics
        5.2.1 Mean Average Precision
        5.2.2 Frames Per Second
      5.3 Base Training Results from Darknet
      5.4 Pruning Results
      5.5 Final Model Tuning
      5.6 Model Deployment on NVIDIA Jetson Nano
      5.7 Best Model Selection
    6 Conclusions
    References
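The outline pairs L1 regularization (3.3.1) with model pruning (3.3.2). A common recipe for channel pruning, in the style of network slimming, is to L1-regularize the batch-normalization scale factors during training and then drop the channels whose scales fall below a global percentile. The sketch below shows only the channel-selection step, with hypothetical scale-factor values; the thesis's exact pruning procedure may differ.

```python
def select_channels_to_prune(bn_gammas, prune_ratio):
    """bn_gammas: one list of batch-norm scale factors per conv layer.
    Returns a per-layer keep-mask dropping the globally smallest |gamma|s."""
    all_g = sorted(abs(v) for g in bn_gammas for v in g)
    n_drop = int(len(all_g) * prune_ratio)
    # The threshold is the largest scale factor among those to be dropped.
    threshold = all_g[n_drop - 1] if n_drop > 0 else float("-inf")
    return [[abs(v) > threshold for v in g] for g in bn_gammas]

# Hypothetical scale factors for two conv layers; prune 40% of channels.
gammas = [[0.9, 0.01, 0.5], [0.02, 0.8]]
masks = select_channels_to_prune(gammas, prune_ratio=0.4)
```

After the mask is computed, the corresponding filters (and the matching input channels of the next layer) are removed and the slimmed network is fine-tuned to recover accuracy.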

