
Student: Hsiao-Chu Cheng (城筱筑)
Title: Road Condition Analysis, Obstacle Recognition, and Distance Measurement for the Visually Impaired Based on AI Technology
Advisor: Wen-June Wang (王文俊)
Committee Members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2021
Academic Year of Graduation: 109 (ROC calendar)
Language: Chinese
Number of Pages: 91
Chinese Keywords: wearable device, assistive device for the visually impaired, semantic segmentation, object detection
Foreign Keywords: visually impaired people, walking guide
Abstract:
    This thesis proposes a wearable device that helps visually impaired people walk safely outdoors where sidewalks and crosswalks are present. The device recognizes sidewalks, crosswalks, and common obstacles ahead, and a positioning module provides outdoor navigation, so that the user can be guided to a destination with short, simple voice prompts. The study consists of four parts. The first part is the design of the wearable guide device and the integration of its functions, covering the system architecture and hardware, inter-process communication, and the mechanism for switching among functions. In the second part, a semantic segmentation network and an object detection network are used to recognize outdoor road conditions and common obstacles, respectively. The third part covers walking route planning and the content of the guiding voice prompts: the device guides the user by continuously checking whether the direction toward the next path-planning point agrees with the direction reported by the orientation sensor. The fourth part covers obstacle distance estimation and the priority of voice announcements: combining the depth map returned by the RGB-D camera with the object detection output, the device computes the distance between each obstacle and the user and announces the obstacle type that most urgently needs to be avoided. In an experiment with visually impaired participants, the proposed method correctly guided them along sidewalks and crosswalks, avoided obstacles appropriately, and brought them safely to the set destination.
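    The guiding mechanism of the third part, comparing the bearing toward the next path-planning point against the heading reported by the orientation sensor, can be sketched as below. This is a minimal illustrative sketch and not code from the thesis: the function names, the 15-degree tolerance, and the returned instructions are assumptions for the example.

    ```python
    import math

    def bearing_deg(lat1, lon1, lat2, lon2):
        """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        x = math.sin(dlon) * math.cos(phi2)
        y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
        return math.degrees(math.atan2(x, y)) % 360.0

    def guidance(heading_deg, user_pos, node_pos, tolerance=15.0):
        """Compare the compass heading with the bearing toward the next path node
        and return a simple walking instruction."""
        target = bearing_deg(*user_pos, *node_pos)
        # Signed heading error, normalized into (-180, 180]: positive means the
        # target lies to the user's right.
        diff = (target - heading_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= tolerance:
            return "straight"
        return "turn right" if diff > 0 else "turn left"
    ```

    Repeating this comparison at each node update keeps the user aligned with the planned route; a larger tolerance trades fewer corrections for more drift.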
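    The fourth part's distance estimation and announcement priority can likewise be sketched: take the depth map inside each detected bounding box, reduce it to one distance, and announce the highest-priority obstacle within range. This is an illustrative assumption, not the thesis implementation: the class priority table, the median-depth aggregation, and the 5 m announcement range are invented for the example.

    ```python
    import numpy as np

    # Illustrative priority table: lower number = announced first.
    # The classes and their ordering are assumptions, not taken from the thesis.
    PRIORITY = {"car": 0, "motorcycle": 1, "person": 2, "pole": 3}

    def obstacle_distance(depth_map, box):
        """Median depth (in meters) inside a detection box, ignoring invalid pixels."""
        x1, y1, x2, y2 = box
        region = depth_map[y1:y2, x1:x2]
        valid = region[np.isfinite(region) & (region > 0)]
        return float(np.median(valid)) if valid.size else float("inf")

    def announce(detections, depth_map, max_range=5.0):
        """Return a spoken message for the highest-priority obstacle within range,
        or None if nothing is close enough. detections: list of (class, box)."""
        in_range = [(PRIORITY.get(cls, 99), d, cls)
                    for cls, box in detections
                    if (d := obstacle_distance(depth_map, box)) <= max_range]
        if not in_range:
            return None
        _, dist, cls = min(in_range)  # lowest priority number wins; ties go to the nearer one
        return f"{cls} ahead, {dist:.1f} meters"
    ```

    Using the median rather than the minimum depth makes the estimate robust to stray invalid pixels inside the box, at the cost of slightly overestimating the distance to thin objects.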

    Table of Contents:
    Abstract (Chinese); Abstract (English); Acknowledgments; List of Figures; List of Tables
    Chapter 1. Introduction: 1.1 Research Motivation and Background; 1.2 Literature Review; 1.3 Thesis Objectives; 1.4 Thesis Organization
    Chapter 2. System Architecture and Hardware: 2.1 System Architecture and Communication Format; 2.2 Integration Mechanism Design
    Chapter 3. Semantic Segmentation Network: 3.1 Network Architecture; 3.2 Training Data
    Chapter 4. Object Detection Network: 4.1 Network Architecture (4.1.1 YOLOv3-spp; 4.1.2 YOLOv5s); 4.2 Training Data
    Chapter 5. Route Planning and Orientation Mechanism for Guiding: 5.1 Orientation and Localization (5.1.1 Comparison of Localization Methods; 5.1.2 Comparison of Orientation Methods); 5.2 Path Planning (5.2.1 Google Maps API; 5.2.2 Path Node Definition and Parsing); 5.3 Node Update Mechanism; 5.4 Orientation Correction Mechanism
    Chapter 6. Obstacle Distance Estimation and Voice Announcement Priority: 6.1 Obstacle Depth Estimation Method; 6.2 Obstacle Announcement and Priority Order
    Chapter 7. Experimental Results: 7.1 Semantic Segmentation; 7.2 Object Detection; 7.3 Navigation System and Orientation Mechanism (7.3.1 Localization, Orientation, and Route Planning; 7.3.2 Node Update and Orientation Correction); 7.4 Obstacle Distance Estimation and Announcement Priority
    Chapter 8. Conclusion and Future Work: 8.1 Conclusion; 8.2 Future Work
    References
    Publications

    References:
    [1] World Health Organization, "World report on vision." [Online]. Available: https://www.who.int/docs/default-source/documents/publications/world-vision-report-accessible.pdf?sfvrsn=223f9bf7_2.
    [2] D. M. Brouwer, G. Sadlo, K. Winding, and M. I. G. Hanneman, "Limitations in mobility: experiences of visually impaired older people," British Journal of Occupational Therapy, vol. 71, no. 10, pp. 414-421, Oct. 2008.
    [3] Y. Shiizu, Y. Hirahara, K. Yanashima, and K. Magatani, "The development of a white cane which navigates the visually impaired," in 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2007), 2007, pp. 5005-5008.
    [4] L. Whitmarsh, "The benefits of guide dog ownership," Visual Impairment Research, vol. 7, no. 1, pp. 27-42, 2005.
    [5] 汪孟璇, "Road Information Recognition Guide System for the Blind Based on Deep Learning," Master's thesis, Department of Electrical Engineering, National Central University, Taoyuan City, 2020.
    [6] 沈鴻儒, "Road Obstacle Detection and Walking Assistance Technology for the Blind Based on Deep Learning," Master's thesis, Department of Electrical Engineering, National Central University, Taoyuan City, 2020.
    [7] 謝易軒, "Obstacle Avoidance, Convenience Store Recognition, and Guidance for the Visually Impaired Based on AI Technology," Master's thesis, Department of Electrical Engineering, National Central University, Taoyuan City, 2021.
    [8] R. P. Poudel, S. Liwicki, and R. Cipolla, "Fast-Scnn: Fast semantic segmentation network," arXiv preprint arXiv:1902.04502, 2019.
    [9] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
    [10] M. M. Islam, M. S. Sadi, K. Z. Zamli, and M. M. Ahmed, "Developing walking assistants for visually impaired people: A review," IEEE Sensors Journal, vol. 19, no. 8, pp. 2814–2828, Apr. 2019.
    [11] Y. Lin, K. Wang, W. Yi, and S. Lian, "Deep learning based wearable assistive system for visually impaired people," 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 2549-2557, doi: 10.1109/ICCVW.2019.00312.
    [12] K. Yang, L. M. Bergasa, E. Romera, R. Cheng, T. Chen, and K. Wang, "Unifying terrain awareness through real-time semantic segmentation," IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 2018, pp. 1033-1038.
    [13] Z. Cao, X. Xu, B. Hu, and M. Zhou, "Rapid detection of blind roads and crosswalks by using a lightweight semantic segmentation network," in IEEE Transactions on Intelligent Transportation Systems, May 2020, pp. 1-10, doi: 10.1109/TITS.2020.2989129.
    [14] R. Tapu, B. Mocanu, and T. Zaharia, "DEEP-SEE: Joint object detection, tracking and recognition with application to visually impaired navigational assistance," Sensors, vol. 17, no. 11, p. 2473, Oct. 2017.
    [15] K. Yang, R. Cheng, L. M. Bergasa, E. Romera, K. Wang, and N. Long, "Intersection perception through real-time semantic segmentation to assist navigation of visually impaired pedestrians," 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), Kuala Lumpur, Malaysia, 2018, pp. 1034-1039.
    [16] S. Lin, R. Cheng, K. Wang, and K. Yang, "Visual localizer: Outdoor localization based on ConvNet descriptor and global optimization for visually impaired pedestrians," Sensors, vol. 18, no. 8, p. 2476, Jul. 2018.
    [17] W. -J. Chang, L. -B. Chen, C. -Y. Sie, and C. -H. Yang, "An artificial intelligence edge computing-based assistive system for visually impaired pedestrian safety at zebra crossings," in IEEE Transactions on Consumer Electronics, vol. 67, no. 1, pp. 3-11, Feb. 2021, doi: 10.1109/TCE.2020.3037065.
    [18] A. Aladrén, G. López-Nicolás, L. Puig, and J. J. Guerrero, "Navigation assistance for the visually impaired using RGB-D sensor with range expansion," IEEE Systems Journal, vol. 10, no. 3, pp. 922-932, Sept. 2016.
    [19] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, "Enet: A deep neural network architecture for real-time semantic segmentation," arXiv preprint arXiv:1606.02147, 2016.
    [20] H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, "ICNet for real-time semantic segmentation on high-resolution images," European Conference on Computer Vision (ECCV), Munich, Germany, 2018, pp. 405-420.
    [21] E. Romera, J. M. Álvarez, L. M. Bergasa, and R. Arroyo, "ERFNet: Efficient residual factorized convnet for real-time semantic segmentation," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 1, pp. 263-272, 2018.
    [22] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, "BiSeNet: Bilateral segmentation network for real-time semantic segmentation," European Conference on Computer Vision (ECCV), Munich, Germany, 2018, pp. 325-341.
    [23] J. Zhuang, J. Yang, L. Gu, and N. C. Dvornek, "ShelfNet for fast semantic segmentation," arXiv preprint arXiv:1811.11254, 2018.
    [24] W. Chen, X. Gong, X. Liu, Q. Zhang, Y. Li, and Z. Wang, "FasterSeg: Searching for faster real-time semantic segmentation," arXiv preprint arXiv:1912.10917, 2019.
    [25] W. Liu et al., "SSD: Single Shot MultiBox Detector," arXiv preprint arXiv:1512.02325, 2015.
    [26] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
    [27] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," arXiv preprint arXiv:1612.08242, 2016.
    [28] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 42, no. 02, pp. 318-327, Feb. 2020.
    [29] M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and efficient object detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10778-10787.
    [30] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
    [31] T.-Y. Lin, et al., "Microsoft COCO: Common objects in context," arXiv preprint arXiv:1405.0312, 2014.
    [32] G. Jocher, et al., "yolov5: v5.0," 2020. [Online]. Available: https://github.com/ultralytics/yolov5.
    [33] "Jetson AGX Xavier," [Online]. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier/.
    [34] "ZED 2," [Online]. Available: https://www.stereolabs.com/zed-2/.
    [35] "SIM-7600CE DC5V 4G/GNSS," [Online]. Available: https://shop.cpu.com.tw/product/57574/info/.
    [36] "HAGiBiS USB aluminum alloy external sound card, 3-port international version," [Online]. Available: https://24h.pchome.com.tw/prod/DCADL6-A9009A36N.
    [37] "Desire Power V8 14.8V 5200mAh 35C-70C 4S lithium polymer battery," [Online]. Available: https://reurl.cc/0jXzY6.
    [38] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
    [39] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, 2018, pp. 4510-4520.
    [40] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017, pp. 2881-2890.
    [41] A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse, "The reversible residual network: Backpropagation without storing activations," arXiv preprint arXiv:1707.04585, 2017.
    [42] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," arXiv preprint arXiv:1406.4729, 2014.
    [43] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," arXiv preprint arXiv:1803.01534, 2018.
    [44] "labelImg," [Online]. Available: https://github.com/tzutalin/labelImg.
    [45] "World Geodetic System," [Online]. Available: https://en.wikipedia.org/wiki/World_Geodetic_System.
