Graduate student: Chien-Fang Chiu (邱千芳)
Thesis title: Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks (Chinese title: 基於時空域摺積神經網路之抽菸動作辨識)
Advisor: Pao-Chi Chang (張寶基)
Oral defense committee:
Degree: Master
Department: Department of Communication Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar, i.e., 2017–2018)
Language: Chinese
Number of pages: 59
Keywords: Smoking action recognition, Video classification, Convolutional neural networks, Deep learning
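    Many countries and regions around the world have banned smoking entirely in indoor public places and workplaces, and Taiwan is no exception. Yet people are still often seen smoking at hospital entrances and in corners of school campuses. Even non-smokers inhale smoke when they stand next to a smoker; this is called secondhand smoke. Secondhand smoke is very harmful: besides raising the risk of diseases such as cancer, heart disease, stroke, and respiratory disease, it may also impair brain function. We aim to use deep learning techniques to recognize, and thereby catch, people who smoke illegally.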
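    This study, "Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks," proposes a system for recognizing smoking actions. Performance is improved through data balancing and data augmentation. The system combines the convolutional neural network GoogLeNet with the video-segmentation architecture of Temporal Segment Networks (TSN), forming a spatial-domain CNN that also captures temporal structure (the "spatial-temporal" network of the title). On the original HMDB51 smoking videos the system achieves 100% accuracy; after adding everyday smoking videos from ActivityNet (HMDB51 + ActivityNet smoking) it reaches 99.16%. On selected movie smoking clips from the AVA dataset it still reaches 91.667%, showing that it can reliably distinguish smoking videos.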
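The segment-based architecture described above follows the general idea of Temporal Segment Networks: a video is split into K equal-length segments, one snippet is sampled from each segment, a single shared 2D CNN scores every snippet, and the per-segment class scores are fused by a consensus function before the final decision. The Python sketch below illustrates that idea only under stated assumptions: K = 3, one frame per snippet, average consensus, and centre-frame sampling at test time are illustrative choices, not necessarily the thesis's settings, and `score_frame` / `toy_scorer` are hypothetical placeholders for the trained GoogLeNet.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def sample_segment_indices(num_frames: int, num_segments: int = 3,
                           train: bool = True,
                           rng: Optional[random.Random] = None) -> List[int]:
    """Split frames [0, num_frames) into num_segments equal spans; pick one frame per span."""
    if num_frames < 1 or num_segments < 1:
        raise ValueError("num_frames and num_segments must be positive")
    if rng is None:
        rng = random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        # Exclusive end; guarantee at least one frame even for very short videos.
        end = max(start + 1, (k + 1) * num_frames // num_segments)
        if train:
            indices.append(rng.randrange(start, end))  # random frame inside the span
        else:
            indices.append((start + end - 1) // 2)      # deterministic centre frame
    return indices


def segmental_consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average consensus (assumed): mean of each class score over all segments."""
    n = len(segment_scores)
    return [sum(column) / n for column in zip(*segment_scores)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_video(num_frames: int,
                   score_frame: Callable[[int], Sequence[float]],
                   num_segments: int = 3) -> Tuple[int, List[float]]:
    """Score one frame per segment with the shared CNN, fuse, and return (label, probabilities)."""
    indices = sample_segment_indices(num_frames, num_segments, train=False)
    fused = segmental_consensus([score_frame(i) for i in indices])
    probs = softmax(fused)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


def toy_scorer(frame_index: int) -> List[float]:
    """Hypothetical stand-in for GoogLeNet: [non-smoking score, smoking score]."""
    return [0.5, 2.0] if frame_index > 40 else [1.0, 0.0]


if __name__ == "__main__":
    label, probs = classify_video(90, toy_scorer)
    print(label)  # → 1 (smoking)
```

Fusing scores across segments lets a single 2D network judge the whole clip, so a smoking gesture that appears in only part of the video can still tip the video-level decision.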


    Cigarette smoking increases the risk of death from all causes in both men and women. A person standing next to a smoker still inhales the smoke, which is called passive smoking. Consequently, smoking is prohibited in many enclosed public areas such as government buildings, educational facilities, hospitals, enclosed sports facilities, and buses. However, people still often smoke even where it is strictly prohibited, such as in hospitals and on elementary school campuses. The objective of this work is to develop a smoking action recognition system based on deep learning that allows smoking behavior to be discovered quickly.
    In this work, we propose a system that recognizes smoking actions. It applies data balancing and data augmentation and builds on GoogLeNet and the Temporal Segment Networks (TSN) architecture to achieve effective smoking action recognition. In our experiments, the spatial CNN outperformed the temporal CNN at recognizing smoking. The smoking recognition accuracy reaches 100% on the HMDB51 test set, 99.16% after adding ActivityNet smoking videos, and 91.67% on additional smoking clips taken from movies (the AVA dataset).
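The abstract does not say how data balancing was performed. One common option, shown here purely as an assumption, is random oversampling: clips from the smaller class (for example, the smoking class) are duplicated at random until every class has as many training entries as the largest one. The `(clip name, label)` record format, the file names, and the function name are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Sample = Tuple[str, Hashable]  # (clip identifier, class label); hypothetical record format


def oversample_to_balance(samples: Sequence[Sample],
                          seed: Optional[int] = None) -> List[Sample]:
    """Randomly duplicate samples of smaller classes until every class matches the largest one."""
    if not samples:
        return []
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    target = max(len(items) for items in by_class.values())
    balanced: List[Sample] = []
    for items in by_class.values():
        balanced.extend(items)  # keep every original sample once
        balanced.extend(rng.choices(items, k=target - len(items)))  # extras drawn with replacement
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    train_list = ([(f"smoke_{i:02d}.avi", 1) for i in range(3)]
                  + [(f"other_{i:02d}.avi", 0) for i in range(10)])
    counts = Counter(label for _, label in oversample_to_balance(train_list, seed=0))
    print(counts)  # each class now has 10 entries
```

Oversampling keeps every original clip and only repeats minority examples; undersampling the majority class or weighting the loss per class are alternatives the thesis may have used instead.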

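    Abstract (Chinese); Abstract (English); Acknowledgments; Table of Contents; List of Figures; List of Tables
    Chapter 1 Introduction
        1-1 Research Background
        1-2 Research Motivation and Objectives
        1-3 Thesis Organization
    Chapter 2 Neural Networks and Deep Learning
        2-1 Artificial Neural Networks
            2-1-1 Development of Artificial Neural Networks
            2-1-2 Backpropagation Neural Networks
        2-2 Deep Learning
            2-2-1 Deep Neural Networks
            2-2-2 Convolutional Neural Networks (CNN)
            2-2-3 Batch Normalization
        2-3 Development of Action Recognition
            2-3-1 Two-Stream Networks
            2-3-2 Temporal Segment Networks (TSN)
    Chapter 3 Proposed Method and Tools Used
        3-1 Video Frame Extraction and Data Preprocessing
        3-2 Training Stage
        3-3 Testing Stage
    Chapter 4 Experimental Results and Analysis
        4-1 Experimental Environment
        4-2 Parameters and Data Selection
            4-2-1 Experiments on Pretrained Model and Input Network Selection
            4-2-2 Choice of Data Augmentation
            4-2-3 Experimental Results after Data Balancing
            4-2-4 Comparison and Analysis of Experimental Results
    Chapter 5 Conclusions and Future Work
    References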
