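Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks | National Central University Theses and Dissertations System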

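Graduate student:	邱千芳 Chien-Fang Chiu
Thesis title:	基於時空域摺積神經網路之抽菸動作辨識 Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks
Advisor:	張寶基 Pao-Chi Chang
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Communication Engineering
Year of publication:	2018
Academic year of graduation:	106 (ROC academic year)
Language:	Chinese
Number of pages:	59
Chinese keywords:	抽菸動作辨識 (smoking action recognition), 視訊分類 (video classification), 摺積神經網路 (convolutional neural networks), 深度學習 (deep learning)
English keywords:	Smoking action recognition, Video classification, Convolutional neural networks, Deep learning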

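Many countries and regions around the world have banned smoking entirely in indoor public places and workplaces, and Taiwan is no exception. Yet people are still frequently seen smoking at hospital entrances and in corners of campuses. Even a non-smoker who stands next to a smoker inhales the smoke, which is known as secondhand smoke. Secondhand smoke harms the human body in many ways: besides raising the risk of diseases such as cancer, heart disease, stroke, and respiratory disease, it may also impair brain function. We hope to use deep learning techniques to identify smokers who violate the ban.
This study, "Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks," proposes a system for smoking action recognition. It uses data balancing and data augmentation to improve performance, and combines the GoogLeNet convolutional neural network with the video segmentation architecture of Temporal segment networks to form a spatial-domain convolutional neural network with temporal structure (the spatial-temporal neural network of the title), yielding a system that effectively recognizes smoking videos. On the original Hmdb51 smoking videos, recognition accuracy reaches 100%. With the added Activitynet smoking videos of everyday smoking (Hmdb51 + Activitynet smoking), it reaches 99.16%. On selected movie smoking clips from AVA data, it still reaches 91.667%, so the system effectively distinguishes smoking videos.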
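The abstract names data balancing but does not say how it was carried out. As a minimal sketch, assuming random oversampling of the smaller class (one common approach, not necessarily the one used in the thesis), a training list could be balanced as follows. The function name `oversample_to_balance`, the file names, and the labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(samples, rng):
    """Return a shuffled copy of samples in which every class is as large as the largest one.

    samples: list of (video_path, label) pairs (hypothetical layout).
    Smaller classes are topped up by drawing their own items with replacement.
    """
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)

    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Duplicate randomly chosen items until this class reaches the target size.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))

    rng.shuffle(balanced)
    return balanced


# Illustrative data only: 2 smoking clips against 5 non-smoking clips.
train_list = [("smoke_%d.avi" % i, "smoking") for i in range(2)]
train_list += [("other_%d.avi" % i, "non_smoking") for i in range(5)]
balanced_list = oversample_to_balance(train_list, random.Random(0))
class_counts = Counter(label for _, label in balanced_list)
```

Oversampling keeps every original clip, which matters when the positive class is small; undersampling the larger class is an equally plausible alternative.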

Cigarette smoking increases the risk of death from all causes in both men and women. A person standing next to a smoker still inhales the smoke, which is called passive smoking. Consequently, smoking is prohibited in many enclosed public areas such as government buildings, educational facilities, hospitals, enclosed sports facilities, and buses. However, people still often smoke even in places where it is strictly prohibited, such as hospitals and elementary school campuses. The objective of this work is to develop a smoking action recognition system based on deep learning that allows smoking behavior to be discovered quickly.
In this work, we propose a system that recognizes smoking actions. It applies data balancing and data augmentation and builds on GoogLeNet and the Temporal segment networks (TSN) architecture to achieve effective smoking action recognition. In our experiments, the spatial CNN is more powerful than the temporal CNN for smoking actions. The experimental results show that the recognition accuracy reaches 100% on the Hmdb51 test dataset. With the additional ActivityNet smoking videos, the accuracy reaches 99.16%. On additional, unrelated movie smoking clips, the accuracy is still as high as 91.67%.
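The segment-based idea behind TSN can be sketched in a few lines: the video is split into equal-length segments, one frame (snippet) is sampled from each, a CNN scores every snippet, and the per-snippet scores are combined into one video-level prediction. The sketch below assumes average pooling as the consensus function and three segments. The function names, the labels, and the score values are illustrative placeholders, not taken from the thesis, where the scores would come from GoogLeNet.

```python
import random
from typing import List, Sequence


def sample_segment_indices(num_frames: int, num_segments: int, rng: random.Random) -> List[int]:
    """Pick one random frame index from each of num_segments equal-length segments."""
    if num_frames < num_segments:
        raise ValueError("video has fewer frames than segments")
    indices = []
    for k in range(num_segments):
        # Integer arithmetic keeps the segment boundaries exact.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments  # exclusive
        indices.append(rng.randrange(start, end))
    return indices


def average_consensus(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-snippet class scores into a single video-level score vector."""
    num_snippets = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / num_snippets for c in range(num_classes)]


def predict_label(snippet_scores: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Return the label with the highest averaged score."""
    video_scores = average_consensus(snippet_scores)
    best = max(range(len(labels)), key=lambda c: video_scores[c])
    return labels[best]


# Illustrative run: a 90-frame video, 3 segments, made-up snippet scores.
frame_ids = sample_segment_indices(90, 3, random.Random(0))
scores = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]  # placeholder CNN outputs
video_scores = average_consensus(scores)
label = predict_label(scores, ["non_smoking", "smoking"])
```

Sampling one snippet per segment covers the whole clip at a fixed cost, and averaging lets a single misleading frame (the second snippet above) be outvoted by the others.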

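Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Thesis Organization
Chapter 2 Artificial Neural Networks and Deep Learning
2-1 Artificial Neural Networks
2-1-1 Development of Artificial Neural Networks
2-1-2 Back-Propagation Neural Networks
2-2 Deep Learning
2-2-1 Deep Neural Networks
2-2-2 Convolutional Neural Networks (CNN)
2-2-3 Batch Normalization
2-3 Development of the Action Recognition Field
2-3-1 Two-stream networks
2-3-2 Temporal segment networks (TSN)
Chapter 3 Proposed Method and Related Usage
3-1 Video Frame Extraction and Data Preprocessing
3-2 Training Phase
3-3 Testing Phase
Chapter 4 Experimental Results and Analysis
4-1 Experimental Environment
4-2 Parameters and Data Selection
4-2-1 Experiments on Selecting the Pre-trained Model and Input Network
4-2-2 Selection of Data Augmentation
4-2-3 Experimental Results after Data Balancing
4-2-4 Comparison and Analysis of Experimental Results
Chapter 5 Conclusion and Future Work
References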

                                

[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[3] ImageNet Large Scale Visual Recognition Competition: http://www.image-net.org/challenges/LSVRC/
[4] K. Simonyan, and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
[6] 華文戒菸網 (Chinese-language smoking cessation website), Tobacco Hazards Prevention Act: https://www.e-quit.org/CustomPage/HtmlEditorPage.aspx?MId=242&ML=3
[7] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “Hmdb: a large video database for human motion recognition,” in 2011 International Conference on Computer Vision, pp. 2556–2563, IEEE, 2011.
[8] F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. Carlos Niebles, “ActivityNet: A large-scale video benchmark for human activity understanding,” in Computer Vision and Pattern Recognition (CVPR), pp. 961-970, 2015.
[9] C. Gu, C. Sun, D. A. Ross, C. Vondrick, C. Pantofaru, Y. Li, ... and C. Schmid, “AVA: A video dataset of spatio-temporally localized atomic visual actions,” arXiv preprint arXiv:1705.08421, 2017.
[10] Y. Jia, et al., “Caffe: Convolutional architecture for fast feature embedding,” ACM International Conference on Multimedia, 2014.
[11] H. Wang, and C. Schmid, “Action recognition with improved trajectories,” In: Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, pp. 3551-3558, 2013.
[12] H. Wang, A. Kläser, C. Schmid, and C. L. Liu, “Dense trajectories and motion boundary descriptors for action recognition,” International Journal of Computer Vision, 103.1, pp. 60-79, 2013.
[13] K. Simonyan, and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” Advances in neural information processing systems, pp. 568-576, 2014.
[14] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” Computer Vision (ICCV), 2015 IEEE International Conference on. IEEE, pp. 4489-4497, 2015.
[15] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, “Temporal segment networks: Towards good practices for deep action recognition,” in European Conference on Computer Vision, pp. 20-36, 2016.
[16] OpenCV: Open Source Computer Vision Library, https://opencv.org/
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778, 2016.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580-587, 2014.
[19] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” Advances in neural information processing systems, pp. 91-99, 2015.
[20] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725-1732, 2014.
[21] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream network fusion for video action recognition,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[22] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[23] W. S. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, Dec. 1943.
[24] D. O. Hebb, “The Organization of Behavior,” New York: Wiley & Sons, 1949.
[25] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychological Review, 65(6), 386, 1958.
[26] M. Minsky and S. Papert, “Perceptrons,” Cambridge, MA: MIT Press, 1969.
[27] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” No. ICS-8506. California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
[28] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, Oct. 1986.
[29] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural computation, 18(7), pp. 1527-1554, 2006.
[30] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in International Conference on Machine Learning, pp. 448-456, 2015.
[31] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, “Convolutional neural networks for medical image analysis: Full training or fine tuning?,” IEEE Transactions on Medical Imaging, 35(5), pp. 1299-1312, 2016.
[32] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013.
