| Author: | Feng-Shuo Hsu (許峰碩) |
|---|---|
| Title: | Development and Study of a Hybrid Intelligent Fall Detection System and a Lightweight Deep Learning Network for Fast Posture Recognition |
| Advisor: | Chien-Chang Chen (陳健章) |
| Committee: | |
| Degree: | Doctor |
| Department: | Department of Biomedical Sciences and Engineering |
| Publication year: | 2025 |
| Graduation academic year: | 113 (ROC calendar) |
| Language: | Chinese |
| Pages: | 60 |
| Keywords: | Fall detection, Hybrid sensing system, Lightweight neural networks, Bayesian neural networks, Data Density Functional Transform, Human posture identification |
Elderly people, inpatients, individuals with specific medical conditions, and people with mobility impairments are at high risk of falls, which severely threaten their safety and quality of life and place heavy physical and psychological burdens on their caregivers. Moreover, the later a fall is discovered, the slimmer the chance of a full recovery. To detect fall events early, we develop a data-driven system that couples a hybrid sensing mechanism with a probabilistic model for the automatic detection of fall incidents. In the dual sensing platform, a commercial webcam and a laboratory-built ultrasonic array capture the target subject's transverse and longitudinal time-series signals, respectively, which are time-aligned and assembled into a three-dimensional (3D) motion trajectory map. Two detection and tracking algorithms identify the target subject: the synergy of a Haar-feature-based cascade classifier and a channel and spatial reliability tracking (CSRT) tracker ensures continuous face tracking of a moving subject. Seven subjects with an average height of 164.2 ± 12 cm participated in the study, performing predefined motion patterns that simulate dizziness-induced falls. The discrete fast Data Density Functional Transform (D-fDDFT), with a Gaussian mixture model (GMM) as its kernel, estimates the number of clusters in the 3D motion data and their corresponding boundaries; these features are then used to construct a probabilistic model that visually displays three possible motion states: normal movement, the transition from normal movement to a fall, and a completed fall. During data validation, the hybrid sensing system achieves an accuracy of 90%, a sensitivity of 90%, and a precision of 95%, and the average time from predicting a possible fall to triggering the alarm is approximately 0.7 seconds. These key results not only demonstrate detection performance comparable to contemporary methods but also validate the practical feasibility of the proposed system.
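The sensor-fusion step described above (combining transverse webcam signals with longitudinal ultrasonic signals into one 3D trajectory) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the names `fuse_trajectory` and `t_grid` are illustrative, and the sketch assumes each sensor stream carries its own timestamps so that both can be interpolated onto a common time grid before stacking.

```python
import numpy as np

def fuse_trajectory(cam_t, cam_xy, us_t, us_z, t_grid):
    """Align webcam (transverse x, y) and ultrasonic (longitudinal z)
    time series on a common time grid and stack them into a 3D trajectory."""
    x = np.interp(t_grid, cam_t, cam_xy[:, 0])   # left-right motion
    y = np.interp(t_grid, cam_t, cam_xy[:, 1])   # up-down motion
    z = np.interp(t_grid, us_t, us_z)            # front-back (range) motion
    return np.column_stack([x, y, z])            # shape: (len(t_grid), 3)

# Example: a 30 Hz camera stream and a 20 Hz ultrasonic stream,
# resampled onto a shared 25 Hz grid.
cam_t = np.arange(0.0, 2.0, 1 / 30)
us_t = np.arange(0.0, 2.0, 1 / 20)
cam_xy = np.column_stack([np.sin(cam_t), np.cos(cam_t)])
us_z = 1.5 + 0.1 * us_t                          # subject slowly moving away
t_grid = np.arange(0.0, 2.0, 1 / 25)
traj = fuse_trajectory(cam_t, cam_xy, us_t, us_z, t_grid)
```

Linear interpolation is the simplest alignment choice; any resampling scheme that places both streams on the same clock would serve the same role before the D-fDDFT clustering step.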
Since image-based approaches employing deep convolutional neural networks (CNNs) remain mainstream for posture tracking and fall detection, we further establish a framework that combines object detection techniques with stochastic variational inference (SVI). By constructing a lightweight single-shot multi-box detector (SSD) network, we reduce the model size and improve inference speed, facilitating rapid human posture recognition. We adopt an integer-arithmetic-only (IOA) algorithm to lower the computational complexity of model training and utilize a feature pyramid network (FPN) to better capture the features of small objects. A self-attention mechanism extracts features from consecutive human motion frames, namely the centroid coordinates of the bounding boxes. By integrating a Bayesian neural network with SVI, human postures can be classified promptly by efficiently resolving a Gaussian mixture model (GMM): with the instant centroid features as inputs, the candidate postures appear at different locations on a probabilistic map. Compared with a ResNet baseline, our model achieves a higher mean average precision (mAP: 34.6 vs. 32.5), faster inference (27 vs. 48 milliseconds), and a smaller model size (46.2 vs. 227.8 MB), and it can issue an alert approximately 0.66 seconds before a suspected fall event occurs.
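The integer-arithmetic-only idea mentioned above can be illustrated with the standard affine quantization scheme, in which real-valued tensors are mapped to 8-bit integers through a scale and a zero point so that the heavy arithmetic runs on integers. The sketch below is a generic illustration of that scheme under those assumptions, not the thesis implementation.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine map from real values to int8: q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Approximate inverse map back to real values."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantize weights lying in [-1, 1] with a symmetric scale.
w = np.array([-0.98, -0.25, 0.0, 0.4, 0.97], dtype=np.float32)
scale, zero_point = 1.0 / 127, 0
q = quantize(w, scale, zero_point)
w_hat = dequantize(q, scale, zero_point)
# The round-trip error is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-7
```

Storing weights and activations as int8 rather than float32 is what yields the roughly 4x smaller model footprint and faster inference that lightweight detectors such as the SSD variant above rely on.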