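| Graduate student: | 董致輔 (Chih-Fu Tung) |
|---|---|
| Thesis title: | A CTRGCN-based model for Isolated Sign Language Recognition (Chinese title, translated: "A Sign Language Word Recognition Algorithm Based on Channel-wise Topology Refinement Graph Convolutional Networks") |
| Advisor: | 蘇木春 (Mu-Chun Su) |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2024 |
| Academic year of graduation: | 112 (ROC calendar, i.e., the 2023-2024 academic year) |
| Language: | Chinese |
| Number of pages: | 50 |
| Keywords: | Deep learning, skeleton recognition, sign language word recognition, graph convolutional neural network |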
In recent years, the number of people with hearing impairments has been gradually increasing, and public demand for learning sign language has risen steadily as well. However, sign language is difficult to learn and learning resources are limited, which makes it a challenging task.

To address this problem, this thesis proposes a skeleton-based sign language word recognition algorithm built on the Channel-wise Topology Refinement Graph Convolutional Network (CTRGCN). For isolated sign language word recognition, we design an improved CTRGCN model together with a multi-branch architecture to raise recognition accuracy. We trained the model on the WLASL100 dataset and compared it with existing models. The results show that our method outperforms existing techniques in most of the evaluated settings, demonstrating its potential and practicality for sign language word recognition. We hope this approach can provide further support for sign language learning.
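The abstract names CTRGCN and a multi-branch design without detailing either. As a hedged illustration, the NumPy sketch below follows the general channel-wise topology refinement graph convolution (CTR-GC) idea from Chen et al. (2021). A shared skeleton adjacency `A` is refined separately for each output channel using pairwise joint correlations computed from temporally pooled features, and each channel then aggregates its neighbors with its own refined topology. This is not the thesis's improved model. The weight names (`W_phi`, `W_psi`, `W_xi`, `W_t`), the `tanh` nonlinearity, the scalar `alpha`, and the toy 5-joint skeleton are all illustrative assumptions. The second function, `fuse_branches`, shows one common way to combine branches: weighted fusion of per-branch softmax scores. The abstract does not say how its branches are combined, so this fusion rule is an assumption as well.

```python
import numpy as np


def ctr_gc(x, A, W_phi, W_psi, W_xi, W_t, alpha=1.0):
    """Simplified channel-wise topology refinement graph convolution (sketch only).

    x:      (C_in, T, V) joint features over T frames and V joints
    A:      (V, V) shared skeleton adjacency
    W_phi:  (C_mid, C_in) hypothetical projection for the first joint of a pair
    W_psi:  (C_mid, C_in) hypothetical projection for the second joint of a pair
    W_xi:   (C_mid -> C_out) hypothetical lift of correlations to output channels
    W_t:    (C_out, C_in) hypothetical feature transform applied before aggregation
    """
    # Pool over time, then project to a reduced channel space: (C_mid, V).
    x_pool = x.mean(axis=1)
    phi = W_phi @ x_pool
    psi = W_psi @ x_pool
    # Channel-specific pairwise correlations: M[m, u, v] = tanh(phi[m, u] - psi[m, v]).
    M = np.tanh(phi[:, :, None] - psi[:, None, :])
    # One correlation map per output channel: (C_out, V, V).
    Q = np.einsum("om,muv->ouv", W_xi, M)
    # Refine the shared topology separately for every output channel.
    R = A[None, :, :] + alpha * Q
    # Transform features, then aggregate each channel with its own topology.
    y = np.einsum("oc,ctv->otv", W_t, x)       # (C_out, T, V)
    return np.einsum("ouv,otv->otu", R, y)     # (C_out, T, V)


def fuse_branches(branch_logits, weights):
    """Weighted late fusion of per-branch class scores (assumed fusion rule)."""
    fused = np.zeros_like(branch_logits[0], dtype=float)
    for logits, w in zip(branch_logits, weights):
        shifted = np.exp(logits - logits.max())  # numerically stable softmax
        fused += w * shifted / shifted.sum()
    return int(np.argmax(fused)), fused


# Usage with random weights and an identity adjacency as a toy 5-joint skeleton.
rng = np.random.default_rng(0)
C_in, C_mid, C_out, T, V = 3, 8, 16, 10, 5
x = rng.standard_normal((C_in, T, V))
z = ctr_gc(
    x,
    np.eye(V),
    rng.standard_normal((C_mid, C_in)),
    rng.standard_normal((C_mid, C_in)),
    rng.standard_normal((C_out, C_mid)),
    rng.standard_normal((C_out, C_in)),
)
print(z.shape)  # (16, 10, 5)
```

Setting `alpha=0` reduces the layer to an ordinary graph convolution over the fixed adjacency `A`. This isolates what the channel-wise refinement adds. The published CTR-GCN includes further components that are omitted here, such as parallel modules over adjacency subsets, temporal modeling, and normalization layers.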