| Graduate student: | 鄭書伃 Shu-Yu Cheng |
|---|---|
| Thesis title: | A Hierarchical Self-Organizing Maps-based Sign Language Recognition Algorithm |
| Advisor: | 蘇木春 Mu-Chun Su |
| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Publication year: | 2021 |
| Academic year of graduation: | 109 (ROC calendar) |
| Language: | English |
| Number of pages: | 67 |
| Keywords: | Sign language recognition, Self-Organizing Maps, Deep learning |
Sign language recognition can benefit many people with hearing or speech impairments and help bridge the communication gap between them and their families and friends. Over the years, deep learning has achieved great success in sign language recognition. Many methods exist for extracting features from hand skeletons or signs, and many studies use these features as input to deep neural networks (DNNs) for sign language recognition. However, both the efficiency of feature extraction and the recognition accuracy still leave room for improvement. In this study, we propose a novel sign language recognition algorithm. The algorithm first uses a hierarchical self-organizing map (SOM) to convert a dynamic sign into a static response map. Because convolutional neural networks (CNNs) perform exceptionally well in image classification, we then feed the static response maps into a CNN as input features to recognize the signs.
We selected 36 signs from the American Sign Language Lexicon Video Dataset (ASLLVD) as our dataset to test the effectiveness of the proposed algorithm, which reached a recognition accuracy of 78.57% on this dataset.
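The SOM stage described in the abstract turns a variable-length sign into a fixed-size image. The minimal Python sketch below illustrates one way such a conversion could work, under assumptions the abstract does not state: each video frame is already reduced to a numeric feature vector (for example, hand-landmark coordinates from a tracker such as MediaPipe Hands), a single-layer Kohonen SOM stands in for the thesis's hierarchical SOM, and the "response map" is taken to be the normalized count of best-matching-unit (BMU) hits per SOM node. The function names `train_som`, `best_matching_unit` and `response_map`, the grid size, and the learning-rate and neighbourhood schedules are hypothetical choices for illustration only.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the SOM node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))  # squared Euclidean distance
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen SOM on equal-length vectors (assumed schedules)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Initialise each node with a copy of a randomly chosen training sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking Gaussian neighbourhood
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * sigma ** 2))
                    w = weights[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull node toward the sample
            step += 1
    return weights


def response_map(weights, sequence):
    """Map a variable-length frame sequence to a fixed grid of BMU hit frequencies."""
    rows, cols = len(weights), len(weights[0])
    grid = [[0.0] * cols for _ in range(rows)]
    for x in sequence:
        i, j = best_matching_unit(weights, x)
        grid[i][j] += 1.0
    n = len(sequence)
    return [[v / n for v in row] for row in grid]


# Toy usage with synthetic 2-D "frame features" drawn from two clusters.
rng = random.Random(1)
frames = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(50)]
frames += [(rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)) for _ in range(50)]
som = train_som(frames, rows=4, cols=4, epochs=10)
rmap_a = response_map(som, frames[:10])    # a 10-frame "sign" from the first cluster
rmap_b = response_map(som, frames[50:58])  # an 8-frame "sign" from the second cluster
```

Because every sequence lands on the same rows x cols grid regardless of its length, the resulting maps can be batched and passed to an image classifier such as a CNN. How the thesis stacks its SOM layers or weights responses over time is not specified in the abstract, so this sketch should not be read as the authors' exact method.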