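A Hierarchical Self-Organizing Maps-based Sign Language Recognition Algorithm | National Central University Electronic Theses and Dissertations System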


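Graduate student:	鄭書伃 Shu-Yu Cheng
Thesis title:	基於多層自組織映射圖之手語辨識演算法 A Hierarchical Self-Organizing Maps-based Sign Language Recognition Algorithm
Advisor:	蘇木春 Mu-Chun Su
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2021
Academic year of graduation:	109 (ROC calendar)
Language:	English
Number of pages:	67
Keywords (Chinese):	sign language recognition, self-organizing maps, deep learning
Keywords (English):	Sign language recognition, Self-Organizing Maps, Deep learning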

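Sign language recognition can benefit many people with hearing or speech impairments and narrow the communication gap between them and their family and friends. Over the years, deep learning has achieved considerable success in sign language recognition. Many methods exist for extracting features of hand skeletons or signs, and many studies feed these features into deep neural networks (DNNs) to recognize sign language. However, both the efficiency of feature extraction and the accuracy of recognition still have room for improvement. In this thesis, we propose a new sign language recognition algorithm that first uses a hierarchical self-organizing map (SOM) to convert dynamic sign language into a static response map. Because convolutional neural networks (CNNs) perform exceptionally well at image classification, we feed this static response map into a CNN as the input feature to recognize the sign.

We selected 36 words from the American Sign Language Lexicon Video Dataset (ASLLVD) as our dataset to test the effectiveness of the proposed algorithm, and achieved a recognition accuracy of 78.57% on this dataset.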

Sign language recognition can benefit many people with hearing or speech impairments and bridge the communication gap between them and their families and friends. For many years, deep learning has achieved strong results in the field of sign language recognition. There are many methods for extracting features of hand shapes or signs, and many studies use these features as input to deep neural networks (DNNs) for sign language recognition. However, the efficiency of feature extraction and the recognition accuracy still have room for improvement. In this study, we propose a novel algorithm for sign language recognition. The algorithm first uses a hierarchical self-organizing map (SOM) to convert dynamic sign language into a static response map. Since convolutional neural networks (CNNs) perform extraordinarily well in image classification, we take the static response maps as input features to a CNN to recognize the signs.
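The idea of turning a dynamic sign into a static response map can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the hierarchical SOM architecture or how the response map is defined, so the sketch assumes a single-layer Kohonen SOM and defines the response map as the normalized count of best-matching-unit hits over the frames of one clip. The function names `train_som` and `response_map`, the 8x8 grid, and the 42-dimensional frame features (for example, 21 hand landmarks with x and y coordinates) are all hypothetical choices for illustration.

```python
import numpy as np


def train_som(frames, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a plain (single-layer) Kohonen SOM on frame features of shape (n, d)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows * cols, frames.shape[1]))
    # Grid coordinates of every neuron, in row-major order.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    ).reshape(-1, 2)
    total_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        for x in frames[rng.permutation(len(frames))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            # Best-matching unit: the neuron whose weights are closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def response_map(sequence, weights, grid=(8, 8)):
    """Collapse a (t, d) frame sequence into one static (rows, cols) image."""
    dists = np.linalg.norm(sequence[:, None, :] - weights[None, :, :], axis=2)
    bmus = np.argmin(dists, axis=1)
    hits = np.bincount(bmus, minlength=weights.shape[0]).astype(float)
    return (hits / len(sequence)).reshape(grid)


# Hypothetical data: random vectors stand in for per-frame hand features.
rng = np.random.default_rng(1)
som = train_som(rng.random((200, 42)))
clip = rng.random((30, 42))          # one 30-frame sign clip
image = response_map(clip, som)      # fixed-size input for an image classifier
print(image.shape)
```

Whatever the clip length, the resulting map has a fixed size, which is what allows an image classifier such as a CNN to take it as input.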

We selected 36 signs from the American Sign Language Lexicon Video Dataset (ASLLVD) as our dataset to test the effectiveness of the proposed algorithm, and reached a recognition accuracy of 78.57% on this dataset.

Contents

Abstract
Contents
List of Figures
List of Algorithms
List of Tables

Introduction
1 Introduction

Background
1 Related Works of Sign Language Recognition
2 Review of Unsupervised Learning Methods
2.1 K-means Clustering
2.2 Principal Component Analysis
2.3 Singular Value Decomposition
2.4 Independent Component Analysis
3 Review of Self-Organizing Maps
4 Review of Convolution Neural Networks
4.1 Convolution Layers
4.2 Activation Layers
4.3 Pooling Layers
4.4 Batch Normalization
4.5 Fully connected Layers

The Proposed Algorithm
1 Datasets
2 Data Preprocessing
2.1 Hand detection using MediaPipe
3 The Flowchart of the Proposed Algorithm
4 Network Configuration
4.1 The Architecture of Fast Self-Organizing Maps
4.2 The Architecture of Convolutional Neural Networks

Results and Discussion
1 Experimental Definition and Premise
2 Experiments
3 Summary and Discussion

Conclusions and Perspectives

[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[2] F. Zhang, V. Bazarevsky, A. Vakunov, A. Tkachenka, G. Sung, C.-L. Chang, and M. Grundmann, “Mediapipe hands: On-device real-time hand tracking,” arXiv preprint arXiv:2006.10214, 2020.

[3] M.-C. Su and H.-T. Chang, “Fast self-organizing feature map algorithm,” IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 721–733, 2000.

[4] C. Neidle, A. Thangali, and S. Sclaroff, “Challenges in development of the american sign language lexicon video dataset (asllvd) corpus,” in 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon, LREC. Citeseer, 2012.

[5] Y. Bengio and P. Frasconi, “An input output hmm architecture,” Advances in neural information processing systems, pp. 427–434, 1995.

[6] T. Starner, J. Weaver, and A. Pentland, “Real-time american sign language recognition using desk and wearable computer based video,” IEEE Transactions on pattern analysis and machine intelligence, vol. 20, no. 12, pp. 1371–1375, 1998.

[7] C. Vogler and D. Metaxas, “Parallel hidden markov models for american sign language recognition,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 1. IEEE, 1999, pp. 116–122.

[8] Z. Zafrulla, H. Brashear, T. Starner, H. Hamilton, and P. Presti, “American sign language recognition with the kinect,” in Proceedings of the 13th international conference on multimodal interfaces, 2011, pp. 279–286.

[9] S. Theodorakis, V. Pitsikalis, and P. Maragos, “Dynamic–static unsupervised sequentiality, statistical subunits and lexicon for sign language recognition,” Image and Vision Computing, vol. 32, no. 8, pp. 533–549, 2014.

[10] T.-W. Chong and B.-G. Lee, “American sign language recognition using leap motion controller with machine learning approach,” Sensors, vol. 18, no. 10, p. 3554, 2018.

[11] C. K. Lee, K. K. Ng, C.-H. Chen, H. C. Lau, S. Chung, and T. Tsoi, “American sign language recognition and training method with recurrent neural network,” Expert Systems with Applications, vol. 167, p. 114403, 2021.

[12] N. Kasukurthi, B. Rokad, S. Bidani, D. Dennisan et al., “American sign language alphabet recognition using deep learning,” arXiv preprint arXiv:1905.05487, 2019.

[13] K. Bantupalli and Y. Xie, “American sign language recognition using deep learning and computer vision,” in 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018, pp. 4896–4899.

[14] C. C. de Amorim, D. Macêdo, and C. Zanchettin, “Spatial-temporal graph convolutional networks for sign language recognition,” in International Conference on Artificial Neural Networks. Springer, 2019, pp. 646–657.

[15] T.-W. Chong and B.-J. Kim, “American sign language recognition system using wearable sensors with deep learning approach,” The Journal of the Korea institute of electronic communication sciences, vol. 15, no. 2, pp. 291–298, 2020.

[16] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and intelligent laboratory systems, vol. 2, no. 1-3, pp. 37–52, 1987.

[17] G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” in Linear algebra. Springer, 1971, pp. 134–151.

[18] P. Comon, “Independent component analysis, a new concept?” Signal processing, vol. 36, no. 3, pp. 287–314, 1994.

[19] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.

[20] A. L. Maas, A. Y. Hannun, A. Y. Ng et al., “Rectifier nonlinearities improve neural network acoustic models,” in Proc. icml, vol. 30, no. 1. Citeseer, 2013, p. 3.

[21] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning. PMLR, 2015, pp. 448–456.

[22] J. D. Schein and M. T. Delk Jr, “The deaf population of the united states.” 1974.

[23] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.
