| Graduate Student: | 沈正勝 (Seksan Mathulaprangsan) |
|---|---|
| Thesis Title: | A Study of Locality Preserved Joint Dictionary Learning (聯合局部保留字典學習法研究) |
| Advisor: | 王家慶 (Jia-Ching Wang) |
| Committee Members: | |
| Degree: | Doctor |
| Department: | Department of Computer Science & Information Engineering, College of Electrical Engineering & Computer Science |
| Year of Publication: | 2019 |
| Graduation Academic Year: | 107 |
| Language: | English |
| Number of Pages: | 66 |
| Keywords: | dictionary learning, joint dictionary learning, locality preserving, nonnegative matrix factorization |
This thesis combines locality-preserving techniques with dictionary learning methods to improve their performance on speech emotion recognition and object recognition.
First, for image-based object recognition, we propose two novel locality-preserving dictionary learning methods. The first is a discriminative locality-preserving K-SVD (DLP-KSVD), which incorporates label information into the locality-preserving term. The second is a label-consistent LP-KSVD (LCLP-KSVD), which imposes label consistency as a constraint to further strengthen the discrimination between classes.
Next, for speech emotion recognition, this thesis proposes a locality-preserved joint nonnegative matrix factorization (LP-JNMF) method, which learns highly discriminative shared representations by simultaneously reconstructing the speech features and training a simple linear classifier. In addition, a locality-preserving constraint is introduced so that the learned features retain the manifold of the high-dimensional features.
Experimental results show that the proposed methods outperform several state-of-the-art dictionary learning methods on object recognition and speech emotion recognition.
This study uses the locality-preserving technique, which exploits the geometric information of the data, to boost the performance of dictionary learning approaches in several pattern recognition tasks, including speech emotion recognition and object recognition.
First, to fully exploit the potential of the locality-preserving technique for object recognition, two novel locality-preserving dictionary learning methods are developed. The first is the discriminative LP-KSVD (DLP-KSVD), which incorporates label information into the locality-preserving term. The second is the label-consistent LP-KSVD (LCLP-KSVD), which adds a label-consistency constraint to the original LP-KSVD model, penalizing sparse codes from different classes to improve discriminative power.
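The label-consistent idea can be written down as a sketch. Assuming the standard LC-KSVD notation (Y: training data, D: dictionary, X: sparse codes, Q: ideal discriminative sparse-code targets, A: a linear transform, H: label matrix, W: linear classifier) plus a graph Laplacian L built from neighbourhood affinities of the training samples, a plausible form of the LCLP-KSVD objective is:

```latex
\min_{D,A,W,X}\;
\underbrace{\|Y - DX\|_F^2}_{\text{reconstruction}}
+ \alpha \underbrace{\|Q - AX\|_F^2}_{\text{label consistency}}
+ \beta \underbrace{\|H - WX\|_F^2}_{\text{classification}}
+ \gamma \underbrace{\operatorname{tr}\!\left(X L X^{\top}\right)}_{\text{locality preservation}}
\quad \text{s.t. } \|x_i\|_0 \le T \;\; \forall i
```

The exact weighting and constraint form used in the thesis may differ; the point is that the locality term keeps codes of neighbouring samples close while the label-consistency and classification terms separate classes.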
Second, a novel approach for speech emotion recognition, named locality-preserved joint NMF (LP-JNMF), is introduced. It achieves two goals jointly: learning a dictionary for reconstructing the input acoustic features, and learning a simple linear classifier for annotation. Because the learned representations are shared between the dictionary and the annotation matrix, their discriminative power is promoted. Moreover, to preserve the manifold of the input acoustic features, a locality penalty term is incorporated into the objective function of joint dictionary learning, further improving the discriminability of the learned dictionary.
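The LP-JNMF idea can be sketched with generic multiplicative updates. The following is a minimal illustration, not the thesis's exact algorithm: it jointly factorizes features X ≈ DH and labels Y ≈ WH, and adds a graph-Laplacian penalty tr(HLH^T) with L = Dg - A so that codes of neighbouring samples stay close. The function name `lp_jnmf`, the parameter names, and the weights `alpha` and `lam` are illustrative assumptions.

```python
import numpy as np

def lp_jnmf(X, Y, A, k, alpha=1.0, lam=0.1, iters=200, seed=0):
    """Simplified locality-preserved joint NMF sketch (not the thesis's exact model).

    X : (d, n) nonnegative feature matrix, columns are samples
    Y : (c, n) binary label-indicator matrix
    A : (n, n) symmetric nonnegative neighbourhood affinity matrix
    k : number of dictionary atoms

    Jointly factorizes X ~ D @ H and Y ~ W @ H with a graph-Laplacian
    penalty tr(H L H^T), L = Dg - A, via multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    c = Y.shape[0]
    D = rng.random((d, k))
    W = rng.random((c, k))
    H = rng.random((k, n))
    Dg = np.diag(A.sum(axis=1))  # degree matrix of the affinity graph
    eps = 1e-9                   # avoids division by zero
    for _ in range(iters):
        # Update dictionary and classifier with standard NMF rules.
        D *= (X @ H.T) / (D @ H @ H.T + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
        # Shared codes: reconstruction + annotation + locality terms.
        H *= (D.T @ X + alpha * (W.T @ Y) + lam * (H @ A)) / \
             (D.T @ D @ H + alpha * (W.T @ W @ H) + lam * (H @ Dg) + eps)
    return D, W, H
```

A test utterance would then be encoded against the learned D (e.g. by nonnegative least squares) and classified by the argmax of W applied to its code; these update rules follow the generic graph-regularized joint NMF recipe, so the thesis's formulation may weight or constrain the terms differently.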
Experimental results show that the proposed methods outperform the baseline algorithms, which are state-of-the-art dictionary learning methods, on object recognition and speech emotion recognition tasks.