| Graduate Student: | 王先弘 Xian-Hong Wang |
|---|---|
| Thesis Title: | 關於一些非線性降維的方法與改進 (On Some Nonlinear Dimensionality Reduction Methods and Improvements) |
| Advisor: | 楊肅煜 Suh-Yuh Yang |
| Committee Members: | |
| Degree: | Master |
| Department: | Department of Mathematics (數學系), College of Science (理學院) |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | English |
| Pages: | 49 |
| Keywords: | dimensionality reduction, principal component analysis, linear discriminant analysis, multidimensional scaling, Isomap, diffusion maps, Laplacian eigenmap, locally linear embedding, kernel PCA |
Dimensionality reduction is the process of reducing the number of variables in a dataset, ideally to close to its intrinsic dimension, while retaining meaningful properties of the original data. It is usually a data preprocessing step before model training in data science. Specifically, it can be used for data visualization, cluster analysis, and noise reduction, or as an intermediate step that facilitates other studies. In this thesis, we briefly present the derivations of two linear dimensionality reduction methods, principal component analysis and linear discriminant analysis, and of several nonlinear dimensionality reduction methods, including multidimensional scaling, isometric mapping (Isomap), diffusion maps, the Laplacian eigenmap, locally linear embedding, and kernel PCA. Furthermore, we propose modifications to the Laplacian eigenmap and diffusion maps with the help of geodesic distances. We also present a method for selecting the target dimension for dimensionality reduction. Finally, we perform numerical experiments and compare the various dimensionality reduction techniques.
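The abstract surveys several reduction methods; as a minimal sketch of the simplest of them, the following illustrates principal component analysis via an eigendecomposition of the sample covariance matrix. This is a generic NumPy illustration, not code from the thesis itself, and the toy data (points near a 2-D plane embedded in 3-D, so the intrinsic dimension is 2) is invented for the example:

```python
import numpy as np

def pca(X, k):
    """Project the rows of the (n, d) data matrix X onto its
    top-k principal components; returns an (n, k) embedding."""
    Xc = X - X.mean(axis=0)            # center the data
    C = Xc.T @ Xc / (len(Xc) - 1)      # sample covariance, shape (d, d)
    w, V = np.linalg.eigh(C)           # eigenvalues in ascending order
    W = V[:, ::-1][:, :k]              # top-k principal directions
    return Xc @ W

rng = np.random.default_rng(0)
# 200 points lying near a random 2-D plane in R^3:
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) \
    + 0.01 * rng.normal(size=(200, 3))
Y = pca(X, 2)
print(Y.shape)  # (200, 2)
```

Because PCA is linear, it cannot unfold curved manifolds; the nonlinear methods above (Isomap, Laplacian eigenmap, diffusion maps, and so on) replace the covariance eigendecomposition with spectral decompositions of graph-based distance or affinity matrices.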