
Graduate student: 蕭宸欣 (Chen-Hsin Hsiao)
Thesis title: 基於網絡嵌入的集成學習以改善鏈結預測準確度
(An ensemble model for link prediction based on graph embedding)
Advisor: 陳彥良 (Yen-Liang Chen)
Oral defense committee:
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2021
Academic year of graduation: 109
Language: English
Number of pages: 59
Chinese keywords: 鏈結預測、網絡嵌入、集成學習 (link prediction, graph embedding, ensemble learning)
Foreign keywords: Link prediction, Ensemble learning, Graph embedding
A network is a form of data representation that has been widely applied in many fields. In a social network, for example, we regard nodes as individuals or groups, and the edges between nodes are called links. The core idea of link prediction is to infer whether a new relationship exists between a pair of nodes, or to uncover hidden links in the network, by analyzing the interactions among its nodes; effective network analysis gives us a deeper understanding of the data behind the network.

Link prediction has been widely applied in social networks, e-commerce, bioinformatics, and other fields. Through link prediction, researchers can understand the shape of a network and mine information from it that indirectly reflects real-world situations. In link prediction, graph embedding projects node information into a low-dimensional vector space while effectively preserving the network structure. In this thesis we adopt three kinds of graph embedding methods: Matrix Factorization based methods, Random walk based methods, and Deep learning based methods. Each embedding method has its own strengths and weaknesses, so we propose an ensemble model that preserves the characteristics of each embedding by learning node representations from all of them. We conduct experiments on five datasets, and the results show that learning from multiple embedding representations, training several different classifiers on them, and making the final prediction with a deep neural network effectively improves link prediction accuracy.
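As a concrete illustration of the random-walk family mentioned above, the sketch below generates truncated random walks over a toy adjacency list; in DeepWalk-style methods these walk sequences are then fed to a skip-gram model to learn the low-dimensional node vectors. The toy graph, walk parameters, and `generate_walks` helper are illustrative assumptions, not the thesis's implementation.

```python
import random

def generate_walks(adj, num_walks=10, walk_length=5, seed=42):
    """Generate truncated random walks over an adjacency list.

    Each walk is a node sequence; DeepWalk-style methods feed these
    sequences to a skip-gram model to learn node embeddings.
    """
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(num_walks):
        rng.shuffle(nodes)          # start one walk from every node, in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:   # dead end: truncate the walk
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy undirected graph as an adjacency list.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = generate_walks(graph, num_walks=2, walk_length=4)
```

Node2vec extends this idea with biased walks (return and in-out parameters) so that the sampled sequences trade off breadth-first and depth-first exploration.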


A network is a form of data representation that has been widely used in many fields. In social networks, for example, we regard nodes as individuals or groups, and the edges between nodes are called links, which represent interactions between people. By analyzing the interactions of the nodes, we can learn more about the relationships in the network. The core idea of link prediction is to predict whether a new relationship exists between a pair of nodes, or to discover hidden links in the network. Nowadays, link prediction is used in social networks, e-commerce, biological information, and other fields. Moreover, researchers use graph embedding for link prediction, which effectively preserves the network structure while converting node information into a low-dimensional vector space. In this study, we use three graph embedding methods: Matrix Factorization based methods, Random walk based methods, and Deep learning based methods. Each method has its own strengths and weaknesses, so we propose an ensemble model that combines these graph embeddings into a new representation for each node. The new representations are then used as the input of our link prediction model. Performance evaluations are conducted on multiple datasets, and the experimental results show that using multiple graph embeddings for node representations effectively improves link prediction performance.
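One simple way to turn several node embeddings into input for a link prediction classifier, as the combined representation described above requires, is to concatenate each node's vectors from the different methods and then apply an edge operator such as the element-wise (Hadamard) product to the candidate pair. The helper functions and toy embeddings below are hypothetical, a minimal sketch rather than the ensemble model proposed in the thesis.

```python
def concat_embeddings(embeddings, node):
    """Concatenate one node's vectors from several embedding methods."""
    combined = []
    for emb in embeddings:
        combined.extend(emb[node])
    return combined

def hadamard(u, v):
    """Element-wise product: a common edge operator in node2vec-style work."""
    return [a * b for a, b in zip(u, v)]

def edge_feature(embeddings, u, v, op=hadamard):
    """Turn a candidate node pair into one feature vector for a classifier."""
    return op(concat_embeddings(embeddings, u),
              concat_embeddings(embeddings, v))

# Two hypothetical 2-dimensional embeddings of the same graph
# (e.g., one factorization based, one random walk based).
emb_a = {"u": [1.0, 0.0], "v": [0.5, 0.5]}
emb_b = {"u": [0.2, 0.4], "v": [0.1, 0.3]}

feat = edge_feature([emb_a, emb_b], "u", "v")  # 4-dimensional edge vector
```

In a stacking setup, vectors like `feat` would be fed to several first-level classifiers, whose outputs a second-level model (here, a deep neural network) combines into the final link prediction.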

Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
2. Related work
   2-1 Link prediction
   2-2 Ensemble learning
       2-2-1 Bagging
       2-2-2 Boosting
       2-2-3 Stacking
   2-3 Graph embedding
       2-3-1 Factorization based methods
       2-3-2 Random walk based methods
       2-3-3 Deep learning based methods
       2-3-4 Other
       2-3-5 Graph embedding summary
3. Proposed approach
   3-1 Model structure
   3-2 Graph embedding
       3-2-1 Node combination
   3-3 Classifiers setting
       3-3-1 First level classifiers
       3-3-2 Second level classifier
4. Experiments and results
   4-1 Datasets
   4-2 Data preprocessing
   4-3 Experimental setting
   4-4 Baseline setting
   4-5 Experimental results
   4-6 Statistical tests
   4-7 Experimental summary
5. Conclusion
   5-1 Limitations and future work
Reference

