Graduate student: 邱柏嘉 (Po-Chia Chiu)
Thesis title: 結合多特徵及深度學習擴增技術提升Android小樣本惡意家族分類能力
Effective Android minor malware family detection using multiple feature integration approach and deep learning augmentation technique
Advisor: 陳奕明 (Yi-Ming Chen)
Oral defense committee:
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 89
Keywords (Chinese): Android惡意程式分析, 多特徵, 惡意程式圖像化, 惡意家族分類, 深度卷積生成對抗網路, 卷積神經網路
Keywords (English): Android Malware Detection, Multi-feature, Malware Visualization, Malware Family Classification, DCGAN, CNN
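  • In recent years, rapid growth in hardware computing power has led to a growing body of research that applies deep learning to malware detection, with results more accurate than those of traditional techniques. Android malware attack methods keep changing and have produced many attack types. Researchers group malware with similar targets and behavior into the same malware family to ease later analysis. Some families, however, have few samples, so deep-learning-based detectors cannot learn their features effectively, and recognition of those families degrades. This study attempts to mitigate the problem. It uses several features of Android applications (opcode, API, and permission), applies a different preprocessing method to each to produce three feature vectors, and combines the three vectors into an RGB image. A Deep Convolutional Generative Adversarial Network (DCGAN) then augments the samples of small-sample malware families, and the result is fed to a Convolutional Neural Network (CNN) for malware family classification, raising the detection rate of deep learning on small-sample families. Experimental results show that combining multiple features with a DCGAN effectively improves the ability of deep learning to recognize small-sample Android malware families.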
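
The abstract describes combining three feature vectors into one RGB image. The following is a minimal sketch of that idea, not the thesis's actual preprocessing: the channel assignment (opcode to R, API to G, permission to B), the min-max scaling, the zero-padding, and all function names are assumptions made for illustration.

```python
import math

def scale_to_bytes(values):
    """Min-max scale a list of numbers to integers in 0..255 (assumed scheme)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

def to_channel(values, side):
    """Scale a vector, zero-pad or truncate it to side*side, and cut it into rows."""
    flat = scale_to_bytes(values)[: side * side]
    flat += [0] * (side * side - len(flat))
    return [flat[r * side:(r + 1) * side] for r in range(side)]

def features_to_rgb(opcode_vec, api_vec, permission_vec, side=None):
    """Stack three feature vectors as the R, G and B channels of a square image."""
    if side is None:
        longest = max(len(opcode_vec), len(api_vec), len(permission_vec))
        side = max(1, math.ceil(math.sqrt(longest)))
    r = to_channel(opcode_vec, side)
    g = to_channel(api_vec, side)
    b = to_channel(permission_vec, side)
    # Each pixel is an (R, G, B) tuple taken from the same position in each channel.
    return [[(r[y][x], g[y][x], b[y][x]) for x in range(side)] for y in range(side)]

# Toy vectors, made up for illustration only.
opcode_vec = [0, 4, 8, 20]
api_vec = [1, 1, 3]
permission_vec = [0, 1]
image = features_to_rgb(opcode_vec, api_vec, permission_vec)
for row in image:
    print(row)
```

Images built this way have a fixed shape per sample, which is what lets the same tensors serve as both DCGAN training data and CNN input. A real pipeline would choose the image size and the per-feature encoding to suit the dataset.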


    As malicious attack methods keep changing, imbalanced Android malware family datasets have become a major problem: deep learning models cannot effectively learn the features of small families, which reduces the effectiveness of malware detection. This research uses three static features of Android applications (opcode, API, and permission) and applies different pre-processing methods to generate feature vectors, which are combined into an RGB image. Once the RGB images are generated, a DCGAN (Deep Convolutional Generative Adversarial Network) augments the samples of small families, and the samples are then input to a Convolutional Neural Network (CNN) for family classification. The experimental results show that combining multiple features with DCGAN effectively improves the CNN's ability to identify small families, raising the F1-score of small families by 2% to 20%.
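
The reported gain is stated as a per-family F1-score. As a reminder of how that metric is computed for each family separately, here is a small sketch. The family labels and predictions are invented toy data, and the function name `per_family_f1` is hypothetical.

```python
def per_family_f1(true_labels, predicted_labels):
    """Return {family: F1} computed one-vs-rest for every family seen."""
    families = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for fam in families:
        tp = sum(1 for t, p in pairs if t == fam and p == fam)
        fp = sum(1 for t, p in pairs if t != fam and p == fam)
        fn = sum(1 for t, p in pairs if t == fam and p != fam)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[fam] = 2 * precision * recall / total if total else 0.0
    return scores

# Toy data: one large family and one small family (made up).
true_labels = ["big"] * 6 + ["small"] * 2
predicted_labels = ["big"] * 5 + ["small"] + ["small", "big"]
scores = per_family_f1(true_labels, predicted_labels)
print(scores)
```

Because a small family contributes few samples, a single misclassification moves its F1 far more than it moves overall accuracy, which is why the per-family score is the more informative measure for imbalanced datasets.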

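    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Contributions
        1.4 Chapter Organization
    2. Related Work
        2.1 Android Malware Analysis
            2.1.1 Malware Features
            2.1.2 Analysis Methods
            2.1.3 Malware Family Classification
        2.2 Malware Visualization and Augmentation Techniques
            2.2.1 Malware Visualization
            2.2.2 Convolutional Neural Networks
            2.2.3 Generative Adversarial Networks
            2.2.4 Malware Sample Augmentation
    3. System Design
        3.1 Decompile Module
        3.2 Feature Vectorization Module
        3.3 RGB Image Generation Module
        3.4 Malware Sample Augmentation Module
        3.5 Malware Family Classification Module
    4. Experimental Results
        4.1 Experimental Environment
        4.2 Dataset
        4.3 Experimental Design
            4.3.1 Experiment 1
            4.3.2 Experiment 2
            4.3.3 Experiment 3
            4.3.4 Experiment 4
            4.3.5 Experiment 5
            4.3.6 Experiment 6
            4.3.7 Experiment 7
            4.3.8 Experiment 8
    5. Conclusion and Future Work
        5.1 Conclusions and Contributions
        5.2 Research Limitations
        5.3 Future Work
    References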

