

Graduate Student: 丁翊軒
Yi-Hsuan Ting
論文名稱: 使用混合RGB圖像擴增技術提升Android小樣本惡意家族分類能力
RGB-based Hybrid Augmentation for Android Minor Malware Family Classification
指導教授: 陳奕明
Yi-Ming Chen
Oral Defense Committee:
Degree: Master
Department: College of Management, Department of Information Management
Year of Publication: 2022
Academic Year of Graduation: 110 (ROC calendar, i.e., the 2021-2022 academic year)
Language: Chinese
Number of Pages: 73
Keywords: Android, Malware detection, Malware family classification, Data augmentation, Hybrid augmentation, Deep learning
As computing speed has increased, many studies have applied deep learning to Android malware detection. Beyond binary malware detection, malware family classification helps malware researchers understand the behavior of each family, so that they can improve detection methods and guard against new variants. However, newly emerging malware families have few samples, which leads to poor classification results. Augmentation based on generative adversarial networks (GANs) can improve classification, but with so little data the quality of the generated samples is unstable, which limits the improvement. This study therefore proposes a hybrid augmentation method. Malware features are first extracted and converted into RGB images. Families with too few samples are then augmented with Gaussian noise and further augmented with a Deep Convolutional Generative Adversarial Network (DCGAN), which is better suited to image augmentation. Finally, the images are fed into a Convolutional Neural Network (CNN) for family classification. Experimental results show that, compared with no augmentation and with augmentation by DCGAN alone, the proposed hybrid augmentation method improves the F1-score by 7% to 34% and by 2% to 7%, respectively.
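The abstract does not specify how the extracted features are laid out as pixels or how strong the Gaussian noise is. The Python sketch below illustrates only the general idea of the first two stages, under stated assumptions:

- Each sample is assumed to be a flat list of integer features in the range 0-255.
- Consecutive triples become (R, G, B) pixels of a zero-padded square image.
- A minor family is oversampled by adding noisy copies of its images.

The helper names (`features_to_rgb`, `add_gaussian_noise`, `augment_minor_family`), the noise level `sigma=8.0` and the target count are hypothetical and are not taken from the thesis.

```python
import math
import random

# Hypothetical representation: a sample is a flat list of ints in 0-255.
# The thesis abstract does not state which features are used or how they
# are mapped to pixels; this layout is an illustrative assumption.
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def features_to_rgb(features: list[int]) -> Image:
    """Pack a flat 0-255 feature list into a square RGB image.

    Consecutive triples become (R, G, B) pixels; the tail is zero-padded
    so that the result is side x side pixels.
    """
    n_pixels = math.ceil(len(features) / 3)
    side = max(1, math.ceil(math.sqrt(n_pixels)))
    padded = features + [0] * (side * side * 3 - len(features))
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Return a copy of img with N(0, sigma^2) noise added to every channel,
    rounded and clipped back into the valid 0-255 range."""
    def clip(value: float) -> int:
        return min(255, max(0, round(value)))

    return [[tuple(clip(c + rng.gauss(0.0, sigma)) for c in px) for px in row]
            for row in img]


def augment_minor_family(images: list[Image], target: int,
                         sigma: float = 8.0, seed: int = 0) -> list[Image]:
    """Grow a minor family to `target` images by appending noisy copies of
    randomly chosen originals. sigma=8.0 is a hypothetical default."""
    if not images:
        raise ValueError("need at least one image to augment")
    rng = random.Random(seed)
    out = list(images)  # originals are kept first, unchanged
    while len(out) < target:
        out.append(add_gaussian_noise(rng.choice(images), sigma, rng))
    return out


if __name__ == "__main__":
    demo_rng = random.Random(42)
    # Five hypothetical samples of one minor family, 300 features each.
    family = [features_to_rgb([demo_rng.randrange(256) for _ in range(300)])
              for _ in range(5)]
    enlarged = augment_minor_family(family, target=20)
    print(len(enlarged), "images of", len(enlarged[0]), "x", len(enlarged[0][0]))
    # prints: 20 images of 10 x 10
```

In the pipeline described in the abstract, the DCGAN stage follows this noise-based step. The DCGAN stage is not sketched here.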

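The abstract reports its gains as F1-score differences but does not state how the per-family scores are averaged. The sketch below assumes macro averaging, which weights every family equally so that gains on minor families are not hidden by large ones. It computes per-family F1 as 2TP / (2TP + FP + FN), which is equivalent to 2PR / (P + R). The function names and the family labels "A", "B" and "C" are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Per-family F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted family gains a false positive
            fn[true_label] += 1  # true family gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-family F1: each family counts equally,
    however few samples it has (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    y_true = ["A", "A", "B", "C"]
    y_pred = ["A", "B", "B", "C"]
    print(round(macro_f1(y_true, y_pred), 4))  # prints 0.7778
```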
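Table of Contents
1 Introduction
  1-1 Research Background
  1-2 Research Motivation
  1-3 Research Objectives and Contributions
  1-4 Thesis Organization
2 Related Work
  2-1 Android Malware Analysis Methods
  2-2 Converting Code into Images
  2-3 Augmentation Techniques
3 Research Methodology
  3-1 Data Preprocessing
    3-1-1 Decompilation Module
    3-1-2 RGB Image Conversion Module
  3-2 Hybrid Augmentation
    3-2-1 Basic Augmentation Module
    3-2-2 Deep Learning Augmentation Module
  3-3 Family Classification
    3-3-1 Malware Family Classification Module
  3-4 Evaluation Metrics
  3-5 System Workflow
4 Experimental Results
  4-1 Experimental Environment and Datasets
    4-1-1 Experimental Environment
    4-1-2 Datasets
  4-2 Experiment 1 (Hybrid Augmentation)
    4-2-1 Hybrid Augmentation on the Drebin Dataset
    4-2-2 Hybrid Augmentation on the AMD Dataset
    4-2-3 Hybrid Augmentation on the CICMalDroid2020 Dataset
  4-3 Experiment 2 (Comparison of Noise Injection Methods)
    4-3-1 Comparison of Noise Injection Methods on the Drebin Dataset
    4-3-2 Comparison of Noise Injection Methods on the AMD Dataset
  4-4 Experiment 3 (Different Basic Augmentation Methods Combined with DCGAN)
    4-4-1 Different Basic Augmentation Methods Combined with DCGAN on the Drebin Dataset
    4-4-2 Different Basic Augmentation Methods Combined with DCGAN on the AMD Dataset
  4-5 Experiment 4 (Extremely Small Sample Sizes)
5 Conclusion
  5-1 Conclusions and Contributions
  5-2 Research Limitations
  5-3 Future Work
References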

