
Graduate student: 楊竣憲
Chun-Hsien Yang
Thesis title: 應用生成對抗網路於資料擴增之Android惡意程式分析研究
Using Generative Adversarial Networks for Data Augmentation in Android Malware Detection
Advisor: 陳奕明
Yi-Ming Chen
Oral defense committee:
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2020
Academic year of graduation: 108
Language: Chinese
Number of pages: 63
Keywords (Chinese): 生成對抗網路, 資料擴增, 深度學習, Android
Keywords (English): GAN, Data augmentation, Deep learning, Android
  • As malicious attack techniques keep evolving and new malware emerges continually, malware datasets often suffer from class imbalance, so a classifier cannot learn the latent malicious features of some categories from sufficient data during training. This study applies generative adversarial networks (GANs) to Android malware analysis. A GAN is a deep learning architecture that is trained on images and generates new data, and it has been widely used for data augmentation in other machine vision and image recognition research. In this thesis, features of Android programs are converted into image representations, and samples for malware families with few instances are generated with this method to balance and expand the original dataset. The study also compares other traditional data augmentation techniques to examine whether they help identify samples of the rare malicious categories. Tests confirm that both traditional image augmentation methods and GANs improve classification accuracy, but GANs more effectively improve the classifier's detection of malware families whose accuracy was originally low because of their small sample counts. Experimental results on two datasets, Drebin with 4,000 samples and AMD with 20,000 samples, show that for categories with few samples the accuracy after GAN augmentation differs from the accuracy before augmentation by 5% to 20%.
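
    As a rough illustration of the two ideas in the abstract (turning program features into images, and topping up rare malware families until the dataset is balanced), the following Python sketch may help. It is not the thesis's implementation: the helper names `bytes_to_image` and `samples_needed`, the fixed image width, the zero padding, and the choice to balance every family up to the size of the largest one are all assumptions made for illustration. The GAN that would actually generate the extra images is omitted.

```python
from collections import Counter
from typing import Dict, List, Sequence


def bytes_to_image(data: bytes, width: int) -> List[List[int]]:
    """Reshape a byte sequence into rows of `width` grayscale pixels (0-255).

    Hypothetical helper: the last row is zero-padded so that every row
    has the same length. The thesis may use a different layout.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width          # bytes missing from the last row
    padded = data + bytes(padding)          # bytes(n) yields n zero bytes
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def samples_needed(labels: Sequence[str]) -> Dict[str, int]:
    """Count how many synthetic samples each family needs to match the largest.

    Assumed balancing rule: every family is raised to the size of the
    most frequent family in `labels`.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {family: target - n for family, n in counts.items()}


if __name__ == "__main__":
    # Five feature bytes laid out as a 2-pixel-wide image.
    print(bytes_to_image(b"\x01\x02\x03\x04\x05", width=2))
    # A toy label list with one common and one rare family.
    print(samples_needed(["common", "common", "common", "rare"]))
```

    In a pipeline of this kind, the per-family counts from `samples_needed` would tell the augmentation step how many images to generate for each rare family before the classifier is trained.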

    Chapter 1 Introduction
        1-1 Research Background
        1-2 Research Motivation
        1-3 Research Contributions
        1-4 Chapter Organization
    Chapter 2 Related Work
        2-1 Research on Representing Code as Images
            2-1-1 In Windows Malware Analysis
            2-1-2 In Android Malware Analysis
        2-2 Research on Malware Detection Based on Convolutional Neural Networks
            2-2-1 Using Different Features
            2-2-2 Using Different Convolutional Models
        2-3 Generative Adversarial Networks
        2-4 Research on Data Augmentation in Malware Analysis
        2-5 Summary
    Chapter 3 System Design
        3-1 System Architecture
            3-1-1 Data Preprocessing
            3-1-2 Data Augmentation Module
            3-1-3 Classification Module
            3-1-4 Evaluation Metrics
        3-2 System Training and Usage Workflow
    Chapter 4 Experimental Results
        4-1 Experimental Environment and Datasets
            4-1-1 Experimental Environment
            4-1-2 Datasets
        4-2 Experimental Design
            4-2-1 Experiment 1
            4-2-2 Experiment 2
            4-2-3 Experiment 3
            4-2-4 Experiment 4
            4-2-5 Experiment 5
            4-2-6 Experiment 6
            4-2-7 Experiment 7
        4-3 Experimental Results and Discussion
    Chapter 5 Conclusions and Future Research
        5-1 Conclusions and Contributions
        5-2 Research Limitations
        5-3 Future Research
    References

