| Graduate student: | 楊竣憲 Chun-Hsien Yang |
|---|---|
| Thesis title: | Using Generative Adversarial Networks for Data Augmentation in Android Malware Detection |
| Advisor: | 陳奕明 Yi-Ming Chen |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar) |
| Language: | Chinese |
| Pages: | 63 |
| Keywords (Chinese): | Generative adversarial networks, data augmentation, deep learning, Android |
| Keywords (English): | GAN, Data augmentation, Deep learning, Android |
As attack techniques keep evolving and new malware keeps appearing, malware datasets often suffer from class imbalance, so a classifier cannot learn the latent malicious features of some classes because it sees too few training samples for them. This study applies Generative Adversarial Networks (GANs) to Android malware analysis. A GAN is a deep learning architecture commonly used to learn from and generate image data, and it has been widely used for data augmentation in other computer vision and image recognition research. In this thesis, the features of Android programs are converted into image representations, and additional samples of the under-represented malware families are generated with a GAN to balance and enlarge the original dataset. The study also compares traditional data augmentation techniques to examine whether they help identify malware classes with few samples. Experiments confirm that both traditional image augmentation and GANs improve classification accuracy, but GANs are more effective at helping the classifier detect the malware families whose accuracy was originally low because they had few samples. On two different datasets, Drebin (4,000 samples) and AMD (20,000 samples), augmenting the sparse classes with a GAN raised their accuracy by 5% to 20% compared with the results before augmentation.
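The augmentation pipeline described in the abstract (turn each app into an image, then top up the rare families) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say which program features are imaged, so the sketch assumes a simple byte-to-pixel mapping (one byte becomes one grayscale pixel) with a fixed, assumed side length of 64. `shift_augment` is a hypothetical stand-in for a "traditional" image augmentation, and the `generate` callable is where a trained per-family GAN generator (not shown) would be plugged in. All function and family names are hypothetical.

```python
import numpy as np


def to_fixed_image(data: bytes, side: int = 64) -> np.ndarray:
    """Map raw bytes to a side x side grayscale image (one byte = one pixel).

    Assumption: longer inputs are truncated and shorter ones zero-padded, so
    every sample has the fixed shape a CNN classifier or GAN expects.
    """
    arr = np.frombuffer(data, dtype=np.uint8)[: side * side]
    img = np.zeros(side * side, dtype=np.uint8)
    img[: arr.size] = arr
    return img.reshape(side, side)


def shift_augment(rng: np.random.Generator):
    """Hypothetical traditional augmentation: circularly shift a real sample sideways."""

    def generate(family: str, images: list) -> np.ndarray:
        base = images[rng.integers(len(images))]  # the family must have at least one image
        return np.roll(base, shift=int(rng.integers(1, base.shape[1])), axis=1)

    return generate


def balance_families(images_by_family: dict, target: int, generate) -> dict:
    """Top up every family that has fewer than `target` samples.

    `generate(family, images)` returns one synthetic image. It can be the
    shift augmentation above or, hypothetically, a trained per-family GAN generator.
    """
    balanced = {}
    for family, images in images_by_family.items():
        missing = max(0, target - len(images))
        synthetic = [generate(family, images) for _ in range(missing)]
        balanced[family] = list(images) + synthetic
    return balanced


# Toy usage: random bytes stand in for real app features; "FamilyB" is the rare family.
rng = np.random.default_rng(0)
toy = {
    "FamilyA": [to_fixed_image(rng.integers(0, 256, 5000, dtype=np.uint8).tobytes())
                for _ in range(10)],
    "FamilyB": [to_fixed_image(rng.integers(0, 256, 3000, dtype=np.uint8).tobytes())
                for _ in range(2)],
}
balanced = balance_families(toy, target=10, generate=shift_augment(rng))
print({name: len(imgs) for name, imgs in balanced.items()})  # {'FamilyA': 10, 'FamilyB': 10}
```

Because both strategies share the same `generate(family, images)` interface, comparing traditional augmentation with GAN-based augmentation on the same split only requires swapping the callable; only the rare family receives synthetic samples, while families already at the target are left unchanged.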