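| Graduate student: | 何岸錡 An-Chi He |
|---|---|
| Thesis title: | An Android Malware Detection System Integrating Block Feature Extraction and Multi-head Attention (整合區塊特徵萃取與多頭注意力機制之Android惡意程式偵測系統) |
| Advisor: | 陳奕明 Yi-Ming Chen |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 |
| Language: | Chinese |
| Number of pages: | 70 |
| Keywords (Chinese): | 深度學習 (deep learning), 多頭注意力 (multi-head attention), Transformer, Bi-LSTM, 靜態分析 (static analysis) |
| Keywords (English): | Deep learning, Multi-head Attention, Transformer, Bi-LSTM, Static analysis |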
With the rapid development of deep learning, mobile malware detection has made breakthrough progress. However, sequence-based deep learning models still suffer from vanishing gradients on long input sequences because of the memory limitations of recurrent neural networks. Many studies have therefore proposed feature compression and extraction methods for long sequences, but to our knowledge none compresses a sequence while still covering the complete feature information of the original sequence and the temporal relationships of its semantics. This study proposes a multi-model malware detection architecture that covers the global features while preserving part of the temporal relationships among the compressed features, and that integrates a Multi-head Attention mechanism to mitigate the memory problem of recurrent neural networks.

The model runs in two stages. In the pre-processing stage, Android low-level operation codes (Dalvik Opcode) are segmented and counted, and the result is fed into a Bi-LSTM for semantic extraction. This stage compresses the original Opcode sequence into a sequence of semantic blocks that is rich in temporal information and serves as the classification feature for the downstream classifier. In the classification stage, the study modifies the Transformer model: the Multi-head Attention mechanism attends efficiently to the block sequence features, and a Global Pooling Layer is then added to strengthen the model's sensitivity to the block features and to reduce dimensionality, which lessens over-fitting.

Experimental results show a detection accuracy of 99.30% for multi-family classification, and performance on binary classification and small-sample classification improves significantly over existing studies. The study also conducts several ablation tests to confirm the importance of each model in the overall architecture.
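The two-stage idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the thesis implementation: the fixed block size, the normalised per-block opcode histogram, the random untrained projection matrices, and every function name below are assumptions made for illustration, and the Bi-LSTM semantic extraction step is omitted entirely. The sketch only shows how a long opcode sequence might be compressed into a short block sequence, attended over with multi-head self-attention, and reduced with global average pooling.

```python
import math
import random


def block_histograms(opcodes, block_size, vocab_size):
    """Split an opcode-ID sequence into fixed-size blocks and return one
    normalised opcode-frequency histogram per block (hypothetical scheme)."""
    blocks = []
    for start in range(0, len(opcodes), block_size):
        chunk = opcodes[start:start + block_size]
        hist = [0.0] * vocab_size
        for op in chunk:
            hist[op] += 1.0
        # Divide by the chunk length so a short final block stays comparable.
        blocks.append([count / len(chunk) for count in hist])
    return blocks


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def multi_head_self_attention(x, num_heads, rng):
    """Scaled dot-product self-attention over the rows of x.
    Projections are random and untrained; a real model would learn them."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def rand_matrix(rows, cols):
        scale = 1.0 / math.sqrt(rows)
        return [[rng.gauss(0.0, scale) for _ in range(cols)]
                for _ in range(rows)]

    head_outputs = []
    for _ in range(num_heads):
        q = matmul(x, rand_matrix(d_model, d_head))
        k = matmul(x, rand_matrix(d_model, d_head))
        v = matmul(x, rand_matrix(d_model, d_head))
        k_t = [list(col) for col in zip(*k)]  # transpose of k
        scores = matmul(q, k_t)               # (n x n) block-to-block scores
        weights = [softmax([s / math.sqrt(d_head) for s in row])
                   for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads along the feature axis, then mix them.
    concat = [sum((head[i] for head in head_outputs), [])
              for i in range(len(x))]
    return matmul(concat, rand_matrix(d_model, d_model))


def global_average_pool(x):
    """Average over the sequence axis, giving one fixed-length vector."""
    return [sum(col) / len(x) for col in zip(*x)]


# Toy opcode-ID sequence with a vocabulary of 4 made-up opcode IDs.
opcodes = [0, 1, 1, 2, 3, 3, 3, 0, 2, 1]
blocks = block_histograms(opcodes, block_size=4, vocab_size=4)
attended = multi_head_self_attention(blocks, num_heads=2, rng=random.Random(0))
features = global_average_pool(attended)
print(len(blocks), len(features))
```

In this toy run the ten opcodes collapse into three block vectors, and pooling yields a single vector whose length does not depend on the sequence length, which is what lets a downstream classifier accept applications of any size.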