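Student: An-Chi He (何岸錡)
Thesis title: An Android Malware Detection System Integrating Block Feature Extraction and Multi-head Attention (整合區塊特徵萃取與多頭注意力機制之Android惡意程式偵測系統)
Advisor: Yi-Ming Chen (陳奕明)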
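Oral defense committee:
Degree: Master
Department: College of Management, Department of Information Management
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, i.e. 2019/2020)
Language: Chinese
Pages: 70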
Keywords: Deep learning, Multi-head Attention, Transformer, Bi-LSTM, Static analysis
    With the rapid development of deep learning, mobile malware detection has made breakthrough progress. However, sequence-based deep learning models still suffer from vanishing gradients on long input sequences because of the limited memory of recurrent neural networks (RNNs). Later studies proposed feature compression and extraction methods for long sequences, but no study has yet been found that compresses a sequence while still covering the complete feature information of the original sequence and its semantic temporal relationships. This study therefore proposes a multi-model malware detection architecture that covers the global features while preserving part of the temporal order among the compressed features, and that integrates a Multi-head Attention mechanism to mitigate the memory problem of RNNs. The model runs in two stages. In the preprocessing stage, Android's low-level Dalvik opcodes are segmented into blocks and counted, and the result is fed to a Bi-LSTM for semantic extraction; this compresses the original opcode sequence into a sequence of semantic blocks that carries temporal meaning and serves as the input features of the downstream classifier. In the classification stage, a modified Transformer model uses Multi-head Attention to attend efficiently to the block-sequence features, followed by a Global Pooling Layer that increases the model's sensitivity to the block features and reduces dimensionality to lessen overfitting. Experimental results show a detection accuracy of 99.30% on multi-family classification, and both binary and small-sample classification performance improve significantly over existing studies.
In addition, multiple ablation tests were conducted to confirm the importance of each model in the overall architecture.
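The abstract describes the preprocessing stage only as segmenting and counting Dalvik opcodes. A minimal sketch, assuming fixed-size, non-overlapping blocks and a per-block opcode frequency vector over a fixed opcode vocabulary (the thesis may use a different block length or statistic), could look like this; `block_histograms`, `block_size`, and the four-opcode vocabulary are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

def block_histograms(opcodes: Sequence[str], vocab: Sequence[str],
                     block_size: int) -> List[List[int]]:
    """Split an opcode sequence into fixed-size blocks and count each opcode per block.

    Hypothetical helper: returns one count vector (ordered like `vocab`) per block.
    A trailing partial block is kept so no opcode is dropped; opcodes outside
    `vocab` are counted by Counter but never read, so they are effectively ignored.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for start in range(0, len(opcodes), block_size):
        counts = Counter(opcodes[start:start + block_size])
        blocks.append([counts[op] for op in vocab])  # Counter returns 0 for absent keys
    return blocks

# Assumed toy vocabulary and opcode sequence (7 opcodes -> 3 blocks of size 3, 3, 1).
vocab = ["const/4", "invoke-virtual", "move-result", "return-void"]
seq = ["const/4", "invoke-virtual", "move-result",
       "invoke-virtual", "move-result", "invoke-virtual",
       "return-void"]
print(block_histograms(seq, vocab, block_size=3))
# [[1, 1, 1, 0], [0, 2, 1, 0], [0, 0, 0, 1]]
```

Under this assumption a sequence of N opcodes shrinks to ceil(N / block_size) vectors, which is the length reduction that makes long apps tractable for the sequence model that follows.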

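In the abstract, the counted blocks are then passed to a Bi-LSTM for semantic extraction. The record does not say whether the Bi-LSTM runs over the sequence of blocks or inside each block; the NumPy sketch below assumes the former, producing one concatenated forward and backward hidden state per block. The weight shapes, the gate order (input, forget, candidate, output), the random untrained weights, and the names `lstm_pass`, `bilstm`, and `init_params` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(xs, W, U, b):
    """One LSTM direction over xs of shape (T, d_in); returns hidden states (T, d_h).

    W: (d_in, 4*d_h), U: (d_h, 4*d_h), b: (4*d_h,); gates are assumed to be
    laid out as input, forget, candidate, output.
    """
    d_h = U.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    states = []
    for x in xs:
        z = x @ W + h @ U + b
        i = sigmoid(z[:d_h])
        f = sigmoid(z[d_h:2 * d_h])
        g = np.tanh(z[2 * d_h:3 * d_h])
        o = sigmoid(z[3 * d_h:])
        c = f * c + i * g          # additive cell-state update
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states) if states else np.zeros((0, d_h))

def bilstm(xs, fwd, bwd):
    """Concatenate forward states with backward states re-aligned to the original order."""
    h_fwd = lstm_pass(xs, *fwd)
    h_bwd = lstm_pass(xs[::-1], *bwd)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_params(rng, d_in, d_h):
    """Random, untrained weights (hypothetical), for shape illustration only."""
    s = 1.0 / np.sqrt(d_h)
    return (rng.normal(0.0, s, (d_in, 4 * d_h)),
            rng.normal(0.0, s, (d_h, 4 * d_h)),
            np.zeros(4 * d_h))

rng = np.random.default_rng(0)
n_blocks, vocab_size, d_h = 5, 8, 16
blocks = rng.poisson(2.0, size=(n_blocks, vocab_size)).astype(float)  # stand-in block count vectors
fwd, bwd = init_params(rng, vocab_size, d_h), init_params(rng, vocab_size, d_h)
semantic_blocks = bilstm(blocks, fwd, bwd)
print(semantic_blocks.shape)  # (5, 32): one 2*d_h vector per block
```

The output keeps one vector per block, so the block order produced by segmentation is preserved, and each vector now also reflects its neighbours in both directions.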
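    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1-1 Research Motivation
      1-2 Research Contributions
      1-3 Thesis Organization
    Chapter 2 Related Work
      2-1 Studies on Static Feature Analysis of Dalvik Opcodes
      2-2 Related Work on RNN-based Deep Learning Models
      2-3 Related Work on the Transformer Multi-head Attention Mechanism
      2-4 Summary
    Chapter 3 System Architecture
      3-1 System Architecture
        3-1-1 Decompile Module
        3-1-2 Feature Vectorization Module
        3-1-3 Semantic Extraction Module
        3-1-4 Attention-based Classification Module
      3-2 System Workflow
    Chapter 4 Experimental Results
      4-1 Experimental Environment
      4-2 Comparison of Sequence Compression Methods
        4-2-1 Experiment 1: Binary Malware Classification
        4-2-2 Experiment 2: Multi-class Malware Classification
        4-2-3 Experiment 3: Small-sample Malware Classification
        4-2-4 Experiment 4: Comparison of Image-based Compression Methods
      4-3 Ablation Tests of Each Module in the Multi-model Architecture
        4-3-1 Experiment 5: Block Compression Method
        4-3-2 Experiment 6: Semantic Extraction Module
        4-3-3 Experiment 7: Global Pooling
        4-3-4 Experiment 8: Gradient Vanishing Test
      4-4 Results and Discussion
    Chapter 5 Conclusions and Future Work
      5-1 Conclusions and Contributions
      5-2 Research Limitations and Future Research
    References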
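For the classification stage (the Attention-based Classification Module, with the Global Pooling layer examined in Experiment 7), the record says only that a modified Transformer applies Multi-head Attention followed by a global pooling layer. The sketch below shows the standard scaled dot-product multi-head self-attention of Vaswani et al. (2017), a residual connection, global average pooling over the block axis, and a softmax output. The thesis's specific modifications, layer normalization, the feed-forward sublayer, and whether pooling is average or max are not described in the record and are not reproduced; all names and sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention over X of shape (T, d_model)."""
    T, d_model = X.shape
    if d_model % n_heads:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads

    def split(M):  # (T, d_model) -> (n_heads, T, d_head)
        return M.reshape(T, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)    # (n_heads, T, T)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ V                                    # (n_heads, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)  # re-join the heads
    return concat @ Wo, weights

def classify(X, params, n_heads):
    """Attention + residual, global average pooling over blocks, then softmax."""
    attn_out, weights = multi_head_self_attention(X, *params["attn"], n_heads=n_heads)
    H = X + attn_out                 # residual connection
    pooled = H.mean(axis=0)          # global average pooling: (T, d_model) -> (d_model,)
    logits = pooled @ params["W_cls"] + params["b_cls"]
    return softmax(logits), weights

rng = np.random.default_rng(1)
T, d_model, n_heads, n_classes = 5, 32, 4, 3
X = rng.normal(size=(T, d_model))    # stand-in for a sequence of semantic block vectors
std = d_model ** -0.5
params = {
    "attn": tuple(rng.normal(0.0, std, (d_model, d_model)) for _ in range(4)),  # Wq, Wk, Wv, Wo
    "W_cls": rng.normal(0.0, std, (d_model, n_classes)),
    "b_cls": np.zeros(n_classes),
}
probs, weights = classify(X, params, n_heads)
print(probs.shape, weights.shape)    # (3,) (4, 5, 5)
```

Pooling before the output layer keeps the classifier size independent of the number of blocks: with these sizes, flattening the 5 × 32 block matrix would need 5 × 32 × 3 = 480 output weights, while the pooled 32-dimensional vector needs 32 × 3 = 96. This is consistent with the abstract's point that global pooling reduces dimensionality and overfitting.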

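The abstract attributes the difficulty of long opcode sequences to vanishing gradients in recurrent networks (the thesis also lists Experiment 8, a gradient vanishing test, whose setup the record does not describe). The generic illustration below uses a plain tanh RNN, not the thesis's models, and tracks the Jacobian dh_t/dh_0 through the chain rule, where each step contributes diag(1 - h_t^2) W_hh. Because tanh' ≤ 1, the spectral norm of this product is at most ||W_hh||^t; with the assumed ||W_hh|| = 0.5, the gradient reaching the first step after 50 steps is at most 0.5^50 ≈ 8.9 × 10^-16. The sizes, seed, and the name `rnn_gradient_norms` are hypothetical.

```python
import numpy as np

def rnn_gradient_norms(W_hh, W_xh, xs):
    """For h_t = tanh(W_hh h_{t-1} + W_xh x_t), return ||dh_t/dh_0||_2 for t = 1..T.

    Chain rule: dh_t/dh_0 = diag(1 - h_t**2) @ W_hh @ dh_{t-1}/dh_0.
    """
    d = W_hh.shape[0]
    h = np.zeros(d)
    J = np.eye(d)                    # dh_0/dh_0
    norms = []
    for x in xs:
        h = np.tanh(W_hh @ h + W_xh @ x)
        J = np.diag(1.0 - h ** 2) @ W_hh @ J
        norms.append(np.linalg.norm(J, 2))   # largest singular value
    return norms

rng = np.random.default_rng(0)
d, d_in, T = 16, 8, 50
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
W_hh = 0.5 * Q                                # spectral norm 0.5 (assumed)
W_xh = rng.normal(0.0, 0.3, (d, d_in))
xs = rng.normal(size=(T, d_in))
norms = rnn_gradient_norms(W_hh, W_xh, xs)
for t in (1, 10, 50):
    print(t, norms[t - 1])
```

An LSTM's additive cell update softens this decay, and self-attention links any two positions directly rather than through t chained Jacobians, which is the motivation the abstract gives for adding Multi-head Attention.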
