A Research of Android Anti-Obfuscated Malware Detection Combined with Function Call Graph Semantic Feature and Domain Adaptation

Graduate student:	楊蕙瑄 (Hui-Hsuan Yang)
Thesis title:	A Research of Android Anti-Obfuscated Malware Detection Combined with Function Call Graph Semantic Feature and Domain Adaptation (original title: 結合函式呼叫圖語意特徵及域適應技術之Android 抗混淆惡意軟體檢測模型研究)
Advisor:	陳奕明 (Yi-Ming Chen)
Oral defense committee:
Degree:	Master
Department:	College of Management - Department of Information Management
Year of publication:	2023
Academic year of graduation:	111 (ROC calendar; 2022–2023)
Language:	Chinese
Number of pages:	100
Keywords:	obfuscation attack, deep learning, transfer learning, Android malware detection, static analysis

Artificial intelligence (AI) techniques have been widely applied to Android malware detection in recent years. Malware developers, however, use various methods to evade detection. A common one is the obfuscation attack, which alters the structure of an APK so that the features extracted by the detection system change and the model misclassifies the sample. According to prior research, a malware detection model that originally reaches 97.7% accuracy falls to roughly 50% when it receives samples obfuscated with API Call Obfuscation. This study addresses obfuscation from two sides: the features and the model. On the feature side, although obfuscation changes an APK's features, the APK must still exhibit the same behavior as before obfuscation, so if feature preprocessing can capture that behavior, the impact of obfuscation on the detection system is reduced. This study therefore uses the Function Call Graph (FCG) as the feature basis and applies Node Embedding to learn the semantic information expressed between nodes, thereby modeling the software's behavior. On the model side, even though Node Embedding can learn an APK's semantic information, some advanced obfuscation techniques modify the code so that different semantics express the same behavior. This study therefore trains the model with Domain Adaptation, a Transfer Learning technique, so that the model pulls the pre-obfuscation and post-obfuscation datasets closer together in feature space and can classify obfuscated samples, achieving obfuscation resilience. The proposed detection system reaches a detection accuracy of 0.9888 on unobfuscated data and maintains an average detection accuracy of 0.9672 under multiple obfuscation techniques. In particular, Domain Adaptation raises the detection accuracy on samples affected by CallIndirection obfuscation from 87% to 95%.
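
The idea of pulling the clean and obfuscated datasets closer together in feature space can be illustrated with a small sketch. This record does not state which distance or loss the thesis uses, so the example below assumes a linear-kernel Maximum Mean Discrepancy (MMD), one common choice in domain adaptation. The function names (`mean_vector`, `linear_mmd2`, `total_loss`), the `weight` parameter, and the toy embeddings are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(rows: List[Vector]) -> List[float]:
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    if not rows:
        raise ValueError("need at least one feature vector")
    dim = len(rows[0])
    if any(len(row) != dim for row in rows):
        raise ValueError("all feature vectors must have the same length")
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def linear_mmd2(source: List[Vector], target: List[Vector]) -> float:
    """Squared linear-kernel MMD: squared distance between the two domain means.

    Assumption: `source` holds features of unobfuscated APKs and `target`
    holds features of obfuscated APKs. The value is 0 when the means match.
    """
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def total_loss(classification_loss: float,
               source: List[Vector],
               target: List[Vector],
               weight: float = 1.0) -> float:
    """Hypothetical training objective: classification loss plus a weighted
    domain-distance penalty, so minimizing it also aligns the two domains."""
    return classification_loss + weight * linear_mmd2(source, target)


# Made-up 2-dimensional embeddings, for illustration only.
clean = [[1.0, 0.0], [3.0, 2.0]]        # mean is [2.0, 1.0]
obfuscated = [[2.0, 2.0], [4.0, 4.0]]   # mean is [3.0, 3.0]

distance = linear_mmd2(clean, obfuscated)               # (2-3)^2 + (1-3)^2
loss = total_loss(0.5, clean, obfuscated, weight=0.1)   # 0.5 + 0.1 * distance
```

In an actual model the penalty would be computed on a hidden layer's activations for each mini-batch and minimized jointly with the classification loss. A kernelized MMD or an adversarial domain classifier are alternative ways to measure the same gap.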

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
I. Introduction
1 Research Background
2 Research Motivation and Objectives
3 Research Contributions
4 Thesis Organization
II. Related Work
1 Android Malware Obfuscation
1.1 Android Malware Detection
1.2 Android Malware Attacks
1.3 Obfuscation Techniques Applied to Android Malware
2 Obfuscation-Resilient Detection Systems
2.1 Feature-Level Obfuscation-Resilient Detection Systems
2.2 Model-Level Obfuscation-Resilient Detection Systems
3 Node Representation in Graph Neural Networks
3.1 LINE (Large-scale Information Network Embedding)
3.2 SDNE (Structural Deep Network Embedding)
3.3 Node2Vec
3.4 Struc2Vec
4 Domain Adaptation
III. Methodology
1 System Architecture
2 Feature Preprocessing
2.1 Extracting the FCG
2.2 Obtaining Sensitive Subgraphs
2.3 Node Representation Learning
2.4 Adjusting Vector Size
3 Model Training and Testing
4 Domain Adaptation Model Training and Testing
4.1 Obfuscated Datasets
4.2 Domain Adaptation Model Training
4.3 Testing on Obfuscated Datasets
IV. Experiments and Evaluation
1 Experimental Environment
1.1 Hardware and Software Setup
1.2 Experimental Datasets
2 Evaluation Metrics
3 Research Questions
4 Experimental Design and Results
4.1 Experiment 1
4.2 Experiment 2
4.3 Experiment 3
4.4 Experiment 4
4.5 Experiment 5
4.6 Experiment 6
V. Conclusion and Future Work
1 Summary
2 Limitations
3 Future Work
References

                                

