| Graduate student: | 林聖富 Sheng-Fu Lin |
|---|---|
| Thesis title: | 基於二階段分類器之惡意流量偵測 Two-stage Classifier For Malicious Traffic Detection |
| Advisor: | 陳奕明 Yi-Ming Chen |
| Committee members: | |
| Degree: | Master |
| Department: | College of Management - Executive Master of Information Management |
| Year of publication: | 2023 |
| Academic year of graduation: | 111 |
| Language: | Chinese |
| Number of pages: | 90 |
| Chinese keywords: | 網路入侵偵測、資訊獲利、極限隨機樹、ADASYN、TomekLinks |
| English keywords: | NIDS, Information Gain, Extra Trees, ADASYN, TomekLinks |
As the Internet of Things (IoT) develops rapidly, we face a growing number of information security threats. To counter these threats effectively, machine learning has been widely applied to network intrusion detection (NIDS). However, the massive volume of intrusion detection data often suffers from data imbalance and feature redundancy, which make classifiers prone to overfitting during training and in turn degrade model performance and accuracy. This study proposes a novel two-stage classifier model that supports both binary and multi-class classification. The model combines machine learning and ensemble learning methods with feature selection and data balancing techniques to handle large-scale network traffic. In the first stage, we compared the accuracy and time efficiency of six machine learning methods and ultimately selected the Decision Tree as the classifier for distinguishing normal data from attack data. In the second stage, the data predicted as attacks in the first stage are classified into attack categories: Information Gain is used to select important features, and three ensemble learning methods and two data balancing methods are compared. Experimental results indicate that Extra Trees combined with ADASYN+TomekLinks delivers high model performance and time efficiency and outperforms the SMOTE balancing method. The two-stage classifier model was validated on two different datasets, CIC-IDS2017 and UNSW-NB15, reaching F1-Scores of 99.65% and 79.70% and total training times of 171.47 seconds and 11.95 seconds, respectively. Compared with other studies, the proposed model performs better in both detection performance and time efficiency.
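The second stage relies on Information Gain to rank features before classification. The abstract does not say how the thesis computes it, so the following is only a minimal sketch of the standard entropy-based definition, assuming features that have already been discretized. The function names, the toy flows, and the feature names `dst_port_bucket` and `protocol` are hypothetical and do not come from the thesis or from either dataset.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their weighted entropy within each feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy flows: attack labels and two already-discretized features.
labels = ["DoS", "DoS", "PortScan", "PortScan"]
columns = {
    "dst_port_bucket": ["web", "web", "other", "other"],  # separates the classes perfectly
    "protocol": ["tcp", "udp", "tcp", "udp"],              # carries no class information
}
ranking = rank_features(columns, labels)
```

In this toy case `dst_port_bucket` ranks first because knowing its value removes all uncertainty about the attack class, while `protocol` removes none. Continuous flow features, such as those in CIC-IDS2017 or UNSW-NB15, would need to be binned or handled with an estimator designed for continuous values before a ranking like this applies.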