| Author: | 林詠麒 Yong-Chi Lin |
|---|---|
| Thesis Title: | 基於輕量化特徵選擇與樹模型之網路惡意流量偵測設計與分析 (Design and Analysis of Network Malicious Traffic Detection Based on Lightweight Feature Selection and Tree-Based Models) |
| Advisor: | 陳永芳 Yung-Fang Chen |
| Committee Members: | |
| Degree: | Master (碩士) |
| Department: | Executive Master Program of Communication Engineering, College of Information and Electrical Engineering |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 |
| Language: | Chinese |
| Pages: | 92 |
| Chinese Keywords: | 惡意流量偵測、輕量梯度提升、決策樹、隨機森林、極限梯度提升 |
| English Keywords: | Malicious Traffic Detection, LightGBM, Decision Tree, Random Forest, XGBoost |
| Views / Downloads: | 27 / 0 |
This study explores the application of machine learning methods to network malicious traffic detection, with the goal of designing a detection model that combines recognition performance with testing efficiency. The experiments use two public datasets, UNSW-NB15 and CSE-CIC-IDS2018, which cover a wide range of real-world network attack scenarios, from basic reconnaissance attacks to complex system exploits. Before model construction, each dataset was preprocessed appropriately, including data cleaning, handling of duplicate and missing values, and type conversion. After preprocessing, an embedded feature selection method based on LightGBM was applied to screen the key features, and a two-layer tree-model architecture was then constructed, combining Decision Tree, Random Forest, XGBoost, and LightGBM respectively, to strengthen the models' ability to recognize malicious traffic and to generalize. To evaluate model performance, this study used multiple metrics for quantitative analysis. The experimental results show that, under identical feature-selection conditions, LightGBM achieved the highest overall accuracy and F1-score on both datasets, along with the shortest per-record testing time among the ensemble models, making it the best performer in these experiments. Random Forest scored slightly below LightGBM on all metrics on both datasets, with a slightly longer testing time; XGBoost offered high recall in malicious traffic detection with moderate testing time; and although the single Decision Tree was the fastest in testing, its classification accuracy was clearly lower than that of the preceding ensemble models. This study verifies that combining LightGBM feature screening with tree-based models can effectively improve both the performance and the efficiency of malicious traffic identification, and that the models adapt well to different datasets, demonstrating practical feasibility and application potential.
This study investigates the application of machine learning in malicious traffic detection, aiming to design a model that achieves both high performance and efficiency. Experiments were conducted on the UNSW-NB15 and CSE-CIC-IDS2018 datasets, which include various real-world attack scenarios. After preprocessing, LightGBM's embedded method was used for feature selection. Based on the selected features, four models (Decision Tree, Random Forest, XGBoost, and LightGBM) were individually trained and compared. Results show that LightGBM achieved the best accuracy and F1-score, along with the fastest testing speed among the ensemble models, making it the best-performing model in this study. Random Forest scored slightly below LightGBM on all metrics with a slightly longer testing time; XGBoost detected malicious flows with high recall and moderate testing time; and Decision Tree, though fastest, was noticeably less accurate. Overall, the proposed method demonstrates high detection effectiveness, efficiency, and adaptability, indicating strong potential for real-world deployment.
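The pipeline described in the abstract — embedded feature selection with a gradient-boosted tree's importances, followed by training and comparing tree-based classifiers on the retained features — can be sketched as below. This is a minimal illustration, not the thesis's implementation: scikit-learn's `GradientBoostingClassifier` stands in for LightGBM (in practice `lightgbm.LGBMClassifier` would be used), and a synthetic imbalanced dataset stands in for UNSW-NB15/CSE-CIC-IDS2018.

```python
# Sketch of: embedded feature selection via boosted-tree importances,
# then training/comparing tree models on the selected features.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a preprocessed intrusion-detection dataset
# (imbalanced: ~20% "malicious" class).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Embedded feature selection: keep features whose importance in the
# boosted model is at least the median importance (roughly the top half).
selector = SelectFromModel(GradientBoostingClassifier(random_state=0),
                           threshold="median").fit(X_tr, y_tr)
X_tr_s, X_te_s = selector.transform(X_tr), selector.transform(X_te)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr_s, y_tr)
    t0 = time.perf_counter()
    pred = model.predict(X_te_s)
    per_record_us = (time.perf_counter() - t0) / len(X_te_s) * 1e6
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"f1={f1_score(y_te, pred):.3f} test={per_record_us:.1f}us/record")
```

The per-record timing mirrors the thesis's efficiency comparison: dividing total prediction time by the number of test records makes models with different batch costs comparable.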
[1] Verizon. (2023). 2023 Data Breach Investigations Report.
[2] Mohurle, S., & Patil, M. (2017). A brief study of WannaCry threat: Ransomware attack 2017. International Journal of Advanced Research in Computer Science, 8(5), 1938–1940.
[3] Samuel, A. L. (1959). Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, 3(3), 210-229.
[4] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.
[5] Bellman, R. (1957). Dynamic programming. Princeton University Press.
[6] Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16–28.
[7] Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.
[8] Moustafa, N., & Slay, J. (2015). UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS). IEEE.
[9] Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP).
[10] Bouaguel, W. (2015). On feature selection methods for credit scoring.
[11] Venkatesh, B., & Anuradha, J. (2019). A review of feature selection and its methods.
[12] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
[13] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
[14] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
[15] Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
[16] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T. Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30, 3146–3154.
[17] Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
[18] Han, L., Wang, J., & Li, T. (2020). An improved XGBoost-based network intrusion detection system using feature selection. International Journal of Network Security & Its Applications, 12(3), 31–47.
[19] Zhang, H., Zhao, X., & Liu, J. (2021). Enhancing precision in intrusion detection using random forest and ensemble learning. Cybersecurity and Data Privacy, 8(1), 72–88.
[20] Sharma, R., Patel, B., & Singh, M. (2022). Deep learning-based intrusion detection systems: A recall-driven approach. Applied Artificial Intelligence, 36(5), 299–317.
[21] Hsu, C.-Y., Lin, Y.-T., & Cheng, W. (2023). Evaluating intrusion detection systems using F1-score and hybrid feature selection. IEEE Transactions on Information Forensics and Security, 18, 1124–1135.
[22] Ali, M., Khan, S., & Raza, M. (2023). A hybrid machine learning model for anomaly-based intrusion detection systems. Journal of Cyber Security and Intelligence, 15(2), 45–61.
[23] 郭芳瑜 (2023). Prediction of network intrusion attacks using artificial intelligence methods [Master's thesis, National Chung Hsing University]. National Digital Library of Theses and Dissertations in Taiwan.
[24] 林詩穎 (2024). An intrusion detection system based on machine learning [Master's thesis, Da-Yeh University]. National Digital Library of Theses and Dissertations in Taiwan.
[25] 郭伊陽 (2022). An intrusion detection system applying artificial intelligence: Feature learning as a case study [Master's thesis, Chung Hua University]. National Digital Library of Theses and Dissertations in Taiwan.
[26] Scikit-learn developers. (n.d.). LabelEncoder. Scikit-learn. Retrieved April 3, 2025, from https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
[27] He, H., & Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284.
[28] Pearson, K. (1904). Mathematical contributions to the theory of evolution. Dulau and Co.
[29] Google. (n.d.). Google Colaboratory. Retrieved April 3, 2025, from https://colab.research.google.com/
[30] Leevy, J. L., Hancock, J., Zuech, R., & Khoshgoftaar, T. M. (2021). Detecting cybersecurity attacks across different network features and learners. Journal of Big Data, 8(1), 38.