Graduate student: 林詠麒
Yong-Chi Lin
Thesis title: 基於輕量化特徵選擇與樹模型之網路惡意流量偵測設計與分析
Design and Analysis of Network Malicious Traffic Detection Based on Lightweight Feature Selection and Tree-Based Models
Advisor: 陳永芳
Yung-Fang Chen
Oral examination committee:
Degree: Master
Department: College of Information and Electrical Engineering, Executive Master of Communication Engineering
Year of publication: 2025
Academic year of graduation: 113
Language: Chinese
Number of pages: 92
Keywords (Chinese): 惡意流量偵測, 輕量梯度提升, 決策樹, 隨機森林, 極限梯度提升
Keywords (English): Malicious Traffic Detection, LightGBM, Decision Tree, Random Forest, XGBoost
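  • This study examines the application of machine learning methods to network malicious traffic detection, with the goal of designing a detection model that combines recognition performance with testing efficiency. The experiments use two public datasets, UNSW-NB15 and CSE-CIC-IDS2018, which cover a range of real-world network attack scenarios, from basic reconnaissance attacks to complex system vulnerability exploits. Before model construction, each dataset is preprocessed separately, including data cleaning, handling of duplicate and missing values, and type conversion. After preprocessing, an embedded feature selection method based on LightGBM is used to select key features, and a two-layer tree-model architecture is then built, combined in turn with Decision Tree, Random Forest, XGBoost, and LightGBM, to strengthen the model's ability to recognize malicious traffic and its generalization. Model performance is quantified with multiple metrics. The experimental results show that, under the same feature selection conditions, LightGBM achieves the highest overall accuracy and F1-score on both datasets and also has the shortest per-record test time of all models, making it the best performer in these experiments; Random Forest scores slightly below LightGBM on every metric on both datasets, with a slightly longer test time. XGBoost offers high recall on malicious traffic detection and a moderate test time, while the single Decision Tree, although fastest in testing, has clearly lower classification accuracy than the ensemble models above. The study verifies that combining LightGBM feature selection with tree models can effectively improve both the performance and the efficiency of malicious traffic identification, and that the model adapts well to different datasets, which indicates practical feasibility and application potential.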
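    The metrics named in the abstract (accuracy, recall, F1-score, and per-record test time) can be illustrated with a minimal Python sketch. It is not the thesis's evaluation code: the function names `binary_metrics` and `per_record_test_time`, the convention that label 1 means malicious traffic, and the toy threshold classifier are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from binary labels.

    Assumption: label 1 marks malicious traffic, label 0 marks benign traffic.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Confusion-matrix cells for the binary case.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def per_record_test_time(
    predict: Callable[[Sequence[float]], List[int]],
    records: Sequence[float],
) -> Tuple[List[int], float]:
    """Run `predict` once on all records and return (predictions, seconds per record)."""
    if len(records) == 0:
        raise ValueError("records must be non-empty")
    start = time.perf_counter()
    predictions = predict(records)
    elapsed = time.perf_counter() - start
    return predictions, elapsed / len(records)


if __name__ == "__main__":
    # Hypothetical stand-in for a trained model: flag scores above 0.5 as malicious.
    def toy_predict(scores: Sequence[float]) -> List[int]:
        return [1 if s > 0.5 else 0 for s in scores]

    scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.6, 0.4]
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    preds, seconds_per_record = per_record_test_time(toy_predict, scores)
    print(binary_metrics(labels, preds), seconds_per_record)
```

    Timing the whole batch and dividing by the number of records is one common way to report a per-record figure; the thesis may measure it differently.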


    This study investigates the application of machine learning to malicious traffic detection, aiming to design a model that achieves both high detection performance and high efficiency. Experiments were conducted on the UNSW-NB15 and CSE-CIC-IDS2018 datasets, which include various real-world attack scenarios. After preprocessing, LightGBM's embedded method was used for feature selection. Based on the selected features, four models (Decision Tree, Random Forest, XGBoost, and LightGBM) were individually trained and compared. Results show that LightGBM achieved the best accuracy, F1-score, and testing speed, making it the best-performing model in this study. Random Forest performed consistently with high recall; XGBoost showed strong malicious flow detection with moderate test time; and Decision Tree was fastest but less accurate. Overall, the proposed method demonstrates high detection effectiveness, efficiency, and adaptability, indicating strong potential for real-world deployment.
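    The embedded feature selection step can be sketched in plain Python. The abstract does not state how features were ranked or how many were kept, so the cumulative-coverage rule, the `select_features` name, and the importance values below are assumptions. In practice the scores would come from a trained gradient-boosting model, for example the `feature_importances_` attribute of a fitted `lightgbm.LGBMClassifier`.

```python
from typing import Dict, List


def select_features(importances: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Keep the highest-scoring features until they cover `coverage` of total importance.

    Assumption: this cumulative-coverage rule is illustrative only; the thesis
    abstract does not specify its selection criterion.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")

    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

    selected: List[str] = []
    running = 0.0
    for name, score in ranked:
        if running >= coverage * total:
            break  # The features kept so far already reach the target share.
        selected.append(name)
        running += score
    return selected


if __name__ == "__main__":
    # Hypothetical importance scores; the names and values are made up.
    scores = {
        "feature_a": 50,
        "feature_b": 30,
        "feature_c": 15,
        "feature_d": 4,
        "feature_e": 1,
    }
    print(select_features(scores, coverage=0.9))
```

    The selected feature names would then be used to subset the training and test data before each of the four tree models is trained.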

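    Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Research Objectives
    1.4 Thesis Organization
    Chapter 2 Literature Review
    2.1 Datasets
    2.1.1 UNSW-NB15 Dataset
    2.1.2 CSE-CIC-IDS2018 Dataset
    2.2 Feature Selection
    2.2.1 Filter Methods
    2.2.2 Wrapper Methods
    2.2.3 Embedded Methods
    2.3 Tree-Model Algorithms
    2.3.1 Decision Tree
    2.3.2 Random Forest (RF)
    2.3.3 Gradient Boosting Trees (GBT)
    2.3.4 LightGBM
    2.3.5 XGBoost
    2.4 Performance Evaluation Metrics
    2.5 Related Work
    Chapter 3 Methodology
    3.1 Feature Selection and Extraction
    3.1.1 Data Collection and Preprocessing
    3.1.2 Random Undersampling
    3.1.3 Feature Selection
    3.2 Model Construction and Testing
    3.2.1 Model Construction
    3.2.2 Model Testing
    3.3 Model Performance Evaluation
    3.3.1 Evaluation Metrics
    3.3.2 Confusion Matrix Analysis
    Chapter 4 Experimental Process and Results
    4.1 Experimental Environment
    4.2 Dataset Identification and Preprocessing
    4.2.1 Handling Data Imbalance
    4.3 Feature Contribution Analysis
    4.3.1 LightGBM Classifier Hyperparameter Settings
    4.3.2 Computing Feature Contributions
    4.3.3 Feature Filtering
    4.4 Model Training
    4.5 Evaluating Model Test Performance
    4.5.1 Model Testing Efficiency
    4.6 Comparison with Other Feature Selection Methods and Machine Learning Models
    4.7 Experimental Results
    Chapter 5 Conclusions and Future Recommendations
    5.1 Conclusions and Contributions
    5.2 Recommendations for Future Research
    References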
