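| Graduate student: | 薛德豪 Te-Hao Hsueh |
|---|---|
| Thesis title: | Network traffic classification via semi-supervised learning (基於半監督式學習的網路流量分類) |
| Advisor: | 柯士文 Shih-Wen Ke |
| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Management - Executive Master of Information Management |
| Year of publication: | 2023 |
| Graduation academic year: | 111 (ROC calendar; academic year 2022-2023) |
| Language: | English |
| Number of pages: | 55 |
| Chinese keywords: | network traffic analysis, machine learning, semi-supervised learning, data mining |
| English keywords: | Wireshark, Label Propagation, Label Spreading |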
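Over the past decade or so, with the rise of the Internet of Things and artificial intelligence, people have come to rely more and more on networks, and the ubiquity of networks has also raised cybersecurity concerns. Network traffic classification has therefore become an important network security issue. For enterprises, understanding the traffic generated by the various applications on the network is essential. Through further analysis and research, an enterprise can track the flows, sources, and destinations of traffic across the whole company more accurately.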
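This study used Wireshark to collect the network traffic of case company P as its dataset. After feature selection, the semi-supervised learning algorithms Label Propagation Algorithm (LPA) and Label Spreading Algorithm (LSA) were used to predict pseudo labels for a training dataset in which only a small number of samples are labeled. The training dataset with pseudo labels was then used to build models with four machine learning classifiers: decision tree, random forest, SVM, and naïve Bayes. Once the models were built, they were used to make predictions on a test dataset carrying the correct labels. The experimental results show that combining the LPA algorithm with an SVM classifier achieves the best classification performance.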
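For reference, the textbook update rules behind the two graph-based algorithms named above are written out below. This is a general sketch and does not reproduce the thesis's exact configuration, because this section gives neither the graph construction nor the hyperparameters.

The sketch makes these assumptions:

- $W$ is an RBF affinity matrix with $W_{ij} = \exp(-\gamma \lVert x_i - x_j \rVert^2)$. Its diagonal is set to zero for LSA.
- $D = \operatorname{diag}\big(\sum_j W_{ij}\big)$ is the degree matrix.
- $Y$ is the initial label matrix, with one-hot rows for labeled samples and zero rows for unlabeled ones.
- $F$ is the score matrix, starting from $F^{(0)} = Y$.
- $\alpha \in (0, 1)$ is a spreading weight whose value is not stated here.

$$
\begin{aligned}
\text{LPA:}\quad F^{(t+1)} &= D^{-1} W F^{(t)}, \qquad F^{(t+1)}_{i\cdot} \leftarrow Y_{i\cdot} \ \text{for every labeled sample } i \\
\text{LSA:}\quad F^{(t+1)} &= \alpha S F^{(t)} + (1 - \alpha) Y, \qquad S = D^{-1/2} W D^{-1/2} \\
\hat{y}_i &= \arg\max_{c} F^{(\infty)}_{ic}
\end{aligned}
$$

Each sample's pseudo label is the class with the largest converged score. LPA resets the given labels after every step (hard clamping). LSA only pulls scores back toward $Y$, so given labels can be partly overwritten, with $\alpha$ controlling how much neighbour information is mixed in at each step.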
Over the past decade or so, with the rapid rise of the Internet of Things (IoT) and artificial intelligence (AI), people have come to depend ever more heavily on networks, and this dependence has brought cybersecurity threats with it. Network traffic classification has therefore become a crucial issue in network security. For an enterprise, it is important to understand the traffic generated by the various applications on its network. Through further analysis, an enterprise can gain a better understanding of the traffic flows, sources, and destinations across the entire company.
In this thesis, we used Wireshark to collect network traffic from a private enterprise (case company P) and built a proprietary dataset from it. After feature selection, we applied the Label Propagation Algorithm (LPA) and the Label Spreading Algorithm (LSA) to a training set in which only a small fraction of samples is labeled, and used the resulting models to assign pseudo labels to the unlabeled samples. We then trained four classifiers on the pseudo-labeled training set: Decision Tree, Random Forest, Support Vector Machine (SVM), and Naïve Bayes. Finally, we evaluated each model on a test set with ground-truth labels. The experimental results show that combining LPA with an SVM classifier achieves the best classification performance.
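The abstract describes a two-stage pipeline:

1. A graph-based semi-supervised algorithm (LPA or LSA) assigns pseudo labels to the unlabeled part of the training set.
2. A conventional classifier is trained on the pseudo-labeled data.

Below is a minimal, dependency-free Python sketch of that idea. It is not the author's code, and it rests on these assumptions:

- The affinity graph is a dense RBF graph with a hypothetical `gamma`.
- The update rules are the textbook ones: label propagation with labeled rows clamped, and label spreading with weight `alpha`.
- Gaussian Naïve Bayes stands in as the downstream classifier because it is easy to write without libraries. It is one of the four classifiers the thesis lists, but not the SVM it reports as best.
- The function names (`label_propagation`, `label_spreading`, `fit_gaussian_nb`, `predict_gaussian_nb`) and the two-feature toy "flows" are invented for illustration and do not come from the thesis dataset.

```python
import math
from collections import defaultdict

UNLABELED = -1  # hypothetical marker for training samples whose label is unknown


def _matmul(A, B):
    """Plain-Python matrix product A @ B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def _rbf_graph(X, gamma):
    """Dense RBF affinity matrix W[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
            for xi in X]


def _init_labels(y):
    """Sorted class list plus Y0: one-hot rows for labeled samples, zero rows otherwise."""
    classes = sorted({c for c in y if c != UNLABELED})
    return classes, [[1.0 if yi == c else 0.0 for c in classes] for yi in y]


def _solve(step, Y0, classes, max_iter, tol):
    """Iterate F <- step(F) from Y0 until converged, then take the argmax class per row."""
    F = Y0
    for _ in range(max_iter):
        new = step(F)
        delta = max(abs(a - b) for r1, r2 in zip(new, F) for a, b in zip(r1, r2))
        F = new
        if delta < tol:
            break
    return [classes[max(range(len(classes)), key=row.__getitem__)] for row in F]


def label_propagation(X, y, gamma=1.0, max_iter=1000, tol=1e-6):
    """F <- D^-1 W F, re-clamping labeled rows to their one-hot labels after each step."""
    classes, Y0 = _init_labels(y)
    P = []
    for row in _rbf_graph(X, gamma):
        total = sum(row)
        P.append([w / total for w in row])  # row-normalised transition matrix

    def step(F):
        new = _matmul(P, F)
        return [Y0[i][:] if yi != UNLABELED else new[i] for i, yi in enumerate(y)]

    return _solve(step, Y0, classes, max_iter, tol)


def label_spreading(X, y, gamma=1.0, alpha=0.2, max_iter=1000, tol=1e-6):
    """F <- alpha * S F + (1 - alpha) * Y0 with S = D^-1/2 W D^-1/2 (soft clamping)."""
    classes, Y0 = _init_labels(y)
    W = _rbf_graph(X, gamma)
    for i in range(len(W)):
        W[i][i] = 0.0  # no self-loops
    d = [sum(row) for row in W]
    S = [[w / math.sqrt(d[i] * d[j]) for j, w in enumerate(row)] for i, row in enumerate(W)]

    def step(F):
        return [[alpha * s + (1 - alpha) * y0 for s, y0 in zip(r, r0)]
                for r, r0 in zip(_matmul(S, F), Y0)]

    return _solve(step, Y0, classes, max_iter, tol)


def fit_gaussian_nb(X, y, var_eps=1e-6):
    """Per-class log prior, feature means and variances (plus a small variance floor)."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for c, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + var_eps
                     for col, m in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, X):
    """Return the class with the highest Gaussian log-posterior for each sample."""
    def log_post(x, log_prior, means, variances):
        return log_prior - sum(0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))
    return [max(model, key=lambda c: log_post(x, *model[c])) for x in X]


# Hypothetical toy "flows" with two made-up features; one labeled sample per class.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0], [0.0, 0.1],
           [3.0, 3.0], [3.1, 2.9], [2.9, 3.2], [3.2, 3.1], [3.0, 3.1]]
y_train = ["web"] + [UNLABELED] * 4 + ["video"] + [UNLABELED] * 4
pseudo = label_propagation(X_train, y_train)   # stage 1: pseudo-label the training set
model = fit_gaussian_nb(X_train, pseudo)       # stage 2: train a classifier on it
print(predict_gaussian_nb(model, [[0.05, 0.05], [3.05, 3.0]]))  # ['web', 'video']
```

Swapping `label_spreading` in for `label_propagation` in stage 1 gives the LSA variant. In practice, scikit-learn's `sklearn.semi_supervised.LabelPropagation` and `LabelSpreading` implement both algorithms: unlabeled targets are marked `-1`, and the fitted `transduction_` attribute holds the pseudo labels. `sklearn.svm.SVC` could then replace the Naïve Bayes stand-in to mirror the LPA + SVM combination the abstract reports as best. Those estimators use their own defaults and normalisation (for example `gamma=20` for the RBF kernel), so their results will not match this sketch exactly.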