Using Graph Neural Networks to Detect Inactive Spammers on PTT | National Central University Electronic Theses and Dissertations System

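Graduate student:	JUI-YUAN WANG (王瑞緣)
Thesis title:	Using Graph Neural Networks to Detect Inactive Spammers on PTT (使用圖神經網路偵測 PTT 的低活躍異常帳號)
Advisor:	陳弘軒
Committee members:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Publication year:	2022
Academic year of graduation:	110 (ROC calendar)
Language:	Chinese
Pages:	73
Keywords (Chinese):	cyber army (網軍), spammer (異常帳號), inactive spammer (低活躍異常帳號), inactive user (低活躍帳號), graph neural network (圖神經網路), PTT (批踢踢), PTT Bulletin Board System (批踢踢實業坊)
Keywords (English):	inactive user, inactive spammer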

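With the rise of social media, hiring "public relations companies" to spread false information online has become an emerging way to sway public opinion on current events, and the many accounts these companies operate are often treated as spammers by major forums. Researchers in Taiwan and abroad have used deep learning to detect spammers, but we found that the existing spammer-detection literature does not examine how active the accounts are.
In this thesis, we define an "active value" based on the number of activities an account performs within a limited period. We observe that a simple Convolutional Neural Network (CNN) model reaches an area under the ROC curve (AUROC) of 0.9169 when detecting highly active spammers but only 0.7830 when detecting inactive spammers, which shows that detecting inactive spammers is a very difficult task. We build a social network from user-to-user relationships to supply additional features as training data, and we introduce graph neural networks, which successfully improve the detection of inactive spammers.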
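
As a reading aid for the AUROC figures quoted in this abstract, the minimal sketch below computes AUROC from its pairwise-ranking definition: the probability that a randomly chosen spammer receives a higher score than a randomly chosen regular user. The function name `auroc` is hypothetical, and the labels and scores are made-up toy values, not data or code from the thesis.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for spammer, 0 for regular user (toy convention, an assumption).
    scores: model scores, higher meaning "more likely a spammer".
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: three spammers and four regular users (illustrative only).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]

# 11 of the 12 spammer/regular pairs are ranked correctly.
print(auroc(toy_labels, toy_scores))
```

Under this reading, 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the gap between 0.9169 and 0.7830 means noticeably more spammer/regular pairs are ordered wrongly among inactive accounts.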

With the rise of social media, hiring public relations companies to spread fake news on the Internet has become an emerging method of manipulating public opinion. The large numbers of accounts owned by these companies are regarded as spammers by most online forums. Researchers have used deep learning techniques to detect abnormal accounts. However, we found that these studies likely conducted their experiments mainly on active users.
In this thesis, we define the concept of "Active Value" based on the number of activities of an account within a unit period. For active users, even a simple Convolutional Neural Network model can distinguish a spammer from a regular user: the area under the ROC curve (AUROC) reaches 0.9169. However, for inactive users, the score drops to 0.7830. The result indicates that detecting inactive spammers is much more challenging. We use user-to-user relationships to build a social network, then apply graph neural networks to it and extract additional social features as training clues. Experimental results show that these strategies better distinguish spammers from regular users, especially when the users have limited activity.
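
To illustrate how a graph neural network lets an account with little activity of its own borrow signal from its neighbors, here is a minimal sketch of one Graph Convolutional Network layer in the general form ReLU(D^(-1/2) (A + I) D^(-1/2) X W). It is not the thesis's implementation: the abstract does not say how edges or features are constructed, so the three-user graph, the feature values, the identity weight matrix, and the function name `gcn_layer` are all assumptions made for illustration.

```python
import math


def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W), in plain Python lists.

    adj:    n x n symmetric 0/1 adjacency matrix of the user graph.
    feats:  n x f node feature matrix X.
    weight: f x h weight matrix W (learned in practice; fixed here).
    """
    n = len(adj)
    f = len(feats[0])
    h = len(weight[0])
    # Add self-loops so each user keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric normalization by node degree (self-loop included).
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate features from each user's neighborhood.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][j] for c in range(f)))
             for j in range(h)]
            for i in range(n)]


# Toy graph (an assumption): users 0 and 1 are linked, user 2 is isolated.
toy_adj = [[0.0, 1.0, 0.0],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
toy_feats = [[1.0, 0.0],
             [0.0, 1.0],
             [2.0, 2.0]]
toy_weight = [[1.0, 0.0],
              [0.0, 1.0]]  # identity, so only the aggregation is visible

for user, row in enumerate(gcn_layer(toy_adj, toy_feats, toy_weight)):
    print(user, row)
```

In this toy setup the two linked users end up with the same blended representation, while the isolated user keeps its own features unchanged, which is the kind of neighborhood information a graph model can add for sparse accounts.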

Table of Contents
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Related Work
2.1 Spammers on Different Online Platforms
2.2 Research Methods for Detecting Spammers
2.3 Detecting Spammers on Taiwan's PTT
3. Models and Methods
3.1 Active Value
3.2 Dataset
3.2.1 Introduction to PTT and Statistics
3.2.2 Spammers Officially Identified by PTT
3.2.3 Account Filtering Mechanism
3.3 Training Features
3.3.1 Total Number of Comments on Articles the Account Participated In
3.3.2 Total Push/Boo Score of Articles the Account Participated In
3.3.3 Account Activity Time
3.4 Introduction to GNN Models
3.4.1 Graph Convolutional Networks
3.4.2 Topology Adaptive Graph Convolutional Networks
3.4.3 Graph Attention Network
4. Experimental Results
4.1 Experimental Setup
4.1.1 Parameter Settings
4.1.2 Compared Models
4.1.3 Evaluation Metrics
4.2 Results and Discussion
4.2.1 Are "detecting highly active spammers" and "detecting inactive spammers" equally difficult?
4.2.2 Does GNN-Method successfully improve the task of "detecting inactive spammers"?
4.2.3 Does adding Social Network features to the Baseline also improve the task of "detecting inactive spammers"?
4.2.4 Does GNN-Method with Social Network features perform even better on the task of "detecting inactive spammers"?
4.2.5 Evaluating the top K accounts the model considers most likely to be spammers with F1-Score, Recall, and Precision
4.2.6 Why does the F1-Score not rise with the active value?
5. Conclusion and Future Work
References
Appendix
