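Graduate student: 林圓皓 (Yuan-Hau Lin)
Thesis title: Facebook活動事件擷取系統 (Facebook Activity Event Extraction System)
Advisor: 張嘉惠 (Chia-Hui Chang)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Publication year: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 38
Chinese keywords: 活動事件擷取, 命名實體辨識, 社群媒體事件
English keywords: Activity Event Extraction, Named Entity Recognition, Social Media Event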
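  • The spread of social networks has led many people to promote activities through Facebook. The goal of this thesis is therefore to build an activity event extraction system for Facebook that helps users grasp activity information quickly. We improve the Web NER Model Generation tool of Huang et al. to build extraction models for activity names and locations, and then use sequential pattern mining to find the start and end dates of activities. We also try to improve location recognition accuracy with a large set of Facebook check-in places. The experiments use 1,300 posts with manually labeled answers to assess both activity event extraction and named entity recognition performance, and project the extracted activity locations onto latitude and longitude coordinates to evaluate how accurately the actual location of an activity is predicted. The results show F_1-scores of 0.727 and 0.694 for activity name and location extraction, and 0.865 and 0.72 for start and end date extraction. The overall activity event recognition rate is 0.708, which indicates that using this system to consolidate activity events on Facebook and locate where they take place is quite feasible.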


    The popularity of social networks has made them a natural medium for promoting activities and advertising campaigns, and many people use Facebook pages to announce such events. The purpose of this study is to extract activity events by constructing two named entity recognition models, one for activity names and one for locations, with a Web NER model generation tool [1]. We enhance the tool by improving its tokenizer and alignment technique. In addition, we use a large database of Facebook check-in places to improve location name recognition. For entity relation extraction, we apply sequential pattern mining to find rules for coupling start date, end date, and location. We test activity event extraction performance on 1,300 Facebook posts. The experimental results show F_1-scores of 0.727 and 0.694 for activity name and location recognition, and 0.865 and 0.72 for start and end date extraction. The overall performance for activity event extraction is 0.708.
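    The F_1-scores reported above combine precision and recall into a single number. As a minimal sketch of how such a score can be computed for extracted entities, the Python snippet below compares predicted entities against manually labeled ones. The exact-match criterion, the function name `precision_recall_f1`, and the sample post IDs and entity strings are all assumptions made for illustration; the abstract does not state which matching criterion the thesis uses.

```python
def precision_recall_f1(predicted, gold):
    """Entity-level precision, recall and F1 under exact matching.

    `predicted` and `gold` are collections of hashable entity records,
    here (post_id, entity_type, text) tuples. Exact matching is an
    assumption of this sketch, not necessarily the thesis's criterion.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: three predicted entities, four labeled entities.
predicted = [
    ("post1", "name", "Jazz Night"),
    ("post1", "location", "Taipei Arena"),
    ("post2", "location", "NCU"),
]
gold = [
    ("post1", "name", "Jazz Night Concert"),
    ("post1", "location", "Taipei Arena"),
    ("post2", "location", "NCU"),
    ("post3", "location", "Zhongli Station"),
]

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # prints "0.667 0.500 0.571"
```

    Two of the three predictions match a labeled entity exactly, so precision is 2/3 and recall is 2/4, giving an F_1 of 4/7. A partial-match criterion would credit "Jazz Night" against "Jazz Night Concert" and yield a higher score, which is why the matching rule matters when comparing reported numbers.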

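    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Tables
    List of Figures
    1. Introduction
    2. Related Work
      2-1 Event Extraction
        2-1-1 Social Media Event Extraction Systems
        2-1-2 News Event Extraction Systems
      2-2 Information Extraction
    3. System Architecture and Methods
      3-1 Definition of the Extracted Activity Events
      3-2 Data Sources
      3-3 Entity Recognition and Tool Extension
        3-3-1 Social Media Event Extraction Systems
        3-3-2 Tool Extension
        3-3-3 Location Entity Recognition
        3-3-4 Address Entity Recognition and Temporal Expression Tagging Module
      3-4 Generation and Use of the Validation Dataset
      3-5 Relation Coupling
        3-5-1 Module for Extracting the Activity Name That Represents an Event
        3-5-2 Building Rules for Extracting Activity Times
      3-6 Filtering of Activity Events
    4. Experiments
      4-1 Datasets
      4-2 Named Entity Recognition Performance
        4-2-1 Evaluation of the Activity Name Model
        4-2-2 Evaluation of the Location Model
      4-3 Event Extraction Performance
        4-3-1 Measurement of Activity Event Relations
        4-3-2 Evaluation of Activity Location (GPS) Prediction
    5. Conclusion
    References
    Appendix 1 - Demonstration of Extracted Activity Events
    Appendix 2 - System Activity Name Definition & CityTalk Activity Names
      Complete Activity Event Names
      CityTalk Activity Names
    Appendix 3 - Improving Query Based Crawler Performance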

    [1] Y. Y. Huang and C. H. Chung. A Tool for Web NER Model Generation Based on Google Snippets. National Central University graduate thesis, 2015.
    [2] A. Ritter, O. Etzioni, and S. Clark. Open domain event extraction from Twitter. In Proc. SIGKDD, pages 1104–1112, 2012.
    [3] W. Wang. Chinese news event 5W1H semantic elements extraction for event ontology population. In Proceedings of the 21st International Conference Companion on World Wide Web, pages 197–202. ACM, 2012.
    [4] N. Kanhabua, S. Romano, and A. Stewart. Identifying relevant temporal expressions for real-world events. In Proceedings of the SIGIR 2012 Workshop on Time-aware Information Access (TAIA ’12), 2012.
    [5] S. Kuptabut and P. Netisopakul. Event Extraction using Ontology Directed Semantic Grammar. Journal of Information Science and Engineering 32, pages 79–96, 2016.
    [6] H. M. Wallach. Conditional Random Fields: An Introduction. University of Pennsylvania CIS Technical Report MS-CIS-04-21, 2004.
    [7] N. Dalvi, M. Olteanu, M. Raghavan, and P. Bohannon. Deduplicating a places database. In Proceedings of the 23rd International Conference on World Wide Web, pages 409–418. International World Wide Web Conferences Steering Committee, 2014.
    [8] Y. Feng, R. Huang, and L. Sun. Two Step Chinese Named Entity Recognition Based on Conditional Random Fields Models. In Sixth SIGHAN Workshop on CLP, pages 120–123. ACL Press, Hyderabad.
    [9] T.-S. Chen, M.-C. Chen, and C.-H. Chang. 基於頁面層級之快速網頁資料擷取與綱要驗證. Conference on Technologies and Applications of Artificial Intelligence, 2014.
    [10] Y.-S. Su. Associated Information Extraction for Enabling Entity Search on Electronic Map. National Central University, 2012.
    [11] J. Strötgen and M. Gertz. HeidelTime: High quality rule-based extraction and normalization of temporal expressions. In Proceedings of the 5th International Workshop on Semantic Evaluation, 2010.
    [12] Yu-Yang Lin and Chia-Hui Chang. 網頁商家名稱擷取與地址配對之研究. ROCLING 2014, Chung-li, Taiwan, September 91-93, 2014.
