| Graduate student: | 林圓皓 Yuan-Hao Lin |
|---|---|
| Thesis title: | 中文聚會活動來源探索暨上下文感知式精細資訊擷取之研究 Chinese Meetup Event Extraction via Event Source Page Discovery and Context-Aware Information Extraction |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Committee members: | |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 (ROC calendar) |
| Language: | English |
| Number of pages: | 140 |
| Chinese keywords: | 事件來源頁面發現 、自動分頁識別 、包裝程式歸納 、樣板移除 、活動檢測 、活動擷取 |
| Keywords: | Event source page discovery, Automatic pagination recognition, Wrapper induction, Boilerplate removal, Event detection, Meetup event extraction |
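Automatically extracting event information from the Internet can make event discovery considerably more convenient. Existing methods usually either rely on the open APIs of event-based social networks (EBSNs) to capture event data for particular regions and topics, or filter events out of a comprehensive web crawl; both have limitations. This study proposes a novel five-stage framework that automatically extracts event information from event organizers' websites and school department websites. The five stages are event source page discovery, automatic pagination recognition, boilerplate removal, event detection, and event extraction.
Starting from Facebook event pages, we gathered potential event organizer websites and additionally collected school department websites. In total, we built 19,013 event source record set data APIs and crawled them on a 24-hour schedule from July 13, 2023, to June 17, 2025, accumulating links to 913,853 posting pages. After the boilerplate removal module, we obtained 404,497 messages, of which the event detection module identified 99,833 as event messages. On these event organizer websites, event pages account for 11% of the total, significantly higher than the 1% event page rate that Wang et al. of the Google research team found on general websites, which shows the cost-effectiveness advantage of our approach. Finally, the event extraction module extracted 73,913 events.
This thesis addresses three problems: (1) automatically building and periodically crawling event sources, (2) automatically extracting event details from posting pages, and (3) searching and analyzing events. First, because web information changes quickly and is scattered, we propose an event source page discovery strategy and an automatic pagination recognition model that locate update pages and consolidate them into event sources that can be crawled over the long term; wrapper induction and a scheduling mechanism then retrieve the latest posting pages at regular intervals. Second, to handle heterogeneous and noisy web posts, we combine boilerplate removal, event detection, and event extraction into a fine-grained information extraction pipeline that accurately extracts key fields such as event title, venue, and start and end dates, and packages the results as structured data that serves as the basis for later search and analysis. Finally, to improve the usability and practical value of the data, we build an event search service with a varied and intuitive visual interface and introduce classification models for event type and age group, which assign semantic tags that support multi-criteria retrieval and trend analysis and help users and businesses discover and interpret event information. Experimental results show that the proposed end-to-end architecture can effectively build a structured event database in a large-scale Chinese web environment.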
Automatically extracting meetup event information from the web can significantly enhance the convenience of event discovery. Existing approaches typically rely on open APIs provided by Event-Based Social Networks (EBSNs) to capture meetup event data for specific regions and topics, or conduct large-scale web crawling to filter meetup events, both of which have inherent limitations. In this study, we propose a novel five-stage framework for extracting meetup event information from event organizers’ websites and academic department websites. The framework consists of event source page discovery, automatic pagination recognition, boilerplate removal, event detection, and meetup event extraction.
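As a rough illustration of how the five stages chain together, the sketch below wires five placeholder functions into a sequential pipeline and records how many items survive each stage. Only the stage names come from the abstract; the `Page` type, the canned page text, the `?page=N` URL pattern, and the keyword rule are hypothetical stand-ins, not the models used in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Page:
    url: str
    text: str


def discover_event_sources(seed_sites: List[str]) -> List[str]:
    # Assumption: pretend every seed site exposes one "/news" listing page.
    return [site.rstrip("/") + "/news" for site in seed_sites]


def recognize_pagination(source_pages: List[str]) -> List[str]:
    # Assumption: each listing has two pages reachable via "?page=N".
    return [f"{src}?page={n}" for src in source_pages for n in (1, 2)]


def remove_boilerplate(listing_urls: List[str]) -> List[Page]:
    # Assumption: fetching and cleaning are faked with canned text.
    return [
        Page(url, "seminar on 2025-03-01" if url.endswith("1") else "office closed")
        for url in listing_urls
    ]


def detect_events(pages: List[Page]) -> List[Page]:
    # Assumption: a toy keyword rule stands in for a trained classifier.
    return [p for p in pages if "seminar" in p.text]


def extract_events(pages: List[Page]) -> List[dict]:
    # Assumption: the title precedes " on " and the date follows it.
    events = []
    for p in pages:
        title, _, start = p.text.partition(" on ")
        events.append({"title": title, "start_date": start, "url": p.url})
    return events


STAGES: List[Callable] = [
    discover_event_sources,
    recognize_pagination,
    remove_boilerplate,
    detect_events,
    extract_events,
]


def run_pipeline(seed_sites: List[str]) -> Tuple[list, List[Tuple[str, int]]]:
    """Run the stages in order, returning the output and per-stage item counts."""
    data: list = seed_sites
    trace = []
    for stage in STAGES:
        data = stage(data)
        trace.append((stage.__name__, len(data)))
    return data, trace


events, trace = run_pipeline(["https://example.org/"])
print(events)
print(trace)
```

Keeping each stage as a plain function with list input and output makes it easy to log the funnel sizes between stages, which is the kind of count the abstract reports.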
Starting from Facebook event pages, we collected potential event organizer websites and additionally gathered departmental websites of academic institutions. In total, we established 19,013 event source record set APIs, which were crawled at 24-hour intervals between July 13, 2023, and June 17, 2025, cumulatively retrieving links to 913,853 posting pages. The boilerplate removal module extracted 404,497 messages from these pages, of which the event detection module identified 99,833 as event messages. Among these event organizer websites, event pages accounted for 11% of the total, a proportion significantly higher than the 1% event page rate that Wang et al. of the Google research team found on general websites, which demonstrates the cost-effectiveness of our method. Finally, the meetup event extraction module extracted 73,913 meetup events.
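The counts above form a funnel, and the ratios between stages can be checked with a few lines of arithmetic. The sketch below uses only the four counts reported in the abstract; the helper name `pct` is hypothetical. The reported 11% appears to correspond to event messages divided by posting pages, although the abstract does not state the denominator explicitly.

```python
# Counts reported in the abstract.
posting_pages = 913_853
messages = 404_497
event_messages = 99_833
extracted_events = 73_913


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Share of posting pages that yielded a message after boilerplate removal.
print(pct(messages, posting_pages))           # 44.3
# Share of posting pages detected as event messages (the ~11% figure).
print(pct(event_messages, posting_pages))     # 10.9
# Share of detected event messages that produced an extracted event.
print(pct(extracted_events, event_messages))  # 74.0
```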
We focus on three major challenges: (1) automatic establishment and periodic crawling of event sources, (2) automated extraction of event details from posting pages, and (3) the search and analysis of events. First, because web information changes rapidly and is highly dispersed, we propose an event source page discovery strategy and an automatic pagination recognition model that locate update pages and consolidate them into sustainable, long-term event sources. These sources are then crawled regularly using wrapper induction and a scheduling mechanism. Second, to handle the heterogeneity and pervasive noise of web posting pages, we integrate boilerplate removal, event detection, and meetup event extraction into a fine-grained information extraction pipeline. This pipeline accurately captures key fields such as event titles, venues, and start and end dates, and encapsulates the results as structured data that serves as the foundation for subsequent search and analysis. Finally, to enhance the usability and practical value of the data, we develop an event search service with a diverse and intuitive visual interface. Using classification models for event type and age group, we assign semantic tags that support multi-criteria retrieval and trend analysis, helping users and businesses uncover and interpret event information. Experimental results demonstrate that the proposed end-to-end framework can construct a robust structured event database in a large-scale Chinese web environment.
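To make the structured output and multi-criteria retrieval concrete, here is a minimal sketch of what an event record and a filter over such records might look like. The fields (title, venue, start and end dates, event type, age group) follow the abstract; the class name `MeetupEvent`, the function `search_events`, the label values, and the sample records are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MeetupEvent:
    title: str
    venue: str
    start_date: date
    end_date: date
    event_type: str  # semantic tag; label values here are made up
    age_group: str   # semantic tag; label values here are made up


def search_events(
    events: Iterable[MeetupEvent],
    *,
    event_type: Optional[str] = None,
    age_group: Optional[str] = None,
    active_on: Optional[date] = None,
) -> List[MeetupEvent]:
    """Return events matching every criterion that is not None."""
    results = []
    for ev in events:
        if event_type is not None and ev.event_type != event_type:
            continue
        if age_group is not None and ev.age_group != age_group:
            continue
        if active_on is not None and not (ev.start_date <= active_on <= ev.end_date):
            continue
        results.append(ev)
    return results


# Made-up sample records for illustration only.
EVENTS = [
    MeetupEvent("Graduate thesis exhibition", "Campus gallery",
                date(2025, 5, 20), date(2025, 5, 25), "exhibition", "adult"),
    MeetupEvent("Family science workshop", "City library",
                date(2025, 5, 24), date(2025, 5, 24), "workshop", "children"),
    MeetupEvent("Spring campus concert", "Main auditorium",
                date(2025, 4, 12), date(2025, 4, 12), "concert", "adult"),
]

hits = search_events(EVENTS, age_group="adult", active_on=date(2025, 5, 24))
print([ev.title for ev in hits])
```

Storing the start and end dates as `date` values rather than strings lets the date-range criterion use ordinary comparisons.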
[1] Qifan Wang, Bhargav Kanagal, Vijay Garg, and D. Sivakumar. Constructing a comprehensive events database from the web. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, pages 229–238, New York, NY, USA, 2019. Association for Computing Machinery.
[2] Jiahuan Lei, Qing Zhang, Jinshan Wang, and Hengliang Luo. BERT based hierarchical sequence classification for context-aware microblog sentiment analysis. In International Conference on Neural Information Processing, pages 376–386, Sydney, NSW, Australia, 2019. Springer.
[3] Chia-Hui Chang, Yu-Ching Liao, and Ting Yeh. Event source page discovery via policy-based RL with multi-task neural sequence model. In Web Information Systems Engineering–WISE 2022: 23rd International Conference, Biarritz, France, November 1–3, 2022, Proceedings, pages 597–606. Springer, 2022.
[4] John Foley, Michael Bendersky, and Vanja Josifovski. Learning to extract local events from the web. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pages 423–432, New York, NY, USA, 2015. Association for Computing Machinery.
[5] Jason Rennie and Andrew McCallum. Using reinforcement learning to spider the web efficiently. In Proceedings of the Sixteenth International Conference on Machine Learning, ICML ’99, pages 335–343, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[6] Filippo Menczer and Richard K Belew. Adaptive retrieval agents: Internalizing local context and scaling up to the web. Machine Learning, 39:203–242, 2000.
[7] Alexandros Grigoriadis and Georgios Paliouras. Focused crawling using temporal difference-learning. In Methods and Applications of Artificial Intelligence: Third Hellenic Conference on AI, SETN 2004, Samos, Greece, May 5-8, 2004. Proceedings 3, pages 142–153. Springer, 2004.
[8] Filippo Menczer, Gautam Pant, and Padmini Srinivasan. Topical web crawlers: Evaluating adaptive algorithms. ACM Transactions on Internet Technology (TOIT), 4(4):378–419, 2004.
[9] Ioannis Partalas, Georgios Paliouras, and Ioannis Vlahavas. Reinforcement learning with classifier selection for focused crawling. In ECAI 2008, pages 759–760. IOS Press, 2008.
[10] Miyoung Han, Pierre-Henri Wuillemin, and Pierre Senellart. Focused crawling through reinforcement learning. In Web Engineering: 18th International Conference, ICWE 2018, Cáceres, Spain, June 5-8, 2018, Proceedings 18, pages 261–278. Springer, 2018.
[11] Robert Meusel, Peter Mika, and Roi Blanco. Focused crawling for structured data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pages 1039–1048, 2014.
[12] KPHB Colony. Previous/Next Page Extension. https://chrome.google.com/webstore/detail/previous-next-page/fmichikmgflpgibapdhepmodjdjemmda, 2022. Chrome Extension.
[13] Google Extension. nextPage Extension. https://chrome.google.com/webstore/detail/nextpage/njgkgdihapikidfkbodalicplflciggb, 2024. Chrome Extension.
[14] Tim Furche, Giovanni Grasso, Andrey Kravchenko, and Christian Schallhart. Turn the page: Automated traversal of paginated websites. In Marco Brambilla, Takehiro Tokuda, and Robert Tolksdorf, editors, Web Engineering, pages 332–346, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
[15] Tianhao Wu and Vincent Sgro. Methods and systems for automated detection of pagination, 2016. US20160103799A1.
[16] Mikhail Korobov, Iván de Prado, and Mark E. Haase. Autopager: Detect and classify pagination links. https://github.com/TeamHG-Memex/autopager, 2016.
[17] Cheng-Ju Wu, Chia-Hui Chang, and Tzu-Ping Lin. Automatic web data API creation via cross-lingual neural pagination recognition. In Web Engineering: 22nd International Conference, ICWE 2022, Bari, Italy, July 5–8, 2022, Proceedings, pages 117–131, Berlin, Heidelberg, 2022. Springer-Verlag.
[18] Yuan-Hao Lin, Chia-Hui Chang, Hsiu-Min Chuang, Xiang-Shun Lin, Ting Yeh, and Min-Jhao Hong. Cost-effective event mining on the web via event source page discovery and data API construction. IEEE Access, 12:115981–115993, 2024.
[19] Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, and Robert Baumgartner. Web data extraction, applications and techniques: A survey. Knowledge-based systems, 70:301–323, 2014.
[20] Sunita Sarawagi. Information extraction. Foundations and Trends in Databases, 1(3):261–377, March 2008.
[21] Bing Liu, Robert Grossman, and Yanhong Zhai. Mining data records in web pages. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pages 601–606, New York, NY, USA, 2003. Association for Computing Machinery.
[22] CH Chang. Information extraction based on pattern discovery. In Proc. of 10th World Wide Web Conference, 2001.
[23] Valter Crescenzi, Paolo Merialdo, and Disheng Qiu. Alfred: crowd assisted data extraction. In Proceedings of the 22nd international conference on World Wide Web, pages 297–300, 2013.
[24] Patricia Jiménez and Rafael Corchuelo. On learning web information extraction rules with tango. Information Systems, 62:74–103, 2016.
[25] Andrew Carlson, Justin Betteridge, Richard C Wang, Estevam R Hruschka Jr, and Tom M Mitchell. Coupled semi-supervised learning for information extraction. In Proceedings of the third ACM international conference on Web search and data mining, pages 101–110, 2010.
[26] Lidong Bing, Wai Lam, and Tak-Lam Wong. Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 567–576, 2013.
[27] Arvind Arasu and Hector Garcia-Molina. Extracting structured data from web pages. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 337–348, 2003.
[28] Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo, et al. Roadrunner: Towards automatic data extraction from large web sites. In VLDB, volume 1, pages 109–118, 2001.
[29] Mohammed Kayed and Chia-Hui Chang. Fivatech: Page-level web data extraction from template pages. IEEE transactions on knowledge and data engineering, 22(2):249–263, 2009.
[30] Hassan A Sleiman and Rafael Corchuelo. Tex: An efficient and effective unsupervised web information extractor. Knowledge-Based Systems, 39:109–123, 2013.
[31] Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, and Paolo Papotti. Extraction and integration of partially overlapping web sources. Proceedings of the VLDB Endowment, 6(10):805–816, 2013.
[32] Oviliani Yenty Yuliana and Chia-Hui Chang. A novel alignment algorithm for effective web data extraction from singleton-item pages. Applied Intelligence, 48(11):4355–4370, 2018.
[33] Chia-Hui Chang, Tian-Sheng Chen, Ming-Chuan Chen, and Jhung-Li Ding. Efficient page-level data extraction via schema induction and verification. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 478–490. Springer, 2016.
[34] Oviliani Yenty Yuliana and Chia-Hui Chang. Dcade: Divide and conquer alignment with dynamic encoding for full page data extraction. Applied Intelligence, 50(2):271–295, February 2020.
[35] Christian Kohlschütter, Peter Fankhauser, and Wolfgang Nejdl. Boilerplate detection using shallow text features. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM ’10, pages 441–450, New York, NY, USA, 2010. Association for Computing Machinery.
[36] Roland Schäfer. Accurate and efficient general-purpose boilerplate detection for crawled web corpora. Language Resources and Evaluation, 51(3):873–889, September 2017.
[37] Thijs Vogels, Octavian-Eugen Ganea, and Carsten Eickhoff. Web2text: Deep structured boilerplate removal. In Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, and Allan Hanbury, editors, Advances in Information Retrieval, pages 167–179, Cham, 2018. Springer International Publishing.
[38] Jurek Leonhardt, Avishek Anand, and Megha Khosla. Boilerplate removal using a neural sequence labeling model. In Companion Proceedings of the Web Conference 2020, pages 226–229, New York, NY, USA, 2020. Association for Computing Machinery.
[39] Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. Webtables: Exploring the power of tables on the web. Proc. VLDB Endow., 1(1):538–549, August 2008.
[40] Xiang Deng, Huan Sun, Alyssa Lees, You Wu, and Cong Yu. Turl: Table understanding through representation learning. SIGMOD Rec., 51(1):33–40, June 2022.
[41] Kaihang Zhang, Chuang Zhang, Xiaojun Chen, and Jianlong Tan. Automatic web news extraction based on ds theory considering content topics. In Yong Shi, Haohuan Fu, Yingjie Tian, Valeria V. Krzhizhanovskaya, Michael Harold Lees, Jack Dongarra, and Peter M. A. Sloot, editors, Computational Science – ICCS 2018, pages 194–207, Cham, 2018. Springer International Publishing.
[42] Marco Baroni, Francis Chantree, Adam Kilgarriff, and Serge Sharoff. Cleaneval: A competition for cleaning web pages. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May 2008. European Language Resources Association (ELRA).
[43] Geunseong Jung, Sungjae Han, Hansung Kim, Kwanguk Kim, and Jaehyuk Cha. Extracting the main content of web pages using the first impression area. IEEE Access, 10:129958–129969, 2022.
[44] Yu-Hao Wu and Chia-Hui Chang. Multi-task neural sequence labeling for zero-shot cross-language boilerplate removal. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pages 326–334, 2021.
[45] Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel, and Pablo Gamallo. An unsupervised perplexity-based method for boilerplate removal. Natural Language Engineering, 30(1):132–149, 2024.
[46] Julián Alarte and Josep Silva. Hybex: A hybrid tool for template extraction. In Companion Proceedings of the Web Conference 2022, WWW ’22, pages 205–209, New York, NY, USA, 2022. Association for Computing Machinery.
[47] Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and Jun Zhao. Event extraction via dynamic multi-pooling convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 167–176, Beijing, China, July 2015. Association for Computational Linguistics.
[48] Thien Huu Nguyen, Kyunghyun Cho, and Ralph Grishman. Joint event extraction via recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 300–309, San Diego, California, June 2016. Association for Computational Linguistics.
[49] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[50] Tongtao Zhang, Heng Ji, and Avirup Sil. Joint entity and event extraction with generative adversarial imitation learning. Data Intelligence, 1(2):99–120, 2019.
[51] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[52] Xinya Du and Claire Cardie. Event extraction by answering (almost) natural questions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 671–683, Online, November 2020. Association for Computational Linguistics.
[53] Rui Feng, Jie Yuan, and Chao Zhang. Probing and fine-tuning reading comprehension models for few-shot event extraction. arXiv preprint arXiv:2010.11325, 2020.
[54] Di Lu, Shihao Ran, Joel Tetreault, and Alejandro Jaimes. Event extraction as question generation and answering. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1666–1688, Toronto, Canada, July 2023. Association for Computational Linguistics.
[55] Can Tian, Yawei Zhao, and Liang Ren. A chinese event relation extraction model based on bert. In 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), pages 271–276. IEEE, 2019.
[56] Yue Zhang and Jie Yang. Chinese NER using lattice LSTM. In Iryna Gurevych and Yusuke Miyao, editors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1554–1564, Melbourne, Australia, July 2018. Association for Computational Linguistics.
[57] Ruotian Ma, Minlong Peng, Qi Zhang, Zhongyu Wei, and Xuanjing Huang. Simplify the usage of lexicon in Chinese NER. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5951–5960, Online, July 2020. Association for Computational Linguistics.
[58] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[59] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
[60] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
[61] Yaojie Lu, Hongyu Lin, Jin Xu, Xianpei Han, Jialong Tang, Annan Li, Le Sun, Meng Liao, and Shaoyi Chen. Text2event: Controllable sequence-to-structure generation for end-to-end event extraction. arXiv preprint arXiv:2106.09232, 2021.
[62] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, and Ziqing Yang. Pre-training with whole word masking for chinese bert. IEEE Transactions on Audio, Speech and Language Processing, 2021.
[63] Ruirui Chen, Chengwei Qin, Weifeng Jiang, and Dongkyu Choi. Is a large language model a good annotator for event extraction? Proceedings of the AAAI Conference on Artificial Intelligence, 38(16):17772–17780, March 2024.
[64] Zongxi Li, Xianming Li, Yuzhang Liu, Haoran Xie, Jing Li, Fu-lee Wang, Qing Li, and Xiaoqin Zhong. Label supervised llama finetuning. arXiv preprint arXiv:2310.01208, 2023.
[65] Kaiwen Wei, Xian Sun, Zequn Zhang, Jingyuan Zhang, Guo Zhi, and Li Jin. Trigger is not sufficient: Exploiting frame-aware knowledge for implicit event argument extraction. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4672–4682, Online, August 2021. Association for Computational Linguistics.
[66] Kaiwen Wei, Xian Sun, Zequn Zhang, Li Jin, Jingyuan Zhang, Jianwei Lv, and Zhi Guo. Implicit event argument extraction with argument-argument relational knowledge. IEEE Transactions on Knowledge and Data Engineering, 35(9):8865–8879, 2023.
[67] Sha Li, Heng Ji, and Jiawei Han. Document-level event argument extraction by conditional generation. In North American Chapter of the Association for Computational Linguistics, 2021.
[68] Bangze Pan, Yang Li, Suge Wang, Xiaoli Li, Deyu Li, Jian Liao, and Jianxing Zheng. Document-level event extraction via information interaction based on event relation and argument correlation. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5156–5166, Torino, Italia, May 2024. ELRA and ICCL.
[69] Qiang Gao, Zixiang Meng, Bobo Li, Jun Zhou, Fei Li, Chong Teng, and Donghong Ji. Harvesting events from multiple sources: Towards a cross-document event extraction paradigm. In Findings of the Association for Computational Linguistics: ACL 2024, pages 1913–1927, Bangkok, Thailand, 2024. Association for Computational Linguistics.
[70] Alan Ritter, Mausam, Oren Etzioni, and Sam Clark. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pages 1104–1112, New York, NY, USA, 2012. Association for Computing Machinery.
[71] Xiaoming Zhang, Xiaoming Chen, Yan Chen, Senzhang Wang, Zhoujun Li, and Jiali Xia. Event detection and popularity prediction in microblogging. Neurocomputing, 149:1469–1480, 2015.
[72] Yubo Chen, Shulin Liu, Xiang Zhang, Kang Liu, and Jun Zhao. Automatically labeled data generation for large scale event extraction. In ACL, 2017.
[73] Kaiwen Wei, Yiran Yang, Li Jin, Xian Sun, Zequn Zhang, Jingyuan Zhang, Xiao Li, Linhao Zhang, Jintao Liu, and Guo Zhi. Guide the many-to-one assignment: Open information extraction via IoU-aware optimal transport. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4971–4984, Toronto, Canada, July 2023. Association for Computational Linguistics.
[74] Minale A. Abebe, Joe Tekli, Fekade Getahun, Richard Chbeir, and Gilbert Tekli. Generic metadata representation framework for social-based event detection, description, and linkage. Knowledge-Based Systems, 188:104817, 2020.
[75] Yuwei Cao, Hao Peng, Jia Wu, Yingtong Dou, Jianxin Li, and Philip S. Yu. Knowledge-preserving incremental social event detection via heterogeneous gnns. In Proceedings of the Web Conference 2021, WWW ’21, pages 3383–3395, New York, NY, USA, 2021. Association for Computing Machinery.
[76] Huaiwen Zhang, Quan Fang, Shengsheng Qian, and Changsheng Xu. Multimodal knowledge-aware event memory network for social media rumor detection. In Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, pages 1942–1951, New York, NY, USA, 2019. Association for Computing Machinery.
[77] Yuan-Hao Lin, Chia-Hui Chang, and Hsiu-Min Chuang. Eventgo! mining events through semi-supervised event title recognition and pattern-based venue/date coupling. Journal of Information Science and Engineering, 39(3):655–670, 2023.
[78] Yuan-Hao Lin, Chia-Hui Chang, and Hsiu-Min Chuang. Fine-grained meetup events extraction through context-aware event argument positioning and recognition. International Journal of Computational Intelligence Systems, 17(1):296, 2024.
[79] Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, and Juliana S. Teixeira. A brief survey of web data extraction tools. SIGMOD Rec., 31(2):84–93, June 2002.
[80] Chia-Hui Chang, M. Kayed, M.R. Girgis, and K.F. Shaalan. A survey of web information extraction systems. IEEE Transactions on Knowledge and Data Engineering, 18(10):1411–1428, 2006.
[81] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8(3–4):229–256, May 1992.
[82] Hung Le, Quang Pham, Doyen Sahoo, and Steven CH Hoi. Urlnet: Learning a url representation with deep learning for malicious url detection. arXiv preprint arXiv:1802.03162, 2018.
[83] Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2019.
[84] Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. Gpt understands, too, 2021.
[85] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[86] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, and Bo Xu. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 207–212, Berlin, Germany, August 2016. Association for Computational Linguistics.
[87] Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Ouyang Xuan, Dianhai Yu, Hao Tian, Hua Wu, and Haifeng Wang. Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. arXiv preprint arXiv:2107.02137, 2021.
[88] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270, San Diego, California, June 2016. Association for Computational Linguistics.
[89] Yuan-Hao Lin and Chia-Hui Chang. Facebook 活動事件擷取系統(Facebook activity event extraction system)[in Chinese]. In Chung-Hsien Wu, Yuen-Hsien Tseng, and Hung-Yu Kao, editors, Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016), pages 229–243, Tainan, Taiwan, October 2016. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
[90] Shih Chien Student Council. TIMELESS ORBIT Shih Chien University campus concert promotion. https://www.instagram.com/p/DH3EPq9SSQ-/, 2025. Instagram post with event details and visual story promotion.
[91] National Taiwan Normal University OIA. 2025 Forest Symphony: NTNU International Cultural Festival. https://www.instagram.com/p/DJ1AMRXygiz/, 2025. Instagram promotion for cultural exchange and opening ceremony.
[92] NTUST Architecture Dept. Timeless NTUST architecture thesis exhibition. https://www.instagram.com/p/DKUvE5XyWT0/?img_index=1, 2025.
[93] Yuying Zhu and Guoxin Wang. CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3384–3393, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[94] Wei Liu, Tongge Xu, Qinghua Xu, Jiayu Song, and Yueran Zu. An encoding strategy based word-character LSTM for Chinese NER. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2379–2389, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[95] Ruixue Ding, Pengjun Xie, Xiaoyan Zhang, Wei Lu, Linlin Li, and Luo Si. A neural multi-digraph model for Chinese NER with gazetteers. In Anna Korhonen, David Traum, and Lluís Màrquez, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1462–1467, Florence, Italy, July 2019. Association for Computational Linguistics.
[96] Xiaonan Li, Hang Yan, Xipeng Qiu, and Xuanjing Huang. FLAT: Chinese NER using flat-lattice transformer. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6836–6842, Online, July 2020. Association for Computational Linguistics.
[97] Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. ERNIE: Enhanced language representation with informative entities. In Anna Korhonen, David Traum, and Lluís Màrquez, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1441–1451, Florence, Italy, July 2019. Association for Computational Linguistics.
[98] Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, and Yonggang Wang. ZEN: Pre-training Chinese text encoder enhanced by n-gram representations. In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4729–4740, Online, November 2020. Association for Computational Linguistics.