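| Graduate student: | 林志龍 Chih-Lung Lin |
|---|---|
| Thesis title: | 關聯性字組在文件摘要上的探討 Mining Association Words for Document Summarization |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Committee members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Academic year of graduation: | 91 |
| Language: | English |
| Number of pages: | 34 |
| Chinese keywords: | 關聯性字組 (association words), 文件摘要 (document summarization) |
| Foreign-language keywords: | document summarization, text summarization |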
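Document summarization is an important technique in document processing and can be viewed as a form of document compression: its main goal is to select suitable words and sentences to serve as the summary. Approaches fall broadly into two categories, single-document and multi-document. Single-document methods mostly analyze the meaning and structure of the document, whereas multi-document methods mostly use clustering to find the important parts that the documents share. This study applies the concept of maximal frequent sequences to find the important parts of a large document collection and then uses them to produce summaries. A persistent problem in document summarization is how to evaluate the result: because individual judgment is subjective, an objective evaluation method is hard to establish. To address this, the thesis evaluates summaries through document classification, which offers a more objective and faster procedure.
The proposed summarization method consists of several steps. First, clustering partitions the whole document collection into several smaller sets, which avoids the problem of the minimum threshold becoming too low when maximal frequent word sets are mined directly from one large collection. Next, association words are extracted from each of these sets and treated as the important parts of the documents. Finally, these association words are combined with several scoring methods to decide which sentences form the summary. Experimental results show that the summaries do retain the important parts of the documents.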
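The mining and sentence-selection steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the documents passed in already belong to one cluster, it substitutes unordered frequent word pairs for maximal frequent sequences, and it scores a sentence simply by how many frequent pairs it contains. The names `tokenize`, `frequent_pairs`, and `summarize` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequent_pairs(docs, min_support):
    """Return word pairs that co-occur in at least min_support documents.

    Assumption: unordered pairs stand in for the association words
    (maximal frequent sequences) used in the thesis.
    """
    counts = Counter()
    for doc in docs:
        words = sorted(set(tokenize(doc)))  # count each pair once per document
        counts.update(combinations(words, 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def summarize(doc, pairs, k=1):
    """Return the k sentences covering the most pairs, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]

    def score(sentence):
        words = set(tokenize(sentence))
        return sum(1 for a, b in pairs if a in words and b in words)

    # Rank by descending score; ties are broken by earlier position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling `frequent_pairs` on the documents of one cluster and passing the result to `summarize` for each document yields an extractive summary. The support threshold is applied per cluster, which mirrors the abstract's point that mining smaller sets avoids having to set the threshold too low.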