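| Field | Value |
|---|---|
| Graduate student | 楊佩臻 Pei-Chen Yang |
| Thesis title | 利用文句關係網路自動萃取文件摘要之研究 (A Study on Automatically Extracting Document Summaries Using a Sentence Relation Network) / Using Sentence Network to Automatic Document Summarization |
| Advisor | 林熙禎 Shi-Jen Lin |
| Oral examination committee | |
| Degree | Master (碩士) |
| Department | College of Management, Department of Information Management |
| Year of publication | 2013 |
| Academic year of graduation | 101 (ROC calendar; the 2012-2013 academic year) |
| Language | Chinese |
| Pages | 73 |
| Chinese keywords | automatic document summarization, sentence relation network, graph-based summarization method |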
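This study proposes a general-purpose, extractive, graph-based summarization method built on NGD (Normalized Google Distance). Because NGD needs only the information in the document itself and the hit counts returned by a search engine, the method removes the dependence on external resources such as corpora and semantic lexicons. The study uses NGD to compute the association between the words in a document, identifies the document's keywords, and uses them to build a vector space. A sentence relation network is then built from the cosine similarity between sentences in that vector space, and link analysis is applied to find the important sentence nodes in the network, which form the summary.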
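NGD is the Normalized Google Distance of Cilibrasi and Vitanyi. For terms $x$ and $y$ with hit counts $f(x)$ and $f(y)$, joint hit count $f(x,y)$, and index size $N$:

$$\mathrm{NGD}(x,y)=\frac{\max\{\log f(x),\log f(y)\}-\log f(x,y)}{\log N-\min\{\log f(x),\log f(y)\}}$$

The Python sketch below computes NGD from hit counts and then ranks candidate keywords. The counts are hypothetical inputs standing in for search-engine results, and `rank_keywords` (mean NGD to the other words, smallest first) is an assumed heuristic for illustration, not the keyword rule documented in the thesis.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing term x / term y
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed known)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur at least once")
    if fxy <= 0:
        return math.inf  # terms never co-occur: treat as maximally distant
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def rank_keywords(counts, pair_counts, n, top_k):
    """HYPOTHETICAL keyword selection: rank words by their mean NGD to every
    other word in the document; a smaller mean distance is treated as more
    central, so those words are returned first.

    counts:      {word: hit count}
    pair_counts: {frozenset({w1, w2}): joint hit count}
    """
    words = list(counts)
    if len(words) < 2:
        return words[:top_k]
    total = {w: 0.0 for w in words}
    for x, y in combinations(words, 2):
        d = ngd(counts[x], counts[y], pair_counts.get(frozenset((x, y)), 0), n)
        total[x] += d
        total[y] += d
    mean = {w: total[w] / (len(words) - 1) for w in words}
    return sorted(words, key=lambda w: mean[w])[:top_k]
```

For example, with the made-up counts `ngd(1000, 2000, 400, 1_000_000)` is about 0.233 (the terms often co-occur), whereas a pair that co-occurs on only one page out of a million is close to 1.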
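Summary quality was evaluated with ROUGE. For single-document summaries and 50-word multi-document summaries, the proposed sentence-network scoring method achieved better results than the machine-learning-based summarization systems submitted to DUC2001 and DUC2002; for 100-word and 200-word multi-document summaries, it was only slightly behind a few of the participants that used machine learning in those years. These results show that the study establishes an effective, general-purpose, unsupervised extractive summarization method that does not depend on corpora or semantic lexicons.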
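ROUGE scores a system summary by its n-gram overlap with human reference summaries. The following is a simplified ROUGE-N recall, assuming whitespace tokenization and no stemming or stopword removal; the official ROUGE toolkit used in the DUC evaluations has many more options, so numbers from this sketch are not comparable to published DUC scores. The optional `word_limit` parameter (an illustrative name) truncates the candidate, mirroring the 50-, 100-, and 200-word length limits mentioned above.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1, word_limit=None):
    """Simplified ROUGE-N recall over one or more reference summaries.

    Overlap counts are clipped (an n-gram matches at most as often as it
    appears in the reference) by taking the Counter intersection.
    """
    cand_tokens = candidate.lower().split()
    if word_limit is not None:
        cand_tokens = cand_tokens[:word_limit]  # truncate to the length limit
    cand = ngram_counts(cand_tokens, n)
    overlap = total = 0
    for ref in references:
        ref_grams = ngram_counts(ref.lower().split(), n)
        overlap += sum((cand & ref_grams).values())
        total += sum(ref_grams.values())
    return overlap / total if total else 0.0
```

For example, `rouge_n_recall("the cat sat on the mat", ["the cat was on the mat"])` is 5/6 (about 0.833): five of the six reference unigrams are matched, with "the" counted at most twice.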
This thesis proposes a graph-based summarization method that builds a sentence network representing the relations between sentences using NGD (Normalized Google Distance). Because it relies only on the words in the documents and on search-engine result counts, the method removes the dependence on external resources such as corpora and lexical databases. A Wiki engine is used to calculate NGD and find the relations between words, from which the keywords of the documents are identified. A vector space model is built from these keywords, and the similarity between sentences is calculated to build a sentence network. The most important sentences are then extracted using link analysis. The experimental results show that the ROUGE scores of the proposed graph-based single-document summarization method are better than those of the machine learning methods it was compared with, and the ROUGE scores of the proposed graph-based multi-document summarization method are only slightly lower than those of a few peers using machine learning methods. This demonstrates that the proposed method is an effective unsupervised document summarization approach that requires no external resources such as corpora or lexical databases.
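The pipeline described above (keyword vectors, cosine similarity, sentence network, link analysis) can be sketched in Python as follows. The abstract does not name the link-analysis algorithm; this sketch assumes weighted PageRank computed by power iteration, as in TextRank-style summarizers. Sentence vectors use raw counts of the given keywords, and the edge threshold of 0.1 is a hypothetical parameter; none of these choices are confirmed by the thesis.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def sentence_vectors(sentences, keywords):
    """Represent each sentence by raw counts of the keywords it contains (assumed weighting)."""
    kw = set(keywords)
    vectors = []
    for s in sentences:
        tokens = (t.strip(PUNCT) for t in s.lower().split())
        vectors.append(Counter(t for t in tokens if t in kw))
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_network(vectors, threshold=0.1):
    """Undirected weighted sentence network: edge i-j when similarity exceeds threshold."""
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if sim > threshold:
                w[i][j] = w[j][i] = sim
    return w


def pagerank(w, d=0.85, iters=200, tol=1e-9):
    """Weighted PageRank by power iteration. Isolated sentences have no
    out-edges, so they only receive the (1 - d) / n teleport term."""
    n = len(w)
    strength = [sum(row) for row in w]
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1 - d) / n
            + d * sum(score[j] * w[j][i] / strength[j] for j in range(n) if strength[j] > 0)
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new, score)) < tol
        score = new
        if converged:
            break
    return score


def summarize(sentences, keywords, k=3, threshold=0.1):
    """Return the k highest-scoring sentences in their original document order."""
    scores = pagerank(build_network(sentence_vectors(sentences, keywords), threshold))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares keywords with many other sentences gains the most weight in the network, so it is ranked first; a sentence with no keywords stays isolated and keeps only the teleport score.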