
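Author: 張朝順
Chao-Shun Zhang
Thesis title: 以詞彙分析探勘論文寫作之3C結構水準
Using Word Analysis to Explore the 3C Structure Level of Master’s Thesis
Advisor: 薛義誠
Oral defense committee:
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2018
Academic year of graduation: 106
Language: Chinese
Number of pages: 129
Keywords (translated from Chinese): multi-document automatic summarization, semantic analysis, thesis consistency, thesis completeness, thesis correctness
Keywords (English): multi-document automatic summarization, Master’s Thesis consistency, completeness, correctness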
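  • A good thesis states its purpose and conclusions explicitly, and the two should be consistent and complete with respect to each other. Otherwise, omissions make the content inconsistent from beginning to end and lead readers to cite the work incorrectly. Readers searching for theses rely mainly on the abstract to decide whether a thesis is worth consulting, since the function of the abstract within the thesis structure is to help readers grasp the content quickly. A good abstract therefore improves the accuracy of thesis searches, and within an abstract the purpose and the conclusions matter most. This study uses as experimental data the master's theses of the Department of Information Management at National Central University from the past six years, drawn from the National Digital Library of Theses and Dissertations in Taiwan. It extracts the features of each text with the TextRank algorithm and uses ROUGE-1 as the metric for evaluating consistency and completeness. It then applies four multi-document extractive summarization techniques, TextRank, LexRank, Luhn, and Latent Semantic Analysis (LSA), to produce summaries with better correctness, and compares their features with those of the original abstract to evaluate correctness.
    The purpose of this study is to assess the structural level of thesis writing in terms of the consistency of its logic, the completeness of its scope, and the correctness of its conclusions, and to append the automatic summary after the original abstract so that searchers can find theses more accurately. The experiments found that consistency, completeness, and correctness were better in 2013 and 2015 and worse in 2014 and 2017, and that the advisor is one of the factors in whether a thesis has these three properties. The specific contributions are: (1) building an unsupervised-learning verification system; (2) evaluating quickly, so that whether a thesis passes on consistency, completeness, and correctness can be assessed immediately; (3) providing metrics that thesis writers can use to adjust their structure to achieve consistency, completeness, and correctness; (4) letting advisors check theses automatically, reducing manual review time; (5) providing automatic summaries that make content easier to distinguish, helping improve the accuracy of thesis searches.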
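The feature-extraction step named in the abstract can be illustrated with a minimal TextRank keyword sketch. This is not the thesis's implementation: it assumes the text has already been segmented into words, uses an unweighted co-occurrence graph over adjacent tokens, and the function name `textrank_keywords` and its parameters are hypothetical.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with an unweighted TextRank over a co-occurrence graph.

    Assumption: `tokens` is an already segmented word list; the thesis's
    own segmentation and parameter choices are not given in this record.
    """
    # Link every pair of distinct words that co-occur within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate S(v) = (1 - d) + d * sum(S(u) / degree(u)) over neighbors u.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Illustrative tokens only, not data from the thesis.
sample = ["thesis", "abstract", "thesis", "purpose", "thesis", "conclusion"]
print(textrank_keywords(sample, top_k=1))
```

In this toy input "thesis" is adjacent to every other word, so it receives the highest score; real input would be the segmented purpose, conclusion, or abstract sections.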


    A good master's thesis clearly states its purpose and conclusion, and the two should be consistent and complete with respect to each other. When they are not, the content falls short and contradicts itself, which misleads readers into citing the thesis incorrectly. Readers generally decide from the abstract whether a thesis is worth consulting, because the abstract helps them understand the content quickly. A good abstract therefore makes it more likely that a search returns a thesis that meets the reader's needs. Within an abstract, the purpose and the conclusion matter most. This research takes as its data the master's theses of the Department of Information Management of National Central University, drawn from the National Digital Library of Theses and Dissertations in Taiwan. It uses the TextRank algorithm to extract the features of each thesis and applies ROUGE-1 as the basis for measuring consistency and completeness. It then uses four multi-document extractive summarization algorithms, TextRank, LexRank, Luhn, and Latent Semantic Analysis (LSA), to generate a summary with better correctness, and compares this automatic summary with the original abstract to measure correctness.
    The purpose of this research is to assess the consistency of a thesis's writing logic, the completeness of its scope, and the correctness of its conclusions. The automatic summary is appended to the original abstract in the hope that readers searching for theses will get more accurate results. The experiments found that consistency, completeness, and correctness were better in 2013 and 2015 and worse in 2014 and 2017, and that the advising professor is strongly correlated with the 3C structure. The contributions are: (1) establishing an unsupervised verification system; (2) assessing immediately whether a thesis passes on consistency, completeness, and correctness; (3) providing data that thesis writers can use to adjust their structure to achieve consistency, completeness, and correctness; (4) letting professors check theses automatically, reducing manual review work; (5) providing automatic summaries that help improve the accuracy of thesis searches.
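To make the ROUGE-1 measurement concrete, the sketch below computes unigram overlap between two token lists. The abstract does not say exactly how ROUGE-1 scores are mapped to consistency or completeness, so comparing purpose features against conclusion features here is only an assumed illustration; the function name `rouge_1` and the sample tokens are hypothetical.

```python
from collections import Counter


def rouge_1(candidate_tokens, reference_tokens):
    """Return ROUGE-1 recall, precision, and F1 for two token lists.

    Assumption: both inputs are already segmented into words (for example,
    keywords extracted from two sections of a thesis).
    """
    candidate = Counter(candidate_tokens)
    reference = Counter(reference_tokens)

    # Clipped unigram overlap: each word counts at most as often as it
    # appears in both lists.
    overlap = sum((candidate & reference).values())

    recall = overlap / sum(reference.values()) if reference else 0.0
    precision = overlap / sum(candidate.values()) if candidate else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Illustrative tokens only, not data from the thesis.
purpose = ["summarization", "thesis", "consistency", "evaluation"]
conclusion = ["thesis", "consistency", "advisor"]
scores = rouge_1(candidate_tokens=conclusion, reference_tokens=purpose)
print(scores)
```

Here two of the four purpose words reappear in the conclusion, so recall is 0.5, while two of the three conclusion words come from the purpose, so precision is 2/3. Which of the two directions a study treats as "consistency" and which as "completeness" is a design choice that this record does not specify.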

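    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Purpose
      1.4 Expected Impact and Research Contributions
      1.5 Research Process
    Chapter 2. Literature Review
      2.1 Text Mining
        2.1.1 Research on Chinese Word Segmentation
        2.1.2 Feature Selection
        2.1.3 Text Ranking
      2.2 Completeness, Consistency, and Correctness
      2.3 Automatic Summarization Techniques
        2.3.1 Lexical Centrality Ranking
        2.3.2 Luhn's Algorithm
        2.3.3 Latent Semantic Analysis
    Chapter 3. Experimental Method
      3.1 Experimental Framework
      3.2 Experimental Data
      3.3 Evaluating Consistency and Completeness Using the Original Abstract, Table of Contents, and Conclusion Chapter
        3.3.1 Extracting the Purpose, Conclusion, and Abstract Sections
        3.3.2 Chinese Word Segmentation Using a Hybrid Approach
        3.3.3 Extracting Text Features
        3.3.4 Evaluating Consistency Using the Original Abstract, Purpose, and Conclusion Sections
        3.3.5 Evaluating Completeness Using the Original Abstract, Purpose, and Conclusion Sections
      3.4 Generating Automatic Summaries to Evaluate Consistency and Completeness
        3.4.1 Splitting the Purpose and Conclusion Sections into Sentences by Punctuation
        3.4.2 Generating the Automatic Summary
        3.4.3 Evaluating Consistency and Completeness Using the Automatic Summary, Table of Contents, and Conclusion Chapter
      3.5 Interpreting the Automatic Summary Results and Evaluating Correctness
        3.5.1 Interpreting the Results Evaluated from the Automatic Summary
        3.5.2 Evaluating Correctness Using the Original Abstract and the Automatic Summary
    Chapter 4. Experimental Results
      4.1 Evaluating Consistency by Year
        4.1.1 Using the 2012 Experimental Data
        4.1.2 Using the 2013 Experimental Data
        4.1.3 Using the 2014 Experimental Data
        4.1.4 Using the 2015 Experimental Data
        4.1.5 Using the 2016 Experimental Data
        4.1.6 Using the 2017 Experimental Data
        4.1.7 Summary
      4.2 Evaluating Completeness by Year
        4.2.1 Using the 2012 Experimental Data
        4.2.2 Using the 2013 Experimental Data
        4.2.3 Using the 2014 Experimental Data
        4.2.4 Using the 2015 Experimental Data
        4.2.5 Using the 2016 Experimental Data
        4.2.6 Using the 2017 Experimental Data
        4.2.7 Summary
      4.3 Evaluating Correctness by Year
        4.3.1 Evaluating Correctness Using the Automatic Summaries with Better Consistency
          4.3.1.1 Using the 2012 Experimental Data
          4.3.1.2 Using the 2013 Experimental Data
          4.3.1.3 Using the 2014 Experimental Data
          4.3.1.4 Using the 2015 Experimental Data
          4.3.1.5 Using the 2016 Experimental Data
          4.3.1.6 Using the 2017 Experimental Data
        4.3.2 Evaluating Correctness Using the Automatic Summaries with Better Completeness
          4.3.2.1 Using the 2012 Experimental Data
          4.3.2.2 Using the 2013 Experimental Data
          4.3.2.3 Using the 2014 Experimental Data
          4.3.2.4 Using the 2015 Experimental Data
          4.3.2.5 Using the 2016 Experimental Data
          4.3.2.6 Using the 2017 Experimental Data
        4.3.3 Summary
      4.4 Case Discussion
        4.4.1 Consistency and Completeness Results for the Original Abstracts
          4.4.1.1 Theses with a High Structural Level of Consistency and Completeness
          4.4.1.2 Theses with a Low Structural Level of Consistency and Completeness
        4.4.2 Consistency and Completeness Results for the Automatic Summaries
          4.4.2.1 Automatic Summaries with Lower Consistency than the Original Abstract
          4.4.2.2 Automatic Summaries with Lower Completeness than the Original Abstract
        4.4.3 Original Abstracts with Better Correctness
          4.4.3.1 Original Abstracts with a High Structural Level of Correctness
          4.4.3.2 Original Abstracts with a Low Structural Level of Correctness
    Chapter 5. Conclusion
      5.1 Research Conclusions and Contributions
      5.2 Research Limitations
      5.3 Future Research
    References
      English References
      Chinese References
    Appendix 1. Experimental Results (Numerical Data)
    Appendix 2. Experimental Data (Source Materials)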

