Graduate student: 范芳瑄 (Fang-Syuan Fan)
Thesis title: 應用於校內法規之分類化文字探勘與檢索技術
Classified Term Frequency-Inverse Document Frequency technique applied to school regulations
Advisor: 蔡孟峰
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Executive Master of Computer Science & Information Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 73
Keywords: text mining, TF-IDF, Cosine Similarity, Hierarchical Clustering
  • This study combines the Term Frequency-Inverse Document Frequency (TF-IDF) technique with compatibility (相性) categories, applies it to the regulations of National Central University and the off-campus regulations they extend to, and deploys the result on a cloud platform to classify the regulations.
    On its own, TF-IDF yields a single quantitative measure and cannot offer alternative views of a document. By pairing the compatibility categories with cosine similarity and hierarchical clustering, a single regulation produces different results under different categories, and the resulting classification gives users several ways to find the regulations relevant to them.

    Keywords: text mining, TF-IDF, Cosine Similarity, Hierarchical Clustering
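
The pipeline summarized in the abstract can be sketched roughly in Python as follows: TF-IDF vectors are computed per compatibility category, compared with cosine similarity, and grouped by a simple agglomerative clustering. This is not the thesis's implementation. The category word lists, the toy pre-tokenized documents, the function names, the particular TF and IDF formulas, the single-linkage criterion, the 0.5 threshold, and the use of English tokens in place of segmented Chinese text are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical category word lists (illustrative; not the thesis's actual lists).
CATEGORY_TERMS = {
    "applicable_parties": {"student", "faculty", "staff"},
    "review_mechanism": {"committee", "approval", "review"},
}

# Toy, already-tokenized "regulations" invented for this example.
DOCS = {
    "reg_a": ["student", "scholarship", "committee", "review", "student"],
    "reg_b": ["student", "dormitory", "approval"],
    "reg_c": ["faculty", "promotion", "committee", "review"],
}


def tfidf_vectors(docs, vocabulary):
    """Sparse TF-IDF vector per document, counting only terms in `vocabulary`.

    Assumed variant: tf = count / category terms in the document,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    counts = {name: Counter(t for t in tokens if t in vocabulary)
              for name, tokens in docs.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {term: (n / total) * math.log(n_docs / doc_freq[term])
                         for term, n in c.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_linkage(vectors, threshold):
    """Merge the two closest clusters (distance = 1 - cosine, single linkage)
    until the closest pair is farther apart than `threshold`."""
    clusters = [[name] for name in sorted(vectors)]

    def dist(a, b):
        return min(1.0 - cosine(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > 1:
        d, i, j = min((dist(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # The same regulations are clustered once per category.
    for category, terms in CATEGORY_TERMS.items():
        print(category, single_linkage(tfidf_vectors(DOCS, terms), 0.5))
```

In this toy data, reg_a groups with reg_b when only the applicable-party terms are counted and with reg_c when only the review-mechanism terms are counted, which is the kind of category-dependent result the abstract describes.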

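    Table of contents:
    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Motivation and Background
    1.2 Research Objectives
    1.3 Thesis Organization
    Chapter 2 Literature Review
    2.1 Text Mining and Retrieval Technique (TF-IDF)
    2.1.1 Term Frequency (TF)
    2.1.2 Inverse Document Frequency (IDF)
    2.1.3 Conclusion
    2.2 Cosine Similarity
    2.3 Cluster Analysis
    Chapter 3 System Design
    3.1 System Flow and Architecture
    3.1.1 Data Construction
    3.1.2 Text Processing
    3.1.3 Classifying Articles by Compatibility
    3.1.4 Text Mining and Retrieval
    3.1.5 Similarity Analysis
    3.1.6 Hierarchical Clustering
    3.2 Research Subject
    Chapter 4 Research Method
    4.1 Data Collection
    4.2 Text Preprocessing
    4.2.1 Stop Words
    4.2.2 Synonym Replacement
    4.2.3 Word Segmentation with a Custom Dictionary
    4.3 Compatibility Definition
    4.4 Text Mining and Retrieval (TF-IDF)
    4.4.1 Term Frequency (TF)
    4.4.2 Inverse Document Frequency (IDF)
    4.4.2 Results
    4.5 Computing Similarity
    4.6 Hierarchical Clustering
    Chapter 5 Cloud Platform Analysis and Design Workflow
    5.1 Development Environment
    5.2 Custom Compatibility Categories
    5.3 Importing Basic Data
    5.4 Custom Dictionary
    5.5 Text Segmentation
    5.6 Computing TF×IDF
    5.7 Regulation Similarity Comparison
    Chapter 6 Empirical Analysis and Results
    6.1 Compatibility Term Statistics
    6.2 Distribution Results for Each Compatibility Category
    Chapter 7 Conclusion
    7.1 Conclusion
    7.2 Difficulties Encountered
    7.3 Future Work
    References
    Appendix 1 List of Regulations
    Appendix 2 List of Constraint Terms
    Appendix 3 List of Benefit and Rights Terms
    Appendix 4 List of Legal Basis Terms
    Appendix 5 List of Applicable Party Terms
    Appendix 6 List of Review Mechanism Terms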

