| Graduate student: | 范芳瑄 Fang-Syuan Fan |
|---|---|
| Thesis title: | 應用於校內法規之分類化文字探勘與檢索技術 Classified Term Frequency-Inverse Document Frequency technique applied to school regulations |
| Advisor: | 蔡孟峰 |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Executive Master of Computer Science & Information Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Number of pages: | 73 |
| Keywords (Chinese, translated): | text mining, TF-IDF, similarity analysis, hierarchical clustering |
| Keywords (English): | text mining, TF-IDF, Cosine Similarity, Hierarchical Clustering |
This study combines the Term Frequency-Inverse Document Frequency (TF-IDF) technique with compatibility and applies it to the regulations of National Central University and the related off-campus regulations. The system is built on a cloud platform to classify these regulations.
TF-IDF on its own provides only a single quantitative measure and cannot offer diverse choices. By combining compatibility with cosine similarity, hierarchical clustering, and other techniques, a single regulation can yield different results under different compatibilities. The resulting classification gives users a wider range of options for finding relevant regulations.
Keywords: text mining, TF-IDF, cosine similarity, hierarchical clustering
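As a rough illustration of the building blocks named in the abstract, the following Python sketch computes TF-IDF weights, compares documents by cosine similarity, and groups them with hierarchical (agglomerative) clustering. It is not the thesis's implementation: the function names (`tf_idf`, `cosine`, `cluster`), the toy documents, and the choice of average linkage are assumptions made for illustration, and the compatibility-based classification is not modeled because the abstract does not define it.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} vector per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        _, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]


# Hypothetical, pre-tokenized toy "regulations" (not data from the thesis).
docs = [
    "student leave application rules".split(),
    "student leave approval procedure".split(),
    "faculty travel expense reimbursement".split(),
    "faculty travel expense limits".split(),
]
vectors = tf_idf(docs)
groups = cluster(vectors, 2)
```

Here `groups` holds the document indices in each cluster. Chinese regulation text would first need word segmentation, which this sketch sidesteps by using whitespace-separated English tokens.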