Graduate student: 劉譯閎 (Yi-Hung Liu)
Thesis title: 對於法律問題進行判例檢索和法條預測
Judgment Retrieval and Statute Prediction for Legal Problems
Advisor: 陳彥良
Oral defense committee:
Degree: Doctoral
Department: College of Management - Department of Information Management
Year of publication: 2014
Academic year of graduation: 103
Language: English
Number of pages: 77
Keywords (Chinese): 文件探勘、法條、刑事判例、向量空間模型、標準化谷歌距離、支援向量機
Keywords (English): Text Mining, Statute, Criminal Judgment, Vector Space Model, Normalized Google Distance, Support Vector Machines
    Applying text mining to legal problems has become an emerging research topic in recent years. A few previous studies focused on helping legal professionals retrieve related legal documents, but, to our knowledge, they did not account for the difficulty the general public has in describing legal problems in professional legal terms, and none of them predicted relevant statutes from a problem statement. In this dissertation, we address two research topics that exploit the characteristics of legal documents: judgment retrieval and statute prediction. For the first topic, we design a text-mining-based method that lets the general public use everyday vocabulary to search for and retrieve relevant criminal judgments. For the second, we present the three-phase prediction (TPP) algorithm, which lets laypeople describe their problems in everyday vocabulary and find the statutes pertinent to their cases. Two experiments validate the proposed methods. The first compares the traditional TF-IDF method with our judgment retrieval approach through a survey. The second addresses statute prediction and compares TPP with four well-known retrieval functions: cosine similarity, the Pearson correlation coefficient, Spearman's correlation coefficient, and TF-IDF. Both experiments use Chinese criminal judgments as the dataset. The results indicate that both proposed methods are effective and accurate and that they outperform the traditional methods.
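    The baseline retrieval functions named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's judgment retrieval method or the TPP algorithm; it only shows, under stated assumptions, how TF-IDF vectors might be ranked with cosine similarity or the Pearson correlation coefficient. The function names, the toy English tokens, and the `tf * log(N / df)` weighting are all assumptions made for illustration. Real Chinese judgments would first need word segmentation, which is not shown.

```python
import math
from collections import Counter


def build_idf(docs):
    """Inverse document frequency, log(N / df), from tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(tokens, idf, vocab):
    """TF-IDF vector over a fixed vocabulary; unknown tokens are ignored."""
    tf = Counter(tokens)
    return [tf[term] * idf[term] for term in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    """Pearson correlation, computed as the cosine of mean-centered vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def rank(query, docs, score=cosine):
    """Document indices ordered by descending score (ties broken by index)."""
    idf = build_idf(docs)
    vocab = sorted(idf)
    q = vectorize(query, idf, vocab)
    scored = [(score(q, vectorize(d, idf, vocab)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical toy corpus: each "judgment" is already tokenized.
docs = [
    ["stole", "wallet", "train"],
    ["hit", "neighbor", "fight"],
    ["forged", "signature", "contract"],
]
query = ["someone", "stole", "my", "wallet"]

print(rank(query, docs))                 # ranking by cosine similarity
print(rank(query, docs, score=pearson))  # ranking by Pearson correlation
```

    In this sketch the query is projected onto the vocabulary of the collection, so everyday words that never occur in any judgment contribute nothing to the score. That vocabulary gap between laypeople and legal documents is the difficulty the abstract describes.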

    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
        1.1. Considering the judgment aspect of legal problems
        1.2. Considering the statute aspect of legal problems
        1.3. Organization of the Dissertation
    Chapter 2. Literature Review
        2.1. Background
        2.2. An overview of text mining
        2.3. Applications of text mining
        2.4. Related academic research on text mining in the legal domain
    Chapter 3. Retrieving associated judgments for legal problems
        3.1. Definitions
        3.2. The Judgment Retrieval Approach
            3.2.1. Phase 1: Training set generation
            3.2.2. Phase 2: Query
        3.3. Experimental Study
            3.3.1. Data Collection
            3.3.2. Details of implementation
            3.3.3. Experimental results and evaluation
        3.4. Summary
    Chapter 4. Predicting relevant statutes for legal problems
        4.1. Differences between legal documents and normal documents
        4.2. The Three-Phase Prediction Approach
            4.2.1. Phase 1: Select the top k1 statutes
            4.2.2. Phase 2: Select the top k2 statutes
            4.2.3. Phase 3: Select the final predicted statutes
        4.3. Experimental Study
            4.3.1. Testbed
            4.3.2. Details of implementation
            4.3.3. Experimental results and evaluation
                4.3.3.1. Find the optimal combination
                4.3.3.2. Comparison
        4.4. Summary
    Chapter 5. Discussions and Limitations
        5.1. Findings
        5.2. Limitations
    Chapter 6. Conclusions and Future Works
    References
    Appendix

