| Graduate student: | 胡蕙玲 Hui-Ling Hu |
|---|---|
| Thesis title: | 典型資料模式挖掘研究 The Research of Typical Pattern Mining |
| Advisor: | 陳彥良 Yen-Liang Chen |
| Oral examination committee: | |
| Degree: | Doctor |
| Department: | College of Management - Department of Information Management |
| Graduation academic year: | 94 |
| Language: | English |
| Number of pages: | 120 |
| Chinese keywords: | 資料挖掘、典型資料模式挖掘、叢集 |
| English keywords: | Data mining, Typical patterns mining, Clustering |
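In recent years, advances in information technology have produced many techniques and methods for mining useful and interesting information patterns, including concept description, association rules, classification and prediction, clustering, and evolution analysis. This thesis proposes a new kind of information pattern, called the typical pattern, which gives decision makers a better understanding of a given dataset. Given a dataset of n objects, each described by a set of attribute values, typical pattern mining selects from the dataset a compact and suitable subset of k objects to represent the whole dataset. Based on this definition, this research proposes typical pattern mining methods and applies them to several real datasets to find useful typical patterns. In addition, because automated typical pattern mining cannot draw on the user's expertise and experience, this research also proposes a dynamic, user-interactive typical pattern mining method that lets users adjust parameters according to their experience and domain knowledge to obtain better mining results. Based on the proposed interactive model, this thesis develops a user-interactive typical pattern mining system and uses it to mine typical journals in the information systems field, providing a more effective approach than static typical pattern mining.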
Many approaches have been proposed to discover useful information patterns from databases, such as concept description, associations, sequential patterns, classification, clustering, and deviation detection. This research proposes a new type of information pattern, called typical patterns, which can give decision makers a better understanding of a given dataset. Suppose we are given a dataset containing n objects, each of which is described by a set of attribute values. Typical pattern mining selects a small subset of k objects from these n objects such that the k chosen objects form a compact and suitable representation of the original dataset. Accordingly, Typical Patterns Mining (TPM) algorithms have been developed to mine typical patterns from databases, and extensive experiments on real datasets have been carried out to demonstrate the usefulness of typical patterns in practical situations. Although TPM determines typical patterns automatically, it lacks the ability to accommodate users' experience and domain knowledge, which are crucial for making decisions in a dynamic business environment. Therefore, this research also develops a dynamic and interactive approach to typical pattern mining, called interactive Typical Pattern Mining (iTPM). This approach accommodates users' experience and knowledge by allowing them to adjust the parameters iteratively during the interactive process. An iTPM system is then developed to mine typical journals in the IS field. The experimental results indicate that iTPM is more effective than the previous static approach.
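The selection problem stated in the abstract (choose k of n objects so that the chosen objects represent the whole dataset) can be illustrated with a small sketch. The code below is not the TPM algorithm from the thesis, whose details the abstract does not give. It is a generic greedy heuristic in the spirit of k-medoids, and it assumes that objects are numeric tuples, that Euclidean distance measures dissimilarity, and that a good representation minimizes the total distance from each object to its nearest chosen representative. The function name `select_representatives` is hypothetical.

```python
from math import dist, inf


def select_representatives(objects, k):
    """Greedily pick k indices into `objects`.

    Assumption (not from the thesis): representation quality is the sum of
    Euclidean distances from every object to its nearest chosen object.
    """
    if not 0 < k <= len(objects):
        raise ValueError("k must be between 1 and the number of objects")

    chosen = []
    # nearest[i] holds the distance from object i to its closest representative so far
    nearest = [inf] * len(objects)

    for _ in range(k):
        best_idx, best_cost = None, inf
        for c, candidate in enumerate(objects):
            if c in chosen:
                continue
            # Total cost if this candidate were added to the chosen set
            cost = sum(
                min(nearest[i], dist(obj, candidate))
                for i, obj in enumerate(objects)
            )
            if cost < best_cost:
                best_idx, best_cost = c, cost
        chosen.append(best_idx)
        representative = objects[best_idx]
        nearest = [
            min(nearest[i], dist(obj, representative))
            for i, obj in enumerate(objects)
        ]
    return chosen


# Two well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
representatives = select_representatives(points, 2)
```

With two well-separated groups, as in the sample `points`, the heuristic picks one representative from each group. A greedy pass like this is not guaranteed to find the optimal subset, and an interactive variant in the sense of iTPM would additionally let the user change parameters such as k or the distance measure between runs.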