
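Graduate student: Shin-Yi Wu (吳欣怡)
Thesis title: Interval-based and Point-based Sequential Pattern Mining (區間式及點式序列樣式探勘)
Advisor: Yen-Liang Chen (陳彥良)
Oral examination committee:
Degree: Doctoral
Department: College of Management, Department of Information Management
Academic year of graduation: 95
Language: English
Number of pages: 103
Keywords (translated from Chinese): sequential patterns, time-interval patterns, hybrid sequential patterns, data mining, interval sequential patterns
Keywords (English): Data Mining, Sequential Patterns, Temporal Patterns, Interval-based Event Sequence, Hybrid Event Sequence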
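  • Data mining techniques can be applied in many fields, such as marketing analysis, decision support, fraud detection, and business management. The field has developed many techniques for extracting useful information from large volumes of data, and sequential pattern mining is one of the most important. Earlier sequential pattern mining techniques were mostly designed for point-based events; that is, every event in the sequence data they mine occurs at a single point in time. In many applications, however, an event does not necessarily occur at a single point in time but may persist over a period; such an event is called an interval-based event. Finding frequent patterns in sequences composed of interval-based events is called interval sequential pattern mining. In other applications, the events in a sequence are not exclusively point-based or interval-based; both kinds may occur. Such sequences are called hybrid event sequences, and finding frequent sequences in them is called hybrid sequential pattern mining. Because traditional sequential pattern mining methods cannot mine interval event sequences or hybrid sequential patterns, this dissertation proposes two methods, one for mining interval sequential patterns and one for mining hybrid sequential patterns. A series of experiments on both synthetic and real data shows that both methods are efficient and effective.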


    Data mining is useful in various domains, such as market analysis, decision support, fraud detection, and business management. Many approaches have been proposed to extract information from data, and sequential pattern mining is one of the most important. Previous studies of sequential pattern mining have discovered patterns from point-based event sequences. However, in some applications, event sequences may contain interval-based events or hybrid events (both point-based and interval-based events). Frequent patterns discovered from interval-based event sequences are called temporal patterns, and those discovered from hybrid event sequences are called hybrid temporal patterns. Because the existing methods for discovering sequential patterns cannot mine temporal patterns or hybrid temporal patterns, this study develops new methods to discover both. Both proposed methods have been verified for efficiency and effectiveness using synthetic and real datasets.
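To make the distinction between point-based, interval-based, and hybrid event sequences concrete, the following is a minimal Python sketch. It is an illustration only and not the dissertation's algorithm or representation: the `Event` class, the `+`/`-` endpoint tokens, and the `support` function are hypothetical names introduced here, and the containment test is a plain subsequence check that ignores ties between simultaneous endpoints.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; a point-based event has start == end."""
    label: str
    start: int
    end: int

    @property
    def is_point(self) -> bool:
        return self.start == self.end


def endpoint_sequence(events: List[Event]) -> List[Tuple[int, str]]:
    """Flatten a hybrid sequence into time-ordered endpoint tokens.

    A point event yields one token (its label); an interval event yields
    "label+" at its start time and "label-" at its end time.
    """
    tokens = []
    for e in events:
        if e.is_point:
            tokens.append((e.start, e.label))
        else:
            tokens.append((e.start, e.label + "+"))
            tokens.append((e.end, e.label + "-"))
    return sorted(tokens)


def contains(events: List[Event], pattern: List[str]) -> bool:
    """True if the pattern tokens occur in order in the endpoint sequence."""
    it = iter(tok for _, tok in endpoint_sequence(events))
    # "tok in it" advances the iterator, so order is enforced.
    return all(tok in it for tok in pattern)


def support(database: List[List[Event]], pattern: List[str]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database) / len(database)


# Two hybrid sequences: A and B are interval events, p is a point event.
seq1 = [Event("A", 1, 5), Event("p", 3, 3), Event("B", 4, 8)]
seq2 = [Event("A", 2, 4), Event("B", 6, 9), Event("p", 7, 7)]
database = [seq1, seq2]

# "A overlaps B": A starts, B starts, A ends, B ends.
overlap = ["A+", "B+", "A-", "B-"]
print(support(database, overlap))      # only seq1 matches
print(support(database, ["A+", "p"]))  # both sequences match
```

A pattern would be called frequent when its support meets a user-chosen minimum threshold. The endpoint encoding is used here because it lets a single ordered list carry both kinds of event; whether it matches the format defined in the dissertation is not stated in this record.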

    ABSTRACT
    中文摘要 (Chinese Abstract)
    誌謝 (Acknowledgements)
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
      1.1 APPLICATIONS OF TEMPORAL PATTERN MINING
      1.2 APPLICATIONS OF HYBRID TEMPORAL PATTERN MINING
      1.3 ORGANIZATION OF THIS DISSERTATION
    CHAPTER 2 RELATED WORKS
      2.1 BACKGROUND
      2.2 DATA MINING RESEARCHES
      2.3 SEQUENTIAL PATTERN MINING RESEARCHES
      2.4 CLASSIC SEQUENTIAL PATTERN MINING METHODS
        2.4.1 GSP
        2.4.2 PrefixSpan
    CHAPTER 3 TEMPORAL PATTERN MINING
      3.1 MOTIVATION
      3.2 NONAMBIGUOUS REPRESENTATION
        3.2.1 Problem Definition
        3.2.2 Why Our Format is Unambiguous
      3.3 ALGORITHM FOR MINING TEMPORAL PATTERNS
        3.3.1 Data Transformation
        3.3.2 The TPrefixSpan Algorithm
        3.3.3 Correctness and Completeness
      3.4 EXPERIMENTS
        3.4.1 Performance Evaluation
        3.4.2 Real Case Analyses
        3.4.3 Predictive Accuracy
      3.5 SUMMARY
    CHAPTER 4 HYBRID TEMPORAL PATTERN MINING
      4.1 PROBLEM DEFINITIONS
      4.2 TEMPORAL RELATIONS BETWEEN HYBRID EVENTS
      4.3 ALGORITHM FOR MINING HYBRID TEMPORAL PATTERNS
      4.4 EXPERIMENTS
        4.4.1 Performance Evaluation
        4.4.2 Real Case Analyses
      4.5 SUMMARY
    CHAPTER 5 USAGE GUIDE
      5.1 IN FINANCE DOMAIN
      5.2 IN ELECTRONIC COMMERCE DOMAIN
    CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
    REFERENCES

