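| Graduate student: | 楊慧如 Hui-Ru Yang |
|---|---|
| Thesis title: | 在序列資料庫中挖掘多重時間間隔樣式 Discovering multi-time-interval sequential patterns in sequence database |
| Advisor: | 陳彥良 Yen-Liang Chen |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 資訊管理學系 College of Management - Department of Information Management |
| Graduation academic year: | 92 |
| Language: | English |
| Number of pages: | 61 |
| Chinese keywords: | 知識挖掘 (knowledge discovery), 序列樣式 (sequential patterns), 時間間隔 (time interval), 多重時間間隔 (multi time interval), 資料挖礦 (data mining) |
| English keywords: | Data mining, knowledge discovery, sequential patterns, multi time interval, time interval |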
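Sequential pattern mining plays a very important role in many applications, including computational biology research, customer behavior analysis, and system performance studies. However, conventional sequential pattern mining seldom takes time intervals into account. After Chen, Chiang, and Ko proposed time-interval sequential pattern mining, we found that mining only the time intervals between successive pairs of items is not enough: patterns covering the time intervals between all items must be found to give decision makers sufficiently detailed support. We therefore propose two algorithms, MI-Apriori and MI-PrefixSpan, adapted from the Apriori and PrefixSpan algorithms respectively. MI-PrefixSpan is more efficient than MI-Apriori, whereas MI-Apriori scales better.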
Sequential pattern mining is of great importance in many applications, including computational biology, consumer behavior analysis, and system performance analysis. Recently, Chen, Chiang, and Ko proposed an extension of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items. For example: having bought a laser printer, a customer returns to buy a scanner in three months and then a CD burner in six months. Although time-interval sequential patterns are useful in predicting when the customer will take the next step, they cannot determine when the next k steps will be taken. Hence, we present two efficient algorithms, MI-Apriori and MI-PrefixSpan, to solve this problem. The experimental results show that MI-PrefixSpan is faster than MI-Apriori, but MI-Apriori has better scalability.
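To make the distinction in the abstract concrete, the following is a minimal sketch of how a multi-time-interval pattern differs from a pattern that constrains only successive items. It is a brute-force containment check written for illustration, not MI-Apriori or MI-PrefixSpan. The interval buckets, the function names `bucket`, `contains`, and `support`, and the toy purchase data are all assumptions that do not come from the thesis.

```python
from itertools import combinations

# Hypothetical interval buckets in months, each covering (low, high].
# These boundaries are assumed for illustration only.
BUCKETS = {"I1": (0, 3), "I2": (3, 6), "I3": (6, float("inf"))}


def bucket(gap):
    """Return the label of the bucket containing a positive time gap, else None."""
    for label, (low, high) in BUCKETS.items():
        if low < gap <= high:
            return label
    return None


def contains(sequence, items, intervals):
    """Check whether a time-stamped sequence contains a pattern.

    sequence:  list of (item, time) pairs sorted by time
    items:     pattern items in order
    intervals: dict mapping (i, j), i < j, to the bucket label required for
               the gap between pattern item i and pattern item j
    """
    # Try every way of picking len(items) events in time order (brute force).
    for positions in combinations(range(len(sequence)), len(items)):
        chosen = [sequence[p] for p in positions]
        if [item for item, _ in chosen] != items:
            continue
        if all(bucket(chosen[j][1] - chosen[i][1]) == label
               for (i, j), label in intervals.items()):
            return True
    return False


def support(database, items, intervals):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(contains(s, items, intervals) for s in database) / len(database)


# Toy purchase histories: (item, month of purchase).
database = [
    [("printer", 0), ("scanner", 2), ("burner", 7)],
    [("printer", 0), ("scanner", 1), ("burner", 5)],
    [("printer", 1), ("ink", 2), ("scanner", 3), ("burner", 8)],
]
items = ["printer", "scanner", "burner"]

# Constrains only successive items (time-interval pattern).
successive = {(0, 1): "I1", (1, 2): "I2"}
# Also constrains the gap between printer and burner (multi-time-interval pattern).
all_pairs = {(0, 1): "I1", (1, 2): "I2", (0, 2): "I3"}

print(support(database, items, successive))
print(support(database, items, all_pairs))
```

In this toy data all three histories satisfy the successive-interval pattern, but the second history reaches the CD burner after only five months, so it fails the extra printer-to-burner constraint. That extra constraint is the kind of information about later steps that, according to the abstract, successive intervals alone cannot provide.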
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of 1994 International Conference on Very Large Data Bases, 487–499.
Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. Proceedings of 1995 International Conference on Data Engineering, 3–14.
Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866–883.
Chen, Yen-Liang; Chiang, Mei-Ching; Ko, Ming-Tat (2003). Discovering time-interval sequential patterns in sequence databases. Expert Systems with Applications, 25(3), 343-354.
Chen, Yen-Liang; Chen, Shih-Sheng; Hsu, Ping-Yu (2002). Mining hybrid sequential patterns and sequential rules. Information Systems, 27(5), 345-362.
Cowan, Adrian M. (2000). Data Mining in Finance: Advances in Relational and Hybrid Methods: Boris Kovalerchuk and Evgenii Vityaev (Eds.), Kluwer Academic Publishers, Norwell, Massachusetts, 2000, HB US $120, ISBN 0-7923-7804-0. International Journal of Forecasting, 18(1), 155-156.
Frawley, W. J., Piatetsky-Shapiro, G., & Matheus, C. J. (1991). Knowledge discovery in databases: An overview. Cambridge, MA: AAAI/MIT press.
H. J. Loether and D. G. McTavish. (1993). Descriptive and Inferential Statistics: An Introduction. Allyn and Bacon.
Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. (1997). Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3), 259-289.
J. Han, W. Gong, and Y. Yin. (1998). Mining Segment-Wise Periodic Patterns in Time-Related Databases. Proc. of 1998 Int'l Conf. on Knowledge Discovery and Data Mining (KDD'98), 214-218, New York City, NY.
J. Han, G. Dong and Y. Yin. (1999). Efficient mining of partial periodic patterns in time series database. In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), Sydney.
J. Han. (1999). Data Mining. In J. Urban and P. Dasgupta (Eds.), Encyclopedia of Distributed Computing, Kluwer Academic Publishers.
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, M.-C. Hsu. (2000). FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining. Proc. 2000 Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), 355-359.
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. (2001). PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Proc. 2001 Int. Conf. on Data Engineering (ICDE'01).
J. Han. (2002). How Can Data Mining Help Bio-Data Analysis? Proc. 2002 Workshop on Data Mining in Bioinformatics (with SIGKDD02 Conf.).
J. Yang, P. Yu, W. Wang, and J. Han. (2002). Mining Long Sequential Patterns in a Noisy Environment. In Proc. of 2002 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'02), Madison, WI.
Lee, Anthony J.T.; Wang, Yao-Te. (2003). Efficient data mining for calling path patterns in GSM networks. Information Systems, 28(8), 929-948.
Mannila, H., Toivonen, H., and Verkamo, A.I. (1995). Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD ’95). Montréal, Canada. 210–215.
Mannila, H. and Toivonen, H. (1996). Discovering generalized episodes using minimal occurrences. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD ’96). Portland, OR. 146–151.
M.-S. Chen, J.-S. Park and P. S. Yu. (1998). Efficient Data Mining for Path Traversal Patterns. IEEE Trans. on Knowledge and Data Engineering, 10(2), 209-221.
R. Srikant, R. Agrawal. (1996). Mining Sequential Patterns: Generalizations and Performance Improvements. In Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France. Expanded version available as IBM Research Report RJ 9994.
Sherri K. Harms, Jitender Deogun, Tsegaye Tadesse. (2002). Discovering Sequential Association Rules with Constraints and Time Lags in Multiple Sequences. Lecture Notes in Artificial Intelligence, 2366, 432.
Usama M. Fayyad, Gregory Piatetsky-Shapiro, Ramasamy Uthurusamy. (2003). Summary from the KDD-03 panel: data mining: the next 10 years. ACM SIGKDD Explorations Newsletter, 5(2), 191-196.