
Graduate student: 王瑞閔 (Jui-Ming Wang)
Thesis title: 運用極限梯度提升樹與關聯規則於零售業的顧客分群
Using XGBoost and Association Rules for Customer Segmentation in Retail Industry
Advisor: 沈建文
Oral examination committee:
Degree: Master
Department: College of Management - Department of Business Administration
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: Chinese
Number of pages: 60
Keywords: Customer segmentation, Association rules, RFM model, XGBoost (extreme gradient boosting tree)
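  • In an era of rapid advances in information technology, the barrier to data
    collection has fallen and the ties between companies and their customers have
    grown closer, so understanding customers has become the most important task for
    today's marketers. Smart retail uses data to analyze customers' omnichannel
    shopping and browsing behavior, allowing companies to deliver precisely targeted
    marketing information to members with different attributes. Customer
    segmentation, which divides customers according to their characteristics, has
    therefore become an important method. Most marketers perform it with the RFM
    model and cluster analysis, but this approach still leaves considerable room for
    improvement and rarely incorporates data beyond transactions. This study
    therefore applies several machine learning methods and variables from multiple
    dimensions to customer segmentation.
    This study performs customer segmentation on data from Taiwan's retail industry.
    Extreme gradient boosting (XGBoost) is used to identify the key features, and
    several clustering algorithms (K-means, Hierarchical Clustering, Birch, and SOM)
    are then compared to find the most suitable method and result. The variables of
    each resulting group are then discussed, followed by association rule analysis;
    in addition to market basket analysis, the study also examines the association
    between web browsing and promotions. Beyond the three variables of the RFM
    model, the study identifies two further important variables that compensate for
    shortcomings of the original model, and the Birch algorithm performs well in
    clustering this data. Customers are finally divided into five groups. The
    clustering results are analyzed further to understand how the variables and
    customer behavior differ across clusters, and association rule analysis yields
    multiple association combinations that support business decisions and help
    maximize customer value.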
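    The market basket analysis mentioned above rests on three standard rule
    metrics: support, confidence, and lift. The following is a minimal sketch of
    how they are computed for a single rule, not the thesis's own code; the
    function name rule_metrics and the sample baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of item names (one set per basket).
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)         # baskets containing the antecedent
    count_c = sum(1 for t in transactions if c <= t)         # baskets containing the consequent
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # baskets containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical baskets, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]
support, confidence, lift = rule_metrics(baskets, {"milk"}, {"bread"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

    A lift below 1, as in this toy data, means the two items co-occur less often
    than they would if they were independent; rules worth acting on usually have
    lift above 1.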


    With the rapid development of information technology, companies can collect a
    wide variety of data and build closer connections with their customers.
    Understanding customers has therefore become the most important issue for
    today's marketers. By analyzing customers' shopping and browsing behavior across
    all channels, retail companies can deliver precisely targeted marketing
    information to different customers. How to divide customers according to their
    features has consequently become one of the most important research topics in
    retail. Most marketers use the RFM model and cluster analysis for customer
    segmentation, but this approach still leaves room for improvement, and it
    typically draws only on transaction data. This study therefore applies several
    machine learning methods to a variety of variables for customer segmentation
    analysis.
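    For readers unfamiliar with the RFM model, the sketch below shows one common
    way to derive Recency, Frequency, and Monetary values from raw transactions.
    It is an illustration under stated assumptions, not the thesis's procedure: the
    record layout (customer_id, purchase_date, amount), the function name
    rfm_table, and the sample data are all hypothetical.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """Build per-customer RFM values.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: reference date used to measure recency in days.
    """
    table = {}
    for customer_id, purchase_date, amount in transactions:
        row = table.setdefault(
            customer_id, {"last": purchase_date, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], purchase_date)  # most recent purchase so far
        row["frequency"] += 1                          # number of purchases
        row["monetary"] += amount                      # total spend
    return {
        cid: {
            "recency": (as_of - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for cid, row in table.items()
    }


# Hypothetical transactions, for illustration only.
sample = [
    ("A", date(2022, 1, 5), 120.0),
    ("A", date(2022, 3, 1), 80.0),
    ("B", date(2021, 12, 20), 300.0),
]
rfm = rfm_table(sample, as_of=date(2022, 3, 31))
print(rfm)
```

    All three values come from the transaction log alone, which illustrates the
    limitation noted above: any variable drawn from outside the transaction data
    has to be added separately.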
    In this thesis, customer segmentation analysis is conducted on data from
    Taiwan's retail industry. I use extreme gradient boosting (XGBoost) to identify
    the important features, and then compare several clustering algorithms,
    including K-means, Hierarchical Clustering, Birch, and SOM, to find the most
    suitable clustering method and result. Finally, I discuss the variables of each
    group and apply association rule analysis. The results show that, besides the
    three features of the RFM model, two additional important variables, Period and
    NES, can compensate for the shortcomings of the original model, and that the
    Birch algorithm performs well in clustering this data. The clustering results
    indicate that customers can be divided into five segments with different
    features and purchase behaviors. Using the clustering results together with the
    association rules, retailers can design suitable marketing campaigns and improve
    their customer relationships.
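    The abstract does not say how the clustering algorithms were compared, but the
    table of contents lists the silhouette coefficient (Section 3.3.7). Assuming it
    served as the comparison criterion, the sketch below computes the mean
    silhouette coefficient in plain Python with Euclidean distance. The function
    name mean_silhouette and the sample points are hypothetical, and this is not
    the thesis's implementation.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical 2-D points with two candidate labelings, for illustration only.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good_score = mean_silhouette(points, [0, 0, 1, 1])  # nearby points grouped together
bad_score = mean_silhouette(points, [0, 1, 0, 1])   # distant points grouped together
print(good_score, bad_score)
```

    Values close to 1 indicate compact, well-separated clusters, and negative values
    indicate points that sit closer to another cluster than to their own, so the
    labeling with the higher score would be preferred.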

    Chinese Abstract
    ABSTRACT
    Acknowledgments
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Structure
    Chapter 2 Literature Review
      2.1 Research on Market Segmentation and Customer Segmentation in Retail
      2.2 Research on Applications of Association Rules
    Chapter 3 Research Methods
      3.1 Data Collection
      3.2 Key Feature Analysis
        3.2.1 Feature Scores of the XGBoost Model
        3.2.2 Correlation Filtering
      3.3 Cluster Analysis
        3.3.1 Outlier Removal
        3.3.2 Data Standardization
        3.3.3 K-means Algorithm
        3.3.4 Hierarchical Clustering
        3.3.5 Birch
        3.3.6 Self-organizing Map (SOM)
        3.3.7 Silhouette Coefficient
      3.4 Association Rule Analysis
    Chapter 4 Results
      4.1 Descriptive Statistics and Data Processing Results
      4.2 Feature Selection Results
      4.3 Cluster Analysis Results
      4.4 Association Rule Analysis Results
    Chapter 5 Conclusions and Recommendations
      5.1 Conclusions
      5.2 Research Limitations and Recommendations
    References

