Author: Ting-Rui Chen (陳廷睿)
Thesis title: Extended Clickstream: an analysis of the missing user behaviors in the Clickstream (擴展點擊流:分析點擊流中缺少的使用者行為)
Advisor: Hung-Hsuan Chen (陳弘軒)
Oral examination committee:
Degree: Master
Department: Department of Computer Science & Information Engineering (資訊電機學院 - 資訊工程學系)
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e. 2018–2019)
Language: English
Number of pages: 62
Keywords (Chinese record): clickstream (點擊流), log analysis (日誌分析), user behavior analysis (使用者行為分析), time-series recurrent prediction (時序資料回歸預測), User Behavior Model
Keywords (English record): Clickstream, web usage mining, User Behavior Model, Time-Series Recurrent Prediction
  • Clickstream data are commonly taken to represent users' online browsing behavior. We find, however, that a clickstream captures only part of that behavior. Switching between browser tabs or windows, for example, involves no interaction with the server, so it never appears in the clickstream or the server log, even though the user is still browsing. We collect these behaviors and call them the "extended clickstream". This thesis builds a complete service, recruits participants to capture the clickstream and the extended clickstream simultaneously, and analyzes the differences between the two. We then apply a multi-task deep learning model with GRU components to these time series to predict two targets: what kind of website the user will visit next, and how long the interval before the next click will be. The experimental results show that combining the clickstream with the extended clickstream improves prediction performance. We also find that the clickstream records actions the user did not intend, caused by the way certain websites operate. Finally, combining the two streams makes it possible to identify a single user who browses from several devices.
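To make the two prediction targets concrete, here is a minimal sketch of how a clickstream (CS) and an extended clickstream (ECS) might be merged into one time-ordered sequence and turned into training examples. It is not the thesis's implementation: the `Event` fields, the category labels, the bin edges in `edges`, and the function names are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary start (assumed unit)
    source: str       # "CS" = clickstream, "ECS" = extended clickstream
    category: str     # hypothetical website category label


def merge_streams(cs: Sequence[Event], ecs: Sequence[Event]) -> List[Event]:
    """Interleave both streams into a single time-ordered sequence."""
    return sorted([*cs, *ecs], key=lambda e: e.timestamp)


def interval_class(seconds: float, edges: Sequence[float]) -> int:
    """Discretize an interval: bin i covers [edges[i-1], edges[i])."""
    return bisect_right(edges, seconds)


def make_examples(
    events: Sequence[Event], edges: Sequence[float]
) -> List[Tuple[Event, str, int]]:
    """Pair each event with two targets: the next category and the binned gap."""
    examples = []
    for current, nxt in zip(events, events[1:]):
        gap = nxt.timestamp - current.timestamp
        examples.append((current, nxt.category, interval_class(gap, edges)))
    return examples


# Hypothetical data: two server-visible clicks and one tab switch in between.
cs = [Event(0.0, "CS", "news"), Event(40.0, "CS", "video")]
ecs = [Event(12.0, "ECS", "social")]
edges = [10.0, 30.0, 60.0]  # assumed bin boundaries, in seconds

merged = merge_streams(cs, ecs)
examples = make_examples(merged, edges)
for current, next_category, gap_bin in examples:
    print(current.source, current.category, "->", next_category, gap_bin)
```

With the clickstream alone, the only example is "news, then video, after a gap in bin 2". Once the tab switch is merged in, the same period yields two examples with shorter gaps, which is the kind of extra signal the abstract attributes to the extended clickstream.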

    Abstract (Chinese)
    Abstract
    Contents
    1 Introduction
    2 Related Work
        2.1 Clickstream & Long-Term Cross-Domain Clickstream
        2.2 Post-collected Dataset
            2.2.1 Published as an Open Dataset
        2.3 Discretize the intervals between events in Time-Series data
        2.4 Multi-Task Learning (MTL)
    3 Extended Clickstream (ECS)
        3.1 What is Extended Clickstream (ECS)
        3.2 Merits of ECS
            3.2.1 Easy to understand
            3.2.2 Make CS more useful
            3.2.3 Enhance the predictive power of modeling user behavior
    4 Methods
        4.1 Phase I. - Data Collecting
            4.1.1 System Requirement
            4.1.2 Market Analysis
            4.1.3 Solution
        4.2 Phase II. - Data Preprocessing
            4.2.1 Filter unintentional event
            4.2.2 Session split
            4.2.3 Time Mapping
            4.2.4 Time Precision Alignment
            4.2.5 Summary of Data Preprocessing
        4.3 Phase III. - Model the User Behavior
    5 Results
        5.1 Collected Data
        5.2 Data Analysis
            5.2.1 Statics Analysis
            5.2.2 Case Study - Multi-device detection and Unintentional events in CS
        5.3 Model Evaluate
    6 Conclusion & Discussion
    Bibliography
    A Data Collect System

