
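Author: Tsung-Yu Hsieh (謝宗佑)
Thesis title: 基於使用者行為的數位音樂推薦方法 (A Digital Music Recommendation Method Based on User Behavior)
Advisor: Chih-Fong Tsai (蔡志豐)
Committee members:
Degree: Master
Department: College of Management - Executive Master of Information Management
Publication year: 2020
Graduation academic year: 108 (ROC calendar)
Language: Chinese
Pages: 48
Chinese keywords: 協同過濾 (collaborative filtering), Word2vec, 關聯規則 (association rules), Apache Spark
English keywords: Collaborative Filtering, Word2vec, Frequent-Pattern Growth, Apache Spark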
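  • Recommender systems are widely used by mainstream online service providers (for example Amazon, Spotify, and Netflix) to increase the visibility of services and products, which in turn prompts users to purchase products or to keep using the service. Thanks to mature Internet technology and steady advances in big-data technology, recommender systems have moved from analyzing traditional transaction data (popular purchased items) to using a variety of algorithms to predict how much a user will like a song, which enables personalized recommendation.
    This study uses users' song-rating data from Yahoo! Music. Collaborative filtering, which is currently widely used for personalized recommendation, serves as the baseline. It is supplemented by two algorithms that derive item similarity from user behavior, association rules and Word2vec, to form hybrid models. Two practical conditions are also taken into account:
    1. Temporal ordering: the training and validation sets are split using the real-life split concept.
    2. Limited number of recommended items: the top-k results are evaluated with map@5 and map@10.
    The results show that both methods improve accuracy. In addition, the approach is implemented on Apache Spark, which brings significant benefits when processing large datasets.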
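The two practical conditions above can be illustrated with a minimal Python sketch. It is not the thesis's code: the function names, the `(user, item, timestamp)` tuple layout, and the `train_fraction` parameter are hypothetical, and the split shown is a plain chronological cut that only approximates the real-life split idea. AP@k also has several normalizations in the literature; dividing by `min(k, number of relevant items)` is an assumption here.

```python
def chronological_split(events, train_fraction=0.5):
    """Split (user, item, timestamp) events so that every training
    event is no later than any validation event."""
    ordered = sorted(events, key=lambda e: e[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user. Assumes `recommended` has no duplicates and
    normalizes by min(k, len(relevant)), which is one common convention."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit position
    return precision_sum / min(k, len(relevant))


def map_at_k(recommendations, ground_truth, k):
    """Mean AP@k over users with at least one relevant item."""
    users = [u for u, items in ground_truth.items() if items]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(u, []), ground_truth[u], k)
        for u in users
    )
    return total / len(users)
```

Calling `map_at_k` with `k=5` and `k=10` on the validation half gives scores in the style of map@5 and map@10, where a hit near the top of the list counts for more than one further down.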


    Recommender systems are widely used in the online entertainment industry. By building such a system, service providers like Amazon, Spotify, and Netflix can expose as many products or as much content to their users as possible. The more satisfied their users are, the more user engagement they win.
    Take digital music services as an example. Traditionally, the system recommended music based on historical records or on the music's metadata. As technology has improved, it has become easy to process large datasets such as user-rating data or user-behavior data and to apply data mining algorithms such as collaborative filtering for personalized recommendation.
    In this study, the Yahoo! Music dataset is used. First, we tune the performance of the collaborative filtering (CF) algorithm and treat it as the baseline of our recommendation system. Second, we reshape the user-rating data so that two algorithms, Frequent-Pattern Growth (FP-growth) and Word2vec, can be applied to find the similarity between songs. Finally, the hybrid models combine the results of CF with those of FP-growth or Word2vec, and both hybrids improve the evaluation metrics map@5 and map@10. Moreover, our approach is implemented on the Apache Spark framework, which is beneficial when dealing with larger datasets in the real world.
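The reshaping and hybrid steps described in the abstract might look roughly like the following Python sketch. The abstract does not state the rating threshold or how the CF and similarity results are combined, so `min_rating`, `alpha`, the weighted-sum rule, and all function names are hypothetical. The first function groups each user's highly rated songs into one list. This matches the input shape that FP-growth (one transaction per user) and Word2vec (one "sentence" per user) generally expect.

```python
from collections import defaultdict


def ratings_to_baskets(ratings, min_rating=4):
    """Group the songs each user rated at or above `min_rating` into one
    list per user, keeping first-seen order and dropping duplicates."""
    baskets = defaultdict(list)
    for user, song, rating in ratings:
        if rating >= min_rating and song not in baskets[user]:
            baskets[user].append(song)
    return dict(baskets)


def blend_scores(cf_scores, similarity_scores, alpha=0.7):
    """Hypothetical hybrid rule: weighted sum of a CF score and a
    similarity-based score per song; a missing score counts as 0."""
    songs = set(cf_scores) | set(similarity_scores)
    return {
        s: alpha * cf_scores.get(s, 0.0)
        + (1 - alpha) * similarity_scores.get(s, 0.0)
        for s in songs
    }


def top_k(scores, k):
    """Return the k highest-scoring songs, breaking ties by song id."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:k]]
```

In this sketch the per-user lists would be handed to an FP-growth or Word2vec implementation to obtain song-to-song similarity, and `top_k(blend_scores(...), 5)` would produce the list that map@5 evaluates.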

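    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Research Objectives
    1.4 Thesis Structure
    Chapter 2 Literature Review
    2.1 Introduction to Recommender Systems
    2.2 Collaborative Filtering and Its Applications
    2.3 Association Rules (Frequent-Pattern Growth) and Their Applications
    2.4 Word2vec and Its Applications
    2.5 Parallel Computing Framework for Big Data: Apache Spark
    2.6 Dataset Splitting Method: Real-life split strategy (RLS)
    Chapter 3 Research Method
    3.1 Dataset Description
    3.2 Research Process
    3.3 Data Processing Workflow
    3.4 Model Evaluation Method
    Chapter 4 Research Results
    4.1 Collaborative Filtering Results
    4.2 Results of the Hybrid of Collaborative Filtering and Association Rules
    4.3 Results of the Hybrid of Collaborative Filtering and Word2vec
    4.4 Analysis of Results
    Chapter 5 Conclusions and Suggestions
    5.1 Conclusions
    5.2 Research Contributions
    5.3 Research Limitations
    5.4 Future Research Directions and Suggestions
    References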

