| Graduate Student: | 周凱緯 Kai-Wei Chou |
|---|---|
| Thesis Title: | Creation and Study of a Formula for the Future Performance Efficiency of Key NBA Players |
| Advisor: | 洪盟凱 Meng-Kai Hong |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | College of Science - Department of Mathematics |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | Chinese |
| Pages: | 40 |
| Chinese Keywords: | 美國職業籃球聯賽、機器學習、K-平均演算法、特徵工程、多項式迴歸 |
| English Keywords: | NBA, Machine Learning, K-means Clustering, Feature Engineering, Polynomial Regression |
This thesis is inspired by sabermetrics, which has recently attracted wide attention in Major League Baseball (MLB). Sabermetrics analyzes traditional basic statistics and combines them into advanced metrics. The main goal of this thesis is therefore to design a formula that uses basic statistics to evaluate the future performance efficiency of key players on National Basketball Association (NBA) teams. The data for this research were collected from the NBA official website and from Basketball-Reference.
In this study, we first cluster all of the data and, through extensive observation of the results, select the key players based on the clustering. We then focus on predicting the performance of these key players from their data. Several feature engineering techniques are employed in the process, including a self-designed method, which yields better results than the other techniques. Finally, a comparison table reports the training-set and test-set scores obtained when additional models are combined with the feature engineering methods used earlier, comparing the results across the different models and feature engineering techniques.
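The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the player statistics are synthetic, the choice of three clusters, the "highest average minutes" rule for picking the key-player cluster, and the degree-2 polynomial are all assumptions made for the example, using scikit-learn.

```python
# Hypothetical sketch: cluster players with K-means, take the cluster with
# the highest average minutes as the "key players", then fit a polynomial
# regression and compare training- and test-set scores. All column meanings,
# thresholds, and targets here are illustrative, not from the thesis.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
# Synthetic per-player season stats: minutes, points, assists, rebounds.
stats = rng.normal(size=(200, 4)) * [8.0, 6.0, 2.0, 3.0] + [20.0, 10.0, 3.0, 5.0]

# Step 1: cluster all players; treat the cluster with the highest mean
# minutes (column 0) as the key-player group.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(stats)
key_cluster = int(np.argmax(
    [stats[km.labels_ == k][:, 0].mean() for k in range(3)]
))
key = stats[km.labels_ == key_cluster]

# Step 2: predict a (synthetic) future-efficiency target for the key players
# with polynomial regression, reporting train and test R^2 scores.
y = key[:, 1] * 0.6 + key[:, 2] * 1.5 + rng.normal(scale=1.0, size=len(key))
X_tr, X_te, y_tr, y_te = train_test_split(key, y, random_state=0)
model = make_pipeline(StandardScaler(), PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_tr, y_tr)
print(f"train R^2: {model.score(X_tr, y_tr):.3f}")
print(f"test  R^2: {model.score(X_te, y_te):.3f}")
```

In this sketch the gap between the training and test scores plays the role of the comparison table in the thesis: trying other models or feature engineering steps in place of `PolynomialFeatures` and re-reading the two scores mirrors the model-by-method comparison described above.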