| Graduate Student: | 周凱緯 Kai-Wei Chou |
|---|---|
| Thesis Title: | Creation and Study of a Formula for the Future Performance Efficiency of Key NBA Players |
| Advisor: | 洪盟凱 Meng-Kai Hong |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | College of Science - Department of Mathematics |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | Chinese |
| Pages: | 40 |
| Chinese Keywords: | 美國職業籃球聯賽、機器學習、K-平均演算法、特徵工程、多項式迴歸 |
| English Keywords: | NBA, Machine Learning, K-means Clustering, Feature Engineering, Polynomial Regression |
This thesis is inspired by sabermetrics, which has recently attracted wide attention in Major League Baseball (MLB). Sabermetrics analyzes traditional basic statistics and combines them into advanced metrics. The main goal of this thesis is therefore to design a formula that uses basic statistics to evaluate the future performance efficiency of key players on National Basketball Association (NBA) teams. The data for this research were collected from the NBA official website and from Basketball-Reference.
In this study, we first cluster all of the data and, through extensive observation of the results, select the key players based on the clustering. We then focus on predicting the performance of these key players from their data. Several feature engineering techniques are employed in the process, including a self-designed method, which yields better results than the other techniques. Finally, a comparison table reports the training-set and test-set scores obtained when additional models are combined with the feature engineering methods used earlier, comparing the results across the different models and feature engineering techniques.
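The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the player statistics are synthetic, the choice of three clusters, the "highest average minutes" rule for picking the key-player cluster, and the degree-2 polynomial are all assumptions made for the example, using scikit-learn.

```python
# Hypothetical sketch: cluster players with K-means, take the cluster with
# the highest average minutes as the "key players", then fit a polynomial
# regression and compare training- and test-set scores. All column meanings,
# thresholds, and targets here are illustrative, not from the thesis.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
# Synthetic per-player season stats: minutes, points, assists, rebounds.
stats = rng.normal(size=(200, 4)) * [8.0, 6.0, 2.0, 3.0] + [20.0, 10.0, 3.0, 5.0]

# Step 1: cluster all players; treat the cluster with the highest mean
# minutes (column 0) as the key-player group.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(stats)
key_cluster = int(np.argmax(
    [stats[km.labels_ == k][:, 0].mean() for k in range(3)]
))
key = stats[km.labels_ == key_cluster]

# Step 2: predict a (synthetic) future-efficiency target for the key players
# with polynomial regression, reporting train and test R^2 scores.
y = key[:, 1] * 0.6 + key[:, 2] * 1.5 + rng.normal(scale=1.0, size=len(key))
X_tr, X_te, y_tr, y_te = train_test_split(key, y, random_state=0)
model = make_pipeline(StandardScaler(), PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_tr, y_tr)
print(f"train R^2: {model.score(X_tr, y_tr):.3f}")
print(f"test  R^2: {model.score(X_te, y_te):.3f}")
```

In this sketch the gap between the training and test scores plays the role of the comparison table in the thesis: trying other models or feature engineering steps in place of `PolynomialFeatures` and re-reading the two scores mirrors the model-by-method comparison described above.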