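| Graduate student: | 萬柏良 Bo-Liang Wan |
|---|---|
| Thesis title: | Predicting Major League Baseball Batters' Salaries with Machine Learning Methods (以機器學習方法預測美國職棒大聯盟打者薪資) |
| Advisors: | 洪盟凱 John M. Hong, 胡中興 Chung-Hsing Alex Hu |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Science - Department of Mathematics |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | Chinese |
| Number of pages: | 38 |
| Keywords (Chinese): | 美國職棒大聯盟、機器學習、薪資預測 |
| Keywords (English): | Major League Baseball, Machine Learning, Salary prediction |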
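This study aims to predict the salaries of Major League Baseball (MLB) batters. From each batter's
historical batting statistics (hits, runs, home runs, ...), fielding statistics (putouts, assists,
errors, ...), and other records (year, years of service, age, games played, games started), we
select suitable independent variables, take the following year's salary as the dependent variable,
and train several regression models. Records from the 2003-2014 seasons are used for training, and
the models predict the salaries MLB batters will receive after the 2015 season.
Data preprocessing consisted of three steps:
1. Data for imported players (from the Cuban league, the Venezuelan Professional Baseball League,
the Dominican Winter Baseball League, ...) were excluded.
2. The natural logarithm of salary was taken.
3. The original data recorded only each single year's performance statistics (batting and
fielding). These were replaced with the sums of the performance statistics (batting and fielding)
over the most recent five years.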
This research aims to predict the salaries of Major League Baseball batters. The batters' batting
records (H, R, HR, ...), fielding records (PO, E, A, ...), and other records (year, seniority, age,
G, GS) serve as candidate independent variables. Through feature engineering, we select suitable
feature variables, which are then used to train a prediction model. This research uses records
from 2003-2014 as the dataset for a regression model that predicts batters' salaries after 2015.
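The pairing of one season's records with the following year's salary, and the split by season, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy rows, the field names, the helper functions `attach_next_year_salary` and `split_by_year`, and the choice to split on the season of the features (rather than the salary year) are all hypothetical.

```python
# Hypothetical toy rows: (player_id, season, hits, salary_usd).
# All names and numbers are illustrative and not taken from the thesis data.
records = [
    ("p1", 2013, 150, 1_000_000),
    ("p1", 2014, 160, 2_000_000),
    ("p1", 2015, 140, 3_000_000),
    ("p1", 2016, 120, 3_500_000),
    ("p2", 2014, 90, 500_000),
    ("p2", 2015, 110, 600_000),
]


def attach_next_year_salary(rows):
    """Pair each season's stats with the same player's salary one year later."""
    salary = {(pid, year): sal for pid, year, _, sal in rows}
    paired = []
    for pid, year, hits, _ in rows:
        next_salary = salary.get((pid, year + 1))
        if next_salary is not None:  # skip seasons with no salary on record next year
            paired.append(
                {"player": pid, "year": year, "hits": hits, "next_salary": next_salary}
            )
    return paired


def split_by_year(rows, last_train_year):
    """Split by season so that no later season leaks into the training set."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


paired = attach_next_year_salary(records)
# Assumption: seasons up to and including 2014 are used for training.
train, test = split_by_year(paired, last_train_year=2014)
```

Splitting by season rather than at random keeps the evaluation closer to the stated use case, where a model fitted on earlier seasons is asked about later ones.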
In data preprocessing, we did three things:
1. Dropped the data of international players (from Serie Nacional de Béisbol, the Venezuelan
Professional Baseball League, the Dominican Professional Baseball League, ...).
2. Took the natural logarithm of salary.
3. The original data table records each year's performance separately (batting record, fielding
record). We changed this to the sum of the last five years' performance records (batting record,
fielding record).
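Steps 2 and 3 above can be sketched in a few lines. This is an illustration only, under assumptions the abstract does not settle: the five-year window is taken to include the current season, seasons with no record contribute zero, and the toy rows and the helper `five_year_sums` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy rows: (player_id, season, hits, salary_usd).
rows = [
    ("p1", 2009, 100, 400_000),
    ("p1", 2010, 110, 450_000),
    ("p1", 2011, 120, 500_000),
    ("p1", 2012, 130, 700_000),
    ("p1", 2013, 140, 900_000),
    ("p1", 2014, 150, 1_000_000),
    ("p2", 2013, 80, 480_000),
    ("p2", 2014, 90, 500_000),
]


def five_year_sums(rows, window=5):
    """Sum a stat over the current season and the (window - 1) seasons before it."""
    by_player = defaultdict(dict)
    for pid, year, hits, _ in rows:
        by_player[pid][year] = by_player[pid].get(year, 0) + hits
    sums = {}
    for pid, seasons in by_player.items():
        for year in seasons:
            # Assumption: a season with no record counts as zero.
            sums[(pid, year)] = sum(
                seasons.get(y, 0) for y in range(year - window + 1, year + 1)
            )
    return sums


# Step 3: windowed sums replace the single-season values.
sums = five_year_sums(rows)

# Step 2: natural logarithm of salary.
log_salary = {(pid, year): math.log(sal) for pid, year, _, sal in rows}
```

With these toy rows, `sums[("p1", 2014)]` covers seasons 2010 through 2014, so the 2009 season falls outside the window.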