
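Author: 萬柏良
Bo-Liang Wan
Thesis title: 以機器學習方法預測美國職棒大聯盟打者薪資 (Predicting Major League Baseball Batters' Salaries with Machine Learning Methods)
Advisors: 洪盟凱
John M. Hong
胡中興
Chung-Hsing Alex Hu
Oral defense committee:
Degree: Master
Department: College of Science - Department of Mathematics
Year of publication: 2022
Academic year of graduation: 110
Language: Chinese
Number of pages: 38
Keywords (Chinese): 美國職棒大聯盟, 機器學習, 薪資預測
Keywords (English): Major League Baseball, Machine Learning, Salary prediction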
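  • This study aims to predict the salaries of Major League Baseball batters. Suitable independent
    variables are selected from each batter's yearly batting statistics (hits, runs, home runs, ...),
    fielding statistics (putouts, assists, errors, ...), and other records (year, years of service, age,
    games played, games started). The following year's salary serves as the dependent variable, and
    several regression models are trained on these data. Records from the 2003-2014 seasons are used
    for training, in order to predict the salaries Major League Baseball batters will earn from the
    2015 season onward.
    Data preprocessing consisted of three steps:
    1. Data for batters who came from foreign leagues (the Cuban league, the Venezuelan Professional
    Baseball League, the Dominican Winter Baseball League, ...) were excluded.
    2. The natural logarithm of salary was taken.
    3. The original data recorded only each year's performance statistics (batting and fielding).
    These were replaced with the sum of the performance statistics (batting and fielding) over the
    most recent five years.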


    This research aims to predict the salaries of Major League Baseball batters. The batters'
    batting records (H, R, HR, ...), fielding records (PO, A, E, ...), and other records (year,
    seniority, age, G, GS) serve as candidate independent variables. Through feature engineering,
    we select suitable feature variables and feed them into the prediction models for training.
    This research uses records from 2003-2014 as the training dataset for regression models that
    predict batters' salaries from 2015 onward.
    Data preprocessing consisted of three steps:
    1. Drop the data of international players (from the Serie Nacional de Béisbol, the Venezuelan
    Professional Baseball League, the Dominican Winter Baseball League, ...).
    2. Take the natural logarithm of salary.
    3. The original data table records each year's performance (batting and fielding records)
    separately. We replaced these values with the sum of the performance records (batting and
    fielding records) over the last five years.
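The three preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration in pandas and not the thesis's actual code: the toy table, the column names (`player`, `year`, `H`, `HR`, `salary`, `intl`), and the helper `build_features` are all hypothetical. It also assumes that the five-year window includes the current season, that it counts recorded seasons rather than calendar years, and that the regression target is the natural log of the following season's salary.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table, one row per player-season (not thesis data).
records = pd.DataFrame({
    "player": ["a", "a", "a", "a", "b", "b", "b", "c", "c"],
    "year":   [2003, 2004, 2005, 2006, 2004, 2005, 2006, 2005, 2006],
    "H":      [100, 120, 90, 110, 50, 60, 70, 80, 85],
    "HR":     [10, 15, 8, 12, 2, 3, 4, 20, 22],
    "salary": [1.0e6, 1.5e6, 2.0e6, 3.0e6, 4.0e5, 5.0e5, 6.0e5, 7.0e6, 8.0e6],
    # Assumed flag marking players who came from a foreign league.
    "intl":   [False] * 7 + [True] * 2,
})


def build_features(df, stats=("H", "HR"), window=5):
    """Apply the three preprocessing steps to a player-season table."""
    # Step 1: drop players flagged as coming from foreign leagues.
    df = df[~df["intl"]].sort_values(["player", "year"]).reset_index(drop=True)

    # Step 3: replace single-season stats with a sum over the most recent
    # `window` recorded seasons of the same player (current season included).
    for col in stats:
        rolled = df.groupby("player")[col].rolling(window, min_periods=1).sum()
        df[f"{col}_sum{window}"] = rolled.reset_index(level=0, drop=True)

    # Step 2 and target: natural log of the same player's next-season salary.
    df["log_salary_next"] = np.log(df.groupby("player")["salary"].shift(-1))

    # The last recorded season of each player has no next-year salary.
    return df.dropna(subset=["log_salary_next"])


features = build_features(records)
print(features[["player", "year", "H_sum5", "HR_sum5", "log_salary_next"]])
```

Taking the log of salary compresses the long right tail of pay, and summing over several seasons smooths out single-year fluctuations in performance. The resulting `*_sum5` columns, together with fields such as year or age, could then be passed to any regression model.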

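    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    1 Introduction
      1.1 Research Motivation
      1.2 Research Objectives
      1.3 Research Questions
      1.4 Research Subjects
    2 Background
      2.1 Baseball Statistics
      2.2 Salary Arbitration System [12]
      2.3 Statistical Methods
        2.3.1 Pearson Correlation Coefficient r
        2.3.2 Collinearity [17]
      2.4 Linear Regression Model [2]
      2.5 Decision Tree Model [14]
      2.6 Ensemble Learning [14]
        2.6.1 Bagging
        2.6.2 Boosting [3]
        2.6.3 Feature Importance [4]
      2.7 Coefficient of Determination [5]
    3 Experiments & Results
      3.1 Experimental Procedure
      3.2 Prediction Performance (Machine Learning Training and Testing)
    4 Conclusion
    References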

    [1] Charu C. Aggarwal. Outlier Analysis. Springer Cham. ISBN:978-3-319-47577-6, (2017).
    [2] Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine
    Learning. Cambridge University Press. ISBN:9781108679930, (2020).
    [3] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. The
    Annals of Statistics, Oct., 2001, Vol. 29, No. 5, pp. 1189-1232, (2001).
    [4] Joseph Gatto, Ravi Lanka, Yumi Iwashita, and Adrian Stoica. Single sample feature importance: An interpretable algorithm for low-level feature analysis. arXiv:1911.11901,
    (2019).
    [5] Stanton A. Glantz and Bryan K. Slinker. Primer of applied regression and analysis of
    variance. McGraw-Hill. ISBN:978-0070234079, (1990).
    [6] James Richard Hill and William Spellman. Pay discrimination in baseball: Data from the
    seventies. Industrial Relations, 23, 103-112, (1984).
    [7] Martin J Hirzel, Scott Schneider, and Kanat Tangwongsan. Sliding-window aggregation
    algorithms: Tutorial. DEBS ’17: Proceedings of the 11th ACM International Conference
    on Distributed and Event-based Systems. ISBN:9781450350655, (2017).
    [8] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to
    Statistical Learning with Applications in R. Springer Texts in Statistics.
    ISBN:978-1-4614-7138-7, (2013).
    [9] James R. Lackritz. Salary evaluation for professional baseball players. The American
    Statistician Vol. 44, No. 1, (1990).
    [10] Sean Lahman. Lahman’s baseball database. https://www.seanlahman.com/, (2020).
    [11] Don N. MacDonald and Morgan O. Reynolds. Are baseball players paid their marginal
    products? Managerial and Decision Economics Vol. 15, No. 5, Special Issue: The Economics of Sports Enterprises, pp. 443-457, (1994).
    [12] Major League Baseball. Salary Arbitration, (2022).
    https://www.mlb.com/glossary/transactions/salary-arbitration.
    [13] Gerald W. Scully. Pay and performance in major league baseball. American Economic
    Review. vol. 64, issue 6, 915-30, (1974).
    [14] C. Sheppard. Tree-based Machine Learning Algorithms: Decision Trees, Random Forests,
    and Boosting. CreateSpace Independent Publishing Platform ISBN:9781975860974,
    (2017).
    [15] John W Tukey. Exploratory Data Analysis. Addison-Wesley. ISBN:978-0-201-07616-5,
    (1977).
    [16] Mehmet Barlas Uzun, Gülbin Özçelikay, and Gizem Aykaç Gülpınar. The situation
    of curriculums of faculty of pharmacies in Turkey. Marmara Pharmaceutical Journal.
    21(24530):183-189, (2016).
    [17] 蕭文龍. 多變量分析最佳入門實用書 (第二版). 碁峰 ISBN:9789861817347, (2009).
