
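Author: Cheng-Yu Lee (李承祐)
Thesis title: Using Machine Learning to predict salaries of Major League Baseball players (透過機器學習預測美國職棒大聯盟球員薪資)
Advisor: Ping-Yu Hsu (許秉瑜)
Oral defense committee:
Degree: Master
Department: College of Management - Department of Business Administration
Year of publication: 2022
Academic year of graduation: 110
Language: Chinese
Number of pages: 61
Chinese keywords: 美國職棒 (Major League Baseball), 極限梯度提升 (extreme gradient boosting), 支援向量機 (support vector machine), K 鄰近法 (K-nearest neighbors), 薪資預測 (salary prediction), 分類 (classification)
English keywords: MLB, XGBoost, SVM, KNN, Predicting Salaries, Classification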
  • Major League Baseball (MLB) is one of the most closely followed sports in the world. In recent
    years, fans have paid attention not only to how players and teams perform but also to player
    salaries, which regularly prompt debate about whether a player's performance really matches
    his worth.
    How player salaries should be evaluated has therefore long been a popular topic. The most
    direct basis is a player's performance in games. Beyond the statistics a player records on the
    field, many scholars have proposed other variables that may affect salary. There is already a
    substantial body of research on MLB salaries: many factors influence salary, and some scholars
    analyze pitchers and hitters separately.
    This study therefore groups each player's current-year salary and the salary increase for the
    following year into intervals, then uses machine learning methods, namely extreme gradient
    boosting (XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN), to build
    classification models that predict the salary increase. In addition to building these models,
    the study uses XGBoost to validate the variables it newly introduces. The results show that
    the new variables can serve as a basis for predicting salary.
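The approach described in the abstract, turning the year-over-year salary change into interval classes and then classifying players, can be sketched roughly as follows. This is only an illustration under stated assumptions: the interval edges, the two normalized features, and all salary figures are made up, and a hand-written KNN stands in for the models in the thesis so that the example needs only the Python standard library.

```python
from bisect import bisect_right
from collections import Counter
from math import dist

# Hypothetical interval edges for the salary change, expressed as a fraction.
# The abstract does not state the intervals actually used in the thesis.
# Classes: 0 = decrease, 1 = 0% up to (not including) 50%, 2 = 50% or more.
RAISE_EDGES = [0.0, 0.5]


def raise_class(salary_now, salary_next, edges=RAISE_EDGES):
    """Return the index of the interval that contains the relative raise."""
    change = (salary_next - salary_now) / salary_now
    return bisect_right(edges, change)


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up rows: (normalized performance, normalized service time), salary now, salary next.
players = [
    ((0.90, 0.20), 1_000_000, 2_000_000),
    ((0.80, 0.30), 2_000_000, 3_500_000),
    ((0.85, 0.25), 1_500_000, 2_400_000),
    ((0.50, 0.50), 4_000_000, 4_400_000),
    ((0.55, 0.60), 5_000_000, 6_000_000),
    ((0.45, 0.55), 3_000_000, 3_300_000),
    ((0.20, 0.90), 8_000_000, 6_000_000),
    ((0.10, 0.80), 6_000_000, 3_000_000),
    ((0.15, 0.95), 10_000_000, 9_000_000),
]

train_x = [features for features, _, _ in players]
train_y = [raise_class(now, nxt) for _, now, nxt in players]

print(knn_predict(train_x, train_y, (0.82, 0.28)))  # query close to the large-raise group
print(knn_predict(train_x, train_y, (0.12, 0.85)))  # query close to the pay-cut group
```

KNN depends on distances, so the features here are assumed to be scaled to a comparable range beforehand; with real statistics that scaling step would matter.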

    Chinese Abstract
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1-1 Research Background
        1-2 Research Motivation
        1-3 Research Objectives
        1-4 Thesis Structure
    Chapter 2 Literature Review
        2-1 Literature on Salary Variables in Major League Baseball
    Chapter 3 Research Methods
        3-1 Research Design
        3-2 Classification Models
            3-2-1 Extreme Gradient Boosting (XGBoost)
            3-2-2 Support Vector Machine (SVM)
            3-2-3 K-Nearest Neighbors (KNN)
    Chapter 4 Analysis
        4-1 Overview of Major League Baseball
        4-2 Data Sources and Datasets
        4-3 Data Preprocessing
        4-4 Validation of Results
            4-4-1 XGBoost Model Prediction Results
            4-4-2 SVM Model Prediction Results
            4-4-3 KNN Model Prediction Results
        4-5 Comparison of Accuracy
    Chapter 5 Conclusions and Recommendations
        5-1 Research Conclusions
        5-2 Research Limitations and Recommendations
    References

