
Graduate Student: Intan Lisnawati (李思娜)
Thesis Title: Tree-Based Ensemble Methods with an Application in House Sale Price Prediction
Advisor: Shang-Yuan Shiu (須上苑)
Oral Defense Committee:
Degree: Master
Department: College of Science - Department of Mathematics
Publication Year: 2022
Academic Year of Graduation: 111 (ROC calendar)
Language: English
Pages: 79
Keywords: prediction, ensemble method, base learner, loss function


    Given some input, we want to make a prediction for the corresponding output. To improve on the prediction of a single estimator, ensemble methods combine the predictions of several base estimators built with a given learning algorithm. Each method's parameters can also be tuned to narrow the gap between the real and the predicted values. Using the House Sale Price training data set, we apply several ensemble methods to predict the unseen House Sale Price data set and assess their accuracy by their Root Mean Squared Error (RMSE). The results show that Gradient Boosting gives the smallest RMSE, US$22,766, compared with US$23,269 for Random Forest, US$24,069 for XGBoost, and US$35,637 for the Decision Tree.
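    The comparison described in the abstract can be sketched in a few lines with scikit-learn. This is not the thesis's actual experiment: it uses a synthetic regression data set in place of the Ames house-price data, and the hyperparameters (tree depth, number of estimators) are illustrative guesses, not the tuned values reported above.

```python
# Illustrative sketch: fit three tree-based regressors and compare test RMSE.
# Synthetic data stands in for the House Sale Price data set.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=200, random_state=0),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = mean_squared_error(y_test, pred) ** 0.5  # root of the MSE
    print(f"{name}: RMSE = {rmse[name]:.1f}")
```

    As in the thesis, one would expect both ensembles to beat the single decision tree, since averaging (Random Forest) or stagewise fitting of residuals (Gradient Boosting) reduces the variance or bias of the individual trees.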

    Abstract vi
    Acknowledgement vii
    Table of Contents ix
    List of Tables x
    List of Figures xii
    1 Introduction 1
      1.1 Background 1
      1.2 The Founders 2
      1.3 Goals 5
    2 Decision Trees 5
      2.1 Decision Trees Terminology 5
      2.2 Rough Idea 6
      2.3 Mathematical Formulation 10
      2.4 Simple Simulation 12
    3 Ensemble Methods 15
      3.1 Random Forests 15
        3.1.1 Rough Idea 15
        3.1.2 Bootstrapping 15
        3.1.3 Random Forests Algorithm 16
        3.1.4 Simple Simulation 17
      3.2 Gradient Boosting 18
        3.2.1 Rough Idea 18
        3.2.2 Gradient Points in the Direction of Maximum Increase 19
        3.2.3 Simple Simulation 21
        3.2.4 Plugging a Base Learner into Gradient Boosting 26
        3.2.5 Gradient Boosting Algorithm 28
      3.3 XGBoost 29
        3.3.1 Rough Idea 29
        3.3.2 XGBoost Algorithm 35
    4 Numerical Simulations 37
      4.1 Data Preprocessing 37
        4.1.1 About Ames 37
        4.1.2 Exploratory Data Analysis 37
        4.1.3 Feature Selection 41
      4.2 The Benchmark 45
      4.3 Numerical Simulation Using Decision Tree 47
        4.3.1 The Tree's Appearance in a Decision Tree 49
      4.4 Numerical Simulation Using Random Forests 50
        4.4.1 The Tree's Appearance in Random Forests 52
      4.5 Numerical Simulation Using Gradient Boosting 54
        4.5.1 The Tree's Appearance in Gradient Boosting 58
      4.6 Numerical Simulation Using XGBoost 60
        4.6.1 The Tree's Appearance in XGBoost 63
    5 Conclusion 65
    References 66

    [1] About Ames. https://www.cityofames.org/about-ames/about-ames. (Accessed on 08/08/2022).
    [2] Distributed (Deep) Machine Learning Common. http://dmlc.io. (Accessed on 08/08/2022).
    [3] Jerome H. Friedman: Applying statistics to data and machine learning. https://www.historyofdatascience.com/jerome-friedman-applying-statistics-to-data-and-machine-learning/. (Accessed on 08/08/2022).
    [4] Jeremy Adler and Ingela Parmryd. Quantifying colocalization by correlation: The Pearson correlation coefficient is superior to the Mander's overlap coefficient. Cytometry Part A, 77(8):733–742, 2010.
    [5] Abbas Alharan, Radhwan Alsagheer, and Ali Al-Haboobi. Popular decision tree algorithms of data mining techniques: A review. International Journal of Computer Science and Mobile Computing, 6:133–142, 06 2017.
    [6] E Chandra Blessie and E Karthikeyan. Sigmis: A feature selection algorithm using correlation based method. Journal of Algorithms & Computational Technology, 6(3):385–394, 2012.
    [7] Louis-Ashley Camus. The explanation of the color circle around your profil!! https://www.kaggle.com/general/193193. (Accessed on 09/09/2022).
    [8] Tianqi Chen. https://www.linkedin.com/in/tianqi-chen-679a9856/. (Accessed on 08/08/2022).
    [9] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794, 2016.
    [10] Dean De Cock. House Prices - Advanced Regression Techniques. https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques. (Accessed on 08/08/2022).
    [11] Adele Cutler. Remembering Leo Breiman. The Annals of Applied Statistics, 4(4):1621–1633, 2010.
    [12] Nicholas I. Fisher. A conversation with Jerry Friedman. Statistical Science, 30(2):268–295, 2015.
    [13] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
    [14] Jerome Harold Friedman. Jerome H. Friedman. https://jerryfriedman.su.domains. (Accessed on 08/08/2022).
    [15] Jerome Harold Friedman. Vita. https://jerryfriedman.su.domains/ftp/vita.pdf, December 2012. (Accessed on 08/08/2022).
    [16] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, 2017.
    [17] Map of Ames. https://www.istockphoto.com/vector/iowa-outline-vector-map-usa-printable-gm1176116889-327779552. (Accessed on 08/08/2022).
    [18] Peter Bickel, Michael Jordan, and John Rice. In memoriam: Leo Breiman. https://senate.universityofcalifornia.edu/_files/inmemoriam/html/leobreiman.htm. (Accessed on 08/08/2022).
    [19] Patrick Schober, Christa Boer, and Lothar A Schwarte. Correlation coefficients: Appropriate use and interpretation. Anesthesia & Analgesia, 126(5):1763–1768, 2018.
    [20] Scikit-Learn. Decision trees — scikit-learn documentation. https://scikit-learn.org/stable/modules/tree.html. (Accessed on 08/08/2022).
    [21] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
    [22] Berkeley Statistics. In memory of Leo Breiman. https://statistics.berkeley.edu/about/memoriam/memory-leo-breiman. (Accessed on 08/08/2022).
    [23] Richard E Williamson, Richard H Crowell, and Hale F Trotter. Calculus of Vector Functions. Prentice Hall, 1972.
    [24] Hulin Wu, Jose Miguel Yamal, Ashraf Yaseen, and Vahed Maroufy. Statistics and Machine Learning Methods for EHR Data: From Data Extraction to Data Analytics. CRC Press, 2021.
    [25] Zhi-Hua Zhou. Ensemble Methods: Foundations and Algorithms. CRC Press, 2012.
