| Graduate Student: | 李思娜 Intan Lisnawati |
|---|---|
| Thesis Title: | 基於樹的集成方法在房屋銷售價格預測中的應用 Tree-Based Ensemble Methods with an Application in House Sale Price Prediction |
| Advisor: | 須上苑 Shang-Yuan Shiu |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | College of Science, Department of Mathematics |
| Year of Publication: | 2022 |
| Academic Year of Graduation: | 111 (ROC calendar) |
| Language: | English |
| Number of Pages: | 79 |
| Keywords: | prediction, ensemble method, base learner, loss function |
Given some input, we want to make a prediction for the corresponding output. To improve on the prediction of a single estimator, ensemble methods combine the predictions of several base estimators built with a given learning algorithm. Each method's parameters can also be tuned to narrow the gap between the true and predicted values. Using the House Sale Price training data set, we apply several ensemble methods to predict an unseen House Sale Price data set and assess their accuracy by their Root Mean Squared Error (RMSE). The results show that Gradient Boosting gives the smallest RMSE, US$22,766, followed by Random Forest at US$23,269, XGBoost at US$24,069, and Decision Tree at US$35,637.
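For concreteness, RMSE here is $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, the square root of the mean squared difference between actual and predicted sale prices, so it is expressed in the same units as the target (US$). The sketch below reproduces the shape of this comparison with scikit-learn and xgboost on the Kaggle Ames house-price training data. The file name `train.csv`, the feature handling (numeric columns only, zero imputation), and the hyperparameters are illustrative assumptions, not the thesis's actual pipeline, so the RMSE values it prints will differ from those reported in the abstract.

```python
# Minimal sketch: compare the four tree-based models by held-out RMSE.
# Assumes the Kaggle "House Prices" training file is saved as train.csv;
# preprocessing and hyperparameters are placeholders, not the thesis's.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

df = pd.read_csv("train.csv")
# Keep numeric predictors only and zero-fill missing values (a simplification).
X = df.select_dtypes("number").drop(columns=["SalePrice"]).fillna(0)
y = df["SalePrice"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=500, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "XGBoost": XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5  # RMSE in US$
    print(f"{name}: RMSE = US$ {rmse:,.0f}")
```

Each model exposes the same `fit`/`predict` interface, which is why the comparison reduces to a single loop; only the hyperparameters of each base learner and ensemble need tuning.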