
Optimized Housing Price Prediction Based on XGBoost

Cited by: 4
Abstract: Housing prices are shaped by many factors, which is why house price prediction remains a classic and challenging problem in data analysis. House price data contain many redundant features, so the important ones are hard to identify in practical scenarios. To address this, the paper proposes a new data pre-processing approach and makes predictions through iterative fitting with two models. The raw data are first pre-processed in terms of data meaning, data form, and data correlation, and a suitable model is then chosen for training. Random Forest (RF) and XGBoost (XGB) are two commonly used methods in traditional machine learning. Through its bagging process, the RF model can accurately identify "redundant" features, whereas the XGB model improves prediction at the cost of reduced generalization ability and cannot stably reflect feature importance. The paper therefore uses the RF model to handle the redundant data and the XGB model to fit the resulting new dataset, improving prediction performance. Experiments on the Kaggle competition dataset "House Prices - Advanced Regression Techniques" show that the final XGB regression model reaches an R² of 87%, compared with 79.2% for the RF model alone and 78.7% for the XGB model alone, indicating that the proposed method clearly improves house price prediction. In addition, to fully reflect the model's fitting and predictive ability, house price is converted into a discrete variable with two categories, "high" and "low"; the resulting confusion matrix gives a precision of 93% and a recall of 93%.
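The abstract reports two kinds of evaluation: R² (coefficient of determination) for the regression, and precision/recall after binarizing price into "high" and "low". The sketch below shows how these measures are computed in plain Python. It is not the author's code. The helper names (`r2_score`, `to_high_low`, `precision_recall`) are hypothetical. Splitting at the median true price is an assumption, since the abstract does not say how the "high"/"low" boundary was chosen. Precision and recall are computed for the "high" class only, which is also an assumption, because the abstract does not state how they were averaged. The toy prices are illustrative and are not data from the paper.

```python
from statistics import median


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def to_high_low(prices, threshold):
    """Hypothetical binarization: 1 ("high") if price > threshold, else 0 ("low")."""
    return [1 if p > threshold else 0 for p in prices]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the given positive class (here: "high")."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative toy prices (not from the paper's dataset).
y_true = [100, 150, 200, 250, 300, 350]
y_pred = [110, 140, 230, 240, 320, 330]

r2 = r2_score(y_true, y_pred)

# Assumption: split "high"/"low" at the median of the true prices.
threshold = median(y_true)
labels_true = to_high_low(y_true, threshold)
labels_pred = to_high_low(y_pred, threshold)
precision, recall = precision_recall(labels_true, labels_pred)

print(f"R^2 = {r2:.4f}, precision = {precision:.2f}, recall = {recall:.2f}")
# prints: R^2 = 0.9543, precision = 0.75, recall = 1.00
```

In this toy case the prediction of 230 for a true price of 200 crosses the median threshold of 225, producing one false "high" label. Recall therefore stays at 1.0 while precision drops to 0.75, even though R² remains high. This is one reason a binarized view can complement R² when judging a price model.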
Author: TAO Ran (University of Auckland, Auckland 999030, New Zealand)
Institution: University of Auckland
Source: Journal of Sichuan University (Natural Science Edition), 2022, No. 3, pp. 181-198 (18 pages); indexed in CAS, CSCD, and the Peking University Core Journals list
Funding: National Natural Science Foundation of China (61806040).
Keywords: house price prediction; machine learning; XGBoost; Random Forest; iterative model regression