Abstract
To investigate how the PM2.5 concentration in Beijing relates to other air quality indicators, this paper carries out a regression analysis on Beijing's AQI and its six component indicators for the full year 2020. First, scatter plots are used for a preliminary assessment of the linear relationship between PM2.5 and the other indicators. Next, the explanatory variables are tested for correlation with one another, that is, for multicollinearity in the model. Principal component regression is then applied, with a suitable number of principal components retained, to remove the effect of multicollinearity on the fitted linear model. The model obtained from principal component regression is then subjected to statistical diagnostics to remove the influence of outliers. Finally, stepwise regression yields the final linear regression model of PM2.5 on PM10, SO2, CO, NO2, and O3.
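The middle steps of the workflow described in the abstract (multicollinearity check, principal component regression, outlier diagnosis) can be illustrated with a minimal sketch. It is not the paper's code or data: the predictors below are synthetic, the function names vif, pcr_fit, and cooks_distance are hypothetical, and variance inflation factors are assumed here as the multicollinearity check, which the abstract does not specify. The scatter-plot and stepwise regression steps are omitted.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def pcr_fit(X, y, k):
    """Principal component regression keeping the first k components.

    Returns (intercept, coefficients) expressed on the original predictors.
    """
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                              # standardized predictors
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]               # largest variance first
    V = eigvec[:, order[:k]]                       # loadings of kept components
    T = Z @ V                                      # component scores
    gamma, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
    beta = (V @ gamma) / sd                        # back to original scale
    return y.mean() - mu @ beta, beta

def cooks_distance(X, y):
    """Cook's distance of every observation in an OLS fit with intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    H = A @ np.linalg.pinv(A)                      # hat matrix
    h = np.diag(H)                                 # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)                   # residual variance estimate
    return resid**2 * h / (p * s2 * (1 - h) ** 2)

# Synthetic stand-in data (NOT the Beijing measurements): two nearly
# collinear predictors, one independent predictor, one planted outlier.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
X = np.column_stack([
    base + 0.05 * rng.normal(size=n),
    base + 0.05 * rng.normal(size=n),
    rng.normal(size=n),
])
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + 0.5 * rng.normal(size=n)
y[0] += 30.0                                       # planted outlier

print("VIF:", vif(X))                              # large values flag collinearity
print("PCR (k=2):", pcr_fit(X, y, k=2))
print("Most influential row:", int(np.argmax(cooks_distance(X, y))))
```

In practice the rows flagged by Cook's distance would be reviewed before removal, and the number of retained components k would be chosen from the explained variance rather than fixed in advance.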
Source
Technology Innovation and Application (《科技创新与应用》)
2021, No. 35, pp. 10-14 (5 pages)
Keywords
PM2.5
correlation analysis
principal component regression
statistical diagnostics
Cook's distance