Abstract
Using a practical case, this paper discusses how to diagnose collinearity among independent variables, the main problems of the regression modeling methods commonly used under collinearity, and how those methods actually perform. The results show that variable selection, when applied to collinear data, excludes some important explanatory variables from the model, which weakens the priority and guiding role of theory. Ridge regression (RR), principal component regression (PCR), and partial least squares regression (PLSR) can all reduce or eliminate the adverse effects of collinearity to varying degrees, but none is consistently superior to the others in both theory and modeling performance. In terms of interpretability, the regression coefficients of the PCR and PLSR models in this case agree better with existing understanding than those of the RR model. In terms of fit and prediction, the RR model fits clearly better than PCR and slightly better than PLSR but predicts less accurately than PLSR, which suggests that the RR model may be over-fitted; the PCR model is unsatisfactory in both fit and prediction. Overall, PLSR is more robust than the other methods in handling collinear data, and its fit and prediction are both satisfactory.
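The abstract does not say which diagnostics or software the authors used. As a minimal sketch of the two ideas it names, collinearity diagnosis and ridge shrinkage, the Python below computes variance inflation factors (VIF) and a condition number, then compares ordinary least squares with ridge coefficients. The synthetic data, the function names, the ridge parameter k = 5, and the VIF > 10 rule of thumb are all assumptions made for illustration; none of them comes from the paper.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def variance_inflation_factors(X):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def condition_number(X):
    """Largest over smallest singular value of the standardized predictors."""
    s = np.linalg.svd(standardize(X), compute_uv=False)
    return s[0] / s[-1]

def ridge_coefficients(X, y, k):
    """Solve (Z'Z + kI) b = Z'y on standardized predictors; k = 0 gives OLS."""
    Z = standardize(X)
    yc = y - y.mean()
    return np.linalg.solve(Z.T @ Z + k * np.eye(Z.shape[1]), Z.T @ yc)

# Hypothetical data: x2 is almost a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + x3 + rng.normal(scale=0.5, size=n)

vif = variance_inflation_factors(X)
kappa = condition_number(X)
beta_ols = ridge_coefficients(X, y, k=0.0)
beta_ridge = ridge_coefficients(X, y, k=5.0)  # k chosen arbitrarily for the sketch

print("VIF:", np.round(vif, 1))
print("condition number:", round(float(kappa), 1))
print("OLS coefficients:  ", np.round(beta_ols, 3))
print("ridge coefficients:", np.round(beta_ridge, 3))
```

In this sketch the two nearly identical predictors get very large VIFs while the unrelated one stays near 1, and the ridge coefficient vector is shorter than the OLS one. In practice k would be chosen by a ridge trace or cross-validation rather than fixed in advance.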
Source
《体育科学》
CSSCI
Peking University Core Journal
2009, No. 9, pp. 18-23, 41 (7 pages)
China Sport Science
Keywords
collinearity diagnosis
ordinary least squares regression
ridge regression
principal component regression
partial least squares regression