Abstract
Objective: To evaluate the genetic algorithm as a method of parameter estimation for logistic regression models, in comparison with maximum likelihood estimation. Methods: Three models were constructed from simulated data. The parameters of each model were estimated with both the genetic algorithm and the maximum likelihood method, and the classification performance and variance of the resulting models were assessed. Results: In general, maximum likelihood estimation classified slightly better than the genetic algorithm. With small samples or complex relationships among the independent variables, the generalization error increased for both methods. For maximum likelihood estimation the generalization error arose mainly from reduced classification performance in the validation set, whereas for the genetic algorithm it arose mainly from overfitting in the training set. When the sample was small and the relationships among the independent variables were complex, maximum likelihood estimation failed to converge and its parameter estimates broke down; the genetic algorithm showed no such problem. Conclusion: The genetic algorithm is suitable for parameter estimation in logistic regression models when the number of independent variables is large and the sample size is relatively small.
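The abstract does not specify how the genetic algorithm was configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes real-valued chromosomes (one gene per coefficient), the Bernoulli log-likelihood as the fitness function, truncation selection with elitism, blend crossover, and Gaussian mutation. The function names, the population settings, and the coefficients used to simulate data are all hypothetical choices made for illustration.

```python
import math
import random


def linear_predictor(beta, xi):
    # beta[0] is the intercept; beta[1:] are the slopes.
    return beta[0] + sum(b * v for b, v in zip(beta[1:], xi))


def log_likelihood(beta, X, y):
    # Bernoulli log-likelihood: sum of y*z - log(1 + exp(z)), computed stably.
    total = 0.0
    for xi, yi in zip(X, y):
        z = linear_predictor(beta, xi)
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def simulate(n, beta, seed=1):
    # Draw standard-normal predictors and binary outcomes from a logistic model.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in range(len(beta) - 1)]
        p = 1.0 / (1.0 + math.exp(-linear_predictor(beta, xi)))
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_ga(X, y, pop_size=40, generations=100, bound=5.0, mut_sd=0.3, seed=0):
    # Assumed design: fitness = log-likelihood, top quarter kept as parents,
    # two best individuals copied unchanged (elitism).
    rng = random.Random(seed)
    k = len(X[0]) + 1
    pop = [[rng.uniform(-bound, bound) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda b: log_likelihood(b, X, y), reverse=True)
        parents = pop[: pop_size // 4]
        children = [p[:] for p in parents[:2]]
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()  # blend crossover weight
            children.append(
                [w * a + (1.0 - w) * b + rng.gauss(0.0, mut_sd) for a, b in zip(p1, p2)]
            )
        pop = children
    return max(pop, key=lambda b: log_likelihood(b, X, y))


def accuracy(beta, X, y):
    # Classify as 1 when the predicted probability exceeds 0.5 (z > 0).
    hits = sum(int((linear_predictor(beta, xi) > 0.0) == (yi == 1)) for xi, yi in zip(X, y))
    return hits / len(y)


if __name__ == "__main__":
    true_beta = [0.5, 1.0, -1.0]  # hypothetical coefficients for the simulation
    X_train, y_train = simulate(200, true_beta, seed=1)
    X_valid, y_valid = simulate(200, true_beta, seed=2)
    beta_hat = fit_ga(X_train, y_train)
    print("estimate:", [round(b, 2) for b in beta_hat])
    print("training accuracy:", accuracy(beta_hat, X_train, y_train))
    print("validation accuracy:", accuracy(beta_hat, X_valid, y_valid))
```

Comparing the training and validation accuracy gives a rough view of the generalization gap discussed in the Results. The maximum likelihood fit used for comparison in the study is not reproduced here.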
Source
《中国卫生统计》 (Chinese Journal of Health Statistics)
CSCD
Peking University Core Journals (北大核心)
2012, Issue 1, pp. 74-76 (3 pages)
Keywords
Genetic algorithm
Logistic regression
Maximum likelihood method
Parameter estimation