Abstract
Labelling errors (label noise) are common in data sets, including microarray data sets. To address this, a new model called the robust logistic regression model is studied. Built on the traditional Bayesian logistic regression model, it adds label-flipping probabilities to the classifier to cope with possible labelling errors. Regularization is applied in the objective function so that the classifier balances fitting the data, without over-fitting, against variable selection. Experiments on synthetic and real data sets test the model's predictive ability, variable selection ability, and robustness to labelling errors, and compare it with the traditional Bayesian logistic regression model. The results show that, when the data contain labelling errors, the classifier trained by the robust logistic regression model has a lower misclassification rate, selects variables more accurately, and gives more accurate parameter estimates.
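The idea of adding label-flipping probabilities to a regularized logistic classifier can be illustrated with a minimal sketch. This is not the paper's estimation procedure. It assumes the two flip rates are fixed, known constants (`g01` for a true 0 observed as 1, `g10` for a true 1 observed as 0), uses an L1 penalty as a stand-in regularizer, and fits the weights by plain proximal gradient descent. All function names and default values are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def noisy_label_prob(x, w, g01, g10):
    """P(observed label = 1 | x) when true labels may flip.

    g01: assumed probability that a true 0 is recorded as 1.
    g10: assumed probability that a true 1 is recorded as 0.
    """
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return (1.0 - g10) * p + g01 * (1.0 - p)


def fit_robust_lr(X, y, g01=0.1, g10=0.1, lam=0.01, lr=0.1, n_iter=500):
    """Proximal gradient descent on the noisy-label negative log-likelihood
    with an L1 penalty (illustrative choice of regularizer)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            q = (1.0 - g10) * p + g01 * (1.0 - p)
            q = min(max(q, 1e-12), 1.0 - 1e-12)
            # Derivative of the per-sample negative log-likelihood
            # with respect to the linear score z = w . x
            dz = -(yi / q - (1.0 - yi) / (1.0 - q)) \
                * (1.0 - g10 - g01) * p * (1.0 - p)
            for j in range(d):
                grad[j] += dz * xi[j] / n
        for j in range(d):
            wj = w[j] - lr * grad[j]
            # Soft-thresholding: the proximal step for the L1 penalty
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w


def make_noisy_data(n, seed=0, flip=0.1):
    """Synthetic data: two informative features, two irrelevant ones,
    and labels flipped at random with probability `flip`."""
    rng = random.Random(seed)
    true_w = [2.0, -2.0, 0.0, 0.0]
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        p = sigmoid(sum(wj * xj for wj, xj in zip(true_w, x)))
        label = 1.0 if rng.random() < p else 0.0
        if rng.random() < flip:
            label = 1.0 - label
        X.append(x)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_noisy_data(300)
    print(fit_robust_lr(X, y))
```

Setting `g01 = g10 = 0` reduces `noisy_label_prob` to the ordinary logistic model, which makes the sketch easy to compare against a non-robust baseline on the same data.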
Author
TENG Wen (滕文), School of Computer Science, Shaanxi Institute of International Trade and Commerce, Xi'an 712000, China
Source
Information Technology (《信息技术》), 2018, No. 5, pp. 133-138 (6 pages)