Abstract
Error-correcting output codes (ECOC) provide a general ensemble framework for multiclass classification: they decompose a multiclass problem into a set of binary problems, which simplifies it. When generating the base classifiers, however, there is often a tension between increasing the diversity among the base classifiers and increasing the consistency of learning between each base classifier and the ensemble classifier. We call this the consistent-diverse balance problem. One starting point for addressing it is to reduce the classification error caused by learning inconsistency while keeping diversity sufficiently high. To this end, we use weighted decoding and relearn the weight coefficient matrix so as to weaken or eliminate the errors caused by inconsistent learning of the base classifiers. In the proposed algorithm, a genetic algorithm (GA) searches for the weight coefficient matrix, with the classification error rate of the ensemble classifier as the fitness function, yielding the matrix that minimizes the error on the training samples. Experiments on artificial data sets and UCI data sets show that the weight coefficient matrix found this way handles the consistent-diverse balance problem better than coefficient matrices produced by other methods.
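The weighted-decoding idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the decoding loss, the GA operators, or the parameter values, so the linear loss, the uniform crossover, the Gaussian mutation, the elitist selection, the toy code matrix, and the simulated base-classifier outputs below are all assumptions made for illustration. All function and variable names are hypothetical.

```python
import numpy as np

def weighted_decode(outputs, code, weights):
    """Predict classes by minimum weighted loss against each codeword.

    outputs: (n_samples, n_classifiers) real-valued base-classifier outputs
    code:    (n_classes, n_classifiers) ECOC matrix with entries in {-1, +1}
    weights: (n_classes, n_classifiers) non-negative weight coefficient matrix
    """
    # Assumed linear loss: near 0 when an output agrees with the code bit,
    # larger when it disagrees. Shape: (n_samples, n_classes, n_classifiers).
    loss = 1.0 - outputs[:, None, :] * code[None, :, :]
    return np.argmin((weights[None, :, :] * loss).sum(axis=2), axis=1)

def error_rate(weights, outputs, code, labels):
    """Ensemble classification error rate, used as the GA fitness (lower is better)."""
    return float(np.mean(weighted_decode(outputs, code, weights) != labels))

def ga_learn_weights(outputs, code, labels, pop_size=30, generations=40,
                     mutation_scale=0.2, seed=0):
    """Search for a weight matrix with a simple elitist GA (illustrative only)."""
    rng = np.random.default_rng(seed)
    shape = code.shape
    pop = rng.random((pop_size, *shape))
    pop[0] = 1.0  # include uniform weights, i.e. plain unweighted decoding
    for _ in range(generations):
        fitness = np.array([error_rate(w, outputs, code, labels) for w in pop])
        pop = pop[np.argsort(fitness)]
        elite = pop[: pop_size // 2]          # best half survives unchanged
        n_children = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        mask = rng.random((n_children, *shape)) < 0.5
        children = np.where(mask, pa, pb)     # uniform crossover
        children = children + rng.normal(0.0, mutation_scale, children.shape)
        children = np.clip(children, 0.0, None)  # keep weights non-negative
        pop = np.concatenate([elite, children])
    fitness = np.array([error_rate(w, outputs, code, labels) for w in pop])
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])

# Toy data (assumption): 4 classes, 5 base classifiers, the first one very noisy.
code = np.array([[ 1,  1,  1,  1,  1],
                 [ 1, -1, -1,  1, -1],
                 [-1,  1, -1, -1,  1],
                 [-1, -1,  1, -1, -1]], dtype=float)
data_rng = np.random.default_rng(1)
labels = data_rng.integers(0, 4, size=200)
noise_scale = np.array([2.0, 0.5, 0.5, 0.5, 0.5])
outputs = code[labels] + data_rng.normal(size=(200, 5)) * noise_scale

uniform_err = error_rate(np.ones_like(code), outputs, code, labels)
learned_weights, learned_err = ga_learn_weights(outputs, code, labels)
print("uniform decoding error:", uniform_err)
print("GA-weighted decoding error:", learned_err)
```

Because the initial population contains the uniform weight matrix and the best half of each generation is carried over unchanged, the training error of the returned matrix cannot exceed that of unweighted decoding in this sketch. Whether that gain carries over to unseen data is a separate question that the sketch does not address.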
Source
Acta Electronica Sinica (《电子学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2011, No. 7, pp. 1514-1522 (9 pages)
Funding
National Natural Science Foundation of China (No. 60975026)
Keywords
error-correcting output codes
multiclass classification
weighted decoding
genetic algorithms