Abstract
Error-correcting output coding (ECOC) is an effective method for multi-class classification problems, but it can only be applied to labeled data and cannot exploit large numbers of unlabeled samples. This paper proposes a novel hierarchical coding algorithm based on semi-supervised techniques, which adapts the traditional ECOC algorithm and extends the concept of coding. In the coding phase, clusters are first combined by class according to their cluster features, and a hierarchical code is then built over the classes. The coding therefore makes full use of the unlabeled samples while following the class distribution of the data, with the aim of improving classification accuracy. Experimental results on real-world toxicity prediction datasets of chemical compounds indicate that the proposed method is feasible and effective.
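For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of traditional ECOC decoding only, not of the hierarchical semi-supervised coding proposed in the paper. The code matrix values and the names `CODE_MATRIX`, `hamming`, `decode`, and `min_row_distance` are assumptions chosen for illustration: each class is assigned a binary codeword, one binary classifier is trained per column, and a sample is assigned to the class whose codeword is nearest in Hamming distance to the classifiers' outputs.

```python
# Illustrative code matrix (assumed values): 4 classes, 7 binary classifiers.
# Each row is the codeword of one class; each column defines one binary task.
CODE_MATRIX = {
    0: (1, 1, 1, 1, 1, 1, 1),
    1: (0, 0, 0, 0, 1, 1, 1),
    2: (0, 0, 1, 1, 0, 0, 1),
    3: (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda c: hamming(bits, code_matrix[c]))


def min_row_distance(code_matrix=CODE_MATRIX):
    """Smallest Hamming distance between any two class codewords."""
    rows = list(code_matrix.values())
    return min(
        hamming(rows[i], rows[j])
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
    )


# The outputs of the 7 binary classifiers, with one bit flipped relative
# to the codeword of class 2; nearest-codeword decoding still recovers it.
predicted_bits = (0, 0, 1, 1, 0, 1, 1)
print(decode(predicted_bits))
```

With a minimum row distance of d, nearest-codeword decoding tolerates up to ⌊(d − 1)/2⌋ wrong binary classifiers, which is where the "error-correcting" property comes from. In this sketch the codewords are fixed by hand; the abstract's contribution concerns how the code itself is built from clustered labeled and unlabeled data, which is not reproduced here.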
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2010, No. 8, pp. 1659-1664 (6 pages)
Journal of Chinese Computer Systems
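Funding
Supported by the Natural Science Foundation of Fujian Province (Grant Nos. 2008J04004, 2007J0016, and 2009J01273)
Supported by the Ministry of Education Fund for Returned Overseas Scholars (教外司留[2008]890号)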
Keywords
error correcting output codes
semi-supervised learning
hierarchical coding
multi-class classification