Abstract
Representative point selection reduces the number of training instances used by nearest neighbor classification algorithms, which can improve both execution efficiency and classification accuracy. By introducing the concept of classification confidence entropy, this paper proposes a new fitness evaluation function that measures how well a set of representative (prototype) instances has been selected, and designs a new genetic algorithm that uses this function to search for an optimal representative point set. The paper also shows that the concept can be combined with other kinds of representative point selection methods to improve their performance. In experimental comparisons with several well-known representative point selection algorithms, the confidence-based approach shows some advantages in both classification accuracy and data reduction rate.
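The abstract does not give the paper's exact definition of classification confidence entropy or of its fitness function, so the following Python sketch is only one plausible reading under stated assumptions. It assumes that an instance's confidence entropy is the Shannon entropy of the class votes among its k nearest prototypes (0 when the vote is unanimous), that the fitness rewards leave-one-out k-NN accuracy and data reduction while penalizing mean entropy, and that the genetic algorithm encodes a candidate prototype set as a 0/1 mask. The function names, the weights `alpha` and `beta`, truncation selection, one-point crossover, and bit-flip mutation are all hypothetical choices for illustration, not the authors' method.

```python
import math
import random
from collections import Counter


def knn_votes(x, prototypes, k):
    """Class vote distribution among the k nearest (point, label) prototypes of x."""
    # Euclidean distance; sorted() is stable, so ties keep their original order.
    nearest = sorted(prototypes, key=lambda pl: math.dist(x, pl[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {c: v / len(nearest) for c, v in counts.items()}


def confidence_entropy(votes):
    """Shannon entropy (in nats) of a vote distribution; 0.0 means a unanimous vote."""
    return -sum(p * math.log(p) for p in votes.values() if p > 0)


def fitness(mask, X, y, k=3, alpha=0.5, beta=0.1):
    """Hypothetical fitness: accuracy - alpha * mean entropy + beta * reduction rate."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if len(idx) <= k:  # too few prototypes for a leave-one-out k-NN vote
        return float("-inf")
    correct, total_h = 0, 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Leave-one-out: an instance never votes for itself.
        protos = [(X[j], y[j]) for j in idx if j != i]
        votes = knn_votes(xi, protos, k)
        pred = max(votes, key=votes.get)
        correct += pred == yi
        total_h += confidence_entropy(votes)
    accuracy = correct / len(X)
    mean_h = total_h / len(X)
    reduction = 1 - len(idx) / len(X)
    return accuracy - alpha * mean_h + beta * reduction


def select_prototypes(X, y, k=3, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Plain generational GA over 0/1 masks; returns indices of the selected prototypes."""
    rng = random.Random(seed)
    n = len(X)

    def score(mask):
        return fitness(mask, X, y, k)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half of the population.
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return [i for i, bit in enumerate(best) if bit]
```

For example, `select_prototypes(X, y)` on a small list of labeled points returns the indices of the retained prototypes. Under this assumed fitness, a prototype set whose k-NN votes are split between classes scores lower than one with the same accuracy but unanimous votes, which is the intuition behind using confidence entropy rather than accuracy alone.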
Source
Computer Engineering (《计算机工程》), 2012, No. 19, pp. 167-169, 174 (4 pages)
Indexed in: CAS, CSCD
Keywords
confidence entropy
fitness evaluation function
representative point selection
k-nearest neighbor
semi-supervised learning
genetic algorithm