Abstract
Classification is a common data mining method, and missing attribute values are a common data quality problem in the classification process. Missing value imputation can reduce the classification errors caused by missing attribute values. Imputation must first of all be accurate, and in many practical applications it must also be computationally efficient. This paper presents APT-KNN, an algorithm for imputing missing attribute values. APT-KNN exploits the relationships among attributes and estimates each missing value from the attribute values of the few instances most similar to the target, with the aim of making the imputed results more accurate. An optimised AntiPole tree index structure is also designed to improve the efficiency of imputation. Experiments show that APT-KNN achieves higher accuracy and imputation efficiency than several existing methods for imputing missing attribute values.
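The nearest-neighbour idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's APT-KNN: it assumes purely numeric attributes, uses plain Euclidean distance over the observed attributes instead of the attribute-relationship weighting the abstract mentions, and replaces the AntiPole tree index with a linear scan. The function name `knn_impute` and the sample data are hypothetical.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k most similar complete rows.

    Illustrative assumptions (not taken from the paper): numeric attributes,
    Euclidean distance on observed attributes, brute-force neighbour search.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    filled = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            filled.append(list(row))
            continue

        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(candidate):
            # Compare only on the attributes the incomplete row actually has.
            return math.sqrt(sum((row[j] - candidate[j]) ** 2 for j in observed))

        # Linear scan; an index such as an AntiPole tree would replace this step.
        neighbours = sorted(complete, key=dist)[:k]

        new_row = list(row)
        for j in missing:
            new_row[j] = sum(n[j] for n in neighbours) / len(neighbours)
        filled.append(new_row)
    return filled


rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
    [1.05, 2.05, None],  # third attribute is missing
]
print(knn_impute(rows, k=2)[3])  # prints [1.05, 2.05, 11.0]
```

The linear scan costs one distance computation per complete row for every incomplete row, which is the cost a metric index is meant to reduce.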
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2011, Issue 4, pp. 135-139 (5 pages)
Funding
Key Science and Technology Project of the Science and Technology Commission of Shanghai Municipality (08511500203)
Keywords
Classification
Missing value imputation
Index
Data mining
Data preparation