APT-KNN: An Efficient Missing Value Imputation Method Oriented Toward Classification Problems (cited by: 10)
Abstract: Classification is a common data mining method, and missing attribute values are a common data quality problem in the classification process. Imputing missing values can reduce the classification errors they cause. Imputation must first of all be accurate, and in many practical applications it must also be computationally efficient. This paper proposes APT-KNN, an algorithm for imputing missing attribute values. APT-KNN exploits the relationships among attributes and estimates each missing value from the attribute values of the few instances most similar to the target instance, with the aim of making the imputed results more accurate. An optimised AntiPole tree index structure is also designed to improve imputation efficiency. Experiments show that APT-KNN achieves higher accuracy and higher imputation efficiency than several existing methods for imputing missing attribute values.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2011, No. 4, pp. 135-139 (5 pages)
Funding: Key Science and Technology Project of the Shanghai Municipal Science and Technology Commission (08511500203)
Keywords: classification; missing value imputation; index; data mining; data preparation
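
The abstract's core idea, estimating a missing attribute value from the few most similar instances, can be illustrated with a minimal sketch. This is plain brute-force k-nearest-neighbour imputation, not the paper's APT-KNN: it omits the AntiPole tree index and any attribute-relationship weighting, and it assumes numeric attributes. The names `distance` and `knn_impute`, the use of `None` for missing values, and the default `k=3` are all hypothetical choices made for illustration.

```python
import math


def distance(a, b):
    """Root-mean-square difference over attributes observed in both rows.

    Missing values (None) are skipped; dividing by the number of shared
    attributes keeps rows with different amounts of missing data comparable.
    Returns infinity when the two rows share no observed attribute.
    """
    total, used = 0.0, 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        total += (x - y) ** 2
        used += 1
    return math.sqrt(total / used) if used else math.inf


def knn_impute(rows, k=3):
    """Return a copy of rows in which each None is replaced by the mean of
    that attribute over the k nearest rows that have it observed.

    This scans every row for every missing value (brute force); an index
    structure such as the one the abstract mentions would replace this scan.
    """
    filled = [list(r) for r in rows]
    for r_idx, row in enumerate(rows):
        for c_idx, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with this attribute observed.
            donors = [
                (distance(row, other), other[c_idx])
                for o_idx, other in enumerate(rows)
                if o_idx != r_idx and other[c_idx] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [v for dist, v in donors[:k] if dist != math.inf]
            if nearest:  # leave the value missing if no usable donor exists
                filled[r_idx][c_idx] = sum(nearest) / len(nearest)
    return filled
```

Calling `knn_impute(rows, k=2)` on a small list of numeric rows returns a new list and leaves the input unchanged. Because each missing value triggers a full scan of the data set, the cost grows with the number of instances, which is the efficiency concern the abstract says the index structure is meant to address.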
References (8)

  • 1 Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining (数据挖掘导论)[M]. Beijing: Posts & Telecom Press, 2006.
  • 2 Chan T M. Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus[J]. International Journal of Computational Geometry and Applications, 2002, 12(1-2): 67-85.
  • 3 Cantone D, Ferro A, Pulvirenti A, et al. Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(4): 535-550.
  • 4 Mehala B, Vivekanandan K, Ranjit Jeba Thangaiah P. An analysis on K-means algorithm as an imputation method to deal with missing values[J]. Asian Journal of Information Technology, 2008, 7(9): 434-441.
  • 5 Blake C, Merz C. Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  • 6 Gustavo E A P A Batista, Maria C Monard. An analysis of four missing data treatment methods for supervised learning[J]. Applied Artificial Intelligence, 2003, 17(5): 519-533.
  • 7 Yin Jie, Shi Rui (殷杰, 石锐). A comparative study of methods for handling missing values in SAS data sets[J]. Journal of Computer Applications (计算机应用), 2007, 27(B06): 438-439. Cited by: 8
  • 8 Alsuwaiyel M H. Algorithm design techniques and analysis[J]. Lecture Notes Series on Computing, 1998(7): 374-376.

Second-level references (4)

  • 1 Mao Qunxia, Li Xiaosong (茅群霞, 李晓松). Application of the multiple imputation Markov Chain Monte Carlo model to longitudinal maternal and child health data with missing values[J]. Journal of Sichuan University (Medical Science Edition), 2005, 36(3): 422-425. Cited by: 7
  • 2 Giardina M, Huo Y, Azuaje F, et al. A Missing Data Estimation Analysis in Type II Diabetes Databases[A]. Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems[C]. 2005.
  • 3 Barzi F, Woodward M. Imputation of Missing Values in Practice: Results from Imputations of Serum Cholesterol in 28 Cohort Studies[J]. American Journal of Epidemiology, 2004, 160(1): 34-45.
  • 4 Arnold A M, Kronmal R A. Multiple Imputation of Baseline Data in the Cardiovascular Health Study[J]. American Journal of Epidemiology, 2003, 157(1): 74-84.
