Missing Data Imputation: An Information Gain Based Approach (缺失值填充:基于信息增益的方法) Cited by: 8
Abstract: Data mining and machine learning both rely on a data preprocessing step that removes errors, noise, inconsistent data, and missing values from the dataset. Of these tasks, filling in missing values is especially challenging, because the quality of the imputation strongly affects the learning or mining algorithms that run afterwards. Existing imputation algorithms, such as those based on rough sets or on nearest neighbors, can handle missing values to some extent. In contrast to these methods, the paper proposes an extended information gain (IG) based imputation algorithm that makes full use of the implicit relationships among the attributes of the dataset to fill in the missing data. Extensive experiments show that the proposed algorithm is effective.
Author: Zhang Hongxia (张红霞)
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2006, No. 24, pp. 4810-4812 (3 pages)
Keywords: machine learning; missing data imputation; information gain; classification accuracy
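
The abstract does not spell out the extended algorithm, so the following is only a minimal sketch of the general idea it names: use information gain to find the attribute most related to the one with missing values, then fill each gap from rows that share that attribute's value. Everything here is an assumption made for illustration and not the paper's method: the function names `entropy`, `information_gain`, and `impute`, the toy weather table, the use of `None` as the missing marker, the restriction to discrete attributes, and the simplification that only the target attribute has gaps and at least one row is complete.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, attr, target):
    """IG(target; attr) = H(target) - H(target | attr) over the given rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - conditional


def impute(rows, target, missing=None):
    """Fill missing `target` values (illustrative only, not the paper's algorithm).

    Assumes only `target` has gaps and at least one row is complete.
    """
    complete = [r for r in rows if r[target] is not missing]
    others = [a for a in rows[0] if a != target]
    # Pick the attribute that tells us the most about the target.
    best = max(others, key=lambda a: information_gain(complete, a, target))

    # Most frequent target value for each value of the chosen attribute.
    by_value = defaultdict(list)
    for r in complete:
        by_value[r[best]].append(r[target])
    global_mode = Counter(r[target] for r in complete).most_common(1)[0][0]

    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left unchanged
        if r[target] is missing:
            candidates = by_value.get(r[best])
            # Fall back to the overall mode when the attribute value is unseen.
            r[target] = Counter(candidates).most_common(1)[0][0] if candidates else global_mode
        filled.append(r)
    return filled


# Hypothetical toy table: the last row is missing its "play" value.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": None},
]
filled = impute(rows, "play")
print(filled[-1])
```

On this toy table `outlook` fully determines `play` among the complete rows while `windy` carries no information about it, so the sketch selects `outlook` and fills the gap from the other "sunny" rows. A real implementation would also need to handle continuous attributes and gaps in more than one column.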