An Extensible Algorithm for Filling Missing Data Values (cited by: 2)
Abstract: To address the problem of filling missing values during data preprocessing, an extensible missing-value filling algorithm is designed using the strategy pattern. Three concrete strategy classes, SimpleImputation, KNNImputation and DTBImputation, encapsulate the simple filling algorithm, the KNN filling algorithm and the DTB filling algorithm, respectively. Experimental results show that the simple filling algorithm runs fastest but has the lowest precision, the DTB algorithm is slower but more precise, and the KNN algorithm is the slowest but has the highest precision. The algorithm lets users choose a filling algorithm according to their own speed and precision requirements, and its filling functionality can be extended by adding new strategy classes. It thereby addresses the data quality problems caused by missing values and improves the generality and extensibility of data preprocessing programs.
Source: Journal of Central South University: Science and Technology (《中南大学学报(自然科学版)》), 2004, No. 5, pp. 825-829 (5 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (69971007, 60171043)
Keywords: execution speed; extensibility; filling algorithm; KNN; data preprocessing; precision; encapsulation; strategy application; requirements; problem; missing data values; strategy pattern
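
The strategy-pattern structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: only the class names SimpleImputation and KNNImputation come from the abstract, while the interface name ImputationStrategy, the context class MissingValueFiller, the column-mean rule, the distance function and the parameter k are all assumptions made for illustration. DTBImputation is omitted; under this design it would simply be one more subclass of the same interface.

```python
from abc import ABC, abstractmethod
from math import sqrt
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


class ImputationStrategy(ABC):
    """Strategy interface (hypothetical name): fill missing values in a table."""

    @abstractmethod
    def impute(self, rows: List[Row]) -> List[Row]:
        ...


def _column_mean(rows: List[Row], col: int) -> Optional[float]:
    """Mean of the observed values in one column, or None if all are missing."""
    observed = [r[col] for r in rows if r[col] is not None]
    return sum(observed) / len(observed) if observed else None


class SimpleImputation(ImputationStrategy):
    """Assumed rule: replace each missing cell with its column mean."""

    def impute(self, rows: List[Row]) -> List[Row]:
        if not rows:
            return []
        means = [_column_mean(rows, c) for c in range(len(rows[0]))]
        return [[means[c] if v is None else v for c, v in enumerate(r)]
                for r in rows]


class KNNImputation(ImputationStrategy):
    """Assumed rule: average the column over the k nearest rows that have it."""

    def __init__(self, k: int = 2) -> None:
        self.k = k

    @staticmethod
    def _distance(a: Row, b: Row) -> float:
        # Root-mean-square difference over the columns both rows have observed.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    def impute(self, rows: List[Row]) -> List[Row]:
        filled = []
        for i, row in enumerate(rows):
            new_row = list(row)  # copy so the input table is not mutated
            for c, v in enumerate(row):
                if v is not None:
                    continue
                # Candidate neighbours: other rows with a value in column c.
                candidates = [(self._distance(row, other), other[c])
                              for j, other in enumerate(rows)
                              if j != i and other[c] is not None]
                candidates.sort(key=lambda pair: pair[0])
                nearest = [value for _, value in candidates[: self.k]]
                new_row[c] = sum(nearest) / len(nearest) if nearest else None
            filled.append(new_row)
        return filled


class MissingValueFiller:
    """Context (hypothetical name): delegates to the strategy the user picks."""

    def __init__(self, strategy: ImputationStrategy) -> None:
        self.strategy = strategy

    def fill(self, rows: List[Row]) -> List[Row]:
        return self.strategy.impute(rows)


if __name__ == "__main__":
    table = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.5, None]]
    print(MissingValueFiller(SimpleImputation()).fill(table)[3])
    print(MissingValueFiller(KNNImputation(k=2)).fill(table)[3])
```

Swapping the object passed to MissingValueFiller changes the speed and precision trade-off without touching the calling code, and a new filling method is added by writing another ImputationStrategy subclass, which is the kind of extensibility the abstract claims for the design.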