Abstract
To address the problem of filling missing values during data preprocessing, an extensible missing-value filling algorithm is designed using the strategy pattern. Three concrete strategy classes, SimpleImputation, KNNImputation and DTBImputation, encapsulate the simple, KNN and DTB missing-value filling algorithms, respectively. Experimental results show that the simple filling algorithm is the fastest but has the lowest precision, the DTB algorithm is slower but has higher precision, and the KNN algorithm is the slowest but has the highest precision. The algorithm lets users choose a filling algorithm according to their own requirements for speed and precision, and extend its filling capability by adding new strategy classes. It thereby resolves the data quality problems caused by missing values and improves the generality and extensibility of data preprocessing programs.
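The abstract names the strategy classes but does not give their interfaces, so the following is only a minimal sketch of how such a design might look. The class names SimpleImputation and KNNImputation come from the abstract; the base class `ImputationStrategy`, the context class `Imputer`, the method names, the column-mean rule for the simple strategy and the distance measure for KNN are all assumptions made for illustration. DTBImputation is left out because the abstract does not describe the DTB algorithm.

```python
from abc import ABC, abstractmethod
from math import sqrt
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


class ImputationStrategy(ABC):
    """Hypothetical common interface shared by all filling strategies."""

    @abstractmethod
    def impute(self, data: List[Row]) -> List[List[float]]:
        ...


def _column_means(data: List[Row]) -> List[float]:
    """Mean of the observed values in each column (0.0 if none observed)."""
    means = []
    for col in zip(*data):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


class SimpleImputation(ImputationStrategy):
    """Assumed 'simple' rule: replace a missing value by its column mean."""

    def impute(self, data):
        means = _column_means(data)
        return [[means[j] if v is None else v for j, v in enumerate(row)]
                for row in data]


class KNNImputation(ImputationStrategy):
    """Assumed KNN rule: average the k nearest rows that observe the column."""

    def __init__(self, k: int = 2):
        self.k = k

    @staticmethod
    def _distance(a: Row, b: Row) -> float:
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    def impute(self, data):
        means = _column_means(data)
        result = []
        for i, row in enumerate(data):
            filled = list(row)
            for j, v in enumerate(row):
                if v is not None:
                    continue
                # Candidate neighbours: other rows with column j observed.
                candidates = [(self._distance(row, other), other[j])
                              for m, other in enumerate(data)
                              if m != i and other[j] is not None]
                candidates = [c for c in candidates if c[0] != float("inf")]
                candidates.sort(key=lambda c: c[0])
                nearest = [val for _, val in candidates[: self.k]]
                # Fall back to the column mean when no neighbour is usable.
                filled[j] = sum(nearest) / len(nearest) if nearest else means[j]
            result.append(filled)
        return result


class Imputer:
    """Context object: delegates to whichever strategy the user selects."""

    def __init__(self, strategy: ImputationStrategy):
        self.strategy = strategy

    def fill(self, data: List[Row]) -> List[List[float]]:
        return self.strategy.impute(data)


data = [
    [1.0, 2.0],
    [1.0, None],   # second value is missing
    [9.0, 10.0],
    [9.0, 12.0],
]
simple_result = Imputer(SimpleImputation()).fill(data)
knn_result = Imputer(KNNImputation(k=1)).fill(data)
```

In this sketch, switching between a fast and a more precise filling method only means passing a different strategy object to `Imputer`, and a new method can be added by writing another subclass of `ImputationStrategy` without touching existing code.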
Source
《中南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals list)
2004, No. 5, pp. 825-829 (5 pages)
Journal of Central South University:Science and Technology
Funding
Projects supported by the National Natural Science Foundation of China (69971007, 60171043)