Abstract
Dataset condensation is the process of removing noisy and redundant data from a dataset under given conditions and replacing the original dataset with a sufficiently small subset, without reducing the accuracy of the data mining task. It is a safeguard for achieving good results in data mining and, in some cases, can itself serve as the main method for carrying out a data mining task. This paper surveys and comments on the development of dataset condensation techniques, then analyzes and looks ahead to future directions, as a reference for researchers who intend to work in this area.
Source
《计算机应用与软件》
CSCD
PKU Core Journals (北大核心)
2012, Issue 10, pp. 211-215 (5 pages)
Computer Applications and Software
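Funding
National Science and Technology Major Project (2010ZX01045-001-004)
National Key Technology R&D Program (2011BAD21B02)
National Natural Science Foundation of China (60871042)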
Keywords
Dataset
Condensation
Noise
Redundancy
Subset
Data mining