Abstract
Data cleaning is the most resource-consuming step in the data mining process. How effectively the data are cleaned and transformed into a source that meets data mining requirements is a key factor affecting the accuracy of data mining. This paper discusses the areas in which data cleaning applies, together with its principles and methods. Taking data from a highly skilled personnel information system as an example, SQL is used to clean and transform the data, classifying and organizing them so that they meet the needs of data mining.
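The abstract does not show the paper's actual schema or statements, so the following is only a minimal sketch of the kind of SQL cleaning and conversion it describes. The table names (`personnel_raw`, `personnel_clean`), the columns, the sample rows, and the skill-level codes are all hypothetical, and SQLite (via Python's standard `sqlite3` module) is assumed purely so the example can be run.

```python
import sqlite3

# Hypothetical schema and data: the real tables and columns of the
# personnel information system are not given in the abstract.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personnel_raw (id INTEGER, name TEXT, gender TEXT, skill_level TEXT);
INSERT INTO personnel_raw VALUES
  (1, ' Zhang Wei ', 'M',    'senior technician'),
  (1, ' Zhang Wei ', 'M',    'senior technician'),
  (2, 'Li Na',       'f',    'Technician'),
  (3, 'Wang Fang',   '',     NULL),
  (4, 'Chen Jie',    'male', 'SENIOR WORKER');

-- Cleaning: drop exact duplicates, trim text, unify gender codes,
-- and turn empty or unknown values into NULL.
-- Conversion: map free-text skill levels to an ordinal class (assumed codes).
CREATE TABLE personnel_clean AS
SELECT DISTINCT
  id,
  TRIM(name) AS name,
  CASE LOWER(TRIM(gender))
    WHEN 'm' THEN 'M' WHEN 'male' THEN 'M'
    WHEN 'f' THEN 'F' WHEN 'female' THEN 'F'
    ELSE NULL
  END AS gender,
  CASE LOWER(TRIM(skill_level))
    WHEN 'senior technician' THEN 3
    WHEN 'technician' THEN 2
    WHEN 'senior worker' THEN 1
    ELSE NULL
  END AS skill_class
FROM personnel_raw;
""")

rows = conn.execute(
    "SELECT id, name, gender, skill_class FROM personnel_clean ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Writing the result to a separate table leaves the raw data untouched, so the cleaning rules can be revised and rerun without losing the original records.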
Source
Journal of Tonghua Normal University (《通化师范学院学报》)
2015, No. 4, pp. 7-10 (4 pages)
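Funding
Anhui Provincial Higher Education Quality Engineering Project "Demonstration Experiment and Training Center: Information Training Center" (2014sxzx032)
Chuzhou Vocational and Technical College (滁州职业技术学院) Quality Engineering Project "Computer Application Major Teaching Team" (zlgc2014006)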
Keywords
data mining
data cleaning
classification
SQL