Abstract
As information processing technology keeps evolving, every industry has come to run many computer information systems, which in turn generate large volumes of data. Because organizations rely on this data for daily operations and decision-making, ensuring that it is reliable and accurate has become a research focus. This paper proposes a distributed data integration tool that combines ETL with data cleaning. Data cleaning techniques are introduced into the ETL process, cleaning rules are defined, and cleaning algorithms based on statistical methods, clustering methods, and association rules are proposed and compared. A data cleaning framework is then proposed to improve data quality. An evaluation of the cleaning results indicates that the approach is feasible and effective and has practical value.
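The abstract does not spell out the cleaning algorithms, so the following is only a minimal sketch of how a rule-based cleaning step and a statistics-based check might sit inside the transform stage of an ETL flow. The field name `amount`, the function names, the modified z-score technique (median and median absolute deviation), and the 3.5 threshold are all assumptions chosen for illustration, not the paper's method.

```python
from statistics import median


def apply_rules(rows):
    """Rule-based cleaning: trim whitespace and reject non-numeric amounts."""
    clean, rejected = [], []
    for row in rows:
        text = str(row.get("amount", "")).strip()
        try:
            clean.append({**row, "amount": float(text)})
        except ValueError:
            # Empty or malformed values are set aside rather than loaded.
            rejected.append(row)
    return clean, rejected


def split_outliers(rows, threshold=3.5):
    """Statistical cleaning: flag rows by modified z-score (assumed technique)."""
    values = [r["amount"] for r in rows]
    if not values:
        return [], []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against; treat every row as acceptable.
        return rows, []
    kept, outliers = [], []
    for r in rows:
        score = 0.6745 * abs(r["amount"] - med) / mad
        (outliers if score > threshold else kept).append(r)
    return kept, outliers


def run_etl(source_rows):
    """Extract is assumed done; transform applies both cleaning steps; load is a dict."""
    clean, rejected = apply_rules(source_rows)
    kept, outliers = split_outliers(clean)
    return {"loaded": kept, "rejected": rejected, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical source rows standing in for data pulled from source systems.
    rows = [
        {"id": 1, "amount": "10.0"},
        {"id": 2, "amount": " 11.5 "},
        {"id": 3, "amount": ""},
        {"id": 4, "amount": "9.5"},
        {"id": 5, "amount": "10.5"},
        {"id": 6, "amount": "500.0"},
    ]
    print(run_etl(rows))
```

Keeping rejected and outlier rows in separate buckets, instead of silently dropping them, is one way to support the kind of cleaning evaluation the abstract mentions, since the counts in each bucket can be inspected afterwards.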
Source
《信息技术》 (Information Technology)
2017, No. 10, pp. 133-136 (4 pages)