Abstract
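Inconsistency detection is an important topic in data cleaning. For traditional centralized data, the problem can be solved with SQL-based techniques, whereas distributed data poses many challenges. Researchers have studied this problem in depth using inconsistency detection techniques based on conditional functional dependencies, recasting distributed inconsistency detection as an optimization problem and proposing several feasible algorithms. This paper introduces the problem of inconsistency detection based on conditional functional dependencies over distributed data, implements distributed detection algorithms based on the optimization formulation, and finally conducts experiments to verify and improve them.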
Inconsistency detection is a central issue in data cleaning. For centralized databases, effective SQL-based methods for detecting inconsistencies already exist; the problem is far more challenging when the database is distributed. Existing studies address inconsistency detection based on conditional functional dependencies by formulating the detection problem as an optimization problem, for which several effective algorithms have been developed. This paper introduces the problem of detecting inconsistencies in distributed data based on conditional functional dependencies. It then develops characterizations of conditional functional dependencies, the fragmentation of the dataset, the optimization problem, and the associated inconsistency detection algorithms. Finally, it reports several experiments conducted to verify and improve these algorithms.
Source
《智能计算机与应用》
2015, Issue 3, pp. 57-60, 64 (5 pages in total)
Intelligent Computer and Applications
Funding
National Natural Science Foundation of China (61173022)
Keywords
Distributed Data
Inconsistency
Conditional Functional Dependency
Optimization