Abstract
As information technology advances, data volumes are growing rapidly, and data quality problems have become widespread. This paper addresses the incompleteness commonly found in massive relational data and studies how to measure the completeness of relational data. It proposes a model for computing data completeness, together with an exact algorithm and an approximate algorithm based on uniform sampling. Theoretical analysis proves that the approximate algorithm can meet any required precision and computes data completeness efficiently. Experiments on data extracted from DBLP confirm the effectiveness and efficiency of the algorithms.
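The abstract does not give the completeness model itself, so the following is only a minimal sketch of the general idea of estimating completeness by uniform sampling, not the paper's algorithm. It assumes, purely for illustration, that completeness is the fraction of non-null cells in a non-empty table whose rows all have the same number of columns, and it uses the standard Hoeffding bound to choose the sample size. The names `sample_size`, `exact_completeness`, and `approx_completeness` are hypothetical.

```python
import math
import random


def sample_size(epsilon, delta):
    """Samples needed so the estimate is within epsilon of the true value
    with probability at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def exact_completeness(rows):
    """Assumed definition: fraction of cells that are not None (full scan)."""
    total = sum(len(row) for row in rows)
    filled = sum(1 for row in rows for value in row if value is not None)
    return filled / total if total else 1.0


def approx_completeness(rows, epsilon=0.05, delta=0.05, seed=None):
    """Estimate the same fraction from uniformly sampled cells.

    Picking a row uniformly and then a column uniformly is uniform over
    cells only because all rows are assumed to have the same width.
    Sampling is with replacement, which is what the bound requires.
    """
    rng = random.Random(seed)
    n = sample_size(epsilon, delta)
    hits = 0
    for _ in range(n):
        row = rng.choice(rows)
        if row[rng.randrange(len(row))] is not None:
            hits += 1
    return hits / n
```

Under these assumptions the sample size depends only on the error bound `epsilon` and the failure probability `delta`, not on the table size, which is why a sampling estimate can stay cheap on very large tables while the exact scan grows linearly with the data.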
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal (北大核心)
2013, Issue S1, pp. 230-238 (9 pages)
Journal of Computer Research and Development
Funding
National Basic Research Program of China ("973" Program) (2012CB316202)
Keywords
data quality
data completeness
uniform sampling
approximate algorithm
data completeness model