Abstract
To detect and repair erroneous data more accurately when attribute values are missing, a data quality checking strategy is proposed that combines data quality rules with the Fellegi-Holt method. To meet different detection requirements, two detection algorithms are designed, one aimed at locating erroneous data and the other at repairing it. Corresponding algorithms are also proposed for repairing erroneous data and for filling in missing data. Experiments were conducted on both real-life and synthetic data. The results show that the method has a clear advantage in the recall and accuracy of erroneous data detection, and that the two detection strategies differ considerably in efficiency.
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2016, No. 3, pp. 643-649 (7 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (61371196)
China Postdoctoral Science Foundation Special Fund (201003797)