Rapid Decision Method for Repairing Sequence Based on CFDs
Abstract: Data consistency is a central issue in big data quality management. Conditional functional dependencies (CFDs) are an effective technique for maintaining data consistency. However, the order in which CFDs are applied during repair affects both the accuracy and the efficiency of the repair, so choosing a correct and reasonable repair sequence is critical. To address this problem, this paper proposes a method, based on CFD rules, for rapidly determining the repair sequence. First, a data repair framework is designed. Then, using the associations among CFDs, the concept of a repairing sequence graph is introduced to compute the order in which the CFDs are repaired. On the one hand, this avoids some incorrect or unnecessary repairs and thereby improves repair accuracy. On the other hand, determining the repair order from the rules is faster than determining it from the actual data. In addition, repair deadlocks are detected while the repair sequence is being determined, which guarantees that the repair process terminates. Finally, comparative experiments against existing methods on two real-life datasets show that the proposed method achieves higher accuracy and runtime efficiency.
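Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 3, pp. 311-316 (6 pages)
Funding: Supported by the Natural Science Foundation of Hebei Province (F2014409008), the Science and Technology Plan Project of Hebei Province (17210336), and the Science and Technology Plan Project of Langfang City (2017011042)
Keywords: data consistency; conditional functional dependencies (CFDs); repairing sequence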
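
The abstract describes deriving a repair order from the CFD rules themselves and detecting repair deadlocks along the way, but this record does not give the paper's definitions. The Python sketch below is only an illustration under stated assumptions: the `CFD` type (attributes only, pattern tableau omitted), the edge rule (rule a precedes rule b when a's right-hand attribute appears in b's left-hand side), and the treatment of a cycle as a repair deadlock are all hypothetical simplifications, not the authors' algorithm. The ordering itself uses the standard Kahn topological sort.

```python
from collections import deque
from typing import NamedTuple


class CFD(NamedTuple):
    """Hypothetical, simplified CFD: attribute names only, no pattern tableau."""
    name: str
    lhs: frozenset  # attributes on the left-hand side
    rhs: str        # single attribute on the right-hand side


def build_sequence_graph(rules):
    """Assumed edge rule: a -> b when a's RHS attribute occurs in b's LHS,
    i.e. repairing a may change values that b conditions on."""
    graph = {r.name: set() for r in rules}
    for a in rules:
        for b in rules:
            if a.name != b.name and a.rhs in b.lhs:
                graph[a.name].add(b.name)
    return graph


def repair_order(graph):
    """Kahn's algorithm over the rule graph.

    Returns (order, blocked). `blocked` lists rules that could not be
    ordered because they lie on a cycle or downstream of one; this sketch
    treats a non-empty `blocked` as a repair deadlock (an assumption).
    """
    indegree = {n: 0 for n in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for t in sorted(graph[n]):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    blocked = sorted(n for n in graph if n not in order)
    return order, blocked


# Illustrative rules with made-up attribute names.
rules = [
    CFD("phi1", frozenset({"zip"}), "city"),
    CFD("phi2", frozenset({"city", "street"}), "area_code"),
    CFD("phi3", frozenset({"area_code"}), "state"),
]
order, blocked = repair_order(build_sequence_graph(rules))
print(order, blocked)
```

Because the graph is built from the rules alone, its size depends on the number of CFDs rather than the number of tuples, which is consistent with the abstract's claim that deciding the order from rules is faster than deciding it from data.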