
Content-Related Repairing of Inconsistencies in Distributed Data

Abstract: Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies. However, they may miss some potential inconsistencies because they do not consider the content relationships among data. Content-related conditional functional dependencies (CCFDs) are a special type of CFD. They combine content-related CFDs and detect potential inconsistencies by putting content-related data together. In the process of cleaning inconsistencies, detection and repairing interact: 1) detection catches inconsistencies, and 2) repairing corrects the caught inconsistencies but may introduce new ones. Moreover, data are often fragmented and distributed across multiple sites, so cleaning inconsistencies incurs expensive data shipment. In this paper, we aim to repair inconsistencies in distributed content-related data. We propose a framework that consists of an inconsistency detection method and an inconsistency repairing method, which work iteratively. The detection method marks the violated CCFDs in order to compute the inconsistencies that should be repaired first. Based on the repairing-cost model presented in this paper, we prove that minimum-cost repairing using CCFDs is NP-complete. Therefore, the repairing method repairs the inconsistencies heuristically, aiming at minimum cost. To improve the efficiency and accuracy of repairing, we propose distinct values and rule sequences. Shipping distinct values requires less communication than shipping the actual data. Rule sequences determine appropriate repairing orders that avoid some incorrect repairs. Empirical evaluation on two real-life datasets shows that our solution is more effective than CFDs.
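To make the detection step and the idea of shipping distinct values more concrete, here is a minimal Python sketch for a single plain CFD over horizontally fragmented data. It is not the paper's method. It does not model CCFDs, rule sequences, or the paper's repairing-cost model. It assumes that each site ships only the distinct (X, Y) value pairs of its matching tuples, together with their counts. It also assumes a unit cost per changed cell when it suggests a repair value. The pattern notation uses "_" for the unnamed variable, as in the usual CFD pattern tuples. All function names, the attributes CC, AC, and city, and the sample fragments are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

WILDCARD = "_"  # the unnamed variable "_" of a CFD pattern tuple
Row = Dict[str, str]
Key = Tuple[str, ...]


def matches(values: Key, pattern: Key) -> bool:
    """True if each value equals its pattern constant or the pattern entry is '_'."""
    return all(p == WILDCARD or v == p for v, p in zip(values, pattern))


def local_summary(rows: List[Row], lhs: List[str], rhs: str, lhs_pat: Key) -> Counter:
    """Run at one site: count distinct (X, Y) pairs of tuples whose X matches
    the LHS pattern. Only this summary, not the tuples, is shipped."""
    summary: Counter = Counter()
    for r in rows:
        x = tuple(r[a] for a in lhs)
        if matches(x, lhs_pat):
            summary[(x, r[rhs])] += 1
    return summary


def detect_and_suggest(sites: List[List[Row]], lhs: List[str], rhs: str,
                       lhs_pat: Key, rhs_pat: str) -> Dict[Key, str]:
    """Run at a coordinator: merge the site summaries, find X values that
    violate the CFD (lhs -> rhs, lhs_pat || rhs_pat), and suggest a target
    Y value under a hypothetical unit cost model (each changed cell costs 1)."""
    counts: Dict[Key, Counter] = defaultdict(Counter)
    for rows in sites:
        for (x, y), n in local_summary(rows, lhs, rhs, lhs_pat).items():
            counts[x][y] += n
    repairs: Dict[Key, str] = {}
    for x, ys in counts.items():
        if rhs_pat != WILDCARD:
            # Constant RHS: every matching tuple must carry the pattern constant.
            if set(ys) != {rhs_pat}:
                repairs[x] = rhs_pat
        elif len(ys) > 1:
            # Variable RHS: equal X but different Y. Keeping the most frequent
            # Y changes the fewest cells in this group.
            repairs[x] = ys.most_common(1)[0][0]
    return repairs


# Hypothetical horizontal fragments of one relation (CC: country code, AC: area code).
SITE_A = [
    {"CC": "44", "AC": "131", "city": "EDI"},
    {"CC": "01", "AC": "908", "city": "MH"},
    {"CC": "01", "AC": "908", "city": "MH"},
]
SITE_B = [
    {"CC": "44", "AC": "131", "city": "NYC"},
    {"CC": "01", "AC": "212", "city": "NYC"},
    {"CC": "01", "AC": "908", "city": "NYC"},
]

if __name__ == "__main__":
    # Constant CFD: (CC, AC -> city, (44, 131 || EDI))
    print(detect_and_suggest([SITE_A, SITE_B], ["CC", "AC"], "city", ("44", "131"), "EDI"))
    # {('44', '131'): 'EDI'}
    # Variable CFD: (CC, AC -> city, (01, _ || _))
    print(detect_and_suggest([SITE_A, SITE_B], ["CC", "AC"], "city", ("01", WILDCARD), WILDCARD))
    # {('01', '908'): 'MH'}
```

Each site sends one entry per distinct (X, Y) pair instead of one per tuple. SITE_A, for example, ships a single pair for its two identical (01, 908, MH) tuples. The shipment therefore shrinks whenever many tuples share the same values. Picking the most frequent Y value is only optimal for one rule in isolation. When several rules overlap, a repair can create new violations, and this is why detection and repairing have to alternate.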
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, and CSCD, 2016, Issue 4, pp. 741-758 (18 pages).
Funding: This research was supported by the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61033007 and 61472070, and the Fundamental Research Funds for the Central Universities of China under Grant No. N150408001-3.
Keywords: data quality management, distributed consistency, content relativity, consistency repairing