Abstract: Conditional functional dependencies (CFDs) are an important technique for data consistency. However, CFDs are limited in their ability to 1) provide reasonable values for consistency repair and 2) detect potential errors. This paper presents context-aware conditional functional dependencies (CCFDs), which help provide reasonable values and detect potential errors. In particular, we focus on automatically discovering minimal CCFDs. We introduce context relativity to measure the relationship between CFDs. The overlap of related CFDs can provide reasonable values, which leads to more accurate consistency repair, and some related CFDs are combined into CCFDs. Moreover, we prove that discovering minimal CCFDs is NP-complete, and we design a precise method and a heuristic method. We also introduce the dominating value to facilitate the process in both methods. Additionally, the context relativity of the CFDs affects the cleaning results, so we suggest an approximate threshold for context relativity based on the data distribution. Our empirical evaluation shows that the repair results are more accurate.
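To make the starting point concrete, the following is a minimal sketch of how an ordinary CFD can be checked against a table. It assumes the usual reading of a CFD as an embedded dependency X → Y plus a pattern tuple whose entries are constants or a wildcard. It does not implement CCFDs, context relativity, or the discovery methods of the paper, which the abstract does not specify. The names `WILDCARD`, `matches`, `cfd_violations`, and the sample rows are hypothetical and chosen only for illustration.

```python
WILDCARD = "_"  # assumed marker for "any value" in a pattern tuple


def matches(row, attrs, pattern):
    """True if the row agrees with every constant the pattern sets on attrs."""
    return all(pattern[a] == WILDCARD or row[a] == pattern[a] for a in attrs)


def cfd_violations(rows, lhs, rhs, pattern):
    """Return the indices of rows that violate the CFD (lhs -> rhs, pattern).

    A row is flagged if it matches the left-hand pattern and either
    (a) disagrees with a constant on the right-hand side, or
    (b) shares its lhs values with another matching row that has
        different rhs values.
    """
    violations = set()
    groups = {}
    for i, row in enumerate(rows):
        if not matches(row, lhs, pattern):
            continue  # the condition does not apply to this row
        if not matches(row, rhs, pattern):
            violations.add(i)  # case (a): wrong constant on the rhs
        key = tuple(row[a] for a in lhs)
        groups.setdefault(key, []).append(i)
    for members in groups.values():
        rhs_values = {tuple(rows[i][a] for a in rhs) for i in members}
        if len(rhs_values) > 1:
            violations.update(members)  # case (b): conflicting rhs values
    return violations


# Hypothetical data: within country "44", zip should determine street.
rows = [
    {"country": "44", "zip": "EH4", "street": "Mayfield"},
    {"country": "44", "zip": "EH4", "street": "Crichton"},
    {"country": "01", "zip": "07974", "street": "Mtn Ave"},
    {"country": "01", "zip": "07974", "street": "Main St"},
]
pattern = {"country": "44", "zip": WILDCARD, "street": WILDCARD}
print(sorted(cfd_violations(rows, ["country", "zip"], ["street"], pattern)))
```

The check reports that rows 0 and 1 conflict, but it cannot say which street is the right one. That is the first limitation the abstract attributes to plain CFDs, and the one CCFDs are meant to address by drawing on related CFDs.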