Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies while they may ignore some potential inconsistencies without considering the content relationship of data. Content-re...Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies while they may ignore some potential inconsistencies without considering the content relationship of data. Content-related conditional functional dependencies (CCFDs) are a type of special CFDs, which combine content-related CFDs and detect potential inconsistencies by putting content-related data together. In the process of cleaning inconsistencies, detection and repairing are interactive: 1) detection catches inconsistencies, 2) repairing corrects caught inconsistencies while may bring new incon- sistencies. Besides, data are often fragmented and distributed into multiple sites. It consequently costs expensive shipment for inconsistencies cleaning. In this paper, our aim is to repair inconsistencies in distributed content-related data. We propose a framework consisting of an inconsistencies detection method and an inconsistencies repairing method, which work iteratively. The detection method marks the violated CCFDs for computing the inconsistencies which should be repaired preferentially. Based on the repairing-cost model presented in this paper, we prove that the minimum-cost repairing using CCFDs is NP-complete. Therefore, the repairing method heuristically repairs the inconsistencies with minimum cost. To improve the efficiency and accuracy of repairing, we propose distinct values and rules sequences. Distinct values make less data shipments than real data for communication. Rules sequences determine appropriate repairing sequences to avoid some incorrect repairs. Our solution is proved to be more effective than CFDs by empirical evaluation on two real-life datasets.展开更多
For the linear model y_i=x_iθ+e_i, i=1, 2,…, let the error sequence {e_i}_i=1 be iidr.v.’s, with unknown density f(x). In this paper,a nonparametric estimation method based onthe residuals is proposed for estimatin...For the linear model y_i=x_iθ+e_i, i=1, 2,…, let the error sequence {e_i}_i=1 be iidr.v.’s, with unknown density f(x). In this paper,a nonparametric estimation method based onthe residuals is proposed for estimating f(x) and the consistency of the estimators is obtained.展开更多
Let {Xi = (X1,i,...,Xm,i)T, i ≥ 1} be a sequence of independent and identically distributed nonnegative m-dimensional random vectors. The univariate marginal distributions of these vectors have consistently varying...Let {Xi = (X1,i,...,Xm,i)T, i ≥ 1} be a sequence of independent and identically distributed nonnegative m-dimensional random vectors. The univariate marginal distributions of these vectors have consistently varying tails and finite means. Here, the components of X1 are allowed to be generally dependent. Moreover, let N(.) be a nonnegative integer-valued process, independent of the sequence {Xi, i ≥ 1}. Under several mild assumptions, precise large deviations for Sn =∑i=1 n Xi and SN(t) =∑i=1 N(t) Xi are investigated. Meanwhile, some simulation examples are also given to illustrate the results.展开更多
基金This research was supported by the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61033007 and 61472070, and the Fundamental Research Funds for the Central Universities of China under Grant No. N150408001-3.
文摘Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies while they may ignore some potential inconsistencies without considering the content relationship of data. Content-related conditional functional dependencies (CCFDs) are a type of special CFDs, which combine content-related CFDs and detect potential inconsistencies by putting content-related data together. In the process of cleaning inconsistencies, detection and repairing are interactive: 1) detection catches inconsistencies, 2) repairing corrects caught inconsistencies while may bring new incon- sistencies. Besides, data are often fragmented and distributed into multiple sites. It consequently costs expensive shipment for inconsistencies cleaning. In this paper, our aim is to repair inconsistencies in distributed content-related data. We propose a framework consisting of an inconsistencies detection method and an inconsistencies repairing method, which work iteratively. The detection method marks the violated CCFDs for computing the inconsistencies which should be repaired preferentially. Based on the repairing-cost model presented in this paper, we prove that the minimum-cost repairing using CCFDs is NP-complete. Therefore, the repairing method heuristically repairs the inconsistencies with minimum cost. To improve the efficiency and accuracy of repairing, we propose distinct values and rules sequences. Distinct values make less data shipments than real data for communication. Rules sequences determine appropriate repairing sequences to avoid some incorrect repairs. Our solution is proved to be more effective than CFDs by empirical evaluation on two real-life datasets.
基金The project supported by National Natural Science Foundation of China Crant 18971061
文摘For the linear model y_i=x_iθ+e_i, i=1, 2,…, let the error sequence {e_i}_i=1 be iidr.v.’s, with unknown density f(x). In this paper,a nonparametric estimation method based onthe residuals is proposed for estimating f(x) and the consistency of the estimators is obtained.
文摘Let {Xi = (X1,i,...,Xm,i)T, i ≥ 1} be a sequence of independent and identically distributed nonnegative m-dimensional random vectors. The univariate marginal distributions of these vectors have consistently varying tails and finite means. Here, the components of X1 are allowed to be generally dependent. Moreover, let N(.) be a nonnegative integer-valued process, independent of the sequence {Xi, i ≥ 1}. Under several mild assumptions, precise large deviations for Sn =∑i=1 n Xi and SN(t) =∑i=1 N(t) Xi are investigated. Meanwhile, some simulation examples are also given to illustrate the results.