Abstract
Given a threshold on the consistency degree of query results, feasibility determination decides whether the consistency degree of the results of a query over inconsistent data is greater than that threshold. If it is not, the query results are meaningless to the user and the query is called infeasible. In applications with large data volumes and high query cost, estimating the accuracy of the query results at a relatively small price before the query is executed can save much of the query cost and the user's time, and the gain grows with the size of the data set and the query overhead. Determining query feasibility is therefore especially important in query-intensive scenarios. Determining the feasibility of a query is equivalent to estimating the consistency of its results, and this paper uses a sampling method to make that estimate. The sampling algorithm draws samples separately from the consistent part and the inconsistent part of the data, using a different sampling strategy for each, so that with high probability the samples satisfy the query condition and follow the distribution of the inconsistent data. Based on the samples, the paper gives an algorithm for estimating the consistency degree. It proves that the estimate is asymptotically unbiased and that it is an (ε, δ)-estimate.
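The abstract does not give the sampling or estimation algorithms themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that every tuple is already labelled as consistent or inconsistent, that the consistency degree of a result is the fraction of matching tuples that are consistent, and that each part is sampled uniformly with replacement. The names `estimate_consistency`, `is_feasible`, and `hoeffding_sample_size` are hypothetical. The sample-size helper uses the standard Hoeffding bound, which controls the error of each part's hit rate rather than the error of the final ratio.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Samples needed so that a sample proportion is within epsilon of the
    true proportion with probability at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_consistency(consistent, inconsistent, predicate, n, rng):
    """Estimate the share of query answers that come from consistent tuples.

    Assumption: `consistent` and `inconsistent` are lists of tuples that
    have already been separated; `predicate` is the query condition.
    Returns None when no sampled tuple matches the query.
    """
    def scaled_hits(rows):
        if not rows:
            return 0.0
        # Uniform sampling with replacement inside one part of the data.
        sample = [rng.choice(rows) for _ in range(n)]
        hit_rate = sum(1 for row in sample if predicate(row)) / n
        # Scale the hit rate up to an estimated number of matching tuples.
        return len(rows) * hit_rate

    good = scaled_hits(consistent)
    bad = scaled_hits(inconsistent)
    total = good + bad
    return good / total if total > 0 else None


def is_feasible(estimate, threshold):
    """A query is treated as feasible only if the estimate exceeds the threshold."""
    return estimate is not None and estimate > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    consistent = [{"age": a % 80} for a in range(3000)]
    inconsistent = [{"age": a % 80} for a in range(1000)]
    n = hoeffding_sample_size(epsilon=0.05, delta=0.05)
    est = estimate_consistency(consistent, inconsistent,
                               lambda row: row["age"] >= 30, n, rng)
    print(n, est, is_feasible(est, threshold=0.7))
```

Because the result is a ratio of two scaled sample counts, it is generally biased for small samples and the bias shrinks as the sample grows, which is the sense in which such a ratio estimate can be asymptotically unbiased.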
Authors
LIU Xueli (刘雪莉)
LI Jianzhong (李建中)
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2018, Issue 2, pp. 1-6, 13 (7 pages in total)
Funding
National Natural Science Foundation of China (61190115, 61033015)
National Basic Research Program of China (973 Program) (2012CB316200)
Keywords
inconsistent weakly available data
aggregate query
upper and lower bounds
approximation