Abstract
Current Web database query systems usually return top-k results that are very similar to one another in content, whereas users want results that are relevant to the query, differ from each other, and are representative. This paper proposes a top-k typicality query approach based on the semantic similarity between tuples, consisting of an offline and an online processing step. In the offline step, the approach first uses the information associated with attribute values to evaluate the coupling relationship between each pair of attribute values, and then combines these coupling relationships to compute the semantic similarity between tuples. When a query arrives, the approach uses a probability density estimation method, based on the semantic distances between the result tuples, to measure the typicality of each tuple, and then applies a top-k approximate selection algorithm to return the k most typical tuples to the user. Experimental results and analysis show that the proposed tuple typicality estimation method achieves high user satisfaction, and that the proposed top-k approximate selection method is accurate and efficient, making it suitable for top-k typicality queries over large result sets.
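The online step can be illustrated with a minimal sketch, written under several assumptions that the abstract does not confirm. It uses a Gaussian kernel with a fixed bandwidth for the density estimate, a simple attribute-mismatch distance as a stand-in for the coupling-based semantic distance, and an exact sort in place of the paper's approximate top-k selection. All names (`mismatch_distance`, `typicality_scores`, `top_k_typical`, `bandwidth`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, ...]


def mismatch_distance(a: Row, b: Row) -> float:
    """Placeholder distance: fraction of attributes whose values differ.

    Assumption: the paper derives distances from attribute-value coupling
    relationships computed offline; this stand-in only compares for equality.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def typicality_scores(
    rows: Sequence[Row],
    distance: Callable[[Row, Row], float] = mismatch_distance,
    bandwidth: float = 0.5,
) -> List[float]:
    """Score each row by a Gaussian kernel density estimate over distances.

    A row surrounded by many close rows gets a high score (it is "typical").
    The kernel and bandwidth are illustrative choices, not the paper's.
    """
    n = len(rows)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    scores = []
    for row in rows:
        total = sum(
            math.exp(-(distance(row, other) ** 2) / (2.0 * bandwidth ** 2))
            for other in rows
        )
        scores.append(norm * total)
    return scores


def top_k_typical(rows: Sequence[Row], k: int, bandwidth: float = 0.5) -> List[Row]:
    """Return the k highest-scoring rows (exact selection, O(n^2) scoring)."""
    scores = typicality_scores(rows, bandwidth=bandwidth)
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return [rows[i] for i in ranked[:k]]


if __name__ == "__main__":
    cars = [
        ("red", "suv"),
        ("red", "suv"),
        ("red", "sedan"),
        ("blue", "truck"),
    ]
    print(top_k_typical(cars, 2))
```

Scoring every tuple against every other tuple costs quadratic time in the size of the result set, which is presumably why the paper resorts to an approximate selection algorithm for large result sets.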
Source
《小型微型计算机系统》
CSCD
PKU Core Journals (北大核心)
2016, No. 8, pp. 1692-1696 (5 pages)
Journal of Chinese Computer Systems
Funding
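Supported by the National Science Foundation for Young Scientists of China (61003162)
Supported by the Liaoning Province Growth Program for Distinguished Young Scholars in Higher Education Institutions (LJQ2013038)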