
Top-k Typicality Query Approach Incorporating Similarity Analysis for Web Database

Original title: 结合语义相似度分析的Web数据库Top-K典型化查询方法

Cited by: 4
Abstract: The top-k results returned by current Web database query systems are usually very similar to one another in content, whereas in practice users want results that are representative yet differ from each other to some degree. This paper proposes a top-k typicality query approach based on the semantic similarity between tuples, consisting of an offline step and an online step. In the offline step, the approach first uses the association information of attribute values to evaluate the coupling relationships between different attribute values, and then combines those coupling relationships to evaluate the semantic similarity between different tuples. When a query arrives, the approach applies a probability density estimation method, based on the semantic distances between the result tuples, to measure the typicality of each tuple, and then uses a top-k approximate selection algorithm to return the k most typical tuples to the user. Experimental results and analysis show that the proposed tuple typicality analysis method achieves high user satisfaction, and that the proposed top-k approximate selection method is accurate and efficient and is well suited to top-k typicality queries over large result sets.
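Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2016, Issue 8, pp. 1692-1696 (5 pages)
Funding: Supported by the National Science Foundation for Young Scientists of China (61003162) and the Growth Plan for Distinguished Young Scholars in Liaoning Province Higher Education Institutions (LJQ2013038)
Keywords: Web database; coupling relationship; Gaussian kernel function; typicality analysis; top-k approximate selection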
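
The online step described in the abstract (estimating each result tuple's typicality from pairwise semantic distances with a Gaussian kernel, then returning the k most typical tuples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `mismatch_distance` is a hypothetical stand-in for the coupling-based semantic distance, the `bandwidth` value is arbitrary, and the selection here is an exact `heapq.nlargest` call rather than the paper's approximate top-k selection algorithm, whose details the abstract does not give.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Row = Sequence[str]


def mismatch_distance(a: Row, b: Row) -> float:
    """Hypothetical placeholder distance: fraction of attributes that differ.

    The paper derives distance from attribute-value coupling relationships;
    that computation is not reproduced here.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def typicality_scores(
    rows: Sequence[Row],
    distance: Callable[[Row, Row], float] = mismatch_distance,
    bandwidth: float = 0.5,  # assumed kernel bandwidth, chosen arbitrarily
) -> List[float]:
    """Gaussian kernel density estimate of each row within the result set."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(rows)
    if n == 0:
        return []
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    scores = []
    for t in rows:
        # Sum the kernel contribution of every row (including t itself).
        total = sum(
            math.exp(-distance(t, u) ** 2 / (2.0 * bandwidth ** 2)) for u in rows
        )
        scores.append(norm * total)
    return scores


def top_k_typical(
    rows: Sequence[Row],
    k: int,
    distance: Callable[[Row, Row], float] = mismatch_distance,
    bandwidth: float = 0.5,
) -> List[Tuple[Row, float]]:
    """Exact top-k by typicality score (not the paper's approximate method)."""
    scores = typicality_scores(rows, distance, bandwidth)
    best = heapq.nlargest(k, range(len(rows)), key=lambda i: scores[i])
    return [(rows[i], scores[i]) for i in best]


# Made-up result set for illustration only.
cars = [
    ("Toyota", "Sedan", "Red"),
    ("Toyota", "Sedan", "Blue"),
    ("Toyota", "SUV", "Red"),
    ("BMW", "Coupe", "Black"),
]

if __name__ == "__main__":
    for row, score in top_k_typical(cars, k=2):
        print(row, round(score, 4))
```

Computing every pairwise distance costs quadratic time in the size of the result set, which is presumably the cost the approximate selection algorithm mentioned in the abstract is meant to reduce.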

