4 articles found
PMS-Sorting:A New Sorting Algorithm Based on Similarity
1
Authors: Hongbin Wang, Lianke Zhou, Guodong Zhao, Nianbin Wang, Jianguo Sun, Yue Zheng, Lei Chen. Computers, Materials & Continua (SCIE, EI), 2019, Issue 4, pp. 229-237 (9 pages)
The Borda sorting algorithm is an improvement on the weighted position sorting algorithm. It is mainly suited to search results with high duplication and performs poorly on independent search results. Its relative score is computed by a linear regressive rule, but position relationships alone cannot fully represent changes in correlation. To address this drawback, this paper proposes a new sorting algorithm named PMS-Sorting. First, the position scores of the returned results are standardized; then the similarity between the query string and the returned results is incorporated into the algorithm, and the similarity calculation method is also improved. Experiments show that the improved algorithm is superior to the traditional sorting algorithm.
Keywords: meta search engine; result sorting; query similarity; Borda sorting algorithm; position relationship
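For orientation, the classic Borda count that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of position-based score fusion only, not the PMS-Sorting algorithm from the paper; the function name `borda_merge`, the scoring convention (a result at 0-based position p earns n - p points, unranked results earn 0) and the sample data are assumptions made for this example.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge ranked result lists with a classic Borda count (illustrative sketch)."""
    # Candidate pool: every result returned by at least one engine.
    pool = {doc for ranking in rankings for doc in ranking}
    n = len(pool)
    scores = defaultdict(float)
    for ranking in rankings:
        # A result at 0-based position p earns n - p points, so the score
        # decreases linearly with position. Results an engine did not
        # return earn nothing from that engine in this simplified variant.
        for p, doc in enumerate(ranking):
            scores[doc] += n - p
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(pool, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    engines = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(borda_merge(engines))
```

Results returned by several engines accumulate points from each list, which is consistent with the abstract's remark that the method suits highly duplicated result sets better than independent ones.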
ETI: an efficient index for set similarity queries (cited by: 2)
2
Authors: Lianyin JIA, Jianqing XI, Mengjuan LI, Yong LIU, Decheng MIAO. Frontiers of Computer Science (SCIE, EI, CSCD), 2012, Issue 6, pp. 700-712 (13 pages)
Set queries are an important topic and have attracted a lot of attention. Earlier research mainly concentrated on set containment queries. In this paper we focus on the T-Overlap query, which is the foundation of the set similarity query. To address this issue, unlike traditional algorithms that are based on an inverted index, we design a new paradigm based on the prefix tree (trie), called the expanded trie index (ETI), which expands the trie node structure by adding some new properties. Based on ETI, we convert the T-Overlap problem to finding query nodes whose query depth equals T, and propose a new algorithm called T-Similarity to solve T-Overlap efficiently. We then use a three-step framework to extend T-Overlap to other similarity predicates. Extensive experiments compare T-Similarity with other inverted-index-based algorithms in terms of query cardinality, overlap threshold, dataset size, the number of distinct elements, and so on. Results show that T-Similarity outperforms the state-of-the-art algorithms in many aspects.
Keywords: expanded trie index (ETI); set similarity query; T-Overlap; T-similarity algorithm; T-similarityExact algorithm
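To make the T-Overlap query concrete, the sketch below answers it with the traditional inverted-index counting approach that the abstract contrasts ETI against. It is not the ETI structure or the T-Similarity algorithm; the function names `build_inverted_index` and `t_overlap` and the sample sets are assumptions for illustration, and the threshold is assumed to satisfy T >= 1.

```python
from collections import Counter, defaultdict


def build_inverted_index(sets):
    """Map each element to the ids of the sets that contain it."""
    index = defaultdict(list)
    for set_id, elements in enumerate(sets):
        for e in set(elements):
            index[e].append(set_id)
    return index


def t_overlap(index, query, t):
    """Return ids of sets sharing at least t elements with the query (t >= 1)."""
    counts = Counter()
    for e in set(query):
        # Each posting-list hit means one shared element with that set.
        counts.update(index.get(e, ()))
    return sorted(set_id for set_id, c in counts.items() if c >= t)


if __name__ == "__main__":
    data = [{1, 2, 3}, {2, 3, 4}, {5, 6}, {1, 3, 4, 5}]
    idx = build_inverted_index(data)
    print(t_overlap(idx, {1, 3, 4}, 2))
```

Every posting list of every query element is scanned in this baseline, so its cost grows with the lengths of those lists.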
Most similar maximal clique query on large graphs
3
Authors: Yun PENG, Yitong XU, Huawei ZHAO, Zhizheng ZHOU, Huimin HAN. Frontiers of Computer Science (SCIE, EI, CSCD), 2020, Issue 3, pp. 113-128 (16 pages)
This paper studies the most similar maximal clique query (MSMCQ). Given a graph G and a set of nodes Q, MSMCQ finds the maximal clique (MC) of G that has the largest similarity with Q. MSMCQ has many real applications, including the advertising industry, public security, task crowdsourcing, and social networks. MSMCQ can be studied as a special case of the general set similarity query (SSQ). However, the MCs of G have several properties that distinguish them from general sets. Based on these properties, we propose a novel index, namely MCIndex. MCIndex significantly outperforms the state-of-the-art SSQ method in terms of the number of candidates and the query time. Specifically, we first construct an inverted index Ⅰ for all the MCs of G. Since the MCs in a posting list often overlap heavily, MCIndex selects some pivots to cluster the MCs within a small radius. Given a query Q, we compute the distance from the pivots to Q. Clusters whose pivots guarantee that they cannot contain the answer are pruned by our distance-based pruning rule. Since it is NP-hard to construct a minimum MCIndex, we propose to construct a minimal MCIndex on Ⅰ(v) with an approximation ratio of 1+ln|Ⅰ(v)|. Since the MCs have properties inherent in the graph structure, we further propose an S Index within each cluster of an MCIndex, together with a structure-based pruning rule. The S Index can significantly reduce the number of candidates. Since the sizes of the intersections between Q and many MCs need to be computed during query evaluation, we also propose a binary representation of MCs to make the intersection size computation more efficient. Our extensive experiments on several real-world datasets confirm the effectiveness and efficiency of the proposed techniques.
Keywords: most similar maximal clique; similarity query; graph data
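The idea of using a binary representation to speed up intersection-size computation can be illustrated with plain integer bitmasks. The abstract does not specify the paper's actual encoding, so this is only a generic sketch; the helper names `to_bitmask` and `intersection_size` and the node-to-bit mapping `node_index` are assumptions for this example.

```python
def to_bitmask(nodes, node_index):
    """Encode a node set as an integer with one bit per node."""
    mask = 0
    for v in nodes:
        mask |= 1 << node_index[v]
    return mask


def intersection_size(mask_a, mask_b):
    """Count shared nodes: bitwise AND, then count the set bits."""
    return bin(mask_a & mask_b).count("1")


if __name__ == "__main__":
    # Hypothetical mapping from node labels to bit positions.
    node_index = {v: i for i, v in enumerate(["a", "b", "c", "d", "e"])}
    query = to_bitmask({"a", "c", "e"}, node_index)
    clique = to_bitmask({"c", "d", "e"}, node_index)
    print(intersection_size(query, clique))
```

Once each clique is encoded, comparing it with a query costs one AND plus a bit count instead of a set intersection over node lists.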
MC-Tree: Dynamic Index Structure for Partially Clustered Multi-Dimensional Database
4
Authors: 靳晓明, 王丽坤, 陆玉昌, 石纯一. Tsinghua Science and Technology (SCIE, EI, CAS), 2003, Issue 2, pp. 174-180 (7 pages)
An index structure that enables efficient similarity queries in high-dimensional space is crucial for many applications. This paper discusses the indexing problem for datasets composed of partially clustered data, which arise in many applications. Current index methods are inefficient with partially clustered datasets. The dynamic and adaptive index structure presented here, called a multi-cluster tree (MC-tree), consists of a set of height-balanced trees for indexing. This index structure improves the querying efficiency in three ways: 1) Most bounding regions achieve uniform distributions, which results in fewer splits and less overlap compared with a single indexing tree. 2) The clusters in the dataset are dynamically detected when the index is updated. 3) The query process does not involve a sequential scan. The MC-tree was shown to be better than hierarchical and cluster-based indexes for the partially clustered datasets.
Keywords: MC-tree; multi-dimensional index; similarity query; partially clustered dataset