FrepJoin: an efficient partition-based algorithm for edit similarity join
Authors: Ji-zhou LUO, Sheng-fei SHI, Hong-zhi WANG, Jian-zhong LI. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2017, Issue 10, pp. 1499-1510 (12 pages)
String similarity join (SSJ) is essential for many applications where near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. Existing algorithms usually adopt the filter-and-refine framework. They cannot capture the dissimilarity between subsets of strings, and they do not fully exploit statistics such as character frequencies. We develop a partition-based algorithm that uses such statistics. Frequency vectors are used to partition the datasets into data chunks such that the dissimilarity between chunks is easy to detect. A novel algorithm is designed to accelerate SSJ over the partitioned data. A new filter leverages the statistics to avoid computing edit distances for a noticeable proportion of the candidate pairs that survive the existing filters. Our algorithm notably outperforms alternative methods on real datasets.
Keywords: String similarity join; Edit distance; Filter and refine; Data partition; Combined frequency vectors
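The abstract does not spell out the filter itself, so the following is only a minimal sketch of the general idea behind frequency-based filtering in a filter-and-refine join, not the FrepJoin algorithm. It uses the standard character-frequency lower bound on edit distance: if one string has more surplus characters than the threshold allows, the pair can be discarded without running the dynamic program. The function names (`frequency_lower_bound`, `edit_distance`, `similarity_join`) and the nested-loop join are illustrative assumptions; the paper's partitioning and combined frequency vectors are not reproduced here.

```python
from collections import Counter


def frequency_lower_bound(s, t):
    """Lower bound on edit distance from character counts alone.

    Each edit operation can remove at most one surplus character from
    either side, so the larger of the two surpluses bounds the distance
    from below. (Generic bound, assumed here for illustration.)
    """
    fs, ft = Counter(s), Counter(t)
    surplus_s = sum((fs - ft).values())  # characters s has in excess of t
    surplus_t = sum((ft - fs).values())  # characters t has in excess of s
    return max(surplus_s, surplus_t)


def edit_distance(s, t):
    """Classic Levenshtein distance with a rolling one-row table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_join(strings, tau):
    """Naive self-join: filter by frequency bound, then verify."""
    results = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if frequency_lower_bound(s, t) > tau:
                continue  # filtered out: cannot be within tau edits
            if edit_distance(s, t) <= tau:  # refine step
                results.append((s, t))
    return results
```

Because the bound never exceeds the true edit distance, the filter cannot discard a pair that belongs in the result; it only saves verification work.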