Similar Fragment Queries Based on Substitution Errors
Abstract: The key to deciphering an unknown language is finding similar sequences of letter fragments. This paper presents a new algorithm for finding such similar fragments. First, an index structure is built, and fragments are obtained by repeatedly partitioning the text at intervals. A similarity formula and a similarity matrix, both based on the Hamming distance, are then established to represent the similarity between two fragments. To reflect practical conditions, a similarity threshold formula is derived on the assumption that substitution errors occur when large amounts of text are recorded, and this formula is used to decide whether a candidate is one of the similar fragments being sought. Finally, the similar fragments of multiple texts and their corresponding positions are obtained. The algorithm is evaluated using average accuracy; analysis and experiments show that it achieves relatively high accuracy and search efficiency.
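The pipeline described in the abstract (partition into fragments, score pairs by Hamming distance, keep pairs above a threshold) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the abstract does not give the index structure, the similarity formula, or the threshold formula, so the sketch assumes fixed-length fragments taken at a regular step, a similarity of `1 - hamming / length`, and a caller-supplied `threshold`. All function and parameter names are hypothetical.

```python
def fragments(text, length, step=1):
    """Yield (position, fragment) pairs of fixed length, one every `step` characters."""
    for start in range(0, len(text) - length + 1, step):
        yield start, text[start:start + length]


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """Assumed similarity score: share of matching positions, in [0, 1]."""
    return 1.0 - hamming(a, b) / len(a)


def similar_fragments(text_a, text_b, length, threshold, step=1):
    """Return (pos_a, pos_b, frag_a, frag_b, score) for every pair scoring >= threshold.

    The threshold is an assumed free parameter; the paper derives its own
    threshold formula from the substitution-error model, which is not
    reproduced here.
    """
    hits = []
    for i, fa in fragments(text_a, length, step):
        for j, fb in fragments(text_b, length, step):
            score = similarity(fa, fb)
            if score >= threshold:
                hits.append((i, j, fa, fb, score))
    return hits
```

Calling `similar_fragments(text_a, text_b, length=4, threshold=0.75)` tolerates one substituted character per four-letter fragment. The brute-force pairwise comparison here is quadratic in the number of fragments; the index structure mentioned in the abstract presumably exists to avoid that cost.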
Source: Computer Science and Application (《计算机科学与应用》), 2020, No. 5, pp. 971-977 (7 pages)