Abstract
The key to deciphering an unknown language is finding similar sequences of letter fragments. This paper proposes a new algorithm for finding such similar fragments. First, an index structure is built, and the text is partitioned at intervals multiple times to obtain fragments. Next, a similarity formula and a similarity matrix based on the Hamming distance are established to represent the similarity between two fragments. To reflect practical conditions, a similarity threshold formula is derived on the assumption that substitution errors occur when large volumes of text are recorded, and this formula is used to decide whether a candidate is one of the similar fragments being sought. Finally, the similar fragments across multiple texts are obtained together with their positions. The algorithm is evaluated using average accuracy; analysis and experiments show that it achieves high accuracy and search efficiency.
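The abstract does not give the similarity formula or the threshold formula, so the following is only a minimal sketch of the general idea: compare equal-length fragments by Hamming distance and keep the pairs whose similarity reaches a threshold. The normalized similarity `1 - d / n`, the sliding-window partitioning, the fixed `threshold` argument, and all function names are assumptions made for illustration, not the paper's definitions.

```python
from typing import List, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length fragments differ."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 minus the normalized Hamming distance."""
    return 1.0 - hamming_distance(a, b) / len(a)


def fragments(text: str, length: int) -> List[Tuple[int, str]]:
    """Assumed partitioning: every window of `length` letters with its start position."""
    return [(i, text[i:i + length]) for i in range(len(text) - length + 1)]


def find_similar(text_a: str, text_b: str, length: int,
                 threshold: float) -> List[Tuple[int, int, float]]:
    """Return (position in a, position in b, similarity) for pairs at or above the threshold."""
    matches = []
    for i, frag_a in fragments(text_a, length):
        for j, frag_b in fragments(text_b, length):
            score = similarity(frag_a, frag_b)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


if __name__ == "__main__":
    # One substituted letter ("c" recorded as "x") inside an otherwise shared fragment.
    for i, j, score in find_similar("xxabcdeyy", "zzabxdezz", length=5, threshold=0.75):
        print(i, j, round(score, 2))
```

This brute-force comparison examines every pair of fragments; the index structure mentioned in the abstract is presumably what avoids that cost, but its design is not described here and is not reproduced in the sketch.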
Source
Computer Science and Application (《计算机科学与应用》)
2020, No. 5, pp. 971-977 (7 pages)