Abstract
This paper discusses two commonly used similarity measurement techniques: attribute counting and structure metrics. A string matching algorithm compares the token strings derived from the files, and the comparison result is converted into a numerical value expressing how closely they match. This value serves as the measure of file similarity: the larger the value, the more similar the files, and the more likely it is that content was copied during data digitization. Validation results show that the experimental system detects most of the similar content in the entered data.
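The abstract does not name the string matching algorithm used, so the following is only a minimal sketch of the general idea, assuming a longest-common-subsequence (LCS) comparison of token strings as a stand-in. The names `tokenize`, `lcs_length`, and `similarity`, and the normalization to the range [0, 1], are illustrative assumptions and are not taken from the paper.

```python
import re


def tokenize(text):
    """Split text into lowercase word/number tokens (hypothetical tokenizer)."""
    return re.findall(r"\w+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(text_a, text_b):
    """Return a value in [0, 1]; a higher value means more similar files.

    Assumed normalization: 2 * LCS / (len(a) + len(b)).
    """
    a, b = tokenize(text_a), tokenize(text_b)
    if not a and not b:
        return 1.0  # two empty inputs are treated as identical
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


if __name__ == "__main__":
    print(similarity("temp 12 wind 3", "temp 12 wind 5"))
```

Under this assumed measure, two records that share three of four tokens in the same order score 0.75, and identical records score 1.0, which matches the abstract's reading that a larger value indicates a higher likelihood of copying.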
Source
Electronic Design Engineering, 2013, No. 3, pp. 20-23 (4 pages)
Keywords
meteorological data
similarity
measurement
algorithm