A Term Similarity Algorithm Based on Soft Word Matching and Differentiated Modifier Weights (Cited by: 2)
Abstract: To address the shortcomings of typical existing word-based (lexical) term similarity algorithms, this paper proposes an improved algorithm that applies WordNet and edit distance computation to the word-matching process and assigns feature weights to term modifiers according to their position. Compared with existing algorithms, the new algorithm improves on three points. First, WordNet synonym and near-synonym lookup is introduced when matching head words, so that head words are matched semantically. Second, direct matching of term words is replaced by fuzzy matching based on edit distance. Third, the calculation takes into account how the distance between a modifier and the head word affects the weight assigned to that modifier. The paper gives concrete implementation steps for the new algorithm and compares the improved algorithm with existing typical algorithms on experimental data from the genetic engineering domain. The experiments show that each improvement, tested on its own, performs better than or at least as well as the Nenadic algorithm, and that the combined method using all three improvements yields a clear gain in performance.
Authors: Xu Jian (徐健), Zhang Zhixiong (张智雄)
Source: 《情报学报》 (Journal of the China Society for Scientific and Technical Information), CSSCI, Peking University Core Journal, 2011, No. 11, pp. 1145-1151 (7 pages)
Funding: Supported by the Humanities and Social Sciences Research Project of the Ministry of Education (09YJC870031).
Keywords: term similarity; lexical similarity; similarity computation
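
The three improvements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the matching threshold, the head weight, the inverse-distance weighting of modifiers, and every function name here are assumptions chosen for illustration. `TOY_SYNONYMS` is a hypothetical stand-in for a real WordNet lookup, and the head word is assumed to be the last word of an English term.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for WordNet synonym lookup (assumption, not real WordNet data).
TOY_SYNONYMS: Dict[str, Set[str]] = {"gene": {"cistron"}, "cistron": {"gene"}}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical words."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def soft_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy match: words count as equal above an assumed similarity threshold."""
    return word_similarity(a, b) >= threshold


def head_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Head words match if they are fuzzily equal or listed as synonyms."""
    a, b = a.lower(), b.lower()
    return (soft_match(a, b, threshold)
            or b in TOY_SYNONYMS.get(a, set())
            or a in TOY_SYNONYMS.get(b, set()))


def modifier_weights(n: int) -> List[float]:
    """Weights for n modifiers (left to right), normalised to sum to 1.

    Assumed scheme: weight is inversely proportional to the distance
    from the head, so the modifier next to the head weighs the most.
    """
    raw = [1.0 / (n - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def term_similarity(t1: str, t2: str, head_weight: float = 0.5,
                    threshold: float = 0.8) -> float:
    """Combine head matching with position-weighted modifier matching."""
    w1, w2 = t1.split(), t2.split()
    if not w1 or not w2:
        return 0.0
    head_score = 1.0 if head_match(w1[-1], w2[-1], threshold) else 0.0
    m1, m2 = w1[:-1], w2[:-1]
    if not m1 and not m2:
        return head_score

    def coverage(src: List[str], dst: List[str]) -> float:
        # Sum the weights of the src modifiers that have a fuzzy match in dst.
        weights = modifier_weights(len(src))
        return sum(w for w, m in zip(weights, src)
                   if any(soft_match(m, d, threshold) for d in dst))

    mod_score = (coverage(m1, m2) + coverage(m2, m1)) / 2
    return head_weight * head_score + (1 - head_weight) * mod_score
```

Under these assumptions, a mismatch in the modifier next to the head lowers the score more than a mismatch in a modifier farther away, which is the behaviour the third improvement describes. A real system would replace `TOY_SYNONYMS` with an actual WordNet query and tune the threshold and weights on evaluation data.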