Abstract
To address shortcomings of typical existing word-based (lexical) term similarity algorithms, this paper proposes an improved term similarity algorithm that applies WordNet and edit-distance computation to the matching of term words and assigns feature weights to modifiers according to their position. Compared with existing algorithms, the new algorithm is improved in three respects. First, WordNet's synonym and near-synonym lookup is introduced into head-word matching, enabling semantic matching between head words. Second, direct (exact) matching of term words is replaced by fuzzy matching based on edit distance. Third, the computation takes into account how the distance between a modifier and the head word affects the weight assigned to that modifier. The paper gives concrete implementation steps for the new algorithm and compares it with existing typical algorithms on experimental data from the genetic engineering domain. The experiments show that each improvement, tested on its own, performs better than or at least as well as the Nenadic algorithm, and that the combined method integrating all three improvements yields a clear gain in performance.
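The three ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no formulas, so the similarity threshold, the linear position weights, the 50/50 split between head and modifiers, and all function names below are assumptions made for illustration. The `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and terms are assumed to be English noun phrases whose last word is the head.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_match(w1: str, w2: str, threshold: float = 0.8) -> bool:
    """Match two words when their normalized edit similarity reaches the threshold.

    The threshold value is an assumption, not taken from the paper.
    """
    w1, w2 = w1.lower(), w2.lower()
    longest = max(len(w1), len(w2))
    if longest == 0:
        return True
    return 1.0 - edit_distance(w1, w2) / longest >= threshold


# Hypothetical stand-in for a WordNet synonym lookup (illustration only).
SYNONYMS = {frozenset({"gene", "cistron"})}


def heads_match(h1: str, h2: str) -> bool:
    """Head words match if they are fuzzily equal or listed as synonyms."""
    return fuzzy_match(h1, h2) or frozenset({h1.lower(), h2.lower()}) in SYNONYMS


def modifier_weights(n: int) -> list[float]:
    """Weights for n modifiers in left-to-right order, summing to 1.

    Assumed scheme: the weight grows linearly toward the head, so the
    modifier adjacent to the head (the last one) weighs the most.
    """
    total = n * (n + 1) / 2
    return [(i + 1) / total for i in range(n)]


def coverage(mods_a: list[str], mods_b: list[str]) -> float:
    """Weighted share of mods_a that fuzzily match some modifier in mods_b."""
    if not mods_a:
        return 1.0 if not mods_b else 0.0
    weights = modifier_weights(len(mods_a))
    return sum(w for m, w in zip(mods_a, weights)
               if any(fuzzy_match(m, other) for other in mods_b))


def term_similarity(t1: str, t2: str, head_weight: float = 0.5) -> float:
    """Combine head matching and position-weighted modifier matching."""
    words1, words2 = t1.split(), t2.split()
    if not words1 or not words2:
        return 0.0
    head_score = 1.0 if heads_match(words1[-1], words2[-1]) else 0.0
    mods1, mods2 = words1[:-1], words2[:-1]
    mod_score = (coverage(mods1, mods2) + coverage(mods2, mods1)) / 2
    return head_weight * head_score + (1 - head_weight) * mod_score


if __name__ == "__main__":
    for pair in [("human insulin gene", "insulin gene"),
                 ("insulin human gene", "insulin gene")]:
        print(pair, round(term_similarity(*pair), 3))
```

Under these assumed weights, "human insulin gene" scores higher against "insulin gene" than "insulin human gene" does, because the shared modifier sits next to the head in the first term. A real implementation would replace `SYNONYMS` with an actual WordNet query and calibrate the weights against evaluation data.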
Source
《情报学报》
CSSCI
PKU Core Journal (北大核心)
2011, No. 11, pp. 1145-1151 (7 pages)
Journal of the China Society for Scientific and Technical Information
Funding
Supported by the Humanities and Social Sciences Research Project of the Ministry of Education (09YJC870031).
Keywords
term similarity
lexical similarity
similarity computation