Abstract
Word similarity is widely used across many areas of natural language processing. However, similarity is usually computed between words rather than between word senses. To address this, we propose a classification algorithm for similar words. The algorithm first uses the PMImax tool to compute the similar words of a target word and then, taking WordNet senses as the reference, applies an improved Lesk algorithm to automatically classify those similar words by sense, so that each class of similar words is similar only to its corresponding sense. Experimental results show that the classification accuracy of the algorithm reaches up to 84.27%.
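The sense-classification step can be illustrated with a minimal sketch. It assumes a plain gloss-overlap Lesk score, not the improved Lesk variant of the paper, whose details the abstract does not give. The PMImax step is replaced by a hand-supplied list of similar words, and the sense inventory, sense labels, and glosses are small hypothetical stand-ins for WordNet.

```python
from collections import defaultdict

# Hypothetical toy sense inventory standing in for WordNet senses and glosses.
SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    }
}

# Hypothetical glosses for the candidate similar words.
GLOSSES = {
    "shore": "the land along the edge of a body of water",
    "lender": "someone who lends money or gives credit in business",
    "riverside": "the land beside a river",
}

STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "as", "such", "that", "who", "to"}


def tokens(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def classify_similar_words(target, similar_words):
    """Assign each similar word to the target sense whose gloss overlaps most.

    Words with zero overlap against every sense are left unassigned.
    """
    groups = defaultdict(list)
    for word in similar_words:
        word_tokens = tokens(GLOSSES[word])
        best_sense, best_overlap = None, 0
        for sense_id, gloss in SENSES[target].items():
            overlap = len(word_tokens & tokens(gloss))
            if overlap > best_overlap:
                best_sense, best_overlap = sense_id, overlap
        if best_sense is not None:
            groups[best_sense].append(word)
    return dict(groups)


result = classify_similar_words("bank", ["shore", "lender", "riverside"])
print(result)
```

In this toy setting, "shore" and "riverside" fall under the river-bank sense and "lender" under the financial sense. A real pipeline would draw the similar words from a distributional similarity tool and the glosses from WordNet itself.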
Source
《计算机应用与软件》
CSCD
2015, No. 3, pp. 258-260, 288 (4 pages in total)
Computer Applications and Software
Funding
Youth Project of the Humanities and Social Sciences Research Fund, Ministry of Education of China (07JC740009)