Abstract
As a foundational task in natural language understanding, measuring lexical semantic similarity has long been a research focus. Semantic similarity measurement is an intermediate task: it forms an essential intermediate layer in most natural language processing tasks and is widely applied in areas such as word sense disambiguation, information retrieval, and machine translation. This paper proposes a new method for computing lexical similarity based on Baidu Baike (Baidu Encyclopedia) entry information. For a pair of words, the method computes a separate similarity value for each of four parts of their entries (the encyclopedia summary card, the entry body text, the open categories, and the related entries) and combines these values into an overall similarity between the two words. Experiments on the Words-240 dataset show that the proposed method computes lexical similarity with higher accuracy.
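The abstract describes the method only at a high level: each of the four entry parts yields its own similarity score, and those scores are combined into one score for the word pair. The Python sketch below illustrates that structure under stated assumptions; it is not the authors' implementation. The `Entry` class, the character-frequency cosine similarity for the two text parts, the Jaccard coefficient for the category and related-entry sets, and the equal weights are all hypothetical choices, since the abstract specifies none of them.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical container for the four parts of a Baidu Baike entry."""
    card: str                                            # summary card text
    body: str                                            # main entry text
    categories: set[str] = field(default_factory=set)    # open categories
    related: set[str] = field(default_factory=set)       # related entry titles


def cosine(a: str, b: str) -> float:
    """Assumed text measure: cosine similarity of character-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0  # empty text counts as no similarity


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed set measure: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Assumption: equal weights; the abstract does not state the combination rule.
WEIGHTS = (0.25, 0.25, 0.25, 0.25)


def word_similarity(x: Entry, y: Entry, weights=WEIGHTS) -> float:
    """Weighted sum of the four per-part similarities (hypothetical scheme)."""
    parts = (
        cosine(x.card, y.card),
        cosine(x.body, y.body),
        jaccard(x.categories, y.categories),
        jaccard(x.related, y.related),
    )
    return sum(w * s for w, s in zip(weights, parts))
```

Character-level vectors are used here only because they need no Chinese word segmenter; a real system would more plausibly segment the text first. Likewise, the four weights would normally be tuned against human similarity ratings rather than fixed at equal values.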
Authors
ZHONG Yuan, WANG Fang, HUANG Shucheng
(School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003)
Source
Computer & Digital Engineering
2020, No. 7, pp. 1580-1584, 1736 (6 pages in total)