Abstract
Natural language processing (NLP) aims to enable computers to better understand human language. However, sentences, phrases, and words in natural language are often polysemous or ambiguous, so computers cannot convert them directly into a binary encoding they can interpret. This is currently the biggest problem in the field of NLP. This paper combines a part-of-speech tagging model based on the Viterbi algorithm, the CBOW language model, and the K-Means clustering algorithm to construct a word-vector-based combined disambiguation model for polysemous words (VCK-Vector). The model is evaluated through part-of-speech distribution comparison, a semantic relatedness task, and clustering-effect analysis; finally, its output is compared with Baidu AI word vectors. The results show that the VCK-Vector model is feasible in practical application scenarios.
Authors
戴洪涛
侯开虎
周洲
肖灵云
DAI Hong-tao; HOU Kai-hu; ZHOU Zhou; XIAO Ling-yun (School of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, Yunnan Province)
Source
《软件》
2020, Issue 2, pp. 134-140 (7 pages)
Software