Abstract
To address lexical ambiguity in Chinese, a convolutional neural network (CNN) is used to determine the true sense of an ambiguous word from the word form, part of speech, and translation of its left and right adjacent words. A disambiguation window containing two adjacent lexical units is selected around the ambiguous word, and the word form, part of speech, and translation of these units are extracted as disambiguation features. Based on these features, a word sense disambiguation (WSD) classifier is built with the CNN. The CNN parameters are optimized on the training corpus of SemEval-2007: Task #5 and the semantic annotation corpus of Harbin Institute of Technology. The classifier is evaluated on the test corpus of SemEval-2007: Task #5. Experimental results show that, compared with a Bayes model and a BP neural network, the proposed method improves average disambiguation accuracy by 14.94% and 6.9%, respectively.
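The feature-extraction step described in the abstract could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the `LexicalUnit` layout, the `<PAD>` placeholder for sentence boundaries, the `extract_features` helper, and the toy sentence are all assumptions introduced here. The sketch collects the word form, part of speech, and translation of the left and right neighbours of the ambiguous word, which would then be encoded and passed to a CNN classifier.

```python
from typing import List, NamedTuple


class LexicalUnit(NamedTuple):
    """One segmented token with its annotations (hypothetical layout)."""
    word: str         # word form
    pos: str          # part-of-speech tag
    translation: str  # translation of the word


# Placeholder used when the ambiguous word sits at a sentence boundary (assumption).
PAD = LexicalUnit("<PAD>", "<PAD>", "<PAD>")


def extract_features(sentence: List[LexicalUnit], target_index: int) -> List[str]:
    """Return the six disambiguation features taken from the two-unit window:
    word form, part of speech, and translation of the left neighbour,
    followed by the same three values for the right neighbour."""
    if not 0 <= target_index < len(sentence):
        raise IndexError("target_index is outside the sentence")
    left = sentence[target_index - 1] if target_index > 0 else PAD
    right = sentence[target_index + 1] if target_index + 1 < len(sentence) else PAD
    return [left.word, left.pos, left.translation,
            right.word, right.pos, right.translation]


# Toy sentence (illustrative only); the ambiguous word is at index 1.
sentence = [
    LexicalUnit("他", "r", "he"),
    LexicalUnit("打", "v", "play"),
    LexicalUnit("篮球", "n", "basketball"),
]
features = extract_features(sentence, 1)
print(features)
```

In a full pipeline, each of the six symbolic features would still need to be mapped to a numeric vector before being fed to the network; how the paper performs that encoding is not stated in the abstract.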
Authors
张春祥
赵凌云
高雪瑶
ZHANG Chun-xiang; ZHAO Ling-yun; GAO Xue-yao (School of Software and Microelectronics, Harbin University of Science and Technology, Harbin 150080, China; School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)
Source
Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》)
CAS
PKU Core Journals (北大核心)
2020, No. 3, pp. 131-136 (6 pages)
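Funding
National Natural Science Foundation of China (61502124, 60903082)
China Postdoctoral Science Foundation (2014M560249)
Natural Science Foundation of Heilongjiang Province (F2015041, F201420)
Fundamental Research Funds for Universities in Heilongjiang Province (LGYC2018JC014)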
Keywords
vocabulary ambiguity
convolution neural network
lexical unit
disambiguation feature
word sense disambiguation