
Chinese Word Sense Disambiguation Based on Word-translation and Part-of-speech (cited by: 2)
Abstract: To address lexical ambiguity in Chinese, a convolutional neural network (CNN) is used to determine the true sense of an ambiguous word from the word form, part of speech, and translation of its left and right adjacent words. A disambiguation window containing two adjacent lexical units is selected around the ambiguous word, and the word form, part of speech, and translation of each unit are extracted as disambiguation features. A word sense disambiguation (WSD) classifier is then built from these features with a CNN. The CNN parameters are optimized on the training corpus of SemEval-2007: Task #5 and the semantic annotation corpus of Harbin Institute of Technology, and the classifier is evaluated on the test corpus of SemEval-2007: Task #5. Experimental results show that, compared with a Bayes model and a BP neural network, the proposed method improves average disambiguation accuracy by 14.94% and 6.9%, respectively.
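The feature-extraction step described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the `Token` type, the `PAD` filler for sentence boundaries, the `extract_features` helper, and the example sentence with its tags and translations are all assumptions introduced here. The CNN classifier itself is not shown.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str         # surface word form
    pos: str          # part-of-speech tag
    translation: str  # English translation of the word


# Hypothetical filler used when the ambiguous word sits at a sentence edge.
PAD = Token("<PAD>", "<PAD>", "<PAD>")


def extract_features(tokens: List[Token], target: int) -> List[str]:
    """Collect word form, POS, and translation of the left and right neighbours.

    Returns six features in the order:
    [left word, left POS, left translation,
     right word, right POS, right translation].
    """
    if not 0 <= target < len(tokens):
        raise IndexError("target index out of range")
    left = tokens[target - 1] if target > 0 else PAD
    right = tokens[target + 1] if target + 1 < len(tokens) else PAD
    return [left.word, left.pos, left.translation,
            right.word, right.pos, right.translation]


# Illustrative sentence (made up for this sketch); the ambiguous word is "打".
sentence = [
    Token("他", "r", "he"),
    Token("打", "v", "play"),
    Token("篮球", "n", "basketball"),
]
features = extract_features(sentence, target=1)
print(features)  # ['他', 'r', 'he', '篮球', 'n', 'basketball']
```

In a full pipeline, each of the six features would presumably be mapped to a numeric vector before being passed to the CNN; how the paper encodes them is not stated in the abstract.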
Authors: ZHANG Chun-xiang (张春祥); ZHAO Ling-yun (赵凌云); GAO Xue-yao (高雪瑶) (School of Software and Microelectronics, Harbin University of Science and Technology, Harbin 150080, China; School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)
Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), indexed in CAS and the Peking University Core Journals list (北大核心), 2020, No. 3, pp. 131-136 (6 pages)
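Funding: National Natural Science Foundation of China (61502124, 60903082); China Postdoctoral Science Foundation (2014M560249); Natural Science Foundation of Heilongjiang Province (F2015041, F201420); Special Funds for Basic Scientific Research of Heilongjiang Provincial Universities (LGYC2018JC014).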
Keywords: vocabulary ambiguity; convolutional neural network; lexical unit; disambiguation feature; word sense disambiguation
