Semi-Supervised Method for Chinese Word Sense Disambiguation (一种半监督的汉语词义消歧方法). Cited by: 7
Abstract: To address polysemy in natural language processing (NLP), this paper proposes a semi-supervised word sense disambiguation (WSD) method that combines several kinds of linguistic knowledge with more than one WSD model. First, a Bayes word sense classifier is built using the word forms, parts of speech, and translations of the word units immediately to the left and right of the ambiguous word as disambiguation features, and a maximum entropy (ME) word sense classifier is built using the word forms and parts of speech of the same units. Second, the Co-Training algorithm is applied together with a large unannotated corpus to optimize the WSD models. Third, in the optimization experiments, the training corpus of SemEval-2007: Task #5 and an unannotated corpus from Harbin Institute of Technology are used to optimize the Bayes classifier and the ME classifier. Finally, the optimized WSD model is tested. The results show that, compared with a WSD method based on support vector machines (SVM), the proposed method improves disambiguation accuracy by 0.9%.
Authors: ZHANG Chunxiang (张春祥); XU Zhifeng (徐志峰); GAO Xueyao (高雪瑶) (School of Software and Microelectronics, Harbin University of Science and Technology, Harbin 150080, China; School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)
Source: Journal of Southwest Jiaotong University (《西南交通大学学报》; indexed in EI, CSCD, and the Peking University core journal list), 2019, No. 2, pp. 408-414 (7 pages)
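Funding: National Natural Science Foundation of China (61502124, 60903082); China Postdoctoral Science Foundation (2014M560249); Natural Science Foundation of Heilongjiang Province (F201420, F2015041)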
Keywords: natural language processing; word sense disambiguation; maximum entropy; Bayesian classifier
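
The co-training step summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the selection rule, so the confidence threshold, the number of samples added per round, and all names (`ctx`, `view_bayes`, `view_me`, `co_train`) are assumptions made for illustration. To stay within the standard library, a second naive Bayes model stands in for the maximum entropy classifier; only the two feature views (word form, part of speech, and translation for one classifier; word form and part of speech for the other) follow the abstract. The toy contexts for the ambiguous word 苹果 are invented.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over string features with add-one smoothing."""

    def fit(self, samples, labels):
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for feats, y in zip(samples, labels):
            self.counts[y].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        total = sum(self.prior.values())
        log_scores = {}
        for y in self.prior:
            # +1 reserves probability mass for features never seen in training.
            denom = sum(self.counts[y].values()) + len(self.vocab) + 1
            score = math.log(self.prior[y] / total)
            for f in feats:
                score += math.log((self.counts[y][f] + 1) / denom)
            log_scores[y] = score
        peak = max(log_scores.values())
        unnorm = {y: math.exp(s - peak) for y, s in log_scores.items()}
        z = sum(unnorm.values())
        return {y: v / z for y, v in unnorm.items()}


def ctx(lw, lp, lt, rw, rp, rt):
    """Left/right neighbour: word form (w), part of speech (p), translation (t)."""
    return {"lw": lw, "lp": lp, "lt": lt, "rw": rw, "rp": rp, "rt": rt}


def view_bayes(s):  # word form + part of speech + translation
    return [f"{k}={s[k]}" for k in ("lw", "lp", "lt", "rw", "rp", "rt")]


def view_me(s):  # word form + part of speech only
    return [f"{k}={s[k]}" for k in ("lw", "lp", "rw", "rp")]


def train_pair(labeled, view_a, view_b):
    senses = [y for _, y in labeled]
    clf_a = NaiveBayes().fit([view_a(s) for s, _ in labeled], senses)
    clf_b = NaiveBayes().fit([view_b(s) for s, _ in labeled], senses)
    return clf_a, clf_b


def co_train(labeled, unlabeled, view_a, view_b,
             rounds=3, per_round=2, threshold=0.6):
    """Each round, both classifiers label their most confident pool samples."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        clf_a, clf_b = train_pair(labeled, view_a, view_b)
        picked = {}
        for clf, view in ((clf_a, view_a), (clf_b, view_b)):
            scored = []
            for i, s in enumerate(pool):
                proba = clf.predict_proba(view(s))
                sense = max(proba, key=proba.get)
                # Rounding makes ties break on pool order, not float noise.
                scored.append((-round(proba[sense], 6), i, sense))
            for neg_p, i, sense in sorted(scored)[:per_round]:
                if -neg_p >= threshold:
                    picked.setdefault(i, sense)
        if not picked:
            break
        labeled += [(pool[i], sense) for i, sense in picked.items()]
        pool = [s for i, s in enumerate(pool) if i not in picked]
    # Retrain once more so the returned models see every added sample.
    return train_pair(labeled, view_a, view_b) + (labeled,)


# Invented seed and pool data for the ambiguous word 苹果 (fruit vs. company).
seed = [(ctx("吃", "v", "eat", "汁", "n", "juice"), "fruit"),
        (ctx("买", "v", "buy", "手机", "n", "phone"), "company")]
pool = [ctx("吃", "v", "eat", "派", "n", "pie"),
        ctx("新", "a", "new", "手机", "n", "phone"),
        ctx("喝", "v", "drink", "汁", "n", "juice"),
        ctx("买", "v", "buy", "电脑", "n", "computer")]

clf_a, clf_b, grown = co_train(seed, pool, view_bayes, view_me)
probe = ctx("喝", "v", "drink", "派", "n", "pie")
proba = clf_a.predict_proba(view_bayes(probe))
print(len(grown), max(proba, key=proba.get))
```

Retraining both classifiers from the enlarged labeled set at the start of each round keeps the sketch short; a practical system would more likely cap the pool scanned per round and keep a held-out set to detect label drift.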