
基于语言模型验证的词义消歧语料获取 (cited by: 4)

Word Sense Disambiguation Corpus Acquisition by Language Model Validation
Abstract: Manually annotated corpora are a scarce resource, and this scarcity limits the large-scale application of supervised word sense disambiguation (WSD) systems. Earlier work proposed acquiring WSD training data automatically: sentences containing a monosemous synonym (monosemous lexical relative) of the target word are retrieved from raw text such as the Web, and the target word is substituted for the synonym. In some contexts, however, the target word is not a suitable replacement for its monosemous synonym, and the substitution introduces noise. We use a language model to filter out these noisy instances, which purifies the training data and improves system performance. Experiments on the Senseval-3 Chinese lexical sample task show that a WSD system trained on data filtered by language model validation achieves clearly higher accuracy than one trained on unfiltered data.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 6, pp. 38-42 (5 pages)
Funding: National Natural Science Foundation of China (grants 60575042 and 60675034); National 863 Program of China (grant 2006AA01Z145)
Keywords: computer application; Chinese information processing; word sense disambiguation; language model; noise filter
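
The filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which language model or acceptance criterion the authors used, so everything below is an assumption made for illustration: an add-one smoothed bigram model, an average per-token log-probability score, a fixed threshold, and the hypothetical names `BigramLM` and `filter_by_lm`. English toy data is used for readability, with "plant" as the target word and "factory" as its monosemous synonym.

```python
import math
from collections import Counter


class BigramLM:
    """Toy add-one smoothed bigram model (an assumption, not the paper's model)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams)

    def avg_logprob(self, tokens):
        """Average log-probability per bigram transition, so length does not dominate."""
        padded = ["<s>"] + tokens + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / (len(padded) - 1)


def filter_by_lm(lm, candidates, synonym, target, threshold):
    """Substitute `target` for `synonym` and keep sentences the LM finds plausible."""
    kept = []
    for tokens in candidates:
        replaced = [target if tok == synonym else tok for tok in tokens]
        if lm.avg_logprob(replaced) >= threshold:
            kept.append(replaced)
    return kept


# Raw corpus used to train the language model (toy data).
corpus = [
    "the plant produces cars".split(),
    "workers at the plant produce steel".split(),
    "the plant closed last year".split(),
    "reset the phone to factory settings".split(),
]
lm = BigramLM(corpus)

# Sentences retrieved because they contain the monosemous synonym "factory".
candidates = [
    "the factory produces steel".split(),           # substitution reads naturally
    "reset the phone to factory settings".split(),  # fixed phrase, substitution is noise
]

# The threshold is an arbitrary value chosen for this toy data.
kept = filter_by_lm(lm, candidates, synonym="factory", target="plant", threshold=-2.4)
print(kept)
```

With this toy data the substituted sentence "the plant produces steel" scores above the threshold and is kept, while "reset the phone to plant settings" scores below it and is discarded. A real system would train the language model on a large corpus and tune the acceptance criterion on held-out data.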

