Optimal Context Window for Chinese Word Sense Disambiguation (中文词义消歧上下文最优边界问题研究)

Cited by: 1
Abstract: To select the optimal context window for an ambiguous word, the paper uses cross-validation and takes the candidate window with the lowest error rate as the optimal one. Applied to the SemEval-2007 Chinese data set, the method yields an optimal context window of [-2, +2]. To verify this result, a disambiguation test is run on the SemEval-2007 test set; it shows that the window chosen by cross-validation improves word sense disambiguation accuracy to some extent. The optimal context windows for ambiguous words of different parts of speech are also discussed.
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal, 2009, No. 7, pp. 49-53 (5 pages)
Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Feature Extraction Methods for Text Collections and Their Applications" (Grant No. 70673070)
Keywords: word sense disambiguation; context window; feature selection; Chinese
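
The selection procedure described in the abstract (score each candidate window by cross-validation and keep the one with the lowest error rate) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the classifier or the feature representation, so a small bag-of-words Naive Bayes model is assumed here; the function names, the tie-breaking rule, and the English toy sentences (standing in for segmented Chinese text) are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, target_idx, left, right):
    """Context words inside the window [-left, +right], target excluded."""
    start = max(0, target_idx - left)
    return tokens[start:target_idx] + tokens[target_idx + 1:target_idx + 1 + right]


class NaiveBayes:
    """Assumed classifier: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, feature_lists, labels):
        self.sense_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, sense in zip(feature_lists, labels):
            self.word_counts[sense].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.sense_counts.values())
        v = len(self.vocab) + 1  # +1 reserves mass for unseen words
        best_sense, best_score = None, -math.inf
        for sense in sorted(self.sense_counts):
            score = math.log(self.sense_counts[sense] / total)
            denom = sum(self.word_counts[sense].values()) + v
            for w in feats:
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def cv_error_rate(instances, left, right, k=5):
    """Pooled k-fold error rate for one candidate window."""
    errors = 0
    for fold in range(k):
        train = [x for i, x in enumerate(instances) if i % k != fold]
        test = [x for i, x in enumerate(instances) if i % k == fold]
        model = NaiveBayes().fit(
            [window_features(t, i, left, right) for t, i, _ in train],
            [s for _, _, s in train],
        )
        errors += sum(
            model.predict(window_features(t, i, left, right)) != s
            for t, i, s in test
        )
    return errors / len(instances)


def best_window(instances, max_size=3, k=5):
    """Return the (left, right) window with the lowest cross-validated error."""
    candidates = [
        (l, r)
        for l in range(max_size + 1)
        for r in range(max_size + 1)
        if l + r > 0
    ]
    scores = {c: cv_error_rate(instances, c[0], c[1], k=k) for c in candidates}
    # Assumed tie-break: prefer the smaller window when error rates are equal.
    best = min(scores, key=lambda c: (scores[c], c[0] + c[1]))
    return best, scores


# Hypothetical toy data: (sentence, sense of the ambiguous word "bank").
sentences = [
    ("he sat on the bank of the river", "shore"),
    ("the muddy bank of the stream collapsed", "shore"),
    ("they walked along the bank of the river", "shore"),
    ("fish gather near the bank of the lake", "shore"),
    ("grass grows on the bank of the canal", "shore"),
    ("she opened an account at the bank today", "finance"),
    ("the bank approved the loan quickly", "finance"),
    ("he deposited money in the bank yesterday", "finance"),
    ("the bank raised its interest rate", "finance"),
    ("a loan from the bank paid the bill", "finance"),
]
toy = [(s.split(), s.split().index("bank"), sense) for s, sense in sentences]

best, scores = best_window(toy, max_size=2, k=5)
print("best window: [-%d, +%d], error rate %.2f" % (best[0], best[1], scores[best]))
```

On real data the instances would be the sense-tagged training sentences, and the selected window would then be checked on a held-out test set, as the abstract describes.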