Word Sense Disambiguation Corpus Acquisition by Language Model Validation (基于语言模型验证的词义消歧语料获取)

Cited by: 4

Abstract: Hand-annotated training data is scarce, and this scarcity limits the large-scale use of supervised word sense disambiguation (WSD) systems. One proposed remedy acquires WSD training data automatically from raw Web text: sentences containing a monosemous lexical relative of the target word (for example, a single-sense synonym) are retrieved, and the relative is replaced by the target word. In some contexts, however, the target word is not a suitable replacement for the relative, and the substitution introduces noise. We propose a language model validation method that filters out these noisy instances, which purifies the training data and improves system performance accordingly. Experiments on the Senseval-3 Chinese lexical sample task show that a system trained on the automatically acquired data with language model validation achieves clearly higher accuracy than one trained without it.
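The filtering idea in the abstract can be illustrated with a small sketch. The abstract does not say which language model, scoring function, or threshold the authors used, so everything here is an assumption: an add-one smoothed bigram model, a per-token average log-probability score, a fixed cutoff, and the hypothetical names `BigramLM`, `substitute`, and `filter_instances`. English toy tokens stand in for segmented Chinese text.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model over pre-tokenized sentences (illustrative only)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>"] + list(tokens) + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        # Unseen words get no vocabulary slot of their own, so the scores
        # are only meant for ranking candidates, not as true probabilities.
        self.vocab_size = len(self.unigrams)

    def avg_logprob(self, tokens):
        """Average natural-log probability per bigram transition."""
        padded = ["<s>"] + list(tokens) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / (len(padded) - 1)


def substitute(tokens, relative, target):
    """Replace every occurrence of the monosemous relative with the target word."""
    return [target if tok == relative else tok for tok in tokens]


def filter_instances(lm, candidates, relative, target, threshold):
    """Keep substituted sentences whose score reaches the (assumed) threshold."""
    kept = []
    for tokens in candidates:
        replaced = substitute(tokens, relative, target)
        if lm.avg_logprob(replaced) >= threshold:
            kept.append(replaced)
    return kept


if __name__ == "__main__":
    # Toy background corpus in which the target word "bank" occurs.
    corpus = [
        ["he", "sat", "on", "the", "bank"],
        ["the", "bank", "was", "muddy"],
        ["she", "walked", "along", "the", "bank"],
    ]
    lm = BigramLM(corpus)
    # Sentences retrieved with the hypothetical monosemous relative "riverside".
    candidates = [
        ["he", "sat", "on", "the", "riverside"],
        ["a", "riverside", "cafe", "opened"],
    ]
    print(filter_instances(lm, candidates, "riverside", "bank", threshold=-2.0))
```

In this toy setting the first candidate still reads naturally after substitution and passes the cutoff, while the second scores lower and is discarded. A realistic setup would train a higher-order model on a large corpus and tune the threshold on held-out data.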

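Authors: Guo Yuhang (郭宇航), Che Wanxiang (车万翔), Liu Ting (刘挺)

Affiliation: Information Retrieval Laboratory, School of Computer Science and Technology, Harbin Institute of Technology

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 6, pp. 38-42 (5 pages)

Funding: National Natural Science Foundation of China (60575042, 60675034); National 863 Program (2006AA01Z145)

Keywords: computer application; Chinese information processing; word sense disambiguation; language model; noise filter

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]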
