Abstract
The size of the training corpus plays an important role in machine-learning-based extraction of semantic relations between named entities, but annotating a corpus manually is time-consuming and labor-intensive. To let a resource-rich source language help a resource-poor target language in semantic relation extraction, we propose translating relation instances from the source language into the target language via machine translation and then adding them to the target-language training set by means of an entity alignment strategy. Relation extraction experiments on the ACE 2005 Chinese and English corpora show that translation in either direction, Chinese to English or English to Chinese, helps relation extraction in the other language. The benefit is particularly significant when the target-language training corpus is small.
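The pipeline described in the abstract (translate a source-language relation instance, align its entities in the translation, add it to the target-language training set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `RelationInstance`, `project_instance`, `augment_training_set`, and the dictionary-based `toy_translate` are hypothetical names, the example sentences are invented, and the alignment step is a naive stand-in that only checks whether the separately translated entity mention appears verbatim in the translated sentence.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class RelationInstance:
    sentence: str   # sentence containing both entity mentions
    entity1: str    # surface form of the first entity mention
    entity2: str    # surface form of the second entity mention
    relation: str   # relation label, carried over unchanged


def project_instance(
    inst: RelationInstance, translate: Callable[[str], str]
) -> Optional[RelationInstance]:
    """Translate an instance and align its entities (naive stand-in).

    Each entity mention is translated on its own and must occur verbatim
    in the translated sentence; otherwise the instance is discarded.
    """
    sentence_t = translate(inst.sentence)
    e1_t = translate(inst.entity1)
    e2_t = translate(inst.entity2)
    if e1_t not in sentence_t or e2_t not in sentence_t:
        return None  # alignment failed, so skip this instance
    return RelationInstance(sentence_t, e1_t, e2_t, inst.relation)


def augment_training_set(
    target_train: Iterable[RelationInstance],
    source_instances: Iterable[RelationInstance],
    translate: Callable[[str], str],
) -> List[RelationInstance]:
    """Append every successfully projected source instance to the target set."""
    projected = (project_instance(i, translate) for i in source_instances)
    return list(target_train) + [p for p in projected if p is not None]


# Hypothetical stand-in for a machine translation system (invented data).
TOY_MT = {
    "比尔·盖茨创办了微软": "Bill Gates founded Microsoft",
    "比尔·盖茨": "Bill Gates",
    "微软": "Microsoft",
    "他住在巴黎": "He lives in Paris",
    "他": "He",
    # "巴黎" is deliberately missing, so its alignment fails below.
}


def toy_translate(text: str) -> str:
    return TOY_MT.get(text, text)


target_train = [
    RelationInstance("Jobs founded Apple", "Jobs", "Apple", "ORG-AFF"),
]
source_instances = [
    RelationInstance("比尔·盖茨创办了微软", "比尔·盖茨", "微软", "ORG-AFF"),
    RelationInstance("他住在巴黎", "他", "巴黎", "PHYS"),
]
augmented = augment_training_set(target_train, source_instances, toy_translate)
```

Here the first source instance is projected and added, while the second is dropped because one of its entities cannot be located in the translation. A real system would replace `toy_translate` with an actual machine translation service and use a more robust alignment strategy.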
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2013, Issue 5, pp. 191-197 (7 pages)
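Funding
National Natural Science Foundation of China (60873150, 90920004)
Natural Science Foundation of Jiangsu Province (BK2010219)
Major Natural Science Project of Jiangsu Higher Education Institutions (11KJA520003)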
Keywords
Cross-lingual relation extraction
Machine translation
Entity alignment