Abstract: Multilingual corpora have been well recognised as a valuable resource in contrastive and translation studies. This article investigates the development and use of multilingual corpora, with a focus on work done in Scandinavia, to show how parallel corpora can be useful in different fields of language description: lexis, grammar and discourse. It also presents a case study demonstrating how a parallel corpus can be used to compare two seemingly equivalent future-referring expressions cross-linguistically, namely English 'be going to' and Norwegian 'kommer til' ('come to').
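The abstract does not say which measure the case study uses to compare the two expressions. One measure from this Scandinavian tradition is Altenberg's mutual correspondence, MC = (At + Bt) × 100 / (As + Bs), which expresses how often two items translate each other across both translation directions. The sketch below computes it at sentence level over a bidirectional corpus of aligned sentence pairs. It is an illustration under stated assumptions, not the article's method: the regular expressions and function names are hypothetical, and the surface patterns are crude (the Norwegian one also catches non-future uses of 'kommer til', which a real study would have to filter out manually).

```python
import re
from typing import Iterable, Pattern, Tuple

# Hypothetical surface patterns; a real study would use lemma/POS-aware
# queries and check every hit by hand.
EN_GOING_TO = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+going\s+to\b"
    r"|'(?:m|s|re)\s+going\s+to\b",
    re.IGNORECASE,
)
NO_KOMMER_TIL = re.compile(r"\b(?:kommer|kom|komme|kommet)\s+til\b", re.IGNORECASE)

SentencePair = Tuple[str, str]  # (source sentence, aligned translation)


def count_hits(pairs: Iterable[SentencePair], src_pat: Pattern[str],
               tgt_pat: Pattern[str]) -> Tuple[int, int]:
    """Count source sentences matching src_pat, and how many of those have a
    translation matching tgt_pat (sentence-level, not token-level, counts)."""
    in_source = 0
    in_translation = 0
    for source, translation in pairs:
        if src_pat.search(source):
            in_source += 1
            if tgt_pat.search(translation):
                in_translation += 1
    return in_source, in_translation


def mutual_correspondence(en_to_no: Iterable[SentencePair],
                          no_to_en: Iterable[SentencePair]) -> float:
    """MC = (At + Bt) * 100 / (As + Bs), A = 'be going to', B = 'kommer til'."""
    # A in English originals, and B in their Norwegian translations.
    a_s, b_t = count_hits(en_to_no, EN_GOING_TO, NO_KOMMER_TIL)
    # B in Norwegian originals, and A in their English translations.
    b_s, a_t = count_hits(no_to_en, NO_KOMMER_TIL, EN_GOING_TO)
    denominator = a_s + b_s
    return (a_t + b_t) * 100.0 / denominator if denominator else 0.0


if __name__ == "__main__":
    # Invented toy sentences, for illustration only.
    en_to_no = [("It is going to rain.", "Det kommer til å regne."),
                ("I am going to leave.", "Jeg skal dra.")]
    no_to_en = [("Hun kommer til å vinne.", "She is going to win."),
                ("Han kommer til å le.", "He will laugh.")]
    print(mutual_correspondence(en_to_no, no_to_en))  # 50.0
```

An MC of 100 would mean the two expressions always translate each other; 0 would mean they never do. Counting at sentence level keeps the sketch simple but merges multiple hits in one sentence.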
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61133012, 61273321) and the National 863 Leading Technology Research Project (2012AA011102). Special thanks to Wanxiang Che, Yanyan Zhao, Wei He, Fikadu Gemechu, Yuhang Guo, Zhenghua Li, Meishan Zhang and the anonymous reviewers for insightful comments and suggestions.
Abstract: Annotating named entity recognition (NER) training corpora is a costly but necessary process for supervised NER approaches. This paper presents a general framework for generating large-scale NER training data from parallel corpora. In our method, we first apply a high-performance NER system to one side of a bilingual corpus. Then, we project the named entity (NE) labels onto the other side according to word-level alignments. Finally, we propose several strategies for selecting high-quality auto-labeled NER training data. We apply our approach to Chinese NER using an English-Chinese parallel corpus. Experimental results show that our approach can collect high-quality labeled data and can help improve Chinese NER.
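The projection step can be sketched as follows, assuming the English side already carries BIO tags from the NER system and the word alignment is given as (source index, target index) pairs. Each English entity is mapped to the contiguous target span covering all target tokens aligned to it. The abstract does not specify the paper's selection strategies, so `select_training_data` uses one hypothetical filter (keep a sentence only if every entity projects cleanly); all function names are illustrative, and word-level Chinese tokens are assumed even though Chinese NER is often tagged per character.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity type)
Alignment = Sequence[Tuple[int, int]]  # (source index, target index) links


def bio_spans(tags: Sequence[str]) -> List[Span]:
    """Extract entity spans from BIO tags (a stray I- tag starts a new span)."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None and etype is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
    return spans


def project_labels(src_tags: Sequence[str], tgt_len: int,
                   alignment: Alignment) -> Tuple[List[str], bool]:
    """Project source BIO labels onto target tokens via word alignments.

    Returns the target tags and a flag that is False if any source entity
    was unaligned or collided with an already projected entity."""
    tgt_tags = ["O"] * tgt_len
    clean = True
    for start, end, etype in bio_spans(src_tags):
        targets = sorted({j for i, j in alignment if start <= i < end})
        if not targets:
            clean = False  # entity has no aligned target token
            continue
        lo, hi = targets[0], targets[-1]
        if any(tag != "O" for tag in tgt_tags[lo:hi + 1]):
            clean = False  # overlaps a previously projected entity
            continue
        tgt_tags[lo] = "B-" + etype
        for k in range(lo + 1, hi + 1):
            tgt_tags[k] = "I-" + etype
    return tgt_tags, clean


def select_training_data(
    corpus: Iterable[Tuple[Sequence[str], Sequence[str], Alignment]],
) -> Iterator[List[Tuple[str, str]]]:
    """Hypothetical filter: keep only sentence pairs that projected cleanly."""
    for src_tags, tgt_tokens, alignment in corpus:
        tgt_tags, clean = project_labels(src_tags, len(tgt_tokens), alignment)
        if clean:
            yield list(zip(tgt_tokens, tgt_tags))


if __name__ == "__main__":
    # Toy example: "Barack Obama visited Beijing" -> 奥巴马 访问 了 北京
    src = ["B-PER", "I-PER", "O", "B-LOC"]
    tgt = ["奥巴马", "访问", "了", "北京"]
    align = [(0, 0), (1, 0), (2, 1), (3, 3)]
    print(list(select_training_data([(src, tgt, align)])))
```

Taking the min-to-max target span fills in unaligned tokens inside an entity, which tolerates sparse alignments but can also absorb stray tokens; stricter filters trade recall for cleaner training data.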
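Abstract: In recent years, Chinese-to-English translation research based on parallel corpora has become a major focus of translation studies in China. Using a self-built parallel corpus and taking lexical features as its starting point, this study compares the lexical richness of the Chinese academic monograph 《语言符号学》 (lit. 'Linguistic Semiotics') and its English translation with that of the English academic work Handbook of Semiotics (hereafter Handbook). The results show that: 1) the lexical diversity of the English translation of 《语言符号学》 is close to that of Handbook and shows no tendency towards a narrowed lexical range; 2) compared with the source text, the English translation has lower lexical density, with amplification of conjunctions, prepositions and pronouns; 3) the lexical complexity of the English translation is lower than that of Handbook, so the former is relatively easier to read.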
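The abstract names three dimensions of lexical richness but not how they are computed. The sketch below uses one common set of operationalisations, assumed here rather than taken from the study: lexical diversity as a standardised type-token ratio (STTR, the mean TTR over consecutive fixed-size chunks, often 1,000 tokens), lexical density as the share of content-word tokens, and lexical complexity as the share of content words outside a high-frequency word list. The content-word tag set (Universal POS tags) and the `frequent` list are hypothetical inputs; the Chinese source would first need word segmentation and POS tagging.

```python
from typing import Sequence, Set, Tuple

# Assumed content-word classes (Universal POS tags); studies differ here.
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

Tagged = Sequence[Tuple[str, str]]  # (word, POS tag) pairs


def sttr(words: Sequence[str], chunk: int = 1000) -> float:
    """Standardised type-token ratio: mean TTR over consecutive full chunks.

    Falls back to plain TTR when the text is shorter than one chunk."""
    if not words:
        return 0.0
    lowered = [w.lower() for w in words]
    ratios = [
        len(set(lowered[k:k + chunk])) / chunk
        for k in range(0, len(lowered) - chunk + 1, chunk)
    ]
    if not ratios:
        return len(set(lowered)) / len(lowered)
    return sum(ratios) / len(ratios)


def lexical_density(tagged: Tagged) -> float:
    """Proportion of tokens that belong to a content-word class."""
    if not tagged:
        return 0.0
    return sum(pos in CONTENT_POS for _, pos in tagged) / len(tagged)


def lexical_sophistication(tagged: Tagged, frequent: Set[str]) -> float:
    """Proportion of content-word tokens not found in a high-frequency list."""
    content = [w.lower() for w, pos in tagged if pos in CONTENT_POS]
    if not content:
        return 0.0
    return sum(w not in frequent for w in content) / len(content)
```

Chunking is what makes TTR comparable across texts of different lengths, such as a translation and an independently written handbook; plain TTR falls as a text grows longer.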