Abstract
Accurate corpus annotation is a prerequisite for reliable corpus-based research findings. This paper examines the annotation accuracy of CEM, a learner corpus of English majors published in China. It focuses on how relative clause errors are tagged in the corpus's Chinese-to-English (C-E) translation component and illustrates the analysis with examples drawn from the corpus. The study finds that CEM's tagging of relative clause errors suffers from two problems: error types are assigned inaccurately, and some errors are not tagged at all. The number of such cases exceeds the allowable margin of error, which considerably undermines CEM's reliability as a research tool. In light of these findings, corpus developers should pay closer attention to annotation accuracy. They should not only design sound tagging rules but, more importantly, ensure that those rules are applied to the data accurately and consistently, within the allowable margin of error, so as to guarantee corpus quality.
Source
College English Teaching & Research (《大学英语教学与研究》), 2014, No. 3, pp. 28-34 (7 pages)