Abstract
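Current mainstream computer-aided translation (CAT) systems rely on translation memory (TM) and termbases (TB) to improve translation efficiency. Translation memory takes the natural sentence as its main matching unit and requires whole sentences to be similar or repeated, which makes matches hard to obtain. By contrast, a termbase takes the chunk as its matching unit and is more flexible, so it can compensate for the shortcomings of translation memory. Building a termbase involves automatic term extraction, which needs to draw on the part-of-speech rules of high-frequency chunks in a specific text type. This article uses n-grams to extract recurrent chunks from English civil aviation regulation texts, examines the part-of-speech combination features of high-frequency chunks at different lengths and recurrence frequencies, and compares them with literary texts. The study finds that, in English civil aviation regulation texts, the recurrent chunks suitable for a CAT termbase are mainly noun phrases, which differ significantly from those in literary texts.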
Most current computer-aided translation (CAT) systems leverage translation memory (TM) and termbases (TB) to enhance translation efficiency. Because TM matching relies on the repetition of whole sentences, which limits it in practice, it often needs to be complemented by a termbase, which is more flexible in use. Building a termbase requires automatic term extraction, which in turn demands knowledge of the part-of-speech (POS) configuration of terms in the specific text type. With corpus tools, we extracted n-grams of certain lengths and frequencies from civil aviation regulations in the US and examined the POS configuration of those recurrent chunks, then contrasted it with that of literary texts. The study shows a dominance of NP and PP in the recurrent chunks suitable for a CAT termbase in those civil aviation regulations, which differs from the result for literary texts.
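The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' tooling: the function names `recurrent_ngrams` and `pos_patterns`, the `min_freq` threshold, the tag labels, and the toy hand-tagged sentence are all assumptions made for illustration. The sketch collects word n-grams of a given length that recur at least `min_freq` times and then tallies the POS sequences of those recurrent chunks.

```python
from collections import Counter


def recurrent_ngrams(items, n, min_freq=2):
    """Return the n-grams (as tuples) that occur at least min_freq times."""
    counts = Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def pos_patterns(tagged, n, min_freq=2):
    """Tally the POS-tag sequences of the recurrent word n-grams."""
    # Lowercase the words so that "The flight" and "the flight" are one chunk.
    normalized = [(word.lower(), tag) for word, tag in tagged]
    patterns = Counter()
    for gram, freq in recurrent_ngrams(normalized, n, min_freq).items():
        patterns[" ".join(tag for _, tag in gram)] += freq
    return patterns


# Toy, hand-tagged sample (hypothetical; not taken from the study's corpus).
tagged = [
    ("The", "DET"), ("flight", "NOUN"), ("crew", "NOUN"), ("must", "AUX"),
    ("notify", "VERB"), ("the", "DET"), ("flight", "NOUN"), ("crew", "NOUN"),
    ("member", "NOUN"),
]

print(pos_patterns(tagged, n=2))
```

On this toy input the only recurrent bigrams are "the flight" and "flight crew", so the tally contains one determiner-noun pattern and one noun-noun pattern. A real study would replace the hand-tagged list with the output of a POS tagger and vary `n` and `min_freq` to cover the chunk lengths and frequencies of interest.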
Source
《中国科技术语》
2022, Issue 2, pp. 65-69 (5 pages)
CHINA TERMINOLOGY
Funding
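Central Universities Fund project of the Civil Aviation University of China, "A Study of Transparent Discourse Strategies in English-Chinese Translation" (3122018R010).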
Keywords
computer-aided translation (CAT)
termbase
n-gram
civil aviation regulations