Abstract
This paper discusses several fundamental problems in constructing a multilingual corpus oriented toward the Beijing Olympic Games. It proposes an event-oriented corpus selection rule that fuses multiple domains, and it establishes marking (annotation) guidelines that carry classification information. A preliminary controlled multilingual corpus of nearly 70,000 aligned sentence pairs has been built.
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2005, No. 11, pp. 23-24, 30 (3 pages)
Funding
National "863" Program of China (2002AA117010-09)
National Natural Science Foundation of China (60375019)
Keywords
Corpus Selection Rule
Marking Guidelines
Controlled Multilingual Corpus