Abstract
This paper describes the construction of an English-Chinese bilingual corpus platform for the computer field. The construction scheme comprises ten steps, including noise reduction, characteristic chunk extraction, keyword tagging, Chinese word segmentation, word frequency statistics, paragraph alignment tagging, and sentence alignment tagging. On this basis, the paper proposes a definition of the characteristic chunk.
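Two of the steps named in the abstract, sentence alignment tagging and word frequency statistics, can be illustrated with a minimal sketch. This is not the paper's implementation: the XML element and attribute names (`corpus`, `para`, `sent`, `en`, `zh`, `id`), the punctuation-based sentence splitter, and the pairing of sentences purely by position are all assumptions made for illustration. Chinese word segmentation and characteristic chunk extraction are not covered.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed splitter: a sentence is a run of text ending in Western or
# Chinese sentence-final punctuation. Real corpora need something sturdier.
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]?")


def split_sentences(text):
    """Return the non-empty, stripped sentences found in text."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def build_aligned_xml(paragraph_pairs):
    """Build an XML tree from (english, chinese) paragraph pairs.

    Tag names are hypothetical. Sentences are aligned by position only,
    so unequal sentence counts are truncated to the shorter side.
    """
    root = ET.Element("corpus")
    for p_id, (en_par, zh_par) in enumerate(paragraph_pairs, start=1):
        para = ET.SubElement(root, "para", id=str(p_id))
        pairs = zip(split_sentences(en_par), split_sentences(zh_par))
        for s_id, (en, zh) in enumerate(pairs, start=1):
            sent = ET.SubElement(para, "sent", id=f"{p_id}.{s_id}")
            ET.SubElement(sent, "en").text = en
            ET.SubElement(sent, "zh").text = zh
    return root


def english_word_frequencies(root):
    """Count lowercase alphabetic tokens across all <en> elements."""
    counts = Counter()
    for en in root.iter("en"):
        counts.update(re.findall(r"[a-z]+", en.text.lower()))
    return counts


if __name__ == "__main__":
    sample = [(
        "The CPU executes instructions. Memory stores data.",
        "中央处理器执行指令。内存存储数据。",
    )]
    tree = build_aligned_xml(sample)
    print(ET.tostring(tree, encoding="unicode"))
    print(english_word_frequencies(tree).most_common(3))
```

Positional pairing keeps the sketch short; a real platform would need an alignment method that handles one-to-many and many-to-one sentence correspondences.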
Source
Science Mosaic (《科技广场》)
2009, No. 9, pp. 132-135 (4 pages)
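Funding
Science and Technology Project of the Jiangxi Provincial Department of Education, "Creation and Application of an English-Chinese Bilingual Corpus Translation Platform for the Computer Field"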
Keywords
Bilingual Corpus
XML Tag
Bilingual Dictionary
Characteristic Chunk