Abstract
Word alignment is a fundamental research problem in natural language processing. This paper proposes a discriminative word alignment method for the Mongolian-English language pair based on a linear-chain conditional random field (CRF) model. Guided by the differences between Mongolian and English, the method selects morphological, lexical, and part-of-speech information as features and builds a dual-layer CRF word alignment model: the first layer uses a CRF model to align the chunks into which each sentence is split, and the second layer uses a CRF model to align the words within those chunks. Experiments are conducted on a manually constructed word alignment corpus. The results show that the proposed method effectively improves the quality of Mongolian-English word alignment.
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals (北大核心)
2012, Issue 3, pp. 521-526 (6 pages)
Pattern Recognition and Artificial Intelligence
Funding
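Supported by the Provincial Natural Science Research Project of Anhui Higher Education Institutions (No. KJ2012B147)
and the Outstanding Young Talents Fund Project of Anhui Higher Education Institutions (No. 2012SQRL171)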
Keywords
Conditional Random Field (CRF) Discriminative Model
Word Alignment
Linguistic Feature