Abstract
This paper describes a statistical translation model based on hierarchical chunking (the hierarchical chunking-phrase based model, HCPB). Formally, the model conforms to a synchronous context-free grammar, and it also incorporates English chunking (partial parsing) knowledge learned with conditional random fields (CRFs). It therefore combines the core ideas of syntax-based and phrase-based translation. The decoder extends the chart-based CKY parsing algorithm and integrates a linear N-gram language model. The experiments so far focus on Chinese-English spoken language translation and follow the International Workshop on Spoken Language Translation (IWSLT) evaluation: on the IWSLT 2005 test set, the system obtains higher BLEU and NIST scores than a phrase-based statistical translation system.
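As a rough illustration of chart-based CKY decoding with hierarchical rules and an N-gram language model, here is a minimal sketch; it is not the authors' decoder. Assumptions: the grammar `RULES`, the romanized source words, the bigram table `BIGRAMS`, the floor `UNSEEN`, and the functions `lm_score`, `prune`, and `decode` are all hypothetical; each rule has at most one nonterminal `X`; a monotone glue rule combines adjacent spans; and the language model is applied by rescoring hypotheses inside a fixed beam rather than by exact state-split integration. The CRF chunking component is not modeled.

```python
from itertools import product

# Hypothetical toy grammar: (source side, target side, log-probability).
# "X" is a single nonterminal gap, a simplification of hierarchical rules.
RULES = [
    (("wo",), ("i",), -0.1),
    (("jia",), ("home",), -0.2),
    (("zai",), ("at",), -0.5),
    (("chi", "fan"), ("eat",), -0.3),
    (("zai", "X", "chi", "fan"), ("eat", "at", "X"), -0.7),  # reordering rule
]

# Hypothetical bigram language model (log-probabilities); unseen bigrams get a floor.
BIGRAMS = {
    ("<s>", "i"): -0.5, ("i", "eat"): -0.7, ("eat", "at"): -0.8,
    ("at", "home"): -0.4, ("home", "</s>"): -0.6, ("i", "at"): -3.0,
    ("home", "eat"): -2.5, ("eat", "</s>"): -1.0,
}
UNSEEN = -6.0


def lm_score(words):
    """Sum of bigram log-probabilities over adjacent word pairs."""
    return sum(BIGRAMS.get(pair, UNSEEN) for pair in zip(words, words[1:]))


def prune(hyps, beam):
    """Keep the best TM score per target string, then the top `beam` by TM + LM."""
    best = {}
    for tm, tgt in hyps:
        if tgt not in best or tm > best[tgt]:
            best[tgt] = tm
    ranked = sorted(best.items(), key=lambda kv: kv[1] + lm_score(kv[0]), reverse=True)
    return [(tm, tgt) for tgt, tm in ranked[:beam]]


def decode(src, beam=5):
    """Bottom-up CKY over source spans; returns (TM log-score, target words)."""
    n = len(src)
    chart = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            hyps = []
            for rule_src, rule_tgt, logp in RULES:
                if "X" not in rule_src:
                    # Purely lexical rule: must match the whole span.
                    if tuple(src[i:j]) == rule_src:
                        hyps.append((logp, rule_tgt))
                    continue
                # Hierarchical rule: terminals around the gap must match the span edges.
                p = rule_src.index("X")
                pre, suf = rule_src[:p], rule_src[p + 1:]
                a, b = i + len(pre), j - len(suf)
                if a >= b or tuple(src[i:a]) != pre or tuple(src[b:j]) != suf:
                    continue
                q = rule_tgt.index("X")
                # A gap covering the whole span has no chart entry yet and is skipped.
                for tm, sub in chart.get((a, b), []):
                    hyps.append((logp + tm, rule_tgt[:q] + sub + rule_tgt[q + 1:]))
            # Glue rule: monotone concatenation of two adjacent sub-spans.
            for k in range(i + 1, j):
                for (tm1, t1), (tm2, t2) in product(chart[(i, k)], chart[(k, j)]):
                    hyps.append((tm1 + tm2, t1 + t2))
            chart[(i, j)] = prune(hyps, beam)
    # Final rescoring of full-sentence hypotheses with boundary tokens.
    return max(chart[(0, n)], key=lambda h: h[0] + lm_score(("<s>",) + h[1] + ("</s>",)))
```

For example, `decode("wo zai jia chi fan".split())` prefers the reordering rule and returns the target `("i", "eat", "at", "home")`, because the bigram model penalizes the monotone output `i at home eat`. Keeping several hypotheses per span lets the language model overturn a locally cheaper derivation.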
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2007, No. 5, pp. 87-90, 117 (5 pages)
Funding
National 863 Program (National High Technology Research and Development Program of China), grant 2006AA01Z194
Fujitsu cooperation project (K0604040)
Keywords
artificial intelligence
machine translation
hierarchical chunking-phrase based SMT
conditional random fields
chart-based CKY algorithm