Abstract
The maximum entropy based bracketing transduction grammar (BTG) model has become a focus of statistical machine translation research in recent years, owing to its strong translation performance and easily trained model. However, the distribution of phrase reordering training examples in this model is imbalanced. To address this problem, this paper proposes a training method for the phrase reordering model that incorporates ensemble learning. Experimental results on a large-scale dataset show that the proposed method effectively improves the training of the reordering model and significantly improves the performance of the translation system.
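The abstract names ensemble learning as the remedy for the imbalanced reordering-example distribution but does not detail the procedure on this page. Below is a minimal, illustrative sketch of one common approach to this kind of problem: an under-sampling ensemble of maximum-entropy (logistic regression) classifiers for the straight/inverted reordering decision. It is not the paper's exact method; the function names (train_reordering_ensemble, predict_inverted_prob), feature representation, sampling scheme, and toy data are assumptions made for illustration only.

# Minimal sketch (assumed, not the paper's implementation): train several
# max-ent classifiers, each on a class-balanced subsample of the imbalanced
# reordering examples, and average their predicted probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_reordering_ensemble(X, y, n_members=5, seed=0):
    """Fit n_members max-ent classifiers, each on a balanced subsample."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)   # e.g. 'inverted' orientation (rare)
    majority = np.flatnonzero(y == 0)   # e.g. 'straight' orientation (frequent)
    members = []
    for _ in range(n_members):
        # Under-sample the majority class down to the minority-class size.
        picked = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, picked])
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[idx], y[idx])
        members.append(clf)
    return members

def predict_inverted_prob(members, X):
    """Average the members' probabilities of the 'inverted' orientation."""
    probs = [m.predict_proba(X)[:, 1] for m in members]
    return np.mean(probs, axis=0)

if __name__ == "__main__":
    # Toy imbalanced data standing in for phrase-pair reordering features.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(2000, 20))
    y = (rng.random(2000) < 0.1).astype(int)   # roughly 10% 'inverted'
    ensemble = train_reordering_ensemble(X, y)
    print(predict_inverted_prob(ensemble, X[:5]))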
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in CSCD and the Peking University Core Journals list
2014, No. 1, pp. 87-93 (7 pages)
Funding
National Natural Science Foundation of China (61303082, 61005052)
National Key Technology R&D Program of China (2012BAH14F03)
Specialized Research Fund for the Doctoral Program of Higher Education (20120121120046)
Keywords
maximum entropy
phrase reordering
imbalanced classification
ensemble learning