Abstract
The accuracy of statistical machine translation (SMT) depends largely on the quality of translation modeling, which in turn relies on the distribution of the data. Most machine learning tasks assume that training and test data are independent and identically distributed, but this assumption does not necessarily hold in real-world SMT systems. To achieve the best performance, the models therefore need to be adapted to the actual data distribution. Domain adaptation, which aims to alleviate the domain mismatch between training and test data, has recently become an active research topic in SMT. This paper introduces several popular classes of domain adaptation methods for statistical machine translation and discusses directions for future work.
Source
《智能计算机与应用》
2014, No. 6, pp. 31-34 (4 pages)
Intelligent Computer and Applications
Keywords
Statistical Machine Translation
Domain Adaptation