Abstract
Decoding is a key step in a statistical machine translation system. The decoder's task is to use the language model and translation model learned from training corpora to determine the most likely translation of a source sentence. Its inputs are the translation model, the language model, and a source-language sentence; its output is the target sentence with the highest probability of being a translation of that source sentence. Because the space of possible target sentences is very large, decoding algorithms can typically search only a small part of it. This paper describes a stack-based decoder built on the principles of statistical natural language translation and implemented in Java. The Java platform offers convenient cross-platform deployment and is highly secure, open, robust, viable, and flexible for development. The implementation work concentrates on the decoding algorithm and on parameter selection.
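The search described in the abstract can be illustrated with a minimal sketch. This is not the paper's decoder: it assumes a monotone, word-by-word translation, a bigram language model, and one pruned stack per number of source words covered. The class name `StackDecoderSketch`, the toy probability tables, the `UNSEEN_BIGRAM` fallback, and the `beamSize` parameter are all hypothetical, chosen only to show how the two models are combined and how pruning limits the search to a small part of the hypothesis space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class StackDecoderSketch {

    // A partial translation: target words so far and their accumulated log-probability.
    static final class Hypothesis {
        final List<String> words;
        final double logProb;

        Hypothesis(List<String> words, double logProb) {
            this.words = words;
            this.logProb = logProb;
        }
    }

    // Toy translation model (hypothetical values): source word -> target word -> P(target | source).
    static final Map<String, Map<String, Double>> TM = Map.of(
            "das", Map.of("the", 0.6, "that", 0.4),
            "haus", Map.of("house", 0.8, "home", 0.2),
            "ist", Map.of("is", 1.0),
            "klein", Map.of("small", 0.7, "little", 0.3));

    // Toy bigram language model (hypothetical values), keyed by "previous next".
    static final Map<String, Double> LM = Map.of(
            "<s> the", 0.5,
            "<s> that", 0.2,
            "the house", 0.4,
            "house is", 0.5,
            "is small", 0.4,
            "is little", 0.1);

    // Assumed fallback probability for bigrams missing from the toy table.
    static final double UNSEEN_BIGRAM = 0.01;

    static double bigram(String prev, String next) {
        return LM.getOrDefault(prev + " " + next, UNSEEN_BIGRAM);
    }

    // Decodes left to right. The stack after step i holds hypotheses covering the
    // first i source words; only the beamSize best survive (beamSize must be >= 1).
    static List<String> decode(List<String> source, int beamSize) {
        List<Hypothesis> stack = new ArrayList<>();
        stack.add(new Hypothesis(List.of(), 0.0));

        for (String src : source) {
            // Unknown source words are passed through unchanged (an assumption of this sketch).
            Map<String, Double> options = TM.getOrDefault(src, Map.of(src, 1.0));
            List<Hypothesis> next = new ArrayList<>();

            for (Hypothesis h : stack) {
                String prev = h.words.isEmpty() ? "<s>" : h.words.get(h.words.size() - 1);
                for (Map.Entry<String, Double> opt : options.entrySet()) {
                    List<String> words = new ArrayList<>(h.words);
                    words.add(opt.getKey());
                    // Combine translation-model and language-model scores in log space.
                    double score = h.logProb
                            + Math.log(opt.getValue())
                            + Math.log(bigram(prev, opt.getKey()));
                    next.add(new Hypothesis(words, score));
                }
            }

            // Prune: keep only the highest-scoring hypotheses.
            next.sort(Comparator.comparingDouble((Hypothesis h) -> h.logProb).reversed());
            stack = new ArrayList<>(next.subList(0, Math.min(beamSize, next.size())));
        }
        return stack.get(0).words;
    }

    public static void main(String[] args) {
        List<String> best = decode(List.of("das", "haus", "ist", "klein"), 2);
        System.out.println(String.join(" ", best)); // prints "the house is small"
    }
}
```

A smaller `beamSize` explores fewer hypotheses and runs faster but may discard the best translation early, which is one example of the kind of parameter trade-off the abstract refers to.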
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2005, No. 4, pp. 105-108 (4 pages)
Computer Engineering and Applications