Abstract
Few Chinese-French machine translation systems exist, in China or abroad, and the Chinese examples studied so far are mostly simple short sentences. To address this, a small Chinese-French parallel corpus was built from selected articles in the Chinese and French online editions of People's Daily, and a Chinese-French machine translation system was constructed on it using an improved Yamada algorithm. Based on statistics over the parallel corpus, the system divides Chinese sentence patterns broadly into two classes, single-predicate and multi-predicate, and extracts 4096 aligned Chinese-French basic sentence types, which are then applied in translation. The paper also introduces, for the first time, the concept of three-word-sequence occurrence probability (the authors' "3-word-sequence appearing probability"), used to resolve word collocation problems. Experiments indicate that the system has a clear advantage in handling long Chinese sentences with multiple predicates.
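The abstract names a three-word-sequence probability for handling collocation but does not define it. As a minimal sketch only, assuming the probability is the relative frequency of each three-word sequence in a tokenized corpus, it could be computed and used to choose between candidate words as follows. The function names `trigram_probabilities` and `pick_collocation` and the toy French corpus are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def trigram_probabilities(sentences):
    """Relative frequency of each three-word sequence in a tokenized corpus.

    Assumption: "probability" here means count / total number of trigrams;
    the paper's actual definition may differ.
    """
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: n / total for gram, n in counts.items()}


def pick_collocation(left, candidates, right, probs):
    """Pick the candidate whose (left, word, right) sequence is most probable."""
    return max(candidates, key=lambda w: probs.get((left, w, right), 0.0))


# Toy tokenized corpus (hypothetical data for illustration only).
corpus = [
    ["il", "prend", "une", "décision"],
    ["elle", "prend", "une", "décision"],
    ["il", "fait", "une", "erreur"],
]
probs = trigram_probabilities(corpus)
best = pick_collocation("prend", ["une", "un"], "décision", probs)
```

Unseen sequences get probability zero in this sketch; a practical system would likely need smoothing, which is outside what the abstract describes.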
Source
Computer Technology and Development (《计算机技术与发展》)
2008, No. 4, pp. 114-117 (4 pages)
Keywords
3-word-sequence appearing probability
aligned Chinese-French basic sentence type
multi-predicate sentence
corpus