Abstract
Building bilingual corpora and aligning them automatically is of great importance to the development of computational linguistics. Bilingual alignment is the core technology for processing bilingual text, and alignment quality directly affects downstream work such as computer-assisted translation. Based on the characteristics of Chinese-English bilingual text, this paper proposes a new hybrid algorithm for sentence alignment. The algorithm relies mainly on a new length-based alignment method, combines it with a lexicon-based method, and aligns in both the forward and reverse directions to further improve sentence alignment accuracy. The algorithm was evaluated on 100 files containing more than 5,000 English-Chinese bilingual sentences. The alignment results were satisfactory, which indicates that the algorithm is feasible for practical use.
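The abstract does not specify the length-based method it introduces, so the following is only a minimal sketch of generic length-based sentence alignment by dynamic programming, in the spirit of classic length-based aligners. The bead shapes, penalties, cost function, and the `ratio` parameter (expected target length per unit of source length) are all illustrative assumptions, not the paper's algorithm; the lexicon-based scoring and the forward/reverse pass are not shown.

```python
import math

# Allowed bead shapes (source sentences, target sentences) with a fixed
# penalty each. Shapes and penalties are illustrative assumptions.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len, tgt_len, ratio):
    """Penalise a bead whose target length departs from ratio * source length."""
    expected = src_len * ratio
    return abs(expected - tgt_len) / math.sqrt(expected + tgt_len + 1.0)


def align(src_lens, tgt_lens, ratio=1.0):
    """Return the minimum-cost bead sequence for two lists of sentence lengths.

    Each bead ((i, j), (k, l)) pairs source sentences [i, j) with
    target sentences [k, l).
    """
    n, m = len(src_lens), len(tgt_lens)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == math.inf:
                continue
            # Try extending the partial alignment with every bead shape.
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + penalty + length_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), ratio)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    return beads[::-1]


if __name__ == "__main__":
    # Hypothetical sentence lengths: the first two source sentences
    # together match the first target sentence.
    print(align([10, 12, 30], [22, 30]))
```

In practice the lengths would be measured in characters or words, and `ratio` would be estimated from the corpus to account for the systematic length difference between Chinese and English text.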
Source
《计算机仿真》
CSCD
Peking University Core Journal (北大核心)
2009, Issue 2, pp. 329-333 (5 pages)
Computer Simulation
Keywords
Bilingual corpus
Sentence alignment
Hybrid algorithm