Abstract
In building a computer-aided translation system based on a web corpus and bilingual web page search, a web crawler retrieves Chinese-English parallel web pages from the Internet. The pages are filtered to keep only the useful content, and the Chinese and English sentences are then matched and stored in a database. The sentence-matching algorithm performs bilingual sentence alignment, a standard step in translation processing: it matches the useful content produced by web page cleaning to generate the final bilingual corpus. The paper describes the sentence-matching algorithm and examines how it is implemented.
Source
《武汉理工大学学报(信息与管理工程版)》
CAS
2008, Issue 5, pp. 708-710 (3 pages)
Journal of Wuhan University of Technology: Information & Management Engineering
Keywords
text segmentation (sentence matching)
bilingual sentence pairs
best match