Abstract
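This paper introduces an Internet-based bilingual lexicon acquisition system and proposes an alignment method that uses both the structural information and the content information of texts. The implementation of the method does not depend on the characteristics of any particular language. Bilingual lexicon entries are extracted automatically from the alignment results, and the system's final lexicon is in turn used as a basis for identifying unknown words. The extracted vocabulary includes many new words, proper nouns and the various translations a word receives in different contexts, and can be applied to machine translation and multilingual information retrieval.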
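The following Python sketch shows one way an alignment can combine structural information (the markup tag of each segment) with content information (tokens shared verbatim, such as numbers and names, or linked by a known translation) without relying on language-specific rules. It is not the authors' algorithm. The `(tag, text)` segment representation, the `align` and `lexical_score` functions, the cost weights and the whitespace tokenization are all illustrative assumptions; Chinese text, for instance, would first need word segmentation.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical segment representation: (markup tag, text), e.g. ("h1", "Welcome").
Segment = Tuple[str, str]


def tokens(text: str) -> Set[str]:
    # Whitespace tokenization is an assumption; Chinese would need a segmenter.
    return {t.lower() for t in text.split()}


def lexical_score(src: str, tgt: str, lexicon: Dict[str, Set[str]]) -> float:
    """Share of source tokens that reappear in the target verbatim (numbers,
    names) or through a known translation from the current lexicon."""
    s, t = tokens(src), tokens(tgt)
    if not s or not t:
        return 0.0
    hits = sum(1 for w in s if w in t or lexicon.get(w, set()) & t)
    return hits / len(s)


def align(src: List[Segment], tgt: List[Segment],
          lexicon: Dict[str, Set[str]], skip_cost: float = 0.8) -> List[Tuple[int, int]]:
    """Monotonic dynamic-programming alignment. Pairing two segments costs
    1 if their tags differ (structural cue) plus 1 - lexical_score (content cue);
    leaving a segment unaligned costs skip_cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int, bool]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            c = cost[i][j]
            if c == inf:
                continue
            moves = []
            if i < n and j < m:
                tag_penalty = 0.0 if src[i][0] == tgt[j][0] else 1.0
                content = 1.0 - lexical_score(src[i][1], tgt[j][1], lexicon)
                moves.append((i + 1, j + 1, c + tag_penalty + content, True))
            if i < n:
                moves.append((i + 1, j, c + skip_cost, False))  # skip a source segment
            if j < m:
                moves.append((i, j + 1, c + skip_cost, False))  # skip a target segment
            for ni, nj, nc, matched in moves:
                if nc < cost[ni][nj]:
                    cost[ni][nj] = nc
                    back[ni][nj] = (i, j, matched)
    # Trace back from the end to recover the aligned index pairs.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj, matched = back[i][j]
        if matched:
            pairs.append((pi, pj))
        i, j = pi, pj
    return pairs[::-1]
```

For example, with the seed lexicon `{"news": {"nachrichten"}}`, aligning `[("h1", "News"), ("p", "ad banner"), ("p", "Shanghai 2001")]` against `[("h1", "Nachrichten"), ("p", "Shanghai 2001")]` pairs the headings and the two `Shanghai 2001` paragraphs and leaves the extra banner unaligned. Because neither cue looks at grammar or character sets, the same scoring applies to any language pair.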
This paper presents a system that extracts bilingual lexicons from the Internet. It uses a new alignment method based on both structural and lexical information. The bilingual lexicons are extracted from scratch, augmented incrementally and fed back as a lexical resource for alignment. The results include many new words, context-dependent translations and some proper names, which can be used in machine translation and cross-language information retrieval. The method has been applied to German, English and Chinese, but the implementation is independent of any particular markup language, natural language or domain.
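The incremental loop described above, in which lexicon entries are extracted from aligned segment pairs and then fed back into the next round of alignment, might look like the following minimal sketch. The Dice-coefficient association score, the thresholds `min_dice` and `min_count`, and the `align_fn` callback are assumptions chosen for illustration; the abstract does not specify how candidate translations are scored.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source segment text, target segment text), already aligned


def extract_lexicon(pairs: List[Pair], min_dice: float = 0.8,
                    min_count: int = 2) -> Dict[str, Set[str]]:
    """Propose word translations that co-occur in aligned pairs, scored with the
    Dice coefficient 2 * c(s, t) / (c(s) + c(t)) over pair-level counts."""
    src_freq: Counter = Counter()
    tgt_freq: Counter = Counter()
    joint: Counter = Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())  # whitespace tokens (assumption)
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))  # every co-occurring word pair
    lexicon: Dict[str, Set[str]] = {}
    for (s, t), c in joint.items():
        dice = 2 * c / (src_freq[s] + tgt_freq[t])
        if c >= min_count and dice >= min_dice:
            lexicon.setdefault(s, set()).add(t)
    return lexicon


def bootstrap(align_fn: Callable[[Dict[str, Set[str]]], List[Pair]],
              rounds: int = 3) -> Dict[str, Set[str]]:
    """Start from an empty lexicon, then repeatedly realign with the current
    lexicon (hypothetical align_fn) and merge in newly extracted entries."""
    lexicon: Dict[str, Set[str]] = {}
    for _ in range(rounds):
        pairs = align_fn(lexicon)
        for s, ts in extract_lexicon(pairs).items():
            lexicon.setdefault(s, set()).update(ts)
    return lexicon
```

Starting from an empty dictionary mirrors the "from scratch" setting, and passing the aligner in as a callback lets any alignment procedure that accepts a lexicon, for example one that rewards known translations when scoring candidate segment pairs, benefit from the entries found in earlier rounds.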
Source
《上海交通大学学报》
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2001, Issue 9, pp. 1386-1389, 1394 (5 pages)
Journal of Shanghai Jiaotong University
Funding
Supported by the National Natural Science Foundation of China (project 60083003)