Abstract
This paper discusses parallel corpus retrieval technology, using CUC_ParaConc, the parallel corpus retrieval software of Communication University of China, as an example. Working with aligned corpora stored as plain text, it describes how parallel data are stored and read, and how bilingual and multilingual keyword retrieval is performed. Parallel corpus retrieval takes two forms: "one-to-one" and "one-to-many". For one-to-one retrieval, Chinese-English parallel data are used to present retrieval techniques for a non-phonetic script (Chinese) and for a phonetic script (English), and the similarities and differences between the two are compared. For one-to-many retrieval, the focus is on multilingual keyword retrieval.
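The contrast between retrieval in a non-phonetic script and in a phonetic script can be illustrated with a minimal Python sketch. It is not CUC_ParaConc's implementation: the tab-separated storage format, the sample sentences, and the function names load_pairs, search_zh, and search_en are all assumptions made for illustration. The sketch assumes Chinese text is unsegmented, so a keyword is matched as a plain substring, while an English keyword is matched on word boundaries.

```python
import re

# Hypothetical aligned corpus: one sentence pair per line, source and
# target separated by a tab. This layout is an assumption, not the
# format documented for CUC_ParaConc.
SAMPLE = (
    "我喜欢语料库。\tI like corpora.\n"
    "他在检索平行语料。\tHe is searching a parallel corpus.\n"
)

def load_pairs(text):
    """Parse tab-separated lines into (chinese, english) tuples."""
    pairs = []
    for line in text.splitlines():
        if "\t" in line:
            zh, en = line.split("\t", 1)
            pairs.append((zh.strip(), en.strip()))
    return pairs

def search_zh(pairs, keyword):
    """Chinese side: unsegmented text, so use substring matching."""
    return [p for p in pairs if keyword in p[0]]

def search_en(pairs, keyword):
    """English side: match whole words only, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [p for p in pairs if pattern.search(p[1])]

if __name__ == "__main__":
    pairs = load_pairs(SAMPLE)
    for zh, en in search_zh(pairs, "语料"):
        print(zh, "|", en)
    for zh, en in search_en(pairs, "corpus"):
        print(zh, "|", en)
```

Under these assumptions, a Chinese query returns every pair whose Chinese side contains the keyword anywhere, whereas the English query for "corpus" does not match "corpora" or a partial string such as "corp". Each hit returns both sides of the aligned pair.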
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2012, Issue 31, pp. 134-139 (6 pages)
Keywords
parallel corpus
retrieval
bilingual
multilingual