Research of Bilingual Term Extraction Based on Comparable Corpora (基于可比语料库的双语术语抽取技术研究)

Abstract: This article surveys bilingual term extraction based on comparable corpora, an important branch of bilingual term extraction. Current methods rest on the "context similarity" hypothesis: if two words co-occur in the source language, their translations will also co-occur in the target language. Current techniques comprise two main tasks: constructing context-feature models for candidate words, and optimizing those models. The article proposes a preliminary evaluation standard for existing work, analyzes and summarizes both lines of research by level of method complexity, and points out the open problems. It closes with an outlook on the future of bilingual term extraction based on comparable corpora. According to the experimental results reported by the surveyed researchers, the technology can be used in machine-aided translation and in building bilingual dictionaries.
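The context-similarity idea described in the abstract can be illustrated with a minimal sketch. It is not the method of any particular surveyed paper: the toy English and German sentences, the one-entry seed dictionary, the window size, and all function names (`context_vector`, `translate_vector`, `cosine`, `rank_candidates`) are assumptions made for illustration. The sketch builds a co-occurrence context vector for a source word, projects it into the target language through the seed dictionary, and ranks target candidates by cosine similarity.

```python
from collections import Counter
from math import sqrt


def context_vector(sentences, word, window=2):
    """Count the tokens that appear within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def translate_vector(vec, seed_dict):
    """Project a source-language context vector into the target language.

    Context words without a seed-dictionary entry are dropped.
    """
    projected = Counter()
    for w, c in vec.items():
        if w in seed_dict:
            projected[seed_dict[w]] += c
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(src_sents, tgt_sents, src_word, candidates, seed_dict, window=2):
    """Rank target-language candidates by context similarity to `src_word`."""
    src_vec = translate_vector(context_vector(src_sents, src_word, window), seed_dict)
    scored = [
        (cand, cosine(src_vec, context_vector(tgt_sents, cand, window)))
        for cand in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy comparable corpora (hypothetical data): same topic, not sentence-aligned.
src_sents = [
    ["the", "doctor", "treats", "the", "patient", "in", "the", "hospital"],
    ["a", "nurse", "helps", "the", "doctor", "at", "the", "hospital"],
]
tgt_sents = [
    ["der", "arzt", "behandelt", "den", "patienten", "im", "krankenhaus"],
    ["der", "baecker", "verkauft", "brot", "auf", "dem", "markt"],
]
# Hypothetical seed dictionary of already-known translations.
seed_dict = {"treats": "behandelt"}

ranked = rank_candidates(src_sents, tgt_sents, "doctor", ["arzt", "baecker"], seed_dict)
print(ranked)
```

In this toy setting "arzt" ranks above "baecker" because it shares a translated context word with "doctor". Practical systems typically replace raw counts with association weights and use much larger seed dictionaries, which is where the model-optimization work mentioned in the abstract comes in.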

Authors: Yu Zhuo (俞卓), Huang Heyan (黄河燕)

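Affiliations: School of Computer Science and Technology, Nanjing University of Science and Technology; School of Computer Science, Beijing Institute of Technology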

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; CSSCI, Peking University Core Journal), 2011, No. 12, pp. 1286-1292 (7 pages)

Funding: Supported by the National High Technology Research and Development Program of China ("863" Program), project no. 2006AA010109.

Keywords: bilingual term extraction based on comparable corpora; bilingual corpora; comparable corpora; context features

Classification code: H319 [Language and Linguistics: English]
