Comparison of Approaches for Bilingual Lexicon Extraction from Comparable Corpora (cited by: 4)
Abstract: Bilingual lexicons are an important language resource. However, existing methods for extracting bilingual lexicons from comparable corpora differ widely in their architecture and in the underlying resources they depend on, which makes it difficult to compare the algorithms under uniform experimental conditions. As a result, most studies restrict their evaluation to a narrow setting, and the lack of unified evaluation results hinders both progress on the bilingual lexicon extraction task and the choice of algorithms. To address this, we selected and implemented four representative bilingual lexicon extraction methods and compared them on a unified test dataset. The comparison focuses on how strongly key factors, such as corpus size and training dictionary size, affect the performance of each algorithm. The conclusions offer theoretical and practical guidance for experimental design, performance comparison, and algorithm selection in future work on lexicon extraction from comparable corpora.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list; 2017, No. 7, pp. 1554-1561 (8 pages).
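Funding: National Natural Science Foundation of China (61300144); Research Project of the State Language Commission of China (YB125-132); Fundamental Research Funds for the Central Universities (CCNU15A05062, CCNU16A06015).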
Keywords: comparable corpus; bilingual lexicon extraction; context vector; word vector (distributed representation)
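To make the "context vector" keyword and the role of the training dictionary concrete, here is a minimal sketch of the classic seed-dictionary approach to lexicon extraction from comparable corpora. It is a simplified illustration, not necessarily any of the four methods evaluated in the paper, and it rests on stated assumptions: raw window co-occurrence counts instead of an association measure, a one-to-one seed dictionary, and cosine similarity for ranking. All function names and the toy corpora are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Count how often each word co-occurs with neighbours within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vector, seed_dict):
    """Project a source context vector into the target language.

    Assumption: one translation per seed entry. Context words missing from
    the seed dictionary are dropped, so a small dictionary leaves little
    context to compare.
    """
    translated = Counter()
    for word, count in vector.items():
        if word in seed_dict:
            translated[seed_dict[word]] += count
    return translated


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[w] for w, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_translations(src_word, src_vectors, tgt_vectors, seed_dict, k=3):
    """Return the k target words whose context vectors best match src_word's."""
    query = translate_vector(src_vectors.get(src_word, Counter()), seed_dict)
    scored = [(cosine(query, vec), tgt) for tgt, vec in tgt_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [tgt for _, tgt in scored[:k]]


# Toy "comparable" corpora (hypothetical data, tokenised in advance).
src = context_vectors([["我", "喝", "热", "咖啡"],
                       ["他", "喝", "热", "茶"],
                       ["我", "读", "书"]])
tgt = context_vectors([["i", "drink", "hot", "coffee"],
                       ["he", "drink", "hot", "tea"],
                       ["i", "read", "book"]])
seed = {"我": "i", "他": "he", "喝": "drink", "热": "hot", "读": "read"}

print(rank_translations("书", src, tgt, seed))    # ['book', 'read', 'i']
print(rank_translations("咖啡", src, tgt, seed))  # ['coffee', 'he', 'tea']
```

In this toy run 书 is correctly ranked closest to "book", but 咖啡 ties with "coffee", "he" and "tea" because all three share the same two-word context. Only more text or more seed-dictionary coverage would separate them, which is one intuition for why corpus size and training dictionary size are worth studying as factors. Practical systems commonly weight the counts with an association measure such as log-likelihood or PMI and restrict the candidate set; those refinements are omitted here.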

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部