
Automatic bilingual terminology extraction for Chinese-English parallel patents (cited by: 8)

Unsupervised bilingual terminology extraction algorithm for Chinese-English parallel patents
Abstract: Automatic bilingual terminology extraction is an important research topic in natural language processing (NLP), with applications in cross-language information retrieval, machine translation, and bilingual dictionary construction. This paper proposes an unsupervised bilingual terminology extraction algorithm for Chinese-English parallel patent corpora. The algorithm combines phrase alignment from a phrase-based statistical machine translation model with chunk analysis based on conditional random fields (CRF) to extract bilingual terms, and uses the domain topic information of the patent corpus to further improve extraction precision. In experiments on 5,867 pairs of Chinese-English parallel patent documents in the field of electrical communication technology, the algorithm reached a precision of 94.00%.
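The abstract's core idea, keeping only aligned phrase pairs whose two sides are both recognized as chunks, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `PhrasePair`, `extract_term_pairs`, `precision`, and the `min_prob` threshold are hypothetical, the phrase pairs and chunk sets are made-up toy data standing in for a phrase table and CRF chunker output, and the topic-based filtering step is omitted.

```python
from typing import NamedTuple


class PhrasePair(NamedTuple):
    zh: str      # Chinese side of an aligned phrase pair (space-segmented)
    en: str      # English side
    prob: float  # translation probability, as a phrase table might supply


def extract_term_pairs(phrase_pairs, zh_chunks, en_chunks, min_prob=0.5):
    """Keep aligned pairs whose two sides are both chunker-proposed chunks."""
    kept = [
        p for p in phrase_pairs
        if p.zh in zh_chunks and p.en in en_chunks and p.prob >= min_prob
    ]
    # Highest-probability candidates first.
    return sorted(kept, key=lambda p: p.prob, reverse=True)


def precision(extracted, gold):
    """Fraction of extracted (zh, en) pairs found in a hand-checked gold set."""
    if not extracted:
        return 0.0
    correct = sum((p.zh, p.en) in gold for p in extracted)
    return correct / len(extracted)


# Toy inputs (assumptions, not data from the paper).
phrase_pairs = [
    PhrasePair("基站", "base station", 0.91),
    PhrasePair("基站", "station", 0.30),
    PhrasePair("的 信号", "of the signal", 0.80),
    PhrasePair("无线 信道", "wireless channel", 0.85),
]
zh_chunks = {"基站", "无线 信道"}
en_chunks = {"base station", "wireless channel", "station"}
gold = {("基站", "base station"), ("无线 信道", "wireless channel")}

terms = extract_term_pairs(phrase_pairs, zh_chunks, en_chunks)
for t in terms:
    print(t.zh, "<->", t.en, t.prob)
print("precision:", precision(terms, gold))
```

In this toy run the pair "的 信号" / "of the signal" is dropped because neither side is a chunk, and "基站" / "station" is dropped by the probability threshold. This shows how chunk boundaries and alignment scores can each filter out non-term phrase pairs.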
Source: Journal of Tsinghua University (Science and Technology), 2014, No. 10, pp. 1339-1343 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
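Funding: Key Project of the National Key Technology R&D Program of China (2009BAH41B04); Major Project of Philosophy and Social Sciences Research, Ministry of Education (10JZD0043)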
Keywords: phrase alignment; conditional random fields (CRF); chunk analysis; bilingual terminology

