An Approach to Extract Named Entity Translingual Equivalence
Cited by: 10
Abstract: Translingual equivalents of named entities are important for multilingual natural language processing. Approaches to named entity translation explored in past years include bilingual dictionary lookup, word/sub-word translation, and transliteration. Another promising approach is to extract named entity translingual equivalents automatically from a parallel corpus, but existing methods require the named entities to be annotated, manually or automatically, in both languages of the corpus. This paper proposes an approach that requires named entity annotation only on the source-language side: the target language needs no annotation, and the equivalents are extracted using the word alignments produced by a trained HMM alignment model. The experiments use a Chinese-English parallel corpus, with Chinese as the source language and English as the target language. The results show that the approach extracts named entity translation pairs with relatively high precision, even when the word alignment is only partially correct.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2008, No. 4, pp. 55-60 (6 pages)
Funding: National 863 Program of China (2006AA01Z143, 2006AA01Z139); National Natural Science Foundation of China (60673043); Natural Science Foundation of Jiangsu Province (BK2006117)
Keywords: artificial intelligence; machine translation; named entity; translingual equivalence; HMM; alignment model
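
The idea in the abstract, annotating named entities only on the source side and carrying them to the target side through word alignments, can be illustrated with a minimal sketch. The abstract does not give the paper's actual extraction rule, so the code below assumes a simple heuristic instead: each source entity is mapped to the smallest contiguous target span that covers all of its aligned words. The function name `project_entities`, the data layout, and the example sentence are hypothetical, and the alignment links are written by hand rather than produced by a trained HMM.

```python
from typing import List, Set, Tuple


def project_entities(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: Set[Tuple[int, int]],
    src_entities: List[Tuple[int, int]],
) -> List[Tuple[str, str]]:
    """Project source-side entity spans onto the target sentence.

    alignment holds (source_index, target_index) links.
    src_entities holds half-open spans [start, end) over src_tokens.
    Assumed heuristic (not taken from the paper): take the smallest
    contiguous target span covering every linked target word.
    """
    pairs = []
    for start, end in src_entities:
        # Target positions linked to any token inside the entity span.
        linked = sorted(j for i, j in alignment if start <= i < end)
        if not linked:
            continue  # no alignment evidence for this entity, skip it
        lo, hi = linked[0], linked[-1]
        src_text = "".join(src_tokens[start:end])     # Chinese: no spaces
        tgt_text = " ".join(tgt_tokens[lo:hi + 1])    # English: spaces
        pairs.append((src_text, tgt_text))
    return pairs


# Hand-written toy example; a real system would take the links from an
# HMM word aligner and the spans from a Chinese named entity tagger.
src = ["北京", "大学", "位于", "中国"]
tgt = ["Peking", "University", "is", "located", "in", "China"]
links = {(0, 0), (1, 1), (2, 3), (3, 5)}
entities = [(0, 2), (3, 4)]

pairs = project_entities(src, tgt, links, entities)
print(pairs)  # [('北京大学', 'Peking University'), ('中国', 'China')]
```

Because the span runs from the first to the last linked target word, an unaligned word in the middle of an entity is still included, which is one way a partially correct alignment can still yield a correct pair. A spurious link far from the entity would stretch the span, so a practical system would need additional filtering.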