

Abstract: The translation of named entities and terminology has an increasingly recognized effect on the performance of natural language processing and machine translation systems, yet existing translation dictionaries rarely cover such translations adequately. This paper proposes a method for automatically acquiring high-quality translation pairs of named-entity phrases from web pages and, for the first time, explores how to automatically fill in the parts of a bilingual text that are missing from the alignment. The method exploits the characteristics of bilingual translation pairs in web pages and uses a statistical discriminative model that combines multiple recognition features to automatically mine bilingual phrase translation triples from websites. Experimental results show that the model handles bilingual translation pairs of named entities effectively, with an accuracy of 95.6%.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2011, No. 4, pp. 33-36 (4 pages)
Funding: National Natural Science Foundation of China (Grant No. 60673041)
Keywords: natural language processing (NLP); Chinese information processing; bilingual translation pairs; named entity phrases; complement of bilingual alignment
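The abstract describes a statistical discriminative model that combines several recognition features to decide whether a candidate phrase pair found on a web page is a genuine translation. The abstract does not name the features or the model form, so the following is only a minimal sketch under stated assumptions: a logistic (log-linear) scorer with hand-set weights, and three illustrative features (`in_parentheses`, `length_ratio_ok`, `capitalised`). All names, weights, and thresholds here are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str           # Chinese phrase found on the page
    target: str           # English phrase proposed as its translation
    in_parentheses: bool  # assumed cue: target appears in brackets right after source


# Hypothetical weights; a real system would learn them from labelled pairs.
WEIGHTS = {
    "bias": -2.0,
    "in_parentheses": 2.0,
    "length_ratio_ok": 1.5,
    "capitalised": 0.5,
}


def features(c: Candidate) -> dict:
    """Map a candidate pair to a small set of illustrative binary features."""
    words = c.target.split()
    # English words per Chinese character; the bounds below are assumptions.
    ratio = len(words) / max(len(c.source), 1)
    return {
        "bias": 1.0,
        "in_parentheses": 1.0 if c.in_parentheses else 0.0,
        "length_ratio_ok": 1.0 if 0.25 <= ratio <= 1.5 else 0.0,
        "capitalised": 1.0 if words and all(w[0].isupper() for w in words) else 0.0,
    }


def score(c: Candidate) -> float:
    """Logistic score: probability-like confidence that the pair is a translation."""
    z = sum(WEIGHTS[name] * value for name, value in features(c).items())
    return 1.0 / (1.0 + math.exp(-z))


def mine(candidates, threshold: float = 0.5):
    """Keep the candidate pairs whose score reaches the threshold."""
    return [(c.source, c.target) for c in candidates if score(c) >= threshold]


if __name__ == "__main__":
    pairs = [
        Candidate("北京大学", "Peking University", True),
        Candidate("北京大学", "click here to read more", False),
    ]
    for c in pairs:
        print(c.target, round(score(c), 3))
    print(mine(pairs))
```

Combining the features in one weighted sum means that no single cue decides the outcome: a candidate lacking the bracket cue can still be accepted if enough other features fire, which is the usual motivation for a discriminative model over hand-written rules.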








