Generating Chinese named entity data from parallel corpora (times cited: 1)

Abstract: Annotating named entity recognition (NER) training corpora is a costly but necessary step for supervised NER approaches. This paper presents a general framework for generating large-scale NER training data from parallel corpora. In our method, we first run a high-performance NER system on one side of a bilingual corpus. We then project the named entity (NE) labels to the other side according to word-level alignments. Finally, we propose several strategies for selecting high-quality automatically labeled NER training data. We apply our approach to Chinese NER using an English-Chinese parallel corpus. Experimental results show that our approach can collect high-quality labeled data and can help improve Chinese NER.
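The projection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the BIO tag scheme, the hypothetical function names `bio_to_spans`, `project_labels`, and `select_pairs`, the "leftmost to rightmost aligned target word" span heuristic, and the "discard any pair with an unaligned or colliding entity" selection rule are all assumptions made for illustration, since the paper's own selection strategies are not described here. Word alignments are assumed to arrive as (source index, target index) pairs from an external word aligner.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# (start, end_exclusive, entity_type) over token indices
Span = Tuple[int, int, str]


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Convert BIO tags such as B-PER / I-PER / O into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        begins_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype
        )
        if tag == "O" or begins_new:
            # Close the entity that is currently open, if any.
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
        if begins_new:
            # A stray I- tag with a new type is treated as a new entity.
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def project_labels(
    src_tags: Sequence[str],
    tgt_len: int,
    alignment: Sequence[Tuple[int, int]],
) -> Tuple[List[str], bool]:
    """Project source-side BIO tags onto the target side (assumed heuristic).

    Returns the target tags and a flag that is False when some source
    entity had no aligned target word or collided with another entity.
    """
    src_to_tgt: Dict[int, Set[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, set()).add(t)

    tgt_tags = ["O"] * tgt_len
    clean = True
    for start, end, etype in bio_to_spans(src_tags):
        tgt_idx: Set[int] = set()
        for s in range(start, end):
            tgt_idx |= src_to_tgt.get(s, set())
        if not tgt_idx:
            clean = False  # entity has no aligned target word
            continue
        lo, hi = min(tgt_idx), max(tgt_idx)
        if any(tag != "O" for tag in tgt_tags[lo:hi + 1]):
            clean = False  # overlaps an entity projected earlier
            continue
        tgt_tags[lo] = "B-" + etype
        for k in range(lo + 1, hi + 1):
            tgt_tags[k] = "I-" + etype
    return tgt_tags, clean


def select_pairs(pairs):
    """Hypothetical selection rule: keep clean projections with >= 1 entity.

    Each pair is (src_tags, tgt_tokens, alignment); returns a list of
    (tgt_tokens, tgt_tags) usable as auto-labeled target-side training data.
    """
    selected = []
    for src_tags, tgt_tokens, alignment in pairs:
        tgt_tags, clean = project_labels(src_tags, len(tgt_tokens), alignment)
        if clean and any(t != "O" for t in tgt_tags):
            selected.append((tgt_tokens, tgt_tags))
    return selected


if __name__ == "__main__":
    # Toy example: "Obama visited Beijing" / 奥巴马 访问 了 北京
    src_tags = ["B-PER", "O", "B-LOC"]
    tgt_tokens = ["奥巴马", "访问", "了", "北京"]
    alignment = [(0, 0), (1, 1), (1, 2), (2, 3)]
    print(project_labels(src_tags, len(tgt_tokens), alignment))
    # prints (['B-PER', 'O', 'O', 'B-LOC'], True)
```

Taking the span from the leftmost to the rightmost aligned target word tolerates gaps in the alignment, at the cost of occasionally absorbing unaligned words inside the span; a stricter filter could instead reject non-contiguous projections.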
Source: Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2014, issue 4, pp. 629-641 (13 pages). Chinese title: 中国计算机科学前沿 (English edition).
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61133012, 61273321) and the National 863 Leading Technology Research Project (2012AA011102). Special thanks to Wanxiang Che, Yanyan Zhao, Wei He, Fikadu Gemechu, Yuhang Guo, Zhenghua Li, Meishan Zhang and the anonymous reviewers for insightful comments and suggestions.
Keywords: named entity recognition, Chinese named entity, training data generation, parallel corpora