
A Khmer named entity recognition method incorporating cross-lingual features (cited by 5)
Abstract: Khmer has few lexically annotated corpora, and Khmer named entities lack obvious distinguishing features. To address both problems, this paper proposes a Khmer named entity recognition method that introduces English-Khmer cross-lingual features. First, the named entity categories of source-language (English) words are projected onto the target language (Khmer). This projection uses a mature English named entity recognition model and the word alignments of an English-Khmer parallel corpus. Then, a nearest-neighbor graph is built from Khmer word embeddings. A label propagation algorithm is run over this graph to obtain the named entity category distribution of each Khmer word, which completes the cross-lingual knowledge transfer. Finally, the named entity category distributions of Khmer words are incorporated into a conditional random field (CRF) model as constraint features. Experimental results show that the CRF model incorporating cross-lingual features effectively improves Khmer named entity recognition.
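The following is a plain-Python sketch of one plausible reading of the three-step pipeline in the abstract. It is not the authors' implementation, and it rests on several assumptions:

- The English NER tags and the word alignments are given as input, because the paper's English NER model and aligner are not named here.
- The tag set `PER`/`LOC`/`ORG`/`O` is assumed.
- The nearest-neighbor graph uses cosine similarity with a hypothetical neighborhood size `k`.
- Label propagation uses the common variant that clamps seed words to their projected distributions.
- The function names and the `xl_*` feature names are hypothetical. The paper's CRF constraint features may be defined differently.

```python
import math
from collections import Counter, defaultdict

# Assumed tag set; the paper's actual label inventory may differ.
LABELS = ["PER", "LOC", "ORG", "O"]


def project_labels(corpus):
    """Step 1: project English NER tags onto Khmer word types via word alignments.

    corpus: iterable of (en_tags, km_tokens, alignments) triples, where en_tags
    are assumed to come from an existing English NER model and alignments are
    (english_index, khmer_index) pairs produced by some word aligner.
    Returns {khmer_word: distribution over LABELS} for every aligned word.
    """
    counts = defaultdict(Counter)
    for en_tags, km_tokens, alignments in corpus:
        for en_i, km_j in alignments:
            counts[km_tokens[km_j]][en_tags[en_i]] += 1
    seeds = {}
    for word, c in counts.items():
        total = sum(c[label] for label in LABELS)
        if total:
            seeds[word] = [c[label] / total for label in LABELS]
    return seeds


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(embeddings, k):
    """Step 2a: symmetric k-nearest-neighbour graph over Khmer word embeddings,
    with edges weighted by positive cosine similarity."""
    graph = defaultdict(dict)
    for w in embeddings:
        sims = sorted(
            ((cosine(embeddings[w], embeddings[v]), v) for v in embeddings if v != w),
            reverse=True,
        )
        for sim, v in sims[:k]:
            if sim > 0:
                graph[w][v] = sim
                graph[v][w] = sim
    return graph


def label_propagation(graph, seeds, words, iterations=50):
    """Step 2b: iterative label propagation. Seed words stay clamped to their
    projected distributions; every other word takes the similarity-weighted
    average of its neighbours' distributions."""
    uniform = [1.0 / len(LABELS)] * len(LABELS)
    dist = {w: list(seeds.get(w, uniform)) for w in words}
    for _ in range(iterations):
        new = {}
        for w in words:
            neighbours = graph.get(w, {})
            total = sum(neighbours.values())
            if w in seeds or total == 0:
                new[w] = dist[w]  # clamped seed word, or isolated word
                continue
            new[w] = [
                sum(weight * dist[v][i] for v, weight in neighbours.items()) / total
                for i in range(len(LABELS))
            ]
        dist = new
    return dist


def crf_features(word, dist):
    """Step 3 (hypothetical format): expose the propagated distribution as
    real-valued features alongside an ordinary lexical feature."""
    features = {"word": word}
    for label, p in zip(LABELS, dist.get(word, [])):
        features["xl_" + label] = p
    return features


# Toy example: romanized placeholder tokens, not real Khmer text.
corpus = [(
    ["PER", "O", "O", "LOC", "LOC"],  # English: "Sok went to Phnom Penh"
    ["sok", "tov", "phnum_penh"],
    [(0, 0), (1, 1), (3, 2), (4, 2)],
)]
embeddings = {
    "phnum_penh": [0.9, 0.1], "siem_reap": [0.85, 0.15],
    "sok": [0.1, 0.9], "tov": [0.0, 1.0],
}
seeds = project_labels(corpus)
dist = label_propagation(knn_graph(embeddings, k=1), seeds, list(embeddings))
print(crf_features("siem_reap", dist))
# → {'word': 'siem_reap', 'xl_PER': 0.0, 'xl_LOC': 1.0, 'xl_ORG': 0.0, 'xl_O': 0.0}
```

In the toy run, `siem_reap` never appears in the aligned corpus. It still inherits a `LOC` distribution from its nearest neighbor `phnum_penh`, which is the kind of cross-lingual knowledge transfer the abstract describes. Clamping keeps the projected seed labels fixed. A softer variant would let alignment noise in the seeds be corrected by their neighbors.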
Authors: XU Guang-yi (徐广义), YAN Xin (严馨), YU Zheng-tao (余正涛), ZHOU Li-hua (周丽华) (Yunnan Nantian Electronics Information Co., Ltd., Kunming 650041, China; School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; School of Information Science and Engineering, Yunnan University, Kunming 650500, China)
Source: Journal of Yunnan University (Natural Sciences Edition) (《云南大学学报(自然科学版)》), 2018, No. 5, pp. 865-871 (7 pages); indexed in CAS, CSCD, and the Peking University core journal list
Funding: National Natural Science Foundation of China (61462055, 61562049, 61363044); Yunnan Provincial High-Tech Industry Development Project Program (201606)
Keywords: English-Khmer bilingual; Khmer named entity recognition; cross-lingual projection; label propagation; word embeddings