Recognition of Maximal-Length Noun Phrase Based on Bilingual Co-Training (基于双语协同训练的最大名词短语识别研究) Cited by: 5
Abstract Traditional methods recognize bilingual maximal-length noun phrases with poor consistency across the two languages and generalize weakly across domains. To address both problems, a bilingual maximal-length noun phrase recognition algorithm based on semi-supervised learning is proposed. Chinese and English maximal-length noun phrases are translations of each other, and recognition in one language complements recognition in the other. The algorithm exploits this by treating the parallel Chinese and English sentences as two views of a single dataset and running bilingual co-training over them. During co-training, the agreement rate of the aligned bilingual labels serves as the confidence estimate for choosing which newly labeled instances to add: the instances with the highest confidence are selected, merged, and added to the labeled data set to retrain the classifier. Experimental results show that the algorithm significantly improves bilingual maximal-length noun phrase recognition. Compared with the best existing maximal-length noun phrase recognition model, the F-measure improves by 4.52% in cross-domain tests and by 3.08% in same-domain tests.
Source Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 7, pp. 1615-1625 (11 pages)
Funding National Basic Research Program of China (973 Program) (2013CB329300); National Natural Science Foundation of China (61132009, 61201352, 61202244)
Keywords maximal-length noun phrase; semi-supervised learning; label projection; bilingual co-training; phrase identification
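
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the classifier, the alignment model, or the exact agreement formula, so everything below is an assumption made for illustration. In particular, the names project, agreement_rate, co_train, train_zh, train_en, top_k, and min_score are hypothetical, and the agreement rate is approximated here as the Jaccard overlap between the Chinese spans projected through a word alignment and the English spans.

```python
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int]        # [start, end) token offsets of one noun phrase
Alignment = Dict[int, int]    # Chinese token index -> English token index
Tagger = Callable[[List[str]], Set[Span]]
Trainer = Callable[[list], Tagger]   # hypothetical: labeled data -> tagger


def project(spans: Set[Span], alignment: Alignment) -> Set[Span]:
    """Project Chinese spans onto the English side through the word alignment."""
    projected = set()
    for start, end in spans:
        targets = [alignment[i] for i in range(start, end) if i in alignment]
        if targets:
            projected.add((min(targets), max(targets) + 1))
    return projected


def agreement_rate(zh_spans: Set[Span], en_spans: Set[Span],
                   alignment: Alignment) -> float:
    """Assumed confidence: Jaccard overlap of projected Chinese and English spans."""
    projected = project(zh_spans, alignment)
    union = projected | en_spans
    # With no spans on either side there is nothing to agree on, so score 0.
    return len(projected & en_spans) / len(union) if union else 0.0


def co_train(labeled: list, unlabeled: list, train_zh: Trainer, train_en: Trainer,
             rounds: int = 3, top_k: int = 1, min_score: float = 0.5) -> list:
    """Grow the labeled set with the sentence pairs whose two views agree most."""
    labeled = list(labeled)
    pool = list(unlabeled)   # items: (zh_tokens, en_tokens, alignment)
    for _ in range(rounds):
        if not pool:
            break
        # Retrain one tagger per view on the current labeled set.
        zh_tagger, en_tagger = train_zh(labeled), train_en(labeled)
        scored = []
        for idx, (zh, en, align) in enumerate(pool):
            zh_spans, en_spans = zh_tagger(zh), en_tagger(en)
            score = agreement_rate(zh_spans, en_spans, align)
            scored.append((score, idx, zh_spans, en_spans))
        scored.sort(key=lambda item: item[0], reverse=True)
        picked = [item for item in scored[:top_k] if item[0] >= min_score]
        if not picked:
            break            # no pair is confident enough; stop early
        for _, idx, zh_spans, en_spans in picked:
            zh, en, align = pool[idx]
            labeled.append((zh, en, align, zh_spans, en_spans))
        taken = {item[1] for item in picked}
        pool = [ex for i, ex in enumerate(pool) if i not in taken]
    return labeled
```

Under these assumptions, train_zh and train_en can be any functions that take the current labeled set and return a tagger for one language. Each round moves at most top_k sentence pairs from the unlabeled pool into the labeled set, and only when their agreement score reaches min_score.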
