
Tibetan Syntactic Functional Chunk Boundary Recognition Based on an Error-Driven Learning Strategy (cited by: 7)

Tibetan Chunking Based on Error-Driven Learning Strategy
Abstract: Tibetan syntactic functional chunking aims to identify the syntactic constituents of Tibetan sentences, providing support for deeper sentence-level analysis. Drawing on the characteristics of Tibetan and on a description system for Tibetan syntactic functional chunks, this paper proposes an error-driven learning strategy for recognizing chunk boundaries. Chunks are first recognized with Conditional Random Fields (CRFs). The initial results are then re-recognized and corrected in a second pass by two error-driven methods: Transformation-based Error-driven Learning (TBL), and CRFs-based error-driven learning with new feature templates. These improve the F value by 1.65% and 8.36%, respectively. Finally, guided by experimental analysis, the two error-driven mechanisms are combined. In experiments on a Tibetan corpus of 18,073 words, the combination improves performance further, with precision, recall, and F value reaching 94.1%, 94.76%, and 94.43%, respectively, demonstrating the effectiveness of the proposed method.
Source: Journal of Chinese Information Processing (《中文信息学报》; CSCD, Peking University Core Journal), 2014, No. 5, pp. 170-175, 191 (7 pages).
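Funding: National Natural Science Foundation of China (61201352, 61132009); National Basic Research Program of China (973 Program) (2013CB329303); Beijing Institute of Technology Basic Research Fund (20130742010).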
Keywords: error-driven learning; Tibetan syntactic functional chunk; chunk boundary recognition; CRFs; TBL
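The abstract describes a second pass in which TBL corrects the chunk tags produced by a CRF tagger, with results reported as precision, recall, and F value. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the CRF output is already available as a BIO tag sequence, and it uses a hypothetical rule template (change `from_tag` to `to_tag` when the token's POS tag and the previous chunk tag match). The POS labels, the chunk types `NP`/`VP`, and all function names are illustrative placeholders, not the paper's Tibetan functional chunk tag set or feature templates. The CRF stage, the CRFs-based error-driven pass, and the fusion step are not shown. For brevity the training data is a single token sequence.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical rule template: (from_tag, to_tag, pos_of_token, previous_tag).
Rule = Tuple[str, str, str, str]


def _context(tags: Sequence[str], pos: Sequence[str], i: int) -> Tuple[str, str]:
    """Features a rule may test: the token's POS tag and the preceding chunk tag."""
    return pos[i], (tags[i - 1] if i > 0 else "<S>")


def apply_rule(rule: Rule, tags: Sequence[str], pos: Sequence[str]) -> List[str]:
    """Apply one rule at every matching position. Conditions are read from the
    input tags, so all rewrites happen simultaneously (non-incremental)."""
    src, dst, p, prev = rule
    return [dst if t == src and _context(tags, pos, i) == (p, prev) else t
            for i, t in enumerate(tags)]


def _gain(rule: Rule, tags: Sequence[str], pos: Sequence[str],
          gold: Sequence[str]) -> int:
    """Error-driven score of a rule: tags it fixes minus tags it breaks."""
    new = apply_rule(rule, tags, pos)
    fixed = sum(t != g and n == g for t, n, g in zip(tags, new, gold))
    broken = sum(t == g and n != g for t, n, g in zip(tags, new, gold))
    return fixed - broken


def learn_rules(pred: Sequence[str], pos: Sequence[str], gold: Sequence[str],
                max_rules: int = 20, min_gain: int = 1) -> List[Rule]:
    """Greedy TBL: repeatedly keep the rule with the best gain, then apply it."""
    tags, rules = list(pred), []
    for _ in range(max_rules):
        # Candidate rules are instantiated only at positions that are still wrong.
        candidates = sorted({(t, g) + _context(tags, pos, i)
                             for i, (t, g) in enumerate(zip(tags, gold)) if t != g})
        if not candidates:
            break
        best = max(candidates, key=lambda r: _gain(r, tags, pos, gold))
        if _gain(best, tags, pos, gold) < min_gain:
            break
        rules.append(best)
        tags = apply_rule(best, tags, pos)
    return rules


def correct(pred: Sequence[str], pos: Sequence[str], rules: List[Rule]) -> List[str]:
    """Second pass: apply the learned rules, in order, to a base tagger's output."""
    tags = list(pred)
    for rule in rules:
        tags = apply_rule(rule, tags, pos)
    return tags


def chunks(tags: Sequence[str]) -> Set[Tuple[int, int, str]]:
    """Convert BIO tags to (start, end, type) spans; a stray I- opens a new chunk."""
    spans: Set[Tuple[int, int, str]] = set()
    start: Optional[int] = None
    label = ""
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
            spans.add((start, i, label))
            start = None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def f_measure(p: float, r: float) -> float:
    """Balanced F value: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F value (exact span and type match)."""
    g, p = chunks(gold), chunks(pred)
    hits = len(g & p)
    precision = hits / len(p) if p else 0.0
    recall = hits / len(g) if g else 0.0
    return precision, recall, f_measure(precision, recall)
```

Applying rewrites simultaneously rather than left to right is one design choice among several TBL variants. As a sanity check on the reported figures, `f_measure(94.1, 94.76)` evaluates to about 94.43, which is consistent with the abstract's F value being the balanced harmonic mean of its precision and recall.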