Abstract
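Tibetan syntactic functional chunk analysis aims to identify the syntactic constituents of Tibetan sentences, supporting subsequent in-depth sentence-level analysis. Based on the linguistic characteristics of Tibetan, and building on a description system for Tibetan syntactic functional chunks, this paper proposes a method for identifying Tibetan functional chunk boundaries based on an error-driven learning strategy. Specifically, chunks are first identified with Conditional Random Fields (CRFs). A second identification pass is then performed with Transformation-based Error-driven Learning (TBL) and with CRFs error-driven learning based on new feature templates, each correcting the initial results; the F value rises by 1.65% and 8.36%, respectively. Finally, guided by experimental analysis, the two error-driven learning mechanisms are combined. In experiments on a Tibetan corpus of 18,073 words, recognition performance improves further, with precision, recall, and F value reaching 94.1%, 94.76%, and 94.43%, respectively, which validates the effectiveness of the proposed method.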
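The error-driven correction idea summarized above can be illustrated with a minimal sketch of greedy transformation-based learning. This is not the paper's implementation: the B/I/O boundary labels, the single rule template (previous tag, current tag, new tag), the toy sequences, and the function names `apply_rule`, `count_errors`, and `learn_tbl` are all assumptions made for illustration, and the `initial` list merely stands in for the output of a first-pass CRFs tagger.

```python
def apply_rule(tags, rule):
    """Apply one rule (prev_tag, cur_tag, new_tag); conditions are checked on the input tags."""
    prev_tag, cur_tag, new_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i - 1] == prev_tag and tags[i] == cur_tag:
            out[i] = new_tag
    return out


def count_errors(tags, gold):
    """Number of positions where the predicted tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_tbl(initial, gold, max_rules=10):
    """Greedy TBL: repeatedly keep the rule with the largest net error reduction."""
    current = list(initial)
    rules = []
    for _ in range(max_rules):
        # Propose candidate rules only at positions that are currently wrong.
        candidates = {
            (current[i - 1], current[i], gold[i])
            for i in range(1, len(current))
            if current[i] != gold[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            gain = count_errors(current, gold) - count_errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule reduces the error count any further
            break
        rules.append(best_rule)
        current = apply_rule(current, best_rule)
    return rules, current


# Toy data (hypothetical): gold boundary labels and a first-pass output with two errors.
gold = ["B", "I", "I", "O", "B", "I", "I", "O"]
initial = ["B", "I", "B", "O", "B", "I", "B", "O"]

rules, corrected = learn_tbl(initial, gold)
print(rules)
print(count_errors(initial, gold), count_errors(corrected, gold))
```

In this toy case both first-pass errors share the same context, so a single learned rule is enough to correct them. Real TBL systems use richer rule templates (words, part-of-speech tags, wider context windows) than the one assumed here.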
Tibetan chunking aims to identify the syntactic constituents of Tibetan sentences in order to facilitate further sentence-level analysis. Based on the characteristics of Tibetan and on a description system for Tibetan syntactic functional chunks, this paper proposes an error-driven learning strategy for identifying chunk boundaries. Chunk boundaries are first recognized with a Conditional Random Fields (CRFs) model. The recognition result is then refined with a Transformation-based Error-driven Learning (TBL) method and with a CRFs error-driven method, which increase the F value by 1.65% and 8.36%, respectively. Finally, the two error-driven techniques are combined. In an experiment on a Tibetan corpus of 18,073 words, the precision, recall, and F value reach 94.1%, 94.76%, and 94.43%, respectively.
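As a quick consistency check on the reported figures, the sketch below recomputes the F value from the stated precision and recall. It assumes the F value is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F value: harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) reported for the combined method.
f_combined = f_measure(94.1, 94.76)
print(round(f_combined, 2))
```

Under that assumption the recomputed value agrees with the reported 94.43% after rounding to two decimals.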
Source
《中文信息学报》
CSCD
Peking University Core Journals
2014, No. 5, pp. 170-175, 191 (7 pages in total)
Journal of Chinese Information Processing
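Funding
National Natural Science Foundation of China (61201352, 61132009)
National Basic Research Program of China (973 Program) (2013CB329303)
Beijing Institute of Technology Basic Research Fund (20130742010)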