Abstract
This paper presents a fast learning algorithm that addresses the excessively long rule-acquisition time of Brill's transformation-based learning method. In each learning iteration, the algorithm updates only the small subset of transformations that are affected, rather than examining all existing transformations, which greatly reduces learning time. Using this algorithm with the English SUSANNE corpus as training text, 300 English part-of-speech tagging rules drawn from six kinds of templates were obtained, together with probabilistic knowledge for handling unknown words, and an English part-of-speech tagging system was implemented. The system's tagging accuracy reaches 98.2% on the closed test and 96.6% on the open test.
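The incremental idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single rule template ("change tag X to Y when the previous tag is P"), delayed rule application, and a corpus treated as one flat sequence. The names `IncrementalTBL`, `train`, and `START` are hypothetical. After a rule is applied, only the changed positions and their right-hand neighbours have their score contributions removed and re-added, so the corpus is not rescored from scratch. Selecting the best rule still scans the candidate table here; a priority queue could replace that scan.

```python
from collections import Counter, defaultdict

START = "<s>"  # assumed sentinel context for the first token


def _prev(tags, i):
    return tags[i - 1] if i > 0 else START


class IncrementalTBL:
    """Toy transformation-based learner with incremental score updates."""

    def __init__(self, initial, gold):
        self.tags = list(initial)
        self.gold = list(gold)
        self.good = Counter()           # (from, to, prev) -> errors the rule would fix
        self.bad = Counter()            # (from, prev) -> correct tags the rule would break
        self.sites = defaultdict(set)   # (from, prev) -> positions where a rule fires
        for i in range(len(self.tags)):
            self._count(i, +1)

    def _count(self, i, sign):
        """Add (sign=+1) or remove (sign=-1) position i's contribution."""
        cur, prev, gold = self.tags[i], _prev(self.tags, i), self.gold[i]
        if sign > 0:
            self.sites[(cur, prev)].add(i)
        else:
            self.sites[(cur, prev)].discard(i)
        if cur != gold:
            self.good[(cur, gold, prev)] += sign
        else:
            self.bad[(cur, prev)] += sign

    def score(self, rule):
        frm, _to, prev = rule
        return self.good[rule] - self.bad[(frm, prev)]

    def best_rule(self):
        candidates = [r for r, n in self.good.items() if n > 0]
        if not candidates:
            return None, 0
        # Ties are broken by the rule tuple so the result is deterministic.
        best = max(candidates, key=lambda r: (self.score(r), r))
        return best, self.score(best)

    def apply(self, rule):
        frm, to, prev = rule
        changed = sorted(self.sites[(frm, prev)])
        # Only changed positions and their right neighbours see a new context.
        affected = set(changed) | {i + 1 for i in changed if i + 1 < len(self.tags)}
        for i in affected:
            self._count(i, -1)          # retract contributions under the old tags
        for i in changed:
            self.tags[i] = to
        for i in affected:
            self._count(i, +1)          # re-add contributions under the new tags


def train(initial, gold, max_rules=10, min_score=1):
    learner = IncrementalTBL(initial, gold)
    rules = []
    while len(rules) < max_rules:
        rule, score = learner.best_rule()
        if rule is None or score < min_score:
            break
        learner.apply(rule)
        rules.append(rule)
    return rules, learner.tags
```

Calling `train(initial_tags, gold_tags)` returns the learned rule list and the final tag sequence. A rule's score is the number of errors it fixes minus the number of correct tags it would break, so a rule that does more harm than good is never selected under the default `min_score=1`.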
Source
《计算机学报》
EI
CSCD
Peking University Core Journals list
1998, No. 4, pp. 357-366 (10 pages)
Chinese Journal of Computers
Funding
National Natural Science Foundation of China (Grant No. 69672027)
Aerospace Pre-Research Fund
Keywords
learning algorithm
part-of-speech tagging
natural language processing
transformation algorithm
transformation-based learning