摘要
各种词性标注方法总是利用从某一侧面描述的语言学知识,当训练语料达到一定规模、训练模型完善到一定程度后,标注精度很难再有进一步的提高。本文在对TBED、DT、HMM和ME四种基于语料库的词性标注方法研究的基础上,提出了一种新的词性标注融合策略——相关投票法。从理论上分析了该方法的优越性,并与其他融合策略进行了对比实验。实验结果表明,应用融合策略可以更加全面地描述词性标注知识,从而更好地完成词性标注任务;在几种融合策略中,相关投票法是最优秀的,它使标注的平均错误率降低27.85%。
Part-of-speech (POS) tagging approaches always utilizes linguistic knowledge described from one perspective. Based on the research of four kinds of POS tagging methods, such as, TBED, DT, HMM and ME, we propose a novel data fusion strategy for POS tagging--- correlation voting method. The result of experiment shows that linguistic knowledge of POS tagging can be more roundly described by applying data fusion, and the correlative voting is better than other fusion methods for an average decrease of 27.85% in tagging error rate.
出处
《中文信息学报》
CSCD
北大核心
2007年第2期9-13,共5页
Journal of Chinese Information Processing
基金
国家自然科学基金资助项目(60372038)
关键词
人工智能
自然语言处理
词性标注
融合策略
相关投票
artificial intelligence
natural language processing
part of speech tagging
fusion strategy
correlationvoting