Abstract
For training texts of which only part can be labeled, active learning is applied to text information extraction, and an extraction method based on an active-learning hidden Markov model (HMM) is proposed. In this method, active learning selects for labeling only those training texts that are most valuable for training the HMM. Experimental results show that, with an optimal threshold on the model's confidence value, the method greatly reduces the user's workload for labeling training texts while maintaining text information extraction performance.
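The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not define how the model's confidence value is computed, so this sketch assumes one common choice: the posterior probability of the single best (Viterbi) state path, i.e. the Viterbi path probability divided by the total forward probability. The state names, observation symbols, parameter values, and the function names `confidence` and `select_for_labeling` are all hypothetical and are not taken from the paper.

```python
# Toy HMM parameters (hypothetical values, not from the paper).
STATES = ["title", "author"]
START = {"title": 0.6, "author": 0.4}
TRANS = {
    "title": {"title": 0.7, "author": 0.3},
    "author": {"title": 0.2, "author": 0.8},
}
EMIT = {
    "title": {"word": 0.7, "name": 0.1, "other": 0.2},
    "author": {"word": 0.1, "name": 0.7, "other": 0.2},
}


def forward_prob(obs):
    """Probability of the observations, summed over all state paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
            for s in STATES
        }
    return sum(alpha.values())


def viterbi_prob(obs):
    """Joint probability of the observations and the most likely state path."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        delta = {
            s: max(delta[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
            for s in STATES
        }
    return max(delta.values())


def confidence(obs):
    """Assumed confidence measure: posterior of the best path, in (0, 1]."""
    return viterbi_prob(obs) / forward_prob(obs)


def select_for_labeling(pool, threshold):
    """Keep only sequences whose confidence falls below the threshold."""
    return [obs for obs in pool if confidence(obs) < threshold]


# Usage: only the ambiguous sequence is sent to the user for labeling.
pool = [["word", "word", "word"], ["other", "other", "other"]]
to_label = select_for_labeling(pool, threshold=0.5)
```

Under these toy parameters the second sequence is selected, because its symbols are equally likely in both states and the best path carries little of the total probability. In a full active learning loop the selected texts would be labeled by the user, the HMM retrained, and the pool scored again; the threshold controls the trade-off between labeling effort and extraction performance.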
Source
《湖南大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2007, Issue 6, pp. 74-77 (4 pages)
Journal of Hunan University: Natural Sciences
Funding
Supported by the National High Technology Research and Development Program of China ('863' Program), Grant No. 2006AA01Z227
Keywords
active learning (主动学习)
hidden Markov model (隐马尔可夫模型)
text information extraction (文本信息抽取)