Abstract
Traditional statistical methods for named entity recognition require a large amount of manual annotation, which results in low recognition accuracy. To improve recognition performance, we propose a semi-supervised learning method based on conditional random fields (S-CRF) to identify and extract named entities. The method treats entity recognition as a sequence labeling problem: a small amount of data is labeled manually and used to build an entity set, the K-means clustering algorithm selects representative unlabeled texts for automatic labeling, and a conditional random field is then trained and tested on the resulting corpus. Experiments were carried out on Chinese emergency plan documents, where the accuracy on the B, M, and O labels reached 93.52%, 93.04%, and 95.81%, respectively. The results show that the method outperforms traditional rule-based methods and effectively improves named entity recognition in emergency plans.
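The automatic labeling step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the tag scheme or the matching rule, so this assumes B marks the first character of an entity, M the remaining characters, and O everything else, and it assumes greedy longest-match lookup against the entity set. The function name `auto_label` and the sample entities are hypothetical.

```python
def auto_label(text, entity_set):
    """Tag each character of `text` with B, M, or O.

    Assumed scheme (not stated in the source): B = first character of an
    entity, M = any later character of an entity, O = outside any entity.
    """
    tags = ["O"] * len(text)
    # Try longer entities first so a short entity cannot split a longer one.
    entities = sorted((e for e in entity_set if e), key=len, reverse=True)
    i = 0
    while i < len(text):
        for ent in entities:
            if text.startswith(ent, i):
                tags[i] = "B"
                for j in range(i + 1, i + len(ent)):
                    tags[j] = "M"
                i += len(ent)  # continue after the matched entity
                break
        else:
            i += 1  # no entity starts here; leave the tag as O
    return list(zip(text, tags))


# Hypothetical entity set, standing in for the manually built one.
sample_entities = {"指挥部", "消防队"}
labeled = auto_label("指挥部通知消防队", sample_entities)
```

Character and tag pairs produced this way are the kind of sequence-labeled data a CRF trainer consumes. In the method described above, only texts chosen by K-means clustering would be passed through such a step.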
Authors
LIU Tong (刘彤)
WEI Jing (魏静)
NI Wei-jian (倪维健)
CHEN Si-yuan (陈思源)
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Source
Software Guide (《软件导刊》)
2020, No. 3, pp. 35-38 (4 pages)
Funding
National Natural Science Foundation of China (71704096, 61602278)
Qingdao Social Science Planning Project (QDSKL1801122)
Keywords
emergency plan
named entity recognition
conditional random field
semi-supervised learning