Abstract
Extraction results in the field of information extraction often fall short of practical needs, with information items that are disordered or missing. To address this, the paper proposes a method based on the conditional random fields (CRFs) model for chunk labeling and entity relation extraction (ERE). It defines a labeling scheme for Chinese chunks and entity relations, takes the fairly general-purpose People's Daily (《人民日报》) annotated corpus as training data, and trains an efficient second-order template to extract entity relations from text. Experimental results show that the method achieves better extraction performance.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2010, Issue 24, pp. 192-194 (3 pages)
Keywords
information extraction
chunk labeling
entity relation extraction
conditional random fields (CRFs) model