Abstract
Existing named entity relation extraction algorithms do not account for differences among the patterns of relation feature sequences. To address this shortcoming, this paper proposes an improved named entity relation extraction algorithm. It identifies all named entities in the corpus, extracts entity relation features from the shortest dependency path together with the words most closely related to the entities themselves, computes the similarity of relation feature sequences with a kernel function, and outputs candidate named entity relation pairs and their relations. Experimental results show that the improved algorithm achieves good recall and precision, with their harmonic mean reaching up to 78%.
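As an illustration of two notions the abstract relies on, the following minimal Python sketch finds the shortest dependency path between two entities and computes the harmonic mean of precision and recall. It is not the paper's implementation: the function names `shortest_dependency_path` and `f_measure`, the toy dependency edges, and the precision and recall values are all hypothetical, and the kernel-based similarity step is not shown because the abstract does not specify the kernel.

```python
from collections import deque


def shortest_dependency_path(edges, start, goal):
    """Breadth-first search over an undirected view of (head, dependent) edges.

    Returns the list of tokens on a shortest path, or None if no path exists.
    """
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical dependency edges for "Jobs founded Apple in California".
edges = [
    ("founded", "Jobs"),
    ("founded", "Apple"),
    ("founded", "in"),
    ("in", "California"),
]

# Tokens linking the two entities; these would feed the relation features.
print(shortest_dependency_path(edges, "Jobs", "Apple"))

# Illustrative values only, not results from the paper.
print(f_measure(0.8, 0.6))
```

In practice the dependency edges would come from a parser rather than being written by hand, and the tokens on the path would form part of the relation feature sequence that the kernel function compares.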
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2010, Issue 24, pp. 289-290, F0003 (3 pages in total)
Computer Engineering
Keywords
named entity relation extraction
shortest dependency path
kernel function
harmonic mean