Abstract
Binary entity relation tuples are useful in many areas, including knowledge base construction, data mining, and pattern extraction. This paper proposes a new method for extracting entity relation tuples from the Web. Taking one tuple and one keyword of a specific relation as the seed, the method combines several low-level natural language processing (NLP) techniques with an improved pattern acquisition method and a bootstrapping iteration strategy. The baseline method achieves an average precision of 78.12%; with filtering measures applied, the average precision reaches 98.42%. The experimental results show that entity relation tuples obtained through web mining serve information extraction applications well, and that further processing of the extracted tuples can yield more valuable information.
Source
Acta Electronica Sinica (《电子学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2007, No. 11, pp. 2111-2116 (6 pages)
Funding
National Natural Science Foundation of China (No. 60503072, No. 60575042)
Keywords
bootstrapping
entity relation
tuples
information extraction
web mining