Abstract
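This paper proposes an algorithm for extracting dynamic relations from web pages based on conditional random fields (CRF). It defines dynamic relations, builds a representation model for them, and expresses each dynamic relation as a six-dimensional structure. Unlike the rule-based or classification-based methods used in traditional relation extraction, the paper treats dynamic relation identification as a labeling problem and proposes a CRF-based, sentence-level method for relation labeling and extraction. The algorithm first passes a sentence through a semantic role labeling (SRL) system to identify its constituents. It then uses the SRL output, together with the part-of-speech (POS) types of words and the named entity types of phrases, as CRF training features to label the sentence constituents. Finally, the algorithm is tested on a large number of real news web pages, and the experimental results show that it is practical and effective.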
New methods are proposed for extracting dynamic relations from web resources such as news pages. A relation is defined as dynamic if its instances change over time; an example is the employment relation between people and companies. The nature of dynamic relations requires extraction methods to capture the temporal context of the relation. While most previous work on this topic has been domain-specific, a domain-independent, general approach is proposed using a conditional random fields (CRF) based technique. Experiments with news pages from the web show the practicality and precision of the proposed approach.
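The feature-construction step described in the abstract can be illustrated with a minimal sketch. It assumes each token already carries a POS tag, a named entity type, and a semantic role label produced by external tools. The `Token` type, the feature names, the one-token context window, the sample sentence, and the label set (`B-SUBJ`, `B-REL`, `B-OBJ`, `B-TIME`) are all hypothetical choices made for illustration; the paper's actual features and labels are not given in this record.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str  # part-of-speech tag
    ner: str  # named entity type, "O" if none
    srl: str  # semantic role label from an SRL system, "O" if none


def token_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Build the feature dict for token i, using a one-token context window."""
    tok = sent[i]
    feats = {
        "word": tok.word.lower(),
        "pos": tok.pos,
        "ner": tok.ner,
        "srl": tok.srl,
    }
    if i > 0:
        prev = sent[i - 1]
        feats.update({"-1:pos": prev.pos, "-1:ner": prev.ner, "-1:srl": prev.srl})
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i < len(sent) - 1:
        nxt = sent[i + 1]
        feats.update({"+1:pos": nxt.pos, "+1:ner": nxt.ner, "+1:srl": nxt.srl})
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """Map a sentence to one feature dict per token."""
    return [token_features(sent, i) for i in range(len(sent))]


# Hypothetical pre-annotated sentence (an employment relation with a time expression).
SENTENCE = [
    Token("Alice", "NNP", "PERSON", "A0"),
    Token("joined", "VBD", "O", "V"),
    Token("Acme", "NNP", "ORG", "A1"),
    Token("in", "IN", "O", "AM-TMP"),
    Token("2009", "CD", "DATE", "AM-TMP"),
]

# Hypothetical target labels, one per token, that a CRF would be trained to predict.
LABELS = ["B-SUBJ", "B-REL", "B-OBJ", "O", "B-TIME"]

FEATURES = sentence_features(SENTENCE)
```

Pairs of `FEATURES` and `LABELS` built this way for many sentences form the kind of training data a linear-chain CRF toolkit consumes; the sequence model then assigns a label to every token of an unseen sentence, which is what turns relation identification into a labeling problem.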
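Funding
Open Project of the State Key Laboratory (2009006)
National Natural Science Foundation of China (60776801, 70803001)
Open Fund of the Beijing Key Laboratory of "Modern Information Science and Network Technology" and the Ministry of Railways Open Laboratory of "Railway Information Science and Engineering" (XDXX1005)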
Keywords
conditional random fields
relation extraction
semantic role labeling