Abstract
Cross-Document Co-reference Resolution (CDCR) groups words that are distributed across different texts but refer to the same entity into co-reference chains. Traditional research on CDCR mainly uses clustering methods to address the name disambiguation problem that arises in information retrieval. This paper recasts the clustering problem as a classification problem and uses a Support Vector Machine (SVM) classifier to resolve both name disambiguation and variant consolidation, both of which are prevalent in information extraction. The method can effectively integrate the morphological (word-formation) and phonetic features of entity names with a variety of semantic features drawn from inside and outside the text (the corpus and the Internet). Experiments on a Chinese cross-document co-reference corpus show that, compared with clustering methods, the classification method improves both precision and recall.
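The classification view described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the mentions, the two features (character overlap and context overlap), the weights, and the bias are all hypothetical, and a hand-set linear decision function stands in for a trained SVM. The abstract does not say how pairwise decisions are turned into chains; the sketch assumes a simple transitive closure via union-find. Phonetic and Internet-derived features are omitted.

```python
from itertools import combinations

# Hypothetical mentions from three documents (illustrative data only).
MENTIONS = [
    {"id": "d1:m1", "name": "Michael Jordan", "context": {"basketball", "bulls", "nba"}},
    {"id": "d2:m1", "name": "M. Jordan", "context": {"nba", "bulls", "chicago"}},
    {"id": "d3:m1", "name": "Michael Jordan", "context": {"machine", "learning", "berkeley"}},
]

def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def pair_features(m1, m2):
    """Assumed features: name character overlap and context word overlap."""
    return [jaccard(m1["name"], m2["name"]), jaccard(m1["context"], m2["context"])]

# Stand-in for a trained linear SVM decision function w.x + b (assumed values).
WEIGHTS = [1.0, 2.0]
BIAS = -1.2

def is_coreferent(m1, m2):
    """Classify a mention pair as co-referent when the linear score is positive."""
    score = sum(w * x for w, x in zip(WEIGHTS, pair_features(m1, m2))) + BIAS
    return score > 0

def build_chains(mentions):
    """Merge positively classified pairs into chains with union-find."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(i)] = find(j)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention["id"])
    return sorted(chains.values())

if __name__ == "__main__":
    for chain in build_chains(MENTIONS):
        print(chain)
```

In this toy data the two differently written names with shared context end up in one chain (variant consolidation), while the identical name with unrelated context stays separate (name disambiguation). In practice the weights would come from a classifier trained on labeled mention pairs rather than being set by hand.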
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2013, Issue 4, pp. 984-987 (4 pages)
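Funding
National Natural Science Foundation of China (60873150, 90920004)
Natural Science Foundation of Jiangsu Province (BK2010219)
Major Natural Science Project of Jiangsu Higher Education Institutions (11KJA520003)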
Keywords
cross-document co-reference resolution
information extraction
Support Vector Machine (SVM) classifier
semantic information
name disambiguation
variant consolidation