Abstract
Coreference resolution is an important problem in natural language processing. Because annotated Chinese training corpora for coreference resolution are very scarce, this paper proposes an unsupervised clustering algorithm for noun phrase coreference resolution. Noun phrase coreference is modeled as a graph, which converts coreference resolution into a graph partitioning problem, and a modularity function is used as the objective so that the graph is partitioned automatically, including the choice of the number of clusters. As a result, the algorithm does not make coreference decisions for each pair of noun phrases in isolation; it takes the correlations among multiple mentions into account and avoids having to select a threshold. Experiments on personal pronoun resolution and noun phrase resolution on the ACE Chinese corpus show that the proposed method is an effective and feasible unsupervised algorithm for coreference resolution.
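The graph-partitioning idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's similarity features or its optimization procedure, so everything concrete here is an assumption: the mention strings and edge weights are made up, the function names `modularity` and `greedy_partition` are hypothetical, and greedy agglomerative merging stands in for whatever partitioning procedure the authors actually used. The sketch only shows how maximizing a modularity score can decide the number of clusters without a threshold.

```python
from itertools import combinations


def modularity(weights, clusters):
    """Weighted modularity Q of a partition.

    weights: dict mapping frozenset({u, v}) -> edge weight (no self-loops).
    clusters: list of sets of nodes.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    q = 0.0
    for cluster in clusters:
        # Weight of edges with both endpoints inside the cluster.
        inside = sum(w for edge, w in weights.items() if edge <= cluster)
        # Sum of weighted degrees of the cluster's nodes.
        degree = sum(w * len(edge & cluster) for edge, w in weights.items())
        q += inside / total - (degree / (2 * total)) ** 2
    return q


def greedy_partition(nodes, weights):
    """Merge clusters greedily while the modularity score keeps increasing."""
    clusters = [{n} for n in nodes]
    best = modularity(weights, clusters)
    while len(clusters) > 1:
        candidates = []
        for i, j in combinations(range(len(clusters)), 2):
            merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
            merged.append(clusters[i] | clusters[j])
            candidates.append((modularity(weights, merged), merged))
        score, merged = max(candidates, key=lambda item: item[0])
        if score <= best:
            break  # no merge improves Q, so the cluster count is fixed here
        best, clusters = score, merged
    return clusters, best


# Hypothetical mentions and pairwise compatibility weights (assumed values).
nodes = ["Li Ming", "he", "the professor", "the company", "it"]
weights = {
    frozenset({"Li Ming", "he"}): 0.9,
    frozenset({"Li Ming", "the professor"}): 0.8,
    frozenset({"he", "the professor"}): 0.7,
    frozenset({"the company", "it"}): 0.9,
    frozenset({"he", "it"}): 0.1,
}

clusters, score = greedy_partition(nodes, weights)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy graph the merging stops at two clusters because joining them would lower the score; no similarity threshold is set by hand, which is the property the abstract highlights.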
Source
《中文信息学报》
CSCD
Peking University Core Journal list (北大核心)
2007, No. 2, pp. 77-82 (6 pages)
Journal of Chinese Information Processing
Funding
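National High Technology Research and Development Program of China (863 Program), grant 2006AA01Z143
National Natural Science Foundation of China, grant 60673043
Natural Science Foundation of Jiangsu Province, grant BK2006117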
Keywords
artificial intelligence
natural language processing
clustering
coreference resolution
modularity function