Abstract
Named entity disambiguation (NED) is the task of linking ambiguous mentions in text to the entities they refer to in a knowledge base such as Wikipedia. We propose an approach that learns more discriminative features for NED by combining collective wisdom (crowd labels provided by human annotators) with deep learning (data-driven learning on human-generated data). Specifically, we devise a crowd model that elicits underlying features ("crowd features") from crowd labels, each of which indicates a matching candidate entity for a mention, and then use these crowd features to fine-tune a dynamic convolutional neural network (DCNN). The fine-tuned DCNN is used to extract "deep crowd features", which complement traditional hand-crafted features and address the limitation of relying on hand-crafted features alone. The proposed method benefits substantially from incorporating crowd knowledge (as reflected in crowd labels) into a generic deep learning framework for NED. Experimental analysis shows that the proposed approach outperforms traditional hand-crafted features when enough crowd labels are gathered.
Funding
Supported by the National Basic Research Program of China (No. 2015CB352300), the National Natural Science Foundation of China (Nos. 61402401 and U1509206), the Zhejiang Provincial Natural Science Foundation of China (No. LQ14F010004), the China Knowledge Centre for Engineering Sciences and Technology, the Fundamental Research Funds for the Central Universities, and the Qianjiang Talents Program of Zhejiang Province, China.