Disambiguating named entities with deep supervised learning via crowd labels

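(Chinese title, translated: Named entity disambiguation algorithm based on deep learning from crowdsourced label data; article published in English)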

Abstract: Named entity disambiguation (NED) is the task of linking mentions of ambiguous entities to their referenced entities in a knowledge base such as Wikipedia. We propose an approach that learns more discriminative features for NED by jointly exploiting collective wisdom (via human-labeled crowd labels) and deep learning (via human-generated data). In particular, we devise a crowd model to elicit the underlying features (crowd features) from crowd labels, which indicate a matching candidate for each mention. We then use the crowd features to fine-tune a dynamic convolutional neural network (DCNN). The learned DCNN is employed to obtain deep crowd features that enhance traditional hand-crafted features, addressing the limitation of relying on hand-crafted features alone. The proposed method benefits substantially from incorporating crowd knowledge (via crowd labels) into a generic deep learning framework for NED. Experimental analysis demonstrates that the proposed approach outperforms traditional hand-crafted features when enough crowd labels are gathered.
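The pipeline described in the abstract runs from crowd labels to crowd features, then to a DCNN, and finally to deep crowd features that enhance hand-crafted features. The sketch below illustrates only the last step, and only under stated assumptions:

* It replaces the paper's crowd model with a simple vote-fraction aggregation.
* It omits the DCNN entirely, so the "deep crowd features" are toy numbers.
* It ranks candidates with a hypothetical linear score.

The function names `crowd_soft_labels`, `combine_features` and `link_mention`, the entity IDs, and all feature values are illustrative inventions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical stand-in for a crowd model: turn the crowd labels collected for
# one mention into a soft distribution over its candidate entities by simple
# vote counting. The paper's actual crowd model is NOT reproduced here.
def crowd_soft_labels(votes, candidates):
    """Return {candidate: fraction of valid worker votes it received}."""
    counts = Counter(v for v in votes if v in candidates)
    total = sum(counts.values())
    if total == 0:
        # No usable crowd labels: fall back to a uniform distribution.
        return {c: 1.0 / len(candidates) for c in candidates}
    return {c: counts[c] / total for c in candidates}


def combine_features(hand_crafted, deep_crowd):
    """Enhance hand-crafted features by concatenating deep crowd features."""
    return list(hand_crafted) + list(deep_crowd)


def link_mention(candidate_features, weights):
    """Return the candidate whose feature vector has the highest linear score."""
    def score(x):
        return sum(w * xi for w, xi in zip(weights, x))
    return max(candidate_features, key=lambda c: score(candidate_features[c]))


if __name__ == "__main__":
    candidates = ["Paris_(city)", "Paris_Hilton"]
    # Three hypothetical workers label the mention "Paris".
    soft = crowd_soft_labels(["Paris_(city)", "Paris_(city)", "Paris_Hilton"], candidates)
    print(soft)

    # Toy numbers: two hand-crafted features plus one deep crowd feature per
    # candidate (in the paper these would come from the fine-tuned DCNN).
    hand = {"Paris_(city)": [0.4, 0.1], "Paris_Hilton": [0.5, 0.2]}
    deep = {"Paris_(city)": [0.9], "Paris_Hilton": [0.1]}
    combined = {c: combine_features(hand[c], deep[c]) for c in candidates}
    print(link_mention(hand, [1.0, 1.0]))            # hand-crafted features only
    print(link_mention(combined, [1.0, 1.0, 1.0]))   # with deep crowd features added
```

With these toy values, the hand-crafted features alone favor `Paris_Hilton`, while adding the deep crowd feature flips the decision to `Paris_(city)`. This mirrors, in miniature, the abstract's claim that crowd-derived features can complement hand-crafted ones. Concatenation followed by a linear scorer was chosen only for simplicity; the paper's actual combination and ranking model may differ.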

Authors: Le-kui ZHOU, Si-liang TANG, Jun XIAO, Fei WU, Yue-ting ZHUANG

Affiliation: Institute of Artificial Intelligence

Source: Frontiers of Information Technology & Electronic Engineering (indexed in SCIE, EI, CSCD), 2017, No. 1, pp. 97-106 (10 pages)

Funding: Supported by the National Basic Research Program of China (No. 2015CB352300), the National Natural Science Foundation of China (Nos. 61402401 and U1509206), the Zhejiang Provincial Natural Science Foundation of China (No. LQ14F010004), the China Knowledge Centre for Engineering Sciences and Technology, the Fundamental Research Funds for the Central Universities, and the Qianjiang Talents Program of Zhejiang Province, China.

Keywords: Named entity disambiguation; Crowdsourcing; Deep learning (CLC: TP391.4)

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
