Abstract: This paper presents a new approach to determining whether a personal name of interest that appears across documents refers to the same entity. First, three vectors are formed for each text: the personal name Boolean vector, denoting whether a personal name occurs in the text; the biographical word Boolean vector, representing title, occupation, and so forth; and the feature vector with real values. Then, by combining a heuristic strategy based on the Boolean vectors with an agglomerative clustering algorithm based on the feature vectors, it seeks to resolve multi-document personal name coreference. Experimental results on the "Wang Gang" corpus show that this approach achieves good performance.
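The two-stage idea in this abstract (a Boolean-vector heuristic combined with agglomerative clustering over real-valued feature vectors) can be illustrated with a minimal sketch. The abstract does not specify the heuristic, the similarity measure, or the linkage, so everything here is an assumption made for illustration: the hypothetical rule "texts sharing a biographical word are merged", cosine similarity, average linkage, and the `threshold` parameter.

```python
import math

def cosine(u, v):
    """Cosine similarity between two real-valued feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def shares_bio_word(b1, b2):
    """Hypothetical heuristic: the two texts share at least one biographical word."""
    return any(x and y for x, y in zip(b1, b2))

def cluster(bio_vectors, feature_vectors, threshold=0.5):
    """Average-link agglomerative clustering; the Boolean heuristic forces a merge."""
    clusters = [[i] for i in range(len(feature_vectors))]

    def score(c1, c2):
        # Assumed shortcut: a shared biographical word outranks any feature similarity.
        if any(shares_bio_word(bio_vectors[i], bio_vectors[j]) for i in c1 for j in c2):
            return 1.0
        sims = [cosine(feature_vectors[i], feature_vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        pairs = [(score(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        best, a, b = max(pairs)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)
```

Each resulting cluster is read as one real-world person; texts 0 and 1 in a call such as `cluster(bio, feat)` end up together whenever their biographical Boolean vectors overlap, even if their feature vectors are dissimilar.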
Funding: Supported by the Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant Nos. 19KJA610002 and 19KJB520050, and the National Natural Science Foundation of China under Grant No. 61902270.
Abstract: Author name disambiguation (AND) is a central task in academic search, and it has received growing attention as the number of authors and academic publications increases. To tackle the AND problem, existing studies have proposed various approaches based on different types of information, such as raw document features (e.g., co-authors, titles, and keywords), the fusion feature (e.g., a hybrid publication embedding based on multiple raw document features), local structural information (e.g., a publication's neighborhood information on a graph), and global structural information (e.g., interactive information between a node and others on a graph). However, no work so far has taken all of the above information into account while taking full advantage of the contribution of each raw document feature. To fill this gap, we propose a novel framework named EAND (Towards Effective Author Name Disambiguation by Hybrid Attention). Specifically, we design a novel feature extraction model, consisting of three hybrid attention mechanism layers, to extract key information from the global and local structural information generated from six similarity graphs, which are constructed based on different similarity coefficients, raw document features, and the fusion feature. Each hybrid attention mechanism layer contains three key modules: a local structural perception, a global structural perception, and a feature extractor. Additionally, the mean absolute error function in the joint loss function is used to introduce the structural information loss of the vector space. Experimental results on two real-world datasets demonstrate that EAND achieves superior performance, outperforming state-of-the-art methods by at least +2.74% in terms of the micro-F1 score and +3.31% in terms of the macro-F1 score.
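The difference between the two reported metrics can be shown with a short, generic sketch. It assumes per-group counts of true positives, false positives, and false negatives (for example, one tuple per ambiguous name); how the paper itself forms these groups is not stated in the abstract, and the function names below are illustrative.

```python
def f1(tp, fp, fn):
    """F1 score from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_f1(counts):
    """Pool the counts over all groups, then compute a single F1."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)

def macro_f1(counts):
    """Compute F1 per group, then take the unweighted mean."""
    return sum(f1(*c) for c in counts) / len(counts)

# One large, easy group and one small, hard group (made-up counts).
example_counts = [(8, 2, 2), (1, 1, 1)]
```

Micro-F1 is dominated by large groups, while macro-F1 weights every group equally, so a method can improve one more than the other.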
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61370165 and 61203378), the Shenzhen Development and Reform Commission ([2014]1507), Shenzhen Peacock Plan Research (KQCX20140521144507925), and Shenzhen Fundamental Research Funding (JCYJ20150625142543470). The work by the second author was partially supported by the Hong Kong Polytechnic University, China.
Abstract: Person name disambiguation aims to distinguish different entities that share the same person name by linking documents to those entities. The traditional disambiguation approach uses the words in documents as features to distinguish different entities. Because it does not use word order as a feature and makes only limited use of external knowledge, the traditional approach has performance limitations. This paper presents an approach to named entity disambiguation through entity linking, based on a multi-kernel function and Internet verification, to improve Chinese person name disambiguation. The proposed approach extends a linear kernel that uses in-document word features by adding a string kernel to construct a multi-kernel function. This multi-kernel can then calculate the similarities between an input document and the entity descriptions in a named person knowledge base to form a ranked list of candidate entities. Furthermore, Internet search results based on keywords extracted from the input document and entity descriptions in the knowledge base are used to train classifiers for verification. Evaluations on the CIPS-SIGHAN 2012 person name disambiguation bakeoff dataset show that using word order and Internet knowledge through a multi-kernel function can improve both precision and recall, and that the system achieves state-of-the-art performance.
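The multi-kernel idea described in this abstract (a linear kernel over in-document words combined with a string kernel that captures word order) can be sketched as follows. The abstract does not say which string kernel or combination rule is used, so this sketch swaps in a simple word n-gram spectrum kernel and an assumed weighted sum; `weight`, `n`, and all function names are hypothetical.

```python
from collections import Counter
import math

def linear_kernel(a, b):
    """Bag-of-words dot product; ignores word order."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[w] * cb[w] for w in ca)

def spectrum_kernel(a, b, n=2):
    """Word n-gram spectrum kernel, a simple order-sensitive string kernel (stand-in)."""
    ga = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    gb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(ga[g] * gb[g] for g in ga)

def normalized(kernel, a, b):
    """Scale a kernel value into [0, 1] so that the two kernels are comparable."""
    denom = math.sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0

def multi_kernel(a, b, weight=0.5):
    """Assumed combination: weighted sum of the two normalized kernels."""
    return (weight * normalized(linear_kernel, a, b)
            + (1 - weight) * normalized(spectrum_kernel, a, b))

def rank_candidates(doc, knowledge_base, weight=0.5):
    """Rank knowledge-base entities by multi-kernel similarity to the input document."""
    scored = [(multi_kernel(doc, desc, weight), entity)
              for entity, desc in knowledge_base.items()]
    return [entity for _, entity in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

With tokenized inputs, a description containing the same words in a different order scores lower than one that also matches the word order, which is the effect the string kernel is meant to add on top of the linear kernel.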
Abstract: AMiner is a novel online academic search and mining system that aims to provide a systematic modeling approach to help researchers and scientists gain a deeper understanding of the large and heterogeneous networks formed by authors, papers, conferences, journals, and organizations. The system extracts researchers' profiles automatically from the Web and integrates them with published papers by way of a process that first performs name disambiguation. A generative probabilistic model is then devised to simultaneously model the different entities while providing a topic-level expertise search. In addition, AMiner offers a set of researcher-centered functions, including social influence analysis, relationship mining, collaboration recommendation, similarity analysis, and community evolution. The system has been in operation since 2006 and has been accessed from more than 8 million independent IP addresses residing in more than 200 countries and regions.