Abstract
Academic citation recommendation provides a query paper with a list of closely matching candidate citations, based on matching relationships between papers, in order to improve researchers' efficiency. Existing methods rely mainly on short-text matching (keywords, titles, and the like) and cannot represent the structure or overall semantics of a paper, which leads to retrieval results with poor semantic relevance. Starting from the deep data characteristics of long texts, this paper proposes a citation recommendation algorithm based on hierarchical interactive attention matching. A hierarchical representation framework over words, sentences, and articles is built on deep neural networks to strengthen the structured representation of long text; an internal attention mechanism enhances the internal semantic representation of each academic paper; and an interactive attention mechanism mines fine-grained matching features between citations. Experiments on academic literature datasets from computer science, natural language processing, medicine, and other fields show that the proposed method outperforms short-text matching models on accuracy (ACC), F1, and other metrics, indicating that hierarchical interactive attention yields better citation matching.
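The three components named in the abstract (hierarchical word/sentence/article representation, internal attention, interactive attention) can be illustrated with a toy sketch. This is not the authors' model: the paper uses deep neural networks with learned parameters, whereas the sketch below has no trainable weights. The function names, the use of the mean vector as the attention query, and the equal weighting of the two similarity terms are all assumptions made for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def attention_pool(vectors):
    """Internal attention (assumed form): weight each vector by its
    similarity to the mean vector, which stands in for a learned query."""
    n = len(vectors)
    mean = weighted_sum([1.0 / n] * n, vectors)
    weights = softmax([dot(v, mean) for v in vectors])
    return weighted_sum(weights, vectors)

def encode_document(doc):
    """Hierarchy: word vectors -> sentence vectors -> one document vector.
    `doc` is a list of sentences; each sentence is a list of word vectors."""
    sentence_vecs = [attention_pool(sentence) for sentence in doc]
    return sentence_vecs, attention_pool(sentence_vecs)

def interactive_match(query_sents, cand_sents):
    """Interactive attention (assumed form): softly align every query
    sentence to the candidate's sentences, then average the cosine
    similarity between each query sentence and its aligned vector."""
    sims = []
    for q in query_sents:
        weights = softmax([dot(q, c) for c in cand_sents])
        aligned = weighted_sum(weights, cand_sents)
        sims.append(cosine(q, aligned))
    return sum(sims) / len(sims)

def match_score(query_doc, cand_doc):
    """Combine document-level and sentence-level evidence.
    The 0.5/0.5 mix is an arbitrary illustrative choice."""
    q_sents, q_vec = encode_document(query_doc)
    c_sents, c_vec = encode_document(cand_doc)
    return 0.5 * cosine(q_vec, c_vec) + 0.5 * interactive_match(q_sents, c_sents)
```

Under these assumptions, candidate papers could be ranked for a query paper by sorting them on match_score; in a trained model the pooling queries and the final scoring layer would be learned from citation data rather than fixed.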
Authors
樊加倍
钱宇华
彭甫镕
FAN Jia-bei; QIAN Yu-hua; PENG Fu-rong (Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2023, Issue 12, pp. 2656-2662 (7 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (62136005).
Keywords
citation recommendation
text matching
data mining
text representation