Abstract
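Keyword extraction has a large influence on text processing, and the accuracy and fluency of the extracted keywords are central to the task. To mitigate inaccurate word segmentation, mismatch between keywords and text topics, and mixed-language content in short-text keyword extraction, this paper proposes ADGCN, an adaptive short-text keyword generation model based on a graph-to-sequence learning model. The model combines a graph neural network with an attention mechanism as the encoding framework for extracting text features, and encodes the positional and contextual features of words, which addresses the irregular structure of short texts and the complex relations among words. A linear decoding scheme is used to generate interpretable keywords. In the course of this work, a tag dataset containing post texts and topic tags was collected from a social platform and released. In the experiments, the relevance, informativeness, and coherence of the model output are evaluated and analyzed from the perspective of user needs; the proposed model not only generates keywords that match the topic of a short text but also effectively reduces the effect of data perturbation on the model. The proposed model also performs well on the public dataset KP20k, indicating good portability.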
Keyword extraction has a great impact on text processing, and the accuracy and fluency of keyword recognition are key to the task. To address problems such as inaccurate word segmentation, mismatch between keywords and text topics, and multi-language mixing in keyword extraction from short text, we propose an adaptive short-text keyword generation model based on a graph convolutional neural network (ADGCN). First, the model uses a graph neural network as the encoding framework for text feature extraction, to handle the irregular structure of short text and the complex relations between words. Then, based on the positional and contextual features of words, a self-attention mechanism is combined to capture rich context-dependent information. Finally, a linear decoding scheme is used to generate interpretable keywords. We collect and publish a tag dataset, TH, from a social media platform, containing texts and topic tags. We evaluate and analyze the relevance, informativeness, and coherence of the model results from the perspective of user needs. The model can not only generate keywords that match the topic of a short text, but also effectively alleviate the impact of data perturbation on the model. The model also performs well on the public dataset KP20k, indicating good portability.
Authors
王永剑
孙亚茹
杨莹
WANG Yongjian; SUN Yaru; YANG Ying (The Third Research Institute of Ministry of Public Security, Shanghai 201204, China)
Source
《北京航空航天大学学报》
EI
CAS
CSCD
Peking University Core Journals
2022, Issue 2, pp. 199-208 (10 pages)
Journal of Beijing University of Aeronautics and Astronautics
Keywords
keyword extraction
keyword generation
graph neural network
attention mechanism
topic model