
Adaptive short text keyword generation model

Cited by: 1
Abstract: Keyword extraction strongly affects downstream text processing, and the accuracy and fluency of the recognized keywords are central to the task. Extracting keywords from short texts suffers from inaccurate word segmentation, mismatch between keywords and text topics, and mixed-language content. To alleviate these problems, we propose ADGCN, an adaptive short-text keyword generation model based on graph-to-sequence learning with a graph convolutional neural network. First, the model uses a graph neural network as the encoding framework for extracting textual features, which handles the irregular structure of short texts and the complex relations between words. Then, a self-attention mechanism is applied over the positional and contextual features of words to capture rich context-dependent information. Finally, a linear decoding scheme generates interpretable keywords. We also collect and release TH, a tag dataset from a social media platform that contains post texts and their topic tags. We evaluate the relevance, informativeness, and coherence of the generated keywords from the perspective of user needs. The model not only generates keywords that match the topic of the short text but also effectively mitigates the impact of data perturbation on the model. It also performs well on the public dataset KP20k, indicating good portability.
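The abstract names three stages (graph-based encoding, self-attention over positional and contextual features, and linear decoding) but gives no equations. The following is a minimal, untrained pure-Python sketch of such a pipeline, under stated assumptions: the word graph is a sliding-window co-occurrence graph, the graph layer uses the standard normalized GCN propagation rule, positions are encoded with Transformer-style sinusoids, and the paper's linear decoder is replaced by a hypothetical linear layer that scores each word node so the nodes can be ranked. All function names, the window size, and the random weights are hypothetical. This is not ADGCN's published implementation, and ranking existing words is extraction rather than true sequence generation.

```python
import math
import random


def build_cooccurrence_graph(tokens, window=2):
    """Nodes are distinct tokens; an undirected edge joins tokens within `window` positions."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = index[w], index[tokens[j]]
            if a != b:
                adj[a][b] = adj[b][a] = 1.0
    return vocab, adj


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


def positional_encoding(pos, dim):
    """Transformer-style sinusoidal encoding of a token position."""
    return [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / dim)) for i in range(dim)]


def self_attention(h):
    """Scaled dot-product self-attention with Q = K = V = H (no learned projections)."""
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(wt * v[c] for wt, v in zip(weights, h)) for c in range(d)])
    return out


def top_keywords(tokens, k=3, dim=4, seed=0):
    """Rank distinct tokens by a linear score over their graph + attention encoding."""
    rng = random.Random(seed)
    vocab, adj = build_cooccurrence_graph(tokens)
    a_norm = normalize_adjacency(adj)
    first = {}
    for pos, w in enumerate(tokens):
        first.setdefault(w, pos)  # position feature: first occurrence of each word
    # Random embeddings and weights stand in for trained parameters (hypothetical).
    h0 = [[rng.uniform(-1, 1) + p for p in positional_encoding(first[w], dim)] for w in vocab]
    w1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    w_out = [rng.uniform(-1, 1) for _ in range(dim)]
    h = self_attention(gcn_layer(a_norm, h0, w1))
    scores = [sum(a * b for a, b in zip(row, w_out)) for row in h]  # linear scoring layer
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    text = "graph neural network keyword generation for short text graph attention network"
    for word, score in top_keywords(text.split(), k=3):
        print(f"{word}\t{score:.3f}")
```

With random weights the ranking itself is meaningless; the sketch only shows the data flow from tokens to graph, graph convolution, attention, and a linear score per word. A trained model would learn the embeddings and weights from labeled data such as the TH dataset. Whitespace tokenization is also assumed here: Chinese or mixed-language posts would first need a word segmenter, which is exactly where the abstract locates the word-division problem.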
Authors: WANG Yongjian (王永剑), SUN Yaru (孙亚茹), YANG Ying (杨莹) (The Third Research Institute of Ministry of Public Security, Shanghai 201204, China)
Source: Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), 2022, No. 2, pp. 199-208 (10 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Keywords: keyword extraction; keyword generation; graph neural network; attention mechanism; topic model
