Keyword weight optimization for short text multi-classification based on attention mechanism (Cited by: 3)
Abstract: Keyword semantic sensitivity affects how appropriate weights are assigned to the keywords selected from short texts. Aiming at the problem that existing methods focus only on the completeness of keywords while ignoring the negative impact of confusing keywords on classification, a model that optimizes keyword weights by reducing the weights of confusing keywords was proposed. First, the confusing keywords in the text were identified based on Term Frequency-Inverse Document Frequency (TF-IDF) and the confusion matrix. Then, a text representation was constructed with the attention mechanism and reconstructed through a dimension-reducing fully connected layer; the reconstructed representation was trained to be as similar as possible to the original one, so as to select the keywords that preserve the sentence information. The confusing keywords were then excluded from the extracted keywords, and the remaining keywords were called strong keywords. Finally, the classic Bidirectional Long Short-Term Memory network with Attention (BiLSTM-Attention) model was used as the base model for short text multi-classification, and the strong keywords were embedded as a whole and added into the activation function calculation of the attention part of this base model. Experimental results show that, compared with the BiLSTM-Attention base model, the proposed model improves classification accuracy by 0.41 percentage points on the public Snippets dataset.
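The core step above, embedding the strong keywords as a whole and adding them into the activation of the attention layer, can be illustrated with a short sketch. The following is a minimal PyTorch reconstruction based only on this abstract: the class name StrongKeywordBiLSTMAttention, the mean-pooling of keyword embeddings, and the additive tanh attention form are all assumptions, not the authors' published implementation.

    # Minimal sketch (assumptions as noted above): a BiLSTM-Attention
    # classifier whose additive-attention scores receive a pooled
    # "strong keyword" embedding inside the tanh activation.
    import torch
    import torch.nn as nn

    class StrongKeywordBiLSTMAttention(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, hidden=64, num_classes=8):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                  bidirectional=True)
            self.att_w = nn.Linear(2 * hidden, 1, bias=False)          # additive attention scorer
            self.kw_proj = nn.Linear(emb_dim, 2 * hidden, bias=False)  # keyword -> attention space
            self.fc = nn.Linear(2 * hidden, num_classes)

        def forward(self, tokens, strong_kw):
            # tokens: (B, T) token ids; strong_kw: (B, K) ids of the strong
            # keywords left after removing the TF-IDF/confusion-matrix
            # "confusing" keywords upstream.
            h, _ = self.bilstm(self.embed(tokens))                 # (B, T, 2H)
            kw = self.kw_proj(self.embed(strong_kw).mean(dim=1))   # (B, 2H) pooled keyword bias
            # The keyword bias enters *inside* the activation of the attention
            # score, boosting positions compatible with the strong keywords.
            scores = self.att_w(torch.tanh(h + kw.unsqueeze(1)))   # (B, T, 1)
            alpha = torch.softmax(scores, dim=1)                   # attention weights over T
            context = (alpha * h).sum(dim=1)                       # (B, 2H) weighted summary
            return self.fc(context)                                # class logits

Mean-pooling reflects the abstract's phrase "the strong keywords were embedded as a whole"; adding the pooled bias inside tanh, rather than rescaling the softmax output afterwards, is one reading of "added to the activation function calculation of the attention part". The paper may use a different pooling or scoring function.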
Authors: PENG Weile, WU Hao, XU Li (School of Information, Yunnan University, Kunming, Yunnan 650500, China)
Source: Journal of Computer Applications (《计算机应用》, CSCD, Peking University Core Journal), 2021, No. S02, pp. 19-24 (6 pages)
Funding: National Natural Science Foundation of China (61962061).
Keywords: word frequency; reconstructed text; keywords; attention mechanism; weight optimization