
Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM (cited by: 10)
Abstract: As mobile phone text messaging has become an important means of daily communication, recognizing spam messages has significant practical value. To this end, a self-attention-based Bi-LSTM neural network model combined with TFIDF is proposed. The model first feeds the message text into a Bi-LSTM layer as word vectors; after feature extraction, the focused information from the TFIDF and self-attention layers is combined to obtain the final feature vector, which is then passed through a Softmax classifier to produce the message classification result. Experimental results show that, compared with traditional classification models, the self-attention-based Bi-LSTM model combined with TFIDF improves text recognition accuracy by 2.1%–4.6% and reduces running time by 0.6 s–10.2 s.
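The pipeline the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: numpy stands in for a deep-learning framework, the Bi-LSTM hidden states are replaced by random vectors, and the rule for blending TFIDF weights with attention weights (elementwise product, renormalized) is an assumption, since the paper's exact fusion is not given in the abstract.

```python
# Hypothetical sketch of: token vectors -> Bi-LSTM hidden states ->
# self-attention weights blended with TFIDF -> pooled feature vector ->
# Softmax classifier. All names and the blending rule are assumptions.
import numpy as np

def tfidf_weights(doc_tokens, corpus):
    # Per-token tf-idf over a tiny corpus, normalized to sum to 1.
    n = len(corpus)
    weights = []
    for t in doc_tokens:
        tf = doc_tokens.count(t) / len(doc_tokens)
        df = sum(1 for d in corpus if t in d)
        idf = np.log((1 + n) / (1 + df)) + 1.0  # smoothed idf
        weights.append(tf * idf)
    w = np.array(weights)
    return w / w.sum()

def self_attention_pool(H, tfidf_w):
    # H: (seq_len, hidden) hidden states. Score each position against a
    # mean query, then rescale the attention distribution by tf-idf.
    q = H.mean(axis=0)
    scores = H @ q / np.sqrt(H.shape[1])
    att = np.exp(scores - scores.max())
    att /= att.sum()
    combined = att * tfidf_w          # assumed fusion of the two weightings
    combined /= combined.sum()
    return combined @ H               # final feature vector

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

corpus = [["win", "free", "prize"], ["see", "you", "tonight"],
          ["free", "entry", "now"]]
doc = ["win", "free", "prize"]
rng = np.random.default_rng(0)
H = rng.normal(size=(len(doc), 8))    # stand-in for Bi-LSTM outputs
feat = self_attention_pool(H, tfidf_weights(doc, corpus))
W = rng.normal(size=(8, 2))           # stand-in classifier weights
probs = softmax(feat @ W)             # two class probabilities (ham/spam)
print(probs.shape)
```

The TFIDF rescaling pushes the pooled representation toward rare, document-discriminative tokens, which is the intuition behind combining it with self-attention.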
Authors: WU Si-Hui; CHEN Shi-Ping (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)
Source: Computer Systems & Applications, 2020, No. 9, pp. 171–177 (7 pages)
Funding: National Natural Science Foundation of China (61472256, 61170277, 61003031); Shanghai Key Science and Technology Research Project (14511107902); Shanghai Engineering Center Construction Project (GCZXL14014); Shanghai First-Class Discipline Construction Project (S1201YLXK, XTKX2021.); Development Project of Shanghai Key Laboratory of Data Science (201609060003); Hujiang Foundation (A14006); Hujiang Foundation Research Base Special Project (C14001).
Keywords: spam message; text categorization; self-attention; Bi-LSTM; TFIDF

References: 4

Secondary references: 57


Co-cited literature: 198

Also-cited literature: 81

Citing literature: 10

Secondary citing literature: 28
