摘要
随着手机短信成为人们日常生活交往的重要手段,垃圾短信的识别具有重要的现实意义.针对此提出一种结合TFIDF的self-attention-based Bi-LSTM的神经网络模型.该模型首先将短信文本以词向量的方式输入到Bi-LSTM层,经过特征提取并结合TFIDF和self-attention层的信息聚焦获得最后的特征向量,最后将特征向量通过Softmax分类器进行分类得到短信文本分类结果.实验结果表明,结合TFIDF的self-attention-based Bi-LSTM模型相比于传统分类模型的短信文本识别准确率提高了2.1%–4.6%,运行时间减少了0.6 s–10.2 s.
Mobile phone text messaging has become an increasingly important means of daily communication,so the identification of spam messages has importantly practical significance.A self-attention-based Bi-LSTM neural network model combined with TFIDF is proposed for this purpose.The model first inputs the short message to the Bi-LSTM layer in a vector manner,after feature extraction and combining the information of TFIDF and self-attention layers,the final feature vector is obtained.Finally,the feature vector is classified by the Softmax classifier to obtain the classification result.The experimental results show,compared with the traditional classification model,the self-attention-based Bi-LSTM model combined with TFIDF improves the accuracy of text recognition by 2.1%–4.6%,and the running time is reduced by 0.6 s–10.2 s.
作者
吴思慧
陈世平
WU Si-Hui;CHEN Shi-Ping(School of Optical-Electrical and Computer Engineering,University of Shanghai for Science and Technology,Shanghai 200093,China;Shanghai Key Laboratory of Data Science,Fudan University,Shanghai 201203,China)
出处
《计算机系统应用》
2020年第9期171-177,共7页
Computer Systems & Applications
基金
国家自然科学基金(61472256,61170277,61003031)
上海重点科技攻关项目(14511107902)
上海市工程中心建设项目(GCZXL14014)
上海市一流学科建设项目(S1201YLXK,XTKX2021.)
上海市数据科学重点实验室开发课题(201609060003)
沪江基金(A14006)
沪江基金研究基地专项(C14001)。