Abstract
To address insufficient word vector learning in spam classification, this paper introduces the ALBERT dynamic word vector model and proposes ALBERT-RNN, a model that combines ALBERT dynamic word vectors with a recurrent neural network. On the public spam dataset TEC06C, two traditional statistical models are compared with four ALBERT-RNN models that differ in RNN structure, and the cross-entropy loss function of ALBERT-RNN is optimized with the Focal Loss method. The experimental results show that the ALBERT-LSTM model with Focal Loss achieves a high accuracy of 99.13% on the TEC06C dataset.
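The abstract names Focal Loss as the modification applied to the cross-entropy loss but gives no formula. As a rough illustration only, the sketch below uses the commonly cited binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function names and the values gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration; they are not taken from the paper and may differ from the settings the authors used.

```python
import math


def binary_cross_entropy(p_spam: float, label: int, eps: float = 1e-12) -> float:
    """Plain cross-entropy for one e-mail; label is 1 (spam) or 0 (ham)."""
    # p_t is the probability the model assigns to the true class.
    p_t = p_spam if label == 1 else 1.0 - p_spam
    p_t = min(max(p_t, eps), 1.0)  # clip to avoid log(0)
    return -math.log(p_t)


def focal_loss(p_spam: float, label: int, gamma: float = 2.0,
               alpha: float = 0.25, eps: float = 1e-12) -> float:
    """Focal loss for one e-mail (gamma and alpha are illustrative assumptions)."""
    p_t = p_spam if label == 1 else 1.0 - p_spam
    # alpha_t weights the positive class by alpha and the negative by 1 - alpha.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of confidently correct predictions,
    # so hard, misclassified e-mails contribute relatively more.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):  # predicted spam probability for a true spam e-mail
        print(p, binary_cross_entropy(p, 1), focal_loss(p, 1))
```

With gamma = 0 and alpha = 0.5 the focal term reduces to half the ordinary cross-entropy, which is a convenient sanity check when swapping one loss for the other.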
Authors
ZHOU Zhining (周枝凝)
WANG Binjun (王斌君)
ZHAI Yiming (翟一鸣)
TONG Xin (仝鑫)
College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
Source
Netinfo Security (《信息网络安全》)
CSCD
Peking University Core Journal (北大核心)
2020, No. 9, pp. 107-111 (5 pages)
Funding
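Ministry of Public Security Technology Research Program, Competitive Selection Project [2019JZX009]
Ministry of Public Security Special Project on Strengthening Policing through Science and Technology [2018GABJC03]
Key Scientific Research Project Plan of Colleges and Universities in Henan Province [20B520008]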