Abstract
Cyberbullying detection is an important research topic in cyberspace information content security and bears directly on the online safety of young people. Existing cyberbullying detection schemes suffer from scarce training samples, difficulty handling polysemous words, and unsatisfactory classification performance. To address these problems, an ELMo-TextCNN detection model is proposed. Following the idea of transfer learning, the model first uses pre-trained ELMo (embeddings from language models) to generate dynamic word vectors. This addresses the small size of cyberbullying sample sets, and because ELMo is built on a bi-directional long short-term memory (BiLSTM) network, it infers each word's vector from its context and can therefore interpret polysemous words according to context. The model then extracts text features with TextCNN (text convolutional neural network), which is well suited to short text, and finally outputs the classification result through a fully connected layer. Experimental results show that the proposed ELMo-TextCNN detection method can handle polysemy and achieves better classification performance.
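The pipeline described in the abstract (contextual word vectors, then convolution over token windows, then max-over-time pooling, then a fully connected layer) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the ELMo vectors are replaced by small random stand-in vectors, and the embedding size, kernel widths, filter count, and random untrained weights are all assumptions chosen only to keep the example readable. The function names are hypothetical.

```python
import math
import random


def conv1d_max(embeddings, filters, width):
    """Apply each filter to every window of `width` tokens, then ReLU and max-over-time pool."""
    pooled = []
    for weights, bias in filters:
        best = 0.0  # ReLU floor; also the result when the text is shorter than `width`
        for start in range(len(embeddings) - width + 1):
            # Flatten the window of token vectors into one long vector.
            window = [x for vec in embeddings[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window)) + bias
            best = max(best, activation)
        pooled.append(best)
    return pooled


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_filters(rng, count, width, dim):
    """Create `count` random (weights, bias) filters spanning `width` tokens (untrained)."""
    return [([rng.uniform(-0.5, 0.5) for _ in range(width * dim)], 0.0)
            for _ in range(count)]


def textcnn_forward(embeddings, filter_banks, fc_weights, fc_bias):
    """Concatenate pooled features from every kernel width, then apply the dense layer."""
    features = []
    for width, filters in filter_banks.items():
        features.extend(conv1d_max(embeddings, filters, width))
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)


# Assumed toy sizes; real ELMo vectors are much larger.
EMBED_DIM = 4
NUM_FILTERS = 3
WIDTHS = (2, 3)
NUM_CLASSES = 2  # e.g. bullying vs. non-bullying

rng = random.Random(0)
# Stand-in for the contextual embeddings of a 5-token message.
sentence = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(5)]
filter_banks = {w: make_filters(rng, NUM_FILTERS, w, EMBED_DIM) for w in WIDTHS}
fc_weights = [[rng.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS * len(WIDTHS))]
              for _ in range(NUM_CLASSES)]
fc_bias = [0.0] * NUM_CLASSES

probs = textcnn_forward(sentence, filter_banks, fc_weights, fc_bias)
print(probs)  # two class probabilities that sum to 1
```

In a real system the `sentence` list would come from a pre-trained contextual encoder, so the same surface word could receive different vectors in different messages; the convolution and pooling stages above would stay structurally the same.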
Authors
叶水欢
葛寅辉
陈波
于泠
Ye Shuihuan; Ge Yinhui; Chen Bo; Yu Ling (School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023; Jiangsu Key Laboratory for Numerical Simulation of Large Scale Complex Systems (Nanjing Normal University), Nanjing 210023)
Source
《信息安全研究》
CSCD
2023, No. 9, pp. 868-876 (9 pages)
Journal of Information Security Research
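Funding
National Social Science Fund of China (21BSH022)
CERNET Next Generation Internet Technology Innovation Project, Department of Science and Technology, Ministry of Education (NGIICS20190504).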