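Abstract
[Objective] To design a classification model, based on graph neural networks, for sensitive texts in online communities, in support of governing online public opinion and maintaining information security in online communities. [Methods] A heterogeneous graph is constructed by adding sensitive entities to the text and word nodes, which introduces prior knowledge about sensitive information in online public opinion. BERT is then used to capture the deep semantic information of the texts, and a graph convolutional network (GCN) is used to obtain global co-occurrence features. Combining the two gains the complementary strengths of the pre-trained model and the graph model and accommodates the structural differences between long and short texts. Finally, texts are classified according to a sensitive text classification scheme designed around the characteristics of public opinion in online communities. [Results] Extensive experiments on a self-built dataset of sensitive online public opinion texts show that the proposed model reaches an accuracy of 70.80%, at least 3.52 percentage points higher than the baseline models. [Limitations] An overly large heterogeneous graph built on a large corpus slows computation. [Conclusions] The proposed model adapts to the structural differences among sensitive texts in online communities and captures the sensitive features in the texts more effectively, which improves classification performance; it performs well on sensitive text classification.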
[Objective] This paper proposes a classification model for sensitive texts in online communities based on a graph neural network, which supports public opinion governance and information security. [Methods] First, we constructed a heterogeneous graph by adding sensitive entities to text and word nodes, which incorporates prior knowledge about sensitive information in online public opinion. Second, we adopted BERT to capture the deep semantic information of the texts and a GCN to capture global co-occurrence features. Third, we combined the complementary advantages of the pre-trained model and the graph model to accommodate the structural differences between long and short texts. Finally, we classified the texts according to a sensitive text classification scheme designed around the features of online public opinion. [Results] We examined the proposed model on a self-built dataset of sensitive online public opinion texts. The accuracy of our method reached 70.80%, at least 3.52 percentage points higher than that of the baseline models. [Limitations] Overly large heterogeneous graphs built on large corpora reduce computing speed. [Conclusions] The proposed model adapts to the structural differences among sensitive texts in online communities and classifies sensitive content effectively.
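The pipeline described in the abstract (a heterogeneous graph over text, word, and sensitive-entity nodes, a GCN over that graph, and a combination with BERT) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration only: the toy graph and its unit edge weights, the random features standing in for BERT embeddings, the two-layer GCN, and the linear interpolation weight `lam` used to combine the two predictions. The abstract does not specify how the authors weight edges or fuse the two models.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric normalization commonly used in GCNs."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def gcn_forward(a_norm, x, w1, w2):
    """Two-layer GCN: softmax(A_norm · ReLU(A_norm · X · W1) · W2)."""
    hidden = np.maximum(a_norm @ x @ w1, 0.0)
    return softmax(a_norm @ hidden @ w2)


# Hypothetical toy graph: nodes 0-1 are texts, 2-4 are words, 5 is a sensitive entity.
n_text, n_nodes, n_classes = 2, 6, 3
edges = [(0, 2), (0, 3), (0, 5), (1, 3), (1, 4), (2, 3)]
adj = np.zeros((n_nodes, n_nodes))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0                 # unit weights (assumption)

rng = np.random.default_rng(0)
features = rng.normal(size=(n_nodes, 8))        # stand-in for BERT embeddings (assumption)
w1 = rng.normal(size=(8, 4))                    # untrained weights, for shape illustration only
w2 = rng.normal(size=(4, n_classes))

a_norm = normalize_adjacency(adj)
gcn_probs = gcn_forward(a_norm, features, w1, w2)[:n_text]   # keep text nodes only
bert_probs = softmax(rng.normal(size=(n_text, n_classes)))   # stand-in for a BERT classifier

lam = 0.7                                       # interpolation weight (assumption)
combined = lam * gcn_probs + (1.0 - lam) * bert_probs
predicted = combined.argmax(axis=1)
```

Because both inputs are probability distributions, each row of `combined` is again a distribution, so `lam` only shifts how much the graph-level co-occurrence signal counts relative to the per-text semantic signal.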
Authors
高浩鑫
孙利娟
吴京宸
高宇童
吴旭
Gao Haoxin; Sun Lijuan; Wu Jingchen; Gao Yutong; Wu Xu (Key Laboratory of Trustworthy Distributed Computing and Service, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China; Beijing University of Posts and Telecommunications Library, Beijing 100876, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Source
《数据分析与知识发现》
EI
CSCD
Peking University Core Journals (北大核心)
2023, Issue 11, pp. 26-36 (11 pages)
Data Analysis and Knowledge Discovery
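Funding
This work is one of the research outcomes of the National Natural Science Foundation of China Major Program (Grant No. 72293583)
and the China Postdoctoral Science Foundation General Program (Grant No. 2022M710463).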
Keywords
Graph Convolutional Network
Sensitive Text Classification
Heterogeneous Graph
BERT