Online Sensitive Text Classification Model Based on Heterogeneous Graph Convolutional Network (Cited by: 1)
Abstract [Objective] This paper designs a classification model, based on a graph neural network, for sensitive texts in online communities, to support the governance of online public opinion and the information security of online communities. [Methods] Sensitive entities are added alongside the text and word nodes to construct a heterogeneous graph, which introduces prior knowledge about sensitive information in online public opinion. BERT then captures the deep semantic information of each text, and a graph convolutional network (GCN) captures global co-occurrence features. Combining the two exploits the complementary strengths of the pre-trained model and the graph model and accommodates the structural differences between long and short texts. Finally, texts are classified according to a sensitive text classification scheme designed around the characteristics of public opinion in online communities. [Results] Extensive experiments on a self-built dataset of sensitive online public opinion texts show that the proposed model reaches an accuracy of 70.80%, at least 3.52 percentage points higher than the baseline models. [Limitations] A heterogeneous graph built on a large corpus can become too large and slow down computation. [Conclusions] The proposed model adapts to the structural differences among sensitive texts in online communities and captures sensitive features in the text more effectively, which improves performance on sensitive text classification.
Authors Gao Haoxin (高浩鑫); Sun Lijuan (孙利娟); Wu Jingchen (吴京宸); Gao Yutong (高宇童); Wu Xu (吴旭). Affiliations: Key Laboratory of Trustworthy Distributed Computing and Service, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China; Beijing University of Posts and Telecommunications Library, Beijing 100876, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in EI, CSCD, and the Peking University Core Journals list; 2023, No. 11, pp. 26-36 (11 pages)
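Funding This work is one of the research outputs of the Major Program of the National Natural Science Foundation of China (Grant No. 72293583) and the General Program of the China Postdoctoral Science Foundation (Grant No. 2022M710463).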
Keywords Graph Convolutional Network; Sensitive Text Classification; Heterogeneous Graph; BERT
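
The graph construction described in the Methods part of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give edge weights, features, or the way BERT and GCN outputs are combined, so everything below is an assumption. The function names (`build_hetero_graph`, `normalize`, `propagate`), the toy sensitive-entity lexicon, the co-occurrence counting, and the one-hot features are all hypothetical; in a fuller model the document features would presumably come from BERT.

```python
from collections import defaultdict
from itertools import combinations
import math


def build_hetero_graph(docs, sensitive_entities):
    """Build a graph with document, word, and sensitive-entity nodes.

    docs: list of token lists.
    sensitive_entities: set of tokens treated as sensitive entities
    (a hypothetical lexicon standing in for prior knowledge).
    Returns (nodes, adj) where adj is a dense symmetric weight matrix.
    """
    nodes, index = [], {}

    def node_id(label):
        if label not in index:
            index[label] = len(nodes)
            nodes.append(label)
        return index[label]

    edges = defaultdict(float)

    def add_edge(a, b, w=1.0):
        edges[(a, b)] += w
        edges[(b, a)] += w

    for i, tokens in enumerate(docs):
        d = node_id(("doc", i))
        ids = []
        for tok in sorted(set(tokens)):
            kind = "entity" if tok in sensitive_entities else "word"
            t = node_id((kind, tok))
            add_edge(d, t)          # document-term edge
            ids.append(t)
        for a, b in combinations(ids, 2):
            add_edge(a, b)          # co-occurrence count (assumed weighting)

    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for (a, b), w in edges.items():
        adj[a][b] = w
    return nodes, adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by GCN layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(norm_adj, features):
    """One propagation step (norm_adj @ features), without weights or activation."""
    n, k = len(norm_adj), len(features[0])
    return [[sum(norm_adj[i][m] * features[m][c] for m in range(n))
             for c in range(k)] for i in range(n)]


# Toy usage: two tokenized texts and a one-item sensitive lexicon.
docs = [["a", "b"], ["b", "c"]]
nodes, adj = build_hetero_graph(docs, sensitive_entities={"c"})
norm = normalize(adj)
one_hot = [[1.0 if i == j else 0.0 for j in range(len(nodes))]
           for i in range(len(nodes))]
hidden = propagate(norm, one_hot)
```

Because the graph holds every document and term as a node, the dense matrix grows quadratically with corpus size, which is consistent with the limitation the abstract notes for large corpora; a sparse representation would be the usual remedy.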