
An Active Semi-Supervised Short Text Classification Method Based on Federated Learning (基于联邦学习的主动半监督短文本分类方法)
Abstract: Short-text classification is widely used and is an active research topic. However, its performance is limited by the scarcity of annotated short-text data and by data-privacy constraints that make centralized training impractical. To address these issues, we propose Fed-ASSL-HGAT (Active Semi-Supervised Learning empowered Heterogeneous Graph ATtention network model based on Federated learning). The model uses a novel active semi-supervised learning (ASSL) framework to generate high-quality labeled samples that empower the heterogeneous graph attention network (HGAT) model, and it introduces federated learning to jointly train the models deployed on different nodes, thereby meeting data-privacy requirements. The proposed ASSL framework greatly reduces annotation difficulty by converting the multi-class annotation task of active learning into a binary annotation task. To prevent information loss, a selection strategy based on information gain is used to filter soft and hard labels. Semi-supervised learning is then used to select positive and negative samples with high accuracy and high stability for pseudo-labeling, which ensures labeling quality. Experimental results show that the proposed ASSL-HGAT(S) model improves F1 scores over the HGAT baseline by 2.45%, 8.11%, and 7.46% on the AGNews, Snippets, and TagMyNews datasets, respectively. With federated learning incorporated, the Fed-ASSL-HGAT model meets the performance requirements without leaking private data.
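The abstract does not give the exact rule used to pick "high accuracy, high stability" samples for pseudo-labeling. As a rough illustration only, the sketch below assumes one plausible reading: a sample is kept when its predicted class is identical across several model checkpoints (stability) and its predicted probability stays above a threshold at every checkpoint (a confidence proxy for accuracy). The function name `select_pseudo_labels`, the parameter `conf_threshold`, and the `prob_history` layout are hypothetical and do not come from the paper.

```python
from typing import Dict, Sequence


def select_pseudo_labels(
    prob_history: Sequence[Sequence[Sequence[float]]],
    conf_threshold: float = 0.9,
) -> Dict[int, int]:
    """Pick unlabeled samples that are safe to pseudo-label (illustrative rule).

    prob_history[t][i][c] is the predicted probability of class c for
    unlabeled sample i at checkpoint t. A sample is selected when
      * its argmax class is the same at every checkpoint (stability), and
      * the probability of that class is >= conf_threshold at every
        checkpoint (confidence).
    Returns a mapping {sample index: pseudo label}.
    """
    n_samples = len(prob_history[0])
    selected: Dict[int, int] = {}
    for i in range(n_samples):
        preds = []
        confs = []
        for probs_t in prob_history:
            p = probs_t[i]
            # Index of the most probable class at this checkpoint.
            c = max(range(len(p)), key=p.__getitem__)
            preds.append(c)
            confs.append(p[c])
        if len(set(preds)) == 1 and min(confs) >= conf_threshold:
            selected[i] = preds[0]
    return selected


# Two checkpoints, three unlabeled samples, binary (negative/positive) classes.
history = [
    [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]],
    [[0.97, 0.03], [0.45, 0.55], [0.85, 0.15]],
]
pseudo = select_pseudo_labels(history, conf_threshold=0.9)
```

In this toy input only sample 0 is both stable and confident, so it alone receives a pseudo-label; samples 1 and 2 flip class between checkpoints and are left unlabeled.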
Authors: KONG De-yan (孔德焱); JI Zhen-yan (冀振燕); YANG Yan-yan (杨燕燕); LIU Yang (刘洋); LIU Ji-qiang (刘吉强) (School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, School of Cyberspace Science and Technology, Beijing Jiaotong University, Beijing 100044, China)
Source: Acta Electronica Sinica (《电子学报》), 2024, No. 10, pp. 3517-3526 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (No. 52175493, No. 51935002).
Keywords: heterogeneous graph neural network; active learning; semi-supervised learning; federated learning