Abstract
Transfer learning that relies on a single strategy often fails to transfer knowledge effectively to the target domain and is prone to negative transfer. To address this, a text classification method based on mixed transfer learning is proposed. The method first performs sample transfer, using the distance between samples as the criterion for sample similarity, to expand the target-domain samples. It then uses model transfer to build a deep text classification network with data distribution adaptation, and finally trains the network on the expanded target-domain dataset. In the experiments, different pre-trained models were used to verify the effectiveness of the method; among them, MT2CERNIE had the best predictive performance, with an accuracy of 0.884, a recall of 0.890, and an F1 score of 0.878. The results show that the proposed method can, to a certain extent, alleviate problems such as insufficient labeled samples and negative transfer.
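The sample-transfer step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the feature representation, or the selection rule, so everything below is an assumption made for illustration: samples are plain feature vectors, the distance is Euclidean, and a source sample is kept when its nearest target sample lies within a hypothetical `threshold`. The function name `select_source_samples` is likewise hypothetical and is not taken from the paper.

```python
import math


def select_source_samples(source_X, target_X, threshold):
    """Return indices of source samples close enough to the target domain.

    Assumption: similarity is measured by Euclidean distance to the
    nearest target sample; the paper's actual criterion may differ.
    """
    keep = []
    for i, s in enumerate(source_X):
        # Distance from this source sample to its nearest target sample.
        nearest = min(math.dist(s, t) for t in target_X)
        if nearest <= threshold:
            keep.append(i)
    return keep


# Toy data: three source samples, two labeled target samples.
source_X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
source_y = [0, 1, 1]
target_X = [[0.1, 0.0], [0.9, 1.2]]
target_y = [0, 1]

keep = select_source_samples(source_X, target_X, threshold=0.5)

# Expand the target-domain set with the selected source samples.
expanded_X = target_X + [source_X[i] for i in keep]
expanded_y = target_y + [source_y[i] for i in keep]
```

In this toy case the distant source sample `[5.0, 5.0]` is left out, which is the kind of filtering that could help limit negative transfer; the expanded set would then be used to train the classification network.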
Authors
张合欢
陈致君
杨顶
ZHANG Hehuan; CHEN Zhijun; YANG Ding (College of Computer and Information, China Three Gorges University, Yichang 443002, China)
Source
《长江信息通信》
2022, No. 5, pp. 54-57 (4 pages)
Changjiang Information & Communications
Keywords
transfer learning
pre-trained model
domain
data distribution
text classification