Abstract
[Objective] This paper builds a multi-label classification model based on two-stage transfer learning. The model addresses two problems in existing models: multi-label data is difficult to sample, and cross-domain transfer learning has few common features to draw on. [Methods] We propose a two-stage transfer learning model that follows the path "general domain → single-label data in the target domain → multi-label data". The model is first trained on the general domain. It is then transferred to target-domain single-label data, balanced with an oversampling method, and fine-tuned there. Finally, it is transferred to the multi-label data to perform multi-label classification. [Results] We used image annotation in the medical literature as a case study. The empirical results show that the proposed model performs well on both image and text multi-label classification tasks, and it improves the F1 score by more than 50% over the one-stage transfer learning model. [Limitations] Further research is needed on how to select the best base model and sampling method for different tasks. [Conclusions] This study can serve as a reference for research on annotating, retrieving, and using big data in domains where datasets are limited.
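The abstract mentions two concrete techniques: balancing the single-label data by oversampling, and evaluating multi-label output with F1. It does not say which oversampling variant or which F1 averaging the authors used. The sketch below therefore assumes plain random oversampling and micro-averaged F1. The function names `random_oversample` and `micro_f1` are hypothetical illustrations, not the paper's code.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Assumed plain random oversampling: duplicate randomly chosen samples of
    each minority class until every class matches the largest class's count."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw (target - current count) extra copies with replacement.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over multi-label outputs, one label set per sample:
    true positives, false positives and false negatives are pooled first."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    xs, ys = ["a1", "a2", "a3", "b1"], ["A", "A", "A", "B"]
    bx, by = random_oversample(xs, ys)
    print(Counter(by))  # each class now has 3 samples
    print(micro_f1([{"x", "y"}, {"z"}], [{"x"}, {"z", "y"}]))
```

Random oversampling only duplicates existing samples, so it can encourage overfitting to the repeated minority examples. The paper's limitations note that choosing the sampling method per task is still an open question.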
Authors
陆泉
何超
陈静
田敏
刘婷
Lu Quan; He Chao; Chen Jing; Tian Min; Liu Ting (Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China; Big Data Research Institute, Wuhan University, Wuhan 430072, China; School of Information Management, Central China Normal University, Wuhan 430079, China)
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
Peking University Core Journal (北大核心)
2021, No. 7, pp. 91-100 (10 pages)
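Funding
Innovative Research Group Project of the National Natural Science Foundation of China (Grant No. 71921002)
This paper is one of the research outcomes of the 2020 construction project of the National Secrecy School, Wuhan University.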