
Multi-Label Zero-Shot Classification Based on Deep Mutual Learning
Abstract: Many methods have been proposed for zero-shot image classification, but multi-label zero-shot image classification has received little attention. Beyond the labeled dataset and the given prior knowledge, existing solutions use either image region information or label semantic information during training, but not both. Building on deep mutual learning, this study proposes a method that uses both kinds of information at once. Two sub-networks are designed. Sub-network 1 enhances the visual features of the image: a multi-head self-attention mechanism relates the features of different image regions to produce a region-based visual feature representation, which is then mapped into the semantic space to output a predicted probability distribution. Sub-network 2 fuses label semantic information with image visual features: it computes the correlation between labels and image region features to produce a semantic-based visual feature representation, which is likewise mapped into the semantic space to output a probability distribution. Finally, deep mutual learning is introduced: the probability distributions produced by the two sub-networks serve as training experience for each other, so each sub-network learns from its peer while training its own classifier, which effectively improves multi-label zero-shot image classification performance. Experimental results show that on the MS COCO dataset the proposed method improves the F1 score by 5.2 percentage points over the Deep0Tag method.
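The mutual learning step described in the abstract can be illustrated with a small sketch. The abstract does not state the exact loss, so everything here is an assumption: each sub-network is taken to emit one logit per label, the supervised term is binary cross-entropy, and the peer term is a per-label Bernoulli KL divergence from the other sub-network's probabilities. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def _clip(p, eps=1e-7):
    """Keep probabilities away from 0 and 1 so the logs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def bce(probs, targets):
    """Mean binary cross-entropy over labels (assumed supervised loss)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = _clip(p)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def bernoulli_kl(p_peer, p_self):
    """Mean per-label KL(peer || self), treating each label as a Bernoulli.

    Assumption: this stands in for the mimicry term of deep mutual learning.
    """
    total = 0.0
    for q, p in zip(p_peer, p_self):
        q, p = _clip(q), _clip(p)
        total += q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
    return total / len(p_self)


def mutual_learning_losses(logits1, logits2, targets, weight=1.0):
    """Return (loss for sub-network 1, loss for sub-network 2).

    Each loss is the network's own classification loss plus a term that
    pulls its predictions toward the peer's. In a real training loop the
    peer probabilities would be treated as constants (no gradient).
    """
    p1 = [sigmoid(z) for z in logits1]
    p2 = [sigmoid(z) for z in logits2]
    loss1 = bce(p1, targets) + weight * bernoulli_kl(p2, p1)
    loss2 = bce(p2, targets) + weight * bernoulli_kl(p1, p2)
    return loss1, loss2


if __name__ == "__main__":
    # Toy example: three labels, two sub-networks that disagree slightly.
    targets = [1, 0, 1]
    print(mutual_learning_losses([2.0, -1.0, 0.5], [1.0, -2.0, 0.0], targets))
```

When the two sub-networks agree exactly, the peer term vanishes and each loss reduces to plain cross-entropy; the more they disagree, the more each is pulled toward the other's prediction.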
Authors: YUAN Zhixiang; WANG Yaqing; HUANG Jun (School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, Anhui, China)
Source: Computer Engineering (《计算机工程》), 2023, No. 10, pp. 64-71 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
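Funding: National Natural Science Foundation of China (61806005); Key Scientific Research Project of Anhui Provincial Universities (KJ2021A0372, KJ2021A0373); Support Program for Outstanding Young Talents in Anhui Provincial Universities (gxyqZD2022032).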
Keywords: deep learning; image classification; multi-label learning; zero-shot learning; mutual learning
