
Region-Aware Fashion Contrastive Learning for Unified Attribute Recognition and Composed Retrieval

Abstract: Clothing attribute recognition has become an essential technology that enables users to automatically identify the characteristics of clothes and to search for clothing images with similar attributes. However, existing methods cannot recognize newly added attributes and may fail to capture region-level visual features. To address these issues, a region-aware fashion contrastive language-image pre-training (RaF-CLIP) model is proposed. The model aligns cropped and segmented images with category texts and multiple fine-grained attribute texts, matching fashion regions to their corresponding texts through contrastive learning. Clothing retrieval finds suitable clothing from user-specified clothing categories and attributes; to further improve retrieval accuracy, an attribute-guided composed network (AGCN) is introduced as an additional component on top of RaF-CLIP, designed specifically for composed image retrieval. This task aims to modify a reference image according to a textual expression so as to retrieve the expected target. By adopting a transformer-based bidirectional attention and gating mechanism, the network fuses image features with attribute text features and selects between them. Experimental results show that the proposed model achieves a mean precision of 0.6633 on the attribute recognition task and a recall@10 of 39.18 on the composed image retrieval task, where recall@k denotes the percentage of queries whose correct sample appears in the top k retrieval results. The model thus satisfies users' need to search freely for clothing through images and texts.
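The abstract does not spell out the training objective, but the region-text matching described for RaF-CLIP reads as a CLIP-style symmetric contrastive (InfoNCE) loss over paired region crops and category/attribute texts. Below is a minimal sketch under that assumption; the function name, the embedding inputs, and the temperature value are illustrative, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss matching N region crops to N paired texts.

    region_emb, text_emb: (N, D) embeddings from the image and text encoders.
    Diagonal pairs are positives; all other pairs in the batch are negatives.
    """
    region_emb = F.normalize(region_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = region_emb @ text_emb.t() / temperature   # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_r2t = F.cross_entropy(logits, targets)        # region -> text
    loss_t2r = F.cross_entropy(logits.t(), targets)    # text -> region
    return (loss_r2t + loss_t2r) / 2
```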
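Likewise, the "transformer-based bidirectional attention and gating mechanism" of AGCN suggests cross-attention in each direction (image attending to text, text attending to image) followed by a learned gate that selects between the two streams. The module below is a hypothetical sketch under that reading; the class name, mean pooling, dimensions, and sigmoid gate design are all assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class GatedBidirectionalFusion(nn.Module):
    """Illustrative fusion block: bidirectional cross-attention plus a gate."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, img_tokens, txt_tokens):
        # Image tokens attend to the attribute text, and vice versa.
        img_ctx, _ = self.img2txt(img_tokens, txt_tokens, txt_tokens)
        txt_ctx, _ = self.txt2img(txt_tokens, img_tokens, img_tokens)
        # Pool each stream, then let a sigmoid gate decide, per dimension,
        # how much of each modality to keep in the fused query embedding.
        img_vec = img_ctx.mean(dim=1)
        txt_vec = txt_ctx.mean(dim=1)
        g = self.gate(torch.cat([img_vec, txt_vec], dim=-1))
        return g * img_vec + (1 - g) * txt_vec
```

In composed image retrieval, a fused embedding of this kind would serve as the query against the gallery of target-image embeddings.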
Authors: WANG Kangping, ZHAO Mingbo (College of Information Science and Technology, Donghua University, Shanghai 201620, China)
Source: Journal of Donghua University (English Edition), CAS, 2024, No. 4, pp. 405-415 (11 pages)
Funding: National Natural Science Foundation of China (No. 61971121).
Keywords: attribute recognition; image retrieval; contrastive language-image pre-training (CLIP); image-text matching; transformer