Masked Vision-language Transformer in Fashion (cited by: 1)

Abstract: We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply replace the bidirectional encoder representations from Transformers (BERT) in the pre-training model with the vision transformer architecture, making MVLT the first end-to-end framework for the fashion domain. In addition, we design masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an extensible and convenient architecture that admits raw multi-modal inputs without extra pre-processing models (e.g., ResNet), implicitly modeling the vision-language alignments. More importantly, MVLT can easily generalize to various matching and generative tasks. Experimental results show obvious improvements in retrieval (rank@5: 17%) and recognition (accuracy: 3%) tasks over the Fashion-Gen 2018 winner, Kaleido-BERT. The code is available at https://github.com/GewelsJI/MVLT.

Authors: Ge-Peng Ji, Mingchen Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc Van Gool

Affiliations: International Core Business Unit; Computer Vision Lab

Source: Machine Intelligence Research (机器智能研究（英文版）), indexed in EI and CSCD, 2023, Issue 3, pp. 421-434 (14 pages)

Keywords: vision-language; masked image reconstruction; transformer; fashion; e-commercial

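Classification codes: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP391.1 [Automation and Computer Technology: Computer Application Technology]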

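Citation network

Co-cited literature (1)

1. Wu Youzheng, Li Haoran, Yao Ting, He Xiaodong. A survey of frontiers in multimodal information processing: applications, fusion, and pre-training [J]. Journal of Chinese Information Processing, 2022, 36(5): 1-20. Cited by: 18

Citing literature (1)

1. Guo Ruifeng, Wei Jingxuan, Yu Bihui, Sun Linzhuang. MCM-ICE: a multimodal classification model combining independent encoding and collaborative encoding [J]. Journal of Chinese Computer Systems, 2024, 45(9): 2080-2086.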

