Abstract
Text-based cross-modal person retrieval aims to retrieve images of the target identity from a gallery of pedestrian images given a natural-language description of the person. To address the insufficient discriminability of text features, this paper proposes a method that combines a BERT model with a Text-CNN network in the text branch to enhance the discriminability of text features. The method uses BERT as the word-embedding tool and applies a Text-CNN network to further extract text features. In addition to global matching, the method also considers the influence of local features on retrieval. The proposed method is experimentally verified on the CUHK-PEDES dataset, and the results demonstrate its effectiveness and superiority.
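The text branch described above feeds BERT token embeddings into a Text-CNN, which convolves windows of several sizes over the token sequence and max-pools over time. A minimal numpy sketch of that Text-CNN stage is shown below; the 768-dimensional embeddings stand in for BERT-base output, and all names, sizes, and random weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def text_cnn(embeddings, kernels, kernel_sizes=(2, 3, 4)):
    """Convolve windows of several sizes over the token sequence,
    apply ReLU, max-pool over time, and concatenate the results."""
    pooled = []
    seq_len, emb_dim = embeddings.shape
    for k, W in zip(kernel_sizes, kernels):
        # W: (k * emb_dim, num_filters) — one weight matrix per window size
        windows = np.stack([embeddings[i:i + k].reshape(-1)
                            for i in range(seq_len - k + 1)])  # (L-k+1, k*emb_dim)
        feats = np.maximum(windows @ W, 0.0)                   # ReLU activation
        pooled.append(feats.max(axis=0))                       # max-pool over time
    return np.concatenate(pooled)  # (len(kernel_sizes) * num_filters,)

rng = np.random.default_rng(0)
emb_dim, num_filters = 768, 4                   # 768 matches BERT-base hidden size
tokens = rng.standard_normal((12, emb_dim))     # stand-in for BERT token embeddings
kernels = [rng.standard_normal((k * emb_dim, num_filters)) * 0.01
           for k in (2, 3, 4)]
text_feature = text_cnn(tokens, kernels)
print(text_feature.shape)  # (12,) — 3 window sizes × 4 filters
```

The resulting fixed-length vector could then be matched against an image feature, e.g. by cosine similarity, for both the global and local branches mentioned in the abstract.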
Author
MO Chengjian (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source
Video Engineering (《电视技术》), 2022, No. 4, pp. 25-30 (6 pages)
Keywords
cross-modality
feature extraction
local features