Funding: Supported by the National Key R&D Program of China (No. 2018YFB1308000) and the Natural Science Foundation of Zhejiang Province (No. LY21F030018).
Abstract: Pedestrian attribute recognition in surveillance scenarios remains challenging because specific attributes are difficult to localize accurately. In this paper, we propose a novel view-attribute localization method based on attention (VALA), which uses view information to guide recognition toward specific attributes and an attention mechanism to localize the areas corresponding to those attributes. Concretely, a view prediction branch uses view information to generate four view weights that represent the confidences for attributes from different views. The view weights are then fed back to compose specific view-attributes, which participate in and supervise deep feature extraction. To explore the spatial location of a view-attribute, regional attention is introduced to aggregate spatial information and encode inter-channel dependencies of the view feature. A fine attentive attribute-specific region is then localized, and the regional attention yields regional weights for the view-attribute at different spatial locations. The final view-attribute recognition result is obtained by combining the view weights with the regional weights. Experiments on three widely used datasets, richly annotated pedestrian (RAP), richly annotated pedestrian v2 (RAPv2), and PA-100K, demonstrate the effectiveness of our approach compared with state-of-the-art methods.
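The abstract above says the final result combines four view weights with regional weights but does not give the fusion rule. The following is a minimal sketch under the assumption that the view weights are a softmax over four view logits and that fusion is a weighted sum of per-view attribute scores. The function names `softmax` and `combine_view_and_region` and the input layout are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_view_and_region(view_logits, regional_scores):
    """Fuse per-view attribute scores using view confidences.

    view_logits: four raw scores, assumed to come from a view
        prediction branch (one per view).
    regional_scores: regional_scores[v][a] is the score of attribute a
        under view v, assumed to come from a regional attention branch.
    Returns one fused score per attribute (weighted sum over views).
    This fusion rule is an assumption, not the paper's stated formula.
    """
    view_weights = softmax(view_logits)
    num_attrs = len(regional_scores[0])
    return [
        sum(w * scores[a] for w, scores in zip(view_weights, regional_scores))
        for a in range(num_attrs)
    ]
```

With equal view logits every view contributes a quarter of its score. When one view logit dominates, the fused scores approach that view's attribute scores, which matches the idea of view confidence steering the prediction.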
Funding: Supported by the National Natural Science Foundation of China (41874173).
Abstract: Pedestrian attribute recognition is often treated as a multi-label image classification task. To make full use of attribute-related location information, a saliency-guided self-attention network (SGSA-Net) was proposed to weakly supervise attribute localization without annotations of attribute-related regions. Saliency priors were integrated into the spatial attention module (SAM), and both channel-wise attention and spatial attention were introduced into the network. Moreover, a weighted binary cross-entropy loss (WCEL) function was employed to handle the imbalance of training data. Extensive experiments on the richly annotated pedestrian (RAP) and pedestrian attribute (PETA) datasets demonstrated that SGSA-Net outperformed other state-of-the-art methods.
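The abstract above names a weighted binary cross-entropy loss for handling label imbalance but does not state the weighting. As a rough illustration, the sketch below assumes one commonly used scheme in which a positive label is weighted by exp(1 - r) and a negative label by exp(r), where r is the fraction of training samples that carry the attribute. The function name `weighted_bce` and this weighting are assumptions and may differ from the WCEL used in SGSA-Net.

```python
import math


def weighted_bce(probs, targets, pos_ratios, eps=1e-7):
    """Mean weighted binary cross-entropy over the attributes of one sample.

    probs: predicted probability per attribute, in [0, 1].
    targets: ground-truth label per attribute, 0 or 1.
    pos_ratios: fraction of positive training samples per attribute.
    The exp-based weights are an assumed scheme: rare positives
    (small ratio) receive a larger weight than common ones.
    """
    total = 0.0
    for p, y, r in zip(probs, targets, pos_ratios):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -math.exp(1.0 - r) * math.log(p)
        else:
            total += -math.exp(r) * math.log(1.0 - p)
    return total / len(probs)
```

Under this weighting, missing a rare attribute (for example r = 0.1) costs more than missing a common one (r = 0.9) at the same predicted probability, which is the intended effect of rebalancing.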