Abstract
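Objective Existing image recognition methods perform well when the training and test data are drawn from the same distribution, but they do not carry over to practical scenarios, where recognition accuracy drops. Domain adaptation, which addresses data from two related domains with different distributions, is an effective way to solve this kind of problem. Method Based on an analysis of the data distributions, we propose a joint balanced adaptation method based on attention transfer, which transfers image features extracted from labeled source-domain data to the unlabeled target domain. First, an attention transfer mechanism transfers the spatial category information of the labeled source-domain data to the unlabeled target domain; by defining the attention of a convolutional neural network, the attended information is used to improve image recognition accuracy. Second, a prior distribution over the network parameters is introduced on the basis of the target dataset, and the network is given the ability to adjust automatically the degree of feature alignment in each domain alignment layer. Finally, cross-domain biases are used to describe the input distribution of the domain-specific feature alignment layers, quantitatively expressing the degree of domain adaptation learned at each layer. Result The method achieves an average recognition accuracy of 77.6% on the Office-31 dataset and 90.7% on the Office-Caltech dataset. It substantially outperforms traditional handcrafted-feature methods and achieves recognition performance comparable to the current best methods. Conclusion The joint balanced domain adaptation method based on attention transfer not only obtains high recognition accuracy but also learns the degree of feature alignment between domains automatically. It also verifies that inter-domain feature transfer can improve network optimization.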
Objective Many image recognition methods perform well when the training and test data are drawn from the same distribution. However, these methods are unsuitable in practical scenarios, where this condition rarely holds, and recognition performance drops as a result. Domain adaptation is an effective approach to this problem: it addresses cases in which data come from two related domains with different distributions. In practical applications, labeling data requires substantial manual labor, so unsupervised learning has become a clear trend in image recognition. Transfer learning can extract knowledge from labeled data in the source domain and transfer it to the unlabeled target domain. Method We propose a joint balanced adaptation method based on an attention transfer mechanism, which transfers feature representations extracted from the labeled datasets in the source domain to the unlabeled datasets in the target domain. Specifically, we first transfer the spatial category information of the labeled source domain to the unlabeled target domain via the attention transfer mechanism. Neural networks reflect basic characteristics of the human brain, and attention is an important part of human visual experience that is closely related to perception. Artificial attention mechanisms have been developed as artificial neural networks have become increasingly popular in fields such as computer vision and pattern recognition. Allowing a system to learn which objects to attend to has also become a research tool for understanding the mechanisms behind neural networks. By defining the attention of convolutional neural networks (CNNs), attention information can be used to improve image recognition accuracy significantly. In this study, attention is treated as a set of spatial maps that encode the spatial regions of the input on which the network focuses most when determining its output.
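The abstract does not give the formula for these spatial maps. As a minimal sketch, assuming the common activation-based definition (sum the p-th power of absolute activations over channels, then L2-normalize the map), attention transfer can be illustrated in plain Python. The function names `attention_map` and `attention_transfer_loss`, the exponent `p`, and the squared-distance loss are illustrative assumptions, not the authors' implementation.

```python
import math


def attention_map(activations, p=2):
    """Collapse a C x H x W activation tensor (nested lists) into an H x W map.

    Assumed definition: each location gets sum_c |a_c|^p, and the resulting
    map is L2-normalized so that maps from different layers are comparable.
    """
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    raw = [[sum(abs(activations[c][i][j]) ** p for c in range(channels))
            for j in range(width)]
           for i in range(height)]
    norm = math.sqrt(sum(v * v for row in raw for v in row))
    if norm == 0.0:
        return raw  # all-zero activations: nothing to normalize
    return [[v / norm for v in row] for row in raw]


def attention_transfer_loss(source_act, target_act, p=2):
    """Squared L2 distance between the normalized source and target maps."""
    src = attention_map(source_act, p)
    tgt = attention_map(target_act, p)
    return sum((s - t) ** 2
               for s_row, t_row in zip(src, tgt)
               for s, t in zip(s_row, t_row))


# Source features respond at the top-left corner, target at the bottom-right.
source_act = [[[1.0, 0.0], [0.0, 0.0]],
              [[1.0, 0.0], [0.0, 0.0]]]
target_act = [[[0.0, 0.0], [0.0, 3.0]]]
loss = attention_transfer_loss(source_act, target_act)
```

Because the channel dimension is summed out, the two feature tensors may have different channel counts as long as their spatial sizes match. Minimizing such a loss would pull the target network's focus toward the regions the source network attends to.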
Second, we introduce a prior distribution over the network parameters on the basis of the target dataset and endow the alignment layers with the capability of automatically learning the degree of alignment that should be pursued at different levels of the network. We aim to explore abundant source-domain attributes through cross-domain learning and to capture complex cross-domain knowledge by embedding cross-dataset information, so that the original loss functions of the learning tasks in the two domains are minimized as far as possible. Traditional machine learning recognizes refined features only after raw data have been preprocessed into features on the basis of human prior knowledge. Because recognition results depend on the quality of the features, machine learning experts have spent most of their time in past years designing features. Recent breakthroughs in object recognition have mainly been achieved by approaches based on deep CNNs, whose feature extraction and image representation capabilities are more powerful than those of manually defined features such as HOG and SIFT. The higher a network layer is, the more specific its features are to the target categorization task. Meanwhile, the features on successive layers interact with each other in a complex and fragile way, so the neurons in neighboring layers co-adapt during training. Therefore, the transferability of features and classifiers decreases as the cross-domain difference increases. Finally, we describe the input distribution of the domain-specific adaptive alignment layers by introducing cross-domain biases, thereby quantitatively indicating the degree of inter-domain adaptation that each layer learns. Meanwhile, we adaptively change the weight of each category in the dataset. A deep CNN is a unified training and prediction framework that combines multi-level feature extractors and recognizers, so end-to-end processing is particularly important.
The design of our model fully exploits the capability of CNNs to perform end-to-end processing. Result The average recognition accuracies of the method on the Office-31 and Office-Caltech datasets are 77.6% and 90.7%, respectively. The method thus significantly outperforms traditional methods based on handcrafted features and is comparable with state-of-the-art methods. Although not every individual transfer task achieves the best result, the average recognition accuracy over the six transfer tasks improves on current mainstream methods. Conclusion Transferring image features extracted from labeled data in the source domain to the unlabeled target domain effectively addresses the problem of data from two domains that are related but differently distributed. The method fully utilizes the spatial location information of the labeled source-domain data through the attention transfer mechanism and uses a deep CNN to learn the degree of feature alignment between domains automatically. Learning ability largely depends on the degree of inter-domain correlation, which is a major limitation of transfer learning. In addition, knowledge transfer is clearly ineffective if no similarity exists between the domains. We therefore fully consider the feature correlation between the source and target datasets and adaptively change the weight of each category in the dataset. Our method not only obtains high recognition accuracy but also automatically learns the degree of feature alignment between domains. It also verifies that inter-domain feature transfer can improve network optimization.
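The abstract does not state how the alignment layers compute their statistics. One plausible reading, sketched here under that assumption, is a batch-normalization-style layer in which each domain is normalized with a mixture of its own statistics and the other domain's. The mixing coefficient `alpha`, its range [0.5, 1], and the helper names `mixed_stats` and `align` are hypothetical. In a real network `alpha` would be a learned parameter per layer, whereas it is fixed by hand here.

```python
import math


def mixed_stats(own, other, alpha):
    """Mean and variance of the mixture alpha * own + (1 - alpha) * other."""
    mean = alpha * sum(own) / len(own) + (1 - alpha) * sum(other) / len(other)
    var = (alpha * sum((x - mean) ** 2 for x in own) / len(own)
           + (1 - alpha) * sum((x - mean) ** 2 for x in other) / len(other))
    return mean, var


def align(source, target, alpha, eps=1e-5):
    """Normalize each domain with its own cross-domain mixture statistics.

    alpha = 1.0: every domain uses only its own statistics (full alignment).
    alpha = 0.5: both domains share the same statistics (no alignment).
    """
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0.5, 1]")
    s_mean, s_var = mixed_stats(source, target, alpha)
    t_mean, t_var = mixed_stats(target, source, alpha)
    src_out = [(x - s_mean) / math.sqrt(s_var + eps) for x in source]
    tgt_out = [(x - t_mean) / math.sqrt(t_var + eps) for x in target]
    return src_out, tgt_out


# One scalar feature observed on a small batch from each domain.
source = [0.0, 2.0]
target = [10.0, 14.0]
full_src, full_tgt = align(source, target, alpha=1.0)
shared_src, shared_tgt = align(source, target, alpha=0.5)
```

With `alpha=1.0` both outputs are centered at zero, so the domain shift in this feature is removed. With `alpha=0.5` the shift survives normalization. Under this reading, the value a layer settles on indicates how much alignment that layer has learned to apply.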
Authors
汪荣贵
姚旭晨
杨娟
薛丽霞
Wang Ronggui; Yao Xuchen; Yang Juan; Xue Lixia (School of Computer and Information, Hefei University of Technology, Hefei 230601, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journals (北大核心)
2019, No. 7, pp. 1116-1125 (10 pages)
Journal of Image and Graphics
Keywords
transfer learning (迁移学习)
domain adaptation (领域自适应)
attention mechanism (注意力机制)
unsupervised learning (无监督学习)
image recognition (图像识别)
convolutional neural networks (卷积神经网络)