Abstract
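Objective Remote sensing image processing plays an important role in crop planning, vegetation detection, and agricultural land monitoring. However, crop remote sensing images suffer from class imbalance, and in some samples the similarity between different crop classes is high while the variation within a class is large, which makes semantic segmentation of crop remote sensing images more challenging. To address these problems, a crop remote sensing image semantic segmentation network that fuses class relations at different scales, CRNet (class relation network), is proposed. Method The network uses ResNet-34 as the encoder backbone to extract image features and adopts a feature pyramid structure to fuse high-level semantic features with low-level spatial information, strengthening the network's handling of image details. A class relation module is introduced to capture class relations at different scales. It uses a new class feature enhancement (CFE) attention mechanism that combines channel attention with a spatial attention that strengthens position information, so that the semantic differences between crop classes and the correlation within each crop class both increase. In the decoder, the class relations at different scales are fused, which improves the network's ability to recognize crop features at different scales and thereby raises the accuracy of crop boundary segmentation. Data preprocessing, data augmentation, and a class-balanced loss (CB loss) further alleviate the class imbalance in crop remote sensing images. Result Experiments on the Barley Remote Sensing dataset show that CRNet reaches a mean intersection over union (MIoU) of 68.89% and an overall accuracy (OA) of 82.59%, and that it outperforms PSPNet (pyramid scene parsing network), FPN (feature pyramid network), LinkNet, DeepLabv3+, FarSeg (foreground-aware relation network), and STLNet (statistical texture learning network) in both evaluation metrics and visual results. Conclusion Through the class relation module, CRNet distinguishes similar but different crops more precisely against the complex ground-object background of remote sensing images and recognizes crops of the same type whose features differ widely. By fusing multi-level features it also extracts clearer and more complete target boundaries, which improves segmentation accuracy.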
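The abstract names a class-balanced loss (CB loss) but does not give its formula. As a rough illustration only, the sketch below assumes the commonly used "effective number of samples" weighting, where a class with n samples receives a weight proportional to (1 - beta) / (1 - beta^n). The function name `class_balanced_weights`, the value of `beta`, and the pixel counts are hypothetical and are not taken from the paper.

```python
def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class loss weights from the effective-number-of-samples heuristic.

    Assumption: weight_c is proportional to (1 - beta) / (1 - beta ** n_c).
    The abstract does not confirm that CRNet uses exactly this form.
    """
    weights = []
    for n in samples_per_class:
        if n <= 0:
            raise ValueError("every class needs at least one sample")
        effective_num = (1.0 - beta ** n) / (1.0 - beta)
        weights.append(1.0 / effective_num)
    # Normalise so that the weights sum to the number of classes.
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]


# Hypothetical counts: a dominant background class and four rarer classes.
counts = [50000, 4000, 1500, 300, 200]
weights = class_balanced_weights(counts)
for n, w in zip(counts, weights):
    print(f"samples={n:6d}  weight={w:.3f}")
```

The weights would typically be multiplied into a per-pixel cross-entropy (or focal) loss, so that rare crop classes contribute more to the gradient than the dominant background class.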
Objective Remote sensing image processing plays an important role in crop planning, vegetation detection, and agricultural land monitoring. Semantic segmentation of crop remote sensing images aims to classify each pixel and to partition the image into regions with different semantic labels. Compared with natural scenes, it is challenging in two respects. 1) The number of samples varies greatly between categories and the distribution is extremely unbalanced: background samples far outnumber the samples of the remaining categories, which leads to overfitting and poor robustness during network training. 2) Different crops can look very similar, which makes them hard for the network to tell apart, while the same crop can look quite different from sample to sample, which causes it to be misclassified. We develop a semantic segmentation network for crop remote sensing images, called the class relation network (CRNet), which integrates class relations at multiple scales. Our experiments are carried out on the Barley Remote Sensing dataset from the Tianchi Big Data Competition. Because the dataset consists of 4 large, high-resolution remote sensing images, they cannot be fed directly into a neural network, so each image is first cut into sub-images of 512×512 pixels. After cutting, the dataset contains 11 750 sub-images, with 9 413 in the training set and 2 337 in the test set, a ratio of about 4∶1. Method CRNet is composed of three parts: an encoder that is a variant of the feature pyramid network, a class relation module, and a decoder.
1) In the encoder, ResNet-34 is used as the backbone network to extract image features stage by stage along the bottom-up pathway, which helps the network handle image details. As in the original feature pyramid structure, a top-down pathway with lateral connections fuses high-level semantic features with low-level spatial information. 2) The class relation module consists of three parallel branches. The three levels of features output by the encoder each pass through a 1×1 convolutional layer that reduces the channel dimension to 5. This 1×1 convolutional layer can be regarded as a classifier that maps the global features onto 5 channels, one per classification category, so that each channel represents the features of one category. The feature map of each level is then fed into the class feature enhancement (CFE) attention mechanism, which consists of a channel attention part and a spatial attention part. The channel attention assigns a weight to each category by learning the correlations among the channel features. To sharpen the distinction between categories, it strengthens strongly correlated features and suppresses weakly correlated ones. The channel information is aggregated over the spatial dimensions through global average pooling and global max pooling, and the global context is modeled to obtain a global feature for each channel. The spatial attention strengthens the location information of crops, such as where a crop lies within the farmland. By learning spatial information along the horizontal and vertical directions, it relates each position to the other positions in the same row and column of the feature map. With the CFE attention module, the features of different categories become more distinct, which further widens the feature differences between crops.
At the same time, the features of the same category gain more context information, which helps to reduce misclassification of the same crop. 3) In the decoder, the class relations at different scales are fused and restored to the input resolution, and the final classification is made by fully combining the feature information from every scale. In addition, we use data augmentation to reduce the proportion of background samples and to expand the number of samples of the other categories. To further alleviate the class imbalance in crop remote sensing images, a class-balanced loss (CB loss) is introduced. Result To verify the effectiveness of CRNet, the trained model is tested on the Barley Remote Sensing dataset, where it achieves a mean intersection over union (MIoU) of 68.89% and an overall accuracy (OA) of 82.59%. Compared with LinkNet, the pyramid scene parsing network (PSPNet), DeepLabv3+, the foreground-aware relation network (FarSeg), the statistical texture learning network (STLNet), and the feature pyramid network (FPN), CRNet improves MIoU by 7.42%, 4.86%, 4.57%, 4.36%, 4.05%, and 3.63%, respectively, and improves OA by 4.35%, 2.6%, 3.01%, 2.5%, 2.45%, and 1.85%, respectively. The number of parameters and the inference speed of CRNet are 21.98 MB and 68 frames/s. Compared with LinkNet and FPN, CRNet reports an increased number of parameters and inference speed, and its MIoU and OA are 7.42% and 4.35% higher than those of LinkNet and 3.63% and 1.85% higher than those of FPN. Conclusion By combining multi-level features with the class relation module, CRNet distinguishes similar crops more accurately and identifies crops of the same type against the complex ground-object background of remote sensing images, and it extracts more complete target boundaries. The experiments show that CRNet outperforms the compared semantic segmentation methods on crop remote sensing images.
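The two reported metrics, MIoU and OA, can both be derived from a confusion matrix. The following is a minimal sketch using the standard definitions (per-class IoU = TP / (TP + FP + FN), OA = correctly classified pixels / all pixels). The helper names and the toy labels are illustrative and do not come from the paper's evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts pixels with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all pixels that were classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def mean_iou(matrix):
    """Mean of per-class IoU over classes that occur in labels or predictions."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy flattened label maps with three classes (illustrative values only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
print(f"OA={oa:.4f}  MIoU={miou:.4f}")
```

Because MIoU averages over classes while OA averages over pixels, MIoU is the more sensitive of the two to errors on rare crop classes, which is consistent with the gap between the 68.89% MIoU and 82.59% OA reported above.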
Authors
董荣胜
马雨琪
刘意
李凤英
Dong Rongsheng; Ma Yuqi; Liu Yi; Li Fengying (Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journals
2022, Issue 11, pp. 3382-3394 (13 pages)
Journal of Image and Graphics
Funding
National Natural Science Foundation of China (62062029, 61762024)
Guangxi Natural Science Foundation (2017GXNSFDA198050)