To address ship detection in optical remote sensing images while improving detection accuracy and reducing model complexity, a two-stage rotated-box detection model based on an improved Rotational Region Convolutional Neural Network (R²CNN) is designed. In the first stage, horizontal boxes are used as region proposals; in the second stage, a horizontal-box prediction branch is introduced, together with a regression model that predicts the rotation angle indirectly. For rotated-box non-maximum suppression at test time, a rotated-box IoU (Intersection over Union) algorithm based on a mask matrix is designed. Experimental results show that the improved R²CNN achieves 81.0% average precision on the HRSC2016 (High Resolution Ship Collection 2016) dataset, outperforming the compared models to varying degrees, indicating that the improved R²CNN effectively improves rotated-box ship detection while simplifying the model.
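The mask-based rotated IoU can be illustrated as follows: rasterize both rotated rectangles onto a shared binary grid and take the ratio of overlapping to covered pixels. This is a minimal sketch of the idea only; the box format (cx, cy, w, h, angle in degrees), the grid scale, and the use of OpenCV for rasterization are assumptions, not the paper's implementation.

import numpy as np
import cv2  # OpenCV, used here only to rasterize the rotated rectangles


def rotated_iou_mask(box_a, box_b, scale=4):
    """Approximate IoU of two rotated boxes (cx, cy, w, h, angle_deg)
    by rasterizing both onto a shared binary mask and counting pixels."""
    # Corner points of each rotated rectangle
    pts_a = cv2.boxPoints(((box_a[0], box_a[1]), (box_a[2], box_a[3]), box_a[4]))
    pts_b = cv2.boxPoints(((box_b[0], box_b[1]), (box_b[2], box_b[3]), box_b[4]))

    # Shift and scale both boxes into a common local grid to bound the mask size
    all_pts = np.concatenate([pts_a, pts_b], axis=0)
    origin = all_pts.min(axis=0)
    extent = np.ceil((all_pts.max(axis=0) - origin) * scale).astype(int) + 1

    mask_a = np.zeros((extent[1], extent[0]), dtype=np.uint8)
    mask_b = np.zeros_like(mask_a)
    cv2.fillPoly(mask_a, [np.round((pts_a - origin) * scale).astype(np.int32)], 1)
    cv2.fillPoly(mask_b, [np.round((pts_b - origin) * scale).astype(np.int32)], 1)

    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

The scale factor trades accuracy for memory: a finer grid approximates the exact polygon IoU more closely at a higher rasterization cost.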
Convolutional neural networks, which have achieved outstanding performance in image recognition, have been extensively applied to action recognition. The mainstream approaches to video understanding can be categorized into two-dimensional and three-dimensional convolutional neural networks. Although three-dimensional convolutional filters can learn the temporal correlation between different frames by extracting the features of multiple frames simultaneously, they incur an explosive number of parameters and a high computational cost. Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships. However, computing the corresponding optical flow adds further cost and necessitates a second model to learn the flow features. We proposed an action recognition framework based on a two-dimensional convolutional neural network; it was therefore necessary to compensate for the missing temporal relationships. To expand the temporal receptive field, we proposed a multi-scale temporal shift module, combined with a temporal feature difference extraction module that extracts the differences between the features of different frames. Finally, the model was compressed to make it more compact. We evaluated our method on two major action recognition benchmarks: the HMDB51 and UCF-101 datasets. Before compression, the proposed method achieved an accuracy of 72.83% on HMDB51 and 96.25% on UCF-101; after compression, the accuracy remained strong at 72.19% and 95.57%, respectively. The final model was more compact than those of most related works.
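The two modules named above can be sketched in a few lines of PyTorch. The channel grouping, the shift offsets (±1 and ±2 frames for the multi-scale shift), and the zero-padding behavior are illustrative assumptions rather than the authors' exact design.

import torch


def multi_scale_temporal_shift(x, shifts=(1, -1, 2, -2), fold_div=8):
    """x: (N, T, C, H, W) clip features. Shifts C // fold_div channels per
    offset along the time axis, leaving the remaining channels untouched,
    which widens the temporal receptive field at zero parameter cost."""
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = x.clone()
    for i, s in enumerate(shifts):
        lo, hi = i * fold, (i + 1) * fold
        # Zero-pad instead of torch.roll so frames never wrap around the clip
        shifted = torch.zeros_like(x[:, :, lo:hi])
        if s > 0:
            shifted[:, s:] = x[:, :t - s, lo:hi]
        else:
            shifted[:, :t + s] = x[:, -s:, lo:hi]
        out[:, :, lo:hi] = shifted
    return out


def temporal_feature_difference(x):
    """Difference between the features of consecutive frames,
    zero-padded at the final frame to keep the temporal length."""
    diff = x[:, 1:] - x[:, :-1]
    return torch.cat([diff, torch.zeros_like(x[:, :1])], dim=1)

Because the shift and difference operations are parameter-free, they add temporal modeling to a two-dimensional backbone without the parameter explosion of three-dimensional convolutions.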