Salient object detection aims at identifying the visually interesting object regions that are consistent with human perception. Multispectral remote sensing images provide rich radiometric information for revealing the physical properties of the observed objects, which leads to great potential to perform salient object detection for remote sensing images. Conventional salient object detection methods often employ handcrafted features to predict saliency by evaluating the pixel-wise or superpixel-wise contrast. With the recent use of deep learning frameworks, in particular fully convolutional neural networks, there has been profound progress in visual saliency detection. However, this success has not been extended to multispectral remote sensing images, and existing multispectral salient object detection methods are still mainly based on handcrafted features, essentially due to the difficulties in image acquisition and labeling. In this paper, we propose a novel deep residual network based on a top-down model, which is trained in an end-to-end manner to tackle the above issues in multispectral salient object detection. Our model effectively exploits the saliency cues at different levels of the deep residual network. To overcome the limited availability of remote sensing images for training our deep residual network, we also introduce a new spectral image reconstruction model that can generate multispectral images from RGB images. Our extensive experimental results using both multispectral and RGB salient object detection datasets demonstrate a significant performance improvement of more than 10% compared with the state-of-the-art methods.
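The abstract does not specify how the spectral image reconstruction model maps RGB images to multispectral bands; the sketch below is only a minimal linear least-squares stand-in for that idea, fitted on made-up toy data, and is not the paper's learned model.

```python
import numpy as np

def fit_rgb_to_multispectral(rgb, ms):
    """Fit a per-pixel linear mapping from RGB (N, 3) to multispectral bands (N, B).

    A least-squares stand-in for the spectral reconstruction idea; the paper's
    actual model is a learned network, not shown in the abstract.
    """
    X = np.hstack([rgb, np.ones((rgb.shape[0], 1))])   # add a bias column
    W, *_ = np.linalg.lstsq(X, ms, rcond=None)          # (4, B) mapping matrix
    return W

def apply_mapping(rgb, W):
    X = np.hstack([rgb, np.ones((rgb.shape[0], 1))])
    return X @ W                                         # reconstructed bands (N, B)

# toy example with synthetic paired samples: 1000 pixels, 8 target bands
rgb = np.random.rand(1000, 3)
ms = rgb @ np.random.rand(3, 8) + 0.01 * np.random.randn(1000, 8)
W = fit_rgb_to_multispectral(rgb, ms)
print(apply_mapping(rgb, W).shape)  # (1000, 8)
```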
Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which necessitates enormous human effort to label the images. Within this field, other research endeavors utilize weakly supervised methods. These approaches aim to reduce the expenses associated with annotation by leveraging sparsely annotated data, such as scribbles. This paper presents a novel weakly supervised network using scribble supervision and edge masks (WSSE-net). This network is a three-branch architecture, whereby each branch is equipped with a distinct decoder module dedicated to road extraction tasks. One of the branches is dedicated to generating edge masks using edge detection algorithms and optimizing road edge details. The other two branches supervise the model's training by employing scribble labels and spreading scribble information throughout the image. To address the long-standing flaw of pseudo-labels that are not updated during network training, we use mixup to blend prediction results dynamically and continually update new pseudo-labels to steer network training. Our solution demonstrates efficient operation by simultaneously considering both edge-mask aid and dynamic pseudo-label support. The studies are conducted on three separate road datasets, which consist primarily of high-resolution remote sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and certain traditional fully supervised methods.
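The abstract only states that mixup is used to blend prediction results and keep pseudo-labels updated; the sketch below illustrates that blending step under assumed details (a Beta-sampled mixing coefficient and probability maps), not the exact WSSE-net procedure.

```python
import numpy as np

def update_pseudo_labels(old_pseudo, current_pred, alpha=0.5):
    """Blend the previous pseudo-label map with the current prediction, mixup-style.

    old_pseudo, current_pred: probability maps of shape (H, W) in [0, 1].
    alpha: Beta-distribution parameter; the schedule used in WSSE-net is not
    given in the abstract, so this value is an illustrative choice.
    """
    lam = np.random.beta(alpha, alpha)
    return lam * current_pred + (1.0 - lam) * old_pseudo

pseudo = np.zeros((64, 64))
for step in range(10):                 # pseudo-labels evolve with training
    pred = np.random.rand(64, 64)      # stand-in for the network output
    pseudo = update_pseudo_labels(pseudo, pred)
```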
The frequent occurrence of extreme weather events has made landslides a global natural disaster issue. It is crucial to rapidly and accurately determine the boundaries of landslides for geohazard evaluation and emergency response. Therefore, the Skip Connection DeepLab neural network (SCDnn), a deep learning model based on 770 optical remote sensing images of landslides, is proposed to improve the accuracy of landslide boundary detection. The SCDnn model is optimized for the over-segmentation issue that occurs in conventional deep learning models when there is a significant degree of similarity between topographical and geomorphic features. SCDnn exhibits notable improvements in landslide feature extraction and semantic segmentation by combining an enhanced Atrous Spatial Pyramid Convolutional Block (ASPC) with a coding structure that reduces model complexity. The experimental results demonstrate that SCDnn can identify landslide boundaries in 119 images with MIoU values between 0.8 and 0.9, and in 52 images with MIoU values exceeding 0.9, which exceeds the identification accuracy of existing techniques. This work offers a novel technique for the automatic large-scale identification of landslide boundaries in remote sensing images and establishes the groundwork for future investigations and applications in related domains.
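The enhanced ASPC block is not specified beyond its name; the following is a plain DeepLab-style atrous spatial pyramid sketch in PyTorch, with illustrative dilation rates, showing the kind of multi-rate context aggregation involved.

```python
import torch
import torch.nn as nn

class AtrousPyramidBlock(nn.Module):
    """A plain atrous spatial pyramid block (DeepLab-style).

    The paper's enhanced ASPC is not detailed in the abstract; this sketch only
    illustrates parallel dilated convolutions fused by a 1x1 convolution.
    """
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 128, 128)
print(AtrousPyramidBlock(64, 32)(x).shape)  # torch.Size([1, 32, 128, 128])
```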
The degradation of optical remote sensing images due to atmospheric haze poses a significant obstacle, profoundly impeding their effective utilization across various domains. Dehazing methodologies have emerged as pivotal components of image preprocessing, fostering an improvement in the quality of remote sensing imagery. This enhancement makes remote sensing data more usable, thereby improving the accuracy of target identification. Conventional defogging techniques based on simplistic atmospheric degradation models have proven inadequate for mitigating non-uniform haze within remotely sensed images. In response to this challenge, a novel UNet Residual Attention Network (URA-Net) is proposed. This approach is an end-to-end convolutional neural network distinguished by its use of multi-scale dense feature fusion clusters and gated jump connections. The essence of the methodology lies in local feature fusion within dense residual clusters, enabling the extraction of pertinent features from both preceding and current local data, depending on contextual demands. The gated structures facilitate the propagation of these features to the decoder, resulting in superior outcomes in haze removal. Empirical validation through extensive experiments substantiates the efficacy of URA-Net, demonstrating its superior performance compared to existing methods when applied to established datasets for remote sensing image defogging. On the RICE-1 dataset, URA-Net achieves a Peak Signal-to-Noise Ratio (PSNR) of 29.07 dB, surpassing the Dark Channel Prior (DCP) by 11.17 dB, the All-in-One Network for Dehazing (AOD) by 7.82 dB, the Optimal Transmission Map and Adaptive Atmospheric Light for Dehazing (OTM-AAL) by 5.37 dB, the Unsupervised Single Image Dehazing (USID) by 8.0 dB, and the Superpixel-based Remote Sensing Image Dehazing (SRD) by 8.5 dB. Particularly noteworthy, on the SateHaze1k dataset, URA-Net attains the best overall performance, yielding defogged images characterized by consistent visual quality. This underscores the contribution of the research to the advancement of remote sensing technology, providing a robust and efficient solution for alleviating the adverse effects of haze on image quality.
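All the quoted gains are PSNR differences in dB; for reference, PSNR is computed as below (a standard definition, not code from the paper).

```python
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB, the metric quoted for URA-Net."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

clean = np.random.randint(0, 256, (256, 256, 3))
restored = np.clip(clean + np.random.randn(256, 256, 3) * 5, 0, 255)
print(f"PSNR: {psnr(clean, restored):.2f} dB")
```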
Cloud detection from satellite and drone imagery is crucial for applications such as weather forecasting and environmental monitoring. Addressing the limitations of conventional convolutional neural networks, we propose an innovative transformer-based method. This method leverages transformers, which are adept at processing data sequences, to enhance cloud detection accuracy. Additionally, we introduce a Cyclic Refinement Architecture that improves the resolution and quality of feature extraction, thereby aiding in the retention of critical details often lost during cloud detection. Our extensive experimental validation shows that our approach significantly outperforms established models, excelling in high-resolution feature extraction and precise cloud segmentation. By integrating Positional Visual Transformers (PVT) with this architecture, our method advances high-resolution feature delineation and segmentation accuracy. Ultimately, our research offers a novel perspective for surmounting traditional challenges in cloud detection and contributes to the advancement of precise and dependable image analysis across various domains.
Remote sensing images carry crucial ground information, often involving the spatial distribution and spatiotemporal changes of surface elements. To safeguard this sensitive data, image encryption technology is essential. In this paper, a novel Fibonacci sine exponential map is designed, the hyperchaotic performance of which is particularly suitable for image encryption algorithms. An encryption algorithm tailored for handling the multi-band attributes of remote sensing images is proposed. The algorithm combines a three-dimensional synchronized scrambled diffusion operation with chaos to efficiently encrypt multiple images. Moreover, the keys are processed using an elliptic curve cryptosystem, eliminating the need for an additional channel to transmit the keys, thus enhancing security. Experimental results and algorithm analysis demonstrate that the algorithm offers strong security and high efficiency, making it suitable for remote sensing image encryption tasks.
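The map's defining equation and the elliptic-curve key handling are not given in the abstract; the sketch below only illustrates the generic scramble-then-diffuse pattern on a single band, substituting the well-known logistic map for the Fibonacci sine exponential map.

```python
import numpy as np

def logistic_sequence(x0, n, r=3.99):
    """Chaotic keystream from the logistic map; the paper's Fibonacci sine
    exponential map is not specified in the abstract, so this is a stand-in."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

def encrypt_band(band, x0=0.3141):
    """Scramble pixel positions, then XOR-diffuse with a chaotic keystream."""
    flat = band.flatten()
    chaos = logistic_sequence(x0, flat.size)
    perm = np.argsort(chaos)                          # scrambling order
    scrambled = flat[perm]
    keystream = np.floor(chaos * 256).astype(np.uint8)
    return scrambled ^ keystream, perm

def decrypt_band(cipher, perm, shape, x0=0.3141):
    chaos = logistic_sequence(x0, cipher.size)
    keystream = np.floor(chaos * 256).astype(np.uint8)
    scrambled = cipher ^ keystream
    flat = np.empty_like(scrambled)
    flat[perm] = scrambled                            # undo the scrambling
    return flat.reshape(shape)

band = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cipher, perm = encrypt_band(band)
assert np.array_equal(decrypt_band(cipher, perm, band.shape), band)
```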
With the arrival of new data acquisition platforms derived from the Internet of Things (IoT), this paper goes beyond the understanding of traditional remote sensing technologies. The deep fusion of remote sensing and computer vision has reached the industrial world and makes it possible to apply artificial intelligence to problems such as automatic information extraction and image interpretation. However, due to the complex architecture of IoT and the lack of a unified security protection mechanism, devices in remote sensing are vulnerable to privacy leaks when sharing data. It is necessary to design a security scheme suitable for computation-limited devices in IoT, since traditional encryption methods are based on computational complexity. Visual Cryptography (VC) is a threshold scheme for images that can be decoded directly by the human visual system when superimposing encrypted images. The stacking-to-see feature and simple Boolean decryption operation make VC an ideal solution for privacy-preserving recognition of large-scale remote sensing images in IoT. In this study, the secure and efficient transmission of high-resolution remote sensing images by meaningful VC is achieved. By diffusing the error between the encryption block and the original block to adjacent blocks, the degradation of quality in recovered images is mitigated. By fine-tuning the pre-trained model from large-scale datasets, we improve the recognition performance of small encryption datasets for remote sensing images. The experimental results show that the proposed lightweight privacy-preserving recognition framework maintains high recognition performance while enhancing security.
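The core stacking-to-see property can be shown with a toy (2, 2) XOR scheme; the meaningful-share construction and block-wise error diffusion used in the paper are not reproduced here, so this is only an assumption-laden illustration.

```python
import numpy as np

def make_shares(secret):
    """(2, 2) XOR-based visual cryptography: either share alone looks random;
    stacking (XOR-ing) the two shares recovers the binary secret image.

    This sketch omits the meaningful-share construction and block error diffusion
    described in the paper; it only shows the stacking-to-see principle.
    """
    share1 = np.random.randint(0, 2, secret.shape, dtype=np.uint8)
    share2 = share1 ^ secret
    return share1, share2

secret = (np.random.rand(32, 32) > 0.5).astype(np.uint8)
s1, s2 = make_shares(secret)
assert np.array_equal(s1 ^ s2, secret)   # decryption is a simple Boolean operation
```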
The exploration of building detection plays an important role in urban planning, smart cities and the military. Aiming at the problem of the high overlapping ratio of detection frames for dense building detection in high-resolution remote sensing images, we present an effective YOLOv3 framework, corner regression-based YOLOv3 (Correg-YOLOv3), to localize dense buildings accurately. This improved YOLOv3 algorithm establishes a vertex regression mechanism and an additional loss item for building vertex offsets relative to the center point of the bounding box. By extending the output dimensions, the trained model is able to output the rectangular bounding boxes and the building vertices simultaneously. Finally, we evaluate the performance of Correg-YOLOv3 on our self-produced dataset and provide a comparative analysis qualitatively and quantitatively. The experimental results achieve high performance in precision (96.45%), recall rate (95.75%), F1 score (96.10%) and average precision (98.05%), which were 2.73%, 5.4%, 4.1% and 4.73% higher than those of YOLOv3. Therefore, our proposed algorithm effectively tackles the problem of dense building detection in high-resolution images.
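The reported F1 score follows directly from the quoted precision and recall; the snippet below reproduces it (standard formula, not code from the paper).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, as reported for Correg-YOLOv3."""
    return 2 * precision * recall / (precision + recall)

# the reported precision (96.45%) and recall (95.75%) reproduce the F1 of ~96.10%
print(round(f1_score(0.9645, 0.9575) * 100, 2))  # 96.1
```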
Semantic segmentation methods based on CNNs have made great progress, but there are still some shortcomings in their application to remote sensing image segmentation, such as the small receptive field, which cannot effectively capture global context. In order to solve this problem, this paper proposes a hybrid model based on ResNet50 and the Swin Transformer to directly capture long-range dependence, which fuses features through a Cross Feature Modulation Module (CFMM). Experimental results on two publicly available datasets, Vaihingen and Potsdam, show mIoU values of 70.27% and 76.63%, respectively. Thus, CFM-UNet can maintain a high segmentation performance compared with other competitive networks.
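The quoted numbers are mean Intersection over Union scores; a standard per-class mIoU computation (not taken from the paper) is sketched below.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union, the metric reported for CFM-UNet."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                  # ignore classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 6, (256, 256))     # 6 classes, e.g. ISPRS-style labels
target = np.random.randint(0, 6, (256, 256))
print(mean_iou(pred, target, num_classes=6))
```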
Semantic segmentation of remote sensing images is one of the core tasks of remote sensing image interpretation. With the continuous development of artificial intelligence technology, the use of deep learning methods for interpreting remote sensing images has matured. Existing neural networks disregard the spatial relationship between two targets in remote sensing images. Semantic segmentation models that combine convolutional neural networks (CNNs) and graph convolutional neural networks (GCNs) suffer from a lack of feature boundaries, which leads to the unsatisfactory segmentation of various target feature boundaries. In this paper, we propose a new semantic segmentation model for remote sensing images (called DGCN hereinafter), which combines deep semantic segmentation networks (DSSN) and GCNs. In the GCN module, a loss function for boundary information is employed to optimize the learning of spatial relationship features between targets. A hierarchical fusion method is utilized for feature fusion and classification to optimize the spatial relationship information in the original feature information. Extensive experiments on the ISPRS 2D and DeepGlobe semantic segmentation datasets show that, compared with existing semantic segmentation models for remote sensing images, DGCN significantly optimizes the segmentation of feature boundaries, effectively reduces the noise in the segmentation results and improves the segmentation accuracy, which demonstrates the advancements of our model.
A method to remove stripes from remote sensing images is proposed based on statistics and a new image enhancement method. The overall processing steps for improving the quality of remote sensing images are introduced to provide a general baseline. Due to the differences in satellite sensors when producing images, subtle but inherent stripes can appear at the stitching positions between the sensors. These stitching stripes cannot be eliminated by conventional relative radiometric calibration. The inherent stitching stripes cause difficulties in downstream tasks such as the segmentation, classification and interpretation of remote sensing images. Therefore, a method to remove the stripes based on statistics and a new image enhancement approach are proposed in this paper. First, the inconsistency in grayscales around stripes is eliminated with the statistical method. Second, the pixels within stripes are weighted and averaged based on updated pixel values to enhance the uniformity of the overall image radiation quality. Finally, the details of the images are highlighted by a new image enhancement method, which makes the whole image clearer. Comprehensive experiments are performed, and the results indicate that the proposed method outperforms the baseline approach in terms of visual quality and radiation correction accuracy.
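The statistical step is described only at a high level; the sketch below shows column-wise moment matching, a common statistical destriping idea, purely as an illustration and not as the paper's exact procedure.

```python
import numpy as np

def destripe_columns(img):
    """Moment matching along columns: force each column's mean/std toward the
    image's global statistics. A generic statistical destriping sketch, not the
    paper's exact method, which also weights and averages pixels inside stripes.
    """
    img = img.astype(np.float64)
    col_mean = img.mean(axis=0)
    col_std = img.std(axis=0) + 1e-8
    g_mean, g_std = img.mean(), img.std()
    return (img - col_mean) / col_std * g_std + g_mean

scene = np.random.rand(512, 512) * 100
scene[:, 200:204] += 25          # simulate a vertical stitching stripe
print(destripe_columns(scene).std(axis=0)[200:204])  # column statistics equalized
```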
Recently, the convolutional neural network (CNN) has been dominant in studies on interpreting remote sensing images (RSI). However, it appears that training optimization strategies have received less attention in relevant research. To evaluate this problem, the author proposes a novel algorithm named the Fast Training CNN (FST-CNN). To verify the algorithm's effectiveness, twenty methods, including six classic models and thirty architectures from previous studies, are included in a performance comparison. The overall accuracy (OA) trained by the FST-CNN algorithm on the same model architecture and dataset is treated as an evaluation baseline. Results show that there is a maximal OA gap of 8.35% between the FST-CNN and those methods in the literature, which means a 10% margin in performance. Meanwhile, all those complex roadmaps, e.g., deep feature fusion, model combination, model ensembles, and human feature engineering, are not as effective as expected. This reveals that there was systemic suboptimal performance in the previous studies. Most of the CNN-based methods proposed in the previous studies show a consistent mistake, which has made the model's accuracy lower than its potential value. The most important reasons seem to be the inappropriate training strategy and the shift in data distribution introduced by data augmentation (DA). As a result, most of the performance evaluation was conducted based on an inaccurate, suboptimal, and unfair result. This has made most of the previous research findings questionable to some extent. However, all these confusing results also demonstrate the effectiveness of FST-CNN. This novel algorithm is model-agnostic and can be employed on any image classification model to potentially boost performance. In addition, the results also show that a standardized training strategy is indeed very meaningful for the research tasks of RSI-SC.
The remote sensing ships' fine-grained classification technology makes it possible to identify certain ship types in remote sensing images, and it has broad application prospects in civil and military fields. However, current models do not examine the properties of ship targets in remote sensing images with mixed multi-granularity features and a complicated backdrop, and there is still room to improve the classification results. To solve the challenges brought by the above characteristics, this paper proposes a Metaformer and Residual fusion network based on the Visual Attention Network (VAN-MR) for fine-grained classification tasks. For the complex background of remote sensing images, the VAN-MR model adopts the parallel structure of large kernel attention and spatial attention to enhance the model's feature extraction ability for targets of interest and improve the classification performance on remote sensing ship targets. For the problem of multi-grained feature mixing in remote sensing images, the VAN-MR model uses a Metaformer structure and a parallel network of residual modules to extract ship features. The parallel network has different depths, considering both high-level and low-level semantic information. The model achieves better classification performance on remote sensing ship images with multi-granularity mixing. Finally, the model achieves 88.73% and 94.56% accuracy on the public fine-grained ship collection-23 (FGSC-23) and FGSCR-42 datasets, respectively, while the parameter size is only 53.47 M and the floating-point operations are 9.9 G. The experimental results show that the classification effect of VAN-MR is superior to that of traditional CNN models and visual models with a Transformer structure under the same parameter quantity.
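Large kernel attention is typically written in the decomposed form introduced with the Visual Attention Network; the PyTorch sketch below uses that standard form with illustrative kernel sizes, since the exact VAN-MR configuration is not given in the abstract.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Large kernel attention in the decomposed form popularized by VAN:
    depth-wise conv + depth-wise dilated conv + point-wise conv, then a
    gating multiplication with the input. The kernel sizes used in VAN-MR
    are not given in the abstract, so these values are illustrative.
    """
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return attn * x                               # attention acts as a gate

x = torch.randn(1, 64, 56, 56)
print(LargeKernelAttention(64)(x).shape)              # torch.Size([1, 64, 56, 56])
```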
Hyperspectral remote sensing/imaging spectroscopy is a novel approach to obtaining a spectrum at each location of a large array of spatial positions, so that several spectral wavelengths are utilized for making coherent images. Hyperspectral remote sensing involves the acquisition of digital images from several narrow, contiguous spectral bands throughout the visible, Thermal Infrared (TIR), Near Infrared (NIR), and Mid-Infrared (MIR) regions of the electromagnetic spectrum. For application to agricultural regions, remote sensing approaches are studied and executed for their benefit of continuous and quantitative monitoring. In particular, hyperspectral images (HSI) are considered precise for agriculture as they can offer chemical and physical data on vegetation. With this motivation, this article presents a novel Hurricane Optimization Algorithm with Deep Transfer Learning Driven Crop Classification (HOADTL-CC) model for hyperspectral remote sensing images. The presented HOADTL-CC model focuses on the identification and categorization of crops in hyperspectral remote sensing images. To accomplish this, the presented HOADTL-CC model involves the design of HOA with a capsule network (CapsNet) model for generating a set of useful feature vectors. Besides, an Elman neural network (ENN) model is applied to allot proper class labels to the input HSI. Finally, the glowworm swarm optimization (GSO) algorithm is exploited to fine-tune the ENN parameters involved in this article. The experimental results of the HOADTL-CC method are tested with the help of a benchmark dataset and assessed under distinct aspects. Extensive comparative studies stated the enhanced performance of the HOADTL-CC model over recent approaches with a maximum accuracy of 99.51%.
To address the issue of imbalanced detection performance and detection speed in current mainstream object detection algorithms for optical remote sensing images, this paper proposes a multi-scale object detection model for remote sensing images on complex backgrounds, called DI-YOLO, based on You Only Look Once v7-tiny (YOLOv7-tiny). Firstly, to enhance the model's ability to capture irregular-shaped objects and deformation features, as well as to extract high-level semantic information, deformable convolutions are used to replace standard convolutions in the original model. Secondly, a Content Coordination Attention Feature Pyramid Network (CCA-FPN) structure is designed to replace the Neck part of the original model, which can further perceive relationships between different pixels, reduce feature loss in remote sensing images, and improve the overall model's ability to detect multi-scale objects. Thirdly, an Implicitly Efficient Decoupled Head (IEDH) is proposed to increase the model's flexibility, making it more adaptable to complex detection tasks in various scenarios. Finally, the Smoothed Intersection over Union (SIoU) loss function replaces the Complete Intersection over Union (CIoU) loss function in the original model, resulting in more accurate prediction of bounding boxes and continuous model optimization. Experimental results on the High-Resolution Remote Sensing Detection (HRRSD) dataset demonstrate that the proposed DI-YOLO model outperforms mainstream target detection algorithms in terms of mean Average Precision (mAP) for optical remote sensing image detection. Furthermore, it achieves 138.9 Frames Per Second (FPS), meeting fast and accurate detection requirements.
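Both CIoU and SIoU build on the plain box IoU; the sketch below shows only that base overlap term (the angle, distance, and shape penalties that distinguish SIoU are not reproduced), as a reference point for the loss-function swap.

```python
def box_iou(a, b):
    """Plain IoU between two boxes given as (x1, y1, x2, y2).

    CIoU and SIoU both start from this overlap term; SIoU additionally adds
    angle, distance and shape penalties that are not shown here.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

print(box_iou((10, 10, 50, 50), (30, 30, 70, 70)))   # ~0.143 for two 40x40 boxes
```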
Preserving biodiversity and maintaining ecological balance is essential in current environmental conditions. It is challenging to determine vegetation using traditional map classification approaches. The primary issue in detecting vegetation patterns is that they appear with complex spatial structures and similar spectral properties. Multiple spectral analyses are increasingly in demand for improving the accuracy of vegetation mapping through remotely sensed images. The proposed framework is developed with the idea of ensembling three effective strategies to produce a robust architecture for vegetation mapping. The architecture comprises three approaches, a feature-based approach, a region-based approach, and a texture-based approach, for classifying the vegetation area. The novel Deep Meta Fusion Model (DMFM) is created with a unique fusion framework of residual stacking of convolution layers with unique covariate features (UCF), intensity features (IF), and colour features (CF). The overhead issues in GPU utilization during convolutional neural network (CNN) models are reduced here with a lightweight architecture. The system considers detailed feature areas to improve classification accuracy and reduce processing time. The proposed DMFM model achieved 99% accuracy, with a maximum processing time of 130 s. The training, testing, and validation losses are reduced to a significant level, which shows the performance quality of the DMFM model. The system acts as a standard analysis platform for dynamic datasets since all three different features, unique covariate features (UCF), intensity features (IF), and colour features (CF), are considered very well.
Based on low-altitude remote sensing images, this paper established a sample set of typical river vegetation elements and proposed a river vegetation extraction technical solution to adaptively extract typical vegetation elements of river basins. The main research of this paper was as follows: (1) a typical vegetation extraction sample set based on low-altitude remote sensing images was established. (2) A low-altitude remote sensing image vegetation extraction model based on the focus perception module was designed to realize the end-to-end automatic extraction of different types of vegetation areas in low-altitude remote sensing images and to fully learn the spectral, spatial and texture information and deep semantic information of the images. (3) Compared with the baseline method, the baseline method with the embedded focus perception module showed an improvement in precision of 7.37% and in mIoU of 49.49%. Through visual interpretation and quantitative calculation analysis, the typical river vegetation adaptive extraction network has effectiveness and generalization ability, consistent with the needs of practical applications of vegetation extraction.
The primary objective of this research is to delineate potential groundwater recharge zones in the Kadaladi taluk of Ramanathapuram, Tamil Nadu, India, using a combination of remote sensing and Geographic Information Systems (GIS) with the Analytical Hierarchical Process (AHP). Various factors such as geology, geomorphology, soil, drainage density, lineament density, slope and rainfall were analyzed at a specific scale. Thematic layers were evaluated for quality and relevance using Saaty's scale, and then integrated using the weighted linear combination technique. The weights assigned to each layer and its features were standardized using AHP and the eigenvector technique, resulting in the final groundwater potential zone map. The AHP method was used to normalize the scores following the assignment of weights to each criterion or factor based on Saaty's 9-point scale. Pair-wise matrix analysis was utilized to calculate the geometric mean and normalized weight for the various parameters. The groundwater recharge potential zone map was created by mathematically overlaying the normalized weighted layers. Thematic layers indicating major elements influencing groundwater occurrence and recharge were derived from satellite images. Results indicate that approximately 21.8 km² of the total area exhibits high potential for groundwater recharge. Groundwater recharge is viable in areas with moderate slopes, particularly in the central and southeastern regions.
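The pair-wise matrix, geometric mean and normalized weights described here correspond to the standard AHP calculation; the sketch below reproduces that calculation on a hypothetical three-factor matrix (the study's actual judgments are not given in the abstract).

```python
import numpy as np

def ahp_weights(pairwise):
    """AHP weights by the geometric-mean method.

    pairwise: a reciprocal matrix of Saaty 1-9 judgments. The example matrix
    below is illustrative, not the study's actual judgments.
    """
    gm = np.prod(pairwise, axis=1) ** (1.0 / pairwise.shape[0])   # row geometric means
    weights = gm / gm.sum()                                       # normalized weights
    lam_max = np.mean((pairwise @ weights) / weights)             # principal eigenvalue estimate
    ci = (lam_max - pairwise.shape[0]) / (pairwise.shape[0] - 1)  # consistency index
    return weights, ci

# hypothetical 3-factor comparison: geology vs. slope vs. rainfall
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, ci = ahp_weights(A)
print(w.round(3), round(ci, 4))
```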
River bank erosion is a natural process that occurs when the water flow of a river exceeds the bank's ability to withstand it. It is a common phenomenon that causes extensive land damage, displacement of people, loss of crops, and infrastructure damage. The Gorai River, situated on the right bank of the Ganges, is a significant branch of the river that flows into the Bay of Bengal via the Mathumati and Baleswar rivers. The erosion of the banks of the Gorai River in Kushtia district is not a recent occurrence. Local residents have been dealing with this issue for the past hundred years, and according to the elderly members of the community, the erosion has become more severe. Therefore, the main objective of this research is to quantify river bank erosion and accretion and bankline shifting from 2003 to 2022 using multi-temporal Landsat image data with GIS and remote sensing techniques. Bankline migration occurs as a result of the interplay and interconnectedness of various factors, such as the degree of river-related processes (erosion, transportation, and deposition), the amount of water in the river during the high season, the geological and soil makeup, and human intervention in the river. The results show that the highest eroded area was 4.6 square kilometers during the period 2016 to 2019, while the highest accreted area was 7.12 square kilometers during the period 2013 to 2016. However, the erosion and accretion values fluctuated from year to year.
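Quantifying erosion and accretion from multi-temporal Landsat classifications amounts to differencing water masks and converting pixel counts to area; a generic sketch of that step (30 m Landsat pixels assumed, random masks as stand-ins) follows.

```python
import numpy as np

def erosion_accretion_km2(mask_t1, mask_t2, pixel_size_m=30.0):
    """Erosion and accretion areas between two binary water masks (1 = water).

    Landsat pixels are 30 m; erosion = land that became water, accretion = water
    that became land. A generic change-detection sketch of the GIS workflow.
    """
    px_km2 = (pixel_size_m / 1000.0) ** 2
    erosion = np.logical_and(mask_t1 == 0, mask_t2 == 1).sum() * px_km2
    accretion = np.logical_and(mask_t1 == 1, mask_t2 == 0).sum() * px_km2
    return erosion, accretion

t1 = np.random.randint(0, 2, (1000, 1000))
t2 = np.random.randint(0, 2, (1000, 1000))
print(erosion_accretion_km2(t1, t2))
```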
In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic with the challenges of overfitting and difficulty of image and text alignment. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC to jointly represent vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object detection in optical remote sensing images (DIOR) dataset with manually annotated Chinese and English contents. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and abundant bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation by using a pre-trained Chinese language model. The experiments are carried out with various baselines to validate VLCA on the proposed dataset. The results demonstrate that the proposed algorithm is more descriptive and informative than existing algorithms in producing captions.
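The cross-modal attention that aligns the two modalities can be sketched with a single attention layer in which text tokens query image features; the dimensions and single-layer design below are illustrative assumptions rather than the VLCA architecture itself.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Text tokens query image features, the basic alignment operation behind
    cross-modal attention. Dimensions and the single-layer design are
    illustrative assumptions; the paper's full captioning model is not shown.
    """
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_tokens, image_patches):
        # query = language, key/value = vision
        out, _ = self.attn(text_tokens, image_patches, image_patches)
        return out

text = torch.randn(2, 20, 256)     # batch of 20 word embeddings
image = torch.randn(2, 49, 256)    # 7x7 grid of visual features
print(CrossModalAttention()(text, image).shape)   # torch.Size([2, 20, 256])
```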
基金National 1000 Young Talents Plan of ChinaNational Natural Science Foundation of China(61420106007,61671387,61871325)DECRA of Australica Resenrch Council (DE140100180).
文摘alient object detection aims at identifying the visually interesting object regions that are consistent with human perception. Multispectral remote sensing images provide rich radiometric information in revealing the physical properties of the observed objects, which leads to great potential to perform salient object detection for remote sensing images. Conventional salient object detection methods often employ handcrafted features to predict saliency by evaluating the pixel-wise or superpixel-wise contrast. With the recent use of deep learning framework, in particular, fully convolutional neural networks, there has been profound progress in visual saliency detection. However, this success has not been extended to multispectral remote sensing images, and existing multispectral salient object detection methods are still mainly based on handcrafted features, essentially due to the difficulties in image acquisition and labeling. In this paper, we propose a novel deep residual network based on a top-down model, which is trained in an end-to-end manner to tackle the above issues in multispectral salient object detection. Our model effectively exploits the saliency cues at different levels of the deep residual network. To overcome the limited availability of remote sensing images in training of our deep residual network, we also introduce a new spectral image reconstruction model that can generate multispectral images from RGB images. Our extensive experimental results using both multispectral and RGB salient object detection datasets demonstrate a significant performance improvement of more than 10% improvement compared with the state-of-the-art methods.
基金the National Natural Science Foundation of China(42001408,61806097).
文摘Significant advancements have been achieved in road surface extraction based on high-resolution remote sensingimage processing. Most current methods rely on fully supervised learning, which necessitates enormous humaneffort to label the image. Within this field, other research endeavors utilize weakly supervised methods. Theseapproaches aim to reduce the expenses associated with annotation by leveraging sparsely annotated data, such asscribbles. This paper presents a novel technique called a weakly supervised network using scribble-supervised andedge-mask (WSSE-net). This network is a three-branch network architecture, whereby each branch is equippedwith a distinct decoder module dedicated to road extraction tasks. One of the branches is dedicated to generatingedge masks using edge detection algorithms and optimizing road edge details. The other two branches supervise themodel’s training by employing scribble labels and spreading scribble information throughout the image. To addressthe historical flaw that created pseudo-labels that are not updated with network training, we use mixup to blendprediction results dynamically and continually update new pseudo-labels to steer network training. Our solutiondemonstrates efficient operation by simultaneously considering both edge-mask aid and dynamic pseudo-labelsupport. The studies are conducted on three separate road datasets, which consist primarily of high-resolutionremote-sensing satellite photos and drone images. The experimental findings suggest that our methodologyperforms better than advanced scribble-supervised approaches and specific traditional fully supervised methods.
基金supported by the National Natural Science Foundation of China(Grant Nos.42090054,41931295)the Natural Science Foundation of Hubei Province of China(2022CFA002)。
文摘The frequent occurrence of extreme weather events has rendered numerous landslides to a global natural disaster issue.It is crucial to rapidly and accurately determine the boundaries of landslides for geohazards evaluation and emergency response.Therefore,the Skip Connection DeepLab neural network(SCDnn),a deep learning model based on 770 optical remote sensing images of landslide,is proposed to improve the accuracy of landslide boundary detection.The SCDnn model is optimized for the over-segmentation issue which occurs in conventional deep learning models when there is a significant degree of similarity between topographical geomorphic features.SCDnn exhibits notable improvements in landslide feature extraction and semantic segmentation by combining an enhanced Atrous Spatial Pyramid Convolutional Block(ASPC)with a coding structure that reduces model complexity.The experimental results demonstrate that SCDnn can identify landslide boundaries in 119 images with MIoU values between 0.8and 0.9;while 52 images with MIoU values exceeding 0.9,which exceeds the identification accuracy of existing techniques.This work can offer a novel technique for the automatic extensive identification of landslide boundaries in remote sensing images in addition to establishing the groundwork for future inve stigations and applications in related domains.
基金This project is supported by the National Natural Science Foundation of China(NSFC)(No.61902158).
文摘The degradation of optical remote sensing images due to atmospheric haze poses a significant obstacle,profoundly impeding their effective utilization across various domains.Dehazing methodologies have emerged as pivotal components of image preprocessing,fostering an improvement in the quality of remote sensing imagery.This enhancement renders remote sensing data more indispensable,thereby enhancing the accuracy of target iden-tification.Conventional defogging techniques based on simplistic atmospheric degradation models have proven inadequate for mitigating non-uniform haze within remotely sensed images.In response to this challenge,a novel UNet Residual Attention Network(URA-Net)is proposed.This paradigmatic approach materializes as an end-to-end convolutional neural network distinguished by its utilization of multi-scale dense feature fusion clusters and gated jump connections.The essence of our methodology lies in local feature fusion within dense residual clusters,enabling the extraction of pertinent features from both preceding and current local data,depending on contextual demands.The intelligently orchestrated gated structures facilitate the propagation of these features to the decoder,resulting in superior outcomes in haze removal.Empirical validation through a plethora of experiments substantiates the efficacy of URA-Net,demonstrating its superior performance compared to existing methods when applied to established datasets for remote sensing image defogging.On the RICE-1 dataset,URA-Net achieves a Peak Signal-to-Noise Ratio(PSNR)of 29.07 dB,surpassing the Dark Channel Prior(DCP)by 11.17 dB,the All-in-One Network for Dehazing(AOD)by 7.82 dB,the Optimal Transmission Map and Adaptive Atmospheric Light For Dehazing(OTM-AAL)by 5.37 dB,the Unsupervised Single Image Dehazing(USID)by 8.0 dB,and the Superpixel-based Remote Sensing Image Dehazing(SRD)by 8.5 dB.Particularly noteworthy,on the SateHaze1k dataset,URA-Net attains preeminence in overall performance,yielding defogged images characterized by consistent visual quality.This underscores the contribution of the research to the advancement of remote sensing technology,providing a robust and efficient solution for alleviating the adverse effects of haze on image quality.
基金funded by the Chongqing Normal University Startup Foundation for PhD(22XLB021)supported by the Open Research Project of the State Key Laboratory of Industrial Control Technology,Zhejiang University,China(No.ICT2023B40).
文摘Cloud detection from satellite and drone imagery is crucial for applications such as weather forecasting and environmentalmonitoring.Addressing the limitations of conventional convolutional neural networks,we propose an innovative transformer-based method.This method leverages transformers,which are adept at processing data sequences,to enhance cloud detection accuracy.Additionally,we introduce a Cyclic Refinement Architecture that improves the resolution and quality of feature extraction,thereby aiding in the retention of critical details often lost during cloud detection.Our extensive experimental validation shows that our approach significantly outperforms established models,excelling in high-resolution feature extraction and precise cloud segmentation.By integrating Positional Visual Transformers(PVT)with this architecture,our method advances high-resolution feature delineation and segmentation accuracy.Ultimately,our research offers a novel perspective for surmounting traditional challenges in cloud detection and contributes to the advancement of precise and dependable image analysis across various domains.
基金supported by the National Natural Science Foundation of China(Grant No.91948303)。
文摘Remote sensing images carry crucial ground information,often involving the spatial distribution and spatiotemporal changes of surface elements.To safeguard this sensitive data,image encryption technology is essential.In this paper,a novel Fibonacci sine exponential map is designed,the hyperchaotic performance of which is particularly suitable for image encryption algorithms.An encryption algorithm tailored for handling the multi-band attributes of remote sensing images is proposed.The algorithm combines a three-dimensional synchronized scrambled diffusion operation with chaos to efficiently encrypt multiple images.Moreover,the keys are processed using an elliptic curve cryptosystem,eliminating the need for an additional channel to transmit the keys,thus enhancing security.Experimental results and algorithm analysis demonstrate that the algorithm offers strong security and high efficiency,making it suitable for remote sensing image encryption tasks.
基金supported in part by the National Natural Science Foundation of China under Grants(62250410365,62071084)the Guangdong Basic and Applied Basic Research Foundation of China(2022A1515011542)the Guangzhou Science and technology program of China(202201010606).
文摘With the arrival of new data acquisition platforms derived from the Internet of Things(IoT),this paper goes beyond the understanding of traditional remote sensing technologies.Deep fusion of remote sensing and computer vision has hit the industrial world and makes it possible to apply Artificial intelligence to solve problems such as automatic extraction of information and image interpretation.However,due to the complex architecture of IoT and the lack of a unified security protection mechanism,devices in remote sensing are vulnerable to privacy leaks when sharing data.It is necessary to design a security scheme suitable for computation‐limited devices in IoT,since traditional encryption methods are based on computational complexity.Visual Cryptography(VC)is a threshold scheme for images that can be decoded directly by the human visual system when superimposing encrypted images.The stacking‐to‐see feature and simple Boolean decryption operation make VC an ideal solution for privacy‐preserving recognition for large‐scale remote sensing images in IoT.In this study,the secure and efficient transmission of high‐resolution remote sensing images by meaningful VC is achieved.By diffusing the error between the encryption block and the original block to adjacent blocks,the degradation of quality in recovery images is mitigated.By fine‐tuning the pre‐trained model from large‐scale datasets,we improve the recognition performance of small encryption datasets for remote sensing images.The experimental results show that the proposed lightweight privacy‐preserving recognition framework maintains high recognition performance while enhancing security.
基金National Natural Science Foundation of China(No.41871305)National Key Research and Development Program of China(No.2017YFC0602204)+2 种基金Fundamental Research Funds for the Central Universities,China University of Geosciences(Wuhan)(No.CUGQY1945)Open Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education and the Fundamental Research Funds for the Central Universities(No.GLAB2019ZR02)Open Fund of Laboratory of Urban Land Resources Monitoring and Simulation,Ministry of Natural Resources,China(No.KF-2020-05-068)。
文摘The exploration of building detection plays an important role in urban planning,smart city and military.Aiming at the problem of high overlapping ratio of detection frames for dense building detection in high resolution remote sensing images,we present an effective YOLOv3 framework,corner regression-based YOLOv3(Correg-YOLOv3),to localize dense building accurately.This improved YOLOv3 algorithm establishes a vertex regression mechanism and an additional loss item about building vertex offsets relative to the center point of bounding box.By extending output dimensions,the trained model is able to output the rectangular bounding boxes and the building vertices meanwhile.Finally,we evaluate the performance of the Correg-YOLOv3 on our self-produced data set and provide a comparative analysis qualitatively and quantitatively.The experimental results achieve high performance in precision(96.45%),recall rate(95.75%),F1 score(96.10%)and average precision(98.05%),which were 2.73%,5.4%,4.1%and 4.73%higher than that of YOLOv3.Therefore,our proposed algorithm effectively tackles the problem of dense building detection in high resolution images.
基金Young Innovative Talents Project of Guangdong Ordinary Universities(No.2022KQNCX225)School-level Teaching and Research Project of Guangzhou City Polytechnic(No.2022xky046)。
文摘The semantic segmentation methods based on CNN have made great progress,but there are still some shortcomings in the application of remote sensing images segmentation,such as the small receptive field can not effectively capture global context.In order to solve this problem,this paper proposes a hybrid model based on ResNet50 and swin transformer to directly capture long-range dependence,which fuses features through Cross Feature Modulation Module(CFMM).Experimental results on two publicly available datasets,Vaihingen and Potsdam,are mIoU of 70.27%and 76.63%,respectively.Thus,CFM-UNet can maintain a high segmentation performance compared with other competitive networks.
基金funded by the Major Scientific and Technological Innovation Project of Shandong Province,Grant No.2022CXGC010609.
文摘Semantic segmentation of remote sensing images is one of the core tasks of remote sensing image interpretation.With the continuous develop-ment of artificial intelligence technology,the use of deep learning methods for interpreting remote-sensing images has matured.Existing neural networks disregard the spatial relationship between two targets in remote sensing images.Semantic segmentation models that combine convolutional neural networks(CNNs)and graph convolutional neural networks(GCNs)cause a lack of feature boundaries,which leads to the unsatisfactory segmentation of various target feature boundaries.In this paper,we propose a new semantic segmentation model for remote sensing images(called DGCN hereinafter),which combines deep semantic segmentation networks(DSSN)and GCNs.In the GCN module,a loss function for boundary information is employed to optimize the learning of spatial relationship features between the target features and their relationships.A hierarchical fusion method is utilized for feature fusion and classification to optimize the spatial relationship informa-tion in the original feature information.Extensive experiments on ISPRS 2D and DeepGlobe semantic segmentation datasets show that compared with the existing semantic segmentation models of remote sensing images,the DGCN significantly optimizes the segmentation effect of feature boundaries,effectively reduces the noise in the segmentation results and improves the segmentation accuracy,which demonstrates the advancements of our model.
文摘A method to remove stripes from remote sensing images is proposed based on statistics and a new image enhancement method.The overall processing steps for improving the quality of remote sensing images are introduced to provide a general baseline.Due to the differences in satellite sensors when producing images,subtle but inherent stripes can appear at the stitching positions between the sensors.These stitchingstripes cannot be eliminated by conventional relative radiometric calibration.The inherent stitching stripes cause difficulties in downstream tasks such as the segmentation,classification and interpretation of remote sensing images.Therefore,a method to remove the stripes based on statistics and a new image enhancement approach are proposed in this paper.First,the inconsistency in grayscales around stripes is eliminated with the statistical method.Second,the pixels within stripes are weighted and averaged based on updated pixel values to enhance the uniformity of the overall image radiation quality.Finally,the details of the images are highlighted by a new image enhancement method,which makes the whole image clearer.Comprehensive experiments are performed,and the results indicate that the proposed method outperforms the baseline approach in terms of visual quality and radiation correction accuracy.
基金Hunan University of Arts and Science provided doctoral research funding for this study (grant number 16BSQD23)Fund of Geography Subject ([2022]351)also provided funding.
文摘Recently,the convolutional neural network(CNN)has been dom-inant in studies on interpreting remote sensing images(RSI).However,it appears that training optimization strategies have received less attention in relevant research.To evaluate this problem,the author proposes a novel algo-rithm named the Fast Training CNN(FST-CNN).To verify the algorithm’s effectiveness,twenty methods,including six classic models and thirty archi-tectures from previous studies,are included in a performance comparison.The overall accuracy(OA)trained by the FST-CNN algorithm on the same model architecture and dataset is treated as an evaluation baseline.Results show that there is a maximal OA gap of 8.35%between the FST-CNN and those methods in the literature,which means a 10%margin in performance.Meanwhile,all those complex roadmaps,e.g.,deep feature fusion,model combination,model ensembles,and human feature engineering,are not as effective as expected.It reveals that there was systemic suboptimal perfor-mance in the previous studies.Most of the CNN-based methods proposed in the previous studies show a consistent mistake,which has made the model’s accuracy lower than its potential value.The most important reasons seem to be the inappropriate training strategy and the shift in data distribution introduced by data augmentation(DA).As a result,most of the performance evaluation was conducted based on an inaccurate,suboptimal,and unfair result.It has made most of the previous research findings questionable to some extent.However,all these confusing results also exactly demonstrate the effectiveness of FST-CNN.This novel algorithm is model-agnostic and can be employed on any image classification model to potentially boost performance.In addition,the results also show that a standardized training strategy is indeed very meaningful for the research tasks of the RSI-SC.
文摘The remote sensing ships’fine-grained classification technology makes it possible to identify certain ship types in remote sensing images,and it has broad application prospects in civil and military fields.However,the current model does not examine the properties of ship targets in remote sensing images with mixed multi-granularity features and a complicated backdrop.There is still an opportunity for future enhancement of the classification impact.To solve the challenges brought by the above characteristics,this paper proposes a Metaformer and Residual fusion network based on Visual Attention Network(VAN-MR)for fine-grained classification tasks.For the complex background of remote sensing images,the VAN-MR model adopts the parallel structure of large kernel attention and spatial attention to enhance the model’s feature extraction ability of interest targets and improve the classification performance of remote sensing ship targets.For the problem of multi-grained feature mixing in remote sensing images,the VAN-MR model uses a Metaformer structure and a parallel network of residual modules to extract ship features.The parallel network has different depths,considering both high-level and lowlevel semantic information.The model achieves better classification performance in remote sensing ship images with multi-granularity mixing.Finally,the model achieves 88.73%and 94.56%accuracy on the public fine-grained ship collection-23(FGSC-23)and FGSCR-42 datasets,respectively,while the parameter size is only 53.47 M,the floating point operations is 9.9 G.The experimental results show that the classification effect of VAN-MR is superior to that of traditional CNNs model and visual model with Transformer structure under the same parameter quantity.
基金the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups Project under Grant Number(25/43)Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2022R303)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code:22UQU4340237DSR28.
文摘Hyperspectral remote sensing/imaging spectroscopy is a novel approach to reaching a spectrum from all the places of a huge array of spatial places so that several spectral wavelengths are utilized for making coherent images.Hyperspectral remote sensing contains acquisition of digital images from several narrow,contiguous spectral bands throughout the visible,Thermal Infrared(TIR),Near Infrared(NIR),and Mid-Infrared(MIR)regions of the electromagnetic spectrum.In order to the application of agricultural regions,remote sensing approaches are studied and executed to their benefit of continuous and quantitativemonitoring.Particularly,hyperspectral images(HSI)are considered the precise for agriculture as they can offer chemical and physical data on vegetation.With this motivation,this article presents a novel Hurricane Optimization Algorithm with Deep Transfer Learning Driven Crop Classification(HOADTL-CC)model onHyperspectralRemote Sensing Images.The presentedHOADTL-CC model focuses on the identification and categorization of crops on hyperspectral remote sensing images.To accomplish this,the presentedHOADTL-CC model involves the design ofHOAwith capsule network(CapsNet)model for generating a set of useful feature vectors.Besides,Elman neural network(ENN)model is applied to allot proper class labels into the input HSI.Finally,glowworm swarm optimization(GSO)algorithm is exploited to fine tune the ENNparameters involved in this article.The experimental result scrutiny of the HOADTL-CC method can be tested with the help of benchmark dataset and the results are assessed under distinct aspects.Extensive comparative studies stated the enhanced performance of the HOADTL-CC model over recent approaches with maximum accuracy of 99.51%.
Funding: This research was funded by Shaanxi Province's Key Research and Development Plan (No. 2022NY-087).
Abstract: To address the imbalance between detection performance and detection speed in current mainstream object detection algorithms for optical remote sensing images, this paper proposes DI-YOLO, a multi-scale object detection model for remote sensing images with complex backgrounds, based on You Only Look Once v7-tiny (YOLOv7-tiny). First, to enhance the model's ability to capture irregularly shaped objects and deformation features, and to extract high-level semantic information, deformable convolutions replace the standard convolutions of the original model. Second, a Content Coordination Attention Feature Pyramid Network (CCA-FPN) structure is designed to replace the neck of the original model; it can better perceive relationships between pixels, reduce feature loss in remote sensing images, and improve the model's overall ability to detect multi-scale objects. Third, an Implicitly Efficient Decoupled Head (IEDH) is proposed to increase the model's flexibility, making it more adaptable to complex detection tasks in various scenarios. Finally, the Smoothed Intersection over Union (SIoU) loss function replaces the Complete Intersection over Union (CIoU) loss function of the original model, resulting in more accurate bounding box prediction and continued model optimization. Experimental results on the High-Resolution Remote Sensing Detection (HRRSD) dataset demonstrate that the proposed DI-YOLO model outperforms mainstream object detection algorithms in mean Average Precision (mAP) for optical remote sensing image detection, while running at 138.9 Frames Per Second (FPS), meeting the requirements for fast and accurate detection.
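The first modification, replacing standard convolutions with deformable ones, can be sketched with torchvision's DeformConv2d, where a small auxiliary convolution predicts per-location sampling offsets. This is only an illustrative stand-in for the substitution described above, not the authors' implementation; the block and layer names are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableConvBlock(nn.Module):
    """Drop-in replacement for a 3x3 convolution: a small conv predicts
    per-location sampling offsets, and DeformConv2d samples the input at
    those shifted positions, which helps with irregularly shaped objects."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel element per output location
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        offsets = self.offset_pred(x)
        return self.deform(x, offsets)

if __name__ == "__main__":
    x = torch.randn(1, 32, 40, 40)                 # dummy feature map
    print(DeformableConvBlock(32, 64)(x).shape)    # torch.Size([1, 64, 40, 40])
```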
Abstract: Preserving biodiversity and maintaining ecological balance is essential under current environmental conditions. Determining vegetation cover with traditional map classification approaches is challenging: the primary difficulty in detecting vegetation patterns is that they appear with complex spatial structures and similar spectral properties. Multiple spectral analyses are therefore needed to improve the accuracy of vegetation mapping from remotely sensed images. The proposed framework ensembles three effective strategies to produce a robust architecture for vegetation mapping: a feature-based approach, a region-based approach, and a texture-based approach for classifying the vegetation area. The novel Deep Meta Fusion Model (DMFM) is created with a unique fusion framework of residually stacked convolution layers operating on Unique Covariate Features (UCF), Intensity Features (IF), and Colour Features (CF). A lightweight architecture reduces the GPU overhead typically incurred by convolutional neural network (CNN) models, and the system focuses on detailed feature areas to improve classification accuracy and reduce processing time. The proposed DMFM model achieved 99% accuracy with a maximum processing time of 130 s, and the training, testing, and validation losses fall to levels that reflect the quality of the model. Because all three feature types (UCF, IF, and CF) are considered, the system can serve as a standard analysis platform for dynamic datasets.
Abstract: Based on low-altitude remote sensing images, this paper establishes a sample set of typical river vegetation elements and proposes a vegetation extraction pipeline that adaptively extracts typical vegetation elements of river basins. The main contributions are as follows: (1) a sample set for typical vegetation extraction from low-altitude remote sensing images was established; (2) a vegetation extraction model based on a focus perception module was designed to achieve end-to-end automatic extraction of different vegetation areas from low-altitude remote sensing images, fully learning the spectral, spatial, and textural information and the deep semantic information of the images; (3) compared with the baseline method, the baseline with the embedded focus perception module improves precision by 7.37% and mIoU by 49.49%. Visual interpretation and quantitative analysis show that the adaptive extraction network for typical river vegetation is effective and generalizes well, consistent with the needs of practical vegetation extraction applications.
Abstract: The primary objective of this research is to delineate potential groundwater recharge zones in the Kadaladi taluk of Ramanathapuram, Tamil Nadu, India, by combining remote sensing and Geographic Information Systems (GIS) with the Analytical Hierarchy Process (AHP). Factors including geology, geomorphology, soil, drainage density, lineament density, slope, and rainfall were analyzed at a specific scale. Thematic layers were evaluated for quality and relevance using Saaty's scale and then integrated using the weighted linear combination technique. The weights assigned to each layer and feature were standardized with AHP and the eigenvector technique, producing the final groundwater potential zone map. The AHP method was used to normalize the scores after weights were assigned to each criterion or factor on Saaty's 9-point scale; pairwise matrix analysis was used to calculate the geometric mean and normalized weight for the various parameters. The groundwater recharge potential zone map was created by mathematically overlaying the normalized weighted layers, with the thematic layers indicating the major elements influencing groundwater occurrence and recharge derived from satellite images. Results indicate that approximately 21.8 km² of the total area exhibits high potential for groundwater recharge; recharge is viable in areas with moderate slopes, particularly in the central and southeastern regions.
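The AHP weighting and weighted linear combination steps are straightforward to make concrete. The sketch below computes geometric-mean AHP weights from an illustrative pairwise comparison matrix and overlays three reclassified thematic rasters; the factor judgements, scores, and layer set are invented for illustration and do not reproduce the study's values.

```python
import numpy as np

def ahp_weights(pairwise):
    """Normalized AHP weights from a pairwise comparison matrix
    (Saaty scale) using the geometric-mean method."""
    pairwise = np.asarray(pairwise, dtype=float)
    gm = np.prod(pairwise, axis=1) ** (1.0 / pairwise.shape[1])
    return gm / gm.sum()

# Illustrative 3-factor comparison (e.g. geology vs. slope vs. drainage density);
# the actual judgements used in the study differ.
pairwise = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])
w = ahp_weights(pairwise)

# Weighted linear combination of reclassified thematic rasters,
# each already scored on a common suitability scale (toy data here).
geology  = np.random.randint(1, 6, (100, 100))
slope    = np.random.randint(1, 6, (100, 100))
drainage = np.random.randint(1, 6, (100, 100))
potential = w[0] * geology + w[1] * slope + w[2] * drainage
print(w.round(3), potential.shape)
```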
Abstract: River bank erosion is a natural process that occurs when the flow of a river exceeds the bank's ability to withstand it. It is a common phenomenon that causes extensive land damage, displacement of people, loss of crops, and infrastructure damage. The Gorai River, situated on the right bank of the Ganges, is a significant branch of that river and flows into the Bay of Bengal via the Mathumati and Baleswar rivers. Erosion of the banks of the Gorai River in Kushtia district is not a recent occurrence: local residents have been dealing with the problem for the past hundred years, and according to elderly members of the community the erosion has become more severe. The main objective of this research is therefore to quantify river bank erosion, accretion, and bankline shifting from 2003 to 2022 using multi-temporal Landsat image data with GIS and remote sensing techniques. Bankline migration results from the interplay of several factors: the intensity of river processes such as erosion, transport, and deposition; the amount of water in the river during the high season; the geological and soil composition; and human intervention in the river. The results show that the largest eroded area, 4.6 square kilometers, occurred during 2016 to 2019, while the largest accreted area, 7.12 square kilometers, occurred during 2013 to 2016; erosion and accretion values fluctuated from year to year.
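Quantifying erosion and accretion from multi-temporal imagery essentially amounts to differencing co-registered water masks. The snippet below is a hedged sketch of that bookkeeping for two binary Landsat-derived masks with an assumed 30 m pixel size; the function name and toy data are illustrative only and do not represent the study's processing chain.

```python
import numpy as np

def erosion_accretion_km2(water_t1, water_t2, pixel_size_m=30.0):
    """Erosion and accretion areas between two co-registered binary water
    masks (True = water), e.g. classified from Landsat scenes.
    Erosion: land at t1 that is water at t2; accretion: the reverse."""
    eroded   = (~water_t1) & water_t2
    accreted = water_t1 & (~water_t2)
    px_km2 = (pixel_size_m ** 2) / 1e6   # pixel area in square kilometers
    return eroded.sum() * px_km2, accreted.sum() * px_km2

# Toy example with random masks; real use would load classified rasters.
rng = np.random.default_rng(0)
t1 = rng.random((500, 500)) > 0.7
t2 = rng.random((500, 500)) > 0.7
print(erosion_accretion_km2(t1, t2))
```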
Funding: Supported by the National Natural Science Foundation of China (61702528, 61806212).
Abstract: In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic, challenged by overfitting and by the difficulty of aligning images and text. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC that jointly represents vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object Detection In Optical Remote sensing images (DIOR) dataset with manually annotated Chinese and English captions. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and rich bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation by using a Chinese pre-trained language model. Experiments against various baselines validate VLCA on the proposed dataset, and the results demonstrate that the proposed algorithm produces more descriptive and informative captions than existing algorithms.
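The cross-modal attention idea, letting text tokens attend over image region features, can be sketched with a standard multi-head attention layer. This is a generic illustration of visual-lingual alignment rather than the VLCA architecture itself; dimensions, names, and the residual-plus-norm wrapping are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Text tokens attend over image region features so a caption decoder
    can ground each word in visual evidence."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # Queries come from the language stream, keys/values from vision.
        attended, _ = self.attn(text_feats, image_feats, image_feats)
        return self.norm(text_feats + attended)

if __name__ == "__main__":
    text  = torch.randn(2, 20, 512)   # 2 captions, 20 tokens each (toy data)
    image = torch.randn(2, 49, 512)   # 2 images, 7x7 region grid
    print(CrossModalAttention(512)(text, image).shape)  # torch.Size([2, 20, 512])
```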