Funding: Supported by the National Key Research and Development Program of China (2018YFC1506501, 2018YFA0605503, and 2016YFB0501502); the Special Program of Gaofen Satellites (04-Y30B01-9001-18/20-3-1); and the National Natural Science Foundation of China (41871230 and 41871231).
Abstract: High spatial resolution and high temporal frequency fractional vegetation cover (FVC) products are increasingly in demand for monitoring and studying land surface processes. This paper develops an algorithm to estimate FVC at a 30-m/15-day resolution over China by combining the spatial and temporal information from two types of sensors: the 30-m resolution sensor on the Chinese environment satellite (HJ-1) and the 1-km Moderate Resolution Imaging Spectroradiometer (MODIS). The algorithm was implemented for each main vegetation class and land cover type over China. First, a high spatial resolution, high temporal frequency normalized difference vegetation index (NDVI) was produced using the continuous correction (CC) data assimilation method. Then, FVC was generated with a nonlinear pixel unmixing model whose coefficients were obtained by statistical analysis of the MODIS NDVI. The proposed method was evaluated against in situ FVC measurements and a global FVC product (GEOV1 FVC). Direct validation using in situ measurements at 97 sampling plots per half month in 2010 showed that the annual mean errors (MEs) for forest, cropland, and grassland were -0.025, 0.133, and 0.160, respectively, indicating that the FVC derived from the proposed algorithm was consistent with ground measurements [R² = 0.809, root-mean-square deviation (RMSD) = 0.065]. An intercomparison between the proposed FVC and GEOV1 FVC demonstrated that the two products had good spatial–temporal consistency and similar magnitudes (RMSD ≈ 0.1). Overall, the approach provides a new operational way to estimate high spatial resolution, high temporal frequency FVC from multiple remote sensing datasets.
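The abstract names a nonlinear pixel unmixing model with coefficients derived from MODIS NDVI statistics, but does not give its exact form. The sketch below uses the widely used dimidiate pixel model with a nonlinearity exponent as an illustrative stand-in: `fvc_from_ndvi`, the endmember values `NDVI_SOIL` and `NDVI_VEG`, and the exponent `K` are hypothetical placeholders, not the paper's fitted coefficients. The `validation_stats` helper computes the ME and RMSD agreement metrics reported above.

```python
import numpy as np

# Hypothetical per-class endmember coefficients. In the paper these are
# derived statistically from MODIS NDVI for each vegetation class; the
# values below are placeholders for illustration only.
NDVI_SOIL = 0.05   # NDVI of a fully bare-soil pixel
NDVI_VEG = 0.90    # NDVI of a fully vegetated pixel
K = 0.6            # nonlinearity exponent (k = 1 recovers the linear model)

def fvc_from_ndvi(ndvi, ndvi_soil=NDVI_SOIL, ndvi_veg=NDVI_VEG, k=K):
    """Estimate FVC from NDVI with a dimidiate pixel unmixing model.

    Linear mixing assumes ndvi = fvc * ndvi_veg + (1 - fvc) * ndvi_soil;
    solving for fvc and applying an exponent gives one common nonlinear
    variant. Values are clipped to the physically valid range [0, 1].
    """
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    fvc = np.clip(fvc, 0.0, 1.0)
    return fvc ** k

def validation_stats(estimated, measured):
    """Mean error (ME) and root-mean-square deviation (RMSD), the two
    agreement metrics reported in the abstract."""
    diff = np.asarray(estimated) - np.asarray(measured)
    return diff.mean(), np.sqrt((diff ** 2).mean())

# Example on stand-in data for a 30-m NDVI tile fused from HJ-1/MODIS.
ndvi_tile = np.random.uniform(0.0, 0.9, size=(512, 512))
fvc_tile = fvc_from_ndvi(ndvi_tile)
```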
Funding: Supported by the National Natural Science Foundation of China (61702528 and 61806212).
Abstract: In the field of satellite imagery, remote sensing image captioning (RSIC) is an active topic that faces the challenges of overfitting and of aligning images with text. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC that jointly represents vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object detection in optical remote sensing images (DIOR) dataset with manually annotated Chinese and English captions. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and rich bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation by using a Chinese pre-trained language model. Experiments against various baselines validate VLCA on the proposed dataset; the results demonstrate that the proposed algorithm produces more descriptive and informative captions than existing algorithms.
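The abstract does not detail VLCA's internals, so the sketch below shows only the generic cross-modal attention mechanism its name implies: caption token features (queries) attending over image region features (keys/values). The class name, dimensions, and feature sources are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic cross-modal attention: caption tokens (queries) attend over
    image region features (keys/values). Illustrative of the mechanism named
    by VLCA; not the paper's exact architecture."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512, num_heads=8):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)    # project token embeddings
        self.image_proj = nn.Linear(image_dim, hidden_dim)  # project region features
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, n_tokens, text_dim), e.g. from a Chinese
        #              pre-trained language model in the bilingual setting
        # image_feats: (batch, n_regions, image_dim), e.g. CNN region features
        q = self.text_proj(text_feats)
        kv = self.image_proj(image_feats)
        fused, weights = self.attn(q, kv, kv)  # weights: token-to-region alignment
        return fused, weights

# Example with random stand-in features.
module = CrossModalAttention()
text = torch.randn(2, 20, 768)    # 2 captions, 20 tokens each
image = torch.randn(2, 49, 2048)  # 2 images, 49 regions each
fused, align = module(text, image)  # fused: (2, 20, 512)
```

The attention weights give an explicit token-to-region alignment, which is one common way such models expose the visual-lingual correspondence the abstract describes.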