Abstract
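In recent years, intelligent interpretation technology for remote sensing has developed rapidly, but most models are task-specific and difficult to generalize across tasks, which tends to waste resources. Foundation models offer a general-purpose, generalizable solution and have recently attracted considerable attention in the remote sensing field. Although a large body of work has achieved notable results on some perceptual recognition and cognitive prediction tasks using single-temporal or multitemporal remote sensing data, a comprehensive review that gives a systematic overview of remote sensing foundation models is still lacking. This paper therefore first summarizes research progress on existing remote sensing foundation models from the perspectives of data, methods, and applications; then, by analyzing the limitations of the current state of the art, proposes a vision for a new generation of general predictive foundation models for remote sensing; and finally discusses and experiments on the directions most urgently in need of research, offering researchers a bridge between the past achievements and future possibilities of remote sensing foundation models.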
In recent years, intelligent interpretation technologies for remote sensing have advanced rapidly, but most established models are task-specific. Generalizing them to different tasks is therefore difficult, and considerable resources are wasted. Foundation models offer a general-purpose, generalizable solution and have recently attracted considerable interest in the field of remote sensing. Although many works have achieved remarkable results on some perceptual recognition and cognitive prediction tasks by using single-temporal or multitemporal remote sensing data, a comprehensive review that provides a systematic overview of remote sensing foundation models is lacking. This paper therefore begins by summarizing research developments on existing remote sensing foundation models from the perspectives of data, methods, and applications. Then, after analyzing the limitations of the current state of the art, we propose a novel general predictive foundation model. Finally, we highlight several essential research directions and link past achievements with the future possibilities of remote sensing foundation models.
Existing remote sensing foundation models are categorized into three groups according to the data types used (single-temporal/multitemporal) and the tasks involved (perceptual recognition/cognitive prediction): foundation models for perceptual recognition based on single-temporal data, foundation models for perceptual recognition based on multitemporal data, and foundation models for cognitive prediction based on multitemporal data. According to the self-supervised learning method adopted, we divide the existing foundation models for perceptual recognition based on single-temporal data into those based on contrastive learning and those based on generative learning. According to the number of tasks, foundation models for perceptual recognition based on multitemporal data are divided into single-task-oriented and multitask-oriented foundation models. According to model architecture, cognitive prediction foundation models based on multitemporal data are divided into transformer-based and graph network-based foundation models. Following this categorization, we describe the current state of each type of remote sensing foundation model and summarize their limitations in terms of data, methods, and applications.
Based on this summary and analysis of existing remote sensing foundation models, we propose a concept for a novel general predictive foundation model. By extracting stable and generalized time-series hyper-pixel features, an information pipeline can be established from multidomain, multitemporal data input to task output at multiple temporal and spatial scales, enabling accurate cognitive prediction of future states. In terms of data, the concept incorporates tens of millions of multiplatform, multitype, multimodal, and multitemporal data. In terms of methods, a new foundation model architecture combines the benefits of the transformer model and the graph network, increasing model capacity and enhancing generalization while predicting multitarget interactions in large remote sensing scenes over the long term. In terms of applications, the general predictive foundation model can be applied to diverse cognitive prediction tasks at multiple spatial and temporal scales. Under this concept, we propose four exploratory directions: multidomain time-series data representation, stable feature extraction, object-environment interaction modeling, and multitask interaction reasoning, aiming to provide a reference for researchers exploring remote sensing foundation models.
In general, foundation models with generalization ability are crucial to the development of remote sensing intelligent interpretation. We provide an overview of current advances in this field by collating the state of research on remote sensing foundation models. By analyzing the limitations of current remote sensing foundation models in terms of data, methods, and applications, we propose a concept for a novel general predictive foundation model and clarify four exploratory directions that urgently need breakthroughs under this concept. Follow-up work will pursue specific technological breakthroughs in multidomain time-series data representation, stable feature extraction, object-environment interaction modeling, and multitask interaction reasoning, with the aim of exploring a general remote sensing foundation model that integrates perceptual recognition and cognitive prediction into a single architecture.
Authors
付琨
卢宛萱
刘小煜
邓楚博
于泓峰
孙显
FU Kun; LU Wanxuan; LIU Xiaoyu; DENG Chubo; YU Hongfeng; SUN Xian (Key Laboratory of Network Information System Technology (NIST), Chinese Academy of Sciences, Beijing 100190, China; Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; University of Chinese Academy of Sciences, Beijing 100101, China)
Source
《遥感学报》
EI
CSCD
Peking University Core Journal (北大核心)
2024, Issue 7, pp. 1667-1680 (14 pages)
NATIONAL REMOTE SENSING BULLETIN
Funding
National Natural Science Foundation of China (Nos. 62201550, 62171436)
Key Deployment Research Program of the Chinese Academy of Sciences (No. KGFZD-145-23-18)
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2022ZD0118401)
Keywords
remote sensing intelligent interpretation
remote sensing foundation models
general prediction
multitemporal data
multitask