Deep learning has revolutionized the field of artificial intelligence.Based on the statistical correlations uncovered by deep learning-based methods,computer vision tasks,such as autonomous driving and robotics,are gr...Deep learning has revolutionized the field of artificial intelligence.Based on the statistical correlations uncovered by deep learning-based methods,computer vision tasks,such as autonomous driving and robotics,are growing rapidly.Despite being the basis of deep learning,such correlation strongly depends on the distribution of the original data and is susceptible to uncontrolled factors.Without the guidance of prior knowledge,statistical correlations alone cannot correctly reflect the essential causal relations and may even introduce spurious correlations.As a result,researchers are now trying to enhance deep learningbased methods with causal theory.Causal theory can model the intrinsic causal structure unaffected by data bias and effectively avoids spurious correlations.This paper aims to comprehensively review the existing causal methods in typical vision and visionlanguage tasks such as semantic segmentation,object detection,and image captioning.The advantages of causality and the approaches for building causal paradigms will be summarized.Future roadmaps are also proposed,including facilitating the development of causal theory and its application in other complex scenarios and systems.展开更多
基金supported by the National Natural Science Foundation of China(Grant Nos.62233005 and 62293502)the Programme of Introducing Talents of Discipline to Universities(the 111 Project,Grant No.B17017)+1 种基金the Fundamental Research Funds for the Central Universities(Grant No.222202317006)Shanghai AI Lab。
文摘Deep learning has revolutionized the field of artificial intelligence.Based on the statistical correlations uncovered by deep learning-based methods,computer vision tasks,such as autonomous driving and robotics,are growing rapidly.Despite being the basis of deep learning,such correlation strongly depends on the distribution of the original data and is susceptible to uncontrolled factors.Without the guidance of prior knowledge,statistical correlations alone cannot correctly reflect the essential causal relations and may even introduce spurious correlations.As a result,researchers are now trying to enhance deep learningbased methods with causal theory.Causal theory can model the intrinsic causal structure unaffected by data bias and effectively avoids spurious correlations.This paper aims to comprehensively review the existing causal methods in typical vision and visionlanguage tasks such as semantic segmentation,object detection,and image captioning.The advantages of causality and the approaches for building causal paradigms will be summarized.Future roadmaps are also proposed,including facilitating the development of causal theory and its application in other complex scenarios and systems.