Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
Line drawings, as a concise form of depiction, can be recognized by infants and even chimpanzees. Recently, how the visual system processes line drawings has attracted increasing attention from psychology, cognitive science, and computer science. Neuroscientific studies have revealed that line drawings generate neural activity similar to that generated by color photographs, which offers insight into how to process big media data efficiently. In this paper, we present a comprehensive survey of line drawing studies, including the cognitive mechanisms of visual perception, computational models in computer vision, and intelligent processing in diverse media applications. Major debates, challenges, and solutions that have been addressed over the years are discussed. Finally, some of the remaining challenges in line drawing studies are outlined.
Funding: Supported by the National Key R&D Program of China under Grant No. 2020AAA0106200 and the National Natural Science Foundation of China under Grant Nos. 61832016 and U20B2070.