Funding: Supported by projects of the National Natural Science Foundation of China (61425025) and the Beijing Municipal Science and Technology Project (Z151100000915070 and Z171100000117008).
Funding: Supported by the National Basic Research Program of China (2015CB351806), the National Natural Science Foundation of China (61806011, 61825101, 61425025, and U1611461), the National Postdoctoral Program for Innovative Talents (BX20180005), the China Postdoctoral Science Foundation (2018M630036), the International Talent Exchange Program of Beijing Municipal Commission of Science and Technology (Z181100001018026), the Zhejiang Lab (2019KC0AB03 and 2019KC0AD02), and the Royal Society Newton Advanced Fellowship (NAF-R1-191082).
Abstract: A neuroprosthesis is a type of precision medical device intended to manipulate the neuronal signals of the brain in a closed-loop fashion while simultaneously receiving stimuli from the environment and controlling some part of the human brain or body. Incoming visual information can be processed by the brain within milliseconds. The retina computes visual scenes and sends its output to the cortex in the form of neuronal spikes for further computation. Thus, the neuronal signal of interest for a retinal neuroprosthesis is the neuronal spike. Closed-loop computation in a neuroprosthesis includes two stages: encoding a stimulus as a neuronal signal, and decoding it back into a stimulus. In this paper, we review some of the recent progress in visual computation models that use spikes to analyze natural scenes, including static images and dynamic videos. We hypothesize that a better understanding of the computational principles of the retina requires a hypercircuit view of the retina. In this view, the different functional network motifs revealed in cortical neuronal networks are taken into consideration in their interaction with the retina. The diverse building blocks of the retina include many cell types and both chemical synapses and electrical synapses (gap junctions). These make the retina an ideal neuronal network for adapting computational techniques developed in artificial intelligence to model the encoding and decoding of visual scenes. An overall systems approach to visual computation with neuronal spikes is necessary to advance the next generation of retinal neuroprostheses as artificial visual systems.
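The abstract describes two closed-loop stages: encoding a stimulus as spikes, and decoding the spikes back into a stimulus. The sketch below illustrates that structure with a simple discrete-time Poisson (Bernoulli) rate code. It is a textbook toy, not one of the spike-based models reviewed in the paper. The functions `encode_poisson` and `decode_rate` and the `max_rate` parameter are hypothetical names chosen for illustration.

```python
import random


def encode_poisson(intensities, n_steps, max_rate=0.5, seed=0):
    """Encode stimulus intensities in [0, 1] as binary spike trains.

    In each time step a neuron fires with probability intensity * max_rate,
    a discrete-time (Bernoulli) approximation of a Poisson rate code.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [
        [1 if rng.random() < x * max_rate else 0 for _ in range(n_steps)]
        for x in intensities
    ]


def decode_rate(spike_trains, max_rate=0.5):
    """Decode each spike train to an intensity estimate from its firing rate."""
    return [sum(train) / (len(train) * max_rate) for train in spike_trains]


if __name__ == "__main__":
    stimulus = [0.0, 0.25, 0.5, 1.0]  # e.g. four pixel intensities
    spikes = encode_poisson(stimulus, n_steps=2000)
    estimate = decode_rate(spikes)
    for x, x_hat in zip(stimulus, estimate):
        print(f"stimulus {x:.2f} -> decoded {x_hat:.2f}")
```

Rate decoding ignores spike timing. This toy therefore shows only the encode/decode loop, not how retinal ganglion cells actually code visual scenes.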
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. U1804121 and 11304167).
Abstract: Cluster science, as a bridge linking atomic and molecular physics with condensed matter physics, has inspired the development of nanomaterials over the past decades, ranging from single-atom catalysis to ligand-protected noble metal clusters. These studies have not been restricted to searching for the geometric structures of clusters; they have also promoted the development of cluster-assembled materials that use clusters as building blocks. The CALYPSO cluster structure prediction method, combined with other computational techniques, has significantly stimulated the development of cluster-based nanomaterials. In this review, we summarize several representative cluster structures predicted by the CALYPSO method that have also been successfully confirmed by photoelectron spectroscopy experiments. Beginning with alkali-metal clusters, which serve as benchmarks, we survey a series of studies on size-dependent elemental clusters that possess relatively high stability and interesting chemical and physical properties. Special attention is paid to boron-based clusters because of their promising applications. The NbSi12 and BeB16 clusters, for example, are two classic representatives of silicon- and boron-based clusters, which can be viewed as building blocks of nanotubes and borophene. This review offers a detailed description of the structural evolution and electronic properties of medium-sized pure and doped clusters, which will advance fundamental knowledge of cluster-based nanomaterials and provide valuable information for further theoretical and experimental studies.
Funding: Supported by the National Basic Research Program of China under grant 2015CB351806, the National Natural Science Foundation of China under contract Nos. 61425025, 61390515, and 61421062, and the Shenzhen Peacock Plan.
Abstract: The second-generation Audio Video Coding Standard (AVS2) is the most recent video coding standard. By introducing several new coding techniques, AVS2 can provide more efficient compression for scene videos such as surveillance and conference videos. Because they are captured from a limited set of scenes, scene videos contain great redundancy, especially in the background region. The new scene video coding techniques in AVS2 mainly focus on reducing this redundancy to achieve higher compression. This paper introduces several important AVS2 scene video coding techniques. Experimental results show that with the scene video coding tools, AVS2 can save nearly 40% BD-rate (Bjøntegaard-Delta bit-rate) on scene videos.
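The BD-rate figure quoted above averages the bit-rate difference between two rate-distortion curves over their common quality range. A minimal sketch of the Bjøntegaard calculation follows. It rests on three assumptions:

- PSNR is the quality metric.
- Each codec has exactly four rate-distortion points with distinct PSNR values.
- Because of the four-point setup, the cubic fit of log10(bit-rate) against PSNR is simply the interpolating cubic.

The function names are hypothetical and the data are synthetic. This is not the evaluation code used in the paper.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial interpolating (xs, ys) at x (a cubic for 4 points)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard-Delta bit-rate (%) of a test codec relative to a reference.

    Assumes exactly four (rate, PSNR) points per codec with distinct PSNRs.
    Negative values mean the test codec needs fewer bits for equal quality.
    """
    if not (len(rates_ref) == len(psnr_ref) == len(rates_test) == len(psnr_test) == 4):
        raise ValueError("this sketch expects exactly four RD points per codec")
    log_ref = [math.log10(r) for r in rates_ref]
    log_test = [math.log10(r) for r in rates_test]
    # Integrate only over the PSNR interval covered by both curves.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")

    def diff(q):
        return _lagrange(psnr_test, log_test, q) - _lagrange(psnr_ref, log_ref, q)

    # The gap between two cubics is a cubic, and Simpson's rule is exact for
    # cubics, so this gives the exact average log-rate difference on [lo, hi].
    mid = (lo + hi) / 2
    avg_log_diff = (diff(lo) + 4 * diff(mid) + diff(hi)) / 6
    return (10 ** avg_log_diff - 1) * 100


if __name__ == "__main__":
    ref_rates = [1000, 1800, 3200, 5600]       # kbps, synthetic
    psnr = [32.0, 34.5, 37.0, 39.5]            # dB, synthetic
    test_rates = [0.6 * r for r in ref_rates]  # 40% fewer bits at every point
    print(round(bd_rate(ref_rates, psnr, test_rates, psnr), 2))  # -40.0
```

With a constant 40% rate reduction at equal PSNR, the sketch reports a BD-rate of -40%. Real codec comparisons produce curves that differ in shape, which is why the gap is averaged over the overlapping quality range.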
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62132002, 61825101, and 62202010), the Key-Area Research and Development Program of Guangdong Province, China (No. 2021B0101400002), and the China Postdoctoral Science Foundation (No. 2022M710212).
Abstract: Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks with different paradigms and neglect the inherent relations between the tasks. Moreover, given that most of this research remains fragmented, we conduct an in-depth study of advanced work from a new perspective: learning part relationships. From this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition and propose new solutions to these challenges based on part relationship learning. Finally, we outline several promising directions for future research in fine-grained visual parsing.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61872256 and 62102205), the Key-Area Research and Development Program of Guangdong Province, China (No. 2021B0101400002), the Peng Cheng Laboratory Key Research Project, China (No. PCL 2021A07), and Multi-source Cross-platform Video Analysis and Understanding for Intelligent Perception in Smart City, China (No. U20B2052).
Abstract: With the urgent demand for generalized deep models, many pre-trained big models have been proposed, such as bidirectional encoder representations from transformers (BERT), the vision transformer (ViT), and generative pre-trained transformers (GPT). Inspired by the success of these models in single domains (like computer vision and natural language processing), multi-modal pre-trained big models have also drawn increasing attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper can provide new insights and help new researchers track the most cutting-edge work. Specifically, we first introduce the background of multi-modal pre-training by reviewing conventional deep learning and pre-training work in natural language processing, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training. After that, we introduce the downstream tasks used to validate large-scale MM-PTMs, including generative, classification, and regression tasks. We also visualize and analyze the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future work. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: https://github.com/wangxiao5791509/MultiModal_BigModels_Survey.
Funding: Supported in part by the US National Science Foundation through grant IIS-0956924 and by the College of Computer Science and Technology of Zhejiang University, China. The follow-up workshop held in Florence in 2010 was supported in part by ACM and Microsoft Research. Zhongfei ZHANG is also supported in part by the National Basic Research Program of China (No. 2012CB316400), the ZJU-Alibaba Financial Joint Lab, the Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, and the US NSF (Nos. IIS-0812114 and CCF-1017828).
Abstract: The advance of the Internet in the past decade has radically changed the way people communicate and collaborate with each other. Physical distance is no longer a barrier in online social networks. However, cultural differences at the individual, community, and societal levels still govern human-human interactions and must be considered and leveraged in the online world. The rapid deployment of high-speed Internet allows humans to interact using a rich set of multimedia data such as texts, pictures, and videos. This position paper proposes to define a new research area called 'connected multimedia'. This area covers a collection of research issues within the super-area of social media that have received little attention in the literature. By connected multimedia, we mean the study of the social and technical interactions among users, multimedia data, and devices across cultures, explicitly exploiting cultural differences. We justify why it is necessary to bring attention to this new research area and what benefits it may bring to the broader scientific research community and to humanity.