27 articles found
Artificial Social Intelligence: A Comparative and Holistic View (Cited by: 1)
1
Authors: Lifeng Fan, Manjie Xu, Zhihao Cao, Yixin Zhu, Song-Chun Zhu. CAAI Artificial Intelligence Research, 2022, Issue 2, pp. 144-160 (17 pages)
In addition to a physical comprehension of the world, humans possess a high social intelligence: the intelligence that senses social events, infers the goals and intents of others, and facilitates social interaction. Notably, humans are distinguished from their closest primate cousins by their social cognitive skills as opposed to their physical counterparts. We believe that artificial social intelligence (ASI) will play a crucial role in shaping the future of artificial intelligence (AI). This article begins with a review of ASI from a cognitive science standpoint, including social perception, theory of mind (ToM), and social interaction. Next, we examine the recently emerged computational counterpart in the AI community. Finally, we provide an in-depth discussion on topics related to ASI.
Keywords: social intelligence; theory of mind (ToM); communication; human-machine teaming
State of the Art of Adaptive Dynamic Programming and Reinforcement Learning
2
Authors: Derong Liu, Mingming Ha, Shan Xue. CAAI Artificial Intelligence Research, 2022, Issue 2, pp. 93-110 (18 pages)
This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced following a brief discussion of dynamic programming. Researchers in ADP and RL have enjoyed the fast developments of the past decade, from algorithms, to convergence and optimality analyses, and to stability results. Several key steps in the recent theoretical developments of ADPRL are mentioned with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on stability analysis of value iteration algorithms.
Keywords: adaptive dynamic programming; approximate dynamic programming; adaptive critic designs; neuro-dynamic programming; neural dynamic programming; reinforcement learning; intelligent control; learning control; optimal control
Artificial Intelligence for Metaverse: A Framework
3
Authors: Yuchen Guo, Tao Yu, Jiamin Wu, Yuwang Wang, Sen Wan, Jiyuan Zheng, Lu Fang, Qionghai Dai. CAAI Artificial Intelligence Research, 2022, Issue 1, pp. 54-67 (14 pages)
The metaverse has attracted considerable attention recently. It aims to build a virtual environment in which people can interact with the world and cooperate with each other. In this survey paper, we re-introduce the metaverse in a new framework based on a broad range of technologies, including perception, which enables us to precisely capture the characteristics of the real world; computation, which supports the large computation requirement over large-scale data; reconstruction, which builds the virtual world from the real one; cooperation, which facilitates long-distance communication and teamwork between users; and interaction, which bridges users and the virtual world. Despite its popularity, the fundamental techniques in this framework are still immature. Innovating new techniques to facilitate the applications of the metaverse is necessary. In recent years, artificial intelligence (AI), especially deep learning, has shown promising results for empowering various areas, from science to industry. It is reasonable to imagine how we can combine AI with the framework in order to promote the development of the metaverse. In this survey, we present the recent achievements of AI for the metaverse in the proposed framework, including perception, computation, reconstruction, cooperation, and interaction. We also discuss some future work in which AI can contribute to the metaverse.
Keywords: artificial intelligence; metaverse; perception; computation; reconstruction; cooperation; interaction
Next Decade of Telecommunications Artificial Intelligence
4
Authors: Ye Ouyang, Lilei Wang, Aidong Yang, Tongqing Gao, Leping Wei, Yaqin Zhang. CAAI Artificial Intelligence Research, 2022, Issue 1, pp. 28-53 (26 pages)
It has been an exciting journey since mobile communications and artificial intelligence (AI) were conceived in 1983 and 1956, respectively. While both fields evolved independently and profoundly changed the communications and computing industries, the rapid convergence of 5th generation mobile communication technology (5G) and AI is beginning to significantly transform the core communication infrastructure, network management, and vertical applications. The paper first outlines the individual roadmaps of mobile communications and AI in the early stage, with a focus on the era from 3rd generation mobile communication technology (3G) to 5G, when AI and mobile communications started to converge. With regard to telecommunications AI, the progress of AI in the ecosystem of mobile communications is then introduced in detail, including network infrastructure, network operation and management, business operation and management, intelligent applications towards business supporting system (BSS) & operation supporting system (OSS) convergence, verticals and private networks, etc. Then the classifications of AI in telecommunication ecosystems are summarized, along with its evolution paths specified by various international telecommunications standardization organizations. Towards the next decade, the prospective roadmap of telecommunications AI is forecast. In line with the 3rd generation partnership project (3GPP) and International Telecommunication Union Radiocommunication Sector (ITU-R) timeline of 5G & 6th generation mobile communication technology (6G), the paper further explores network intelligence following the 3GPP and open radio access network (O-RAN) routes, experience and intent-based network management and operation, the network AI signaling system, intelligent middle-office based BSS, intelligent customer experience management and policy control driven by BSS & OSS convergence, the evolution from service level agreement (SLA) to experience level agreement (ELA), and intelligent private networks for verticals. The paper concludes with the vision that AI will reshape the future beyond 5G (B5G)/6G landscape, and that we need to pivot our research and development (R&D), standardization, and ecosystem to fully seize the unprecedented opportunities.
Keywords: artificial intelligence (AI); mobile communication; 5th generation (5G); general purpose technology (GPT); network intelligence; intent-based network; network AI signaling system
A Survey on Noncooperative Games and Distributed Nash Equilibrium Seeking over Multi-Agent Networks
5
Authors: Peng Yi, Jinlong Lei, Xiuxian Li, Shu Liang, Min Meng, Jie Chen. CAAI Artificial Intelligence Research, 2022, Issue 1, pp. 8-27 (20 pages)
This work reviews distributed Nash equilibrium seeking for noncooperative games in multi-agent networks, which has emerged as one of the frontier research topics in the systems and control community. Firstly, we give the basic formulation and analysis of noncooperative games with continuous action spaces, and provide the motivation and basic setting for distributed Nash equilibrium seeking. Then we introduce both gradient-based algorithms and best-response based algorithms for various types of games, including zero-sum games, aggregative games, potential games, monotone games, and multi-cluster games. In addition, we provide some applications of noncooperative games.
Keywords: noncooperative games; multi-agent systems; optimization and decision making; cyber-physical systems; Nash equilibrium; distributed computation
A Survey of Vision and Language Related Multi-Modal Task
6
Authors: Lanxiao Wang, Wenzhe Hu, Heqian Qiu, Chao Shang, Taijin Zhao, Benliu Qiu, King Ngi Ngan, Hongliang Li. CAAI Artificial Intelligence Research, 2022, Issue 2, pp. 111-136 (26 pages)
With the significant breakthroughs in research on single-modal deep learning tasks, more and more works have begun to focus on multi-modal tasks. Multi-modal tasks usually involve more than one modality, and a modality represents a type of behavior or state. Common multi-modal information includes vision, hearing, language, touch, and smell. Vision and language are two of the most common modalities in human daily life, and many typical multi-modal tasks focus on these two modalities, such as visual captioning and visual grounding. In this paper, we conduct in-depth research on typical tasks of vision and language from the perspectives of generation, analysis, and reasoning. First, we analyze and summarize the typical tasks and some classical methods, grouped by their different algorithmic concerns, and then discuss frequently used datasets and metrics. Then, some other variant tasks and cutting-edge tasks are briefly summarized to build a more comprehensive framework of vision and language related multi-modal tasks. Finally, we discuss the development of pre-training related research and give an outlook for future research. We hope this survey can help relevant researchers understand the latest progress, existing problems, and exploration directions of vision and language related multi-modal tasks, and provide guidance for future research.
Keywords: deep learning; vision and language; multi-modal generation; multi-modal analysis; multi-modal reasoning; pre-training
LWD-3D: Lightweight Detector Based on Self-Attention for 3D Object Detection
7
Authors: Shuo Yang, Huimin Lu, Tohru Kamiya, Yoshihisa Nakatoh, Seiichi Serikawa. CAAI Artificial Intelligence Research, 2022, Issue 2, pp. 137-143 (7 pages)
Lightweight modules play a key role in 3D object detection tasks for autonomous driving, and are necessary for the application of 3D object detectors. At present, research still focuses on constructing complex models and calculations to improve the detection precision at the expense of the running rate. However, building a lightweight model to learn the global features from point cloud data for 3D object detection is a significant problem. In this paper, we focus on combining convolutional neural networks with self-attention-based vision transformers to realize lightweight and high-speed computing for 3D object detection. We propose lightweight detection 3D (LWD-3D), which is a point cloud conversion and lightweight vision transformer for autonomous driving. LWD-3D utilizes a one-shot regression framework in 2D space and generates a 3D object bounding box from point cloud data, which provides a new feature representation method based on a vision transformer for 3D detection applications. Experimental results on the KITTI 3D dataset show that LWD-3D achieves real-time detection (time per image < 20 ms). LWD-3D obtains a mean average precision (mAP) 75% higher than that of another 3D real-time detector with half the number of parameters. Our research extends the application of visual transformers to 3D object detection tasks.
Keywords: 3D object detection; point clouds; vision transformer; one-shot regression; real-time
A Review of Disentangled Representation Learning for Remote Sensing Data
8
Authors: Mi Wang, Huiwen Wang, Jing Xiao, Liang Liao. CAAI Artificial Intelligence Research, 2022, Issue 2, pp. 172-190 (19 pages)
representation that can identify and isolate different potential variables hidden in the high-dimensional observations. Disentangled representation learning can capture information about a single change factor and control it by the corresponding potential subspace, providing a robust representation for complex changes in the data. In this paper, we first introduce and analyze the current status of research on disentangled representation and its causal mechanisms, and summarize three crucial properties of disentangled representation. Then, disentangled representation learning algorithms are classified into four categories and outlined in terms of both mathematical description and applicability. Subsequently, the loss functions and objective evaluation metrics commonly used in existing work on disentangled representation are classified. Finally, the paper summarizes representative applications of disentangled representation learning in the field of remote sensing and discusses its future development.
Keywords: disentangled representation learning; latent representation; remote sensing data; deep learning
Multi-Label Image Classification with Weak Correlation Prior
9
Authors: Xiao Ouyang, Ruidong Fan, Hong Tao, Chenping Hou. CAAI Artificial Intelligence Research, 2022, Issue 1, pp. 79-92 (14 pages)
Image classification is vital and basic in many data analysis domains. Since real-world images generally contain multiple diverse semantic labels, it amounts to a typical multi-label classification problem. Traditional multi-label image classification relies on a large amount of training data with plenty of labels, which incurs high human and financial costs. By contrast, one can easily obtain a correlation matrix of the categories of interest in the current scene based on historical image data from other application scenarios. How to perform image classification with only label correlation priors, without specific and costly annotated labels, is an important but rarely studied problem. In this paper, we propose a model to classify images with this kind of weak correlation prior. We use label correlation to recapitulate the sample similarity, employ the prior information to decompose the projection matrix when regressing the label indication matrix, and introduce the L_{2,1} norm to select features for each image. Finally, experimental results on several image datasets demonstrate that the proposed model has distinct advantages over current state-of-the-art multi-label classification methods.
Keywords: image recognition; label correlation; multi-label classification; weakly-supervised learning
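For readers unfamiliar with the L_{2,1} norm mentioned in the abstract above, the following is a minimal sketch of its standard definition only: the sum of the Euclidean norms of a matrix's rows. It is not the paper's model. The function name `l21_norm` and the example matrix `W` are hypothetical, and treating each row as one feature is an assumption made for illustration.

```python
import math


def l21_norm(matrix):
    """Return the L_{2,1} norm: the sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Hypothetical weight matrix, one row per feature (an assumption).
W = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0: an all-zero row contributes nothing
    [6.0, 8.0],  # row norm 10
]

print(l21_norm(W))  # 15.0
```

Because the penalty is applied to whole rows, minimizing it tends to drive entire rows to zero, which is why it is commonly used for feature selection.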
Self-Sparse Generative Adversarial Networks
10
Authors: Wenliang Qian, Yang Xu, Wangmeng Zuo, Hui Li. CAAI Artificial Intelligence Research, 2022, Issue 1, pp. 68-78 (11 pages)
Generative adversarial networks (GANs) are unsupervised generative models that learn the data distribution through adversarial training. However, recent experiments indicated that GANs are difficult to train due to the requirement of optimization in the high-dimensional parameter space and the zero gradient problem. In this work, we propose a self-sparse generative adversarial network (Self-Sparse GAN) that reduces the parameter space and alleviates the zero gradient problem. In the Self-Sparse GAN, we design a self-adaptive sparse transform module (SASTM) comprising sparsity decomposition and feature-map recombination, which can be applied to multi-channel feature maps to obtain sparse feature maps. The key idea of Self-Sparse GAN is to add the SASTM after every deconvolution layer in the generator, which can adaptively reduce the parameter space by utilizing the sparsity in multi-channel feature maps. We theoretically prove that the SASTM can not only reduce the search space of the convolution kernel weights of the generator but also alleviate the zero gradient problem by maintaining meaningful features in the batch normalization layer and driving the weights of deconvolution layers away from being negative. The experimental results show that our method achieves the best Fréchet inception distance (FID) scores for image generation compared with Wasserstein GAN with gradient penalty (WGAN-GP) on the MNIST, Fashion-MNIST, CIFAR-10, STL-10, mini-ImageNet, CELEBA-HQ, and LSUN bedrooms datasets, and the relative decrease of FID is 4.76%-21.84%. Meanwhile, an architectural sketch dataset (Sketch) is also used to validate the superiority of the proposed method.
Keywords: generative adversarial networks; self-adaptive sparse transform module; self-sparse generative adversarial network (Self-Sparse GAN)
Message from Editor-in-Chief
11
Author: Qionghai Dai. CAAI Artificial Intelligence Research, 2022, Issue 1, pp. I0001-I0001 (1 page)
Dear readers, Welcome to the inaugural issue of CAAI Artificial Intelligence Research (CAAI AIR)! As the Editor-in-Chief, I am delighted to introduce the first issue of CAAI AIR. The journal is one of the high-start new-journal projects in the Excellence Action Plan of China Science and Technology Journals, aiming to reflect the state-of-the-art achievements in the field of artificial intelligence (AI) and its applications. The journal is jointly sponsored by the Chinese Association for Artificial Intelligence (CAAI) and Tsinghua University, and is published quarterly by Tsinghua University Press.
Keywords: journal; jointly; artificial
Meta-Semi: A Meta-Learning Approach for Semi-Supervised Learning
12
Authors: Yulin Wang, Jiayi Guo, Jiangshan Wang, Cheng Wu, Shiji Song, Gao Huang. CAAI Artificial Intelligence Research, 2022, Issue 2, pp. 161-171 (11 pages)
Deep learning based semi-supervised learning (SSL) algorithms have led to promising results in recent years. However, they tend to introduce multiple tunable hyper-parameters, making them less practical in real SSL scenarios where the labeled data is too scarce for extensive hyper-parameter search. In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various conditions of SSL. We start by defining a meta optimization problem that minimizes the loss on labeled data through dynamically reweighting the loss on unlabeled samples, which are associated with soft pseudo labels during training. As the meta problem is computationally intensive to solve directly, we propose an efficient algorithm to dynamically obtain approximate solutions. We show theoretically that Meta-Semi converges to the stationary point of the loss function on labeled data under mild conditions. Empirically, Meta-Semi outperforms state-of-the-art SSL algorithms significantly on the challenging semi-supervised CIFAR-100 and STL-10 tasks, and achieves competitive performance on CIFAR-10 and SVHN.
Keywords: deep learning; semi-supervised learning; computer vision
A Phonetic-Semantic Pre-Training Model for Robust Speech Recognition
13
Authors: Xueyang Wu, Rongzhong Lian, Di Jiang, Yuanfeng Song, Weiwei Zhao, Qian Xu, Qiang Yang. CAAI Artificial Intelligence Research, 2022, Issue 1, pp. 1-7 (7 pages)
Robustness is a long-standing challenge for automatic speech recognition (ASR), as the applied environment of any ASR system faces much noisier speech samples than clean training corpora. However, it is impractical to annotate every type of noisy environment. In this work, we propose a novel phonetic-semantic pre-training (PSP) framework that allows a model to effectively improve the performance of ASR against practical noisy environments via seamlessly integrating pre-training, self-supervised learning, and fine-tuning. In particular, there are three fundamental stages in PSP. First, pre-train the phone-to-word transducer (PWT) to map the generated phone sequence to the target text using only unpaired text data; second, continue training the PWT on more complex data generated from an empirical phone-perturbation heuristic, in addition to self-supervised signals obtained by recovering the tainted phones; and third, fine-tune the resultant PWT with real-world speech data. We perform experiments on two real-life datasets collected from industrial scenarios and on synthetic noisy datasets, which show that PSP effectively improves the traditional ASR pipeline with relative character error rate (CER) reductions of 28.63% and 26.38%, respectively, on the two real-life datasets. It also demonstrates its robustness against synthetic highly noisy speech datasets.
Keywords: pre-training; automatic speech recognition; self-supervised learning
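The abstract above reports results as relative character error rate (CER) reductions. As a hedged illustration of how such figures are commonly computed (not the paper's evaluation code), the sketch below uses the usual definition of CER as character-level edit distance divided by reference length, and relative reduction as the drop in CER divided by the baseline CER. All function names and the sample values in the tests are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline
```

Under this definition, a system that lowers CER from 0.20 to 0.15 has a relative reduction of about 25%, even though the absolute reduction is only 5 percentage points.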
A Survey on Intelligent Optimization Approaches to Boiler Combustion Optimization
14
Authors: Jing Liang, Hao Guo, Ke Chen, Kunjie Yu, Caitong Yue, Yunpeng Ma. CAAI Artificial Intelligence Research, 2023, Issue 1, pp. 16-31 (16 pages)
This paper reviews research on boiler combustion optimization, which is an important direction in the field of energy saving and emission reduction. Many methods have been used to deal with boiler combustion optimization, among which evolutionary computing (EC) techniques have recently gained much attention. However, the existing research is not sufficiently focused and has not been summarized systematically. This has led to slow progress of research on boiler combustion optimization and has created obstacles to its application. This paper presents a comprehensive survey of work on intelligent optimization algorithms in boiler combustion optimization and summarizes the contributions of different optimization algorithms. Finally, this paper discusses new research challenges and outlines future research directions, which can guide boiler combustion optimization to improve energy efficiency and reduce pollutant emission concentrations.
Keywords: boiler combustion optimization; circulating fluidized bed boiler; environmental protection; computational intelligence; intelligent optimization algorithm
TACFN: Transformer-Based Adaptive Cross-Modal Fusion Network for Multimodal Emotion Recognition
15
Authors: Feng Liu, Ziwang Fu, Yunlong Wang, Qijian Zheng. CAAI Artificial Intelligence Research, 2023, Issue 1, pp. 75-82 (8 pages)
The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the other during cross-modal interaction, and the features that can reinforce a modality may contain only a part of it. To this end, we design an innovative Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Specifically, for the redundant features, we make one modality perform intra-modal feature selection through a self-attention mechanism, so that the selected features can adaptively and efficiently interact with another modality. To better capture the complementary information between the modalities, we obtain the fused weight vector by splicing and use the weight vector to achieve feature reinforcement of the modalities. We apply TACFN to the RAVDESS and IEMOCAP datasets. For fair comparison, we use the same unimodal representations to validate the effectiveness of the proposed fusion method. The experimental results show that TACFN brings a significant performance improvement compared to other methods and reaches state-of-the-art performance. All code and models can be accessed from https://github.com/shuzihuaiyu/TACFN.
Keywords: multimodal emotion recognition; multimodal fusion; adaptive cross-modal blocks; Transformer; computational perception
Game Interactive Learning: A New Paradigm towards Intelligent Decision-Making
16
Authors: Junliang Xing, Zhe Wu, Zhaoke Yu, Renye Yan, Zhipeng Ji, Pin Tao, Yuanchun Shi. CAAI Artificial Intelligence Research, 2023, Issue 1, pp. 65-74 (10 pages)
Decision-making plays an essential role in various real-world systems like automatic driving, traffic dispatching, information system management, and emergency command and control. Recent breakthroughs in computer game scenarios using deep reinforcement learning for intelligent decision-making have established decision-making intelligence as a burgeoning research direction. In complex practical systems, however, factors like coupled distracting features, long-term interaction links, and adversarial environments and opponents make decision-making in practical applications challenging in modeling, computing, and explaining. This work proposes game interactive learning, a novel paradigm and a new approach towards intelligent decision-making in complex and adversarial environments. This novel paradigm highlights the function and role of a human in the process of intelligent decision-making in complex systems. It formalizes a new learning paradigm for exchanging information and knowledge between humans and the machine system. The proposed paradigm first inherits methods from game theory to model the agents and their preferences in the complex decision-making process. It then optimizes the learning objectives from equilibrium analysis using reformed machine learning algorithms to compute and pursue promising decision results for practice. Human interactions are involved when the learning process needs guidance from additional knowledge and instructions, or when the human wants to understand the learning machine better. We perform preliminary experimental verification of the proposed paradigm on two challenging decision-making tasks in tactical-level War-game scenarios. Experimental results demonstrate the effectiveness of the proposed learning paradigm.
Keywords: decision-making; game interactive learning; human-computer interaction; game theory; machine learning
Private Data Manipulation in Sponsored Search Auctions
17
Authors: Xiaotie Deng, Tao Lin, Tao Xiao. CAAI Artificial Intelligence Research, 2023, Issue 1, pp. 114-122 (9 pages)
The repeated nature of sponsored search auctions allows the seller to implement Myerson’s auction to maximize revenue using past data. But since these data are provided by strategic buyers in the auctions, they can be manipulated, which may hurt the seller’s revenue. We model this problem as a Private Data Manipulation (PDM) game: the seller first announces an auction (such as Myerson’s) whose allocation and payment rules depend on the value distributions of buyers; the buyers then submit fake value distributions to the seller to implement the auction. The seller’s expected revenue and the buyers’ expected utilities depend on the auction rule and the game played among the buyers in their choices of the submitted distributions. Under the PDM game, we show that Myerson’s auction is equivalent to the generalized first-price auction, and under further assumptions equivalent to the Vickrey-Clarke-Groves (VCG) auction and the generalized second-price auction. Our results partially explain why Myerson’s auction is not as popular as the generalized second-price auction in the practice of sponsored search auctions, and provide new perspectives on data-driven decision making in mechanism design.
Keywords: Internet economics; sponsored search auction; Myerson’s auction; generalized first-price auction; data-driven decision making
CamDiff:Camouflage Image Augmentation via Diffusion
18
作者 Xue-Jing Luo Shuo Wang +4 位作者 Zongwei Wu Christos Sakaridis Yun Cheng Deng-Ping Fan Luc Van Gool caai artificial intelligence research 2023年第1期55-64,共10页
The burgeoning field of Camouflaged Object Detection(COD)seeks to identify objects that blend into their surroundings.Despite the impressive performance of recent learning-based models,their robustness is limited,as e... The burgeoning field of Camouflaged Object Detection(COD)seeks to identify objects that blend into their surroundings.Despite the impressive performance of recent learning-based models,their robustness is limited,as existing methods may misclassify salient objects as camouflaged ones,despite these contradictory characteristics.This limitation may stem from the lack of multipattern training images,leading to reduced robustness against salient objects.To overcome the scarcity of multi-pattern training images,we introduce CamDiff,a novel approach inspired by AI-Generated Content(AIGC).Specifically,we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes,while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training(CLIP)model to prevent synthesis failures and ensure that the synthesized objects align with the input prompt.Consequently,the synthesized image retains its original camouflage label while incorporating salient objects,yielding camouflaged scenes with richer characteristics.The results of user studies show that the salient objects in our synthesized scenes attract the user’s attention more;thus,such samples pose a greater challenge to the existing COD models.Our CamDiff enables flexible editing and effcient large-scale dataset generation at a low cost.It significantly enhances the training and testing phases of COD baselines,granting them robustness across diverse domains.Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff. 展开更多
Keywords: AI-generated content; diffusion model; camouflaged object detection; salient object detection
3D Single Object Tracking with Multi-View Unsupervised Center Uncertainty Learning
19
Authors: Chengpeng Zhong, Hui Shuai, Jiaqing Fan, Kaihua Zhang, Qingshan Liu. CAAI Artificial Intelligence Research, 2023, Issue 1, pp. 45-54 (10 pages)
Center point localization is a major factor affecting the performance of 3D single object tracking. Point clouds themselves are a set of discrete points on the local surface of an object, and there is also a lot of noise in the labeling. Therefore, directly regressing the center coordinates is not very reasonable. Existing methods usually use volumetric-based, point-based, and view-based methods, with a relatively single modality. In addition, the sampling strategies commonly used usually result in the loss of object information, and holistic and detailed information is beneficial for object localization. To address these challenges, we propose a novel Multi-view unsupervised center Uncertainty 3D single object Tracker (MUT). MUT models the potential uncertainty of center coordinate localization in an unsupervised manner, allowing the model to learn the true distribution. By projecting point clouds, MUT can obtain multi-view depth map features, realize efficient knowledge transfer from 2D to 3D, and provide information from another modality for the tracker. We also propose a former attraction probability sampling strategy that preserves object information. By using both holistic and detailed descriptors of point clouds, the tracker can have a more comprehensive understanding of the tracking environment. Experimental results show that the proposed MUT network outperforms the baseline models on the KITTI dataset by 0.8% and 0.6% in precision and success rate, respectively, and on the NuScenes dataset by 1.4% and 6.1% in precision and success rate, respectively. The code is made available at https://github.com/abchears/MUT.git.
Keywords: 3D single object tracking; uncertainty modeling; multi-view feature; holistic and detailed descriptor
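The abstract above mentions projecting point clouds to obtain multi-view depth maps. The following is a minimal sketch of that general idea only, not the MUT implementation: the abstract does not specify the projection, so the orthographic projection along coordinate axes, the function name `depth_map`, the viewing window `lo`/`hi`, and the sample points are all illustrative assumptions.

```python
import math


def depth_map(points, axis, size, lo=-1.0, hi=1.0):
    """Orthographic depth map of 3D points viewed along `axis` (0, 1, or 2).

    Hypothetical helper: pixels with no point stay None; otherwise each
    pixel stores the smallest coordinate along the viewing axis.
    """
    u_axis, v_axis = [i for i in range(3) if i != axis]
    grid = [[None] * size for _ in range(size)]
    scale = size / (hi - lo)
    for p in points:
        col = math.floor((p[u_axis] - lo) * scale)
        row = math.floor((p[v_axis] - lo) * scale)
        if not (0 <= row < size and 0 <= col < size):
            continue  # point falls outside the viewing window
        depth = p[axis]
        if grid[row][col] is None or depth < grid[row][col]:
            grid[row][col] = depth
    return grid


# Made-up points; the first two share a pixel when viewed along z.
points = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5), (0.9, -0.9, 0.1)]

# One depth map per coordinate axis gives three "views" of the same cloud.
views = [depth_map(points, axis, size=4) for axis in range(3)]
```

Each view is a small 2D grid that could be fed to an ordinary 2D feature extractor, which is the sense in which a projection makes 2D tools applicable to 3D data.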
Decision Making in Team-Adversary Games with Combinatorial Action Space
20
Authors: Shuxin Li, Youzhi Zhang, Xinrun Wang, Wanqi Xue, Bo An. CAAI Artificial Intelligence Research, 2023, Issue 1, pp. 102-113 (12 pages)
The team-adversary game simulates many real-world scenarios in which a team of agents competes cooperatively against an adversary. However, decision-making in this type of game is a big challenge, since the joint action space of the team is combinatorial and grows exponentially with the number of team members. This also prevents existing equilibrium-finding algorithms from solving team-adversary games efficiently. To solve this issue caused by the combinatorial action space, we propose a novel framework based on Counterfactual Regret Minimization (CFR): CFR-MIX. First, we propose a new strategy representation that replaces the traditional joint action strategy with the individual action strategies of all the team members, which can significantly reduce the strategy space. To maintain the cooperation between team members, a strategy consistency relationship is proposed. Then, we transform the consistency relationship of the strategy into regret consistency for computing the equilibrium strategy with the new strategy representation under the CFR framework. To guarantee the regret consistency relationship, a product-form decomposition method over cumulative regret values is proposed. To implement this decomposition method, our CFR-MIX framework employs a mixing layer under the CFR framework to get the final decision strategy for the team, i.e., the Nash equilibrium strategy. Finally, we conduct experiments on games in different domains. Extensive results show that CFR-MIX significantly outperforms state-of-the-art algorithms. We hope it can help the team make decisions in large-scale team-adversary games.
Keywords: decision making; team-adversary games; Nash equilibrium; Counterfactual Regret Minimization (CFR)
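To make the size argument in the abstract above concrete, here is a small sketch, assuming standard regret matching and a plain product of per-member strategies. It is not the CFR-MIX algorithm: the mixing layer and the regret-consistency machinery are not reproduced, and the function names and regret values are hypothetical.

```python
from itertools import product
from math import prod


def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (standard regret matching)."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret: fall back to the uniform strategy.
        return [1.0 / len(cumulative_regrets)] * len(cumulative_regrets)
    return [p / total for p in positive]


def joint_strategy(individual_strategies):
    """Joint action probabilities as the product of per-member probabilities."""
    action_ranges = [range(len(s)) for s in individual_strategies]
    return {
        joint_action: prod(
            s[a] for s, a in zip(individual_strategies, joint_action)
        )
        for joint_action in product(*action_ranges)
    }


# Hypothetical cumulative regrets: a team of 3 members with 2 actions each.
team_regrets = [[2.0, 6.0], [0.0, 0.0], [-1.0, 3.0]]
strategies = [regret_matching(r) for r in team_regrets]
joint = joint_strategy(strategies)

# 3 * 2 = 6 stored values describe 2 ** 3 = 8 joint actions.
print(sum(len(s) for s in strategies), len(joint))  # prints "6 8"
```

With n members and k actions each, the individual representation stores n * k probabilities while the joint one has k ** n entries, which is the reduction in strategy space the abstract refers to.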