Funding: Supported in part by the National Key R&D Program of China (No. 2022ZD0114900) and the Beijing Nova Program.
Abstract: In addition to a physical comprehension of the world, humans possess high social intelligence: the intelligence that senses social events, infers the goals and intents of others, and facilitates social interaction. Notably, humans are distinguished from their closest primate cousins by their social cognitive skills rather than by their physical skills. We believe that artificial social intelligence (ASI) will play a crucial role in shaping the future of artificial intelligence (AI). This article begins with a review of ASI from a cognitive science standpoint, covering social perception, theory of mind (ToM), and social interaction. Next, we examine the recently emerged computational counterparts in the AI community. Finally, we provide an in-depth discussion of topics related to ASI.
Abstract: This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced following a brief discussion of dynamic programming. Researchers in ADP and RL have enjoyed rapid developments over the past decade, from algorithms to convergence and optimality analyses to stability results. Several key steps in the recent theoretical development of ADPRL are highlighted, along with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on the stability analysis of value iteration algorithms.
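As a concrete reminder of the dynamic-programming root mentioned above, the sketch below runs tabular value iteration on a small toy Markov decision process; the state/action sizes, transition matrix P, reward matrix R, discount factor, and tolerance are illustrative assumptions, not settings from the article.

```python
import numpy as np

# Tabular value iteration on a toy MDP with made-up transitions and rewards;
# a minimal sketch of the Bellman backup behind the surveyed ADP/RL algorithms.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)        # Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
    V_new = Q.max(axis=1)          # greedy improvement over actions
    if np.max(np.abs(V_new - V)) < 1e-8:   # convergence of value iteration
        break
    V = V_new
policy = Q.argmax(axis=1)          # greedy policy w.r.t. the converged values
```

Policy iteration follows the same backup but alternates full policy evaluation with greedy policy improvement; the convergence and stability results reviewed above concern exactly these two schemes.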
Funding: This work was supported by the National Key Research and Development Program of China (Nos. 2020AAA0105500 and 2021ZD0109901), the National Natural Science Foundation of China (Nos. 62088102, 62125106, and 61971260), and the Beijing Municipal Science and Technology Commission (No. Z181100003118014).
Abstract: The metaverse has been attracting considerable attention recently. It aims to build a virtual environment in which people can interact with the world and cooperate with each other. In this survey paper, we re-introduce the metaverse within a new framework built on a broad range of technologies: perception, which captures the characteristics of the real world precisely; computation, which supports the large computational requirements over large-scale data; reconstruction, which builds the virtual world from the real one; cooperation, which facilitates long-distance communication and teamwork between users; and interaction, which bridges users and the virtual world. Despite its popularity, the fundamental techniques in this framework are still immature, and new techniques are needed to support metaverse applications. In recent years, artificial intelligence (AI), especially deep learning, has shown promising results in empowering various areas, from science to industry, so it is natural to ask how AI can be combined with this framework to promote the development of the metaverse. In this survey, we present recent achievements of AI for the metaverse within the proposed framework, covering perception, computation, reconstruction, cooperation, and interaction. We also discuss future work through which AI can contribute to the metaverse.
Abstract: It has been an exciting journey since mobile communications and artificial intelligence (AI) were conceived in 1983 and 1956, respectively. While both fields evolved independently and profoundly changed the communications and computing industries, the rapid convergence of 5th generation mobile communication technology (5G) and AI is beginning to significantly transform the core communication infrastructure, network management, and vertical applications. The paper first outlines the individual early-stage roadmaps of mobile communications and AI, with a focus on the era from 3rd generation mobile communication technology (3G) to 5G, when AI and mobile communications started to converge. With regard to telecommunications AI, the progress of AI in the mobile communications ecosystem is then introduced in detail, including network infrastructure, network operation and management, business operation and management, intelligent applications towards business supporting system (BSS) & operation supporting system (OSS) convergence, verticals and private networks, etc. The classifications of AI in telecommunication ecosystems are then summarized, along with the evolution paths specified by various international telecommunications standardization organizations. Looking towards the next decade, the prospective roadmap of telecommunications AI is forecast. In line with the 3rd generation partnership project (3GPP) and International Telecommunication Union Radiocommunication Sector (ITU-R) timelines for 5G and 6th generation mobile communication technology (6G), the paper further explores network intelligence following the 3GPP and open radio access network (O-RAN) routes, experience- and intent-based network management and operation, the network AI signaling system, intelligent middle-office based BSS, intelligent customer experience management and policy control driven by BSS & OSS convergence, the evolution from service level agreement (SLA) to experience level agreement (ELA), and intelligent private networks for verticals. The paper concludes with the vision that AI will reshape the beyond-5G (B5G)/6G landscape, and that research and development (R&D), standardization, and the ecosystem need to pivot to fully seize these unprecedented opportunities.
Funding: This work was supported by the Shanghai Sailing Program (Nos. 20YF1453000 and 20YF1452800), the National Science Foundation of China (Nos. 62003239, 62003240, 62003243, and 61903027), the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0100), and the Shanghai Municipal Commission of Science and Technology (No. 19511132101).
Abstract: This work reviews distributed Nash equilibrium seeking for noncooperative games in multi-agent networks, which has emerged as a frontier research topic in the systems and control community. Firstly, we give the basic formulation and analysis of noncooperative games with continuous action spaces, and provide the motivation and basic setting for distributed Nash equilibrium seeking. Then we introduce both gradient-based algorithms and best-response-based algorithms for various types of games, including zero-sum games, aggregative games, potential games, monotone games, and multi-cluster games. In addition, we provide some applications of noncooperative games.
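For reference, the sketch below runs plain (centralized) gradient play on a hypothetical two-player quadratic game, which is the basic step underlying the gradient-based algorithms surveyed above; distributed Nash equilibrium seeking additionally lets each agent estimate the others' actions through consensus over the network, which is omitted here. The cost functions and step size are illustrative assumptions.

```python
# Gradient play on a toy two-player game: each player steps down the gradient
# of its own cost with respect to its own action only.
def grad_J1(x1, x2):   # partial derivative of J1(x1, x2) = x1**2 + x1 * x2 w.r.t. x1
    return 2.0 * x1 + x2

def grad_J2(x1, x2):   # partial derivative of J2(x1, x2) = x2**2 - x1 * x2 w.r.t. x2
    return 2.0 * x2 - x1

x1, x2, step = 1.0, -1.0, 0.1
for _ in range(500):
    x1, x2 = x1 - step * grad_J1(x1, x2), x2 - step * grad_J2(x1, x2)
print(round(x1, 6), round(x2, 6))   # converges to the unique Nash equilibrium (0, 0)
```

Convergence here relies on the pseudo-gradient mapping of this toy game being strongly monotone, which is the same structural assumption used by many of the monotone-game algorithms in the review.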
Funding: Supported in part by the National Natural Science Foundation of China (No. 61831005).
Abstract: With significant breakthroughs in single-modal deep learning tasks, more and more work has begun to focus on multi-modal tasks. Multi-modal tasks usually involve more than one modality, where a modality represents a type of behavior or state. Common multi-modal information includes vision, hearing, language, touch, and smell. Vision and language are two of the most common modalities in human daily life, and many typical multi-modal tasks focus on these two modalities, such as visual captioning and visual grounding. In this paper, we conduct an in-depth study of typical vision-and-language tasks from the perspectives of generation, analysis, and reasoning. First, we analyze and summarize the typical tasks and representative classical methods, organized by their different algorithmic concerns, and further discuss frequently used datasets and metrics. Then, some variant tasks and cutting-edge tasks are briefly summarized to build a more comprehensive framework of vision-and-language multi-modal tasks. Finally, we discuss the development of pre-training related research and give an outlook on future research. We hope this survey can help researchers understand the latest progress, existing problems, and exploration directions of vision-and-language multi-modal tasks, and provide guidance for future research.
Funding: Supported by the National Natural Science Foundation of China (No. 62206237), the Japan Science Promotion Society (Nos. 22K12093 and 22K12094), and the Japan Science and Technology Agency (No. JPMJST2281).
Abstract: Lightweight modules play a key role in 3D object detection tasks for autonomous driving and are necessary for the practical deployment of 3D object detectors. At present, research still focuses on constructing complex models and computations to improve detection precision at the expense of running speed. However, building a lightweight model that learns global features from point cloud data for 3D object detection remains a significant problem. In this paper, we focus on combining convolutional neural networks with self-attention-based vision transformers to realize lightweight and high-speed computing for 3D object detection. We propose lightweight detection 3D (LWD-3D), a point cloud conversion and lightweight vision transformer for autonomous driving. LWD-3D utilizes a one-shot regression framework in 2D space and generates 3D object bounding boxes from point cloud data, providing a new feature representation method based on a vision transformer for 3D detection applications. Experimental results on the KITTI 3D dataset show that LWD-3D achieves real-time detection (time per image < 20 ms). LWD-3D obtains a mean average precision (mAP) 75% higher than that of another real-time 3D detector, with half the number of parameters. Our research extends the application of vision transformers to 3D object detection tasks.
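As an illustration of the kind of point cloud conversion such a pipeline relies on, the sketch below bins LiDAR points into a bird's-eye-view occupancy grid that a 2D one-shot detector could consume; the grid extents and resolution are assumptions for illustration, not the settings used by LWD-3D.

```python
import numpy as np

# Convert a raw LiDAR point cloud into a 2D bird's-eye-view grid so a 2D
# detector (e.g., a lightweight vision transformer) can regress boxes from it.
def bev_occupancy(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0), res=0.1):
    """points: (N, 3) array of LiDAR (x, y, z); returns a 2D point-count grid."""
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]
    ix = ((pts[:, 0] - x_range[0]) / res).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / res).astype(int)
    bev = np.zeros((ny, nx), dtype=np.float32)
    np.add.at(bev, (iy, ix), 1.0)   # count points per cell (occupancy/density map)
    return bev

# Synthetic points roughly covering the KITTI-like front field of view.
bev = bev_occupancy(np.random.rand(1000, 3) * [70.0, 80.0, 3.0] - [0.0, 40.0, 1.5])
```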
Funding: Supported by the National Natural Science Foundation of China (Nos. 61825103 and 62202349), the Natural Science Foundation of Hubei Province (Nos. 2022CFB352 and 2020CFA001), and the Key Research & Development Program of Hubei Province (No. 2020BIB006).
Abstract: Disentangled representation learning seeks a representation that can identify and isolate the different latent variables hidden in high-dimensional observations. Such a representation can capture the information about a single factor of variation and control it through the corresponding latent subspace, providing a robust representation for complex changes in the data. In this paper, we first introduce and analyze the current status of research on disentangled representation and its causal mechanisms, and summarize three crucial properties of disentangled representation. Then, disentangled representation learning algorithms are classified into four categories and outlined in terms of both mathematical description and applicability. Subsequently, the loss functions and objective evaluation metrics commonly used in existing work on disentangled representation are categorized. Finally, the paper summarizes representative applications of disentangled representation learning in the field of remote sensing and discusses its future development.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61922087, 61906201, 62006238, and 62136005) and the Natural Science Fund for Distinguished Young Scholars of Hunan Province (No. 2019JJ20020).
Abstract: Image classification is a vital and basic task in many data analysis domains. Since real-world images generally contain multiple diverse semantic labels, it amounts to a typical multi-label classification problem. Traditional multi-label image classification relies on a large amount of training data with plenty of labels, which requires substantial human and financial cost. By contrast, one can easily obtain a correlation matrix of the categories of interest in the current scene based on historical image data from other application scenarios. How to perform image classification with only label-correlation priors, without specific and costly annotated labels, is an important but rarely studied problem. In this paper, we propose a model to classify images with this kind of weak correlation prior. We use label correlation to recapitulate the sample similarity, employ the prior information to decompose the projection matrix when regressing the label indication matrix, and introduce the L_(2,1) norm to select features for each image. Finally, experimental results on several image datasets demonstrate that the proposed model has distinct advantages over current state-of-the-art multi-label classification methods.
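The feature-selection ingredient can be shown in isolation. The sketch below regresses a label indication matrix with an L_(2,1)-norm regularizer via iteratively reweighted least squares, which drives whole rows of the projection matrix towards zero; it omits the label-correlation prior and the projection-matrix decomposition, and the data are synthetic placeholders.

```python
import numpy as np

# L_(2,1)-regularized regression of a label indication matrix, solved by
# iteratively reweighted least squares; rows of W with near-zero norm
# correspond to unselected features.
def l21_regression(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """X: (n, d) features, Y: (n, c) label indicators; returns W of shape (d, c)."""
    d = X.shape[1]
    D = np.eye(d)                                   # start from a plain ridge solution
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))        # reweighting for the L_(2,1) term
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y = rng.integers(0, 2, size=(100, 5)).astype(float)
W = l21_regression(X, Y)
selected = np.where(np.sqrt((W ** 2).sum(axis=1)) > 1e-3)[0]   # retained features
```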
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 51921006 and 52008138) and the Heilongjiang Touyan Innovation Team Program (No. AUEA5640200320).
Abstract: Generative adversarial networks (GANs) are unsupervised generative models that learn a data distribution through adversarial training. However, recent experiments indicate that GANs are difficult to train due to the requirement of optimization in a high-dimensional parameter space and the zero gradient problem. In this work, we propose a self-sparse generative adversarial network (Self-Sparse GAN) that reduces the parameter space and alleviates the zero gradient problem. In the Self-Sparse GAN, we design a self-adaptive sparse transform module (SASTM) comprising sparsity decomposition and feature-map recombination, which can be applied to multi-channel feature maps to obtain sparse feature maps. The key idea of Self-Sparse GAN is to add the SASTM after every deconvolution layer in the generator, which adaptively reduces the parameter space by utilizing the sparsity of multi-channel feature maps. We theoretically prove that the SASTM not only reduces the search space of the generator's convolution kernel weights but also alleviates the zero gradient problem, by maintaining meaningful features in the batch normalization layer and driving the weights of the deconvolution layers away from being negative. Experimental results show that our method achieves the best Fréchet inception distance (FID) scores for image generation compared with Wasserstein GAN with gradient penalty (WGAN-GP) on the MNIST, Fashion-MNIST, CIFAR-10, STL-10, mini-ImageNet, CELEBA-HQ, and LSUN bedrooms datasets, with a relative decrease in FID of 4.76%-21.84%. Meanwhile, an architectural sketch dataset (Sketch) is also used to validate the superiority of the proposed method.
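As a placement illustration only, the sketch below inserts a generic learned channel gate after a deconvolution layer in a generator block, showing where a sparsifying transform such as the SASTM would sit; the gate itself is a stand-in and is not the paper's sparsity decomposition and feature-map recombination.

```python
import torch
import torch.nn as nn

# Generic per-channel gate; suppresses channels with low learned gate values,
# illustrating where a channel-sparsifying module sits in a GAN generator.
class ChannelGate(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(channels))   # learned per-channel gate

    def forward(self, x):                                   # x: (B, C, H, W)
        gate = torch.sigmoid(self.logit).view(1, -1, 1, 1)
        return x * gate

block = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    ChannelGate(64),                # slot where a sparsifying transform would go
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
out = block(torch.randn(2, 128, 8, 8))   # -> (2, 64, 16, 16)
```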
Abstract: Dear readers, welcome to the inaugural issue of CAAI Artificial Intelligence Research (CAAI AIR)! As the Editor-in-Chief, I am delighted to introduce the first issue of CAAI AIR. The journal is one of the high-starting-point new journal projects in the Excellence Action Plan of China Science and Technology Journals, aiming to reflect state-of-the-art achievements in the field of artificial intelligence (AI) and its applications. The journal is jointly sponsored by the Chinese Association for Artificial Intelligence (CAAI) and Tsinghua University, and is published quarterly by Tsinghua University Press.
Funding: Supported by the National Key R&D Program of China (No. 2019YFC1408703), the National Natural Science Foundation of China (No. 62022048), THU-Bosch JCML, and the Beijing Academy of Artificial Intelligence.
Abstract: Deep learning based semi-supervised learning (SSL) algorithms have led to promising results in recent years. However, they tend to introduce multiple tunable hyper-parameters, making them less practical in real SSL scenarios where labeled data is too scarce for extensive hyper-parameter search. In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various SSL conditions. We start by defining a meta optimization problem that minimizes the loss on labeled data through dynamically reweighting the loss on unlabeled samples, which are associated with soft pseudo labels during training. As the meta problem is computationally intensive to solve directly, we propose an efficient algorithm to dynamically obtain approximate solutions. We show theoretically that Meta-Semi converges to a stationary point of the loss function on labeled data under mild conditions. Empirically, Meta-Semi outperforms state-of-the-art SSL algorithms significantly on the challenging semi-supervised CIFAR-100 and STL-10 tasks, and achieves competitive performance on CIFAR-10 and SVHN.
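The reweighting idea can be sketched on a toy linear classifier. The snippet below follows the generic learning-to-reweight recipe: take a virtual update on the weighted unlabeled loss, then set the per-sample weights from how that update changes the labeled loss. It uses hard pseudo labels and a single step for brevity, so it is only a hedged approximation of the idea, not Meta-Semi's actual algorithm.

```python
import torch
import torch.nn.functional as F

# One meta-reweighting step on a toy linear classifier with synthetic data.
torch.manual_seed(0)
d, c, lr = 8, 3, 0.5
w = torch.randn(d, c, requires_grad=True)
x_l, y_l = torch.randn(16, d), torch.randint(0, c, (16,))   # small labeled batch
x_u = torch.randn(32, d)                                    # unlabeled batch

eps = torch.zeros(32, requires_grad=True)                   # per-sample weights
y_u = (x_u @ w).argmax(dim=1)                                # pseudo labels
loss_u = F.cross_entropy(x_u @ w, y_u, reduction="none")     # per-sample unlabeled loss
g = torch.autograd.grad((eps * loss_u).sum(), w, create_graph=True)[0]
w_virtual = w - lr * g                                       # virtual model update
loss_l = F.cross_entropy(x_l @ w_virtual, y_l)               # labeled loss afterwards
grad_eps = torch.autograd.grad(loss_l, eps)[0]
weights = torch.clamp(-grad_eps, min=0.0)                    # keep samples whose upweighting helps
weights = weights / (weights.sum() + 1e-12)                  # normalized reweighting of unlabeled loss
```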
Abstract: Robustness is a long-standing challenge for automatic speech recognition (ASR), as any deployed ASR system faces speech samples that are much noisier than clean training corpora. However, it is impractical to annotate every type of noisy environment. In this work, we propose a novel phonetic-semantic pre-training (PSP) framework that effectively improves the performance of ASR in practical noisy environments by seamlessly integrating pre-training, self-supervised learning, and fine-tuning. In particular, there are three fundamental stages in PSP. First, we pre-train a phone-to-word transducer (PWT) to map a generated phone sequence to the target text using only unpaired text data. Second, we continue training the PWT on more complex data generated by an empirical phone-perturbation heuristic, in addition to self-supervised signals obtained by recovering the corrupted phones. Third, we fine-tune the resulting PWT with real-world speech data. We perform experiments on two real-life datasets collected from industrial scenarios and on synthetic noisy datasets, which show that PSP effectively improves the traditional ASR pipeline, with relative character error rate (CER) reductions of 28.63% and 26.38% on the two real-life datasets, respectively. It also demonstrates robustness against synthetic, highly noisy speech datasets.
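A phone-perturbation heuristic of the kind described above can be as simple as random substitution, deletion, and insertion of phones; the sketch below is a hypothetical illustration, with a made-up phone inventory and perturbation rates rather than the empirical heuristic used in PSP.

```python
import random

# Randomly corrupt a clean phone sequence so the phone-to-word transducer
# learns to recover text from noisy phone inputs.
PHONES = ["b", "d", "g", "k", "m", "n", "ng", "a", "e", "i", "o", "u"]

def perturb(phones, p_sub=0.1, p_del=0.05, p_ins=0.05, seed=0):
    rng = random.Random(seed)
    out = []
    for ph in phones:
        r = rng.random()
        if r < p_del:
            continue                               # deletion
        elif r < p_del + p_sub:
            out.append(rng.choice(PHONES))         # substitution
        else:
            out.append(ph)
        if rng.random() < p_ins:
            out.append(rng.choice(PHONES))         # insertion
    return out

print(perturb(["n", "i", "h", "a", "o"]))
```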
Funding: Supported by the National Natural Science Foundation of China (Nos. 61806179, 61876169, 61922072, 61976237, 61673404, 62106230, 62006069, 62206255, and 62203332), the China Postdoctoral Science Foundation (Nos. 2021T140616, 2021M692920, 2022M712878, and 2022TQ0298), the Key R&D Projects of the Ministry of Science and Technology (No. 2022YFD2001200), the Key R&D and Promotion Projects in Henan Province (Nos. 192102210098 and 212102210510), and the Henan Postdoctoral Foundation (No. 202003019).
Abstract: This paper reviews research on boiler combustion optimization, an important direction in the field of energy saving and emission reduction. Many methods have been used to address boiler combustion optimization, among which evolutionary computing (EC) techniques have recently gained much attention. However, the existing research is not sufficiently focused and has not been summarized systematically, which has slowed progress on boiler combustion optimization and created obstacles to its application. This paper provides a comprehensive survey of intelligent optimization algorithms applied to boiler combustion optimization and summarizes the contributions of different optimization algorithms. Finally, this paper discusses new research challenges and outlines future research directions, which can guide boiler combustion optimization towards improving energy efficiency and reducing pollutant emission concentrations.
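To make the evolutionary computing loop concrete, the sketch below evolves candidate operating settings against a made-up surrogate objective that trades off an efficiency loss against a NOx penalty; in practice the objective would be a model fitted to boiler measurement data, and the operators shown (truncation selection and Gaussian mutation) are only one simple EC variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate(x):
    # Hypothetical stand-in objective (lower is better): distance from an
    # efficiency-optimal setting plus a penalty that mimics NOx emissions.
    efficiency_loss = np.sum((x - 0.6) ** 2)
    nox_penalty = np.sum(np.abs(x - 0.3))
    return efficiency_loss + 0.5 * nox_penalty

dim, pop_size, n_gen, sigma = 6, 20, 100, 0.1
pop = rng.uniform(0.0, 1.0, size=(pop_size, dim))   # normalized operating settings
for _ in range(n_gen):
    fitness = np.array([surrogate(x) for x in pop])
    parents = pop[np.argsort(fitness)[: pop_size // 2]]          # truncation selection
    children = parents + sigma * rng.normal(size=parents.shape)  # Gaussian mutation
    pop = np.clip(np.vstack([parents, children]), 0.0, 1.0)
best = pop[np.argmin([surrogate(x) for x in pop])]
```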
Funding: Supported by the Beijing Key Laboratory of Behavior and Mental Health, Peking University.
Abstract: The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the other during cross-modal interaction, and that the features which can reinforce a modality may contain only a part of it. To this end, we design an innovative Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Specifically, for the redundant features, we let one modality perform intra-modal feature selection through a self-attention mechanism, so that the selected features can adaptively and efficiently interact with the other modality. To better capture the complementary information between the modalities, we obtain a fused weight vector by splicing and use the weight vector to achieve feature reinforcement of the modalities. We apply TACFN to the RAVDESS and IEMOCAP datasets. For fair comparison, we use the same unimodal representations to validate the effectiveness of the proposed fusion method. The experimental results show that TACFN brings a significant performance improvement compared to other methods and reaches state-of-the-art performance. All code and models can be accessed at https://github.com/shuzihuaiyu/TACFN.
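A rough, hedged sketch of this recipe is given below: self-attention performs intra-modal feature selection on one modality, the selected features reinforce the other modality through cross-attention, and a weight vector computed from the spliced features reweights the fused representation. The module and its dimensions are illustrative assumptions, not the exact TACFN architecture.

```python
import torch
import torch.nn as nn

class SimpleCrossModalFusion(nn.Module):
    def __init__(self, dim=128, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, audio, video):
        # intra-modal feature selection on the audio stream
        selected, _ = self.self_attn(audio, audio, audio)
        # selected audio features reinforce the video stream
        fused, _ = self.cross_attn(video, selected, selected)
        # weight vector from the spliced features reweights the fusion
        w = self.gate(torch.cat([fused, video], dim=-1))
        return w * fused + (1 - w) * video

audio, video = torch.randn(2, 10, 128), torch.randn(2, 10, 128)
out = SimpleCrossModalFusion()(audio, video)   # (2, 10, 128) fused representation
```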
Abstract: Decision-making plays an essential role in various real-world systems such as automatic driving, traffic dispatching, information system management, and emergency command and control. Recent breakthroughs in computer game scenarios using deep reinforcement learning for intelligent decision-making have made decision-making intelligence a burgeoning research direction. In complex practical systems, however, factors such as coupled distracting features, long-term interaction links, and adversarial environments and opponents make decision-making in practical applications challenging to model, compute, and explain. This work proposes game interactive learning, a novel paradigm and a new approach towards intelligent decision-making in complex and adversarial environments. The paradigm highlights the function and role of a human in the process of intelligent decision-making in complex systems, and formalizes a new learning scheme for exchanging information and knowledge between humans and the machine system. The proposed paradigm first inherits methods from game theory to model the agents and their preferences in the complex decision-making process. It then optimizes the learning objectives from equilibrium analysis using reformed machine learning algorithms to compute and pursue promising decision results for practice. Human interaction is involved when the learning process needs guidance from additional knowledge and instructions, or when the human wants to understand the learning machine better. We perform preliminary experimental verification of the proposed paradigm on two challenging decision-making tasks in tactical-level war-game scenarios. Experimental results demonstrate the effectiveness of the proposed learning paradigm.
Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2018AAA0100901) and the National Natural Science Foundation of China (Nos. 61761146005 and 61632017).
Abstract: The repeated nature of sponsored search auctions allows the seller to implement Myerson's auction to maximize revenue using past data. But since these data are provided by strategic buyers in the auctions, they can be manipulated, which may hurt the seller's revenue. We model this problem as a Private Data Manipulation (PDM) game: the seller first announces an auction (such as Myerson's) whose allocation and payment rules depend on the value distributions of the buyers; the buyers then submit fake value distributions to the seller to implement the auction. The seller's expected revenue and the buyers' expected utilities depend on the auction rule and on the game played among the buyers in their choices of the submitted distributions. Under the PDM game, we show that Myerson's auction is equivalent to the generalized first-price auction, and under further assumptions equivalent to the Vickrey-Clarke-Groves (VCG) auction and the generalized second-price auction. Our results partially explain why Myerson's auction is not as popular as the generalized second-price auction in the practice of sponsored search auctions, and provide new perspectives on data-driven decision making in mechanism design.
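The ingredient of Myerson's auction that the reported distributions feed into is the virtual value phi(v) = v - (1 - F(v)) / f(v); the seller allocates to the highest non-negative virtual value and sets the reserve where phi crosses zero. The sketch below computes this for a uniform[0, 1] distribution as an illustrative assumption; in the PDM game it is exactly this reported distribution that a strategic buyer can fake.

```python
import numpy as np

# Virtual value and optimal reserve for a single buyer with a reported
# value distribution; here uniform[0, 1] is used purely for illustration.
def virtual_value(v, F, f):
    return v - (1.0 - F(v)) / f(v)

F = lambda v: v          # CDF of uniform[0, 1]
f = lambda v: 1.0        # PDF of uniform[0, 1]
grid = np.linspace(0.01, 1.0, 100)
phi = virtual_value(grid, F, f)              # phi(v) = 2v - 1 for this distribution
reserve = grid[np.argmin(np.abs(phi))]       # phi(reserve) = 0  =>  reserve = 0.5
print(reserve)
```

A buyer who reports a distorted distribution shifts this virtual value curve, and hence the reserve and payments, which is precisely the manipulation the PDM game analyzes.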
Abstract: The burgeoning field of Camouflaged Object Detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent learning-based models, their robustness is limited: existing methods may misclassify salient objects as camouflaged ones, despite these contradictory characteristics. This limitation may stem from the lack of multi-pattern training images, leading to reduced robustness against salient objects. To overcome the scarcity of multi-pattern training images, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC). Specifically, we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure that the synthesized objects align with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflaged scenes with richer characteristics. The results of user studies show that the salient objects in our synthesized scenes attract more of the user's attention; thus, such samples pose a greater challenge to existing COD models. CamDiff enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances the training and testing phases of COD baselines, granting them robustness across diverse domains. Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff.
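The zero-shot filtering step can be sketched with an off-the-shelf CLIP model: after the diffusion model synthesizes a salient object into a camouflaged scene, CLIP scores the image against the input prompt and a few distractor prompts, and the sample is kept only if the prompt wins. The model name, prompt set, and threshold below are assumptions for illustration, not CamDiff's exact settings.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot check that a synthesized image actually contains the prompted object.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def matches_prompt(image: Image.Image, prompt: str, distractors, threshold=0.5):
    texts = [prompt] + list(distractors)
    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return probs[0].item() >= threshold     # keep the sample only if the prompt wins

# e.g., matches_prompt(synth_img, "a photo of a rabbit", ["a photo of empty ground"])
```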
Abstract: Center point localization is a major factor affecting the performance of 3D single object tracking. Point clouds are a set of discrete points on the local surface of an object, and there is also considerable noise in the labeling, so directly regressing the center coordinates is not very reasonable. Existing methods usually use volumetric-based, point-based, or view-based representations, with a relatively single modality. In addition, commonly used sampling strategies usually result in the loss of object information, while both holistic and detailed information are beneficial for object localization. To address these challenges, we propose a novel Multi-view unsupervised center Uncertainty 3D single object Tracker (MUT). MUT models the potential uncertainty of center coordinate localization in an unsupervised manner, allowing the model to learn the true distribution. By projecting point clouds, MUT obtains multi-view depth map features, realizes efficient knowledge transfer from 2D to 3D, and provides another modality of information for the tracker. We also propose a former attraction probability sampling strategy that preserves object information. By using both holistic and detailed descriptors of point clouds, the tracker gains a more comprehensive understanding of the tracking environment. Experimental results show that the proposed MUT network outperforms the baseline models on the KITTI dataset by 0.8% and 0.6% in precision and success rate, respectively, and on the NuScenes dataset by 1.4% and 6.1% in precision and success rate, respectively. The code is available at https://github.com/abchears/MUT.git.
Abstract: The team-adversary game simulates many real-world scenarios in which a team of agents competes cooperatively against an adversary. However, decision-making in this type of game is a big challenge, since the joint action space of the team is combinatorial and grows exponentially with the number of team members. This also prevents existing equilibrium-finding algorithms from solving team-adversary games efficiently. To address this issue caused by the combinatorial action space, we propose a novel framework based on Counterfactual Regret Minimization (CFR): CFR-MIX. Firstly, we propose a new strategy representation that replaces the traditional joint-action strategy with the individual action strategies of all the team members, which can significantly reduce the strategy space. To maintain cooperation between team members, a strategy consistency relationship is proposed. Then, we transform the consistency relationship of the strategy into regret consistency for computing the equilibrium strategy with the new strategy representation under the CFR framework. To guarantee the regret consistency relationship, a product-form decomposition method over cumulative regret values is proposed. To implement this decomposition method, our CFR-MIX framework employs a mixing layer under the CFR framework to obtain the final decision strategy for the team, i.e., the Nash equilibrium strategy. Finally, we conduct experiments on games from different domains. Extensive results show that CFR-MIX significantly outperforms state-of-the-art algorithms. We hope it can help teams make decisions in large-scale team-adversary games.
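For context, the per-information-set update at the core of CFR is regret matching: the current strategy is proportional to the positive part of the cumulative regrets, as sketched below. CFR-MIX's contribution, the product-form decomposition of the team's cumulative regrets through a mixing layer, sits on top of this update and is not shown.

```python
import numpy as np

# Regret matching: turn cumulative regrets at an information set into a strategy.
def regret_matching(cum_regret):
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full_like(cum_regret, 1.0 / len(cum_regret))  # uniform if no positive regret

cum_regret = np.array([2.0, -1.0, 0.5])
print(regret_matching(cum_regret))   # [0.8, 0.0, 0.2]
```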