Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone, leading to two key issues when exchanging information between the encoder and decoder: (1) taking into account the differences in contribution between different-level features, and (2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM collects the semantic and location information of polyps from high-level features; the CIM captures polyp information disguised in low-level features; and the SAM extends the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, and rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
This paper reviews research on boiler combustion optimization, an important direction in the field of energy saving and emission reduction. Many methods have been used for boiler combustion optimization, among which evolutionary computing (EC) techniques have recently gained much attention. However, existing research is not sufficiently focused and has not been summarized systematically. This has slowed progress in research on boiler combustion optimization and created obstacles to its application. This paper presents a comprehensive survey of work on intelligent optimization algorithms in boiler combustion optimization and summarizes the contributions of different optimization algorithms. Finally, this paper discusses new research challenges and outlines future research directions, which can guide boiler combustion optimization to improve energy efficiency and reduce pollutant emission concentrations.
The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the other during cross-modal interaction, and the features that can reinforce a modality may contain only a part of it. To this end, we design an innovative Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Specifically, for the redundant features, we make one modality perform intra-modal feature selection through a self-attention mechanism, so that the selected features can adaptively and efficiently interact with another modality. To better capture the complementary information between the modalities, we obtain the fused weight vector by splicing and use the weight vector to achieve feature reinforcement of the modalities. We apply TACFN to the RAVDESS and IEMOCAP datasets. For fair comparison, we use the same unimodal representations to validate the effectiveness of the proposed fusion method. The experimental results show that TACFN brings a significant performance improvement compared to other methods and reaches state-of-the-art performance. All code and models can be accessed from https://github.com/shuzihuaiyu/TACFN.
Decision-making plays an essential role in various real-world systems such as automatic driving, traffic dispatching, information system management, and emergency command and control. Recent breakthroughs in computer game scenarios using deep reinforcement learning for intelligent decision-making have established decision-making intelligence as a burgeoning research direction. In complex practical systems, however, factors such as coupled distracting features, long-term interaction links, and adversarial environments and opponents make decision-making in practical applications challenging in modeling, computing, and explaining. This work proposes game interactive learning, a novel paradigm for intelligent decision-making in complex and adversarial environments. This paradigm highlights the function and role of a human in the process of intelligent decision-making in complex systems. It formalizes a new learning paradigm for exchanging information and knowledge between humans and the machine system. The proposed paradigm first inherits methods from game theory to model the agents and their preferences in the complex decision-making process. It then optimizes the learning objectives from equilibrium analysis using reformed machine learning algorithms to compute and pursue promising decision results for practice. Human interactions are involved when the learning process needs guidance from additional knowledge and instructions, or when the human wants to understand the learning machine better. We perform preliminary experimental verification of the proposed paradigm on two challenging decision-making tasks in tactical-level war-game scenarios. Experimental results demonstrate the effectiveness of the proposed learning paradigm.
The repeated nature of sponsored search auctions allows the seller to implement Myerson’s auction to maximize revenue using past data. But since these data are provided by strategic buyers in the auctions, they can be manipulated, which may hurt the seller’s revenue. We model this problem as a Private Data Manipulation (PDM) game: the seller first announces an auction (such as Myerson’s) whose allocation and payment rules depend on the value distributions of buyers; the buyers then submit fake value distributions to the seller to implement the auction. The seller’s expected revenue and the buyers’ expected utilities depend on the auction rule and the game played among the buyers in their choices of the submitted distributions. Under the PDM game, we show that Myerson’s auction is equivalent to the generalized first-price auction, and under further assumptions equivalent to the Vickrey-Clarke-Groves (VCG) auction and the generalized second-price auction. Our results partially explain why Myerson’s auction is not as popular as the generalized second-price auction in the practice of sponsored search auctions, and provide new perspectives into data-driven decision making in mechanism design.
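As a rough illustration of why submitted distributions matter, the sketch below uses a deliberately simplified setting that is not taken from the paper: a single item, i.i.d. buyers, and values uniform on [0, high]. In that textbook case Myerson's auction reduces to a second-price auction with a reserve at the point where the virtual value is zero, so a buyer who reports a smaller `high` lowers the reserve. The function names are hypothetical.

```python
def virtual_value_uniform(v, high=1.0):
    """Myerson virtual value phi(v) = v - (1 - F(v)) / f(v) for values uniform on [0, high]."""
    cdf = v / high
    pdf = 1.0 / high
    return v - (1.0 - cdf) / pdf  # simplifies to 2v - high


def reserve_uniform(high=1.0):
    """Reserve price where the virtual value is zero (assumed uniform [0, high] values)."""
    return high / 2.0


def second_price_with_reserve(bids, reserve):
    """Single-item auction: the highest bid at or above the reserve wins and
    pays the larger of the reserve and the second-highest eligible bid."""
    eligible = sorted((b for b in bids if b >= reserve), reverse=True)
    if not eligible:
        return None, 0.0  # no sale
    winner = bids.index(eligible[0])
    price = eligible[1] if len(eligible) > 1 else reserve
    return winner, price
```

With truthful data (`high=1.0`) the reserve is 0.5; if the reported distribution were uniform on [0, 0.6], the same rule would set the reserve at 0.3, which is the kind of manipulation the PDM game models.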
The burgeoning field of Camouflaged Object Detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent learning-based models, their robustness is limited, as existing methods may misclassify salient objects as camouflaged ones, despite their contradictory characteristics. This limitation may stem from the lack of multi-pattern training images, leading to reduced robustness against salient objects. To overcome the scarcity of multi-pattern training images, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC). Specifically, we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure that the synthesized objects align with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflaged scenes with richer characteristics. The results of user studies show that the salient objects in our synthesized scenes attract the user’s attention more; thus, such samples pose a greater challenge to existing COD models. CamDiff enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances the training and testing phases of COD baselines, granting them robustness across diverse domains. Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff.
Center point localization is a major factor affecting the performance of 3D single object tracking. Point clouds are a set of discrete points on the local surface of an object, and there is also a lot of noise in the labeling. Therefore, directly regressing the center coordinates is not very reasonable. Existing methods are usually volumetric-based, point-based, or view-based, each relying on a relatively single modality. In addition, commonly used sampling strategies usually result in the loss of object information, although both holistic and detailed information is beneficial for object localization. To address these challenges, we propose a novel Multi-view unsupervised center Uncertainty 3D single object Tracker (MUT). MUT models the potential uncertainty of center coordinate localization in an unsupervised manner, allowing the model to learn the true distribution. By projecting point clouds, MUT can obtain multi-view depth map features, realize efficient knowledge transfer from 2D to 3D, and provide an additional modality of information for the tracker. We also propose a former attraction probability sampling strategy that preserves object information. By using both holistic and detailed descriptors of point clouds, the tracker can have a more comprehensive understanding of the tracking environment. Experimental results show that the proposed MUT network outperforms the baseline models on the KITTI dataset by 0.8% and 0.6% in precision and success rate, respectively, and on the NuScenes dataset by 1.4% and 6.1% in precision and success rate, respectively. The code is available at https://github.com/abchears/MUT.git.
The team-adversary game simulates many real-world scenarios in which a team of agents competes cooperatively against an adversary. However, decision-making in this type of game is a big challenge since the joint action space of the team is combinatorial and grows exponentially with the number of team members. This also prevents existing equilibrium-finding algorithms from solving team-adversary games efficiently. To solve this issue caused by the combinatorial action space, we propose a novel framework based on the Counterfactual Regret Minimization (CFR) framework: CFR-MIX. Firstly, we propose a new strategy representation that replaces the traditional joint action strategy with the individual action strategies of all the team members, which can significantly reduce the strategy space. To maintain the cooperation between team members, a strategy consistency relationship is proposed. Then, we transform the consistency relationship of the strategy into regret consistency for computing the equilibrium strategy with the new strategy representation under the CFR framework. To guarantee the regret consistency relationship, a product-form decomposition method over cumulative regret values is proposed. To implement this decomposition method, our CFR-MIX framework employs a mixing layer under the CFR framework to get the final decision strategy for the team, i.e., the Nash equilibrium strategy. Finally, we conduct experiments on games in different domains. Extensive results show that CFR-MIX significantly outperforms state-of-the-art algorithms. We hope it can help the team make decisions in large-scale team-adversary games.
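The idea of replacing a joint action strategy with per-member strategies can be sketched as follows. This is only an illustration under stated assumptions, not the CFR-MIX implementation: it uses standard regret matching (the update at the heart of CFR) for each team member and then forms the joint strategy as a product of the individual ones; the mixing layer and the regret decomposition from the paper are not modeled, and the function names are hypothetical.

```python
from itertools import product


def regret_matching(cumulative_regrets):
    """Standard regret matching: play each action in proportion to its
    positive cumulative regret, or uniformly if no regret is positive."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total == 0.0:
        n = len(cumulative_regrets)
        return [1.0 / n] * n
    return [p / total for p in positive]


def joint_strategy(individual_strategies):
    """Product-form joint strategy: P(a_1, ..., a_n) = prod_i pi_i(a_i)."""
    joint = {}
    action_ranges = (range(len(s)) for s in individual_strategies)
    for actions in product(*action_ranges):
        prob = 1.0
        for member, action in enumerate(actions):
            prob *= individual_strategies[member][action]
        joint[actions] = prob
    return joint
```

With n members and k actions each, the individual representation stores n*k numbers, whereas the explicit joint table built by `joint_strategy` has k**n entries, which is the blow-up the abstract refers to.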
The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and effectively generalize across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundational model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising solution for expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks, such as text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.
In Video-based Point Cloud Compression (V-PCC), the 2D videos to be encoded are generated by 3D point cloud projection and compressed by High Efficiency Video Coding (HEVC). During 2D video compression, the best mode of each Coding Unit (CU) is searched by a brute-force strategy, which greatly increases the complexity of the encoding process. To address this issue, we first propose a simple and effective Portable Perceptron Network (PPN)-based fast mode decision method for V-PCC under the Random Access (RA) configuration. Second, we extract seven simple hand-crafted features as input to the PPN. Third, to train the PPN, we design an adaptive loss function that calculates the loss by allocating different weights according to different Rate-Distortion (RD) costs. Finally, experimental results show that the proposed method reduces encoding complexity by 43.13% with almost no loss in encoding efficiency under the RA configuration, which is superior to state-of-the-art methods. The source code is available at https://github.com/Mesks/PPNforV-PCC.
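The abstract does not give the form of the adaptive loss, so the following is only a guess at the general shape of an RD-cost-weighted loss: a binary cross-entropy in which a wrong mode decision costs more when the RD-cost gap between the two candidate modes is large. The weighting rule `1 + gap / max_gap`, the function name, and the parameter names are all assumptions made for illustration.

```python
import math


def rd_weighted_bce(probs, labels, rd_cost_gaps, eps=1e-12):
    """Binary cross-entropy where each sample is weighted by
    1 + (its RD-cost gap / the largest gap in the batch)."""
    max_gap = max(rd_cost_gaps) or 1.0  # fall back to 1.0 when every gap is zero
    total = 0.0
    for p, y, gap in zip(probs, labels, rd_cost_gaps):
        weight = 1.0 + gap / max_gap
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -weight * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)
```

Under this assumed weighting, a misprediction on a sample with a large RD-cost gap contributes up to twice as much as the same misprediction on a sample whose modes cost nearly the same.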
Offline reinforcement learning (RL) is a data-driven learning paradigm for sequential decision making. Mitigating the overestimation of values originating from out-of-distribution (OOD) states induced by the distribution shift between the learning policy and the previously collected offline dataset lies at the core of offline RL. To tackle this problem, some methods underestimate the values of states given by learned dynamics models or of state-action pairs with actions sampled from policies different from the behavior policy. However, since these generated states or state-action pairs are not guaranteed to be OOD, staying conservative on them may adversely affect the in-distribution ones. In this paper, we propose an OOD state-conservative offline RL method (OSCAR), which aims to address this limitation by explicitly generating reliable OOD states that are located near the manifold of the offline dataset, and then designing a conservative policy evaluation approach that combines the vanilla Bellman error with a regularization term that only underestimates the values of these generated OOD states. In this way, we can prevent the value errors of OOD states from propagating to in-distribution states through value bootstrapping and policy improvement. We also theoretically prove that the proposed conservative policy evaluation approach is guaranteed to underestimate the values of OOD states. OSCAR, along with several strong baselines, is evaluated on the offline decision-making benchmark D4RL and the autonomous driving benchmark SMARTS. Experimental results show that OSCAR outperforms the baselines on a large portion of the benchmarks and attains the highest average return, substantially outperforming existing offline RL methods.
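The structure of such a conservative evaluation objective can be sketched in a few lines. This is a minimal sketch under assumptions, not OSCAR's actual loss: it takes the mean squared Bellman error on dataset transitions and adds a penalty equal to the mean predicted value on the generated OOD states, so that minimizing the total pushes only those OOD values down. The function name and the `reg_weight` parameter are hypothetical.

```python
def conservative_loss(v_pred, td_target, v_ood, reg_weight=1.0):
    """Mean squared Bellman error on in-distribution samples plus a
    penalty on the mean predicted value of generated OOD states."""
    bellman = sum((v - t) ** 2 for v, t in zip(v_pred, td_target)) / len(v_pred)
    ood_penalty = sum(v_ood) / len(v_ood)
    return bellman + reg_weight * ood_penalty
```

Because the penalty term touches only `v_ood`, in-distribution values are shaped by the Bellman error alone, which mirrors the abstract's point about not being conservative on in-distribution states.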
Combinatorial Optimization (CO) problems have been intensively studied for decades with a wide range of applications. For some classic CO problems, e.g., the Traveling Salesman Problem (TSP), both traditional planning algorithms and the emerging reinforcement learning have made solid progress in recent years. However, for CO problems with nested sub-tasks, neither end-to-end reinforcement learning algorithms nor traditional evolutionary methods can obtain satisfactory strategies within limited time and computational resources. In this paper, we propose an algorithmic framework for solving CO problems with nested sub-tasks, in which learning and planning algorithms can be combined in a modular way. We validate our framework on the Job-Shop Scheduling Problem (JSSP), and the experimental results show that our algorithm performs well in both solution quality and model generalization.
To address the poor tracking robustness of deep learning object tracking algorithms under severe occlusion, deformation, and object rotation in complex scenes, an improved deep reinforcement learning object tracking algorithm based on an actor-double critic network is proposed. In the offline training phase, the actor network moves the rectangular box representing the object location according to the input sequence image to obtain the action value, that is, the horizontal, vertical, and scale transformation of the object. Then, the designed double critic network is used to evaluate the action value, and the two output Q values are averaged to guide the actor network to optimize the tracking strategy. The design of the double critic network effectively improves stability and convergence, especially in challenging scenes such as object occlusion, and the tracking performance is significantly improved. In the online tracking phase, the well-trained actor network is used to infer the changing action of the bounding box, directly causing the tracker to move the box to the object position in the current frame. Several comparative tracking experiments were conducted on the OTB100 visual tracker benchmark, and the experimental results show that more intensive reward settings significantly increase the actor network’s output probability of positive actions. This makes the proposed tracking algorithm outperform the mainstream deep reinforcement learning tracking algorithms and deep learning tracking algorithms under challenging attributes such as occlusion, deformation, and rotation.
In addition to a physical comprehension of the world, humans possess a high social intelligence: the intelligence that senses social events, infers the goals and intents of others, and facilitates social interaction. Notably, humans are distinguished from their closest primate cousins by their social cognitive skills as opposed to their physical counterparts. We believe that artificial social intelligence (ASI) will play a crucial role in shaping the future of artificial intelligence (AI). This article begins with a review of ASI from a cognitive science standpoint, including social perception, theory of mind (ToM), and social interaction. Next, we examine the recently emerged computational counterpart in the AI community. Finally, we provide an in-depth discussion on topics related to ASI.
This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced following a brief discussion of dynamic programming. Researchers in ADP and RL have enjoyed the fast developments of the past decade, from algorithms, to convergence and optimality analyses, and to stability results. Several key steps in the recent theoretical developments of ADPRL are mentioned with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on stability analysis of value iteration algorithms.
The metaverse has attracted considerable attention recently. It aims to build a virtual environment in which people can interact with the world and cooperate with each other. In this survey paper, we re-introduce the metaverse in a new framework based on a broad range of technologies, including perception, which enables us to precisely capture the characteristics of the real world; computation, which supports the large computation requirement over large-scale data; reconstruction, which builds the virtual world from the real one; cooperation, which facilitates long-distance communication and teamwork between users; and interaction, which bridges users and the virtual world. Despite its popularity, the fundamental techniques in this framework are still immature. New techniques are needed to facilitate the applications of the metaverse. In recent years, artificial intelligence (AI), especially deep learning, has shown promising results for empowering various areas, from science to industry. It is reasonable to imagine how we can combine AI with the framework in order to promote the development of the metaverse. In this survey, we present the recent achievements of AI for the metaverse in the proposed framework, including perception, computation, reconstruction, cooperation, and interaction. We also discuss some future work through which AI can contribute to the metaverse.
It has been an exciting journey since mobile communications and artificial intelligence (AI) were conceived in 1983 and 1956, respectively. While both fields evolved independently and profoundly changed the communications and computing industries, the rapid convergence of 5th generation mobile communication technology (5G) and AI is beginning to significantly transform the core communication infrastructure, network management, and vertical applications. The paper first outlines the individual roadmaps of mobile communications and AI in the early stage, with a focus on the era from 3rd generation mobile communication technology (3G) to 5G, when AI and mobile communications started to converge. With regard to telecommunications AI, the progress of AI in the ecosystem of mobile communications is then introduced in detail, including network infrastructure, network operation and management, business operation and management, intelligent applications towards business supporting system (BSS) & operation supporting system (OSS) convergence, verticals and private networks, etc. Then the classifications of AI in telecommunication ecosystems are summarized, along with its evolution paths specified by various international telecommunications standardization organizations. Towards the next decade, the prospective roadmap of telecommunications AI is forecast. In line with the 3rd generation partnership project (3GPP) and International Telecommunication Union Radiocommunication Sector (ITU-R) timeline of 5G & 6th generation mobile communication technology (6G), the paper further explores network intelligence following the 3GPP and open radio access network (O-RAN) routes, experience and intent-based network management and operation, the network AI signaling system, intelligent middle-office based BSS, intelligent customer experience management and policy control driven by BSS & OSS convergence, the evolution from service level agreement (SLA) to experience level agreement (ELA), and intelligent private networks for verticals. The paper concludes with the vision that AI will reshape the future beyond-5G (B5G)/6G landscape, and that we need to pivot our research and development (R&D), standardization, and ecosystem to fully seize the unprecedented opportunities.
This work reviews distributed Nash equilibrium seeking for noncooperative games in multi-agent networks, which has emerged as one of the frontier research topics in the systems and control community. Firstly, we give the basic formulation and analysis of noncooperative games with continuous action spaces, and provide the motivation and basic setting for distributed Nash equilibrium seeking. Then we introduce both gradient-based algorithms and best-response-based algorithms for various types of games, including zero-sum games, aggregative games, potential games, monotone games, and multi-cluster games. In addition, we provide some applications of noncooperative games.
With the significant breakthroughs in research on single-modal deep learning tasks, more and more works have begun to focus on multi-modal tasks. Multi-modal tasks usually involve more than one modality, and a modality represents a type of behavior or state. Common multi-modal information includes vision, hearing, language, touch, and smell. Vision and language are two of the most common modalities in human daily life, and many typical multi-modal tasks focus on these two modalities, such as visual captioning and visual grounding. In this paper, we conduct in-depth research on typical tasks of vision and language from the perspectives of generation, analysis, and reasoning. First, the typical tasks and some classical methods are analyzed and summarized, organized by their different algorithmic concerns, and frequently used datasets and metrics are further discussed. Then, some other variant tasks and cutting-edge tasks are briefly summarized to build a more comprehensive framework of vision and language related multi-modal tasks. Finally, we further discuss the development of pre-training related research and give an outlook for future research. We hope this survey can help relevant researchers understand the latest progress, existing problems, and exploration directions of vision and language multi-modal tasks, and provide guidance for future research.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61806179, 61876169, 61922072, 61976237, 61673404, 62106230, 62006069, 62206255, and 62203332), China Postdoctoral Science Foundation (Nos. 2021T140616, 2021M692920, 2022M712878, and 2022TQ0298), Key R&D Projects of Ministry of Science and Technology (No. 2022YFD2001200), Key R&D and Promotion Projects in Henan Province (Nos. 192102210098 and 212102210510), and Henan Postdoctoral Foundation (No. 202003019).
Funding: Supported by Beijing Key Laboratory of Behavior and Mental Health, Peking University.
Abstract: The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the other during cross-modal interaction, and the features that can reinforce a modality may contain only a part of it. To this end, we design an innovative Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Specifically, for the redundant features, we make one modality perform intra-modal feature selection through a self-attention mechanism, so that the selected features can adaptively and efficiently interact with another modality. To better capture the complementary information between the modalities, we obtain the fused weight vector by splicing and use the weight vector to achieve feature reinforcement of the modalities. We apply TACFN to the RAVDESS and IEMOCAP datasets. For fair comparison, we use the same unimodal representations to validate the effectiveness of the proposed fusion method. The experimental results show that TACFN brings a significant performance improvement compared to other methods and reaches state-of-the-art performance. All code and models can be accessed from https://github.com/shuzihuaiyu/TACFN.
Abstract: Decision-making plays an essential role in various real-world systems like automatic driving, traffic dispatching, information system management, and emergency command and control. Recent breakthroughs in computer game scenarios using deep reinforcement learning for intelligent decision-making have established decision-making intelligence as a burgeoning research direction. In complex practical systems, however, factors like coupled distracting features, long-term interaction links, and adversarial environments and opponents make decision-making in practical applications challenging in modeling, computing, and explaining. This work proposes game interactive learning, a novel paradigm for intelligent decision-making in complex and adversarial environments. This paradigm highlights the function and role of a human in the process of intelligent decision-making in complex systems. It formalizes a new learning paradigm for exchanging information and knowledge between humans and the machine system. The proposed paradigm first inherits methods in game theory to model the agents and their preferences in the complex decision-making process. It then optimizes the learning objectives from equilibrium analysis using reformed machine learning algorithms to compute and pursue promising decision results for practice. Human interactions are involved when the learning process needs guidance from additional knowledge and instructions, or when the human wants to understand the learning machine better. We perform preliminary experimental verification of the proposed paradigm on two challenging decision-making tasks in tactical-level War-game scenarios. Experimental results demonstrate the effectiveness of the proposed learning paradigm.
Funding: Supported by the Science and Technology Innovation 2030-“New Generation Artificial Intelligence” Major Project (No. 2018AAA0100901) and the National Natural Science Foundation of China (Nos. 61761146005 and 61632017).
Abstract: The repeated nature of sponsored search auctions allows the seller to implement Myerson’s auction to maximize revenue using past data. But since these data are provided by strategic buyers in the auctions, they can be manipulated, which may hurt the seller’s revenue. We model this problem as a Private Data Manipulation (PDM) game: the seller first announces an auction (such as Myerson’s) whose allocation and payment rules depend on the value distributions of buyers; the buyers then submit fake value distributions to the seller to implement the auction. The seller’s expected revenue and the buyers’ expected utilities depend on the auction rule and the game played among the buyers in their choices of the submitted distributions. Under the PDM game, we show that Myerson’s auction is equivalent to the generalized first-price auction, and under further assumptions equivalent to the Vickrey-Clarke-Groves (VCG) auction and the generalized second-price auction. Our results partially explain why Myerson’s auction is not as popular as the generalized second-price auction in the practice of sponsored search auctions, and provide new perspectives into data-driven decision making in mechanism design.
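For readers unfamiliar with the generalized second-price auction named in the abstract above, the following is a minimal sketch of the textbook rule only: slots go to bidders in descending order of bid, and each winner pays the bid ranked immediately below. It does not model the paper's PDM game or Myerson's auction. The function name, the bidder names, and the omission of reserve prices and quality scores are assumptions made for illustration.

```python
def generalized_second_price(bids, num_slots):
    """Textbook GSP: rank bidders by bid; each winner pays the next-highest bid.

    `bids` maps a (hypothetical) bidder name to its per-click bid.
    Returns a list of (slot_index, bidder, price_per_click) tuples.
    Reserve prices and quality scores are deliberately left out.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    results = []
    for slot in range(min(num_slots, len(ranked))):
        bidder, _ = ranked[slot]
        # Price is the bid ranked immediately below, or 0.0 if none exists.
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        results.append((slot, bidder, price))
    return results


if __name__ == "__main__":
    for row in generalized_second_price({"a": 3.0, "b": 5.0, "c": 1.0}, 2):
        print(row)
```

With two slots, the highest bidder takes the first slot at the second-highest bid, and the second-highest bidder takes the second slot at the third-highest bid.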
Abstract: The burgeoning field of Camouflaged Object Detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent learning-based models, their robustness is limited, as existing methods may misclassify salient objects as camouflaged ones, despite their contradictory characteristics. This limitation may stem from the lack of multi-pattern training images, leading to reduced robustness against salient objects. To overcome the scarcity of multi-pattern training images, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC). Specifically, we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure that the synthesized objects align with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflaged scenes with richer characteristics. The results of user studies show that the salient objects in our synthesized scenes attract the user’s attention more; thus, such samples pose a greater challenge to the existing COD models. Our CamDiff enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances the training and testing phases of COD baselines, granting them robustness across diverse domains. Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff.
Abstract: Center point localization is a major factor affecting the performance of 3D single object tracking. Point clouds themselves are a set of discrete points on the local surface of an object, and there is also a lot of noise in the labeling. Therefore, directly regressing the center coordinates is not very reasonable. Existing methods usually use volumetric-based, point-based, and view-based methods, with a relatively single modality. In addition, the sampling strategies commonly used usually result in the loss of object information, and holistic and detailed information is beneficial for object localization. To address these challenges, we propose a novel Multi-view unsupervised center Uncertainty 3D single object Tracker (MUT). MUT models the potential uncertainty of center coordinate localization in an unsupervised manner, allowing the model to learn the true distribution. By projecting point clouds, MUT can obtain multi-view depth map features, realize efficient knowledge transfer from 2D to 3D, and provide information from another modality for the tracker. We also propose a former attraction probability sampling strategy that preserves object information. By using both holistic and detailed descriptors of point clouds, the tracker can have a more comprehensive understanding of the tracking environment. Experimental results show that the proposed MUT network outperforms the baseline models on the KITTI dataset by 0.8% and 0.6% in precision and success rate, respectively, and on the NuScenes dataset by 1.4% and 6.1% in precision and success rate, respectively. The code is made available at https://github.com/abchears/MUT.git.
Abstract: The team-adversary game simulates many real-world scenarios in which a team of agents competes cooperatively against an adversary. However, decision-making in this type of game is a big challenge since the joint action space of the team is combinatorial and exponentially related to the number of team members. It also hampers the existing equilibrium finding algorithms from solving team-adversary games efficiently. To solve this issue caused by the combinatorial action space, we propose a novel framework based on the Counterfactual Regret Minimization (CFR) framework: CFR-MIX. Firstly, we propose a new strategy representation to replace the traditional joint action strategy by using the individual action strategies of all the team members, which can significantly reduce the strategy space. To maintain the cooperation between team members, a strategy consistency relationship is proposed. Then, we transform the consistency relationship of the strategy to the regret consistency for computing the equilibrium strategy with the new strategy representation under the CFR framework. To guarantee the regret consistency relationship, a product-form decomposition method over cumulative regret values is proposed. To implement this decomposition method, our CFR-MIX framework employs a mixing layer under the CFR framework to get the final decision strategy for the team, i.e., the Nash equilibrium strategy. Finally, we conduct experiments on games in different domains. Extensive results show that CFR-MIX significantly outperforms state-of-the-art algorithms. We hope it can help the team make decisions in large-scale team-adversary games.
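As background for the cumulative regret values mentioned in the abstract above, the sketch below shows the standard regret-matching rule that CFR-style algorithms use to turn cumulative regrets into a strategy at one decision point. It is not the CFR-MIX mixing layer or its product-form decomposition, which the abstract does not specify in enough detail to reproduce; the function name is an assumption for illustration.

```python
def regret_matching(cumulative_regrets):
    """Return action probabilities proportional to positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regrets)
    return [1.0 / n] * n


if __name__ == "__main__":
    print(regret_matching([2.0, -1.0, 2.0]))
    print(regret_matching([-1.0, -2.0]))
```

Actions with non-positive regret receive zero probability unless every action does, in which case the rule plays uniformly.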
Abstract: The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and effectively generalize across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundational model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising solution for expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks, such as text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.
Funding: Supported by the National Natural Science Foundation of China (No. 62001209).
Abstract: In Video-based Point Cloud Compression (V-PCC), 2D videos to be encoded are generated by 3D point cloud projection and compressed by High Efficiency Video Coding (HEVC). In the process of 2D video compression, the best mode of each Coding Unit (CU) is searched by a brute-force strategy, which greatly increases the complexity of the encoding process. To address this issue, we first propose a simple and effective Portable Perceptron Network (PPN)-based fast mode decision method for V-PCC under Random Access (RA) configuration. Second, we extract seven simple hand-extracted features for input into the PPN network. Third, we design an adaptive loss function, which can calculate the loss by allocating different weights according to different Rate-Distortion (RD) costs, to train our PPN network. Finally, experimental results show that the proposed method reduces encoding complexity by 43.13% with almost no encoding efficiency loss under RA configuration, which is superior to the state-of-the-art methods. The source code is available at https://github.com/Mesks/PPNforV-PCC.
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0116402) and the National Natural Science Foundation of China (No. 62106172).
Abstract: Offline reinforcement learning (RL) is a data-driven learning paradigm for sequential decision making. Mitigating the overestimation of values originating from out-of-distribution (OOD) states induced by the distribution shift between the learning policy and the previously-collected offline dataset lies at the core of offline RL. To tackle this problem, some methods underestimate the values of states given by learned dynamics models or state-action pairs with actions sampled from policies different from the behavior policy. However, since these generated states or state-action pairs are not guaranteed to be OOD, staying conservative on them may adversely affect the in-distribution ones. In this paper, we propose an OOD state-conservative offline RL method (OSCAR), which aims to address the limitation by explicitly generating reliable OOD states that are located near the manifold of the offline dataset, and then design a conservative policy evaluation approach that combines the vanilla Bellman error with a regularization term that only underestimates the values of these generated OOD states. In this way, we can prevent the value errors of OOD states from propagating to in-distribution states through value bootstrapping and policy improvement. We also theoretically prove that the proposed conservative policy evaluation approach is guaranteed to underestimate the values of OOD states. OSCAR, along with several strong baselines, is evaluated on the offline decision-making benchmark D4RL and the autonomous driving benchmark SMARTS. Experimental results show that OSCAR outperforms the baselines on a large portion of the benchmarks and attains the highest average return, substantially outperforming existing offline RL methods.
Funding: Supported by the National Key Research and Development Program of China (No. 2020AAA0106302), the National Natural Science Foundation of China (Nos. 62061136001, 92248303, 62106123, and 61972224), Tsinghua Institute for Guo Qiang, and the High Performance Computing Center, Tsinghua University.
Abstract: Combinatorial Optimization (CO) problems have been intensively studied for decades with a wide range of applications. For some classic CO problems, e.g., the Traveling Salesman Problem (TSP), both traditional planning algorithms and the emerging reinforcement learning have made solid progress in recent years. However, for CO problems with nested sub-tasks, neither end-to-end reinforcement learning algorithms nor traditional evolutionary methods can obtain satisfactory strategies within limited time and computational resources. In this paper, we propose an algorithmic framework for solving CO problems with nested sub-tasks, in which learning and planning algorithms can be combined in a modular way. We validate our framework on the Job-Shop Scheduling Problem (JSSP), and the experimental results show that our algorithm performs well in both solution quality and model generalization.
Funding: Supported in part by the National Key R&D Program of China (No. 2022YFB2602203), in part by the National Natural Science Foundation of China (Nos. U20A20225 and 61873200), and in part by the Shaanxi Provincial Key Research and Development Program (No. 2022-GY111).
Abstract: To address the poor tracking robustness of deep learning object tracking algorithms under severe occlusion, deformation, and object rotation in complex scenes, an improved deep reinforcement learning object tracking algorithm based on an actor-double critic network is proposed. In the offline training phase, the actor network moves the rectangular box representing the object location according to the input sequence image to obtain the action value, that is, the horizontal, vertical, and scale transformation of the object. Then, the designed double critic network is used to evaluate the action value, and the two output Q values are averaged to guide the actor network to optimize the tracking strategy. The design of the double critic network effectively improves stability and convergence, especially in challenging scenes such as object occlusion, and the tracking performance is significantly improved. In the online tracking phase, the well-trained actor network is used to infer the changing action of the bounding box, directly causing the tracker to move the box to the object position in the current frame. Several comparative tracking experiments were conducted on the OTB100 visual tracker benchmark, and the experimental results show that more intensive reward settings significantly increase the actor network’s output probability of positive actions. This makes the tracking algorithm proposed in this paper outperform the mainstream deep reinforcement learning tracking algorithms and deep learning tracking algorithms under challenging attributes such as occlusion, deformation, and rotation.
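The averaging of the two critic outputs described in the abstract above can be illustrated with a very small sketch. The abstract gives no network architecture, so the critics here are stand-in callables and every name is hypothetical; only the averaging step itself comes from the text.

```python
def averaged_q(critic_a, critic_b, state, action):
    """Average two critics' Q estimates for a single state-action pair."""
    return 0.5 * (critic_a(state, action) + critic_b(state, action))


if __name__ == "__main__":
    # Toy critics (assumed for illustration): each scores a 3-component action
    # (horizontal shift, vertical shift, scale change) against a scalar state.
    critic_one = lambda state, action: state - sum(abs(a) for a in action)
    critic_two = lambda state, action: state - max(abs(a) for a in action)
    print(averaged_q(critic_one, critic_two, 1.0, (0.5, 0.0, 0.25)))
```

In an actor-critic setup, a value like this would serve as the signal used to update the actor; how the paper does so is not stated in the abstract.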
Funding: Supported by the National Key R&D Program of China (No. 2021ZD0140407), the National Natural Science Foundation of China (No. 62022048), and the National Defense Basic Science and Technology Strengthening Program of China.
Funding: Supported in part by the National Key R&D Program of China (No. 2022ZD0114900) and the Beijing Nova Program.
Abstract: In addition to a physical comprehension of the world, humans possess a high social intelligence: the intelligence that senses social events, infers the goals and intents of others, and facilitates social interaction. Notably, humans are distinguished from their closest primate cousins by their social cognitive skills as opposed to their physical counterparts. We believe that artificial social intelligence (ASI) will play a crucial role in shaping the future of artificial intelligence (AI). This article begins with a review of ASI from a cognitive science standpoint, including social perception, theory of mind (ToM), and social interaction. Next, we examine the recently-emerged computational counterpart in the AI community. Finally, we provide an in-depth discussion on topics related to ASI.
Abstract: This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced following a brief discussion of dynamic programming. Researchers in ADP and RL have enjoyed rapid developments over the past decade, from algorithms to convergence and optimality analyses to stability results. Several key steps in the recent theoretical developments of ADPRL are mentioned with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on stability analysis of value iteration algorithms.
Funding: This work was supported by the National Key Research and Development Program of China (Nos. 2020AAA0105500 and 2021ZD0109901), the National Natural Science Foundation of China (Nos. 62088102, 62125106, and 61971260), and the Beijing Municipal Science and Technology Commission (No. Z181100003118014).
Abstract: The metaverse has recently been attracting considerable attention. It aims to build a virtual environment in which people can interact with the world and cooperate with each other. In this survey paper, we re-introduce the metaverse in a new framework based on a broad range of technologies, including perception, which enables us to precisely capture the characteristics of the real world; computation, which supports the large computation requirement over large-scale data; reconstruction, which builds the virtual world from the real one; cooperation, which facilitates long-distance communication and teamwork between users; and interaction, which bridges users and the virtual world. Despite its popularity, the fundamental techniques in this framework are still immature. Innovating new techniques to facilitate the applications of the metaverse is necessary. In recent years, artificial intelligence (AI), especially deep learning, has shown promising results for empowering various areas, from science to industry. It is reasonable to imagine how we can combine AI with the framework in order to promote the development of the metaverse. In this survey, we present the recent achievements of AI for the metaverse in the proposed framework, including perception, computation, reconstruction, cooperation, and interaction. We also discuss some future work in which AI can contribute to the metaverse.
Abstract: It has been an exciting journey since mobile communications and artificial intelligence (AI) were conceived in 1983 and 1956, respectively. While both fields evolved independently and profoundly changed the communications and computing industries, the rapid convergence of 5th generation mobile communication technology (5G) and AI is beginning to significantly transform the core communication infrastructure, network management, and vertical applications. The paper first outlined the individual roadmaps of mobile communications and AI in the early stage, concentrating on the era from 3rd generation mobile communication technology (3G) to 5G, when AI and mobile communications started to converge. With regard to telecommunications AI, the progress of AI in the ecosystem of mobile communications was further introduced in detail, including network infrastructure, network operation and management, business operation and management, intelligent applications towards business supporting system (BSS) & operation supporting system (OSS) convergence, verticals and private networks, etc. Then the classifications of AI in telecommunication ecosystems were summarized along with its evolution paths specified by various international telecommunications standardization organizations. Towards the next decade, the prospective roadmap of telecommunications AI was forecasted. In line with the 3rd generation partnership project (3GPP) and International Telecommunication Union Radiocommunication Sector (ITU-R) timeline of 5G & 6th generation mobile communication technology (6G), the network intelligence following 3GPP and open radio access network (O-RAN) routes, experience and intent-based network management and operation, network AI signaling system, intelligent middle-office based BSS, intelligent customer experience management and policy control driven by BSS & OSS convergence, evolution from service level agreement (SLA) to experience level agreement (ELA), and intelligent private network for verticals were further explored. The paper is concluded with the vision that AI will reshape the future beyond 5G (B5G)/6G landscape, and that we need to pivot our research and development (R&D), standardizations, and ecosystem to fully seize the unprecedented opportunities.
Funding: This work was supported by the Shanghai Sailing Program (Nos. 20YF1453000 and 20YF1452800), the National Science Foundation of China (Nos. 62003239, 62003240, 62003243, and 61903027), the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0100), and the Shanghai Municipal Commission of Science and Technology (No. 19511132101).
Abstract: This work reviews distributed Nash equilibrium seeking for noncooperative games in multi-agent networks, which has emerged as one of the frontier research topics in the systems and control community. Firstly, we give the basic formulation and analysis of noncooperative games with continuous action spaces, and provide the motivation and basic setting for distributed Nash equilibrium seeking. Then we introduce both gradient-based algorithms and best-response based algorithms for various types of games, including zero-sum games, aggregative games, potential games, monotone games, and multi-cluster games. In addition, we provide some applications of noncooperative games.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61831005).
Abstract: With the significant breakthroughs in research on single-modal deep learning tasks, more and more works have begun to focus on multi-modal tasks. Multi-modal tasks usually involve more than one modality, and a modality represents a type of behavior or state. Common multi-modal information includes vision, hearing, language, touch, and smell. Vision and language are two of the most common modalities in human daily life, and many typical multi-modal tasks focus on these two modalities, such as visual captioning and visual grounding. In this paper, we conduct in-depth research on typical tasks of vision and language from the perspectives of generation, analysis, and reasoning. First, the typical tasks and some classical methods are analyzed and summarized, generalized from the aspects of different algorithmic concerns, and frequently used datasets and metrics are further discussed. Then, some other variant tasks and cutting-edge tasks are briefly summarized to build a more comprehensive framework of vision and language related multi-modal tasks. Finally, we further discuss the development of pre-training related research and give an outlook for future research. We hope this survey can help relevant researchers to understand the latest progress, existing problems, and exploration directions of vision and language multi-modal related tasks, and provide guidance for future research.