Unsupervised learning methods such as graph contrastive learning have been used for dynamic graph representation learning to eliminate the dependence on labels. However, existing studies neglect positional information when learning discrete snapshots, resulting in insufficient network topology learning. At the same time, due to the lack of appropriate data augmentation methods, it is difficult to capture the evolving patterns of the network effectively. To address the above problems, a position-aware and subgraph-enhanced dynamic graph contrastive learning method is proposed for discrete-time dynamic graphs. Firstly, the global snapshot is built from the historical snapshots to express the stable pattern of the dynamic graph, and random walks are used to obtain the position representation by learning the positional information of the nodes. Secondly, a new data augmentation method is designed from the perspectives of short-term changes and long-term stable structures of dynamic graphs. Specifically, subgraph sampling based on snapshots and global snapshots is used to obtain two structural augmentation views, and node structures and evolving patterns are learned by combining a graph neural network, a gated recurrent unit, and an attention mechanism. Finally, the quality of node representations is improved by combining contrastive learning between the different structural augmentation views and between the structural and positional representations. Experimental results on four real datasets show that the proposed method outperforms existing unsupervised methods, and it is competitive with the supervised learning method under a semi-supervised setting.
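For illustration, the snippet below sketches the kind of view-level contrastive objective described above: an InfoNCE-style loss that treats the two structural augmentation views of each node as a positive pair. It is a generic sketch, not the paper's implementation; the temperature and embedding sizes are assumed values.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE-style contrastive loss between two views.

    z1, z2: (num_nodes, dim) embeddings of the same nodes under two
    augmentation views; row i of z1 and row i of z2 form a positive pair,
    all other rows act as negatives. The temperature is an assumed value.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature               # pairwise cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # positives sit on the diagonal

# Toy usage with random embeddings standing in for the two structural views.
rng = np.random.default_rng(0)
z_view1, z_view2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(info_nce(z_view1, z_view2))
```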
Due to the structural dependencies among concurrent events in the knowledge graph and the substantial amount of sequential correlation information carried by temporally adjacent events, we propose an Independent Recurrent Temporal Graph Convolution Networks (IndRT-GCNets) framework to efficiently and accurately capture event attribute information. The framework models the knowledge graph sequences to learn the evolutionary representations of entities and relations within each period. Firstly, by utilizing the temporal graph convolution module in the evolutionary representation unit, the framework captures the structural dependency relationships within the knowledge graph in each period. Meanwhile, to achieve better event representation and establish effective correlations, an independent recurrent neural network is employed to implement auto-regressive modeling. Furthermore, static attributes of entities in the entity-relation events are constrained and merged using a static graph constraint to obtain optimal entity representations. Finally, the evolution of entity and relation representations is utilized to predict events in the next step. On multiple real-world datasets such as Freebase13 (FB13), Freebase15K (FB15K), WordNet11 (WN11), WordNet18 (WN18), FB15K-237, WN18RR, YAGO3-10, and Nell-995, the results of multiple evaluation indicators show that our proposed IndRT-GCNets framework outperforms most existing models on knowledge reasoning tasks, which validates its effectiveness and robustness.
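As background for the auto-regressive component, the sketch below shows the generic independent recurrent (IndRNN) update, in which each hidden unit recurs only on its own previous state. The dimensions, activation, and weights are illustrative assumptions rather than the IndRT-GCNets configuration.

```python
import numpy as np

def indrnn_step(x_t, h_prev, W, u, b):
    """One step of an independent recurrent (IndRNN) cell.

    Unlike a standard RNN, the recurrent weight u is element-wise, so each
    hidden unit only recurs on its own previous state:
        h_t = relu(W @ x_t + u * h_{t-1} + b)
    """
    return np.maximum(0.0, W @ x_t + u * h_prev + b)

# Toy auto-regressive rollout over a sequence of event embeddings.
rng = np.random.default_rng(1)
dim_in, dim_h, steps = 8, 4, 5
W = rng.normal(scale=0.1, size=(dim_h, dim_in))
u = rng.uniform(0.5, 1.0, size=dim_h)   # per-unit recurrent weights
b = np.zeros(dim_h)
h = np.zeros(dim_h)
for t in range(steps):
    h = indrnn_step(rng.normal(size=dim_in), h, W, u, b)
print(h)
```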
Multi-modal fusion technology has gradually become a fundamental task in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming a dominant research direction due to its powerful perception and judgment capabilities. Under complex scenes, multi-modal fusion technology utilizes the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment performance limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and provides a detailed and in-depth analysis. According to the data fusion stage, multi-modal fusion has four primary methods: early fusion, deep fusion, late fusion, and hybrid fusion. The paper surveys the three major multi-modal fusion technologies that can significantly enhance the effect of data fusion and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and quality. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology. Invalid data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.
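To make the fusion-stage distinction concrete, the toy sketch below contrasts early (feature-level) fusion with late (decision-level) fusion; the stand-in classifier and feature sizes are placeholders, not any method surveyed here.

```python
import numpy as np

rng = np.random.default_rng(2)
image_feat = rng.normal(size=16)   # placeholder image features
audio_feat = rng.normal(size=8)    # placeholder audio features

def linear_classifier(x, n_classes=3, seed=0):
    """Stand-in classifier: a fixed random linear map followed by softmax."""
    W = np.random.default_rng(seed).normal(size=(n_classes, x.shape[0]))
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Early fusion: concatenate features, then classify the joint vector.
early_scores = linear_classifier(np.concatenate([image_feat, audio_feat]), seed=3)

# Late fusion: classify each modality separately, then average the decisions.
late_scores = 0.5 * (linear_classifier(image_feat, seed=4)
                     + linear_classifier(audio_feat, seed=5))

print(early_scores, late_scores)
```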
We construct a new U(1) slave-spin representation for the single-band Hubbard model in the large-U limit. The mean-field theory in this representation is more amenable to describing both the spin-charge-separation physics of the Mott insulator at half-filling and the strange-metal behavior at finite doping.
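For orientation only, the equations below recall the standard slave-spin construction on which such representations build; the abstract does not specify the new U(1) representation, so this is generic background rather than the authors' formulation.

```latex
% Standard slave-spin construction (background only, not the new U(1) form):
% the physical electron is split into an auxiliary spin-1/2 and a fermionic spinon,
\begin{align}
  c^{\dagger}_{i\sigma} &= S^{+}_{i\sigma}\, f^{\dagger}_{i\sigma}, &
  c_{i\sigma} &= S^{-}_{i\sigma}\, f_{i\sigma},
\end{align}
% subject to the local constraint tying the slave-spin to the spinon occupation:
\begin{equation}
  S^{z}_{i\sigma} = f^{\dagger}_{i\sigma} f_{i\sigma} - \tfrac{1}{2}.
\end{equation}
```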
Dear Editor, This letter proposes a symmetry-preserving dual-stream graph neural network (SDGNN) for precise representation learning on an undirected weighted graph (UWG). Although existing graph neural networks (GNNs) are influential instruments for representation learning on a UWG, they invariably adopt a single node feature matrix to describe the sole node set of a UWG.
The exploration of spin symmetry (SS) in nuclear physics has been instrumental in identifying atomic nucleus structures. In this study, we solve the Dirac equation from the relativistic mean field (RMF) in the complex momentum representation. We investigated SS and its breaking in single-particle resonant states within deformed nuclei, with a focus on the illustrative nucleus ^(168)Er. This is the first identification of resonant spin doublets in a deformed nucleus, with SS expected to be approached near the continuum threshold. With increasing single-particle energy, the splitting of the resonant spin doublets widens significantly. This escalating splitting implies diminishing adherence to SS, indicating a departure from the expected behavior as the energy levels increase. We also analyzed the widths of the resonant states, showing that lower orbital angular momentum resonances possess shorter decay times and that SS is preserved within broad resonant doublets, as opposed to narrow resonant doublets. Comparing the radial density of the upper components for the bound-state and resonant-state doublets, it becomes evident that while SS is well preserved in the bound states, it deteriorates in the resonant states. The impact of nuclear deformation (β_(2)) on SS was examined, demonstrating that an increase in β_(2) results in larger energy and width splitting of the resonant spin doublets, which is attributed to increased component mixing. Furthermore, the sensitivity of the spin doublets to various potential parameters, such as surface diffuseness (a), radius (R), and depth (Σ_0), is discussed, emphasizing the role of these parameters in SS. This study provides valuable insights into the behavior of spin doublets in deformed nuclei and their interplay with the nuclear structure, thereby advancing our understanding of SS in resonant states.
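As background for the symmetry discussed above, the relations below state the usual spin-symmetry condition of the Dirac equation in terms of the combined potentials; this is textbook material, not a result specific to this study.

```latex
% Background only: in the Dirac equation with scalar S(r) and vector V(r) potentials,
% written via the combinations
\begin{align}
  \Sigma(r) &= V(r) + S(r), & \Delta(r) &= V(r) - S(r),
\end{align}
% the spin-symmetry limit is reached when the corresponding spin-orbit term vanishes,
\begin{equation}
  \frac{d\Delta(r)}{dr} = 0 \;\Longrightarrow\;
  \text{spin doublets } \bigl(n, \ell, j = \ell \pm \tfrac{1}{2}\bigr) \text{ become degenerate.}
\end{equation}
```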
Deep learning has been a catalyst for a transformative revolution in machine learning and computer vision in the past decade. Within these research domains, methods grounded in deep learning have exhibited exceptional performance across a spectrum of tasks. The success of deep learning methods can be attributed to their capability to derive potent representations from data, integral for a myriad of downstream applications. These representations encapsulate the intrinsic structure, features, or latent variables characterising the underlying statistics of visual data. Despite these achievements, the challenge persists in effectively conducting representation learning of visual data with deep models, particularly when confronted with vast and noisy datasets. This special issue is a dedicated platform for researchers worldwide to disseminate their latest, high-quality articles, aiming to enhance readers' comprehension of the principles, limitations, and diverse applications of representation learning in computer vision.
User representation learning is crucial for capturing different user preferences, but it is also critically challenging because user intentions are latent and dispersed in complex and diverse patterns of user-generated data, and thus cannot be measured directly. Text-based data models can learn user representations by mining latent semantics, which is beneficial to enhancing the semantic function of user representations. However, these technologies only extract common features from historical records and cannot represent changes in user intentions. In contrast, sequential features can express the user's interests and intentions as they change over time, but sequential recommendation results based on item-level user representations lack interpretability of the preference factors. To address these issues, we propose in this paper a novel model with a Dual-Layer User Representation, named DLUR, where the user's intention is learned from two different layer representations. Specifically, the latent semantic layer adds an interactive layer based on the Transformer to extract keywords and key sentences in the text, which serve as a basis for interpretation. The sequence layer uses the Transformer model to encode the user's preference intention and clarify changes in the user's intention. Therefore, this dual-layer user model is more comprehensive than a single text mode or sequence mode and can effectively improve the performance of recommendations. Our extensive experiments on five benchmark datasets demonstrate DLUR's superiority over state-of-the-art recommendation models. In addition, DLUR's ability to explain recommendation results is also demonstrated through specific cases.
Using the operator correspondence of the real and fictitious modes in the thermo-entangled state representation, we solve the quantum master equation describing the diffusion channel and obtain the Kraus operator-sum representation of its analytical solution. We find that pure coherent states evolve into new mixed thermal superposed states in the diffusion channel. We also investigate the statistical properties of the initial coherent states and their entropy evolutions in the diffusion channel, and find that the entropy evolutions are related only to the decay time and not to the amplitudes of the initial coherent states.
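For reference, the general Kraus operator-sum form mentioned above is recalled below; the channel-specific Kraus operators derived in the paper are not reproduced here.

```latex
% General form of a Kraus operator-sum representation (background; the
% diffusion-channel-specific operators derived in the paper are not shown):
\begin{equation}
  \rho(t) = \sum_{n} M_{n}(t)\, \rho(0)\, M_{n}^{\dagger}(t),
  \qquad
  \sum_{n} M_{n}^{\dagger}(t)\, M_{n}(t) = \mathbb{I},
\end{equation}
% where the completeness relation guarantees that the map is trace preserving.
```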
Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of such models still need to be improved. Additionally, due to differences in programming languages, most methods lack cross-language detection generality. To address these issues, in this paper we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capabilities. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has a certain ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality and practicality.
Network anomaly detection plays a vital role in safeguarding network security. However, the existing network anomaly detection task is typically based on the one-class zero-positive scenario. This approach is susceptible to overfitting during the training process due to discrepancies in data distribution between the training set and the test set, a phenomenon known as prediction drift. Additionally, the rarity of anomaly data, often masked by normal data, further complicates network anomaly detection. To address these challenges, we propose the PUNet network, which ingeniously combines the strengths of traditional machine learning and deep learning techniques for anomaly detection. Specifically, PUNet employs a reconstruction-based autoencoder to pre-train on normal data, enabling the network to capture potential features and correlations within the data. Subsequently, PUNet integrates a sampling algorithm to construct a pseudo-label candidate set among the outliers based on the reconstruction loss of the samples. This approach effectively mitigates the prediction drift problem by incorporating abnormal samples. Furthermore, PUNet utilizes the CatBoost classifier for anomaly detection to tackle potential data imbalance issues within the candidate set. Extensive experimental evaluations demonstrate that PUNet effectively resolves the prediction drift and data imbalance problems, significantly outperforming competing methods.
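The snippet below illustrates the pseudo-label candidate construction described above: samples whose reconstruction error is largest are flagged as candidate anomalies. The linear-projection "autoencoder" and the 95th-percentile cut-off are illustrative assumptions, not PUNet's actual components.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 20))                  # placeholder traffic features

# Stand-in for a trained autoencoder: project to a low-dimensional subspace
# and back, so reconstruction error measures distance from that subspace.
P = rng.normal(size=(20, 5))
Q, _ = np.linalg.qr(P)                           # orthonormal basis of the subspace
X_hat = (X @ Q) @ Q.T
recon_error = np.mean((X - X_hat) ** 2, axis=1)

# Samples with the largest reconstruction error become pseudo-anomaly
# candidates; the 95th-percentile cut-off is an illustrative assumption.
threshold = np.percentile(recon_error, 95)
pseudo_labels = (recon_error > threshold).astype(int)   # 1 = candidate anomaly
print(int(pseudo_labels.sum()), "candidates flagged")
```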
The advent of self-attention mechanisms within Transformer models has significantly propelled the advancement of deep learning algorithms, yielding outstanding achievements across diverse domains. Nonetheless, self-attention mechanisms falter when applied to datasets with intricate semantic content and extensive dependency structures. In response, this paper introduces a Diffusion Sampling and Label-Driven Co-attention Neural Network (DSLD), which adopts a diffusion sampling method to capture more comprehensive semantic information from the data. Additionally, the model leverages the joint correlation information of labels and data to guide the computation of the text representation, correcting semantic representation biases in the data and increasing the accuracy of the semantic representation. Ultimately, the model computes the corresponding classification results by synthesizing these rich data semantic representations. Experiments on seven benchmark datasets show that our proposed model achieves competitive results compared to state-of-the-art methods.
Sparse representation is an effective data classification algorithm that depends on the known training samples to categorise the test sample. It has been widely used in various image classification tasks. Sparseness in sparse representation means that only a few instances selected from all training samples can effectively convey the essential class-specific information of the test sample, which is very important for classification. For deformable images such as human faces, pixels at the same location in different images of the same subject usually have different intensities. Therefore, extracting features and correctly classifying such deformable objects is very hard. Moreover, lighting, pose and occlusion cause further difficulty. Considering the problems and challenges listed above, a novel image representation and classification algorithm is proposed. First, the authors' algorithm generates virtual samples by a non-linear variation method. This method can effectively extract the low-frequency information of the space-domain features of the original image, which is very useful for representing deformable objects. The combination of the original and virtual samples is more beneficial for improving the classification performance and robustness of the algorithm. The authors' algorithm then calculates the expression coefficients of the original and virtual samples separately using the sparse representation principle and obtains the final score by a designed efficient score fusion scheme. The weighting coefficients in the score fusion scheme are set entirely automatically. Finally, the algorithm classifies the samples based on the final scores. The experimental results show that our method achieves better classification performance than conventional sparse representation algorithms.
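The sketch below illustrates the overall scoring-and-fusion idea: each class is scored by its reconstruction residual, separately for the original and a virtual sample, and the two score sets are fused. Ridge-regularized coding stands in for the sparse coding step, and the fixed fusion weight replaces the paper's automatic weighting, so this is an assumption-laden sketch rather than the authors' algorithm.

```python
import numpy as np

def class_scores(y, D, labels, lam=0.1):
    """Code y over dictionary D (columns = training samples) with ridge-regularized
    least squares (a stand-in for sparse coding), then score each class by the
    negative reconstruction residual using only that class's coefficients."""
    A = D.T @ D + lam * np.eye(D.shape[1])
    coeff = np.linalg.solve(A, D.T @ y)
    scores = {}
    for c in np.unique(labels):
        mask = (labels == c)
        residual = np.linalg.norm(y - D[:, mask] @ coeff[mask])
        scores[c] = -residual
    return scores

rng = np.random.default_rng(4)
D = rng.normal(size=(64, 30))                 # 30 training samples, 64-dim features
labels = np.repeat(np.arange(3), 10)          # 3 classes
y_orig = rng.normal(size=64)                  # original test sample
y_virt = np.tanh(y_orig)                      # placeholder "virtual" sample

w = 0.6                                       # illustrative fusion weight, not automatic
s_orig, s_virt = class_scores(y_orig, D, labels), class_scores(y_virt, D, labels)
fused = {c: w * s_orig[c] + (1 - w) * s_virt[c] for c in s_orig}
print(max(fused, key=fused.get))              # predicted class
```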
This study introduces a pre-orthogonal adaptive Fourier decomposition (POAFD) to obtain approximations and numerical solutions to the fractional Laplacian initial value problem and the extension problem of Caffarelli and Silvestre (generalized Poisson equation). As a first step, the method expands the initial data function into a sparse series of the fundamental solutions with fast convergence, and, as a second step, makes use of the semigroup or the reproducing kernel property of each of the expanding entries. Experiments show the effectiveness and efficiency of the proposed series solutions.
Anticipating others' actions is innate and essential in order for humans to navigate and interact well with others in dense crowds. This ability is urgently required for unmanned systems such as service robots and self-driving cars. However, existing solutions struggle to predict pedestrian anticipation accurately, because the influence of group-related social behaviors has not been well considered. While group relationships and group interactions are ubiquitous and significantly influence pedestrian anticipation, their influence is diverse and subtle, making it difficult to explicitly quantify. Here, we propose the group interaction field (GIF), a novel group-aware representation that quantifies pedestrian anticipation into a probability field of pedestrians' future locations and attention orientations. An end-to-end neural network, GIFNet, is tailored to estimate the GIF from explicit multidimensional observations. GIFNet quantifies the influence of group behaviors by formulating a group interaction graph with propagation and graph attention that is adaptive to the group size and dynamic interaction states. The experimental results show that the GIF effectively represents the change in pedestrians' anticipation under the prominent impact of group behaviors and accurately predicts pedestrians' future states. Moreover, the GIF contributes to explaining various predictions of pedestrians' behavior in different social states. The proposed GIF will eventually be able to allow unmanned systems to work in a human-like manner and comply with social norms, thereby promoting harmonious human-machine relationships.
Significant advancements have been witnessed in visual tracking applications leveraging the Vision Transformer (ViT) in recent years, mainly due to its formidable modeling capabilities. However, the strong performance of such trackers heavily relies on ViT models pretrained for long periods, limiting more flexible model designs for tracking tasks. To address this issue, we propose an efficient unsupervised ViT pretraining method for the tracking task based on masked autoencoders, called TrackMAE. During pretraining, we employ two shared-parameter ViTs, serving as the appearance encoder and motion encoder, respectively. The appearance encoder encodes randomly masked image data, while the motion encoder encodes randomly masked pairs of video frames. Subsequently, an appearance decoder and a motion decoder separately reconstruct the original image data and video frame data at the pixel level. In this way, ViT learns to understand both the appearance of images and the motion between video frames simultaneously. Experimental results demonstrate that ViT-Base and ViT-Large models, pretrained with TrackMAE and combined with a simple tracking head, achieve state-of-the-art (SOTA) performance without additional design. Moreover, compared to the currently popular MAE pretraining methods, TrackMAE consumes only 1/5 of the training time, which will facilitate the customization of diverse models for tracking. For instance, we additionally customize a lightweight ViT-XS, which achieves SOTA efficient tracking performance.
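To illustrate the random masking step, the snippet below performs MAE-style patch masking; the 0.75 mask ratio and the 16x16 patch grid are assumed values, not TrackMAE's settings.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio=0.75, rng=None):
    """Randomly split patch indices into visible and masked sets, MAE-style.

    The encoder sees only the visible patches; the decoder reconstructs the
    masked ones at pixel level. The 0.75 mask ratio is an assumed value.
    """
    if rng is None:
        rng = np.random.default_rng()
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])  # visible, masked

# Toy usage: a 224x224 frame split into 16x16 patches gives 14*14 = 196 patches.
visible_idx, masked_idx = random_patch_mask(196, rng=np.random.default_rng(5))
print(len(visible_idx), "visible patches,", len(masked_idx), "masked patches")
```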
Social media has become increasingly significant in modern society, but it has also turned into a breeding ground for the propagation of misleading information, potentially causing a detrimental impact on public opinion and daily life. Compared to pure text content, multimodal content significantly increases the visibility and shareability of posts. This has made the search for efficient modality representations and cross-modal information interaction methods a key focus in the field of multimodal fake news detection. To effectively address the critical challenge of accurately detecting fake news on social media, this paper proposes a fake news detection model based on cross-modal message aggregation and a gated fusion network (MAGF). MAGF first uses BERT to extract cumulative textual feature representations and word-level features, applies Faster Region-based Convolutional Neural Network (Faster R-CNN) to obtain image objects, and leverages ResNet-50 and Visual Geometry Group-19 (VGG-19) to obtain image region features and global features. The image region features and word-level text features are then projected into a low-dimensional space to calculate a text-image affinity matrix for cross-modal message aggregation. The gated fusion network combines text and image region features to obtain adaptively aggregated features. The interaction matrix is derived through an attention mechanism and further integrated with global image features using a co-attention mechanism to produce multimodal representations. Finally, these fused features are fed into a classifier for news categorization. Experiments were conducted on two public datasets, Twitter and Weibo. Results show that the proposed model achieves accuracy rates of 91.8% and 88.7% on the two datasets, respectively, significantly outperforming traditional unimodal and existing multimodal models.
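The snippet below sketches the text-image affinity matrix and gated fusion described above, using random placeholder projections and feature sizes; it is a schematic of the mechanism, not MAGF's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(6)
T = rng.normal(size=(12, 768))   # word-level text features (placeholder BERT dims)
V = rng.normal(size=(9, 2048))   # image region features (placeholder CNN dims)

# Project both modalities into a shared low-dimensional space.
Wt = rng.normal(scale=0.02, size=(768, 128))
Wv = rng.normal(scale=0.02, size=(2048, 128))
Tp, Vp = T @ Wt, V @ Wv

# Text-image affinity matrix: each row weights the image regions for one word,
# so an aggregated visual message can be gathered per word.
affinity = softmax(Tp @ Vp.T / np.sqrt(128), axis=1)   # (12, 9)
visual_msg = affinity @ Vp                              # (12, 128)

# Gated fusion: a sigmoid gate decides, per dimension, how much of the visual
# message to mix into each word representation (gate weights are placeholders).
Wg = rng.normal(scale=0.02, size=(256, 128))
gate = 1.0 / (1.0 + np.exp(-np.concatenate([Tp, visual_msg], axis=1) @ Wg))
fused = gate * Tp + (1.0 - gate) * visual_msg           # (12, 128)
print(fused.shape)
```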
We redesign the parameterized quantum circuit in the quantum deep neural network, construct a three-layer structure as the hidden layer, and then use classical optimization algorithms to train the parameterized quantum circuit, thereby proposing a novel hybrid quantum deep neural network (HQDNN) for image classification. After bilinear interpolation reduces the original image to a suitable size, an improved novel enhanced quantum representation (INEQR) is used to encode it into quantum states as the input of the HQDNN. Multi-layer parameterized quantum circuits are used as the main structure to implement feature extraction and classification. The output results of the parameterized quantum circuits are converted into classical data through quantum measurements and then optimized on a classical computer. To verify the performance of the HQDNN, we conduct binary and three-class classification experiments on the MNIST (Modified National Institute of Standards and Technology) dataset. In the first binary classification, the accuracy for digits 0 and 4 exceeds 98%. We then compare the three-class classification performance with other algorithms; the results on two datasets show that the classification accuracy is higher than that of the quantum deep neural network and the general quantum convolutional neural network.
With the rapid advancement of 5G technology, the Internet of Things (IoT) has entered a new phase of applications and is rapidly becoming a significant force in promoting economic development. Due to the vast amounts of data created by numerous 5G IoT devices, the Ethereum platform has become a tool for the storage and sharing of IoT device data, thanks to its open and tamper-resistant characteristics. Thus, Ethereum account security is necessary for the Internet of Things to grow quickly and improve people's lives. By modeling Ethereum transaction records as a transaction network, account types are well identified by the Ethereum account classification system established based on Graph Neural Networks (GNNs). This work first investigates the Ethereum transaction network. Surprisingly, experimental metrics reveal that the Ethereum transaction network is neither optimal nor even satisfactory in terms of accurately representing transactions per account. This flaw may significantly impede the classification capability of GNNs, which is mostly governed by their attributes. This work proposes an Adaptive Multi-channel Bayesian Graph Attention Network (AMBGAT) for Ethereum account classification to address this difficulty. AMBGAT uses attention to enhance node features, estimate a graph topology that conforms to the ground truth, and efficiently extract node features pertinent to downstream tasks. An extensive experiment with actual Ethereum transaction data demonstrates that AMBGAT obtains competitive performance in the classification of Ethereum accounts while accurately estimating the graph topology.
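For context, the sketch below implements a plain single-head graph attention layer of the standard GAT form; the adaptive multi-channel Bayesian machinery of AMBGAT is not reproduced, and all weights and the toy transaction graph are placeholders.

```python
import numpy as np

def gat_layer(H, adj, W, a, alpha=0.2):
    """Single-head graph attention layer in the standard GAT form.

    H: (N, F) node features; adj: (N, N) adjacency with self-loops;
    W: (F, Fp) projection; a: (2*Fp,) attention vector. This is the vanilla
    formulation, not the adaptive multi-channel Bayesian variant of AMBGAT.
    """
    Z = H @ W                                            # (N, Fp)
    Fp = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) for every ordered node pair.
    e = (Z @ a[:Fp])[:, None] + (Z @ a[Fp:])[None, :]
    e = np.where(e > 0, e, alpha * e)                    # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                       # keep only neighbors
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)           # softmax over neighbors
    return att @ Z                                       # aggregated node features

rng = np.random.default_rng(7)
H = rng.normal(size=(6, 8))                              # 6 accounts, 8 raw features
adj = np.eye(6) + (rng.random((6, 6)) > 0.6)             # toy transaction graph
W = rng.normal(scale=0.1, size=(8, 4))
a = rng.normal(scale=0.1, size=8)
print(gat_layer(H, adj, W, a).shape)                     # -> (6, 4)
```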
Media convergence works by processing information from different modalities and applying it to different domains. It is difficult for the conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective. To address the issue, an inference method based on the Media Convergence and Rule-guided Joint Inference model (MCRJI) has been proposed. The authors not only converge multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction. First, a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis. Second, logic rules of different lengths are mined from the knowledge graph to learn new entity representations. Finally, knowledge graph inference is performed based on the entity representations that converge multi-media features. Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference, demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features.
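To illustrate the semantic-synthesis step, the snippet below applies plain multi-head self-attention over an entity's stacked media-feature vectors; the head count, dimensions, and projection weights are assumptions, not MCRJI's configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_self_attention(X, Wq, Wk, Wv, num_heads=2):
    """Plain multi-head self-attention over a stack of media feature vectors.

    X: (num_features, dim); each row is one media-specific feature of an entity
    (e.g., a text or image embedding projected to a common size)."""
    d = X.shape[1] // num_heads
    outputs = []
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)
        Q, K, V = X @ Wq[:, sl], X @ Wk[:, sl], X @ Wv[:, sl]
        att = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # attention among media features
        outputs.append(att @ V)
    return np.concatenate(outputs, axis=1)             # (num_features, dim)

rng = np.random.default_rng(8)
X = rng.normal(size=(3, 16))                           # e.g., text, image, structure features
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(16, 16)) for _ in range(3))
print(multihead_self_attention(X, Wq, Wk, Wv).shape)   # -> (3, 16)
```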