针对分布式电源和新型负荷容量累积造成负荷影响因素多元化和不确定性特性增强的问题,文中提出一种采用记忆神经网络和曲线形状修正的负荷预测方法。在负荷峰值预测中,采用最大信息系数计算负荷峰值与影响因素的非线性相关性,实现对输...针对分布式电源和新型负荷容量累积造成负荷影响因素多元化和不确定性特性增强的问题,文中提出一种采用记忆神经网络和曲线形状修正的负荷预测方法。在负荷峰值预测中,采用最大信息系数计算负荷峰值与影响因素的非线性相关性,实现对输入特征的筛选;综合考虑负荷峰值序列的长短期自相关性和输入特征与负荷峰值的不同程度相关性,结合Attention机制和双向长短时记忆(bidirectional long short-term memory,BiLSTM)神经网络建立负荷峰值预测模型。在负荷标幺曲线预测中,通过误差倒数法组合相似日和相邻日,建立负荷标幺曲线预测模型;针对预测偏差的非平稳特征,利用自适应噪声的完全集成经验模态分解和BiLSTM网络建立误差预测模型,对曲线形状进行修正。应用中国北方某城市的区域电网负荷数据为算例,验证了所提模型的有效性。展开更多
针对现有的数字化档案多标签分类方法存在分类标签之间缺少关联性的问题,提出一种用于档案多标签分类的深层神经网络模型ALBERT-Seq2Seq-Attention.该模型通过ALBERT(A Little BERT)预训练语言模型内部多层双向的Transfomer结构获取进...针对现有的数字化档案多标签分类方法存在分类标签之间缺少关联性的问题,提出一种用于档案多标签分类的深层神经网络模型ALBERT-Seq2Seq-Attention.该模型通过ALBERT(A Little BERT)预训练语言模型内部多层双向的Transfomer结构获取进行文本特征向量的提取,并获得上下文语义信息;将预训练提取的文本特征作为Seq2Seq-Attention(Sequence to Sequence-Attention)模型的输入序列,构建标签字典以获取多标签间的关联关系.将分类模型在3种数据集上分别进行对比实验,结果表明:模型分类的效果F1值均超过90%.该模型不仅能提高档案文本的多标签分类效果,也能关注标签之间的相关关系.展开更多
Recently,deep learning-based image inpainting methods have made great strides in reconstructing damaged regions.However,these methods often struggle to produce satisfactory results when dealing with missing images wit...Recently,deep learning-based image inpainting methods have made great strides in reconstructing damaged regions.However,these methods often struggle to produce satisfactory results when dealing with missing images with large holes,leading to distortions in the structure and blurring of textures.To address these problems,we combine the advantages of transformers and convolutions to propose an image inpainting method that incorporates edge priors and attention mechanisms.The proposed method aims to improve the results of inpainting large holes in images by enhancing the accuracy of structure restoration and the ability to recover texture details.This method divides the inpainting task into two phases:edge prediction and image inpainting.Specifically,in the edge prediction phase,a transformer architecture is designed to combine axial attention with standard self-attention.This design enhances the extraction capability of global structural features and location awareness.It also balances the complexity of self-attention operations,resulting in accurate prediction of the edge structure in the defective region.In the image inpainting phase,a multi-scale fusion attention module is introduced.This module makes full use of multi-level distant features and enhances local pixel continuity,thereby significantly improving the quality of image inpainting.To evaluate the performance of our method.comparative experiments are conducted on several datasets,including CelebA,Places2,and Facade.Quantitative experiments show that our method outperforms the other mainstream methods.Specifically,it improves Peak Signal-to-Noise Ratio(PSNR)and Structure Similarity Index Measure(SSIM)by 1.141~3.234 db and 0.083~0.235,respectively.Moreover,it reduces Learning Perceptual Image Patch Similarity(LPIPS)and Mean Absolute Error(MAE)by 0.0347~0.1753 and 0.0104~0.0402,respectively.Qualitative experiments reveal that our method excels at reconstructing images with complete structural information and clear texture details.Furthermore,our model exhibits impressive performance in terms of the number of parameters,memory cost,and testing time.展开更多
This study proposes a pose estimation-convolutional neural network-bidirectional gated recurrent unit(PSECNN-BiGRU)fusion model for human posture recognition to address low accuracy issues in abnormal posture recognit...This study proposes a pose estimation-convolutional neural network-bidirectional gated recurrent unit(PSECNN-BiGRU)fusion model for human posture recognition to address low accuracy issues in abnormal posture recognition due to the loss of some feature information and the deterioration of comprehensive performance in model detection in complex home environments.Firstly,the deep convolutional network is integrated with the Mediapipe framework to extract high-precision,multi-dimensional information from the key points of the human skeleton,thereby obtaining a human posture feature set.Thereafter,a double-layer BiGRU algorithm is utilized to extract multi-layer,bidirectional temporal features from the human posture feature set,and a CNN network with an exponential linear unit(ELU)activation function is adopted to perform deep convolution of the feature map to extract the spatial feature of the human posture.Furthermore,a squeeze and excitation networks(SENet)module is introduced to adaptively learn the importance weights of each channel,enhancing the network’s focus on important features.Finally,comparative experiments are performed on available datasets,including the public human activity recognition using smartphone dataset(UCIHAR),the public human activity recognition 70 plus dataset(HAR70PLUS),and the independently developed home abnormal behavior recognition dataset(HABRD)created by the authors’team.The results show that the average accuracy of the proposed PSE-CNN-BiGRU fusion model for human posture recognition is 99.56%,89.42%,and 98.90%,respectively,which are 5.24%,5.83%,and 3.19%higher than the average accuracy of the five models proposed in the comparative literature,including CNN,GRU,and others.The F1-score for abnormal posture recognition reaches 98.84%(heartache),97.18%(fall),99.6%(bellyache),and 98.27%(climbing)on the self-builtHABRDdataset,thus verifying the effectiveness,generalization,and robustness of the proposed model in enhancing human posture recognition.展开更多
Binaural rendering is of great interest to virtual reality and immersive media. Although humans can naturally use their two ears to perceive the spatial information contained in sounds, it is a challenging task for ma...Binaural rendering is of great interest to virtual reality and immersive media. Although humans can naturally use their two ears to perceive the spatial information contained in sounds, it is a challenging task for machines to achieve binaural rendering since the description of a sound field often requires multiple channels and even the metadata of the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transferred function(HRTF)datasets and optimization algorithms that act on HRTFs. In practical applications, there are two major drawbacks to existing methods. The first is a high personalization cost, as traditional methods achieve personalized needs by measuring HRTFs. The second is insufficient accuracy because the optimization goal of traditional methods is to retain another part of information that is more important in perception at the cost of discarding a part of the information. Therefore, it is desirable to develop novel techniques to achieve personalization and accuracy at a low cost. To this end, we focus on the binaural rendering of ambisonic and propose 1) channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function quantifying interaural level differences to deal with spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the content information and spatial information accuracy of the end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.展开更多
In recent years,frequent network attacks have highlighted the importance of efficient detection methods for ensuring cyberspace security.This paper presents a novel intrusion detection system consisting of a data prep...In recent years,frequent network attacks have highlighted the importance of efficient detection methods for ensuring cyberspace security.This paper presents a novel intrusion detection system consisting of a data prepro-cessing stage and a deep learning model for accurately identifying network attacks.We have proposed four deep neural network models,which are constructed using architectures such as Convolutional Neural Networks(CNN),Bi-directional Long Short-Term Memory(BiLSTM),Bidirectional Gate Recurrent Unit(BiGRU),and Attention mechanism.These models have been evaluated for their detection performance on the NSL-KDD dataset.To enhance the compatibility between the data and the models,we apply various preprocessing techniques and employ the particle swarm optimization algorithm to perform feature selection on the NSL-KDD dataset,resulting in an optimized feature subset.Moreover,we address class imbalance in the dataset using focal loss.Finally,we employ the BO-TPE algorithm to optimize the hyperparameters of the four models,maximizing their detection performance.The test results demonstrate that the proposed model is capable of extracting the spatiotemporal features of network traffic data effectively.In binary and multiclass experiments,it achieved accuracy rates of 0.999158 and 0.999091,respectively,surpassing other state-of-the-art methods.展开更多
Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention,but ignore their content and fail to est...Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention,but ignore their content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space(content-based), which clusters the sampled points with similar features into the same class and computes the self-attention within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves a remarkable performance on point cloud shape classification. Especially, our method exhibits 90.3% Top-1 accuracy on the hardest setting of ScanObjectN N. Source code of this paper is available at https://github.com/yahuiliu99/PointC onT.展开更多
The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have ...The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have the following two shortcomings:On the one hand,they mostly use global average pooling to generate context descriptors,without highlighting the guiding role of salient information on descriptor generation,resulting in insufficient ability of the final generated attention mask representation;On the other hand,the design of most attention modules is complicated,which greatly increases the computational cost of the model.To solve these problems,this paper proposes an attention module called self-supervised recalibration(SR)block,which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask.In particular,a special"Squeeze-Excitation"(SE)unit is designed in the SR block to further process the generated intermediate masks,both for nonlinearizations of the features and for constraint of the resulting computation by controlling the number of channels.Furthermore,we combine the most commonly used Res Net-50 to construct the instantiation model of the SR block,and verify its effectiveness on multiple Re-ID datasets,especially the mean Average Precision(m AP)on the Occluded-Duke dataset exceeds the state-of-the-art(SOTA)algorithm by 4.49%.展开更多
The attention is a scarce resource in decentralized autonomous organizations(DAOs),as their self-governance relies heavily on the attention-intensive decision-making process of“proposal and voting”.To prevent the ne...The attention is a scarce resource in decentralized autonomous organizations(DAOs),as their self-governance relies heavily on the attention-intensive decision-making process of“proposal and voting”.To prevent the negative effects of pro-posers’attention-capturing strategies that contribute to the“tragedy of the commons”and ensure an efficient distribution of attention among multiple proposals,it is necessary to establish a market-driven allocation scheme for DAOs’attention.First,the Harberger tax-based attention markets are designed to facilitate its allocation via continuous and automated trading,where the individualized Harberger tax rate(HTR)determined by the pro-posers’reputation is adopted.Then,the Stackelberg game model is formulated in these markets,casting attention to owners in the role of leaders and other competitive proposers as followers.Its equilibrium trading strategies are also discussed to unravel the intricate dynamics of attention pricing.Moreover,utilizing the single-round Stackelberg game as an illustrative example,the existence of Nash equilibrium trading strategies is demonstrated.Finally,the impact of individualized HTR on trading strategies is investigated,and results suggest that it has a negative correlation with leaders’self-accessed prices and ownership duration,but its effect on their revenues varies under different conditions.This study is expected to provide valuable insights into leveraging attention resources to improve DAOs’governance and decision-making process.展开更多
文摘针对分布式电源和新型负荷容量累积造成负荷影响因素多元化和不确定性特性增强的问题,文中提出一种采用记忆神经网络和曲线形状修正的负荷预测方法。在负荷峰值预测中,采用最大信息系数计算负荷峰值与影响因素的非线性相关性,实现对输入特征的筛选;综合考虑负荷峰值序列的长短期自相关性和输入特征与负荷峰值的不同程度相关性,结合Attention机制和双向长短时记忆(bidirectional long short-term memory,BiLSTM)神经网络建立负荷峰值预测模型。在负荷标幺曲线预测中,通过误差倒数法组合相似日和相邻日,建立负荷标幺曲线预测模型;针对预测偏差的非平稳特征,利用自适应噪声的完全集成经验模态分解和BiLSTM网络建立误差预测模型,对曲线形状进行修正。应用中国北方某城市的区域电网负荷数据为算例,验证了所提模型的有效性。
文摘针对现有的数字化档案多标签分类方法存在分类标签之间缺少关联性的问题,提出一种用于档案多标签分类的深层神经网络模型ALBERT-Seq2Seq-Attention.该模型通过ALBERT(A Little BERT)预训练语言模型内部多层双向的Transfomer结构获取进行文本特征向量的提取,并获得上下文语义信息;将预训练提取的文本特征作为Seq2Seq-Attention(Sequence to Sequence-Attention)模型的输入序列,构建标签字典以获取多标签间的关联关系.将分类模型在3种数据集上分别进行对比实验,结果表明:模型分类的效果F1值均超过90%.该模型不仅能提高档案文本的多标签分类效果,也能关注标签之间的相关关系.
基金supported in part by the National Natural Science Foundation of China under Grant 62062061/in part by the Major Project Cultivation Fund of Xizang Minzu University under Grant 324112300447.
文摘Recently,deep learning-based image inpainting methods have made great strides in reconstructing damaged regions.However,these methods often struggle to produce satisfactory results when dealing with missing images with large holes,leading to distortions in the structure and blurring of textures.To address these problems,we combine the advantages of transformers and convolutions to propose an image inpainting method that incorporates edge priors and attention mechanisms.The proposed method aims to improve the results of inpainting large holes in images by enhancing the accuracy of structure restoration and the ability to recover texture details.This method divides the inpainting task into two phases:edge prediction and image inpainting.Specifically,in the edge prediction phase,a transformer architecture is designed to combine axial attention with standard self-attention.This design enhances the extraction capability of global structural features and location awareness.It also balances the complexity of self-attention operations,resulting in accurate prediction of the edge structure in the defective region.In the image inpainting phase,a multi-scale fusion attention module is introduced.This module makes full use of multi-level distant features and enhances local pixel continuity,thereby significantly improving the quality of image inpainting.To evaluate the performance of our method.comparative experiments are conducted on several datasets,including CelebA,Places2,and Facade.Quantitative experiments show that our method outperforms the other mainstream methods.Specifically,it improves Peak Signal-to-Noise Ratio(PSNR)and Structure Similarity Index Measure(SSIM)by 1.141~3.234 db and 0.083~0.235,respectively.Moreover,it reduces Learning Perceptual Image Patch Similarity(LPIPS)and Mean Absolute Error(MAE)by 0.0347~0.1753 and 0.0104~0.0402,respectively.Qualitative experiments reveal that our method excels at reconstructing images with complete structural information and clear texture details.Furthermore,our model exhibits impressive performance in terms of the number of parameters,memory cost,and testing time.
基金funded by the Henan Provincial Science and Technology Research Project(222102210086)the Starry Sky Creative Space Innovation Space Innovation Incubation Project of Zhengzhou University of Light Industry(2023ZCKJ211).
文摘This study proposes a pose estimation-convolutional neural network-bidirectional gated recurrent unit(PSECNN-BiGRU)fusion model for human posture recognition to address low accuracy issues in abnormal posture recognition due to the loss of some feature information and the deterioration of comprehensive performance in model detection in complex home environments.Firstly,the deep convolutional network is integrated with the Mediapipe framework to extract high-precision,multi-dimensional information from the key points of the human skeleton,thereby obtaining a human posture feature set.Thereafter,a double-layer BiGRU algorithm is utilized to extract multi-layer,bidirectional temporal features from the human posture feature set,and a CNN network with an exponential linear unit(ELU)activation function is adopted to perform deep convolution of the feature map to extract the spatial feature of the human posture.Furthermore,a squeeze and excitation networks(SENet)module is introduced to adaptively learn the importance weights of each channel,enhancing the network’s focus on important features.Finally,comparative experiments are performed on available datasets,including the public human activity recognition using smartphone dataset(UCIHAR),the public human activity recognition 70 plus dataset(HAR70PLUS),and the independently developed home abnormal behavior recognition dataset(HABRD)created by the authors’team.The results show that the average accuracy of the proposed PSE-CNN-BiGRU fusion model for human posture recognition is 99.56%,89.42%,and 98.90%,respectively,which are 5.24%,5.83%,and 3.19%higher than the average accuracy of the five models proposed in the comparative literature,including CNN,GRU,and others.The F1-score for abnormal posture recognition reaches 98.84%(heartache),97.18%(fall),99.6%(bellyache),and 98.27%(climbing)on the self-builtHABRDdataset,thus verifying the effectiveness,generalization,and robustness of the proposed model in enhancing human posture recognition.
基金supported in part by the National Natural Science Foundation of China (62176059, 62101136)。
文摘Binaural rendering is of great interest to virtual reality and immersive media. Although humans can naturally use their two ears to perceive the spatial information contained in sounds, it is a challenging task for machines to achieve binaural rendering since the description of a sound field often requires multiple channels and even the metadata of the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transferred function(HRTF)datasets and optimization algorithms that act on HRTFs. In practical applications, there are two major drawbacks to existing methods. The first is a high personalization cost, as traditional methods achieve personalized needs by measuring HRTFs. The second is insufficient accuracy because the optimization goal of traditional methods is to retain another part of information that is more important in perception at the cost of discarding a part of the information. Therefore, it is desirable to develop novel techniques to achieve personalization and accuracy at a low cost. To this end, we focus on the binaural rendering of ambisonic and propose 1) channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function quantifying interaural level differences to deal with spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the content information and spatial information accuracy of the end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.
文摘In recent years,frequent network attacks have highlighted the importance of efficient detection methods for ensuring cyberspace security.This paper presents a novel intrusion detection system consisting of a data prepro-cessing stage and a deep learning model for accurately identifying network attacks.We have proposed four deep neural network models,which are constructed using architectures such as Convolutional Neural Networks(CNN),Bi-directional Long Short-Term Memory(BiLSTM),Bidirectional Gate Recurrent Unit(BiGRU),and Attention mechanism.These models have been evaluated for their detection performance on the NSL-KDD dataset.To enhance the compatibility between the data and the models,we apply various preprocessing techniques and employ the particle swarm optimization algorithm to perform feature selection on the NSL-KDD dataset,resulting in an optimized feature subset.Moreover,we address class imbalance in the dataset using focal loss.Finally,we employ the BO-TPE algorithm to optimize the hyperparameters of the four models,maximizing their detection performance.The test results demonstrate that the proposed model is capable of extracting the spatiotemporal features of network traffic data effectively.In binary and multiclass experiments,it achieved accuracy rates of 0.999158 and 0.999091,respectively,surpassing other state-of-the-art methods.
基金supported in part by the Nationa Natural Science Foundation of China (61876011)the National Key Research and Development Program of China (2022YFB4703700)+1 种基金the Key Research and Development Program 2020 of Guangzhou (202007050002)the Key-Area Research and Development Program of Guangdong Province (2020B090921003)。
文摘Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention,but ignore their content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space(content-based), which clusters the sampled points with similar features into the same class and computes the self-attention within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves a remarkable performance on point cloud shape classification. Especially, our method exhibits 90.3% Top-1 accuracy on the hardest setting of ScanObjectN N. Source code of this paper is available at https://github.com/yahuiliu99/PointC onT.
基金supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region(Grant No.2022D01B186 and No.2022D01B05)。
文摘The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have the following two shortcomings:On the one hand,they mostly use global average pooling to generate context descriptors,without highlighting the guiding role of salient information on descriptor generation,resulting in insufficient ability of the final generated attention mask representation;On the other hand,the design of most attention modules is complicated,which greatly increases the computational cost of the model.To solve these problems,this paper proposes an attention module called self-supervised recalibration(SR)block,which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask.In particular,a special"Squeeze-Excitation"(SE)unit is designed in the SR block to further process the generated intermediate masks,both for nonlinearizations of the features and for constraint of the resulting computation by controlling the number of channels.Furthermore,we combine the most commonly used Res Net-50 to construct the instantiation model of the SR block,and verify its effectiveness on multiple Re-ID datasets,especially the mean Average Precision(m AP)on the Occluded-Duke dataset exceeds the state-of-the-art(SOTA)algorithm by 4.49%.
基金supported by the National Natural Science Foundation of China(62103411)the Science and Technology Development Fund of Macao SAR(0093/2023/RIA2,0050/2020/A1)。
文摘The attention is a scarce resource in decentralized autonomous organizations(DAOs),as their self-governance relies heavily on the attention-intensive decision-making process of“proposal and voting”.To prevent the negative effects of pro-posers’attention-capturing strategies that contribute to the“tragedy of the commons”and ensure an efficient distribution of attention among multiple proposals,it is necessary to establish a market-driven allocation scheme for DAOs’attention.First,the Harberger tax-based attention markets are designed to facilitate its allocation via continuous and automated trading,where the individualized Harberger tax rate(HTR)determined by the pro-posers’reputation is adopted.Then,the Stackelberg game model is formulated in these markets,casting attention to owners in the role of leaders and other competitive proposers as followers.Its equilibrium trading strategies are also discussed to unravel the intricate dynamics of attention pricing.Moreover,utilizing the single-round Stackelberg game as an illustrative example,the existence of Nash equilibrium trading strategies is demonstrated.Finally,the impact of individualized HTR on trading strategies is investigated,and results suggest that it has a negative correlation with leaders’self-accessed prices and ownership duration,but its effect on their revenues varies under different conditions.This study is expected to provide valuable insights into leveraging attention resources to improve DAOs’governance and decision-making process.