Accurate traffic prediction is crucial for an intelligent traffic system (ITS). However, the excessive non-linearity and complexity of the spatial-temporal correlation in traffic flow severely limit the prediction acc...Accurate traffic prediction is crucial for an intelligent traffic system (ITS). However, the excessive non-linearity and complexity of the spatial-temporal correlation in traffic flow severely limit the prediction accuracy of most existing models, which simply stack temporal and spatial modules and fail to capture spatial-temporal features effectively. To improve the prediction accuracy, a multi-head attention spatial-temporal graph neural network (MSTNet) is proposed in this paper. First, the traffic data is decomposed into unique time spans that conform to positive rules, and valuable traffic node attributes are mined through an adaptive graph structure. Second, time and spatial features are captured using a multi-head attention spatial-temporal module. Finally, a multi-step prediction module is used to achieve future traffic condition prediction. Numerical experiments were conducted on an open-source dataset, and the results demonstrate that MSTNet performs well in spatial-temporal feature extraction and achieves more positive forecasting results than the baseline methods.展开更多
Various deep learning models have been proposed for the accurate assisted diagnosis of early-stage Alzheimer’s disease(AD).Most studies predominantly employ Convolutional Neural Networks(CNNs),which focus solely on l...Various deep learning models have been proposed for the accurate assisted diagnosis of early-stage Alzheimer’s disease(AD).Most studies predominantly employ Convolutional Neural Networks(CNNs),which focus solely on local features,thus encountering difficulties in handling global features.In contrast to natural images,Structural Magnetic Resonance Imaging(sMRI)images exhibit a higher number of channel dimensions.However,during the Position Embedding stage ofMulti Head Self Attention(MHSA),the coded information related to the channel dimension is disregarded.To tackle these issues,we propose theRepBoTNet-CESA network,an advanced AD-aided diagnostic model that is capable of learning local and global features simultaneously.It combines the advantages of CNN networks in capturing local information and Transformer networks in integrating global information,reducing computational costs while achieving excellent classification performance.Moreover,it uses the Cubic Embedding Self Attention(CESA)proposed in this paper to incorporate the channel code information,enhancing the classification performance within the Transformer structure.Finally,the RepBoTNet-CESA performs well in various AD-aided diagnosis tasks,with an accuracy of 96.58%,precision of 97.26%,and recall of 96.23%in the AD/NC task;an accuracy of 92.75%,precision of 92.84%,and recall of 93.18%in the EMCI/NC task;and an accuracy of 80.97%,precision of 83.86%,and recall of 80.91%in the AD/EMCI/LMCI/NC task.This demonstrates that RepBoTNet-CESA delivers outstanding outcomes in various AD-aided diagnostic tasks.Furthermore,our study has shown that MHSA exhibits superior performance compared to conventional attention mechanisms in enhancing ResNet performance.Besides,the Deeper RepBoTNet-CESA network fails to make further progress in AD-aided diagnostic tasks.展开更多
精准的分布式光伏短期发电功率预测有助于电力系统运行与功率就地平衡。该文提出一种基于BIRCH(balanced iterative reducing and clustering using hierarchies)相似日聚类的L-Transformer(LSTM-Transformer)模型进行短期光伏功率预测...精准的分布式光伏短期发电功率预测有助于电力系统运行与功率就地平衡。该文提出一种基于BIRCH(balanced iterative reducing and clustering using hierarchies)相似日聚类的L-Transformer(LSTM-Transformer)模型进行短期光伏功率预测。首先使用BIRCH无监督聚类算法对历史数据聚类得到3种典型天气,根据聚类结果划分测试集对模型进行训练。为提高不同天气类型下的预测精度,采用双层架构的L-Transformer模型,首层通过长短期记忆网络(long short term memory,LSTM)的门控单元机制捕捉时间序列中的长期依赖关系;次层结合Transformer模型的自注意力机制聚焦于当前任务更关键的特征量,通过多注意力头与光伏数据特征量相结合生成向量,注意力头并行计算,从而高效、精确地预测短期光伏功率。实测数据验证结果表明L-Transformer模型对于不同天气类型功率预测泛化性优异、精确度高,气象数据波动大时鲁棒性强。展开更多
为了提高语音分离的效果,除了利用混合的语音信号,还可以借助视觉信号作为辅助信息。这种融合了视觉与音频信号的多模态建模方式,已被证实可以有效地提高语音分离的性能,为语音分离任务提供了新的可能性。为了更好地捕捉视觉与音频特征...为了提高语音分离的效果,除了利用混合的语音信号,还可以借助视觉信号作为辅助信息。这种融合了视觉与音频信号的多模态建模方式,已被证实可以有效地提高语音分离的性能,为语音分离任务提供了新的可能性。为了更好地捕捉视觉与音频特征中的长期依赖关系,并强化网络对输入上下文信息的理解,本文提出了一种基于一维扩张卷积与Transformer的时域视听融合语音分离模型。将基于频域的传统视听融合语音分离方法应用到时域中,避免了时频变换带来的信息损失和相位重构问题。所提网络架构包含四个模块:一个视觉特征提取网络,用于从视频帧中提取唇部嵌入特征;一个音频编码器,用于将混合语音转换为特征表示;一个多模态分离网络,主要由音频子网络、视频子网络,以及Transformer网络组成,用于利用视觉和音频特征进行语音分离;以及一个音频解码器,用于将分离后的特征还原为干净的语音。本文使用LRS2数据集生成的包含两个说话者混合语音的数据集。实验结果表明,所提出的网络在尺度不变信噪比改进(Scale-Invariant Signal-to-Noise Ratio Improvement,SISNRi)与信号失真比改进(Signal-to-Distortion Ratio Improvement,SDRi)这两种指标上分别达到14.0 dB与14.3 dB,较纯音频分离模型和普适的视听融合分离模型有明显的性能提升。展开更多
点击率(CTR)预测通过预测用户对广告或商品的点击概率,实现数字广告精准推荐。针对现有CTR模型存在原始嵌入向量未精化、特征交互方式偏简单的问题,本文提出自注意力深度域嵌入因子分解机(self-attention deep field-embedded factoriza...点击率(CTR)预测通过预测用户对广告或商品的点击概率,实现数字广告精准推荐。针对现有CTR模型存在原始嵌入向量未精化、特征交互方式偏简单的问题,本文提出自注意力深度域嵌入因子分解机(self-attention deep field-embedded factorization machine,Self-AtDFEFM)模型。首先,通过多头自注意力对原始嵌入向量加权,精化出关键低层特征;其次,构建深度域嵌入因子分解机(FEFM)模块,设计域对对称矩阵以提升不同特征域之间的交互强度,为高阶特征交互优选出低阶特征组合;再次,基于低阶特征组合构建深度神经网络(DNN),完成隐式高阶特征交互;然后,围绕精化后的嵌入向量,联合多头自注意力与残差机制堆叠多个显式高阶特征交互层,通过自注意力捕获同一特征在不同子空间上的互补信息,完成显示高阶特征交互;最后,联合显式与隐式高阶特征交互实现点击率预测。在Criteo和Avazu两大公开数据集上,将Self-AtDFEFM模型与主流基线模型在AUC和LogLoss指标上进行对比实验;为Self-AtDFEFM模型调制显式高阶特征交互层层数、注意力头数量、嵌入层维度及隐式高阶特征交互层层数等参数;对Self-AtDFEFM模型进行消融实验。实验结果表明:在两大数据集上,Self-AtDFEFM模型的AUC、LogLoss均优于主流基线模型;Self-AtDFEFM模型的全部参数已调为最佳;各模块形成合力以促使Self-AtDFEFM模型性能达到最优,其中显示高阶特征交互层的作用最大。Self-AtDFEFM模型各模块即插即用,易于构建和部署,且在性能与复杂度之间取得平衡,具备较高实用性。展开更多
目前的脑电(EEG)情感识别模型忽略了不同时段情感状态的差异性,未能强化关键的情感信息。针对上述问题,提出一种多上下文向量优化的卷积递归神经网络(CR-MCV)。首先构造脑电信号的特征矩阵序列,通过卷积神经网络(CNN)学习多通道脑电的...目前的脑电(EEG)情感识别模型忽略了不同时段情感状态的差异性,未能强化关键的情感信息。针对上述问题,提出一种多上下文向量优化的卷积递归神经网络(CR-MCV)。首先构造脑电信号的特征矩阵序列,通过卷积神经网络(CNN)学习多通道脑电的空间特征;然后利用基于多头注意力的递归神经网络生成多上下文向量进行高层抽象特征提取;最后利用全连接层进行情感分类。在DEAP(Database for Emotion Analysis using Physiological signals)数据集上进行实验,CR-MCV在唤醒和效价维度上分类准确率分别为88.09%和89.30%。实验结果表明,CR-MCV在利用电极空间位置信息和不同时段情感状态显著性特征基础上,能够自适应地分配特征的注意力并强化情感状态显著性信息。展开更多
文摘Accurate traffic prediction is crucial for an intelligent traffic system (ITS). However, the excessive non-linearity and complexity of the spatial-temporal correlation in traffic flow severely limit the prediction accuracy of most existing models, which simply stack temporal and spatial modules and fail to capture spatial-temporal features effectively. To improve the prediction accuracy, a multi-head attention spatial-temporal graph neural network (MSTNet) is proposed in this paper. First, the traffic data is decomposed into unique time spans that conform to positive rules, and valuable traffic node attributes are mined through an adaptive graph structure. Second, time and spatial features are captured using a multi-head attention spatial-temporal module. Finally, a multi-step prediction module is used to achieve future traffic condition prediction. Numerical experiments were conducted on an open-source dataset, and the results demonstrate that MSTNet performs well in spatial-temporal feature extraction and achieves more positive forecasting results than the baseline methods.
基金the Key Project of Zhejiang Provincial Natural Science Foundation under Grants LD21F020001,Z20F020022the National Natural Science Foundation of China under Grants 62072340,62076185the Major Project of Wenzhou Natural Science Foundation under Grants 2021HZSY0071,ZS2022001.
文摘Various deep learning models have been proposed for the accurate assisted diagnosis of early-stage Alzheimer’s disease(AD).Most studies predominantly employ Convolutional Neural Networks(CNNs),which focus solely on local features,thus encountering difficulties in handling global features.In contrast to natural images,Structural Magnetic Resonance Imaging(sMRI)images exhibit a higher number of channel dimensions.However,during the Position Embedding stage ofMulti Head Self Attention(MHSA),the coded information related to the channel dimension is disregarded.To tackle these issues,we propose theRepBoTNet-CESA network,an advanced AD-aided diagnostic model that is capable of learning local and global features simultaneously.It combines the advantages of CNN networks in capturing local information and Transformer networks in integrating global information,reducing computational costs while achieving excellent classification performance.Moreover,it uses the Cubic Embedding Self Attention(CESA)proposed in this paper to incorporate the channel code information,enhancing the classification performance within the Transformer structure.Finally,the RepBoTNet-CESA performs well in various AD-aided diagnosis tasks,with an accuracy of 96.58%,precision of 97.26%,and recall of 96.23%in the AD/NC task;an accuracy of 92.75%,precision of 92.84%,and recall of 93.18%in the EMCI/NC task;and an accuracy of 80.97%,precision of 83.86%,and recall of 80.91%in the AD/EMCI/LMCI/NC task.This demonstrates that RepBoTNet-CESA delivers outstanding outcomes in various AD-aided diagnostic tasks.Furthermore,our study has shown that MHSA exhibits superior performance compared to conventional attention mechanisms in enhancing ResNet performance.Besides,the Deeper RepBoTNet-CESA network fails to make further progress in AD-aided diagnostic tasks.
文摘精准的分布式光伏短期发电功率预测有助于电力系统运行与功率就地平衡。该文提出一种基于BIRCH(balanced iterative reducing and clustering using hierarchies)相似日聚类的L-Transformer(LSTM-Transformer)模型进行短期光伏功率预测。首先使用BIRCH无监督聚类算法对历史数据聚类得到3种典型天气,根据聚类结果划分测试集对模型进行训练。为提高不同天气类型下的预测精度,采用双层架构的L-Transformer模型,首层通过长短期记忆网络(long short term memory,LSTM)的门控单元机制捕捉时间序列中的长期依赖关系;次层结合Transformer模型的自注意力机制聚焦于当前任务更关键的特征量,通过多注意力头与光伏数据特征量相结合生成向量,注意力头并行计算,从而高效、精确地预测短期光伏功率。实测数据验证结果表明L-Transformer模型对于不同天气类型功率预测泛化性优异、精确度高,气象数据波动大时鲁棒性强。
文摘为了提高语音分离的效果,除了利用混合的语音信号,还可以借助视觉信号作为辅助信息。这种融合了视觉与音频信号的多模态建模方式,已被证实可以有效地提高语音分离的性能,为语音分离任务提供了新的可能性。为了更好地捕捉视觉与音频特征中的长期依赖关系,并强化网络对输入上下文信息的理解,本文提出了一种基于一维扩张卷积与Transformer的时域视听融合语音分离模型。将基于频域的传统视听融合语音分离方法应用到时域中,避免了时频变换带来的信息损失和相位重构问题。所提网络架构包含四个模块:一个视觉特征提取网络,用于从视频帧中提取唇部嵌入特征;一个音频编码器,用于将混合语音转换为特征表示;一个多模态分离网络,主要由音频子网络、视频子网络,以及Transformer网络组成,用于利用视觉和音频特征进行语音分离;以及一个音频解码器,用于将分离后的特征还原为干净的语音。本文使用LRS2数据集生成的包含两个说话者混合语音的数据集。实验结果表明,所提出的网络在尺度不变信噪比改进(Scale-Invariant Signal-to-Noise Ratio Improvement,SISNRi)与信号失真比改进(Signal-to-Distortion Ratio Improvement,SDRi)这两种指标上分别达到14.0 dB与14.3 dB,较纯音频分离模型和普适的视听融合分离模型有明显的性能提升。
文摘点击率(CTR)预测通过预测用户对广告或商品的点击概率,实现数字广告精准推荐。针对现有CTR模型存在原始嵌入向量未精化、特征交互方式偏简单的问题,本文提出自注意力深度域嵌入因子分解机(self-attention deep field-embedded factorization machine,Self-AtDFEFM)模型。首先,通过多头自注意力对原始嵌入向量加权,精化出关键低层特征;其次,构建深度域嵌入因子分解机(FEFM)模块,设计域对对称矩阵以提升不同特征域之间的交互强度,为高阶特征交互优选出低阶特征组合;再次,基于低阶特征组合构建深度神经网络(DNN),完成隐式高阶特征交互;然后,围绕精化后的嵌入向量,联合多头自注意力与残差机制堆叠多个显式高阶特征交互层,通过自注意力捕获同一特征在不同子空间上的互补信息,完成显示高阶特征交互;最后,联合显式与隐式高阶特征交互实现点击率预测。在Criteo和Avazu两大公开数据集上,将Self-AtDFEFM模型与主流基线模型在AUC和LogLoss指标上进行对比实验;为Self-AtDFEFM模型调制显式高阶特征交互层层数、注意力头数量、嵌入层维度及隐式高阶特征交互层层数等参数;对Self-AtDFEFM模型进行消融实验。实验结果表明:在两大数据集上,Self-AtDFEFM模型的AUC、LogLoss均优于主流基线模型;Self-AtDFEFM模型的全部参数已调为最佳;各模块形成合力以促使Self-AtDFEFM模型性能达到最优,其中显示高阶特征交互层的作用最大。Self-AtDFEFM模型各模块即插即用,易于构建和部署,且在性能与复杂度之间取得平衡,具备较高实用性。
文摘目前的脑电(EEG)情感识别模型忽略了不同时段情感状态的差异性,未能强化关键的情感信息。针对上述问题,提出一种多上下文向量优化的卷积递归神经网络(CR-MCV)。首先构造脑电信号的特征矩阵序列,通过卷积神经网络(CNN)学习多通道脑电的空间特征;然后利用基于多头注意力的递归神经网络生成多上下文向量进行高层抽象特征提取;最后利用全连接层进行情感分类。在DEAP(Database for Emotion Analysis using Physiological signals)数据集上进行实验,CR-MCV在唤醒和效价维度上分类准确率分别为88.09%和89.30%。实验结果表明,CR-MCV在利用电极空间位置信息和不同时段情感状态显著性特征基础上,能够自适应地分配特征的注意力并强化情感状态显著性信息。