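Abstract: This paper proposes a new method to address the low accuracy of rolling bearing fault diagnosis. First, vibration signals are converted into time-frequency images by the short-time Fourier transform (STFT) to build a dataset. Second, the AlexNet network is improved with batch normalization and the GELU activation function, then trained on time-frequency images from different operating conditions to diagnose faults. In experiments on the Case Western Reserve University (CWRU) bearing dataset, the improved AlexNet achieves lower training loss, faster convergence, and higher fault recognition accuracy than the original. Finally, on sample data collected from an experimental platform that simulates rolling bearing damage faults, the improved AlexNet reaches a fault recognition accuracy of 97.2%, clearly outperforming the original AlexNet, SVM, and CNN baselines, which verifies the effectiveness of the improvement.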
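The first step above, turning a vibration signal into a time-frequency representation, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the paper's pipeline: the frame length, hop size, Hann window, sampling rate, and the synthetic 128 Hz tone are all assumptions chosen for the example, and `stft_magnitude` is a hypothetical helper name.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at bin k (slow but explicit).
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for a vibration signal (assumed values, not CWRU data).
fs = 1024          # sampling rate in Hz
tone_hz = 128      # frequency of the test tone in Hz
signal = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(512)]

spec = stft_magnitude(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64   # bin spacing is fs / frame_len
```

Each row of `spec` is one column of the time-frequency image; in practice a library routine such as an FFT-based STFT would replace the explicit DFT loop.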
Funding: Supported by the Key Project of NSFC (Grant No. U1908214); the Special Project of Central Government Guiding Local Science and Technology Development (Grant No. 2021JH6/10500140); the Program for Innovative Research Team in University of Liaoning Province (LT2020015); the Support Plan for Key Field Innovation Team of Dalian (2021RT06); the Support Plan for Leading Innovation Team of Dalian University (XLJ202010); the Science and Technology Innovation Fund of Dalian (Grant No. 2020JJ25CY001); in part by the National Natural Science Foundation of China under Grant 61906032; and the Fundamental Research Funds for the Central Universities under Grant DUT21TD107.
Abstract: With the advancement of image sensing technology, estimating 3D human pose from monocular video has become a hot research topic in computer vision. 3D human pose estimation is an essential prerequisite for subsequent action analysis and understanding. It enables a wide spectrum of potential applications in areas such as intelligent transportation, human-computer interaction, and medical rehabilitation. Currently, some methods for 3D human pose estimation in monocular video employ a temporal convolutional network (TCN) to extract inter-frame feature relationships, but most of them extract these relationships insufficiently. In this paper, we decompose 3D joint location regression into bone direction and bone length. We propose TCG, a temporal convolutional network incorporating Gaussian error linear units (GELU), to solve for bone direction. It captures more inter-frame features and makes the most of the feature relationships between data. Furthermore, we adopt kinematic structural information to solve for bone length, enhancing the use of intra-frame joint features. Finally, we design a loss function for joint training of the bone direction estimation network and the bone length estimation network. The proposed method is extensively evaluated on the public benchmark dataset Human3.6M. Both quantitative and qualitative experimental results show that the proposed method achieves more accurate 3D human pose estimation.
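The decomposition into bone direction and bone length implies that joint positions can be recovered by walking the kinematic tree from the root. The sketch below illustrates only that composition step under stated assumptions: the 4-joint chain in `PARENTS`, the helper names, and the sample values are hypothetical and do not come from the paper or from Human3.6M.

```python
import math

# Hypothetical 4-joint chain: parent index per joint, -1 marks the root.
# Parents must appear before their children in this list.
PARENTS = [-1, 0, 1, 2]


def normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def compose_joints(root, directions, lengths, parents=PARENTS):
    """joint[i] = joint[parent[i]] + lengths[i] * unit(directions[i])."""
    joints = [None] * len(parents)
    for i, p in enumerate(parents):
        if p < 0:
            joints[i] = tuple(root)   # the root has no incoming bone
            continue
        d = normalize(directions[i])
        joints[i] = tuple(joints[p][k] + lengths[i] * d[k] for k in range(3))
    return joints


# Illustrative values: directions need not be unit length on input.
directions = [None, (0.0, 0.0, 2.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
lengths = [0.0, 0.5, 0.3, 0.2]
joints = compose_joints((0.0, 0.0, 0.0), directions, lengths)
```

Because each bone contributes a unit direction scaled by a separate length, the two quantities can be predicted by different networks and combined afterwards, which is the structure the abstract describes.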
Funding: This work was supported by the Key Research and Development Plan of China (No. 2017YFC1703306), the Key Project of the Education Department in Hunan Province (No. 18A227), and the Key Project of the Traditional Chinese Medicine Scientific Research Plan in Hunan Province (No. 2020002).
Abstract: The collection and extraction of tongue images has always been an important part of intelligent tongue diagnosis. At present, tongue images generally need to be collected in a sealed environment with stable lighting, which hinders the wider adoption of tongue imaging and intelligent tongue diagnosis. In response to this problem, a new algorithm named GCYTD (GELU-CA-YOLO Tongue Detection) is proposed to quickly detect and locate the tongue in a natural environment, which can greatly relax the restrictions on the image collection environment. The algorithm detects the tongue with the YOLO (You Only Look Once) V4-tiny network model. First, the GELU (Gaussian Error Linear Units) activation function is integrated into the model to improve the training speed and reduce the number of model parameters; then, the CA (Coordinate Attention) mechanism is integrated into the model to enhance the detection precision and improve the failure tolerance of the model. Experimental results show that, compared with other classical algorithms, the GCYTD algorithm performs better on tongue images of all types in terms of training speed, tongue detection speed, and detection precision. The lighter model also helps in deploying tongue detection on small mobile terminals.
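For reference, the GELU activation named above is commonly defined as x·Φ(x), where Φ is the standard normal cumulative distribution function. The sketch below gives that definition and the widely used tanh approximation in plain Python. It illustrates the activation only; how GCYTD wires it into YOLO V4-tiny is not stated in the abstract, and the function names here are assumptions.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi written via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Common tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest gap between the two forms on a grid over [-5, 5].
grid = [-5.0 + 0.01 * i for i in range(1001)]
max_gap = max(abs(gelu(x) - gelu_tanh(x)) for x in grid)
```

Unlike ReLU, GELU is smooth and slightly negative for small negative inputs, which is why it is often swapped in as a drop-in replacement for other activations.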
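Abstract: To address the low image quality, blurred texture details, and unstable network training of current super-resolution reconstruction techniques, a single-image super-resolution reconstruction algorithm based on an improved generative adversarial network (GAN) is proposed. Built on a GAN, the algorithm optimizes the residual blocks in the generator with multi-scale convolutional layers and the GELU (Gaussian Error Linear Units) activation function to improve generalization; optimizes the loss function with the Wasserstein distance and the Huber loss to stabilize training; and reduces the number of batch normalization layers in the discriminator to streamline the network structure. Experimental results show that on datasets such as Set5, images reconstructed by the algorithm outperform those of other classical algorithms in both objective evaluation metrics and subjective visual quality.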
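The two loss ingredients named above have standard textbook forms, sketched here in plain Python. The abstract does not say how the paper weights or combines them, so the functions are shown separately; `delta`, the function names, and the sample scores are assumptions for illustration.

```python
def huber(residual, delta=1.0):
    """Huber loss: quadratic near zero, linear beyond |residual| = delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def mean(values):
    return sum(values) / len(values)


def critic_loss(real_scores, fake_scores):
    """Standard Wasserstein critic objective to minimize:
    E[D(fake)] - E[D(real)]."""
    return mean(fake_scores) - mean(real_scores)


def generator_adversarial_loss(fake_scores):
    """Standard Wasserstein generator objective to minimize: -E[D(fake)]."""
    return -mean(fake_scores)


# Illustrative critic outputs, not experimental data.
loss_d = critic_loss(real_scores=[1.0, 3.0], fake_scores=[0.0, 1.0])
loss_g = generator_adversarial_loss([0.0, 1.0])
```

The Huber loss penalizes large pixel errors linearly rather than quadratically, which makes it less sensitive to outliers than a squared-error loss, while the Wasserstein objective replaces the saturating log-loss of the original GAN.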
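Abstract: Transformer-based large language models (LLMs) and vision Transformers (ViTs) achieve state-of-the-art performance on natural language processing and machine vision tasks, respectively. However, GELU (Gaussian Error Linear Unit) and Swish, the activation functions commonly used in ViTs and LLMs, suffer from insufficient accuracy and low computational efficiency in fully quantized Transformer inference, which limits their deployment on resource-constrained edge devices. This paper proposes a high-accuracy approximation method for activation functions based on segmented quadratic polynomial fitting (SQPF), together with its quantized inference procedure, to enable high-performance deployment of nonlinear activation functions on edge devices. SQPF uses the least-squares method and particle swarm optimization to solve the fitting problem for nonlinear activation functions, yielding the optimal quadratic polynomial coefficients and interval partition. The resulting quadratic fit is evaluated by pure-integer inference using dynamic-precision fixed-point symmetric quantization, so inference involves only shift and multiply-accumulate operations. Using SQPF, the paper computes the quadratic polynomial fits Si-GELU and Si-Swish for GELU and Swish and evaluates their quantized inference accuracy. Experimental results show that on the standard ImageNet dataset, Si-GELU causes a classification accuracy degradation of only 0.09% for ViT models (ViT, DeiT, and Swin), which is 27.3% of that of comparable methods; on MMLU, a mainstream LLM evaluation benchmark, Si-Swish causes a sub-category accuracy degradation of no more than 0.77% and a category-level accuracy degradation of no more than 0.23%. This very small accuracy loss indicates that the optimal segmented quadratic fit computed by SQPF can directly replace the full-precision floating-point activation functions in Transformer models, with no parameter fine-tuning or retraining.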
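The core idea of fitting an activation function with one quadratic per interval can be sketched as below. This is a simplified illustration, not SQPF itself: it uses fixed uniform breakpoints instead of a particle swarm search, floating-point arithmetic instead of fixed-point integer inference, and GELU-specific tails outside the fitted range. All function names and parameter values are assumptions.

```python
import math


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def fit_quadratic(us, ys):
    """Least-squares a*u^2 + b*u + c via the 3x3 normal equations."""
    s = [sum(u ** k for u in us) for k in range(5)]              # power sums
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):                                         # Gauss-Jordan
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def fit_piecewise(f, lo=-4.0, hi=4.0, segments=8, samples=64):
    """Fit one quadratic per uniform segment, in u = x - segment midpoint."""
    width = (hi - lo) / segments
    coefs = []
    for i in range(segments):
        mid = lo + (i + 0.5) * width
        us = [(j / (samples - 1) - 0.5) * width for j in range(samples)]
        coefs.append(fit_quadratic(us, [f(mid + u) for u in us]))
    return lo, width, coefs


def evaluate(fit, x):
    lo, width, coefs = fit
    if x < lo:
        return 0.0            # GELU-specific tail: close to 0 on the far left
    if x >= lo + width * len(coefs):
        return x              # GELU-specific tail: close to x on the far right
    i = min(int((x - lo) / width), len(coefs) - 1)
    a, b, c = coefs[i]
    u = x - (lo + (i + 0.5) * width)
    return (a * u + b) * u + c   # Horner form: two multiplies, two adds


def max_abs_error(fit, f, lo=-6.0, hi=6.0, points=2001):
    xs = [lo + (hi - lo) * j / (points - 1) for j in range(points)]
    return max(abs(evaluate(fit, x) - f(x)) for x in xs)


err8 = max_abs_error(fit_piecewise(gelu, segments=8), gelu)
err32 = max_abs_error(fit_piecewise(gelu, segments=32), gelu)
```

Evaluating each piece in Horner form needs only multiplies and adds, which hints at why a quadratic fit maps well onto the shift and multiply-accumulate operations the abstract mentions; choosing non-uniform breakpoints, as the paper does via optimization, would be expected to reduce the error further for the same number of segments.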