Funding: Supported in part by the National Natural Science Foundation of China under Grants 61972056 and 61901061; the Science Fund for Creative Research Groups of Hunan Province under Grant 2020JJ1006; the Natural Science Foundation of Hunan Province under Grant 2020JJ5603; the Postgraduate Training Innovation Base Construction Project of Hunan Province under Grant 2019-248-51; the Basic Research Fund of Zhongye Changtian International Engineering Co., Ltd. under Grant 2020JCYJ07; and the Scientific Research Fund of Hunan Provincial Education Department under Grant 19C0031.
Abstract: Recently, Siamese-based trackers have achieved excellent performance in object tracking. However, the high speed and deformation of moving objects make tracking difficult. Therefore, we incorporate cascaded region-proposal-network (RPN) fusion and coordinate attention into Siamese trackers. The proposed network framework consists of three parts: a feature-extraction sub-network, a coordinate attention block, and a cascaded RPN block. We exploit the coordinate attention block, which embeds location information into channel attention, to establish long-range spatial location dependence while maintaining channel associations. The features of different layers are thus enhanced by the coordinate attention block. We then send these features separately into the cascaded RPN for classification and regression. The final target position is obtained from the two sets of classification and regression results. To verify the effectiveness of the proposed method, we conducted comprehensive experiments on the OTB100, VOT2016, UAV123, and GOT-10k datasets. Compared with other state-of-the-art trackers, the proposed tracker achieves good performance and runs at real-time speed.
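The coordinate attention block mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it follows the generally published form of coordinate attention (pool along each spatial axis, apply a shared 1x1 transform, split, and gate the input with two direction-aware maps), uses plain Python lists instead of a deep learning framework, and substitutes a plain ReLU for the normalization and nonlinearity usually placed after the shared transform. The function names and the weight arguments `w_reduce`, `w_h`, and `w_w` are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv1x1(feat, weight):
    """1x1 convolution on a [C_in][L] strip: out[o][l] = sum_c weight[o][c] * feat[c][l]."""
    length = len(feat[0])
    return [[sum(w_oc * feat[c][l] for c, w_oc in enumerate(row))
             for l in range(length)]
            for row in weight]

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Apply coordinate attention to x of shape [C][H][W].

    w_reduce: [C_mid][C] shared channel-reduction weights (assumed, no bias).
    w_h, w_w: [C][C_mid] weights producing the height and width gates.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Directional average pooling: along the width -> [C][H], along the height -> [C][W].
    pool_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]
    pool_w = [[sum(x[c][i][j] for i in range(H)) / H for j in range(W)]
              for c in range(C)]
    # Concatenate both strips along the spatial axis -> [C][H + W], then shared 1x1 conv + ReLU.
    strip = [pool_h[c] + pool_w[c] for c in range(C)]
    mid = [[max(0.0, v) for v in row] for row in conv1x1(strip, w_reduce)]
    # Split back into a height part and a width part.
    mid_h = [row[:H] for row in mid]
    mid_w = [row[H:] for row in mid]
    # Per-direction 1x1 conv + sigmoid gives position-aware gates in (0, 1).
    gate_h = [[sigmoid(v) for v in row] for row in conv1x1(mid_h, w_h)]  # [C][H]
    gate_w = [[sigmoid(v) for v in row] for row in conv1x1(mid_w, w_w)]  # [C][W]
    # Each element is rescaled by the gate of its row and the gate of its column.
    return [[[x[c][i][j] * gate_h[c][i] * gate_w[c][j] for j in range(W)]
             for i in range(H)]
            for c in range(C)]
```

Because each gate depends on a whole row or column of the feature map, a response at one position can be modulated by content far away along that axis, which is the long-range positional dependence the abstract refers to.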
Abstract: Background: Video anomaly detection has long been an active research topic and has attracted increasing attention. Many existing methods for video anomaly detection process the entire video rather than considering only the significant context. Method: This paper proposes a novel video anomaly detection method called COVAD that focuses mainly on the region of interest in the video instead of the entire video. The proposed COVAD method is based on an autoencoder-style convolutional neural network and a coordinated attention mechanism, which can effectively capture meaningful objects in the video and the dependencies among different objects. Building on an existing memory-guided video frame prediction network, our algorithm can predict the future motion and appearance of objects in a video more effectively. Result: The proposed algorithm obtained better experimental results on multiple datasets and outperformed the baseline models considered in our analysis. We also provide an improved visual test that gives pixel-level anomaly explanations.
Funding: Supported by the National Natural Science Foundation of China (Nos. 11905028, 12105040) and the Scientific Research Project of the Education Department of Jilin Province (No. JJKH20231294KJ).
Abstract: Neutron radiography is a crucial nondestructive testing technology widely used in the aerospace, military, and nuclear industries. However, because of the physical limitations of neutron sources and collimators, the resulting neutron radiographic images inevitably exhibit multiple distortions, including noise, geometric unsharpness, and white spots. Furthermore, these distortions are particularly significant in compact neutron radiography systems with low neutron fluxes. Therefore, in this study, we devised a multi-distortion suppression network that employs a modified generative adversarial network to improve the quality of degraded neutron radiographic images. Real neutron radiographic image datasets with various types and levels of distortion were built for the first time as multi-distortion suppression datasets. Thereafter, the coordinate attention mechanism was incorporated into the backbone network to augment the capability of the proposed network to learn the abstract relationship between ideally clear and degraded images. Extensive experiments were performed; the results show that the proposed method can effectively suppress multiple distortions in real neutron radiographic images and achieve state-of-the-art perceptual visual quality, thus demonstrating its application potential in neutron radiography.
Funding: Supported by the National Key Research and Development Program (No. 2016YFD0201305-07); the Guizhou Provincial Basic Research Program (Natural Science) (No. ZK[2023]060); and the Open Fund Project of the Semiconductor Power Device Reliability Engineering Center of the Ministry of Education (No. ERCMEKFJJ2019-06).
Abstract: Diseases in tea trees can result in significant losses in both the quality and quantity of tea production. Regular monitoring can help to prevent large-scale disease outbreaks in tea plantations. However, existing methods face challenges such as a high number of parameters and low recognition accuracy, which hinder their application in tea plantation monitoring equipment. To address these challenges, this paper presents a lightweight I-MobileNetV2 model for identifying diseases in tea leaves. The proposed method first embeds a Coordinate Attention (CA) module into the original MobileNetV2 network, enabling the model to locate disease regions accurately. Second, a Multi-branch Parallel Convolution (MPC) module is employed to extract disease features across multiple scales, improving the model's adaptability to different disease scales. Finally, AutoML for Model Compression (AMC) is used to compress the model and reduce computational complexity. Experimental results indicate that the proposed algorithm attains an average accuracy of 96.12% on our self-built tea leaf disease dataset, surpassing the original MobileNetV2 by 1.91%. Furthermore, the number of model parameters has been reduced by 40%, making the model more suitable for practical application in tea plantation environments.
Funding: This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2021-0-0268, Artificial Intelligence Innovation Hub (Artificial Intelligence Institute, Seoul National University)].
Abstract: Acoustic scene classification (ASC) is a method of recognizing and classifying environments using acoustic signals. Various ASC approaches based on deep learning have been developed, with convolutional neural networks (CNNs) proving to be the most reliable and commonly utilized in ASC systems due to their suitability for constructing lightweight models. When using ASC systems in the real world, model complexity and device robustness are essential considerations. In this paper, we propose a two-pass mobile network for low-complexity classification of the acoustic scene, named TP-MobNet. TP-MobNet is based on MobileNetV2, with inverted residuals and linear bottlenecks; following the mobile blocks, coordinate attention and two-pass fusion approaches are utilized. The long-range dependencies and precise position information in feature maps can be learned via coordinate attention. By capturing more diverse feature resolutions at the network's end sides, two-pass fusion can also improve generalization. In addition, the model size is reduced by applying weight quantization to the trained model. The TAU Urban Acoustic Scenes 2020 Mobile development set was used for all of the experiments. The proposed model, with a model size of 219.6 kB, achieves an accuracy of 73.94%.
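The abstract does not say which weight quantization scheme or bit width was used, so the following is only a generic illustration of how quantizing trained weights shrinks a model. It assumes symmetric, per-tensor uniform quantization, and the helper names `quantize_weights`, `dequantize_weights`, and `storage_bytes` are hypothetical.

```python
def quantize_weights(weights, bits=8):
    """Symmetric uniform quantization of a flat list of float weights.

    Returns integer codes in [-qmax, qmax] and the scale needed to decode them.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_weights(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in codes]

def storage_bytes(num_weights, bits):
    """Storage needed for the weights alone (scales and metadata ignored)."""
    return num_weights * bits / 8
```

Under these assumptions, storing each weight in 8 bits instead of 32 cuts weight storage by a factor of four, at the cost of a per-weight rounding error of at most half a quantization step.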
Funding: Supported in part by the National Natural Science Foundation of China (Grant Number 61971078) and the Chongqing University of Technology Graduate Innovation Foundation (Grant Number gzlcx20222064).
Abstract: As image editing technology improves, the barrier to image tampering has fallen, which undermines the authenticity of image content. This has driven research on image forgery detection techniques. In this paper, a U-Net with multiple receptive field feature extraction (MSCU-Net) is proposed for image forgery detection. The proposed MSCU-Net is an end-to-end image essential attribute segmentation network that can perform image forgery detection without any pre-processing or post-processing. MSCU-Net replaces the single-scale convolution module in the original network with an improved multiple receptive field convolution module so that the decoder can synthesize features from different receptive fields; uses residual propagation and residual feedback to recall and consolidate the input feature information, making the difference in image attributes between untampered and tampered regions more obvious; and introduces the channel coordinate confusion attention mechanism (CCCA) in the skip connections to further improve the segmentation accuracy of the network. Extensive experiments are conducted on various mainstream datasets, and the results verify the effectiveness of the proposed method, which outperforms state-of-the-art image forgery detection methods.
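Abstract: Because of the limited receptive field of shallow convolutional neural network (CNN) models, long-range features cannot be captured, so the spatial-spectral information of the image cannot be fully exploited in hyperspectral image (HSI) classification, and it is difficult to obtain high-accuracy classification results. To address these problems, this paper proposes a model based on a convolutional neural network and attention mechanism (CNNAM). The model uses coordinate attention (CA) to positionally encode the image channel data and uses a Transformer module, whose core architecture is the self-attention mechanism, to extract long-range features, thereby overcoming the limited receptive field of the CNN. CNNAM achieves overall classification accuracies of 97.63% and 99.34% on the Indian Pines and Salinas datasets, respectively; compared with other models, the proposed model shows better classification performance. In addition, an ablation experiment with and without CA shows that CA plays an important role in CNNAM. The experiments demonstrate that combining a traditional CNN with attention mechanisms can achieve higher classification accuracy in HSI classification.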
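To make concrete why a self-attention-based Transformer module is not bound by a convolution's receptive field, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a generic illustration, not the CNNAM architecture: the learned query, key, and value projections, multiple heads, and the CA positional encoding are all omitted, and the function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(q, k, v):
    """Scaled dot-product attention.

    q, k: lists of n vectors of dimension d; v: list of n value vectors.
    Returns the attended outputs and the n x n attention weights.
    """
    d = len(q[0])
    # Every query is compared with every key, regardless of distance in the sequence.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
              for qi in q]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of all value vectors.
    out = [[sum(w * vj[t] for w, vj in zip(w_row, v)) for t in range(len(v[0]))]
           for w_row in weights]
    return out, weights
```

Since each output token is a weighted average over all tokens, a feature at one pixel or spectral position can draw on any other position in a single layer, which a shallow CNN with small kernels cannot do.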