Automatic modulation recognition (AMR) of radiation source signals is a research focus in the field of cognitive radio. However, AMR of radiation source signals at low signal-to-noise ratios (SNRs) remains a major challenge. This paper therefore proposes an AMR method for radiation source signals based on a two-dimensional data matrix and an improved residual neural network. First, the time series of the radiation source signals is reconstructed into a two-dimensional data matrix, which greatly simplifies signal preprocessing. Second, a residual neural network based on depthwise convolution and large-size convolutional kernels (DLRNet) is proposed to improve the feature extraction capability of the AMR model. Finally, the model performs feature extraction and classification on the two-dimensional data matrix to obtain a recognition vector that represents the signal modulation type. Theoretical analysis and simulation results show that the proposed method significantly improves recognition accuracy, which stays above 90% even at an SNR of -14 dB.
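The abstract does not say how the time series is folded into a two-dimensional matrix. The sketch below assumes the simplest option, a row-major fold of a 1-D sample sequence into fixed-width rows, with any incomplete final row dropped. The name `fold_to_matrix` and the width of 4 are hypothetical and chosen only for illustration.

```python
def fold_to_matrix(samples, width):
    """Fold a 1-D sequence into rows of `width` samples each.

    Assumption (not from the paper): row-major layout, and a trailing
    partial row is discarded rather than padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n_rows = len(samples) // width
    return [list(samples[r * width:(r + 1) * width]) for r in range(n_rows)]


signal = list(range(10))            # stand-in for sampled signal values
matrix = fold_to_matrix(signal, 4)  # two full rows; samples 8 and 9 are dropped
```

A fold like this needs no hand-crafted transform, which is consistent with the abstract's claim that preprocessing becomes simpler.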
With the widespread use of Internet of Things (IoT) technology in daily life and the considerable safety risks that falls pose to elderly individuals, research on IoT-based fall detection systems has gained much attention. This paper proposes an IoT-based spatiotemporal data processing framework for fall detection, built on a depthwise separable convolution generative adversarial network with skip connections (Skip-DSCGAN). The method takes spatiotemporal data from the accelerometers and gyroscopes of inertial sensors as input. A semisupervised learning approach trains the model using only activities of daily living (ADL) data, which avoids data imbalance problems. Furthermore, a quantile-based approach determines the fall threshold, which makes the framework more robust. The framework is evaluated against four other generative adversarial network (GAN) models with strong anomaly detection performance on two public fall datasets (SisFall and MobiAct). The proposed method achieves better results, reaching 96.93% and 92.75% accuracy on the two datasets, respectively. It also achieves satisfactory model size and inference latency, making it suitable for deployment on resource-limited wearable devices. In addition, the paper compares GAN-based semisupervised learning methods with the supervised learning methods commonly used in fall detection and clarifies the advantages of the former.
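The quantile-based threshold can be illustrated with a small sketch. The abstract gives neither the score definition nor the quantile level, so the code assumes that each ADL training window yields one anomaly score (for example a reconstruction error) and that the threshold is the 0.9 quantile of those scores. The function names and the sample scores are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of `values` for 0 <= q <= 1."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def is_fall(score, threshold):
    """Flag a window as a fall when its anomaly score exceeds the threshold."""
    return score > threshold


# Hypothetical anomaly scores obtained on ADL-only training data.
adl_scores = [0.10, 0.12, 0.15, 0.11, 0.30, 0.14, 0.13, 0.16, 0.12, 0.18]
threshold = quantile(adl_scores, 0.9)
```

Because the threshold comes from the distribution of normal activity alone, no labeled fall data is needed to set it, which matches the semisupervised setup described above.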
With the growth of the Internet, more and more business is being done online, for example online offices and online education. While this makes people's lives more convenient, it also increases the risk of networks being attacked by malicious code. It is therefore important to identify malicious code on computer systems efficiently. However, most existing malicious code detection methods have two problems: (1) the model's ability to extract features is weak, resulting in poor performance; (2) the large model size makes deployment on resource-limited devices difficult. This paper therefore proposes LCMISNet, a Lightweight Malicious Code Classification Method Based on Improved SqueezeNet. The MFire lightweight feature extraction module is constructed from a feature slicing module and a multi-size depthwise separable convolution module. The feature slicing module reduces the number of parameters by grouping features. The multi-size depthwise separable convolution module reduces the number of parameters and enhances feature extraction by replacing the standard convolution with depthwise separable convolutions of different kernel sizes. In addition, a feature splicing module connects the MFire modules on the basis of feature reuse to build the lightweight LCMISNet model. LCMISNet reaches malicious code recognition accuracies of 98.90% on the BIG 2015 dataset and 99.58% on the Malimg dataset, which demonstrates strong recognition performance. Compared with other network models, LCMISNet also performs better with fewer parameters and computations.
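The parameter saving from replacing a standard convolution with a depthwise separable one can be checked with simple arithmetic. The sketch below uses the usual weight-count formulas with biases ignored. The channel counts are hypothetical and are not taken from LCMISNet.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer: 64 input channels, 128 output channels.
for k in (3, 5, 7):
    std = standard_conv_params(64, 128, k)
    dsc = depthwise_separable_params(64, 128, k)
    print(f"k={k}: standard={std}, separable={dsc}, ratio={dsc / std:.3f}")
```

The ratio equals 1/c_out + 1/k^2, so the saving grows with kernel size. This is one reason larger kernels are affordable in the depthwise branch.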
Vehicle recognition models implemented with convolutional neural networks must compute and store a large number of parameters. These parameters occupy substantial computational resources, which makes the models difficult to run on low-performance computers. The main goal of this research is therefore to obtain more efficient feature information from target images or video, with better accuracy, on computers with limited computing power. To this end, a lightweight densely connected and deeply separable convolutional network (DCDSNet) algorithm is proposed. The Visual Geometry Group (VGG) model is improved with three modules: convolution in place of the fully connected module, a deeply separable convolution module, and a densely connected network module. The first two reduce the number of parameters, and the third lets the algorithm extract more features from a limited number of parameters. The algorithm achieves better results on a mine vehicle recognition dataset. Experiments show that, compared with VGG19, recognition accuracy improves by 4.41% and the number of parameters is reduced by 71%.
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity against model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn image features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
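The claim that dilation expands the field of view without adding weights follows from the standard effective-kernel-size relation. The short sketch below applies it to a few dilation rates. These rates are illustrative and are not the settings used in the DDSC layer.

```python
def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    if k < 1 or dilation < 1:
        raise ValueError("k and dilation must be at least 1")
    return k + (k - 1) * (dilation - 1)


# A 3 x 3 kernel always has 9 weights per channel, whatever the dilation.
for d in (1, 2, 4):
    print(f"dilation={d}: covers {effective_kernel_size(3, d)} pixels per side")
```

With dilation 2 a 3 x 3 kernel spans 5 pixels per side and with dilation 4 it spans 9, so the context grows while the parameter count stays fixed.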
Recently, video-based fire detection technology has become an important research topic in the field of machine vision. This paper proposes a fire detection method that combines a classification model and a target detection model from deep learning. First, depthwise separable convolution is used to classify fire images, which saves considerable detection time while preserving detection accuracy. Second, the target regression function of You Only Look Once version 3 (YOLOv3) outputs the fire position for images classified as fire, which avoids the accuracy problems that arise when YOLOv3 is used for both target classification and position regression. At the same time, the time spent on target regression for images without fire is greatly reduced. The experiments used a publicly available online database. The detection accuracy reached 98% and the detection rate reached 38 fps. The method removes the work of manually extracting flame characteristics, reduces the computational cost and the number of parameters, and also improves detection accuracy and detection rate.
Pointwise convolution is usually used to expand or squeeze features in modern lightweight deep models. However, it accounts for most of the overall computational cost (usually more than 90%). This paper proposes a novel Poker module that expands features by taking advantage of cheap depthwise convolution. As a result, the Poker module greatly reduces the computational cost while generating a large number of effective features to preserve performance. The proposed module is standardized and can be employed wherever feature expansion is needed. By varying the stride and the number of channels, different kinds of bottlenecks are designed to plug the Poker module into the network, so a lightweight model can be assembled easily. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed Poker module. PokerNet models reduce the computational cost by 7.1%-15.6% and achieve comparable or even higher recognition accuracy than previous state-of-the-art (SOTA) models on the ImageNet ILSVRC2012 classification dataset. Code is available at https://github.com/diaomin/pokernet.
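The "more than 90%" figure can be reproduced for a single depthwise separable layer under simple assumptions. The sketch below counts multiply-accumulate operations per input channel and output position, so spatial size and input channel count cancel out. The layer sizes are hypothetical and the Poker module itself is not modeled.

```python
def pointwise_share(c_out, k):
    """Fraction of multiply-accumulates spent in the 1 x 1 convolution of a
    depthwise separable layer (k x k depthwise followed by pointwise).

    Per input channel and output position, the depthwise step costs k*k
    operations and the pointwise step costs c_out operations.
    """
    depthwise = k * k
    pointwise = c_out
    return pointwise / (depthwise + pointwise)


for c_out in (128, 256, 512):
    print(f"c_out={c_out}: pointwise share = {pointwise_share(c_out, 3):.3f}")
```

For a 3 x 3 depthwise kernel the share passes 90% once the layer has more than 81 output channels, which covers most layers in typical lightweight backbones.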
Channel pruning can reduce memory consumption and running time with minimal performance loss, and is one of the most important techniques in network compression. However, existing channel pruning methods mainly focus on standard convolutional networks, and they rely heavily on time-consuming fine-tuning to recover performance. To this end, we present a novel and efficient probability-based channel pruning method for depthwise separable convolutional networks. Our method uses a simple yet effective probability-based pruning criterion that takes the scaling and shifting factors of batch normalization layers into consideration. A novel shifting factor fusion technique is further developed to improve the performance of the pruned networks without extra time-consuming fine-tuning. We apply the proposed method to five representative deep learning networks, namely MobileNetV1, MobileNetV2, ShuffleNetV1, ShuffleNetV2, and GhostNet, to demonstrate its efficiency. Extensive experimental results and comparisons on the publicly available CIFAR10, CIFAR100, and ImageNet datasets validate the feasibility of the proposed method.
Image classification using a Convolutional Neural Network (CNN) achieves optimal performance with a particular strategy. MobileNet reduces the number of parameters for learning features by switching from the standard convolution paradigm to the depthwise separable convolution (DSC) paradigm. However, there are not enough features to learn for identifying the freshness of fish eyes. Furthermore, minor variances in features should not require a complicated CNN architecture. In this paper, the first contribution is a DSC Bottleneck with Expansion for learning features of the freshness of fish eyes with a Bottleneck Multiplier. The second contribution is a Residual Transition that bridges current feature maps and skip-connection feature maps to the next convolution block. The third contribution is MobileNetV1 Bottleneck with Expansion (MB-BE) for classifying the freshness of fish eyes. Results on the Freshness of the Fish Eyes dataset show that MB-BE outperforms other models, such as the original MobileNet, VGG16, DenseNet, and NASNet Mobile, with 63.21% accuracy.
Image semantic segmentation has become an essential part of autonomous driving. To further improve the generalization ability and robustness of semantic segmentation algorithms, a lightweight network based on the Squeeze-and-Excitation (SE) attention mechanism and Depthwise Separable Convolution (DSC) is designed. In addition, Adam-GC, an Adam optimization algorithm based on Gradient Compression (GC), is proposed to improve the training speed, segmentation accuracy, generalization ability, and stability of the network. To verify its effectiveness, the trained network model is evaluated and compared with other methods on the Cityscapes semantic segmentation dataset. The results show that the network achieves 78.02% MIoU on the Cityscapes validation set, which is better than the baseline network and other recent semantic segmentation networks. Besides meeting stability and accuracy requirements, the work is significant for the development of image semantic segmentation.
Memristor-based neuromorphic computing shows great potential for high-speed and high-throughput signal processing applications, such as electroencephalogram (EEG) signal processing. Nonetheless, the size of one-transistor one-resistor (1T1R) memristor arrays is limited by the non-ideality of the devices, which prevents the hardware implementation of large and complex networks. In this work, we propose the depthwise separable convolution and bidirectional gated recurrent unit (DSC-BiGRU) network, a lightweight and highly robust hybrid neural network based on 1T1R arrays that enables efficient processing of EEG signals in the temporal, frequency, and spatial domains by hybridizing DSC and BiGRU blocks. The network size is reduced and the network robustness is improved while the classification accuracy is preserved. In the simulation, the measured non-idealities of the 1T1R array are brought into the network through statistical analysis. Compared with traditional convolutional networks, the network parameters are reduced by 95% and the classification accuracy is improved by 21% at a 95% array yield rate and 5% tolerable error. This work demonstrates that lightweight and highly robust networks based on memristor arrays hold great promise for applications that rely on low consumption and high efficiency.
In deep learning approaches to plant disease identification, the high complexity of the network model, the large number of parameters, and the heavy computational load make it challenging to deploy the model on terminal devices with limited computational resources. In this study, a lightweight method for plant disease identification based on an improved ShuffleNetV2 model is proposed. In the proposed model, the depthwise convolution in the basic module of ShuffleNetV2 is replaced with mixed depthwise convolution to capture crop pest images at different resolutions; the efficient channel attention module is added to the ShuffleNetV2 network structure to enhance the channel features; and the ReLU activation function is replaced with the ReLU6 activation function to prevent the generation of large gradients. Experiments are conducted on the public PlantVillage dataset. The results show that the proposed model achieves an accuracy of 99.43%, an improvement of 0.6 percentage points over the ShuffleNetV2 model. Compared with lightweight network models such as MobileNetV2, MobileNetV3, EfficientNet, and EfficientNetV2, and with classical convolutional neural network models such as ResNet34, ResNet50, and ResNet101, the proposed model has fewer parameters and higher recognition accuracy, which provides guidance for deploying crop pest identification methods on resource-constrained devices, including mobile terminals.
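The difference between ReLU and ReLU6 is small enough to show directly. The sketch below uses the standard definitions of the two activations. It is a scalar illustration only and is not code from the proposed model.

```python
def relu(x):
    """Standard ReLU: negative inputs become zero, positive inputs pass through."""
    return max(0.0, x)


def relu6(x):
    """ReLU6: same as ReLU, but the output is capped at 6."""
    return min(max(0.0, x), 6.0)


for x in (-1.0, 3.5, 10.0):
    print(f"x={x}: relu={relu(x)}, relu6={relu6(x)}")
```

Capping the output bounds the activation range, which is the property the abstract relies on when it replaces ReLU with ReLU6.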
Infrared target detection models increasingly need to be deployed on embedded platforms, which calls for lower memory consumption and better real-time performance without sacrificing accuracy. To address these challenges, we propose PF-YOLOv4-Tiny, a modified You Only Look Once (YOLO) algorithm. The algorithm incorporates spatial pyramid pooling (SPP) and squeeze-and-excitation (SE) visual attention modules to enhance target localization. PANet-based feature pyramid networks (P-FPN) are proposed to transfer semantic information and location information simultaneously and so improve detection accuracy. To lighten the network, the standard convolutions outside the backbone network are replaced with depthwise separable convolutions. In post-processing, the soft non-maximum suppression (soft-NMS) algorithm is employed to reduce the missed and false detections caused by occlusion between targets. The accuracy of our model reaches 61.75%, while the total parameter count is only 9.3 M and the computational cost is 11 GFLOPs. The inference speed reaches 87 FPS on an NVIDIA GeForce GTX 1650 Ti, which meets the requirements of infrared target detection for embedded deployment.
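Soft-NMS lowers the scores of overlapping boxes instead of discarding them, which is why it helps with occluded targets. The abstract does not say which variant is used, so the sketch below assumes the Gaussian decay form. The parameters `sigma` and `score_threshold` and the sample boxes are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: repeatedly keep the top box and decay the others.

    Returns (box, score) pairs in the order they were selected.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        if score < score_threshold:
            break  # everything left scores even lower
        kept.append((box, score))
        # Decay each remaining score according to its overlap with the kept box.
        remaining = [(b, s * math.exp(-(iou(box, b) ** 2) / sigma))
                     for b, s in remaining]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this example the second box overlaps the first heavily, so its score is reduced and it is selected last, but it is not removed as hard NMS would do at a typical IoU threshold.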
Intelligent straw coverage detection plays an important role in agricultural production and the ecological environment. Traditional pattern recognition suffers from low precision and long processing time when segmenting complex farmland, so it cannot meet the conditions for deployment on embedded equipment. To address these problems, we propose a novel deep learning model with high accuracy, small model size, and fast running speed, named Residual Unet with Attention mechanism using depthwise convolution (RADw–UNet). The algorithm is based on the symmetric UNet encoder-decoder model. All feature extraction modules of the network adopt the residual structure, and the whole network uses only an 8x downsampling rate to reduce redundant parameters. To better extract semantic information in the spatial and channel dimensions, a depthwise convolutional residual block is designed for feature maps with larger depths, reducing the number of parameters while improving model accuracy. In addition, a multi-level attention mechanism is introduced in the skip connections to effectively integrate the information of the low-level and high-level feature maps. The experimental results show that the segmentation performance of RADw–UNet exceeds that of traditional methods and the UNet algorithm. The algorithm achieves an mIoU of 94.9%, has only approximately 0.26 M trainable parameters, and processes a single picture in less than 0.03 s.
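The mIoU figure reported above is the per-class intersection over union averaged over classes. The sketch below computes it on flattened label sequences. Skipping classes that appear in neither prediction nor ground truth is an assumption, and evaluation toolkits differ on this point. The labels are made up for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for flat label sequences.

    Classes absent from both `pred` and `target` are skipped (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 4-pixel example: 0 = background, 1 = straw.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.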
Purpose: As of January 11, 2021, coronavirus disease (COVID-19) has caused more than 2 million deaths worldwide. The main diagnostic methods for COVID-19 are: (i) nucleic acid testing, which places high requirements on the sample testing environment, and staff collecting samples work in a susceptible environment, which increases the risk of infection; (ii) chest computed tomography, which is costly and involves some radiation during the scan; (iii) chest X-ray imaging, which has the advantages of fast imaging and higher spatial recognition than chest computed tomography. Our team therefore chose chest X-ray images as the experimental dataset in this paper. Methods: We propose a novel framework, BEVGG, and three methods (BEVGGC-I, BEVGGC-II, and BEVGGC-III) to diagnose COVID-19 from chest X-ray images. In addition, we use biogeography-based optimization to tune the hyperparameters of the convolutional neural network. Results: The experimental results show that the overall accuracy (OA) of the three proposed methods is 97.65% ± 0.65%, 94.49% ± 0.22%, and 94.81% ± 0.52%, respectively. BEVGGC-I performs best of all methods. Conclusions: The OA of BEVGGC-I is 9.59% ± 1.04% higher than that of state-of-the-art methods.
According to recent research statistics, approximately 30% of people who experienced falls are over the age of 65. It is therefore meaningful to detect falls in time and take appropriate measures when they occur. In this paper, a fall detection model based on an improved human posture estimation algorithm is proposed. The improved algorithm is built on OpenPose and uses a strategy that combines depthwise separable convolution with an HDC structure. Depthwise separable convolution replaces the original convolutional neural network structure, which makes the network lightweight and removes redundant layers. At the same time, the HDC structure is introduced so that image features are not lost and the accuracy of human joint point detection is preserved. Experiments show that the improved algorithm with the HDC structure detects joint points more accurately. Human posture estimation is then applied to fall detection, and fall events are modeled through fall feature extraction. The designed convolutional neural network model is used to classify and distinguish falls. The experimental results show that our method achieves 98.53%, 97.71%, and 97.20% accuracy on three public fall detection datasets. Compared with the results of other methods on the same datasets, the proposed model improves system accuracy. Sensitivity is also improved, which reduces the probability of detection errors. In addition, this paper verifies the real-time performance of the model: even on low-end hardware, it maintains an adequate detection speed without excessive delay.
Deep kernel mapping support vector machines have achieved good results in numerous tasks by mapping features from a low-dimensional space to a high-dimensional space and then using support vector machines for classification. However, the deep kernel mapping support vector machine does not take into account the connections between spaces of different dimensions, and it increases the number of model parameters. To further improve the recognition capability of deep kernel mapping support vector machines while reducing the number of model parameters, this paper proposes a framework of Lightweight Deep Convolutional Cross-Connected Kernel Mapping Support Vector Machines (LC-CKMSVM). The framework consists of a feature extraction module and a classification module. The feature extraction module first maps the data from a low-dimensional to a high-dimensional space, fusing the representations of different dimensional spaces through cross-connections; it then uses depthwise separable convolution to replace part of the original convolution to reduce the number of parameters in the module. The classification module uses a soft-margin support vector machine for classification. Results on six different visual datasets show that LC-CKMSVM obtains better classification accuracy than the other five models in most cases.
In unstructured environments, dense grape growth and occlusion make recognition difficult, which seriously affects the performance of grape picking robots. To address these problems, this study improves the YOLOX-Tiny model and proposes a new grape detection model, YOLOX-RA, which can quickly and accurately identify densely growing and occluded grape bunches. The proposed YOLOX-RA model uses a 3×3 convolutional layer with a stride of 2 to replace the focal layer and reduce the computational burden. The CBS layer in the ResBlock_Body module of the second, third, and fourth layers of the backbone is removed, and the CSPLayer module is replaced by the ResBlock-M module to speed up detection. An auxiliary network (AlNet) with the remaining network blocks is added after the ResBlock-M module to improve detection accuracy. Two depth-separable convolutions (DsC) are used in the neck module to replace the normal convolution and reduce the computational cost. We evaluated the detection performance of SSD, YOLOv4 SSD, YOLOv4-Tiny, YOLO-Grape, YOLOv5-X, YOLOX-Tiny, and YOLOX-RA on a grape test set. The results show that the YOLOX-RA model has the best detection performance, achieving 88.75% mAP, a recognition speed of 84.88 FPS, and a model size of 17.53 MB. It can accurately detect densely grown and shaded grape bunches, which can effectively improve the performance of the grape picking robot.
At CRYPTO 2019, Gohr opened up a new direction for cryptanalysis. He successfully applied deep learning to differential cryptanalysis of the NSA block cipher SPECK32/64, achieving higher accuracy than traditional differential distinguishers. Since then, one of the mainstream research directions has been to increase the training sample size and use different neural networks to improve the accuracy of neural distinguishers. This approach may lead to a huge number of parameters, a heavy computing load, and large memory consumption when training the distinguishers. However, in practical cryptanalysis, the applicability of an attack in a resource-constrained environment is very important. We therefore focus on cost optimization and aim to reduce the network parameters for differential neural cryptanalysis. In this paper, we propose two cost-optimized improvements to neural distinguishers, one in the data format and one in the network structure. First, we obtain a partial output difference neural distinguisher that uses only a 4-bit training data format, constructed with a new advantage bits search algorithm based on two key improvement conditions. In addition, we perform an interpretability analysis of the new neural distinguishers, whose results are mainly reflected in the relationship between the neural distinguishers, truncated differentials, and advantage bits. Second, we replace the traditional convolution with depthwise separable convolution to reduce the training cost while affecting accuracy as little as possible. Overall, the number of training parameters can be reduced by less than 50% by using our new network structure for training neural distinguishers. Finally, we apply the network structure to the partial output difference neural distinguishers. The combined approach has led to a further reduction in the number of parameters (approximately 30% of Gohr's distinguishers for SPECK).
With the remarkable success of change detection (CD) in remote sensing images in the context of deep learning, many convolutional neural network (CNN) based methods have been proposed. To obtain better context modeling for remote sensing images and to capture more spatiotemporal characteristics, several attention-based and transformer (TR)-based methods have also been proposed. Recent research has continued to innovate on TR-based methods, but most of the new methods require a huge amount of computation to achieve good results. Using a TR-based method while keeping the overhead low is therefore a problem to be solved. Here, we propose a GNN-based multi-scale transformer siamese network for remote sensing image change detection (GMTS) that maintains a low network overhead while effectively modeling context in the spatiotemporal domain. We also design a novel hybrid backbone to extract features. Compared with current CNN backbones, our backbone network has a lower overhead and achieves better results. Further, we use high/low frequency (HiLo) attention to extract more detailed local features and the multi-scale pooling pyramid transformer (MPPT) module to focus on more global features. Finally, we leverage the context modeling capabilities of TR in the spatiotemporal domain to optimize the extracted features. Our model has relatively few parameters compared with current TR-based methods and still achieves a clear improvement, which provides a good balance between efficiency and performance.
Funding: National Natural Science Foundation of China under Grant No. 61973037; China Postdoctoral Science Foundation under Grant No. 2022M720419.
Abstract: Automatic modulation recognition (AMR) of radiation source signals is a research focus in the field of cognitive radio. However, AMR of radiation source signals at low SNRs remains a great challenge. Therefore, this paper proposes an AMR method for radiation source signals based on a two-dimensional data matrix and an improved residual neural network. First, the time series of the radiation source signals are reconstructed into a two-dimensional data matrix, which greatly simplifies signal preprocessing. Second, a residual neural network based on depthwise convolution and large-size convolutional kernels (DLRNet) is proposed to improve the feature extraction capability of the AMR model. Finally, the model performs feature extraction and classification on the two-dimensional data matrix to obtain the recognition vector that represents the signal modulation type. Theoretical analysis and simulation results show that the proposed method can significantly improve recognition accuracy, which remains above 90% even at -14 dB SNR.
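The abstract above does not say how the time series is laid out in the two-dimensional matrix. As a minimal sketch, assuming a simple row-major fold with a hypothetical row width (the function name `to_matrix` and the choice to drop an incomplete tail are illustrative assumptions, not the authors' procedure), the reshaping step could look like this:

```python
def to_matrix(samples, width):
    """Fold a 1-D sequence into rows of `width` samples.

    Assumption: samples are laid out row by row and any incomplete
    final row is dropped. The paper may use a different layout.
    """
    rows = len(samples) // width
    return [list(samples[r * width:(r + 1) * width]) for r in range(rows)]


# Ten samples folded into rows of four; the last two samples are dropped.
signal = list(range(10))
matrix = to_matrix(signal, 4)
```

The resulting list of rows can be passed to a 2-D convolutional model as a single-channel image.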
Funding: Supported partly by the Natural Science Foundation of Zhejiang Province, China (LGF21F020017).
Abstract: With the widespread use of Internet of Things (IoT) technology in daily life and the considerable safety risks of falls for elderly individuals, research on IoT-based fall detection systems has gained much attention. This paper proposes an IoT-based spatiotemporal data processing framework based on a depthwise separable convolution generative adversarial network using skip-connection (Skip-DSCGAN) for fall detection. The method uses spatiotemporal data from accelerometers and gyroscopes in inertial sensors as input data. A semisupervised learning approach is adopted to train the model using only activities of daily living (ADL) data, which avoids data imbalance problems. Furthermore, a quantile-based approach is employed to determine the fall threshold, which makes the fall detection framework more robust. The proposed framework is evaluated against four other generative adversarial network (GAN) models with superior anomaly detection performance using two public fall datasets (SisFall and MobiAct). The test results show that the proposed method achieves better results, reaching 96.93% and 92.75% accuracy on the two test datasets, respectively. The proposed method also achieves satisfactory results in terms of model size and inference delay, making it suitable for deployment on wearable devices with limited resources. In addition, this paper compares GAN-based semisupervised learning methods with supervised learning methods commonly used in fall detection and clarifies the advantages of the GAN-based semisupervised approach.
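The quantile-based threshold mentioned above can be illustrated with a small sketch. It assumes that the model assigns each window an anomaly score, that the threshold is a chosen quantile of the scores observed on ADL-only data, and that a window is flagged as a fall when its score exceeds that threshold. The scores, the 0.9 quantile, and the function names are all hypothetical; the abstract does not give these details.

```python
def quantile(values, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def is_fall(score, threshold):
    """Flag a window as a fall when its anomaly score exceeds the threshold."""
    return score > threshold


# Hypothetical anomaly scores collected on ADL-only (non-fall) data.
adl_scores = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.14, 0.10, 0.30]
threshold = quantile(adl_scores, 0.9)
```

Because the threshold is a quantile rather than the maximum, a single unusually high ADL score (0.30 here) does not dominate it.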
Abstract: With the growth of the Internet, more and more business is being done online, for example, online offices and online education. While this makes people’s lives more convenient, it also increases the risk of the network being attacked by malicious code. Therefore, it is important to identify malicious code on computer systems efficiently. However, most existing malicious code detection methods have two problems: (1) the model's ability to extract features is weak, resulting in poor performance; (2) the large model size makes deployment on resource-limited devices difficult. Therefore, this paper proposes a lightweight malicious code identification model, the Lightweight Malicious Code Classification Method Based on Improved SqueezeNet (LCMISNet). The MFire lightweight feature extraction module is constructed from a feature slicing module and a multi-size depthwise separable convolution module. The feature slicing module reduces the number of parameters by grouping features. The multi-size depthwise separable convolution module reduces the number of parameters and enhances feature extraction by replacing the standard convolution with depthwise separable convolutions of different kernel sizes. In addition, this paper proposes a feature splicing module that connects the MFire modules based on feature reuse, and uses it to construct the lightweight model LCMISNet. The malicious code recognition accuracy of LCMISNet on the BIG 2015 dataset and the Malimg dataset reaches 98.90% and 99.58%, respectively, which shows that LCMISNet recognizes malicious code effectively. Compared with other network models, LCMISNet also has better performance and fewer parameters and computations.
Funding: Supported by the open project of National Local Joint Engineering Research Center for Agro-Ecological Big Data Analysis and Application Technology, “Adaptive Agricultural Machinery Motion Detection and Recognition in Natural Scenes”, AE202210; by the school-level key discipline of Suzhou University in China with No. 2019xjzdxk12022 Anhui Province College Research Program Project of the Suzhou Vocational College of Civil Aviation, No. 2022AH053155.
Abstract: A vehicle recognition model implemented with a convolutional neural network needs to compute and store a large number of parameters. These parameters occupy substantial computational resources, making the model difficult to run on low-performance computers. Therefore, the main goal of this research is to obtain more efficient feature information from target images or video, with better accuracy, on computers with limited computing power. In this paper, a lightweight densely connected and deeply separable convolutional network (DCDSNet) algorithm is proposed to achieve this goal. The Visual Geometry Group (VGG) model is improved by using convolution in place of the fully connected module, the deeply separable convolution module, and the densely connected network module. The first two modules reduce the number of parameters, and the third allows the algorithm to learn more features within a limited number of parameters. The algorithm achieves better results on the mine vehicle recognition dataset. Experiments show that, compared to VGG19, the recognition accuracy is improved by 4.41% and the number of parameters is reduced by 71%.
Abstract: Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in the DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
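The parameter savings that several of these abstracts attribute to depthwise separable convolution follow from a standard count, sketched below. Biases are ignored, the layer sizes are illustrative and not taken from any of the papers, and the helper names are hypothetical. The last function gives the usual effective kernel size of a dilated convolution, which is how dilation widens the field of view without adding weights.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv mixing channels
    return depthwise + pointwise


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


# Illustrative layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard
```

For this layer the separable version needs roughly 12% of the weights of the standard convolution, and a 3 x 3 kernel with dilation 2 covers a 5 x 5 window.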
Funding: This work was supported by Liaoning Provincial Science Public Welfare Research Fund Project (No. 2016002006) and Liaoning Provincial Department of Education Scientific Research Service Local Project (No. L201708).
Abstract: Recently, video-based fire detection technology has become an important research topic in the field of machine vision. This paper proposes a fire detection method that combines a classification model and a target detection model from deep learning. First, depthwise separable convolution is used to classify fire images, which saves considerable detection time while preserving detection accuracy. Second, the target regression function of You Only Look Once version 3 (YOLOv3) is used to output the fire position for images classified as fire, which avoids the loss of accuracy that occurs when YOLOv3 alone handles both target classification and position regression. At the same time, the time spent on target regression for images without fire is greatly reduced. The experiments were conducted on a public online database. The detection accuracy reached 98% and the detection rate reached 38 fps. This method not only removes the workload of manually extracting flame characteristics and reduces the computational cost and the number of parameters, but also improves the detection accuracy and detection rate.
Funding: Supported by National Natural Science Foundation of China (Nos. 61525306, 61633021, 61721004, 61806194, U1803261 and 61976132); Major Project for New Generation of AI (No. 2018AAA0100400); Beijing Nova Program (No. Z201100006820079); Shandong Provincial Key Research and Development Program (No. 2019JZZY010119); CAS-AIR.
Abstract: Pointwise convolution is usually utilized to expand or squeeze features in modern lightweight deep models. However, it takes up most of the overall computational cost (usually more than 90%). This paper proposes a novel Poker module to expand features by taking advantage of cheap depthwise convolution. As a result, the Poker module can greatly reduce the computational cost while generating a large number of effective features to guarantee performance. The proposed module is standardized and can be employed wherever feature expansion is needed. By varying the stride and the number of channels, different kinds of bottlenecks are designed to plug the proposed Poker module into the network. Thus, a lightweight model can be easily assembled. Experiments conducted on benchmarks reveal the effectiveness of the proposed Poker module. The PokerNet models reduce the computational cost by 7.1%-15.6% and achieve comparable or even higher recognition accuracy than previous state-of-the-art (SOTA) models on the ImageNet ILSVRC2012 classification dataset. Code is available at https://github.com/diaomin/pokernet.
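The claim that pointwise convolution dominates the cost can be checked for a single depthwise separable block with a back-of-the-envelope count. The sketch assumes stride 1 and equal spatial size for both stages, so the depthwise stage costs H*W*C_in*k*k multiply-accumulates and the pointwise stage costs H*W*C_in*C_out. The function name and layer sizes are illustrative; the "more than 90%" figure in the abstract refers to whole models, not to this single-block estimate.

```python
def pointwise_share(k, c_out):
    """Fraction of a depthwise separable block's MACs spent in the 1 x 1 conv.

    With stride 1, depthwise MACs = H*W*C_in*k*k and pointwise MACs =
    H*W*C_in*C_out, so H, W and C_in cancel out of the ratio.
    """
    return c_out / (k * k + c_out)


# Illustrative block: 3 x 3 depthwise kernel, 128 output channels.
share = pointwise_share(3, 128)
```

With a 3 x 3 depthwise kernel the pointwise share passes 90% once the block has more than 81 output channels, which is consistent with the motivation for replacing pointwise expansion with cheaper operations.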
Funding: The National Natural Science Foundation of China under Grant Nos. 62036010 and 62072340; the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. LZ21F020001 and LSZ19F020001; the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University under Grant No. A2220.
Abstract: Channel pruning can reduce memory consumption and running time with minimal performance loss, and is one of the most important techniques in network compression. However, existing channel pruning methods mainly focus on pruning standard convolutional networks, and they rely heavily on time-consuming fine-tuning to recover performance. To this end, we present a novel, efficient probability-based channel pruning method for depthwise separable convolutional networks. Our method uses a new, simple yet effective probability-based channel pruning criterion that takes the scaling and shifting factors of batch normalization layers into consideration. A novel shifting factor fusion technique is further developed to improve the performance of the pruned networks without requiring extra time-consuming fine-tuning. We apply the proposed method to five representative deep learning networks, namely MobileNetV1, MobileNetV2, ShuffleNetV1, ShuffleNetV2, and GhostNet, to demonstrate the efficiency of our pruning method. Extensive experimental results and comparisons on the publicly available CIFAR10, CIFAR100, and ImageNet datasets validate the feasibility of the proposed method.
Abstract: Image classification using a Convolutional Neural Network (CNN) achieves optimal performance with a particular strategy. MobileNet reduces the number of parameters for learning features by switching from the standard convolution paradigm to the depthwise separable convolution (DSC) paradigm. However, there are not enough features to learn for identifying the freshness of fish eyes. Furthermore, minor variances in features should not require a complicated CNN architecture. In this paper, our first contribution is a DSC Bottleneck with Expansion for learning features of the freshness of fish eyes with a Bottleneck Multiplier. The second contribution is a Residual Transition that bridges current feature maps and skip-connection feature maps to the next convolution block. The third contribution is MobileNetV1 Bottleneck with Expansion (MB-BE) for classifying the freshness of fish eyes. The results obtained on the Freshness of the Fish Eyes dataset show that MB-BE outperformed other models, such as the original MobileNet, VGG16, Densenet, and Nasnet Mobile, with 63.21% accuracy.
Funding: Supported by Qingdao People’s Livelihood Science and Technology Plan (Grant 19-6-1-88-nsh).
Abstract: Image semantic segmentation has become an essential part of autonomous driving. To further improve the generalization ability and robustness of semantic segmentation algorithms, a lightweight network based on the Squeeze-and-Excitation Attention Mechanism (SE) and Depthwise Separable Convolution (DSC) is designed. Meanwhile, Adam-GC, an Adam optimization algorithm based on Gradient Compression (GC), is proposed to improve the training speed, segmentation accuracy, generalization ability and stability of the network. To verify the effectiveness of the proposed network, the trained model is evaluated and compared on the Cityscapes semantic segmentation dataset. The validation and comparison results show that the network achieves 78.02% MIoU on the Cityscapes validation set, which is better than the baseline network and other recent semantic segmentation networks. Besides meeting the stability and accuracy requirements, the work is of particular significance for the development of image semantic segmentation.
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2019YFB2205102) and the National Natural Science Foundation of China (Grant Nos. 61974164, 62074166, 61804181, 62004219, 62004220, and 62104256).
Abstract: Memristor-based neuromorphic computing shows great potential for high-speed and high-throughput signal processing applications, such as electroencephalogram (EEG) signal processing. Nonetheless, the size of one-transistor one-resistor (1T1R) memristor arrays is limited by the non-ideality of the devices, which prevents the hardware implementation of large and complex networks. In this work, we propose the depthwise separable convolution and bidirectional gate recurrent unit (DSC-BiGRU) network, a lightweight and highly robust hybrid neural network based on 1T1R arrays that enables efficient processing of EEG signals in the temporal, frequency and spatial domains by hybridizing DSC and BiGRU blocks. The network size is reduced and the network robustness is improved while the classification accuracy is maintained. In the simulation, the measured non-idealities of the 1T1R array are introduced into the network through statistical analysis. Compared with traditional convolutional networks, the network parameters are reduced by 95% and the classification accuracy is improved by 21% at a 95% array yield rate and 5% tolerable error. This work demonstrates that lightweight and highly robust networks based on memristor arrays hold great promise for applications that rely on low consumption and high efficiency.
Funding: Supported by the Guangxi Key R&D Project (Gui Ke AB21076021) and the Project of Humanities and Social Sciences of the “Cultivation Plan for Thousands of Young and Middle-aged Backbone Teachers in Guangxi Colleges and Universities” in 2021: Research on Collaborative Integration of Logistics Service Supply Chain under High-quality Development Goals (2021QGRW044).
Abstract: In deep learning approaches to identifying plant diseases, the high complexity of the network model, the large number of parameters, and the heavy computational effort make it challenging to deploy the model on terminal devices with limited computational resources. In this study, a lightweight method for plant disease identification based on an improved ShuffleNetV2 model is proposed. In the proposed model, the depthwise convolution in the basic module of ShuffleNetV2 is replaced with mixed depthwise convolution to capture crop pest images with different resolutions; the efficient channel attention module is added to the ShuffleNetV2 network structure to enhance the channel features; and the ReLU activation function is replaced with the ReLU6 activation function to prevent the generation of large gradients. Experiments are conducted on the public dataset PlantVillage. The results show that the proposed model achieves an accuracy of 99.43%, an improvement of 0.6 percentage points over the ShuffleNetV2 model. Compared to lightweight network models, such as MobileNetV2, MobileNetV3, EfficientNet, and EfficientNetV2, and classical convolutional neural network models, such as ResNet34, ResNet50, and ResNet101, the proposed model has fewer parameters and higher recognition accuracy, which provides guidance for deploying crop pest identification methods on resource-constrained devices, including mobile terminals.
Funding: Supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 19JKB520031).
Abstract: Infrared target detection models increasingly need to be deployed on embedded platforms, which requires models with less memory consumption and better real-time performance without sacrificing accuracy. To address these challenges, we propose a modified You Only Look Once (YOLO) algorithm, PF-YOLOv4-Tiny. The algorithm incorporates spatial pyramidal pooling (SPP) and squeeze-and-excitation (SE) visual attention modules to enhance the target localization capability. PANet-based feature pyramid networks (P-FPN) are proposed to transfer semantic information and location information simultaneously to improve detection accuracy. To lighten the network, the standard convolutions outside the backbone network are replaced with depthwise separable convolutions. In post-processing, the soft non-maximum suppression (soft-NMS) algorithm is employed to reduce the missed and false detections caused by occlusion between targets. The accuracy of our model reaches 61.75%, while the total number of parameters is only 9.3 M and the computational cost is 11 GFLOPs. At the same time, the inference speed reaches 87 FPS on an NVIDIA GeForce GTX 1650 Ti, which meets the requirements of infrared target detection for embedded deployment.
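Soft-NMS, mentioned above as the post-processing step, lowers the scores of boxes that overlap a higher-scoring box instead of discarding them outright, which is why it helps with occluded targets. The sketch below uses the Gaussian decay variant. The `sigma` and `score_threshold` values and the example boxes are illustrative assumptions, not settings reported for PF-YOLOv4-Tiny.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: returns (box, score) pairs in selection order."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the remaining box with the highest current score.
        best_index = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_index)
        kept.append((best_box, best_score))
        # Decay the scores of the others according to their overlap with it.
        rescored = []
        for box, score in candidates:
            decayed = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if decayed >= score_threshold:
                rescored.append((box, decayed))
        candidates = rescored
    return kept


# Two heavily overlapping boxes and one separate box (hypothetical detections).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this example the overlapping second box is kept with a reduced score rather than removed, so a later confidence cut-off decides whether it survives.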
Funding: National Natural Science Foundation of China, Grant Number 42001256; key science and technology projects of the Science and Technology Department of Jilin Province, Grant Number 20180201014NY; science and technology project of the Education Department of Jilin Province, Grant Number JJKH20190927KJ; innovation fund project of the Jilin Provincial Development and Reform Commission, Grant Number 2019C054.
Abstract: Intelligent straw coverage detection plays an important role in agricultural production and the ecological environment. Traditional pattern recognition suffers from low precision and long processing time when segmenting complex farmland, and therefore cannot meet the conditions for deployment on embedded equipment. To address these problems, we propose a novel deep learning model with high accuracy, small model size and fast running speed, named Residual Unet with Attention mechanism using depthwise convolution (RADw–UNet). The algorithm is based on the symmetric UNet encoder-decoder model. All the feature extraction modules of the network adopt the residual structure, and the whole network uses a downsampling rate of only 8 to reduce redundant parameters. To better extract the semantic information of the spatial and channel dimensions, a depthwise convolutional residual block is designed for use in feature maps with larger depths, reducing the number of parameters while improving the model accuracy. Meanwhile, a multi-level attention mechanism is introduced in the skip connection to effectively integrate the information of the low-level and high-level feature maps. The experimental results show that the segmentation performance of RADw–UNet exceeds that of traditional methods and the UNet algorithm. The algorithm achieved an mIoU of 94.9%, the number of trainable parameters was only approximately 0.26 M, and the running time for a single picture was less than 0.03 s.
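The mIoU figure reported above (and in the Cityscapes abstract earlier in this listing) is the per-class intersection over union averaged across classes. A minimal sketch of the usual computation from a confusion matrix follows; the two-class matrix is made up for illustration, and skipping classes that never appear is an assumption about the convention.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as
    class j. Classes absent from both ground truth and predictions are
    skipped (an assumed convention).
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical two-class result (for example, straw vs. soil), in pixels.
confusion = [[50, 10],
             [5, 35]]
score = mean_iou(confusion)
```

Here the per-class IoUs are 50/65 and 35/50, so the mean is about 0.735.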
Funding: Key Science and Technology Program of Henan Province, China (212102310084); J. Sun, X. Li, and C. Tang received the grant. Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS2048); J. Sun received the grant.
Abstract: Purpose: As of January 11, 2021, coronavirus disease (COVID-19) had caused more than 2 million deaths worldwide. The main diagnostic methods for COVID-19 are: (i) nucleic acid testing, which places high requirements on the sample testing environment and exposes staff to an increased risk of infection when collecting samples; (ii) chest computed tomography, which is costly and involves some radiation during the scan; (iii) chest X-ray imaging, which has the advantages of fast imaging and higher spatial recognition than chest computed tomography. Therefore, our team chose chest X-ray images as the experimental dataset in this paper. Methods: We proposed a novel framework, BEVGG, and three methods (BEVGGC-I, BEVGGC-II, and BEVGGC-III) to diagnose COVID-19 via chest X-ray images. In addition, we used biogeography-based optimization to optimize the hyperparameter values of the convolutional neural network. Results: The experimental results show that the OA of the three proposed methods is 97.65%±0.65%, 94.49%±0.22% and 94.81%±0.52%, respectively. BEVGGC-I has the best performance of all methods. Conclusions: The OA of BEVGGC-I is 9.59%±1.04% higher than that of state-of-the-art methods.
Abstract: According to recent research statistics, approximately 30% of people who experienced falls are over the age of 65. Therefore, detecting falls in time and taking appropriate measures when they occur is a meaningful research topic. In this paper, a fall detection model based on an improved human posture estimation algorithm is proposed. The improved human posture estimation algorithm is implemented on the basis of Openpose, using a strategy that combines depthwise separable convolution with the HDC structure. Depthwise separable convolution replaces the original convolutional structure, which makes the network lightweight and reduces redundant layers. At the same time, the HDC structure is introduced to ensure that image features are not lost and that human joint points are detected accurately. Experiments show that the improved algorithm with the HDC structure has higher accuracy in joint point detection. Human posture estimation is then applied to fall detection, and fall events are modeled through fall feature extraction. The designed convolutional neural network model is used to classify and distinguish falls. The experimental results show that our method achieves 98.53%, 97.71% and 97.20% accuracy on three public fall detection datasets. Compared with the results of other methods on the same datasets, the model designed in this paper improves system accuracy. The sensitivity is also improved, which reduces the probability of detection errors. In addition, this paper verifies the real-time performance of the model: even on low-end hardware, it maintains a reasonable detection speed without excessive delay.
Funding: This work is supported by the National Natural Science Foundation of China (61806013, 61876010, 61906005, 62166002); General Project of Science and Technology Plan of Beijing Municipal Education Commission (KM202110005028); Project of Interdisciplinary Research Institute of Beijing University of Technology (2021020101); International Research Cooperation Seed Fund of Beijing University of Technology (2021A01).
Abstract: Deep kernel mapping support vector machines have achieved good results in numerous tasks by mapping features from a low-dimensional space to a high-dimensional space and then using support vector machines for classification. However, the deep kernel mapping support vector machine does not take into account the connections between spaces of different dimensions, and it increases the number of model parameters. To further improve the recognition capability of deep kernel mapping support vector machines while reducing the number of model parameters, this paper proposes a framework of Lightweight Deep Convolutional Cross-Connected Kernel Mapping Support Vector Machines (LC-CKMSVM). The framework consists of a feature extraction module and a classification module. The feature extraction module first maps the data from low-dimensional to high-dimensional space, fusing the representations of different dimensional spaces through cross-connections; it then uses depthwise separable convolution to replace part of the original convolution to reduce the number of parameters in the module. The classification module uses a soft-margin support vector machine for classification. The results on six different visual datasets show that LC-CKMSVM obtains better classification accuracy than the other five models in most cases.
Funding: The National Natural Science Foundation of China (32171909, 51705365); Guangdong Basic and Applied Basic Research Foundation (2020B1515120050, 2019A1515110304); National Natural Science Foundation of Guangdong (2023A1515011255); Yunfu Science and Technology Plan Project (2021A090103); Key Fields of Universities in Guangdong Province (2022ZDZX309).
Abstract: In unstructured environments, dense grape growth and occlusion make recognition difficult, which seriously affects the performance of grape picking robots. To address these problems, this study improves the YOLOX-Tiny model and proposes a new grape detection model, YOLOX-RA, which can quickly and accurately identify densely growing and occluded grape bunches. The proposed YOLOX-RA model uses a 3×3 convolutional layer with a stride of 2 to replace the focal layer to reduce the computational burden. The CBS layer in the ResBlock_Body module of the second, third, and fourth layers of the backbone is removed, and the CSPLayer module is replaced by the ResBlock-M module to speed up detection. An auxiliary network (AlNet) with the remaining network blocks is added after the ResBlock-M module to improve the detection accuracy. Two depth-separable convolutions (DsC) are used in the neck module layer to replace the normal convolution to reduce the computational cost. We evaluated the detection performance of SSD, YOLOv4 SSD, YOLOv4-Tiny, YOLO-Grape, YOLOv5-X, YOLOX-Tiny, and YOLOX-RA on a grape test set. The results show that the YOLOX-RA model has the best detection performance, achieving 88.75% mAP, a recognition speed of 84.88 FPS, and a model size of 17.53 MB. It can accurately detect densely grown and shaded grape bunches, which can effectively improve the performance of the grape picking robot.
Funding: Supported by the National Natural Science Foundation of China [Grant number 62206312].
Abstract: At CRYPTO 2019, Gohr opened up a new direction for cryptanalysis. He successfully applied deep learning to differential cryptanalysis against the NSA block cipher SPECK32/64, achieving higher accuracy than traditional differential distinguishers. Until now, one of the mainstream research directions has been to increase the training sample size and use different neural networks to improve the accuracy of neural distinguishers. This mindset may lead to a huge number of parameters, a heavy computing load, and a large amount of memory in the distinguisher training process. However, in the practical application of cryptanalysis, the applicability of an attack method in a resource-constrained environment is very important. Therefore, we focus on cost optimization and aim to reduce network parameters for differential neural cryptanalysis. In this paper, we propose two cost-optimized neural distinguisher improvement methods, from the aspects of data format and network structure, respectively. First, we obtain a partial output difference neural distinguisher using only a 4-bit training data format, which is constructed with a new advantage bits search algorithm based on two key improvement conditions. In addition, we perform an interpretability analysis of the new neural distinguishers, whose results are mainly reflected in the relationship between the neural distinguishers, truncated differentials, and advantage bits. Second, we replace the traditional convolution with the depthwise separable convolution to reduce the training cost while affecting the accuracy as little as possible. Overall, the number of training parameters can be reduced by less than 50% by using our new network structure for training neural distinguishers. Finally, we apply the network structure to the partial output difference neural distinguishers. The combined approach leads to a further reduction in the number of parameters (approximately 30% of Gohr's distinguishers for SPECK).
Funding: The authors acknowledge the National Natural Science Foundation of China (Grant Nos. 61772319, 62002200, 62202268 and 62272281); Shandong Natural Science Foundation of China (Grant Nos. ZR2021QF134 and ZR2021MF107); Yantai Science and Technology Innovation Development Plan (2022JCYJ031).
Abstract: With the remarkable success of change detection (CD) in remote sensing images in the context of deep learning, many convolutional neural network (CNN) based methods have been proposed. In current research, to obtain better context modeling for remote sensing images and to capture more spatiotemporal characteristics, several attention-based methods and transformer (TR)-based methods have been proposed. Recent research has continued to innovate on TR-based methods, and many new methods have appeared. Most of them require a huge amount of computation to achieve good results. Therefore, using a TR-based method while keeping the overhead low is a problem to be solved. Here, we propose a GNN-based multi-scale transformer siamese network for remote sensing image change detection (GMTS) that maintains a low network overhead while effectively modeling context in the spatiotemporal domain. We also design a novel hybrid backbone to extract features. Compared with the current CNN backbone, our backbone network has a lower overhead and achieves better results. Further, we use high/low frequency (HiLo) attention to extract more detailed local features and the multi-scale pooling pyramid transformer (MPPT) module to focus on more global features. Finally, we leverage the context modeling capabilities of TR in the spatiotemporal domain to optimize the extracted features. Our network has relatively few parameters compared with current TR-based methods and achieves a clear improvement in results, which provides a good balance between efficiency and performance.