The complexity of fire and smoke in terms of shape, texture, and color presents significant challenges for accurate fire and smoke detection. To address this, a YOLOv8-based detection algorithm integrated with the Convolutional Block Attention Module (CBAM) has been developed. This algorithm initially employs the latest YOLOv8 for object recognition. Subsequently, the integration of CBAM enhances its feature extraction capabilities. Finally, the WIoU function is used to optimize the network’s bounding box loss, facilitating rapid convergence. Experimental validation using a smoke and fire dataset demonstrated that the proposed algorithm achieved a 2.3% increase in smoke and fire detection accuracy, surpassing other state-of-the-art methods.
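The bounding box loss mentioned in the abstract above belongs to the family of losses built on the intersection-over-union (IoU) overlap between a predicted box and a ground-truth box. The following is a minimal sketch of that plain IoU quantity and the basic `1 - IoU` loss only; it does not reproduce the WIoU weighting, and the function names and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed format, not taken from the paper).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero
    # when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Plain IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred_box, target_box)
```

Variants such as WIoU add further terms or weights on top of this overlap measure; those details would have to come from the paper itself.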
The power grid operation process is complex, and much of the operation process data involves national security, business secrets, and user privacy. Meanwhile, labeled datasets may exist on many different operation platforms, but they cannot be shared directly because power grid data is highly privacy-sensitive. How to use these multi-source heterogeneous data as fully as possible to build a power grid knowledge map while protecting privacy has become an urgent problem in developing the smart grid. Therefore, this paper proposes a federated learning named entity recognition method for the power grid field, aiming to build a named entity recognition model that covers the entire power grid process and is trained on data with different security requirements. We decompose the named entity recognition (NER) model FLAT (Chinese NER Using Flat-Lattice Transformer) in each platform into a global part and a local part. The local part captures the characteristics of the local data in each platform and is updated using locally labeled data. The global part is learned across different operation platforms to capture the shared NER knowledge. Its local gradients from different platforms are aggregated to update the global model, which is then delivered to each platform to update its global part. Experiments on two publicly available Chinese datasets and one power grid dataset validate the effectiveness of our method.
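The aggregation step described in the abstract above, where gradients of the shared global part are collected from several platforms and used to update one global model, can be sketched as follows. The abstract does not say how the gradients are combined, so the unweighted mean, the plain gradient-descent update, and the names `aggregate_gradients` and `sgd_step` are all assumptions made for illustration.

```python
def aggregate_gradients(platform_grads):
    """Combine per-platform gradients of the global part.

    platform_grads: list of equal-length lists of floats, one list
    per platform. Returns their element-wise unweighted mean
    (the averaging rule is an assumption, not from the paper).
    """
    if not platform_grads:
        raise ValueError("need gradients from at least one platform")
    length = len(platform_grads[0])
    if any(len(g) != length for g in platform_grads):
        raise ValueError("all gradient vectors must have the same length")
    n = len(platform_grads)
    return [sum(g[i] for g in platform_grads) / n for i in range(length)]


def sgd_step(global_params, grad, lr=0.1):
    """One gradient-descent update of the global parameters."""
    return [p - lr * g for p, g in zip(global_params, grad)]


# Each platform computes gradients on its own labeled data; only the
# gradients of the global part leave the platform, never the raw text.
grads = [[1.0, 2.0], [3.0, 4.0]]
new_global = sgd_step([1.0, 1.0], aggregate_gradients(grads), lr=0.5)
```

The updated `new_global` values would then be sent back to every platform, while each platform's local part stays private and is trained only on local data.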
Pulse rate is one of the important characteristics of traditional Chinese medicine pulse diagnosis, and it is of great significance for determining the nature of cold and heat in diseases. The prediction of pulse rate based on facial video is an exciting research field for obtaining palpation information through observation diagnosis. However, most studies focus on optimizing the algorithm based on a small sample of participants without systematically investigating multiple influencing factors. A total of 209 participants and 2,435 facial videos, drawn from our self-constructed Multi-Scene Sign Dataset and public datasets, were used to perform a multi-level and multi-factor comprehensive comparison. The effects of different datasets, blood volume pulse signal extraction algorithms, regions of interest, time windows, color spaces, pulse rate calculation methods, and video recording scenes were analyzed. Furthermore, we proposed a blood volume pulse signal quality optimization strategy based on the inverse Fourier transform and an improvement strategy for pulse rate estimation based on signal-to-noise ratio threshold sliding. We found that video-based pulse rate estimation performed better on the Multi-Scene Sign Dataset and the Pulse Rate Detection Dataset than on other datasets. Compared with the Fast independent component analysis and Single Channel algorithms, the chrominance-based and plane-orthogonal-to-skin algorithms have stronger anti-interference ability and higher robustness. The five-organs fusion area and the full-face area performed better than single sub-regions, and fewer motion artifacts and better lighting improve the precision of pulse rate estimation.
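One common way to turn a blood volume pulse (BVP) signal into a pulse rate is to pick the dominant frequency of its spectrum within a plausible heart-rate band. The sketch below illustrates that generic idea with a direct discrete Fourier transform. It is not the paper's calculation method; the 0.7 to 4.0 Hz band and the name `pulse_rate_bpm` are assumptions made for illustration.

```python
import cmath
import math


def pulse_rate_bpm(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Estimate pulse rate (beats per minute) from a BVP signal.

    signal: list of samples; fs: sampling rate in Hz.
    Only frequencies in [lo_hz, hi_hz] are considered (assumed band).
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset

    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n  # frequency of DFT bin k
        if not (lo_hz <= freq <= hi_hz):
            continue
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power

    if best_freq is None:
        raise ValueError("signal too short to resolve the heart-rate band")
    return best_freq * 60.0
```

The frequency resolution is `fs / n`, so a longer time window gives a finer pulse rate estimate at the cost of slower response, which is one reason the choice of time window matters in comparisons like the one above.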
The process of entrainment-mixing between cumulus clouds and the ambient air is important for the development of cumulus clouds. Accurately obtaining the entrainment rate (λ) is particularly important for its parameterization within the overall cumulus parameterization scheme. In this study, an improved bulk-plume method is proposed that solves the equations of two conserved variables simultaneously to calculate λ of cumulus clouds in a large-eddy simulation. The results demonstrate that the improved bulk-plume method is more reliable than the traditional bulk-plume method, because λ, as calculated from the improved method, falls within the range of λ values obtained from the traditional method using different conserved variables. The probability density functions of λ for all data, different times, and different heights can be well fitted by a log-normal distribution, which supports the stochastic entrainment process assumed in previous studies. Further analysis demonstrates that the relationship between λ and the vertical velocity is better than that with other thermodynamic/dynamical properties; thus, the vertical velocity is recommended as the primary influencing factor for the parameterization of λ in the future. The results of this study enhance the theoretical understanding of λ and its influencing factors and shed new light on the development of λ parameterization.
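Fitting a log-normal distribution to a set of positive values, as the abstract above reports for λ, amounts to fitting a normal distribution to their logarithms. The sketch below shows the standard maximum-likelihood estimates; the function names are hypothetical and the sample values are illustrative only, not data from the study.

```python
import math
import statistics


def fit_lognormal(samples):
    """Maximum-likelihood log-normal fit.

    Returns (mu, sigma): the mean and population standard deviation
    of log(x). All samples must be strictly positive.
    """
    logs = [math.log(x) for x in samples]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


# Illustrative positive values standing in for entrainment rates.
example = [math.exp(v) for v in (-1.0, 0.0, 1.0)]
mu, sigma = fit_lognormal(example)
```

Comparing `lognormal_pdf` evaluated with the fitted `mu` and `sigma` against a histogram of the data is a simple visual check of how well the log-normal form holds.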
Gluten, known as the major allergen in wheat, has raised increasing concern in industrialized countries, resulting in an urgent need for accurate, highly sensitive, on-site detection of wheat gluten in complex food systems. Herein, we propose a silver nanoparticles (AgNPs)/metal-organic framework (MOF) substrate-based surface-enhanced Raman scattering (SERS) sensor for the highly sensitive on-site detection of wheat gluten. The detection occurs on a newly in-situ synthesized AgNPs/MOF-modified SERS substrate, which provides an enhancement factor (EF) of 1.89×10^5. Benefiting from the signal amplification function of AgNPs/MOF and the advantages of SERS, this sensor exhibits high sensitivity and a wide detection range from 1×10^-15 mol/L to 2×10^-6 mol/L with a detection limit of 1.16×10^-16 mol/L, which allows trace amounts of wheat gluten to be monitored in complex food systems without matrix interference. This reliable sandwich SERS sensor may provide a promising platform for highly sensitive, accurate, on-site detection of allergens in the field of food safety.
This work constructed a machine learning (ML) model to predict the atmospheric corrosion rate of low-alloy steels (LAS). The material properties of LAS, environmental factors, and exposure time were used as the input, and the corrosion rate as the output. Six different ML algorithms were used to construct the proposed model. Through optimization and filtering, the eXtreme gradient boosting (XGBoost) model exhibited good corrosion rate prediction accuracy. The features of material properties were then transformed into atomic and physical features using the proposed property transformation approach, and the dominant descriptors that affected the corrosion rate were filtered using the recursive feature elimination (RFE) and XGBoost methods. The established ML models exhibited better prediction performance and generalization ability with the property transformation descriptors. In addition, the SHapley additive exPlanations (SHAP) method was applied to analyze the relationship between the descriptors and the corrosion rate. The results showed that the property transformation model could effectively help with analyzing the corrosion behavior, thereby significantly improving the generalization ability of corrosion rate prediction models.
Rockbursts are often encountered in tunnel construction due to complex geological conditions. To study the influence of unloading rate on rockburst, gneiss rockburst experiments were conducted under three groups of unloading rates. A high-speed photography system and an acoustic emission (AE) system were used to monitor the entire rockburst process in real time. The results show that the intensity of gneiss rockburst decreases as the unloading rate decreases, which is manifested as a reduction in AE energy and fragment ejection velocity. Two mechanisms are proposed to explain this effect: (i) The reduction of unloading rate changes the crack propagation mechanism during rockburst. The rockburst changes from a tensile failure mechanism at a high unloading rate to a tension-shear mixed failure mechanism at a low unloading rate, and more energy is released in the form of shear crack propagation. As a result, less strain energy is converted into the kinetic energy of fragment ejection. (ii) A lower unloading rate produces a lower degree of plate cracking in the gneiss, which disrupts the rockburst incubation process. The engineering implications of reducing the unloading rate are also described quantitatively. The rockburst magnitude, judged by the ejection velocity, is reduced from medium at an unloading rate of 0.1 MPa/s to slight at an unloading rate of 0.025 MPa/s.
The mutation rate is a pivotal biological characteristic, intricately governed by natural selection and historically garnering considerable attention. Recent advances in high-throughput sequencing and analytical methodologies have profoundly transformed our understanding in this domain, ushering in an unprecedented era of mutation rate research. This paper aims to provide a comprehensive overview of the key concepts and methodologies frequently employed in the study of mutation rates. It examines various types of mutations, explores the evolutionary dynamics and associated theories, and synthesizes both classical and contemporary hypotheses. Furthermore, this review comprehensively explores recent advances in understanding germline and somatic mutations in animals and offers an overview of experimental methodologies, mutational patterns, molecular mechanisms, and driving forces influencing variations in mutation rates across species and tissues. Finally, it proposes several potential research directions and pressing questions for future investigations.
In the assessment of car insurance claims, the claim rate presents a highly skewed probability distribution, which is typically modeled using the Tweedie distribution. The traditional approach to obtaining the Tweedie regression model involves training on a centralized dataset. When the data is provided by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting in data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to perform intersection operations between the two parties holding the data. After determining which entities are shared, the participants train the model locally using the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption algorithms are introduced to exchange and update the intermediate model parameters so that the parties collaboratively complete the joint training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of the Tweedie regression model learned from centralized data, and outperform the Tweedie regression model learned independently by a single party.
This work aimed to construct an epidemic model with fuzzy parameters. Since the classical epidemic model does not elaborate on the successful interaction of susceptible and infective people, the constructed fuzzy epidemic model treats the interactions between infective and susceptible people in more detail. The next-generation matrix approach is employed to find the reproduction number of a deterministic model. The sensitivity analysis and local stability analysis of the system are also provided. For solving the fuzzy epidemic model, a numerical scheme is constructed which consists of three time levels. The numerical scheme has an advantage over the existing forward Euler scheme in determining the conditions for obtaining a positive solution. The established scheme also has an advantage over existing non-standard finite difference methods in terms of order of accuracy. The stability of the scheme for the considered fuzzy model is also provided. From the plotted results, it can be observed that the susceptible population decays as the interaction parameters increase.
The identification of intercepted radio fuze modulation types is a prerequisite for decision-making in interference systems. However, the electromagnetic environment of modern battlefields is complex, and the signal-to-noise ratio (SNR) of such environments is usually low, which makes it difficult to implement accurate recognition of radio fuzes. To solve the above problem, a radio fuze automatic modulation recognition (AMR) method for low-SNR environments is proposed. First, an adaptive denoising algorithm based on data rearrangement and the two-dimensional (2D) fast Fourier transform (FFT) (DR2D) is used to reduce the noise of the intercepted radio fuze intermediate frequency (IF) signal. Then, the textural features of the denoised IF signal rearranged data matrix are extracted from the statistical indicator vectors of gray-level cooccurrence matrices (GLCMs), and support vector machines (SVMs) are used for classification. The DR2D-based adaptive denoising algorithm achieves an average correlation coefficient of more than 0.76 for ten fuze types under SNRs of -10 dB and above, which is higher than that of other typical algorithms. The trained SVM classification model achieves an average recognition accuracy of more than 96% on seven modulation types and recognition accuracies of more than 94% on each modulation type under SNRs of -12 dB and above, which represents a good AMR performance of radio fuzes under low SNRs.
With the advent of the Industry 5.0 era, Internet of Things (IoT) devices face unprecedented proliferation, requiring higher communication rates and lower transmission delays. Considering its high spectrum efficiency, the promising filter bank multicarrier (FBMC) technique using offset quadrature amplitude modulation (OQAM) has been applied to Beyond 5G (B5G) industry IoT networks. However, due to the broadcasting nature of wireless channels, the FBMC-OQAM industry IoT network is inevitably vulnerable to adversary attacks from malicious IoT nodes. To tackle this challenge, the FBMC-OQAM industry cognitive radio network (ICRNet) is proposed to ensure security at the physical layer. As a pivotal step of ICRNet, blind modulation recognition (BMR) can detect and recognize the modulation type of malicious signals. The previous works need to accomplish the BMR task of FBMC-OQAM signals in ICRNet nodes. A novel FBMC BMR algorithm is proposed with the transform channel convolution network (TCCNet) rather than a complicated two-dimensional convolution. Firstly, this is achieved by designing a low-complexity binary constellation diagram (BCD) gridding matrix as the input of TCCNet. Then, a transform channel convolution strategy is developed to convert the image-like BCD matrix into a series-like data format, accelerating the BMR process while keeping discriminative features. Monte Carlo experimental results demonstrate that the proposed TCCNet obtains a performance gain of 8% and 40% over the traditional in-phase/quadrature (I/Q)-based and constellation diagram (CD)-based methods at a signal-to-noise ratio (SNR) of 12 dB, respectively. Moreover, the proposed TCCNet runs around 29.682 and 2.356 times faster than the existing CD-Alex Network (CD-AlexNet) and I/Q-Convolutional Long Deep Neural Network (I/Q-CLDNN) algorithms, respectively.
It is found that when the parity–time symmetry phenomenon is introduced into the resonant optical gyro system and it works near the exceptional point, the sensitivity can in theory be significantly amplified at low angular rate. However, in fact, the exceptional point is easily disturbed by external environmental variables, which means that it depends on harsh experimental environment and strong control ability, so it is difficult to move towards practical application. Here, we propose a new angular rate sensor structure based on exceptional surface, which has the advantages of high sensitivity and high robustness. The system consists of two fiber-optic ring resonators and two optical loop mirrors, and one of the resonators contains a variable ratio coupler and a variable optical attenuator. We theoretically analyze the system response, and the effects of phase and coupling ratio on the system response. Finally, compared with the conventional resonant gyro, the sensitivity of this exceptional surface angular rate sensor can be improved by about 300 times at low speed. In addition, by changing the loss coefficient in the ring resonator, we can achieve a wide range of 600 rad/s. This scheme provides a new approach for the development of ultra-high sensitivity and wide range angular rate sensors in the future.
Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a significant improvement in the model’s performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), which fuses features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves the fusion enhancement of the two modal features by making the channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP0.5, and mAP0.5:0.95 of 87%, 77.1%, 83.8%, and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net performs better on DMFGRS. Built on the lightweight network YOLO, the model has excellent generalizability and thus has good potential for application in real-life scenarios.
Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd dataset serves as a catalyst for the development of more robust and effective fitness tracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis.
This paper proposes a novel open set recognition method, the Spatial Distribution Feature Extraction Network (SDFEN), to address the problem of electromagnetic signal recognition in an open environment. The spatial distribution feature extraction layer in SDFEN replaces convolutional output neural networks with the spatial distribution features that focus more on inter-sample information by incorporating class center vectors. The designed hybrid loss function considers both intra-class distance and inter-class distance, thereby enhancing the similarity among samples of the same class and increasing the dissimilarity between samples of different classes during training. Consequently, this method allows unknown classes to occupy a larger space in the feature space. This reduces the possibility of overlap with known class samples and makes the boundaries between known and unknown samples more distinct. Additionally, the feature comparator threshold can be used to reject unknown samples. For signal open set recognition, seven methods, including the proposed method, are applied to two kinds of electromagnetic signal data: modulation signal and real-world emitter. The experimental results demonstrate that the proposed method outperforms the other six methods overall in a simulated open environment. Specifically, compared to the state-of-the-art Openmax method, the novel method achieves up to 8.87% and 5.25% higher micro-F-measures, respectively.
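The micro-F-measure reported in the abstract above pools true positives, false positives, and false negatives over classes before computing precision and recall. The sketch below shows that generic micro-averaging; how the paper treats the rejected "unknown" label is not stated in the abstract, so scoring only over a caller-supplied list of known classes is an assumption, as are the function name and the example labels.

```python
def micro_f_measure(y_true, y_pred, classes):
    """Micro-averaged F-measure over the given classes.

    y_true, y_pred: equal-length sequences of labels.
    classes: the labels to score (assumed here to be the known
    classes; any other label only counts as an error source).
    """
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1  # correctly assigned to class c
            elif p == c:
                fp += 1  # assigned to c but belongs elsewhere
            elif t == c:
                fn += 1  # belongs to c but assigned elsewhere
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# An unknown sample wrongly accepted as class "a" lowers precision.
score = micro_f_measure(
    ["a", "a", "b", "unknown"],
    ["a", "b", "b", "a"],
    classes=["a", "b"],
)
```

In this example the pooled counts are 2 true positives, 2 false positives, and 1 false negative, which gives a precision of 1/2, a recall of 2/3, and a micro-F-measure of 4/7.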
Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR. These challenges include expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
Hand gestures have been used as a significant mode of communication since the advent of human civilization. By facilitating human-computer interaction (HCI), hand gesture recognition (HGRoc) technology is crucial for seamless and error-free HCI. HGRoc technology is pivotal in healthcare and communication for the deaf community. Despite significant advancements in computer vision-based gesture recognition for language understanding, two considerable challenges persist in this field: (a) only limited and common gestures are considered, and (b) processing multiple channels of information across a network takes huge computational time during discriminative feature extraction. Therefore, a novel hand vision-based convolutional neural network (CNN) model named HVCNNM is proposed, which offers several benefits, notably enhanced accuracy, robustness to variations, real-time performance, reduced channels, and scalability. Additionally, these models can be optimized for real-time performance, learn from large amounts of data, and are scalable to handle complex recognition tasks for efficient human-computer interaction. The proposed model was evaluated on two challenging datasets, namely the Massey University Dataset (MUD) and the American Sign Language (ASL) Alphabet Dataset (ASLAD). On the MUD and ASLAD datasets, HVCNNM achieved a score of 99.23% and 99.00%, respectively. These results demonstrate the effectiveness of CNN as a promising HGRoc approach. The findings suggest that the proposed model has potential roles in applications such as sign language recognition, human-computer interaction, and robotics.
The task of food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. Addressing these complexities, our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. Furthermore, it introduces a multi-stage local fusion (MSLF) module, fostering long-distance dependencies between feature maps at varying stages. This approach facilitates the assimilation of complementary features across scales, significantly bolstering the model’s capacity for feature extraction. Furthermore, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation of the ETH Food-101, ChineseFoodNet, and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures not only mark an improvement of 1.04%, 3.42%, and 1.36% over the foundational ConvNeXt network but also surpass the performance of most contemporary food image recognition methods. Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition, setting a new benchmark for the field.
Electric power training is essential for ensuring the safety and reliability of the system.In this study,we introduce a novel Abnormal Action Recognition(AAR)system that utilizes a Lightweight Pose Estimation Network(...Electric power training is essential for ensuring the safety and reliability of the system.In this study,we introduce a novel Abnormal Action Recognition(AAR)system that utilizes a Lightweight Pose Estimation Network(LPEN)to efficiently and effectively detect abnormal fall-down and trespass incidents in electric power training scenarios.The LPEN network,comprising three stages—MobileNet,Initial Stage,and Refinement Stage—is employed to swiftly extract image features,detect human key points,and refine them for accurate analysis.Subsequently,a Pose-aware Action Analysis Module(PAAM)captures the positional coordinates of human skeletal points in each frame.Finally,an Abnormal Action Inference Module(AAIM)evaluates whether abnormal fall-down or unauthorized trespass behavior is occurring.For fall-down recognition,three criteria—falling speed,main angles of skeletal points,and the person’s bounding box—are considered.To identify unauthorized trespass,emphasis is placed on the position of the ankles.Extensive experiments validate the effectiveness and efficiency of the proposed system in ensuring the safety and reliability of electric power training.展开更多
Abstract: The complexity of fire and smoke in terms of shape, texture, and color presents significant challenges for accurate fire and smoke detection. To address this, a YOLOv8-based detection algorithm integrated with the Convolutional Block Attention Module (CBAM) has been developed. The algorithm first employs YOLOv8 for object recognition. The integration of CBAM then enhances its feature extraction capabilities. Finally, the WIoU function is used to optimize the network’s bounding box loss, facilitating rapid convergence. Experimental validation on a smoke and fire dataset demonstrated that the proposed algorithm achieved a 2.3% increase in smoke and fire detection accuracy, surpassing other state-of-the-art methods.
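As a point of reference for the bounding box loss mentioned above, the following is a minimal sketch of the plain IoU term that IoU-family losses (WIoU among them) build on. It does not implement the WIoU weighting itself; the function names and the `(x1, y1, x2, y2)` box convention are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Plain IoU loss; WIoU adds a weighting on top of this (not shown)."""
    return 1.0 - iou(pred_box, target_box)
```

A perfectly aligned prediction gives a loss of 0, and a prediction with no overlap gives a loss of 1.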
Funding: This work was supported by the State Grid Science and Technology Research Program (SGSCJY00NYJS2200026).
Abstract: The power grid operation process is complex, and much of the operation process data involves national security, business secrets, and user privacy. Meanwhile, labeled datasets may exist on many different operation platforms, but they cannot be directly shared because power grid data is highly privacy-sensitive. How to use these multi-source heterogeneous data as fully as possible to build a power grid knowledge map while protecting privacy and security has become an urgent problem in developing the smart grid. Therefore, this paper proposes a federated learning-based named entity recognition method for the power grid field, aiming to build a named entity recognition model that covers the entire power grid process and is trained on data with different security requirements. We decompose the named entity recognition (NER) model FLAT (Chinese NER Using Flat-Lattice Transformer) on each platform into a global part and a local part. The local part captures the characteristics of the local data on each platform and is updated using locally labeled data. The global part is learned across different operation platforms to capture the shared NER knowledge. Its local gradients from different platforms are aggregated to update the global model, which is then delivered to each platform to update its global part. Experiments on two publicly available Chinese datasets and one power grid dataset validate the effectiveness of our method.
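The aggregation step described above can be sketched as follows. This is an illustrative sketch only: the abstract says the global-part gradients are "aggregated" without specifying how, so the weighted average used here (FedAvg-style, weighted for example by sample count) is an assumption, as are all function names. Gradients are represented as plain lists of floats.

```python
def aggregate_gradients(platform_grads, weights=None):
    """Weighted average of per-platform gradient vectors for the global part.

    platform_grads: list of equal-length lists of floats, one per platform.
    weights: optional per-platform weights (assumed, e.g. sample counts).
    """
    if weights is None:
        weights = [1.0] * len(platform_grads)
    total = float(sum(weights))
    dim = len(platform_grads[0])
    return [
        sum(w * g[i] for w, g in zip(weights, platform_grads)) / total
        for i in range(dim)
    ]


def sgd_step(params, grad, lr):
    """One gradient-descent update; used for both global and local parts."""
    return [p - lr * g for p, g in zip(params, grad)]


def federated_round(global_params, platform_grads, lr, weights=None):
    """Server side: aggregate the global-part gradients and update the global model.

    The updated global parameters would then be sent back to every platform,
    while each platform keeps updating its local part with its own data.
    """
    return sgd_step(global_params, aggregate_gradients(platform_grads, weights), lr)
```

Only gradients of the global part leave a platform in this sketch; the local part and the raw labeled data stay where they are.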
Funding: Supported by the Key Research Program of the Chinese Academy of Sciences (grant number ZDRW-ZS-2021-1-2).
Abstract: Pulse rate is one of the important characteristics of traditional Chinese medicine pulse diagnosis, and it is of great significance for determining the nature of cold and heat in diseases. The prediction of pulse rate based on facial video is an exciting research field for obtaining palpation information through observation diagnosis. However, most studies focus on optimizing the algorithm on a small sample of participants without systematically investigating multiple influencing factors. A total of 209 participants and 2,435 facial videos, drawn from our self-constructed Multi-Scene Sign Dataset and public datasets, were used to perform a multi-level and multi-factor comprehensive comparison. The effects of different datasets, blood volume pulse signal extraction algorithms, regions of interest, time windows, color spaces, pulse rate calculation methods, and video recording scenes were analyzed. Furthermore, we proposed a blood volume pulse signal quality optimization strategy based on the inverse Fourier transform and an improvement strategy for pulse rate estimation based on signal-to-noise ratio threshold sliding. We found that video-based pulse rate estimation performed better on the Multi-Scene Sign Dataset and the Pulse Rate Detection Dataset than on other datasets. Compared with the Fast independent component analysis and Single Channel algorithms, the chrominance-based method and the plane-orthogonal-to-skin algorithm have stronger anti-interference ability and higher robustness. The five-organs fusion area and the full-face area performed better than single sub-regions, and fewer motion artifacts and better lighting can improve the precision of pulse rate estimation.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42175099, 42027804, 42075073); the Innovative Project of Postgraduates in Jiangsu Province in 2023 (Grant No. KYCX23_1319); the National Natural Science Foundation of China (Grant No. 42205080); the Natural Science Foundation of Sichuan (Grant No. 2023YFS0442); the Research Fund of Civil Aviation Flight University of China (Grant No. J2022-037); and the National Key Scientific and Technological Infrastructure project “Earth System Science Numerical Simulator Facility” (Earth Lab).
Abstract: The process of entrainment-mixing between cumulus clouds and the ambient air is important for the development of cumulus clouds. Accurately obtaining the entrainment rate (λ) is particularly important for its parameterization within the overall cumulus parameterization scheme. In this study, an improved bulk-plume method is proposed that solves the equations of two conserved variables simultaneously to calculate λ of cumulus clouds in a large-eddy simulation. The results demonstrate that the improved bulk-plume method is more reliable than the traditional bulk-plume method, because λ, as calculated from the improved method, falls within the range of λ values obtained from the traditional method using different conserved variables. The probability density functions of λ for all data, different times, and different heights can be well fitted by a log-normal distribution, which supports the stochastic entrainment process assumed in previous studies. Further analysis demonstrates that λ is more strongly related to the vertical velocity than to other thermodynamic/dynamical properties; thus, the vertical velocity is recommended as the primary influencing factor for the parameterization of λ in the future. The results of this study enhance the theoretical understanding of λ and its influencing factors and shed new light on the development of λ parameterization.
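For orientation, the traditional bulk-plume relation and the log-normal density referred to above are commonly written as follows. This is the textbook form for a single conserved variable, not the paper's improved two-variable formulation; the symbols φ_c (in-cloud value), φ_e (environmental value), z (height), μ, and σ are notation assumed here.

```latex
% Traditional bulk-plume estimate of the entrainment rate from one conserved variable \phi
\frac{\partial \phi_c}{\partial z} = -\lambda\,(\phi_c - \phi_e)
\quad\Longrightarrow\quad
\lambda = -\,\frac{1}{\phi_c - \phi_e}\,\frac{\partial \phi_c}{\partial z}

% Log-normal probability density used to fit the distribution of \lambda
f(\lambda) = \frac{1}{\lambda\,\sigma\sqrt{2\pi}}
\exp\!\left(-\frac{(\ln\lambda - \mu)^2}{2\sigma^2}\right), \qquad \lambda > 0
```

Because each conserved variable yields its own λ through the first relation, the traditional method produces a range of values, which is the range the abstract compares the improved method against.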
Funding: Financially supported by the Zhejiang Provincial Natural Science Foundation of China (LY21C200008).
Abstract: Gluten, known as the major allergen in wheat, has raised increasing concern in industrialized countries, resulting in an urgent need for accurate, highly sensitive, on-site detection of wheat gluten in complex food systems. Herein, we propose a silver nanoparticles (AgNPs)/metal-organic framework (MOF) substrate-based surface-enhanced Raman scattering (SERS) sensor for the highly sensitive on-site detection of wheat gluten. The detection occurs on the newly in-situ synthesized AgNPs/MOF-modified SERS substrate, which provides an enhancement factor (EF) of 1.89×10^5. Benefiting from the signal amplification function of AgNPs/MOF and the advantages of SERS, this sensor exhibits high sensitivity and a wide detection range from 1×10^-15 mol/L to 2×10^-6 mol/L, with a detection limit of 1.16×10^-16 mol/L, which allows trace wheat gluten to be monitored in complex food systems without matrix interference. This reliable sandwich SERS sensor may provide a promising platform for highly sensitive, accurate, on-site detection of allergens in the field of food safety.
Funding: The National Key R&D Program of China (No. 2021YFB3701705).
Abstract: This work constructed a machine learning (ML) model to predict the atmospheric corrosion rate of low-alloy steels (LAS). The material properties of LAS, environmental factors, and exposure time were used as the input, and the corrosion rate as the output. Six different ML algorithms were used to construct the proposed model. Through optimization and filtering, the eXtreme gradient boosting (XGBoost) model exhibited good corrosion rate prediction accuracy. The material property features were then transformed into atomic and physical features using the proposed property transformation approach, and the dominant descriptors affecting the corrosion rate were filtered using recursive feature elimination (RFE) together with XGBoost. The established ML models exhibited better prediction performance and generalization ability with the property transformation descriptors. In addition, the SHapley additive exPlanations (SHAP) method was applied to analyze the relationship between the descriptors and the corrosion rate. The results showed that the property transformation model could effectively help analyze corrosion behavior, thereby significantly improving the generalization ability of corrosion rate prediction models.
Funding: Financial support from the National Natural Science Foundation of China (Grant Nos. 41941018 and 52074299) and the Fundamental Research Funds for the Central Universities of China (Grant No. 2023JCCXSB02).
Abstract: Rockbursts are often encountered in tunnel construction due to complex geological conditions. To study the influence of unloading rate on rockburst, gneiss rockburst experiments were conducted under three groups of unloading rates. A high-speed photography system and an acoustic emission (AE) system were used to monitor the entire rockburst process in real time. The results show that the intensity of gneiss rockburst decreases as the unloading rate decreases, which is manifested as a reduction in AE energy and fragment ejection velocity. Two mechanisms are proposed to explain this effect: (i) The reduction of unloading rate changes the crack propagation mechanism during rockburst. The rockburst changes from a tensile failure mechanism at high unloading rate to a tension-shear mixed failure mechanism at low unloading rate, and more energy is released in the form of shear crack propagation. Consequently, less strain energy is converted into kinetic energy of fragment ejection. (ii) A lower degree of plate cracking develops in the gneiss as the unloading rate decreases, which disrupts the rockburst incubation process. The implications of reducing the unloading rate for engineering projects are also described quantitatively. The rockburst magnitude, judged by the ejection velocity, is reduced from medium at an unloading rate of 0.1 MPa/s to slight at an unloading rate of 0.025 MPa/s.
Abstract: The mutation rate is a pivotal biological characteristic, intricately governed by natural selection and historically garnering considerable attention. Recent advances in high-throughput sequencing and analytical methodologies have profoundly transformed our understanding in this domain, ushering in an unprecedented era of mutation rate research. This paper aims to provide a comprehensive overview of the key concepts and methodologies frequently employed in the study of mutation rates. It examines various types of mutations, explores the evolutionary dynamics and associated theories, and synthesizes both classical and contemporary hypotheses. Furthermore, this review comprehensively explores recent advances in understanding germline and somatic mutations in animals and offers an overview of experimental methodologies, mutational patterns, molecular mechanisms, and driving forces influencing variations in mutation rates across species and tissues. Finally, it proposes several potential research directions and pressing questions for future investigations.
Funding: This research was funded by the National Natural Science Foundation of China (No. 62272124); the National Key Research and Development Program of China (No. 2022YFB2701401); the Guizhou Province Science and Technology Plan Project (Grant No. Qiankehe Platform Talent [2020]5017); the Research Project of Guizhou University for Talent Introduction (No. [2020]61); the Cultivation Project of Guizhou University (No. [2019]56); and the Open Fund of Key Laboratory of Advanced Manufacturing Technology, Ministry of Education (GZUAMT2021KF[01]).
Abstract: In the assessment of car insurance claims, the claim rate presents a highly skewed probability distribution, which is typically modeled using the Tweedie distribution. The traditional approach to obtaining a Tweedie regression model involves training on a centralized dataset. When the data is provided by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting in data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to perform intersection operations between the two parties holding the data. After determining which entities are shared, the participants train the model locally on the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption algorithms are introduced to exchange and update the model's intermediate parameters so that the parties collaboratively complete the joint training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of the Tweedie regression model learned from centralized data and outperform the Tweedie regression model learned independently by a single party.
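To make the modeling objective concrete, here is a minimal sketch of the unit Tweedie deviance for a power parameter 1 < p < 2 (the compound Poisson-gamma range commonly used for claim data). It shows only the loss that a Tweedie regression minimizes; the federated training, entity intersection, and homomorphic encryption steps are not modeled, and the function names and the default `p=1.5` are assumptions for illustration.

```python
def tweedie_deviance(y, mu, p=1.5):
    """Unit Tweedie deviance for an observation y >= 0 and a mean mu > 0, 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if y < 0 or mu <= 0:
        raise ValueError("requires y >= 0 and mu > 0")
    term_y = y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
    term_cross = y * mu ** (1.0 - p) / (1.0 - p)
    term_mu = mu ** (2.0 - p) / (2.0 - p)
    return 2.0 * (term_y - term_cross + term_mu)


def mean_tweedie_deviance(ys, mus, p=1.5):
    """Average deviance over a dataset; lower means a better fit."""
    return sum(tweedie_deviance(y, m, p) for y, m in zip(ys, mus)) / len(ys)
```

The deviance is zero when the prediction equals the observation and stays finite for zero claims, which is why this family suits data with many exact zeros and a skewed positive tail.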
Funding: The authors acknowledge the support of Prince Sultan University for paying the article processing charges (APC) of this publication.
Abstract: This work aimed to construct an epidemic model with fuzzy parameters. Since the classical epidemic model does not elaborate on the successful interaction of susceptible and infective people, the constructed fuzzy epidemic model discusses more detailed versions of the interactions between infective and susceptible people. The next-generation matrix approach is employed to find the reproduction number of a deterministic model. The sensitivity analysis and local stability analysis of the system are also provided. For solving the fuzzy epidemic model, a numerical scheme consisting of three time levels is constructed. The numerical scheme has an advantage over the existing forward Euler scheme in determining the conditions for obtaining a positive solution. The established scheme also has an advantage over existing non-standard finite difference methods in terms of order of accuracy. The stability of the scheme for the considered fuzzy model is also provided. From the plotted results, it can be observed that the number of susceptible people decays as the interaction parameters rise.
Funding: National Natural Science Foundation of China under Grant No. 61973037; China Postdoctoral Science Foundation (2022M720419), which provided funds for conducting experiments.
Abstract: The identification of intercepted radio fuze modulation types is a prerequisite for decision-making in interference systems. However, the electromagnetic environment of modern battlefields is complex, and the signal-to-noise ratio (SNR) of such environments is usually low, which makes it difficult to recognize radio fuzes accurately. To solve this problem, a radio fuze automatic modulation recognition (AMR) method for low-SNR environments is proposed. First, an adaptive denoising algorithm based on data rearrangement and the two-dimensional (2D) fast Fourier transform (FFT) (DR2D) is used to reduce the noise of the intercepted radio fuze intermediate frequency (IF) signal. Then, the textural features of the rearranged data matrix of the denoised IF signal are extracted from the statistical indicator vectors of gray-level co-occurrence matrices (GLCMs), and support vector machines (SVMs) are used for classification. The DR2D-based adaptive denoising algorithm achieves an average correlation coefficient of more than 0.76 for ten fuze types at SNRs of -10 dB and above, which is higher than that of other typical algorithms. The trained SVM classification model achieves an average recognition accuracy of more than 96% on seven modulation types and recognition accuracies of more than 94% on each modulation type at SNRs of -12 dB and above, which represents good AMR performance for radio fuzes at low SNRs.
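The GLCM feature step can be illustrated with a small pure-Python sketch. It computes a normalized gray-level co-occurrence matrix for one pixel offset and two standard texture statistics (contrast and energy). The abstract does not say which statistics or offsets the authors use, so those choices, the quantization into `levels` gray levels, and the function names are assumptions; the DR2D denoising and the SVM classifier are not shown.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of a 2D list of integer gray levels.

    Entry [i][j] is the fraction of pixel pairs (p, q) with q offset from p
    by (dx, dy), where p has gray level i and q has gray level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights co-occurrences by squared gray-level difference."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of squared entries; high for uniform textures."""
    return sum(v * v for row in p for v in row)
```

A feature vector for the classifier would typically concatenate such statistics over several offsets, for example horizontal, vertical, and diagonal.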
Funding: Supported by the National Natural Science Foundation of China (Nos. 61671095, 61371164) and the Project of Key Laboratory of Signal and Information Processing of Chongqing (No. CSTC2009CA2003).
Abstract: With the advent of the Industry 5.0 era, Internet of Things (IoT) devices face unprecedented proliferation, requiring higher communication rates and lower transmission delays. Owing to its high spectrum efficiency, the promising filter bank multicarrier (FBMC) technique using offset quadrature amplitude modulation (OQAM) has been applied to Beyond 5G (B5G) industry IoT networks. However, due to the broadcast nature of wireless channels, the FBMC-OQAM industry IoT network is inevitably vulnerable to adversarial attacks from malicious IoT nodes. To tackle this challenge, the FBMC-OQAM industry cognitive radio network (ICRNet) is proposed to ensure security at the physical layer. As a pivotal step of ICRNet, blind modulation recognition (BMR) can detect and recognize the modulation type of malicious signals. The previous works need to accomplish the BMR task of FBMC-OQAM signals in ICRNet nodes. A novel FBMC BMR algorithm is proposed that uses a transform channel convolution network (TCCNet) rather than a complicated two-dimensional convolution. First, a low-complexity binary constellation diagram (BCD) gridding matrix is designed as the input of TCCNet. Then, a transform channel convolution strategy is developed to convert the image-like BCD matrix into a series-like data format, accelerating the BMR process while keeping discriminative features. Monte Carlo experimental results demonstrate that the proposed TCCNet obtains a performance gain of 8% and 40% over the traditional in-phase/quadrature (I/Q)-based and constellation diagram (CD)-based methods, respectively, at a signal-to-noise ratio (SNR) of 12 dB. Moreover, the proposed TCCNet runs about 29.682 and 2.356 times faster than the existing CD-Alex Network (CD-AlexNet) and I/Q-Convolutional Long Deep Neural Network (I/Q-CLDNN) algorithms, respectively.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62273314, U21A20141, and 51821003), the Fundamental Research Program of Shanxi Province (Grant No. 202303021224008), and the Shanxi Province Key Laboratory of Quantum Sensing and Precision Measurement (Grant No. 201905D121001).
Abstract: When parity–time symmetry is introduced into a resonant optical gyro system operating near the exceptional point, the sensitivity can in theory be significantly amplified at low angular rates. In practice, however, the exceptional point is easily disturbed by external environmental variables, so it depends on a tightly controlled experimental environment and strong control ability, which makes practical application difficult. Here, we propose a new angular rate sensor structure based on an exceptional surface, which has the advantages of high sensitivity and high robustness. The system consists of two fiber-optic ring resonators and two optical loop mirrors, and one of the resonators contains a variable ratio coupler and a variable optical attenuator. We theoretically analyze the system response and the effects of phase and coupling ratio on it. Compared with the conventional resonant gyro, the sensitivity of this exceptional surface angular rate sensor can be improved by about 300 times at low speed. In addition, by changing the loss coefficient in the ring resonator, we can achieve a wide range of 600 rad/s. This scheme provides a new approach for the development of ultra-high-sensitivity, wide-range angular rate sensors.
Abstract: Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. With the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model pinpoint the key information in the image, resulting in a significant improvement in the model’s performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from the digital orthophoto model provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), which fuses features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves fusion enhancement of the two modal features by making channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP@0.5, and mAP@0.5:0.95 of 87%, 77.1%, 83.8%, and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net performs better on DMFGRS. Built on the lightweight YOLO network, the model has excellent generalizability and thus good potential for application in real-life scenarios.
Abstract: Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention-driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd dataset serves as a catalyst for the development of more robust and effective fitness tracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis.
Abstract: This paper proposes a novel open set recognition method, the Spatial Distribution Feature Extraction Network (SDFEN), to address the problem of electromagnetic signal recognition in an open environment. The spatial distribution feature extraction layer in SDFEN replaces convolutional output neural networks with spatial distribution features that focus more on inter-sample information by incorporating class center vectors. The designed hybrid loss function considers both intra-class distance and inter-class distance, thereby enhancing the similarity among samples of the same class and increasing the dissimilarity between samples of different classes during training. Consequently, this method allows unknown classes to occupy a larger space in the feature space. This reduces the possibility of overlap with known class samples and makes the boundaries between known and unknown samples more distinct. Additionally, the feature comparator threshold can be used to reject unknown samples. For signal open set recognition, seven methods, including the proposed method, are applied to two kinds of electromagnetic signal data: modulation signal and real-world emitter. The experimental results demonstrate that the proposed method outperforms the other six methods overall in a simulated open environment. Specifically, compared to the state-of-the-art Openmax method, the novel method achieves up to 8.87% and 5.25% higher micro-F-measures, respectively.
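The threshold-based rejection of unknown samples can be sketched as a nearest-class-center rule. This is a simplified illustration, not SDFEN itself: the feature extractor and hybrid loss are omitted, and the use of Euclidean distance, a single global threshold, and the names below are all assumptions.

```python
import math


def classify_open_set(feature, class_centers, threshold):
    """Return the label of the nearest class center, or "unknown" if too far.

    feature: sequence of floats (an already-extracted feature vector).
    class_centers: dict mapping known class label -> center vector.
    threshold: maximum distance (hypothetical value) for accepting a known class.
    """
    best_label, best_dist = None, math.inf
    for label, center in class_centers.items():
        dist = math.dist(feature, center)  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else "unknown"
```

Training that shrinks intra-class distance and enlarges inter-class distance, as the hybrid loss is described as doing, leaves more of the feature space outside every threshold ball, which is where unknown classes are expected to fall.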
Funding: Supported by the National Philosophy and Social Sciences Foundation (Grant No. 20BTQ065).
Abstract: Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR. These challenges include expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
Funding: Funded by Researchers Supporting Project Number (RSPD2024 R947), King Saud University, Riyadh, Saudi Arabia.
Abstract: Hand gestures have been used as a significant mode of communication since the advent of human civilization. By facilitating human-computer interaction (HCI), hand gesture recognition (HGRoc) technology is crucial for seamless and error-free HCI. HGRoc technology is pivotal in healthcare and communication for the deaf community. Despite significant advancements in computer vision-based gesture recognition for language understanding, two considerable challenges persist in this field: (a) only limited and common gestures are considered, and (b) processing multiple channels of information across a network takes huge computational time during discriminative feature extraction. Therefore, a novel hand vision-based convolutional neural network (CNN) model, named HVCNNM, is proposed; it offers several benefits, notably enhanced accuracy, robustness to variations, real-time performance, reduced channels, and scalability. Additionally, such models can be optimized for real-time performance, learn from large amounts of data, and scale to handle complex recognition tasks for efficient human-computer interaction. The proposed model was evaluated on two challenging datasets, namely the Massey University Dataset (MUD) and the American Sign Language (ASL) Alphabet Dataset (ASLAD). On the MUD and ASLAD datasets, HVCNNM achieved scores of 99.23% and 99.00%, respectively. These results demonstrate the effectiveness of CNN as a promising HGRoc approach. The findings suggest that the proposed model has potential roles in applications such as sign language recognition, human-computer interaction, and robotics.
Funding: The support of this research by the Hubei Provincial Natural Science Foundation (2022CFB449) and the Science Research Foundation of the Education Department of Hubei Province (B2020061) is gratefully acknowledged.
Abstract: The task of food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. Addressing these complexities, our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. It also introduces a multi-stage local fusion (MSLF) module, fostering long-distance dependencies between feature maps at varying stages. This approach facilitates the assimilation of complementary features across scales, significantly bolstering the model’s capacity for feature extraction. Furthermore, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures not only mark improvements of 1.04%, 3.42%, and 1.36% over the foundational ConvNeXt network but also surpass the performance of most contemporary food image recognition methods. Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition, setting a new benchmark for the field.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK20230696).
Abstract: Electric power training is essential for ensuring the safety and reliability of the system. In this study, we introduce a novel Abnormal Action Recognition (AAR) system that utilizes a Lightweight Pose Estimation Network (LPEN) to efficiently and effectively detect abnormal fall-down and trespass incidents in electric power training scenarios. The LPEN network, comprising three stages (MobileNet, Initial Stage, and Refinement Stage), is employed to swiftly extract image features, detect human key points, and refine them for accurate analysis. Subsequently, a Pose-aware Action Analysis Module (PAAM) captures the positional coordinates of human skeletal points in each frame. Finally, an Abnormal Action Inference Module (AAIM) evaluates whether abnormal fall-down or unauthorized trespass behavior is occurring. For fall-down recognition, three criteria are considered: falling speed, main angles of skeletal points, and the person’s bounding box. To identify unauthorized trespass, emphasis is placed on the position of the ankles. Extensive experiments validate the effectiveness and efficiency of the proposed system in ensuring the safety and reliability of electric power training.
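The rule-based inference described above can be sketched as follows, assuming the skeletal key points have already been extracted for each frame. The abstract names the three fall-down criteria and the ankle-based trespass check but gives no thresholds or combination rule, so every threshold value, the logical AND of the three criteria, the rectangular restricted zone, and all function names are hypothetical. Image coordinates are assumed to have y increasing downward.

```python
import math


def torso_angle_deg(neck, hip):
    """Angle between the neck-hip line and the vertical axis, in degrees."""
    dx = abs(neck[0] - hip[0])
    dy = abs(neck[1] - hip[1])
    return math.degrees(math.atan2(dx, dy))


def is_fall(prev_hip_y, hip_y, dt, neck, hip, box_w, box_h,
            speed_thr=100.0, angle_thr=45.0, ratio_thr=1.0):
    """Flag a fall when all three criteria exceed their (hypothetical) thresholds."""
    falling_speed = (hip_y - prev_hip_y) / dt  # pixels per second, downward positive
    tilted = torso_angle_deg(neck, hip) > angle_thr
    wide_box = box_h > 0 and (box_w / box_h) > ratio_thr
    return falling_speed > speed_thr and tilted and wide_box


def is_trespass(ankles, zone):
    """Flag trespass when any ankle lies inside a rectangular restricted zone.

    ankles: iterable of (x, y) points; zone: (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = zone
    return any(x1 <= x <= x2 and y1 <= y <= y2 for x, y in ankles)
```

Using the ankles rather than the bounding box center for the trespass check ties the decision to where the person is standing on the ground plane.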