Multimodal lung tumor medical images, such as Positron Emission Computed Tomography (PET), Computed Tomography (CT), and PET-CT images, can provide both anatomical and functional information about the same lesion. Key questions are how to use this anatomical and functional information effectively and how to improve network segmentation performance. To address them, this paper proposes the Saliency Feature-Guided Interactive Feature Enhancement Lung Tumor Segmentation Network (Guide-YNet). First, the model uses a double-encoder, single-decoder U-Net as its backbone, while a single-encoder, single-decoder U-Net generates a saliency guidance feature from the PET image and passes it into the skip connections of the backbone, so that the high sensitivity of PET images to tumors guides the network to locate lesions accurately. Second, a Cross Scale Feature Enhancement Module (CSFEM) is designed to extract multi-scale fusion features after downsampling. Third, a Cross-Layer Interactive Feature Enhancement Module (CIFEM) is designed in the encoder to enhance spatial position information and semantic information. Finally, a Cross-Dimension Cross-Layer Feature Enhancement Module (CCFEM) is proposed in the decoder, which effectively extracts multimodal image features through global attention and multi-dimension local attention. The proposed method is verified on lung multimodal medical image datasets, and the results show that the Mean Intersection over Union (MIoU), Accuracy (Acc), Dice Similarity Coefficient (Dice), Volumetric overlap error (Voe), and Relative volume difference (Rvd) of the proposed method on lung lesion segmentation are 87.27%, 93.08%, 97.77%, 95.92%, 89.28%, and 88.68%, respectively. The method is of great significance for computer-aided diagnosis.
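The overlap metrics named above have standard definitions for binary masks. The following NumPy sketch computes them for one 2-D mask pair; it assumes MIoU is averaged over the tumor and background classes and that RVD is signed, neither of which the abstract specifies, so the paper's exact conventions may differ.

```python
import numpy as np


def overlap_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Overlap metrics for a binary tumor mask (True/1 = tumor pixel)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)    # tumor predicted as tumor
    fp = np.sum(pred & ~gt)   # background predicted as tumor
    fn = np.sum(~pred & gt)   # tumor that was missed
    tn = np.sum(~pred & ~gt)  # background predicted as background
    iou_tumor = tp / (tp + fp + fn)
    iou_background = tn / (tn + fn + fp)
    return {
        "MIoU": float((iou_tumor + iou_background) / 2),   # mean over the two classes
        "Acc": float((tp + tn) / pred.size),
        "Dice": float(2 * tp / (2 * tp + fp + fn)),
        "VOE": float(1 - iou_tumor),                         # volumetric overlap error
        "RVD": float((pred.sum() - gt.sum()) / gt.sum()),    # signed relative volume difference
    }


gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True            # 4 tumor pixels
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :3] = True          # 6 predicted pixels, 4 of them correct
m = overlap_metrics(pred, gt)
print(m)  # Dice = 0.8, MIoU = 0.75, VOE = 1/3, RVD = 0.5
```

Note that VOE and RVD are error measures (lower is better), and RVD is undefined for an empty ground-truth mask.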
Traffic scene captioning automatically generates one or more sentences describing an input traffic scene image by analyzing its content, which supports road safety and provides an important decision-making function for sustainable transportation. To provide a comprehensive and reasonable description of complex traffic scenes, this paper proposes a traffic scene semantic captioning model with multi-stage feature enhancement. The model follows an encoder-decoder structure. First, multilevel-granularity visual features are used for feature enhancement during encoding, which enables the model to learn more detailed content of the traffic scene image. Second, a scene knowledge graph is applied during decoding: the semantic features it provides further enhance the features learned by the decoder, so that the model can learn the attributes of objects in the traffic scene and the relationships between them and generate more reasonable captions. Extensive experiments on the challenging MS-COCO dataset, evaluated with five standard automatic evaluation metrics, show that the proposed model improves significantly on all metrics compared with state-of-the-art methods, notably achieving a score of 129.0 on the CIDEr-D metric, which indicates that it can provide a more reasonable and comprehensive description of traffic scenes.
Single-modal visible-light or infrared images provide limited information: infrared imaging captures significant thermal radiation data, whereas visible light excels at presenting detailed texture. Combining images from both modalities leverages their respective strengths and mitigates their individual limitations, producing high-quality images with enhanced contrast and rich texture details. Such fused images have promising applications in advanced visual tasks, including target detection, instance segmentation, military surveillance, and pedestrian detection. This paper introduces a dual-branch decomposition fusion network based on an AutoEncoder (AE), which decomposes multi-modal features into intensity and texture information for enhanced fusion. A local contrast enhancement module (CEM) and a texture detail enhancement module (DEM) are devised to process the decomposed images, after which the decoder fuses them. The proposed loss function ensures that key information from the source images of both modalities is effectively retained. Extensive comparison and generalization experiments demonstrate the superior performance of the network in preserving the pixel intensity distribution and retaining texture details. Qualitatively, the results show advantages in fusion details and local contrast; quantitatively, entropy (EN), mutual information (MI), structural similarity (SSIM), and other metrics improve and, overall, exceed those of state-of-the-art (SOTA) models.
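EN and MI are histogram-based fusion metrics. As a minimal sketch, assuming 8-bit grayscale inputs and 256-bin histograms (the paper's exact settings are not given), EN is computed on the fused image and the fusion MI sums the information the fused image shares with each source:

```python
import numpy as np


def entropy(img: np.ndarray) -> float:
    """Shannon entropy (bits) of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))


def mutual_information(a: np.ndarray, b: np.ndarray) -> float:
    """Mutual information (bits) between two 8-bit images of the same shape."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=256,
                                 range=[[0, 256], [0, 256]])
    pab = joint / joint.sum()
    pa = pab.sum(axis=1, keepdims=True)   # marginal of a, shape (256, 1)
    pb = pab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, 256)
    nz = pab > 0
    return float(np.sum(pab[nz] * np.log2(pab[nz] / (pa * pb)[nz])))


def fusion_mi(ir: np.ndarray, vis: np.ndarray, fused: np.ndarray) -> float:
    """MI fusion metric: MI(infrared, fused) + MI(visible, fused)."""
    return mutual_information(ir, fused) + mutual_information(vis, fused)


img = np.array([[0, 0], [255, 255]], dtype=np.uint8)
print(entropy(img), mutual_information(img, img))  # 1.0 1.0
```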
In the Internet era, widely used web applications have become targets of hacker attacks because they hold large amounts of personal information. Among these attacks, stealing private data through cross-site scripting (XSS) is one of the most common. Deep learning-based XSS attack detection methods have good application prospects, but they are prone to overfitting and suffer from high false alarm rates and low accuracy. To address these issues, we propose a multi-stage feature extraction and fusion model for XSS detection based on Random Forest feature enhancement. The model uses Random Forests to capture the intrinsic structure and patterns of the data by extracting leaf node indices as features, which are then merged with the original data features to form a more informative feature set. Further feature extraction is conducted through three parallel channels. Channel I uses parallel one-dimensional convolutional layers (1D convolutional layers) with different kernel sizes to extract local features at different scales and performs multi-scale feature fusion; Channel II employs max one-dimensional pooling layers (max 1D pooling layers) of various sizes to extract key features from the data; and Channel III extracts global information bidirectionally using a Bi-Directional Long Short-Term Memory network (Bi-LSTM) and incorporates a multi-head attention mechanism to enhance global features. Finally, XSS is classified and predicted by fusing the features of the three channels. To test the effectiveness of the model, we conduct experiments on six datasets, achieving an accuracy of 100% on the UNSW-NB15 dataset and 99.99% on the CICIDS2017 dataset, higher than that of existing models.
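The first two parallel channels can be illustrated with a small NumPy sketch. The kernel sizes (3, 5, 7), pool sizes (2, 4, 8), and random untrained kernels are illustrative assumptions, the Bi-LSTM channel is omitted, and the input is a stand-in numeric vector rather than an encoded XSS payload. For the Random Forest step, scikit-learn's `RandomForestClassifier.apply` returns the leaf index each sample reaches in every tree, which is one way to obtain leaf-index features.

```python
import numpy as np

rng = np.random.default_rng(0)


def channel_one(x: np.ndarray, kernel_sizes=(3, 5, 7)) -> np.ndarray:
    """Parallel 1-D convolutions with different kernel sizes (random, untrained
    kernels here), ReLU, stacked into a (len(kernel_sizes), len(x)) array."""
    outs = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k)
        # np.convolve flips the kernel; irrelevant for random or learned weights
        outs.append(np.maximum(np.convolve(x, kernel, mode="same"), 0.0))
    return np.stack(outs)


def max_pool1d(x: np.ndarray, size: int) -> np.ndarray:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)


def channel_two(x: np.ndarray, pool_sizes=(2, 4, 8)) -> np.ndarray:
    """Max pooling at several window sizes, concatenated."""
    return np.concatenate([max_pool1d(x, s) for s in pool_sizes])


x = np.arange(16, dtype=float)          # stand-in for an encoded sample
fused = np.concatenate([channel_one(x).ravel(), channel_two(x)])
print(channel_one(x).shape, channel_two(x).shape, fused.shape)  # (3, 16) (14,) (62,)
```

Fusion here is plain concatenation; the paper's fusion of the three channels may differ.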
Manhole cover defect recognition is of significant practical importance because it can accurately identify damaged or missing covers, enabling timely replacement and maintenance. Traditional manhole cover detection techniques focus primarily on detecting the presence of covers rather than classifying defect types. However, manhole cover defects exhibit small inter-class and large intra-class feature differences, which makes them difficult to recognize. To improve the classification of manhole cover defect types, we propose a Progressive Dual-Branch Feature Fusion Network (PDBFFN). The baseline backbone adopts a multi-stage hierarchical architecture with ResNet50 as the visual feature extractor, from which both local and global information is obtained. In addition, a Feature Enhancement Module (FEM) and a Fusion Module (FM) are introduced to strengthen the network's ability to learn critical features. Experimental results show that our model achieves a classification accuracy of 82.6% on a manhole cover defect dataset, outperforming several state-of-the-art fine-grained image classification models.
At present, knowledge embedding methods are widely used for knowledge graph (KG) reasoning and have been successfully applied to KGs with large numbers of entities and relations. However, research and production environments contain many KGs with few entities and relations, called sparse KGs. Limited by the performance of knowledge extraction methods and other factors (for example, some common-sense information never appears in natural corpora), the relations between entities are often incomplete. To solve this problem, a method combining a graph neural network with information enhancement is proposed. When the sparsity of the FB15K-237 dataset is 10%, the improved method increases the mean reciprocal rank (MRR) and Hits@3 by 1.6% and 1.7%, respectively; when the sparsity is 50%, it increases MRR and Hits@10 by 0.8% and 1.8%, respectively.
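MRR and Hits@k are computed from the rank of the correct entity for each test query. A minimal Python sketch, assuming 1-based ranks that have already been computed (for example under the usual filtered setting):

```python
def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """ranks: 1-based rank of the correct entity for each test query."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n                       # mean reciprocal rank
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}  # fraction ranked in top k
    return mrr, hits


mrr, hits = mrr_and_hits([1, 2, 4, 20])
print(mrr, hits)  # MRR is about 0.45; Hits@1/3/10 = 0.25/0.5/0.75
```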
The difficulty of selecting the best system parameters restricts the engineering application of stochastic resonance (SR). This study proposes an adaptive cascade stochastic resonance (ACSR) method. It introduces correlation theory into SR and uses the correlation coefficient of the input signal and noise as a weight to construct a weighted signal-to-noise ratio (WSNR) index. This alleviates the influence of high-frequency noise and improves on the signal-to-noise ratio index used in traditional SR. With the WSNR, ACSR obtains optimal parameters adaptively, without needing to know the exact frequency of the target signal in advance. In addition, through the secondary utilization of noise, ACSR makes the output waveform smoother and the fluctuation period more distinct. A simulation example and an engineering application to gearbox fault diagnosis demonstrate the effectiveness and feasibility of the proposed method.
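The classic bistable model behind SR can be integrated numerically to show how a cascade works. The sketch below assumes the standard overdamped bistable system dx/dt = a*x - b*x^3 + u(t) with fixed a = b = 1, noise already present in the measured input, and explicit Euler steps; it does not implement the paper's WSNR index or its adaptive parameter search, whose exact definitions are not given here.

```python
import numpy as np


def bistable_sr(u: np.ndarray, a: float = 1.0, b: float = 1.0, h: float = 0.01) -> np.ndarray:
    """Euler integration of dx/dt = a*x - b*x**3 + u(t).
    The noise is assumed to be already contained in the input u."""
    x = np.zeros(len(u))
    for n in range(len(u) - 1):
        x[n + 1] = x[n] + h * (a * x[n] - b * x[n] ** 3 + u[n])
    return x


def cascade_sr(u: np.ndarray, stages: int = 2, **params) -> np.ndarray:
    """Cascaded SR: each stage's output drives the next stage."""
    for _ in range(stages):
        u = bistable_sr(u, **params)
    return u


rng = np.random.default_rng(1)
t = np.arange(0, 100, 0.01)                      # step h = 0.01 matches the sampling step
clean = 0.3 * np.sin(2 * np.pi * 0.05 * t)       # weak low-frequency target signal
noisy = clean + rng.normal(0.0, 1.0, t.size)     # measured signal with strong noise
out = cascade_sr(noisy, stages=2, a=1.0, b=1.0, h=0.01)
# correlation of the clean signal with the raw input and with the cascade output
print(np.corrcoef(clean, noisy)[0, 1], np.corrcoef(clean, out)[0, 1])
```

In an adaptive scheme, a and b would be searched to maximize an output quality index rather than fixed as here.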
With the development of social media and the prevalence of mobile devices, an increasing number of people use social media platforms to express their opinions and attitudes, leading to many online controversies. These controversies can severely threaten social stability, making their automatic detection particularly necessary. Most current controversy detection methods focus on mining features from text semantics and propagation structures. However, these methods have two drawbacks: 1) a limited ability to capture structural features and a failure to learn deeper structural features, and 2) neglect of the influence of topic information and ineffective use of topic features. To address these drawbacks, this paper proposes a social media controversy detection method called Dual Feature Enhanced Graph Convolutional Network (DFE-GCN). The method explores structural information at different scales from global and local perspectives to capture deeper structural features, enhancing their expressive power. Furthermore, to strengthen the influence of topic information, it uses attention mechanisms to enhance topic features after each graph convolutional layer, making effective use of topic information. We validated our method on two public datasets, and the experimental results show that it achieves state-of-the-art performance compared with baseline methods. On the Weibo and Reddit datasets, accuracy improves by 5.92% and 3.32%, respectively, and the F1 score by 1.99% and 2.17%, demonstrating the positive impact of enhanced structural and topic features on controversy detection.
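The two ideas, graph convolution over the propagation structure and attention-based topic enhancement after a layer, can be sketched in NumPy. The GCN layer uses the standard symmetric normalization; `topic_enhance` is a hypothetical stand-in, since the abstract does not specify how DFE-GCN's attention combines node and topic features.

```python
import numpy as np


def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degrees are >= 1 thanks to self-loops
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


def topic_enhance(H: np.ndarray, topic: np.ndarray):
    """Hypothetical topic enhancement: softmax attention of each node to the topic
    vector, then add the topic vector to each node scaled by its weight."""
    scores = H @ topic
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return H + alpha[:, None] * topic[None, :], alpha


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # post -> comment -> reply chain
H = rng.standard_normal((3, 4))    # initial node features
W = rng.standard_normal((4, 4))    # layer weights (random here, learned in practice)
topic = rng.standard_normal(4)     # topic embedding
H1, alpha = topic_enhance(gcn_layer(A, H, W), topic)
print(H1.shape, alpha.sum())
```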
Femtosecond pulse shaping has been shown to be an effective way to control multi-photon absorption via light–matter interaction. Previous studies focused mainly on quantum coherent control of multi-photon absorption through phase, amplitude, and polarization modulation, but how the coherent features of multi-photon absorption depend on the energy level structure, the laser spectral bandwidth, and the laser central frequency has not yet been studied systematically. In this work, we explore the coherent features of resonance-mediated two-photon absorption in a rubidium atom by varying the energy level structure and the spectral bandwidth and central frequency of the femtosecond laser field. The theoretical results show that changing the intermediate-state detuning can effectively influence the enhancement of the near-resonant part, which in turn affects the transform-limited (TL)-normalized maximum final-state population. Moreover, as the laser spectral bandwidth increases, the TL-normalized maximum final-state population is effectively enhanced owing to the increased enhancement of the near-resonant part, whereas it remains constant as the laser central frequency is varied. These studies provide a clear physical picture for understanding the coherent features of resonance-mediated two-photon absorption and can offer theoretical guidance for future applications.
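For orientation, resonance-mediated two-photon absorption g → i → f in the weak-field limit is commonly described by second-order perturbation theory, which splits the transition amplitude into an on-resonant term and a near-resonant (principal-value) term. The widely used form below is given for reference and is not taken from this paper; E(ω) is the spectral field of the pulse, ω_ig, ω_fi, and ω_fg are transition frequencies, μ are dipole matrix elements, and 𝒫 denotes the Cauchy principal value:

```latex
A_{fg} \propto \mu_{fi}\,\mu_{ig}\left[\, i\pi\, E(\omega_{ig})\, E(\omega_{fi})
  \;+\; \mathcal{P}\!\int_{-\infty}^{\infty}
  \frac{E(\omega)\, E(\omega_{fg}-\omega)}{\omega_{ig}-\omega}\, d\omega \right],
\qquad P_f \propto \left| A_{fg} \right|^{2}
```

The TL-normalized final-state population is then P_f for the shaped pulse divided by P_f for the transform-limited pulse with the same spectrum; the second (near-resonant) term is the one whose sign change across ω_ig makes spectral phase shaping effective.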
The detection of brain disease is an essential issue in medical and research areas. Deep learning techniques have shown promising results in detecting and diagnosing brain diseases from magnetic resonance imaging (MRI) images. These techniques train neural networks on large datasets of MRI images, allowing the networks to learn patterns and features indicative of different brain diseases. However, several challenges and limitations must still be addressed to further improve their accuracy and effectiveness. This paper implements a Feature Enhanced Stacked Auto Encoder (FESAE) model to detect brain diseases. The results of a standard stacked autoencoder are trivial and not robust enough to boost the system's accuracy. Therefore, the standard Stacked Auto Encoder (SAE) is replaced with a Stacked Feature Enhanced Auto Encoder whose feature enhancement function efficiently and effectively obtains non-trivial features with less activation energy from an image. The proposed model consists of four stages. First, pre-processing removes noise, and the greyscale image is converted to Red, Green, and Blue (RGB) to enhance feature details for discriminative feature extraction. Second, feature extraction obtains significant features for classification using the Discrete Wavelet Transform (DWT) and Channelization. Third, classification assigns MRI images to four major classes: Normal, Tumor, Brain Stroke, and Alzheimer's. Finally, the FESAE model outperforms state-of-the-art machine learning and deep learning methods such as Artificial Neural Network (ANN), SAE, Random Forest (RF), and Logistic Regression (LR), achieving a high accuracy of 98.61% on a dataset of 2000 MRI images. The proposed model has significant potential to assist radiologists in diagnosing brain diseases more accurately and to improve patient outcomes.
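The DWT step can be illustrated with a one-level 2-D Haar transform written directly in NumPy. The wavelet family, decomposition level, and the "Channelization" step used in the paper are not specified, so this is only a sketch; in practice a library such as PyWavelets (`pywt.dwt2`) would normally be used.

```python
import numpy as np


def haar_dwt2(img: np.ndarray):
    """One-level orthonormal 2-D Haar DWT; height and width must be even.
    Returns (approximation, column-difference, row-difference, diagonal) sub-bands."""
    img = img.astype(float)
    a = img[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    approx = (a + b + c + d) / 2
    col_diff = (a - b + c - d) / 2    # left minus right: responds to vertical edges
    row_diff = (a + b - c - d) / 2    # top minus bottom: responds to horizontal edges
    diag = (a - b - c + d) / 2
    return approx, col_diff, row_diff, diag


def subband_energy_features(img: np.ndarray) -> np.ndarray:
    """Mean energy of each sub-band, a compact 4-element feature vector."""
    return np.array([np.mean(s ** 2) for s in haar_dwt2(img)])


rng = np.random.default_rng(0)
mri_slice = rng.integers(0, 256, size=(8, 8))   # stand-in for a pre-processed MRI slice
print(subband_energy_features(mri_slice))
```

Because the transform is orthonormal, the total energy of the four sub-bands equals the energy of the input image.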
In the diagnosis of medical images, the challenge lies in tracking and identifying defective cells and the extent of the defective region within the complex structure of the brain cavity. Locating the defective cells precisely during diagnosis helps fight one of humanity's deadliest diseases. Early detection of these cells requires an accurate computer-aided diagnostic (CAD) system that supports early treatment and improves patient survival rates. Earlier CAD systems relied heavily on the expertise of radiologists and took more time to identify the defective region. This manuscript examines the efficacy of combining intensity, shape, and texture features of magnetic resonance images (MRI). In the Enhanced Feature Fusion Segmentation based classification method (EEFS), the image is enhanced and segmented to extract prominent features. To achieve this, EEFS uses the Enhanced Local Binary Pattern (EnLBP), the Partisan Gray Level Co-occurrence Matrix Histogram of Oriented Gradients (PGLCMHOG), and the iGrab cut method to segment the image. These prominent features are combined with deep features into a single one-dimensional feature vector that is used for prediction. The combined vector is fed to existing classifiers, and their results are compared. The generated vector gives promising results with considerably less computational time for pre-processing and classification of MR medical images.
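EnLBP and PGLCMHOG are the paper's own variants and are not reproduced here. For reference, the classic 3x3 Local Binary Pattern, on which EnLBP is presumably based, can be computed as follows (a minimal NumPy sketch; border pixels are skipped and a neighbor equal to the center counts as 1):

```python
import numpy as np


def lbp_codes(img: np.ndarray) -> np.ndarray:
    """Classic 3x3 LBP: one 8-bit code per interior pixel (border pixels are skipped)."""
    img = img.astype(float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # the 8 neighbors, clockwise from the top-left; each contributes one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbor >= center).astype(np.int64) << bit
    return codes.astype(np.uint8)


def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of LBP codes, a common texture feature vector."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256).astype(float)
    return hist / hist.sum()


rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(6, 6))   # stand-in for an MRI patch
print(lbp_codes(patch).shape, lbp_histogram(patch).shape)  # (4, 4) (256,)
```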
Using the spatiotemporal features contained in extensive trajectory data to identify the operation modes of agricultural machinery is an important foundational task for subsequent agricultural machinery trajectory research. In the present study, a feature deformation network with multi-range feature enhancement was proposed to identify agricultural machinery operation modes effectively. First, a multi-range feature enhancement module was developed to fully explore the feature distribution of agricultural machinery trajectory data. Second, to further enrich the representation of trajectories, a feature deformation module was proposed that maps trajectory points to a high-dimensional space to form feature maps. Then, EfficientNet-B0 was used to extract features of different scales and depths from the feature map and to select those most relevant to the results, and finally to predict the mode of each trajectory point accurately. To validate the effectiveness of the proposed method, its results were compared with those of other methods on a dataset of real agricultural trajectories. On the corn and wheat harvester trajectory datasets, the model achieved accuracies of 96.88% and 96.68% and F1 scores of 93.54% and 94.19%, improvements of 8.35% and 9.08% in accuracy and 20.99% and 20.04% in F1 score over the current state-of-the-art method.
Water body extraction is essential for monitoring water resources, ecosystem services, and the hydrological cycle, which makes analyzing water bodies in remote sensing images necessary. Water indices are designed to highlight water bodies in remote sensing images. We employ a new water index and digital image processing techniques to extract water bodies automatically and accurately from Landsat 8 OLI images. First, we preprocess Landsat 8 OLI images with radiometric calibration and atmospheric correction. Next, we apply the KT transformation, LBV transformation, AWEI_nsh, and HIS transformation to the preprocessed image to calculate a new water index. Then, we apply linear feature enhancement and an improved local adaptive threshold segmentation method to extract small water bodies accurately. Meanwhile, we employ morphological enhancement and the improved local adaptive threshold segmentation method to extract large water bodies. Finally, we combine the small and large water bodies to obtain complete water bodies. Compared with other traditional methods, our method has clear advantages in water extraction, particularly for small water bodies.
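Of the indices combined above, AWEI_nsh has a standard closed form, and a local adaptive threshold compares each pixel with its neighborhood. The sketch below assumes surface-reflectance inputs (Landsat 8 OLI green = B3, NIR = B5, SWIR1 = B6, SWIR2 = B7), a mean-based threshold with an illustrative window radius and offset, and an extra positivity check; it is neither the paper's new index nor its improved threshold method.

```python
import numpy as np


def awei_nsh(green, nir, swir1, swir2):
    """AWEI_nsh from surface reflectance; for Landsat 8 OLI use B3, B5, B6, B7."""
    return 4.0 * (green - swir1) - (0.25 * nir + 2.75 * swir2)


def box_mean(x: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window via an integral image (edge-replicated borders)."""
    k = 2 * r + 1
    p = np.pad(x.astype(float), r, mode="edge")
    s = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))  # s[i, j] = sum of p[:i, :j]
    h, w = x.shape
    total = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return total / (k * k)


def local_threshold_mask(index: np.ndarray, r: int = 7, offset: float = 0.1) -> np.ndarray:
    """Water where the index exceeds its local mean by `offset` and is positive
    (the positivity check is an added assumption, not from the paper)."""
    return (index > box_mean(index, r) + offset) & (index > 0)


# toy index image: a one-pixel-wide "river" (index 1) on land (index -1)
index = np.full((9, 9), -1.0)
index[:, 4] = 1.0
mask = local_threshold_mask(index, r=2, offset=0.1)
print(mask.sum())  # 9: only the river column is flagged
```

A local threshold of this kind suits narrow water bodies; inside a large uniform lake a pixel is close to its local mean, which is one reason the paper treats large water bodies separately.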
Good initial proposals are critical for 3D object detection. However, because of the significant geometric variation of indoor scenes, incomplete and noisy proposals are inevitable in most cases, and mining feature information from these “bad” proposals may mislead detection. Contrastive learning provides a feasible way to represent proposals, as it can align complete and incomplete/noisy proposals in feature space. The aligned feature space helps build robust 3D representations even when bad proposals are given. We therefore devise a new contrastive learning framework for indoor 3D object detection, called EFECL, that learns robust 3D representations through contrastive learning of proposals at two levels. Specifically, we optimize both instance-level and category-level contrasts to align features by capturing instance-specific characteristics and semantic-aware common patterns. Furthermore, we propose an enhanced feature aggregation module to extract more general and informative features for contrastive learning. Evaluations on the ScanNet V2 and SUN RGB-D benchmarks demonstrate the generalizability and effectiveness of our method, which achieves improvements of 12.3% and 7.3%, respectively, over the benchmark alternatives. The code and models are publicly available at https://github.com/YaraDuan/EFECL.
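Proposal-level contrastive learning is typically trained with an InfoNCE-style loss. The NumPy sketch below shows the instance-level case only, pairing each noisy proposal feature with its complete counterpart; the temperature, the toy features, and the use of InfoNCE itself are assumptions, since the abstract does not give EFECL's loss. A category-level variant would treat all proposals of the same class as positives.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE: row i of z1 and row i of z2 form a positive pair; the other rows
    of z2 act as negatives. Features are L2-normalized (cosine similarity)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


complete = np.eye(4)                                     # features of complete proposals
noisy = complete + 0.05 * np.random.default_rng(0).standard_normal((4, 4))
aligned = info_nce(noisy, complete)
mismatched = info_nce(noisy, complete[[1, 2, 3, 0]])     # wrong pairing for comparison
print(aligned, mismatched)                               # aligned pairs give the lower loss
```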
This paper proposes a real-time detection method, Infrared Small Target Detection CenterNet (ISTD-CenterNet), which improves the CenterNet network for detecting small infrared targets in complex environments. The method is anchor-free, addressing the issues of low accuracy and slow speed. HRNet is used as the feature extraction framework, and an ECBAM attention module is added to each stage branch to identify the positions of small targets and salient objects. A scale enhancement module is also added to obtain a high-level semantic representation and a fine-resolution prediction map for the entire infrared image. In addition, an improved receptive field enhancement module is designed to leverage semantic information in low-resolution feature maps, and a convolutional attention mechanism module is used to increase network stability and convergence speed. Comparison experiments were conducted on the infrared small target dataset ESIRST. Compared with the benchmark network CenterNet-HRNet, the proposed ISTD-CenterNet improves recall by 22.85% and detection accuracy by 13.36%. Compared with the state-of-the-art YOLOv5small, ISTD-CenterNet improves recall by 5.88% and detection precision by 2.33% and runs at 48.94 frames/s, achieving accurate real-time detection of small infrared targets.
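Anchor-free, CenterNet-style detectors predict a center heatmap and read detections from its local peaks. A minimal NumPy sketch of that decoding step follows (3x3 max-filter peak extraction, threshold, top-k); size and offset regression heads are omitted, and the threshold and k are illustrative.

```python
import numpy as np


def max_pool3x3(h: np.ndarray) -> np.ndarray:
    """3x3 max filter with stride 1 (borders padded with -inf)."""
    p = np.pad(h.astype(float), 1, mode="constant", constant_values=-np.inf)
    H, W = h.shape
    return np.max([p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)], axis=0)


def decode_centers(heatmap: np.ndarray, k: int = 10, thresh: float = 0.3):
    """A pixel is a center if it equals its 3x3 local maximum and its score is
    >= thresh; return up to k (row, col, score) tuples, highest score first."""
    keep = (heatmap == max_pool3x3(heatmap)) & (heatmap >= thresh)
    rows, cols = np.nonzero(keep)
    scores = heatmap[rows, cols]
    order = np.argsort(-scores)[:k]
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]


hm = np.zeros((8, 8))
hm[2, 3] = 0.9    # a target center
hm[2, 4] = 0.5    # neighbor of the same target, suppressed by the max filter
hm[6, 6] = 0.6    # a second target
print(decode_centers(hm))  # [(2, 3, 0.9), (6, 6, 0.6)]
```

Replacing NMS with a max filter is what keeps this step cheap enough for real-time use.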
Sea cucumber detection is widely recognized as key to automated aquaculture. The underwater light environment is complex, and sea cucumbers are easily obscured by mud, sand, reefs, and other underwater organisms. To date, research on sea cucumber detection has mostly concentrated on distinguishing foreground objects from the background; however, the key to this distinction is the effective extraction of sea cucumber feature information. In this study, an edge-enhanced scaled You Only Look Once v4 (YOLOv4) model (ESYv4) was proposed for sea cucumber detection. To emphasize target features and reduce misjudgments caused by varying underwater hues and brightness, a bidirectional cascade network (BDCN) was used to extract an edge greyscale map of the whole image, which was added to the original RGB image to form the detector input. Meanwhile, the YOLOv4 backbone was scaled down, reducing the number of parameters to 48% of the original. Validation on 783 images showed that the detection precision for positive sea cucumber samples reached 0.941. This improvement indicates that the algorithm effectively strengthens the edge feature information of the target, contributing to the automatic multi-object detection of underwater sea cucumbers.
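The input construction, adding an edge map to the RGB image, can be sketched as follows. BDCN is a learned edge detector; here a Sobel gradient magnitude is swapped in as a simple stand-in, and the rescaling to [0, 255] and the clipping are assumptions.

```python
import numpy as np


def sobel_magnitude(gray: np.ndarray) -> np.ndarray:
    """Gradient magnitude from 3x3 Sobel filters (edge-replicated borders)."""
    p = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape

    def shifted(dy: int, dx: int) -> np.ndarray:
        return p[dy:dy + h, dx:dx + w]

    gx = (shifted(0, 2) + 2 * shifted(1, 2) + shifted(2, 2)) - (shifted(0, 0) + 2 * shifted(1, 0) + shifted(2, 0))
    gy = (shifted(2, 0) + 2 * shifted(2, 1) + shifted(2, 2)) - (shifted(0, 0) + 2 * shifted(0, 1) + shifted(0, 2))
    return np.hypot(gx, gy)


def edge_enhanced_input(rgb: np.ndarray) -> np.ndarray:
    """Add an edge map scaled to [0, 255] to every RGB channel and clip to uint8."""
    edges = sobel_magnitude(rgb.astype(float).mean(axis=2))
    if edges.max() > 0:
        edges = edges / edges.max() * 255.0
    return np.clip(rgb.astype(float) + edges[..., None], 0, 255).astype(np.uint8)


step = np.zeros((6, 6, 3), dtype=np.uint8)
step[:, 3:] = 100                      # a vertical edge between columns 2 and 3
enhanced = edge_enhanced_input(step)
print(enhanced[0, :, 0])               # pixels next to the edge are brightened
```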
Facial expression recognition (FER) in video has attracted increasing interest, and many approaches have been proposed. The crucial problem in classifying a video sequence into several basic emotions is how to fuse the facial features of individual frames. In this paper, a frame-level attention module is integrated into an improved VGG-based framework, yielding a lightweight facial expression recognition method. The network takes a sub-video cut from an experimental video sequence as input and generates a fixed-dimension representation. The VGG-based network with an enhanced branch embeds face images into feature vectors. The frame-level attention module learns weights that adaptively aggregate these feature vectors into a single discriminative video representation. Finally, a regression module outputs the classification results. Experimental results on the CK+ and AFEW databases show that the proposed method achieves state-of-the-art recognition rates.
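Frame-level attention aggregation amounts to a softmax-weighted average of per-frame feature vectors. In this minimal NumPy sketch the scoring function is a single linear projection `w`, an assumption, since the abstract does not describe how the module computes its weights.

```python
import numpy as np


def frame_attention_pool(frames: np.ndarray, w: np.ndarray):
    """Aggregate per-frame features of shape (T, d) into one d-dim video vector.
    w (d,) is a scoring vector (learned in practice, given here)."""
    scores = frames @ w                       # one relevance score per frame
    alpha = np.exp(scores - scores.max())     # softmax over frames
    alpha /= alpha.sum()
    return alpha @ frames, alpha


rng = np.random.default_rng(0)
frames = rng.standard_normal((16, 512))       # stand-in VGG embeddings of 16 frames
video_vec, alpha = frame_attention_pool(frames, 0.05 * rng.standard_normal(512))
print(video_vec.shape, alpha.round(3))
```

With `w` set to zero every frame gets weight 1/T and the result reduces to plain average pooling, which shows what the learned weights add.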
Image fusion has been developing into an important area of research. In remote sensing, the same image sensor in different working modes, or different image sensors, can provide reinforcing or complementary information. It is therefore highly valuable to fuse the outputs of multiple sensors (or of the same sensor in different working modes) to improve the overall quality of remote sensing images, which benefits both human visual perception and image processing tasks. Accordingly, this paper first provides a comprehensive survey of the state of the art in multi-sensor image fusion methods from three aspects: pixel-level fusion, feature-level fusion, and decision-level fusion. An overview of existing fusion strategies is then given, followed by a summary of existing fusion quality measures. Finally, the review analyzes development trends in fusion algorithms that may encourage researchers to explore this field further.
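As a concrete illustration of pixel-level fusion, two of the simplest rules are a weighted average and a choose-max selection. The sketch below assumes co-registered images (or decomposition coefficients) of equal shape; feature-level and decision-level fusion are not illustrated.

```python
import numpy as np


def fuse_weighted_average(a: np.ndarray, b: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Pixel-level rule: weighted average of two co-registered images."""
    return w * a.astype(float) + (1.0 - w) * b.astype(float)


def fuse_choose_max_abs(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Choose-max rule, usually applied to high-frequency (detail) coefficients of a
    multi-scale decomposition: keep whichever source has the larger magnitude."""
    return np.where(np.abs(a) >= np.abs(b), a, b)


a = np.array([[10.0, -3.0], [0.0, 5.0]])
b = np.array([[2.0, 7.0], [-1.0, -5.0]])
print(fuse_weighted_average(a, b))
print(fuse_choose_max_abs(a, b))
```

Averaging suppresses noise but also blurs contrast, whereas choose-max preserves the strongest detail from either source, which is why the two are often combined (average for low-frequency bands, choose-max for high-frequency bands).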
Funding, Guide-YNet lung tumor segmentation paper: supported in part by the National Natural Science Foundation of China (Grant No. 62062003) and the Natural Science Foundation of Ningxia (Grant No. 2023AAC03293).
Funding, traffic scene captioning paper: funded by (i) the Natural Science Foundation of China (NSFC) under Grant Nos. 61402397, 61263043, 61562093, and 61663046; (ii) the Open Foundation of the Key Laboratory in Software Engineering of Yunnan Province, No. 2020SE304; and (iii) the Practical Innovation Project of Yunnan University, Project Nos. 2021z34, 2021y128, and 2021y129.
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 61971078) and the Chongqing Education Commission Science and Technology Major Project (No. KJZD-M202301901).
Abstract: Single-modal visible or infrared images provide limited information: infrared light captures significant thermal radiation data, whereas visible light excels at presenting detailed texture. Combining images from both modalities leverages their respective strengths and mitigates their individual limitations, producing high-quality images with enhanced contrast and rich texture detail. Such capabilities hold promise for advanced visual tasks including target detection, instance segmentation, military surveillance, and pedestrian detection. This paper introduces a dual-branch decomposition fusion network based on an AutoEncoder (AE), which decomposes multi-modal features into intensity and texture information for enhanced fusion. A local contrast enhancement module (CEM) and a texture detail enhancement module (DEM) are devised to process the decomposed images, followed by image fusion through the decoder. The proposed loss function ensures effective retention of key information from the source images of both modalities. Extensive comparison and generalization experiments demonstrate the superior performance of our network in preserving the pixel intensity distribution and retaining texture details. The qualitative results show advantages in fusion detail and local contrast. In the quantitative experiments, entropy (EN), mutual information (MI), structural similarity (SSIM), and other metrics improved and, overall, exceeded the state-of-the-art (SOTA) models.
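Entropy (EN) is one of the reported fusion metrics. A common definition is EN = -Σ p_l log2 p_l, where p_l is the fraction of pixels at grey level l in the fused image. The sketch below assumes an 8-bit grayscale image flattened into a Python list; it illustrates the metric only and is not the authors' evaluation code.

```python
import math
from collections import Counter
from typing import Sequence


def image_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the grey-level histogram of a flattened image."""
    if len(pixels) == 0:
        raise ValueError("image must contain at least one pixel")
    n = len(pixels)
    entropy = 0.0
    for count in Counter(pixels).values():
        p = count / n                 # probability of this grey level
        entropy -= p * math.log2(p)   # grey levels that never occur contribute nothing
    return entropy


# Two equally frequent grey levels carry one bit of information per pixel.
print(image_entropy([0, 255, 0, 255]))
```

Higher EN indicates that the fused image spreads its pixels over more grey levels, which is usually read as carrying more information.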
Abstract: In the Internet era, widely used web applications have become targets of hacker attacks because they contain large amounts of personal information. Among these attacks, stealing private data through cross-site scripting (XSS) is one of the most commonly used. Deep learning-based XSS attack detection methods currently have good application prospects; however, they suffer from problems such as a tendency to overfit, a high false alarm rate, and low accuracy. To address these issues, we propose a multi-stage feature extraction and fusion model for XSS detection based on Random Forest feature enhancement. The model uses Random Forests to capture the intrinsic structure and patterns of the data by extracting leaf node indices as features, which are then merged with the original data features to form a feature set with richer information content. Further feature extraction is conducted through three parallel channels. Channel I uses parallel one-dimensional convolutional layers (1D convolutional layers) with different kernel sizes to extract local features at different scales and perform multi-scale feature fusion; Channel II employs one-dimensional max pooling layers (max 1D pooling layers) of various sizes to extract key features from the data; and Channel III extracts global information bidirectionally using a Bi-Directional Long Short-Term Memory network (Bi-LSTM) and incorporates a multi-head attention mechanism to enhance global features. Finally, XSS is effectively classified and predicted by fusing the features of the three channels. To test the effectiveness of the model, we conduct experiments on six datasets, achieving an accuracy of 100% on the UNSW-NB15 dataset and 99.99% on the CICIDS2017 dataset, which is higher than that of existing models.
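Channels I and II can be illustrated with a dependency-free sketch: several 1D convolutions with different kernel lengths (followed by ReLU) and several max-pooling windows are applied to the same feature sequence, and their outputs are concatenated as a simple form of multi-scale fusion. The kernels and window sizes are hypothetical, the Bi-LSTM/attention channel is omitted, and concatenation is assumed as the fusion rule. The Random Forest leaf-index features could be obtained, for example, with scikit-learn's `RandomForestClassifier.apply`, which returns the leaf index reached in each tree; that step is not reproduced here.

```python
from typing import List, Sequence


def conv1d_valid(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1D cross-correlation, which deep learning libraries call convolution."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def max_pool1d(x: Sequence[float], size: int) -> List[float]:
    """Non-overlapping 1D max pooling (stride equal to the window size)."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def multi_scale_features(x: Sequence[float],
                         kernels: Sequence[Sequence[float]],
                         pool_sizes: Sequence[int]) -> List[float]:
    """Concatenate multi-kernel conv branches (Channel I) and multi-window pooling branches (Channel II)."""
    fused: List[float] = []
    for kernel in kernels:                # Channel I: one branch per kernel length
        fused.extend(max(0.0, v) for v in conv1d_valid(x, kernel))  # ReLU activation
    for size in pool_sizes:               # Channel II: one branch per window size
        fused.extend(max_pool1d(x, size))
    return fused                          # fusion by simple concatenation


features = [0.2, 0.9, 0.1, 0.4, 0.7, 0.3]
print(multi_scale_features(features, kernels=[[1.0, -1.0], [0.5, 0.5, 0.5]], pool_sizes=[2, 3]))
```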
Abstract: Manhole cover defect recognition is of significant practical importance because it can accurately identify damaged or missing covers, enabling timely replacement and maintenance. Traditional manhole cover detection techniques focus primarily on detecting the presence of covers rather than classifying defect types. However, manhole cover defects exhibit small inter-class feature differences and large intra-class feature variations, which makes recognition challenging. To improve the classification of manhole cover defect types, we propose a Progressive Dual-Branch Feature Fusion Network (PDBFFN). The baseline backbone network adopts a multi-stage hierarchical architecture with ResNet-50 as the visual feature extractor, from which both local and global information is obtained. In addition, a Feature Enhancement Module (FEM) and a Fusion Module (FM) are introduced to strengthen the network's ability to learn critical features. Experimental results show that our model achieves a classification accuracy of 82.6% on a manhole cover defect dataset, outperforming several state-of-the-art fine-grained image classification models.
Funding: Supported by the Sichuan Science and Technology Program under Grants No. 2022YFQ0052 and No. 2021YFQ0009.
Abstract: Knowledge embedding methods are currently widely used in knowledge graph (KG) reasoning and have been applied successfully to KGs with large numbers of entities and relations. However, research and production environments contain many KGs with few entities and relations, called sparse KGs. Because of the limited performance of knowledge extraction methods and other factors (for example, some common-sense information never appears in natural corpora), the relations between entities are often incomplete. To solve this problem, a method based on a graph neural network and information enhancement is proposed. When the sparsity of the FB15K-237 dataset is 10%, the improved method increases the mean reciprocal rank (MRR) and Hits@3 by 1.6% and 1.7%, respectively. When the sparsity is 50%, MRR and Hits@10 increase by 0.8% and 1.8%, respectively.
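MRR and Hits@k are computed from the rank of the correct entity for each test query. The sketch below assumes the ranks have already been produced (for example, under the usual filtered setting); it shows only the metric arithmetic, not the proposed graph neural network.

```python
from typing import Dict, Sequence, Tuple


def mrr_and_hits(ranks: Sequence[int],
                 ks: Sequence[int] = (1, 3, 10)) -> Tuple[float, Dict[int, float]]:
    """MRR and Hits@k from 1-based ranks of the correct entity for each test query."""
    if not ranks or min(ranks) < 1:
        raise ValueError("ranks must be non-empty and 1-based")
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n                        # mean of reciprocal ranks
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}  # fraction ranked in the top k
    return mrr, hits


mrr, hits = mrr_and_hits([1, 2, 4, 10])
print(mrr, hits[3], hits[10])
```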
Funding: Supported by the National Basic Research Program of China ("973" Program) (Grant No. 2011CB706805) and the National Natural Science Foundation of China (Grant No. 51035007).
Abstract: The difficulty of selecting the best system parameters restricts the engineering application of stochastic resonance (SR). This study proposes an adaptive cascade stochastic resonance (ACSR) method. It introduces correlation theory into SR and uses the correlation coefficient of the input signal and noise as a weight to construct a weighted signal-to-noise ratio (WSNR) index. This alleviates the influence of high-frequency noise and improves on the signal-to-noise ratio index used in traditional SR. ACSR with WSNR obtains optimal parameters adaptively, and the exact frequency of the target signal does not need to be known in advance. In addition, through secondary utilization of noise, ACSR makes the output waveform smoother and the fluctuation period more obvious. A simulation example and an engineering application to gearbox fault diagnosis demonstrate the effectiveness and feasibility of the proposed method.
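The abstract does not give the ACSR equations, so the following is only a sketch of the building blocks it names, under stated assumptions: a classic bistable SR system dx/dt = a·x - b·x³ + u(t), integrated with a forward Euler step; a cascade that feeds each stage's output into the next; and a plain Pearson correlation coefficient. The parameters a, b, and the step h are hypothetical defaults, and how ACSR combines the correlation coefficient with SNR to form WSNR is not reproduced.

```python
import math
import random
from typing import List, Sequence


def bistable_sr(u: Sequence[float], a: float = 1.0, b: float = 1.0,
                h: float = 0.01) -> List[float]:
    """Forward-Euler integration of the bistable system dx/dt = a*x - b*x**3 + u(t)."""
    x = 0.0
    out = []
    for u_n in u:
        x += h * (a * x - b * x ** 3 + u_n)
        out.append(x)
    return out


def cascade_sr(u: Sequence[float], stages: int = 2, **params: float) -> List[float]:
    """Cascade SR: each stage's output becomes the next stage's input."""
    y = list(u)
    for _ in range(stages):
        y = bistable_sr(y, **params)
    return y


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Weak low-frequency sine buried in Gaussian noise (illustrative values only).
random.seed(0)
clean = [0.3 * math.sin(2 * math.pi * 0.005 * n) for n in range(4000)]
noisy = [s + random.gauss(0.0, 0.5) for s in clean]
output = cascade_sr(noisy, stages=2)
print(pearson(clean, noisy), pearson(clean, output))
```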
Funding: Funded by the Natural Science Foundation of China (Grant No. 202204120017) and the Autonomous Region Science and Technology Program (Grant Nos. 2022B01008-2 and 2020A02001-1).
Abstract: With the development of social media and the prevalence of mobile devices, an increasing number of people use social media platforms to express their opinions and attitudes, leading to many online controversies. These controversies can severely threaten social stability, making their automatic detection particularly necessary. Most current controversy detection methods focus on mining features from text semantics and propagation structures. However, these methods have two drawbacks: 1) a limited ability to capture structural features and a failure to learn deeper structural features, and 2) neglect of the influence of topic information and ineffective use of topic features. In light of these issues, this paper proposes a social media controversy detection method called the Dual Feature Enhanced Graph Convolutional Network (DFE-GCN). The method explores structural information at different scales from global and local perspectives to capture deeper structural features, enhancing their expressive power. Furthermore, to strengthen the influence of topic information, it uses attention mechanisms to enhance topic features after each graph convolutional layer, making effective use of topic information. We validated our method on two public datasets, and the experimental results show that it achieves state-of-the-art performance compared with baseline methods. On the Weibo and Reddit datasets, accuracy improves by 5.92% and 3.32%, respectively, and the F1 score by 1.99% and 2.17%, demonstrating the positive impact of enhanced structural and topic features on controversy detection.
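DFE-GCN builds on graph convolution. For reference, the sketch below implements one layer of the widely used GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) (Kipf and Welling) on dense nested lists. The global/local structural branches and the topic attention of DFE-GCN are not shown, and the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) on dense nested lists."""
    n = len(adj)
    out_dim = len(w[0])
    # A_hat = A + I adds a self-loop so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Linear transform of node features: H W.
    hw = [[sum(h[i][k] * w[k][c] for k in range(len(w))) for c in range(out_dim)]
          for i in range(n)]
    # Neighbourhood aggregation followed by ReLU.
    return [[max(0.0, sum(norm[i][j] * hw[j][c] for j in range(n))) for c in range(out_dim)]
            for i in range(n)]


# Two connected nodes with scalar features and an identity weight.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```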
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 51132004, 11474096, and 11604199; the Science and Technology Commission of Shanghai Municipality under Grant No. 14JC1401500; and the Higher Education Key Program of He'nan Province under Grant Nos. 17A140025 and 16A140030.
Abstract: The femtosecond pulse shaping technique has been shown to be an effective method for controlling multi-photon absorption through the light–matter interaction. Previous studies mainly focused on quantum coherent control of multi-photon absorption by phase, amplitude, and polarization modulation, but how the coherent features of multi-photon absorption depend on the energy level structure, the laser spectral bandwidth, and the laser central frequency still lacks in-depth systematic research. In this work, we further explore the coherent features of resonance-mediated two-photon absorption in a rubidium atom by varying the energy level structure and the spectral bandwidth and central frequency of the femtosecond laser field. The theoretical results show that changing the intermediate-state detuning can effectively influence the enhancement of the near-resonant part, which in turn affects the transform-limited (TL)-normalized maximum final-state population. Moreover, as the laser spectral bandwidth increases, the TL-normalized maximum final-state population can be effectively enhanced owing to the increased enhancement of the near-resonant part, whereas it remains constant when the laser central frequency is varied. These studies provide a clear physical picture for understanding the coherent features of resonance-mediated two-photon absorption and theoretical guidance for future applications.
Funding: Financially supported by Universiti Sains Malaysia (USM) under FRGS Grant Number FRGS/1/2020/TK03/USM/02/1, and by the School of Computer Sciences, USM.
Abstract: The detection of brain disease is an essential issue in medicine and research. Deep learning techniques have shown promising results in detecting and diagnosing brain diseases from magnetic resonance imaging (MRI) images. These techniques train neural networks on large datasets of MRI images, allowing the networks to learn patterns and features indicative of different brain diseases. However, several challenges and limitations still need to be addressed to further improve the accuracy and effectiveness of these techniques. This paper implements a Feature Enhanced Stacked Auto Encoder (FESAE) model to detect brain diseases. The results of a standard stacked auto encoder are trivial and not robust enough to boost the system's accuracy. Therefore, the standard Stacked Auto Encoder (SAE) is replaced with a Stacked Feature Enhanced Auto Encoder whose feature enhancement function efficiently and effectively obtains non-trivial features with less activation energy from an image. The proposed model consists of four stages. First, pre-processing is performed to remove noise, and the greyscale image is converted to Red, Green, and Blue (RGB) to enhance feature details for discriminative feature extraction. Second, feature extraction is performed to obtain significant features for classification using the Discrete Wavelet Transform (DWT) and Channelization. Third, classification is performed to assign MRI images to four major classes: Normal, Tumor, Brain Stroke, and Alzheimer's. Finally, the FESAE model outperforms state-of-the-art machine learning and deep learning methods such as Artificial Neural Network (ANN), SAE, Random Forest (RF), and Logistic Regression (LR), achieving a high accuracy of 98.61% on a dataset of 2000 MRI images. The proposed model has significant potential to help radiologists diagnose brain diseases more accurately and improve patient outcomes.
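The abstract names the Discrete Wavelet Transform without specifying the wavelet. Assuming the Haar wavelet, one level of a 2D DWT splits an image into an approximation sub-band (LL) and three detail sub-bands, as sketched below with plain Python lists. Sub-band naming conventions vary between libraries, and the Channelization step is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def haar_1d(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Single-level orthonormal Haar transform of an even-length sequence."""
    if len(x) % 2:
        raise ValueError("length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def _columns(mat: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Apply haar_1d to every column; return (low-pass, high-pass) matrices."""
    lows, highs = zip(*(haar_1d(col) for col in zip(*mat)))
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(img: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of a 2D Haar DWT: transform rows first, then columns."""
    row_lows, row_highs = zip(*(haar_1d(row) for row in img))
    ll, lh = _columns(row_lows)    # low-pass along rows, then low/high along columns
    hl, hh = _columns(row_highs)   # high-pass along rows, then low/high along columns
    return ll, lh, hl, hh


ll, lh, hl, hh = haar_2d([[1, 2], [3, 4]])
print(ll, lh, hl, hh)
```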
Abstract: In the field of medical image diagnosis, the challenge lies in tracking and identifying defective cells and the extent of the defective region within the complex structure of the brain cavity. Locating defective cells precisely during diagnosis helps to fight the greatest exterminator of mankind. Early detection of these defective cells requires an accurate computer-aided diagnostic (CAD) system that supports early treatment and improves patient survival rates. Earlier CAD systems relied heavily on the expertise of radiologists and took more time to identify the defective region. This manuscript exploits the efficacy of coalescing features such as the intensity, shape, and texture of the magnetic resonance image (MRI). In the Enhanced Feature Fusion Segmentation based classification method (EEFS), the image is enhanced and segmented to extract prominent features. To achieve this, EEFS uses the Enhanced Local Binary Pattern (EnLBP), the Partisan Gray Level Co-occurrence Matrix Histogram of Oriented Gradients (PGLCMHOG), and the iGrab cut method to segment the image. These prominent features, together with deep features, are coalesced into a single-dimensional feature vector that is used effectively for prediction. The coalesced vector is fed to existing classifiers, and their results are compared with those obtained using the generated vector. The generated vector yields promising results with commendably low computational time for pre-processing and classification of MR medical images.
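EnLBP is the paper's enhanced variant, and its details are not given in the abstract. For orientation, the sketch below computes the basic 3×3 Local Binary Pattern: each of the eight neighbours contributes a 1 bit if it is at least as bright as the centre pixel, and the codes of interior pixels are collected into a 256-bin histogram texture descriptor. The clockwise bit order starting at the top-left neighbour is one common convention among several.

```python
from typing import List, Sequence

# Neighbour offsets, clockwise from the top-left; the first becomes the most significant bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Sequence[Sequence[int]], r: int, c: int) -> int:
    """Basic 8-neighbour LBP code of interior pixel (r, c)."""
    center = img[r][c]
    code = 0
    for dr, dc in OFFSETS:
        code = (code << 1) | (1 if img[r + dr][c + dc] >= center else 0)
    return code


def lbp_histogram(img: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels (borders skipped)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [[9, 0, 0],
         [0, 5, 0],
         [0, 0, 0]]
print(lbp_code(patch, 1, 1))
```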
Funding: Supported by the National Natural Science Foundation of China (Grant No. 32301691), the National Key R&D Program of China and Shandong Province, China (Grant No. 2021YFB3901300), and the National Precision Agriculture Application Project (Grant/Contract No. JZNYYY001).
Abstract: Using the spatiotemporal features contained in extensive trajectory data to identify the operation modes of agricultural machinery is an important basic task for subsequent agricultural machinery trajectory research. In this study, to identify agricultural machinery operation modes effectively, a feature deformation network with multi-range feature enhancement was proposed. First, a multi-range feature enhancement module was developed to fully explore the feature distribution of agricultural machinery trajectory data. Second, to further enrich the representation of trajectories, a feature deformation module was proposed that maps trajectory points to a high-dimensional space to form feature maps. Then, EfficientNet-B0 was used to extract features of different scales and depths from the feature map, select features highly relevant to the results, and finally predict the mode of each trajectory point accurately. To validate the effectiveness of the proposed method, its results were compared with those of other methods on a dataset of real agricultural trajectories. On the corn and wheat harvester trajectory datasets, the model achieved accuracies of 96.88% and 96.68% and F1 scores of 93.54% and 94.19%, improvements of 8.35% and 9.08% in accuracy and 20.99% and 20.04% in F1 score over the current state-of-the-art method.
Funding: Anhui Provincial Key Research and Development Project (No. 202004a07020050) and National Natural Science Foundation of China Youth Program (No. 61901006).
Abstract: Extracting water bodies is essential for monitoring water resources, ecosystem services, and the hydrological cycle, so analyzing water bodies in remote sensing images is necessary. Water indices are designed to highlight water bodies in remote sensing images. We employ a new water index and digital image processing techniques to extract water bodies automatically and accurately from Landsat 8 OLI images. First, we preprocess the Landsat 8 OLI images with radiometric calibration and atmospheric correction. Next, we apply the KT transformation, the LBV transformation, AWEI_nsh, and the HIS transformation to the preprocessed image to calculate the new water index. Then, we apply linear feature enhancement and an improved local adaptive threshold segmentation method to extract small water bodies accurately. Meanwhile, we apply morphological enhancement and the improved local adaptive threshold segmentation method to extract large water bodies. Finally, we combine the small and large water bodies to obtain complete water bodies. Compared with other traditional methods, our method has clear advantages in water extraction, particularly in extracting small water bodies.
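The paper's new index combines KT, LBV, AWEI_nsh, and HIS components in a way the abstract does not specify. As a partial sketch, the code below computes only the standard AWEI_nsh index (Feyisa et al., 2014), AWEI_nsh = 4(Green - SWIR1) - (0.25 NIR + 2.75 SWIR2), which for Landsat 8 OLI uses bands 3, 5, 6, and 7, and then applies a plain local-mean threshold as a stand-in for the authors' improved local adaptive thresholding. The window size and offset are hypothetical.

```python
from typing import List, Sequence


def awei_nsh(green: float, nir: float, swir1: float, swir2: float) -> float:
    """AWEI_nsh on surface reflectance (Landsat 8 OLI: B3, B5, B6, B7)."""
    return 4.0 * (green - swir1) - (0.25 * nir + 2.75 * swir2)


def local_mean_threshold(index: Sequence[Sequence[float]], window: int = 3,
                         offset: float = 0.0) -> List[List[bool]]:
    """Label a pixel as water if its index exceeds its local window mean plus an offset."""
    rows, cols = len(index), len(index[0])
    half = window // 2
    mask = []
    for r in range(rows):
        row_mask = []
        for c in range(cols):
            # Window clipped at the image border.
            vals = [index[i][j]
                    for i in range(max(0, r - half), min(rows, r + half + 1))
                    for j in range(max(0, c - half), min(cols, c + half + 1))]
            row_mask.append(index[r][c] > sum(vals) / len(vals) + offset)
        mask.append(row_mask)
    return mask


# Illustrative reflectances: water is dark in NIR/SWIR, so the index is positive here.
print(awei_nsh(green=0.10, nir=0.04, swir1=0.05, swir2=0.02))
```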
Funding: This work is supported in part by the National Key R&D Program of China (2018AAA0102200), the National Natural Science Foundation of China (62002375, 62002376, 62132021), the Natural Science Foundation of Hunan Province of China (2021RC3071, 2022RC1104, 2021JJ40696), and NUDT Research Grants (ZK22-52).
Abstract: Good initial proposals are critical for 3D object detection applications. However, because of the significant geometric variation of indoor scenes, incomplete and noisy proposals are inevitable in most cases. Mining feature information from these “bad” proposals may mislead detection. Contrastive learning offers a feasible way to represent proposals, since it can align complete and incomplete/noisy proposals in feature space. The aligned feature space helps build robust 3D representations even when bad proposals are given. We therefore devise a new contrastive learning framework for indoor 3D object detection, called EFECL, which learns robust 3D representations through contrastive learning of proposals at two levels. Specifically, we optimize both instance-level and category-level contrasts to align features by capturing instance-specific characteristics and semantic-aware common patterns. Furthermore, we propose an enhanced feature aggregation module to extract more general and informative features for contrastive learning. Evaluations on the ScanNet V2 and SUN RGB-D benchmarks demonstrate the generalizability and effectiveness of our method, which achieves improvements of 12.3% and 7.3% on the two datasets over the benchmark alternatives. The code and models are publicly available at https://github.com/YaraDuan/EFECL.
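EFECL's exact loss is not given in the abstract. A common way to pull aligned proposal features together is the InfoNCE objective, sketched below for a single anchor: the anchor (for instance, the feature of an incomplete proposal) should be more similar to its positive (a complete proposal of the same object) than to the negatives. The pairing scheme, temperature, and function names are assumptions for illustration.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """-log( exp(s_pos/t) / (exp(s_pos/t) + sum_k exp(s_k/t)) ) with cosine similarities."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # log-sum-exp with the max subtracted for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]


# Anchor close to its positive and far from the negative gives a small loss.
print(info_nce([1.0, 0.1], [0.9, 0.2], [[-0.2, 1.0]]))
```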
Funding: Funded by the National Natural Science Foundation of China, Fund Number 61703424.
Abstract: This paper proposes a real-time detection method, the Infrared Small Target Detection CenterNet (ISTD-CenterNet), an improved CenterNet network for detecting small infrared targets in complex environments. The method eliminates the need for anchor boxes, addressing the problems of low accuracy and slow speed. HRNet is used as the feature extraction framework, and an ECBAM attention module is added to each stage branch to identify the positions of small targets and salient objects intelligently. A scale enhancement module is also added to obtain a high-level semantic representation and a fine-resolution prediction map for the entire infrared image. In addition, an improved receptive field enhancement module is designed to exploit semantic information in low-resolution feature maps, and a convolutional attention mechanism module is used to increase network stability and convergence speed. Comparison experiments were conducted on the infrared small target dataset ESIRST. They show that, compared with the benchmark network CenterNet-HRNet, the proposed ISTD-CenterNet improves recall by 22.85% and detection accuracy by 13.36%. Compared with the state-of-the-art YOLOv5small, ISTD-CenterNet improves recall by 5.88% and detection precision by 2.33%, and it runs at 48.94 frames/s, achieving accurate real-time detection of small infrared targets.
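As background on the anchor-free design, CenterNet predicts a per-class centre heatmap and decodes detections by keeping pixels that are local maxima in a 3×3 neighbourhood, which replaces anchor matching and box-level NMS. The sketch below shows that decoding step on a single-class heatmap; the score threshold and top-k values are hypothetical, and the size/offset regression heads and ISTD-CenterNet's added modules are omitted.

```python
from typing import List, Sequence, Tuple


def heatmap_peaks(heat: Sequence[Sequence[float]], threshold: float = 0.3,
                  k: int = 10) -> List[Tuple[float, int, int]]:
    """Return up to k (score, row, col) peaks that are 3x3 local maxima above threshold."""
    rows, cols = len(heat), len(heat[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = heat[r][c]
            if v < threshold:
                continue
            neighbourhood = [heat[i][j]
                             for i in range(max(0, r - 1), min(rows, r + 2))
                             for j in range(max(0, c - 1), min(cols, c + 2))]
            if v == max(neighbourhood):  # same effect as comparing with a 3x3 max-pooled map
                peaks.append((v, r, c))
    peaks.sort(reverse=True)  # highest scores first
    return peaks[:k]


heat = [[0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.2, 0.0],
        [0.1, 0.2, 0.1, 0.6]]
print(heatmap_peaks(heat))
```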
Funding: Supported by the Scientific Research Project of Tianjin Education Commission (Nos. 2020KJ091, 2018KJ184), the National Key Research and Development Program of China (No. 2020YFD0900600), the Earmarked Fund for CARS (No. CARS-47), and the Tianjin Mariculture Industry Technology System Innovation Team Construction Project (No. ITTMRS2021000).
Abstract: Sea cucumber detection is widely recognized as the key to automated aquaculture. The underwater light environment is complex, and targets are easily obscured by mud, sand, reefs, and other underwater organisms. To date, research on sea cucumber detection has mostly concentrated on distinguishing prospective objects from the background; however, the key to proper distinction is the effective extraction of sea cucumber feature information. In this study, an edge-enhanced scaling You Only Look Once v4 (YOLOv4) model (ESYv4) was proposed for sea cucumber detection. To emphasize target features and thereby reduce misjudgments of sea cucumbers caused by varying underwater hues and brightness, a bidirectional cascade network (BDCN) was used to extract an overall edge greyscale image, which was added to the original RGB image to form the detection input. Meanwhile, the YOLOv4 backbone detection model was scaled, reducing the number of parameters to 48% of the original. Validation results on 783 images indicated that the detection precision for positive sea cucumber samples reached 0.941. This result reflects the algorithm's effectiveness in enhancing the edge feature information of the target, and it thus contributes to automatic multi-objective detection of underwater sea cucumbers.
Funding: Supported by the Future Network Scientific Research Fund Project of Jiangsu Province (No. FNSRFP2021YB26), the Jiangsu Key R&D Fund on Social Development (No. BE2022789), and the Science Foundation of Nanjing Institute of Technology (No. ZKJ202003).
Abstract: Facial expression recognition (FER) in video has attracted increasing interest, and many approaches have been proposed. The crucial problem in classifying a given video sequence into several basic emotions is how to fuse the facial features of individual frames. In this paper, a frame-level attention module is integrated into an improved VGG-based framework, and a lightweight facial expression recognition method is proposed. The proposed network takes a sub-video cut from an experimental video sequence as input and generates a fixed-dimension representation. The VGG-based network with an enhanced branch embeds face images into feature vectors. The frame-level attention module learns weights that are used to adaptively aggregate the feature vectors into a single discriminative video representation. Finally, a regression module outputs the classification results. Experimental results on the CK+ and AFEW databases show that the proposed method achieves state-of-the-art recognition rates.
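The frame-level attention module is described only at a high level. A minimal sketch, assuming each frame receives a scalar score from a linear scorer and the scores are normalised with a softmax, shows how per-frame feature vectors can be aggregated into one fixed-dimension video representation. The scorer weights and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(frame_feats: Sequence[Sequence[float]],
                   score_weights: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Softmax-weighted sum of per-frame features; returns (video_vector, frame_weights)."""
    # One scalar relevance score per frame from a linear scorer.
    scores = [sum(f * w for f, w in zip(feat, score_weights)) for feat in frame_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # max subtracted for numerical stability
    total = sum(exps)
    alphas = [e / total for e in exps]         # attention weights sum to 1
    dim = len(frame_feats[0])
    video = [sum(a * feat[d] for a, feat in zip(alphas, frame_feats)) for d in range(dim)]
    return video, alphas


frames = [[1.0, 0.0], [3.0, 2.0], [0.5, 0.5]]
video, weights = attention_pool(frames, score_weights=[0.2, 0.8])
print(video, weights)
```

Because the weights sum to 1, the output has the same dimension as a single frame feature regardless of how many frames the sub-video contains.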
Abstract: Image fusion has become an important area of research. In remote sensing, using the same image sensor in different working modes, or different image sensors, can provide reinforcing or complementary information. It is therefore highly valuable to fuse outputs from multiple sensors (or from the same sensor in different working modes) to improve the overall quality of remote sensing images, which benefits both human visual perception and image processing tasks. Accordingly, this paper first provides a comprehensive survey of state-of-the-art multi-sensor image fusion methods at three levels: pixel-level, feature-level, and decision-level fusion. It then gives an overview of existing fusion strategies and summarizes existing fusion quality measures. Finally, it analyzes development trends in fusion algorithms that may encourage researchers to explore this field further.