Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot.However,there are some problems that the features of different sizes and different directions are not sufficient when extracting the f...Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot.However,there are some problems that the features of different sizes and different directions are not sufficient when extracting the features in lung X-ray images.A pneumonia classification model based on multi-scale directional feature enhancement MSD-Net is proposed in this paper.The main innovations are as follows:Firstly,the Multi-scale Residual Feature Extraction Module(MRFEM)is designed to effectively extract multi-scale features.The MRFEM uses dilated convolutions with different expansion rates to increase the receptive field and extract multi-scale features effectively.Secondly,the Multi-scale Directional Feature Perception Module(MDFPM)is designed,which uses a three-branch structure of different sizes convolution to transmit direction feature layer by layer,and focuses on the target region to enhance the feature information.Thirdly,the Axial Compression Former Module(ACFM)is designed to perform global calculations to enhance the perception ability of global features in different directions.To verify the effectiveness of the MSD-Net,comparative experiments and ablation experiments are carried out.In the COVID-19 RADIOGRAPHY DATABASE,the Accuracy,Recall,Precision,F1 Score,and Specificity of MSD-Net are 97.76%,95.57%,95.52%,95.52%,and 98.51%,respectively.In the chest X-ray dataset,the Accuracy,Recall,Precision,F1 Score and Specificity of MSD-Net are 97.78%,95.22%,96.49%,95.58%,and 98.11%,respectively.This model improves the accuracy of lung image recognition effectively and provides an important clinical reference to pneumonia Computer-Aided Diagnosis.展开更多
In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract i...In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.展开更多
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
In a crowd density estimation dataset,the annotation of crowd locations is an extremely laborious task,and they are not taken into the evaluation metrics.In this paper,we aim to reduce the annotation cost of crowd dat...In a crowd density estimation dataset,the annotation of crowd locations is an extremely laborious task,and they are not taken into the evaluation metrics.In this paper,we aim to reduce the annotation cost of crowd datasets,and propose a crowd density estimation method based on weakly-supervised learning,in the absence of crowd position supervision information,which directly reduces the number of crowds by using the number of pedestrians in the image as the supervised information.For this purpose,we design a new training method,which exploits the correlation between global and local image features by incremental learning to train the network.Specifically,we design a parent-child network(PC-Net)focusing on the global and local image respectively,and propose a linear feature calibration structure to train the PC-Net simultaneously,and the child network learns feature transfer factors and feature bias weights,and uses the transfer factors and bias weights to linearly feature calibrate the features extracted from the Parent network,to improve the convergence of the network by using local features hidden in the crowd images.In addition,we use the pyramid vision transformer as the backbone of the PC-Net to extract crowd features at different levels,and design a global-local feature loss function(L2).We combine it with a crowd counting loss(LC)to enhance the sensitivity of the network to crowd features during the training process,which effectively improves the accuracy of crowd density estimation.The experimental results show that the PC-Net significantly reduces the gap between fullysupervised and weakly-supervised crowd density estimation,and outperforms the comparison methods on five datasets of Shanghai Tech Part A,ShanghaiTech Part B,UCF_CC_50,UCF_QNRF and JHU-CROWD++.展开更多
With the development of sensors,the application of multi-source remote sensing data has been widely concerned.Since hyperspectral image(HSI)contains rich spectral information while light detection and ranging(LiDAR)da...With the development of sensors,the application of multi-source remote sensing data has been widely concerned.Since hyperspectral image(HSI)contains rich spectral information while light detection and ranging(LiDAR)data contains elevation information,joint use of them for ground object classification can yield positive results,especially by building deep networks.Fortu-nately,multi-scale deep networks allow to expand the receptive fields of convolution without causing the computational and training problems associated with simply adding more network layers.In this work,a multi-scale feature fusion network is proposed for the joint classification of HSI and LiDAR data.First,we design a multi-scale spatial feature extraction module with cross-channel connections,by which spatial information of HSI data and elevation information of LiDAR data are extracted and fused.In addition,a multi-scale spectral feature extraction module is employed to extract the multi-scale spectral features of HSI data.Finally,joint multi-scale features are obtained by weighting and concatenation operations and then fed into the classifier.To verify the effective-ness of the proposed network,experiments are carried out on the MUUFL Gulfport and Trento datasets.The experimental results demonstrate that the classification performance of the proposed method is superior to that of other state-of-the-art methods.展开更多
Copy-Move Forgery Detection(CMFD)is a technique that is designed to identify image tampering and locate suspicious areas.However,the practicality of the CMFD is impeded by the scarcity of datasets,inadequate quality a...Copy-Move Forgery Detection(CMFD)is a technique that is designed to identify image tampering and locate suspicious areas.However,the practicality of the CMFD is impeded by the scarcity of datasets,inadequate quality and quantity,and a narrow range of applicable tasks.These limitations significantly restrict the capacity and applicability of CMFD.To overcome the limitations of existing methods,a novel solution called IMTNet is proposed for CMFD by employing a feature decoupling approach.Firstly,this study formulates the objective task and network relationship as an optimization problem using transfer learning.Furthermore,it thoroughly discusses and analyzes the relationship between CMFD and deep network architecture by employing ResNet-50 during the optimization solving phase.Secondly,a quantitative comparison between fine-tuning and feature decoupling is conducted to evaluate the degree of similarity between the image classification and CMFD domains by the enhanced ResNet-50.Finally,suspicious regions are localized using a feature pyramid network with bottom-up path augmentation.Experimental results demonstrate that IMTNet achieves faster convergence,shorter training times,and favorable generalization performance compared to existingmethods.Moreover,it is shown that IMTNet significantly outperforms fine-tuning based approaches in terms of accuracy and F_(1).展开更多
The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prosta...The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.展开更多
Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea.Traditional tea-picking machines may compromise the quality of the tea leaves.High-quality teas are often...Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea.Traditional tea-picking machines may compromise the quality of the tea leaves.High-quality teas are often handpicked and need more delicate operations in intelligent picking machines.Compared with traditional image processing techniques,deep learning models have stronger feature extraction capabilities,and better generalization and are more suitable for practical tea shoot harvesting.However,current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation tasks.We propose a tea shoot instance segmentation model based on multi-scale mixed attention(Mask2FusionNet)using a dataset from the tea garden in Hangzhou.We further analyzed the characteristics of the tea shoot dataset,where the proportion of small to medium-sized targets is 89.9%.Our algorithm is compared with several mainstream object segmentation algorithms,and the results demonstrate that our model achieves an accuracy of 82%in recognizing the tea shoots,showing a better performance compared to other models.Through ablation experiments,we found that ResNet50,PointRend strategy,and the Feature Pyramid Network(FPN)architecture can improve performance by 1.6%,1.4%,and 2.4%,respectively.These experiments demonstrated that our proposed multi-scale and point selection strategy optimizes the feature extraction capability for overlapping small targets.The results indicate that the proposed Mask2FusionNet model can perform the shoot segmentation in unstructured environments,realizing the individual distinction of tea shoots,and complete extraction of the shoot edge contours with a segmentation accuracy of 82.0%.The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.展开更多
In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) ba...In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) based on the maximum margin criterion(MMC) is proposed for recognizing the class of ship targets utilizing the high-resolution range profile(HRRP). Multi-scale fusion is introduced to capture the local and detailed information in small-scale features, and the global and contour information in large-scale features, offering help to extract the edge information from sea clutter and further improving the target recognition accuracy. The proposed method can maximally preserve the multi-scale fusion sparse of data and maximize the class separability in the reduced dimensionality by reproducing kernel Hilbert space. Experimental results on the measured radar data show that the proposed method can effectively extract the features of ship target from sea clutter, further reduce the feature dimensionality, and improve target recognition performance.展开更多
In modern electromagnetic environment, radar emitter signal recognition is an important research topic. On the basis of multi-resolution wavelet analysis, an adaptive radar emitter signal recognition method based on m...In modern electromagnetic environment, radar emitter signal recognition is an important research topic. On the basis of multi-resolution wavelet analysis, an adaptive radar emitter signal recognition method based on multi-scale wavelet entropy feature extraction and feature weighting was proposed. With the only priori knowledge of signal to noise ratio(SNR), the method of extracting multi-scale wavelet entropy features of wavelet coefficients from different received signals were combined with calculating uneven weight factor and stability weight factor of the extracted multi-dimensional characteristics. Radar emitter signals of different modulation types and different parameters modulated were recognized through feature weighting and feature fusion. Theoretical analysis and simulation results show that the presented algorithm has a high recognition rate. Additionally, when the SNR is greater than-4 d B, the correct recognition rate is higher than 93%. Hence, the proposed algorithm has great application value.展开更多
Feature extraction of signals plays an important role in classification problems because of data dimension reduction property and potential improvement of a classification accuracy rate. Principal component analysis (...Feature extraction of signals plays an important role in classification problems because of data dimension reduction property and potential improvement of a classification accuracy rate. Principal component analysis (PCA), wavelets transform or Fourier transform methods are often used for feature extraction. In this paper, we propose a multi-scale PCA, which combines discrete wavelet transform, and PCA for feature extraction of signals in both the spatial and temporal domains. Our study shows that the multi-scale PCA combined with the proposed new classification methods leads to high classification accuracy for the considered signals.展开更多
Face detection is applied to many tasks such as auto focus control, surveillance, user interface, and face recognition. Processing speed and detection accuracy of the face detection have been improved continuously. Th...Face detection is applied to many tasks such as auto focus control, surveillance, user interface, and face recognition. Processing speed and detection accuracy of the face detection have been improved continuously. This paper describes a novel method of fast face detection with multi-scale window search free from image resizing. We adopt statistics of gradient images (SGI) as image features and append an overlapping cell array to improve detection accuracy. The SGI feature is scale invariant and insensitive to small difference of pixel value. These characteristics enable the multi-scale window search without image resizing. Experimental results show that processing speed of our method is 3.66 times faster than a conventional method, adopting HOG features combined to an SVM classifier, without accuracy degradation.展开更多
Three materials(agar,konjac glucomannan(KGM)andκ-carrageenan)were used to prepare ternary systems,i.e.,sol-gels and their dried composites conditioned at varied relative humidity(RH)(33%,54%and 75%).Combined methods,...Three materials(agar,konjac glucomannan(KGM)andκ-carrageenan)were used to prepare ternary systems,i.e.,sol-gels and their dried composites conditioned at varied relative humidity(RH)(33%,54%and 75%).Combined methods,e.g.,scanning electron microscopy,small-angle X-ray scattering,infrared spectroscopy(IR)and X-ray diffraction(XRD),were used to disclose howκ-carrageenan addition tailors the features of agar/KGM/κ-carrageenan ternary system.As affirmed by IR and XRD,the ternary systems withκ-carrageenan below 25%(agar/KGM/carrageenan,50:25:25,m/m)displayed proper component interactions,which increased the sol-gel transition temperature and the hardness of obtained gels.For instance,the ternary composites could show hardness about 3 to 4 times higher than that for binary counterpart.These gels were dehydrated to acquire ternary composites.Compared to agar/KGM composite,the ternary composites showed fewer crystallites and nanoscale orders,and newly-formed nanoscale structures from chain assembly.Such multi-scale structures,for composites withκ-carrageenan below 25%,showed weaker changes with RH,as revealed by especially morphologic and crystalline features.Consequently,the ternary composites with lessκ-carrageenan(below 25%)exhibited stabilized elongation at break and hydrophilicity at different RHs.This hints to us that agar/KGM/κ-carrageenan composite systems can display series applications with improved features,e.g.,increased sol-gel transition point.展开更多
A system for classifying four basic table tennis strokes using wearable devices and deep learning networks is proposed in this study.The wearable device consisted of a six-axis sensor,Raspberry Pi 3,and a power bank.M...A system for classifying four basic table tennis strokes using wearable devices and deep learning networks is proposed in this study.The wearable device consisted of a six-axis sensor,Raspberry Pi 3,and a power bank.Multiple kernel sizes were used in convolutional neural network(CNN)to evaluate their performance for extracting features.Moreover,a multiscale CNN with two kernel sizes was used to perform feature fusion at different scales in a concatenated manner.The CNN achieved recognition of the four table tennis strokes.Experimental data were obtained from20 research participants who wore sensors on the back of their hands while performing the four table tennis strokes in a laboratory environment.The data were collected to verify the performance of the proposed models for wearable devices.Finally,the sensor and multi-scale CNN designed in this study achieved accuracy and F1 scores of 99.58%and 99.16%,respectively,for the four strokes.The accuracy for five-fold cross validation was 99.87%.This result also shows that the multi-scale convolutional neural network has better robustness after fivefold cross validation.展开更多
Background Recurrent recovery is a common method for video super-resolution(VSR)that models the correlation between frames via hidden states.However,the application of this structure in real-world scenarios can lead t...Background Recurrent recovery is a common method for video super-resolution(VSR)that models the correlation between frames via hidden states.However,the application of this structure in real-world scenarios can lead to unsatisfactory artifacts.We found that in real-world VSR training,the use of unknown and complex degradation can better simulate the degradation process in the real world.Methods Based on this,we propose the RealFuVSR model,which simulates real-world degradation and mitigates artifacts caused by the VSR.Specifically,we propose a multiscale feature extraction module(MSF)module that extracts and fuses features from multiple scales,thereby facilitating the elimination of hidden state artifacts.To improve the accuracy of the hidden state alignment information,RealFuVSR uses an advanced optical flow-guided deformable convolution.Moreover,a cascaded residual upsampling module was used to eliminate noise caused by the upsampling process.Results The experiment demonstrates that RealFuVSR model can not only recover high-quality videos but also outperforms the state-of-the-art RealBasicVSR and RealESRGAN models.展开更多
Background:Early singular nodular hepatocellular carcinoma(HCC)is an ideal surgical indication in clinical practice.However,almost half of the patients have tumor recurrence,and there is no reliable prognostic predict...Background:Early singular nodular hepatocellular carcinoma(HCC)is an ideal surgical indication in clinical practice.However,almost half of the patients have tumor recurrence,and there is no reliable prognostic prediction tool.Besides,it is unclear whether preoperative neoadjuvant therapy is necessary for patients with early singular nodular HCC and which patient needs it.It is critical to identify the patients with high risk of recurrence and to treat these patients preoperatively with neoadjuvant therapy and thus,to improve the outcomes of these patients.The present study aimed to develop two prognostic models to preoperatively predict the recurrence-free survival(RFS)and overall survival(OS)in patients with singular nodular HCC by integrating the clinical data and radiological features.Methods:We retrospective recruited 211 patients with singular nodular HCC from December 2009 to January 2019 at Eastern Hepatobiliary Surgery Hospital(EHBH).They all met the surgical indications and underwent radical resection.We randomly divided the patients into the training cohort(n=132)and the validation cohort(n=79).We established and validated multivariate Cox proportional hazard models by the preoperative clinicopathologic factors and radiological features for association with RFS and OS.By analyzing the receiver operating characteristic(ROC)curve,the discrimination accuracy of the models was compared with that of the traditional predictive models.Results:Our RFS model was based on HBV-DNA score,cirrhosis,tumor diameter and tumor capsule in imaging.RFS nomogram had fine calibration and discrimination capabilities,with a C-index of 0.74(95%CI:0.68-0.80).The OS nomogram,based on cirrhosis,tumor diameter and tumor capsule in imaging,had fine calibration and discrimination capabilities,with a C-index of 0.81(95%CI:0.74-0.87).The area under the receiver operating characteristic curve(AUC)of our model was larger than that of traditional liver cancer staging system,Korea model and Nomograms in Hepatectomy Patients with Hepatitis B VirusRelated Hepatocellular Carcinoma,indicating better discrimination capability.According to the models,we fitted the linear prediction equations.These results were validated in the validation cohort.Conclusions:Compared with previous radiography model,the new-developed predictive model was concise and applicable to predict the postoperative survival of patients with singular nodular HCC.Our models may preoperatively identify patients with high risk of recurrence.These patients may benefit from neoadjuvant therapy which may improve the patients’outcomes.展开更多
The hidden danger of the automatic speaker verification(ASV)system is various spoofed speeches.These threats can be classified into two categories,namely logical access(LA)and physical access(PA).To improve identifica...The hidden danger of the automatic speaker verification(ASV)system is various spoofed speeches.These threats can be classified into two categories,namely logical access(LA)and physical access(PA).To improve identification capability of spoofed speech detection,this paper considers the research on features.Firstly,following the idea of modifying the constant-Q-based features,this work considered adding variance or mean to the constant-Q-based cepstral domain to obtain good performance.Secondly,linear frequency cepstral coefficients(LFCCs)performed comparably with constant-Q-based features.Finally,we proposed linear frequency variance-based cepstral coefficients(LVCCs)and linear frequency mean-based cepstral coefficients(LMCCs)for identification of speech spoofing.LVCCs and LMCCs could be attained by adding the frame variance or the mean to the log magnitude spectrum based on LFCC features.The proposed novel features were evaluated on ASVspoof 2019 datase.The experimental results show that compared with known hand-crafted features,LVCCs and LMCCs are more effective in resisting spoofed speech attack.展开更多
Self-oscillating systems abound in the natural world and offer substantial potential for applications in controllers,micro-motors,medical equipments,and so on.Currently,numerical methods have been widely utilized for ...Self-oscillating systems abound in the natural world and offer substantial potential for applications in controllers,micro-motors,medical equipments,and so on.Currently,numerical methods have been widely utilized for obtaining the characteristics of self-oscillation including amplitude and frequency.However,numerical methods are burdened by intricate computations and limited precision,hindering comprehensive investigations into self-oscillating systems.In this paper,the stability of a liquid crystal elastomer fiber self-oscillating system under a linear temperature field is studied,and analytical solutions for the amplitude and frequency are determined.Initially,we establish the governing equations of self-oscillation,elucidate two motion regimes,and reveal the underlying mechanism.Subsequently,we conduct a stability analysis and employ a multi-scale method to obtain the analytical solutions for the amplitude and frequency.The results show agreement between the multi-scale and numerical methods.This research contributes to the examination of diverse self-oscillating systems and advances the theoretical analysis of self-oscillating systems rooted in active materials.展开更多
In industrial process control systems,there is overwhelming evidence corroborating the notion that economic or technical limitations result in some key variables that are very difficult to measure online.The data-driv...In industrial process control systems,there is overwhelming evidence corroborating the notion that economic or technical limitations result in some key variables that are very difficult to measure online.The data-driven soft sensor is an effective solution because it provides a reliable and stable online estimation of such variables.This paper employs a deep neural network with multiscale feature extraction layers to build soft sensors,which are applied to the benchmarked Tennessee-Eastman process(TEP)and a real wind farm case.The comparison of modelling results demonstrates that the multiscale feature extraction layers have the following advantages over other methods.First,the multiscale feature extraction layers significantly reduce the number of parameters compared to the other deep neural networks.Second,the multiscale feature extraction layers can powerfully extract dataset characteristics.Finally,the multiscale feature extraction layers with fully considered historical measurements can contain richer useful information and improved representation compared to traditional data-driven models.展开更多
Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usuall...Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.展开更多
基金supported in part by the National Natural Science Foundation of China(Grant No.62062003)Natural Science Foundation of Ningxia(Grant No.2023AAC03293).
文摘Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot.However,there are some problems that the features of different sizes and different directions are not sufficient when extracting the features in lung X-ray images.A pneumonia classification model based on multi-scale directional feature enhancement MSD-Net is proposed in this paper.The main innovations are as follows:Firstly,the Multi-scale Residual Feature Extraction Module(MRFEM)is designed to effectively extract multi-scale features.The MRFEM uses dilated convolutions with different expansion rates to increase the receptive field and extract multi-scale features effectively.Secondly,the Multi-scale Directional Feature Perception Module(MDFPM)is designed,which uses a three-branch structure of different sizes convolution to transmit direction feature layer by layer,and focuses on the target region to enhance the feature information.Thirdly,the Axial Compression Former Module(ACFM)is designed to perform global calculations to enhance the perception ability of global features in different directions.To verify the effectiveness of the MSD-Net,comparative experiments and ablation experiments are carried out.In the COVID-19 RADIOGRAPHY DATABASE,the Accuracy,Recall,Precision,F1 Score,and Specificity of MSD-Net are 97.76%,95.57%,95.52%,95.52%,and 98.51%,respectively.In the chest X-ray dataset,the Accuracy,Recall,Precision,F1 Score and Specificity of MSD-Net are 97.78%,95.22%,96.49%,95.58%,and 98.11%,respectively.This model improves the accuracy of lung image recognition effectively and provides an important clinical reference to pneumonia Computer-Aided Diagnosis.
基金the Scientific Research Foundation of Liaoning Provincial Department of Education(No.LJKZ0139)the Program for Liaoning Excellent Talents in University(No.LR15045).
文摘In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金the Humanities and Social Science Fund of the Ministry of Education of China(21YJAZH077)。
文摘In a crowd density estimation dataset,the annotation of crowd locations is an extremely laborious task,and they are not taken into the evaluation metrics.In this paper,we aim to reduce the annotation cost of crowd datasets,and propose a crowd density estimation method based on weakly-supervised learning,in the absence of crowd position supervision information,which directly reduces the number of crowds by using the number of pedestrians in the image as the supervised information.For this purpose,we design a new training method,which exploits the correlation between global and local image features by incremental learning to train the network.Specifically,we design a parent-child network(PC-Net)focusing on the global and local image respectively,and propose a linear feature calibration structure to train the PC-Net simultaneously,and the child network learns feature transfer factors and feature bias weights,and uses the transfer factors and bias weights to linearly feature calibrate the features extracted from the Parent network,to improve the convergence of the network by using local features hidden in the crowd images.In addition,we use the pyramid vision transformer as the backbone of the PC-Net to extract crowd features at different levels,and design a global-local feature loss function(L2).We combine it with a crowd counting loss(LC)to enhance the sensitivity of the network to crowd features during the training process,which effectively improves the accuracy of crowd density estimation.The experimental results show that the PC-Net significantly reduces the gap between fullysupervised and weakly-supervised crowd density estimation,and outperforms the comparison methods on five datasets of Shanghai Tech Part A,ShanghaiTech Part B,UCF_CC_50,UCF_QNRF and JHU-CROWD++.
基金supported by the National Key Research and Development Project(No.2020YFC1512000)the General Projects of Key R&D Programs in Shaanxi Province(No.2020GY-060)Xi’an Science&Technology Project(No.2020KJRC 0126)。
文摘With the development of sensors,the application of multi-source remote sensing data has been widely concerned.Since hyperspectral image(HSI)contains rich spectral information while light detection and ranging(LiDAR)data contains elevation information,joint use of them for ground object classification can yield positive results,especially by building deep networks.Fortu-nately,multi-scale deep networks allow to expand the receptive fields of convolution without causing the computational and training problems associated with simply adding more network layers.In this work,a multi-scale feature fusion network is proposed for the joint classification of HSI and LiDAR data.First,we design a multi-scale spatial feature extraction module with cross-channel connections,by which spatial information of HSI data and elevation information of LiDAR data are extracted and fused.In addition,a multi-scale spectral feature extraction module is employed to extract the multi-scale spectral features of HSI data.Finally,joint multi-scale features are obtained by weighting and concatenation operations and then fed into the classifier.To verify the effective-ness of the proposed network,experiments are carried out on the MUUFL Gulfport and Trento datasets.The experimental results demonstrate that the classification performance of the proposed method is superior to that of other state-of-the-art methods.
基金supported and founded by the Guizhou Provincial Science and Technology Project under the Grant No.QKH-Basic-ZK[2021]YB311the Youth Science and Technology Talent Growth Project of Guizhou Provincial Education Department under Grant No.QJH-KY-ZK[2021]132+2 种基金the Guizhou Provincial Science and Technology Project under the Grant No.QKH-Basic-ZK[2021]YB319the National Natural Science Foundation of China(NSFC)under Grant 61902085the Key Laboratory Program of Blockchain and Fintech of Department of Education of Guizhou Province(2023-014).
文摘Copy-Move Forgery Detection(CMFD)is a technique that is designed to identify image tampering and locate suspicious areas.However,the practicality of the CMFD is impeded by the scarcity of datasets,inadequate quality and quantity,and a narrow range of applicable tasks.These limitations significantly restrict the capacity and applicability of CMFD.To overcome the limitations of existing methods,a novel solution called IMTNet is proposed for CMFD by employing a feature decoupling approach.Firstly,this study formulates the objective task and network relationship as an optimization problem using transfer learning.Furthermore,it thoroughly discusses and analyzes the relationship between CMFD and deep network architecture by employing ResNet-50 during the optimization solving phase.Secondly,a quantitative comparison between fine-tuning and feature decoupling is conducted to evaluate the degree of similarity between the image classification and CMFD domains by the enhanced ResNet-50.Finally,suspicious regions are localized using a feature pyramid network with bottom-up path augmentation.Experimental results demonstrate that IMTNet achieves faster convergence,shorter training times,and favorable generalization performance compared to existingmethods.Moreover,it is shown that IMTNet significantly outperforms fine-tuning based approaches in terms of accuracy and F_(1).
基金This work was supported in part by the National Natural Science Foundation of China(Grant#:82260362)in part by the National Key R&D Program of China(Grant#:2021ZD0111000)+1 种基金in part by the Key R&D Project of Hainan Province(Grant#:ZDYF2021SHFZ243)in part by the Major Science and Technology Project of Haikou(Grant#:2020-009).
文摘The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.
基金This research was supported by the National Natural Science Foundation of China No.62276086the National Key R&D Program of China No.2022YFD2000100Zhejiang Provincial Natural Science Foundation of China under Grant No.LTGN23D010002.
文摘Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea.Traditional tea-picking machines may compromise the quality of the tea leaves.High-quality teas are often handpicked and need more delicate operations in intelligent picking machines.Compared with traditional image processing techniques,deep learning models have stronger feature extraction capabilities,and better generalization and are more suitable for practical tea shoot harvesting.However,current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation tasks.We propose a tea shoot instance segmentation model based on multi-scale mixed attention(Mask2FusionNet)using a dataset from the tea garden in Hangzhou.We further analyzed the characteristics of the tea shoot dataset,where the proportion of small to medium-sized targets is 89.9%.Our algorithm is compared with several mainstream object segmentation algorithms,and the results demonstrate that our model achieves an accuracy of 82%in recognizing the tea shoots,showing a better performance compared to other models.Through ablation experiments,we found that ResNet50,PointRend strategy,and the Feature Pyramid Network(FPN)architecture can improve performance by 1.6%,1.4%,and 2.4%,respectively.These experiments demonstrated that our proposed multi-scale and point selection strategy optimizes the feature extraction capability for overlapping small targets.The results indicate that the proposed Mask2FusionNet model can perform the shoot segmentation in unstructured environments,realizing the individual distinction of tea shoots,and complete extraction of the shoot edge contours with a segmentation accuracy of 82.0%.The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
基金supported by the National Natural Science Foundation of China (62271255,61871218)the Fundamental Research Funds for the Central University (3082019NC2019002)+1 种基金the Aeronautical Science Foundation (ASFC-201920007002)the Program of Remote Sensing Intelligent Monitoring and Emergency Services for Regional Security Elements。
文摘In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) based on the maximum margin criterion(MMC) is proposed for recognizing the class of ship targets utilizing the high-resolution range profile(HRRP). Multi-scale fusion is introduced to capture the local and detailed information in small-scale features, and the global and contour information in large-scale features, offering help to extract the edge information from sea clutter and further improving the target recognition accuracy. The proposed method can maximally preserve the multi-scale fusion sparse of data and maximize the class separability in the reduced dimensionality by reproducing kernel Hilbert space. Experimental results on the measured radar data show that the proposed method can effectively extract the features of ship target from sea clutter, further reduce the feature dimensionality, and improve target recognition performance.
基金Project(61301095)supported by the National Natural Science Foundation of ChinaProject(QC2012C070)supported by Heilongjiang Provincial Natural Science Foundation for the Youth,ChinaProjects(HEUCF130807,HEUCFZ1129)supported by the Fundamental Research Funds for the Central Universities of China
文摘In modern electromagnetic environment, radar emitter signal recognition is an important research topic. On the basis of multi-resolution wavelet analysis, an adaptive radar emitter signal recognition method based on multi-scale wavelet entropy feature extraction and feature weighting was proposed. With the only priori knowledge of signal to noise ratio(SNR), the method of extracting multi-scale wavelet entropy features of wavelet coefficients from different received signals were combined with calculating uneven weight factor and stability weight factor of the extracted multi-dimensional characteristics. Radar emitter signals of different modulation types and different parameters modulated were recognized through feature weighting and feature fusion. Theoretical analysis and simulation results show that the presented algorithm has a high recognition rate. Additionally, when the SNR is greater than-4 d B, the correct recognition rate is higher than 93%. Hence, the proposed algorithm has great application value.
文摘Feature extraction of signals plays an important role in classification problems because of data dimension reduction property and potential improvement of a classification accuracy rate. Principal component analysis (PCA), wavelets transform or Fourier transform methods are often used for feature extraction. In this paper, we propose a multi-scale PCA, which combines discrete wavelet transform, and PCA for feature extraction of signals in both the spatial and temporal domains. Our study shows that the multi-scale PCA combined with the proposed new classification methods leads to high classification accuracy for the considered signals.
文摘Face detection is applied to many tasks such as auto focus control, surveillance, user interface, and face recognition. Processing speed and detection accuracy of the face detection have been improved continuously. This paper describes a novel method of fast face detection with multi-scale window search free from image resizing. We adopt statistics of gradient images (SGI) as image features and append an overlapping cell array to improve detection accuracy. The SGI feature is scale invariant and insensitive to small difference of pixel value. These characteristics enable the multi-scale window search without image resizing. Experimental results show that processing speed of our method is 3.66 times faster than a conventional method, adopting HOG features combined to an SVM classifier, without accuracy degradation.
基金the National Natural Science Foundation of China(32172240)BL19U2 beamline of National Facility for Protein Science in Shanghai(NFPS)at Shanghai Synchrotron Radiation Facility,for their assistance during data collection。
文摘Three materials(agar,konjac glucomannan(KGM)andκ-carrageenan)were used to prepare ternary systems,i.e.,sol-gels and their dried composites conditioned at varied relative humidity(RH)(33%,54%and 75%).Combined methods,e.g.,scanning electron microscopy,small-angle X-ray scattering,infrared spectroscopy(IR)and X-ray diffraction(XRD),were used to disclose howκ-carrageenan addition tailors the features of agar/KGM/κ-carrageenan ternary system.As affirmed by IR and XRD,the ternary systems withκ-carrageenan below 25%(agar/KGM/carrageenan,50:25:25,m/m)displayed proper component interactions,which increased the sol-gel transition temperature and the hardness of obtained gels.For instance,the ternary composites could show hardness about 3 to 4 times higher than that for binary counterpart.These gels were dehydrated to acquire ternary composites.Compared to agar/KGM composite,the ternary composites showed fewer crystallites and nanoscale orders,and newly-formed nanoscale structures from chain assembly.Such multi-scale structures,for composites withκ-carrageenan below 25%,showed weaker changes with RH,as revealed by especially morphologic and crystalline features.Consequently,the ternary composites with lessκ-carrageenan(below 25%)exhibited stabilized elongation at break and hydrophilicity at different RHs.This hints to us that agar/KGM/κ-carrageenan composite systems can display series applications with improved features,e.g.,increased sol-gel transition point.
基金supporting of the Ministry of Science and Technology MOST(Grant No.MOST 108–2221-E-150–022-MY3,MOST 110–2634-F-019–002)the National Taiwan Ocean University,China.
文摘A system for classifying four basic table tennis strokes using wearable devices and deep learning networks is proposed in this study.The wearable device consisted of a six-axis sensor,Raspberry Pi 3,and a power bank.Multiple kernel sizes were used in convolutional neural network(CNN)to evaluate their performance for extracting features.Moreover,a multiscale CNN with two kernel sizes was used to perform feature fusion at different scales in a concatenated manner.The CNN achieved recognition of the four table tennis strokes.Experimental data were obtained from20 research participants who wore sensors on the back of their hands while performing the four table tennis strokes in a laboratory environment.The data were collected to verify the performance of the proposed models for wearable devices.Finally,the sensor and multi-scale CNN designed in this study achieved accuracy and F1 scores of 99.58%and 99.16%,respectively,for the four strokes.The accuracy for five-fold cross validation was 99.87%.This result also shows that the multi-scale convolutional neural network has better robustness after fivefold cross validation.
基金Supported by Open Project of the Ministry of Industry and Information Technology Key Laboratory of Performance and Reliability Testing and Evaluation for Basic Software and Hardware。
文摘Background Recurrent recovery is a common method for video super-resolution(VSR)that models the correlation between frames via hidden states.However,the application of this structure in real-world scenarios can lead to unsatisfactory artifacts.We found that in real-world VSR training,the use of unknown and complex degradation can better simulate the degradation process in the real world.Methods Based on this,we propose the RealFuVSR model,which simulates real-world degradation and mitigates artifacts caused by the VSR.Specifically,we propose a multiscale feature extraction module(MSF)module that extracts and fuses features from multiple scales,thereby facilitating the elimination of hidden state artifacts.To improve the accuracy of the hidden state alignment information,RealFuVSR uses an advanced optical flow-guided deformable convolution.Moreover,a cascaded residual upsampling module was used to eliminate noise caused by the upsampling process.Results The experiment demonstrates that RealFuVSR model can not only recover high-quality videos but also outperforms the state-of-the-art RealBasicVSR and RealESRGAN models.
基金supported by grants from the Shanghai Rising-Star Program(19QA1408700)the National Natural Science Founda-tion of China(81972575 and 81521091)Clinical Research Plan of SHDC(SHDC2020CR5007)。
文摘Background:Early singular nodular hepatocellular carcinoma(HCC)is an ideal surgical indication in clinical practice.However,almost half of the patients have tumor recurrence,and there is no reliable prognostic prediction tool.Besides,it is unclear whether preoperative neoadjuvant therapy is necessary for patients with early singular nodular HCC and which patient needs it.It is critical to identify the patients with high risk of recurrence and to treat these patients preoperatively with neoadjuvant therapy and thus,to improve the outcomes of these patients.The present study aimed to develop two prognostic models to preoperatively predict the recurrence-free survival(RFS)and overall survival(OS)in patients with singular nodular HCC by integrating the clinical data and radiological features.Methods:We retrospective recruited 211 patients with singular nodular HCC from December 2009 to January 2019 at Eastern Hepatobiliary Surgery Hospital(EHBH).They all met the surgical indications and underwent radical resection.We randomly divided the patients into the training cohort(n=132)and the validation cohort(n=79).We established and validated multivariate Cox proportional hazard models by the preoperative clinicopathologic factors and radiological features for association with RFS and OS.By analyzing the receiver operating characteristic(ROC)curve,the discrimination accuracy of the models was compared with that of the traditional predictive models.Results:Our RFS model was based on HBV-DNA score,cirrhosis,tumor diameter and tumor capsule in imaging.RFS nomogram had fine calibration and discrimination capabilities,with a C-index of 0.74(95%CI:0.68-0.80).The OS nomogram,based on cirrhosis,tumor diameter and tumor capsule in imaging,had fine calibration and discrimination capabilities,with a C-index of 0.81(95%CI:0.74-0.87).The area under the receiver operating characteristic curve(AUC)of our model was larger than that of traditional liver cancer staging system,Korea model and Nomograms in Hepatectomy Patients with Hepatitis B VirusRelated Hepatocellular Carcinoma,indicating better discrimination capability.According to the models,we fitted the linear prediction equations.These results were validated in the validation cohort.Conclusions:Compared with previous radiography model,the new-developed predictive model was concise and applicable to predict the postoperative survival of patients with singular nodular HCC.Our models may preoperatively identify patients with high risk of recurrence.These patients may benefit from neoadjuvant therapy which may improve the patients’outcomes.
基金National Natural Science Foundation of China(No.62001100)。
文摘The hidden danger of the automatic speaker verification(ASV)system is various spoofed speeches.These threats can be classified into two categories,namely logical access(LA)and physical access(PA).To improve identification capability of spoofed speech detection,this paper considers the research on features.Firstly,following the idea of modifying the constant-Q-based features,this work considered adding variance or mean to the constant-Q-based cepstral domain to obtain good performance.Secondly,linear frequency cepstral coefficients(LFCCs)performed comparably with constant-Q-based features.Finally,we proposed linear frequency variance-based cepstral coefficients(LVCCs)and linear frequency mean-based cepstral coefficients(LMCCs)for identification of speech spoofing.LVCCs and LMCCs could be attained by adding the frame variance or the mean to the log magnitude spectrum based on LFCC features.The proposed novel features were evaluated on ASVspoof 2019 datase.The experimental results show that compared with known hand-crafted features,LVCCs and LMCCs are more effective in resisting spoofed speech attack.
基金Project supported by the National Natural Science Foundation of China (No.12172001)the Anhui Provincial Natural Science Foundation of China (No.2208085Y01)+1 种基金the University Natural Science Research Project of Anhui Province of China (No.2022AH020029)the Housing and Urban-Rural Development Science and Technology Project of Anhui Province of China (No.2023-YF129)。
文摘Self-oscillating systems abound in the natural world and offer substantial potential for applications in controllers,micro-motors,medical equipments,and so on.Currently,numerical methods have been widely utilized for obtaining the characteristics of self-oscillation including amplitude and frequency.However,numerical methods are burdened by intricate computations and limited precision,hindering comprehensive investigations into self-oscillating systems.In this paper,the stability of a liquid crystal elastomer fiber self-oscillating system under a linear temperature field is studied,and analytical solutions for the amplitude and frequency are determined.Initially,we establish the governing equations of self-oscillation,elucidate two motion regimes,and reveal the underlying mechanism.Subsequently,we conduct a stability analysis and employ a multi-scale method to obtain the analytical solutions for the amplitude and frequency.The results show agreement between the multi-scale and numerical methods.This research contributes to the examination of diverse self-oscillating systems and advances the theoretical analysis of self-oscillating systems rooted in active materials.
基金supported by National Natural Science Foundation of China(No.61873142)the Science and Technology Research Program of the Chongqing Municipal Education Commission,China(Nos.KJZD-K202201901,KJQN202201109,KJQN202101904,KJQN202001903 and CXQT21035)+2 种基金the Scientific Research Foundation of Chongqing University of Technology,China(No.2019ZD76)the Scientific Research Foundation of Chongqing Institute of Engineering,China(No.2020xzky05)the Chongqing Municipal Natural Science Foundation,China(No.cstc2020jcyj-msxmX0666).
文摘In industrial process control systems,there is overwhelming evidence corroborating the notion that economic or technical limitations result in some key variables that are very difficult to measure online.The data-driven soft sensor is an effective solution because it provides a reliable and stable online estimation of such variables.This paper employs a deep neural network with multiscale feature extraction layers to build soft sensors,which are applied to the benchmarked Tennessee-Eastman process(TEP)and a real wind farm case.The comparison of modelling results demonstrates that the multiscale feature extraction layers have the following advantages over other methods.First,the multiscale feature extraction layers significantly reduce the number of parameters compared to the other deep neural networks.Second,the multiscale feature extraction layers can powerfully extract dataset characteristics.Finally,the multiscale feature extraction layers with fully considered historical measurements can contain richer useful information and improved representation compared to traditional data-driven models.
基金This work was supported by the National Natural Science Foundation of China(Nos.62073322 and 61633020)the CIE-Tencent Robotics X Rhino-Bird Focused Research Program(No.2022-07)the Beijing Natural Science Foundation(No.2022MQ05).
文摘Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.