Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly...Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly heterogeneous appearance and shape.Deep convolution neural networks(CNNs)have recently improved glioma segmentation performance.However,extensive down-sampling such as pooling or stridden convolution in CNNs significantly decreases the initial image resolution,resulting in the loss of accurate spatial and object parts information,especially information on the small sub-region tumors,affecting segmentation performance.Hence,this paper proposes a novel multi-level parallel network comprising three different level parallel subnetworks to fully use low-level,mid-level,and high-level information and improve the performance of brain tumor segmentation.We also introduce the Combo loss function to address input class imbalance and false positives and negatives imbalance in deep learning.The proposed method is trained and validated on the BraTS 2020 training and validation dataset.On the validation dataset,ourmethod achieved a mean Dice score of 0.907,0.830,and 0.787 for the whole tumor,tumor core,and enhancing tumor core,respectively.Compared with state-of-the-art methods,the multi-level parallel network has achieved competitive results on the validation dataset.展开更多
For studying and optimizing the performance of general-purpose computing on graphics processing units(GPGPU)based on single instruction multiple threads(SIMT)processor about the neural network application,this work co...For studying and optimizing the performance of general-purpose computing on graphics processing units(GPGPU)based on single instruction multiple threads(SIMT)processor about the neural network application,this work contributes a self-developed SIMT processor named Pomelo and correlated assembly program.The parallel mechanism of SIMT computing mode and self-developed Pomelo processor is briefly introduced.A common convolutional neural network(CNN)is built to verify the compatibility and functionality of the Pomelo processor.CNN computing flow with task level and hardware level optimization is adopted on the Pomelo processor.A specific algorithm for organizing a Z-shaped memory structure is developed,which addresses reducing memory access in mass data computing tasks.Performing the above-combined adaptation and optimization strategy,the experimental result demonstrates that reducing memory access in SIMT computing mode plays a crucial role in improving performance.A 6.52 times performance is achieved on the 4 processing elements case.展开更多
In order to improve the detection accuracy of small objects,a neighborhood fusion-based hierarchical parallel feature pyramid network(NFPN)is proposed.Unlike the layer-by-layer structure adopted in the feature pyramid...In order to improve the detection accuracy of small objects,a neighborhood fusion-based hierarchical parallel feature pyramid network(NFPN)is proposed.Unlike the layer-by-layer structure adopted in the feature pyramid network(FPN)and deconvolutional single shot detector(DSSD),where the bottom layer of the feature pyramid network relies on the top layer,NFPN builds the feature pyramid network with no connections between the upper and lower layers.That is,it only fuses shallow features on similar scales.NFPN is highly portable and can be embedded in many models to further boost performance.Extensive experiments on PASCAL VOC 2007,2012,and COCO datasets demonstrate that the NFPN-based SSD without intricate tricks can exceed the DSSD model in terms of detection accuracy and inference speed,especially for small objects,e.g.,4%to 5%higher mAP(mean average precision)than SSD,and 2%to 3%higher mAP than DSSD.On VOC 2007 test set,the NFPN-based SSD with 300×300 input reaches 79.4%mAP at 34.6 frame/s,and the mAP can raise to 82.9%after using the multi-scale testing strategy.展开更多
In the actual complex environment,the recognition accuracy of crop leaf disease is often not high.Inspired by the brain parallel interaction mechanism,a two-stream parallel interactive convolutional neural network(TSP...In the actual complex environment,the recognition accuracy of crop leaf disease is often not high.Inspired by the brain parallel interaction mechanism,a two-stream parallel interactive convolutional neural network(TSPI-CNN)is proposed to improve the recognition accuracy.TSPI-CNN includes a two-stream parallel network(TSP-Net)and a parallel interactive network(PI-Net).TSP-Net simulates the ventral and dorsal stream.PI-Net simulates the interaction between two pathways in the process of human brain visual information transmission.A large number of experiments shows that the proposed TSPI-CNN performs well on MK-D2,PlantVillage,Apple-3 leaf,and Cassava leaf datasets.Furthermore,the effect of numbers of interactions on the recognition performance of TSPI-CNN is discussed.The experimental results show that as the number of interactions increases,the recognition accuracy of the network also increases.Finally,the network is visualized to show the working mechanism of the network and provide enlightenment for future research.展开更多
To improve the inference efficiency of convolutional neural networks(CNN),the existing neural networks mainly adopt heuristic and dynamic programming algorithms to realize parallel scheduling among operators.Heuristic...To improve the inference efficiency of convolutional neural networks(CNN),the existing neural networks mainly adopt heuristic and dynamic programming algorithms to realize parallel scheduling among operators.Heuristic scheduling algorithms can generate local optima easily,while the dynamic programming algorithm has a long convergence time for complex structural models.This paper mainly studies the parallel scheduling between operators and proposes an inter-operator parallelism schedule(IOPS)scheduling algorithm that guarantees the minimum similar execution delay.Firstly,a graph partitioning algorithm based on the largest block is designed to split the neural network model into multiple subgraphs.Then,the operators that meet the conditions is replaced according to the defined operator replacement rules.Finally,the optimal scheduling method based on backtracking is used to schedule the computational graph.Network models such as Inception-v3,ResNet-50,and RandWire are selected for testing.The experimental results show that the algorithm designed in this paper can achieve a 1.6×speedup compared with the existing sequential execution methods.展开更多
基金supported by the Sichuan Science and Technology Program (No.2019YJ0356).
文摘Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly heterogeneous appearance and shape.Deep convolution neural networks(CNNs)have recently improved glioma segmentation performance.However,extensive down-sampling such as pooling or stridden convolution in CNNs significantly decreases the initial image resolution,resulting in the loss of accurate spatial and object parts information,especially information on the small sub-region tumors,affecting segmentation performance.Hence,this paper proposes a novel multi-level parallel network comprising three different level parallel subnetworks to fully use low-level,mid-level,and high-level information and improve the performance of brain tumor segmentation.We also introduce the Combo loss function to address input class imbalance and false positives and negatives imbalance in deep learning.The proposed method is trained and validated on the BraTS 2020 training and validation dataset.On the validation dataset,ourmethod achieved a mean Dice score of 0.907,0.830,and 0.787 for the whole tumor,tumor core,and enhancing tumor core,respectively.Compared with state-of-the-art methods,the multi-level parallel network has achieved competitive results on the validation dataset.
基金the Scientific Research Program Funded by Shaanxi Provincial Education Department(20JY058)。
文摘For studying and optimizing the performance of general-purpose computing on graphics processing units(GPGPU)based on single instruction multiple threads(SIMT)processor about the neural network application,this work contributes a self-developed SIMT processor named Pomelo and correlated assembly program.The parallel mechanism of SIMT computing mode and self-developed Pomelo processor is briefly introduced.A common convolutional neural network(CNN)is built to verify the compatibility and functionality of the Pomelo processor.CNN computing flow with task level and hardware level optimization is adopted on the Pomelo processor.A specific algorithm for organizing a Z-shaped memory structure is developed,which addresses reducing memory access in mass data computing tasks.Performing the above-combined adaptation and optimization strategy,the experimental result demonstrates that reducing memory access in SIMT computing mode plays a crucial role in improving performance.A 6.52 times performance is achieved on the 4 processing elements case.
基金The National Natural Science Foundation of China(No.61603091)。
文摘In order to improve the detection accuracy of small objects,a neighborhood fusion-based hierarchical parallel feature pyramid network(NFPN)is proposed.Unlike the layer-by-layer structure adopted in the feature pyramid network(FPN)and deconvolutional single shot detector(DSSD),where the bottom layer of the feature pyramid network relies on the top layer,NFPN builds the feature pyramid network with no connections between the upper and lower layers.That is,it only fuses shallow features on similar scales.NFPN is highly portable and can be embedded in many models to further boost performance.Extensive experiments on PASCAL VOC 2007,2012,and COCO datasets demonstrate that the NFPN-based SSD without intricate tricks can exceed the DSSD model in terms of detection accuracy and inference speed,especially for small objects,e.g.,4%to 5%higher mAP(mean average precision)than SSD,and 2%to 3%higher mAP than DSSD.On VOC 2007 test set,the NFPN-based SSD with 300×300 input reaches 79.4%mAP at 34.6 frame/s,and the mAP can raise to 82.9%after using the multi-scale testing strategy.
基金National Natural Science Foundation of China(Nos.61806051 and 61903078)Fundamental Research Funds for the Central Universities,China(Nos.2232021A-10 and 2232021D-32)Natural Science Foundation of Shanghai,China(No.20ZR1400400)。
文摘In the actual complex environment,the recognition accuracy of crop leaf disease is often not high.Inspired by the brain parallel interaction mechanism,a two-stream parallel interactive convolutional neural network(TSPI-CNN)is proposed to improve the recognition accuracy.TSPI-CNN includes a two-stream parallel network(TSP-Net)and a parallel interactive network(PI-Net).TSP-Net simulates the ventral and dorsal stream.PI-Net simulates the interaction between two pathways in the process of human brain visual information transmission.A large number of experiments shows that the proposed TSPI-CNN performs well on MK-D2,PlantVillage,Apple-3 leaf,and Cassava leaf datasets.Furthermore,the effect of numbers of interactions on the recognition performance of TSPI-CNN is discussed.The experimental results show that as the number of interactions increases,the recognition accuracy of the network also increases.Finally,the network is visualized to show the working mechanism of the network and provide enlightenment for future research.
基金Supported by the National Key Research and Development Project of China(No.2020AAA0104603)the National Natural Science Foundation of China(No.61834005,61772417)the Shaanxi Province Key R&D Plan(No.2021GY-029).
文摘To improve the inference efficiency of convolutional neural networks(CNN),the existing neural networks mainly adopt heuristic and dynamic programming algorithms to realize parallel scheduling among operators.Heuristic scheduling algorithms can generate local optima easily,while the dynamic programming algorithm has a long convergence time for complex structural models.This paper mainly studies the parallel scheduling between operators and proposes an inter-operator parallelism schedule(IOPS)scheduling algorithm that guarantees the minimum similar execution delay.Firstly,a graph partitioning algorithm based on the largest block is designed to split the neural network model into multiple subgraphs.Then,the operators that meet the conditions is replaced according to the defined operator replacement rules.Finally,the optimal scheduling method based on backtracking is used to schedule the computational graph.Network models such as Inception-v3,ResNet-50,and RandWire are selected for testing.The experimental results show that the algorithm designed in this paper can achieve a 1.6×speedup compared with the existing sequential execution methods.