Accurate preoperative prediction of lymph node metastasis (LNM) in esophageal cancer (EC) patients is of crucial clinical significance for treatment planning and prognosis. AIM To develop a clinical radiomics nomogram that can predict the preoperative lymph node (LN) status in EC patients. METHODS A total of 32 EC patients confirmed by clinical pathology (all of whom underwent surgical treatment) were included. Real-time fluorescent quantitative reverse transcription-polymerase chain reaction was used to detect the expression of B7-H3 mRNA in EC tissue obtained during preoperative gastroscopy, and its correlation with LNM was analyzed. Radiomics features were extracted from multi-modal magnetic resonance imaging of EC using Pyradiomics in Python. Feature extraction, data dimensionality reduction, and feature selection were performed using an XGBoost model and leave-one-out cross-validation. Multivariable logistic regression analysis was used to establish the prediction model, which included radiomics features, LN status from computed tomography (CT) reports, and B7-H3 mRNA expression, and was represented by a radiomics nomogram. The area under the receiver operating characteristic curve (AUC) and decision curve analysis (DCA) were used to evaluate the predictive performance and clinical application value of the model. RESULTS The relative expression of B7-H3 mRNA in EC patients with LNM was higher than in those without metastasis, and the difference was statistically significant (P<0.05). The AUC value in the receiver operating characteristic (ROC) curve was 0.718 (95%CI: 0.528-0.907), with a sensitivity of 0.733 and specificity of 0.706, indicating good diagnostic performance. The individualized clinical prediction nomogram included radiomics features, LN status from CT reports, and B7-H3 mRNA expression. The ROC curve demonstrated good diagnostic value, with an AUC value of 0.765 (95%CI: 0.598-0.931), sensitivity of 0.800, and specificity of 0.706. DCA indicated the practical value of the radiomics nomogram in clinical practice. CONCLUSION This study developed a radiomics nomogram that includes radiomics features, LN status from CT reports, and B7-H3 mRNA expression, enabling convenient preoperative individualized prediction of LNM in EC patients.
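As a rough illustration of how the sensitivity and specificity figures in this abstract relate to patient counts, the sketch below computes both from confusion-matrix counts. The split of the 32 patients into 15 with LNM and 17 without is an assumption: it is not stated in the abstract, but it happens to reproduce the reported values. The function name `sensitivity_specificity` and all counts are illustrative only.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true LNM cases that were flagged
    specificity = tn / (tn + fp)  # share of LNM-free cases that were cleared
    return sensitivity, specificity


# ASSUMPTION: 15 patients with LNM and 17 without (15 + 17 = 32).
# The abstract reports only the total; these counts are hypothetical.
sens_a, spec_a = sensitivity_specificity(tp=11, fn=4, tn=12, fp=5)
sens_b, spec_b = sensitivity_specificity(tp=12, fn=3, tn=12, fp=5)

print(round(sens_a, 3), round(spec_a, 3))
print(round(sens_b, 3), round(spec_b, 3))
```

Under that assumed split, moving from the first operating point to the second corresponds to a single additional correctly identified LNM patient, which is worth keeping in mind when comparing AUCs from a cohort of this size.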
Multimodal imaging, including augmented or mixed reality, transforms the physician's interaction with clinical imaging, allowing more accurate data interpretation, better spatial resolution, and depth perception of the patient's anatomy. We successfully overlaid a 3D holographic visualization on magnetic resonance images for preoperative decision-making in a complex case of cardiac tumour in a 7-year-old girl.
3D vehicle detection based on LiDAR-camera fusion is an emerging research topic in autonomous driving. The algorithm based on the Camera-LiDAR object candidate fusion method (CLOCs) is currently considered a more effective decision-level fusion algorithm, but it does not fully utilize the extracted 3D and 2D features. Therefore, we propose a 3D vehicle detection algorithm based on multimodal decision-level fusion. First, the anchor point of the 3D detection bounding box is projected into the 2D image, the distance between the 2D and 3D anchor points is calculated, and this distance is used as a new fusion feature to enhance the feature redundancy of the network. Subsequently, an attention module (squeeze-and-excitation networks) is added to weight each feature channel, enhancing the important features of the network and suppressing useless ones. The experimental results show that the mean average precision of the algorithm on the KITTI dataset is 82.96%, which outperforms previous state-of-the-art multimodal fusion-based methods, and the average accuracy in the Easy, Moderate, and Hard evaluation indicators reaches 88.96%, 82.60%, and 77.31%, respectively, which is higher than the original CLOCs model by 1.02%, 2.29%, and 0.41%, respectively. Compared with the original CLOCs algorithm, our algorithm has higher accuracy and better performance in 3D vehicle detection.
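The projection-and-distance feature described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the "anchor point" is the box centre, that the 3D point is already in camera coordinates, and that a 3x4 pinhole projection matrix is available. The names `project_point` and `anchor_distance` and the camera parameters are hypothetical.

```python
import math


def project_point(P, xyz):
    """Project a 3D point (camera coordinates) to pixels with a 3x4 matrix P."""
    x, y, z = xyz
    h = [x, y, z, 1.0]  # homogeneous coordinates
    u, v, w = (sum(row[i] * h[i] for i in range(4)) for row in P)
    return u / w, v / w


def anchor_distance(P, center_3d, box_2d):
    """Pixel distance between the projected 3D box centre and the 2D box centre."""
    u, v = project_point(P, center_3d)
    x1, y1, x2, y2 = box_2d
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return math.hypot(u - cx, v - cy)


# Hypothetical pinhole camera: focal length 700 px, principal point (600, 180).
P = [
    [700.0, 0.0, 600.0, 0.0],
    [0.0, 700.0, 180.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

# A 3D candidate 20 m ahead, compared against two 2D candidates.
d_match = anchor_distance(P, (2.0, 1.0, 20.0), (640.0, 190.0, 700.0, 240.0))
d_offset = anchor_distance(P, (2.0, 1.0, 20.0), (640.0, 190.0, 706.0, 240.0))
print(d_match, d_offset)
```

A small distance suggests that the 2D and 3D candidates describe the same object, which is the kind of extra cue a decision-level fusion network could consume alongside the detection scores.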
In complex traffic scenarios, it is very important for autonomous vehicles to accurately perceive, in advance, the dynamic information of the surrounding vehicles. The accuracy of 3D object detection is affected by problems such as illumination changes, object occlusion, and object detection distance. To address these challenges, we propose a multimodal feature fusion network for 3D object detection (MFF-Net). First, a spatial transformation projection algorithm maps the image features into the feature space, so that the image features are in the same spatial dimension as the point cloud features when the two are fused. Then, feature channel weighting is performed using an adaptive expression augmentation fusion network to enhance important network features, suppress useless features, and increase the directionality of the network toward features. Finally, the probability of false detection and missed detection in the non-maximum suppression algorithm is reduced by increasing the one-dimensional threshold. Together, these steps form a complete 3D object detection network based on multimodal feature fusion. The experimental results show that the proposed network achieves an average accuracy of 82.60% on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, outperforming previous state-of-the-art multimodal fusion networks. In the Easy, Moderate, and Hard evaluation indicators, its accuracy reaches 90.96%, 81.46%, and 75.39%, respectively. This shows that MFF-Net performs well in 3D object detection.
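To make the role of the suppression threshold concrete, here is a minimal sketch of plain greedy non-maximum suppression on 2D boxes. It is a generic stand-in, not the modified algorithm from the abstract, whose "one-dimensional threshold" is not specified there. The boxes, scores, and function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep a box unless it overlaps a kept, higher-scoring box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

print(nms(boxes, scores, iou_threshold=0.5))
print(nms(boxes, scores, iou_threshold=0.7))
```

The first two boxes overlap with an IoU of about 0.68, so the lower threshold suppresses the second box while the higher threshold keeps it. Raising the threshold therefore trades fewer missed neighbouring objects against more duplicate detections.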
We introduce in this paper an extension of the Multimodal Compression (MC) technique for coding hyperspectral image sequences. The main idea requires a few steps, namely: (1) reducing the size of the sequence by inserting smooth images containing less information into the remaining images of the same sequence, and (2) coding the new compacted sequence using the 3D-SPIHT algorithm. In this new scheme, called MC-3D-SPIHT, the insertion is performed only in the contour of each image, in an unsupervised manner, so that the quality of the Region of Interest (ROI) is preserved. For this purpose, a mixing function is employed. After the decoding process, the inserted images are extracted by a separation function and the original sequence is reconstructed. Using data from the AVIRIS database, we show how the computing time for both coding and decoding can be decreased significantly.
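When convolutional neural networks are used for feature extraction and classification of hyperspectral images, spatial-spectral features are extracted insufficiently, and overly deep networks lead to large parameter counts and heavy computation. To address these problems, a lightweight convolutional model is proposed that combines a fast three-dimensional convolutional neural network (3D-CNN) with depthwise separable convolution (DSC). The method first applies incremental principal component analysis (IPCA) to reduce the dimensionality of the input data. Next, the pixels fed to the model are divided into small overlapping 3D patches, the ground-truth label of each patch is taken from its center pixel, and 3D kernels are applied to produce continuous 3D feature maps that preserve spatial-spectral features. The 3D-CNN extracts spatial and spectral features simultaneously, and depthwise separable convolution is then added to the 3D convolution to extract spatial features again. This enriches the spatial-spectral features while reducing the number of parameters, which shortens computation time and also improves classification accuracy. The proposed model was validated on the public Indian Pines, Salinas Scene, and University of Pavia datasets and compared with other classical classification methods. The experimental results show that the method not only greatly reduces the number of learnable parameters and lowers model complexity, but also achieves good classification performance, with overall accuracy (OA), average accuracy (AA), and the Kappa coefficient all reaching above 99%.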
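The parameter saving that motivates depthwise separable convolution can be checked with a short calculation. The sketch below compares weight counts for a standard 3D convolution and a depthwise separable one (a per-channel depthwise kernel followed by a 1x1x1 pointwise convolution), ignoring biases. The layer sizes are hypothetical and are not taken from the paper.

```python
def conv3d_params(kernel, c_in, c_out):
    """Weights in a standard 3D convolution with a cubic kernel (no bias)."""
    return kernel ** 3 * c_in * c_out


def separable_conv3d_params(kernel, c_in, c_out):
    """Weights in a depthwise 3D convolution plus a 1x1x1 pointwise convolution."""
    depthwise = kernel ** 3 * c_in  # one k*k*k filter per input channel
    pointwise = c_in * c_out        # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3x3 kernel, 32 input channels, 64 output channels.
standard = conv3d_params(3, 32, 64)
separable = separable_conv3d_params(3, 32, 64)
print(standard, separable, round(standard / separable, 1))
```

For this assumed layer the separable form needs roughly one nineteenth of the weights, which is the kind of reduction a lightweight model of this type relies on.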
Recognition of dynamic hand gestures in real time is a difficult task because the system can never know when or from where a gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN is used for the extraction of spectral and spatial features, which are then given to the LSTM network, through which classification is carried out. The proposed model is a lightweight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the publicly available 20BN-jester dataset. The model was trained on 2000 video clips per class, which were separated into 80% training and 20% validation sets. An accuracy of 99% and 97% was achieved on training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results compared to MobileNetv2+LSTM.
Lung nodules vary widely in structural properties such as shape and surface texture. Their spatial properties also vary: they can be found attached to lung walls or blood vessels within complex, non-homogeneous lung structures. Moreover, nodules are small at an early stage of development. This poses a serious challenge for developing a computer-aided diagnosis (CAD) system with better false positive reduction. Hence, to reduce the false positives per scan and to deal with the challenges mentioned, this paper proposes a set of three diverse 3D attention-based CNN architectures (3D ACNN) whose predictions on given low-dose volumetric computed tomography (CT) scans are fused to achieve more effective and reliable results. An attention mechanism is employed to give more weight to nodule-specific features and less weight to other, irrelevant features. Using this attention-based mechanism in a CNN, unlike traditional methods, gave a significant gain in classification performance. Contextual dependencies are also taken into account by giving three patches of different sizes surrounding the nodule as input to the ACNN architectures. The system is trained and validated on the publicly available LUNA16 dataset using 10-fold cross-validation, where a competition performance metric (CPM) score of 0.931 is achieved. The experimental results demonstrate that neither a single patch nor a single architecture used in a one-to-one fashion, as adopted in earlier methods, can achieve better performance, which indicates the necessity of fusing different multi-patch architectures. Though the proposed system is mainly designed for pulmonary nodule detection, it can be easily extended to classification tasks on any other 3D medical diagnostic computed tomography images where there is a huge variation and uncertainty in classification.
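For readers unfamiliar with the CPM score, the sketch below shows how it is computed under the LUNA16 convention: the average sensitivity at seven operating points of the FROC curve (1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan). The sensitivity values are made up for illustration and are not the paper's results.

```python
# False positives per scan at which sensitivity is read off the FROC curve.
FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def cpm(sensitivity_at):
    """Average sensitivity over the seven predefined false-positive rates."""
    return sum(sensitivity_at[rate] for rate in FP_RATES) / len(FP_RATES)


# HYPOTHETICAL FROC operating points, not taken from the paper.
froc = {0.125: 0.80, 0.25: 0.85, 0.5: 0.90, 1: 0.92, 2: 0.94, 4: 0.95, 8: 0.96}
score = cpm(froc)
print(round(score, 3))
```

Because the low false-positive operating points count as much as the high ones, a system only scores well on this metric if it stays sensitive while raising very few false alarms per scan.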
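Lung cancer is one of the malignant diseases that has long threatened human health. To address the complex and labor-intensive preprocessing required by traditional methods for classifying lung cancer CT images, this paper proposes a lung CT image classification method based on a three-dimensional convolutional neural network (3D-CNN) model. The model builds on the convolutional neural network model and uses a specific-order input strategy during training; experiments were carried out on the public Kaggle Data Science Bowl 2017 dataset. The experiments show that the method achieves an image classification accuracy of 76%, an improvement over a random-order input strategy, and can provide a valuable reference for research on the classification of lung pathology images.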
FIB-SEM tomography is a powerful technique that integrates a focused ion beam (FIB) and a scanning electron microscope (SEM) to capture high-resolution imaging data of nanostructures. This approach involves collecting in-plane SEM images and using FIB to remove material layers for imaging subsequent planes, thereby producing image stacks. However, these image stacks in FIB-SEM tomography are subject to the shine-through effect, which makes structures visible from the posterior regions of the current plane. This artifact introduces an ambiguity between image intensity and structures in the current plane, making conventional segmentation methods such as thresholding or the k-means algorithm insufficient. In this study, we propose a multimodal machine learning approach that combines intensity information obtained at different electron beam accelerating voltages to improve the three-dimensional (3D) reconstruction of nanostructures. By treating the increased shine-through effect at higher accelerating voltages as a form of additional information, the proposed method significantly improves segmentation accuracy and leads to more precise 3D reconstructions for real FIB tomography data.
Today, fatalities, physical injuries, and significant economic losses occur due to car accidents. Among the leading causes of car accidents is drowsiness behind the wheel, which can affect any driver. Drowsiness and sleepiness often have associated indicators that researchers can use to identify and promptly warn drowsy drivers to avoid potential accidents. This paper proposes a spatiotemporal model for monitoring visual indicators of drowsiness from videos. This model integrates a 3D convolutional neural network (3D-CNN) and long short-term memory (LSTM). The 3DCNN-LSTM can analyze long sequences by applying the 3D-CNN to extract spatiotemporal features within adjacent frames. The learned features are then used as the input of the LSTM component for modeling high-level temporal features. In addition, we investigate how the training of the proposed model is affected by changing the position of the batch normalization (BN) layers in the 3D-CNN units. The BN layer is examined in two different placement settings: before the non-linear activation function and after the non-linear activation function. The study was conducted on two publicly available drowsy driver datasets named 3MDAD and YawDD. 3MDAD is mainly composed of two synchronized datasets recorded from the frontal and side views of the drivers. We show that the position of the BN layers increases the convergence speed and reduces overfitting on one dataset but not the other. As a result, the model achieves a test detection accuracy of 96%, 93%, and 90% on YawDD, Side-3MDAD, and Front-3MDAD, respectively.
Medical image segmentation has consistently been a significant topic of research and a prominent goal, particularly in computer vision. Brain tumor research plays a major role in medical imaging applications by providing a tremendous amount of anatomical and functional knowledge that supports diagnosis and the preparation of disease therapy. To prevent or minimize manual segmentation error, automated tumor segmentation and detection has become the process most in demand among radiologists and physicians, as tumors often have complex structures. Many methods for detection and segmentation presently exist, but all lack high accuracy. This paper's key contribution is an evaluation of machine learning techniques intended to reduce the effect of frequently found issues in brain tumor research. Furthermore, attention is given to the challenges related to level-set segmentation. The study proposed in this paper uses the Population-based Artificial Bee Colony Clustering (P-ABCC) methodology to reliably collect initial contour points, which helps minimize the number of iterations and segmentation errors of the level-set process. The proposed model measures cluster centroids (ABC populations) and uses a level-set approach to resolve contour differences, since brain tumors vary in form, structure, and volume. The suggested model comprises three major steps. First, pre-processing separates the brain from the head and improves contrast via contrast stretching. Second, P-ABCC is used to obtain tumor edges that are utilized as an initial MRI sequence contour. Third, level-set segmentation is used to detect tumor regions from all volume slices with fewer iterations. Results suggest improved model efficiency compared to state-of-the-art methods for both datasets, BRATS 2019 and BRATS 2017. On BRATS 2019, Dice improvements of 0.03%, 0.03%, and 0.01% were achieved for whole tumor (WT), tumor core (TC), and enhancing tumor (ET), respectively. On BRATS 2017, an increase in precision of 5.27% was reached for WT.
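The Dice score used to report the segmentation results above measures the overlap between a predicted mask and a reference mask. A minimal sketch follows, with masks represented as sets of voxel coordinates; the tiny masks are invented for illustration and the function name `dice` is not from the paper.

```python
def dice(pred, truth):
    """Dice coefficient between two masks given as collections of voxel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# HYPOTHETICAL masks: 3 predicted voxels, 4 reference voxels, 2 in common.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
reference = {(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0)}
print(round(dice(predicted, reference), 4))
```

A score of 1 means the masks coincide and 0 means they do not overlap, so improvements of a few hundredths of a percent, as reported for BRATS 2019, correspond to very small changes in overlap.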
Depression has become a major health threat around the world, especially for older people, so developing an effective detection method for depression is a great public health challenge. The electroencephalogram (EEG) can be used as a biomarker to explore depression recognition effectively. Motivated by studies showing that multiple smaller-scale kernels can increase nonlinear expression compared with a single larger kernel, this article proposes the three-dimensional multiscale kernels convolutional neural network model for depression disorder recognition (3DMKDR), a three-dimensional convolutional neural network model with multiscale convolutional kernels for depression recognition based on EEG signals. A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix to excavate the related spatiotemporal information among electrodes and the collected electrode matrix. On the major depressive disorder (MDD) and multi-modal open dataset for mental-disorder analysis (MODMA) datasets, the experiments show that the accuracies of depression recognition are up to 99.86% and 98.01% in the subject-dependent experiment and 95.80% and 82.27% in the subject-independent experiment, which are higher than those of competing methods. The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.
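Objective To explore the clinical value of multimodal magnetic resonance imaging (MRI) combined with serum human growth differentiation factor 3 (GDF3) and heat shock protein 90α (HSP90A) in diagnosing breast cancer (BRCA). Methods A total of 96 patients with breast disease who were suspected of having BRCA after health examinations at our hospital between January 2017 and December 2020 were enrolled. Using postoperative pathology or needle biopsy results as the reference standard, the suspected patients were divided into a BRCA group (65 cases) and a benign group (31 cases). All subjects underwent multimodal MRI. Serum GDF3 and HSP90A levels were measured by enzyme-linked immunosorbent assay, and ROC analysis and fourfold (2×2) tables were used to assess the value of multimodal MRI, serum GDF3, and serum HSP90A, alone and in combination, for diagnosing BRCA. Results Ktrans, Kep, and MD were significantly higher in the BRCA group than in the benign group, whereas ADCslow, ADCfast, and MK were lower (P<0.05). The AUCs of the DCE-MRI, IVIM, and DKI parameters (Kep, ADCslow, and MK) for diagnosing BRCA were 0.724, 0.730, and 0.652, respectively; the diagnostic performance of DCE-MRI+IVIM+DKI was higher than that of any single model (Z=2.287-3.793, P=0.001-0.022), with an AUC of 0.839. Serum GDF3 and HSP90A levels were significantly higher in the BRCA group than in the benign group (P<0.05); their AUCs for diagnosing BRCA were 0.828 and 0.817, with sensitivities of 70.77% and 66.15% and specificities of 83.87% and 93.55%, respectively. Multimodal MRI combined with serum GDF3 and HSP90A yielded 6 false positives and 6 false negatives, with a Kappa value of 0.714 (P<0.05), indicating high agreement with the pathology results; the sensitivity, negative predictive value, and accuracy of the combined diagnosis were significantly higher than those of multimodal MRI, serum GDF3, or serum HSP90A alone (P<0.05). Conclusion Multimodal MRI combined with serum GDF3 and HSP90A levels has high sensitivity and accuracy in diagnosing BRCA and has some value for clinical application.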
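The Kappa value reported for the combined diagnosis can be reproduced from the counts given in the abstract. The sketch below assumes that the 6 false negatives come from the 65 BRCA patients and the 6 false positives from the 31 benign patients, which gives 59 true positives and 25 true negatives. The function name `cohen_kappa` is illustrative.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 table of test result versus reference standard."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of each rating.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


# Counts derived from the abstract under the assumption stated above:
# 65 BRCA cases with 6 missed, 31 benign cases with 6 false alarms.
tp, fn = 65 - 6, 6
tn, fp = 31 - 6, 6

kappa = cohen_kappa(tp, fp, fn, tn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(kappa, 3), round(accuracy, 3))
```

Under this reading the table yields a Kappa of 0.714, matching the reported value, and an overall accuracy of 87.5%.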
Funding (esophageal cancer radiomics nomogram study): The Yancheng Key Research and Development Program (Social Development), No. YCBE202324.
Funding (LiDAR-camera decision-level fusion study): Supported by the Key Research and Development Projects of Anhui (202104a05020003), the Natural Science Foundation of Anhui Province (2208085MF173), and the Anhui Development and Reform Commission Supports R&D and Innovation Projects ([2020]479).
Funding (MFF-Net study): The authors thank the Natural Science Foundation of Anhui Province (No. 2208085MF173), the Key Research and Development Projects of Anhui (202104a05020003), the Anhui Development and Reform Commission Supports R&D and Innovation Project ([2020]479), the National Natural Science Foundation of China (51575001), and the Anhui University Scientific Research Platform Innovation Team Building Project (2016-2018) for financial support.
Funding (FIB-SEM tomography study): Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), SFB 986, project number 192346071.
文摘FIB-SEM tomography is a powerful technique that integrates a focused ion beam(FIB)and a scanning electron microscope(SEM)to capture high-resolution imaging data of nanostructures.This approach involves collecting in-plane SEM imagesand using FIB to remove material layers for imaging subsequent planes,thereby producing image stacks.However,theseimage stacks in FIB-SEM tomography are subject to the shine-through effect,which makes structures visible from theposterior regions of the current plane.This artifact introduces an ambiguity between image intensity and structures in thecurrent plane,making conventional segmentation methods such as thresholding or the k-means algorithm insufficient.Inthis study,we propose a multimodal machine learning approach that combines intensity information obtained at differentelectron beam accelerating voltages to improve the three-dimensional(3D)reconstruction of nanostructures.By treatingthe increased shine-through effect at higher accelerating voltages as a form of additional information,the proposed methodsignificantly improves segmentation accuracy and leads to more precise 3D reconstructions for real FIB tomography data.
Abstract: Today, fatalities, physical injuries, and significant economic losses occur due to car accidents. Among the leading causes of car accidents is drowsiness behind the wheel, which can affect any driver. Drowsiness and sleepiness often have associated indicators that researchers can use to identify and promptly warn drowsy drivers to avoid potential accidents. This paper proposes a spatiotemporal model for monitoring visual indicators of drowsiness in videos. The model integrates a 3D convolutional neural network (3D-CNN) and long short-term memory (LSTM). The 3DCNN-LSTM can analyze long sequences by applying the 3D-CNN to extract spatiotemporal features within adjacent frames. The learned features are then used as the input of the LSTM component for modeling high-level temporal features. In addition, we investigate how the training of the proposed model is affected by changing the position of the batch normalization (BN) layers in the 3D-CNN units. The BN layer is examined in two placement settings: before the non-linear activation function and after it. The study was conducted on two publicly available drowsy-driver datasets, 3MDAD and YawDD. 3MDAD is mainly composed of two synchronized datasets recorded from the frontal and side views of the drivers. We show that the placement of the BN layers increases the convergence speed and reduces overfitting on one dataset but not the other. As a result, the model achieves test detection accuracies of 96%, 93%, and 90% on YawDD, Side-3MDAD, and Front-3MDAD, respectively.
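The two BN placements can be illustrated with a toy one-dimensional example. This is only a sketch under simplifying assumptions: the batch normalization below has no learnable scale or shift, ReLU is assumed as the activation, and the input values are made up; it is not the 3D-CNN unit from the paper.

```python
from statistics import mean, pvariance


def batch_norm(xs, eps=1e-5):
    """Normalize a batch to zero mean and unit variance (no learnable scale/shift)."""
    mu, var = mean(xs), pvariance(xs)
    return [(x - mu) / (var + eps) ** 0.5 for x in xs]


def relu(xs):
    """Element-wise ReLU, used here as the assumed non-linear activation."""
    return [max(0.0, x) for x in xs]


def bn_before_activation(xs):
    # conv output -> BN -> activation: the activation sees normalized inputs.
    return relu(batch_norm(xs))


def bn_after_activation(xs):
    # conv output -> activation -> BN: the next layer sees normalized inputs.
    return batch_norm(relu(xs))


# Made-up pre-activation values standing in for a convolution output.
pre_activations = [-2.0, -1.0, 0.0, 1.0, 3.0]
before = bn_before_activation(pre_activations)
after = bn_after_activation(pre_activations)
```

With BN before the activation the output is non-negative and no longer zero-mean; with BN after it the output is zero-mean and can be negative. The two orderings therefore hand differently distributed inputs to the next layer, which is one reason placement can affect training.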
Abstract: Medical image segmentation has consistently been a significant research topic and a prominent goal, particularly in computer vision. Brain tumor research plays a major role in medical imaging applications by providing a large amount of anatomical and functional knowledge that supports diagnosis and treatment planning. To prevent or minimize manual segmentation error, automated tumor segmentation and detection have become highly sought after by radiologists and physicians, as tumors often have complex structures. Many methods for detection and segmentation presently exist, but all lack high accuracy. This paper's key contribution is an evaluation of machine learning techniques intended to reduce the effect of frequently encountered issues in brain tumor research. Furthermore, attention is given to the challenges related to level-set segmentation. The study uses the Population-based Artificial Bee Colony Clustering (P-ABCC) methodology to reliably collect initial contour points, which helps minimize the number of iterations and the segmentation errors of the level-set process. The proposed model computes cluster centroids (ABC populations) and uses a level-set approach to resolve contour differences, since brain tumors are irregular in form, structure, and volume. The model comprises three major steps. First, pre-processing separates the brain from the rest of the head and enhances contrast by stretching. Second, P-ABCC is used to obtain tumor edges that serve as the initial contour for the MRI sequence. Third, level-set segmentation is used to detect tumor regions in all volume slices with fewer iterations. Results suggest improved model efficiency compared to state-of-the-art methods on both the BRATS 2019 and BRATS 2017 datasets. On BRATS 2019, the Dice score improved by 0.03%, 0.03%, and 0.01% for whole tumor (WT), tumor core (TC), and enhancing tumor (ET), respectively. On BRATS 2017, precision for WT increased by 5.27%.
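For reference, the Dice score used to compare segmentations on BRATS measures the overlap between a predicted mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|). The sketch below represents masks as sets of voxel coordinates; the coordinates are hypothetical and unrelated to the paper's data.

```python
def dice(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two sets of voxel indices."""
    if not pred and not truth:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Hypothetical voxel coordinates (slice, row, col) of a predicted and a reference mask.
predicted = {(0, 1, 1), (0, 1, 2), (0, 2, 1), (1, 1, 1)}
reference = {(0, 1, 1), (0, 1, 2), (1, 1, 1), (1, 1, 2), (1, 2, 2)}

# Three shared voxels out of 4 + 5 gives 2 * 3 / 9.
score = dice(predicted, reference)
```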
Funding: Supported by the National Natural Science Foundation of China (Nos. 61862058, 61962034, and 8226070356) and in part by the Gansu Provincial Science & Technology Department (No. 20JR10RA076).
Abstract: Depression has become a major health threat around the world, especially for older people, so effective detection of depression is a great public health challenge. The electroencephalogram (EEG) can be used as a biomarker for depression recognition. Motivated by studies showing that multiple smaller-scale kernels can increase nonlinear expressiveness compared to a single larger kernel, this article proposes the three-dimensional multiscale kernels convolutional neural network model for depression disorder recognition (3DMKDR), a three-dimensional convolutional neural network with multiscale convolutional kernels for depression recognition based on EEG signals. A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix, in order to mine the related spatiotemporal information among electrodes and the collected electrode matrix. On the major depressive disorder (MDD) dataset and the multi-modal open dataset for mental-disorder analysis (MODMA), the accuracies of depression recognition reach 99.86% and 98.01% in the subject-dependent experiment, and 95.80% and 82.27% in the subject-independent experiment, which are higher than those of competing methods. The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.
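The construction of a three-dimensional EEG structure can be sketched as follows: each time window's per-electrode features are placed on a 2D grid that mirrors the scalp layout, and the grids are stacked along time. The 3x3 layout, the electrode subset, and the feature values are all hypothetical; the abstract does not specify the actual electrode matrix.

```python
# Hypothetical 3x3 layout for nine 10-20 electrodes; None would mark an empty cell.
LAYOUT = [
    ["F3", "Fz", "F4"],
    ["C3", "Cz", "C4"],
    ["P3", "Pz", "P4"],
]


def to_electrode_matrix(features, layout=LAYOUT, fill=0.0):
    """Place one feature value per electrode at its grid position."""
    return [
        [features.get(name, fill) if name else fill for name in row]
        for row in layout
    ]


def to_volume(feature_sequence, layout=LAYOUT):
    """Stack one 2D electrode matrix per time window into a (time, rows, cols) volume."""
    return [to_electrode_matrix(features, layout) for features in feature_sequence]


# Two time windows of made-up feature values (for example, band power per electrode).
sequence = [
    {"F3": 0.1, "Fz": 0.2, "F4": 0.3, "C3": 0.4, "Cz": 0.5,
     "C4": 0.6, "P3": 0.7, "Pz": 0.8, "P4": 0.9},
    {"F3": 1.1, "Fz": 1.2, "F4": 1.3, "C3": 1.4, "Cz": 1.5,
     "C4": 1.6, "P3": 1.7, "Pz": 1.8, "P4": 1.9},
]
volume = to_volume(sequence)
```

Keeping neighboring electrodes adjacent in the grid is what lets 3D convolution kernels pick up spatial patterns across electrodes and temporal patterns across windows at the same time.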
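Abstract: Objective: To investigate the clinical value of multimodal magnetic resonance imaging (MRI) combined with serum human growth differentiation factor 3 (GDF3) and heat shock protein 90α (HSP90A) in diagnosing breast cancer (BRCA). Methods: A total of 96 patients with breast disease who were suspected of having BRCA after health examinations at our hospital between January 2017 and December 2020 were enrolled. Using postoperative pathology or needle biopsy results as the reference standard, the suspected patients were divided into a BRCA group (65 cases) and a benign group (31 cases). All subjects underwent multimodal MRI; serum GDF3 and HSP90A levels were measured by enzyme-linked immunosorbent assay. ROC analysis and fourfold (2×2) tables were used to assess the value of multimodal MRI, serum GDF3, and serum HSP90A, alone and in combination, for diagnosing BRCA. Results: Ktrans, Kep, and MD were significantly higher in the BRCA group than in the benign group, whereas ADCslow, ADCfast, and MK were lower (P<0.05). The AUCs of the DCE-MRI, IVIM, and DKI parameters (Kep, ADCslow, and MK) for diagnosing BRCA were 0.724, 0.730, and 0.652, respectively; the diagnostic performance of DCE-MRI+IVIM+DKI was higher than that of any single model (Z=2.287 to 3.793, P=0.001 to 0.022), with an AUC of 0.839. Serum GDF3 and HSP90A levels were significantly higher in the BRCA group than in the benign group (P<0.05); their AUCs for diagnosing BRCA were 0.828 and 0.817, with sensitivities of 70.77% and 66.15% and specificities of 83.87% and 93.55%, respectively. Multimodal MRI combined with serum GDF3 and HSP90A produced 6 false positives and 6 false negatives, with a Kappa value of 0.714 (P<0.05), indicating high agreement with the pathological results; the sensitivity, negative predictive value, and accuracy of the combined diagnosis were significantly higher than those of multimodal MRI, serum GDF3, or serum HSP90A alone (P<0.05). Conclusion: Multimodal MRI combined with serum GDF3 and HSP90A levels diagnoses BRCA with high sensitivity and accuracy and has clinical application value.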
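The fourfold-table figures for the combined diagnosis can be checked with a short calculation. The abstract gives the group sizes (65 BRCA, 31 benign) and the error counts (6 false positives, 6 false negatives); the true positive and true negative counts below are derived from those numbers on the assumption that they all refer to the same 96 patients. The sensitivity, specificity, and accuracy computed here are likewise derived values, not figures quoted in the abstract.

```python
def diagnostic_summary(n_pos, n_neg, fp, fn):
    """Sensitivity, specificity, accuracy, and Cohen's kappa from a 2x2 table."""
    tp, tn = n_pos - fn, n_neg - fp
    n = n_pos + n_neg
    observed = (tp + tn) / n
    # Chance agreement: product of marginal totals for each class, summed.
    expected = ((tp + fp) * n_pos + (tn + fn) * n_neg) / n ** 2
    return {
        "tp": tp,
        "tn": tn,
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Counts taken from the abstract: 65 BRCA, 31 benign, 6 false positives, 6 false negatives.
summary = diagnostic_summary(n_pos=65, n_neg=31, fp=6, fn=6)
kappa = summary["kappa"]
```

Under these assumptions the kappa rounds to 0.714, which agrees with the value reported in the abstract.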