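Action recognition algorithms based on 3D-ConvNets commonly use global average pooling (GAP) to compress feature information, but this causes information loss, information redundancy, and network overfitting. To address these problems and better preserve the high-level semantic information extracted by the convolutional layers, an action recognition algorithm based on global frequency domain pooling (GFDP) is proposed. First, the discrete cosine transform (DCT) shows that GAP is a special case of feature decomposition in the frequency domain; more frequency components are therefore introduced to increase the specificity among feature channels and reduce the redundancy that remains after compression. Second, to better suppress overfitting, the batch normalization strategy used for convolutional layers is introduced and extended to the fully connected layers of an action recognition model with ERB (efficient residual block)-Res3D as its backbone, in order to optimize the data distribution. Finally, the method is validated on the UCF101 dataset. The results show that the model requires 3.5 GFLOPs of computation and has 7.4 M parameters; the final recognition accuracy is 3.9% higher than that of the ERB-Res3D model and 17.4% higher than that of the original Res3D model, achieving more accurate action recognition efficiently.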
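The claim that GAP is a special case of a DCT-based decomposition can be checked numerically. The sketch below is an illustration only, not the paper's implementation: it assumes an unnormalized DCT-II basis, a feature map of shape (channels, time, height, width), and hypothetical helper names `dct_basis_1d` and `frequency_pool`. Pooling with the zero-frequency basis reproduces GAP, and additional frequency triples yield extra descriptors per channel.

```python
import numpy as np

def dct_basis_1d(n, k):
    """Unnormalized DCT-II basis vector of frequency k over n samples."""
    return np.cos(np.pi * k * (np.arange(n) + 0.5) / n)

def frequency_pool(x, freqs):
    """Pool a (C, T, H, W) feature map with selected 3D DCT components.

    freqs is a list of (kt, kh, kw) frequency indices (an assumed interface).
    Returns an array of shape (C, len(freqs)).
    """
    _, t, h, w = x.shape
    pooled = []
    for kt, kh, kw in freqs:
        # Separable 3D basis: outer product of three 1D cosine vectors.
        basis = np.einsum(
            "t,h,w->thw",
            dct_basis_1d(t, kt),
            dct_basis_1d(h, kh),
            dct_basis_1d(w, kw),
        )
        # Project every channel onto the basis; divide by the volume so the
        # (0, 0, 0) component equals the plain average.
        pooled.append((x * basis).sum(axis=(1, 2, 3)) / (t * h * w))
    return np.stack(pooled, axis=1)
```

With `freqs=[(0, 0, 0)]` the basis is all ones, so the output equals `x.mean(axis=(1, 2, 3))`; which extra frequencies to add is a design choice the abstract does not specify.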
Funding: The National Natural Science Foundation of China (62176048) provided funding for this research.
Abstract: The interpretability of deep learning models has emerged as a compelling area in artificial intelligence research. The safety criteria for medical imaging are highly stringent, and models are required to explain their outputs. However, existing convolutional neural network (CNN) solutions for left ventricular segmentation are viewed only in terms of inputs and outputs. Thus, the interpretability of CNNs has come into the spotlight. Since medical imaging data are limited, many popular medical imaging models are fine-tuned, via transfer learning, from models built on the massive public ImageNet dataset. Unfortunately, this generates many unreliable parameters and makes it difficult to generate plausible explanations from these models. In this study, we trained from scratch rather than relying on transfer learning, creating a novel interpretable approach for automatically segmenting the left ventricle in cardiac MRI. Our enhanced GPU training system implemented interpretable global average pooling for graphics using deep learning. The deep learning tasks were simplified, including data management, neural network architecture, and training. Our system monitored and analyzed the gradient changes of different layers with dynamic visualizations in real time and selected the optimal deployment model. Our results demonstrated that the proposed method was feasible and efficient: the Dice coefficient reached 94.48%, and the accuracy reached 99.7%. It was found that no current transfer learning models could perform comparably to the ImageNet transfer learning architectures. This model is lightweight and more convenient to deploy on mobile devices than transfer learning models.
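For readers unfamiliar with the reported metric, the Dice coefficient between a predicted and a reference segmentation mask can be sketched as follows. This is the standard definition, 2|A∩B| / (|A| + |B|), written for binary masks; the function name and the smoothing term `eps` are assumptions for illustration, and the abstract does not say how the authors computed their score.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)
```

Identical masks give a value of 1, and disjoint non-empty masks give a value close to 0.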
Funding: Zamil S. Alzamil would like to thank the Deanship of Scientific Research at Majmaah University for supporting this work under Project No. R-2022-172.
Abstract: Building an automatic fish recognition and detection system for large-scale fish classes is helpful for marine researchers and marine scientists because there are large numbers of fish species. However, it is quite difficult to build such systems owing to data imbalance problems and the large number of classes. To solve these issues, we propose a transfer learning-based technique in which we use EfficientNet, pre-trained on the ImageNet dataset and fine-tuned on the QuT Fish Database, a large-scale dataset. Furthermore, prior to the activation layer, we use Global Average Pooling (GAP) instead of a dense layer, with the aim of averaging the results of predictions while retaining more information than the dense layer. To check the validity of our model, we evaluate it on the validation set, where it achieves satisfactory results. For the localization task, we propose an architecture that consists of a localization-aware block, which captures localization information for better prediction, and residual connections to handle the over-fitting problem. The residual connections help the layer to combine missing information with the relevant one. In addition, we use class weights and Focal Loss (FL) to handle class imbalance problems and reduce false predictions. The class weights assign smaller weights to classes having fewer instances and larger weights to classes having more instances. During localization, the qualitative assessment shows that we achieve 57% mean Intersection over Union (IoU) on testing data, and the classification results show 75% precision, 70% recall, 78% accuracy, and 74% F1-score for 468 fish species.
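As an illustration of the loss mentioned above, a multi-class focal loss can be sketched in NumPy as below. This follows the commonly used form FL = -(1 - p_t)^gamma * log(p_t); the function name, the default `gamma=2.0`, and the optional `class_weights` argument are assumptions made for this sketch, and the abstract does not give the authors' actual settings.

```python
import numpy as np

def focal_loss(logits, labels, gamma=2.0, class_weights=None):
    """Mean focal loss for integer class labels.

    logits: (N, K) raw scores; labels: (N,) class indices.
    class_weights: optional (K,) per-class multipliers (hypothetical parameter).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Log-probability and probability assigned to the true class.
    log_pt = log_probs[np.arange(len(labels)), labels]
    pt = np.exp(log_pt)
    # The (1 - pt)^gamma factor down-weights well-classified examples.
    loss = -((1.0 - pt) ** gamma) * log_pt
    if class_weights is not None:
        loss = loss * np.asarray(class_weights, dtype=float)[labels]
    return loss.mean()
```

Setting `gamma=0.0` with no class weights reduces this to ordinary cross-entropy, which makes the effect of the focusing term easy to compare.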