Abstract: Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 uses a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn image features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
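For readers unfamiliar with the building block, the following is a minimal PyTorch sketch of what a depthwise dilated separable convolution layer might look like: a dilated depthwise convolution followed by a 1x1 pointwise convolution. The channel counts, dilation rate, and BatchNorm/ReLU placement here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DepthwiseDilatedSeparableConv(nn.Module):
    """Depthwise dilated convolution followed by a 1x1 pointwise convolution.

    The dilation enlarges the receptive field of the depthwise filters
    without adding parameters; channel counts and dilation rate here are
    illustrative rather than the paper's exact settings.
    """

    def __init__(self, in_channels: int, out_channels: int, dilation: int = 2):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size=3,
            padding=dilation, dilation=dilation,
            groups=in_channels, bias=False)   # one filter per input channel
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: padding matches the dilation rate, so spatial resolution is preserved.
x = torch.randn(1, 32, 32, 32)
y = DepthwiseDilatedSeparableConv(32, 64)(x)
print(y.shape)  # torch.Size([1, 64, 32, 32])
```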
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 61402368), the Aerospace Support Fund, China (Grant No. 2017-HT-XGD), and the Aerospace Science and Technology Innovation Foundation, China (Grant No. 2017 ZD 53047).
Abstract: The high-frequency components produced by traditional multi-scale transform methods are approximately sparse and can represent different detail information. In the low-frequency component, however, very few coefficients lie near zero, so the low-frequency image information cannot be represented sparsely. The low-frequency component contains the main energy of the image and depicts its profile, and fusing it directly is not conducive to obtaining a highly accurate fusion result. Therefore, this paper presents an infrared and visible image fusion method combining the multi-scale and top-hat transforms. On one hand, the new top-hat transform can effectively extract the salient features of the low-frequency component. On the other hand, the multi-scale transform can extract high-frequency detail information at multiple scales and from diverse directions. The combination of the two methods is conducive to acquiring more characteristics and more accurate fusion results. Specifically, for the low-frequency component, a new type of top-hat transform is used to extract low-frequency features, and different fusion rules are then applied to fuse the low-frequency features and the low-frequency background; for the high-frequency components, the product-of-characteristics method is used to integrate the detail information. Experimental results show that the proposed algorithm can obtain more detailed information and clearer infrared-target fusion results than the traditional multi-scale transform methods. Compared with state-of-the-art fusion methods based on sparse representation, the proposed algorithm is simple and efficacious, and the time consumption is significantly reduced.
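As a rough illustration of the low-frequency processing described above, the sketch below uses OpenCV's white top-hat morphology to split a low-frequency component into salient bright features and a smooth background, then combines two such components with simple max/average rules. The structuring-element size, the feature/background split, and the fusion rules are assumptions for illustration, not the paper's exact definitions.

```python
import cv2
import numpy as np

def tophat_split(lowfreq: np.ndarray, ksize: int = 15):
    """Split a low-frequency component into salient features and background.

    The white top-hat keeps bright structures smaller than the structuring
    element; subtracting it leaves a morphological opening that serves as a
    smooth background estimate.  The kernel size is an assumption.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (ksize, ksize))
    features = cv2.morphologyEx(lowfreq, cv2.MORPH_TOPHAT, kernel)  # bright salient structures
    background = lowfreq - features                                 # morphological opening
    return features, background

# Toy fusion of the low-frequency components of an infrared/visible pair:
# take the stronger salient response, average the backgrounds.
ir_low = np.float32(np.random.rand(128, 128))
vis_low = np.float32(np.random.rand(128, 128))
ir_feat, ir_bg = tophat_split(ir_low)
vis_feat, vis_bg = tophat_split(vis_low)
fused_low = np.maximum(ir_feat, vis_feat) + 0.5 * (ir_bg + vis_bg)
```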
Abstract: In this paper, we propose a new image fusion algorithm based on the two-dimensional Scale-Mixing Complex Wavelet Transform (2D-SMCWT). The fusion of the detail 2D-SMCWT coefficients is performed via a Bayesian Maximum a Posteriori (MAP) approach by considering a trivariate statistical model for the local neighborhoods of 2D-SMCWT coefficients. For the approximation coefficients, a new fusion rule based on Principal Component Analysis (PCA) is applied. We conduct several experiments using three different groups of multimodal medical images to evaluate the performance of the proposed method. The obtained results prove the superiority of the proposed method over state-of-the-art fusion methods in terms of visual quality and several commonly used metrics. The robustness of the proposed method is further tested against different types of noise. The plots of the fusion metrics establish the accuracy of the proposed fusion method.
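The following is a minimal NumPy sketch of one common form of PCA-based fusion rule for approximation coefficients: the weights come from the principal eigenvector of the 2x2 covariance matrix of the two coefficient maps. The wavelet transform itself is omitted, and the rule may differ in detail from the one defined in the paper.

```python
import numpy as np

def pca_fusion_weights(a: np.ndarray, b: np.ndarray):
    """Derive fusion weights for two approximation-coefficient maps.

    The weights are the normalised components of the principal eigenvector
    of the 2x2 covariance matrix of the flattened inputs -- a standard PCA
    fusion rule, used here only as an illustration.
    """
    data = np.stack([a.ravel(), b.ravel()])     # shape (2, N)
    cov = np.cov(data)                          # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    principal = np.abs(eigvecs[:, np.argmax(eigvals)])
    w = principal / principal.sum()
    return w[0], w[1]

# Weighted combination of two approximation bands.
approx_a = np.random.rand(64, 64)
approx_b = np.random.rand(64, 64)
w_a, w_b = pca_fusion_weights(approx_a, approx_b)
fused_approx = w_a * approx_a + w_b * approx_b
```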
Funding: This work was supported by the National Natural Science Foundation of China (No. 61872231), the National Key Research and Development Program of China (No. 2021YFC2801000), and the Major Research Plan of the National Social Science Foundation of China (No. 20&ZD130).
Abstract: The performance of Video Question and Answer (VQA) systems relies on capturing key information from both the visual images and the natural language in the context to generate relevant answers to questions. However, traditional linear combinations of multimodal features focus only on shallow feature interactions and fall far short of the need for deep feature fusion. Attention mechanisms have been used to perform deep fusion, but most of them can only handle weight assignment for single-modal information, leading to attention imbalance across modalities. To address the above problems, we propose a novel VQA model based on Triple Multimodal feature Cyclic Fusion (TMCF) and a Self-Adaptive Multimodal Balancing Mechanism (SAMB). Our model is designed to enhance complex feature interactions among multimodal features with cross-modal information balancing. In addition, TMCF and SAMB can be used as an extensible plug-in for exploring new feature combinations in the visual image domain. Extensive experiments were conducted on the MSVD-QA and MSRVTT-QA datasets. The results confirm the advantages of our approach in handling multimodal tasks. We also provide ablation studies to verify the effectiveness of each proposed component.
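To make the "cyclic" idea concrete, here is a deliberately simplified PyTorch sketch in which each of three modality features is gated by a projection of the next modality in a fixed cycle before the results are summed. The feature dimension, the sigmoid gating, and the choice of modalities are placeholders; the actual TMCF operator is defined in the paper and will differ.

```python
import torch
import torch.nn as nn

class TripleCyclicFusion(nn.Module):
    """Toy cyclic fusion of three modality features (e.g. frame, motion, question).

    Each modality is modulated by a gate computed from the next modality in
    the cycle (a -> b -> c -> a), and the gated features are summed.  This
    only illustrates the cyclic structure, not the exact TMCF operator.
    """

    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(3)])

    def forward(self, a, b, c):
        feats = [a, b, c]
        fused = 0
        for i, f in enumerate(feats):
            partner = feats[(i + 1) % 3]             # next modality in the cycle
            gate = torch.sigmoid(self.proj[i](partner))
            fused = fused + f * gate                 # element-wise modulation
        return fused

v = torch.randn(8, 512)   # visual feature
m = torch.randn(8, 512)   # motion feature
q = torch.randn(8, 512)   # question feature
out = TripleCyclicFusion()(v, m, q)   # shape (8, 512)
```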
Funding: Original Innovation Joint Fund (L202010) and the National Key Research and Development Program of China (Grant/Award Number: 2018YFB1307604).
Abstract: To eliminate unnecessary background information, such as soft tissues in original CT images, and the adverse impact of the similarity of adjacent spines on lumbar image segmentation and surgical path planning, a two-stage approach for localising lumbar segments is proposed. First, based on multi-scale feature fusion, a non-linear regression method is used to achieve accurate localisation of the overall spatial region of the lumbar spine, effectively eliminating useless background information such as soft tissues. In the second stage, the precise positioning of each segment within the lumbar spine region is achieved directly with the non-linear regression method, thus effectively eliminating the interference caused by adjacent spines. The 3D Intersection over Union (3D_IOU) is used as the main evaluation indicator of positioning accuracy. On an open dataset, 3D_IOU values of 0.8339 ± 0.0990 and 0.8559 ± 0.0332 are achieved in the first and second stages, respectively. In addition, the average time required by the proposed method in the two stages is 0.3274 s and 0.2105 s, respectively. Therefore, the proposed method performs very well in terms of both precision and speed and can effectively improve the accuracy of lumbar image segmentation and the effect of surgical path planning.
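For reference, the 3D_IOU metric for axis-aligned boxes can be computed as in the short sketch below. The box format (min/max corner coordinates) is an assumption; the paper does not specify its internal representation.

```python
import numpy as np

def iou_3d(box_a, box_b):
    """3D Intersection over Union for axis-aligned boxes.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max); the metric is
    the intersection volume divided by the union volume.
    """
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(a[:3], b[:3])                  # lower corner of the intersection
    hi = np.minimum(a[3:], b[3:])                  # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter)

# A predicted lumbar-segment box against its ground truth.
print(iou_3d((0, 0, 0, 10, 10, 10), (2, 2, 2, 12, 12, 12)))  # ~0.344
```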
Funding: Grants-in-Aid for Scientific Research from the YOKOYAMA Foundation for Clinical Pharmacology; a Grant-in-Aid from the Center of Excellence (COE) program of the Ministry of Education, Culture, Sports, Science and Technology of Japan; and a Grant-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan, No. 17590470.
Abstract: AIM: To examine whether the sedative effects assessed by psychomotor tests depend on cytochrome P450 (CYP) 2C19 genotype after an infusion regimen of diazepam commonly used for gastrointestinal endoscopy in Japan. METHODS: Fifteen healthy Japanese volunteers comprising three different CYP2C19 genotype groups underwent a critical flicker fusion test, an eye movement analysis and a postural sway test as tests of physical sedative effects, and a visual analog scale (VAS) symptom assessment as a test of mental sedative effects during the 336 h period after an intravenous infusion of diazepam (5 mg). RESULTS: The physical sedative effects assessed by the critical flicker test continued for 1 h (t values at 5 min, 30 min and 60 min: 4.35, 5.00 and 3.19, respectively) and those assessed by the moving radial area of the postural sway test continued for 3 h (t values at 5 min, 30 min, 60 min and 3 h: -4.05, -3.42, -2.17 and -2.58, respectively), which changed significantly compared with the baseline level before infusion (P < 0.05). On the other hand, the mental sedative effects measured by the VAS method improved within 1 h. CYP2C19 genotype-dependent differences in the post-infusion sedative effects were not observed in any of the four psychomotor function tests. CONCLUSION: With the psychomotor tests, the objective sedative effects of diazepam continued for 1 h to 3 h irrespective of CYP2C19 genotype status, and the subjective sedative symptoms improved within 1 h. Up to 3 h of clinical care appears to be required after the infusion of diazepam, although patients feel subjectively improved.
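The reported t values reflect comparisons of each post-infusion measurement against the pre-infusion baseline; a paired t-test is the natural way to reproduce such a comparison. The sketch below uses synthetic critical flicker fusion values, not the study's data, purely to illustrate the statistical step.

```python
import numpy as np
from scipy import stats

# Synthetic critical flicker fusion thresholds (Hz) for 15 subjects:
# baseline vs 30 min after diazepam infusion.  Illustrative values only.
rng = np.random.default_rng(0)
baseline = rng.normal(38.0, 2.0, size=15)
post_30min = baseline - rng.normal(1.5, 0.8, size=15)   # sedation lowers the threshold

t_stat, p_value = stats.ttest_rel(baseline, post_30min)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")   # P < 0.05 indicates a significant sedative effect
```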
Abstract: Because low-light images suffer from low contrast, severe loss of detail and heavy noise, existing object detection algorithms perform poorly on them. This paper therefore proposes a low-light object detection method combining a Spatial-aware Attention Mechanism and Multi-Scale Feature Fusion (SAM-MSFF). The method first fuses multi-scale features through a multi-scale interactive memory pyramid, enhancing the effective information in low-light image features, and maintains memory vectors that store sample features to capture the latent correlations between samples. It then introduces a spatial-aware attention mechanism to obtain long-range contextual information and local information of the features in the spatial domain, thereby enhancing the target features in low-light images while suppressing background information and noise. Finally, a multi-receptive-field enhancement module expands the receptive field of the features and performs grouped reweighting of features with different receptive fields, so that the detection network adaptively adjusts its receptive field size according to the multi-scale input information. In experiments on the ExDark dataset, the proposed method achieves a mean Average Precision (mAP) of 77.04%, which is 2.6% to 14.34% higher than existing mainstream object detection methods.
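As a rough sketch of the multi-receptive-field enhancement idea described above, the PyTorch module below runs parallel dilated 3x3 branches with different receptive fields and reweights them with a gate derived from global pooling, so the effective receptive field adapts to the input. The channel size, dilation rates, and gating design are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class MultiReceptiveFieldBlock(nn.Module):
    """Parallel dilated branches with learned per-branch reweighting.

    Three 3x3 branches with dilation 1/2/4 cover different receptive fields;
    a global-pooling gate produces one weight per branch so the network can
    favour the scale that suits the input.  Sizes here are illustrative.
    """

    def __init__(self, channels: int = 64, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations])
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(dilations), 1),
            nn.Softmax(dim=1))                      # one weight per branch

    def forward(self, x):
        weights = self.gate(x)                      # (B, n_branches, 1, 1)
        out = 0
        for i, branch in enumerate(self.branches):
            out = out + weights[:, i:i + 1] * branch(x)
        return out

x = torch.randn(2, 64, 32, 32)
print(MultiReceptiveFieldBlock()(x).shape)  # torch.Size([2, 64, 32, 32])
```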