Funding: the National Natural Science Foundation of China (No. 61976080); the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y); the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008); the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging area of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, the advanced approaches available employ a multi-generator mechanism to model the different domain mappings, which results in inefficient training of neural networks and mode collapse, limiting the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, a domain code is first introduced to explicitly control the different generation tasks. Secondly, the paper brings in a squeeze-and-excitation (SE) mechanism and a feature attention (FA) module. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on several unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing techniques. Overall, the experimental results show that the proposed method is versatile and scalable.
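The squeeze-and-excitation (SE) mechanism named above can be sketched as a channel-gating step: global average pooling squeezes each channel to a scalar, two small fully-connected layers produce per-channel weights, and the feature map is re-scaled by those weights. A minimal NumPy sketch, not the paper's implementation; the function name `se_block` and the weight shapes are illustrative assumptions:

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation channel gating for a (C, H, W) feature map.

    w1 has shape (C//r, C) and w2 has shape (C, C//r): the two FC weights
    of the excitation branch with reduction ratio r.
    """
    # Squeeze: global average pooling over spatial dims -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving per-channel gates in (0, 1)
    s = np.maximum(w1 @ z, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Re-scale each channel by its gate
    return feature_map * gates[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1   # reduction ratio r = 4
w2 = rng.standard_normal((8, 2)) * 0.1
y = se_block(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

In a trained network `w1` and `w2` are learned, so the gates come to emphasize informative channels and suppress the rest.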
Abstract: In geometry processing, symmetry research benefits from the global geometric features of complete shapes, but the shape of an object captured in real-world applications is often incomplete due to limited sensor resolution, a single viewpoint, and occlusion. Unlike existing works that predict symmetry from the complete shape, we propose a learning approach for symmetry prediction based on a single RGB-D image. Instead of directly predicting symmetry from incomplete shapes, our method consists of two modules: a multi-modal feature fusion module and a detection-by-reconstruction module. First, we build a channel-transformer network (CTN) to extract cross-fusion features from the RGB-D input as the multi-modal feature fusion module, which helps us aggregate features from the color and depth channels separately. Then, our self-reconstruction network based on a 3D variational auto-encoder (3D-VAE) takes the global geometric features as input, followed by a symmetry prediction network to detect the symmetry. Experiments conducted on three public datasets (ShapeNet, YCB, and ScanNet) demonstrate that our method produces reliable and accurate results.
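The 3D-VAE at the core of the self-reconstruction module relies on two standard ingredients: the reparameterization trick for sampling the latent code, and a KL term pulling the posterior toward a standard normal. A minimal NumPy sketch of just those two pieces, under the usual diagonal-Gaussian assumption (the function names are illustrative, not from the paper):

```python
import numpy as np

def vae_sample(mu, log_var, rng):
    """Reparameterised latent sample: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(3), np.zeros(3)
z = vae_sample(mu, log_var, rng)
print(kl_to_standard_normal(mu, log_var))  # 0.0 when the posterior is already N(0, I)
```

Because the noise `eps` is sampled outside the deterministic path, gradients can flow through `mu` and `log_var` during training.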
Abstract: Visual Question Answering (VQA) has sparked widespread interest as a crucial task in integrating vision and language. VQA primarily uses attention mechanisms to associate relevant visual regions with input questions in order to answer them effectively. Detection-based features extracted by an object detection network acquire the visual attention distribution over predetermined detection boxes and provide object-level insights that answer questions about foreground objects more effectively. However, they cannot answer questions about background content that lacks detection boxes, due to missing fine-grained details, which is the advantage of grid-based features. In this paper, we propose a Dual-Level Feature Embedding (DLFE) network, which effectively integrates grid-based and detection-based image features in a unified architecture to realize the complementary advantages of both. Specifically, in DLFE, a novel Dual-Level Self-Attention (DLSA) module is first proposed to mine the intrinsic properties of the two feature types, with Positional Relation Attention (PRA) designed to model position information. Then, we propose Feature Fusion Attention (FFA) to address the semantic noise caused by fusing the two feature types and construct an alignment graph to enhance and align the grid and detection features. Finally, we use co-attention to learn the interactive features of the image and question and answer questions more accurately. Our method improves significantly over the baseline, increasing accuracy from 66.01% to 70.63% on the test-std set of VQA 1.0 and from 66.24% to 70.91% on the test-std set of VQA 2.0.
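The self-attention and co-attention components mentioned above are both built on the same scaled dot-product primitive: queries score against keys, the scores are softmax-normalized, and the weights mix the values. A minimal NumPy sketch of that primitive only, not of the DLSA or FFA modules themselves:

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # e.g. 2 question-token queries
K = rng.standard_normal((3, 4))   # e.g. 3 image-region keys
V = rng.standard_normal((3, 5))   # values attached to those regions
out, w = scaled_dot_attention(Q, K, V)
print(out.shape)  # (2, 5)
```

In co-attention the queries come from one modality (the question) and the keys/values from the other (the image), which is what lets each question token pool evidence from the relevant visual regions.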
Abstract: An extended ocean-atmosphere coupled characteristic system that includes the thermodynamic physical processes of the ocean mixed layer is formulated in order to describe SST explicitly and to remove a possible limitation of the ocean-atmosphere coupling assumption in hydrodynamic ENSO models. It is revealed that there is a kind of abrupt nonlinear characteristic behaviour on the second-order slow time scale, related to the rapid onset and intermittency of El Niño events, due to the nonlinear interaction between a linearly unstable low-frequency primary eigen-component of the ocean-atmosphere coupled Kelvin wave and its higher-order harmonic components under a strong ocean-atmosphere coupling background. On the other hand, there is a kind of finite-amplitude nonlinear characteristic behaviour on the second-order slow time scale due to the nonlinear interaction between the linearly unstable primary eigen-component and its higher-order harmonic components under a weak ocean-atmosphere coupling background in this model system.
Funding: Supported by the National Key Technology R&D Program (No. 2006BAK08B07).
Abstract: A novel face recognition method based on the fusion of multi-modal face parts with Gabor features (MMP-GF) is proposed in this paper. First, the bare face image detached from the normalized image is convolved with a family of Gabor kernels, and then, according to the face structure and the key-point locations, the computed Gabor images are divided into five parts: Gabor face, Gabor eyebrow, Gabor eye, Gabor nose, and Gabor mouth. The multi-modal Gabor features are then spatially partitioned into non-overlapping regions, and the region averages are concatenated into a low-dimensional feature vector whose dimension is further reduced by principal component analysis (PCA). In the decision-level fusion, the match results calculated separately for the five parts are combined using linear discriminant analysis (LDA), and a normalized matching algorithm is used to improve performance. Experiments on the FERET database show that the proposed MMP-GF method achieves good robustness to expression and age variations.
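A Gabor kernel of the kind convolved with the face image is a sinusoid modulated by a Gaussian envelope, parameterized by orientation, wavelength, and bandwidth. A minimal NumPy sketch of generating the real part of one such kernel; the parameter values below are illustrative, not the paper's:

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma, psi=0.0, gamma=0.5):
    """Real part of a 2-D Gabor kernel.

    theta: orientation (rad), lam: wavelength (px), sigma: Gaussian width,
    psi: phase offset, gamma: spatial aspect ratio of the envelope.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates into the filter's orientation
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

k = gabor_kernel(size=9, theta=0.0, lam=4.0, sigma=2.0)
print(k.shape)  # (9, 9)
```

A "family" of kernels is produced by sweeping `theta` and `lam` over several orientations and scales; convolving the face with each yields the stack of Gabor images that the method then partitions into parts.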
Funding: National Youth Natural Science Foundation of China (No. 61806006); Innovation Program for Graduates of Jiangsu Province (No. KYLX160-781); Jiangsu University Superior Discipline Construction Project.
Abstract: In order to solve the difficult detection of far and hard objects caused by the sparseness and insufficient semantic information of LiDAR point clouds, a 3D object detection network with multi-modal data adaptive fusion is proposed, which makes use of the multi-neighborhood information of voxels and image information. First, we design an improved ResNet that maintains the structural information of far and hard objects in low-resolution feature maps, making it more suitable for the detection task; meanwhile, the semantics of each image feature map are enhanced by semantic information from all subsequent feature maps. Second, we extract multi-neighborhood context information with different receptive field sizes to compensate for the sparseness of the point cloud, which improves the ability of voxel features to represent the spatial structure and semantic information of objects. Finally, we propose a multi-modal feature adaptive fusion strategy that uses learnable weights to express the contribution of different modal features to the detection task, with voxel attention further enhancing the fused feature representation of valid target objects. Experimental results on the KITTI benchmark show that this method outperforms VoxelNet by remarkable margins, increasing the AP by 8.78% and 5.49% on the medium and hard difficulty levels. Meanwhile, our method achieves greater detection performance than many mainstream multi-modal methods, outperforming the AP of MVX-Net by 1% on the medium and hard difficulty levels.
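The adaptive fusion strategy described above amounts to learning how much each modality should contribute: raw scores are normalized (here with a softmax) into weights that blend the voxel and image features. A minimal NumPy sketch of that blending step, with the function name and scalar-weight form being illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

def adaptive_fuse(voxel_feat, image_feat, logits):
    """Blend two modality features using learnable scalar weights.

    logits: (2,) raw scores; a softmax turns them into fusion weights,
    so during training the network can shift contribution between modalities.
    """
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w[0] * voxel_feat + w[1] * image_feat

v = np.array([1.0, 2.0])   # toy voxel feature
i = np.array([3.0, 4.0])   # toy image feature
fused = adaptive_fuse(v, i, np.array([0.0, 0.0]))
print(fused)  # [2. 3.] – equal logits average the two modalities
```

With learned (unequal) logits the softmax pushes weight toward whichever modality is more useful for the detection task, e.g. images for far objects where LiDAR returns are sparse.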
Funding: Supported by the Shenzhen Science and Technology Program, Shenzhen, China (No. JCYJ20200109145001814, No. SGDX20211123120001001); the National Natural Science Foundation of China (No. 81970790); the Sanming Project of Medicine in Shenzhen (No. SZSM202011015).
Abstract: AIM: To describe the clinical, electrophysiological, and genetic features of an unusual case with an RDH12 homozygous pathogenic variant and to review the characteristics of patients reported with the same variant. METHODS: The patient underwent a complete ophthalmologic examination including best-corrected visual acuity, anterior segment and dilated fundus examination, visual field testing, spectral-domain optical coherence tomography (OCT), and electroretinography (ERG). A retinal disease gene panel was sequenced through chip-capture high-throughput sequencing, and Sanger sequencing was used to confirm the result. We then reviewed the characteristics of patients reported with the same variant. RESULTS: A 30-year-old male presented with severe early-onset retinal degeneration and complained of night blindness, decreased visual acuity, vitreous floaters, and amaurosis fugax. The best-corrected visual acuity was 0.04 OD and 0.12 OS. Fundus photographs and OCT showed bilateral macular atrophy, with larger areas of atrophy in the left eye. Autofluorescence showed bilateral symmetrical hypo-autofluorescence. ERG revealed that the amplitudes of the a- and b-waves were severely decreased. Multifocal ERG showed decreased amplitudes in the local macular area. A homozygous missense variant c.146C>T (chr14:68191267) was found. The clinical characteristics of the 13 patients reported with the same pathogenic variant varied. CONCLUSION: An unusual patient with a homozygous pathogenic variant, c.146C>T of RDH12, causing late-onset and asymmetric retinal degeneration is reported. The clinical manifestations of the patient, documented with multimodal retinal imaging and functional examinations, enrich our understanding of this disease.
Abstract: Mass concrete structures are widely used in major civil and hydraulic engineering projects, but the low tensile strength of concrete makes them prone to cracking, so developing efficient detection methods to identify crack information in mass concrete structures is essential. This paper proposes a new method that extracts amplitude features at specific frequencies from the spectrum of the response signal and uses a BP artificial neural network to build a mapping between these amplitude features and the crack information, thereby effectively identifying cracks. First, the extended finite element method (XFEM) and an artificial absorbing-boundary model are used to simulate the responses of sensors at specific locations under a large number of different crack configurations, for both single-crack and double-crack cases. The spectral curves are analyzed and features extracted to build a spectral-feature-to-crack-tip-position dataset for training the artificial neural network. Inversion results on the test set show that the method achieves good accuracy and can effectively identify the crack information.
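The BP (backpropagation) mapping described above can be sketched end-to-end on toy data: a small network with one hidden layer is trained by gradient descent to map "spectral amplitude features" to "crack-tip coordinates". A minimal NumPy sketch on synthetic stand-in data, not the paper's dataset or architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stand-in data: 4 spectral-amplitude features -> 2 crack-tip coordinates
X = rng.standard_normal((64, 4))
Y = X @ rng.standard_normal((4, 2)) * 0.5

# One hidden layer of 8 tanh units
W1 = rng.standard_normal((4, 8)) * 0.3; b1 = np.zeros(8)
W2 = rng.standard_normal((8, 2)) * 0.3; b2 = np.zeros(2)

lr, losses = 0.05, []
for _ in range(200):
    H = np.tanh(X @ W1 + b1)        # hidden activations
    P = H @ W2 + b2                 # predicted crack-tip positions
    err = P - Y
    losses.append((err**2).mean())
    # Backpropagate the mean-squared error
    gP = 2 * err / len(X)
    gW2, gb2 = H.T @ gP, gP.sum(0)
    gH = gP @ W2.T * (1 - H**2)     # tanh derivative
    gW1, gb1 = X.T @ gH, gH.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(losses[0] > losses[-1])  # True: training reduces the inversion error
```

The real method differs in its inputs (amplitudes extracted from XFEM-simulated response spectra) and outputs (crack-tip positions for one or two cracks), but the training loop has this shape.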
Abstract: Feature selection is always an important issue in the visual SLAM (simultaneous localization and mapping) literature. Considering that location estimation can be improved by tracking features with a larger visible time, a new feature selection method based on motion estimation is proposed. First, a k-step iteration algorithm is presented for visible-time estimation using an affine motion model; then a delayed feature detection method is introduced for efficiently detecting features with the maximum visible time. As a means of validating the proposed method, both simulation and real-data experiments are carried out. Results show that the proposed method improves both estimation performance and computational performance compared with the existing random feature selection method.
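The k-step visible-time idea can be sketched directly: propagate a feature's image position through an affine motion model step by step, and count how many steps it remains inside the image bounds. A minimal NumPy sketch under simplifying assumptions (a fixed affine model per step and a rectangular visibility region); the function name and parameters are illustrative, not the paper's:

```python
import numpy as np

def visible_time(p, A, t, bounds, k_max=50):
    """Iterate p <- A p + t and count the steps p stays inside the image.

    bounds: (x_min, x_max, y_min, y_max); k_max caps the look-ahead horizon.
    """
    x_min, x_max, y_min, y_max = bounds
    steps = 0
    for _ in range(k_max):
        p = A @ p + t
        if not (x_min <= p[0] <= x_max and y_min <= p[1] <= y_max):
            break
        steps += 1
    return steps

A = np.eye(2)                    # pure translation for this toy example
t = np.array([10.0, 0.0])        # feature drifts 10 px right per step
p0 = np.array([300.0, 240.0])
print(visible_time(p0, A, t, (0, 640, 0, 480)))  # 34
```

Features with the largest predicted visible time are then the ones worth tracking, since they constrain the location estimate over more frames.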
Abstract: Near-duplicate image detection is a necessary operation for refining image search results for efficient user exploration. The existence of large numbers of near duplicates requires fast and accurate automatic near-duplicate detection methods. We have designed a coarse-to-fine near-duplicate detection framework to speed up the process and a multi-modal integration scheme for accurate detection. Duplicate pairs are detected with both a global feature (partition-based color histogram) and local features (CPAM and a SIFT bag-of-words model). Experimental results on a large-scale dataset demonstrate the effectiveness of the proposed design.
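The partition-based color histogram used as the global feature can be sketched simply: split the image into a grid of blocks and concatenate a normalized per-channel histogram for each block, so the descriptor captures coarse spatial layout as well as color. A minimal NumPy sketch with illustrative grid and bin counts (the paper's exact partitioning is not specified here):

```python
import numpy as np

def partition_histogram(img, grid=2, bins=4):
    """Global color descriptor: grid x grid blocks, one normalized
    histogram (bins per channel) per block, all concatenated."""
    h, w, _ = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            for c in range(3):
                hist, _ = np.histogram(block[..., c], bins=bins, range=(0, 256))
                feats.append(hist / hist.sum())   # normalize per channel
    return np.concatenate(feats)

img = np.random.default_rng(2).integers(0, 256, (32, 32, 3), dtype=np.uint8)
f = partition_histogram(img)
print(f.shape)  # (48,) = 2*2 blocks * 3 channels * 4 bins
```

In a coarse-to-fine framework this cheap global descriptor prunes obvious non-duplicates first, leaving the more expensive local features (CPAM, SIFT bag-of-words) for the surviving candidate pairs.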
基金supported by the National Natural Science Foundation of China(61962010,62262005)the Natural Science Foundation of Guizhou Priovince(QianKeHeJichu[2019]1425).
Abstract: Face anti-spoofing is used to help a face recognition system judge whether a detected face is real or fake. Traditional face anti-spoofing methods use hand-crafted features to describe the difference between living and fraudulent faces, but such features do not generalize to the variations found in unconstrained environments. Convolutional neural networks (CNNs) achieve considerable results for face spoofing detection. However, most existing neural-network-based methods simply extract single-scale features from single-modal data, ignoring multi-scale and multi-modal information. To address this problem, a novel face anti-spoofing method based on multi-modal and multi-scale feature fusion (MMFF) is proposed. Specifically, a residual network (ResNet-34) is first adopted to extract features of different scales from each modality; these features are then fused by a feature pyramid network (FPN); finally, a squeeze-and-excitation fusion (SEF) module and a self-attention network (SAN) are combined to fuse features from the different modalities for classification. Experiments on the CASIA-SURF dataset show that the new MMFF-based method achieves better performance than most existing methods.
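The FPN step that fuses the multi-scale ResNet features works top-down: each coarser map is upsampled and merged into the next finer one, so every level ends up carrying both fine detail and high-level semantics. A minimal NumPy sketch of just the top-down pathway, using nearest-neighbour upsampling and addition and omitting the 1x1 lateral convolutions of a real FPN:

```python
import numpy as np

def fpn_top_down(features):
    """Top-down pathway: upsample the coarser map by 2 (nearest neighbour)
    and add it to the finer one, proceeding coarsest to finest."""
    out = [features[-1]]                       # coarsest level passes through
    for f in reversed(features[:-1]):
        up = out[0].repeat(2, axis=0).repeat(2, axis=1)
        out.insert(0, f + up)                  # merge into the finer level
    return out

rng = np.random.default_rng(3)
pyramid = [rng.standard_normal((8, 8)),        # fine
           rng.standard_normal((4, 4)),
           rng.standard_normal((2, 2))]        # coarse
merged = fpn_top_down(pyramid)
print([m.shape for m in merged])  # [(8, 8), (4, 4), (2, 2)]
```

A real FPN also projects each level to a common channel width with 1x1 convolutions before adding; single-channel maps are used here purely to keep the spatial logic visible.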