Funding: Supported in part by an NUS startup grant and the National Natural Science Foundation of China (52076037).
Abstract: Although many multi-view clustering (MVC) algorithms with acceptable performance have been presented, to the best of our knowledge nearly all of them must be given the correct number of clusters. In addition, these existing algorithms create only hard or fuzzy partitions for multi-view objects, which are often located in highly overlapping areas of the multi-view feature space. Adopting hard or fuzzy partitions ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views and maps multi-view objects to a two-dimensional human-readable chart by calculating two newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects in the dataset as cluster centers. SRMVEC then derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. Moreover, SRMVEC outperforms some state-of-the-art methods on benchmark datasets.
Funding: This work was supported by the Sichuan Science and Technology Program (2023YFG0262).
Abstract: Transformer-based stereo image super-resolution reconstruction (Stereo SR) methods have significantly improved image quality. However, existing methods pay insufficient attention to detailed features and do not consider the offset of pixels along the epipolar lines in complementary views when integrating stereo information. To address these challenges, this paper introduces a novel epipolar line window attention stereo image super-resolution network (EWASSR). For detail feature restoration, we design a feature extractor based on a Transformer and a convolutional neural network (CNN), which consists of (shifted) window-based self-attention ((S)W-MSA) and feature distillation and enhancement blocks (FDEB). This combination addresses both global image perception and local feature attention, and captures more discriminative high-frequency features of the image. Furthermore, to address the offset of complementary pixels in stereo images, we propose an epipolar line window attention (EWA) mechanism, which divides windows along the epipolar direction to promote efficient matching of shifted pixels, even in smooth areas. More accurate pixel matching can be achieved by using adjacent pixels in the window as a reference. Extensive experiments demonstrate that EWASSR can reconstruct more realistic detailed features. Quantitative comparisons show that for 2× SR on the Middlebury and Flickr1024 datasets, EWASSR increases the peak signal-to-noise ratio (PSNR) by 0.37 dB and 0.34 dB, respectively, over a recent network.
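The PSNR gains quoted above are easier to interpret with the metric written out. The following is a minimal sketch of the standard definition, PSNR = 10·log10(MAX² / MSE), in plain Python; the function name `psnr`, the nested-list image layout, and the 8-bit peak value of 255 are illustrative assumptions, not details taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists of pixel intensities (an assumed layout);
    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if not ref or len(ref) != len(out):
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of 0.37 dB corresponds to roughly 8% lower mean squared error, since 10^(-0.037) ≈ 0.92.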
Abstract: Multi-view Subspace Clustering (MVSC) is an advanced clustering method designed to integrate diverse views and uncover a common subspace, enhancing the accuracy and robustness of clustering results. The low-rank prior plays a significant role in MVSC because it captures the global data structure across views, which improves performance. However, MVSC is sensitive to outliers because it relies on the Frobenius norm for error measurement. To address this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps select the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, a common challenge in traditional MVSC methods that rely solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
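The abstract does not specify which sparse penalty is used or how the ADMM subproblems are formed. As a hedged illustration only: when the sparse term is an ℓ1 norm, the corresponding ADMM subproblem typically has a closed-form solution given by elementwise soft-thresholding. The sketch below shows that operator; the function name `soft_threshold` and the ℓ1 assumption are not taken from the paper.

```python
import math


def soft_threshold(values, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||x||_1.

    Each entry is shrunk toward zero by tau; entries whose magnitude is
    at most tau become zero, which is what produces sparsity.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in values]
```

For instance, with `tau = 1.0` the input `[3.0, -0.5, 1.0, -2.0]` is shrunk to `[2.0, 0.0, 0.0, -1.0]`: small entries are zeroed and large ones keep their sign.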
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61732015, 61932018, and 61472349, and the National Key Research and Development Program of China under Grant No. 2017YFB0202203.
Abstract: In multi-view stereo, unreliable matching in low-textured regions reduces the completeness of reconstructed models. Because the photometric consistency of low-textured regions is not discriminative within a local window, non-local information provided by a Markov Random Field (MRF) model can alleviate the matching ambiguity, but it is limited in continuous space by its high computational complexity. Owing to their sampling and propagation strategy, PatchMatch multi-view stereo methods have advantages in optimizing the continuous labeling problem. In this paper, we propose a novel method to address this problem, namely Coarse-Hypotheses Guided Non-Local PAtchMatch Multi-View Stereo (CNLPA-MVS), which combines the advantages of MRF-based non-local methods and PatchMatch multi-view stereo so that each compensates for the other's defects. First, we combine dynamic programming (DP) and sequential propagation along scanlines in parallel to perform CNLPA-MVS, thereby obtaining the optimal depth and normal hypotheses. Second, we introduce coarse inference within a universal window provided by winner-takes-all to eliminate the stripe artifacts caused by DP and improve completeness. Third, we add to CNLPA-MVS a local consistency strategy, based on the hypothesis that pixels of similar color share approximately the same values, to further improve completeness. CNLPA-MVS was validated on public benchmarks and achieved state-of-the-art performance with high completeness.
Funding: Partly supported by JSPS KAKENHI JP15K16027, JP26700013, JP15H05918, JP19H04138, JST CREST JP179423, and the Foundation for Nara Institute of Science and Technology.
Abstract: In this paper, we present a practical method for reconstructing the bidirectional reflectance distribution function (BRDF) from multiple images of a real object composed of a homogeneous material. The key idea is that the BRDF can be sampled after geometry estimation using multi-view stereo (MVS) techniques. Our contribution is the selection of reliable samples of lighting, surface normal, and viewing directions for robustness against MVS estimation errors. Our method is quantitatively evaluated using synthesized images, and its effectiveness is shown via real-world experiments.
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 82072019); the Shenzhen Basic Research Program (JCYJ20210324130209023) of the Shenzhen Science and Technology Innovation Committee; the Shenzhen-Hong Kong-Macao S&T Program (Category C) (SGDX20201103095002019); the Natural Science Foundation of Jiangsu Province (No. BK20201441); the Provincial and Ministry Co-constructed Project of Henan Province Medical Science and Technology Research (SBGJ202103038 and SBGJ202102056); the Henan Province Key R&D and Promotion Project (Science and Technology Research) (222102310015); the Natural Science Foundation of Henan Province (222300420575); the Henan Province Science and Technology Research (222102310322); and the Jiangsu Students' Innovation and Entrepreneurship Training Program (202110304096Y).
Abstract: Epilepsy is a central nervous system disorder in which brain activity becomes abnormal. Electroencephalogram (EEG) signals, as recordings of brain activity, have been widely used for epilepsy recognition. To study epileptic EEG signals and develop artificial intelligence (AI)-assisted recognition, a multi-view transfer learning algorithm based on least squares regression (MVTL-LSR) is proposed in this study. Compared with most existing multi-view transfer learning algorithms, MVTL-LSR has two merits: (1) Traditional transfer learning algorithms leverage knowledge from different sources, which poses a significant risk to data privacy. We therefore develop a knowledge transfer mechanism that protects the security of source domain data while guaranteeing performance. (2) When utilizing multi-view data, we embed view weighting and manifold regularization into the transfer framework to measure the views' strengths and weaknesses and improve generalization ability. In the experimental studies, 12 different simulated multi-view and transfer scenarios are constructed from epileptic EEG signals licensed and provided by the University of Bonn, Germany. Extensive experimental results show that MVTL-LSR outperforms the baselines. The source code will be available at https://github.com/didid5/MVTL-LSR.
Funding: Supported in part by the Key Program of NSFC (Grant No. U1908214); the Special Project of Central Government Guiding Local Science and Technology Development (Grant No. 2021JH6/10500140); the Program for the Liaoning Distinguished Professor; the Program for Innovative Research Team in University of Liaoning Province (LT2020015), Dalian (2021RT06), and Dalian University (XLJ202010); the Science and Technology Innovation Fund of Dalian (Grant No. 2020JJ25CY001); and the Dalian University Scientific Research Platform Project (No. 202101YB03).
Abstract: Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of application scenarios. With the introduction of end-to-end direct regression methods, the field has entered a new stage of development. However, even for the best existing method, the regression results for joints that are more heavily influenced by external factors are not accurate enough. In this paper, we propose an effective feature recalibration module based on the channel attention mechanism, together with a relative optimal calibration strategy, and apply them to the multi-view multi-person 3D human pose estimation task to improve detection accuracy for joints that are more severely affected by external factors. Specifically, the recalibration module and strategy achieve a relatively optimal weight adjustment of joint feature information, which enables the model to learn the dependencies between joints and the dependencies between people and their corresponding joints. We call this method the Efficient Recalibration Network (ER-Net). Finally, experiments were conducted on two benchmark datasets for this task, Campus and Shelf, on which the PCP reached 97.3% and 98.3%, respectively.
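The abstract does not describe the internals of the recalibration module. To illustrate what channel-attention-based feature recalibration generally looks like, here is a minimal sketch in the style of a squeeze-and-excitation gate: each channel is averaged, the averages pass through a linear layer and a sigmoid, and the resulting gates rescale the channels. The function name `recalibrate`, the single linear layer, and the parameters `weights` and `bias` are hypothetical; this is not ER-Net's actual design.

```python
import math


def recalibrate(features, weights, bias):
    """Rescale each channel by a learned gate in (0, 1).

    features: list of C channels, each a flat list of activations.
    weights:  C x C matrix and bias: length-C list; both stand in for
              learned parameters and are purely illustrative.
    """
    # Squeeze: one summary value per channel (global average pooling).
    squeezed = [sum(channel) / len(channel) for channel in features]
    # Excite: linear layer followed by a sigmoid gives per-channel gates.
    gates = []
    for row, b in zip(weights, bias):
        z = sum(w * s for w, s in zip(row, squeezed)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Recalibrate: scale every activation in a channel by that channel's gate.
    return [[g * v for v in channel] for g, channel in zip(gates, features)]
```

Because each gate depends on the summaries of all channels, the rescaling of one channel can react to activity in the others, which is the general mechanism by which channel attention models cross-channel dependencies.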
Abstract: In the multi-view image localization task, the features of images captured from different views should be fused properly. This paper considers the classification-based image localization problem. We propose the relational graph location network (RGLN) to perform this task. In this network, we propose a heterogeneous graph construction approach for graph classification tasks, which aims to describe the location in a more appropriate way, thereby improving the expressive ability of the location representation module. Experiments show that the expressive ability of the proposed graph construction approach exceeds that of the compared methods by a large margin. In addition, the proposed localization method outperforms the compared localization methods by around 1.7% in terms of meter-level accuracy.
Funding: This work was supported by the National Natural Science Foundation of China (62073087, 62071132, 61973090).
Abstract: Deep matrix factorization (DMF) has been demonstrated to be a powerful tool for capturing the complex hierarchical information in multi-view data representation (MDR). However, existing multi-view DMF methods mainly explore the consistency of multi-view data while neglecting the diversity among different views and the high-order relationships of the data, resulting in the loss of valuable complementary information. In this paper, we design a hypergraph regularized diverse deep matrix factorization (HDDMF) model for multi-view data representation that jointly utilizes multi-view diversity and a high-order manifold in a multilayer factorization framework. A novel diversity enhancement term is designed to exploit the structural complementarity between different views of the data. Hypergraph regularization is utilized to preserve the high-order geometric structure of the data in each view. An efficient iterative optimization algorithm is developed to solve the proposed model, with theoretical convergence analysis. Experimental results on five real-world datasets demonstrate that the proposed method significantly outperforms state-of-the-art multi-view learning approaches.
Abstract: When a stereo matching network is trained on a single dataset, it may rely too heavily on features learned from that dataset's scenes, resulting in poor performance on other datasets. Feature consistency between matched pixels is therefore a key factor in the network's generalization ability. To address this issue, this paper proposes a more widely applicable stereo matching network that introduces a whitening loss into the feature extraction module, significantly improving the applicability of the model by constraining the variation between salient feature pixels. In addition, a GRU iterative update module is used in the disparity update stage, which expands the model's receptive field at multiple resolutions and allows precise disparity estimation not only in richly textured areas but also in low-texture areas. The model was trained only on the large-scale Scene Flow dataset, and disparity estimation was evaluated on mainstream datasets such as Middlebury, KITTI 2015, and ETH3D. Compared with earlier stereo matching algorithms, this method not only achieves more accurate disparity estimation but also has wider applicability and stronger robustness.
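Abstract: To address the poor reconstruction quality of multi-view stereo networks in challenging regions such as weakly textured or non-Lambertian surfaces, we first propose a multi-scale feature extraction module based on three parallel dilated convolutions and an attention mechanism. The module enlarges the receptive field while capturing dependencies among features to obtain global context, which improves the network's feature representation in challenging regions and enables robust feature matching. Second, an attention mechanism is introduced into the 3D CNN used for cost volume regularization so that the network focuses on important regions of the cost volume during smoothing. In addition, a neural rendering network is built. It uses a rendering reference loss to accurately resolve the geometry and appearance information expressed by the radiance field, and a depth consistency loss is introduced to keep the multi-view stereo network and the neural rendering network geometrically consistent, effectively mitigating the adverse effect of noisy cost volumes on the multi-view stereo network. On the indoor DTU dataset, the completeness and overall metrics of the reconstructed point clouds are 0.289 and 0.326, improvements of 24.9% and 8.2% over the baseline CasMVSNet, with high-quality reconstruction even in challenging regions. On the outdoor Tanks and Temples intermediate set, the mean F-score of the reconstructed point clouds is 60.31, a 9.9% improvement over UCS-Net, indicating strong generalization ability.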
Funding: Funded by the National Natural Science Foundation of China (No. 51979275); the Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, MNR (No. KFKT-2022-05); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2021-06-115); the Open Project Program of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. VRLAB2022C10); the Jiangsu Province and Education Ministry Co-sponsored Synergistic Innovation Center of Modern Agricultural Equipment (No. XTCX2002); and the 2115 Talent Development Program of China Agricultural University and Chinese Universities Scientific Fund (No. 2021TC105).
Abstract: Binocular stereo vision is the lowest-cost sensor for obtaining 3D information. Given its weaknesses in long-distance measurement and stability, improving the accuracy and stability of stereo vision is urgently required for precision agriculture applications. To address the challenges of long-distance measurement and stable perception without a hardware upgrade, this paper introduces higher-resolution perception and adaptive HDR (High Dynamic Range), both inspired by hawk eyes. The higher-resolution reconstruction method, which simulates the function of the 'deep fovea' and 'shallow fovea' in the physiological structure of the hawk eye, aims to improve accuracy. The adaptive HDR method, inspired by the adjustment of pupils, is proposed for high dynamic range optimisation and stable perception. Under various light conditions, compared with default stereo vision, the accuracy of the proposed algorithm improved by 28.0% as evaluated by error ratio, and the stability improved by 26.56% as evaluated by disparity accuracy. For fixed-distance measurement, the maximum improvement was 78.6% as evaluated by standard deviation. With the hawk-eye-inspired perception algorithm, the orchard point cloud improved in both quality and quantity. The algorithm contributes a substantial advance in binocular 3D point cloud reconstruction for orchard navigation maps.
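Abstract: In feature extraction networks based on deep learning, features extracted by deep convolutional neural networks lack low-level semantic information. To address this problem, this paper proposes a semantically enhanced multi-view stereo method. First, a ConvLSTM (Convolutional Long Short-Term Memory) semantic aggregation network is proposed. Using the ConvLSTM structure, it performs prediction over the feature maps extracted by multiple convolutional layers to obtain a feature map that fuses the semantic information of every layer. As high-level image features are extracted layer by layer in space, the memory of the long short-term memory structure strengthens the low-level semantic information in high-level feature maps, which improves reconstruction in weakly textured regions and improves the robustness and completeness of 3D reconstruction. Second, a visibility network is proposed. Working from the grayscale image, it emphasizes the features of visible regions in the feature map, increasing their influence and helping to improve 3D reconstruction. Finally, image texture information is extracted and fed into the ConvLSTM semantic aggregation network to extract deep features, further improving reconstruction in weakly textured regions. Compared with mainstream multi-view stereo reconstruction methods, the proposed method achieves better reconstruction results.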
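Abstract: High mismatch rates in disparity-discontinuous regions and repetitive-texture regions have long been the main problem limiting the measurement accuracy of binocular stereo matching. To address this, this paper proposes a stereo matching algorithm based on multi-feature fusion. First, in the cost computation stage, Gaussian weighting assigns weights to neighborhood pixels, improving the accuracy of the Sum of Absolute Differences (SAD) algorithm. Next, the binary chain-code scheme is improved on the basis of the Census transform: the mean gray value of the pixels in the neighborhood is fused with the mean gray value of the gradient image to establish the criterion for corresponding points in the left and right images and to optimize the code length. Then, an aggregation method that combines the cross-based method with an improved guided filter is constructed to redistribute disparity values and reduce the mismatch rate. Finally, the initial disparity is obtained with the Winner Take All (WTA) algorithm, and left-right consistency checking and sub-pixel refinement are applied to improve matching accuracy and obtain the final disparity. Experimental results show that, on the Middlebury dataset, the proposed SAD-Census algorithm achieves average mismatch rates of 2.67% in non-occluded regions and 5.69% in all regions. The mean error is below 2% when measuring distances of 200 to 900 mm, and the maximum error in actual 3D measurement is 1.5%. The results verify the effectiveness and reliability of the proposed algorithm.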
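The pipeline above builds on two classic components, the Census transform and Winner Take All (WTA) disparity selection. The sketch below shows only the textbook versions of these two steps on images stored as nested lists. It is not the paper's improved SAD-Census variant: Gaussian weighting, gradient fusion, cost aggregation, left-right checking, and sub-pixel refinement are all omitted. The function names, the border clamping, and the tie-breaking rule (the smallest disparity wins) are illustrative assumptions.

```python
def census(image, y, x, radius=1):
    """Census code at (y, x): one bit per neighbour, 1 if neighbour < centre."""
    h, w = len(image), len(image[0])
    centre = image[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            yy = min(max(y + dy, 0), h - 1)  # clamp at image borders
            xx = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | (1 if image[yy][xx] < centre else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def wta_disparity(left, right, max_disp, radius=1):
    """Per-pixel disparity with the lowest census/Hamming matching cost."""
    h, w = len(left), len(left[0])
    disparity = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            code_left = census(left, y, x, radius)
            best_cost, best_d = None, 0
            # A left pixel at x matches a right pixel at x - d.
            for d in range(min(max_disp, x) + 1):
                cost = hamming(code_left, census(right, y, x - d, radius))
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

Raw WTA on an unaggregated cost like this is noisy in repetitive or flat regions, where several disparities can yield the same cost.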