Funding: This work was supported by the Science and Technology Cooperation Special Project of Shijiazhuang (SJZZXA23005).
Abstract: In minimally invasive surgery, endoscopes or laparoscopes equipped with miniature cameras and tools enter the human body through small incisions or natural cavities for therapeutic purposes. In clinical operating environments, however, endoscopic images often suffer from low texture, uneven illumination, and non-rigid structures, which hinder feature observation and extraction. Missing feature points in endoscopic images can severely impair surgical navigation and clinical diagnosis, compromising treatment and postoperative recovery. To address these challenges, this paper introduces, for the first time, a Cross-Channel Multi-Modal Adaptive Spatial Feature Fusion (ASFF) module built on the lightweight EfficientViT architecture, together with a novel lightweight attention-based feature extraction and matching network. The network dynamically adjusts attention weights for cross-modal information from grayscale images and optical-flow images through a dual-branch Siamese network. It extracts static and dynamic features ranging from low-level to high-level and from local to global, ensuring robust feature extraction across different image widths, noise levels, and blur conditions. Global and local matching are performed through a multi-level cascaded attention mechanism, with cross-channel attention introduced to extract low-level and high-level features simultaneously. Extensive ablation and comparative experiments are conducted on the HyperKvasir, EAD, M2caiSeg, CVC-ClinicDB, and UCL synthetic datasets. The results show that the proposed network improves accuracy (Acc) over the baseline EfficientViT-B3 model by 75.4% while also improving runtime and storage efficiency. Compared with the complex DenseDescriptor feature extraction network, the difference in Acc is less than 7.22%, and the IoU results on certain datasets outperform the complex dense models. Furthermore, the method increases the F1 score by 33.2% and accelerates runtime by 70.2%. Notably, CMMCAN runs faster than the comparative lightweight models while delivering feature extraction and matching performance comparable to existing complex models at higher speed and better cost-effectiveness.
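The abstract does not spell out the internal design of the ASFF module. Assuming it reduces to learning per-pixel weights that balance the grayscale and optical-flow branches of the Siamese network, a minimal PyTorch sketch of that idea could look as follows (class and tensor names are illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn

class AdaptiveSpatialFusion(nn.Module):
    """Fuse grayscale-branch and optical-flow-branch feature maps with
    learned per-pixel weights (softmax over the two modalities)."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions predict one spatial weight map per modality
        self.weight_gray = nn.Conv2d(channels, 1, kernel_size=1)
        self.weight_flow = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat_gray: torch.Tensor, feat_flow: torch.Tensor) -> torch.Tensor:
        # Stack the two weight maps and normalize them per pixel
        logits = torch.cat([self.weight_gray(feat_gray),
                            self.weight_flow(feat_flow)], dim=1)  # (B, 2, H, W)
        weights = torch.softmax(logits, dim=1)
        return weights[:, 0:1] * feat_gray + weights[:, 1:2] * feat_flow

# Usage: fuse features from the two Siamese branches at one pyramid level
gray_feat = torch.randn(1, 64, 32, 32)   # grayscale-branch features
flow_feat = torch.randn(1, 64, 32, 32)   # optical-flow-branch features
fused = AdaptiveSpatialFusion(64)(gray_feat, flow_feat)
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```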
Abstract: Existing deep multi-view stereo (MVS) methods introduce Transformers into cascade networks to achieve high-resolution depth estimation and thereby obtain 3D reconstructions with high accuracy and completeness. However, Transformer-based methods are limited by computational cost and cannot be extended to the finer stages. To this end, a novel cross-scale Transformer MVS network is proposed that processes feature representations at different stages without additional computation. An adaptive matching-aware Transformer (AMT) is introduced that applies different combinations of interactive attention at multiple scales. This combination strategy enables the network to capture intra-image context and to strengthen inter-image feature relationships. In addition, dual-feature guided aggregation (DFGA) is designed to embed coarse global semantic information into the finer cost-volume construction, further enhancing the perception of global and local features. Meanwhile, a feature metric loss is designed to measure feature deviation before and after the transformation, reducing the impact of feature mismatches on depth estimation. Experimental results show that on the DTU dataset the proposed network achieves completeness and overall metrics of 0.264 and 0.302, and on the two large-scale Tanks and Temples scene sets the reconstruction averages reach 64.28 and 38.03, respectively.
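The exact formulation of the feature metric loss is not given in the abstract. Under the assumption that it compares reference-view features with source-view features warped into the reference view by the estimated geometry, a hedged sketch might be:

```python
import torch

def feature_metric_loss(ref_feat: torch.Tensor,
                        warped_src_feat: torch.Tensor,
                        valid_mask: torch.Tensor) -> torch.Tensor:
    """Penalize the deviation between reference-view features (B, C, H, W)
    and source-view features warped into the reference view, restricted to
    pixels marked valid in valid_mask (B, H, W)."""
    diff = torch.abs(ref_feat - warped_src_feat).mean(dim=1)  # per-pixel L1 over channels
    return (diff * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)
```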
Abstract: To address the low recognition accuracy and the difficulty of determining entity boundaries in named entity recognition for maritime cargo e-mails, a recognition method combining deep learning with rule matching is proposed. The deep learning component builds on the BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Field) model, adds character-level word features, and incorporates a multi-head attention mechanism to capture long-range dependencies in e-mail text; the rule-matching component formulates rules based on the characteristics of domain entities. According to the characteristics of cargo e-mails, the corpus is annotated and divided into five categories: cargo name, cargo weight, loading/discharging port, laycan, and commission. In multiple comparative experiments on a self-built corpus, the proposed method achieves an F1 score of 79.3% for entity recognition in maritime cargo e-mails.
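As an illustration of the rule-matching component, a few hand-written patterns for entities with regular surface forms, such as cargo weight, laycan, and commission, might look like the sketch below (the patterns and the sample e-mail line are invented for illustration and are not the paper's rule set):

```python
import re

# Illustrative rules for surface-regular entities in chartering e-mails
RULES = {
    "cargo_weight": re.compile(r"\b\d{1,3}(?:,\d{3})*\s*(?:mts?|metric tons?)\b", re.I),
    "laycan":       re.compile(r"\blaycan\s*:?\s*\d{1,2}[-/]\d{1,2}\s+[A-Za-z]{3,9}\b", re.I),
    "commission":   re.compile(r"\b\d+(?:\.\d+)?\s*%\s*(?:ttl\s*)?comm(?:ission)?\b", re.I),
}

def match_rules(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_span) pairs found by the hand-written rules."""
    hits = []
    for label, pattern in RULES.items():
        hits.extend((label, m.group(0)) for m in pattern.finditer(text))
    return hits

print(match_rules("Cargo 55,000 mts coal, laycan 10-15 June, 2.5% ttl comm."))
```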
Abstract: Compared with traditional feature-point-based image matching algorithms, deep-learning-based feature matching produces matches of larger scale and higher quality. To obtain wide, clear images of road cracks and to address the missing-match problem that arises when stitching weakly textured images, this work stitches road-surface images with the deep-learning LoFTR (detector-free local feature matching with Transformers) algorithm and, exploiting the characteristics of road-surface images, proposes a local stitching method to shorten runtime. Adjacent images are first segmented; dense feature matches are then generated with LoFTR; the homography matrix is computed from the matching results to perform the pixel transformation; a wavelet-transform-based image fusion algorithm produces the locally stitched image; and finally the image regions that were not fed into the matching network are appended to obtain the complete stitching result for the adjacent images. Experimental results show that, compared with stitching methods based on SIFT (scale-invariant feature transform), SURF (speeded up robust features), and ORB (oriented FAST and rotated BRIEF), the proposed method stitches road-surface images better and yields higher-confidence matches in the feature matching stage. For stitching two road-surface images, the local stitching method reduces the elapsed time by 27.53% compared with the unmodified approach. The proposed stitching scheme is efficient and accurate, and can provide overall defect information for road condition monitoring.
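For the homography-and-warp step of this pipeline, a minimal OpenCV sketch given the dense LoFTR matches could look as follows (the function name and the simple overwrite of the overlap are illustrative stand-ins; the paper fuses the overlap with a wavelet-based method instead):

```python
import cv2
import numpy as np

def stitch_pair(img_left: np.ndarray, img_right: np.ndarray,
                pts_left: np.ndarray, pts_right: np.ndarray) -> np.ndarray:
    """Warp img_right onto img_left's plane using a homography estimated from
    dense point correspondences (pts_* are Nx2 arrays of matched pixels)."""
    # Robustly estimate the homography from the dense matches
    H, inliers = cv2.findHomography(pts_right, pts_left, cv2.RANSAC, 3.0)
    h, w = img_left.shape[:2]
    # Warp the right image into a canvas wide enough for both images
    canvas = cv2.warpPerspective(img_right, H, (w * 2, h))
    # Paste the left image over the overlapping region (simple overwrite here;
    # a wavelet-based fusion of the overlap would replace this step)
    canvas[:h, :w] = img_left
    return canvas
```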