Journal articles: 3 results found.
1. Research on Improved MobileViT Image Tamper Localization Model
Authors: Jingtao Sun, Fengling Zhang, Huanqi Liu, Wenyan Hou. Computers, Materials & Continua (SCIE, EI), 2024, No. 8, pp. 3173-3192 (20 pages).
As image manipulation technology advances rapidly, the malicious use of image tampering has escalated alarmingly, posing a significant threat to social stability. In image tampering localization, accurately localizing tampered regions remains challenging when samples are limited and the regions vary in type and size; these issues limit a model's universality and generalization capability and degrade its performance. To tackle them, we propose FL-MobileViT, an improved MobileViT model devised for image tampering localization. The proposed model uses a dual-stream architecture that processes the RGB and noise domains independently and captures richer tampering traces through dual-stream integration. Meanwhile, the model incorporates the Focused Linear Attention mechanism within the lightweight MobileViT network. This substitution significantly reduces computational complexity and resolves the homogeneity problems associated with traditional Transformer attention, enhancing feature-extraction diversity and improving localization performance. To comprehensively fuse the outputs of both feature extractors, we introduce an ASPP architecture for multi-scale feature fusion, which enables more precise localization of tampered regions of various sizes. Furthermore, to strengthen generalization, we adopt a contrastive learning method and devise a joint optimization training strategy that leverages the fused features and captures the differences in feature distribution within tampered images. The strategy computes a contrastive loss at several stages of the feature extractor and uses it as an additional constraint alongside the cross-entropy loss. As a result, overfitting is effectively alleviated and the separation between tampered and untampered regions is enhanced. Experimental evaluations on five benchmark datasets (IMD-20, CASIA, NIST-16, Columbia and Coverage) validate the effectiveness of the proposed model: the calibrated FL-MobileViT consistently outperforms many existing general-purpose models in localization accuracy across diverse datasets, demonstrating superior adaptability.
Keywords: image tampering localization; focused linear attention mechanism; MobileViT; contrastive loss
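The joint optimization described in the abstract (a contrastive loss computed at several stages of the feature extractor and used as an extra constraint alongside cross-entropy) can be illustrated with a minimal PyTorch sketch. The function name, the prototype-based cosine term standing in for the contrastive loss, and the weighting factor `lambda_c` are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(logits, target_mask, stage_features, lambda_c=0.1, temperature=0.07):
    """Hypothetical joint objective: pixel-wise cross-entropy on the tamper mask plus a
    contrastive-style term that separates tampered- and untampered-pixel features at
    each feature-extractor stage. target_mask is a (B, H, W) LongTensor of {0, 1}."""
    ce = F.cross_entropy(logits, target_mask)        # logits: (B, 2, H, W)

    contrastive = 0.0
    for feat in stage_features:                      # each: (B, C, h, w)
        B, C, h, w = feat.shape
        # Downsample the ground-truth mask to this stage's resolution.
        mask = F.interpolate(target_mask.float().unsqueeze(1), size=(h, w)).squeeze(1)
        feat = F.normalize(feat.flatten(2), dim=1)   # (B, C, h*w), unit-norm channels
        mask = mask.flatten(1)                       # (B, h*w)
        # Per-image mean features (prototypes) of tampered and untampered pixels.
        tampered  = (feat * mask.unsqueeze(1)).sum(-1) / (mask.sum(-1, keepdim=True) + 1e-6)
        authentic = (feat * (1 - mask).unsqueeze(1)).sum(-1) / ((1 - mask).sum(-1, keepdim=True) + 1e-6)
        # Penalize similarity between the two prototypes (simple cosine surrogate).
        contrastive = contrastive + (F.cosine_similarity(tampered, authentic, dim=1) / temperature).mean()

    return ce + lambda_c * contrastive
```

A trainer would call this with the segmentation logits from the fused head and a list of intermediate feature maps from the two streams; driving down the similarity between tampered and untampered prototypes plays the role of the additional constraint mentioned in the abstract.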
2. A Lightweight End-to-End Acoustic Architecture for Streaming Speech Recognition (cited: 1)
Authors: Yang Shuying, Li Xin. 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence) (EI, CSCD, Peking University Core), 2023, No. 3, pp. 268-279 (12 pages).
In streaming recognition, chunk-based recognition breaks parallelism and consumes considerable resources, while recognition with context-restricted self-attention struggles to access the full context. This paper therefore proposes a lightweight end-to-end acoustic architecture, CFLASH-Transducer. To capture fine-grained local features, a lightweight FLASH (Fast Linear Attention with a Single Head) module is combined with convolutional neural network blocks. The convolutional block adopts an Inception V2 network to extract multi-scale local features of the speech signal, and a Coordinate Attention mechanism then captures positional information and the correlations among channels. In addition, depthwise separable convolutions are used for feature enhancement and smooth transitions between layers. To process audio in a streaming fashion, the RNN-T (Recurrent Neural Network Transducer) architecture is used for training and decoding. The global attention already computed for the current chunk is passed to subsequent chunks as a latent variable, chaining the information of all chunks while preserving the parallelism and correlations of training, and the computation does not grow as the sequence lengthens. Trained and tested on the open-source THCHS30 dataset, CFLASH-Transducer achieves a high recognition rate, and compared with offline recognition the accuracy loss of streaming recognition is within 1%.
Keywords: automatic speech recognition; streaming recognition; Fast Linear Attention with a Single Head (FLASH); convolutional neural network (CNN); Recurrent Neural Network Transducer (RNN-T)
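The streaming mechanism in the abstract (attention already computed for the current chunk is handed to later chunks as a latent variable, so cost does not grow with sequence length) can be read as a linear-attention recurrence over fixed-size summaries. The sketch below is one such reading in PyTorch; the class name, the ReLU feature map, and the single-head layout are illustrative assumptions, not the CFLASH-Transducer implementation.

```python
import torch
import torch.nn as nn

class ChunkLinearAttention(nn.Module):
    """Illustrative single-head linear attention that carries a running state across
    chunks, so earlier context is reused without recomputation."""
    def __init__(self, dim):
        super().__init__()
        self.to_qkv = nn.Linear(dim, 3 * dim)
        self.dim = dim

    def forward(self, chunk, state=None):
        # chunk: (B, T_chunk, dim); state: running (S, z) summaries of past chunks.
        q, k, v = self.to_qkv(chunk).chunk(3, dim=-1)
        q, k = torch.relu(q) + 1e-6, torch.relu(k) + 1e-6   # simple positive feature map
        if state is None:
            S = torch.zeros(chunk.size(0), self.dim, self.dim, device=chunk.device)
            z = torch.zeros(chunk.size(0), self.dim, device=chunk.device)
        else:
            S, z = state
        # Accumulate key-value outer products and key sums for this chunk.
        S = S + torch.einsum('btd,bte->bde', k, v)
        z = z + k.sum(dim=1)
        # Each position attends to everything seen so far via the fixed-size summaries.
        out = torch.einsum('btd,bde->bte', q, S) / (torch.einsum('btd,bd->bt', q, z).unsqueeze(-1) + 1e-6)
        return out, (S, z)

# Streaming usage: feed consecutive audio-feature chunks, reusing the returned state.
# attn, state = ChunkLinearAttention(dim=256), None
# for chunk in chunks: out, state = attn(chunk, state)
```

Only the fixed-size summaries `(S, z)` travel between chunks, which keeps per-chunk memory constant while letting every new chunk attend to all audio seen so far.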
3. A Deep Double-Channel Dense Network for Hyperspectral Image Classification (cited: 15)
Authors: Kexian Wang, Shunyi Zheng, Rui Li, Li Gui. Journal of Geodesy and Geoinformation Science, 2021, No. 4, pp. 46-62 (17 pages).
Hyperspectral image (HSI) classification based on deep learning has been an attractive research area in recent years. However, as data-driven algorithms, deep learning methods usually require substantial computational resources and high-quality labelled datasets, while high-performance computing and data annotation are expensive. In this paper, to reduce the dependence on massive computation and labelled samples, we propose a deep Double-Channel Dense network (DDCD) for hyperspectral image classification. Specifically, we design a 3D double-channel dense layer to capture the local and global features of the input, and we propose a Linear Attention Mechanism that approximates dot-product attention with much less memory and computational cost. The number of parameters and the computational cost are noticeably lower than those of comparable deep learning methods, meaning DDCD has a simpler architecture and higher efficiency. A series of quantitative experiments on six widely used hyperspectral datasets shows that the proposed DDCD achieves state-of-the-art performance, even when labelled samples are severely scarce.
Keywords: 3D double-channel dense layer; linear attention mechanism; deep learning (DL); hyperspectral classification
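The memory saving claimed for the Linear Attention Mechanism comes from reordering the attention product so that the N x N similarity matrix (N = number of spatial positions) is never materialized. The sketch below uses one common formulation with softmax applied separately to queries and keys; the exact kernel is an assumption, not necessarily the paper's definition.

```python
import torch

def linear_attention(q, k, v):
    """Approximate dot-product attention in O(N * d^2) instead of O(N^2 * d).
    q, k, v: (B, N, d) where N = H*W spatial positions of the hyperspectral cube."""
    q = torch.softmax(q, dim=-1)                      # normalize queries over the feature dim
    k = torch.softmax(k, dim=1)                       # normalize keys over the sequence dim
    context = torch.einsum('bnd,bne->bde', k, v)      # (B, d, d): aggregated key-value summary
    return torch.einsum('bnd,bde->bne', q, context)   # (B, N, d): per-position output

# Example: 145*145 positions with 64-dim features never form a 21025 x 21025
# attention matrix, only a 64 x 64 context.
out = linear_attention(torch.randn(2, 145 * 145, 64),
                       torch.randn(2, 145 * 145, 64),
                       torch.randn(2, 145 * 145, 64))
```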