As image manipulation technology advances rapidly,the malicious use of image tampering has alarmingly escalated,posing a significant threat to social stability.In the realm of image tampering localization,accurately l...As image manipulation technology advances rapidly,the malicious use of image tampering has alarmingly escalated,posing a significant threat to social stability.In the realm of image tampering localization,accurately localizing limited samples,multiple types,and various sizes of regions remains a multitude of challenges.These issues impede the model’s universality and generalization capability and detrimentally affect its performance.To tackle these issues,we propose FL-MobileViT-an improved MobileViT model devised for image tampering localization.Our proposed model utilizes a dual-stream architecture that independently processes the RGB and noise domain,and captures richer traces of tampering through dual-stream integration.Meanwhile,the model incorporating the Focused Linear Attention mechanism within the lightweight network(MobileViT).This substitution significantly diminishes computational complexity and resolves homogeneity problems associated with traditional Transformer attention mechanisms,enhancing feature extraction diversity and improving the model’s localization performance.To comprehensively fuse the generated results from both feature extractors,we introduce the ASPP architecture for multi-scale feature fusion.This facilitates a more precise localization of tampered regions of various sizes.Furthermore,to bolster the model’s generalization ability,we adopt a contrastive learning method and devise a joint optimization training strategy that leverages fused features and captures the disparities in feature distribution in tampered images.This strategy enables the learning of contrastive loss at various stages of the feature extractor and employs it as an additional constraint condition in conjunction with cross-entropy loss.As a result,overfitting issues are effectively alleviated,and the differentiation between tampered and untampered regions is enhanced.Experimental evaluations on five benchmark datasets(IMD-20,CASIA,NIST-16,Columbia and Coverage)validate the effectiveness of our proposed model.The meticulously calibrated FL-MobileViT model consistently outperforms numerous existing general models regarding localization accuracy across diverse datasets,demonstrating superior adaptability.展开更多
在流式识别方法中,分块识别破坏并行性且消耗资源较大,而限制自注意力机制的上下文识别很难获得所有信息.由此,文中提出轻量化端到端声学架构(CFLASH-Transducer).为了获取细腻的局部特征,采用轻量化的FLASH(Fast Linear Attention with...在流式识别方法中,分块识别破坏并行性且消耗资源较大,而限制自注意力机制的上下文识别很难获得所有信息.由此,文中提出轻量化端到端声学架构(CFLASH-Transducer).为了获取细腻的局部特征,采用轻量化的FLASH(Fast Linear Attention with a Single Head)与卷积神经网络块结合.卷积块中采用Inception V2网络,提取语音信号多尺度的局部特征.再通过Coordinate Attention机制捕获特征的位置信息和多通道之间的相互关联.此外,采用深度可分离卷积,用于特征增强和层间平滑过渡.为了使其可流式化处理音频,采用RNN-T(Recurrent Neural Network Transducer)架构进行训练与解码.将当前块已经计算的全局注意力作为隐变量,传入后续块中,串联各块信息,保留训练的并行性和相关性,并且不会随着序列的增长而消耗计算资源.在开源数据集THCHS30上进行训练与测试,CFLASH-Transducer取得较高的识别率.并且相比离线识别,流式识别精度损失不超过1%.展开更多
Hyperspectral Image(HSI)classification based on deep learning has been an attractive area in recent years.However,as a kind of data-driven algorithm,the deep learning method usually requires numerous computational res...Hyperspectral Image(HSI)classification based on deep learning has been an attractive area in recent years.However,as a kind of data-driven algorithm,the deep learning method usually requires numerous computational resources and high-quality labelled datasets,while the expenditures of high-performance computing and data annotation are expensive.In this paper,to reduce the dependence on massive calculation and labelled samples,we propose a deep Double-Channel dense network(DDCD)for Hyperspectral Image Classification.Specifically,we design a 3D Double-Channel dense layer to capture the local and global features of the input.And we propose a Linear Attention Mechanism that is approximate to dot-product attention with much less memory and computational costs.The number of parameters and the consumptions of calculation are observably less than contrapositive deep learning methods,which means DDCD owns simpler architecture and higher efficiency.A series of quantitative experiences on 6 widely used hyperspectral datasets show that the proposed DDCD obtains state-of-the-art performance,even though when the absence of labelled samples is severe.展开更多
基金This study was funded by the Science and Technology Project in Xi’an(No.22GXFW0123)this work was supported by the Special Fund Construction Project of Key Disciplines in Ordinary Colleges and Universities in Shaanxi Province,the authors would like to thank the anonymous reviewers for their helpful comments and suggestions.
文摘As image manipulation technology advances rapidly,the malicious use of image tampering has alarmingly escalated,posing a significant threat to social stability.In the realm of image tampering localization,accurately localizing limited samples,multiple types,and various sizes of regions remains a multitude of challenges.These issues impede the model’s universality and generalization capability and detrimentally affect its performance.To tackle these issues,we propose FL-MobileViT-an improved MobileViT model devised for image tampering localization.Our proposed model utilizes a dual-stream architecture that independently processes the RGB and noise domain,and captures richer traces of tampering through dual-stream integration.Meanwhile,the model incorporating the Focused Linear Attention mechanism within the lightweight network(MobileViT).This substitution significantly diminishes computational complexity and resolves homogeneity problems associated with traditional Transformer attention mechanisms,enhancing feature extraction diversity and improving the model’s localization performance.To comprehensively fuse the generated results from both feature extractors,we introduce the ASPP architecture for multi-scale feature fusion.This facilitates a more precise localization of tampered regions of various sizes.Furthermore,to bolster the model’s generalization ability,we adopt a contrastive learning method and devise a joint optimization training strategy that leverages fused features and captures the disparities in feature distribution in tampered images.This strategy enables the learning of contrastive loss at various stages of the feature extractor and employs it as an additional constraint condition in conjunction with cross-entropy loss.As a result,overfitting issues are effectively alleviated,and the differentiation between tampered and untampered regions is enhanced.Experimental evaluations on five benchmark datasets(IMD-20,CASIA,NIST-16,Columbia and Coverage)validate the effectiveness of our proposed model.The meticulously calibrated FL-MobileViT model consistently outperforms numerous existing general models regarding localization accuracy across diverse datasets,demonstrating superior adaptability.
文摘在流式识别方法中,分块识别破坏并行性且消耗资源较大,而限制自注意力机制的上下文识别很难获得所有信息.由此,文中提出轻量化端到端声学架构(CFLASH-Transducer).为了获取细腻的局部特征,采用轻量化的FLASH(Fast Linear Attention with a Single Head)与卷积神经网络块结合.卷积块中采用Inception V2网络,提取语音信号多尺度的局部特征.再通过Coordinate Attention机制捕获特征的位置信息和多通道之间的相互关联.此外,采用深度可分离卷积,用于特征增强和层间平滑过渡.为了使其可流式化处理音频,采用RNN-T(Recurrent Neural Network Transducer)架构进行训练与解码.将当前块已经计算的全局注意力作为隐变量,传入后续块中,串联各块信息,保留训练的并行性和相关性,并且不会随着序列的增长而消耗计算资源.在开源数据集THCHS30上进行训练与测试,CFLASH-Transducer取得较高的识别率.并且相比离线识别,流式识别精度损失不超过1%.
基金National Natural Science Foundations of China(41671452)China Postdoctoral Science Foundation Funded Project(2017M612510)。
文摘Hyperspectral Image(HSI)classification based on deep learning has been an attractive area in recent years.However,as a kind of data-driven algorithm,the deep learning method usually requires numerous computational resources and high-quality labelled datasets,while the expenditures of high-performance computing and data annotation are expensive.In this paper,to reduce the dependence on massive calculation and labelled samples,we propose a deep Double-Channel dense network(DDCD)for Hyperspectral Image Classification.Specifically,we design a 3D Double-Channel dense layer to capture the local and global features of the input.And we propose a Linear Attention Mechanism that is approximate to dot-product attention with much less memory and computational costs.The number of parameters and the consumptions of calculation are observably less than contrapositive deep learning methods,which means DDCD owns simpler architecture and higher efficiency.A series of quantitative experiences on 6 widely used hyperspectral datasets show that the proposed DDCD obtains state-of-the-art performance,even though when the absence of labelled samples is severe.