Abstract
Most recent learning-based video compression algorithms are built on convolutional neural networks (CNN) and adopt a motion-compensation plus residual-coding architecture. Because typical CNNs can exploit only local correlations, and because the prediction residual itself is sparse, such designs struggle to reach optimal compression performance. To address these problems, this paper proposes a Transformer-based deep conditional video compression algorithm that achieves better compression performance. Based on the motion information between adjacent frames, the proposed algorithm uses deformable convolution to obtain the predicted-frame feature. The predicted-frame feature then serves as conditional information for conditionally encoding the original input-frame feature, which avoids directly encoding the sparse residual signal. Further exploiting the non-local correlation between features, the algorithm employs a Transformer-based autoencoder architecture to implement both motion coding and conditional coding, further improving compression performance. Experiments show that the proposed algorithm surpasses current mainstream learning-based video compression algorithms on both the HEVC and UVG datasets.
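The distinction between residual coding and conditional coding described in the abstract can be illustrated with a toy numpy sketch. This is not the paper's network: the linear map `A`, the feature vectors, and all function names here are hypothetical stand-ins, chosen only to show that plain residual coding (subtracting the prediction) is the special case `A = I` of a more general conditional transform that consumes the predicted-frame feature as side information.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                # toy feature dimension (hypothetical)
A = rng.normal(size=(d, d))          # hypothetical learned conditioning transform

def conditional_encode(x, x_pred, A):
    # Residual coding is the special case A = I (plain subtraction);
    # a learned A lets the codec use the condition more flexibly than
    # a fixed hand-crafted difference signal.
    return x - A @ x_pred

def conditional_decode(y, x_pred, A):
    # The decoder has the same condition x_pred, so encoding is invertible.
    return y + A @ x_pred

x = rng.normal(size=d)               # current-frame feature (toy stand-in)
x_pred = rng.normal(size=d)          # predicted-frame feature from motion compensation

y = conditional_encode(x, x_pred, A)
x_hat = conditional_decode(y, x_pred, A)
assert np.allclose(x, x_hat)         # lossless round trip in this toy setting
```

In the actual algorithm the encoder and decoder are Transformer-based networks and the latent is quantized and entropy-coded; the sketch only captures why conditioning on the predicted-frame feature is strictly more general than encoding a sparse residual.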
Authors
LU Guo; ZHONG Tianxiong; GENG Jing (School of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081, China)
Source
Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), 2024, No. 2, pp. 442-448 (7 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (62102024).