Visual Pattern Representation and Coding Based on Deep Generative Models


Abstract: The performance of early intelligent coding methods was limited by hand-designed schemes, while current neural network-based coding methods lack interpretability, which hinders subsequent analysis and interaction for human and machine vision. Inspired by generative models, generative coding methods compress and synthesize images and videos by constructing generative models, obtaining interpretable, compact visual representations and generating high-visual-quality content that conforms to the prior distribution of images. Among them, conceptual image coding and conceptual video coding leverage the strong sample generation capability of generative models together with compact hierarchical visual representation models to achieve better coding performance for images and videos. Cross-modal semantic coding performs cross-modal transformation and coding between the image and text domains; it remains interpretable while achieving ultra-high compression ratios of more than a thousand times and satisfactory reconstruction results.

Authors: GUO Yilin; CHANG Jianhui; HUANG Cheng; MA Siwei

Affiliations: Peking University Shenzhen Graduate School, Shenzhen 518055, China; Peking University, Beijing 100871, China; ZTE Corporation, Shenzhen 518057, China; Pengcheng Laboratory, Shenzhen 518057, China

Source: ZTE Technology Journal (《中兴通讯技术》; Peking University Core Journal), 2024, Issue S01, pp. 60-66 (7 pages)

Funding: National Natural Science Foundation of China (62025101); Major Key Project of Pengcheng Laboratory (PCL2024A02).

Keywords: intelligent video encoding; generative encoding; cross-modal compression; conceptual coding

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

