Journal Article

Research progress on AIGC visual content generation and traceability (Cited by: 1)

Review on the progress of AIGC visual content generation and traceability
Abstract  With the rapid development of digital media and the creative industries, artificial intelligence generated content (AIGC) technology has attracted growing attention for its innovative applications in visual content generation. This paper reviews research progress on AIGC visual content generation and traceability in depth. First, it surveys image generation techniques, starting from traditional methods based on generative adversarial networks and systematically analyzing recent advances in generative adversarial networks, autoregressive models, and diffusion probability models. It then examines controllable image generation, highlighting the current state of techniques that give creators precise control through additional information such as layouts and line drawings, as well as methods based on visual references. As image generation technology advances and spreads, the security problems of generated images have gradually surfaced, and pre-publication review and filtering can no longer meet practical needs, so traceability of generated content is urgently required for regulation. The paper therefore reviews traceability techniques for generated images, focusing on the role of watermarking in ensuring the reliability and security of generated content. According to the stage of the pipeline at which the watermark is embedded, existing watermark-related traceability methods are classified into watermark-free embedding, watermark pre-embedding, watermark post-embedding, and joint-generation approaches, and each category is analyzed in detail; the paper then surveys watermark attacks against generated images, and finally summarizes generated-image traceability technology and offers an outlook. Given the quality and security challenges of visual content generation, this review aims to offer researchers a systematic perspective on visual content generation and traceability, to promote a secure and trustworthy digital media creation environment, and to guide the future development of related technologies.

In the contemporary digital era, characterized by rapid technological advancement, multimedia content creation, particularly visual content generation, has become an integral part of modern societal development. The exponential growth of digital media and the creative industry has drawn attention to artificial intelligence generated content (AIGC) technology. The groundbreaking applications of AIGC in visual content generation have not only equipped multimedia creators with novel tools and capabilities but also delivered substantial benefits across diverse domains, from cinema and gaming to the immersive landscapes of virtual reality. This review comprehensively introduces the profound advancements within AIGC technology, with particular emphasis on visual content generation and its critical facet of traceability.

Our discussion first traces the evolutionary path of image generation technology, from its inception with generative adversarial networks (GANs) to the latest advancements in Transformer-based auto-regressive models and diffusion probability models. This progression reveals a remarkable leap in the quality and capability of image generation, underscoring a field that has transitioned from its nascent stages into an era of explosive growth. We first delve into the development of GANs, covering their evolution from text-conditioned methods to sophisticated style-control techniques and large-scale models. This class of technology pioneered text-to-image generation, and owing to their strong scalability, GANs can further improve performance by expanding network parameters and dataset size. We then explore the emergence of Transformer-based auto-regressive models, such as DALL·E and CogView, which have heralded a new epoch in image generation. The basic strategy of auto-regressive models is first to use a Transformer to predict the feature sequence of an image from other conditional feature sequences, such as text or sketches, and then to use a specially trained decoding network to decode these feature sequences into a complete image; backed by large-scale parameters, they can generate realistic images. In addition, we discuss the burgeoning interest in diffusion probability models, renowned for their stable training and high-quality outputs. Diffusion models first adopt an iterative, random process to simulate the gradual transformation of observed data into a known noise distribution, and then reconstruct the original data in the opposite direction, starting from that noise distribution. This stochastic formulation yields a more stable training process while demonstrating impressive generation quality and diversity.

As AIGC technology continues to advance, it encounters challenges such as improving content quality and the need for precise control to match specific requirements. In this context, the review thoroughly explores controllable image generation technology, a pivotal research area that strives to furnish meticulous control over generated content. This is achieved by integrating supplementary elements such as intricate layouts, detailed sketches, and precise visual references, empowering creators to preserve their artistic autonomy while upholding exacting standards of quality. One facet that has garnered considerable academic attention is the use of visual references to enable diverse styles and personalized outcomes by incorporating user-provided visual elements. The review underscores the profound potential of these methodologies, illustrating their transformative role across domains such as digital art and interactive media.

The development of these technologies opens new horizons in digital creativity, but it also presents profound challenges, particularly concerning image authenticity and the potential for malicious misuse, as exemplified by deepfakes and the proliferation of fake news. These challenges extend far beyond technical intricacies: they encompass substantial risks to individual privacy and security, along with broader societal implications for public trust and social stability. In response, watermark-related image traceability technology has emerged as an indispensable solution, harnessing watermarking techniques to authenticate and verify AI-generated images and safeguard their integrity. This review categorizes these techniques into four types: watermark-free embedding, watermark pre-embedding, watermark post-embedding, and joint generation. First, watermark-free embedding methods treat the generation traces left by the model as fingerprints; this inherent fingerprint information is used to attribute generated images to their source model and thereby achieve traceability. Second, watermark pre-embedding methods embed the watermark into input or training data, such as noise or images, and then train the generation model on the watermarked data, thereby introducing traceability information into the generated images. Third, watermark post-embedding methods divide the pipeline into two stages, image generation and watermark embedding, performing embedding after generation. Finally, joint generation methods aim to embed watermark information adaptively during the image generation process itself, minimizing damage to generation while fusing the watermark with image features, ultimately producing images that carry watermarks. Each of these approaches plays a pivotal role in verifying traceability across diverse scenarios, offering a robust defense against potential misuse of AI-generated imagery.

In conclusion, while AIGC technology offers promising new opportunities in visual content creation, it simultaneously raises significant challenges regarding the security and integrity of generated content. This comprehensive review covers the breadth of AIGC technology, starting from an overview of existing image generation technologies such as GANs, auto-regressive models, and diffusion probability models; it then categorizes and analyzes controllable image generation from the perspectives of additional conditions and visual examples. The review further focuses on watermark-related image traceability technology, discusses various watermark embedding techniques and the current state of watermark attacks on generated images, and provides an extensive overview and future outlook of generated-image traceability technology. The aim is to offer researchers a detailed, systematic, and comprehensive perspective on the advancements in AIGC visual content generation and traceability, deepening the understanding of current research trends, challenges, and future directions in this rapidly evolving field.
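The diffusion process described above (gradually transforming data into a known noise distribution, then reversing the process) can be sketched numerically. This is a minimal illustration of the forward noising step only, using a hypothetical linear beta schedule and the standard closed-form sample of x_t given x_0; the schedule length and values are illustrative assumptions, and the learned reverse (denoising) network is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise schedule: T steps of linearly increasing variances.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a normalized image
x_mid = forward_diffuse(x0, 100)          # partially noised
x_end = forward_diffuse(x0, T - 1)        # approximately pure Gaussian noise

# By t = T-1 nearly all signal is destroyed: alpha_bar is close to zero,
# so x_end is approximately standard normal regardless of x0. Training a
# reverse model then amounts to learning to undo these steps one at a time.
```

The reverse direction, not shown, would iterate from pure noise back to t = 0, with a trained network predicting the noise to remove at each step.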
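The auto-regressive strategy summarized above (a Transformer predicts the image's token sequence conditioned on text tokens, then a separately trained decoder maps tokens back to pixels) can be sketched as follows. Everything here is a toy stand-in: the "transformer" returns random logits rather than learned ones, and the "decoder" is a trivial reshape instead of a trained VQ-style decoder; only the sequential sampling structure reflects the real pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

VOCAB = 32          # hypothetical codebook size of the image tokenizer
TEXT_LEN, IMG_LEN = 4, 16

def toy_transformer_logits(context):
    """Stand-in for a Transformer: a real model would condition these
    logits on the full text-plus-image token context."""
    return rng.standard_normal(VOCAB)

def sample_image_tokens(text_tokens):
    tokens = list(text_tokens)
    for _ in range(IMG_LEN):               # autoregressive: one token at a time
        logits = toy_transformer_logits(tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB, p=probs)))
    return tokens[TEXT_LEN:]               # keep only the image tokens

def toy_decoder(img_tokens):
    """Stand-in for the specially trained decoding network that maps the
    discrete token sequence back to a complete image."""
    return np.array(img_tokens, dtype=np.float32).reshape(4, 4) / VOCAB

img = toy_decoder(sample_image_tokens([1, 2, 3, 4]))
print(img.shape)  # (4, 4)
```

The two-stage split (token prediction, then decoding) is what lets models such as DALL·E and CogView treat image synthesis as sequence modeling.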
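The watermark post-embedding category described above separates the pipeline into two stages: the image is generated first, and the watermark is embedded afterwards. A minimal sketch of that two-stage structure, using least-significant-bit (LSB) embedding purely for illustration; real traceability watermarks use far more robust schemes, and the 64-bit "model ID" payload here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1 (image generation): stand-in for any generator's 8-bit RGB output.
generated = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)

def embed_lsb(img, bits):
    """Stage 2 (watermark embedding): write one payload bit into the
    least significant bit of each of the first len(bits) channel values."""
    flat = img.reshape(-1).copy()
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(img.shape)

def extract_lsb(img, n):
    """Recover the first n payload bits for traceability verification."""
    return img.reshape(-1)[:n] & 1

watermark = rng.integers(0, 2, size=64, dtype=np.uint8)  # e.g. a model ID
stamped = embed_lsb(generated, watermark)

# The payload round-trips, and each pixel value changes by at most 1,
# so the stamped image is visually indistinguishable from the original.
assert np.array_equal(extract_lsb(stamped, 64), watermark)
assert int(np.abs(stamped.astype(int) - generated.astype(int)).max()) <= 1
```

Because embedding happens after generation, this category works with any generator unchanged, which is its main practical appeal over pre-embedding or joint-generation approaches.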
Authors: Liu Anan; Su Yuting; Wang Lanjun; Li Bin; Qian Zhenxing; Zhang Weiming; Zhou Linna; Zhang Xinpeng; Zhang Yongdong; Huang Jiwu; Yu Nenghai (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China; School of Computer Science, Fudan University, Shanghai 200438, China; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China; School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), CSCD, Peking University Core Journal, 2024, No. 6, pp. 1535-1554 (20 pages)
Funding: National Natural Science Foundation of China (U21B2024, U20B2047, U2336206, U20B2051, U23B2022, 62371330, 62202329, 62172053)
Keywords: artificial intelligence generated content (AIGC); visual content generation; controllable image generation; security of generated content; traceability of generated images