Research on Prompt Engineering for Large-Model Art Image Generation (面向大模型艺术图像生成的提示词工程研究)

Abstract: With the rapid advancement of artificial intelligence in the field of art, prompt-driven art image generation has become highly popular. However, the rules and methods for generating artistic images from prompts remain underexplored. This study quantitatively evaluates images generated by the Midjourney model through CLIP model calculations and expert assessments, and combines this with participant observation through netnography, to reveal the rules and methods of prompt-based art image generation. The results show that the aesthetic quality of images generated by Midjourney improved significantly across versions (from V2 to V5), which highlights the need for artists and creators to keep learning in order to adapt to new versions of AI models. To this end, an optimized prompt formula is proposed that can quickly and efficiently generate high-aesthetic-quality images in a variety of styles. The AI model shows different capabilities across themes: Midjourney is comparatively good at oil painting, watercolor and ink-wash painting, and anime-style characters, and performs equally well on figurative and abstract themes, but is relatively weak in sketch and colored-pencil styles. Creators should build on the styles in which the model is strong. In addition, well-chosen prompt combinations for a specific version can greatly improve the quality of the generated images, so careful prompt design is crucial, and a newer version is not necessarily better than an earlier one. Creators therefore need to explore and accumulate effective prompts matched to each version. The study not only reveals the rules and methods of prompt-based art image generation but also offers theoretical and practical guidance for creators working in AI art.

Author: WANG Changsheng (王常圣) (Department of Performance, Film, and Animation, Sejong University, Seoul 05006, Republic of Korea)

Affiliation: Department of Performance, Film, and Animation, Sejong University

Source: Journal of Graphics (《图学学报》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 6, pp. 1243-1255 (13 pages)

Keywords: AI painting; prompt engineering; CLIP model; AI art; text-to-image generation; netnography