Journal Articles
6 articles found
1. Feature-Grounded Single-Stage Text-to-Image Generation
Authors: Yuan Zhou, Peng Wang, Lei Xiang, Haofeng Zhang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2024, No. 2, pp. 469-480.
Recently, Generative Adversarial Networks (GANs) have become the mainstream text-to-image (T2I) framework. However, standard normal input noise cannot provide sufficient information to synthesize an image that approaches the ground-truth image distribution. Moreover, the multistage generation strategy results in complex T2I applications. Therefore, this study proposes a novel feature-grounded single-stage T2I model, which takes the "real" distribution learned from training images as one input and introduces a worst-case-optimized similarity measure into the loss function to enhance the model's generation capacity. Experimental results on two benchmark datasets demonstrate the competitive performance of the proposed model in terms of Fréchet inception distance and inception score compared with classical and state-of-the-art models, showing improved similarity among the generated image, the text, and the ground truth. (An illustrative sketch of a worst-case similarity term follows this entry.)
Keywords: text-to-image (T2I); feature-grounded single-stage generation; Generative Adversarial Network (GAN)
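The abstract does not spell out its similarity measure, so the following is only a hypothetical sketch of what a worst-case-optimized similarity term might look like: the batch is scored by its single worst-aligned image-text pair, so the optimizer must lift even the weakest match. The function name and feature tensors are illustrative, not taken from the paper.

```python
# Hypothetical sketch of a worst-case-optimized similarity term;
# the actual loss in the paper may differ substantially.
import torch
import torch.nn.functional as F

def worst_case_similarity_loss(img_feats: torch.Tensor,
                               txt_feats: torch.Tensor) -> torch.Tensor:
    """Penalize the single worst-aligned (image, text) pair in a batch.

    img_feats, txt_feats: (batch, dim) embeddings of generated images
    and their conditioning texts.
    """
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    sims = (img * txt).sum(dim=-1)   # cosine similarity per pair
    worst = sims.min()               # worst-case pair in the batch
    return 1.0 - worst               # drive even the worst pair upward
```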
2. CRD-CGAN: category-consistent and relativistic constraints for diverse text-to-image generation
Authors: Tao Hu, Chengjiang Long, Chunxia Xiao. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, No. 1, pp. 61-75.
Generating photo-realistic images from a text description is a challenging problem in computer vision. Previous works have shown promising performance in generating synthetic images conditioned on text with Generative Adversarial Networks (GANs). In this paper, we focus on category-consistent and relativistic diverse constraints to optimize the diversity of synthetic images. Based on these constraints, a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) is proposed to synthesize K photo-realistic images simultaneously. We use an attention loss and a diversity loss to improve the sensitivity of the GAN to word attention and noise. We then employ a relativistic conditional loss to estimate the probability that a synthetic image is relatively real or fake, which improves on the basic conditional loss (a minimal sketch of such a relativistic loss follows this entry). Finally, we introduce a category-consistent loss to alleviate the over-category issue among the K synthetic images. We evaluate our approach on the Caltech-UCSD Birds-200-2011, Oxford 102 Flower, and MS COCO 2014 datasets, and extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods in terms of the photo-realism and diversity of the generated images.
Keywords: text-to-image; diverse conditional GAN; relativistic; category-consistent
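The relativistic conditional loss mentioned above builds on the general relativistic-GAN idea of scoring real samples against fake ones rather than in isolation. Below is a minimal unconditional sketch assuming logit-valued discriminator outputs; the paper's conditional variant additionally conditions on the text, which is omitted here.

```python
# Minimal sketch of a relativistic (average) GAN loss of the kind the
# abstract refers to; text conditioning is omitted for brevity, and the
# exact formulation in CRD-CGAN may differ.
import torch
import torch.nn.functional as F

def relativistic_d_loss(real_logits, fake_logits):
    """Discriminator: reals should look *more* real than the average fake."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    ones, zeros = torch.ones_like(real_rel), torch.zeros_like(fake_rel)
    return (F.binary_cross_entropy_with_logits(real_rel, ones) +
            F.binary_cross_entropy_with_logits(fake_rel, zeros))

def relativistic_g_loss(real_logits, fake_logits):
    """Generator: fakes should look more real than the average real."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    ones, zeros = torch.ones_like(fake_rel), torch.zeros_like(real_rel)
    return (F.binary_cross_entropy_with_logits(fake_rel, ones) +
            F.binary_cross_entropy_with_logits(real_rel, zeros))
```

Because each logit is judged relative to the mean logit of the opposite class, the generator receives gradient signal from real samples as well, which is the main appeal of the relativistic formulation.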
3. A Comprehensive Pipeline for Complex Text-to-Image Synthesis
Authors: Fei Fang, Fei Luo, Hong-Pan Zhang, Hua-Jian Zhou, Alix L. H. Chow, Chun-Xia Xiao. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, No. 3, pp. 522-537.
Synthesizing a complex scene image with multiple objects and a background according to a text description is a challenging problem. It requires solving several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and object-status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground-object and background-scene retrieval, image synthesis using constrained MCMC, and post-processing. First, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Second, we retrieve the required foreground objects from a foreground-object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background-image dataset extracted from the Internet. Third, to ensure the rationality of the foreground objects' positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem (a toy sketch of this step follows this entry). Finally, to make the image look natural and harmonious, we use Poisson-based and relighting-based methods to blend the foreground objects and the background scene image in the post-processing step. Synthesized results and comparisons on the Microsoft COCO dataset show that our method outperforms several state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of generated scene images.
Keywords: image synthesis; scene generation; text-to-image conversion; Markov Chain Monte Carlo (MCMC)
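The constrained-MCMC layout step can be illustrated with a toy Metropolis-style optimizer over object positions; the cost function and proposal moves below are generic stand-ins, not the paper's actual terms (which also handle object sizes and other constraints).

```python
# Toy sketch of the constrained-layout idea: Metropolis-Hastings over
# object positions, minimizing a user-supplied cost (lower = better).
import math
import random

def mcmc_layout(objects, cost, steps=5000, temp=1.0, sigma=10.0):
    """objects: list of [x, y] positions; cost: layout -> float."""
    layout = [p[:] for p in objects]
    best, best_cost = [p[:] for p in layout], cost(layout)
    cur_cost = best_cost
    for _ in range(steps):
        proposal = [p[:] for p in layout]
        i = random.randrange(len(proposal))       # perturb one object
        proposal[i][0] += random.gauss(0.0, sigma)
        proposal[i][1] += random.gauss(0.0, sigma)
        new_cost = cost(proposal)
        # Accept if better, or with Boltzmann probability if worse.
        if (new_cost < cur_cost or
                random.random() < math.exp((cur_cost - new_cost) / temp)):
            layout, cur_cost = proposal, new_cost
            if cur_cost < best_cost:
                best, best_cost = [p[:] for p in layout], cur_cost
    return best
```

A cost function here would typically penalize bounding-box overlap and implausible relative placement; lowering temp over the run would turn this sampler into simulated annealing.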
4. Exploring the Latest Applications of OpenAI and ChatGPT: An In-Depth Survey
Authors: Hong Zhang, Haijian Shao. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 3, pp. 2061-2102.
OpenAI and ChatGPT, as state-of-the-art language models driven by cutting-edge artificial intelligence technology, have gained widespread adoption across diverse industries. In the realm of computer vision, these models have been employed for intricate tasks including object recognition, image generation, and image processing, leveraging their advanced capabilities to fuel transformative breakthroughs. Within the gaming industry, they have found utility in crafting virtual characters and generating plots and dialogues, thereby enabling immersive and interactive player experiences. Furthermore, these models have been harnessed for medical diagnosis, providing invaluable insights and support to healthcare professionals in disease detection. The principal objective of this paper is to offer a comprehensive overview of OpenAI, OpenAI Gym, ChatGPT, DALL·E, Stable Diffusion, the pre-trained CLIP model, and other pertinent models across various domains, encompassing CLIP text-to-image, education, medical imaging, computer vision, social influence, natural language processing, software development, coding assistance, and chatbots, among others. Particular emphasis is placed on comparative analysis of popular text-to-image and text-to-video models under diverse stimuli, shedding light on the current research landscape, emerging trends, and existing challenges within the domains of OpenAI and ChatGPT. Through a rigorous literature review, this paper aims to deliver a professional and insightful overview of the advancements, potential, and limitations of these pioneering language models.
Keywords: OpenAI; ChatGPT; DALL·E; Stable Diffusion; OpenAI Gym; text-to-image; text-to-video
5. Novel Framework for Generating Criminals Images Based on Textual Data Using Identity GANs
Authors: Mohamed Fathallah, Mohamed Sakr, Sherif Eletriby. Computers, Materials & Continua (SCIE, EI), 2023, No. 7, pp. 383-396.
Text-to-image generation is a vital task in different fields, such as combating crime and terrorism and quickly arresting lawbreakers. For several years, due to a lack of deep learning and machine learning resources, police officials required artists to draw the face of a criminal; such traditional methods of identifying criminals are inefficient and time-consuming. This paper presents a new hybrid model for converting text into the nearest images and then ranking the produced images according to the available data. The framework contains two main steps: generation of the image using an Identity Generative Adversarial Network (IGAN), and ranking of the images according to the available data using multi-criteria decision-making based on neutrosophic theory. The IGAN has the same architecture as classical Generative Adversarial Networks (GANs), but with modifications such as adding a non-linear identity block, smoothing the standard GAN loss function with a modified loss function and label smoothing, and using mini-batch training. The model achieves efficient results in Fréchet Inception Distance (FID) and Inception Score (IS) when compared with other GAN architectures for generating images from text: the IGAN achieves an FID of 42.16 and an IS of 14.96. When ranking the generated images with the neutrosophic approach, the framework also performs well in cases of missing information and missing data. (A sketch of the label-smoothing trick follows this entry.)
Keywords: GAN; deep learning; text-to-image; identity GAN
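Of the modifications listed above, label smoothing is a standard GAN stabilization trick: the real-label target is softened so the discriminator does not become overconfident. A minimal sketch follows, assuming logit-valued discriminator outputs; the smoothing value of 0.9 is a common default, not necessarily the paper's setting.

```python
# Sketch of one-sided label smoothing for a GAN discriminator: real
# labels are softened (1.0 -> 0.9) while fake labels stay at 0.0.
import torch
import torch.nn.functional as F

def d_loss_with_label_smoothing(real_logits, fake_logits, smooth=0.9):
    real_targets = torch.full_like(real_logits, smooth)  # softened reals
    fake_targets = torch.zeros_like(fake_logits)         # fakes stay at 0.0
    return (F.binary_cross_entropy_with_logits(real_logits, real_targets) +
            F.binary_cross_entropy_with_logits(fake_logits, fake_targets))
```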
6. An inter-semiotic analysis of ideational meaning in text-prompted AI-generated images
Author: Arash Ghazvineh. Language and Semiotic Studies, 2024, No. 1, pp. 17-42.
This paper explores the inter-semiotic analysis of ideational meaning in images generated by the text-to-image AI tool Bing Image Creator. It adopts Kress and Van Leeuwen's Grammar of Visual Design as its theoretical framework, as the framework's original grounding in systemic functional grammar (SFG) provides a solid theoretical basis for analyses that combine textual and visual components. The integration of an AI generative model within the analytical framework enables a systematic connection between language and visual representations. This incorporation offers the potential to generate well-regulated pictorial representations that are systematically grounded in controlled textual prompts, introducing a novel avenue for re-examining inter-semiotic processes by leveraging AI technology. The paper argues that visual representations possess unique structural devices that surpass the limitations of verbal or written communication, as they readily accommodate larger amounts of information in contrast to the linear nature of alphabetic writing. Moreover, the paper extends its contribution by critically evaluating specific aspects of the Grammar of Visual Design.
Keywords: inter-semiotic analysis; AI text-to-image generator; systemic functional linguistics; grammar of visual design