Enhancing aesthetic image generation with reinforcement learning guided prompt optimization in stable diffusion

Research output: Contribution to journal, Article, peer-review

Abstract

Generative models such as Stable Diffusion excel at producing compelling images but remain highly dependent on carefully crafted prompts. Refining prompts for specific objectives, especially aesthetic quality, is time-consuming and yields inconsistent results. We propose a novel approach that leverages large language models (LLMs) to enhance the prompt refinement process for Stable Diffusion. First, we propose a model that predicts aesthetic image quality by examining aesthetic elements in the spatial, channel, and color domains. Reinforcement learning is then employed to refine the prompt, starting from a rudimentary version and iteratively improving it with the assistance of an LLM. This iterative process is guided by a policy network that updates prompts based on interactions with the generated images, using a reward function that measures both aesthetic improvement and adherence to the prompt. Our experimental results demonstrate that this method significantly boosts the visual quality of images generated from the refined prompts. Beyond image synthesis, this approach provides a broader framework for improving prompts across diverse applications with the support of LLMs.
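The abstract does not specify the policy parameterization, the reward weights, or the LLM interface. The Python sketch below is therefore a hypothetical toy, not the authors' implementation. It uses three stand-ins:

- A fixed list of candidate modifiers (`MODIFIERS`) replaces LLM-proposed rewrites.
- A hand-set lookup table (`TOY_AESTHETIC_WEIGHTS`) replaces the learned aesthetic predictor.
- Word-set Jaccard overlap serves as an assumed proxy for prompt adherence.

Following the abstract, the reward combines aesthetic improvement with adherence. Here it takes the form `gain - lam * (1 - adherence)`, where the weight `lam` is an assumption. The policy is a set of independent Bernoulli include/skip decisions trained by policy-gradient ascent. Because the toy action space has only 2^5 subsets, the gradient is computed exactly by enumeration rather than estimated from sampled rollouts, as REINFORCE would do.

```python
import itertools
import math
import re

# Hypothetical candidate edits; the paper has an LLM propose rewrites instead.
MODIFIERS = ["soft lighting", "highly detailed", "golden hour",
             "blurry", "cinematic composition"]

# Toy stand-in for a learned aesthetic predictor (NOT the paper's model).
TOY_AESTHETIC_WEIGHTS = {"soft lighting": 0.30, "highly detailed": 0.20,
                         "golden hour": 0.25, "blurry": -0.40,
                         "cinematic composition": 0.15}


def aesthetic_score(prompt: str) -> float:
    """Sum the toy weights of every modifier that appears in the prompt."""
    return sum(w for term, w in TOY_AESTHETIC_WEIGHTS.items() if term in prompt)


def adherence(prompt: str, base: str) -> float:
    """Jaccard overlap of word sets: an assumed, crude adherence proxy."""
    a = set(re.findall(r"\w+", prompt.lower()))
    b = set(re.findall(r"\w+", base.lower()))
    return len(a & b) / len(a | b) if a | b else 1.0


def build_prompt(base: str, mask) -> str:
    """Append every modifier whose mask entry is truthy to the base prompt."""
    return ", ".join([base] + [m for m, on in zip(MODIFIERS, mask) if on])


def reward(prompt: str, base: str, lam: float) -> float:
    """Aesthetic gain over the base prompt minus a penalty for drifting from it."""
    gain = aesthetic_score(prompt) - aesthetic_score(base)
    return gain - lam * (1.0 - adherence(prompt, base))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def optimize(base: str, lam: float = 0.2, lr: float = 1.0, iters: int = 200):
    """Exact policy-gradient ascent for independent Bernoulli include/skip actions."""
    logits = [0.0] * len(MODIFIERS)
    for _ in range(iters):
        probs = [sigmoid(t) for t in logits]
        grad = [0.0] * len(MODIFIERS)
        # Enumerate all 2^n edit subsets (feasible only for a tiny n).
        for mask in itertools.product((0, 1), repeat=len(MODIFIERS)):
            p_mask = math.prod(p if on else 1.0 - p for p, on in zip(probs, mask))
            r = reward(build_prompt(base, mask), base, lam)
            for i, on in enumerate(mask):
                # Score function of Bernoulli(sigmoid(theta)) w.r.t. theta is (x - p).
                grad[i] += p_mask * r * (on - probs[i])
        logits = [t + lr * g for t, g in zip(logits, grad)]
    # Final prompt: include each modifier the learned policy favors (p > 0.5).
    best_mask = [sigmoid(t) > 0.5 for t in logits]
    return build_prompt(base, best_mask), logits
```

With the default settings, `optimize("a cat on a sofa")` keeps the four positively weighted modifiers and drops `"blurry"`. Each positive modifier raises the toy aesthetic score by more than the largest adherence penalty it can cause (0.2 × 1/3 ≈ 0.067), so its logit increases at every step. A realistic prompt space is far too large to enumerate. That is where sampled rollouts and an LLM proposal step would come in.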

Original language: English
Article number: 104641
Journal: Journal of Visual Communication and Image Representation
Volume: 114
State: Published, Jan 2026

Keywords

  • Artificial Intelligence Generated Content (AIGC)
  • Image aesthetic assessment
  • Large language models
  • Reinforcement learning
  • Stable diffusion
