AI for Images & Design

Stable Diffusion — The Complete Guide

Stable Diffusion is an open-source AI image generation model you can run on your own computer — for free, with no per-image costs, no content restrictions and no data sent to any company. The trade-off is setup complexity. Once running, it is the most powerful and flexible AI image tool available, with full control over models, settings, and output.

Open Source AI · Free — run locally · Full control · GPU required · Last reviewed: April 2026

What is Stable Diffusion?

Stable Diffusion is an open-source image generation model released by Stability AI. Unlike Midjourney, DALL-E or Adobe Firefly — which run on company servers and charge per image — Stable Diffusion runs on your own computer. You download the model, run the software, and generate images locally. There is no per-image cost once you have hardware capable of running it. There is no company seeing your images. There are no content filters beyond what you configure yourself.

The current model family (Stable Diffusion 3.5 and the FLUX models from Black Forest Labs, a company founded by members of the original Stable Diffusion research team) produces image quality that competes with commercial tools. FLUX Pro in particular is ranked by independent tests as the most photorealistic AI image model available in 2026 — its outputs pass as real photographs in blind tests with professional photographers.

Stable Diffusion has spawned an enormous ecosystem. Thousands of fine-tuned models are available on Civitai and Hugging Face — trained for specific styles, characters, aesthetic directions. Interfaces like AUTOMATIC1111 and ComfyUI provide full control over every generation parameter. LoRA technology lets you train small model add-ons to produce consistent characters, styles or subjects. This ecosystem is more advanced than any commercial tool.

What you need to run it

GPU — Stable Diffusion runs on your graphics card. An NVIDIA GPU with at least 6GB of VRAM is the practical minimum. 8GB runs most base models comfortably. 12–24GB opens access to higher-quality models and larger resolutions. AMD GPUs work with some configurations but require additional setup. Apple Silicon Macs can run Stable Diffusion via the MPS backend — slower than NVIDIA but functional.

Storage — Model files range from 2GB to 15GB each. A typical installation with a few models and LoRAs requires 20–50GB of storage.

Software interface — AUTOMATIC1111 (the most popular) or ComfyUI (more powerful, steeper learning curve) provide the user interface. Both are free and open source.

Technical comfort — Installation requires running commands in a terminal, installing Python dependencies and configuring paths. If you have never used a command line, this will be challenging. Services like Pinokio automate much of the setup process.

Who Stable Diffusion is for

Developers building applications that incorporate AI image generation — Stable Diffusion's open nature makes it the foundation for thousands of products. Technical creators who need full control over every parameter without content restrictions. Privacy-conscious users who cannot or will not send creative work to third-party servers. High-volume generators for whom per-image costs on commercial tools would add up to significant expense. Researchers studying generative AI.

It is not suitable for non-technical users who want a simple interface. It is not suitable for people without a capable GPU. For straightforward image generation without technical complexity, Ideogram, Leonardo.ai or Canva AI are better starting points.

Why not just use Midjourney or DALL-E?

Commercial tools are faster to start, simpler to use, and require no hardware investment. For most users they are the right choice. Stable Diffusion's advantages apply when: you need content that commercial tools would refuse (mature content for adult platforms, violence for game assets, copyrighted characters for personal fan work); you need volume that would cost hundreds per month in API fees; you need complete privacy; or you are building a product that embeds AI generation.

Is Stable Diffusion free?

Yes — entirely free to download and run locally. The older model weights (Stable Diffusion 1.x and SDXL) are released under CreativeML OpenRAIL licences that permit free use, including commercial use. Stable Diffusion 3.5 uses the Stability AI Community License, which is free for individuals and organisations earning less than $1 million per year; for enterprise use above that threshold, Stability AI offers commercial licences. Cloud APIs via Stability AI's developer platform start at $0.01 per image for those who cannot run locally.
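To see when local hardware starts to pay for itself, a quick cost sketch (the one-cent figure is the entry price quoted above; real per-model pricing varies):

```python
def api_cost(images_per_month: int, cents_per_image: int = 1) -> float:
    """Monthly cost in dollars at a flat per-image price given in cents."""
    return images_per_month * cents_per_image / 100

# At the $0.01 entry price: 1,000 images/month costs $10,
# 30,000 costs $300 -- the scale at which owning a GPU pays off.
monthly = api_cost(30000)
```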

Getting started — the fastest path

Option A — Pinokio (easiest)

Pinokio is a browser for AI applications that automates the installation of Stable Diffusion and other AI tools. Download Pinokio from pinokio.computer. Open the app store within Pinokio. Search for AUTOMATIC1111 or ComfyUI. Click Install. Pinokio handles all dependencies automatically. This reduces a complex technical setup to a few clicks.

Option B — AUTOMATIC1111 (manual, most popular)

1. Install Python 3.10 and Git for your OS.
2. Clone the repository: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
3. Download a model file (.safetensors) from Hugging Face or Civitai and place it in models/Stable-diffusion/.
4. Run webui.bat (Windows) or webui.sh (Mac/Linux). The web interface opens in your browser at localhost:7860.
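Once the web UI is running (and if it was launched with the --api flag), it also serves a local HTTP API you can script against. A minimal sketch of a txt2img request to the default local address, with field names following the /sdapi/v1/txt2img schema (the prompt text and settings here are placeholders):

```python
import json
import urllib.request

def build_txt2img_payload(prompt: str, negative: str = "",
                          steps: int = 25, cfg_scale: float = 7.0,
                          width: int = 512, height: int = 512) -> dict:
    """Request body for AUTOMATIC1111's /sdapi/v1/txt2img endpoint."""
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
    }

def submit(payload: dict,
           url: str = "http://127.0.0.1:7860/sdapi/v1/txt2img") -> dict:
    """POST the payload to a locally running web UI started with --api."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # "images" holds base64-encoded PNGs

payload = build_txt2img_payload(
    "a mountain lake at dawn, photorealistic",
    negative="blurry, watermark, text",
)
# submit(payload) would run the generation against your local instance.
```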

Option C — ComfyUI (for advanced workflows)

ComfyUI uses a node-based workflow editor — you build image generation pipelines visually by connecting nodes. It has a steeper learning curve than AUTOMATIC1111 but supports more complex workflows including advanced img2img, ControlNet and custom pipelines. Install via the official GitHub repository at github.com/comfyanonymous/ComfyUI.

Recommended models to start with

For photorealistic images: RealisticVision, Deliberate or any SDXL-based photorealism model. For artistic illustration: DreamShaper or Juggernaut. For anime/manga style: AnyMix or Counterfeit. For FLUX quality: Download FLUX.1-dev from Hugging Face (requires accepting the licence agreement).

14 Stable Diffusion prompts and techniques

Basic text-to-image prompting

Photorealistic portrait
Positive: a beautiful [age and demographic] in [setting], natural skin texture, detailed eyes, professional photography, 4k, studio lighting, photorealistic, sharp focus, highly detailed. Negative: (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, blurry, watermark, text, duplicate.
Fantasy character
Positive: epic fantasy [class — warrior / mage / rogue], [gender], intricate armour design, detailed cloth physics, glowing magical effects, dramatic backlighting, concept art, fantasy illustration, artstation trending, high detail. Negative: blurry, ugly, watermark, text, multiple people, duplicate.
Product photography
Positive: professional product photography of a [product], on [surface], studio light from upper left, bokeh background, [brand colour] accent, hyperrealistic, 4k, commercial photography, clean white space. Negative: hand, person, watermark, text, blurry, distortion.
Landscape
Positive: [time of day and season] [landscape type — forest, mountain, ocean], dramatic atmospheric perspective, volumetric light rays, highly detailed, photorealistic, 8k uhd, award-winning landscape photography, cinematic colour grade. Negative: text, watermark, people, cars, buildings (unless desired).

Advanced techniques

LoRA for consistent character
After training a LoRA on 15–20 images of a character: include the LoRA trigger word in your prompt ('[trigger word], [character] [doing action] in [setting]'). Adjust LoRA strength via the <lora:name:weight> prompt syntax or the Additional Networks extension (0.6–0.8 is a typical starting range). Generate multiple images with the same trigger word to produce the character consistently across different scenes.
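Under the hood, a LoRA is a small low-rank weight update applied on top of the base model's weights, scaled by the strength setting. A toy numpy sketch of that merge (shapes and values are illustrative only, not real model dimensions):

```python
import numpy as np

def apply_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               strength: float = 0.7) -> np.ndarray:
    """Merge a low-rank update into a weight matrix: W' = W + s * (B @ A).

    A (r x in) and B (out x r) are the small trained LoRA matrices;
    the rank r is far smaller than the layer's own dimensions, which
    is why LoRA files are tiny compared with full model checkpoints.
    """
    return W + strength * (B @ A)

# Toy shapes: a 4x4 layer with a rank-1 LoRA at strength 0.5.
W = np.zeros((4, 4))
A = np.ones((1, 4))   # r=1, in=4
B = np.ones((4, 1))   # out=4, r=1
merged = apply_lora(W, A, B, strength=0.5)  # every entry becomes 0.5
```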
img2img for product variations
img2img workflow: Upload a base product image. Set denoising strength to 0.4 (lower = more similar to original, higher = more creative). Prompt: '[describe the variation — same product, different season / lighting / background colour]'. Generate. Produces variations that maintain the product's core appearance while changing specific elements.
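Denoising strength works by skipping part of the diffusion schedule: with strength s and N sampling steps, only roughly s × N denoising steps actually run, starting from a partially noised version of your input. A sketch of that mapping, mirroring how the diffusers library computes it (the function name is illustrative):

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run in an img2img pass.

    strength=0.0 leaves the input essentially untouched; strength=1.0
    is a full generation that discards most of the original image.
    """
    return min(int(num_inference_steps * strength), num_inference_steps)

# At the suggested strength of 0.4 with 30 steps, only 12 steps run --
# which is why low strengths stay close to the original product shot.
steps_run = img2img_steps(30, 0.4)
```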
Inpainting for background removal
Load image in the Inpaint tab. Paint a mask over the background (not the subject). Prompt: '[describe the new background — clean white studio / outdoor setting / gradient]'. Denoising: 0.8–0.95. Generates a new background while preserving the masked subject. More controllable than automated background removal for complex subjects.
Outpainting to extend an image
In AUTOMATIC1111 or ComfyUI, use the Outpainting script or the ControlNet Tile model. Extend the canvas in the direction needed. Prompt should describe the content that would naturally continue beyond the original frame. Denoising: 0.7–0.8. Useful for adjusting crop ratios while maintaining visual coherence.
Upscaling low-resolution outputs
Take any generated image and upscale in the Extras tab. Recommended upscalers: R-ESRGAN 4x+ for photorealistic images, R-ESRGAN 4x+ Anime6B for illustrated content. Scale factor: 2x or 4x depending on target resolution. Enable tile upscale for very large outputs to avoid memory issues.
Style transfer via image strength
img2img workflow for style transfer: Upload a photo in the img2img tab. Provide a style reference — base AUTOMATIC1111 has no dedicated style-image upload, so use an extension such as ControlNet's reference-only preprocessor or an IP-Adapter. Set denoising strength to around 0.6. Describe the desired output. The generated image combines the composition of the input photo with the style of the reference. Useful for turning photography into illustration or vice versa.

Essential tips

Use negative prompts for every generation. A standard negative prompt removes the most common artefacts: (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation, watermark, text. Add this to every generation.

CFG Scale controls prompt adherence. Guidance Scale (CFG) determines how strictly the model follows your prompt. 7 is a good starting point. Lower (5–6) allows more creative interpretation. Higher (10–12) follows the prompt more literally but can produce over-saturated or distorted outputs.
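Mechanically, CFG runs the model twice per step, once with your prompt and once without, then extrapolates away from the unconditional prediction. A minimal numpy sketch of the combination rule (array values are placeholders for the model's noise predictions):

```python
import numpy as np

def cfg_combine(uncond: np.ndarray, cond: np.ndarray,
                guidance_scale: float = 7.0) -> np.ndarray:
    """Classifier-free guidance: push the prediction toward the prompt.

    guidance_scale=1.0 returns the conditional prediction unchanged;
    larger values exaggerate the prompt's influence -- and, at 10-12,
    the over-saturation artefacts mentioned above.
    """
    return uncond + guidance_scale * (cond - uncond)

u = np.array([0.0, 0.0])   # prediction without the prompt
c = np.array([1.0, -1.0])  # prediction with the prompt
out = cfg_combine(u, c, guidance_scale=7.0)  # -> [7.0, -7.0]
```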

Sampling steps: 20–30 is enough. More steps produce more detailed images but with diminishing returns above 30. DDIM and Euler a samplers converge faster — 20 steps with these produces results comparable to 50+ steps with slower samplers.

Technical background

Stable Diffusion was developed by researchers at LMU Munich (CompVis group) and Stability AI, and released publicly in August 2022. The model architecture was described in the paper "High-Resolution Image Synthesis with Latent Diffusion Models" by Rombach et al. (2022), available at arXiv:2112.10752. This is one of the 19 foundational papers cited across the AI Atlas.

Stable Diffusion uses latent diffusion — a technique that performs the diffusion process in a compressed latent space rather than pixel space, making generation faster and less memory-intensive than pixel-space diffusion models. The model uses a VAE (Variational Autoencoder) to encode images into latent space and decode them back, and a U-Net to model the reverse diffusion process conditioned on CLIP text embeddings.
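The practical consequence is a large memory saving: the VAE downsamples each spatial dimension by a factor of 8 and encodes into 4 latent channels, so the U-Net operates on a tensor far smaller than the RGB image. A quick sketch of the shape arithmetic (factors as used by the SD 1.x/SDXL VAE):

```python
def latent_shape(width: int, height: int,
                 downsample: int = 8, channels: int = 4) -> tuple:
    """Shape of the latent tensor for an RGB image of the given size."""
    return (channels, height // downsample, width // downsample)

# A 512x512x3 image (786,432 values) becomes a 4x64x64 latent
# (16,384 values) -- a 48x reduction in what the U-Net must process.
shape = latent_shape(512, 512)  # -> (4, 64, 64)
```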

FLUX models

FLUX is a family of models developed by Black Forest Labs, founded by key members of the original Stable Diffusion research team. FLUX uses a diffusion transformer (DiT) architecture rather than U-Net. FLUX.1-dev and FLUX.1-schnell are available open-weight from Hugging Face. FLUX Pro is available via API. Independent photorealism benchmarks in 2026 consistently rank FLUX Pro as producing the highest-quality photorealistic outputs of any model.

AUTOMATIC1111 and ComfyUI

AUTOMATIC1111 (stable-diffusion-webui on GitHub) is the most widely used Stable Diffusion interface, developed by user AUTOMATIC1111 as an open-source project. ComfyUI is a node-based alternative developed by comfyanonymous, better suited for complex conditional generation pipelines. Both are free, open source and actively maintained.

Licensing

Older Stable Diffusion models (1.x, 2.x, SDXL) use CreativeML OpenRAIL licences, which permit free use, including commercial use, subject to the licence's use restrictions. Stable Diffusion 3.5 is released under the Stability AI Community License: free for personal use and for organisations with under $1 million in annual revenue; above that threshold, a commercial licence from Stability AI is required. FLUX models have their own licensing terms — FLUX.1-schnell is Apache 2.0 (fully open), while FLUX.1-dev carries a non-commercial licence.

Primary sources cited in this guide