stable-diffusion-image-generation▌
davila7/claude-code-templates · updated Apr 8, 2026
MDX-style export adds YAML metadata + attribution linking explainx.ai and this canonical listing URL.
Text-to-image generation and image transformation with Stable Diffusion models via HuggingFace Diffusers.
- ›Supports multiple generation modes: text-to-image, image-to-image translation, inpainting, outpainting, and ControlNet spatial conditioning for precise control
- ›Compatible with SD 1.5, SDXL, SD 3.0, and Flux models; includes scheduler swapping (Euler, DPM-Solver, LCM) for quality and speed trade-offs
- ›LoRA adapter support for efficient style fine-tuning and multi-adapter compositio
Stable Diffusion Image Generation
Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.
When to use Stable Diffusion
Use Stable Diffusion when:
- Generating images from text descriptions
- Performing image-to-image translation (style transfer, enhancement)
- Inpainting (filling in masked regions)
- Outpainting (extending images beyond boundaries)
- Creating variations of existing images
- Building custom image generation workflows
Key features:
- Text-to-Image: Generate images from natural language prompts
- Image-to-Image: Transform existing images with text guidance
- Inpainting: Fill masked regions with context-aware content
- ControlNet: Add spatial conditioning (edges, poses, depth)
- LoRA Support: Efficient fine-tuning and style adaptation
- Multiple Models: SD 1.5, SDXL, SD 3.0, Flux support
Use alternatives instead:
- DALL-E 3: For API-based generation without GPU
- Midjourney: For artistic, stylized outputs
- Imagen: For Google Cloud integration
- Leonardo.ai: For web-based creative workflows
Quick start
Installation
pip install diffusers transformers accelerate torch
pip install xformers # Optional: memory-efficient attention
Basic text-to-image
from diffusers import DiffusionPipeline
import torch
# Load pipeline (auto-detects model type)
pipe = DiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipe.to("cuda")
# Generate image
image = pipe(
"A serene mountain landscape at sunset, highly detailed",
num_inference_steps=50,
guidance_scale=7.5
).images[0]
image.save("output.png")
Using SDXL (higher quality)
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16"
)
pipe.to("cuda")
# Enable memory optimization
pipe.enable_model_cpu_offload()
image = pipe(
prompt="A futuristic city with flying cars, cinematic lighting",
height=1024,
width=1024,
num_inference_steps=30
).images[0]
Architecture overview
Three-pillar design
Diffusers is built around three core components:
Pipeline (orchestration)
├── Model (neural networks)
│ ├── UNet / Transformer (noise prediction)
│ ├── VAE (latent encoding/decoding)
│ └── Text Encoder (CLIP/T5)
└── Scheduler (denoising algorithm)
Pipeline inference flow
Text Prompt → Text Encoder → Text Embeddings
↓
Random Noise → [Denoising Loop] ← Scheduler
↓
Predicted Noise
↓
VAE Decoder → Final Image
Core concepts
Pipelines
Pipelines orchestrate complete workflows:
| Pipeline | Purpose |
|---|---|
StableDiffusionPipeline |
Text-to-image (SD 1.x/2.x) |
StableDiffusionXLPipeline |
Text-to-image (SDXL) |
StableDiffusion3Pipeline |
Text-to-image (SD 3.0) |
FluxPipeline |
Text-to-image (Flux models) |
StableDiffusionImg2ImgPipeline |
Image-to-image |
StableDiffusionInpaintPipeline |
Inpainting |
Schedulers
Schedulers control the denoising process:
| Scheduler | Steps | Quality | Use Case |
|---|---|---|---|
EulerDiscreteScheduler |
20-50 | Good | Default choice |
EulerAncestralDiscreteScheduler |
20-50 | Good | More variation |
DPMSolverMultistepScheduler |
15-25 | Excellent | Fast, high quality |
DDIMScheduler |
50-100 | Good | Deterministic |
LCMScheduler |
4-8 | Good | Very fast |
UniPCMultistepScheduler |
15-25 | Excellent | Fast convergence |
Swapping schedulers
from diffusers import DPMSolverMultistepScheduler
# Swap for faster generation
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
pipe.scheduler.config
)
# Now generate with fewer steps
image = pipe(prompt, num_inference_steps=20).images[0]
Generation parameters
Key parameters
| Parameter | Default | Description |
|---|---|---|
prompt |
Required | Text description of desired image |
negative_prompt |
None | What to avoid in the image |
num_inference_steps |
50 | Denoising steps (more = better quality) |
guidance_scale |
7.5 | Prompt adherence (7-12 typical) |
height, width |
512/1024 | Output dimensions (multiples of 8) |
generator |
None | Torch generator for reproducibility |
num_images_per_prompt |
1 | Batch size |
Reproducible generation
import torch
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
prompt="A cat wearing a top hat",
generator=generator,
num_inference_steps=50
).images[0]
Negative prompts
image = pipe(
prompt="Professional photo of a dog in a garden",
negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
guidance_scale=7.5
).images[0]
Image-to-image
Transform existing images with text guidance:
from diffusers import AutoPipelineForImage2Image
from PIL import Image
pipe = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
).to("cuda")
init_image = Image.open("input.jpg").resize((512, 512))
image = pipe(
prompt="A watercolor painting of the scene",
image=init_image,
strength=0.75, # How much to transform (0-1)
num_inference_steps=50
).images[0]
Inpainting
Fill masked regions:
from diffusers import AutoPipelineForInpainting
from PIL import Image
pipe = AutoPipelineForInpainting.from_pretrained(
"runwayml/stable-diffusion-inpainting",
torch_dtype=torch.float16
).to("cuda")
image = Image.open("photo.jpg")
mask = Image.open("mask.png") # White = inpaint region
result = pipe(
prompt="A red car parked on the street",
image=image,
mask_image=mask,
num_inference_steps=50
).images[0]
ControlNet
Add spatial conditioning for precise control:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
# Load ControlNet for edge conditioning
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/control_v11p_sd15_canny",
torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
controlnet=controlnet,
torch_dtype=torch.float16
).to("cuda")
# Use Canny edge image as control
control_image = get_canny_image(input_image)
image = pipe(
prompt="A beautiful house in the style of Van Gogh",
image=control_image,
num_inference_steps=30
).images[0]
Available ControlNets
| ControlNet | Input Type | Use Case |
|---|---|---|
canny |
Edge maps | Preserve structure |
openpose |
Pose skeletons | Human poses |
depth |
Depth maps | 3D-aware generation |
normal |
Normal maps | Surface details |
mlsd |
Line segments | Architectural lines |
scribble |
Rough sketches | Sketch-to how to use stable-diffusion-image-generation How to use stable-diffusion-image-generation on CursorAI-first code editor with Composer 1 PrerequisitesBefore installing skills in Cursor, ensure your development environment meets these requirements:
2 Execute installation commandExecute the skills CLI command in your project's root directory to begin installation: $npx skills add https://github.com/davila7/claude-code-templates --skill stable-diffusion-image-generation The skills CLI fetches 3 Select Cursor when promptedThe CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor: ◆ Which agents do you want to install to? │ │ ── Universal (.agents/skills) ── always included ──── │ • Amp │ • Antigravity │ • Cline │ • Codex │ ●Cursor(selected) │ • Cursor │ • Windsurf 4 Verify installationConfirm successful installation by checking the skill directory location: .cursor/skills/stable-diffusion-image-generation Reload or restart Cursor to activate stable-diffusion-image-generation. Access the skill through slash commands (e.g., ⚠ Security & Verification NoticeWe perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use. Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment. List & Monetize Your SkillSubmit your Claude Code skill and start earning Use Cases▌User Story & Requirements GenerationCreate detailed user stories, acceptance criteria, and feature specs Example Generate user stories for 'password reset feature' with acceptance criteria, edge cases, and test scenarios ✓ Reduce spec writing time by 50%, ensure comprehensive coverage Competitive AnalysisResearch competitors, compare features, identify gaps Example Analyze 5 competitor products, create feature comparison matrix, suggest differentiation opportunities ✓ Complete competitive research in 2 hours instead of 2 days Roadmap PrioritizationEvaluate features using frameworks (RICE, ICE, Kano) and create prioritized backlogs Example Score 20 feature ideas using RICE framework, generate prioritized roadmap with rationale ✓ Make data-driven prioritization decisions faster Stakeholder CommunicationDraft PRDs, status updates, and stakeholder presentations Example Create executive summary of Q3 roadmap, monthly progress report, feature launch announcement ✓ Save 3-5 hours/week on communication overhead Implementation Guide▌Prerequisites
Time Estimate30-60 minutes to see productivity improvements Installation Steps
Common Pitfalls
Best Practices▌✓ Do
✗ Don't
💡 Pro Tips
When to Use This▌✓ Use WhenUse for user story writing, competitive research, roadmap prioritization, stakeholder communication, and PRD drafting. Best for reducing repetitive documentation and research work. ✗ Avoid WhenAvoid for strategic product vision (requires deep customer empathy), pricing decisions (needs market and financial expertise), or when face-to-face customer discovery is more valuable than speed. Learning Path▌
DiscussionProduct Hunt–style comments (not star reviews)
general reviews Ratings4.6★★★★★53 reviews
showing 1-10 of 53 1 / 6 |