Stable Diffusion
- Deep learning text-to-image model (generates images from a text prompt)
- Released in 2022
- Based on latent diffusion: images are produced by iteratively denoising in a compressed latent space
- Frozen CLIP ViT-L/14 text encoder
Installation
# macOS dependencies
brew install cmake protobuf rust python@3.10 git wget
# Clone repo
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
# Install and run
./stable-diffusion-webui/webui.sh
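Before running the install script it can help to confirm the dependencies are actually on the PATH. The following is a minimal sketch; `missing_commands` is a hypothetical helper, not part of the web UI.

```shell
# Print each command from the argument list that is not on PATH.
# Returns non-zero if anything is missing.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}
```

A call such as `missing_commands cmake protoc rustc python3.10 git wget` would cover the Homebrew packages above. The binary names are assumptions about what each formula installs.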
Configuration
- Launch options: set them in webui-user.sh or pass them to webui.sh directly
./webui.sh \
--share \
--disable-nan-check \
--no-half \
--api \
--vae-path=<path> \
--no-half-vae
# Equivalent: set the flags in webui-user.sh instead of passing them on the command line
export COMMANDLINE_ARGS=""
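As an illustration, a webui-user.sh fragment carrying some of the flags listed above might look like the sketch below. Which flags to include is an assumption made for the example; `--share` and `--vae-path` are left out.

```shell
# webui-user.sh (fragment)
# Flags taken from the command above; pick the ones your setup needs.
export COMMANDLINE_ARGS="--api --no-half --no-half-vae --disable-nan-check"
```

With this in place, running `./webui.sh` with no arguments should start the web UI with these flags.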
Models
Checkpoints
- Stable Diffusion models are known as checkpoint models and are usually distributed as .safetensors files
# Copy your own checkpoint models
cp "Counterfeit-V2.5.safetensors" ./models/Stable-diffusion
VAE
- The VAE (Variational Autoencoder) is a separate model that decodes latents into the final image; swapping it can improve colors and fine detail. VAE files use the .vae.pt extension
# Copy VAE
cp "Counterfeit-V2.5.vae.pt" ./models/VAE
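The two copy steps above can be folded into one helper that picks the destination folder from the file name. This is a rough sketch: `install_model` is a hypothetical function, and the `.ckpt` and `.vae.safetensors` patterns are assumptions beyond the extensions named in these notes.

```shell
# Copy a model file into the matching web UI folder based on its file name.
# $1: path to the model file, $2: path to the stable-diffusion-webui checkout
install_model() {
  file=$1
  webui_dir=$2
  case "$file" in
    # VAE patterns must come first, since *.vae.safetensors also ends in .safetensors
    *.vae.pt|*.vae.safetensors) dest="$webui_dir/models/VAE" ;;
    *.safetensors|*.ckpt)       dest="$webui_dir/models/Stable-diffusion" ;;
    *) echo "unrecognized model file: $file" >&2; return 1 ;;
  esac
  mkdir -p "$dest"
  cp "$file" "$dest/"
  # Print where the file ended up
  echo "$dest/$(basename "$file")"
}
```

For example, `install_model Counterfeit-V2.5.vae.pt ./stable-diffusion-webui` would place the file under `models/VAE`.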
Embeddings
Parameters
txt2img
- Prompt
- https://safebooru.org/
- Tags and common keywords for building the prompts
- Negative Prompt
- Keywords for things you want to keep out of the image
- Sampling Method
- The sampler algorithm that drives the denoising steps
- Batch count
- Number of batches run one after another (adds generation time, not memory use)
- Batch size
- Number of images generated in parallel per batch (raises VRAM use)
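The total number of images from one run is batch count times batch size. The sketch below works that out and builds a matching API payload. It assumes the txt2img API names these fields `n_iter` and `batch_size`; the values are illustrative.

```shell
batch_count=3   # sent as "n_iter" in the API payload
batch_size=2    # sent as "batch_size" in the API payload
total_images=$((batch_count * batch_size))

# Build the JSON body for a txt2img request with these batch settings
payload=$(printf '{"prompt": "puppy dog", "n_iter": %d, "batch_size": %d}' \
  "$batch_count" "$batch_size")

echo "$payload"
echo "total images: $total_images"
```

Here three batches of two images each give six images in total.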
img2img
- Upload a picture to be used as a base
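The same base-image workflow can be scripted when the web UI is started with `--api`. This is a minimal sketch that builds a request body for the `/sdapi/v1/img2img` endpoint. `img2img_payload` is a hypothetical helper, and the `denoising_strength` value of 0.6 is an arbitrary example.

```shell
# Build a JSON payload for /sdapi/v1/img2img from an image file and a prompt.
# The prompt is inserted as-is, so it must not contain double quotes or backslashes.
img2img_payload() {
  image_file=$1
  prompt=$2
  # Base64-encode the base image on a single line
  image_b64=$(base64 < "$image_file" | tr -d '\n')
  printf '{"init_images": ["%s"], "prompt": "%s", "denoising_strength": 0.6}' \
    "$image_b64" "$prompt"
}
```

The output can be piped to curl, for instance `img2img_payload base.png "puppy dog" | curl -X POST http://127.0.0.1:7860/sdapi/v1/img2img -H 'Content-Type: application/json' -d @-`.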
Extensions
LoRA
ControlNet
API
# txt2img (requires the web UI to be started with --api)
curl -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
-H 'Content-Type: application/json' \
-d '{
"prompt": "puppy dog",
"negative_prompt": "sad",
"steps": 5
}'
- The response contains the generated images as a list of base64-encoded strings
// response
{
"images": ["...", "..."]
}
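To turn that response into files, each entry of `images` has to be base64-decoded. The sketch below does this with standard tools only. `extract_images` and `save_images` are hypothetical helpers, and the text matching only handles the flat response shape shown above. A JSON tool such as jq would be more robust if it is available.

```shell
# Read the JSON response on stdin and print one base64 string per line.
extract_images() {
  tr -d '\n' |
    grep -o '"images"[[:space:]]*:[[:space:]]*\[[^]]*\]' |
    sed 's/^"images"[[:space:]]*:[[:space:]]*//' |
    grep -o '"[A-Za-z0-9+/=]*"' |
    tr -d '"'
}

# Read the JSON response on stdin and write <prefix>-1.png, <prefix>-2.png, ...
# The .png suffix assumes the web UI returns PNG data.
save_images() {
  prefix=$1
  i=0
  extract_images | while IFS= read -r b64; do
    i=$((i + 1))
    printf '%s' "$b64" | base64 --decode > "${prefix}-${i}.png"
  done
}
```

Piping the curl call above into `save_images out` would then write `out-1.png`, `out-2.png`, and so on.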