87 changes: 50 additions & 37 deletions docs/distilled_sd.md
@@ -1,40 +1,66 @@
# Running distilled models: SSD1B and SD1.x/SD2.x with tiny U-Nets

## Preface

These models feature a reduced U-Net architecture. Unlike standard SDXL models, the SSD-1B U-Net contains only one middle block and fewer attention layers in its up- and down-blocks, resulting in significantly smaller file sizes. Using these models can reduce inference time by more than 33%. For more details, refer to Segmind's paper: https://arxiv.org/abs/2401.02677v1.
Similarly, SD1.x- and SD2.x-style models with a tiny U-Net consist of only 6 U-Net blocks, leading to very small files (approximately 1 GB) and time savings of up to 50%. For more information, see the paper: https://arxiv.org/pdf/2305.15798.pdf.
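
To get a feel for the size difference, a distilled U-Net can be compared against a full-size one directly in Python. This is a minimal sketch using the diffusers library; the base-model repo id is an assumption, and network access is required:

```python
# Minimal sketch: compare U-Net parameter counts (assumes `diffusers` is
# installed and the Hugging Face repos below are reachable).
from diffusers import UNet2DConditionModel

# "CompVis/stable-diffusion-v1-4" stands in for any full-size SD 1.x base model.
base = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")
tiny = UNet2DConditionModel.from_pretrained("segmind/tiny-sd", subfolder="unet")

print(f"full U-Net: {sum(p.numel() for p in base.parameters()) / 1e6:.0f}M parameters")
print(f"tiny U-Net: {sum(p.numel() for p in tiny.parameters()) / 1e6:.0f}M parameters")
```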

## SSD1B

Note that not all of these models follow the standard parameter naming conventions. However, several useful SSD-1B models are available online, such as:

* https://huggingface.co/segmind/SSD-1B/resolve/main/SSD-1B-A1111.safetensors
* https://huggingface.co/hassenhamdi/SSD-1B-fp8_e4m3fn/resolve/main/SSD-1B_fp8_e4m3fn.safetensors

Useful LoRAs are also available:

* https://huggingface.co/seungminh/lora-swarovski-SSD-1B/resolve/main/pytorch_lora_weights.safetensors
* https://huggingface.co/kylielee505/mylcmlorassd/resolve/main/pytorch_lora_weights.safetensors

These files can be used out-of-the-box, unlike the models described in the next section.
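
For example, one of the checkpoints above can be fetched straight from Hugging Face in Python (a sketch assuming the `huggingface_hub` package is installed):

```python
# Download the ready-to-use SSD-1B checkpoint listed above into the local
# Hugging Face cache and print its path.
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="segmind/SSD-1B", filename="SSD-1B-A1111.safetensors")
print(path)  # pass this path to sd.cpp's -m/--model argument
```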


## SD1.x, SD2.x with tiny U-Nets

These models require conversion before use. You will need a Python script provided by the diffusers team, available on GitHub:

* https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/scripts/convert_diffusers_to_original_stable_diffusion.py
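
If you prefer to stay in Python, the script can be fetched with the standard library alone (a minimal sketch; the local filename is your choice):

```python
# Download the diffusers conversion script into the working directory.
import urllib.request

url = ("https://raw.githubusercontent.com/huggingface/diffusers/"
       "refs/heads/main/scripts/convert_diffusers_to_original_stable_diffusion.py")
urllib.request.urlretrieve(url, "convert_diffusers_to_original_stable_diffusion.py")
```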

### SD2.x

NotaAI provides the following model online:

* https://huggingface.co/nota-ai/bk-sdm-v2-tiny

Creating a .safetensors file involves two steps. First, run this short Python script to download the model from Hugging Face:

```python
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-v2-tiny", cache_dir="./")
```
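
The download lands in a Hugging Face cache layout, so the snapshot directory name contains a hash. Instead of typing the hash by hand, it can be located programmatically (a sketch assuming the download above used `cache_dir="./"`):

```python
# Find the snapshot directory that the conversion script needs as --model_path.
from pathlib import Path

snapshots = Path("models--nota-ai--bk-sdm-v2-tiny") / "snapshots"
model_path = next(snapshots.iterdir())  # the cache holds a single snapshot here
print(model_path)
```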

Second, create the .safetensors file by running:

```bash
python convert_diffusers_to_original_stable_diffusion.py \
  --model_path models--nota-ai--bk-sdm-v2-tiny/snapshots/68277af553777858cd47e133f92e4db47321bc74 \
  --checkpoint_path bk-sdm-v2-tiny.safetensors --half --use_safetensors
```

This will generate the file **bk-sdm-v2-tiny.safetensors**, which is now ready for use with sd.cpp.
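
As a quick sanity check, the resulting file can be opened and its tensor names listed (a sketch assuming the `safetensors` Python package is installed):

```python
# List a few tensors from the converted checkpoint to confirm it is readable.
from safetensors import safe_open

with safe_open("bk-sdm-v2-tiny.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())
    print(len(keys), "tensors;", keys[:3])
```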

### SD1.x

Several Tiny SD 1.x models are available online, such as:

* https://huggingface.co/segmind/tiny-sd
* https://huggingface.co/segmind/portrait-finetuned
* https://huggingface.co/nota-ai/bk-sdm-tiny

These models also require conversion, partly because some tensors are stored in a non-contiguous manner. To create a usable checkpoint file, follow these simple steps:

##### Download the model using Python on your computer:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("segmind/tiny-sd", torch_dtype=torch.float16)

# Make every U-Net tensor contiguous so it can be serialized.
unet = pipe.unet
for param in unet.parameters():
    param.data = param.data.contiguous()

pipe.save_pretrained("segmindtiny-sd", safe_serialization=True)
```

##### Run the conversion script:

```bash
python convert_diffusers_to_original_stable_diffusion.py \
  --model_path ./segmindtiny-sd \
  --checkpoint_path ./segmind_tiny-sd.ckpt --half
```
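
Optionally, you can confirm the checkpoint loads before handing it to sd.cpp (a sketch; the conversion script stores the weights under a `state_dict` key):

```python
# Load the converted checkpoint and count its tensors as a basic check.
import torch

ckpt = torch.load("segmind_tiny-sd.ckpt", map_location="cpu")
print(len(ckpt["state_dict"]), "tensors")
```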

The file **segmind_tiny-sd.ckpt** will be generated and is now ready for use with sd.cpp. You can follow a similar process for the other models mentioned above.


### Another available .ckpt file

There is another model file available online:

* https://huggingface.co/ClashSAN/small-sd/resolve/main/tinySDdistilled.ckpt

To use this file, you must first adjust its non-contiguous tensors:

```python
import torch

# Assumes the checkpoint stores its weights under a "state_dict" key;
# the output filename is just a suggestion.
ckpt = torch.load("tinySDdistilled.ckpt", map_location="cpu")
for key, tensor in ckpt["state_dict"].items():
    ckpt["state_dict"][key] = tensor.contiguous()
torch.save(ckpt, "tinySDdistilled_fixed.ckpt")
```
3 changes: 3 additions & 0 deletions model.cpp
@@ -1788,6 +1788,9 @@ SDVersion ModelLoader::get_sd_version() {
        if (is_inpaint) {
            return VERSION_SD2_INPAINT;
        }
        if (!has_middle_block_1) {
            return VERSION_SD2_TINY_UNET;
        }
        return VERSION_SD2;
    }
    return VERSION_COUNT;
3 changes: 2 additions & 1 deletion model.h
@@ -26,6 +26,7 @@ enum SDVersion {
    VERSION_SD1_TINY_UNET,
    VERSION_SD2,
    VERSION_SD2_INPAINT,
    VERSION_SD2_TINY_UNET,
    VERSION_SDXL,
    VERSION_SDXL_INPAINT,
    VERSION_SDXL_PIX2PIX,
@@ -52,7 +53,7 @@ static inline bool sd_version_is_sd1(SDVersion version) {
}

static inline bool sd_version_is_sd2(SDVersion version) {
    if (version == VERSION_SD2 || version == VERSION_SD2_INPAINT || version == VERSION_SD2_TINY_UNET) {
        return true;
    }
    return false;
1 change: 1 addition & 0 deletions stable-diffusion.cpp
@@ -23,6 +23,7 @@ const char* model_version_to_str[] = {
"SD 1.x Tiny UNet",
"SD 2.x",
"SD 2.x Inpaint",
"SD 2.x Tiny UNet",
"SDXL",
"SDXL Inpaint",
"SDXL Instruct-Pix2Pix",
21 changes: 12 additions & 9 deletions unet.hpp
@@ -180,6 +180,7 @@ class UnetModelBlock : public GGMLBlock {
    int num_head_channels = -1;  // channels // num_heads
    int context_dim = 768;       // 1024 for VERSION_SD2, 2048 for VERSION_SDXL
    bool use_linear_projection = false;
    bool tiny_unet = false;

public:
    int model_channels = 320;
@@ -208,15 +209,17 @@
            num_head_channels = 64;
            num_heads = -1;
            use_linear_projection = true;
        }
        if (sd_version_is_inpaint(version)) {
            in_channels = 9;
        } else if (sd_version_is_unet_edit(version)) {
            in_channels = 8;
        }
        if (version == VERSION_SD1_TINY_UNET || version == VERSION_SD2_TINY_UNET) {
            num_res_blocks = 1;
            channel_mult = {1, 2, 4};
            tiny_unet = true;
        }

        // dims is always 2
        // use_temporal_attention is always True for SVD
@@ -290,7 +293,7 @@
                                                       context_dim));
                }
                input_block_chans.push_back(ch);
                if (tiny_unet) {
                    input_block_idx++;
                }
            }
@@ -311,7 +314,7 @@
            d_head = num_head_channels;
            n_head = ch / d_head;
        }
        if (!tiny_unet) {
            blocks["middle_block.0"] = std::shared_ptr<GGMLBlock>(get_resblock(ch, time_embed_dim, ch));
            if (version != VERSION_SDXL_SSD1B) {
                blocks["middle_block.1"] = std::shared_ptr<GGMLBlock>(get_attention_layer(ch,
@@ -358,7 +361,7 @@
                }

                if (i > 0 && j == num_res_blocks) {
                    if (tiny_unet) {
                        output_block_idx++;
                        if (output_block_idx == 2) {
                            up_sample_idx = 1;
@@ -495,7 +498,7 @@
                }
                hs.push_back(h);
            }
            if (tiny_unet) {
                input_block_idx++;
            }
            if (i != len_mults - 1) {
@@ -512,7 +515,7 @@
        // [N, 4*model_channels, h/8, w/8]

        // middle_block
        if (!tiny_unet) {
            h = resblock_forward("middle_block.0", ctx, h, emb, num_video_frames);  // [N, 4*model_channels, h/8, w/8]
            if (version != VERSION_SDXL_SSD1B) {
                h = attention_layer_forward("middle_block.1", ctx, h, context, num_video_frames);  // [N, 4*model_channels, h/8, w/8]
@@ -554,7 +557,7 @@
            }

            if (i > 0 && j == num_res_blocks) {
                if (tiny_unet) {
                    output_block_idx++;
                    if (output_block_idx == 2) {
                        up_sample_idx = 1;