# Kandinsky

Generate images with powerful multilingual text understanding.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## What is Kandinsky?

Kandinsky is a family of open-source text-to-image diffusion models developed by Sber AI:

* Strong multilingual text understanding
* High-quality image generation
* Image mixing and interpolation
* Inpainting and outpainting support
* Open source weights

## Resources

* **GitHub:** [ai-forever/Kandinsky-3](https://github.com/ai-forever/Kandinsky-3)
* **HuggingFace:** [kandinsky-community](https://huggingface.co/kandinsky-community)
* **Paper:** [Kandinsky Paper](https://arxiv.org/abs/2310.03502)

## Model Versions

| Version       | Resolution | Quality | Speed  |
| ------------- | ---------- | ------- | ------ |
| Kandinsky 2.1 | 768x768    | Good    | Fast   |
| Kandinsky 2.2 | 1024x1024  | Better  | Medium |
| Kandinsky 3   | 1024x1024  | Best    | Slower |

## Hardware Requirements

| Model                  | VRAM | Recommended GPU |
| ---------------------- | ---- | --------------- |
| Kandinsky 2.2          | 8GB  | RTX 3070        |
| Kandinsky 3            | 12GB | RTX 3090        |
| Kandinsky 3 (high res) | 16GB | RTX 4090        |
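The table above maps directly to the public `kandinsky-community` checkpoints on HuggingFace. A small illustrative helper (the thresholds are the table's figures; note Kandinsky 2.2 also needs its separate prior checkpoint, the helper returns the decoder id only):

```python
def pick_kandinsky(vram_gb: float) -> str:
    """Pick a Kandinsky checkpoint that fits a VRAM budget.

    Thresholds follow the hardware table above. Kandinsky 2.2 also
    requires "kandinsky-community/kandinsky-2-2-prior" alongside
    the decoder id returned here.
    """
    if vram_gb >= 12:
        return "kandinsky-community/kandinsky-3"
    if vram_gb >= 8:
        return "kandinsky-community/kandinsky-2-2-decoder"
    raise ValueError("At least 8GB of VRAM is recommended (Kandinsky 2.2)")

print(pick_kandinsky(24))  # kandinsky-community/kandinsky-3
print(pick_kandinsky(8))   # kandinsky-community/kandinsky-2-2-decoder
```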

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio && \
python -c "
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    'kandinsky-community/kandinsky-3',
    variant='fp16',
    torch_dtype=torch.float16
).to('cuda')

def generate(prompt, negative, steps, guidance, seed):
    # Gradio sliders and numbers return floats; cast to int where required
    generator = torch.Generator('cuda').manual_seed(int(seed)) if seed > 0 else None
    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        generator=generator
    ).images[0]
    return image

gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label='Prompt'),
        gr.Textbox(label='Negative Prompt', value='low quality, blurry'),
        gr.Slider(10, 100, value=50, label='Steps'),
        gr.Slider(1, 20, value=4, label='Guidance'),
        gr.Number(value=-1, label='Seed')
    ],
    outputs=gr.Image(),
    title='Kandinsky 3'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
pip install diffusers transformers accelerate torch
```

## Basic Usage

### Kandinsky 3

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe(
    prompt="A cat astronaut floating in space, digital art, vibrant colors",
    num_inference_steps=50,
    guidance_scale=4.0
).images[0]

image.save("cat_astronaut.png")
```

### Kandinsky 2.2

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

# Load prior (text encoder)
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

# Load decoder
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Generate image embeddings
prompt = "A beautiful sunset over mountains, oil painting style"
image_embeds, negative_embeds = prior(
    prompt=prompt,
    guidance_scale=1.0
).to_tuple()

# Generate image
image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_embeds,
    height=768,
    width=768,
    num_inference_steps=50
).images[0]

image.save("sunset.png")
```

## Multilingual Prompts

Kandinsky supports multiple languages:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

# English
image_en = pipe("A red fox in a snowy forest").images[0]

# Russian
image_ru = pipe("Красная лиса в снежном лесу").images[0]

# Chinese
image_zh = pipe("雪林中的红狐狸").images[0]

# German
image_de = pipe("Ein roter Fuchs im verschneiten Wald").images[0]

# All produce similar images!
```

## Image Mixing

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Two prompts to mix
prompt1 = "A cat"
prompt2 = "A dog"

# Get embeddings for both
embeds1, neg1 = prior(prompt1).to_tuple()
embeds2, neg2 = prior(prompt2).to_tuple()

# Mix embeddings (50% each)
mixed_embeds = 0.5 * embeds1 + 0.5 * embeds2
mixed_neg = 0.5 * neg1 + 0.5 * neg2

# Generate mixed image
image = decoder(
    image_embeds=mixed_embeds,
    negative_image_embeds=mixed_neg,
    height=768,
    width=768
).images[0]

image.save("cat_dog_mix.png")
```
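The 50/50 blend above is just linear interpolation of the prior embeddings; sweeping the weight morphs smoothly between the two concepts. A minimal sketch of the arithmetic, with plain Python lists standing in for the embedding tensors:

```python
def mix_embeds(a, b, w):
    """Linearly interpolate two embedding vectors: w*a + (1-w)*b."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

cat = [1.0, 0.0]  # stand-in for embeds1
dog = [0.0, 1.0]  # stand-in for embeds2

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, mix_embeds(cat, dog, w))
# w=0.5 reproduces the 50/50 mix used above
```

With real tensors you would compute `w * embeds1 + (1 - w) * embeds2` and pass the result to the decoder, generating one image per weight.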

## Inpainting

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    torch_dtype=torch.float16
).to("cuda")

# Load image and mask
image = load_image("photo.png")
mask = load_image("mask.png")

# Inpaint
result = pipe(
    prompt="A golden crown",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]

result.save("inpainted.png")
```

## Image-to-Image

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

init_image = load_image("sketch.png")

image = pipe(
    prompt="A detailed digital painting of a castle, fantasy art",
    image=init_image,
    strength=0.75,
    num_inference_steps=50
).images[0]

image.save("castle.png")
```

## Batch Generation

```python
import torch
from diffusers import AutoPipelineForText2Image
import os

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A serene Japanese garden with cherry blossoms",
    "A cyberpunk city at night with neon lights",
    "An ancient library filled with magical books",
    "A cozy cabin in the mountains during winter"
]

os.makedirs("outputs", exist_ok=True)

for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        num_inference_steps=50,
        guidance_scale=4.0
    ).images[0]

    image.save(f"outputs/image_{i}.png")
    print(f"Generated: {prompt[:30]}...")

    torch.cuda.empty_cache()
```

## Gradio Interface

```python
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

def generate(prompt, negative_prompt, steps, guidance, width, height, seed):
    # Gradio sliders and numbers return floats; diffusers expects ints
    generator = torch.Generator("cuda").manual_seed(int(seed)) if seed > 0 else None

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        width=int(width),
        height=int(height),
        generator=generator
    ).images[0]

    return image

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="Describe your image..."),
        gr.Textbox(label="Negative Prompt", value="low quality, blurry, distorted"),
        gr.Slider(10, 100, value=50, step=5, label="Steps"),
        gr.Slider(1, 20, value=4, step=0.5, label="Guidance Scale"),
        gr.Slider(512, 1024, value=1024, step=64, label="Width"),
        gr.Slider(512, 1024, value=1024, step=64, label="Height"),
        gr.Number(value=-1, label="Seed (-1 for random)")
    ],
    outputs=gr.Image(label="Generated Image"),
    title="Kandinsky 3 - Image Generation",
    description="Generate images with multilingual prompts. Running on CLORE.AI."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## Memory Optimization

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)

# Enable memory optimizations (choose ONE offload strategy)
pipe.enable_model_cpu_offload()

# Or, for very low VRAM (much slower):
# pipe.enable_sequential_cpu_offload()

# Attention slicing further reduces peak memory
pipe.enable_attention_slicing()

image = pipe(
    prompt="A beautiful landscape",
    num_inference_steps=50
).images[0]
```

## Performance

| Model         | Resolution | GPU      | Time |
| ------------- | ---------- | -------- | ---- |
| Kandinsky 3   | 1024x1024  | RTX 3090 | 15s  |
| Kandinsky 3   | 1024x1024  | RTX 4090 | 10s  |
| Kandinsky 2.2 | 768x768    | RTX 3090 | 8s   |
| Kandinsky 2.2 | 768x768    | RTX 4090 | 5s   |

## Troubleshooting

### Out of Memory

**Problem:** CUDA OOM when generating

**Solutions:**

* Enable CPU offloading
* Reduce resolution
* Use Kandinsky 2.2 instead of 3
* Enable attention slicing

```python
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
```

### Poor Text Rendering

**Problem:** Text in images looks wrong

**Solutions:**

* Text rendering is a known weakness of Kandinsky (as of most diffusion models)
* Add text in post-processing instead
* Prefer prompts that avoid visible text

### Colors Look Wrong

**Problem:** Image colors are washed out or oversaturated

**Solutions:**

* Adjust guidance scale (try 3-6 range)
* Specify color preferences in prompt
* Post-process with color correction

### Slow Generation

**Problem:** Takes too long to generate

**Solutions:**

* Reduce inference steps (30 is often enough)
* Use fp16 precision
* Use Kandinsky 2.2 for faster results
* Reduce resolution for previews
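Generation time grows roughly linearly with step count, so dropping from 50 to 30 steps cuts the times in the Performance table by about 40%. A back-of-the-envelope helper (linear scaling is an approximation; model loading and VAE decode add fixed overhead):

```python
def estimated_time(base_seconds: float, base_steps: int, new_steps: int) -> float:
    """Scale a measured generation time to a different step count,
    assuming time is roughly proportional to steps."""
    return base_seconds * new_steps / base_steps

# Kandinsky 3 on an RTX 3090: ~15s at 50 steps (see Performance table)
print(estimated_time(15, 50, 30))  # -> 9.0 (seconds, approximate)
```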

## Comparison with Other Models

| Feature       | Kandinsky 3 | SDXL      | FLUX    |
| ------------- | ----------- | --------- | ------- |
| Multilingual  | Excellent   | Limited   | Limited |
| Image Quality | High        | Very High | Highest |
| Speed         | Medium      | Medium    | Slow    |
| VRAM          | 12GB        | 12GB      | 24GB    |
| Inpainting    | Yes         | Yes       | Limited |

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
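Combining these rates with the Performance table gives a rough per-image cost. A quick sketch of the arithmetic (rates and timings are the approximate figures above, not guarantees):

```python
def cost_per_image(hourly_rate: float, seconds_per_image: float) -> float:
    """Approximate compute cost of one generated image."""
    return hourly_rate * seconds_per_image / 3600

# Kandinsky 3 on an RTX 4090: ~$0.10/hour, ~10s per 1024x1024 image
print(f"${cost_per_image(0.10, 10):.5f} per image")  # ~$0.00028, i.e. thousands of images per dollar
```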

## Next Steps

* FLUX Generation - Highest quality images
* Stable Diffusion - Most popular option
* [PixArt](https://docs.clore.ai/guides/image-generation/pixart-image-gen) - Fast generation
* ComfyUI - Advanced workflows
