# Kandinsky

Generate images with powerful multilingual text understanding.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## What is Kandinsky?

Kandinsky is a family of image generation models developed by Sber AI:

* Strong multilingual text understanding
* High-quality image generation
* Image mixing and interpolation
* Inpainting and outpainting support
* Open source weights

## Resources

* **GitHub:** [ai-forever/Kandinsky-3](https://github.com/ai-forever/Kandinsky-3)
* **HuggingFace:** [kandinsky-community](https://huggingface.co/kandinsky-community)
* **Paper:** [Kandinsky Paper](https://arxiv.org/abs/2310.03502)

## Model Versions

| Version       | Resolution | Quality | Speed  |
| ------------- | ---------- | ------- | ------ |
| Kandinsky 2.1 | 768x768    | Good    | Fast   |
| Kandinsky 2.2 | 1024x1024  | Better  | Medium |
| Kandinsky 3   | 1024x1024  | Best    | Slower |

## Hardware Requirements

| Model                  | VRAM | Recommended GPU |
| ---------------------- | ---- | --------------- |
| Kandinsky 2.2          | 8GB  | RTX 3070        |
| Kandinsky 3            | 12GB | RTX 3090        |
| Kandinsky 3 (high res) | 16GB | RTX 4090        |

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio && \
python -c "
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    'kandinsky-community/kandinsky-3',
    variant='fp16',
    torch_dtype=torch.float16
).to('cuda')

def generate(prompt, negative, steps, guidance, seed):
    # Gradio can deliver numbers as floats; the pipeline and manual_seed need ints
    seed = int(seed)
    generator = torch.Generator('cuda').manual_seed(seed) if seed >= 0 else None
    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        generator=generator
    ).images[0]
    return image

gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label='Prompt'),
        gr.Textbox(label='Negative Prompt', value='low quality, blurry'),
        gr.Slider(10, 100, value=50, label='Steps'),
        gr.Slider(1, 20, value=4, label='Guidance'),
        gr.Number(value=-1, label='Seed')
    ],
    outputs=gr.Image(),
    title='Kandinsky 3'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Open `https://YOUR_HTTP_PUB_URL` in your browser to reach the Gradio interface on port 7860. Use this URL wherever you would otherwise use `localhost:7860`.

## Installation

```bash
pip install diffusers transformers accelerate torch
```

## Basic Usage

### Kandinsky 3

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe(
    prompt="A cat astronaut floating in space, digital art, vibrant colors",
    num_inference_steps=50,
    guidance_scale=4.0
).images[0]

image.save("cat_astronaut.png")
```

### Kandinsky 2.2

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

# Load prior (maps the text prompt to CLIP image embeddings)
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

# Load decoder
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Generate image embeddings
prompt = "A beautiful sunset over mountains, oil painting style"
image_embeds, negative_embeds = prior(
    prompt=prompt,
    guidance_scale=1.0
).to_tuple()

# Generate image
image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_embeds,
    height=768,
    width=768,
    num_inference_steps=50
).images[0]

image.save("sunset.png")
```

## Multilingual Prompts

Kandinsky supports multiple languages:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

# English
image_en = pipe("A red fox in a snowy forest").images[0]

# Russian
image_ru = pipe("Красная лиса в снежном лесу").images[0]

# Chinese
image_zh = pipe("雪林中的红狐狸").images[0]

# German
image_de = pipe("Ein roter Fuchs im verschneiten Wald").images[0]

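# All four prompts describe the same scene; exact details differ between runs
# unless you pass the same seeded generator to each call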
```

## Image Mixing

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Two prompts to mix
prompt1 = "A cat"
prompt2 = "A dog"

# Get embeddings for both
embeds1, neg1 = prior(prompt1).to_tuple()
embeds2, neg2 = prior(prompt2).to_tuple()

# Mix embeddings (50% each)
mixed_embeds = 0.5 * embeds1 + 0.5 * embeds2
mixed_neg = 0.5 * neg1 + 0.5 * neg2

# Generate mixed image
image = decoder(
    image_embeds=mixed_embeds,
    negative_image_embeds=mixed_neg,
    height=768,
    width=768
).images[0]

image.save("cat_dog_mix.png")
```
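The 50/50 mix above is a single point on the line between the two embeddings. A minimal sketch for rendering a whole morph sequence, assuming you reuse `decoder`, `embeds1`, `embeds2`, `neg1`, and `neg2` from the block above. The helper names `interpolation_weights`, `lerp`, and `render_morph` are hypothetical, not part of diffusers, and plain linear interpolation is used; other schemes (such as spherical interpolation) exist but are not shown.

```python
def interpolation_weights(num_frames):
    """Return evenly spaced mixing weights from 0.0 to 1.0 (inclusive)."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [i / (num_frames - 1) for i in range(num_frames)]


def lerp(a, b, t):
    """Linear interpolation; works for floats, NumPy arrays, and torch tensors."""
    return (1.0 - t) * a + t * b


def render_morph(decoder, embeds_a, embeds_b, neg_a, neg_b, num_frames=5, size=768):
    """Decode one image per weight, moving from prompt A (t=0) to prompt B (t=1)."""
    images = []
    for t in interpolation_weights(num_frames):
        image = decoder(
            image_embeds=lerp(embeds_a, embeds_b, t),
            negative_image_embeds=lerp(neg_a, neg_b, t),
            height=size,
            width=size,
        ).images[0]
        images.append(image)
    return images


if __name__ == "__main__":
    print(interpolation_weights(5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

For example, `frames = render_morph(decoder, embeds1, embeds2, neg1, neg2)` followed by `frames[i].save(f"morph_{i}.png")` in a loop gives a cat-to-dog sequence.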

## Inpainting

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    torch_dtype=torch.float16
).to("cuda")

# Load image and mask
image = load_image("photo.png")
mask = load_image("mask.png")

# Inpaint
result = pipe(
    prompt="A golden crown",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]

result.save("inpainted.png")
```
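If you don't already have a `mask.png`, here is a minimal sketch for drawing a rectangular one with Pillow. It assumes the mask convention of recent diffusers releases for the Kandinsky inpainting pipelines (white = area to repaint, black = area to keep); older diffusers versions used the inverse, so verify against your installed version. The helper name `make_rect_mask` and the example box are illustrative only.

```python
from PIL import Image, ImageDraw


def make_rect_mask(size, box):
    """Create a single-channel mask: white (255) inside `box`, black (0) elsewhere.

    size: (width, height) of the source photo
    box:  (left, top, right, bottom) region to repaint
    """
    mask = Image.new("L", size, 0)  # start fully black = keep everything
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white = repaint this area
    return mask


# Example (hypothetical box): repaint the top-center area of a 768x768 photo
# mask = make_rect_mask((768, 768), (256, 0, 512, 192))
# mask.save("mask.png")
```

The mask must have the same size as `photo.png`, so pass `load_image("photo.png").size` as `size` when in doubt.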

## Image-to-Image

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

init_image = load_image("sketch.png")

image = pipe(
    prompt="A detailed digital painting of a castle, fantasy art",
    image=init_image,
    strength=0.75,
    num_inference_steps=50
).images[0]

image.save("castle.png")
```
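`strength` controls how much of the denoising schedule is applied to the input image. In most diffusers image-to-image pipelines the number of steps actually run is about `int(num_inference_steps * strength)`. A small sketch of that arithmetic; the helper name is hypothetical and individual pipelines may differ slightly:

```python
def effective_steps(num_inference_steps, strength):
    """Approximate denoising steps an img2img run performs (assumed diffusers-style rule)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.3, 0.5, 0.75, 1.0):
    print(f"strength={s}: ~{effective_steps(50, s)} of 50 steps")
```

At the `strength=0.75` used above, expect about 37 denoising steps. Lower values run fewer steps and keep more of `sketch.png`; values near 1.0 largely ignore it.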

## Batch Generation

```python
import torch
from diffusers import AutoPipelineForText2Image
import os

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A serene Japanese garden with cherry blossoms",
    "A cyberpunk city at night with neon lights",
    "An ancient library filled with magical books",
    "A cozy cabin in the mountains during winter"
]

os.makedirs("outputs", exist_ok=True)

for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        num_inference_steps=50,
        guidance_scale=4.0
    ).images[0]

    image.save(f"outputs/image_{i}.png")
    print(f"Generated: {prompt[:30]}...")

    torch.cuda.empty_cache()
```
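Numbered filenames like `image_0.png` make outputs hard to tell apart later. A minimal sketch of a hypothetical `prompt_to_filename` helper that builds a readable, filesystem-safe name from the prompt and an optional seed; the naming scheme is just one possible choice.

```python
import re


def prompt_to_filename(index, prompt, seed=None, max_len=40):
    """Build a safe PNG filename such as '003_a-cozy-cabin.png'."""
    # Lowercase, then replace every run of non-alphanumeric ASCII characters with '-'
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    # Truncate; fall back to 'image' when nothing ASCII survives (e.g. Cyrillic prompts)
    slug = slug[:max_len].rstrip("-") or "image"
    suffix = f"_seed{seed}" if seed is not None else ""
    return f"{index:03d}_{slug}{suffix}.png"
```

In the loop above, replace the save call with `image.save(os.path.join("outputs", prompt_to_filename(i, prompt)))`. Non-ASCII prompts fall back to `image`, so the numeric prefix keeps them unique.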

## Gradio Interface

```python
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

def generate(prompt, negative_prompt, steps, guidance, width, height, seed):
    # Gradio can pass numbers as floats; diffusers and manual_seed expect ints
    seed = int(seed)
    generator = torch.Generator("cuda").manual_seed(seed) if seed >= 0 else None

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        width=int(width),
        height=int(height),
        generator=generator
    ).images[0]

    return image

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="Describe your image..."),
        gr.Textbox(label="Negative Prompt", value="low quality, blurry, distorted"),
        gr.Slider(10, 100, value=50, step=5, label="Steps"),
        gr.Slider(1, 20, value=4, step=0.5, label="Guidance Scale"),
        gr.Slider(512, 1024, value=1024, step=64, label="Width"),
        gr.Slider(512, 1024, value=1024, step=64, label="Height"),
        gr.Number(value=-1, label="Seed (-1 for random)")
    ],
    outputs=gr.Image(label="Generated Image"),
    title="Kandinsky 3 - Image Generation",
    description="Generate images with multilingual prompts. Running on CLORE.AI."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```
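With a seed of `-1` the interface above produces a random image but never reports which seed was used, so a good result can't be reproduced. A minimal sketch of a hypothetical `resolve_seed` helper that draws a concrete seed for negative inputs; `generate` could then return it alongside the image (for example through a second `gr.Number` output, which is an assumption about how you wire the UI).

```python
import random


def resolve_seed(seed, max_seed=2**32 - 1):
    """Return a concrete integer seed; a negative input means 'pick one at random'."""
    seed = int(seed)  # gr.Number delivers floats; torch.Generator.manual_seed needs an int
    if seed < 0:
        seed = random.randint(0, max_seed)
    return seed


# Sketch of how generate() could use it (pipe and torch as in the block above):
# def generate(prompt, negative_prompt, steps, guidance, width, height, seed):
#     used_seed = resolve_seed(seed)
#     generator = torch.Generator("cuda").manual_seed(used_seed)
#     image = pipe(prompt=prompt, generator=generator).images[0]
#     return image, used_seed
```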

## Memory Optimization

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)

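# Enable memory optimizations (don't call pipe.to("cuda") when using CPU offload)

# Option A: move each sub-model to the GPU only while it runs (moderate savings, small slowdown)
pipe.enable_model_cpu_offload()

# Option B, for very low VRAM: offload layer by layer (largest savings, much slower).
# Use it INSTEAD of Option A, not in addition to it.
# pipe.enable_sequential_cpu_offload()

# Attention slicing computes attention in chunks to lower peak memory
pipe.enable_attention_slicing()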

image = pipe(
    prompt="A beautiful landscape",
    num_inference_steps=50
).images[0]
```
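Which option to pick depends on how much VRAM is free. A rough sketch of hypothetical helpers that choose a strategy from the free memory reported by `torch.cuda.mem_get_info()`. The 12 GB default comes from the Hardware Requirements table above; the half-of-that cutoff for sequential offload is an assumption, not a measured value.

```python
def pick_memory_strategy(free_gb, required_gb=12.0):
    """Choose an offload strategy from free VRAM (thresholds are rough assumptions)."""
    if free_gb >= required_gb:
        return "none"        # whole pipeline fits: keep it on the GPU
    if free_gb >= required_gb / 2:
        return "model"       # enable_model_cpu_offload()
    return "sequential"      # enable_sequential_cpu_offload(): slowest, lowest VRAM


def apply_memory_strategy(pipe, strategy):
    """Apply exactly one strategy to a diffusers pipeline that is still on the CPU."""
    if strategy == "none":
        pipe.to("cuda")
    elif strategy == "model":
        pipe.enable_model_cpu_offload()
    elif strategy == "sequential":
        pipe.enable_sequential_cpu_offload()
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return pipe


# Usage (requires a CUDA GPU):
# import torch
# free_bytes, _total_bytes = torch.cuda.mem_get_info()
# apply_memory_strategy(pipe, pick_memory_strategy(free_bytes / 1024**3))
```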

## Performance

| Model         | Resolution | GPU      | Time per Image (approx.) |
| ------------- | ---------- | -------- | ------------------------ |
| Kandinsky 3   | 1024x1024  | RTX 3090 | 15s                      |
| Kandinsky 3   | 1024x1024  | RTX 4090 | 10s                      |
| Kandinsky 2.2 | 768x768    | RTX 3090 | 8s                       |
| Kandinsky 2.2 | 768x768    | RTX 4090 | 5s                       |
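
Times depend on the driver, diffusers version, step count, and host, so it's worth measuring on your own rental. A minimal sketch of a hypothetical `benchmark` helper that runs a callable a few times after a warm-up and reports the median. When timing GPU work, the callable should end with `torch.cuda.synchronize()` so queued kernels are included in the measurement.

```python
import statistics
import time


def benchmark(fn, runs=3, warmup=1):
    """Return the median wall-clock seconds of fn() over `runs` timed calls."""
    for _ in range(warmup):
        fn()  # first call includes one-off costs (CUDA init, memory allocation)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Usage with the Kandinsky 3 pipeline from the examples above (CUDA required):
# def one_image():
#     pipe("A beautiful landscape", num_inference_steps=50)
#     torch.cuda.synchronize()
# print(f"{benchmark(one_image):.1f}s per image")
```

The median is used rather than the mean so that one slow run (for example, a background download) doesn't skew the result.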

## Troubleshooting

### Out of Memory

**Problem:** CUDA OOM when generating

**Solutions:**

* Enable CPU offloading
* Reduce resolution
* Use Kandinsky 2.2 instead of 3
* Enable attention slicing

```python
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
```

### Poor Text Rendering

**Problem:** Text in images looks wrong

**Solutions:**

* Kandinsky struggles with text rendering (like most diffusion models)
* Add text in post-processing
* Use prompts that avoid text

### Colors Look Wrong

**Problem:** Image colors are washed out or oversaturated

**Solutions:**

* Adjust guidance scale (try values between 3 and 6)
* Specify color preferences in prompt
* Post-process with color correction

### Slow Generation

**Problem:** Takes too long to generate

**Solutions:**

* Reduce inference steps (30 is often enough)
* Use fp16 precision
* Use Kandinsky 2.2 for faster results
* Reduce resolution for previews

## Comparison with Other Models

| Feature       | Kandinsky 3 | SDXL      | FLUX    |
| ------------- | ----------- | --------- | ------- |
| Multilingual  | Excellent   | Limited   | Limited |
| Image Quality | High        | Very High | Highest |
| Speed         | Medium      | Medium    | Slow    |
| VRAM          | 12GB        | 12GB      | 24GB    |
| Inpainting    | Yes         | Yes       | Limited |

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
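
Combining this table with the Performance table gives a rough cost per image. A small sketch of that arithmetic, assuming the example hourly rates and per-image times quoted above (both vary in practice) and ignoring model download and loading time:

```python
def images_per_hour(seconds_per_image):
    """How many images fit into one rented hour."""
    return 3600 / seconds_per_image


def cost_per_image(hourly_rate_usd, seconds_per_image):
    """Rental cost attributable to one image."""
    return hourly_rate_usd * seconds_per_image / 3600


# Kandinsky 3 at 1024x1024, using the figures from the tables above
for gpu, rate, secs in [("RTX 3090", 0.06, 15), ("RTX 4090", 0.10, 10)]:
    print(f"{gpu}: ~{images_per_hour(secs):.0f} images/hour, "
          f"~${cost_per_image(rate, secs) * 1000:.2f} per 1,000 images")
```

With these figures the RTX 3090 works out to about $0.25 per 1,000 images and the RTX 4090 to about $0.28, so the faster card is not automatically cheaper per image.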

## Next Steps

* FLUX Generation - Highest quality images
* Stable Diffusion - Most popular option
* [PixArt](/guides/image-generation/pixart-image-gen.md) - Fast generation
* ComfyUI - Advanced workflows


