
# Kandinsky

Generate images with a model that has strong multilingual text understanding.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## What is Kandinsky?

Kandinsky is an image generation model developed by Sber AI:

* Strong multilingual text understanding
* High-quality image generation
* Image mixing and interpolation
* Inpainting and outpainting support
* Open weights and open source code

## Resources

* **GitHub:** [ai-forever/Kandinsky-3](https://github.com/ai-forever/Kandinsky-3)
* **HuggingFace:** [kandinsky-community](https://huggingface.co/kandinsky-community)
* **Paper:** [Kandinsky paper](https://arxiv.org/abs/2310.03502)

## Model versions

| Version       | Resolution | Quality | Speed  |
| ------------- | ---------- | ------- | ------ |
| Kandinsky 2.1 | 768x768    | Good    | Fast   |
| Kandinsky 2.2 | 1024x1024  | Better  | Medium |
| Kandinsky 3   | 1024x1024  | Best    | Slower |

## Hardware requirements

| Model                         | VRAM | Recommended GPU |
| ----------------------------- | ---- | --------------- |
| Kandinsky 2.2                 | 8GB  | RTX 3070        |
| Kandinsky 3                   | 12GB | RTX 3090        |
| Kandinsky 3 (high resolution) | 16GB | RTX 4090        |
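
The table can be read as a simple lookup from available VRAM to the largest model that fits. The sketch below assumes the VRAM figures in the table are minimums; the name `recommend_model` is hypothetical and not part of any library.

```python
# VRAM minimums (GB) copied from the hardware table, largest first.
REQUIREMENTS = [
    (16, "Kandinsky 3 (high resolution)"),
    (12, "Kandinsky 3"),
    (8, "Kandinsky 2.2"),
]


def recommend_model(vram_gb):
    """Return the most demanding model that fits in vram_gb, or None."""
    for min_vram, model in REQUIREMENTS:
        if vram_gb >= min_vram:
            return model
    return None


print(recommend_model(24))  # Kandinsky 3 (high resolution)
print(recommend_model(8))   # Kandinsky 2.2
```

Actual memory use also depends on resolution, precision, and whether CPU offloading is enabled, so treat the result as a starting point.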

## Quick deployment

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio && \
python -c "
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    'kandinsky-community/kandinsky-3',
    variant='fp16',
    torch_dtype=torch.float16
).to('cuda')

def generate(prompt, negative, steps, guidance, seed):
    # A seed of 0 or higher is reproducible; -1 leaves the generator unset (random)
    generator = torch.Generator('cuda').manual_seed(int(seed)) if seed >= 0 else None
    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        generator=generator
    ).images[0]
    return image

gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label='Prompt'),
        gr.Textbox(label='Negative Prompt', value='low quality, blurry'),
        gr.Slider(10, 100, value=50, label='Steps'),
        gr.Slider(1, 20, value=4, label='Guidance'),
        gr.Number(value=-1, label='Seed')
    ],
    outputs=gr.Image(),
    title='Kandinsky 3'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing your service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (for example, `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
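
As a small illustration of that substitution, the sketch below builds the public URL from the `http_pub` value copied from the order page. The helper name `service_url` is hypothetical, and `abc123.clorecloud.net` is only the placeholder hostname used above.

```python
def service_url(http_pub, path="/"):
    """Build the public HTTPS URL for a value copied from the order page."""
    host = http_pub.strip()
    # Tolerate a pasted scheme or a trailing slash.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if not path.startswith("/"):
        path = "/" + path
    return f"https://{host}{path}"


print(service_url("abc123.clorecloud.net"))  # https://abc123.clorecloud.net/
```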

## Installation

```bash
pip install diffusers transformers accelerate torch
```

## Basic usage

### Kandinsky 3

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe(
    prompt="An astronaut cat floating in space, digital art, vivid colors",
    num_inference_steps=50,
    guidance_scale=4.0
).images[0]

image.save("cat_astronaut.png")
```

### Kandinsky 2.2

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

# Load the prior (turns the text prompt into image embeddings)
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

# Load the decoder
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Generate the image embeddings
prompt = "A beautiful sunset over mountains, oil painting style"
image_embeds, negative_embeds = prior(
    prompt=prompt,
    guidance_scale=1.0
).to_tuple()

# Generate the image
image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_embeds,
    height=768,
    width=768,
    num_inference_steps=50
).images[0]

image.save("sunset.png")
```

## Multilingual prompts

Kandinsky accepts prompts in several languages:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

# English
image_en = pipe("A red fox in a snowy forest").images[0]

# Russian
image_ru = pipe("Красная лиса в снежном лесу").images[0]

# Chinese
image_zh = pipe("雪林中的红狐狸").images[0]

# German
image_de = pipe("Ein roter Fuchs im verschneiten Wald").images[0]

# All four prompts describe the same scene and should give similar images
```

## Image mixing

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Two prompts to mix
prompt1 = "A cat"
prompt2 = "A dog"

# Get the embeddings for both prompts
embeds1, neg1 = prior(prompt1).to_tuple()
embeds2, neg2 = prior(prompt2).to_tuple()

# Mix the embeddings (50% each)
mixed_embeds = 0.5 * embeds1 + 0.5 * embeds2
mixed_neg = 0.5 * neg1 + 0.5 * neg2

# Generate the mixed image
image = decoder(
    image_embeds=mixed_embeds,
    negative_image_embeds=mixed_neg,
    height=768,
    width=768
).images[0]

image.save("cat_dog_mix.png")
```
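
The 50/50 mix generalizes to any number of inputs and any weights. The sketch below uses a hypothetical helper, `weighted_mix`, that relies only on `*` and `+`, so the same function should work on embedding tensors. Plain floats stand in for tensors here so the example runs without a GPU.

```python
def weighted_mix(items, weights):
    """Return sum(w_i * item_i) with the weights normalized to sum to 1."""
    if len(items) != len(weights) or not items:
        raise ValueError("items and weights must be non-empty and the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mixed = items[0] * (weights[0] / total)
    for item, weight in zip(items[1:], weights[1:]):
        mixed = mixed + item * (weight / total)
    return mixed


# Plain floats stand in for embedding tensors here.
print(weighted_mix([1.0, 3.0], [1, 1]))   # 2.0
print(weighted_mix([0.0, 10.0], [3, 1]))  # 2.5
```

Applied to the embeddings from the example above, `weighted_mix([embeds1, embeds2], [0.7, 0.3])` would lean the result towards the first prompt.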

## Inpainting

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    torch_dtype=torch.float16
).to("cuda")

# Load the image and the mask
image = load_image("photo.png")
mask = load_image("mask.png")

# Inpaint the masked region
result = pipe(
    prompt="A golden crown",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]

result.save("inpainted.png")
```

## Image-to-image

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

init_image = load_image("sketch.png")

image = pipe(
    prompt="Detailed digital painting of a castle, fantasy art",
    image=init_image,
    strength=0.75,
    num_inference_steps=50
).images[0]

image.save("castle.png")
```

## Batch generation

```python
import torch
from diffusers import AutoPipelineForText2Image
import os

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A tranquil Japanese garden with cherry blossoms",
    "A cyberpunk city at night with neon lights",
    "An ancient library full of magical books",
    "A cozy cabin in the mountains in winter"
]

os.makedirs("outputs", exist_ok=True)

for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        num_inference_steps=50,
        guidance_scale=4.0
    ).images[0]

    image.save(f"outputs/image_{i}.png")
    print(f"Generated: {prompt[:30]}...")

    torch.cuda.empty_cache()
```
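
For larger batches it can help to fix a seed per prompt and give each file a readable name, so any single image can be regenerated later. This is a sketch; `build_jobs`, its `base_seed` default, and the file naming scheme are all hypothetical choices rather than part of diffusers.

```python
import re


def build_jobs(prompts, base_seed=1000, out_dir="outputs"):
    """Pair each prompt with a fixed seed and a readable output path."""
    jobs = []
    for index, prompt in enumerate(prompts):
        # Keep letters and digits (any script), collapse everything else to "_".
        slug = re.sub(r"\W+", "_", prompt.lower()).strip("_")[:40]
        jobs.append({
            "prompt": prompt,
            "seed": base_seed + index,
            "path": f"{out_dir}/{index:03d}_{slug}.png",
        })
    return jobs


for job in build_jobs(["Cozy cabin in the mountains", "Neon city at night"]):
    print(job["seed"], job["path"])
```

Inside the generation loop, each job's seed would be passed as `generator=torch.Generator("cuda").manual_seed(job["seed"])`, the same pattern the Gradio examples on this page use.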

## Gradio interface

```python
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

def generate(prompt, negative_prompt, steps, guidance, width, height, seed):
    # A seed of 0 or higher is reproducible; -1 leaves the generator unset (random)
    generator = torch.Generator("cuda").manual_seed(int(seed)) if seed >= 0 else None

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        width=int(width),
        height=int(height),
        generator=generator
    ).images[0]

    return image

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="Describe your image..."),
        gr.Textbox(label="Negative Prompt", value="low quality, blurry, distorted"),
        gr.Slider(10, 100, value=50, step=5, label="Steps"),
        gr.Slider(1, 20, value=4, step=0.5, label="Guidance Scale"),
        gr.Slider(512, 1024, value=1024, step=64, label="Width"),
        gr.Slider(512, 1024, value=1024, step=64, label="Height"),
        gr.Number(value=-1, label="Seed (-1 for random)")
    ],
    outputs=gr.Image(label="Generated Image"),
    title="Kandinsky 3 - Image Generation",
    description="Generate images from multilingual prompts. Running on CLORE.AI."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```
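
The seed field uses `-1` to mean "random", and UI number fields can hand back floats. One possible way to handle both cases is to resolve the seed to a concrete integer before building the generator, so the value actually used can be shown or logged. The helper `resolve_seed` and the `MAX_SEED` bound below are assumptions made for this sketch.

```python
import random

MAX_SEED = 2**32 - 1  # assumed upper bound, chosen for this sketch


def resolve_seed(seed):
    """Return a non-negative int seed; negative or missing input means random."""
    if seed is None or seed < 0:
        return random.randint(0, MAX_SEED)
    return int(seed)


print(resolve_seed(42.0))  # 42
chosen = resolve_seed(-1)
print(0 <= chosen <= MAX_SEED)  # True
```

With this approach the handler would always create a generator from the resolved seed, which makes a "random" result reproducible after the fact.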

## Memory optimization

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)

# Offload whole sub-models to the CPU between uses
pipe.enable_model_cpu_offload()

# Alternative for very low VRAM (slower): use this INSTEAD of the line above
# pipe.enable_sequential_cpu_offload()

# Enable attention slicing
pipe.enable_attention_slicing()

image = pipe(
    prompt="A beautiful landscape",
    num_inference_steps=50
).images[0]
```

## Performance

| Model         | Resolution | GPU      | Time |
| ------------- | ---------- | -------- | ---- |
| Kandinsky 3   | 1024x1024  | RTX 3090 | 15s  |
| Kandinsky 3   | 1024x1024  | RTX 4090 | 10s  |
| Kandinsky 2.2 | 768x768    | RTX 3090 | 8s   |
| Kandinsky 2.2 | 768x768    | RTX 4090 | 5s   |

## Troubleshooting

### Out of memory

**Problem:** CUDA OOM during generation

**Solutions:**

* Enable CPU offloading
* Reduce the resolution
* Use Kandinsky 2.2 instead of 3
* Enable attention slicing

```python
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
```

### Poor text rendering

**Problem:** Text in images looks wrong

**Solutions:**

* Kandinsky struggles with rendering text (as do most diffusion models)
* Add text in post-processing
* Use prompts that avoid text

### Colors look wrong

**Problem:** Image colors are washed out or oversaturated

**Solutions:**

* Adjust the guidance scale (try values from 3 to 6)
* State color preferences in the prompt
* Apply color correction in post-processing

### Slow generation

**Problem:** Generation takes too long

**Solutions:**

* Reduce the number of inference steps (30 is often enough)
* Use fp16 precision
* Use Kandinsky 2.2 for faster results
* Lower the resolution for previews
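
To get a rough feel for what fewer steps buys, generation time can be approximated as scaling linearly with the step count. This is a simplification that ignores fixed overhead such as text encoding and image decoding. The function name `estimate_seconds` is hypothetical, and treating the 15 s figure from the performance table as a 50-step run is an assumption, since the table does not state the step count.

```python
def estimate_seconds(reference_seconds, reference_steps, steps):
    """Scale a measured time linearly with the number of inference steps."""
    if reference_steps <= 0 or steps <= 0:
        raise ValueError("step counts must be positive")
    return reference_seconds * steps / reference_steps


# 15 s is the Kandinsky 3 / RTX 3090 figure from the performance table;
# treating it as a 50-step run is an assumption of this sketch.
print(estimate_seconds(15, 50, 30))  # 9.0
```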

## Comparison with other models

| Feature              | Kandinsky 3 | SDXL      | FLUX    |
| -------------------- | ----------- | --------- | ------- |
| Multilingual support | Excellent   | Limited   | Limited |
| Image quality        | High        | Very high | Highest |
| Speed                | Medium      | Medium    | Slow    |
| VRAM                 | 12GB        | 12GB      | 24GB    |
| Inpainting           | Yes         | Yes       | Limited |

## Cost estimate

Typical rates on the CLORE.AI marketplace (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
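
Combining an hourly rate with a per-image generation time gives a rough cost per image. The sketch below uses the approximate RTX 4090 figures from this page (about $0.10 per hour and about 10 s per Kandinsky 3 image). The function name `cost_per_image` is hypothetical, and the estimate ignores model download, loading, and idle time.

```python
def cost_per_image(hourly_rate_usd, seconds_per_image):
    """Convert an hourly rental rate and a per-image time into USD per image."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    images_per_hour = 3600 / seconds_per_image
    return hourly_rate_usd / images_per_hour


# RTX 4090: ~$0.10/hour (cost table), ~10 s per Kandinsky 3 image
# (performance table). Both are approximate figures.
rtx4090 = cost_per_image(0.10, 10)
print(f"{rtx4090:.5f} USD per image")        # 0.00028 USD per image
print(f"{1000 * rtx4090:.2f} USD per 1000")  # 0.28 USD per 1000
```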

## Next steps

* FLUX Generation: highest-quality images
* Stable Diffusion: the most popular option
* [PixArt](/guides/guides_v2-ru/generaciya-izobrazhenii/pixart-image-gen.md): fast generation
* ComfyUI: advanced workflows

