
# Kandinsky

Generate images with strong multilingual text understanding.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## What is Kandinsky?

Kandinsky is an image generation model developed by Sber AI:

* Strong multilingual text understanding
* High-quality image generation
* Image mixing and interpolation
* Inpainting and outpainting support
* Open-source weights

## Resources

* **GitHub:** [ai-forever/Kandinsky-3](https://github.com/ai-forever/Kandinsky-3)
* **HuggingFace:** [kandinsky-community](https://huggingface.co/kandinsky-community)
* **Paper:** [Kandinsky paper](https://arxiv.org/abs/2310.03502)

## Model Versions

| Version       | Resolution | Quality | Speed  |
| ------------- | ---------- | ------- | ------ |
| Kandinsky 2.1 | 768x768    | Good    | Fast   |
| Kandinsky 2.2 | 1024x1024  | Better  | Medium |
| Kandinsky 3   | 1024x1024  | Better  | Slower |

## Hardware Requirements

| Model                         | VRAM  | Recommended GPU |
| ----------------------------- | ----- | --------------- |
| Kandinsky 2.2                 | 8 GB  | RTX 3070        |
| Kandinsky 3                   | 12 GB | RTX 3090        |
| Kandinsky 3 (high resolution) | 16 GB | RTX 4090        |
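
When scripting deployments, the table can be turned into a small lookup. The sketch below is only an illustration: `pick_model` and `VRAM_REQUIREMENTS_GB` are hypothetical names, not part of any library, and the thresholds simply copy the table above, so treat them as rough guidance and not as measured limits.

```python
# VRAM thresholds in GB, copied from the table above (approximate guidance).
VRAM_REQUIREMENTS_GB = {
    "Kandinsky 2.2": 8,
    "Kandinsky 3": 12,
    "Kandinsky 3 (high resolution)": 16,
}


def pick_model(available_vram_gb):
    """Return the most demanding configuration that fits, or None if none does."""
    fitting = [
        (required, name)
        for name, required in VRAM_REQUIREMENTS_GB.items()
        if required <= available_vram_gb
    ]
    if not fitting:
        return None
    # max() on (required, name) tuples picks the highest VRAM requirement.
    return max(fitting)[1]


print(pick_model(24))  # a 24 GB card fits every row
print(pick_model(10))  # only the 8 GB row fits
print(pick_model(6))   # nothing fits
```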

## Quick Deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio && \
python -c "
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    'kandinsky-community/kandinsky-3',
    variant='fp16',
    torch_dtype=torch.float16
).to('cuda')

def generate(prompt, negative, steps, guidance, seed):
    # A negative seed means random; cast because Gradio may pass floats
    use_seed = seed is not None and seed >= 0
    generator = torch.Generator('cuda').manual_seed(int(seed)) if use_seed else None
    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        generator=generator
    ).images[0]
    return image

gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label='Prompt'),
        gr.Textbox(label='Negative Prompt', value='low quality, blurry'),
        gr.Slider(10, 100, value=50, label='Steps'),
        gr.Slider(1, 20, value=4, label='Guidance'),
        gr.Number(value=-1, label='Seed')
    ],
    outputs=gr.Image(),
    title='Kandinsky 3'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to the **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
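
If the service is called from a script, the substitution can be done in one place. The helper below is a minimal sketch: `service_url` is a hypothetical name, and the host is the example value from the list above, not a real endpoint.

```python
def service_url(http_pub, path="/"):
    """Build the public HTTPS URL for a service from its http_pub value."""
    host = http_pub.strip()
    # Tolerate a value pasted with a scheme or a trailing slash.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if not path.startswith("/"):
        path = "/" + path
    return f"https://{host}{path}"


print(service_url("abc123.clorecloud.net"))
```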

## Installation

```bash
pip install diffusers transformers accelerate torch
```

## Basic Usage

### Kandinsky 3

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe(
    prompt="A cat astronaut floating in space, digital art, vibrant colors",
    num_inference_steps=50,
    guidance_scale=4.0
).images[0]

image.save("cat_astronaut.png")
```

### Kandinsky 2.2

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

# Load the prior, which turns the text prompt into image embeddings
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

# Load decoder
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Generate image embeddings
prompt = "A beautiful sunset over mountains, oil painting style"
image_embeds, negative_embeds = prior(
    prompt=prompt,
    guidance_scale=1.0
).to_tuple()

# Generate image
image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_embeds,
    height=768,
    width=768,
    num_inference_steps=50
).images[0]

image.save("sunset.png")
```

## Multilingual Prompts

Kandinsky supports prompts in multiple languages:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

# English
image_en = pipe("A red fox in a snowy forest").images[0]

# Russian
image_ru = pipe("Красная лиса в снежном лесу").images[0]

# Chinese
image_zh = pipe("雪林中的红狐狸").images[0]

# German
image_de = pipe("Ein roter Fuchs im verschneiten Wald").images[0]

# All produce similar images!
```

## Image Mixing

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Two prompts to mix
prompt1 = "A cat"
prompt2 = "A dog"

# Get embeddings for both
embeds1, neg1 = prior(prompt1).to_tuple()
embeds2, neg2 = prior(prompt2).to_tuple()

# Mix embeddings (50% each)
mixed_embeds = 0.5 * embeds1 + 0.5 * embeds2
mixed_neg = 0.5 * neg1 + 0.5 * neg2

# Generate mixed image
image = decoder(
    image_embeds=mixed_embeds,
    negative_image_embeds=mixed_neg,
    height=768,
    width=768
).images[0]

image.save("cat_dog_mix.png")
```
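
The 50/50 blend above generalizes to any ratio. The helper below is an illustrative sketch (`blend` and `blend_schedule` are hypothetical names, not diffusers functions). It only uses `*` and `+`, so it should also work on the embedding tensors returned by the prior. It is shown here on plain floats so that it runs without a GPU.

```python
def blend(a, b, weight_a=0.5):
    """Linearly interpolate between a and b; weight_a is the share given to a."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    return weight_a * a + (1.0 - weight_a) * b


def blend_schedule(steps):
    """Evenly spaced weights from 1.0 (all a) to 0.0 (all b) for an interpolation series."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [1.0 - i / (steps - 1) for i in range(steps)]


print(blend(2.0, 4.0, weight_a=0.5))
print(blend_schedule(5))
```

With the pipelines above, `blend(embeds1, embeds2, 0.7)` together with `blend(neg1, neg2, 0.7)` would weight the result toward the first prompt.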

## Inpainting

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    torch_dtype=torch.float16
).to("cuda")

# Load image and mask
image = load_image("photo.png")
mask = load_image("mask.png")

# Inpaint
result = pipe(
    prompt="A golden crown",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]

result.save("inpainted.png")
```

## Image-to-Image

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

init_image = load_image("sketch.png")

image = pipe(
    prompt="A detailed digital painting of a castle, fantasy art",
    image=init_image,
    strength=0.75,
    num_inference_steps=50
).images[0]

image.save("castle.png")
```

## Batch Generation

```python
import torch
from diffusers import AutoPipelineForText2Image
import os

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A serene Japanese garden with cherry blossoms",
    "A cyberpunk city at night with neon lights",
    "An ancient library filled with magical books",
    "A cozy cabin in the mountains during winter"
]

os.makedirs("outputs", exist_ok=True)

for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        num_inference_steps=50,
        guidance_scale=4.0
    ).images[0]

    image.save(f"outputs/image_{i}.png")
    print(f"Generated: {prompt[:30]}...")

    torch.cuda.empty_cache()
```

## Gradio Interface

```python
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

def generate(prompt, negative_prompt, steps, guidance, width, height, seed):
    # A negative seed means "random"; Gradio may pass floats, so cast to int.
    use_seed = seed is not None and seed >= 0
    generator = torch.Generator("cuda").manual_seed(int(seed)) if use_seed else None

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        width=int(width),
        height=int(height),
        generator=generator
    ).images[0]

    return image

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="Describe your image..."),
        gr.Textbox(label="Negative Prompt", value="low quality, blurry, distorted"),
        gr.Slider(10, 100, value=50, step=5, label="Steps"),
        gr.Slider(1, 20, value=4, step=0.5, label="Guidance Scale"),
        gr.Slider(512, 1024, value=1024, step=64, label="Width"),
        gr.Slider(512, 1024, value=1024, step=64, label="Height"),
        gr.Number(value=-1, label="Seed (-1 for random)")
    ],
    outputs=gr.Image(label="Generated Image"),
    title="Kandinsky 3 - Image Generation",
    description="Generate images with multilingual prompts. Running on CLORE.AI."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```
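
When the seed field is left at -1 the interface produces a random image, but the seed that was actually used is lost, so a good result cannot be reproduced. One possible refinement, sketched below with a hypothetical `resolve_seed` helper and an arbitrary `MAX_SEED` bound, is to draw the random seed explicitly so it can be reported back to the user.

```python
import random

MAX_SEED = 2**32 - 1  # arbitrary upper bound chosen for this sketch


def resolve_seed(seed):
    """Return a non-negative integer seed, drawing a random one for negative or missing input."""
    if seed is None or int(seed) < 0:
        return random.randint(0, MAX_SEED)
    return int(seed)


print(resolve_seed(42))  # fixed seeds pass through unchanged
print(resolve_seed(-1))  # a fresh random seed on each call
```

Inside `generate`, the resolved value could be passed to `torch.Generator("cuda").manual_seed(...)` and returned in an extra output field next to the image.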

## Memory Optimization

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)

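# Option 1: offload whole sub-models to the CPU between uses.
# Do not call pipe.to("cuda") when offloading is enabled.
pipe.enable_model_cpu_offload()

# Option 2 (use instead of option 1 for very low VRAM; slower):
# pipe.enable_sequential_cpu_offload()

# Enable attention slicing
pipe.enable_attention_slicing()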

image = pipe(
    prompt="A beautiful landscape",
    num_inference_steps=50
).images[0]
```

## Performance

| Model         | Resolution | GPU      | Time |
| ------------- | ---------- | -------- | ---- |
| Kandinsky 3   | 1024x1024  | RTX 3090 | 15 s |
| Kandinsky 3   | 1024x1024  | RTX 4090 | 10 s |
| Kandinsky 2.2 | 768x768    | RTX 3090 | 8 s  |
| Kandinsky 2.2 | 768x768    | RTX 4090 | 5 s  |
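
For batch jobs, these timings translate directly into an hourly throughput. The snippet below is a rough sketch: the figures are copied from the table above, `images_per_hour` is a hypothetical helper, and the result is an upper bound because model loading and file I/O are ignored.

```python
# Seconds per image, copied from the table above (illustrative figures).
SECONDS_PER_IMAGE = {
    ("Kandinsky 3", "RTX 3090"): 15,
    ("Kandinsky 3", "RTX 4090"): 10,
    ("Kandinsky 2.2", "RTX 3090"): 8,
    ("Kandinsky 2.2", "RTX 4090"): 5,
}


def images_per_hour(model, gpu):
    """Upper bound on hourly throughput, ignoring model loading and file I/O."""
    return 3600 // SECONDS_PER_IMAGE[(model, gpu)]


for model, gpu in SECONDS_PER_IMAGE:
    print(f"{model} on {gpu}: about {images_per_hour(model, gpu)} images/hour")
```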

## Troubleshooting

### Out of Memory

**Problem:** CUDA out-of-memory (OOM) error during generation

**Solutions:**

* Enable CPU offloading
* Reduce the resolution
* Use Kandinsky 2.2 instead of Kandinsky 3
* Enable attention slicing

```python
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
```
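
The "reduce the resolution" advice can also be automated by retrying at smaller sizes. The following is a minimal sketch under stated assumptions: `generate_with_fallback` and `fake_generate` are hypothetical names, and the stand-in generator simulates an out-of-memory failure with Python's built-in `MemoryError` so the example runs without a GPU.

```python
def generate_with_fallback(generate, sizes, oom_error=MemoryError):
    """Try each (width, height) in order and return the first successful result.

    generate: callable taking width and height keyword arguments.
    sizes: iterable of (width, height) pairs, largest first.
    oom_error: exception type that signals an out-of-memory failure.
    """
    last_error = None
    for width, height in sizes:
        try:
            return generate(width=width, height=height), (width, height)
        except oom_error as error:
            last_error = error  # remember the failure and try a smaller size
    raise RuntimeError("all resolutions ran out of memory") from last_error


# Stand-in for a real pipeline call: pretend anything above 768 px runs out of memory.
def fake_generate(width, height):
    if max(width, height) > 768:
        raise MemoryError("simulated OOM")
    return f"image {width}x{height}"


result, size = generate_with_fallback(
    fake_generate, [(1024, 1024), (768, 768), (512, 512)]
)
print(result, size)
```

With a real pipeline, passing `oom_error=torch.cuda.OutOfMemoryError` would likely be the appropriate choice, and calling `torch.cuda.empty_cache()` between attempts may help release memory.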

### Poor Text Rendering

**Problem:** Text in images looks wrong

**Solutions:**

* Kandinsky struggles with text rendering (like most diffusion models)
* Add text in post-processing
* Use prompts that avoid text

### Colors Look Wrong

**Problem:** Image colors are washed out or oversaturated

**Solutions:**

* Adjust the guidance scale (try the 3-6 range)
* Specify color preferences in the prompt
* Post-process with color correction

### Slow Generation

**Problem:** Generation takes too long

**Solutions:**

* Reduce the number of inference steps (30 is often enough)
* Use fp16 precision
* Use Kandinsky 2.2 for faster results
* Reduce the resolution for previews

## Comparison with Other Models

| Feature       | Kandinsky 3 | SDXL      | FLUX    |
| ------------- | ----------- | --------- | ------- |
| Multilingual  | Excellent   | Limited   | Limited |
| Image quality | High        | Very high | Highest |
| Speed         | Medium      | Medium    | Slow    |
| VRAM          | 12 GB       | 12 GB     | 24 GB   |
| Inpainting    | Yes         | Yes       | Limited |

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
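
Combining an hourly rate with a per-image generation time gives a rough cost per image. The sketch below uses hypothetical helper names (`cost_per_image`, `batch_cost`) and example figures taken from this page. It ignores startup, model download, and idle time, so real costs will be somewhat higher.

```python
def cost_per_image(hourly_rate_usd, seconds_per_image):
    """Approximate rental cost of one image, ignoring startup and idle time."""
    return hourly_rate_usd * seconds_per_image / 3600


def batch_cost(hourly_rate_usd, seconds_per_image, n_images):
    """Approximate rental cost of generating n_images back to back."""
    return cost_per_image(hourly_rate_usd, seconds_per_image) * n_images


# Example figures taken from this page: RTX 3090 at about $0.06/hour,
# Kandinsky 3 at about 15 seconds per 1024x1024 image.
print(f"1000 images: ${batch_cost(0.06, 15, 1000):.2f}")
```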

## Next Steps

* FLUX Generation - Highest-quality images
* Stable Diffusion - Most popular option
* [PixArt](/guides/guides_v2-fr/generation-dimages/pixart-image-gen.md) - Fast generation
* ComfyUI - Advanced workflows

