# Kandinsky

Generate images with strong multilingual text understanding.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

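## What is Kandinsky?

Kandinsky is an image generation model developed by Sber AI:

* Strong multilingual text understanding
* High-quality image generation
* Image mixing and interpolation
* Inpainting and outpainting support
* Open-source weights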

## Resources

* **GitHub:** [ai-forever/Kandinsky-3](https://github.com/ai-forever/Kandinsky-3)
* **HuggingFace:** [kandinsky-community](https://huggingface.co/kandinsky-community)
* **Paper:** [Kandinsky paper](https://arxiv.org/abs/2310.03502)

## Model Versions

| Version       | Resolution | Quality | Speed  |
| ------------- | ---------- | ------- | ------ |
| Kandinsky 2.1 | 768x768    | Good    | Fast   |
| Kandinsky 2.2 | 1024x1024  | Better  | Medium |
| Kandinsky 3   | 1024x1024  | Best    | Slower |

## Hardware Requirements

| Model                  | VRAM | Recommended GPU |
| ---------------------- | ---- | --------------- |
| Kandinsky 2.2          | 8GB  | RTX 3070        |
| Kandinsky 3            | 12GB | RTX 3090        |
| Kandinsky 3 (high-res) | 16GB | RTX 4090        |
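
The VRAM figures in the table can be turned into a simple selection rule. The sketch below is only an illustration: the function name `pick_kandinsky_checkpoint` is hypothetical, and the thresholds are taken directly from the table rather than measured.

```python
# Illustrative helper: map available VRAM to a checkpoint from the table above.
# The function name is hypothetical; the 8 GB and 12 GB thresholds come from the table.

def pick_kandinsky_checkpoint(vram_gb: float) -> str:
    """Return a Hugging Face repo id that should fit in `vram_gb` of VRAM."""
    if vram_gb >= 12:
        return "kandinsky-community/kandinsky-3"
    if vram_gb >= 8:
        return "kandinsky-community/kandinsky-2-2-decoder"
    raise ValueError(f"{vram_gb} GB is below the 8 GB minimum listed above")


if __name__ == "__main__":
    for vram in (8, 12, 24):
        print(vram, "GB ->", pick_kandinsky_checkpoint(vram))
```

On a rented server, `torch.cuda.get_device_properties(0).total_memory / 1024**3` gives a value to pass in.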

## Quick Deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio && \
python -c "
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    'kandinsky-community/kandinsky-3',
    variant='fp16',
    torch_dtype=torch.float16
).to('cuda')

def generate(prompt, negative, steps, guidance, seed):
    # Gradio passes numbers as floats; cast before use. A seed <= 0 means random.
    seed = int(seed)
    generator = torch.Generator('cuda').manual_seed(seed) if seed > 0 else None
    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        generator=generator
    ).images[0]
    return image

gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label='Prompt'),
        gr.Textbox(label='Negative Prompt', value='low quality, blurry'),
        gr.Slider(10, 100, value=50, label='Steps'),
        gr.Slider(1, 20, value=4, label='Guidance'),
        gr.Number(value=-1, label='Seed')
    ],
    outputs=gr.Image(),
    title='Kandinsky 3'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (for example, `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
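
If you script against the service, it can help to normalize the `http_pub` value into a base URL once. The following is a minimal sketch; `service_url` is a hypothetical helper name, and `abc123.clorecloud.net` is just the placeholder host from the steps above.

```python
from urllib.parse import urlsplit


def service_url(http_pub: str) -> str:
    """Normalize an http_pub value (with or without a scheme) to an https base URL."""
    value = http_pub.strip()
    if "://" not in value:
        value = "https://" + value
    parts = urlsplit(value)
    if not parts.netloc:
        raise ValueError(f"not a valid host: {http_pub!r}")
    # Always use https and drop any path, query, or fragment.
    return f"https://{parts.netloc}"


if __name__ == "__main__":
    print(service_url("abc123.clorecloud.net"))
```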

## Installation

```bash
pip install diffusers transformers accelerate torch
```

## Basic Usage

### Kandinsky 3

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe(
    prompt="A cat astronaut floating in space, digital art, vibrant colors",
    num_inference_steps=50,
    guidance_scale=4.0
).images[0]

image.save("cat_astronaut.png")
```

### Kandinsky 2.2

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

# Load prior (text encoder)
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

# Load decoder
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Generate image embeddings from the text prompt
prompt = "A beautiful sunset over mountains, oil painting style"
image_embeds, negative_embeds = prior(
    prompt=prompt,
    guidance_scale=1.0
).to_tuple()

# Generate image
image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_embeds,
    height=768,
    width=768,
    num_inference_steps=50
).images[0]

image.save("sunset.png")
```

## Multilingual Prompts

Kandinsky accepts prompts in multiple languages:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

# English
image_en = pipe("A red fox in a snowy forest").images[0]

# Russian
image_ru = pipe("Красная лиса в снежном лесу").images[0]

# Chinese
image_zh = pipe("雪林中的红狐狸").images[0]

# German
image_de = pipe("Ein roter Fuchs im verschneiten Wald").images[0]

# All produce similar images!
```

## Image Mixing

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline
from diffusers.utils import load_image

prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    torch_dtype=torch.float16
).to("cuda")

decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16
).to("cuda")

# Two prompts to mix
prompt1 = "A cat"
prompt2 = "A dog"

# Get embeddings for both
embeds1, neg1 = prior(prompt1).to_tuple()
embeds2, neg2 = prior(prompt2).to_tuple()

# Mix embeddings (50% each)
mixed_embeds = 0.5 * embeds1 + 0.5 * embeds2
mixed_neg = 0.5 * neg1 + 0.5 * neg2

# Generate mixed image
image = decoder(
    image_embeds=mixed_embeds,
    negative_image_embeds=mixed_neg,
    height=768,
    width=768
).images[0]

image.save("cat_dog_mix.png")
```
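
The 50/50 mix above is one point on a line between the two embeddings. As a rough sketch, the weights for a whole interpolation sequence could be produced like this; `blend_weights` and `mix` are hypothetical helper names, shown with plain floats so the snippet runs without a GPU.

```python
def blend_weights(steps: int) -> list[tuple[float, float]]:
    """Return `steps` weight pairs (w1, w2), moving from (1, 0) to (0, 1)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [(1 - i / (steps - 1), i / (steps - 1)) for i in range(steps)]


def mix(a, b, w_a, w_b):
    """Weighted sum; works for floats here and for tensors of equal shape."""
    return w_a * a + w_b * b


if __name__ == "__main__":
    for w1, w2 in blend_weights(5):
        print(f"{w1:.2f} * embeds1 + {w2:.2f} * embeds2")
```

For each pair, compute `mix(embeds1, embeds2, w1, w2)` (and the same for the negative embeddings) and pass the result to the decoder as in the example above.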

## Inpainting

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    torch_dtype=torch.float16
).to("cuda")

# Load image and mask
image = load_image("photo.png")
mask = load_image("mask.png")

# Inpaint
result = pipe(
    prompt="A golden crown",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]

result.save("inpainted.png")
```

## Image-to-Image

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

init_image = load_image("sketch.png")

image = pipe(
    prompt="A detailed digital painting of a castle, fantasy art",
    image=init_image,
    strength=0.75,
    num_inference_steps=50
).images[0]

image.save("castle.png")
```

## Batch Generation

```python
import os

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A serene Japanese garden with cherry blossoms",
    "A cyberpunk city at night with neon lights",
    "An ancient library filled with magical books",
    "A cozy cabin in the mountains during winter"
]

os.makedirs("outputs", exist_ok=True)

for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        num_inference_steps=50,
        guidance_scale=4.0
    ).images[0]

    image.save(f"outputs/image_{i}.png")
    print(f"Generated: {prompt[:30]}...")

    torch.cuda.empty_cache()
```
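
The loop above names files by index only. If you would rather see the prompt in the filename, one possible approach is sketched below; `prompt_to_filename` is a hypothetical helper, and prompts without ASCII letters or digits fall back to a generic name.

```python
import re


def prompt_to_filename(index: int, prompt: str, max_len: int = 40) -> str:
    """Build a filesystem-safe PNG name such as '003_a-cat-in-space.png'."""
    # Collapse every run of characters outside a-z / 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    slug = slug[:max_len].rstrip("-")
    if not slug:
        slug = "image"  # e.g. prompts written entirely in non-Latin scripts
    return f"{index:03d}_{slug}.png"


if __name__ == "__main__":
    print(prompt_to_filename(0, "A serene Japanese garden with cherry blossoms"))
```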

## Gradio Interface

```python
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

def generate(prompt, negative_prompt, steps, guidance, width, height, seed):
    # Gradio passes numbers as floats; cast before use. A seed <= 0 means random.
    seed = int(seed)
    generator = torch.Generator("cuda").manual_seed(seed) if seed > 0 else None

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        width=int(width),
        height=int(height),
        generator=generator
    ).images[0]

    return image

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="Describe your image..."),
        gr.Textbox(label="Negative Prompt", value="low quality, blurry, distorted"),
        gr.Slider(10, 100, value=50, step=5, label="Steps"),
        gr.Slider(1, 20, value=4, step=0.5, label="Guidance Scale"),
        gr.Slider(512, 1024, value=1024, step=64, label="Width"),
        gr.Slider(512, 1024, value=1024, step=64, label="Height"),
        gr.Number(value=-1, label="Seed")
    ],
    outputs=gr.Image(label="Generated Image"),
    title="Kandinsky 3 - Image Generation",
    description="Generate images from multilingual prompts. Running on CLORE.AI."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```
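
When the same generation function is called from a script rather than through the sliders, inputs are not constrained by the UI. The sketch below shows one way to validate them first; `resolve_seed` and `snap_dimension` are hypothetical helpers, and the 512 to 1024 range with a step of 64 simply mirrors the sliders above.

```python
from typing import Optional


def resolve_seed(seed) -> Optional[int]:
    """Return a positive integer seed, or None to request a random result."""
    seed = int(seed)
    return seed if seed > 0 else None


def snap_dimension(value, low: int = 512, high: int = 1024, step: int = 64) -> int:
    """Clamp to [low, high], then round to a multiple of `step` (ties follow round())."""
    clamped = max(low, min(high, value))
    return int(round(clamped / step) * step)


if __name__ == "__main__":
    print(resolve_seed(42.0), resolve_seed(-1))
    print(snap_dimension(700), snap_dimension(2000))
```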

## Memory Optimization

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16
)

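# Option 1: offload whole sub-models to the CPU between uses
pipe.enable_model_cpu_offload()

# Option 2 (very low VRAM, slower): offload layer by layer instead.
# Use it in place of option 1, not together with it.
# pipe.enable_sequential_cpu_offload()

# Enable attention slicing
pipe.enable_attention_slicing()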

image = pipe(
    prompt="A beautiful landscape",
    num_inference_steps=50
).images[0]
```

## Performance

| Model         | Resolution | GPU      | Time |
| ------------- | ---------- | -------- | ---- |
| Kandinsky 3   | 1024x1024  | RTX 3090 | 15s  |
| Kandinsky 3   | 1024x1024  | RTX 4090 | 10s  |
| Kandinsky 2.2 | 768x768    | RTX 3090 | 8s   |
| Kandinsky 2.2 | 768x768    | RTX 4090 | 5s   |
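
Timings like these depend on the GPU, driver, and step count, so it is worth measuring on your own server. A minimal timing sketch follows; `time_call` is a hypothetical helper, and the lambda in the demo is a stand-in for a real pipeline call.

```python
import time


def time_call(fn, repeats: int = 3) -> float:
    """Return the average wall-clock seconds per call over `repeats` runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in workload; replace with e.g. lambda: pipe(prompt="...").images[0]
    print(f"{time_call(lambda: sum(range(100_000))):.4f} s per call")
```

Running the pipeline once before timing avoids counting one-time warm-up work in the average.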

## Troubleshooting

### Out of Memory

**Problem:** CUDA runs out of memory (OOM) during generation

**Solutions:**

* Enable CPU offload
* Lower the resolution
* Use Kandinsky 2.2 instead of 3
* Enable attention slicing

```python
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
```
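
Lowering the resolution can also be automated by retrying at smaller sizes. The sketch below is a generic pattern, not part of diffusers: `generate_with_fallback` is a hypothetical helper, and it catches the built-in `MemoryError` by default so that it runs anywhere.

```python
from typing import Callable, Iterable, Tuple, Type


def generate_with_fallback(
    run: Callable[[int], object],
    sizes: Iterable[int] = (1024, 768, 512),
    oom_error: Type[BaseException] = MemoryError,
) -> Tuple[int, object]:
    """Try each size in order; return (size, result) for the first success."""
    last_error = None
    for size in sizes:
        try:
            return size, run(size)
        except oom_error as err:
            last_error = err  # remember the failure and try a smaller size
    raise RuntimeError("all sizes ran out of memory") from last_error


if __name__ == "__main__":
    def fake_run(size: int) -> str:
        # Simulated workload that only fits at 768 pixels or below.
        if size > 768:
            raise MemoryError("simulated OOM")
        return f"image at {size}x{size}"

    print(generate_with_fallback(fake_run))
```

With recent PyTorch versions you would likely pass `oom_error=torch.cuda.OutOfMemoryError` and call `torch.cuda.empty_cache()` inside `run` before each attempt.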

### Poor Text Rendering

**Problem:** Text in images looks wrong

**Solutions:**

* Kandinsky renders text poorly (as do most diffusion models)
* Add text in post-processing
* Use prompts that avoid text

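### Colors Look Wrong

**Problem:** Image colors are washed out or oversaturated

**Solutions:**

* Adjust the guidance scale (try the 3-6 range)
* Specify color preferences in the prompt
* Apply color correction in post-processing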
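
To compare guidance scales systematically, it helps to generate the candidate values up front and reuse one fixed seed for every run. A small sketch, where `guidance_values` is a hypothetical helper and the 0.5 step is an arbitrary choice:

```python
def guidance_values(start: float = 3.0, stop: float = 6.0, step: float = 0.5) -> list[float]:
    """Return evenly spaced guidance scales from `start` to `stop`, inclusive."""
    # Count the steps first so accumulated float error cannot drop the endpoint.
    count = int(round((stop - start) / step))
    return [round(start + i * step, 4) for i in range(count + 1)]


if __name__ == "__main__":
    for scale in guidance_values():
        print(f"guidance_scale={scale}")
```

Each value can then be passed as `guidance_scale` in a pipeline call.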

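### Slow Generation

**Problem:** Generation takes too long

**Solutions:**

* Reduce the number of inference steps (30 is often enough)
* Use fp16 precision
* Use Kandinsky 2.2 for faster results
* Lower the resolution for previews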

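## Comparison with Other Models

| Feature       | Kandinsky 3 | SDXL      | FLUX    |
| ------------- | ----------- | --------- | ------- |
| Multilingual  | Excellent   | Limited   | Limited |
| Image quality | High        | Very high | Highest |
| Speed         | Medium      | Medium    | Slow    |
| VRAM          | 12GB        | 12GB      | 24GB    |
| Inpainting    | Yes         | Yes       | Limited |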

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
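
Hourly rates and per-image timings combine into a rough cost per image. The sketch below uses an hourly rate of $0.10 and 10 seconds per image purely as example inputs, since both figures appear in the tables on this page; the helper names are hypothetical, and model loading and idle time are ignored.

```python
def images_per_hour(seconds_per_image: float) -> float:
    """How many images fit in one hour of continuous generation."""
    return 3600 / seconds_per_image


def cost_per_image(hourly_rate_usd: float, seconds_per_image: float) -> float:
    """Approximate USD cost of a single image at the given hourly rate."""
    return hourly_rate_usd / images_per_hour(seconds_per_image)


if __name__ == "__main__":
    rate, seconds = 0.10, 10  # example inputs only
    print(f"{images_per_hour(seconds):.0f} images/hour")
    print(f"${cost_per_image(rate, seconds):.5f} per image")
```

With these inputs the estimate works out to roughly $0.0003 per image.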

## Next Steps

* FLUX Generation - highest-quality images
* Stable Diffusion - the most popular option
* [PixArt](https://docs.clore.ai/guides/guides_v2-zh/tu-xiang-sheng-cheng/pixart-image-gen) - fast generation
* ComfyUI - advanced workflows

