Kohya Training

Train LoRA and DreamBooth for Stable Diffusion with Kohya on Clore.ai

Train LoRA, DreamBooth, and full fine-tunes for Stable Diffusion with Kohya's trainer.

All examples can be run on GPU servers rented through the CLORE.AI Marketplace.

Renting on CLORE.AI

Visit the CLORE.AI Marketplace
Filter by GPU type, VRAM, and price
Choose On-Demand (fixed price) or Spot (bid price)
Configure your order:
- Select a Docker image
- Set ports (TCP for SSH, HTTP for web UIs)
- Add environment variables if needed
- Enter the startup command
Select a payment method: CLORE, BTC, or USDT/USDC
Create the order and wait for deployment

Accessing Your Server

Find the connection details under My Orders
Web interfaces: use the HTTP port URL
SSH: ssh -p <port> root@<proxy-address>

What is Kohya?

Kohya_ss is a training toolkit for:

LoRA - Lightweight adapters (most popular)
DreamBooth - Subject/style training
Full fine-tuning - Training the complete model
LyCORIS - Extended LoRA variants

Requirements

Training type

Min. VRAM

Quick Deployment

Docker image:

pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel

Ports:

22/tcp
7860/http

Command:

apt-get update && apt-get install -y git libgl1 libglib2.0-0 && \
cd /workspace && \
git clone --recursive https://github.com/bmaltais/kohya_ss.git && \
cd kohya_ss && \
pip install -r requirements.txt && \
pip install xformers && \
python kohya_gui.py --listen 0.0.0.0 --server_port 7860

Accessing Your Service

After deployment, find your http_pub URL under My Orders:

Go to the My Orders page
Click your order
Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in the examples below.

Using the Web Interface

Open http://<proxy>:<port>
Select the training type (LoRA, DreamBooth, etc.)
Configure the settings
Start training

Dataset Preparation

Folder Structure

/workspace/dataset/
├── 10_mysubject/           # <repeats>_<concept name>
│   ├── image1.png
│   ├── image1.txt          # Caption file
│   ├── image2.png
│   └── image2.txt
└── 10_regularization/      # Optional regularization images
    ├── reg1.png
    └── reg1.txt
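
The numeric prefix of each subfolder tells the trainer how often to repeat that folder's images per epoch. The sketch below is a hypothetical helper, not part of kohya_ss: it parses names of the form `<repeats>_<concept>` and estimates how many image presentations one epoch contains (images x repeats). The function names and the set of accepted image suffixes are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed set of image suffixes; adjust to match your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def parse_concept_dir(name: str) -> tuple[int, str]:
    """Split a folder name such as '10_mysubject' into (10, 'mysubject')."""
    match = re.fullmatch(r"(\d+)_(.+)", name)
    if match is None:
        raise ValueError(f"expected '<repeats>_<concept>', got {name!r}")
    return int(match.group(1)), match.group(2)


def samples_per_epoch(dataset_root: str) -> int:
    """Sum images x repeats over every concept folder under dataset_root."""
    total = 0
    for folder in Path(dataset_root).iterdir():
        if not folder.is_dir():
            continue
        repeats, _concept = parse_concept_dir(folder.name)
        images = [p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
        total += repeats * len(images)
    return total
```

Calling `samples_per_epoch("/workspace/dataset")` before a run gives a quick sanity check that the repeat counts and image counts are what you intended. Every subfolder under the root is counted, and a folder whose name lacks the numeric prefix raises a `ValueError`.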

Image Requirements

Resolution: 512x512 (SD 1.5) or 1024x1024 (SDXL)
Format: PNG or JPG
Count: 10–50 images for LoRA
Quality: sharp, well lit, varied angles

Caption Files

Create a .txt file with the same base name as each image:

myimage.txt:

a photo of sks person, professional portrait, studio lighting, high quality
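
Because every image needs a caption file with the same base name, it can help to check for gaps before training. The following is a minimal sketch using only the standard library; `images_missing_captions` is a hypothetical helper name and the accepted suffixes are an assumption.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to match your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def images_missing_captions(folder: str) -> list[str]:
    """Return the sorted names of images in folder that lack a same-named .txt file."""
    missing = []
    for path in Path(folder).iterdir():
        is_image = path.suffix.lower() in IMAGE_SUFFIXES
        if is_image and not path.with_suffix(".txt").exists():
            missing.append(path.name)
    return sorted(missing)
```

An empty list means every image in that folder has a caption.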

Automatic Captioning

Use BLIP to generate captions automatically:

from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import os

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to("cuda")

# Caption every PNG/JPG in ./images and write the text file next to the image
for img_file in os.listdir("./images"):
    if img_file.lower().endswith(('.png', '.jpg')):
        # Convert to RGB so RGBA or grayscale files do not break the processor
        image = Image.open(f"./images/{img_file}").convert("RGB")
        inputs = processor(image, return_tensors="pt").to("cuda")
        output = model.generate(**inputs, max_new_tokens=50)
        caption = processor.decode(output[0], skip_special_tokens=True)

        txt_file = img_file.rsplit('.', 1)[0] + '.txt'
        with open(f"./images/{txt_file}", 'w') as f:
            f.write(caption)
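
Automatically generated captions describe the image but will not contain the trigger phrase used in this guide's prompts (`sks person`). One possible approach, sketched here as an assumption and not as something kohya_ss or BLIP does for you, is to prefix each caption with the trigger phrase. `add_trigger` is a hypothetical helper.

```python
def add_trigger(caption: str, trigger: str = "sks person") -> str:
    """Prefix caption with the trigger phrase unless it already contains it."""
    caption = caption.strip()
    if trigger.lower() in caption.lower():
        return caption
    # An empty caption becomes just the trigger phrase.
    return f"{trigger}, {caption}" if caption else trigger


print(add_trigger("a man standing in a park"))
```

Applying this to the text returned by the captioning model, before writing the .txt file, keeps the trigger phrase consistent across the dataset.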

LoRA Training (SD 1.5)

Configuration

In the Kohya interface:

| Setting | Value |
|---|---|
| Model | runwayml/stable-diffusion-v1-5 |
| Network rank | 32-128 |
| Network alpha | 16-64 |
| Learning rate | 1e-4 |
| Batch size | 1-4 |
| Epochs | 10-20 |
| Optimizer | AdamW8bit |

Training from the Command Line

accelerate launch --num_cpu_threads_per_process=2 train_network.py \
    --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
    --train_data_dir="/workspace/dataset" \
    --output_dir="/workspace/output" \
    --output_name="my_lora" \
    --resolution=512 \
    --train_batch_size=1 \
    --max_train_epochs=10 \
    --learning_rate=1e-4 \
    --network_module=networks.lora \
    --network_dim=32 \
    --network_alpha=16 \
    --mixed_precision=fp16 \
    --save_precision=fp16 \
    --optimizer_type=AdamW8bit \
    --lr_scheduler=cosine \
    --cache_latents \
    --xformers \
    --save_every_n_epochs=2
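
The command above selects `--lr_scheduler=cosine`. To give a feel for what that does to the learning rate over a run, here is a minimal sketch of the textbook formula (optional linear warmup followed by cosine decay to zero). It is not copied from kohya's or any library's scheduler, and the function name and warmup handling are assumptions.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-4, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay from base_lr to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, total_steps=1000):.2e}")
```

The rate starts at the configured `learning_rate`, passes through half of it at the midpoint, and approaches zero at the final step.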

LoRA Training (SDXL)

accelerate launch sdxl_train_network.py \
    --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
    --train_data_dir="/workspace/dataset" \
    --output_dir="/workspace/output" \
    --output_name="my_sdxl_lora" \
    --resolution=1024 \
    --train_batch_size=1 \
    --max_train_epochs=10 \
    --learning_rate=1e-4 \
    --network_module=networks.lora \
    --network_dim=32 \
    --network_alpha=16 \
    --mixed_precision=bf16 \
    --save_precision=fp16 \
    --optimizer_type=Adafactor \
    --cache_latents \
    --xformers \
    --save_every_n_epochs=2

DreamBooth Training

Note: the flag names in this section (--instance_data_dir, --instance_prompt, --with_prior_preservation) are those of the Hugging Face diffusers example script train_dreambooth.py. Kohya's own DreamBooth script, train_db.py, takes --train_data_dir and --reg_data_dir instead.

Subject Training

accelerate launch train_dreambooth.py \
    --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
    --instance_data_dir="/workspace/dataset/instance" \
    --class_data_dir="/workspace/dataset/class" \
    --output_dir="/workspace/output" \
    --instance_prompt="a photo of sks person" \
    --class_prompt="a photo of person" \
    --with_prior_preservation \
    --prior_loss_weight=1.0 \
    --num_class_images=200 \
    --resolution=512 \
    --train_batch_size=1 \
    --learning_rate=2e-6 \
    --max_train_steps=1000 \
    --mixed_precision=fp16 \
    --gradient_checkpointing

Style Training

accelerate launch train_dreambooth.py \
    --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
    --instance_data_dir="/workspace/dataset/style" \
    --output_dir="/workspace/output" \
    --instance_prompt="painting in the style of xyz" \
    --resolution=512 \
    --train_batch_size=1 \
    --learning_rate=5e-6 \
    --max_train_steps=2000 \
    --mixed_precision=fp16

Training Tips

Optimal Settings

| Parameter | Person/Character | Style | Object |
|---|---|---|---|
| Network rank | 64-128 | 32-64 | |
| Network alpha | 32-64 | 16-32 | |
| Learning rate | 1e-4 | 5e-5 | 1e-4 |
| Epochs | 15-25 | 10-15 | |

Avoiding Overfitting

Use regularization images
Lower the learning rate
Train for fewer epochs
Lower the network alpha (the LoRA update is scaled by alpha / rank, so a smaller alpha weakens it)

Avoiding Underfitting

Add more training images
Raise the learning rate
Train for more epochs
Raise the network alpha
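
Network alpha and network rank interact: in standard LoRA the learned update is multiplied by alpha / rank, so the ratio matters more than either number alone. The short sketch below just computes that ratio for a few of the settings used in this guide; `lora_scale` is a hypothetical helper name.

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Multiplier applied to the LoRA update: alpha / rank."""
    if network_dim <= 0:
        raise ValueError("network_dim must be positive")
    return network_alpha / network_dim


# (network_dim, network_alpha) pairs taken from the examples in this guide
for dim, alpha in [(32, 16), (64, 32), (16, 16)]:
    print(f"dim={dim} alpha={alpha} scale={lora_scale(alpha, dim):.2f}")
```

A ratio of 1.0 applies the update at full strength, while 0.5 halves it, which behaves roughly like a lower learning rate.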

Monitoring Training

TensorBoard

tensorboard --logdir /workspace/output/logs --port 6006 --bind_all

Key Metrics

loss - Should decrease and then level off
lr - Learning rate schedule
epoch - Training progress
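
Per-step loss in diffusion training is noisy, so "decreasing, then level" is easier to judge on a smoothed curve. The sketch below shows one simple way to do that with a trailing moving average over loss values you have exported yourself (for example, per-epoch averages). The helper names, window size, and tolerance are all assumptions and are not settings of TensorBoard or kohya_ss.

```python
def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; early entries average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def has_plateaued(values: list[float], window: int = 3, tolerance: float = 0.01) -> bool:
    """True when the last two smoothed values differ by less than tolerance."""
    smoothed = moving_average(values, window)
    return len(smoothed) >= 2 and abs(smoothed[-1] - smoothed[-2]) < tolerance


epoch_losses = [0.5, 0.3, 0.2, 0.2, 0.2, 0.2]  # made-up numbers for illustration
print(moving_average(epoch_losses), has_plateaued(epoch_losses))
```

A plateau in the loss is only a hint. Sample images from the saved checkpoints are usually the better guide for picking an epoch.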

Testing Your LoRA

With Automatic1111

Copy the LoRA to:

stable-diffusion-webui/models/Lora/my_lora.safetensors

Use it in the prompt:

<lora:my_lora:0.8> a photo of sks person

With ComfyUI

Add a Load LoRA node and connect it to the model.

With Diffusers

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("/workspace/output/my_lora.safetensors")

image = pipe("a photo of sks person, professional portrait").images[0]

Advanced Training

LyCORIS (LoHa, LoKR)

accelerate launch train_network.py \
    --network_module=lycoris.kohya \
    --network_args "algo=loha" "conv_dim=4" "conv_alpha=2" \
    ...

Textual Inversion

accelerate launch train_textual_inversion.py \
    --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
    --train_data_dir="/workspace/dataset" \
    --learnable_property="style" \
    --placeholder_token="<my-style>" \
    --initializer_token="art" \
    --resolution=512 \
    --train_batch_size=1 \
    --max_train_steps=3000 \
    --learning_rate=5e-4

Saving & Exporting

Download the Trained Model

scp -P <port> root@<proxy>:/workspace/output/my_lora.safetensors ./

Convert Formats

# SafeTensors to PyTorch
from safetensors.torch import load_file, save_file
import torch

state_dict = load_file("model.safetensors")
torch.save(state_dict, "model.pt")

Cost Estimate

Typical CLORE.AI Marketplace rates (as of 2024):

| GPU | Hourly rate | Daily rate | 4-hour session |
|---|---|---|---|
| RTX 3060 | ~$0.03 | ~$0.70 | ~$0.12 |
| RTX 3090 | ~$0.06 | ~$1.50 | ~$0.25 |
| RTX 4090 | ~$0.10 | ~$2.30 | ~$0.40 |
| A100 40GB | ~$0.17 | ~$4.00 | ~$0.70 |
| A100 80GB | ~$0.25 | ~$6.00 | ~$1.00 |

Prices vary by provider and demand. Check the CLORE.AI Marketplace for current prices.

Saving money:

Use the Spot market for flexible workloads (often 30–50% cheaper)
Pay with the CLORE token
Compare prices across providers
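
The session costs above are simply hourly rate times hours, and the Spot saving is a percentage off that. A small sketch of the arithmetic follows, using the approximate 2024 rates quoted in this section. `session_cost` is a hypothetical helper and the discount value is an assumption you would replace with the actual Spot price.

```python
def session_cost(hourly_rate_usd: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimated cost in USD; spot_discount is a fraction such as 0.3 for 30%."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return round(hourly_rate_usd * hours * (1.0 - spot_discount), 2)


# RTX 4090 at roughly $0.10/hour for a 4-hour session, on-demand vs. 50% cheaper Spot
print(session_cost(0.10, 4))
print(session_cost(0.10, 4, spot_discount=0.5))
```

The on-demand figure reproduces the roughly $0.40 shown for a 4-hour RTX 4090 session.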

FLUX LoRA Training

Train LoRA adapters for FLUX.1-dev and FLUX.1-schnell, the latest generation of diffusion transformer models with superior image quality.

VRAM Requirements

| Model | Min. VRAM | Recommended GPU |
|---|---|---|
| FLUX.1-schnell | 16GB | RTX 4080 / 3090 |
| FLUX.1-dev | 24GB | RTX 4090 |
| FLUX.1-dev (bf16) | 40GB+ | A100 40GB |

Note: FLUX uses the DiT (Diffusion Transformer) architecture, so its training dynamics differ significantly from SD 1.5 / SDXL.

Installation for FLUX

Install PyTorch with CUDA 12.4 support:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install xformers --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install accelerate sentencepiece protobuf

FLUX LoRA Configuration (flux_lora.toml)

[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1

[datasets]
[[datasets.subsets]]
image_dir = "/workspace/dataset/train"
caption_extension = ".txt"
num_repeats = 5
resolution = [512, 512]

[training]
pretrained_model_name_or_path = "black-forest-labs/FLUX.1-dev"
output_dir = "/workspace/output"
output_name = "my_flux_lora"

# FLUX-specific: use bf16 (NOT fp16; FLUX requires bf16)
mixed_precision = "bf16"
save_precision = "bf16"
full_bf16 = true

train_batch_size = 1
max_train_epochs = 20
gradient_checkpointing = true
gradient_accumulation_steps = 4

# FLUX LoRA parameters: lower LR than for SDXL!
learning_rate = 1e-4
lr_scheduler = "cosine_with_restarts"
lr_warmup_steps = 100

# Network configuration
network_module = "networks.lora_flux"
network_dim = 16           # FLUX: a smaller dim works well (16-64)
network_alpha = 16         # Set equal to network_dim

# FLUX-specific options
t5xxl_max_token_length = 512
apply_t5_attn_mask = true

# Optimizer: Adafactor works well for FLUX
optimizer_type = "adafactor"
optimizer_args = ["scale_parameter=False", "relative_step=False", "warmup_init=False"]

# Memory savings
cache_latents = true
cache_latents_to_disk = true
cache_text_encoder_outputs = true
cache_text_encoder_outputs_to_disk = true

# Sampling during training (optional preview)
sample_every_n_epochs = 5
sample_prompts = "/workspace/sample_prompts.txt"
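
With `train_batch_size = 1` and `gradient_accumulation_steps = 4`, the optimizer sees an effective batch of 4 images per update. The sketch below estimates how many optimizer updates a run with this configuration performs, which helps put `lr_warmup_steps = 100` in proportion. It is an approximation under stated assumptions: the dataset size of 20 images is made up, `optimizer_steps` is a hypothetical helper, and kohya's own step counter (which also accounts for bucketing) may report a different number.

```python
import math


def optimizer_steps(num_images: int, num_repeats: int, epochs: int,
                    train_batch_size: int = 1, gradient_accumulation_steps: int = 1) -> int:
    """Approximate number of optimizer updates over the whole run."""
    effective_batch = train_batch_size * gradient_accumulation_steps
    updates_per_epoch = math.ceil(num_images * num_repeats / effective_batch)
    return updates_per_epoch * epochs


# Values from the config above, with an assumed dataset of 20 images
total = optimizer_steps(num_images=20, num_repeats=5, epochs=20,
                        train_batch_size=1, gradient_accumulation_steps=4)
print(total, f"warmup share: {100 / total:.0%}")
```

For a small dataset, 100 warmup steps can be a sizeable share of the run, so it is worth checking this ratio when you change the number of images or epochs.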

FLUX LoRA Training Command

# Single GPU
accelerate launch flux_train_network.py \
    --config_file flux_lora.toml \
    --network_module networks.lora_flux \
    --network_dim 16 \
    --network_alpha 16 \
    --mixed_precision bf16 \
    --full_bf16

# With explicit parameters (no TOML file)
accelerate launch flux_train_network.py \
    --pretrained_model_name_or_path "black-forest-labs/FLUX.1-dev" \
    --train_data_dir "/workspace/dataset" \
    --output_dir "/workspace/output" \
    --output_name "my_flux_lora" \
    --network_module networks.lora_flux \
    --network_dim 16 \
    --network_alpha 16 \
    --learning_rate 1e-4 \
    --max_train_epochs 20 \
    --train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --mixed_precision bf16 \
    --full_bf16 \
    --optimizer_type adafactor \
    --cache_latents \
    --cache_text_encoder_outputs \
    --t5xxl_max_token_length 512 \
    --apply_t5_attn_mask \
    --save_every_n_epochs 5

FLUX vs SDXL: Key Differences

| Parameter | SDXL | FLUX.1 |
|---|---|---|
| Learning rate | 1e-3 to 1e-4 | 1e-4 to 5e-5 |
| Precision | fp16 or bf16 | bf16 REQUIRED |
| Network module | networks.lora | networks.lora_flux |
| Network dim | 32–128 | 8–64 (smaller) |
| Optimizer | AdamW8bit | Adafactor |
| Min. VRAM | 12GB | 16–24GB |
| Architecture | U-Net | DiT (Transformer) |

Learning Rate Guide for FLUX

# Conservative (safer, lower risk of overfitting)
learning_rate = 5e-5

# Default (good starting point)
learning_rate = 1e-4

# Aggressive (more expressive, risk of artifacts)
learning_rate = 2e-4

Tip: FLUX is more sensitive to the learning rate than SDXL. Start at 1e-4 and reduce to 5e-5 if you see quality problems. For SDXL, 1e-3 is common; avoid it for FLUX.

Testing a FLUX LoRA

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load your trained LoRA
pipe.load_lora_weights("/workspace/output/my_flux_lora.safetensors")

image = pipe(
    prompt="a photo of sks person, professional portrait, studio lighting",
    num_inference_steps=28,
    guidance_scale=3.5,
    width=1024,
    height=1024,
).images[0]

image.save("flux_lora_test.png")

Troubleshooting

OOM Errors

Reduce the batch size to 1
Enable gradient checkpointing
Use an 8-bit optimizer
Lower the resolution

Poor Results

Use more or better training images
Adjust the learning rate
Check that captions match their images
Try a different network rank

Training Crashes

Check the CUDA version
Update xformers
Reduce the batch size
Check free disk space

FLUX-Specific Problems

"bf16 not supported": use an Ampere-or-newer GPU (A-series data center cards or RTX 30/40 series)
OOM with FLUX.1-dev: switch to FLUX.1-schnell (needs 16GB) or enable cache_text_encoder_outputs
Blurry results: increase network_dim to 32–64 and lower the learning rate to 5e-5
NaN loss: disable full_bf16 and check your dataset for corrupted images
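
The bf16 requirement can be checked before renting or launching a long run. On NVIDIA hardware, native bf16 support begins with compute capability 8.0 (Ampere). The sketch below encodes just that rule. `supports_bf16` is a hypothetical helper, and the capability tuple has to come from elsewhere, for example from `torch.cuda.get_device_capability()` on the server.

```python
def supports_bf16(compute_capability: tuple[int, int]) -> bool:
    """True for NVIDIA compute capability 8.0 (Ampere) or newer."""
    major, _minor = compute_capability
    return major >= 8


# Compute capabilities of a few common GPUs
for name, capability in [("RTX 2080 Ti", (7, 5)), ("A100", (8, 0)),
                         ("RTX 3090", (8, 6)), ("RTX 4090", (8, 9))]:
    print(name, supports_bf16(capability))
```

With PyTorch installed on the rented server, `torch.cuda.is_bf16_supported()` gives a direct answer for the attached GPU.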

