# SkyReels-V3

SkyReels-V3 is an open-source video generation model from Kunlun Tech's Skywork AI, built on top of the Wan2.1 video architecture. It generates smooth 24 fps clips with both text-to-video (T2V) and image-to-video (I2V) capabilities. The model inherits Wan2.1's strong motion coherence and temporal consistency while adding Skywork's training refinements for improved visual quality and prompt adherence.

Running SkyReels-V3 on [Clore.ai](https://clore.ai/) gives you the 24 GB of VRAM it needs for full-quality generation without buying hardware — rent an RTX 4090 for a few dollars and start generating.

## Key Features

* **24 fps output** — the standard cinematic frame rate, smooth out of the box.
* **Text-to-Video** — generate clips from natural language descriptions with strong prompt following.
* **Image-to-Video** — animate a reference image with controllable camera motion and subject movement.
* **Built on Wan2.1** — inherits the proven temporal attention and motion modeling of the Wan architecture.
* **Multi-resolution** — supports generation at 480p and 720p depending on VRAM budget.
* **Open weights** — available under an open license for research and commercial use.
* **Chinese + English** — bilingual prompt support from the Wan2.1 text encoder.

## Requirements

| Component  | Minimum                   | Recommended |
| ---------- | ------------------------- | ----------- |
| GPU VRAM   | 16 GB (480p with offload) | 24 GB       |
| System RAM | 32 GB                     | 64 GB       |
| Disk       | 25 GB                     | 50 GB       |
| Python     | 3.10+                     | 3.11        |
| CUDA       | 12.1+                     | 12.4        |

**Clore.ai GPU recommendation:** An **RTX 4090** (24 GB, \~$0.5–2/day) is the sweet spot — enough VRAM for 720p generation at full precision. An **RTX 3090** (24 GB, \~$0.3–1/day) works for 480p and offers the best price-per-clip ratio on the marketplace.
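If you're unsure which tier a rented card falls into, a small helper can pick a resolution from detected VRAM. This is a sketch — the thresholds come from the requirements table above, and `detected_vram_gib` is a hypothetical name:

```python
def pick_resolution(vram_gib: float) -> tuple[int, int]:
    # Thresholds from the requirements table: 720p wants ~24 GB, 480p fits in 16 GB
    return (1280, 720) if vram_gib >= 24 else (854, 480)

def detected_vram_gib() -> float:
    import torch  # imported lazily so pick_resolution() works without a GPU stack
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 2**30
```

Call `pick_resolution(detected_vram_gib())` at startup to get a `(width, height)` pair you can pass straight into the pipeline.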

## Quick Start

```bash
# Install core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install diffusers transformers accelerate sentencepiece
pip install "imageio[ffmpeg]"  # quotes keep zsh from globbing the brackets

# Verify GPU
python -c "import torch; print(torch.cuda.get_device_name(0))"
```

## Usage Examples

### Text-to-Video

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# SkyReels-V3 uses the Wan2.1 pipeline architecture
pipe = WanPipeline.from_pretrained(
    "SkyworkAI/SkyReels-V3-T2V",
    torch_dtype=torch.bfloat16,
)
# enable_model_cpu_offload() manages device placement itself,
# so don't combine it with pipe.to("cuda")
pipe.enable_model_cpu_offload()

prompt = (
    "A samurai walking through a bamboo forest in morning fog, "
    "sunlight filtering through the tall stalks, cinematic composition, "
    "slow deliberate movement"
)

video_frames = pipe(
    prompt=prompt,
    negative_prompt="blurry, low quality, watermark, static",
    num_frames=97,               # ~4 sec at 24 fps
    width=1280,
    height=720,
    num_inference_steps=30,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(42),
).frames[0]

export_to_video(video_frames, "samurai_forest.mp4", fps=24)
print("Saved samurai_forest.mp4")
```
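Both `num_frames` values used in this guide (97 and 49) have the form 4k + 1, which Wan-style pipelines expect. A small helper — an assumption generalized from those two examples — converts a target duration into a valid frame count:

```python
def frames_for_seconds(seconds: float, fps: int = 24) -> int:
    # Wan-style pipelines take frame counts of the form 4k + 1;
    # 4 s -> 97 frames and 2 s -> 49 frames, matching the examples in this guide.
    k = round(seconds * fps / 4)
    return 4 * k + 1
```

Pass the result as `num_frames=frames_for_seconds(4)` instead of hard-coding the count.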

### Image-to-Video

```python
import torch
from PIL import Image
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video

pipe = WanImageToVideoPipeline.from_pretrained(
    "SkyworkAI/SkyReels-V3-I2V",
    torch_dtype=torch.bfloat16,
)
# enable_model_cpu_offload() manages device placement itself,
# so don't combine it with pipe.to("cuda")
pipe.enable_model_cpu_offload()

image = Image.open("landscape.png").resize((1280, 720))  # naive resize; non-16:9 sources will be distorted

video_frames = pipe(
    prompt="Camera slowly pushes forward into the scene, clouds drift overhead",
    image=image,
    negative_prompt="static, jitter, blurry",
    num_frames=97,
    num_inference_steps=30,
    guidance_scale=5.0,
).frames[0]

export_to_video(video_frames, "landscape_anim.mp4", fps=24)
```
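A plain `resize` like the one above distorts inputs whose aspect ratio isn't 16:9. A cover-crop helper (a sketch: scale until the image covers the target box, then center-crop; assumes Pillow, and `fit_cover` is a hypothetical name) keeps proportions intact:

```python
from PIL import Image

def fit_cover(img: Image.Image, target_w: int = 1280, target_h: int = 720) -> Image.Image:
    """Scale so the image covers the target box, then center-crop to exact size."""
    scale = max(target_w / img.width, target_h / img.height)
    w, h = round(img.width * scale), round(img.height * scale)
    img = img.resize((w, h), Image.LANCZOS)
    left, top = (w - target_w) // 2, (h - target_h) // 2
    return img.crop((left, top, left + target_w, top + target_h))
```

Use `image = fit_cover(Image.open("landscape.png"))` in place of the bare `resize` call.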

### Lower-Resolution Fast Preview

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "SkyworkAI/SkyReels-V3-T2V", torch_dtype=torch.bfloat16
).to("cuda")

# 480p for quick iteration
frames = pipe(
    prompt="Ocean waves crashing on rocks, dramatic spray, sunset",
    num_frames=49,
    width=854,
    height=480,
    num_inference_steps=20,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "waves_preview.mp4", fps=24)
```
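To keep draft and final runs consistent, the two generation profiles used in this guide can live in one helper. This is a sketch — the parameter values are taken directly from the 480p and 720p examples above, and `render_kwargs` is a hypothetical name:

```python
def render_kwargs(preview: bool) -> dict:
    """Generation settings: 480p/20 steps for drafts, 720p/30 steps for finals."""
    if preview:
        return dict(width=854, height=480, num_frames=49, num_inference_steps=20)
    return dict(width=1280, height=720, num_frames=97, num_inference_steps=30)

# usage sketch:
# frames = pipe(prompt=prompt, guidance_scale=5.0, **render_kwargs(preview=True)).frames[0]
```

Flipping a single `preview` flag then switches a vetted prompt from iteration mode to final quality.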

## Tips for Clore.ai Users

1. **Use Wan pipeline classes** — SkyReels-V3 is architecturally based on Wan2.1, so it uses `WanPipeline` / `WanImageToVideoPipeline` from diffusers.
2. **Start at 480p** — iterate on prompts at lower resolution first, then generate final clips at 720p once you're happy with the composition.
3. **CPU offloading** — call `enable_model_cpu_offload()` instead of `pipe.to("cuda")` on 24 GB cards for 720p generation to avoid OOM; it moves submodules to the GPU only while they run.
4. **Persistent storage** — set `HF_HOME=/workspace/hf_cache` on a Clore.ai persistent volume; the model weighs \~15–20 GB.
5. **24 fps native** — do not change the export fps; the model's temporal attention was trained for 24 fps output.
6. **Bilingual prompts** — the Wan2.1 text encoder handles both English and Chinese; you can mix languages if needed.
7. **Guidance scale** — 4.0–6.0 works best. Higher values (>8) can cause oversaturation.
8. **tmux is mandatory** — always run generation in a `tmux` session on Clore.ai to survive SSH disconnects.
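Tips 4 and 8 combined, as a minimal shell sketch (the fallback path and session name are examples; on Clore.ai you'd point `HF_HOME` at a persistent volume such as `/workspace/hf_cache`, and `generate.py` stands in for your own script):

```shell
# Keep HuggingFace weights on storage that survives restarts
export HF_HOME="${HF_HOME:-$PWD/hf_cache}"
mkdir -p "$HF_HOME"

# Run generation inside tmux so SSH disconnects don't kill it;
# reattach later with: tmux attach -t skyreels
command -v tmux >/dev/null 2>&1 && tmux new-session -d -s skyreels 'python generate.py' || true
```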

## Troubleshooting

| Problem                        | Fix                                                                                                                    |
| ------------------------------ | ---------------------------------------------------------------------------------------------------------------------- |
| `OutOfMemoryError` at 720p     | Enable `pipe.enable_model_cpu_offload()`; reduce to 480p if still OOM                                                  |
| Model not found on HuggingFace | Check exact repo name on [SkyworkAI HF page](https://huggingface.co/SkyworkAI) — it may be listed under a variant name |
| Jittery or flickering motion   | Increase `num_inference_steps` to 40; reduce `guidance_scale` to 4.0                                                   |
| Slow generation                | \~1–3 min per 4-sec clip on RTX 4090 is normal for 720p; 480p is roughly 2× faster                                     |
| Color shift / oversaturation   | Lower `guidance_scale` to 4.0–5.0                                                                                      |
| `ImportError: imageio`         | `pip install "imageio[ffmpeg]"`                                                                                        |
| Weights re-download on restart | Mount persistent storage and set `HF_HOME` environment variable                                                        |
