# LTX-Video Real-Time Generation

LTX-Video by Lightricks is one of the fastest open-source video generation models available. On an RTX 4090 it produces a 5-second 768×512 clip in roughly 4 seconds, faster than real-time playback. The model supports both text-to-video (T2V) and image-to-video (I2V) workflows through native `diffusers` integration via `LTXPipeline` and `LTXImageToVideoPipeline`.

Renting a GPU on [Clore.ai](https://clore.ai/) gives you instant access to the hardware LTX-Video needs, with no upfront investment and per-hour billing.

## Key Features

* **Faster than real-time** — 5-second video generated in \~4 seconds on an RTX 4090.
* **Text-to-Video** — produce clips from natural language descriptions.
* **Image-to-Video** — animate a static reference image with motion and camera control.
* **Lightweight architecture** — 2B parameter video DiT with a compact latent space.
* **Native diffusers** — `LTXPipeline` and `LTXImageToVideoPipeline` in `diffusers >= 0.32`.
* **Open weights** — Apache-2.0 license; fully commercial use permitted.
* **Temporal VAE** — 1:192 compression ratio across space and time; efficient decoding.

## Requirements

| Component  | Minimum | Recommended |
| ---------- | ------- | ----------- |
| GPU VRAM   | 16 GB   | 24 GB       |
| System RAM | 16 GB   | 32 GB       |
| Disk       | 15 GB   | 30 GB       |
| Python     | 3.10+   | 3.11        |
| CUDA       | 12.1+   | 12.4        |
| diffusers  | 0.32+   | latest      |

**Clore.ai GPU recommendation:** An **RTX 4090** (24 GB, \~$0.5–2/day) is ideal for maximum throughput. An **RTX 3090** (24 GB, \~$0.3–1/day) still runs faster than many competing models at a fraction of the cost.
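Once an instance is up, a quick check confirms the card actually meets the VRAM minimum from the table above before you start downloading weights (`vram_ok` is an illustrative helper name, not part of any library):

```python
import torch

def vram_ok(total_memory_bytes: int, minimum_gb: float = 16.0) -> bool:
    """True if a device's total memory meets the minimum from the table above."""
    return total_memory_bytes / 1024**3 >= minimum_gb

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
    assert vram_ok(props.total_memory), "Below the 16 GB minimum for LTX-Video"
else:
    print("No CUDA device visible — check the rental's driver setup")
```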

## Quick Start

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install diffusers transformers accelerate sentencepiece "imageio[ffmpeg]"

python -c "import torch; print(torch.cuda.get_device_name(0))"
```
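Because `LTXPipeline` only landed in diffusers 0.32, it is worth verifying the installed version before loading the model. A minimal check might look like this (`version_tuple` is an illustrative helper, not a diffusers API):

```python
def version_tuple(v: str) -> tuple:
    """Parse a version string like '0.32.1' into a comparable tuple of ints."""
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

try:
    import diffusers
    assert version_tuple(diffusers.__version__) >= (0, 32), (
        f"diffusers {diffusers.__version__} is too old for LTXPipeline; "
        "run: pip install -U diffusers"
    )
    print(f"diffusers {diffusers.__version__} OK")
except ImportError:
    print("diffusers is not installed — run the pip install above first")
```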

## Usage Examples

### Text-to-Video

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = (
    "A drone shot gliding over a turquoise coral reef, "
    "schools of tropical fish darting below, golden hour light "
    "refracting through the water surface"
)

video_frames = pipe(
    prompt=prompt,
    negative_prompt="blurry, low quality, distorted",
    num_frames=121,               # ~5 sec at 24 fps
    width=768,
    height=512,
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=torch.Generator("cuda").manual_seed(0),
).frames[0]

export_to_video(video_frames, "coral_reef.mp4", fps=24)
print("Saved coral_reef.mp4")
```

### Image-to-Video

```python
import torch
from PIL import Image
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video

pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = Image.open("cityscape.png").resize((768, 512))  # note: stretches if the source isn't 3:2

video_frames = pipe(
    prompt="Camera slowly pans right, city lights flicker on at dusk",
    negative_prompt="static, blurry",
    image=image,
    num_frames=121,
    num_inference_steps=30,
    guidance_scale=7.5,
).frames[0]

export_to_video(video_frames, "cityscape_animated.mp4", fps=24)
```
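The `.resize((768, 512))` call above stretches the image whenever its aspect ratio isn't 3:2. Pillow's `ImageOps.fit` scales and center-crops instead, which usually makes a cleaner conditioning frame (`fit_to_target` is an illustrative wrapper name):

```python
from PIL import Image, ImageOps

def fit_to_target(image: Image.Image, size: tuple = (768, 512)) -> Image.Image:
    """Scale and center-crop an image to `size` without distorting it."""
    return ImageOps.fit(image, size, method=Image.LANCZOS)
```

Swap it in with `image = fit_to_target(Image.open("cityscape.png"))` and the rest of the pipeline call stays the same.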

### Batch Generation Script

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

prompts = [
    "A cat stretching on a sunlit windowsill, dust motes floating",
    "Aerial view of waves crashing on black volcanic sand",
    "Time-lapse of storm clouds rolling over a prairie",
]

for i, prompt in enumerate(prompts):
    frames = pipe(
        prompt=prompt,
        num_frames=121,
        width=768,
        height=512,
        num_inference_steps=30,
        guidance_scale=7.5,
    ).frames[0]
    export_to_video(frames, f"batch_{i:03d}.mp4", fps=24)
    print(f"[{i+1}/{len(prompts)}] Done")
```

## Tips for Clore.ai Users

1. **Speed benchmark** — on an RTX 4090, LTX-Video generates 121 frames in \~4 seconds; use this as a sanity check that your rental is performing correctly.
2. **bf16 precision** — the checkpoint is trained in bf16; do not switch to fp16 or you risk quality degradation.
3. **Cache weights** — set `HF_HOME=/workspace/hf_cache` on a persistent volume. The model is \~6 GB; re-downloading on every container start wastes time.
4. **Prompt engineering** — LTX-Video responds well to cinematic language: "drone shot", "slow motion", "golden hour", "tracking shot". Be specific about camera motion.
5. **Batch overnight** — LTX-Video is fast enough to generate hundreds of clips per hour on a 4090. Queue prompts from a file and let it run.
6. **SSH + tmux** — always run generation inside a `tmux` session so dropped connections don't interrupt long batch jobs.
7. **Monitor VRAM** — run `watch -n1 nvidia-smi` in a second terminal to confirm memory usage stays comfortably below the card's limit.
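Tip 5 (queue prompts from a file) can be sketched as a thin wrapper around the batch script above. The filename `prompts.txt` and the helper names are illustrative; the format assumed here is one prompt per line, with blank lines and `#` comments ignored:

```python
from pathlib import Path

def load_prompts(path: str) -> list:
    """Read one prompt per line, skipping blank lines and '#' comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [
        ln.strip() for ln in lines
        if ln.strip() and not ln.strip().startswith("#")
    ]

def run_queue(prompt_file: str = "prompts.txt") -> None:
    """Generate one clip per queued prompt, numbered in order."""
    import torch
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    pipe = LTXPipeline.from_pretrained(
        "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
    ).to("cuda")

    prompts = load_prompts(prompt_file)
    for i, prompt in enumerate(prompts):
        frames = pipe(
            prompt=prompt,
            num_frames=121,
            width=768,
            height=512,
            num_inference_steps=30,
        ).frames[0]
        export_to_video(frames, f"queue_{i:03d}.mp4", fps=24)
        print(f"[{i + 1}/{len(prompts)}] {prompt[:60]}")
```

Call `run_queue()` from a Python session inside your `tmux` window; the heavy imports are deferred into the function so `load_prompts` can be reused on its own.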

## Troubleshooting

| Problem                      | Fix                                                                             |
| ---------------------------- | ------------------------------------------------------------------------------- |
| `OutOfMemoryError`           | Reduce `num_frames` to 81 or `width`/`height` to 512×320                        |
| Model not found in diffusers | Upgrade: `pip install -U diffusers` — LTXPipeline requires diffusers ≥ 0.32     |
| Black or static output       | Ensure you pass a `negative_prompt`; increase `guidance_scale` to 8–9           |
| `ImportError: imageio`       | `pip install imageio[ffmpeg]` — ffmpeg backend needed for MP4 export            |
| Slow first inference         | First run compiles CUDA kernels and downloads weights; subsequent runs are fast |
| Color banding artifacts      | Use `torch.bfloat16` (not float16); bfloat16 has wider dynamic range            |
| Container restarted mid-job  | Set `HF_HOME` to persistent storage; partial HF downloads auto-resume           |
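For the `OutOfMemoryError` row, diffusers' built-in offloading is worth trying before sacrificing resolution: `enable_model_cpu_offload()` and VAE tiling are standard diffusers memory optimizations. Note that the frame counts used throughout this guide have the form 8k + 1 (121 = 15·8 + 1, 81 = 10·8 + 1), matching the temporal VAE's compression factor of 8, so a helper that snaps to that grid is handy when shrinking clips (`nearest_valid_num_frames` is an illustrative name):

```python
import torch

def nearest_valid_num_frames(n: int) -> int:
    """Snap a frame count to the 8k + 1 form the temporal VAE expects."""
    return max(9, round((n - 1) / 8) * 8 + 1)

if torch.cuda.is_available():
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    pipe = LTXPipeline.from_pretrained(
        "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # stream weights to GPU instead of pipe.to("cuda")
    pipe.vae.enable_tiling()         # decode latents in tiles to cap peak VRAM

    frames = pipe(
        prompt="Aerial view of waves crashing on black volcanic sand",
        num_frames=nearest_valid_num_frames(81),  # shorter clip, less VRAM
        width=512,
        height=320,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(frames, "low_vram.mp4", fps=24)
```

Offloading trades some speed for headroom, so on a 24 GB card keep plain `pipe.to("cuda")` and reach for this only on smaller GPUs.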
