CogVideoX Video Generation
Generate 6-second videos from text or images with Zhipu AI's CogVideoX diffusion transformer on Clore.ai GPUs.
CogVideoX is a family of open-weight video diffusion transformers from Zhipu AI (Tsinghua). The models generate coherent 6-second clips at 720×480 resolution and 8 fps from either a text prompt (T2V) or a reference image plus prompt (I2V). Two parameter scales are available — 2B for fast iteration and 5B for higher fidelity — both with native diffusers integration through CogVideoXPipeline.
Running CogVideoX on a rented GPU from Clore.ai lets you skip local hardware constraints and generate video at scale for pennies per clip.
Key Features
Text-to-Video (T2V) — describe a scene and get a 6-second 720×480 clip at 8 fps (49 frames).
Image-to-Video (I2V) — supply a reference image plus prompt; the model animates it with temporal consistency.
Two scales — CogVideoX-2B (fast, ~12 GB VRAM) and CogVideoX-5B (higher quality, ~20 GB VRAM).
Native diffusers support — first-class CogVideoXPipeline and CogVideoXImageToVideoPipeline classes.
3D causal VAE — compresses 49 frames into a compact latent space for efficient denoising.
Open weights — Apache-2.0 license for the 2B variant; research license for 5B.
Requirements
| Requirement | 2B (minimum) | 5B (recommended) |
| --- | --- | --- |
| GPU VRAM | 16 GB (fp16) | 24 GB (bf16) |
| System RAM | 32 GB | 64 GB |
| Disk | 30 GB | 50 GB |
| Python | 3.10+ | 3.11 |
| CUDA | 12.1+ | 12.4 |
Clore.ai GPU recommendation: An RTX 4090 (24 GB, ~$0.5–2/day) handles both the 2B and 5B variants comfortably. An RTX 3090 (24 GB, ~$0.3–1/day) works equally well for 5B at bf16 and is the budget pick.
Quick Start
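A typical environment setup on a fresh Clore.ai container might look like the following. The package list is a reasonable baseline for the diffusers CogVideoX pipelines, not an official requirements file; pin versions as needed.

```shell
# PyTorch with CUDA 12.1 wheels, then the diffusers stack.
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install diffusers transformers accelerate sentencepiece "imageio[ffmpeg]"
```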
Usage Examples
Text-to-Video (5B)
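A minimal text-to-video sketch with the 5B checkpoint. THUDM/CogVideoX-5b is the published Hugging Face repo ID; the prompt and seed are placeholders.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the 5B checkpoint in bf16, the dtype it was trained in.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep idle modules in system RAM
pipe.vae.enable_tiling()         # avoid OOM during the 3D VAE decode

video = pipe(
    prompt="A golden retriever runs through a sunlit meadow, slow motion",
    num_frames=49,               # 6 seconds at 8 fps
    num_inference_steps=50,
    guidance_scale=6.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "t2v_output.mp4", fps=8)
```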
Image-to-Video (5B)
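Image-to-video follows the same pattern with the dedicated I2V checkpoint (THUDM/CogVideoX-5b-I2V on the Hugging Face Hub); the image path and prompt are placeholders.

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

# Reference image to animate: local file or URL (placeholder path).
image = load_image("reference.jpg")

video = pipe(
    prompt="The boat drifts gently as waves lap against the hull",
    image=image,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "i2v_output.mp4", fps=8)
```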
Fast Generation with the 2B Variant
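For fast iteration, the 2B checkpoint (THUDM/CogVideoX-2b) runs in fp16 and fits in roughly 12 GB of VRAM; the prompt is a placeholder.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# fp16 is fine for the 2B checkpoint (unlike 5B, which needs bf16).
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt="Timelapse of clouds rolling over a mountain ridge",
    num_frames=49,
    num_inference_steps=50,  # lower (e.g. 30) for quicker drafts
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "draft.mp4", fps=8)
```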
Tips for Clore.ai Users
Enable VAE tiling — without pipe.vae.enable_tiling() the 3D VAE will OOM on 24 GB cards during decode.
Use enable_model_cpu_offload() — shifts idle modules to RAM automatically; adds ~10% wall time but saves 4+ GB of peak VRAM.
bf16 for 5B, fp16 for 2B — the 5B checkpoint was trained in bf16; using fp16 can cause NaN outputs.
Persist models — mount a Clore.ai persistent volume to /models and set HF_HOME=/models/hf so weights survive container restarts.
Batch overnight — queue long prompt lists with a simple Python loop; Clore.ai billing is per-hour, so saturate the GPU.
SSH + tmux — run generation inside tmux so a dropped connection doesn't kill the process.
Select the right GPU — filter the Clore.ai marketplace for cards with ≥24 GB VRAM; sort by price to find the cheapest RTX 3090 / 4090 available.
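The batch-overnight tip amounts to loading the pipeline once and looping over prompts, so each clip costs only inference time. The prompt list and output naming below are illustrative.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompts = [
    "A lighthouse in a storm, waves crashing against the rocks",
    "Macro shot of a bee landing on a sunflower",
    "Neon-lit city street at night in the rain",
]

# Load once; model download and weight loading are the expensive parts.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

for i, prompt in enumerate(prompts):
    frames = pipe(prompt=prompt, num_frames=49, num_inference_steps=50).frames[0]
    export_to_video(frames, f"clip_{i:03d}.mp4", fps=8)
```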
Troubleshooting
| Symptom | Fix |
| --- | --- |
| OutOfMemoryError during VAE decode | Call pipe.vae.enable_tiling() before inference |
| NaN / black frames with 5B | Switch to torch.bfloat16; fp16 is not supported for the 5B variant |
| ImportError: imageio | pip install "imageio[ffmpeg]" — the ffmpeg plugin is needed for MP4 export |
| Very slow first run | Model download is ~20 GB; subsequent runs use the cached weights |
| CUDA version mismatch | Ensure the PyTorch CUDA build matches the driver: python -c "import torch; print(torch.version.cuda)" |
| Garbled motion / flickering | Increase num_inference_steps to 50; lower guidance_scale to 5.0 |
| Container killed mid-download | Set HF_HOME to a persistent volume and restart — partial downloads resume automatically |