AlphaFold2 Protein Prediction

Predict protein structures with Nobel Prize-winning AI — powered by GPU acceleration on Clore.ai

AlphaFold2, developed by DeepMind, revolutionized structural biology by predicting protein 3D structures with near-atomic accuracy for many targets. Its predictions now cover over 200 million protein sequences, and its developers Demis Hassabis and John Jumper shared the 2024 Nobel Prize in Chemistry. Running AlphaFold2 requires significant GPU memory and compute; Clore.ai provides affordable access to the high-end GPUs needed.

GitHub: google-deepmind/alphafold (13K+ ⭐)


Prerequisites

  • A Clore.ai account with sufficient balance

  • Basic familiarity with the Linux command line

  • Your target protein sequence(s) in FASTA format

  • ~2.5TB disk space for the full genetic databases (or use reduced databases for testing)


Why Run AlphaFold2 on Clore.ai?

AlphaFold2 benefits enormously from GPU acceleration:

| Hardware | Prediction Time (typical protein, ~400 aa) |
|---|---|
| CPU only | 6–24+ hours |
| Single A100 80GB | 15–45 minutes |
| Single RTX 4090 | 20–60 minutes |
| Single RTX 3090 | 30–90 minutes |

Clore.ai offers A100, RTX 4090, and RTX 3090 nodes at a fraction of cloud provider costs, making large-scale proteomics studies accessible.


Step 1 — Choose Your GPU Rental on Clore.ai

Recommended GPUs for AlphaFold2:

  • A100 80GB — Best for large proteins (>700 aa) and multimer prediction

  • RTX 4090 24GB — Great for standard monomers (<500 aa)

  • RTX 3090 24GB — Cost-effective for smaller proteins

For multimer prediction, 40GB+ VRAM is strongly recommended.

  1. Log in to clore.ai and go to Marketplace

  2. Filter by GPU model (A100 or RTX 4090 recommended)

  3. Ensure the server has at least 100GB disk space (or 2.5TB for full databases)

  4. Select a server and click Rent


Step 2 — Configure Your Deployment

When setting up your rental order, use the following configuration:

Docker Image:

Ports to expose:

Environment Variables:

Minimum Resources:

  • CPU: 8 cores

  • RAM: 32GB (64GB recommended for large proteins)

  • Disk: 100GB minimum (2.5TB for full databases)
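
Once connected to a rented server, the CPU and RAM minimums above can be checked programmatically. The following is a minimal sketch, not part of AlphaFold; the function names are illustrative, and the RAM lookup assumes a Linux host (it returns `None` where `os.sysconf` is unavailable).

```python
import os

# Minimums taken from the list above.
MIN_CPU_CORES = 8
MIN_RAM_GB = 32


def total_ram_gb():
    """Total physical RAM in GB (10**9 bytes); None if it cannot be determined."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (ValueError, OSError, AttributeError):
        return None


def check_resources(cpu_cores, ram_gb):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores < {MIN_CPU_CORES}")
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.0f}GB < {MIN_RAM_GB}GB")
    return problems


issues = check_resources(os.cpu_count() or 0, total_ram_gb())
print("minimums met" if not issues else "\n".join(issues))
```

Decimal gigabytes are used so that a machine sold as "32GB" is not flagged because the kernel reserves part of its memory.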


Step 3 — Connect via SSH

Once your instance is running:

Verify GPU is visible:

The output should list your GPU (for example, NVIDIA A100-SXM4-80GB).
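
For scripted checks, the same information can be read from `nvidia-smi` in CSV form. This is a sketch under the assumption that `nvidia-smi` is on the PATH and supports the `--query-gpu` and `--format=csv,noheader` options; the helper names are illustrative.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Parse the output of
    `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so that commas in a GPU name are preserved.
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append({"name": name, "memory_mib": int(memory.split()[0])})
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if the tool is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_list(result.stdout)
```

Calling `query_gpus()` on the instance should return one entry per GPU; an empty list suggests the driver or container GPU passthrough is not set up.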


Step 4 — Install AlphaFold2

Option A: Using the Official Installer Script

Option B: Using pip (Faster Setup)


Step 5 — Download Genetic Databases

Full Databases (Production Use)

This downloads:

  • BFD (~270GB download, roughly 1.8TB unpacked): Big Fantastic Database

  • UniRef90 (~58GB): UniProt Reference Clusters

  • MGnify (~64GB): Metagenomics sequences

  • PDB70 (~56GB): Protein Data Bank representative structures

  • PDB seqres (~0.2GB)

  • UniRef30, formerly UniClust30 (~86GB)

  • Small BFD (~17GB): reduced version, fetched instead of the full BFD when the reduced database preset is used
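
Because the download can fail partway through when the disk fills up, it may help to check free space first. A minimal sketch using only the standard library; the 2500GB and 100GB thresholds mirror the ~2.5TB and 100GB figures quoted in this guide, and the path `"."` is a placeholder for wherever the databases will live.

```python
import shutil


def free_space_gb(path="."):
    """Free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def enough_space(path, required_gb):
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


# Thresholds follow the figures quoted in this guide.
for label, need_gb in [("full databases", 2500), ("reduced setup", 100)]:
    status = "ok" if enough_space(".", need_gb) else "not enough space"
    print(f"{label}: need {need_gb}GB -> {status}")
```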

Reduced Databases (Testing/Development)

For testing on limited disk:


Step 6 — Download AlphaFold Model Weights


Step 7 — Prepare Your Input Sequence

Create a FASTA file with your target protein sequence:

FASTA Format Tips:

  • Header line starts with >

  • Sequence should contain only standard amino acid letters (ACDEFGHIKLMNPQRSTVWY)

  • Remove any gaps or non-standard characters

  • For multimer prediction, include all chains with separate headers
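
The tips above can be turned into a quick pre-flight check. The sketch below is a small, dependency-free FASTA reader and validator; it is illustrative only (function names are not from AlphaFold) and treats `-`, `.`, `*` and whitespace as removable gap characters, which is an assumption.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not standard amino acid letters."""
    return sorted(set(sequence) - STANDARD_AA)


def clean_sequence(sequence):
    """Drop gap characters and whitespace; anything else is left for review."""
    return "".join(ch for ch in sequence.upper() if ch not in "-.* \t")
```

For a multimer input, `parse_fasta` should return one record per chain; any record for which `invalid_residues` is non-empty after cleaning needs manual attention before running a prediction.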


Step 8 — Run AlphaFold2

Monomer Prediction (Single Chain)

Multimer Prediction (Protein Complex)
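
Monomer and multimer runs differ mainly in the model preset. The sketch below assembles a command line for the `docker/run_docker.py` wrapper from the official repository, assuming the flag names found in recent versions (`--fasta_paths`, `--max_template_date`, `--model_preset`, `--db_preset`, `--data_dir`, `--output_dir`); confirm them with `--help` for your checkout. All paths and the template date are placeholders. If you call `run_alphafold.py` directly instead, the individual database paths must be passed as well.

```python
import shlex

MODEL_PRESETS = {"monomer", "monomer_casp14", "monomer_ptm", "multimer"}
DB_PRESETS = {"full_dbs", "reduced_dbs"}


def build_run_command(fasta_path, data_dir, output_dir,
                      model_preset="monomer", db_preset="full_dbs",
                      max_template_date="2022-01-01"):
    """Build an argv list for AlphaFold's docker/run_docker.py wrapper.

    The default template date is an arbitrary placeholder, not a recommendation.
    """
    if model_preset not in MODEL_PRESETS:
        raise ValueError(f"unknown model preset: {model_preset}")
    if db_preset not in DB_PRESETS:
        raise ValueError(f"unknown db preset: {db_preset}")
    return [
        "python3", "docker/run_docker.py",
        f"--fasta_paths={fasta_path}",
        f"--max_template_date={max_template_date}",
        f"--model_preset={model_preset}",
        f"--db_preset={db_preset}",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
    ]


# Hypothetical paths, shown only to illustrate the resulting command line.
cmd = build_run_command("/data/input/complex.fasta", "/data/alphafold_dbs",
                        "/data/output", model_preset="multimer")
print(shlex.join(cmd))
```

Building the argument list separately from executing it makes it easy to log the exact command and to loop over many FASTA files.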


Step 9 — Understanding Output Files

AlphaFold2 produces several output files per prediction:

Interpreting Results:

  • ranked_0.pdb is your best structure — open it in PyMOL, ChimeraX, or UCSF Chimera

  • pLDDT score (0–100): per-residue confidence. >90 = very high, 70–90 = good, 50–70 = low, <50 = very low (often disordered)

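  • PAE (Predicted Aligned Error) plots show the expected position error between residue pairs, in Å; low PAE between two domains means their relative placement is predicted with confidence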
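
The pLDDT bands above translate directly into a lookup. A minimal sketch with illustrative function names; the lowest band is labelled "very low" here, since such regions are often, but not always, disordered.

```python
from collections import Counter


def plddt_band(score):
    """Map a pLDDT score (0-100) to the confidence bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score >= 70:
        return "good"
    if score >= 50:
        return "low"
    return "very low"


def band_summary(scores):
    """Count how many residues fall into each confidence band."""
    return dict(Counter(plddt_band(s) for s in scores))
```

A prediction whose residues mostly land in the two upper bands is generally a reasonable starting point for downstream analysis; long stretches in the lowest band are better treated as flexible or unresolved.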


Step 10 — Visualize Results

Download PDB Files to Your Local Machine

Visualize in PyMOL (locally)

Quick pLDDT Analysis
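
AlphaFold writes each residue's pLDDT into the B-factor column of its PDB output, so a quick summary needs no extra libraries. The sketch below reads that column (characters 61–66 of each `ATOM` record) from C-alpha atoms; the function names are illustrative.

```python
def ca_plddt(pdb_text):
    """Return per-residue pLDDT values read from the B-factor column of CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16, B-factor columns 61-66 (1-indexed).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def summarize(scores):
    """Return (mean pLDDT, fraction of residues with pLDDT >= 70)."""
    if not scores:
        raise ValueError("no CA atoms found")
    mean = sum(scores) / len(scores)
    confident = sum(s >= 70 for s in scores) / len(scores)
    return mean, confident
```

To use it, read a file such as `ranked_0.pdb` into a string, pass it to `ca_plddt`, and hand the result to `summarize`.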


Using ColabFold (Faster Alternative)

ColabFold is a faster AlphaFold2 implementation using MMseqs2 for MSA generation:
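
As a rough sketch of how a ColabFold run can be scripted, the snippet below wraps the `colabfold_batch` command-line tool, which takes an input FASTA followed by an output directory. It assumes ColabFold is already installed and on the PATH; the wrapper function names are illustrative. Note that, unless configured otherwise, `colabfold_batch` queries a public MMseqs2 server for alignments, so the instance needs outbound network access.

```python
import shutil
import subprocess


def colabfold_command(fasta_path, output_dir):
    """argv for colabfold_batch: input FASTA first, then the output directory."""
    return ["colabfold_batch", str(fasta_path), str(output_dir)]


def run_colabfold(fasta_path, output_dir):
    """Run colabfold_batch if it is on PATH; raise a clear error otherwise."""
    if shutil.which("colabfold_batch") is None:
        raise FileNotFoundError("colabfold_batch not found; install ColabFold first")
    subprocess.run(colabfold_command(fasta_path, output_dir), check=True)
```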

Troubleshooting

CUDA Out of Memory
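
One common mitigation is to let XLA use unified memory so that large inputs can spill from GPU memory into host RAM, at some cost in speed. The official `run_docker.py` wrapper sets the two environment variables below; this sketch applies the same values from Python for setups that call AlphaFold directly. The helper name is illustrative.

```python
import os


def enable_unified_memory(mem_fraction="4.0"):
    """Allow JAX/XLA to oversubscribe GPU memory by spilling into host RAM."""
    os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(mem_fraction)


# Call this before JAX initialises its GPU backend (ideally before `import jax`).
enable_unified_memory()
```

If the error persists, moving to a GPU with more VRAM or splitting a complex into smaller sub-assemblies are the remaining options.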

HHblits / Jackhmmer Errors

Database Download Failures

JAX/CUDA Compatibility Issues
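
A first diagnostic step is to ask JAX which backend it actually selected: if it reports `cpu` on a GPU server, the installed `jaxlib` build most likely does not match the CUDA version. A minimal sketch that degrades gracefully when JAX is missing or broken:

```python
def jax_backend_report():
    """Describe which backend JAX will use, without raising if JAX is unusable."""
    try:
        import jax
    except Exception as err:  # missing package or jax/jaxlib version mismatch
        return f"jax import failed: {err}"
    try:
        devices = jax.devices()
    except RuntimeError as err:
        return f"jax {jax.__version__} could not initialise a backend: {err}"
    names = ", ".join(str(d) for d in devices)
    return f"jax {jax.__version__}, backend={jax.default_backend()}, devices=[{names}]"


print(jax_backend_report())
```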


Performance Tips

Cost Estimation on Clore.ai

| Scenario | GPU | Est. Time | Est. Cost |
|---|---|---|---|
| Single protein (~300aa) | RTX 3090 | 1–2h | ~$0.30–0.60 |
| Single protein (~500aa) | RTX 4090 | 45–90min | ~$0.40–0.80 |
| Multimer complex | A100 80GB | 2–4h | ~$1.50–3.00 |
| Proteome screening (100 proteins) | A100 80GB | 8–12h | ~$6–10 |

Costs are approximate and depend on current marketplace pricing.
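
To redo the estimate for a specific listing, multiply the expected runtime range by the hourly price shown in the marketplace. A small sketch of that arithmetic; the $0.80/hour rate in the example is a placeholder, not a quoted price.

```python
def estimate_cost(hours_low, hours_high, rate_per_hour):
    """Return the (low, high) rental cost in USD for a runtime range."""
    if hours_low > hours_high:
        raise ValueError("hours_low must not exceed hours_high")
    return (round(hours_low * rate_per_hour, 2),
            round(hours_high * rate_per_hour, 2))


# Placeholder example: a 45-90 minute job on a GPU listed at $0.80/hour.
low, high = estimate_cost(0.75, 1.5, 0.80)
print(f"estimated cost: ${low:.2f} to ${high:.2f}")
```

Remember to include the time spent downloading databases and model weights, since the rental clock runs during setup as well.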


Additional Resources


This guide covers AlphaFold2 deployment on Clore.ai GPU rentals. For the latest AlphaFold3, see the separate AlphaFold3 guide.


Clore.ai GPU Recommendations

| Use Case | Recommended GPU | Est. Cost on Clore.ai |
|---|---|---|
| Development/Testing | RTX 3090 (24GB) | ~$0.12/gpu/hr |
| Standard Proteins | RTX 4090 (24GB) | ~$0.70/gpu/hr |
| Large Molecules / Multimers | A100 80GB | ~$1.20/gpu/hr |

💡 All examples in this guide can be deployed on Clore.ai GPU servers. Browse available GPUs and rent by the hour, with no commitments and full root access.
