Jupyter ML Training

Set up JupyterLab with GPU support for machine learning experiments and model training.

All examples can be run on GPU servers rented through the CLORE.AI Marketplace.

Server Requirements

| Parameter | Minimum | Recommended |
|-----------|---------|-------------|
| RAM | 16GB | 32GB+ |
| VRAM | 8GB | 16GB+ |
| Network | 200Mbps | 500Mbps+ |
| Startup Time | 2-3 minutes | |

JupyterLab itself is lightweight. Choose GPU and RAM based on your training workload requirements.
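
As a rough way to size VRAM for a training job, the sketch below applies a common rule of thumb: full fine-tuning with Adam needs about 16 bytes per parameter (weights, gradients, optimizer moments, and a master copy under mixed precision) before any activations. The function name and the 16-byte default are assumptions for illustration, not CLORE.AI guidance. Activation memory depends on batch size and input size and is not included.

```python
def estimate_training_vram_gb(num_params: int, bytes_per_param: int = 16) -> float:
    """Rough VRAM (GB) for weights, gradients, and Adam state; excludes activations."""
    return num_params * bytes_per_param / 1e9


# ResNet-18 has roughly 11.7 million parameters; a 7B LLM has roughly 7 billion.
for name, params in [("resnet18", 11_700_000), ("7B LLM", 7_000_000_000)]:
    print(f"{name}: ~{estimate_training_vram_gb(params):.1f} GB before activations")
```

Parameter-efficient methods such as 4-bit loading with LoRA need far less than this estimate, because only a small set of adapter weights is trained.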

Quick Deploy

Docker Image:

pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime

Ports:

22/tcp
8888/http
6006/http

Environment:

JUPYTER_TOKEN=your_secure_token_here

Command:

pip install jupyterlab tensorboard && \
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser --ServerApp.token="$JUPYTER_TOKEN"

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

1. Go to the My Orders page
2. Click on your order
3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.

Verify It's Working

# Check if JupyterLab is accessible
curl https://your-http-pub.clorecloud.net/

# Access with token
# https://your-http-pub.clorecloud.net/?token=your_secure_token_here

If you get HTTP 502, wait 2-3 minutes - the service is installing dependencies.
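
The wait-and-retry step can also be scripted. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the host and token are the placeholders used above. It treats any 5xx response (including 502) or a connection failure as "not ready yet" and polls for about three minutes by default.

```python
import time
import urllib.error
import urllib.request


def jupyter_url(host: str, token: str) -> str:
    """Build the HTTPS login URL for JupyterLab behind the http_pub proxy."""
    return f"https://{host}/?token={token}"


def wait_until_ready(url: str, attempts: int = 18, delay_s: float = 10.0) -> bool:
    """Poll the URL until the service answers without a 5xx error."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10):
                return True
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # the server answered, e.g. 403 for a wrong token
        except OSError:
            pass  # proxy not reachable yet (DNS, connection, or timeout error)
        time.sleep(delay_s)
    return False


url = jupyter_url("your-http-pub.clorecloud.net", "your_secure_token_here")
print(url)
# Once the order is running, call: wait_until_ready(url)
```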

Renting on CLORE.AI

1. Visit the CLORE.AI Marketplace
2. Filter by GPU type, VRAM, and price
3. Choose On-Demand (fixed rate) or Spot (bid price)
4. Configure your order:
   - Select Docker image
   - Set ports (TCP for SSH, HTTP for web UIs)
   - Add environment variables if needed
   - Enter startup command
5. Select payment: CLORE, BTC, or USDT/USDC
6. Create the order and wait for deployment

Access Your Server

- Find connection details in My Orders
- Web interfaces: use the HTTP port URL
- SSH: ssh -p <port> root@<proxy-address>

Access Jupyter

1. Wait for the deployment to finish
2. Find the mapping for port 8888
3. Open: http://<proxy>:<port>/?token=your_secure_token_here

Pre-configured ML Image

For a full ML environment:

Image:

jupyter/pytorch-notebook:cuda12-pytorch-2.1.0

Or build a custom image:

FROM pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime

RUN pip install --no-cache-dir \
    jupyterlab \
    numpy pandas matplotlib seaborn \
    scikit-learn \
    transformers datasets accelerate \
    tensorboard wandb \
    opencv-python pillow \
    tqdm rich

EXPOSE 8888 6006

CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root"]

Essential Libraries

Install in Jupyter

!pip install transformers datasets accelerate bitsandbytes
!pip install wandb tensorboard
!pip install scikit-learn xgboost lightgbm
!pip install opencv-python albumentations

Create requirements.txt


# ML Frameworks
torch>=2.1.0
torchvision
torchaudio

# NLP
transformers>=4.36.0
datasets
tokenizers
sentencepiece

# Training
accelerate
bitsandbytes
peft
trl

# Monitoring
wandb
tensorboard

# Data
numpy
pandas
matplotlib
seaborn
scikit-learn

# Computer Vision
opencv-python
pillow
albumentations
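
To see which of these packages are already present in the image before installing, something like the following could be run in a notebook cell. It is a sketch using the standard library's importlib.metadata; the helper names are hypothetical, and the parser only handles the simple `name` and `name>=version` lines used above.

```python
import re
from importlib import metadata
from typing import Optional


def requirement_names(text: str) -> list:
    """Extract package names from requirements.txt content, skipping comments and blanks."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            # The name is everything before the first version specifier or extras bracket.
            names.append(re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0])
    return names


def installed_versions(names: list) -> dict:
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        found: Optional[str]
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        versions[name] = found
    return versions


sample = """
# ML Frameworks
torch>=2.1.0
torchvision
"""
for name, version in installed_versions(requirement_names(sample)).items():
    print(f"{name}: {version or 'not installed'}")
```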

Training Examples

PyTorch Image Classification

import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Check GPU
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Load data
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

train_data = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform
)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=4)

# Model
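# torchvision 0.13+ selects pretrained weights via `weights`; `pretrained=True` is deprecated
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)  # CIFAR-10 has 10 classes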
model = model.cuda()

# Training
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # Note: this is the loss of the last batch in the epoch, not an epoch average
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

# Save model
torch.save(model.state_dict(), 'model.pth')
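
The loop above reports the loss of a single batch per epoch, which can be noisy. A sample-weighted running average gives a steadier signal. The sketch below is plain Python with a hypothetical class name; the batch losses are stand-in numbers, not results from this model.

```python
class AverageMeter:
    """Tracks the sample-weighted mean of a metric across batches."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1) -> None:
        # value is the mean over a batch of n samples
        self.total += value * n
        self.count += n

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


# Stand-in batch losses; in the training loop pass loss.item() and images.size(0).
meter = AverageMeter()
for batch_loss, batch_size in [(2.0, 64), (1.0, 64), (0.5, 32)]:
    meter.update(batch_loss, batch_size)
print(f"Epoch average loss: {meter.average:.4f}")
```

Create a fresh meter at the start of each epoch so the averages do not mix across epochs.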

HuggingFace Text Classification

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from datasets import load_dataset
import numpy as np

# Load dataset
dataset = load_dataset("imdb")

# Load model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize
def tokenize(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)

# Training
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=100,
    eval_strategy="epoch",  # named evaluation_strategy in transformers releases before 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)

trainer.train()
trainer.save_model("./best_model")
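
It helps to check how `warmup_steps=500` compares with the length of the whole run. The sketch below approximates the number of optimizer steps, assuming a single GPU, no gradient accumulation, and that the last partial batch is kept; the function name is hypothetical, and the Trainer's own count may differ slightly in other configurations.

```python
import math


def total_optimizer_steps(num_examples: int, per_device_batch: int, epochs: int,
                          num_devices: int = 1, grad_accum: int = 1) -> int:
    """Approximate optimizer steps for a run, keeping the last partial batch."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    return math.ceil(batches_per_epoch / grad_accum) * epochs


# IMDB has 25,000 training reviews; batch size and epochs match the example above.
steps = total_optimizer_steps(num_examples=25_000, per_device_batch=16, epochs=3)
warmup_steps = 500
print(f"{steps} steps, warmup covers {warmup_steps / steps:.1%}")
```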

LLM Fine-tuning with LoRA

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
from trl import SFTTrainer
import torch

# Load model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

# Load dataset
dataset = load_dataset("timdettmers/openassistant-guanaco")

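# Train (these SFTTrainer arguments follow older TRL releases; newer releases
# move dataset_text_field and max_seq_length into SFTConfig)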
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="./lora_output",
        num_train_epochs=1,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        save_steps=100,
    ),
)

trainer.train()
trainer.save_model("./final_lora")
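
To get a feel for how small the trained adapter is, the LoRA parameter count can be worked out by hand: each adapted linear layer with shape (in, out) adds rank × (in + out) weights. The sketch below uses assumed Mistral-7B dimensions (hidden size 4096, 32 layers, and a 1024-wide v_proj output due to grouped-query attention); check them against the model config before relying on the number.

```python
def lora_param_count(rank: int, shapes: list, num_layers: int) -> int:
    """Trainable LoRA parameters: each adapted (in, out) layer adds rank * (in + out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers


# Assumed shapes: q_proj maps 4096 -> 4096, v_proj maps 4096 -> 1024.
trainable = lora_param_count(rank=16, shapes=[(4096, 4096), (4096, 1024)], num_layers=32)
print(f"Trainable LoRA parameters: {trainable:,}")
```

On the real model, PEFT's `model.print_trainable_parameters()` reports the actual figure after `get_peft_model`.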

TensorBoard Integration

Start TensorBoard

%load_ext tensorboard
%tensorboard --logdir ./logs --port 6006 --bind_all

Or via terminal:

tensorboard --logdir ./logs --port 6006 --bind_all &

Log Training Metrics

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('./logs')

epochs = 10  # number of training epochs

for epoch in range(epochs):
    # ... training loop that computes train_loss, val_loss, and accuracy ...
    writer.add_scalar('Loss/train', train_loss, epoch)
    writer.add_scalar('Loss/val', val_loss, epoch)
    writer.add_scalar('Accuracy/val', accuracy, epoch)

writer.close()

Weights & Biases Integration

import wandb

wandb.init(project="my-project", name="experiment-1")

# Log metrics
wandb.log({"loss": loss, "accuracy": acc})

# Log model
wandb.save("model.pth")

# Finish
wandb.finish()

Data Management

Download Datasets


# HuggingFace datasets
from datasets import load_dataset
dataset = load_dataset("squad")

# Kaggle datasets
!pip install kaggle
!kaggle datasets download -d username/dataset-name

# Direct download
!wget https://example.com/data.zip
!unzip data.zip
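
Large downloads over a rented connection can be truncated, so it may be worth checking a file against a published SHA-256 checksum before unzipping it. This is a standard-library sketch with hypothetical helper names; the demo hashes a small temporary file rather than a real dataset.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True when the file exists and matches the expected checksum."""
    return Path(path).is_file() and sha256_of(path) == expected_sha256.lower()


# Demo with a throwaway file; for a real dataset, pass the checksum from its download page.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.zip"
    sample.write_bytes(b"example bytes")
    checksum = sha256_of(str(sample))
    ok = verify_download(str(sample), checksum.upper())
print(f"checksum ok: {ok}")
```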

Mount Cloud Storage


# S3
!pip install boto3
import boto3
s3 = boto3.client('s3')
s3.download_file('bucket', 'key', 'local_path')

# Google Cloud
!pip install google-cloud-storage
from google.cloud import storage
client = storage.Client()
bucket = client.bucket('my-bucket')
blob = bucket.blob('data.zip')
blob.download_to_filename('data.zip')

Saving Work

Save to External Storage


# Save model to S3
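import os
import boto3

# Read credentials from environment variables instead of hard-coding them in the notebook
s3 = boto3.client('s3',
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']
)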
s3.upload_file('model.pth', 'my-bucket', 'models/model.pth')

Before Ending Session


# Download important files (run these on your local machine, not on the server)
scp -P <port> root@<host>:/workspace/model.pth ./
scp -P <port> -r root@<host>:/workspace/results/ ./results/
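
Copying many small files with scp is slow, so one option is to pack everything into a single archive on the server first. The following is a sketch using the standard library's tarfile module; the helper name is hypothetical, and the demo uses a temporary directory in place of /workspace.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_paths(paths: list, archive: str) -> list:
    """Pack existing files and directories into a gzip tarball; return what was added."""
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                tar.add(str(p), arcname=p.name)
                added.append(p.name)
    return added


# Demo in a temporary directory; on the server you might pass
# ["/workspace/model.pth", "/workspace/results"] instead.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.pth"
    model_file.write_bytes(b"weights")
    added = archive_paths([str(model_file), str(Path(tmp) / "missing")],
                          str(Path(tmp) / "session.tar.gz"))
print(added)
```

Paths that do not exist are skipped rather than raising, so the same call works whether or not a results directory was created.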

Multi-GPU Training

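import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Check GPUs
print(f"Available GPUs: {torch.cuda.device_count()}")

# Option 1: DataParallel (simple, single process)
model = nn.DataParallel(model.cuda())

# Option 2: DistributedDataParallel (better; use it instead of DataParallel, not on top of it)
# Launch with: torchrun --nproc_per_node=4 train.py
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun for each process
torch.cuda.set_device(local_rank)
model = DistributedDataParallel(model.cuda(local_rank), device_ids=[local_rank])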
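
torchrun passes each process its position through the RANK, LOCAL_RANK, and WORLD_SIZE environment variables. The sketch below reads them with single-process defaults so the same script can also run in a notebook without torchrun; the DistEnv class and read_dist_env helper are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistEnv:
    rank: int        # global process index
    local_rank: int  # GPU index on this machine
    world_size: int  # total number of processes

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1


def read_dist_env(env=None) -> DistEnv:
    """Read torchrun's environment variables, defaulting to a single process."""
    env = os.environ if env is None else env
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


# Simulated values for the last of four processes launched with --nproc_per_node=4.
info = read_dist_env({"RANK": "3", "LOCAL_RANK": "3", "WORLD_SIZE": "4"})
print(info, info.is_distributed)
```

A common use is to restrict logging and checkpoint saving to the process with rank 0.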

Performance Tips

Memory Optimization


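# Gradient checkpointing (method on Hugging Face Transformers models)
model.gradient_checkpointing_enable()

# Mixed precision (torch.amp API in PyTorch 2.3+; older versions use torch.cuda.amp)
import torch
scaler = torch.amp.GradScaler("cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    output = model(inputs)
    loss = criterion(output, target)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()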

Data Loading


# Faster data loading
loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=8,      # Use multiple workers
    pin_memory=True,    # Faster GPU transfer
    prefetch_factor=2   # Prefetch batches
)
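
The right `num_workers` depends on how many CPU cores the rented server exposes, which varies between providers. One possible heuristic, sketched below with a hypothetical helper, is to split the cores across GPU processes, keep one core free, and cap the result.

```python
import os


def suggested_num_workers(num_gpus: int = 1, cap: int = 8, cpu_count=None) -> int:
    """Split available CPU cores across GPU processes, leaving one core free."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    per_process = (cores - 1) // max(num_gpus, 1)
    return max(0, min(cap, per_process))


print(f"num_workers suggestion on this machine: {suggested_num_workers()}")
```

If the suggestion is 0, drop the `prefetch_factor` argument as well, since DataLoader only accepts it when worker processes are used.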

Troubleshooting

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

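| GPU | Hourly Rate | Daily Rate | 4-Hour Session |
|-----|-------------|------------|----------------|
| RTX 3060 | ~$0.03 | ~$0.70 | ~$0.12 |
| RTX 3090 | ~$0.06 | ~$1.50 | ~$0.25 |
| RTX 4090 | ~$0.10 | ~$2.30 | ~$0.40 |
| A100 40GB | ~$0.17 | ~$4.00 | ~$0.70 |
| A100 80GB | ~$0.25 | ~$6.00 | ~$1.00 |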

Prices vary by provider and demand. Check CLORE.AI Marketplace for current rates.

Save money:

- Use the Spot market for flexible workloads (often 30-50% cheaper)
- Pay with CLORE tokens
- Compare prices across different providers
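
For budgeting a session, the arithmetic behind the table is hourly rate times hours, optionally reduced by a Spot discount. The sketch below uses the approximate hourly rates listed above and an assumed 30% Spot discount; actual prices depend on the provider and current demand.

```python
def session_cost(hourly_rate_usd: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimated cost in USD; spot_discount is a fraction such as 0.3 for 30% off."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return round(hourly_rate_usd * hours * (1.0 - spot_discount), 2)


# Approximate on-demand rates from the table above (USD per hour).
rates = {"RTX 3060": 0.03, "RTX 3090": 0.06, "RTX 4090": 0.10,
         "A100 40GB": 0.17, "A100 80GB": 0.25}
for gpu, rate in rates.items():
    on_demand = session_cost(rate, 4)
    spot = session_cost(rate, 4, spot_discount=0.3)
    print(f"{gpu}: 4h on-demand ~${on_demand:.2f}, 4h spot ~${spot:.2f}")
```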
