Phi-4

Run Microsoft's Phi-4, a small but powerful language model.


Renting on CLORE.AI

  1. Filter by GPU type, VRAM, and price

  2. Choose On-Demand (fixed rate) or Spot (bid price)

  3. Configure your order:

    • Select Docker image

    • Set ports (TCP for SSH, HTTP for web UIs)

    • Add environment variables if needed

    • Enter startup command

  4. Select payment: CLORE, BTC, or USDT/USDC

  5. Create order and wait for deployment

Access Your Server

  • Find connection details in My Orders

  • Web interfaces: Use the HTTP port URL

  • SSH: ssh -p <port> root@<proxy-address>

What is Phi-4?

Phi-4 from Microsoft offers:

  • 14B parameters with excellent performance

  • Beats larger models on benchmarks

  • Strong reasoning and math

  • Efficient inference

Model Variants

| Model | Parameters | VRAM | Specialty |
| --- | --- | --- | --- |
| Phi-4 | 14B | 16GB | General |
| Phi-3.5-mini | 3.8B | 4GB | Lightweight |
| Phi-3.5-MoE | 42B (6.6B active) | 16GB | Mixture of Experts |
| Phi-3.5-vision | 4.2B | 6GB | Vision |

Quick Deploy

Docker Image:

Ports:

Command:
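The image, port, and command values for this section were lost in the page export. As a sketch, assuming the Ollama stack used in the sections below (every value here is an example to adjust for your own order):

```shell
# Hypothetical Quick Deploy values — adjust to your setup.
# Docker image:  ollama/ollama:latest
# Ports:         22 (TCP, for SSH), 11434 (HTTP, Ollama API)
# Startup command:
ollama serve & sleep 5 && ollama pull phi4 && wait
```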

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in the examples below.
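Once the order is running, requests go to that URL rather than localhost. A minimal sketch of building an Ollama `/api/generate` call in Python; the `abc123.clorecloud.net` host and the `phi4` model tag are placeholders for your own deployment:

```python
import json
import urllib.request

def build_generate_request(base_url: str, model: str, prompt: str):
    """Build (but do not send) an Ollama /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, payload

# Point this at your own http_pub URL from My Orders.
req, payload = build_generate_request(
    "https://abc123.clorecloud.net", "phi4", "Explain KV caching in two sentences."
)
# To actually send it: urllib.request.urlopen(req) — requires the server to be up.
print(req.full_url)
```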

Using Ollama

Installation
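The install snippet did not survive the export; Ollama's standard Linux installer is the usual route (assumes root access on the rented server):

```shell
# Install Ollama via the official script, then start the server in the background.
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
```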

Basic Usage
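Likewise, a sketch of basic CLI usage, assuming the `phi4` tag from the Ollama model library:

```shell
ollama pull phi4                         # download the model weights
ollama run phi4 "Why is the sky blue?"   # one-shot prompt from the CLI
```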

Phi-3.5-Vision

For image understanding:
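The example for this section was lost in the export. A sketch following the usage documented on the Phi-3.5-vision-instruct model card (`chart.png` is a placeholder image path; requires a CUDA GPU):

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# <|image_1|> marks where the first image is injected into the prompt.
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [Image.open("chart.png")], return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the echoed prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```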

Math and Reasoning
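The original example here is gone; as a sketch, a request body for Ollama's `/api/chat` endpoint tuned for step-by-step math (the system prompt and the low temperature, per the troubleshooting advice below, are illustrative choices):

```python
import json

def math_chat_payload(question: str) -> str:
    """JSON body for Ollama's /api/chat, tuned for step-by-step math."""
    body = {
        "model": "phi4",
        "messages": [
            {"role": "system",
             "content": "Solve the problem step by step, then state the final answer."},
            {"role": "user", "content": question},
        ],
        "stream": False,
        "options": {"temperature": 0.2},  # low temperature for stable reasoning
    }
    return json.dumps(body)

body = json.loads(math_chat_payload("A train travels 120 km in 1.5 h. Average speed?"))
print(body["options"]["temperature"])
```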

Code Generation
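The code-generation example was also lost; a minimal CLI sketch (the prompt is illustrative):

```shell
ollama run phi4 "Write a Python function that returns the n-th Fibonacci number, with a docstring."
```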

Quantized Inference
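A sketch of 4-bit loading through transformers with bitsandbytes, useful when the full-precision weights don't fit your card (assumes `pip install bitsandbytes`, the `microsoft/phi-4` Hugging Face repo, and a CUDA GPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization: fits Phi-4's 14B weights in roughly 8-10 GB of VRAM.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
```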

Gradio Interface
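A minimal chat-UI sketch that proxies to the Ollama server (assumes `pip install gradio`, a running Ollama instance, and that port 7860 is mapped as an HTTP port in your order):

```python
import json
import urllib.request

import gradio as gr

OLLAMA_URL = "http://localhost:11434"  # or your https://<http_pub> URL

def chat(message, history):
    """Forward the user's message to Ollama and return the completion."""
    payload = {"model": "phi4", "prompt": message, "stream": False}
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

gr.ChatInterface(chat, title="Phi-4 Chat").launch(server_name="0.0.0.0",
                                                  server_port=7860)
```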

Performance

| Model | GPU | Tokens/sec |
| --- | --- | --- |
| Phi-3.5-mini | RTX 3060 | ~100 |
| Phi-3.5-mini | RTX 4090 | ~150 |
| Phi-4 | RTX 4090 | ~60 |
| Phi-4 | A100 | ~90 |
| Phi-4 (4-bit) | RTX 3090 | ~40 |
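These throughput figures translate directly into latency; a back-of-envelope helper:

```python
def generation_time_s(tokens: int, tokens_per_sec: float) -> float:
    """Rough wall-clock time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_sec

# Phi-4 on an RTX 4090 at ~60 tok/s: a 500-token answer takes ~8 s.
print(round(generation_time_s(500, 60), 1))  # → 8.3
```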

Benchmarks

| Model | MMLU | HumanEval | GSM8K |
| --- | --- | --- | --- |
| Phi-4 | 84.8% | 82.6% | 94.6% |
| GPT-4-Turbo | 86.4% | 85.4% | 94.2% |
| Llama-3.1-70B | 83.6% | 80.5% | 92.1% |

Phi-4 matches or beats much larger models on these benchmarks.

Troubleshooting

"trust_remote_code" error

  • Add trust_remote_code=True to from_pretrained()

  • This is required for Phi models

Repetitive outputs

  • Lower temperature (0.3-0.6)

  • Add repetition_penalty=1.1

  • Use proper chat template

Memory issues

  • Phi-4 is efficient for its size, but its 14B weights still need roughly 28 GB in FP16 (about 8 GB at 4-bit)

  • Use 4-bit quantization if needed

  • Reduce context length

Wrong output format

  • Use apply_chat_template() for proper formatting

  • Check you're using instruct version, not base

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU | Hourly Rate | Daily Rate | 4-Hour Session |
| --- | --- | --- | --- |
| RTX 3060 | ~$0.03 | ~$0.70 | ~$0.12 |
| RTX 3090 | ~$0.06 | ~$1.50 | ~$0.25 |
| RTX 4090 | ~$0.10 | ~$2.30 | ~$0.40 |
| A100 40GB | ~$0.17 | ~$4.00 | ~$0.70 |
| A100 80GB | ~$0.25 | ~$6.00 | ~$1.00 |

Prices vary by provider and demand. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot market for flexible workloads (often 30-50% cheaper)

  • Pay with CLORE tokens

  • Compare prices across different providers
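The rates above multiply out straightforwardly; a small helper for estimating a rental's cost, with the spot discount from the bullet above treated as an illustrative fraction:

```python
def session_cost(hourly_rate: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimated rental cost; spot_discount is a fraction (e.g. 0.4 for 40% off)."""
    return hourly_rate * hours * (1.0 - spot_discount)

# RTX 4090 at ~$0.10/h for a 4-hour session:
print(round(session_cost(0.10, 4), 2))       # on-demand → 0.4
print(round(session_cost(0.10, 4, 0.4), 2))  # spot at ~40% off → 0.24
```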

Use Cases

  • Math tutoring

  • Code assistance

  • Document analysis (vision)

  • Efficient edge deployment

  • Cost-effective inference

Next Steps

  • Qwen2.5 - Alternative model

  • Gemma 2 - Google's model

  • Llama 3.2 - Meta's model
