Phi-4

Run Microsoft's Phi-4 small language model on Clore.ai GPUs

Run Microsoft's Phi-4, a compact but capable language model.

All examples can be run on GPU servers rented through the CLORE.AI marketplace.

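Renting on CLORE.AI

Visit the CLORE.AI marketplace
Filter by GPU type, VRAM, and price
Choose on-demand (fixed rate) or spot (bid price)
Configure your order:
- Select a Docker image
- Set ports (TCP for SSH, HTTP for web interfaces)
- Add environment variables if needed
- Enter the startup command
Select a payment method: CLORE, BTC, or USDT/USDC
Create the order and wait for deployment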

Accessing your server

Find the connection details under My Orders
Web interface: use the URL of the HTTP port
SSH: ssh -p <port> root@<proxy-address>

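What is Phi-4?

Microsoft's Phi-4 offers:

14B parameters with strong performance
Benchmark scores that beat larger models
Strong reasoning and math ability
Efficient inference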

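Model variants

| Model | Parameters | VRAM | Specialty |
|---|---|---|---|
| Phi-4 | 14B | 16GB | General purpose |
| Phi-3.5-mini | 3.8B | 4GB | Lightweight |
| Phi-3.5-MoE | 42B (6.6B active) | 16GB | Mixture of Experts |
| Phi-3.5-vision | 4.2B | 6GB | Vision |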

Quick deploy

Docker image:

pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime

Ports:

22/tcp
8000/http

Command:

pip install transformers accelerate torch && \
python phi4_server.py
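
The command above runs phi4_server.py, which this guide does not define. The following is a minimal sketch of what such a script might contain, not a reference implementation. It assumes a single hypothetical POST /chat endpoint that accepts {"messages": [...]} and returns {"response": "..."}; the function names (load_generate_fn, handle_chat, make_handler, main) are illustrative, and only the Python standard library is used for the HTTP layer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_generate_fn(model_id="microsoft/Phi-4"):
    """Load the model once and return a function mapping chat messages to text."""
    # Imported lazily so the HTTP layer can be exercised without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
    )

    def generate(messages, max_new_tokens=512):
        inputs = tokenizer.apply_chat_template(
            messages, return_tensors="pt", add_generation_prompt=True
        ).to(model.device)
        outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

    return generate


def handle_chat(raw_body, generate_fn):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        messages = json.loads(raw_body)["messages"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'messages' list"}
    return 200, {"response": generate_fn(messages)}


def make_handler(generate_fn):
    """Create a request handler class bound to the given generate function."""

    class ChatHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/chat":  # hypothetical endpoint name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_chat(self.rfile.read(length), generate_fn)
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ChatHandler


def main(port=8000):
    """Serve on all interfaces so the order's HTTP port can reach the process."""
    server = HTTPServer(("0.0.0.0", port), make_handler(load_generate_fn()))
    server.serve_forever()
```

To use this as the startup script, call main() on the last line of the file. Port 8000 matches the HTTP port listed above.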

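Accessing your service

After deployment, find your http_pub URL under My Orders:

Go to the My Orders page
Click your order
Find the http_pub URL (for example, abc123.clorecloud.net)

In the examples below, use https://YOUR_HTTP_PUB_URL instead of localhost.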

Using Ollama

# Run Phi-4
ollama run phi4

# Phi-3.5 mini (faster)
ollama run phi3.5

# Phi-3.5 vision
ollama run phi3.5-vision
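
Besides the interactive CLI, a running Ollama server can be called over HTTP. The sketch below uses Ollama's /api/generate endpoint with streaming disabled. It assumes the server listens on Ollama's default port 11434; on a rented server, that port would need to be exposed in the order and localhost replaced with your public URL. The helper names build_generate_request and ask are illustrative.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, model, prompt, timeout=120):
    """Send the prompt and return the generated text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]


# Example (requires a running Ollama server with the phi4 model pulled):
# print(ask("http://localhost:11434", "phi4", "Explain TCP vs UDP in two sentences."))
```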

Installation

pip install transformers accelerate torch

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "microsoft/Phi-4"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain the difference between TCP and UDP."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to("cuda")

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)

Phi-3.5-Vision

For image understanding:

from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "microsoft/Phi-3.5-vision-instruct"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

image = Image.open("diagram.png")

messages = [
    {"role": "user", "content": "<|image_1|>\nDescribe this diagram in detail."}
]

prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

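outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)
print(response)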

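Math and reasoning

messages = [
    {"role": "user", "content": """
Solve step by step:
A farmer has chickens and rabbits.
Total heads: 35
Total legs: 94
How many of each animal are there?
"""}
]

# Phi-4 is strong at step-by-step reasoning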
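
When checking a model's answer to a puzzle like this, it helps to have the exact result on hand. The short sketch below solves the two linear equations directly; the function name solve_heads_and_legs is illustrative.

```python
def solve_heads_and_legs(heads, legs):
    """Return (chickens, rabbits) for the given head and leg totals."""
    # chickens + rabbits = heads and 2*chickens + 4*rabbits = legs,
    # so rabbits = (legs - 2*heads) / 2.
    extra_legs = legs - 2 * heads
    if extra_legs < 0 or extra_legs % 2 != 0 or extra_legs // 2 > heads:
        raise ValueError("no whole-number solution")
    rabbits = extra_legs // 2
    return heads - rabbits, rabbits


print(solve_heads_and_legs(35, 94))  # (23, 12)
```

A correct response should therefore arrive at 23 chickens and 12 rabbits.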

Code generation

messages = [
    {"role": "user", "content": """
Implement a binary search tree in Python, including:
- Insert
- Search
- Delete
- In-order traversal
Include type hints and docstrings.
"""}
]
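
Chat models often wrap generated code in Markdown fences surrounded by explanatory text. Assuming the response follows that convention (it may not always), a small helper like the hypothetical extract_code_blocks below can pull out just the code so it can be saved or tested.

```python
import re

FENCE = "`" * 3  # built programmatically so this example contains no literal fence

# An opening fence with an optional language tag, the body, then a closing fence.
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response):
    """Return the contents of every fenced code block in a model response."""
    return [block.strip() for block in CODE_BLOCK.findall(response)]


sample = "Here is the class:\n" + FENCE + "python\nclass Node:\n    pass\n" + FENCE + "\nDone."
print(extract_code_blocks(sample))  # ['class Node:\n    pass']
```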

Quantized inference

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

Gradio interface

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "microsoft/Phi-4"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

def chat(message, history, system_prompt, temperature):
    messages = [{"role": "system", "content": system_prompt}]
    for h in history:
        messages.append({"role": "user", "content": h[0]})
        messages.append({"role": "assistant", "content": h[1]})
    messages.append({"role": "user", "content": message})

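    # add_generation_prompt=True appends the assistant turn marker so the model answers
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to("cuda")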
    outputs = model.generate(inputs, max_new_tokens=512, temperature=temperature, do_sample=True)

    return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

demo = gr.ChatInterface(
    fn=chat,
    additional_inputs=[
        gr.Textbox(value="You are a helpful assistant.", label="System"),
        gr.Slider(0.1, 1.5, value=0.7, label="Temperature")
    ],
    title="Phi-4 Chat"
)

demo.launch(server_name="0.0.0.0", server_port=7860)
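
The chat function above expects history as (user, assistant) pairs. Depending on the Gradio version and the ChatInterface type setting, history may instead arrive as a list of role/content dictionaries. The sketch below shows one way to accept both shapes; build_messages is a hypothetical helper, not part of Gradio.

```python
def build_messages(system_prompt, history, message):
    """Flatten Gradio chat history into a list of role/content dicts."""
    messages = [{"role": "system", "content": system_prompt}]
    for turn in history:
        if isinstance(turn, dict):
            # "messages" format: each entry is already a role/content dict
            messages.append({"role": turn["role"], "content": turn["content"]})
        else:
            # "tuples" format: each entry is (user_text, assistant_text)
            user_text, assistant_text = turn
            messages.append({"role": "user", "content": user_text})
            if assistant_text is not None:
                messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": message})
    return messages
```

Inside chat(), the manual loop over history could then be replaced with messages = build_messages(system_prompt, history, message).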

Performance

| Model | GPU | Tokens/sec |
|---|---|---|
| Phi-3.5-mini | (unspecified) | ~100 |
| Phi-3.5-mini | (unspecified) | ~150 |
| Phi-4 | (unspecified) | ~60 |
| Phi-4 | (unspecified) | ~90 |
| Phi-4 (4-bit) | (unspecified) | ~40 |

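Benchmarks

| Model | MMLU | HumanEval | GSM8K |
|---|---|---|---|
| Phi-4 | 84.8% | 82.6% | 94.6% |
| GPT-4-Turbo | 86.4% | 85.4% | 94.2% |
| Llama-3.1-70B | 83.6% | 80.5% | 92.1% |

On these figures, Phi-4 outscores Llama-3.1-70B on all three benchmarks and stays within three points of GPT-4-Turbo, edging it on GSM8K.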

Troubleshooting

"trust_remote_code" error

Pass trust_remote_code=True to from_pretrained()
Phi checkpoints that ship custom model code require it

Repetitive output

Lower the temperature (0.3-0.6)
Add repetition_penalty=1.1
Use the proper chat template
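
The first two tips can be bundled into one set of keyword arguments for model.generate(). The sketch below is one way to do that; the helper name anti_repetition_kwargs and its range checks are illustrative choices, while do_sample, temperature, repetition_penalty, and max_new_tokens are standard generate() parameters.

```python
def anti_repetition_kwargs(temperature=0.5, repetition_penalty=1.1, max_new_tokens=512):
    """Build keyword arguments for model.generate() that discourage repetition."""
    if not 0.0 < temperature <= 2.0:
        raise ValueError("temperature should be in (0, 2]")
    if repetition_penalty < 1.0:
        raise ValueError("repetition_penalty below 1.0 encourages repetition")
    return {
        "do_sample": True,  # temperature has no effect without sampling
        "temperature": temperature,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_new_tokens,
    }


gen_kwargs = anti_repetition_kwargs()
# outputs = model.generate(inputs, **gen_kwargs)
```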

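Memory issues

Phi-4 is efficient, but 14B parameters still need about 8GB even with 4-bit quantization; unquantized bfloat16 weights take about 28GB (2 bytes per parameter)
Use 4-bit quantization if needed
Reduce the context length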
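
A quick back-of-the-envelope check shows whether a model's weights fit on a given GPU. The sketch below counts weights only; activations, the KV cache, and framework overhead come on top, so treat the result as a lower bound. The function name weight_memory_gb is illustrative.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"Phi-4 (14B) in {label}: ~{weight_memory_gb(14, bits):.0f} GB")
```

For Phi-4 this gives roughly 28 GB, 14 GB, and 7 GB respectively.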

Malformed output

Use apply_chat_template() to get the correct format
Check that you are using an instruct version rather than a base model

Download all required checkpoints

Check file integrity
Verify CUDA compatibility of the GPU

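Cost estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU | Hourly rate | Daily rate | 4-hour session |
|---|---|---|---|
| RTX 3060 | ~$0.03 | ~$0.70 | ~$0.12 |
| (unspecified) | ~$0.06 | ~$1.50 | ~$0.25 |
| (unspecified) | ~$0.10 | ~$2.30 | ~$0.40 |
| A100 40GB | ~$0.17 | ~$4.00 | ~$0.70 |
| A100 80GB | ~$0.25 | ~$6.00 | ~$1.00 |

Prices vary by provider and demand. Check the CLORE.AI marketplace for current rates.

To save money, use the spot market for flexible workloads (typically 30-50% cheaper).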
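
Hourly rates are easier to compare once they are combined with throughput. The sketch below converts an hourly price and a tokens-per-second figure into a cost per million generated tokens. The pairing of $0.10 per hour with about 60 tokens per second is purely illustrative, and the calculation assumes the GPU generates continuously at that rate.

```python
def cost_per_million_tokens(hourly_rate_usd, tokens_per_second):
    """Dollars spent per one million generated tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


# Illustrative pairing only: $0.10/hour at ~60 tokens/sec
print(f"${cost_per_million_tokens(0.10, 60):.2f} per 1M tokens")  # $0.46 per 1M tokens
```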

Use cases

Math tutoring
Coding assistance
Document analysis (vision)
Efficient edge deployment
Cost-effective inference

Related models

Qwen2.5 - an alternative model
Gemma 2 - Google's model
Llama 3.2 - Meta's model
