Qwen3.5

Run Alibaba Qwen3.5 on Clore.ai — the freshest frontier model (Feb 2026)

Qwen3.5, released February 16, 2026, is Alibaba's latest flagship model and one of the hottest open-source releases of 2026. The 397B MoE flagship beat Claude 4.5 Opus on the HMMT math benchmark, while the smaller 35B dense model fits on a single RTX 4090. All models come with agentic capabilities (tool use, function calling, autonomous task execution) and multimodal understanding out of the box.

Key Features

  • Three sizes: 9B (dense), 35B (dense), 397B (MoE) — something for every GPU

  • Beat Claude 4.5 Opus on HMMT math benchmark

  • Natively multimodal: Text + image understanding

  • Agentic capabilities: Tool use, function calling, autonomous workflows

  • 128K context window: Handle large documents and codebases

  • Apache 2.0 license: Full commercial use, no restrictions

Model Variants

Model
Params
Type
VRAM (Q4)
VRAM (FP16)
Strength

Qwen3.5-9B

9B

Dense

6GB

18GB

Fast, efficient

Qwen3.5-35B

35B

Dense

22GB

70GB

Best single-GPU

Qwen3.5-397B

397B

MoE

~100GB

400GB+

Frontier-class

Requirements

Component
9B (Q4)
35B (Q4)
397B (multi-GPU)

GPU

RTX 3080 10GB

RTX 4090 24GB

4× H100 80GB

VRAM

8GB

22GB

320GB+

RAM

16GB

32GB

128GB

Disk

15GB

30GB

250GB

Recommended Clore.ai GPU: RTX 4090 24GB (~$0.5–2/day) for 35B — best quality per dollar

Quick Start with Ollama

vLLM Setup (Production)

HuggingFace Transformers

Agentic / Tool Use Example

Why Qwen3.5 on Clore.ai?

The 35B model is arguably the best model you can run on a single RTX 4090:

  • Beats Llama 4 Scout on math and reasoning

  • Beats Gemma 3 27B on agentic tasks

  • Tool use / function calling works out of the box

  • Apache 2.0 = no license headaches

At $0.5–2/day for an RTX 4090, you get frontier-class AI for the cost of a coffee.

Tips for Clore.ai Users

  • 35B is the sweet spot: Fits on RTX 4090 Q4, outperforms most 70B models

  • 9B for budget: Even RTX 3060 ($0.15/day) runs the 9B model well

  • Use Ollama for quick start: One command to serve; OpenAI-compatible API included

  • Agentic workflows: Qwen3.5 excels at tool use — combine with function calling for automation

  • Fresh model = less cached: First download takes time (~20GB for 35B). Pre-pull before your workload starts

Troubleshooting

Issue
Solution

35B OOM on 24GB

Use load_in_4bit=True or reduce --max-model-len

Ollama model not found

Update Ollama: curl -fsSL https://ollama.com/install.sh | sh

Slow on first request

Model loading takes 30-60s; subsequent requests are fast

Tool calls not working

Ensure you pass tools parameter; use instruct variant only

Further Reading

Last updated

Was this helpful?