Dify.ai Workflow Platform

Deploy Dify.ai on Clore.ai — build production-ready AI workflows, RAG pipelines, and agent applications with a visual interface at GPU cloud prices.

Overview

Dify.ai is an open-source LLM application development platform with 114K+ GitHub stars. It combines a visual workflow builder, retrieval-augmented generation (RAG) pipeline, agent orchestration, model management, and a one-click API deployment layer into a single self-hostable stack.

On Clore.ai you can run the full Dify stack — including its Postgres database, Redis cache, Weaviate vector store, Nginx reverse proxy, API workers, and web frontend — on a rented GPU server for as little as $0.20–$0.35/hr (RTX 3090/4090). The GPU is optional for Dify itself, but becomes essential when you integrate local model inference through Ollama or vLLM backends.

Key capabilities:

  • 🔄 Visual workflow builder — drag-and-drop LLM pipelines with branching, loops, and conditional logic

  • 📚 RAG pipeline — upload PDFs, URLs, Notion pages; chunking + embedding + retrieval all managed in UI

  • 🤖 Agent mode — ReAct and function-calling agents with tool use (web search, code interpreter, custom APIs)

  • 🚀 API-first — every app generates a REST endpoint and SDK snippets instantly

  • 🔌 100+ model integrations — OpenAI, Anthropic, Mistral, Cohere, plus local models via Ollama/vLLM

  • 🏢 Multi-tenant — teams, workspaces, RBAC, usage quotas


Requirements

Dify runs as a multi-container Docker Compose stack. The minimum viable server for development is a CPU-only instance; for production with local model inference you'll want a GPU node.

| Configuration | GPU | VRAM | System RAM | Disk | Clore.ai Price |
| --- | --- | --- | --- | --- | --- |
| Minimal (API keys only) | None / CPU | n/a | 8 GB | 30 GB | ~$0.05/hr (CPU) |
| Standard | RTX 3080 | 10 GB | 16 GB | 50 GB | ~$0.15/hr |
| Recommended | RTX 3090 / 4090 | 24 GB | 32 GB | 80 GB | $0.20–0.35/hr |
| Production + Local LLM | A100 80 GB | 80 GB | 64 GB | 200 GB | ~$1.10/hr |
| High-throughput | H100 SXM | 80 GB | 128 GB | 500 GB | ~$2.50/hr |

Tip: If you only use cloud API providers (OpenAI, Anthropic, etc.), any 2-core CPU instance with 8 GB RAM works. A GPU matters only when running local models via Ollama or vLLM — see GPU Acceleration below.

Disk note

Weaviate and Postgres data grow quickly with document uploads. Provision at least 50 GB and mount persistent storage via Clore.ai's volume options.


Quick Start

1. Rent a Clore.ai server

Browse to clore.ai, filter by your desired GPU, and deploy a server with:

  • Docker pre-installed (all Clore images include it)

  • Exposed ports 80 and 443 (add custom ports in the offer settings if needed)

  • SSH access enabled

2. Connect and prepare the server
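
A typical first connection looks like the following; the IP, SSH port, and credentials come from your Clore.ai order page (shown here as placeholders):

```shell
# Connect with the credentials from the Clore.ai dashboard
ssh root@<SERVER_IP> -p <SSH_PORT>

# Confirm Docker and the Compose plugin are available (preinstalled on Clore images)
docker --version
docker compose version

# On GPU nodes, verify the driver sees the card
nvidia-smi
```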

3. Clone Dify and launch
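
These are Dify's standard self-hosting steps; check the upstream README in case paths or defaults have changed:

```shell
git clone https://github.com/langgenius/dify.git
cd dify/docker
cp .env.example .env      # review before exposing to the internet
docker compose up -d      # pulls and starts the full multi-container stack
```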

4. Verify all services are healthy
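
From dify/docker, Compose can report service health directly:

```shell
docker compose ps             # every service should show "running" (healthy)
docker compose logs -f api    # tail the API logs if a service is flapping
```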

5. Access the web UI

Open your browser and navigate to:
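
Point the browser at your server's public IP (plus the custom port, if you changed it from 80); <SERVER_IP> here is a placeholder:

```
http://<SERVER_IP>
```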

On first launch, Dify will redirect you to the setup wizard to create the admin account. Complete the wizard, then log in.


Configuration

All configuration lives in dify/docker/.env. Here are the most important settings:

Essential environment variables
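
A minimal excerpt, with variable names as they appear in recent Dify releases (confirm against your own .env.example):

```shell
# dify/docker/.env (excerpt):
#   SECRET_KEY=<generated below>   # signs session cookies; never ship the default
#   DB_PASSWORD=difyai123456       # Postgres password; change for production
#   VECTOR_STORE=weaviate          # default vector backend
#   EXPOSE_NGINX_PORT=80           # public HTTP port
#   EXPOSE_NGINX_SSL_PORT=443      # public HTTPS port

# Generate a strong SECRET_KEY value:
openssl rand -base64 42
```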

Changing the exposed port

By default Nginx listens on port 80. To change it:
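
Assuming the EXPOSE_NGINX_PORT / EXPOSE_NGINX_SSL_PORT variables used by recent Dify releases:

```shell
# In dify/docker/.env:
#   EXPOSE_NGINX_PORT=8080
#   EXPOSE_NGINX_SSL_PORT=8443
# Then recreate the stack so Nginx picks up the new ports:
docker compose down && docker compose up -d
```

Remember to also expose the new port in your Clore.ai offer settings.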

Persistent data volumes

Dify's Compose file mounts these volumes by default:
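
On recent releases the data directory looks roughly like this (a sketch; inspect docker-compose.yaml for the authoritative list):

```
dify/docker/volumes/
├── app/storage/   # uploaded documents and generated files
├── db/data/       # Postgres data
├── redis/data/    # Redis persistence
└── weaviate/      # vector index
```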

To back up:
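
One simple approach is to stop the stack and archive the whole volumes directory (run from dify/docker/):

```shell
docker compose stop                              # quiesce Postgres/Weaviate first
tar -czf "dify-backup-$(date +%F).tar.gz" volumes/
docker compose start
```

Copy the archive off the rented server; a Clore instance's disk is gone once the rental ends.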


GPU Acceleration

Dify's core platform is CPU-based, but you unlock local model inference by integrating Ollama or vLLM as model providers — both benefit enormously from a GPU.

Option A: Ollama sidecar (easiest)

Run Ollama alongside Dify on the same Clore server:
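
A minimal sketch using the official Ollama image (--gpus=all requires the NVIDIA container toolkit, which Clore GPU images ship with):

```shell
docker run -d --gpus=all -p 11434:11434 \
  -v ollama:/root/.ollama --name ollama ollama/ollama

# Pull a model into the running container
docker exec ollama ollama pull llama3
```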

Then in Dify UI → Settings → Model Providers → Ollama:

  • Base URL: http://host.docker.internal:11434 (inside Dify's containers, localhost resolves to the container itself, not the host; if the alias doesn't resolve, use the server's IP instead)

  • Select your model and save

For a full Ollama guide, see language-models/ollama.md.

Option B: vLLM sidecar (high-throughput)
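
A sketch with the official vllm/vllm-openai image; the model name is just an example, swap in your own:

```shell
docker run -d --gpus=all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --name vllm vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2
```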

Then in Dify UI → Settings → Model Providers → OpenAI-compatible:

  • Base URL: http://host.docker.internal:8000/v1 (same caveat as with Ollama: localhost inside a Dify container does not reach the host)

  • API Key: dummy

  • Model name: mistralai/Mistral-7B-Instruct-v0.2

For full vLLM setup, see language-models/vllm.md.

GPU memory recommendations for local models

| Model | VRAM Required | Recommended Clore GPU |
| --- | --- | --- |
| Llama 3 8B (Q4) | 6 GB | RTX 3060 |
| Llama 3 8B (FP16) | 16 GB | RTX 3090 / 4090 |
| Mistral 7B (Q4) | 5 GB | RTX 3060 |
| Llama 3 70B (Q4) | 40 GB | A100 40 GB |
| Llama 3 70B (FP16) | 140 GB | 2× H100 |


Tips & Best Practices

Cost optimization on Clore.ai

Scale workers for heavy workloads
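
Dify's Celery workers handle document indexing and other background jobs. One option is Compose's --scale flag, assuming the service is named worker as in Dify's compose file (if the service pins a fixed container_name, remove that line first or scaling will fail):

```shell
docker compose up -d --scale worker=3
```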

Monitor resource usage
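
Three quick views cover most of what matters on a rented node:

```shell
docker stats --no-stream   # per-container CPU and memory
nvidia-smi                 # GPU utilization and VRAM (Ollama/vLLM sidecars)
df -h                      # disk headroom for Postgres/Weaviate volumes
```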

RAG performance tuning

  • Set chunk size to 512–1024 tokens for most document types

  • Enable parent-child retrieval for long documents in Dataset settings

  • Use hybrid search (keyword + vector) for better recall on technical content

  • Index documents during off-peak hours to avoid API rate limits


Troubleshooting

Services keep restarting
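
Restart loops are most often memory pressure; check the service logs and the kernel OOM killer:

```shell
docker compose logs --tail=100 api worker
free -h               # is the host short on RAM?
dmesg | grep -i oom   # evidence of OOM-killed containers
```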

"Migration failed" on startup
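
Migrations run automatically when the api container starts; if they fail, they can be re-run by hand (service names per Dify's compose file):

```shell
docker compose logs db           # verify Postgres actually came up first
docker compose exec api flask db upgrade
```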

Can't connect to Ollama from Dify
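
Remember that localhost inside a Dify container is the container itself, not the host. Test both hops (if curl is missing inside the container, use wget or python instead):

```shell
# From the host: is Ollama answering at all?
curl http://localhost:11434/api/tags

# From inside Dify's api container, use the host-gateway alias:
docker compose exec api curl http://host.docker.internal:11434/api/tags
```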

Out of disk space
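
Find the growth first, then reclaim Docker's own debris (prune removes stopped containers and dangling images, not running services):

```shell
df -h                            # overall usage
du -sh dify/docker/volumes/*     # which Dify volume is growing
docker system prune -f           # reclaim Docker build/image debris
```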

Weaviate vector store errors
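
These are often transient or caused by a full disk; check the logs and free space before recreating the container:

```shell
docker compose logs --tail=50 weaviate
docker compose restart weaviate
```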

Port 80 already in use
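
Identify the conflicting listener, then either stop it or move Dify to another port (see Changing the exposed port above):

```shell
ss -tlnp | grep ':80 '          # what is holding port 80?
# Either stop that service, or move Dify:
#   set EXPOSE_NGINX_PORT=8080 in dify/docker/.env, then
docker compose down && docker compose up -d
```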


Further Reading

  • Dify documentation: docs.dify.ai

  • Dify source on GitHub: github.com/langgenius/dify

  • language-models/ollama.md (Ollama deployment guide)

  • language-models/vllm.md (vLLM deployment guide)
