# API Integration

> 💡 **Recommended:** Use the official [clore-ai Python SDK](/guides/advanced/python-sdk.md) instead of raw HTTP requests for managing Clore.ai servers and orders. Built-in rate limiting, retries, type safety, and async support.

Integrate AI models running on CLORE.AI into your applications.

{% hint style="success" %}
Deploy API servers at [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Quick Start

Most AI services on CLORE.AI provide OpenAI-compatible APIs: point your client at your server's base URL and you're ready.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://<your-clore-server>:8000/v1",
    api_key="not-needed"  # Most self-hosted don't require key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

***

## LLM APIs

### vLLM (OpenAI Compatible)

**Server setup:**

```bash
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 8000
```

**Python client:**

```python
from openai import OpenAI

client = OpenAI(base_url="http://server:8000/v1", api_key="dummy")

# Chat completion
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about coding"}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

**Node.js client:**

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
    baseURL: 'http://server:8000/v1',
    apiKey: 'dummy'
});

async function chat(message) {
    const response = await client.chat.completions.create({
        model: 'meta-llama/Llama-3.1-8B-Instruct',
        messages: [{ role: 'user', content: message }]
    });
    return response.choices[0].message.content;
}

// Streaming
async function streamChat(message) {
    const stream = await client.chat.completions.create({
        model: 'meta-llama/Llama-3.1-8B-Instruct',
        messages: [{ role: 'user', content: message }],
        stream: true
    });

    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content || '');
    }
}
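
// Usage (top-level await works in an ES module)
console.log(await chat('Hello!'));
await streamChat('Tell me a story');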
```

**cURL:**

```bash
curl http://server:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
    }'
```

### Ollama API

**Python:**

```python
import json
import requests

# Generate
response = requests.post('http://server:11434/api/generate', json={
    'model': 'llama3.2',
    'prompt': 'Why is the sky blue?',
    'stream': False
})
print(response.json()['response'])

# Chat
response = requests.post('http://server:11434/api/chat', json={
    'model': 'llama3.2',
    'messages': [
        {'role': 'user', 'content': 'Hello!'}
    ],
    'stream': False
})
print(response.json()['message']['content'])

# Streaming
response = requests.post('http://server:11434/api/chat', json={
    'model': 'llama3.2',
    'messages': [{'role': 'user', 'content': 'Tell me a story'}],
    'stream': True
}, stream=True)

for line in response.iter_lines():
    if line:
        data = json.loads(line)
        print(data['message']['content'], end='', flush=True)
```

**Ollama also exposes an OpenAI-compatible API:**

```python
from openai import OpenAI

client = OpenAI(base_url='http://server:11434/v1', api_key='ollama')
# Use same code as vLLM examples
```

### TGI API

**Python:**

```python
import json
import requests

# Generate
response = requests.post('http://server:8080/generate', json={
    'inputs': 'What is machine learning?',
    'parameters': {
        'max_new_tokens': 200,
        'temperature': 0.7,
        'do_sample': True
    }
})
print(response.json()['generated_text'])

# Streaming
response = requests.post('http://server:8080/generate_stream', json={
    'inputs': 'Explain quantum computing',
    'parameters': {'max_new_tokens': 500}
}, stream=True)

for line in response.iter_lines():
    if line:
        # SSE lines are prefixed with "data:"; strip only that prefix
        data = json.loads(line.decode().removeprefix('data:'))
        print(data.get('token', {}).get('text', ''), end='', flush=True)
```
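
**TGI also supports the OpenAI format (Messages API, TGI v1.4+):**

A minimal sketch; TGI serves a single model, and per the TGI docs the `model` field can simply be set to `"tgi"`:

```python
from openai import OpenAI

client = OpenAI(base_url='http://server:8080/v1', api_key='dummy')

response = client.chat.completions.create(
    model='tgi',  # TGI ignores the name and routes to its loaded model
    messages=[{'role': 'user', 'content': 'What is machine learning?'}]
)
print(response.choices[0].message.content)
```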

***

## Image Generation APIs

### Stable Diffusion WebUI API

**Enable the API:** add the `--api` flag to the WebUI launch command.

**Python:**

```python
import requests
import base64
from PIL import Image
from io import BytesIO

def txt2img(prompt, negative_prompt="", steps=20, width=512, height=512):
    response = requests.post('http://server:7860/sdapi/v1/txt2img', json={
        'prompt': prompt,
        'negative_prompt': negative_prompt,
        'steps': steps,
        'width': width,
        'height': height,
        'sampler_name': 'DPM++ 2M Karras',
        'cfg_scale': 7
    })

    # Decode base64 image
    image_data = base64.b64decode(response.json()['images'][0])
    return Image.open(BytesIO(image_data))

# Generate
image = txt2img(
    prompt="A beautiful sunset over mountains, photorealistic, 8k",
    negative_prompt="blurry, low quality"
)
image.save("output.png")

# img2img
def img2img(prompt, image_path, denoising=0.5):
    with open(image_path, 'rb') as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post('http://server:7860/sdapi/v1/img2img', json={
        'prompt': prompt,
        'init_images': [image_b64],
        'denoising_strength': denoising,
        'steps': 30
    })

    image_data = base64.b64decode(response.json()['images'][0])
    return Image.open(BytesIO(image_data))
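
# Usage (assumes a local "photo.png" to transform)
result = img2img("Turn this into a watercolor painting", "photo.png", denoising=0.6)
result.save("output_img2img.png")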
```

**Node.js:**

```javascript
const axios = require('axios');
const fs = require('fs');

async function txt2img(prompt) {
    const response = await axios.post('http://server:7860/sdapi/v1/txt2img', {
        prompt: prompt,
        steps: 20,
        width: 512,
        height: 512
    });

    const imageBuffer = Buffer.from(response.data.images[0], 'base64');
    fs.writeFileSync('output.png', imageBuffer);
}
```

### ComfyUI API

**Python:**

```python
import json
import urllib.parse
import urllib.request

SERVER = "server:8188"

def queue_prompt(workflow):
    """Queue a workflow for execution"""
    data = json.dumps({"prompt": workflow}).encode('utf-8')
    req = urllib.request.Request(
        f"http://{SERVER}/prompt",
        data=data,
        headers={"Content-Type": "application/json"}
    )
    return json.loads(urllib.request.urlopen(req).read())

def get_image(filename, subfolder, folder_type):
    """Download generated image"""
    params = urllib.parse.urlencode({
        "filename": filename,
        "subfolder": subfolder,
        "type": folder_type
    })
    with urllib.request.urlopen(f"http://{SERVER}/view?{params}") as response:
        return response.read()

# Load workflow from file
with open('workflow.json') as f:
    workflow = json.load(f)

# Modify the positive prompt (node ids depend on your workflow)
workflow["6"]["inputs"]["text"] = "A cat wearing a hat"

# Queue and get result
result = queue_prompt(workflow)
print(f"Queued: {result}")
```
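
To retrieve the finished images without a WebSocket, a common pattern is to poll ComfyUI's `/history/<prompt_id>` endpoint until the job appears, then download each output with the `get_image` helper defined above. A minimal sketch, continuing the script:

```python
import time

prompt_id = result['prompt_id']

# Poll until the prompt shows up in history (i.e. execution finished)
while True:
    with urllib.request.urlopen(f"http://{SERVER}/history/{prompt_id}") as resp:
        history = json.loads(resp.read())
    if prompt_id in history:
        break
    time.sleep(1)

# Save every output image (node ids and filenames are workflow-specific)
for node_output in history[prompt_id]['outputs'].values():
    for img in node_output.get('images', []):
        with open(img['filename'], 'wb') as f:
            f.write(get_image(img['filename'], img['subfolder'], img['type']))
```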

**WebSocket for progress:**

```python
import json
import websocket  # pip install websocket-client

SERVER = "server:8188"  # same ComfyUI host as above

def on_message(ws, message):
    data = json.loads(message)
    if data['type'] == 'progress':
        print(f"Progress: {data['data']['value']}/{data['data']['max']}")
    elif data['type'] == 'executed':
        print("Generation complete!")

ws = websocket.WebSocketApp(
    f"ws://{SERVER}/ws",
    on_message=on_message
)
ws.run_forever()
```

### FLUX with Diffusers

```python
import torch
from diffusers import FluxPipeline
import base64
from io import BytesIO

# Load model (do once)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

def generate_image(prompt, height=1024, width=1024):
    image = pipe(
        prompt,
        height=height,
        width=width,
        num_inference_steps=4,
        guidance_scale=0.0
    ).images[0]
    return image

# Simple API wrapper with Flask
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    image = generate_image(data['prompt'])

    # Convert to base64
    buffer = BytesIO()
    image.save(buffer, format='PNG')
    img_b64 = base64.b64encode(buffer.getvalue()).decode()

    return jsonify({'image': img_b64})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
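
Calling the wrapper from a client is then a plain HTTP request; a small sketch against the `/generate` endpoint defined above:

```python
import base64
import requests

response = requests.post('http://server:5000/generate',
                         json={'prompt': 'A cat wearing a hat'},
                         timeout=300)  # generation can take a while

with open('flux_output.png', 'wb') as f:
    f.write(base64.b64decode(response.json()['image']))
```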

***

## Audio APIs

### Whisper Transcription

**Using whisper-asr-webservice:**

```python
import requests

def transcribe(audio_path):
    with open(audio_path, 'rb') as f:
        response = requests.post(
            'http://server:9000/asr',
            files={'audio_file': f},
            data={
                'task': 'transcribe',
                'language': 'en',
                'output': 'json'
            }
        )
    return response.json()['text']

text = transcribe('audio.mp3')
print(text)
```

**Direct Whisper API:**

```python
import whisper
from flask import Flask, request, jsonify

model = whisper.load_model("large-v3")

app = Flask(__name__)

@app.route('/transcribe', methods=['POST'])
def transcribe():
    audio = request.files['audio']
    audio.save('/tmp/audio.mp3')

    result = model.transcribe('/tmp/audio.mp3')
    return jsonify({'text': result['text']})
```
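
A matching client call, posting an audio file to the endpoint above (assuming the Flask app is served on port 5000):

```python
import requests

with open('audio.mp3', 'rb') as f:
    response = requests.post('http://server:5000/transcribe',
                             files={'audio': f})
print(response.json()['text'])
```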

### Text-to-Speech (Bark)

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
import base64
from flask import Flask, request, jsonify

preload_models()

app = Flask(__name__)

@app.route('/tts', methods=['POST'])
def text_to_speech():
    text = request.json['text']
    audio = generate_audio(text)

    # Save to file
    write_wav('/tmp/output.wav', SAMPLE_RATE, audio)

    # Return base64
    with open('/tmp/output.wav', 'rb') as f:
        audio_b64 = base64.b64encode(f.read()).decode()

    return jsonify({'audio': audio_b64})
```
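
On the client side, decode the base64 payload back into a WAV file (again assuming the service runs on port 5000):

```python
import base64
import requests

response = requests.post('http://server:5000/tts',
                         json={'text': 'Hello from CLORE.AI!'})

with open('speech.wav', 'wb') as f:
    f.write(base64.b64decode(response.json()['audio']))
```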

***

## Building Applications

### Chat Application

```python
from flask import Flask, request, jsonify, Response
from openai import OpenAI
import json

app = Flask(__name__)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

@app.route('/chat', methods=['POST'])
def chat():
    messages = request.json.get('messages', [])

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=messages,
        temperature=0.7
    )

    return jsonify({
        'response': response.choices[0].message.content
    })

@app.route('/chat/stream', methods=['POST'])
def chat_stream():
    messages = request.json.get('messages', [])

    def generate():
        stream = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=messages,
            stream=True
        )
        for chunk in stream:
            if chunk.choices[0].delta.content:
                yield f"data: {json.dumps({'content': chunk.choices[0].delta.content})}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
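
Consuming the streaming endpoint from Python follows the usual SSE pattern; a minimal sketch that matches the `data: ...` framing emitted above:

```python
import json
import requests

with requests.post('http://server:5000/chat/stream',
                   json={'messages': [{'role': 'user', 'content': 'Hello!'}]},
                   stream=True) as response:
    for line in response.iter_lines():
        if not line or not line.startswith(b'data: '):
            continue
        payload = line.decode().removeprefix('data: ')
        if payload == '[DONE]':
            break
        print(json.loads(payload)['content'], end='', flush=True)
```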

### Image Generation Service

```python
from flask import Flask, request, jsonify, send_file
import requests
import base64
from io import BytesIO

app = Flask(__name__)
SD_API = "http://localhost:7860"

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json

    response = requests.post(f'{SD_API}/sdapi/v1/txt2img', json={
        'prompt': data['prompt'],
        'negative_prompt': data.get('negative_prompt', ''),
        'steps': data.get('steps', 20),
        'width': data.get('width', 512),
        'height': data.get('height', 512)
    })

    image_b64 = response.json()['images'][0]

    if data.get('return_base64'):
        return jsonify({'image': image_b64})

    # Return as file
    image_data = base64.b64decode(image_b64)
    return send_file(BytesIO(image_data), mimetype='image/png')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

### Multi-Modal Pipeline

```python
from flask import Flask, request, jsonify
from openai import OpenAI
import requests
import base64

app = Flask(__name__)
llm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
SD_API = "http://localhost:7860"

@app.route('/create-image-from-description', methods=['POST'])
def create_image():
    description = request.json['description']

    # Step 1: Generate detailed prompt with LLM
    prompt_response = llm_client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "user",
            "content": f"Create a detailed image generation prompt for: {description}. Include style, lighting, and composition details. Return only the prompt, no explanation."
        }]
    )
    detailed_prompt = prompt_response.choices[0].message.content

    # Step 2: Generate image
    image_response = requests.post(f'{SD_API}/sdapi/v1/txt2img', json={
        'prompt': detailed_prompt,
        'steps': 25,
        'width': 1024,
        'height': 1024
    })

    return jsonify({
        'prompt_used': detailed_prompt,
        'image': image_response.json()['images'][0]
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

***

## Error Handling

```python
from openai import OpenAI, APIError, APIConnectionError
import time

client = OpenAI(base_url="http://server:8000/v1", api_key="dummy")

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="meta-llama/Llama-3.1-8B-Instruct",
                messages=messages,
                timeout=60
            )
            return response.choices[0].message.content

        except APIConnectionError as e:
            print(f"Connection error (attempt {attempt + 1}): {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise

        except APIError as e:
            print(f"API error: {e}")
            raise

# Usage
try:
    result = chat_with_retry([{"role": "user", "content": "Hello"}])
    print(result)
except Exception as e:
    print(f"Failed after retries: {e}")
```

***

## Best Practices

1. **Connection pooling** - Reuse HTTP connections
2. **Async requests** - Use aiohttp for concurrent calls (see the sketch below)
3. **Timeouts** - Always set request timeouts
4. **Retry logic** - Handle temporary failures
5. **Rate limiting** - Don't overwhelm the server
6. **Health checks** - Monitor server availability
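
As an illustration of items 2, 3, and 6, here is a minimal `aiohttp` sketch: one pooled session fires several chat requests concurrently under a timeout, after a quick health probe (vLLM exposes `GET /health`; other servers may use a different path):

```python
import asyncio
import aiohttp

BASE = 'http://server:8000'

async def chat(session, prompt):
    async with session.post(f'{BASE}/v1/chat/completions', json={
        'model': 'meta-llama/Llama-3.1-8B-Instruct',
        'messages': [{'role': 'user', 'content': prompt}]
    }) as resp:
        data = await resp.json()
        return data['choices'][0]['message']['content']

async def healthy(session):
    try:
        async with session.get(f'{BASE}/health') as resp:
            return resp.status == 200
    except aiohttp.ClientError:
        return False

async def main():
    timeout = aiohttp.ClientTimeout(total=60)  # always set timeouts
    async with aiohttp.ClientSession(timeout=timeout) as session:
        if not await healthy(session):
            raise RuntimeError('LLM server is not responding')
        prompts = ['Hello!', 'Tell me a joke', 'Summarize CLORE.AI in one line']
        # One pooled session, several concurrent requests
        results = await asyncio.gather(*(chat(session, p) for p in prompts))
        for r in results:
            print(r)

asyncio.run(main())
```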

***

## Next Steps

* [Batch Processing](/guides/advanced/batch-processing.md) - Process large workloads
* [Multi-GPU Setup](/guides/advanced/multi-gpu-setup.md) - Scale your deployment
* [LLM Comparison](/guides/comparisons/llm-serving-comparison.md) - Choose the right server


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/advanced/api-integration.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
