# API Integration

> 💡 **Recommended:** Use the official [clore-ai Python SDK](https://docs.clore.ai/guides/guides_v2-fr/avance/python-sdk) instead of raw HTTP requests to manage Clore.ai servers and orders. It ships with built-in rate limiting, retries, type safety, and async support.

Integrate AI models running on CLORE.AI into your applications.

{% hint style="success" %}
Deploy API servers on the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Quick Start

Most AI services on CLORE.AI expose OpenAI-compatible APIs. Swap in your server's base URL and you're ready to go.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://<votre-serveur-clore>:8000/v1",
    api_key="pas-nécessaire"  # La plupart des instances auto-hébergées n'exigent pas de clé
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Bonjour !"}]
)
print(response.choices[0].message.content)
```

***

## LLM APIs

### vLLM (OpenAI-compatible)

**Server setup:**

```bash
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 8000
```

**Python client:**

```python
from openai import OpenAI

client = OpenAI(base_url="http://server:8000/v1", api_key="dummy")

# Chat completion
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "Vous êtes un assistant serviable."},
        {"role": "user", "content": "Écris un poème sur la programmation"}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Raconte-moi une histoire"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

**Node.js client:**

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
    baseURL: 'http://server:8000/v1',
    apiKey: 'dummy'
});

async function chat(message) {
    const response = await client.chat.completions.create({
        model: 'meta-llama/Llama-3.1-8B-Instruct',
        messages: [{ role: 'user', content: message }]
    });
    return response.choices[0].message.content;
}

// Streaming
async function streamChat(message) {
    const stream = await client.chat.completions.create({
        model: 'meta-llama/Llama-3.1-8B-Instruct',
        messages: [{ role: 'user', content: message }],
        stream: true
    });

    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content || '');
    }
}
```

**cURL:**

```bash
curl http://server:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Bonjour !"}]
    }'
```
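
**Listing available models:** vLLM serves the standard `/v1/models` endpoint, so if you are unsure which model identifier the server expects, you can ask the same client. A minimal sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://server:8000/v1", api_key="dummy")

# Query the models the server is currently serving and pick the first one
models = client.models.list()
model_name = models.data[0].id
print(f"Serving: {model_name}")
```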

### Ollama API

**Python:**

```python
import requests
import json

# Generate
response = requests.post('http://server:11434/api/generate', json={
    'model': 'llama3.2',
    'prompt': 'Why is the sky blue?',
    'stream': False
})
print(response.json()['response'])

# Chat
response = requests.post('http://server:11434/api/chat', json={
    'model': 'llama3.2',
    'messages': [
        {'role': 'user', 'content': 'Hello!'}
    ],
    'stream': False
})
print(response.json()['message']['content'])

# Streaming
response = requests.post('http://server:11434/api/chat', json={
    'model': 'llama3.2',
    'messages': [{'role': 'user', 'content': 'Tell me a story'}],
    'stream': True
}, stream=True)

for line in response.iter_lines():
    if line:
        data = json.loads(line)
        print(data['message']['content'], end='', flush=True)
```

**Ollama also supports the OpenAI format:**

```python
from openai import OpenAI

client = OpenAI(base_url='http://server:11434/v1', api_key='ollama')
# Use the same code as in the vLLM examples
```

### TGI API

**Python:**

```python
import requests
import json

# Generate
response = requests.post('http://server:8080/generate', json={
    'inputs': 'What is machine learning?',
    'parameters': {
        'max_new_tokens': 200,
        'temperature': 0.7,
        'do_sample': True
    }
})
print(response.json()['generated_text'])

# Streaming
response = requests.post('http://server:8080/generate_stream', json={
    'inputs': 'Explain quantum computing',
    'parameters': {'max_new_tokens': 500}
}, stream=True)

for line in response.iter_lines():
    if line:
        data = json.loads(line.decode().replace('data:', ''))
        print(data.get('token', {}).get('text', ''), end='', flush=True)
```
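
**OpenAI-compatible endpoint:** recent TGI versions also expose a Messages API at `/v1/chat/completions`, so the OpenAI client from the vLLM examples can usually be reused as-is; check your TGI version if the endpoint returns 404. A sketch, assuming the Messages API is available (TGI accepts a placeholder model name such as `tgi`):

```python
from openai import OpenAI

client = OpenAI(base_url='http://server:8080/v1', api_key='dummy')

response = client.chat.completions.create(
    model='tgi',  # placeholder; TGI serves whichever model it was launched with
    messages=[{'role': 'user', 'content': 'What is machine learning?'}]
)
print(response.choices[0].message.content)
```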

***

## Image Generation APIs

### Stable Diffusion WebUI API

**Enable the API:** add `--api` to the WebUI launch command.

**Python:**

```python
import requests
import base64
from PIL import Image
from io import BytesIO

def txt2img(prompt, negative_prompt="", steps=20, width=512, height=512):
    response = requests.post('http://server:7860/sdapi/v1/txt2img', json={
        'prompt': prompt,
        'negative_prompt': negative_prompt,
        'steps': steps,
        'width': width,
        'height': height,
        'sampler_name': 'DPM++ 2M Karras',
        'cfg_scale': 7
    })

    # Decode the base64-encoded image
    image_data = base64.b64decode(response.json()['images'][0])
    return Image.open(BytesIO(image_data))

# Generate
image = txt2img(
    prompt="A beautiful sunset over mountains, photorealistic, 8k",
    negative_prompt="blurry, low quality"
)
image.save("output.png")

# img2img
def img2img(prompt, image_path, denoising=0.5):
    with open(image_path, 'rb') as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post('http://server:7860/sdapi/v1/img2img', json={
        'prompt': prompt,
        'init_images': [image_b64],
        'denoising_strength': denoising,
        'steps': 30
    })

    image_data = base64.b64decode(response.json()['images'][0])
    return Image.open(BytesIO(image_data))
```

**Node.js:**

```javascript
const axios = require('axios');
const fs = require('fs');

async function txt2img(prompt) {
    const response = await axios.post('http://server:7860/sdapi/v1/txt2img', {
        prompt: prompt,
        steps: 20,
        width: 512,
        height: 512
    });

    const imageBuffer = Buffer.from(response.data.images[0], 'base64');
    fs.writeFileSync('output.png', imageBuffer);
}
```

### ComfyUI API

**Python:**

```python
import json
import urllib.request
import urllib.parse

SERVER = "server:8188"

def queue_prompt(workflow):
    """Mettre en file d'exécution un workflow"""
    data = json.dumps({"prompt": workflow}).encode('utf-8')
    req = urllib.request.Request(f"http://{SERVER}/prompt", data=data)
    return json.loads(urllib.request.urlopen(req).read())

def get_image(filename, subfolder, folder_type):
    """Télécharger l'image générée"""
    params = urllib.parse.urlencode({
        "filename": filename,
        "subfolder": subfolder,
        "type": folder_type
    })
    with urllib.request.urlopen(f"http://{SERVER}/view?{params}") as response:
        return response.read()

# Load the workflow from a file
with open('workflow.json') as f:
    workflow = json.load(f)

# Modify the prompt
workflow["6"]["inputs"]["text"] = "A cat wearing a hat"

# Queue it and get the prompt ID back
result = queue_prompt(workflow)
print(f"Queued: {result}")
```
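
**Retrieving the result:** `queue_prompt` returns a `prompt_id`, and once execution finishes the output filenames can be looked up via ComfyUI's `/history` endpoint and downloaded with the `get_image` helper above. A minimal polling sketch, continuing the example (the exact output node layout depends on your workflow):

```python
import time

def get_images(prompt_id):
    """Poll /history until the prompt has finished, then download its images."""
    while True:
        with urllib.request.urlopen(f"http://{SERVER}/history/{prompt_id}") as response:
            history = json.loads(response.read())
        if prompt_id in history:
            break
        time.sleep(1)

    images = []
    for node_output in history[prompt_id]["outputs"].values():
        for img in node_output.get("images", []):
            images.append(get_image(img["filename"], img["subfolder"], img["type"]))
    return images

image_bytes = get_images(result["prompt_id"])[0]
with open("comfyui_output.png", "wb") as f:
    f.write(image_bytes)
```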

**WebSocket for progress updates:**

```python
import websocket
import json

SERVER = "server:8188"

def on_message(ws, message):
    data = json.loads(message)
    if data['type'] == 'progress':
        print(f"Progress: {data['data']['value']}/{data['data']['max']}")
    elif data['type'] == 'executed':
        print("Generation finished!")

ws = websocket.WebSocketApp(
    f"ws://{SERVER}/ws",
    on_message=on_message
)
ws.run_forever()
```

### FLUX with Diffusers

```python
import torch
from diffusers import FluxPipeline
import base64
from io import BytesIO

# Load the model (once, at startup)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

def generate_image(prompt, height=1024, width=1024):
    image = pipe(
        prompt,
        height=height,
        width=width,
        num_inference_steps=4,
        guidance_scale=0.0
    ).images[0]
    return image

# Simple API wrapper with Flask
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    image = generate_image(data['prompt'])

    # Convert to base64
    buffer = BytesIO()
    image.save(buffer, format='PNG')
    img_b64 = base64.b64encode(buffer.getvalue()).decode()

    return jsonify({'image': img_b64})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
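
**Calling the wrapper:** a small client for the Flask endpoint above, which posts a prompt and decodes the returned base64 PNG:

```python
import base64
import requests

response = requests.post('http://server:5000/generate', json={
    'prompt': 'A beautiful sunset over mountains, photorealistic'
}, timeout=300)

# The wrapper returns the PNG as a base64 string under the 'image' key
with open('flux_output.png', 'wb') as f:
    f.write(base64.b64decode(response.json()['image']))
```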

***

## Audio APIs

### Whisper Transcription

**Using whisper-asr-webservice:**

```python
import requests

def transcribe(audio_path):
    with open(audio_path, 'rb') as f:
        response = requests.post(
            'http://server:9000/asr',
            files={'audio_file': f},
            data={
                'task': 'transcribe',
                'language': 'en',
                'output': 'json'
            }
        )
    return response.json()['text']

text = transcribe('audio.mp3')
print(text)
```

**Direct Whisper API:**

```python
import whisper
from flask import Flask, request, jsonify

model = whisper.load_model("large-v3")

app = Flask(__name__)

@app.route('/transcribe', methods=['POST'])
def transcribe():
    audio = request.files['audio']
    audio.save('/tmp/audio.mp3')

    result = model.transcribe('/tmp/audio.mp3')
    return jsonify({'text': result['text']})
```

### Text-to-Speech (Bark)

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
import base64
from flask import Flask, request, jsonify

preload_models()

app = Flask(__name__)

@app.route('/tts', methods=['POST'])
def text_to_speech():
    text = request.json['text']
    audio = generate_audio(text)

    # Save to a file
    write_wav('/tmp/output.wav', SAMPLE_RATE, audio)

    # Return as base64
    with open('/tmp/output.wav', 'rb') as f:
        audio_b64 = base64.b64encode(f.read()).decode()

    return jsonify({'audio': audio_b64})
```

***

## Building Applications

### Chat application

```python
from flask import Flask, request, jsonify, Response
from openai import OpenAI
import json

app = Flask(__name__)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

@app.route('/chat', methods=['POST'])
def chat():
    messages = request.json.get('messages', [])

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=messages,
        temperature=0.7
    )

    return jsonify({
        'response': response.choices[0].message.content
    })

@app.route('/chat/stream', methods=['POST'])
def chat_stream():
    messages = request.json.get('messages', [])

    def generate():
        stream = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=messages,
            stream=True
        )
        for chunk in stream:
            if chunk.choices[0].delta.content:
                yield f"data: {json.dumps({'content': chunk.choices[0].delta.content})}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
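
**Consuming the stream:** `/chat/stream` emits Server-Sent Events, one `data:` line per token chunk, terminated by `[DONE]`. A minimal client sketch using `requests`:

```python
import json
import requests

response = requests.post('http://server:5000/chat/stream', json={
    'messages': [{'role': 'user', 'content': 'Tell me a story'}]
}, stream=True)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode().removeprefix('data: ')
    if payload == '[DONE]':
        break
    print(json.loads(payload)['content'], end='', flush=True)
```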

### Image generation service

```python
from flask import Flask, request, jsonify, send_file
import requests
import base64
from io import BytesIO

app = Flask(__name__)
SD_API = "http://localhost:7860"

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json

    response = requests.post(f'{SD_API}/sdapi/v1/txt2img', json={
        'prompt': data['prompt'],
        'negative_prompt': data.get('negative_prompt', ''),
        'steps': data.get('steps', 20),
        'width': data.get('width', 512),
        'height': data.get('height', 512)
    })

    image_b64 = response.json()['images'][0]

    if data.get('return_base64'):
        return jsonify({'image': image_b64})

    # Return as a file
    image_data = base64.b64decode(image_b64)
    return send_file(BytesIO(image_data), mimetype='image/png')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

### Multimodal pipeline

```python
from flask import Flask, request, jsonify
from openai import OpenAI
import requests
import base64

app = Flask(__name__)
llm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
SD_API = "http://localhost:7860"

@app.route('/create-image-from-description', methods=['POST'])
def create_image():
    description = request.json['description']

    # Step 1: Generate a detailed prompt with the LLM
    prompt_response = llm_client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "user",
            "content": f"Create a detailed image generation prompt for: {description}. Include style, lighting, and composition details. Return only the prompt, no explanation."
        }]
    )
    detailed_prompt = prompt_response.choices[0].message.content

    # Step 2: Generate the image
    image_response = requests.post(f'{SD_API}/sdapi/v1/txt2img', json={
        'prompt': detailed_prompt,
        'steps': 25,
        'width': 1024,
        'height': 1024
    })

    return jsonify({
        'prompt_used': detailed_prompt,
        'image': image_response.json()['images'][0]
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

***

## Error Handling

```python
from openai import OpenAI, APIError, APIConnectionError
import time

client = OpenAI(base_url="http://server:8000/v1", api_key="dummy")

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="meta-llama/Llama-3.1-8B-Instruct",
                messages=messages,
                timeout=60
            )
            return response.choices[0].message.content

        except APIConnectionError as e:
            print(f"Erreur de connexion (tentative {attempt + 1}) : {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Recul exponentiel
            else:
                raise

        except APIError as e:
            print(f"Erreur API : {e}")
            raise

# Usage
try:
    result = chat_with_retry([{"role": "user", "content": "Hello"}])
    print(result)
except Exception as e:
    print(f"Échec après réessais : {e}")
```

***

## Best Practices

1. **Connection pooling** - Reuse HTTP connections
2. **Async requests** - Use aiohttp for concurrent calls (see the sketch after this list)
3. **Timeouts** - Always set request timeouts
4. **Retry logic** - Handle transient failures
5. **Rate limiting** - Don't overload the server
6. **Health checks** - Monitor server availability
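
The sketch below combines several of these points using `aiohttp`: one shared session (connection pooling), a semaphore as a simple rate limit, per-request timeouts, retries with backoff, and a health check (vLLM exposes a `/health` endpoint; adjust for other backends):

```python
import asyncio
import aiohttp

BASE_URL = "http://server:8000"
SEMAPHORE = asyncio.Semaphore(4)           # at most 4 requests in flight
TIMEOUT = aiohttp.ClientTimeout(total=60)  # per-request timeout

async def chat(session, message, retries=3):
    payload = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": message}],
    }
    for attempt in range(retries):
        try:
            async with SEMAPHORE:
                async with session.post(f"{BASE_URL}/v1/chat/completions", json=payload) as resp:
                    resp.raise_for_status()
                    data = await resp.json()
                    return data["choices"][0]["message"]["content"]
        except (aiohttp.ClientError, asyncio.TimeoutError):
            if attempt == retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # exponential backoff

async def healthy(session):
    async with session.get(f"{BASE_URL}/health") as resp:
        return resp.status == 200

async def main():
    # One session = one connection pool, reused for every request
    async with aiohttp.ClientSession(timeout=TIMEOUT) as session:
        if not await healthy(session):
            raise RuntimeError("Server is not ready")
        prompts = [f"Summarize topic #{i}" for i in range(10)]
        results = await asyncio.gather(*(chat(session, p) for p in prompts))
        print(results[0])

asyncio.run(main())
```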

***

## Next Steps

* [Batch Processing](https://docs.clore.ai/guides/guides_v2-fr/avance/batch-processing) - Handle heavy workloads
* [Multi-GPU Setup](https://docs.clore.ai/guides/guides_v2-fr/avance/multi-gpu-setup) - Scale out your deployment
* [LLM Serving Comparison](https://docs.clore.ai/guides/guides_v2-fr/comparaisons/llm-serving-comparison) - Choose the right serving stack
