# MiniMax Speech 2.6

{% hint style="success" %}
**Released:** March 4, 2026 — MiniMax Speech 2.6 brings ultra-low latency, improved audio format handling, and human-like voices to real-time Voice Agent scenarios.
{% endhint %}

**MiniMax Speech 2.6** is a state-of-the-art text-to-speech model designed for real-time voice agent applications. It offers ultra-low end-to-end latency, improved audio format handling (MP3, PCM, WAV, FLAC), and noticeably more natural speech than earlier Speech releases. The model is API-only; you integrate it into self-hosted pipelines through the MiniMax REST API.

### Key Features

| Feature        | Details                                           |
| -------------- | ------------------------------------------------- |
| Latency        | Ultra-low (< 300ms TTFB)                          |
| Voice Quality  | Human-like, natural prosody                       |
| Languages      | 20+ languages including English, Chinese, Russian |
| Output Formats | MP3, PCM, WAV, FLAC                               |
| Use Case       | Voice agents, real-time TTS, streaming            |
| API            | OpenAI-compatible REST API                        |

### Why MiniMax Speech 2.6?

* **Sub-300ms latency** — suitable for real-time conversation agents
* **Streaming support** — token-by-token audio streaming for lowest perceived latency
* **Voice cloning** — clone from short audio samples
* **Production-ready** — powers MiniMax's own commercial voice products

***

## Setup: Self-Hosted API Proxy on Clore.ai

MiniMax Speech 2.6 is currently API-based. You can run a lightweight FastAPI proxy on a small Clore.ai server (even CPU-only) to integrate it into your pipeline:

```yaml
version: "3.8"
services:
  minimax-proxy:
    image: python:3.11-slim
    ports:
      - "8080:8080"
    environment:
      - MINIMAX_API_KEY=${MINIMAX_API_KEY}
      - MINIMAX_GROUP_ID=${MINIMAX_GROUP_ID}
    volumes:
      - ./app:/app
    command: >
      sh -c "pip install fastapi uvicorn httpx python-dotenv &&
             uvicorn app.main:app --host 0.0.0.0 --port 8080"
```
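The compose file reads `MINIMAX_API_KEY` and `MINIMAX_GROUP_ID` from the environment. The usual pattern is a `.env` file next to `docker-compose.yml`; the values below are placeholders, and both come from your MiniMax account console:

```bash
# .env: Docker Compose loads this file from the compose directory automatically
MINIMAX_API_KEY=your-minimax-api-key
MINIMAX_GROUP_ID=your-group-id

# then start the proxy:
#   docker compose up -d
```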

### Minimal FastAPI Proxy (`app/main.py`)

```python
import base64
import os

import httpx
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

MINIMAX_API_KEY = os.environ["MINIMAX_API_KEY"]
MINIMAX_GROUP_ID = os.environ["MINIMAX_GROUP_ID"]
BASE_URL = "https://api.minimax.io/v1"

class TTSRequest(BaseModel):
    text: str
    voice_id: str = "Calm_Woman"
    speed: float = 1.0
    output_format: str = "mp3"

@app.post("/tts")
async def text_to_speech(req: TTSRequest):
    """Proxy a synthesis request to the MiniMax Speech API."""
    async with httpx.AsyncClient(timeout=30) as client:
        response = await client.post(
            f"{BASE_URL}/t2a_v2?GroupId={MINIMAX_GROUP_ID}",
            headers={"Authorization": f"Bearer {MINIMAX_API_KEY}"},
            json={
                # Swap in the current Speech 2.6 model id from the MiniMax docs
                "model": "speech-02-hd",
                "text": req.text,
                "stream": False,
                "voice_setting": {
                    "voice_id": req.voice_id,
                    "speed": req.speed,
                    "vol": 1.0,
                    "pitch": 0
                },
                "audio_setting": {
                    "sample_rate": 32000,
                    "bitrate": 128000,
                    "format": req.output_format
                }
            }
        )
    if response.status_code != 200:
        raise HTTPException(status_code=502, detail="MiniMax API error")
    audio_bytes = base64.b64decode(response.json()["data"]["audio"])
    return StreamingResponse(
        iter([audio_bytes]),
        media_type=f"audio/{req.output_format}"
    )

@app.get("/health")
async def health():
    return {"status": "ok", "model": "minimax-speech-2.6"}
```
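The decode step above assumes a well-formed response. A small helper that validates the payload before decoding makes failures easier to diagnose. The `data.audio` location follows the calls in this guide; the exact error shape is an assumption to adapt to the actual MiniMax response:

```python
import base64

def extract_audio(payload: dict) -> bytes:
    """Validate a t2a_v2 JSON payload and return the raw audio bytes.

    Assumes the base64-encoded audio lives at payload["data"]["audio"],
    as in the proxy above; adjust if your API version differs.
    """
    data = payload.get("data")
    if not data or "audio" not in data:
        raise ValueError(f"unexpected MiniMax response, keys: {list(payload)}")
    return base64.b64decode(data["audio"])

# Example with a synthetic payload:
fake = {"data": {"audio": base64.b64encode(b"RIFFdata").decode()}}
print(extract_audio(fake))  # b'RIFFdata'
```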

### Usage

```bash
# Test TTS endpoint
curl -X POST http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello! This is MiniMax Speech 2.6 running on Clore.", "voice_id": "Calm_Woman"}' \
  --output output.mp3

# Play the result
ffplay output.mp3
```
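The same call from Python, for scripts that consume the proxy (host and port match the compose file above):

```python
import requests

def proxy_tts(text: str, base_url: str = "http://localhost:8080",
              voice_id: str = "Calm_Woman") -> bytes:
    """POST to the proxy's /tts endpoint and return the audio bytes."""
    resp = requests.post(
        f"{base_url}/tts",
        json={"text": text, "voice_id": voice_id},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content

# audio = proxy_tts("Hello from Clore!")
# open("output.mp3", "wb").write(audio)
```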

***

## Direct API Usage (No Server Needed)

If you just need TTS in your scripts:

```python
import requests, base64, os

API_KEY = os.environ["MINIMAX_API_KEY"]
GROUP_ID = os.environ["MINIMAX_GROUP_ID"]

def synthesize(text: str, voice_id: str = "Calm_Woman") -> bytes:
    resp = requests.post(
        f"https://api.minimax.io/v1/t2a_v2?GroupId={GROUP_ID}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            # Swap in the current Speech 2.6 model id from the MiniMax docs
            "model": "speech-02-hd",
            "text": text,
            "stream": False,
            "voice_setting": {"voice_id": voice_id, "speed": 1.0, "vol": 1.0, "pitch": 0},
            "audio_setting": {"sample_rate": 32000, "bitrate": 128000, "format": "mp3"}
        }
    )
    resp.raise_for_status()
    return base64.b64decode(resp.json()["data"]["audio"])

audio = synthesize("Running AI workloads on Clore.ai is incredibly affordable.")
with open("output.mp3", "wb") as f:
    f.write(audio)
```
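MiniMax enforces a per-request character limit (the exact ceiling varies by plan; 5000 below is an assumption, not a documented value). For long documents, a sketch of one approach: split at sentence boundaries and concatenate the segments, since MP3 frames generally play back-to-back:

```python
def chunk_text(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks under `limit` chars, breaking at sentence ends.

    Note: a single sentence longer than `limit` is not split further.
    """
    chunks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        piece = sentence if sentence.endswith(".") else sentence + "."
        if current and len(current) + len(piece) + 1 > limit:
            chunks.append(current.strip())
            current = ""
        current += piece + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks

def synthesize_long(text: str, synth) -> bytes:
    """Run `synth` (e.g. synthesize() above) on each chunk and join the audio."""
    return b"".join(synth(chunk) for chunk in chunk_text(text))

# audio = synthesize_long(long_text, synthesize)
```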

***

## Available Voice IDs

| Voice ID         | Character      | Best For              |
| ---------------- | -------------- | --------------------- |
| `Calm_Woman`     | Calm female    | Assistants, narration |
| `Energetic_Man`  | Energetic male | Marketing, news       |
| `Gentle_Man`     | Gentle male    | Audiobooks, tutorials |
| `Cute_Girl`      | Young female   | Entertainment         |
| `Deep_Voice_Man` | Deep male      | Documentaries         |

***

## GPU Requirements on Clore.ai

{% hint style="info" %}
MiniMax Speech 2.6 is an API-based model — you don't need a GPU to use it. A small CPU-only Clore.ai server ($0.10–0.30/day) is sufficient to run the proxy. Combine with other GPU workloads on the same server for maximum efficiency.
{% endhint %}

| Server Type       | Use Case                | Clore.ai Cost    |
| ----------------- | ----------------------- | ---------------- |
| CPU only (2 vCPU) | Proxy + API gateway     | \~$0.10–0.20/day |
| RTX 3060          | Proxy + local GPU tasks | \~$0.37/day      |
| RTX 4090          | Proxy + heavy GPU work  | \~$2.10/day      |
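Since the proxy is CPU-bound, the daily rates above translate directly into a monthly budget. A quick back-of-envelope helper, using the table's approximate figures rather than live marketplace prices:

```python
# Approximate daily rates (USD) from the table above; check the
# Clore.ai marketplace for current pricing.
DAILY_RATE_USD = {"cpu_2vcpu": 0.15, "rtx_3060": 0.37, "rtx_4090": 2.10}

def monthly_cost(server: str, days: int = 30) -> float:
    """Approximate cost in USD for running a server type for `days` days."""
    return round(DAILY_RATE_USD[server] * days, 2)

print(monthly_cost("rtx_3060"))  # 11.1
```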

***

## Clore.ai Port Forwarding

| Port | Service           |
| ---- | ----------------- |
| 8080 | FastAPI TTS proxy |

***

## Alternatives on Clore.ai

If you need **fully local** TTS without API calls:

| Model      | VRAM | Quality | Speed     | Guide                                                   |
| ---------- | ---- | ------- | --------- | ------------------------------------------------------- |
| Kokoro TTS | 4GB  | ⭐⭐⭐⭐    | Fast      | [Kokoro TTS](/guides/audio-and-voice/kokoro-tts.md)     |
| F5-TTS     | 8GB  | ⭐⭐⭐⭐⭐   | Medium    | [F5-TTS](/guides/audio-and-voice/f5-tts.md)             |
| Chatterbox | 6GB  | ⭐⭐⭐⭐    | Fast      | [Chatterbox](/guides/audio-and-voice/chatterbox-tts.md) |
| Qwen3-TTS  | 8GB  | ⭐⭐⭐⭐⭐   | Medium    | [Qwen3-TTS](/guides/audio-and-voice/qwen3-tts.md)       |
| Kani-TTS-2 | 3GB  | ⭐⭐⭐     | Very fast | [Kani-TTS](/guides/audio-and-voice/kani-tts.md)         |

***

## Links

* **MiniMax API Docs:** [platform.minimax.io/docs](https://platform.minimax.io/docs)
* **Speech 2.6 Blog Post:** [minimax.io/news/minimax-speech-26](https://www.minimax.io/news/minimax-speech-26)
* **Clore.ai Marketplace:** [clore.ai/marketplace](https://clore.ai/marketplace)

