# MiniMax Speech 2.6

{% hint style="success" %}
**Released:** March 4, 2026. MiniMax Speech 2.6 brings ultra-low latency, improved audio format handling, and human-like voices to real-time Voice Agent scenarios.
{% endhint %}

**MiniMax Speech 2.6** is a state-of-the-art text-to-speech model designed for real-time voice agent applications. It offers ultra-low end-to-end latency, improved audio format handling (MP3, PCM, WAV, FLAC), and noticeably more natural speech than previous Speech releases. The model is served through the MiniMax API; the sections below show how to integrate it into a self-hosted pipeline.

### Key Features

| Feature        | Details                                           |
| -------------- | ------------------------------------------------- |
| Latency        | Ultra-low (< 300ms TTFB)                          |
| Voice Quality  | Human-like, natural prosody                       |
| Languages      | 20+ languages including English, Chinese, Russian |
| Output Formats | MP3, PCM, WAV, FLAC                               |
| Use Case       | Voice agents, real-time TTS, streaming            |
| API            | OpenAI-compatible REST API                        |

### Why MiniMax Speech 2.6?

* **Sub-300ms latency** — suitable for real-time conversation agents
* **Streaming support** — token-by-token audio streaming for lowest perceived latency
* **Voice cloning** — clone from short audio samples
* **Production-ready** — powers MiniMax's own commercial voice products

***

## Setup: Self-Hosted API Proxy on Clore.ai

MiniMax Speech 2.6 is currently API-based. You can run a lightweight FastAPI proxy on a small Clore.ai server (even CPU-only) to integrate it into your pipeline:

```yaml
version: "3.8"
services:
  minimax-proxy:
    image: python:3.11-slim
    ports:
      - "8080:8080"
    environment:
      - MINIMAX_API_KEY=${MINIMAX_API_KEY}
      - MINIMAX_GROUP_ID=${MINIMAX_GROUP_ID}
    volumes:
      - ./app:/app
    command: >
      sh -c "pip install fastapi uvicorn httpx python-dotenv &&
             uvicorn app.main:app --host 0.0.0.0 --port 8080"
```
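
The compose file reads the MiniMax credentials from the environment. A simple way to supply them is a `.env` file next to `docker-compose.yml`, which Docker Compose loads automatically. The values below are placeholders; substitute your own from the MiniMax console:

```shell
# .env — placeholder credentials, never commit this file
MINIMAX_API_KEY=sk-your-api-key-here
MINIMAX_GROUP_ID=your-group-id-here
```

Note that the compose `command` installs dependencies on every container start; for production, consider baking them into a custom image instead.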

### Minimal FastAPI Proxy (`app/main.py`)

```python
import base64
import os

import httpx
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

MINIMAX_API_KEY = os.environ["MINIMAX_API_KEY"]
MINIMAX_GROUP_ID = os.environ["MINIMAX_GROUP_ID"]
BASE_URL = "https://api.minimax.io/v1"

class TTSRequest(BaseModel):
    text: str
    voice_id: str = "Calm_Woman"
    speed: float = 1.0
    output_format: str = "mp3"

@app.post("/tts")
async def text_to_speech(req: TTSRequest):
    """Proxy a synthesis request to MiniMax Speech 2.6."""
    async with httpx.AsyncClient(timeout=30) as client:
        response = await client.post(
            f"{BASE_URL}/t2a_v2?GroupId={MINIMAX_GROUP_ID}",
            headers={"Authorization": f"Bearer {MINIMAX_API_KEY}"},
            json={
                # Check the MiniMax docs for the current Speech 2.6 model ID
                "model": "speech-02-hd",
                "text": req.text,
                "stream": False,
                "voice_setting": {
                    "voice_id": req.voice_id,
                    "speed": req.speed,
                    "vol": 1.0,
                    "pitch": 0
                },
                "audio_setting": {
                    "sample_rate": 32000,
                    "bitrate": 128000,
                    "format": req.output_format
                }
            }
        )
    response.raise_for_status()
    data = response.json()
    try:
        audio_b64 = data["data"]["audio"]
    except (KeyError, TypeError):
        raise HTTPException(status_code=502, detail=f"Unexpected MiniMax response: {data}")
    audio_bytes = base64.b64decode(audio_b64)
    return StreamingResponse(
        iter([audio_bytes]),
        media_type=f"audio/{req.output_format}"
    )

@app.get("/health")
async def health():
    return {"status": "ok", "model": "minimax-speech-2.6"}
```

### Usage

```bash
# Test TTS endpoint
curl -X POST http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello! This is MiniMax Speech 2.6 running on Clore.", "voice_id": "Calm_Woman"}' \
  --output output.mp3

# Play the result
ffplay output.mp3
```
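
If you request `"format": "pcm"` instead of MP3, the API returns headerless raw audio that most players won't accept directly. Assuming the settings used above (32 kHz, mono, 16-bit samples; verify against your `audio_setting`), Python's stdlib `wave` module can wrap it into a playable WAV file:

```python
import wave

def pcm_to_wav(pcm_bytes: bytes, path: str,
               sample_rate: int = 32000, channels: int = 1,
               sample_width: int = 2) -> None:
    """Wrap raw 16-bit PCM in a WAV container so players like ffplay accept it."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)

# Example: one second of silence at 32 kHz mono
pcm_to_wav(b"\x00\x00" * 32000, "silence.wav")
```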

***

## Direct API Usage (No Server Needed)

If you just need TTS in your scripts:

```python
import base64
import os

import requests

API_KEY = os.environ["MINIMAX_API_KEY"]
GROUP_ID = os.environ["MINIMAX_GROUP_ID"]

def synthesize(text: str, voice_id: str = "Calm_Woman") -> bytes:
    """Return synthesized speech for `text` as MP3 bytes."""
    resp = requests.post(
        f"https://api.minimax.io/v1/t2a_v2?GroupId={GROUP_ID}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            # Check the MiniMax docs for the current Speech 2.6 model ID
            "model": "speech-02-hd",
            "text": text,
            "stream": False,
            "voice_setting": {"voice_id": voice_id, "speed": 1.0, "vol": 1.0, "pitch": 0},
            "audio_setting": {"sample_rate": 32000, "bitrate": 128000, "format": "mp3"}
        },
        timeout=30,
    )
    resp.raise_for_status()
    return base64.b64decode(resp.json()["data"]["audio"])

audio = synthesize("Running AI workloads on Clore.ai is incredibly affordable.")
with open("output.mp3", "wb") as f:
    f.write(audio)
```
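
TTS APIs typically cap the text length per request (check the MiniMax docs for the exact Speech 2.6 limit). A small sentence-aware chunker lets you feed long documents through `synthesize` one piece at a time; the 500-character limit below is illustrative, not MiniMax's actual limit:

```python
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks

# Synthesize each chunk and concatenate the audio:
# audio = b"".join(synthesize(part) for part in chunk_text(long_text))
```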

***

## Available Voice IDs

| Voice ID         | Character      | Best For              |
| ---------------- | -------------- | --------------------- |
| `Calm_Woman`     | Calm female    | Assistants, narration |
| `Energetic_Man`  | Energetic male | Marketing, news       |
| `Gentle_Man`     | Gentle male    | Audiobooks, tutorials |
| `Cute_Girl`      | Young female   | Entertainment         |
| `Deep_Voice_Man` | Deep male      | Documentaries         |
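
To audition the voices above, loop over the table and synthesize the same sample text with each. The helper below takes the synthesis function as an argument (for example, `synthesize` from the direct-API section), so the batching logic stays independent of the network call:

```python
from typing import Callable

# Voice IDs from the table above
VOICE_IDS = ["Calm_Woman", "Energetic_Man", "Gentle_Man", "Cute_Girl", "Deep_Voice_Man"]

def audition(synth: Callable[[str, str], bytes],
             text: str = "Hello from MiniMax Speech 2.6.") -> dict[str, bytes]:
    """Return a mapping of voice_id -> audio bytes for the same sample text."""
    return {voice_id: synth(text, voice_id) for voice_id in VOICE_IDS}

# samples = audition(synthesize)
# for voice_id, audio in samples.items():
#     with open(f"{voice_id}.mp3", "wb") as f:
#         f.write(audio)
```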

***

## GPU Requirements on Clore.ai

{% hint style="info" %}
MiniMax Speech 2.6 is an API-based model — you don't need a GPU to use it. A small CPU-only Clore.ai server ($0.10–0.30/day) is sufficient to run the proxy. Combine with other GPU workloads on the same server for maximum efficiency.
{% endhint %}

| Server Type       | Use Case                | Clore.ai Cost    |
| ----------------- | ----------------------- | ---------------- |
| CPU only (2 vCPU) | Proxy + API gateway     | \~$0.10–0.20/day |
| RTX 3060          | Proxy + local GPU tasks | \~$0.37/day      |
| RTX 4090          | Proxy + heavy GPU work  | \~$2.10/day      |

***

## Clore.ai Port Forwarding

| Port | Service           |
| ---- | ----------------- |
| 8080 | FastAPI TTS proxy |

***

## Alternatives on Clore.ai

If you need **fully local** TTS without API calls:

| Model      | VRAM | Quality | Speed     | Guide                                                                     |
| ---------- | ---- | ------- | --------- | ------------------------------------------------------------------------- |
| Kokoro TTS | 4GB  | ⭐⭐⭐⭐    | Fast      | [Kokoro TTS](https://docs.clore.ai/guides/audio-and-voice/kokoro-tts)     |
| F5-TTS     | 8GB  | ⭐⭐⭐⭐⭐   | Medium    | [F5-TTS](https://docs.clore.ai/guides/audio-and-voice/f5-tts)             |
| Chatterbox | 6GB  | ⭐⭐⭐⭐    | Fast      | [Chatterbox](https://docs.clore.ai/guides/audio-and-voice/chatterbox-tts) |
| Qwen3-TTS  | 8GB  | ⭐⭐⭐⭐⭐   | Medium    | [Qwen3-TTS](https://docs.clore.ai/guides/audio-and-voice/qwen3-tts)       |
| Kani-TTS-2 | 3GB  | ⭐⭐⭐     | Very fast | [Kani-TTS](https://docs.clore.ai/guides/audio-and-voice/kani-tts)         |

***

## Links

* **MiniMax API Docs:** [platform.minimax.io/docs](https://platform.minimax.io/docs)
* **Speech 2.6 Blog Post:** [minimax.io/news/minimax-speech-26](https://www.minimax.io/news/minimax-speech-26)
* **Clore.ai Marketplace:** [clore.ai/marketplace](https://clore.ai/marketplace)
