# Haystack AI Framework

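Haystack is deepset's open-source AI orchestration framework for building production-grade LLM applications. With 18K+ GitHub stars, it offers a flexible **pipeline-based architecture** that connects document stores, retrievers, readers, generators, and agents, all in clean, composable Python. Whether you need RAG over private documents, semantic search, or multi-step agent workflows, Haystack handles the plumbing so you can focus on application logic.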

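On Clore.ai, Haystack benefits most when you run model inference locally through Hugging Face Transformers or sentence-transformers. If you rely entirely on external APIs (OpenAI, Anthropic), a CPU-only instance is enough; for embedding generation and local LLMs, a GPU significantly reduces latency.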

{% hint style="success" %}
All examples in this guide run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

{% hint style="info" %}
This guide covers **Haystack v2.x** (the `haystack-ai` package). The v2 API differs substantially from v1 (`farm-haystack`). If you have existing v1 pipelines, see the [migration guide](https://docs.haystack.deepset.ai/docs/migration).
{% endhint %}

## Overview

| Property              | Details                                                                           |
| --------------------- | --------------------------------------------------------------------------------- |
| **Project**           | [deepset-ai/haystack](https://github.com/deepset-ai/haystack)                     |
| **License**           | Apache 2.0                                                                        |
| **GitHub stars**      | 18K+                                                                              |
| **Version**           | v2.x (`haystack-ai`)                                                              |
| **Primary use cases** | RAG, semantic search, document QA, agent workflows                                |
| **GPU support**       | Optional; needed for local embeddings and local LLMs                              |
| **Difficulty**        | Intermediate                                                                      |
| **API serving**       | Hayhooks (FastAPI-based, REST)                                                    |
| **Key integrations**  | Ollama, OpenAI, Anthropic, HuggingFace, Elasticsearch, Pinecone, Weaviate, Qdrant |

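### What You Can Build

* **RAG pipelines**: ingest documents, generate embeddings, retrieve context, answer questions
* **Semantic search**: query documents by meaning rather than keywords
* **Document processing**: parse PDF, HTML, and Word documents; split, clean, and index the content
* **Agent workflows**: multi-step reasoning with tools (web search, calculators, APIs)
* **REST API serving**: expose any Haystack pipeline as an endpoint through Hayhooks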

## Requirements

### Hardware Requirements

| Workload                                     | GPU              | VRAM  | RAM   | Disk   | Clore.ai price    |
| -------------------------------------------- | ---------------- | ----- | ----- | ------ | ----------------- |
| **API-only mode** (OpenAI/Anthropic)         | None / CPU only  | n/a   | 4 GB  | 20 GB  | ≈ $0.01–0.05/hour |
| **Local embeddings** (sentence-transformers) | RTX 3060 class   | 8 GB  | 16 GB | 30 GB  | ≈ $0.10–0.15/hour |
| **Local embeddings + small LLM** (7B)        | RTX 3090         | 24 GB | 16 GB | 50 GB  | ≈ $0.20–0.25/hour |
| **Local LLM** (13B–34B)                      | 24 GB-class GPU  | 24 GB | 32 GB | 80 GB  | ≈ $0.35–0.50/hour |
| **Large local LLM** (70B, quantized)         | 80 GB-class GPU  | 80 GB | 64 GB | 150 GB | ≈ $1.10–1.50/hour |

{% hint style="info" %}
For most RAG use cases, an **RTX 3090** at about $0.20/hour is the sweet spot: 24 GB of VRAM can hold sentence-transformer embeddings and a 7B–13B local LLM at the same time.
{% endhint %}
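
The VRAM tiers above follow from simple arithmetic on model size. The sketch below estimates the memory taken by the weights alone (parameters times bytes per parameter); it deliberately ignores the KV cache, activations, and framework overhead, so treat the result as a lower bound. The function name `weight_memory_gb` is illustrative and not part of Haystack.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Compare a few common sizes and precisions against the VRAM tiers in the table.
for label, params, bits in [("7B fp16", 7, 16), ("13B fp16", 13, 16), ("70B 4-bit", 70, 4)]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.0f} GB for weights")
```

A 13B model at fp16 already exceeds 24 GB for weights alone, which suggests the 24 GB tiers for 13B and larger models assume quantized weights.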

### Software Requirements

* Docker (preinstalled on Clore.ai servers)
* NVIDIA driver + CUDA (preinstalled on Clore.ai GPU servers)
* Python 3.10+ (inside the container)
* CUDA 11.8 or 12.x

## Quick Start

### 1. Rent a Clore.ai Server

In the [Clore.ai Marketplace](https://clore.ai/marketplace), filter by the following:

* **VRAM**: ≥ 8 GB for embedding workloads, ≥ 24 GB for local LLMs
* **Docker preinstalled**: enabled (on by default for most listings)
* **Image**: `nvidia/cuda:12.1.0-devel-ubuntu22.04` or `pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime`

Note the server's public IP and SSH port from **My Orders**.

### 2. Connect and Verify the GPU

```bash
ssh root@<clore-server-ip> -p <port>

# Verify that the GPU is available
nvidia-smi

# The output should list your GPU, the driver version, and the CUDA version
```
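
If you prefer a scripted check, the sketch below parses the CSV output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` and maps the reported VRAM onto the thresholds from step 1 (≥ 8 GB for embeddings, ≥ 24 GB for local LLMs). The helper names are illustrative, and the sample line stands in for real output so the block runs without a GPU.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpus(csv_text: str) -> list[tuple[str, float]]:
    """Turn 'name, memory_in_MiB' lines into (name, vram_gb) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, float(mem_mib) / 1024))
    return gpus


def workload_fit(vram_gb: float) -> str:
    """Map VRAM onto the workload tiers used in this guide."""
    if vram_gb >= 24:
        return "local LLM + embeddings"
    if vram_gb >= 8:
        return "embeddings"
    return "API-only"


def detect_gpus() -> list[tuple[str, float]]:
    """Run nvidia-smi on the server and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpus(out)


# Sample line for illustration; call detect_gpus() on the rented server instead.
for name, vram in parse_gpus("NVIDIA GeForce RTX 3090, 24576\n"):
    print(f"{name}: {vram:.0f} GB -> {workload_fit(vram)}")
```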

### 3. Build the Haystack Docker Image

The recommended way to install Haystack v2 is with pip. Create a custom Dockerfile:

```bash
mkdir -p /workspace/haystack-app && cd /workspace/haystack-app

cat > Dockerfile << 'EOF'
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Install Python and system dependencies
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    python3.11-dev \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Make python3.11 the default interpreter
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
RUN update-alternatives --install /usr/bin/python python python3.11 1

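# Install Haystack v2 and core dependencies
# (pypdf is required by the PyPDFToDocument converter used later in this guide)
RUN pip install --no-cache-dir \
    haystack-ai \
    hayhooks \
    sentence-transformers \
    transformers \
    torch \
    accelerate \
    pypdf \
    fastapi \
    uvicorn

# Install optional integrations
RUN pip install --no-cache-dir \
    ollama-haystack \
    haystack-experimental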

WORKDIR /app

# Default Hayhooks port
EXPOSE 1416

CMD ["hayhooks", "run", "--host", "0.0.0.0", "--port", "1416"]
EOF

# Build the image
docker build -t haystack-clore:latest .
```

### 4. Run Haystack with Hayhooks

[Hayhooks](https://github.com/deepset-ai/hayhooks) turns any Haystack pipeline into a REST API:

```bash
# Create a directory for your pipelines
mkdir -p /workspace/haystack-pipelines

# Run Hayhooks with GPU access
docker run -d \
  --name haystack \
  --gpus all \
  -p 1416:1416 \
  -v /workspace/haystack-pipelines:/app/pipelines \
  -e OPENAI_API_KEY=${OPENAI_API_KEY:-""} \
  -e HF_TOKEN=${HF_TOKEN:-""} \
  haystack-clore:latest

# Check that the service is up
curl http://localhost:1416/status
```

Expected response:

```json
{"status": "ok", "pipelines": []}
```
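
For scripted health checks, the status body can be parsed to confirm which pipelines are deployed. This is a minimal sketch that assumes the response has the shape shown above (a `pipelines` list); the helper names are illustrative and the sample string stands in for a real response.

```python
import json


def deployed_pipelines(status_body: str) -> list[str]:
    """Return the pipeline names listed in a Hayhooks status response body."""
    payload = json.loads(status_body)
    return list(payload.get("pipelines", []))


def is_deployed(status_body: str, name: str) -> bool:
    """True if the named pipeline appears in the status response."""
    return name in deployed_pipelines(status_body)


# Sample body copied from the expected response above.
sample = '{"status": "ok", "pipelines": []}'
print(deployed_pipelines(sample), is_deployed(sample, "rag_pipeline"))
```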

### 5. Create Your First RAG Pipeline

Write a pipeline YAML file that Hayhooks will serve as an endpoint:

```bash
cat > /workspace/haystack-pipelines/rag_pipeline.yml << 'EOF'
# RAG pipeline: Ollama as the LLM, local embeddings for retrieval
components:
  embedder:
    type: haystack.components.embedders.SentenceTransformersTextEmbedder
    init_parameters:
      model: BAAI/bge-small-en-v1.5

  retriever:
    type: haystack.components.retrievers.in_memory.InMemoryEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack.document_stores.in_memory.InMemoryDocumentStore

  prompt_builder:
    type: haystack.components.builders.PromptBuilder
    init_parameters:
      template: |
        Answer the question based on the context below.
        Context: {% for doc in documents %}{{ doc.content }}{% endfor %}
        Question: {{ question }}

  llm:
    type: haystack_integrations.components.generators.ollama.OllamaGenerator
    init_parameters:
      model: llama3
      url: http://host.docker.internal:11434

connections:
  - sender: embedder.embedding
    receiver: retriever.query_embedding
  - sender: retriever.documents
    receiver: prompt_builder.documents
  - sender: prompt_builder.prompt
    receiver: llm.prompt

inputs:
  query:
    - embedder.text
    - prompt_builder.question

outputs:
  answer: llm.replies
EOF
```
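
A common mistake in hand-written pipeline YAML is a connection that names a component which was never declared. The sketch below checks that every `sender`, `receiver`, input target, and output source refers to a declared component. It works on a plain dictionary mirroring the YAML above (load the file with a YAML parser of your choice); `check_wiring` is an illustrative helper, not a Haystack or Hayhooks function, and it does not validate socket names or types.

```python
def check_wiring(pipeline: dict) -> list[str]:
    """Return a list of references that point at undeclared components."""
    known = set(pipeline.get("components", {}))
    problems: list[str] = []

    def check(ref: str, where: str) -> None:
        component = ref.split(".", 1)[0]  # "embedder.embedding" -> "embedder"
        if component not in known:
            problems.append(f"{where}: '{ref}' points at unknown component '{component}'")

    for conn in pipeline.get("connections", []):
        check(conn["sender"], "connection sender")
        check(conn["receiver"], "connection receiver")
    for name, targets in pipeline.get("inputs", {}).items():
        for target in targets:
            check(target, f"input '{name}'")
    for name, source in pipeline.get("outputs", {}).items():
        check(source, f"output '{name}'")
    return problems


# Structure of rag_pipeline.yml with component bodies omitted for brevity.
RAG_PIPELINE = {
    "components": {"embedder": {}, "retriever": {}, "prompt_builder": {}, "llm": {}},
    "connections": [
        {"sender": "embedder.embedding", "receiver": "retriever.query_embedding"},
        {"sender": "retriever.documents", "receiver": "prompt_builder.documents"},
        {"sender": "prompt_builder.prompt", "receiver": "llm.prompt"},
    ],
    "inputs": {"query": ["embedder.text", "prompt_builder.question"]},
    "outputs": {"answer": "llm.replies"},
}

print(check_wiring(RAG_PIPELINE) or "wiring looks consistent")
```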

Hayhooks discovers and serves this pipeline automatically. Test it:

```bash
# List deployed pipelines
curl http://localhost:1416/pipelines

# Query the RAG pipeline
curl -X POST http://localhost:1416/rag_pipeline/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Haystack?"}'
```
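
The same query can be sent from Python with only the standard library. This sketch mirrors the `curl` call above: it assumes the `/rag_pipeline/run` endpoint and the `query` input defined in the YAML, and the function names are illustrative. Building the request is separated from sending it so the request can be inspected without a running server.

```python
import json
import urllib.request


def build_run_request(base_url: str, pipeline: str, query: str) -> urllib.request.Request:
    """Build the POST request for a pipeline's /run endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{pipeline}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_pipeline(base_url: str, pipeline: str, query: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running Hayhooks)."""
    request = build_run_request(base_url, pipeline, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request without sending it.
req = build_run_request("http://localhost:1416", "rag_pipeline", "What is Haystack?")
print(req.get_method(), req.full_url)
```

On the server, `run_pipeline("http://localhost:1416", "rag_pipeline", "What is Haystack?")` would perform the actual call.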

## Configuration

### Environment Variables

| Variable                     | Description                                 | Example               |
| ---------------------------- | ------------------------------------------- | --------------------- |
| `OPENAI_API_KEY`             | OpenAI API key for GPT models               | `sk-...`              |
| `ANTHROPIC_API_KEY`          | Anthropic API key for Claude                | `sk-ant-...`          |
| `HF_TOKEN`                   | Hugging Face token for gated models         | `hf_...`              |
| `HAYSTACK_TELEMETRY_ENABLED` | Set to `false` to disable usage telemetry   | `false`               |
| `CUDA_VISIBLE_DEVICES`       | Select a specific GPU                       | `0`                   |
| `TRANSFORMERS_CACHE`         | Cache path for Hugging Face models          | `/workspace/hf-cache` |
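
Because the earlier `docker run` command passes `${OPENAI_API_KEY:-""}`, a variable can be present in the container but empty. The sketch below reports which variables are unset or empty before a pipeline starts; `missing_vars` is an illustrative helper, and which variables you treat as required depends on the components you use.

```python
import os
from typing import Iterable, Mapping


def missing_vars(environ: Mapping[str, str], required: Iterable[str]) -> list[str]:
    """Return required variable names that are unset or set to an empty string."""
    return [name for name in required if not environ.get(name)]


# Example: a pipeline that uses a gated Hugging Face model and an OpenAI generator.
needed = ["HF_TOKEN", "OPENAI_API_KEY"]
print("Missing:", missing_vars(os.environ, needed) or "none")
```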

### Run with Full Configuration

```bash
docker run -d \
  --name haystack \
  --gpus '"device=0"' \
  -p 1416:1416 \
  -v /workspace/haystack-pipelines:/app/pipelines \
  -v /workspace/hf-cache:/root/.cache/huggingface \
  -e OPENAI_API_KEY="your-key-here" \
  -e HF_TOKEN="your-hf-token" \
  -e HAYSTACK_TELEMETRY_ENABLED=false \
  -e CUDA_VISIBLE_DEVICES=0 \
  --restart unless-stopped \
  haystack-clore:latest
```

### Document Ingestion Pipeline

Build a separate indexing pipeline to ingest documents:

```bash
cat > /workspace/index_documents.py << 'EOF'
from pathlib import Path

from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument  # requires the pypdf package
from haystack.components.preprocessors import DocumentSplitter, DocumentCleaner
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Initialize the document store (in-memory: contents are lost when the script exits)
document_store = InMemoryDocumentStore()

# Build the indexing pipeline: convert -> clean -> split -> embed -> write
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", PyPDFToDocument())
indexing_pipeline.add_component("cleaner", DocumentCleaner())
indexing_pipeline.add_component("splitter", DocumentSplitter(
    split_by="word",
    split_length=200,
    split_overlap=20
))
indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5"
))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))

# Connect the components in order
indexing_pipeline.connect("converter", "cleaner")
indexing_pipeline.connect("cleaner", "splitter")
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")

# Run indexing over every PDF in the mounted documents directory
pdf_files = sorted(Path("/data/documents").glob("*.pdf"))
indexing_pipeline.run({"converter": {"sources": pdf_files}})

print(f"Indexed {document_store.count_documents()} document chunks")
EOF

docker run --rm \
  --gpus all \
  -v /workspace:/workspace \
  -v /your/documents:/data/documents \
  -v /workspace/hf-cache:/root/.cache/huggingface \
  haystack-clore:latest \
  python3 /workspace/index_documents.py
```
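
To see what `split_length=200` and `split_overlap=20` mean in practice, the sketch below implements word-based splitting with overlap in plain Python. It illustrates the idea only and is not Haystack's `DocumentSplitter` implementation; `split_words` is a hypothetical helper. Each chunk starts `split_length - split_overlap` words after the previous one, so consecutive chunks share their boundary words.

```python
def split_words(text: str, split_length: int = 200, split_overlap: int = 20) -> list[str]:
    """Split text into chunks of up to split_length words, overlapping by split_overlap."""
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")
    words = text.split()
    stride = split_length - split_overlap
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# A 500-word document yields chunks starting at words 0, 180, and 360.
sample_text = " ".join(f"w{i}" for i in range(500))
chunks = split_words(sample_text)
print([len(chunk.split()) for chunk in chunks])
```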

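### Using a Vector Database (Production)

The in-memory store is lost when its process exits and is not shared between the indexing script and the Hayhooks container. For production workloads, replace it with a persistent vector database: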

```bash
# Start Qdrant alongside Haystack
docker network create haystack-net

docker run -d \
  --name qdrant \
  --network haystack-net \
  -p 6333:6333 \
  -v /workspace/qdrant-data:/qdrant/storage \
  qdrant/qdrant

# Install the Qdrant integration in the Haystack container
# Add to the Dockerfile:  RUN pip install qdrant-haystack
# Then use QdrantDocumentStore in place of InMemoryDocumentStore
```

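## GPU Acceleration

Haystack benefits from GPU acceleration in the following scenarios:

### 1. Embedding Generation (Sentence Transformers)

A GPU helps most when embedding large document collections: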

```bash
cat > /workspace/benchmark_embeddings.py << 'EOF'
import time
import torch
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack import Document

# Check GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Create the embedder and load the model
embedder = SentenceTransformersDocumentEmbedder(
    model="BAAI/bge-base-en-v1.5"
)
embedder.warm_up()

# Benchmark: embed 100 short documents and report throughput
docs = [Document(content=f"Sample document {i} with some text content.") for i in range(100)]

start = time.time()
result = embedder.run(documents=docs)
elapsed = time.time() - start

print(f"Embedded 100 documents in {elapsed:.2f}s ({100/elapsed:.0f} docs/sec)")
EOF

docker run --rm --gpus all \
  -v /workspace:/workspace \
  haystack-clore:latest \
  python3 /workspace/benchmark_embeddings.py
```

### 2. Local LLM Inference (Hugging Face Transformers)

To run an LLM directly inside Haystack (without Ollama):

```bash
cat > /workspace/local_llm_pipeline.py << 'EOF'
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

# The generator uses the GPU automatically when one is available
generator = HuggingFaceLocalGenerator(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    task="text-generation",
    generation_kwargs={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "do_sample": True,
    }
)

prompt_builder = PromptBuilder(template="Answer this question: {{ question }}")

pipeline = Pipeline()
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)
pipeline.connect("prompt_builder.prompt", "llm.prompt")

result = pipeline.run({"prompt_builder": {"question": "What is RAG?"}})
print(result["llm"]["replies"][0])
EOF

docker run --rm --gpus all \
  -v /workspace:/workspace \
  -e HF_TOKEN="your-hf-token" \
  haystack-clore:latest \
  python3 /workspace/local_llm_pipeline.py
```

### 3. Pair with Ollama (Recommended)

For the best balance of ease of use and performance, let Ollama handle LLM inference while Haystack handles orchestration:

```bash
# Step 1: start Ollama (see the Ollama guide)
docker run -d \
  --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v /workspace/ollama:/root/.ollama \
  ollama/ollama

# Step 2: pull a chat model and an embedding model
docker exec ollama ollama pull llama3
docker exec ollama ollama pull nomic-embed-text  # for generating embeddings through Ollama

# Step 3: start Haystack pointing at Ollama
docker run -d \
  --name haystack \
  --gpus '"device=0"' \
  -p 1416:1416 \
  --add-host=host.docker.internal:host-gateway \
  -v /workspace/haystack-pipelines:/app/pipelines \
  haystack-clore:latest
```

Monitor GPU usage across both containers:

```bash
watch -n 2 nvidia-smi
```

## Tips and Best Practices

### Choosing the Right Embedding Model

| Model                          | VRAM     | Speed    | Quality   | Best for                 |
| ------------------------------ | -------- | -------- | --------- | ------------------------ |
| `BAAI/bge-small-en-v1.5`       | ≈ 0.5 GB | Fastest  | Good      | High-throughput indexing |
| `BAAI/bge-base-en-v1.5`        | ≈ 1 GB   | Fast     | Better    | General-purpose RAG      |
| `BAAI/bge-large-en-v1.5`       | ≈ 2 GB   | Moderate | Best      | Highest accuracy         |
| `nomic-ai/nomic-embed-text-v1` | ≈ 1.5 GB | Fast     | Excellent | Long documents           |

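### Pipeline Design Tips

* **Split documents sensibly**: for most RAG use cases, chunks of 200–400 words with 10–15% overlap work well
* **Cache embeddings**: persist the document store to disk; regenerating embeddings is expensive
* **Use `warm_up()`**: call `component.warm_up()` before production use to load the model into GPU memory
* **Index in batches**: process documents in batches of 32–64 for the best GPU utilization
* **Filter on metadata**: use Haystack's metadata filters to narrow retrieval (for example by date, source, or category)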
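
The batching tip can be sketched with a small generator that slices any sequence (file paths, documents) into fixed-size batches before handing them to an indexing pipeline. `batched` here is an illustrative helper, not a Haystack API; the Sentence Transformers embedders also take a `batch_size` argument that controls batching inside the model call.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 32) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 100 hypothetical file names split into batches of 32; the last batch is smaller.
files = [f"doc_{i}.pdf" for i in range(100)]
print([len(batch) for batch in batched(files, 32)])
```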

### Cost Optimization

```bash
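# Clore.ai pricing is spot-like: pick servers with a lower hourly rate
# Development/testing: RTX 3060 (≈ $0.10/hour) is enough for embeddings
# Production embeddings: RTX 3090 (≈ $0.20/hour), 24 GB handles large batches
# Local LLM + embeddings: A100 40GB (≈ $0.60/hour), headroom for concurrent users

# Monitor resource usage
docker stats haystack
nvidia-smi dmon -s u -d 5  # report GPU utilization every 5 seconds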
```
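
To compare servers, it helps to turn a measured throughput into a rental cost. The sketch below is plain arithmetic: chunks divided by chunks per second gives seconds, which converts to hours and then to dollars at the hourly rate. The throughput figure in the example is a placeholder; substitute the docs/sec value printed by your own benchmark run.

```python
def indexing_cost(num_chunks: int, chunks_per_sec: float, price_per_hour: float) -> tuple[float, float]:
    """Return (hours, cost_in_usd) for embedding num_chunks at a given throughput."""
    if chunks_per_sec <= 0:
        raise ValueError("chunks_per_sec must be positive")
    hours = num_chunks / chunks_per_sec / 3600
    return hours, hours * price_per_hour


# Placeholder numbers: 720,000 chunks at 200 chunks/sec on a $0.20/hour server.
hours, cost = indexing_cost(720_000, 200, 0.20)
print(f"{hours:.2f} h, ${cost:.2f}")
```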

### Securing Hayhooks for External Access

```bash
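# Option 1: SSH tunnel (simplest, for personal use)
# From your local machine:
ssh -L 1416:localhost:1416 root@<clore-ip> -p <clore-ssh-port>
# Then open http://localhost:1416 locally

# Option 2: add basic auth through an nginx reverse proxy
# (assumes you have written /workspace/nginx.conf with basic auth and a proxy to port 1416)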
docker run -d \
  --name nginx-proxy \
  -p 80:80 \
  -v /workspace/nginx.conf:/etc/nginx/conf.d/default.conf \
  nginx:alpine
```

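## Troubleshooting

| Problem                             | Likely cause                   | Solution                                                                                              |
| ----------------------------------- | ------------------------------ | ----------------------------------------------------------------------------------------------------- |
| `ModuleNotFoundError: haystack`     | Package not installed          | Rebuild the Docker image; check that `pip install haystack-ai` succeeded                              |
| `CUDA out of memory`                | Embedding model too large      | Use `bge-small-en-v1.5` or reduce the batch size                                                      |
| Hayhooks returns 404 for a pipeline | YAML file not found            | Check the volume mount; pipeline files must be in `/app/pipelines/`                                   |
| Embedding is slow and runs on CPU   | GPU not detected               | Verify the `--gpus all` flag; check `torch.cuda.is_available()`                                       |
| Ollama connection refused           | Wrong hostname                 | Use `--add-host=host.docker.internal:host-gateway`; set the URL to `http://host.docker.internal:11434` |
| Hugging Face download fails         | Missing token or rate limiting | Set the `HF_TOKEN` environment variable; for gated models, confirm your account has been granted access |
| Pipeline YAML parse error           | Invalid syntax                 | Validate the YAML with `python3 -c "import yaml; yaml.safe_load(open('pipeline.yml'))"`               |
| Container exits immediately         | Startup error                  | Check `docker logs haystack`; make sure the Dockerfile `CMD` is correct                               |
| Port 1416 unreachable from outside  | Firewall / port forwarding     | Expose the port in the Clore.ai order settings; check the server's open ports                         |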

### Debugging Commands

```bash
# Follow the container logs
docker logs haystack --tail 50 -f

# Test the Hayhooks API
curl http://localhost:1416/status
curl http://localhost:1416/pipelines

# Open an interactive Python session for debugging
docker exec -it haystack python3

# Check the GPU from inside the container
docker exec haystack python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"

# Check installed packages
docker exec haystack pip show haystack-ai hayhooks
```

## Further Reading

* [Haystack documentation](https://docs.haystack.deepset.ai/): official v2 documentation
* [Hayhooks on GitHub](https://github.com/deepset-ai/hayhooks): REST API serving for pipelines
* [Haystack Cookbook](https://haystack.deepset.ai/cookbook): end-to-end tutorials (RAG, agents, search)
* [deepset-ai/haystack on GitHub](https://github.com/deepset-ai/haystack): source code, issues, release notes
* [Haystack integrations](https://haystack.deepset.ai/integrations): full list of supported vector stores, LLMs, and tools
* [Ollama on Clore.ai](/guides/guides_v2-zh/yu-yan-mo-xing/ollama.md): pair Haystack with Ollama for local LLM inference
* [vLLM on Clore.ai](/guides/guides_v2-zh/yu-yan-mo-xing/vllm.md): a high-throughput LLM serving backend for Haystack
* [GPU comparison guide](/guides/guides_v2-zh/kai-shi-shi-yong/gpu-comparison.md): choose the right Clore.ai GPU for your workload
* [CLORE.AI Marketplace](https://clore.ai/marketplace): rent GPU servers

