# Depth Anything

Estimate depth from a single image with Depth Anything.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

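## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Select a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment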

### Accessing Your Server

* Find connection details in **My Orders**
* Web interfaces: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What Is Depth Anything?

Depth Anything provides:

* State-of-the-art monocular depth estimation
* Works on any image
* No stereo camera required
* Fast inference

## Model Variants

| Model                | Parameters | VRAM  | Performance  |
| -------------------- | ---------- | ----- | ------------ |
| Depth-Anything-Small | 25M        | 2GB   | Fastest      |
| Depth-Anything-Base  | 98M        | 4GB   | Fast         |
| Depth-Anything-Large | 335M       | 8GB   | Best quality |
| Depth-Anything-V2    | Various    | 4-8GB | Latest       |
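
As a rough illustration of how the table can drive model selection, the sketch below picks the largest V1 variant that fits a given amount of free VRAM. The helper name `pick_variant` and the threshold list are hypothetical; the thresholds simply copy the VRAM column above and are not measured values.

```python
from typing import Optional

# Hypothetical lookup: (variant name, VRAM in GB) copied from the table above,
# ordered from largest to smallest so the first match is the best-quality fit.
VARIANT_VRAM_GB = [
    ("Depth-Anything-Large", 8),
    ("Depth-Anything-Base", 4),
    ("Depth-Anything-Small", 2),
]

def pick_variant(available_vram_gb: float) -> Optional[str]:
    """Return the largest variant whose listed VRAM fits, or None if none fits."""
    for name, required_gb in VARIANT_VRAM_GB:
        if available_vram_gb >= required_gb:
            return name
    return None

print(pick_variant(6))  # Depth-Anything-Base
```

Treat the result as a starting point: actual memory use also depends on image resolution and batch size.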

## Quick Deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install transformers torch gradio && \
python depth_anything_app.py
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (for example, `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.

## Installation

```bash
pip install transformers torch
pip install opencv-python pillow
```

## Basic Usage

```python
from transformers import pipeline
from PIL import Image

# Load the depth-estimation pipeline
pipe = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-large-hf",
    device="cuda"
)

# Estimate depth
image = Image.open("photo.jpg")
depth = pipe(image)

# Save the depth map (a grayscale PIL image)
depth["depth"].save("depth_map.png")
```

## Depth Anything V2

```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
from PIL import Image
import numpy as np

# Load the model and its image processor
processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Large-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Large-hf")
model.to("cuda")

# Preprocess the image
image = Image.open("photo.jpg")
inputs = processor(images=image, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# Interpolate back to the original size (PIL reports (width, height), so reverse it)
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# Convert to NumPy and scale to 0-255
depth = prediction.squeeze().cpu().numpy()
depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255
depth = depth.astype(np.uint8)

# Save
Image.fromarray(depth).save("depth.png")
```
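
The min-max scaling step above divides by `depth.max() - depth.min()`, which is zero for a perfectly flat prediction. A minimal sketch of a guarded version follows; the helper name `normalize_depth` is hypothetical, and mapping a constant array to all zeros is just one reasonable choice.

```python
import numpy as np

def normalize_depth(depth: np.ndarray) -> np.ndarray:
    """Scale a depth array to uint8 in [0, 255]; a constant array maps to all zeros."""
    depth = depth.astype(np.float32)
    span = depth.max() - depth.min()
    if span == 0:
        # Avoid a division by zero when every pixel has the same value.
        return np.zeros(depth.shape, dtype=np.uint8)
    return ((depth - depth.min()) / span * 255).astype(np.uint8)

print(normalize_depth(np.array([[0.0, 1.0], [2.0, 4.0]])))
```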

## Colored Depth Maps

```python
import cv2
import numpy as np
from PIL import Image

def colorize_depth(depth_array, colormap=cv2.COLORMAP_INFERNO):
    # Normalize to 0-255
    depth_normalized = cv2.normalize(depth_array, None, 0, 255, cv2.NORM_MINMAX)
    depth_uint8 = depth_normalized.astype(np.uint8)

    # Apply the colormap (OpenCV returns BGR, converted to RGB below)
    colored = cv2.applyColorMap(depth_uint8, colormap)

    return Image.fromarray(cv2.cvtColor(colored, cv2.COLOR_BGR2RGB))

# Usage (`depth` is the NumPy depth array from the previous example)
depth_colored = colorize_depth(depth)
depth_colored.save("depth_colored.png")
```

## Batch Processing

```python
from transformers import pipeline
from PIL import Image
import os

pipe = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-large-hf",
    device="cuda"
)

input_dir = "./images"  # placeholder path: point this at your own image folder
output_dir = "./depth_maps"
os.makedirs(output_dir, exist_ok=True)

for filename in sorted(os.listdir(input_dir)):
    if filename.lower().endswith((".jpg", ".jpeg", ".png")):
        image_path = os.path.join(input_dir, filename)
        image = Image.open(image_path)

        # Estimate depth
        depth = pipe(image)

        # Save the depth map under output_dir
        output_path = os.path.join(output_dir, f"depth_{filename}")
        depth["depth"].save(output_path)
```

## Gradio Interface

```python
import gradio as gr
from transformers import pipeline
import cv2
import numpy as np

pipe = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-large-hf",
    device="cuda"
)

def estimate_depth(image, colormap):
    # Estimate depth
    result = pipe(image)
    depth = np.array(result["depth"])

    # Colorize
    depth_normalized = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    colormaps = {
        "Inferno": cv2.COLORMAP_INFERNO,
        "Viridis": cv2.COLORMAP_VIRIDIS,
        "Plasma": cv2.COLORMAP_PLASMA,
        "Magma": cv2.COLORMAP_MAGMA,
        "Jet": cv2.COLORMAP_JET
    }

    colored = cv2.applyColorMap(depth_normalized, colormaps[colormap])
    colored = cv2.cvtColor(colored, cv2.COLOR_BGR2RGB)

    return result["depth"], colored

demo = gr.Interface(
    fn=estimate_depth,
    inputs=[
        gr.Image(type="pil", label="Input Image"),
        gr.Dropdown(
            ["Inferno", "Viridis", "Plasma", "Magma", "Jet"],
            value="Inferno",
            label="Colormap"
        )
    ],
    outputs=[
        gr.Image(label="Depth Map (Grayscale)"),
        gr.Image(label="Depth Map (Colored)")
    ],
    title="Depth Anything - Depth Estimation"
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## API Server

```python
from fastapi import FastAPI, UploadFile, File
from fastapi.responses import Response
from transformers import pipeline
from PIL import Image
import io
import numpy as np
import cv2

app = FastAPI()

pipe = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-large-hf",
    device="cuda"
)

@app.post("/depth")
async def estimate_depth(image: UploadFile = File(...), colored: bool = True):
    # Load the uploaded image
    img = Image.open(io.BytesIO(await image.read())).convert("RGB")

    # Estimate depth
    result = pipe(img)
    depth = np.array(result["depth"])

    if colored:
        depth_normalized = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        depth_img = cv2.applyColorMap(depth_normalized, cv2.COLORMAP_INFERNO)
        depth_img = cv2.cvtColor(depth_img, cv2.COLOR_BGR2RGB)
    else:
        depth_img = depth

    # Encode the result as PNG bytes
    output = Image.fromarray(depth_img)
    buffer = io.BytesIO()
    output.save(buffer, format="PNG")

    return Response(content=buffer.getvalue(), media_type="image/png")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```
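
A client for the `/depth` endpoint might look like the sketch below. It assumes the server above is reachable at `base_url` (for example your `http_pub` URL, if the API port is exposed) and that the third-party `requests` package is installed on the client. The helper names `depth_endpoint` and `request_depth` are hypothetical. FastAPI file uploads also typically require `python-multipart` on the server side.

```python
from urllib.parse import urlencode

def depth_endpoint(base_url: str, colored: bool = True) -> str:
    """Build the /depth URL; `colored` is passed as a query parameter."""
    query = urlencode({"colored": str(colored).lower()})
    return f"{base_url.rstrip('/')}/depth?{query}"

def request_depth(base_url: str, image_path: str, out_path: str, colored: bool = True) -> None:
    """Upload an image and write the returned PNG to out_path."""
    import requests  # third-party; imported lazily so the URL helper has no dependencies

    with open(image_path, "rb") as fh:
        # The form field is named "image" to match the endpoint's parameter name.
        resp = requests.post(depth_endpoint(base_url, colored), files={"image": fh}, timeout=60)
    resp.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(resp.content)

print(depth_endpoint("http://localhost:8000", colored=False))
```

Calling `request_depth("http://localhost:8000", "photo.jpg", "depth.png")` would then save the colorized depth map returned by the server.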

## 3D Point Cloud Generation

```python
import numpy as np
import open3d as o3d
from PIL import Image

def depth_to_pointcloud(rgb_image, depth_map, focal_length=500):
    """Convert an RGB image and its depth map into a 3D point cloud."""
    rgb = np.array(rgb_image.convert("RGB"))
    depth = np.array(depth_map)

    # Image dimensions
    height, width = depth.shape

    # Pixel coordinate grid
    u = np.arange(width)
    v = np.arange(height)
    u, v = np.meshgrid(u, v)

    # Back-project to 3D with a pinhole model; the principal point
    # is assumed to be the image center
    z = depth.astype(float)
    x = (u - width / 2) * z / focal_length
    y = (v - height / 2) * z / focal_length

    # Stack coordinates into an (N, 3) array
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3) / 255.0

    # Build the point cloud
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.colors = o3d.utility.Vector3dVector(colors)

    return pcd

# Usage (`pipe` is the depth-estimation pipeline created in the basic usage example)
rgb = Image.open("photo.jpg")
depth = pipe(rgb)["depth"]

pcd = depth_to_pointcloud(rgb, depth)
o3d.io.write_point_cloud("output.ply", pcd)
```
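
For reference, the coordinate conversion in `depth_to_pointcloud` is a pinhole back-projection. Written out, with $f$ the focal length in pixels (the `focal_length` argument), $W \times H$ the image size, $d(u, v)$ the depth value at pixel $(u, v)$, and the principal point assumed to sit at the image center as in the code:

```latex
x = \frac{(u - W/2)\, z}{f}, \qquad
y = \frac{(v - H/2)\, z}{f}, \qquad
z = d(u, v)
```

The pipeline's depth image holds relative values, not metric distances, so the scale of the resulting cloud is arbitrary, and the default `focal_length=500` is a placeholder, not a calibrated value.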

## Use Cases

### 3D Photo Effect

```python
def create_3d_photo(image, depth, shift=20):
    """Create a parallax effect for a 3D photo."""
    import numpy as np
    from PIL import Image

    img = np.array(image)
    depth_arr = np.array(depth).astype(np.float32)

    # Normalize depth to [0, 1]
    depth_norm = (depth_arr - depth_arr.min()) / (depth_arr.max() - depth_arr.min())

    # Build the shifted view: each pixel moves right in proportion to its depth value
    shifted = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            offset = int(shift * depth_norm[y, x])
            new_x = min(x + offset, img.shape[1] - 1)
            shifted[y, new_x] = img[y, x]

    return Image.fromarray(shifted)
```
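
The nested Python loops in `create_3d_photo` visit every pixel one at a time, which can be slow on large images. Below is a vectorized NumPy sketch of the same shift. The helper name `shift_by_depth` is hypothetical, and it expects an already normalized depth array in [0, 1]. Where several source pixels land on the same target column, NumPy does not guarantee which one wins, so the output may differ slightly from the loop version at those collisions.

```python
import numpy as np

def shift_by_depth(img: np.ndarray, depth_norm: np.ndarray, shift: int = 20) -> np.ndarray:
    """Shift each pixel right by int(shift * depth_norm), clamped to the image width."""
    h, w = depth_norm.shape
    offsets = (shift * depth_norm).astype(int)
    # Target column for every source pixel, clamped to the last column.
    cols = np.minimum(np.arange(w)[None, :] + offsets, w - 1)
    rows = np.broadcast_to(np.arange(h)[:, None], (h, w))
    shifted = np.zeros_like(img)
    shifted[rows, cols] = img
    return shifted

demo = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
print(np.array_equal(shift_by_depth(demo, np.zeros((2, 3))), demo))  # True
```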

### Background Blur (Portrait Mode)

```python
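def portrait_mode(image, depth, blur_strength=25):
    import cv2
    import numpy as np
    from PIL import Image

    img = np.array(image)
    depth_arr = np.array(depth).astype(np.float32)

    # Normalize depth to [0, 1]
    depth_norm = (depth_arr - depth_arr.min()) / (depth_arr.max() - depth_arr.min())

    # Blur mask: Depth Anything's relative output is typically larger (brighter)
    # for nearer pixels, so invert it to blur the background most. Drop the
    # inversion if your depth map uses the opposite convention.
    blur_mask = 1.0 - depth_norm

    # Blur the whole image (blur_strength must be odd for GaussianBlur)
    blurred = cv2.GaussianBlur(img, (blur_strength, blur_strength), 0)

    # Blend sharp and blurred images per pixel according to the mask
    mask_3d = np.stack([blur_mask] * 3, axis=-1)
    result = (img * (1 - mask_3d) + blurred * mask_3d).astype(np.uint8)

    return Image.fromarray(result)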
```

## Performance

| Model    | GPU      | Time per Image |
| -------- | -------- | -------------- |
| Small    | RTX 3060 | \~50ms         |
| Base     | RTX 3060 | \~100ms        |
| Large    | RTX 3090 | \~150ms        |
| Large    | RTX 4090 | \~80ms         |
| V2-Large | RTX 4090 | \~100ms        |
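
Per-image latency translates directly into throughput and rental cost. The sketch below does that arithmetic; the function name `batch_cost` is hypothetical, and the inputs (100 ms per image, $0.10 per hour) are illustrative figures only. It ignores model loading, disk I/O, and image decoding, so real runs will take somewhat longer.

```python
def batch_cost(num_images: int, ms_per_image: float, hourly_rate_usd: float) -> tuple:
    """Return (runtime in hours, cost in USD) for processing num_images."""
    hours = num_images * ms_per_image / 3_600_000  # milliseconds per hour
    return hours, hours * hourly_rate_usd

hours, cost = batch_cost(num_images=100_000, ms_per_image=100, hourly_rate_usd=0.10)
print(f"{hours:.2f} h, ${cost:.2f}")  # 2.78 h, $0.28
```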

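## Troubleshooting

### Poor Depth Quality

* Use a larger model variant
* Make sure the input image is of good quality
* Check for reflective surfaces, which commonly produce incorrect depth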

### Out-of-Memory Errors

* Use a smaller model variant
* Reduce the image resolution
* Enable fp16 inference
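
The last two memory tips can be sketched as follows. `fit_within` computes a reduced size to pass to `Image.resize`, and `load_fp16_pipeline` loads the weights in half precision. Both helper names are hypothetical. The 518-pixel default and the multiple of 14 are assumptions based on the ViT patch size used by Depth Anything, and the Hugging Face image processor may already resize inputs on its own. The `torch_dtype` argument is accepted by `transformers.pipeline`, though newer releases may prefer a differently named argument.

```python
def fit_within(width: int, height: int, max_side: int = 518, multiple: int = 14) -> tuple:
    """Shrink (width, height) so the longer side is at most max_side, then
    round each side down to a multiple of `multiple` (never below one multiple)."""
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic avoids floating-point rounding surprises.
        width = width * max_side // longest
        height = height * max_side // longest
    new_w = max(multiple, width // multiple * multiple)
    new_h = max(multiple, height // multiple * multiple)
    return new_w, new_h

def load_fp16_pipeline(model_id: str = "LiheYoung/depth-anything-large-hf"):
    """Load the depth pipeline with fp16 weights (needs torch, transformers, and a CUDA GPU)."""
    import torch
    from transformers import pipeline
    return pipeline(task="depth-estimation", model=model_id,
                    device="cuda", torch_dtype=torch.float16)

print(fit_within(1920, 1080))  # (518, 280)
```

A typical use would be `image = image.resize(fit_within(*image.size))` before calling the pipeline.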

### Slow Processing

* Use a smaller model
* Process images in batches where possible
* Run inference on the GPU
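
Batching can be sketched as below. Transformers pipelines accept a list of inputs together with a `batch_size` argument. The helper names `chunked` and `estimate_depth_batched` are hypothetical, and a batch size of 4 is an arbitrary starting point to tune against available VRAM.

```python
from typing import Iterator, List, Sequence

def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

def estimate_depth_batched(pipe, image_paths: Sequence[str], batch_size: int = 4) -> list:
    """Run a depth-estimation pipeline over image paths, one batch at a time."""
    from PIL import Image  # imported lazily; only needed when images are actually loaded

    results = []
    for batch in chunked(image_paths, batch_size):
        images = [Image.open(path).convert("RGB") for path in batch]
        # Passing a list returns one result dict per image, in the same order.
        results.extend(pipe(images, batch_size=batch_size))
    return results

print(list(chunked(["a.jpg", "b.jpg", "c.jpg"], 2)))  # [['a.jpg', 'b.jpg'], ['c.jpg']]
```

Loading one batch at a time keeps host memory bounded even for large folders.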

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE**

## Next Steps

* [ControlNet](/guides/guides_v2-zh/tu-xiang-chu-li/controlnet-advanced.md) - Use depth maps as a control signal
* [Segment Anything](/guides/guides_v2-zh/tu-xiang-chu-li/segment-anything.md) - Object segmentation
* [3D Generation](/guides/guides_v2-zh/3d-sheng-cheng/triposr.md) - Video depth

