# Bare Metal

## Clore Bare Metal — Requirements and Guide

**Clore Bare Metal** servers are physical (non-virtualized) machines with full root access, no sharing, and no power limits. They suit AI/ML, HPC, 3D rendering, and other heavy workloads.

**Available GPUs (examples):** B200, H100, H200, A100, L40S, RTX 5090, RTX 4090, etc.\
**Locations (start):** USA, Japan, Hong Kong, and others\
**SLA:** Tier 3 and above data centers, target uptime **99.99%**.
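
To make the uptime target concrete, the sketch below converts a percentage into a downtime budget. The 30-day month and 365-day year are assumptions chosen for the example, not terms from the SLA.

```python
# Downtime budget implied by an uptime target (illustrative arithmetic only).

def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Return the maximum downtime, in minutes, that still meets the target."""
    return (1 - uptime_percent / 100) * period_hours * 60

# A 30-day month and a 365-day year are assumptions for this example.
per_month = allowed_downtime_minutes(99.99, 30 * 24)
per_year = allowed_downtime_minutes(99.99, 365 * 24)
print(f"per 30-day month: {per_month:.2f} min")   # 4.32 min
print(f"per 365-day year: {per_year:.2f} min")    # 52.56 min
```

In other words, 99.99% leaves roughly four minutes of downtime per month.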

***

### 1) What is Bare Metal on Clore

* You get a whole physical machine (CPU, RAM, disks, network, GPU).
* Full root access/SSH and, when available, IPMI/KVM for OS reinstallation.
* No power limits (PL) or isolating layers; performance matches the hardware.
* Differs from container-based rentals (HiveOS/Docker) in that resources are not shared.

***

### 2) Mandatory infrastructure requirements (for providers)

**2.1 Data center**

* Minimum **Tier 3** (Uptime Institute or a recognized local equivalent).
* Documents: DC letter/certificate, redundancy description (power N+1/2N, cooling, network).
* **SLA 99.99%** with a 24/7 NOC.
* Compliance with fire safety standards; availability of emergency procedures (RPO/RTO).
* **Legal entities only.** Home/office “server rooms” are not accepted.

**2.2 Hardware base (minimum)**

* **CPU:** from 64 threads.
* **RAM:** from 128 GB (256 GB+ recommended for multi-GPU/HPC).
* **Storage:** NVMe SSD ≥ 1 TB, throughput ≥ 1 GB/s (RAID1/10 recommended for system and data).
* **Network:** ≥ 1 Gbps symmetric (10 Gbps preferred, L2/L3 redundancy, static IPv4; IPv6 is a plus).
* **GPU (tier):** L40S / H200 and above or equivalents resilient to heavy workload:\
  B200, H100, H200, A100, L40S, RTX 4090/5090 (**server A-series and data-center cards preferred**).
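
A provider could pre-check a node against the numeric minimums above before applying. This is a minimal sketch: the `NodeSpec` class and its field names are hypothetical, and only the thresholds come from section 2.2.

```python
from dataclasses import dataclass

@dataclass
class NodeSpec:
    """Hypothetical description of a server; field names are illustrative."""
    cpu_threads: int
    ram_gb: int
    nvme_tb: float
    nvme_throughput_gbs: float   # storage throughput, GB/s
    network_gbps: float          # symmetric uplink, Gbps

# Minimums taken from section 2.2.
MIN = NodeSpec(cpu_threads=64, ram_gb=128, nvme_tb=1.0,
               nvme_throughput_gbs=1.0, network_gbps=1.0)

def check_minimums(spec: NodeSpec) -> list[str]:
    """Return a list of human-readable failures; empty means the spec passes."""
    failures = []
    for name, minimum in vars(MIN).items():
        value = getattr(spec, name)
        if value < minimum:
            failures.append(f"{name}: {value} is below the minimum {minimum}")
    return failures

# Made-up node that is short on RAM.
node = NodeSpec(cpu_threads=96, ram_gb=64, nvme_tb=2.0,
                nvme_throughput_gbs=3.5, network_gbps=10)
print(check_minimums(node))  # ['ram_gb: 64 is below the minimum 128']
```

The GPU tier is left out on purpose, since "equivalent" models need a human decision rather than a numeric comparison.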

**2.3 High-performance interconnects (preferred)**

* **InfiniBand** (EDR/HDR/NDR) for distributed training/HPC.
* **NVLink/NVSwitch** — desirable for multi-GPU within a node.

**2.4 Reliability and replacement**

* In case of hardware failure — **one-for-one** replacement (identical or strictly equivalent configuration) with no SLA degradation.
* Mandatory stock of spare parts / “hot” spares.

**2.5 Security and data hygiene**

* Disk sanitization between rentals: **blkdiscard/secure erase/1-pass zero/TRIM**, with each run logged.
* IPMI isolation, closed **mgmt** perimeter, ACL/DDoS profile.
* OS images: vetted, with up-to-date microcode/patches and support for **NVIDIA** drivers.
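
One way to pair sanitization with logging is to build the command and its audit record together. The sketch below is a dry run only: it never executes anything. It assumes util-linux `blkdiscard` and nvme-cli are the tools in use, and the `rental_id` field and record layout are hypothetical.

```python
import json
import shlex
from datetime import datetime, timezone

# Commands for the methods listed above. All are destructive; this sketch
# only builds and logs them and never runs anything.
METHODS = {
    "blkdiscard": ["blkdiscard", "{dev}"],                  # discard all blocks (TRIM)
    "zero": ["blkdiscard", "--zeroout", "{dev}"],           # 1-pass zero fill
    "secure_erase": ["nvme", "format", "{dev}", "--ses=1"], # NVMe user-data erase
}

def plan_sanitization(device: str, method: str, rental_id: str) -> dict:
    """Build the command and an audit record for one disk; nothing is run."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    argv = [part.format(dev=device) for part in METHODS[method]]
    return {
        "rental_id": rental_id,   # hypothetical identifier
        "device": device,
        "method": method,
        "command": shlex.join(argv),
        "planned_at": datetime.now(timezone.utc).isoformat(),
    }

record = plan_sanitization("/dev/nvme0n1", "blkdiscard", "rental-0001")
print(json.dumps(record, indent=2))
```

Keeping the exact command string in the record makes later audits easier, because the log shows what was meant to run on which device.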

***

### 3) Minimum commercial terms

* **Minimum rental term:** from **2 weeks**.
* **Pricing:** price lists competitive by geolocation (accounting for traffic/electricity/VAT costs).
* **API integration** is mandatory or desired, depending on volume, for auto-provisioning, extensions, and monitoring.

***

### 4) Software and image requirements

* **OS:** Ubuntu 22.04/24.04 LTS, Rocky/RHEL 9; on request — Windows Server (with licensing).
* **GPU stack:** NVIDIA driver 550.xx+ (or the version recommended for the specific GPU), CUDA 12.2/12.4+.
* **Management:** SSH (required), IPMI/KVM (preferred) with temporary accounts for the renter.
* **Containerization:** Docker/Podman on request; Kubernetes is allowed if the control-plane (master) node is provisioned within the same DC.
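
The driver requirement can be checked on a provisioned image with `nvidia-smi`. This is a minimal sketch that compares only the major version against 550; the helper names are illustrative, and GPUs with a different recommended driver would need their own threshold.

```python
import subprocess

MIN_DRIVER_MAJOR = 550  # from the GPU stack requirement above

def driver_major(version: str) -> int:
    """Extract the major number from a driver string such as '550.54.14'."""
    return int(version.strip().split(".")[0])

def meets_minimum(version: str) -> bool:
    """True if the driver's major version is at least MIN_DRIVER_MAJOR."""
    return driver_major(version) >= MIN_DRIVER_MAJOR

def installed_driver_versions() -> list[str]:
    """Query nvidia-smi for the driver version reported for each GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

On a node with the driver installed, `all(meets_minimum(v) for v in installed_driver_versions())` gives a single pass/fail result for the image.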

***

### 5) How a provider can connect to Bare Metal

1. **Application & verification:**
   * Legal entity, official contract with a Tier 3+ DC, SLA 99.99%, 24/7 NOC.
   * Document package: Tier/equivalent certificate, SLA, fire safety, redundancy scheme.
   * Acceptance tests: public IPv4, screenshot/access to IPMI (KVM), iPerf3/disk performance results.
2. **SKU catalog & pricing:**
   * Standardized cards (GPU composition, CPU threads, RAM, NVMe, network, IB/NVLink, DC/location, traffic limits).
   * Prices tied to geography. Minimum term — 2 weeks.
3. **Operational policies:**
   * Incident response time: ≤ 15 min; hardware replacement: an equivalent unit, provided immediately.
   * Logging of disk sterilization, closure of admin access after return, audit.
   * Monthly reports on uptime/incidents.
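
A standardized SKU card from step 2 could be represented as a small record and serialized for the catalog. The sketch below is an assumption about shape only: `SkuCard` and its field names are hypothetical, not an official Clore schema, and the sample values are made up.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class SkuCard:
    """Hypothetical SKU card; the field names are not an official schema."""
    gpu: str
    gpu_count: int
    cpu_threads: int
    ram_gb: int
    nvme_tb: float
    network_gbps: int
    interconnect: str                  # e.g. "NVLink", "IB HDR", or "none"
    location: str
    traffic_limit_tb: Optional[float]  # None means no stated limit
    min_term_weeks: int = 2            # minimum term stated in step 2

# Made-up example values for illustration.
card = SkuCard(gpu="H100", gpu_count=8, cpu_threads=128, ram_gb=512,
               nvme_tb=4.0, network_gbps=10, interconnect="NVLink",
               location="USA", traffic_limit_tb=None)
print(json.dumps(asdict(card), indent=2))
```

Pricing is omitted here because the page ties prices to geography without giving a format for them.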

### 6) Network and throughput requirements

* Minimum **1 Gbps** (symmetric), preferably **10 Gbps** with redundancy.
* Public IPv4, rDNS support on request; IPv6 is desirable.
* Basic ACLs, anti-DDoS profile, dedicated **mgmt-VLAN** for IPMI.
* For **InfiniBand** — direct L2 segmentation within the rack/room and OFED availability.
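
The bandwidth minimum can be verified from iPerf3 results collected with `iperf3 --json`. This sketch assumes TCP tests, one run per direction (the second using `iperf3 -R`), and reads the receiver-side rate; the sample reports are trimmed and made up.

```python
import json

MIN_GBPS = 1.0  # minimum symmetric bandwidth from this section

def received_gbps(iperf3_json: str) -> float:
    """Read receiver-side throughput from `iperf3 --json` TCP output."""
    report = json.loads(iperf3_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

def is_symmetric_ok(upload_json: str, download_json: str) -> bool:
    """Both directions must reach the minimum to count as symmetric."""
    slowest = min(received_gbps(upload_json), received_gbps(download_json))
    return slowest >= MIN_GBPS

# Trimmed, made-up sample reports; real output has many more fields.
up = '{"end": {"sum_received": {"bits_per_second": 9.41e9}}}'
down = '{"end": {"sum_received": {"bits_per_second": 0.94e9}}}'
print(is_symmetric_ok(up, down))  # False: one direction is below 1 Gbps
```

Checking the slower direction is what makes the test match the "symmetric" wording of the requirement.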

***

### 7) Example workloads

* **Multi-GPU LLM training:** 8×L40S/NVLink or an IB cluster of A100/H100/H200 nodes.
* **Video rendering:** 4×RTX 4090/5090 with local NVMe cache and **10 Gbps** egress.
* **HFT/trading:** low latencies, CPU **64–128** threads, RAM **256–512 GB**, NVMe **RAID1** and **10 Gbps** network.
* **Genomics/HPC:** A100/H100 with IB **HDR/NDR**, **SLURM** / MPI support.

***

## Comparison of Standard Rental and Bare Metal

| Parameter                     | Standard rental (HiveOS/Docker)                         | Bare Metal                                                         |
| ----------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------ |
| What it is                    | Container/environment inside the host OS                | Entire physical server                                             |
| Resources (CPU/RAM/bandwidth) | Shared by scheduler; cgroup quotas, possible throttling | Exclusive; predictable CPU/RAM/bandwidth                           |
| Root/privileges               | root inside container, no BIOS access                   | Full server root; BIOS/UEFI access                                 |
| GPU drivers (CUDA/NVIDIA)     | Version defined by the host                             | You install required versions (CUDA/OFED, etc.)                    |
| GPU control                   | Passthrough with restrictions (PL/OC per host policy)   | Full PL/OC control; NVLink/NVSwitch (if present)                   |
| IPMI/KVM/Virtual Media        | No                                                      | Yes (remote console, ISO mounting)                                 |
| Storage                       | Host volumes/mounts; bandwidth may fluctuate            | Direct NVMe/RAID; stable IOPS/throughput                           |
| Network                       | Ports/NAT/shared bandwidth                              | Dedicated NIC 1–10G+; rDNS, VLAN; public IPv4                      |
| Reliability / SLA             | Depends on host; no guaranteed like-for-like swap       | DC Tier 3+, target SLA 99.99%, mandatory like-for-like replacement |
| Minimum term                  | Usually hours/days                                      | From 2 weeks                                                       |
| Cost                          | Lower                                                   | Higher (exclusive + data center)                                   |
| Time to start                 | Seconds–minutes                                         | 1–48 hours                                                         |
| HPC / InfiniBand              | Usually no                                              | Recommended (InfiniBand), NVLink/NVSwitch                          |
| Best for                      | Quick tasks, tests, mining, short sessions              | AI/ML/HPC, production workloads, long projects                     |
| Requirements for provider     | Basic                                                   | Legal entity, DC Tier 3+, 24/7 NOC, regional pricing, API          |
| Security / data               | Within host policies                                    | Disk sanitization between rentals, isolated mgmt (IPMI)            |

## FAQ

**How is Bare Metal different from container rental?**\
With Bare Metal, the **entire physical machine is yours** (CPU/RAM/Disk/Net/GPU). In container rental, resources are shared and you work in an isolated environment.

**Is IPMI required?**\
Preferred. It speeds up OS reinstallation and provides KVM access, especially for network/SSH issues.

**Can nodes be interconnected over IB?**\
Yes, InfiniBand is encouraged for distributed training/HPC. Specify the IB bandwidth/type in the SKU.

**What’s the minimum for GPUs?**\
L40S / H200 level and above, or an equivalent resilient to heavy workloads (B200, H100, A100, etc.).

**What if the server “goes down”?**\
The provider must promptly deliver an **identical replacement** with no degradation (SLA 99.99%).


