AI Cloud Provisioner

Automate the provisioning, lifecycle, and cost optimization of GPU compute across multiple cloud providers. Useful for training runs, batch inference, and burst scale-out.

What it does

Connect one or more GPU providers, declare what you need (GPU class, count, duration), and let the Provisioner pick the best available pod and stand it up. It handles:

Provider abstraction — RunPod, Lambda Labs, CoreWeave, Vast.ai, Hetzner GPU Cloud, AWS, GCP, Azure, on-prem K8s clusters.
Spot vs on-demand routing — picks spot when your workload tolerates interruption, falls back to on-demand otherwise.
Region and country pinning — for data sovereignty requirements.
Auto-shutdown — idle pods get reaped; configurable thresholds prevent runaway charges.
Cost tracking — per-job, per-project, per-tenant accounting.

Use cases

Spin up a training pod for a fine-tuning job, run it, and tear it down without manual operator work.
Burst inference capacity ahead of a high-traffic event.
Move workloads between regions to meet data residency requirements or reduce cost.
Centralize multi-cloud GPU spend under one roof.

Installing

Per-organization subscription. Pricing depends on the number of providers connected and on monthly compute throughput.

After install:

A new app surface mounts under your Console.
The first-time wizard walks you through connecting your first provider (typically the one with the best spot availability for your region).

Connecting providers

Each provider needs:

API credentials (we encrypt them at rest).
A spending cap (a hard ceiling per period).
Region preferences.
Default GPU class.

Once connected, the Provisioner sees that provider's available capacity in real time.
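
As a rough illustration of the four settings above, the sketch below models one provider connection as a Python dataclass. The class name, field names, and validation rules are hypothetical and are not the Provisioner's actual schema. In particular, requiring at least one region preference is an assumption made for the example.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ProviderConnection:
    """Hypothetical container for the per-provider settings listed above."""
    name: str
    credentials_ref: str            # reference to a stored secret, not the raw key
    spending_cap_usd: float         # hard ceiling per period
    region_preferences: list[str] = field(default_factory=list)
    default_gpu_class: str = "A100"

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; empty means usable."""
        problems = []
        if not self.credentials_ref:
            problems.append("missing credentials reference")
        if self.spending_cap_usd <= 0:
            problems.append("spending cap must be positive")
        if not self.region_preferences:  # assumption: one region is mandatory
            problems.append("at least one region preference is required")
        return problems


conn = ProviderConnection(
    name="runpod",
    credentials_ref="secret://runpod-api-key",  # placeholder reference
    spending_cap_usd=500.0,                     # placeholder amount
    region_preferences=["eu-central"],
    default_gpu_class="H100",
)
print(conn.validate())
```

Validating before the connection is saved keeps an unusable provider, such as one with no cap, out of the selection pool.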

Workloads

You declare a workload with:

A name.
A container image or recipe (we support Docker images, NVIDIA NGC catalog entries, and a curated library of bRRAIn-prepared base images).
GPU class (T4, L4, A10, A40, A100, H100, MI300, custom).
GPU count.
Volume requirements (persistent storage, ephemeral, network volumes).
Networking (public ingress, internal-only, port mapping).
Lifecycle (one-shot job, long-running service, scheduled).

The Provisioner picks the best provider that satisfies the spec, provisions the pod, exposes connection details, and tracks state until the workload terminates.
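
The selection step can be sketched as a filter followed by a ranking. The example below is a minimal sketch, not the Provisioner's actual algorithm: `WorkloadSpec`, `Offer`, and `pick_offer` are hypothetical names, "best" is assumed to mean the lowest hourly price, and the prices are placeholder numbers. Spot offers are only considered when the workload is marked as tolerating interruption.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkloadSpec:
    name: str
    image: str
    gpu_class: str
    gpu_count: int
    lifecycle: str = "one-shot"           # or "service", "scheduled"
    tolerates_interruption: bool = False  # gates the use of spot capacity


@dataclass(frozen=True)
class Offer:
    provider: str
    gpu_class: str
    gpus_available: int
    hourly_usd_per_gpu: float
    spot: bool


def pick_offer(spec: WorkloadSpec, offers: list[Offer]) -> Optional[Offer]:
    """Return the cheapest offer that satisfies the spec, or None."""
    candidates = [
        o for o in offers
        if o.gpu_class == spec.gpu_class
        and o.gpus_available >= spec.gpu_count
        and (spec.tolerates_interruption or not o.spot)
    ]
    return min(candidates, key=lambda o: o.hourly_usd_per_gpu, default=None)


offers = [
    Offer("runpod", "H100", 8, 2.00, spot=True),
    Offer("lambda", "H100", 8, 3.00, spot=False),
    Offer("aws", "A100", 8, 1.00, spot=False),
]
job = WorkloadSpec("finetune", "example/train:latest", "H100", 4,
                   tolerates_interruption=True)
service = WorkloadSpec("endpoint", "example/serve:latest", "H100", 4,
                       lifecycle="service")
print(pick_offer(job, offers).provider)      # spot is eligible
print(pick_offer(service, offers).provider)  # on-demand only
```

A real ranking would likely weigh region preferences and spending caps as well; price alone keeps the sketch short.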

Pod recipes

Recipes are reusable workload templates. The catalog ships with recipes for:

bRRAIn brain-pod.
LLMOps training (LoRA / QLoRA on common base models).
LLMOps inference.
Stable Diffusion / Flux image generation.
Whisper batch transcription.
Custom user recipes you author.

A recipe can be parameterized (model size, training dataset URI, etc.) so you reuse the same recipe with different inputs.
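
One way to picture parameterization is a template with defaults that callers override per run. The sketch below is hypothetical: the `Recipe` class, its `render` method, the image name, and the dataset URI are illustrative and do not reflect the catalog's real format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Recipe:
    """Hypothetical reusable workload template with overridable parameters."""
    name: str
    image: str
    defaults: dict = field(default_factory=dict)

    def render(self, **params) -> dict:
        """Merge caller overrides into the defaults; reject unknown names."""
        unknown = set(params) - set(self.defaults)
        if unknown:
            raise KeyError(f"unknown recipe parameters: {sorted(unknown)}")
        return {
            "recipe": self.name,
            "image": self.image,
            "params": {**self.defaults, **params},
        }


lora = Recipe(
    name="lora-finetune",
    image="example/lora:latest",  # placeholder image name
    defaults={"model_size": "7b", "dataset_uri": ""},
)
run = lora.render(dataset_uri="s3://example-bucket/train.jsonl")
print(run["params"])
```

Rejecting unknown parameter names catches typos before any compute is provisioned.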

Auto-shutdown

Idle pods accrue charges without doing useful work. The Provisioner watches each pod's GPU utilization. If utilization stays under a threshold for a configured duration, the pod is gracefully shut down. Both the threshold and the duration are tunable per workload.

For long-running services (e.g., an inference endpoint), you typically disable auto-shutdown.
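
The idle check described above can be sketched as a sliding-window test over utilization samples. This is a minimal sketch under stated assumptions: the function name, the sample format, and the rule that the samples must cover the whole window before a shutdown is allowed are all illustrative, not the Provisioner's actual behavior.

```python
def should_shut_down(samples, threshold_pct, idle_seconds, now):
    """Decide whether a pod has been idle for the whole configured window.

    samples: list of (unix_timestamp, gpu_utilization_pct), oldest first.
    Returns True only if the samples reach back to the start of the window
    and every sample inside the window is below the threshold.
    """
    window_start = now - idle_seconds
    # Without data reaching back to the window start we cannot claim idleness.
    if not samples or samples[0][0] > window_start:
        return False
    recent = [util for ts, util in samples if ts >= window_start]
    # An empty window means monitoring stalled; do not shut down on no data.
    return bool(recent) and all(util < threshold_pct for util in recent)


history = [(0, 90.0), (60, 3.0), (120, 2.0), (180, 4.0)]
print(should_shut_down(history, threshold_pct=5.0, idle_seconds=120, now=180))
print(should_shut_down(history, threshold_pct=5.0, idle_seconds=180, now=180))
```

The second call widens the window enough to include the busy sample at time 0, so the pod is kept alive.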

Cost tracking and quotas

Per-project and per-tenant accounting:

Real-time spend dashboard.
Daily, weekly, monthly aggregates.
Soft-limit alerts at configurable thresholds.
Hard-limit cutoffs that refuse new provisions when a budget is exhausted.

Costs are normalized across providers using the rate each provider invoices.
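
The soft-limit and hard-limit behavior can be sketched as a single check run before each new provision. The `Budget` class, the 80% soft threshold, and the choice to refuse when projected spend would exceed the hard limit are assumptions made for illustration. The amounts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical per-project budget with a soft alert and a hard cutoff."""
    hard_limit_usd: float
    soft_limit_fraction: float = 0.8  # assumed default alert threshold
    spent_usd: float = 0.0

    def check(self, estimated_cost_usd: float) -> str:
        """Classify a new provision as 'ok', 'alert', or 'refuse'."""
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.hard_limit_usd:
            return "refuse"   # hard limit: do not provision
        if projected >= self.hard_limit_usd * self.soft_limit_fraction:
            return "alert"    # soft limit: provision, but notify
        return "ok"


budget = Budget(hard_limit_usd=1000.0, spent_usd=500.0)
print(budget.check(100.0))  # projected 600
print(budget.check(350.0))  # projected 850
print(budget.check(600.0))  # projected 1100
```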

Multi-cloud failover

If a provider runs out of capacity for a needed GPU class, the Provisioner can fall back to the next-best provider in your preference list. This is useful when capacity is tight (H100 spot, for example).
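
Failover of this kind amounts to walking the preference list until one provider accepts the request. In the sketch below, `CapacityError`, `provision_with_failover`, and the `fake_provision` stand-in are hypothetical names. A real integration would call each provider's own API instead.

```python
class CapacityError(Exception):
    """Raised by a provider call when the requested GPU class is sold out."""


def provision_with_failover(preference, provision):
    """Try providers in preference order; return (result, skipped providers).

    preference: provider names, most preferred first.
    provision:  callable taking a provider name; raises CapacityError when
                that provider has no capacity.
    """
    skipped = []
    for provider in preference:
        try:
            return provision(provider), skipped
        except CapacityError as exc:
            skipped.append((provider, str(exc)))
    raise CapacityError(f"no capacity on any provider: {skipped}")


def fake_provision(provider):
    # Stand-in for a real provider call: only "runpod" is out of capacity.
    if provider == "runpod":
        raise CapacityError("no H100 spot capacity")
    return f"pod-on-{provider}"


pod, skipped = provision_with_failover(["runpod", "lambda", "aws"], fake_provision)
print(pod, skipped)
```

Returning the skipped providers alongside the result makes it possible to log why the first choice was not used.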

Scopes required

Read and write in the extensions/ai-cloud-provisioner zone.
Access to the provider credentials you wire up.
Permission to provision compute on behalf of the organization.

Where to next

LLMOps — typical consumer of provisioned GPU.
Console: Observability — where compute spend rolls up.
Architecture: Compute zones — how compute is partitioned for safety.