LLMOps

Provision, train, evaluate, and operate your own LLMs — on your hardware or ours, with full sovereignty over weights, training data, and inference traffic.

What it does

LLMOps gives you everything you need to take an open-weight base model, fine-tune it on your domain corpus, deploy it as a Handler adapter inside your bRRAIn, evaluate it head-to-head against incumbent models, and monitor its production behavior.

Capabilities:

Model registry — register, version, and switch between models (open-weight base, fine-tuned adapters, hosted external).
Training jobs — supervised fine-tuning, LoRA / QLoRA, preference-tuning, on the GPU class of your choice.
Evaluation harness — task-based benchmarks, A/B comparisons, held-out test sets.
Inference deployment — promote a trained model to your organization's Handler with one click.
Observability — per-model latency, throughput, cost, hallucination rate, refusal rate.
Cost control — quotas per project, alerts on threshold breach.

Use cases

Train a domain Handler for legal contract review.
Compare three open-weight models on your own evaluation suite.
Deploy a smaller, faster model for high-throughput summarization while keeping a larger model for complex Q&A.
Run A/B traffic splits between candidate models in production.

Installing

Per-organization subscription. Pricing tiers depend on training-job throughput. See the listing.

After install, the extension appears under Installed extensions → LLMOps in your Console. The first-time wizard walks you through three steps:

Connect to a GPU provider (your existing AI Cloud Provisioner config or a fresh wire-up).
Register a base model from the open-weight catalog.
Optionally upload a small evaluation set to confirm everything works.

Model registry

Add a model from the catalog (Llama family, Mistral family, DeepSeek family, Qwen family, and more) or upload a custom model. Each entry stores:

Model ID and version.
Base architecture and parameter count.
License (by default, only models with a permissive open-weight license are allowed; you can add proprietary models if you hold the rights).
Tags and notes.
Adapter chain if it's a fine-tuned variant.
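
The fields above could be modeled as a simple record. This is a minimal sketch, not the extension's actual schema: the class name `RegistryEntry`, every field name, and the example model IDs are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical shape of one registry record; all names are illustrative."""
    model_id: str
    version: str
    architecture: str
    parameter_count: int
    license: str
    tags: List[str] = field(default_factory=list)
    notes: str = ""
    # Ordered adapter names applied on top of the base model; empty for a base model.
    adapter_chain: List[str] = field(default_factory=list)

    @property
    def is_fine_tuned(self) -> bool:
        return len(self.adapter_chain) > 0


# Made-up entries: a base model and a fine-tuned variant that points back to it.
base = RegistryEntry(
    model_id="example-base-7b", version="1.0.0",
    architecture="decoder-only transformer",
    parameter_count=7_000_000_000, license="apache-2.0", tags=["base"],
)
legal = RegistryEntry(
    model_id="example-base-7b-legal", version="1.1.0",
    architecture=base.architecture, parameter_count=base.parameter_count,
    license=base.license, tags=["legal"], adapter_chain=["legal-lora-v1"],
)
```

Keeping the adapter chain as an ordered list makes it possible to tell a base model from a fine-tuned variant without a separate flag.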

Training jobs

Submit a training job with:

A base model from the registry.
A training dataset (uploaded as JSONL or pulled from your Vault).
Hyperparameters (epochs, learning rate, LoRA rank, etc.).
The GPU class to run on.

Jobs run asynchronously on your provisioned GPU pool. Progress, loss curves, and intermediate checkpoints stream to the LLMOps UI.
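
Before submitting, it can help to sanity-check the job locally. The sketch below is an assumption-heavy illustration: the `TrainingJob` fields mirror the list above, but the names, the validation rules, and the `prompt` / `completion` JSONL record layout are hypothetical, not the format LLMOps requires. Check the in-app docs for the real schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingJob:
    """Hypothetical job description; field names are illustrative."""
    base_model: str
    dataset_jsonl: str  # raw JSONL text, one training record per line
    epochs: int
    learning_rate: float
    lora_rank: int
    gpu_class: str


def validate_job(job: TrainingJob) -> List[str]:
    """Return a list of problems; an empty list means the job looks submittable."""
    problems = []
    if job.epochs < 1:
        problems.append("epochs must be at least 1")
    if not 0 < job.learning_rate < 1:
        problems.append("learning_rate must be between 0 and 1")
    if job.lora_rank < 1:
        problems.append("lora_rank must be at least 1")
    for number, line in enumerate(job.dataset_jsonl.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        # Assumed record layout: an object with "prompt" and "completion" keys.
        if not isinstance(record, dict) or not {"prompt", "completion"}.issubset(record):
            problems.append(f"line {number}: missing prompt or completion")
    return problems


good_line = '{"prompt": "Summarize clause 4.", "completion": "Clause 4 limits liability."}'
bad_line = '{"prompt": "no completion here"}'

ok_job = TrainingJob("example-base-7b", good_line, 3, 2e-4, 16, "example-gpu-class")
bad_job = TrainingJob("example-base-7b", good_line + "\n" + bad_line, 3, 2e-4, 0, "example-gpu-class")
```

Here `validate_job(ok_job)` returns an empty list, while `validate_job(bad_job)` reports the zero LoRA rank and the incomplete second record.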

Evaluation

Evaluation suites are reproducible test runs against a model. Built-in suites cover:

General reasoning (MMLU, ARC, HellaSwag).
Code (HumanEval, MBPP).
Safety and refusals (HarmBench-style).
Domain-specific suites you upload.

Run a suite against multiple models in parallel; the leaderboard shows scores side-by-side with per-task drill-downs.
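
To make the side-by-side comparison concrete, here is a minimal sketch of scoring several models on one held-out set and ranking them. It assumes exact-match accuracy as the metric and stands in plain Python callables for models; the function names and the toy suite are hypothetical, and the built-in suites may score differently.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (prompt, expected answer)
Model = Callable[[str], str]


def exact_match_accuracy(model: Model, suite: List[Example]) -> float:
    """Fraction of prompts whose answer matches the expected string exactly."""
    hits = sum(1 for prompt, expected in suite if model(prompt).strip() == expected)
    return hits / len(suite)


def leaderboard(models: Dict[str, Model], suite: List[Example]) -> List[Tuple[str, float]]:
    """Score every model on the same suite; best first, ties broken by name."""
    scores = {name: exact_match_accuracy(fn, suite) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy held-out set and two stand-in "models" backed by lookup tables.
suite = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9"), ("opposite of hot", "cold")]
answers_a = {"2+2": "4", "capital of France": "Paris", "3*3": "9", "opposite of hot": "warm"}
answers_b = {"2+2": "4", "capital of France": "Lyon", "3*3": "6", "opposite of hot": "cold"}
models = {
    "candidate-a": lambda prompt: answers_a[prompt],
    "candidate-b": lambda prompt: answers_b[prompt],
}

board = leaderboard(models, suite)
# board == [("candidate-a", 0.75), ("candidate-b", 0.5)]
```

Running every model against the identical suite is what makes the scores comparable; changing the test set between runs would break the comparison.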

Deployment

Promoting a model to production is one click. The promoted model becomes the active Handler for your organization. Existing traffic drains from the old model and starts hitting the new one within seconds.

You can configure traffic-split deployments — 90% to the production model, 10% to a candidate — for gradual rollout. Metrics on both legs flow to the same observability pane for easy comparison.
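
One common way to implement a weighted split is to hash a stable request or user ID into a bucket, so the same ID always lands on the same leg. The sketch below illustrates that general technique; it is not a description of how LLMOps routes traffic internally, and `pick_leg` and the leg names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def pick_leg(request_id: str, legs: List[Tuple[str, int]]) -> str:
    """Map an ID to a leg; weights are integer percentages that sum to 100."""
    if sum(weight for _, weight in legs) != 100:
        raise ValueError("weights must sum to 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    upper = 0
    for name, weight in legs:
        upper += weight
        if bucket < upper:
            return name
    raise AssertionError("unreachable when weights sum to 100")


# The 90/10 split from the text, applied to 10,000 synthetic request IDs.
legs = [("production", 90), ("candidate", 10)]
counts = {"production": 0, "candidate": 0}
for i in range(10_000):
    counts[pick_leg(f"req-{i}", legs)] += 1
```

Over many requests the candidate share settles near 10%, and because routing depends only on the ID, a given caller sees consistent behavior for the duration of the rollout.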

Observability

Real-time metrics on every deployed model:

Requests per second.
p50 / p95 / p99 latency.
Token throughput.
Cost per 1,000 tokens.
Refusal rate (the share of requests the model declined to answer).
User-flagged hallucinations.
Per-prompt-template performance.

Alerts on threshold breaches fire into your Console notification routing.
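
For readers unfamiliar with the latency figures, the sketch below shows one way p50 / p95 / p99 and a refusal rate can be computed from raw samples. It assumes the nearest-rank percentile definition; the dashboard may use a different estimator, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms: List[float], refusals: int) -> Dict[str, float]:
    """Summarize one latency sample per request plus a count of refused requests."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "refusal_rate": refusals / len(ordered),
    }


# Synthetic data: 100 requests with latencies of 1 ms through 100 ms, 3 refused.
latencies = [float(ms) for ms in range(1, 101)]
stats = summarize(latencies, refusals=3)
# stats == {"p50": 50.0, "p95": 95.0, "p99": 99.0, "refusal_rate": 0.03}
```

Tail percentiles such as p99 are worth alerting on separately, since a healthy median can hide a slow minority of requests.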

Cost control

Per-project and per-model quotas:

Daily and monthly token caps.
Daily and monthly cost caps.
Per-user request quotas.

Soft limits warn; hard limits block. Quota usage surfaces on the LLMOps dashboard.
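
The warn-versus-block behavior can be sketched as a small decision function. The exact semantics here are assumptions: this version evaluates the projected usage after the request, warns at or above the soft limit, and blocks only when the hard limit would be exceeded. `Quota` and `check_quota` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    soft_limit: int  # warn once projected usage reaches this many tokens
    hard_limit: int  # block any request that would push usage past this many tokens


def check_quota(used: int, requested: int, quota: Quota) -> str:
    """Return 'allow', 'warn', or 'block' for a request of `requested` tokens."""
    projected = used + requested
    if projected > quota.hard_limit:
        return "block"
    if projected >= quota.soft_limit:
        return "warn"
    return "allow"


# Example daily token cap: warn at 800k tokens, block past 1M.
daily = Quota(soft_limit=800_000, hard_limit=1_000_000)
```

With this quota, a 50,000-token request at 100,000 tokens used is allowed, a 20,000-token request at 790,000 used triggers a warning, and the same request at 990,000 used is blocked.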

Sovereignty

Models you train belong to your organization. Weights are encrypted at rest with your organization's vault key. Inference traffic stays in your data residency region. Training data does not leave your zone unless you explicitly export it.

Scopes required

Read and write in the extensions/llmops zone.
Read access to training datasets in zones you specify at install.
Write access to evaluation result records.
Compute provisioning rights (typically via the AI Cloud Provisioner extension or an integration you wire up).

Where to next

Architecture: Compute zones — how compute is partitioned for safety.
API: Memory endpoints — programmatic access to the trained models via the Handler.
The extension's in-app docs for training-recipe specifics.