Schema design — bRRAIn Docs

Designing effective schemas for graph-based retrieval: ER modeling, record types, validation, and common patterns.

Schema design

Because bRRAIn is schema-agnostic at ingestion, it's tempting to dump raw payloads and call it done. That works, but deliberate schema design improves both retrieval quality and the richness of the graph. This guide covers the patterns worth adopting.

Core principle: schemas are relationships

In a traditional database, "schema" means tables and columns. In bRRAIn, schema means which POPE entities your records connect to and how those connections are named. The graph is where queries happen, so the graph is where your schema lives.

Record types

Every record has a type field. Treat types as a controlled vocabulary — pick a small, stable set and document them.

Good examples:

  • article
  • contract
  • patient_encounter
  • transaction
  • telemetry_reading

Avoid:

  • Overly generic types like data or item — the graph can't disambiguate
  • Type names that change frequently — past records become orphaned
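One way to keep the type vocabulary controlled is to define it once in code and reject anything else before ingestion. The following is a minimal sketch in Go; RecordType, the constant names, and IsKnownType are hypothetical helpers, not part of the bRRAIn SDK.

```go
package main

import "fmt"

// RecordType is a hypothetical named string type for the "type" field.
type RecordType string

// The controlled vocabulary, taken from the "good examples" list.
const (
	TypeArticle          RecordType = "article"
	TypeContract         RecordType = "contract"
	TypePatientEncounter RecordType = "patient_encounter"
	TypeTransaction      RecordType = "transaction"
	TypeTelemetryReading RecordType = "telemetry_reading"
)

// knownTypes is the single place where the vocabulary is enumerated.
var knownTypes = map[RecordType]bool{
	TypeArticle:          true,
	TypeContract:         true,
	TypePatientEncounter: true,
	TypeTransaction:      true,
	TypeTelemetryReading: true,
}

// IsKnownType reports whether t belongs to the controlled vocabulary.
func IsKnownType(t string) bool {
	return knownTypes[RecordType(t)]
}

func main() {
	for _, t := range []string{"contract", "data"} {
		fmt.Printf("%-10s known=%v\n", t, IsKnownType(t))
	}
}
```

Keeping the list in one map means adding a type is a deliberate, reviewable change rather than a side effect of whatever a producer happens to send.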

Properties

Treat properties as first-class schema:

  • Identifiers — IDs, slugs, external references (always use string, not int, for portability)
  • Classifications — short enum values (status, severity, classification)
  • Freeform content — the main body; indexed for semantic and full-text search
  • Temporal fields — ISO 8601 strings in UTC
  • Numeric metrics — for filtering and sorting

Name properties in snake_case, consistently.
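The naming and timestamp conventions above are easy to check mechanically. The sketch below is one possible approach, assuming timestamps arrive as RFC 3339 strings (a profile of ISO 8601); IsSnakeCase and NormalizeTimestamp are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// snakeCase matches lowercase words separated by single underscores.
var snakeCase = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

// IsSnakeCase reports whether a property name follows snake_case.
func IsSnakeCase(key string) bool {
	return snakeCase.MatchString(key)
}

// NormalizeTimestamp parses an RFC 3339 timestamp with any offset and
// returns the same instant formatted in UTC.
func NormalizeTimestamp(s string) (string, error) {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return "", fmt.Errorf("parse timestamp %q: %w", s, err)
	}
	return t.UTC().Format(time.RFC3339), nil
}

func main() {
	fmt.Println(IsSnakeCase("effective_date")) // true
	fmt.Println(IsSnakeCase("effectiveDate"))  // false

	ts, err := NormalizeTimestamp("2026-04-10T10:00:00-05:00")
	if err != nil {
		panic(err)
	}
	fmt.Println(ts) // 2026-04-10T15:00:00Z
}
```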

Type constraints and validation

Define a schema per type in content/schemas/<type>.json:

{
    "type": "contract",
    "required": ["title", "parties", "effective_date"],
    "properties": {
        "title":          {"type": "string", "maxLength": 200},
        "parties":        {"type": "array", "items": {"type": "string"}},
        "effective_date": {"type": "string", "format": "date"},
        "expires_on":     {"type": "string", "format": "date"},
        "amount":         {"type": "number"}
    }
}

The Handler validates records against the schema before writing, returning sdk.ErrValidation on failure.
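To make the validation step concrete, here is a minimal sketch of a pre-flight check you could run client-side before submitting a record. It only checks the type and the required list; it is not the Handler's implementation. ErrValidation below is a local stand-in for sdk.ErrValidation, whose exact definition this guide does not show, and Schema and CheckRequired are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ErrValidation is a local stand-in for sdk.ErrValidation (assumption).
var ErrValidation = errors.New("validation failed")

// Schema holds only the parts of a schema file this sketch checks.
type Schema struct {
	Type     string   `json:"type"`
	Required []string `json:"required"`
}

// contractSchema mirrors the contract schema from this section.
const contractSchema = `{
    "type": "contract",
    "required": ["title", "parties", "effective_date"],
    "properties": {
        "title":          {"type": "string", "maxLength": 200},
        "effective_date": {"type": "string", "format": "date"}
    }
}`

// CheckRequired returns an error wrapping ErrValidation when the record's
// type does not match the schema or a required property is missing.
func CheckRequired(schemaJSON string, record map[string]any) error {
	var s Schema
	if err := json.Unmarshal([]byte(schemaJSON), &s); err != nil {
		return fmt.Errorf("parse schema: %w", err)
	}
	if record["type"] != s.Type {
		return fmt.Errorf("%w: type is %v, want %q", ErrValidation, record["type"], s.Type)
	}
	for _, key := range s.Required {
		if _, ok := record[key]; !ok {
			return fmt.Errorf("%w: missing required property %q", ErrValidation, key)
		}
	}
	return nil
}

func main() {
	record := map[string]any{
		"type":    "contract",
		"title":   "MSA with Acme Corp",
		"parties": []string{"org:acme-corp", "org:our-co"},
	}
	err := CheckRequired(contractSchema, record)
	fmt.Println(errors.Is(err, ErrValidation), err)
}
```

Wrapping the sentinel with %w lets callers branch with errors.Is while still getting a message that names the offending property.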

Designing for optimal retrieval

Include entity hints

Always name POPE entities explicitly when you know them:

record := map[string]any{
    "type":           "contract",
    "title":          "MSA with Acme Corp",
    "parties":        []string{"org:acme-corp", "org:our-co"},
    "owner":          "person:alice@firm.io",
    "jurisdiction":   "place:delaware",
    "signed_at":      "2026-03-15",
    "effective_date": "2026-03-15", // required by the contract schema
}

The Handler extracts entities automatically, but explicit hints yield faster, more accurate graph connections.

Use stable identifiers

If your record references another record or entity, use the canonical ID:

  • Good: "author": "person:alice-uuid" or "account": "org:acme-corp"
  • Avoid: "author_name": "Alice" alone — ambiguous when two Alices exist
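The IDs in this guide all follow a kind:local shape (person:alice-uuid, org:acme-corp). Assuming that shape holds for your data, a small parser can reject bare names before they reach the graph. SplitID is a hypothetical helper, not an SDK function.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitID splits a canonical ID such as "org:acme-corp" into its kind
// prefix and local part. ok is false when there is no colon or when
// either side of it is empty.
func SplitID(id string) (string, string, bool) {
	kind, local, found := strings.Cut(id, ":")
	if !found || kind == "" || local == "" {
		return "", "", false
	}
	return kind, local, true
}

func main() {
	for _, id := range []string{"org:acme-corp", "Alice"} {
		kind, local, ok := SplitID(id)
		fmt.Printf("%q -> kind=%q local=%q ok=%v\n", id, kind, local, ok)
	}
}
```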

Denormalize for readability

Include the human-readable fields alongside IDs. The graph uses IDs; the search snippets use the names:

{
    "author":      "person:alice-uuid",
    "author_name": "Alice Chen"
}

Common patterns

Pattern A — hierarchical

Use parent and children pointers. Example: a project has many milestones, and each milestone has many tasks. A task points to its milestone through parent:

{
    "type":     "task",
    "parent":   "milestone:ms-234",
    "title":    "Deploy to staging",
    "assignee": "person:alice"
}
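Parent pointers make it cheap to recover the full chain from a task up to its project. The sketch below walks those pointers over an in-memory map; it illustrates the data shape only and does not use the bRRAIn query API. Ancestors is a hypothetical helper, and the IDs task:t-1 and project:p-1 are made up for the example.

```go
package main

import "fmt"

// Ancestors follows parent pointers from id toward the root and returns
// the ancestor IDs, nearest first. parents maps a record ID to its
// parent's ID. The seen set stops the walk if the data contains a cycle.
func Ancestors(parents map[string]string, id string) []string {
	var chain []string
	seen := map[string]bool{id: true}
	for {
		p, ok := parents[id]
		if !ok || seen[p] {
			return chain
		}
		chain = append(chain, p)
		seen[p] = true
		id = p
	}
}

func main() {
	parents := map[string]string{
		"task:t-1":         "milestone:ms-234",
		"milestone:ms-234": "project:p-1",
	}
	fmt.Println(Ancestors(parents, "task:t-1")) // [milestone:ms-234 project:p-1]
}
```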

Pattern B — network

Use arrays of linked entity IDs. Example: a meeting links to many attendees and decisions.

{
    "type":      "meeting",
    "title":     "Q2 planning",
    "attendees": ["person:alice", "person:bob"],
    "decisions": ["decision:dec-123"],
    "held_at":   "2026-04-10T15:00:00Z"
}
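Arrays of IDs behave like edges, so they can be read in either direction. As a rough illustration, assuming meetings are already loaded into memory, the sketch below inverts the attendees arrays into a person-to-meetings index. Meeting and MeetingsByAttendee are hypothetical names, and the meeting IDs are invented for the example.

```go
package main

import "fmt"

// Meeting holds only the fields this sketch needs.
type Meeting struct {
	ID        string
	Attendees []string
}

// MeetingsByAttendee inverts the attendees arrays, mapping each person ID
// to the IDs of the meetings that person attended, in input order.
func MeetingsByAttendee(meetings []Meeting) map[string][]string {
	index := make(map[string][]string)
	for _, m := range meetings {
		for _, a := range m.Attendees {
			index[a] = append(index[a], m.ID)
		}
	}
	return index
}

func main() {
	meetings := []Meeting{
		{ID: "meeting:m-1", Attendees: []string{"person:alice", "person:bob"}},
		{ID: "meeting:m-2", Attendees: []string{"person:alice"}},
	}
	index := MeetingsByAttendee(meetings)
	fmt.Println(index["person:alice"]) // [meeting:m-1 meeting:m-2]
}
```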

Pattern C — event-sourcing

Store immutable events; compute current state by replay or by maintaining a projection.

{
    "type":        "order_event",
    "event":       "status_changed",
    "order_id":    "order:o-234",
    "from":        "pending",
    "to":          "shipped",
    "actor":       "person:alice",
    "occurred_at": "2026-04-16T09:12:00Z"
}
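Computing current state by replay can be sketched as a fold over the events in time order. The example below assumes every occurred_at value is a UTC ISO 8601 string in the same format, so that plain string comparison sorts chronologically. OrderEvent and CurrentStatus are hypothetical names, and the earlier "new" to "pending" event is invented for the demonstration.

```go
package main

import (
	"fmt"
	"sort"
)

// OrderEvent mirrors the fields of an order_event record used here.
type OrderEvent struct {
	OrderID    string
	From, To   string
	OccurredAt string // UTC ISO 8601; same-format strings sort by time
}

// CurrentStatus replays events in chronological order and returns the
// latest status per order. The input slice is copied, never modified.
func CurrentStatus(events []OrderEvent) map[string]string {
	sorted := append([]OrderEvent(nil), events...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].OccurredAt < sorted[j].OccurredAt
	})
	status := make(map[string]string)
	for _, e := range sorted {
		status[e.OrderID] = e.To // later events overwrite earlier ones
	}
	return status
}

func main() {
	events := []OrderEvent{
		{OrderID: "order:o-234", From: "pending", To: "shipped", OccurredAt: "2026-04-16T09:12:00Z"},
		{OrderID: "order:o-234", From: "new", To: "pending", OccurredAt: "2026-04-15T08:00:00Z"},
	}
	fmt.Println(CurrentStatus(events)["order:o-234"]) // shipped
}
```

A projection is the same fold run incrementally: apply each new event to the stored map instead of replaying from the start.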

Anti-patterns to avoid

  • Deeply nested payloads — Graph traversal can't see into nested JSON. Flatten or store as separate records.
  • Giant blobs — A 50 MB JSON record is technically allowed but hurts retrieval. Store the blob in the Document Portal and reference it from a record.
  • Generic "data" payloads — The Handler can't classify what you don't name. Always use a specific type.
  • Non-ASCII IDs — Stick to [a-z0-9-] for the part after the type prefix (as in org:acme-corp). Unicode IDs break downstream tooling.
  • Mutating IDs — Once a record has an ID, treat it as permanent. Renames should create a new record and a renamed_to edge.
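For the nested-payload case, flattening can be done mechanically before ingestion. The sketch below is one simple approach, assuming nested objects arrive as map[string]any and that joining keys with an underscore does not collide with existing top-level keys. Flatten is a hypothetical helper.

```go
package main

import "fmt"

// Flatten copies a nested map into a single-level map, joining nested
// keys with "_" (for example counterparty.name -> counterparty_name).
// Arrays and scalar values are copied as they are.
func Flatten(in map[string]any) map[string]any {
	out := make(map[string]any)
	flattenInto(out, "", in)
	return out
}

func flattenInto(out map[string]any, prefix string, in map[string]any) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "_" + k
		}
		if nested, ok := v.(map[string]any); ok {
			flattenInto(out, key, nested)
			continue
		}
		out[key] = v
	}
}

func main() {
	record := map[string]any{
		"type": "contract",
		"counterparty": map[string]any{
			"name":    "Acme Corp",
			"address": map[string]any{"state": "DE"},
		},
	}
	fmt.Println(Flatten(record))
}
```

When a nested object is really an entity in its own right, storing it as a separate record and linking by ID is usually the better of the two options.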

Evolving schemas

When you need to change a schema:

  1. Bump the schema version in content/schemas/<type>.json.
  2. Write a migration transformer (see Data ingestion transformers).
  3. Reingest old records: brrain docs reingest --type contract.

The Handler honors the most recent schema version but preserves historical records under their original version in the audit log.
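A migration transformer is, at its core, a function from an old record to a new one. The sketch below shows that shape only; it is not the bRRAIn transformer interface, which is documented under Data ingestion transformers. The rename from expires_on to expiration_date and the schema_version property are purely illustrative assumptions, as is the function name.

```go
package main

import "fmt"

// MigrateContractV1ToV2 is a hypothetical transformer. It renames
// "expires_on" to "expiration_date" and stamps the record with a
// "schema_version" property. The input map is left untouched.
func MigrateContractV1ToV2(old map[string]any) map[string]any {
	out := make(map[string]any, len(old)+1)
	for k, v := range old {
		out[k] = v
	}
	if v, ok := out["expires_on"]; ok {
		// Do not overwrite a value that is already in the new field.
		if _, exists := out["expiration_date"]; !exists {
			out["expiration_date"] = v
		}
		delete(out, "expires_on")
	}
	out["schema_version"] = 2
	return out
}

func main() {
	old := map[string]any{
		"type":       "contract",
		"title":      "MSA with Acme Corp",
		"expires_on": "2027-03-15",
	}
	fmt.Println(MigrateContractV1ToV2(old))
}
```

Returning a copy keeps the transformer safe to rerun, which matters when a reingest is retried partway through.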