Schema design — bRRAIn Docs
Designing effective schemas for graph-based retrieval: ER modeling, record types, validation, and common patterns.
Schema design
Because bRRAIn is schema-agnostic at ingestion, it's tempting to dump raw payloads and call it done. That works, but thoughtful schema design dramatically improves retrieval quality and makes the graph richer. This guide covers the patterns worth adopting.
Core principle: schemas are relationships
In a traditional database, "schema" means tables and columns. In bRRAIn, schema means which POPE entities your records connect to and how those connections are named. The graph is where queries happen, so the graph is where your schema lives.
Record types
Every record has a type field. Treat types as a controlled vocabulary — pick a small, stable set and document them.
Good examples:
- article
- contract
- patient_encounter
- transaction
- telemetry_reading
Avoid:
- Overly generic types like "data" or "item": the graph can't disambiguate them
- Type names that change frequently: past records become orphaned
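A controlled vocabulary is easy to enforce on the client before records ever reach the Handler. The sketch below assumes you keep the allowed types in a Go map; allowedTypes, checkType, and knownTypes are hypothetical names for illustration, not part of the bRRAIn SDK.

```go
package main

import (
	"fmt"
	"sort"
)

// allowedTypes is a hypothetical, documented vocabulary of record types.
// Keep it small and stable; adding a type should be a deliberate change.
var allowedTypes = map[string]bool{
	"article":           true,
	"contract":          true,
	"patient_encounter": true,
	"transaction":       true,
	"telemetry_reading": true,
}

// checkType returns an error when a record's "type" field is missing,
// is not a string, or is not part of the controlled vocabulary.
func checkType(record map[string]any) error {
	t, ok := record["type"].(string)
	if !ok || t == "" {
		return fmt.Errorf("record has no string \"type\" field")
	}
	if !allowedTypes[t] {
		return fmt.Errorf("unknown record type %q (allowed: %v)", t, knownTypes())
	}
	return nil
}

// knownTypes lists the vocabulary in sorted order for error messages.
func knownTypes() []string {
	types := make([]string, 0, len(allowedTypes))
	for t := range allowedTypes {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

func main() {
	fmt.Println(checkType(map[string]any{"type": "contract"})) // <nil>
	fmt.Println(checkType(map[string]any{"type": "data"}))     // rejected: not in the vocabulary
}
```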
Properties
Treat properties as first-class schema:
- Identifiers: IDs, slugs, external references (always use strings, not integers, for portability)
- Classifications: short enum values (status, severity, classification)
- Freeform content: the main body; indexed for semantic and full-text search
- Temporal fields: ISO 8601 strings in UTC
- Numeric metrics: for filtering and sorting
Name properties in snake_case, consistently.
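The conventions above can be applied when you build a record in Go. This is a minimal sketch: the snakeCase pattern, the badKeys helper, and the transaction fields are illustrative assumptions, and RFC 3339 (Go's time.RFC3339 layout) is used as the ISO 8601 profile for timestamps.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// snakeCase matches lowercase words joined by single underscores, so
// "effective_date" passes while "effectiveDate" and "Effective_Date" fail.
var snakeCase = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

// badKeys returns every property name that is not snake_case
// (in map iteration order, so the result is unsorted).
func badKeys(record map[string]any) []string {
	var bad []string
	for k := range record {
		if !snakeCase.MatchString(k) {
			bad = append(bad, k)
		}
	}
	return bad
}

func main() {
	upstreamID := 48213 // numeric ID from a hypothetical upstream system
	record := map[string]any{
		"type":        "transaction",
		"external_id": strconv.Itoa(upstreamID),              // identifier stored as a string
		"status":      "settled",                             // short enum value
		"amount":      129.50,                                // numeric metric
		"occurred_at": time.Now().UTC().Format(time.RFC3339), // ISO 8601 timestamp in UTC
	}
	fmt.Println(record["external_id"], badKeys(record)) // 48213 []
}
```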
Type constraints and validation
Define a schema per type in content/schemas/<type>.json:
{
"type": "contract",
"required": ["title", "parties", "effective_date"],
"properties": {
"title": {"type": "string", "maxLength": 200},
"parties": {"type": "array", "items": {"type": "string"}},
"effective_date": {"type": "string", "format": "date"},
"expires_on": {"type": "string", "format": "date"},
"amount": {"type": "number"}
}
}
The Handler validates records against the schema before writing, returning sdk.ErrValidation on failure.
Designing for optimal retrieval
Include entity hints
Always name POPE entities explicitly when you know them:
record := map[string]any{
"type": "contract",
"title": "MSA with Acme Corp",
"parties": []string{"org:acme-corp", "org:our-co"},
"owner": "person:alice@firm.io",
"jurisdiction": "place:delaware",
"signed_at": "2026-03-15",
}
The Handler extracts entities automatically, but explicit hints yield faster, more accurate graph connections.
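If you want to see which hints a record carries before sending it, a small helper can collect every "kind:id" value. This sketch assumes the kinds used on this page (person, org, place) and plain string or []string values; entityHints and parseRef are hypothetical helpers, not the Handler's extraction logic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// hint is one explicit entity reference found in a record.
type hint struct {
	Field, Kind, ID string
}

// knownKinds lists the prefixes used in this page's examples (an assumption).
var knownKinds = map[string]bool{"person": true, "org": true, "place": true}

// parseRef splits "kind:id" and reports whether the kind is recognized.
// Timestamps such as "2026-04-10T15:00:00Z" fail because their prefix is unknown.
func parseRef(s string) (string, string, bool) {
	kind, id, found := strings.Cut(s, ":")
	if !found || id == "" || !knownKinds[kind] {
		return "", "", false
	}
	return kind, id, true
}

// entityHints collects references from string and []string values,
// sorted by field and then ID so the output is deterministic.
func entityHints(record map[string]any) []hint {
	var out []hint
	add := func(field, s string) {
		if kind, id, ok := parseRef(s); ok {
			out = append(out, hint{field, kind, id})
		}
	}
	for field, v := range record {
		switch val := v.(type) {
		case string:
			add(field, val)
		case []string:
			for _, s := range val {
				add(field, s)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Field != out[j].Field {
			return out[i].Field < out[j].Field
		}
		return out[i].ID < out[j].ID
	})
	return out
}

func main() {
	record := map[string]any{
		"type":         "contract",
		"title":        "MSA with Acme Corp",
		"parties":      []string{"org:acme-corp", "org:our-co"},
		"owner":        "person:alice@firm.io",
		"jurisdiction": "place:delaware",
		"signed_at":    "2026-03-15",
	}
	for _, h := range entityHints(record) {
		fmt.Printf("%s -> %s:%s\n", h.Field, h.Kind, h.ID)
	}
	// jurisdiction -> place:delaware
	// owner -> person:alice@firm.io
	// parties -> org:acme-corp
	// parties -> org:our-co
}
```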
Use stable identifiers
If your record references another record or entity, use the canonical ID:
- Good: "author": "person:alice-uuid" or "account": "org:acme-corp"
- Avoid: "author_name": "Alice" alone, which is ambiguous when two Alices exist
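A quick check for reference fields is to require the kind:slug shape together with the [a-z0-9-] character set recommended under Anti-patterns to avoid. The canonicalID pattern and the checkRefs helper below are assumptions for illustration; adjust the pattern if your IDs legitimately use other characters.

```go
package main

import (
	"fmt"
	"regexp"
)

// canonicalID accepts "kind:slug" where the slug uses only [a-z0-9-].
// The exact pattern is an assumption, not a bRRAIn rule.
var canonicalID = regexp.MustCompile(`^[a-z]+:[a-z0-9-]+$`)

// checkRefs reports the named reference fields whose values are missing
// or are not canonical IDs.
func checkRefs(record map[string]any, refFields ...string) []string {
	var bad []string
	for _, f := range refFields {
		s, ok := record[f].(string)
		if !ok || !canonicalID.MatchString(s) {
			bad = append(bad, f)
		}
	}
	return bad
}

func main() {
	good := map[string]any{"author": "person:alice-uuid", "account": "org:acme-corp"}
	vague := map[string]any{"author_name": "Alice"} // only a display name, no ID
	fmt.Println(checkRefs(good, "author", "account")) // []
	fmt.Println(checkRefs(vague, "author"))           // [author]
}
```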
Denormalize for readability
Include the human-readable fields alongside IDs. The graph uses IDs; the search snippets use the names:
{
"author": "person:alice-uuid",
"author_name": "Alice Chen"
}
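Denormalizing is typically done at write time, by looking up a display name for each ID field. In this sketch the directory map stands in for whatever service owns your names, and the "<field>_name" convention follows the author/author_name example above; withNames is a hypothetical helper.

```go
package main

import "fmt"

// withNames copies record and, for each reference field whose ID appears in
// directory, adds a "<field>_name" property holding the display name.
// The input map is not modified.
func withNames(record map[string]any, directory map[string]string, fields ...string) map[string]any {
	out := make(map[string]any, len(record)+len(fields))
	for k, v := range record {
		out[k] = v
	}
	for _, f := range fields {
		id, ok := record[f].(string)
		if !ok {
			continue // field absent or not a string ID
		}
		if name, found := directory[id]; found {
			out[f+"_name"] = name
		}
	}
	return out
}

func main() {
	directory := map[string]string{"person:alice-uuid": "Alice Chen"} // hypothetical lookup
	rec := withNames(map[string]any{"author": "person:alice-uuid"}, directory, "author")
	fmt.Println(rec["author"], rec["author_name"]) // person:alice-uuid Alice Chen
}
```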
Common patterns
Pattern A — hierarchical
Use parent and children pointers. Example: a project has many milestones, and each milestone has many tasks.
{
"type": "task",
"parent": "milestone:ms-234",
"title": "Deploy to staging",
"assignee": "person:alice"
}
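Because each child stores only its parent, reading up the hierarchy means following parent pointers until a record has none. The sketch below does this over an in-memory map standing in for fetched records; ancestors and the project:p-1 and task:t-9 IDs are hypothetical, and the cycle check guards against bad data.

```go
package main

import (
	"fmt"
	"strings"
)

// ancestors follows "parent" pointers upward from id through store and
// returns the chain from the immediate parent to the root.
func ancestors(store map[string]map[string]any, id string) ([]string, error) {
	var chain []string
	seen := map[string]bool{id: true}
	for {
		rec, ok := store[id]
		if !ok {
			return chain, fmt.Errorf("record %q not found", id)
		}
		parent, ok := rec["parent"].(string)
		if !ok || parent == "" {
			return chain, nil // reached the root
		}
		if seen[parent] {
			return chain, fmt.Errorf("cycle detected at %q", parent)
		}
		seen[parent] = true
		chain = append(chain, parent)
		id = parent
	}
}

func main() {
	store := map[string]map[string]any{
		"project:p-1":      {"type": "project", "title": "Platform launch"},
		"milestone:ms-234": {"type": "milestone", "parent": "project:p-1"},
		"task:t-9":         {"type": "task", "parent": "milestone:ms-234", "title": "Deploy to staging"},
	}
	chain, err := ancestors(store, "task:t-9")
	fmt.Println(strings.Join(chain, " > "), err) // milestone:ms-234 > project:p-1 <nil>
}
```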
Pattern B — network
Use arrays of linked entity IDs. Example: a meeting links to many attendees and decisions.
{
"type": "meeting",
"title": "Q2 planning",
"attendees": ["person:alice", "person:bob"],
"decisions": ["decision:dec-123"],
"held_at": "2026-04-10T15:00:00Z"
}
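Arrays of IDs give you many-to-many links that can be read in both directions. bRRAIn resolves these links in its graph; the local sketch below only illustrates the idea by inverting the attendees arrays to answer "which meetings did this person attend?". The invert helper and the second meeting are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// invert maps each ID found in records[*][field] back to the IDs of the
// records that reference it, with each list sorted for stable output.
func invert(records map[string]map[string]any, field string) map[string][]string {
	index := map[string][]string{}
	for id, rec := range records {
		targets, _ := rec[field].([]string) // nil when the field is absent
		for _, t := range targets {
			index[t] = append(index[t], id)
		}
	}
	for t := range index {
		sort.Strings(index[t])
	}
	return index
}

func main() {
	meetings := map[string]map[string]any{
		"meeting:m-1": {"type": "meeting", "title": "Q2 planning", "attendees": []string{"person:alice", "person:bob"}},
		"meeting:m-2": {"type": "meeting", "title": "Design review", "attendees": []string{"person:alice"}},
	}
	byAttendee := invert(meetings, "attendees")
	fmt.Println(byAttendee["person:alice"]) // [meeting:m-1 meeting:m-2]
	fmt.Println(byAttendee["person:bob"])   // [meeting:m-1]
}
```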
Pattern C — event-sourcing
Store immutable events; compute current state by replay or by maintaining a projection.
{
"type": "order_event",
"event": "status_changed",
"order_id": "order:o-234",
"from": "pending",
"to": "shipped",
"actor": "person:alice",
"occurred_at": "2026-04-16T09:12:00Z"
}
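Computing current state by replay means ordering events by time and applying each one to a projection. This minimal sketch handles only status_changed events, models the fields shown above except actor, and rejects a history in which an event's from value disagrees with the projection. orderEvent, replay, and the earlier created-to-pending event are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// orderEvent mirrors the order_event record above (actor omitted).
type orderEvent struct {
	Event      string
	OrderID    string
	From, To   string
	OccurredAt string // RFC 3339, UTC
}

// replay rebuilds each order's current status from its events.
func replay(events []orderEvent) (map[string]string, error) {
	type timed struct {
		at time.Time
		ev orderEvent
	}
	ordered := make([]timed, 0, len(events))
	for _, e := range events {
		at, err := time.Parse(time.RFC3339, e.OccurredAt)
		if err != nil {
			return nil, fmt.Errorf("event for %s: %w", e.OrderID, err)
		}
		ordered = append(ordered, timed{at, e})
	}
	// A stable sort keeps the input order for events with equal timestamps.
	sort.SliceStable(ordered, func(i, j int) bool { return ordered[i].at.Before(ordered[j].at) })

	state := map[string]string{}
	for _, t := range ordered {
		e := t.ev
		if e.Event != "status_changed" {
			continue // other event kinds would update other projection fields
		}
		if cur, ok := state[e.OrderID]; ok && cur != e.From {
			return nil, fmt.Errorf("order %s: event expects %q, projection has %q", e.OrderID, e.From, cur)
		}
		state[e.OrderID] = e.To
	}
	return state, nil
}

func main() {
	events := []orderEvent{ // deliberately out of order
		{"status_changed", "order:o-234", "pending", "shipped", "2026-04-16T09:12:00Z"},
		{"status_changed", "order:o-234", "created", "pending", "2026-04-15T17:40:00Z"},
	}
	state, err := replay(events)
	fmt.Println(state["order:o-234"], err) // shipped <nil>
}
```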
Anti-patterns to avoid
- Deeply nested payloads: graph traversal can't see into nested JSON. Flatten them or store the nested parts as separate records.
- Giant blobs: a 50 MB JSON record is technically allowed but hurts retrieval. Store the blob in the Document Portal and reference it from a record.
- Generic "data" payloads: the Handler can't classify what you don't name. Always use a specific type.
- Non-ASCII IDs: stick to [a-z0-9-]. Unicode IDs break downstream tooling.
- Mutating IDs: once a record has an ID, treat it as permanent. A rename should create a new record and a renamed_to edge.
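Flattening a nested payload can be as simple as joining each path into one snake_case name. This sketch assumes "_" as the separator and fails on name collisions instead of silently overwriting; flatten is a hypothetical helper, and when a nested object is really an entity of its own, a separate record remains the better choice.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten turns nested objects into top-level properties whose names join
// the path with "_", e.g. {"location": {"site": "x"}} becomes
// {"location_site": "x"}. Arrays and scalars are copied unchanged.
// It returns an error if two paths collide on the same flattened name.
func flatten(in map[string]any) (map[string]any, error) {
	out := map[string]any{}
	var walk func(prefix string, m map[string]any) error
	walk = func(prefix string, m map[string]any) error {
		for k, v := range m {
			key := k
			if prefix != "" {
				key = prefix + "_" + k
			}
			if child, ok := v.(map[string]any); ok {
				if err := walk(key, child); err != nil {
					return err
				}
				continue
			}
			if _, dup := out[key]; dup {
				return fmt.Errorf("flattened key %q collides", key)
			}
			out[key] = v
		}
		return nil
	}
	if err := walk("", in); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	nested := map[string]any{
		"type":     "patient_encounter",
		"location": map[string]any{"site": "north-clinic", "room": "204"},
	}
	flat, err := flatten(nested)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [location_room location_site type]
}
```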
Evolving schemas
When you need to change a schema:
- Bump the schema version in content/schemas/<type>.json.
- Write a migration transformer (see Data ingestion transformers).
- Reingest old records: brrain docs reingest --type contract.
The Handler honors the most recent schema version but preserves historical records under their original version in the audit log.
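The real transformer interface is described in Data ingestion transformers; the sketch below uses a plain Go function instead, with a hypothetical v1-to-v2 change in which contracts move from signed_at to the effective_date field the contract schema requires. migrateContract and parseDate are illustrative names. Making the transformer idempotent means a reingest can safely run over records that were already migrated.

```go
package main

import (
	"fmt"
	"time"
)

// migrateContract is a hypothetical v1-to-v2 transformer: v1 stored the start
// date as "signed_at" (RFC 3339 timestamp or YYYY-MM-DD), v2 requires
// "effective_date" as YYYY-MM-DD. It returns a new map and leaves v1 untouched.
func migrateContract(v1 map[string]any) (map[string]any, error) {
	v2 := make(map[string]any, len(v1))
	for k, val := range v1 {
		v2[k] = val
	}
	if _, done := v2["effective_date"]; done {
		return v2, nil // already migrated, so a second run is a no-op
	}
	raw, ok := v2["signed_at"].(string)
	if !ok {
		return nil, fmt.Errorf("contract has no signed_at to migrate")
	}
	d, err := parseDate(raw)
	if err != nil {
		return nil, err
	}
	v2["effective_date"] = d.Format("2006-01-02")
	delete(v2, "signed_at")
	return v2, nil
}

// parseDate accepts a full timestamp (converted to UTC first, so a late local
// time may land on the next UTC day) or a plain YYYY-MM-DD date.
func parseDate(s string) (time.Time, error) {
	if t, err := time.Parse(time.RFC3339, s); err == nil {
		return t.UTC(), nil
	}
	return time.Parse("2006-01-02", s)
}

func main() {
	v2, err := migrateContract(map[string]any{"type": "contract", "title": "MSA with Acme Corp", "signed_at": "2026-03-15"})
	fmt.Println(v2["effective_date"], err) // 2026-03-15 <nil>
}
```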