Database Schema Overview
The ED Dataset Service exposes Ed Controls data as a relational schema in MariaDB so that BI tools, SQL editors, and data engineers can query it directly. This page covers the pipeline, the high-level shape of the schema, and the conventions every leaf page assumes. For per-table reference, jump to the relevant domain below.
Pipeline architecture
Ed Controls' source system is CouchDB — a document store. The ED Dataset Service reads from CouchDB, transforms each document into one or more relational rows, and writes those rows to MariaDB. What you query through PowerBI, Metabase, or a SQL client is the MariaDB side; the CouchDB side is internal to Ed Controls and not queryable from this service.
Refresh cadence. The pipeline runs on a cron-driven schedule:
- Incremental sync — daily at 02:00. Pulls only documents changed
since each dataset's last successful sync (
lastIncrementalSync) and replays them through the same DELETE-then-INSERT path used at full sync. Most rows you see in MariaDB are at most 24 hours old. - Full sync — weekly, Sunday at 03:00. Drops every table in every dataset's database and rebuilds from scratch, so any drift between the incremental path and the source is corrected at least once a week.
A reindex can also be triggered ad-hoc through the API. Source-system reads go to CouchDB only; this documentation does not cover any other internal stores.
Per-dataset isolation
Every dataset is its own MariaDB database. When a dataset is created, the service provisions a dedicated database, a dedicated user, and the full set of tables described in this section. There is no shared schema across datasets and no cross-customer query at this layer:
- A SQL client connects to one dataset's database at a time, using that dataset's credentials.
- Joining across datasets is not possible from inside this service. Cross-customer rollups belong upstream of this database.
- A single dataset can contain many projects — that is the
natural scope of multi-project queries (see the
projectidjoin rule).
If you have access to multiple datasets, you have multiple connection strings, not multiple schemas in one database.
Global ER and the seven domains
The schema groups into seven domains. Each domain has its own
landing page with a focused ER subgraph and the per-table reference
beneath it. The diagram below shows how the domains connect through
three key bridge tables — contracts_projects,
audit_answers_tickets, and audits_maps — plus the most important
direct foreign keys.
The bridge tables (hexagon nodes) live inside one domain but link
two: contracts_projects is documented under
Contracts,
audit_answers_tickets under
Audits, and
audits_maps under
Maps. The seven domain
landings wire the rest:
- Projects — the hub. Every other
table carries
projectid. - Contracts — commercial layer above projects.
- Tickets — issues, snags, and follow-ups, plus tags / progress labels / custom fields.
- Audits — instances of an audit template plus the question/answer/ticket-link satellites.
- Audit Templates — reusable blueprints that audits are instantiated from.
- Maps — site plans / floor plans that tickets and audits position against.
- Users — one row per email, global across the dataset.
Common patterns
A handful of conventions repeat across every domain. Internalize these once; the leaf pages will not re-explain them.
Primary keys
Most tables use a VARCHAR(255) primary key called id carrying a
UUID-shaped string. A few exceptions:
users.emailis the primary key (the user's email address).- Several bridge / satellite tables (
project_participants,contracts_projects,audit_answers_tickets,audits_maps,audits_tags,audit_roles,audit_signatures,tickets_custom_field_options,tickets_custom_field_selections) have no primary key at all. Identity is enforced procedurally by the converter, which deletes-then-reinserts the full set on every observation. The leaf pages call this out individually.
projectid everywhere
Every entity table carries a projectid column and an index on it.
Filter on projectid in the outer WHERE clause to let MariaDB
prune early. The
Projects landing
documents the rule in full; the short version: when in doubt,
include projectid on both sides of a join.
The projectid + couchdbid composite rule
A few entities (audit templates, maps, map groups) carry IDs that
are project-scoped, not globally unique. To join across tables
that participate in this rule, use both projectid and
couchdbid (or both halves of an id field that concatenates
them with a | separator, e.g. "project123|couch456"). Single-
column joins on couchdbid alone will silently collapse rows
across projects.
Timestamps
Most entity tables carry:
creationdate/lastmodifieddate— when the source document was created or last updated upstream.archived—DATETIMEset when the entity was soft-deleted. Filter onarchived IS NULLfor active reporting. Archived rows remain queryable so historical joins resolve.
Some tables carry derived time-difference columns (e.g.
tickets.created_completed, audits.dueDate_completed). These are
stored in seconds, not days; divide by 86400 for days. The
relevant leaf pages spell out the unit and any clamping or sign
convention.
Status enums are stored verbatim
Status values come straight from CouchDB without normalization.
That means case matters: audits.status includes a value
'In Progress' (capital I, with the space) — querying
WHERE status = 'in progress' returns zero rows. The audit and
ticket leaves enumerate the closed sets explicitly.
Best practices
Joining tables
- Filter on
projectidfirst. Every join from one domain into another should carryprojectidon both sides where possible. - Use the schema's named indexes. Each leaf page lists them under
Indexes; nearly all of them are on
projectid,couchdbid, or a parent FK column. - Prefer composite joins when joining the satellite tables of one
domain (e.g.
audits_questions ↔ audits_answerjoined onaudit AND questionid). ThequestionIDcollision pitfall is the canonical reason.
Refresh-window awareness
Because most satellite tables are rebuilt with DELETE-then-INSERT on every observation of a parent document, reading a satellite during an in-flight reindex of one parent may briefly return zero rows for that parent. For long-running reports, expect this and plan retries — or schedule reports off-peak.
Counting active vs. archived
Always include archived IS NULL when counting active records, and
archived IS NOT NULL when reporting on archived ones. Mixing the
two without a filter produces a count that does not match anything
the product UI shows.
Example queries
All active tickets for a project
SELECT *
FROM tickets
WHERE projectid = 'your_project_id'
AND archived IS NULL;
Audit completion summary
SELECT
a.name AS audit_name,
a.status,
a.completiondate,
COUNT(aa.questionid) AS questions_answered
FROM audits a
LEFT JOIN audits_answer aa ON aa.audit = a.id
WHERE a.projectid = 'your_project_id'
GROUP BY a.id, a.name, a.status, a.completiondate;
aa.audit = a.id is the safe join — every row in audits_answer
carries its parent audit, which sidesteps the questionID collision
described on the Audits domain landing.
Project participants with their roles
SELECT
p.name AS project_name,
pp.email,
pp.role
FROM projects p
JOIN project_participants pp ON pp.projectid = p.id
WHERE p.id = 'your_project_id'
ORDER BY pp.role, pp.email;
Tickets created from audit findings (cross-domain)
audit_answers_tickets ties a ticket back to the audit answer that
produced it — the canonical cross-domain join.
SELECT
a.id AS audit_id,
a.name AS audit_name,
t.id AS ticket_id,
t.title AS ticket_title,
t.status AS ticket_status
FROM audit_answers_tickets aat
JOIN audits a ON a.id = aat.audit
JOIN tickets t ON t.id = aat.ticket
WHERE a.projectid = 'your_project_id'
ORDER BY a.creationdate DESC, t.creationdate DESC;
The join goes through aat.audit and aat.ticket, never through
questionID alone — see the
Audits domain
for why.