Skip to main content

Database Schema Overview

The ED Dataset Service exposes Ed Controls data as a relational schema in MariaDB so that BI tools, SQL editors, and data engineers can query it directly. This page covers the pipeline, the high-level shape of the schema, and the conventions every leaf page assumes. For per-table reference, jump to the relevant domain below.

Pipeline architecture

Ed Controls' source system is CouchDB — a document store. The ED Dataset Service reads from CouchDB, transforms each document into one or more relational rows, and writes those rows to MariaDB. What you query through PowerBI, Metabase, or a SQL client is the MariaDB side; the CouchDB side is internal to Ed Controls and not queryable from this service.

Refresh cadence. The pipeline runs on a cron-driven schedule:

  • Incremental sync — daily at 02:00. Pulls only documents changed since each dataset's last successful sync (lastIncrementalSync) and replays them through the same DELETE-then-INSERT path used at full sync. Most rows you see in MariaDB are at most 24 hours old.
  • Full sync — weekly, Sunday at 03:00. Drops every table in every dataset's database and rebuilds from scratch, so any drift between the incremental path and the source is corrected at least once a week.

A reindex can also be triggered ad-hoc through the API. Source-system reads go to CouchDB only; this documentation does not cover any other internal stores.

Per-dataset isolation

Every dataset is its own MariaDB database. When a dataset is created, the service provisions a dedicated database, a dedicated user, and the full set of tables described in this section. There is no shared schema across datasets and no cross-customer query at this layer:

  • A SQL client connects to one dataset's database at a time, using that dataset's credentials.
  • Joining across datasets is not possible from inside this service. Cross-customer rollups belong upstream of this database.
  • A single dataset can contain many projects — that is the natural scope of multi-project queries (see the projectid join rule).

If you have access to multiple datasets, you have multiple connection strings, not multiple schemas in one database.

Global ER and the seven domains

The schema groups into seven domains. Each domain has its own landing page with a focused ER subgraph and the per-table reference beneath it. The diagram below shows how the domains connect through three key bridge tables — contracts_projects, audit_answers_tickets, and audits_maps — plus the most important direct foreign keys.

The bridge tables (hexagon nodes) live inside one domain but link two: contracts_projects is documented under Contracts, audit_answers_tickets under Audits, and audits_maps under Maps. The seven domain landings wire the rest:

  • Projects — the hub. Every other table carries projectid.
  • Contracts — commercial layer above projects.
  • Tickets — issues, snags, and follow-ups, plus tags / progress labels / custom fields.
  • Audits — instances of an audit template plus the question/answer/ticket-link satellites.
  • Audit Templates — reusable blueprints that audits are instantiated from.
  • Maps — site plans / floor plans that tickets and audits position against.
  • Users — one row per email, global across the dataset.

Common patterns

A handful of conventions repeat across every domain. Internalize these once; the leaf pages will not re-explain them.

Primary keys

Most tables use a VARCHAR(255) primary key called id carrying a UUID-shaped string. A few exceptions:

  • users.email is the primary key (the user's email address).
  • Several bridge / satellite tables (project_participants, contracts_projects, audit_answers_tickets, audits_maps, audits_tags, audit_roles, audit_signatures, tickets_custom_field_options, tickets_custom_field_selections) have no primary key at all. Identity is enforced procedurally by the converter, which deletes-then-reinserts the full set on every observation. The leaf pages call this out individually.

projectid everywhere

Every entity table carries a projectid column and an index on it. Filter on projectid in the outer WHERE clause to let MariaDB prune early. The Projects landing documents the rule in full; the short version: when in doubt, include projectid on both sides of a join.

The projectid + couchdbid composite rule

A few entities (audit templates, maps, map groups) carry IDs that are project-scoped, not globally unique. To join across tables that participate in this rule, use both projectid and couchdbid (or both halves of an id field that concatenates them with a | separator, e.g. "project123|couch456"). Single- column joins on couchdbid alone will silently collapse rows across projects.

Timestamps

Most entity tables carry:

  • creationdate / lastmodifieddate — when the source document was created or last updated upstream.
  • archivedDATETIME set when the entity was soft-deleted. Filter on archived IS NULL for active reporting. Archived rows remain queryable so historical joins resolve.

Some tables carry derived time-difference columns (e.g. tickets.created_completed, audits.dueDate_completed). These are stored in seconds, not days; divide by 86400 for days. The relevant leaf pages spell out the unit and any clamping or sign convention.

Status enums are stored verbatim

Status values come straight from CouchDB without normalization. That means case matters: audits.status includes a value 'In Progress' (capital I, with the space) — querying WHERE status = 'in progress' returns zero rows. The audit and ticket leaves enumerate the closed sets explicitly.

Best practices

Joining tables

  • Filter on projectid first. Every join from one domain into another should carry projectid on both sides where possible.
  • Use the schema's named indexes. Each leaf page lists them under Indexes; nearly all of them are on projectid, couchdbid, or a parent FK column.
  • Prefer composite joins when joining the satellite tables of one domain (e.g. audits_questions ↔ audits_answer joined on audit AND questionid). The questionID collision pitfall is the canonical reason.

Refresh-window awareness

Because most satellite tables are rebuilt with DELETE-then-INSERT on every observation of a parent document, reading a satellite during an in-flight reindex of one parent may briefly return zero rows for that parent. For long-running reports, expect this and plan retries — or schedule reports off-peak.

Counting active vs. archived

Always include archived IS NULL when counting active records, and archived IS NOT NULL when reporting on archived ones. Mixing the two without a filter produces a count that does not match anything the product UI shows.

Example queries

All active tickets for a project

SELECT *
FROM tickets
WHERE projectid = 'your_project_id'
AND archived IS NULL;

Audit completion summary

SELECT
a.name AS audit_name,
a.status,
a.completiondate,
COUNT(aa.questionid) AS questions_answered
FROM audits a
LEFT JOIN audits_answer aa ON aa.audit = a.id
WHERE a.projectid = 'your_project_id'
GROUP BY a.id, a.name, a.status, a.completiondate;

aa.audit = a.id is the safe join — every row in audits_answer carries its parent audit, which sidesteps the questionID collision described on the Audits domain landing.

Project participants with their roles

SELECT
p.name AS project_name,
pp.email,
pp.role
FROM projects p
JOIN project_participants pp ON pp.projectid = p.id
WHERE p.id = 'your_project_id'
ORDER BY pp.role, pp.email;

Tickets created from audit findings (cross-domain)

audit_answers_tickets ties a ticket back to the audit answer that produced it — the canonical cross-domain join.

SELECT
a.id AS audit_id,
a.name AS audit_name,
t.id AS ticket_id,
t.title AS ticket_title,
t.status AS ticket_status
FROM audit_answers_tickets aat
JOIN audits a ON a.id = aat.audit
JOIN tickets t ON t.id = aat.ticket
WHERE a.projectid = 'your_project_id'
ORDER BY a.creationdate DESC, t.creationdate DESC;

The join goes through aat.audit and aat.ticket, never through questionID alone — see the Audits domain for why.