Data contracts for small teams
You do not need a 40-person data platform team to benefit from data contracts. Here is a deliberately modest version that a five-engineer shop can adopt next sprint.
Data contracts, as you will read about them on LinkedIn, are a heavy-sounding thing — schema registries, producer certifications, quality gates, SLAs that feel drafted by lawyers.
Strip all of that away. A data contract is just this: an explicit, versioned agreement between a producer of data and its consumers about the shape, meaning, and quality of what they are passing between them.
You can adopt the idea without any of the machinery.
Start with the handful of tables that matter
In any organisation, a small number of tables carry outsized weight. Every executive dashboard eventually reads from them. Every misreading of them risks a bad decision.
Pick three to five such tables. Those are the ones that need contracts. The rest can wait.
Write the contract as a README next to the table
For each of those tables, write a short markdown file in a repository that the data team actually reads. Put in:
- The question this table exists to answer. One sentence. “For each order, what did we bill the customer and what did we collect?”
- Grain. What exactly does one row represent? Be precise. “One row per order_id” is clearer than “order data”.
- Columns. For each column: type, nullability, meaning, unit.
- Ownership. One named team, with a fallback on-call. Not “the data team”.
- SLAs. When must the table be updated by, and how stale is too stale?
- Known sharp edges. The things a new analyst will trip on. The historical fields that mean different things depending on the year. The late-arriving rows.
That is a contract. It is not machine-enforced. It does not need to be, at first.
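Even an unenforced contract benefits from a consistent template. As a minimal sketch, assuming each contract README uses markdown headings named after the six items above (the heading names and the `orders_billing` table are hypothetical), a few lines of Python can flag a contract file that is missing a section. This checks the document, not the data.

```python
import re

# Hypothetical section headings; use whatever your team's template calls them.
REQUIRED_SECTIONS = [
    "Question",
    "Grain",
    "Columns",
    "Ownership",
    "SLAs",
    "Known sharp edges",
]

# Matches markdown ATX headings such as "## Grain" and captures the heading text.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t#]*$", re.MULTILINE)


def missing_sections(readme_text: str) -> list[str]:
    """Return the required sections that have no heading in the contract README."""
    headings = {m.group(1).strip().lower() for m in HEADING.finditer(readme_text)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


# A half-written contract for a hypothetical table.
example = """# orders_billing

## Question
For each order, what did we bill the customer and what did we collect?

## Grain
One row per order_id.

## Columns
...
"""

print(missing_sections(example))  # ['Ownership', 'SLAs', 'Known sharp edges']
```

Run in CI over the contracts directory, this keeps the template honest without pretending to validate the table itself.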
Make producers own it
The producers of the data — the team whose system is the source of truth — must own and sign the contract. This is the one non-negotiable. If the data team writes the contract and the producer shrugs, you do not have a contract; you have wishful documentation.
The way to make this real is procedural, not technical: changes to the schema of a contracted table must be approved in a pull request by someone from the producing team. This naturally surfaces breaking changes before they hit consumers.
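On GitHub, a `CODEOWNERS` file combined with a branch protection rule requiring review from code owners gives you this rule without custom code. The sketch below only illustrates the underlying logic, assuming each contracted table's schema lives in files at known paths; the path patterns and team names are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical mapping from contracted schema/model paths to the producing team
# whose approval any change to them requires.
CONTRACT_OWNERS = {
    "models/billing/orders_billing.*": "billing-team",
    "models/crm/customers.*": "crm-team",
}


def required_approvals(changed_paths: list[str]) -> set[str]:
    """Producer teams that must approve a pull request touching these paths."""
    return {
        team
        for path in changed_paths
        for pattern, team in CONTRACT_OWNERS.items()
        if fnmatch(path, pattern)
    }


def missing_approvals(changed_paths: list[str], approver_teams: set[str]) -> set[str]:
    """Producer teams whose approval is required but has not yet been given."""
    return required_approvals(changed_paths) - approver_teams


changed = ["models/billing/orders_billing.sql", "README.md"]
print(missing_approvals(changed, approver_teams={"data-team"}))  # {'billing-team'}
```

An approval from the data team alone does not satisfy the check; only the producing team can unblock a change to its contracted table.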
Layer in automation only when the pain justifies it
Once the social contract is in place, tooling becomes useful. The cheapest first investments, in order:
- Column existence and type tests in dbt or SQLMesh. Run them every morning, and have them fail the build loudly when a producer silently drops a column.
- Freshness tests. Is this table being updated within its SLA window?
- Uniqueness and referential-integrity tests on the columns the contract calls out as invariants.
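In a real project these would be written as dbt or SQLMesh tests; dbt, for instance, ships `unique` and `relationships` generic tests and a `dbt source freshness` command. The sketch below is not that syntax. It uses plain Python against an in-memory SQLite database to show what each check actually asks of a table. The table and column names (`orders_billing`, `customers`, `loaded_at`, `collected_cents`) are hypothetical, and it assumes load timestamps are stored as ISO 8601 strings in UTC.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Table and column names are interpolated into SQL, so they must come from
# trusted contract config, never from user input.


def column_check(conn, table, expected):
    """Column existence and type: list missing columns or unexpected declared types."""
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    actual = {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}
    problems = []
    for column, col_type in expected.items():
        if column not in actual:
            problems.append(f"missing column {column}")
        elif actual[column] != col_type.upper():
            problems.append(f"{column}: expected {col_type}, found {actual[column]}")
    return problems


def is_fresh(conn, table, ts_column, max_age, now):
    """Freshness: was the newest row loaded within max_age of now?"""
    # Assumes ISO 8601 UTC strings, which MAX() orders correctly as text.
    (latest,) = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()
    if latest is None:  # an empty table is never fresh
        return False
    return now - datetime.fromisoformat(latest) <= max_age


def duplicate_keys(conn, table, key):
    """Uniqueness: (key, count) for every key value that appears more than once."""
    sql = f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()


def orphan_count(conn, child, fk, parent, pk):
    """Referential integrity: child rows whose non-null foreign key has no parent."""
    sql = (
        f"SELECT COUNT(*) FROM {child} c LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


# Demo: a hypothetical contracted table with deliberate violations.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders_billing (
        order_id INTEGER, customer_id INTEGER, billed_cents INTEGER, loaded_at TEXT
    );
    INSERT INTO customers VALUES (1);
    INSERT INTO orders_billing VALUES
        (10, 1, 500, '2024-05-01T06:00:00+00:00'),
        (10, 2, 700, '2024-05-01T06:00:00+00:00');
    """
)
expected = {"order_id": "INTEGER", "customer_id": "INTEGER",
            "billed_cents": "INTEGER", "collected_cents": "INTEGER"}
now = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)

print(column_check(conn, "orders_billing", expected))  # ['missing column collected_cents']
print(is_fresh(conn, "orders_billing", "loaded_at", timedelta(hours=6), now))  # True
print(duplicate_keys(conn, "orders_billing", "order_id"))  # [(10, 2)]
print(orphan_count(conn, "orders_billing", "customer_id", "customers", "customer_id"))  # 1
```

Each function maps to one bullet above, and each answers a question the contract README already states: which columns exist, how stale is too stale, and which columns are invariants.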
You can get a huge amount of value from exactly these three. Anything beyond them (full schema registries, contract-first code generation, producer certification programmes) is probably premature for a team of fewer than thirty people.
The point is not the contract
The point of contracts is to force a conversation. They make implicit expectations explicit. They make the producer-consumer relationship something you can reason about, debate, and change.
If you get that conversation going, and it gets written down next to the tables people actually use, you already have most of the value — even if you never install a single contract-testing tool.