M1/L1/10

The Platform Engineering Shift

The mid-level track taught you to build production-grade pipelines: idempotent writes, incremental processing, quality gates, Delta Lake, CI/CD, monitoring. You can take a data requirement and deliver a pipeline that runs reliably, performs within SLA, and recovers from failures.

That is the skill of a mid-level data engineer. The senior-level shift is from building pipelines to designing the platform that makes pipelines buildable.

When NordGrid had one team and ten pipelines, any engineer could hold the entire system in their head: every table name, every dependency, every scheduling slot. When GridUnion Continental acquired three subsidiaries and the platform grew to six teams, fifty pipelines, and two hundred tables across four countries, that mental model collapsed.

Engineers on the German billing team did not know which French tables their pipeline depended on. The analytics team's notebook overwrote a Silver table that three production pipelines consumed. A schema change in the Dutch Bronze ingestion broke a Luxembourg Gold dashboard that nobody knew existed.
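The third incident comes down to a question nobody could answer: what depends on this table? The sketch below shows one way to answer it, assuming lineage is available as a simple mapping from each table to its direct consumers. The table names and the `DOWNSTREAM` mapping are hypothetical and only illustrate the idea; they are not GridUnion's actual catalog.

```python
from collections import deque

# Hypothetical lineage edges: table -> objects that read directly from it.
# All names are illustrative assumptions.
DOWNSTREAM = {
    "nl.bronze.meter_readings": ["shared.silver.meter_readings"],
    "shared.silver.meter_readings": [
        "de.gold.billing_summary",
        "lu.gold.grid_load_dashboard",
    ],
    "de.gold.billing_summary": [],
    "lu.gold.grid_load_dashboard": [],
}


def downstream_of(table, edges):
    """Return everything reachable from `table`, in breadth-first order."""
    seen = []
    queue = deque(edges.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(edges.get(current, []))
    return seen


impacted = downstream_of("nl.bronze.meter_readings", DOWNSTREAM)
print(impacted)
```

With a lookup like this in place, a schema change to the Dutch Bronze table would surface the Luxembourg dashboard before deployment rather than after it breaks.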

GridUnion Continental — The Scenario

GridUnion Continental is the post-acquisition entity formed when NordGrid Energy (Germany) acquired Electra Metering (France), DutchGrid Analytics (Netherlands), and LuxPower Systems (Luxembourg). The combined platform serves 3.5 million smart meters across 30 regions in 4 countries, generating 500,000 new rows per day (~8GB).

The historical archive is 8TB and growing. Six teams contribute to the platform: three data engineering teams (one per legacy company plus a central platform team), two analytics teams (commercial analytics and regulatory reporting), and one ML team (demand forecasting).

Each team operates with partial autonomy — they own their pipelines, their Gold tables, and their deployment schedules — but they share the Silver layer, the cluster resources, and the Unity Catalog infrastructure. The tension between autonomy and shared infrastructure is the central architectural challenge of this module.

Why the Mid-Level Architecture Does Not Scale

| Mid-Level Pattern | Works For | Breaks At | Senior Alternative |
| --- | --- | --- | --- |
| Single YAML config per pipeline | 1 team, 10 pipelines | 6 teams with conflicting config conventions | Shared config repository with team-specific overrides (lesson 7) |
| One Bronze/Silver/Gold directory tree | 1 team, clear ownership | 6 teams writing to shared Silver, unclear who owns what | Namespace hierarchy with ownership registry (lessons 2–3) |
| Quality gates per pipeline | 1 team verifying its own output | Consumer team cannot verify producer's quality | Data contracts with cross-team SLAs (lesson 5) |
| Airflow DAG per pipeline | 10 independent pipelines | 50 pipelines with cross-team dependencies | Dependency graph with coordination protocol (lesson 6) |
| Informal table naming | Everyone knows the tables | New team members cannot discover tables | Data product catalog with discoverability (lesson 5) |
| Cost absorbed by one budget | Small cluster, one team | Large cluster, 6 teams, unequal usage | Cost allocation with per-team chargeback (lesson 8) |
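To make the first senior alternative concrete, here is a minimal sketch of shared defaults with team-specific overrides, assuming configs have already been loaded into plain dictionaries. The keys, values, and the `merge_config` helper are hypothetical; lesson 7 covers the actual design.

```python
from copy import deepcopy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`; override wins."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical platform-wide defaults owned by the central platform team.
PLATFORM_DEFAULTS = {
    "write": {"format": "delta", "mode": "merge"},
    "quality": {"max_null_fraction": 0.01, "fail_on_violation": True},
    "schedule": {"timezone": "UTC"},
}

# Hypothetical overrides for one team; only the differences are declared.
FR_BILLING_OVERRIDES = {
    "quality": {"max_null_fraction": 0.05},
    "schedule": {"timezone": "Europe/Paris"},
}

effective = merge_config(PLATFORM_DEFAULTS, FR_BILLING_OVERRIDES)
print(effective)
```

Each team declares only what differs from the platform default, so a convention changed centrally reaches every pipeline that has not explicitly opted out.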

What This Module Covers

This module addresses the organizational architecture of a multi-team data platform. The lessons are not about PySpark code — they are about the structures, contracts, and processes that make PySpark code manageable at scale.

Namespace design determines how tables are organized and discovered. Ownership contracts determine who is responsible for each table's quality and freshness. Layer contracts determine what each consumer can expect from the data. Data product design determines how tables are published and documented. Dependency management determines how cross-team pipelines coordinate. Configuration management determines how settings are shared and promoted. Cost allocation determines how cluster resources are charged to the teams that use them. Observability determines how the platform's health is monitored.

The synthesis lesson then pulls these components into GridUnion's complete platform architecture. The module reads more like a design document than a code tutorial because, at the senior level, the architectural decisions matter more than the implementation details.

Architectural principle

The purpose of platform architecture is to make the default path the correct path. If the default namespace convention prevents naming collisions, engineers do not need to coordinate table names. If the default ownership model prevents unauthorized writes, engineers do not need to negotiate access. If the default dependency model prevents circular references, engineers do not need to audit the dependency graph manually. Every architectural decision in this module is evaluated by this criterion: does it make the correct behavior automatic and the incorrect behavior difficult?
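The three defaults named above can each be expressed as a small automated check. The sketch below is illustrative only: the `<country>.<layer>.<table>` naming pattern, the `OWNERS` registry, and the team names are assumptions made for this example, not the conventions the later lessons define. Cycle detection uses `graphlib` from the Python standard library (3.9+).

```python
import re
from graphlib import CycleError, TopologicalSorter

# Assumed convention for this sketch: <country>.<layer>.<table_name>
TABLE_NAME = re.compile(
    r"^(de|fr|nl|lu|shared)\.(bronze|silver|gold)\.[a-z][a-z0-9_]*$"
)

# Hypothetical ownership registry: namespace -> owning team.
OWNERS = {
    "de.gold": "de_data_eng",
    "fr.gold": "fr_data_eng",
    "shared.silver": "central_platform",
}


def is_valid_name(table):
    """True if the table name follows the assumed namespace convention."""
    return TABLE_NAME.match(table) is not None


def can_write(team, table):
    """A team may write only to namespaces it owns; unknown ones are denied."""
    namespace = table.rsplit(".", 1)[0]
    return OWNERS.get(namespace) == team


def find_cycle(dependencies):
    """Return the nodes of one cycle, or None if the graph is acyclic.

    `dependencies` maps each table to the tables it reads from.
    """
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError as err:
        return err.args[1]  # list of nodes forming the detected cycle
    return None


print(is_valid_name("de.gold.billing_summary"))
print(can_write("commercial_analytics", "shared.silver.meter_readings"))
print(find_cycle({"a": ["b"], "b": ["a"]}))
```

Run in CI or at deploy time, checks like these reject a badly named table, an unauthorized write, or a circular dependency before any engineer has to notice it, which is the sense in which the default path becomes the correct path.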
