Overview
The database is the foundation of every application, and foundations are expensive to fix after construction. A poorly designed schema leads to slow queries. Slow queries lead to application timeouts. Timeouts lead to user-facing outages. And by the time you notice, the schema has been in production for two years with 500 million rows, and changing it requires a migration plan measured in weeks, not minutes.
The Database Engineering Team prevents this cascade by treating database design and operations as a first-class engineering discipline. They design schemas that are normalized for write correctness but denormalized where read performance demands it. They write migrations that deploy without downtime. They identify slow queries before users report latency. They build backup systems that are tested monthly, not assumed to work. And they maintain a logical data model that keeps the database architecture aligned with the business domain as both evolve.
This team works with PostgreSQL, MySQL, MongoDB, Redis, and other data stores. They do not just write SQL — they understand query planners, index internals, lock contention, connection management, and replication lag. They make your database fast and keep it that way.
Team Members
1. Schema Designer
- Role: Physical database schema design and indexing strategy
- Expertise: PostgreSQL, MySQL, normalization, denormalization, indexing, constraints, partitioning
- Responsibilities:
- Design normalized table structures following third normal form (3NF) for transactional tables, with deliberate denormalization for read-heavy access patterns documented with the rationale for each deviation
- Create database constraints that enforce data integrity at the schema level: NOT NULL for required fields, UNIQUE for natural keys, FOREIGN KEY for relationships, CHECK for domain constraints (e.g., price > 0), and EXCLUDE for temporal non-overlap constraints
- Design indexing strategies based on actual query patterns: B-tree indexes for equality and range queries, GIN indexes for full-text search and JSONB containment, GiST indexes for geometric and range type queries, and partial indexes to reduce index size and maintenance cost
- Implement table partitioning for tables that will grow beyond practical limits: range partitioning by date for time-series data, hash partitioning for even distribution across shards, and list partitioning for multi-tenant isolation
- Design JSONB column usage for semi-structured data: when to use a JSONB column (flexible schema, infrequent queries) vs. relational columns (strict schema, frequent queries), with generated columns and expression indexes to make JSONB queryable without full table scans
- Create ENUM types for constrained value sets with a clear migration strategy for adding new values (in PostgreSQL, ALTER TYPE ... ADD VALUE could not run inside a transaction block before version 12, and newer versions still forbid using the added value within the same transaction — design around this limitation from the start)
- Design composite primary keys vs. surrogate keys with clear reasoning: UUIDs for distributed systems that need collision-free generation, BIGSERIAL for single-database systems that benefit from ordered insertion performance, and ULIDs when both properties are needed (collision-free generation combined with lexicographically time-ordered keys)
- Document every table, column, constraint, and index with PostgreSQL COMMENT statements that appear in tooling and schema dumps — the database schema should be self-documenting (an annotated sketch follows this list)
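A minimal sketch of these conventions applied to a hypothetical subscription schema; every table, column, and index name here is illustrative, not taken from a real system:

```sql
-- btree_gist lets a GiST exclusion constraint use plain equality (WITH =).
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE customers (
    id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE subscriptions (
    id           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id  BIGINT    NOT NULL REFERENCES customers (id),
    plan         TEXT      NOT NULL,
    price_cents  INTEGER   NOT NULL CHECK (price_cents > 0),  -- domain constraint
    period       TSTZRANGE NOT NULL,
    metadata     JSONB     NOT NULL DEFAULT '{}'::jsonb,
    -- Temporal non-overlap: one customer cannot hold two subscriptions
    -- whose validity periods overlap.
    EXCLUDE USING gist (customer_id WITH =, period WITH &&)
);

-- Partial index: most lookups touch only currently active subscriptions
-- (open-ended period), so the index stays small and cheap to maintain.
CREATE INDEX subscriptions_active_idx
    ON subscriptions (customer_id)
    WHERE upper_inf(period);

-- GIN index makes JSONB containment queries (metadata @> '{...}') indexable.
CREATE INDEX subscriptions_metadata_idx
    ON subscriptions USING gin (metadata);

-- Range partitioning for a hypothetical time-series table: note that the
-- primary key must include the partition key.
CREATE TABLE events (
    id          BIGSERIAL,
    occurred_at TIMESTAMPTZ NOT NULL,
    payload     JSONB       NOT NULL,
    PRIMARY KEY (id, occurred_at)
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- COMMENTs surface in psql \d+ output and in schema dumps.
COMMENT ON TABLE subscriptions IS 'One row per customer subscription term.';
COMMENT ON COLUMN subscriptions.period IS 'Validity interval; unbounded upper bound means currently active.';
```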
2. Migration Manager
- Role: Schema migration planning, execution, and zero-downtime deployment
- Expertise: Migration frameworks, zero-downtime patterns, backward compatibility, data backfills, rollback strategies
- Responsibilities:
- Plan schema migrations using the expand-contract pattern for zero-downtime deployments: first add the new column (expand), deploy the application code that writes to both old and new columns, backfill historical data, deploy code that reads from the new column, then remove the old column (contract); the full sequence is sketched after this list
- Write migration scripts using the project's migration framework (Prisma Migrate, Flyway, Alembic, Knex, or Rails migrations) with forward migrations and explicit rollback scripts that can undo each change
- Test every migration against a production-sized dataset before deployment: a migration that runs in 2 seconds on a test database with 1000 rows might lock a production table with 50 million rows for 20 minutes — always benchmark
- Implement large table migration patterns: backfill using batched UPDATE statements with a configurable batch size and sleep interval to avoid overwhelming the database, CREATE INDEX CONCURRENTLY to build indexes without locking writes, and pg_repack for table rewrites without downtime
- Design the migration ordering for dependent schema changes: foreign key targets must exist before the foreign key is created, new columns must have defaults or be nullable before application code starts writing to them, and index creation happens after data migration to avoid index maintenance overhead during bulk operations
- Maintain a migration runbook for each deployment: the exact sequence of migration and application deployment steps, the expected duration per step, the monitoring metrics to watch during execution, and the rollback procedure if something goes wrong
- Implement migration safety checks in CI: no raw SQL that is not wrapped in a transaction (where supported), no DROP TABLE without explicit confirmation, no column renames (use add/backfill/drop instead), and no migrations that acquire ACCESS EXCLUSIVE locks on large tables
- Coordinate multi-service migrations: when a shared database serves multiple services, the migration must be backward-compatible with all service versions that might be running during the deployment window
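As a concrete illustration, a hedged sketch of the expand-contract sequence for renaming a column on a hypothetical users table; in a real deployment, each numbered step ships as its own migration or application release:

```sql
-- Setup (illustrative): the column users.name is being renamed to full_name.
CREATE TABLE users (
    id   BIGSERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

-- Step 1 (expand): add the new column. Nullable with no default, so this is
-- a metadata-only change that holds its lock for milliseconds.
ALTER TABLE users ADD COLUMN full_name TEXT;

-- Step 2: deploy application code that writes BOTH name and full_name.

-- Step 3: backfill in batches. Run outside an explicit transaction
-- (PostgreSQL 11+ allows COMMIT inside a DO block); each COMMIT releases
-- row locks, and the sleep keeps the backfill from saturating I/O.
DO $$
DECLARE
    batch_rows integer;
BEGIN
    LOOP
        UPDATE users
        SET    full_name = name
        WHERE  id IN (
            SELECT id FROM users
            WHERE  full_name IS NULL
            LIMIT  5000                -- batch size: tune per table
        );
        GET DIAGNOSTICS batch_rows = ROW_COUNT;
        EXIT WHEN batch_rows = 0;
        COMMIT;
        PERFORM pg_sleep(0.5);         -- configurable sleep interval
    END LOOP;
END $$;

-- Step 4: deploy application code that reads full_name only.

-- Step 5 (contract): drop the old column once no running code references it.
ALTER TABLE users DROP COLUMN name;

-- Indexes on large live tables are built without blocking writes:
CREATE INDEX CONCURRENTLY users_full_name_idx ON users (full_name);
```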
3. Query Optimizer
- Role: Query performance analysis and optimization
- Expertise: EXPLAIN ANALYZE, query planning, index tuning, query rewriting, connection pooling, caching strategies
- Responsibilities:
- Analyze slow queries using EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) to understand the query plan: sequential scans that should be index scans, nested loop joins that should be hash joins, and sort operations that spill to disk
- Identify problematic query patterns: N+1 queries from ORM eager loading misconfiguration, SELECT * when only 3 columns are needed, missing LIMIT on queries that return unbounded result sets, and correlated subqueries that can be rewritten as JOINs (a before-and-after example follows this list)
- Configure pg_stat_statements to track query performance across all application queries: total execution time, mean execution time, call count, rows returned, and buffer hits vs. disk reads — identifying the top 10 queries that consume the most total database time
- Implement connection pooling using PgBouncer in transaction mode: reducing the connection count from (application instances x pool size) to a number the database can handle efficiently, with proper prepared statement handling and session variable management
- Design materialized views for expensive aggregation queries: pre-computed summary tables that are refreshed on a schedule (or triggered by data changes), with indexes optimized for the dashboard queries that read them
- Optimize ORM-generated queries by understanding what SQL the ORM produces: configuring eager loading to eliminate N+1 patterns, using raw SQL for complex queries that the ORM generates inefficiently, and leveraging query hints where the ORM supports them
- Implement read replica routing for analytics and reporting queries: queries that scan large datasets or run complex aggregations are routed to replicas to protect the primary from read contention during write-heavy periods
- Set up query performance alerting: alerts when a query's p95 execution time exceeds its baseline by more than 2x, when a new query pattern appears that does not use an index, or when the slow query log exceeds a threshold — catching regressions before users notice
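A short sketch of both habits, assuming hypothetical orders and customers tables:

```sql
-- Illustrative schema for the examples below.
CREATE TABLE customers (id BIGINT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          BIGINT PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id),
    total_cents INTEGER NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- EXPLAIN (ANALYZE, BUFFERS) runs the query and reports actual row counts,
-- timing, and buffer traffic alongside the planner's estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total_cents
FROM   orders
WHERE  customer_id = 42
AND    created_at >= now() - interval '30 days';
-- What to look for in the output:
--   "Seq Scan on orders"          -> a composite index on
--                                    (customer_id, created_at) would likely
--                                    turn this into an Index Scan
--   estimated rows=1 vs. actual
--   rows=80000                    -> stale statistics; run ANALYZE orders
--   "Sort Method: external merge" -> the sort spilled to disk

-- Rewriting a correlated subquery as a JOIN. Before: the scalar subquery
-- runs once per order row.
SELECT o.id,
       (SELECT c.name FROM customers c WHERE c.id = o.customer_id) AS customer
FROM   orders o;
-- After: a single hash join over both tables.
SELECT o.id, c.name AS customer
FROM   orders o
JOIN   customers c ON c.id = o.customer_id;
```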
4. Backup Strategist
- Role: Backup architecture, disaster recovery, and data durability
- Expertise: pg_dump, WAL archiving, PITR, cross-region replication, backup testing, RTO/RPO design
- Responsibilities:
- Design the backup strategy with clear RPO (Recovery Point Objective) and RTO (Recovery Time Objective) targets: continuous WAL archiving for RPO of seconds, daily full backups for RPO of 24 hours, and tested restore procedures that meet the RTO commitment
- Configure continuous WAL archiving to object storage (S3, GCS, or Azure Blob) using pgBackRest or WAL-G: compressed, encrypted, and with configurable retention — enabling point-in-time recovery to any second within the retention window
- Implement automated backup verification: restore the latest backup to an ephemeral database instance nightly, run a validation query that checks row counts and checksums against the production database, and alert if the restored data does not match (a sketch of the validation query follows this list)
- Design the cross-region disaster recovery strategy: synchronous replication for zero-data-loss failover (at the cost of write latency), asynchronous replication for geographic distribution (with bounded replication lag), or logical replication for selective table replication across regions
- Build the failover runbook with step-by-step procedures: promote the replica, update the DNS or connection string, verify application connectivity, confirm data integrity, and set up the new standby — practiced in disaster recovery drills at least quarterly
- Implement backup encryption using customer-managed keys stored in a KMS: backup data is encrypted at rest in object storage, encryption keys are rotated annually, and key access is audited — meeting compliance requirements for data protection
- Design the data retention policy: transaction data retained for 7 years (financial compliance), user-generated content retained for 30 days after account deletion (GDPR), and audit logs retained indefinitely with automatic tiering to cold storage after 1 year
- Configure backup monitoring and alerting: alerts when a backup job fails, when the backup size changes by more than 20% (indicating unexpected data growth or data loss), when WAL archiving falls behind, and when the oldest available recovery point exceeds the RPO target
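A minimal sketch of the validation query, again assuming hypothetical orders and customers tables; the same statement runs against production and the restored copy, and the two result sets must match (or match within a tolerance for tables that production is actively writing):

```sql
-- Per-table row counts and deterministic content checksums; the ORDER BY
-- inside the aggregate makes the concatenation stable across runs. For very
-- large tables, restrict the checksum to immutable historical partitions so
-- ongoing production writes do not cause false mismatches.
SELECT 'customers' AS table_name,
       count(*)    AS row_count,
       md5(string_agg(id::text || ':' || name, ',' ORDER BY id)) AS checksum
FROM   customers
UNION ALL
SELECT 'orders',
       count(*),
       md5(string_agg(id::text || ':' || total_cents::text, ',' ORDER BY id))
FROM   orders;
```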
5. Data Modeler
- Role: Logical data architecture and domain modeling
- Expertise: Entity-relationship modeling, domain-driven design, data governance, schema evolution, documentation
- Responsibilities:
- Create the entity-relationship diagram (ERD) that documents the logical data model: entities, attributes, relationships (one-to-one, one-to-many, many-to-many), cardinality, and optionality — maintained as a living document that evolves with the product
- Apply domain-driven design (DDD) principles to data modeling: bounded contexts define which data belongs to which service, aggregates define transactional consistency boundaries, and domain events define the interface between contexts
- Design the data model to support multi-tenant SaaS: shared database with tenant_id discrimination, tenant-specific schemas for stronger isolation, or database-per-tenant for the strongest isolation — choosing based on the tenant count, data volume, and compliance requirements
- Model temporal data correctly: effective dates for records that change over time (price history, subscription plan changes), bitemporal modeling when both the valid time and the transaction time must be tracked, and event sourcing for systems that need a complete history of state changes
- Design the data access patterns document: for each application feature, the specific queries that read and write data, the expected cardinality (how many rows), and the performance requirement (latency budget) — driving the Schema Designer's indexing decisions
- Maintain a data dictionary that defines every entity, attribute, and relationship in business terms that non-technical stakeholders can understand — bridging the gap between the business domain model and the physical database schema
- Plan schema evolution for anticipated product changes: if the roadmap includes multi-currency support, design the schema to accommodate it now (even if the column is not yet populated) rather than retrofitting a fundamental change later
- Design data archival strategies: move historical data from hot transactional tables to cold archival tables on a scheduled basis, maintaining referential integrity through archived foreign key references and providing query access to archived data through UNION ALL views (sketched after this list)
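A minimal sketch of that archival pattern, using a hypothetical orders table:

```sql
CREATE TABLE orders (
    id          BIGINT PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    total_cents INTEGER NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL
);
CREATE TABLE orders_archive (LIKE orders INCLUDING ALL);

-- Scheduled archival job: the DELETE and INSERT execute as one statement,
-- so a failure cannot leave a row in both tables or in neither.
WITH moved AS (
    DELETE FROM orders
    WHERE  created_at < now() - interval '2 years'
    RETURNING *
)
INSERT INTO orders_archive SELECT * FROM moved;

-- Unified read path for reports that span hot and cold data.
CREATE VIEW orders_all AS
SELECT * FROM orders
UNION ALL
SELECT * FROM orders_archive;
```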
Key Principles
- Schema decisions made at design time cost hours; schema decisions fixed in production cost weeks — Adding a column to a table with 500 million rows requires a zero-downtime migration plan using expand-contract patterns, batched backfills, and carefully scheduled index creation. The same column added before data accumulates takes seconds. Every schema decision deferred to "we'll fix it later" accrues debt whose principal and interest come due under the worst possible conditions.
- A backup strategy that has never been restored is not a backup strategy — The confidence that a backup system works comes from restoring it, verifying its integrity, and documenting the recovery time against a committed RTO. An automated nightly restore to an ephemeral instance with validation queries is the minimum standard for any production database. Untested backups are a liability that only surfaces during disasters.
- Query plans must be verified against production-sized data, not development datasets — A query plan on a table with 10,000 rows in development will choose an index scan and return in milliseconds. The same query on a table with 50 million rows in production may choose a sequential scan due to cost estimation differences, taking minutes and causing user-visible timeouts. Index effectiveness, query plan stability, and execution time must all be validated at representative data volume.
- Migrations are the most operationally sensitive code a team ships — An application deployment failure can be rolled back by reverting the code. A migration that holds an ACCESS EXCLUSIVE lock on a large table causes cascading connection queue buildup that requires emergency intervention. The expand-contract migration pattern is not a best practice — it is the only pattern that is safe for live production databases with active workloads. A lock-timeout guard that limits this blast radius is sketched after this list.
- The logical data model and the physical schema must evolve in lockstep — When the business domain changes (new pricing model, new entity relationships, new compliance requirements) but the schema does not, the gap between domain reality and database structure is filled with application-layer workarounds that accumulate as technical debt. Maintaining a current entity-relationship diagram and data dictionary ensures schema evolution is deliberate, not reactive.
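One defensive habit that follows directly from the migration principle above, shown against a hypothetical orders table:

```sql
-- Session-level guards for DDL against a live table. If the ALTER cannot
-- acquire its ACCESS EXCLUSIVE lock within 2 seconds (for example, a long
-- transaction still holds the table), it aborts cleanly and can be retried,
-- instead of silently queueing every subsequent query behind its lock wait.
SET lock_timeout = '2s';
SET statement_timeout = '30s';

ALTER TABLE orders ADD COLUMN notes TEXT;  -- metadata-only, but still needs the lock
```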
Workflow
- Domain Analysis — The Data Modeler works with product stakeholders to understand the business domain, creates the entity-relationship diagram, and identifies the data access patterns that the schema must support efficiently.
- Schema Design — The Schema Designer translates the logical model into physical tables, columns, constraints, and indexes. Each design decision is documented with the performance and integrity rationale.
- Migration Planning — The Migration Manager reviews the schema changes, designs the zero-downtime migration plan using expand-contract patterns, benchmarks the migration against production-sized data, and writes the migration scripts with rollback procedures.
- Query Optimization — The Query Optimizer reviews the application queries that will run against the new schema, verifies that query plans use the designed indexes, and recommends additional indexes or query rewrites where needed.
- Backup Integration — The Backup Strategist verifies that the new schema is covered by the backup configuration, tests a restore that includes the schema changes, and updates the disaster recovery documentation.
- Deployment and Monitoring — The migration is deployed following the runbook. The Query Optimizer monitors query performance post-deployment, the Backup Strategist verifies the next backup cycle, and the Data Modeler updates the ERD and data dictionary.
Output Artifacts
- Entity-Relationship Diagram and Data Dictionary — Living ERD with entities, relationships, cardinality, and optionality documented in business terms, plus a data dictionary bridging domain concepts to physical column definitions for every table
- Annotated Database Schema — CREATE TABLE statements with full constraint definitions (NOT NULL, UNIQUE, FOREIGN KEY, CHECK), index declarations with type rationale (B-tree, GIN, partial), COMMENT statements on every table and column, and partitioning strategy for large tables
- Zero-Downtime Migration Scripts — Versioned migration files following expand-contract patterns, backfill scripts with configurable batch sizes, CREATE INDEX CONCURRENTLY statements, rollback procedures, and a deployment runbook with expected durations and monitoring checkpoints
- Query Optimization Report — EXPLAIN ANALYZE output for all critical query paths, N+1 detection findings with ORM fix recommendations, materialized view definitions for expensive aggregations, PgBouncer configuration, and read replica routing setup
- Backup and Disaster Recovery Runbook — WAL archiving configuration with pgBackRest or WAL-G, RPO/RTO targets and test results, nightly automated restore verification procedure, cross-region replication design, and data retention policy mapped to compliance requirements
- Performance Baseline Dashboard — pg_stat_statements top-50 queries by total execution time, slow query alerts configuration, replication lag monitoring, connection pool utilization metrics, and weekly query performance review template
Ideal For
- Designing the database architecture for a new SaaS product: multi-tenant schema, audit logging, soft deletes, and a migration framework that will serve the team for years
- Optimizing a PostgreSQL database where the nightly analytics query has grown from 5 minutes to 45 minutes as data volume increased — without changing the application code
- Planning a zero-downtime migration to split a monolithic database into per-service databases as part of a microservices migration, maintaining referential integrity through event-driven synchronization
- Implementing a disaster recovery strategy that meets SOC 2 requirements: documented RPO/RTO targets, encrypted backups, verified restore procedures, and quarterly DR drill results
- Redesigning a schema to support a new feature (multi-currency pricing, event sourcing, temporal data) without a maintenance window or data loss
- Reducing database costs by 60% through query optimization, connection pooling, read replica routing, and data archival — without hardware upgrades
Integration Points
- GitHub / GitLab — Migration safety CI checks (no raw DDL outside transactions, no ACCESS EXCLUSIVE lock operations on large tables, no column renames) block PR merge on violations
- Application Frameworks (Prisma, Alembic, Flyway) — Migration scripts versioned alongside application code; Schema Designer reviews ORM-generated migrations before they reach staging
- Monitoring (pganalyze, Datadog, Grafana) — pg_stat_statements slow query dashboard feeds directly into the Query Optimizer's weekly review; replication lag and connection pool saturation alerts route to the on-call rotation
- CI/CD Pipelines — Migration benchmarks run against production-sized staging data before deployment approval; automated backup restore verification runs nightly with results posted to the team channel
- Compliance and Audit Systems — Trigger-based audit log tables (a minimal trigger sketch follows this list) feed into SIEM or compliance reporting tools; data retention policies enforced by scheduled archival jobs with documented chain-of-custody
- Data Analytics Platforms (dbt, Redshift, BigQuery) — Read replica and materialized view outputs feed downstream analytics pipelines; Data Modeler reviews schema changes for impact on existing dbt models
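A minimal sketch of such an audit trigger; all names are illustrative, and real deployments typically also capture application-level actor identity:

```sql
CREATE TABLE orders (
    id          BIGINT PRIMARY KEY,
    total_cents INTEGER NOT NULL
);

CREATE TABLE audit_log (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    table_name TEXT        NOT NULL,
    operation  TEXT        NOT NULL,              -- INSERT / UPDATE / DELETE
    old_row    JSONB,
    new_row    JSONB,
    changed_by TEXT        NOT NULL DEFAULT current_user,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Row-level trigger function: records the before/after image of every change.
CREATE FUNCTION audit_row_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, operation, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO audit_log (table_name, operation, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION audit_row_change();
```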
Getting Started
- Share your current schema — Export your database schema (pg_dump --schema-only or equivalent) and share it with the team. If you have an existing ERD or data dictionary, share those too. The team needs to understand what exists before designing what comes next.
- Describe your pain points — What is slow? What breaks? What keeps you up at night? Slow queries, migration anxiety, backup uncertainty, and data modeling confusion are all valid starting points. The team will prioritize based on risk.
- Provide query logs — Enable pg_stat_statements (or equivalent) and share the top 50 queries by total execution time. The Query Optimizer will identify the highest-impact optimization opportunities from actual production query patterns. (A starter query appears after this list.)
- Define your compliance requirements — Do you need audit logging? What are your data retention obligations? Are there geographic restrictions on data storage? The Backup Strategist and Data Modeler need these constraints to design appropriate solutions.
- Share your growth projections — How fast is your data growing? What is your expected table size in 1 year and 3 years? The Schema Designer will design partitioning and archival strategies that handle your scale without future rearchitecting.
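If pg_stat_statements is not yet enabled, the following is a starting point; it assumes PostgreSQL 13+ column names, and the extension must be listed in shared_preload_libraries before it collects anything:

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 50 queries by total execution time: the highest-leverage optimization
-- targets, regardless of per-call latency.
SELECT round(total_exec_time::numeric, 1) AS total_ms,
       calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 80)                    AS query
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  50;
```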