PostgreSQL Database Engineering
A comprehensive skill for professional PostgreSQL database engineering, covering everything from query optimization and indexing strategies to high availability, replication, and production database management. This skill enables you to design, optimize, and maintain high-performance PostgreSQL databases at scale.
When to Use This Skill
Use this skill when:
- Designing database schemas for high-performance applications
- Optimizing slow queries and improving database performance
- Implementing indexing strategies for complex query patterns
- Setting up partitioning for large tables (100M+ rows)
- Configuring streaming replication and high availability
- Tuning PostgreSQL configuration for production workloads
- Implementing backup and recovery procedures
- Debugging performance issues and query bottlenecks
- Setting up connection pooling with pgBouncer or PgPool
- Monitoring database health and performance metrics
- Planning database migrations and schema changes
- Implementing database security and access controls
- Scaling PostgreSQL databases horizontally or vertically
- Managing VACUUM operations and database maintenance
- Setting up logical replication for data distribution
Core Concepts
PostgreSQL Architecture
PostgreSQL uses a process-based architecture with several key components:
- Postmaster Process: Main server process that manages connections
- Backend Processes: One per client connection, handles queries
- Shared Memory: Shared buffers, WAL buffers, lock tables
- Background Workers: Autovacuum, checkpointer, WAL writer, statistics collector
- Write-Ahead Log (WAL): Transaction log for durability and replication
- Storage Layer: TOAST for large values, FSM for free space, VM for visibility
MVCC (Multi-Version Concurrency Control)
PostgreSQL's foundational concurrency mechanism:
- Snapshots: Each transaction sees a consistent snapshot of data
- Tuple Versions: Multiple row versions coexist for concurrent access
- Transaction IDs: xmin (creating transaction), xmax (deleting transaction)
- Visibility Rules: Determines which row versions are visible to transactions
- VACUUM: Reclaims space from dead tuples and prevents transaction wraparound
- FREEZE: Marks old rows as visible to all transactions
Key Implications:
- No read locks - readers never block writers
- Writers never block readers
- Updates create new row versions
- Regular VACUUM is essential
- Dead tuples accumulate until vacuumed
Transaction Isolation Levels
PostgreSQL supports four isolation levels:
- Read Uncommitted: Treated as Read Committed in PostgreSQL
- Read Committed (default): Sees committed data at statement start
- Repeatable Read: Sees snapshot from transaction start
- Serializable: True serializable isolation with SSI
Choosing Isolation:
- Read Committed: Most applications, best performance
- Repeatable Read: Reports, analytics needing consistency
- Serializable: Financial transactions, critical consistency needs
Index Types
PostgreSQL offers multiple index types for different use cases:
1. B-Tree (Default)
- Use for: Equality, range queries, sorting
- Supports: <, <=, =, >=, >, BETWEEN, IN, IS NULL
- Best for: Most general-purpose indexing
- Example: Primary keys, foreign keys, timestamps
2. Hash
- Use for: Equality comparisons only
- Supports: = operator
- Best for: Large tables with equality lookups
- Limitation: Not WAL-logged before PG 10, no range queries
3. GiST (Generalized Search Tree)
- Use for: Geometric data, full-text search, custom types
- Supports: Overlaps, contains, nearest neighbor
- Best for: Spatial data, ranges, full-text search
- Example: PostGIS geometries, tsvector, ranges
4. GIN (Generalized Inverted Index)
- Use for: Multi-valued columns (arrays, JSONB, full-text)
- Supports: Contains, exists operators
- Best for: JSONB queries, array operations, full-text search
- Tradeoff: Slower updates, faster queries
5. BRIN (Block Range Index)
- Use for: Very large tables with natural ordering
- Supports: Range queries on sorted data
- Best for: Time-series data, append-only tables
- Advantage: Tiny index size, scales to billions of rows
6. SP-GiST (Space-Partitioned GiST)
- Use for: Non-balanced data structures
- Supports: Points, ranges, IP addresses
- Best for: Quadtrees, k-d trees, radix trees
Query Planning and Optimization
PostgreSQL's query planner determines execution strategies:
Planner Components:
- Statistics: Table and column statistics for cardinality estimation
- Cost Model: CPU, I/O, and memory cost estimation
- Plan Types: Sequential scan, index scan, bitmap scan, joins
- Join Methods: Nested loop, hash join, merge join
- Optimization: Query rewriting, predicate pushdown, join reordering
Key Statistics:
n_distinct: Number of distinct values (for selectivity)
correlation: Physical row ordering correlation
most_common_vals: MCV list for skewed distributions
histogram_bounds: Value distribution histogram
Understanding EXPLAIN:
- Cost: Startup cost .. total cost (arbitrary units)
- Rows: Estimated row count
- Width: Average row size in bytes
- Actual Time: Real execution time (with ANALYZE)
- Loops: Number of times node executed
Partitioning Strategies
Table partitioning for managing large datasets:
Range Partitioning
- Use for: Time-series data, sequential values
- Example: Partition by date ranges (daily, monthly, yearly)
- Benefit: Easy data lifecycle management, faster queries
List Partitioning
- Use for: Discrete categorical values
- Example: Partition by country, region, status
- Benefit: Logical data separation, partition pruning
Hash Partitioning
- Use for: Even data distribution
- Example: Partition by hash(user_id)
- Benefit: Balanced partition sizes, parallel queries
Partition Pruning:
- Planner eliminates irrelevant partitions
- Drastically reduces query scope
- Essential for partition performance
Partition-Wise Operations:
- Partition-wise joins: Join matching partitions directly
- Partition-wise aggregation: Aggregate within partitions
- Parallel partition processing
Replication and High Availability
PostgreSQL replication options:
Streaming Replication (Physical)
- Type: Binary WAL streaming to standby servers
- Modes: Asynchronous, synchronous, quorum-based
- Use for: High availability, read scalability
- Failover: Automatic with tools like Patroni, repmgr
Synchronous vs Asynchronous:
- Synchronous: Zero data loss, higher latency
- Asynchronous: Low latency, potential data loss
- Quorum: Balance between safety and performance
Logical Replication
- Type: Row-level change stream
- Use for: Selective replication, upgrades, multi-master
- Benefit: Replicate specific tables, cross-version
- Limitation: No DDL replication, overhead
Cascading Replication
- Standbys replicate from other standbys
- Reduces load on primary
- Geographic distribution
Connection Pooling
Managing database connections efficiently:
pgBouncer
- Type: Lightweight connection pooler
- Modes: Session, transaction, statement pooling
- Use for: High connection count applications
- Benefit: Reduced connection overhead, resource limits
Pooling Modes:
- Session: Client connects for entire session
- Transaction: Connection per transaction
- Statement: Connection per statement (rarely used)
PgPool-II
- Type: Feature-rich middleware
- Features: Connection pooling, load balancing, query caching
- Use for: Read/write splitting, connection management
- Benefit: Advanced routing, in-memory cache
VACUUM and Maintenance
Critical maintenance operations:
VACUUM
- Purpose: Reclaim dead tuple space, update statistics
- Types: Regular VACUUM, VACUUM FULL
- When: After large updates/deletes, regularly via autovacuum
- Impact: Regular VACUUM is non-blocking
ANALYZE
- Purpose: Update planner statistics
- When: After data changes, schema modifications
- Impact: Minimal, fast on most tables
REINDEX
- Purpose: Rebuild indexes, fix bloat
- When: Index corruption, significant bloat
- Impact: Locks table, use REINDEX CONCURRENTLY (PG 12+)
Autovacuum
- Purpose: Automated VACUUM and ANALYZE
- Configuration: Threshold-based triggering
- Tuning: Balance resource usage vs. responsiveness
- Monitoring: Track autovacuum runs, prevent wraparound
Performance Tuning
Key configuration parameters:
Memory Settings
shared_buffers: 25% of RAM (start point)
effective_cache_size: 50-75% of RAM
work_mem: Per-operation memory (sort, hash)
maintenance_work_mem: VACUUM, CREATE INDEX memory
Checkpoint and WAL
checkpoint_timeout: How often to checkpoint
max_wal_size: WAL size before checkpoint
checkpoint_completion_target: Spread checkpoint I/O
wal_buffers: WAL write buffer size
Query Planner
random_page_cost: Relative cost of random I/O
effective_io_concurrency: Concurrent I/O operations
default_statistics_target: Histogram detail level
Connection Settings
max_connections: Maximum client connections
connection_limit: Per-database/user limits
Index Strategies
Choosing the Right Index
Decision Matrix:
| Query Pattern |
Index Type |
Reason |
WHERE id = 5 |
B-tree |
Equality lookup |
WHERE created_at > '2024-01-01' |
B-tree |
Range query |
ORDER BY name |
B-tree |
Sorting support |
WHERE tags @> ARRAY['sql'] |
GIN |
Array containment |
WHERE data->>'status' = 'active' |
GIN (jsonb_path_ops) |
JSONB query |
WHERE to_tsvector(content) @@ query |
GIN |
Full-text search |
WHERE location <-> point(0,0) |
GiST |
Nearest neighbor |
WHERE timestamp BETWEEN ... (large table) |
BRIN |
Sequential time-series |
WHERE ip_address << '192.168.0.0/16' |
GiST or SP-GiST |
IP range query |
Composite Indexes
Multi-column indexes for complex queries:
Column Ordering Rules:
- Equality columns first
- Sort/range columns last
- High-selectivity columns first
- Match query patterns exactly
Example:
CREATE INDEX idx_users_status_created ON users(status, created_at);
Partial Indexes
Index subset of rows:
Benefits:
- Smaller index size
- Faster updates on non-indexed rows
- Targeted query optimization
Use Cases:
- Index only active records:
WHERE deleted_at IS NULL
- Index recent data:
WHERE created_at > NOW() - INTERVAL '90 days'
- Index specific states:
WHERE status IN ('pending', 'processing')
Expression Indexes
Index computed values:
Examples:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
CREATE INDEX idx_events_date ON events(DATE(created_at));
CREATE INDEX idx_data_status ON documents((data->>'status'));
Covering Indexes (INCLUDE)
Include non-key columns for index-only scans:
CREATE INDEX idx_users_email_include
ON users(email)
INCLUDE (first_name, last_name, created_at);
Benefit: Query satisfied entirely from index, no table lookup
Index Maintenance
Monitoring Index Usage:
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Detecting Bloat:
SELECT schemaname, tablename, indexname,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Query Optimization
Using EXPLAIN ANALYZE
Understanding query execution:
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
EXPLAIN ANALYZE SELECT * FROM users WHERE created_at > '2024-01-01';
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT u.*, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01';
Key Metrics:
- Planning Time: Time to generate plan
- Execution Time: Actual query runtime
- Shared Hit vs Read: Buffer cache hits vs disk reads
- Rows: Estimated vs actual row counts
- Filter vs Index Cond: Post-scan filtering vs index usage
Common Query Anti-Patterns
1. N+1 Queries
Problem: One query per row in a loop
Solution: JOIN or batch queries
2. SELECT *
Problem: Fetches unnecessary columns
Solution: Select only needed columns
3. Implicit Type Conversions