Saga Orchestration
Patterns for managing distributed transactions and long-running business processes without two-phase commit.
Inputs and Outputs
What you provide:
- Service boundaries and ownership (which service owns which step)
- Transaction requirements (which steps must be atomic, which can be eventual)
- Failure modes for each step (transient vs. permanent, retry policy)
- SLA requirements per step (informs timeout configuration)
- Existing event/messaging infrastructure (Kafka, RabbitMQ, SQS, etc.)
What this skill produces:
- Saga definition with ordered steps, action commands, and compensation commands
- Orchestrator or choreography implementation for your chosen pattern
- Compensation logic for each participant service (idempotent, always-succeeds)
- Step timeout configuration with per-step deadlines
- Monitoring setup: state machine metrics, stuck saga detection, DLQ recovery
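The per-step deadlines and stuck saga detection listed above could be sketched roughly as follows. This is a minimal illustration, not a prescribed configuration: `STEP_TIMEOUTS`, `DEFAULT_TIMEOUT`, `PendingStep`, and `find_stuck` are hypothetical names, and the timeout values are placeholders that would in practice be derived from each step's SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical per-step deadlines; real values would come from each step's SLA.
STEP_TIMEOUTS: Dict[str, timedelta] = {
    "reserve_inventory": timedelta(seconds=5),
    "process_payment": timedelta(seconds=30),
}
# Assumed fallback for steps without an explicit deadline.
DEFAULT_TIMEOUT = timedelta(seconds=30)


@dataclass
class PendingStep:
    """A step that has been dispatched and is still awaiting a reply."""
    saga_id: str
    step_name: str
    dispatched_at: datetime


def find_stuck(pending: List[PendingStep], now: datetime) -> List[str]:
    """Return the saga IDs whose current step has exceeded its deadline."""
    return [
        p.saga_id
        for p in pending
        if now - p.dispatched_at > STEP_TIMEOUTS.get(p.step_name, DEFAULT_TIMEOUT)
    ]
```

A periodic job could call `find_stuck` against the saga store and raise an alert or start compensation for each ID it returns.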
When to Use This Skill
- Coordinating multi-service transactions without distributed locks
- Implementing compensating transactions for partial failures
- Managing long-running business workflows (minutes to hours)
- Handling partial failures in distributed systems where a workflow must either complete or be fully undone
- Building order fulfillment, approval, or booking processes
- Replacing fragile two-phase commit with async compensation
Core Concepts
Saga Pattern Types
Choreography                      Orchestration
┌─────┐  ┌─────┐  ┌─────┐         ┌─────────────┐
│Svc A│─►│Svc B│─►│Svc C│         │ Orchestrator│
└─────┘  └─────┘  └─────┘         └──────┬──────┘
   │        │        │                   │
   ▼        ▼        ▼             ┌─────┼─────┐
 Event    Event    Event           ▼     ▼     ▼
                                 ┌────┐┌────┐┌────┐
Each service reacts to the       │Svc1││Svc2││Svc3│
previous service's event.        └────┘└────┘└────┘
No central coordinator.          Central coordinator sends
                                 commands and tracks state.
Choose orchestration when you need explicit step tracking, retries, and centralized visibility. A single coordinator holding the saga state makes the flow easier to debug.
Choose choreography when you want loose coupling and services that can evolve independently. With no central record of the flow, it is harder to trace.
Saga Execution States
| State | Description |
| --- | --- |
| Started | Saga initiated, first step dispatched |
| Pending | Waiting for a step reply from a participant |
| Compensating | A step failed; rolling back completed steps |
| Completed | All forward steps succeeded |
| Failed | Saga failed and all compensations have finished |
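The states above can be modeled as a small state machine. The sketch below is illustrative only: the table names the states but not the edges between them, so the `ALLOWED_TRANSITIONS` map is an assumption, and `SagaState`, `InvalidTransition`, `transition`, and `is_terminal` are hypothetical names.

```python
from enum import Enum


class SagaState(Enum):
    STARTED = "Started"
    PENDING = "Pending"
    COMPENSATING = "Compensating"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Assumed transition map; adjust it to match your orchestrator's actual flow.
ALLOWED_TRANSITIONS = {
    SagaState.STARTED: {SagaState.PENDING},
    # Pending -> Pending covers moving on to wait for the next step's reply.
    SagaState.PENDING: {SagaState.PENDING, SagaState.COMPLETED, SagaState.COMPENSATING},
    SagaState.COMPENSATING: {SagaState.FAILED},
    SagaState.COMPLETED: set(),
    SagaState.FAILED: set(),
}


class InvalidTransition(Exception):
    """Raised when a saga is asked to move along an edge that is not allowed."""


def transition(current: SagaState, target: SagaState) -> SagaState:
    """Return the target state if the move is allowed, otherwise raise."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


def is_terminal(state: SagaState) -> bool:
    """Completed and Failed have no outgoing edges."""
    return not ALLOWED_TRANSITIONS[state]
```

Rejecting invalid moves at this level helps catch duplicate or out-of-order replies before they corrupt saga state.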
Compensation Rules
| Situation | Handling |
| --- | --- |
| Step never started | No compensation needed (skip) |
| Step completed successfully | Run compensation command |
| Step failed before completion | No compensation needed; mark failed |
| Compensation itself fails | Retry with backoff → DLQ → manual intervention alert |
| Step result no longer exists | Treat compensation as success (idempotency) |
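The rules above can be sketched as two small helpers: one that selects which compensations to run, and one that retries a compensation with backoff before parking it on a dead-letter queue. This is a sketch under assumptions: `StepStatus`, `StepRecord`, `plan_compensations`, and `compensate_with_retry` are hypothetical names, the DLQ is a plain list, and running compensations in reverse order is the usual saga convention rather than something the table states.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class StepStatus(Enum):
    NOT_STARTED = "not_started"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class StepRecord:
    name: str
    compensation: str
    status: StepStatus


def plan_compensations(steps: List[StepRecord]) -> List[str]:
    """Select compensations for completed steps only, most recent first.

    Steps that never started or that failed before completion are skipped.
    """
    return [s.compensation for s in reversed(steps) if s.status is StepStatus.COMPLETED]


def compensate_with_retry(
    command: str,
    send: Callable[[str], None],
    dead_letters: List[str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try a compensation with exponential backoff; park it in the DLQ on exhaustion."""
    for attempt in range(max_attempts):
        try:
            send(command)
            return True
        except Exception:
            # Back off between attempts, but not after the final one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    dead_letters.append(command)  # a real system would also raise an alert here
    return False
```

The "step result no longer exists" rule is handled inside the participant rather than here: the participant treats a missing resource as already compensated and reports success.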
Templates
Template 1: Order Fulfillment Saga (Orchestration)
Concrete subclass of the base orchestrator. Defines four steps spanning inventory, payment, shipping, and notification. See references/advanced-patterns.md for the full abstract SagaOrchestrator base class.
from typing import Dict, List

from saga_orchestrator import SagaOrchestrator, SagaStep


class OrderFulfillmentSaga(SagaOrchestrator):
    """Orchestrates order fulfillment across four participant services."""

    @property
    def saga_type(self) -> str:
        return "OrderFulfillment"

    def define_steps(self, data: Dict) -> List[SagaStep]:
        return [
            SagaStep(
                name="reserve_inventory",
                action="InventoryService.ReserveItems",
                compensation="InventoryService.ReleaseReservation",
            ),
            SagaStep(
                name="process_payment",
                action="PaymentService.ProcessPayment",
                compensation="PaymentService.RefundPayment",
            ),
            SagaStep(
                name="create_shipment",
                action="ShippingService.CreateShipment",
                compensation="ShippingService.CancelShipment",
            ),
            SagaStep(
                name="send_confirmation",
                action="NotificationService.SendOrderConfirmation",
                compensation="NotificationService.SendCancellationNotice",
            ),
        ]


async def create_order(order_data: Dict, saga_store, event_publisher):
    saga = OrderFulfillmentSaga(saga_store, event_publisher)
    return await saga.start({
        "order_id": order_data["order_id"],
        "customer_id": order_data["customer_id"],
        "items": order_data["items"],
        "payment_method": order_data["payment_method"],
        "shipping_address": order_data["shipping_address"],
    })
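class InsufficientInventoryError(Exception):
    """Assumed domain error: the requested items cannot be reserved."""


class ReservationNotFoundError(Exception):
    """Assumed domain error: the reservation was already released or never existed."""


class InventoryService:
    def __init__(self, event_publisher):
        # Assumed constructor. reserve() and release_reservation() are expected
        # to be implemented elsewhere in the service.
        self.event_publisher = event_publisher

    async def handle_reserve_items(self, command: Dict):
        try:
            reservation = await self.reserve(command["items"], command["order_id"])
            await self.event_publisher.publish("SagaStepCompleted", {
                "saga_id": command["saga_id"],
                "step_name": "reserve_inventory",
                "result": {"reservation_id": reservation.id},
            })
        except InsufficientInventoryError as e:
            await self.event_publisher.publish("SagaStepFailed", {
                "saga_id": command["saga_id"],
                "step_name": "reserve_inventory",
                "error": str(e),
            })

    async def handle_release_reservation(self, command: Dict):
        """Compensation: idempotent, always publishes completion."""
        try:
            await self.release_reservation(
                command["original_result"]["reservation_id"]
            )
        except ReservationNotFoundError:
            # Already released or never created: treat as success.
            pass
        await self.event_publisher.publish("SagaCompensationCompleted", {
            "saga_id": command["saga_id"],
            "step_name": "reserve_inventory",
        })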
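To make the event flow in this template concrete, here is a minimal sketch of how an orchestrator might react to `SagaStepCompleted`, `SagaStepFailed`, and `SagaCompensationCompleted`. It is not the `SagaOrchestrator` base class from references/advanced-patterns.md: `Step`, `MiniOrchestrator`, and `demo` are hypothetical names, commands are recorded in a list instead of being published, and persistence, timeouts, and retries are left out.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Step:
    name: str
    action: str
    compensation: str


@dataclass
class MiniOrchestrator:
    steps: List[Step]
    saga_id: str = ""
    state: str = "Started"
    sent: List[Tuple[str, Dict]] = field(default_factory=list)  # stand-in for a publisher
    done: List[str] = field(default_factory=list)               # completed step names
    to_compensate: List[str] = field(default_factory=list)      # newest first

    async def _send(self, command: str) -> None:
        self.sent.append((command, {"saga_id": self.saga_id}))

    async def start(self, saga_id: str) -> None:
        self.saga_id = saga_id
        await self._send(self.steps[0].action)
        self.state = "Pending"

    async def on_step_completed(self, step_name: str) -> None:
        self.done.append(step_name)
        if len(self.done) == len(self.steps):
            self.state = "Completed"
        else:
            await self._send(self.steps[len(self.done)].action)

    async def on_step_failed(self, step_name: str) -> None:
        # Only completed steps are compensated, in reverse order.
        self.to_compensate = list(reversed(self.done))
        if not self.to_compensate:
            self.state = "Failed"
            return
        self.state = "Compensating"
        await self._send_next_compensation()

    async def on_compensation_completed(self, step_name: str) -> None:
        self.to_compensate.remove(step_name)
        if self.to_compensate:
            await self._send_next_compensation()
        else:
            self.state = "Failed"

    async def _send_next_compensation(self) -> None:
        by_name = {s.name: s for s in self.steps}
        await self._send(by_name[self.to_compensate[0]].compensation)


async def demo() -> MiniOrchestrator:
    saga = MiniOrchestrator(steps=[
        Step("reserve_inventory", "InventoryService.ReserveItems",
             "InventoryService.ReleaseReservation"),
        Step("process_payment", "PaymentService.ProcessPayment",
             "PaymentService.RefundPayment"),
    ])
    await saga.start("saga-1")
    await saga.on_step_completed("reserve_inventory")
    await saga.on_step_failed("process_payment")            # payment declined
    await saga.on_compensation_completed("reserve_inventory")
    return saga
```

Running `asyncio.run(demo())` leaves the saga in the `Failed` state after dispatching the reserve, payment, and release commands in that order; the failed payment step itself is never compensated.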
Template 2: Choreography-Based Saga
Each service listens for the previous service's event and reacts. No central coordinator. Compensation is triggered by failure events propagating backward.
from dataclasses import dataclass
from typing import Dict, Any


@dataclass
class SagaContext:
    """Carried through all events in a choreographed saga."""
    saga_id: str
    step: int
    data: Dict[str, Any]
    completed_steps: list


class OrderChoreographySaga:
    """Choreography-based saga: services react to each other's events."""

    def __init__(self, event_bus):
        self.event_bus = event_bus
        self._register_handlers()

    def _register_handlers(self):
        self.event_bus.subscribe("OrderCreated", self._on_order_created)
        self.event_bus.subscribe("InventoryReserved", self._on_inventory_reserved)
        self.event_bus.subscribe("PaymentProcessed", self._on_payment_processed)
        self.event_bus.subscribe("ShipmentCreated", self._on_shipment_created)