Learning Notes #48 – Common Pitfalls in Event-Driven Architecture
Today, I came across Raul Junco's post on mistakes in Event-Driven Architecture – https://www.linkedin.com/posts/raul-junco_after-years-building-event-driven-systems-activity-7278770394046631936-zu3-?utm_source=share&utm_medium=member_desktop. In this blog, I am highlighting the same points for future reference.
Event-driven architectures are awesome, but they come with their own set of challenges. Missteps can lead to unreliable systems, inconsistent data, and frustrated users. Let's explore some of the most common pitfalls and how to address them effectively.
1. Duplication
Idempotent APIs – https://parottasalna.com/2025/01/08/learning-notes-47-idempotent-post-requests/
Events often get re-delivered due to retries or system failures. Without proper handling, duplicate events can:
- Charge a customer twice for the same transaction: Imagine a scenario where a payment service retries a payment event after a temporary network glitch, resulting in a duplicate charge.
- Cause duplicate inventory updates: For example, an e-commerce platform might decrement stock levels twice for a single order, understating the available stock.
- Create inconsistent or broken system states: Duplicates can cascade through downstream systems, introducing mismatched or erroneous data.
Solution:
- Assign unique IDs: Ensure every event has a globally unique identifier. Consumers can use these IDs to detect and discard duplicates.
- Design idempotent processing: Structure your operations so they produce the same outcome even when executed multiple times. For instance, an API updating inventory could always set stock levels to a specific value rather than incrementing or decrementing.
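Both ideas can be combined in a small sketch. This is illustrative only: the in-memory `processed_ids` set stands in for a durable deduplication store (e.g. a database table keyed by event ID), and the event fields (`event_id`, `sku`, `stock_level`) are hypothetical names.

```python
# In-memory stand-ins for a durable dedup store and the inventory table.
processed_ids = set()
inventory = {}

def handle_inventory_event(event: dict) -> None:
    event_id = event["event_id"]
    if event_id in processed_ids:
        return  # duplicate delivery: already handled, safely discard

    # Idempotent update: set an absolute stock level instead of
    # incrementing/decrementing, so a replayed event produces the
    # same final state.
    inventory[event["sku"]] = event["stock_level"]
    processed_ids.add(event_id)

event = {"event_id": "evt-1", "sku": "A100", "stock_level": 7}
handle_inventory_event(event)
handle_inventory_event(event)  # re-delivered duplicate; no effect
```

Even when the broker delivers the event twice, the consumer ends up in the same state, which is exactly what idempotency buys you.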
2. Not Guaranteeing Order
Events can arrive out of order when distributed across partitions or queues. This can lead to:
- Processing a refund before the payment: If a refund event is processed before the corresponding payment event, the system might show a negative balance or fail to reconcile properly.
- Breaking logic that relies on correct sequence: Certain workflows, such as assembling logs or transactional data, depend on a strict event order to function correctly.
Solution:
- Use brokers with ordering guarantees: Message brokers like Apache Kafka support partition-level ordering. Design your topics and partitions to align with entities requiring ordered processing (e.g., user or account ID).
- Add sequence numbers or timestamps: Include metadata in events to indicate their position in a sequence. Consumers can use this data to reorder events if necessary, ensuring logical consistency.
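The sequence-number approach can be sketched as a consumer that buffers early arrivals and releases events strictly in order. This is a minimal illustration, not broker-specific code; the names are made up for the example.

```python
import heapq

class OrderedConsumer:
    """Buffers out-of-order events and releases them by sequence number."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.buffer = []     # min-heap of (seq, event)
        self.delivered = []  # events released in correct order

    def receive(self, seq: int, event: str) -> None:
        heapq.heappush(self.buffer, (seq, event))
        # Release every buffered event whose turn has come.
        while self.buffer and self.buffer[0][0] == self.next_seq:
            _, ev = heapq.heappop(self.buffer)
            self.delivered.append(ev)
            self.next_seq += 1

consumer = OrderedConsumer()
consumer.receive(2, "refund")   # arrives early; held in the buffer
consumer.receive(1, "payment")  # its arrival unblocks both events
```

After both events arrive, `delivered` is `["payment", "refund"]`: the refund is held back until the payment it depends on has been processed.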
3. The Dual Write Problem
Outbox Pattern: https://parottasalna.com/2025/01/03/learning-notes-31-outbox-pattern-cloud-pattern/
When writing to a database and publishing an event, one operation might succeed while the other fails. This can:
- Lose events: If the event is not published after the database write, downstream systems might remain unaware of critical changes, such as a new order or a status update.
- Cause mismatched states: For instance, a transaction might be logged in a database but not propagated to analytical or monitoring systems, creating inconsistencies.
Solution:
- Use the Transactional Outbox Pattern: In this pattern, events are written to an "outbox" table within the same database transaction as the main data write. A separate process then reads from the outbox and publishes events reliably.
- Adopt Change Data Capture (CDC) tools: CDC tools like Debezium can monitor database changes and publish them as events automatically, ensuring no changes are missed.
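A minimal sketch of the outbox pattern, using SQLite so it is self-contained. The key point is that the order row and the outbox row share one transaction, so either both are committed or neither is; the table and column names are illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, "
    "published INTEGER DEFAULT 0)"
)

def place_order(order_id: str) -> None:
    # One atomic transaction covers both the data write and the event.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "CREATED"))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderCreated", "order_id": order_id}),),
        )

def relay_outbox(publish) -> None:
    """Separate relay process: push unpublished rows to the broker."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # hand off to the message broker
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

sent = []
place_order("ord-42")
relay_outbox(sent.append)  # a real system would publish to Kafka etc.
```

If the process crashes before the outbox insert commits, the order insert rolls back too, so no "phantom" database state exists without its event, and the relay can safely retry unpublished rows.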
4. Non-Backward-Compatible Changes
Changing event schemas without considering existing consumers can break systems. For example:
- Removing a field: A consumer relying on this field might encounter null values or fail altogether.
- Renaming or changing field types: This can lead to deserialization errors or misinterpretation of data.
Solution:
- Maintain versioned schemas: Introduce new schema versions incrementally and ensure consumers can continue using older versions during the transition.
- Use schema evolution-friendly formats: Formats like Avro or Protobuf natively support schema evolution, allowing you to add fields or make other non-breaking changes easily.
- Add adapters for compatibility: Build adapters or translators that transform events from new schemas to older formats, ensuring backward compatibility for legacy systems.
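The adapter idea can be shown with a small sketch. Assume a hypothetical v2 schema that renamed `amount` (a float in currency units) to `amount_cents` (an integer); the adapter downgrades v2 events to the v1 shape so legacy consumers keep working. All field names here are invented for the example.

```python
def to_v1(event: dict) -> dict:
    """Translate a v2 event back into the v1 shape for legacy consumers."""
    if event.get("schema_version", 1) == 1:
        return event  # already v1; pass through unchanged

    downgraded = dict(event)  # copy so the original event is untouched
    # v2 renamed "amount" to "amount_cents"; restore the old field.
    downgraded["amount"] = downgraded.pop("amount_cents") / 100
    downgraded["schema_version"] = 1
    return downgraded

v2_event = {"schema_version": 2, "order_id": "o1", "amount_cents": 1999}
legacy_event = to_v1(v2_event)
```

Placing such adapters at the consumer boundary lets producers roll out the new schema immediately while legacy consumers migrate on their own schedule.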