Learning Notes #18 - Bulkhead Pattern (Resource Isolation) | Cloud Pattern
Today, I learned about the Bulkhead Pattern and how it makes a system resilient to failures and resource exhaustion. In this blog, I jot down notes on the pattern for better understanding.
In today's world of distributed systems and microservices, resiliency is key to ensuring applications are robust and can withstand failures.
The Bulkhead Pattern is a design principle used to improve system resilience by isolating different parts of a system to prevent failure in one component from cascading to others.
What is the Bulkhead Pattern?
The term "bulkhead" originates from shipbuilding, where bulkheads are partitions that divide a ship into separate compartments. If one compartment is breached, the others remain intact, preventing the entire ship from sinking. Similarly, in software design, the Bulkhead Pattern isolates components or services so that a failure in one part does not bring down the entire system.

In software systems, bulkheads:
- Isolate resources (e.g., threads, database connections, or network calls) for different components.
- Limit the scope of failures.
- Allow other parts of the system to continue functioning even if one part is degraded or completely unavailable.
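
One common way to get this kind of isolation inside a single process is to cap concurrency per partition with a semaphore. The following is a minimal sketch in Python, not a reference implementation: the `Bulkhead` class, the `BulkheadFullError` exception, the partition names, and the limits of 2 are all hypothetical and chosen only for illustration.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(Exception):
    """Raised when a partition has no free slots."""


class Bulkhead:
    """Caps the number of concurrent calls allowed into one partition."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def enter(self):
        # Fail fast instead of queueing, so a saturated partition
        # cannot tie up callers that belong to other partitions.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            yield
        finally:
            self._slots.release()


# Hypothetical partitions: each one has its own independent limit.
reviews = Bulkhead("reviews", max_concurrent=2)
search = Bulkhead("search", max_concurrent=2)


def load_reviews(product_id: int) -> str:
    """Illustrative call that runs inside the 'reviews' partition."""
    with reviews.enter():
        return f"reviews for {product_id}"
```

When the `reviews` partition is saturated, further calls into it are rejected immediately, while `search` still has all of its own slots available.
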
Example
Consider an e-commerce application with a product-service that has two endpoints:
- /product/{id}: This endpoint gives detailed information about a specific product, including ratings and reviews. It depends on the rating-service.
- /products: This endpoint provides a catalog of products based on search criteria. It does not depend on any external services.

Suppose product-service has a fixed amount of resources (for example, a single shared thread pool) and is flooded with /product/{id} calls. Because those calls wait on the rating-service, they can monopolize the thread pool. This delays /products requests, so users experience slowness even though these requests are independent, and it can eventually lead to resource exhaustion and failures.

With the Bulkhead Pattern, we can allocate separate clients and connection pools to isolate each service interaction. For example, we can give /product/{id} requests one connection pool (10 connections) and /products requests a different connection pool (5 connections).

Even if /product/{id} requests are slow or encounter high traffic, /products requests remain unaffected.
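
The example above can be sketched with two separate thread pools, one per endpoint. This is only an illustration under stated assumptions: it uses worker threads in place of connection pools, the handler functions `get_product` and `search_products` and the sample data are hypothetical, and a `threading.Event` stands in for a slow rating-service. The pool sizes of 10 and 5 are taken from the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Pool sizes follow the numbers in the example above.
detail_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix="product-detail")
catalog_pool = ThreadPoolExecutor(max_workers=5, thread_name_prefix="product-catalog")

# Stand-in for a slow rating-service: calls block until this event is set.
rating_service_ready = threading.Event()


def get_product(product_id: int) -> dict:
    """Hypothetical handler for /product/{id}; waits on rating-service."""
    rating_service_ready.wait()
    return {"id": product_id, "rating": 4.5}


def search_products(query: str) -> list:
    """Hypothetical handler for /products; no external dependency."""
    return [name for name in ("red shoe", "blue shoe", "hat") if query in name]


# Flood the detail pool with more calls than it has workers.
detail_futures = [detail_pool.submit(get_product, i) for i in range(50)]

# The catalog pool has its own threads, so this completes right away.
catalog_result = catalog_pool.submit(search_products, "shoe").result(timeout=2)
still_blocked = sum(not f.done() for f in detail_futures)
print(catalog_result, still_blocked)  # ['red shoe', 'blue shoe'] 50

# Let rating-service "recover" so the blocked calls can drain.
rating_service_ready.set()
detail_results = [f.result(timeout=5) for f in detail_futures]
detail_pool.shutdown()
catalog_pool.shutdown()
```

If both handlers shared one pool of 15 threads, the 50 blocked /product/{id} calls would occupy every worker and the catalog search would sit in the queue behind them.
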
Scenarios Where the Bulkhead Pattern is Needed
- Microservices with Shared Resources: In a microservices architecture, multiple services might share limited resources such as database connections or threads. If one service experiences a surge in traffic or a failure, it can exhaust these shared resources, impacting all other services. Bulkheading ensures each service gets a dedicated pool of resources, isolating the impact of failures.
- Prioritizing Critical Workloads: In systems with mixed workloads (e.g., processing user transactions and generating reports), critical operations like transaction processing must not be delayed or blocked by less critical tasks. Bulkheading allocates separate resources to ensure critical tasks have priority.
- Third-Party API Integration: When an application depends on multiple external APIs, one slow or failing API can delay the entire application if not isolated. Using bulkheads ensures that issues with one API do not affect interactions with others.
- Multi-Tenant Systems: In SaaS applications serving multiple tenants, a single tenant's high resource consumption or failure should not degrade the experience for others. Bulkheads can segregate resources per tenant to maintain service quality.
- Cloud-Native Applications: In cloud environments, services often scale independently. A spike in one service's load should not overwhelm shared backend systems. Bulkheads help isolate and manage these spikes.
- Event-Driven Systems: In event-driven architectures with message queues, processing backlogs for one type of event can delay others. By applying the Bulkhead Pattern, separate processing pipelines can handle different event types independently.
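
For the event-driven scenario, one possible shape is a bounded queue per event type, so a backlog in one type fills only its own queue. The sketch below uses Python's standard `queue` module; the event type names, the queue sizes, and the `publish` helper are hypothetical and only meant to show the idea.

```python
import queue

# One bounded queue per event type; the sizes are illustrative.
pipelines = {
    "order_placed": queue.Queue(maxsize=100),
    "email_requested": queue.Queue(maxsize=3),
}


def publish(event_type: str, payload: dict) -> bool:
    """Enqueue an event; returns False if that type's pipeline is full."""
    try:
        pipelines[event_type].put_nowait(payload)
        return True
    except queue.Full:
        return False


# A backlog of email events fills only the email pipeline...
email_accepted = [publish("email_requested", {"n": n}) for n in range(5)]
# ...while order events are still accepted.
order_accepted = publish("order_placed", {"order_id": 1})
print(email_accepted, order_accepted)
# [True, True, True, False, False] True
```

In a real system each queue would also have its own consumers, so that slow email processing never borrows workers from order processing.
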
What are the Key Points of the Bulkhead Pattern? (Simplified)
- Define Partitions: Think of a ship that is divided into compartments (partitions) to keep water from flooding the whole ship if one section gets damaged. In software, these partitions are designed around how the application works and its technical needs.
- Designing with Context: If you're using a design approach like DDD (Domain-Driven Design), make sure your bulkheads (partitions) match the business logic boundaries.
- Choosing Isolation Levels: Decide how much isolation is needed. For example, use threads for lightweight tasks, and separate containers or virtual machines for more critical separations. Balance the degree of separation against the cost and extra effort involved.
- Combining Other Techniques: Bulkheads work even better with patterns like Retry, Circuit Breaker, and Throttling.
- Monitoring: Keep an eye on each partition's performance. If one starts getting overloaded, you can adjust resources or change limits.
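
The monitoring point can be illustrated with a partition that counts its own in-flight and rejected calls. This is a rough sketch assuming a simple in-process counter is enough; the `MonitoredBulkhead` class, its method names, and the limit of 4 are hypothetical, and a real setup would more likely export these numbers to a metrics system.

```python
import threading


class MonitoredBulkhead:
    """Concurrency-limited partition that also counts what happens to calls."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self.max_concurrent = max_concurrent
        self._lock = threading.Lock()
        self._in_flight = 0
        self._rejected = 0

    def try_acquire(self) -> bool:
        """Take a slot if one is free; otherwise record a rejection."""
        with self._lock:
            if self._in_flight >= self.max_concurrent:
                self._rejected += 1
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Give a slot back after the call finishes."""
        with self._lock:
            if self._in_flight == 0:
                raise RuntimeError("release() called without a matching acquire")
            self._in_flight -= 1

    def snapshot(self) -> dict:
        """Point-in-time view that a dashboard or alert could read."""
        with self._lock:
            return {
                "name": self.name,
                "in_flight": self._in_flight,
                "rejected": self._rejected,
                "utilization": self._in_flight / self.max_concurrent,
            }


payments = MonitoredBulkhead("payments", max_concurrent=4)
for _ in range(5):
    payments.try_acquire()
print(payments.snapshot())
# {'name': 'payments', 'in_flight': 4, 'rejected': 1, 'utilization': 1.0}
```

A utilization that stays near 1.0, or a rejected count that keeps climbing, is a hint that the partition's limit may need adjusting.
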
When Should You Use the Bulkhead Pattern?
- To Isolate Critical Resources: If one part of your system fails, other parts can keep working. For example, you don't want search functionality to stop working because the reviews section is down.
- To Prioritize Important Work: For example, make sure payment processing (critical) is separate from background tasks like sending emails.
- To Avoid Cascading Failures: If one part of the system gets overwhelmed, it won't drag down everything else.
When Should You Avoid It?
- Complexity Isn't Needed: If your system is simple, adding bulkheads might just make it harder to manage.
- Resource Efficiency is Critical: Sometimes, splitting resources into separate pools can mean less efficient use of those resources. If every thread, connection, or container is underutilized, this might not be the best approach.
Challenges and Best Practices
- Overhead: Maintaining separate resource pools can increase system complexity and overall resource consumption.
- Resource Sizing: Properly sizing the pools is critical to ensure resources are efficiently utilized without bottlenecks.
- Monitoring: Use tools to monitor the health and performance of each resource pool to detect bottlenecks or saturation.
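
For resource sizing, a common starting point is Little's Law: the average number of in-flight requests equals the arrival rate multiplied by the average latency. The sketch below turns that into a rough pool-size estimate. The `estimate_pool_size` function, the 25% headroom default, and the traffic numbers are assumptions for illustration, and any result should be checked against real measurements.

```python
import math


def estimate_pool_size(
    requests_per_second: float,
    avg_latency_seconds: float,
    headroom: float = 0.25,
) -> int:
    """Rough pool size from Little's Law plus a safety margin.

    average in-flight = arrival rate * average latency
    """
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight * (1 + headroom)))


# Hypothetical traffic: 40 req/s at 250 ms each -> 10 in flight on average.
print(estimate_pool_size(40, 0.25))  # 13
# Hypothetical traffic: 8 req/s at 500 ms each -> 4 in flight on average.
print(estimate_pool_size(8, 0.5))    # 5
```

This only covers average load. Bursty traffic or long-tail latencies usually call for more headroom or for a bounded wait queue in front of the pool.
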
References:
- AWS https://aws.amazon.com/blogs/containers/building-a-fault-tolerant-architecture-with-a-bulkhead-pattern-on-aws-app-mesh/
- Resilience4j https://resilience4j.readme.io/docs/bulkhead
- Medium https://medium.com/nerd-for-tech/bulkhead-pattern-distributed-design-pattern-c673d5e81523
- Microsoft https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead