Enhancing Microservices Architecture: Eliminating Circular Dependencies and Optimizing Redis for Cost Efficiency
TL;DR
We tackled challenges in a microservices architecture by:
1. Eliminating Circular Dependencies: Introduced an event-driven architecture where cart service publishes cart data, and ancillary service listens, improving resilience and reducing latency.
2. Optimizing Costs: Merged Redis instances into a shared cache, reducing infrastructure costs and simplifying data access.
These improvements enhanced system performance, scalability, and cost-efficiency.
Introduction
In modern software architecture, microservices are often the backbone of scalable and modular systems. While they offer flexibility and independence, they also introduce complexities such as service dependencies, performance bottlenecks, and infrastructure costs. These challenges can hinder the efficiency of even the most well-designed systems.
In our case, our system faced similar issues as it evolved to meet growing demands. Key challenges included circular dependencies between services that impacted performance and scalability, as well as rising infrastructure costs due to fragmented caching solutions.
This article details the systematic approach we implemented to address these challenges, specifically focusing on:
1. Eliminating circular dependencies to improve service decoupling and resilience.
2. Optimizing infrastructure costs by consolidating Redis instances into a shared caching solution.
Through these improvements, we achieved a more efficient, scalable, and cost-effective system while ensuring seamless customer experiences.
Act 1: Identifying the Bottleneck
The System Workflow
- A user selects a flight: The frontend (FE) interacts with the cart service to store flight cart data.
- The user proceeds to the booking form, where ancillary options like baggage, meals, and seat selection are retrieved via the ancillary service.
The Problem: Circular Dependency
In the original design:
- The ancillary service needed cart data to display relevant ancillary options. To achieve this, it made API calls to the cart service.
- These API calls created a circular dependency: the cart service stores cart data, and the ancillary service calls the cart service to fetch that data.
Impact of Circular Dependency
- Performance Degradation: High traffic led to increased latency and failure rates as inter-service calls slowed down the system.
- Reduced Resilience: If the cart service experienced downtime, the ancillary service also failed to function.
- Limited Scalability: The system struggled to handle concurrent calls during peak booking times.
This design flaw was a significant bottleneck, especially during peak sales events or promotional campaigns.
Act 2: Implementing Strategic Solutions
To address these challenges, we introduced two key architectural improvements:
Improvement 1: Breaking the Circular Dependency
Proposed Solution: Event-Driven Architecture
To decouple the cart service and the ancillary service, we implemented an event-driven architecture where:
- The cart service publishes cart events: Upon cart creation, it emits a cart.created message containing the cart data.
- The ancillary service listens for the event: It subscribes to cart.created and stores the received cart data locally.
When a user retrieves ancillary options, the ancillary service uses the pre-fetched cart data, eliminating the need for an API call to the cart service.
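The flow above can be sketched as a minimal in-process publish/subscribe loop. The class and field names below are illustrative assumptions, and the in-memory bus is only a stand-in for the broker the real system uses (Kafka):

```python
from collections import defaultdict

class EventBus:
    """Tiny in-process stand-in for a message broker such as Kafka."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(event)

class AncillaryService:
    """Keeps a local copy of cart data instead of calling the cart service."""
    def __init__(self, bus):
        self._carts = {}
        bus.subscribe("cart.created", self._on_cart_created)

    def _on_cart_created(self, event):
        # Store the published cart data locally when the event arrives.
        self._carts[event["cart_id"]] = event

    def get_options(self, cart_id):
        # Uses the pre-fetched cart data; no API call back to the cart service.
        cart = self._carts[cart_id]
        return {"cart_id": cart_id, "route": cart["route"],
                "options": ["baggage", "meals", "seats"]}

bus = EventBus()
ancillary = AncillaryService(bus)
# The cart service would publish this when a cart is created:
bus.publish("cart.created", {"cart_id": "c-1", "route": "SGN-HAN"})
print(ancillary.get_options("c-1"))
```

The key property is that the ancillary service answers entirely from its local copy, so it keeps working even if the cart service is down.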
Technical Details
- Event Bus: Used Kafka for reliable message brokering between the cart and ancillary services.
- Payload Design: The cart.created payload included the critical cart data the ancillary service needs.
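The article does not spell out the payload schema, but a cart.created message might look like the following sketch; every field name here is a hypothetical placeholder, not the production schema:

```python
import json

# Hypothetical cart.created payload; the fields are illustrative only.
cart_created = {
    "event": "cart.created",
    "cart_id": "c-42",
    "flight": {
        "origin": "SGN",
        "destination": "HAN",
        "departure": "2024-06-01T08:00:00Z",
    },
    "passengers": 2,
    "created_at": "2024-05-20T10:15:00Z",
}

# Serialized form as it would travel over the event bus.
message = json.dumps(cart_created)
print(message)
```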
Impact
- Decoupled Services: The ancillary service no longer depends on the cart service for real-time cart retrieval.
- Improved Resilience: The system continued to function even if one service was temporarily down.
- Enhanced Performance: Eliminated latency caused by repeated API calls between services.
Improvement 2: Optimizing Infrastructure Costs with Redis Merging
Initial Setup
Each microservice (cart and ancillary) maintained its own Redis instance for caching. While this provided data isolation, it:
- Increased Infrastructure Costs: Running multiple Redis instances led to higher memory and hosting expenses.
- Duplicated Data: Both services often cached overlapping data, wasting resources.
Proposed Solution: Shared Redis Cache
We implemented a shared Redis cache with appropriate namespacing to maintain data isolation. Here’s how it worked:
Centralized Caching:
- The cart service writes cart data to Redis when a cart is created.
- The ancillary service reads the same data directly from Redis when retrieving ancillary options.
Namespace Design:
- Each service used uniquely prefixed keys (e.g., cart:{cartId}) to avoid collisions and ensure data segregation.
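The shared-cache pattern can be sketched as follows. A Python dict stands in for the shared Redis instance, and the key helper and TTL value are assumptions for illustration; in production the equivalent would be Redis SETEX/GET against the same cart:{cartId} keys:

```python
import time

class SharedCache:
    """Dict-backed stand-in for a shared Redis instance with TTLs."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        # Mimics Redis SETEX: store the value with an expiry time.
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave as if Redis had evicted them.
            del self._store[key]
            return None
        return value

def cart_key(cart_id):
    # Namespace prefix keeps services from colliding in the shared cache.
    return f"cart:{cart_id}"

cache = SharedCache()
# The cart service writes on cart creation...
cache.set(cart_key("c-7"), {"route": "SGN-HAN"}, ttl_seconds=300)
# ...and the ancillary service reads the same key directly.
print(cache.get(cart_key("c-7")))
```

Because both services agree on the key scheme, the read path is a single cache lookup rather than an inter-service call.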
Tech Trade-offs
While merging Redis instances brought significant benefits, it also introduced specific trade-offs that required careful consideration:
- Data Consistency Risks: Sharing a single Redis instance across multiple services increased the likelihood of stale data if proper cache invalidation policies were not enforced. To mitigate this, we implemented strict TTL (time-to-live) settings and established clear cache invalidation processes.
- Loss of Microservice Isolation: By consolidating Redis instances, services became more dependent on shared infrastructure. This risked cross-service interference if namespaces were not managed effectively. To address this, we designed a robust namespace structure that ensured complete data segregation for each service.
While these trade-offs presented challenges, proactive strategies and safeguards ensured that the benefits of cost reduction and simplified architecture outweighed the risks.
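One such safeguard, explicit invalidation on writes, can be sketched like this; the cache dict and function names are hypothetical stand-ins for a Redis DEL issued by the cart service:

```python
# Dict stand-in for the shared Redis cache, pre-populated with one entry.
cache = {"cart:c-9": {"route": "SGN-HAN", "passengers": 1}}

def update_cart(cart_id, new_data):
    """Cache-aside write: invalidate first so readers never see stale data."""
    cache.pop(f"cart:{cart_id}", None)  # stands in for Redis DEL
    # ...persist new_data to the cart service's own store here...
    return new_data

update_cart("c-9", {"route": "SGN-HAN", "passengers": 2})
# The stale entry is gone; the next reader repopulates from the source of truth.
print(cache.get("cart:c-9"))
```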
Impact
- Cost Savings: Reduced infrastructure costs by consolidating Redis instances.
- Simplified Data Access: Eliminated event-based publish/subscribe mechanisms for certain operations, relying instead on simple cache lookups.
- Faster Access: Cache lookups significantly reduced data retrieval latency.
Act 3: Delivering Results
Performance Metrics
The architectural improvements led to measurable benefits:
- Latency Reduction: Ancillary service response time decreased once inter-service API calls were replaced with local cache lookups.
- Increased Resilience: The decoupled architecture ensured services remained operational even during isolated service downtimes.
- Cost Optimization: Merging Redis instances reduced infrastructure costs.
Business Impact
- Improved User Experience: Faster and more reliable ancillary retrieval ensured seamless booking experiences.
- Scalability: The system handled a 2x increase in traffic during promotional campaigns without degradation.
- Operational Efficiency: Developers spent less time troubleshooting inter-service dependencies.
Lessons Learned and Future Steps
Key Takeaways
- Event-Driven Architecture is a powerful tool for breaking circular dependencies and decoupling microservices.
- Shared Caching can be a cost-effective solution but requires careful management to avoid stale data and ensure isolation.
Future Improvements
- Enhanced Monitoring: Implement automated alerts for cache TTL expirations to detect potential data consistency issues.
- Redis Cluster Setup: Explore Redis clustering for higher fault tolerance and scalability.
Conclusion
This transformation showcases how strategic improvements can resolve deep-rooted architectural issues while optimizing costs. By leveraging event-driven architecture and shared caching, we built a robust, scalable system ready to handle future demands. These changes not only enhanced performance but also positioned the system as a reliable backbone for business growth.