00:00:18.400
and Fridays. A couple of weeks before, I managed to deploy faulty code and I
00:00:26.240
put our production service down, and believe it or not, I had to stay late
00:00:31.279
after work to fix it. And believe it or not, there are better Fridays than fixing
00:00:36.480
broken production after work. I knew the risk.
00:00:41.840
But why was it so important to deploy it that fast? And
00:00:49.280
what gave me the confidence to deploy code on Friday?
00:00:55.039
And last but not least, why the heck was I thinking of testing something on production?
00:01:02.879
We'll come to all of this. But before we do, we need to take a
00:01:08.799
few steps back. Our topic: Toptal is a freelancer network
00:01:14.240
that helps companies hire freelancers and scale their teams. I'm not a
00:01:19.600
freelancer there; I work in the Core Team, and the Core Team
00:01:25.280
develops our core tools. The heart of our core tools is the Toptal platform. It's a Rails
00:01:32.799
application of over a million lines of code. There are thousands of users using it,
00:01:40.320
and there are hundreds of engineers developing this platform
00:01:46.479
on a day-to-day basis. So it is a monolith, and it is majestic.
00:01:52.720
I work for the billing extraction team, because our monolith
00:01:59.600
has a majestic amount of issues, and we plan to extract parts of it
00:02:05.759
into services. Our team takes the billing code,
00:02:10.800
the billing domain, and puts it into a separate service. We are on the bleeding edge of
00:02:16.640
service extraction in our company, and we need to do this, but we can't disrupt
00:02:25.599
day-to-day operations, and we can't make the life of our
00:02:31.120
fellow engineers harder. So a big bang extraction is out of the question.
00:02:36.560
We decided to use a step-by-step approach, to make it safe
00:02:41.760
and to avoid disruption. So the first step was to extract the
00:02:48.560
billing code and put it into a Rails engine, while still communicating
00:02:53.840
with it using regular Ruby calls.
00:02:59.680
The next step was to actually test the external communication, the network
00:03:05.040
communication. So we implemented an async API using Kafka. We won't be
00:03:10.959
talking about this, even though it's very interesting, and
00:03:16.159
we also added synchronous communication over HTTP and REST, and that's the topic
00:03:22.400
of this talk. To make it safe, we added two important features. The first
00:03:28.640
of them was a feature flag that decided if any given request should be
00:03:35.680
routed through REST or should use a direct call.
00:03:41.360
And in case the REST call failed, we added a safe fallback: so
00:03:49.920
if the REST call failed, we were using a direct call as a
00:03:55.680
fallback. To end users the call might be slower, but it was still correct.
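As a rough sketch — Feature, BillingApiError, and the fetch methods are placeholder names, not our exact code — the flag plus fallback looked something like this:

    # Sketch of the flag + fallback routing (names are illustrative).
    class BillingQueryService
      def billing_records_for(product_ids)
        if Feature.enabled?(:billing_rest_api)  # toggled at runtime, no deploy needed
          fetch_via_rest(product_ids)
        else
          fetch_directly(product_ids)
        end
      rescue BillingApiError => e
        # The safe fallback: a failed REST call degrades to a direct Ruby call.
        Rails.logger.warn("billing REST call failed, falling back: #{e.message}")
        fetch_directly(product_ids)
      end
    end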
00:04:02.879
And it was our first approach.
00:04:08.720
To dive a little bit deeper, let's take a look at how the models looked.
00:04:16.000
We had a Product class and a BillingRecord class; let's focus only on those two. Before the
00:04:22.960
extraction, they were associated using regular Active Record
00:04:28.000
has_many relations. But after the extraction we still had to
00:04:34.400
connect the data from Product, which stayed on the platform, and BillingRecord, which was moved
00:04:39.759
to the billing service, and we couldn't do it with Active Record associations.
00:04:45.759
So we added a method to Product that was fetching
00:04:52.639
billing records using a BillingQueryService that was wrapping all the communication,
00:04:57.919
and we replaced the BillingRecord Active Record
00:05:03.199
model with a plain old Ruby object that was connected to Product
00:05:09.840
by product_id. And we deployed this.
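To make it concrete, here is a minimal sketch of the before and after; the record fields are assumptions, not the real billing schema:

    # Before the extraction: a regular Active Record association.
    class Product < ApplicationRecord
      has_many :billing_records
    end

    # After: BillingRecord is a plain old Ruby object built from service data,
    # and Product fetches it through the query service, matched by product_id.
    BillingRecord = Struct.new(:id, :product_id, :amount, :status, keyword_init: true)

    class Product < ApplicationRecord
      def billing_records
        BillingQueryService.new.billing_records_for(id)
      end
    end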
00:05:17.919
So before we even started, we had a safe environment where we could
00:05:25.280
ensure that in case there were any issues with our API we would fall back to a direct
00:05:30.800
call, and also that we could decide if we use those remote calls
00:05:38.400
using feature flags, without deployments. So we deployed it, and we enabled the
00:05:45.680
HTTP calls for a small percentage of the API calls, 10 or 20 percent. And we tried, and we
00:05:51.520
failed miserably. (Let's start... maybe this time... no — yeah, it fails every time.)
00:05:58.160
The performance of this solution was abysmal. Even for a small fraction of the traffic we
00:06:04.160
saw a lot of timeouts, the code was generally slow, and we knew we had
00:06:11.840
to change it massively. And I think it's quite a good
00:06:17.440
moment for a software engineer, right? Everybody loves those optimization
00:06:23.039
projects where you open New Relic and take a look at the traces, and then you
00:06:29.120
optimize and you see how it worked. And that's exactly what didn't happen, because we realized that we can't just
00:06:36.800
optimize the current approach in the middle of our massive refactoring
00:06:42.080
of service extraction. We decided that we needed another massive change, and that
00:06:48.479
was switching from the HTTP API to a GraphQL API.
00:06:54.000
And this change we also performed in small steps.
00:06:59.120
So we introduced another layer of feature flags that allowed us to switch
00:07:04.160
between the correct, although slow, HTTP API and the experimental
00:07:11.599
GraphQL API. And we also added a fallback,
00:07:16.880
so that in case a GraphQL call was slow, we would call the billing logic directly.
00:07:26.080
In addition to our standard monitoring — errors reported to Sentry,
00:07:32.240
performance with New Relic — we decided that we needed something more fine-grained.
00:07:38.240
So we logged every request to our ELK, to our Elastic stack,
00:07:47.120
and we logged its method name, arguments, stack trace, response time, and,
00:07:54.000
if needed, an error. And it gave us high visibility. We were able to spot
00:08:01.680
which methods were slow and recreate those slow calls
00:08:07.759
locally. We were able to see the source of those slow calls, or the source of
00:08:14.160
multiple calls, so that we were able to notice where we had N+1 queries.
00:08:20.800
We were able to spot the errors, reproduce those errors locally, and fix them with confidence.
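A minimal sketch of that per-request logging, assuming a generic StructuredLogger and illustrative field names:

    # Wrap every remote billing call and ship one structured document per call.
    def with_billing_logging(method_name, args)
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      error = nil
      yield
    rescue => e
      error = e
      raise
    ensure
      elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
      StructuredLogger.info(
        method: method_name,           # which API method was called
        args: args.inspect,            # with which arguments
        backtrace: caller.first(10),   # where the call came from (helps spot N+1 sources)
        response_time_ms: elapsed_ms,  # how slow it was
        error: error&.message          # and the error, if any
      )
    end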
00:08:27.599
So we started with those two strong pillars:
00:08:33.200
we had a safe deployment environment, and we had good monitoring that allowed us to
00:08:38.959
control and to decide if we really improved our solution.
00:08:44.640
With those two strong pillars, let's dive into the main phase: the
00:08:50.480
optimization phase. The first thing that we noticed was that several views or
00:08:57.760
Sidekiq jobs generated a lot of billing requests. It
00:09:02.959
could be thousands of billing requests from a single page rendering or a single
00:09:08.480
Sidekiq job. It was unbearable for the billing service; it
00:09:14.959
wasn't reasonable to serve that. So what happened there?
00:09:20.880
It was exactly like an N+1
00:09:26.399
issue with database access.
00:09:31.519
It was not very easy to spot those issues, because when we took a look at the job,
00:09:38.320
it was just iterating over products in batches and calling business
00:09:43.600
logic on them. But then, when we looked into the business logic, it was
00:09:49.760
using billing records for products. So for every
00:09:54.880
iteration in the job we were calling the billing service, and that generated that
00:10:02.000
flood of requests: an N+1 of requests. So what we did:
00:10:07.279
we decided that we should move away from this active-object approach where
00:10:13.680
we do an external request whenever we access data;
00:10:18.720
we should preload it one level higher. So
00:10:23.839
we implemented caching: preloading and caching
00:10:28.959
records inside the job. Let's take a look at this cached
00:10:35.120
billing records method. It did three things. First of all, it was
00:10:40.480
fetching records from the billing query service; then it was
00:10:46.800
indexing them by product_id; and using this index,
00:10:55.200
it was assigning billing records to their respective products. And because of that,
00:11:00.800
when we did the actual billing business logic call,
00:11:07.279
we weren't doing additional requests to the billing query service. So instead of
00:11:13.760
calling the billing service once per iteration, we were calling it
00:11:19.200
once per batch. That means instead of making thousands of requests, we made a dozen or half a
00:11:27.040
dozen, and that was definitely a big improvement.
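A sketch of that cached billing records method — the three steps are the ones just described, while the service call and the billing_records writer are assumed names:

    # Preload once per batch instead of once per product.
    def cache_billing_records(products)
      # 1. Fetch records for the whole batch in a single request.
      records = BillingQueryService.new.billing_records_for(products.map(&:id))
      # 2. Index them by product_id (the "build" side of a hash join).
      by_product_id = records.group_by(&:product_id)
      # 3. Assign records to their respective products (the "probe" side).
      products.each do |product|
        product.billing_records = by_product_id.fetch(product.id, [])
      end
    end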
00:11:32.640
So what we learned here: if we have a flood of requests, we need to
00:11:38.079
recreate what the database, or the ORM,
00:11:45.200
does: we need to preload the data, cache it, and join it
00:11:50.480
with the data that we have locally. And actually, what we implemented really resembles the hash join algorithm
00:11:57.839
that's known from databases. So that was the first issue. Another issue
00:12:04.320
that we noticed was that there was a single field that was
00:12:11.600
requested about a thousand times per day,
00:12:16.639
and in the worst hours it was hundreds per hour.
00:12:21.760
So it was a lot. We weren't able to preload it, because it came in
00:12:27.519
different requests. So what we did here
00:12:33.440
was we devised a very smart plan. We decided that we could do smart caching: we could add
00:12:39.839
this field to our Kafka events, then we could build a projection on the platform
00:12:44.880
side, and after backfilling historical data we could start using this projection, this
00:12:50.639
read model, and then we could remove the billing query. That was smart, that was fancy, and I was really happy when I
00:12:57.760
started to do this. But then, after the first few days,
00:13:04.839
Samuel, our colleague who is really into the billing domain, came and said: ah, guys, I
00:13:10.720
guess we have this field on the platform side, let's check it. And at this
00:13:17.600
moment I was really annoyed, because wow, that's such a nice approach with
00:13:24.560
Kafka, and we'd switch to this other approach that's not so fancy; and
00:13:30.880
this billing read model is so cool we could even talk at a conference about it,
00:13:37.040
right? But yeah, we decided to check Samuel's approach,
00:13:43.120
and it was really good, because we were indeed able to use this local field.
00:13:50.560
So instead of spending two weeks building this solution with Kafka, we
00:13:55.680
spent two or three days, mostly talking with developers that are into the
00:14:01.760
billing domain and with product people, and making huge SQL queries comparing
00:14:08.959
the data from those two sources. And we replaced this external call with
00:14:14.639
a single database call. And when we deployed it, the flood of requests
00:14:22.320
was gone. And that was a better solution, because what we did was
00:14:27.519
not only faster but also more maintainable: instead
00:14:34.320
of many lines of code and asynchronous communication, we shipped a
00:14:39.760
one-liner with a single database query. What we learned here —
00:14:46.959
I learned that while technical excellence can
00:14:52.880
help us win optimization battles, domain knowledge
00:14:59.360
can let us avoid those battles altogether. And that's really wise.
00:15:07.040
After those two issues we noticed another one. It wasn't about
00:15:14.079
making too many queries; it was about making queries that were
00:15:20.399
really, really slow, at least some of them. What was the reason? When we were
00:15:26.560
building the REST API we wanted a minimal surface, so it was a universal API returning
00:15:35.199
every piece of data that could possibly be needed on the platform side. What
00:15:41.120
does that mean? It means that we were returning about 40 fields
00:15:46.639
and about three or four associations, with potentially hundreds of records.
00:15:53.279
Whenever we needed something on the platform side, we were
00:15:58.320
fetching all of those, and for our worst products, with a lot of billing
00:16:03.759
data, we were sending a lot of data over the network. And to make it even worse,
00:16:11.120
we had filtering on the platform side, the client side.
00:16:16.639
So what we did: we leveraged the power of GraphQL, and it
00:16:22.560
was the moment when the decision to move to GraphQL really paid off,
00:16:28.639
because after implementing the GraphQL types on the server side, we were able
00:16:36.639
to create customized queries fetching only the data that was really
00:16:42.320
needed, without additional changes on the server side. So instead of fetching 40
00:16:47.440
fields and hundreds of associated objects, we were fetching four fields,
00:16:53.519
and it was bloody fast. And of course we implemented filtering on the server side, to fetch only the
00:17:00.240
data that is needed. And it looked like it worked like a charm during our tests.
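As an illustration — the field and argument names are made up, not our real billing schema — a narrow query with server-side filtering could look like this:

    # One customized query instead of the universal 40-field response.
    SLIM_BILLING_QUERY = <<~GRAPHQL
      query($productIds: [ID!]!, $status: BillingStatus) {
        # filtering happens on the server; only four fields travel over the network
        billingRecords(productIds: $productIds, status: $status) {
          id
          productId
          amount
          status
        }
      }
    GRAPHQL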
00:17:06.559
And I must say that,
00:17:12.000
as I was implementing this, I really
00:17:17.199
got into this process: we monitor, then
00:17:22.319
we decide how to optimize, we optimize and deploy to production, and then we see
00:17:27.600
how our stats are getting better. And it was a really, really rewarding
00:17:33.360
experience, with a short feedback cycle. And I expected
00:17:39.200
something similar here, even bigger than usual, because this change
00:17:45.120
was significant. So I merged this, and I was waiting to see the stats.
00:17:53.120
By the way, do you know what was the hottest day last summer, at least in the northern
00:17:58.960
part of our globe? I remember it well: it was the 26th of July, a Tuesday.
00:18:12.000
I was waiting for the stats to get better. I enabled the feature flag, and I was casually checking Slack when
00:18:19.600
someone wrote that Sidekiq workers consume too much memory. I thought: oh,
00:18:25.200
poor infra guys, they have to work on this. And someone else said that we had a
00:18:30.640
failing staging deploy, and I thought: man, someone was reckless and managed to
00:18:37.280
merge faulty code, that's very bad. And then someone told me directly: man, I
00:18:45.440
think it was your code that's faulty. And I thought: ah, it can't be, it's not possible, it's tested.
00:18:53.600
But I started checking it, and then someone asked us if the platform was okay —
00:18:58.880
someone from the business side — and then someone said: folks, the
00:19:04.640
platform's down. And I knew it was my pull request that did it, and
00:19:11.600
I felt cold sweat on my back. It was the hottest day last summer,
00:19:18.000
Tuesday, the 26th of July. How bad was it? It was really, really bad. We consumed
00:19:25.520
about 20 times more memory than usual, and CPU utilization was about 10 to
00:19:31.200
20 times higher than in normal times — normal times are so flat that they are almost invisible
00:19:39.039
on those charts. And to make it worse, queries to Postgres
00:19:46.480
took about a minute. A minute to query Postgres — that was quite terrible.
00:19:54.559
And then two good things happened. First, the infra folks
00:20:00.960
reacted without my intervention: when they
00:20:07.679
decided that it was the faulty build, they reverted it, and the charts
00:20:13.200
showed that the platform returned to normal operations. And then they assisted me
00:20:19.600
in reverting this on master and deploying the new version, so we had a clean master again.
00:20:27.520
And the second thing: the very next day, I started preparing a fully fledged
00:20:33.039
fix, and my manager approached me. And you can imagine what a manager could
00:20:38.640
say in this case: a little bit of blame casting, or at least "oh, be careful
00:20:44.159
next time". And what he said was: good job.
00:20:50.240
I was really surprised — was it really for me? And he said: yeah, it was wrong that
00:20:56.000
the platform was down, but it happens, and as long as you stayed to fix it, it's
00:21:03.760
okay, it's a good job. And I realized it's also part of our safe deployment strategy: it's okay to
00:21:11.760
make a mistake, as long as you are committed to fixing it.
00:21:17.600
And that was a really, really good experience. So what really happened?
00:21:24.720
While implementing this feature I did a small refactoring, and I
00:21:31.600
added parameter sanitization. And I was a little bit too eager:
00:21:37.360
three types of parameters were supposed to be allowed, but I allowed only one.
00:21:42.840
And so if someone requested
00:21:48.240
billing records for a client, the parameter was set to nil because of this
00:21:54.080
sanitization, and with no parameters those calls
00:21:59.600
did nothing: we were asking for all billing records, and then for all
00:22:06.400
associated objects — and those are millions of records. And that's why
00:22:13.840
we killed Postgres, and then we killed our workers.
00:22:19.919
And I don't think we even reached the moment when all the objects were instantiated and serialized and pushed
00:22:26.799
to the network; we basically — I basically — killed the servers.
00:22:32.480
The fix was pretty simple: in case the list of
00:22:38.799
parameters is empty, return nothing instead of returning everything. So we moved away from the SQL
00:22:47.039
default of returning everything when there's no WHERE clause.
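A hypothetical reconstruction of both the bug and the fix, with assumed parameter names:

    # The sanitization was supposed to allow three params but kept only one,
    # so e.g. a client filter silently arrived as nil.
    def billing_records(params)
      filters = params.slice(:product_id)  # bug: :client_id and :status were dropped
      # The fix: with no surviving filters, return nothing instead of everything.
      return BillingRecord.none if filters.compact.empty?
      BillingRecord.where(filters)
    end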
00:22:52.559
How did it happen that it slipped through our safe deployment strategy, through our CI, and
00:22:59.440
so on? First of all,
00:23:06.240
I was testing only one of those parameters in my unit tests, and it happened to be the parameter that
00:23:12.880
was correctly sanitized. Second of all, we had a feature flag for this, and I
00:23:19.919
switched it off when we noticed production issues. But the bug was introduced in the REST
00:23:27.039
branch — in the safe branch, the battle-tested one — so I made it even worse:
00:23:33.360
if I had switched to GraphQL, it would have been better, but I didn't know that
00:23:38.559
then. So that's how it worked,
00:23:45.360
that's why it happened. Of course, I fixed the sanitization.
00:23:50.559
And what was the result of our fix, finally? Here's the situation before the
00:23:56.080
fix, here's the chasm when I was fixing the issue, and here's after the fix:
00:24:02.880
the response time is not only better, but
00:24:08.799
it's also more predictable. So underfetching and custom queries really
00:24:14.799
helped us here. But what I really learned here is that no matter how much I trust
00:24:20.880
myself, and my unit tests, and my feature flags, and my reviewers, I should always test manually. We have this rule,
00:24:29.120
this "how to test this" field in the pull request template,
00:24:36.000
and in this case I put simply "trust the specs", and I think it was a mistake. I should
00:24:42.559
have provided a simple test scenario in the Rails console, with some expected log output or
00:24:49.360
something, because I believe we could have
00:24:54.799
caught this on development machines or during reviews. Without this manual test strategy,
00:25:02.480
I failed — I broke production. Okay,
00:25:07.600
that was the bad part; let's do something better now, right? The
00:25:13.279
patches were committed. Now, after all those changes, we were able
00:25:28.000
to enable the remote traffic for a bigger percentage of our requests. And we noticed that every Sunday
00:25:35.120
we had 429 Too Many Requests errors
00:25:41.039
reported by our clients. What happened every Sunday? We were first scheduling
00:25:48.159
reminders and then sending them at 5 p.m., and when scheduling we were doing a lot
00:25:53.679
of requests to the billing service. So we fixed it by
00:25:59.360
preloading — it was something we knew well — and we deployed it. We waited for Sunday,
00:26:05.279
because we weren't able to reproduce it locally: the issue was generated by nginx
00:26:13.200
in our production setting. And on Monday morning we saw the same
00:26:18.960
issue. Why was that? The error
00:26:24.240
was raised not only during scheduling but also during sending the reminders. We thought
00:26:31.200
we would be covered, because the reminders were
00:26:36.960
sent in our talents', our users', time zones,
00:26:43.200
so not all of them at the same 5 p.m. And we thought: okay, we'll split it into
00:26:49.840
24 buckets and it will be okay. The issue is that a quarter of our talents live
00:26:56.400
in the same time zone, so when the evening comes in that time zone,
00:27:02.000
we were sending too many requests to billing. We couldn't move our talents to other
00:27:08.640
time zones, so I decided to move the reminders a little bit: instead of sending them at 5 p.m. sharp,
00:27:15.760
I jittered them, adding a jitter of two minutes, which we calculated should be okay.
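Something like this, with the job class name assumed:

    # Instead of 5 p.m. sharp, spread each send over a two-minute window.
    reminders.each do |reminder|
      SendReminderJob.perform_in(rand(0..120).seconds, reminder.id)
    end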
00:27:22.080
So we deployed it and waited for Sunday. On Monday morning we saw the same error.
00:27:29.120
Why was that? We had missed one more reminder. So we did the same, and
00:27:35.520
on Monday morning, the same error. And it took a month to fix this,
00:27:40.559
because we had to wait a week every time. This time
00:27:45.919
we realized that it wasn't a problem with our domain knowledge; this time it was a technical
00:27:52.399
detail of Sidekiq. We thought that Sidekiq would run our scheduled jobs with
00:27:59.279
one-second resolution, but in fact Sidekiq did it with about 15-second resolution.
00:28:05.200
So instead of having a hundred small buckets of requests, we had
00:28:11.919
eight big buckets of requests. And that was the issue. So we took a
00:28:17.840
rate limiter from Sidekiq Enterprise, implemented window limiting, and
00:28:25.600
deployed it and waited for Sunday. And it worked, and it was marvelous.
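Roughly like this — the limiter name and numbers are illustrative; the windowed limiter itself is the Sidekiq Enterprise feature:

    # Allow at most N billing calls per one-second window; jobs over the limit
    # get rescheduled instead of hammering the billing service all at once.
    BILLING_LIMITER = Sidekiq::Limiter.window('billing-reminders', 50, :second)

    class SendReminderJob
      include Sidekiq::Job

      def perform(reminder_id)
        BILLING_LIMITER.within_limit do
          Reminder.find(reminder_id).deliver!  # Reminder#deliver! is assumed
        end
      end
    end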
00:28:31.600
And I must admit, I must confess, that it was during this saga
00:28:39.600
that I saw a green build on Friday and knew that it had to be deployed
00:28:45.200
before Sunday. And I deployed it at 4 p.m. on Friday,
00:28:53.360
and it worked, and I went home at 5. And to
00:28:59.039
make it more interesting, I did it twice during this process: at the very
00:29:04.559
beginning, in the first week, and at the very end of the process. And both times it worked on a Friday afternoon,
00:29:12.080
and I was safe to go home. And it was good to see it finally fixed.
00:29:18.640
So, to sum up what we did: we
00:29:24.000
leveraged a safe deployment strategy and good monitoring to be able to apply
00:29:30.399
a couple of optimization patterns. We were using preloading to avoid N+1
00:29:37.520
queries; we moved filtering to the server side; we were using custom queries to underfetch data,
00:29:44.320
to fetch only what is needed; we tried to use local data; and we spread the load with jitter. And when you take a look at
00:29:51.919
this list, it might sound familiar. I guess that most of you have
00:29:58.000
used those patterns, or at least seen them. Let's
00:30:03.919
take a closer look at them. Preloading to avoid N+1 — that's
00:30:09.279
something that every ORM does. Server-side filtering
00:30:15.039
is quite common — in Rails, moving from filtering in Ruby to a where query. Both things we implemented not in the
00:30:22.480
database world but in the microservice world:
00:30:28.640
we applied both of those patterns to the GraphQL API.
00:30:34.080
Using local data — it's a browser
00:30:39.120
optimization primer: avoid requests if possible.
00:30:44.399
Underfetching is something that's built into SQL;
00:30:49.440
again, we used GraphQL features to apply it in the service-oriented world.
00:30:56.399
And spreading the load — I read about it some 10 years ago, in the High Scalability post about YouTube's
00:31:03.679
architecture. So it's not that we invented something new.
00:31:10.960
So why do I keep you here for more than half an hour,
00:31:18.640
if we are applying something known in a new setting? And why didn't we do it well from the
00:31:26.000
beginning? The answer here is threefold: it was about learning, simplicity, and
00:31:33.279
trade-offs. We realized that it's really hard to spot
00:31:38.960
performance issues simply by staring at the code. We realized that you need to
00:31:46.720
apply traffic of the production shape to your application to see where the performance
00:31:52.960
problems are, and in many cases it's really hard to do that locally. And
00:32:00.080
the second part of the answer is simplicity. We started with the simplest thing that could possibly
00:32:07.279
work: we started with a boring REST solution, and with an API of
00:32:13.600
small surface, of minimal surface, and only then did we move to something more
00:32:19.519
sophisticated, like GraphQL and custom queries. And finally, it's all about trade-offs.
00:32:26.799
When we were implementing the jitter, we had to convince our colleagues
00:32:33.519
from the feature teams that it's really required to spread this load, and that we
00:32:39.919
can't really do everything at 5 p.m. sharp, because
00:32:45.120
what they really wanted — the requirement — was to send the reminder at five; let's send it at five, not
00:32:52.559
sometime around five. We had to explain to them that that's impossible, because we would kill the
00:32:59.360
billing service. So that was it: it was about
00:33:05.120
determining which known patterns we should use in our novel setting; it was about understanding
00:33:11.679
the trade-offs and applying our knowledge to our specific case.
00:33:18.240
And I shared what we learned. I shared
00:33:23.519
our story, our solutions, and I shared details about our mistakes,
00:33:29.760
so you don't have to repeat the mistakes that we made.
00:33:35.039
Instead, you can go to your work and commit your own mistakes —
00:33:41.120
but then please come back to a meetup, or to a conference, or anything
00:33:48.399
like this one, and tell your story, so that we can all learn from your mistakes.
00:33:55.279
And if you wonder whether you are ready to do them, whether it's safe to do them,
00:34:01.200
just think if you can deploy your code on Friday afternoon.
00:34:08.240
Thank you, I'll answer any questions now.