Saikat Kumar Dey

C1: Reliable, Scalable, and Maintainable Applications

June 6, 2022


  • Applications today are data-intensive as opposed to compute-intensive.
  • Standard building blocks of data-intensive applications
    • Store data in databases to make searching easier.
    • Speed up reads by storing the results of expensive operations in caches
    • Full-text search using search indexes
    • Async data handling via stream processing
    • Crunch large amounts of data via batch processing
  • For each of the building blocks mentioned above, there are many different options available to us. It can be difficult to know how to combine these tools to meet our requirements.

Thinking about data systems

Three important concerns in most software systems:

  • Reliability: system should continue to work correctly even in the face of adversity
  • Scalability: system should be able to handle growth in traffic
  • Maintainability: different people should be able to work on the system productively.


  • Reliability means, “continuing to work correctly, even when things go wrong”.
  • Systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
  • Fault is different from failure. Fault means that a part of the system doesn’t work as intended, failure means the entire system stops working altogether. Our objective is to ensure that a fault doesn’t lead to failure. As they say, prevention is better than cure!
  • Reliability is important not just for critical applications like nuclear power plants & air traffic control, but also for non-critical applications (say, a photo-sharing app). Why? We have a responsibility to our users.
  • The only situations where we might sacrifice reliability are when building prototypes or minimizing operational costs for a service with narrow margins. Even then, we should be aware of when we’re cutting corners.


  • We use the term scalability to describe a system’s ability to cope with increased load.
  • If our system grows in a particular way, what are our options for coping with the growth? How can we add computing resources to handle additional load?
  • Before we address growth questions, we need to describe the current load on the system.
  • What is load?
    • load can be defined using load parameters.
    • load parameters depend on the architecture of the system. They could be
      • requests per second for a web server
      • reads:writes in a database
      • concurrent users in a chat room
      • hit rate on a cache, etc
    • Example: the distribution of followers per user is a key load parameter for Twitter’s scalability, since it determines the fan-out load.
      • fan-out: number of requests to other services that we need to make to serve one incoming request.
      • context: each user follows many people and each user is followed by many people.
      • for typical users with few followers, it’s easy enough to join the relevant tables on each read to render the user’s timeline.
      • for “celebrities” with millions of followers, that simple approach doesn’t scale. Instead, most users’ tweets are fanned out at write time into each follower’s cached timeline, while celebrity tweets are fetched separately at read time and merged into the follower’s timeline.
      • twitter uses a hybrid of the two approaches depending on the type of user.
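The hybrid approach above can be sketched in a few lines of Python. The threshold, data structures, and function names are illustrative assumptions, not Twitter's actual implementation:

```python
FANOUT_THRESHOLD = 1_000_000  # assumed follower-count cutoff for "celebrities"

timelines = {}         # user -> tweets pushed at write time (the timeline cache)
celebrity_tweets = {}  # celebrity -> their recent tweets, merged in at read time

def post_tweet(author, tweet, followers, follower_counts):
    if follower_counts[author] < FANOUT_THRESHOLD:
        # fan-out on write: push the tweet into every follower's cached timeline
        for follower in followers[author]:
            timelines.setdefault(follower, []).append(tweet)
    else:
        # celebrity: store the tweet once; followers pull it when they read
        celebrity_tweets.setdefault(author, []).append(tweet)

def read_timeline(user, following, follower_counts):
    # start from the precomputed cache, then merge in celebrity tweets
    merged = list(timelines.get(user, []))
    for followee in following[user]:
        if follower_counts[followee] >= FANOUT_THRESHOLD:
            merged.extend(celebrity_tweets.get(followee, []))
    return merged

# toy scenario: bob follows a regular user and a celebrity
followers = {"alice": ["bob"], "celeb": ["bob"]}
following = {"bob": ["alice", "celeb"]}
counts = {"alice": 1, "celeb": 2_000_000}

post_tweet("alice", "hello", followers, counts)
post_tweet("celeb", "big announcement", followers, counts)
print(read_timeline("bob", following, counts))  # ['hello', 'big announcement']
```

Note how the write cost for regular users is proportional to their follower count, while the celebrity's write is O(1) and the cost moves to read time.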

Describing Performance

How do you investigate the impact of increased load on your system?

  • When you increase a load parameter and keep the system resources unchanged, how is the performance of the system affected?
  • When you increase a load parameter, how much do you need to increase the system resources to keep the performance unchanged?

We need a performance measure to answer these questions.

For web services, “response time” is usually the metric of interest. Often the average (mean) response time is reported, which is not very helpful: it doesn’t tell us how many users actually experienced a given delay. Use percentiles instead.

To figure out how bad your outliers are, look at the 95th, 99th, and 99.9th percentiles, aka p95, p99, and p999. If the 95th percentile response time is 1.5 seconds, then 95 out of 100 requests complete in less than 1.5 seconds, and 5 out of 100 take 1.5 seconds or more.
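A rough sketch of why averages mislead (the latency values below are invented): a single outlier drags the mean far above what most users experience, while the percentiles reflect the actual distribution. The nearest-rank method here is one common way to compute percentiles:

```python
import math

# hypothetical response times in milliseconds; one slow outlier
latencies_ms = [30, 32, 35, 40, 41, 45, 50, 55, 60, 1500]

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of the observations are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean = {mean:.1f} ms")                    # 188.8 ms, skewed by the outlier
print(f"p50  = {percentile(latencies_ms, 50)} ms")  # 41 ms: the typical experience
print(f"p95  = {percentile(latencies_ms, 95)} ms")  # 1500 ms: the outlier shows up here
```

Nine out of ten requests finished in 60 ms or less, yet the mean suggests users wait ~189 ms, a number almost nobody actually experienced.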

When several backend calls are needed to serve a request, a single slow backend call is enough to slow down the entire end-user request (tail latency amplification).
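This effect compounds with the number of backend calls. A small sketch of the math, assuming the calls are independent and each is slow with probability p:

```python
def slow_request_probability(n_calls, p_slow):
    """Chance that at least one of n independent backend calls is slow,
    which is what the end user waits for: 1 - (1 - p)^n."""
    return 1 - (1 - p_slow) ** n_calls

# suppose just 1% of individual backend calls are slow:
print(slow_request_probability(1, 0.01))    # a single call: slow 1% of the time
print(slow_request_probability(100, 0.01))  # 100 parallel calls: ~63% of user requests hit a slow call
```

So even if each backend looks healthy in isolation, fanning out to many of them makes the slow tail the common case for end users.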

Approaches for coping with load

scaling up aka vertical scaling means upgrading to a more powerful machine.

scaling out aka horizontal scaling means distributing the load across several smaller machines. This is also called shared-nothing architecture.

A good architecture is a pragmatic mixture of both.

Elastic systems automatically add computing resources when they detect a load increase. Such systems are good when the load is highly unpredictable but manually scaled systems are simpler.

There’s no magic scaling sauce. The architecture depends on the problem at hand - volume of reads/writes, volume of data to store, etc. For example, a system that handles 100k RPS, each 1KB in size is very different from a system that handles 3 RPS, each 2 GB in size.

Before scaling, think about which operations are common and which ones are rare. Figure out the load parameters. As a startup with an unproven product idea, you should be able to iterate quickly rather than trying to scale for some hypothetical future load. See “How We’ve Scaled Dropbox” for some inspiration!