Mastering Cloud Architecture Design: The Scalability Principle

Today, we'll talk about scalability: a system's ability to handle a growing workload by adding resources, without sacrificing performance.

There are two ways to scale an architecture:

  • Vertical Scaling: Adding (or removing) resources on a single node, usually by adding CPUs, memory, or storage to one machine. It's like giving your computer steroids: it becomes bigger and stronger.

  • Horizontal Scaling: Adding more nodes to a system. It's like inviting more friends to your party: the more friends you have, the more fun it gets. In practice, it means adding more servers to a distributed application.
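As a rough sketch of the difference, assume each server can absorb a fixed number of requests per second (the numbers below are made up for illustration):

```python
# A toy model of the two scaling strategies: each server handles a fixed
# number of requests per second, and total capacity is what we scale.

def total_capacity(requests_per_server: int, server_count: int) -> int:
    """Total requests/second the fleet can absorb."""
    return requests_per_server * server_count

baseline = total_capacity(requests_per_server=100, server_count=1)    # 100

# Vertical scaling: make the single server bigger (more CPU/RAM).
vertical = total_capacity(requests_per_server=400, server_count=1)    # 400

# Horizontal scaling: add more of the same server.
horizontal = total_capacity(requests_per_server=100, server_count=4)  # 400
```

Both routes reach the same capacity here, but vertical scaling eventually hits the limit of the biggest machine you can buy, while horizontal scaling keeps going as long as you can add nodes.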

Now, not all architectures are built to handle a heavy workload. That's why we need to consider a few scenarios:

  • Stateless Applications: These applications don't store session information on the server, which means any compute resource can serve any request, so they scale horizontally with ease. It's like having multiple superheroes who can all fight the bad guys at the same time.

  • Distribute Load to Multiple Nodes: To distribute the load to multiple nodes, we can use either a push model or a pull model.

    • Push Model: A load balancer pushes incoming requests out to multiple servers. On AWS, you can use a Network Load Balancer (layer 4, TCP/UDP) or an Application Load Balancer (layer 7, HTTP/HTTPS) depending on the use case. It's like having multiple chefs in the kitchen, each one preparing a different dish.

    • Pull Model: This model uses an asynchronous message-driven architecture, where messages are stored in a queue, and multiple resources can pull and consume them in a distributed fashion. It's like having a food delivery service, where each driver delivers food to a different location.

  • Stateless Components: Consider storing only a unique session identifier in an HTTP cookie and keeping the detailed user information server-side. Instead of writing that information to a local file system, store it in a database; on AWS, DynamoDB is a good fit. This way the application can scale horizontally. It's like having a personal assistant who takes care of all your small tasks so that you can focus on the big ones.

  • Implement Session Affinity: For HTTP/HTTPS traffic, we can use the sticky sessions feature of an Application Load Balancer to bind a user's session to a specific instance. With this feature, the ALB tries to route that user's requests to the same server for the duration of the session. It's like having a favorite bartender who knows your drink and always serves it to you.

  • Implement Distributed Processing: This approach is used for workloads that involve processing large amounts of data, or anything else that can't be handled by a single compute resource. It divides a task and its data into many small fragments of work and executes them in parallel across a set of compute resources. It's like having a group of people working together on a project, each one contributing their own unique skills.
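The push model above can be sketched in a few lines. This is a minimal round-robin dispatcher, assuming a rotation strategy (real load balancers offer several, such as least-connections):

```python
# Push-model sketch: the load balancer pushes each incoming request to
# the next server in rotation (round-robin). The server names here are
# made up for illustration.
import itertools

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
_rotation = itertools.cycle(SERVERS)

def dispatch(request_id: int) -> str:
    """Return the server this request is pushed to."""
    return next(_rotation)

targets = [dispatch(i) for i in range(6)]
# With 6 requests and 3 servers, each server receives exactly 2 requests.
```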
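The pull model can be sketched with a queue and a pool of workers. Here `queue.Queue` stands in for a managed message queue such as Amazon SQS (an assumption for illustration only):

```python
# Pull-model sketch: a producer puts messages on a queue, and several
# workers independently pull and process them.
import queue
import threading

tasks: "queue.Queue[int]" = queue.Queue()
results: "queue.Queue[int]" = queue.Queue()

def worker() -> None:
    while True:
        item = tasks.get()
        if item is None:           # sentinel: shut this worker down
            tasks.task_done()
            break
        results.put(item * item)   # "process" the message
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for n in range(10):                # enqueue the work
    tasks.put(n)
for _ in workers:                  # one shutdown sentinel per worker
    tasks.put(None)

tasks.join()                       # wait until everything is consumed
```

Because each worker pulls at its own pace, slow consumers simply take fewer messages; adding more workers is all it takes to scale out.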
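The stateless-components idea can be sketched as follows: the client holds only an opaque session ID, and any server reconstructs the rest from a shared store. A plain dict stands in for an external database such as DynamoDB (an assumption for illustration):

```python
# Stateless-component sketch: the client keeps only a session ID (e.g.
# in a cookie); all detailed state lives in a shared store, never in a
# server's local memory or file system.
import uuid

session_store: dict = {}   # stands in for a shared database

def create_session(user: str) -> str:
    """Return the only thing the client needs to hold on to."""
    session_id = str(uuid.uuid4())
    session_store[session_id] = {"user": user, "cart": []}
    return session_id

def handle_request(session_id: str, item: str) -> dict:
    """Any server can run this: all state comes from the shared store."""
    session = session_store[session_id]
    session["cart"].append(item)
    return session

sid = create_session("alice")
handle_request(sid, "book")
```

Since `handle_request` reads everything it needs from the store, the request could land on any server in the fleet and behave identically.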
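Session affinity can be sketched with a deterministic routing function. A real ALB implements stickiness with a load-balancer-generated cookie; hashing the session ID, as below, is a simplified stand-in for that behavior:

```python
# Sticky-session sketch: requests carrying the same session ID always
# land on the same backend server.
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]

def route(session_id: str) -> str:
    """Map a session ID to one server, deterministically."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

The trade-off: if that one server goes down, its bound sessions are disrupted, which is why stateless designs are usually preferred when possible.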
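Distributed processing can be sketched with a process pool on one machine standing in for a fleet of compute nodes: the input is split into fragments, each fragment is processed in parallel, and the partial results are combined.

```python
# Distributed-processing sketch: divide the data into small fragments,
# process them in parallel, then merge the partial results.
from multiprocessing import Pool

def word_count(fragment: list) -> int:
    """Count the words in one fragment of the input."""
    return sum(len(line.split()) for line in fragment)

if __name__ == "__main__":
    lines = ["the quick brown fox"] * 100          # 4 words x 100 lines
    fragments = [lines[i::4] for i in range(4)]    # split into 4 fragments
    with Pool(processes=4) as pool:
        partials = pool.map(word_count, fragments)
    total = sum(partials)                          # 400
```

This split-process-merge shape is the same pattern frameworks like MapReduce generalize across many machines.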

If you're interested in learning more about scalability and AWS Cloud Best Practices, check out this link.