Learn how to build scalable and reliable NodeJs applications. Get the best tips and tricks to ensure your apps are optimized for success!
In an increasingly digital world, leveraging Node JS development services to build scalable, high-performance applications is not just a benefit, but a necessity. This article delves into the world of NodeJS, a runtime environment preferred by many developers around the world, offering best practices, key tools, and strategic patterns to increase the performance of scalable Node JS applications. Whether you're a newbie diving into Node JS development services or an experienced developer aiming to refine your application, this article will guide you through the fundamental steps to transform your NodeJS application from simply working to great. Harness the power of these insights and strategies to build scalable Node JS applications that do more than just meet your performance expectations – they exceed them.
What is NodeJS?
NodeJs is a JavaScript runtime built on Chrome's V8 JavaScript engine that uses a non-blocking, event-driven I/O model. That is, with NodeJs, developers can run Javascript code on the server side, which allows Javascript developers to write front-end and back-end applications. The nature of having a single programming language across the entire stack is just one of the many selling points of NodeJs. Some of the others are:
- NodeJs is asynchronous and event-driven, which means that when an operation is being conducted, if that operation takes a long time to complete, the application can continue to perform other operations while waiting for the first one to complete. This feature makes NodeJs applications efficient and fast.
- Being built on the V8 Javascript engine, NodeJs is very fast in executing code.
- NodeJs has a large community of developers. This means there are lots of resources to learn when one is stuck, and lots of libraries to use to make development easier.
- NodeJs is cross-platform. It can run on Windows, Linux and Mac OS. And since it's basically Javascript, but server-side, it's easy to learn, use, and find developers. It's not difficult to create a team that can write NodeJs, React Native, and ReactJS applications to cover all parts of the development process.
- NodeJs is lightweight. It doesn't consume many resources and is easy to scale. In backend development, scaling means that an application can handle more requests per second without crashing or slowing down, making the user experience smoother. Since scaling is the main focus of this article, we will discuss it in more detail.
Understanding the event loop in NodeJS
Before we get into scaling, let's take a quick look at what the event loop is. The event loop is a fundamental concept in NodeJs development. It is a single-threaded mechanism that runs continuously and manages the execution of asynchronous tasks such as reading files, querying databases, or making network requests in a NodeJs application. Instead of waiting for a task to complete, NodeJs registers callback functions to be executed as soon as the operation in question completes. This non-blocking nature makes NodeJs very fast and highly scalable when the right techniques are used.
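This behavior can be sketched in a few lines. In this illustrative example, a setTimeout stands in for a slow I/O task; the surrounding synchronous code runs to completion without waiting for it:

```javascript
// Sketch of the event loop's non-blocking behavior: the "slow" task is
// registered with a callback, and the program keeps going instead of waiting.
const order = [];

order.push("start");

// Stand-in for a slow I/O operation; its callback runs on a later tick.
setTimeout(() => {
  order.push("slow task finished");
}, 10);

order.push("end");

// The synchronous code has already finished while the slow task is still pending.
console.log(order); // ["start", "end"]
```

Only after the synchronous code returns control to the event loop does the timer's callback get a chance to run.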
What is scaling?
Scalability, in the simplest sense, is an application's ability to handle many requests per second at once. Scaling terminology includes two more terms: vertical and horizontal scaling. Vertical scaling, also known as scaling up, refers to the process of improving an application's ability to handle requests by upgrading its resources, such as adding more RAM or increasing CPU. Horizontal scaling, also known as scaling out, is the process of adding more instances of the application.
Scaling in NodeJS with multiple instances
First, let's ask the question: why scale? Quite simply, in our age of many users, an application that cannot handle all of the requests it receives from all of its users cannot hope to stay in the game.
And as backend developers, we need to make sure our application is fast, responsive, and secure. Scaling helps developers achieve better performance: they can distribute the workload across multiple instances or nodes, handle more traffic, and build in fault tolerance, meaning that if one instance fails, the other instances can take over and keep the Node JS application running.
Now, while some other programming languages like Go can handle concurrent requests by default, NodeJs, due to its single-threaded nature, handles operations a little differently. Therefore, the techniques used for scaling also vary.
NodeJs is fast. Very fast. However, it executes JavaScript on a single thread, so it cannot spread CPU-heavy work across cores by itself. Too many CPU-intensive requests at the same time can end up blocking the event loop.
How to scale Node JS applications
There are different methods for scaling Node.js applications. Let's look at some of them briefly: microservices architecture, caching, and the cluster module.
Microservices architecture
Node JS microservices architecture is a software development approach that structures an application as a set of independent, loosely coupled services. Each service is a separate Node JS application that is developed and deployed independently, and the services can communicate with each other through HTTP requests or messaging systems like RabbitMQ or Apache Kafka. Rather than bringing everything together in a single monolith, this approach allows developers to focus on each service independently and implement necessary changes without directly affecting the others. It should be noted, though, that the advantages of microservices are debated, and the pattern should be applied with caution.
To understand microservices architecture, let's look at a hypothetical example of an e-commerce application. This application can be divided into microservices such as Product, Cart and Order. Each microservice is developed and deployed independently.
For example, the Product microservice may be responsible for managing product data in the system. It would provide CRUD endpoints and expose an HTTP API that other microservices can use to interact with product information.
The Cart microservice could handle all cart management features like adding items, changing quantities, calculating totals, etc. It would also expose an API for other microservices to create carts and update them. And the Order microservice can enable order creation, payment processing, status tracking, and more. It would provide APIs for cart checkout and order search functions.
By separating concerns into autonomous, decoupled microservices, the application becomes easier to scale and maintain. Each microservice focuses on a specific domain capability while working together to deliver the complete application experience.
For example, the Cart microservice would handle all shopping cart functionality – adding items, updating quantities, calculating totals, etc. It would manage the cart data in its own database.
The Order microservice would provide endpoints for placing orders, querying order history, and integrating the Cart and Product microservices. It serves as a bridge between the cart and product data/functionality.
This way, each microservices team can focus on their specific part of the application. The cart team manages cart features, the product team handles product data and APIs, and the order team handles order processing and integration.
In theory, this separation of concerns by domain speeds up development by dividing work and reducing feature overlap between teams. It also promotes independence and weak linkages between services. Each microservice is less dependent on other parts of the system, reducing the side effects of changes and increasing reliability.
Caching
Caching is a technique used to improve the performance and scalability of Node.js applications by temporarily storing frequently accessed data for quick lookup.
Consider this example: We need to build an application that searches and displays museum data – images, titles, descriptions, etc. There is also pagination to allow users to view different pages of data.
Each paged request can fetch 20 items from the museum's public API. Since it is a public API, it is likely rate limited to prevent abuse. If we request data from the API on every page change, we will quickly hit these rate limits.
Instead, we can use caching to avoid redundant API calls. When the first page of data is requested, we cache it locally. On subsequent visits to that page, we first check whether the data is in the cache. If it is, we return the cached data and avoid exceeding the rate limits.
The cache provides fast lookups of data that has already been fetched. For public APIs, or any data that doesn't change frequently, caching can massively improve performance and reduce costs and rate-limit pressure on backend services.
A great way to solve this problem is to cache data using a caching service like Redis. It works like this: we take the API data from page number 1 and store it in Redis, in memory.
Then, when the user switches to page 2, we send a request to the museum API as normal.
But caching really demonstrates its value when a user returns to an already visited page. For example, when the user returns to page 1 after viewing other pages, instead of sending a new API request, we first check whether the data for page 1 exists in the cache. If it does, we return the cached data immediately, avoiding an unnecessary API call.
Only if the cache does not contain the data do we make the API request, store the response in the cache, and return it to the user. This way, we reduce duplicate API requests as users revisit pages. By serving from the cache whenever possible, we improve performance and stay within API rate limits. The cache acts as a short-term data store, minimizing calls to the backend.
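The cache-aside flow just described can be sketched as follows. A plain Map stands in for Redis here, and fetchPageFromApi is a hypothetical stand-in for the rate-limited museum API call:

```javascript
// Cache-aside sketch: a Map stands in for Redis, and fetchPageFromApi
// simulates the rate-limited museum API (both are illustrative assumptions).
const cache = new Map();
let apiCalls = 0;

function fetchPageFromApi(page) {
  apiCalls++; // each real API call would count against the rate limit
  return { page, items: [`item for page ${page}`] };
}

function getPage(page) {
  const key = `page:${page}`;
  // 1. Serve from the cache when possible...
  if (cache.has(key)) return cache.get(key);
  // 2. ...otherwise hit the API and remember the result.
  const data = fetchPageFromApi(page);
  cache.set(key, data);
  return data;
}

getPage(1); // cache miss: calls the API
getPage(2); // cache miss: calls the API
getPage(1); // cache hit: no API call
console.log(apiCalls); // 2
```

Swapping the Map for a Redis client changes the storage layer, not the logic: check the cache first, fall back to the API, and store the result.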
Practice: Cluster Module, Multithreading and Work Processes
Theory without practice is only half the work done. In this section, we will look at some of the techniques we can use to scale NodeJs applications: the cluster module and multithreading. We will first use NodeJS's built-in cluster module, and once we understand how it works, we will use the pm2 process manager to make things easier. Next, we'll change the example a bit and use the worker_threads module to create multiple threads.
Cluster module
Now since NodeJs is single threaded, no matter how many cores you have, it will only use a single core of your CPU. This is completely acceptable for input/output operations, but if the code consumes too much CPU, your Node application may end up with performance issues. To solve this problem, we can use the cluster module. This module allows us to create child processes that share the same server port as the parent process.
This way we can take advantage of all the CPU cores. To understand what this means and how it works, let's create a simple NodeJs application that will serve as an example.
We will start by creating a new folder called nodeJs-scaling and inside that folder we will create a file called no-cluster.js. Inside this file, we will write the following code snippet:
const http = require("http");

const server = http.createServer((req, res) => {
  if (req.url === "/") {
    res.writeHead(200, { "content-type": "text/html" });
    res.end("Home Page");
  } else if (req.url === "/slow-page") {
    res.writeHead(200, { "content-type": "text/html" });
    // simulate a slow page
    for (let i = 0; i < 9000000000; i++) {
      res.write("Slow Page");
    }
    res.end(); // Send the response after the loop completes
  }
});

server.listen(5000, () => {
  console.log("Server listening on port: 5000....");
});
Here, we start by importing the built-in NodeJs HTTP module. We use it to create a server that has two endpoints, a base endpoint and a slow page endpoint. What we want with this structure is that when we go to the base endpoint, it will run and open the page normally. But as you can see, because of the for loop that will be executed when we reach the end point of the slow page, the page will take a long time to load. While this is a simple example, it's a great way to understand how the process works.
Now, if we start the server by running node no-cluster.js and then send a request to the base endpoint via cURL, or just open the page in a browser, it will load very quickly. An example of such a request is curl -i http://localhost:5000/. If we do the same for the slow-page endpoint with curl -i http://localhost:5000/slow-page, we will notice that it takes a long time and can even result in an error. This is because the event loop is blocked by the for loop and cannot handle any other requests until the loop completes. There are a few ways to solve this problem. We will first use the built-in cluster module, and then a useful library called pm2.
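The blocking effect is easy to observe directly. In this small sketch (the loop size is an arbitrary choice), a timer scheduled for 10 ms fires late because a synchronous loop is hogging the single thread:

```javascript
const scheduledAt = Date.now();

// This timer is due in 10 ms, but it cannot fire while the thread is busy.
setTimeout(() => {
  const delay = Date.now() - scheduledAt;
  console.log(`Timer fired after ~${delay} ms instead of 10 ms`);
}, 10);

// Block the event loop with synchronous CPU work.
let total = 0;
for (let i = 0; i < 100000000; i++) {
  total += i;
}
console.log("Synchronous loop done");
```

The same thing happens to incoming HTTP requests: while the for loop in the slow-page handler runs, the server cannot respond to anything else.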
The built-in cluster module
Now let's create a new file called cluster.js in the same directory and write the following snippet inside it:
const cluster = require("cluster");
const os = require("os");
const http = require("http");

// Check if the current process is the master process
if (cluster.isMaster) {
  // Get the number of CPUs
  const cpus = os.cpus().length;
  console.log(`${cpus} CPUs`);
} else {
  console.log("Worker process " + process.pid);
}
Here we start by importing the cluster, os, and http modules.
Next, we check whether the current process is the master process; if so, we log the CPU count.
Our machine has 6 CPUs; the number will differ depending on your machine. When we run node cluster.js, we should get a response like "6 CPUs". Now, let's modify the code a little:
const cluster = require("cluster");
const os = require("os");
const http = require("http");

// Check if the current process is the master process
if (cluster.isMaster) {
  // Get the number of CPUs
  const cpus = os.cpus().length;

  console.log(`Forking for ${cpus} CPUs`);
  console.log(`Master process ${process.pid} is running`);

  // Fork the process for each CPU
  for (let i = 0; i < cpus; i++) {
    cluster.fork();
  }
} else {
  console.log("Worker process " + process.pid);

  const server = http.createServer((req, res) => {
    if (req.url === "/") {
      res.writeHead(200, { "content-type": "text/html" });
      res.end("Home Page");
    } else if (req.url === "/slow-page") {
      res.writeHead(200, { "content-type": "text/html" });
      // simulate a slow page
      for (let i = 0; i < 1000000000; i++) {
        res.write("Slow Page"); // Use res.write instead of res.end inside the loop
      }
      res.end(); // Send the response after the loop completes
    }
  });

  server.listen(5000, () => {
    console.log("Server listening on port: 5000....");
  });
}
In this updated version, we fork the process once for each CPU. We could have called cluster.fork() up to six times by hand instead (six being the CPU count of the machine we are using; yours may differ).
There is a problem here: we should not succumb to the tempting idea of creating more forks than there are CPUs, as this will create performance problems instead of solving them. So what we are doing is forking the process for each CPU via a for loop.
Now if we run node cluster.js we should get a response like this:
Forking for 6 CPUs
Master process 39340 is running
Worker process 39347
Worker process 39348
Worker process 39349
Server listening on port: 5000....
Worker process 39355
Server listening on port: 5000....
Server listening on port: 5000....
Worker process 39367
Worker process 39356
Server listening on port: 5000....
Server listening on port: 5000....
Server listening on port: 5000....
As you can see, all these processes have a different ID. Now, if we try to open the slow page endpoint first and then the base endpoint, we will see that instead of waiting for the long for loop to complete, we will get a faster response from the base endpoint.
This is because the slow page endpoint is being handled by a different process.
PM2 Package
Instead of working with the cluster module itself, we can use a third-party package like pm2. Since we will be using it in the terminal, we will install it globally by running sudo npm i -g pm2. We can reuse the no-cluster.js file we created earlier, which contains the following code:

const http = require("http");

const server = http.createServer((req, res) => {
  if (req.url === "/") {
    res.writeHead(200, { "content-type": "text/html" });
    res.end("Home Page");
  } else if (req.url === "/slow-page") {
    res.writeHead(200, { "content-type": "text/html" });
    // simulate a slow page
    for (let i = 0; i < 9000000000; i++) {
      res.write("Slow Page"); // Use res.write instead of res.end inside the loop
    }
    res.end(); // Send the response after the loop completes
  }
});

server.listen(5000, () => {
  console.log("Server listening on port: 5000....");
});

With pm2, we don't need to write any clustering logic ourselves: running pm2 start no-cluster.js -i max forks one process per CPU core (the -i flag sets the number of instances, and max uses all available cores). pm2 list shows the running processes, and pm2 delete no-cluster stops them.
Now that we've learned how to run multiple processes, let's learn how to create multiple threads.
Worker threads
While the cluster module allows us to run multiple NodeJs instances that can distribute workloads, the worker_threads module allows us to run multiple application threads within a single NodeJs instance.
This lets JavaScript code execute in parallel. We should note here that code running in a worker thread runs in a separate thread, preventing it from blocking our main application.
Let's look at this process in action again. Let's create a new file called main-thread.js and add the following code:
const http = require("http");
const { Worker } = require("worker_threads");

const server = http.createServer((req, res) => {
  if (req.url === "/") {
    res.writeHead(200, { "content-type": "text/html" });
    res.end("Home Page");
  } else if (req.url === "/slow-page") {
    // Create a new worker
    const worker = new Worker("./worker-thread.js");
    worker.on("message", (j) => {
      res.writeHead(200, { "content-type": "text/html" });
      res.end("slow page " + j); // Send the response once the worker finishes
    });
  }
});

server.listen(5000, () => {
  console.log("Server listening on port: 5000....");
});
Let's also create a second file called worker-thread.js and add the following code:
const { parentPort } = require("worker_threads");

// simulate a slow page
let j = 0;
for (let i = 0; i < 1000000000; i++) {
  j++;
}

parentPort.postMessage(j);
Now, what's going on here? In the first file, we destructure the Worker class from the worker_threads module.
With worker.on plus a callback function, we listen for the message that worker-thread.js posts back to its parent, the main-thread.js file. This technique also lets us execute code in parallel in NodeJs.
Conclusion
In this tutorial, we discussed different approaches to scaling NodeJs applications, such as microservices architecture, in-memory caching, the cluster module, and multithreading, and put them into practice. It is always crucial to work with a reliable third-party NodeJS development partner or hire NodeJS developers who are competent and capable of implementing any required functionality well.
If you liked this article, check out our other guides below:
- Change Node Version: A Step-by-Step Guide
- Unlocking the Power of Websocket Nodejs
- Best Text Editors and Node JS IDE for App Development
- Best practices for increasing security in Node JS
FAQ
How can I use the Cluster module in Node.js to improve scalability?
The Cluster module in Node.js allows you to create child processes (workers) that run simultaneously and share the same server port. This leverages the full power of multiple cores on the same machine to process all requests in parallel (or at least a large number of them), which can significantly improve the scalability of your Node.js application.
What role does PM2 play in Node.js scalability, and how does it differ from the built-in cluster module?
PM2 is a powerful process manager for Node.js that provides several features in addition to the built-in Cluster module, such as automatic restarts on failures, zero downtime reloads, and centralized logging. It also simplifies cluster management by providing an easy-to-use command-line interface. These features make PM2 a popular choice for managing and scaling production Node.js applications.
How does in-memory caching improve the performance and scalability of a Node.js web application?
An in-memory cache like Redis stores frequently accessed data in memory, reducing the need for expensive database operations. This can significantly increase the performance and scalability of your Node.js web application, and combining it with a load balancer provides further gains. By serving cached data, you can handle more requests faster, improving user experience and allowing your application to scale more effectively under high loads. However, it is crucial to implement a robust cache invalidation strategy to ensure data consistency.
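One common invalidation strategy is a time-to-live (TTL): each cached entry carries an expiry timestamp and is treated as a miss once it is stale. A minimal sketch (the 60-second TTL is an arbitrary example value):

```javascript
// Time-based cache invalidation: entries expire after TTL_MS milliseconds.
const TTL_MS = 60 * 1000;
const cache = new Map();

function setCached(key, value, now = Date.now()) {
  cache.set(key, { value, expiresAt: now + TTL_MS });
}

function getCached(key, now = Date.now()) {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (now >= entry.expiresAt) {
    cache.delete(key); // stale: invalidate and fall through to a miss
    return undefined;
  }
  return entry.value;
}

setCached("user:42", { name: "Ada" }, 0);
console.log(getCached("user:42", 1000));       // fresh: { name: "Ada" }
console.log(getCached("user:42", TTL_MS + 1)); // expired: undefined
```

Redis supports this natively via the EXPIRE command or the SET command's EX option, so in production the store itself can evict stale keys.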
Source: BairesDev