What is data mesh?

Revolutionizing data at scale! Discover Data Mesh, its decentralized data architecture approach, and how it powers domain-driven data products and teams.


Nowadays, companies strive to move more and more of their operations to digital solutions, transforming into data-driven businesses wherever they can. This gives them an advantage, especially in a time of such rapid and continuous technological evolution.

Unfortunately, some companies neglect the structure of their data architecture and don't scale it as they should. Companies that are serious about their digital transformation journey adopt an approach known as data mesh. Most, however, start by asking: “What is a data mesh?”

This modern data management strategy helps companies improve their organization and productivity with discoverable, accessible, secure, and interoperable data.


What is data mesh?

A data mesh is a type of data platform architecture that allows users to directly access data without having to transport it to data lakes or data warehouses. It also does not require the intervention of specialized data teams.

This decentralized data management strategy directly connects data owners, producers, and consumers. It organizes data by specific business domains, such as marketing, sales, and customer service, for example. This means that each domain-specific group owns and manages its data as a product.

This method reduces bottlenecks and data silos, improves decision-making, and sometimes even helps detect fraud or alert the company to any changes in supply chain conditions. It helps users think of data as a product that has a purpose within the business.

The data mesh relies on cloud-native or cloud platform technologies to scale and achieve data management goals. The main objective of this approach is to help a company obtain valuable and secure data products.

Data mesh architecture

The data mesh architecture comprises several components. To implement it successfully, companies and their technology partners must fully understand how these components work and relate to each other.

  • Data product – This is a published dataset accessible to other domains, for example through an API. It can take the form of sales reports with KPIs, PDF files, or even machine learning models. The ownership of these products is often described in the metadata.
  • Data ingestion – This is the step where tools input raw data into the data platform. It requires specific tools that work according to domain-driven design principles. Data is ingested in batches or in a stream, in real time.
  • Clean data – Raw data requires processing and “cleaning” before any analysis or use. Domain teams are responsible for data cleansing and for identifying how their domain's data requires specific processing.
  • Analytical Data – This type of processed data is what allows domains to gain business insights. Members can transform this data into visual presentations or apply data science and machine learning methods to better understand the data and identify trends and anomalies.
  • Federated Governance – This body is made up of representatives from all domains who must agree on global policies and other rules regarding the creation and operation of data products. Common discussions include interoperability, privacy, compliance policies, documentation, and accessibility processes.
  • Data platform – This infrastructure is accessible to all existing domains in the organization. It has all the necessary tools to ingest, store, query and visualize data. More advanced versions of data platforms directly enable users to create, monitor, discover, and access complete data products.
  • Enabling team – The enabling team is the first piece of the data mesh architecture. Its responsibility is to disseminate the idea of data mesh within the company. It helps domain teams become true data mesh experts by acting as consultants.
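To make the "ownership described in the metadata" idea concrete, here is a minimal sketch of a data product descriptor. The class, field names, and the example product are illustrative assumptions, not a standard data mesh schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    # All fields are hypothetical examples of ownership/access metadata.
    name: str          # e.g. "monthly-sales-kpis"
    domain: str        # owning business domain, e.g. "sales"
    owner: str         # team responsible for the product's lifecycle
    output_port: str   # how consumers access it (API endpoint, table, file)
    schema: dict = field(default_factory=dict)  # column name -> type

    def metadata(self) -> dict:
        """Ownership and access information published for discovery."""
        return {
            "name": self.name,
            "domain": self.domain,
            "owner": self.owner,
            "output_port": self.output_port,
            "columns": sorted(self.schema),
        }

sales_kpis = DataProduct(
    name="monthly-sales-kpis",
    domain="sales",
    owner="sales-data-team",
    output_port="/api/v1/sales/monthly-kpis",
    schema={"month": "date", "revenue": "decimal", "region": "string"},
)
print(sales_kpis.metadata())
```

Publishing such a descriptor alongside the dataset is one simple way to make ownership discoverable to other domains.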

Benefits of using Data Mesh architecture in your company

Using data mesh architecture in an enterprise brings a wide variety of benefits. The first is increased organizational agility. Decentralized data operations are the basis of this model: teams operate independently, reducing implementation time and operational bottlenecks.

Data is more discoverable and accessible across domains. This means there is more clarity about the value all data products provide. Each domain has greater autonomy and flexibility and is able to freely experiment and innovate without overwhelming data teams.

Using a self-service data platform brings automated data standardization, product lineage, monitoring, alerts, and many other benefits. This provides a competitive advantage compared to traditional data architecture.

The data mesh is also extremely cost-effective. It moves away from batch data processing and allows companies to adopt cloud data platforms and real-time data collection. Using cloud storage allows data teams to work with large clusters of data by only paying for the specific amount of storage needed.

When teams need additional capacity for a limited period of time, they can easily add compute nodes or storage and release the extra capacity whenever it's no longer needed.

Adherence to federated computational governance improves data interoperability. Domains agree on how to standardize any data-related procedures, which makes it easier for them to access each other's data products. This also allows for better quality control.

Data Mesh vs. Data Fabric

Data fabric is a data architecture model focused on combining the different technologies used to collect and distribute data efficiently. It uses automation of data integration, engineering, and governance to create an interface between data providers and consumers.

While data mesh is data-centric and decentralized, data fabric is technology-centric and centralized. Its focus is combining the right technologies and bringing data to a unified location.

Data fabric and data mesh are not mutually exclusive and can, in fact, complement each other. Some strategic parts of a data mesh can improve through data fabric automation, resulting in faster creation of data products, enforcement of global governance, and easier combination of data products.

Data Mesh vs. Data Lake

A data lake works as a central repository that stores data. This low-cost storage environment receives data simply and relies on a central team to manage it. The type of data typically found in data lakes is that which immediately results from ingestion. Essentially, data lakes serve as containers for raw data without a defined purpose.

While this technology-based approach can be valuable for some companies, problems often arise. Once teams move data into a data lake, it automatically loses context. Users have access to many files, but they don't necessarily know which ones to use.

Because the data in the data lake is raw, data consumers often need help from the data lake team to understand the meaning of the data and solve problems. This causes significant IT bottlenecks.

How to migrate to a Data Mesh architecture

Migrating to a data mesh architecture requires many organizational changes and adjustments. Companies need to prepare for this change on several levels, including working with teams, changing data-related processes, and updating their technology. Fortunately, companies can move to a data mesh architecture in four steps:

  1. Treat data as a product – This requires standardizing dataset and dashboard documentation while ensuring interoperability. Domains must catalog their data reliably to ensure data discoverability, security, and integrity.
  2. Map domain ownership distribution – The second step is to address data product distribution. Using domain-driven design tools, companies can easily group data sets across different domains. Each domain has its datasets divided into different categories (orders, traffic, etc.).
  3. Create a self-service data infrastructure – To access and manage newly available data products, teams need a self-service data infrastructure. All domains must agree on the technology used to build this platform, so that datasets are built and handled the same way across all domains.
  4. Ensure federated governance – At this stage, representatives from each domain work on agreements and shared nomenclature. They must agree on implemented policies, documentation rules, problem-fix procedures, and more.
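Step 2 above, mapping domain ownership, can be sketched as a simple grouping of datasets under the domains that will own them. The dataset and domain names here are invented examples, not a prescribed catalog format.

```python
# Hypothetical example datasets awaiting assignment to owning domains.
datasets = [
    {"name": "orders",        "domain": "sales"},
    {"name": "web-traffic",   "domain": "marketing"},
    {"name": "campaigns",     "domain": "marketing"},
    {"name": "support-calls", "domain": "customer-service"},
]

def group_by_domain(items):
    """Group dataset names under the domain that will own them."""
    ownership = {}
    for item in items:
        ownership.setdefault(item["domain"], []).append(item["name"])
    return ownership

print(group_by_domain(datasets))
```

In practice this mapping comes out of domain-driven design workshops rather than a script, but the resulting ownership table looks much like this structure.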

As mentioned earlier, adopting a data mesh architecture requires the company to change at different levels. It's important for business leaders to work closely with their team members to help them adapt to their new roles. Moving from a centralized model of data ownership to decentralized domains requires a shift in employee focus.

The Basics of Data Mesh

There are four basic principles behind the data mesh concept. These include domain-driven data ownership, data as a product, self-service data platforms, and federated computational governance.

Domain-based data ownership

A domain is a group of people organized into a common functional business department. The principle of domain-driven data ownership dictates that these domain teams take responsibility for their data.

They are responsible for incorporating, transforming, managing and delivering data to end users. This means that ownership of analytical and operational data is now decentralized and that each domain owns the entire lifecycle of its data products.

Data as a product

The data-as-a-product principle changes the way people think about data. Teams create data products across different domains for downstream consumers or users outside the team. These consumers then use the data products to create business value.

Data products serve different purposes within a company. They may be responsible for security, provenance and infrastructure issues, for example. They also have a duty to ensure that data is always kept up to date.

Domain teams keep up with the needs of other domains by providing them with high-quality data in the form of data products.
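One part of treating data as a product is the duty, mentioned above, to keep it up to date. A minimal sketch of such a freshness check follows, assuming each data product records when it was last refreshed; the 24-hour threshold is an arbitrary example service level.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(last_refreshed: datetime, max_age_hours: int = 24) -> bool:
    """Return True if the product was refreshed within the allowed window."""
    age = datetime.now(timezone.utc) - last_refreshed
    return age <= timedelta(hours=max_age_hours)

# Hypothetical refresh timestamps for two data products.
recent = datetime.now(timezone.utc) - timedelta(hours=2)
stale = datetime.now(timezone.utc) - timedelta(hours=48)
print(is_fresh(recent), is_fresh(stale))  # True False
```

A domain team might run a check like this on a schedule and alert downstream consumers when a product falls outside its agreed freshness window.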

Self-service data platform

The idea behind a self-service data platform is that it is easily accessible and intuitive, allowing each member of each domain to create and manage their data products. The main goal of a self-service data infrastructure is to provide autonomy.

These platforms have a dedicated data platform engineering team that manages and operates the wide range of technologies used. Domains only need to worry about consuming and creating data products while the engineering team ensures platform functionality at all times.

Federated Computational Governance

Federated computational governance enables the creation of a data ecosystem in which all data products are interoperable. Unlike traditional data governance, this method enables value creation through data.

Incorporating governance concerns into each domain's workflow leads to data standardization. Introducing usage metrics and reporting is also critical to help understand the individual value of data products.
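The "computational" part of federated governance means rules like these can be checked automatically. Below is a hedged sketch of one such rule: every published data product must carry the metadata fields the federated body agreed on. The required field names are illustrative assumptions.

```python
# Example of a globally agreed policy: the metadata fields every
# data product must publish (hypothetical field names).
REQUIRED_FIELDS = {"name", "domain", "owner", "description", "schema"}

def governance_violations(product_metadata: dict) -> set:
    """Return the required fields a product is missing (empty set = compliant)."""
    return REQUIRED_FIELDS - product_metadata.keys()

compliant = {
    "name": "monthly-sales-kpis", "domain": "sales", "owner": "sales-data-team",
    "description": "Monthly revenue KPIs per region", "schema": {"month": "date"},
}
incomplete = {"name": "web-traffic", "domain": "marketing"}

print(governance_violations(compliant))    # set()
print(governance_violations(incomplete))   # the fields still missing
```

Running checks like this in each domain's publishing pipeline is one way governance becomes part of the workflow rather than a separate review step.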

When should a company adopt Data Mesh technology?

Adopting data mesh technology requires a major shift in the data management paradigm. During this process, teams must change their data management strategies, processes and, ultimately, the way they work. But this can lead them to innovation.

Data mesh primarily benefits larger organizations, or companies that want to scale quickly by working with large, diverse, and constantly changing data sets. It's also an attractive idea for organizations that compete on the strength of their data.

Adopting data mesh technology can also be a good idea for companies whose teams are already decentralized. If data teams are slowing down innovation efforts, they will also benefit from the data mesh.

Work with BairesDev on your data mesh project

Companies that want to adopt data mesh architecture but don't know where to start, and don't have the time to fully dedicate themselves to this change, can always outsource their data mesh projects to reliable vendors.

Outsourcing providers can easily understand the company's needs and assign different experts to assist with different stages of the data mesh project. Outsourced data mesh experts can help a company set up its data mesh by working as consultants.

For example, an outsourced data mesh expert can help a company determine the changes it needs to make before adopting data mesh architecture. They could help prepare domain teams for their new roles. Third-party data mesh experts could also help determine the best technology to build self-service data infrastructure and how to implement federated computational governance policies.
