Data architecture in large enterprises is a complex landscape of opinions, technologies, and purported best practices, many of them contradictory. It's easy to get lost in this jungle and waste time building systems that solve the wrong problems. In my 30+ years of working with data, I've seen recurring myths that continue to trip up teams. These myths not only create confusion but also lead to expensive mistakes and inefficient projects.

This article debunks ten of the most common, persistent myths about data architecture. It's intended for engineers who want to create real value instead of chasing hype.

1. We need a central data platform to create enterprise value

A variant of this myth: just pick the right data technology stack, and enterprise value will follow.

Centralized platforms are an outdated illusion in today's world of highly distributed systems. They ignore how enterprises actually work. The idea that all data can and should be consolidated into a single, massive platform is naive. Distributed systems require distributed thinking about data.

Let's face reality: No one builds a central application platform anymore.

So why expect one for data?

Domain-Driven Design (DDD) teaches us to divide our systems into distinct business domains. It's the best way to manage complexity, and it allows processes to scale across the enterprise.

We need to understand that the same principle applies to data architecture. What we really need is to interconnect all applications and services in a distributed way: a data infrastructure or a data mesh rather than a single data platform.
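Here's a minimal sketch of that distributed alternative (all names are hypothetical, of course): each domain owns and publishes its data as a product behind a shared, enterprise-wide contract, and consumers depend on the contract, never on a central platform.

```python
from typing import Iterator, Protocol

class DataProductPort(Protocol):
    """The enterprise-wide contract every domain-owned data product implements."""
    def describe(self) -> dict: ...
    def read(self) -> Iterator[dict]: ...

class OrdersDataProduct:
    """Owned and published by the sales domain team, not by a central platform team."""
    def describe(self) -> dict:
        return {"domain": "sales", "name": "orders", "version": "1.0.0"}
    def read(self) -> Iterator[dict]:
        yield {"order_id": 42, "status": "shipped"}  # stub data for illustration

def consume(product: DataProductPort) -> None:
    # Consumers in other domains depend only on the shared contract,
    # never on the producing application's internals or runtime.
    print(product.describe())
    for record in product.read():
        print(record)

consume(OrdersDataProduct())
```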

If you are now turning away in horror, thinking that the hype around data mesh has already died down, hold on for a moment.

Data mesh can be done right. To find out how, I recommend reading my series of articles on the Challenges and Solutions in Data Mesh. There's more on that later in this post as well.

Choosing the latest shiny tech means little if you don't understand the domain and the meaning behind the data. Embedding business knowledge and semantics into the enterprise-wide data model is the real challenge.

Data engineers must be passionate about the company's business. It's about empowering the business by ensuring data flows seamlessly throughout the enterprise.

2. Data quality can be fixed downstream inside the data realm

The reality is that quality must be ensured at the source, not retrofitted later.

This is a lesson I learned at the very beginning of my career, working in the quality department of a German car manufacturer.

Patching up bad data downstream leads to fragile systems and lost trust. Quality assurance needs to start at the application level, where data products are created. Otherwise, it's like trying to fix the production line's poor quality downstream in the quality department – you can imagine what my colleagues thought of such an approach back then.
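A minimal sketch of what "quality at the source" means in code (assuming a simple order-publishing application; the names are illustrative): validate where the data is produced, before anything flows downstream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    order_id: int
    amount_eur: float
    customer_id: str

def validate(order: Order) -> Order:
    """Reject bad data where it is produced, not in a downstream pipeline."""
    if order.amount_eur < 0:
        raise ValueError(f"negative amount for order {order.order_id}")
    if not order.customer_id:
        raise ValueError(f"missing customer for order {order.order_id}")
    return order

def publish(order: Order) -> None:
    validate(order)  # the quality gate sits in the source application
    # ...emit to the data product / event stream only after validation
```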

3. There is a single source of truth

This misunderstanding is remarkably persistent. But here is what I observed in virtually every company I worked for: enterprises hold many opinions about reality, tailored to different use cases, contexts, and purposes.

One client even went so far as to explicitly request that we create a data warehouse that would represent every alternative truth within the company, side by side. It made perfect sense to allow users to discuss these alternative viewpoints by making them explicit.

Truth is contextual and needs to be negotiated.

Lars Rönnbäck has written some excellent pieces on this topic: a vivid introduction as well as a formal paper on modeling conflicting, unreliable, and varying information.
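To make this concrete, here is a toy sketch loosely inspired by that work (the names and the reliability scale are my own illustrative assumptions, not his formal model): instead of forcing one value to win, we store who asserts what, and how firmly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    subject: str        # what the statement is about
    attribute: str
    value: str
    asserted_by: str    # the context or party holding this opinion
    reliability: float  # how firmly they stand behind it (0..1)

# Two departments disagree about the same customer's segment;
# both views are kept side by side instead of one overwriting the other.
facts = [
    Assertion("customer:42", "segment", "enterprise", "sales",     0.9),
    Assertion("customer:42", "segment", "smb",        "marketing", 0.7),
]

def views(subject: str, attribute: str) -> list[Assertion]:
    """Truth is negotiated: return all asserted opinions, not a single winner."""
    return [a for a in facts if a.subject == subject and a.attribute == attribute]

print(views("customer:42", "segment"))
```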

4. We need to model everything up front before shipping anything

Modeling is not a one-time gateway but rather an ongoing conversation between top-down ontology and bottom-up domain realities.

It's about holding the parts together by providing just enough framework to enable continuous, distributed modeling of your company's memory.

Models emerge from iteration and collaboration, not from upfront definition alone.

Trying to define everything upfront or top-down is fantasy, not architecture.

5. SQL is all you need for data processing

Data engineers often propagate this notion, but I've never heard it from AI/ML engineers.

Why?

SQL is powerful for querying, but pushing all logic into SQL creates complexity and fragility: it turns into a patched-together templating language (dbt sends its regards).

Imagine trying to write complex ML logic, fallback behavior, error handling, and version tracking in SQL.

Sure, it's all possible, but you're bending a language past its intent.
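To make that concrete, here is a minimal Python sketch (the model, its failure mode, and the fallback are hypothetical stand-ins): fallback behavior, error handling, and version tracking take a few lines in a general-purpose language.

```python
import logging
import random

MODEL_VERSION = "scoring-v2.3"  # hypothetical version tag carried with every result

class ModelUnavailableError(Exception):
    """Stands in for a real model-serving failure."""

def complex_ml_model(record: dict) -> float:
    """Hypothetical primary model call; fails now and then."""
    if random.random() < 0.1:
        raise ModelUnavailableError
    return 0.87

def heuristic_fallback(record: dict) -> float:
    """Hypothetical cheap heuristic used when the model is down."""
    return 0.5

def score(record: dict) -> dict:
    # Fallback behavior, error handling, and version tracking:
    # a few lines here, contortions when forced into SQL.
    try:
        value = complex_ml_model(record)
    except ModelUnavailableError:
        logging.warning("primary model down, using fallback")
        value = heuristic_fallback(record)
    return {**record, "score": value, "model_version": MODEL_VERSION}

print(score({"id": 1}))
```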

We hope to reduce cognitive overhead by standardizing everything on SQL. But what really happens is that we push completely different requirements into a notation that was never designed for them.

We should not restrict expressiveness to a single, opinionated language originally designed for database querying.

6. Data products are just datasets with better names

A variation on this: data products are just applications that provide data through API calls.

Data as a product is one of the four principles of the data mesh concept. It suggests that data is to be provided like a product — self-sufficient, self-describing, and with business value.

That's much more than a static dataset with some meaningful names or even with descriptive metadata. It's a product with full business context, including its provenance, that can exist outside any application as a pure data structure. It can easily be consumed by any other application or service without the need for a running instance of the source application.
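As a sketch of what "self-describing, with business context and provenance" could look like (the fields are illustrative assumptions, not a standard):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    """A data product carries its business context with it and is
    consumable without a running instance of the source application."""
    name: str
    domain: str
    description: str       # business meaning, not just column names
    schema: dict           # self-describing structure
    provenance: list[str]  # where the data originally came from
    version: str

orders = DataProduct(
    name="orders",
    domain="sales",
    description="All confirmed customer orders, one record per order",
    schema={"order_id": "int", "amount_eur": "float"},
    provenance=["erp.orders", "webshop.checkout_events"],
    version="2.1.0",
)
```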

7. Governance is about control and restrictions

In reality, good governance enables safe data sharing and agility, rather than imposing restrictions.

Control kills innovation; governance must therefore empower teams, not block them. We need to transform governance into a beneficial mechanism that ultimately provides a return on investment for its users.

Governance is about providing guidance that enables federation and, at the same time, motivates participation. If we define data as a product, we should also create a market for that product to govern its usage.

8. Data Mesh is just for analytical data

Data mesh promises to bridge the operational and analytical data worlds that, unfortunately, have long been partitioned from each other for data consumers.

However, limiting data mesh to analytics overlooks the domain-driven integration required for true data products and real-time business value. Data architecture needs to embrace all participants in the system, whether operational or analytical.

9. Streaming ultimately replaces batch processing

The reality is that batch processing is a logical subset of streaming and can be more efficient if low latency is not required.
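A small sketch illustrates the point (plain Python, not any particular framework's API): the same processing logic serves both styles, and batch is simply the case where the source ends.

```python
from itertools import islice
from typing import Iterable, Iterator

def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """One processing function for both styles."""
    for event in events:
        yield {**event, "processed": True}

def unbounded_source() -> Iterator[dict]:
    """Streaming case: a source that never ends (think message broker)."""
    i = 0
    while True:
        yield {"id": i}
        i += 1

# Streaming: results are emitted continuously as events arrive.
streamed = list(islice(enrich(unbounded_source()), 5))

# Batch: the very same logic over a bounded source, i.e. a stream that ends.
# With no latency requirement, this can be scheduled and amortized cheaply.
bounded_source = [{"id": i} for i in range(1000)]
batched = list(enrich(bounded_source))
```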

If we understand the relationship between both processing styles, we can decide when and how to use streaming instead of batch processing. Streaming is powerful, but it's not the silver bullet for future data processing.

10. The cloud is where your data becomes cheap and efficient

The reality is that the cloud is not a magic land of infinite scalability and low costs that you just have to move to.

The cloud extends your infrastructure, not your intelligence. Without discipline in design and architecture, it's just a faster way to overspend.

I know too well that the cloud is marketed as a place where you can store unlimited data, run infinite processing jobs, and scale effortlessly. But if we cut through the noise, it's just an outsourced data center that charges per byte and millisecond.

Without strong architecture principles, you'll simply move your inefficiencies from on-premises to the cloud. And you'll most likely end up paying more for the same confusion.

Real cloud-native applications abstract away the distinction between on-premises and cloud-based operations. Ultimately, these applications are independent of the infrastructure on which they are executed.

It's important to remember that extracting data from the cloud is significantly more expensive than processing it in the cloud.

So, once you start moving large volumes out of the cloud (including data transfer between different cloud vendors), you'll realize that the true cost is in the exit. Build your architecture around where your data actually resides.
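A quick back-of-the-envelope calculation makes the asymmetry tangible (the rates below are illustrative assumptions, not any vendor's actual price list):

```python
# Illustrative, assumed rates in USD; check your vendor's actual pricing.
STORAGE_PER_GB_MONTH = 0.023  # assumed object-storage rate
EGRESS_PER_GB = 0.09          # assumed internet/cross-cloud egress rate

data_gb = 100_000             # a 100 TB dataset

monthly_storage = data_gb * STORAGE_PER_GB_MONTH  # ~ $2,300 per month
one_full_export = data_gb * EGRESS_PER_GB         # ~ $9,000 per export

print(f"storing:   ${monthly_storage:,.0f}/month")
print(f"exporting: ${one_full_export:,.0f} each time you move it out")
```

Under these assumed rates, a handful of full exports costs more than an entire year of storage. That's the exit trap.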

These myths are not just theoretical mistakes. They lead to real-world cost, complexity, and missed opportunities.

Good data architecture is about understanding business context, collaborating across domains, and embracing evolution over control. Tools and technologies matter, but without business alignment and pragmatic iteration, they won't deliver any sustainable value.