The 4L Framework: How to Pick the Right Stack?

In the first part of this article, we went through the preparation for modernization: what modernization is, and what it means for your enterprise.

You might be asking now: "How can we pick our stack, then?"

It would be easier for me to hand you a list of tools and call it a day. But that would leave you with generic solutions that don't fit your problem.

I won't give you a list.

But I'll give you the clarity and confidence to pick your own, based on what you and your team actually need.

I call it the 4L Framework: Latency, Lineage, Leverage, and Long-term cost.

The 4L Framework — a practical lens to evaluate any modern data platform. Image created by author

Latency: How fast is fast enough?

Fresh data is great. Yet, the lower the latency, the higher the complexity.

Before rushing into real-time pipelines, ask yourself: Does your business suffer when data is delayed for an hour? 12 hours? A full day?

There are two fundamental approaches to processing data, each built on different principles:

Batch and streaming.

With batch processing, you collect data for a fixed time window (e.g., an hour or a day). You then process it all at once and move on to the next window.

This is also called bounded processing. Latency = time spent gathering data + processing time.

With streaming, you process data as it comes in. Since there's no clear start or end, it's often referred to as unbounded processing. Here, latency = processing time, which can be real-time or near real-time.
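
To make that math concrete, here's a toy sketch in Python. Everything in it (the function names, the window sizes, the numbers) is made up purely for illustration:

```python
# Toy latency math, illustration only. All numbers are hypothetical.

def batch_latency_minutes(window: float, processing: float) -> float:
    """Worst case: an event lands right after a new window opens,
    so it waits out the whole window before processing even starts."""
    return window + processing

def streaming_latency_seconds(processing: float) -> float:
    """No window to wait for; latency is essentially the processing time."""
    return processing

print(batch_latency_minutes(window=60, processing=10))  # 70 (minutes)
print(streaming_latency_seconds(processing=2))          # 2 (seconds)
```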

The shift from batch to streaming is a shift from high-latency, low-complexity to low-latency, high-complexity.

The tradeoff between latency and complexity — beware the streaming trap. Image by author.

So when stakeholders ask for a real-time dashboard, ask if they truly need it. A one-hour delay is often enough and can save you months of building a complex streaming system.

For example, marketing campaigns might run fine with daily data. Fraud detection? Not so much.

Lineage: Can you trust what you see?

When a dashboard shows a surprising number, what's the first question everyone asks?

"Where did this data come from?"

That's where lineage comes in — the ability to trace the flow of data from source to insight. In a small team, you might get away with tribal knowledge. But as your business grows, more people touch the data, and more layers get added. Eventually, nobody knows what happens between the raw input and the final chart.

I once worked with a client who spent a full day investigating why their sales KPI dropped 20%.

The root cause? An ingestion job silently failed upstream, and no one noticed for hours, because nobody had visibility into the data flow.

Without lineage, you're not fixing pipelines — you're chasing ghosts.

Navigating a data platform without lineage feels like going through a maze. Photo by Mitchell Luo on Unsplash

Lineage helps you answer key questions:

  • What transformations happened to this data?
  • Who made the last change, and when?
  • Which downstream reports depend on this dataset?
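
As a toy illustration of that last question, here's how a lineage graph answers "what breaks downstream?" in a few lines of Python. The dataset names are hypothetical; in practice this metadata comes from tools like dbt docs or a data catalog:

```python
# Minimal lineage graph, hand-rolled for illustration only.
from collections import deque

# Edges point from each dataset to the assets built directly on top of it.
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_sales"],
    "fct_sales": ["sales_kpi_dashboard", "weekly_revenue_report"],
}

def downstream(dataset: str) -> set[str]:
    """Everything that transitively depends on `dataset`."""
    seen, queue = set(), deque([dataset])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# If raw_orders ingestion fails silently, these assets go stale:
print(downstream("raw_orders"))
# {'stg_orders', 'fct_sales', 'sales_kpi_dashboard', 'weekly_revenue_report'}
```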

In some industries — finance, healthcare, government — lineage isn't just nice to have. It's a requirement for compliance and audits. But even outside of those, it's a foundational layer for data quality, transparency, and accountability.

That doesn't mean you need a full-blown data catalog on day one. Sometimes, documenting your dbt models and building a habit of writing clear commit messages is enough to start.

The key is simple: if you can't explain your data, don't expect anyone to trust it.

Leverage: Can your team run this stack?

It's easy to dream of the "ideal" data platform. The one from conference talks and vendor decks.

But here's the catch: the best stack isn't the most powerful. It's the one your team can run.

Ask yourself: What skills does your team already have? What can they learn and support in the next 6–12 months?

No Airflow experience? Then, no Airflow. Flexibility means nothing if you can't operate it.

No infra team? Skip the DIY lakehouse.

There's no shame in picking simpler tools that "just work." You can always evolve later.

Leverage means moving fast with confidence — not burning out on tools nobody understands.

A good rule of thumb? Don't pick a tool that only one person knows how to use. You're building a platform, not a dependency on your most senior engineer.

Long-term Cost: What will this stack cost you over time?

Every tool looks affordable in isolation — until you add up the real cost over time.

Cost isn't only cloud bills. We are talking about TCO — Total Cost of Ownership.

It's maintenance, training, vendor lock-in, and how much complexity you're signing up for.

A "free" open-source tool might cost you two engineers to maintain. A managed service might save you time, but charges per row, per query, or "per breath".

Ask yourself: What will this decision feel like in 12 months? Will you still be able to scale it? Will you still want to?

Photo by Ibrahim Rifath on Unsplash

Also, think about the cost of change. How painful would it be to swap out this tool later? Can you abstract it behind clean interfaces, or are you hard-wiring it into every pipeline?
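
One common way to keep that option open is to hide the tool behind a thin interface. A minimal sketch, assuming hypothetical class and table names:

```python
# Sketch of abstracting a warehouse behind a small interface, so pipelines
# never import a vendor SDK directly. All names here are hypothetical.
from typing import Protocol

class WarehouseClient(Protocol):
    def run_query(self, sql: str) -> list[dict]: ...

class InMemoryClient:
    """Stand-in implementation; a real one would wrap a vendor connector."""
    def run_query(self, sql: str) -> list[dict]:
        return [{"order_date": "2024-01-01", "revenue": 1234.5}]

def daily_revenue(wh: WarehouseClient) -> list[dict]:
    # Pipelines depend on the interface, so swapping vendors touches one class.
    return wh.run_query("SELECT order_date, SUM(amount) FROM fct_sales GROUP BY 1")

print(daily_revenue(InMemoryClient()))
```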

You don't need to build forever. But you should build with your eyes open. Sometimes, choosing "boring but predictable" is the most cost-efficient decision you can make.

You don't need the perfect stack.

You need a stack that fits your latency needs, supports your governance requirements, aligns with your team's capabilities, and won't crush you with hidden costs later.

But if you're still unsure where to begin, don't worry. Most teams fall into a handful of recognizable patterns.

Let's look at a few common stack archetypes — and see which tribe you belong to.

Stack Archetypes: Which Tribe Are You?

Every data team falls into a tribe — not better or worse, just different priorities.

Understanding your team's strengths and constraints is already half the battle.

I haven't worked with every data team, but I've worked with enough clients to spot the patterns. They usually fall into one of these archetypes.

Which tribe does your team belong to? Each has its own trade-offs. Image by author

The Startup Sprinters

Your whole data team is this one person working day and night. You care more about moving fast than fine-tuning Spark jobs. You want a stack you can deploy in a weekend and iterate on Monday.

In typical startup chaos, your business can pivot overnight, and so can your data stack.

  • Priorities: simplicity, managed services, fast iteration.
  • Common tools: Fivetran, BigQuery/Snowflake, dbt Cloud, Looker, or Metabase.
  • Trade-offs: higher long-term cost, risk of vendor lock-in.

If you find yourself among the Startup Sprinters, you might want to think about velocity when picking your stack.

The Open Source Hackers

You like to get your hands dirty. Your team has strong engineering chops, and you want full control over every part of the stack. If something breaks, you'd rather dive into the repo than open a support ticket.

You care about transparency, flexibility, and owning your infrastructure. Even if it means spending weekends wrangling YAML files. You build your pipelines with open-source tools because you can, and because you like to customize things to the last detail.

  • Priorities: flexibility, transparency, community-driven tools.
  • Common tools: Airbyte, dbt Core, Spark or Flink, Trino, Superset.
  • Trade-offs: high maintenance overhead, steep learning curve, risk of knowledge silos.

If you're part of the Open Source Hackers tribe, you're not merely picking tools — you're building your own system of trust, one commit at a time.

The Enterprise Architects

You might need 30 meetings to validate an architecture change.

You work in a large organization where data flows across dozens of teams, tools, and compliance checkpoints. Your platform needs to scale, stay secure, and survive audits. Moving fast is great — but not if it breaks lineage or triggers a GDPR incident.

You care about standardization, governance, and long-term stability. Your stack is designed to protect the business as much as enable it. Change is possible — but it has to go through review, documentation, and usually, a steering committee.

  • Priorities: governance, security, scalability, integration with legacy systems.
  • Common tools: Informatica, Collibra, Snowflake, Azure Data Factory, Power BI.
  • Trade-offs: slower delivery cycles, tool sprawl, resistance to experimentation.

If you identify with the Enterprise Architects, think sustainability and control first. Your stack isn't a tech choice; it's part of your risk strategy.

The Real-time Renegades

You don't wait for batch jobs. Your data needs to move as fast as your product, whether it's fraud detection, live personalization, or operational dashboards that update by the second.

You build for speed and immediacy. Events flow through streams, transformations run on the fly, and stale data is broken data. Your architecture is built around real-time processing, and your team is comfortable managing distributed systems under pressure.

  • Priorities: low latency, continuous data flow, fast feedback loops.
  • Common tools: Kafka, Flink, Spark Structured Streaming, ClickHouse, Materialize.
  • Trade-offs: high complexity, harder observability, steeper learning curve.

If you live among the Real-time Renegades, latency isn't a number — it's a mindset. Make sure your team can keep up with the pace you're designing for.

The Pragmatic Blenders

You've seen the extremes — the fragile streaming setups and the never-ending batch pipelines — and you've decided to take a balanced path. You pick tools that work, not tools that trend. Real-time where it matters, batch where it's enough.

Your stack is modular, maintainable, and evolves as your business grows. You believe in choosing boring tools that solve exciting problems. And you care more about long-term clarity than short-term cool.

  • Priorities: flexibility, modularity, maintainability.
  • Common tools: dbt, Snowflake or a lakehouse, Airbyte, occasional Kafka, Power BI, or Looker.
  • Trade-offs: some complexity, need for good architecture discipline, risk of slow tool sprawl.

If you see yourself in the Pragmatic Blenders, you're likely building a platform that scales with both your data and your team's sanity.

These archetypes aren't rigid boxes. They're snapshots. Your team might start as a Startup Sprinter, evolve into a Pragmatic Blender, and eventually pick up habits from the Enterprise Architects. That's normal. That's growth.

The point isn't to chase the perfect stack. It's to build a platform that matches your team's reality, not someone else's blog post.

So, before you get distracted by yet another trending tool on the Internet, ask yourself: What tribe are we today? And where do we want to go next?

We're coming near the end of this article. If you're still unsure about how to pick the right stack to modernize your data platform, I've got one more thing for you: a tools decision tree.

Start here if you're still unsure. A simplified path based on real-world trade-offs. Image by author.

Take it with a grain of salt: it's based on my years of experience working with data teams and clients seeking to modernize their data platforms.

One last thing.

Don't chase the "best" stack. Focus on the stack that solves your problems with the least amount of regret.

And whatever you choose, keep it boring, well-documented, and built for the team you have, not the team you hope to hire someday.

That's how you modernize with confidence.

Hello there! If you've enjoyed what I've written so far, chances are you'd love my other inspiring and thought-provoking stories.

Feel free to follow me here on Medium or connect with me on LinkedIn. I regularly share insights on modern data architecture, AI in the real world, and career growth in tech.

Whether you're building your first stack or rethinking your current one, I'd love to hear how your journey is going.

Until next time, happy reading!