Building a Data Warehouse: A Decision Framework for SMB CTOs
Most data warehouse projects over-engineer before shipping a single dashboard. Here is when you actually need one and how to set it up in weeks, not quarters.
We have audited dozens of data warehouse projects across SMBs and Series-A startups. The pattern that kills 80% of them is not the technology choice. It is starting too early, with too much, before there is a dashboard anyone uses.
This is not another tutorial on Snowflake versus BigQuery. It is the framework we walk every founder, CTO, and Director of Operations through before deciding whether to build a warehouse at all — and how to do it without burning six months of runway.
When you do not need a data warehouse yet
Most companies under €5M ARR ship a warehouse before they need one. Three signals that you should wait:
- Your team makes decisions from a single source today. A Holded export, a HubSpot dashboard, a single Metabase pointed at your production database. If that source is enough, you do not need a warehouse — you need to make it more reliable.
- You have fewer than 5 sources of truth. Below that threshold, ETL into a single SQL database is faster to build and easier to maintain than a warehouse stack.
- Nobody on the team writes SQL. Without at least one analyst or analytics-capable engineer, a warehouse becomes shelfware. dbt models nobody updates. Looker dashboards nobody trusts.
⚠️ Watch Out
If all three of these signals describe you, building a warehouse is a distraction. Fix the data quality of your one source first. We have written a separate piece on data readiness for AI that applies to BI projects as well.
When you do need one
The signal that tips the decision is concrete: you cannot answer a business question this week without manually merging two CSVs. If that has happened more than three times in the last month, you have outgrown your single source.
Specifically, build a warehouse when you have:
- 5+ disconnected SaaS tools (CRM, ERP, ad platforms, support, ecommerce).
- A reporting cadence that requires the same merge every week.
- An analyst who is becoming a full-time CSV cleaner.
- An AI or ML use case that depends on cross-source data (churn prediction, demand forecasting).
The decision framework: four questions
Before choosing a stack, answer these in order. The wrong answer to any of them changes the recommendation.
What is the daily data volume?
If you ingest under 10 GB per day, you are in BigQuery's free tier or close to it. Snowflake makes sense above that. Anything cloud-native beats on-premise for SMBs in 2026 — the operational tax of running your own Postgres-as-warehouse exceeds the cost of managed services within the first year.
Who maintains the models?
dbt is the only sustainable answer for SMB-scale teams. If you do not have someone who can write SQL and version-control models, postpone. A warehouse without governance becomes a graveyard of broken queries within six months.
What is the target latency?
If business decisions can wait 24 hours, batch ELT (Fivetran or Airbyte → BigQuery → dbt) is enough. If you need under-15-minute freshness for a customer-facing feature, the architecture changes — you need streaming ingestion, which doubles complexity and cost.
What is the BI layer?
Metabase is free, runs on a small VM, and covers 90% of SMB needs. Looker and Power BI cost more but integrate better into Microsoft and Google ecosystems respectively. Avoid Tableau for SMB unless you already pay for it elsewhere.
The 90-day minimal path
Most data warehouse projects we see fail because they try to load 200 source tables before shipping a single dashboard. The path that works:
Week 1-2: pick three sources, three KPIs
Not five sources. Three. Pick the ones that answer the most painful weekly question. Define exactly three KPIs you want on the dashboard. Write them down. Get sign-off from the person who will use them.
Week 3-4: ELT into a single table per source
Use Fivetran or Airbyte to land raw tables in BigQuery or Snowflake. No transformation yet. Goal: one schema per source, refreshed daily.
Week 5-8: dbt models for the three KPIs
Build only what those three KPIs need. Resist the urge to model "the whole business". Add tests on every column you display. If a test fails, the dashboard shows a warning before showing wrong numbers.
Week 9-12: ship one dashboard, get usage
Single dashboard, three KPIs, one user. Iterate based on what they actually do with it. Most teams discover at this point that they were building the wrong KPI all along.
After 90 days, with one dashboard delivering value, you can expand. Without that anchor, expanding is how you build a warehouse no one uses.
The stack we default to for SMBs
For Spanish SMBs and Series-A startups under €30M ARR, the boring stack works:
| Layer | Default choice | When to deviate |
|---|---|---|
| Warehouse | BigQuery (€) | Snowflake if multi-region or strict cost predictability needed |
| Ingestion | Airbyte (open source) | Fivetran if you need 200+ connectors out of the box |
| Transformation | dbt Core (free) | dbt Cloud only if you have ≥3 analysts |
| BI | Metabase (free, self-hosted) | Looker if Google Workspace shop, Power BI if Microsoft |
| Orchestration | dbt Cloud or GitHub Actions | Airflow only at €2M+ data ops scale |
Total monthly cost for the default stack at SMB scale: under €500/month including BigQuery, Airbyte Cloud (or self-hosted), and a small VM for Metabase. The savings versus an enterprise stack like Snowflake + Fivetran + Looker are typically 5-10x at this scale, with no capability gap for the workloads SMBs actually run.
Common pitfalls (in order of frequency)
- Modeling before knowing the questions. Building 50 dbt models with no consumer downstream. Symptom: the warehouse exists but no dashboards use it.
- No tests. Numbers in the dashboard slowly drift from reality. Trust erodes. Within six months people are exporting CSVs again.
- Forgetting governance. Two analysts, three definitions of "active customer". Fix this in Week 1 with a shared business glossary, not Week 50.
- Buying tooling for problems you do not have. A reverse ETL platform when you do not even ship the basic ETL yet. dbt Cloud when you have one analyst.
- Not budgeting for ongoing cost. The stack costs hours per week to maintain even after launch. Plan for 0.2-0.5 FTE permanently, or a fractional partner.
🎯 Key Takeaway
A data warehouse is the right move when you are doing the same manual CSV merge every week to answer a business question. It is the wrong move when you have not yet shipped a single dashboard from your existing source. Build one only when the painful question is repeatable, build it minimally, and ship one dashboard before expanding.
Where to start
If you are not sure whether you are in the "wait" or "build" group, the cheapest 30 minutes you will spend is a conversation with someone who has built this stack 20 times. We do that as a free Flash Audit — the deliverable is a one-page document with your three quick wins, not a sales call.
If you have already started and the project is stuck, the Deep Audit goes deeper: a full review of your current architecture, governance, and team capability, with a 180-day roadmap.
For the longer engagement work — building the warehouse with you — see our AI & Data services page or read our pipeline development services for the ETL side specifically.