Building a Local Data Analytics Pipeline with dbt Core and DuckDB
TL;DR: This pipeline uses dbt Core + DuckDB locally — no infrastructure — to normalize domains, deduplicate URLs, enforce data contracts via tests, and materialize four analyst-ready mart tables from raw SERP API output.

After web ingestion, you’ll have inconsistent domains, duplicate URLs across collection runs, null titles, and more. This is not wrong data, per se, just unprocessed data. The gap between “data in a table” and “data you can trust in a query” is bigger than you think.

dbt (data build tool) is an open-source transformation framework that addresses exactly that problem: you write SQL models, it materializes them in dependency order, and it tracks lineage from raw source to final output. Paired with DuckDB via the community dbt-duckdb adapter — no infrastructure needed, it’s all .duckdb files — it’s a surprisingly capable local setup for closing that gap.
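To make that concrete, a minimal profiles.yml for the dbt-duckdb adapter needs little more than a target type and a file path. The profile name and path below are placeholders for illustration, not the values from my project:

```yaml
# profiles.yml — minimal dbt-duckdb setup (names and paths are placeholders)
serp_analytics:
  target: dev
  outputs:
    dev:
      type: duckdb              # use the dbt-duckdb adapter
      path: data/serp.duckdb    # the entire "warehouse" is this single local file
      threads: 4
```

Point the profile key in dbt_project.yml at that name and dbt run will build every model into that one .duckdb file.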
I’ll walk you through the Python-based pipeline I use — one that