dbt Best Practices: Lessons From 50+ Enterprise Implementations
The Gap Between dbt Adoption and dbt Mastery
Getting started with dbt is easy—install the CLI, run dbt init, write a few SQL models, and you're off to the races. But as your dbt project grows from 10 models to 100 to 500+, the lack of structure and best practices becomes painful. Models take hours to run, tests fail inconsistently, documentation is incomplete, and new team members struggle to understand where to make changes. We've rescued dozens of dbt projects from this chaos.
After implementing dbt for over 50 organizations—from startups to enterprises—we've identified the patterns that separate successful dbt projects from struggling ones. These lessons come from real production environments managing hundreds of models, serving thousands of dashboard queries, and supporting business-critical decision-making. Whether you're starting fresh or refactoring an existing project, these practices will save you months of pain.
Project Structure: The Foundation of Scalability
The single most important decision in a dbt project is its folder structure. Poor structure leads to models in the wrong places, unclear dependencies, and constant refactoring. The best practice is a layered approach with clear separation of concerns: staging, intermediate, and marts layers.
The staging layer (models/staging/) contains one model per source table, handling light transformation only: renaming columns to standard conventions, casting data types, and basic cleaning. Every source system gets its own subfolder (staging/salesforce/, staging/stripe/, staging/postgres/). Staging models use the stg_ prefix. These models are the contract between your raw data and your business logic—keep them simple and stable.
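To make the pattern concrete, here is a minimal sketch of a staging model following these conventions. The salesforce source and column names are illustrative and assume the source is declared in a sources.yml file:

```sql
-- models/staging/salesforce/stg_salesforce__accounts.sql
-- Light transformation only: rename, cast, basic cleaning. No joins, no business logic.
with source as (

    select * from {{ source('salesforce', 'accounts') }}

),

renamed as (

    select
        id                                  as account_id,
        name                                as account_name,
        cast(annualrevenue as numeric)      as annual_revenue_usd,
        cast(createddate as timestamp)      as account_created_at
    from source

)

select * from renamed
```

The CTE-based select/rename shape keeps the raw-to-clean mapping readable and makes the model a stable contract for everything downstream.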
The intermediate layer (models/intermediate/) contains business logic that doesn't fit cleanly into staging or marts: complex joins, deduplication, entity resolution, and calculated dimensions. These models are reusable building blocks referenced by multiple marts. Use the int_ prefix and organize by business domain (intermediate/customers/, intermediate/products/). This layer is where most of your transformation complexity lives.
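As one sketch of the deduplication pattern this layer often holds (the model and column names here are hypothetical, not from a real project):

```sql
-- models/intermediate/customers/int_customers__deduped.sql
-- Keep the most recently updated record per customer_id.
with ranked as (

    select
        *,
        row_number() over (
            partition by customer_id
            order by updated_at desc
        ) as row_num
    from {{ ref('stg_crm__customers') }}

)

select * from ranked
where row_num = 1
```

Because this lives in intermediate rather than in a mart, any mart that needs a deduplicated customer list can ref() it instead of repeating the window function.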
The marts layer (models/marts/) contains business-facing models organized by department or use case: marts/marketing/, marts/finance/, marts/product/. These models power dashboards, reports, and data science workflows. They should be wide, denormalized tables optimized for query performance. Use clear business names (fct_orders, dim_customers) without technical prefixes in this layer.
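A mart model under this approach might look like the following sketch, assuming hypothetical upstream models int_orders__enriched and dim_customers exist:

```sql
-- models/marts/finance/fct_orders.sql
-- Wide, denormalized fact table: one row per order, with customer attributes
-- joined in so dashboards don't have to do the join themselves.
select
    o.order_id,
    o.order_created_date,
    o.order_total_usd,
    c.customer_id,
    c.customer_name,
    c.customer_segment
from {{ ref('int_orders__enriched') }} as o
left join {{ ref('dim_customers') }} as c
    on o.customer_id = c.customer_id
```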
Naming Conventions: Clarity at Scale
Inconsistent naming is the fastest way to create confusion in a dbt project. Establish clear conventions and enforce them through code review. We recommend: stg_[source]__[entity] for staging models (stg_salesforce__accounts), int_[entity]__[description] for intermediate models (int_customers__deduped), fct_[entity] for fact tables (fct_orders), and dim_[entity] for dimension tables (dim_products).
For columns, use snake_case consistently, prefix boolean fields with is_ or has_, suffix dates with _date or _at, and suffix IDs with _id or _key. Make column names self-documenting: order_created_date is better than created, customer_lifetime_value_usd is better than ltv. These conventions seem pedantic at first but become invaluable when you have hundreds of models and multiple team members.
Testing Strategy: Comprehensive but Pragmatic
Every dbt project starts with good testing intentions, but many end up with sparse, inconsistent tests. The key is making testing so easy that it becomes automatic. Start by testing every staging model: unique and not_null tests on primary keys, relationships tests for foreign keys, and accepted_values tests for status fields. These tests catch source data issues early.
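In practice these staging tests live in a schema.yml file next to the models. A sketch, with illustrative column names and status values:

```yaml
# models/staging/salesforce/_salesforce__models.yml
version: 2

models:
  - name: stg_salesforce__accounts
    columns:
      - name: account_id
        tests:
          - unique        # primary key must be unique...
          - not_null      # ...and always present
      - name: account_status
        tests:
          - accepted_values:
              values: ['active', 'churned', 'trial']
      - name: owner_id
        tests:
          - relationships:           # foreign key integrity check
              to: ref('stg_salesforce__users')
              field: user_id
```

Because the tests sit beside the model definitions, adding them for each new source table becomes a routine part of code review rather than an afterthought.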
For intermediate and marts models, test business logic: assert that revenue calculations are positive, that percentages are between 0 and 100, that record counts match expectations. Use custom tests for complex business rules. We typically see a healthy ratio of 2-3 tests per model in mature projects. Don't just test for the sake of testing—every test should catch a real failure mode you've experienced or anticipate.
Implement data quality checks with packages like dbt-expectations (a dbt package inspired by Great Expectations) for advanced validation: expect_column_values_to_be_between, expect_column_values_to_match_regex, expect_table_row_count_to_be_between. These tests prevent subtle data quality issues that simple uniqueness checks miss.
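Applied to a mart, these tests take the same schema.yml form as built-in tests (model, column names, and thresholds below are illustrative):

```yaml
# models/marts/finance/_finance__models.yml
version: 2

models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000          # fail if a bad load truncates the table
    columns:
      - name: order_total_usd
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0         # revenue should never be negative
      - name: customer_email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: ".+@.+"       # loose sanity check, not full validation
```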
Incremental Models: Performance at Scale
Incremental models are dbt's secret weapon for handling large datasets efficiently, but they're also the most commonly misconfigured feature. Use incremental models for large fact tables (millions+ rows) that grow over time, not for dimensions or small tables. The typical pattern: filter on a timestamp column so each run only processes new rows (where created_at > (select max(created_at) from {{ this }})), include a lookback window to handle late-arriving data (subtract a few days from that max), and use the merge strategy for upserts or insert_overwrite for immutable events.
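Putting those pieces together, an incremental event fact might look like this sketch (the stg_app__events model is hypothetical, and interval syntax varies by warehouse):

```sql
-- models/marts/product/fct_events.sql
{{
    config(
        materialized='incremental',
        unique_key='event_id',          -- merge strategy upserts on this key
        incremental_strategy='merge'
    )
}}

select
    event_id,
    user_id,
    event_type,
    created_at
from {{ ref('stg_app__events') }}

{% if is_incremental() %}
  -- Lookback window: reprocess the last 3 days to catch late-arriving rows.
  -- On a full refresh, this filter is skipped and the whole table rebuilds.
  where created_at > (select max(created_at) - interval '3 days' from {{ this }})
{% endif %}
```

The is_incremental() guard is what makes dbt run --full-refresh work: the same model file serves both the daily incremental run and the periodic full rebuild.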
Always include a full-refresh capability (dbt run --full-refresh) and test it regularly. Incremental logic bugs are insidious—they might run successfully but gradually corrupt your data over weeks. We've seen incremental models that haven't been full-refreshed in months accumulate silent errors. Schedule a full refresh monthly for critical incrementals to catch these issues.
Documentation: Making It Sustainable
dbt's documentation generation is powerful, but most teams under-utilize it. The goal isn't documenting everything—it's documenting what matters. Focus documentation efforts on: business-facing marts (explain what the table represents and how to use it), complex calculated fields (explain the business logic), and staging models (explain source system quirks and mappings).
Use schema.yml files to document models and columns, add descriptions at both model and column level, tag models by domain or use case, and include ownership information. Go beyond basic descriptions with markdown to add SQL snippets, example values, and usage guidelines. The documentation site becomes your team's single source of truth for understanding the data warehouse.
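A documented mart entry combining these elements might look like the following sketch (names and ownership values are illustrative):

```yaml
# models/marts/finance/_finance__models.yml
version: 2

models:
  - name: dim_customers
    description: >
      One row per customer. Combines CRM attributes with lifetime
      order metrics. Grain: customer_id.
    meta:
      owner: analytics-team          # surfaced in the generated docs site
    tags: ['finance', 'core']
    columns:
      - name: customer_lifetime_value_usd
        description: "Sum of all completed order totals, in USD."
```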
Performance Optimization: Speed and Cost Efficiency
As dbt projects grow, runtime becomes a bottleneck. A 10-model project runs in seconds; a 300-model project might take hours. Optimization strategies include using incremental models for large tables, leveraging dbt's selector syntax to run only modified models and their children (dbt run --select state:modified+), materializing frequently-queried intermediate models as tables instead of views, and using appropriate warehouse clustering and partitioning (Snowflake clustering keys, BigQuery partitions).
Monitor your model performance using dbt's artifacts (run_results.json shows execution time for each model). Identify the slowest 10% of models and optimize them specifically. Often a single inefficient join or subquery accounts for 80% of a model's runtime. Use dbt's query comments to track query performance in your warehouse's query history, making it easy to identify expensive transformations.
Collaboration and Governance
The best dbt projects have strong team practices: mandatory code review for all changes (no direct commits to main), development branches for individual work, pull request templates that require documenting changes and confirming tests pass, and regular refactoring sprints to pay down technical debt. Establish a style guide and enforce it through PR reviews or linters.
For larger teams, implement CODEOWNERS files to assign ownership by directory, use tags to organize models by domain, and create separate schemas for different deployment environments (dev, staging, prod). This governance prevents chaos as teams grow beyond 3-4 analytics engineers.
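On GitHub, directory-level ownership can be expressed in a CODEOWNERS file like this sketch (paths and team handles are illustrative):

```
# .github/CODEOWNERS
models/staging/          @data-platform
models/marts/finance/    @finance-analytics
models/marts/marketing/  @marketing-analytics
```

With this in place, pull requests touching a directory automatically request review from the owning team, which enforces the review practices above without manual triage.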
Continuous Improvement: From Implementation to Mastery
dbt mastery is a journey. Start with solid foundations (proper structure, basic tests, core documentation) and incrementally add sophistication. Every quarter, conduct a dbt health check: review test coverage, audit documentation completeness, analyze performance metrics, and refactor problematic areas. The teams with the best dbt projects treat them as living systems requiring continuous improvement, not one-time implementations.
At The Big Data Company, our dbt Transformation Setup service ($3,990) implements these best practices from day one. We scaffold your project with optimal structure, establish testing and documentation frameworks, implement CI/CD pipelines, and train your team on sustainable development practices. You get the benefit of our 50+ implementations without the trial-and-error. If you're ready to build a world-class dbt project—or rescue a struggling one—let's talk about how we can help.
Ready to Optimize Your Data Infrastructure?
Let's discuss how we can help your organization reduce costs, improve reliability, and unlock the full potential of your data.
Schedule a Consultation