Snowflake Cost Optimization

Summary

Led a comprehensive initiative to reduce Snowflake costs by implementing strategic optimizations to DBT models, establishing systematic query tagging, and developing intuitive dashboards using Superset for enhanced cost monitoring and management.

The Challenge 🧩

Snowflake was incurring high usage costs, but the organization lacked detailed insights into which processes, teams, or pipelines were consuming the most credits. This lack of visibility made it difficult to implement targeted optimizations or drive accountability.

My Approach 🚀

To solve this, I designed and implemented a full-stack cost observability solution, integrating seamlessly with our existing data pipeline.

Tagging at the Source
I integrated the dbt_snowflake_query_tags package (created by get-select) into our custom DBT Docker image. Then I configured query-comments in dbt_project.yml to automatically embed:

Metadata such as Airflow DAG name, task ID, and run ID if the user is Airflow.

A custom query tag for every other user.

query-comment:
  comment: |
    {% if target.user|lower == 'airflow' %} '{{ dbt_snowflake_query_tags.get_query_comment(node, extra={"airflow_dag":env_var('SNOWFLAKE_QUERY_TAGS_AIRFLOW_DAG',''),"airflow_task":env_var('SNOWFLAKE_QUERY_TAGS_AIRFLOW_TASK',''), "airflow_run_id":env_var('SNOWFLAKE_QUERY_TAGS_AIRFLOW_RUN_ID','')}) }}'
    {% else %}  '{{ dbt_snowflake_query_tags.get_query_comment(node, extra={"user_custom_query_tag":env_var('SNOWFLAKE_USER_CUSTOM_QUERY_TAG','')}) }}'
    {% endif %}

Additionally, I enhanced Meltano’s target-snowflake with query-tag support to propagate tags across the pipeline.

- name: target-snowflake    
  ...
  config:
      query_tag: '{"app":"meltano", "airflow_dag":"$SNOWFLAKE_QUERY_TAGS_AIRFLOW_DAG", "airflow_task":"$SNOWFLAKE_QUERY_TAGS_AIRFLOW_TASK", "airflow_run_id":"$SNOWFLAKE_QUERY_TAGS_AIRFLOW_RUN_ID"}'
  ...

Data Extraction & Modeling
I built a set of custom queries to parse Snowflake’s query history and extract all relevant tagging data. This allowed us to correlate credit usage with specific DAGs, tasks, and even individual DBT models.

select
    ...
    case
        when query_tag ilike '%"app": "meltano"%' then 'meltano'
        when query_text ilike '%"app": "dbt"%' then 'dbt'
    end as app,
    try_parse_json(replace(replace(substr(query_text, position('/* \'{', query_text) + 4), '\' */', ''), ';', '')) AS cleaned_query_text,
    case
        when app = 'meltano' then replace(cast(parse_json(query_tag)['airflow_dag']::string as string), '-', '_')
        when app = 'dbt' then replace(cast(cleaned_query_text['airflow_dag']::string as string), '-', '_')
    end as airflow_dag,
    case
        when app = 'meltano' then replace(cast(parse_json(query_tag)['airflow_task']::string as string), '-', '_')
        when app = 'dbt' then replace(cast(cleaned_query_text['airflow_task']::string as string), '-', '_')
    end as airflow_task,
    case
        when app = 'meltano' then replace(cast(parse_json(query_tag)['airflow_run_id']::string as string), '-', '_')
        when app = 'dbt' then replace(cast(cleaned_query_text['airflow_run_id']::string as string), '-', '_')
    end as airflow_run_id,
    ...
from snowflake.account_usage.queries
where true
    and date(start_time) >= current_date()
    and user_name = 'AIRFLOW'
    and (query_tag ilike '%"app": "meltano"%' or query_text ilike '%"app": "dbt"%')

Visualization & Monitoring
To surface these insights, I developed an interactive Superset dashboard. The granularity achieved made it possible to answer questions like: “Which DAG, task, and run consumed the most credits last week?” or “Which DBT model is the most expensive to run?”

The Outcome 🌟

This initiative provided unprecedented visibility into Snowflake usage and costs. As a direct result, multiple cost-saving measures were launched, leading to a reduction in overall Snowflake spending. The dashboard also became the go-to tool for monitoring ongoing costs and optimizing workloads.