Distributed Data Orchestration at Scale
Airflow Azure DBT Docker Kubernetes Meltano Snowflake

Summary

A multi-region orchestration platform built on Kubernetes, Airflow, Meltano, and DBT, powering compliant and scalable data workflows across 12 Snowflake accounts with robust observability and infrastructure-as-code practices.

Multi-Cluster, Multi-Cloud Orchestration ☁️

I’ve worked on scalable, resilient ETL processes using Apache Airflow, Meltano, and DBT, deployed across a distributed Kubernetes (AKS) architecture built for flexibility, compliance, and performance.

Context & Challenge 🧩

The platform spans five Kubernetes clusters (3 US, 2 EU) with production, beta, and staging environments. It supports dynamic ETL pipelines that adapt per environment—varying DAGs, parameters, secrets, and configs—while orchestrating data across 12 Snowflake accounts in the US, Canada, and Europe under strict compliance (e.g., GDPR).
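
To make the per-environment behaviour concrete, here is a minimal sketch of how a pipeline configuration could be resolved from the environment and the cluster region. It is an illustration only: the DAG names, account names, and the region policy (US clusters serving US and Canadian accounts, EU clusters serving only EU accounts) are assumptions, not the platform's actual layout.

```python
from dataclasses import dataclass

ENVIRONMENTS = {"production", "beta", "staging"}

# Hypothetical DAG sets: a shared base plus per-environment extras.
BASE_DAGS = ("ingest_crm", "transform_core")
EXTRA_DAGS = {
    "staging": ("smoke_tests",),
    "production": ("compliance_audit",),
}

# Hypothetical Snowflake account names, keyed by account region.
ACCOUNTS = {
    "us": ("acme_us_1",),
    "ca": ("acme_ca_1",),
    "eu": ("acme_eu_1",),
}

# Assumed residency policy: which account regions a cluster may touch.
REGION_POLICY = {
    "us": ("us", "ca"),
    "eu": ("eu",),  # EU data stays on EU clusters (GDPR-style residency)
}


@dataclass(frozen=True)
class PipelineConfig:
    environment: str
    cluster_region: str
    enabled_dags: tuple
    snowflake_accounts: tuple


def resolve_config(environment: str, cluster_region: str) -> PipelineConfig:
    """Combine environment and region rules into one immutable config."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if cluster_region not in REGION_POLICY:
        raise ValueError(f"unknown cluster region: {cluster_region!r}")

    dags = BASE_DAGS + EXTRA_DAGS.get(environment, ())
    accounts = tuple(
        account
        for region in REGION_POLICY[cluster_region]
        for account in ACCOUNTS[region]
    )
    return PipelineConfig(environment, cluster_region, dags, accounts)
```

Keeping the residency rule in one lookup table means a cluster can never be handed credentials for an account outside its permitted regions, whichever environment it runs.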

[Figure: Data Platform Architecture]

Key Contributions 🧑‍💻

I contributed to and maintained several parts of the orchestration platform, supporting its reliability, scalability, and compliance across environments:

  • Evolved the orchestration platform over time, keeping it reliable and easy to reason about.
  • Managed Airflow deployments via Helm with environment-specific DAG templating.
  • Built custom Meltano plugins and flexible ingestion pipelines.
  • Developed DBT transformations integrated tightly into Airflow workflows.
  • Automated CI/CD with GitHub Actions to sync configs and DAGs across clusters.
  • Implemented robust logging:
    • System logs to Datadog.
    • Airflow logs persisted to Azure Blob Storage for debugging.
  • Supported data compliance via masking, audit logs, and access controls.
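
As an illustration of environment-specific DAG templating, the sketch below renders an Airflow DAG file from a single template using only the Python standard library. It is not the project's actual mechanism, which is driven through Helm. The schedules, job names, and `dag_id` pattern are hypothetical, and the generated file assumes Airflow 2.4+ (for the `schedule` argument) with a Meltano job of the given name defined in the project.

```python
from string import Template

# Template for a generated DAG file. Placeholders use ${name} syntax.
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="${dag_id}",
    schedule="${schedule}",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["${environment}"],
) as dag:
    BashOperator(
        task_id="meltano_run",
        bash_command="meltano --environment=${environment} run ${meltano_job}",
    )
''')

# Hypothetical per-environment parameters.
ENVIRONMENT_PARAMS = {
    "staging": {"schedule": "@daily", "meltano_job": "crm_to_snowflake"},
    "beta": {"schedule": "@daily", "meltano_job": "crm_to_snowflake"},
    "production": {"schedule": "@hourly", "meltano_job": "crm_to_snowflake"},
}


def render_dag(environment: str) -> str:
    """Return the source of a DAG file tailored to one environment."""
    params = ENVIRONMENT_PARAMS[environment]
    # substitute() raises KeyError if any placeholder is left unfilled,
    # so a half-rendered DAG never reaches a cluster.
    return DAG_TEMPLATE.substitute(
        dag_id=f"ingest_{environment}",
        environment=environment,
        **params,
    )


if __name__ == "__main__":
    print(render_dag("staging"))
```

A CI job could call `render_dag` once per environment and sync the output to the matching cluster, so the environments differ only in the parameter table.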

Outcomes & Learnings 🚀

This project brought real-world scale and complexity to my data engineering experience.

  • Orchestrated complex DAGs across globally distributed clusters.
  • Enabled seamless data workflows across 12 Snowflake accounts.
  • Delivered strong observability, quick troubleshooting, and full compliance.
  • Gained deep expertise in cloud-native orchestration, IaC, and cross-region data governance.

The project strengthened my ability to balance performance, reliability, and compliance in large-scale data platforms.