Customer Data Platforms on Google Cloud
AI/ML API Airflow BigQuery GCP Visualisation

Summary

Developed a Data Lake + Data Warehouse architecture on Google Cloud Platform that integrates Cloud Storage, BigQuery, and AI-driven analytics to support data-driven decision making and business intelligence.

The Infrastructure 🏗️

I contributed to the development and evolution of a scalable, multi-client Data Management Platform on Google Cloud Platform (GCP), built for performance, reliability, and flexibility.

Multi-Zone Architecture 🔁

  • Raw Zone -> Ingested raw data stored in Google Cloud Storage as the source of truth.
  • Processed Zone -> Data transformed and modeled into star schemas in BigQuery.
  • Refined Zone -> Final datasets curated for BI tools, ML models, and AI applications.
Figure: Data Platform Architecture
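
The zone layout above can be sketched as a naming convention that maps one source table to a location in each zone. This is a minimal sketch under assumed conventions: the bucket and project names, the `raw/<client>/<source>/<table>/dt=YYYY-MM-DD/` prefix, and the `processed_`/`refined_` dataset prefixes are hypothetical, not the platform's actual layout.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ZoneLocations:
    """Where one source table lives in each zone (hypothetical naming scheme)."""
    raw_uri: str          # GCS prefix holding the immutable raw extract
    processed_table: str  # BigQuery table holding the modeled (star schema) data
    refined_table: str    # BigQuery table curated for BI / ML consumers


def zone_locations(project: str, bucket: str, client: str, source: str,
                   table: str, load_date: date) -> ZoneLocations:
    # Raw zone: one dated folder per load, so any later zone can be
    # rebuilt from the original files (the "source of truth").
    raw_uri = (f"gs://{bucket}/raw/{client}/{source}/{table}/"
               f"dt={load_date.isoformat()}/")
    # Processed and refined zones: one BigQuery dataset per client and zone.
    processed = f"{project}.processed_{client}.{source}_{table}"
    refined = f"{project}.refined_{client}.{table}"
    return ZoneLocations(raw_uri, processed, refined)


if __name__ == "__main__":
    locs = zone_locations("my-project", "my-datalake", "acme", "salesforce",
                          "opportunity", date(2024, 1, 31))
    print(locs.raw_uri)
    print(locs.processed_table)
    print(locs.refined_table)
```

Putting the load date in the raw path means a single day can be re-loaded or re-processed by pointing at one prefix, without touching the rest of the raw history.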

Platform Engineering ⚙️

  • Built and orchestrated data pipelines using custom extractors, Cloud Functions, and Apache Airflow.
  • Developed Python- and API-based connectors for sources such as Salesforce, MySQL, SQL Server, and AS/400.
  • Created client-facing dashboards in Google Looker Studio for KPI-driven insights.
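
Custom extractors like these commonly share one pattern: pull only the records changed since the last run, then land them in the raw zone as newline-delimited JSON, a format BigQuery load jobs accept. A minimal sketch, assuming a hypothetical `fetch_page` callable standing in for a source API or query and a hypothetical `updated_at` watermark field; it is not the platform's actual connector code.

```python
import json
from datetime import datetime
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical source interface: given a watermark and a page token, return
# (records, next_page_token); next_page_token is None on the last page.
FetchPage = Callable[[datetime, Optional[str]], "tuple[list[dict], Optional[str]]"]


def extract_incremental(fetch_page: FetchPage, watermark: datetime) -> Iterator[dict]:
    """Yield every record modified after `watermark`, following pagination."""
    token: Optional[str] = None
    while True:
        records, token = fetch_page(watermark, token)
        yield from records
        if token is None:
            break


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize records as newline-delimited JSON (one object per line)."""
    # default=str renders datetimes as "YYYY-MM-DD HH:MM:SS" strings.
    return "".join(json.dumps(r, default=str) + "\n" for r in records)


def new_watermark(records: list, old: datetime) -> datetime:
    """Advance the watermark to the latest `updated_at` seen in this batch."""
    return max((r["updated_at"] for r in records), default=old)


if __name__ == "__main__":
    # Fake two-page source standing in for a real API.
    pages = {None: ([{"id": 1, "updated_at": datetime(2024, 1, 2)}], "p2"),
             "p2": ([{"id": 2, "updated_at": datetime(2024, 1, 3)}], None)}
    rows = list(extract_incremental(lambda wm, tok: pages[tok], datetime(2024, 1, 1)))
    print(to_ndjson(rows), end="")
    print(new_watermark(rows, datetime(2024, 1, 1)))
```

In a real pipeline the NDJSON string would then be written to the raw-zone prefix, for example with the `google-cloud-storage` client's `Blob.upload_from_string`, and the new watermark persisted for the next run.
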
Key Contributions 🧑‍💻

  • BigQuery & GCS Optimization -> Monitored query performance, optimized table partitioning, and designed efficient star schemas.
  • Airflow DAG Improvements -> Refactored and modularized DAGs, added retry logic and failure alerts to boost pipeline reliability.
  • Client-Centric Customization -> Tailored ingestion, transformation, and modeling logic to meet unique data needs across various industries.
  • BI & Data Science Support -> Delivered curated, well-documented datasets with validation checks, versioned SQL models, and clear lineage.
  • Reusable Tools & Visuals -> Built reusable data connectors and interactive dashboards to enable client self-service and improve data access.
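
Retry logic and failure alerts of the kind described under Airflow DAG Improvements can be expressed through standard Airflow operator arguments (`retries`, `retry_delay`, `retry_exponential_backoff`, `max_retry_delay`, `on_failure_callback`) shared through a DAG's `default_args`. The sketch below assumes a hypothetical `send_alert` function (a stand-in for Slack, email, or another channel) and illustrative retry values; it shows the shape of such a setup, not the project's actual configuration. It imports nothing from Airflow so it runs standalone; the dictionary would be passed as `DAG(..., default_args=DEFAULT_ARGS)`.

```python
from datetime import timedelta
from typing import Any, Callable


def send_alert(message: str) -> None:
    """Hypothetical alert sink; in practice this might post to Slack or send email."""
    print(message)


def make_failure_callback(alert: Callable[[str], None]) -> Callable[[dict], None]:
    """Build an on_failure_callback that reports which task failed and where its log is."""
    def on_failure(context: dict) -> None:
        # Airflow passes the task context; "task_instance" is the failed TaskInstance.
        ti = context["task_instance"]
        alert(f"[Airflow] {ti.dag_id}.{ti.task_id} failed "
              f"(try {ti.try_number}). Log: {ti.log_url}")
    return on_failure


# Shared by every task in a DAG via DAG(..., default_args=DEFAULT_ARGS).
DEFAULT_ARGS: dict = {
    "retries": 3,                              # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),       # initial wait between attempts
    "retry_exponential_backoff": True,         # lengthen the wait on each retry
    "max_retry_delay": timedelta(minutes=30),  # upper bound on the backoff
    "on_failure_callback": make_failure_callback(send_alert),
}
```

Placing the callback in `default_args` rather than on each operator keeps alerting consistent across refactored DAGs. `on_failure_callback` fires only when a task ends in the failed state, after its retries are used up; intermediate retries would need `on_retry_callback` if they should also be reported.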