Sample Data Pipelines

Below are sample flowcharts illustrating some of the data pipelines I have designed, deployed, and worked on. Each pipeline is followed by a short illustrative code sketch.

ETL Pipeline: Azure Cloud Services

Pipeline 1
  • Data ingestion through Azure Functions from multiple sources
  • Data storage in Azure Data Lake
  • Data transformation using Azure Data Factory
  • Data loading into Azure SQL Database
  • Interactive data visualization with Tableau Server
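
A minimal sketch of the ingestion step is below, assuming the Azure Functions Python programming model and the azure-storage-blob SDK; the connection-string setting, container name, and blob path are illustrative placeholders rather than the production configuration.

```python
# Sketch only: HTTP-triggered Azure Function that lands a JSON payload in the data lake.
import json
import os
import uuid

import azure.functions as func
from azure.storage.blob import BlobServiceClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    """Accept a JSON payload from a source system and write it to raw storage."""
    payload = req.get_json()  # raises ValueError if the body is not valid JSON

    # Connection string and container name are hypothetical configuration values.
    blob_service = BlobServiceClient.from_connection_string(
        os.environ["DATALAKE_CONNECTION_STRING"]
    )
    blob_client = blob_service.get_blob_client(
        container="raw-ingest",
        blob=f"source-a/{uuid.uuid4()}.json",
    )
    blob_client.upload_blob(json.dumps(payload), overwrite=True)

    return func.HttpResponse("ingested", status_code=201)
```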

ETL Big Data Pipeline: AWS Cloud Services

Pipeline 2
  • Ingest data from both cloud resources and internal systems
  • Perform data joining and processing using Amazon EMR with PySpark
  • Store processed data in a centralized data lake
  • Use AWS Lambda functions, triggered by Amazon CloudWatch events, to automate pipeline steps
  • Load the processed data into an S3 bucket for business analytics and downstream use
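
A minimal sketch of the join/processing step, written as a PySpark job that could be submitted as an EMR step; the bucket names, paths, join key, and column names are illustrative assumptions.

```python
# Sketch only: PySpark join and rollup as it might run on an EMR cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("emr-join-job").getOrCreate()

# Read raw extracts landed by the ingestion stage (paths are hypothetical).
cloud_events = spark.read.parquet("s3://example-raw/cloud_events/")
internal_orders = spark.read.parquet("s3://example-raw/internal_orders/")

# Join cloud-side events with internal records and derive a simple daily rollup.
joined = cloud_events.join(internal_orders, on="customer_id", how="inner")
daily = (
    joined.groupBy("customer_id", F.to_date("event_ts").alias("event_date"))
    .agg(F.count("*").alias("event_count"), F.sum("order_total").alias("revenue"))
)

# Write the processed output to the curated zone of the S3 data lake.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated/daily_rollup/"
)

spark.stop()
```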

ETL Big Data Pipeline: Airflow Hybrid Cloud Services

Pipeline 3
  • Ingest data from various sources using PySpark SQL
  • Extract internal data to S3 and a MySQL server
  • Join data using PySpark
  • Extract additional internal information to MongoDB
  • Load the processed data into an internal data lake
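
A minimal sketch of the orchestration as an Airflow 2.x DAG; the DAG id, schedule, and placeholder extract/join/load callables are assumptions standing in for the real PySpark and database logic.

```python
# Sketch only: Airflow DAG wiring for the hybrid extract/join/load flow.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_staging(**_):
    """Placeholder: pull source data into S3/MySQL staging with PySpark SQL."""


def extract_mongo_reference(**_):
    """Placeholder: pull additional internal data for MongoDB."""


def join_with_pyspark(**_):
    """Placeholder: run the PySpark join over the staged datasets."""


def load_to_data_lake(**_):
    """Placeholder: write the joined result into the internal data lake."""


with DAG(
    dag_id="hybrid_etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_staging", python_callable=extract_to_staging)
    mongo = PythonOperator(task_id="extract_mongo_reference", python_callable=extract_mongo_reference)
    join = PythonOperator(task_id="join_with_pyspark", python_callable=join_with_pyspark)
    load = PythonOperator(task_id="load_to_data_lake", python_callable=load_to_data_lake)

    # Both extracts must finish before the join; the join feeds the final load.
    [extract, mongo] >> join >> load
```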

Big Data Ingestion ETL: Parallel Processing with Docker and Kubernetes (K8s)

Pipeline 4
  • Ingest data in parallel using a Docker image deployed in a Kubernetes cluster, with runs scheduled by Apache Airflow
  • Execute transformation processes with Docker and Kubernetes
  • Ingest data from S3
  • Load the processed results into an internal data lake
  • Back up the results to S3
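
A minimal sketch of the containerized, parallel ingestion using Airflow's KubernetesPodOperator (assuming a recent cncf-kubernetes provider); the image names, namespace, shard list, filesystem paths, and S3 locations are illustrative assumptions, and volume mounts for the internal data lake are omitted.

```python
# Sketch only: one ingestion pod per shard, then a transform pod and an S3 backup pod.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical partitions

with DAG(
    dag_id="k8s_parallel_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # One pod per shard; Airflow runs these tasks in parallel on the cluster.
    ingest_tasks = [
        KubernetesPodOperator(
            task_id=f"ingest_{shard}",
            name=f"ingest-{shard}",
            namespace="data-pipelines",                   # hypothetical namespace
            image="registry.example.com/ingest:latest",   # hypothetical image
            cmds=["python", "ingest.py"],
            arguments=["--source", "s3://example-raw/", "--shard", shard],
            get_logs=True,
        )
        for shard in SHARDS
    ]

    transform = KubernetesPodOperator(
        task_id="transform",
        name="transform",
        namespace="data-pipelines",
        image="registry.example.com/transform:latest",    # hypothetical image
        cmds=["python", "transform.py"],
        arguments=["--output", "/lake/curated/"],         # internal lake mount omitted here
        get_logs=True,
    )

    backup = KubernetesPodOperator(
        task_id="backup_to_s3",
        name="backup-to-s3",
        namespace="data-pipelines",
        image="amazon/aws-cli:latest",
        cmds=["aws"],
        arguments=["s3", "sync", "/lake/curated/", "s3://example-backup/curated/"],
        get_logs=True,
    )

    # All shard ingestions complete before the transform; backup runs last.
    ingest_tasks >> transform >> backup
```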

Back to Home

View Resume