Sample ETL Project Explanation
My portfolio includes an end-to-end ETL project: a pipeline that handles real estate rental data. In the ingestion layer, AWS Lambda functions scheduled by Amazon EventBridge periodically extract rent prices for San Francisco (SF), San Diego (SD), and New York (NY) from the Zillow API and stage the raw data in Amazon Redshift. A second Lambda function transforms the data as it arrives, cleaning and structuring it before loading it into a PostgreSQL database. Finally, an interactive dashboard built with Django visualizes rental trends across the three cities.
Link to the Interactive Dashboard
Steps
- Extract: Use AWS Lambda functions, scheduled by Amazon EventBridge, to periodically pull rental price data from the Zillow API for SF, SD, and NY, and stage the raw data in Amazon Redshift.
- Transform: Clean and structure the staged data with a second Lambda function, triggered as new data arrives.
- Load: Store the cleaned data in a PostgreSQL database.
- Visualize: Create an interactive dashboard in Django to provide dynamic visualizations of the rental price trends.
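The Transform step above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the record fields (`city`, `rent`, `date`), the `records` key in the event, and the helper names `transform` and `average_rent_by_city` are all assumptions made for the example. Only the `handler(event, context)` signature follows the standard AWS Lambda convention for Python. The Zillow API call and the Redshift and PostgreSQL connections are left out so the sketch stays self-contained.

```python
import math
from datetime import date
from statistics import mean

# Cities covered by the pipeline, using the abbreviations from the text.
CITIES = {"SF", "SD", "NY"}


def transform(raw_records):
    """Clean raw rental records and return rows ready for loading.

    The field names "city", "rent", and "date" are hypothetical.
    """
    cleaned = []
    for rec in raw_records:
        # Normalize the city code and drop cities outside the project scope.
        city = str(rec.get("city", "")).strip().upper()
        if city not in CITIES:
            continue
        # Skip records whose rent or date is missing or cannot be parsed.
        try:
            rent = float(rec["rent"])
            day = date.fromisoformat(str(rec["date"]))
        except (KeyError, TypeError, ValueError):
            continue
        # Skip non-finite or non-positive rents.
        if not math.isfinite(rent) or rent <= 0:
            continue
        cleaned.append(
            {"city": city, "rent": round(rent, 2), "date": day.isoformat()}
        )
    return cleaned


def average_rent_by_city(rows):
    """Aggregate cleaned rows into a mean rent per city (e.g. for a chart)."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["rent"])
    return {city: round(mean(rents), 2) for city, rents in by_city.items()}


def handler(event, context):
    """Lambda-style entry point; assumes the event carries a "records" list."""
    rows = transform(event.get("records", []))
    return {"row_count": len(rows), "rows": rows}
```

Keeping `transform` a pure function, separate from the handler, means the cleaning rules can be tested locally without any AWS resources.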
Participate in My Research Survey
I am conducting a short survey on data engineering as part of my graduate research. Your participation will provide valuable insight into current trends, challenges, and advancements in the field. The survey is brief and should take only a few minutes of your time.
Start the Survey
Total survey submissions so far: 5