Ehsan Soltanmohammadi

Email: Ehsans@email.sc.edu, Ehsannsam@gmail.com

Projects

Big Data Pipeline for Healthcare Data Integration

Designed and implemented a big data pipeline for integrating diverse healthcare datasets, including patient records, medical imaging, and lab results. The pipeline used Apache Spark for distributed data processing, optimized ETL processes, and ensured HIPAA compliance. Containerized the entire pipeline with Docker and deployed it on a Kubernetes cluster, enabling scalable and resilient data processing. This setup reduced data processing time by 40% and improved data accessibility for healthcare professionals.

Technologies: Apache Spark, Python, Hadoop, Docker, Kubernetes, AWS EMR, S3, Redshift

Real-Time Data Processing System for Insurance Claims

Developed a real-time data processing system to handle high-frequency insurance claim data. Implemented using Apache Kafka and PySpark, the system ingested and processed data streams, providing near real-time insights and anomaly detection. The deployment was handled through Docker containers orchestrated by Kubernetes, ensuring high availability and scalability. This project reduced latency in data processing by 60%, enhancing decision-making capabilities for claims managers.

Technologies: Apache Kafka, PySpark, Docker, Kubernetes, AWS Lambda, S3, PostgreSQL

Automated Data Pipeline for Financial Analytics

Built an automated data pipeline for processing and analyzing large volumes of financial transaction data. It handled both batch and real-time data and included automated anomaly detection and alerting. The pipeline was containerized with Docker and deployed on a Kubernetes cluster for easy scaling and management, improving data processing efficiency by 50%.

Technologies: Python, Docker, Kubernetes, AWS Lambda, Redshift, S3, PostgreSQL

Experience

Data Engineer Intern at CCC Intelligence Solutions (May 2023 – Present), Chicago, IL
- Designed and implemented big data pipelines that ingest data from various sources, transform it through ETL stages, and deliver cleaned data, using PySpark for parallel processing on Kubernetes.
- Implemented daily batch and real-time data processing pipelines, orchestrating workflows with Apache Airflow.
- Built an internal tool with a custom hashing algorithm and database/application design, improving data processing time by 85%.
- Leveraged AWS services, including EMR clusters and Lambda functions, to build robust pipelines that transform data and deliver it to the data warehouse.
- Automated the collection and analysis of performance data from each pipeline, developed a web application for unified, comprehensive pipeline monitoring, and deployed it using Docker and Kubernetes.
Data Scientist Intern at Division of IT, University of South Carolina (May 2022 – April 2023), Columbia, SC
- Deployed automated data pipelines on Azure Cloud that cleanse and evaluate datasets via REST APIs, feeding real-time Tableau dashboards.
- Collaborated on ETL tasks, ensuring data integrity and pipeline stability.
- Built containerized applications with Docker to provide consistent, isolated execution environments.
Software Engineer Intern at Billzio Co (Sep 2020 – July 2021), San Francisco, CA
- Designed web applications and endpoints using the Flask framework.
- Solved financial business problems using data-driven approaches.
- Implemented machine learning automation, increasing user interaction by 20%.
- Managed PostgreSQL databases, including migration and schema design.

Education

Master of IT from the University of South Carolina (Graduation: Dec 2024)
Research: Optimizing big data pipelines in healthcare using machine learning, with a focus on bioinformatics applications.
Bachelor of Science in Mechanical Engineering from Azad University of Tehran (Graduation: )

Certificates

AWS Cloud Practitioner
ML on Big Data, IBM (Coursera)
Data Structure, UCSD (Coursera)
Data Science in Healthcare

Skills

Cloud Services: AWS, Azure, Google Cloud
Big Data Engineering: ETL, ELT, Data pipeline design
Data Pipeline Tools: Python, Docker, Kubernetes, EMR, Lambda, PySpark, Airflow
Programming Languages: Python, PySpark, Scala, SQL, NoSQL, Java
Data Visualization: Tableau, Power BI, Plotly, Seaborn
Software & Tools: PostgreSQL, MongoDB, Presto, DBeaver, Postman, Hive, Git
Machine Learning Technologies: Scikit-learn, PyTorch, TensorFlow