Email: Ehsans@email.sc.edu, Ehsannsam@gmail.com
Designed and implemented a big data pipeline integrating diverse healthcare datasets, including patient records, medical imaging, and lab results. The pipeline used Apache Spark for distributed data processing, optimized ETL processes, and ensured HIPAA compliance. Containerized the entire pipeline with Docker and deployed it on a Kubernetes cluster for scalable, resilient data processing. This setup reduced data processing time by 40% and improved data accessibility for healthcare professionals.
Technologies: Apache Spark, Python, Hadoop, Docker, Kubernetes, AWS EMR, S3, Redshift
Developed a real-time data processing system for high-frequency insurance claim data. Built on Apache Kafka and PySpark, the system ingested and processed data streams to provide near-real-time insights and anomaly detection. Deployed it as Docker containers orchestrated by Kubernetes, ensuring high availability and scalability. This project reduced data processing latency by 60%, enhancing decision-making for claims managers.
Technologies: Apache Kafka, PySpark, Docker, Kubernetes, AWS Lambda, S3, PostgreSQL
Built an automated data pipeline for processing and analyzing large volumes of financial transaction data. It handled both batch and real-time data, with automated anomaly detection and alerting. Containerized the pipeline with Docker and deployed it on a Kubernetes cluster for easy scaling and management. This deployment improved data processing efficiency by 50%.
Technologies: Python, Docker, Kubernetes, AWS Lambda, Redshift, S3, PostgreSQL
Research: Optimizing big data pipelines in healthcare using machine learning, with a focus on bioinformatics applications.