Intro to Data Engineering: A hands-on introduction to Data (Delta) Lakes, PySpark, and data engineering architectures on AWS


Start: 22.05.2024 @ 18:00


Join us on Wednesday, 22.05.2024, for the PyData Meetup May 2024. The meetup will be held at Base42, a local hackerspace in Skopje.

Talk:

This lecture will first present some basic concepts from the field of data engineering. Then, through practical examples, it will walk through an architecture and implementation for ingesting (simulated) massive data from a data source into a data lake, where the data is structured using the medallion data architecture. The implementation uses ETL jobs written in the PySpark framework. Finally, the talk will show some AWS services used for data engineering and some possible architectures built on those services.
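The medallion layering mentioned in the talk abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: it uses plain Python lists instead of PySpark DataFrames and Delta tables so it runs without a Spark cluster, and the simulated records, field names, and function names (`to_bronze`, `to_silver`, `to_gold`) are hypothetical. Bronze keeps records exactly as ingested, silver holds parsed, typed, and deduplicated rows, and gold holds an aggregate ready for consumption.

```python
import json
from collections import defaultdict
from typing import Any

# Hypothetical simulated source: raw JSON events, including a duplicate
# and a malformed record, as a real ingestion source might deliver them.
RAW_EVENTS = [
    '{"event_id": 1, "device": "sensor-a", "temperature": "21.5"}',
    '{"event_id": 2, "device": "sensor-b", "temperature": "19.0"}',
    '{"event_id": 2, "device": "sensor-b", "temperature": "19.0"}',
    'not valid json',
    '{"event_id": 3, "device": "sensor-a", "temperature": "22.5"}',
]


def to_bronze(raw_lines: list[str]) -> list[dict[str, Any]]:
    """Bronze: land every record as-is, plus ingestion metadata, with no cleaning."""
    return [{"raw": line, "source": "simulated"} for line in raw_lines]


def to_silver(bronze: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Silver: parse, cast types, drop malformed rows, and deduplicate on event_id."""
    seen: set[int] = set()
    silver = []
    for row in bronze:
        try:
            record = json.loads(row["raw"])
            event_id = int(record["event_id"])
            cleaned = {
                "event_id": event_id,
                "device": str(record["device"]),
                "temperature": float(record["temperature"]),
            }
        except (json.JSONDecodeError, KeyError, ValueError, TypeError):
            continue  # a real pipeline would typically quarantine these rows instead
        if event_id in seen:
            continue  # skip duplicates already promoted to silver
        seen.add(event_id)
        silver.append(cleaned)
    return silver


def to_gold(silver: list[dict[str, Any]]) -> dict[str, float]:
    """Gold: business-level aggregate, here the average temperature per device."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for row in silver:
        sums[row["device"]] += row["temperature"]
        counts[row["device"]] += 1
    return {device: sums[device] / counts[device] for device in sums}


if __name__ == "__main__":
    bronze = to_bronze(RAW_EVENTS)
    silver = to_silver(bronze)
    gold = to_gold(silver)
    print(len(bronze), len(silver), gold)
```

In a PySpark job, each layer would typically be its own transformation that reads one Delta table and writes the next (for example via `spark.read.format("delta")` and `df.write.format("delta")`); that mapping is an assumption about a common setup rather than a description of the code shown in the talk.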

About the speaker

Stefan Andonov is a machine learning researcher and a data engineer with many years of experience. He currently works as a teaching assistant at FINKI, UKIM, where he leads exercise sessions in software engineering courses, and as a senior data engineer at Loka, where he designs and implements data engineering architectures for life science and healthcare companies in the USA. He received his master's degree from FINKI, UKIM, where he is currently a doctoral student. Stefan holds the AWS Certified Solutions Architect Associate and AWS Certified Data Analytics Specialty certifications.

Call for presenters


Do you enjoy sharing knowledge and like public speaking? Or do you enjoy sharing knowledge but are unsure how you feel about public speaking? Even if you're not 100% sure about public speaking, PyData is a very welcoming community, and we appreciate talks from anyone passionate enough to share what they know.

Sign up to speak at the next PyData meetup:

./speak.sh

Location:


Base42 is located in a garage at Rimska 25, Skopje.


Base42 was made from scratch by enthusiasts like you.
