Tips and Tricks for News Aggregation: From Crawling to Web Serving
Start: 24.09.2025 @ 18:00
Join us on Wednesday, 24.09.2025 for the first PyData Meetup of the new season. The meetup will be held at Base42, a local hackerspace in Skopje.
This time we will have the opportunity to learn how a news aggregation system is built end to end, from crawling the web to serving the results to readers.
Talk: Tips and Tricks for News Aggregation: From Crawling to Web Serving
We invite you to a lecture on “Tips and Tricks for News Aggregation: From Crawling to Web Serving”, which will take place on 24.09.2025 at Base42. This session is designed for anyone interested in building systems that collect, process, cluster, classify, and present news content at scale.
The lecture will cover the entire pipeline of a modern news aggregation system:
Web Crawling – strategies for selecting sources, building efficient crawlers, and optimizing performance.
Text and Image Extraction – cleaning messy HTML, handling diverse formats, and extracting meaningful signals.
Clustering – grouping related news articles into coherent storylines.
Classification and Ranking – applying machine learning and ranking models for categorization, prioritization, and relevance scoring.
Web Serving – efficient infrastructure for real-time delivery, caching, and user experience.
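To give a flavour of the clustering step listed above, here is a minimal sketch in Python. It is not the speaker's method and not how Time.mk works. It is an illustration that assumes headlines can be compared by the Jaccard similarity of their word sets. The function names (`tokens`, `jaccard`, `cluster_headlines`) and the threshold of 0.4 are hypothetical choices made for this example.

```python
import re


def tokens(title):
    """Lowercase a headline and return its set of word tokens."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_headlines(titles, threshold=0.4):
    """Greedy single-pass clustering (illustrative assumption, not a production method).

    Each headline joins the first cluster whose seed headline is similar
    enough; otherwise it starts a new cluster and becomes its seed.
    """
    clusters = []  # list of (seed_tokens, member_titles)
    for title in titles:
        toks = tokens(title)
        for seed, members in clusters:
            if jaccard(seed, toks) >= threshold:
                members.append(title)
                break
        else:
            # No existing cluster matched, so this headline seeds a new one.
            clusters.append((toks, [title]))
    return [members for _, members in clusters]


headlines = [
    "Government announces new budget for 2026",
    "New budget for 2026 announced by government",
    "Local team wins basketball final",
]
storylines = cluster_headlines(headlines)
```

With these three headlines, the two budget stories share five of eight distinct words and land in one storyline, while the basketball story forms its own. A real system would more likely compare semantic embeddings than raw word overlap, which is one of the topics the lecture covers.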
About the speaker
The talk will be given by Dr. Igor Trajkovski, founder of Time.mk, the leading news aggregator in Macedonia since 2011. It serves hundreds of thousands of users with automated pipelines for extraction, semantic similarity, clustering, ranking, and high-performance web delivery.
Dr. Trajkovski was an Associate Professor of Computer Science (2008–2016) at FINKI, Ss. Cyril and Methodius University in Skopje. He is an experienced data scientist and developer, with academic and professional work in Macedonia and Germany.