Airflow + Selenium: the perfect toolkit for web scraping

15 June at 5:30pm UTC


As-airflow is a Docker image that makes it easy to integrate Selenium scrapers into Airflow, so you can use everything Airflow provides to orchestrate your scraping tasks. We will see how to use some of the features developed so far, such as scraper queueing, error tolerance, error reporting, scraper enrichment, browser simulations, browser resets, timeout settings, and more. I will present these features and demonstrate them with real use cases, showing Airflow DAG runs, browser simulations, and scraper results stored in BigQuery.



Speaker(s):