I have been interested in automatic approaches to web scraping for a few years now. During university I created the SiteScraper library, which used training cases to automatically scrape webpages. This approach was particularly useful for scraping a website periodically because the model could automatically adapt when the structure was updated but the content remained static.
As a student I was fortunate to have the opportunity to learn about web scraping, guided by Professor Timothy Baldwin. I aimed to build a tool to make scraping web pages easier, resulting from frustration with a previous project.