Flash is a pain. It is flaky on Linux and cannot be scraped like HTML because it uses a binary format. HTML5 and Apple’s criticism of Flash are good news for me because they encourage developers to use non-Flash solutions.
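To illustrate what "binary format" means in practice, here is a minimal sketch that reads the 8-byte header of a SWF file and unpacks the body. It assumes the header layout from Adobe's SWF specification: a 3-byte signature (`FWS` for uncompressed, `CWS` for zlib-compressed), a 1-byte version, and a little-endian 32-bit length of the uncompressed file. The function name `read_swf` is hypothetical, and LZMA-compressed files (`ZWS`) are deliberately left unsupported. Even after decompression, the body is a stream of binary tags rather than markup, so there is nothing for an HTML parser to work with.

```python
import struct
import zlib


def read_swf(data: bytes) -> tuple[str, int, bytes]:
    """Hypothetical helper: return (signature, version, body after the 8-byte header).

    Handles uncompressed (FWS) and zlib-compressed (CWS) SWF files only.
    """
    if len(data) < 8:
        raise ValueError("too short to be a SWF file")
    signature = data[:3]
    version = data[3]
    # Total length of the *uncompressed* file, header included.
    (file_length,) = struct.unpack("<I", data[4:8])

    if signature == b"FWS":
        body = data[8:]
    elif signature == b"CWS":
        # Everything after the 8-byte header is one zlib stream.
        body = zlib.decompress(data[8:])
    else:
        # ZWS (LZMA) and anything else are out of scope for this sketch.
        raise ValueError(f"unsupported SWF signature: {signature!r}")

    if 8 + len(body) != file_length:
        raise ValueError("header length does not match body size")
    return signature.decode("ascii"), version, body
```

For example, `read_swf(open("app.swf", "rb").read())` (with a hypothetical file name) returns the signature, the SWF version, and the raw tag data.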
The reality, though, is that many sites still use Flash to display content that I need to access. Here are some approaches to scraping Flash that I have tried:
1. Check for AJAX requests between the Flash app and the server that may carry the data I am after
2. Extract text with the Macromedia Flash Search Engine SDK
3. Use OCR to extract the text directly
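The first approach amounts to watching the network traffic while the Flash app runs and picking out the responses that look like data rather than media. Below is a minimal sketch, assuming the traffic has already been captured to a HAR (HTTP Archive) file, for example by exporting it from a browser's developer tools. The function names and the list of "data-like" content types are assumptions for illustration; Flash apps often fetch XML, JSON, or AMF (`application/x-amf`), but a given app may use something else.

```python
import json

# Assumed content types that usually carry data rather than media or code.
DATA_MIME_TYPES = {
    "application/x-amf",
    "application/json",
    "application/xml",
    "text/xml",
}


def find_data_requests(har: dict) -> list[str]:
    """Hypothetical helper: return URLs from a HAR log whose responses look like data."""
    urls = []
    for entry in har["log"]["entries"]:
        mime = entry["response"]["content"].get("mimeType", "")
        # Drop parameters such as "; charset=utf-8" before comparing.
        base = mime.split(";")[0].strip().lower()
        if base in DATA_MIME_TYPES:
            urls.append(entry["request"]["url"])
    return urls


def load_har(path: str) -> dict:
    """Load a HAR file, which is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage would be `find_data_requests(load_har("capture.har"))` with a hypothetical file name. If this returns nothing while the app is clearly displaying content, the data is probably bundled inside the SWF itself.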
Many Flash apps are self-contained and do not use AJAX requests to load their data, which means I cannot rely on (1). I have also had poor results with (2) and (3).
Still no silver bullet…