When crawling websites, I usually cache all the HTML on disk so that I don't have to download it again later. I wrote the pdict module to automate this process. Here is an example:
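For context, the general idea of a dictionary-like disk cache can be sketched with Python's standard library alone. This is not pdict itself and makes no claim about how pdict is implemented: the `DiskCache` class, its SQLite table layout, the zlib compression, and the `fetch` and `download` names are all assumptions made up for this illustration.

```python
import sqlite3
import zlib


class DiskCache:
    """Hypothetical dict-like cache mapping URL -> HTML, stored in SQLite."""

    def __init__(self, filename):
        self.conn = sqlite3.connect(filename)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (url TEXT PRIMARY KEY, html BLOB)"
        )

    def __contains__(self, url):
        row = self.conn.execute(
            "SELECT 1 FROM cache WHERE url = ?", (url,)
        ).fetchone()
        return row is not None

    def __getitem__(self, url):
        row = self.conn.execute(
            "SELECT html FROM cache WHERE url = ?", (url,)
        ).fetchone()
        if row is None:
            raise KeyError(url)
        # HTML is stored compressed, so decompress before returning it.
        return zlib.decompress(row[0]).decode("utf-8")

    def __setitem__(self, url, html):
        data = zlib.compress(html.encode("utf-8"))
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (url, html) VALUES (?, ?)",
                (url, data),
            )


def fetch(url, cache, download):
    """Return cached HTML for url, calling download(url) only on a miss."""
    if url in cache:
        return cache[url]
    html = download(url)
    cache[url] = html
    return html
```

Here `download` stands for whatever function actually retrieves a page. Because the cache behaves like a dictionary, the crawler only needs an `in` check before downloading, and a second run over the same URLs reads everything from disk.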