A few people asked how to apply this to multiple webpages, so here it is:
This simple solution keeps all the downloaded HTML in memory, which is not practical for large crawls. For those, save the results to disk instead. I use the pdict module for this.
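pdict's own interface isn't shown here. The following sketch only illustrates the same idea, using Python's standard `sqlite3` module as a stand-in. The `DiskCache` class is hypothetical: it stores each page's HTML under its URL, so the results persist on disk instead of accumulating in memory.

```python
import sqlite3


class DiskCache:
    """A minimal sqlite3-backed url -> html store (a hypothetical stand-in for pdict)."""

    def __init__(self, filename: str = "cache.db") -> None:
        # check_same_thread=False allows use from worker threads;
        # callers should still serialise writes themselves.
        self.conn = sqlite3.connect(filename, check_same_thread=False)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html BLOB)"
        )

    def __setitem__(self, url: str, html: bytes) -> None:
        # Overwrite any earlier copy of this URL and commit straight to disk
        self.conn.execute(
            "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html)
        )
        self.conn.commit()

    def __getitem__(self, url: str) -> bytes:
        row = self.conn.execute(
            "SELECT html FROM pages WHERE url = ?", (url,)
        ).fetchone()
        if row is None:
            raise KeyError(url)
        return row[0]

    def __contains__(self, url: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM pages WHERE url = ?", (url,)
        ).fetchone()
        return row is not None

    def close(self) -> None:
        self.conn.close()
```

For example, create `cache = DiskCache("crawl.db")` once and assign `cache[url] = html` after each download. Pages survive between runs, and only the pages you read back are loaded into memory.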
Update: the script now takes a callback that processes each download as soon as it completes, so the HTML no longer has to be stored in memory.
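Here is a minimal sketch of the callback approach, assuming a simple thread-pool design. It is not the original script. The hypothetical `crawl` function passes each page to `callback(url, html)` as soon as that page is downloaded, and it keeps no results itself. The `fetch` parameter is also an assumption. It defaults to a plain `urllib.request` download and exists so that the downloader can be swapped out.

```python
import queue
import threading
import urllib.request
from typing import Callable, Iterable, Optional


def default_fetch(url: str) -> bytes:
    """Download a URL with the standard library (no retries or throttling)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def crawl(
    urls: Iterable[str],
    callback: Callable[[str, bytes], None],
    num_threads: int = 10,
    fetch: Callable[[str], bytes] = default_fetch,
) -> None:
    """Download every URL on a pool of worker threads.

    Each result goes to callback(url, html) as soon as it arrives, so the
    crawler itself never accumulates the downloaded HTML in memory.
    """
    work: "queue.Queue[Optional[str]]" = queue.Queue()
    for url in urls:
        work.put(url)
    for _ in range(num_threads):
        work.put(None)  # one sentinel per worker means "no more work"

    lock = threading.Lock()  # serialise callbacks so they need not be thread-safe

    def worker() -> None:
        while True:
            url = work.get()
            if url is None:
                break
            try:
                html = fetch(url)
            except Exception as error:  # one failed download should not stop the crawl
                print(f"failed to download {url}: {error}")
                continue
            with lock:
                callback(url, html)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

For example, `crawl(urls, lambda url, html: print(url, len(html)))` prints each page's size as it arrives. The callback runs under a lock, so it can write to a shared store (such as a disk-backed dictionary) without extra synchronisation.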