I made my own version of this technique to extract article summaries.
Source code can be found here.
The idea is simple - extract the biggest text block - but it performs well.
Here are some test results:
http://www.nytimes.com/2010/03/23/technology/23google.html?_r=1
The decision to shut down google.cn will have a limited financial impact on Google, which is based in Mountain View, Calif. China accounted for a small fraction of Google’s $23.6 billion in global revenue last year. Ads that once appeared on google.
http://www.theregister.co.uk/2010/09/29/novell_suse_appliance_1_1/
Being able to spin up appliance images for EC2 and spit them out onto the Amazon cloud meshes with Novell’s EC2-based SUSE Linux licensing, which was announced back in August. Novell is only selling priority-level (24x7) support contract for SUSE Linux li
http://webscraping.com/blog/Best-website-for-freelancers/
However with Elance there is a high barrier to entry: you have to pass a test, receive a phone call to confirm your identity, and pay money for each job you bid on. Often I see jobs on Elance with no bids because it requires obscure experience - people we
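
For readers who want to see the biggest-text-block idea in code, here is a minimal sketch using only Python's standard library. It is not the source code linked above and was not used to produce the results shown. The names `BlockCollector` and `extract_summary`, the choice of block-level tags, and the 250-character cutoff are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as text-block boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "blockquote", "br"}
# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._current = []
        self._skip_depth = 0

    def flush(self):
        # Join the buffered text, collapse whitespace, and store it if non-empty.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        # Ignore anything inside <script> or <style>.
        if not self._skip_depth:
            self._current.append(data)


def extract_summary(html, max_length=250):
    """Return the longest text block in the page, cut to max_length characters."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text after the last block tag
    if not parser.blocks:
        return ""
    return max(parser.blocks, key=len)[:max_length]
```

Calling `extract_summary(page_html)` on a downloaded page returns the start of its longest block. Inline tags such as `<a>` or `<b>` do not split a block, so a paragraph containing links still counts as one piece of text.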