Screen Scraping Your Results Pages

Pete Church

Ok, I couldn't help myself... perhaps one or two of you may find this useful.

Over the weekend I wrote some bash/awk scripts that download all my workunits from the SETI status pages and push the data into a MySQL DB. I can then run queries on that data to keep an eye on things.
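For anyone curious what the download half of something like this can look like, here is a rough sketch using wget with a cookie file. It is not the attached script. The URL, the form field names (email, passwd), the offset-style paging, and the DRY_RUN switch are all made-up placeholders, so treat it as an illustration of the idea only.

```shell
#!/usr/bin/env bash
# Sketch only: BASE_URL, the form fields, and the paging scheme are
# hypothetical placeholders, not the real site layout.

BASE_URL="https://example.invalid/seti"   # placeholder host
COOKIES="cookies.txt"                     # where wget keeps the session
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Log in once and save the session cookie for later requests.
login() {
  run wget --quiet --save-cookies "$COOKIES" --keep-session-cookies \
      --post-data "email=$1&passwd=$2" -O /dev/null "$BASE_URL/login"
}

# Fetch one results page, reusing the saved cookie.
fetch_page() {
  run wget --quiet --load-cookies "$COOKIES" \
      -O "page_$1.html" "$BASE_URL/results?offset=$1"
}

login "user@example.com" "secret"
for offset in 0 20 40; do
  fetch_page "$offset"
done
```

With DRY_RUN left at 1 this just prints the wget commands it would run, which is a safe way to check the URLs before hitting a real server.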

My personal approach was to write some JSP pages that query the DB and spit out XML that I feed into Excel. From there I can do graphs and other sorting of data.

You're welcome to use these scripts as well, but they will only work in a *nix environment (they will probably work on Cygwin, but I didn't try). I won't claim that these are bug-free or that nothing is hardcoded from my environment, but I'm pretty sure that they won't format your hard drive!

File Contents:
seti_db.sql - the create-table script for MySQL (port to any DB that you prefer).

setiresults.sh - the bash script that uses "wget" to log into your SETI account, walks through all the work unit pages, downloads them, and then calls the awk script below to parse the HTML. NOTE: you have to modify the top of this script. Look for all the {BLAH_BLAH_HERE} tags.

setiawk.awk - the awk script to parse the seti pages. You *shouldn't* have to change the .awk file unless you want to change the output completely.
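
To give a feel for the parsing step, here is a minimal sketch of pulling table rows out of HTML with awk and printing them tab-separated. It is not the contents of setiawk.awk. The function name parse_rows, the sample HTML, and the assumption that each table row sits on a single line are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each <tr>...</tr> row is on one line, which may
# not match the real pages. parse_rows and the sample data are made up.

parse_rows() {
  awk '
    /<tr>/ && /<td[ >]/ {
      line = $0
      gsub(/<\/td>[ \t]*<td[^>]*>/, "\t", line)  # cell boundaries become tabs
      gsub(/<[^>]*>/, "", line)                  # strip every remaining tag
      gsub(/^[ \t]+|[ \t]+$/, "", line)          # trim leading/trailing blanks
      print line
    }
  '
}

sample='<table>
<tr><th>Workunit</th><th>CPU time</th><th>Status</th></tr>
<tr><td><a href="wu?id=101">101</a></td><td>3600.5</td><td>Success</td></tr>
<tr><td><a href="wu?id=102">102</a></td><td>4120.0</td><td>Pending</td></tr>
</table>'

printf '%s\n' "$sample" | parse_rows
```

The header row is skipped because it has no td cells, and the two data rows come out as tab-separated lines, which is a convenient shape for a bulk load into a DB table.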


Extra Info:
If you don't want to load the data into a DB automatically, just comment out the call to the processresults() function in the setiresults.sh file.
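
If you would rather flip a switch than comment code in and out, one possible variation is to guard the call with a variable. This is only a sketch of that idea. LOAD_DB and handle_parsed_file are hypothetical names, and the processresults body here is a stand-in, not the real function from setiresults.sh.

```shell
#!/usr/bin/env bash
# Sketch only: LOAD_DB and handle_parsed_file are made-up names, and
# this processresults is a placeholder for the real DB-loading function.

LOAD_DB="${LOAD_DB:-1}"   # set LOAD_DB=0 to skip the DB step

processresults() {
  # Placeholder: the real function presumably loads the file into the DB.
  echo "loading $1 into the DB"
}

handle_parsed_file() {
  if [ "$LOAD_DB" = "1" ]; then
    processresults "$1"
  else
    echo "skipping DB load for $1"
  fi
}

handle_parsed_file results.tsv
```

Running it as LOAD_DB=0 ./script.sh would then leave the parsed file on disk without touching the DB.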

Post a reply or send me a personal message if you want to use this and you get stuck or have other questions/comments. In my spare time, I may be willing to update these and add more features, but please feel free to do with them as you please... open source without the legal stuff :)
 

Attachments

  • setiresults.zip
    2.5 KB · Views: 49