Website For PPD Statistics - Opinions?


harlam357
06-06-09, 02:07 PM
Want to get a feel from my fellow teammates about something I've been approached with. This may or may not come to fruition. However, I feel it needs to be explored, and I want to get the first gauge of interest from the people I know, trust, and respect.

There is a person putting together a type of "PPD Database" website... with statistics on hardware, OS, clocks, etc. I think it's a novel idea... though it's also been tried in the past. Here's an example: http://fahinfo.org/index.php?allscores=true&offset=0. The info on fahinfo.org appears dead as far as I can tell. This isn't the site of the person who has approached me; I don't want to divulge that at this point.

My primary question... is this something you guys would find useful? It just appears it's been tried before with little success, maybe because the attempts had no way to gather the data effectively and consistently. Everything appears to be input by hand, and no one is going to take the time to do that consistently. So here's where I'm put in a unique position. I already have HFM and the data at my fingertips. This person is asking to basically "partner" with me and have HFM export the data to his site. I'm entertaining the idea... his site still needs a lot of work IMO.

However, the thought has also crossed my mind... why give away the data? Why not build my own site to gather these statistics from all the instances of HFM running across the globe? It sounds like a really fun project and natural extension to HFM. It could even turn into a fully fledged FAH stats site, with general data from Stanford (like what EOC does) in addition to data collected by HFM. Wouldn't it be cool if you could click on your name and have all the data returned about the WUs you completed over a specific time period? Trends, graphs, the list of possibilities is very long.

What say ye Team 32? :)

dfonda
06-06-09, 02:59 PM
It would be kind of nice to be able to tell if you had a problem... Right now I depend on my family to tell me if something is wrong (red numbers); the rub is that if something is down, it's going to wait till I get home anyway.

I have not had an instance for quite a while where I had something stop working, when I was not there.

I would say I already spend too much time staring at my numbers "on the road"; this might border on insanity! :p

For anyone who can get to their rigs, this would really make sense.

Shiggity
06-06-09, 04:06 PM
Well before you put in a ton of work, what is your thesis here?

How would the data analysis benefit new and existing folders?


There is one analysis that I would love to see come to fruition: computing PPD relative to electricity costs and overall running costs. A PPD-to-capital-cost analysis would also be very useful.

For example: when I first started, a PS3 was probably the best investment. For $400 you got 900 PPD for the capital investment. Now you can build a $1,200-1,400 machine and get 10,000 PPD or more. Then you would cross-analyze that against the cost of running over a certain period of time (4 weeks, 8 weeks, 12 weeks, 6 months, 1 year), on an area-by-area basis (electricity costs more in some places than in others, etc.).

An analysis like this would basically tell a person what hardware would get them the maximum PPD at any given point in time. You would also be able to see the long-term electricity investment and plan accordingly.
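The PPD-versus-cost idea above boils down to a little arithmetic. Here is a rough Python sketch; the function names are hypothetical, and the wattage and electricity price you feed it are placeholders, not measurements from anyone's rig.

```python
def ppd_per_dollar(ppd: float, hardware_cost: float) -> float:
    """PPD bought by each dollar of up-front hardware cost."""
    return ppd / hardware_cost


def electricity_cost(watts: float, price_per_kwh: float, days: float) -> float:
    """Cost of running a machine 24/7 for `days` days at a constant power draw."""
    kwh = watts / 1000.0 * 24.0 * days
    return kwh * price_per_kwh


def points_per_total_dollar(ppd: float, hardware_cost: float, watts: float,
                            price_per_kwh: float, days: float) -> float:
    """Points earned per dollar spent (hardware plus power) over a period."""
    total_points = ppd * days
    total_cost = hardware_cost + electricity_cost(watts, price_per_kwh, days)
    return total_points / total_cost
```

With the numbers from the post, `ppd_per_dollar(900, 400)` gives 2.25 and `ppd_per_dollar(10_000, 1_300)` about 7.7 (taking $1,300 as the midpoint of the $1,200-1,400 range). Changing `price_per_kwh` covers the area-by-area comparison, and calling `points_per_total_dollar` with `days` set to 28, 56, 84, 182 and 365 covers the periods listed.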

I just don't see a point in doing what EOC stats already does kinda. However I see a "how can I get the most PPD?" question asked every day.

dfonda
06-06-09, 04:44 PM
Quote (Shiggity):
I just don't see a point in doing what EOC stats already does kinda. However I see a "how can I get the most PPD?" question asked every day.

Shig, this would be like seeing the info from HFM... plus a lot more.
It would also be instantaneous, or pretty close. (EDIT: Correction by Harlam: the data would be sent in intervals... say every 2-3 hours.) EOC doesn't tell you which client is EUE'ing. This would.

If it's manageable, it could turn into the next EOC; the possibilities are huge.

deadlysyn
06-06-09, 04:58 PM
I think the whole point of the idea is to sort of merge what EOC does and what fahinfo.org used to do. But what I think he is talking about will give more info, things like which WUs are EUE'ing on which clients. I personally think it is a good idea, but it may take a lot of work. Just keeping HFM from being reported as a false positive by all of the spy/adware programs seems like it may be a bit of a task. Since it would be reporting this information to a website, I could see how this could happen. Most of us here know how to add it to the exclusions, but there are those who would think it is something they need to worry about.

harlam357
06-06-09, 05:11 PM
I think you've missed the point d... there is already a web generation component in HFM that creates a static HTML website that can be exposed to the world. It needs some further work but suffice it to say that one can already keep an eye on their Folding clients from anywhere in the world if the HTML is indeed exposed on a public web server. I'm looking at my farm statistics on my Blackberry right now. :)

@Shiggity - The idea here is to gather the system information about a machine running a FAH client (OS, CPU/GPU, processor speeds, etc). Then using that data in conjunction with the production info for that client (PPD, Average Frame Time, etc) to populate a single online database.
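For reference, the PPD figure in that production info can be derived from the average frame time. A minimal sketch, assuming a work unit is 100 frames (one per percent of progress) and ignoring any bonus credit; the constant and function names are illustrative, not HFM's.

```python
SECONDS_PER_DAY = 86_400
FRAMES_PER_WU = 100  # assumption: one frame per 1% of progress


def ppd_from_frame_time(credit: float, avg_frame_seconds: float,
                        frames: int = FRAMES_PER_WU) -> float:
    """Estimate points per day from a WU's base credit and average frame time."""
    if avg_frame_seconds <= 0:
        raise ValueError("average frame time must be positive")
    seconds_per_wu = avg_frame_seconds * frames
    wus_per_day = SECONDS_PER_DAY / seconds_per_wu
    return credit * wus_per_day
```

For example, a 500-point WU at 432 seconds (7:12) per frame takes 12 hours, so two WUs a day: `ppd_from_frame_time(500, 432)` returns 1000.0.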

The website would then give users access to production statistics based on that data. In essence answering the question, "how can I get the most PPD?". This is a rudimentary explanation at this point, and none of this has really been thought out. But again, the person who approached me already has something of this nature up and available... the big difference here is that people are being asked to manually input this data about their clients. He has asked me if HFM could automatically populate his database with the information it is gathering... effectively removing the manual entry and adding much more consistency to the data that is being gathered.
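One way to picture "system information plus production info in a single database" is a flat record per completed WU, batched up as JSON for upload. This is a hypothetical sketch; the field names and the JSON format are assumptions, not anything HFM or the other site actually uses.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class BenchmarkRecord:
    """One completed WU plus the hardware it ran on (hypothetical fields)."""
    # System information about the host
    os_name: str
    cpu_model: str
    cpu_mhz: int
    gpu_model: Optional[str]
    # Production information for the client
    client_type: str          # e.g. "SMP", "GPU", "Uniprocessor"
    project: int
    avg_frame_seconds: float
    ppd: float


def to_payload(records: List[BenchmarkRecord]) -> str:
    """Serialize a batch of records to JSON for upload to the central database."""
    return json.dumps([asdict(r) for r in records], sort_keys=True)
```

Keeping one record per WU, rather than a running average per machine, leaves it to the server to decide how to aggregate.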

My thought here is that if I'm going to take the time to do this... I think I might like to make this an official extension of HFM that I have control over. I have a few reservations about not being the sole entity overseeing what happens with the HFM data. What I'm saying here is... "Here folks, here's the monitoring application, oh and it shares its information with this guy I don't really know." That doesn't sit too well with me. Of course I would code in an option to enable or disable this "sharing" functionality. That would be the user's choice, regardless of whether it's his database or one I create that ends up getting the data.

The idea to add general FAH statistics is just icing on the cake... not totally necessary but it would make the site a one stop shop for all your FAH data needs. :)

Again, I'm just tossing this idea around... it would definitely take some time to do. Just looking for opinions on whether the FAH user base would find such a site useful. If it's a bad idea in your opinion, or would be of limited interest to you as an FAH donor... say so. That's what I'm looking for here. ;)

seadave77
06-06-09, 05:24 PM
I would love it!

I'm always visiting different sites looking at what hardware combos produce the most points. A central, up-to-date place would be awesome. I understand the technical challenges behind it but if it could be pulled off I think it would be a major help to many people.

dfonda
06-06-09, 05:51 PM
Quote (harlam357):
I think you've missed the point d... there is already a web generation component in HFM that creates a static HTML website that can be exposed to the world. It needs some further work but suffice it to say that one can already keep an eye on their Folding clients from anywhere in the world if the HTML is indeed exposed on a public web server.

No, I think I got it... my point was that it's not going to be like EOC. :thup:

And don't forget your friends! How soon can we get stock options? :-/ :p :D

harlam357
06-06-09, 06:11 PM
Not that I'm considering this possible venture to be a true source of income... but, don't think I'm not thinking of advertising dollars. ;) Meaning, banner ads... something like what is found on EOC, nothing that would detract from the functionality.

@deadly- I think HFM would be fine as far as spyware programs are concerned. It already makes connections over the Internet to Stanford to download its Project data. No spyware seeking programs of mine have complained about it.

Seadave has the idea... this wouldn't necessarily tell you instantly if your clients are EUE'ing. That was not the original thought. However, I have a requested upgrade to HFM for just such a thing. How would you guys like to get an e-mail or SMS text message when a client started misbehaving? I think people would like that option... But that would be handled by your local machine running HFM... not the website. The website could keep track of WUs that EUE'd and you could go there and look for those trends... but that data wouldn't be instantaneous.
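The local "client started misbehaving" check could be as simple as watching each client log for new EUE lines since the last poll. A rough sketch; the `EARLY_UNIT_END` marker is an assumption about what the client writes on an EUE, and actually sending the e-mail (for example with Python's `smtplib`) is left out.

```python
EUE_MARKER = "EARLY_UNIT_END"  # assumption: text the FAH client logs on an EUE


class EueWatcher:
    """Tracks how far into a client log we've read and flags new EUEs."""

    def __init__(self, threshold: int = 1):
        self.threshold = threshold
        self.lines_seen = 0

    def check(self, log_lines):
        """Return True if the lines added since the last check contain
        at least `threshold` EUE markers."""
        if len(log_lines) < self.lines_seen:
            # Log was rotated or restarted; start over from the top.
            self.lines_seen = 0
        new_lines = log_lines[self.lines_seen:]
        self.lines_seen = len(log_lines)
        eues = sum(1 for line in new_lines if EUE_MARKER in line)
        return eues >= self.threshold
```

Only looking at new lines means one EUE produces one alert, not a fresh alert every time HFM re-reads the log.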

This would be a site to compile data on completed WUs, the hardware each WU was run on, and how productive that hardware was at completing the WU. It could be taken to the point that it's almost like Futuremark's ORB database... where 3DMark scores are kept along with the information on the hardware that produced those scores. In fact, with data like this you may start getting overclockers who start benching WUs under dry ice to see who can get the best PPD. :D Not a good idea however... we want good science, not just scores. ;) But if there's a world record to be had... then you know someone is going to do whatever it takes to break it.

Ok, I'm getting a better feeling for this now... keep your thoughts coming 32! :beer:

deadlysyn
06-06-09, 07:17 PM
Quote (harlam357):
@deadly- I think HFM would be fine as far as spyware programs are concerned. It already makes connections over the Internet to Stanford to download its Project data. No spyware seeking programs of mine have complained about it.


Where I see the difference, though, is that right now it just checks for and receives data. The fact that it would be sending data out quite often might be where that issue arises. There may be a way to program it so this type of activity won't trigger any red flags. I'm not a programmer, so I don't really know much about it. None of the anti-spyware programs I use have complained about it, but I think this added functionality could set them off. I just want to help you make sure all of the bases are covered before the issue comes up and you have to start from scratch to make it friendlier to these programs. Like I said, I am in no way a programmer, and I don't know the ins and outs. I just know that some completely legitimate things have been known to trigger false positives. I have even seen it with Folding clients and Avast.

dfonda
06-07-09, 06:09 AM
In Futuremark's case, sending the info is initiated by the user. Is that what you have in mind, H? In that case the user could be taken to the website so they know the data has been sent.

Or would you have a choice to be included in the project and the info would be sent at intervals?

If this is under your control, I would look forward to participating... if not, I would be more hesitant. If it turns into something big, you should get the cheers!

harlam357
06-07-09, 11:11 AM
Yes, the data would be sent in intervals... say every 2-3 hours. This would standardize the number of connections needed by HFM on a daily basis. Also, there would be an option to turn this feature on or off. If someone wants to use HFM without contributing... that would certainly be possible.
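Sending in intervals with an on/off switch might look roughly like this: completed-WU records are queued locally and flushed at most once per interval, and nothing is queued at all unless the user opted in. A hypothetical sketch; the class name and the three-hour default are illustrative.

```python
import time
from typing import Callable, List


class IntervalUploader:
    """Queue completed-WU records and send them at most once per interval.

    Nothing is queued or sent unless the user has opted in.
    """

    def __init__(self, enabled: bool, interval_seconds: float = 3 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self.enabled = enabled
        self.interval = interval_seconds
        self.clock = clock
        self.pending: List[object] = []
        self.last_sent = clock()

    def record(self, item: object) -> None:
        if self.enabled:
            self.pending.append(item)

    def maybe_flush(self, send: Callable[[List[object]], None]) -> bool:
        """Call `send` with the queued batch if the interval has elapsed."""
        if not self.enabled or not self.pending:
            return False
        if self.clock() - self.last_sent < self.interval:
            return False
        send(list(self.pending))
        self.pending.clear()
        self.last_sent = self.clock()
        return True
```

If `send` raises (say the site is down), the batch stays queued and goes out on a later attempt. Passing in a fake `clock` makes the interval logic easy to test without waiting three hours.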

That's my feeling as well. If I'm going to gather the data and send it to a single bucket over the Internet... then yes, I'd also want control over that bucket. It just makes more sense.

Shiggity
06-08-09, 06:27 PM
Sounds good Harlam. Know any HTML 5? ;)

harlam357
06-08-09, 07:14 PM
Ahh, no... no I don't. :D Not a web guru by any stretch of the term. I used to have a website in college... when we had a whole 10 MB of free web space. :D Even then it was basic HTML with some ripped-off JavaScript silliness.

I have to look into hosting... and what the requirements will be for space, bandwidth, etc. Again, lots of planning to do... if I decide to go full bore into this it will likely be 6 months down the road. Something like a launch in 2010.
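Sizing the hosting is mostly multiplication. Every input below (client count, WUs per day, bytes per record) is a placeholder to be replaced with real numbers once they're known.

```python
def estimated_storage_bytes(clients: int, wus_per_client_per_day: float,
                            bytes_per_record: int, days: int) -> float:
    """Raw storage for one record per completed WU, before indexes or backups."""
    return clients * wus_per_client_per_day * bytes_per_record * days


def uploads_per_day(clients: int, interval_hours: float) -> float:
    """Incoming connections per day if every client uploads on a fixed interval."""
    return clients * 24 / interval_hours
```

With, say, 1,000 clients finishing 4 WUs a day each and 500-byte records, a year of data is 730,000,000 bytes (about 730 MB), and a 3-hour upload interval means 8,000 uploads a day.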

The good thing is... I have a friend of mine here who is just getting back in the coding saddle who is decently versed in ASP.NET. He would just like to be involved for the fun of doing it... so I would have some help if I decide to indeed pursue this after further investigations. :)

Any other thoughts guys/gals?

tom_ozahoski
06-08-09, 07:56 PM
I'm all for it if you have control over it and implement what you suggested! :beer:

David
06-09-09, 12:09 AM
harlam - I may be able to arrange web space for you. I'd need to run it past the owner of said server. It's mostly just an IRC server at the moment.

ChasR
06-09-09, 07:23 AM
Harlam,
Are you sure you can collect enough accurate data from users' machines to make the statistics meaningful? For SMP folding alone, variables such as OS, CPU, CPU speed, number of instances on number of cores, VM, affinity, dedicated machine, RAM speed, RAM amount, priority, CPU + GPU, and which GPU make a difference in production. If the goal is to replace fahinfo.org, you'll need to handle a lot of these variables to have meaningful statistics for public viewing. You'd likely have to have a very large server to collect and store the data if significant numbers of users turn on the automatic reporting function.
I'd love to see my personal statistics, by machine on the web. I'd also love to see the differences between production with the major variables of other users. So, I hope you can make it work. Fahinfo.org doesn't work because too much garbage, like cherry picked best frame times, is entered by the users and not enough data from all machine types and configurations.
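One way to blunt cherry-picked numbers is to publish the median of many automatically reported results per configuration, instead of the best value someone chooses to post. A sketch under that assumption; the configuration key is a hypothetical string into which variables like the ones listed above would be folded.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def median_ppd_by_config(reports: Iterable[Tuple[str, float]],
                         min_reports: int = 3) -> Dict[str, float]:
    """Group (config_key, ppd) reports and return the median PPD per config.

    Configs with fewer than `min_reports` samples are left out, since a
    handful of results says little about typical production.
    """
    grouped = defaultdict(list)
    for config_key, ppd in reports:
        grouped[config_key].append(ppd)
    return {key: median(values)
            for key, values in grouped.items()
            if len(values) >= min_reports}
```

Given reports of 4000, 4200 and 9000 PPD for one configuration, the median is 4200: the single outlier barely moves it, where a "best result" table would show 9000.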

harlam357
06-09-09, 08:35 AM
I've been thinking about *how* to gather that data accurately ChasR. Although I may not be able to get down to the level of VM, affinity, dedicated, VM + GPU. My thought was to write a small application that could be placed on each individual box/VM which would generate the hardware/environment information specific to each Folding client instance. Then HFM would just read said data when it pulls and parses the log files. Any client that does not have said data available would not be included in the benchmark data supplied to the big bucket.

This would obviously mean more work for anyone wanting to participate, since they would need to install this secondary app on each Folding box. However, such an application would allow me to accurately (read: not manually input by the user) gather a lot of information about the host machine (CPU type, speed, RAM amount (possibly speed and timings), GPU, GPU drivers, etc.). I'd like this to be pretty automatic: the user installs it and is done in most cases. If HFM does not find the data, it would be able to alert the user if they are participating in the benchmark data gathering. Then other steps could be taken to manually configure things if necessary. However, I'd like that to be the exception rather than the norm. I would want this to be easy.
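The helper-app idea might look something like this rough sketch: a small program writes a host-info file into each client's folder, and the monitoring side reads it while parsing logs, skipping any client that has no file. The file name, the keys, and the JSON-file approach are all assumptions for illustration. Python's standard library can't portably report clock speed, RAM, or GPU model, so a real helper would need OS-specific queries for those.

```python
import json
import os
import platform
from typing import Optional

HOST_INFO_FILE = "hostinfo.json"  # hypothetical file name


def collect_host_info() -> dict:
    """Gather what the standard library can report portably about this box."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be an empty string on some systems
        "logical_cpus": os.cpu_count(),
    }


def write_host_info(client_folder: str) -> str:
    """Helper side: drop the host info next to the Folding client's files."""
    path = os.path.join(client_folder, HOST_INFO_FILE)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(collect_host_info(), f, indent=2)
    return path


def read_host_info(client_folder: str) -> Optional[dict]:
    """Monitor side: read the host info, or None if the helper was never run.

    A client that returns None would simply be left out of the shared
    benchmark data (and the user could be warned if they opted in).
    """
    path = os.path.join(client_folder, HOST_INFO_FILE)
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```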

How does everyone feel about this idea?

ChasR
06-09-09, 09:16 AM
I'd be willing to install the reporting applet.

The number of CPU cores per instance is going to be a must for useful data. The FAH client writes it to the screen, but doesn't write it to the log. A user would have to set the -smp variable explicitly (e.g. -smp 2 for a VM or dual core), and you could parse that from the log. It wouldn't be unreasonable to request that a future client version write that data to the log.
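Parsing an explicit `-smp N` out of the log could be a one-regex job. This sketch assumes the client echoes its command line on a line containing `Arguments:`; that line format is an assumption to check against real logs. The most recent matching line wins, since a client may be restarted with different flags.

```python
import re
from typing import Iterable, Optional

# Matches an explicit core count such as "-smp 2"; a bare "-smp" won't match.
SMP_COUNT = re.compile(r"(?:^|\s)-smp\s+(\d+)\b")


def smp_cores_from_log(log_lines: Iterable[str]) -> Optional[int]:
    """Return the explicit -smp core count from the most recent logged
    arguments, or None when no explicit count can be found."""
    cores = None
    for line in log_lines:
        if "Arguments:" in line:
            match = SMP_COUNT.search(line)
            cores = int(match.group(1)) if match else None
    return cores
```

A bare `-smp` returns None on purpose: without an explicit count, the core count per instance can't be known from the log alone.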

harlam357
01-17-10, 01:36 AM
Just a bump... this idea is going to start getting attention in the new year. Working on some structural updates to HFM now and will begin to tackle this after those are done... think post v0.5.0. :)

deadlysyn
01-17-10, 01:44 AM
I was starting to wonder what happened with this. Glad to see you still have something in the works, h.

harlam357
01-17-10, 01:51 AM
Oh, it hasn't been forgotten... just need time to develop. ;) Web is not my forte, so there's a learning curve involved with getting it done. I just need to get the data structures in HFM in the right place (which I'm working on) to move into this. Updates are coming... :beer:

dz_jad
01-17-10, 01:45 PM
Quote (harlam357):
This would be a site to compile data on completed WUs, the hardware each WU was run on, and how productive that hardware was at completing the WU. It could be taken to the point that it's almost like Futuremark's ORB database... where 3DMark scores are kept along with the information on the hardware that produced those scores. In fact, with data like this you may start getting overclockers who start benching WUs under dry ice to see who can get the best PPD. :D

Amen. I love that idea! I think it would be really simple to implement on the HFM side (assuming all data is already available): a simple connection and a few messages sent back and forth. I imagine the data would be stored server-side in some sort of file system or database. I'm sure we've got an expert here somewhere who knows how to interface a website with server data...
I love the benchmarking idea. It may need to be moderated, though, to keep it benefiting the science. It would be sad if something like this took off and then became a thorn in the side of Stanford, but I think it is possible to keep it from going that far.
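The server-side storage could start as one table with a row per completed WU. This sketch uses SQLite from Python's standard library as a stand-in; that choice and the table layout are hypothetical, and a hosted site (ASP.NET is mentioned in the thread) would more likely sit on a database server.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical table layout: one row per completed work unit.
SCHEMA = """
CREATE TABLE IF NOT EXISTS work_units (
    donor             TEXT    NOT NULL,
    project           INTEGER NOT NULL,
    cpu_model         TEXT,
    avg_frame_seconds REAL    NOT NULL,
    ppd               REAL    NOT NULL,
    completed_utc     TEXT    NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, rows: Iterable[tuple]) -> None:
    """Insert a batch of uploaded rows in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO work_units VALUES (?, ?, ?, ?, ?, ?)", rows)


def project_summary(conn: sqlite3.Connection,
                    project: int) -> Tuple[Optional[float], int]:
    """Average PPD and sample count for one project across all donors."""
    return conn.execute(
        "SELECT AVG(ppd), COUNT(*) FROM work_units WHERE project = ?",
        (project,),
    ).fetchone()
```

Storing each upload in one transaction means a half-received batch never shows up in the statistics.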

Along those same lines, it could be used for contesting. Say a couple of people on the team are talking some smack and want to have it out. HFM will record the stats of specific clients and they can go head-to-head with their own rules or what-have-you.
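A head-to-head contest only needs the points each client earned for WUs finished inside an agreed time window. A hypothetical sketch; the tuple layout for completed WUs is an assumption.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

Completion = Tuple[str, datetime, float]  # (client name, completed at, credit)


def points_in_window(completions: Iterable[Completion], client: str,
                     start: datetime, end: datetime) -> float:
    """Total credit a client earned for WUs completed in [start, end)."""
    return sum(credit for name, finished, credit in completions
               if name == client and start <= finished < end)


def head_to_head(completions: Iterable[Completion], a: str, b: str,
                 start: datetime, end: datetime) -> Optional[str]:
    """Return the client with more points in the window, or None on a tie."""
    completions = list(completions)  # allow a generator to be read twice
    points_a = points_in_window(completions, a, start, end)
    points_b = points_in_window(completions, b, start, end)
    if points_a == points_b:
        return None
    return a if points_a > points_b else b
```

The half-open window `[start, end)` keeps a WU that finishes exactly at the boundary from counting in two consecutive rounds.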

Dude, just let me know, I'll be happy to dig in and get to know some of the code so I can help maintain/upgrade...

Roisen
01-17-10, 02:09 PM
Giving a monitoring application this kind of functionality would definitely add a lot of value to it.

On a side note, do you know how many people use HFM?

ChasR
01-17-10, 02:50 PM
As I've said before, this is a nice idea. I still don't see how the variables can be reasonably accounted for so that meaningful data can be presented. Just looking at Linux SMP WUs, production is influenced by many variables that HFM can't discern. The number and type of GPUs and the work unit(s) running on them can make more than a 1,000 PPD difference in SMP production.

I have the utmost faith in Harlam's abilities, but recommend waiting until SMP2 and GPU3 become the primary clients. A year from now, we may have a unified client, using OpenCL, and far fewer variables to account for.

ihrsetrdr
01-17-10, 04:48 PM
Sounds like a cool project, I'll install any reporting app you develop, harlam. ;)

harlam357
01-17-10, 05:30 PM
Quote (Roisen):
Giving a monitoring application this kind of functionality would definitely add a lot of value to it.

On a side note, do you know how many people use HFM?

Others have tried to do this with manual data entry... which is always an iffy thing. No one wants to take the time to gather or enter data in a consistent format... so to really make this happen, one would need to be able to gather it automagically. ;) The tough part, as ChasR points out below, is how to get hold of the hardware data, since HFM runs on a single machine. Then beyond that, how do we determine the influence of multiple clients on a single machine? My first pass on this is definitely not going to get it down to a science like ChasR has it... but I still think it would be of value to the community.

How many people? I'm really not sure. The best guesstimate I have is the number of downloads of 0.4.7, which was right at 525 today. As large as the FAH community is, I'm surprised more aren't using HFM. Lord knows I've tried to get the word out. :)

Quote (ChasR):
As I've said before, this is a nice idea. I still don't see how the variables can be reasonably accounted for so that meaningful data can be presented. Just looking at Linux SMP WUs, production is influenced by many variables that HFM can't discern. The number and type of GPUs and the work unit(s) running on them can make more than a 1,000 PPD difference in SMP production.

I have the utmost faith in Harlam's abilities, but recommend waiting until SMP2 and GPU3 become the primary clients. A year from now, we may have a unified client, using OpenCL, and far fewer variables to account for.

One other thing that "scares" me is the full scale change in the clients. There is supposed to be a completely new client in the works which will do away with some of the constructs we currently deal with as third-party devs. I can only hope that the new client(s) make my life 10x easier. :)

So with that said, I'm not sure SMP2 and GPU3 are the answer... as far as I've seen, they're just new cores, not new clients.

Quote (ihrsetrdr):
Sounds like a cool project, I'll install any reporting app you develop, harlam. ;)

Thanks ht! :beer: