hwbot evaluation & new features
Hi all,
I assume you guys are happy with the hwbots that maintain the rankings, as you're still using them. ;) Some feedback would help me a lot to get to a finished product though, so if there is anything that annoys you, now is the time to spill your guts!
Example requests to make the bot rankings easier:
- wizard for setting up a new bot (planned)
- mail notification to moderators when bot fails (planned)
- better faq (planned)
- ranking in first post customizable
- ...
For those who haven't kept an eye on hwbot development, these are some new features implemented in the past 2 months:
Community based benchmark charts
Community based benchmark charts have the drawback of not being done in a controlled environment on 1 test system by 1 user, but once you get enough results in, the deviation gets really small and the data gets just as reliable imho. As all results get charted, a 'fake' result gets spotted quickly and can be removed from the db by one of our result moderators.
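How hwbot's moderators actually spot a 'fake' result isn't described here. As a minimal sketch of the idea, assuming a median-based outlier rule (the function name, the threshold and the sample times below are all made up for illustration), a result far away from the bulk of the submissions can be flagged like this:

```python
import statistics

def flag_suspicious(times, threshold=3.5):
    """Return results whose modified z-score exceeds the threshold.

    The score is based on the median absolute deviation (MAD), so a
    single extreme value cannot hide itself by inflating the average.
    This is an assumed rule, not hwbot's real moderation logic.
    """
    med = statistics.median(times)
    mad = statistics.median(abs(t - med) for t in times)
    if mad == 0:
        return []  # all results identical, nothing stands out
    return [t for t in times if 0.6745 * abs(t - med) / mad > threshold]

# Hypothetical SuperPi 1M times in seconds; 9.0 is a planted 'fake' result.
sample = [31.2, 30.8, 32.1, 31.5, 30.9, 31.7, 9.0, 31.1]
suspicious = flag_suspicious(sample)
print(suspicious)
```

The more honest submissions there are, the tighter the bulk gets and the easier an odd one is to see, which matches the point made above.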
But, I see 2 huge benefits these community based benchmarks will have that charts made for a review will never be able to offer:
- overclockability. If a reviewer gets his hands on a cpu/gpu and can overclock it to xxx mhz, it means very little. It can be cherry-picked, or a bad sample, or... Overclockability charts based on hundreds of submissions, on the other hand, give you a very clear insight into the overclockability of the hw piece.
- 'live' charts. As the charts aren't the result of a handmade excel/calc sheet, but based on live data in the database, the charts will always be up to date, whether you visit them today or a year from now.
What's your opinion on this? Do you think community based benchmark charts have a future, or do you think they will suffer too much from vandalism (cf. wikipedia) to be reliable?
Example 'live' benchmark chart, 3000 superpi benchmark results. lower = better.
http://img141.imageshack.us/img141/7784/showel6.th.png (http://www.hwbot.org/newsLink.do?newsPostId=275987)
Example 'live' overclockability chart. Each dot represents a successful benchmark run at x core speed and y memory speed.
http://img161.imageshack.us/img161/1892/jsessionid49022cb1f82f553f0bfe57a0d7009b9eimg14822 528ce4.th.png (http://www.hwbot.org/quickSearch.do?hardwareId=GPU_450)
Example videocard overclockability roundup chart.
http://img162.imageshack.us/img162/2291/showat8.th.png (http://www.hwbot.org/hardware.videocard.statistics.do)
detailed processor specifications and statistics:
processor specs (http://www.hwbot.org/hardware.processors.do)
processor statistics (http://www.hwbot.org/hardware.videocards.do)
detailed processor info:
example processor info page (E6600) (http://www.hwbot.org/quickSearch.do?hardwareId=CPU_873)
detailed videocard specifications and statistics:
videocard specs (http://www.hwbot.org/hardware.videocards.do)
videocard statistics (http://www.hwbot.org/hardware.videocard.statistics.do)
detailed videocard info:
example videocard info page (x800 GTO) (http://www.hwbot.org/quickSearch.do?hardwareId=GPU_450)
world records and team hall of fame:
world records (http://www.hwbot.org/results.hall.of.fame.do)
team rankings (http://www.hwbot.org/teams.hall.of.fame.do)
I need feedback! Spill your guts! :D
I've been waiting forever for you to come here and post what you guys have been working on!
I'll try and come up with some ideas but for the most part you're doing awesome.
I don't know if this would be possible to implement or not, but it'd be really nice to be able to click on a score or time as a link, and have that show a screenshot.
I know in the past you had scores linked to futuremark links from the same user, would the screenshot idea be doable?
That'd be perfectly possible, it wouldn't even be hard to implement. The main problem however is that I have to keep the character count of the text in the first post as low as possible, as most fora have a rather low maximum post length. If I added a link to the picture or hwbot, it would make the text length skyrocket. : /
I could make it optional, and let the team in question choose whether they want to enable it or not. It's doable if you have a really high text limit.
Thanks, added to the todo list. :D I'm waiting for more requests. ;)
El<(')>Maxi
08-09-06, 02:32 PM
Seems there are teams with the same person/nick with two or more scores in their top ten (OCX, Overclocking Masters) for example. I love all the new features really but I would prefer to see problems with ranking errors like this fixed :)
You can configure a bot to display not just your highest score, but all your scores (or better: the highest per cpu/gpu combo). It's not a bug, they just configured it differently. : )
El<(')>Maxi
08-09-06, 04:17 PM
I see. Imo that is an option that should not be available then.
Yeah i know, it bugs me too sometimes. :D It has been implemented on request, maybe i should think twice when someone asks me sth.
El<(')>Maxi
08-09-06, 05:52 PM
I'm going to be completely honest with you. You have an incredible piece of software and development skill, but the errors in ranking (which is how I'd class the issue in my last two posts) hinder your efforts greatly. If hwbot was 99% flawless everyone would use it, but there are too many glitches in the rankings like this that hurt hwbot imo :)
Thanks El Maxi, that's some good advice! I'm going to concentrate more on quality instead of quantity.
This is one of the downsides of hwbot being a hobby project developed in spare time. It's just no fun to do a lot of testing. I will have to force myself a bit if I ever want hwbot to be a good alternative to the ORB.
With glitches in the rankings, do you mean glitches in the team averages displayed, glitches in scanning the results, or both?
El<(')>Maxi
08-11-06, 07:06 AM
Actually neither, I haven't noticed any problems with either of those aside from my nick being displayed oddly.
From a team aspect the main problem I see is the same guy(s) with more than one score in the top ten. Even if they are on a different platform, if you ask me it's not a fair representation of any one team. I could buy 5 different Conroe rigs and, if I understand you correctly, post all five of them. Before you mentioned it I was under the impression that it was an error in hwbot scanning or something like we used to see. Now I know better, it's obvious you have corrected that. If that feature was a request, maybe some of the feedback opposing it goes unspoken?
Now from an individual user standpoint hwbot offers an amazing resource to compare your results, I have nothing but kudos there. Anyway those are my thoughts :)
Best regards,
Mark
You do know that the option to show the highest score of all your rigs does nothing more than just display it, right? It doesn't affect the team average, so it's not unfair to other teams.
If you mean it's unfair for the other team members, you're right.
I'd rather have ppl setting the 'display subcategories starting from xxx users' to 1, than enabling the 'show highest score per rig instead of member'. I'm seriously considering removing that option, as I can see it spreads confusion.
Thanks for the comments Mark, I really appreciate it.
El<(')>Maxi
08-11-06, 07:34 AM
So if the same person has more than one score in the top ten only their top score is used in the calculation? Yeah that is a little confusing , sorry :D
Indeed.
It was added because some ppl found it annoying that only their top score is added to the list, and not the scores of their other benching rigs.
But that's why we have subrankings...
Screw it, i'm going to send a mail to all mods that i'm going to remove the option, and advise them to use the subrankings.
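The exact formula behind the team average isn't given in this thread. A minimal sketch of the rule discussed above, assuming the average is a plain mean over the top N members and that each member counts only once with their best score (function names, member names and scores are hypothetical):

```python
def best_per_member(results):
    """Keep only the highest score for each member.

    `results` is a list of (member, score) pairs; a member with several
    rigs appears several times but ends up with a single entry.
    """
    best = {}
    for member, score in results:
        if member not in best or score > best[member]:
            best[member] = score
    return best

def team_average(results, top_n=10):
    """Mean of the top_n best-per-member scores (assumed formula)."""
    top = sorted(best_per_member(results).values(), reverse=True)[:top_n]
    if not top:
        return 0.0
    return sum(top) / len(top)

# Hypothetical 3DMark-style scores; member_a submitted from two rigs.
results = [
    ("member_a", 12000),
    ("member_a", 11500),
    ("member_b", 10000),
    ("member_c", 9500),
]
print(team_average(results, top_n=3))
```

Under this assumption the second rig of member_a can be displayed in a list without changing the average, which is the distinction being made in the posts above.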
El<(')>Maxi
08-11-06, 02:42 PM
:)
Ever thought about creating your own version of PI with online submission? The time is certainly right for a multi-CPU version.
Thought of it, yes. I think a multithreaded superpi with integrated hwbot score submission would be awesome... I barely have the time to finish hwbot though, so I guess those are long term plans.
El<(')>Maxi
08-12-06, 04:54 PM
So do I, you know it would be a huge huge hit. Too bad you can't code in your sleep ;)
Hmmm yes, or ditching the missus and abandoning social life would be a great help too. :D
Would a multipi app written in java be acceptable, or does it have to be written in C(++)? 97% of the hwbot visitors have a java virtual machine installed, but I know ppl still dislike java apps because of the higher startup time.
By the way, I've got the team ranking statistics page almost finished. It shows a couple of graphs which display the team average or rank moving over time (one for this week, month, all time), and compared to the team ranked 1 place below and above.
Ok, I've been playing around with some pi calculations (turns out i'm a bad employee when the boss is not around ;)) and I've drawn some conclusions.
There are many different ways to calculate pi, but except for the brute force method, they can't be executed by multiple threads (= spread over different processor cores).
The brute force method can be spread over multiple cores, but will never produce exactly the same Pi result twice, only a good estimate.
If I were to develop a multipi app, would it be acceptable that it doesn't always calculate the same Pi result? SuperPi calculates Pi to 1M digits, multipi would just do xxx billion iterations (spread over different cores), and then stop no matter what the result is. If I forced multipi not to stop until 1M digits are reached, results would vary each time because brute force calculation is not exact and uses random numbers.
I guess the end result doesn't really matter, as we want to know how fast our system can do xxx billion iterations of the algorithm, we don't want to know PI.
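The multipi source isn't shown in this thread, so the following is only a sketch of what a 'brute force with random numbers' approach could look like: a Monte Carlo estimate, split into work units handed to a pool of threads. All names and sizes are assumptions. Note that in CPython the threads won't actually run this loop in parallel because of the global interpreter lock; the sketch shows the structure, not the speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_workunit(seed, samples):
    """Throw `samples` random points at the unit square and count how
    many land inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # own generator per work unit
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(workunits=20, samples_per_unit=10_000, threads=4):
    """Spread the work units over a thread pool and combine the hits.

    The hit ratio approximates pi/4, so pi is roughly 4 * hits / total.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        hits = sum(pool.map(run_workunit,
                            range(workunits),
                            [samples_per_unit] * workunits))
    total = workunits * samples_per_unit
    return 4.0 * hits / total

print(estimate_pi())
```

The seeds are fixed here so that repeated runs give the same estimate; with fresh random seeds each run would give a slightly different value, which is the behaviour asked about above.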
I wrote a beta. It uses 4 threads.
Starting Pi calculation
Thread core 1 started.
Thread core 3 started.
Thread core 2 started.
Thread core 4 started.
2297 millisec: 20 workunits left
2328 millisec: 19 workunits left
2422 millisec: 18 workunits left
2438 millisec: 17 workunits left
4563 millisec: 16 workunits left
4594 millisec: 15 workunits left
4766 millisec: 14 workunits left
4907 millisec: 13 workunits left
6829 millisec: 12 workunits left
6876 millisec: 11 workunits left
6985 millisec: 10 workunits left
7344 millisec: 9 workunits left
9110 millisec: 8 workunits left
9141 millisec: 7 workunits left
9282 millisec: 6 workunits left
9798 millisec: 5 workunits left
11360 millisec: 4 workunits left
11391 millisec: 3 workunits left
11516 millisec: 2 workunits left
11688 millisec: 1 workunits left
all threads finished!
Calculating pi: 15707898 of 20000000
Calculated Pi: 3.1415796
Real Pi: 3.141592653589793
Time lapsed: 11688 milliseconds
P4 3.0Ghz: 11.7 sec
Intel Core Duo T2500 1 core enabled: 8.5sec
Intel Core Duo T2500 2 core enabled: 5.2sec
Cool. :)
I think 2 cores isn't twice as fast due to memory bandwidth.
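The numbers in the beta output and the T2500 timings above can be checked with a few lines of arithmetic. The 'x of 20000000' line is read here as hits out of total samples, which is an interpretation, but it reproduces the reported value:

```python
# Figures copied from the beta output and timings above.
hits, total = 15_707_898, 20_000_000
pi_estimate = 4 * hits / total      # hit ratio approximates pi / 4

one_core_s, two_core_s = 8.5, 5.2   # T2500 with 1 and 2 cores enabled
speedup = one_core_s / two_core_s   # how much faster 2 cores are
efficiency = speedup / 2            # share of the ideal 2x that was reached

print(f"pi estimate: {pi_estimate:.7f}")
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

So the second core gives roughly a 1.6x speed-up, about four fifths of the ideal doubling.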
How heavy should I make the calculation? I think +- 1 minute for a fast midrange processor (T2500?) would be good?
That sounds awesome. And yeah 1 minute sounds like a reasonable baseline time. I guess that'd put the highest end around 30-40 seconds? Much more reasonable than the sub 10 sec times we're seeing these days.
Only thing I'm kinda peeved about is that the Kentsfield folks will kill it. :p
I'm afraid a baseline of 1min for a midrange processor won't be high enough. Otherwise the 10sec barrier will be reached too quickly with a kentsfield at 5ghz.
I've tuned it a bit and I think i hit a sweet spot. Doing 30 million iterations, it takes 1m30sec on a T2500 (2 cores), and 2m51s with one core enabled. I guess a midrange singlecore processor will take about 3 minutes to calculate it.
That seems a lot, but I think a kentsfield at 5ghz would be able to do 30 million iterations in about 20seconds...
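The 20 second guess can be sanity-checked with a very naive model, assuming run time scales inversely with cores times clock speed and ignoring any per-clock difference between architectures. The 1.86 GHz figure for the T2500 is the one quoted later in this thread; the function name is made up.

```python
def scaled_time(base_seconds, base_cores, base_ghz, target_cores, target_ghz):
    """Naive estimate: run time shrinks in proportion to cores x clock."""
    return base_seconds * (base_cores * base_ghz) / (target_cores * target_ghz)

# 1m30s on a 2-core T2500, projected onto a 4-core chip at 5 GHz.
estimate = scaled_time(90, base_cores=2, base_ghz=1.86,
                       target_cores=4, target_ghz=5.0)
print(f"{estimate:.1f} s")
```

That lands a little under 20 seconds, so the guess is in the right range under these assumptions.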
Anyone got a Kentsfield who wants to give it a testrun?
We have a couple of dual Woodcrest systems here, but no Kentsfields yet AFAIK...:p Lets not forget when official release is...ok fine, not like we cared with Conroe either. :D
Here's a run on my dual 5160 machine:
http://www.techforge.biz/images/pi.jpg
I've sent you guys a download link for the beta in pm.
Joe Camel
08-14-06, 03:09 PM
please give me a link :D
or i can get copy off Gautam :p
this is EXCELLENT!!
finally a good test for dual core.
a "32M" (long) version being thought of too?
32seconds? Damn that's fast! Is that default speed, or overclocked?
@Joe: sure! How long should the long version take on a midrange processor?
- edit -
d/l link:
http://www.hwbot.org/forums/viewtopic.php?pid=1902#p1902
It's stock until I can find a way to overclock with a Tyan mobo.... probably never.
Frederik, you're a wonderful asset to the enthusiast community, but I think we all knew that. :p
Can't wait to get home and give this a whirl on the Conroe. (And get my butt kicked by TC's rig for sure :p)
Joe Camel
08-14-06, 03:54 PM
off the top of my head; a 15-20 min INTENSE test would prove CPU "stability" to ME (on MY rig) and would have the CPU/cooling @ max temps.
as im not sure how a "midrange" CPU would compare, im not sure how to truly answer your question.
thank you very much for asking for our input :thup:
No point in making sth nobody wants. :D
15 minutes on a quad core would mean 1 - 1 1/2 hours on an A64 2Ghz.... would the 'long' bench still be interesting then?
Or should I make 2 totally different versions? 1 'fast' one, first one to finish pi in xx seconds, second one runs pi for eg. 20 minutes and tells you how many iterations it finished?
Let's think back to the A64 days though, remember when 32M would take 20-30 minutes, and breaking 20 would take DI or LN2? :D
I think 20 minutes should take nothing less than the fastest out there, which would mean something in the 5 gig range, but whether to "calibrate" against a Kentsfield or Conroe, I don't know. Right now nearly no one has quad core systems. I guess that might change in about six months.
Thinking out loud, it might be suitable for the benchmark to be revised and re-released every year, like the 3DMark series are....more work for you, but just a thought. :p
Oops you posted while I was writing.
This is a problem for sure. A bench geared towards quad cores would be nearly unusable on single cores. :-/ That's the risk of it being multi-threaded...you not only have to account for the high end running at much higher frequencies than midrange, but also having twice if not four times as many cores.
I sort of like the idea of it being based only on how many iterations it completes, that way it's not time dependent...BUT I don't think the benching community will be receptive to this. Being able to finish the bench quicker on faster systems and comparing times is part of what makes SuperPI so much fun.
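The two styles being weighed here can be put side by side. This is a minimal sketch with made-up function names, where `step` stands in for one iteration of whatever the real benchmark computes:

```python
import time

def fixed_work(iterations, step):
    """SuperPi style: do a set amount of work, the score is the time."""
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    return time.perf_counter() - start

def fixed_time(seconds, step):
    """Alternative: run for a set time, the score is the iteration count."""
    deadline = time.perf_counter() + seconds
    done = 0
    while time.perf_counter() < deadline:
        step()
        done += 1
    return done

def dummy_step():
    # Placeholder workload for the sketch.
    sum(range(100))

elapsed = fixed_work(10_000, dummy_step)
count = fixed_time(0.05, dummy_step)
print(f"fixed work: {elapsed:.3f} s, fixed time: {count} iterations")
```

The fixed time variant always takes the same wall clock time on any machine, which is what makes it future proof, but it also removes the 'break the barrier' element described above.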
I'm afraid I haven't helped at all, but this is a pickle. :p
Hmmm, if it would take a 20Ghz (4*5Ghz) core 2 to break 20Mins, i wouldn't want this to run on my Epia M10000. :D
- edit -
Yeah, being able to break a certain time barrier is indeed what makes SuperPi so much fun. If we want multipi (working title :D) to be a success, it would have to look & feel very much like the old superpi.
:shrug:
I guess part of the problem is that the benchers that would appreciate this the most will also mostly have the fastest systems out there. But making a bench geared to them would alienate the rest of the world.
Perhaps if you could purposefully cripple it into being less efficient in multithreading so that additional cores only add a boost, but do not completely double the score or come close to it? :shrug:
Hmmm na, that doesn't seem like a good idea. It almost feels like 'cheating'. Not that multipi isn't completely synthetic, but i'm not going to alter the algorithm to make multiple cores look worse than they really are.
Joe Camel
08-14-06, 04:17 PM
ok...
i think anything over 30 min would become too long and tedious for most benchers.
trying to come up with a 1-for-all test is almost impossible when your range is from a single core A64 to a dual CPU kentsfield rig ( :drool: )
IMO you're going to have to have a single core / dual core / quad core set (short and long) of tests :(
El<(')>Maxi
08-14-06, 04:26 PM
Is it possible to make it selectable (length of PI calculated) similar to the current Super PI app & have a flashy new interface :D
I think it would be best if it detected the number of cores on the system and also allowed you to select how many cores you wish to utilize. If I had the experience I'd be helping you with this but all I know how to do is make forms in Visual Studio.
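Detecting the core count and letting the user override it is a small amount of code. A sketch, assuming the choice is simply clamped between 1 and whatever the system reports (the function name is hypothetical):

```python
import os

def choose_thread_count(requested=None):
    """Use all detected cores by default, or the user's choice clamped
    to the range 1..available."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    if requested is None:
        return available
    return max(1, min(requested, available))

print(choose_thread_count())    # everything the system reports
print(choose_thread_count(1))   # force a single-core run
```

Running the same calculation with 1 core and with all cores on one machine would give the kind of synthetic comparison suggested above.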
The problem with having different versions for different numbers of cores is that it pretty much eliminates the need for even having a multithreaded version. I know that might sound weird, but think about it.
Everyone with one core will run the single core calculation, their scores will only be comparable with other single core CPUs.
Same with double, and quad. And now we're proposing to make the calculation take about the same amount of time for each...so everyone will have the same times no matter how many cores they have.
See what I'm getting at?
El<(')>Maxi
08-14-06, 04:39 PM
The same amount of time? I wouldn't do that, I'd make it the same as the current version of PI, only multi-threaded. And there would not be different versions, rather all in one :D Right mbot? :D The reason you would have the option to select how many cores are used is so you can synthetically compare a Woodcrest for example to a Conroe. Or even against a single core chip, although I doubt we'll need that for much longer.
Suppose we calibrate the benches so that in the dual core version, you need a 5 GHz Conroe to hit 15 minutes flat.
And in the quad core version, you need a 5 GHz Kentsfield.
In this situation it wouldn't really matter whether you have two cores or four, as you'd get the same time with either as long as you use the appropriate version.
But, if I understand what you're saying correctly, you'd want a single calculation algorithm, with the option of enabling/disabling as many cores as you'd like? Wouldn't this again cause the problem of a Kentsfield completing the calculation twice as fast (or close to) as a Conroe?
El<(')>Maxi
08-14-06, 04:48 PM
Now that I think about it a set single time version might be really good with the 'score' calculated off however many iterations are completed. And I agree woodcrest or other big core chips should not be crippled, we all might be benching on 16 core chips in a couple of years. As far as how long the computation runs I'd say 3-5 min tops.
It would be good for all practical purposes, but once again, I don't think it'll be received well. Just won't be the same. The variable time is one of the things IMHO that make Super Pi so alluring and unique as a benchmark.
3-5 min tops, bah. The 32m calc taking around 10 minutes on the fastest procs out there is another part of the challenge that IMHO needs to be retained. It shows that you're not a wuss if you can sit there pouring LN2 for 10-20 minutes straight. :p
Maybe I'm asking too much. :-/ :shrug:
Joe Camel
08-14-06, 05:01 PM
ya, also it would test "stability" a little more and (thus) your cooling setup too.
the "long" version should run longer than 15min
but the "short" should be less than 2 min maybe even less than 1min :shrug:
Mbot, great idea. Multi-cored pi would be, shall we say, the future bench of choice. IMHO this should take a LONG time right now so as to give this bench/test some longevity. As has been mentioned, kentsfield and 4x4 are coming very soon, and in time multicored machines will be all over. So making this almost absurdly long to complete will give it longevity.
Now, without testing, and only seeing a screenie of the dos interface, I would think adding parameters for separate tests would be a good thing: say a good 5 min test for a conroe today as the short test, something midrange, and then the absurdly long test for future chips to break down. All that would be needed is a reference to which test was run when the result is given.
Hope my thoughts came across clear, as i think i confused myself??????
El<(')>Maxi
08-14-06, 05:52 PM
That's one reason why a set completion time would be very good: future proofing. The only problem is it might be too big of a change to catch on. Yeah, maybe it's best if it's as close to the present Super PI as possible with a cleaned up interface and multi-threaded options. But of course that is under ideal circumstances, I have no idea if mbot wants to get that far into it or not.
Anyways, gave it a whirl on the Conroe...don't feel so mighty anymore...:-/
Some early conclusions:
- current amount of iterations seems ok, maybe it needs to be a bit more.
- dual / quad core is almost 2/4 times as fast as single core... maybe tax the memory a bit more so it isn't a cpu-only test?
- memory barely gets taxed, heavier use of memory would make it a 'better' test (for detecting memory faults), lower the advantage of dual/quad core a bit, and make it a more 'real life' test.
- finding the sweet spot between 'not too short on an oc'ed quad core' and 'not too boring on a midrange single core' is frigging hard. : )
- p4 dual core seems to perform equal to single core? need verification of this
- x2 just as fast as conroe at same speed? need verification
I made a second beta (0.2). Download link will be available shortly. Results will _not_ be comparable to beta 0.1.
I altered the algorithm to consume a lot more memory, and to check the memory for faults. The standard calculation now uses approx 200mb of ram.
The heavier use of memory makes the bonus for having additional cores smaller. I think a second core would give a +50% bonus, a 3rd and 4th maybe 20 to 30%. The faster your memory, the more benefit you'll get from having additional cores.
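How beta 0.2 checks memory for faults isn't described. One common approach, shown here purely as an assumption, is to fill a buffer with a known pattern and read it back, reporting every position that no longer matches. The names and sizes are hypothetical, and the buffer is far smaller than the 200mb mentioned above.

```python
def fill_pattern(size_bytes, pattern=0xA5):
    """Allocate a buffer and fill every byte with a known pattern."""
    return bytearray([pattern]) * size_bytes

def find_faults(buf, pattern=0xA5):
    """Return the offsets where the buffer no longer holds the pattern."""
    return [i for i, b in enumerate(buf) if b != pattern]

buf = fill_pattern(1 << 16)       # 64 KiB, small enough for a demo
clean = find_faults(buf)          # healthy memory: nothing reported

buf[1234] ^= 0x01                 # simulate a single flipped bit
faulty = find_faults(buf)
print(clean, faulty)
```

On an unstable overclock a check like this would occasionally report offsets even without the simulated bit flip, which is what makes a memory-heavy benchmark double as a stability test.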
Testrun on my macbook pro:
T2500 (1.86Ghz Core Duo) with 1 core: 3m 07.140s
T2500 (1.86Ghz Core Duo) with 2 cores: 2m 02.754s
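The two MacBook Pro timings above are enough to check the '+50% for a second core' estimate:

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xm Y.YYYs' timing to seconds."""
    return minutes * 60 + seconds

one_core = to_seconds(3, 7.140)    # T2500 with 1 core
two_cores = to_seconds(2, 2.754)   # T2500 with 2 cores

gain = one_core / two_cores - 1    # extra throughput from the 2nd core
print(f"second core bonus: {gain:.0%}")
```

The second core adds roughly 52% here, compared with roughly 63% for the 8.5 s and 5.2 s timings reported for the first beta.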
download link:
http://www.hwbot.org/download/multipi-0.2.zip
Input highly appreciated!
El<(')>Maxi
08-15-06, 03:53 PM
Tried it out on a stock X2 3800 :)
Seems to almost be paging most of the memory use, is there any way to control that?
Just ran the new beta...
http://www.techforge.biz/images/pi2.jpg
I wonder why it repeats itself at the end... i'll take a look at it tomorrow.
Thanks for testing guys. Anyone with a P4 dual core to test it?
I have a presler 930 in pieces waiting to sell. I guess I could slap it together on the bench and run it.
I tested My P4 3.0D at work: 03m 53.342s
Does it seem right that a P4 at 3Ghz is 25% slower than a Core Duo with 1 core at 1.86Ghz?
The new beta 0.2 seems to produce consistent results:
6 minutes for a slow singlecore (Amd Turion 1.6Ghz)
4 minutes for a midrange singlecore (P4 3Ghz)
3 minutes for a fast singlecore (core solo 2ghz)
2 minutes for a midrange dual core (core duo 2ghz, X2 @ 2.5ghz, dual xeon 3Ghz)
1 minute for a fast dual core (core duo at 3.5ghz)
45s for a midrange quad core (woodcrest at 3ghz)
Anomalies:
P4 @ 3.9Ghz: 2m21s ? That's too fast, it should be around 2m45 - 3m.
No P4 dual core results?
I feel a little better with this version compared to TC, a little more fair. ;)
http://mysite.verizon.net/gautamb/multipi.JPG
Well, I've tested 0.2 thoroughly and there seems to be a difference between platforms. I think it has mainly to do with the number of open processes, primarily caused by the window manager. The heavier/fancier your GUI, the slower multipi gets.
Ranked by speed:
linux terminal > linux KDE > windows XP > MacOSX Tiger
The gain for a linux terminal was too much to call it fair, so I started tweaking the algorithm again.
The new algorithm (0.3, to be released early next week), will tax memory less, but will be comparable between platforms. I haven't had the chance to test it on Mac OSX yet, but results on a linux terminal and windows gui are nearly identical (I did 5 runs on each and fastest run on linux was not more than 200 milliseconds faster than slowest run on windows xp). If I get the same, identical results on macosx i think we have a winner. :)
It doesn't tax the memory as much anymore though, but I find consistency between platforms much more important than taxing the memory.
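The consistency criterion described above (5 runs per platform, fastest linux run no more than 200 milliseconds ahead of the slowest windows run) is easy to express as a check. The run times below are invented for illustration, not measured results:

```python
def platforms_comparable(linux_runs, windows_runs, tolerance_s=0.2):
    """True if the slowest windows run is within tolerance_s seconds
    of the fastest linux run."""
    return max(windows_runs) - min(linux_runs) <= tolerance_s

# Hypothetical run times in seconds, 5 runs per platform.
linux_terminal = [120.10, 120.15, 120.08, 120.20, 120.12]
windows_xp = [120.18, 120.25, 120.22, 120.27, 120.20]

print(platforms_comparable(linux_terminal, windows_xp))
```

The same check can be repeated for each extra platform, for example with a set of Mac OS X runs once those are available.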