CPU performance at seti@home

mackerel · Apr 11, 2017

Splitting this off the Ryzen thread to prevent cluttering that up.

Keith Myers said:
I run the SETI@Home Multiband AVX app on the CPU. So that really stresses and heats up the chip. The 1700X is twice as fast on the AVX CPU tasks as my FX processors. So they did a real good job in optimizing the AVX path in Ryzen. I am crunching a CPU task in under an hour on the 1700X. The same task on the FX 8350 and 8370 takes over 2 hours to finish.

Starting from the above comment, I'm curious what instruction set s@h uses, and the relative performance between Ryzen and recent Intel. I suspect I'm confusing AVX and AVX2.

I'm not sure exactly what work is being compared. Looking in my project options I only see options to select astropulse v7 or s@h v8, so have only selected the latter on CPU. No GPU tasks. Currently running on a Haswell i3 at 3.7 GHz, one per physical core using affinity, it is estimating 2h45m to complete units. I'll add my Ryzen system shortly.

Edit: estimates are now 2h for one unit, 2h30m for the other. I think I'll ignore the estimates and will only use completed unit times.

mackerel · Apr 11, 2017

I found the following in the s@h v7 announcement: "SETI@home v7 now supports the AVX instruction set used on Intel Sandybridge and later processors. That gives a significant speedup on those processors." There doesn't seem to be anything mentioned in v8. As it is AVX, my earlier comments on performance regarding AVX2 are not applicable. I'll still keep testing for more data points.
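For anyone else unsure whether their CPU reports AVX, AVX2, or both, here is a minimal Python sketch. It assumes an x86 Linux system where /proc/cpuinfo contains a "flags" line; the function names are made up for illustration. It only shows what the CPU advertises, not which code path a given s@h app actually takes.

```python
from pathlib import Path

# Names as they appear on the "flags" line of /proc/cpuinfo on x86 Linux.
FLAG_NAMES = {"AVX": "avx", "AVX2": "avx2"}


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. not x86 Linux)


def simd_support(flags):
    """Map each instruction set of interest to True/False."""
    return {name: flag in flags for name, flag in FLAG_NAMES.items()}


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():  # Linux only; silently skipped elsewhere
    print(simd_support(parse_cpu_flags(cpuinfo.read_text())))
```

A CPU that reports AVX2 also reports AVX, so an AVX-only app can run on both kinds of chip.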

Out of interest, does anyone know if s@h work is significantly sensitive to ram performance?

Kenrou · Apr 11, 2017

I haven't seen any significant difference going from DDR4 at 2133 MHz to 3200 MHz on my Intel rig (I had the RAM at stock for a while to troubleshoot), but it's likely to have some effect, like everything else in your setup.

mackerel · Apr 11, 2017

I had forgotten about this... now running on 8 cores of my 1700 (SMT off), it looks like there are short and long units. The longer units have a deadline about a month later than the short ones, and they're estimating around 2.5h compared to 1h for the short units. So I'll have to sort them manually to give like-for-like comparisons.
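The manual sorting could be sketched roughly as below. This is only an illustration: the task records, the field names (`deadline_days`, `cpu_s`) and the 40-day cutoff are all made up, and real data would have to be copied from the host's task list by hand.

```python
from statistics import mean


def group_by_deadline(tasks, cutoff_days):
    """Split tasks into 'short' and 'long' by how far away the deadline is."""
    groups = {"short": [], "long": []}
    for task in tasks:
        kind = "long" if task["deadline_days"] > cutoff_days else "short"
        groups[kind].append(task)
    return groups


def mean_cpu_seconds(tasks):
    """Average CPU time of a group, or None if the group is empty."""
    return mean(t["cpu_s"] for t in tasks) if tasks else None


# Hypothetical example records, not real results.
example_tasks = [
    {"deadline_days": 21, "cpu_s": 2400},
    {"deadline_days": 53, "cpu_s": 8800},
    {"deadline_days": 22, "cpu_s": 2500},
]

groups = group_by_deadline(example_tasks, cutoff_days=40)
for kind, members in groups.items():
    print(kind, len(members), mean_cpu_seconds(members))
```

Comparing the two systems group by group avoids mixing a long unit on one machine with a short unit on the other.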

mackerel · Apr 11, 2017

Both the Ryzen system and i3 system have completed some short units now. The Ryzen system (1700 at 3.2 GHz all cores active, 2666 ram) did 4 units averaging 2490s actual time, 2464s CPU time. The i3-4360 system (Haswell 3.7 GHz, 1600 ram) did 2 units averaging 1586s actual time, 1580s CPU time. If I allow for their clock differences, the Haswell i3 needs about 26% fewer clock cycles per unit, which works out to roughly 35% higher IPC.
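The clock adjustment can be written out as a short Python sketch. It uses the average CPU times and clocks quoted above and assumes both chips held those clocks for the whole run; the function name is just for illustration. It prints the gap both ways, as cycles saved and as an IPC ratio, since the two are easy to mix up.

```python
def gigacycles_per_unit(cpu_seconds, clock_ghz):
    """Approximate clock cycles (in billions) spent on one work unit."""
    return cpu_seconds * clock_ghz


# Average CPU times and clocks quoted in the post above.
ryzen_cycles = gigacycles_per_unit(2464, 3.2)    # Ryzen 1700
haswell_cycles = gigacycles_per_unit(1580, 3.7)  # i3-4360

# Fraction of cycles the Haswell saves relative to the Ryzen.
fewer_cycles = 1 - haswell_cycles / ryzen_cycles
# Work done per cycle by the Haswell relative to the Ryzen.
ipc_ratio = ryzen_cycles / haswell_cycles

print(f"Haswell uses {fewer_cycles:.1%} fewer cycles per unit")
print(f"Haswell IPC is {ipc_ratio:.2f}x the Ryzen's on this workload")
```

The two figures describe the same gap from opposite directions, which is why they are not the same percentage. With only 4 and 2 units in the sample, either number is a rough indication at best.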

I'll repeat this later when I get home from work, where both systems should have done more units, including long units, and it'll be interesting to see if that also applies there.

I want to throw in Skylake and Sandy Bridge systems too for a better picture. If I'm patient enough, I also have an Ivy Bridge Celeron which lacks AVX and is clocked at 2.3 GHz...

I could expand the comparison to other people's results if you can link to the machine task list on the S@H website, and also state what CPU is in the system, what the CPU clock is running at, how many tasks are running at once, if other tasks (e.g. GPU) are on the system, and if HT/SMT is enabled with or without manually setting affinity.

Keith Myers · Apr 12, 2017

Good idea to split off the Seti app discussion

mackerel said:
Both the Ryzen system and i3 system have completed some short units now. The Ryzen system (1700 at 3.2 GHz all cores active, 2666 ram) did 4 units averaging 2490s actual time, 2464s CPU time. The i3-4360 system (Haswell 3.7 GHz, 1600 ram) did 2 units averaging 1586s actual time, 1580s CPU time. If I allow for their clock differences, the Haswell i3 needs about 26% fewer clock cycles per unit, which works out to roughly 35% higher IPC.

I'll repeat this later when I get home from work, where both systems should have done more units, including long units, and it'll be interesting to see if that also applies there.

I want to throw in Skylake and Sandy Bridge systems too for a better picture. If I'm patient enough, I also have an Ivy Bridge Celeron which lacks AVX and is clocked at 2.3 GHz...

I could expand the comparison to other people's results if you can link to the machine task list on the S@H website, and also state what CPU is in the system, what the CPU clock is running at, how many tasks are running at once, if other tasks (e.g. GPU) are on the system, and if HT/SMT is enabled with or without manually setting affinity.

There are only SSE3 and AVX CPU apps for SETI V8. We used to have SSE4.1 and SSE4.2 CPU apps for the SETI V7 project, but the developer left the project and no one has stepped up to the plate to code the SSE4.x paths. They were always the fastest on the FX processors, even faster than the AVX path.

I seem to get paired up with wingmen that are running Xeon processors, I think because the scheduler thinks that a Ryzen with 16 logical cores is in the same class as Xeons with 16-56 cores. I have never found any statement confirming this... it's just my suspicion based on the circumstances.

The AVX app is the same one that runs on my FX CPUs and the Ryzen 1700X, but it runs much faster on the Ryzen for some reason. I don't think it is because of faster memory in the Ryzen system. When BOINC first profiles your system or runs the occasional benchmark, it uses Whetstone and Dhrystone tests that produce floating point and integer figures in millions of ops/second to help determine the estimated processing time for tasks it sends you. The 1700X tests much higher than the FX systems even though it is at a 600-800 MHz clock disadvantage. Here are my systems:

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8030022 Ryzen 1700X @ 3.85 GHz

https://setiathome.berkeley.edu/show_host_detail.php?hostid=5741129 FX-8370 @ 4.6 GHz

https://setiathome.berkeley.edu/show_host_detail.php?hostid=6279633 FX-8350 @ 4.4 GHz

So the 1700X tests 1000-1200 million ops/sec higher for floating point speed and about 4000 million ops/sec higher for integer speed compared to the FX CPUs. My 1700X at 16826.51 million ops/sec is faster than even the i7-7700K systems that occasionally show up as wingmen for my CPU tasks. At least that is how BOINC profiles them.

Each of my systems also uses two graphics cards along with the CPU. I only crunch SETI CPU tasks. The GPUs crunch 2 tasks concurrently on each card for either SETI, MilkyWay or Einstein. The FX systems have dual GTX 1070s and the Ryzen system has dual GTX 970s. So the FX systems are usually loaded very heavily, somewhere around 90-100% utilization. The Ryzen system is only loaded to 75% utilization because I am limiting its use of the CPU to only the physical cores, just as I do with the FX systems. I run 8 concurrent SETI tasks on the FX systems and 12 concurrent SETI tasks on the 1700X.

Here are the task lists for each system:

https://setiathome.berkeley.edu/results.php?hostid=8030022&offset=0&show_names=0&state=4&appid= 1700X

https://setiathome.berkeley.edu/results.php?hostid=5741129&offset=0&show_names=0&state=4&appid= FX-8370

https://setiathome.berkeley.edu/results.php?hostid=6279633&offset=0&show_names=0&state=4&appid= FX-8350

mackerel · Apr 12, 2017

Thanks, I'll have to give that a closer look some time.

My comparison on my own systems was delayed as servers were down for a large part of yesterday, so I still need to go back and look at that. I forgot they like to do that... other projects I'm active on are rarely down and I like to run with zero cache for various reasons.

Keith Myers · Apr 13, 2017

Discovered an interesting fact over in the FX to Ryzen thread in the Number Crunching forum at SETI: if you are running Linux, there are SSE4.1 and SSE4.2 CPU apps. In fact, all the flavors are available over at the Lunatics web site.
http://lunatics.kwsn.info/index.php?action=downloads;cat=48

Wish that was the case for Windows too.

Keith Myers · Apr 17, 2017

Just a note about my SETI recent average credit numbers. Since I put the Ryzen system together on Numbskull, I have seen my RAC climb by over 10K in a little over a month now. That is solely due to the change in the CPU. The graphics cards have remained the same. Before the update, my RAC was hovering around 38-39K typically. Today my RAC on Numbskull is over 50K. It is rapidly approaching the performance of my two other crunchers, which have higher performance graphics cards and are around 52K RAC. I will be updating Numbskull later this week with new 1070s to match my other systems. I expect then that Numbskull will quickly eclipse my faster machines and will become my most powerful cruncher.
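As background on why RAC takes weeks to settle after a hardware change: BOINC's recent average credit is a decaying average with a half-life of one week. The Python sketch below is a simplified model of that, assuming credit arrives at a perfectly steady daily rate (real credit is granted in lumps as wingmen validate). The function name and the 60,000 target rate are placeholders for illustration, not measured values.

```python
def rac_after(start_rac, daily_credit, days, half_life_days=7.0):
    """Simplified RAC model: exponential approach to a steady daily credit rate.

    Assumes a constant credit rate and the one-week half-life BOINC uses
    for recent average credit.
    """
    decay = 0.5 ** (days / half_life_days)
    return daily_credit + (start_rac - daily_credit) * decay


# Start near the pre-upgrade RAC from the post; the target rate is hypothetical.
for day in (7, 14, 28):
    print(day, round(rac_after(39_000, 60_000, day)))
```

In this model half of the remaining gap closes every week, so a machine needs about a month of steady running before its RAC is close to the new plateau.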

Keith Myers · Apr 26, 2017

My guess was correct. Put in dual GTX 1070's a couple days ago and my SETI RAC hit 59K before today's outage. The FX systems maxed out around 52K. So GPUs are equal across all machines. The extra RAC bump is solely from the extra 8 cores or the intrinsic math performance of Ryzen over FX.

Keith Myers · May 2, 2017

Believe that the Ryzen system has reached its normal output plateau of 64K/day. A substantial improvement over the similarly GPU equipped FX based systems with daily output around 52K/day.

mackerel · May 2, 2017

I never did attempt that IPC comparison did I? A bit late now, the units I did are long gone. I'd have a look at Keith's but, yet again, the server is down. No other project I do has that much downtime... no wonder they built in stupid amounts of cache potential.

Keith Myers · May 2, 2017

Yes, Outage Tuesdays are ridiculously long now. The main page used to say the outage for backup maintenance would last for 4-5 hours. The outages now run from a minimum of 10 hours to as long as 12-13 hours. They went down this morning at 5AM PDT. I don't expect them back till 6PM PDT this evening.

It will be hard to get a real idea of the IPC comparison between my FX and 1700X systems because I skew the APRs of the systems with some very aggressive task rescheduling between CPU <> GPU and Arecibo <> BLC tasks. About the only way to do a real comparison is to look at individual task completion times for tasks from the same antenna with the same or very close angle ranges. That is the only way to do an apples-to-apples comparison of the IPC performance.

A good portion of the increase in my daily RAC for the 1700X is simply because it is doing more work per hour than the FX systems, since it employs double the number of CPU cores crunching simultaneously. There are, however, some very real improvements in crunching times for the 1700X using the AVX CPU app for the BLC tasks.
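The matching described above could be sketched in Python along these lines. Everything here is illustrative: the record fields (`kind`, `angle_range`, `cpu_s`), the tolerance and the sample numbers are made up, and real values would have to be copied from each host's task list.

```python
def match_tasks(host_a, host_b, max_ar_diff=0.01):
    """Pair each host_a task with the closest-angle-range host_b task of the same kind."""
    pairs = []
    used = set()  # indices of host_b tasks already paired
    for a in host_a:
        best = None
        for i, b in enumerate(host_b):
            if i in used or b["kind"] != a["kind"]:
                continue
            diff = abs(a["angle_range"] - b["angle_range"])
            if diff <= max_ar_diff and (best is None or diff < best[0]):
                best = (diff, i)
        if best is not None:
            used.add(best[1])
            pairs.append((a, host_b[best[1]]))
    return pairs


def runtime_ratios(pairs):
    """CPU-time ratio (first host / second host) for each matched pair."""
    return [a["cpu_s"] / b["cpu_s"] for a, b in pairs]


# Hypothetical records, not real results.
fx_tasks = [
    {"kind": "blc", "angle_range": 0.012, "cpu_s": 7400},
    {"kind": "arecibo", "angle_range": 0.42, "cpu_s": 7000},
]
ryzen_tasks = [
    {"kind": "blc", "angle_range": 0.011, "cpu_s": 3500},
    {"kind": "arecibo", "angle_range": 1.5, "cpu_s": 3000},
]

pairs = match_tasks(fx_tasks, ryzen_tasks)
print(runtime_ratios(pairs))
```

Tasks with no close counterpart on the other host are simply dropped, so only like-for-like pairs feed into the ratio. To turn a runtime ratio into an IPC estimate, the clock speeds of the two hosts still have to be factored in.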

Keith Myers · Sep 20, 2017

Forgot about this thread for quite a while it seems. Anyway, my steady state RAC or average credits per day for the Ryzen 1700X system is around 95K per day. That is with two GTX 1070s plus one GTX 1060 graphics card crunching GPU tasks 24/7, plus 8 physical cores of the CPU crunching simultaneously.

FYI, the latest cruncher I recently built from the cast-off parts that were in the previous incarnation of the Ryzen cruncher has a RAC of around 141,000 credits per day. It runs the special CUDA 8.0 app under Linux with three GTX 970 graphics cards on the old FX-8300 CPU, which is running 4 CPU tasks simultaneously on the physical cores. It currently occupies the #8 position in the Top 100 hosts list at SETI.
