FEATURED Marathon Season VII October: y-cruncher - Pi-1b

caddi daddi · Oct 11, 2018

Playing with it just a little bit, it looks like pure RAM speed is what it likes, so loose timings don't give you a major hit.

mllrkllr88 · Oct 11, 2018

Johan45 said:
Nice to see you found your way here mllrkllr

Whoot, thanks, this is a fun comp you have going here

I started to set up a 2990WX last night to run for this comp. I am going to check out y-cruncher on that platform for science and see how it scales with cores. Any predictions for how the 2990WX will compare to an Intel chip like the 7940X, for instance?

EarthDog · Oct 11, 2018

Good question...

I enabled HT on the 7960X I have (AVX set to run at 4.2) and it scored the same. That said, I have no idea why... if there was temperature or VRM throttling, I don't know as I walked away from it. I'd be floored if this didn't scale with HT.

mackerel · Oct 11, 2018

7960X has a lot of cores but still only quad channel ram. Maybe it is 100% ram limited.

EarthDog · Oct 11, 2018

Perhaps, though that isn't where my money would go. I'd bet more on some kind of throttling, personally. I was already at the tipping point temp-wise without HT; I enabled it and dropped AVX to 42x (from 43x), which may not have been enough.

Mysticial · Oct 11, 2018

mllrkllr88 said:
Whoot, thanks, this is a fun comp you have going here. I started to set up a 2990WX last night to run for this comp. I am going to check out y-cruncher on that platform for science and see how it scales with cores. Any predictions for how the 2990WX will compare to an Intel chip like the 7940X, for instance?

It probably won't scale well. Still only 4 memory channels. So my guess is that it'll score similarly to the 7940X (which has the advantage of AVX512).

Epyc will beat the 2990WX hands down no overclocking needed.

EarthDog said:
Perhaps, though that isn't where my money would go. I'd bet more on some kind of throttling, personally. I was already at the tipping point temp-wise without HT; I enabled it and dropped AVX to 42x (from 43x), which may not have been enough.

What's your AVX512 set to? That's the one that matters.

Johan45 · Oct 11, 2018

mllrkllr88 said:
Whoot, thanks, this is a fun comp you have going here. I started to set up a 2990WX last night to run for this comp. I am going to check out y-cruncher on that platform for science and see how it scales with cores. Any predictions for how the 2990WX will compare to an Intel chip like the 7940X, for instance?

If it scales linearly with the cores/threads, that should put you around 17 seconds, but I doubt it's going to do that. Maybe in the 20 second range, as a guess.

EarthDog · Oct 11, 2018

Mysticial said:
It probably won't scale well. Still only 4 memory channels. So my guess is that it'll score similarly to the 7940X (which has the advantage of AVX512).

Epyc will beat the 2990WX hands down no overclocking needed.

What's your AVX512 set to? That's the one that matters.

Correct. Sorry, I use that interchangeably (incorrectly)... AVX-512 was also set to 42.

Mysticial · Oct 11, 2018

EarthDog said:
Correct. Sorry, I use that interchangeably (incorrectly)... AVX-512 was also set to 42.

Oh ok.

TBH, 42 is kinda high for AVX512. For these HCC chips, I'd keep it below 4.0 GHz unless you're delidded and subzero.

I currently run my 7940X @ 3.6 GHz AVX512. That's already high enough for it to pull around 270 - 300W under load. I wouldn't dare try it at 4.2 GHz - which would probably go upwards of 500W under the required voltages.

mackerel · Oct 11, 2018

Maybe a perk of "only" having a 6 core Skylake-X, I could do 1b and 10b at 4.3 GHz. For 25m, I could risk running at 4.5 as it lasts under a second

Certainly wasn't stable though.

Out of interest, which part(s) of the process is the AVX-heavy part? Looking at the output, "summing series" seems to take ballpark 80% of the time, and everything else the remainder.

I'm also thinking, it may be possible to get an estimate for the CPU:ram balance by performing multiple tests in various configurations... if this were Prime95 bad, I wouldn't want more than 8 fast (non-AVX512) cores on quad channel ram. I know it was said y-cruncher isn't like Prime95, but there will still be some point where it impacts.

Mysticial · Oct 11, 2018

mackerel said:
Maybe a perk of "only" having a 6 core Skylake-X, I could do 1b and 10b at 4.3 GHz. For 25m, I could risk running at 4.5 as it lasts under a second. Certainly wasn't stable though.

Out of interest, which part(s) of the process is the AVX-heavy part? Looking at the output, "summing series" seems to take ballpark 80% of the time, and everything else the remainder.

Pretty much all of it except for the initial memory allocation, and digit output to disk. It's been AVX'ed pretty much from start to finish.

I'm also thinking, it may be possible to get an estimate for the CPU:ram balance by performing multiple tests in various configurations... if this were Prime95 bad, I wouldn't want more than 8 fast (non-AVX512) cores on quad channel ram. I know it was said y-cruncher isn't like Prime95, but there will still be some point where it impacts.

Here are the overall distributions for various apps:

Those benchmarks were done on my 7900X with ram somewhere between 3200 - 3800. So it's nowhere near as memory-bound as the HCCs. The graph on my 7940X is much worse. But I don't have it in front of me.

I don't have the bandwidth usage as a function of time graphs in front of me. But y-cruncher does go back-and-forth between bandwidth-heavy and bandwidth-light stages - which is why there's a distribution in the graph.

EarthDog · Oct 11, 2018

Mysticial said:
Oh ok.

TBH, 42 is kinda high for AVX512. For these HCC chips, I'd keep it below 4.0 GHz unless you're delidded and subzero.

I currently run my 7940X @ 3.6 GHz AVX512. That's already high enough for it to pull around 270 - 300W under load. I wouldn't dare try it at 4.2 GHz - which would probably go upwards of 500W under the required voltages.

I'm almost imagining VRM throttling. I'll dial up XTU, run it at 4.2 all c/t, and see what the throttling reason is. Should be either temps, power, or VRMs. This is under a 3x120 rad with some Yate Loons set to 1K RPM. CPU is at 1.18V for 4.4 GHz / -2 AVX/AVX512.

mllrkllr88 · Oct 11, 2018

I was running the 7940X at a full 4.5 GHz @ 1.26V (BIOS) with NO AVX offset. AVX512 is no joke; even with my good water loop the temps were hitting 90C. I never tried an offset, but this chip can run 5 GHz with HT disabled for most things, so I was a bit shocked to see 4.5 GHz as the max.

The 2990WX score is looking bad, I am still working on it to figure out what it needs.

mllrkllr88 · Oct 12, 2018

mllrkllr88 / 2990WX / Custom Water / 32.586

I did a bit of testing with the Threadripper tonight. It seems to be scaling with both memory frequency and timings. The unfortunate thing is that this bench requires too much real memory (7 GB) to make 3600 CL12-11 possible.

50.7 Seconds: 4.0G, SMT Disable, 3200c14 Loose

39.8 Seconds: 4.0G, SMT Enable, 3200c14 Loose

35.0 Seconds: 4.0G, SMT Enable, 3600c14 Loose

32.5 Seconds: 4.0G, SMT Enable, 3600c14 Tight (Submission for comp)
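To put those four runs in perspective, here is a rough sketch that works out how much time each change saved relative to the run before it. The times are the ones listed above; the shortened labels and the `step_gains` helper are made up for illustration, and these are single runs, so treat the percentages as ballpark only.

```python
# Times (seconds) as listed in the post above; labels are shortened by hand.
runs = [
    ("SMT off, 3200c14 loose", 50.7),
    ("SMT on, 3200c14 loose", 39.8),
    ("SMT on, 3600c14 loose", 35.0),
    ("SMT on, 3600c14 tight", 32.5),
]

def step_gains(runs):
    """Return (label, percent of time saved vs. the previous run) per step."""
    gains = []
    for (_, prev), (label, cur) in zip(runs, runs[1:]):
        saved = 100.0 * (prev - cur) / prev
        gains.append((label, round(saved, 1)))
    return gains

for label, pct in step_gains(runs):
    print(f"{label}: {pct}% less time than the previous run")
```

On these numbers, enabling SMT is the biggest single step, followed by the memory frequency bump and then the tighter timings.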

Mysticial · Oct 12, 2018

mllrkllr88 said:
mllrkllr88 / 2990WX / Custom Water / 32.586

I did a bit of testing with the Threadripper tonight. It seems to be scaling with both memory frequency and timings. The unfortunate thing is that this bench requires too much real memory (7 GB) to make 3600 CL12-11 possible.

Something is weird with the 1st and last ones.

It's very subtle, but if you look at the lines: "Working Memory... 5.06 GiB (locked, spread: 50%/2)"

It's at 50% for the 1st and 4th ones. And it's at 100% for the 2nd and 3rd ones. Higher is better.

The %/2 tells you how well the memory is distributed across the two NUMA nodes. 100%/2 means it's perfectly distributed across 2 NUMA nodes. 0% means it's all on 1 NUMA node (no distribution at all).

TR has 2 NUMA nodes - each with 2 memory channels. In order to get full memory bandwidth, the memory has to be distributed evenly across both nodes. Otherwise there will be an imbalance in bandwidth usage across the memory channels.

It's very well distributed in the 2nd and 3rd runs, but poorly distributed in the 1st and 4th runs.

I can't explain why the 1st and 4th runs are so poor. The program tries its best to evenly spread out the memory, but this isn't always possible if one or more of the nodes is out of memory.

So it might be worth rerunning a few times to see if they are consistently at 50%. You may even need to try rebooting. It's hard to predict performance on NUMA architectures, but there's a possibility that if you rerun your best settings enough times, you'll eventually land one with a 100% spread - which may have the lowest time.

Another thing you can try is to change the memory mode to "UMA Distributed Mode". I believe the default is "Local Mode NUMA". Though I've never actually seen a TR BIOS so I don't know for sure.
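If you end up rerunning a lot, checking the spread by eye gets tedious. Below is a minimal sketch for pulling the spread value out of a saved output line. It assumes the line looks exactly like the "Working Memory" line quoted above; other y-cruncher versions may print it differently, and `parse_spread` is a made-up helper name, not anything from the program itself.

```python
import re

# Matches the spread field in a line such as:
#   Working Memory... 5.06 GiB (locked, spread: 50%/2)
# The format is assumed from the single line quoted in this thread.
SPREAD_RE = re.compile(r"spread:\s*(\d+)%/(\d+)")

def parse_spread(line):
    """Return (percent, numa_nodes) from a Working Memory line, or None."""
    match = SPREAD_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

sample = "Working Memory... 5.06 GiB (locked, spread: 50%/2)"
print(parse_spread(sample))
```

A run whose first value comes back as 100 is one where the memory was evenly distributed across the nodes.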

(G{in}[AK)TION] · Oct 12, 2018

Wait, are we supposed to be submitting these to HWBOT as well?

mackerel · Oct 12, 2018

The team cup uses this, but only for quad cores. Otherwise you can still submit for your personal/team general position.

custom90gt · Oct 12, 2018

Custom90gt / 7920x / AIO / 28.09

Johan45 · Oct 12, 2018

Johan45/ Ryzen 2700X/ LN2/ 57.934s

mllrkllr88 · Oct 12, 2018

Mysticial said:
Something is weird with the 1st and last ones.

So it might be worth rerunning a few times to see if they are consistently at 50%. You may even need to try rebooting. It's hard to predict performance on NUMA architectures, but there's a possibility that if you rerun your best settings enough times, you'll eventually land one with a 100% spread - which may have the lowest time.

Wow, nice info, thank you for that!! +1

I toggled a bunch of stuff in the BIOS and played around with different configurations, but I was not able to find a setting that yields 100%/2 every time. However, I was able to hit that lucky run twice within 30+ sample runs, so it is possible. I don't have a very good handle on this bench yet, but if it ever becomes a global bench on HWBOT then I am sure we will all be learning a lot more about it.

Here is the "lucky" run:
mllrkllr88 / 2990WX / Custom Water / 23.598s
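For anyone else chasing the 100%/2 spread, here is a back-of-the-envelope sketch of how many reruns it might take. It assumes the hit rate is about 2 in 30 (treating "30+" as exactly 30) and that each run is an independent coin flip, which may well not hold on a real system since reboots and memory state matter. The function names are made up for illustration.

```python
import math

def p_at_least_one(p, n):
    """Chance of seeing at least one lucky run in n independent tries."""
    return 1.0 - (1.0 - p) ** n

def runs_needed(p, target):
    """Smallest number of tries giving at least `target` chance of one hit."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# Assumption: 2 lucky runs out of roughly 30 attempts.
p_lucky = 2 / 30

print(round(p_at_least_one(p_lucky, 30), 3))
print(runs_needed(p_lucky, 0.9))
```

Under those assumptions, 30 reruns give roughly an 87% chance of at least one well-spread run, and it takes around 34 reruns to reach 90%.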
