
FEATURED Marathon Season VII October: y-cruncher - Pi-1b

Playing with it just a little bit, it looks like pure RAM speed is what it likes, so loose timings don't give you a major hit.
 
Nice to see you found your way here mllrkllr

Whoot, thanks, this is a fun comp you have going here :D I started to set up the 2990WX last night to run for this comp. I am going to check out y-cruncher on that platform for science and see how it scales with cores. Any predictions on how the 2990WX will compare to an Intel chip like the 7940X, for instance?
 
Good question...

I enabled HT on the 7960X I have (AVX set to run at 4.2) and it scored the same. That said, I have no idea why... if there was temperature or VRM throttling, I don't know as I walked away from it. I'd be floored if this didn't scale with HT.
 
Perhaps, though that isn't where my money would go. I'd bet more on some kind of throttling, personally. I was already at the tipping point temp-wise without HT; I enabled it and dropped AVX to 42x (from 43x), which may not have been enough.
 
Whoot, thanks, this is a fun comp you have going here :D I started to set up the 2990WX last night to run for this comp. I am going to check out y-cruncher on that platform for science and see how it scales with cores. Any predictions on how the 2990WX will compare to an Intel chip like the 7940X, for instance?

It probably won't scale well. Still only 4 memory channels. So my guess is that it'll score similarly to the 7940X (which has the advantage of AVX512).

Epyc will beat the 2990WX hands down no overclocking needed.


Perhaps, though that isn't where my money would go. I'd bet more on some kind of throttling, personally. I was already at the tipping point temp-wise without HT; I enabled it and dropped AVX to 42x (from 43x), which may not have been enough.

What's your AVX512 set to? That's the one that matters.
 
Whoot, thanks, this is a fun comp you have going here :D I started to set up the 2990WX last night to run for this comp. I am going to check out y-cruncher on that platform for science and see how it scales with cores. Any predictions on how the 2990WX will compare to an Intel chip like the 7940X, for instance?

If it scales linearly with the cores/threads, that should put you around 17 seconds, but I doubt it's going to do that. Maybe in the 20 second range as a guess.
 
It probably won't scale well. Still only 4 memory channels. So my guess is that it'll score similarly to the 7940X (which has the advantage of AVX512).

Epyc will beat the 2990WX hands down no overclocking needed.




What's your AVX512 set to? That's the one that matters.
Correct. Sorry, I use that interchangeably (incorrectly)... AVX-512 was also set to 42.
 
Correct. Sorry, I use that interchangeably (incorrectly)... AVX-512 was also set to 42.

Oh ok. :)

TBH, 42 is kinda high for AVX512. For these HCC chips, I'd keep it below 4.0 GHz unless you're delidded and subzero.

I currently run my 7940X @ 3.6 GHz AVX512. That's already high enough for it to pull around 270 - 300W under load. I wouldn't dare try it at 4.2 GHz - which would probably go upwards of 500W under the required voltages.
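
For a rough feel of why going from 3.6 to 4.2 GHz gets so expensive, a common first-order approximation is that dynamic power scales with frequency times voltage squared. The sketch below applies that rule of thumb. The 285 W starting point is simply the middle of the 270 - 300W range above, and both voltages are made-up placeholders for illustration, not measured values from this chip.

```python
def scaled_power(p0_watts, f0_ghz, v0, f1_ghz, v1):
    """First-order dynamic power estimate: P scales with f * V^2.

    Ignores leakage and temperature effects, so treat the result as a
    ballpark figure only.
    """
    return p0_watts * (f1_ghz / f0_ghz) * (v1 / v0) ** 2

# 285 W: midpoint of the 270 - 300W figure quoted above.
# 1.00 V and 1.20 V: HYPOTHETICAL voltages chosen only for illustration.
estimate = scaled_power(285.0, 3.6, 1.00, 4.2, 1.20)
print(f"Estimated power at 4.2 GHz: {estimate:.0f} W")
```

With those assumed voltages the estimate lands in the high 400s of watts. Leakage rises with voltage and temperature, which this model leaves out, so the real figure could well be higher.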
 
Maybe a perk of "only" having a 6 core Skylake-X, I could do 1b and 10b at 4.3 GHz. For 25m, I could risk running at 4.5 as it lasts under a second :) Certainly wasn't stable though.

Out of interest, which part(s) of the process is the AVX-heavy part? Looking at the output, "summing series" seems to take ballpark 80% of the time, and everything else the remainder.

I'm also thinking, it may be possible to get an estimate for the CPU:ram balance by performing multiple tests in various configurations... if this were Prime95 bad, I wouldn't want more than 8 fast (non-AVX512) cores on quad channel ram. I know it was said y-cruncher isn't like Prime95, but there will still be some point where it impacts.
 
Maybe a perk of "only" having a 6 core Skylake-X, I could do 1b and 10b at 4.3 GHz. For 25m, I could risk running at 4.5 as it lasts under a second :) Certainly wasn't stable though.

Out of interest, which part(s) of the process is the AVX-heavy part? Looking at the output, "summing series" seems to take ballpark 80% of the time, and everything else the remainder.

Pretty much all of it except for the initial memory allocation, and digit output to disk. It's been AVX'ed pretty much from start to finish.

I'm also thinking, it may be possible to get an estimate for the CPU:ram balance by performing multiple tests in various configurations... if this were Prime95 bad, I wouldn't want more than 8 fast (non-AVX512) cores on quad channel ram. I know it was said y-cruncher isn't like Prime95, but there will still be some point where it impacts.

Here are the overall distributions for various apps:
Those benchmarks were done on my 7900X with ram somewhere between 3200 - 3800. So it's nowhere near as memory-bound as the HCCs. The graph on my 7940X is much worse. But I don't have it in front of me.

I don't have the bandwidth usage as a function of time graphs in front of me. But y-cruncher does go back-and-forth between bandwidth-heavy and bandwidth-light stages - which is why there's a distribution in the graph.
 
Oh ok. :)

TBH, 42 is kinda high for AVX512. For these HCC chips, I'd keep it below 4.0 GHz unless you're delidded and subzero.

I currently run my 7940X @ 3.6 GHz AVX512. That's already high enough for it to pull around 270 - 300W under load. I wouldn't dare try it at 4.2 GHz - which would probably go upwards of 500W under the required voltages.
I'm almost imagining VRM throttling. I'll dial up XTU, run it at 4.2 all c/t, and see what the throttling reason is. Should be either temps, power, or VRMs. This is under a 3x120 rad with some Yate Loons set to 1K RPM. CPU is at 1.18V for 4.4 GHz / -2 AVX/AVX512.
 
I was running the 7940X at a full 4.5 GHz @ 1.26V (BIOS) with NO AVX offset. AVX512 is no joke; even with my good water loop the temps were hitting 90°C. I never tried an offset, but this chip can run 5 GHz with HT disabled for most things, so I was a bit shocked to see 4.5 GHz as the max.

The 2990WX score is looking bad, I am still working on it to figure out what it needs.
 
mllrkllr88 / 2990WX / Custom Water / 32.586

I did a bit of testing with the Threadripper tonight. It seems to scale with both memory frequency and timings. The unfortunate thing is that this bench requires too much real memory (7 GB) to make 3600 CL12-11 possible.

50.7 Seconds: 4.0G, SMT Disable, 3200c14 Loose
NOHT_3200.png

39.8 Seconds: 4.0G, SMT Enable, 3200c14 Loose
HT_3200.png

35.0 Seconds: 4.0G, SMT Enable, 3600c14 Loose
HT_3600.png

32.5 Seconds: 4.0G, SMT Enable, 3600c14 Tight (Submission for comp)
HT_3600_TIGHT.png
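
As a rough sketch of how those four runs compare, the snippet below works out each run's speedup over the first one and how much time each step saved over the previous run. The times are the ones posted above; the short labels and the `relative_gains` helper are just illustrative names.

```python
# Times (seconds) copied from the four runs listed above; lower is better.
runs = [
    ("SMT off, 3200c14 loose", 50.7),
    ("SMT on,  3200c14 loose", 39.8),
    ("SMT on,  3600c14 loose", 35.0),
    ("SMT on,  3600c14 tight", 32.5),
]

def relative_gains(runs):
    """Return (label, seconds, speedup vs. first run, % time saved vs. previous run)."""
    baseline = runs[0][1]
    previous = baseline
    rows = []
    for label, seconds in runs:
        speedup = baseline / seconds
        saved_pct = (previous - seconds) / previous * 100.0
        rows.append((label, seconds, speedup, saved_pct))
        previous = seconds
    return rows

for label, seconds, speedup, saved_pct in relative_gains(runs):
    print(f"{label}: {seconds:5.1f} s | {speedup:.2f}x vs first | "
          f"{saved_pct:4.1f}% less time than previous")
```

Taken at face value, enabling SMT is the biggest single step (roughly 21% less time), followed by the memory frequency bump (about 12%) and then the tighter timings (about 7%). These are single runs, so run-to-run variation is not accounted for.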
 
mllrkllr88 / 2990WX / Custom Water / 32.586

I did a bit of testing with the Threadripper tonight. It seems to scale with both memory frequency and timings. The unfortunate thing is that this bench requires too much real memory (7 GB) to make 3600 CL12-11 possible.

Something is weird with the 1st and last ones.

It's very subtle, but if you look at the lines: "Working Memory... 5.06 GiB (locked, spread: 50%/2)"

It's at 50% for the 1st and 4th ones. And it's at 100% for the 2nd and 3rd ones. Higher is better.

The %/2 tells you how well the memory is distributed across the two NUMA nodes. 100%/2 means it's perfectly distributed across 2 NUMA nodes. 0% means it's all on 1 NUMA node (no distribution at all).

TR has 2 NUMA nodes - each with 2 memory channels. In order to get full memory bandwidth, the memory has to be distributed evenly across both nodes. Otherwise there will be an imbalance in bandwidth usage across the memory channels.

It's very well distributed in the 2nd and 3rd runs, but poorly distributed in the 1st and 4th runs.

I can't explain why the 1st and 4th runs are so poor. The program tries its best to evenly spread out the memory, but this isn't always possible if one or more of the nodes is out of memory.

So it might be worth rerunning a few times to see if they are consistently at 50%. You may even need to try rebooting. It's hard to predict performance on NUMA architectures, but there's a possibility that if you rerun your best settings enough times, you'll eventually land one with a 100% spread - which may have the lowest time.

Another thing you can try is to change the memory mode to "UMA Distributed Mode". I believe the default is "Local Mode NUMA". Though I've never actually seen a TR BIOS so I don't know for sure.
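
If you end up rerunning many times, the spread value described above can be pulled out of saved output automatically. This is a minimal sketch: the regular expression assumes the line always looks like the single example quoted earlier, and `parse_spread`, `well_spread` and the 100% threshold are hypothetical names and choices for illustration, not anything y-cruncher itself provides.

```python
import re

# Pattern for the status line quoted above, e.g.
#   Working Memory... 5.06 GiB (locked, spread: 50%/2)
# ASSUMPTION: the layout of the line is inferred from that one example.
SPREAD_RE = re.compile(r"spread:\s*(\d+(?:\.\d+)?)%/(\d+)")

def parse_spread(line):
    """Return (spread_percent, numa_nodes), or None if no spread field is found."""
    match = SPREAD_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))

def well_spread(line, threshold=100.0):
    """True when the reported spread reaches the (hypothetical) threshold."""
    parsed = parse_spread(line)
    return parsed is not None and parsed[0] >= threshold

sample = "Working Memory... 5.06 GiB (locked, spread: 50%/2)"
print(parse_spread(sample))
print(well_spread(sample))
```

Filtering a batch of saved runs with `well_spread` would make it easy to see how often a run lands on a full spread, and whether those runs are also the fastest.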
 
The team cup uses this, but only for quad cores. Otherwise you can still submit for your personal/team general position.
 
Something is weird with the 1st and last ones.

So it might be worth rerunning a few times to see if they are consistently at 50%. You may even need to try rebooting. It's hard to predict performance on NUMA architectures, but there's a possibility that if you rerun your best settings enough times, you'll eventually land one with a 100% spread - which may have the lowest time.
Wow, nice info, thank you for that!! +1

I toggled a bunch of stuff in the BIOS and played around with different configurations, but I was not able to find a setting that yields 100%/2 every time. However, I was able to hit that lucky run twice within 30+ sample runs, so it is possible. I don't have a very good handle on this bench yet, but if it ever becomes a global bench on HWBOT then I am sure we will all be learning a lot more about it.

Here is the "lucky" run:
mllrkllr88 / 2990WX / Custom Water / 23.598s
snaphsot0002.png
 