Just when you think the OC is stable...

mackerel

Member
Joined
Mar 7, 2008
System in question is an i3-8350K, currently at stock core clocks, on an ASRock Z370 Pro4 mobo. This is the one I got B-die to play with, eventually settling on a manually tuned 3866C17. To get the most out of the ram, I also increased cache to 4.0, up from 3.7 stock. Until yesterday, this ran everything I threw at it without error.

I do prime number hunting, and the software used is comparable in stress to Prime95. This is in part why I don't overclock the CPU cores, as feeding them with fast enough ram is the bigger problem. At the start of the year there was a 15-day challenge with large units, around 2000k FFT. This is usually ram bandwidth limited. It did that fine and without error.

More recently is a different challenge, where there are many strategies that may be employed. My 1st goal was to find any top 5000 reportable prime number, so running small units (but big enough for top 5000) gives the best chance there. It ran that for some days, again without error. Here FFT sizes are typically 120k so rather small and easily fit within CPU cache with room to spare.

Once I had found one of those prime numbers, it was over to my second phase, to find a million digit prime number. These tasks are 256k FFT size, so would use 2MB/task. At one task per core, that ideally fits in the L3 cache outside of some older i3/i5 CPUs. Now I'm getting errors. Of 48 validated units, 2 of them were declared invalid, meaning on double check my result did not agree with others.
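
A quick sketch of the size arithmetic in this post. It assumes 8 bytes (one double) per FFT element, which reproduces the 2MB/task figure, and assumes 4 cores with an 8 MiB shared L3 for the i3-8350K. The function name is only for illustration.

```python
BYTES_PER_ELEMENT = 8          # assumption: one double-precision value per FFT element
L3_BYTES = 8 * 1024 * 1024     # assumption: 8 MiB shared L3 on the i3-8350K
CORES = 4                      # assumption: one task per core on a 4-core CPU

def fft_footprint_bytes(fft_size_k):
    """Approximate data size of one task for an FFT of fft_size_k * 1024 elements."""
    return fft_size_k * 1024 * BYTES_PER_ELEMENT

# The three FFT sizes mentioned in the post: small, million-digit, and challenge units.
for size_k in (120, 256, 2000):
    per_task = fft_footprint_bytes(size_k)
    total = per_task * CORES
    verdict = "fits in" if total <= L3_BYTES else "exceeds"
    print(f"{size_k}k FFT: {per_task / 2**20:.2f} MiB/task, "
          f"{total / 2**20:.2f} MiB for {CORES} tasks, {verdict} L3")
```

Under these assumptions four 256k tasks add up to exactly the L3 capacity, so there is no headroom for anything else and some traffic to ram would not be surprising.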

I guess this is part of the fun when determining 24/7 stability. Different tasks will stress differently, so you can appear to be fine for a long time until you do something you haven't already, and uncover a weakness. For now, I dropped the cache down 100 MHz and will see if that helps. It may be an exercise for another day, but has anyone determined how fast Intel caches need to run at to not significantly limit ram performance? I've been working on an assumption that nominal cache speed should exceed ram speed, but have no proof of that. e.g. for my 3866 ram, I'd target 3900+ cache. I think I just talked myself into doing more testing later.
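
For reference, the working assumption above can be written down as a simple check, alongside the theoretical peak ram bandwidth it is meant to keep up with. This is only a sketch: it assumes dual-channel DDR4 with a 64-bit bus per channel, and the "cache MHz above ram MT/s" comparison is the unproven rule of thumb from the post, not a known Intel requirement.

```python
def dram_peak_gbs(mt_per_s, channels=2, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (assumes a 64-bit bus per channel)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

def cache_exceeds_ram(cache_mhz, ram_mt_per_s):
    """Rule of thumb from the post: nominal cache MHz should exceed the ram rating."""
    return cache_mhz > ram_mt_per_s

print(f"DDR4-3866 dual channel: {dram_peak_gbs(3866):.1f} GB/s theoretical peak")
for cache_mhz in (3700, 3800, 3900, 4000):
    print(cache_mhz, "MHz cache passes rule:", cache_exceeds_ram(cache_mhz, 3866))
```

By this rule the stock 3.7 cache and a 3.8 cache would both fall short of 3866 ram, while 3.9 and 4.0 pass.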

CPU temps are <60C so overheating is not a concern.
 
For prime number searching, this is good to know. Surely you are stable doing everything else?
 
What software do you use for prime number searching and do the temperatures get as high as prime95 FMA3 torture test?
 
Stable as far as I can tell... but I'm only stable until I'm unstable. On that note, I was expecting more bad results as validations come in... but so far, there's not been any more since those 2. I'm now on 66 good, 2 bad, although I can't be sure if some of those "good" units will be from after I dropped the cache slightly. The validation is irregular in time as it is done by other BOINC users. Work produces a checksum, and if they match the results are assumed good. In theory there might be a 1 in 2^64 chance of two wrong units producing the same value, but obviously that's highly unlikely. In order of likelihood, bad results are hardware problems by a long way; software problems are not impossible, so they are a distant second. I guess what I'm interested in is whether I see more bad results from before I dropped the cache, and hopefully no bad results afterwards. Otherwise there is more work to do.
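
To put a rough number on "hopefully no bad results afterwards": a minimal sketch of how many clean units in a row it would take before luck alone becomes an unlikely explanation. It assumes each unit fails independently at a constant rate, which is a simplification, and the 95% threshold and function name are arbitrary choices for illustration.

```python
import math

def clean_units_needed(bad_rate, confidence=0.95):
    """Smallest n where n clean units in a row would occur with probability
    below (1 - confidence) if the bad rate had not actually changed."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - bad_rate))

rate_first = 2 / 48   # 2 invalid out of 48 validated units
rate_now = 2 / 68     # 2 bad against 66 good

print("clean units needed at 2/48:", clean_units_needed(rate_first))
print("clean units needed at 2/68:", clean_units_needed(rate_now))
print(f"chance of two wrong results sharing a 64-bit checksum: {2.0 ** -64:.3e}")
```

With error rates this low, a handful of good results after a settings change says very little either way.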

The software is called LLR if you want to look it up. The thing to note is, it uses the same gwnum library that does the heavy lifting in Prime95. I think it was built using the code from a late 28.x version, so won't have the minor tweaks in 29.x but they're not that significant (except for Ryzen). In terms of loading, at the same FFT size and thread count, they behave the same.
 
I think Intel says keep the cache within 100 MHz of core speed (no help there). My first thought is the cache should be at least as fast as the RAM, but as that's a $200 apiece set of sticks, I won't have any personal data on that for quite a while. The memory I/O can't go any faster than the cache can handle transactions, can it? And vice versa?
 
Where do they say that? I'm not sure many of my CPUs do that at stock. Further, a fast cache can help even with slow ram, if the software runs out of cache.
 
That's why my cache is OC'd. It helps with things that aren't large enough to go to RAM. I was just referring to the cache not bottlenecking the memory due to a speed deficiency relative to the RAM. I'll try to find that info again. I did a lot of research on cache speed and RAM recently, but found way more data than I can keep in my head with the short exposure/usage.
 
On Intel mainstream processors, the L1 and L2 caches are the fastest and run at the core clock. The L3 cache is the third fastest and runs with the ring bus = (Uncore) = (System Agent) on a separate clock, which is what we can also overclock. The CPU searches for instructions and data in L1 first, then L2, then L3, then main memory, and finally the HDD.
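
The lookup order described above can be sketched as a walk down the hierarchy. The capacities are assumptions for a typical Coffee Lake quad core (they are not stated in the thread), and the function name is hypothetical.

```python
# Assumed capacities, listed in the order the CPU searches them.
LEVELS = [
    ("L1", 32 * 1024),          # per-core L1 data cache, runs at core clock
    ("L2", 256 * 1024),         # per-core L2, runs at core clock
    ("L3", 8 * 1024 * 1024),    # shared L3, clocked with the uncore/ring
]

def smallest_level_holding(working_set_bytes):
    """Return the first level in lookup order large enough for the working set."""
    for name, capacity in LEVELS:
        if working_set_bytes <= capacity:
            return name
    return "RAM"

for size in (16 * 1024, 2 * 1024 * 1024, 16 * 1024 * 1024):
    print(f"{size / 1024:.0f} KiB working set -> {smallest_level_holding(size)}")
```

A working set that lands in L3 is the case where the uncore clock matters most, since neither the core-clocked L1/L2 nor the ram is doing the bulk of the work.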

i5 2500k die.jpg arch.jpg
 
How does that help at all?

Anyway, at 3.9 cache I had another inconclusive unit so I've backed off another step to 3.8. There remains a possibility it isn't the cache, but some other edge case with the ram. Although the task sizes should fit within cache, that doesn't mean it never hits the ram, which might explain the low chance of a bad unit, on the order of under 5%.
 
I just thought you would find it interesting how it works, for troubleshooting when overclocking. Overclocking (Uncore)/(System Agent) on Intel will not allow you to clock the Uncore at the CPU clock speed; Uncore will be 300 MHz less than the CPU clock speed no matter what you set in BIOS. Uncore has to sync with the cores.
 
Default Uncore on my chip was 4000 MHz. It's running at 4500 MHz on a core clock of 4600 MHz now.
nope.JPG

Default speeds
CPUZ default cache.JPG
 
I had another inconclusive unit at 3.8, so now wondering if it is the cache at all. Something I hadn't consciously registered until now: I started running the current batch at the start of the month, and the first bad unit wasn't until late on the 3rd. Random thinking, could it be some kinda slow thermal buildup that's only now just reaching the tipping point? CPU temps are fine at below 60C, but I don't know about anything else in there. Note I also have a GPU doing mining in there. Will need to take a deeper look. At work right now, so it won't be until evening before I can take a look at it again.
 
I can't set the uncore frequency above the CPU frequency, but that seems to be my only limit. I'm surprised your KL chip won't do it. Mine will do it with XTU and the BIOS, along with having voltage offset available for it. I wonder what the difference is? Maybe Intel found something at higher clock speeds that changes the stability? Your chip is running a lot faster than mine was stock (3900 MHz without Turbo).
 
I don't know, maybe it is because it is an i5, or the BIOS, although we both have Gigabyte motherboards. At 4.8GHz the maximum uncore I can set is 4.5GHz. At 4.6GHz the maximum uncore I can set is 4.3GHz.
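
Putting the core/uncore pairs reported in the thread so far side by side, as a small sketch. The 300 MHz offset is the claim under discussion, not a documented Intel limit, and the function name is made up for illustration.

```python
OFFSET_MHZ = 300  # claimed minimum gap between core clock and maximum uncore clock

# (core MHz, uncore MHz, where the numbers come from in the thread)
reports = [
    (4800, 4500, "max settable uncore at 4.8 GHz core"),
    (4600, 4300, "max settable uncore at 4.6 GHz core"),
    (4600, 4500, "other poster's running clocks"),
]

def fits_offset_rule(core_mhz, uncore_mhz, offset=OFFSET_MHZ):
    """True if the uncore clock sits at or below core clock minus the offset."""
    return uncore_mhz <= core_mhz - offset

for core, uncore, source in reports:
    verdict = "consistent" if fits_offset_rule(core, uncore) else "not consistent"
    print(f"core {core} / uncore {uncore} ({source}): {verdict} with the rule")
```

The two systems disagree, which fits the suggestion that the limit depends on the particular chip or BIOS rather than being universal.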
 
Opened up the system and got the thermal camera out. Only the GPU was hot, reaching 80C in places. Nothing else was nearly as hot. Nearest might be southbridge heatsink around 50C. Even the CPU VRM area was cooler than that. So think I can rule out temperature.

Even if that's not the case, I moved the GPU up a slot to the secondary slot (note the case is inverted). Doesn't seem to make a difference to temperatures but there is more clear space around the CPU now. Ooh, because I rebooted, cache is back to 4000. I've changed the software to run in multi-thread mode, because I can.
 
Can't easily take a screenshot right now. It won't show you any more than I already mentioned anyway. A similar screenshot is at http://hwbot.org/submission/3725076_ but the only difference is that the CPU and cache are both at 4000. Same ram settings.
 
Why does L3 cache have to be slower than the CPU? I can set my uncore to like 3 GHz when my CPU is at like 2.6. (I think)

Or is it not the same for different gen CPUs?
 
Post back with a screenshot of CPU-Z CPU clock speed and Cache speed.
 