DDR4 Memory - Bandwidth, Latency, Quad vs Dual Discussion

Woomack · Aug 16, 2018

Going back a few posts ... no one uses the term "ganged" in storage. If you work with manufacturers and business hardware, not a single person will use it. It refers to memory modes on AMD platforms and that's all. Even there, barely anyone has used the term since the FX generation. Many motherboards simply don't describe it this way. Manufacturers stick to the general terms accepted by Intel. The same goes for XMP and similar things.

Quad channel has 2x the theoretical bandwidth of dual channel at the same memory clock, just as dual channel has 2x that of single channel.
There are more variables that limit the maximum theoretical bandwidth and the maximum real bandwidth. Maximum theoretical bandwidth is limited mostly by the IMC and the memory used (memory frequency). Maximum real bandwidth is affected by frequency, all timings, and everything else that causes delays.
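As a rough illustration of the theoretical side, here is a minimal sketch of the usual peak-bandwidth arithmetic. It assumes a 64-bit (8-byte) data bus per channel and decimal GB/s; the function name is made up for this example, and the result says nothing about what a real system will actually reach.

```python
def peak_bandwidth_gbs(mt_per_s, channels, bus_bits=64):
    """Theoretical peak bandwidth in GB/s (decimal).

    mt_per_s : memory speed in mega-transfers per second (e.g. 3200 for DDR4-3200)
    channels : number of memory channels (1, 2 or 4)
    bus_bits : data width of one channel; 64 bits is an assumption here
    """
    bytes_per_transfer = bus_bits // 8
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    for channels in (1, 2, 4):
        gbs = peak_bandwidth_gbs(3200, channels)
        print(f"DDR4-3200 x{channels} channel(s): {gbs:.1f} GB/s")
    # DDR4-3200 x1 channel(s): 25.6 GB/s
    # DDR4-3200 x2 channel(s): 51.2 GB/s
    # DDR4-3200 x4 channel(s): 102.4 GB/s
```

The doubling per step is only the ceiling; timings, IMC behaviour and the other delays mentioned above decide how much of it shows up in a benchmark.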

X299 motherboards list 4000+ memory clocks in their specifications ... but only for Kaby Lake-X processors, so nearly the same as a typical dual-channel CPU on socket 1151. Quad-channel controllers are not guaranteed to run much above 3600.
The IMC on quad-channel processors tops out at about 3866-4200. The IMC on dual-channel processors typically tops out at about 4500-4600.
I'd estimate that, to achieve similar bandwidth, dual channel has to run at 4200-4400 to match quad channel at 3200 (maybe not 100% correct but something close).

The latency of dual-channel controllers is lower. That's because of different technology/architecture, not because dual is faster or anything like that. The latency we see is a mix of variables, not only memory related. Simply put, everything we see in benchmarks is a mix of memory, CPU, internal bus, cache and some other things.
Even if you see 80ns in AIDA64, it doesn't mean it's slow. A lot of delays are covered by a large and fast cache. Actually, the main difference between processors (Intel gens are much easier to compare) is cache and internal delays. We can add some instructions, but a lot of programs don't really use them.

We won't see a quad-channel performance gain if we use simple programs that don't use a lot of RAM. If the workload is many small files that in total don't use a lot of RAM, then dual channel can clearly be faster because of its faster access time. Once we move to much larger files, quad channel will be faster, especially if the environment is highly multi-threaded.
Games barely use a couple of threads and most of them are not that large. Even the largest games don't load everything into RAM, more like 4-8GB.

mackerel · Aug 16, 2018

Woomack said:
I'd estimate that, to achieve similar bandwidth, dual channel has to run at 4200-4400 to match quad channel at 3200 (maybe not 100% correct but something close).

The latency of dual-channel controllers is lower. That's because of different technology/architecture, not because dual is faster or anything like that. The latency we see is a mix of variables, not only memory related. Simply put, everything we see in benchmarks is a mix of memory, CPU, internal bus, cache and some other things.
Even if you see 80ns in AIDA64, it doesn't mean it's slow. A lot of delays are covered by a large and fast cache. Actually, the main difference between processors (Intel gens are much easier to compare) is cache and internal delays. We can add some instructions, but a lot of programs don't really use them.

Now you've got me thinking... does quad channel scale as badly as above, or is there some other limitation at play? Doesn't measured bandwidth also get affected by the variables mentioned for latency? What might be an interesting exercise, not one I can do any time soon, is to get similar-ish configurations to test against each other. For example, I'd use my 8086K on Z370 as representative of dual channel. For quad channel, I'd go two ways: the 5820K on X99 and the 7800X on X299. One of the pains of Skylake-X is that the new cache doesn't play so nicely with, let's call them, consumer-type workloads, and even overclocked the numbers seem a bit low. Does the 5820K show higher performance with the same RAM? And does either of those approach double the 8086K? Edit: I could also test the quad-channel systems in dual-channel mode...

Part of the problem may be that 6 cores aren't enough to max out quad channel. As mentioned earlier, my rule of thumb is roughly two fast channels for 4 fast Intel cores, and that is in a load known to be hungry for RAM bandwidth. 6 cores may not be limited by quad channel. Maybe I should test dual channel with 3 cores... to balance it a bit...

Woomack · Aug 16, 2018

You can run the 5820K, 7800X and 8086K at the same CPU frequency, with memory at the same frequency, in dual channel on all platforms. Then you will see the difference in cache speed and the delays related to cache and IMC. You can't really compare them by looking only at memory channels, as each one is from a different generation and has a different architecture.
Benchmarks like AIDA64 will show the max bandwidth on as many cores as you use. The Windows winsat command will show it too.

I had our programming department make a simple benchmark for MS SQL, and using one of the databases I noticed that memory frequency doesn't really matter up to ~12 threads. I had no time to check more cores, but MS SQL scaled great with CPU frequency and not with memory.

mackerel · Aug 16, 2018

My thinking was more to test your statement on bandwidth scaling. If quad channel doesn't give double dual channel in practice, why? Lack of cores to drive the demand is my best guess. The systems I mentioned all have 3000 RAM in them already, although timings will be different, so that may have a small impact. I'm hoping it won't be significant; at least it isn't in my other uses. If I'm looking for a factor of 2 difference, a couple of percent isn't significant.

Woomack · Aug 16, 2018

I said that channels scale well ... it's just that the theoretical max is about 30% higher than what we actually get, regardless of whether it's dual or quad channel. It's impossible to reach the max theoretical bandwidth on any platform because of the delays mentioned, spread requests, etc.
It's easier to compare channels on Ryzen, and it looks somewhat better than on Intel platforms. 3600 in quad channel on X399 is about 90GB/s and in dual channel on X370 about 50GB/s, so it's not far from double, and the architecture is about the same.
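To put those Ryzen figures next to the theoretical ceiling, here is a small sketch. It assumes 64-bit channels and decimal GB/s, and takes the ~90GB/s and ~50GB/s values above as the measured results; the function names are invented for this example.

```python
def peak_bandwidth_gbs(mt_per_s, channels, bus_bits=64):
    """Theoretical peak in GB/s, assuming a 64-bit data bus per channel."""
    return mt_per_s * 1e6 * (bus_bits // 8) * channels / 1e9


def efficiency(measured_gbs, mt_per_s, channels):
    """Fraction of the theoretical peak that a measured result reaches."""
    return measured_gbs / peak_bandwidth_gbs(mt_per_s, channels)


if __name__ == "__main__":
    # Approximate measured values quoted in the post above, both at 3600.
    quad_measured, dual_measured = 90.0, 50.0

    print(f"quad peak: {peak_bandwidth_gbs(3600, 4):.1f} GB/s")
    print(f"dual peak: {peak_bandwidth_gbs(3600, 2):.1f} GB/s")
    print(f"quad efficiency: {efficiency(quad_measured, 3600, 4):.1%}")
    print(f"dual efficiency: {efficiency(dual_measured, 3600, 2):.1%}")
    print(f"measured quad/dual ratio: {quad_measured / dual_measured:.2f}x")
```

With these inputs, quad channel lands at roughly 78% of its 115.2GB/s peak and dual channel at roughly 87% of its 57.6GB/s peak, with a measured ratio of 1.8x between them. Both inputs are rounded, so treat the percentages as ballpark only.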

mackerel · Aug 16, 2018

I don't expect the theoretical max, but I was hoping to see quad channel give about double dual, if the cores are enough to drive it. Your earlier post suggested increasing clocks on dual channel by around 35% was enough for it to match quad.
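For reference, the arithmetic behind that "around 35%" can be sketched like this. It assumes peak bandwidth scales linearly with both channel count and memory speed, which is the theoretical picture only; the helper names are made up for this example.

```python
def matching_speed(mt_per_s, channels_from, channels_to):
    """Memory speed (MT/s) needed on channels_to channels to equal the
    theoretical peak of channels_from channels running at mt_per_s."""
    return mt_per_s * channels_from / channels_to


def uplift(new_mt_per_s, base_mt_per_s):
    """Relative clock increase, e.g. 0.375 means 37.5% higher."""
    return new_mt_per_s / base_mt_per_s - 1


if __name__ == "__main__":
    # On paper, dual channel needs double the speed to match quad at 3200.
    print(f"theoretical match: {matching_speed(3200, 4, 2):.0f} MT/s")

    # The 4200-4400 estimate from earlier in the thread, relative to 3200.
    for dual_speed in (4200, 4400):
        print(f"{dual_speed}: {uplift(dual_speed, 3200):.1%} above 3200")
```

On paper the match point is 6400, while 4200-4400 is only about 31% to 37.5% above 3200, which is the gap being questioned here.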

wingman99 · Aug 16, 2018

EarthDog said:
Yes.

AMD could choose between 1x128 bits (ganged) or 2x64 bits (unganged) for their memory. AFAIK it has nothing to do with dual or single channel.

Looking back through this thread, there is some confusion. I was trying to make it easy to envision by using the term unganged: HDD JBOD (just a bunch of disks) is comparable in certain respects to Intel dual- and quad-channel memory. In dual channel, the Intel memory channels look like this: two independent 64-bit wide channels. Unganged means the same setup.

Ganged is comparable in certain respects to RAID 0: dual channel combining two 64-bit buses into a single 128-bit bus. Ganged means that setup.

EarthDog · Aug 16, 2018

We get it... but the 'ganged' HDD analogy was a reach, as that isn't a common term associated with HDDs.

That explanation above gave me a headache (though, we get it).

It's OK.
