
DDR4 Memory - Bandwidth, Latency, Quad vs Dual Discussion

While all these RAM-dependent cases may be great stuff, we have people going back to 8GB on builds because of prices. It won't matter how great dual-, quad-, or octo-channel RAM is if people can't buy it.

I think 8GB is still sufficient for a budget-minded build. While I have 16GB on my main systems, and even 64GB "because I can" on another, I rarely exceed 8GB usage... then again, I'm not the sort of person with a million Chrome tabs open while gaming, which is about the nearest scenario I might encounter to using a lot of RAM.

If you're looking at quad channel RAM, the implication is you're already on the higher end of the CPU scale, so you're willing to spend as needed for a higher performance level. Dunno about the rest of the world, but roughly speaking a 16GB kit of RAM is ballpark half the price of an 8700K or 7800X. If you can only afford 8GB, higher-end CPUs might not be the best value choice to go with it.
 
So you're saying RAM prices knock some people right out of the market for a nicer desktop build. Kinda my point. What was, albeit briefly, a nice standard for a new desktop two years ago (mine for example) is out of reach due to RAM prices. "Can't afford 16GB? Get 8GB, and you shouldn't be looking at an i7. Pentiums are over there, move along."
 
I think 8GB is still sufficient for a budget-minded build. While I have 16GB on my main systems, and even 64GB "because I can" on another, I rarely exceed 8GB usage... then again, I'm not the sort of person with a million Chrome tabs open while gaming, which is about the nearest scenario I might encounter to using a lot of RAM.

If you're looking at quad channel RAM, the implication is you're already on the higher end of the CPU scale, so you're willing to spend as needed for a higher performance level. Dunno about the rest of the world, but roughly speaking a 16GB kit of RAM is ballpark half the price of an 8700K or 7800X. If you can only afford 8GB, higher-end CPUs might not be the best value choice to go with it.
When money is driving the decision, it has to be good enough. However, many games and many other users can already easily eclipse that value. I would honestly say 8GB is the bare minimum for those using their PC as more than a web/word-processing machine (think gaming, productivity, etc.).

Some people simply need the cores, and not the bandwidth increases. Again, memory bandwidth isn't important to the majority of users. Some tests, like what you do, mack, respond well to increased capacity/bandwidth/latency (whatever), and most things do not. I liken it to attaching a fire hose to your garden spigot. Just because the hose is bigger (bandwidth) doesn't mean it will be used - the faucet can only output so much regardless of hose size. :)
 
I think it was the Hardware Unboxed YouTube channel, and their related web site TechSpot, that tested RAM usage scenarios in modern gaming. The short version was that 8GB is enough, as long as you have a good amount of VRAM on your GPU for the game/settings. If the GPU doesn't, then it will put the rest into system RAM and you could bust the 8GB requirement that way. So... if you have a low-end GPU with 2 or 3GB of VRAM, you might be in trouble. If you have 6GB+ you'll probably be fine.

I still stand by my thinking that as total core potential continues to increase, so must the RAM bandwidth, otherwise it will start impacting more things more often. Right now the core potential is growing faster than the RAM potential.

Actually, earlier this year I had to replace my "gaming" laptop when the previous one died. I went budget this time around; it came with an i5-7300HQ (quad core, 2.5 base, 3.5 turbo, no HT), a 1050 2GB, and 1x8GB 2400 RAM. I knew this was minimally adequate for my gaming needs, but I realised I used a laptop for gaming a LOT less than I thought, so didn't want to spend big bucks on it again. The laptop happened to have a spare DIMM slot, so I put in another 8GB module extracted from the old laptop, giving me dual channel at 2133. In bandwidth alone it should be 78% faster. AIDA64 memory tests were between 62-74% faster, so that was most of the way there. I only did limited 3D benching with it though. The FFXIV Stormblood benchmark saw a 6.7% increase in average fps on the "Standard (Laptop)" preset, and 3.6% on "High (Laptop)". I assume the "high" setting put more load on the GPU and it became more of a limiting factor. Fire Strike and Time Spy scores hardly changed at all, gaining 0.5% and 0.4% respectively. I didn't do multiple runs so will leave it as not significant. The 5 parts of the GTA V built-in benchmark (defaults, 1080p, vsync off) varied from -1.5% to +3.1%, so again hard to call that conclusive. I'm probably more GPU limited than CPU/RAM in the case of that laptop.
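For reference, the 78% figure is just the ratio of theoretical peak transfer rates, assuming 8 bytes per transfer on each 64-bit channel; a trivial check:

```c
/* Back-of-envelope peak bandwidth: MT/s x 8 bytes per transfer per 64-bit
 * channel. Module speeds are the ones from the laptop above. */
#include <stdio.h>

int main(void) {
    double single_2400 = 2400e6 * 8;       /* 1 x DDR4-2400: ~19.2 GB/s */
    double dual_2133   = 2 * 2133e6 * 8;   /* 2 x DDR4-2133: ~34.1 GB/s */
    printf("single DDR4-2400: %.1f GB/s\n", single_2400 / 1e9);
    printf("dual   DDR4-2133: %.1f GB/s\n", dual_2133 / 1e9);
    printf("theoretical gain: %.0f%%\n", (dual_2133 / single_2400 - 1) * 100);
    return 0;
}
```

Real results landing below that is normal, since the theoretical number ignores command overhead, refresh, and everything else the controller has to do.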
 
There are several games out, with more coming weekly, that can use 8GB. As I said, it's a minimum these days.

Bandwidth... I've read enough and tested enough to know it makes little difference for me and most users. There certainly are use cases for more bandwidth, but many, most, simply don't need it. :)

When people actually need and use more than 4-6 cores, it may be needed... but IMO, it will take longer than you seem to believe. :)
 
Due, in part, to (V)RAM prices, a lot of people had to settle for 3 and 4GB cards. That leaves them, not to put too fine a point on it, screwed.
 
Bandwidth... I've read enough and tested enough to know it makes little difference for me and most users. There certainly are use cases for more bandwidth, but many, most, simply don't need it. :)

When people actually need and use more than 4-6 cores, it may be needed... but IMO, it will take longer than you seem to believe. :)

I don't disagree with that, but it isn't an all-or-nothing thing. It will be a gradual shift and I kinda want to head it off before it becomes more of a problem. It may be that both Intel and AMD will continue to restrict more RAM channels to HEDT platforms, but given the turbulent times we live in, there is scope for something a bit different than we have come to expect. Maybe not in 2018, but perhaps in the 2019 or 2020 generations.
 
Yep, you are in a position few others are in where 'heading it off' is a good idea. Most are not. :)

It's been a gradual shift for 10+ years. Remember when the Q6600 was released? Better get a quad or else... Here we are a decade-plus later and hex cores on the mainstream just landed. More games are finally now starting to use more than 4 threads. The shift to cores/threads isn't driven by need IMO, it's driven by the competition needing to do something to catch up. Can't beat 'em in IPC or overclocking? Add more cores. Since the lemmings do what they do, more cores is the answer. Since more cores is 'the answer', Intel responded by adding a hex core to the mainstream. I don't believe it's driven by need (particularly in the consumer space we are in).

I've been waiting since 2007 for things to REALLY be multi-threaded... but, for most users, a quad/hex with HT will last them quite a while. I don't see a 'scope change' until at least 2020... and even then, the majority will likely not notice any difference in most of what they do between dual/quad channel memory and gobs of bandwidth, regardless of the number of cores on the mainstream.

Question... maybe I do not quite get how the AIDA64 memory test works, but it increases the bandwidth with the number of cores. Same memory, same speeds, different CPU with more cores = more bandwidth. If it's still scaling easily with cores, we aren't at a limit, right? Or do I just not fully grasp how that works?
 
Does it scale with more cores or just because there's more cache? Or is that a distinction with no practical difference?
I highly doubt cache makes THAT big of a difference. It's pretty significant. Look at our past CPU reviews and see the AIDA memory test results. ;)
 
More cores = bigger L3 cache, and more L2 and L1 caches, doesn't it? Since that is what the memory accesses, wouldn't that increase bandwidth? Not arguing, just asking. Another learning experience. :)
 
Yep, you are in a position few others are in where 'heading it off' is a good idea. Most are not. :)

It takes all sorts :) For example, I look at Threadripper, and I just don't see a need for it within any of my use cases. Doesn't mean others won't have a use for it. I've long feared that my interests tend towards traditional HPC features. Fortunately there has been enough overlap with consumer kit to keep me going at sane cost, but this might not continue forever. Zen is a bad precedent in that respect, with its half-width FP units. Fortunately for me, so far, it looks like Intel isn't looking to follow, and there's talk of AVX-512 going into the consumer range. Depending on whether it is a half-fat or full-fat AVX-512 implementation, the RAM bandwidth could be very interesting to feed that monster of an execution unit. What I call full-fat AVX-512 doubles the potential over AVX2, but the half-fat version is essentially comparable to AVX2.
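Just to put rough numbers on "feeding that monster" - all figures below are my own illustrative assumptions (a 6-core part at 4 GHz with two FMA pipes per core, dual-channel DDR4-2666), not anything published:

```c
/* Crude peak-FLOPs vs. peak-bandwidth comparison under the assumptions above.
 * FLOPs/cycle/core = FMA pipes * (vector bits / 64 bits per double) * 2 (FMA). */
#include <stdio.h>

int main(void) {
    double cores = 6, clock_hz = 4.0e9, fma_pipes = 2;

    double avx2_flops   = cores * clock_hz * fma_pipes * (256.0 / 64) * 2;  /* ~384 GFLOP/s */
    double avx512_flops = cores * clock_hz * fma_pipes * (512.0 / 64) * 2;  /* ~768 GFLOP/s */
    double dual_ddr4    = 2 * 2666e6 * 8;                                   /* ~42.7 GB/s   */

    printf("AVX2 peak:      %.0f GFLOP/s\n", avx2_flops / 1e9);
    printf("AVX-512 peak:   %.0f GFLOP/s\n", avx512_flops / 1e9);
    printf("dual DDR4-2666: %.1f GB/s -> %.3f bytes/FLOP at AVX-512 peak\n",
           dual_ddr4 / 1e9, dual_ddr4 / avx512_flops);
    return 0;
}
```

Under those assumptions anything streaming straight from RAM gets a small fraction of a byte per FLOP, which is exactly why wider vector units make extra memory channels (or heavy cache reuse) attractive.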

Question... maybe I do not quite get how AIDA64 memory test works, but, it increases the bandwidth with the amount of cores. Same memory same speeds different CPU with more cores = more bandiwdth. If its still scaling easily with cores, we aren't at a limit, right? Or do I just not fully grasp how that works?

I have to say I'm not that familiar with AIDA64 benchmark behavior. I run it as I have it, and use it for comparison with others. Taking a more abstract perspective on it, as I said, more CPU potential = more RAM usage potential. Even in my intensive workloads, a single thread isn't going to max out dual channel. For something like a 6700K, 2 tasks scale close to expected, there is a drop on the 3rd, and the 4th is hardly any faster than 3. So, you do need multiple tasks (or threads) going on to generate the workload to exercise the RAM potential.
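As a sketch of that last point (nothing to do with how AIDA64 itself works, and the buffer size and the assumption that a simple parallel sum is memory-bound are mine), an OpenMP loop like this shows throughput climbing with thread count and then flattening once the memory channels are saturated:

```c
/* Sketch: sum a 1 GB buffer (far larger than any L3) with 1..N threads and
 * watch measured throughput stop scaling once the channels are saturated.
 * Build with e.g.: gcc -O2 -fopenmp */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (128L * 1024 * 1024)   /* 128M doubles = 1 GB */

int main(void) {
    double *buf = malloc(N * sizeof *buf);
    if (!buf) return 1;
    for (long i = 0; i < N; i++) buf[i] = 1.0;

    for (int t = 1; t <= omp_get_max_threads(); t++) {
        double sum = 0.0, t0 = omp_get_wtime();
        #pragma omp parallel for num_threads(t) reduction(+:sum)
        for (long i = 0; i < N; i++) sum += buf[i];
        printf("%2d thread(s): %5.1f GB/s (sum=%.0f)\n",
               t, N * sizeof(double) / (omp_get_wtime() - t0) / 1e9, sum);
    }
    free(buf);
    return 0;
}
```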

I will note, of the other synthetic tests it includes, only PhotoWorxx seems to scale strongly with RAM bandwidth.


Can't remember if it was this thread or the other one, but I have plans to do testing between HT and SMT. That is relevant here because I wanted to look at HT and SMT in isolation; in particular I wanted to exclude other effects like RAM performance as far as practical. So indirectly, that could offer some info for this thread too. Those scenarios will be compute and synthetics though, so I obviously won't claim "typical gamer" relevance there.


More cores = bigger L3 cache, and more L2 and L1 caches, doesn't it? Since that is what the memory accesses, wouldn't that increase bandwidth? Not arguing, just asking. Another learning experience. :)

AIDA64 testing separates out the on-CPU cache from the RAM. For example, by using a sufficiently large test set it won't be possible to cache it all, and you get RAM performance. That's not to say the caches won't still have some influence; as we find when overclocking, turning up the cache clock usually helps the RAM results to some degree.
 
Ahh. So AIDA fills the cache(s) so that just the RAM throughput is tested? So cache doesn't so much control/handle quantity as much as the speed with which said quantity moves?
 
I can't say for sure how AIDA64 does it, and I'm not a programmer, but what I would do is allocate a large amount of RAM, something much bigger than the L3 cache. You can then do the sequential read/write/copy tests on that large block. It is not possible to cache all of it at once, but the cache could still have some influence as it sits between the cores and RAM. There are separate tests for the caches, and in that case, the working set size would be chosen to match without exceeding the cache size. There may also be some instructions to help force the cache to be used.
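Something along these lines is what I mean - a minimal sketch of that idea, not AIDA64's actual method (POSIX timing, buffer size arbitrary):

```c
/* Sketch: time a sequential copy through buffers much larger than the L3 so
 * most traffic has to hit DRAM rather than cache. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BYTES (512UL * 1024 * 1024)   /* 512 MB per buffer, >> typical L3 */

int main(void) {
    char *src = malloc(BYTES), *dst = malloc(BYTES);
    if (!src || !dst) return 1;
    memset(src, 1, BYTES);            /* touch pages so they're actually mapped */
    memset(dst, 0, BYTES);

    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    memcpy(dst, src, BYTES);          /* reads BYTES and writes BYTES */
    clock_gettime(CLOCK_MONOTONIC, &b);

    double sec = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    printf("copy: %.1f GB/s\n", 2.0 * BYTES / sec / 1e9);   /* count read + write */
    free(src);
    free(dst);
    return 0;
}
```

With the buffers much larger than the L3, most of the copy has to come from and go to DRAM, so the GB/s figure mostly reflects the memory rather than the caches.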
 
I guess what I'm asking is about ED's statement about more cores equaling more bandwidth.

Question... maybe I do not quite get how AIDA64 memory test works, but, it increases the bandwidth with the amount of cores. Same memory same speeds different CPU with more cores = more bandiwdth. If its still scaling easily with cores, we aren't at a limit, right? Or do I just not fully grasp how that works?

I'm trying to figure out why core count would do that and not the increased cache that goes along with it. Six cores increase bandwidth over four, but six L1 caches over four L1 caches isn't what does it?

Edit: Or is it dependent on what the bandwidth is measured with? Ex: Just stuffing data through the pipeline would be more cache and/or RAM dependent, but if that data was measured as some unit of work then core count would be a larger factor?
 
I am simply guessing here, but cache really doesn't have much to do with it, I'd imagine. I cannot prove it, but my hunch is based on the significant differences it shows when using more cores. Again, look at the CPU reviews we have and check out the table that shows the raw data. It's not a couple thousand MB/s, it's tens of thousands... cache simply can't make up that big of a difference. Another point is that Skylake-X reduced the amount of cache, yet bandwidth was still comparable to or higher than that of the previous generation, last I recall. I believe throughput was increased in the non-private cache or whatever, but it is actually smaller.
 
The main point of dual channel, quad channel, is bandwidth. It is RAID 0 for RAM. That can indirectly help latency; for example, by finishing a transfer operation faster it can start the next one sooner.
I believe they haven't used ganged memory channels, which are comparable to RAID 0, since roughly 2010 and the rise of multi-core processors. From the research I have done into the past, once multi-core processors and multithreaded applications arrived, the lackluster performance gains in applications led to a switch to unganged memory channels, which are comparable to a non-RAID, just-a-bunch-of-disks drive architecture. Unganged dual channel memory maintains two 64-bit memory buses but allows independent access to each channel, in support of multithreading with multicore processors.

Ganged versus unganged memory channels.
As memory operations happen in 64-byte chunks, it appears that ganged mode will always win: it can spread those 64-byte operations across the two memory channels, while unganged mode will only use a single memory channel. The reality, however, is that unganged mode rarely suffers from this problem, because normally there are many outstanding memory requests to be completed, so there are many outstanding cache lines to be fetched from or stored to main memory. While ganged mode will be faster operating on a single cache line, unganged mode can theoretically operate on two cache lines at a given moment (with some restrictions). This parallelism can be realized because the memory controller incorporates an 8-entry-deep memory controller queue (the "MCQ" in the linked article's diagram), for a total of 8 outstanding cache line requests.

However, simply stating that unganged mode has the potential to often be on par with ganged mode is not enough: in that case, we could simply use ganged mode and forget about unganged mode. The point is that unganged mode has the potential to be faster than ganged mode. Why? Because we must realize that main memory accesses don't happen immediately, as the DRAM chips require many nanoseconds to be accessed: after this initial access time the data can be transferred quite quickly, but the initial access steps can be very slow (from a processor standpoint). By starting two memory operations at the same time, the memory controller can at least partially hide the latency involved in the setup steps of the second operation. Obviously this is not always true, but it is a possibility, and so this can be an advantage of the unganged versus the ganged method. Moreover, using unganged mode the memory controller can theoretically both write to and read from memory at the same time: this should help memory copy routines and multitasking operating systems, where many processes can both read from and write to memory at the same time.

Summarizing the whole point, we can state that:

ganged mode has the potential to be faster than unganged mode because it uses a more fine-grained interleave

unganged mode has the potential to be faster than ganged mode because it can start two memory operations at the same time, effectively hiding at least part of the latency involved in the second operation. Also, this mode permits both reading from and writing to memory at the same time, with the intrinsic advantages that this possibility implies. http://www.ilsistemista.net/index.p...-the-ganged-vs-unganged-question.html?start=1
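To put rough numbers on that latency-hiding argument, here's a deliberately crude back-of-envelope model; the figures and simplifications are my own, not from the linked article, and it ignores pipelining, banks, and everything else a real controller does:

```c
/* Crude toy model only: two independent 64-byte cache-line reads, with made-up
 * round numbers (50 ns DRAM access latency, DDR4-2133 transfer rate). */
#include <stdio.h>

int main(void) {
    double t_access = 50.0;                          /* ns before data starts moving */
    double t_burst  = 64.0 / (2133e6 * 8) * 1e9;     /* ~3.8 ns for 64 B on one 64-bit channel */

    /* Ganged: the two channels act as one 128-bit bus, so each line transfers in
     * half the time, but the two requests are serviced one after the other. */
    double ganged = 2.0 * (t_access + t_burst / 2.0);

    /* Unganged: each request gets its own channel and both start at once, so the
     * second request's access latency is hidden behind the first's. */
    double unganged = t_access + t_burst;

    printf("two line reads, ganged:   %.1f ns\n", ganged);
    printf("two line reads, unganged: %.1f ns\n", unganged);
    return 0;
}
```

The exact numbers don't matter; the point is that with independent requests the second access latency overlaps the first, which is the article's argument for unganged mode.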
 
Dear men, can you please not quote essays when you need to reply? It's such a pain in the Taco to scroll through it all. Just reply and state who you're replying to; we all know what's going on in regards to context.

:ty::beer:
 
Taco,

There wasn't one thing quoted that shouldn't have been. Wingman quoted a source for his statement, and who as well as what he was replying to. I feel that was a great post structure.

I'm sorry you are stuck on your phone, but let's not water down the site by requesting low-quality posts from users who post supporting information. It's normal to quote the person you are talking to if not replying directly to the person above... simple forum etiquette.

(No reply please... if you feel the need to say something, PM me rather than take another thread off topic. :))
 
Yep, you are in a position few others are in where 'heading it off' is a good idea. Most are not. :)

It's been a gradual shift for 10+ years. Remember when the Q6600 was released? Better get a quad or else... Here we are a decade-plus later and hex cores on the mainstream just landed. More games are finally now starting to use more than 4 threads. The shift to cores/threads isn't driven by need IMO, it's driven by the competition needing to do something to catch up. Can't beat 'em in IPC or overclocking? Add more cores. Since the lemmings do what they do, more cores is the answer. Since more cores is 'the answer', Intel responded by adding a hex core to the mainstream. I don't believe it's driven by need (particularly in the consumer space we are in).

I've been waiting since 2007 for things to REALLY be multi-threaded... but, for most users, a quad/hex with HT will last them quite a while. I don't see a 'scope change' until at least 2020... and even then, the majority will likely not notice any difference in most of what they do between dual/quad channel memory and gobs of bandwidth, regardless of the number of cores on the mainstream.

Question... maybe I do not quite get how the AIDA64 memory test works, but it increases the bandwidth with the number of cores. Same memory, same speeds, different CPU with more cores = more bandwidth. If it's still scaling easily with cores, we aren't at a limit, right? Or do I just not fully grasp how that works?

With more cores, the number of cache lines in flight increases along with the greater total cache, increasing the total internal memory bandwidth compared to fewer cores and less cache. A CPU core looks for the data it needs in L1 first; on a miss it looks in L2, then on a miss it looks in the shared LLC, and on a miss there it goes to main memory. If the enlarged L1, L2, and LLC do not miss much, that shows up as increased bandwidth.

Here is a good short video on the scaling and technical details of the mesh architecture and cache: https://www.intel.com/content/www/us/en/products/docs/processors/xeon/mesh-architecture-video.html
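If anyone wants to see that hierarchy for themselves, a rough pointer-chasing sketch like this shows the time per access stepping up as the working set outgrows L1, L2, and the LLC (sizes, step count, and the POSIX timing are my own choices, not a rigorous benchmark):

```c
/* Sketch: chase a random pointer chain and watch ns/access rise as the working
 * set outgrows each cache level. Build with e.g. gcc -O2. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile size_t g_sink;   /* keeps the chase loop from being optimized away */

static double chase_ns(size_t n, size_t steps) {
    size_t *chain = malloc(n * sizeof *chain);
    for (size_t i = 0; i < n; i++) chain[i] = i;
    for (size_t i = n - 1; i > 0; i--) {             /* Sattolo shuffle: one big cycle */
        size_t j = (size_t)rand() % i;
        size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
    }
    struct timespec a, b;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (size_t s = 0; s < steps; s++) p = chain[p]; /* dependent loads, one at a time */
    clock_gettime(CLOCK_MONOTONIC, &b);
    g_sink = p;
    free(chain);
    return ((b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec)) / steps;
}

int main(void) {
    for (size_t kb = 16; kb <= 64 * 1024; kb *= 4)   /* 16 KB .. 64 MB working sets */
        printf("%6zu KB working set: %5.1f ns/access\n",
               kb, chase_ns(kb * 1024 / sizeof(size_t), 10000000));
    return 0;
}
```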
 