oxid said:
Then what did you mean by this?
like I said, DDR2-800 with 4-4-4-10 timings is EXACTLY the same delay in nanoseconds as DDR-400 at 2-2-2-5.
This is incorrect.
And your logic would be correct if it were as simple as you stated, but it's not. You're assuming just one latency parameter, which can easily be doubled, tripled, quadrupled, or what have you. In reality it's a lot more complex.
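Just to get the basic arithmetic on the table first (this is my own sketch; the clock figures are the standard ones, since DDR transfers data twice per I/O clock, so DDR-400 runs a 200 MHz clock and DDR2-800 a 400 MHz clock):

```python
# Sketch: convert a memory timing from clock cycles to nanoseconds.
# DDR transfers data on both clock edges, so DDR-400 uses a 200 MHz
# I/O clock (5 ns/cycle) and DDR2-800 a 400 MHz clock (2.5 ns/cycle).

def cycles_to_ns(cycles, clock_mhz):
    """Latency in nanoseconds for a timing given in clock cycles."""
    return cycles * 1000.0 / clock_mhz

# CAS 2 on DDR-400 vs CAS 4 on DDR2-800:
print(cycles_to_ns(2, 200))  # 10.0 ns
print(cycles_to_ns(4, 400))  # 10.0 ns
```

That's only the per-timing conversion, though; the real question is how those individual delays combine over a whole memory access.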
Let's go timing by timing. I didn't want to spend time resorting to this, but it's actually turning out pretty educational for me. Quotes taken from AMD themselves. This page is going in my bookmarks.
tRAS
Memory architecture is like a spreadsheet with row upon row and column upon column, with each row being one bank. For the CPU to access memory, it first must determine which row or bank in the memory is to be accessed and then activate that row with the RAS signal. Once activated, the row can be accessed over and over, until the data is exhausted. This is why tRAS has little effect on overall system performance but could impact system stability if set incorrectly.
So right here we can see that doubling tRAS doesn't effectively double your total delay. Say we have tRAS at 5: an instruction is sent to each row every 5 ns. With tRAS at 10, every 10 ns. But what's going on within each row? Within this delay, it can still most certainly be accessed, given that it's activated. Theoretically, you could have half as much performance at tRAS 10 compared to tRAS 5, but only in a perfectly unideal case. In reality you probably won't see much of a difference at all.
tRCD
tRCD is the delay from the time a row is activated to when the cell (or column) is activated via the CAS signal and data can be written to or read from a memory cell. When memory is accessed sequentially, the row is already active and tRCD will not have much impact. However, if memory is not accessed in a linear fashion, the current active row must be deactivated and then a new row selected/activated. In such an example, low tRCDs can improve performance. However, like any other memory timing, setting this too low for the module can cause instability.
tRCD here. Known to die-hard overclockers as about the most important latency parameter. People have paid extremely large premiums just to reduce this a notch or two; I know I'm guilty of it. Once again we can see, though, that the practical effect it has depends on how memory is accessed. Btw, if it's accessed sequentially, the latency is additive, not multiplicative, unlike what you showed in your last post. So we can already see that 10-4-x-x in almost all cases incurs less than a 2x increase in latency compared to 5-2-x-x. In most cases, much less, since even in a non-linear case in which the CAS signal directly depends on tRCD, the tRAS will almost definitely be a lot better than the worst-case scenario. Really the only way you can have a straight doubling of delay is if everything goes wrong.
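To put that in code, here's a toy model with made-up cycle counts (not measurements from any real module):

```python
# Toy model: an access to the already-open row pays only CL, while a
# row change pays tRCD first and then CL -- the delays add in sequence.

def access_latency(row_open, trcd, cl):
    """Cycles until data is available for a single read."""
    return cl if row_open else trcd + cl

print(access_latency(True, trcd=2, cl=2))   # row already active: 2 cycles
print(access_latency(False, trcd=2, cl=2))  # activate, then read: 4 cycles
```

So doubling tRCD only stretches the miss case, and even there by less than 2x, since CL is still sitting in the sum.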
CAS Latency
Certainly, one of the most important timings is the CAS Latency, which is also the one most people understand. Since data is often accessed sequentially (same row), the CPU need only select the next column in the row to get the next piece of data. In other words, CAS Latency is the delay between the CAS signal and the availability of valid data on the data pins (DQ). The latency between column accesses (CAS) then plays an important role in the performance of the memory. The lower the latency, the better the performance. However, the memory modules must be able to support low-latency settings.
Once again the key word is "sequentially". This little tidbit is quite interesting, though: CAS latency appears to be pivotal architecturally, since the total latency always depends on it. It seems the reason P4s don't benefit much from it is that the latency in communication between the northbridge and the CPU is high enough to overwhelm it. A64s, it would seem, should be affected pretty significantly by it. Why they aren't is beyond my knowledge, but that's quite a sidetrack anyways.
tRP
tRP is the time required to terminate one row access and begin the next row access. tRP might also be seen as the delay required between deactivating the current row and selecting the next row. So in conjunction with tRCD, the time required (or clock cycles required) to switch banks (or rows) and select the next cell for reading, writing, or refreshing is a combination of tRP and tRCD.
tRP is basically the next step in the process. Once again, it adds latency to memory access, it doesn't multiply it. tRP obviously has a big effect. It's another one of those timings that performance freaks are willing to give an arm and a leg for, and I can see why. The delay between each row is inescapable.
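The row-switch cost AMD describes can be sketched the same way (again with made-up cycle counts):

```python
# Toy model of a full row switch: deactivate the old row (tRP),
# activate the new one (tRCD), then issue the read (CL). The three
# delays happen one after another, so they sum.

def row_switch_latency(trp, trcd, cl):
    """Cycles from closing the current row to valid data from the next."""
    return trp + trcd + cl

print(row_switch_latency(trp=2, trcd=2, cl=2))  # 6 cycles
print(row_switch_latency(trp=3, trcd=3, cl=3))  # 9 cycles
```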
There's more, but I'm getting tired, and I think this is illustrative enough.
The cliff's notes: memory is accessed in a sequential, step-by-step fashion. When you increase a certain latency parameter, it adds to the time delay, and does not multiply it.
If there were a single latency parameter that memory access depended on, then, yes, doubling it would effectively double the total delay. However, in reality there is a multitude of delays incurred over the course of a memory access. This is why changing a latency parameter doesn't change the total delay by the same factor.
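The arithmetic behind that, with hypothetical numbers:

```python
# If the total delay is a sum of components, doubling just one of them
# scales the total by (total + component) / total, not by 2.
trp, trcd, cl = 2, 2, 2
base = trp + trcd + cl       # 6 cycles total
worse = trp + 2 * trcd + cl  # 8 cycles with tRCD doubled
print(worse / base)          # about 1.33x, nowhere near 2x
```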
Ok, does that do it for the threadjack?
It is interesting stuff, no doubt, and it shows even more explicitly that the essential doubling of frequency going from DDR1 to DDR2 will yield a much greater performance gain than the loss incurred by doubling some of the latency parameters.