Official AM2 Thread!

my point is that for those given values (tCAS-tRCD-tRP-tRAS) the delay in ns is the same if you double both the RAM clock speed and the delay in clock cycles.
now I might be wrong, I'm not defending my point till I die :p
now is that wrong? I am not really in a position to contradict you, because I'm running a very, very old system with a BIOS where I can only change the FSB, not even memory timings, and I'm just relying on what I've read.
 
You're incorrect for the reason I just stated above. Imagine there not being only "5-2-2-2" but 2-7-11-2-2-5-2-2-1-1...actually this isn't imagination, these are partially the timings I ran on my A64. Now would you get a doubling in delay if you changed those to 3-12-15-4-3-10-4-4-3-2? Some parameters are doubled, others are not.
 
doesn't matter whether there are 2 kinds of timings or 5 quadrillion. I am saying that the timings on DDR2 are higher than on DDR, but it doesn't really matter because DDR2 runs at higher clock speeds, and that you can't compare timings at 400 megahertz with timings at 200 megahertz.
 
oxid said:
like I said, DDR2-800 with 4-4-4-10 timings is EXACTLY the same delay in nanoseconds as DDR-400 at 2-2-2-5.

Even though it might be close on an Intel platform, AMD's on-die memory controller should give much better performance.
 
the performance of the DDR2 would of course be better, but I'm only talking about the simple latency expressed in nanoseconds
 
with a 2.2GHz AMD 64 3700 and DDR400 at 2-2-2-5 you will see an average of 5700 MB/s... imagine with DDR2 at the higher frequency...

loosening the timings in the above scenario just for S's&G's to 2.5-3-3-6 and then 3-4-4-8, I think I lost about 700 MB/s and was floating around 5000 MB/s ...

the Venice 3000 in the other machine at 2.4GHz, 300x8 with 3-3-3-6 timings and a divider keeping memory at 200MHz, yielded 5800+ in the Sandra test, and that's with 250MB of RAM being used.... in other words I don't think the looser timings of DDR2 matter much at all... I certainly gave up on tight timings long ago and went for high frequency on NF4's, 300's with CAS of 2.5 or 3 anyway, and always got over 7000 :shrug:
 
oxid said:
the performance of the DDR2 would of course be better, but I'm only talking about the simple latency expressed in nanoseconds
What you're failing to realize is that the delay itself isn't as simple as you think. Each of those parameters is effectively another wait state. You're basically trying to say that 2*[7ns + 11ns + 5ns + 2ns + 2ns + 2ns] = 12ns + 15ns + 10ns + 4ns + 4ns + 4ns. (It's actually 58ns != 49ns in this still very simplified example.) The factor between them is in fact smaller than 2 (about 1.7 here). Otherwise you'd need to have people running twice as fast at CAS 4 to match CAS 2...this simply isn't the case.
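
For anyone who wants to check that arithmetic, here is a minimal Python sketch. It treats the total delay as a plain sum of the listed parameters, which is the same heavy simplification as in the post, and the variable names are made up for illustration.

```python
# Simplified model from the post: total delay = sum of the listed parameters.
tight = [7, 11, 5, 2, 2, 2]    # first set of timings
loose = [12, 15, 10, 4, 4, 4]  # second set; only some values are doubled

tight_total = sum(tight)
loose_total = sum(loose)
ratio = loose_total / tight_total

print(2 * tight_total, loose_total)  # 58 49
print(round(ratio, 2))               # 1.69
```

So under this toy model the looser set costs roughly 1.7 times as much as the tighter one, not 2 times.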
 
Most of this conversation is going way over my head so I'll just say Gautam knows more than I do so I'll agree with him.
 
no, I never said that.

the timings that are given for RAM are expressed in clock cycles.
so for a given latency of 2, that means two clock cycles of delay.
if you double the clock speed of the RAM, the delay is still two clock cycles, but the real world time of the delay is halved, because each clock cycle takes half as much time as before.
therefore, if we double the delay in clock cycles to four, the real world delay time in nanoseconds would be the same as if the clock cycle delay were 2 at the original clock speed.
remember that the frequency in Hz is the number of clock cycles per second

at 400 MHz, one clock cycle is 2.5 nanoseconds (1/400,000,000 s)
at 800 MHz, one clock cycle is 1.25 nanoseconds (1/800,000,000 s)

thus,

at 400 MHz, a two clock cycle delay would be 5 nanoseconds.
at 800 MHz, a two clock cycle delay would be 2.5 nanoseconds.
at 400 MHz, a four clock cycle delay would be 10 nanoseconds
at 800 MHz, a four clock cycle delay would be 5 nanoseconds.
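
The same conversion as a small Python sketch. The function name `cycles_to_ns` is just an illustrative choice, and the clock values are the ones used in this post.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Convert a delay in clock cycles to nanoseconds."""
    period_ns = 1000.0 / clock_mhz  # one cycle at 1 MHz lasts 1000 ns
    return cycles * period_ns

for clock_mhz in (400, 800):
    for cycles in (2, 4):
        ns = cycles_to_ns(cycles, clock_mhz)
        print(f"{clock_mhz} MHz, {cycles} cycles: {ns} ns")
```

Two cycles at 400 MHz and four cycles at 800 MHz both come out to 5.0 ns.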

now where does my reasoning fail?

another edit:
and it's just simple mathematical distributivity. 5(3+2) = 15+10. so it doesn't matter how many types of delay there are.
 
oxid said:
no, I never said that.

Then what did you mean by this?

like I said, DDR2-800 with 4-4-4-10 timings is EXACTLY the same delay in nanoseconds as DDR-400 at 2-2-2-5.

This is incorrect.

And your logic would be correct if it were as simple as you stated, but it's not. You're assuming just one latency parameter which can easily be doubled, tripled, quadrupled or what have you. In reality it's a lot more complex.

Let's go timing by timing. I didn't want to spend time resorting to this, but it's actually turning out pretty educational for me. Quotes taken from AMD themselves. This page is going in my bookmarks.

tRAS
Memory architecture is like a spreadsheet with row upon row and column upon column, with each row being one bank. For the CPU to access memory, it first must determine which row or bank in the memory is to be accessed and then activate that row with the RAS signal. Once activated, the row can be accessed over and over, until the data is exhausted. This is why tRAS has little effect on overall system performance but could impact system stability if set incorrectly.

So right here we can see that doubling tRAS doesn't effectively double your total delay. Say we have tRAS at 5. An instruction is sent to each row every 5ns. tRAS at 10, every 10ns. But what's going on within each row? Within this delay, it can still most certainly be accessed, given that it's activated. Theoretically, you could have half as much performance at tRAS 10 compared to tRAS 5, but only in the absolute worst case. In reality you probably won't see much of a difference at all.


tRCD
tRCD is the delay from the time a row is activated to when the cell (or column) is activated via the CAS signal and data can be written to or read from a memory cell. When memory is accessed sequentially, the row is already active and tRCD will not have much impact. However, if memory is not accessed in a linear fashion, the current active row must be deactivated and then a new row selected/activated. In such an example, low tRCD's can improve performance. However, like any other memory timing, putting this too low for the module can cause instability.

tRCD here. Known to die-hard overclockers as about the most important latency parameter. People have spent extremely large premiums just to reduce this a notch or two. I know I'm guilty of it. Once again we can see, though, that the practical effect it has depends on how it's accessed. Btw, if it's accessed sequentially, the latency is additive, and not multiplicative, unlike what you showed in your last post. So we can already see that 10-4-x-x is in almost all cases incurring less than a 2x gain in latency compared to 5-2-x-x. In most cases, much less, since even in a non-linear case in which the CAS signal directly depends on tRCD, the tRAS will almost definitely be a lot better than the worst case scenario. Really the only way you can have a straight doubling of delay is if everything goes wrong.


CAS Latency
Certainly, one of the most important timings is the CAS Latency, which is also the one most people understand. Since data is often accessed sequentially (same row), the CPU need only select the next column in the row to get the next piece of data. In other words, CAS Latency is the delay between the CAS signal and the availability of valid data on the data pins (DQ). The latency between column accesses (CAS) then plays an important role in the performance of the memory. The lower the latency, the better the performance. However, the memory modules must be able to support low-latency settings.
Once again the key word is "sequentially". Though this little tidbit is quite interesting. CAS latency appears to be pivotal architecturally, since the total latency always depends on it. Seems like the reason that P4's don't benefit much from it is because the latency in communication between northbridge and CPU is high enough to overwhelm it. A64s it would seem should be affected pretty significantly by it. Why they aren't is beyond my knowledge, but quite a sidetrack anyways.


tRP
tRP is the time required to terminate one row access and begin the next row access. tRP might also be seen as the delay required between deactivating the current row and selecting the next row. So in conjunction with tRCD, the time required (or clock cycles required) to switch banks (or rows) and select the next cell for reading, writing, or refreshing is a combination of tRP and tRCD.
tRP is basically the next step in the process. Once again, it adds latency in memory access, doesn't multiply it. tRP obviously has a big effect. It's another one of those timings that performance freaks are willing to give up an arm and a leg for, and I can see why. The delay between each row is inescapable.
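
A rough way to see the additive behaviour is a sketch like the one below. It assumes a simplified model that is not taken from the AMD page: an access to an already open row costs only CAS latency, an access to a closed bank costs tRCD + CAS, and an access that first has to close another row costs tRP + tRCD + CAS. The function name and the example timings are placeholders.

```python
def access_cycles(cl, trcd, trp, case):
    """Cycles until data arrives in a simplified DRAM model (an assumption)."""
    if case == "row_hit":       # row already open: only the column access
        return cl
    if case == "row_closed":    # activate the row, then the column access
        return trcd + cl
    if case == "row_conflict":  # close old row, activate new row, column access
        return trp + trcd + cl
    raise ValueError(f"unknown case: {case}")

base = access_cycles(cl=2, trcd=2, trp=2, case="row_conflict")
slow_cas = access_cycles(cl=4, trcd=2, trp=2, case="row_conflict")
print(base, slow_cas, round(slow_cas / base, 2))  # 6 8 1.33
```

In this model, doubling CAS alone raises the worst-case access from 6 to 8 cycles, so the delay grows by a third rather than doubling.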

There's more, but I'm getting tired, and I think this is illustrative enough.


The cliff's notes: Memory is accessed in a sequential, step-by-step fashion. When you increase a certain latency parameter, it adds to the time delay, and does not multiply it.

If there were a single latency parameter that memory access depended on, then, yes, doubling it would effectively double the total delay. However, in reality there are a multitude of delays that occur over the course of memory access. This is why changing a latency parameter doesn't change delay directly by the factor it was changed by.


Ok, does that do it for the threadjack? :p It is interesting stuff, no doubt, and even more explicitly shows that the essential doubling of frequency going from DDR1 to DDR2 will yield a much greater performance gain than the loss incurred by doubling some of the latency parameters.
 
gautam, you might be right there.
now, the question still remains: how do the timings react to increases in clockspeed?
 
If you're asking what I think you're asking, the period = 1/frequency relation that you outlined earlier works... 800MHz without any other latency parameters (NEVER occurs in reality!) has half the delay compared to 400MHz: 1.25ns vs 2.5ns. Or did you mean something else?

This lends itself to why RAM ICs are spec'ed the way they are, btw. If they are spec'ed to operate at 4ns, like common DDR500 parts, they can operate at a maximum frequency of 1/(4ns) = 250MHz.
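
As a quick sketch of that relation in Python (the function name is made up, and the 5 ns input is just a second illustrative value):

```python
def max_clock_mhz(cycle_time_ns):
    """Highest clock a part rated for this cycle time can run at, in MHz."""
    return 1000.0 / cycle_time_ns  # f = 1 / t, with t in ns and f in MHz

print(max_clock_mhz(4.0))  # 250.0, the DDR500 example
print(max_clock_mhz(5.0))  # 200.0
```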



This is eating a good chunk outta my homework time...the problem is, I can't really tell the difference between this and my homework anymore. :D
 
what I mean is, how do we compare the timings of DDR2-800 to the timings of DDR-400?
 
boris_37 said:
Yeah this is quite exciting, keep up the work on the official thread. xtreme barton has been crying and crying to me telling me not to go out today and buy an opteron and wait for AM2.

I'm kind of excited now, although screw the Semprons, I'm never going budget again, lol.

I will buy some PC2-5300 RAM (overclocks to 1000MHz, or so a review said) so expect me to be one of the first in on overclocks on the new dual cores for AM2.

However I really wanna build water cooling for my new PC, with 4 months to spare I've got tons of time to look around :)
Might want to wait on the water block for now. Though socket AM2 is similar to socket 939, the water blocks might not fit.
 
avalanche83 said:
Everest Latency is a great one. I get 48ns with 200MHz @ 1.5-2-2-6 and 48ns with 300MHz @ 3-3-3-6. The bandwidth makes up for the latency.
Can you do two more runs and compare the latency numbers:
1. 150 MHz 1.5-2-2-6
2. 300 MHz 3-4-4-12 (or 3-4-4-x if you cannot set tRAS to 12)
 
oxid said:
like I said, DDR2-800 with 4-4-4-10 timings is EXACTLY the same delay in nanoseconds as DDR-400 at 2-2-2-5.
I tend to agree with your statement. If all the timing parameters (in number of cycles) for case A are exactly N times that of case B, and frequency of case A is also N times that of case B, then the latency (in seconds) for both cases would be the same.
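
A small Python sketch of that scaling argument, converting each timing to nanoseconds on its own. It assumes the usual clock rates behind the names (200 MHz for DDR-400 and 400 MHz for DDR2-800, since both transfer data twice per clock); the helper name `timings_ns` is made up.

```python
from fractions import Fraction

def timings_ns(clock_mhz, timings):
    """Convert each timing from clock cycles to exact nanoseconds."""
    period_ns = Fraction(1000, clock_mhz)
    # Fraction(str(t)) also handles half-cycle values such as 2.5 exactly.
    return [period_ns * Fraction(str(t)) for t in timings]

ddr_400 = timings_ns(200, [2, 2, 2, 5])     # case B
ddr2_800 = timings_ns(400, [4, 4, 4, 10])   # case A, N = 2

print([float(x) for x in ddr_400])  # [10.0, 10.0, 10.0, 25.0]
print(ddr_400 == ddr2_800)          # True
```

This only shows that each individual timing is the same length in nanoseconds; it says nothing about how the two memory types differ in other respects.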


But comparing DDR and DDR2 is not as simplistic as just looking at the latency and bandwidth. The key differences between DDR and DDR2 are

feature .................. DDR .............. DDR2
package (DIMM pins) ...... TSOP/FBGA (184) .. FBGA (240)
core frequency (MHz) ..... 100/133/167/200 .. 100/133/167/200
data rate (MT/s) ......... 200/266/333/400 .. 400/533/667/800
voltage and power ........ 2.5V (nom) ....... 1.8V (nom) (lower power for DDR2)
DRAM chip density ........ 128 Mb - 1 Gb .... 256 Mb - 2 Gb
internal banks ........... 4 ................ 4/8
prefetch (bits) .......... 2 ................ 4
CAS latency (cycles) ..... 2/2.5/3 .......... 3/4/5
read latency ............. CAS .............. CAS+AL (AL = 0/1/2/3/4)
write latency ............ 1 ................ read_latency - 1
I/O width ................ x4/x8 ............ x4/x8/x16
signaling, termination ... none ............. selectable (better signal integrity for DDR2)
burst length ............. 2/4/8 ............ 4/8
...

(for reference only, actual spec may differ)
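
To make the DDR2 read and write latency rows concrete, here is a minimal Python sketch that only encodes the two relations listed in the table. The function name and the example values are illustrative, not taken from any datasheet.

```python
def ddr2_latencies(cas, additive=0):
    """Return (read, write) latency in clock cycles for DDR2, per the table."""
    if additive not in range(0, 5):
        raise ValueError("AL is listed as 0 to 4")
    read = cas + additive   # read latency = CAS + AL
    write = read - 1        # write latency = read latency - 1
    return read, write

print(ddr2_latencies(4))              # (4, 3)
print(ddr2_latencies(5, additive=2))  # (7, 6)
```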

Besides considering the common performance metric of latency and bandwidth, the above factors have to be considered as a whole and they have implications on future system scalability and overall system performance.

Memory frequency and latency tradeoff
 