_damien_:
Thanks for pointing that out. You are right to distinguish CAS latency (tCAS) from the DQ burst cycle time (tDQ), which takes half a CLK cycle per output data burst (for DDR1): tDQ = tCLK / 2.
Here is the latency calculation and break-even frequency estimate, redone for the various memory timings:
Memory frequency and latency tradeoff
Latency is a measure of the time needed to complete an operation. For synchronous devices such as CPUs and memory, each operation is measured in clock cycles. An operation can be a large one, such as a Read or Write, or one of the smaller internal steps that make up a Read or Write.
A DRAM memory module that is driven by a clock is called synchronous DRAM (SDRAM). With this in mind, memory operations can be described in number of cycles instead of time. When we move from DDR400 to DDR500, or a CPU from 2GHz to 3GHz, each operation takes proportionally less time due to the faster silicon process, but the number of cycles needed to complete an operation remains the same, and the relationship between operations in terms of cycles remains the same, unless there is a change in architecture or timing.
DRAM is organized in rows and columns of storage bits. The intersection of a row and a column is one bit of data. To access data, the address is decoded into a row address and a column address. First, the row corresponding to the decoded row address is accessed and all the bits in that row are sensed (and held in the sense amplifiers during the Read operation). Then the corresponding columns are accessed and the data is output. In many cases, multiple columns located on the same row are accessed and output as a sequence of data to save row access overhead. This is called the burst mode of operation and is commonly used for large blocks/pages of data.
Here I use the Read operation to illustrate the concept. After the memory controller issues the Read command and address, a DRAM read operation goes like this:
1. DRAM module decodes address into row address and column address
2. Activates the word-line (row address); the sense amplifiers detect and store all the row data
3a. Activates the column (column address) and outputs data
3b. In case of multiple column access (as discussed earlier), a sequence of column data is output.
4. Restore data back to the DRAM cells and precharge for next operation.
The tRCD latency is the time or number of cycles to perform step 2. Typically, tRCD takes 2 or 3 cycles to complete.
The CAS latency (tCAS) is the time or number of cycles to perform step 3a. Typically, CAS latency takes 2 to 3 cycles to complete.
For step 3b, the data from multiple columns is output as N bursts of DQ data. The DQ burst cycle time (tDQ) is half of the memory clock cycle (tCLK) per output data burst (for DDR1). tDQ = tCLK / 2.
The total latency for step 3a and 3b = tCAS + N tDQ.
The tRP latency is the time or number of cycles to perform step 4. Typically, tRP takes 2 or 3 cycles to complete.
So the time for a DRAM Read operation is tRCD + tCAS + N tDQ, which is tRCD + tCAS + N/2 when counted in cycles. Typically N = 1 or 2 or 4 or 8 (or even more), depending on the number of column accesses per memory access.
For 1 column access, number of cycles for Read operation = tRCD + tCAS + 1/2
For 2 column access, number of cycles for Read operation = tRCD + tCAS + 1
For 4 column access, number of cycles for Read operation = tRCD + tCAS + 2
For 8 column access, number of cycles for Read operation = tRCD + tCAS + 4
etc.
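The list above follows one formula, which can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this post; the function name read_cycles is made up here, and the half-cycle per data burst assumes DDR1 as stated above.

```python
def read_cycles(trcd, tcas, burst_length):
    """Cycles from ACT to the end of the data burst.

    Implements tRCD + tCAS + N/2, where each of the N data bursts
    takes half a clock cycle (the DDR1 assumption used in this post).
    """
    return trcd + tcas + burst_length * 0.5


if __name__ == "__main__":
    # Example with tRCD = 2 and tCAS = 2.0 (i.e. 2-2-2-5 timings).
    for n in (1, 2, 4, 8):
        print(f"{n} column access: {read_cycles(2, 2.0, n):g} cycles")
```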
Number of cycles for first data output from activation command (ACT) = tRCD + tCAS
Number of cycles for data output in multiple column access,
Number of cycles for second data output = tRCD + tCAS + 1/2
Number of cycles for third data output = tRCD + tCAS + 1
Number of cycles for fourth data output = tRCD + tCAS + 1 1/2
etc.
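The per-output numbers above can be written as one expression: the k-th data output appears (k - 1) half-cycles after the first one. A small sketch, where data_output_cycle is a hypothetical name used only for this example:

```python
def data_output_cycle(trcd, tcas, k):
    """Cycles from ACT until the k-th data output begins (k = 1, 2, 3, ...).

    The first output appears after tRCD + tCAS; each later output
    follows half a clock cycle after the previous one (DDR1).
    """
    if k < 1:
        raise ValueError("k must be 1 or greater")
    return trcd + tcas + (k - 1) * 0.5


if __name__ == "__main__":
    # Example with tRCD = 2 and tCAS = 2.0.
    for k in range(1, 5):
        print(f"data output {k}: {data_output_cycle(2, 2.0, k):g} cycles")
```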
Typically, the number of column accesses is software dependent. The numbers below are for a typical case of a burst length of 4 (or 4 column accesses).
The number of cycles for each of the following timings is listed below (timings are written as tCAS-tRCD-tRP-tRAS, so the second number is tRCD):
2.0-2-2-5 average_time = 2 + 2.0 + 4 x 1/2 = 6 cycles
2.0-3-2-5 average_time = 3 + 2.0 + 4 x 1/2 = 7 cycles
2.0-3-3-6 average_time = 3 + 2.0 + 4 x 1/2 = 7 cycles
2.5-3-3-6 average_time = 3 + 2.5 + 4 x 1/2 = 7.5 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 4 x 1/2 = 7.5 cycles
2.5-4-3-8 average_time = 4 + 2.5 + 4 x 1/2 = 8.5 cycles
2.5-4-4-8 average_time = 4 + 2.5 + 4 x 1/2 = 8.5 cycles
3.0-5-5-x average_time = 5 + 3.0 + 4 x 1/2 = 10 cycles
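For anyone who wants to reproduce the list for other timings or burst lengths, here is a rough Python sketch. It assumes the timing string is ordered tCAS-tRCD-tRP-tRAS and only uses the first two fields; parse_timing and act_to_burst_end are names invented for this example.

```python
def parse_timing(timing):
    """Return (tcas, trcd) from a 'tCAS-tRCD-tRP-tRAS' string.

    tRP and tRAS are ignored, so a placeholder such as 'x' in the
    last field is harmless.
    """
    fields = timing.split("-")
    return float(fields[0]), int(fields[1])


def act_to_burst_end(timing, burst_length=4):
    """Cycles from ACT to the end of the burst: tRCD + tCAS + N/2 (DDR1)."""
    tcas, trcd = parse_timing(timing)
    return trcd + tcas + burst_length * 0.5


TIMINGS = [
    "2.0-2-2-5", "2.0-3-2-5", "2.0-3-3-6", "2.5-3-3-6",
    "2.5-3-3-7", "2.5-4-3-8", "2.5-4-4-8", "3.0-5-5-x",
]

if __name__ == "__main__":
    for t in TIMINGS:
        print(f"{t} average_time = {act_to_burst_end(t):g} cycles")
```

Passing burst_length=8 (or 1, or 2) gives the corresponding numbers for the other burst lengths.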
Similarly, the average_time in cycles for other burst lengths can be calculated.
E.g. consider the popular memory modules such as BH-5/UTT (2-2-2-5 1T) and TCCD (2.5-3-3-7 1T).
Burst length of 1, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 1/2 = 4.5 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 1/2 = 6 cycles
Difference in cycles = 1.5 (4.5 vs 6)
% of frequency to break-even = 6 / 4.5 = 133.3%
Burst length of 2, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 2 x 1/2 = 5 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 2 x 1/2 = 6.5 cycles
Difference in cycles = 1.5 (5 vs 6.5)
% of frequency to break-even = 6.5 / 5 = 130.0%
Burst length of 4, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 4 x 1/2 = 6 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 4 x 1/2 = 7.5 cycles
Difference in cycles = 1.5 (6 vs 7.5)
% of frequency to break-even = 7.5 / 6 = 125.0%
Burst length of 8, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 8 x 1/2 = 8 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 8 x 1/2 = 9.5 cycles
Difference in cycles = 1.5 (8 vs 9.5)
% of frequency to break-even = 9.5 / 8 = 118.75%
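The four comparisons above all use the same ratio, so they can be checked with a short script. This is just a sketch of the arithmetic in this post; read_cycles and break_even_ratio are names chosen for the example, and the timings are passed as (tRCD, tCAS) pairs.

```python
def read_cycles(trcd, tcas, burst_length):
    """Cycles from ACT to the end of the burst: tRCD + tCAS + N/2 (DDR1)."""
    return trcd + tcas + burst_length * 0.5


def break_even_ratio(tight, loose, burst_length):
    """Clock ratio at which the looser timings match the tighter ones.

    'tight' and 'loose' are (tRCD, tCAS) pairs. The result is the
    cycle count of the looser timings divided by that of the tighter.
    """
    return read_cycles(*loose, burst_length) / read_cycles(*tight, burst_length)


BH5 = (2, 2.0)    # 2.0-2-2-5
TCCD = (3, 2.5)   # 2.5-3-3-7

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        ratio = break_even_ratio(BH5, TCCD, n)
        print(f"burst length {n}: {ratio * 100:.2f}% of frequency to break even")
```

The difference stays at 1.5 cycles for every burst length, so the ratio shrinks as the burst gets longer.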
For a burst length of 4 in a memory intensive application, 2.5-3-3-7 has to make up 1.5 cycles of extra latency (6 vs 7.5 cycles), so the frequency has to be increased by the 7.5:6 ratio, or a 25% increase.
E.g. 250 MHz of BH5/UTT at 2-2-2-5 equates to 312.5 MHz of TCCD at 2.5-3-3-7, for the same latency from ACT to DQ out for a burst length of 4.
For a burst length of 8 in a memory intensive application, 2.5-3-3-7 has to make up 1.5 cycles of extra latency (8 vs 9.5 cycles), so the frequency has to be increased by the 9.5:8 ratio, or an 18.75% increase.
E.g. 250 MHz of BH5/UTT at 2-2-2-5 equates to 296.9 MHz of TCCD at 2.5-3-3-7, for the same latency from ACT to DQ out for a burst length of 8.
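To see why these frequencies are equivalent, it helps to convert cycles back to time: latency in ns = cycles x 1000 / clock in MHz. A small sketch, with latency_ns and break_even_mhz being hypothetical helper names for this example:

```python
def latency_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given memory clock."""
    return cycles * 1000.0 / clock_mhz


def break_even_mhz(base_mhz, base_cycles, other_cycles):
    """Clock needed for 'other_cycles' to take as long as 'base_cycles' at base_mhz."""
    return base_mhz * other_cycles / base_cycles


if __name__ == "__main__":
    # Burst length of 4: 6 cycles (2-2-2-5) vs 7.5 cycles (2.5-3-3-7).
    mhz_bl4 = break_even_mhz(250, 6, 7.5)
    print(mhz_bl4, latency_ns(6, 250), latency_ns(7.5, mhz_bl4))

    # Burst length of 8: 8 cycles vs 9.5 cycles.
    mhz_bl8 = break_even_mhz(250, 8, 9.5)
    print(mhz_bl8, latency_ns(8, 250), latency_ns(9.5, mhz_bl8))
```

In both cases the two modules end up with the same ACT to DQ out time in nanoseconds, which is what break-even means here.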
This number represents the upper bound on the memory frequency increase needed, assuming 100% memory read access. For real applications, the frequency increase needed to break even with the lower latency would be less, depending on the intensity of memory access.