Frequency, clock, period of synchronous operations
The CPU, memory, PCI bus, GPU, ... are each driven by a clock that oscillates many times per second. Cycles per second are also known as hertz (Hz). E.g. the famous Tbred B 1700+ can be overclocked to around 2,500,000,000 cycles/sec (2.5 GHz), and DDR400 memory operates at a clock frequency of 200,000,000 cycles/sec (200 MHz). Electronic components driven by a clock perform a certain operation (electrically and logically) at each clock tick, then move on to the next operation at the next tick, and so on. This is called the synchronous mode of operation.
The reciprocal of the frequency f is called the period T of the clock: T = 1/f. So the period T is the time spacing between two consecutive operations of the electronic component, e.g. CPU, memory, GPU, ... E.g. for a 2.5 GHz CPU, the time spacing between clock ticks is 1/2.5 GHz = 0.4 ns or 400 ps. The time spacing between clock ticks in a DDR400 memory module is 1/200 MHz = 5 ns.
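To make the T = 1/f arithmetic concrete, here is a small sketch. The function name clock_period_ns is just something I made up for illustration; the two frequencies are the ones from the examples above.

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Return the clock period T = 1/f, expressed in nanoseconds."""
    return 1e9 / frequency_hz


# 2.5 GHz CPU and the 200 MHz clock of DDR400, as in the examples above.
print(clock_period_ns(2.5e9), "ns per CPU clock tick")
print(clock_period_ns(200e6), "ns per DDR400 clock tick")
```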
Latency
Latency is a measure of the time needed to complete a certain operation. For synchronous components such as the CPU, memory, ..., each operation is measured in number of clock cycles. An operation can be a large one such as a complete Read or Write, or one of the smaller internal steps that make up a Read or Write.
DRAM memory modules that are driven by a clock are called synchronous DRAM (SDRAM). With this in mind, memory operations can be described in number of cycles instead of time. When we move from DDR400 to DDR500, or a CPU from 2 GHz to 3 GHz, each operation takes proportionally less time (made possible by a faster silicon process), but the number of cycles needed for an operation remains the same, and the relationships between operations in terms of cycles remain the same, unless there is a change in architecture or timings.
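A quick sketch of the "same cycles, less time" point. I am assuming DDR500 runs a 250 MHz clock, by analogy with DDR400 running at 200 MHz; cycles_to_ns is a made-up helper name.

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a latency given in clock cycles into nanoseconds."""
    return cycles * 1e9 / clock_hz


# Same 2-cycle operation at two memory clocks.
# DDR500 = 250 MHz is an assumption based on the DDR400 = 200 MHz naming.
for name, clock_hz in [("DDR400", 200e6), ("DDR500", 250e6)]:
    print(name, cycles_to_ns(2, clock_hz), "ns for a 2-cycle operation")
```

The cycle count stays at 2 in both cases; only the wall-clock time changes (10 ns versus 8 ns).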
DRAM is organized in rows and columns of storage bits. The intersection of a row and a column holds one bit of data. To access data, the address is decoded into a row address and a column address. First, the row corresponding to the decoded row address is accessed and all the bits in that row are sensed (and held in the sense amplifiers during the Read operation). Then the corresponding columns are accessed and the data is output. In many cases, multiple columns located on the same row are accessed and output as a sequence of data to save row access overhead. This is called the burst mode of operation, commonly used for large blocks/pages of data (this is where CAS latency comes in).
Here I use the Read operation to illustrate the concept. After the memory controller issues the Read command and address, a DRAM read operation goes like this:
1. The DRAM module decodes the address into a row address and a column address.
2. It activates the word-line (row address), and the sense amplifiers detect and store all the row data.
3a. It activates the column (column address) and outputs the data.
3b. In case of multiple column access (as discussed earlier), a sequence of column data is output.
4. It restores the data back to the DRAM cells and precharges for the next operation.
The tRCD latency is the time or number of cycles to perform step 2. Typically, tRCD takes 2 or 3 cycles to complete.
The CAS latency is the time or number of cycles to perform step 3a. In case of step 3b (N columns of data are output), the total column latency = N x CAS. Typically, CAS latency takes 2 or 3 cycles to complete.
The tRP latency is the time or number of cycles to perform step 4. Typically, tRP takes 2 or 3 cycles to complete.
So the latency of a DRAM Read operation is tRCD + N x CAS + tRP, where typically N = 1 or 4 or 8 (or even more), depending on the number of column accesses per memory access. This is the total number of cycles for a Read operation.
Remember: tRCD = 2 or 3, CAS = 2 (try to avoid setting CAS to 3), tRP = 2 or 3.
For 1 column access, the total latency of Read operation = tRCD + CAS + tRP
For 4 column access, the total latency of Read operation = tRCD + 4 x CAS + tRP
For 8 column access, the total latency of Read operation = tRCD + 8 x CAS + tRP
etc, etc.
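Here is the formula above as a small sketch, using the simplified model from this post (tRCD + N x CAS + tRP). The function name and the 2-2-2 timings are just for illustration, and the 5 ns period is the DDR400 clock period worked out earlier.

```python
def read_latency_cycles(trcd: int, cas: int, trp: int, n_columns: int = 1) -> int:
    """Total Read latency in cycles under this post's model: tRCD + N x CAS + tRP."""
    return trcd + n_columns * cas + trp


CLOCK_PERIOD_NS = 5.0  # DDR400: 1 / 200 MHz

# Illustrative 2-2-2 timings with 1, 4 and 8 column accesses.
for n in (1, 4, 8):
    cycles = read_latency_cycles(trcd=2, cas=2, trp=2, n_columns=n)
    print(f"N={n}: {cycles} cycles = {cycles * CLOCK_PERIOD_NS} ns")
```

With 2-2-2 timings this gives 6, 12 and 20 cycles, i.e. 30, 60 and 100 ns at 200 MHz.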
Time for first data output from address decode = tRCD + CAS
Time for the later data outputs in a multiple column access:
Time for second data output = tRCD + 2 x CAS
Time for third data output = tRCD + 3 x CAS
etc, etc.
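The same pattern for the k-th data output, again following this post's simplified model (tRCD + k x CAS, counted from address decode). data_output_cycles is a made-up name, and the 2-2 timings are only an example.

```python
def data_output_cycles(trcd: int, cas: int, k: int) -> int:
    """Cycles from address decode until the k-th data output (k starts at 1)."""
    if k < 1:
        raise ValueError("k must be 1 or greater")
    return trcd + k * cas


# First four data outputs of a burst with tRCD = 2 and CAS = 2.
print([data_output_cycles(trcd=2, cas=2, k=k) for k in range(1, 5)])
```

With tRCD = 2 and CAS = 2 the outputs arrive after 4, 6, 8 and 10 cycles.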
Latency is different from memory bandwidth. Peak memory bandwidth is determined by the memory bus frequency and FSB frequency, and NOT by latency. It depends on how fast the memory clock or FSB clock can run, regardless of whether the memory module has shorter or longer latency.
E.g. all DDR400 memory modules can run at the same memory speed of 200 MHz, regardless of their latency, to deliver a maximum bandwidth of 2 x 8 x 200 = 3200 MB/s. The x2 is because DDR transfers data at both the rising and falling edges of the clock. The x8 is because the memory bus is 8 bytes wide. But some low latency modules can finish each of the above steps (tRCD, CAS, tRP) in a smaller number of cycles, e.g. at 2-2-2. Most modules need a larger number of cycles for these steps, e.g. 3-3-2.
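The bandwidth arithmetic as a sketch. The function and parameter names are my own; the defaults are the x2 (DDR) and x8 (bytes on the bus) factors from the example above. Note that no latency value appears anywhere in it.

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_bytes: int = 8,
                        transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in MB/s = transfers per clock x bus width in bytes x clock in MHz."""
    return transfers_per_clock * bus_bytes * clock_mhz


# DDR400: 200 MHz clock, 8-byte bus, data on both clock edges.
print(peak_bandwidth_mb_s(200), "MB/s")
```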
CAS latency is the most important timing, since it is the time to output 1 bit (single column access) or many bits of data (multiple column access), as described earlier. For all of today's memory modules, always set CAS to 2; I find that almost all modules on the market now are able to do that.
Setting tRCD and tRP to 2 may be possible. If not, setting them to 3 is OK, with not much penalty.
Sorry for the long post; I tried to explain it as clearly and completely as possible. Hope this helps.