[Ret Sticky]Overclocking sndbx for A64 939 systems with Winchester, Opteron dual core

jcw122 · Jul 20, 2005

Wow, I haven't seen this thread before. I thought it was just a big list of Winchester and up OC speeds and info, but apparently not! I mean wow, this is incredible hightechjb! You're insane!

mysterfix · Aug 3, 2005

Hey guys I got my crappy Neo2 refurb to do 305htt at 9X. My NF3 Ultra-D will be arriving sometime next week but for now here are my results...

rseven · Aug 3, 2005

mysterfix said:
Hey guys I got my crappy Neo2 refurb to do 305htt at 9X. My NF3 Ultra-D will be arriving sometime next week but for now here are my results...

That is amazing! Are you sure it's a Neo2? Double check that label, you must be mistaken.....LOL Congrats!

mysterfix · Aug 3, 2005

rseven said:
That is amazing! Are you sure it's a Neo2? Double check that label, you must be mistaken.....LOL Congrats!

LOL, Yeah It's a Neo2 alright! I got the crappy memory performance to prove it!

I'm running the 1.8r2 bios also, it seemed to work better than the 1.9's for me.

Thanks rseven, I hope to improve on this a little bit once I get my new board. You never know what a little bump in voltage may do for you. I want to give it 1.7v just to see if my OC improves any but I'm pretty happy with this chip either way.

ChronoReverse · Aug 28, 2005

This is my system that I use every day. I haven't actually tried pushing it. Cool'n'Quiet is enabled.

Abit AV8 Bios 19
1GB Samsung Generic @ 222MHz 2.5-3-3-5 (better than I thought my generic ram would do).
Athlon64 3000+ Winchester @ 2.4GHz, 1.4V
Stock cooling
LDT 2X (stupid AV8 bug killing any 3D program)
HTT 267MHz

RotKT · Dec 16, 2005

thanks

_damien_ · Mar 21, 2006

** post removed **

hitechjb1 · Mar 21, 2006

_damien_:

Thanks for pointing that out. I think you correctly pointed out the distinction between CAS latency (tCAS) and DQ burst cycle time (tDQ), which takes half a CLK cycle per output data burst (for DDR1). tDQ = tCLK / 2.

Redoing the latency calculation and break-even frequency estimation for the various memory timings:

Memory frequency and latency tradeoff

Latency is a measure of the time needed to complete certain operations. For synchronous (clocked) devices such as the CPU and memory, each operation is measured in number of cycles. An operation can be a large one such as a Read or Write, or one of the smaller internal steps of a Read or Write.

DRAM modules that are driven by a clock are called synchronous DRAM (SDRAM). With this in mind, memory operations can be described in terms of number of cycles instead of time. When we move from DDR400 to DDR500, or a CPU from 2 GHz to 3 GHz, each operation takes proportionally less time due to the faster silicon process, but the number of cycles needed for an operation remains the same, and the interrelationship between operations in terms of cycles remains the same, unless there is a change in architecture or timing.
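
To put numbers on the cycles-versus-time point, here is a minimal Python sketch. The function name is made up for illustration, and DDR400 / DDR500 are taken to mean 200 MHz / 250 MHz memory clocks.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Time in nanoseconds taken by `cycles` clock cycles at `clock_mhz` MHz."""
    # One cycle at f MHz lasts 1000 / f nanoseconds.
    return cycles * 1000.0 / clock_mhz

# The same 6-cycle operation at DDR400 (200 MHz) and DDR500 (250 MHz):
print(cycles_to_ns(6, 200))  # 30.0
print(cycles_to_ns(6, 250))  # 24.0
```

The cycle count is unchanged; only the time per cycle shrinks as the clock goes up.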

DRAM is organized in rows and columns of storage bits. The intersection of a row and a column is a bit of data. To access data, the address is decoded into row and column addresses. First, the row corresponding to the decoded row address is accessed, followed by sensing of all the bits in that row (which are held in the sense amplifiers during the Read operation), and then the corresponding columns are accessed and the data is output. In many cases, multiple columns located on the same row are accessed and output as a sequence of data to save row access overhead. This is called the burst mode of operation, commonly used for large blocks/pages of data.

The Read operation is used here to illustrate the concept. After the memory controller issues a Read command and address, a DRAM read operation goes like this:
1. The DRAM module decodes the address into a row address and a column address.
2. It activates the word-line (row address), and the sense amplifiers detect and store all the row data.
3a. It activates the column (column address) and outputs data.
3b. In case of multiple column access (as discussed earlier), a sequence of column data is output.
4. It restores data back to the DRAM cells and precharges for the next operation.

The tRCD latency - is the time or number of cycles to perform step 2. Typically, tRCD takes 2 or 3 cycles to complete.
The CAS latency (tCAS) - is the time or number of cycles to perform step 3a. Typically, CAS latency takes 2 to 3 cycles to complete.
For step 3b, multiple column data are output as N bursts of DQ data out. The DQ burst cycle time (tDQ) takes half of the memory clock cycle (tCLK) per output data burst (for DDR1). tDQ = tCLK / 2.
The total latency for steps 3a and 3b = tCAS + N tDQ.
The tRP latency - is the time or number of cycles to perform step 4. Typically, tRP takes 2 or 3 cycles to complete.

So the number of cycles of a DRAM Read operation is tRCD + tCAS + N tDQ, where typically N = 1, 2, 4 or 8 (or even more), depending on the number of column accesses per memory access.

For 1 column access, number of cycles for Read operation = tRCD + tCAS + 1/2
For 2 column access, number of cycles for Read operation = tRCD + tCAS + 1
For 4 column access, number of cycles for Read operation = tRCD + tCAS + 2
For 8 column access, number of cycles for Read operation = tRCD + tCAS + 4
etc.
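
The same count can be written as a one-line function. This is only a sketch of the formula above (tRCD + tCAS + N x 1/2 for DDR1); the function name is made up for illustration.

```python
def read_cycles(trcd, tcas, burst_length):
    """Cycles from the ACT command to the end of a DDR1 read burst."""
    # Each of the N data beats takes half a clock cycle (tDQ = tCLK / 2).
    return trcd + tcas + burst_length * 0.5

# tRCD = 3, tCAS = 2.5 for burst lengths 1, 2, 4 and 8
for n in (1, 2, 4, 8):
    print(n, read_cycles(3, 2.5, n))
```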

Number of cycles for first data output from activation command (ACT) = tRCD + tCAS
Number of cycles for data output in multiple column access,
Number of cycles for second data output = tRCD + tCAS + 1/2
Number of cycles for third data output = tRCD + tCAS + 1
Number of cycles for fourth data output = tRCD + tCAS + 1 1/2
etc.
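
The arrival time of each individual data output follows the same pattern: the first one appears after tRCD + tCAS, and every later one adds half a cycle. A small sketch (function name is hypothetical):

```python
def cycles_to_nth_data(trcd, tcas, n):
    """Cycles from ACT until the n-th data output (n = 1, 2, 3, ...) for DDR1."""
    # The first output needs tRCD + tCAS; each further output adds tCLK / 2.
    return trcd + tcas + (n - 1) * 0.5

# tRCD = 2, tCAS = 2.0: first four data outputs
for n in (1, 2, 3, 4):
    print(n, cycles_to_nth_data(2, 2.0, n))
```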

Typically, the number of column accesses is software dependent; the figures listed here are for a typical burst length of 4 (4 column accesses).

The number of cycles for each of the following timings is listed:
2.0-2-2-5 average_time = 2 + 2.0 + 4 x 1/2 = 6 cycles
2.0-3-2-5 average_time = 3 + 2.0 + 4 x 1/2 = 7 cycles
2.0-3-3-6 average_time = 3 + 2.0 + 4 x 1/2 = 7 cycles
2.5-3-3-6 average_time = 3 + 2.5 + 4 x 1/2 = 7.5 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 4 x 1/2 = 7.5 cycles
2.5-4-3-8 average_time = 4 + 2.5 + 4 x 1/2 = 8.5 cycles
2.5-4-4-8 average_time = 4 + 2.5 + 4 x 1/2 = 8.5 cycles
3.0-5-5-x average_time = 5 + 3.0 + 4 x 1/2 = 10 cycles

Similarly, the average_time in cycles for other burst lengths can be calculated.
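
For anyone who wants to redo the list for other timings or burst lengths, here is a rough Python sketch. It assumes the timing string is written in the usual tCAS-tRCD-tRP-tRAS order and only uses the first two fields; the function name is made up for illustration.

```python
def burst_read_cycles(timing, burst_length=4):
    """Read cycles for a timing string such as '2.5-3-3-7' (tCAS-tRCD-tRP-tRAS)."""
    fields = timing.split("-")
    tcas = float(fields[0])  # first field: CAS latency
    trcd = float(fields[1])  # second field: tRCD
    # tRP and tRAS (third and fourth fields) do not enter this estimate.
    return trcd + tcas + burst_length * 0.5

for timing in ("2.0-2-2-5", "2.0-3-2-5", "2.5-3-3-7", "2.5-4-4-8", "3.0-5-5-x"):
    print(timing, burst_read_cycles(timing), "cycles")
```

Passing a second argument, e.g. burst_read_cycles("2.5-3-3-7", 8), gives the figure for another burst length.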

E.g. consider the popular memory modules such as BH-5/UTT (2-2-2-5 1T) and TCCD (2.5-3-3-7 1T).

Burst length of 1, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 1/2 = 4.5 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 1/2 = 6 cycles
Difference in cycles = 1.5 (4.5 vs 6)
% of frequency to break-even = 6 / 4.5 = 133.3%

Burst length of 2, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 2 x 1/2 = 5 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 2 x 1/2 = 6.5 cycles
Difference in cycles = 1.5 (5 vs 6.5)
% of frequency to break-even = 6.5 / 5 = 130.0%

Burst length of 4, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 4 x 1/2 = 6 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 4 x 1/2 = 7.5 cycles
Difference in cycles = 1.5 (6 vs 7.5)
% of frequency to break-even = 7.5 / 6 = 125.0%

Burst length of 8, between 2.0-2-2-5 and 2.5-3-3-7,
2.0-2-2-5 average_time = 2 + 2.0 + 8 x 1/2 = 8 cycles
2.5-3-3-7 average_time = 3 + 2.5 + 8 x 1/2 = 9.5 cycles
Difference in cycles = 1.5 (8 vs 9.5)
% of frequency to break-even = 9.5 / 8 = 118.75%

For a burst length of 4, in memory intensive applications, in order for 2.5-3-3-7 to make up for its longer latency of 1.5 cycles (6 vs 7.5 cycles), the frequency has to be increased by the 7.5:6 ratio, i.e. a 25% increase.
E.g. 250 MHz of BH5/UTT at 2-2-2-5 equates to 312.5 MHz of TCCD at 2.5-3-3-7, for the same latency from ACT to DQ out for a burst length of 4.

For a burst length of 8, in memory intensive applications, in order for 2.5-3-3-7 to make up for its longer latency of 1.5 cycles (8 vs 9.5 cycles), the frequency has to be increased by the 9.5:8 ratio, i.e. an 18.75% increase.
E.g. 250 MHz of BH5/UTT at 2-2-2-5 equates to 296.9 MHz of TCCD at 2.5-3-3-7, for the same latency from ACT to DQ out for a burst length of 8.

This number represents the upper bound on the memory frequency increase needed, i.e. the case of 100% memory read access. For real applications, the frequency increase needed to break even with the low latency memory would be less, depending on the intensity of memory access.
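
The break-even numbers can be reproduced with a short Python sketch. It simply scales the faster-timing frequency by the ratio of read cycles, as done above; the function names and the (tCAS, tRCD) tuple layout are my own choices for illustration.

```python
def read_cycles(trcd, tcas, burst_length):
    """Cycles from ACT to the end of a DDR1 read burst."""
    return trcd + tcas + burst_length * 0.5

def break_even_mhz(tight_mhz, tight, loose, burst_length):
    """Frequency the looser timing needs to match the tighter one.

    `tight` and `loose` are (tCAS, tRCD) tuples, e.g. (2.0, 2) and (2.5, 3).
    """
    tight_cycles = read_cycles(tight[1], tight[0], burst_length)
    loose_cycles = read_cycles(loose[1], loose[0], burst_length)
    # Same time per read: f_loose / f_tight = loose_cycles / tight_cycles
    return tight_mhz * loose_cycles / tight_cycles

# BH-5/UTT at 250 MHz 2-2-2-5 vs TCCD at 2.5-3-3-7
for n in (1, 2, 4, 8):
    print(n, round(break_even_mhz(250, (2.0, 2), (2.5, 3), n), 1), "MHz")
```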

hitechjb1 · Mar 24, 2006

Based on the analysis in the last post, between 2-2-2-5 1T (such as BH-5/UTT at 220-250+ MHz) and 2.5-3-3-7 1T (such as TCCD PC4400 at 280-300+ MHz), the latter would require (on average about) 25% higher frequency to break even with the former low latency memory setup for memory performance.

Taking together the 18.8-33.3% range for memory reads of burst length 1 to 8, and the typical 25% from the analytical estimate by counting read access cycles (see link below), it is fair to establish that memory with 2.5-3-3-7 1T would need 25-30% higher bus frequency to break even with memory with 2-2-2-5 1T timing for memory performance in memory intensive applications.

So if BH-5/UTT is able to run at 250 MHz 2-2-2-5 1T at 3.3+ V, TCCD 4400 such as G. Skill LE or TCCD 4800 has to run at around 300 - 310 MHz 2.5-3-3-7 1T at 2.8 V to break even, and in many cases that is doable using some Nforce4 motherboards.

Besides the performance comparison, these are some pros and cons for BH-5/UTT vs TCCD.
- The TCCD modules, which require less voltage, would lessen concern about chip reliability due to the high 3.3+ V, especially the medium to long term impact (if any) of such a voltage level on the CPU's memory controller interface (Vmemref).
- The TCCD modules offer a wider range of memory frequency and timing for tweaking, from 200 - 300+ MHz, cas 2/2.5/3 (if motherboard allows).
- On the other hand, the frequency of around 250 MHz for 2-2-2-5 1T memory modules is more easily achievable in many setups for top performance vs the 300+ MHz for 2.5-3-3-7 1T memory.

hitechjb1 · May 2, 2006

Reserved

hitechjb1 · May 2, 2006

Just got my first dual core, started some testing, ...

So results are preliminary, and will be updated in the next few days, weeks, ....

Dual core 939 Opteron Testing (at 3+ GHz)

The same 939 sandbox hardware as posted at the beginning of this thread is used, except the Winchester 939 was replaced by an Opteron 939 Dual Core 165.

- CPU: Opteron 939 165 CCBBE 0610 DPMW (rated 1.8 GHz)
- CPU cooling: XP-90 with adjustable high speed fan
- Motherboard: DFI NF4 LanParty UT Ultra-D rev A02 (the very first batch of boards that DFI released in Feb 05)
- Bios 623-3,
- Memory: G. Skill 256 MB x 2, 4400 LE (a good pair mainly used for testing, capable of high memory bus, 2.5-3/4-3/4- 1T at 300-320 MHz, 3-5-5-10 1T at 350 MHz, 2.8 V)
- PSU: Antec True 550 II
- Video card: ATI X800 Pro 256 MB (plan to get a better one, ...)

Installed CPU, booted right to 300 x 9 = 2.7 GHz with stock voltage (1.35 V), memory at 2.5-3-3-7 1T.

Ran memtest for a while, CPU and memory were able to pass memtest at 3.015 GHz 1.5 V with memory at 335 MHz 3-4-4-10 1T 2.8 V.

First 3.01 GHz 334x9 screen shot:

May 03, 2006 - Highest CPU clock 3.3 GHz with air cooling (very unstable above 3.28 GHz):
(3.3 GHz is a good milestone to reach, but please note that using such a high voltage is not recommended in general, even for the short period of time needed to obtain screen shots.)

hitechjb1 · May 2, 2006

Tested with G. Skill 2 x 256 MB
- CPU at 3 GHz, with a 1:1 memory_HTT_ratio, memory at 334 MHz 3-5-5-10 1T (CPU_memory_divider = 9)
- CPU at 3 GHz, with a 9:10 memory_HTT_ratio, memory at 300 MHz 2.5-4-4-8 1T (CPU_memory_divider = 10)
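
For reference, the relation between these numbers can be sketched in Python. The rule used here (CPU_memory_divider = CPU multiplier divided by memory_HTT_ratio, rounded up to a whole number, and memory clock = CPU clock / divider) is an assumption based on the commonly described Athlon 64 behavior; it is not spelled out in this post, though it matches the divider values quoted. Function names are made up for illustration.

```python
from fractions import Fraction
from math import ceil

def cpu_memory_divider(multiplier, memory_htt_ratio):
    """Assumed rule: divider = ceil(CPU multiplier / memory_HTT_ratio)."""
    # Fractions keep ratios such as 9:10 exact, so ceil() is not thrown off
    # by floating point rounding.
    return ceil(Fraction(multiplier) / memory_htt_ratio)

def memory_mhz(htt_mhz, multiplier, memory_htt_ratio):
    """Memory clock = CPU clock / CPU_memory_divider."""
    cpu_mhz = htt_mhz * multiplier
    return cpu_mhz / cpu_memory_divider(multiplier, memory_htt_ratio)

print(cpu_memory_divider(9, Fraction(1, 1)))   # 9
print(cpu_memory_divider(9, Fraction(9, 10)))  # 10
print(memory_mhz(334, 9, Fraction(1, 1)))      # 334.0
print(memory_mhz(334, 9, Fraction(9, 10)))     # 300.6
```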

The setup has been tested for dual prime95 stability. More extensive testing is underway.

Added AMD dual core driver and Microsoft dual core hotfix, not sure whether these help at all.

hitechjb1 · May 2, 2006

Opteron dual core 165 at 3.13 GHz with TCCD memory 313 MHz 2.5-4-4-8 1T

opty165_tcase_3.13GHz_347x9_313_2.5448.JPG

Opteron dual core 165 at 3.15 GHz with TCCD memory 350 MHz 3-5-5-10 1T

opty165_tcase_3.15GHz_350x9_350_35510.JPG

hitechjb1 · May 2, 2006

Space holder

hitechjb1 · May 2, 2006

space holder

hitechjb1 · May 2, 2006

Used a pair of inexpensive PC3200 2 x 512 MB memory modules rated 2-3-2-5 with a memory_HTT_ratio of 2:3 (which gives a CPU_memory_divider of 14).

CPU at 3+ GHz 1.52 V with memory at ~ 215 MHz 2.5-3-3-6 1T 2.8 V.
Able to run SuperPi 32M and a short time of dual prime95 for initial testing (user aborted).

Note that the SuperPI 32M run time is longer than it should be due to slower memory and other background tasks during the testing.

hitechjb1 · May 2, 2006

Remarks:

1. Dual prime temperature was
- 57 C, ambient 42 C, running at 3 GHz, 1.60 V, memory 300 MHz or
- 52 C, ambient 40 C, running at 3 GHz, 1.52 V, memory 215 MHz.
Removing the IHS may help, but I am not planning to remove it for now.

By lowering ambient to around 35 C, CPU full load temperature is around 50 C.
This gives a full load (dual prime95) power dissipation of about

P = (CPU_temperature - ambient_temperature) / heat_sink_thermal_resistance ~ (50 - 35) / 0.2 = 75 W
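
The same estimate as a small Python sketch, for plugging in other temperatures. The 0.2 C/W heat sink thermal resistance is the figure used above; the function name is made up for illustration.

```python
def dissipated_watts(cpu_temp_c, ambient_c, thermal_resistance_c_per_w):
    """Estimate power from the temperature rise across the heat sink."""
    # P = delta_T / thermal resistance (C divided by C/W gives W)
    return (cpu_temp_c - ambient_c) / thermal_resistance_c_per_w

# Full load at 50 C with 35 C ambient and a 0.2 C/W heat sink
print(dissipated_watts(50, 35, 0.2))
```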

2. The DFI LanParty NF4 UT Ultra-D board is the very first revision, A02. Replacing it with the latest Rev AD0 may help to achieve higher HTT, and hence potentially higher CPU frequency and stability (?).

3. Looking for 1 GB x 2 memory modules that would work well for this setup.

More testing is underway, ....

hitechjb1 · May 3, 2006

Just managed to get it to run a CPU-Z at 3.3 GHz, not stable for other things.
(3.3 GHz is a good milestone to reach, but please note that using such a high voltage is not recommended in general, even for the short period of time needed to obtain screen shots.)

- CPU: Opteron 939 165 CCBBE 0610 DPMW (rated 1.8 GHz)
- CPU cooling: XP-90 with adjustable high speed fan
- Motherboard: DFI NF4 LanParty UT Ultra-D rev A02 (the very first batch of boards that DFI released in Feb 05)
- Bios 623-3,
- Memory: PC 3200 2 x 512 MB 2-3-2-5 1T 2.8 V, with a CPU_memory_divider of 15
- Other details, refer to previous few posts.

Will do more testing and benching between 3.0 - 3.3 GHz, to see what is the highest stable setting it can do, ....

TempliNocturnus · Jun 19, 2006

Board is DFI Lanparty UT CFX3200, with 2 GB of OCZ Enhanced Latency Platinum memory. Using the stock heatsink/fan with two 12cm fans in the front and rear, and vent holes drilled in the front of the case. CPU temp seems to float around 36 C idle and usually gets to around 41 - 43 C when running BF2 or SuperPi. The temps are nearly the same as when I had it clocked at 2.4. Awesome chip!

I haven't been able to get it up to 3 GHz yet. I've been trying really hard but I keep getting the long beep of noworky. I think it's my ram settings; I have my ram at 150 MHz 3/4. I'm a little lost, and I don't really want to push my ram too much. Hopefully I'll have a better heatsink and fan before I get this thing up to 3!

mysterfix · Jul 27, 2006

Well I finally got around to testing this new 165 a little further. It's prime stable @ 1.55v so far. I'm gonna do a little more testing soon so I'll update when I do.
