
A64 101


Gautam

Senior Benchmark Addict
Joined
Feb 4, 2003
Location
SF Bay Area
Overclocking A64's 101

**A HUGE amount of this couldn’t have been written without the insight of D]g[ts and Hitechjb1 from the Overclockers Forums**

My hat's off to them :clap:

Intro to A64 Architecture

Traditionally, a Northbridge existed between the memory bus and the CPU. The bus over which data is transferred between the CPU and the Northbridge (and through it, the memory) is known as the front side bus. However, the Athlon64’s memory controller is on-die, and as such, the platform has no Northbridge in the traditional sense, nor a front side bus. The Athlon64’s have two independent buses: one between the memory and the on-die controller, and another, the HyperTransport bus, that communicates with the other system devices. The CPU’s clock speed is determined by the HTT speed multiplied by a clock multiplier, which is why it’s often suggested to view the HTT as if it were the front side bus. However, this is about where the similarities between the two end. The HTT is, in fact, not a data path or bus, but simply an internal clock off of which the HyperTransport and CPU speeds are derived. The HyperTransport bus’ effective speed is determined by an LDT (Lightning Data Transport) multiplier, multiplied by the HTT. Traditionally, the memory speed is derived off of the front side bus, and can be manipulated by FSB/memory ratios. In contrast, in the A64, memory speed is derived off of the CPU speed in CPU/memory ratios. This is why it’s rather inaccurate to say that the memory is ever running “synchronously.” The memory is always running asynchronously with respect to the CPU speed, off of which it’s derived. How fast it’s running with respect to the HTT does not matter at all. There is no latency hit in running the memory slower than the HTT. While the front side bus could’ve been traditionally double or quad-pumped, the HyperTransport’s effective data rate can be anywhere from 1x to 5x the HTT speed.
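To make the relationships above concrete, here is a minimal Python sketch of how the two derived clocks fall out of the HTT. The function name and the sample values (200MHz HTT, 10x CPU multiplier, 5x LDT) are illustrative assumptions, not something read from a board or BIOS.

```python
def a64_clocks(htt_mhz, cpu_multiplier, ldt_multiplier):
    """Return (CPU MHz, effective HyperTransport MHz) derived from the HTT.

    Hypothetical helper: it only restates the two multiplications
    described in the post (CPU = HTT x multiplier, HT = HTT x LDT).
    """
    cpu_mhz = htt_mhz * cpu_multiplier
    ht_mhz = htt_mhz * ldt_multiplier
    return cpu_mhz, ht_mhz


# Example values are assumptions chosen for illustration only.
cpu, ht = a64_clocks(200, 10, 5)
print(f"CPU: {cpu}MHz, HyperTransport: {ht}MHz")
```

Note that the memory speed is deliberately absent here: it hangs off the CPU speed through a CPU/memory divider, not off the HTT.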

Core variants
Toledo (939)- AMD's latest and greatest, a dual-core processor. OPN code CD, revision E6. Manufactured on a 90nm process, containing two cores with 1024k of L2 cache each. Currently, the 2.2GHz 4400+ and 2.4GHz 4800+ are available, rated at 110W. 128-bit dual channel memory controller, supporting unregistered memory.

Manchester (939)- OPN code BV (rev E4), or CD (rev E6). Same as above, except with 512k of L2 cache on each of the cores. 2.2GHz 4200+ (89W) and the 2.4GHz 4600+ (110W) are available.

San Diego (939)- OPN code BN, revision E4. Manufactured on a 90nm fabrication process. Includes the 2.0GHz 3500+ (67W), 2.2GHz 3700+ (89W), 2.4GHz 4000+ (89W), 2.6GHz FX55 (104W) and the 2.8GHz FX57 (104W). They have a default voltage of 1.4v and 1024K of L2 cache native. Sports a 128-bit wide, dual channel memory controller, and supports unregistered memory. Being revision E, they have an extra copper interconnect layer compared to the D-revision Winchesters, and are consequently far nicer overclockers.

Venice (939)- OPN code BP (revision E3), or OPN code BW (revision E6). Same as above, except with 512k cache native. They range in speed from 1.8GHz to 2.4GHz, with models starting with the 3000+ and ending with the 3800+. 67W for the lower/mid-range, 89W for the 3800+.

Winchester (939)- OPN code BI. Same as above, except they are of the D0 revision, lack the extra copper interconnect layer, and are generally inferior overclockers. 67W.

Clawhammer (939)- Found in the FX53(2.4 GHz), FX55(2.6GHz), and 4000+(2.4GHz). OPN code AS, revision CG. Can you tell I’m getting tired yet? Stock voltage of 1.5v, wattage of 89W. The FX series have an unlocked range of multipliers, which the 4000+ lacks.

Newcastle (939)- Exactly the same as above, except only have 512k of L2 cache, and are found in the 3500+ and 3800+, OPN code AW, revision CG. Unlike the 940, the 939's have only one HyperTransport link, although for the end-user, this doesn't actually make any difference.

Clawhammer (754)- Come in both 512k and 1024k cache variants. Have a 64-bit wide, single channel DDR SDRAM controller. They come in speeds ranging from 1.6GHz to 2.4GHz, with PR ratings ranging from 2800+ to 3700+. There are two revisions of this core. The first is the C0, which came in both 512k and 1024k flavors. The newer revision, the CG, also comes in both, though the 512k parts are very rare. The last two letters in the OPN code indicate its revision: AP for C0’s, and AR for CG’s. Both desktop and mobile parts are available on this core. Desktops and desktop replacement mobiles have a default voltage of 1.5v, and an approximate heat output of 81.9W. Mobiles have a default voltage of 1.4v, and approximate heat output of 62W.

Newcastle (754) - These are all 512k parts, and are about identical to the Clawhammers in every other respect. There are no C0 Newcastles. All Newcastles currently are CG’s, as denoted by their OPN code of AX. These range in speeds from 1.6GHz to 2.4GHz, PR ratings from 2700+ to 3200+. The mobiles have a default core voltage of only 1.2v, and heat output of only 35W.

Newark (754) - OPN BU, revision E6. Mobile parts. They are based on a 90nm fabrication process, and are essentially the same as the San Diegos, except have only a 64-bit wide single channel memory controller. They have a stock voltage of 1.35v and are rated at 62W.

Sledgehammer (940)- Found in the Opterons, the FX51, and some FX53’s. These all have 1024k of L2 cache, and have a 144-bit wide dual channel DDR SDRAM memory controller, supporting only registered memory. Speeds range from 1.4GHz to 2.4GHz. These are all based on a 940-pin socket package. The OPN codes that you will likely see are AG, which are B3’s, AK, which are C0’s, and AT, which are CG’s. They have stock voltages of 1.50v, and wattage ranges from 82.9W to 89W. The Sledgehammers have eight HyperTransport links for multi-way processing. These are factory enabled/disabled as appropriate, and cannot be modified.

Available Chipsets

nVidia nForce4 SLI- Cream of the crop. Supports a 1000MHz HyperTransport frequency, featuring SLI PCI-Express support. Take note, this chipset does not have AGP support, so you will need to replace your graphics cards. Also features a native SATA300 controller, supporting RAID 0, 0+1 and 1 for both its SATA controller and ATA133 IDE controller. Native Gigabit Ethernet with built-in firewall feature. Allows for breathtaking overclocks; very solid chipset, for socket 939 and 940 only. Allows multi-way processors as well.

nVidia nForce4 Ultra- Equally feature-rich and overclockable as above, the only difference being that this chipset does not support SLI.

nVidia nForce3 Ultra- Same as above, but with no PCI-E support. It is generally less stable, and does not allow HTT overclocking to the same degree.

VIA K8T800- One of the highest performance A64 chipsets. Supports a maximum HyperTransport effective rate of 800MHz, but unfortunately, lacks AGP and PCI locks. AGP and PCI rates are determined by either a 1/6 or 1/7 divider off of the HyperTransport bus.

VIA K8T800 Pro- HyperTransport effective rate of 1000MHz supported, some motherboards have AGP/PCI locks, some don’t. Available for all sockets. Also has native support for SATA RAID 0+1, which is advantageous, as the PCI bus wouldn’t be used.

nVidia nForce3 150- Slightly lower in performance compared to the VIA’s, supports a max effective HyperTransport rate of 600MHz, but sports AGP and PCI locks in most boards, a huge plus. For the boards that don't have PCI locks (some SFF boards), PCI dividers up to HTT/9 are available, so even these shouldn't have trouble overclocking.

nVidia nForce3 250- Same as above, except supports a 800MHz HyperTransport rate.

nVidia nForce3 250GB- Same as above, but with a richer feature set. Native support for SATA RAID 0+1, Gigabit Ethernet, and has a built-in firewall feature. This is the preferred chipset; with such rich native support, the PCI bus can be kept quite clean.

SiS 755- A very promising chipset, humbling both the nVidia’s and VIA’s handily at the same speed, and sports AGP/PCI locks. However, motherboard support for this chipset is quite lacking, and there isn’t a solid solution for overclocking that utilizes it to date.

Configuring an A64 System

For your everyday overclocker, one of the most cost-effective solutions is to opt for a s939 512k 3000+, of Venice core. Venices are fabricated on a smaller manufacturing process than their predecessors, and as such, bring less heat dissipation and power draw, leading to theoretically higher overclocks. They also have an extra copper interconnect layer compared to the Winchesters, leading to much better overclocks in general.

For the best of the best, there is no substitute for the 939 San Diego, AMD’s flagship line. The FX57 is virtually untouchable by any other processor once you get it going.

A good compromise is the 3700+ San Diego, which has the 1 MB L2 cache of the FX, and should still clock very well, albeit not as high.

Multitaskers should opt for the Manchester and Toledo based offerings. Single cores cannot compete with these in multithreaded applications. These processors have the convenience of a single processor, with all the benefits of SMP.

Those that have 754-based boards should see a nice boost by upgrading to a Newark, if supported. These will often require special BIOSes, and cooling mods to accommodate the lack of a heatspreader.

As far as chipsets go, the nForce4 Ultra is the best choice for most. The DFI Lanparty UT nF4 Ultra-D is the motherboard of choice, however it requires a PCI-E graphics adapter. For AGP, DFI's offering is once again the best choice, but supposedly a little buggier than the nForce4 PCI-E variant.

TCCD-based memory is king. G.Skill’s lineup clearly is taking the cake at this point in time, allowing for speeds in excess of 300MHz, and even 350MHz in some very rare cases. However, if the price premium is too high, competing offerings are also very fast. These include Patriot XBL, PQI Turbo Series, Corsair 3200XL, Kingston HyperX, and OCZ EL Platinum, to name a few. All of these are also very solid choices and should achieve impressive speed levels. TCCD’s optimal usage is medium timings (3-3-2.5) along with speeds between 270MHz and 290MHz, most of the time. However, for those of us that prefer lower latency, OCZ Voltage Extreme and Mushkin Redline allow for nearly as high speeds with 2-2-2 timings, but require 3.4 volts as a minimum. They are BH5’s successor, using Winbond’s UTT IC’s. These IC’s also show up in the far less expensive Twinmos Speed Premium. It is noteworthy that older Hynix-based modules and BH5 are still very much worth keeping, and can very possibly match the performance of the newer offerings.

Overclocking Technique

You are essentially in control of four frequencies: the CPU speed, the memory bus, the HTT, and the HyperTransport bus. Overclocking the CPU is done pretty much as it always has been, except that the HTT takes the place of the front side bus.

The HTT can almost always go just as high as you need it, granted that you don’t exceed the motherboard’s maximum supported HyperTransport speed by too much. For systems that support a 1000MHz HyperTransport data rate, for example, one could use a 200MHz HTT with a 5x LDT multiplier. However, using 5x250 would result in an effective 1250MHz, which would almost certainly lead to instability. The LDT could be dropped to 4x, allowing a stable HTT of 250MHz while resulting in the same 1000MHz HyperTransport data rate as default. Increasing the HTT alone doesn’t accomplish anything, unlike increasing the front side bus on other platforms. Even raising the HyperTransport speed doesn’t add any noticeable increase in performance, as the bus is already so wide that saturating it isn’t very likely. For this reason, the nF3 250’s and 150’s perform quite similarly to one another. My suggestion would be to leave the HyperTransport speed as close to stock as possible, and raise the HTT only as much as necessary. Unless you’re using an 8x multiplier, there shouldn’t be much reason to go far above 300MHz in many cases.
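The LDT arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, the 1x-5x multiplier range comes from the intro, and treating the board's rated HyperTransport speed as a hard ceiling is a conservative assumption (the post only says not to exceed it "by too much").

```python
def highest_safe_ldt(htt_mhz, max_ht_mhz, ldt_choices=(1, 2, 3, 4, 5)):
    """Pick the largest LDT multiplier that keeps HTT x LDT within the rated speed.

    Hypothetical helper. Assumes the rated HyperTransport speed is a strict
    limit, which is stricter than what real boards may tolerate.
    Returns None if even the smallest multiplier overshoots.
    """
    for ldt in sorted(ldt_choices, reverse=True):
        if htt_mhz * ldt <= max_ht_mhz:
            return ldt
    return None


# Walk through the cases from the paragraph above on a 1000MHz-rated board.
for htt in (200, 250, 300):
    ldt = highest_safe_ldt(htt, 1000)
    print(f"HTT {htt}MHz -> LDT {ldt}x -> {htt * ldt}MHz effective")
```

At 300MHz HTT this lands on 3x (900MHz effective), which is in line with the advice to keep the HyperTransport speed near stock rather than chase it upward.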

What complicates the choice of multiplier and HTT speed are the CPU/memory dividers. No motherboard allows you to manipulate them directly. Instead, they provide “maximum memory clocks,” or supposed HTT/memory ratios. Make no mistake, no such thing exists. The memory is derived off of the CPU speed, but it’s never made clear, and the dividers need to be manipulated indirectly. Also, CPU/mem dividers are integral only; there are no half dividers, so it’s advisable not to use half multipliers. Ok, what I just said probably doesn’t make much sense, so here are some examples of how to get certain common CPU/mem dividers:

CPU/5-8 - Set memory to 200 and multiplier to desired divider

CPU/9
Memory to 200, multi to 9x(if available)
Memory to 183*, multi to 8x

CPU/10
Memory to 200, multi to 10x(if available)
Memory to 183*, multi to 9x(if available)
Memory to 166, multi to 8x

CPU/11
Memory to 200, multi to 11x(if available)
Memory to 183*, multi to 10x(if available)
Memory to 166, multi to 9x(if available)
Memory to 150, multi to 8x

CPU/12
Memory to 200, multi to 12x(if available)
Memory to 183*, multi to 11x(if available)
Memory to 166, multi to 10x(if available)
Memory to 150, multi to 9x(if available)
Memory to 133, multi to 8x

*Reserved values- will require modded BIOS or A64 Tweaker

To find the dividers mathematically, use the formula CPU Multiplier/Ratio and take its ceiling (e.g. anything above 10.0 becomes 11, anything above 11.0 becomes 12, etc). Formula found by Hitechjb1 from the Overclockers Forums.

Substitute 1 for 200 as the ratio, 11/12 for 183, 5/6 for 166, 3/4 for 150 and 2/3 for 133.
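Hitechjb1's formula is easy to check with a short Python sketch. The function names are my own, and the assumption that the actual memory clock is simply CPU speed divided by the resulting divider follows the description in this post rather than any AMD document. Exact fractions are used because floating point can push a value like 11 / (11/12) a hair above 12 and round it up wrongly.

```python
from fractions import Fraction
from math import ceil

# Ratio to substitute for each "max memory clock" BIOS setting (from the post).
RATIOS = {
    200: Fraction(1),
    183: Fraction(11, 12),
    166: Fraction(5, 6),
    150: Fraction(3, 4),
    133: Fraction(2, 3),
}


def cpu_mem_divider(multiplier, mem_setting):
    """CPU/mem divider = ceiling(CPU multiplier / ratio)."""
    return ceil(Fraction(multiplier) / RATIOS[mem_setting])


def actual_memory_mhz(htt_mhz, multiplier, mem_setting):
    """Memory clock = CPU speed / divider (assumed from the post's description)."""
    cpu_mhz = Fraction(htt_mhz) * Fraction(multiplier)
    return float(cpu_mhz / cpu_mem_divider(multiplier, mem_setting))


# Reproduce one row from each block of the list above.
for multi, setting in [(8, 183), (8, 166), (8, 150), (8, 133)]:
    print(f"{multi}x with memory at {setting} -> CPU/{cpu_mem_divider(multi, setting)}")

# Illustrative overclock: 250MHz HTT, 9x multiplier, memory set to 166.
print(actual_memory_mhz(250, 9, 166))
```

By the same formula, a 9.5x multiplier with memory at 200 rounds up to CPU/10, so at a 200MHz HTT the memory would land at 190MHz instead of 200MHz. That is the reason half multipliers are best avoided.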


Doesn’t make much sense? Don’t worry, it shouldn’t. Very irritating indeed for overclockers, but manufacturers didn't have us in mind. As far as a typical end user is concerned, those ratios will give whatever speed they claim, as long as you don't mess with multis or HTT speeds. But who does that? ;) You will probably need to experiment with different multipliers and max mem clocks to find the CPU/mem divider that you desire. Using half multipliers complicates things further, as the memory is divided integrally. Just to eliminate variables, drop the LDT multiplier down to 3x if your board supports a 1000MHz HyperTransport speed, or down to 2x if 800MHz or 600MHz is its maximum. You can also increase the HTT/LDT voltage on some motherboards, which can give you an extra 20-50MHz on your effective rate in some cases.

As I've always stressed, overclocking needs to be done carefully and systematically. This is especially important with the A64. Focus on one area that you wish to overclock, and overclock it alone. For example, if you wish to overclock your memory, drop your LDT and CPU multipliers as low as they can go, and see how far your memory can go with everything else clocked low enough to not hinder stability. For overclocking the CPU, drop the LDT as low as it can go, and set the max memory clock as low as it can go as well. Once you find your maximum memory speed and CPU clock, play around with the max memory and CPU multiplier to find the suitable CPU/mem ratio. Once you've already got in mind how far the CPU and memory each can go, this isn't too difficult. I cannot stress enough how important it is to isolate variables. It's all too common that people try to max everything out at once, fail, and then give up out of frustration. Take your time, be patient, and have fun. Dividing and conquering can make the task of overclocking the A64 a lot less daunting.
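Once the two maximums are known, the last step (matching a multiplier and max memory clock to them) can be brute-forced. The sketch below is a rough Python illustration under several assumptions: the function name is invented, multipliers are limited to 8x-12x as in the examples earlier, the CPU is held exactly at its maximum, and Hitechjb1's ceiling formula is applied as-is, even for dividers above 12 that the post never confirms exist on real hardware.

```python
from fractions import Fraction
from math import ceil

# Ratios for each "max memory clock" setting, as given earlier in the post.
RATIOS = {
    200: Fraction(1),
    183: Fraction(11, 12),
    166: Fraction(5, 6),
    150: Fraction(3, 4),
    133: Fraction(2, 3),
}


def candidate_settings(max_cpu_mhz, max_mem_mhz, multipliers=range(8, 13)):
    """List (memory MHz, multiplier, mem setting, HTT MHz, divider) combinations.

    Hypothetical search: keeps only combinations whose resulting memory
    clock does not exceed max_mem_mhz, best memory clock first.
    """
    results = []
    for multi in multipliers:
        htt = Fraction(max_cpu_mhz) / multi  # HTT needed to hit the max CPU speed
        for setting, ratio in RATIOS.items():
            divider = ceil(Fraction(multi) / ratio)
            mem = Fraction(max_cpu_mhz) / divider
            if mem <= max_mem_mhz:
                results.append((float(mem), multi, setting, float(htt), divider))
    results.sort(reverse=True)
    return results


# Illustrative numbers only: a CPU good for 2600MHz and memory good for 250MHz.
for mem, multi, setting, htt, divider in candidate_settings(2600, 250)[:4]:
    print(f"{multi}x @ HTT {htt:.1f}, memory set to {setting} -> CPU/{divider} = {mem:.1f}MHz")
```

Several combinations usually reach the same divider, so the choice among them comes down to which HTT the board is happy with.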

This excellent tool by Cpjk can be very helpful in removing some of the confusion in figuring out how to run things.

If you're lucky enough to have an FX, though, you don't need to bother with finding the right CPU/memory ratio. Simply find your maximum memory clock, and then increase the multiplier as necessary to max out the processor.

On a related note, the absolute maximum core voltage for 130nm A64’s rated by AMD is 1.65v, as opposed to 2.25v (I believe) for Bartons. The heat output of A64’s at the same speeds as AXP’s is roughly equivalent to what they’d put out with 0.2v less. For this reason, it usually is not too beneficial to exceed a core voltage of 1.7v or so on air cooling. On phase change, 1.8-1.85v usually appears to be all that’s needed for an optimal overclock.

One rather important memory-related setting is the command rate, a.k.a. CPU Interface on many other boards. The default for C0 processors is 1t, and the default for the CG’s is 2t. Take note that the C0's cannot run 2t. 1t is quicker, but makes overclocking the memory with double-sided sticks especially difficult in many cases. Running at 2t, however, costs about 1 sec in SuperPI and PIFast, and makes a couple hundred point difference in 3DMark01. The one benchmark where it takes a significant toll is the Sandra Memory Bandwidth Benchmark, where it takes 10%, or 300-400 MB/sec, off. I don’t see having to run at 2t as the end of the world, unless you’re a Sandra fanatic. The difference between 1t and 2t is actually less than that between tRCD2 and tRCD3 in my experience. Again, there is no one size fits all solution. It may take some experimentation to see what combination of command rate, latencies and memory speeds is optimal for you. Low tRCD is highly recommended, but CAS doesn't matter very much. tRAS at 10, and nothing else, seems to deliver the best performance, while backing it down much lower begins to hurt.

Some notes on Windows Tweaking/Overclocking
Overclocking A64s within Windows was originally done when high HTT’s caused BIOS corruption, however this doesn’t appear to be an issue today. It still can be very convenient, and for some boards like mine that don’t allow overclocking in the BIOS with mobiles, can be a godsend.

ClockGen is a Windows-based utility for overclocking. It allows multiplier, voltage, HyperTransport speed, and PCI/AGP bus speed manipulation. Changing the voltage doesn’t work on all boards, and the CPU/mem ratio and LDT multiplier cannot be changed using the utility, so some settings must be set in the BIOS. It also allows for profiles, so you can quickly change speeds on the fly. To make a profile, put the signature of your board, as found on the website, in brackets on the first line, e.g. [CG-NVNF3] for nForce3’s, and then the values you want to change in the succeeding lines, e.g. FID=9.0, HTT=250. For nForce3 boards, if you set your AGP rate or HTT rate anywhere above spec in the BIOS, the AGP/PCI lock is enabled, so you can increase the HTT easily in Windows. Setting the HTT to 201 in the BIOS is the most common technique.

The nVidia System utility is a nice tool to have for nVidia-based boards. It allows manipulation of the tRAS, tRCD and tRP within Windows, and also the changing of the HyperTransport and AGP/PCI speeds.

A64 Tweaker is an excellent utility written by CodeRed. It allows manipulation of just about every memory-related setting on the fly in Windows. It’s made my life dozens of times easier when trying to test things out. It also has much more functionality than you’ll find in most BIOSes.
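For those who keep several ClockGen profiles around, here is a small Python sketch that builds the profile text in the layout described above: board signature in brackets on the first line, then the values on the following lines. The helper name is made up, and putting exactly one KEY=value pair per line is my assumption; only the [CG-NVNF3] signature and the FID/HTT keys come from the post itself.

```python
def clockgen_profile(signature, **values):
    """Build profile text: "[signature]" first, then one KEY=value per line.

    Hypothetical helper. The one-pair-per-line layout is an assumption
    based on the description in the post, not on ClockGen documentation.
    """
    lines = [f"[{signature}]"]
    lines += [f"{key}={value}" for key, value in values.items()]
    return "\n".join(lines)


# The example from the post: an nForce3 board at 9x250.
print(clockgen_profile("CG-NVNF3", FID=9.0, HTT=250))
```

Check the signature for your own board on the ClockGen site before using anything like this; the one above only applies to nForce3's.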
 
OP
G

Gautam

Senior Benchmark Addict
Joined
Feb 4, 2003
Location
SF Bay Area
NiTrO bOiE said:
Good guide G. Do you mind adding that if you use the nvidia system utility, you can change memory timings within windows also?
Yep, also thx for that linky to those WPCREDIT tweaks. I finished this about 5 minutes before you pm'ed me with those. :bang head: I'll add those in now.
 

oc_byagi

Member
Joined
Apr 28, 2004
Location
Sunnyvale, CA
Gautam said:
Clawhammer- Come in both 512k and 1024k cache variants. Have a 64-bit wide, single channel DDR SDRAM controller. They come in speeds ranging from 1.6GHz to 2.4GHz, with PR ratings ranging from 2800+ to 3700+.
nice gautam, make sure to add 2600+ SOCKET 754 ATHLON 64, I think it's already out in Japan.
 
OP
G

Gautam

Senior Benchmark Addict
Joined
Feb 4, 2003
Location
SF Bay Area
oc_byagi said:
nice gautam, make sure to add 2600+ SOCKET 754 ATHLON 64, I think it's already out in Japan.
Really? Per the tech docs from AMD themselves, the lowest PR appears to be the 1.6GHz, 512k mobile 2700+. I'll change that once it's officially released.
 

MassiveOverkill

Member
Joined
Dec 25, 2002
Location
Florida
"However, the K8T800 Pro has a slight edge in performance. Some VIA boards have PCI/AGP locks, and others don’t. The one’s that do can sometimes be temperamental. For hardcore overclockers, the VIA may be the best route"

This is a typo right? You meant to say VIA may NOT be the best route right?
 
OP
G

Gautam

Senior Benchmark Addict
Joined
Feb 4, 2003
Location
SF Bay Area
MassiveOverkill said:
"However, the K8T800 Pro has a slight edge in performance. Some VIA boards have PCI/AGP locks, and others don’t. The one’s that do can sometimes be temperamental. For hardcore overclockers, the VIA may be the best route"

This is a typo right? You meant to say VIA may NOT be the best route right?

No, I meant it may. For the die-hard, the VIA's have always been the choice. OPPainter used the K8T800 non-pro-based Asus SK8V to pull off the top 3dmark01 score, and some others pulled off some very impressive things using the chipset. But for your average user, it's almost definitely not worth the trouble. It just doesn't quite deserve to be eliminated as an option entirely, especially now that PCI/AGP locked boards appear to be around the bend.
 

Jess1313

Member
Joined
Feb 15, 2004
I read that wpcr thing & can't find it now, any links?

Oops, I see it now. How well did they help? Ah, it doesn't matter, off to try them, that will work. Thanx, very nice thread.
 

ap673

Member
Joined
May 23, 2004
I vote sticky. It goes a little more in depth on how overclocking and HTT/LDT work.
 

OC Detective

Member
Joined
Jul 30, 2001
Location
Mauritius
Very in-depth and well laid out - however be careful not to mix fact with opinion.
Gautam said:
The 939 Newcastles are rather pricey, and the dual-channel memory controller has proven to just compensate for the lack of cache, and not much more. These actually do worse than the 754 Clawhammers in gaming, and plenty of other benchmarks.

The 3800+ is rated higher than the 3700+ for a reason - namely it does better overall. Indeed I think you should try to find the benchmarks where your statement shows otherwise - as I am sure you are aware, this link doesn't prove your statement on gaming (in most games the 3800+ is ahead).
http://www.aceshardware.com/read.jsp?id=65000311
Gautam said:
For the best of the best, there is no substitute for the 939 Clawhammer, AMD’s flagship line. The FX53 is virtually untouchable by another processor once you get it going.
Sorry but I would have thought the FX53 on 939 is a sledgehammer which can use non registered RAM? I would think there is no 939 clawhammer.
 
OP
G

Gautam

Senior Benchmark Addict
Joined
Feb 4, 2003
Location
SF Bay Area
OC Detective said:
Very in-depth and well laid out - however be careful not to mix fact with opinion.


The 3800+ is rated higher than the 3700+ for a reason - namely it does better overall. Indeed I think you should try to find the benchmarks where your statement shows otherwise -
You are correct. Guess I let my 754-fanboyism get the better of me. ;)

OC Detective said:
Sorry but I would have thought the FX53 on 939 is a sledgehammer which can use non registered RAM? I would think there is no 939 clawhammer.
I can't seem to find any definite information either way. All the retailers seem to list it as a Clawhammer, but I can't find any reputable source that denotes it as either a Sledgehammer or a Clawhammer. :confused:

I originally wrote this up over at Xtreme Resources, here's the thread in case anyone's interested. Got lots of good input both from here and there, so keep it coming!
 

Silent Buddha

Member
Joined
Nov 30, 2003
Location
Bellevue, WA
I think a Socket 939 FX would be considered a ClawHammer. One of the main distinguishing features of Socket 940 Sledgehammers is the extra HT links, which Socket 939 doesn't have (no dual-FX for Socket 939 :( ). So 1MB L2 cache, no extra HT links = ClawHammer

Oh yeah, sticky :D
 
OP
G

Gautam

Senior Benchmark Addict
Joined
Feb 4, 2003
Location
SF Bay Area
That'd be funny Dave if I didn't mention it in the last post. :rolleyes: :p

Silent Buddha said:
I think a Socket 939 FX would be considered a ClawHammer. One of the main distinguishing features of Socket 940 Sledgehammers is the extra HT links, which Socket 939 doesn't have (no dual-FX for Socket 939 ). So 1MB L2 cache, no extra HT links = ClawHammer
That's a very good point, though, didn't think of it. Time to add another snippet. :)

CandymanCan said:
Umm, I thought the desktop A64 with 1MB L2 cache can have the C0 stepping also?
Yes, absolutely. There are both 512k and 1024k C0 Clawhammers. But I must emphasize that there are no C0 Newcastles.

BeerHunter said:
Awesome thread.

I'm a little confused on the memory stuff like still.
It's weird all right. Anything I can clarify?