Athlon64 Questions

OSFP

Member
Joined
Aug 1, 2003
Well, I am planning a future Athlon 64 purchase, as soon as nForce3 250 boards with Socket 939 support and a PCI Express slot come out, and I need some questions answered so I can plan the other stuff I am going to need (RAM, mostly...)

Questions:

1) As far as I know, the non-FX A64s are locked so you can't change multipliers, so I was wondering how much FSB the nForce3 chipsets can tolerate. Do they do 250MHz easily? Can they reach 300MHz like the Intel 875 chipsets do?

2) Has AMD fixed the dual channel problem the XPs had? What I mean is, do the A64s have real dual channel, or is it like the XPs, which only gained 10% or so because of their architecture? I know the A64 is currently single channel and I can't possibly demand that you know what will happen with the 939s, which will be dual channel, but I think you can answer this if you have some experience with the current FX.

3) Can the A64 run out of sync with the RAM? I remember that if the XP runs non-sync you lose too much performance. A typical Barton with a 333MHz FSB would perform better with 333MHz RAM than with 400MHz RAM and a divider, strange as it seems. Is this fixed now, or should I still consider running in sync the only choice?

Those are most of my tech questions; now a release date question.
When do you think the 939s and the new 250 mobos will come out? And when do you think PCI Express for VGA cards will be added?

Thanks in advance, and I hope for an early reply...
 
1) They are only locked to higher multipliers. You can change them to a lower multiplier and then bump the FSB up higher.

2) There is not a huge increase with dual channel on A64 unlike the FXs. (someone correct me if I'm wrong.)
 
NiTrO bOiE said:

2) There is not a huge increase with dual channel on A64 unlike the FXs. (someone correct me if I'm wrong.)

The A64 = FX w/out DC

Dual channel helps a lot, and will give the plain A64 more bandwidth than a P4 at the same speeds. It helps :) I don't think it's tied to the DDR base any longer, like it was on the AXPs.
 
OSFP said:
1) As far as I know, the non-FX A64s are locked so you can't change multipliers, so I was wondering how much FSB the nForce3 chipsets can tolerate. Do they do 250MHz easily? Can they reach 300MHz like the Intel 875 chipsets do?

The nForce3 150 and nForce3 250 have shown the capability to run at a very high FSB (reference clock). Both have gone well over 300MHz FSB, with the CPU multiplier lowered, of course. The chipset, when not held back by an unlocked PCI bus, is not a bottleneck.

OSFP said:
2) Has AMD fixed the dual channel problem the XPs had? What I mean is, do the A64s have real dual channel, or is it like the XPs, which only gained 10% or so because of their architecture? I know the A64 is currently single channel and I can't possibly demand that you know what will happen with the 939s, which will be dual channel, but I think you can answer this if you have some experience with the current FX.

The bandwidth shown by Socket 940 FX CPUs is significantly higher than that shown on 'dual-channel' nForce2 motherboards. Socket 939 CPUs will show similar or identical bandwidth. So the bandwidth problem isn't there. Whether or not the impact of this increased bandwidth is important (Socket 754 vs Socket 939) is another question.

OSFP said:
3) Can the A64 run out of sync with the RAM? I remember that if the XP runs non-sync you lose too much performance. A typical Barton with a 333MHz FSB would perform better with 333MHz RAM than with 400MHz RAM and a divider, strange as it seems. Is this fixed now, or should I still consider running in sync the only choice?

Running memory out of sync with the FSB (reference clock) is still not recommended, and you will see a performance hit if you do so. That hit may not be significant to every user.

OSFP said:
When do you think the 939s and the new 250 mobos will come out? And when do you think PCI Express for VGA cards will be added?

Socket 939 CPUs and motherboards should be on the market by mid-June. AMD will likely announce the product somewhere between the last week of May and the first week of June. These dates, like all predicted dates, could easily be pushed back or possibly moved up a week.

Socket 754 motherboards with the nForce3 250 chipset are already beginning to come out on the market. Socket 939 motherboards with the nForce3 250/250Gb chipset will likely be appearing as soon as any other Socket 939 motherboard.
 
From what I've seen in several people's tests, the A64 is possibly one of the only processors that does not take the "traditional" performance hit when running async with the memory. Why? Because the memory controller is hooked up directly to the processor and does not have to deal with any unusual clock cycles.
Sure, it might take a tiny, tiny performance hit, but nowhere NEAR what a P4 or Athlon XP takes when async.
 
Running at 250 6:5 for 209MHz gives me the exact same performance as running at 209 1:1. When running async with an A64, you have to view it as if you're actually running at a lower FSB. 250 6:5 does not have nearly the same level of performance as 250 1:1, but it's no worse than running 209 1:1, unlike on AXPs, where running async takes big bandwidth and latency hits. The A64 doesn't seem to lose anything. I use high HTTs and asynchronous memory to make up for the lack of multipliers, and it's more than a suitable substitute. The flexibility of the ratios makes overclocking just as easy as if higher multipliers were available. I've run HTTs up to about 320 without issue. Others have hit around 350 and even above. Now try and find memory that can keep up. ;)
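For anyone who wants to check the ratio arithmetic, here is a minimal sketch. It assumes the ratio is written HTT:memory, so the memory clock is simply the HTT scaled by the ratio. The helper name `memory_clock_mhz` is made up for illustration, and real boards may round the resulting clock slightly differently.

```python
from fractions import Fraction


def memory_clock_mhz(htt_mhz: int, ratio: str) -> float:
    """Memory clock for an HTT:memory ratio such as "6:5".

    Hypothetical helper: assumes memory clock = HTT * (memory part / HTT part).
    """
    htt_part, mem_part = (int(p) for p in ratio.split(":"))
    return float(Fraction(htt_mhz * mem_part, htt_part))


# The three settings compared in the post above.
for htt, ratio in [(250, "6:5"), (209, "1:1"), (250, "1:1")]:
    print(f"HTT {htt} at {ratio}: memory ~{memory_clock_mhz(htt, ratio):.1f} MHz")
```

250 at 6:5 works out to roughly 208 MHz, which the post rounds to 209, so the memory ends up at about the same clock as in the 209 1:1 setup.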
 
OK, so as you pointed out, the stupid thing (333/333 being faster than 333/400 FSB/RAM) doesn't happen. But when you say 250/209 is the same as 209/209, do you mean the same in memory bandwidth or in general performance (in 3DMark2001, for example)?

Because if you get generally the same performance even though you have a higher FSB, then the only reason to use a divider would be so that your RAM doesn't limit your CPU overclock, assuming the multiplier is locked. Am I right?

Could you also explain what HTT is and what it has to do with the FSB? At first it seems like the quad-pumped FSB of the P4, which says 800 FSB but in reality is 200, but then I see the nForce3 150 uses 600 HTT and still has a 200 FSB, so I am kind of puzzled...

One more question: I've heard that the overall performance of an A64 system grows more than that of other systems when you overclock, because the memory controller inside the CPU gets more effective when the CPU clock is raised. For example, let's say that in 3DMark an A64 at 2GHz has the same performance as a P4 at 3.4GHz (it should be approximately the same, but for the sake of comparison let's assume it's exactly the same). So let's say they both hit 20k in 3DMark2001 at these speeds (with the same video card, RAM, etc., of course). If you overclock BOTH by 20% you should again get the exact same performance, for example 22k, but because the A64's memory controller also gets faster with the CPU clock increase, you get an additional boost and hit 22.5k. That's what I've heard. Is it true?


Can't wait to get my hands on an nForce3 250 939 mobo + CPU. The A64 seems great for games (my personal need) and I will surely kick some Intel *** out there, and it's great that most of the XP's weaknesses are out of the way...
 
OK, so as you pointed out, the stupid thing (333/333 being faster than 333/400 FSB/RAM) doesn't happen. But when you say 250/209 is the same as 209/209, do you mean the same in memory bandwidth or in general performance (in 3DMark2001, for example)?
Everything. I'll run some 3DMarks showing this soon, but running 250/209 is effectively the same as running 209/209, no worse, no better. For Intels and 939, running at 250/209 should be pretty much the same as running 250/250.
Because if you get generally the same performance even though you have a higher FSB, then the only reason to use a divider would be so that your RAM doesn't limit your CPU overclock, assuming the multiplier is locked. Am I right?
As far as I can tell. Higher HTT's by themselves don't seem to do that much.
Could you also explain what HTT is and what it has to do with the FSB? At first it seems like the quad-pumped FSB of the P4, which says 800 FSB but in reality is 200, but then I see the nForce3 150 uses 600 HTT and still has a 200 FSB, so I am kind of puzzled...
You can think of the HTT just like the FSB. The HTT has an LDT multiplier of 3x, 4x or 5x, like triple, quad or quint-pumping if you will. It's pretty much analogous to Intel's, but with different terminology.
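A quick sketch of that relationship, assuming the advertised HyperTransport figure is just the reference clock times the LDT multiplier (the function name `ht_link_mhz` is made up for illustration):

```python
def ht_link_mhz(reference_mhz: float, ldt_multiplier: float) -> float:
    """Advertised HT figure = reference clock ("FSB"/HTT) x LDT multiplier.

    Hypothetical helper based on the description in the post above.
    """
    return reference_mhz * ldt_multiplier


# A 200 MHz reference clock with a 3x LDT multiplier gives the "600 HTT"
# figure asked about for the nForce3 150.
print(ht_link_mhz(200, 3))

# The same reference clock with a 4x multiplier, and an overclocked
# 250 MHz reference clock left at 3x.
print(ht_link_mhz(200, 4))
print(ht_link_mhz(250, 3))
```

Read this way, "600 HTT with a 200 FSB" is the same kind of labelling as "800 FSB" on a quad-pumped 200 MHz P4 bus.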
One more question: I've heard that the overall performance of an A64 system grows more than that of other systems when you overclock, because the memory controller inside the CPU gets more effective when the CPU clock is raised. For example, let's say that in 3DMark an A64 at 2GHz has the same performance as a P4 at 3.4GHz (it should be approximately the same, but for the sake of comparison let's assume it's exactly the same). So let's say they both hit 20k in 3DMark2001 at these speeds (with the same video card, RAM, etc., of course). If you overclock BOTH by 20% you should again get the exact same performance, for example 22k, but because the A64's memory controller also gets faster with the CPU clock increase, you get an additional boost and hit 22.5k. That's what I've heard. Is it true?
Maybe. In theory, it doesn't sound like it to me. The memory controller is still being limited by the memory itself. Latencies may be cut. The prob is, I'm getting far worse scalability in 3DMark01 than I did with my AXP, simply because my 9800 is now a bottleneck. :-/ :D
 
I think the A64 939 platform would deliver the highest effective memory bandwidth among P4 dual channel QDR, 754 SC, XP DC, XP SC...

Further due to the separate memory bus and HT bus, the A64 platform would deliver much higher max combined bandwidth (memory + HT), about two to four times that of P4, XP.

These are numbers based on analysis, not from actual test results.

hitechjb1 said:
This 939 platform memory bandwidth, as estimated from some test data (so result is preliminary), is impressive. Its efficiency is around 86-90%, which is 15-20% (to be confirmed with more 939 test data) better than the P4 QDR dual channel counterpart.

Its effective bandwidth (not max), running at the same memory bus speed, is about 15-20% higher than that of P4 QDR dual channel and 81-89% higher than that of 754 platform or nforce2 dual channel.

...

A major difference between the AMD 754 and 939 platforms is the memory bus, i.e. a 64-bit memory bus for 754 vs a 128-bit memory bus for 939. Here are some estimates (since 939 is not commonly available yet) of the potential impact on memory bandwidth performance.

I think there is a significant advantage from the 939 128-bit memory bus and on-chip dual channel controller. It is very different from the nForce2 dual channel, which has only a few % memory bandwidth improvement over single channel, as shown below.


....


Summary (preliminary numbers, may vary as more 939 test results become available):

- If further confirmed by more 939 hardware, this 86-90% bandwidth efficiency for the 939 128-bit bus is 15-20% higher than the 75% of the P4's QDR bus (64-bit).

- At 86-90% efficiency, the effective bandwidth of the 939 128-bit memory bus would be 81-89% higher than that of a 754 64-bit memory bus, assuming 95% memory efficiency for the latter.

This higher bandwidth in 939 would have a significant impact on memory-intensive applications such as video and image streaming, applications using spatially structured data as in scientific computation, ..., as well as 3DMark01.
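The 81-89% figure can be reproduced with a few lines, shown here as a sketch. It assumes both buses run DDR at the same clock, so only bus width and efficiency differ; `relative_gain` is a made-up helper name.

```python
def relative_gain(width_a_bits: int, eff_a: float,
                  width_b_bits: int, eff_b: float) -> float:
    """Fractional bandwidth gain of bus A over bus B.

    Hypothetical helper: assumes the same clock and transfers per clock on
    both buses, so effective bandwidth scales with width x efficiency.
    """
    return (width_a_bits * eff_a) / (width_b_bits * eff_b) - 1.0


# 939 (128-bit, 86-90% efficient) vs 754 (64-bit, assumed 95% efficient).
low = relative_gain(128, 0.86, 64, 0.95)
high = relative_gain(128, 0.90, 64, 0.95)
print(f"939 vs 754: {low:.1%} to {high:.1%} higher")
```

Twice the width at slightly lower efficiency is why the gain lands a little under 100%.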


PS:

For video and image streaming, data needs to be refreshed constantly from main memory (L3) to the on-chip L2 via the memory bus (the same as the FSB in P4 and XP), since the size of the video data >> the L2 size at any given time. So the high P4 dual channel memory bandwidth delivers an advantage. For the upcoming 939, I think it would be even better due to its 128-bit memory bus (w/ dual channel controller).

Let BW stand for effective memory bandwidth (not max),
DC stand for dual channel memory controller,
SC stand for single channel memory controller,
for the same bus speed (FSB, memory bus)

BW_939 > BW_P4_DC > BW_754 > BW_XP_DC > BW_XP_SC

at a ratio estimated respectively about

86-90 : 75 : 48 : 47 : 44

or

BW_939 = 27.5 - 28.8 bus (to be confirmed when 939 available)
BW_P4_DC = 24 FSB
BW_754 = 15.2 bus
BW_XP_DC = 14.8 FSB
BW_XP_SC = 14 FSB

Multiplying the corresponding number by the FSB (or memory bus) speed in MHz gives the memory bandwidth in MB/s.
E.g. FSB = 200 MHz, mem_fsb_ratio 1:1, BW_P4_DC = 24 x 200 = 4800 MB/s
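The same multiplication for all five platforms, as a small sketch. The coefficients are copied from the estimates above; the dictionary and function names are made up for illustration, and the 939 entry keeps its preliminary low/high range.

```python
# Estimated effective MB/s per MHz of bus clock, copied from the list above.
# Each entry is a (low, high) pair; only the 939 figure is a real range.
COEFFICIENTS = {
    "939": (27.5, 28.8),
    "P4_DC": (24.0, 24.0),
    "754": (15.2, 15.2),
    "XP_DC": (14.8, 14.8),
    "XP_SC": (14.0, 14.0),
}


def effective_bandwidth_mb_s(platform: str, bus_mhz: float) -> tuple[float, float]:
    """Return the (low, high) effective bandwidth estimate in MB/s."""
    low, high = COEFFICIENTS[platform]
    return (low * bus_mhz, high * bus_mhz)


# Print the estimate for every platform at a 200 MHz bus, 1:1 ratio.
for name in COEFFICIENTS:
    low, high = effective_bandwidth_mb_s(name, 200)
    print(f"BW_{name}: {low:.0f} to {high:.0f} MB/s")
```

At 200 MHz this reproduces the 4800 MB/s P4 example and puts the 939 estimate at about 5500 to 5760 MB/s.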

hitechjb1 said:
....
Summary:

Due to the separation of memory bus and HyperTransport (system) bus for all other devices in A64,
- the effective latency between the CPU (after L2 miss) and the memory (L3) is reduced
- the effective bandwidth of the A64 memory bus (128-bit in 939) to/from the CPU is alone higher than the effective P4 memory (and system) bandwidth, and twice that of XP
- the max bandwidth of the HyperTransport bus (for all other devices) to/from the CPU is alone comparable to P4 system bus, twice that of XP

The max combined bandwidth of memory bus (in 939) and HyperTransport in an A64 system is more than twice the system bus (FSB) of a P4 system and four times the system bus (FSB) of an XP system.

For the complete posts, see:

Estimation and importance of 939 platform memory bandwidth (page 19)

Differences between the XP FSB and the A64 buses (separate memory bus and HyperTransport bus) (page 19)

Some remarks on cache latency, cache size, memory latency and memory bandwidth (for A64's) (page 19)

Remarks on A64 and various platforms (page 19)
 
Gautam said:
Yep, undoubtedly.

What do you mean by "max system bandwidth"?

For A64, system bandwidth is the combined memory bandwidth + HT bandwidth.

For P4, XP, system bandwidth is the bandwidth of the FSB.

For A64,
max system bandwidth = max memory bandwidth + max HT bandwidth

In other words, it is the max bandwidth to/from the A64 CPU.

In the P4 and XP arena, the memory and other system traffic merge at the NB chipset and are limited by the FSB bandwidth to the CPU. For the A64, that bottleneck is removed: the memory bus and the HT system bus both go directly to the A64 CPU (both 939/754).

The posts in the links give details about each bandwidth based on the memory bus frequency, HT bus frequency for A64 939/754, and also that of P4 and XP FSB, ...
 
Originally posted by hitechjb1

Differences between XP FSB and the A64 buses (separate memory bus and HyperTransport bus)

For XP, memory data, video card data, PCI data (hard disk, optical drives, networking, ...), serial links (USB, FireWire, ...), slower peripherals (keyboard, mouse, ...), everything goes through the FSB to/from the CPU.

Using nominal 200 MHz FSB running in DDR, with 64-bit data path, the
max bandwidth is 200 x 8 x 2 = 3200 MB/s = 3.2 GB/s
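The same peak-bandwidth arithmetic as a small sketch, extended to the other buses mentioned in this thread. The function name is made up, and the P4 and 939 lines assume a nominal 200 MHz clock, as in the XP example.

```python
def peak_bandwidth_mb_s(bus_mhz: int, bus_width_bits: int,
                        transfers_per_clock: int) -> int:
    """Max bandwidth = clock (MHz) x bus width (bytes) x transfers per clock.

    Hypothetical helper mirroring the 200 x 8 x 2 calculation above.
    """
    return bus_mhz * (bus_width_bits // 8) * transfers_per_clock


print(peak_bandwidth_mb_s(200, 64, 2))   # XP: 64-bit DDR FSB, 200 x 8 x 2 = 3200
print(peak_bandwidth_mb_s(200, 64, 4))   # P4: 64-bit quad-pumped FSB
print(peak_bandwidth_mb_s(200, 128, 2))  # 939: 128-bit DDR memory bus
```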

The traffic that is crucial to system performance, such as memory data, video card data and hard disk data (file I/O, paging), has to compete with everything else on the FSB, resulting in bottlenecks and system bus conflicts.


For the A64 CPU, the memory traffic and the traffic for the rest of the devices mentioned above are separated at the CPU rather than at the chipset (NB). This is the key difference in system bus architecture between the old XP and the new A64, and it gives the A64 an important system performance advantage.

- Memory communicates directly via a separate memory bus with the processor's on-chip dual channel memory controller with a 128-bit data path (for 939/940), or on-chip single channel memory controller with a 64-bit data path (for 754).
In another post, the effective bandwidth for the 128-bit dual channel is estimated at around 90% of max bandwidth, which is higher than the 75% figure for P4 dual channel QDR.

- The rest of the subsystems, such as video, hard drives (IDE, SATA, RAID), optical drives, networking, serial links, multi-CPU communication (for multi-processor boards), ..., communicate to/from the CPU via the HyperTransport bus to the chipset and various bridges downstream.



HyperTransport is a point-to-point link connecting the CPU to peripheral subsystems such as networking, storage, serial links, chip-to-chip communication, I/O, ....

(A HyperTransport bus also exists in the nForce2 chipset, linking NB and SB.)

... (SKIP DETAILS ABOUT HT, refer to original post for details)

The max HyperTransport bandwidth (for peripheral communication) of 6.4 GB/s is comparable to that of the current system bus (FSB), which carries memory, video and peripheral traffic together: at 200 MHz, the max bandwidth is 6.4 GB/s for the dual channel quad-pumped P4 and 3.2 GB/s for AMD's DDR FSB.


Summary:

Due to the separation of memory bus and HyperTransport (system) bus for all other devices in A64,
- the effective latency between the CPU (after L2 miss) and the memory (L3) is reduced
- the effective bandwidth of the A64 memory bus (128-bit in 939) to/from the CPU is alone higher than the effective P4 memory (and system) bandwidth, and twice that of XP
- the max bandwidth of the HyperTransport bus (for all other devices) to/from the CPU is alone comparable to P4 system bus, twice that of XP

The max combined bandwidth of memory bus (in 939) and HyperTransport in an A64 system is more than twice the system bus (FSB) of a P4 system and four times the system bus (FSB) of an XP system.
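A rough sketch of the combined-bandwidth comparison, using only nominal 200 MHz figures. The 6.4 GB/s HyperTransport, 6.4 GB/s P4 and 3.2 GB/s XP numbers are the ones quoted above; the 6.4 GB/s for the 939 memory bus is an assumption (128-bit DDR at 200 MHz), and the variable names are made up.

```python
# Nominal max bandwidths in GB/s at a 200 MHz base clock.
A64_939_MEMORY_BUS = 6.4   # assumption: 128-bit DDR at 200 MHz
A64_HYPERTRANSPORT = 6.4   # figure quoted in the post
P4_FSB = 6.4               # quad-pumped FSB, figure quoted in the post
XP_FSB = 3.2               # DDR FSB, figure quoted in the post

# For the A64, max system bandwidth = max memory bandwidth + max HT bandwidth.
a64_total = A64_939_MEMORY_BUS + A64_HYPERTRANSPORT

print(f"A64 939 combined: {a64_total:.1f} GB/s")
print(f"vs P4 FSB: {a64_total / P4_FSB:.1f}x")
print(f"vs XP FSB: {a64_total / XP_FSB:.1f}x")
```

With these nominal numbers the ratios come out at about 2x and 4x; a faster HyperTransport link or memory clock would raise the A64 total further.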


For complete post:

Differences between the XP FSB and the A64 buses (separate memory bus and HyperTransport bus) (page 19)
 
Interesting, I don't really get those same results. First, there isn't a chance that an Intel system running at 250/209 is as quick as one running 250/250. I've tried this on multiple CPUs and mobos. If I ran the system in my sig at 5/4 it would take a large hit, and even running it at 280 5/4 and 4.2G it's probably about the same speed as running 259 1/1. Also, 280/224 would easily beat 224/224.

I've found the same in my short time with the A64. The system is certainly quicker at 275/275 than at 275/230, even with LDT set to 2.5 vs 2. I've tested this in 2001SE. I also find it very hard to believe that any system running at 275/230 would be as quick as a system running 230/230 or, as you said, 250/209 being the same as 209/209. What you're saying is that a system running at 2.5G is no quicker than a system running at 2.1G with the same memory speed. LDT helps, but not that much.

One other thing to think about is that while the FX chips have dual channel and their Sandra buffered score does very well, in Sandra unbuffered they fall well behind the Intel systems in bandwidth. Sandra unbuffered is a much better indication of useful memory performance. Hopefully this is due to using ECC RAM, but only time will tell.
 
What you're saying is that a system running at 2.5G is no quicker than a system running at 2.1G with the same memory speed
I didn't mean to say that. I only meant that there's only really a linear sort of rise in performance, not an exponential one, as the clock speed scales higher.

I didn't mean that 210x10 1:1 would equal 250x10 6:5; I meant that 8.5*250 6:5 would equal 250x10 1:1. Sorry for any confusion.
 