
New Phenom 2's X4 and X2

There are two main factors swaying my assumption on price.

#1 The Deneb core is far more competitive than the Agena core was, which got pushed into the very low-end bargain bin right upon introduction... naturally that meant the lowest-end Agenas harvested as Kuma had to fit into the overall marketing hierarchy and compete with their rivals, which limited them to just above 65nm K8 prices. Remember, Kuma came very late, nearly a year after Agena, and by then Intel had amassed a huge lead with Wolfdale and its harvested derivatives. Priced any higher, Kuma wouldn't have competed or sold at that point. The price was strictly governed by market competition.
Deneb and all the salvaged CPUs built from its cores face no such forced shove into sub-$80 pricing... well, nothing nearly as extreme anyway, which allows AMD to sell them at a higher price and in higher volume thanks to being competitive further up the range (key to marketing).

#2 Heka currently spans the $120-150 territory. That means the next pricing territory down will be occupied by Callisto: $110 and below currently, but at or above the Kuma 7750BE's price. Callisto, however, won't arrive for a month or two yet at the earliest, and by then pricing is bound to drop at least another round. Bearing all this in mind, Callisto will likely come in at $75-$110. Regor, I expect, will come in below that, as it will be far cheaper to manufacture and profit from. Callisto, with a huge L3 cache per core at its disposal, will show quite promising performance versus its competition but lose out on power requirements.


One of the largest benefits of the L3 cache is precisely this ability: to detect core activity levels and flush the core and L1/L2 contents into the L3 so the rest of the unused logic can be shut down. It allows extremely low idle power, and that's what you'll find if you tap into each core's power phase on a Deneb in low-power mode (especially in C1E).
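
As a rough illustration of the flush-on-idle idea described above, here is a toy Python sketch. All names and structures are hypothetical simplifications for illustration, not AMD's actual logic:

```python
# Toy model of flush-on-idle (hypothetical, simplified): when a core goes
# idle, its private L1/L2 contents are written back into the shared L3 so
# the core's private arrays and logic can be power-gated.

class Core:
    def __init__(self, name):
        self.name = name
        self.private_cache = {}   # stands in for the L1/L2 contents
        self.powered = True

class L3Cache:
    def __init__(self):
        self.lines = {}           # shared, stays powered in C1E

def enter_deep_idle(core, l3):
    """Flush private-cache lines into the L3, then gate the core."""
    l3.lines.update(core.private_cache)  # write-back into the shared L3
    core.private_cache.clear()           # private arrays now hold nothing...
    core.powered = False                 # ...so they can be shut down

core, l3 = Core("core0"), L3Cache()
core.private_cache = {0x1000: "dirty line"}
enter_deep_idle(core, l3)
print(core.powered, l3.lines)  # False {4096: 'dirty line'}
```

The point of the sketch: because the L3 can absorb the private cache contents, no state is lost when the rest of the core is powered down, which is what enables the very low idle numbers.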

Another is to give any core the ability to probe the private caches of the other cores and, as the AMD developer explained, to provide local semi-inclusive and exclusive data buffering as required.

My main push for AMD/Intel is to decrease cache latencies first and foremost, as that can make a very large performance difference per unit time. The lower storage hierarchies are all starved, their benefits and usage minimized by the high latencies of the upper 'buffers'. I expect that's when AMD will start to see large benefits from the huge L3 size they've employed... as soon as the L3 latency is dropped by another 1/4, which is the sweet spot.
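
The latency argument can be put into rough numbers with a standard average-memory-access-time (AMAT) calculation. The latencies and miss rates below are made-up illustrative figures, not measured Deneb numbers:

```python
# Back-of-envelope AMAT model showing why L3 latency matters.
# All cycle counts and miss rates are illustrative assumptions.

def amat(l1_hit, l1_miss, l2_lat, l2_miss, l3_lat, l3_miss, mem_lat):
    """Average cycles per access, walking down the hierarchy on each miss."""
    return l1_hit + l1_miss * (l2_lat + l2_miss * (l3_lat + l3_miss * mem_lat))

base    = amat(3, 0.05, 15, 0.30, 48, 0.20, 200)
reduced = amat(3, 0.05, 15, 0.30, 36, 0.20, 200)  # L3 latency cut by 1/4
print(round(base, 2), round(reduced, 2))  # 5.07 4.89
```

Even with L1/L2 untouched, shaving a quarter off the L3 latency alone trims the average access time by a few percent in this toy model, and the effect grows as the L2 miss rate rises.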

They also need to improve the buffering local to each cache and the prefetch abilities available to each cache, as well as the width between caches and the transfer bandwidth between the MCT/DCT and the XBar/L3. Most subroutines optimized for higher L1 associativity will also suffer performance penalties on the Deneb core.

It seems you have been talking to, or reading info from, people in the know. That being the case, you stated that this L3 is capable of caching the registers, if I understood you correctly. And if I did, could this possibly be leading up to CPU fault tolerance for critical apps?
 
You picked on more than I expected anyone to. I'll have to check what I can say.

Leaving register buffering aside here, I tend to keep things simple and short to appeal to the broader readership and because of my time constraints. However, what I'm also trying to point out is how vague and simplistic I've seen people on forums leave the cache/mem subsystems, leaving the impression that L1/L2/L3 size is the only main storage-based performance enhancer. The vast majority I've come across place too much emphasis on cache and little to none on the rest of the core (unless something else also becomes repeated by many). L1-L3 are the largest by far, no question, but there are plenty more 'local storages' within a core at each stage, and any one of them can pose a critical bottleneck to per-cycle performance. Instructions are naturally buffered and locally stored at nearly every zone of a core. The hundreds of fancy names for most units in a core are, for the most part, nothing but buffering/decoding/renaming/rescheduling/history-tracking/temporary-storage systems.

For instance, with the K10 arch, right at the very start of an execution cycle the IFU pulls 32B from the L1I into the Predecode/Pick buffer. Intel's C2 has a special local internal buffer between the fetch unit and the L1I that caches the last four 16B block draws for much faster decoding if required. There are plenty of such local buffers and register files around any core, under various names. Even before such decode can take place on K10, when an instruction is fetched into the L1 for the first time, a special pre-decode process works away on each instruction. The resulting instruction, including the new pre-decode information (marking the end of an instruction), is stored in the local ECC bits of the L1I/L2/L3. The branch predictor, selector, and indirect predictor also live in the ECC bits of the instruction cache, with local global-history registers for 'storing history'.
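
The C2-style fetch-block buffer mentioned above can be sketched as a tiny four-entry LRU cache of 16-byte-aligned blocks. This is purely illustrative; the entry count matches the description in the text, but the behavior is heavily simplified:

```python
from collections import OrderedDict

# Toy model of a small fetch-block buffer: hold the last four 16-byte
# fetch blocks so a re-fetch of recent code (e.g. a tight loop) can be
# served without touching the L1I again. Illustrative only.

class FetchBuffer:
    def __init__(self, entries=4, block=16):
        self.block = block
        self.entries = entries
        self.cache = OrderedDict()  # base address -> block, in LRU order

    def fetch(self, addr):
        base = addr & ~(self.block - 1)      # align down to a 16B block
        if base in self.cache:
            self.cache.move_to_end(base)     # hit: refresh LRU position
            return "buffer hit"
        self.cache[base] = f"block@{base:#x}"
        if len(self.cache) > self.entries:
            self.cache.popitem(last=False)   # evict least-recent block
        return "L1I fetch"

fb = FetchBuffer()
print(fb.fetch(0x100))  # L1I fetch
print(fb.fetch(0x108))  # buffer hit (same 16B block as 0x100)
```

Any fetch landing in one of the last four blocks hits the buffer and skips the L1I read, which is exactly what makes small hot loops decode faster.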

Later down the execution process, once the VectorPath or DirectPath decoders finish with instructions, they pass them on to the ICU. The ICU feeds the 72-entry micro-op reorder buffer, which stores the code passed on to the IEU (int) and FPEU (fp) after first going through their individual, independent schedulers. Within the decode phase the instructions are sent through the Sideband Stack Optimizer, which modifies the stack pointer. That again has two ESPx registers in different parts of the core acting as 'temporary storage' of needed information (the stack pointer value and a change tracker in this scenario). The ROB and, later, the Pack/LSU 1/2 all partially act as temporary storage space, as buffers. It goes on and on within a CPU..
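
To make the ROB's role as 'temporary storage' concrete, here is a minimal reorder-buffer sketch: ops enter in program order, complete out of order, and retire strictly in order. Only the 72-entry size comes from the text above; everything else is a simplified illustration:

```python
from collections import deque

# Minimal reorder-buffer model: the ROB is literally a bounded buffer
# that holds in-flight ops between dispatch and in-order retirement.

class ROB:
    def __init__(self, size=72):
        self.size = size
        self.entries = deque()      # [op, done] pairs, in program order

    def issue(self, op):
        if len(self.entries) >= self.size:
            return False            # ROB full: the front end must stall
        self.entries.append([op, False])
        return True

    def complete(self, op):
        for entry in self.entries:  # execution units finish out of order
            if entry[0] == op:
                entry[1] = True

    def retire(self):
        done = []
        while self.entries and self.entries[0][1]:
            done.append(self.entries.popleft()[0])  # strictly in order
        return done

rob = ROB()
for op in ("add", "load", "mul"):
    rob.issue(op)
rob.complete("mul")   # finishes early...
print(rob.retire())   # ...but nothing retires yet: [] ("add" not done)
rob.complete("add")
rob.complete("load")
print(rob.retire())   # ['add', 'load', 'mul']
```

The buffer-full stall in `issue` is the point: like every other local storage in the core, the ROB caps how much work can be in flight at once.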

The NB on the K10 core is able to buffer writes with a 16-20 entry space. The 2x DCTs also have internal special prefetch buffers of untold size. Both slowly prefetch speculated data, making use of idle cycles.

In effect, every one of these 'local storage systems' aims to decrease instruction execution latencies for the CPU as a whole. :)
 
Thanks, I will study this, as I see that someone may have done a little on-paper CPU design :cool: and they may have also had the chance to work on the implementation :beer:
 

After a close read, and being too lazy to go on a fact-finding expedition while also recognizing what I know to be facts embedded in this information, I will take what you stated in your post to be true. That being said, I think you answered my question in part without answering anything :cool:

Thanks for the post, it's a good review for the most part; when I did this crap, L1 was it and L2 was COAST (Cache On A STick) if you were lucky.
 
The dates are April and May; if I'm correct, we're still in March :beer: no offense meant.

Edit: damn you can't quote someone with a quote.... Grrr.
 
those *****
Indeed... Although they aren't really eye-catching to me. I REALLY want to see a dual with some L3 on the 45nm arch.

Darn...
Wonder if we could unlock the disabled cores just like X3...

I kinda doubt it. They are going to be made from a separate die, so it won't have the L3 on the die, and running four cores with no L3 would be pretty pointless. If that made any sense. Basically I am saying it won't be a broken 940 anymore; it will be an actual two-core/one-die chip.
 
Well, what the heck, the only quad I was waiting for has now slipped even further back, to the July period. And even that is a totally tentative date they give.
 
Well, at least they offer a dozen 45nm Optys now; unfortunately there are no 2-way EATX mobos with XF to throw them in. They should take a look at Intel's 2-way Nehalem workstation mobos.

According to Fudo, it seems both the quad and the dual L3-less processors are Q3 now; well, the dual has some L3 left in it according to them (first time I've heard about this CPU with 2MB of L3). Link for quad. Link for dual.

At this rate we'll sooner get a Bulldozer than these.
 
Hey Kuro... I know we're still a little ways off from Regor, but have you happened across any speculative MSRPs? I'm just wondering if they're going to try and sneak them in under the $100 mark. I'm sure at launch it'll be higher, but once the price settles... Regor for under $100 would get a mass following.
 
With the launch date estimates all over the place, I doubt there is anything reliable about the price.
If I had to guess, I would say the same as Kuma prices were; the X3s don't leave much room for them anyway, so yes, around $100, depending on how many AMD has.
 
Those PII X2s are going to be a seriously fantastic value, especially with a 790GX mobo for customers' new PCs. The DX10 integrated video works great for Vista and OK for games.

I have several customers looking to build new low-cost PCs that can do desktop work plus light gaming, and these will be perfect.


Though I am super glad I grabbed my 940BE with the Giga UD4H deal for $280 on the egg, the 955 is a damn sexy chip.
If you are on anything lower than a Core 2 Quad and feel the need to upgrade, the 955BE is one sexy chip. I still prefer DDR2 over DDR3 at the moment, as Anand's review shows the performance increase from DDR2-1066 to any DDR3 is going to be 0-4%. Higher latency means programs that don't saturate all that bandwidth (anything but games, and even some games) will perform worse due to the higher latency. That, along with DDR2's insane cheapness, makes it still the king of RAM imo.
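
The latency point is easy to sanity-check: absolute CAS latency in nanoseconds is the CL cycle count divided by the memory clock (half the DDR data rate). The example timings below are typical retail-kit numbers of the era, not figures taken from the review:

```python
# Convert rated CAS latency (in clock cycles) to absolute nanoseconds.
# DDR memory transfers twice per clock, so the clock is half the data rate.

def cas_ns(ddr_rate_mt_s, cl):
    clock_mhz = ddr_rate_mt_s / 2      # e.g. DDR2-1066 runs a 533 MHz clock
    return cl / clock_mhz * 1000       # cycles / MHz -> nanoseconds

print(round(cas_ns(1066, 5), 2))  # DDR2-1066 CL5 -> 9.38 ns
print(round(cas_ns(1333, 9), 2))  # DDR3-1333 CL9 -> 13.5 ns
```

So a common DDR3-1333 CL9 kit has noticeably higher absolute latency than DDR2-1066 CL5 despite the higher bandwidth, which is the trade-off being described.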
 
First time I've heard of the separate dual K10 die having some L3. I assume the reporter is somehow confusing the 2MB of L2 (1MB per core) with being the L3. These muppets usually get a drop of fact and fill in the remaining bits through guesswork to form an article.
 