  1. #1

    Bulldozer Architecture Explained

    George Harris is one of our most active Senior Members at the Overclockers.com Forums, as well as a frequent contributor to articles on the site. As a fifth-year undergraduate at Missouri University of Science and Technology studying Computer Engineering (with an emphasis on Computer Architecture), he has both the knowledge and insight to bring us a more technical view of the Bulldozer architecture. This article, while at first glance quite wordy, will allow us all to gain deeper insight into what makes the (admittedly rather underwhelming) Bulldozer architecture 'tick'.


    ... Return to article to continue reading.

  2. #2
    "The Expert" Archer0915's Avatar
    Join Date
    Nov 2008
    Location
    East Carolina University Grad School
    Posts
    4,792
    Dolk, good read. I agree about the FP and I was actually trying to bring that up the other night (face-to-face discussion with edumicated ppl) but decided to drop it so as not to cause confusion. Sometimes I feel like OCF, TR and some other forums are the only places I am understood; most of the time, anyway.

    I made this post somewhere else this morning:

    The issue here is that the architecture and software do not mesh, according to AMD. To me it is no better than HT in some cases and worse in others. In the cases where it is better than the iX, it is most likely not software that takes full advantage of the FPU.
    In essence I am saying that in heavy float-ops work it is no better than a quad, but in straight int ops it acts like an octo with a limited FSB.
    Last edited by Archer0915; 10-20-11 at 02:07 PM.
    People that buy OEM systems think Linux was a Charlie Brown character, a registry is something you see at target to buy shower gifts, RAM is a Dodge truck and a hard drive is DC at rush hour.
    Current Active Fleet:
    Daily Driver i7 3740QM T530
    Work Rig/networked backup - i7 4770K@4.4, 24GB Ram, 7850 GFX Bedroom PC - i7 870 @ 3.4, 16GB Ram, 6670 GFX
    Home School PC (kids) SB Celery @1.6, 4GB Ram, On Chip GFX Living Room HTPC - E-450@1.9 GHz, 4GB Ram
    Game Rig (kids)- 3570K@4.7, 12GB Ram, 465GTX + 9800GT (PhysX) using intel SRT
    Test Bed - In flux

  3. #3
    Retired muddocktor's Avatar
    Join Date
    Nov 2001
    Location
    New Iberia, LA
    Posts
    12,977
    Very good read, Dolk.

  4. #4
    [Citation Needed] Member Theocnoob's Avatar
    Join Date
    Dec 2007
    Location
    Canada. Eh?
    Posts
    9,626
    We're hitting the wall harder than a drunk Sailor on a Friday night, man.

    Anybody else notice the actual visible improvements in CPU performance getting smaller with each gen?
    Or not existing?

    Come on, graphene...

  5. #5
    "The Expert" Archer0915's Avatar
    Join Date
    Nov 2008
    Location
    East Carolina University Grad School
    Posts
    4,792
    Quote Originally Posted by Theocnoob View Post
    We're hitting the wall harder than a drunk Sailor on a Friday night, man.

    Anybody else notice the actual visible improvements in CPU performance getting smaller with each gen?
    Or not existing?

    Come on, graphene...
    And what do you mean by visible improvements? Compute power goes up, software demands go up, programs get sloppy because of the immense amounts of code, and memory requirements go up; programs get larger to take advantage of the memory size and speed as well as the compute power, so storage speed and size go up. I would not trade the PC I have today (the slowest) for the best I had 10 years ago.

    If you DC or encode quite a bit (and a few other things) you can see and feel the powar. I still want moar powar. I want it!!! Give it to me now; I demand it!!! OOPS, be careful what you ask for; lightning does strike.

    Dolk: Did you get any insight into the cache issues? I just feel that AMD's cache scheme is ineffective. It seems that the L3 offers little gain to the average user and that a large L2 is where it is at for them (speaking about the move from winzer to brizzy to PhI to PhII to AthII (winzer 2.0 on the duals) x4 and so on). You can see that the Windsor had a great run but the brizzy was lacking.
    Last edited by Archer0915; 10-21-11 at 08:18 AM.

  6. #6
    Member kskwerl's Avatar
    Join Date
    Jul 2010
    Location
    New Jersey
    Posts
    535
    Very well written and informative article, thank you.

  7. #7
    Member Metallica's Avatar
    Join Date
    Jan 2007
    Posts
    1,058
    Quote Originally Posted by Theocnoob View Post
    We're hitting the wall harder than a drunk Sailor on a Friday night, man.

    Anybody else notice the actual visible improvements in CPU performance getting smaller with each gen?
    Or not existing?

    Come on, graphene...
    I feel the same way. My E6750 felt the same as my i7 920. Synthetic benchmarks are the only place I saw improvements. I don't really do anything CPU intensive.

    But the jump from P4 -> E6750 was massive, or so it felt.

    SSDs were the huge jump in performance that I was looking for. I wonder when something else will come out that makes as much of a noticeable improvement as SSDs did.

  8. #8
    Premium Member #5


    bmwbaxter's Avatar
    Join Date
    Jun 2010
    Location
    Hamilton, Ontario
    Posts
    3,599
    Nice article! It was an informative read.
    SSD > PCMark 05
    freeagent - Torture is watching all these 2600k's kicking my rig in the mosfets

  9. #9
    Member Lord_of_Decay's Avatar
    Join Date
    May 2002
    Location
    Currently in New Mexico
    Posts
    459
    Excellent article Dolk.
    Intel i5 3570k
    MSI Z77A-GD55
    8GB Geil PC312800
    MSI R6950 Twin Frozr III
    Aircooled with Thermalright Venomous X

  10. #10
    I once overclocked an Intel
    in a nightmare and
    it was HORRIBLE Senior



    OCF's Plaything
    Dolk's Avatar
    Join Date
    Mar 2008
    Location
    St. Louis MO
    Posts
    5,779
    @Archer, sorry for the late reply. I have become very busy as of late. I do not believe there is a cache problem, or at least not a problem with the architectural plan for the cache. There may have been something wrong with the execution, but I cannot fully determine that without a full disclosure from AMD.

    The L3 cache has greatly improved, and if AMD had kept their original plan of having a single L3 cache, then we would have seen far less gain from the Phenom II to the BD architecture. You have to think of it this way for the cache system in BD: the L1 is only for the cores to store their data; the L2 is for the cores to talk to one another inside their respective module; the L3 is for the modules to talk to each other inside their respective CPU. Each cache gets all the information within that hierarchical set. That means the L2 will always contain the information from all L1 caches in the module, and the L3 will always contain the information from all L2 caches in the CPU.

    Again, how this idea was executed is unknown to me. It is more guesswork without actually knowing the full details of each segment of the architecture as produced. My only guess is that the cache system is being held back by the CMOS layout setup that AMD decided to go with (the auto layout).
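    If it helps, that containment rule (every L1 line also present in its module's L2, every L2 line in the shared L3) can be sketched as a toy model in Python. This is only an illustration of the inclusive-hierarchy idea, not AMD's actual implementation:

```python
# Toy model of an inclusive cache hierarchy: a fill into a lower
# level is propagated into every level above it, so each level
# always contains everything held below it.

class Cache:
    def __init__(self, name, parent=None):
        self.name = name
        self.lines = set()     # addresses currently held
        self.parent = parent   # next level up (L1 -> L2 -> L3)

    def fill(self, addr):
        """Bring addr into this cache and, to keep the hierarchy
        inclusive, into every enclosing level as well."""
        self.lines.add(addr)
        if self.parent is not None:
            self.parent.fill(addr)

l3 = Cache("L3")                     # shared by all modules
l2 = Cache("L2", parent=l3)          # shared by a module's two cores
l1a = Cache("L1-core0", parent=l2)
l1b = Cache("L1-core1", parent=l2)

l1a.fill(0x1000)
l1b.fill(0x2000)

# Inclusion holds: each L1 is a subset of its L2, which is a subset of L3.
assert l1a.lines <= l2.lines <= l3.lines
assert l1b.lines <= l2.lines
print(sorted(hex(a) for a in l3.lines))   # prints ['0x1000', '0x2000']
```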
    (╯v)╯︵ ┻━┻
    currently has .2738428927348 internets
    create your own rules to find the answer
    be proud of the pink stars
    Earthdog: New codename for SB for extreme overclockers... BLEH.



  11. #11
    "The Expert" Archer0915's Avatar
    Join Date
    Nov 2008
    Location
    East Carolina University Grad School
    Posts
    4,792
    Quote Originally Posted by Dolk View Post
    @Archer, sorry for the late reply. I have become very busy as of late. I do not believe there is a cache problem, or at least not a problem with the architectural plan for the cache. There may have been something wrong with the execution, but I cannot fully determine that without a full disclosure from AMD.

    The L3 cache has greatly improved, and if AMD had kept their original plan of having a single L3 cache, then we would have seen far less gain from the Phenom II to the BD architecture. You have to think of it this way for the cache system in BD: the L1 is only for the cores to store their data; the L2 is for the cores to talk to one another inside their respective module; the L3 is for the modules to talk to each other inside their respective CPU. Each cache gets all the information within that hierarchical set. That means the L2 will always contain the information from all L1 caches in the module, and the L3 will always contain the information from all L2 caches in the CPU.

    Again, how this idea was executed is unknown to me. It is more guesswork without actually knowing the full details of each segment of the architecture as produced. My only guess is that the cache system is being held back by the CMOS layout setup that AMD decided to go with (the auto layout).
    You know, I think they would let you in. In a way, by explaining some of these things in the manner you have, you have done them a service. I personally want to see the white papers.

    Well, a module-level cache is good, but I just have to speculate on the actual efficiency of the entire setup. They need another float unit in there, and IMHO it can cause unnecessary WS and flushes of data that has waited around for too long. If you are running a math-intensive (float and int mix) program I can see that there could be a substantial gain, but I can also see that if the instructions were not written (compiled) in such a way that they specifically take advantage of the design, things could be worse than a traditional quad.

    I am beginning to think that the Intel fake cores (at least the way they do it) could be inferior to this (and should be), but only if they can get things hammered out for PD. This design is good and it is an advancement, but I would just feel like an early adopter of Win 95 waiting for a patch if I had jumped on this thing.

  12. #12
    I once overclocked an Intel
    in a nightmare and
    it was HORRIBLE Senior



    OCF's Plaything
    Dolk's Avatar
    Join Date
    Mar 2008
    Location
    St. Louis MO
    Posts
    5,779
    When BD was first proposed, the trend among programmers was to push FP onto the GPU rather than the CPU. The CPU was not strong enough. In the five years of development since, we have seen that the CPU can do some good FP calculations given enough resources.

    If you look at Fusion and BD, you may see something of the coming future. Having an on-board GPU will allow FP calculations to be pushed permanently onto the GPU, and not the CPU. That way the CPU can go back to doing its task of multi-threaded integer and memory work.

    Furthermore, if we move over to the server side, most of the calculations there do not involve FP. If they do, those servers usually have CUDA clusters or vector processors to help with efficiency.

    Yeah, BD will not be good at breaking benchmarks, but maybe it's time that we changed how we bench our CPUs? Just like the time when we changed our benchmarks from single-threaded to multi-threaded, or 32-bit to 64-bit. Maybe it's time we go to only multi-threaded integer and memory.

    Hmm, I should have stated that in my article; that would have been a good paragraph.
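    A multi-threaded integer bench of the kind I mean could start as simple as this rough Python sketch; the workload, worker count, and task sizes are arbitrary placeholders:

```python
# Rough sketch of a multi-threaded integer benchmark: time a pure
# integer workload spread across worker processes, with no FP work
# involved. Task sizes and worker counts here are arbitrary.
import time
from multiprocessing import Pool

def int_work(n):
    """Pure integer arithmetic; never touches the FPU."""
    acc = 0
    for i in range(n):
        acc = (acc * 3 + i) & 0xFFFFFFFF
    return acc

if __name__ == "__main__":
    start = time.perf_counter()
    with Pool(4) as pool:              # one worker per "core"
        results = pool.map(int_work, [200_000] * 8)
    elapsed = time.perf_counter() - start
    print(f"8 integer tasks on 4 workers took {elapsed:.3f}s")
```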



  13. #13
    "The Expert" Archer0915's Avatar
    Join Date
    Nov 2008
    Location
    East Carolina University Grad School
    Posts
    4,792
    Quote Originally Posted by Dolk View Post
    When BD was first proposed, the trend among programmers was to push FP onto the GPU rather than the CPU. The CPU was not strong enough. In the five years of development since, we have seen that the CPU can do some good FP calculations given enough resources.

    If you look at Fusion and BD, you may see something of the coming future. Having an on-board GPU will allow FP calculations to be pushed permanently onto the GPU, and not the CPU. That way the CPU can go back to doing its task of multi-threaded integer and memory work.

    Furthermore, if we move over to the server side, most of the calculations there do not involve FP. If they do, those servers usually have CUDA clusters or vector processors to help with efficiency.

    Yeah, BD will not be good at breaking benchmarks, but maybe it's time that we changed how we bench our CPUs? Just like the time when we changed our benchmarks from single-threaded to multi-threaded, or 32-bit to 64-bit. Maybe it's time we go to only multi-threaded integer and memory.

    Hmm, I should have stated that in my article; that would have been a good paragraph.
    Well, I look more at how much work a CPU can do, so I am with you on changing some standards to represent what the processor actually does in terms of work-unit type. Unfortunately, you fall into that entire class of ppl who get a bad rap for being partial when you point out facts that they refuse to consider.

    In the end I don't think eliminating benches and saying we don't like your tests is the solution. I think expanding the benches, and evaluating every year what direction the total package is going in by looking at usage models, software sales, software in development and task environment, might be a great solution. My i5 can crunch and fold better than my PhII, but when running I really cannot tell the difference. Honestly, these days we need to stop looking so hard for stellar performance and focus on the deficiencies. Any mainstream desktop CPU manufactured today will play games as well as any other depending on the system setup (I should say any quad), as long as you have the supporting hardware. People can say all they want, but if I felt my i-processors were any faster in day-to-day use in my 24/7 rig (daily driver), I would throw out every AMD system in the house. It is just not the case.

  14. #14
    I once overclocked an Intel
    in a nightmare and
    it was HORRIBLE Senior



    OCF's Plaything
    Dolk's Avatar
    Join Date
    Mar 2008
    Location
    St. Louis MO
    Posts
    5,779
    The problem is that we are still figuring out our limits with the CPU. As of right now, for gaming and general usage of the CPU, we have not seen progression since we started working in multi-threading.

    CPU architects are still trying to find the best combination with the resources they have. In the past year or two, we have seen that increasing cores is done; we have exploited that technique to its max. The next step is exploiting each core to its max, and that is what we are currently doing. Once we have, we will move on to other tasks: maybe different types of doping of the silicon, different kinds of transistors, new memory architectures, or finally going 128-bit (which will come sooner than some of you can imagine).



  15. #15
    "The Expert" Archer0915's Avatar
    Join Date
    Nov 2008
    Location
    East Carolina University Grad School
    Posts
    4,792
    Quote Originally Posted by Dolk View Post
    The problem is that we are still figuring out our limits with the CPU. As of right now, for gaming and general usage of the CPU, we have not seen progression since we started working in multi-threading.
    The way I understand what you are saying, I cannot agree. Run today's highest-end games on the first batch of duals. We have come a long way.

    The raw compute power has increased dramatically as well.

    What needs to be considered, as I have said before, is the entire package. Just think of some of the architectural changes to the CPU that have not involved compute power: the APU that you have mentioned already, new SIMD instructions, moving many of the NB components onto the CPU to lessen the bottlenecking, and a few other things.

    I see the progression that you say has not been realized. Killing the FSB is a huge jump for heavy background multitasking for the GP user who might want to DC. The memory controller's ability to operate faster memory...

    CPU architects are still trying to find the best combination with the resources they have. In the past year or two, we have seen that increasing cores is done; we have exploited that technique to its max. The next step is exploiting each core to its max, and that is what we are currently doing. Once we have, we will move on to other tasks: maybe different types of doping of the silicon, different kinds of transistors, new memory architectures, or finally going 128-bit (which will come sooner than some of you can imagine).
    Your argument seems to contradict itself. You said we are not reaching potential, yet you almost sound happy about 128-bit.

    Not that 128-bit is bad, but to me it just makes for bigger, sloppier programs with sloppy, generic coding.

    I personally do agree with the theme of your post: it is the software that has to take advantage of the hardware, and do it efficiently.

    Also, I was making a case that benches could be revamped with usage models that were more representative of today's average user. With FB and all of this social garbage and streaming and blah, blah, we cannot readily say that a CPU is crap when it is just as fast at FB as the rest of the pack.

    Yeah, gaming benches show this and that, but unless I am playing a game where frame rates give a specific advantage, 80 FPS or 200 FPS makes no difference. I want to see the parts that stress the CPU revealed. I mean, who cares if you max at 200 FPS and the other guy maxes at 180, when one of you bottoms out at 60 FPS and the other at 40 FPS, and it is consistent across video cards? How well is the CPU handling heavy bot AI?

    I would also like to see timed tests and real workloads used.
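    To make that concrete, here is a rough Python sketch (the frame times are invented numbers) of why an average hides the stutter I care about:

```python
# Made-up frame-time capture: mostly fast frames with two big
# stutters. Average FPS looks great; the worst frame tells the
# story the average hides.
frame_times_ms = [5.0, 5.2, 4.9, 25.0, 5.1, 5.0, 24.0, 5.3]

avg_fps = 1000 / (sum(frame_times_ms) / len(frame_times_ms))
min_fps = 1000 / max(frame_times_ms)   # worst single frame

print(f"average: {avg_fps:.0f} FPS, worst case: {min_fps:.0f} FPS")
# prints: average: 101 FPS, worst case: 40 FPS
```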
    Last edited by Archer0915; 10-21-11 at 12:50 PM.

  16. #16
    Senior Benchmark Addict
    Join Date
    Feb 2003
    Location
    Hillsboro, OR
    Posts
    10,656
    I hope that this was some type of typo on your part because it (obviously) couldn't be further from the truth:

    This is the first time that AMD has implemented branch prediction.
    El<(')>Maxi: I still have your board...and I'm afraid I lost your address so I can't send it back...and my pm box is broken...and I can't remember where the PO is anyway

  17. #17
    Senior Internet Fart Owenator's Avatar
    Join Date
    Dec 2000
    Location
    Bear, Delaware
    Posts
    1,455
    Nice write up!

    One comment - in your article you state:
    "As a side note, it was actually the multi-cores that came before the paralleled software itself. It was a last ditch effort of the computer architects to keep an aging technology growing and relevant to newer, more advanced computers."

    I think you should qualify this as "for Windows PCs", because parallel software has been around for decades. It is true that writing parallel code is newer to Windows PCs. This is mostly because Microsoft never really took it seriously.

    I used parallel software techniques in the codes I wrote on UNIX systems back in the '90s, and it goes back further than that. Software running on supercomputers is parallelized. Windows PCs got stuck in the one-CPU-does-it-all model because that's what the first IBM Personal Computers did (<-- forefathers of modern PCs). They could just as easily have had dual or more CPUs. It's too bad that we didn't switch over to parallelized software sooner, when the first multi-core consumer PCs came out.

    All the above said, I do hope that we get better software that can actually use all these wonderful new cores/nodes!
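    For anyone who hasn't seen it, the decompose-and-combine pattern described above looks roughly like this in modern Python (a toy illustration, not the UNIX codes from the '90s):

```python
# Split one big job into chunks, hand the chunks to worker
# processes, and combine the partial results. The partitioning
# idea is the part that has been standard practice for decades.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n = 1_000_000
    step = n // 4
    chunks = [(i, i + step) for i in range(0, n, step)]
    with ProcessPoolExecutor(max_workers=4) as ex:
        total = sum(ex.map(partial_sum, chunks))
    assert total == n * (n - 1) // 2   # matches the serial answer
    print(total)                        # prints 499999500000
```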
    System: ASRock P67 Extreme6, i7-2600k @ 4.6 GHz, 16GiB Corsair Vengeance DDR3 1600, Galaxy GTX 570 & EVGA GTX570 in SLI @ stock, 120GB Corsair F120 SSD for Win 7 Pro x64, WD Caviar Black 1 TB for everything else, Lite-On Dual layer DVDRW, Cooler Master Silent Pro Gold 1200W PSU, CM690 II advanced case.
    Accessories: ACER 23" WS LCD + ACER 21" WS for web/TS/etc. DIY 4-way monitor stand. Logitech G35 headset, Idelazon Fang keypad, Logitech G9 mouse, Zalman FPS Gun mouse, Logitech Extreme 3D Pro joystick, momo racing wheel w/ pedals.
    Water Cooling : Replumbed to one single loop. Apogee GTZ CPU block, MCR220 w/ 2x Panaflo 120x38 'M's, MCR220 w/ 2x Panaflo 120x38 'H's, MCP350 w/ bored over Alphacoool top. MCP220 drive with second MCP350 in series w/first. 2x MCW60 GPU blocks Temps: TBD

    Blackbelt Ubercloxx0r My Current Rig My First WC Rig DIY Nvidia Surround Setup


  18. #18
    I once overclocked an Intel
    in a nightmare and
    it was HORRIBLE Senior



    OCF's Plaything
    Dolk's Avatar
    Join Date
    Mar 2008
    Location
    St. Louis MO
    Posts
    5,779
    @Gautam, This is true in a sense. AMD has used a different type of branch prediction before, but the one they have implemented for BD is much different than usual. Let me look through my notes to see where I got this quote.

    @Owenator, again also true. I should have stated this for the PC side. Server and workstation has been using parallel coding for some time.



  19. #19
    Senior Benchmark Addict
    Join Date
    Feb 2003
    Location
    Hillsboro, OR
    Posts
    10,656
    Quote Originally Posted by Dolk View Post
    @Gautam, This is true in a sense.
    No, it's not true in any sense. Branch prediction has been around for decades (longer than you or I have). AMD has simply moved away from a more traditional hierarchical BP to one that's decoupled from fetch and uses branch fusion. There's a world of difference between improving their prediction and using it for the first time (which was probably some time in the early '90s, with whatever it was they were pitting against the Pentium).
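    For the curious, the textbook two-bit saturating-counter predictor, the sort of scheme CPUs have been shipping for decades, fits in a few lines of Python. A toy model of the classic idea, not any particular CPU's implementation:

```python
# Classic two-bit saturating-counter branch predictor. Two "strong"
# states mean a single surprise outcome cannot flip the prediction
# of a heavily biased branch.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2   # 0,1 predict not-taken; 2,3 predict taken

    def predict(self):
        return self.state >= 2          # True means "taken"

    def update(self, taken):
        # Saturate at 0 and 3 so the counter never wraps around.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True] * 6 + [False] + [True] * 3   # one stray not-taken
hits = 0
for taken in outcomes:
    hits += (p.predict() == taken)
    p.update(taken)
print(f"{hits}/{len(outcomes)} correct")   # prints 9/10 correct
```

    Note the single not-taken blip costs one misprediction but does not flip the bias, which is exactly what the second bit buys you.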
    El<(')>Maxi: I still have your board...and I'm afraid I lost your address so I can't send it back...and my pm box is broken...and I can't remember where the PO is anyway

  20. #20
    I once overclocked an Intel
    in a nightmare and
    it was HORRIBLE Senior



    OCF's Plaything
    Dolk's Avatar
    Join Date
    Mar 2008
    Location
    St. Louis MO
    Posts
    5,779
    Yeah, I am going to take back what I said; I really have no idea why I put that statement in. I went through all my notes and couldn't find anything to back up that statement. Let me re-write that. Thanks for the catch; I'll update it to be accurate.



  21. Thanks!

    yo4444 (10-28-11)

