
Question on CPU processing power vs. memory bandwidth


strokeside

Member
Joined
Jan 10, 2002
Location
Dublin, Ireland
A friend asked me this question:
If your memory sends data to your CPU on a 32-bit 100MHz bus, why would you need a CPU speed greater than 100MHz (assuming your CPU is a 32-bit one)?
The RAM is supplying the CPU with data at the same rate the CPU can absorb it.
Why have a 1GHz CPU with PC133 RAM? Why does the extra CPU speed make a difference?
Why does a 1.4GHz CPU do better than a 1GHz CPU even when they are using the same memory bus speeds and widths?
(This may be CPU dynamics 101, but for some reason it is confusing me.)
I'm sure I am missing some obvious point, but I am still confused.
Can you please explain this?
 
It shouldn't be necessary to run the RAM faster, but you have to keep in mind what the CPU is doing:

The CPU has a built-in prefetch for its level 1 and level 2 buffers (aka cache), and it will fill the cache according to a specific algorithm, just in case the CPU needs the data.

This is done at a rate set by the CPU's FSB, but limited by the RAM's timings.

The nForce2 chipset can accelerate RAM reads by interleaving the read or write commands between the RAM sticks, but it is still optimized only for synchronous operation with the CPU, because the northbridge will not act as a buffer. It is useful for reducing latency, because the cache memory runs a lot faster than the RAM.

Since most programs make good use of a large amount of memory, it's always good to configure the RAM to maximize its transfer rate, so that it at least matches what the CPU can take (within the limits of the chipset).

On the other hand, if all you do with your PC is encode music/videos (not including editing), or run something scientific, like calculating pi to the last digit, then you would want to concentrate your overclock on raw processing power before accelerating the RAM, because you may not need the RAM bandwidth for that application.

(Well, that sums up how I understand it. Can anyone correct any of this?)
 
CPUs don't just act on new data being loaded from RAM. They often do something to the same data over and over and over.

Let's just say, hypothetically, computer A needs to load the characters 'ABCDE' from RAM and then rearrange the letters so that it becomes 'EABCD', 'DEABC', etc., 10,000 times. That's just a completely random example.

Let's say that computer B just needs to load the data 'ABCDEFGHI' and do one thing to it, say, shift it once.

Well, computer A only needs to wait for the RAM to load 'ABCDE', but it has to do something with that data 10,000 times. In this case, the RAM just needs to get the data there - it's a tiny fraction of the process - and the CPU is operating on that data 10,000 times. The RAM did its job, and the faster the CPU is, the faster it'll accomplish this task, regardless of the RAM speed.
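The computer A workload can be sketched like this (a hypothetical Python sketch; the function name is illustrative). The string is loaded once and then the CPU loops on it, so memory speed is irrelevant after the first load:

```python
# Hypothetical sketch of "computer A": load a tiny string once,
# then rotate it 10,000 times. The data stays in a local variable
# (in reality, in registers/cache), so RAM speed barely matters;
# CPU speed determines how fast the loop finishes.
def rotate_many(data: str, times: int) -> str:
    for _ in range(times):
        data = data[-1] + data[:-1]  # 'ABCDE' -> 'EABCD' -> 'DEABC' ...
    return data
```

After 10,000 rotations of a 5-character string, the data is back where it started (10,000 is a multiple of 5), but the CPU still did 10,000 steps of work for one memory load.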

The second case is closer to your understanding - the RAM needs to load something basically every time the CPU wants to act on something.

However, in real-world computing, computer A is a lot more like reality. Basic data loads from RAM, but the CPU often does hundreds of thousands of things with that data in loops that don't actually require any new data from memory.

I realize I explained it badly, but hopefully you get my point. You seem to see a CPU as a wheel spun by a stream: there's no point in the wheel turning faster than the stream, because then it doesn't have anything to act on.

But in reality, the same data can be used in 10 million different ways, and so CPUs need to be significantly faster than RAM. For example, if you want your computer to figure out each prime number from 1 to 1000, you only have to load a little bit of data from the program into RAM, but from there, the CPU has to do the same thing hundreds of thousands of times, with no new data, to figure out the primes.
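The prime-number workload can be sketched like this (a hypothetical Python sketch using trial division; real code would be compiled, but the shape is the same: one tiny input, then a huge amount of pure CPU looping with no new data from memory):

```python
# Sketch of the prime-finding workload: the only "input" is the
# number 1000; everything after that is CPU-bound looping.
def primes_up_to(limit: int) -> list[int]:
    primes = []
    for n in range(2, limit + 1):
        is_prime = True
        d = 2
        while d * d <= n:       # trial division up to sqrt(n)
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            primes.append(n)
    return primes
```

For limit = 1000 the inner loop runs thousands of times, yet the memory traffic is negligible: this is why a faster core finishes sooner even on the same memory bus.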
 
It depends what you mean by busses. If you are talking about the internal bus, then no, because there is no point in using 64-bit internal busses when all the data will be 32-bit. The instruction bus is in fact 72-bit (I think, either that or 74). This is because it has to cope with two 32-bit address inputs and one instruction input (8 or 10 bits, I can't remember).
Hope this answers the question, though I might be incorrect; my mind is fuzzy at the moment, too tired, falling asleep.
 
On the original post: the CPU uses the RAM as little as it possibly can; it uses the cache for most things. In modern CPUs the cache has around a 99% hit rate, so the CPU doesn't have to go to the RAM that "often", only a few thousand times a second. What SenorBeef said is pretty much right.

If a CPU does run a looping statement it won't need to access the RAM much, but it will need to access the cache for every instruction; this is due to how x86 CPUs work, in that they have to get their instructions from memory. However, as the cache mirrors RAM by location, the memory addresses around the current instruction are usually mirrored as well, therefore reducing the need to access the RAM.

The topic of cache algorithms is an interesting one, and it's pretty complex, but basically what data is stored in the cache is determined by time and location. Time is how long data has sat there without being accessed; logically, the longer it sits there unaccessed, the less likely it is to be accessed again. Location is the data/instruction's position relative to the one(s) currently being run, and the location part gets quite complex, with tables of which instruction was run when, to try to find a pattern.
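The "time" half of that policy is essentially least-recently-used (LRU) replacement, which can be sketched in a few lines of Python. This is an illustrative software model only; real CPU caches use hardware approximations of LRU, not a dictionary:

```python
from collections import OrderedDict

# Minimal LRU sketch: the entry that has gone longest without
# being accessed is the one evicted when the cache is full.
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, oldest first

    def access(self, address, data=None):
        if address in self.lines:
            self.lines.move_to_end(address)  # touched: now most recent
            return self.lines[address]       # hit
        # Miss: evict the least recently used line if full, then fill.
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[address] = data
        return None
```

Accessing a line moves it to the "recent" end, so frequently used data survives while stale data ages out, which is exactly the time-based behaviour described above.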
 
strokeside,

The 100MHz or 133MHz data bus you are talking about is the external memory bus, and yes, it is much slower than the CPU. As others have said, the CPU can cache data so it doesn't have to fetch it from RAM as often. As well, the CPU has internal registers that it can use to manipulate data, and an internal bus IN the CPU that runs at the full speed of the CPU. For example, a 2.4GHz CPU is running at 2.4GHz internally. The L1 cache runs at that speed as well. L2 is usually a bit slower, but still MUCH faster than RAM.

The more you minimize RAM access, the more you allow the CPU to run at its full internal speed. Compilers can optimize code to run like this.

Plus, modern CPUs have branch prediction that allows the CPU to 'guess' ahead when a decision is to be made, even if it doesn't know the outcome. It guesses what data it may need from RAM next and loads that information, saving time. (Or taking a performance hit if it is wrong, but it is worth it overall.)
 
Movax said:
Plus modern CPUs have branch prediction that allows the CPU to 'guess' ahead when a decision is to be made, even if it doesn't know the outcome. It guesses what data it may need from RAM next, and loads that information, saving time.

I didn't think branch predictors had anything to do with information coming from memory (as literally anything could be coming).

I thought they predicted more the outcome in the pipeline of the information that has already been sent (whether it came directly from memory, or was stored in a cache and came from there).
 
Sort of, I was simplifying it. Maybe I shouldn't have brought it up.
But, yes, it has more to do with the pipeline and cache RAM.
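For reference, one classic textbook scheme for predicting a branch's outcome in the pipeline is a 2-bit saturating counter. This is a generic illustration, not the scheme of any specific CPU discussed above:

```python
# 2-bit saturating counter predictor: states 0-1 predict "not
# taken", states 2-3 predict "taken". A single wrong outcome
# nudges the counter but doesn't immediately flip a strong
# prediction, which suits loops (taken many times, then not once).
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start at "weakly taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

Because it takes two consecutive mispredictions to flip a strong state, a loop branch that is taken a thousand times and then falls through is only mispredicted once at the end.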
 
To sum it up: the CPU has to continuously process data, while the FSB doesn't; it just carries information between the CPU and RAM. It also takes the CPU longer to process the data than it takes the bus to deliver it.
 
Movax said:
For example a 2.4GHz CPU is running at 2.4GHz internally. The L1 Cache runs at that speed as well. L2 is usually a bit slower but still MUCH faster than RAM.

Not quite. The L2 cache is on the back-side bus, and it cycles at the same rate as the core, not slower (in current designs). Some people don't understand what cache is. Cache is just a high-speed mirror of the RAM. The cache insulates the CPU from RAM, so it's important that needed data remains available in the cache.

There are a few ways that chip makers go about increasing the cache hit rate, namely through cache organization. Take a four-way set-associative cache, for example:

What we're doing is taking that cache and simply splitting it into 4 blocks (ways). Each block is then organized into either 128 or 256 lines of 16 bytes each. Think of these blocks as essentially bookmarks, if the entire content of the RAM is the book. So, if you settle on marking only 4 areas of the book, then you have yourself a 4-way set-associative cache.

Keep in mind, though, more is not always better: additional overhead is required with more ways, which means more time is taken to check every way to see which one has the data you need. But more also means you have a greater chance of having the data you need, meaning a higher hit rate.

There's no free lunch really, but there are ways of absorbing a cache miss through a non-blocking cache. Non-blocking is a technique that hides memory delays by exploiting the overlap of processor operations with data accesses. Basically, this allows the processor to continue doing something not dependent on the missing data (in the event of a cache miss).
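A toy software model of that lookup makes the address arithmetic concrete. This sketch uses the 128-line, 16-byte figures from the description above, and a deliberately naive fill policy; it is an illustration of set-associative indexing, not how any real cache is implemented:

```python
# Toy 4-way set-associative cache: 128 sets, 4 ways, 16-byte lines.
WAYS, SETS, LINE = 4, 128, 16

cache = [[None] * WAYS for _ in range(SETS)]  # each entry holds a tag

def lookup(address: int) -> bool:
    """Return True on a hit; on a miss, fill one way of the set."""
    line_number = address // LINE     # which 16-byte line of memory
    set_index = line_number % SETS    # which of the 128 sets it maps to
    tag = line_number // SETS         # identifies the line within its set
    ways = cache[set_index]
    if tag in ways:                   # check all 4 ways of that set
        return True
    # Miss: fill an empty way, else naively replace way 0
    # (a real cache would use an LRU-style replacement policy here).
    ways[ways.index(None) if None in ways else 0] = tag
    return False
```

Two addresses in the same 16-byte line hit the same entry, while a line can live in any of its set's 4 ways, which is exactly the "4 bookmarks" idea above.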

Dynamic Execution is a whole 'nother subject, which doesn't necessarily have anything to do with the cache itself....
And you can't talk about the BPU (branch prediction unit) without talking about speculative execution and data flow analysis either, because it all goes together.

-PC
 