Intel pulling a fast one on us? Maybe So.
Overclocker456
07-11-03, 11:38 PM
After getting a P4P800 and 1GB of PC3500 memory that could do 2/2/2/5 at 433MHz, I felt my 2.8b at 3.5GHz was the weakest link.
I was dreaming of 3.5GHz with over 1GHz of bus speed and huge memory bandwidth that the 2.8b wasn't able to produce. So I did it. I went out and picked up a 2.6C... The max the chip could do was 3.5GHz with 1.85v, and even that wasn't 100% stable, while the 2.8b could hit 3.6GHz with 1.85v 100% stable.
Now lemme get to the point. My initial reaction to the 2.6C benchmarks at 3.5GHz was wow... nice memory bandwidth and Sandra scores. But that's where it ended. I decided to compare my 2.8b at 3.5GHz (668MHz bus speed) to the 2.6C at 3.5GHz (1080MHz bus speed). Hyper-Threading was enabled on the 2.6C. Memory speeds were the same, with the same timings. Of course in the Sandra memory and CPU benchmarks the 2.6C easily beat the 2.8b. But when it came to 3DMark2001 SE, Super PI, the Prime 95 benchmark, PCMark2002 memory and a few other benchmarks, the 2.6C at 3.5GHz lost every benchmark to the 2.8b at 3.5GHz.
My initial reaction was something is wrong: how can the 2.8b win every test? The 2.6C has SO much more memory bandwidth, a huge lead in bus speed, and Hyper-Threading. So I reinstalled Windows XP and did a round 2... again the 2.8b won all the REAL world tests except Sandra. I was ****ed and confused. At least you'd think they'd be equal, but the 2.6C losing by up to 5% is disturbing.
When I first heard of the 800MHz bus speed chips I was like wow, what a big jump, I wonder how they did it. Well, I think I know... When I compared Super Pi, the Prime 95 benchmark, PCMark memory, and other latency-sensitive tests, the 2.6C was slower. My verdict is that the C chips have higher latency than the B chips. Even though the C chip had a 1500MB/s lead in the Sandra memory benchmarks, it didn't win a single test other than that. I'm sure Intel had to do this in order to get the chips to run stable at 800MHz or more, not to mention the motherboards running over 1GHz. So I'm getting rid of the 2.6C and keeping my trusty old 2.8B.
I'm not saying the Pentium 4 C chips suck or anything, I'm saying if you have a Pentium 4 B chip, keep it till Prescott comes out, because the only difference you'll see will be in Sandra, and personally I think Sandra doesn't mean anything. It's synthetic.
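The bus speeds quoted above follow from the base clock: core clock is base clock times the locked multiplier, and the P4 bus is quad-pumped (four transfers per base clock). A minimal sketch, assuming the stock multipliers of 21x for the 2.8B (133MHz x 21) and 13x for the 2.6C (200MHz x 13); the base clocks of 167MHz and 270MHz are inferred from the 668/1080 figures, not stated in the post.

```python
# Sketch: derive core clock and effective (quad-pumped) FSB from a base clock.
# Assumed stock multipliers: 2.8B = 21x (133MHz base), 2.6C = 13x (200MHz base).
MULTIPLIERS = {"2.8B": 21, "2.6C": 13}

def overclock(chip: str, base_mhz: float) -> tuple[float, float]:
    """Return (core clock, effective FSB) in MHz for a given base clock."""
    core = base_mhz * MULTIPLIERS[chip]
    fsb = base_mhz * 4  # P4 bus moves four data transfers per base clock
    return core, fsb

for chip, base in (("2.8B", 167), ("2.6C", 270)):
    core, fsb = overclock(chip, base)
    print(f"{chip}: {base}MHz base -> {core:.0f}MHz core, {fsb:.0f}MHz FSB")
```

Both chips land within a few MHz of 3.5GHz, so the comparison is effectively clock-for-clock; the 2.6C simply runs its bus about 60% faster.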
larva
In general what you are saying is both correct and important, but there are errors in your analysis that give the wrong impression about a couple of factors.
Firstly, re-try your comparison with the HT turned off on the 2.6C. What you are seeing is the generally harmful effect of HT on applications that cannot derive a benefit from it (most of them). On tasks that lend themselves to multithreading HT can be a real boon, but there is no free lunch. Many performance metrics are slower with HT enabled.
As far as latency, there is no such thing as "slower" latency. You mean increased latency. Minor point, yes, but complex topics such as these tend to confuse, especially if we don't use clear terminology. There is no latency difference from one cpu to the other. It is simply the FSB that is running faster, and the silicon of the cpu is plenty fast to accommodate an 800MHz clock speed without introducing increased latencies, as evidenced by the fact that the cpu core runs at 2600MHz or more. Again, this is the HT getting in the way.
I do agree wholeheartedly with your conclusions regarding the value of an 800fsb cpu in place of a 533fsb one, though. The simple fact is that the faster fsb is basically useless, as the 533fsb (especially in 6-700MHz oc'ed form) is already fast enough to accommodate the bandwidth generated by even dual channel DDR400. Since it is not the limiting factor, increasing it pays little in the way of dividends. There is no elegant solution to a problem that doesn't exist.
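For reference, the theoretical peak figures behind this argument can be worked out directly, since both the P4 bus and each DDR channel are 64 bits (8 bytes) wide. These are paper peaks only; the argument above is about realized bandwidth, which is lower in practice. A minimal sketch:

```python
BUS_BYTES = 8  # 64-bit data path for both the P4 FSB and one DDR channel

def peak_mb_s(mega_transfers: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s (decimal megabytes)."""
    return mega_transfers * BUS_BYTES * channels

print("533 FSB:", peak_mb_s(533))             # stock B-chip bus
print("667 FSB:", peak_mb_s(667))             # 2.8B overclocked as in this thread
print("800 FSB:", peak_mb_s(800))             # stock C-chip bus
print("dual DDR400:", peak_mb_s(400, channels=2))
```

On paper an 800MHz bus exactly matches dual channel DDR400 at 6400MB/s, while even a 667MHz bus peaks at about 5300MB/s; whether that gap shows up in real applications is the question being debated here.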
You can understand how little benefit is likely from the increased bandwidth by comparing the single channel 845pe to the dual channel 865 or 875. The difference in most applications is indeed negligible, and this is in the face of a 70-80% difference in realized bandwidth. If a 70-80% increase in the actual memory subsystem bandwidth translates into so little difference in application performance, then raising the speed of the interface between the memory subsystem and the CPU slightly beyond the actual memory bandwidth can only have a barely measurable effect.
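This reasoning has the shape of Amdahl's law: speeding up one part of the system helps only in proportion to the share of runtime that part accounts for. The sketch below applies it to memory bandwidth; the bandwidth-bound fractions are made-up illustrative values, not measurements from this thread.

```python
def amdahl_speedup(bound_fraction: float, part_speedup: float) -> float:
    """Overall speedup when only `bound_fraction` of runtime gets `part_speedup` faster."""
    return 1.0 / ((1.0 - bound_fraction) + bound_fraction / part_speedup)

# Hypothetical shares of runtime that are limited by memory bandwidth.
for f in (0.05, 0.10, 0.20):
    s = amdahl_speedup(f, 1.75)  # ~75% more bandwidth, mid-range of the 70-80% cited
    print(f"bandwidth-bound share {f:.0%}: overall gain {s - 1:.1%}")
```

If only a small share of runtime actually waits on memory bandwidth, even a 75% bandwidth increase yields single-digit overall gains, which fits the small 845pe vs 865/875 differences described above.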
And I also agree wholeheartedly that synthetic benchmarks like Sandra are essentially useless. Like synthetic hard drive benchmarks, their results fall more into the realm of a curiosity than a performance measurement. It might be a popular game, but it is a meaningless one nonetheless.
So in the 800fsb vs 533fsb cpu comparison, HT is the big factor. If you are crunching SETI it is a huge advantage, but if your goal is maximum performance in typical (single-threaded) applications it often compromises the result. I think a comparison between a 3.06 and a C chip would allow you to see this more clearly, although I expect that simply disabling HT on the C chip will allow the same point to be made.
Thanks for going to the effort to test the two and report your findings.
dustybyrd
07-12-03, 02:02 AM
well....i would say that for users who like to seriously multitask (like encoding mp3's or video while photoshopping or gaming) then hyperthreading is a huge advantage...i have seen these real world benchmarks and they are much better with HT or two cpu's (like AMD's)....
but if you are a single process user then HT is useless and will (just like two CPU's) be a little slower than an equally clocked single cpu...
Overclocker456
07-12-03, 02:12 AM
Originally posted by larva
In general what you are saying is both correct and important, but there are errors in your analysis that give the wrong impression about a couple of factors.
Firstly, re-try your comparison with the HT turned off on the 2.6C. What you are seeing is the generally harmful effect of HT on applications that cannot derive a benefit from it (most of them). On tasks that lend themselves to multithreading HT can be a real boon, but there is no free lunch. Many performance metrics are slower with HT enabled.
As far as latency, there is no such thing as "slower" latency. You mean increased latency. Minor point, yes, but complex topics such as these tend to confuse, especially if we don't use clear terminology. There is no latency difference from one cpu to the other. It is simply the FSB that is running faster, and the silicon of the cpu is plenty fast to accommodate an 800MHz clock speed without introducing increased latencies, as evidenced by the fact that the cpu core runs at 2600MHz or more. Again, this is the HT getting in the way.
I do agree wholeheartedly with your conclusions regarding the value of an 800fsb cpu in place of a 533fsb one, though. The simple fact is that the faster fsb is basically useless, as the 533fsb (especially in 6-700MHz oc'ed form) is already fast enough to accommodate the bandwidth generated by even dual channel DDR400. Since it is not the limiting factor, increasing it pays little in the way of dividends. There is no elegant solution to a problem that doesn't exist.
You can understand how little benefit is likely from the increased bandwidth by comparing the single channel 845pe to the dual channel 865 or 875. The difference in most applications is indeed negligible, and this is in the face of a 70-80% difference in realized bandwidth. If a 70-80% increase in the actual memory subsystem bandwidth translates into so little difference in application performance, then raising the speed of the interface between the memory subsystem and the CPU slightly beyond the actual memory bandwidth can only have a barely measurable effect.
And I also agree wholeheartedly that synthetic benchmarks like Sandra are essentially useless. Like synthetic hard drive benchmarks, their results fall more into the realm of a curiosity than a performance measurement. It might be a popular game, but it is a meaningless one nonetheless.
So in the 800fsb vs 533fsb cpu comparison, HT is the big factor. If you are crunching SETI it is a huge advantage, but if your goal is maximum performance in typical (single-threaded) applications it often compromises the result. I think a comparison between a 3.06 and a C chip would allow you to see this more clearly, although I expect that simply disabling HT on the C chip will allow the same point to be made.
Thanks for going to the effort to test the two and report your findings.
Very well said... too bad it cost me a few dollars to find this all out. Thanks for correcting me on the latency issue. Increased is the key word. Now, do you think maybe the motherboard increases latency to compensate for the high bus speed? Kind of like how some of the BX chipsets used to do?? I remember a few where once you went past 133MHz the latency would increase. Just a thought.:p
Overclocker456
07-12-03, 02:14 AM
Originally posted by dustybyrd
well....i would say that for users who like to seriously multitask (like encoding mp3's or video while photoshopping or gaming) then hyperthreading is a huge advantage...i have seen these real world benchmarks and they are much better with HT or two cpu's (like AMD's)....
but if you are a single process user then HT is useless and will (just like two CPU's) be a little slower than an equally clocked single cpu...
I personally didn't notice the difference... Although I'm sure in the future HT will play a bigger role.
dustybyrd
07-12-03, 03:08 AM
I personally didn't notice the difference... Although I'm sure in the future HT will play a bigger role.
time some "tough" programs while encoding an mp3 or video....then you'll see what i mean...
try recording tv, watching a prerecorded movie, burning a cd, encoding video all at the same time with HT on and off and then see what the difference is
Overclocker456
07-12-03, 03:56 AM
Originally posted by dustybyrd
time some "tough" programs while encoding an mp3 or video....then you'll see what i mean...
try recording tv, watching a prerecorded movie, burning a cd, encoding video all at the same time with HT on and off and then see what the difference is
I tried running multiple instances of Prime 95 and Super PI at the same time, and the difference was VERY small... And looking at my results from hours of testing in today's programs, HT makes things slower by up to 5%. Maybe 3DMark2005 will be HT ready, huh?;)
NookieN
07-12-03, 04:10 AM
I've tried benchmarking my 3.06 @3.33GHz with both HT enabled and disabled. I see virtually _no_ difference in single-threaded Super Pi and Prime95 benchmarks when HT is enabled vs. disabled. Super Pi 1M is 47s with HT enabled and 48s with HT disabled. Prime 95 2048k FFT is 64ms for both.
Having HT turned on really should not hurt any single threaded application when it is the only thing running. Of course, having HT turned on when other things are running _will_ negatively impact your benchmark scores. Those other things don't have to be other applications you're running. They could be antivirus software, Windows update, Windows paging, spyware, etc.
Overclocker456
07-12-03, 04:22 AM
Originally posted by NookieN
I've tried benchmarking my 3.06 @3.33GHz with both HT enabled and disabled. I see virtually _no_ difference in single-threaded Super Pi and Prime95 benchmarks when HT is enabled vs. disabled. Super Pi 1M is 47s with HT enabled and 48s with HT disabled. Prime 95 2048k FFT is 64ms for both.
Having HT turned on really should not hurt any single threaded application when it is the only thing running. Of course, having HT turned on when other things are running _will_ negatively impact your benchmark scores. Those other things don't have to be other applications you're running. They could be antivirus software, Windows update, Windows paging, spyware, etc.
In my tests I did not compare the scores with HT on and off on the same CPU, in this case the 2.6C. The 2.6C at 3.5GHz was run with HT enabled, and the 2.8b at 3.5GHz of course doesn't have HT.
For all tests, nothing was running in the background; it was a clean install with all the updates... I still want to know why the 2.6C lost every test... hmmmmm. Where's an Intel rep when you need one? Don't you think the 2.6C at 3.5GHz should have at least matched the 2.8B at 3.5GHz, not to mention its huge lead in Sandra?
flux
Originally posted by larva
As far as latency, there is no such thing as "slower" latency. You mean increased latency. Minor point, yes, but complex topics such as these tend to confuse, especially if we don't use clear terminology. There is no latency difference from one cpu to the other.
Actually that's not 100% accurate, especially if you are talking about AMDs (I know this thread isn't about AMDs, but hey, the theory still applies). When running through a sequence of instructions, the CPU has to do several things before an instruction can actually be executed. For instance, both AMD's and Intel's processors are RISC based with a CISC wrapper (the core only handles a small number of very simple instructions, but a layer outside the core can turn 1 complex instruction into many simple ones). So to execute an instruction, it first has to be decoded (turning that 1 complex instruction into simple ones). This takes at least one clock cycle. And this is just one example. There are many other steps along the way that the CPU as a whole has to take before the core can execute the instruction. This all results in a pipeline with x stages and up to x instructions in it at a time. Because an instruction takes x cycles to pass through, there is a latency between sending the instruction to be executed and the execution actually taking place. In real terms you rarely notice any performance difference with longer pipelines (and higher latency) unless you have a high branch misprediction rate (each time branch prediction guesses wrong, the entire pipeline has to be discarded). It's a technical point, but the latency does change from processor to processor, and while it doesn't have a huge impact on performance, it is a key structural feature that is part and parcel of analysing CPU behaviour.
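The misprediction point can be made concrete with the usual back-of-envelope CPI (cycles per instruction) model, where each mispredicted branch costs roughly a pipeline's worth of cycles. All numbers below (base CPI, branch frequency, misprediction rates, and the 10- and 20-cycle penalties) are illustrative assumptions chosen to show the trend, not measured values for any specific chip.

```python
def effective_cpi(base_cpi: float, branch_freq: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction including branch-misprediction stalls."""
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles

# Illustrative assumptions: 20% of instructions are branches, base CPI of 1.0.
for penalty in (10, 20):           # shorter vs longer pipeline
    for miss in (0.02, 0.10):      # good vs poor branch prediction
        cpi = effective_cpi(1.0, 0.20, miss, penalty)
        print(f"penalty {penalty:2d}, mispredict {miss:.0%}: CPI {cpi:.2f}")
```

With good prediction the longer pipeline costs only a few percent; with poor prediction the gap widens considerably, which is the post's point that pipeline length mostly matters when branches are mispredicted often.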
Overclocker456
07-12-03, 04:26 AM
Originally posted by flux
Actually that's not 100% accurate, especially if you are talking about AMDs (I know this thread isn't about AMDs, but hey, the theory still applies). When running through a sequence of instructions, the CPU has to do several things before an instruction can actually be executed. For instance, both AMD's and Intel's processors are RISC based with a CISC wrapper (the core only handles a small number of very simple instructions, but a layer outside the core can turn 1 complex instruction into many simple ones). So to execute an instruction, it first has to be decoded (turning that 1 complex instruction into simple ones). This takes at least one clock cycle. And this is just one example. There are many other steps along the way that the CPU as a whole has to take before the core can execute the instruction. This all results in a pipeline with x stages and up to x instructions in it at a time. Because an instruction takes x cycles to pass through, there is a latency between sending the instruction to be executed and the execution actually taking place. In real terms you rarely notice any performance difference with longer pipelines (and higher latency) unless you have a high branch misprediction rate (each time branch prediction guesses wrong, the entire pipeline has to be discarded). It's a technical point, but the latency does change from processor to processor, and while it doesn't have a huge impact on performance, it is a key structural feature that is part and parcel of analysing CPU behaviour.
Do you think it's possible that either the motherboard or the CPU increases the latency when using Pentium 4 C chips to compensate for that high bus speed?
Mark Larson
07-12-03, 04:30 AM
Originally posted by dustybyrd
well....i would say that for users who like to seriously multitask (like encoding mp3's or video while photoshopping or gaming) then hyperthreading is a huge advantage...i have seen these real world benchmarks and they are much better with HT or two cpu's (like AMD's)....
but if you are a single process user then HT is useless and will (just like two CPU's) be a little slower than an equally clocked single cpu...
I've said it once, I've said it a million times, and people all over the web back up what I say: Hyperthreading DOES NOT give you another CPU. Just because you see another CPU in Task Manager doesn't mean that another CPU magically appears in the system. It's definitely not like a dual AthlonMP or dual Xeon.
All HT does is increase the efficiency with which the pipeline is fed. That is all.
RaiNZerO
07-12-03, 04:33 AM
Originally posted by Overclocker456
After getting a P4P800 and 1GB of PC3500 memory that could do 2/2/2/5 at 433MHz, I felt my 2.8b at 3.5GHz was the weakest link.
I was dreaming of 3.5GHz with over 1GHz of bus speed and huge memory bandwidth that the 2.8b wasn't able to produce. So I did it. I went out and picked up a 2.6C... The max the chip could do was 3.5GHz with 1.85v, and even that wasn't 100% stable, while the 2.8b could hit 3.6GHz with 1.85v 100% stable.
Now lemme get to the point. My initial reaction to the 2.6C benchmarks at 3.5GHz was wow... nice memory bandwidth and Sandra scores. But that's where it ended. I decided to compare my 2.8b at 3.5GHz (668MHz bus speed) to the 2.6C at 3.5GHz (1080MHz bus speed). Hyper-Threading was enabled on the 2.6C. Memory speeds were the same, with the same timings. Of course in the Sandra memory and CPU benchmarks the 2.6C easily beat the 2.8b. But when it came to 3DMark2001 SE, Super PI, the Prime 95 benchmark, PCMark2002 memory and a few other benchmarks, the 2.6C at 3.5GHz lost every benchmark to the 2.8b at 3.5GHz.
My initial reaction was something is wrong: how can the 2.8b win every test? The 2.6C has SO much more memory bandwidth, a huge lead in bus speed, and Hyper-Threading. So I reinstalled Windows XP and did a round 2... again the 2.8b won all the REAL world tests except Sandra. I was ****ed and confused. At least you'd think they'd be equal, but the 2.6C losing by up to 5% is disturbing.
When I first heard of the 800MHz bus speed chips I was like wow, what a big jump, I wonder how they did it. Well, I think I know... When I compared Super Pi, the Prime 95 benchmark, PCMark memory, and other latency-sensitive tests, the 2.6C was slower. My verdict is that the C chips have higher latency than the B chips. Even though the C chip had a 1500MB/s lead in the Sandra memory benchmarks, it didn't win a single test other than that. I'm sure Intel had to do this in order to get the chips to run stable at 800MHz or more, not to mention the motherboards running over 1GHz. So I'm getting rid of the 2.6C and keeping my trusty old 2.8B.
I'm not saying the Pentium 4 C chips suck or anything, I'm saying if you have a Pentium 4 B chip, keep it till Prescott comes out, because the only difference you'll see will be in Sandra, and personally I think Sandra doesn't mean anything. It's synthetic.
Could you post your numbers for Super Pi, Prime95 and PCMark with the 533MHz cpu?
Overclocker456
07-12-03, 04:34 AM
Originally posted by Mark Larson
I've said it once, I've said it a million times, and people all over the web back up what I say: Hyperthreading DOES NOT give you another CPU. Just because you see another CPU in Task Manager doesn't mean that another CPU magically appears in the system. It's definitely not like a dual AthlonMP or dual Xeon.
All HT does is increase the efficiency with which the pipeline is fed. That is all.
Well said Mark..:p
NookieN
07-12-03, 04:46 AM
Originally posted by Mark Larson
I've said it once, I've said it a million times, and people all over the web back up what I say: Hyperthreading DOES NOT give you another CPU. Just because you see another CPU in Task Manager doesn't mean that another CPU magically appears in the system. It's definitely not like a dual AthlonMP or dual Xeon.
All HT does is increase the efficiency with which the pipeline is fed. That is all.
Yes Hyperthreading does not give you two physical CPUs. However I disagree that it only increases the "efficiency with which the pipeline is fed." Saying that implies that HT is speeding up the micro-op decoder or reducing cache misses. If anything, the opposite of those happens with HT.
But HT _does_ allow you to have two different instructions in the same pipeline stage when there is not a resource conflict between them. A lot of the time, there will be resource conflicts, so HT can't always help. But in a situation where the instructions demand different parts of the CPU, HT allows both to plow right through the chip at once.
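The idea of two threads sharing a pipeline stage when they don't conflict for resources can be illustrated with a toy model. This is a deliberately simplified sketch, assuming one instruction per execution unit per cycle and strictly in-order streams; the unit names and instruction streams are hypothetical, and a real P4 core is far more complex.

```python
from collections import deque

def cycles_needed(*streams: list[str]) -> int:
    """Count cycles to drain the streams; each cycle, every execution unit
    can accept one instruction, and each thread issues at most one, in order."""
    queues = [deque(s) for s in streams]
    cycles = 0
    while any(queues):
        busy: set[str] = set()
        for q in queues:
            if q and q[0] not in busy:  # issue only if that unit is free this cycle
                busy.add(q.popleft())
        cycles += 1
    return cycles

int_heavy = ["alu", "alu", "alu", "alu"]
fp_heavy = ["fpu", "fpu", "fpu", "fpu"]

print("one after the other:", cycles_needed(int_heavy) + cycles_needed(fp_heavy))
print("shared, no conflicts:", cycles_needed(int_heavy, fp_heavy))
print("shared, conflicting:", cycles_needed(int_heavy, int_heavy))
```

When the two streams need different units, the shared core finishes in half the time; when they compete for the same unit there is no gain at all, which is the resource-conflict case.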
flux
I don't think it's likely. I can't imagine any way the motherboard could affect the way or speed the CPU handles data, only how fast it can pump data to it. As for the CPU having a different latency between the B and C chips, it's possible. I don't know enough (anything) about the differences between the B and C chips to be able to comment on why the readings are different. In addition, I don't know how this quad-pumping of the FSB works. The only thing I can think of to cause this problem is the high bus speed being unstable: occasionally errors crop up in data transmission at high bus speeds, and these are detected and re-requested. If this is happening a lot then a significant drop in performance is likely. But I don't know how you would test for this.
Out of interest, what is the result if you run both processors at 2.8GHz? Is the 5% difference still there?
What HT does (this is not specific to the P4, or even Intel: HT is Intel's name for simultaneous multithreading, a way of exploiting TLP, or Thread-Level Parallelism) is run a second thread through the same CPU pipeline, allowing this second thread to use parts of the CPU that are currently idle.
Mark Larson
07-12-03, 04:57 AM
Originally posted by NookieN
Yes Hyperthreading does not give you two physical CPUs. However I disagree that it only increases the "efficiency with which the pipeline is fed." Saying that implies that HT is speeding up the micro-op decoder or reducing cache misses. If anything, the opposite of those happens with HT.
But HT _does_ allow you to have two different instructions in the same pipeline stage when there is not a resource conflict between them. A lot of the time, there will be resource conflicts, so HT can't always help. But in a situation where the instructions demand different parts of the CPU, HT allows both to plow right through the chip at once.
You know, my initial impression of you wasn't so good, but you've turned out to be a good 'un. Hope you stick around and visit us at the [H] and the [M] once in a while.
</OT>
dustybyrd
07-12-03, 06:14 AM
i see that no one believes that HT can enable MUCH better multitasking...
here's one source:
Maximum PC, July 2003 issue, pg 58...
Athlon XP 3200+ compared to 3GHz P4 with HT
Photoshop by itself:
34sec for amd 3200+ vs 34sec for p4
same Photoshop benchmark with MusicMatch running:
82sec for amd 3200+ vs 52sec for p4 w/HT
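The Maximum PC figures quoted above reduce to a couple of ratios. This is just arithmetic on the four times listed; nothing beyond those numbers is assumed.

```python
alone = {"Athlon XP 3200+": 34, "P4 3GHz w/HT": 34}    # Photoshop only (s)
with_mm = {"Athlon XP 3200+": 82, "P4 3GHz w/HT": 52}  # Photoshop + MusicMatch (s)

for cpu in alone:
    slowdown = with_mm[cpu] / alone[cpu]
    print(f"{cpu}: {slowdown:.2f}x longer when multitasking")

advantage = with_mm["Athlon XP 3200+"] / with_mm["P4 3GHz w/HT"]
print(f"P4 w/HT multitasking advantage: {advantage:.2f}x")
```

Both machines tie on Photoshop alone, but adding MusicMatch stretches the Athlon's time about 2.4x versus about 1.5x for the HT-enabled P4.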
billstuck
07-12-03, 06:27 AM
Originally posted by dustybyrd
i see that no one believes that HT can enable MUCH better multitasking...
here's one source:
Maximum PC, July 2003 issue, pg 58...
Athlon XP 3200+ compared to 3GHz P4 with HT
Photoshop by itself:
34sec for amd 3200+ vs 34sec for p4
same Photoshop benchmark with MusicMatch running:
82sec for amd 3200+ vs 52sec for p4 w/HT
HT does benefit me greatly in the situations you described.
I have a 2.4C OC'ed to 3.12GHz. I make tons of XVCD movies, which means hours of 100% CPU MPEG1 video encoding. I use TMPGEnc, which, like most video encoding programs, is multithreaded.
In my benchmarks I get a 30% improvement in encoding performance with HT turned on compared to HT turned off, all other things being equal.
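A "30%" HT gain can mean two different things depending on whether it is measured as time saved or as throughput gained. The sketch below shows how the two readings convert; the 100-minute baseline is a hypothetical encode length, not a figure from the post.

```python
baseline_min = 100.0  # hypothetical encode time with HT off

# Reading 1: the encode takes 30% less time with HT on.
time_less = baseline_min * (1 - 0.30)
print(f"30% less time: {time_less:.1f} min, speedup {baseline_min / time_less:.2f}x")

# Reading 2: encoding throughput is 30% higher with HT on.
time_faster = baseline_min / 1.30
print(f"30% more throughput: {time_faster:.1f} min, speedup {baseline_min / time_faster:.2f}x")
```

Either way HT helps this multithreaded encoder substantially; the two readings differ mainly in how large the number looks (about 1.43x vs 1.30x).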
It works
I agree that there is little improvement to be had from increasing FSB throughput when it is not matched by a roughly equal increase in RAM throughput. Also, even when both are matched and the boost is thus optimal, it's still a 'secondary' speed boost compared to, for example, CPU speed. This reflects on Intel's current offerings as well, which are barely faster than their lower-FSB (but comparably clocked) counterparts.
Of course, it's still a boost when all is said and done, and people can decide whether they want to have it at the cost it's made available at.
One point that struck me is that while the benchmarks run here may indicate a possible HT drawback, that does not mean you should automatically disable HT "to get a boost" in these benches.
As pointed out, HT does have benefits when multiple processes are involved, and if these benchmarks don't show that, it may be worth creating a test bench that does. After all, being able to do more things at the same time is a speed increase too.
larva
Originally posted by Overclocker456
Very well said... too bad it cost me a few dollars to find this all out. Thanks for correcting me on the latency issue. Increased is the key word. Now, do you think maybe the motherboard increases latency to compensate for the high bus speed? Kind of like how some of the BX chipsets used to do?? I remember a few where once you went past 133MHz the latency would increase. Just a thought.:p
Yes, the fact that you put your money where your mouth is (unlike our, ahem, crackpot fanboy input later in the thread that is just intended to justify why he didn't...) allowed you to learn quite a bit and gives us a valuable example at the same time. I'm sorry you didn't get the improvement you were after, but asking questions always teaches us something even if the answer is no. The man who tries the most hardware learns the most. I do so miss the days when I had limitless access to this type of gear.
To answer your question, I don't really think the memory latency has increased. There is a DOS-based memory test called cachemem that Xbit labs uses in their excellent chipset comparison articles that will allow you to verify this. Read the Albatron 865 motherboard review; I believe they have a link to it in there. The article is also an excellent comparison of the 845pe, 865pe, and 875 chipsets with respect to both subsystem and application performance.
BX did not increase latency at elevated fsb. Saying that it did implies that it "reacted" to the fsb being set above 133MHz in some way. This is obviously impossible, as the chipset was intended to run at no more than 100MHz fsb. There would be no point in engineering a reaction to high fsb's that the board was never intended to reach.
BX, like 865 and 875, had silicon far better than it needed to run the target clock rates. It was a 100MHz chip that ran at 160MHz, and latency does nothing but improve as the increased clock rate shortens the period of any one cycle. Some users might have had to increase their CAS timings in the BIOS to allow their ram to cope with the additional clock speed, but this is not inherent in high fsb operation.
I tried every type of PC100, and later PC133, SDRAM on the market before finally finding the later high-density Infineon PC133 that allowed me to run my 150fsb/825, 900, and 1050MHz Coppermine P3 formulas without increasing the latency. My final BX rig was a P3B-F running at its max of 150fsb with a C-step P3-700 @ 1050MHz. With the aforementioned Infineon ram, I was able to run 2-2-2 timings with this combo, actually improving latency compared to the same board running at 100 or 133MHz fsb.
The case of the 865 and 875 chipsets is indeed analogous to the old BX quandary, except that they are more advanced and offer more tuning options that allow us to increase their fsb as high as we might need to. These would be the PCI/AGP lock as well as the asynchronous memory modes that allow us to stay within our memory's clock speed envelope at a wide array of fsb's. But as good as the Intel async modes are, async modes will still degrade latency performance to a small degree. If there really is a latency difference in your example configuration, it is a result of slightly differing efficiencies amongst the particular memory ratios used. But as I doubt you were using 1:1 for any of the tests, it's hard to attribute the difference in application performance you noted to latency. Continued investigation with the cachemem tool might make this relationship clearer. From what I know of the situation, I believe HT is the cause of the differences you note, rather than latency.
PAT, and the PAT-like Asus optimizations found in the P4P800, do indeed cut latency. This is the area where dual channel falls down. Its latency is poor compared to the single channel 845pe, allowing the 845pe's application performance to compare surprisingly well to the obviously more advanced 865 and 875. The Xbit labs article mentioned above is a valuable demonstration of this point. PAT really does help the latency of the dual channel configurations though, and almost entirely eliminates the rather large latency penalty that dual channel operation causes.
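The point that a higher clock shortens each cycle can be checked with simple arithmetic: absolute latency is the cycle count multiplied by the clock period. The sketch uses CAS latency at the 100, 133 and 150MHz fsb figures mentioned in this thread and assumes the memory runs synchronously (1:1) with the fsb; it ignores every other timing and is only meant to show the trend.

```python
def cas_ns(fsb_mhz: float, cas_cycles: int) -> float:
    """Absolute CAS latency in nanoseconds, assuming memory clock == fsb clock."""
    period_ns = 1000.0 / fsb_mhz  # length of one clock cycle in ns
    return cas_cycles * period_ns

for mhz in (100, 133, 150):
    print(f"{mhz}MHz: CAS2 = {cas_ns(mhz, 2):.1f}ns, CAS3 = {cas_ns(mhz, 3):.1f}ns")
```

At 150MHz even CAS3 only matches the 20ns of CAS2 at 100MHz, and holding CAS2 at 150MHz cuts it to about 13ns.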
http://www.xbitlabs.com/articles/mainboards/display/albatron-px865pe-pro.html
Originally posted by flux
Actually that's not 100% accurate, especially if you are talking about AMDs (I know this thread isn't about AMDs, but hey, the theory still applies). When running through a sequence of instructions, the CPU has to do several things before an instruction can actually be executed. For instance, both AMD's and Intel's processors are RISC based with a CISC wrapper (the core only handles a small number of very simple instructions, but a layer outside the core can turn 1 complex instruction into many simple ones). So to execute an instruction, it first has to be decoded (turning that 1 complex instruction into simple ones). This takes at least one clock cycle. And this is just one example. There are many other steps along the way that the CPU as a whole has to take before the core can execute the instruction. This all results in a pipeline with x stages and up to x instructions in it at a time. Because an instruction takes x cycles to pass through, there is a latency between sending the instruction to be executed and the execution actually taking place. In real terms you rarely notice any performance difference with longer pipelines (and higher latency) unless you have a high branch misprediction rate (each time branch prediction guesses wrong, the entire pipeline has to be discarded). It's a technical point, but the latency does change from processor to processor, and while it doesn't have a huge impact on performance, it is a key structural feature that is part and parcel of analysing CPU behaviour.
Actually it is 100% accurate if you leave my comments in the context in which they were made. I was not making a blanket statement about all cpu's- only the two he was comparing, the 533fsb P4 and the 800fsb P4.
Here are a few remarks from the xbit labs HT article that note the same tendency I feel explains your benchmark results:
"Moreover, we can observe the same "harmful" tendency in the tests of this set: Hyper-Threading slows down the processor. However, we didn't expect anything else, to tell the truth. The threads created by SPECviewperf 7.0 are very similar and struggle with one another for the same resources: OpenGL context."
Although this paragraph was referring to the SPECviewperf results, the fact that it applies to other applications is reflected in this paragraph from the conclusion of the article:
"In fact, it is still impossible to evaluate the cons and pros of Hyper-Threading to the full extent today. On the one hand, this technology will give green light to virtual dual-processor systems entering the market of high-performance home and office systems. The advantages of the technology are evident: the performance as well as response time when working with the existing applications get improved in most cases. However, there is always other side to the picture. Many contemporary tasks optimized for actual and not for virtual multi-processor configurations can be slowed down notably by Hyper-Threading technology. Besides, there are also quite many tasks, such as games, for instance, which performance does not depend on Hyper-Threading at all. Anyway, so far the advantages are dominating, so that the use of Hyper-Threading appears justified in most cases, if the system is not intended for any specific needs."
The entire article is found at:
http://www.xbitlabs.com/articles/cpu/display/3dmax5-p4-ht.html
I have seen tests on other sites that showed a larger penalty with HT enabled for several applications than the ones at X-bit Labs. But since I feel X-bit is the most technically competent and unbiased of the sites publishing this type of information (and perhaps the only one capable of correctly drawing anything more than the most obvious conclusions from its test data), I will omit links to the others. But the trend is indeed clear.
I stated it before, and I will state it again: Hyper-Threading can make a huge difference when running multiple tasks, or for programs that perform operations amenable to SMP and are optimized for it. For many applications this is simply not the case, and there HT can only degrade performance; fortunately, the degradation is minor. And while HT may allow you to run other applications while you play UT2K3 or the like, exactly how are you supposed to use them while playing a game of that nature? If they require user input, the user is already fully occupied. And if they don't, they are data-processing applications that will drain system resources and seriously compromise the game's performance.
Overclocker456
07-12-03, 09:42 AM
Unfortunately, Flux, I can't run tests at 2.8GHz. I don't have the 2.6C anymore; I got rid of it ASAP to avoid being stuck with a CPU I don't want.
Rainzero, I'll have to re-run the tests with the 533MHz CPU because I didn't keep the results. I'm on vacation right now... I'll be back home (NY) in a week... For now, this Sony notebook with a Duron CPU could use an 800MHz bus.. :p
Thanks to everyone for the contribution to the thread.
ol' man
07-12-03, 10:25 AM
Look at what HT does for SETI. A 3.2 with HT is as fast as a dual 2.8 Xeon rig. Look at the 2.8 non-HT vs. the 2.8 with HT.
http://www.shackspace.com/~chemhaqr@shackmail.com/PIV.gif
Overclocker456
07-12-03, 11:34 AM
yeah SETI... personally that means nothing to me. I didn't buy a PC to run SETI. At this point no programs, or games for that matter, benefit from HT. Of course if software is WRITTEN for HT then it helps, but when software isn't written for HT it's slower. Off hand, I personally can't name one game or program that uses HT. HT is kind of like MMX.. Remember MMX... except MMX didn't decrease performance when the software wasn't coded for it. I say we'll need at least 2 years before REAL programs use HT. And of course Prescott is coming with a new and improved version of HT. Maybe Prescott will make HT come to life. We'll see; as of right now, though, HT is in the same boat as MMX as far as I'm concerned.
micamica1217
07-12-03, 01:36 PM
one thing I didn't see mentioned is multiplier vs FSB.......of the two CPUs tested.
let me see if I can explain what I mean......
I have been harping on the fact that a highly OCed 2.4b is not really any slower than an OCed 2.4c at the same GHz.
but let's take a look at a few things:
at stock the 2.4c will be about 5%-10% faster than the stock 2.4b (both on a dual DDR setup).
why is the 800 bus chip only about 8% faster, if its FSB is 50% faster than the 533 chip's?
simple, the 800 chip has a multiplier that is 33% lower (12x vs 18x), so both run the same 2.4GHz core clock and only the bus is faster.
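The stock arithmetic can be checked directly. This is a small sketch assuming the usual P4 conventions: the "533" and "800" ratings are quad-pumped 133 MHz and 200 MHz base clocks, and the stock multipliers are therefore 2400/133 = 18x and 2400/200 = 12x. The helper names are just for illustration.

```python
# Stock P4 2.4B (533 FSB) vs 2.4C (800 FSB).
# "533" and "800" are quad-pumped ratings of ~133 MHz and 200 MHz base clocks.

def core_clock_mhz(multiplier: int, base_fsb_mhz: float) -> float:
    """Core clock = multiplier x base (not quad-pumped) FSB clock."""
    return multiplier * base_fsb_mhz


def pct_change(new: float, old: float) -> float:
    """Percentage change going from old to new."""
    return (new / old - 1.0) * 100.0


B_MULT, B_FSB = 18, 400 / 3   # 2.4B: 18 x 133.3 MHz
C_MULT, C_FSB = 12, 200.0     # 2.4C: 12 x 200 MHz

if __name__ == "__main__":
    print(f"2.4B core: {core_clock_mhz(B_MULT, B_FSB):.0f} MHz")
    print(f"2.4C core: {core_clock_mhz(C_MULT, C_FSB):.0f} MHz")
    print(f"base FSB change:   {pct_change(C_FSB, B_FSB):+.0f}%")
    print(f"multiplier change: {pct_change(C_MULT, B_MULT):+.0f}%")
```

The base clock rises by 50% and the multiplier falls by 33%, so the core clock comes out identical; whatever stock advantage the C chip has must come from the faster bus and memory, not from the core.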
now if you OC both chips, then you may have to use the 5/4 or 3/2 ratio for the 800 bus chip.
this will degrade performance anywhere from 5%-15% compared to running at a 1/1 memory ratio.
my 2.4b at 3.4GHz is no more than 5% slower than a 2.4c at 3.4GHz.
in fact, in many cases it is just as fast. (forget the Sandra mem bench, it means nothing)
also, if a person can reach 3.3-3.5GHz with a 2.4c but must use the 3/2 ratio....then the performance I get with my 2.4b running at a 1/1 ratio may be equal to the C chip at the same GHz.
now, since I've stated repeatedly that you lose about 5% performance going to a 5/4 ratio, and about another 5% when using the 3/2 ratio (with the C chips)......
I can clearly see why your 2.8b can whoop all over the 2.6c.
it's the 2.6c's low multiplier that is holding it back.
yes, at 3.5GHz, the 2.6c has a slightly higher OC percentage.
yet the multiplier on your 2.8b is much higher, and you're able to run it at a 1/1 memory ratio....whereas you must use the 5/4 (or 3/2) ratio for the 2.6c.
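The same arithmetic for the two overclocked setups in this thread: the 2.8B at 21 x 167 MHz (668 quad-pumped) and the 2.6C at 13 x 270 MHz (1080 quad-pumped). Ratios are written FSB:memory as in the post; which ratio the 2.6C actually ran at is not stated, so both 5/4 and 3/2 are shown. The `RATIOS` table and helper name are hypothetical, for illustration only.

```python
# Hypothetical FSB:memory dividers as written in the post
# ("5/4" = 5 FSB clocks per 4 memory clocks).
RATIOS = {"1/1": 1.0, "5/4": 1.25, "3/2": 1.5}


def memory_clock_mhz(base_fsb_mhz: float, ratio: str) -> float:
    """Real memory clock for a given FSB:memory ratio (DDR transfers at twice this)."""
    return base_fsb_mhz / RATIOS[ratio]


SETUPS = [
    # (chip, multiplier, base FSB in MHz, FSB:memory ratio)
    ("2.8B", 21, 167.0, "1/1"),
    ("2.6C", 13, 270.0, "5/4"),
    ("2.6C", 13, 270.0, "3/2"),
]

if __name__ == "__main__":
    for chip, mult, fsb, ratio in SETUPS:
        core = mult * fsb
        mem = memory_clock_mhz(fsb, ratio)
        print(f"{chip}: {mult} x {fsb:.0f} = {core:.0f} MHz core, "
              f"bus {4 * fsb:.0f} quad-pumped, memory {mem:.0f} MHz "
              f"({2 * mem:.0f} DDR) at {ratio}")
```

At 5/4 the 2.6C's memory ends up at 216 MHz (432 MHz DDR), while the 2.8B at 1/1 keeps memory in lockstep with its 167 MHz bus. The roughly 5% cost per ratio step is mica's estimate and is not modelled here.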
I'm not at all surprised at the results you got comparing the two CPUs. maybe a better comparison would have been a 2.8b vs a 2.8c at the same GHz.
you can blame HT all you want....but I feel that if you ignore the fact that you have a great 2.8b that is able to reach 3.5GHz (with its high multiplier) and has no memory ratio handicap, then you are only looking at half the story.
I really wish that I too was not on a mini vacation, and that we could speak on the phone (I live in NY too).
if I had known that you were thinking of getting the 2.6c, I would have talked you out of it.
the 800 chips out right now are not made for you and me.
yet when prescott hits, things may change.
mica
ol' man
07-12-03, 01:51 PM
Originally posted by Overclocker456
yeah SETI... personally that means nothing to me. I didn't buy a PC to run SETI. At this point no programs, or games for that matter, benefit from HT. [...]
Ever watch this?
http://www4.tomshardware.com/images/thg_video_5_p4_ht.zip
I showed not too long ago here how I could run one benchmark and another benchmark together with HT, but without it one app had to wait for the other to finish.
For instance, I use a geochemical modeling program which takes around 15 minutes to complete its iterations. Depending on the size and distance of the reaction (groundwater flow), it can take longer. When running this program I would have to wait for it to finish before I could use my computer, as it would run the proc at 100%. With HT, I can now surf, do word processing, game, etc.... without waiting for the geochem program to finish, and its times are barely affected. It is similar to running an SMP system, but not identical.
Wish some folk would wake up.
Overclocker456
07-12-03, 01:57 PM
Originally posted by micamica1217
one thing I didn't see mentioned is multiplier vs FSB.......of the two CPUs tested.
[...]
it's the 2.6c's low multiplier that is holding it back.
[...]
This might also be a possibility.. good work. It's nice to see I'm not the only one who noticed this... I thought I was alone:p
What part of NY do you live in? It would be cool if we lived close..
Overclocker456
07-12-03, 01:59 PM
Originally posted by ol' man
Ever watch this?
http://www4.tomshardware.com/images/thg_video_5_p4_ht.zip
[...]
yeah I saw that video a while ago... to be 100% honest, I really don't believe it's 100% accurate, although Tom did a good job making it LOOK nice..
micamica1217
07-12-03, 02:15 PM
Originally posted by Overclocker456
This might also be a possibility.. good work. It's nice to see I'm not the only one who noticed this... I thought I was alone:p
What part of NY do you live in? It would be cool if we lived close..
Brooklyn.....PM me if you want my phone #
mica
ol' man
07-12-03, 02:21 PM
Originally posted by Overclocker456
yeah I saw that video a while ago... to be 100% honest, I really don't believe it's 100% accurate, although Tom did a good job making it LOOK nice..
Well, believe what you want. HT works quite well in an SMP-style situation. That is what I noticed.
ol' man
07-12-03, 03:00 PM
MAYBE NO!
Here are some screenshots with HT enabled and without.
This is without.
http://www.shackspace.com/~chemhaqr@shackmail.com/HT4.gif
This is with HT enabled.
http://www.shackspace.com/~chemhaqr@shackmail.com/HT5.gif
As you can see, it took about 7.6 seconds longer for the non-HT system to complete both benchmarks.
Here is what it looked like without HT on. Prime would not even start until superpi was done.
http://www.shackspace.com/~chemhaqr@shackmail.com/HT2.gif
With HT on.
http://www.shackspace.com/~chemhaqr@shackmail.com/HT1.gif
I like HT, and it is not a gimmick or whatever you want to believe it is.
Overclocker456
07-12-03, 03:30 PM
Originally posted by micamica1217
Brooklyn.....PM me if you want my phone #
mica
I'm in Queens... aight.. I'll PM you when I get back..
NookieN
07-12-03, 03:36 PM
Originally posted by ol' man
Here is what it looked like without HT on. Prime would not even start until superpi was done.
Try raising the priority of Prime to 7 in the Advanced menu. Then they should run together. But yeah, it will still be faster with HT enabled.
Originally posted by Overclocker456
I tried running multiple copies of Prime95 and Super PI at the same time; the difference was VERY small.. And looking at my results from hours of testing, in today's programs HT makes things slower by up to 5%. Maybe 3DMark2005 will be HT ready huh? ;)
Well, that's exactly the sort of scenario where HT cannot help at all. As you know, HT works by allowing idle resources to be allocated to a second thread. Running two identical (or very similar) threads that are using the exact same resources means no sharing can take place.
All of the nice benefits of HT appear when you're doing two or three different tasks at the same time. Background media encoding and foreground web browsing, or playing a game while your machine is running Folding@home in the background. That sorta stuff.
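A toy model of this resource-sharing argument (not Intel's actual scheduler): assume a core with two integer and two floating-point units per cycle, each thread issuing its own operations in order, and the two threads taking turns at first pick. Real Hyper-Threading shares far more than execution units (caches, queues, decode bandwidth), so the `UNITS` table and the `run` helper are hypothetical and the numbers are only an illustration.

```python
from collections import deque

# Hypothetical core: execution units available per cycle, by operation type.
UNITS = {"int": 2, "fp": 2}


def run(threads: list[list[str]]) -> int:
    """Cycles needed to drain all threads sharing one core.

    Each thread issues its own ops strictly in order and stops at the first op
    with no free unit; threads take turns getting first pick each cycle.
    """
    queues = [deque(t) for t in threads]
    cycles = 0
    while any(queues):
        free = dict(UNITS)
        start = cycles % len(queues)
        for q in queues[start:] + queues[:start]:
            while q and free[q[0]] > 0:
                free[q[0]] -= 1
                q.popleft()
        cycles += 1
    return cycles


if __name__ == "__main__":
    int_job = ["int"] * 100
    fp_job = ["fp"] * 100
    for label, a, b in [("int + int", int_job, int_job),
                        ("int + fp", int_job, fp_job)]:
        sequential = run([a]) + run([b])
        shared = run([a, b])
        print(f"{label}: one after the other {sequential} cycles, "
              f"sharing the core {shared} cycles")
```

Two integer-only threads finish in 100 cycles either way, so running them together buys nothing; an integer thread paired with a floating-point thread finishes in 50 cycles instead of 100, because each one fills units the other leaves idle.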
It is unwise to expect miracles from HT, but it is an undeniably excellent idea. Unless, of course, you're an AMD shareholder. ;)