Windows Showdown: 8 Operating Systems in 6 Benchmarks

Since its debut, Windows Vista has taken nothing but flak from almost every demographic one could think of. Everyone from the casual user looking to browse the web and type up a few reports to the benchmark fanatic obsessed with squeezing out all the speed he or she can is likely to complain about Vista being bloated and slow. Windows 7, on the other hand, has been hailed as performing noticeably better, and supposedly being as light as XP. And what about XP? How do they really stack up to one another? The examination of these questions follows.

Personally, I’m an avid benchmark junkie, so I can only look at this from that perspective. I’m concerned not with how things “feel”, but rather with how they score. Hard numbers are what matter to me. They might not matter to many, but they measure speed in its true essence, devoid of any subjectivity. Bearing this in mind, I selected six of the most popular benchmarks used by overclocking enthusiasts, each tending to have unique biases with regard to what part of the system they emphasize.

Editor’s Note: While the author is being modest, Gautam is a world-renowned benchmarker, and is an authority on the subject of Windows benchmarks.

The Benchmarks

3DMark03 – Predominantly measures GPU performance
3DMark05 – Predominantly measures CPU and memory performance
3DMark06 – Measures both GPU and CPU/memory, and additionally tests multi-threaded performance
Aquamark3 – Almost exclusively measures CPU and memory performance, with an emphasis on the latter
SuperPi 1M – Measures single-threaded CPU performance and is slightly influenced by memory
wPrime 32M – Measures multi-threaded CPU performance with no influence from memory

Some might be wondering why 3DMark Vantage was omitted. The main reason is that it would be a bit boring. Each operating system appears to score nearly identically in 3DMark Vantage, and any variations are within the margin of error.
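
For readers who want to check a "within the margin of error" claim on their own runs, one common approach is Welch's unequal-variance t-test. This is an assumption on the editing side, not necessarily how the scores in this article were judged. The sketch below uses only the Python standard library; `welch_t` is an illustrative helper name and the scores are hypothetical placeholders, not measurements from this article.

```python
import math
import statistics


def welch_t(runs_a, runs_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test.

    Each argument is a list of scores from repeated runs on one OS.
    Needs at least two runs per OS and some run-to-run variation.
    """
    n_a, n_b = len(runs_a), len(runs_b)
    # Sample variance divided by sample size: squared standard error of each mean.
    se2_a = statistics.variance(runs_a) / n_a
    se2_b = statistics.variance(runs_b) / n_b
    t = (statistics.mean(runs_a) - statistics.mean(runs_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical scores, NOT measurements from this article.
os_a_runs = [10.0, 12.0, 14.0]
os_b_runs = [20.0, 22.0, 24.0]
t, df = welch_t(os_a_runs, os_b_runs)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -6.12, df = 4.0
```

The magnitude of t is then compared against a t-table value for the computed degrees of freedom (2.776 for a two-sided 5% test at df = 4). A value well inside that bound is what "within the margin of error" would look like for a given pair of operating systems.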

System Configuration

I used a setup that I would consider fairly typical for an overclocking enthusiast:

Intel Core i7 965 Extreme at 4 GHz
ASUS P6T Deluxe OC Palm Edition
6GB Corsair Dominator GT 2000C7 at 960MHz CAS 7
2x ATi Radeon HD 4890s at stock frequency of 850/975

To be perfectly honest, the system configuration will likely have an impact on how the various operating systems compare with each other. Therefore, using one that is modern and high-performing is, in my eyes, the fairest way to compare them.

The Operating Systems

Windows Server 2008 x64
Windows Server 2008 x32
Windows 7 x64
Windows 7 x32
Windows Vista x64
Windows Vista x32
Windows XP x64
Windows XP x32

The operating systems are the usual suspects, all with the latest service packs installed. I added Windows Server 2008, as some people have supposed that it is faster than Vista, which it is based on, and I wished to put that theory to the test. Additionally, I tested both 32-bit and 64-bit variants of each operating system. How they handle the memory subsystem is important when it comes to benchmark performance, as we will see. Lastly, in order to make the tests as fair as possible for the operating systems, I trimmed all eight of them using nLite and vLite. This let me keep the running services constant across all of them, ruling them out as a factor. My vLite profile is as follows:

vlite profile

Only the important stuff remains, with all the fluff removed.
And nLite (for XP 32 and 64):

nlite profile

The Results

3DMark03 Results

Only one thing really sticks out when viewing the results for 3DMark03: good ol’ XP doesn’t fare too well, while all the others are very close, with Windows 7 being slightly in the lead. Since 3DMark03 is heavily GPU-centric, this dead heat is not too much of a surprise. The benchmark depends mostly on GPU performance and is not heavily influenced by much on the system side, OS included. Still, it certainly shows XP’s obsolescence.

3DMark05 Results

Now is when things start getting interesting. 3DMark05 emphasizes CPU and memory performance, and consequently we can see the operating system having a very noticeable impact on performance. In fact, the only two operating systems that even perform similarly are Server 2008 and Vista. This does not come as much of a surprise, considering that the two are mostly the same under the hood, and are even more similar after I ensured that the running services and installed components were as close between them as possible. XP once again lags far behind the rest of the pack, but interestingly enough both 7 32 and 7 64 also score considerably lower than Vista and Server 08. 7 and XP being the worst performers certainly flies in the face of conventional beliefs. Another very interesting thing to note is that the 64-bit variants of 7, Vista and Server 08 all perform worse than their 32-bit counterparts. We must bear in mind that this benchmark uses under 1GB of memory, but for this quantity, the 64-bit OSes handle the memory subsystem a bit more slowly.

3DMark06 Results

The results in 3DMark06 are somewhat similar to those in 05. This time around, however, Windows 7 pulls up considerably, scoring almost evenly with Vista. Also, the hit going from 32-bit to 64-bit in Windows 7 is much smaller than it is going from 32-bit to 64-bit in Vista and Server 2008. XP is still decisively in last place, but the margin is a bit smaller this time around, thanks to XP scoring better in the CPU test portion of 3DMark06 than the newer OSes.

Aquamark3 Results

The results from Aquamark3 are quite similar to those from 06. Windows 7 once again makes a strong showing, and once again, 64-bit does not seem to hurt 7 very much, but takes a slight toll on Vista. Both versions of XP are far behind, but curiously enough XP 64 is considerably better than XP 32. Server 2008 is similar to Vista. However, it’s only fair to point out that run #3 for Server 08 32 was a bit of an outlier, what one would call an unlucky run. The first two runs had it performing on par with Vista.
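
An unlucky run like that is one reason a single average can mislead when there are only a few runs per OS. As a rough sketch, comparing the mean with the median and the best run shows how much one outlier moves each summary. The scores below are hypothetical placeholders, not the Aquamark3 numbers from the graph, and `summarize_runs` is an illustrative helper name.

```python
import statistics


def summarize_runs(scores):
    """Summarize repeated benchmark runs where a higher score is better."""
    return {
        "mean": statistics.mean(scores),      # pulled down by one bad run
        "median": statistics.median(scores),  # robust to a single outlier
        "best": max(scores),                  # the run a bencher would submit
    }


# Hypothetical scores: two typical runs and one unlucky run.
runs = [1000, 1002, 950]
print(summarize_runs(runs))
```

With these placeholder numbers the mean lands well below the two typical runs, while the median stays with them, which matches the observation that the first two runs tell the more representative story.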

One important thing to note about Aquamark3 in particular is that there is a heavy dependence on graphics drivers. These results only look this way on ATi GPUs, like those used in this test. On nVidia GPUs, XP is actually slightly ahead of the others. You’ll have to take my word on that since nVidia results aren’t included in this roundup, but curiously enough, running ATi in Windows 7 scores about equal to a comparable nVidia setup in XP.

SuperPi 1M Results

These results are very different indeed from those obtained in the 3D benchmarks, and are almost completely the opposite. XP 64 has a noticeable lead over all the others, and is also the most consistent. Interestingly enough, this is the only benchmark where Server 2008 appears to be considerably faster than Vista. However, just like in the 3D benchmarks, the 64-bit variants of Vista and Server 2008 are slower. In 7 it’s the complete opposite, with 7 64 noticeably outperforming 7 32, further supporting the idea that the 64-bit version of 7 is optimized in some way that 64-bit Vista is not.

wPrime Results

I’ll start out by saying that I tried to work out exactly why XP 64 scored so poorly, but I’m afraid I can’t offer any explanation, so it has to be taken at face value. Otherwise, XP 32 is still ahead of the newer OSes, but by a smaller amount than it is in SuperPi. All OSes in fact are very close to each other, barring XP 64. Windows 7, though, once again shows some weakness on the 2D side of things, with 32-bit and 64-bit in a dead heat.

Conclusion

So, who’s the winner? Well, in case you scrolled past the graphs: every single benchmark has a different operating system that does best. Overall, the two most solid performers are Server 2008 32 and Vista 32. Both of these are at the top for the 3D benchmarks, and fare okay in the 2D benchmarks as well. Deserving of flak in everyday usage or not, in benchmarks, Vista 32 performs very well. Contrary to popular belief, XP and 7’s supposed “lightness” does not really translate into benchmark scores. In fact, the more CPU-centric a benchmark is, the worse 7 tends to do. The only thing XP remains good for is 2D benchmarks, as it falls far behind the pack in all things 3D. Once again, this article only sets out to show which operating systems are the fastest by the numbers. The fastest choice might not necessarily be the best one for you.

Questions and discussion of this article are on Overclockers Forums, join in!

Gautam

Discussion
  1. Was just reading some of the comments and I cannot believe some of the statements made: "XP is just fine by the data", it is "tangibly faster than Vista", etc.

    First of all, "Vista is tangibly slower" is a subjective conclusion, which is contrary to your claims about the article not being "scientific". It has also been known since the OS was in beta that this was a UI effect meant to make the OS seem more appealing.

    To deal with the Vista comment. It is faster than XP, the difference is in the UI. The "aero theme" has a 1000ms delay that you can adjust. This will make Vista "tangibly appear faster" than XP. But it gets rid of the nice effects. Way back in the day XP had the same issue. They added a delay to the start menu and tweakers hacked the hell out of that OS to make it a benchable system over 2000.

    (subjective) For me Vista boots faster, loads programs faster and runs a lot more solidly than XP does. I DREAD having to work on people's PCs that still use XP. Sad but true. I am even starting to appreciate 7 a bit now that I have forced myself to use it for more than a month. It's still no Vista64 but, it might be. (7-64 would not let me run a ton of software I like so that choice was not an option :( )

    As for the basis of the article: the graphs are quite clear and would be too hard to read if they started at 0%. I like seeing them start at 0, and oft times when I see a review that does not start there, I anticipate a biased report. Still, I can see why they chose to work it how they did. Yes, 5% is small in terms of "desktop readiness", but it is HUGE when talking about benchmarking. A 5% boost in performance could lead to a 50-2000% improvement in boints (not a typo).

    Just saying the article is great. Thanks Gautam for your diligence. I find myself linking to or referring to this article quite a bit :)
    macklin01
    Another funny thought: for some of the benches, there may not be a statistically significant "winner." In those cases, a bencher would be better served by running the benchmark multiple times and waiting for a random event to push them higher than reinstalling their OS. :) That's actually kind of cool.


    You are absolutely right there. I actually did hold the world record in 3DMark 2001, and the result came after hours of benching. I never expected the result :)

    Futuremark (formerly Mad Onion) created a hype. I understood their goal early: earn money on others' work.... so I just jumped off ;)
    Omsion
    Yeah, I probably mis-wrote some of those. I plead too many graphs and quick glances :burn:

    Anyway, do you think a fair conclusion given these numbers, for an average user interested in upgrading to Win7 from Vista only for performance reasons is "Don't bother - many insignificant results, couple significant ones but only resulting in small differences in both directions depending on benchmark"?


    I think that's fair; see above. I think that's what makes this so interesting--depending upon the purpose, there are two radically different sets of conclusions to draw. Also, the fact that the highest scorer varied among all the tests seems to show that overall it's a draw, when not considering specific benchmarks as the goal. I also think that another interesting result shown here is that Vista can be trimmed to perform essentially as well as XP and Win-7 (at least in benchmarks; task switching, etc. is another matter). These are interesting results outside the benching community.

    It might be good to use the term "small" rather than "insignificant." The differences may well turn out to be statistically significant but not large enough to justify the time spent in an OS reinstallation. Again, depending upon the purpose of the system. ;)

    icebob
    You see G that thing should have stayed in the lounge.....


    rdrash
    Thanks for the hard work Gautam.... I know it must have taken hours and hours to accomplish and is very much appreciated! Not many people would have bothered with such an exhaustive effort, kudos.

    ....sorry to see some people giving you headaches.

    ..... lol Bob, you might catch grief for saying that, but +1 brother I'm with you.


    The response has been overwhelmingly positive, and even gautam would agree it was time to release his work. 6 major community outlets picked up his article, as well as many other smaller ones.

    The negativity in response to open discussion is the only thing out of place here. ;)
    Gautam
    In 03, 06 and Aquamark, yes Vista and 7 are basically equal. In 05 they are certainly not. (And in 06 the difference between 32-bit and 64-bit for Vista is significant)

    How about I focus on just 05, just Vista 32 and 7 32 for example, and give them each a much larger amount of trials?
    Omsion


    For example, Vista vs Server 08 vs Win7 in 3DMark03, 3DMark05, minus Vista64 in 3DMark06, minus Servers and Vista64 in Aquamark, etc. Thus my original conclusion in my first post, Vista and Win7 are generally statistically indistinguishable. I definitely agree with you that there are non-random differences in the mix, though.



    Gautam
    It's not "random noise" and it is consistent. Even from a statistical viewpoint, if a data point is 5 deviations from the mean then the error is certainly statistically significant. In fact I'm not clear what reasoning you guys are using to dismiss a certain percentage as being "insignificant."
    Yes, that definitely appears true for some of the comparisons (generally speaking, XP vs the rest). But for quite a few, I don't think we could conclude a statistically significant difference of mean values with the current sample size (just by looking at the graphs, no actual hypothesis testing).

    For example, Vista vs Server 08 vs Win7 in 3DMark03, 3DMark05, minus Vista64 in 3DMark06, etc. Thus my original conclusion(s). I definitely agree with you that there are non-random differences in the mix.

    icebob
    Ok, let me put it in a different perspective: if you want a new car and want advice on what is more cost/performance effective you will probably look in Car and Driver, but if you already have the car and want to get the most out of it you will probably look in Muscle Car. You see my point: this comparo was done for a Muscle Car audience, not for the Car and Driver reader. You don't seem to understand how much work it involves to get, let's say, 1 second less in wPrime, and G's reference guide helps us accomplish that. I can assure you that switching from Vista 32 to Win 7 won't let you get your email faster


    macklin01 got to this first, so I'll let his word stand.

    But for myself, I learned a lot here from this back and forth, and especially about what benchmarkers look for. I understand now that this is an especially great guide for choosing which OS to run when targeting different benchmarks. This is something I wouldn't have gotten out of this without this discussion.
    Yeah, I also think that the concerns are valid.

    But perhaps I should do one benchmark with something like 20-50 trials which will also exhibit that the error between results is very small, and when you have even a couple of percent worth of difference, it is significant.
    I hope so, hokie. I hope I'm not just stirring up trouble.

    I hope that Gautam realizes that I wouldn't even be commenting if I didn't think it was a great article worth discussing.
    macklin01
    Again, a well-done work comes out stronger after tackling constructive criticism. It's part of how we learn and evolve.

    I believe that Gautam's work is in this category: well-done work that will emerge all the stronger.

    Firewalling ourselves from differing points of view isn't healthy or conducive to understanding. If our analyses can only convince people who agree with us, then they probably aren't very good analyses. Fortunately, that's not the case here. :)

    I think there's a good opportunity here to intermingle and strengthen the bonds within our diverse community. Again, I'd like to extend my offer to G to do something together as a follow-up. I'm learning a lot as I read through here.


    Spoken like a true PhD. Thanks for your input. Even those of us that didn't write the article are getting some good advice for the future. :salute:
    Thanks. I appreciate the difference.

    I wouldn't say that. In fact, I greatly appreciate and admire how difficult it is. I myself would never have the time, patience, or budget to do that. But I admire seeing what's possible, and I appreciate that pushing the envelope of the hardware helps advance the state of hardware for the rest of us. At the absolute very least, what you do (1) helps us figure out what hardware has enough quality to survive 24-7 heavy-duty use in less extreme settings (e.g., a 5% overclock applied to a cancer simulation), and (2) pushes the hardware manufacturers to improve their top-end products, which in turn improves the mid- and lower-end products as well. It's a win for everyone. I don't think anybody denies that. And nobody denies that there are benefits to the broader community far beyond this.

    What we have here is an interesting discussion. You're presenting work that started in a niche but is interesting to everyone. You're finding different points of view on the same data. That's enlightening for all of us. It's not that somebody or other "doesn't get it." It's that they have a different frame of reference.

    The data may or may not be statistically significant. Some plots are, some may not be. I believe most individually are. Nonetheless, a near-NULL result is extremely interesting for the general readership, and the individual results are interesting to the benchers. We all win here. And I think taking care to remember that we are a broader audience is valuable. We gain data that we didn't have before, even if for different conclusions. It's a beautiful case of getting twice as much out of the same data than previously thought. That's a benefit of opening up to a broader group--you find things you would not have otherwise expected.

    That's been the case for me. I've been exposed to the thoughts and methods of a completely new group. Aside from reading a few "world record LN2 overclock" articles here and there, this is new to me. And I gained for it. So thanks for opening up. Don't let constructive critiques scare anyone away--it means that we're genuinely interested and want to learn more. You might just get some new recruits for it.

    Opening yourself up and presenting your work to a broader, often skeptical audience is challenging and scary. I know exactly how this feels, because I do it every day as a mathematician working on cancer and molecular/cellular biology. The discussions can be heated and draining, but you learn so much and advance your knowledge and your presentation skills so much, that you always come out the stronger for it.

    I've also found that the more I learn, the more education I acquire, the more I find myself able and willing to say "I was wrong. I hadn't thought of it that way. That's interesting. That has so much more meaning than I had appreciated. That's deep, and I think I can use it."
    icebob
    You see G that thing should have stayed in the lounge.....


    I disagree. This kind of discussion is healthy and enlightening for all of us. We all learn something and are forced to reassess and strengthen our arguments. Sometimes we find we were wrong (and can be thankful for new knowledge, saving money, or whatever), and sometimes we find we were right but now have a deeper understanding of why (and have more effective arguments for the next time).

    There's a great risk when a group keeps itself isolated because it doesn't want to hear contrary opinions or analyses. The group loses out because it develops a monoculture that's susceptible to unchallenged dogma. The broader community loses out because they don't get the group's in-depth expertise. When both work together, both are enriched. They just have to learn one another's vocabularies and motivations.