SSE2: Part of the Picture?

Software optimizations can determine winners in the CPU race. –Ed

A + B > B

Ace’s Hardware has a comparison between a dual 1.2GHz Athlon MP workstation and a few dual 1.7GHz Xeon workstations.

If you’re sixteen years old, don’t stop reading. I’m talking about workstations, but the same issues apply to games, too.

For most tests, the Athlon workstation beats the Xeon workstations. There’s a pretty good reason for that.

A while back, AMD’s and Intel’s design aims for floating-point operations diverged. AMD beefed up its floating-point hardware. Intel decided that software approaches were a better idea, hence SSE and SSE2.

This competition basically pitted an AMD chip with better FPU hardware AND SSE against Intel chips that effectively had just SSE, since they were running applications that had SSE, but not SSE2, optimizations.

A + B > B

I don’t find that unfair at all. These companies are big boys. They decided what to put into their chips and what to leave out. They both brought a product to the table, the products were tested with real applications, and the Athlon won.

I’d reject any Intel whining about the test not being “fair” due to the lack of SSE2 optimizations just as fast as I’d reject any AMD whining about TBirds not being able to take advantage of SSE optimizations in some test.

However, there was an indicator in the article as to what SSE2 could do for at least one of the programs:

Alias|Wavefront has used the special Intel Compiler optimized for the Pentium 4 to compile Maya 4. According to the first data we received, the Dual Xeon 1.7 GHz should be able to render about 44 images per hour, while the dual Athlon MP should reach 43.5 images per hour.

In short, SSE2 optimization turned a (fairly small) AMD advantage into a slight Intel nudge.
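
To put a number on that “slight nudge,” here is a quick back-of-the-envelope calculation in Python. It uses only the two projected figures quoted above; the variable and function names are just illustrative, and the figures themselves are early projections, not measured results.

```python
# Projected Maya 4 render rates from the quote above (images per hour).
xeon_rate = 44.0    # dual Xeon 1.7 GHz, per the quote
athlon_rate = 43.5  # dual Athlon MP, per the same quote


def lead_percent(faster, slower):
    """Return how far ahead `faster` is of `slower`, as a percentage."""
    return (faster - slower) / slower * 100.0


xeon_lead = lead_percent(xeon_rate, athlon_rate)
print(f"Projected Xeon lead: {xeon_lead:.2f}%")
```

That works out to a lead of a little over 1%, which is small enough that a change in compiler settings alone could plausibly tip it either way.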

The Game Can Be Won In the Back Rooms

This is not meant to say the PIV will inevitably win. Even in this particular area, the gap between the Athlon and PIV is often so big that there’s little Intel can do short of redesigning the PIV or ramping up speeds a lot faster than they’ve said in order to catch up.

When it’s relatively close, though, whether one processor ends up doing better than another can be decided by people far, far away from the arena, by people who may not even know that they’re deciding.

These are the software programmers.

Generally, getting software programmers to use these kinds of optimizations depends to a large degree on whether or not you can make it a no-brains activity handled by the compiler.

This puts the likelihood of these optimizations being used into the hands of the people who write compilers. Ace’s Hardware has a comprehensive article about floating-point and how it can be compiled. Perhaps of greatest interest to AMD fans is this description of how Intel’s plug-in compiler generates great SSE2 code for PIV use, but not-so-hot generic code for non-SSE2 processors (like the PIII, or, more importantly, Palomino).

What this means is that for certain floating-point applications, whether a Palomino or PIV will win will depend on how lazy a programming team is.

If they just use the Microsoft compiler, the Athlon gets an “unfair” advantage, since the MS compiler all by itself doesn’t do SSE2 very well. If they use the Intel plug-in and take whatever it puts out, the PIV is fine, but the Palomino gets less than optimal code. To get an “even” playing field, you need two separate compiles.
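
To make those three choices concrete, here is a minimal sketch in Python. It is a toy model, not anything taken from Ace’s article or from either compiler: the build names, the code-path labels, and the `pick_code_path` helper are all made up for illustration. The only hardware facts it leans on are that the PIV supports SSE2 and the Palomino stops at SSE.

```python
# Toy model (hypothetical names throughout): which code path each CPU
# ends up running under the three build choices described above.

CPU_FEATURES = {
    "Pentium 4": {"x87", "sse", "sse2"},
    "Athlon MP (Palomino)": {"x87", "sse"},  # no SSE2
}

# Each build is an ordered list of (code path, required CPU feature),
# with the most specialized path listed first.
BUILDS = {
    "MS compiler only": [
        ("MS-compiled non-SSE2 code", "x87"),
    ],
    "Intel plug-in only": [
        ("Intel SSE2 code", "sse2"),
        ("Intel generic code", "x87"),
    ],
    "two separate compiles": [
        ("Intel SSE2 code", "sse2"),
        ("MS-compiled non-SSE2 code", "x87"),
    ],
}


def pick_code_path(build, cpu):
    """Return the first code path in `build` that `cpu` can run."""
    features = CPU_FEATURES[cpu]
    for path, required in BUILDS[build]:
        if required in features:
            return path
    raise ValueError(f"{build!r} has no code path that {cpu!r} can run")


def summarize():
    """Map every build to the code path each CPU would get."""
    return {
        build: {cpu: pick_code_path(build, cpu) for cpu in CPU_FEATURES}
        for build in BUILDS
    }


for build, result in summarize().items():
    print(build)
    for cpu, path in result.items():
        print(f"  {cpu}: {path}")
```

In this model, only the third build hands each processor the code that suits it best. The first leaves the PIV without SSE2 code, and the second leaves the Palomino on the generic path.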

Talk about arcane! But this little detail can decide which “side” wins in a particular app.

It Sure Makes Looking Ahead Murky

Let’s say you need to buy a workstation. You run Program X all the time. You’re looking at either an Athlon or PIV workstation in, say, December.

You look at this article, and you appreciate it, but you also know the configurations you’re going to be looking at four months from now will be a good deal different. That you can handle.

But finding out about software optimizations is a whole new ballgame. It’ll be rough enough talking to some PR person about whether the program will use SSE2 or not.

Imagine asking them, “Now just what compilers are you going to use and just how are you going to compile with them?” That is sure to win you the Anal Compulsive Geek of the Century award in some circles. What’s scary is that those circles might include the responsible programmers.

And again, this issue applies to games just as much as professional rendering programs. Exactly how Doom4 is compiled can matter just as much, indeed, maybe more (especially for RDRAM based systems).

Eventually, AMD is supposed to put SSE2 in its processors, but they’ve only promised that for Hammer so far. I couldn’t find anything about it being included in Thoroughbred next year.

Of course, for most people, their applications either aren’t filled with floating-point, or other priorities (like cost) take precedence over relatively small differences in performance.

Again, even in the FP field, the Athlon is likely to do better more often than not due to its better FP hardware.

But where it’s close, that arcane little detail can make the difference, and make life difficult for the rest of us.

Finally, Fairness

The compiler situation I’ve described above ironically provides an excellent fairness vs. “fairness” test.

You have three compiling possibilities. Door #1, the current situation measured (MP vs. Willy), is objectively unfair to Intel (due to lack of SSE2 optimizations). Door #2, taking whatever the Intel plug-in puts out, is objectively unfair to AMD (due to generic instructions for non-SSE2 processors). Only door #3, two separate compiles, is what I would call truly fair.

We can spend forever uselessly arguing about “fairness” and it doesn’t matter. We don’t live in a fair world, we live in the real world.

You want to argue and lobby those responsible to make matters fairer, fine, but you can’t say, “Forget reality because we think it’s unfair.” If you don’t like current reality, do what you can to change it, not hide it.

3DNow lost in the marketplace. Earth was unfair to 3DNow. You don’t like that? Go find a fairer planet.

A realistic definition of “fairness” here is to enable those competing to be the best they can be. Excellence should be the goal. That means winners and losers.

I would not be surprised to find out that some of the current AMD advantage over Intel in at least some cases is due to the first situation.

I would not be any more surprised to find at least some of any shift in performance towards the PIV in benchmarking is due to folks going from Compiling Door #1 to Compiling Door #2 or #3.

Do I think the difference turns the PIV from a loser to winner? Generally, no.

Please note, though, going from door #1 to door #3 is more “fair.” In at least some areas, Intel’s been under a compiler disadvantage, and going to door #3 is an advance in “fairness.” That will mean the Athlon will lose some of its margin of advantage in these areas (while at the same time picking some up with MP/Palomino due to the inclusion of SSE).

Anybody who hollers that only door #1 is “fair” is either being a hypocrite, or at least has a much different definition of fairness than I. You ought to find out what that is.

True, it’s quite possible the compilers may go to door #2 instead, and shift from being unfair to Intel to being unfair to AMD. But that’s something which has to be shown; it can’t be automatically assumed.

It’s also just as possible that benchmarking programs may cherry-pick activities favoring certain optimizations, but again, that’s something provable, and thus has a burden of proof.

Please note that these are two very different situations. If a benchmark doesn’t reflect reality, you shouldn’t use it because it’s unrealistic, not because it’s “unfair.”

I think we’re going to find ourselves probing more and more into these benchmarks to determine just that.

If, however, the program itself is “unfair,” that is reality to those who have to use it, and you can’t hide that and be “fair” to the user.

“Fairness” doesn’t matter to the user. He’s interested in what is, not what should be. He’s interested in what he can use, not what he could have or might have or should have been able to use.

AMD and Intel are not charities. They are not churches. They are two business rivals engaged in an increasingly bitter, important struggle against each other, and neither side has even a hint of a halo. They are not out to be fair; they are out to win.

A lot of these “fairness” people are fighting this battle under sheepskin “fairness” clothing. Make no mistake about that, nor doubt that both Intel and AMD are doing whatever they can to skew results in their favor (Intel is likely getting the better of that exchange).

Some of these “fairness” people may think they need to help David against Goliath for a “fair fight,” but that’s hardly the same as being objective.

We’ve said it before, and we’ll say it again. We’ve been pro-Intel when we’ve felt they’ve had the best products. We’ve been pro-AMD when we’ve felt they’ve had the best products. Actually, we’re not pro-anybody; we’re pro-best, whoever that might be.

Yes, we do have a bias: we are biased towards best bang-for-the-buck solutions. What we are not biased about is where they come from.

We think over the next six months, the playing field as we see it is going to shift from being AMD’s playground to a truly competitive battlefield. We think it will be close, with advantage shifting back and forth, but our crystal ball hazes up really badly when we try to peer deeply into 2002.

What is not hazy, though, is how we’re going to look at this. We’re not going to take sides, and we’ll give both sides whacks when they deserve it, and call “foul” when that’s deserved.

Our ultimate loyalty is to our stated principle, which is to find the best available for a reasonable price.

It’s going to get nasty out there. The closer it gets, the louder it will get.

