BAPCo

A bad benchmark is no better than an evil one. – Ed

We’ve never used SysMark 2002 in a review because, frankly, I found some of the PIV results (particularly how the results scaled) rather . . . strange.

There are a number of items recently posted with some detailed reasons why the results end up so . . . strange, here and here.

Lousy Is Just As Good As Evil

There’s one problem with these pieces. There seems to be this underlying assumption (much more so in one than the other) that evil intent has to be proved for the benchmark to be discredited.

There’s no reason for that at all. All you have to do to discredit the benchmark is to show that it’s a lousy benchmark, one that is not representative of the activities being measured.

This is a benchmark, not a first-degree crime, not a sin. You don’t have to prove intent, just inadequacy.

This is a benchmark, a program, a series of commands. You can’t hurt it, harangue it or hang it. All you can “do” to it is not use it.

And lousy is just as good a reason for doing that as evil.

You are a manager. You have to hire somebody. You have two candidates. One is hopelessly incompetent; the other is hopelessly evil. What do you do? You find somebody else.

You want somebody who can do the job. If the person can’t, you make the same decision no matter why.

Guilty As Not Charged

I look at the evidence presented, and I don’t quite see a smoking gun for evil. Close, but not quite.

However, I see far more than enough evidence to show that this is a benchmark so badly put together that it shouldn’t be generally used. This verdict may not be as emotionally satisfying to some as evil, but for practical purposes, it’s just as deadly.

You cannot rationally call a scoring methodology bad just because the “other side” happens to score higher. The fairest methodology in the world will always favor the faster processor.

The whole point of a benchmark is to determine winners and losers in a simulation of reality. To seek equality of results in such a test is ludicrous; it defeats the whole purpose of the test.

This view must be rejected. There should be no affirmative action program for benchmarks.

No, the criterion for judging a benchmark should simply be whether the program actions that go into the benchmark are representative of what people actually do with the program.

SysMark grossly fails this test, and that is the reason why it should not be generally used.

The complaints about performance tests seem focused on SysMark replacing AMD-favorable benchmarks with Intel-favorable benchmarks. That’s not the problem.

What is genuinely bad is SysMark’s appalling reliance on just a few program actions to determine application performance.

Using the Sort function in Excel for 90% of the score in SysMark 2002 is terrible, but using a single function in Flash 88% of the time in SysMark 2001 is just as bad. Neither is a reasonably representative test of the program.
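
To see why that kind of weighting is fatal, consider a toy composite score. The weights and timings below are hypothetical numbers made up purely for illustration, not BAPCo’s actual formula; the point is only that once one operation carries ~90% of the weight, the “application” score is effectively just that one operation’s score.

```python
# Toy composite score: a weighted sum of per-operation speedups vs. a baseline.
# All weights and timings here are hypothetical, for illustration only.

def composite_score(timings, weights, baseline=1.0):
    """Each operation contributes weight * (baseline_time / measured_time)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(w * (baseline / timings[op]) for op, w in weights.items())

# Suppose 90% of an "Excel" score rides on a single Sort operation.
weights = {"sort": 0.90, "recalc": 0.05, "charting": 0.05}

cpu_a = {"sort": 1.4, "recalc": 0.7, "charting": 0.7}  # weak sort, strong elsewhere
cpu_b = {"sort": 0.7, "recalc": 1.4, "charting": 1.4}  # strong sort, weak elsewhere

print(round(composite_score(cpu_a, weights), 2))  # 0.79 -- sunk by sort alone
print(round(composite_score(cpu_b, weights), 2))  # 1.36 -- carried by sort alone
```

CPU A is twice as fast at everything except the sort, yet loses badly; the other 95% of what the application does barely moves the number. Whatever the intent behind it, a score built this way measures one function, not a program.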

To me, that’s the smoking gun. I don’t need any further evidence or theories or suspicions. That’s enough reason not to use it.

You’re a prosecutor. You have a murder. There’s not a snowball’s chance in hell the defendant won’t be found guilty of homicide. Getting Murder One is a lot iffier.

What do you do?

What you don’t do is just try him for Murder One. If you do that, he may walk away scot-free. No, you try him on both charges so that even if he’s found innocent of Murder One, he’ll still do hard time for homicide.

Objective Reality Vs. Political Reality

The authors say, “Don’t use SysMark,” for a certain set of reasons. I agree with the conclusion, but for much different reasons.

You may say, “So what’s the difference?”

You try out for the high school baseball team. You can hit fastballs, but you’re not so good at hitting curve balls. The coach’s son can hit curves but not fastballs. In tryouts, you both get 90% curves. The coach’s son beats you out.

Their argument says, “The coach is biased.” My argument says, “Players hitting in high school baseball games aren’t going to just get curve balls.”

Which argument is more likely to convince the PTA? Which argument is the coach going to have more of a problem answering? If you argue the first, the coach can just deny it, and then it becomes a debate over the coach’s character.

If you argue the second, the coach can hardly credibly state that his team isn’t going to see a lot of fastballs, and it would be very hard for him to object to solving the problem by redoing the hitting tryout with a variety of pitches.

If the coach refuses to do that, doesn’t that flush out his real motivations a lot better than pro-coach and anti-coach factions endlessly arguing about it?

The point is not to say the authors couldn’t well be right; I’m just saying that proving intent is much harder than proving inadequacy, and it is not essential here. Since you can hang the defendant well enough for other reasons that you can back up 100%, that’s what you go for first; then you think about the other stuff.

Another way to show the difference is that at least one of the authors seems to think the question is: “Why is SysMark 2002 a bad benchmark?”

It isn’t. To me, the initial question should be: “Is SysMark 2002 a good benchmark or not?” The answer I get is, “No.”

For me and most people, that’s enough reason not to use it. It’s fine by me if others then ask the second question: “Why is SysMark 2002 a poor benchmark?”

But I don’t need to hear a bad second answer to decide that it’s not worth using. A bad answer to the first question is good enough.

The real jury consists of those who use it. If this gets turned into a political argument, into a “this is unfair to AMD” argument, it just turns into another useless argument over character.

On the other hand, it’s a lot harder for anybody, no matter what side they’re on, to argue that any single program function should make up almost all of the whole program’s score, or, in one case, up to 30% of the entire benchmark’s score.

The Important Lesson To Be Learned Here

Some people think it’s smart to rely only on numbers like these. As you can see, that’s not being smart; that’s just doing what Oz says without looking behind the curtain.

When you get hooked on a number, you become the puppet of those who construct the numbers, and they can pull your strings and make you do whatever they want just by manipulating them behind the scenes.

That’s smart? I think not.

The Philosophical Lesson To Be Learned Here (extra credit)

Many, perhaps most, human beings almost instinctively tend to gather in camps and view these matters exclusively as “Us vs. Them” disputes. Why? Security in numbers is no doubt part of the reason; transferring most or all of the burden of thinking to the group is another reason, less obvious but just as real.

What often puzzles and confuses these people is that this is not the only way to look at these issues; indeed, it’s often a bad way to do so. They can’t conceive of anything besides “Us vs. Them”; there cannot be any alternate viewpoint.

Just to cite recent history, my criticisms of ATI and the R9700 were viewed by more than a few as pro-nVidia, “if you’re not for ATI, you must be for nVidia,” which was not the case at all. To say ATI did something bad does not make nVidia good.

Dig deeper, and you see a denial that there is such a thing as objective reality. Reality becomes a subjective political matter determined at best by how well the “Us vs. Them” fight is faring at the moment, but more usually by how you feel about it at the moment.

A benchmark cannot operate by “Us vs. Them” rules if it is to retain any value. It must instead operate on a much different principle: a reasonable approximation/simulation of an objective reality (i.e., typical user activity). There’s no “Us vs. Them” there; AMD users don’t use different Photoshop commands than Intel users.

No simplification can mirror reality perfectly, but lack of perfection does not render it useless; you don’t get an “F” when you get 98% on a test.

The reason why people don’t like the notion of objective reality is that the rules fall outside their control and influence. If the other side wins the test, no arguing can refute that.

Of course, what gave objective reality a bad name in some circles is that the term was used to cover much more than it could. Or it was used to cover oversimplifications. That’s often the case in this arena.

For instance, in the case of AMD vs. Intel, there is no one objective truth about which is the better performer. Given roughly equal clock speeds or equivalents, the AMD processor does better at some things, and the Intel processor does better at others.

Nor can “better” usually be isolated to one factor like performance. If you want performance but can’t stand noise, you will likely choose differently than one who is indifferent to sound.

Subjective reality in an area like this one is composed of a variety of objective realities judged by the relative (and differing) values placed on those realities by individuals.

The deep underlying reason for so many arguments and fights is that people often think that their subjective reality is objective reality, when in fact, it isn’t. What people are really (and usually unknowingly) arguing is “the way I value these various realities is the only way.” And it isn’t.

Some may note that there seems to be a contradiction here. The first half talks about people denying objective reality, then it talks about people thinking objective reality is whatever they think it is.

Yes, there is a contradiction, but it’s not in the explanation; it’s in human behavior. People often think tactically, not strategically. They’ll pick up any rock to throw, even if it’s the same rock they condemned the other side for throwing just seconds ago.

It’s completely different, of course, when you’re the one throwing the rock. 🙂 Yet another common mental blinder.

Just realize that this is what these views and behaviors are; the fact that others blind themselves doesn’t mean you have to do the same.

Ed
