As software is constantly changing, there is no real way to make a "definitive benchmark". The "ultimate benchmark" from 5 years ago is vastly different from the "ultimate benchmark" of today, if such a thing even exists. The question isn't whether a benchmark is "fair", but whether it's relevant — that is, whether a lot of people actually depend on performance in that kind of application. Does it matter whether Sandra gives you a better score? Are you going to spend most of your time running Sandra? Does it matter more than, say, whether your UT2003 scores are better, or your video encoding times are faster? Those definitely matter more.
That's really the only way to determine "which is the superior processor" in terms of performance: take commonly used software and test it on both platforms. That's what most sites do, and that's where we draw conclusions from. Are all consumers going to look at them? No. Is there a way to dumb all that information down to something that would capture the consumer's limited attention span without losing information? Absolutely not.
Computers are complex things, and performance is a complex thing. You can't make it simple without making hasty generalizations. Hence, this "educating the average Joe" thing is a lost cause.