It didn’t look good for Intel last June and July. The Katmai PIII had been shown to do little better than the PII or even cranked-up Celerons. Then the Athlon came out, and it looked like it had answered the prayers of even the most crazed AMD fanatic.
Six months later, we see quite a different picture. While the Athlon is still a fine chip that will only get better, we see Intel’s new Coppermine, not much different from that lousy Katmai, managing to stand toe-to-toe with it for most things, even in areas where the Athlon was supposed to wipe the floor with it. People can actually recommend buying a chip that still bears a resemblance to the old Pentium Pro over a newly designed chip without risking a trip to the psychiatrist.
The old dog reused a couple of the old tricks, and learned a few new ones. None of them big, but they added up.
Here’s what happened:
The glories of on-die cache: CPUs are a lot like very, very quick short-order cooks. They can whip up meals in a flash, but they have to have food handy in order to work fast. No food, the cook just waits around until he can get his hands on some.
If the CPU is the cook, the L1 cache is like the food in the skillet. The L2 cache is like the food on-hand within a step or two of the cook. Main memory is like a refrigerator further away. Hard drives and CDs are like the grocery store, and the Internet is like the farm.
A big part of PC architecture is dedicated to being like a cook’s helpers. Their job is to keep the cook cooking all the time. That means getting him exactly the food he needs exactly when he needs it without him having to take too many extra steps. They try to anticipate what food the chef is going to need next and get it from the refrigerator, grocery store or farm before the chef asks for it. Otherwise, they keep the cook waiting. Helpers like these can make a big difference in the number of meals a cook can cook. If you keep a faster cook waiting more than a slower cook, the slower one can cook as much or more while the speedy one spends a lot of time twiddling his thumbs waiting.
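To put rough numbers on the waiting cook, here is a minimal sketch. The function name and every figure in it are made up purely for illustration; nothing here is measured from a real chip.

```python
def meals_per_hour(cook_seconds, wait_seconds):
    """Meals finished per hour when each meal takes cook_seconds of actual
    cooking plus wait_seconds of standing around waiting for food."""
    return 3600 / (cook_seconds + wait_seconds)

# Hypothetical numbers, chosen only to illustrate the point.
fast_cook = meals_per_hour(cook_seconds=20, wait_seconds=40)  # quick hands, slow helpers
slow_cook = meals_per_hour(cook_seconds=30, wait_seconds=10)  # slower hands, good helpers

print(f"fast cook with slow helpers: {fast_cook:.0f} meals an hour")
print(f"slow cook with good helpers: {slow_cook:.0f} meals an hour")
```

With these invented figures, the slower cook finishes more meals, simply because he spends less of each hour waiting.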
Intel did two things to help the chef. (AMD did one: it increased the L1 cache, which is like increasing the size of the cook’s skillet. That lets the chef cook more food at one time, but eventually you have to get new food to cook.) First, Intel gave the Coppermine L2 cache memory that runs at the same speed as the processor, and reduced that memory’s latency. That’s just a fancy way of saying that it lets the chef get his hands on food a lot faster, like putting everything within arm’s reach rather than making the chef take a step or two to get it.
Current AMD chips, on the other hand, rely on older external cache memory chips running at only a fraction of the speed of the processor, with higher latency. This is like putting all the food a step or two away from the chef. The first .18 micron Athlons will have cache memory running at an even smaller fraction of the processor speed, which is like giving the cook track shoes, but putting the food another step away.
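The usual back-of-the-envelope way to see why L2 latency matters is the textbook average-access-time calculation. The sketch below uses it with hypothetical cycle counts and miss rates; they are assumptions picked to show the shape of the effect, not specifications for the Coppermine or the Athlon.

```python
def average_access_cycles(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, memory):
    """Average cycles per memory access for a two-level cache:
    every access pays the L1 time, L1 misses also pay the L2 time,
    and L2 misses also pay the trip to main memory."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)

# Hypothetical figures: the two kitchens differ only in the L2 step.
arms_reach = average_access_cycles(
    l1_hit=3, l1_miss_rate=0.05, l2_hit=7, l2_miss_rate=0.2, memory=100
)
step_or_two_away = average_access_cycles(
    l1_hit=3, l1_miss_rate=0.05, l2_hit=20, l2_miss_rate=0.2, memory=100
)

print(f"L2 within arm's reach:  {arms_reach:.2f} cycles per access")
print(f"L2 a step or two away:  {step_or_two_away:.2f} cycles per access")
```

Only the L2 number changes between the two cases, so the difference in the result is entirely the cost of the extra step to the food.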
Doing it Intel’s way is a lot harder and more expensive to get exactly right, like having Martha Stewart design the kitchen. You really don’t want to have to bring her in four or five times. It’s cheaper in the long run to spend more time designing the kitchen once, for the fastest chef you ever plan on bringing into the kitchen, rather than bringing her in several times to design for each cook. By putting all the food within arm’s reach, if you get a faster cook later, he’ll just grab faster.
This is very important for overclockers, who make the cooks work harder than the Intel boss does. At least the design of the kitchen won’t slow the chefs down.
AMD plans on at least getting the memory faster by this spring, which helps, but it still remains to be seen whether they can also get latency down.
Integer still counts: The designers of the Athlon kitchen worked most on the weakest part of earlier AMD designs, floating point. Outside of increasing L1 cache, though, they didn’t do much else to the integer oven, which is around the same as that of the Coppermine. Most PC applications use integer math, so that ended up pretty even between the two processors.
Floating point, part one: Waiter! AMD really beefed up floating point on the Athlon. Consider floating point to be like making pasta, and the PII had two pots to the K6’s one. The Athlon now has three pots, so it can theoretically make a lot more pasta at one time than the Coppermine, and if you just watch how much pasta can be made, the Athlon will make more.
However, just as a CPU does not a computer make, nor does a fast chef necessarily mean a quickly served meal. Somebody has to take the meal out to the customers. A lot of floating point intensive programs like 3-D rendering and scientific programs eat in the kitchen, so the Athlon can really dish it out faster than the Coppermine. However, gamers eat in the dining room, and your video card is the waiter. It doesn’t matter if the chef can make three bowls of pasta at a time if the waiter can only carry two of them at a time.
Even a GeForce 256 gets maxed out pretty quickly by either a Coppermine or Athlon. The Athlon can make all the pasta it wants, but that won’t make the waiter any faster. What the Athlon kitchen really needs are waiters with three arms. The question is whether they can find one (maybe his name is NV15) before Intel comes out with its new Willamette kitchen, rumored to have even more pasta pots.
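The chef-and-waiter problem boils down to "the slower of the two sets the pace." Here is a minimal sketch using the bowl counts from the analogy itself; the function name is made up, and the numbers are the analogy’s, not benchmark results.

```python
def bowls_served(chef_bowls, waiter_bowls):
    """Bowls that reach the dining room per trip: whichever of the
    chef or the waiter handles fewer bowls sets the pace."""
    return min(chef_bowls, waiter_bowls)

# Illustrative numbers taken from the analogy, not from benchmarks.
two_pot_chef, three_pot_chef = 2, 3

for waiter_arms in (2, 3):
    print(
        f"waiter carries {waiter_arms}: "
        f"two-pot kitchen serves {bowls_served(two_pot_chef, waiter_arms)}, "
        f"three-pot kitchen serves {bowls_served(three_pot_chef, waiter_arms)}"
    )
```

With a two-armed waiter, both kitchens serve the same number of bowls; the third pot only pays off once the waiter can carry three.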
Floating point, part two: 3DHow? Both Intel and AMD have their own ways to accelerate floating point activity by letting the processor take multiple actions at the same time. Sort of like flash-frying in oil. However, it’s like the Coppermine’s deep fryer makes french fries while AMD’s makes fried zucchini. Fried zucchini can be pretty good, but most people order french fries, just as more people have optimized their applications for SSE than for 3DNow. Not too many (DirectX being a big exception) have done both, so it’s likely that floating point enhancements in an application will benefit the Coppermine owner more than the Athlon owner. However, since most of the menu isn’t deep fried, if you don’t order something that is, quick frying doesn’t help you at all.
Prefetch optimizations and buffering and wider memory path: Helping the helpers. We’ve talked a lot about having food handy for the chef, but the chef’s helpers get it there. If you give the helpers better instructions on what the chef needs when he cooks certain meals, the helpers are more likely to have the right food needed when the chef needs it. That’s all that terms like multiple branch prediction and speculative execution really mean.
Prefetch optimization is another way of doing the same thing. However, in this case, it only happens when the customers ask for it: applications must be compiled with compilers that include these instructions. (They also work just about as well in the Athlon kitchen.) Since these prefetch optimizations only work with new applications compiled on recently released compilers, you get the odd result that programs released six months or a year from now can be expected to run significantly faster on the same processors than current programs, which aren’t helped a bit.
Buffering lets you put more food within arm’s reach of the chef. A wider memory path lets you carry more food at one time to the chef whenever he needs it.
RAMBUS, any bus: Faster fridging. No matter if you use RAMBUS or PC133 or VCSDRAM or something else, the reason why you buy it is to have faster memory. If RAM is the kitchen’s refrigerator, faster RAM (along with motherboards designed to run at higher bus speeds) is like a new refrigerator that lets you get what you want out of it and over to the cook faster.
In contrast, the Athlons are stuck for the moment at memory speeds of 100 MHz or a little more. That will change shortly, but until Athlon L2 memory is designed to run as fast as the fastest expected cook in the family, it will not be as overclockable as the Coppermine.
None of these improvements (outside of maybe prefetch, which helps benchmarks like SPEC95 a lot but may not help as much in real applications; we’ll have to see) helps a ton all by itself. It’s a few percent improvement here, a few there, but taken all together, they make the Coppermine look a good deal better than Daddy Katmai, and make it competitive with the Athlon.
The last and best improvement for overclockers is:
If you can’t stand the heat, get a cooler kitchen: Cooking is hot work, and when it gets too hot, the chef stops. If it gets too hot too often, the cook gets brain damage and isn’t good for much of anything. The .25 micron Katmai was like working over the burners of a big, hot old oven. The chef could only work so fast before he got too hot.
The Coppermine is made with a .18 micron process. This is like buying a new, energy-efficient stove that needs less power and generates less heat. That way, the cook can work a lot faster without getting too hot. The new notched circuits take this idea even further.
Intel’s kitchens are always set up to accommodate a number of different chefs. Some are rated to work faster than others. Since they all come from the same cooking school, even those not rated as the quickest chefs can often work as fast as the quickest chefs, provided you give them a little help. In most cases, you can get a slow chef to work as fast as a quick one if you give him better air conditioning than Intel gives him. Others need a little more juice than Intel shells out to work faster, but, like real chefs, not too much; otherwise, they stop working altogether.
In the old PII and Katmai kitchens (and in the current Athlon kitchens), the L2 food tables (cache memories) were designed for each and every chef based on how fast he was supposed to work, so even chefs that could work quicker usually couldn’t go a whole lot faster. In the Celeron kitchens, the L2 food tables were designed to go as fast as the fastest cook, to save money on constantly redoing the kitchen for faster chefs. The Coppermine kitchens follow the Celeron pattern.
While Intel’s having some problems with finishing up the Coppermine kitchen, even now the L2 food tables can handle pretty fast chefs. Maybe not as fast as the chefs that will be around in six to nine months, but pretty fast, and a lot faster than what the slower chefs are being asked to do by Intel. Just like the slower Celeron chefs, the slower Coppermine cooks look like they can work at least as fast as the quickest ones Intel has certified so far.
The way Intel overclockers get their cooks to work faster is to increase the bus speed, which is like telling the cook to cook more meals an hour. With Celerons, overclocking was like telling the cook to cook 100 or more meals an hour rather than the standard 66. Coppermines come in 100-meal-an-hour and 133-meal-an-hour types. Right now, the most you can expect out of any cook is about 150 meals an hour, so you obviously have a lot more room for improvement with the 100 MAH Coppermine than with the 133. Since the slowest Coppermines are well below the speeds of what will end up being medium-speed Coppermines, these are under-achieving cooks that can do more work than Intel will let them.
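The arithmetic behind "more room with the 100 than the 133" can be sketched as follows. It assumes the chip’s speed is just bus speed times a fixed multiplier; the multipliers of 5.0 and 4.0 are illustrative assumptions, and the 150 figure is the rough ceiling mentioned above, not a guarantee for any particular chip or motherboard.

```python
def core_mhz(bus_mhz, multiplier):
    """Processor speed, assuming it is simply bus speed times a fixed multiplier."""
    return bus_mhz * multiplier

def headroom_percent(stock_bus_mhz, max_bus_mhz):
    """How much faster the bus (and so the chip) runs at max_bus_mhz than at stock."""
    return (max_bus_mhz / stock_bus_mhz - 1) * 100

MAX_BUS_MHZ = 150  # rough ceiling from the text above; an assumption, not a spec

# Multipliers are illustrative assumptions, not taken from the article.
for stock_bus, multiplier in ((100, 5.0), (133, 4.0)):
    stock = core_mhz(stock_bus, multiplier)
    pushed = core_mhz(MAX_BUS_MHZ, multiplier)
    gain = headroom_percent(stock_bus, MAX_BUS_MHZ)
    print(f"{stock_bus} MHz bus x {multiplier}: {stock:.0f} -> {pushed:.0f} MHz (+{gain:.1f}%)")
```

Since the multiplier cancels out, the headroom depends only on how far the stock bus speed sits below the ceiling, which is why the 100 part has so much more room than the 133.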
Some of the overclockers have had some problems getting the new cooks to work up to their potential, but I think that’s because they’re trying to get the cooks to work using the older, slower BX refrigerators and AGP waiters. The refrigerator isn’t too bad, but the waiters just go on strike when you push them too much. Union rules, you know. Under the new Apollo and Camino union contract, you don’t stress out the waiters but still can get the food out.
But there’s hope even for older restaurants. Next spring and summer, Coppermine Junior kitchen cooks will be coming out, and at least originally, their speed limit will be 66 meals an hour, so pushing them to 100 MHz should be just like overclocking the Celeron chefs. While they won’t work quite as well as the seniors, they also won’t ask as much from the refrigerator or waiters, and the old ones should be able to handle it with a little slotket remodeling.
The Athlon won’t stand still, either. Many of the improvements made to the Coppermine are already planned for later versions of the Athlon. But no matter what the improvement is, it all comes down to making that cook work faster and more efficiently.