AMD has a large number of shader units, but if I understand it right, they rely partly on software to really show their potential. With the wrong software, some of that shader power goes to waste because of the design.
NVIDIA's approach is a little different... they have fewer of what a person might call shader units, and those units (at least on most of the modern cards) have their own clock, which runs faster than the core clock.
When you compare 128 NVIDIA shaders to 800 AMD shaders, I think each NVIDIA shader works out to be more powerful: it has a higher clock speed and probably higher efficiency. It's still possible for the 800 shaders to deliver more power than the 128 NVIDIA shaders (and they probably would in the right shader-intensive programs), since 800 shaders leave a lot of room for brute force.
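
To put rough numbers on that comparison, here's a quick back-of-the-envelope sketch in Python. It treats throughput as shader count x clock x the fraction of shaders the software actually keeps busy. The clock speeds and utilization figures are made up for illustration (they are not the specs or measured results of any particular card), and `effective_throughput` is just a hypothetical helper name.

```python
def effective_throughput(shaders, clock_mhz, utilization):
    """Very rough relative throughput.

    shaders     -- number of shader units
    clock_mhz   -- clock the shaders run at
    utilization -- fraction of shaders kept busy by the software (0.0 to 1.0)
    """
    return shaders * clock_mhz * utilization


# All clocks and utilization values below are illustrative assumptions.
# Fewer shaders, faster shader clock, high utilization.
nvidia = effective_throughput(shaders=128, clock_mhz=1500, utilization=0.9)

# Many shaders, slower clock, software that fits the design badly.
amd_poor_fit = effective_throughput(shaders=800, clock_mhz=750, utilization=0.2)

# Same hardware, but a shader-heavy program that keeps most units busy.
amd_good_fit = effective_throughput(shaders=800, clock_mhz=750, utilization=0.8)

# Utilization the 800-shader part needs to match the 128-shader part
# under these assumed clocks.
break_even = nvidia / (800 * 750)

print("128 shaders:", nvidia)
print("800 shaders, poor fit:", amd_poor_fit)
print("800 shaders, good fit:", amd_good_fit)
print("break-even utilization:", round(break_even, 3))
```

With these made-up inputs, the 800-shader card lands below the 128-shader card when the software fits badly and well above it when most units are kept busy, which is the same "depends on the program" point as above. Real cards differ in plenty of other ways (memory bandwidth, texture units, drivers), so take this as a way of thinking about it rather than a benchmark.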