More and more HPC systems are adding GPUs, since GPUs are much faster for some workloads. Those applications need a low-latency, high-bandwidth connection between GPU and CPU, so trading PCI-E lanes for a wider memory bus wouldn't be a good idea.
As for why CPUs don't have higher memory bandwidth in general: as others have said, it's probably just because memory bandwidth usually isn't the bottleneck.
A 1TB dataset isn't really that big. A modern Xeon can already stream close to 100GB/s, so a full pass over 1TB takes on the order of 10 seconds. Unless the whole task finishes in less than a few minutes, memory access time is negligible.
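Just to put that in code (back-of-envelope only; the 100GB/s figure and the pass count are assumptions, not measurements):

    /* Back-of-envelope sketch: time for streaming passes over a dataset
     * at a given sustained memory bandwidth. Numbers are assumed. */
    #include <stdio.h>

    int main(void)
    {
        double dataset_bytes = 1e12;    /* the 1TB dataset from above */
        double sustained_bw  = 100e9;   /* ~100GB/s per socket, as assumed above */
        double passes        = 10.0;    /* hypothetical: algorithm touches the data 10 times */
        printf("one pass:   ~%.0f s\n", dataset_bytes / sustained_bw);
        printf("%.0f passes: ~%.0f s\n", passes, passes * dataset_bytes / sustained_bw);
        return 0;
    }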
Also, there's only so much bandwidth a single core/thread can use, even if the IMC had infinite bandwidth. The charts here summarize it nicely -
http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-haswell-ep-cores-/10
At 8 cores, total bandwidth is limited by how much each thread can consume, which is why enabling hyperthreading increases it. At 14 or 18 cores, the IMC becomes the bottleneck instead.
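If you want to see this on your own box, a rough OpenMP triad in the spirit of STREAM (not the official benchmark; array size and trial count are arbitrary choices) makes the scaling obvious when you vary OMP_NUM_THREADS:

    /*
     * Rough bandwidth sketch in the spirit of STREAM triad (NOT the official
     * benchmark). Run with different OMP_NUM_THREADS to see how aggregate
     * bandwidth scales with thread count until the memory controller saturates.
     * Keep the arrays several times larger than L3 so you measure DRAM, not cache.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N (1 << 27)   /* 128M doubles = 1 GiB per array */
    #define NTRIES 10

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        if (!a || !b || !c) { perror("malloc"); return 1; }

        /* First-touch init in parallel so pages land near the threads using them */
        #pragma omp parallel for
        for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

        double best = 0.0;
        for (int t = 0; t < NTRIES; t++) {
            double start = omp_get_wtime();
            #pragma omp parallel for
            for (long i = 0; i < N; i++)
                c[i] = a[i] + 3.0 * b[i];   /* 2 reads + 1 write per element */
            double secs = omp_get_wtime() - start;
            double gbs = 3.0 * N * sizeof(double) / secs / 1e9;
            if (gbs > best) best = gbs;
        }
        printf("threads=%d  best triad bandwidth ~ %.1f GB/s\n",
               omp_get_max_threads(), best);

        free(a); free(b); free(c);
        return 0;
    }

Compile with something like gcc -O3 -fopenmp. On most machines a handful of threads already gets you close to the socket's peak, which is the same effect those charts show.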
However, no real-life workload will be totally memory-bound like those benchmarks.
There are also many programming tricks you can use to reduce memory traffic. Almost any algorithm can be restructured so that it mostly works on data that's already sitting in L3 - see the sketch below.
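The classic example is cache blocking (loop tiling). A rough sketch (the block size is a guess; tune it to your caches):

    /*
     * Cache-blocking (loop tiling) sketch: matrix multiply restructured so that
     * each BLOCK x BLOCK tile of the operands is reused while it is still in
     * cache, instead of being streamed from DRAM again for every row.
     * BLOCK = 64 is an arbitrary guess; tune it to your cache sizes.
     */
    #include <stddef.h>

    #define BLOCK 64

    /* C += A * B for n x n row-major matrices; n assumed a multiple of BLOCK */
    void matmul_blocked(size_t n, const double *A, const double *B, double *C)
    {
        for (size_t ii = 0; ii < n; ii += BLOCK)
            for (size_t kk = 0; kk < n; kk += BLOCK)
                for (size_t jj = 0; jj < n; jj += BLOCK)
                    /* one tile's worth of work: operands stay cache-resident */
                    for (size_t i = ii; i < ii + BLOCK; i++)
                        for (size_t k = kk; k < kk + BLOCK; k++) {
                            double aik = A[i * n + k];
                            for (size_t j = jj; j < jj + BLOCK; j++)
                                C[i * n + j] += aik * B[k * n + j];
                        }
    }

Each tile of B gets reused for a whole block of rows while it's still hot in cache instead of being re-fetched from DRAM for every row, so B's DRAM traffic drops by roughly a factor of BLOCK (assuming the tiles actually fit in cache).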
Then there's also the question of whether DRAM bandwidth is actually limited by the bus, or by the speed of the DRAM chips themselves.