128-bit CPUs

SPL Tech

Member
Joined
Nov 28, 2006
Why are we still on 64-bit CPUs and programs? That's so like 2005. When is 128-bit processing coming out? I feel like this is long overdue.
 
I forget what the reason was for the 32 -> 64 bit switch, but I think the memory limit was a big one: a 32-bit address covers 4 GB, of which only ~3.5 GB was typically usable on a desktop OS. 64-bit chips - if my quick search is right - can address 16 exabytes. I think the most memory I've ever needed to use was ~1.5 TB, and that was a very special case on a very specialist system.
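For anyone who wants to check those numbers, here is a quick sketch in Python. It assumes a flat, byte-addressed memory model; the helper name `addressable_bytes` is made up for illustration, and the 1.5 TB figure is treated as decimal terabytes.

```python
# Rough address-space arithmetic, assuming a flat byte-addressed memory model.
GIB = 2**30  # bytes in one gibibyte
EIB = 2**60  # bytes in one exbibyte


def addressable_bytes(bits):
    """Bytes reachable with a pointer that is `bits` wide (hypothetical helper)."""
    return 2**bits


print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")
print(addressable_bytes(64) // EIB, "EiB with 64-bit addresses")

# How many 1.5 TB working sets (decimal TB, an assumption) fit in 64 bits of address space.
big_job = 1_500_000_000_000
print(addressable_bytes(64) // big_job, "times the 1.5 TB job")
```

Even that 1.5 TB special case is millions of times smaller than what a 64-bit address can reach, which is why address range alone doesn't push anyone toward 128 bits.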

So my question in return would be: what can't you do with a 64 bit CPU that you need a 128 bit one to do?
 
There are no technical barriers AFAIK, but there just isn't a need or a point. IIRC, you'd need HUGE cache sizes to handle the data, and the chips would be larger and draw more power. With that, pricing would be sky high as well. There's no OS out that can take advantage of it. Hell, even a C-level exec at ARM stated they had no plans for 128-bit CPUs (2019).

Why do you feel it is long overdue? With what was said above (and doing a bit more research) it doesn't seem like there are many benefits(?).
 
As already covered, the bit-ness of a CPU can refer separately to the addressable RAM range and/or to data sizes. I think there are some edge cases where it might be useful for data handling, but they're niche enough that I'm guessing the silicon cost makes it uneconomic. Look at consumer grade GPUs: they've had crippled FP64 performance for a long time now, since it largely isn't needed outside of specific use cases, where they're happy to sell you a more expensive product that isn't crippled. FP32 perf OTOH is ever increasing, and with ML stuff, we're seeing smaller data sizes being the trend, not bigger. Basically, find a killer app for the masses that needs 128-bit data sizes, then they'll make the CPU support for it happen. Don't hold your breath.

In my niche interest of prime number finding, even 64-bit is too "small" to be useful in itself, and some people much smarter than I am are able to cobble 64-bit operations together to work as efficiently as possible on as big a number as you like. I have long wondered what a hypothetical "native" large data type CPU would look like. Never mind 128-bit, think megabit data sizes. In part I think this wouldn't happen because doing it in conventional ways would simply not scale well. That is, one 128-bit instruction may take more transistors than two 64-bit instructions. Look at long multiplication. If you multiply two 2-digit numbers together, that's 4 sub-multiply operations and some adds thrown in. Multiply two 4-digit numbers together, and that's 16 sub-multiplies and a bunch of adds. Complexity goes up as the square of the data size. Maybe there are some optimisations in there, but it will probably still hurt to implement.
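To put rough numbers on the long multiplication point, here is a minimal Python sketch of schoolbook multiplication over 64-bit chunks ("limbs"). The function names are hypothetical, and Python's built-in big integers are used to add up the partial products, so this only illustrates the operation count, not how hardware or real prime-hunting software does it.

```python
def to_limbs(n, limb_bits=64):
    """Split a non-negative integer into limb_bits-wide chunks, lowest first."""
    mask = (1 << limb_bits) - 1
    limbs = []
    while n:
        limbs.append(n & mask)
        n >>= limb_bits
    return limbs or [0]


def schoolbook_mul(a, b, limb_bits=64):
    """Multiply a and b limb by limb; return (product, number of sub-multiplies)."""
    x = to_limbs(a, limb_bits)
    y = to_limbs(b, limb_bits)
    total = 0
    sub_multiplies = 0
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            # Each limb pair is one 64x64-bit multiply, shifted into position.
            total += (xi * yj) << (limb_bits * (i + j))
            sub_multiplies += 1
    return total, sub_multiplies


a128, b128 = 2**127 + 12345, 2**127 + 6789   # two limbs each
a256, b256 = 2**255 + 12345, 2**255 + 6789   # four limbs each

product, count = schoolbook_mul(a128, b128)
print("128-bit operands:", count, "sub-multiplies, correct:", product == a128 * b128)

product, count = schoolbook_mul(a256, b256)
print("256-bit operands:", count, "sub-multiplies, correct:", product == a256 * b256)
```

Doubling the operand width quadruples the sub-multiply count in this scheme. Sub-quadratic methods such as Karatsuba do exist, but they trade the multiplies for extra adds and bookkeeping.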

There's another way to look at it, and we may be beyond 128-bit in a way through SIMD. That is, apply an instruction to multiple data chunks at the same time. For example, an AVX2 register is 256 bits wide and holds 4 FP64 values, so with a good implementation of AVX2 that has two vector units (Intel since Haswell, AMD since Zen 2), you can process 8 FP64 values at the same time. Two-unit AVX-512 implementations double that again.
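As a back-of-the-envelope sketch of the SIMD arithmetic, the Python below counts FP64 values per register and per cycle. It assumes an idealised peak where every vector unit issues one instruction per cycle; the function names and the unit counts passed in are illustrative assumptions, not measurements.

```python
def lanes(register_bits, element_bits=64):
    """How many elements of element_bits fit in one vector register."""
    return register_bits // element_bits


def elements_per_cycle(register_bits, units, element_bits=64):
    """Idealised peak: every vector unit handles one full register per cycle."""
    return lanes(register_bits, element_bits) * units


# Register widths are 256 bits for AVX2 and 512 bits for AVX-512.
print("AVX2, two units:    ", elements_per_cycle(256, units=2), "FP64 per cycle")
print("AVX-512, two units: ", elements_per_cycle(512, units=2), "FP64 per cycle")
print("AVX-512, one unit:  ", elements_per_cycle(512, units=1), "FP64 per cycle")
```

On these assumptions a single-unit AVX-512 part lands on the same peak as a two-unit AVX2 part, so the wider registers only pay off in throughput when the second unit is there.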
 
Good point - some computational chemistry codes benefit a lot from AVX2. Not sure if any of them have implemented AVX-512 yet.
 
I'm not a programmer, so I don't know how much effort it is to support AVX-512 if you already support AVX2. To me it seems an easy win, with limitations. The biggest is that the number of CPUs in the wild with two-unit AVX-512 implementations is very low. Excluding server stuff, that's Skylake-X and Cascade Lake-X. Unfortunately mobile implementations of AVX-512 are single-unit, so they don't offer a throughput benefit in this use case. Note AVX-512 is a family of instructions, so it may still provide a benefit in other areas.
 