As already covered, the "bitness" of a CPU can refer separately to its addressable RAM range and/or its native data sizes. I think there are some edge cases where 128-bit data handling might be useful, but they're niche enough that I'm guessing the silicon cost makes it uneconomic. Look at consumer-grade GPUs: they've had crippled FP64 performance for a long time now, since it largely isn't needed outside of specific use cases, and the vendors are happy to sell you a more expensive product that isn't crippled. FP32 performance, OTOH, keeps increasing, and with ML workloads the trend is towards smaller data sizes, not bigger ones. Basically, find a killer app for the masses that needs 128-bit data sizes and they'll make the CPU support happen. Don't hold your breath.
In my niche interest of prime number finding, even 64-bit is too "small" to be useful on its own, and people much smarter than I have cobbled together multi-word arithmetic that works as efficiently as possible on numbers as big as you like. I have long wondered what a hypothetical "native" large-data-type CPU would look like. Never mind 128-bit, think megabit data sizes. In part I think this won't happen because doing it in conventional ways simply wouldn't scale well. That is, one 128-bit instruction may take a larger transistor count than two 64-bit instructions. Look at long multiplication: multiply two 2-digit numbers together and that's 4 sub-multiplications plus some adds; multiply two 4-digit numbers and that's 16 sub-multiplications and a bunch of adds. Complexity goes up with the square of the data size. There are optimisations in there (Karatsuba-style splitting and FFT-based multiplication bring the cost below quadratic for large numbers), but it would probably still hurt to implement directly in hardware.
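To give a feel for that quadratic growth, here's a minimal C sketch of the schoolbook method (a toy illustration, not how a real library such as GMP is laid out), multiplying two big numbers stored as arrays of 32-bit limbs. The nested loop is exactly the n × n grid of sub-multiplies described above.

```c
#include <stdint.h>
#include <stdio.h>

/* Schoolbook ("long") multiplication of big numbers stored as arrays
 * of 32-bit limbs, least significant limb first. An n-limb by n-limb
 * multiply does n*n partial products: the quadratic cost in question.
 * result must have room for an + bn limbs. */
static void bigmul(const uint32_t *a, size_t an,
                   const uint32_t *b, size_t bn,
                   uint32_t *result)
{
    for (size_t i = 0; i < an + bn; i++)
        result[i] = 0;

    for (size_t i = 0; i < an; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < bn; j++) {
            /* one "sub multiply" plus the adds it drags along */
            uint64_t t = (uint64_t)a[i] * b[j] + result[i + j] + carry;
            result[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        result[i + bn] = (uint32_t)carry;
    }
}

int main(void)
{
    uint32_t a[2] = { 0xFFFFFFFFu, 0xFFFFFFFFu };  /* 2^64 - 1 */
    uint32_t b[2] = { 0x00000001u, 0x00000001u };  /* 2^32 + 1 */
    uint32_t r[4];
    bigmul(a, 2, b, 2, r);
    printf("%08X%08X%08X%08X\n", r[3], r[2], r[1], r[0]);
    return 0;
}
```

Double the number of limbs and the inner work quadruples, which is why software big-number code leans so hard on smarter algorithms once the operands get large.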
There's another way to look at it: we may already be beyond 128-bit, in a way, through SIMD, i.e. applying one instruction to multiple chunks of data at the same time. For example, with a good implementation of AVX2 (Intel since Haswell, AMD since Zen 2), you can process 8 FP64 values per cycle: each 256-bit register holds 4 FP64 lanes, and the core can issue two such operations per cycle. A two-unit AVX-512 implementation doubles that again.
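As a rough illustration (a toy sketch, not tuned code), here's what one of those 256-bit operations looks like with compiler intrinsics in C: a single instruction multiplying 4 FP64 lanes. The per-cycle figures above come from a core issuing two of these at once.

```c
#include <immintrin.h>
#include <stdio.h>

/* One 256-bit multiply = 4 FP64 lanes in a single instruction.
 * Build with something like: gcc -mavx2 simd_demo.c */
int main(void)
{
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 10.0, 20.0, 30.0, 40.0 };
    double c[4];

    __m256d va = _mm256_loadu_pd(a);     /* load 4 doubles */
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vc = _mm256_mul_pd(va, vb);  /* 4 multiplies at once */
    _mm256_storeu_pd(c, vc);

    printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```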