As already covered, the "bitness" of a CPU can refer separately to its addressable RAM range and/or its native data sizes. I think there are some edge cases where 128-bit data handling might be useful, but they're niche enough that I'm guessing the silicon cost makes it uneconomic. Look at consumer-grade GPUs: they've had crippled FP64 performance for a long time now, since it largely isn't needed outside of specific use cases, and the vendors are happy to sell you a more expensive product that isn't crippled. FP32 performance, OTOH, keeps increasing, and with ML workloads the trend is towards smaller data sizes, not bigger ones. Basically, find a killer app for the masses that needs 128-bit data sizes and they'll make the CPU support happen. Don't hold your breath.
In my niche interest of prime number finding, even 64-bit is too "small" to be useful on its own, and people much smarter than I have cobbled together multi-word arithmetic that works as efficiently as possible on numbers as big as you like. I have long wondered what a hypothetical "native" large-data-type CPU would look like. Never mind 128-bit, think megabit data sizes. In part I think this won't happen because doing it in conventional ways simply wouldn't scale well. That is, one 128-bit instruction may take a larger transistor count than two 64-bit instructions. Look at long multiplication: multiply two 2-digit numbers together and that's 4 sub-multiplications plus some adds; multiply two 4-digit numbers and that's 16 sub-multiplications and a bunch of adds. Complexity goes up with the square of the data size. There are optimisations in there (Karatsuba-style splitting and FFT-based multiplication bring the cost below quadratic for large numbers), but it would probably still hurt to implement directly in hardware.
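To give a feel for that quadratic growth, here's a minimal C sketch of the schoolbook method (a toy illustration, not how a real library such as GMP is laid out), multiplying two big numbers stored as arrays of 32-bit limbs. The nested loop is exactly the n × n grid of sub-multiplies described above.

```c
#include <stdint.h>
#include <stdio.h>

/* Schoolbook ("long") multiplication of big numbers stored as arrays
 * of 32-bit limbs, least significant limb first. An n-limb by n-limb
 * multiply does n*n partial products: the quadratic cost in question.
 * result must have room for an + bn limbs. */
static void bigmul(const uint32_t *a, size_t an,
                   const uint32_t *b, size_t bn,
                   uint32_t *result)
{
    for (size_t i = 0; i < an + bn; i++)
        result[i] = 0;

    for (size_t i = 0; i < an; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < bn; j++) {
            /* one "sub multiply" plus the adds it drags along */
            uint64_t t = (uint64_t)a[i] * b[j] + result[i + j] + carry;
            result[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        result[i + bn] = (uint32_t)carry;
    }
}

int main(void)
{
    uint32_t a[2] = { 0xFFFFFFFFu, 0xFFFFFFFFu };  /* 2^64 - 1 */
    uint32_t b[2] = { 0x00000001u, 0x00000001u };  /* 2^32 + 1 */
    uint32_t r[4];
    bigmul(a, 2, b, 2, r);
    printf("%08X%08X%08X%08X\n", r[3], r[2], r[1], r[0]);
    return 0;
}
```

Double the number of limbs and the inner work quadruples, which is why software big-number code leans so hard on smarter algorithms once the operands get large.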
There's another way to look at it: we may already be beyond 128-bit, in a way, through SIMD, i.e. applying one instruction to multiple chunks of data at the same time. For example, with a good implementation of AVX2 (Intel since Haswell, AMD since Zen 2), you can process 8 FP64 values per cycle: each 256-bit register holds 4 FP64 lanes, and the core can issue two such operations per cycle. A two-unit AVX-512 implementation doubles that again.
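As a rough illustration (a toy sketch, not tuned code), here's what one of those 256-bit operations looks like with compiler intrinsics in C: a single instruction multiplying 4 FP64 lanes. The per-cycle figures above come from a core issuing two of these at once.

```c
#include <immintrin.h>
#include <stdio.h>

/* One 256-bit multiply = 4 FP64 lanes in a single instruction.
 * Build with something like: gcc -mavx2 simd_demo.c */
int main(void)
{
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 10.0, 20.0, 30.0, 40.0 };
    double c[4];

    __m256d va = _mm256_loadu_pd(a);     /* load 4 doubles */
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vc = _mm256_mul_pd(va, vb);  /* 4 multiplies at once */
    _mm256_storeu_pd(c, vc);

    printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```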