APUs are monolithic but also have significantly less cache. I have no idea how the 5000 APUs behave, but the 4000 series can run a higher max IF/IMC clock, while having less cache and a fairly low clock, which makes them slower than Ryzen 3000. Keep in mind that a large, fast cache partly covers the delays caused by the IMC/RAM. This is why we don't see much of a difference from RAM timings on Ryzen, and in most cases something like CL14 is barely faster than CL18 (regardless of what people are saying around the forums).
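To put rough numbers on the CL14 vs CL18 point, here is a quick sketch that converts CAS latency to nanoseconds. The DDR4-3600 speed is my own assumption for illustration (no speed is given above), and `cas_latency_ns` is just a made-up helper name. It only covers the CAS part, not the full trip to RAM.

```python
def cas_latency_ns(cl: int, data_rate_mts: float) -> float:
    """CAS delay in ns: CL cycles at the memory clock (half the MT/s rate)."""
    memory_clock_mhz = data_rate_mts / 2  # DDR moves data twice per clock
    return cl / memory_clock_mhz * 1000


# Assumed example speed: DDR4-3600 (not taken from the thread)
for cl in (14, 18):
    print(f"DDR4-3600 CL{cl}: {cas_latency_ns(cl, 3600):.2f} ns")
```

At the same speed the gap between the two is only a couple of ns, while total memory latency on these platforms is usually measured in tens of ns, so it is a small slice even before the cache hides part of it.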
Would a smaller cache, that is likely designed the same apart from capacity, allow higher clocks? I'm not sure about that. Bigger caches may incur more latency as in effect there is a bigger search area to find that data. Does that directly affect clock? Hard to say without knowing more about the implementation.
Caches do mitigate the performance hit due to hitting ram, but it only goes so far. You will still have to go to ram at some point. To me it is more about latency masking, given you can pre-fetch data. As such bandwidth still matters far more, although to me dual rank also helps a lot.
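A toy way to look at "it only goes so far" is the usual average access time estimate. This is only a sketch of a one-level model: the hit rates and latencies are illustrative assumptions, not measurements, and `average_access_ns` is a hypothetical helper. It ignores prefetching and bandwidth entirely.

```python
def average_access_ns(hit_rate: float, cache_ns: float, ram_ns: float) -> float:
    """One-level model: every access pays cache_ns, misses also pay ram_ns."""
    return cache_ns + (1.0 - hit_rate) * ram_ns


# All numbers below are illustrative assumptions, not measurements.
CACHE_NS = 10.0
for hit_rate in (0.80, 0.95):
    slower_ram = average_access_ns(hit_rate, CACHE_NS, ram_ns=75.0)
    faster_ram = average_access_ns(hit_rate, CACHE_NS, ram_ns=70.0)
    print(f"hit rate {hit_rate:.0%}: {slower_ram:.2f} ns vs {faster_ram:.2f} ns")
```

The higher the hit rate, the less a few ns of RAM latency shows up in the average, but the miss term never drops to zero, so you still pay for RAM eventually.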
Even for integrated graphics, that high IF/memory clock isn't helping as much as expected. I was expecting much better results going from standard DDR4-3200 to DDR4-4533 at 1:1 or DDR4-5200+ at 1:2, but the gain is barely visible. Even if there is a 10-15% performance gain, these APUs are slow in games, so it's hard to see. That's about ~40GB/s vs ~65-72GB/s of memory bandwidth at about the same latency. I highly doubt that anyone would spend money on a top memory series just so integrated graphics would run a bit faster.
The balance between GPU cores and ram bandwidth is likely chosen to match them. There would be little value in putting in more cores if they can't be well fed. That might explain why going much faster in ram doesn't provide as much benefit as expected: it will still be core limited. If you know you will have more bandwidth with future ram technologies, then you can scale up the cores more to balance with that.
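The core-limited idea can be sketched with a roofline-style bound, where throughput is capped by whichever of compute or memory traffic runs out first. The iGPU peak and the FLOPs-per-byte figure here are hypothetical numbers picked for illustration, not specs of any real APU, and `attainable_gflops` is a made-up helper.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline-style bound: the lower of the compute and memory limits."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical iGPU and workload, for illustration only.
PEAK_GFLOPS = 1800.0
FLOPS_PER_BYTE = 40.0
for bandwidth in (40.0, 70.0, 100.0):
    result = attainable_gflops(PEAK_GFLOPS, bandwidth, FLOPS_PER_BYTE)
    print(f"{bandwidth:.0f} GB/s -> {result:.0f} GFLOPS attainable")
```

With these made-up numbers the chip is bandwidth bound at 40 GB/s, but anything past the crossover point buys nothing until the cores are scaled up too.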
Hypothetically, dual channel DDR5-6400 would give about 100GB/s of bandwidth, which roughly equals the memory bandwidth of a GTX 1050. If a future APU can reach that performance level it would be barely good enough for demanding titles at 1080p.
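The ~100GB/s figure is just theoretical peak arithmetic: transfer rate times bus width. A minimal sketch, assuming a dual channel setup with a 128-bit total bus, and a GTX 1050 with 7 Gbps GDDR5 on a 128-bit bus (going by the spec sheet as I remember it). `peak_bandwidth_gbs` is a hypothetical helper, and real measured bandwidth will come in lower.

```python
def peak_bandwidth_gbs(data_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak in GB/s: transfers per second times bytes per transfer."""
    return data_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9


# (data rate in MT/s, total bus width in bits); bus widths are assumptions
configs = {
    "DDR4-3200 dual channel": (3200, 128),
    "DDR5-6400 dual channel": (6400, 128),
    "GTX 1050 GDDR5": (7000, 128),
}
for name, (rate, width) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

Keep in mind the APU has to share that bandwidth with the CPU cores, while the discrete card has its memory to itself.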
I agree, it doesn't really make sense to pair top end ram with an APU just for gaming, but we will see an uplift in the baseline when the time comes. I hope 1st gen DDR5 CPUs will support 6400 up front, even if a more affordable speed in 4xxx will be baseline.
I don't think the next Ryzen generation will use the same IF design. There have to be further improvements to cache and access time. In theory the internal bandwidth is high right now, but there are differences between single and dual CCX CPUs. In single CCX chips, there is a dual link for memory reads but a single link for writes. Still, memory read bandwidth is higher on dual CCX CPUs, and in some configurations they can reach the same bandwidth at a lower memory/IF/IMC clock. Clearly it's affected by something else, but I haven't dug into the detailed specs, so I'm not sure what is causing that.
I think you mean CCDs. Anyway, that's the open question. What could/would AMD do for DDR5? Current IF will choke on that much data. Intel has historically run async in three domains (cores, L3 cache, ram) with good performance. I wonder if AMD will have to break with their sync-everything thinking. Since Zen 2 they have had the ability to run async, and even if it is best not to for now, it could be a stepping stone to something down the line.