Benchmarks react not just to latency but to overall memory performance, which is a balance between high bandwidth and low latency. In synthetic bandwidth tests this shows up most closely as the memory copy result (not directly, but it is a reasonable indicator of memory performance in general). This is what AIDA64 points out in its latest versions (after the test, there is a question mark in a circle above the read/write/copy results).
Another way to check it is to run 'winsat mem' from a command prompt. The result shows how memory "speed" translates into performance in the Windows environment.
A high memory clock raises bandwidth, and tight timings lower latency. A balance between a high clock and tight timings always gives the best results. On Intel, tight timings count for much more than on modern AMD.
On quad-channel platforms, bandwidth is already high and the memory clock is usually limited, so there is a bigger performance gain from tight timings than from pushing memory frequency to the limit.
Most benchmarks used in competitive benchmarking (at least those that give good points) don't react very strongly to high memory performance. You can get about 50 points more in Cinebench R11/15/20. 3DMark tests react well in some configurations (physics tests). In the mentioned SuperPi 32M, memory is really important, and it's probably the only benchmark where it is really worth spending a lot of time on memory tweaking. x265 benchmarks get slightly better results too, but the gain is not as significant.
Some more examples:
On Intel Z3xx, 4000 CL12-12-12 2N gives similar results to 4800-4900 18-18-18 2N in most benchmarks. On X299, 3600 12-12-12/13-13-13 seems optimal, and it will be hard to pass 3733 at such tight timings. On AMD, 3600 CL14 and below is optimal; 3800-4000 is, of course, better but usually not possible. The next step on Ryzen 3000 seems to be 5000+, as 4700-4800 is a bit slower or a bit faster depending on the test.
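To put rough numbers on the clock vs. timings trade-off, here is a small Python sketch that compares some of the settings mentioned above. It only uses the standard first-word CAS latency formula (CL cycles times the clock period) and theoretical peak bandwidth for 64-bit channels. The function names are made up for illustration, and real results also depend on the secondary/tertiary timings and the memory controller, so treat it as a back-of-the-envelope estimate, not a prediction of benchmark scores.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the clock period in ns
    is 2000 / data rate (MT/s).
    """
    return cas_cycles * 2000.0 / data_rate_mts


def peak_bandwidth_gbs(data_rate_mts: float, channels: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s, assuming 64-bit (8-byte) channels."""
    return data_rate_mts * 8 * channels / 1000.0


if __name__ == "__main__":
    # (label, data rate in MT/s, CAS latency in cycles, channel count)
    # The channel counts are assumptions: dual channel on mainstream
    # platforms, quad channel on X299.
    settings = [
        ("Z3xx 4000 CL12", 4000, 12, 2),
        ("Z3xx 4800 CL18", 4800, 18, 2),
        ("X299 3600 CL12", 3600, 12, 4),
        ("AMD  3600 CL14", 3600, 14, 2),
    ]
    for label, rate, cl, channels in settings:
        latency = cas_latency_ns(rate, cl)
        bandwidth = peak_bandwidth_gbs(rate, channels)
        print(f"{label}: {latency:.2f} ns CAS, {bandwidth:.1f} GB/s peak")
```

By this estimate 4000 CL12 has the lower CAS latency (6.0 ns vs 7.5 ns) while 4800 CL18 has the higher peak bandwidth (76.8 GB/s vs 64.0 GB/s in dual channel), which fits with the two ending up close in most benchmarks.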