I'm doing some multi-threaded prime-finding tests now. These use software called LLR, which is built on the same math library as Prime95. The work is PrimeGrid PSP, which is currently running at the 1280k FFT size.
On the first 3 systems I ran 4x single-threaded tasks and compared that to one task using 4 threads.
i5-4570S stock saw a 4.06x speedup, so a small throughput bonus over 4x single threads. Even a small gain is worth having, as a shorter time to return work increases your chance of being the initial prime finder.
i5-5675C OC 3.5GHz saw a 3.76x speedup, so actually a loss in throughput here. This is probably an exception due to its large L4 cache: the single threads were probably not RAM limited to begin with, so we just lose a little to MT overhead.
i7-6700k OC 4.2GHz saw a 4.67x speedup. Really nice! Further testing needs to be done with a 6600k to see whether the L3 cache plays a significant role in this.
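The throughput comparisons above come down to dividing the measured speedup by the thread count: anything above 1.0 means one 4-thread task beats 4x single-threaded tasks. A quick sketch of that arithmetic using the speedups quoted above. The function and variable names are just made up for illustration, and it assumes the speedup is measured against a single-threaded task running alongside three others.

```python
# Speedup of one 4-thread task relative to a single-threaded task,
# taken from the results quoted above.
THREADS = 4
speedups = {
    "i5-4570S": 4.06,
    "i5-5675C": 3.76,
    "i7-6700k": 4.67,
}

def throughput_ratio(speedup, threads):
    """Throughput of one multi-threaded task relative to `threads`
    single-threaded tasks run at once (1.0 is break-even)."""
    return speedup / threads

for cpu, s in speedups.items():
    r = throughput_ratio(s, THREADS)
    print(f"{cpu}: {r:.3f}x throughput ({(r - 1) * 100:+.1f}%)")
```

That works out to roughly +1.5%, -6% and +17% throughput respectively.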
Wait, isn't this a Ryzen thread? I'm running an 8-thread test on a stock 1700 now (SMT/HT off on all systems). Note that as LLR is similar to P95 28.xx, it doesn't use the FMA code, although I don't think there is much penalty from using the older AMD optimisation for a general comparison to Intel here.
It is currently estimating 5.3 hours a unit, which is slightly faster than the 4570S at 6.3 hours and the 5675C at 5.7 hours, but slower than the 6700k at 4.2 hours. The E5-2683v3, however, is estimating 3.0 hours!
I knew Ryzen was never going to be strong in this area, but I've long suspected that double the cores at about half the IPC would still make it competitive per socket. Note Ryzen has more than enough cache to take RAM out of the equation for this task, but there may still be a concern about cross-CCX traffic. I'm not going to run 8x single threads on Ryzen as I don't have long enough to live (probably looking at around 2 days of runtime for that).
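To put the per-unit estimates above on a common footing, they can be converted to units per day per socket. The same arithmetic gives a rough idea of why 8x single threads on Ryzen would be a multi-day run. This is only a sketch: the names are made up for illustration, and the Ryzen single-thread figure assumes perfect 8x scaling, which is an assumption and not something measured.

```python
# Estimated hours per unit from the multi-threaded runs quoted above.
hours_per_unit = {
    "i5-4570S": 6.3,
    "i5-5675C": 5.7,
    "Ryzen 1700": 5.3,
    "i7-6700k": 4.2,
    "E5-2683v3": 3.0,
}

def units_per_day(hours):
    """Units one socket completes per day at the given hours per unit."""
    return 24.0 / hours

# Print fastest first.
for cpu, h in sorted(hours_per_unit.items(), key=lambda kv: kv[1]):
    print(f"{cpu}: {units_per_day(h):.2f} units/day")

# ASSUMPTION: the 8-thread task scales perfectly, so each of 8
# single-threaded tasks would take 8x as long as the 8-thread task.
RYZEN_THREADS = 8
single_thread_hours = hours_per_unit["Ryzen 1700"] * RYZEN_THREADS
print(f"Ryzen 8x single threads: ~{single_thread_hours:.1f} hours "
      f"(~{single_thread_hours / 24:.1f} days) per batch of 8")
```

Under that assumption the 8x single-thread run comes out at a bit over 42 hours, which lines up with the "around 2 days" guess.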