My suspected stability loss at stock was happening around 90C, and my criteria for stability might be tougher than most, since my normal workload is comparable to Prime95. I'm doing actual work on PrimeGrid, where there is a challenge currently running. By benchmarking, I found the best throughput was with 2 workers (2 tasks) using 3 cores each. I don't think it is possible to specify this in the P95 stress test, as it only runs one core per worker. FFT size was 1280k, but I hear reports that some are now 1440k. These fit in each CCX's local cache, so they avoid crossing the cache boundary, which would cause a significant performance loss. Temps are slightly lower than in the 6x 128k FFT test I used here, presumably due to inefficiencies in the multi-threading code. Task results are all double checked, so it is easy to see if there has been any instability. I have 2 known bad units, 2 inconclusive units which I expect to be confirmed bad in due course, 15 known good units, and 3 unchecked so far. Each task takes over 6 hours. Very roughly, I had a 1 in 4 error rate occurring within a 6 hour window per CCX. Running the Prime95 stress test at a single FFT range for 24 hours would not necessarily detect this error rate. When Skylake launched I had detectable error rates of around 1 a month until I figured out it was fussy with RAM. Tests back then were not multi-threaded and longer ones could take a day or more, so each error wasted a lot of compute time.
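As a rough sanity check on the unit counts above, here is a small Python sketch. It rests on assumptions that are not from the post: the two inconclusive units are counted as bad, each task is treated as one 6 hour window on one CCX, errors are treated as independent between windows, and a stress test is assumed to hit the same error rate as the real workload. The function names are made up for illustration.

```python
def observed_error_rate(bad, inconclusive, good):
    """Fraction of checked tasks that failed.

    Assumption: inconclusive units are counted as bad.
    Unchecked units are left out entirely.
    """
    failed = bad + inconclusive
    return failed / (failed + good)


def chance_of_clean_run(error_rate, hours, window_hours=6, ccx_count=2):
    """Probability of seeing no error at all during a run.

    Assumption: every window on every CCX fails independently
    with the same probability `error_rate`.
    """
    windows = (hours / window_hours) * ccx_count
    return (1 - error_rate) ** windows


rate = observed_error_rate(bad=2, inconclusive=2, good=15)
print(f"observed error rate per task: {rate:.2f}")
print(f"clean 24 h run, both CCXs:    {chance_of_clean_run(rate, 24):.2f}")
print(f"clean 24 h run, one CCX:      {chance_of_clean_run(rate, 24, ccx_count=1):.2f}")
```

Under those assumptions the per-task rate works out to 4 in 19, a bit better than 1 in 4, and a 24 hour run still has a noticeable chance (on the order of 15% across both CCXs) of finishing with no error at all, which fits the point that a clean 24 hour stress test would not necessarily catch this.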
Anyway, I left the graphite pad in as it was the last one I fitted, and have resumed crunching. Let's see if the new units come back error-free at the somewhat reduced temperatures compared to the Prism.
I would like to see whether temperature-related instability at stock on Zen 2 shows up beyond this one observation, e.g. on more systems, but it isn't something I can set up easily. I never observed instability on stock Intel CPUs due to CPU temperature. With a power-efficiency-oriented overclock (modest clock increase, lower voltage setting) I found stability drops off around 80C or so, so that remains my target maximum for 24/7 OC running on Intel to this day. I can tolerate higher temps at stock.