Apologies for the continued thread jacking, but I moved one of the overheating GTX 1070s from my dual GTX 1070 machine to another comp with the GTX 1080, since it has enough PCIE slots that the two cards aren't right next to each other.
Unfortunately this new machine is not liking the second card. The system runs OK for a bit, but then it just hangs with no BSOD or anything and needs a hard restart. I disabled the second card's folding so my PPD isn't totally tanked, but I'm trying to figure out the problem. Has anyone else had this problem?
The system I'm running is:
Mobo: MSI SLI Plus X99
CPU: Xeon 2679v4 20 core 3.0 GHz (200w TDP, a server OEM chip), HT disabled (the work programs I use don't benefit)
RAM: 8x8 GB DDR4 2400 MHz
GPU: MSI Armor GTX 1080, MSI Armor GTX 1070
PSU: 750w 80 Plus Gold, here's a review of the 850w version:
http://www.jonnyguru.com/modules.php?name=NDReviews&op=Story5&reid=206
OS: Windows 7 SP1, fully updated
Drivers: NVIDIA 376.48 (Hotfix for folding)
CPU is running SMP folding. 18 cores on SMP folding. 2 reserved for GPU feeding.
I've narrowed it down to one of two possible problems.
1. Xeon 2679v4 does not have proper support for 2nd GPU in its PCIE controller. This was a chip custom designed for a datacenter needing high clocks and high threads. A beast chip but maybe not with full PCIE support if it wasn't needed by the specific customer. It's pretty niche so there's not much documentation. There are some reports of it being used for SLI but as we all know folding can push things a lot harder.
2. Power supply too weak. (200w CPU + 150w + 160w for the GPUs = ~510w, plus ~50w for RAM and such, so ~560w total.) A 750w unit should still be able to handle this load, but maybe it's the sustained load, or the PSU has just degraded or is faulty.
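For what it's worth, here is the power math from point 2 as a quick Python sketch. The wattages are just my rough estimates from above, not anything measured at the wall, and the dictionary keys are names made up for illustration.

```python
# Rough power budget check. All wattages are the estimates from the post,
# not measured values; swap in your own numbers.
loads_w = {
    "cpu": 200,           # Xeon TDP
    "gtx_1070": 150,      # estimated draw while folding
    "gtx_1080": 160,      # estimated draw while folding
    "ram_and_misc": 50,   # RAM, drives, fans and such
}
psu_rated_w = 750

total_w = sum(loads_w.values())
headroom_w = psu_rated_w - total_w
load_fraction = total_w / psu_rated_w

print(f"Estimated draw: {total_w} W")
print(f"Headroom: {headroom_w} W")
print(f"Load: {load_fraction:.0%} of rated capacity")
```

On these numbers the PSU sits at about three quarters of its rating, which is why I don't think raw capacity is the issue unless the unit is degraded or my estimates are low.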
Any insight?
Edit: Oh, there's also some slight BCLK overclocking (100 MHz -> 103.6 MHz). That's all you can get with any Xeon. Maybe that's enough to mess up the PCIE controller? I'll try reverting it tomorrow and testing without the OC. No GPU overclocking.
I've also established the crashing isn't due to overheating. All temps are reasonable (<70 deg C).