
So, who's getting a 5XXX series card?

The number of shunts and how they are tied into the power input only adds sensing; it does nothing to actively mitigate anything. Mitigation requires a smart controller to act on the data it is given. Incidentally, the only way for consumer products to become aware of an urgent power issue on an end device is through the host OS, and in terms of potential damage mitigation, that is the absolute worst process for it. For GPUs it would be something like this: power input VR senses power exceeds rated -> power input chip sends an I2C alert or drives a GPIO to the GPU or another power controller (if it's another power controller, that's one more hop to the GPU) -> GPU sends an interrupt via PCIe or asserts THERMTRIP to the CPU or PCH -> host OS catches the request and initiates shutdown. In the time span of this action sequence, the connector would already have melted and a small fire could have started. In the server world it's common for BMCs or dedicated chips like FPGAs to override the host and cut the system's input to 0V to protect hardware in cases of power instability.
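To put rough numbers on why that path loses the race, here's a back-of-envelope sketch; every stage latency below is my own illustrative assumption, not a measured value:

```python
# Rough latency budget for the OS-mediated shutdown chain described above.
# Every number here is an illustrative assumption, not a measured value.
stages_ms = {
    "VR senses over-power, asserts I2C ALERT / GPIO": 1,
    "extra power-controller hop + GPU firmware handling": 5,
    "GPU -> CPU interrupt via PCIe, driver services it": 10,
    "host OS orderly shutdown (services, disk flush)": 10_000,
}
total_s = sum(stages_ms.values()) / 1000
print(f"~{total_s:.1f} s before 12V is actually removed")
# A pin dumping tens of watts into a small plastic housing can hit melting
# temps in seconds, so this path loses the race. A BMC/FPGA-style hardware
# crowbar that cuts input power directly does not have that problem.
```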

The number of power inputs will only reduce risk; it will not fix whatever this connector issue is.

With the news that this issue is not easily reproducible, I'm going to guess that der8auer, GN, etc. will talk with their Nvidia contacts to get an understanding of what is happening.

I'm in agreement with der8auer and Buildzoid saying that, at an external glance, the design is a risk. I trust the engineers at Nvidia have done the right things to make sure this is a safe product. But the fact that there are failures, and that they may depend on specific variables, makes me believe that a late design change caused an initially safe design to fall short.
Why can't they just use a form of a breaker? If it gets too hot, it shuts off. The only reason I could think of not to is that ambient temps could trigger it. The type of temps we do not have in a service panel.
 
Why can't they just use a form of a breaker? If it gets too hot, it shuts off. The only reason I could think of not to is that ambient temps could trigger it. The type of temps we do not have in a service panel.
Wouldn't the surrounding temps affect the breaker? Between the core and the mem it can be pretty toasty...
 
You still need something to configure that breaker, and each user's environment will change some variables. Go talk to server management teams to see how they like configuring each BMC in their data center lol.

Also, the breaker is the PSU. The PSU already has OVP/OCP functions just like a GPU. PSUs could be told to react quicker by yanking power, but again, they have to be told sooner. PSU outputs are rated for higher tolerance than the end point consuming said power.
 
Wouldn't the surrounding temps affect the breaker? Between the core and the mem it can be pretty toasty...
Surely there's a threshold where ambient temps wouldn't set it off... if the cables are good to 150°C, I don't think heat soak from the processor/memory would trip it.

Why can't they just use a form of a breaker? If it gets too hot, it shuts off.
Two shunts (NOT in parallel like in the 4090) would do (according to BZ).

The problem with the 4090 is two shunts in parallel, and it spits out a single 12V again, whereas on the 3090 they are in series. The 5090 FE has one shunt resistor (same outcome as the 4090: everything comes out as one 12V blob).
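To illustrate the point with made-up numbers (the per-pin currents below are hypothetical, not measurements): once all six pins are merged into one 12V blob behind a single shunt, a badly imbalanced cable reads exactly the same as a healthy one.

```python
# Hypothetical per-pin currents (amps) on the six 12V pins.
balanced = [9, 9, 9, 9, 9, 9]       # healthy cable, load shared evenly
imbalanced = [22, 20, 3, 3, 3, 3]   # two pins hogging the load
# A single shunt behind the merged 12V plane only ever sees the sum:
print(sum(balanced), sum(imbalanced))  # 54 54 -> indistinguishable
# Separate shunts per path (3090-style) would see the 22 A outlier and
# give the card something to act on.
```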



Def. watch the BZ video... you'll get the idea. Really worth the 20 mins IMO.


[Attached image: shunt.png]
 
I'm going to ask a really stupid question here...probably.

Not knowing the mechanism behind the load-balance issue, other than what may or may not still be poor contact at the connector itself: wouldn't a single cable (of appropriate size, of course) for 12V/GND, split only at the connector end, effectively spread that load across said cable?

The very short individual traces at that point should alleviate some of that issue, should they not?

Unless I'm missing something, there's no specific reason they need to be individual wires in this respect that I'm aware of; it's ultimately the same bus.
 
Can you use a single wire that's sized correctly? Yes

Is it a good idea? No

The quick basics are these:
Lower error rate from defects -> typically, if one wire is not performing as well as the rest in the bundle, the entire cable will continue to work.
Higher surface area == higher electron mobility == higher current capacity. Electrons like to flow on the surface, not in the center, of a conductor :). One big wire will perform worse than a bunch of smaller wires equaling the capacity of that one big wire. This property is not exactly linear, so continuously increasing the wire count while decreasing the diameter will not yield a linear gain in capacity.

I do not think there is a way to fix this in the cables. The cables need to be dumb and tied to a simple standard. The 12V-2x6 connectors are fine and better designed. Again, I believe Nvidia should have put down two connectors, and that would solve this issue. Even if both connectors used the same shunt, we still wouldn't have an issue.
 
To be clear, I'm not talking about one larger solid core...still stranded. Think car battery. Not 0-gauge of course, but the principle is the same. I understand the basics in the context of electron flow over a physical medium, but given that automotive wiring works the same way, that was the basis of the question.

I appreciate I'm detracting from the underlying issue here, but I'm trying to determine if there's a suitable "workaround" that doesn't require a manufacturing fix.

It's all good...and nothing more than an academic exercise at the end of the day. :cheers:
 
Higher surface area == higher electron mobility == higher current capacity. Electrons like to flow on the surface, not in the center, of a conductor :). One big wire will perform worse than a bunch of smaller wires equaling the capacity of that one big wire. This property is not exactly linear, so continuously increasing the wire count while decreasing the diameter will not yield a linear gain in capacity.
I never looked at this in practice, but I have vague memories of being told skin effect is more an RF thing. For power delivery we're at DC, where it doesn't matter. Or does it? For instance, how much high-frequency ripple is there, if that matters? As another example, UK wiring in buildings for power delivery is solid copper core for each wire. That runs at a whopping 50 Hz! I've looked around a bit since I'm very rusty on this, and one website states that at 60 Hz you'd need a solid conductor of 17mm diameter before you see it take effect. We're not even close to that in PC space, but it is a thing if you're into large-scale power delivery.
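Out of curiosity I ran the standard skin-depth formula to check that 17mm figure (copper constants below; the "diameter ~ 2x skin depth before it matters" rule of thumb is my assumption of what that website used):

```python
import math

rho = 1.68e-8        # copper resistivity, ohm*m
mu = 4e-7 * math.pi  # permeability (copper ~ free space), H/m

for f_hz in (50, 60):
    delta = math.sqrt(2 * rho / (2 * math.pi * f_hz * mu))
    print(f"{f_hz} Hz: skin depth ~{delta * 1000:.1f} mm "
          f"(negligible below ~{2 * delta * 1000:.0f} mm diameter)")
# 60 Hz -> ~8.4 mm depth, ~17 mm diameter, matching the figure above.
# PC power wires are ~1 mm conductors carrying DC, so skin effect is moot.
```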
 
@Railgun

You could remove the connector entirely and hard solder wires into the board and then plug that directly into the PSU.
 
That makes sense... you deal with the theoretical quite a bit. :p

Being serious, as a whole, I see how it can add up in people's heads. But just because a dude repairs last-gen cards daily doesn't prove what the issue is on the new series. If anything, it pointed back to the cables. You literally posted toasted wires with old 3rd-party cables. Now, I can make the logic leap too, but I wasn't all in. Correlation is not causation and all. You did the best you could with the pictures you had. It's unfair to say poor...sorry about that.

That said, the big question now is: why did it happen on der8auer's sample (which gave him trouble from day 1, even putting a signal on the screen), but not on ANY of Falcon Northwest's systems, which used dozens of cards, PSUs, and cables?

I'd really like Nvidia to respond already....

If it's really the design like BZ said (I do believe him), why doesn't it happen with every card? What's the straw that breaks the camel's back to make this behavior happen?? So far we've seen 2 5090s do this, right??
No worries.

Regarding der8auer specifically, what he showed is concerning, but...
Jonny Guru kinda debunked that video and said that if the cable really was at 150°C, it would have melted instantly. He also said most of the heat should be on the GPU side of the connector, not the PSU side, which I agree with. So I think der8auer should repeat the test: try a different cable, try different PSU terminals, etc.

NVIDIA should just announce, "if you bought a new GPU, get a new cable for your PSU". The root of the issue is that they haven't recalled the H+ 4090, which, in my theory, is a cable destroyer. Take a destroyed cable and plug it into a 5090, and bam...
That explains what happened with that melted 5090, and maybe it explains whatever der8auer is seeing, as he used that cable to power his 4090 too.

Is your 4090 H+ or H++? If you're getting a 5090, might wanna join the testing mix.
 
Is your 4090 H+ or H++? If you're getting a 5090, might wanna join the testing mix.
My 4090 is a liquid-cooled Suprim. It's been running in my daily machine since the launch-day review, using the included cable (3x 8-pin to 12VHPWR) the whole time. It's been removed and reinstalled 3x since.

As far as the 5090, one landed and I'll review it in March after the 5070 Ti, 5070, and 9070 XT. Though I don't have any IR guns... if it melts, we have a problem, lol.
 
Jonny Guru kinda debunked that video and said that if the cable really was at 150°C, it would have melted instantly. He also said most of the heat should be on the GPU side of the connector, not the PSU side, which I agree with. So I think der8auer should repeat the test: try a different cable, try different PSU terminals, etc.
This comment was enough for me to find Jonny's comments on Reddit, and also der8auer's counter-comment on another forum with a similar name to this one. I think they're talking past each other though; it's not quite the same thing.

Jonny does mention an interesting point: insertion cycles. These aren't like USB-C, meant to be plugged in many times a day. If you swap GPUs or whatever frequently, you could use up the rated 30 cycles and enter the unknown.

Also, while IR cameras might be a nice tool to "see" heat, as a temperature measurement tool the reading will shift depending on the material's emissivity. So don't trust the number as being accurate unless that is taken into consideration. The black plastic and similar materials used on the outside of the cable are probably close enough to default values, though, and I wouldn't expect a massive error, but it's something to keep in mind.
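For a feel of how far off an emissivity mismatch can put you, here's a toy greybody calculation (the emissivity values are illustrative only, and it ignores reflected ambient radiation):

```python
# Stefan-Boltzmann greybody: radiated power ~ emissivity * T^4.
# If the camera assumes e=0.95 but the surface is shinier (say e=0.60),
# it under-reads the temperature.
def apparent_temp_c(true_c, e_true, e_assumed):
    true_k = true_c + 273.15
    return (e_true / e_assumed) ** 0.25 * true_k - 273.15

print(apparent_temp_c(150, 0.60, 0.95))  # ~104 C shown for a 150 C surface
# Matte black sleeving sits near e~0.95, so the error there stays small.
```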

NVIDIA should just announce, "if you bought a new GPU, get a new cable for your PSU".
Given the above, this might be what's needed, at least on units toward the upper end of the connector rating, or if the cable has been reconnected many times in the past. I'm now wondering how many times I've moved my 4070 since I got it, although that's split between the octopus adapter in my old system and a PSU-direct cable in the new one. I probably have enough headroom that it doesn't matter either way.

Though I don't have any IR guns... if it melts, we have a problem, lol.
I use the FLIR One, which is pretty cheap compared to PC parts. It's a camera module that connects to phones and is quite basic, but I can't complain for the price. Of course you can get nicer standalone ones if budget allows!
 
Regarding der8auer specifically, what he showed is concerning, but...
Jonny Guru kinda debunked that video and said that if the cable really was at 150°C, it would have melted instantly. He also said most of the heat should be on the GPU side of the connector, not the PSU side, which I agree with. So I think der8auer should repeat the test: try a different cable, try different PSU terminals, etc.
Also, while IR cameras might be a nice tool to "see" heat, as a temperature measurement tool the reading will shift depending on the material's emissivity. So don't trust the number as being accurate unless that is taken into consideration. The black plastic and similar materials used on the outside of the cable are probably close enough to default values, though, and I wouldn't expect a massive error, but it's something to keep in mind.
IR cameras look for a differential from ambient, and proper calibration is critical for an accurate temp reading. So while it may not have been 150°C, it was clearly showing two wires much hotter than ambient, and his amp meter showed a 22 A reading to back up the high temperature.
 
A few points that were missed (?) by reviewers commenting on burnt cables (I get that they have not had time for an investigation):
  1. Clamp meters can be inaccurate. Hall-effect sensors may not be accurate either unless you shield them from EMI. A breakout board with shunt resistors is the way to go (see the sketch below). I'd go with something like this: https://www.ti.com/lit/ds/symlink/ina790b.pdf?ts=1739410875322
  2. Current-carrying capacity is a function of temperature; at higher temperatures you have to de-rate appropriately. See here: https://www.eaton.com/content/dam/e...ter/bus-ele-tech-lib-conductor-ampacities.pdf
  3. Thermal imagers are useful for approximate temperature estimates. Attaching a thermistor or a thermocouple to each wire would give more precise temperature results.
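For anyone unfamiliar with how the breakout-board measurement in point 1 works, the math is just Ohm's law across a known precision resistor. The resistor and voltage values below are typical-looking assumptions, not the INA790B's actual configuration or register interface:

```python
# Ohm's law across a precision shunt: the board drops a tiny known
# resistance in series with each wire and measures the voltage across it.
r_shunt = 0.002                 # ohms (2 milliohm sense resistor, assumed)
v_sense = 0.044                 # volts measured across the shunt (assumed)
current = v_sense / r_shunt     # 22 A -- the kind of per-wire reading seen
p_shunt = current**2 * r_shunt  # ~0.97 W dissipated in the shunt itself
print(f"{current:.1f} A, {p_shunt:.2f} W burned in the sense resistor")
```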
 

Another point of interest/inspection/data. I know we have a couple of EEs/CEs here, but it doesn't hurt to have another EE's take on the situation.
Read this... a great read, and it jibes with BZ's vid. My Cliff's Notes takeaways......

Now we get to the 4090 and 5090 FE boards. Both of them combine all 6 12V pins into a single block, meaning no current balancing can be done between pins or pairs of pins. It is literally impossible for the 4090 and 5090, and I assume lower cards in the lineup using this connector, to balance their load as they lack any means to track beyond full connector current. Part of me wants to question the qualifications of whoever signed off on this, as I've been in their shoes with motherboards. I cannot conceive of a reason to remove a safety feature this evidently critical beyond costs, and those costs are on the order of single-digit dollars per card if not cents at industrial scale. The decision to leave it out for the 50 series after seeing the failures of 4090 cards is particularly egregious, as they now had an undeniable indication that something needed to be changed. Those connectors failed at 3/4 the rated power, and they chose to increase the power going through with no impactful changes to the power circuitry.
If we use the 648W figure for 6x9-amp pins from above, a 375W rating now has a safety factor of 1.72x. In theory, as few as 4 pins could carry the load, with some headroom left over for a remaining factor of 1.15. This is roughly the same as the safety limit on the worst possible 8-pin with weak little 5-amp pins and 20AWG wires. Even the shittiest 7A micro-fit connectors I could find would have a safety factor of 1.34x.

The connector itself isn't bad. It is simply rated far too high, leaving little safety factor and thus, little room for error or imperfection. 600W should be treated as the absolute maximum power, with about 375W as a decent rated power limit.
It is my opinion that any card drawing more than the base 375W per 12VHPWR connector should be avoided. Every single-cable 4090 and 5090 is in that mix, and the 5080 is borderline at 360W.
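His safety-factor arithmetic checks out, for what it's worth; here's a quick sketch using the pin ratings exactly as quoted (12 V rail assumed):

```python
# Safety factor = connector current capacity / rated power, on a 12 V rail.
def safety_factor(pins, amps_per_pin, rated_watts, volts=12):
    return pins * amps_per_pin * volts / rated_watts

print(f"{safety_factor(6, 9, 375):.2f}")  # 1.73 -> 648 W capacity vs 375 W
print(f"{safety_factor(4, 9, 375):.2f}")  # 1.15 -> only 4 of 6 pins loaded
print(f"{safety_factor(6, 7, 375):.2f}")  # 1.34 -> the 7 A micro-fit case
print(f"{safety_factor(6, 9, 600):.2f}")  # 1.08 -> same pins at 600 W: ouch
```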


**** cables, **** design.......that didn't change in the 5090. Big Yikes, NV............what say you?


EDIT: A bit down the thread..........
User asks.........

So there is no way to safely use a 5090?
As far as my opinion goes, no. Unless you cripple it down to 5080 power levels, it is simply too power hungry for this connector. It either needs active load balancing (and it better be good at ~600W) or multiple connectors to brute-force a big safety factor.

The Galax HOF 4090 is actually a good example. 2x 12-pins, so in theory 1320W of power capacity on a 450W card, and if I use the derated connector spec of 375W, that's still 750W. If you find a 5090 like that, only then would I be comfortable running at full TDP.

So: either build the connector right, or use two. Neither of which was done. Add shitty cables............... ICANT
 


Basically, moddiy have said to use newer cables for the 5000 series, and that they made changes between the 4-series and 5-series cables.
Wasn't that a moddiy cable that fried in the der8auer vid?
 