
A Universal Formula To Rate The Performance of Any Cooling Solution

Status
Not open for further replies.

Dr_Emmett_Brown

Registered
Joined
Apr 22, 2020
Location
Caprona
For quite some time, I have been seeking out the "Holy Grail" of objective measurements and formulas that could quantify the Overall Heat Removal capabilities of various cooling solutions. The big question was: Is there such a thing that can be universally applied to any CPU/cooler/system load that will work every time? After sufficient reflection, here is what I came up with...

The amount of heat being removed is, obviously, a function of the CPU load. Therefore, the number that this hypothetical formula will return is just an INSTANTANEOUS SNAPSHOT of the data collected at the moment. To get an overall picture, we need multiple data points. This was the first "realization." We can't rate a Heat Removal Solution with one number. We can say a computer has a 4.0 GHz chip. We can say a triple radiator has a certain interior volume. We can say fans have a max speed. But we can only state the Heat Removal Capacity as a function of a few variables.

So what are the variables?

The wattage output of the CPU
The instantaneous average temperature of the active cores
The ambient temperature of the operating environment
The area of the CPU die itself


Clearly, the more "intense" the program(s) running on the CPU, the more power it will dissipate. Higher wattage means more heat. This is the first variable.

The second item might seem a little bit murky, since most of our software programs capture the MAX temp on the core (as in HWMonitor) but the instantaneous numbers seem to fly by at amazing speed. What we really need is a "Task Manager" version of the temperatures across each core. More on this later.

If we subtract the ambient temperature of the environment from the core(s) temperature, we have a "working heat reference" of sorts. The bigger the difference, the more heat WITHIN our system as a whole. Smaller is better. Bigger means our Heat Removal Capacity is not that good.

And finally, the heat flux parameter is the area over which this heat is being disseminated. This comes directly from the CPU die itself.

When running a test to get a single Heat Removal Quantifier, the wattage must be captured simultaneously with the average temperature of the active cores. In this respect, it is probably best to either use just one core, or all cores at once. It would be too tedious to cherry pick which core had which temperature as the CPU most often will redistribute the processing load in a computer version of "hot potato."
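One way to keep the wattage and the core temperatures paired is to store each reading as a single record. The sketch below is only an illustration: the `Sample` record, its field names, and the per-core values are hypothetical, not the output of any particular monitoring tool.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Sample:
    """One simultaneous reading. The name and fields are hypothetical."""
    power_w: float              # package power reported at this instant
    core_temps_c: list[float]   # temperature of each active core, in C

    def mean_core_temp(self) -> float:
        # Average over whatever cores were put in core_temps_c;
        # the caller is expected to leave idle cores out.
        return fmean(self.core_temps_c)


# Illustrative values only: 200 W with eight active cores around 92 C.
sample = Sample(power_w=200.0, core_temps_c=[91, 93, 92, 92, 90, 94, 92, 92])
print(sample.power_w, sample.mean_core_temp())
```

Keeping both numbers in one record avoids pairing a wattage from one instant with temperatures from another.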

In the world of thermodynamics, such a concept is referred to as the Overall Heat Transfer Coefficient. But since we are more interested in Heat Removal, I am calling this the "HRQ" rather than that.

So, here is my formula...



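HRQ = (1,000,000 x W) / [(T_core - T_ambient) x A_die]

where W is the CPU wattage, T_core is the instantaneous average temperature of the active cores (C), T_ambient is the ambient temperature (C), and A_die is the area of the CPU die in square millimeters.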


The huge number in the numerator comes from the fact that we are dividing by square millimeters instead of square meters, the unit used in the thermodynamic definition of the Overall Heat Transfer Coefficient. One millimeter is 1/1000th of a meter, so if we use square millimeters for the die size in the denominator, we must "square 1000" in the numerator. Dividing by one millionth is the same as multiplying by a million.
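The unit conversion can be written out as a short worked step. The symbols here are chosen for illustration: U is a coefficient in watts per square meter per degree, W is the wattage, ΔT is the core temperature minus the ambient temperature, and A is the die area in the unit shown in its subscript.

```latex
\[
U = \frac{W}{A_{\mathrm{m^2}}\,\Delta T},
\qquad
A_{\mathrm{m^2}} = \frac{A_{\mathrm{mm^2}}}{1000^{2}} = \frac{A_{\mathrm{mm^2}}}{10^{6}}
\quad\Longrightarrow\quad
U = \frac{10^{6}\,W}{A_{\mathrm{mm^2}}\,\Delta T}
\]
```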

The beauty of this equation is, it accounts for the different temperatures in our operating environment, and still returns a "fair number" to show how hard the heat removal solution is working. In a hotter environment, the CPU temperatures will also be hotter.

One small note: In the Thermodynamics World, we use kelvins. We can use degrees Celsius here without needing any conversion since we are taking the difference between the two temperatures, and a change of one degree Celsius is the same size as a change of one kelvin.

A practical example:

The HRQ for the Corsair H150i = (1000000)x(200 watts)/[(92C - 22 C) x (174 mm squared)] = 16420

HWMonitor displayed 200 watts the instant the core temperatures were 92C in a room temperature of 22C on the i9-9900KS with a die size of 174 square millimeters.

Plug those numbers into my HRQ equation, and you get the number by which you can judge the Corsair H150i's ability to remove heat.
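For anyone who would rather not do the arithmetic by hand, here is a minimal sketch of the same calculation. The function and parameter names are made up for illustration, and the input checks are an added assumption rather than part of the formula.

```python
def hrq(power_w: float, core_temp_c: float, ambient_c: float,
        die_area_mm2: float) -> float:
    """Heat Removal Quantifier for one snapshot.

    1,000,000 x watts / [(core temp - ambient temp) x die area in mm^2]
    """
    delta_t = core_temp_c - ambient_c
    # Assumption: reject readings at or below ambient, where the
    # quotient is undefined or negative.
    if delta_t <= 0:
        raise ValueError("core temperature must be above ambient")
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return 1_000_000 * power_w / (delta_t * die_area_mm2)


# The Corsair H150i reading from the post: 200 W, 92 C cores, 22 C room,
# 174 mm^2 die.
h150i = hrq(200, 92, 22, 174)
print(round(h150i))  # 16420
```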

Again, this is ONE DATA POINT and not an exact number everybody will get every single time. But it is a great way to measure your own system!

The 2-fan version of the first iteration of the Thermosyphon scored 17869 just a few days later.

The wattage was 202, CPU average temperatures were 87C, all other factors the same.

So the Thermosyphon kept it 5 degrees cooler even though the CPU was delivering 2 more watts of heat during that test.

Higher numbers = better Heat Removal Capacity.

Feel free to share your HRQ numbers here.
 
You cannot apply the same formula to air cooling as you can to phase change. There are certain common variables, but beyond that you're comparing apples to oranges. If you're trying to boil down and homogenize to achieve an "all things being equal" quantification, then sure. It's not very complicated, though. You take your ambient, wattage load (however complicated you want to get into that by considering heat soak etc.), and temperature of the thing you're cooling. There's a lot more based on different systems you could measure, though, that don't exist on others. Liquid systems have flow, coolant temps, radiator size/fans, etc etc etc. Phase has cold/hot side temps, refrigerant, condenser size, cap tube/TXV... but I guess those are parameters that can only be compared to like systems.

Expand heavily upon:

So you have Ambient=A, Wattage Load=W, Component Temp=C, and Overall Cooler Efficiency=X. (temps in C)

X=W/(C-A)


Obviously super simple, but it scales somewhat. Just dividing wattage by delta. It could be universally applied, but would turn negative for subambient and the function falls apart. Calling it rudimentary gives it credit. But it does express an output number that classifies how well the system is being cooled. A 50W load and a 30C delta gives X=1.67, where a 200W load and a 25C delta gives X=8. Obviously the second output is the much better cooling solution. This is missing so many other variables of the system... but the average Joe isn't going to want or need to know those.
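A minimal sketch of that simple version, using the two examples above plus one sub-ambient case. The function name is hypothetical, and the 25C ambient is an assumed value picked only to produce the stated deltas.

```python
def cooler_efficiency(watts: float, component_c: float, ambient_c: float) -> float:
    """X = W / (C - A): watts handled per degree C of rise over ambient."""
    # Note: this divides by zero if the component sits exactly at ambient.
    return watts / (component_c - ambient_c)


AMBIENT_C = 25  # assumed room temperature, for illustration only

low = cooler_efficiency(50, AMBIENT_C + 30, AMBIENT_C)    # 50 W, 30 C delta
high = cooler_efficiency(200, AMBIENT_C + 25, AMBIENT_C)  # 200 W, 25 C delta
sub = cooler_efficiency(100, AMBIENT_C - 10, AMBIENT_C)   # sub-ambient case

print(round(low, 2), high, sub)
```

The sub-ambient call returns a negative number, which is where the function stops being meaningful.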


EDIT: you made your edit while I was posting. Does the die size matter? You're not going to have many cases at all where, say, an i7-7700K is going to be presented in two different die sizes. The X value is inextricably linked to the specific product being tested.
 
You cannot apply the same formula to air cooling as you can to phase change.

An air cooler will just report a higher average CPU temperature.
A phase change unit will report a lower average CPU temperature.

You don't measure the temperature on the phase change contact plate. You allow the cooling to propagate through the CPU lid and the TIM, and see what the internal sensors read as reported by the software utility.

I don't think it matters what the "working fluid" is. Perhaps measuring the temperature of the working fluid as it exits the CPU block would be an interesting qualifier. It would be much harder to capture in the instance of air-cooled systems.


EDIT: you made your edit while I was posting. Does the die size matter? You're not going to have many cases at all where, say, an i7-7700K is going to be presented in two different die sizes. The X value is inextricably linked to the specific product being tested.

Yes I was leafing through the pages of my lab notes, and I launched "Paint" to make a picture of the formula and edited some of my explanations. It was a long post, and I did not finish it in one sitting.

The die size is what gives the heat flux, or the "density" of the heat being removed. As you mentioned, you won't have different die sizes for the SAME CPU, which is of course what you want! But removing 200 watts from the head of a pin is much harder than removing 200 watts from a much larger surface. Different CPUs have different heat flux numbers, and will result in different temperatures being reported under load as well. The die size parameter helps account for those differences.
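To put a number on the heat flux idea, here is a small sketch. The function name is made up, and the second die area is a hypothetical value (simply double the first) chosen only to show the effect.

```python
def heat_flux_w_per_mm2(power_w: float, die_area_mm2: float) -> float:
    """Heat flux: watts leaving the die per square millimeter of die area."""
    return power_w / die_area_mm2


# 174 mm^2 is the i9-9900KS die size quoted in the first post.
small_die = heat_flux_w_per_mm2(200, 174)
# Hypothetical die with twice the area, same 200 W load.
large_die = heat_flux_w_per_mm2(200, 348)

print(round(small_die, 3), round(large_die, 3))
```

Same wattage, double the area, half the flux the cold plate has to deal with.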
 
What about the IHS? What about MCM/chiplets with different areas in package? This is why, for something useful for the masses and not a specialized formula for the few, disregarding surface area may be best. If this is an exercise in precision, then by all means make the calculations. You're not removing heat from the specific surface area directly, you're removing heat from the package. Of course there is going to be a difference, but how marginal that is depends on a lot of factors that I'm far too tired right now to try and evaluate...
 
What about the IHS? What about MCM/chiplets with different areas in package?

The formula accommodates all values possible within each variable. That's what makes it a formula.

The universal law of gravitation calculating the force between two bodies accounts for any masses you plug in for m1 and m2.

Similarly, different die sizes will have different heat flux numbers as a function of CPU loading.

That doesn't mean a "worse" number will be reported for a good cooling solution.

It means the change in heat flux will be accounted for by a change in some other observed variable data point.

The net collaboration of all data points produces the final meaningful number returned by the formula.

This is why, for something useful for the masses and not a specialized formula for the few, disregarding surface area may be best.

What you mentioned above is a misconception. I think you might not understand the role of heat flux in determining the heat load of the computer. That's ok, I don't think most people get to this level of detail. But my formula is the correct one, I assure you, "for the masses," as your formula without heat flux is missing something.

Think of how pressure is force per unit area. A small force applied to a very small area is a great deal of pressure, since dividing by a small number is the same as multiplying by a large one.

A woman wearing a spiked high heel exerts as much pressure on the ground as does one tire of an 18-wheel truck.

Similarly, it is much harder to remove heat from a smaller area than a larger one. It requires more energy. Therefore, since the cold plate can only "work" over the area of the CPU die, it is harder to remove the same amount of heat from a small die than a larger one.

If you ponder this some more, it will make sense.
 
I am not a thermodynamics engineer, but here are my thoughts on why you are going down a good path, but using bad data.

1. Neither formula suggested allows for Time. Sure you can attempt to capture instant cooling efficiency but is that accurate? No. In Doc's example he used a Corsair H150i. Over time the coolant in the AIO will begin to rise in temperature until it is completely saturated and you reach a thermal equilibrium between the heat source (200W in this case; more on this soon), its cooling capacity, and of course ambient temperatures. At this point you should recalculate to determine the cooler's range of efficiency.

2. The second issue I see is the belief that CPU Watts is a hard number. The fact is that this is simply not a reliable enough source of data to be used here with any precision. Intel CPUs go through power cycles during instructions and have a peak power limit (PL2), a sustained power limit (PL1), as well as a Time Limit (TAU) that are all factors in determining the current wattage. My understanding is that this is all related to VID Voltage but becomes less accurate as the VID voltage is further from the actual core voltage. I'm still learning all of this so if I am misunderstanding something here, please correct me. Further, I'm not aware of any software that is capable of measuring the PL1, PL2, or TAU of an Intel CPU or if Intel even reports these sensors.
 
1. Neither formula suggested allows for Time.

The whole motivation for the advanced cooling solution was that the H150i could not "keep up with" the gradually increasing heat when I eventually overclocked to 5.2 GHz from this 5.0 GHz data set. The temperatures went from 96C at 5.2 GHz to > 102C before thermal shutdown took place. This required a span of about 8 minutes.

In that respect, I agree that TIME is a factor, but not for the underlying reason you mentioned.

My equation is not a curve f(t) over time; it is f evaluated at a single instant. It is literally a SNAPSHOT in time. It is of a single data point, which I articulated.

Having said that, taking snapshots over time with a constant heat load is a good way to test if your cooling solution is OVERBURDENED.

If it is NOT overburdened, and the heat load is being managed by the cooling solution, the need for f(t) disappears because there is no change with time, once at a steady state. Small changes back and forth; sure. Ramping ever-upward, no.
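A rough sketch of how a series of snapshots at constant load could be checked for an upward ramp. The function name, the 1C tolerance, and both data sets are hypothetical; the ramping series only loosely echoes the 96C to 102C climb described above.

```python
from statistics import fmean


def is_steady(temps_c: list[float], tolerance_c: float = 1.0) -> bool:
    """True when the second half of the run is not noticeably hotter
    than the first half. Only upward drift is flagged."""
    if len(temps_c) < 2:
        raise ValueError("need at least two snapshots")
    mid = len(temps_c) // 2
    # Compare the average of the later snapshots with the earlier ones.
    drift = fmean(temps_c[mid:]) - fmean(temps_c[:mid])
    return drift <= tolerance_c


# Illustrative snapshots only, taken at a constant heat load.
managed = [80, 81, 80, 81, 80, 81]        # small changes back and forth
overburdened = [96, 97, 98, 100, 101, 102]  # ramping ever upward

print(is_steady(managed), is_steady(overburdened))
```

Comparing halves is a crude test; a fitted slope would be more robust, but the idea is the same.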

In most instances, this is the case.

Sure you can attempt to capture instant cooling efficiency but is that accurate?

Yes, it surely is! There can be no more accurate measurement!

I think you are confusing the SINGLE DATA POINT nature of the equation with the misinterpretation that it will always produce the same number. I never stated that. If your cooling solution can handle the heat load, this equation works for all values of time. If it cannot, you are doomed to thermal shutdown anyway, and the number is meaningless.

2. The second issue I see is the belief that CPU Watts is a hard number. The fact is that this is simply not a reliable enough source of data to be used here with any precision. Intel CPUs go through power cycles during instructions and have a peak power limit (PL2), a sustained power limit (PL1), as well as a Time Limit (TAU) that are all factors in determining the current wattage. My understanding is that this is all related to VID Voltage but becomes less accurate as the VID voltage is further from the actual core voltage. I'm still learning all of this so if I am misunderstanding something here, please correct me. Further, I'm not aware of any software that is capable of measuring the PL1, PL2, or TAU of an Intel CPU or if Intel even reports these sensors.

The CPU wattage being reported is MORE THAN SUFFICIENT for heat removal estimation purposes. It is not constant, I watch this number change all the time. The fluctuations you mention change by a mere fraction of a percentage point when proper cooling is applied. Throttling down only occurs to "save the chip" from a predicted runaway heat condition. If you see 30-second intervals of CPU throttling (typical tau interval) your overclocking exceeds the heat removal capacity of your system. Insufficient cooling will show the fluctuations you mention. I don't see them at all.
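Anyone who wants to check how much their own reported wattage moves around could log a handful of readings and look at the spread. This is only a sketch: the function name and both sample lists are hypothetical, not measured data.

```python
from statistics import fmean, pstdev


def power_variation_pct(samples_w: list[float]) -> float:
    """Population standard deviation of the readings, as a percentage
    of their mean."""
    return 100 * pstdev(samples_w) / fmean(samples_w)


# Illustrative readings only.
well_cooled = [200.0, 200.4, 199.8, 200.2, 199.6]  # steady load
throttling = [200.0, 200.0, 140.0, 200.0, 140.0]   # periodic power drops

print(round(power_variation_pct(well_cooled), 2))
print(round(power_variation_pct(throttling), 1))
```

A spread well under one percent supports using a single wattage reading in the snapshot; a spread in the double digits suggests the readings are catching throttling.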
 
So, if I understand correctly, you are saying that the cooler will be just as efficient when you begin the testing procedure, with the coolant at ambient temps, as when it's been under load for, say, 50 minutes and the coolant is heat soaked? And that the math will prove the numbers to be near the same result?
 
The rate at which a block removes the heat doesn't change. If the block is capable of XXX Watts @ XX Degrees C delta... that won't change. What does change is the water temps over time.....which of course affects end temps. But they will both reach an equilibrium as blay said. But the block's properties do not change...just other variables around it (like water temp).

Outside of that, I need to reread this thread for a third time... something isn't landing. If there was a magic equation, I'd imagine it would be known already...?
 
So, if I understand correctly, you are saying that the cooler will be just as efficient when you begin the testing procedure, with the coolant at ambient temps, as when it's been under load for, say, 50 minutes and the coolant is heat soaked? And that the math will prove the numbers to be near the same result?

I am not saying that at all. If you actually read every word I wrote, reflect on it, digest it, you should come to the opposite conclusion as your statement.

I can't think of a way to make this more clear. It will just have to take a while to "sink in."

It is rather involved, so this is understandable.
 
The actual block itself, the properties, do not change. Only the external variables (water temp for one).

If your copper block dissipates heat at (random values, note) XXW /k it will still do so. What is changing the results/temps are the other variables.
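To illustrate the point, here is a sketch using a simple lumped thermal-resistance model. That model is an assumption brought in for illustration, not something measured in this thread, and the 0.1 K/W figure and water temperatures are hypothetical.

```python
def cpu_temp_c(power_w: float, water_temp_c: float,
               block_resistance_k_per_w: float) -> float:
    """Steady-state estimate: the block adds (power x resistance)
    degrees on top of the water temperature."""
    return water_temp_c + power_w * block_resistance_k_per_w


R_BLOCK = 0.1  # K/W, hypothetical fixed property of the block

start = cpu_temp_c(200, 25, R_BLOCK)   # coolant still at ambient
soaked = cpu_temp_c(200, 35, R_BLOCK)  # coolant heat soaked, 10 C warmer

print(start, soaked)
```

With the block's resistance held fixed, the 10C rise in water temperature shows up as a 10C rise in the CPU temperature.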
 
The rate at which a block removes the heat doesn't change.

If you substituted "entire cooling solution" for "block" I would agree with you. I am not splitting hairs here, there is a technical fine point about whether the solution is condenser or radiator bound, or bound by the air cooling volume per unit of time. For example, if your fans spun at 10,000 RPM, most likely you would not see an improvement in observed CPU temperatures. There is an asymptote beyond which your air cooling is no longer contributing to the radiator cooling. I mention all of this because it may not be "the block's fault" if you have an undersized radiator or fans spinning at only 100 RPM.

What does change is the water temps over time.....which of course affects end temps.

Correct.

But they will both reach an equilibrium as blay said.

This is true only if the cooling solution as a whole can keep pace with the heat load. As I alluded to earlier, at 5.2 GHz x 16 threads the H150i did not have the fluid reach steady state. The temperature ramp was such that after 8 minutes, it hit thermal shut down. Of course, in most cases, with the proper cooling, your statement is correct.

Outside of that, I need to reread this thread for a third time... something isn't landing. If there was a magic equation, I'd imagine it would be known already...?

That's my point exactly: IT IS KNOWN in the world of thermodynamics! It is exactly the same, only I use Celsius instead of Kelvin and I use millimeters instead of meters.

It is known as the Heat Transfer Coefficient.

Also known as the Overall Heat Transfer Coefficient.

But we are interested in the opposite, kind of: The heat removal, not how much is being transferred to the cold plate.

- - - Auto-Merged Double Post - - -

The actual block itself, the properties, do not change. Only the external variables (water temp for one).

If your copper block dissipates heat at (random values, note) XXW /k it will still do so. What is changing the results/temps are the other variables.

I think what will help clear things up is if you guys capture your own data sets and see what happens.

Remember, this is not a BROAD BRUSH STROKE UNIVERSAL RATING such as the speed of your CPU.

This is the Heat Removal Quantifier at that snapshot in time.

I think what everyone is thinking about is "What is the max(HRQ) of my cooling solution?"

The formula is not returning the max, it is returning snapshot data.

Like how Cinebench R20 does not return the same score every time, but it is close on each run.
 
But we are interested in the opposite, kind of: The heat removal, not how much is being transferred to the cold plate.
I've been in this game so long, at this point, I try to stay away from the minutiae. I can look at a result and know what I should expect and have a good idea of the science behind it. :)

This is certainly interesting stuff, but been there, done that (and so have Martin's Liquid Lab and Skinnee Labs back in the day - where many of us old timers learned about watercooling).

For example, if your fans spun at 10,000 RPM, most likely you would not see an improvement in observed CPU temperatures.
This depends on several factors though... radiator fin density, how the fan scales with CFM and static pressure, etc.

This is true only if the cooling solution as a whole can keep pace with the heat load. As I alluded to earlier, at 5.2 GHz x 16 threads the H150i did not have the fluid reach steady state. The temperature ramp was such that after 8 minutes, it hit thermal shut down. Of course, in most cases, with the proper cooling, your statement is correct.
Chicken or the egg... is this a problem of the AIO or the CPU? As you said, the smaller dies with more things going on inside is the biggest factor in getting the heat out of the die. This is a problem with the CPU/TIM/IHS, not the cooler. There isn't much one can do about the die size/its ability to dissipate the heat. You can change the TIM and IHS and that helps, but inherently, today, you have a tiny die trying to output similar heat loads. It isn't about the fluid (reasonable amounts)... if you have double the fluid it will simply take longer to reach the same point.

Also, I don't think your tests yield a stable load. IIRC, Prime95 goes through different length FFTs which put a different load on the CPU. So it may be around that time something heavier hit.

What you would want is some test like a heatplate which can put out a more constant load.


EDIT: In other words, the die size and amount of heat it outputs does not change, nor do blocks vary much to overcome the hardships of extracting so much heat out of a little space.
 
The rate at which a block removes the heat doesn't change. If the block is capable of XXX Watts @ XX Degrees C delta... that won't change. What does change is the water temps over time.....which of course affects end temps. But they will both reach an equilibrium as blay said. But the block's properties do not change...just other variables around it (like water temp).

Outside of that, I need to reread this thread for a third time... something isn't landing. If there was a magic equation, I'd imagine it would be known already...?

But doesn't the water temp reflect equilibrium? For most people, they want their cooling solution to be good over time. That means the water temp should be at equilibrium for the solution to meet people's expectations.
 
But doesn't the water temp reflect equilibrium? For most people, they want their cooling solution to be good over time. That means the water temp should be at equilibrium for the solution to meet people's expectations.
Yes. That's what blaylock said and I mentioned as well. I'm just saying outside variables change, but the rate at which the block removes heat is constant. If the ambient goes down, or water temps are kept lower, so do temps, but the block is still moving the same amount of energy from one place to the other (right?).... and the tiny die is struggling to get heat out of...well, itself, in the first place.
 
I'm just saying outside variables change, but the rate at which the block removes heat is constant.

If a computer is just painting the desktop with nothing running, is "the block" removing the same amount of heat per unit of time as it would be if running on all cores and all threads with Prime95 cranking?

If there's not much heat to remove, there's not much work for the system to do.

The Thermosyphon I built can handle up to 350 watts.

If I am idling at 50 watts, it's not removing 350.

I would say, instead of your remark, a SYSTEM can handle up to its MAXIMUM heat removal capacity at any moment in time.

This does not mean it is removing the heat at a constant rate.
 
Chicken or the egg... is this a problem of the AIO or the CPU? As you said, the smaller dies with more things going on inside is the biggest factor in getting the heat out of the die. This is a problem with the CPU/TIM/IHS, not the cooler. There isn't much one can do about the die size/its ability to dissipate the heat. You can change the TIM and IHS and that helps, but inherently, today, you have a tiny die trying to output similar heat loads.

With this statement, I see what you do not understand fully. I will elaborate.

The HRQ equation treats the SYSTEM IN ITS ENTIRETY!

This includes the heat transfer through the lid, if the CPU is not delidded, the transfer through the TIM, whether it is good or bad, etc.

THIS IS HOW I FIGURED OUT MY CONDENSER WAS TOO SMALL FOR THE 5.2 GHZ MACHINE!

I measured the HRQ.

I added more fans. Then faster fans. Then larger and faster fans that covered more fin area on the condenser. Then I doubled the fan count with a push/pull config to lower the pressure drop across the fins to get even more CFM out of the air portion.

The HRQ did not change much. The system was BOUND by something other than the air cooling contribution.

I swapped condensers for a bigger one, more copper fins, greater surface area per fin, more vapor in the vapor loop, and the HRQ jumped by over 1500 points.

I could run the same tests with and without delidding.

I could run the same tests with another component as the single hardware item being swapped out.

This will allow me to remove the LIMITS of the cooling solution, one at a time, until it cannot be improved further.

I think you might also be confusing the role of the die size strategically. The die size merely matches the size of the COLD PLATE of the cooling solution. Change the die size, you need a different cold plate, or "block" as you call it. The smaller the cold plate, the harder that component must work in order to contribute to the heat removal. More nooks and crannies inside are needed to separate the working fluid so that it has time to cool it inside.

To summarize: The HRQ measures the COMPREHENSIVE performance of the system as a whole. Not just the radiator or the fans, etc. Every component along the path is accounted for by the equation.
 
You misunderstood what I'm saying... I think. :p

The rate at which the block is able to get heat from the cold plate to the water does not change. The load below it can change all it wants. In other words the properties of the block don't change... copper is copper and 385 W/(m·K).



Edit: whoa, double post...

I'm still not sure what you are entirely after here... I'm going to take a break and kick for the different goal posts in a bit. :)
 
The rate at which the block is able to get heat from the cold plate to the water does not change. The load below it can change all it wants.

I agree 100%

You have stated explicitly now what was one of two ambiguous implicit possibilities previously.

Now I see what "constant" was referring to, I did a coin toss, and thought you meant the other one.
 
Ok.. walked round the car... kicked the tires, looked at the engine and underneath. I'm still not sure I want to buy it....sell me.

What value does introducing this equation bring to us? What does a snapshot in time tell us about cooling capacity? Results can be all over the map due to all of the variables not accounted for, right? What if I had the same hardware but better TIM or block? Does this really show its potential? Or the current state only?

I'm just not sure how this, if accurate, is useful. It feels like arbitrary data. But perhaps I'm still not grasping it all. :)
 