10/28/99: See comments from Lee Winde regarding MTBF and temperature:
There are several areas which, I believe, are lacking or need further explanation. My background in electronics is EE in ’67 and considerable R&D for ten years in audio circuitry. I went on to professional school and have not worked as an EE in years; however, I’ve kept fairly current. All my comments are pre-computer yet are exactly what your CPU thermal lowering or degradation article describes. IC voltage regulators and several case styles or series pass regulators were a big problem in the early 1970s, both in mil spec gear and in the areas I was involved in. My comments follow:
I’ve read a number of recent “descriptions” concerning the ill effects of temperature on life expectancy; however, I do not recall two important considerations being explained or raised:
What does excessive heat do anyway? Probably less than you imagine, since a CPU’s failure increase is a direct function of the number of thermal cycles and the CHANGE in temperature above ambient. One of the two mechanisms of failure is called “thermal fatigue,” a misnomer since it is really metal fatigue.
The second failure mechanism deals with absolute temperature. For each ten degrees C, chemical reactions double in their speed. This doubling is roughly accurate when referring to leakage currents. Leakage currents represent heat sources and further increase junction (CPU) temperatures. With a CPU you have the following three mechanisms of degradation:
1. Temperature Range: Delta T is a function of ambient and steady-state (equilibrium) conditions. Too high a delta T and temperature goes up, yielding increased leakage (Ic) current flow that will increase junction temperatures and increase the total delta T per cycle. Minimizing the magnitude of delta T is one of the most important design parameters. The degree to which the CPU, etc., increases in temperature beyond ambient is a function of heat sink efficiency and rapidity of heat transfer. In an ideal world the heat sink’s temperature equals the junction temperature.
2. Thermal Cycles: Degradation due to thermal cycling (turning the computer on and off) is one reason there is some validity for 24/7 operation. You have to balance mechanical fatigue against how many kWh per year are spent running 24/7. Once a critical temperature is reached, leakage currents become a progressively larger and self-“feeding” problem.
3. Voltage: When over-volting a CPU, the internal heat generation is considerably greater since increases in voltage normally yield an increase in current flow: Elementary my dear Watson – Volts x Amps equals Watts.
PS: Freezing CPUs also constitutes a delta T but at a reduced current flow due to the lower temperature reached with active cooling. Temp ambient is still the starting spot and the magnitude of delta T is directly proportional to failure rate.
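The ten-degree doubling rule of thumb mentioned in these comments can be put into numbers with a short sketch. The function name `rate_multiplier` and its parameters are illustrative only, and the rule itself is a rough approximation, not a measured property of any particular CPU.

```python
def rate_multiplier(delta_t_c, doubling_step_c=10.0):
    """Rule-of-thumb multiplier for reaction rate (or leakage current)
    after a temperature rise of delta_t_c degrees C, assuming the rate
    doubles for every doubling_step_c degrees."""
    return 2 ** (delta_t_c / doubling_step_c)


# Rises of 10, 20 and 30 degrees C under the doubling assumption.
for rise in (10, 20, 30):
    print(f"+{rise} C -> about {rate_multiplier(rise):.0f}x")
```

Under this assumption the growth is exponential, which is why a modest extra rise matters more than it first appears.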
10/8/99: Comments from Dan of Dan’s Data:
“I think you should note the way that CPU and other device mean-times-between-failures and, hence, lifespans, are estimated. If a CPU has a listed “lifespan” of 30,000 hours, that’s not because they’ve actually tested that core for that long; it’s 3.4 years. Instead, they test, say, 1000 CPUs for as long as it takes for, say, 20 of them to fail, and more or less assume that that is how long it’ll take for another 2% of the remaining CPUs to fail, and on and on. If the results suggest that a CPU can reasonably be expected to live for 30,000 hours, that’s what they’ll put on the spec sheet.
I dare say they’ve got compensation curves to account for the increasing failure rate with component age, but they can only extrapolate from the data they get in the relatively short testing time, and from background info from previous CPUs which may or may not be particularly applicable to more sophisticated ones. I don’t know how much real-world data the CPU manufacturers actually apply to their lifespan estimates, which are after all essentially just marketing numbers, and not very important ones at that. Hard drive manufacturers have their own special megabyte; maybe CPU manufacturers have their own special year :-).
This makes estimated lifespans less relevant than many people think. A device built with a bomb in it that blew up after 1001 hours of operation could have a MTBF of 250,000 hours, if the evaluation units were only tested for 1000 hours.
The upshot of this is that estimates of the probable life of an overclocked processor are not only dependent on the value of the fudge factor M, but also on how accurate the original lifespan estimate is. And, for current processors, I think the manufacturer’s lifespan figures should be taken with a large grain of salt.”
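The extrapolation Dan describes can be sketched as a simple point estimate: total device-hours on test divided by the failures observed. This is a simplified illustration, not any manufacturer's actual procedure. The function name `mtbf_estimate`, the 600-hour test length and the 250-unit batch are all hypothetical, and the sketch ignores the fact that failed units stop accumulating hours.

```python
def mtbf_estimate(units, test_hours, failures):
    """Point estimate of MTBF: device-hours on test per observed failure."""
    if failures <= 0:
        raise ValueError("need at least one counted failure for a point estimate")
    return units * test_hours / failures


# 1000 CPUs, 20 failures, over a hypothetical 600-hour test.
print(mtbf_estimate(1000, 600, 20))

# Hypothetical batch of 250 units tested for 1000 hours with one failure counted.
print(mtbf_estimate(250, 1000, 1))
```

Neither calculation says anything about what happens after the test ends, which is the point of the 1001-hour bomb example.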
Original Article Starts Here:
More than a few emails to Overclockers.com seek answers to two questions: what does overclocking do to my CPU’s life expectancy, and what is the optimum operating temperature for my CPU? The following equations should shed some light on these questions from a theoretical viewpoint.
Increase in Heat Due to Overclocking
CPUs dissipate heat at known rates. Intel lists these rates for each of its processors in their Developer Notes. For example, the Celeron 366 dissipates 21.7 watts. Any cooling solution must be able to effectively shed this heat load. Note that this heat is at spec speeds and voltages – when overclocking, more heat is generated than this.
The CPU Overclocking Heat Equation
Heat above spec can come from two areas: 1, Heat due to increased frequencies and 2, Heat due to increased voltage. Increasing bus speeds (frequencies) increases heat linearly and increasing voltage increases heat by the square of the voltage increase. This is represented by the following equation:
Pnew = Pspec * (Fnew/Fspec)*(Vnew/Vspec)^2
P = Power in watts
F = Frequency in MHz
V = Voltage
new = P, F and V at the new settings
spec = Intel’s published specifications for the CPU in question
For the Celeron 366, let’s assume we are hitting 550 MHz at 2.3 volts. Plugging these numbers into the equation:
Pnew = 21.7 * (550/366) * (2.3/2.0)^2
Pnew = 21.7 * 1.50 * 1.32 = 43.0 watts
Therefore, overclocking this C366 raises the heat dissipated from the spec of 21.7 watts to an overclocked rate of 43 watts, a 98% increase. What then is the impact on CPU Life? This depends on the amount of heat generated and the cooling efficiency of your CPU cooler.
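As a rough sketch, the overclocking heat equation can be written as a small function so other frequency and voltage combinations are easy to try. The function and parameter names are illustrative only. The inputs are the Celeron 366 figures used in the text, taking 366 MHz as the spec frequency.

```python
def overclocked_power(p_spec, f_spec, f_new, v_spec, v_new):
    """Scale spec power linearly with frequency and with the square of voltage."""
    return p_spec * (f_new / f_spec) * (v_new / v_spec) ** 2


# Celeron 366: 21.7 W at 366 MHz and 2.0 V, pushed to 550 MHz at 2.3 V.
p_new = overclocked_power(21.7, 366, 550, 2.0, 2.3)
increase_pct = 100 * (p_new / 21.7 - 1)
print(f"{p_new:.1f} W, about {increase_pct:.0f}% above spec")
```

Without the intermediate rounding used in the worked example, the result comes out marginally above 43 watts, which does not change the conclusion.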
CPU Cooler Impact on CPU Heat
Let’s assume the CPU cooler you are using has a thermal efficiency of .35 c/w. This means that for every watt of heat dissipated by the CPU, its temp will rise by .35 degrees Centigrade – the lower the c/w, the better. The more efficient the cooler, the more heat is dissipated by the heatsink, hence the lower the CPU temp. The increase in CPU temp can be estimated as follows:
Estimated CPU Temp = CPU spec temp + (CPU watts * CPU cooler efficiency)
Estimated CPU Temp at spec = 25 + (21.7 * .35) = (25 + 7.6) = 32.6 C
Estimated Overclocked CPU Temp = 25 + (43 * .35) = 40 C
The Estimated Overclocked CPU Temperature is 40 C compared to the estimated spec rating of 32.6 C. So now we know the difference between running the C366 at Intel’s specifications and the overclocked settings: 7.4 C. More heat will result in decreased CPU life.
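The temperature estimate can be sketched the same way. The names below are illustrative, and the 25 C starting temperature is simply the figure used in the calculations above.

```python
def estimated_cpu_temp(base_temp_c, watts, cooler_c_per_watt):
    """Starting temperature plus the rise caused by the cooler's c/w rating."""
    return base_temp_c + watts * cooler_c_per_watt


spec_temp = estimated_cpu_temp(25, 21.7, 0.35)  # at Intel's spec
oc_temp = estimated_cpu_temp(25, 43.0, 0.35)    # overclocked
print(f"spec {spec_temp:.1f} C, overclocked {oc_temp:.1f} C, "
      f"difference {oc_temp - spec_temp:.1f} C")
```

Because the rise is simply watts times c/w, a cooler with half the c/w rating would halve the temperature rise for the same heat load.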
Estimating the Impact of Heat on CPU Life Expectancy
CPU Life and Temperature are inversely related – the higher the temperature, the lower the CPU’s Life. This holds true for all integrated circuits – heat is the enemy! What this formula shows is just how this relationship works and its potential impact.
CPU Life = Normal Life Hours / [((273 + New Temp) / (273 + Normal Temp)) ^ M]
Now “Normal Life Hours” means that the CPU has some expected life at Normal Temp, say 30,000 hours. If the CPU is run at a higher temperature, CPU Life is degraded by the ratio of the New Temp to Normal Temp, both expressed as absolute temperatures, raised to the power of M. The number 273 converts degrees Centigrade to absolute temperature (Kelvin). M is determined by real life temperature tests: the CPU is run at a constant 60 C, then 70 C, and the resultant decrease in CPU Life determines M. Let’s plug in some numbers and see what we get.
For this case, I have used 30,000 hours as “Normal Life” and calculated the impact of temperatures from 25 to 75 C, for 3 cases of M – let’s call the three cases the “Hardy” CPU, the “Average” CPU and the “Weak” CPU. The Hardy CPU is not too affected by temperature, while the Weak CPU wilts very quickly in the heat. The important point here is to demonstrate how heat can impact CPU Life over a range of conditions.
The graph shows the relationship for the three cases outlined above, with “Hardy” on the top line and “Weak” on the bottom line. As you can see, depending on the CPU’s “hardiness”, CPU Life can be impacted a lot or not too badly. However, it is interesting to note that in all cases there is more absolute degradation closer to Normal Temp. For the three cases shown, the “Hardy” CPU, if run at 75 C, will live for 13,814 hours, the “Average” CPU 6,361 hours and the “Weak” CPU for a puny 1,349 hours. Don’t buy that one!
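For readers who want to reproduce the curves, here is a minimal sketch of the life formula. The article does not state the three values of M or the Normal Temp. The exponents 5, 10 and 20 and the 25 C Normal Temp used below are assumptions, back-calculated because they reproduce the quoted 75 C figures to within rounding.

```python
def cpu_life_hours(new_temp_c, m, normal_life=30_000, normal_temp_c=25.0):
    """CPU Life = Normal Life / ((273 + New Temp) / (273 + Normal Temp)) ** M."""
    ratio = (273 + new_temp_c) / (273 + normal_temp_c)
    return normal_life / ratio ** m


# Assumed exponents (not given in the article; inferred from its figures).
M_VALUES = {"Hardy": 5, "Average": 10, "Weak": 20}

for name, m in M_VALUES.items():
    print(f"{name:8s} {cpu_life_hours(75, m):8,.0f} hours at 75 C")
```

Changing `new_temp_c` over the range 25 to 75 traces out each curve, and the steep dependence on M shows why the exponent matters so much to any projection.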
Now don’t take these numbers as absolutes – these cases are representative and intended to show relative impacts of temperature on CPU Life. I don’t know what Intel’s “M” is and, as you can see, it plays a big role in projecting CPU Life. I would love to hear from someone “inside” who might be able to supply an “M” based on some testing. However, based on experiences with Intel CPUs, I think it is safe to say that they are a lot closer to the “Hardy” CPU than the “Weak” CPU.
Estimating the Overclocking Impact on CPU Life
Now let’s plug into this equation the impact on the C366’s life between running the CPU at spec and overclocking it. A heatsink of .35 c/w will result in a CPU temp of 32.6 C at spec for an expected life of 23,321 hours (“Average” curve). Overclocking the C366 with the same heatsink will result in a CPU temp of 40 C, resulting in an estimated life of 18,359 hours, a difference of 4,962 hours. Now what does this mean? If you run this C366 flat out for an average of four hours per day, running at spec will result in failure in 16 years. Running the same CPU overclocked will result in failure in 12.6 years. How long did you keep your last CPU?
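The conversion from life hours to calendar years is simple arithmetic, sketched below. The function name is illustrative, and a 365-day year is assumed.

```python
def years_until_failure(life_hours, hours_per_day=4.0, days_per_year=365):
    """Calendar years needed to use up life_hours at a given daily usage."""
    return life_hours / hours_per_day / days_per_year


spec_years = years_until_failure(23_321)  # at spec, "Average" curve
oc_years = years_until_failure(18_359)    # overclocked, "Average" curve
print(f"spec {spec_years:.1f} years, overclocked {oc_years:.1f} years")
```

Raising `hours_per_day` to 24 shows how much sooner the same life figures run out on a machine that is loaded around the clock.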
All of this assumes you are running the CPU flat out. In the real world, this does not happen. Most of the time the CPU is loafing along; if you use a CPU idle program like Waterfall or CPU Idle, the CPU can be running at 1% of its rated power at rest, such as when you are doing some word processing and pause to look at your work. The fact is that CPUs just do not work all that hard all the time, and as such life expectancy is increased by these under-powered conditions. In addition, there is “power-up” stress on the CPU just by turning your system on and off, which can decrease life even for a non-overclocked CPU.
There is a message here – Heat does degrade and in a measurable way. Heat will kill your CPU, and the more heat, the quicker. It should be your objective, overclocking or not, to run your systems at the lowest possible temperature. Whether or not the projected CPU Life coincides with economic life is a decision we each make and obviously impacts what you will tolerate as to how hot your CPU runs.
Now there is something interesting here, although I’ll be the first to say I don’t know how much it holds up: if you run the CPU under Normal Temp, you should see an increase in CPU Life. How much? Well, if you run the CPU at 0 C, the formula gives 46,493 hours for the “Hardy” CPU, 72,054 for the “Average” and a whopping 173,059 hours for the “Weak” CPU, all compared to the “Normal” of 30,000 hours. The CPU that is most sensitive to heat is also the one that gains the most from cooling. We have all seen performance increases at reduced temperatures, so it is not unreasonable to see increased CPU Life as well.