The Evolution of Aftermarket Heat Sink / Waterblock Testing


The testing of CPU cooling devices in the electronics industry is an activity whose development has proceeded apace with that of the processors themselves. Some years ago the heat source was a low wattage small silicon die with abundant space around it. In contrast, today’s CPUs are characterized by vast heat generation, single or multiple sources within the silicon, a die that is ‘protected’ with a copper cover (the integrated heat spreader or IHS), and close proximity to other heat-generating motherboard components. Additionally, the thermal grease initially used for the thermal interface material (TIM) between the CPU die and the IHS (TIM1) is now commonly replaced with a soldered interface. In short, there is a larger area with a higher heat flux, but less space for the cooling solution.

Naturally, cooling development has been forced to adapt to accommodate the changed conditions, and we have observed the progression from small aluminum finned sinks using a thermal pad, to fan assisted HSFs, to water cooling and phase change systems, to heat pipe enhanced HSFs, and most recently commercial thermosyphons. To aid in heat transfer, TIMs have also evolved from adhesive pads to filled greases to solders to diamond composites and beyond. CPUs have continually and rapidly evolved, and it is clear that the requisite cooling methods and materials, which must provide more cooling in a smaller package, have evolved in parallel.

As the heat sources, interface materials, and heat sinks have evolved, so too have the methods of testing these cooling solutions. CPU performance is often heat limited, so better cooling can, and truly does in the overclocking world, result in higher CPU performance. But it is important to note that the amount of testing necessary to define performance capability also increases with CPU heat generation complexity.

Today, the copper slug with a thermocouple is no longer capable of yielding test results predictive of the cooling device’s performance on a specific CPU. If aftermarket testing is to remain relevant to gauging the performance of current and future sinks, the test equipment and methods will have to be upgraded. This paper will discuss the challenge, and some possible approaches along with their respective limitations.

The Heat Source

The ‘heat source’ in a desktop PC is much more than the CPU, as is the ‘system cooling’ more than the sink that is affixed to the CPU. A somewhat simplistic starting point is the characterization of the CPU itself, then the CPU package, the socket and mobo (in terms of traces and planes), the mobo layout in terms of secondary heat sources, and finally the in-case airflow depending on the components and their arrangement in terms of secondary cooling. If this seems an exaggeration, go to Fry’s and physically compare the SLI mobos with the non-SLI boards, noting as well the great differences even between the various SLI boards; many different worlds. One size does not fit all; cooling is addressed at a system level, but in general the higher the performance (heat load), the more sophisticated the solution.

This paper considers one element of the multiple heat sources and cooling system, the testing of the CPU heat sink using a specific methodology. The aftermarket ‘we’ inhabit is a strange place, not because the physics are different but due more to the use of sloppy and/or incorrect terminology. The engineering ‘world’ attempts to use words with rather strict meanings to facilitate communication, which is to say, to reduce misunderstandings.

A good example is the buzz phrase “realistic testing”; which a normal person would think means putting the sink on a CPU and directly measuring its cooling efficacy, the presumed metric being temperature. But things are not so simple; which temperature? And where? And by what method? To what degree of accuracy? Dare I add repeatability?

So when I hear “realistic” I think fake; in a real sense there is no such thing, only approximations. The pertinent question becomes: What are the assumptions that can be made for simplification and still yield a useful result? To answer this question the goal must be defined; what is to be tested and why?

For the purpose of this discussion we will consider the CPU heat sink as the device under test (DUT), but we will remain cognizant of the huge effect of the removal of significant on-board air flow when a waterblock is used in place of an HSF. Since some ‘enthusiasts’ ignore air flow, any analysis limited to waterblocks becomes most incomplete as a heat sink comparison because the effects of changed board level air flow are ignored.

Additionally, as an actual powered board is not normally used with a die sim, the thermal impact of variables such as big ‘universal’ mounting plates vs. narrow arms is ignored in terms of on-board air flow effects. This is true even for WBs. As can be appreciated, with each simplification and approximation the test becomes less realistic.

“Realistic Testing” is a goal, not a reality.

Let us consider a single core CPU with a single program used for loading purposes. This creates a somewhat constant heat ‘output’, but it is by no means isothermal (uniform heat). Unfortunately, with a “real” application, both the heat output magnitude and peak temperature location vary. This can clearly be observed in this article at Procooling. Also, consider multiple cores as they turn on and off and balance load and the matrix becomes complex indeed. Finally, all modern desktop CPUs are mounted in a socket and have an IHS, and both of those are points of contact for potential heat flow. The removable IHSs are soon to be a thing of the past, and thus there is no point in designing a new test platform for bare silicon dies.

So, the heat source the sink has to deal with is the total package: mobo planes and traces + socket and pins or contacts + pcb + CPU silicon + Tim1 + IHS + Tim2 + mounting parameters. From the silicon source, the heat is moving in many directions. At a 2005 IDF in Taipei the ‘secondary losses’ were generalized as being ~20%, but this is both board and sink specific. Thus, it should be obvious that there is no such thing as a ‘realistic’ test bench for anyone without a multi-million dollar budget; the goal then becomes a workable approximation of a CPU heat source.

The Thermal Test Vehicle (TTV)

This is not an unresolved ‘issue’ in the thermal management industry; a TTV is used, and there are several. The ubiquitous one is that of Intel, provided to their channel suppliers and regularly updated to reflect the characteristics of their evolving processors. The Intel TTV (of resistors) is provided in a CPU package on a dual socket ‘board’ which is a bare epoxy laminate; other than lateral socket convection losses, the entire secondary path is eliminated. Why? Since the purpose is to characterize sink performance on a normalized basis, variables not specific to the DUT are eliminated.

Each processor for which a given TTV is applicable has a correction factor by which the test results can be directly related to the expected actual component performance. Obviously if a TTV were used for a sink not based on air (flow) cooling, then the assumptions regarding ‘normalized’ testing would no longer be valid, nor would any specific correction factor relating the test result to expected device performance. This does not mean that the test is not valid, merely that the results require correlation with specific CPUs.

The equipment, procedures, and data requirements for TTV testing are defined. Test facilities are audited (I was at a ‘previous’ facility) and the submittal packages are reviewed. The measured parameters are Tcase (by means of a calibrated Type T TC soldered, or bonded, in a defined groove machined into the IHS) and the applied voltage and current as measured with calibrated instruments to characterize the input power. For an HSF the airflow can be provided with either a wind tunnel or a fan having a specified setup.

These requirements can be seen HERE at Intel’s site (Appendix D, page 75, describes the TC placement in the IHS). Tens, and hundreds, of millions of dollars are allocated for components and production facilities based on this testing, and only a fool would dismiss this serious and peer-reviewed methodology without thoroughly analyzing the TTV design and then having very concrete proof of a flaw.

There is a spurious argument circulating suggesting that the Intel TTV is inappropriate for testing WBs, but the TTV heat source cannot distinguish the sink’s utilization of air or water. The thermodynamics are quite the same since secondary heat paths are eliminated: heat moves from hot to cold, and the rate is related to the temperature differential and relative areas involved (across the resistance of Tim2). In industry, TTVs are used as heat source simulators for all cooling technologies: HSFs, WCing, thermosyphons, and phase change. Also, note that in industry, the aftermarket ‘waterblock’ is called a cold plate.

There are other ways to make a TTV; resistors and diodes (or RTDs) can be fabbed in wafers to actually make a silicon heat source, then packaged as a conventional CPU. This approach is much more expensive but enables the actual thermal mapping within the emulated CPU by process activity; for the size, power, and number of on-die CPUs being evaluated. These TTVs are not distributed, but to some are ‘loaned’ upon occasion. As the (very complex) board is powered, the sink test results correlate more directly to actual performance, but still have a correction factor. This is vastly more sophisticated, but still only a model that requires correlation to specific CPUs.

So the aftermarket ‘enthusiast’ question might be phrased: Accepting that ‘we’ cannot test with the accuracy (relevance?) of industry, what manner of testing is useful? Here “useful” is defined as meaning ‘yield a result indicative of device performance ON A CPU’. This relevance is perhaps limited to only a small group of CPUs. But the key is that the test results must be predictive of actual performance.
-> If the test results do not equate to the device’s actual performance, the test is worthless.

One could ask; “useful” for what purpose?

Two justifications have been proposed: product design, and comparative performance testing.
Note that for industry, there is a third category: product qualification testing.

Product Design

Is the range of applications such that a single product ‘design’ will be best for all applications? How is ‘best’ to be defined, and where does cost enter the equation?

There exists no design that is not an assemblage of compromises. Accepting these compromises as a given, a good test bench and associated methodology for product design testing MUST be able to distinguish the individual effects of design variations; this is referred to as parametric testing and the results are the inputs for theoretical modeling. Such equipment is complex and expensive, and requires a qualified operator to obtain repeatable results.

Of course products can also be ‘designed’ empirically; keep trying ’till you get it ‘right’, if ‘right’ can even be distinguished amidst all the other uncontrolled variables. The ‘looks good to me’ products are today only supported by uninformed buyers placing BS or bling over documented performance. Do not underestimate the importance of this group. They support many manufacturers in many industries, and this is what marketing and sales are all about.

Comparative Product Testing

The description of product differentiation is the purpose of comparative product testing; specific performance capability is one aspect, while convenience, size, weight, and noise are some others. Considering only the thermal performance capability, a useful test bench and methodology must be able to yield results that are consistent with the devices’ actual performance on CPUs on a mobo.
ANY test bed can yield a number, but if contradicted by actual use – the test bed is junk.

There are several tiers of cooling devices on the market which can be nominally sorted by price, technology, size, and performance. Within those groups the actual performance differences may be quite small; to consistently distinguish between them will necessitate good equipment, good procedures, and multiple trials.
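The need for multiple trials can be pictured with a minimal Python sketch that compares repeated °C/W readings for two devices. The readings, the function name `is_resolvable`, and the two-standard-error threshold are all assumptions made for illustration; the threshold is a common rule of thumb, not a procedure taken from this article.

```python
import math
import statistics


def is_resolvable(trials_a, trials_b, k=2.0):
    """Return True if the mean C/W values differ by more than k combined standard errors.

    This is a rule-of-thumb criterion (an assumption), not a formal significance test.
    """
    if len(trials_a) < 2 or len(trials_b) < 2:
        raise ValueError("need at least two trials per device")
    # Standard error of each mean: sample standard deviation / sqrt(number of trials).
    se_a = statistics.stdev(trials_a) / math.sqrt(len(trials_a))
    se_b = statistics.stdev(trials_b) / math.sqrt(len(trials_b))
    difference = abs(statistics.mean(trials_a) - statistics.mean(trials_b))
    return difference > k * math.hypot(se_a, se_b)


# Hypothetical C/W readings from five remounts of each device.
sink_a = [0.150, 0.152, 0.151, 0.149, 0.150]
sink_b = [0.140, 0.141, 0.139, 0.140, 0.142]
print(is_resolvable(sink_a, sink_b))
```

With a single trial per device there is no scatter estimate at all, which is why the sketch refuses to run on fewer than two readings.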

‘Junk testing’ is not only a disservice to the community, it can be damaging to the manufacturer. Reviewers who publish test results should be prepared to describe their test equipment, its calibration, the procedures followed, and a genuine defense of their test results, including their personal qualifications and experience. Just ‘real world’ kinda stuff, eh guys.

Product Qualification Testing

Though not the province of aftermarket ‘enthusiast’ testers, the general parameters of qualification testing are well worth understanding; some such data might even be useful despite the skepticism some have for manufacturers’ data. It is necessary to distinguish between words, a number on a retail product box, and numbers in graphs or on a spec sheet; they are intended for different audiences and likely have different sources.

In general; spec sheet data from reputable companies may be taken at face value, but do take the effort to read the test protocols to understand what the numbers actually mean and how so derived; unrealistic conditions will generate unrealistic results oft times used for consumer marketing programs. Notwithstanding such; tested products so described have been ‘qualified’ per the described test methods, and no reputable manufacturer remains in business presenting false data.

What is often the case is that a single ‘value’ is described and nothing more; some offshore companies are well known for this. Even more common are products presented with no data at all; does the WCing market come to mind (with a single exception)? Here can be seen the substitution of sales blather for qualification testing. Untested products cannot even be presented in the commercial marketplace, but they sure do well with ‘enthusiasts’.

So what do the different types of testing mean to ‘enthusiasts’?

Product Design Testing for ‘enthusiasts’ has not much relevance at all, though the data is always interesting to observe. Parametric testing is not product testing; for example it can be an assessment of the effects of pin shape, size, or spacing. Those active in product design will financially support that level of testing appropriate to their specific methodologies and design goals.

For a product designer extreme procedural rigor and numerical accuracy are absolute necessities; repeatability must be demonstrated and continually verified. Product design testing is focused on new products and is extremely confidential.

There is no overlap between the testing needs of product designers and those of end users. For an end user it is present product utility that is relevant, not ‘design’ blahblah used as a sales pitch.

Comparative Product Testing is the goal of aftermarket ‘enthusiast’ testing; in the real world of sinks or WBs on a CPU, which one works better? This is the question of relevance. How much one wants to spend on test equipment will define how well one can distinguish between the ‘close’ performers; it’s all about resolution. Worth observing is that good testing done at high resolution can define ‘specification winners’ whose real world performance differences are minimal; how much difference is that 0.01°C/W going to make? (1.5°C @ 150W; for how many $s?)
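The arithmetic behind the parenthetical is simply the thermal resistance difference multiplied by the heat load. A minimal sketch follows; the function name and the extra 50 W and 100 W rows are illustrative assumptions, and only the 150 W case comes from the text.

```python
def temperature_difference_c(delta_c_per_w, power_w):
    """Temperature difference (deg C) caused by a thermal resistance difference at a given load."""
    return delta_c_per_w * power_w


# The 0.01 C/W gap from the text, evaluated at a few heat loads.
for power_w in (50, 100, 150):
    delta_t = temperature_difference_c(0.01, power_w)
    print(f"{power_w:>3} W -> {delta_t:.2f} deg C")
```

At lower heat loads the same 0.01°C/W gap shrinks proportionally, which is the point being made about ‘specification winners’.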

Product Qualification Testing is the province of the manufacturer doing so to meet a standard or specification. But this data, if known for 2 products, can be quite significant – for it directly answers the comparative performance questions of interest to the ‘enthusiast’. Fan curves are one example, and pump curves are another.

(Reputable) Manufacturer qualification data will almost always be of higher reliability than ‘enthusiast’ data, so long as the test conditions are relevant. This is a consequence of available resources; equipment budgets, personnel qualifications, and time.

What are the present (CPU heat) die simulator options?

Focusing specifically on comparative product testing, the required information is defined by the units of thermal resistance, °C/W.

°C, temperature, seems straightforward, but specifically where and how is it measured? While “a” temperature could be measured in any place so long as it was consistent and repeatable, doing so would make every die sim unique (the present deplorable situation). A much better method is to define the location and measurement method so that all measured temperatures are directly comparable, no matter the test bench.

This can be done simply by defining the method described by Intel for their TTV as that required for all die sims; this is Tcase (Tc, by definition). AMD’s use of the same measurement can be seen HERE.

The groove will create a shadow; this is understood and can be measured experimentally by those so interested. As it will have the same effect on all sinks and WBs, its absolute value is not germane to comparative testing. (It is a well known value BTW, not sure if an NDA issue; in any case no biggie.) Note that this method works exactly the same for an IHS or a copper slug: same problem, same solution.

W, the power in Watts, is also not difficult so long as an independent heater (element) is used; a measurement of the voltage and current yields the (nominal) power input. This is the reason for the existence of Intel’s TTV, and the principal attraction of a copper slug with a cartridge heater, even though the slug’s secondary losses are, in fact, rarely quantified.
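Putting the two quantities together, a °C/W figure can be sketched as follows. The function names and sample readings are hypothetical, and the choice of reference temperature (for example inlet air or coolant temperature) is an assumption, since this article does not specify it.

```python
def input_power_w(volts, amps):
    """Nominal heater input power from the measured voltage and current."""
    return volts * amps


def thermal_resistance_c_per_w(t_case_c, t_reference_c, power_w):
    """Thermal resistance in C/W from Tcase, a reference temperature, and input power.

    The reference temperature is an assumption here (e.g. inlet air or coolant).
    """
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (t_case_c - t_reference_c) / power_w


# Hypothetical readings: 12.0 V and 10.0 A into the heater, Tcase 45.0 C, reference 25.0 C.
power = input_power_w(12.0, 10.0)
print(f"{thermal_resistance_c_per_w(45.0, 25.0, power):.3f} C/W")
```

This uses the nominal input power; any secondary losses would reduce the heat actually passing through the device under test.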

If available, the Intel TTV is the easy solution; but they are a channel only item. The (minimal) secondary losses with the TTV are of interest, but only in an academic sense as TTVs are not attainable.

Option #1, the current copper die sim ‘slug’.

Let us consider the ‘proven’ test method, a copper slug. The most benign assumptions are that the heat flux is quasi-isothermal, and that the heat input is known (though seldom are the secondary losses quantified); but the temperature is typically poorly defined due to different TC or PRTD placements, resulting in bench-specific C/Ws. This method worked reasonably well as long as the slug and silicon were similarly sized, and CPU dies were bare.

So, what is the ‘new problem’ today with using a copper slug as a die sim?

There are four problems warranting discussion:

The first has already been described, the lack of consistency in the temperature measurement location yielding different numbers. This is easily corrected by adopting the Intel Tc measurement method.

The second problem is the characterization of secondary (heat path) losses previously alluded to. Most slug testers have highly insulated the slug, but not all have actually quantified the actual losses (mine ran 2 to 5% and I would ‘correct’ the input power by this amount). The greater difficulty with this approach is the relevance of the insulated slug losses to the actual reduction in the applied power were the DUT on a powered mobo. Using the very general estimate of 20% for secondary path losses it is clear that copper slug die sim heat loads are probably overstated by 15% or more.
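The effect of the two loss figures can be sketched numerically. The function names and the 100 W nominal input are assumptions for illustration; the 5% and 20% fractions are the figures quoted above. Note that the ratio form used here gives a somewhat larger number than simply subtracting 5% from 20%.

```python
def heat_into_sink_w(input_power_w, secondary_loss_fraction):
    """Heat reaching the device under test after secondary path losses."""
    if not 0.0 <= secondary_loss_fraction < 1.0:
        raise ValueError("loss fraction must be in [0, 1)")
    return input_power_w * (1.0 - secondary_loss_fraction)


def slug_overstatement(slug_loss_fraction, board_loss_fraction):
    """Fractional excess of the slug's heat load over a powered board's, for equal input power."""
    return (1.0 - slug_loss_fraction) / (1.0 - board_loss_fraction) - 1.0


# Hypothetical 100 W nominal input; 5% slug losses versus the ~20% general board estimate.
print(f"slug:  {heat_into_sink_w(100.0, 0.05):.1f} W into the sink")
print(f"board: {heat_into_sink_w(100.0, 0.20):.1f} W into the sink")
print(f"overstatement: {slug_overstatement(0.05, 0.20):.1%}")
```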

The third simple problem is the dimensional correlation of the slug die sim with the CPU die size, the heat flux, and the spreading due to the IHS. With some hairy assumptions and beer-mat calcs one could proceed, except that it all falls apart for dual core processors. This is not to say that one larger slug face could not be used to simulate a dual core processor, only that the results would no longer be predictive without extensive correlation. Which aftermarket tester has the resources to do such CPU specific correlations?

What was the purpose of the test? Testing with a copper slug will yield results based on a copper slug: What does that mean or imply with respect to a CPU? Or to dual CPUs? Is a specific correlation going to be established, or are the results to be accepted as gospel based only on hope and faith?

An illustrative example is the recent introduction of the Swiftech Apogee which was designed using a TTV. Swiftech described its performance as being nominally equal to the Storm when mounted on a CPU. But when the aftermarket testers compared the two WBs on several copper die sim slugs; the Apogee was a dog. (Note that different temperatures were being measured and ‘compared’.) But end users having both WBs reported that both were in fact similar in performance (if I correctly understood all the noise). So the question can be asked:

Should WBs be designed to perform well on a specific test platform?
Do we need a reality check here? Something is clearly amiss!

The fourth and very difficult problem is the (non) flatness of the slug face; and even when initially flat, the die sim face corners will round and a slope with erosion grooves develops. This is NOT trivial; I have measured the flatness with optical flats and tracked its degradation by the number of mountings. After 20 mountings there is noticeable degradation, after 40 it can be as much as 0.1°C (using a PRTD with 0.01°C resolution). The progressive degradation will trash the results of comparative tests done over time.

A baseline must be established and periodically verified; but without the flatness measurement that capability is a sham, and eventually the “comparisons” will just be a jumble of numbers. Furthermore, hand lapping cannot produce a ‘flat’ surface on an area so small. Even with a larger plate attached, it can take days to get the flat spot over the die (using an optical flat for verification). With a lapping machine, restoring the flatness is an hour or two; again with the requisite inspection capability – but who has a lapping machine and optical flats at their disposal?

One approach to the slug face flatness issue is to ‘relax’ the accuracy expectations. After all, if a TC is used, the resolution is only 0.1°C to begin with. But this is merely to make junk testing acceptable. Yet again, the fulcrum is the flatness inspection capability; without such what is known?

Option # 2, if a copper slug is ‘bad’, is a CPU worse?

The alternative to a slug is using a CPU on a powered mobo with the Intel TTV temperature measurement method. This would yield a repeatable Tc value, but introduces 2 other issues; the variability of Tim1 and the applied power to the DUT.

The variability of Tim1 seems to be dependent upon the CPU manufacturer; AMD CPUs with grease based Tim1 have a known problem, Intel with soldered Tim1 should not. The potential problem has 2 dimensions; durability (the AMD manifestation due to ‘extreme’ temperature cycling and/or repeated mountings) and unit to unit variation. The second issue is speculation as it has not been demonstrated (for the 775 LGA) – other than via the AMD Tim1 degradation.

At present, the only viable CPU option is the 775 LGA due to the socket retention method (AMD has a similar package not yet released); so if the same CPU is used for comparative testing the unit to unit variation issue is moot, and additional CPUs could be ‘qualified’ by retesting several items. Additionally, Tim1 degradation over time/mountings could be determined by periodic testing of a ‘baseline reference sink’, or two.
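Periodic retesting of a baseline reference sink lends itself to a simple drift check. The sketch below is only one possible bookkeeping approach; the function name, the sample values, and the 0.005°C/W tolerance are all hypothetical.

```python
def drifted_retests(baseline_c_per_w, retests_c_per_w, tolerance_c_per_w):
    """Return (index, drift) pairs for reference-sink retests that moved beyond the tolerance."""
    flagged = []
    for index, value in enumerate(retests_c_per_w):
        drift = value - baseline_c_per_w
        if abs(drift) > tolerance_c_per_w:
            flagged.append((index, drift))
    return flagged


# Hypothetical reference-sink results logged after successive batches of mountings.
baseline = 0.200
retests = [0.200, 0.201, 0.204, 0.207]
for index, drift in drifted_retests(baseline, retests, tolerance_c_per_w=0.005):
    print(f"retest {index}: drift {drift:+.3f} C/W exceeds tolerance")
```

A flagged retest would suggest that results gathered since the last good check are no longer directly comparable with earlier ones.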

The measurement of the applied power is the real problem with using any CPU as the heat source: how much heat is actually applied to the DUT? While all kinds of assumptions can be made regarding secondary losses (and 20% is as good an estimate as any), the input power to the CPU simply is not known. But is this ‘unknown’ power repeatable? Is there a measure of processor activity that can be related to an operating program with a specific CPU on a specific board? Is this repeatable with a defined Vcore? Is this sufficient for comparative testing?

N.B. I do not know the answers to these questions as I have never performed any mobo testing; I am asking. This is a topic for discussion by those with actual experience.

“W” will be bogus, derived from the CPU frequency, voltage, etc.; so a C/W so derived would be specific to the particular CPU/mobo/freq/voltage/program combination. Is this any worse than what we now have with copper slugs? Based on my experience with non-flat die faces, I think not. And secondary path losses would be more realistically considered than by the present strategy of ignoring them.

If ‘manageable’ in this manner, the multiple core issues are then somewhat addressed as well; of course the test CPU will have to be upgraded as they progress from single, to dual, to quad core, etc. And of course there will be periodic mobo upgrades to correspond with the CPU evolution.

If using a CPU heat source, I believe the in-case air temps will need to be measured and reported to nominally characterize the secondary losses, no different than if using a radiator – though a chiller (for WCing) and an environmental chamber are what are appropriate for the highest accuracy and repeatability (moving beyond the more casual aftermarket testing methods however).

A notable advantage of a 775 LGA CPU used as a die sim is its durability, but this has two components: 1) the IHS surface, and 2) the TC durability in its machined groove. WRT the IHS, I have made over a hundred mountings with no discernible degradation of the surface or its finish; I would speculate this is due to the nickel plating. Note that the IHS is not flat, nor should it be lapped to make it so. It is designed to maximize the contact pressure over the CPU while the vertical sides provide a stabilizing perimeter. (A lapped IHS should be considered as invalidating the test results.)

The TC durability will not be as good when bonded as if soldered, but the appropriate soldering fixture is beyond any aftermarket tester’s budget I suspect. It is however not difficult to verify that the TC placement is correct; simply verify the continuity of the copper TC lead and the IHS as was done during the initial TC bonding (or soldering) procedure. So long as the resistance remains negligible, all is well.

Can a slug and a CPU based test bench be compared?

Why not? There are several aftermarket testers who have test benches with all of the pieces, less perhaps the grooved IHS CPU, and the corresponding groove in the slug face. For comparative testing the specific CPU and mobo used are not critical, as long as they are the same for all comparisons. My personal preference would be for a known ‘hot’ CPU (buy 2, one for a backup – groove, bond, and cal both TCs together), and an SLI mobo having specific mosfet forced air cooling (the Abit Fatality AN8-SLI is one example, there are others).

For focus perhaps the general goal should be repeated: The intent of this suggested CPU sink testing procedure is to describe an aftermarket test platform which is affordable using available components, and at the same time will yield results reasonably corresponding to the expected performance of all types of cooling solutions installed on a powered mobo.

Anyone care to step up to the plate?
I’ll bet dollars to donuts there’ll be some sink/WB re-rankings resulting.

Appreciation is expressed to Brian Smith and Derek Peak for their suggestions and corrections; and most importantly to Lee Garbutt whose testing revealed ‘what’s really going on’.

be cool

Bill Adams CoolingWorks Inc.

