The Intel PIV’s Thermal Diodes

An overview of how these work and what “CPU Thermal Throttling” does in BIOS. — Joe

SUMMARY: Intel’s PIV uses two thermal diodes – one to measure temps, the other to prevent the CPU from frying.

What follows is a condensation of Intel’s description for the PIV’s Thermal Monitoring circuit – these are in the smaller boldface type. My comments are larger and not boldface. I have shortened Intel’s descriptions and moved some descriptions around to (hopefully) make it a bit more digestible. These documents are available for download¹ on Intel’s site if you want the full treatment.

If you plan to buy a PIV motherboard, you should see settings in BIOS that say something like “CPU Thermal Throttling” with a range of settings you can select, usually from 12.5% to 87.5% in 12.5% increments. This is part of Intel’s Thermal Monitor system, the intent of which is to provide a safety valve in case the CPU is not being cooled adequately.

There are two independent thermal sensors in the Intel Pentium 4 processor in the 478-pin package. One is the on-die thermal diode. The other is the temperature sensor used for the Thermal Monitor and for THERMTRIP#.

The Thermal Monitor’s temperature sensor and the on-die thermal diode are independent and isolated devices with no direct correlation to one another. Circuit constraints and performance requirements prevent the Thermal Monitor’s temperature sensor and the on-die thermal diode from being located at the same place on the silicon. As a result, it is not possible to predict the activation of the thermal control circuit by monitoring the on-die thermal diode.

Intel uses the Thermal Monitor for two things:

  1. To turn the CPU off and on when things get too hot (Clock Modulation), and
  2. To shut the CPU down totally if temps get way out of hand.

Clock Modulation


By using a highly accurate on-die temperature sensing circuit and a fast acting temperature control circuit, the processor can rapidly initiate thermal management control through Clock Modulation, defined as periodically removing the clock signal from the processor core, which effectively reduces its power consumption to a few watts. A zero watt power dissipation level is not achievable due to transistor leakage current and the need to keep a few areas of the processor active.

Therefore, by cycling the clocks on and off at a 50% duty cycle for example, the average power dissipation can drop by up to 50%. Note that the processor performance also drops by about 50% during this period, since program execution halts while the clocks are removed.
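As a rough illustration of that arithmetic, here is a minimal Python sketch. The 60 W "clocks running" figure and the 3 W "clocks stopped" figure are assumptions picked for illustration, not Intel numbers. The model simply weights the two power levels by the time spent in each.

```python
def average_power(duty_cycle, active_watts, stopped_watts):
    """Time-weighted average power for a given clock duty cycle.

    duty_cycle is the fraction of time the clocks are running (0.0 to 1.0).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_watts + (1.0 - duty_cycle) * stopped_watts


# Hypothetical figures for illustration only (not from Intel's documents).
ACTIVE_WATTS = 60.0   # assumed power with the clocks running
STOPPED_WATTS = 3.0   # assumed "few watts" left with the clocks removed

for duty in (1.0, 0.5, 0.125):
    watts = average_power(duty, ACTIVE_WATTS, STOPPED_WATTS)
    print(f"duty {duty:.1%}: about {watts:.1f} W")
```

Because the clocks-stopped power is not zero, a 50% duty cycle saves a little less than 50% in this model, which fits Intel's "up to 50%" wording.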

Thermal Monitor Implementation

The Thermal Monitor is integrated into the processor silicon and includes a highly accurate on-die temperature sensing circuit, a signal (PROCHOT#) that indicates the processor has reached its maximum operating temperature, and registers to determine status as well as a thermal control circuit that can reduce processor temperature by controlling the duty cycle of the processor clocks.

The processor temperature is determined through an analog thermal sensor circuit comprised of a temperature sensing diode, a factory calibrated reference current source, and a current comparator. Each processor is individually calibrated during manufacturing to eliminate any potential manufacturing variations. Once configured, the processor temperature at which the PROCHOT# signal is asserted (trip point) is not reconfigurable.

The reason you see this in BIOS is explained below:

Operation and Configuration

To maintain compatibility with previous generations of processors, which have no integrated thermal logic, the Thermal Monitor circuit is disabled by default. During the boot process, BIOS must enable the thermal control circuit, or a software driver may do this, after the operating system has booted.

Enabling the thermal control circuit allows the processor to maintain a safe operating temperature without the need for special software drivers or interrupt handling routines. When the thermal control circuit has been enabled, processor power consumption will be reduced within a few hundred clock cycles after the thermal sensor detects a high temperature.

The thermal control circuit goes inactive once the temperature has been brought back down below the thermal trip point, although a small hysteresis (~1 °C) has been included to prevent multiple PROCHOT# transitions around the trip point. External hardware can monitor PROCHOT# and generate an interrupt whenever there is a transition from active-to-inactive or inactive-to-active.

Now this is very interesting; what this means is that you potentially could have a “smart fan” that monitors this circuit and, for example, speeds up when things get too hot and then slows down when it’s cooler.
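To make the hysteresis idea concrete, here is a small Python simulation of a PROCHOT#-style signal. It is only a sketch: the 70 °C trip point and the temperature readings are made up (the real trip point is set at the factory), and the function name is hypothetical. The ~1 °C hysteresis comes from Intel's description above.

```python
def prochot_transitions(temps_c, trip_c, hysteresis_c=1.0):
    """Return (index, active) for each change of a simulated PROCHOT# state.

    The signal goes active at or above trip_c and only goes inactive again
    once the temperature falls below trip_c - hysteresis_c.
    """
    active = False
    transitions = []
    for i, temp in enumerate(temps_c):
        if not active and temp >= trip_c:
            active = True
            transitions.append((i, active))
        elif active and temp < trip_c - hysteresis_c:
            active = False
            transitions.append((i, active))
    return transitions


# Hypothetical trip point and readings, for illustration only.
TRIP_C = 70.0
readings = [68.0, 69.5, 70.0, 69.6, 70.1, 69.2, 68.9, 68.0]

for index, active in prochot_transitions(readings, TRIP_C):
    # A "smart fan" controller could react to each of these edges.
    action = "speed fan up" if active else "slow fan down"
    print(f"sample {index}: PROCHOT# {'active' if active else 'inactive'} -> {action}")
```

Without the hysteresis band, the readings that hover around 70 °C would flip the signal several times. With it, this sequence produces just one active edge and one inactive edge.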

The duty cycle is configurable in steps of 12.5%, from 12.5% to 87.5%. For any duty cycle, the maximum time period the clocks are disabled is ~3 µs. This time period decreases as frequency increases. To achieve different duty cycles, the interval between stopping the clocks is adjusted to achieve the desired ratio.

For example, if the clock disable period is 3 µs, and a duty cycle of ¼ (25%) is selected, the clock on time would be reduced to approximately 1 µs [on time (1 µs) ÷ total cycle time (3 + 1) µs = ¼ duty cycle]. Similarly, for a duty cycle of 7/8 (87.5%), the clock on time would be extended to 21 µs [21 ÷ (21 + 3) = 7/8 duty cycle].
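The same arithmetic can be worked out for every setting with a few lines of Python. This is a sketch that assumes the 3 µs clock-off period from Intel's example; Intel describes that as a maximum which shrinks as frequency rises, so real on-times may be shorter. The function name is my own.

```python
def clock_on_time_us(duty_cycle, off_time_us=3.0):
    """Clock-on time that yields duty_cycle for a fixed clock-off time.

    duty_cycle = on / (on + off), so on = off * duty / (1 - duty).
    """
    if not 0.0 < duty_cycle < 1.0:
        raise ValueError("duty_cycle must be strictly between 0 and 1")
    return off_time_us * duty_cycle / (1.0 - duty_cycle)


# The seven settings: 12.5% to 87.5% in 12.5% steps.
for eighths in range(1, 8):
    duty = eighths / 8
    on_us = clock_on_time_us(duty)
    print(f"{duty:6.1%}: clocks on {on_us:5.2f} us, off 3.00 us")
```

For 25% and 87.5% this reproduces the 1 µs and 21 µs figures in Intel's example.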

OK – probably more than you wanted to know.

The Thermal Monitor is a way to keep the CPU active while it may be getting on the warm side. What happens if the heatsink falls off?

In addition to Thermal Monitor, the Intel Pentium 4 processor in the 478-pin package supports the same thermal management features available on the Intel Pentium III processor. These features are the on-die thermal diode and THERMTRIP# signal for indicating catastrophic thermal failure.

THERMTRIP

In the event of a catastrophic cooling failure, the processor will automatically shut down when the silicon temperature has reached approximately 135 °C. At this point, the system bus signal THERMTRIP# goes active and power needs to be removed from the processor. THERMTRIP# stays active until RESET# has been initiated. THERMTRIP# activation is independent of processor activity and does not generate any bus cycles.

Now you look at all this stuff and wonder why Intel couldn’t do something simpler, like the good old in-socket thermistor. There’s a good reason why an on-die sensor makes the better safety valve:

Unfortunately, measuring temperature with a thermocouple on the processor package has some inherent disadvantages when using the resulting data to control a thermal management mechanism.

Thermal conductivity through the processor package creates a temperature gradient between the processor case and silicon. This temperature difference may be large with the silicon temperature always being higher than the case temperature. Since thermocouples measure case temperature, not silicon temperature, significant added margin may be necessary to ensure the processor silicon does not exceed its maximum specification [i.e., fry].

Thermal ramp rates, or change in die temperature over a specified time period, may be extremely high in high power processors, where ramp rates in excess of 50 °C/sec may occur in the course of normal operation. With this type of thermal characteristic, it would not be possible to control fans or other cooling devices based on processor case temperature.

By the time the fans have spun up to speed, the processor may be well beyond a safe operating temperature, which would render any increase in cooling capability useless. The Thermal Monitor resolves this issue by using a highly accurate on-die temperature sensing circuit and a fast acting temperature control circuit so that external thermocouples are no longer needed.
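A quick back-of-the-envelope calculation in Python shows why response time matters so much here. The 50 °C/sec ramp rate is from Intel's text. The fan delays, the 2 GHz clock and the 300-cycle figure (standing in for "a few hundred clock cycles") are assumptions for illustration, and the sketch treats the ramp rate as constant, which is a simplification.

```python
def temperature_rise_c(ramp_rate_c_per_s, delay_s):
    """Temperature rise during a response delay, assuming a constant ramp rate."""
    return ramp_rate_c_per_s * delay_s


RAMP_RATE_C_PER_S = 50.0  # ramp rate quoted in Intel's text

# Hypothetical fan spin-up delays, in seconds.
for delay_s in (0.1, 0.5, 1.0, 2.0):
    rise = temperature_rise_c(RAMP_RATE_C_PER_S, delay_s)
    print(f"fan responds in {delay_s:.1f} s: die heats up about {rise:.0f} C")

# Assumed on-die response: 300 clock cycles at an assumed 2 GHz clock.
throttle_delay_s = 300 / 2.0e9
rise = temperature_rise_c(RAMP_RATE_C_PER_S, throttle_delay_s)
print(f"on-die throttle responds in {throttle_delay_s * 1e9:.0f} ns: "
      f"die heats up about {rise:.6f} C")
```

Under these assumptions a fan that takes a second to respond lets the die climb tens of degrees, while the on-die circuit reacts before the temperature has moved by any measurable amount.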

One Potential Problem

According to Intel’s Specification Update², for all steppings through D0, the CPU may hang if you set the throttle to 12.5% or 25%. The Intel “fix” is not to use these settings.

Software Controlled Clock Modulation using a 12.5% or 25% Duty Cycle may cause the Processor to Hang

Problem: The processor may hang while attempting to execute a floating-point instruction… This processor hang is caused by interactions between thermal control circuit and floating-point event handler.

Implication: The processor will go into a sleep state from which it fails to return.

Workaround: Use a duty cycle other than 12.5% or 25%.

This is a variation of the old doctor joke:

Guy goes to the doctor and says “Doc, it hurts when I do this.” Doc says “Don’t do that.” Ta-dum.

Anyhow, I wrote this up so now you know what these settings do in BIOS.

¹ Intel® Pentium® 4 Processor in the 478-Pin Package Thermal Design Guidelines, August 2001, Document Number: 249889-001

² Intel® Pentium® 4 Processor Specification Update, Release Date: December 2001, Order Number: 249199-016
