
Overclocking Sandbox: Tbred B DLT3C 1700+ and Beyond

hitechjb1,


You seem to understand advanced engineering concepts. Do you know if anyone has done a DOE (design of experiments) to map out the best operating space for a given set of overclocking parameters?

Based on your comments:

Each CPU has its own stable operating region (a multi-dimensional region). Vcore and temperature are two of the many variables. The highest possible voltage and lowest possible temperature give the upper bound (or best case scenario) for a CPU to sustain its max frequency, which is determined by the total time delay across one of the slowest pairs of latches in the CPU (critical latch timing). This shortest time determines the CPU cycle time (Tmin), and its inverse gives the highest CPU overclocking frequency (fmax = 1/Tmin). A higher Vcore gives more drain current (Idsat) driving ability for a transistor to drive a given load (Cload); the time is given by time = Vcore Cload / Idsat.

I'd think one could set up factors for Vcore, temp, frequency, %CPU usage, etc., and measure the parameters used to monitor stability (time to fail, type of failure, etc.). Since overclocking is a combination of many factors, many of them interacting, the best operating space could only be found by measuring the cross terms of the equations, and not by setting OC parameters one at a time as is usually done. What do you think?

For those of you not familiar with DOE methodology, here's a link to an example.

http://www.6sigma.us/DOEUnit1ReadOnly/doeunit11.html

><>Mr. Fri
 
felinusz said:

...
How does one go about roughly determining the "transistor current driving ability, loading, and delay of the slowest pair of latches in a chip" of any given processor? I know that with all the factors involved it would probably be near impossible.
...

Vcore vs processor frequency and cycle time

There is no simple formula to determine CPU cycle time or max frequency precisely, as there are statistical variations in the underlying semiconductor devices, the complexity of hundreds of millions of transistors and wires, exponentially many ways for them to interact, and operating uncertainties such as noise, temperature and voltage fluctuation, …. In CPU design, nominal or worst-case CPU cycle times are determined by electrical and statistical modeling and simulation of the underlying transistors and circuits.

One can attempt to understand such a complex problem by using simplified models. A CPU, or almost any logic chip, is made up of logic gates (such as NAND, NOR, INVERTER, LATCH storage elements, …). The basic logic operations are AND, OR and NOT. NAND is the composite of AND and NOT, with AND followed by NOT; NOR is OR followed by NOT; an INVERTER performs NOT, .... Logic gates are made up of transistors and are connected by wires.

The operation sequence of a CPU or chip is organized into equally spaced time segments of cycle time T, synchronized by a clock of frequency f, where f = 1/T. For example, for a 3 GHz CPU, f = 3 GHz = 3,000,000,000 Hz and T ≈ 333 ps (1 ps = 10^-12 s).

A CPU chip consists of logic gates and storage elements (LATCH or REGISTER). The entire circuit of a CPU chip can basically be viewed as many pairs (~ 10K - 100K) of latch storage elements. Between each pair of related storage elements there are many stages (typically 10-20) of logic gates, which form a register transfer logic path. Initial and computed logic values (or states) are stored and updated every cycle in these storage elements. When a CPU chip is active, at the beginning of a cycle the logic gates get logic values from the associated preceding storage elements (LAUNCH). The logic gates then perform logic operations, all in parallel, within each cycle (COMPUTE). At the end of a cycle, the computed logic values are stored in the corresponding succeeding storage elements (STORE). A STORED logic value in a storage element may become the LAUNCH logic value for the succeeding logic path in the next cycle, so logic values (or states) can propagate through the chip between functional units. This LAUNCH-COMPUTE-STORE operation repeats every cycle indefinitely as long as the CPU is active.

At a given temperature, the minimum cycle time (Tmin) of a CPU chip is determined by the minimum time (Tpath) taken to compute and propagate the logic signal through the slowest pair of storage elements (or slowest register transfer logic path) in the chip.

Tmin = max Tpath (maximized over all register transfer logic paths)
Tpath = d_launch + d1 + d2 + … + dN + d_store
fmax = 1 / Tmin

where d_launch, d1, d2, …, dN, d_store are respectively the delay of the launching storage element, the N logic gates, and storing storage elements in a register transfer logic path. The delay (the various d’s) is given by

delay = Vcore Cload / Idsat
Idsat = k (Vcore – Vt)^n, where n is between 1 and 2

where Idsat is the current driving capacity of a logic gate, Cload is the loading of a logic gate, Vt is the transistor threshold voltage, and k is some constant describing a logic gate. The current Idsat determines the logic switching speed.
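
To make the latch-to-latch timing model concrete, here is a minimal Python sketch; the path names and picosecond stage delays are made-up illustration values, not measurements of any real CPU:

# Toy critical-path model: Tpath = d_launch + d1 + ... + dN + d_store,
# Tmin is set by the slowest path, fmax = 1/Tmin.  All delays are invented (ps).
paths = {
    "integer_alu": [25] + [20] * 12 + [30],   # launch + 12 gate stages + store
    "load_store":  [25] + [22] * 14 + [30],
    "fp_multiply": [25] + [21] * 16 + [30],
}
t_paths = {name: sum(delays) for name, delays in paths.items()}
t_min = max(t_paths.values())        # cycle time is limited by the slowest path
f_max_ghz = 1000.0 / t_min           # ps -> GHz, since 1 GHz <-> 1000 ps
for name, t in sorted(t_paths.items(), key=lambda item: -item[1]):
    print(f"{name:12s} Tpath = {t} ps")
print(f"Tmin = {t_min} ps -> fmax = {f_max_ghz:.2f} GHz")

With these invented numbers the fp_multiply path is the critical one at 391 ps, giving an fmax of about 2.56 GHz; speeding up any other path would not raise fmax at all.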

So before the heat and temperature constraints kick in, to a first approximation (typically n = 1.5 - 2 in the above equation):

For n = 1.5,
delay sqrt(Vcore - Vt) = constant
frequency / sqrt(Vcore - Vt) = constant

For n = 2,
delay (Vcore - Vt) = constant
frequency / (Vcore - Vt) = constant

The term Vt stands for the threshold voltage of the transistors in a chip. So chips with lower threshold voltage would be able to run at a higher max frequency. I suspect that the lower voltage chips such as the Tbred B DLT3C and Mobile Barton have lower Vt, hence can be clocked faster at the same voltage.
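
A minimal numeric sketch of these two scaling laws (to this approximation, frequency ~ (Vcore - Vt)^(n-1)); the Vt of 0.35 V and the 2.4 GHz at 1.6 V anchor point are assumed values for illustration only:

# First-order scaling, ignoring heat: frequency ~ (Vcore - Vt)^(n-1).
# Vt and the reference point (2.4 GHz at 1.6 V) are assumptions.
VT = 0.35                  # assumed threshold voltage (V)
V_REF, F_REF = 1.6, 2.4    # assumed reference: 2.4 GHz at 1.6 V

def fmax_ghz(vcore, n):
    return F_REF * ((vcore - VT) / (V_REF - VT)) ** (n - 1)

for v in (1.6, 1.7, 1.8, 1.9, 2.0):
    print(f"Vcore {v:.1f} V: n=1.5 -> {fmax_ghz(v, 1.5):.2f} GHz, "
          f"n=2 -> {fmax_ghz(v, 2.0):.2f} GHz")

Both exponents give diminishing percentage gains per 100 mV as Vcore rises, which matches the flattening frequency-vs-voltage curves seen in practice, even before heat is taken into account.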

At a given temperature, the higher the Vcore, the smaller the delay of a logic gate in a given circuit, hence the smaller the cycle time T can be, or the higher the clock frequency f can be. As seen above, to first order and without the constraint of heat, the maximum clock frequency varies somewhere between the square root of voltage and linearly with voltage. This is the good news: higher voltage can get it to run faster.

The bad news is that with higher Vcore, the active power (C Vcore^2 f) and leakage power (Vcore^2/R) also increase, where C and R are the equivalent capacitance and resistance used to model the chip power dissipation. Past a certain point, the slow-down and instability effects due to heat and temperature outpace the gain in clock speed, and diminishing returns set in. There is a delicate balance between voltage and temperature at optimal overclocking.

To summarize, Vcore improves the maximum CPU clock frequency (fmax) of a given CPU chip (same architecture and circuit) under a given temperature condition. So with sufficient voltage, the CPU should be able to operate at that maximum frequency (fmax) for a (very) short period of time (1/1000 s – a few seconds) before temperature begins to rise, leading to instability unless sufficient cooling is incorporated to limit the adverse effects of temperature: reduced electron mobility, higher leakage current and perturbation of electronic components.
 
Update and repost for better continuity, last post should come first.

Here I try to explain why lower temperature is needed to run stably at higher Vcore and higher frequency.

What is CPU stability

Stability is a function of Vcore, temperature, Vcore line fluctuation, .... Let's limit the discussion to Vcore and die temperature first. The stable overclocking temperature of a CPU may be much lower than the maximum temperature specified in the data sheet. In general, the higher the overclocking voltage needed to maintain a certain frequency, the lower the overclocking temperature has to be in order to maintain stability.

Each CPU has its own stable operating region (a multi-dimensional region). Vcore and temperature are two of the many variables. The highest possible voltage and lowest possible temperature give the upper bound (or best case scenario) for a CPU to sustain its max frequency, which is determined by the minimum total time delay across one of the slowest pairs of latches in the CPU (critical latch timing). This shortest time determines the CPU cycle time (Tmin), and its inverse gives the highest CPU overclocking frequency (fmax = 1/Tmin). A higher Vcore gives more drain current (Idsat) driving ability for a transistor to drive a given load (Cload); the time is given by

time = Vcore Cload / Idsat

More current means a shorter delay and hence a shorter cycle time Tmin, so the CPU can be clocked faster, up to a certain real limit: the good side. The opposing side is that the active power Pactive also increases drastically and raises the CPU temperature. Pactive = C Vcore^2 f, where f is the overclocking frequency and C is the equivalent CPU capacitive load.
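
To see how drastically Pactive grows, here is a quick sketch; the equivalent capacitance is back-solved from an assumed stock point of about 50 W at 1.5 V / 1.47 GHz (roughly a Tbred B 1700+), so all figures are illustrative:

# P_active = C * Vcore^2 * f, with C back-solved from an assumed stock point.
P0, V0, F0 = 50.0, 1.5, 1.47e9        # assumed: ~50 W at 1.5 V, 1.47 GHz
C = P0 / (V0**2 * F0)                 # equivalent switching capacitance (F)
for v, f_ghz in [(1.5, 1.47), (1.7, 2.1), (1.9, 2.5)]:
    p = C * v**2 * (f_ghz * 1e9)
    print(f"{v:.1f} V @ {f_ghz:.2f} GHz -> P_active ~ {p:.0f} W")

Going from stock to 1.9 V / 2.5 GHz nearly triples the active power in this model (about 50 W to about 136 W), which is why cooling becomes the limiting factor so quickly.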

At or very close to the maximal overclocking frequency, the CPU is walking the edge of that stable overclocking region, beyond which some logic operations fail and the system crashes. In other words, a small perturbation of temperature, current, … at maximal overclocking leads to a timing failure (e.g. not meeting the latch timing mentioned above) or a logic error, and results in a system crash. In practice, depending on the cooling and the base temperature of the setup (e.g. ambient temperature for air, cold water intake for water, …), there is a lower limit on temperature at the highest possible Vcore for a particular chip, so temperature plays a role in determining its stability. The actual stable operating point is in between these lower and upper limits for temperature and voltage.

Many chip and transistor properties depend on temperature: e.g. electron thermal energy/velocity, mobility, … at the electron scale; threshold voltage, leakage current, resistance, signal jitter, voltage line fluctuation, … at the circuit/chip scale. When the temperature is very low, the perturbation is small, i.e. the chance of derailing the CPU is relatively small. When the temperature rises, the perturbations to the CPU are larger, hence a bigger probability of pushing the CPU outside its stable overclocking region. The width of the derailment path of a CPU from its stable overclocking region is a function of temperature, and it increases as temperature increases.

At a certain temperature, with a certain Vcore and frequency, the operating point is determined by Vcore and frequency, as well as by the possible perturbations, which are functions of temperature. The stable operating point of voltage and frequency should be within the stability region, which is the maximal region of voltage and frequency less the perturbation due to temperature. The temperature perturbation increases with temperature.

R_stable_voltage_frequency + R_temperature_perturbation <= R_max_voltage_frequency

where R_... stands for operating region.

Or one can also write, at a certain temperature and voltage,

max_stable_frequency + temp_sensitive_frequency_margin = max_frequency

The term temp_sensitive_frequency_margin varies with temperature; it accounts for the margin needed to run at a certain temperature. The higher the temperature, the bigger the required margin, and the further the stable operating frequency falls from the maximum ideal frequency for a given voltage.

In order to get the (long-term) stable voltage and frequency as close as possible to the intrinsic maximum overclocking frequency of a setup, the temperature perturbation has to be as small as possible.

This explains why one can apply a higher voltage to boot the computer or OS, or run some programs close to the ideal intrinsic maximum frequency, for a (very) short period of time. But if the temperature is not low enough, it will sooner or later crash, since the probability of instability, which varies with temperature, is not close to zero. In other words, the operating point plus the temperature perturbation is outside the maximum operating region.

In summary,
- Vcore primarily determines the ideal CPU max overclocking frequency at a given temperature, based on transistor current driving ability, loading, and the delay of the slowest pair of latches in a chip.

- Temperature plays a role in determining the overall (longer term) instability, or probability of derailment of the CPU and system at a given Vcore and frequency.

- Temperature perturbation (or probability of instability or derailment) increases with temperature. The degree of stability of an operating point measures how tolerable component variations are before resulting in logic errors, and it varies inversely with temperature.


Examples:

- One can boot up the system or even run Prime95 for a very short time at a much higher frequency, given enough Vcore. It means the Vcore is able to satisfy the minimum timing condition at a high frequency, but temperature and current fluctuation may bring the system down eventually if certain conditions are not met.

- At the highest overclocking, a CPU and system have minimal tolerance to perturbation, since a small perturbation results in logic errors. The small-perturbation requirement translates into a lower operating temperature requirement for the CPU.

- The further away from maximal overclocking (e.g. at a much lower Vcore), the more tolerance a CPU and system have to perturbations, component fluctuation, …, which translates into a higher allowable operating temperature.

- A CPU can run stably at a much higher temperature (e.g. 60+ C), at a lower Vcore and lower frequency (e.g. 1.4 – 1.6 V, 2.2 - 2.3 GHz for Tbred B/Barton) than its intrinsic ideal max frequency.

- A CPU needs a much lower temperature (e.g. under 30 - 45 C on air or even lower for extreme cooling) to run stably at high Vcore for sustaining a higher overclocking frequency (e.g. 1.8 – 2.0+ V, 2.5 – 3.0+ GHz).
 
felinusz said:

...
Is this the unique attribute which makes one processor an unusually good overclocking chip, and another of the same week and stepping a kludge?
...

Why can some processors (of the same type) be clocked faster?

Why can CPU chips of the same architecture and circuit (such as XP, Barton), and even within the same silicon process (such as Tbred A, Tbred B/Barton, Barton Mobile), vary so greatly in overclocking frequency? I think this is mainly due to variations in the silicon process; we usually refer to such a grouping as a "stepping". Assuming the same nominal silicon process and underlying circuits for a given CPU type, the key parameters that can affect the overall speed of a CPU are the transistor threshold voltage, the transistor channel length, and to a lesser extent the dopant concentration and electron mobility of the silicon process. Channel length variation can even occur within the same wafer. Lower-threshold transistors are faster since Idsat (discussed earlier) is higher, hence smaller transistor delay and shorter cycle time. A shorter channel length (resembling a future technology shrink) also leads to higher Idsat and smaller transistor delay.

I think the high overclockability of the Tbred B DLT3C 1700+/1800+ is mainly attributable to lower threshold voltage and shorter channel length in those chips, due to process variation. I don't think there is any circuit difference between these Tbred B DLT3C and the other Tbred B chips (DUT3C and DKT3C) of that time.
Lower voltage, shorter transistor channel length, lower transistor threshold voltage and Tbred B 1700+/1800+DLT3C (page 15)

In the case of the latest and famous Mobile Barton, I think at least:

(1) They should have lower transistor threshold voltage and/or shorter channel length, like the Tbred B DLT3C, as they can operate at about the same or slightly lower voltage while delivering similar maximum frequency at the low end of the voltage range. Both are 130 nm technology. The frequency-per-Vcore slope seems to be about the same (to be confirmed in detail).

(2) Further, the Mobile Barton performs better in the high voltage range (1.9 – 2.2 V) than the Tbred B DLT3C, which tops out at 1.9 – 2.0 V. I think this may be attributed to better handling of leakage current (V/R), not just active current (CVf), at the high end of the voltage range, so the temperature-sensitive leakage current is better controlled in these Mobile Barton chips.

The chip can ride the overclocking curve higher by taking more voltage, instead of topping out to diminishing returns at around 2.5 GHz as the Tbred B DLT3C does under similar voltage.

(3) Such better handling of leakage current and power at higher voltage could be due to internal circuit improvements and circuit/architectural power management in these Mobile chips (just conjecture), as such good high-voltage behavior was not observed in the non-Mobile Barton. For the same reason we can rule out the larger die area of the Barton (better heat dissipation at high voltage) as the explanation.
 
This is an update.

Why frequency and voltage are important for overclocking performance

A CPU and the motherboard FSB operate by repeating operations at a fixed time interval called the cycle time, so they perform a fixed number of operations per second, called the frequency.
The unit of time is the second, and the unit of frequency is the Hz. Hz stands for Hertz and means cycles per second.

The relationship between frequency and cycle time is
f = 1/T

Today, a typical CPU operates at around 2.5 - 3.0+ GHz, meaning it repeats 2,500,000,000 - 3,000,000,000+ operations every second.
A typical motherboard FSB and system memory operate at around 200 - 250 MHz, about 1/10 that of the CPU (1 MHz = 1,000,000 Hz).

So it is apparent that the higher the frequency (MHz), the more operations the CPU can repeat every second, and the more computer instructions it can execute per second.
For the same program with the same number of instructions to execute, a higher-MHz CPU can finish sooner, hence a faster turnaround time.

The same is true for moving memory data, video data, ..., over the FSB. A higher FSB frequency enables programs that require lots of memory and video subsystem access to finish sooner.

Overall_performance = A FSB + B CPU

where A and B are some constants, and CPU and FSB stand for the frequencies of the CPU and FSB.

For CPU intensive programs where everything can reside in the CPU cache (e.g. small kernel code, inner loop code)
A ~ 0
Overall_performance = B CPU (only CPU frequency is important)

For memory-intensive programs such as large matrix computation or video compression, in which lots of memory and video access are needed, A is also significant, so the FSB frequency matters too.

In general, A and B are non-zero, so both CPU and FSB frequency are important.
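
A toy version of this two-term model; the weights below are invented purely to show the idea (real A and B would have to be fitted per workload):

# Overall_performance = A*FSB + B*CPU, with invented example weights.
def performance(fsb_mhz, cpu_mhz, a, b):
    return a * fsb_mhz + b * cpu_mhz

cpu_bound = dict(a=0.0, b=1.0)     # fits in cache: A ~ 0, only CPU MHz counts
memory_heavy = dict(a=2.0, b=0.8)  # assumed weights for a memory-bound mix

for fsb, cpu in [(166, 2200), (200, 2200), (200, 2400)]:
    print(f"FSB {fsb} / CPU {cpu} MHz: "
          f"cpu-bound {performance(fsb, cpu, **cpu_bound):.0f}, "
          f"memory-heavy {performance(fsb, cpu, **memory_heavy):.0f}")

Note how raising the FSB from 166 to 200 MHz does nothing for the cpu-bound score but moves the memory-heavy one, which is the whole point of overclocking the FSB alongside the CPU.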


How does Vcore improve CPU frequency?

When a higher Vcore is applied to a CPU or any logic chip, the current in the transistors, called Idsat, increases, hence the transistors can switch the logic gates faster. The delay of a logic gate, the transistor current (Idsat), the voltage (Vcore) and the loading (Cload) are related by

delay = Vcore Cload / Idsat
Idsat = k (Vcore - Vt)^n

where Vt is transistor threshold voltage and k is some constant, n is between 1 and 2.
Simply speaking, the higher the Vcore, the higher the current Idsat and the smaller the delay (hence the chip runs faster).

It can be shown that, at a given temperature and without the heat constraint, increasing Vcore speeds up the max frequency roughly linearly; that is, the higher the Vcore, the higher the max frequency:

frequency / Vcore ~ constant


Originally posted by hitechjb1
At a given temperature, the higher the Vcore, the smaller the delay of a logic gate in a given circuit, hence the smaller the cycle time T can be, or the higher the clock frequency f can be. As seen above, to first order and without the constraint of heat, the maximum clock frequency varies somewhere between the square root of voltage and linearly with voltage. This is the good news: higher voltage can get it to run faster.

The bad news is that with higher Vcore, the active power (C Vcore^2 f) and leakage power (Vcore^2/R) also increase, where C and R are the equivalent capacitance and resistance used to model the chip power dissipation. Past a certain point, the slow-down and instability effects due to heat and temperature outpace the gain in clock speed, and diminishing returns set in. There is a delicate balance between voltage and temperature at optimal overclocking.

To summarize, Vcore improves the maximum CPU clock frequency (fmax) of a given CPU chip (same architecture and circuit) under a given temperature condition. So with sufficient voltage, the CPU should be able to operate at that maximum frequency (fmax) for a (very) short period of time (1/1000 s – a few seconds) before temperature begins to rise, leading to instability unless sufficient cooling is incorporated to limit the adverse effects of temperature: reduced electron mobility, higher leakage current and perturbation of electronic components.

For details:

What is cycle time and frequency

Frequency, clock, period of synchronous operations, latency

Latency

Analogy on Bus Speed, Bandwidth and Latency

Analogy for FSB, CAS2, CAS3 latency and bandwidth for DRAM memory

Memory bandwidth efficiency

Vcore vs processor frequency and cycle time (page 19)

What is CPU stability (page 19)
 
This is from a discussion in another thread, reorganized and put here.

In the past, CPU frequency (MHz) roughly doubled with each generation of technology (180 nm, 130 nm). For 90 nm, 65 nm and beyond, it will be harder and harder to sustain that trend, because the leakage current component will surpass the active current component, ..., as explained below.


How does leakage current slow down future generations of chips

In the past twenty years, chip manufacturers had a relatively easy time doubling CPU frequency every two to three years by shrinking the dimensions (feature size) of the transistors and wires inside a chip.

There are two main components of electric current inside a chip:

- active current: the "good" component that does logic computations by charging and discharging the internal capacitors of transistors and wires via the internal transistor switches

- leakage current: the "bad" component that is not computation related, and leaks through the transistors from supply voltage to ground, and dissipates as HEAT

E.g. a Tbred B 1700+ at its rated 1.5 V, 1.47 GHz draws about 30 A, more than the rating of a typical house circuit breaker (typically 20-30 A). When it is overclocked to 2.5 GHz at 1.9 V, it draws about 65 A at full load, 2-3 times the rating of a house circuit breaker !!!
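
The 65 A figure can be sanity-checked from the 30 A stock figure: with P = C Vcore^2 f, the supply current I = P/Vcore = C Vcore f scales as (V2/V1)(f2/f1). A quick check in Python (leakage ignored, so it is only a first-order estimate):

# First-order current scaling under P = C*V^2*f, hence I = P/V = C*V*f.
I1, V1, F1 = 30.0, 1.5, 1.47   # stock: amps, volts, GHz (thread's figures)
V2, F2 = 1.9, 2.5              # overclocked operating point
I2 = I1 * (V2 / V1) * (F2 / F1)
print(f"Estimated full-load current: {I2:.0f} A")   # ~65 A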

Historically, active current has been the major current in a chip, so when more power is put in, the chip can run faster and do more computation.

From 90 nm and 65 nm onward, due to the smaller transistor channel length and thinner transistor gate oxide, the leakage current increases at a faster rate and will surpass the active current. As a consequence, even when more power is put in, chip frequency increases more slowly than heat does, and chip speed levels off due to heat. This is one of the major hurdles for silicon scaling to 90, 65 nm and beyond.

What is channel length
Originally posted by hitechjb1
...

As the transistor size (channel length) of future generations of silicon chips is scaled down to, e.g., 90, 65, 45, ... nanometers (nm) (e.g. Hammers are 90/130 nm SOI, Tbred B is 130 nm, Palomino is 180 nm), the supply voltage, transistor channel length and threshold voltage will be lowered accordingly. Even though the supply voltage is lower, the transistors run faster, while both current and power density also increase (the actual trend). As the transistors are scaled down, logic gate delay decreases, and both the active power density (W/cm^2) and the passive leakage power density (from both gate and subthreshold leakage) increase.

The passive leakage current component increases at an even faster pace than the active current, posing problems for cooling and power dissipation in future generations of chips. If this trend continues, the high passive, standby leakage current will lead to high power draw and high idle CPU temperature compared to today's CPUs, even when the system is idle and the CPU is not under heavy load.

...
What is channel length of a MOS transistor (page 14)
For details about channel length variation and overclocking:
Lower voltage, shorter transistor channel length, lower transistor threshold voltage and Tbred B 1700+/1800+DLT3C (page 15)

Originally posted by hitechjb1
Relationship of clock frequency, die temperature, power and voltage (update)

As far as the Vcore, clock and die temperature relationship goes, a chip (CPU) can be modeled as a capacitor C and a resistor R in parallel, driven by Vcore. C models the useful active power that sustains computation by charging and discharging hundreds of millions of internal capacitors (from coupling between transistors, wires and the silicon substrate). R models the wasted leakage power through the internal current paths of the tens of millions of transistors.

If the die temperature is kept low enough, in theory today's XP and P4 can be clocked as high as 3 GHz, 4 GHz. The power (the C component) going into the chip to run the clock at frequency f and Vcore V is given by

P_active = C V^2 f

And this can go on to 3-4 GHz if the die is kept below a certain temperature. Most of the power is used to drive the clock faster as Vcore is increased.

But in reality, for any cooling used (air, water, vapor phase, liquid nitrogen, ...), the die temperature will eventually increase as Vcore increases, because the leakage current heats up the chip, though at a rate that depends on the cooling used. The leakage current is small at low temperature, and increases with temperature, at a faster and faster rate as temperature rises. The power that heats up the chip (the R component) is given by

P_leak = V^2 / R

From my experiments with the Tbred B 1700+ DLT3C, when die temperature reaches around 40 C, the chip leakage current begins to increase at a faster pace and heats up the chip more, on top of the higher active power component P_active. Once this starts, any Vcore increase will heat up the chip at a faster pace. The exact Vcore at which this occurs varies from chip to chip (100-200 mV difference); it depends on certain properties and characteristics ("genes") of how a particular CPU was born in silicon. :)

P = P_active + P_leak = CV^2 f + V^2 / R

After passing that temperature threshold, the portion P_leak going into heating the chip (the R component) becomes larger and larger as Vcore is increased. The additional power supplied to the CPU is wasted as P_leak instead of going into the useful P_active. In other words, the useful P_active that powers the chip faster (the C component) increases at a diminishing rate. The chip is just being heated up, which in turn slows it down, and it cannot be clocked any faster.

...
For details (about how to compute power, ...):
Relationship of clock frequency, die temperature, power and voltage (update)
- What is the active power of a CPU at frequency f and voltage V
- How to estimate CPU static and active power
- Effect of die temperature on CPU clock frequency at a given Vcore
(page 13)
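
A minimal numeric sketch of this C-and-R model; the C and R values below are assumptions picked to give plausible numbers for this class of chip, and note that R is held constant here, whereas a real R drops (leakage grows) as the die heats up, which is exactly the runaway effect described above:

# P = P_active + P_leak = C*V^2*f + V^2/R.  C and R are assumed constants.
C = 15e-9   # assumed equivalent switching capacitance (F)
R = 0.3     # assumed equivalent leakage resistance (ohm)
for v, f in [(1.5, 1.47e9), (1.9, 2.5e9)]:
    p_active = C * v**2 * f
    p_leak = v**2 / R
    print(f"{v:.1f} V, {f/1e9:.2f} GHz: P_active ~ {p_active:.0f} W, "
          f"P_leak ~ {p_leak:.0f} W, total ~ {p_active + p_leak:.0f} W")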


More about leakage current and leakage power

In a silicon chip, the lowest layer is the silicon substrate, on which 10-100 million transistors are deposited (current technology). Above the transistors are hundreds of millions of wire segments in the form of a multilayer grid. The metal wires bring power in from outside, carry signals in and out of the chip, and pass signals around the chip to the transistors.

The bulk of the silicon substrate is connected and typically grounded. Such a silicon structure is usually called bulk silicon. This is what silicon chips of the past, down to 130 nm, are like. Currents also leak through the transistors to the substrate.

From 90 nm down (and for some 130 nm chips), most silicon chips have the silicon body insulated from the substrate, hence the name silicon-on-insulator (SOI). The leakage currents through the transistors to the substrate are thus significantly reduced. This is the good part.

BUT the bad news is ... the main part of the leakage current, in both bulk silicon and SOI, is the internal leakage current through the 10-100 million transistors. Transistors come in p- and n-type. Inside a chip, between the power supply (VDD) and ground, there are tens of millions of transistor paths, made up of some p- and some n-type, and leakage current constantly flows through those paths. This is called leakage current, or OFF current (since ideally the paths should be off). So the leakage power can be written as V^2 / R, where V is the voltage (typically VDD) and R represents the equivalent resistance of all those leakage paths. In older generations of silicon, these leakage paths and currents were relatively small and had not been an issue.

As transistors get smaller and smaller (90, 65, 45 nm) and transistor gate oxide gets thinner and thinner, these leakage currents get larger and larger (relative to the normal active current used for switching). And as described in the last post, the "wasteful" leakage current will become larger than the "useful" active current unless something can be done. So the power available for computation, relative to leakage power, shrinks with each generation, and the frequency gain per generation will level off.


What is active power

active power = C V^2 f

C is the equivalent capacitance of a chip (CPU) for power modeling, V is the voltage (Vcore), f is the frequency.

Inside a chip there are 10-100 million transistors, almost every one of which functions as a logic switch. Logically, all these transistors switch according to the flow of instructions, logic commands and logic functions, .... Electrically, each of these transistors charges or discharges some capacitor(s) connected to it. At full load, a large percentage of these 10-100 million transistors are charging and discharging capacitors. For each one that is switching, useful active power given by Cload V^2 f is dissipated to perform logic computation, where Cload is the capacitive loading on a transistor.

So the total active power C V^2 f is the sum of those 10-100 million individual active power contributions, and C is the equivalent capacitance of the chip, the sum total of the small Cload's. What this means is that more active power is needed to run faster (higher f).
 
hitechjb1,

WOW mann that was amazing, you are indeed knowledgeable

:attn:

I will have to read that again another time when im not so stoned, its a bit difficult right now

;)

Thanxs

mong....is mongging
 
c627627 said:
Side question I've been trying to ask hitechjb1 for a while: how do you illustrate AMD MHz vs Intel MHz to someone who has little or no computer knowledge? What's the everyday comparison example you use?

(I know everyone else has their stories, I'd like to hear hitechjb1's, please.)

If someone has little or no computer knowledge, I would not try to compare in terms of "AMD MHz" and "Intel MHz", ....

I assume this is about trying to convince them to buy AMD or Intel .... that would be a marketing job, ..... Intel and AMD have lots of this marketing info, just as car makers describe mechanics and performance.

If one knows about science and technology, one would say most marketing materials are incomplete. But for a general consumer, the marketing materials serve some purpose, and it is the consumer's call based on what they understand. A simplified analogy can be just as good, or not as good.

Trying to use MHz, bandwidth, ... may be just as relevant and just as confusing for them. I would not try to get through a technical explanation, such as
- Instructions per cycle (IPC)
- pipeline (and depth)
- integer and floating point arithmetic benchmarks
- cache performance, memory bandwidth
etc, etc

Having said that, for that question, ....

What is IPC and how to compare cycle or Hz for different CPU architectures

A CPU has many functional units, such as integer unit(s), floating point unit(s), instruction decode unit, control unit, instruction schedulers, register files, cache, ..., for executing instructions (compiled program code) and performing computations. More and more functions are integrated into a CPU as transistors shrink with each new generation of silicon technology. As a result, multiple instructions can be executed during a CPU cycle.

IPC stands for instructions per cycle: the number of integer or floating point instructions executed per clock cycle in a CPU. In CPU arithmetic benchmarking, a set of defined (CPU/cache intensive, with minimal memory access) programs is executed to measure the average instructions per cycle.

Based on the Sandra CPU arithmetic reference CPUs (these numbers may vary between Sandra versions, so don't take them as absolute).

XP Dhrystone integer IPC = 7829/2080 = 3.764
XP Whetstone floating point IPC = 3180/2080 = 1.529


Comparing with a P4B,
P4B Dhrystone integer IPC = 8164/3060 = 2.668
P4B Whetstone floating point IPC = 1717/3060 = 0.561 (w/o SSE2)
P4B Whetstone floating point IPC = 4009/3060 = 1.310 (w/ SSE2)

Ratio between XP and P4B (w/o SMT):
Dhrystone integer IPC = 3.764/2.668 = 1.41:1
Whetstone floating point IPC = 1.529/0.561 = 2.73:1 (w/o SSE2), 1.529 / 1.310 = 1.17:1 (w/ SSE2)

Comparing with a P4C w/ 2 SMT,
P4C Dhrystone integer IPC = 9858/3200 = 3.081
P4C Whetstone floating point IPC = 4062/3200 = 1.269 (w/o SSE2)
P4C Whetstone floating point IPC = 7139/3200 = 2.231 (w/ SSE2)


Ratio between XP and P4C (w/ 2 SMT):
Dhrystone integer IPC = 3.764/3.081 = 1.22:1
Whetstone floating point IPC = 1.529/1.269 = 1.21:1 (w/o SSE2), 1.529/2.231 = 0.69:1 (w/ SSE2)


Using another version of Sandra (see next post), to compare with a P4C (without SMT)
XP Dhrystone integer IPC = 8404/2200 = 3.82
XP Whetstone floating point IPC = 3465/2200 = 1.575

P4C Dhrystone integer IPC = 7869/3200 = 2.459
P4C Whetstone floating point IPC = 2365/3200 = 0.739 (w/o SSE2)
P4C Whetstone floating point IPC = 4325/3200 = 1.352 (w/ SSE2)

Ratio between XP and P4C:
Dhrystone integer IPC = 1.55:1
Whetstone floating point IPC = 2.13:1 (w/o SSE2), 1.16:1 (w/ SSE2)


That is, for executing the code specified in each benchmark,
e.g. comparing an XP with a P4C w/ 2 SMT:
- For Dhrystone integer arithmetic, 100 AMD XP cycles do the same computation as about 122 Intel P4C cycles
- For Whetstone floating point arithmetic, 100 AMD XP cycles do the same computation as about 121 Intel P4C cycles (2 SMT, w/o SSE2), or 100 AMD XP cycles do the same as 69 P4C cycles (2 SMT + SSE2)

In summary,
- 1 AMD Hz = 1.22 P4C Hz (2 SMT) for integer arithmetic (based on Dhrystone benchmark)
- 1 AMD Hz = 1.21 P4C Hz (2 SMT) for floating point arithmetic (based on Whetstone benchmark)
- 1 AMD Hz = 0.69 P4C Hz (2 SMT) for SSE2 floating point arithmetic (based on Whetstone benchmark)

Example,
- An XP/Barton running at 2.5 GHz is as fast as a P4C at 3.1 GHz (= 2.5 x 1.22) for integer computation, in terms of raw CPU power.
- A 2.8 GHz Barton would perform about the same as a 3.4 GHz P4 in integer arithmetic.


The benchmark code is usually CPU/cache intensive, to test the CPU while requiring little or no memory access.
The IPC numbers vary with CPU architecture, so XP/Barton will differ from A64, ...., and can be measured accordingly.
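
The IPC arithmetic above is just benchmark score divided by clock; a small Python script makes the procedure explicit (the scores and clocks are the Sandra figures quoted above):

# IPC = benchmark score / clock (MHz); ratios compare architectures per clock.
def ipc(score, clock_mhz):
    return score / clock_mhz

xp_int  = ipc(7829, 2080)   # XP Dhrystone  -> ~3.76
xp_fp   = ipc(3180, 2080)   # XP Whetstone  -> ~1.53
p4c_int = ipc(9858, 3200)   # P4C (2 SMT) Dhrystone -> ~3.08
p4c_fp  = ipc(4062, 3200)   # P4C (2 SMT) Whetstone, w/o SSE2 -> ~1.27

print(f"Integer ratio XP:P4C = {xp_int / p4c_int:.2f}:1")  # ~1.22:1
print(f"Float ratio XP:P4C   = {xp_fp / p4c_fp:.2f}:1")    # ~1.2:1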


How to interpret Sandra CPU benchmark, IPC and comparing with P4 (page 2)

What is cycle time and frequency

Frequency, clock, period of synchronous operations, latency

Analogy on Bus Speed, Bandwidth and Latency



This link shows the screenshots of the Sandra run used for the IPC calculation in the last post.

Overclocking a mobile Barton 2400+ to 2.6/2.7+ GHz on air (page 18)
 
How to estimate max stable frequency at full load (for mobile Barton)

A socket A CPU at full load dissipates around 115 W +- 10 W active power (about 90% of the total).
Since the thermal resistance of these heatsinks is around 0.22 +- 0.02 C/W,
the CPU at full load will see a temperature increase of 23 - 30 C.
This temperature increase will cut into the CPU max frequency somewhere between 9.2 - 12%,
which is 262 - 342 MHz at the 2850 MHz level.
Call this frequency drop 300 +- 40 MHz.
Here 2850 MHz is assumed to be the bootable frequency of a typical mobile Barton with sufficient voltage,
at a low ambient temperature of 15 C.
That number for some golden mobile Bartons may be 100-150 MHz higher (2950 - 3000 MHz).

stable_fullload_frequency = max_bootable_frequency ( 1 - HS_thermal_resistance x max_CPU_overclock_power_dissipation x 0.004) (for XP/Barton)

This puts the stable, full-load overclocking frequency on air at around
2850 - 300 = 2550 MHz +- 40 MHz (i.e. 2510 - 2590 MHz)
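
Plugging the thread's own numbers into that formula as a quick sanity check:

# stable_fullload_frequency =
#   max_bootable_frequency * (1 - R_thermal * P_fullload * 0.004)
f_boot = 2850.0   # MHz, bootable at low ambient with sufficient voltage
r_th   = 0.22     # C/W heatsink thermal resistance (+/- 0.02)
p_load = 115.0    # W full-load power (+/- 10)

dt = r_th * p_load                     # ~25 C rise at full load
f_stable = f_boot * (1 - dt * 0.004)   # ~0.4% frequency lost per degree C
print(f"dT ~ {dt:.0f} C -> stable full load ~ {f_stable:.0f} MHz")  # ~2562 MHz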


The SP-97 delivers the highest stable overclocking frequency, followed by the SLK-947/900, then the 800, then the SK-7.



An interesting thread showing how to estimate the max overclocking of a mobile Barton 2400+, and how to get the last 50-100 MHz from the CPU.
http://www.ocforums.com/showthread.php?s=&threadid=281607



Any difference between a mobile Barton 2400+ and 2500+?

Based on my impression from the various posts (not a completely scientific evaluation),
the mobile 2400+ has a good chance of running stably between 2.5 - 2.6+ GHz at around 2.0 - 2.1 V and 45 - 50 C, using a high-end HS (such as the SLK series) plus a high-CFM fan (such as an 80 mm Tornado, or a TT SF II at about 50 MHz less). Many can get a temporary boot as high as 2.7 - 2.8 GHz.

The mobile 2500+ seems to have about a 100 MHz higher bias.


With high end water cooling, it would add another 100-150 MHz or so to the above numbers.


For price/performance, the $20+ cheaper mobile 2400+ is the best choice: the $20 saved costs only about 100 MHz on average.

For highest CPU MHz potential (satisfaction and/or competition), the mobile 2500+ would have a better chance.
 
c627627 said:
Side question I've been trying to ask hitechjb1 for a while: how do you illustrate AMD MHz vs Intel MHz to someone who has little or no computer knowledge? What's the everyday comparison example you use?

(I know everyone else has their stories, I'd like to hear hitechjb1's, please.)


Analogy for comparing CPU cycles

To summarize the technical details, simply put:

For XP and P4 (with 2 SMT),

A P4 can be clocked roughly 25% higher than an XP.
But an XP executes about 25% more instructions per cycle than a P4 (for integer arithmetic).
So the two are roughly a tie in terms of instructions executed per second (a performance measure).

For details,
What is IPC and how to compare cycle or Hz for different CPU architectures (page 19)

The 25% is derived from benchmarks, and it may vary slightly for a different set of program code.
Since it is roughly a tie, both sides should be happy.


For an analogy without using computer terms, one may say

A person called P4 takes 25% more steps per unit time than another person called XP.
But XP's steps are 25% longer than P4's.
So both travel the same distance over the same period of time.



The analogy is:

clock cycle <--> foot step
cycles per second <--> foot steps per second
instruction executed <--> distance travelled
instruction executed per second <--> distance travelled per second (a performance measure)


cycle time = 1 / frequency

What is cycle time and frequency
 
Recent high-end heat sink for socket A

For air, any of the Thermalright copper HS such as SK-7, SLK-800U, SLK-947U/900U, SP-97 are considered the best kind.

Which one to buy depends on whether there are good deals around. Sometimes some may go as low as $15-30.
Regular price is around $40.

At full load,
SP-97 is better than the SLK-947U/900U by about 4 C (~ 40 MHz at 2.5 GHz)
SLK-947U is better than the SLK-800U by about 1-2 C (~ 10-20 MHz at 2.5 GHz)
SLK-800U is better than the SK-7 by about 2 C (~ 20 MHz at 2.5 GHz)

So the differences among them are within 80 MHz, not a big difference for real usage, except for benchmarking or competition.


Fans

For fans, the popular all-around choice for both 24/7 regular usage and overclocking is the Thermaltake Smart Fan II, whose speed can be adjusted to trade off noise level against air flow (CFM). It is about $10. 3000-3500 rpm is for regular usage; max rpm is for overclocking benchmarking.

The 80 mm Tornado delivers the highest CFM and gives the best results with the SLK HS. But its noise is unacceptably high for 24/7 usage, unless a fan controller is added to adjust the speed.

IMHO, the 80 mm Tornado performs better than the 92 mm Tornado, by about 5 C at full load (see links below).


Side note: The SLK HS CANNOT be used with the long-awaited A64 socket 754/939 CPUs,
which are the logical upgrade path in 3-12 months for many people.
So weigh that against the $40 HS investment to get that last 100-300 MHz from a socket A CPU.



Comparing some HSF's on CPU overclock frequency on air (page 6)

Testing 4 fans w/ a SLK-947U on a Tbred B 1700+ DLT3C (page 16)

Comparing Tornado 80mm, Tornado 92mm and TT SFII (second test data) (page 17)
 
Some remarks on case fans and cooling

For air cooling, case temperature is important since it affects the final overclocking results.

1 C in system ambient temperature generally translates into about 10 CPU MHz at the 2.5 GHz level.

A lower system ambient temperature also helps the cooling of the Vcore regulator, chipset and video card GPU, ....


Any quiet fan with ~ 2000 rpm, 20-30 CFM and 20-25 dBA should do the job as a case fan.

- There are quite a few 80 mm fans with low noise and 20-30 CFM,
such as the Panaflo, Enermax (Whisper model, variable speed), Vantec (Stealth model), ....
- There are also some self-adjusting fans that change speed according to the sensed temperature.
- One can also use a TT SFII (if there is one sitting around) as a case fan, by setting it to low speed.
- There are also some inexpensive (~ $4) fans with color LEDs for side panel and case fans (if you like the color).
They are quite effective and low noise, but not the quietest.

This link gives a general idea of the different 80 mm fans; some are suitable as case fans.
http://www.svc.com/standardfans.html


Usually I use one intake (80 - 120 mm depending on the case) in the front for general air intake through the drive area, and another on the side panel to blow external air onto the video card and system area.

For exhaust, I have two to three: one through the PSU, and one or two at the back.



Related links:

AMD Athlon System Cooling Guidelines

Impact of higher ambient temperature on CPU clock frequency (page 6)


Intake and exhaust air flow

Total CFM for intake should be roughly equal to total CFM for exhaust, to maintain fluid-flow mass conservation through the intake and exhaust openings, so that the main flow runs in through the intakes and out through the exhausts, minimizing dust build-up in all the gaps (don't try to split hairs on this aspect).

hitechjb1 said:
I think there is really NO positive pressure buildup or negative pressure buildup at steady state, just continuous air flow from intakes to exhausts, based on the principle of conservation of mass for fluid flow at steady state.

1. For positive fan intakes (instead of calling it positive pressure):
SUM CFM_intakeFan = SUM CFM_exhaustFan + airflow_exhaust_holes_gaps
2. For positive fan exhausts (instead of calling it negative pressure):
SUM CFM_intakeFan + airflow_intake_holes_gaps = SUM CFM_exhaustFan
3. For balanced intakes and exhausts:
SUM CFM_intakeFan = SUM CFM_exhaustFan

This is an interesting thread to read, regardless of "positive" or "negative" air pressure. It contains some physics about fluid flow, fan CFM and pressure, ... (on pages 2-3).
Negative or positive pressure in case air flow?
 
Is it safe to run 2.1+ V on mobile Barton?

I have been running a mobile 2400+ at 2.15 V on air since early March '04,
in order to keep it stable around 2.6+ GHz, at 40-45 C under load,
with a medium fan speed of ~ 3000 rpm.

This mobile Barton needs that much voltage to get above 2.6 GHz.

I even tested it at 2.2-2.3 V for short benchmarking runs, ...., on a cold day on air.
I used 2.22 V to run some benchmarks at 2.79 GHz on air.
Overclocking a mobile Barton 2400+ to 2.6/2.7+ GHz on air (page 18)

2.15-2.2 V is the highest it can run stably.
High voltage alone cannot deliver stability; it also needs low temperature.

I intend to run it at 2.15 V for as long as the situation permits, to watch for any adverse effects over weeks or months, ....
Effect of high Vcore and electromigration on CPU failure time (page 15)
How to determine "highest" voltage and temperature for CPU overclocking (page 16)



High voltage is necessary for a CPU to be clocked higher.
It is necessary but not sufficient.
The heat generated under full load is the opposing force that slows the CPU down.
The balance point between voltage/frequency and temperature gives the max stable frequency.

The higher the voltage, the lower the temperature has to be.
What is CPU stability (page 19)


It is NOT recommended to run at such high voltage if it is the ONLY CPU one has to run the system.
I would lower the voltage to bring the max frequency down by 200 MHz.

It is NOT recommended to put high voltage on a CPU without monitoring how frequency and temperature change per step.
The frequency increase and temperature increase tell you whether more voltage is warranted.
The break-even point is around 10 MHz / C at the 2.5 GHz level, i.e. do not keep pushing voltage when getting less than 10 MHz per degree C of increase (under load).
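
One way to apply that 10 MHz/C break-even rule per voltage step is sketched below; the threshold is the thread's rule of thumb, and the helper name is purely for illustration:

# Per 25 mV step: keep pushing voltage only while the frequency gained per
# degree C of load-temperature rise stays above ~10 MHz/C (at ~2.5 GHz).
def worth_another_step(freq_gain_mhz, temp_rise_c, breakeven=10.0):
    if temp_rise_c <= 0:
        return True                       # no thermal cost, keep going
    return freq_gain_mhz / temp_rise_c >= breakeven

print(worth_another_step(30, 2))   # True: 15 MHz/C, still worthwhile
print(worth_another_step(15, 3))   # False: 5 MHz/C, past break-even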


This was posted before the debut of the mobile Barton, so the numbers for the desktop Tbred B and Barton may not apply directly to the mobile XP.

Originally posted by hitechjb1
Higher CPU voltage is needed to sustain higher CPU frequency. Increase the voltage gradually if needed, step by step (25 mV steps in the BIOS), so that the CPU frequency can be raised. Monitor CPU stability by running Prime95, and monitor CPU temperature during this process. Keep the temperature as low as possible using the best affordable cooling. For air, an SLK-800U/900U/947U plus a high-CFM, variable speed fan is the best choice.

1. In the early stage of overclocking, within 10% above rated voltage, overclocking is an easy, linear ride of frequency over voltage, about 100-130 MHz / 100 mV (for Tbred B/Barton).

2. Around the break-even point of overclocking, characterized by about
- 10 MHz / C of temperature increase, or
- 30 MHz / 100 mV of Vcore increase
for Tbred B/Barton. Beyond the break-even, it becomes much more difficult to overclock higher. This happens around 1.8 - 1.9 V.

3. Above the break-even point, diminishing returns occur, i.e. heat will eat into more than half of the frequency increase from the extra power put into the CPU to sustain the higher frequency. Temperature rises quickly and much more voltage is needed to gain a few MHz, making further overclocking very costly (high PSU current, cooling, CPU life expectancy degradation) and impractical (huge fan noise, little performance gain). Even if the Vcore can be increased further (beyond 1.9 - 2.0 V) with the system stable, there is practically little gain in MHz (less than 30 MHz / 100 mV).
...

General rules on voltage and temperature for CPU overclocking (page 16)

How to determine "highest" voltage and temperature for CPU overclocking(page 16)
 
Found an interesting read about strained silicon,
which will be used to speed up future chips (90, 65 nm), ....


Strained Silicon

Quote from article: "A transistor built with strained silicon. The silicon is "stretched out" because of the natural tendency for atoms inside compounds to align with one another. When silicon is deposited on top of a substrate with atoms spaced farther apart, the atoms in silicon stretch to line up with the atoms beneath, stretching -- or "straining" -- the silicon. In the strained silicon, electrons experience less resistance and flow up to 70 percent faster, which can lead to chips that are up to 35 percent faster -- without having to shrink the size of transistors."

Strained Silicon article from IBM
(Click on the color image link;
excellent image showing a detailed cross-section of a transistor down to the nm scale,
with a gate oxide thickness of 10 Angstroms or less, on the order of a few atoms).


1 nm = 10^-9 m = 1/1,000,000,000 m = 10 Angstroms


Strained Silicon article from Intel
 
SS sounds like a great concept, but as we all now know the reality of the situation: SS also leaks current like a sieve and is causing all kinds of PressHott (and soon to be revealed Nocona) heat issues.
 
How to read CPU temperature

Temperature readings from temperature sensors and BIOS/software are error prone, i.e. the absolute reading can be off significantly, by as much as a few degrees C. The question, then, is how to compare CPU temperatures between different systems, different case cooling, different seasonal conditions, and between idle and load conditions, ....

The internal CPU die temperature monitoring device (aka the diode) monitors die temperature. There is usually another temperature measuring device at the CPU socket, external to and touching the CPU, to monitor CPU temperature. These two numbers are in general not the same, the external one usually reading lower under full load. Some motherboard BIOSes and monitoring software can report both temperatures for comparison, so the numbers can be tracked by their difference. E.g. the ABIT NF7-S reports only the external one, regardless of which software monitoring program is used (such as MBM5 or Hardware Doctor), but the BIOS temperature cut-off protection does use the internal die temperature. The ASUS A7N8X can report both temperature numbers.

Four different temperatures can be measured for air cooling (the scheme also extends to water cooling). This is less error-prone and less dependent on the absolute accuracy of the on-board temperature sensors and the software/BIOS probe.

CPU_fullload - CPU full load temperature
CPU_idle - CPU idle temperature (loosely defined, e.g. just after booting the OS/BIOS, with minimal stuff running)
SYS_fullload - system ambient temperature when CPU at full load
SYS_idle - system ambient temperature when CPU idle

actual_CPU_temperature_increase = (CPU_fullload - CPU_idle) - (SYS_fullload - SYS_idle)

So the absolute temperature of the CPU is not read, but rather
- the difference between full load and idle, and
- the difference between CPU and system.
This (relatively) eliminates the absolute measurement errors and the ambient temperature effect.

E.g. a CPU under similar load in summer and winter, or under different ambient room temperature conditions.
Summer:
CPU_fullload = 53 C
CPU_idle = 42 C
SYS_fullload = 31 C
SYS_idle = 28 C
actual_temperature_increase = (53 - 42) - (31 - 28) = 8 C

Winter:
CPU_fullload = 40 C
CPU_idle = 29 C
SYS_fullload = 18 C
SYS_idle = 15 C
actual_temperature_increase = (40 - 29) - (18 - 15) = 8 C

This can apply to water cooling. Replace SYS_fullload with WATER_INTAKE_fullload_temperature, and replace SYS_idle with WATER_INTAKE_idle_temperature.
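
The four-reading bookkeeping is easy to script; a minimal helper, with the summer and winter figures above as the test values:

# actual_CPU_temperature_increase =
#   (CPU_fullload - CPU_idle) - (SYS_fullload - SYS_idle)
# Differencing cancels most of the sensors' absolute error and the
# ambient / case-cooling contribution.
def actual_cpu_rise(cpu_full, cpu_idle, sys_full, sys_idle):
    return (cpu_full - cpu_idle) - (sys_full - sys_idle)

print(actual_cpu_rise(53, 42, 31, 28))   # summer example -> 8 C
print(actual_cpu_rise(40, 29, 18, 15))   # winter example -> 8 C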


Advantage of using these four temperature numbers:

- It relatively eliminates the effect of absolute errors from the temperature sensors and software/BIOS reading.

- As the system temperatures are measured, it relatively eliminates the effect of case cooling on the CPU temperature measurement.

With this, one can compare the temperature change of two CPU's between idle and full load, even when the two CPU's are in two systems with different case cooling.

E.g. the CPU in a case with worse case cooling tends to read a higher temperature, but the change in temperature should be similar to that of one in a case with better case cooling (assuming the worse-cooled CPU did not crash first due to the higher temperature).

- It also eliminates the effect of seasonal ambient room temperature on CPU temperature reading.


Power dissipation

power_dissipation = (CPU_temperature - system_ambient_temperature) / cooling_coefficient

For good case cooling, SYS_fullload - SYS_idle should be at most 2 - 3 C when the CPU is under full load at 2.4 - 2.6 GHz on air.

For CPU under full load, using my CPU as an example,

Tbred B 1700+ DLT3C at 2.54 GHz 1.92 V,
CPU_full_load - CPU_idle ~ 8 C
CPU_full_load - SYS_full_load ~ 20 C
full load power ~ 20 / .22 = 91 W

For a 2400+ mobile Barton at 2.65 GHz 2.15 V,
CPU_full_load - CPU_idle ~ 12 C
CPU_full_load - SYS_full_load ~ 27 C
full load power ~ 27 / .22 = 123 W

For a 2600+ mobile Barton at 2.71 GHz 1.89 V, 22/46 C load, 20/39 C idle
CPU_full_load - CPU_idle ~ 7 C
CPU_full_load - SYS_full_load ~ 24 C
full load power ~ 24 / .22 = 109 W
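
The same arithmetic as a tiny helper (0.22 C/W assumed for the heatsinks discussed above; the inputs are the CPU-to-ambient differences just quoted):

# power_dissipation = (CPU_temp - system_ambient_temp) / cooling_coefficient
def fullload_power(delta_t_c, c_per_watt=0.22):
    return delta_t_c / c_per_watt

for chip, dt in [("Tbred B 1700+ @ 2.54 GHz", 20),
                 ("mobile 2400+ @ 2.65 GHz", 27),
                 ("mobile 2600+ @ 2.71 GHz", 24)]:
    print(f"{chip}: ~{fullload_power(dt):.0f} W")   # 91, 123, 109 W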

Next, I will discuss how to relate the CPU temperature increase to CPU power dissipation and the CPU switching factor under full load and idle conditions.
 
CPU temperature, power dissipation under different CPU load (switching factor)

place holder
 