The Relationship Between CPU Frequency And Performance. Critique, Also Answer Qs?
The Relationship Between Processor Frequency and Performance.
Abstract
Those who work in the computer science field are always asking themselves one question: “How can I get more performance?” To satisfy these performance addicts and to stay profitable, Central Processing Unit (CPU) manufacturers have to ask themselves, “Are we manufacturing these CPUs in the most efficient manner?” Efficiency equates to money, which is why it is important to understand the fundamentals of microprocessors and the laws that govern them.
A CPU plays a pivotal role in any computer system. It is what makes the computer tick, and it is one of the most common culprits when a system is acting sluggish. The intention of this project was to find out what kind of relationship exists between frequency and performance, and whether one could use this relationship to one’s advantage. The hypothesis was that the relationship between frequency (clock speed, measured in megahertz) and performance (as measured by benchmarking applications) would be strictly linear.
The experiment was conducted in a relatively simple manner. A modern microprocessor was clocked at a given frequency and tested for performance with several benchmarking applications; the system was then rebooted, and the process was repeated at the next frequency, up to 2400 MHz. Results were recorded after each step. The frequencies covered were 1000 MHz, 1100 MHz, 1200 MHz, 1300 MHz, 1400 MHz, 1500 MHz, 1600 MHz, 1700 MHz, 1900 MHz, 2000 MHz, 2100 MHz, 2200 MHz, 2300 MHz, and 2400 MHz. This supplied a wide range of frequencies and was expected to provide an adequate range of data.
To see which model fit best, exponential and linear models were fitted for each benchmark, and the r value, also called the correlation coefficient, was calculated for each. The correlation coefficient shows how strongly a model fits the data. The results were mixed. While some benchmarks favored a linear model, others in fact more closely followed an exponential one. While there may have been experimental errors, none could account for the consistent results.
While it may be tempting, it is impossible to draw any concrete conclusions from this data alone. That doesn’t make this experiment useless; in fact, quite the opposite. It has opened up a new subject of discussion and opened the door to future testing. Hopefully, with more expansive and controlled tests, the relationship may finally be settled once and for all.
Introduction
One of the demons plaguing computing is performance. Microprocessor manufacturers strive to release the best possible product in a cutthroat race to the top, each hoping to claim the performance crown. But just what forces act upon this performance-oriented race? This experiment intends to determine whether a microprocessor’s frequency, as measured in megahertz (MHz), has a linear or an exponential relationship to performance, as measured by benchmarking applications.
Wikipedia defines a Central Processing Unit (CPU) as “the part of a computer that interprets and carries out the instructions contained in the software.” Simply put, the CPU is the heart of the modern computer. When all parts of a CPU are on a single Integrated Circuit (IC), it is referred to as a microprocessor. A CPU gets its clock speed from a simple equation: Front Side Bus (FSB) frequency times multiplier equals the resulting clock frequency. (http://en.wikipedia.org/wiki/CPU)
While this may sound very complex, in reality it is not. Very simply, a CPU takes in raw instructions and data from the other parts of the computer, manipulates the data and runs processes as directed by other components, and outputs results. Whenever anything occurs on one’s system, the CPU has computed it. Almost no information is passed on to the user without first being manipulated by the CPU.
One of the main concerns of chip makers is clock speed. Clock speed is also called frequency or operating speed, and it is measured in megahertz. For a given chip design, the faster the clock speed, the more MIPS, or “Millions of Instructions Per Second”, the CPU can execute. A CPU performs several different functions on the data it obtains from the rest of the computer, and depending on the desired output, it manipulates that data accordingly. Increase the frequency and it will be able to fetch more data in a given amount of time and complete its computations faster.
If chip manufacturers are able to understand the relationship between frequency and performance, then they may be able to estimate future limitations and ceilings ahead of time, and engineer ways to overcome them.
Gordon Moore’s prediction is commonly quoted as saying that the density of transistors on a chip will double every eighteen months. More chip density means smaller circuits in a smaller amount of space, allowing for even faster frequencies and, thereby, more performance. This being an exponential relationship, one may wonder about the limits of this law, and many people have. There are constant debates and contradicting articles on the Internet, some disputing that the law can remain applicable forever and some defending it. Surely there must be some point where the equation can’t take any more: some point at which the chip manufacturers are unable to stuff any more transistors into an already tightly packed space. While that battle rages on, there are yet other frontiers that need to be conquered. If there is a limit to how much we are able to get out of microprocessors, how will we make them more efficient and faster? (http://www.intel.com/research/silicon/mooreslaw.htm)
Chip manufacturers have also released many optimizations to help increase performance: SSE and other special instruction sets, larger on-die cache, and the list goes on. The bottom line is that they are not just blindly increasing megahertz. The question arises: disregarding the other exponential relationships in computing, is it possible that there is an exponential relationship at work within a single chip clocked at different frequencies?
Thus, this experiment was formed. The hypothesis was that the relationship between processor frequency and resulting performance must be linear. Logic would suggest that the faster a processor is clocked, the more instructions it can perform in a given time: double the speed of the processor, and you double the raw performance.
Experimental Design
The experiment was set up to be fairly easy, as the benchmarking would tax the computer, not the researcher. The experiment required a test bed system. An AMD AthlonXP 1700+ processor, an Epox 8rda+ motherboard, 2x256MB PC3200 Kingston RAM, an ATI Radeon 9800 Pro, two Western Digital 120GB 7200RPM drives, one self-built case, and one self-built watercooling system were used in this experiment. Watercooling was chosen because in such a system ambient temperature has the least effect on the temperature of the processor, thus reducing the influence of one of the variables.
The benchmarks chosen were Sisoft Sandra 2005, Pifast, Prime95, and ScienceMark2005. They were chosen for the diversity in the way they stress the microprocessor: Sisoft Sandra has an arithmetic as well as a multimedia mode; Pifast measures how long the microprocessor takes to calculate millions of digits of pi (for more on the method used for the calculation of pi, see Appendix A); Prime95 uses equations to find prime numbers; and ScienceMark gauges how long the microprocessor takes to solve several scientific equations. All of these benchmarking applications tax the CPU in a controlled way: they feed the CPU data, then record the amount of time it takes to output a result.
In preparation for the experiment, all startup applications were disabled via msconfig (a system configuration utility), so that nothing else would call upon the CPU and affect the results. To further eliminate variables, the Vcore (voltage given to the CPU during use) was kept the same throughout the tests. The FSB (Front Side Bus) was also kept at 200 MHz. The multiplier was first set to 5, giving the chip a 1000 MHz clock speed. The system was then booted into Windows XP, at which point a program called Free Ram Optimizer XP was executed. To make sure every run had access to a similar amount of RAM, Free Ram Optimizer XP was used to clear up 400 MB of system memory. Upon completion, Sisoft Sandra was loaded, and the arithmetic and multimedia benchmarks were completed (see Appendix D for screenshot). Afterwards, Free Ram Optimizer XP was used once again before the next benchmark. Next, a prepared .bat file was used to execute Pifast (see Appendix C for a screenshot of Pifast in action) and have it calculate 33554432 digits of pi (see Appendix B for the contents of the .bat file). Afterwards, Free Ram Optimizer XP was executed to clear the memory of any lingering, unused information (see Appendix E for screenshot). Prime95 was then opened, and benchmark mode was enabled. Prime95 recorded the times it took to calculate certain prime numbers at different settings (see Appendix F for screenshot). Prime95 was then closed, and Free Ram Optimizer XP was opened one last time. Science Mark 2005 was then opened, and the Molecular Dynamics program was loaded (see Appendix G for screenshot). The standard model was rendered, and the program automatically recorded the time it took to render. Then Primordia (a sub-application of Science Mark 2005) was run, which simulated different conditions upon aluminum (see Appendix H for screenshot).
The last benchmark contained within Science Mark 2005 was a Cipher benchmark (see Appendix I for screenshot). The computer recorded how long it took to cipher using default settings. Upon completion, the system was rebooted, and the multiplier was changed to 5.5, resulting in a 1100 MHz frequency. This process was repeated, in multiplier increments of 0.5, until a frequency of 2400 MHz was reached. At that point the limitations of the equipment became apparent, and the computer would not boot at any higher frequency. The only limitation that this method presented was that no 9.0 multiplier was enabled on the test motherboard, so it was impossible to test at 1800 MHz. A listing of the procedure used while testing, a quick field reference, is given in Appendix K.
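As a quick check of the sweep described above, the following Python sketch rebuilds the list of tested clock speeds from the values given in the text (a 200 MHz FSB, multipliers from 5 to 12 in steps of 0.5, with 9.0 unavailable). The function and parameter names are illustrative assumptions and were not part of the original procedure.

```python
FSB_MHZ = 200  # front side bus frequency, held constant in the experiment


def tested_frequencies(fsb_mhz=FSB_MHZ, first=5.0, last=12.0, step=0.5, skipped=(9.0,)):
    """Return the clock speed in MHz for every usable multiplier.

    Clock speed = FSB frequency x multiplier. Multipliers listed in
    `skipped` (9.0 on the test motherboard) are left out.
    """
    frequencies = []
    steps = int(round((last - first) / step))
    for i in range(steps + 1):
        multiplier = first + i * step
        if multiplier in skipped:
            continue
        frequencies.append(int(round(fsb_mhz * multiplier)))
    return frequencies


if __name__ == "__main__":
    print(tested_frequencies())
```

With the defaults, this yields fourteen frequencies from 1000 MHz to 2400 MHz with 1800 MHz missing, matching the list given in the abstract.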
Results
For a full listing of the results in data tables, see Appendix V.
All results were recorded, and statistical analysis was done upon them to find out the most appropriate model. Included with all results are two statistical model lines, one linear, and one exponential. Also calculated was the correlation coefficient (r) of the exponential and linear data sets to the original data. Appendix L shows the first set of data, Prime 95 Best Times (in ms) at all the tested frequencies.
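The text does not say which tool produced the model lines and r values, so the following is only a minimal sketch of one common way to do it in Python. It assumes the exponential model has the form y = A * exp(k * x) and is fitted the way many calculators and spreadsheets do it, by fitting a straight line to ln(y); the function names are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def linear_fit(xs, ys):
    """Least-squares line y = a + b*x. Returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b, pearson_r(xs, ys)


def exponential_fit(xs, ys):
    """Fit y = A * exp(k*x) via a line through (x, ln y). Returns (A, k, r).

    r here is the correlation between x and ln(y), so every y must be positive.
    """
    log_ys = [math.log(y) for y in ys]
    ln_a, k, r = linear_fit(xs, log_ys)
    return math.exp(ln_a), k, r
```

For each benchmark, both fits would be run on the same (frequency, result) pairs and the two r values compared; the one whose magnitude is closer to one indicates the better-fitting model.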
Appendix M shows the results for the Pifast calculations, along with exponential and linear models. Appendix N shows the time taken at the different frequencies to run the Molecular Dynamics model, as well as the corresponding statistical models. Appendix O shows the Primordia calculation time for the varying frequencies. Appendix P shows the time taken to finish the Cipher routine. Appendix Q shows the amount of bandwidth yielded by the Cipher routine. Appendix R shows the Sisoft Million Instructions Per Second (MIPS) benchmark results. This is a very basic means of measuring the raw number of instructions a CPU can handle. Appendix S shows the Million Floating Point Operations Per Second the different frequencies were able to churn out. Appendices T and U show the results of the multimedia benchmark, in IT per second.
Discussion
The hypothesis was that performance would follow a completely linear relationship. Confusing as it may sound, the results were inconclusive. In some of the tests, such as Cipher time, Primordia time, Molecular Dynamics time, and Prime95 time, the exponential model fit much better than the linear one, while still not being absolutely perfect. The r value tells us which model has a stronger relationship to the original data, and for all of these the r value was closer to one with the exponential model than with the linear one, indicating a strong exponential relationship. For every other benchmark, the linear model fit perfectly, while the exponential model was quite inaccurate.
While the linear results seem logical and fit perfectly, it is almost inexplicable why, in some tests, an exponential model fit better. One hypothesis might lie in the way in which the benchmarks were designed. The benchmarks that fit the linear model were geared more towards theoretical performance, while the other benchmarks were real-world situations. This is definitely an open door for more research into why a CPU might behave exponentially in real-world tests while adhering to linear performance in theoretical benchmarking. Further research on the topic could lead CPU manufacturers to more efficient processors that get more done with less effort.
Unfortunately, as in all experiments, there were multiple possible sources of error in this project. The most obvious is an errant process running, taking precedence over the benchmarking program and delaying, and thus skewing, the results of the benchmark. Windows isn’t necessarily the best platform on which to run these tests, as there are always many objects running in the background besides the bare operating system, which may have been a source of error.
Another source of error may have been temperature. Despite best efforts to minimize the impact of varying temperatures with a watercooling system, the ambient temperature did fluctuate. Microprocessors operate at peak efficiency at cooler temperatures; therefore, if the room was colder during one test, that test may have yielded slightly better performance than another.
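One further possibility, offered here only as an assumption that the experiment does not test: some of the benchmarks report a time, while others report a rate. If a fixed workload takes time = work / frequency, then the rate grows linearly with frequency, but the time follows a 1/f curve, which a straight line fits worse than an exponential does. The Python sketch below checks this using synthetic numbers only; the workload constant is made up and no measured results are used.

```python
import math

# Frequencies tested in the text, in MHz.
FREQS_MHZ = [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700,
             1900, 2000, 2100, 2200, 2300, 2400]
WORK_MCYCLES = 1.0e6  # hypothetical workload in millions of cycles (assumption)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def idealized_correlations(freqs=FREQS_MHZ, work=WORK_MCYCLES):
    """Correlations for an idealized chip where time = work / frequency."""
    times = [work / f for f in freqs]   # seconds per run
    rates = [1.0 / t for t in times]    # runs per second
    return {
        # r of a straight line through (frequency, time)
        "time_linear": pearson_r(freqs, times),
        # r of an exponential model, i.e. a line through (frequency, ln time)
        "time_exponential": pearson_r(freqs, [math.log(t) for t in times]),
        # r of a straight line through (frequency, rate)
        "rate_linear": pearson_r(freqs, rates),
    }


if __name__ == "__main__":
    for name, r in idealized_correlations().items():
        print(f"{name}: r = {r:.4f}")
```

Under this assumption the rate correlates perfectly with frequency, while for the time data the exponential model has an r of larger magnitude than the linear one. That would be consistent with a time-versus-rate split in the results, though it does not rule out the other explanations discussed here.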
One other source of error may have come from the network. The system was plugged into the network at the time of benchmarking. Often, computers will receive ping requests from other computers, and normal networking duties need to be carried out. When a computer receives a ping request it is mandated to respond, and that response may have taken cycles away from the benchmark, which was running at the time.
If repeated, this experiment would need to be expanded considerably. Perhaps a better approach than using “Free Ram Optimizer XP” to optimize the memory after every benchmark would be to restart the system after every benchmark, giving a similarly fresh system every time. Another factor to consider is the limited scope of the tests available. If the experiment were expanded, a wider variety of tests should be used, and perhaps the research should be conducted in an environment-controlled facility. Another limitation was the CPU itself. Only one microprocessor was available at the time, the AMD Athlon XP. If more research were to be conducted, it may be beneficial to test many different CPU architectures, such as the 64-bit AMD processors or the Intel P4.
Bibliography
xgourdon (2004). Algorithm used by PiFast. Retrieved November 20, 2004, from the World Wide Web: http://numbers.computation.free.fr/Constants/PiProgram/pifast.html
Science Mark 2 Team (2004). Benchmark HowTo’s. Retrieved November 21, 2004, from the World Wide Web: http://www.sciencemark.org
Nick Tredennick & Brion Shimamoto (2004). The death of microprocessors. Retrieved November 05, 2004, from the World Wide Web: http://www.embedded.com
Online Author (2004). Moore’s Law. Retrieved December 01, 2004, from the World Wide Web: http://www.intel.com/research/silicon/mooreslaw.htm
Online Author (2004). CPUs. Retrieved November 25, 2004, from the World Wide Web: http://www.wikipedia.org
Gordon Moore (2003). No Exponential is Forever … but We Can Delay ‘Forever’. Retrieved November 28, 2004, from the World Wide Web: ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf
APPENDICES
Appendix A
Algorithm used by PiFast
The program implements a Brent binary splitting method together with an efficient cache handling hermitian FFT to multiply big integers (NTT with several primes is used for huge computations). To compute π, it is based on the Chudnovsky formula
\[
\frac{426880\sqrt{10005}}{\pi} = \sum_{n \ge 0} \frac{(6n)!\,(545140134\,n + 13591409)}{(n!)^{3}\,(3n)!\,(-640320)^{3n}},
\]
which adds roughly 14 decimal digits by term.
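To make the Chudnovsky formula above concrete, here is a minimal Python sketch that sums the series term by term using the standard `decimal` module. It is a naive summation for illustration only: it does not use the binary splitting or FFT multiplication that PiFast is described as using, and it is practical only for small digit counts. The function name and the choice of ten guard digits are assumptions.

```python
from decimal import Decimal, localcontext
from math import factorial


def chudnovsky_pi(digits):
    """Return pi as a string truncated to `digits` decimal places.

    Sums the series
        426880*sqrt(10005)/pi = sum_n (6n)!(545140134n + 13591409)
                                      / ((n!)^3 (3n)! (-640320)^(3n))
    directly, without binary splitting.
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10        # working precision plus guard digits
        terms = digits // 14 + 2      # each term adds roughly 14 digits
        total = Decimal(0)
        for n in range(terms):
            numerator = factorial(6 * n) * (545140134 * n + 13591409)
            denominator = (factorial(n) ** 3 * factorial(3 * n)
                           * (-640320) ** (3 * n))
            total += Decimal(numerator) / Decimal(denominator)
        pi = Decimal(426880) * Decimal(10005).sqrt() / total
        return str(pi)[: digits + 2]  # "3." plus the requested digits


if __name__ == "__main__":
    print(chudnovsky_pi(50))
```

Because each term contributes roughly 14 digits, 50 digits need only a handful of terms. The cost of the huge factorials is what makes direct summation impractical at the 33554432 digits used in the benchmark.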
PiFast also proposes a second method for verification, based on a Ramanujan formula
\[
\frac{1}{\pi} = 2\sqrt{2} \sum_{n \ge 0} \frac{(4n)!\,(1103 + 26390\,n)}{4^{4n}\,(n!)^{4}\,99^{4n+2}},
\]
which adds roughly 8 decimal digits by term.
My experience is that a careful implementation of these techniques seems better than any other approaches (like AGM based formulaes for example) for reachable number of digits.
PiFast implements the computation of E with two formulas, namely
\[
e = \sum_{n=0}^{\infty} \frac{1}{n!} \quad\text{and}\quad e = \left( \sum_{n=0}^{\infty} \frac{(-1)^{n}}{n!} \right)^{-1}.
\]
From version 4.0, PiFast permits to compute a large family of user defined constants from linear combination of general hypergeometric series. The algorithm also uses binary splitting, which generalizes well to hypergeometric series.
(http://numbers.computation.free.fr/Constants/PiProgram/pifast.html)
Appendix B
The .bat file contained these simple parameters:
PiFast, version 4.3 (fix 1) (Copyright 1999-2003 Xavier Gourdon)
http://numbers.computation.free.fr/Constants/PiProgram/pifast.html
Menu :
[0] Compute Pi with Chudnovsky method (Fastest)
[1] Compute Pi with Ramanujan method
[2] Compute E by the exponential series exp(1)
[3] Compute E by the exponential series 1/exp(-1)
[4] Compute Sqrt(2) (useful for testing)
[5] Define your constant with hypergeometric series
[6] Compute a user constant from a .pifast file
[7] Decompress a result file
[8] Check a compress result Pi file
Enter your choice : 0
Choose your computation mode :
[0] standard mode (no disk memory used)
[1] basic disk memory mode (for big computations)
[2] advanced disk memory mode (for huge computations)
Enter your choice : 0
Number of decimal digits : 32M
(33554432 digits)
Possible FFT modes, with approximate needed memory :
FFT Size=4096 k, Mem=230640 K (Fastest mode)
FFT Size=2048 k, Mem=148720 K (Time: Fastest mode * 1.1)
FFT Size=1024 k, Mem=107360 K (Time: Fastest mode * 1.3)
FFT Size= 512 k, Mem=86880 K (Time: Fastest mode * 1.7)
FFT Size= 256 k, Mem=76440 K (Time: Fastest mode * 2.7)
FFT Size= 128 k, Mem=71320 K (Time: Fastest mode * 4.7)
...
Enter FFT Size in k :4096
Compressed output (also useful to specify output format) ? [0=No, 1=Yes] : 0
Basically, the .bat file just specified the fastest way to compute pi, told it to use no disk memory (as that would be another bottleneck and possible variable), compute 33554432 digits, and to use 230,640 K of memory.
Appendix C
Screenshot of pifast’s completed computation with CPU information provided by WCPUID on top.
Appendix D
Screenshot of Sisoft Sandra Arithmetic Benchmark.
Appendix E
Screenshot of Free Ram Optimizer XP
Appendix F
A picture of Prime95 Benchmarking.
Appendix G
A Screenshot of the molecular dynamics program in action.
Appendix H
A shot of the Primordia simulation.
Appendix I
Screenshot of cipher in action.
Appendix K
Variable issues:
Temperature
Program Error
Network computation
Procedure:
Boot At specified frequency.
Use "Free Ram Optimizer XP" Free up 300.00 MB of memory before benchmark load
WCPUID ON SIDE OF SCREEN!
---
Open Sisoft Sandra
Run Sisoft Sandra CPU Arithmetic Benchmark
SCREENSHOT! File Name: SSSCPUABENCH$frequency$$DATE$.PNG
run Sisoft Sandra CPU Multi-Media Benchmark
SCREENSHOT! File Name: SSSCPUMMBENCH$frequency$$DATE$.PNG
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
---
Use "Free Ram Optimizer XP" Free up 300.00 MB of memory before benchmark load
---
run f:\program files\pifast\PIFAST32M.bat
SCREENSHOT! COMPUTATION OK... File Name: PIFAST$frequency$$DATE$.PNG
CHANGE LOG FILE NAME! SAVE AS: PIFAST$frequency$$DATE$.TXT
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
---
Use "Free Ram Optimizer XP" Free up 300.00 MB of memory before benchmark load
---
Run CPU Right Mark
Open prime95
Benchmark, make sure log is there
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
---
Open Science Mark
run Molecular Dynamics Simulation: Default settings, 140 K Temperature
Run primordia Simulation: Aluminum
Run Cipher Benchmark
CHANGE LOG FILE NAME! SAVE AS: SM$frequency$$date$.txt
CONFIRM THAT ALL RESULTS ARE CONTAINED.
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
The Relationship Between Processor Frequency and Performance.
Abstract
Those who work in the computer science field are always asking themselves one question, “how can I get more performance?” To satisfy these performance addicts and to stay profitable, Central Processing Unit (CPU) manufacturers have to ask themselves, “Are we manufacturing these CPUs in the most efficient manner?” Efficiency equates to money, and that’s why it’s important to understand the fundamentals of microprocessors and what laws govern them.
A CPU plays a pivotal role in anyone’s computer system. It’s what makes the computer tick, and it’s one of the most common culprits if a system is acting sluggish. The intention of this project was to find out what kind of relationship frequency and performance had, and if there is any way one could utilize this relationship to one’s advantage. The hypothesis theorized that the relationship between frequency (speed, measured in megahertz) and performance (as measured by benchmarking applications) would be absolutely linear.
The experiment was conducted in a relatively simple manner. A modern microprocessor was clocked at different frequencies, tested for performance with several benchmarking applications, and then it was rebooted, and the process was repeated for other frequencies up to 2400 mhz. Results were recorded after each step. The frequencies covered were, 1000 mhz, 1100 mhz, 1200 mhz, 1300 mhz, 1400 mhz, 1500 mhz, 1600 mhz, 1700 mhz, 1900 mhz, 2000 mhz, 2100 mhz, 2200 mhz, 2300 mhz, and 2400 mhz. This supplied a very wide range of frequencies, and was sure to provided an adequate range of data.
To see which model fit best, the exponential and linear models were made for each respective benchmark, then the r value, also called the correlation coefficient, was calculated. The Correlation coefficient shows how strongly a model actually fits the data. The results were quite controversial. While there were some benchmarks that favored a linear model, there were some benchmarks that in fact more closely modeled an exponential. While there may have been experimental errors, none could account for the consistent results.
While it may be tempting, it is impossible to draw any concrete conclusions from this data alone. That doesn’t make this experiment useless, in fact, quite the opposite. It has opened up a new subject of discussion, and opened the door to future testing. Hopefully, with more expansive and controlled tests, the correlation may finally be figured out once and for all.
Introduction
One of the demons plaguing computing is performance. Microprocessor manufacturers strive to release the best possible product, in a cutthroat race to the top, claiming the performance crown. But just what forces act upon this performance-oriented race? This experiment intends to determine whether a microprocessor’s frequency, as measured in megahertz (mhz), has a linear, or exponential relationship to performance, as measured by benchmarking applications.
Wikipedia.com defines a Central Processing Unit (CPU) as, “the part of a computer that interprets and carries out the instructions contained in the software.” Simply put, the CPU is the heart of the modern computer. When all parts of a CPU are on a single Intergrated Circuit (IC), it is referred to as a microprocessor. A CPU gets its clock speed from a simple equation, Front Side Bus (FSB) times Multiplier equals resulting frequency. (http://en.wikipedia.org/wiki/CPU)
While this may sound very complex, in reality it is not. Very simply, a CPU takes in raw instructions and data from the other parts of the computer, manipulates the data and runs processes as directed by other components, and outputs results. Whenever anything occurs on one’s system, the CPU has computed it. Almost no information is passed on to the user without first being manipulated by the CPU.
One of the main concerns of chip makers is clock speed. Clock speed is also dubbed frequency or operating speed and it is measured in megahertz. The faster the clock speed, the more MIPS, or “Millions Of Instructions Per Second”, a CPU can execute. CPUs will perform several different functions on data it obtains from the rest of the computer, and depending on desired output, it will manipulate it accordingly. Increase the frequency and it will be able to fetch more data at a time, and complete the computations faster.
If chip manufacturers are able to understand the relationship between frequency and performance, then they may be able to estimate future limitations and ceilings ahead of time, and engineer ways to overcome them.
Gordon Moore predicted the density of microprocessors will double every eighteen months. More chip density means smaller circuits in a smaller amount of space, allowing for even faster frequencies, thereby, more performance. This being an exponential equation, one may wonder of limitations to this law—and many people have. There are constant debates and contradicting articles on the Internet disputing the credibility of the law being applicable forever, and some in defense of it. Surely there must be some point where the equation can’t take any more—some point at which the chip manufacturers are unable to stuff any more transistors in an already tightly packed space. While that battle rages on, there are yet other frontiers, which need to be conquered. If there is a limit to how much we are able to get out of microprocessors, how will be make them more efficient, and faster? (http://www.intel.com/research/silicon/mooreslaw.htm)
There are many optimizations that chip manufacturers released to help increase performance as well—SSE, special instruction sets, larger on-die cache, the list goes on and on. The bottom line is they are not only blindly increasing megahertz. The question arises, disregarding the other exponential relationships in computing, is there a possibility that there is an exponential relationship occurring with the same chip that is clocked at different frequencies?
Thus, this experiment was formed. The hypothesis was that the relationship between processor frequency and resulting performance must be a linear relationship. Logic would tell one that the faster clocked the processor, the more instructions it will be able to perform. As you double the speed of the processor, you double the raw performance.
Experimental Design
The experiment was set up to be fairly easy, as any benchmarking would be energy consuming to the computer, not the researcher. The experiment required a test bed system. An AMD AthlonXP 1700+ Processor, an Epox 8rda+ motherboard, 2x256MB pc3200 Kingston RAM, an ATI Radeon 9800 Pro, two Western Digital 120GB 7200RPM drives, one self built case, and one self built watercooling system were used in this experiment. Watercooling was chosen for the fact that in such a system ambient temperature has the least effect on the temperature of the processor, thus eliminating one of the variables.
The benchmarks that were chosen were, Sisoft Sandra 2005, Pifast, Prime95, and ScienceMark2005. These benchmarks were chosen for their diversity in the way they stressed the microprocessor—Sisoft Sandra had an arithmetic as well as a multimedia mode, Pifast uses a method to benchmark how long the micrprocessor takes to calculate millions of digits of pi (For more on the method used for calculation of Pi see Appendix A), Prime95 used equations to find prime numbers, and ScienceMark fed the microprocessor equations to solve to gauge how long it took the microprocessor to solve several scientific equations. All of these benchmarking applications tax the CPU in a controlled way; they feed the CPU data, then record the amount of time it takes for it to output a result.
In preparation for the experiment, the system had all the startup applications disabled via msconfig (a system configuration utility), so there was nothing to call upon the CPU and impact results. To further eliminate any variables, the Vcore (voltage given to the CPU during use) was kept the same throughout the tests. FSB (Front Side Bus) was also kept at 200 MHZ. The first multiplier was booted at 5, giving the chip a 1000 MHZ clock speed. The system was then booted into Windows XP, at which point a program called Free Ram Optimizer XP was executed. To make sure all the systems had similar access to the same amount of RAM, Free Ram Optimizer XP was used to clear up 400 MB of system memory. Upon completion, Sisoft Sandra was loaded, and the Arithmetic and multimedia benchmarks were completed (see appendix D for screenshot). Afterwards, once again, Free Ram Optimizer XP was used before the next benchmark. Next, a set .bat file was used to execute Pifast (see Appendix C for a screenshot of Pifast in action), and have it calculate 33554432 digits of pi (see Appendix B for the contents of the .bat file). Afterwards, Free Ram Optimizer XP was executed to clear the memory of any lingering, unused information (see appendix E for screenshot). Prime95 was then opened, and benchmark mode was enabled. Prime95 recorded the times it took to calculate certain prime numbers at different settings (see appendix F for screenshot). Prime95 was then closed, and Free Ram Optimizer XP was opened for one last time. Science Mark 2005 was then opened, at which time the Molecular Dynamics program was loaded (see appendix G for screenshot). The standard model was rendered, and the program automatically recorded the time it took to render. Then Primordia (a sub application of Science Mark 2005) was run, which simulated different conditions upon Aluminum (see appendix H for screenshot). 
The last benchmark contained within Science Mark 2005 was a Cipher benchmark (see appendix I for screenshot). The computer recorded how long it took to cipher using default settings. Upon completion, the system was rebooted, and the multiplier was changed to 5.5, resulting in a 1100 MHZ frequency. This process was repeated, in intervals of .5 multiplier additions, until the frequency of 2400 MHz was obtained. At which point, the limitations of the equipment were prevalent, and the computer would not boot higher than that frequency. The only limitation that this method presented was the fact that there was no 9.0 multiplier enabled on the test motherboard, therefore it was impossible to test at 1800 MHZ. A listing of the procedure used while testing, a quick field reference, is listed in Appendix K.
Results
For a full listing of results put in data tables see appendix V.
All results were recorded, and statistical analysis was done upon them to find out the most appropriate model. Included with all results are two statistical model lines, one linear, and one exponential. Also calculated was the correlation coefficient (r) of the exponential and linear data sets to the original data. Appendix L shows the first set of data, Prime 95 Best Times (in ms) at all the tested frequencies.
Appendix M shows the results for the Pifast calculations, along with exponential and linear models. Appendix N represents the time taken by the different frequencies to do the Molecular Dynamics model, as well as corresponding statistical models. Appendix O shows the primordia calculation time for the varying frequencies. Appendix P shows the time taken to finish the Cipher routine. Appendix Q shows the amount of bandwidth yielded by the cipher routine. Appendix R shows the Sisoft Million Instructions Per Second (MIPS) benchmark results. This is a very basic means of measuring the raw amount of data a CPU can handle. Appendix S represents the Million Floating Point Operations Per Second the frequencies were able to churn out. Appendix T and U show the result of the multimedia benchmark, in IT per second.
Discussion
The hypothesis was that performance would follow a completely linear relationship with frequency. The results were mixed. In some of the tests (Cipher time, Primordia time, Molecular Dynamics time, PiFast time, and Prime 95 time), the exponential model fit better than the linear model, though still not perfectly. The r value indicates how strongly each model relates to the original data, and for all of these tests it was closer to one for the exponential model than for the linear one. For every other benchmark, the linear model fit almost perfectly (r above 0.999), while the exponential model was noticeably less accurate (r of about 0.994).
While the linear results seem logical and fit almost perfectly, it is hard to explain why, in some tests, an exponential model fit better. One hypothesis concerns the way in which the benchmarks were designed. The benchmarks that fit the linear model were geared towards theoretical performance, while the other benchmarks modeled real-world workloads. It is also worth noting that every benchmark favoring the exponential model reported an elapsed time, while every benchmark favoring the linear model reported a rate (MB/s, MIPS, MFLOPS, or it/s). This leaves the door open for more research into why a CPU might behave exponentially in real-world tests while adhering to linear performance in theoretical benchmarking. Further research on the topic could lead CPU manufacturers to more efficient processors that get more done with less effort.
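One way to probe this question with the existing data is the Cipher benchmark, which reports both an elapsed time and a bandwidth for the same routine (Appendix V). If the time is simply a fixed amount of work divided by the rate, then time multiplied by bandwidth should stay roughly constant across frequencies. The sketch below is only a sanity check on the recorded numbers, not part of the original analysis.

```python
# Cipher results from Appendix V; 1800 MHz was not tested.
FREQ_MHZ = [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700,
            1900, 2000, 2100, 2200, 2300, 2400]
CIPHER_TIME_S = [28.66301, 26.50202, 24.27935, 22.44325, 20.87407, 19.54407,
                 18.35992, 17.24314, 15.43873, 14.61881, 13.96487, 13.38392,
                 12.81679, 12.31024]
CIPHER_MB_PER_S = [53.29, 57.58, 62.85, 67.99, 73.1, 78.07, 83.11, 88.49,
                   98.83, 104.38, 109.27, 114.01, 119.05, 123.95]

# Implied amount of work (MB) at each frequency: time * bandwidth.
products = [t * bw for t, bw in zip(CIPHER_TIME_S, CIPHER_MB_PER_S)]
spread = (max(products) - min(products)) / min(products)

for freq, product in zip(FREQ_MHZ, products):
    print(f"{freq} MHz: {product:.1f} MB")
print(f"relative spread: {spread:.2%}")
```

If the product is nearly constant, the time curve is the reciprocal of a roughly linear rate. A reciprocal curve is poorly matched by a straight line and somewhat better matched by a decaying exponential over this range, which would be one possible, unconfirmed explanation for the split between the two groups of benchmarks.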
As with all experiments, there were several possible sources of error in this project. The most obvious is an errant process taking precedence over the benchmarking program and delaying it, thus skewing the results. Windows is not necessarily the best platform for these tests, as many background processes are always running besides the bare operating system, and these may have been a source of error.
Another source of error may have been temperature. Despite best efforts to minimize the impact of varying temperatures with a watercooling system, the ambient temperature did fluctuate. Microprocessors operate at peak efficiency at cooler temperatures, so a test conducted when the room was colder may have yielded slightly better performance than another test.
One other source of error may have come from the network. The system was plugged into the network at the time of benchmarking. Often, computers will receive ping requests from other computers, and normal networking duties need to be carried out. When a computer receives a ping request it is mandated to respond, and that response may have taken cycles away from the benchmark, which was running at the time.
If repeated, this experiment would need to be expanded considerably. Rather than using “Free Ram Optimizer XP” to optimize the memory after every benchmark, a better approach might be to restart the system after every benchmark, giving a similarly fresh system every time. Another factor to consider is the limited scope of the tests used. An expanded study should use a wider variety of tests, and perhaps be conducted in an environment-controlled facility. Another limitation was the CPU itself. Only one microprocessor, the AMD Athlon XP, was available at the time. Further research would benefit from testing several different CPU architectures, such as AMD’s 64-bit processors or the Intel Pentium 4.
Bibliography
Xavier Gourdon (2004). Algorithm used by PiFast. Retrieved November 20, 2004, from the World Wide Web: http://numbers.computation.free.fr/Constants/PiProgram/pifast.html
Science Mark 2 Team (2004). Benchmark HowTo’s. Retrieved November 21, 2004, from the World Wide Web: http://www.sciencemark.org
Nick Tredennick & Brion Shimamoto (2004). The death of microprocessors. Retrieved November 5, 2004, from the World Wide Web: http://www.embedded.com
Online Author (2004). Moore’s Law. Retrieved December 1, 2004, from the World Wide Web: http://www.intel.com/research/silicon/mooreslaw.htm
Online Author (2004). CPUs. Retrieved November 25, 2004, from the World Wide Web: http://www.wikipedia.org
Gordon Moore (2003). No Exponential is Forever … but We Can Delay ‘Forever’. Retrieved November 28, 2004, from the World Wide Web: ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf
APPENDICES
Appendix A
Algorithm used by PiFast
The program implements a Brent binary splitting method together with an efficient cache handling Hermitian FFT to multiply big integers (NTT with several primes is used for huge computations). To compute π, it is based on the Chudnovsky formula
$$\frac{426880\sqrt{10005}}{\pi} = \sum_{n \ge 0} \frac{(6n)!\,(545140134n+13591409)}{(n!)^3\,(3n)!\,(-640320)^{3n}},$$
which adds roughly 14 decimal digits by term.
PiFast also proposes a second method for verification, based on a Ramanujan formula
$$\frac{1}{\pi} = 2\sqrt{2} \sum_{n \ge 0} \frac{(4n)!\,(1103+26390n)}{4^{4n}\,(n!)^4\,99^{4n+2}},$$
which adds roughly 8 decimal digits by term.
My experience is that a careful implementation of these techniques seems better than any other approaches (like AGM based formulaes for example) for reachable number of digits.
PiFast implements the computation of E with two formulas, namely
$$e = \sum_{n=0}^{\infty} \frac{1}{n!} \quad\text{and}\quad e = \left(\sum_{n=0}^{\infty} \frac{(-1)^n}{n!}\right)^{-1}.$$
From version 4.0, PiFast permits to compute a large family of user defined constants from linear combination of general hypergeometric series. The algorithm also uses binary splitting, which generalizes well to hypergeometric series.
(http://numbers.computation.free.fr/Constants/PiProgram/pifast.html)
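The claim that the Chudnovsky formula adds roughly 14 decimal digits per term can be checked with a short sketch. This is a naive term-by-term summation using Python's `decimal` module, not the binary splitting and FFT approach PiFast itself uses. The names `chudnovsky_pi`, `matching_digits`, and `PI_50` are hypothetical.

```python
from decimal import Decimal, localcontext
from math import factorial

# First 50 decimals of pi, used only as a reference for counting correct digits.
PI_50 = "3.14159265358979323846264338327950288419716939937510"

def chudnovsky_pi(terms, digits=60):
    """Approximate pi by summing the first `terms` terms of the Chudnovsky series."""
    with localcontext() as ctx:
        ctx.prec = digits
        total = Decimal(0)
        for n in range(terms):
            numerator = factorial(6 * n) * (545140134 * n + 13591409)
            denominator = (factorial(n) ** 3 * factorial(3 * n)
                           * (-640320) ** (3 * n))
            total += Decimal(numerator) / Decimal(denominator)
        # 426880 * sqrt(10005) / pi equals the series sum, so solve for pi.
        return Decimal(426880) * Decimal(10005).sqrt() / total

def matching_digits(value):
    """Count leading decimals of `value` that agree with PI_50."""
    count = 0
    for got, expected in zip(str(value), PI_50):
        if got != expected:
            break
        count += 1
    return max(count - 2, 0)  # do not count the leading "3."

for terms in (1, 2, 3):
    print(terms, "term(s):", matching_digits(chudnovsky_pi(terms)), "correct decimals")
```

Summing one term at a time like this is far too slow for 32M digits, which is why the description above relies on binary splitting and fast multiplication.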
Appendix B
The .bat file contained these simple parameters:
PiFast, version 4.3 (fix 1) (Copyright 1999-2003 Xavier Gourdon)
http://numbers.computation.free.fr/Constants/PiProgram/pifast.html
Menu :
[0] Compute Pi with Chudnovsky method (Fastest)
[1] Compute Pi with Ramanujan method
[2] Compute E by the exponential series exp(1)
[3] Compute E by the exponential series 1/exp(-1)
[4] Compute Sqrt(2) (useful for testing)
[5] Define your constant with hypergeometric series
[6] Compute a user constant from a .pifast file
[7] Decompress a result file
[8] Check a compress result Pi file
Enter your choice : 0
Choose your computation mode :
[0] standard mode (no disk memory used)
[1] basic disk memory mode (for big computations)
[2] advanced disk memory mode (for huge computations)
Enter your choice : 0
Number of decimal digits : 32M
(33554432 digits)
Possible FFT modes, with approximate needed memory :
FFT Size=4096 k, Mem=230640 K (Fastest mode)
FFT Size=2048 k, Mem=148720 K (Time: Fastest mode * 1.1)
FFT Size=1024 k, Mem=107360 K (Time: Fastest mode * 1.3)
FFT Size= 512 k, Mem=86880 K (Time: Fastest mode * 1.7)
FFT Size= 256 k, Mem=76440 K (Time: Fastest mode * 2.7)
FFT Size= 128 k, Mem=71320 K (Time: Fastest mode * 4.7)
...
Enter FFT Size in k :4096
Compressed output (also useful to specify output format) ? [0=No, 1=Yes] : 0
Basically, the .bat file selected the fastest method of computing pi, told PiFast to use no disk memory (as that would be another bottleneck and a possible variable), to compute 33,554,432 digits, and to use the 4096 k FFT size, which needs about 230,640 K of memory.
Appendix C
Screenshot of pifast’s completed computation with CPU information provided by WCPUID on top.
Appendix D
Screenshot of Sisoft Sandra Arithmetic Benchmark.
Appendix E
Screenshot of Free Ram Optimizer XP
Appendix F
A picture of Prime95 Benchmarking.
Appendix G
A Screenshot of the molecular dynamics program in action.
Appendix H
A shot of the primordial simulation.
Appendix I
Screenshot of cipher in action.
Appendix K
Variable issues:
Temperature
Program Error
Network computation
Procedure:
Boot At specified frequency.
Use "Free Ram Optimizer XP" Free up 300.00 MB of memory before benchmark load
WCPUID ON SIDE OF SCREEN!
---
Open Sisoft Sandra
Run Sisoft Sandra CPU Arithmetic Benchmark
SCREENSHOT! File Name: SSSCPUABENCH$frequency$$DATE$.PNG
run Sisoft Sandra CPU Multi-Media Benchmark
SCREENSHOT! File Name: SSSCPUMMBENCH$frequency$$DATE$.PNG
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
---
Use "Free Ram Optimizer XP" Free up 300.00 MB of memory before benchmark load
---
run f:\program files\pifast\PIFAST32M.bat
SCREENSHOT! COMPUTATION OK... File Name: PIFAST$frequency$$DATE$.PNG
CHANGE LOG FILE NAME! SAVE AS: PIFAST$frequency$$DATE$.TXT
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
---
Use "Free Ram Optimizer XP" Free up 300.00 MB of memory before benchmark load
---
Run CPU Right Mark
Open prime95
Benchmark, make sure log is there
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
---
Open Science Mark
run Molecular Dynamics Simulation: Default settings, 140 K Temperature
Run primordia Simulation: Aluminum
Run Cipher Benchmark
CHANGE LOG FILE NAME! SAVE AS: SM$frequency$$date$.txt
CONFIRM THAT ALL RESULTS ARE CONTAINED.
+++++++++MOVE FILES TO F:\Benchmarks WITH APPROPRIATE AREA SAVED+++++++++
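The file-naming scheme in the procedure above could be generated rather than typed by hand. The sketch below is hypothetical: the prefixes and extensions come from the checklist, but the date format (YYYYMMDD), the `artifact_name` helper, and the dictionary keys are assumptions made for illustration.

```python
from datetime import date

# Prefix and extension for each saved artifact, taken from the checklist.
ARTIFACTS = {
    "sandra_arithmetic": ("SSSCPUABENCH", "PNG"),
    "sandra_multimedia": ("SSSCPUMMBENCH", "PNG"),
    "pifast_screenshot": ("PIFAST", "PNG"),
    "pifast_log": ("PIFAST", "TXT"),
    "sciencemark_log": ("SM", "txt"),
}

def artifact_name(kind, frequency_mhz, run_date):
    """Build a name following the <prefix><frequency><date>.<ext> pattern.

    The YYYYMMDD date format is an assumption; the checklist does not specify one.
    """
    prefix, extension = ARTIFACTS[kind]
    return f"{prefix}{frequency_mhz}{run_date:%Y%m%d}.{extension}"

# Illustrative date only.
example = artifact_name("pifast_log", 1100, date(2004, 11, 20))
print(example)
```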
Appendix V
Prime 95 Best Time (ms), 2048 K FFT
Columns: frequency (MHz), measured time (ms), linear model, exponential model
1000 305.462 280.8938541 284.8999874
1100 281.174 271.4561845 272.8792246
1200 266.985 262.0185148 261.365653
1300 244.105 252.5808452 250.3378726
1400 231.534 243.1431755 239.7753864
1500 222.025 233.7055059 229.6585624
1600 213.761 224.2678362 219.9685967
1700 201.95 214.8301666 210.687479
1800 (not tested) 205.3924969 201.7979588
1900 187.595 195.9548273 193.2835134
2000 181.573 186.5171576 185.1283173
2100 178.285 177.079488 177.3172128
2200 170.776 167.6418183 169.8356816
2300 168.605 158.2041487 162.669818
2400 163.213 148.766479 155.8063032
Linear model: y = -0.0943766965x + 375.2705506
Exponential model: y = 438.4418548 * 0.9995690039^x
r value: 0.966275416 (linear), 0.983087601 (exponential). Best model: Exponential
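The model columns in these tables, including the values shown for the untested 1800 MHz, can be reproduced by evaluating the fitted equations. The sketch below does this for the Prime 95 models. The linear slope is taken as negative, which is an inference from the tabulated model values (they fall as frequency rises); the function names are hypothetical.

```python
def linear_model(freq_mhz):
    """Prime 95 linear model; slope sign inferred from the tabulated values."""
    return -0.0943766965 * freq_mhz + 375.2705506

def exponential_model(freq_mhz):
    """Prime 95 exponential model as listed in this appendix."""
    return 438.4418548 * 0.9995690039 ** freq_mhz

# Predicted Prime 95 times (ms) at the frequency that could not be tested.
print(f"linear at 1800 MHz:      {linear_model(1800):.4f} ms")
print(f"exponential at 1800 MHz: {exponential_model(1800):.4f} ms")
```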
PiFast Calculation Time For 32M (33,554,432) Digits of Pi (seconds)
Columns: frequency (MHz), measured time (s), linear model, exponential model
1000 420.78 400.728694 406.1926074
1100 418.31 388.2474392 390.4816981
1200 368.28 375.7661844 375.3784629
1300 346.31 363.2849296 360.8593977
1400 330.61 350.8036748 346.9019077
1500 351.5 338.32242 333.4842721
1600 303.55 325.8411652 320.5856102
1700 292.78 313.3599104 308.1858488
1800 (not tested) 300.8786556 296.2656912
1900 274.41 288.3974008 284.8065871
2000 295.5 275.916146 273.7907036
2100 263.39 263.4348912 263.2008976
2200 248.16 250.9536364 253.0206892
2300 246.55 238.4723816 243.2342357
2400 239.39 225.9911268 233.8263072
Linear model: y = -0.124812548x + 525.541242
Exponential model: y = 602.6224854 * 0.9996056143^x
r value: 0.957089526 (linear), 0.968121161 (exponential). Best model: Exponential
Molecular Dynamics Calculation Time (seconds)
Columns: frequency (MHz), measured time (s), linear model, exponential model
1000 200.4713 180.5514806 188.1055998
1100 184.01943 171.9272875 175.133709
1200 162.73004 163.3030944 163.0563688
1300 149.60813 154.6789013 151.8118902
1400 136.93067 146.0547082 141.3428385
1500 129.36831 137.4305152 131.5957397
1600 117.42824 128.8063221 122.5208076
1700 108.35438 120.182129 114.0716889
1800 (not tested) 111.5579359 106.2052273
1900 95.24033 102.9337428 98.88124225
2000 91.79168 94.3095497 92.06232421
2100 84.85377 85.68535661 85.71364341
2200 80.69694 77.06116352 79.80277198
2300 76.39628 68.43697043 74.29951829
2400 73.28478 59.81277734 69.17577273
Linear model: y = -0.0862419309x + 266.7934115
Exponential model: y = 384.3452462 * 0.9992857175^x
r value: 0.969827086 (linear), 0.992891783 (exponential). Best model: Exponential
Primordia (Aluminum) Calculation Time (seconds)
Columns: frequency (MHz), measured time (s), linear model, exponential model
1000 28.9436 26.69250317 27.33774747
1100 26.88128 25.64557487 25.91462602
1200 24.71778 24.59864657 24.56558802
1300 23.06108 23.55171827 23.28677691
1400 21.43038 22.50478997 22.07453688
1500 20.37451 21.45786167 20.92540245
1600 19.24684 20.41093337 19.83608853
1700 18.07063 19.36400507 18.80348103
1800 (not tested) 18.31707677 17.82462799
1900 16.36488 17.27014847 16.89673111
2000 15.69712 16.22322017 16.01713778
2100 15.17129 15.17629187 15.18333345
2200 14.6119 14.12936357 14.39293447
2300 14.00373 13.08243527 13.64368131
2400 13.5671 12.03550697 12.93343203
Linear model: y = -0.010469283x + 37.16178617
Exponential model: y = 46.65954972 * 0.9994655337^x
r value: 0.973713895 (linear), 0.991162496 (exponential). Best model: Exponential
ScienceMark Cipher Time (seconds)
Columns: frequency (MHz), measured time (s), linear model, exponential model
1000 28.66301 26.34561101 27.16241494
1100 26.50202 25.22812051 25.58257982
1200 24.27935 24.11063001 24.09463193
1300 22.44325 22.99313951 22.69322687
1400 20.87407 21.87564901 21.37333109
1500 19.54407 20.75815851 20.13020381
1600 18.35992 19.64066801 18.95937997
1700 17.24314 18.52317751 17.85665422
1800 (not tested) 17.40568701 16.81806581
1900 15.43873 16.28819651 15.83988433
2000 14.61881 15.17070601 14.91859638
2100 13.96487 14.05321551 14.05089288
2200 13.38392 12.93572501 13.23365723
2300 12.81679 11.81823451 12.46395407
2400 12.31024 10.70074401 11.73901881
Linear model: y = -0.011174905x + 37.52051601
Exponential model: y = 49.45483921 * 0.9994009538^x
r value: 0.975252583 (linear), 0.993499578 (exponential). Best model: Exponential
ScienceMark Cipher Bandwidth (MB/s)
Columns: frequency (MHz), measured bandwidth (MB/s), linear model, exponential model
1000 53.29 52.7499795 56.19132654
1100 57.58 57.8579206 59.65974306
1200 62.85 62.9658617 63.34224801
1300 67.99 68.0738028 67.25205603
1400 73.1 73.1817439 71.40319743
1500 78.07 78.289685 75.81056854
1600 83.11 83.3976261 80.48998518
1700 88.49 88.5055672 85.45823937
1800 (not tested) 93.6135083 90.73315967
1900 98.83 98.7214494 96.33367507
2000 104.38 103.8293905 102.279883
2100 109.27 108.9373316 108.5931213
2200 114.01 114.0452727 115.296045
2300 119.05 119.1532138 122.4127075
2400 123.95 124.2611549 129.9686469
Linear model: y = 0.051079411x + 1.670568502
Exponential model: y = 30.87083292 * 1.00059913^x
r value: 0.999927074 (linear), 0.993926226 (exponential). Best model: Linear
Sisoft Sandra Arithmetic Benchmark (MIPS)
Columns: frequency (MHz), measured score (MIPS), linear model, exponential model
1000 4159 4112.970295 4405.020146
1100 4499 4531.984891 4685.756818
1200 4991 4950.999488 4984.385141
1300 5357 5370.014085 5302.045368
1400 5812 5789.028682 5639.950423
1500 6166 6208.043278 5999.39053
1600 6586 6627.057875 6381.738143
1700 7043 7046.072472 6788.45318
1800 (not tested) 7465.087068 7221.088604
1900 7894 7884.101665 7681.296348
2000 8294 8303.116262 8170.833628
2100 8653 8722.130858 8691.569645
2200 9197 9141.145455 9245.492728
2300 9589 9560.160052 9834.717925
2400 9986 9979.174649 10461.49508
Linear model: y = 4.190145967x - 77.17567222
Exponential model: y = 2374.813069 * 1.000618017^x
r value: 0.99981704 (linear), 0.994394962 (exponential). Best model: Linear
Sisoft Sandra Arithmetic Benchmark (MFLOPS)
Columns: frequency (MHz), measured score (MFLOPS), linear model, exponential model
1000 1601 1594.966197 1706.653692
1100 1758 1757.569014 1815.690011
1200 1916 1920.171831 1931.692546
1300 2076 2082.774648 2055.106362
1400 2249 2245.377465 2186.404958
1500 2393 2407.980282 2326.092083
1600 2554 2570.583099 2474.703673
1700 2782 2733.185915 2632.809902
1800 (not tested) 2895.788732 2801.017372
1900 3049 3058.391549 2979.971442
2000 3211 3220.994366 3170.358699
2100 3381 3383.597183 3372.9096
2200 3557 3546.2 3588.401266
2300 3706 3708.802817 3817.660469
2400 3869 3871.405634 4061.566802
Linear model: y = 1.626028169x - 31.06197183
Exponential model: y = 918.7183851 * 1.000619502^x
r value: 0.99977496 (linear), 0.993364412 (exponential). Best model: Linear
Sisoft Sandra MultiMedia Benchmark, Integer x4 aEMMX/aSSE (it/s)
Columns: frequency (MHz), measured score (it/s), linear model, exponential model
1000 9262 9232.395391 9891.067441
1100 10203 10182.13214 10526.27378
1200 11156 11131.86889 11202.2732
1300 12038 12081.60563 11921.68544
1400 13053 13031.34238 12687.29848
1500 13944 13981.07913 13502.07933
1600 14837 14930.81588 14369.18559
1700 15889 15880.55263 15291.97757
1800 (not tested) 16830.28937 16274.03145
1900 17815 17780.02612 17319.15302
2000 18739 18729.76287 18431.39251
2100 19710 19679.49962 19615.06025
2200 20648 20629.23636 20874.74336
2300 21559 21578.97311 22215.32358
2400 22527 22528.70986 23641.99612
Linear model: y = 9.497367478x - 264.9720871
Exponential model: y = 5307.966561 * 1.000622617^x
r value: 0.999965216 (linear), 0.993795729 (exponential). Best model: Linear
Sisoft Sandra MultiMedia Benchmark, Floating-Point x4 aSSE (it/s)
Columns: frequency (MHz), measured score (it/s), linear model, exponential model
1000 9922 9897.197443 10596.04679
1100 10914 10909.54008 11274.17985
1200 11952 11921.88272 11995.71254
1300 12894 12934.22536 12763.42236
1400 13990 13946.56799 13580.26461
1500 14900 14958.91063 14449.38368
1600 15871 15971.25327 15374.12523
1700 17031 16983.59591 16358.04903
1800 (not tested) 17995.93855 17404.94265
1900 19053 19008.28118 18518.83609
2000 20004 20020.62382 19704.01724
2100 21056 21032.96646 20965.04843
2200 22055 22045.3091 22306.78396
2300 23054 23057.65174 23734.3888
2400 24062 24069.99437 25253.35846
Linear model: y = 10.12342638x - 226.2289373
Exponential model: y = 5698.137911 * 1.000620534^x
r value: 0.999958386 (linear), 0.993742568 (exponential). Best model: Linear