
Cell, P4, gflops and mhz... (PS3/Xbox2/PC)

Old 02-12-05, 11:51 PM Thread Starter   #1
OC Noob
Member

 

Join Date: Jun 2002
Location: Phoenix, AZ USA

 
Cell, P4, gflops and mhz... (PS3/Xbox2/PC)


I posted this in a gaming thread and thought someone here could probably answer my question. This is, after all, the CPU section.

"This is one thing Sony does extremely well. Hype. I'm still trying to understand if this thing is the greatest thing since sliced bread or just an overhyped piece of silicon.

They say it does 256 Gflops @ 4.6 ghz. Can it even reach those speeds, or is that just theoretical?

A P4 3.2 can do a theoretical 59 gflops and a Radeon X800 can do 200 gflops according to its documentation.

According to that presentation on Cell, PS3 bandwidth limits Cell to 1/10 of the power (256 gflops), so about 25 gflops. If that's its theoretical limit, then a P4 is theoretically faster than what Cell will be in the PS3.


What the heck, I need some computing genius to make some sense out of this crapola."

__________________
Hail to the King:
Opteron 165 w/ DFI Ultra-D w/ BBA 1900XT 512 mb GSkill PC4200 1 GB x 2
74 gb WD Raptor x 2 Raid 0 MSI (ATI550) Tuner
w/ Windows XP Media Center Edition & OCZ Powerstream 520
DD TDX H2O block w/ Maze4 GPA block DD D4 pump
single 120mm Fan Heater core w/ shroud In Lian-Li fish tank window case

RIP (Rest In Pieces):
P4 3.0 ghz @ 3.75 ghz Aerocool HT-101 IC7-G
Radeon 9800 Pro 430 W Antec True Power
Old 02-13-05, 12:17 PM   #2
man_utd
Member



Join Date: Jun 2003
Location: Amsterdam, NL

 
From what I understand, flops are what it is pretty good at, as long as it's not bandwidth limited (which it is, due to the lack of a decent-sized cache). I see 4.6 GHz as a theoretical speed; it just doesn't seem realistic to me.
Old 02-13-05, 02:13 PM   #3
Gnufsh
Senior Member

 

Join Date: Dec 2001
Location: June Lake, California

 
I believe their sample ran at 4GHz, but I may be mistaken.

__________________
Lost access to the classifieds? Look here.
Forum Policies
Sig Rules

"Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing ever happened."
-Sir Winston Churchill
Old 02-13-05, 03:04 PM Thread Starter   #4
OC Noob
Member

 

Join Date: Jun 2002
Location: Phoenix, AZ USA

 
So does it mean anything that Cell can do 256 gflops at 4.6 ghz, P4 can do 59 @ 3.2 ghz and an X800 can do 200 gflops?

Probably all theoretical numbers, but can we use them to make any kind of a comparison between the three?

And why use CPU's when GPU's are so good at gflops?


Sorry for the stupid questions, I really don't know much about flops.

Old 02-13-05, 04:18 PM   #5
Gnufsh
Senior Member

 

Join Date: Dec 2001
Location: June Lake, California

 
Quote:
Originally Posted by OC Noob
And why use CPU's when GPU's are so good at gflops?
Well, off the top of my head, flexibility and integer operations. GPUs are very good at what they do, and they are becoming more programmable, but they cannot do the sort of general-purpose things CPUs can do. Additionally, many programs use integer code, something at which desktop CPUs are much better than GPUs. Also, the 256 GFLOPS figure is for single-precision FP operations, whereas CPUs generally do double precision. The figure for the Cell is much lower when double-precision FP ops are used instead of single precision.

Old 02-13-05, 04:23 PM   #6
germanjulian
Member

 

Join Date: Apr 2002
Location: Frankfurt/London

 
It all depends!

That's the correct thing to say. How the CPUs are built, their bandwidth, cache and instruction sets, and especially how programs are written: all of that together = SPEED.

The Cell processor will be amazing... no doubt about it... it will kick some a**... whether it will take over the PC world... maybe... but I doubt it.

__________________
/|\ Asus P5W DH Deluxe, Intel C2D E6600, 4GB Corsair XMS2-6400C4 DDR2, E-VGA GeForce 7800 GT, Asus Xonar D2, 160GB Intel X25-m SSD, 1TB HD, Coolermaster Cosmos, etc. see my website google my name/|\
Old 02-14-05, 08:56 PM Thread Starter   #7
OC Noob
Member

 

Join Date: Jun 2002
Location: Phoenix, AZ USA

 
hmmm, thanks for the info.


I'm still not sold on this thing. Sounds like a lot of hype and numbers that are meaningless. I guess we'll see what it does when "the rubber meets the road."


I should say I'm not sold on this thing when it comes to PCs or PC chip applications (servers). For consoles I have no doubt it will be great. Console games are going to be coded specifically for Cell, whereas PC software won't be, or will be limited to whatever is coded for it.

By the time it gets even a decent software base, Itanium or some other advanced chip could be out with an MS OS to support it.

Old 02-16-05, 11:28 AM   #8
JigPu
Inactive Pokémon Moderator

 

Join Date: Jun 2001
Location: Vancouver, WA

 
Quote:
Originally Posted by OC Noob
A P4 3.2 can do a theoretical 59 gflops
Where'd you get that bit of info? I can't for the life of me get the math (or even Google) to give me that number...

59 GFLOPs @ 3.2GHz = 18 floating point ops/cycle

Somebody wanna enlighten me as to how 18 floating point values can appear to be modified in only one clock tick? I mean, even if you use SSE, that only messes with 4 pieces of data, leaving you with 14 yet to be touched. If we go for dual parallel SSE pipes, we're still 10 ops too short. Assume that hyperthreading will magically allow you to perform two operations in one cycle as well, and we're STILL 2 operations too short...
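
The arithmetic in this post can be checked with a few lines of Python. This is only a sketch: the helper name `ops_per_cycle` is made up for illustration, and the per-cycle figures (4-wide SSE, two SSE pipes, hyperthreading doubling throughput) are the deliberately generous assumptions from the post, not measured P4 behaviour.

```python
def ops_per_cycle(gflops, ghz):
    """Floating point ops that must finish every clock tick to sustain `gflops` at `ghz`."""
    return gflops / ghz

# 59 / 3.2 = 18.4375, which the post rounds to 18
required = round(ops_per_cycle(59, 3.2))

sse_width = 4                # one SSE op touches 4 single-precision values
one_pipe = sse_width         # 4 ops/cycle
two_pipes = 2 * sse_width    # 8 ops/cycle (hypothetical dual SSE pipes)
with_ht = 2 * two_pipes      # 16 ops/cycle (pretending HT doubles throughput)

for label, ops in [("one SSE pipe", one_pipe),
                   ("two SSE pipes", two_pipes),
                   ("two pipes + HT", with_ht)]:
    print(f"{label}: {ops} ops/cycle, {required - ops} short of {required}")
```

Even with every optimistic assumption stacked up, the tally stays below the 18 ops/cycle that 59 GFLOPS would require.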

JigPu

__________________
.... ASRock Z68 Extreme3 Gen3
.... Intel Core i5 2500 ........................ 4 thread ...... 3300 MHz ......... -0.125 V
2x ASUS GTX 560 Ti ............................... 1 GiB ....... 830 MHz ...... 2004 MHz
.... G.SKILL Sniper Low Voltage ............. 8 GiB ..... 1600 MHz ............ 1.25 V
.... OCZ Vertex 3 ................................. 120 GB ............. nilfs2 ..... Arch Linux
.... Kingwin LZP-550 .............................. 550 W ........ 94% Eff. ....... 80+ Plat
.... Nocuta NH-D14 ................................ 20 dB ..... 0.35 C°/W ................ 7 V


"In order to combat power supply concerns, Nvidia has declared that G80 will be the first graphics card in the world to run entirely off of the souls of dead babies. This will make running the G80 much cheaper for the average end user."
"GeForce 8 Series." Wikipedia, The Free Encyclopedia. 7 Aug 2006, 20:59 UTC. Wikimedia Foundation, Inc. 8 Aug 2006.
Old 02-16-05, 02:54 PM   #9
Gnufsh
Senior Member

 

Join Date: Dec 2001
Location: June Lake, California

 
Well, there are also the 2 FP ADDers that operate at twice the frequency of the chip...

OC Noob: MS already has an OS out for the Itanium. Or were you talking about the Cell?

Old 02-16-05, 03:13 PM   #10
madcow235
Member



Join Date: May 2002
Location: Purdue University, IN

 
I read that Cell really only manages about 25 GFLOPS when the floating point is actually accurate. Also, the code needs to be specialized to work with Cell, and in the PC market specialized code is hard to come by.
Old 02-16-05, 09:03 PM   #11
JigPu
Inactive Pokémon Moderator

 

Join Date: Jun 2001
Location: Vancouver, WA

 
Quote:
Originally Posted by Gnufsh
Well, there are also the 2 FP ADDers that operate at twice the frequency of the chip...
I thought those were integer, not FP... Oh well, goes to show how much attention I pay to chips these days.

Regardless, fully loading the 2 adders (which I assume aren't SIMD) is only 4 ops/cycle. Again, assuming that HT somehow magically lets you execute two instructions in the same clock cycle gives only 8 ops/cycle. Still quite a bit less than 18....

JigPu

Old 02-17-05, 10:41 AM   #12
Gnufsh
Senior Member

 

Join Date: Dec 2001
Location: June Lake, California

 
I assume that, as long as we're doing theoretical calculations, we can use both SSE and 387 code. Which means 4/clock from the FP adders, 1/clock from the full FPU, 4-8 more from SSE... Oh, wait, still short.
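
Tallying those per-clock figures makes the shortfall explicit. A rough sketch, assuming (as this post does) that every unit could issue at the same time; the dictionary keys are just labels chosen for illustration.

```python
# Per-clock FP throughput claimed in the post, as (low, high) estimates.
units = {
    "double-pumped adders": (4, 4),
    "x87 FPU": (1, 1),
    "SSE": (4, 8),
}

low = sum(lo for lo, _ in units.values())
high = sum(hi for _, hi in units.values())
needed = 59 / 3.2  # ops/clock required for 59 GFLOPS at 3.2 GHz

print(f"optimistic total: {low}-{high} ops/clock, needed: {needed:.1f}")
```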

This post points to 24GFLOPS from the p4:
http://forums.anandtech.com/messagev...&enterthread=y

Old 02-18-05, 03:33 AM   #13
imgod2u
Member



Join Date: Jun 2002
Location: Isla Vista, CA

 
That's not possible: the x87 FPU and the SSE unit share the same execution hardware. They even share an issue port. You cannot issue multiply and add instructions in parallel, and you cannot reach full throughput by issuing only one kind. To get the full throughput of 1 SSE instruction per cycle, you must alternate multiply and add instructions. The previous poster is correct: at most you can get 4 SP FP ops per cycle on a P4, which works out to 12.8 GFLOPS on a 3.2 GHz P4.
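
The 12.8 GFLOPS figure follows directly from the numbers in this post. A minimal sketch, with `peak_gflops` as a hypothetical helper name rather than anything from a real library:

```python
def peak_gflops(clock_ghz, simd_lanes, instr_per_cycle):
    """Theoretical peak: clock rate * values per instruction * instructions per cycle."""
    return clock_ghz * simd_lanes * instr_per_cycle

# P4 at 3.2 GHz: 4 single-precision values per SSE instruction,
# at most one SSE instruction issued per cycle.
p4_peak = peak_gflops(3.2, 4, 1)
print(f"{p4_peak:.1f} GFLOPS")
```

This is a ceiling, not an expectation: it assumes the multiply and add instructions alternate perfectly and never wait on memory.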

The post on AnandTech is incorrect, as the person gets 24 by assuming there's a multiply-accumulate (FMAC) instruction. SSE/SSE2/SSE3 do not have FMAC instructions like Cell or AltiVec do.
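
To see where a figure like 24 can come from: a fused multiply-accumulate is conventionally counted as two floating point operations per value, which doubles the peak. The sketch below rests on a guess, namely that the linked post used a 3.0 GHz clock (this thread does not say), and `peak_gflops` is again a hypothetical helper.

```python
def peak_gflops(clock_ghz, simd_lanes, flops_per_lane):
    """Theoretical peak with one SIMD instruction issued per cycle."""
    return clock_ghz * simd_lanes * flops_per_lane

ASSUMED_CLOCK_GHZ = 3.0  # assumption: not stated anywhere in this thread

without_fmac = peak_gflops(ASSUMED_CLOCK_GHZ, 4, 1)  # separate mul or add: 1 flop per lane
with_fmac = peak_gflops(ASSUMED_CLOCK_GHZ, 4, 2)     # fused mul+add: 2 flops per lane

print(f"without FMAC: {without_fmac:.1f} GFLOPS, with FMAC: {with_fmac:.1f} GFLOPS")
```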
Old 02-18-05, 10:21 AM   #14
Gnufsh
Senior Member

 

Join Date: Dec 2001
Location: June Lake, California

 
Ah, I assumed the SSE/x87 execution hardware was different (I'm fairly certain it is in the P3, and I'm 100% certain it is in some x86 cpus).

Old 02-18-05, 06:13 PM Thread Starter   #15
OC Noob
Member

 

Join Date: Jun 2002
Location: Phoenix, AZ USA

 
Quote:
Originally Posted by Gnufsh
Well, there are also the 2 FP ADDers that operate at twice the frequency of the chip...

OC Noob: MS already has an OS out for the Itanium. Or were you talking about the Cell?
Sorry, have to be quick, but yeah, I was talking about Cell. I just threw in that Itanium could have an MS OS by then... but anything could happen.

EDIT: Didn't know its software was by MS. Thanks for the info. I should have said an MS OS for home users.


Quote:
Originally Posted by JigPu
Where'd you get that bit of info? I can't for the life of me get the math (or even Google) to give me that number...

59 GFLOPs @ 3.2GHz = 18 floating point ops/cycle

Somebody wanna enlighten me as to how 18 floating point values can appear to be modified in only one clock tick? I mean, even if you use SSE, that only messes with 4 pieces of data, leaving you with 14 yet to be touched. If we go for dual parallel SSE pipes, we're still 10 ops too short. Assume that hyperthreading will magically allow you to perform two operations in one cycle as well, and we're STILL 2 operations too short...

JigPu

I did an internet search for gflops and pentium and it was said on a few boards where people had done the math.

I'll link it when I have more time


EDIT:


Found it referenced in a handful of places, but I'm having a hard time finding the boards that did the math to get that number. I'm not saying it's right either, but I'd like to find it so you guys can take a look and tell me if it's right or wrong.

http://www.abovetopsecret.com/forum/thread75492/pg2

http://www.geek.com/news/geeknews/20...0728026217.htm

http://www.nvnews.net/vbulletin/showthread.php?p=434532


Oh well, I give up for tonight. Sounds like that number isn't right anyway.


Last edited by OC Noob; 02-19-05 at 01:12 AM.
Old 02-18-05, 09:11 PM   #16
man_utd
Member



Join Date: Jun 2003
Location: Amsterdam, NL

 
Gigaflops are not the reason you have a CPU for gaming; the GPU will always obliterate it. A CPU for gaming needs higher integer performance, which Cell sounds like it sucks at. GJ IBM, build a chip that looks great on paper, but still lacks a key component for gaming.
Old 02-19-05, 01:18 AM   #17
imgod2u
Member



Join Date: Jun 2002
Location: Isla Vista, CA

 
Quote:
Originally Posted by Gnufsh
Ah, I assumed the SSE/x87 execution hardware was different (I'm fairly certain it is in the P3, and I'm 100% certain it is in some x86 cpus).
As far as I know, no x86 MPU uses separate execution hardware. The only chips I'm aware of that have separate vector units are the PPC G4 and 970. Rarely, if ever, will you have code that uses both scalar and vector instructions within any reasonable window, so such a feature would be pretty useless as well as costly.
Old 02-19-05, 01:20 AM   #18
imgod2u
Member



Join Date: Jun 2002
Location: Isla Vista, CA

 
Quote:
Originally Posted by man_utd
Gigaflops are not the reason you have a CPU for gaming; the GPU will always obliterate it. A CPU for gaming needs higher integer performance, which Cell sounds like it sucks at. GJ IBM, build a chip that looks great on paper, but still lacks a key component for gaming.
IMHO, it sounds like Cell is supposed to replace GPUs rather than CPUs. Having fully programmable vector processors is much better from a programming point of view than shader programming on the GPU. There will still be a graphics processor, but I doubt it'll be that powerful. Looks like we're moving back to the days of software rendering, with the graphics subsystem there just for drawing.
Old 02-19-05, 09:44 AM   #19
Gnufsh
Senior Member

 

Join Date: Dec 2001
Location: June Lake, California

 
Quote:
Originally Posted by imgod2u
As far as I know, no x86 MPU uses separate execution hardware. The only chips I'm aware of that have separate vector units are the PPC G4 and 970. Rarely, if ever, will you have code that uses both scalar and vector instructions within any reasonable window, so such a feature would be pretty useless as well as costly.
really? Good to know. I wonder why gcc has this option then:
Quote:
-mfpmath=unit
Generate floating point arithmetics for selected unit unit. The choices for unit are:

387
Use the standard 387 floating point coprocessor present majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80bit precision instead of precision specified by the type resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.

This is the default choice for i386 compiler.
sse
Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too.

For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit.

This is the default choice for the x86-64 compiler.
sse,387
Attempt to utilize both instruction sets at once. This effectively double the amount of available registers and on chips with separate execution units for 387 and SSE the execution resources too. Use this option with care, as it is still experimental, because the GCC register allocator does not model separate functional units well resulting in instable performance.
And it's an i386 option. Perhaps some CPUs, while the registers are shared (I think they have to be), have separate execution hardware? Or, I could be wrong. It's happened before and it'll happen again.

Old 02-19-05, 07:33 PM   #20
imgod2u
Member



Join Date: Jun 2002
Location: Isla Vista, CA

 
Quote:
Originally Posted by Gnufsh
really? Good to know. I wonder why gcc has this option then:

And it's an i386 option. Perhaps some CPUs, while the registers are shared (I think they have to be), have separate execution hardware? Or, I could be wrong. It's happened before and it'll happen again.
The reason is that some code runs faster in x87 (even on the P4) and some code runs faster in SSE. ICC does the same thing: it spits out a mixture of x87 and SSE code. However, the two are rarely close enough together to be executed in parallel within the instruction window, even one as large as the P4's.

As far as I know, in hardware, the 2 instruction extensions use the same registers. They're treated separately by the ISA, but in reality they're written to the same ones. You can, however, issue an x87 instruction and an SSE one at once. SSE does have scalar instructions, and that may be what they're referring to when they say both can be executed in parallel. Although at that point, I'm not sure why you wouldn't just use 2 parallel SSE scalar instructions.