Question about ATI Folding


hainer36
05-14-09, 05:20 PM
Is the ATI version of GPU folding supposed to use an entire core's worth of processing power? I have an i7 920 and it always says it is using 13% (1 of 8 cores) on the GPU client. The Nvidia version doesn't use this much (0-1% tops), so is it just that the Nvidia client is that much better, or is something wrong?

dark bishop
05-14-09, 06:27 PM
You probably need to adjust the environment variables; I'm not sure which ones, though. I assume you have the latest driver and the newest version of the fah_core.

hainer36
05-14-09, 07:42 PM
I was on 9.2, 9.3, and 9.4 and it was all the same, and I use the client from Stanford's website.

AlucardCasull
05-14-09, 09:15 PM
My 4870 uses 25% of my q9400 if that helps.

Darius_Silver
05-14-09, 09:17 PM
The GPU2 client is supposed to be optimized for Nvidia cards with CUDA programming. I think this is why they use so little processor power.

As far as ATI cards go, mine is using 95% of core 1 and 30% of core 2 on average. But I have the client set to use as much processor power as it needs. Maybe check your configuration settings to see where it's at?

Also, if Stream ever gets put into a future GPU client, ATI cards should act the same as Nvidia cards (Or better :p)

ChasR
05-14-09, 09:37 PM
I've got an HD 3850 running in an old FX-53 single core rig. I'm able to run the GPU client and the CPU client, losing only 5% production on the CPU client. With drivers 8.12 and above, the environment variables FLUSH_INTERVAL, CAL_NO_FLUSH, CAL_PRE_FLUSH, and BROOK_YIELD work to reduce CPU utilization to close to zero while keeping GPU utilization close to 100%.

A description from Mike Houston, ATi:
FLUSH_INTERVAL is the one that is going to affect GFX performance. Basically, it corresponds to the number of functions submitted in one shot to the GPU. The GPU will not do anything else, including UI updates, until that submission completes. A lower value will basically reduce the timeslice folding@home gets on the GPU, but UI and graphics responsiveness will improve. However, as the value gets smaller, the CPU/OS/Driver overheads increase, so there is a tradeoff between folding@home performance and UI 'snappiness'.

CAL_NO_FLUSH and CAL_PRE_FLUSH, which were poor name choices by me, change how things are submitted to the hardware. CAL_NO_FLUSH changes how we build up packets of work to submit to the hardware. CAL_PRE_FLUSH basically 'double buffers' the building of command buffers so one can be processed while another is being built.

BROOK_YIELD has several modes, 0/1/2. 0 will spin the CPU, giving the lowest possible latency response to the GPU. 1 will yield the CPU to any process of the same priority while waiting on the GPU to complete. 2 will yield the CPU to any process. Now, for really small flush intervals and small proteins, there is a chance the GPU was almost done when the CPU yields. You have to wait to be rescheduled, which could be up to a millisecond. A fast GPU completes many of the kernels in <100 microseconds, so this can have a large impact. If you have a larger flush interval, you can build up several milliseconds of work so missing by a bit is less of an issue.

We are working on less esoteric methods for setting these things. Basically CAL_NO_FLUSH and CAL_PRE_FLUSH will be enabled by default in a later core update and we will start auto adjusting the FLUSH_INTERVAL to match different size proteins better, basically by making the FLUSH_INTERVAL setting not be the number of functions submitted, but a time interval.
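
To put rough numbers on the BROOK_YIELD tradeoff described above, here is a back-of-the-envelope sketch. It is a simplified model of my own, not anything from the client: it assumes each kernel takes the full 100 microseconds, that every flush pays the worst-case 1 millisecond rescheduling delay, and that FLUSH_INTERVAL equals the number of kernels per submission. The function name and defaults are hypothetical.

```python
def yield_overhead_fraction(kernels_per_flush, kernel_us=100.0, reschedule_us=1000.0):
    """Worst-case share of wall time spent waiting to be rescheduled.

    Assumed model: each flush does kernels_per_flush * kernel_us of GPU work,
    then the yielding CPU thread waits reschedule_us before it can submit again.
    """
    work_us = kernels_per_flush * kernel_us
    return reschedule_us / (work_us + reschedule_us)


if __name__ == "__main__":
    # Compare a tiny flush interval against larger ones.
    for n in (1, 16, 128):
        print(f"FLUSH_INTERVAL={n}: {yield_overhead_fraction(n):.1%} worst-case wait")
```

Under these assumptions a flush interval of 1 spends most of its time waiting, while 128 builds up about 12.8 ms of work per submission, so a missed millisecond matters far less. Real numbers will depend on the card and the protein.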

The setup that worked for me on the 3850 was to set the variables as follows:
FLUSH_INTERVAL = 128
BROOK_YIELD = 2
CAL_NO_FLUSH = 1

Your settings will vary depending on the card.
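
If you would rather not set these system-wide, one option is a small launcher script that applies them only to the client process. This is a minimal sketch, assuming the client reads the variables from its own process environment at startup. The client path is a placeholder you would fill in yourself, and the values are just the HD 3850 ones from above.

```python
import os
import subprocess

# Values reported above for an HD 3850; other cards may need different ones.
GPU_ENV = {
    "FLUSH_INTERVAL": "128",
    "BROOK_YIELD": "2",
    "CAL_NO_FLUSH": "1",
}


def build_env(overrides=GPU_ENV, base=None):
    """Return a copy of the base environment with the folding variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def launch_client(client_path, workdir=None):
    """Start the GPU client with the tuned environment.

    client_path is hypothetical: point it at wherever your client executable lives.
    """
    return subprocess.Popen([client_path], cwd=workdir, env=build_env())
```

Calling `launch_client(...)` with your own path starts the client with the three variables set, and leaves the rest of the system untouched.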