Added to folding.vmx-priorities\infinities
GeneralMac
06-28-09, 08:26 AM
Hey all, don't know if anyone knows (I didn't), so thought this might help someone.
When I edited the SMP client's folding.vmx with Notepad, I added these lines to automatically set the priorities and infinities:
SMP 1:
quote:
priority.ungrabbed = "idle"
priority.grabbed = "idle"
processor0.use = "TRUE"
processor1.use = "FALSE"
processor2.use = "FALSE"
processor3.use = "FALSE"
processor4.use = "TRUE"
processor5.use = "FALSE"
processor6.use = "FALSE"
processor7.use = "FALSE"
SMP 2:
quote:
priority.ungrabbed = "idle"
priority.grabbed = "idle"
processor0.use = "FALSE"
processor1.use = "TRUE"
processor2.use = "FALSE"
processor3.use = "FALSE"
processor4.use = "FALSE"
processor5.use = "TRUE"
processor6.use = "FALSE"
processor7.use = "FALSE"
SMP 3:
quote:
priority.ungrabbed = "idle"
priority.grabbed = "idle"
processor0.use = "FALSE"
processor1.use = "FALSE"
processor2.use = "TRUE"
processor3.use = "FALSE"
processor4.use = "FALSE"
processor5.use = "FALSE"
processor6.use = "TRUE"
processor7.use = "FALSE"
SMP 4:
quote:
priority.ungrabbed = "idle"
priority.grabbed = "idle"
processor0.use = "FALSE"
processor1.use = "FALSE"
processor2.use = "FALSE"
processor3.use = "TRUE"
processor4.use = "FALSE"
processor5.use = "FALSE"
processor6.use = "FALSE"
processor7.use = "TRUE"
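For anyone who would rather generate these blocks than type them, here is a minimal Python sketch. It only reproduces the pattern from the post (SMP n gets logical processors n-1 and n+3 on an 8-thread CPU). The function name `vmx_affinity_lines` and its parameters are made up for illustration and are not part of VMware; the output still has to be pasted into each VM's .vmx file by hand.

```python
def vmx_affinity_lines(use_cpus, total_cpus=8, ungrabbed="idle", grabbed="idle"):
    """Build the .vmx priority/affinity lines shown in the post.

    use_cpus: logical processor numbers the VM may run on.
    The defaults copy the values from the post; nothing here checks
    whether VMware accepts them.
    """
    use = set(use_cpus)
    if not use or min(use) < 0 or max(use) >= total_cpus:
        raise ValueError("logical processor number out of range")
    lines = [
        f'priority.ungrabbed = "{ungrabbed}"',
        f'priority.grabbed = "{grabbed}"',
    ]
    for n in range(total_cpus):
        flag = "TRUE" if n in use else "FALSE"
        lines.append(f'processor{n}.use = "{flag}"')
    return lines


# SMP 1..4 as posted: each VM gets logical processors n-1 and n+3.
for vm in range(1, 5):
    print(f"SMP {vm}:")
    print("\n".join(vmx_affinity_lines({vm - 1, vm + 3})))
```

Passing a different set, e.g. `vmx_affinity_lines({2, 3})`, gives the lines for any other pairing you want to try.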
I've got all of my VMs' priorities and affinities set in the .vmx file. Works great.
Grabbed priority should be set to normal. Idle is not a valid value for grabbed priority. You can enter the value in the .vmx file, but it does nothing; the VM runs at normal grabbed priority, which is the minimum valid value.
You should try NoteTab Light. It's a far better editor than Notepad and it's free.
affinity = an attraction or liking of something, as in the threads from SMP 4 are "attracted" to run on processor 3 and 7.
infinity = boundless
:)
I'm curious as to the performance benefits of setting affinity to logical processors and running 4 VMs on an i7. Windows must be doing a masterful job of task scheduling to run 16 threads through 4 FP units. Have you tried fewer VMs without affinity? I'll poke around on FCF and see what I can find.
GeneralMac
06-28-09, 10:13 AM
Grabbed priority should be set to normal. Idle is not a valid value for grabbed priority. You can enter the value in the .vmx file, but it does nothing; the VM runs at normal grabbed priority, which is the minimum valid value.
I checked Task Manager and all CPU priorities were set to low when I entered "idle" :beer: It's great because if I have to stop and then restart, I don't have to set anything :thup:
I'm curious as to the performance benefits of setting affinity to logical processors and running 4 VMs on an i7. Windows must be doing a masterful job of task scheduling to run 16 threads through 4 FP units. Have you tried fewer VMs without affinity? I'll poke around on FCF and see what I can find
I actually gain more PPD running 4 VMs and 2 GPUs as opposed to 3 VMs and 2 GPUs. Even though the PPD of each individual VM is lower, the grand total across all 4 VMs is greater :beer:
How much greater are 4 VMs with affinity than three without? How much do you gain by setting affinity on 4? There are definite scientific advantages to having the individual WUs returned as soon as possible over the max PPD scenario you're running, where you produce more WUs over time but complete each WU slower.
priority.ungrabbed="idle" will cause the vmware-vmx.exe process to run at "low" priority as seen in Task manager (idle and low are both reported as low by Task Manager). There is no need to edit priority.grabbed because "normal", the default value, is as low as it will go.
GeneralMac
06-28-09, 11:53 AM
How much greater are 4 VMs with affinity than three without? How much do you gain by setting affinity on 4? There are definite scientific advantages to having the individual WUs returned as soon as possible over the max PPD scenario you're running, where you produce more WUs over time but complete each WU slower.
Hmmm, good point. But if points = WUs completed, then wouldn't that mean more WUs are being completed if you gain 1000 - 2000 PPD, as long as you are completing them well before they are due?
I have no idea. Good question.
priority.ungrabbed="idle" will cause the vmware-vmx.exe process to run at "low" priority as seen in Task manager (idle and low are both reported as low by Task Manager). There is no need to edit priority.grabbed because "normal", the default value, is as low as it will go.
Before I added the priority "idle", Task Manager showed the priority as "normal". After I added "idle", Task Manager showed the priority as "low", which is 2 settings under "normal".
But I stress, I am new to this and you are a very experienced folder. That being said, my $0.02 may be oxidized.
harlam357
06-28-09, 12:11 PM
Yep... I've started doing this on most of mine as well... very handy. :)
Hey cool, I need to play around with that. I've read that i7s are capable of 10k-11k with 4 SMP VMs running. I've only gotten 8k with 2 VMs doing 4k each on real cores. I'd like to specify which cores they use so they don't gimp my GPUs (though the NV_FAH_CPU_AFFINITY environment variable helped with that).
What version of vmware are you running? VMware server 2.0 is limited to 4 cores on the free version.
I probably won't run 4 instances as this thing is pumping out tons of heat as is, but it's good to know about.
--edit--
Notepad rules! :)
hehe, I've balanced out all my core temps with this. Core 0 likes to run 7C hotter (common on i7s) than core 3. So I've got GPUs and most apps running on Core 0, VM 1 running on Core 1,4 (HT) and VM 2 running on Core 2,3. I was watching Real Temp and all 4 cores were the same temp for the first time ever.
I'll report back in 24 hours on how vm 1 is producing vs vm 2. So far running on the HT core hasn't cost me very much ppd.
Affinity becomes quite blurred on HT rigs. Every folding thread HAS to run on the 4 physical FP units, so even though you set it to run on a logical core, it runs on the physical core. The task scheduler just does what it has to, regardless of what you set. Is there any performance benefit to assigning affinity to the VMs on an I7 running 2 VMs?
When I was running 2 vms at default settings they would max out the 4 physical cores and the logical cores would be idle. My gpu's would default to 6 or 7 (ht cores) which would result in my GPU ppd being cut in half. This could just be a "priority" issue but NV_FAH_CPU_AFFINITY set to all cores fixed it. Specifying the affinity for everything to run on different cores has worked well also.
I wouldn't say setting affinity for 2 vm's (on an I7 w/ HT) is a benefit PPD wise but it does help with the juggling of processes. Maybe I like micro managing my system too much :). Also with HT my system is usable for all but the most demanding tasks while folding.
If you have priority set correctly, affinity makes little if any difference in GPU production. On a Q6600, you can gain a few hundred ppd on the SMP VMs with affinity set due to the 4 threads of each client running in a separate cache. On the i7, I don't know if cache contention is an issue.
I'm sure if I found a tool that would set the GPU priority to "above normal" every time the cores fire up / wu was finished, it would have the same effect. Vmware server was just defaulting to "normal".
Honestly I can't comment on the cache situation. Though HT can squeeze out extra cycles when it's actually utilized (more than 4 cores' worth of work, or say 4 SMPs).
From HT wiki "When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)"
Assuming the GPU priority is set to low (console -config advanced options) or slightly higher (systray), the VMs will not interfere with the GPU(s) if they are also set to low (idle) priority. You can set VMware priority three ways, two of which always work. The easy way is to set the vmware-vmx.exe process to low priority using Task Manager. Priority set in this manner is sticky until the VM is closed. It is retained on guest restarts. Next is to edit the .vmx file, priority.ungrabbed = "idle", as GM was pointing out. Third is to set ungrabbed priority in VMware Host configuration to low. The latter never worked for me after a VM was created with the default priority. There isn't any need to elevate GPU priority above low, nor do I think you would want to. Usability problems would likely be the result.
I just hate to see you wasting resources, dedicating cores to the GPU, that could be used for the SMP client. You should be able to easily run 3 or 4 vms and your 2 x 275s and not lose but a few percent on either due to interference.
One other thing to try once you've got priority sorted and more VMs going is to unlock GPU affinity and let Windows Task Scheduler find idle cycles across all cores. With the SMP client there are always idle cycles available on one core or another. Locking GPU affinity to one core could slow down the SMP client using that core.
Affinity just doesn't work the same way on a Hyperthreaded cpu as it does on a non HT cpu and that makes me wonder if there is any benefit, or perhaps even a detriment, to production on a HT rig.
So far I didn't lose any PPD by setting the affinities. 2x GPUs usually doing 8k, sometimes 6k depending on the WU. 2x VMs doing 3800 - 4000. I'd like to try 3 or 4 VMs but the free license for VMware Server only allows 4 cores. Unless I can trick it by editing the vmx files? Or run 2 servers on different ports? Or use a different version?
I've been debating/testing the pros and cons of HT a lot elsewhere on the forum. So far I found it helped with SLI and it seemed like it helped with folding 2 VMs and 2 GPUs + using the comp. But obviously I'm still playing with affinity/priority. More tests are needed.
As I understand the VMware core limitation, it is a per Virtual machine limit. Server 1.0.9 limit is 2 cores per VM (I'm fairly certain this is the case with 2.0 as well). You should be able to run 4 x 2 core VMs on the I7 using it. The maximum number of VMs is limited by ram allocation. If you have 6 GB and allocate 1 GB to each VM, you can run 4 and have 2GB left for the OS and other apps. If you allocate 2 GB of ram per vm, you can only run 2 VMs totalling 4 cores if you have 6 GB. You'll need more ram on your rig to run more VMs, if your sig is correct.
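The RAM arithmetic above can be sketched in a few lines of Python. This is just the back-of-envelope sum from the post, assuming about 2 GB is held back for the host OS and other apps (the figure the post ends up with in both examples) and 2 cores per VM. The function name `max_vms` is made up for illustration.

```python
def max_vms(total_ram_gb, ram_per_vm_gb, reserve_for_host_gb=2):
    """How many VMs fit in RAM after holding some back for the host.

    reserve_for_host_gb defaults to 2, the amount left for the OS and
    other apps in the post's 6 GB examples. This is an assumption.
    """
    usable = total_ram_gb - reserve_for_host_gb
    if usable <= 0:
        return 0
    return int(usable // ram_per_vm_gb)


CORES_PER_VM = 2  # per-VM limit mentioned in the post

for ram_per_vm in (1, 2):
    vms = max_vms(6, ram_per_vm)
    print(f"6 GB host, {ram_per_vm} GB per VM: {vms} VMs, {vms * CORES_PER_VM} cores")
```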
Yeah 2 cpu per vm but vmware only sees 4 cores on my system. When I check the license it says 4 cpu only. At first I thought I needed to enable HT support. The more I think about it the more I think I can trick it by forcing it to use cores 4-7 in the VMX file.
I was giving each VM (basic Gentoo, no extras) 1 GB and it never really used it (except for cache). But I was running low all the time. So the other day I dropped them down to 512 MB and told FAH about 420 MB. Hasn't affected my PPD at all. Also worried about GPU mem using up all the mem addresses on a 32-bit OS with more than 2 GB of RAM installed.
I might go native 64-bit linux soon. At which time I'll get another 3 GB. Hope it doesn't hurt my mem speeds like 9-12GB does.
It's almost Guinness time! :beer:
I can't find any limitation on the number of VMs you can run on VMware server or limitation on the number of processors in the machine VMware server is run on in the license. It does give the definition of a processor as a "single, physical chip that houses no more than four (4) processor cores", but puts no limit on how many processors you can have.
The memory setting in FAH config has nothing to do with the amount of memory FAH will use. It is an assignment tool used by the server to determine which WUs your machine will be assigned. If you report over 200 MB of ram to the AS, it will assign most of the A2 WUs which can use up to 700 MB of ram. If you run out of ram and swap space, the VM will crash. It has happened to a few of us squeezing two vms onto a 2 GB machine.
Native Linux works great on the i7 but not so great on the 275s. I think max ppd is going to come with a x64 Windows OS with 3 or 4 Linux VMs + GPUs.
GeneralMac
07-02-09, 09:10 PM
Before, I used 4 VMs with 2 threads per VM, using the affinities from above, + 2 GPU clients using 1 core (2 threads). Points total was around 27,000 per day.
2,300 per vm
9,000 per gpu
but it rendered my computer useless, and the 2nd GPU started getting errors
a VM errored out as well (cpu 4.1 mem 1600)
now I'm using 2 VMs - 6 threads per VM
SMP 1:
priority.ungrabbed = "idle"
priority.grabbed = "idle"
processor0.use = "FALSE"
processor1.use = "FALSE"
processor2.use = "TRUE"
processor3.use = "TRUE"
processor4.use = "TRUE"
processor5.use = "TRUE"
processor6.use = "TRUE"
processor7.use = "TRUE"
SMP 2:
quote:
priority.ungrabbed = "idle"
priority.grabbed = "idle"
processor0.use = "FALSE"
processor1.use = "FALSE"
processor2.use = "TRUE"
processor3.use = "TRUE"
processor4.use = "TRUE"
processor5.use = "TRUE"
processor6.use = "TRUE"
processor7.use = "TRUE"
+ 2 gpu's 1 core (2 threads) the core not being used by the vm's
averages around 24,600
vm1 - 2835.7
vm2 - 2811.7
gpu1 - 9479.3
gpu2 - 9479.3
cpu - 3.7 mem 2000
I have 3 gigs of memory using 64-bit Vista Ultimate
I need more memory before I can use any more VMs
and my puter is usable now
don't know if all this means anything but figured I'd post
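A quick Python check of the numbers in the post above, in case anyone wants to redo the sum with their own figures. The per-client values are copied from the post; the "before" total is only an estimate built from the rounded 2,300 per VM and 9,000 per GPU figures.

```python
# Per-client PPD for the 2 VM + 2 GPU setup, copied from the post.
after = {"vm1": 2835.7, "vm2": 2811.7, "gpu1": 9479.3, "gpu2": 9479.3}
total_after = round(sum(after.values()), 1)

# Earlier 4 VM + 2 GPU setup, using the rounded per-client figures.
before_estimate = 4 * 2300 + 2 * 9000

print(f"2 VMs + 2 GPUs: {total_after} PPD")
print(f"4 VMs + 2 GPUs: about {before_estimate} PPD")
print(f"difference: about {round(before_estimate - total_after)} PPD")
```

So the usable setup gives up roughly 2,600 PPD compared with the one that was throwing errors.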
I can't find any limitation on the number of VMs you can run on VMware server or limitation on the number of processors in the machine VMware server is run on in the license. It does give the definition of a processor as a "single, physical chip that houses no more than four (4) processor cores", but puts no limit on how many processors you can have.
The memory setting in FAH config has nothing to do with the amount of memory FAH will use. It is an assignment tool used by the server to determine which WUs your machine will be assigned. If you report over 200 MB of ram to the AS, it will assign most of the A2 WUs which can use up to 700 MB of ram. If you run out of ram and swap space, the VM will crash. It has happened to a few of us squeezing two vms onto a 2 GB machine.
Native Linux works great on the i7 but not so great on the 275s. I think max ppd is going to come with a x64 Windows OS with 3 or 4 Linux VMs + GPUs.
How much of that 700 MB is the OS? My Gentoo VMs use like 70 MB (gotta love Linux for that). I'll change it in a heartbeat if my VMs start crashing. Initial 3 VM tests show that I don't have enough mem and I'm not getting enough PPD over 8K to justify it. I'll try again later..... "Oh Captain my captain" :cool:
+ 2 gpu's 1 core (2 threads) the core not being used by the vm's
averages around 24,600
vm1 - 2835.7
vm2 - 2811.7
gpu1 - 9479.3
gpu2 - 9479.3
cpu - 3.7 mem 2000
You should be able to get 3500 out of each VM, with only 2 VMs running. You should leave them alone for a while, they seem to gain 500 PPD or so over time. You might need to remove all the affinity settings, or set the 2 VMs to use 2 of the 4 real cores each and set everything else to other cores / all cores, so they grab the HT cores.
Gotta clean up, the family is coming over tomorrow, but I'll try and post up some screenshots later. My 4.2 GHz OC can get 2 VMs averaging 4k (3800-4200) each. 9k-10k sounds good but I like to use my KRD=KillerRigOfDoom! When I'm not gaming I can fold upwards of 24k on here and still watch HD vids, listen to music, chat, surf, run a ton of monitoring apps, ftp, bittorrent, etc.
GeneralMac
07-02-09, 10:48 PM
"Oh Captain my captain" :cool:
Great movie - Dead Poets Society
You should be able to get 3500 out of each VM, with only 2 VMs running. You should leave them alone for a while, they seem to gain 500 PPD or so over time. You might need to remove all the affinity settings, or set the 2 VMs to use 2 of the 4 real cores each and set everything else to other cores / all cores, so they grab the HT cores.
I'm using Vista 64-bit, 2 VMs and 2 GPUs folding, no mem left. My puter is usable now. No probs and no errors on any of my clients.
Gotta clean up, the family is coming over tomorrow, but I'll try and post up some screenshots later.
Have a good 4th and be safe. I'm going on vacation and won't be back till the 14th.
Checked p2675, using 532 MB out of 713 total in use. p2677 using 500 out of 664. Drank too many Guinnesses to go downstairs and check a rig with a p2669. Ahh, I found one, about 500 out of 670. p2662, which is not currently available, was the cause of the VM crashes.
Great movie - Dead Poets Society
"Indeed" :)
I'm using Vista 64-bit, 2 VMs and 2 GPUs folding, no mem left. My puter is usable now. No probs and no errors on any of my clients.
My current thoughts are: I'd rather get 8k and be able to do everything but game, than get 10k and be on the brink of crashing / be unable to use my precious.
Have a good 4th and be safe. I'm going on vacation and won't be back till the 14th.
You too and Have fun! I expect to see at least 20k+ a day from you on EOC while you are gone. You should be on my radar!
Checked p2675, using 532 MB out of 713 total in use. p2677 using 500 out of 664. Drank too many Guinnesses to go downstairs and check a rig with a p2669. Ahh, I found one, about 500 out of 670. p2662, which is not currently available, was the cause of the VM crashes.
Ha! Why are there pints all over my place this morning? :beer: I've been crunching a lot of p2669 and they use about 450 MB. I'll bump the VMs up to 768 in case a WU comes along that needs it.
Are 5749 and 5750 gpu killers? Is 2667 a cpu killer? I think 2667 is skewing all of my affinity tests ATM.
Alright. You tell me what the deal is :). Check out the TPF times on my vm1.
3 vm's was killing my ppd and eating up all my mem, so I turned off vm3. I then set vm1 to cores (0,1) and vm2 (2,3), GPUs were set to ALL cores. I was getting like 10min TPF and 3000PPD on vm1. VM2 was steady at 4100. So I switched vm1 to (1,4). 12min TPF and 2500ppd. Switched GPU's to core 0. Now vm1 is getting 7min TPF and 4100ppd.
Maybe core 0 is an HT core and 4 is real. Maybe the VMs are cruising and when the GPUs steal cycles it's like a speed bump. I don't know..
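Since the comparisons above jump between TPF and PPD, here is a rough Python sketch of how the two relate. It assumes a WU is 100 frames and has a fixed point value with no bonus. The 1920-point figure is only a placeholder for illustration and is not taken from this thread; plug in the real value for whatever project you are running.

```python
SECONDS_PER_DAY = 86400


def ppd_from_tpf(points_per_wu, tpf_seconds, frames_per_wu=100):
    """Estimate points per day from time per frame.

    Assumes a fixed point value per WU and 100 frames per WU.
    Both are assumptions for this sketch.
    """
    seconds_per_wu = tpf_seconds * frames_per_wu
    return points_per_wu * SECONDS_PER_DAY / seconds_per_wu


ASSUMED_POINTS = 1920  # placeholder WU value, not from the thread

for minutes in (7, 10, 12):
    ppd = ppd_from_tpf(ASSUMED_POINTS, minutes * 60)
    print(f"{minutes} min TPF -> about {ppd:.0f} PPD")
```

The useful part is the shape of the relationship: PPD scales with 1/TPF, so going from 10 min to 7 min per frame is about a 43% gain on the same WU.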
WU variability makes testing difficult. P5749 and p5750 are 511 pointers, the lowest producing of the current GPU WUs, roughly 30% slower than the best 353 pointers. In testing the effects of affinity, you have to do it when you do not have any p59xx on the GPU because the frame times within a WU are highly variable, the spread being about 25% from the average to the minimum time. Ideally you would test the same frames from the same WUs by using backup copies and starting the test from the same point for each affinity configuration.
I haven't had a p2667, p2677 maybe? p2677 folds just like all the other a2 WUs.
Too many variables you say. But this takes too much time as is :). I'll post up another pic later as everything is finishing up right now.
I don't think you'll be able to get consistent results with HT on. No matter what, the work has to be done by the 4 physical Floating Point units. If you could truly lock a fah thread to a logical processor, it couldn't run.
On a C2Q affinity helps because cores 0 and 1 use one cache and cores 2 and 3 use another. By keeping the WU data in the same cache associated with two cores, latency is reduced and performance increases. The i7 has one shared cache so there is no reduction in latency and probably no benefit to FAH. While I can envision scenarios where setting affinity may improve usability, I can't see how it will help production on a Nehalem.
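To make the cache point concrete, here is a small Python sketch that checks whether a VM's affinity set stays inside one cache. The core groupings are taken from the description above (C2Q: cores 0 and 1 on one cache, 2 and 3 on another; i7: one cache shared by everything). The names are made up for illustration and nothing here queries real hardware.

```python
# Cache groupings as described in the post, not detected from hardware.
C2Q_CACHE_GROUPS = [{0, 1}, {2, 3}]
I7_CACHE_GROUPS = [set(range(8))]  # one shared cache, 8 logical processors


def stays_in_one_cache(cpus, groups):
    """True if every CPU in the affinity set sits behind the same cache."""
    cpus = set(cpus)
    return any(cpus <= group for group in groups)


print(stays_in_one_cache({0, 1}, C2Q_CACHE_GROUPS))  # same cache on a C2Q
print(stays_in_one_cache({1, 2}, C2Q_CACHE_GROUPS))  # straddles both caches
print(stays_in_one_cache({1, 4}, I7_CACHE_GROUPS))   # any pairing on the i7
```

On the C2Q layout the pairing matters; on the i7 layout every pairing passes, which is the argument for affinity not helping production there.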
Lol, maaaan you're hardcore :). WU data in the CPU cache eh... never would have thought about that. I was just looking at it from a CPU cycles, physical vs logical core angle.
Does mem speed / bus speed affect PPD?
Yes, by a few percent. Testing I did a while back on DDR2 showed a 3% improvement for 1067 MHz 4-4-4-12 over 800 MHz at the same latency at the same FSB. 1067 CL5 was 1% faster than 800 MHz CL4.
GeneralMac
07-03-09, 05:02 PM
Bought 6 more gigs of memory....that should improve things a little....if nothing else, no more errors (which do cost WUs and points), but I won't be able to install it before I leave. :(