September 11, 2005 - 19:00 UTC
After weeks of dealing with stymied servers and painful outages, we're back online and catching up with the backlog of work. It was a month in the making, but it was always the same problem - dozens of processes randomly accessing thousands of directories, each containing as many as ten thousand files (and sometimes more), all located on a single file server which doesn't have enough RAM to hold these directories in cache.
Since this file server is maxed out in RAM, our only immediate option was to create a second file server out of parts we have at the lab. The upload and download directories are now on physically separate devices and no longer compete with each other. The upload directories are actually directly attached to the upload/download server, so all the result writes go to local storage, which vastly helps the whole system.
While this is all very good news, it isn't the final step. The disks on the new upload file server are old - we'd like to replace this whole system at some point soon (something with bigger, newer, faster disks and faster CPUs).