Regarding the science database issues over the past couple of months, let me recap: this is the informix/science database, not the mysql/user database. We noticed that one of the tables in astropulse got corrupted. No big deal - we lost a couple rows out of 80 million. In the process of fixing this we noticed that the astropulse portion of the database hasn't been replicated properly to the secondary informix/science database. So this whole project was to fix two things: the corrupted table, and the broken replication. Ultimately we learned a lot along the way cleaning this up ourselves, but each iteration has been sloooow, and a lot of time was lost trying certain things which seemed obvious, but didn't work like we expected. We're nearing the final stages of this (we hope). One silver lining is that we'll get to test recovery of the secondary from our weekly database backup - these backups are 1.2 terabytes in size at this point, so we don't test this procedure often.
So.. there's been upload issues starting yesterday. Not sure why, maybe we were just being blitzed more than normal. We tweaked our configuration around this morning so that the scheduling server is now also handling 25% of the upload load. Maybe that'll help push the clog through.
In worse news, our server closet temperature shot up way too high this weekend. Machines were running 10 degree (Celsius) hotter than normal, and well beyond spec and in some cases the "danger zone." This isn't good. We're hoping somebody on campus will come up today and inspect our a/c system, but given it's the holidays we might not get anybody until the new year, in which case... we'll have to shut everything down for a while. We shall see.
- Matt