The Mystery of the LOST WU's.
chawken
09-06-01, 05:26 PM
In several threads, I have been ranting and raving about losing WU's. :mad: I guess I still am - but now I have some information as to why I lost so many - and why others are also losing WU's.
This all started on Sunday Aug 2nd - at least for me. My SetiLog count of completed WU's was not matching my personal stats at Berkeley - a discrepancy of about 25 or so. During this week, I have been monitoring each machine with SetiPro. Shortly after I would see one of my rigs transmit - via SetiPro - I would check Berkeley. If I didn't see a credit for that WU then I would go to the machine in question. Each time I found that SetiDriver had not transmitted.
O.K. So on Tuesday I would hit the transmit button, and walk away. After a while I would check my Berkeley stats and lo and behold - no credit. OK, now what's up? Yesterday I started paying more attention - and noticed something really bizarre. SetiDriver would show 1 unit 'Ready for Transmit' and 1 unit 'Needed to Fill', but as soon as I hit the Transmit button the 'Ready for Transmit' would instantly go to Zero - even before the DOS client window opened (it is configured to 'Display Transmit'). Now I am thinking that it is not Berkeley - instead it is SetiDriver. But why? I have been using it for 6 months - why would it start doing weird stuff now?
I emailed Mike Ober today - explaining the problem - and, great guy that he is, he returned my email within about 15 minutes. It wasn't SetiDriver - it is now back to Berkeley being the problem. Here is the response from Mike Ober:
There has been a problem with Berkeley sending out some really weird WUs lately. I have also noticed this and have checked the WUs that had the problems. They were all corrupt when I received them from Berkeley.
Mike.
This makes perfect sense - Berkeley started having transmit problems on Sunday - and ever since I have been losing WU's - to the tune of about 40 now.
As each machine completes a WU - I will be deleting everything and setting everything back up fresh.
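For anyone who wants to run the same check without eyeballing it, here is a rough Python sketch of the comparison described above: take the local completed count and the credited count from the stats page at two points in time, and see how many transmitted WU's never turned into credit. The `Snapshot` class, the function name and all the numbers are made up for illustration; none of this comes from SetiLog, SetiDriver or Berkeley.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Counts noted down at one moment (hypothetical values, entered by hand)."""
    local_completed: int   # WU's the local log says were finished and transmitted
    server_credited: int   # WU's shown as credited on the personal stats page


def lost_between(before: Snapshot, after: Snapshot) -> int:
    """WU's transmitted in the interval that never showed up as credit.

    A negative result means more credit arrived than was transmitted.
    """
    sent = after.local_completed - before.local_completed
    credited = after.server_credited - before.server_credited
    return sent - credited


# Made-up example: 60 WU's sent locally, only 35 credited.
before = Snapshot(local_completed=1200, server_credited=1200)
after = Snapshot(local_completed=1260, server_credited=1235)
print("WU's missing:", lost_between(before, after))
```

Taking the two snapshots just before and just after a transmit keeps the comparison from being thrown off by credit that was already missing.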
Morpheus
09-06-01, 07:32 PM
Chawken...
Thanks for the heads-up... Mike IS a great guy based on my experiences as well...
Crunch on!!!
DeltaSierra
09-06-01, 07:58 PM
That also fits with what I've seen lately. Ever since your rants Chawken, I've paid closer attention to my WUs. I use SetiGate, so it keeps track of cached, processing and completed WUs differently than SetiDriver. Anyway, what I was seeing is that SetiGate would pass a WU to the CLC, but over this past week, about 1 in 4 WUs would get skipped. That is, SetiGate would show the WU as "processing" but with 0.00 completion...AND...a different WU also listed as "processing" but with the completion amount updated regularly as the work progressed.
So, what I saw was one WU (per machine) actually being crunched, with one, two or more WUs noted as being processed, but actually having been passed over by the CLC to go on to the "next" WU. I tried to reset the skipped WUs (SetiGate has a restore workunit button), but the very next time the CLC would try to crunch the WU, the same response would happen (i.e., skip it and go on to the next WU). I eventually had to delete the dozen or so WUs that couldn't be crunched. Mike's explanation of corrupt WUs seems to fit my recent experience. The good news for me is that I didn't lose any credits because SetiGate won't count WUs unless they're 100% complete.
For me at least, the CLC would only spend a minute or two trying to crunch the bad WUs and then SetiGate would start a new WU. So, I guess the good news is that I didn't lose any time on those corrupt WUs. That is, I could still get 7 or 8 WUs done per day despite the corrupt WUs.
[Oc]acaridans
09-06-01, 08:07 PM
Thanks for doing the dirty work Chawken..I was actually going to send an email myself....I too have been experiencing the same issue...But there is 1 other thing that has been happening that puzzles me.....Earlier I checked one of my slower boxes, it does a WU in about 16 hours, it did 4 in the past 24 hours, so I knew something was up but what happened was when I transmitted the WU I got credit for a WU that was already complete...In other words when I transmitted what should have been WU's 657,658,659 I got credited for WU's 602,603,613..
Does anyone have an explanation for this??
EDIT: I was just reading the other posts..Is the cause of the miscredited WU the corruption?? I don't see how a corrupt WU could cause the unit to be miscredited. Doesn't your completed WU count reflect the number of WU's transmitted??? But more importantly, will we get them back????
Here's a (futile) solution. We all start FOLDING until we come up with a chemical solution that alters the human brain enough for us to become either a) smart enough to straighten out this WU problem or b) smart enough to BE space aliens with warp drive and such so we don't need to look for others from the confines of our computers.
:)
bodezafa
09-06-01, 10:11 PM
Huuuuumm I wonder if this could be the cause of the problem I was having with my Slow machine and spy going Idle???
Morpheus
09-06-01, 10:53 PM
What REALLY ******es me off here is the fact that SETI hasn't published anything on the site... they continue to make cosmetic alterations, but now it seems they need to be doing some work on the user database...
Chawken, where are you my DB-lovin' friend ( :) ) Given the fact that the WU doesn't hold that data (number of WUs complete) doesn't it make more sense that the miscrediting issue is most likely a problem in SETI's user database?
Crunch 'em
chawken
09-07-01, 09:07 AM
Originally posted by Morpheus
What REALLY ******es me off here is the fact that SETI hasn't published anything on the site... they continue to make cosmetic alterations, but now it seems they need to be doing some work on the user database...
Chawken, where are you my DB-lovin' friend ( :) ) Given the fact that the WU doesn't hold that data (number of WUs complete) doesn't it make more sense that the miscrediting issue is most likely a problem in SETI's user database?
Crunch 'em
Seems to me that Berkeley has some serious issues that need to be addressed. You're correct about them not publishing anything - but what else is new. We started to get some good communication from SSL when 'Trust no One' was circulating their petition on the news groups. But now, SSL seems to be back to their non-communication mode.
I was waiting to see if someone could figure out the sheet, and what needed to be changed on it to conform to the Berkeley cgi changes. I guess I need to get to designing a DB for us. My only hesitation would be that unless someone else on this team has a Unidata DBMS system - the DB is not transportable - only the raw data exported from it. It isn't a PC level OS/DBMS.
Today I am going to take [OC]'s advice and look at SetiGate.
Well something is definitely screwed up now. I've been checking on my progress this week and didn't see anything unusual until today. First of all my average time per work unit is going up (beyond 6 hours) while none of my machines takes more than 4.5 hours to finish a unit. Yesterday my average time was going down. Secondly according to data I'm getting from Berkeley my WU per day is up to 69 but according to my own data it should be around 58. Yes you read that right - looks like I'm picking up units from somewhere else while other people are losing units. This is one of the biggest problems they've had to date. I'm thinking about shutting off all of my machines until they admit something is wrong and fix it.
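The sanity check behind those numbers can be sketched in a few lines of Python. This is only an illustration, assuming each machine crunches around the clock at a steady time per WU; the fleet below is hypothetical and is not anyone's real farm. The point is that the fleet-wide average time per WU can never be higher than the slowest machine's time, so an average above 6 hours with no machine slower than 4.5 hours has to come from somewhere else.

```python
def expected_wus_per_day(hours_per_wu):
    """Daily output if every machine runs 24 hours at its own steady pace."""
    return sum(24.0 / h for h in hours_per_wu)


def average_hours_per_wu(hours_per_wu):
    """Fleet-wide average crunch time per WU.

    Every machine spends 24 hours a day crunching, so the average is the
    total machine-hours divided by the total WU's produced in that day.
    """
    total_hours = 24.0 * len(hours_per_wu)
    return total_hours / expected_wus_per_day(hours_per_wu)


# Hypothetical fleet: per-machine crunch time in hours, none slower than 4.5.
fleet = [4.5, 4.0, 3.0, 3.0]

print("expected WU/day:", round(expected_wus_per_day(fleet), 1))
print("average hours/WU:", round(average_hours_per_wu(fleet), 2))
print("average above slowest box?", average_hours_per_wu(fleet) > max(fleet))
```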
chawken
09-07-01, 03:06 PM
Originally posted by TC
Well something is definitely screwed up now. I've been checking on my progress this week and didn't see anything unusual until today. First of all my average time per work unit is going up (beyond 6 hours) while none of my machines takes more than 4.5 hours to finish a unit. Yesterday my average time was going down. Secondly according to data I'm getting from Berkeley my WU per day is up to 69 but according to my own data it should be around 58. Yes you read that right - looks like I'm picking up units from somewhere else while other people are losing units. This is one of the biggest problems they've had to date. I'm thinking about shutting off all of my machines until they admit something is wrong and fix it.
I am also thinking of shutting down. I added another cruncher last week, that would have brought my daily up to 57+, but instead, my average is down to 48. It is very frustrating. I have submitted a bug report, hoping that someone at Berkeley will respond, or at least post in the Technical News. But I won't hold my breath for any response.
Would it help if whoever is currently in charge of our setiing efforts put together a formal petition that we could all put our names and email addresses to and then send it to the Berkeley people? If we had a whole bunch of names unified together instead of a few isolated ones that might carry more weight. BTW who is currently heading up our setiing efforts? Didn't it just change recently?
chawken
09-07-01, 06:04 PM
Originally posted by eobard
Would it help if whoever is currently in charge of our setiing efforts put together a formal petition that we could all put our names and email addresses to and then send it to the Berkeley people? If we had a whole bunch of names unified together instead of a few isolated ones that might carry more weight. BTW who is currently heading up our setiing efforts? Didn't it just change recently?
Other than submitting a bug report - I don't know any email addresses for anyone in charge at Berkeley. But your idea is a good one. If anyone else has a contact at Berkeley - as a team I think that we should send a group petition from the team.
It seemed to work temporarily for 'Trust No One', back in the May/June time period.
Morpheus
09-08-01, 12:22 AM
The following is the email addy for Eric Korpela (SETI scientist @ Berkeley)... Eric has been most helpful to me in the past, and always replies to my emails....
korpela@ellie.ssl.berkeley.edu
I would suggest NOT flooding him with mail...
Hope this helps (& that Eric is not too mad at me :rolleyes: )
Don't you just love it when you wake up, go to check your seti machines for their progress while you slept and find them all idling, unable to connect so they can transmit their last wu? Especially when your internet connection is still going rock solid but only getting to seti is impossible. :mad:
Thelemac
09-08-01, 06:23 PM
yeah, I had rather noticed that last night (I went to bed really, er late/early), though it didn't affect me with Driver...but it refused to transmit even when I had a good connection.
Now I'm getting a "Forbidden Access" error when I try to look at the Berkeley stats page. Hopefully that means they're in mid-repair. What's the smiley face thing for "fingers crossed"?
Originally posted by eobard
Now I'm getting a "Forbidden Access" error when I try to look at the Berkeley stats page. Hopefully that means they're in mid-repair. What's the smiley face thing for "fingers crossed"?
I'm getting Forbidden errors for EVERY page of SETI's site. Maybe that means they are doing a complete remodel? Dunno what Forbidden *actually* means, just that I can't look at the page. Any webmasters out there know exactly what "forbidden" means?
Haven't been paying close attention at all to my WU counts... When I upload this batch, I'll check my stats just before sending, and then just after. See if I lose some WUs....
JigPu
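On the "Forbidden" question: it is HTTP status code 403, which means the web server understood the request but is refusing to serve the page. It says nothing by itself about whether the site is being repaired or remodeled. As a small illustration, Python's standard library carries the official name and a short description for each status code; the helper name `explain_status` is just something made up for this sketch.

```python
from http import HTTPStatus


def explain_status(code: int) -> str:
    """Turn a numeric HTTP status code into its standard name and description."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}: {status.description}"


# 403 is the code behind the "Forbidden" page; 404 is shown for comparison.
print(explain_status(403))
print(explain_status(404))
```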
just logged on to SETI site with no probs...nothing looks different to me
I got a forbidden message once last night when trying to check my personal stats - but I hit reload and it came up fine.
Morpheus
09-10-01, 09:17 AM
Tried to update the stats this AM, & Tim (TC) came up with 13,768 WUs completed.... #1 place, with an avg of 249.5 WUs per day...
Been busy Tim? jk ... LOL
Needless to say the sheets are still down :(
Crunch on!!!
Originally posted by Morpheus
Tried to update the stats this AM, & Tim (TC) came up with 13,768 WUs completed.... #1 place, with an avg of 249.5 WUs per day...
Been busy Tim? jk ... LOL
Needless to say the sheets are still down :(
Crunch on!!!
Maybe because tim cray changed his name to tim so there are now 2 tims.
Morpheus
09-10-01, 11:19 AM
yep... that is true.. but, if you add them together you get 14,000+ WUs??... I dunno... Sounds most likely though...
Crunch 'em
I left the machine alone long enough to finish off the remaining 10% it still had to go and complete another whole unit and when I check on it what do I find? The system was unable to upload results!!!!:mad:
killem1x1
09-10-01, 03:53 PM
I made it in to my office today, and found that some of the guys had stopped Seti on one of my machines, imagine that, stopping seti to do work ;) Anyway, I watched as it tried to connect for a while. It took about two hours of trying before I was finally able to send in results, but at least it finally worked. I hope you are all able to get in soon :)