
Instability woes (can't diagnose this)


SolidxSnake

Member
Joined
Dec 26, 2004
Okay, I've been dealing with this one on and off for a while. See: http://www.overclockers.com/forums/showthread.php?t=686570

Quick summary of the above thread:
Since I built my new rig, things have been a bit flaky at times (rig in sig). I've been using the 60GB Vertex drives for a while now, and while I've had issues with them, it's predictable when they'll fail (one of them likes to fail when I stress it, either by moving a bunch of data all at once or by running a benchmark). However, I've never had the issues with them that I'm having now.

Anyhow, when I rebuilt my rig over the summer, I decided to swap the PSU fan around to exhaust instead of intake (for airflow consistency in the case). Rig was running fine for a week or two, then I got the GTX580. Popped that into my computer, worked fine for a bit. Then after a day or two, once the GPU had been loaded (FAH-GPU mainly) for a couple of minutes, the computer would start acting up. Windows would get finicky, programs would hang/crash, desktop would flash, etc. These are all things that happen when one of my drives drops from the array. Eventually I'd either have to hard-reset the rig, or it would BSOD with error code 0x000000F4 (CRITICAL_OBJECT_TERMINATION, which is often disk-related). Upon reboot, the one SSD which I know to be picky (the one that will drop under benchmarks) shows a status of "Error Occurred!" in the Intel RAID boot-ROM. However, it will boot fine (though I've had a couple of corrupted files here and there).

Anyhow, I thought the issue went away in the last thread when I swapped the PSU fan direction. Rig has been working completely fine for a month or two (since whenever that thread was "Solved") with no issue. I've played with the fans now and again to see if I can lower my temps a bit but haven't really done much else with the rig. A couple of nights ago, I had the same issue as before: 0xF4 BSODs under heavy GPU load (BF3 or FAH-GPU), drive dropping from the array. This started happening last night when I replaced my side-panel, which implies to me that this is a heat-related issue. However, with the side-panel off, my temps are the same (at least the monitored GPU core/mem/VRM temps as well as CPU temps).

All I can think of is the PSU is buckling under a heat load and it's manifesting itself as a problem in the SSD.

What I can't wrap my head around is why it's so random. It's not any hotter than it was a couple of days ago (if anything ambient temps are cooler) and I'm doing the same things, yet now I'm getting BSODs and corrupted data. It's been running FINE for a couple of months with no degradation, so it's not like the PSU isn't up to the task.

The rig was running fine last night with the side panel off, but with it on, it crashes. I'm doing some more testing right now. If it's the side-panel thing, it must be heat... I think?

Any ideas?

Update: As expected, popped the side-panel off and the rig is running fine. I think this narrows it down to the PSU getting too hot? I literally cannot think of anything else. Monitored temps (CPU/GPU) are hotter now than they were at the last crash this morning (with the side panel on), so I'm guessing the PSU is what's being affected by the heat.
 
Have you tested the rails for voltage stability? Do you run through a UPS or voltage filter to "purify" your input voltage? (Do you have "dirty" power?) Does the issue disappear when the CPU is at stock clocks?
 
Have you tested the rails for voltage stability? Do you run through a UPS or voltage filter to "purify" your input voltage? (Do you have "dirty" power?) Does the issue disappear when the CPU is at stock clocks?

Clocks have no effect. Not behind a UPS.

Rails, tested with a DMM, are within spec. Not sure about ripple, but as far as the meter can read all the rails are within +/- 5%.
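For anyone who wants to repeat the same check, here is a minimal sketch of the +/- 5% arithmetic. The nominal voltages are the standard positive ATX rails, the function names are just made up for illustration, and the readings in the example are hypothetical, not values measured on this rig. Like the DMM, it says nothing about ripple.

```python
# Nominal positive ATX rail voltages. The +/- 5% window is the figure quoted
# in the post above; the helper names here are illustrative only.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def rail_in_spec(nominal, measured, tolerance=TOLERANCE):
    """Return True if the measured voltage is within +/- tolerance of nominal."""
    return abs(measured - nominal) <= abs(nominal) * tolerance


def check_rails(readings, tolerance=TOLERANCE):
    """Map each rail name to (deviation in percent, within-spec flag)."""
    report = {}
    for name, measured in readings.items():
        nominal = NOMINAL_RAILS[name]
        deviation_pct = (measured - nominal) / nominal * 100.0
        in_spec = rail_in_spec(nominal, measured, tolerance)
        report[name] = (round(deviation_pct, 2), in_spec)
    return report


if __name__ == "__main__":
    # Hypothetical DMM readings, NOT taken from this thread.
    example = {"+12V": 11.85, "+5V": 5.08, "+3.3V": 3.28}
    for rail, (dev, ok) in check_rails(example).items():
        print(f"{rail}: {dev:+.2f}% {'OK' if ok else 'OUT OF SPEC'}")
```

Taking the readings once at idle and once with the GPU loaded would show whether a rail sags under load, though a static DMM reading can still look fine while the supply misbehaves in ways the meter can't see.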

And as for another update, the panel's been off and it crashed while I was afk (monitor wouldn't exit power-save mode after shaking the mouse, etc).

So I'm officially stumped again.
 
Are you able to grab a regular spindle HD and install your OS on it for testing? I'm wondering if your SSDs are causing you grief. If you have the time and the parts, take the SSDs out of the equation (no power or data connections) and install your OS on a spindle drive.

Give it a whirl, if you can, and let us know how it turns out.
 
Update:

Decided to pull my rig apart and do some small stuff (some of my electrical-tape-covered wire splicing was coming undone and it was getting dusty). Fixed up the wiring, changed fan directions (top, rear and front of case now intake, both rads exhaust, PSU intakes, removed grills from side panels), remounted CPU and GPU block, and dusted out the case/blocks/sinks/fans/rads.

Rig is back together, running at similar temps as before with FAH gpu/cpu clients running. Seemingly fine at the moment.

If it is fine, I'll be dissatisfied because I'll _still_ have no idea what's going on :bang head

edit: To respond to your suggestion, I have Ubuntu 10.10 installed on an HDD that is also connected (my rig has the two 60GB Vertex SSDs in RAID0, a 1TB Seagate for data and a 250GB Seagate doing pretty much nothing besides having Ubuntu on it), so if I still have issues after my teardown/rebuild I'll try booting my rig into Ubuntu.
 