
lost hope, about to scrap parts. any last ideas?


BinaryPirate

Registered
Joined
Dec 23, 2004
Location
Sandy, UT
First off, I must say that I'm quickly losing hope. Let me give you all some background. My specs are in the sig. Everything has been running perfectly stable and cool for a year now at those speeds. (Mind you I keep stock speeds when I'm not gaming.) Anyhow, in the past year, I've had data corruption on all my hard drives 3 times, with the most recent occurrence being a little over a week ago. The first two times, the 4 HDs were in RAID 0 with an old Silicon Image 640 PCI controller. I lost about 80% of the data each time. After being sick of losing that much not once, but twice, I took the drives out of RAID to run them as single drives. I even bought a new Highpoint Rocketraid133 controller card because I didn't trust my SiI 640 card with its notoriously bad reputation.

While all of this data loss is most annoying, there's more to the story. A few days ago while playing Doom, I paused to plug in a speed control knob to my cpu fan. It seemed pretty routine. I was careful not to short anything as I did it, but as I plugged it in, all of the LED fans dimmed and slowed for a moment, and then the entire computer shut off completely.

At first, I thought I had bumped the heatsink on the cpu (though if I had, it was not hard at all. And the heatsink still appeared to be seated fine.) But to be safe, I reseated the HS/F and reapplied thermal paste before I turned the comp back on. I immediately checked temps and voltages, all of which were normal. However, after I rebooted and attempted to load windows, I started getting random BSoDs each time I restarted.

IRQL_NOT_LESS_OR_EQUAL
DRIVER_IRQL_NOT_LESS_OR_EQUAL
PAGE_FAULT_IN_NONPAGED_AREA
BAD_POOL_HEADER
PNP_DETECTED_FATAL_ERROR
DRIVER_CORRUPTED_MMPOOL (According to MS support this stop error is "Disastrous" and caused by corrupt memory, but they do not specify system memory, BIOS, HD, or any sort of cache.)

I also received BSoDs telling me to run chkdsk /f, telling me that disk.sys and cdrom.sys were corrupt.

Most recently, I can no longer even boot far enough to get the BSoDs, because Windows stops with the error that \windows\system32\config\system is missing or corrupt.

These BSoDs would seem to indicate that it is a driver issue, but the fact that they are getting progressively worse makes me think not. I had not modified any drivers at the time. I have been thinking it is most likely either the mobo or psu. The mobo might not be handling IRQ assignments properly, or the psu voltage could be fluctuating. Here's what I've found.

I've swapped every part except the mobo (including the vid card, both sticks of ram, the cpu, the HDs and the optical drives.) Of those parts, all were fine except the PSU. When I swapped out the Antec 480W with another cheap Rhycon 550W, the BSoDs went away and Windows loaded. I wonder if I'm putting too much load on the PSU or if it's just bad. I haven't yet successfully gotten the computer to boot on another mobo to verify if that is also an issue. The spare mobo doesn't seem to be booting from the HDs connected to the Highpoint IDE controller, yet I normally use another Promise TX2 controller in that spare mobo, so I am familiar with the required settings. I have tried both controllers.

If it is indeed a psu issue, I am not convinced that is the only problem. Why would I only get errors as Windows boots? Why not sooner? For that reason, I'm more inclined to think that it is a mobo IRQ handling issue because the mobo handles the IRQ assignments as Windows boots. (And fluctuating power from PSU could also be an issue there. Yet remember I found all the PSU voltages to be normal.) There are too many inconsistencies. I'm lost as to what to do from here. To make matters worse, since that \windows\system32\config\system file is corrupt, I can no longer get far enough in the Windows loading process to see if the BSoDs have been corrected as I swap parts.

I tried booting with another hard drive that has Windows XP installed and configured for a VIA chipset instead of the nforce2 chipset, but got another chkdsk /f BSoD. That error may be expected. I don't know. I also tried booting to the recovery console from an XP CD and got the same BSoDs. That makes me think the errors were NOT HD corruption, at least originally. As for now, there is definitely corruption.

The problems I experienced the last 2 times I had data corruption were very similar to this: similar BSoD messages and chkdsk errors. Formatting did fix the issue, but this time seems different. Why does it keep happening? How can I pinpoint the source of intermittent issues? I'm ready to toss the HDs, mobo, psu and buy new ones, though I really can't afford to do that. If anyone has any ideas, I'd desperately like to hear them. Much thanks for taking your time.
 
I am so lost as to what kind of method you are using to troubleshoot. To troubleshoot, you try one thing at a time, not 20. Did you run memtest86 on your ram modules? You said it booted when you switched PSUs, so that should mean your problem was found, but from your post you kept trying stuff. I am just trying to get a handle on everything you did. If you could, please list what you did and when, in a numbered list with a description of the outcome for each thing.


Did you mess with power connectors while the computer was on?
 
IRQL_NOT_LESS_OR_EQUAL
DRIVER_IRQL_NOT_LESS_OR_EQUAL
PAGE_FAULT_IN_NONPAGED_AREA
BAD_POOL_HEADER
PNP_DETECTED_FATAL_ERROR
I've had intermittent random BSODs of this sort. The cause? Bad memory. I suggest that before you go and buy new PSUs or mobos, you try booting your computer with a different stick of memory.

Actually, your story really sounds like a memory issue.
 
I've been trying one part at a time. First trying other parts in my comp, then trying my parts in other comps. Yes, I have run memtest86, and it has run about 2 hours without trouble since this happened. Before all this happened memtest ran fine 24+ hours, so I'm pretty sure the memory is fine. While swapping parts, the psu was the only part that seemed to make a difference between BSoDs and no BSoDs. Yes, the comp was on when I plugged the fan speed knob in. :( It was a Thermaltake 80mm Smart Case Fan- model a2016 I believe. Without anything connected, it runs full speed. I wanted to slow it down a bit to get the noise down.

3line- I've tried a spare 256MB stick of ram and it seems to make no difference. I do appreciate the suggestion though.
 
BinaryPirate said:
I've swapped every part except the mobo (including the vid card, both sticks of ram, the cpu, the HDs and the optical drives.) Of those parts, all were fine except the PSU. When I swapped out the Antec 480W with another cheap Rhycon 550W, the BSoDs went away and Windows loaded. I wonder if I'm putting too much load on the PSU or if it's just bad. I haven't yet successfully gotten the computer to boot on another mobo to verify if that is also an issue.

If it is indeed a psu issue, I am not convinced that is the only problem. Why would I only get errors as Windows boots? Why not sooner? For that reason, I'm more inclined to think that it is a mobo IRQ handling issue because the mobo handles the IRQ assignments as Windows boots. (And fluctuating power from PSU could also be an issue there. Yet remember I found all the PSU voltages to be normal.) There are too many inconsistencies. I'm lost as to what to do from here. To make matters worse, since that \windows\system32\config\system file is corrupt, I can no longer get far enough in the Windows loading process to see if the BSoDs have been corrected as I swap parts.

Okay, it sounds like a psu problem, and to get around windows get knoppix and burn it as a boot disc onto a CD. This is a great way to troubleshoot (haven't done it myself). When you say your rails are normal, is this a software check or a multimeter check? Because the software isn't too accurate. The psu may have helped corrupt your data which is why it isn't getting to windows. I'd try the psu first because your motherboard at least gets to windows so your bios is okay and the mobo isn't fried.
Hope this helps.

Edit: Welcome to the forums! :welcome:
 
The voltage was checked from both the bios and multimeter. Both read the same values of 4.95V and 12.02V.
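
For reference, those two readings can be compared against the usual ATX tolerance. The sketch below is only an illustration: it assumes the commonly cited ±5% window on the +5V and +12V rails, the helper names are made up for this example, and a static reading like this says nothing about ripple or sag under load.

```python
# Assumption: +5V and +12V rails are allowed to deviate by 5% from nominal.
NOMINAL_RAILS = {"+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rail_deviation(nominal, measured):
    """Return the signed deviation from nominal as a fraction (0.01 == 1%)."""
    return (measured - nominal) / nominal


def within_tolerance(nominal, measured, tolerance=TOLERANCE):
    """True if the measured voltage is inside the allowed window."""
    return abs(rail_deviation(nominal, measured)) <= tolerance


# Readings reported in this thread (BIOS and multimeter agreed).
readings = {"+5V": 4.95, "+12V": 12.02}

for rail, measured in readings.items():
    nominal = NOMINAL_RAILS[rail]
    status = "OK" if within_tolerance(nominal, measured) else "OUT OF SPEC"
    print(f"{rail}: {measured:.2f} V ({rail_deviation(nominal, measured):+.2%}) {status}")
```

Both readings pass a check like this, so the rails look normal at idle. It would not catch a supply that only dips briefly, for example when several drives spin up at once.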

Update: Within the last few minutes, I've heard one or more HDs spin down, then back up again. I've tried jiggling wires to reproduce a poor connection, but there doesn't seem to be one. Everything seems secure. One stick of memory is now getting errors in memtest that it didn't get yesterday. I'm going to run memtest longer to test both sticks, in both dual channel and single channel modes. I will report back with results. Though I'm almost afraid to keep testing the stuff since more problems keep arising.... I'd hate to be risking good parts, even though I'm not sure which are good anymore. Anyway, I'm off to test more.
 
Hey, no need to yell. I should have been more clear. The computer was off. I made sure everything was connected securely. Now, either there are several loose connections (though they aren't obvious from a visual check or a continuity test with the multimeter) or the PSU is really dying, because now 2 of the HDs aren't detected, nor is the floppy.




And now, a minute later WITHOUT TOUCHING ANYTHING, the floppy works, but the HDs still don't.
 
Are you testing the ram sticks one at a time? You know what, reread my first post and try to do the list. Because until you start to detail what is wrong and what you are doing, it will be close to impossible for us to help you.
 
i'd like to second my own (non-expert of course) opinion that it's bad memory,
or not enough voltage to one/many of the sticks.
i have had those errors before, and it took me upping some voltage and id'ing the bad stick, then reformatting and reinstalling to fix it.

alternatively, i've also seen them when OC'ing too high without enough voltage.
run everything at stock speeds when trying to find the culprit, run on a spare/throwaway hard drive.. and start with memtest and the memory. If that's not it, also try prime95 to see if it's the cpu

good luck!
 
@Spade,

HD clicking doesn't sound like bad mem to me. You could be right though. In fact, it could be that your PSU is slowly dying and taking a few things with it as it goes (like the HDs and mem).

Good luck bro.
 
lets make some sense of this-

a new PSU might help, but it also might not. you have some signs of inadequate PSU problems, but overclocking can make this much worse. also, it almost sounds like the AC power connection is loose/sloppy. check the outlet and house wiring, plug ends & cord, AC power cable, etc... is there a lot of slop in the molex connectors or are they tight?
it also sounds like the overclocking is not serving you in the way you think it should, but-

you continue to overclock:

"Everything has been running perfectly stable and cool for a year now at those speeds
(Mind you I keep stock speeds when I'm not gaming.) Anyhow, in the past year, I've had data corruption on all my hard drives 3 times"

you may think you are cool and stable, but obviously not.

data corruption and Bsods mean your O/C is spanking you.
i speak from experience. i fubarred a HD AND corrupted the data 3 weeks ago. now the drive was old and it could have been coincidence that it failed when i corrupted the drive, but maybe not.

"These BSoDs would seem to indicate that it is a driver issue,"

not if you overclock beyond stable. classic signs you pushed it too far.
reformats and reinstalls would fix that if it were drivers- unless you have some funky non-native things going on that XP doesn't like with those controllers...

Backing WAY off the O/C would be a good place to start...

but at the same time IF all this is caused by an overly aggressive O/C, then you most likely permanently damaged your hardware, and most, if not all of it at that.... you could even underclock and find no relief, if that's the case. try running at stock or below after your next reformat and reinstall, and consider it semi-permanent. if you still have these problems, your hardware is shot.

sometimes (rarely) a new bios chip, flashed to the most recent version, fixes everything. most of the time reflashing an old chip will not. a shot in the dark, but it might work.

you are locking to 33.3/66.6, right?
that's the "duh" one right there...

best of luck.
 
Kendan-

I spent yesterday and today testing the memory. I have tested each stick by itself, one stick for 7 hours and one stick for 6 hours on memtest86. Memtest reported no errors, however, on the stick that ran for 7 hours, memtest froze. I have also tested the memory together in single and dual channel modes. When both sticks are tested together, memtest still reports no errors, but instead freezes on test #5 every time. All of these tests were performed while using the suspected bad 480W Antec. I have now switched to a 550W Rhycon psu that I know works well and am retesting both sticks in dual channel mode. So far, I have been testing 20 mins. and am on test #7, a test I never got to in dual channel mode with the Antec. I don't plan on using the Antec ever again. I'll call Antec tomorrow and see what they say.

As I was sitting here running the tests last night, four times over the course of several hours I heard one or more HDs powering down, and a few seconds later, powering up again. No cables were touched when it happened.
 
Spade-

Update on last post: memtest 3.2 has frozen after 27 min. while testing both sticks in dual channel mode. It made it past the test #5 it normally got stuck on to 0% on test #8. According to memtest website, Tests 5 and 8 seem to catch the majority of the problems. I'm going to continue testing each stick individually to see if it makes a difference. I expect it will because yesterday when the sticks were tested individually, they tested 6-7 hours and the stick running 6 hours could have run longer.

What could the problem be (besides the psu of course since that's now out of the picture) if the sticks fail much sooner when together than when separately? I think mobo. And then there's the fact that they are failing at all when they didn't before...

Spade AND orionlion82-

The OC I had pre-disaster was perfectly stable. I understand that I said I clocked back to stock when not gaming in an attempt to extend the life of the parts, but I actually kept it running at 2.5GHz (200x12.5) about 75% of the time. Pre-disaster, my comp was prime95 and memtest86 stable for 24+ hours, peaking at 47C but usually sitting at 46C the entire time. The cpu was stress-tested with vcore of 1.775V and the ram was stress-tested while at vdimm of 2.6V. I hope that lays to rest your fears that I was going too far on the OC, but if you still think I am taking the OC too far, please share.

btw, everything is at stock now while testing.

MonroeM-

I completely agree with you. Now I'm trying to find what else isn't working like it used to. The mobo I think is failing too.

orionlion82-

Everything is wired very neatly in my case. All the connections make good contact. The wires all have enough slack so that they don't pull to one side or the other or anything like that.

I don't believe the data corruption and bsods are from the OC. I believe them to be psu related. I'm not ruling out the OC as the cause, but from everything I know, I don't believe that is the case.

As for, "These BSoDs would seem to indicate that it is a driver issue," I agree that could definitely be an OC related issue, though everything was perfectly stable before. And as for a new bios chip and reflashing it, I've done that a few times on another DFI Infinity I had, but I think that if it comes to that, I'll buy a new mobo. And yes, I had locked the pci/agp bus at 33/66.

I should also say that the qfan mobo feature no longer seems to work. Another sign of mobo failure.

Thank you everyone for all your support.
 
I'd like to add something. Just because it's prime stable does not mean it's stable. I have run prime for 2 days straight with no errors, only to get a BSoD the next day.

Anywho, I believe it was mentioned above that it could be house wiring? I have a friend who has lost 2 entire PCs to faulty house wiring. I suggest you invest in a UPS; even if it doesn't fix your problem, it's well worth having one.
 
A single bsod? Can you be sure it was OC related?

I do have a ups, though I wasn't using it at the time. I was using an ordinary surge protector. But it is a good idea. I'll hook it up. Thanks.
 
re-reading the entire thread, i want to say you've got things shorting, but since your wiring is tidy, i'm stumped.
 
My last post mentioned I had just tested both sticks in DC mode and memtest froze after 27 mins. Well, I just tested 1 stick and memtest also froze on test 8, not completing a single pass. I tested at 2.6V as I have been this whole time and again at 2.8V. No difference. It still froze on test #8. I'm now testing the other stick at 2.6V and will test again at 2.8V. Keep in mind memtest is not actually listing errors, it's just freezing. It's a bad thing nonetheless. This is disconcerting, since yesterday both sticks ran at least 6 hours individually.


It's sounding like psu, mobo and mem are bad, but only further testing can say for sure.
 
I finished testing the second stick of ram, both at 2.6V and 2.8V. It failed both times- after 26 mins at 2.6V and after 39 mins at 2.8V. Yesterday, both sticks ran 6+ hours. So either the sticks are now worse off than before, or the mobo is worse off than before, or both. I'll now have to find another 400MHz fsb mobo to test ram in to see if it's still at all good.

Evidence mobo is bad:

qfan control no longer works
2 sticks of ram fail sooner than individual sticks


I'll post back, but in the meantime, am I missing anything? Do you think this is enough evidence to order a new mobo (regardless of the outcome of the next memory test)? I'd probably get an Abit NF7-S ver. 2
 