Memtest ok on pass 1-5, 6 start errors but then no more. What does it mean?


silkshadow
08-21-09, 10:30 AM
I have a machine built with spare parts that I sold to a friend (for the price of a new Q6600, so it was more like a gift) about 4 months ago. Specs:

Gigabyte P35-DS4 (BIOS up to date)
Intel Q6600
2GB Crucial Tracer DDR2 (2 x 1GB sticks)
Raptor 36GB
2 x 1TB Western Digital HDs in mirror RAID
Enermax Infinity 720W
Nvidia 8800 GT 512MB
Vista Home 32-bit

She was getting BSODs, a lot of them, and brought it to me to see if I could fix it. The computer ran fine for 4 months until the last 2 weeks, when she started getting BSODs. She tells me that at first they were far apart, but last week they became more frequent, and on Tuesday it BSOD'd 4 times in 3 hours, which is when she brought it to me.

The machine boots no problem and it doesn't BSOD on any reproducible action. It can BSOD when sitting idle for a few hours (one time I left it sitting for 6 hours before it BSOD) or when browsing, or when copying a file or anything. She told me she smelt something burning before she started having this problem, but she smokes, and always thinks she smells something burning.

The mini dumps are a mixed bag of KERNEL_MODE_EXCEPTION_NOT_HANDLED and MEMORY_MANAGEMENT with one INTERNAL_POWER_ERROR. The majority of them are MEMORY_MANAGEMENT errors.
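
Tallying the bugcheck names across the minidumps makes the "majority" observation concrete. A minimal Python sketch follows. The three names are the ones listed above, but the list of dumps and its counts are invented for illustration and are not taken from the actual machine.

```python
from collections import Counter

# Hypothetical list of bugcheck names, one per minidump. The names come
# from the post; the counts are made up for illustration.
bugchecks = [
    "MEMORY_MANAGEMENT",
    "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    "MEMORY_MANAGEMENT",
    "INTERNAL_POWER_ERROR",
    "MEMORY_MANAGEMENT",
    "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
]

# Count how often each bugcheck name appears and pick the most frequent.
tally = Counter(bugchecks)
most_common_name, most_common_count = tally.most_common(1)[0]

# Print the names from most to least frequent.
for name, count in tally.most_common():
    print(f"{count:2d}  {name}")
```

A tally like this only shows which bugcheck dominates; it does not by itself say whether the RAM, the PSU, or something else is at fault.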

I just got around to running memtest on it tonight. Memtest was clean of errors for the first 5 passes. I was checking it periodically and there were no errors. Then I went out. When I came back it was on pass 9 and there were 29696 errors. However, now it is on pass 14 and there are still 29696 errors. So somewhere there were errors, but then they stopped.

I've been googling and there are no real answers. Some posts point to the RAM and some point to the PSU. I know something is wrong, but is it the RAM or the PSU? I don't want to spend too much more time on this, and I have no spare DDR2 RAM or a powerful enough PSU to swap in.

So I just want to buy a replacement of the defective part. However, I am not sure which it is. Both the ram and the PSU are about a year and a half old. If it were you which part would you suspect of being defective?

New ram would save me so much time as I can swap that out in a couple minutes. The PSU is all wire managed and stuff, that will be a PIA to fix.

Thanks!

KillrBuckeye
08-21-09, 10:40 AM
Based on the symptoms you describe, I strongly suspect the memory. If you're getting any Memtest errors, then it's further evidence of a faulty memory module. Memory errors can be quite random, so don't read much into the 5 error-free passes. Also, I've had bad memory modules that wouldn't throw any errors in Memtest--I had to use Prime95/Orthos to find the bad sticks.

Trap05
08-21-09, 10:58 AM
It's Crucial Ballistix Tracers... those are famous for being bad. If you are getting memtest errors, you've found the problem. Try one stick at a time; maybe only one is bad.
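
The one-stick-at-a-time suggestion amounts to a simple isolation loop. A minimal Python sketch follows. `isolate_bad_sticks` and the `passes_memtest` callback are hypothetical names for illustration (in practice the "callback" is you booting Memtest with a single module installed), and the stick labels are made up.

```python
from typing import Callable, Iterable, List


def isolate_bad_sticks(
    sticks: Iterable[str],
    passes_memtest: Callable[[str], bool],
) -> List[str]:
    """Test each module alone and return the ones that fail.

    `passes_memtest` is a hypothetical stand-in for installing a single
    stick, running Memtest for several passes, and noting whether any
    errors appeared.
    """
    bad = []
    for stick in sticks:
        # Only one module is installed per run, so any error is
        # attributable to that stick (or to the slot it sits in).
        if not passes_memtest(stick):
            bad.append(stick)
    return bad


if __name__ == "__main__":
    # Made-up results for two sticks: "A" is clean, "B" throws errors.
    observed = {"A": True, "B": False}
    print(isolate_bad_sticks(observed, lambda s: observed[s]))
```

If every stick passes on its own, repeating the loop in a different slot can help separate a bad module from a bad slot. Intermittent faults may need many passes per stick before they show up.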

18 is # 1
08-21-09, 11:57 AM
Got some more DDR2 you can throw in there?

silkshadow
08-21-09, 11:59 AM
Thanks guys! I had no clue the Tracers had a bad rep, I guess I should've done more research. :blush:

I'm on memtest pass 21 now and still no new errors. It's the darnedest thing. So you guys think the chances of it being a PSU issue are pretty small, right? It's got to be the RAM?

Thanks again!

Edit: Oops posting at the same time as you 18 is # 1. Unfortunately no more DDR2s. I got DDR and DDR3s. This machine was built from the last of my 775/DDR2 stuff. I did an i7 change over about 4 months ago for my power machines and the HTPCs and such are still back on AMD64s and DDR.

Trap05
08-21-09, 12:05 PM
To give you an example, I had a customer's PC come across my bench on Monday with 3 of 4 sticks bad. One stick wouldn't even POST... the rest errored on memtest within 30 seconds.

redduc900
08-21-09, 12:11 PM
When Memtest fails it doesn't necessarily mean you have bad RAM. Any number of things can cause it to fail, including incorrectly set Vdimm, RAM timings (including sub-timings), Vcore (Vcc / Core Voltage), vNB (vMCH or vSPP), or tRD (Performance Level).

18 is # 1
08-21-09, 12:26 PM
redduc900 wrote: "When Memtest fails it doesn't necessarily mean you have bad RAM. Any number of things can cause it to fail, including incorrectly set Vdimm, RAM timings (including sub-timings), Vcore (Vcc / Core Voltage), vNB (vMCH or vSPP), or tRD (Performance Level)."

Which is the best reason to run everything stock when testing.
You can always RMA those Tracers. Crucial has good customer service.

madhatter256
08-21-09, 12:42 PM
redduc900 wrote: "When Memtest fails it doesn't necessarily mean you have bad RAM. Any number of things can cause it to fail, including incorrectly set Vdimm, RAM timings (including sub-timings), Vcore (Vcc / Core Voltage), vNB (vMCH or vSPP), or tRD (Performance Level)."

Ditto. Sometimes the motherboard is what might be bad and causing the memory to fail. If not the motherboard, then maybe the PSU. I've run into systems where the PSU was the cause of the BSODs, non-working USB ports, etc.

KillrBuckeye
08-21-09, 12:52 PM
I agree that there are a multitude of possible causes for BSODs and Memtest errors, and that everything should be returned to stock settings when troubleshooting. However, in my experience when a rig has been stable at certain settings for quite some time (as was stated in OP) and random BSODs start occurring and continue even when stock settings are applied, memory is the most likely cause and should be the first component to be replaced if the bad component cannot be identified with any certainty.

silkshadow
08-21-09, 11:28 PM
Thanks guys! The machine is on stock settings, no OCs, and the memory timings are what Crucial states they should be for this RAM.

Thanks KillrBuckeye, you are right, the machine has been running great for months and before that, even though there was a different HD and case config, the mobo, ram and CPU were together in 1 rig.

I will pick up some new DDR2 sticks tomorrow and run memtest again. I left memtest running all night and it was on pass 48 this morning and there were no new errors. It still showed 29696 errors. So strange.

Thanks again!

Daddyjaxx
08-22-09, 06:22 AM
I had some Ballistix until I started getting BSOD's. Ran MEMTEST and it had like 50k errors. Crucial admitted there was an issue with the memory when I requested an RMA. Decided to go I7 and just pitched them. That's the only memory I have ever had die on me and I always loved Crucial.

silkshadow
08-23-09, 02:39 PM
Got 2 new sticks of Corsair RAM (2x2GB), just finished the 20th pass with memtest and no errors :D. Just restoring an image of the OS and I'll let it run overnight and see if it BSODs, but so far so good!

Thanks again for all the help! Last time I'm buying Crucial though, that's for sure.

Trap05
08-23-09, 02:44 PM
Glad we were right

silkshadow
09-01-09, 12:58 PM
Just a post mortem on this. It was the RAM :D. My friend says the system is running great.

No more Crucial for me. They used to be the gold standard in reliable RAM. Guess nothing lasts forever.

RJARRRPCGP
09-01-09, 06:42 PM
You may have overheating problems; at least one of the symptoms sounds like what I saw when I was OC'ing a fair amount and the temps got too high. It's more likely if you're overvolting, especially the Vcore.
That can cause sudden errors if your intake starts taking in warm air.

And if you're at stock, it could be the consequence of using more VDIMM than the JEDEC spec.