
Cache problem driving me crazy!!


strophy

New Member
Joined
Dec 19, 2003
Heyas, I'm new here.

My new processor has been causing problems. The entire system is running at spec speed, and nothing is or ever has been overclocked. The system frequently dies, briefly flashing a BSOD before restarting. From what I can read in the brief time it is on the screen, the errors are identical to those experienced by the user in forum post http://forums.amd.com/index.php?showtopic=3376. Running memtest86 on the system results in memory errors throughout the entire range of addressable memory.

I thought this was a memory problem, and bought a stick of Infineon memory (Infineon chips on an Infineon module), only to find that exactly the same error would occur, regardless of what combination of memory I had installed (Inf/OEM/Inf&OEM). I then noticed that the errors only occurred in the tests which utilised the processor cache, particularly test 1. It generates about 150 errors in one pass of the memory! Strangely, the error is always identical: the value stored to memory is increased by exactly 0x00002000, e.g. 0x80808080 becomes 0x8080a080, or 0x00000000 becomes 0x00002000. I disabled the L1 cache in the BIOS, and all the problems vanished. Unfortunately, my system is now slower than a 386. My question is whether this is due to a faulty processor (which tends to run quite hot btw, 45-60°) or whether the motherboard or power supply is to blame.
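To double-check that both examples really point at the same single bit, the expected and actual values can be XORed together. This is only a minimal sketch in Python: the helper name `flipped_bits` is made up for illustration, and the two value pairs are just the ones quoted above, not a full memtest86 log.

```python
def flipped_bits(expected: int, actual: int) -> list[int]:
    """Return the positions of the bits that differ between two 32-bit words."""
    diff = (expected ^ actual) & 0xFFFFFFFF  # XOR leaves a 1 wherever the words disagree
    return [bit for bit in range(32) if (diff >> bit) & 1]


# The two error patterns quoted in the post (expected value, value read back).
pairs = [(0x80808080, 0x8080A080), (0x00000000, 0x00002000)]

for expected, actual in pairs:
    bits = flipped_bits(expected, actual)
    print(f"{expected:#010x} -> {actual:#010x}: differing bits {bits}")
```

Both pairs differ in bit 13 only (0x2000 = 2^13), which fits a single data bit reading back as 1 when it should be 0.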

I have of course tried swapping the memory sockets, swapping graphics cards (PCI & AGP), removing the processor and cleaning it and the heatsink with medical alcohol and reapplying the thermal compound, etc, etc.

Please let me know what you think about this so that I can resolve this issue ASAP. Thanks and hi from Germany, Leon

AMD AthlonXP 2400+ (128/256)
ASUS A7V266-EX (1011 BIOS)
NVIDIA GeForce 2 GTS 32MB (ELSA)
2x 512MB DDR266 CL2 RAM (Infineon/OEM Limbus)
Codegen 300XX Power Supply (350W)
Arctic Cooling Copper Silent 2 (Rev. 2)
 
check your L2 cache level on your CPU...I had this issue with a 1700...memtest86 gave me errors on the L1 cache...but when I looked at WCPUID it stated my L2 cache was only reading as 64KB, not 256 like it should have been...which in turn was putting too much stress on my L1 cache...which resulted in the L1 cache errors and a very unstable system...

And those Infineons are not very good at all...I won't ever waste my money on getting anything like them again...it's worth saving up for good memory...e.g. Kingston Hyper X...Corsair...OCZ...to name a few...
 
It's quite possible that you have a faulty processor. If it's retail, just contact AMD and see what you can do to get it replaced. If not, maybe it's time for a new CPU.
 
Still, I'm 99% confident it isn't a RAM problem. It's just too weird that it only occurs during the cache tests. What's the chance of having 2 brand new memory sticks being both faulty? As far as the BIOS goes, no, I have tried every BIOS released by ASUS for the board, a good 6 of them including the new beta2 version which has been out for a few months now and looks like it won't be developed into a final release :(

WCPUID reports everything as being totally normal, with L1&2 cache sizes and such. Is there a way to test ONLY the L1 or L2 cache, without accessing the normal memory? can anybody think of a way to distinguish if this is a processor or motherboard problem?

thanks for your speedy replies
Leon
 
sisoft sandra cache & memory test works fine, no crashes.... i tried the prime95 tests, the L2 cache test almost always fails, but without killing windows completely. I guess it depends on whether the kernel or a program gets the bad data. tests using more memory in prime95 run fine. could this show that it is definitely an L1/L2 cache problem, and not something with the motherboard?

i also tried underclocking to 100*15=1500MHz, this resulted in much more stable operation, but again proves nothing!

any other ideas or tips? has anybody ever had their cache DOA or go bad?

thanks,
leon
 
ok if sandra didn't crash on the cache test then the CPU's cache should be just fine.
What are your full load temps? Temperature may be causing instabilities, so that when the CPU is loaded in memtest it produces errors
 
good point. I was checking the temperatures, running prime95 it levels off at about 71°. idle is about 55°. after running memtest86 a quick look in the bios shows about 60°. these temperatures are with the motherboard flat on my bench, i.e. not inside a case or anything.

it depends a lot on the ambient temperature, which is normally a summery 30°. with the window open to the sub zero temperatures of wintertime germany, the cpu goes down to 40°. any opinions? the die is rated at 85°, so theoretically it shouldn't generate any errors under this temperature, right?

thanks again,
leon
 
It's rated for 85C before starting to actually fail (burn up), not to run stably. 71C is a very high temperature to run your chip at, OCed or not. What motherboard are you using? It could be just misreading the temps. I wouldn't advise actually touching the heatsink, as if it's really 70C it can burn your hand, but try to get your hand close to the heatsink while it's running and see if it's really that hot.
Also, what case do you have and what heatsink? If you have one of those cases where the PSU hangs just above the intake of the CPU fan, that could cause such high temps as well...
 
As I said, the motherboard is currently out of its case on my desk, and it is reaching these temperatures. I am using an Arctic Cooling Copper Silent 2 (Rev. 2) cooler, with the supplied thermal grease. The system was reaching similar temperatures before I had this cooler, just using the AMD stock heatsink, that came with the boxed processor. The system is currently running Prime95, and reaching 68°. The heatsink feels warm to the touch, but certainly doesn't burn my fingers.

The case is a Codegen E-6094. The power supply is also from Codegen, 350W. The case has an exhaust fan at the back in addition to the power supply exhaust, and I have installed an intake fan near the bottom front of the case. The power supply is certainly not in the way, and the airflow in the case should be pretty good when everything is installed there and I tie up the cables. Keep in mind this is a system running completely in spec, there should be no such problems occurring!


I am using MBM5 and the BIOS to read the temperatures. It is an ASUS A7V266-EX board. I originally thought the problem was due to the thermal limit jumper on the board, which cuts power to the processor when a high temperature is reached, but I think this triggers at about 85°, and changing the jumper has no effect: the tests still fail either way.

Hope this is still interesting for you, and thanks again,
Leon
 
It seems like you might've applied too much thermal paste to the core. Try reseating the heatsink with a VERY thin coat of thermal grease. It's only supposed to cover the CPU core, and the layer is supposed to be as thin as possible.
 
yeah, I know. I applied a layer so thin it was pretty much transparent, and upon removing the heatsink it looks perfectly even, there is no paste squishing out the sides or anything. When I reapply the heatsink, I always completely clean the die and heatsink with medical alcohol and reapply new grease. I've been doing this sorta thing for a while, which is why I'm very puzzled that it is running so hot and so unreliably. Keep in mind that while idle in a colder ambient environment it comes down to about 40-50°, but the errors still occur..

cheers,
leon
 