
Help me with Intel ICH8 SATA CRC fails


Psycogeec

Member
Joined
Sep 15, 2006
First the problem:
I transfer terabytes of video at a time (files of 50 MB to 4 GB) to a different drive for my professional video work.
I run a binary CRC check on the transferred files, and occasionally a file turns out not to have copied correctly to the other hard drive.
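
For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It is not the tool I used. The names `file_crc32` and `copy_is_intact` are made up for illustration, and it assumes the source and the copy are both readable as ordinary files. Note that a copy read back right after writing may be served from the OS file cache rather than from the disk itself.

```python
import zlib

def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in 1 MiB chunks
    so multi-gigabyte videos never have to fit in memory."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running CRC back in so the result covers the whole file.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def copy_is_intact(src_path, dst_path):
    """True when source and copy have the same CRC-32 (hypothetical helper)."""
    return file_crc32(src_path) == file_crc32(dst_path)
```

Run `copy_is_intact(source, copy)` on each transferred file and keep a list of the ones that return `False` for closer inspection.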

Files with inaccurate bits average <0.3% relative to data quantity (the bigger a file, the more likely some bits will be wrong).
If I were to put a number on it: for each terabyte of disk-to-disk movement, there are 10-15 tiny changes to the data.
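
To put that in per-bit terms, here is a rough back-of-the-envelope sketch. It assumes a decimal terabyte (10**12 bytes) and that each "tiny change" is a single flipped bit. Neither assumption is confirmed, so treat the result as an order of magnitude only.

```python
BITS_PER_TERABYTE = 8 * 10**12  # assumption: decimal terabyte, 10**12 bytes

def bit_error_rate(changes_per_tb):
    """Changed bits per bit moved, assuming one flipped bit per change."""
    return changes_per_tb / BITS_PER_TERABYTE

for changes in (10, 15):
    print(f"{changes} changes/TB -> {bit_error_rate(changes):.3e} errors per bit")
```

Under those assumptions the rate comes out on the order of 1e-12 errors per bit moved.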

Comparing the files shows that a few items within a file have changed, but basically speaking the files are still 100% usable. If it were not for the CRC check, I never would have known.
The compare can be repeated on just the (few) failed files, which shows whether a read error occurred or whether the data was written to the drive that way.
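
The repeat-compare idea can be sketched like this in Python. It is only an illustration: `differing_offsets` and `classify_mismatch` are hypothetical names, not part of any tool I used. If the same offsets differ on every pass, the bad data is most likely stored on the disk. If the offsets change or disappear between passes, the reads themselves are suspect. Repeat passes may be answered from the OS file cache, so on a real system the cache would have to be defeated (for example by reading much more data than there is RAM in between).

```python
def differing_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offsets at which the two files differ."""
    offsets = []
    pos = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                n = min(len(a), len(b))
                offsets.extend(pos + i for i in range(n) if a[i] != b[i])
                if len(a) != len(b):
                    # One file is shorter: record where it ends and stop.
                    offsets.append(pos + n)
                    break
            pos += len(a)
    return offsets

def classify_mismatch(path_a, path_b, passes=2):
    """Compare several times and say what kind of mismatch it looks like."""
    runs = [differing_offsets(path_a, path_b) for _ in range(passes)]
    if not any(runs):
        return "match"    # no pass found a difference
    if all(run == runs[0] for run in runs):
        return "stored"   # same offsets every pass: data differs on disk
    return "read"         # offsets vary between passes: unstable reads
```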

By process of elimination I have traced the problem mostly to "things going through the Intel controller" (another controller doesn't have the issue, and I proved in other ways that the hard drives are bit accurate).

I have tried to keep this post short. I have experimented for days and days, and the full story would read more like a book, but if you need to know more I will certainly write the book :)

Question:
I want to know the likely cause of a few tiny bits not making it from one hard drive to the other via the Intel ICH8. More bits fail the faster and harder I push it.

Machine Specs:
Board: P5B-Deluxe (no wifi)
Chipset: Intel 965, ICH8 SATA controller
CPU: 9550 Duo Quad, OC to 3650
RAM: Normal OCZ 800-type, running ~830, SPD timing
FSB: 1720, multiplier 8.5, CPU voltage ~1.2 V
Passes Prime95 ram based and cpu based torture tests
OS: XP pro SP1.5

What's hot and what's not:
High temps during these tasks only on the 965 chip (and on the CPU when pushed, which it isn't)
The ICH8 chip has an extra tall heat sink on it and does not seem to be overly hot
RAM chips get hot, but not unusually so
Case airflow is lower (at the moment); I have the case open now
Voltages are all stable.

Drive Configuration:
Drives are configured for RAID0 (at this moment), but the configuration does not seem to be the actual issue, because I can still get the bit fails (fewer of them) when blazing enough data from single drives.
Drives are HD greenies, but I have not had "RAID" issues with spindown and error fix stuff. S.M.A.R.T. shows there have (basically) never been any bad sectors on them, so that aspect of RAIDing greens has not come up yet.
Billions of bits of data are used on this computer daily, and without the CRC check you would never know that anything was going wrong. The system has been on green RAID for years.

The situation is worst when going from one 2-drive RAID0 to another 2-drive RAID0, moving data at 89-130 MB/s. Having two 2-drive RAID0 arrays is new for this computer, so the HD data rate is much higher, along with more CRC fails.

The ICH8 chip does not seem overly hot, although it gets hot when lots of stuff is going through it.
In the BIOS the ICH8 chip voltage can be reduced or raised. I was previously running it reduced for cool/quiet, but I put the voltage back to "normal" and I am pretty sure it still has the problem.

With terabytes of data that work 100% and even CRC check 99.9% correct, and with massive file sizes, the TIME it takes to test the issue is still days and days.
 
Duhh, why didn't I notice that too.

It is changing (a few) bits (very rarely) when reading TOO,
and the weird thing is I have 3 compares going at one time:

1) One compare is slow, going from a single drive to the 2-drive RAID0: NO FAILS

2) Going from RAID0 to the OTHER RAID0 AT THE SAME TIME, the faster one is getting (a few tiny) read fails.

3) One compare is slow, going from a single drive to the OTHER 2-drive RAID0: NO FAILS

The single drive is on the other controller.
Compares 1 and 3 show that the data is the same, be it on one RAID, the other RAID, or the single, but 2 is going RAID to RAID and still failing (rarely).
The strain on the chip should be the same, since all 3 compares are running at once???


hmmmm, there must be a clue there.

If it is a heat issue, why isn't the slower one having the same problem?
If it is a RAM issue, why isn't the slower one having the same problem?
If it were a drive issue, why was the drive not having a problem before when used as a single? How can the data be the same on the other compare?
If it were a RAID issue, why does it only happen on the one going from RAID to RAID, on the same drives that the others are checking?

I am straining my brain and the answer must be right in front of my face, but I can't see it. It's impossible!

Picture this

S1 --read--> Compare <--read-- R2 working

S1 --read--> Compare <--read-- R1 working

R1 --read--> Compare <--read-- R2 Failing (rare)

All 3 are running at the same time, all basically going via the controller, cache, RAM and so on to the OS, into 3 instances of the same compare program.
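
As a rough illustration of that setup (not the compare program I actually ran), here is a Python sketch that runs several byte-for-byte compares at once on threads. The names `files_match` and `run_compares` and the labels are hypothetical. On the real system the paths would point at the single drive and the two RAID0 volumes.

```python
from concurrent.futures import ThreadPoolExecutor

def files_match(path_a, path_b, chunk_size=1 << 20):
    """Byte-for-byte compare of two files, read in chunks."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files ended at the same point
                return True

def run_compares(pairs):
    """Run all compares concurrently.

    pairs maps a label to a (path_a, path_b) tuple; the result maps
    each label to True (match) or False (mismatch).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        futures = {label: pool.submit(files_match, a, b)
                   for label, (a, b) in pairs.items()}
        return {label: fut.result() for label, fut in futures.items()}
```

Calling `run_compares` with one entry each for S1 vs R2, S1 vs R1, and R1 vs R2 mirrors the three compares in the picture, so all three load the system at the same time.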
 
Memory.
Although the memory tested OK in memtest, it was the only thing left that it could be.
So where is the controller's memory? (I thought)

The system memory tested "OK" in memtest, but had I provided enough leeway for different uses of the RAM that might still not work?
The RAM worked for Photoshop, and encoding, and everything else, but the controller's use could be much faster or something.
SO
I raised the overclock until the system RAM failed in memtest, then raised the memory voltage until it passed (2.1 V) at that higher overclock. Then I lowered the overclock back down, retested with memtest, and then tested in the system.
And it is now working 100%.
 