
LSI MegaRAID 9260-8i SSD VD question


mokrunka

Hey guys. I have the LSI 9260-8i MegaRAID card in my system. I have 3 virtual disks:

RAID 0, 2x 2TB Samsung drives
RAID 1, 2x 2TB Samsung drives
RAID 0, 4x 256 GB Kingston HyperX SSDs (boot drive)

For the most part, everything is working great. However, every 1-2 weeks (varies in frequency), I have one of my SSDs drop out of the array, and my system locks up. It takes a number of reboots, and then the drive will magically reappear as connected, and everything is fine again until the next time.

I'm not a pro with these cards; this is only the second time I've set one up, and my first time doing it with SSDs.

Is there any special setting/configuration that I should be using for SSDs in a RAID 0 configuration? Has anyone had any experience with this before?

Thanks
 
There shouldn't be a special setting. When it locks up, immediately restart and get into the RAID BIOS to see what it says. The RAID will likely be degraded and the offending drive probably won't be listed. Figure out which one this is.

In the meantime, check the logs to see what they say. You can do this from the RAID BIOS or from the management software you can download for the card. It should tell you exactly what happened, for example whether the drive stopped responding or dropped out completely. I would suspect the cable or the drive itself, especially if it is the same one every time.
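If you'd rather pull this from the command line than the GUI, LSI's MegaCli utility should be able to do it (the binary may be MegaCli, MegaCli64, or MegaCli.exe depending on your OS, and install paths vary, so treat these as a sketch):

MegaCli64 -AdpEventLog -GetEvents -f events.log -aALL
MegaCli64 -PDList -aALL

The first dumps the controller's event log to events.log; the second lists every physical drive along with its media error, other error, and predictive failure counts, which should point at the flaky drive.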
 
That's what I'm leaning towards as well, a bad drive. I don't think it's the cable, since it is in a group of 4, and the others are working okay. I've tried switching both PSUs and SATA power cables within a PSU, and no change, so those should be good.

The errors in the storage manager list a bunch of "unexpected sense, unrecovered error" entries, plus a command timeout on PD 5, which is always the one that falls out of the array. I've checked the connections several times (every time it happens) and they seem to be good. After the crash, the logs show the system rebooting and trying to initialize all the disks, and every one is found except the culprit drive.

As a troubleshooting measure, will it mess up the VD if I switch cables between 2 SSDs to see if the error follows the cable, or the drive?

Is there some sort of maintenance that gets run on the drives periodically in the background that could be causing some problem?
 
If it is the same drive every time, it is the cable or the drive, for sure. You can swap cables without a problem. The RAID controller knows which drive is which from metadata it has saved on the disk and in its own configuration, so switching the location doesn't matter.

Don't count out the cable just because the other drives are ok. Those breakout cables have single points of failure for individual drives.
 
I had about the same thing on another controller with my 3x Crucial M4 SSDs in RAID 0. One had random connection issues every 3-4 weeks, and after maybe half a year it just died. I RMA'd it and got a new one. There were no SMART errors or anything else.
If you swap the SATA/SAS cable like thideras suggested and the problem stays with the same drive, you can expect that drive to die.
 
Okay, thanks for the help guys.

I'll swap the cables on two of the drives and see how that works.

I should be able to RMA the drive if it's bad since it's only a couple months old.
 
Since you have the space to do it, you should take an image* of the drives when they are working, break the array, then restore it (or restore it to another disk). This way, you don't have to reinstall. Clonezilla should be able to do this.


*resize the partition first so it is smaller than a single SSD if you do this
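If the boot VD happens to be a Windows install (the thread doesn't say), one way to shrink the partition before imaging is diskpart; the volume number below is just an example, so check yours with list volume first:

diskpart
list volume
select volume 1
shrink querymax
shrink desired=200000

shrink querymax shows how much the volume can give up, and desired= is the amount to remove in MB. GParted from a live CD works too if you'd rather do it graphically.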
 
Found the drive that was bad and replaced it. Good call guys. Clonezilla was surprisingly easy to use, but I screwed up one thing: I didn't shrink the partition before taking the image, so it wouldn't fit back onto my SSD VD, and I wound up having to reinstall everything anyway. I was too dumb to pay attention to what thid said above. Lesson learned, though, and I found a cool piece of software.

Next question. When I built the RAID 0 volume with the SSDs (let's call it drive "A"), I was not able to enable SSD caching, which I think is considerably slowing down my transfer rates compared to what I was getting before. Before the rebuild, drive A was reading and writing around 1.2 GB/s (64 KB stripe size); now I'm down to around 550 MB/s read/write (512 KB stripe size). I changed the stripe size in hopes of getting better performance, not worse! The drivers are installed for the card, and everything else seems to be working fine. Any ideas on the speeds and the SSD caching? When I go into the WebBIOS to adjust the controller settings, it will not allow me to add SSD caching to drive A.

Kind of random, but holy crap your servers have got a lot of RAM, thid.
 
I believe SSD caching means adding a separate SSD as a cache in front of an existing array. Trying to add SSD caching to an array using the disks that are already in that array doesn't make any sense. Did you enable write back cache? That is usually disabled by default.
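If you want to check or change it outside of WebBIOS, MegaCli can do it too (exact flags vary between MegaCli versions, so double-check against the help output):

MegaCli64 -LDInfo -LAll -aAll
MegaCli64 -LDSetProp WB -LAll -aAll
MegaCli64 -LDSetProp CachedBadBBU -LAll -aAll

LDInfo shows the current cache policy for each virtual disk, WB turns on write back, and CachedBadBBU is the "always write back" behavior (keep write caching even with no/bad BBU), which is the risky one without a battery.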

Yup, the servers have a bit of RAM. I don't get to touch them until the Chimp Challenge is over, though. ;)
 
Hmmm. I swear I had enabled it the first time. I had never heard of it before the software suggested it to me.

In any case, I may have been mistaken. Changing the cache mode to always write back doubled my speeds, putting them right back where they were before. Great suggestion, and thank you.

Dare I ask about the chimp challenge?
 
Glad that fixed it. You do have a BBU, correct? If not, I would suggest investing in one and leaving that option disabled until you do.
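If you're not sure whether the controller even sees a battery, something like this should tell you (again assuming MegaCli; flags are version-dependent):

MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL

It should report the battery's state of charge and health, or complain if no BBU is installed.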

Chimp Challenge information is here.
 
I didn't know I could buy a BBU as a standalone and retrofit it. I'll have to check that out.

Thanks again
 