Recovering from a RAID Controller Failure

There are many reasons why a RAID goes down. A technician will normally assume that one or more of the drives have failed. This is a common diagnosis because the diagnostic lights on the drives may be blinking, the lights may have gone amber, or in some cases a drive may not be spinning up at all. All of these surface indicators would lead even the most seasoned technician to assume that the drives have either failed or are on their way out. There is another reason why all these things could happen, and that is a RAID controller failure.

The challenge is diagnosing the problem when the controller itself is damaged. Using a damaged controller to make a diagnosis is the same as having a sick doctor diagnose his own health problems. Some technicians will replace the controller and hope that the configuration will reload from the drives and the RAID will mount. DTI Data has made a very good living from technicians who swap controllers and cross their fingers in hopes that the RAID will come up. The problems with this method of bringing the RAID online are too numerous to mention.

What needs to be done is to separate the primary component of the RAID, the hard drives, from the controller in order to make a legitimate diagnosis. The following methods work independently of a damaged RAID controller and will help you recover your client's data.

First of all, check the drives to make sure they are electronically sound. If you have SCSI drives, use an Adaptec SCSI controller. Perhaps an Adaptec 2930 would suit your needs. They are inexpensive and have been around for a while, so the firmware bugs have been worked out. Put the SCSI card in a reliable computer and mount each drive individually. If the drives are SATA or PATA, use a standard interface port to mount the drives.

If the drive shows up under ‘Disk Management’ in ‘Computer Management’, then it is a pretty safe assumption that the interface is intact and you have some I/O between the drive and the controller. In addition, DTI Data has a free surface scanner that will allow you to look at each drive and map any bad sectors on it. If two or more drives come up with bad sectors, that could be the reason why the RAID went down. RAID controllers are very sensitive to more than one drive exhibiting bad sectors or slow reads. A RAID 5 controller’s firmware may be fault tolerant, but when two drives have bad sectors the controller will degrade the array and bring it offline.

If, however, there are no bad sectors on any of the drives, then the problem is normally the controller. You may have received a power spike or some kind of memory fault, but whatever the cause, the fact of the matter is that the RAID controller failed and will not mount your array.

In addition to doing a surface scan to verify whether you have in fact had a RAID controller failure, you can check the integrity of the RAID. In a RAID 5, the controller performs a set of mathematical operations on the data so that it can reconstruct the data if a drive drops out of the array. These XOR math functions are used to rebuild a degraded RAID 5 array. The failed drive has to be replaced before the rebuild, but a RAID 5 controller has the ability to integrate a brand new drive back into the array.
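The XOR arithmetic can be illustrated with a short Python sketch. The block values and the helper name `xor_blocks` are made up for illustration, and a real controller works on much larger stripes and rotates parity across the drives, so treat this as a toy model of the principle rather than any controller's actual firmware logic.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks from one stripe (made-up sample values).
d0 = bytes([0x11, 0x22, 0x33, 0x44])
d1 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
d2 = bytes([0x0F, 0x0E, 0x0D, 0x0C])

# The parity block that would be written alongside the data blocks.
parity = xor_blocks([d0, d1, d2])

# Suppose the drive holding d1 drops out of the array.
# XOR-ing the surviving blocks with the parity block recovers it.
rebuilt_d1 = xor_blocks([d0, d2, parity])
```

Because XOR is its own inverse, any single missing block in a stripe can be recovered from the others, which is why a RAID 5 survives the loss of one drive but not two.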

I bring up the RAID 5 mathematics because, in order for the array to have a ‘clean bill of health’, the parity integrity must be intact. If a RAID card does not detect that a drive has dropped out of the array, then the drive will become stale: its contents no longer match the writes made to the rest of the array. A RAID 5 will continue to function even if one drive is out of the array; however, the RAID card should notify the technician that the array has been degraded so that the drive can be replaced and a rebuild performed.
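One way to picture a parity integrity check is the sketch below. It is not DTI Data's diagnostic tool; it is a minimal Python illustration that assumes a plain RAID 5 layout in which the XOR of all member blocks at the same offset is zero, and that the images are already loaded as equal-length byte strings. The function name `find_inconsistent_stripes` and the 512-byte block size are hypothetical choices, and any controller metadata areas on the drives would need to be skipped.

```python
from functools import reduce

def find_inconsistent_stripes(images, block_size=512):
    """Return the offsets where the member blocks do not XOR to all zeros.

    `images` is a list of equal-length bytes objects, one per member drive.
    """
    bad_offsets = []
    for offset in range(0, len(images[0]), block_size):
        blocks = [img[offset:offset + block_size] for img in images]
        xored = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))
        if any(xored):  # any nonzero byte means parity does not match the data
            bad_offsets.append(offset)
    return bad_offsets
```

A result full of mismatches on an array that was supposedly healthy suggests that one member stopped receiving writes. The check alone does not say which member is stale; that takes further inspection of the drive contents.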

DTI Data has a free diagnostic tool for RAID 5 that will allow you to see whether there is in fact a stale drive in the array. I wrote a blog post on how to detect a stale drive in the array, and hopefully this will help you diagnose a drive array controller failure. If the software finds a stale drive and the RAID controller did not indicate that, then the only way to recover the data is to create a virtual RAID 5 array offline using software and images created from the RAID 5 drives.

In order to create the images, DTI Data has an inexpensive solution on our web site. The software was designed and written for the technician imaging multiple drives to a single drive. It is as easy as mounting the drives, selecting the source drives and the destination drive, and then just walking away. The software will not only image the drives but will also map all the bad sectors it finds and generate a comprehensive report.
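The general idea of imaging while mapping bad sectors can be sketched in Python as follows. This is not the DTI Data software; it is a simplified illustration that assumes the source and destination are seekable binary file objects and that an unreadable region raises `OSError`. The names `image_drive` and `FlakySource`, the block sizes, and the sample data are all hypothetical, and `FlakySource` exists only to simulate a failing drive for the demonstration.

```python
import io

def image_drive(src, dst, total_size, block_size=64 * 1024, sector_size=512):
    """Copy total_size bytes from src to dst and return unreadable sector offsets.

    Unreadable sectors are zero-filled in the image so offsets stay aligned.
    """
    bad_sectors = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            # The large read failed, so retry one sector at a time
            # to isolate exactly which sectors are unreadable.
            data = bytearray()
            for sector in range(offset, offset + length, sector_size):
                size = min(sector_size, offset + length - sector)
                try:
                    src.seek(sector)
                    chunk = src.read(size)
                    if len(chunk) != size:
                        raise OSError("short read")
                except OSError:
                    chunk = bytes(size)  # zero-fill the unreadable sector
                    bad_sectors.append(sector)
                data += chunk
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_sectors

class FlakySource(io.BytesIO):
    """Simulated failing drive: reads that touch a bad offset raise OSError."""
    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size=-1):
        start = self.tell()
        if any(start <= b < start + size for b in self.bad_offsets):
            raise OSError("simulated unreadable sector")
        return super().read(size)

# Demonstration on 4096 bytes of sample data with one bad sector at offset 1024.
payload = bytes(range(256)) * 16
source = FlakySource(payload, bad_offsets=[1024])
image = io.BytesIO()
bad = image_drive(source, image, len(payload), block_size=2048)
```

Reading in large blocks and dropping to single sectors only on failure keeps imaging fast on healthy regions while still pinpointing the damaged ones. Real failing drives may also hang or retry for long periods, which this sketch does not handle.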

These are just a few things that you can do to detect a RAID controller failure. DTI Data offers a set of comprehensive tools that will check all aspects of the RAID 5 hard drives. These tools are all on our website and will hopefully be an addition to your tool set.

If you need RAID data recovery, call toll free 1-866-438-6932 ext. 203 or direct 1-727-345-9665 ext. 203 to speak with a qualified RAID engineer now!

3 Responses to “Recovering from a RAID Controller Failure”

  1. Jason November 3, 2010 7:06 pm #

    Hello, I’ve been trying to find information online about where RAID “membership” or signature information is stored on the drives. This isn’t related to data recovery, but since you’ve written many low-level utilities I thought you’d be the person to ask.
    We have an external RAID 5 enclosure (USB + FireWire; internally the drives are SATA), and to verify that it worked, I pulled a drive to see how the unit behaves. It behaved as expected: all the dummy data was fine, and the OS didn’t notice. I put the drive back in, and it’s rebuilding, but the problem is that it takes forever and I’d rather just reinitialize and reformat, since there’s no real data on the drives.
    After connecting each individual drive through a SATA->USB adapter, I tried to use the Linux “dd” command to overwrite the first several MB of each drive with 0’s (this works for wiping the MBR), but the external RAID device still thinks the drives are all members of a degraded array and insists on rebuilding it instead of creating a new array. I suspect that it is storing drive serial numbers somewhere, but that would be bad if the whole unit failed and I wanted to put the drives in another same-model enclosure. Other RAID controllers I’ve seen can be replaced, so the array information must be on the drives somewhere, but the question is where?
    (I tried DBAN, but even on single pass “quick” it estimates 48 hours to wipe a 2TB drive)

    thanks in advance, any reply is appreciated!

    • Dick Correa November 5, 2010 1:28 pm #

      Greetings Jason,

      The sad reality is that there is no standard for the storage of meta-data that will identify the drives. There are some series of RAID cards that store it in the first 64k of the drive. There are others that store it at the end of the drive. There are other cards that store it on the card with read/write PROMs. What is becoming most prevalent is the use of an LVM header that defines the drive order and characteristics since the firmware uses a Linux RAID handler.

      As for circumventing a rebuild, that is also on a card by card basis. Some manufacturers force a rebuild because it is a prudent thing to do when the drive has been degraded. The assumption is that you pulled the drive for a reason so the new one must be integrated back into the array by performing a rebuild. There are cards that will allow you to configure the rebuild out. There are cards that you have to tell them to do a rebuild. The rebuild is performed to protect the remaining data.

      There is no hard and fast rule for RAID rebuilding, configuring, etc. I have seen the exact same RAID come into the shop where one will have meta-data at the front of the drive and the other won’t have any meta-data at all. Ultimately there is no way to be sure how to handle your particular situation.

      I am sorry to be so vague, however, the card firmware programmers are very flighty.

      Dick Correa

      • Jason November 5, 2010 6:09 pm #

        Hi Dick, thanks for the info. I found that each of the drives contained the information in the *last* 1024 bytes of the drive, so a `dd` with the additional parameter “seek=###”, where ### was the offset (2 TB minus 1024 bytes), worked like a charm. After clearing the drives, I put them back in, and it rebuilt the array like it was new. (Their email support was not helpful, unfortunately.)

        -Jason
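For readers who want to try the approach Jason describes, here is a rough Python sketch of zeroing the tail of a drive. It is not his exact command: the function name `zero_tail` is hypothetical, the 1024-byte figure applies only to his particular enclosure, and the demonstration runs against a temporary file, not a real device. Note that `dd` counts `seek=` in output blocks, not bytes, so the byte offset has to be converted according to the block size in use. Writing to the wrong device destroys data, so double-check the path before pointing anything like this at a real drive.

```python
import os
import tempfile

def zero_tail(path, tail_bytes=1024):
    """Overwrite the last tail_bytes of the file or device at path with zeros.

    Returns the byte offset where the zeroed region starts.
    """
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)  # total size in bytes
        start = max(size - tail_bytes, 0)
        dev.seek(start)
        dev.write(bytes(size - start))
        dev.flush()
        os.fsync(dev.fileno())  # push the write through to the medium
    return start

# Demonstration on a 4096-byte temporary file standing in for a drive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff" * 4096)
demo_path = tmp.name

start = zero_tail(demo_path)
with open(demo_path, "rb") as f:
    contents = f.read()
os.remove(demo_path)
```

Only the final 1024 bytes are touched, which is why this finishes in moments where a full wipe of a 2 TB drive takes many hours.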
