Over the course of a week I receive several calls regarding the recovery of a RAID 5 array. During the initial conversation with a client I ask several questions: what state the array is in, what has been done to recover it and, most importantly, what happened to the array that made the technician begin a recovery. In over half of these conversations two drives have dropped out of the array. By virtue of its design a RAID 5 can run with one drive down, which is why it takes a second drive fault before the RAID goes offline. When that second drive goes down, the RAID firmware flags the array as offline and refuses service.
The fact of the matter is that I very rarely see two drives go down simultaneously. I will see a single drive down, or all drives down, but almost never do two drives go down at the same time. Almost without exception one drive has failed before the second one. In other words, one drive fails, the RAID card degrades the array and continues to provide service, then a second drive fails and the array goes offline. How does this happen?
There are several reasons. First of all, a RAID card will normally send a loud alarm that can be heard in the far reaches of the universe. In order for that alarm to work it has to be configured in the firmware, and the speaker has to be undamaged. Many times the alarm does work, however the technician does not want to take the array offline since there are several users who would burn his image in effigy if he did. So, he turns off the alarm and promises himself to replace the drive when he shuts down. That day never comes. If the array is software managed and not hardware managed then there is no alarm, however an email is usually sent to apprise the technician of the impending catastrophe he will experience if the drive is not replaced and a rebuild initiated. But! The email address that was used at configuration time 8 years ago is long gone, or, the spam filter set up by the same technician bounces the email that has the ominous warning. For whatever reason, the drive is left offline and the RAID continues to provide service.
Then a second drive goes offline and the entire array comes crashing down. Most technicians try several reboots, or try forcing the drives online. Some will try a rebuild, and some will replace both drives and try to rebuild the array with two new drives in it. None of these work, and all will cause permanent data loss, making recovery impossible. The best course of action for any array recovery is to take the drives that are still working and make images of them onto another hard drive. We have a very good piece of imaging software on the web site, titled “Speed Clone for Windows”, that will allow you to image multiple drives onto a single drive. If more than one drive in the array is damaged then those drives need to be sent to a clean room for an image recovery. The majority of the time, however, there is no damage to any of the hard drives in the array, and the images will act as a safeguard for any steps you take during a recovery.
With all of this being said, a technician will still have the problem of a stale drive: the drive that dropped out first, whose contents are now out of date. If the stale drive can be isolated and the drives in the array are not damaged, then the stale drive can be replaced with a new drive and a rebuild performed onto the replacement. So the challenge is, how does a technician find the stale drive? Using a simple hex editor that has a search routine, any technician can find a stale drive. One note: this technique only works on the NTFS file system, but I am sure it can be adapted for other file systems as well.
NTFS uses a table to keep track of the files stored on the drive. This table is called the Master File Table (MFT). Each file on a drive has an entry in the MFT, and each MFT entry houses a great deal of data pertinent to the file: the dates of the file, security attributes, the file type, the file name, where the file is stored and many more items too numerous to mention. Each MFT record also begins with a signature called the MFT record ‘magic’ value. Every record carries the same ‘magic’ value, which allows the operating system to verify that it is processing an MFT record. This ‘magic’ value is the four letters ‘FILE’, and it is located in the first four bytes of the MFT record. Knowing this, if we count every sector that has the ‘magic’ value in its first four bytes we get a very accurate idea of how many files are on the drive. The MFT is also normally stored in one area of the volume, which helps us find each MFT record. Now that we know we can find every MFT record by searching for the magic value, how a RAID 5 stores data across the array becomes very important, and I’ll explain why.
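The counting step described above can be sketched in a few lines of Python. This is only an illustration: it assumes a raw image of a single drive and 512-byte sectors, and the function name and its parameters are hypothetical. A hex editor search for ‘FILE’ at the start of a sector does the same job by hand.

```python
SECTOR_SIZE = 512      # assumption: 512-byte sectors
MFT_MAGIC = b"FILE"    # signature in the first four bytes of an MFT record


def count_mft_signatures(image_path, sector_size=SECTOR_SIZE, chunk_sectors=2048):
    """Count sectors in a raw drive image whose first four bytes are 'FILE'."""
    count = 0
    chunk_size = sector_size * chunk_sectors
    with open(image_path, "rb") as image:
        while True:
            # Read a whole number of sectors so offsets stay sector-aligned.
            chunk = image.read(chunk_size)
            if not chunk:
                break
            # Check only the first four bytes of each sector in the chunk.
            for offset in range(0, len(chunk), sector_size):
                if chunk[offset:offset + 4] == MFT_MAGIC:
                    count += 1
    return count
```

Run it against an image of each drive rather than the drive itself, so the originals are never touched.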
A RAID 5 stores data equally between all drives in an array. Using a four drive array as an example, an 800 KB file will be stored across all four drives equally. In other words, each drive in the array will receive 200 KB of the file. This does not hold true for smaller files due to the stripe size; however, for our purposes we are dealing with a very large file, the MFT. The average server will have between 250,000 and 500,000 files on the array. As an example let’s use 400,000 files to illustrate this point. If a four drive array has 400,000 files then each drive will have 100,000 MFT records. In other words, if we were to search each drive for the MFT ‘magic’ value we would find 100,000 entries per drive. Let’s take this scenario one step further. A drive drops out of the array with 400,000 files, but the array continues in service. As the array is used, more and more files are added until a few weeks later the server has 500,000 files on the array. If we do a search now for the MFT magic we will find 125,000 entries on each of the three drives that have remained online, and only 100,000 entries on the drive that dropped out of the array several weeks ago. As you can see, it is an easy matter to determine that the stale drive in the array is the one with fewer MFT entries.
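The comparison in the example above can be written out as a short Python sketch. The drive labels and the function name are hypothetical, and the counts are the illustrative figures from the paragraph, not measurements.

```python
from statistics import median


def find_stale_drive(signature_counts):
    """Return (drive, shortfall) for the drive with the fewest MFT signatures.

    shortfall is how far that drive sits below the median of the other drives.
    """
    if len(signature_counts) < 3:
        raise ValueError("a RAID 5 array has at least three drives")
    stale = min(signature_counts, key=signature_counts.get)
    others = [n for drive, n in signature_counts.items() if drive != stale]
    shortfall = median(others) - signature_counts[stale]
    return stale, shortfall


# Figures from the four drive example: three drives kept up with the array,
# one stopped at the file count it had when it dropped out.
counts = {"drive0": 125_000, "drive1": 125_000, "drive2": 100_000, "drive3": 125_000}
stale, shortfall = find_stale_drive(counts)
print(stale, shortfall)  # prints: drive2 25000
```

A shortfall close to zero would suggest that no drive is noticeably stale, so the lowest count alone should not be trusted without looking at the size of the gap.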
It must be noted that even though one drive has dropped out of the array it is still a four drive array. The drive that has dropped out of the array becomes virtual. In other words, there is no data written to the drive and all data read from the drive is calculated from the other three drives. For this reason a degraded RAID 5 will run slower since it is calculating the virtual drive on the fly.
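How the virtual drive is calculated can be shown with a minimal Python sketch. It assumes the standard RAID 5 arrangement in which the parity block of a stripe is the XOR of its data blocks; the block contents and names below are made up for illustration.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("all blocks in a stripe must be the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# One stripe on a four drive array: three data blocks plus one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# The drive holding d1 has dropped out; its block is recalculated
# from the three surviving blocks of the stripe.
rebuilt_d1 = xor_blocks([d0, d2, parity])
assert rebuilt_d1 == d1
```

Every read that touches the missing drive needs this extra pass over the other three drives, which is the slowdown described above.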
This example is somewhat simplified but the method is sound. I have written a scanner that does all of this automatically and will tell me which drive is the stale one. The software uses a much more enhanced MFT record filter and keeps track of the calculations in real time. The software is also available by attending one of our training classes.
I hope this information has been of service to you. If you have any questions I will be more than happy to answer them on our blog which is updated daily or you can call me directly at 727-345-9665 ext 203.
Interesting write-up. I’m a technician and occasionally have to destroy individual failed drives from RAID5 arrays.
Could you confirm whether it would ever be possible for anyone to recover any meaningful data from a single drive out of a RAID5, without the rest of the array? I’m interested in this from the perspective of determining what is the appropriate destruction method for the drive – no point in going to the expense of getting it granulated if there is no recoverable data on there anyway.
If a single drive fails in a raid 5 array, and that drive is not replaced for a while, I believe any new data written to the array will be written without parity. When the faulty drive is replaced, will the rebuild be able to calculate parity for the data that was written without parity? If the rebuild does calculate and record parity for the newly written data, how does it do it? Does it use information from the MFT to determine which data was written, while the array was degraded?
John,
When a drive drops out of a RAID 5 that drive becomes virtualized. In other words, the drive is considered to be there, but the data written to it and read from it is calculated from the parity. That is why the system slows down a bit when you lose one drive in the array: the data read from and written to the RAID has to be calculated. The RAID will receive a small boost since writes to the dropped drive are virtual and do not take place.
Parity is always written, the degraded array just virtualizes the dropped drive. If no parity was written then the RAID would become a RAID 0 and not a RAID 5. In addition, there is still rotating parity that has been written before the RAID degraded.
When the dropped drive is replaced, a rebuild is done in either exclusive mode or active mode, and until the drive is rebuilt it is considered virtualized. It is always best to bring everyone offline and then do the rebuild. It is hard on the system and the card to do a rebuild while the RAID is in production.
I hope I have answered your question.
Regards,
Richard Correa