Last time I gave a brief description of a RAID that we received here at the shop. I shared some of the things the client had done to try to get their RAID back online, and I explained that chkdsk had been run. What made this RAID unusual was that the client had replaced a drive and then run in production for a short period of time. As a result, the data set was spread across four drives in a three-drive array.
Now that we have a basic idea of the RAID situation, how do we solve the problem of a data set that is spread across more drives than the array was built for? One last note: I believe there was a stale drive in the array, which made it even more complicated to build a data set that would get this particular PST file off.
The situation with this particular file was that it was very badly fragmented. Fragmentation in itself is not a terrible thing, but it slows access and makes recovery very difficult if the file is deleted. As I mentioned before, ‘chkdsk’ had been run on this drive, and in addition, the file now had a size of zero.
In order to understand file fragmentation it is important to understand the NTFS file system. I’m not saying you have to be a guru, but how the file system stores the cluster map is of paramount importance to this recovery. The following is a brief description of how the cluster map is stored in the NTFS file system.
All files and folders in the NTFS file system have a single record that houses all their information. Attributes like the file name, file size, security parameters, the date of creation, update times, and much more are stored in a 1024 byte record in a database called the Master File Table (MFT). The MFT is the heart of the NTFS file system, and if it is destroyed, all data is lost. One of the attributes stored in the MFT is the cluster map. This map houses all the information about where the file is stored. NTFS uses a storage unit called a cluster to store data on the hard drive. Each cluster is normally 4096 bytes (8 sectors).
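To make the cluster arithmetic concrete, here is a minimal sketch of turning a cluster number into an absolute byte offset on the drive. It assumes the common defaults mentioned above (512-byte sectors, 8 sectors per cluster); on a real volume these values live in the NTFS boot sector and should be read from there.

```python
# Sketch: cluster number -> absolute byte offset.
# Assumed defaults; real values come from the NTFS boot sector.
SECTOR_SIZE = 512          # bytes per sector (assumed)
SECTORS_PER_CLUSTER = 8    # 4096-byte clusters (assumed)
CLUSTER_SIZE = SECTOR_SIZE * SECTORS_PER_CLUSTER

def cluster_to_byte_offset(cluster_number: int) -> int:
    """Byte offset of the first byte of the given cluster."""
    return cluster_number * CLUSTER_SIZE

print(cluster_to_byte_offset(786432))  # cluster 786432 -> byte 3221225472
```

This is the translation a recovery tool performs every time it follows the cluster map out to the raw drive.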
The cluster map is extremely simple in concept but horrific in application. Basically it consists of pairs of numbers. The first number is the starting cluster; the second is the number of contiguous clusters from that starting cluster. This is an extremely simplified explanation, but it is basically correct. This cluster mapping is called a runlist.
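The "horrific in application" part comes from how those pairs are packed on disk. Each run is variable-length: a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, and the offset is a signed value relative to the previous run's start. Here is a simplified decoder sketch (it ignores sparse runs and other edge cases a production tool must handle):

```python
def decode_runlist(data: bytes):
    """Decode an NTFS runlist into (start_cluster, run_length) pairs.

    Each run begins with a header byte: the low nibble is the number of
    bytes holding the run length, the high nibble the number of bytes
    holding the signed, *relative* starting-cluster offset. A 0x00
    header terminates the list. Because each offset is relative to the
    previous run, one corrupted run throws off everything after it.
    Simplified: sparse runs and compression are not handled here.
    """
    runs = []
    pos = 0
    prev_lcn = 0
    while pos < len(data) and data[pos] != 0x00:
        header = data[pos]
        len_size = header & 0x0F
        off_size = header >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        offset = int.from_bytes(data[pos:pos + off_size], "little", signed=True)
        pos += off_size
        prev_lcn += offset
        runs.append((prev_lcn, length))
    return runs

# Example: 8 clusters starting at cluster 0x30, then 4 clusters starting
# 0x10 clusters *before* the first run (relative offset 0xF0 = -16).
sample = bytes([0x11, 0x08, 0x30, 0x11, 0x04, 0xF0, 0x00])
print(decode_runlist(sample))  # [(48, 8), (32, 4)]
```

A fragmented file simply has many of these runs, which is where the next section picks up.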
Now, if a file is fragmented, you have many of these cluster-mapping pairs. In fact, some files have so many pairs that the runlist cannot fit in a single record and must be stored in a second, third, or more records. NTFS has a facility for doing this called an attribute list. As briefly as possible: if an attribute grows so large that it cannot be stored in the main MFT record, an attribute list is built and the runlist is spread across additional records. The file that I had to recover had two extra records that stored an enormous runlist.
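Once the extra records are located, stitching the file back together means concatenating their runlists in the order the attribute list references them. This hypothetical sketch (the record contents are made up for illustration) shows that step:

```python
def merge_record_runlists(record_runlists):
    """Concatenate runlists from several MFT records into one logical
    run list for the file, in attribute-list order (assumed), and
    report the total cluster count.
    record_runlists: list of lists of (start_cluster, length) pairs.
    """
    merged = []
    for runs in record_runlists:
        merged.extend(runs)
    total_clusters = sum(length for _, length in merged)
    return merged, total_clusters

# Hypothetical example: a main record plus two extra records, each
# holding a piece of one big runlist for the same file.
parts = [
    [(1000, 16), (5000, 8)],
    [(9000, 32)],
    [(12000, 4), (2000, 8)],
]
runs, total = merge_record_runlists(parts)
print(total)  # 68 clusters across 5 fragments
```

The order matters: the attribute list, not the cluster numbers, dictates which piece of the file comes next.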
Next time I will explain how the fact that the file I had to recover was fragmented actually made the recovery much easier. Until next time…