As discussed in the previous articles, we have a RAID that had some problems. These problems were exacerbated when chkdsk was run on the file system and changed some of the attributes of the PST file we are trying to recover. Normally when this happens the file becomes irretrievable and is lost to the cluster allocation gods, but in this case the flags acted as a stopgap that prevented further data loss.
To understand how this happened, we need an explanation of how the NTFS file system works, as well as a quick lesson in database design and management. First, the database.
The NTFS file system, and in fact any file system, is nothing more than a real-time database. The records are stored in a flat file (the Master File Table, or MFT), and there are indexes (INDX records) that hold pointers into the flat file for fast access to a specific data set. As an example, when you click on a folder in Explorer and all of its files are displayed, the file system handler does not query the MFT and build a record set for that particular folder. Each folder has its own unique record number. As in any good relational database, each file and folder inside a parent folder has not only its own unique record number but also a parent record number that groups the folder's contents together. Now, if the MFT were sorted by record number, a binary search could be used to find the parent folder, build the list, and then pass the list to the GUI for display to the end user. However, as I said before, the MFT is a flat file: new records are appended to it and it is never sorted.
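To make the database analogy concrete, here is a minimal sketch of the two ways a folder listing could be produced. It is a toy model, not the NTFS on-disk format: the `Record` class, the sample file names, and both functions are made up for illustration. It compares scanning the whole flat file for a matching parent record number against consulting a small per-folder index of pointers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    record_no: int   # unique record number
    parent_no: int   # record number of the containing folder
    name: str


# Toy "MFT": records are appended in creation order and never sorted.
# (Hypothetical data; in this toy the root is record 0 and is its own parent.)
mft = [
    Record(0, 0, "root"),
    Record(1, 0, "Mail"),
    Record(2, 0, "Docs"),
    Record(3, 1, "archive.pst"),
    Record(4, 2, "notes.txt"),
    Record(5, 1, "inbox.pst"),
]


def list_folder_by_scan(records, folder_no):
    """Walk every record and keep those whose parent matches: O(n) per listing."""
    return [r.name for r in records
            if r.parent_no == folder_no and r.record_no != folder_no]


def build_folder_indexes(records):
    """Build one small index per folder holding pointers (record numbers)."""
    indexes = defaultdict(list)
    for r in records:
        if r.record_no != r.parent_no:  # skip the self-parented root
            indexes[r.parent_no].append(r.record_no)
    return indexes


indexes = build_folder_indexes(mft)
by_scan = list_folder_by_scan(mft, 1)
# The index already knows which records belong to folder 1.
by_index = [mft[n].name for n in indexes[1]]
```

Both approaches return the same names for the `Mail` folder, but the scan touches every record in the flat file while the index lookup touches only the folder's own entries.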
One of the reasons the MFT is not sorted is that before NTFS 5 there were no embedded record numbers in the MFT. The record number was determined by the position of the record in the file. In other words, the first record in the file had a record number of zero, the next record was one, and so on. This was a very poor design with respect to data recovery. If the MFT became fragmented and the cluster map for the MFT (record zero) was destroyed, there was no way to reconstruct the MFT, since the fragments of a file carry no order of their own. NTFS 5 fixed all that by embedding a record number in each MFT record.
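The recovery problem can be sketched in a few lines. This is an illustration under stated assumptions only: the 16-byte record layout, the `pack_record` and `unpack_record` helpers, and the fragment order are invented for the example and do not match the real MFT record structure. It shows that once the fragment order is lost, numbering by position gives the wrong answers, while an embedded number lets each record identify itself.

```python
import struct

RECORD_SIZE = 16  # toy size; real MFT records are typically 1024 bytes


def pack_record(record_no, name):
    """Hypothetical layout: 4-byte little-endian number + 12-byte padded name."""
    return struct.pack("<I12s", record_no, name.encode("ascii"))


def unpack_record(raw):
    record_no, name = struct.unpack("<I12s", raw)
    return record_no, name.rstrip(b"\x00").decode("ascii")


names = ["$MFT", "Mail", "Docs", "archive.pst", "notes.txt", "inbox.pst"]
mft_image = b"".join(pack_record(i, n) for i, n in enumerate(names))

# The MFT is fragmented into three runs and the cluster map is gone,
# so the fragments turn up in whatever order they are found on disk.
fragments = [mft_image[64:96], mft_image[0:32], mft_image[32:64]]


def records_in(frags):
    data = b"".join(frags)
    for offset in range(0, len(data), RECORD_SIZE):
        yield unpack_record(data[offset:offset + RECORD_SIZE])


def positional_numbers(frags):
    """Old scheme: the record number is just the position in the data."""
    return {pos: name for pos, (_, name) in enumerate(records_in(frags))}


def embedded_numbers(frags):
    """Newer scheme: each record carries its own number."""
    return {record_no: name for record_no, name in records_in(frags)}


by_position = positional_numbers(fragments)
by_embedded = embedded_numbers(fragments)
```

In this toy run, record 3 should be `archive.pst`. The positional scheme reports `Mail` for record 3 because the fragments were read out of order, while the embedded scheme recovers the correct mapping regardless of fragment order.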
All this being said, the NTFS file system uses INDX records that keep a sort of short list for each folder: just some basic information such as dates, file size, file name, and some security information. This keeps the record small, which matters because the INDX record IS sorted and is kept sorted by the file system. In other words, you have a subset of each record, kept for display purposes only, in order to offer a faster refresh of the data set in the GUI. In addition, chkdsk will synchronize the INDX record with the MFT record, using the MFT as the base.
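The idea of a small sorted index that is kept in step with the flat file can be sketched as follows. This is a simplified model and not how chkdsk or the NTFS driver is actually implemented: the dictionary-based `mft`, the tuple-based index entries, and the `sync_from_mft` function are all assumptions made for illustration, and real NTFS name ordering is more involved than plain string comparison.

```python
import bisect

# Toy MFT: full records keyed by record number (the authoritative copy).
mft = {
    3: {"name": "archive.pst", "parent": 1, "size": 2048},
    5: {"name": "inbox.pst", "parent": 1, "size": 512},
    6: {"name": "drafts.pst", "parent": 1, "size": 128},
}


def add_entry(index, name, record_no, size):
    """Insert a short (name, record_no, size) entry, keeping the index sorted."""
    bisect.insort(index, (name, record_no, size))


def find_entry(index, name):
    """Binary search by name; (name,) sorts just before any (name, ...) entry."""
    i = bisect.bisect_left(index, (name,))
    if i < len(index) and index[i][0] == name:
        return index[i]
    return None


def sync_from_mft(index, mft_records, folder_no):
    """Rebuild a folder's index from the MFT, treating the MFT as the base."""
    index.clear()
    for record_no, rec in mft_records.items():
        if rec["parent"] == folder_no:
            add_entry(index, rec["name"], record_no, rec["size"])


# An index for folder 1 that has drifted: one stale size, one missing file.
mail_index = []
add_entry(mail_index, "inbox.pst", 5, 512)
add_entry(mail_index, "archive.pst", 3, 1024)

sync_from_mft(mail_index, mft, 1)
found = find_entry(mail_index, "archive.pst")
```

After the sync, the index lists all three files in name order and the stale size for `archive.pst` has been replaced by the value from the MFT. The point of the design is that a listing or a lookup only ever touches these short, sorted entries.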
Well, it seems like I have rambled on a bit here. As I mentioned, there are flags that are set to let the file system handler know when data is active and when it is inactive. In the next installment I will explain how this ‘flagging’ design, used in a ‘virtual’ file system, saved the PST file for my client.
Until next time…