Welcome back! As before, we still have the problem where chkdsk was run on a RAID with a stale drive. We have had a brief explanation of how NTFS 5 works and how the data is stored on the volume. Let's take a much more detailed look at how the data is stored in a virtual database and how this helped me recover the PST file.
There are two types of databases: static and virtual. A static database is allocated before run time, and its size stays the same. In addition, the record format is usually static as well. Static basically means that all the sizes remain the same, and if you want to change a field size or record size, the database must be rebuilt. I used to work in a UNIX database called UNIFY that had both options. The upside to a static database is speed. They are much easier to index, and in many cases no index is needed because record placement in the file serves as the index. The downside is that if you want to change the record format, or a field in a record, you must make the change, export the data, and import the data back into the database. You don't see many static databases anymore, if at all, but the database admin and designer really earned their money in the day when this type of database was used.
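To make the point about record placement concrete, here is a minimal Python sketch of a static record layout. The field sizes, the RECORD format, and the helper names are my own assumptions for illustration; this is not UNIFY's actual file format. Because every record is the same size, record N always starts at byte N times the record size, so no index is needed to find it.

```python
import struct

# Hypothetical fixed layout: 4-byte id, 20-byte name, 8-byte balance.
RECORD = struct.Struct("<I20sd")

def pack_record(rec_id, name, balance):
    # struct pads (or truncates) the name to exactly 20 bytes.
    return RECORD.pack(rec_id, name.encode("ascii"), balance)

def read_record(table, n):
    # Record placement is the index: record n starts at n * RECORD.size.
    offset = n * RECORD.size
    rec_id, name, balance = RECORD.unpack_from(table, offset)
    return rec_id, name.rstrip(b"\x00").decode("ascii"), balance

# Three made-up records stored back to back, as a static table would be.
table = b"".join(
    pack_record(i, name, bal)
    for i, (name, bal) in enumerate(
        [("alice", 10.0), ("bob", 20.5), ("carol", 30.25)]
    )
)

print(read_record(table, 2))  # (2, 'carol', 30.25)
```

The cost shows up as soon as the name field needs 30 bytes instead of 20: every offset changes, so the whole table has to be exported and rebuilt.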
Another type of database is a ‘virtual’ database. This database has record sizes that are virtual, field sizes that are virtual, and the database itself can grow as large as the file system allows, so it is also virtual. The upside to this type of database is that you can make any changes you want to the records, fields, size, data type, etcetera, and you do not have to rebuild or re-import the data. A database change can be done live. The downside is that you need indexes, and that can cause design issues. Maintenance on the indexes can slow drive access. There is one issue that came to light with the old PST file format, which is a virtual database that uses static data types. Some of the data types used to hold offsets into a PST file were signed 32-bit integers (INTs), which have a maximum value of 2147483647. Counted in bytes, that is just under 2 GB. Coincidentally, the PST files in Office 2002 and before would corrupt data in files 2 GB or larger.
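The arithmetic behind that limit is easy to check. The sketch below is only an illustration of what happens to an offset that no longer fits in a signed 32-bit integer; I have not seen the PST code itself, so the helper name as_int32 and the way the overflow is modeled are assumptions.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value

def as_int32(offset):
    # Reinterpret the low 32 bits of an offset as a signed 32-bit integer,
    # which is what you get when a larger value is forced into that type.
    return struct.unpack("<i", struct.pack("<I", offset & 0xFFFFFFFF))[0]

two_gb = 2 * 1024**3

print(INT32_MAX + 1 == two_gb)  # True
print(as_int32(two_gb - 1))     # 2147483647
print(as_int32(two_gb))         # -2147483648
```

The last byte below 2 GB is still addressable, but the very next offset wraps around to a negative number, which is no longer a usable position in the file.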
Why all this talk about virtual and static databases? NTFS 5.0 is virtual. The MFT is virtual, and all file and folder allocation is virtual. The problem with this virtuality is that the main component of an MFT is not virtual. A Master File Table record is not virtual; it is static. It does not exceed the size of 1024 bytes. That being said, what happens when an attribute of an MFT record exceeds the 1024 byte limit? What happens when a file is so fragmented that its runlist grows so large that it outgrows the MFT record?
Microsoft uses a facility called an attribute list. In my next installment I will explain the attribute list and how it plays a starring role in the PST recovery.
We have learned about RAID rebuilding theory and stale drives in a RAID 5. We have covered NTFS 5 and some of its design components. We have even looked at database design and how it plays a part in the recovery of a data file. Finally, we covered static and virtual databases, and what happens when they are mixed. Now I bring you the attribute list.
As I stated before, an MFT record has a static size of 1024 bytes. It contains all the information about a file INCLUDING the runlist. The runlist is the cluster map of a file and can grow very quickly. One of the things that can cause a runlist to grow is file fragmentation, and when talking about a PST file, fragmentation is the byword.
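To show why fragmentation makes the runlist grow, here is a small Python sketch of a runlist decoder. It follows the commonly documented NTFS data-run encoding: a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, followed by a little-endian length and a signed offset relative to the previous run. The function name and the sample bytes are made up for illustration; they are not taken from the client's volume.

```python
def decode_runlist(data):
    """Decode an NTFS data-run list into (start_cluster, length) pairs."""
    runs = []
    pos = 0
    lcn = 0  # each offset is relative to the start of the previous run
    while pos < len(data) and data[pos] != 0x00:  # 0x00 ends the list
        header = data[pos]
        len_size = header & 0x0F  # low nibble: bytes in the length field
        off_size = header >> 4    # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            # No offset field: a sparse run with no clusters on disk.
            runs.append((None, length))
        else:
            delta = int.from_bytes(
                data[pos:pos + off_size], "little", signed=True
            )
            lcn += delta
            runs.append((lcn, length))
        pos += off_size
    return runs

# Two fragments: 24 clusters at cluster 22068, then 48 clusters 16 clusters earlier.
sample = bytes.fromhex("21 18 34 56 11 30 F0 00")
print(decode_runlist(sample))  # [(22068, 24), (22052, 48)]
```

Every fragment adds another run of a few bytes. A file in two pieces needs eight bytes here; a PST in thousands of pieces quickly needs more room than a 1024 byte record can offer.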
When a runlist exceeds the 1024 byte storage limit of an MFT record, Microsoft implements a method called an attribute list. To put it as simply as possible, an attribute list is another MFT record that houses ONLY ONE attribute type. In other words, the runlist that was stored in the primary MFT record is now moved to another MFT record that is used only to store the runlist of the primary record. The runlist is exactly the same as if it were being stored in the primary record; it is just stored in another area of the Master File Table. The MFT record is EXTENDED by using the attribute list.
There are two components of an attribute list: an attribute type, and the data related to that attribute type. The attribute type is maintained in a data structure called an attribute header. The attribute header has several components, but the attribute type is a flag that tells the file system handler how to process the data that follows. In this case we have a data type of 0x80 and a data storage type of non-resident. These two values mean that we have a data runlist and should process the next set of data accordingly.
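As a rough illustration, the first few fields of an attribute header can be read like this in Python. The field order used here (4-byte type, 4-byte length, 1-byte non-resident flag, 1-byte name length, 2-byte name offset) follows the commonly documented NTFS layout; the function name, the has_runlist key, and the sample header are my own and are only a sketch.

```python
import struct

DATA_TYPE = 0x80  # the data attribute type discussed above

def parse_attr_header(buf, offset=0):
    # Common header fields: type, total length, non-resident flag,
    # name length, name offset (all little-endian).
    attr_type, length, non_resident, name_len, name_off = struct.unpack_from(
        "<IIBBH", buf, offset
    )
    return {
        "type": attr_type,
        "length": length,
        "non_resident": bool(non_resident),
        # Type 0x80 plus non-resident storage means a runlist follows.
        "has_runlist": attr_type == DATA_TYPE and bool(non_resident),
    }

# Hand-built sample header: type 0x80, length 0x48, non-resident, unnamed.
sample = struct.pack("<IIBBH", 0x80, 0x48, 1, 0, 0x40)
print(parse_attr_header(sample)["has_runlist"])  # True
```

A resident data attribute, where the file content sits inside the MFT record itself, would carry the same type but a non-resident flag of zero, and there would be no runlist to follow.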
The attribute type can also hold another value: -1. This value means do not process the following data and continue to the next attribute type. Now here is where it gets very interesting.
When deleting a set of data, Microsoft has ALWAYS left the actual data behind. In the FAT file system the File Allocation Table may have been updated, but the File Entry Record was only ‘flagged’ as deleted. A value is placed in the first byte of the record to indicate that the record is no longer in use and that its space can be reused. This is called a ‘virtual’ delete and has been used in databases for years. In NTFS, a flag is also set in the MFT record, but all of the record data is intact; just one bit in a flag is changed.
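Here is a small Python sketch of both kinds of virtual delete. The constants follow the commonly documented layouts: 0xE5 in the first byte of a deleted FAT directory entry, and an in-use bit (0x0001) in the flags word at offset 0x16 of an MFT record header. The helper names and the blank sample record are hypothetical.

```python
import struct

FAT_DELETED = 0xE5       # first byte of a deleted FAT directory entry
MFT_IN_USE = 0x0001      # bit 0 of the flags word in an MFT record header
MFT_FLAGS_OFFSET = 0x16  # position of the flags word in the record header

def fat_entry_deleted(entry):
    return entry[0] == FAT_DELETED

def mft_record_in_use(record):
    (flags,) = struct.unpack_from("<H", record, MFT_FLAGS_OFFSET)
    return bool(flags & MFT_IN_USE)

def mark_mft_deleted(record):
    # Clear only the in-use bit; every other byte of the record is untouched.
    buf = bytearray(record)
    (flags,) = struct.unpack_from("<H", buf, MFT_FLAGS_OFFSET)
    struct.pack_into("<H", buf, MFT_FLAGS_OFFSET, flags & ~MFT_IN_USE)
    return bytes(buf)

# A blank 1024 byte record with the signature and the in-use flag set.
record = bytearray(1024)
record[0:4] = b"FILE"
struct.pack_into("<H", record, MFT_FLAGS_OFFSET, MFT_IN_USE)
record = bytes(record)

deleted = mark_mft_deleted(record)
changed = sum(a != b for a, b in zip(record, deleted))
print(mft_record_in_use(record), mft_record_in_use(deleted), changed)
# True False 1
```

Only one byte differs between the live record and the deleted one, which is exactly why the rest of the record is still there to be recovered.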
With this knowledge I will explain what happened to the PST file and how I used these virtual flags to recover my client's PST file.
Well, if you have stuck with me this far then you have made it to the end. Just to recap, we have a PST file that has a file size of zero, and cannot be recovered. We had a RAID with a stale drive that marked the file system as dirty and caused a chkdsk to be executed. We have an NTFS file system that uses virtual flags to mark data as active and inactive in a database type environment. Lastly, it is important to note that marking data as virtually inactive leaves the data intact. Until something writes over the cluster where the data is stored, the data is not deleted, wiped, or otherwise changed. It remains intact.
The final ingredient to this recovery is the fact that the file was so fragmented that an attribute list had to be used. Because the attribute list is stored as an extension of the original MFT record, it has a data type flag. Changing the flag from 0x8000 to 0xFFFF marks the data as inactive, so the MFT no longer uses it to find the runlist for our PST file.
So here is the secret to how I recovered this PST file.
The original PST record had index pointers to the MFT records that contained the attribute list data. When the client did the rebuild he used a different drive, and this data was not touched. After the rebuild with the stale drive, chkdsk ran, marked the file as zero bytes, and flagged the attribute lists' type with 0xFFFF. Now, knowing where the attribute lists were stored and how the flags were set, I executed the following steps.
First, I took my hex editor and changed the file size back to the original file size. I used the size of the cluster map to find the original size. Second, I copied the original pointers to the attribute lists into the new MFT record. Third, I changed the attribute type flags from 0xFFFF to 0x8000. This told the file system handler that the attribute list was now active again and could be used to retrieve the data. Fourth, I used a modified Recovery It All 2008 to move the PST file onto my server. Coincidentally, the file was almost two GB in size. Finally, I used scanpst to clean up any bad records in the PST. During the rebuild some data got moved and overwritten, but 95 percent of the file was recovered.
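Two of those steps can be sketched in a few lines of Python. This is not the tool I used; it is only an illustration that works on an in-memory copy of a record. The 4096 byte cluster size, the function names, and the assumption that the flag shows up in the hex editor as the two bytes FF FF (inactive) and 80 00 (active) are all assumptions for the example.

```python
CLUSTER_SIZE = 4096  # assumed; the real value comes from the volume itself

def size_from_runs(run_lengths, cluster_size=CLUSTER_SIZE):
    # The cluster map gives the allocated size: total clusters times
    # cluster size. The true file size can be a little smaller, because
    # the last cluster is rarely full.
    return sum(run_lengths) * cluster_size

def reactivate(record, offset, inactive=b"\xFF\xFF", active=b"\x80\x00"):
    # Work on a copy, and refuse to patch unless the expected bytes are there.
    buf = bytearray(record)
    if bytes(buf[offset:offset + 2]) != inactive:
        raise ValueError("unexpected bytes at offset %d" % offset)
    buf[offset:offset + 2] = active
    return bytes(buf)

print(size_from_runs([24, 48]))                          # 294912
print(reactivate(b"\x00\x00\xFF\xFF\x00", 2).hex(" "))   # 00 00 80 00 00
```

Checking the bytes before overwriting them is cheap insurance. On a damaged volume, a patch applied at the wrong offset destroys the very data you are trying to save, so always work on an image of the drive and never on the original.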
This was an excellent exercise and I really enjoyed this recovery. I hope you learned something, I know I sure did. Take care, and as always…