
Recovering a badly fragmented Outlook PST file after a few rounds with chkdsk The Final Installment

Welcome back!  As before, we still have the problem where chkdsk was run on a RAID with a stale drive.  We have had a brief explanation of how NTFS 5 works and how data is stored on the volume.  Let’s take a much more detailed look at how data is stored in a virtual database and how this helped me recover the PST file.

There are two types of databases: static and virtual.  A static database is allocated before run time, and its size stays the same.  In addition, the record format is usually static as well.  Static basically means that all the sizes remain the same, and if you want to change a field size or record size, the database must be rebuilt.  I used to work in a UNIX database called UNIFY that had both options.  The upside to a static database is speed.  They are much easier to index, and in many cases no index is needed because record placement in the file serves as the index.  The downside is that if you want to change a record, or a field in a record, you must make the change, export the data, and import the data back into the database.  You don’t see many static databases anymore, if at all, but database admins and designers really earned their money in the days when this type of database was used.

Another type of database is a ‘virtual’ database.  This database has record sizes that are virtual, field sizes that are virtual, and the database itself can grow as large as the file system allows, so it is also virtual.  The upside to this type of database is that you can make any changes you want to the records, fields, sizes, data types, etcetera, and you do not have to rebuild the database or re-import the data.  A database change can be done live.  The downside is that you need indexes, and that can cause design issues.  Maintenance on the indexes can slow drive access.  One issue came to light with the old PST file format, which is a virtual database using static data types.  Some of the data types used to hold offsets in a PST file were signed 32-bit integers (INTs), which have a maximum value of 2147483647.  Translated into bytes, this is 2 GB.  Not coincidentally, the PST files in Office 2002 and before would corrupt data once they reached 2 GB or larger.
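To make the 2 GB ceiling concrete, here is a minimal Python sketch of the arithmetic. The helper name `fits_in_int32` is hypothetical and only illustrates what happens when a byte offset must fit in a signed 32-bit field; it is not taken from any PST code.

```python
import struct

INT32_MAX = 2**31 - 1   # largest value a signed 32-bit integer can hold
GIB = 2**30             # bytes in one binary gigabyte

def fits_in_int32(offset):
    """Return True if a byte offset can be stored in a signed 32-bit field."""
    try:
        struct.pack("<i", offset)   # "<i" = little-endian signed 32-bit
        return True
    except struct.error:
        return False

print(INT32_MAX)                      # 2147483647
print(INT32_MAX + 1 == 2 * GIB)       # True: one byte past the limit is exactly 2 GiB
print(fits_in_int32(INT32_MAX))       # True
print(fits_in_int32(INT32_MAX + 1))   # False: this offset cannot be represented
```

Any offset past that last byte simply has no valid representation in such a field, which is consistent with the corruption described above.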

Why all this talk about virtual and static databases?  NTFS 5.0 is virtual.  The MFT is virtual, and all file and folder allocation is virtual.  The problem with this virtuality is that the main component of an MFT is not virtual.  A Master File Table record is static: it does not exceed 1024 bytes.  That being said, what happens when an attribute of an MFT record exceeds the 1024 byte limit?  What happens when a file is so fragmented that its run list outgrows the MFT record?

Microsoft uses a facility called an attribute list.  In my next installment I will explain the attribute list and how it plays a starring role in the PST recovery.

We have learned about RAID rebuilding theory and stale drives in a RAID 5.  We have covered NTFS 5 and some of its design components.  We have even looked at database design and how it plays a part in the recovery of a data file.  Finally, we covered static and virtual database sets, and what to do when they are mixed.  Now I bring you the attribute list.

As I stated before, an MFT record has a static size of 1024 bytes.  It contains all the information about a file, INCLUDING the runlist.  The runlist is the cluster map of a file and can grow very quickly.  One of the things that can cause a runlist to grow is file fragmentation, and when talking about a PST file, fragmentation is the byword.
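To show why fragmentation makes a runlist grow, here is a minimal Python sketch of a runlist decoder. It assumes the commonly documented NTFS data run encoding (a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, with each offset stored relative to the previous run). The function name and the sample bytes are made up for illustration and are not from the recovered volume.

```python
def decode_runlist(data):
    """Decode NTFS-style data runs into a list of (start_cluster, length) pairs."""
    runs = []
    pos = 0
    lcn = 0  # running cluster number; each run's offset is relative to the last
    while pos < len(data) and data[pos] != 0x00:   # a 0x00 header ends the list
        header = data[pos]
        len_size = header & 0x0F     # low nibble: bytes used by the length field
        off_size = header >> 4       # high nibble: bytes used by the offset field
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))   # sparse run: nothing stored on disk
        else:
            delta = int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            pos += off_size
            lcn += delta
            runs.append((lcn, length))
    return runs

# One contiguous fragment needs only four bytes of runlist...
print(decode_runlist(bytes.fromhex("21183456" "00")))
# ...and every extra fragment adds another entry to the list.
print(decode_runlist(bytes.fromhex("21183456" "1110f0" "00")))
```

Each fragment costs a few more bytes, so a file split into thousands of pieces can easily need more room than a single 1024 byte record can offer.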

When a runlist exceeds the confines of an MFT record’s 1024 byte storage limit, Microsoft implements a method called an attribute list.  To put it as simply as possible, an attribute list is another MFT record that houses ONLY ONE attribute type.  In other words, the runlist that was stored in the primary MFT record is now moved to another MFT record that is only used to store the runlist of the primary record.  The runlist is exactly the same as if it were being stored in the primary record; it is just stored in another area of the Master File Table.  The MFT record is EXTENDED by using the attribute list.

There are two components of an attribute list: an attribute type, and the data related to that attribute type.  The attribute type is maintained in a data structure called an attribute header.  The attribute header has several components, but the attribute type is a flag that tells the file system handler how to process the data that follows.  In this case we have a data type of 0x80 and a data storage type of non-resident.  Together, these two values mean that we have a data runlist and should process the next set of data accordingly.
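As a rough illustration of reading those two values, here is a minimal Python sketch that unpacks the first 16 bytes of an attribute header. The field layout is the commonly documented one for NTFS and is an assumption here, not something spelled out in this article. The function names and the sample header bytes are hypothetical.

```python
import struct

ATTR_TYPE_DATA = 0x80        # the data attribute type named in the text
HEADER_FORMAT = "<IIBBHHH"   # type, length, non-resident flag, name length,
                             # name offset, flags, attribute id (16 bytes total)

def parse_attribute_header(buf, offset=0):
    """Unpack the common part of an attribute header into a dict."""
    (attr_type, length, non_resident, name_len,
     name_off, flags, attr_id) = struct.unpack_from(HEADER_FORMAT, buf, offset)
    return {
        "type": attr_type,
        "length": length,
        "non_resident": bool(non_resident),
        "name_length": name_len,
        "name_offset": name_off,
        "flags": flags,
        "id": attr_id,
    }

def holds_data_runlist(header):
    """True when the header describes non-resident data, i.e. a runlist follows."""
    return header["type"] == ATTR_TYPE_DATA and header["non_resident"]

# A made-up header: type 0x80, 0x48 bytes long, non-resident, unnamed.
sample = struct.pack(HEADER_FORMAT, 0x80, 0x48, 1, 0, 0x40, 0, 1)
header = parse_attribute_header(sample)
print(hex(header["type"]), header["non_resident"])
print(holds_data_runlist(header))
```

A file system handler makes the same kind of decision: read the type, check the resident or non-resident flag, then interpret the bytes that follow accordingly.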

The attribute type can also hold another value, -1.  This value means do not process the following data and continue to the next attribute type. Now here is where it gets very interesting.

When deleting a set of data Microsoft has ALWAYS left the actual data behind.  In the FAT file system the File Allocation Table may have been updated but the File Entry Record was only ‘flagged’ as deleted.  A value is placed in the first byte of the record to indicate that the record is no longer in use and that its space can be reused. This is called a ‘virtual’ delete and has been used in databases for years.  In NTFS, a flag is also set in the MFT record, but all of the record data is left intact; just one bit in a flag is changed.
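The two kinds of virtual delete can be sketched in a few lines of Python. This assumes the commonly documented on-disk conventions: a FAT directory entry is marked deleted by writing 0xE5 into its first byte, and an NTFS MFT record carries a 16-bit flags field at offset 0x16 whose lowest bit means "in use". The buffers below are fabricated for illustration and contain no real file system data.

```python
import struct

FAT_DELETED_MARKER = 0xE5   # first byte of a deleted FAT directory entry
MFT_FLAGS_OFFSET = 0x16     # position of the flags field in an MFT record header
MFT_FLAG_IN_USE = 0x0001    # bit that marks the record as active

def fat_entry_is_deleted(entry):
    return entry[0] == FAT_DELETED_MARKER

def mft_record_in_use(record):
    (flags,) = struct.unpack_from("<H", record, MFT_FLAGS_OFFSET)
    return bool(flags & MFT_FLAG_IN_USE)

# FAT: a fake 32-byte directory entry (8.3 name followed by zeroed fields).
entry = bytearray(b"REPORT  TXT" + bytes(21))
before = bytes(entry)
entry[0] = FAT_DELETED_MARKER            # the "delete" touches one byte
print(fat_entry_is_deleted(entry))       # True
print(entry[1:] == before[1:])           # True: everything else is untouched

# NTFS: a fake 1024-byte MFT record with the in-use bit set.
record = bytearray(1024)
record[0:4] = b"FILE"
struct.pack_into("<H", record, MFT_FLAGS_OFFSET, MFT_FLAG_IN_USE)
print(mft_record_in_use(record))         # True
struct.pack_into("<H", record, MFT_FLAGS_OFFSET, 0)   # the "delete" clears one bit
print(mft_record_in_use(record))         # False
```

In both cases undoing the delete is, in principle, a matter of restoring that one byte or bit, provided nothing has overwritten the data in the meantime.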

With this knowledge I will explain what happened to the PST file and how I used these virtual flags to recover my client’s PST file.

Well, if you have stuck with me this far then you have made it to the end.   Just to recap, we have a PST file that has a file size of zero, and cannot be recovered.  We had a RAID with a stale drive that marked the file system as dirty and caused a chkdsk to be executed.  We have an NTFS file system that uses virtual flags to mark data as active and inactive in a database type environment.  Lastly, it is important to note that marking data as virtually inactive leaves the data intact.  Until something writes over the cluster where the data is stored, the data is not deleted, wiped, or otherwise changed.  It remains intact.

The final ingredient to this recovery is the fact that the file was so fragmented that an attribute list had to be used.  Because the attribute list is stored as an extension of the original MFT record, it has a data type flag.  By changing the flag from 0x8000 to 0xFFFF, the data is marked as inactive and is no longer used by the MFT to find the runlist for our PST file.
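The -1 mentioned earlier and the 0xFFFF seen here are the same bit pattern viewed two ways. A small Python sketch makes the link explicit. The field widths shown are assumptions chosen for illustration, and `as_unsigned` is a hypothetical helper.

```python
def as_unsigned(value, bits):
    """Reinterpret a signed integer as an unsigned field of the given width."""
    return value & ((1 << bits) - 1)

print(hex(as_unsigned(-1, 16)))   # 0xffff
print(hex(as_unsigned(-1, 32)))   # 0xffffffff
```

So a hex editor showing all FF bytes in the type field is showing the -1 marker.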

So here is the secret to how I recovered this PST file.

The original PST record had index pointers to the MFT records that contained the attribute list data.  When the client did the rebuild he used a different drive, and this data was not touched.  After the rebuild with the stale drive, chkdsk ran, marked the file as zero bytes, and flagged the attribute lists’ type with 0xFFFF.  Now, knowing where the attribute lists were stored and how the flags were set, I executed the following steps.

First, I took my hex editor and changed the file size to the original file size.  I used the size of the cluster map to find the original size.  Second, I copied the original pointers to the attribute lists into the new MFT record.  Third, I changed the attribute type flags from 0xFFFF to 0x8000.  This told the file system handler that the attribute list was now active again and could be used to retrieve the data. Fourth, I used a modified Recovery It All 2008 to move the PST file onto my server.  Coincidentally, the file was almost two GB in size. Finally, I used scanpst to clean up any bad records in the PST.  During the rebuild some data got moved and overwritten, but 95 percent of the file was recovered.
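The first step, deriving a file size from the cluster map, can be sketched as follows. This is only an illustration of the arithmetic, not the exact procedure used in this recovery: the run values are invented, and the 4096 byte cluster size is an assumption that would have to be read from the volume's boot sector in practice.

```python
def allocated_size(runs, bytes_per_cluster=4096):
    """Sum the cluster counts in a decoded runlist and convert to bytes.

    `runs` is a list of (start_cluster, length_in_clusters) pairs; entries
    whose start is None are treated as sparse and occupy no disk space.
    """
    clusters = sum(length for start, length in runs if start is not None)
    return clusters * bytes_per_cluster

# Hypothetical two-fragment file: 24 clusters plus 16 clusters.
runs = [(22068, 24), (22052, 16)]
print(allocated_size(runs))   # 163840
```

The result is always a whole number of clusters, so it is an upper bound on the true file size rather than the exact byte count.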

This was an excellent exercise and I really enjoyed this recovery.  I hope you learned something; I know I sure did. Take care, and as always…


4 Responses to “Recovering a badly fragmented Outlook PST file after a few rounds with chkdsk The Final Installment”

  1. shrieksss February 24, 2010 6:37 am #

    0 Byte PST after SCANPST and CHKDSK

    Hello,

    I lost my pst file of approx 1GB size containing important official mails of 4 month period. I have been taking backup once in a while, but for this period I did not have backup.

    I used to store all my PSTs in a USB HDD drive, so that I could use Outlook 2007 both from Office Laptop and Home Laptop, by plugging the HDD to respective laptops, HDD being lighter and easier to carry around. Both laptops run Windows XP SP2.

    On one occasion however, what probably happened was that while logged into outlook, I put the laptop in Hibernate mode(not shutdown), and then inadvertently pulled out the HDD. (The HDD was not ejected before hibernate.)

    On next login I got an outlook error about PST being corrupted (don’t recall the exact message) and outlook recommended that I run Inbox Repair Tool (scanpst.exe).

    I ran scanpst.exe, but it could not repair the Inbox. Scanpst.exe created a logfile in the same folder as the pst folder, which I have appended below.

    After this, Scanpst recommended that I run CHKDSK. Here is where I made the blunder of not taking a backup of the corrupted pst before running chkdsk.

    After running chkdsk (following the sequence – My Computer>Properties>Tools>Checknow), to my horror, I found that pst file was reduced to 0 bytes.

    Immediately after this mishap, I made sure not to write anything at all on this external USB HDD, and other than one 0 byte pst file, condition of the rest of the file system and physical condition of the HDD is excellent.

    The size of the HDD is about 80 GB and about 40% is free space. For fear of overwriting, I did not analyze the fragmentation condition of the HDD. But since the PST file is about 1GB in size, it may be fragmented.

    I tried a variety (at least 3-4) of recovery programs, including file signature verification software, but they failed to recover the mails in the pst file in question, as they usually came up with a 0 byte pst only. (It seems that a deleted file is easier to recover than the mail data in a pst file which is not deleted but has 0 bytes.)

    Can you suggest good recovery software and more detailed do-it-yourself instructions on how to recover the mails in a 0 byte pst file?

    Regards

    Scanpst log file content>>>>

    Microsoft (R) Inbox Repair Tool
    Copyright (C) Microsoft Corp 1995-1996. All rights reserved.

    **Beginning NDB recovery

    **Attempting to open database

    **Attempting to validate header

    !!End-of-file less than actual (read=44A94400, actual=448A4400)

    **Attempting to validate AMap

    !!AMap page @1124008960: CRC mismatch (read 4CCC649C, computed 666C1A77)
    !!AMap page @1124008960: Sig mismatch (read 89C9, computed 0000)
    !!AMap page @1124008960: PTYPE mismatch (read E7, expected 84)
    !!AMap page @1124008960: PTYPE does not repeat (E7/45)
    !!AMap page @1124008960: BID mismatch (read F609C89C945604C, expected 42FF0400)

    !!AMap page @1124262912: CRC mismatch (read 00000000, computed 29560247)
    !!AMap page @1124262912: PTYPE mismatch (read 00, expected 84)
    !!AMap page @1124262912: BID mismatch (read 0, expected 4302E400)

    !!AMap page @1124516864: Sig mismatch (read 62F2, computed 0000)
    !!AMap page @1124516864: PTYPE mismatch (read 80, expected 84)
    !!AMap page @1124516864: BID mismatch (read 7E89BA, expected 4306C400)

    !!AMap page @1124770816: CRC mismatch (read BF473D63, computed 0FA233F1)
    !!AMap page @1124770816: Sig mismatch (read 0F4C, computed 0000)
    !!AMap page @1124770816: PTYPE mismatch (read E5, expected 84)
    !!AMap page @1124770816: PTYPE does not repeat (E5/E2)
    !!AMap page @1124770816: BID mismatch (read C0D2673588AE5DEC, expected 430AA400)

  2. Jake April 19, 2010 2:26 pm #

    I am having trouble getting expertise on restoring an accidentally deleted large pst file. Does Dick Correa or someone else you know of have a service where I can send my hard drive to, and hire them to see what they can recover ? No one in the Seattle area that I can find seems to know much about this.

    Thanks,
    Jake

    • DTI Data Recovery April 19, 2010 2:31 pm #

      Jake,

      Yes we have done this in the past, re-created pst files from leftover fragments. Call 727-345-9665 ext 203 to speak with an email expert.
