It is not often that I am asked to go on site to perform a recovery. In this case, however, a large mining and exploration company had suffered a power outage, one of their HP dedicated servers was now not booting, and they wanted me to come in and recover the data in-house.
They of course had a dedicated server room, but their office building was subject to frequent power drops, both scheduled and intermittent. This had led them to move the bulk of their IT infrastructure to a data center to alleviate their power outage problems.
However, they were still running a legacy server with Windows Server 2012 and Hyper-V hosting various server images. For those who don't know, Windows Server 2012 with Hyper-V allows multiple servers, or virtual machines, of many flavours (e.g. Linux) to run on the one machine.
The first question that came to my mind was whether the RAID configuration itself was damaged. Their server still booted to Windows 2012, but none of their virtual machines would load. Clearly the RAID volume holding the operating system was still intact; however, the volume or volumes holding the virtual machines were on a separate 13 TB RAID 5.
Three possibilities existed:
- Two or more of the hard drives in the RAID had failed, which usually means some data loss (RAID 5 can survive the loss of only one drive).
- The drives were still healthy, but the RAID configuration data on the drives had become corrupted.
- The drives and the RAID were healthy, but the crucial volume information (the partition table) and/or the Master File Table had become corrupted.
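The reason two failed drives usually means data loss comes down to how RAID 5 stores redundancy: each stripe carries one parity block that is the XOR of its data blocks, which is enough to rebuild exactly one missing block. The following is a minimal Python sketch of that idea, assuming a simplified layout with the parity on a fixed last drive (real RAID 5 rotates parity across the drives, and controllers add their own metadata). The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Set


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Simplified RAID 5 stripe: the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_stripe(stripe: List[Optional[bytes]], failed: Set[int]) -> List[bytes]:
    """Recompute the block of a failed drive from the surviving drives."""
    if len(failed) > 1:
        # With a single parity block there is not enough information left
        raise ValueError("single parity can rebuild at most one failed drive")
    rebuilt = list(stripe)
    for lost in failed:
        survivors = [blk for i, blk in enumerate(stripe) if i != lost]
        # XOR of all surviving blocks (data + parity) equals the missing block
        rebuilt[lost] = xor_blocks(survivors)
    return rebuilt


if __name__ == "__main__":
    stripe = make_stripe([b"VHDX", b"DATA", b"MFT!"])
    stripe[1] = None  # simulate drive 1 failing
    print(rebuild_stripe(stripe, {1})[1])  # b'DATA'
```

Lose a second drive from the same stripe and the XOR no longer has a unique answer, which is why the first scenario needs full reconstruction work rather than a simple rebuild.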
If either of the first two conditions existed, it would be impossible to do the recovery on site. The RAID would need to be reconstructed, and that has to be done with recovery tools that have direct access to the drives, with the drives disconnected from the RAID controller. Recovering failed RAIDs is time consuming, complex and more costly.
Fortunately, after much discussion between myself and the organisation's IT staff, it was determined that there was a good likelihood the RAID was intact and that the third failure scenario was the most likely.
Once on site, I was able to use the Windows Disk Management console to see that a 13 terabyte RAW volume existed, so at the very least the partition information for the drive was missing.
I installed my favourite data recovery tool, R-Studio (see https://www.r-studio.com/data-recovery-software/), on the Windows 2012 volume.
Although the image above is not from the failed RAID, it shows what a healthy drive looks like: the second 1 TB Western Digital drive in it has 4 separate volumes. Our failed RAID was devoid of any volume information, so we knew the partition table on the drive had been corrupted by the power failure.
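A RAW volume simply means Windows found no partition table it recognised. As a rough illustration of what "missing partition information" looks like on disk, the sketch below inspects the first two sectors: an MBR ends sector 0 with the 0x55AA boot signature, and a GPT disk has a header beginning with the ASCII text `EFI PART` at LBA 1. Because MBR partition entries use 32-bit sector counts, MBR tops out at 2 TiB with 512-byte sectors, so a 13 TB volume would normally be GPT. The sketch assumes 512-byte sectors, and `describe_partition_table` is a hypothetical helper, not part of R-Studio or Windows.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors, so the GPT header is at byte 512
MBR_MAX_BYTES = 2**32 * SECTOR_SIZE  # MBR stores 32-bit sector counts: 2 TiB limit


def describe_partition_table(head: bytes) -> str:
    """Classify the first two sectors of a disk or disk image (hypothetical helper)."""
    if len(head) < 2 * SECTOR_SIZE:
        raise ValueError("need at least the first two sectors")
    # A GPT header starts with the ASCII signature "EFI PART" at LBA 1
    if head[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART":
        return "GPT"
    # 0x55 0xAA at offset 510 marks an MBR (a volume boot sector ends the same way)
    if head[510:512] == b"\x55\xaa":
        return "MBR or boot sector"
    return "RAW: no partition table found"


if __name__ == "__main__":
    print(13 * 10**12 > MBR_MAX_BYTES)  # True: too big for MBR with 512-byte sectors
    print(describe_partition_table(bytes(2 * SECTOR_SIZE)))  # RAW: no partition table found
```

Against a disk image you would read the first 1024 bytes (for example with `f.read(1024)` on a file opened in `"rb"` mode) and pass them in.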
In my experience, when this happens it is more than likely we are also going to find a corrupt MFT, or Master File Table. The MFT contains the information Windows operating systems require to locate files on a hard drive or RAID.
We can consider the drive or RAID as a contiguous set of blocks (0 – 9999999999). A file occupies a number of these blocks, and which blocks it occupies is recorded in the MFT. Without this information the operating system has no idea where the files are located.
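In NTFS the block list is stored as a compact "run list" inside the file's MFT record: each run gives a length in clusters and an offset relative to the previous run's starting cluster. The sketch below decodes that encoding as I understand it from the NTFS on-disk format; it assumes the run-list bytes have already been pulled out of an intact MFT record, ignores compression, and `decode_data_runs` is a hypothetical name. It also shows why a damaged MFT is so serious: these few bytes are the only map from a multi-terabyte file to its blocks.

```python
from typing import List, Optional, Tuple

Extent = Tuple[Optional[int], int]  # (starting cluster, or None if sparse; cluster count)


def decode_data_runs(runlist: bytes) -> List[Extent]:
    """Decode an NTFS run list into absolute (start_cluster, length) extents."""
    extents: List[Extent] = []
    pos = 0
    current_lcn = 0  # offsets are relative to the previous run's start
    while pos < len(runlist) and runlist[pos] != 0x00:  # 0x00 ends the list
        header = runlist[pos]
        length_size = header & 0x0F  # low nibble: bytes in the length field
        offset_size = header >> 4    # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(runlist[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            extents.append((None, length))  # sparse run: no clusters on disk
        else:
            delta = int.from_bytes(runlist[pos:pos + offset_size], "little", signed=True)
            current_lcn += delta
            extents.append((current_lcn, length))
        pos += offset_size
    return extents


if __name__ == "__main__":
    # 24 clusters at 0x5634, then 8 clusters starting 256 clusters after the first run's start
    runs = bytes([0x21, 0x18, 0x34, 0x56, 0x21, 0x08, 0x00, 0x01, 0x00])
    print(decode_data_runs(runs))  # [(22068, 24), (22324, 8)]
```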
So it was more than likely we were not going to recover the virtual machines via the file system. R-Studio does its work by scanning the whole drive or RAID volume, from block 0 to the last block, and tries to rebuild the file system information.
While scanning any volume, R-Studio can also try to recover individual files, such as JPEGs or Word documents, by the signatures within those files, and it can also discover virtual machine disks such as those used by Hyper-V. These are called Hyper-V virtual disks and have a .vhdx file extension.
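Signature-based recovery works without the file system at all: the scanner reads the raw volume and looks for byte patterns that mark the start of known file types. A VHDX file begins with the 8-byte identifier `vhdxfile`, a JPEG with the bytes `FF D8 FF`, and a modern Word document is a ZIP container beginning with `PK\x03\x04`. The sketch below is a generic carving loop, not R-Studio's algorithm; it assumes files start on 512-byte sector boundaries, and the names are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, Tuple

SECTOR = 512  # assumption: carved files start on 512-byte sector boundaries

# Leading bytes ("magic numbers") of a few file types
SIGNATURES = {
    b"vhdxfile": "Hyper-V virtual disk (.vhdx)",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP container (e.g. .docx)",
}


def scan_for_signatures(stream: BinaryIO, chunk_sectors: int = 2048) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, file_type) for every sector that starts with a known signature."""
    offset = 0
    while True:
        chunk = stream.read(SECTOR * chunk_sectors)  # request a whole number of sectors
        if not chunk:
            break
        for start in range(0, len(chunk), SECTOR):
            for signature, kind in SIGNATURES.items():
                if chunk.startswith(signature, start):
                    yield offset + start, kind
        offset += len(chunk)


if __name__ == "__main__":
    image = bytearray(4 * SECTOR)                   # a tiny synthetic "volume"
    image[0:3] = b"\xff\xd8\xff"                    # JPEG header at sector 0
    image[2 * SECTOR:2 * SECTOR + 8] = b"vhdxfile"  # VHDX header at sector 2
    print(list(scan_for_signatures(io.BytesIO(bytes(image)))))
```

Finding the header is only half the job: a carver still has to work out where each file ends before it can copy it out.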
So I configured R-Studio to try to find these files while it scanned the 13 TB RAID, and started the scan, which took approximately 24 hours.
Upon arriving at the organisation the next day, it was clear that the MFT had been corrupted, as R-Studio reported that it had discovered nearly 10 different 5 terabyte volumes.
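One plausible reason a scan can report more volumes than a disk ever had (I am speculating about the mechanism here, not describing what R-Studio did internally) is that every sector that looks like an NTFS boot sector describes a candidate volume, including stale boot sectors and the backup boot sector NTFS keeps at the end of each volume. The sketch below parses the fields of such a sector that give a volume's size and the location of its MFT; the offsets follow the standard NTFS boot sector layout, the helper names are hypothetical, and the demo values are synthetic.

```python
import struct
from typing import NamedTuple, Optional


class NtfsBootInfo(NamedTuple):
    bytes_per_sector: int
    sectors_per_cluster: int
    total_sectors: int
    mft_cluster: int  # cluster number where the $MFT starts

    @property
    def size_bytes(self) -> int:
        """Approximate volume size implied by the boot sector."""
        return self.bytes_per_sector * self.total_sectors


def parse_ntfs_boot_sector(sector: bytes) -> Optional[NtfsBootInfo]:
    """Return volume geometry if the sector looks like an NTFS boot sector, else None."""
    if len(sector) < 512 or sector[3:11] != b"NTFS    " or sector[510:512] != b"\x55\xaa":
        return None
    # Little-endian fields of the NTFS BIOS parameter block
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    total_sectors, mft_cluster = struct.unpack_from("<QQ", sector, 0x28)
    # Very large cluster sizes use a special encoding in sectors_per_cluster; ignored here
    return NtfsBootInfo(bytes_per_sector, sectors_per_cluster, total_sectors, mft_cluster)


if __name__ == "__main__":
    fake = bytearray(512)  # synthetic boot sector for a 5 TB volume
    fake[3:11] = b"NTFS    "
    struct.pack_into("<HB", fake, 0x0B, 512, 8)
    struct.pack_into("<QQ", fake, 0x28, 9_765_625_000, 786_432)
    fake[510:512] = b"\x55\xaa"
    info = parse_ntfs_boot_sector(bytes(fake))
    print(info.size_bytes)  # 5000000000000
```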
It had, however, found three VHDX files of roughly 1 TB each, which corresponded to the files we were looking for, so I decided to recover those first. When you recover data from a failed drive, it's important to write the data to a different drive. Unfortunately, the legacy server only had USB 2 connections, so it took nearly 3 days to recover the images, which fortunately turned out to be in pristine condition and contained the required data.
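To put the two timings in perspective, here is a small worked calculation, assuming decimal terabytes and taking the approximate figures above at face value: scanning 13 TB in about 24 hours averages roughly 150 MB/s, while copying about 3 TB in nearly 3 days averages only about 11.6 MB/s, well under the 480 Mbit/s (60 MB/s) signalling rate of USB 2.0 High Speed. The constant and function names are illustrative.

```python
SECONDS_PER_HOUR = 3600
TB = 10**12  # decimal terabyte, as drive capacities are quoted
USB2_SIGNALLING_MB_S = 480 / 8  # 480 Mbit/s High Speed = 60 MB/s before protocol overhead


def average_rate_mb_s(total_bytes: float, hours: float) -> float:
    """Average throughput in decimal megabytes per second."""
    return total_bytes / (hours * SECONDS_PER_HOUR) / 10**6


if __name__ == "__main__":
    scan = average_rate_mb_s(13 * TB, 24)  # full scan of the 13 TB RAID in ~24 h
    copy = average_rate_mb_s(3 * TB, 72)   # ~3 x 1 TB VHDX files in ~3 days
    print(f"scan {scan:.1f} MB/s, copy {copy:.1f} MB/s, USB 2 ceiling {USB2_SIGNALLING_MB_S:.0f} MB/s")
    # scan 150.5 MB/s, copy 11.6 MB/s, USB 2 ceiling 60 MB/s
```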