Recently a customer’s hard drive on a dedicated server became so badly corrupted that we couldn’t access the information on it at all. Any attempt to fsck the drive, or even just mount it, produced vague errors (for example, trying to mount it would just say “not found”).
We basically determined that the partition table as well as the inode that held the root directory (/) were corrupted. We were able to run fsck on it initially, and were prompted to repair those entries as well as multiply linked inodes and other errors. After letting fsck finish, the drive was completely unreadable. No partition tables, nothing… We told the customer that the drive was corrupted and that they would have to restore their site from backups. Well guess what they said? “What backups?” Ugh… So…
Moving on… we installed a new drive and moved the old drive over as the slave so we could attempt to recover the data for them. We installed the OS, set up the control panel, put up a page saying “Site crashed… we are working on restoring it.” and moved forward.
Here is what we did, the results we had, and some notes about the process along the way.
1) Do not panic! The first thing you have to remember in dealing with a corrupted drive is don’t panic! The information is still there… and with enough effort can be recovered provided that the drive still functions at some level.
2) Make a backup image. This will help you out a great deal when things go wrong. I recommend using dd to just mirror the data to a new drive or some place with enough storage to hold the drive data.
3) Take your time. This one is hard to follow because normally the site is down and the customer is absolutely panic-stricken and calling you every 15 minutes asking about the status. Working with large sets of data takes time, so be patient and wait for the various processes to complete… They always take a while, and cutting corners here to save time will only lead to misery if you screw things up.
Okay… Now that we have the ground rules laid out, here is how we restored all their data save one database table, and even then we managed to save that and I will show you how we did that as well.
Step 1: Take the drive to another machine if you can, or a safe place to work on it. We had another machine with a large enough hard drive to hold the data on the drive and made a disk image of it.
dd if=/dev/sdb bs=1k conv=sync,noerror | gzip -c > /path/to/disk.img.gz
This will save you a potential future headache, but it takes a long time for a fair amount of data. Wait it out… it is worth it in the long run.
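To make the imaging step repeatable, here is a minimal sketch that wraps the same dd and gzip pipeline in a function and adds a quick integrity check of the compressed image. The function names image_drive and verify_image are made up for illustration, and the example paths are placeholders.

```shell
#!/bin/sh
# image_drive SRC DEST: copy SRC (for example /dev/sdb) into a gzip-compressed image.
# image_drive and verify_image are hypothetical helper names, not standard tools.
image_drive() {
    # noerror keeps dd going past read errors; sync pads short reads with zeros
    # so that offsets in the image stay aligned with the original drive.
    dd if="$1" bs=1k conv=sync,noerror | gzip -c > "$2"
}

# verify_image DEST: check that the image is a complete, readable gzip stream.
verify_image() {
    gzip -t "$1"
}

# Example, commented out so nothing runs by accident:
# image_drive /dev/sdb /path/to/disk.img.gz && verify_image /path/to/disk.img.gz
```

gzip -t only proves that the compressed file itself is intact. It says nothing about how many sectors dd had to pad with zeros, so keep an eye on the error output from dd as well.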
Step 2: Determine how badly the drive is corrupted. In our case an “fdisk -l /dev/sdb” didn’t show any partition tables, so we had to recover that first before we could start getting at the data.
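As an extra check alongside fdisk, you can look at whether the first sector of the drive (or of an uncompressed image of it) still ends with the 0x55 0xAA boot signature that an Intel/PC partition table carries at offset 510. This is only a rough sketch, and has_mbr_signature is a made-up helper name. A missing signature suggests the first sector was overwritten, while a present one does not prove that the partition entries themselves are intact.

```shell
#!/bin/sh
# has_mbr_signature DEV_OR_IMAGE: succeed if bytes 510-511 are 0x55 0xAA.
# Hypothetical helper for illustration; it only reads, it never writes.
has_mbr_signature() {
    # Read the two bytes at offset 510 and print them as hex without spaces.
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example, commented out:
# has_mbr_signature /dev/sdb && echo "boot signature present" || echo "boot signature missing"
```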
The application we used to recover the partition table is TestDisk. It is written by a gentleman named Christophe Grenier, and to say that it is awesome is a complete understatement. You will be using this software for all steps that follow, that is how useful it is.
Okay… so download TestDisk from the link above and extract it. Make sure to grab the version for the OS you are using. We used the Linux 2.6 kernel version since the drive is attached to a 2.6 kernel machine that we are using for recovery purposes.
Step 3: Recover the partition table. When you first run TestDisk it provides you with a list of drives available. Select the drive (in our case it was /dev/sdb), and the partition table type (we used Intel/PC since that is what we have). Select “Analyse” and the software will inspect the drive to look for partitions. The software will do a quick search first and scan the drive very quickly looking for partitions. In our case it found the boot partition almost immediately, however the other partition was not found on the first pass, so we needed to do a “Deeper Search”. The Deeper Search found the other partition.
Step 4: Once you have the partitions listed, you will want to write them to the drive if possible. This will help you in the next step. After we wrote the partition table we exited TestDisk and did step 5. If you just have a Linux based partition instead of an LVM partition, then you can just continue to step 6.
This is where things get a little tricky. In our case the partition was actually an LVM physical volume partition, meaning it holds additional information for LVM volume groups and logical volumes inside it. TestDisk won’t allow you to read files directly from the LVM partition, which makes sense since there aren’t technically any files inside that. What you need to do is get your OS to import and activate the volume groups and logical volumes so that TestDisk can see them and use them.
Step 5: Import and activate your LVM volume groups and logical volumes. First you need to find your LVM settings. So run:
lvm vgdisplay
This will show you the volume groups for the drive. Then you run:
vgchange -ay
This will activate the volume groups and logical volumes so that the OS can use them.
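Put together, the LVM step can be sketched as a short sequence of the standard LVM tools. This is an assumption-laden outline rather than exactly what we typed: it uses vgchange -ay, which activates every logical volume in every volume group it finds, and the run and activate_lvm helpers plus the DRY_RUN variable are made up here so you can preview the commands before running them as root.

```shell
#!/bin/sh
# run CMD...: execute CMD, or just print it when DRY_RUN is set.
# run, activate_lvm and DRY_RUN are hypothetical names for this sketch.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

activate_lvm() {
    # Look for LVM physical volumes on the attached drives.
    run pvscan || return 1
    # Scan those physical volumes for volume groups.
    run vgscan || return 1
    # Activate all logical volumes in all volume groups that were found.
    run vgchange -ay || return 1
    # List the logical volumes and their device paths.
    run lvscan
}

# Example, commented out: preview first, then run for real as root.
# DRY_RUN=1 activate_lvm
# activate_lvm
```

Once the logical volumes are active they show up as block devices, which is what lets TestDisk see them in the next step.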
Step 6: Start TestDisk again, and now you should see the logical volume or partition with your files on it. Select it and, if you are using LVM, select “None” as the partition type, since LVM doesn’t have partition information. Select Analyse and, hopefully, you will see the partitions.
Step 7: Move to the partition that contains your files, and press the letter p. This will give you a directory listing of the files as they are found in the partition/logical volume. From here it is just a matter of navigating to the correct location and finding the files you need to recover and pressing “c” to copy them off the drive onto your working drive.
The customer’s drive was a mess. Basically all the directories and files were linked under “lost+found” with names like #123456789. We had to hunt and peck through the drive structure to find the information that we needed. We eventually found the mysql directory and the web space directory and copied those to the working drive (/dev/sda) on our recovery machine.
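If you end up copying big chunks of “lost+found” to the working drive, a small search can save some of the hunting. The sketch below is not what we did at the time (we browsed inside TestDisk); it assumes the recovered tree has already been copied somewhere readable, and find_mysql_dirs is a made-up helper name. It lists every directory that contains MyISAM table files, which is a good hint that the directory used to be a MySQL database folder.

```shell
#!/bin/sh
# find_mysql_dirs ROOT: print each directory under ROOT that holds MyISAM
# table files (.frm, .MYD or .MYI). Hypothetical helper for illustration.
find_mysql_dirs() {
    find "$1" -type f \( -name '*.frm' -o -name '*.MYD' -o -name '*.MYI' \) \
        -exec dirname {} \; | sort -u
}

# Example, commented out; the path is a placeholder:
# find_mysql_dirs /path/to/recovered/lost+found
```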
I have to say that we were pretty pleased with the results. Out of all the files we recovered (and there were thousands of images and other files) we only had one file that was damaged. Unfortunately it was a MySQL FRM file, which contains the schema information for the associated table. Without this information MySQL can’t read the data from the MYD file.
Recovering from a damaged FRM file
The FRM file doesn’t often get modified, so the likelihood of it being damaged is small, but in our case the damaged drive did corrupt this file. Here is how we recovered the data.
If you have a copy of the original schema, and I mean an exact copy with correct field lengths/types, etc. then you are in luck and the restore process is very simple.
Step 1: Make backups of your files. Copy the MYD file out of the databases folder (/var/lib/mysql/databasename) to someplace safe. You are going to need this later.
Step 2: Delete the other files that make up the table files (remove the corrupted FRM file, MYI file and the MYD file you just copied). For example if the table is named foobar in database example, then you will have files named: foobar.MYD, foobar.MYI, and foobar.frm in a folder named example in your mysql directory.
Step 3: Recreate the table schema. Generally this involves logging into the MySQL command line interface and typing in the CREATE TABLE SQL statement to make the table. This will recreate the table in MySQL, but the table will be empty.
Step 4: Copy your data file back to your database folder. Take the backup you made of the MYD file in step 1 and copy it over the file that is now in the database folder. Using our example again, you would copy the file foobar.MYD over the file that exists in the example folder in the MySQL data folder.
Step 5: Restart MySQL. Not really necessary, but can’t hurt either.
Step 6: Repair the table (for example with REPAIR TABLE) so that the indexes are rebuilt to match the restored data and everything is working correctly.
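The file shuffling in steps 1, 2 and 4 can be sketched as a few small shell functions. Treat this as an outline under assumptions: DATADIR, DB, TABLE and SAFE are placeholder variables (the defaults reuse the foobar/example names from above, and /root/frm-rescue is a made-up location), and the function names are hypothetical. Steps 3 and 6 go through the MySQL client and are shown only as comments.

```shell
#!/bin/sh
# Placeholder settings for illustration; adjust them to your own server.
DATADIR=${DATADIR:-/var/lib/mysql}
DB=${DB:-example}
TABLE=${TABLE:-foobar}
SAFE=${SAFE:-/root/frm-rescue}

# Step 1: copy the data file somewhere safe, preserving mode and timestamps.
save_myd() {
    mkdir -p "$SAFE" && cp -p "$DATADIR/$DB/$TABLE.MYD" "$SAFE/"
}

# Step 2: remove the table's three files, but only once the safe copy exists.
remove_table_files() {
    [ -f "$SAFE/$TABLE.MYD" ] || { echo "no safe copy of $TABLE.MYD" >&2; return 1; }
    rm -f "$DATADIR/$DB/$TABLE.frm" "$DATADIR/$DB/$TABLE.MYI" "$DATADIR/$DB/$TABLE.MYD"
}

# Step 4: put the saved data file over the empty one MySQL created in step 3.
restore_myd() {
    cp -p "$SAFE/$TABLE.MYD" "$DATADIR/$DB/$TABLE.MYD"
}

# Step 3 (schema.sql is a placeholder file holding your CREATE TABLE statement):
#   mysql "$DB" < schema.sql
# Step 6, rebuilding the indexes:
#   mysql -e "REPAIR TABLE $TABLE" "$DB"
```

The guard in remove_table_files is there because step 2 is the only destructive step: if the copy from step 1 is missing, it refuses to delete anything.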
And that is it!
Good luck, and I hope you never need this information.