Hard disk failure on Matrix server

We are currently experiencing an outage on the Matrix server due to a failed HDD. The data is fully protected by our RAID-1 redundancy system. We are currently replacing the failed drive in the RAID array and cloning the primary HDD to a new disk. We expect to complete this task soon, after which the server should run stably again. We apologize for the inconvenience.

** UPDATE #1**
The drive clone process failed with errors, which most likely means that a
section of the failed drive is severely damaged. We are now attempting to copy
the data manually from the old drive to the new drive; in such cases we are
often able to recover the data this way. We are also double-checking the RAID
backup drive to verify the status of its data.

** UPDATE #2**
All of the HDD partitions have been copied successfully except for /home, which
is about 50% restored and still in progress. It will most likely take a couple
more hours to finish. The /home partition contains all of the actual user data (including files, databases, and emails).

** UPDATE #3**
About 5GB of data remains to be copied. So far, the data recovery has gone smoothly, with no loss of data. The recovery has taken this long because of the large amount of data we had to copy over (a few hundred GB). We expect to finish the restore process very soon and will get the server back online as soon as possible. Again, we sincerely apologize for the inconvenience and appreciate your patience.

** UPDATE #4**
All data files have been successfully restored. We are now restoring the MySQL databases. This is taking longer than expected because the faulty drive keeps shutting off intermittently, and we have to reboot the server each time to resume the restore process from the point where it stopped. We will post an update here as soon as we are done.

** UPDATE #5**
The server is now online. All data files and SQL databases have been successfully restored. We are now fixing the permissions of the web files so that the web server can serve the pages correctly. Until we finish fixing the file permissions, you may see a “404 error” while browsing your site. This is a temporary issue that we are already working to fix. We’ll post an update here soon.

** UPDATE #6**
The server is now running in a stable condition. We are now restoring individual sites from our daily backup copies (dated Aug-07-2009 1AM PST). 50% of the sites are already fully online, and the site-restore operation is still in progress. We’ll post an update here when all the sites have been fully restored from backup.

** UPDATE #7**
A few sites are still in the restore queue and are currently being restored. These sites are taking longer to restore due to their larger sizes (more than 10GB of data). If your site still shows a “404 error”, please open a support ticket and our techs will restore your site on a priority basis (instead of leaving it in the queue).

Copyright © 2018, GigaPros.com | All rights reserved.