Imagine, if you will, that you stumble upon a directory on one of the CentOS Linux servers that you administer named
/opt/config/etc. “That’s odd”, you say to yourself, “that must have been when I was experimenting with placing /etc/ under version control.” You do a quick listing of all of the files in
/opt/config/etc and notice that the files are basically identical to the ones in
/etc/. You say to yourself, “Hmmm… better get rid of these files – they’re just taking up space here.” And you type
sudo rm -rf /opt/config/etc. And … all hell breaks loose. What has just happened?
You are no longer able to
sudo. You receive odd messages about
user 501. It slowly dawns on you that …
/opt/config/etc was a link to
/etc. You cd into
/etc and your fears are realized. There is nothing there. Nada. Zilch. Well, here’s another nice mess you’ve gotten yourself into. The question is how to recover.
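In hindsight, a quick look at what the path really was would have saved the day. The following is a minimal sketch, not something we had in place at the time; the helper name describe_path is made up for illustration. It reports whether a path is a symbolic link and, if so, where it ultimately points, using only test -L and readlink -f.

```shell
#!/bin/sh
# describe_path is a hypothetical helper, not part of any standard tool.
# It prints what a path is so you can decide whether rm -rf is safe.
describe_path() {
    # A trailing slash makes the link test follow the symlink, so strip it.
    path=${1%/}
    [ -n "$path" ] || path=/
    if [ -L "$path" ]; then
        # readlink -f resolves the whole link chain to a canonical path
        echo "symlink -> $(readlink -f "$path")"
    elif [ -d "$path" ]; then
        echo "directory"
    elif [ -e "$path" ]; then
        echo "file"
    else
        echo "missing"
    fi
}

# Example: describe_path /opt/config/etc
```

Note that this only catches symbolic links. A bind mount would still be reported as a plain directory, so it is a first check rather than a guarantee.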
Here’s what I did. I rebooted the server using the fabulous Linux System Rescue CD, which boots the computer from CD and, initially, does not attempt to mount the server’s hard drive. We use CentOS, so by default the drives are configured as LVM volumes, which complicates the recovery very slightly. Using the tips from this blog post, because we can never remember our lvm commands, we do this:
# lvm vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
# lvm vgchange -ay
# lvm lvs
  LV       VG         Attr   LSize Origin Snap% Move Log Copy% Convert
  LogVol00 VolGroup00 -wi-ao 6.88G
  LogVol01 VolGroup00 -wi-ao 1.00G
This shows us that we do have two logical volumes on the CentOS disk, which makes sense: a root volume of about 7 GB and a 1 GB swap volume. The root volume is the one we’re after, so we can do this from the command line:
# mkdir /disk
# mount /dev/VolGroup00/LogVol00 /disk
# cd /disk/etc
# ls /disk/etc
At this point, we see nothing, no files, just as we had feared. Time to restore from our backup. Nothing magical here; if you don’t have a backup, you’ll be re-installing the OS.
In our case, we use the terrific rsnapshot script to periodically store backups of important directories (like
/etc) to another server. Because the data is remote, we need to bring up networking via System Rescue CD. You can just do
ifconfig eth0 xxx.xxx.xxx.xxx and then
route add default gw yyy.yyy.yyy.yyy to bring up the adapter and establish a route. You could also edit
/etc/resolv.conf so that you have access to a name server.
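The network steps can be collected into a small script. This is only a sketch under assumptions: the addresses come from the range reserved for documentation (192.0.2.0/24) and stand in for your real values, and the DRY_RUN switch and the function names are our own invention. With DRY_RUN=1 (the default here) it only prints what it would do, which is a cheap way to check the values before running anything as root.

```shell
#!/bin/sh
# Sketch only: all names and addresses here are placeholders.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bring_up_network IFACE ADDRESS GATEWAY NAMESERVER
bring_up_network() {
    iface=$1 addr=$2 gw=$3 dns=$4
    # Assign the address to the adapter
    run ifconfig "$iface" "$addr"
    # Establish the default route
    run route add default gw "$gw"
    # Point the resolver at a name server
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo nameserver $dns > /etc/resolv.conf"
    else
        echo "nameserver $dns" > /etc/resolv.conf
    fi
}

bring_up_network eth0 192.0.2.10 192.0.2.1 192.0.2.53
```

Set DRY_RUN=0 in the rescue environment to actually apply the settings.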
At this point you should be able to ssh to the host that holds the backups. In our case, because of firewall configuration, we cannot ssh into the backup server. Rather, we need to ssh into the system being repaired from the backup server. When System Rescue CD starts up, it actually starts an sshd server, and root is allowed to connect to it. However, we had to set root’s password before we were able to connect successfully: run
passwd at the command line and enter a reasonable password. You may also need to fiddle with the ssh settings on the backup server; after all, the rescue environment does not present the same host key as the server being rescued, so ssh may refuse to connect until the stale known_hosts entry is dealt with.
On the backup host, we did
tar cvzpf etcbackup.tgz etc to create a gzipped tar archive containing all of the files from the backed-up etc directory. The “z” option is what actually compresses the archive, matching the “z” used to unpack it later, and archiving the directory itself rather than etc/* also picks up any dotfiles at its top level. Note the “p” option: we’re trying to preserve file modes and ownership of the files to be restored, which matters most when the archive is unpacked. We then copied the archive over to the host to be rescued:
scp etcbackup.tgz email@example.com:/root/. This copied the archive to root’s home directory on the host to be rescued. Back on the machine to be rescued, we ran
tar xvzpf etcbackup.tgz in root’s home directory and examined /root/etc/* to see that the files were there. At this point, we copied the files back into
/disk/etc/ (the previously mounted hard disk for the damaged server), crossed our fingers and rebooted.
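The archive-and-restore steps can be rehearsed safely in a scratch directory before touching a real system. The sketch below is an illustration, not a transcript of what we typed: the function names pack_dir and restore_dir are made up. It archives a whole directory, unpacks it with -p to keep file modes, and copies the result into place with cp -a, which preserves modes and timestamps and, when run as root, ownership.

```shell
#!/bin/sh
# pack_dir PARENT NAME ARCHIVE
# Archive the directory PARENT/NAME as a gzipped tarball.
pack_dir() {
    tar -czpf "$3" -C "$1" "$2"
}

# restore_dir ARCHIVE STAGING NAME TARGET
# Unpack into STAGING for inspection, then copy the contents of
# STAGING/NAME (dotfiles included) into TARGET.
restore_dir() {
    mkdir -p "$2" "$4" &&
    tar -xzpf "$1" -C "$2" &&
    cp -a "$2/$3/." "$4/"
}

# On the backup host (path is a placeholder):
#   pack_dir /path/to/backup etc etcbackup.tgz
# On the machine being rescued, after copying the archive over:
#   restore_dir /root/etcbackup.tgz /root etc /disk/etc
```

Unpacking into a staging directory first keeps the inspection step from the narrative: you can look over the staged files before anything is copied onto the mounted disk.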
The machine came back up without any issues and we are back in business. The total time commitment was about 35 minutes from scary start to relieved finish.
You may still be nervous that there are things that are broken in /etc/ that will cause unforeseen problems down the road. Here’s one way to do some checking with regard to that:
cd /etc
rpm -qf * | grep -v "is not owned" | sort | uniq > /tmp/etcpkgs
for x in $(cat /tmp/etcpkgs); do rpm -V "$x"; done
Here we are leveraging RPM’s package verification tools. The commands are run from inside the /etc directory. The rpm -qf pipeline determines which packages own the files in /etc/ and strips out any information about files that are not owned by any RPM package. Obviously, the assumption here is that you primarily use pre-built RPM packages and do not install much software from source. The pipeline then sorts the list of packages and saves the unique package names to a file in /tmp called etcpkgs. The loop then runs rpm --verify (rpm -V for short) on each package in that list. That command will return information like the following:
.......T c /etc/audit/auditd.conf
S.5....T c /etc/yum.repos.d/CentOS-Base.repo
S.5....T c /etc/httpd/conf/httpd.conf
.......T c /etc/inittab
S.5....T c /etc/ssh/sshd_config
....L... c /etc/pam.d/system-auth
S.5....T c /etc/php.ini
S.5....T c /etc/postfix/main.cf
S.5....T c /etc/postfix/virtual
S.5....T c /etc/mail/sendmail.cf
S.5....T c /etc/mail/sendmail.mc
S.5....T c /etc/aliases
S.5....T c /etc/printcap
S.5....T c /etc/sudoers
The columns in the output correspond to the following issues:
S file Size differs
M Mode differs (includes permissions and file type)
5 MD5 sum differs
D Device major/minor number mismatch
L readLink(2) path mismatch
U User ownership differs
G Group ownership differs
T mTime differs
Based on the returned output, you can investigate further. Logical candidates for further investigation are any files with file Mode, Link, User or Group issues.
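To narrow a long report down to those candidates, the output can be run through a small filter. This is a sketch only, and the function name suspicious_verify_lines is hypothetical. It keeps lines whose flag field contains M, L, U or G and, as an addition of ours, lines where rpm reports a file as missing, since that also seems worth a look after a restore.

```shell
#!/bin/sh
# suspicious_verify_lines: hypothetical filter for `rpm -V` output on stdin.
# Keeps lines whose first field (the flag string) reports a Mode, Link,
# User or Group mismatch, plus files reported as "missing".
suspicious_verify_lines() {
    awk '$1 == "missing" || $1 ~ /[MLUG]/'
}

# Usage, from inside /etc once /tmp/etcpkgs has been built:
#   for x in $(cat /tmp/etcpkgs); do rpm -V "$x"; done | suspicious_verify_lines
```

Run against the sample output above, only the /etc/pam.d/system-auth line would pass the filter, because it is the only one with a flag other than S, 5 and T.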