Posts under SysAdmin

Kayako V3 to V4 Importing

Kayako has a new version of their ticket system available. We have a customer that uses this system extensively. The database format for the new version of Kayako isn’t compatible with the older database schema, so you have to run an import script that processes the old database information and imports it into the new database for V4.

The problems begin when you start to deal with large installations of Kayako. This one customer has over 4.2 million rows of information in their current Kayako installation (based on what the import script reports). Kayako’s import script is said to have memory-leak and memory-utilization issues with large imports, so they have included a command line option to help work around this:

./index.php /Base/Import/Version3/<limit>
The <limit> value is the number of data loops to be performed on each run. Overall, not a bad approach to working around the limitation. I personally would have gone a different route and spent some serious time on memory optimization and cleanup in the import script, but that is just me.

My problem with this process is how the import script actually works. Each time you run the import command the system prompts you for the database information of the previous installation.

====================
Version3 Import
====================
Database Host: localhost
Database Name: database
Database Port (enter for default port):
Database Socket (enter for default socket):
Database Username: username
Database Password: password

To my knowledge, there is currently no way to pass this information via the command line or a settings file. It must be entered manually every time you run the import script. You can imagine that for 4.2 million records, this gets rather tiresome after the second or third run.

So… I wrote a script that uses expect to get around it:

#!/usr/bin/expect

# Connection details for the old V3 database
set dbhost "localhost"
set dbuser "user"
set dbpass "password"
set dbname "database"

# Kick off the import, 50 data loops per run
spawn /path-to-the-new-kayako-install/console/index.php /Base/Import/Version3/50

expect "Database Host:"
send "$dbhost\r"

expect "Database Name:"
send "$dbname\r"

expect "Database Port (enter for default port):"
send "\r"

expect "Database Socket (enter for default socket):"
send "\r"

expect "Database Username:"
send "$dbuser\r"

expect "Database Password:"
send "$dbpass\r"

# Hand control back so the import can run through to completion
interact

This passes the required values to the spawned script as it prompts for them, and then lets the import run until it is complete. The interact line was the part I needed to find through some trial and error.
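Since the importer only runs a fixed number of data loops per invocation, you could go one step further and wrap the expect script in a small shell loop so it re-runs until the import is finished. This is just a sketch under assumptions: kayako-import.exp is whatever you named the expect script above, and “Import completed” is a placeholder for whatever the importer actually prints when it is done.

#!/bin/bash
# Re-run the expect wrapper until the importer reports it is finished.
while true; do
    ./kayako-import.exp | tee -a import.log
    # "Import completed" is a placeholder -- match the importer's real output
    if grep -q "Import completed" import.log; then
        break
    fi
done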

So there you go… use that to work around the dumb requirement of typing that information in every single time.

Anti Anti Debugging Tricks

Housekeeping: Well… It’s been a while since I have posted. I had been meaning to post some things, but Google turned off their FTP service in Blogger, making the service really useless to me, and so this blog languished until I finally got around to sorting things out and moving it to proper blog software. I hope to be providing more updates now, but I won’t fool myself too much.

On with the show….

I recently came across a binary application on a Linux server that had been coded to prevent people from snooping into what the application actually did. If you ran the application using strace or gdb the application would detect it and stop running. It would throw an error similar to:

“Debugging detected… goodbye!”

and the application would simply exit. Now I understand a programmer’s desire to protect their code; however, if you are going to be running an application on my server, I should at least know what it is actually doing. Apparently this is known as “anti-debugging” and is designed to prevent reverse engineering of an application. Not being one to turn down a challenge… I accepted. Below I will outline some very simple processes that can be used to circumvent some of the more basic checks.

There are apparently a few different methodologies involved in anti-debugging, and it seems that our application used a couple of checks. I will show you some of what the application was doing and the very simple workarounds I used to get around them.

When you attach a debugging tool such as strace or gdb, the parent process id of the application changes. The application was apparently checking for this, using a ptrace call against the parent process id (getppid). Here is how I worked around that.

We override the getppid system call with our own. Create a very simple C file that defines our replacement getppid():

#include <unistd.h>

/* Return the session id of the current process instead of the
   real (debugger) parent pid. */
pid_t getppid(void) { return getsid(getpid()); }

What this does is return the session id for the current process any time getppid() is called. Now we need to compile this into a shared library that we can load before we execute the application we want to debug.

gcc -shared -fPIC -o fakegetppid.so fakegetppid.c

This compiles the C file into a shared library that we can then preload. Here is how we tell something like strace to use it instead of the normal getppid call:

strace -fxi -E LD_PRELOAD=./fakegetppid.so /path/to/the/main/application

If there are no other anti-debugging tricks in place, the application will execute as normal.

What if you have an anti-debugging trick that isn’t so simple to thwart? What if the application makes multiple calls to a system call that you need to catch a specific instance of? Well there is a way to accomplish that as well.

#define _GNU_SOURCE 1

#include <stdio.h>
#include <dlfcn.h>

/* Pointer to the real ptrace(), resolved on first use. */
static long (*next_ptrace)(int request, int pid, void *addr, void *data) = NULL;

long ptrace(int request, int pid, void *addr, void *data)
{
    if (next_ptrace == NULL) {
        next_ptrace = dlsym(RTLD_NEXT, "ptrace");
    }

    if (request == 16) { /* PTRACE_ATTACH */
        fprintf(stderr, "PTRACE_ATTACH called with pid %i\n", pid);
    }

    /* Pass everything through to the real ptrace */
    return next_ptrace(request, pid, addr, data);
}

This allows you to specifically capture PTRACE_ATTACH calls and do whatever you want with them, while passing every other call through to the original ptrace system call. That is pretty darn powerful, and it is built and preloaded exactly the same way as the code above.
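For reference, building and preloading the wrapper might look like this (fakeptrace.c and fakeptrace.so are names I made up; note the extra -ldl, which is needed because the wrapper calls dlsym):

# Link against libdl for dlsym()
gcc -shared -fPIC -o fakeptrace.so fakeptrace.c -ldl

# Preload the wrapper while tracing the target
strace -fxi -E LD_PRELOAD=./fakeptrace.so /path/to/the/main/application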

I hope this information is useful to some people out there. I know it is a little complex, but it just goes to show that it is possible to debug an application, even if it doesn’t appear to be at first.

Good luck and happy hunting.

How To Recover Data From a Badly Corrupted Drive

Recently a customer’s hard drive on a dedicated server corrupted so badly that we couldn’t access the information on it at all. Any attempt to fsck the drive, or even just mount it, produced weird errors and vague messages (mounting it, for example, just reported “not found”).

We basically determined that the inode that held the partition tables, as well as the inode that held the root directory (/), were corrupted. We were able to run fsck on it initially and were prompted to repair the entries as well as multiply-linked inodes and other errors. After letting fsck finish, the drive was completely unreadable. No partition tables, nothing… We told the customer that the drive had corrupted and that they would have to restore their site from backups. Well, guess what they said? “What backups?” Ugh… So…

Moving on… we installed a new drive and attached the old drive as the slave to attempt to recover the data. We installed the OS, set up the control panel, put up a “Site crashed… we are working on restoring it.” page, and moved forward.

Here is what we did, the results we had, and some notes about the process along the way.

1) Do not panic! The first thing you have to remember in dealing with a corrupted drive is don’t panic! The information is still there… and with enough effort can be recovered provided that the drive still functions at some level.

2) Make a backup image. This will help you out a great deal when things go wrong. I recommend using dd to just mirror the data to a new drive or some place with enough storage to hold the drive data.

3) Take your time. This one is hard to follow because normally the site is down and the customer is absolutely panic-stricken and calling you every 15 minutes asking about the status. Working with large sets of data takes time, so be patient and wait for the various processes to complete… They always take a while, and cutting corners here to save time will only lead to misery if you screw things up.

Okay… Now that we have the ground rules laid out, here is how we restored all of their data save one database table, and we eventually managed to save that too, as I will show below.

Step 1: Take the drive to another machine if you can, or a safe place to work on it. We had another machine with a large enough hard drive to hold the data on the drive and made a disk image of it.

dd if=/dev/sdb bs=1k conv=sync,noerror | gzip -c > /path/to/disk.img.gz

This will save you a potential future headache, but it takes a long time for any fair amount of data. Wait it out… it is worth it in the long run.
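And should you ever need to roll back to the image, the same pipeline runs in reverse (double-check the output device name before running this, since it will overwrite the drive):

# Restore the saved image back onto the drive
gunzip -c /path/to/disk.img.gz | dd of=/dev/sdb bs=1k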

Step 2: Determine how badly the drive is corrupted. In our case an “fdisk -l /dev/sdb” didn’t show any partition tables, so we had to recover that first before we could start getting at the data.

The application we used to recover the partition table is TestDisk. It is written by a gentleman named Christophe Grenier, and to say that it is awesome is a complete understatement. You will be using this software for all the steps that follow; that is how useful it is.

Okay… so download TestDisk and extract it. Make sure to grab the version for the OS you are using. We used the Linux 2.6 kernel version, since the drive is attached to a 2.6 kernel machine that we are using for recovery purposes.

Step 3: Recover the partition table. When you first run TestDisk it presents a list of available drives. Select the drive (in our case /dev/sdb) and the partition table type (we used Intel/PC since that is what we have). Select “Analyse” and the software will inspect the drive looking for partitions. It does a quick scan first; in our case this found the boot partition almost immediately, but the other partition was not found on the first pass, so we needed to run a “Deeper Search”, which found it.

Step 4: Once you have the partitions listed, write them to the drive if possible. This will help you in the next step. After we wrote the partition table, we exited TestDisk and moved on to step 5. If you just have a Linux partition instead of an LVM partition, you can skip straight to step 6.

This is where things get a little tricky. In our case the partition was actually an LVM physical volume partition, meaning it holds additional information for LVM volume groups and logical volumes inside it. TestDisk won’t allow you to read files directly from the LVM partition, which makes sense since there aren’t technically any files inside that. What you need to do is get your OS to import and activate the volume groups and logical volumes so that TestDisk can see them and use them.

Step 5: Import and activate your LVM volume groups and logical volumes. First you need to find your LVM settings. So run:

lvm vgdisplay

This will show you the volume groups for the drive. Then you run:

vgchange -ay

This will activate the volume groups and logical volumes so that the OS can use them.
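To confirm the activation worked, you can list the logical volumes and check that their device nodes showed up:

# List all logical volumes LVM now knows about
lvm lvs

# The activated volumes should appear here as device nodes
ls /dev/mapper/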

Step 6: Start TestDisk again, and this time you should see the logical volume or partition with your files on it. Select it, and this time (if you are using LVM) choose “None” as the partition type, since a logical volume has no partition table of its own. Select “Analyse” and hopefully you will see the partitions.

Step 7: Move to the partition that contains your files and press “p”. This will give you a directory listing of the files as they are found in the partition/logical volume. From here it is just a matter of navigating to the correct location, finding the files you need to recover, and pressing “c” to copy them off onto your working drive.

The customer’s drive was a mess. Basically all the directories and files were linked under “lost+found” with names like #123456789. We had to hunt and peck through the drive structure to find the information that we needed. We eventually found the mysql directory and the web space directory and copied those to the working drive (/dev/sda) on our recovery machine.

I have to say that we were pretty pleased with the results. Out of all the files we recovered (and there were thousands of images and other files) we only had one file that was damaged. Unfortunately it was a MySQL FRM file, which contains the schema information for the associated table. Without this information MySQL can’t read the data from the MYD file.

Recovering from a damaged FRM file

The FRM file doesn’t often get modified, so the likelihood of it being damaged is small, but in our case the damaged drive did corrupt this file. Here is how we recovered the data.

If you have a copy of the original schema, and I mean an exact copy with correct field lengths/types, etc. then you are in luck and the restore process is very simple.

Step 1: Make backups of your files. Copy the MYD file out of the database folder (/var/lib/mysql/databasename) to someplace safe. You are going to need this later.

Step 2: Delete the other files that make up the table (remove the corrupted FRM file, the MYI file, and the MYD file you just copied). For example, if the table is named foobar in database example, then you will have files named foobar.MYD, foobar.MYI, and foobar.frm in a folder named example in your MySQL data directory.

Step 3: Recreate the table schema. Generally this involves logging into the MySQL command line interface and typing in the CREATE TABLE sql queries to make the table. This will recreate the table in MySQL, but the table will be empty.

Step 4: Copy your data file back to your database folder. Take the backup you made of the MYD file in step 1 and copy it over the file that is now in the database folder. Using our example again, you would copy the file foobar.MYD over the file that exists in the example folder in the MySQL data folder.

Step 5: Restart MySQL. Not really necessary, but can’t hurt either.

Step 6: Repair the table to make sure your indexes are rebuilt and everything is working correctly.
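Rolled together, the whole process looks roughly like this. The database name example, the table foobar, the backup location, and the init script path are all just illustrative; the CREATE TABLE body must be your exact original schema:

# Step 1: stash the data file somewhere safe
cp /var/lib/mysql/example/foobar.MYD /root/foobar.MYD.bak

# Step 2: remove the damaged table files
rm /var/lib/mysql/example/foobar.frm /var/lib/mysql/example/foobar.MYI /var/lib/mysql/example/foobar.MYD

# Step 3: recreate the empty table -- ( ... ) stands in for your
# exact original column definitions
mysql example -e "CREATE TABLE foobar ( ... );"

# Step 4: drop the saved data file back in place
cp /root/foobar.MYD.bak /var/lib/mysql/example/foobar.MYD

# Steps 5 and 6: restart MySQL and rebuild the indexes
/etc/init.d/mysql restart
mysql example -e "REPAIR TABLE foobar;"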

And that is it!

Good luck, and I hope you never need this information.

Turn Your PXE Enabled Network Card Into an iSCSI HBA

Update: gPXE has been forked into a new project called iPXE. iPXE is actively developed by the same team that worked on gPXE, and they have made many new code changes while gPXE has remained relatively static (look at the respective changelogs for confirmation). The iPXE project has the same features as before, with bug fixes and new features being added all the time. Please be sure to double check the commands referenced in this article, as they might have changed in name from gPXE to iPXE.

You can turn that PXE enabled network card into an iSCSI enabled HBA for free. Save yourself a couple of bucks on an iSCSI HBA, and boot your server/workstation diskless via iSCSI.

Here is how you turn your card into an HBA.

iPXE is a PXE-compatible bootloader that provides some great functionality, including AoE (ATA over Ethernet), HTTP (loading boot scripts and boot images over HTTP), and the one we are most interested in: iSCSI, which allows us to boot from an iSCSI target.

We start off with a working PXE environment: a DHCP server (to provide IP and PXE settings) and a TFTP server (to provide the PXE files we need to load). In order to get PXE to load the iPXE firmware we need to do what is called “chainloading”. This means that our network card does its standard PXE boot-up, and when it loads, we then load the iPXE loader and use iPXE for the rest of the boot process.

Here is how we do that:

In our dhcpd server we need to add some specific settings to let us detect whether or not the DHCP request is coming from plain PXE or from iPXE (the option names below still use the older gpxe prefix).

/etc/dhcpd.conf:

allow booting;
allow bootp;
default-lease-time 600;
max-lease-time 7200;
authoritative;
option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;
ddns-update-style ad-hoc;
subnet 192.168.2.0 netmask 255.255.255.0 {
    use-host-decl-names on;
    range 192.168.2.20 192.168.2.200;
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.2.255;
    default-lease-time 1800;
    max-lease-time 86400;
    option domain-name-servers 192.168.1.10;
    next-server 192.168.2.1;
    if not exists gpxe.bus-id {
        filename "undionly.kpxe";
    } else {
        filename "http://192.168.2.1/default/install.gpxe";
    }
}

The important lines are:

option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;

and

if not exists gpxe.bus-id {
    filename "undionly.kpxe";
} else {
    filename "http://192.168.2.1/default/install.gpxe";
}

This conditional statement lets us load either the iPXE chainloader (undionly.kpxe) when we are called from a standard PXE request (the “if not exists gpxe.bus-id” branch) or an iPXE-compatible script when we are called from iPXE itself.

We are currently using this setup to handle new server OS installations, hence the install.gpxe file.

The contents of that file are rather simple.

install.gpxe:

#!gpxe
kernel http://192.168.2.1/default/centos5 askmethod
initrd http://192.168.2.1/default/centos5.img
boot

This loads the CentOS 5 PXE installation image and initrd to handle OS installation on the server.

Once the server has its OS installed, we then add the server’s MAC address to the dhcpd configuration so that it will chainload iPXE and then load the server’s root disk via iSCSI.

Here is how we accomplish that:

/etc/dhcpd.conf (added in the subnet 192.168.2.0 section above):

host server01 {
    hardware ethernet 00:xx:xx:xx:xx:xx;
    fixed-address 192.168.2.21;
    if not exists gpxe.bus-id {
        filename "undionly.kpxe";
    } else {
        filename "";
        option root-path "iscsi:192.168.2.1::::iqn.2001-04.com.server:server01.vg00.lun0";
    }
}

This, again, chainloads iPXE from PXE, and on the next DHCP request, from iPXE, we provide the iSCSI target to load the root disk for the server. This brings up the normal GRUB screen and the system boots as normal.

And that is how you turn your PXE enabled network card, into an iSCSI HBA.

Gotchas:
I had originally wanted to use gPXE/iSCSI to host the root drive for a Xen-based Dom0; however, I discovered that the Xen hypervisor does not support this. From some searching on the internet, it seems the problem does lie with Xen’s hypervisor kernel and its inability to read the iBFT (iSCSI Boot Firmware Table). gPXE does support and populate the iBFT; the Xen hypervisor kernel, however, only recognizes the iBFT from certain iSCSI HBAs (listed in their HCL).

Installing CentOS 5 as a DomU with a Debian Dom0

There isn’t a whole lot of information about how to set up CentOS as a DomU under a Debian 4.0 based Dom0 while still maintaining the use of pygrub to boot the CentOS kernels. This howto gives you a general overview of the steps to take without having to use an incomplete CentOS image. It is not a copy-and-paste sort of howto, but rather a higher-level walkthrough, plus a couple of fixes to make it all work correctly.

A couple of assumptions I am making here:

  1. You have a working Xen install already under Debian.
  2. You can edit files using vi or a comparable editor.
  3. You understand how Xen and LVM can work together, at least at some basic level.
  4. You are comfortable compiling your own applications using make, etc.

Here is what you need to do to get started:

Step 1:

Download the kernel image and ram disk for CentOS and put them some place you can access them on the Dom0.

In my case, I put them in /usr/local/src/xen/ (vmlinuz and initrd.img respectively). I downloaded these files from a CentOS mirror. The files you are after are located in the centos/5.1/os/i386/images/xen/ directory, as these have the Xen code compiled into the kernel so that you can boot the DomU in paravirtualization mode.
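For example (mirror.centos.org is just one mirror; any CentOS mirror carrying 5.1 will work):

cd /usr/local/src/xen
wget http://mirror.centos.org/centos/5.1/os/i386/images/xen/vmlinuz
wget http://mirror.centos.org/centos/5.1/os/i386/images/xen/initrd.img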

Step 2:

Create a Xen DomU configuration file that points to these files for the boot kernel.

I edited the two lines:

kernel = "/usr/local/src/xen/vmlinuz"
ramdisk = "/usr/local/src/xen/initrd.img"

This tells Xen to boot the DomU with this kernel and ram disk.

Step 3:

Modify your DomU config to point to your disks:

disk = [ 'phy:/dev/xen01/centos5-disk,xvda,w', 'phy:/dev/xen01/centos5-swap,sda1,w' ]

It is important to note that you must export the drives from the Dom0 as xvda, otherwise the CentOS installer will not be able to detect them properly and you will have no target drive to install to.

We will also want to modify the default restart behavior as you will see later, this is important:

on_reboot = 'destroy'
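Putting those pieces together, the install-time portion of a DomU config might look something like this (the name, memory size, and vif line are example values of my own; the paths and volume names come from the steps above):

# /etc/xen/centos5.cfg -- install-time configuration (sketch)
name      = "centos5"
memory    = 512
kernel    = "/usr/local/src/xen/vmlinuz"
ramdisk   = "/usr/local/src/xen/initrd.img"
disk      = [ 'phy:/dev/xen01/centos5-disk,xvda,w', 'phy:/dev/xen01/centos5-swap,sda1,w' ]
vif       = [ '' ]
on_reboot = 'destroy'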

Step 4:

Go ahead and boot up the Xen DomU using xm create -c

Install CentOS as a normal network installation (point it at an FTP or HTTP mirror and let it install normally).

Step 5:

Once the CentOS installation is completed, the DomU will attempt to reboot itself. This is why we set on_reboot to destroy instead of the default of restart. We now need to edit the configuration to boot up via pygrub instead:

bootloader = "/usr/lib/xen-3.0.3-1/bin/pygrub"

Step 6:

Here is where things get a little tricky. The pygrub application is missing a library that it needs in order to boot up CentOS based kernels. We must build this ourselves.

Download the xen-3.0.3 source (the newer sources do not build this file, so I used this version specifically; I don’t know if others will work, but I know for a fact that xen-3.2.0 does not).

wget http://bits.xensource.com/oss-xen/release/3.0.3-0/src.tgz/xen-3.0.3_0-src.tgz

Untar the file and cd into the directory xen-3.0.3_0-src

Then:

cd tools/pygrub

Then you need to run make. Pay attention to the errors; you might need to install additional libraries if you don’t have them on your Dom0 (e2fslibs-dev comes to mind).

Step 7:

Once your build has successfully completed, you will need to copy the files to your local xen installation.

cd build/lib.linux-i686-2.4/grub/fsys/ext2
mkdir /usr/lib/xen-3.0.3-1/lib/python/grub/fsys/ext2
cp * /usr/lib/xen-3.0.3-1/lib/python/grub/fsys/ext2/

Step 8:

Boot your DomU using:

xm create -c

Finished:

You should now have a working Xen DomU under Dom0 without having to resort to broken CentOS images.

Find Out What Your DNS Server is Doing

What is my DNS server responding to?

We have been in the process of moving from an old server to a newer server. The process is straightforward: we move the sites over to the new server and then update their zone records to point at it (the zones have a low TTL, or Time To Live, to make the transition smoother). Overall everything has gone smoothly, with little interruption in the service of each site.

Finally, once everything was moved over, we updated the nameserver records to point at the new server, so everything should now be running off the new server’s DNS. We were ready to turn off the old server, but noticed that named (bind) was still handing out DNS responses (based on its activity in top). We thought we had updated everything so that this server shouldn’t be used at all.

So we had to find out what DNS requests were still hitting the old server and why we missed those. Here is what we did to find out.

Edit your named.conf (ours was in /etc).

Add the following section if you do not already have a section called logging {}.

logging {
    channel query_logging {
        syslog daemon;
        severity debug 9;
    };
    category queries {
        query_logging;
    };
};

What this does is record any DNS query named serves up in named’s default syslog destination (generally /var/log/messages). This lets you see which domains are being requested from your server.
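With that in place, you can watch the queries roll in live (assuming your syslog sends daemon messages to /var/log/messages):

tail -f /var/log/messages | grep named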

We determined which DNS queries were still coming in and, based on the whois information, found that some very old nameserver records were still pointing at the server’s IP. Without the logging change above, we could have lost 3 or 4 long-time customers’ DNS information when the old server was turned off. As it is, we have updated those nameserver records to point at the new nameservers, and we will need to keep the old server up and running for at least another 48 hours (the amount of time a root nameserver record is cached). Saved us a black eye for sure.

What else is my DNS server handing out?

Additionally, you might want to look at the log information and determine if anybody is using your server for recursive lookups too.

What is DNS recursion?

Well, recursion itself isn’t bad; it is actually a vital part of DNS. Recursion means that if you send a DNS lookup to a server that isn’t authoritative for that domain (it doesn’t have a zone for it), the server must query other servers on your behalf to resolve the name.

Why is it bad to allow recursion?

Until recently DNS recursion wasn’t really a bad thing, but hackers have determined that it is possible to “amplify” or magnify their DDoS (Distributed Denial of Service) attacks using spoofed UDP-based DNS requests (with UDP it is extremely easy to spoof the originating IP address of a request). The hackers send a spoofed UDP request for a domain with a large number of records to a DNS server that allows recursive lookups. Since the initial UDP request is relatively small and the response (because it has so many records in it) is very large, hackers can amplify the amount of data they can send at a target using recursive third-party DNS servers.
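A quick way to check whether a server will do recursion for outsiders is to ask it, from an outside host, for a name it is not authoritative for (1.2.3.4 standing in for your server’s IP):

dig @1.2.3.4 www.google.com A

If the response comes back with the “ra” (recursion available) flag set and an answer section, the server is performing recursive lookups for you.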

How do I turn off recursion in named/bind?

To turn off recursive lookups from unauthorized sources, add the following ACL to your named.conf:

acl recursion { 127.0.0.1; 1.2.3.0/24; };

And then in your options do:

options {
    allow-recursion { "recursion"; };
};

The first line creates an ACL (Access Control List) to let named (bind) know who is allowed to do recursive lookups against the server. The IPs should be listed in CIDR notation, each followed by a semicolon. Include any IP addresses that use this server for legitimate DNS lookups.

The second section should already exist in your named.conf; you just want to add the allow-recursion line to it. This applies the ACL to your server. Then you just need to restart named, and you are good to go.
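After reloading the configuration, re-run the same dig test from a host that is not in the ACL (again with 1.2.3.4 standing in for your server); this time the query should come back refused:

rndc reload
dig @1.2.3.4 www.google.com A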

So that is why you should know exactly what your DNS server is doing.