I'll start here a small series of posts about ganeti, xen and puppet. For my work I run a few servers sitting on xen, and it has always been a bit of a pain to create a new instance and keep it up to date. Up to now I've used the excellent xen-create-image tool to create my VMs, but I wanted to try something new and a bit more sexy... Last week I finally found some time (and a spare box to run my experiments on) to learn how to use ganeti. Ganeti is the only tool I've tried out so far, but it fits the bill for my use case, and it looks like a polished and mature project to me...
A while ago I received a new desktop machine (8 cores, 8GB of memory ...) at work. Since for the moment I'm kinda happy working on my laptop with an external screen, I decided to put the hardware to good use and explore a bit more some more exotic (at least for me) xen features.
In particular I spent half a day playing with the different xen network settings. The bridged model, which should work out of the box, is the easiest one.
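For reference, bridged networking is what xend sets up when the stock bridge scripts are enabled in /etc/xen/xend-config.sxp; this is the default configuration rather than anything exotic:

```
# /etc/xen/xend-config.sxp -- bridged networking
(network-script network-bridge)
(vif-script vif-bridge)
```

With these two lines, xend creates a bridge on the default interface and attaches each domU's virtual interface to it.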
After last week's upgrade I didn't have any major problems: xen 4 seems pretty stable and does its job well. One problem I encountered the other day was with dom0 ballooning. By default, xen sets dom0_min_mem to 196MB and leaves ballooning enabled. This is all well and good until you try to give too much memory to your VMs, squeezing dom0 down to its minimum amount of memory and causing all sorts of problems.
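One common way to avoid the squeeze is to pin dom0's memory instead of relying on ballooning. A sketch (the 512M figure and the hypervisor file name are just examples, pick what your dom0 actually needs):

```
# /boot/grub/menu.lst -- fix dom0's memory on the hypervisor line
kernel /boot/xen-4.0-amd64.gz dom0_mem=512M

# /etc/xen/xend-config.sxp -- keep xend from shrinking dom0
(dom0-min-mem 512)
(enable-dom0-ballooning no)
```

With dom0_mem set at boot and ballooning disabled, dom0 keeps a fixed amount of memory no matter how much you hand out to the VMs.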
It's time to upgrade my xen servers to squeeze. I've already put this off too long, and now my task is to go from etch to squeeze in one long step. In order to avoid problems I first did an upgrade etch -> lenny and then went on to squeeze. However, since so much has changed in the meantime, and so much tweaking of essential components (such as Xen !) is needed, I guess I could have gone directly from etch to squeeze in one go and fixed everything in the process...
I've modified a script to back up live xen images with dar. The script uses lvm to snapshot a running VM disk, mounts the snapshot read-only, and then uses dar to create an incremental backup. It is a derivative of a script I found on the net. There is still a small problem with journaled file systems: even though the fs is frozen before taking the snapshot, and even if I mount it read-only, the kernel module still tries to replay the journal to recover the fs. I'm worried that this might lead to data corruption...
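The core of the script looks roughly like this (the volume group, mountpoint and archive names are made up for the sketch; the real script differs). For ext3 snapshots, the noload mount option skips journal replay on a read-only mount, which may be a way around the journal problem described above:

```shell
#!/bin/sh
# Sketch: LVM snapshot + dar incremental backup of a running VM disk.
# vg0/vm1-disk, /mnt/snap and /backup are hypothetical names.
set -e

# Snapshot the running VM's disk
lvcreate --size 1G --snapshot --name vm1-snap /dev/vg0/vm1-disk

# 'noload' (ext3) avoids journal recovery on the read-only mount
mount -o ro,noload /dev/vg0/vm1-snap /mnt/snap

# Full backup if there is no reference archive yet, incremental otherwise
if [ -e /backup/vm1-ref.1.dar ]; then
    dar -c /backup/vm1-"$(date +%F)" -R /mnt/snap -A /backup/vm1-ref
else
    dar -c /backup/vm1-ref -R /mnt/snap
fi

umount /mnt/snap
lvremove -f /dev/vg0/vm1-snap
```

dar's -A option takes a reference archive, so the new archive only stores what changed since the reference (a differential backup).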
This is a small recipe to resize a disk image. In this case I wanted to make it bigger.
Create a sparse file:
dd if=/dev/zero of=xen.img bs=1k count=1 seek=30M
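You can verify that the file really is sparse: ls reports the apparent size (roughly 30G here), while du reports the blocks actually allocated, which is close to zero:

```shell
# Create the sparse image, then compare apparent vs. allocated size
dd if=/dev/zero of=xen.img bs=1k count=1 seek=30M 2>/dev/null
ls -lh xen.img   # apparent size: ~30G
du -h xen.img    # blocks actually allocated: a few KB
```

This is why the image takes essentially no disk space until the guest starts writing to it.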
Copy the old file system to the sparse file:
dd if=old-xen.img of=xen.img conv=notrunc
Now we resize the fs (reiserfs in this case)
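The resize command itself isn't spelled out above; with reiserfs, one way is to attach the image to a loop device and let resize_reiserfs grow the fs to fill it (the loop device is whatever losetup hands out):

```shell
# Attach the image to a free loop device and grow the fs to fill it
LOOPDEV=$(losetup -f --show xen.img)
resize_reiserfs "$LOOPDEV"   # grows to the device size by default
losetup -d "$LOOPDEV"
```

Without an explicit size argument, resize_reiserfs expands the file system to the full size of the underlying device.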
Then we can happily mount it:
mount -o loop xen.img test/
Now we have a bigger fs!
# df -h
Filesystem Size Used Avail Use% Mounted on
/home/xen.img 30G 338M 30G 2% /home/test
Well, today I tried to understand why our production server (shame, shame) has rebooted twice in a row in the last 3 days. The only visible problem in the logs is the infamous xen error: "xen_net: Memory squeeze in netback driver.". Googling around, it seems kinda common, and the recommended solution is to add dom0-min-mem to xend-config.sxp and dom0_mem as a kernel option. I've done that and updated the xen hypervisor to the latest backported version. The machine is up and running and everything seems fine at the moment. I didn't touch the kernel.
Yesterday we basically reinstalled the main host for the cduce and mancoosi projects. The problem was that the machine (a PowerEdge 2950) had been installed with a 32-bit system while its Xeon processors are 64-bit. To cut a long story short, we decided to re-install the system.
First we installed a generic 64-bit kernel. Debian ships this kernel in the i386 repository, so it was as simple as an apt-get install. After rebooting the machine, we added a new lvm partition for the new 64-bit installation and debootstrapped a new system into it.
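The lvm + debootstrap step looks roughly like this (the volume names, size, mountpoint and target release here are placeholders for the sketch, not the exact ones we used):

```shell
# Create and mount a new lvm volume, then bootstrap a 64-bit system in it
lvcreate --size 20G --name root64 vg0
mkfs.ext3 /dev/vg0/root64
mount /dev/vg0/root64 /mnt/newroot

# --arch amd64 pulls 64-bit packages even when run from a 32-bit userland
debootstrap --arch amd64 lenny /mnt/newroot http://ftp.debian.org/debian
```

This works because the running 64-bit kernel can execute the 64-bit binaries that debootstrap unpacks, even though the host userland is still 32-bit.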