Xen brouter setup

A while ago I received a new desktop machine (8 cores, 8 GB of memory ...) at work. Since for the moment I'm kinda happy to work on my laptop with an external screen, I decided to put the hw to good use and to explore some of the more exotic (at least for me) xen features.

In particular I spent half a day playing with different xen network settings. The bridge model is the easiest one. To set it up, you basically need to specify a couple of options in the xend-config file and you're done. This is the "default" network configuration and it should work out of the box in most situations. Using this method, since all VMs' interfaces are bridged together (surprise !) with the public interface, your network card is left in promiscuous mode (not a big problem if you ask me...). Once your VMs are up, you can then decide to use your default dhcp server, autoconf your VMs with ipv6 only, or do nothing, as you please.

Another popular method, albeit a bit more complex, is to set up a natted network using the script network-nat (this one is an evolution of the third method, 'network-route'). I played with it, but since I wanted to have all my DomUs on the same subnet, this setup wasn't satisfying for me: by default, 'network-nat' assigns a different subnet to each DomU. Using the natted setup you can also configure a local dhcp server to give private IPs to your VMs, all done transparently by the xen network scripts. I've noticed that there is a bug in the xen script that makes it not very squeeze friendly. Since the default dhcp server in squeeze is isc-dhcp and a few configuration files got shuffled in the process (notably /etc/dhcp3/dhcpd.conf is now /etc/dhcp/dhcpd.conf), the script needs a little fix to work properly. I'll report this bug sometime soon...

Googling around I found a different setup called brouter, which is a hybrid between a bridged configuration and a routed configuration. This is the original (??) article, well hidden in an old suse wiki.

 
I've made a few modifications here to add natting. So basically, all the virtual interfaces, each connected to one DomU, are linked together by a bridge (xenbr0). The bridge, with address 10.0.0.1, is also the router of the subnet. All DomUs are configured to use dhcp, which assigns them an ip and specifies the router of the subnet. The dhcp server is configured to answer requests only on the xenbr0 interface, avoiding problems on the public network...
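To double check the addressing, here is a minimal Python sketch (not part of the xen scripts) that derives the subnet and the netmask from the bridge address. It assumes the 10.0.0.1/24 address mentioned above; the function name bridge_params is made up for the example. The three values correspond to what the script further down calls bridgeip, brnet and netmask.

```python
import ipaddress


def bridge_params(bridgeip):
    """Return (router address, network, netmask) for an 'ip/mask' string."""
    iface = ipaddress.ip_interface(bridgeip)
    # iface.network is the subnet the bridge routes, e.g. 10.0.0.0/24
    return str(iface.ip), str(iface.network), str(iface.netmask)


if __name__ == "__main__":
    router, brnet, netmask = bridge_params("10.0.0.1/24")
    print("router :", router)
    print("brnet  :", brnet)
    print("netmask:", netmask)
```

The same call with the script defaults (10.6.7.1/24) gives the values to put in the dhcp server configuration for the virtual subnet.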

Routing is configured using iptables:

    iptables -t nat -A POSTROUTING -o ${netdev} -j MASQUERADE
    iptables -A FORWARD -i ${bridge} -j ACCEPT
    echo 1 >/proc/sys/net/ipv4/ip_forward
    /etc/init.d/isc-dhcp-server restart

Note that since the dhcp server is configured to give addresses only on the virtual network, we need to restart it after creating the bridge interface, otherwise isc-dhcp-server will refuse to run. Mum says that I should configure the bridge in /etc/network/interfaces to make the dhcp server happy at startup, but I felt a bit lazy, so I left the task to xen...

In the next episode, I'll add ipv6 connectivity to the virtual subnet and then start playing with puppet... ipv6 is almost done, puppet... I started with the doc...

The complete script from the suse wiki and with my modifications is below (only lightly tested):

#!/bin/sh
#============================================================================
# Default Xen network start/stop script.
# Xend calls a network script when it starts.
# The script name to use is defined in /etc/xen/xend-config.sxp
# in the network-script field.
#
# This script creates a bridge (default xenbr${vifnum}), gives it an IP address
# and the appropriate route. Then it starts the SuSEfirewall2 which should have
# the bridge device in the zone you want it.
#
# If all goes well, this should ensure that networking stays up.
# However, some configurations are upset by this, especially
# NFS roots. If the bridged setup does not meet your needs,
# configure a different script, for example using routing instead.
#
# Usage:
#
# vnet-brouter (start|stop|status) {VAR=VAL}*
#
# Vars:
#
# bridgeip   Holds the ip address the bridge should have in
#            the form ip/mask (10.0.0.1/24).
# brnet      Holds the network of the bridge (10.0.0.0/24).
#
# vifnum     Virtual device number to use (default 0). Numbers >=8
#            require the netback driver to have nloopbacks set to a
#            higher value than its default of 8.
# bridge     The bridge to use (default xenbr${vifnum}).
#
# start:
# Creates the bridge
# Gives it the IP address and netmask
# Adds the routes to the routing table.
#
# stop:
# Removes all routes from the bridge
# Removes any devices on the bridge from it.
# Deletes bridge
#
# status:
# Print addresses, interfaces, routes
#
#============================================================================

#set -x

dir=$(dirname "$0")
. "$dir/xen-script-common.sh"
. "$dir/xen-network-common.sh"

findCommand "$@"
evalVariables "$@"

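vifnum=${vifnum:-0}
bridgeip=${bridgeip:-10.6.7.1/24}
brnet=${brnet:-10.6.7.0/24}
netmask=${netmask:-255.255.255.0}
bridge=${bridge:-xenbr${vifnum}}
# netdev and antispoof are used below but were never given a default;
# the values here are assumptions, override them with VAR=VAL if needed.
netdev=${netdev:-eth0}
antispoof=${antispoof:-no}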

##
# link_exists interface
#
# Returns 0 if the interface named exists (whether up or down), 1 otherwise.
#
link_exists()
{
    if ip link show "$1" >/dev/null 2>/dev/null
    then
        return 0
    else
        return 1
    fi
}


# Usage: create_bridge bridge
create_bridge () {
    local bridge=$1

    # Don't create the bridge if it already exists.
    if [ ! -d "/sys/class/net/${bridge}/bridge" ]; then
        brctl addbr ${bridge}
        brctl stp ${bridge} off
        brctl setfd ${bridge} 0
    fi
    ip link set ${bridge} up
}

# Usage: add_to_bridge bridge dev
add_to_bridge () {
    local bridge=$1
    local dev=$2
    # Don't add $dev to $bridge if it's already on a bridge.
    if ! brctl show | grep -wq ${dev} ; then
        brctl addif ${bridge} ${dev}
    fi
}

# Usage: show_status dev bridge
# Print interface configuration and routes.
show_status () {
    local dev=$1
    local bridge=$2

    echo '============================================================'
    ip addr show ${dev}
    ip addr show ${bridge}
    echo ' '
    brctl show ${bridge}
    echo ' '
    ip route list
    echo ' '
    route -n
    echo '============================================================'
    echo ' '
    iptables -L
    echo ' '
    iptables -L -t nat
    echo '============================================================'

}

op_start () {
    if [ "${bridge}" = "null" ] ; then
        return
    fi

    create_bridge ${bridge}

    if link_exists "$bridge"; then
        ip address add dev $bridge $bridgeip
        ip link set ${bridge} up arp on
        ip route add to $brnet dev $bridge
    fi

    # quoted, so the test does not break when antispoof is unset
    if [ "${antispoof}" = 'yes' ] ; then
        antispoofing
    fi
    iptables -t nat -A POSTROUTING -o ${netdev} -j MASQUERADE
    iptables -A FORWARD -i ${bridge} -j ACCEPT
    echo 1 >/proc/sys/net/ipv4/ip_forward
    /etc/init.d/isc-dhcp-server restart
}

op_stop () {
    if [ "${bridge}" = "null" ]; then
        return
    fi
    if ! link_exists "$bridge"; then
        return
    fi

    ip route del to $brnet dev $bridge
    ip link set ${bridge} down arp off
    ip address del dev $bridge $bridgeip
    ##FIXME: disconnect the interfaces from the bridge 1st
    brctl delbr ${bridge}
    /etc/init.d/isc-dhcp-server restart
}

case "$command" in
    start)
        op_start
        ;;

    stop)
        op_stop
        ;;

    status)
        show_status ${netdev} ${bridge}
        ;;

    *)
        echo "Unknown command: $command" >&2
        echo 'Valid commands are: start, stop, status' >&2
        exit 1
esac


Misc live v. 4

The results of the 4th run of the misc competition are online!

The announcement sent on the mailing list:

 This time, it was quite close, and execution times had to be taken
 into account several times to break ties. However, looking at the 
 different total times for each solver it looks closer than it actually 
 was. The reason for this is the way the "total success time" is defined
 (see http://www.mancoosi.org/misc-live/20110225/rules/, section "Breaking
 Ties"). Since we had many cases without a solution, our definition
 resulted in the same large constant added to the time of each participant. 
 We'll have to think about changing this for the next time.
We still have to decide if we're going to have another edition of misc live before the official annual competition (TBA).

enjoy !


On equivalent debian versions

For one of our experiments, we ended up analyzing all versions mentioned in a Debian Packages file, whether in a constraint or in the version field of a package. It seems that a lot of DDs like to use strange 'formatting' when writing down a versioned constraint. This might not be a choice made directly by the debian maintainer, but just a consequence of a particular version schema from upstream.

The debian policy is rigorous about the algorithm used to compare versions:

 The strings are compared from left to right.

 First the initial part of each string consisting entirely of non-digit characters is determined. These two parts
 (one of which may be empty) are compared lexically. If a difference is found it is returned. The lexical
 comparison is a comparison of ASCII values modified so that all the letters sort earlier than all the non-letters
 and so that a tilde sorts before anything, even the end of a part. For example, the following parts are in sorted
 order from earliest to latest: ~~, ~~a, ~, the empty part, a.

 Then the initial part of the remainder of each string which consists entirely of digit characters is determined. 
 The numerical values of these two parts are compared, and any difference found is returned as the result of 
 the comparison. For these purposes an empty string (which can only occur at the end of one or both version 
 strings being compared) counts as zero.

 These two steps (comparing and removing initial non-digit strings and initial digit strings) are repeated until a
 difference is found or both strings are exhausted.
I think the important part is the one about the numerical comparison: "The numerical values of these two parts are compared, and any difference found is returned as the result of the comparison". Leading zeros do not change a numerical value, so 1, 01 and 00001 all compare equal.

Our tool finds a number of examples of equivalent versions :

  • version 0.1-1 is equivalent to (0.00001-1,0.001-1,0.01-1,0.1-1)
  • version 2.1.0-1 is equivalent to (2.01.00-1,2.1.00-1,2.1.0-1)
  • ... etc . I've one page full of equivalent versions.
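To make the equivalence concrete, here is a minimal Python sketch of the comparison algorithm quoted above. This is my reading of the policy text, not the dpkg implementation, and it is not the tool we used for the experiment; the names compare and canonical are made up for the example. The canonical function only strips leading zeros from the numeric parts, which is the "canonical representation" idea discussed in this post and nothing more (it does not, for instance, normalise a missing revision or a '0:' epoch).

```python
import re
import string


def _order(c):
    # '~' sorts before everything, even the end of a part ('');
    # letters sort before all other characters.
    if c == "~":
        return -1
    if c == "":
        return 0
    if c in string.ascii_letters:
        return ord(c)
    return ord(c) + 256


def _cmp_part(a, b):
    """Compare two upstream versions (or two revisions)."""
    while a or b:
        # step 1: leading non-digit parts, compared character by character
        na = re.match(r"[^0-9]*", a).group()
        nb = re.match(r"[^0-9]*", b).group()
        a, b = a[len(na):], b[len(nb):]
        for i in range(max(len(na), len(nb))):
            oa = _order(na[i] if i < len(na) else "")
            ob = _order(nb[i] if i < len(nb) else "")
            if oa != ob:
                return -1 if oa < ob else 1
        # step 2: leading digit parts, compared as numbers (empty counts as 0)
        da = re.match(r"[0-9]*", a).group()
        db = re.match(r"[0-9]*", b).group()
        a, b = a[len(da):], b[len(db):]
        ia, ib = int(da or 0), int(db or 0)
        if ia != ib:
            return -1 if ia < ib else 1
    return 0


def _split(version):
    """Split [epoch:]upstream[-revision]; the epoch is assumed numeric."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare(v1, v2):
    """Return -1, 0 or 1, like a classic cmp function."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _cmp_part(u1, u2) or _cmp_part(r1, r2)


def canonical(version):
    """Strip leading zeros from every run of digits."""
    return re.sub(r"[0-9]+", lambda m: str(int(m.group())), version)


if __name__ == "__main__":
    for v in ("0.00001-1", "0.001-1", "0.01-1", "2.01.00-1", "2.1.00-1"):
        print(v, "->", canonical(v), "equal:", compare(v, canonical(v)) == 0)
```

Grouping all the versions of a Packages file by canonical(v) is enough to list the equivalence classes shown above.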

For example, version '0.00001-1' is from package libbenchmark-progressbar-perl. I don't know why so many equivalent ways of writing a version string are used. They probably appear in completely unrelated packages, and I'm pretty sure no harm is intended :) However, to avoid confusion, it might be a good idea to settle on a non-normative canonical representation of versions. Like no leading '0' in the numbers between dots and hyphens... Maybe we could add a warning in lintian ?

If somebody is interested I can generate a full report that associates each version to a package name.


usb-creator

How to create a debian installer on a usb pen drive ? There are many ways, from a simple zcat boot.img.gz > /dev/sdb (run as root: with sudo zcat the redirection would still be done by your unprivileged shell) to unetbootin.

Last night I discovered a very nice tool from ubuntu (usb-creator) that apparently is still not in the debian archives. There is a bit of doco here if you don't know it already.

Installing it on debian is a breeze (I used the debs from lucid) and it works pretty well. The reason I've tried this one is that unetbootin sometimes failed for me producing an un-usable usb key. The old school method of course works, but sometimes, late at night, I feel the need of a friendly gtk interface to help my sleep.

Debs :


SNCF on its way to drupal !

I've just learned from this article (in french) that the SNCF, the french railway company, is migrating all its infrastructure to an open source based solution. Apparently the organization has already migrated a large part of its servers to IBM machines running linux, with apache tomcat for the applications and drupal as the content management system.

Pretty soon voyages-sncf will also migrate to drupal. Hopefully it will work better than the current website, which from time to time really gives me a hard time when booking tickets.

hurray for sncf (at least regarding this move :) )


More on Xen 4.0 setup on squeeze

After the upgrade of last week, I didn't have any major problems : xen 4 seems pretty stable and does its job well. One problem I encountered the other day was about dom0 ballooning. By default, xen sets dom0_min_mem to 196 MB and enables ballooning. This is all well and good until you try to use too much memory for your VMs, squeezing dom0 down to its minimum amount of memory and causing all sorts of problems. On the xen wiki, they recommend as best practice to reserve a minimum of 512 MB for dom0 operations. This is done by setting dom0_mem=512M on the grub command line, then setting enable-dom0-ballooning to no and adjusting dom0-min-mem according to the amount of memory you chose.

On debian, you can set the grub command line once and for all just by adding this variable to /etc/default/grub

GRUB_CMDLINE_XEN="dom0_mem=512M"

Another small problem is related to the reboot sequence. Since I'm using lvm on aoe, the default shutdown sequence (network down first, lvm later) is not going to work for me. As I have a few lvm volumes on aoe and others on the physical disk, the proper solution to this problem is to write a custom shutdown script for the aoe lvm volumes and make it run before deconfiguring the network interfaces. In the meantime, to avoid the kernel hanging there forever, I've added these lines to /etc/sysctl.d/panic.conf

# Reboot 5 seconds after panic
kernel.panic = 5

# Panic if a hung task was found
kernel.hung_task_panic = 1

# Set the timeout for hung tasks to 120 seconds
kernel.hung_task_timeout_secs = 120

This instructs the kernel to panic if a task does not respond for more than 120 seconds, and then to reboot 5 seconds after the panic.


easy cudf parsing in python

With the fourth run of Misc live, you might wonder how you can quickly write a parser for a cudf document. If you are writing your solver in C / C++ , I advise you to either grab the legacy ocaml parser and use the C bindings, or reuse a parser written by other competitors (all frontends have a FOSS-compatible licence).

If you want to write a quick and dirty frontend in python, maybe the following few lines of python might help you:

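from itertools import groupby

# NOTE: the cudf property is spelled 'conflicts'; 'conflict' as written
# here never matches, so conflicts are kept as plain strings.
cnf_fields = ['conflict','depends','provides','recommends']

def cnf(k, s):
    # split a formula into a list of disjunctions (a list of lists)
    if k in cnf_fields:
        return [clause.split('|') for clause in s.split(',')]
    else:
        return s

records = []
with open("universe.cudf") as f:
    # groupby alternates between runs of blank lines and stanzas
    for empty, record in groupby(f, key=str.isspace):
        if not empty:
            # lists instead of map(): python 3 iterators cannot be indexed
            l = [s.split(': ', 1) for s in record]
            # we ignore the preamble here ...
            if 'preamble' not in l[0]:
                pairs = ([k, cnf(k, v.strip())] for k, v in l)
                records.append(dict(pairs))

for i in records:
    print(i)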

We use the function groupby from itertools to create a list of stanzas, and then we just transform each of them into a dictionary that should be pretty easy to manipulate. We ignore the preamble, but adding support for it should be straightforward... I got the idea from this forum post.

The result :

#python cudf.py
{'recommends': [['perl-modules '], [' libio-socket-inet6-perl']], 'package': '2ping', 'replaces': '', 'number': '1.0-1', 'sourceversion': '1.0-1', 'source': '2ping', 'depends': [['perl']], 'version': '4806', 'architecture': 'all', 'conflicts': '2ping'}
{'replaces': '', 'package': 'abook', 'number': '0.5.6-7+b1', 'sourceversion': '0.5.6-7', 'source': 'abook', 'depends': [['libc6 >= 9022 '], [' libncursesw5 >= 12348 '], [' libreadline5 >= 12239 '], [' debconf >= 1510 ', ' debconf-2.0--virtual ', ' debconf-2.0']], 'version': '1712', 'architecture': 'amd64', 'conflicts': 'abook', 'recommends': [['true!']]}
...

update

Maybe a small example of the input file would help :)

package: m4
version: 3
depends: libc6 >= 8

package: openssl
version: 11
depends: libc6 >= 18, libssl0.9.8 >= 8, zlib1g >= 1
conflicts: ssleay < 1


xen 4 on debian squeeze

It's time to upgrade my xen servers to squeeze. I've already put this off for too long, and now I have the task of going from etch to squeeze in one long step. In order to avoid problems I first did an upgrade etch -> lenny and then went on to squeeze. However, since so much has changed in the meantime, and so much tweaking of essential components is needed (such as Xen !), I guess I could have gone directly from etch to squeeze in one go and fixed everything in the process... Anyway, too late for this kind of consideration.

 
The xen debian wiki is full of invaluable information. Kudos to the xen team for their hard work. To get you started on squeeze you need to install the xen hypervisor. Everything is provided by one package:
aptitude install xen-linux-system-2.6-xen-amd64 xen-hypervisor-4.0-amd64

This will pull the latest linux kernel and xen-hypervisor to run on dom0 .

By default the hypervisor is probably not going to be the default kernel. If you want to change this, you should edit the grub default values :

vi /etc/default/grub

to make sure that the default entry on dom0 is the xen hypervisor. This is tricky, because grub lets you define the default only as a position in the list of available kernels. So if you install a new kernel, you have to change the default according to the list of kernels in /boot/grub/grub.cfg. It would be nice if I could define the default kernel with a label instead of a number... ( ref #505517 )

Alternatively, as suggested in the wiki, you can just move the plain linux entries out of the way, so that the xen ones come first ...

mv -i /etc/grub.d/10_linux /etc/grub.d/50_linux

When installing xen related tools, aptitude will probably also install rinse and xenwatch by default. The first one is used to bootstrap redhat machines, and maybe you don't need it. The second one is a GUI and will pull in a lot of X-related dependencies. If you have similar needs, you can just remove what is not needed...

aptitude purge rinse rpm rpm-common
aptitude purge xenwatch

Something that is new is the naming schema for virtual devices. Now all vms will see /dev/xvda1 instead of /dev/sda1 as before. This needs to be changed in the domU as well as in the xen config files (/etc/xen/vm.cfg).

One piece of fantastic news is that xen 4 now uses pyGrub. It is not mandatory (so if you want, you can stick with the old configuration file). But if you use pygrub, you can install whatever kernel you want on the domU. Finally, your users will have complete freedom to pick and choose their kernels !

There was a small detail I didn't notice on the debian wiki: if you try to use grub2 in squeeze, it will fail when probing the device ( #601974 ). The workaround described in the wiki is to use xvd{a,b,c,...} as device names (and not xvd{1,2,3,...}) to make grub happy. Once you have changed the naming schema, grub will be able to see the disks and install the bootloader. Another solution is to install os-prober from unstable / experimental. It seems a patch is in the works.

On newly created images, you can also pass the --scsi parameter to xen-create-image to avoid this problem altogether... I'm not sure if this will have other implications...

 
The console name has also changed, from tty to hvc0 . To get the console back you should add this line to the inittab of all your VMs.
vc:2345:respawn:/sbin/getty 38400 hvc0

A last note is about the merge upstream of the xen patch !! \o/ yeiiii !!


Package Managers Comparison

The Mancoosi Team has recently published the details of a study we conducted analyzing the different package managers available in debian. The goal of this study was to compare MPM (the mancoosi package manager) to other legacy solvers and to try to get the big picture of the state of the art. A similar study was conducted during EDOS and the results are still available here.

As I wrote a few months ago, MPM is a proof of concept that we wrote to test, in a real world scenario, the behavior of a number of solvers that have been developed for the MISC competition.

These results do not show anything new w.r.t. the experience of a lot of people dueling daily with their machines in order to install a new piece of software. In a nutshell, we have shown that apt-get, aptitude, smart and cupt perform pretty well when used with only one baseline (for example a stable release) : this matches the experience of the majority of users of debian based systems. However, problems start to arise when a user starts mixing more than one baseline, putting a lot of stress on the solver in order to find a satisfying solution. This solution of course exists, but it is cleverly hidden in the dependency structure of more than 40K packages...

MPM is not as fast as other package solvers (say 10 seconds for mpm, while apt-get is able to find a solution in 3 seconds), but it is remarkably stable. It is always able to find a satisfying solution, even in the harder cases where all the others failed. In these experiments MPM uses the Potsdam solver aspcud. This solver uses only GPL components and it would be a good candidate for inclusion in debian (there are actually a couple of ITPs already filed for clasp and gringo).

The results (with a lot of details) are published on the mancoosi website (a more detailed report is in the works). Enjoy !

During fosdem the Mancoosi team that authored this work (Roberto, Ralf, Zack and Me) will be around, so, please stop us for a chat ! And don't miss Ralf's talk !!


The results of the Misc live competition 3rd are online !

In December we published the results of the third run of the misc live competition on the mancoosi website. I left this post in my draft folder for quite a while now. I'll publish it for posterity...

The main difference from the last misc competition is the introduction of a new track, the user track, where we want to answer the user request and look for an optimal solution according to an optimization criterion provided by the user.

The initial assessment of the results is quite positive. We received 6 submissions for the trendy and paranoid tracks and 4 for the user track. These are very interesting results. We don't have a clear winner on all tracks as in the misc 2010 competition. The cudf2msu4trendy-1.0 from INESC wins the trendy and user1 tracks. The aspcud-paranoid-1.3 from the university of Potsdam is the best on the paranoid track, while cudf2pbo4user-1.0 is the winner of the track user2 and user2.

The paranoid and trendy tracks are the same as in misc 2010. We actually used the same problems as in misc2010 plus 4 new categories (sarge...sid) that are a collection of problems featuring the same installation request but with an increasingly large number of packages and versions per package.

A few words on the new user track. Since in this track the optimization function was given to the solver as an additional input, we decided to try out different criteria. The first one, in user1, is what I called the "Paranoid upgrade" criterion, and all problems used in this track are (real) upgrade problems. In cudf the upgrade semantics effectively allow a package not to be upgraded at all (since in the solution its version must be greater than or equal to the version currently installed). This definition does not go very well together with the paranoid criterion, as the best solution for an upgrade would always be one that does not change anything. For this reason we defined the new criterion as '-notuptodate,-removed,-changed', where solutions that favour new (upgraded) packages are preferred to solutions that do not change anything at all.
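To illustrate why the order of the criteria matters, here is a small Python sketch that ranks two candidate solutions lexicographically. It assumes that a leading '-' means "minimise this measure" and that earlier measures take priority over later ones; the two solutions and their counts are invented for the example, and the function names are mine (none of this is code from the competition).

```python
def parse_criteria(spec):
    """Turn '-notuptodate,-removed,-changed' into [(sign, name), ...]."""
    criteria = []
    for item in spec.split(","):
        item = item.strip()
        sign = -1 if item.startswith("-") else 1
        criteria.append((sign, item.lstrip("+-")))
    return criteria


def rank_key(measures, criteria):
    # '-x' minimises x and '+x' maximises it; maximised values are negated
    # so that the smallest tuple is always the best one.
    return tuple(measures[name] if sign < 0 else -measures[name]
                 for sign, name in criteria)


def best(solutions, spec):
    """Return the name of the lexicographically best solution."""
    criteria = parse_criteria(spec)
    return min(solutions, key=lambda name: rank_key(solutions[name], criteria))


# hypothetical measures for two solutions of the same upgrade request
EXAMPLE = {
    "do-nothing": {"notuptodate": 5, "removed": 0, "changed": 0},
    "upgrade-all": {"notuptodate": 0, "removed": 1, "changed": 6},
}

if __name__ == "__main__":
    print("paranoid:", best(EXAMPLE, "-removed,-changed"))
    print("user1   :", best(EXAMPLE, "-notuptodate,-removed,-changed"))
```

With the plain paranoid criterion the solution that changes nothing wins, while putting -notuptodate first makes the upgraded solution the preferred one.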
