## OSS2017 Adoption of academic tools in open source communities: The Debian case Study

This is joint work with Roberto Di Cosmo, presented at OSS2017 in Buenos Aires, Argentina.

# Abstract

Component repositories play a key role in the open software ecosystem. Managing the evolution of these repositories is a challenging task, and maintainers are confronted with a number of complex issues that need automatic tools to be addressed properly.

In this paper, we present an overview of 10 years of research in this field and the process leading to the adoption of our tools in a FOSS community. We focus on the Debian distribution and in particular we look at the issues arising during the distribution lifecycle: ensuring buildability of source packages, detecting packages that cannot be installed and bootstrapping the distribution on a new architecture. We present three tools, *distcheck*, *buildcheck* and *botch*, that we believe are of general interest for other open source component repositories.

The lessons we have learned during this journey may provide useful guidance for researchers willing to see their tools broadly adopted by the community.

Paper: oss2017.pdf

Slides: oss2017-slides.pdf

## matplotlib and multiple y-axis scales

This week I had to create a plot using two different scales in the same graph to show the evolution of two related, but not directly comparable, variables. This operation is described in this FAQ on the matplotlib website. Nonetheless, I’d like to give a small step-by-step example…

Consider my input data, where each line has the form `date release total broken outdated`:

    20110110T034549Z unstable 29989 133 3
    20110210T034103Z wheezy 28900 8 0
    20110210T034103Z unstable 30125 209 11
    20110310T060132Z wheezy 29179 8 0
    20110310T060132Z unstable 30230 945 28
    20110410T040442Z wheezy 29487 8 0
    20110410T040442Z unstable 31142 991 12
    20110510T034745Z wheezy 30247 8 0
    20110510T034745Z unstable 31867 610 31
    20110610T041209Z wheezy 30328 9 0
    20110610T041209Z unstable 32395 328 15
    20110710T030855Z wheezy 31403 9 0
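
The plotting function further down indexes its input as `dists[release]['date']`, `dists[release]['total']` and so on. The loading step is not shown in this post, so here is a minimal sketch of how lines in this format could be turned into that structure. The names `parse` and `SAMPLE` are mine, not part of the original script, and only the first three lines of the data are used.

```python
from collections import defaultdict
from datetime import datetime

# First three lines of the input shown above.
SAMPLE = """\
20110110T034549Z unstable 29989 133 3
20110210T034103Z wheezy 28900 8 0
20110210T034103Z unstable 30125 209 11
"""


def parse(lines):
    """Group the columns by release: dists[release][column] -> list of values."""
    dists = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        date, release, total, broken, outdated = fields
        series = dists[release]
        # timestamps look like 20110110T034549Z
        series["date"].append(datetime.strptime(date, "%Y%m%dT%H%M%SZ"))
        series["total"].append(int(total))
        series["broken"].append(int(broken))
        series["outdated"].append(int(outdated))
    return dists


dists = parse(SAMPLE.splitlines())
```

Parsing the dates into `datetime` objects lets matplotlib treat the X axis as a time axis rather than as plain strings.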


I want to create one graph containing three sub graphs, each one containing data for unstable and wheezy. For the sub graph plotting the total number of packages, the data is fairly uniform, so the plot is pretty and self-explanatory. The problem arises when we compare the non-installable packages in unstable and wheezy: the much larger values from unstable squash the plot for wheezy, making it useless.

Below I’ve added the commented Python code and the resulting graph. You can get the full source of this example here.

    import matplotlib.pyplot as plt

    # plot two distributions with different scales
    def plotmultiscale(dists, dist1, dist2, output):

        fig = plt.figure()
        # add the main title for the figure
        fig.suptitle("Evolution during wheezy release cycle")
        # set date formatting. This is important to have dates pretty printed
        fig.autofmt_xdate()

        # we create the first sub graph, plot the two data sets and set the legend
        ax1 = fig.add_subplot(311, title='Total Packages vs Time')
        ax1.plot(dists[dist1]['date'], dists[dist1]['total'], 'o-', label=dist1.capitalize())
        ax1.plot(dists[dist2]['date'], dists[dist2]['total'], 's-', label=dist2.capitalize())
        ax1.legend(loc='upper left')

        # we need to explicitly remove the labels for the x axis
        ax1.xaxis.set_visible(False)

        # we add the second sub graph and plot the first data set
        ax2 = fig.add_subplot(312, title='Non-Installable Packages vs Time')
        ax2.plot(dists[dist1]['date'], dists[dist1]['broken'], 'o-', label=dist1.capitalize())
        ax2.xaxis.set_visible(False)

        # now the fun part. The function twinx() gives us access to a second plot that
        # overlays the graph ax2 and shares the same X axis, but not the Y axis
        ax22 = ax2.twinx()
        # we plot the second data set
        ax22.plot(dists[dist2]['date'], dists[dist2]['broken'], 'gs-', label=dist2.capitalize())
        # and we set a nice limit for our data to make it prettier
        ax22.set_ylim(0, 20)

        # we do the same for the third sub graph
        ax3 = fig.add_subplot(313, title='Outdated Packages vs Time')
        ax3.plot(dists[dist1]['date'], dists[dist1]['outdated'], 'o-', label=dist1.capitalize())

        ax33 = ax3.twinx()
        ax33.plot(dists[dist2]['date'], dists[dist2]['outdated'], 'gs-', label=dist2.capitalize())
        ax33.set_ylim(0, 10)

        # set the 30 degree rotation of the date labels explicitly: autofmt_xdate()
        # was called before the sub graphs existed, so it did not apply to them
        plt.setp(ax3.xaxis.get_majorticklabels(), rotation=30)

        # And we save the result
        plt.savefig(output)


## Mini Debian Conf 2012 in Paris : Bootstrapping Debian for a new architecture

Tags: debian

I just finished addressing the awesome Debian crowd at the Mini DebConf in Paris. My presentation was about a few challenges we have ahead to bootstrap Debian on a new architecture. Johannes Schauer and Wookey did a lot of work in the last few months, particularly focusing on Linaro/Ubuntu. After Wheezy, I think it is important to catch up with their work and integrate it in Debian.

The two main take-away messages from my presentation:

• **Add Multi-Arch annotations** to your packages. This is essential to cross compile packages automatically. We are still not able to cross compile a minimal Debian system in Debian because many multi-arch annotations are still missing. Experiments show that with these annotations, these packages will cross compile just fine. A lot of work has been done in this direction by Wookey.

• Debian should **consider adding build profiles** to build dependencies. Build profiles are global build-dependency filters used to create packages with a different set of functionalities. Build profiles are of the form `Build-Depends: foo (>=0.1) [amd64] <!stage1 bootstrap> | bar`.
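
To make the build profile idea a bit more concrete, here is a small Python sketch that parses a dependency written in the form quoted above and decides whether it is kept for a given architecture and set of active profiles. It is only an illustration, not code from the bootstrap tools: the names `Dep`, `parse_build_depends` and `is_kept` are hypothetical, and the rule that every term inside `<...>` must hold is my assumption, since the exact semantics are not spelled out here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# One alternative, e.g.: foo (>=0.1) [amd64] <!stage1 bootstrap>
DEP_RE = re.compile(
    r"^(?P<name>[a-z0-9][a-z0-9+.-]*)"
    r"(?:\s*\((?P<version>[^)]*)\))?"
    r"(?:\s*\[(?P<archs>[^\]]*)\])?"
    r"(?:\s*<(?P<profiles>[^>]*)>)?$"
)


@dataclass
class Dep:
    name: str
    version: Optional[str] = None
    archs: List[str] = field(default_factory=list)
    profiles: List[str] = field(default_factory=list)


def parse_build_depends(value):
    """Split one Build-Depends entry on '|' and parse each alternative."""
    deps = []
    for alt in value.split("|"):
        m = DEP_RE.match(alt.strip())
        if m is None:
            raise ValueError(f"cannot parse {alt!r}")
        deps.append(Dep(
            name=m.group("name"),
            version=m.group("version"),
            archs=(m.group("archs") or "").split(),
            profiles=(m.group("profiles") or "").split(),
        ))
    return deps


def arch_matches(archs, arch):
    """[a b] means 'only on a or b', [!a !b] means 'everywhere but a and b'."""
    if not archs:
        return True
    if archs[0].startswith("!"):
        return all(a[1:] != arch for a in archs)
    return arch in archs


def is_kept(dep, arch, active_profiles):
    # ASSUMPTION: every profile term must hold; '!p' holds when p is not active.
    for term in dep.profiles:
        if term.startswith("!"):
            if term[1:] in active_profiles:
                return False
        elif term not in active_profiles:
            return False
    return arch_matches(dep.archs, arch)


foo, bar = parse_build_depends("foo (>=0.1) [amd64] <!stage1 bootstrap> | bar")
```

Under these assumptions, `foo` is kept on amd64 when the `bootstrap` profile is active and `stage1` is not, while `bar` carries no restriction and is always kept.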

This week we reached an important milestone toward a fully automatic bootstrap procedure. Hopefully we are going to tell you more about this work at FOSDEM 2013.

My slides are attached.

## Generic Graphml Printer for OcamlGraph

GraphML is a nice and widely used graph description format. This is a micro module to print OcamlGraph graphs in this format.

The signature is minimal. Since in GraphML all attributes are typed, we only need two values to describe the name, type and default value of the attributes of each vertex and edge, and two functions to map each vertex and edge to a key/value list.

    module type GraphmlSig =
    sig
      include Graph.Sig.G
      (** the format is (key, type of the key, default value) *)
      val default_vertex_properties : (string * string * string option) list
      val default_edge_properties : (string * string * string option) list

      (** the format is (key, value) *)
      val data_map_vertex : vertex -> (string * string) list
      val data_map_edge : edge -> (string * string) list
    end
    ;;

    module type GraphmlPrinterSig =
    sig
      type t
      val pp_graph : Format.formatter -> t -> unit
      val to_file : t -> string -> unit
    end
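
For readers who do not speak OCaml, the role of these four values can be sketched in a few lines of Python using `xml.etree.ElementTree` from the standard library. This is only a rough analogue of the design, not the module itself: `to_graphml` and its parameters are hypothetical names, and the edge ids are simple counters.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(vertices, edges, vertex_props, edge_props,
               data_map_vertex, data_map_edge):
    """Build a GraphML string from typed key declarations and data mappers."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # declare every attribute once: (key, type of the key, default value)
    for scope, props in (("node", vertex_props), ("edge", edge_props)):
        for key, typ, default in props:
            k = ET.SubElement(root, "key", {
                "id": key, "for": scope,
                "attr.name": key, "attr.type": typ,
            })
            if default is not None:
                ET.SubElement(k, "default").text = default
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    # each vertex is mapped to a (key, value) list
    for v in vertices:
        node = ET.SubElement(graph, "node", id=f"n{v}")
        for key, value in data_map_vertex(v):
            ET.SubElement(node, "data", key=key).text = value
    # each edge is mapped to a (key, value) list as well
    for i, (src, dst) in enumerate(edges):
        edge = ET.SubElement(graph, "edge", id=f"e{i}",
                             source=f"n{src}", target=f"n{dst}")
        for key, value in data_map_edge((src, dst)):
            ET.SubElement(edge, "data", key=key).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_graphml(
    vertices=[1, 2, 3],
    edges=[(1, 2), (1, 3)],
    vertex_props=[("id", "string", None)],
    edge_props=[],
    data_map_vertex=lambda v: [("id", str(v))],
    data_map_edge=lambda e: [],
)
```

The point of the design is the same in both languages: the key declarations carry the type information once, and the two mapping functions only produce plain key/value pairs.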


To give a small example, we build a simple graph with three vertices and two edges. In this case we only print the id of each node.

    open Graphml

    module V = struct
      type t = int
      let compare = compare
      let hash i = i
      let equal = (=)
    end

    module G = Graph.Imperative.Digraph.ConcreteBidirectional(V)

    module Gr = struct
      include G
      let default_vertex_properties = ["id","string",None]
      let default_edge_properties = []
      let data_map_edge e = []
      let data_map_vertex v = ["id",string_of_int v]
    end

    module GraphPrinter = GraphmlPrinter (Gr) ;;

    let print g = GraphPrinter.pp_graph Format.std_formatter g ;;

    (* add_edge also adds the missing end points, giving vertices 1, 2 and 3 *)
    let g = G.create () in
    G.add_edge g 1 2;
    G.add_edge g 1 3;
    print g;;


Use ocamlbuild to compile the lot.

    $ ocamlbuild -use-ocamlfind -package ocamlgraph test.native


The result looks like this. I agree the formatting is not perfect …

    <?xml version="1.0" encoding="UTF-8"?>
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
    http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
    <key id="id" for="node" attr.name="id" attr.type="string">

    <graph id="G" edgedefault="directed">
    <node id="n1">
    <data key="id">1</data> </node>
    <node id="n2">
    <data key="id">2</data>
    </node>
    <node id="n3">
    <data key="id">3</data>
    </node>

    <edge id="e131199" source="n1" target="n2">
    </edge>
    <edge id="e196798" source="n1" target="n3">
    </edge>

    </graph></graphml>


Using GraphTool, you can easily access a zillion more algorithms from the Boost Graph Library. To be fair, GraphTool also accepts graphs in dot and gml format. GraphML is the default format for GraphTool as it contains precise type information.

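    #! /usr/bin/env python
    from graph_tool.all import *

    # load the GraphML output shown above ("test.graphml" is a placeholder file name)
    g = load_graph("test.graphml")
    graph_draw(g, pos=None, output_size=(420, 420), output="ex.pdf")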


## Update

A refined version of the Graphml module is going to be included in the next release of ocamlgraph! The tgz is attached to this message.

## Lessons learned making a cargo kilt

This entry is not really about computers, technology, or other work-related topics, but more about a hard-hack that I have wanted to try for a while. How to make a kilt !!!

After a bit of duck-ducking, I decided to follow this excellent tutorial. At first sight the entire process seems a bit long, but after the first read you will realize that everything boils down to 3 steps: measure, fold and pin, sew.

For the measure part, I have the impression that the formula given in the instructable (waist/3*8+1) is a bit short for my comfort and taste. This is the size for the internal apron, the folded part that goes all around your left hip, back, and right hip, and the front apron. My suggestion would be to make the inner apron a bit longer than the front apron. This way the kilt will, in my opinion, feel more comfortable and it will envelop your body completely.

For the fold and pin part, you just need a bit of patience and a ruler. Put the pins parallel to the folding as in the instructable and not perpendicular. This will help you later when sewing everything.

The sewing … If you know how to use a sewing machine, this is going to be a piece of cake. Otherwise, well, I spent more time troubleshooting the machine than sewing the kilt. I broke a few needles in the process and learned how to thread the machine blindfolded. Not to mention that you have to learn how to disassemble these machines into a thousand parts to understand how the thread got stuck. It was fun. A lesson that I’ve learned is that a sewing machine works much better in the morning than late at night when you are tired and sleepy. Really !

Other than that, it was a fun experience. Maybe I’ll make another one to commit this skill to memory. Maybe I’ll run a kilt making workshop at the next DebConf :)