distcheck vs edos-debcheck

This is the second post about distcheck. I want to give a quick overview of the differences between edos-distcheck and the new version. First, despite using the same SAT solver and the same encoding of the problem, distcheck has been rewritten from scratch. Dose2 has several architectural problems and is not very well documented. Adding new features had become too difficult and error-prone, so a rewrite was the natural choice (at least for me). Hopefully Dose3 will survive the Mancoosi project and provide a base for dependency reasoning. The framework is well documented and the architecture is pretty modular. It is written in OCaml, so sadly I don't expect many people to join the development team, but we'll be very open to it.

These are the main differences with edos-debcheck.

Performance

distcheck is about two times faster than edos-debcheck (from dose2), but it is a "bit" slower than debcheck (the original debcheck), that is, the tool written by Jerome Vouillon that was later superseded in Debian by edos-debcheck. The original debcheck was an all-in-one tool that did the parsing, encoding and solving without converting the problem to any intermediate format. distcheck trades a bit of speed for generality. Since it is based on CUDF, it can handle different formats and can easily be adapted to a range of situations just by changing the encoding of the original problem to CUDF.

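Below are a couple of tests I've performed on my machine (Debian unstable). The numbers speak for themselves: on this input distcheck needs about 10.9 seconds of wall-clock time against about 19.5 seconds for edos-debcheck.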

$time cat tmp/squeeze.packages | edos-debcheck -failures > /dev/null
Completing conflicts...                                            * 100.0%
Conflicts and dependencies...                                      * 100.0%
Solving                                                            * 100.0%

real    0m19.515s
user    0m19.193s
sys    0m0.276s

$time ./distcheck.native -f deb://tmp/squeeze.packages > /dev/null

real    0m10.859s
user    0m10.669s
sys    0m0.172s

Input

The second big difference is the input format. At the moment we have two different tools in Debian, edos-debcheck and edos-rpmcheck. Despite using the same underlying library, these two tools have different code bases. distcheck is basically a multiplexer that converts different inputs to a common format and then uses it (agnostically) to solve the installation problem. It can be called in different ways (via symlinks) to behave similarly to its predecessors.

At the moment we are able to handle 5 different formats:

  1. deb:// the Packages 822 format for Debian-based distributions
  2. hdlist:// a binary format used by rpm-based distributions
  3. synth:// a simplified format to describe rpm-based package repositories
  4. eclipse:// an 822-based format that encodes OSGi plugin metadata
  5. cudf:// the native CUDF format

distcheck handles gz and bz2 compressed files transparently. However, if you care about performance, you should decompress your input file first and then parse it with distcheck, because it often takes more time to decompress the file on the fly than to run the installability test itself. There is also an experimental database backend that is not compiled by default at the moment.
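To make the "multiplexer" idea concrete, here is a minimal Python sketch of how an input such as deb://tmp/squeeze.packages could be split into a format and a path, and how a compressed file could be opened transparently. This is not distcheck's code (which is OCaml): the function names split_input and open_maybe_compressed are hypothetical, and picking the decompressor from the file extension is an assumption, since the post does not say how distcheck detects compression.

```python
import bz2
import gzip

# The five schemes listed in the post.
KNOWN_SCHEMES = ("deb", "hdlist", "synth", "eclipse", "cudf")


def split_input(uri):
    """Split 'deb://path/to/Packages' into ('deb', 'path/to/Packages')."""
    scheme, sep, path = uri.partition("://")
    if not sep or scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported input: {uri!r}")
    return scheme, path


def open_maybe_compressed(path):
    """Open a plain, .gz or .bz2 file as text (assumption: chosen by extension)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


if __name__ == "__main__":
    print(split_input("deb://tmp/squeeze.packages"))
```

A real front end would then hand the opened file to a format-specific parser selected by the scheme; those parsers are left out here on purpose.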

Output

Regarding the output, I've already explained the main differences in an old post. As a quick reminder, the old edos-debcheck had two output options. The first was a human-readable, unstructured output that was a handy source of information when running the tool interactively. The second was an XML-based format (without a DTD or a schema, I believe) that was used for batch processing.

distcheck has only one output type, in the YAML format, that aims to be both human and machine readable. Hopefully this will cater for both needs. Moreover, I've just recently added to the output of distcheck a summary of who is breaking what. The output of edos-debcheck was basically a map from packages to the reasons for the breakage. In addition to this information, distcheck also gives a map from each reason (a missing dependency or a conflict) to the list of packages that are broken by that problem. This additional info is off by default, but I think it can be nice to know which missing dependency is responsible for the majority of problems in a distribution...

For example, calling distcheck with --summary:

$./distcheck.native --summary deb://tests/sid.packages
backgroud-packages: 29589
foreground-packages: 29589
broken-packages: 143
missing-packages: 138
conflict-packages: 5
unique-missing-packages: 52
unique-conflict-packages: 5
summary:
 -
  missing:
   missingdep: libevas-svn-05-engines-x (>= 0.9.9.063)
   packages:
    -
     package: enna-dbg
     version: 0.4.0-4
     architecture: amd64
     source: enna (= 0.4.0-4)
    -
     package: enna
     version: 0.4.0-4
     architecture: amd64
     source: enna (= 0.4.0-4)
 -
  missing:
   missingdep: libopenscenegraph56 (>= 2.8.1)
   packages:
    -
     package: libosgal1
     version: 0.6.1-2+b3
     architecture: amd64
     source: osgal (= 0.6.1-2)
    -
     package: libosgal-dev
     version: 0.6.1-2+b3
     architecture: amd64
     source: osgal (= 0.6.1-2)
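
Conceptually, the summary is just the per-package report turned inside out. The following Python sketch illustrates that inversion on the four packages shown above; it is only an illustration of the idea, not how distcheck computes it, and the function name invert_report and the (kind, detail) tuples used to represent a reason are hypothetical.

```python
from collections import defaultdict


def invert_report(report):
    """Turn {package: [reasons]} into [(reason, [packages])], biggest first."""
    summary = defaultdict(list)
    for package, reasons in report.items():
        for reason in reasons:
            summary[reason].append(package)
    # Reasons that break the most packages come first.
    return sorted(summary.items(), key=lambda item: len(item[1]), reverse=True)


# Per-package view, using the packages from the output above.
evas = ("missing", "libevas-svn-05-engines-x (>= 0.9.9.063)")
osg = ("missing", "libopenscenegraph56 (>= 2.8.1)")
report = {
    "enna-dbg": [evas],
    "enna": [evas],
    "libosgal1": [osg],
    "libosgal-dev": [osg],
}

if __name__ == "__main__":
    for reason, packages in invert_report(report):
        print(reason[0], reason[1], "->", ", ".join(packages))
```

Sorting by the number of broken packages is what makes it easy to spot the single missing dependency responsible for most of the breakage.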

Below I give a small example of the edos-debcheck output compared to the new YAML-based output.

$cat tests/sid.packages | edos-debcheck -failures -explain
Completing conflicts...                                            * 100.0%
Conflicts and dependencies...                                      * 100.0%
Solving                                                            * 100.0%
zope-zms (= 1:2.11.1-03-1): FAILED
  zope-zms (= 1:2.11.1-03-1) depends on missing:
  - zope2.10
  - zope2.9
zope-tinytableplus (= 0.9-19): FAILED
  zope-tinytableplus (= 0.9-19) depends on missing:
  - zope2.11
  - zope2.10
  - zope2.9
...

And an extract from the distcheck output (the order is different; I cut and pasted parts of the output here...):

$./distcheck.native -f -e deb://tests/sid.packages
report:
 -
  package: zope-zms
  version: 1:2.11.1-03-1
  architecture: all
  source: zope-zms (= 1:2.11.1-03-1)
  status: broken
  reasons:
   -
    missing:
     pkg:
      package: zope-zms
      version: 1:2.11.1-03-1
      architecture: all
      missingdep: zope2.9 | zope2.10
 -
  package: zope-tinytableplus
  version: 0.9-19
  architecture: all
  source: zope-tinytableplus (= 0.9-19)
  status: broken
  reasons:
   -
    missing:
     pkg:
      package: zope-tinytableplus
      version: 0.9-19
      architecture: all
      missingdep: zope2.9 | zope2.10 | zope2.11
...

Future

The roadmap to release version 1.0 of distcheck is as follows:
  1. add background and foreground package selection. This feature will allow the user to specify a larger universe (background packages), but check only a subset of this universe (foreground packages). This should allow users to select packages using grep-dctrl and then pipe them to distcheck. At the moment we can select individual packages on the command line, or we can use an expression like bash (<= 2.7) to check all versions of bash in the universe with version less than or equal to 2.7.
  2. code cleanup and a bit of refactoring between distcheck and buildcheck (a frontend for distcheck that allows us to report broken build dependencies)
  3. consider essential packages while performing the installation test. There are a few things we still have to understand here, but the idea would be to detect possible problems related to the implicit presence of essential packages in the distribution. At the moment, distcheck performs the installation test in the empty universe, while ideally the universe should contain all essential packages.
  4. finish the documentation. The effort is underway and we hope to finalize it shortly, in order to release the Debian package in experimental.
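
The foreground/background split in the first item can be illustrated with a deliberately naive Python sketch. It only looks for direct dependencies whose name is absent from the universe, whereas distcheck runs a full SAT-based installability test with versions, disjunctions and conflicts; the function name missing_deps and all package names below are hypothetical.

```python
def missing_deps(foreground, background):
    """Report foreground packages with a dependency absent from the universe.

    Both arguments map a package name to the list of names it depends on.
    Dependencies are resolved against background and foreground together,
    but only foreground packages are checked and reported.
    """
    universe = {**background, **foreground}
    broken = {}
    for name, deps in foreground.items():
        absent = [dep for dep in deps if dep not in universe]
        if absent:
            broken[name] = absent
    return broken


# Hypothetical data: libgone is referenced but not available anywhere.
background = {"libfoo": [], "oldtool": ["libgone"]}
foreground = {"app": ["libfoo"], "plugin": ["app", "libgone"]}

if __name__ == "__main__":
    print(missing_deps(foreground, background))
```

Note that oldtool is also broken, but since it is only part of the background it is used to satisfy dependencies and never reported.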

Comments

I wonder why you chose the name distcheck. I don't know if you're aware of this, but automake automagically generates a "distcheck" target, so good luck getting your google-fu up.

I didn't google it, mea culpa. At the beginning we had debcheck and rpmcheck, which were renamed to edos-debcheck and edos-rpmcheck. To reflect the fact that distcheck is distribution agnostic (it can also handle package lists from the rpm world) and that the code has been refactored into one tool, I generically called it distcheck. However, it will also answer to edos-debcheck (via a symlink), debcheck, rpmcheck and eclipsecheck for backward compatibility... I'll leave it to each distribution to choose an appropriate name, but if there are compelling reasons to change the name upstream, I won't be against it...