Kolab XML Format: Proposal for an XSD friendly update
mollekopf at kolabsys.com
Tue Oct 18 18:46:24 CEST 2011
Because the various implementations of the Kolab XML Format are
difficult to maintain and are very error prone, the idea of a library to
read/write the XML objects came up. Till and Volker from KDAB pointed
out that using databindings based on an XML Schema (XSD) would be the
ideal tool to develop such a library. The process of writing this schema
brought up several problems with the format, which I'm going to outline
below.
== Why do we need a schema ==
The current format specification is not very explicit about some
details and is open to interpretation in these parts. A schema would give us
a much stricter specification which also allows XML files to be
validated against the spec.
Furthermore, it is a tedious job to actually implement a specification and
make sure that the implementation really does the correct thing in all
corner cases. Inevitably, most implementations will behave slightly
differently, possibly ending up with conflicts.
Fortunately there is a tool to write such a specification in a more
useful way for XML, an XML Schema. There are various schema languages
but the most promising for our purpose is XSD.
Using XSD, we can write a schema which can be used to validate the XML
file. This means the XSD actually holds a promise: if a client can
read and write a file accepted by the schema, any other client will also
be able to read those files.
Even better, using code generators we don't have to write the code that
parses the XML and maps it to an in-memory representation, but can
generate it directly from the XSD. This completely removes the need to
implement/test/maintain this fairly error prone part of the code.
== What will we gain from the schema ==
As said, primarily a well defined format and reduced development
effort. But we will also get an implementation of the format from all
clients which actually follows the spec, which is very different
from the situation we have now.
If you try to set the GPG settings with KAddressBook and modify the
same contact afterwards in Horde, your GPG settings will be gone,
because Kontact makes use of the "unknown tags", which are not preserved
by Horde.
With an XSD based databinding such surprises are much less likely to
happen, because all clients which make use of the XSD databindings will
adhere to the spec.
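
To make the failure mode concrete, here is a minimal sketch in Python
(standard library only) of a client that maps only the tags it knows to
an in-memory representation and writes the object back. The element
names, including <x-gpg-key>, are made up for illustration and are not
taken from the Kolab specification or from any real client.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact object; <x-gpg-key> stands in for a client-specific
# "unknown tag" and is NOT taken from the Kolab specification.
ORIGINAL = (
    "<contact>"
    "<uid>abc-123</uid>"
    "<name>Jane Doe</name>"
    "<x-gpg-key>0xDEADBEEF</x-gpg-key>"
    "</contact>"
)

KNOWN_TAGS = ("uid", "name")  # the only fields this client understands


def load_known_fields(xml_text):
    """Map the XML to a dict, silently ignoring tags the client does not know."""
    root = ET.fromstring(xml_text)
    return {tag: root.findtext(tag) for tag in KNOWN_TAGS}


def save(fields):
    """Serialize the in-memory representation back to XML."""
    root = ET.Element("contact")
    for tag in KNOWN_TAGS:
        ET.SubElement(root, tag).text = fields[tag]
    return ET.tostring(root, encoding="unicode")


fields = load_known_fields(ORIGINAL)
fields["name"] = "Jane Q. Doe"  # the user edits one known field
rewritten = save(fields)
print(rewritten)  # the <x-gpg-key> element is no longer part of the object
```

Nothing in this client is "wrong" from its own point of view, which is
exactly why such data loss is hard to spot without a shared schema.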
You'll realise that one misbehaving client can effectively destroy the
"Kolab experience": that no matter which client you're using, you get
at least the features defined by the format everywhere.
Also, instead of the validation of the actual values being up to
every implementation, as it is now, we'll get one centrally defined in the
schema. With the databindings we even get type safety, which gives us
compile time errors instead of runtime errors.
Databindings based on XSD are available for various languages. We're
targeting C++/PHP/Python for now, so we will make sure that a solution
for these languages exists.
== Problems with the current Format ==
The current format allows some things, or is at least not explicit
enough about it, which are hardly implementable using XSD or lead to
The key points are:
- Preserving of undefined tags
- Undefined order of elements
- No defined namespace
=== Preserving of undefined tags ===
Preserving unknown tags is far from trivial and a rather big
development effort. I understand the use of an extensible format as it
makes it very easy for vendors to implement their own special features
using extensions (aka unknown tags). Also the idea that old clients can
still make use of a subset of the data of newer versions of the format
is intriguing. However, I think there are a couple of severe drawbacks
which make me think unknown tags are not a good idea after all.
- If vendors can implement their features with unknown tags, no one
else can make use of them. This effectively works against the idea that
all clients support the same features and even encourages vendors to
implement their features in their client only to have a market
advantage. This obviously hurts the Kolab platform. If we disallowed
unknown tags instead, we would force vendors to go through a KEP process
to improve the Kolab format for everyone.
Of course there are values which are by definition not useful for
others, but even those can be added to the format as a vendor extension.
This way we have a much stricter definition which is much easier to
implement. Using XSD it is actually not feasible to allow unknown tags
anywhere in the format, and in my opinion not allowing unknown tags is
the only way to get a well defined format anyway. Without the use of
namespaces it is even impossible to implement unknown elements with XSD.
Of course this change in the format would imply that we include the
extensions currently in use in the format.
=== Undefined order of elements ===
This is mainly a technical problem for XSD. It is not feasible to
implement an XSD with an undefined order of the elements. I would
therefore like to make the order a requirement. I also don't see any
drawback to this approach, especially once the implementations are based
on the XSD, which will do that job for the developer anyway.
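
As a rough illustration of what a fixed order means for a reader, the
following Python sketch checks that the children of an object appear
exactly in a required sequence, similar in spirit to what an xs:sequence
in an XSD expresses. The element names and their order are assumptions
for the example only; the real order would be whatever the schema
defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical element order, chosen for illustration only; the real order
# would be whatever the schema ends up fixing.
REQUIRED_ORDER = ["uid", "body", "categories"]


def follows_order(xml_text, order=REQUIRED_ORDER):
    """Return True if the child elements appear exactly in the required order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == list(order)


ordered = "<note><uid>1</uid><body>text</body><categories>a</categories></note>"
shuffled = "<note><body>text</body><uid>1</uid><categories>a</categories></note>"

print(follows_order(ordered))
print(follows_order(shuffled))
```

A writer generated from the schema would always emit the elements in
this order, so the requirement costs the developer nothing in practice.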
=== No defined namespace ===
Because the current format lacks a namespace it is not suitable to be
used together with other XML technologies such as XSLT. The lack of a
namespace increases the chance of name conflicts with other formats such
as ICAL. The design with a namespace is also more robust should we ever
want to extend the format. After all it is just good practice with no
drawbacks, so I'd suggest to add a namespace if we change the format
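
The effect of a namespace on name conflicts can be sketched with
Python's standard ElementTree. Both namespace URIs below are
placeholders invented for this example, not a proposed Kolab namespace,
and the element names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, invented for this example only.
KOLAB_NS = "http://example.org/kolab"
OTHER_NS = "http://example.org/other"

# Two elements share the local name "summary" but live in different namespaces.
DOC = (
    f'<k:event xmlns:k="{KOLAB_NS}" xmlns:o="{OTHER_NS}">'
    "<k:summary>Team meeting</k:summary>"
    "<o:summary>Summary from another vocabulary</o:summary>"
    "</k:event>"
)

root = ET.fromstring(DOC)

# ElementTree addresses namespaced elements as "{namespace-uri}localname".
kolab_summary = root.find(f"{{{KOLAB_NS}}}summary")
other_summary = root.find(f"{{{OTHER_NS}}}summary")

print(kolab_summary.text)
print(other_summary.text)
```

With the namespace in place the two summary elements can no longer be
confused, even though their local names are identical.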
== Conclusion ==
For these reasons I propose to change the Kolab XML Format in
the following ways:
- disallow unknown tags
- include now used unknown tags into the format
- make the order of the elements a requirement
- introduce a Kolab namespace
This will allow me to write an XSD for the specification which will
make it a lot easier to ensure that all clients adhere to the
specification.
I believe we can significantly improve the Kolab XML format this way.
With my best regards,
Kolab Systems AG
e: mollekopf at kolabsys.com
pgp: EA657400 Christian Mollekopf