Kolab XML Format: Proposal for an XSD friendly update

Tue Oct 18 18:46:24 CEST 2011

Hi,

Because the various implementations of the Kolab XML Format are 
difficult to maintain and are very error prone, the idea of a library to 
read/write the XML objects came up. Till and Volker from KDAB pointed 
out that using databindings based on an XML Schema (XSD) would be the 
ideal tool to develop such a library. The process of writing this schema 
brought up several problems with the format which I'm going to outline 
here.

== Why do we need a schema ==

The current format specification is not very explicit about some
details and leaves them open to interpretation. A schema would give us
a much stricter specification and would also allow XML files to be
validated against the spec.

Furthermore, it is a tedious job to implement a specification and
make sure that the implementation really does the correct thing in all
corner cases. Most implementations will inevitably behave slightly
differently, which can end in conflicts.

Fortunately there is a tool to write such a specification in a more 
useful way for XML, an XML Schema. There are various schema languages 
but the most promising for our purpose is XSD.

Using XSD, we can write a schema which can be used to validate the XML
file. This means the XSD holds a promise: if a client can read and
write every file accepted by the schema, any other such client will
also be able to read those files.

Even better, using code generators we don't have to write the code
that parses the XML and maps it to an in-memory representation; we can
generate it directly from the XSD. This completely removes the need to
implement, test and maintain this fairly error-prone part of the code.

== What will we gain from the schema ==

As mentioned above, primarily a well-defined format and reduced
development effort. But we will also get implementations of the format
in all clients that actually follow the spec, which is very different
from the situation we have now.

If you try to set the GPG settings with KAddressBook and modify the 
same contact afterwards in Horde, your GPG settings will be gone, 
because Kontact makes use of the "unknown tags", which are not preserved 
by Horde.
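
To make this failure mode concrete, here is a minimal sketch in Python
(standard library only). The element names, including `x-gpg-key`, and
the list of known tags are made up for illustration and are not taken
from the Kolab specification; the "client" is a deliberately naive one
that rebuilds the object from the fields it understands.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact object. <x-gpg-key> stands in for a vendor-specific
# "unknown tag"; none of these names come from the Kolab specification.
ORIGINAL = """<contact>
  <name>Jane Doe</name>
  <email>jane@example.org</email>
  <x-gpg-key>0xDEADBEEF</x-gpg-key>
</contact>"""

# Tags this (hypothetical) client understands.
KNOWN_TAGS = ("name", "email")


def naive_roundtrip(xml_text):
    """Read the known fields and write a fresh document from them.

    Anything outside KNOWN_TAGS is silently dropped on the way.
    """
    root = ET.fromstring(xml_text)
    fields = {tag: root.findtext(tag) for tag in KNOWN_TAGS}

    out = ET.Element(root.tag)
    for tag, value in fields.items():
        if value is not None:
            ET.SubElement(out, tag).text = value
    return ET.tostring(out, encoding="unicode")


def unknown_tags(xml_text):
    """List the child tags that are not in KNOWN_TAGS."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if child.tag not in KNOWN_TAGS]


rewritten = naive_roundtrip(ORIGINAL)
print(unknown_tags(ORIGINAL))   # the vendor tag is present before the edit
print(unknown_tags(rewritten))  # and gone after the round trip
```

Preserving the extra element would require the client to carry every
unrecognised node through its in-memory model, which is the effort
discussed further down.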

With an XSD based databinding such surprises are much less likely to
happen, because all clients which make use of the XSD databindings will
adhere to the spec.

A single misbehaving client can effectively destroy the "Kolab
experience", namely that no matter which client you're using, you get
at least the features defined by the format everywhere.

Also, validation of the actual values, which is now up to every
implementation, will be defined centrally in the schema. With the
databindings we even get type safety, which in statically typed
languages such as C++ gives us compile-time errors instead of runtime
errors.

Databindings based on XSD are available for various languages. We're 
targeting C++/PHP/Python for now, so we will make sure that a solution 
for these languages exists.

== Problems with the current Format ==

The current format allows some things, or is at least not explicit
enough about them, that are hard to implement using XSD or lead to
other problems.
The key points are:

- Preserving of undefined tags
- Undefined order of elements
- No defined namespace

=== Preserving of undefined tags ===

Preserving unknown tags is far from trivial and a rather big
development effort. I understand the appeal of an extensible format, as
it makes it very easy for vendors to implement their own special
features using extensions (aka unknown tags). Also, the idea that old
clients can still make use of a subset of the data of newer versions of
the format is intriguing. However, I think there are a couple of severe
drawbacks which make me think unknown tags are not a good idea after
all.

- If vendors can implement their features with unknown tags, no one
else can make use of them. This effectively works against the idea that
all clients support the same features and even encourages vendors to
implement their features in their client only to have a market
advantage. This obviously hurts the Kolab platform. If we disallowed
unknown tags instead, we would force vendors to go through a KEP process
to improve the Kolab format for everyone.
Of course there are values which are by definition not useful for
others, but even those can be added to the format as a vendor extension.

This way we have a much stricter definition which is much easier to 
implement. Using XSD it is actually not feasible to allow unknown tags 
anywhere in the format, but in my opinion not allowing unknown tags is 
the only way to get a well defined format. Without the use of namespaces 
it is even impossible to implement unknown elements with XSD.

Of course this change in the format would imply that we include the 
extensions currently in use in the format.

=== Undefined order of elements ===

This is mainly a technical problem for XSD. It is not feasible to
implement an XSD with an undefined order of the elements. I would
therefore like to make the order a requirement. I also don't see any
drawback to this approach, especially once the implementations are
based on the XSD, which will do that job for the developer anyway.
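
As a rough illustration of what a required order means for a reader,
the sketch below checks child elements against a fixed sequence in
which every element is optional. This is hand-written Python using only
the standard library, not XSD validation, and the element names and
their order are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical required order; the real order would be fixed by the schema.
REQUIRED_ORDER = ["uid", "name", "email"]


def follows_order(xml_text, required_order=REQUIRED_ORDER):
    """Return True if the child elements appear in the required order.

    Elements may be missing or repeated back to back, but they must not
    appear out of sequence, and tags outside the list are rejected.
    """
    root = ET.fromstring(xml_text)
    position = 0
    for child in root:
        try:
            # Only look at or after the last matched position.
            position = required_order.index(child.tag, position)
        except ValueError:
            return False
    return True


GOOD = "<contact><uid>1</uid><name>Jane</name><email>j@example.org</email></contact>"
BAD = "<contact><name>Jane</name><uid>1</uid></contact>"

print(follows_order(GOOD))
print(follows_order(BAD))
```

With generated databindings the writer side emits elements in schema
order automatically, so application code would not need a check like
this at all.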

=== No defined namespace ===

Because the current format lacks a namespace, it is not well suited to
being used together with other XML technologies such as XSLT. The lack
of a namespace also increases the chance of name conflicts with other
formats such as ICAL. A design with a namespace is also more robust
should we want to extend the format later. After all, it is just good
practice with no drawbacks, so I'd suggest adding a namespace if we
change the format anyway.
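
The name conflict argument can be sketched with Python's standard
library XML parser. Both namespace URIs below are placeholders invented
for this example (no official Kolab namespace is assumed), as are the
element names.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs, invented for this example.
KOLAB_NS = "http://example.org/kolab"
OTHER_NS = "http://example.org/other-format"

# Two <summary> elements from different vocabularies in one document.
DOC = f"""<k:event xmlns:k="{KOLAB_NS}" xmlns:o="{OTHER_NS}">
  <k:summary>Team meeting</k:summary>
  <o:summary>Summary from another vocabulary</o:summary>
</k:event>"""

root = ET.fromstring(DOC)
ns = {"k": KOLAB_NS, "o": OTHER_NS}

# The namespace keeps the two elements apart although the local name matches.
kolab_summary = root.findtext("k:summary", namespaces=ns)
other_summary = root.findtext("o:summary", namespaces=ns)

# ElementTree stores qualified names as "{namespace-uri}localname".
print(root.tag)
print(kolab_summary)
print(other_summary)

# An unqualified lookup matches neither element.
print(root.findtext("summary"))
```

Without namespaces the two `summary` elements would be
indistinguishable to any tool that processes the document.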

== Conclusion ==

For these reasons I propose to change the Kolab XML Format in the
following ways:

- disallow unknown tags
- include the unknown tags currently in use in the format
- make the order of the elements a requirement
- introduce a Kolab namespace

This will allow me to write an XSD for the specification which will 
make it a lot easier to ensure that all clients adhere to the 
specification.

I believe we can significantly improve the Kolab XML format this way.

With my best regards,

Christian

-- 
Christian Mollekopf
Software Engineer

Kolab Systems AG
Zürich, Switzerland

e: mollekopf at kolabsys.com
w: http://kolabsys.com

pgp: EA657400 Christian Mollekopf