Kolab XML Format: Proposal for an XSD friendly update

Tue Oct 18 21:35:18 CEST 2011

Hi Christian,

Quoting Christian Mollekopf <mollekopf at kolabsys.com>:

> Hi,
>
> Because the various implementations of the Kolab XML Format are
> difficult to maintain and are very error prone, the idea of a library to
> read/write the XML objects came up. Till and Volker from KDAB pointed
> out that using databindings based on an XML Schema (XSD) would be the
> ideal tool to develop such a library. The process of writing this schema
> brought up several problems with the format which I'm going to outline
> here.
>
> == Why do we need a schema ==
>
> The current format specification is not very explicit about some
> details and is open to interpretation in these parts. A schema would give us
> a much stricter specification which also allows XML files to be
> validated against the spec.
>
> Further it is a tedious job to actually implement a specification and
> make sure that the implementation really does the correct thing in all
> corner cases. Obviously most implementations will behave slightly
> differently, possibly ending up with conflicts.
>
> Fortunately there is a tool to write such a specification in a more
> useful way for XML, an XML Schema. There are various schema languages
> but the most promising for our purpose is XSD.
>
> Using XSD, we can write a schema which can be used to validate the XML
> file. This means the XSD actually holds the promise: if a client can
> read and write a file accepted by the schema, any other client will also
> be able to read those files.
>
> Even better, using code generators we don't have to write the code that
> parses the XML and maps it to an in-memory representation, but can
> generate it directly from the XSD. This completely removes the need to
> implement/test/maintain this fairly error prone part of the code.
>
> == What will we gain from the schema ==
>
> As said, primarily a well defined format and reduced development
> effort. But we will also get implementations of the format in all
> clients which actually follow the spec, which is very different
> from the situation we have now.
>
> If you try to set the GPG settings with KAddressBook and modify the
> same contact afterwards in Horde, your GPG settings will be gone,
> because Kontact makes use of the "unknown tags", which are not preserved
> by Horde.
>
> With an XSD based databinding such surprises are much less likely to
> happen, because all clients which make use of the XSD databindings will
> adhere to the spec.
>
> You'll realise that one misbehaving client can effectively destroy the
> "Kolab experience": that no matter which client you're using, you get
> at least the features defined by the format everywhere.
>
> Also, instead of the validation of the actual values which is now up to
> every implementation, we'll get one centrally defined in the schema.
> With the databindings we even get type safety, which gives us compile
> time errors instead of runtime errors.
>
> Databindings based on XSD are available for various languages. We're
> targeting C++/PHP/Python for now, so we will make sure that a solution
> for these languages exists.
>
> == Problems with the current Format ==
>
> The current format allows some things, or is at least not explicit
> enough about them, which can hardly be implemented using XSD or which
> lead to other problems.
> The key points are:
>
> - Preserving of undefined tags
> - Undefined order of elements
> - No defined namespace
>
> === Preserving of undefined tags ===

I agree to a large extent with everything else you wrote in your mail.  
I think the only difficult point is "Preserving of undefined tags" -  
so I'll add my comments just here.

>
> Preserving unknown tags is far from trivial and a rather big
> development effort. I understand the use of an extensible format as it
> makes it very easy for vendors to implement their own special features
> using extensions (aka unknown tags). Also the idea that old clients can
> still make use of a subset of the data of newer versions of the format
> is intriguing. However I think there are a couple of severe drawbacks
> which make me think unknown tags are not a good idea after all.
>
> - If vendors can implement their features with unknown tags, no one
> else can make use of it. This effectively works against the idea that
> all clients support the same features

I do not think it is very likely that all Kolab clients will ever  
support the same feature set. I do not consider this to be a central  
idea of the format specification. The Kolab format forces the clients
to adhere to it for the Kolab features supported by the clients. It does
not force the clients to support all features though.

Why shouldn't there be a client that only knows how to use the
"summary" and "body" fields of the "note" object? Yes, not very useful.
But what would force the vendor of this imaginary client to support  
the full Kolab format feature set if he has customers that are happy  
with this extremely reduced set of capabilities?

> and even encourages vendors to
> implement their features in their client only to have a market
> advantage.

Most of the time vendor specific tags have been used because a client
already had a certain feature that the Kolab format just didn't
specify. I think most often this was just done because the vendors
would like to avoid telling people: "oh, but this feature doesn't work
if you use Kolab as a backend".

And even if it were used as a competitive factor between Kolab
clients - would it be that dramatic? It is not like getting a patent on
that feature and restricting the other clients from implementing  
something similar. I would assume that a cool new feature that one  
client might offer would draw some attention and finally give birth to  
another KEP so that all clients can implement the feature.

> This obviously hurts the Kolab platform.

I don't see how the extensibility itself hurts Kolab. What you  
described above - Horde overwriting Kontact extensions - is what hurts  
users. But that is not a problem caused by extensibility. This is a  
client - Horde - not being careful enough and ignoring that  
extensibility feature.
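
To illustrate what "being careful" could look like, here is a minimal
sketch in Python. It is not how Horde or Kontact actually work; the
sample object, the tag name "x-kde-gpg" and the helper update_body()
are made up for the example. The point is only that a client which
edits the parsed tree in place, instead of rebuilding the object from
the fields it knows, keeps the unknown tags without extra effort.

```python
import xml.etree.ElementTree as ET

# Illustrative object only: <x-kde-gpg> stands in for a vendor specific
# tag and is an assumption, not taken from the Kolab specification.
SOURCE = """<contact version="1.0">
  <name><full-name>Jane Doe</full-name></name>
  <x-kde-gpg>0xDEADBEEF</x-kde-gpg>
  <body>old note</body>
</contact>"""


def update_body(xml_text, new_body):
    """Change only <body>; every other child element is left untouched."""
    root = ET.fromstring(xml_text)
    body = root.find("body")
    if body is None:
        # The object had no body yet, so append one.
        body = ET.SubElement(root, "body")
    body.text = new_body
    # Serialising the same tree writes unknown children back unchanged.
    return ET.tostring(root, encoding="unicode")


updated = update_body(SOURCE, "new note")
print(updated)
```

A client that instead maps the XML to a fixed set of fields and writes
only those fields back would drop the vendor tag in the same situation.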

> If we disallowed
> unknown tags instead, we would force vendors to go through a KEP process
> to improve the Kolab format for everyone.

And until the KEP has been approved the vendor is unable to enable use  
of a feature that the client might already have?

> Of course there are values which are by definition not useful for
> others, but even those can be added to the format as a vendor extension.

Would that take the full KEP process?

Don't misunderstand me: I think the KEP process is great, but requiring
a full KEP just to get a tag into the format, so that a client can
support a specific feature when using Kolab as a backend, seems a bit
much.

>
> This way we have a much stricter definition which is much easier to
> implement.

To me the fact that it is easier to implement seems to be the main  
driving factor behind the request.

I'm not saying that the extensibility feature should be retained at  
all costs. But dropping it because the implementation based on an XSD  
is hard does not seem to be a good reason.

> Using XSD it is actually not feasible to allow unknown tags
> anywhere in the format, but in my opinion not allowing unknown tags is
> the only way to get a well defined format. Without the use of namespaces
> it is even impossible to implement unknown elements with XSD.

So you are saying that the use of a namespace (can't see a problem
with that) would allow using unknown elements? Why not go in that
direction then?
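
A rough sketch of what I have in mind, in Python. Both namespace URIs
and the helper split_children() are assumptions made up for the
example; no Kolab namespace has been defined yet. Once the format has
its own namespace, a client can tell the defined elements apart from
vendor extensions simply by looking at the namespace of each element.
As far as I understand XSD, a wildcard such as xs:any with
namespace="##other" expresses the same distinction on the schema side.

```python
import xml.etree.ElementTree as ET

KOLAB_NS = "http://example.org/kolab"  # hypothetical namespace URI

# Illustrative note object; the vendor namespace and <v:color> are made up.
SOURCE = """<note xmlns="http://example.org/kolab"
      xmlns:v="http://example.org/vendor">
  <summary>Shopping</summary>
  <v:color>green</v:color>
  <body>Milk</body>
</note>"""


def split_children(xml_text):
    """Return (known, foreign) child elements, decided by namespace."""
    root = ET.fromstring(xml_text)
    known, foreign = [], []
    for child in root:
        # ElementTree writes qualified names as "{namespace}localname".
        if child.tag.startswith("{"):
            namespace = child.tag[1:].split("}", 1)[0]
        else:
            namespace = ""
        if namespace == KOLAB_NS:
            known.append(child)
        else:
            foreign.append(child)
    return known, foreign


known, foreign = split_children(SOURCE)
print([e.tag for e in known])
print([e.tag for e in foreign])
```

A client could process the "known" list according to the spec and
carry the "foreign" list along unchanged when writing the object back.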

Cheers,

Gunnar

>
> Of course this change in the format would imply that we include the
> extensions currently in use in the format.

>
> === Undefined order of elements ===
>
> This is mainly a technical problem for XSD. It is not feasible to
> implement an XSD Schema with an undefined order of the elements. I would
> therefore like to make the order a requirement. I also don't see any
> drawback of this approach, especially once the implementations are based
> on the XSD, which will do that job for the developer anyway.
>
> === No defined namespace ===
>
> Because the current format lacks a namespace it is not suitable to be
> used together with other XML technologies such as XSLT. The lack of a
> namespace increases the chance of name conflicts with other formats such
> as ICAL. The design with a namespace is also more robust should we ever
> want to extend the format. After all it is just good practice with no
> drawbacks, so I'd suggest adding a namespace if we change the format
> anyway.
>
> == Conclusion ==
>
> For these reasons I propose to change the Kolab XML Format in the
> following ways:
>
> - disallow unknown tags
> - include the unknown tags currently in use in the format
> - make the order of the elements a requirement
> - introduce a Kolab namespace
>
> This will allow me to write an XSD for the specification which will
> make it a lot easier to ensure that all clients adhere to the
> specification.
>
> I believe we can significantly improve the Kolab XML format this way.
>
> With my best regards,
>
> Christian
>
> --
> Christian Mollekopf
> Software Engineer
>
> Kolab Systems AG
> Zürich, Switzerland
>
> e: mollekopf at kolabsys.com
> w: http://kolabsys.com
>
> pgp: EA657400 Christian Mollekopf

-- 
Core Developer
The Horde Project

e: wrobel at horde.org
t: +49 700 6245 0000
w: http://www.horde.org

pgp: 9703 43BE
tweets: http://twitter.com/pardus_de
blog: http://log.pardus.de