Background information for KEP #17

Christian Mollekopf mollekopf at kolabsys.com
Mon Dec 12 14:34:33 CET 2011


Hey,

As a follow-up to the announcement of KEP #17 I'd like to provide you with 
some more background information.

The approach used differs slightly from what I had outlined in my last mail
[1] in that the xCal/xCard based XML objects are now fully RFC compliant. We
hope this will improve the sustainability of the format and lower the barrier
to adoption for other projects.

The new specification will provide us with a normative, canonical storage 
format, eliminating the interoperability problems of xCal/xCard. I believe 
that this will be a solid base for us to build a server where interoperability 
between different clients truly exists.

On this occasion I would also like to share this summary, explaining some of 
the conclusions we've come to:

"When we speak about formats, we need to keep in mind that we have external and
internal formats. External formats are used for transport between applications
by different parties, each of which typically has its own internal format;
e.g. Microsoft Exchange has a different internal format from Lotus Notes.
Zarafa has largely copied the Microsoft format, but stores it in its own way
in an SQL database. The same is true for virtually any major groupware
solution.

All these solutions then interact through external formats, most importantly 
iCalendar and vCard, which are interpreted by each application into their 
respective internal formats for processing.

The rationale for this is that internal formats need to be sparse, normative
and canonical. iCalendar & vCard are none of these, so each vendor's output
has a vendor-specific flavour closely related to that vendor's internal format,
which is sometimes even derived from iCalendar & vCard.

This was the root cause of the interoperability issues that Kolab had in its
first version, and that are still experienced today by CalDAV servers which
went down the same route.

So Kolab XML in Kolab Format 2.0 copied the approach of many other solutions:
having a separate internal storage format to work against.

This, as we have seen, has its own pitfalls. Most importantly, it is a lot of 
effort to create a clean specification and implement it well. It is even harder 
to be aware of all the conceptual issues and thinking that went into the 
external formats by the various vendors participating in them trying to make 
sure their concepts could be expressed in those formats.

The result was the sometimes outright buggy specification of Kolab Format, with 
very non-normative bits, and places where it cannot model the reality of 
groupware well enough, as demonstrated in KEP 2, among others.

In addition, an internal storage format then needs to be maintained for all
supported clients, which is expensive, and deviations between those clients
can become painful on a variety of levels.

In order to address this last point, we started discussing a libkolabformat
library in C++ which would provide a SWIG-wrapped interface for various
languages, so that a single code base can maintain that internal storage
format.

It could also provide consistency checks and, if its code were generated from
an XSD description, would bring all sorts of added benefits, e.g. an automatic
object validation processor. In addition, minor changes in the format become
possible without touching the API for the clients, which means everyone's job
gets a lot easier.

So far, so good, and definitely sensible.

In theory the advantages would exist whether we do this in a completely 
arbitrary schema in some obscure African language or re-use Kolab XML 2.0 or 
whatever, with one important difference.

If we choose something that we maintain 100%, we have 100% of the effort.

On top of that, when looking at all the issues we would like to address in the 
internal storage format, we quickly realized that for virtually anything we 
always had to look at how iCalendar and vCard were modelling things in order 
to make sure that we can express things with a certain fidelity.

So the question came up: why not build directly upon the recently published
xCal / xCard RFCs, which are also XML and are an assembly of all the various
concepts of all the applications out there? So instead of repeating the
exercise 100%, we take the existing 80% and then only provide 20% of
normative effort to make the formats usable for long term storage.

The particular twist in this would be not to repeat what Kolab Version 1 and
CalDAV have done, but rather to work from the existing RFCs and make them
normative and canonical in nature.

For the object types where such RFCs do not yet exist we will still have to do 
the 100%, but then Calendar & Contacts are among the more complex objects, so 
gaining 80% on them will most likely make a major difference.

In consequence, everything we would write would then be RFC compliant, and we 
would know that where we model a particular concept in the Kolab format, we 
have done it in a way that is semantically compatible to the most widely used 
external format.

So we know we have less likelihood of having to break existing semantics in
the format, that we can add and even extend functionality in compatible ways,
and that the resulting object should be migration friendly. It also allows us
the differentiating claim that even our internal format is Open Standards
based.

In combination with the format library it means that existing clients will 
have an easy time becoming full Kolab clients, and that parsing of the 
external format will be mappable against our internal format.

They will obviously never replace their iCalendar / vCard libraries with
our library, as the person-years of effort that went into those cannot be
easily duplicated, nor could we hope to ever be that complete in our
implementation due to resource constraints.

But even more importantly, we don't want to do this.

Going for full read compliance means we would move the interoperability issues 
from the client side where they occasionally affect one client at a time, into 
the database backend, where they affect all clients all the time.

So whatever we do, we definitely want to stay with a normative and canonical
storage format. This could be based upon Kolab XML version 1.1, or it could be
partially re-based upon xCal / xCard with some additional normalization.

The resulting format then has a realistic chance of actually being adopted by
other projects and solutions as their internal storage format, I believe, as
the combination of RFC & additional normative effort gives Kolab real added
value over other approaches.

Also I think this will be less effort for us to maintain, freeing resources for 
more interesting and fun stuff."
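
To make the "normative and canonical" point from the summary a bit more
concrete, here is a minimal sketch in Python. It relies on one fact about
iCalendar: the end of an event may be written either as DTEND or as DURATION,
and times may carry different time zones. Everything else is an assumption for
illustration only: the function name canonicalize_event, the dictionary keys,
and the rule "always store DTSTART and DTEND in UTC" are hypothetical and are
not taken from KEP #17 or from libkolabformat.

```python
from datetime import datetime, timedelta, timezone


def canonicalize_event(event):
    """Reduce equivalent external variants to one stored form.

    Hypothetical rule set for illustration: the stored object always
    carries 'dtstart' and 'dtend', both expressed in UTC.
    """
    start = event["dtstart"]
    if start.tzinfo is None:
        raise ValueError("dtstart must carry time zone information")
    start = start.astimezone(timezone.utc)

    if "dtend" in event and "duration" in event:
        # iCalendar allows one or the other, never both.
        raise ValueError("dtend and duration are mutually exclusive")
    if "dtend" in event:
        if event["dtend"].tzinfo is None:
            raise ValueError("dtend must carry time zone information")
        end = event["dtend"].astimezone(timezone.utc)
    elif "duration" in event:
        end = start + event["duration"]
    else:
        # Simplification of this sketch: an explicit end is required.
        raise ValueError("event needs either dtend or duration")

    return {"dtstart": start, "dtend": end}


# Two external representations of the same one-hour event.
cet = timezone(timedelta(hours=1))
with_dtend = {
    "dtstart": datetime(2011, 12, 12, 14, 0, tzinfo=cet),
    "dtend": datetime(2011, 12, 12, 15, 0, tzinfo=cet),
}
with_duration = {
    "dtstart": datetime(2011, 12, 12, 13, 0, tzinfo=timezone.utc),
    "duration": timedelta(hours=1),
}
```

Both inputs are valid on the wire, but after canonicalize_event() they are
stored identically, so every client reading the store only ever has to handle
one form. That is the property the summary asks of the storage format.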

Based on this I'd like to get a discussion started sooner rather than later.
Feel free to start firing questions, criticism, suggestions or any other
comments =)

Cheers,
Christian

[1] http://kolab.org/pipermail/kolab-format/2011-November/001559.html