On "Preserving all occurrences of the same tag" in the Kolab-Format spec

Fri Jul 15 11:00:11 CEST 2011

Hi Gunnar,

Separating out the aspect "Preserving all occurrences of the same XML-tag":

On Friday, 15 July 2011 at 05:47:50, Gunnar Wrobel wrote:
> Quoting "Florian v. Samson" <florian.samson at bsi.bund.de>:
> > On Monday, 20 June 2011 at 10:28:12, Georg C. F. Greve wrote:
> >> On Monday 20 June 2011 10.01:19 Florian v. Samson wrote:
> >> > 
> >> > ...

One basic consideration, which also applies here, is:

> > Good question, which led me to this conclusion:
> > As the already accepted KEPs demand quite invasive code (and partly
> > design) changes in the various client libraries that implement the
> > interpretation and generation of Kolab-XML objects (the explanatory
> > text in v0.2 of the proposal actually stated this), and several
> > proposed KEPs reinforce this tendency, my answer is "nowhere".
> >
> > ...
> >
> >> > 1b. If an XML-tag appears multiple times within one Kolab-XML-object,
> >> > ALL occurrences of this XML-tag MUST be preserved.
> >>
> >> Ack.
>
> There is one area where I consider it less trivial to retain old tags
> and would like to provide an example.

The example you provided depends heavily on an (IMO not very fortunate) 
implementation of the client's internal storage format, which results in 
information loss.  This is exactly what these proposals (1a and 1b) try to 
disallow.
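Rules 1a and 1b can at least be checked coarsely.  The following is a minimal sketch in Python (standard library only), not part of any Kolab client: the helper names `tag_counts` and `preserves_all_tags` are hypothetical, and the `<contact>` wrapper is a simplified stand-in for a full Kolab contact object.  It only compares how often each tag occurs before and after a round trip, so it catches dropped elements but not altered content.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_text):
    """Count how often each element tag occurs anywhere in the document."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())


def preserves_all_tags(before, after):
    """Hypothetical coarse check for rules 1a/1b: every tag seen before the
    round trip must still occur at least as often afterwards."""
    old, new = tag_counts(before), tag_counts(after)
    return all(new[tag] >= count for tag, count in old.items())


# Simplified contact object (not a complete Kolab-XML contact).
ORIGINAL = """<contact>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>test@example.org</smtp-address><type>private</type></email>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>work@example.org</smtp-address><type>work</type></email>
</contact>"""

# Output of a client that silently dropped one occurrence of <type>.
LOSSY = ORIGINAL.replace("<type>work</type>", "")

print(preserves_all_tags(ORIGINAL, ORIGINAL))  # True
print(preserves_all_tags(ORIGINAL, LOSSY))     # False
```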

> Consider the
>
> {<email>
>    <display-name>(string, default empty)</display-name>
>    <smtp-address>(string, default empty)</smtp-address>
> </email>}
>
> tag from http://kolab.org/doc/kolabformat-2.0-html/c295.html.
>
> Assume I'm using a client that finds the following information in the
> XML part:
>
> <email>
>    <display-name>Mr. Test</display-name>
>    <smtp-address>test@example.org</smtp-address>
>    <type>private</type>
> </email>
> <email>
>    <display-name>Mr. Test</display-name>
>    <smtp-address>work@example.org</smtp-address>
>    <type>work</type>
> </email>
>
> with "<type>" being a non-standard format addition unknown to my Kolab
> client.

Well, you will have to preserve the "<type>foo</type>" elements anyway, 
for both entries (the "preserve unknown tags" rule 1a, which you do not seem 
to criticise).
So the client implementation in your example can check at any time that 
these are two different entries.
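To illustrate: a client that keeps the unknown child elements of each `<email>` verbatim can tell the two entries apart without understanding `<type>` at all.  A minimal Python sketch, assuming `xml.etree.ElementTree` and a hypothetical helper `preserved_unknowns`; the set of known tags is taken from the 2.0 `<email>` definition quoted above.

```python
import xml.etree.ElementTree as ET

# Child tags of <email> defined by Kolab format 2.0; anything else is "unknown".
KNOWN_EMAIL_TAGS = {"display-name", "smtp-address"}

XML = """<contact>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>test@example.org</smtp-address><type>private</type></email>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>work@example.org</smtp-address><type>work</type></email>
</contact>"""


def preserved_unknowns(email):
    """Return the unknown child elements of one <email>, serialized verbatim."""
    parts = []
    for child in email:
        if child.tag not in KNOWN_EMAIL_TAGS:
            # tostring() also emits the element's tail text; strip that whitespace.
            parts.append(ET.tostring(child, encoding="unicode").strip())
    return parts


emails = ET.fromstring(XML).findall("email")
unknowns = [preserved_unknowns(e) for e in emails]
print(unknowns)  # [['<type>private</type>'], ['<type>work</type>']]

# The client need not interpret <type> to see that the two entries differ.
print(unknowns[0] != unknowns[1])  # True
```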

> Now I want to change "work@example.org" to "work@example.com".
>
> Assuming my client represents the mail addresses as a simple comma
> delimited list:
>
> # E-Mail addresses (separate by comma): "test@example.org, work@example.org"
>
> Now I modify this to
>
> # E-Mail addresses (separate by comma): "test@example.org, work@example.com"
>
> In that case it will be pretty hard to retain the information which
> mail-address the user actually touched. 

That sounds very implementation-specific to me.

> It is not obvious if the user 
> changed a mail address or if he removed one and added a new one.
> Retaining the "<type>work</type>" tag will be hard. So the UI of the
> client is part of the problem here and should be modified.

I do not agree that this is UI-dependent.  IMO this has to be addressed by 
properly designing the internal data structures ("internal storage format") 
of that specific (sample/example) client.  If the internal representation 
of the Kolab-XML data really is only a comma-separated list of e-mail 
addresses per user name, information is lost implicitly during format 
conversion, and the internal data representation can no longer be converted 
back to Kolab-XML properly.
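A small Python sketch of such a lossy conversion, assuming a hypothetical client whose internal format is just the comma-separated address list (`to_csv` and `from_csv` are illustrative names, not taken from any real client).  After the edit, the rebuilt XML no longer contains any `<type>` element, and nothing in the address string could tell the client where to put them back.

```python
import xml.etree.ElementTree as ET

XML = """<contact>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>test@example.org</smtp-address><type>private</type></email>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>work@example.org</smtp-address><type>work</type></email>
</contact>"""


def to_csv(contact):
    """Lossy internal format: only the SMTP addresses survive."""
    return ", ".join(e.findtext("smtp-address") for e in contact.findall("email"))


def from_csv(display_name, address_list):
    """Rebuild <email> elements from the list; <type> cannot be recovered."""
    contact = ET.Element("contact")
    for address in address_list.split(","):
        email = ET.SubElement(contact, "email")
        ET.SubElement(email, "display-name").text = display_name
        ET.SubElement(email, "smtp-address").text = address.strip()
    return contact


contact = ET.fromstring(XML)
address_list = to_csv(contact)
print(address_list)  # test@example.org, work@example.org

# The user edits the second address in the comma-separated text field.
edited = address_list.replace("work@example.org", "work@example.com")
rebuilt = from_csv("Mr. Test", edited)
print(len(contact.findall("email/type")), len(rebuilt.findall("email/type")))  # 2 0
```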

The way out is to store all information that cannot be processed or 
displayed in a covered manner (i.e. protected and somewhat hidden; we use 
"Kolab store" as a working name [1]) when converting into the internal data 
format.  This data can still be used to differentiate entries: although it 
cannot be interpreted, the stored strings (or hashes of them) can be 
compared to see whether they differ.  When converting back to Kolab-XML, 
this covered information has to be uncovered by writing it back to 
Kolab-XML unaltered.
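The scheme could look roughly like this in Python.  This is only a sketch under stated assumptions, not the evolution-kolab code: `EmailEntry`, `kolab_store`, `from_xml` and `to_xml` are hypothetical names, and SHA-256 is just one possible choice of hash.  Each internal entry keeps the known fields plus the unknown elements as opaque strings, the digest distinguishes entries, and the stored elements are written back unaltered.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Child tags of <email> that this hypothetical client understands.
KNOWN_EMAIL_TAGS = {"display-name", "smtp-address"}

XML = """<contact>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>test@example.org</smtp-address><type>private</type></email>
  <email><display-name>Mr. Test</display-name>
    <smtp-address>work@example.org</smtp-address><type>work</type></email>
</contact>"""


@dataclass
class EmailEntry:
    """Hypothetical internal record: known fields plus a 'Kolab store'."""
    display_name: str
    address: str
    # Unknown elements, kept as uninterpreted XML strings.
    kolab_store: list = field(default_factory=list)

    def store_digest(self):
        """Hash of the uninterpreted data; differing digests mean differing entries."""
        return hashlib.sha256("\n".join(self.kolab_store).encode("utf-8")).hexdigest()


def from_xml(email):
    """Cover: move every unknown child element into the Kolab store."""
    store = [ET.tostring(child, encoding="unicode").strip()
             for child in email if child.tag not in KNOWN_EMAIL_TAGS]
    return EmailEntry(email.findtext("display-name", ""),
                      email.findtext("smtp-address", ""), store)


def to_xml(entry):
    """Uncover: write the known fields, then the stored elements unaltered."""
    email = ET.Element("email")
    ET.SubElement(email, "display-name").text = entry.display_name
    ET.SubElement(email, "smtp-address").text = entry.address
    for raw in entry.kolab_store:
        email.append(ET.fromstring(raw))
    return email


entries = [from_xml(e) for e in ET.fromstring(XML).findall("email")]
print(entries[0].store_digest() != entries[1].store_digest())  # True

entries[1].address = "work@example.com"  # the user edits one address
print(ET.tostring(to_xml(entries[1]), encoding="unicode"))
# <email><display-name>Mr. Test</display-name><smtp-address>work@example.com</smtp-address><type>work</type></email>
```

Because each address is a separate record rather than a position in a comma-separated string, editing one address cannot detach it from its `<type>` element.  Note that this sketch appends the stored elements after the known ones, so their original position among sibling elements is not kept.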

A deeper analysis [1] and a sample implementation [2] of this scheme are 
provided by the evolution-kolab project, even though we still drop some 
information (not everything is put into the "Kolab store"), as the current 
Kolab-format specification is not 100% precise on this topic.

Cheers
 	Florian

References:

[1] <http://sourceforge.net/apps/mediawiki/evolution-kolab/index.php?title=Conversion_Issues#Kolab_Store>

[2] <http://evolution-kolab.git.sourceforge.net/git/gitweb.cgi?p=evolution-kolab/evolution-kolab;a=tree;f=src/libekolabconv/main;hb=HEAD>