On "Preserving all occurrences of the same tag" in the Kolab-Format spec

Fri Jul 15 11:43:37 CEST 2011

Hi Florian,

Quoting "Florian v. Samson" <florian.samson at bsi.bund.de>:

> Hi Gunnar,
>
> separating out the aspect "Preserving all occurrences of the same XML-tag":
>
> Am Freitag, 15. Juli 2011 um 05:47:50 schrieb Gunnar Wrobel:
>> Quoting "Florian v. Samson" <florian.samson at bsi.bund.de>:
>> > Am Montag, 20. Juni 2011 um 10:28:12 schrieb Georg C. F. Greve:
>> >> On Monday 20 June 2011 10.01:19 Florian v. Samson wrote:
>> >> >
>> >> > ...
>
> One basic consideration, which is also applicable here, is:
>
>> > Good question, which led me to this conclusion:
>> > As the already accepted KEPs demand quite invasive code (and partially
>> > design) changes in the various client libraries (actually the
>> > explanatory text in the proposal stated that in v0.2), which implement
>> > the interpretation and generation of Kolab-XML objects, and multiple
>> > proposed KEPs augment this tendency, my answer is "nowhere".
>> >
>> > ...
>> >
>> >> > 1b. If an XML-tag appears multiple times within one Kolab-XML-object,
>> >> > ALL occurrences of this XML-tag MUST be preserved.
>> >>
>> >> Ack.
>>
>> There is one area where I consider it less trivial to retain old tags
>> and would like to provide an example.
>
> The example you provided is very dependent on a (IMO not very fortunate)
> implementation of the client's internal storage format, which results in
> information loss.  This is exactly what these proposals (1a. & 1b.) try to
> disallow.
>
>> Consider the
>>
>> {<email>
>>    <display-name>(string, default empty)</display-name>
>>    <smtp-address>(string, default empty)</smtp-address>
>> </email>}
>>
>> tag from http://kolab.org/doc/kolabformat-2.0-html/c295.html.
>>
>> Assume I'm using a client that finds the following information in the
>> XML part:
>>
>> <email>
>>    <display-name>Mr. Test</display-name>
>>    <smtp-address>test at example.org</smtp-address>
>>    <type>private</type>
>> </email>
>> <email>
>>    <display-name>Mr. Test</display-name>
>>    <smtp-address>work at example.org</smtp-address>
>>    <type>work</type>
>> </email>
>>
>> with "<type>" being a non-standard format addition unknown to my Kolab
>> client.
>
> Well, you will have to preserve the "<type>foo</type>"-statements anyway,
> for both entries ("preserve unknown tags"-rule 1a., which you do not seem
> to criticise).

I do not even criticise 1b.; I just want to discuss what the rules
mean for the implementation.

> So your (ex-)sample implementation (here) can check at any time that these
> are two different entries.
>
>> Now I want to change "work at example.org" to "work at example.com".
>>
>> Assuming my client represents the mail addresses as a simple comma
>> delimited list:
>>
>> # E-Mail addresses (separate by comma): "test at example.org,
>> work at example.org"
>>
>> Now I modify this to
>>
>> # E-Mail addresses (separate by comma): "test at example.org,
>> work at example.com"
>>
>> In that case it will be pretty hard to retain the information which
>> mail-address the user actually touched.
>
> That sounds very implementation-specific to me.
>
>> It is not obvious if the user
>> changed a mail address or if he removed one and added a new one.
>> Retaining the "<type>work</type>" tag will be hard. So the UI of the
>> client is part of the problem here and should be modified.
>
> I do not agree that this is UI-dependent.

Even if what you refer to as the "internal storage format" allowed the
different elements to be distinguished, the UI would not allow it in
this specific example.

But you are right: the problem can already start below the UI level.

> IMO this has to be addressed by
> properly designing the internal data-structures ("internal storage format")
> of that specific (sample/example) client.  If the internal representation
> of the Kolab-XML data really only is a CSV-list of e-mail addresses per
> user name, information loss occurs implicitly during format conversion,
> resulting in being unable to convert the internal data representation back
> to Kolab-XML properly.

Correct.
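
To make the loss concrete, here is a minimal sketch of that round trip
in Python. It is only an illustration: the <contact> wrapper element,
the function names and the "@" form of the addresses are my assumptions,
not taken from any real client.

```python
import xml.etree.ElementTree as ET

# Example data from this thread; the <contact> wrapper is an assumption.
KOLAB_XML = """<contact>
  <email>
    <display-name>Mr. Test</display-name>
    <smtp-address>test@example.org</smtp-address>
    <type>private</type>
  </email>
  <email>
    <display-name>Mr. Test</display-name>
    <smtp-address>work@example.org</smtp-address>
    <type>work</type>
  </email>
</contact>"""


def to_csv(xml_text):
    """Lossy import: keep only the smtp-address of every <email> node."""
    root = ET.fromstring(xml_text)
    return ",".join(
        e.findtext("smtp-address", default="") for e in root.iter("email")
    )


def from_csv(csv_text):
    """Export: rebuild <email> nodes from the comma delimited list alone."""
    root = ET.Element("contact")
    for address in csv_text.split(","):
        email = ET.SubElement(root, "email")
        ET.SubElement(email, "smtp-address").text = address.strip()
    return root


csv_list = to_csv(KOLAB_XML)
edited = csv_list.replace("work@example.org", "work@example.com")
rebuilt = from_csv(edited)

# Neither <display-name> nor the unknown <type> survives the round trip.
lost_types = [e.find("type") for e in rebuilt.iter("email")]
```

After the edit, the exporter has nothing left to decide which entry the
"work" type belonged to, which is exactly the problem discussed above.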

>
> The way out is to store all unprocessable / undisplayable information in a
> covered manner (i.e. protected and somewhat hidden; we called it "Kolab
> store" as a working name [1]) when converting into the internal data
> format.  This data can still be used to differentiate entries by comparing
> the strings (although uninterpretable) to see if they differ (or by
> comparing hashes of them).  When converting back to Kolab-XML, this covered
> information has to be uncovered by writing it back to Kolab-XML unaltered.
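
For reference, the scheme described above could be sketched roughly as
follows in Python. This is not the evolution-kolab implementation; the
names KNOWN_TAGS, import_email, store_fingerprint and export_email are
hypothetical, and the sketch does not try to preserve the original
order of known and unknown children.

```python
import hashlib
import xml.etree.ElementTree as ET

# Tags this hypothetical client understands (assumption for the sketch).
KNOWN_TAGS = ("display-name", "smtp-address")


def import_email(node):
    """Split one <email> node into known fields and an opaque store."""
    known = {c.tag: (c.text or "") for c in node if c.tag in KNOWN_TAGS}
    # Unknown children are kept verbatim as serialized XML strings.
    store = [
        ET.tostring(c, encoding="unicode").strip()
        for c in node
        if c.tag not in KNOWN_TAGS
    ]
    return {"known": known, "store": store}


def store_fingerprint(entry):
    """Hash of the opaque store; it differs whenever the hidden data differs."""
    return hashlib.sha256("".join(entry["store"]).encode("utf-8")).hexdigest()


def export_email(entry):
    """Rebuild the <email> node and write the store back unaltered."""
    node = ET.Element("email")
    for tag in KNOWN_TAGS:
        if tag in entry["known"]:
            ET.SubElement(node, tag).text = entry["known"][tag]
    for raw in entry["store"]:
        node.append(ET.fromstring(raw))
    return node


private = import_email(ET.fromstring(
    "<email><display-name>Mr. Test</display-name>"
    "<smtp-address>test@example.org</smtp-address>"
    "<type>private</type></email>"))
work = import_email(ET.fromstring(
    "<email><display-name>Mr. Test</display-name>"
    "<smtp-address>work@example.org</smtp-address>"
    "<type>work</type></email>"))

# The client only touches the known field; the store travels along untouched.
work["known"]["smtp-address"] = "work@example.com"
exported = export_email(work)
```

The two entries stay distinguishable through store_fingerprint() even
though the client never interprets <type>.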

Hm, that sounds somewhat cumbersome. I would say that you already have
that "Kolab store" and don't need to implement it: the XML itself. It
contains all the information, and I don't see a need to rewrite the whole
document. The only problem I still have with that approach is the unique
ID needed for each node/tag that may occur multiple times. But I still
believe it shouldn't be too hard to generate.
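
A rough sketch of what I have in mind, in Python. The positional IDs
("email#0", "email#1", ...) and the helper names are assumptions made up
for this example; a real client might generate its IDs differently. The
<contact> wrapper is assumed as well.

```python
import xml.etree.ElementTree as ET


def index_repeated(root, tag):
    """Map a client-side ID to each occurrence of a repeatable tag.

    The IDs are positional and exist only in memory; nothing is added
    to the XML itself.
    """
    return {f"{tag}#{i}": node for i, node in enumerate(root.findall(tag))}


def set_child_text(nodes, node_id, child_tag, value):
    """Change one known child of the addressed node; leave the rest alone."""
    child = nodes[node_id].find(child_tag)
    if child is None:
        child = ET.SubElement(nodes[node_id], child_tag)
    child.text = value


root = ET.fromstring(
    "<contact>"
    "<email><display-name>Mr. Test</display-name>"
    "<smtp-address>test@example.org</smtp-address>"
    "<type>private</type></email>"
    "<email><display-name>Mr. Test</display-name>"
    "<smtp-address>work@example.org</smtp-address>"
    "<type>work</type></email>"
    "</contact>"
)

emails = index_repeated(root, "email")
# The UI reports which entry was edited by its ID, not by its content.
set_child_text(emails, "email#1", "smtp-address", "work@example.com")
result = ET.tostring(root, encoding="unicode")
```

Since only the addressed node is modified in the parsed document, the
unknown <type> tags of both entries are written back without the client
ever having to interpret them.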

Cheers,

Gunnar

>
> Deeper analysis [1] and a sample implementation [2] of this scheme are
> provided by the evolution-kolab project, even though we still drop some
> information (not everything is put into the "Kolab store") as the current
> Kolab-format specification is not 100% concise on this topic.
>
>
> Cheers
>  	Florian
>
> References:
>
> [1]
> <http://sourceforge.net/apps/mediawiki/evolution-kolab/index.php?title=Conversion_Issues#Kolab_Store>
>
> [2]
> <http://evolution-kolab.git.sourceforge.net/git/gitweb.cgi?p=evolution-kolab/evolution-kolab;a=tree;f=src/libekolabconv/main;hb=HEAD>

-- 
Core Developer
The Horde Project

e: wrobel at horde.org
t: +49 700 6245 0000
w: http://www.horde.org

pgp: 9703 43BE
tweets: http://twitter.com/pardus_de
blog: http://log.pardus.de