Update: On "Preserving unknown XML-tags and their content" in the Kolab-Format specification

Florian v. Samson florian.samson at bsi.bund.de
Mon Jun 20 10:01:19 CEST 2011


On "Preserving unknown XML-tags and their content" in the Kolab-Format 
specification, v0.2 (2011-06-19) 


Some thoughts and suggestions on preserving unknown (i.e. future) XML-tags 
and their content in the Kolab-Format specification (after some 
brainstorming in private emails and phone-calls):

1. Kolab-clients SHOULD retain all unknown XML-tags and their enclosed (i.e. 
embedded within the opening and closing tag) content, regardless of their 
position in the XML-tag-tree and of their content (e.g. sub-tags); but all 
Kolab-clients MUST retain all top-level XML-tags along with their full 
content within a Kolab-object. 
{Some argue that only the latter statement should be put into the 
Kolab-Format specification.  I do see the hurdles for various clients to 
implement the preservation of *all* unknown XML-tags regardless of their 
position in the XML-tree, but strongly believe that limiting 
tag-preservation to top-level XML-tags only vastly reduces the 
extensibility of the Kolab-Format, as mixed environments with both old and 
new clients then range from extremely problematic to impossible.  Hence the 
rather weak "SHOULD" (for this kind of statement), instead of a "MUST", but 
I think the aim and general direction must be made absolutely clear.}

1b. If an XML-tag appears multiple times within one Kolab-XML-object, ALL 
occurrences of this XML-tag MUST be preserved.  
{Currently some clients preserve only the first occurrence, discarding all 
other occurrences.  Explicitly forbidding this behaviour simplifies some 
things.  More detailed reasoning can be requested from Hendrik.}
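
For illustration only, here is a minimal Python sketch of one way a client 
could satisfy rules 1 and 1b: edit the parsed tree in place instead of 
rebuilding the object from the fields it knows.  The tag names (note, 
summary, x-future, child) and the helper update_summary are hypothetical and 
not taken from the Kolab-Format specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical Kolab object: <x-future> is unknown to this client and
# appears twice, once with a nested sub-tag.
SOURCE = """<note version="1.0">
  <summary>Old title</summary>
  <x-future>first</x-future>
  <x-future>second<child>nested</child></x-future>
</note>"""

KNOWN_TAGS = {"summary"}  # the only tag this hypothetical client understands


def update_summary(xml_text: str, new_summary: str) -> str:
    """Change the known <summary> field and leave every other tag untouched."""
    root = ET.fromstring(xml_text)
    node = root.find("summary")
    if node is None:
        node = ET.SubElement(root, "summary")
    node.text = new_summary
    # Serialising the same tree writes back all unknown tags, including
    # repeated occurrences and their sub-tags.
    return ET.tostring(root, encoding="unicode")


result = update_summary(SOURCE, "New title")
reparsed = ET.fromstring(result)
unknown = [child for child in reparsed if child.tag not in KNOWN_TAGS]
```

Because the client never copies fields into a fresh document, both 
occurrences of the unknown tag survive the edit, which is the behaviour 
rule 1b asks for.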

2. IMHO a non-issue: Possible semantic issues.  But in order to avoid them, 
clear guidance for future extensions (i.e. XML-tags) to the Kolab-Format 
is needed.
{Bernhard pointed out semantic issues, when old clients hit newly defined 
XML sub-tags (i.e. unknown to them), embedded in existing, known XML-tags, 
and such a client alters other known fields.  Yes, the content of the 
unknown (hence untouched) XML-tags may become out of sync in relation to 
known tags which may be altered by the old client, 
but I think ...
 I. ... it is easily overcome by defining the XML-tags and their content 
properly, so they do not contain duplicate or dependent information.  In 
the case of orthogonal (i.e. completely unrelated/uncorrelated) information, 
this is a non-issue.
 II. ... furthermore, that is a minor issue even without such carefully 
specified XML-tags, compared to limiting the extensibility of the 
Kolab-Format to top-level XML-tags only, when rule 1 (above) is mandatory.
}
   a. Duplicate information in XML-tag definitions
      The contents of XML-tags MUST NOT contain duplicate information (even 
if it is expressed in different ways), except for nested (sub-) tags, which 
SHOULD NOT contain information which is already provided by a tag's content 
at a higher level in the XML-tree.
   b. Dependent information in XML-tag definitions
      The contents of XML-tags MUST NOT contain dependent information (even 
if it is expressed in different ways), except for adjacent XML-tags (i.e. 
on the same level in the XML-tree), which MAY contain dependent information 
(although this is not recommended for new tags).  Nested (sub-) tags (i.e. 
on a lower level in the XML-tree) implicitly contain information which 
depends on the higher-level tags they are embedded in. 
      {I would love to see the "adjacent tags"-clause taken out, but am 
unsure whether that is feasible.}
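
To make the out-of-sync concern concrete, here is a small Python sketch 
under invented assumptions: the tags event, start-date, end-date and 
duration-days are hypothetical, and duration-days stands for a newer, 
dependent tag which an old client does not know.  The old client preserves 
the unknown tag as required, yet the stored value goes stale once a known 
field changes.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical object: <duration-days> depends on the two date tags and is
# unknown to the old client below.
SOURCE = """<event>
  <start-date>2011-06-20</start-date>
  <end-date>2011-06-22</end-date>
  <duration-days>3</duration-days>
</event>"""


def old_client_set_end(xml_text: str, new_end: str) -> str:
    """Old client: edits a known field, preserves the unknown tag verbatim."""
    root = ET.fromstring(xml_text)
    root.find("end-date").text = new_end
    return ET.tostring(root, encoding="unicode")


edited = ET.fromstring(old_client_set_end(SOURCE, "2011-06-25"))

start = date.fromisoformat(edited.find("start-date").text)
end = date.fromisoformat(edited.find("end-date").text)

stored = int(edited.find("duration-days").text)  # preserved, but now stale
actual = (end - start).days + 1                  # inclusive day count
```

Here stored and actual no longer agree, although the old client followed 
the preservation rules.  An extension that carried only orthogonal 
information would not have this problem.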

3. The depth of nesting XML-tags MUST be limited to 7 levels at most. 
{A suggestion from Bernhard in order to limit the size of and parsing 
efforts for Kolab-objects, which otherwise may be used for 
Denial-of-Service attacks.  I picked the "7" as an educated guess (i.e. 
some considerations done, but not at all exhaustive).}
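
A minimal Python sketch of how a client might enforce such a limit while 
parsing, so that an overly deep object is rejected before it is read 
completely.  It assumes that the root tag counts as level 1, which the rule 
above does not define; the function name within_depth_limit is made up for 
this example.

```python
import io
import xml.etree.ElementTree as ET

MAX_DEPTH = 7  # the limit proposed in rule 3


def within_depth_limit(xml_text: str, limit: int = MAX_DEPTH) -> bool:
    """Return False as soon as a tag is nested deeper than `limit` levels."""
    depth = 0
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, _element in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1  # assumption: the root tag is level 1
            if depth > limit:
                return False
        else:
            depth -= 1
    return True
```

For example, within_depth_limit("<a><b/></a>") accepts the object, while an 
object with eight nested tags is rejected.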


a. Please discuss!

b. Do we need similar, but separate statements for XML-attributes?


Cheers
	Florian