Update: On "Preserving unknown XML-tags and their content" in the Kolab-Format specification

Fri Jul 15 05:47:50 CEST 2011

Hi Florian,

Quoting "Florian v. Samson" <florian.samson at bsi.bund.de>:

> Hi Georg,
>
>
> Am Montag, 20. Juni 2011 um 10:28:12 schrieb Georg C. F. Greve:
>>
>> Thanks a lot for picking this up and driving the discussion forward!
>
> Well, rather a "sorry for the late answer" from my side, and "me driving
> this" will be a slow ride, as I am extremely busy and on vacation for the
> next month.
> Still, I will try not to let this thread go, and to continue to contribute
> to it.
>
>> On Monday 20 June 2011 10.01:19 Florian v. Samson wrote:
>> > 1. Kolab-clients SHOULD retain all unknown XML-tags and their embraced
>> > (i.e. embedded within the opening and closing tag) content, regardless
>> > of their position in the XML-tag-tree and of their content (e.g.
>> > sub-tags); but all Kolab-clients MUST retain all top-level XML-tags
>> > along with their full content within a Kolab-object.
>>
>> This would be okay for me. What was your argument for choosing the
>> slightly less forceful "should" instead of "must"?
>
> Hurdles for implementing this in multiple existing client libraries.
>
>> Where would you agree that not preserving tags is reasonable?
>
> Good question, which led me to this conclusion:
> As the already accepted KEPs demand quite invasive code (and partially
> design) changes in the various client libraries that implement the
> interpretation and generation of Kolab-XML objects (the explanatory text
> in the v0.2 proposal actually stated this), and multiple proposed KEPs
> reinforce this tendency, my answer is "nowhere".
>
> Hence I changed this to MUST in the recent draft v0.3 (attached) and
> eliminated the now superfluous special cases and comments on that.
>
>> > 1b. If an XML-tag appears multiple times within one Kolab-XML-object,
>> > ALL occurrences of this XML-tag MUST be preserved.
>>
>> Ack.

There is one area where I consider retaining unknown tags less trivial,
and I would like to provide an example.

Consider the

{<email>
   <display-name>(string, default empty)</display-name>
   <smtp-address>(string, default empty)</smtp-address>
</email>}

tag from http://kolab.org/doc/kolabformat-2.0-html/c295.html.

Assume I'm using a client that finds the following information in the  
XML part:

<email>
   <display-name>Mr. Test</display-name>
   <smtp-address>test at example.org</smtp-address>
   <type>private</type>
</email>
<email>
   <display-name>Mr. Test</display-name>
   <smtp-address>work at example.org</smtp-address>
   <type>work</type>
</email>

with "<type>" being a non-standard format addition unknown to my Kolab client.

Now I want to change "work at example.org" to "work at example.com".

Assuming my client represents the mail addresses as a simple
comma-delimited list:

# E-Mail addresses (separate by comma): "test at example.org, work at example.org"

Now I modify this to

# E-Mail addresses (separate by comma): "test at example.org, work at example.com"

In that case it will be pretty hard to determine which mail address
the user actually touched: it is not obvious whether the user changed
an address or removed one and added a new one. So retaining the
"<type>work</type>" tag will be hard, and the UI of the client is part
of the problem here and would need to be modified.
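To make the problem concrete, here is a minimal Python sketch of one naive merge strategy a client could use. The function name merge_addresses and the rule of matching by unchanged address are my own assumptions for illustration, not anything the format specifies: an existing <email> element, including its unknown sub-tags, is reused only if its address reappears unchanged in the edited list.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact fragment; <type> is the unknown, non-standard sub-tag.
CONTACT = """<contact>
  <email>
    <display-name>Mr. Test</display-name>
    <smtp-address>test@example.org</smtp-address>
    <type>private</type>
  </email>
  <email>
    <display-name>Mr. Test</display-name>
    <smtp-address>work@example.org</smtp-address>
    <type>work</type>
  </email>
</contact>"""


def merge_addresses(contact, edited_list):
    """Rebuild the <email> children from a comma-separated UI string.

    Existing <email> elements (with all unknown sub-tags) are reused only
    when their <smtp-address> appears unchanged in the edited list; any
    other address gets a fresh element containing just the known tags.
    """
    by_address = {e.findtext("smtp-address"): e for e in contact.findall("email")}
    for old in contact.findall("email"):
        contact.remove(old)
    for address in (a.strip() for a in edited_list.split(",") if a.strip()):
        element = by_address.get(address)
        if element is None:
            # Not in the old data: edited or newly added? The UI can't tell.
            element = ET.Element("email")
            ET.SubElement(element, "display-name")
            ET.SubElement(element, "smtp-address").text = address
        contact.append(element)
    return contact


contact = merge_addresses(ET.fromstring(CONTACT),
                          "test@example.org, work@example.com")
for email in contact.findall("email"):
    print(email.findtext("smtp-address"), email.findtext("type"))
# test@example.org private
# work@example.com None
```

The edited address comes back without its <type>, which is exactly the ambiguity described above. Matching by position instead would break as soon as the user reorders or deletes entries.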

I just wanted to provide the example to demonstrate that the suggested
rules affect not only the format parsing/writing but also the UI. And I
believe that rule "1b." is less trivial than it seems for most Kolab
format attributes that can (already by definition) occur multiple times.

In general I think it would be great to provide a few "hard" examples
for the rules the text suggests: an "acid" test for the Kolab format,
something that can easily be poured into unit tests on the code side
of life.
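As a starting point, such an acid test could look roughly like the following Python unittest sketch. The input document, the stand-in client function client_update_display_names and the test names are all hypothetical; a real test would call into the respective client library instead of ElementTree.

```python
import unittest
import xml.etree.ElementTree as ET

# Hypothetical acid input: an unknown sub-tag (<type>) inside a known,
# repeatable tag, and an unknown top-level tag (<x-unknown>).
ACID_CONTACT = """<contact version="1.0">
  <email>
    <display-name>Mr. Test</display-name>
    <smtp-address>test@example.org</smtp-address>
    <type>private</type>
  </email>
  <email>
    <display-name>Mr. Test</display-name>
    <smtp-address>work@example.org</smtp-address>
    <type>work</type>
  </email>
  <x-unknown level="top">keep <b>me</b></x-unknown>
</contact>"""


def client_update_display_names(xml_text, new_name):
    """Stand-in for a client library: edits a known field, keeps the rest."""
    root = ET.fromstring(xml_text)
    for name in root.iter("display-name"):
        name.text = new_name
    return ET.tostring(root, encoding="unicode")


class PreserveUnknownTags(unittest.TestCase):
    def setUp(self):
        self.before = ET.fromstring(ACID_CONTACT)
        self.after = ET.fromstring(
            client_update_display_names(ACID_CONTACT, "Mrs. Test"))

    def test_known_field_was_changed(self):
        names = [e.text for e in self.after.iter("display-name")]
        self.assertEqual(names, ["Mrs. Test", "Mrs. Test"])

    def test_unknown_sub_tags_survive_in_every_occurrence(self):
        # Rule 1b: each <email> keeps its own <type>, in the original order.
        types = [e.findtext("type") for e in self.after.findall("email")]
        self.assertEqual(types, ["private", "work"])

    def test_unknown_top_level_tag_survives_verbatim(self):
        self.assertEqual(ET.tostring(self.before.find("x-unknown")),
                         ET.tostring(self.after.find("x-unknown")))
```

Saved as a file, this runs with "python -m unittest <file>"; the same input documents could then be shared across all client libraries.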

>>
>> > 2. IMHO a non-issue: possible semantic issues. But in order to avoid
>> > them, clear guidance for future extensions (i.e. XML-tags) to the
>> > Kolab-format is needed.
>> > {Bernhard pointed out semantic issues when old clients hit newly
>> > defined XML sub-tags (i.e. unknown to them), embedded in existing,
>> > known XML-tags, and such a client alters other known fields.  Yes, the
>> > content of the unknown (hence untouched) XML-tags may become out of
>> > sync in relation to known tags which may be altered by the old client,
>> > but I think ...
>> >  I. ... it is easily overcome by defining the XML-tags and their
>> > content properly, so they do not contain duplicate or dependent
>> > information. In case of orthogonal (i.e. completely
>> > unrelated/uncorrelated) information, this is a non-issue.
>> >  II. ... furthermore, that is a minor issue even without such carefully
>> > specified XML-tags, compared to limiting the extensibility of the
>> > Kolab-Format to top-level XML-tags only, when rule 1 (above) is
>> > mandatory.
>>
>> Also, a question one may ask is in which way the alternative (namely,
>> losing information) is preferable.
>
> IMHO "never", as ...
>
>> It may be possible to "fix" an object that was modified by an older,
>> unaware client. It is usually not possible to re-generate information.
>
> .. you nicely pointed out.
>
> Draft text for guidance provided in v0.3 (attached).
>
>> > 3. The depth of nesting XML-tags MUST be limited to 7 levels at most.
>> > {A suggestion from Bernhard in order to limit the size of and parsing
>> > efforts for Kolab-objects, which otherwise may be used for
>> > Denial-of-Service attacks.  I picked the "7" as an educated guess (i.e.
>> > some considerations done, but not at all exhaustive).}
>>
>> I would be interested in the threat scenario.

Me as well here. And to stay in line with what I said above: do you
have an example of something that would immediately kill the parser? I
just tested a high nesting level in PHP and immediately got:

DOMDocument::loadXML(): Excessive depth in document: 256 use  
XML_PARSE_HUGE option in Entity, line: 258 in Command line code on  
line 1

So excessive nesting by itself wouldn't kill the Kolab web client at
the moment. And I would assume that similar security measures are
available in the XML libraries of other languages as well.
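Whether or not the spec mandates it, an explicit check in the client is cheap. A minimal Python sketch, assuming the limit of 7 levels proposed in rule 3 (parse_with_depth_limit and NestingTooDeep are made-up names, and ElementTree is used only for illustration): it parses incrementally and aborts as soon as the limit is exceeded instead of parsing the whole document.

```python
import io
import xml.etree.ElementTree as ET

MAX_DEPTH = 7  # the limit proposed in rule 3 of the draft


class NestingTooDeep(ValueError):
    """Raised when an object nests deeper than the allowed limit."""


def parse_with_depth_limit(data: bytes, max_depth: int = MAX_DEPTH) -> ET.Element:
    """Parse incrementally and abort as soon as nesting exceeds max_depth."""
    depth = 0
    root = None
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem  # the first start event belongs to the root element
            if depth > max_depth:
                raise NestingTooDeep(f"nesting depth {depth} exceeds {max_depth}")
        else:
            depth -= 1
    return root


ok = parse_with_depth_limit(b"<contact><email><type>work</type></email></contact>")
print(ok.tag)  # contact

bomb = b"<a>" * 10_000 + b"</a>" * 10_000
try:
    parse_with_depth_limit(bomb)
except NestingTooDeep as err:
    print(err)  # nesting depth 8 exceeds 7
```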

So do we really need to specify something in the format specs?

Cheers,

Gunnar

>
> Denial of Service for all clients accessing and interpreting that
> Kolab-object.
>
>> The storage format is accessed only by authorized clients,
>
> Uh, who "authorises" Kolab-clients?
>
>> so clients that have been authorized by the user to access the
>> information.
>
> No, definitely not.
> - Step 1: A malicious Kolab-user creates a few Kolab-XML bombs, each with a
> nesting level > 10,000 (those Kolab-objects will be a few tens of KB in
> size, but alternatively one could use more, smaller Kolab-objects)
> and puts them into his shared IMAP-folder, to which he grants all users on
> this Kolab-server access.
> - Step 2: All users on this Kolab-server will be DoSed on their next sync,
> regardless of the individual client used.
>
> Hence, no "authorised Kolab-client" involved, and not even a malicious one.
> The malicious Kolab-user can upload the forged Kolab-object and set the
> access rights on this IMAP-folder "List & Read" for all IMAP-users with any
> IMAP-tool or -client, including those which do not understand Kolab-XML.
>
>> If the client is malicious - which seems to be assumed here -
>
> Not necessarily, see above.
>
>> I don't see how an XML nesting bomb is the most likely scenario when
>> the client might as well delete/falsify/make inconsistent the entire
>> object/database.
>
> Ugh, that is way more complicated: why should an attacker take that route
> when there is an easier one?
> And eliminating the easiest attack vector (which is also the easiest one
> to eliminate) is always a step that makes formerly harder attack vectors
> the easiest ones, and thus the interesting ones for attackers.
>
>> > b. Do we need similar but separate statements for XML-attributes?
>>
>> Yes, imho. But I am not sure it would need to be separate, the two seem
>> closely enough related.
>
> Can you (or someone) please come up with a suggestion (i.e. a paragraph WRT
> XML-attributes) that fits somewhere in this context and the draft text?
>
>
> Cheers
> 	Florian

-- 
Core Developer
The Horde Project

e: wrobel at horde.org
t: +49 700 6245 0000
w: http://www.horde.org

pgp: 9703 43BE
tweets: http://twitter.com/pardus_de
blog: http://log.pardus.de