Question: Individual annotations vs One large annotation (conceptual riddle for the interested)

Thu Sep 15 15:47:29 CEST 2011

Quoting "Jeroen van Meeuwen (Kolab Systems)" <vanmeeuwen at kolabsys.com>:

> On 15.09.2011 08:25, Georg C. F. Greve wrote:
>> Hi all,
>>
>> In the context of drafting KEPs #12 [1] and #15 [2] there has been a
>> suggestion made by Jeroen which thus far has gone undiscussed and
>> probably was missed by some people. I think it warrants more
>> brainspace as it is a basic decision as to which direction we want to
>> take not just for KEPs 12 & 15, but also later, subsequent KEPs.
>>
>
> Thanks Georg, for bringing this back to the attention of a broader
> audience.
>
>> The suggestion was to do an additional KEP to define a folder
>> annotation '/vendor/kolab/folder-config' which would be a JSON encoded
>> array that different KEPs could store different values into.
>> (...snip...)
>>
>> (...snip...)
>>
>> The rationale for Jeroen's proposal as far as I understood it was:
>>
>> 	(a) new annotations need to be defined in the imap server, which is
>> 		considered an 'expensive' step
>>
>> 	(b) more efficiency in storing annotations
>> 		(things get stored on disk in the end)
>>
>> 	(c) with one read of the annotation, a client gets a lot of
>> information
>>
>> 	(d) because it is stored in JSON, it is still searchable
>>
>> (Jeroen: please complement. I am trying to do your proposal justice,
>> but it is hard when it is not your own proposal, so please feel free
>> to correct any mistakes that I may have made in presentation or
>> substance.)
>>
>> Personally I have thus far been thinking that individual annotations
>> are likely a better path, because
>>
>> 	(a) annotations are defined not ad-hoc by admins during
>> installation,
>> 		but just one configuration of many for each Kolab server version,
>> 		so defining them did not seem *that* expensive
>>
>
> The "cost" is not when /etc/imapd.annotations.conf needs to be altered,
> *if* the consumer has not edited said file. The *cost* is implied when
> the consumer has a copy of that file that is modified outside of package
> management - in which case proper packaging methods will not want to
> alter the file's contents.
>
> The *cost* is perceived to be in;
>
>    - Updating the configuration file when deploying an update/upgrade to
> Kolab, if said configuration file or its filesystem metadata has been
> changed in any way (touch will do the trick),

We currently already ship this as a template in  
/kolab/etc/kolab/templates/imapd.annotation_definitions.template

So at least at the moment we already intend this to be a file the user  
should be able to modify.

>
>    - Documenting the ability to opt-out of features by removing the
> annotation, and documenting opting in, including all combinations of
> annotation keys and values, troubleshooting for and resolving issues
> with clients that may or may not assume a certain set of annotations to
> (not) be available,

As Georg pointed out, that seems easier to do with multiple
annotations.

>
>    -
>
>> 	(b) there is no danger of write-conflicts, e.g. one client storing
>> 'color'
>> 		while another client is storing 'search' is a scenario where at
>> 		least in theory it is possible to lose an unrelated edit of a
>> 		different property based on the timing of the two writes, which
>> seems
>>  		worse than two clients editing the same property, and the last
>>  		write winning, which is what would happen if we have one
>> annotation
>>  		per property/use-case
>>
>
> For the overwriting part, it is a relatively simple clause to, on a
> single annotation, preserve the existing contents;
>
>    $json = get_annotation('folder-config')
>    // Check for potential conflicts
>    $json[key] = value
>    set_annotation($json)

This will be easier and cost less if the annotation values are smaller.
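
To make the cost concrete, here is a minimal Python sketch of that
read-modify-write cycle on a single JSON annotation. The in-memory
`store` dict and the `get_annotation`/`set_annotation` helpers are
hypothetical stand-ins for the real server calls, and the sketch assumes
no lock is held between the read and the write. Under that assumption
two clients touching unrelated keys can still lose an edit:

```python
import json

NAME = "/vendor/kolab/folder-config"

# Hypothetical in-memory stand-in for the server-side annotation store.
store = {NAME: json.dumps({"color": "red"})}

def get_annotation(name):
    """Return the decoded JSON object stored under `name` (empty if unset)."""
    return json.loads(store.get(name, "{}"))

def set_annotation(name, obj):
    """Overwrite the whole annotation value with the encoded object."""
    store[name] = json.dumps(obj)

# Both clients read the same state before either one writes.
a = get_annotation(NAME)
b = get_annotation(NAME)

a["color"] = "blue"           # client A changes the color
b["search"] = {"q": "todo"}   # client B adds an unrelated key

set_annotation(NAME, a)
set_annotation(NAME, b)       # B's stale copy overwrites A's color change

result = get_annotation(NAME)
```

After the second write, `result` still carries the old color next to the
new 'search' key. The whole value also travels over the wire twice per
update, which is why smaller values make this cheaper.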

>
> With multiple annotations however, such would look like:

I would expect clients to write only a single annotation at a time.
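
As a rough illustration of what I mean, assuming one annotation per
property: the `store` dict and `set_annotation` helper below are
hypothetical stand-ins for the server, and '/vendor/kolab/color' is an
invented annotation name used only for this sketch.

```python
# Hypothetical in-memory store holding one annotation per property.
store = {"/vendor/kolab/color": "red"}

def set_annotation(name, value):
    """Write exactly one annotation; other annotations are untouched."""
    store[name] = value

set_annotation("/vendor/kolab/color", "blue")     # client A
set_annotation("/vendor/kolab/search", "q=todo")  # client B, unrelated property
set_annotation("/vendor/kolab/color", "green")    # client C, same property as A
```

The unrelated 'search' write survives, and the only edit that gets
replaced is the earlier write to the very same property (last write
wins).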

>
>    $all_annotations = get_annotation('*')
>
>    // Whatever implements get_annotation() iterates over all annotations
> it knows about, as annotation 'keys' cannot be searched or matched by
> wildcards.
>
>    // Check for potential conflicts
>    set_annotation('<something>', '<value>')
>
> Note that:
>
>    - For each annotation, the shared as well as the private values need
> to be obtained and checked for conflicts in a full-mesh topology,
> creating 2*n(n-1)/2 pairwise checks, where n is the number of
> annotations to be added to the current list,
>
>    - The one annotation is in a defined location, and can be retrieved
> with a single (client) command.
>    - With one annotation any potential conflict can be detected both
> when merely 'visiting' the folder as well as when attempting to 'alter'
> the folder, whereas with multiple annotations the retrieving of all
> annotations and values and resolving said conflicts is mandatory,
>    - When defining the layout of the JSON object, certain keywords for
> the top-level key can be used to indicate there is (potential for)
> conflict, even for those clients that are not compatible with a new
> feature or configuration item to be stored, for example;
>
>        /vendor/kolab/folder-config: {
>            'search': { <blabla> }
>          }
>
>     If a client not compatible with 'search' specifically were to be
> able to detect (potential for) conflict, it;
>
>       1) would not know to retrieve the '/vendor/kolab/search'
> annotation, but
>          - it would also not know what /vendor/kolab/folder-type
> 'search' was for, and
>          - any potentially pre-populated search data is completely
> wasted on said client.
>
>       2) would not know to retrieve the 'search' key in the
> folder-config JSON object, but
>          - would still be able to use potentially pre-populated data as
> using a 'folder-config' frees up 'folder-type' to be set to the original
> object type contained within the search folder, but admittedly this
> would allow saved and pre-populated searches for one type of objects
> only; on the other hand, presenting multiple object types as part of a
> single folder currently seems like a client implementation nightmare,
>          - it can iterate over the JSON object nonetheless, and
>          - find it should not write to the JSON object top-level keys it
> doesn't understand, and
>          - not add any additional JSON object top-level keys.
>
>          Optionally, we can provide a keyword key-value pair that will
> allow clients to determine whether or not there would be a conflict;
>
>            /vendor/kolab/folder-config: {
>                'type_search': { <blabla> }
>              }
>
>           where 'type_*' keys tell the client what type of thing is
> configured here.
>
>
>> 		(Note: This is under the assumption there is no locking mechanism
>> 		in place for folder annotations, so we cannot protect against this
>> 		scenario.)
>>
>
> There is a locking mechanism in place for folder annotations, similar
> to the locking mechanism on IMAP folders, contents and metadata such as
> flags.
>
>> 	(c) clients only need to retrieve the information they are
>> interested in,
>> 		and since the large annotation may just get fairly large with a lot
>> 		of values stored, this may become bandwidth expensive and slower
>> 		over limited connections
>>
>
> How large an annotation is exactly depends on a variety of factors
> including but not limited to the complexity and brevity of a query
> language for search, which is yet to be explored / defined.
>
> That said, however, all annotations need to be retrieved regardless,
> for both private and shared.

I don't see why. If the folder color does not matter for the current
view context, why should the current value be retrieved?
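
A small Python sketch of the difference, with made-up values: the
annotation names other than '/vendor/kolab/search' and
'/vendor/kolab/folder-config', the 500-character search payload, and the
`fetch` helper are all assumptions for illustration, not measurements.

```python
import json

# Hypothetical per-property annotations on one folder.
per_property = {
    "/vendor/kolab/color": "#ff0000",
    "/vendor/kolab/search": json.dumps({"q": "x" * 500}),
}

# The same data packed into a single folder-config value.
combined = json.dumps({"color": "#ff0000", "search": {"q": "x" * 500}})

def fetch(names):
    """Return only the requested annotations (stand-in for a server read)."""
    return {name: per_property[name] for name in names}

# A view that only cares about the color asks for just that annotation.
needed = fetch(["/vendor/kolab/color"])
bytes_selective = sum(len(value) for value in needed.values())
bytes_combined = len(combined)
```

With separate annotations the client transfers only the few bytes of the
color value, while the combined value drags the search definition along
on every read.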

>
>> But then it is possible that I am trying to preserve the wrong
>> resource in (a) and (c) and am overly cautious in (b) and that in
>> fact the "one annotation for most things" approach is better.

As is probably obvious, I'm leaning towards "separate annotations".
But I admit that I probably do not have the full picture yet. I will
follow the discussion.

Cheers,

Gunnar

>>
>> Which is why I'm counting on the collective wisdom on this list to
>> help us sort this out and come to an understanding which path we
>> should take, and why.
>>
>> Your turn.
>>
>
> Let's also think about the way we want to formulate any given 'single
> annotation' JSON object key value (i.e. the "blabla") or value of 'multi
> annotation' in either of;
>
>    /vendor/kolab/folder-config: {
>        'search': { <blabla> }
>      }
>
> or;
>
>    /vendor/kolab/folder-type: 'search'
>    /vendor/kolab/search: { <blabla> }
>
> This will need to happen in order to allow multiple clients to use,
> read and write said funkiness.
>
> Kind regards,
>
> Jeroen van Meeuwen
>

-- 
Core Developer
The Horde Project

e: wrobel at horde.org
t: +49 700 6245 0000
w: http://www.horde.org

pgp: 9703 43BE
tweets: http://twitter.com/pardus_de
blog: http://log.pardus.de