Question: Individual annotations vs One large annotation (conceptual riddle for the interested)

Thu Sep 15 12:47:49 CEST 2011

On 15.09.2011 08:25, Georg C. F. Greve wrote:
> Hi all,
>
> In the context of drafting KEPs #12 [1] and #15 [2] there has been a
> suggestion made by Jeroen which thus far has gone undiscussed and
> probably was missed by some people. I think it warrants more brainspace
> as it is a basic decision as to which direction we want to take not just
> for KEPs 12 & 15, but also later, subsequent KEPs.
>

Thanks Georg, for bringing this back to the attention of a broader 
audience.

> The suggestion was to do an additional KEP to define a folder annotation
> '/vendor/kolab/folder-config' which would be a JSON encoded array that
> different KEPs could store different values into. (...snip...)
>
> (...snip...)
>
> The rationale for Jeroen's proposal as far as I understood it was:
>
> 	(a) new annotations need to be defined in the imap server, which is
> 		considered an 'expensive' step
>
> 	(b) more efficiency in storing annotations
> 		(things get stored on disk in the end)
>
> 	(c) with one read of the annotation, a client gets a lot of
> 		information
>
> 	(d) because it is stored in JSON, it is still searchable
>
> (Jeroen: please complement. I am trying to do your proposal justice, but
> it is hard when it is not your own proposal, so please feel free to
> correct any mistakes that I may have made in presentation or substance.)
>
> Personally I have thus far been thinking that individual annotations are
> likely a better path, because
>
> 	(a) annotations are defined not ad-hoc by admins during installation,
> 		but just one configuration of many for each Kolab server version,
> 		so defining them did not seem *that* expensive
>

There is no "cost" in altering /etc/imapd.annotations.conf *if* the
consumer has not edited said file. The *cost* arises when the consumer
has a copy of that file that has been modified outside of package
management, in which case proper packaging methods will not want to
alter the file's contents.

The *cost* is perceived to be in;

   - Updating the configuration file when deploying an update/upgrade to
Kolab, if said configuration file or its filesystem metadata has been
changed in any way (touch will do the trick),

   - Documenting the ability to opt-out of features by removing the 
annotation, and documenting opting in, including all combinations of 
annotation keys and values, troubleshooting for and resolving issues 
with clients that may or may not assume a certain set of annotations to 
(not) be available,

   -

> 	(b) there is no danger of write-conflicts, e.g. one client storing
> 		'color' while another client is storing 'search' is a scenario
> 		where at least in theory it is possible to lose an unrelated edit
> 		of a different property based on the timing of the two writes,
> 		which seems worse than two clients editing the same property, and
> 		the last write winning, which is what would happen if we have one
> 		annotation per property/use-case
>

For the overwriting part, preserving the existing contents of a single
annotation takes a relatively simple clause;

   $json = get_annotation('folder-config')
   // Check for potential conflicts
   $json[$key] = $value
   set_annotation('folder-config', $json)
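
As a minimal sketch of the read-modify-write above, written in Python
rather than the pseudocode used in this mail. The in-memory `store`
dict stands in for the IMAP server, and `get_annotation`,
`set_annotation` and `update_folder_config` are hypothetical helpers,
not an existing API:

```python
import json

# Hypothetical in-memory stand-in for the server-side annotation store.
store = {}

FOLDER_CONFIG = "/vendor/kolab/folder-config"


def get_annotation(key):
    """Return the decoded JSON object stored under key, or {} if unset."""
    raw = store.get(key)
    return json.loads(raw) if raw else {}


def set_annotation(key, value):
    """Encode value as JSON and store it under key."""
    store[key] = json.dumps(value)


def update_folder_config(key, value):
    """Set one top-level key while preserving all other existing keys."""
    config = get_annotation(FOLDER_CONFIG)
    # A real client would check for potential conflicts here.
    config[key] = value
    set_annotation(FOLDER_CONFIG, config)
    return config


# Two writers storing different properties, one after the other.
update_folder_config("color", "red")
update_folder_config("search", {"query": "example"})
```

The sketch only covers sequential writes; two clients interleaving
their read and write steps would still need the server-side locking
to avoid losing an edit.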

With multiple annotations, however, this would look like:

   $all_annotations = get_annotation('*')

   // Whatever implements get_annotation() iterates over all annotations
   // it knows about, as annotation 'keys' cannot be searched or matched
   // by wildcards.

   // Check for potential conflicts
   set_annotation('<something>', '<value>')
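
A rough Python sketch of the multi-annotation case, taking the
statement above (no wildcard matching on keys) as given. The
`multi_store` dict, the list of known annotation names (including
'/vendor/kolab/color') and both helper functions are illustrative
assumptions only:

```python
# Hypothetical store: (annotation key, scope) -> value.
multi_store = {
    ("/vendor/kolab/folder-type", "shared"): "mail",
    ("/vendor/kolab/color", "private"): "red",
}

# Without wildcard matching, the client has to iterate over every
# annotation it knows about.
KNOWN_ANNOTATIONS = [
    "/vendor/kolab/folder-type",
    "/vendor/kolab/color",
    "/vendor/kolab/search",
]
SCOPES = ("shared", "private")


def get_all_annotations():
    """Collect every set value, one lookup per known key and scope."""
    found = {}
    lookups = 0
    for key in KNOWN_ANNOTATIONS:
        for scope in SCOPES:
            lookups += 1
            value = multi_store.get((key, scope))
            if value is not None:
                found[(key, scope)] = value
    return found, lookups


def set_one_annotation(key, scope, value):
    """Fetch everything first, then write a single annotation."""
    found, _ = get_all_annotations()
    # A real client would check `found` for potential conflicts here.
    multi_store[(key, scope)] = value
    return found
```

In this model the number of lookups grows with every annotation the
client knows about, times two for the shared and private values.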

Note that:

   - For each annotation, the shared as well as the private values need
to be obtained and checked for conflicts in a full-mesh topology,
creating 2*n(n-1)/2 checks, where n is the number of annotations to be
added to the current list,

   - The one annotation is in a defined location, and can be retrieved 
with a single (client) command.
   - With one annotation any potential conflict can be detected both 
when merely 'visiting' the folder as well as when attempting to 'alter' 
the folder, whereas with multiple annotations the retrieving of all 
annotations and values and resolving said conflicts is mandatory,
   - When defining the layout of the JSON object, certain keywords for 
the top-level key can be used to indicate there is (potential for) 
conflict, even for those clients that are not compatible with a new 
feature or configuration item to be stored, for example;

       /vendor/kolab/folder-config: {
           'search': { <blabla> }
         }

    If a client not specifically compatible with 'search' were to be
able to detect (potential for) conflict, it;

      1) would not know to retrieve the '/vendor/kolab/search' 
annotation, but
         - it would also not know what /vendor/kolab/folder-type 
'search' was for, and
         - any potentially pre-populated search data is completely 
wasted on said client.

      2) would not know to retrieve the 'search' key in the 
folder-config JSON object, but
         - would still be able to use potentially pre-populated data, as
using a 'folder-config' frees up 'folder-type' to be set to the original
object type contained within the search folder. Admittedly, this
would allow saved and pre-populated searches for one type of objects
only; on the other hand, presenting multiple object types as part of a
single folder currently seems like a client implementation nightmare,
         - it can iterate over the JSON object nonetheless, and
         - find it should not write to the JSON object top-level keys it 
doesn't understand, and
         - not add any additional JSON object top-level keys.

         Optionally, we can provide a keyword key-value pair that will 
allow clients to determine whether or not there would be a conflict;

           /vendor/kolab/folder-config: {
               'type_search': { <blabla> }
             }

          where 'type_*' keys tell the client what type of thing is 
configured here.
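
The client behaviour described in 2) could be sketched roughly as
follows, in Python. The set of keys the client understands
(`KNOWN_KEYS`) and both function names are hypothetical; only the
'type_' prefix and the rule of leaving unknown top-level keys alone
come from the text above:

```python
# Top-level keys this hypothetical client knows how to handle.
KNOWN_KEYS = {"color"}


def inspect_folder_config(config):
    """Split top-level keys into known, unknown, and 'type_*' markers."""
    known, unknown, types = {}, [], []
    for key, value in config.items():
        if key.startswith("type_"):
            # 'type_search' tells the client a 'search' thing lives here.
            types.append(key[len("type_"):])
        elif key in KNOWN_KEYS:
            known[key] = value
        else:
            unknown.append(key)
    return known, unknown, types


def safe_update(config, key, value):
    """Return a copy with key set; refuse keys the client does not know."""
    if key not in KNOWN_KEYS:
        raise ValueError(f"refusing to write unknown top-level key: {key}")
    updated = dict(config)  # unknown keys are carried over untouched
    updated[key] = value
    return updated
```

A client written this way can still iterate over the whole object and
notice what else is configured, without needing to understand it.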

> 		(Note: This is under the assumption there is no locking mechanism
> 		in place for folder annotations, so we cannot protect against this
> 		scenario.)
>

There is a locking mechanism in place for folder annotations, similar 
to the locking mechanism on IMAP folders, contents and metadata such as 
flags.

> 	(c) clients only need to retrieve the information they are interested
> 		in, and since the large annotation may just get fairly large with
> 		a lot of values stored, this may become bandwidth expensive and
> 		slower over limited connections
>

How large an annotation is exactly depends on a variety of factors 
including but not limited to the complexity and brevity of a query 
language for search, which is yet to be explored / defined.

That said, however, all annotations need to be retrieved regardless, 
for both private and shared.

> But then it is possible that I am trying to preserve the wrong resource
> in (a) and (c) and am overly cautious in (b) and that in fact the "one
> annotation for most things" approach is better.
>
> Which is why I'm counting on the collective wisdom on this list to help
> us sort this out and come to an understanding which path we should take,
> and why.
>
> Your turn.
>

Let's also think about the way we want to formulate any given 'single 
annotation' JSON object key value (i.e. the "blabla") or value of 'multi 
annotation' in either of;

   /vendor/kolab/folder-config: {
       'search': { <blabla> }
     }

or;

   /vendor/kolab/folder-type: 'search'
   /vendor/kolab/search: { <blabla> }

This will need to happen in order to allow multiple clients to use, 
read and write said funkiness.
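
To make the two layouts above concrete, here is a small Python sketch
that builds both and reads the search definition back out of each.
The payload is a placeholder, since the actual format of the "blabla"
is yet to be defined, and the two reader functions are illustrative
assumptions:

```python
import json

# Placeholder payload; the real search definition format is undefined.
search_definition = {"placeholder": "undefined"}

# Layout 1: one annotation holding a JSON object.
single = {
    "/vendor/kolab/folder-config": json.dumps({"search": search_definition}),
}

# Layout 2: one annotation per property.
multi = {
    "/vendor/kolab/folder-type": "search",
    "/vendor/kolab/search": json.dumps(search_definition),
}


def search_from_single(annotations):
    """Read the 'search' key out of the folder-config JSON object."""
    raw = annotations.get("/vendor/kolab/folder-config", "{}")
    return json.loads(raw).get("search")


def search_from_multi(annotations):
    """Read the dedicated search annotation if folder-type says so."""
    if annotations.get("/vendor/kolab/folder-type") != "search":
        return None
    raw = annotations.get("/vendor/kolab/search")
    return json.loads(raw) if raw else None
```

Either way, the value stored under 'search' needs the same agreed
format before multiple clients can read and write it.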

Kind regards,

Jeroen van Meeuwen

-- 
Senior Engineer, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
t: +44 144 340 9500
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08