Question: Individual annotations vs One large annotation (conceptual riddle for the interested)
Jeroen van Meeuwen (Kolab Systems)
vanmeeuwen at kolabsys.com
Thu Sep 15 12:47:49 CEST 2011
On 15.09.2011 08:25, Georg C. F. Greve wrote:
> Hi all,
>
> In the context of drafting KEPs #12 [1] and #15 [2] there has been a
> suggestion made by Jeroen which thus far has gone undiscussed and
> probably was missed by some people. I think it warrants more
> brainspace as it is a basic decision as to which direction we want to
> take not just for KEPs 12 & 15, but also later, subsequent KEPs.
>
Thanks Georg, for bringing this back to the attention of a broader
audience.
> The suggestion was to do an additional KEP to define a folder
> annotation '/vendor/kolab/folder-config' which would be a JSON encoded
> array that different KEPs could store different values into.
> (...snip...)
>
> (...snip...)
>
> The rationale for Jeroen's proposal as far as I understood it was:
>
> (a) new annotations need to be defined in the imap server, which is
> considered an 'expensive' step
>
> (b) more efficiency in storing annotations
> (things get stored on disk in the end)
>
> (c) with one read of the annotation, a client gets a lot of information
>
> (d) because it is stored in JSON, it is still searchable
>
> (Jeroen: please complement. I am trying to do your proposal justice,
> but it is hard when it is not your own proposal, so please feel free
> to correct any mistakes that I may have made in presentation or
> substance.)
>
> Personally I have thus far been thinking that individual annotations
> are likely a better path, because
>
> (a) annotations are defined not ad-hoc by admins during installation,
> but just one configuration of many for each Kolab server version,
> so defining them did not seem *that* expensive
>
The "cost" is not incurred when /etc/imapd.annotations.conf needs to be
altered *if* the consumer has not edited said file. The *cost* is
incurred when the consumer has a copy of that file that was modified
outside of package management, in which case proper packaging methods
will not want to alter the file's contents.

The *cost* is perceived to be in:

- Updating the configuration file when deploying an update/upgrade to
Kolab, when said configuration file or its filesystem metadata has been
changed in any way (touch will do the trick),
- Documenting the ability to opt out of features by removing the
annotation, and documenting opting in, including all combinations of
annotation keys and values, as well as troubleshooting and resolving
issues with clients that may or may not assume a certain set of
annotations to (not) be available.
> (b) there is no danger of write-conflicts, e.g. one client storing
> 'color' while another client is storing 'search' is a scenario where
> at least in theory it is possible to lose an unrelated edit of a
> different property based on the timing of the two writes, which
> seems worse than two clients editing the same property, and the last
> write winning, which is what would happen if we have one annotation
> per property/use-case
>
For the overwriting part, preserving the existing contents of a single
annotation is a relatively simple clause:

    $json = get_annotation('folder-config');
    // Check for potential conflicts
    $json[$key] = $value;
    set_annotation('folder-config', $json);

With multiple annotations however, such would look like:

    $all_annotations = get_annotation('*');
    // Whatever implements get_annotation() iterates over all annotations
    // it knows about, as annotation 'keys' cannot be searched or matched
    // by wildcards.
    // Check for potential conflicts
    set_annotation('<something>', '<value>');
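
As a minimal sketch of the single-annotation case in Python: the
in-memory dict stands in for the IMAP server, get_annotation() and
set_annotation() are the same hypothetical helpers as in the pseudocode
above, and the 'query' content is a placeholder since the search format
is not defined yet.

```python
import json

# Hypothetical in-memory stand-in for the server-side annotation store.
_store = {}

FOLDER_CONFIG = '/vendor/kolab/folder-config'

def get_annotation(name):
    """Return the decoded JSON object stored under `name` (empty if unset)."""
    raw = _store.get(name)
    return json.loads(raw) if raw is not None else {}

def set_annotation(name, value):
    """Encode `value` as JSON and store it under `name`."""
    _store[name] = json.dumps(value)

def update_folder_config(key, value):
    """Read-modify-write: change one top-level key, keep all the others."""
    config = get_annotation(FOLDER_CONFIG)
    config[key] = value
    set_annotation(FOLDER_CONFIG, config)
    return config

update_folder_config('color', 'red')
result = update_folder_config('search', {'query': 'example'})
```

This only shows that unrelated keys survive a write. It does not model
two clients interleaving between the read and the write.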
Note that:
- For each annotation, the shared as well as the private values need
to be obtained and checked for conflicts in a full-mesh topology,
creating 2*n(n-1)/2 = n(n-1) checks, where n is the number of
annotations to be added to the current list,
- The one annotation is in a defined location, and can be retrieved
with a single (client) command.
- With one annotation any potential conflict can be detected both
when merely 'visiting' the folder as well as when attempting to 'alter'
the folder, whereas with multiple annotations the retrieving of all
annotations and values and resolving said conflicts is mandatory,
- When defining the layout of the JSON object, certain keywords for
the top-level key can be used to indicate there is (potential for)
conflict, even for those clients that are not compatible with a new
feature or configuration item to be stored, for example:

    /vendor/kolab/folder-config: {
        'search': { <blabla> }
    }

If a client not compatible with 'search' specifically were to be
able to detect (potential for) conflict, it:

1) with multiple annotations, would not know to retrieve the
   '/vendor/kolab/search' annotation, but
   - it would also not know what /vendor/kolab/folder-type
     'search' was for, and
   - any potentially pre-populated search data is completely
     wasted on said client.
2) with a single annotation, would not know to retrieve the 'search'
   key in the folder-config JSON object, but
   - would still be able to use potentially pre-populated data, as
     using a 'folder-config' frees up 'folder-type' to be set to the
     original object type contained within the search folder.
     Admittedly this would allow saved and pre-populated searches for
     one type of objects only; on the other hand, presenting multiple
     object types as part of a single folder currently seems like a
     client implementation nightmare,
   - it can iterate over the JSON object nonetheless, and
   - find it should not write to the JSON object top-level keys it
     doesn't understand, and
   - not add any additional JSON object top-level keys.

Optionally, we can provide a keyword key-value pair that will
allow clients to determine whether or not there would be a conflict:

    /vendor/kolab/folder-config: {
        'type_search': { <blabla> }
    }

where 'type_*' keys tell the client what type of thing is
configured here.
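
A rough Python sketch of how a client that only understands 'color'
might treat such an object. KNOWN_KEYS, both function names and the
'query' content are made up for illustration; nothing here is a
defined Kolab API.

```python
import json

# Keys this hypothetical client understands; everything else is left alone.
KNOWN_KEYS = {'color'}

def inspect_folder_config(raw):
    """Split a folder-config JSON object into known keys, unknown keys,
    and the 'type_*' markers that say what is configured here."""
    config = json.loads(raw)
    known = {k: v for k, v in config.items() if k in KNOWN_KEYS}
    unknown = sorted(k for k in config if k not in KNOWN_KEYS)
    types = sorted(k[len('type_'):] for k in config if k.startswith('type_'))
    return known, unknown, types

def write_known_key(raw, key, value):
    """Update a key the client understands; refuse to touch any other."""
    if key not in KNOWN_KEYS:
        raise KeyError(f'client does not understand {key!r}')
    config = json.loads(raw)
    config[key] = value  # unknown top-level keys are carried over untouched
    return json.dumps(config)

raw = json.dumps({'color': 'red', 'type_search': {'query': 'example'}})
known, unknown, types = inspect_folder_config(raw)
updated = json.loads(write_known_key(raw, 'color', 'blue'))
```

The client never has to know what 'search' means: it sees a 'type_*'
key it does not understand, leaves it in place, and only rewrites its
own keys.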
> (Note: This is under the assumption there is no locking mechanism
> in place for folder annotations, so we cannot protect against this
> scenario.)
>
There is a locking mechanism in place for folder annotations, similar
to the locking mechanism on IMAP folders, contents and metadata such as
flags.
> (c) clients only need to retrieve the information they are interested
> in, and since the large annotation may just get fairly large with a
> lot of values stored, this may become bandwidth expensive and slower
> over limited connections
>
How large an annotation is exactly depends on a variety of factors
including but not limited to the complexity and brevity of a query
language for search, which is yet to be explored / defined.
That said, all annotations need to be retrieved regardless, both the
private and the shared values.
> But then it is possible that I am trying to preserve the wrong
> resource in (a) and (c) and am overly cautious in (b) and that in fact
> the "one annotation for most things" approach is better.
>
> Which is why I'm counting on the collective wisdom on this list to
> help us sort this out and come to an understanding which path we
> should take, and why.
>
> Your turn.
>
Let's also think about how we want to formulate the value of any given
key in the 'single annotation' JSON object (i.e. the "blabla"), or the
value of the annotation in the 'multi annotation' case, in either of:

    /vendor/kolab/folder-config: {
        'search': { <blabla> }
    }

or:

    /vendor/kolab/folder-type: 'search'
    /vendor/kolab/search: { <blabla> }

This will need to happen in order to allow multiple clients to use,
read and write said funkiness.
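
To make the two layouts concrete, here is a small Python sketch that
reads the same value from either one. The annotations are modelled as
a plain dict of annotation name to string value, read_search_config()
is a hypothetical helper, and the 'query' content is only a
placeholder for the undefined "blabla".

```python
import json

def read_search_config(annotations):
    """Return the 'search' value from either layout, or None if absent."""
    # Single-annotation layout: one JSON object with top-level keys.
    raw = annotations.get('/vendor/kolab/folder-config')
    if raw is not None:
        config = json.loads(raw)
        if 'search' in config:
            return config['search']
    # Multi-annotation layout: folder-type names the annotation to read.
    if annotations.get('/vendor/kolab/folder-type') == 'search':
        raw = annotations.get('/vendor/kolab/search')
        if raw is not None:
            return json.loads(raw)
    return None

blabla = {'query': 'example'}  # placeholder, the real format is undefined

single = {'/vendor/kolab/folder-config': json.dumps({'search': blabla})}
multi = {
    '/vendor/kolab/folder-type': 'search',
    '/vendor/kolab/search': json.dumps(blabla),
}
```

Either way the "blabla" itself is the same JSON value, so its format
can be settled independently of which layout is chosen.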
Kind regards,
Jeroen van Meeuwen
--
Senior Engineer, Kolab Systems AG
e: vanmeeuwen at kolabsys.com
t: +44 144 340 9500
m: +44 74 2516 3817
w: http://www.kolabsys.com
pgp: 9342 BF08