Question: Individual annotations vs One large annotation (conceptual riddle for the interested)

Thu Sep 15 16:52:33 CEST 2011

On 15.09.2011 13:05, Georg C. F. Greve wrote:
> On Thursday 15 September 2011 11.47:49 Jeroen van Meeuwen wrote:
>> The "cost" is not when /etc/imapd.annotations.conf needs to be
>> altered, *if* the consumer has not edited said file. The *cost* is
>> implied when the consumer has a copy of that file that is modified
>> outside of package management - in which case proper packaging
>> methods will not want to alter the file's contents.
>
> True, although I guess we will never be at the point where we won't
> define *any* additional annotations, even the folder-config
> annotation would have to be defined.
>
> So there will always be SOME edits of the annotation file.
>

Yes, but pushing them out regularly just because there will always be 
some edits to the annotations file anyway would be a flawed 
justification to just ignore the cost as a downside of this option, 
while we have other options - so I had to point it out.

>>    - Documenting the ability to opt-out of features by removing the
>> annotation, and documenting opting in, including all combinations of
>> annotation keys and values, troubleshooting for and resolving issues
>> with clients that may or may not assume a certain set of annotations
>> to (not) be available,
>
> True.
>
> Although the ability to opt-out or block a certain feature
> installation wide in a reliable way would be an argument for the
> annotation per use case, because an annotation that is not defined
> is simply not usable, whereas a key/value pair for folder-config can
> be set regardless of whether it is meant to be set or not.
>

Opting in and out of features such as saved searches or color though 
has not been made a requirement or feature request as of yet. Should it 
become a requirement or feature request, we can run in different circles 
about its feasibility to implement either server-side or client-side and 
the value of the -then to be explored- use-cases.

>> For the overwriting part, it is a relatively simple clause to, on a
>> single annotation, preserve the existing contents;
>
> This is not the case that I was concerned about.
>
> Think of the following:
>
> 	Client A reads folder-config
>
> 	Client B reads folder-config
>
> 	Client A sets 'search' to new value
>
> 	Client B sets 'color' to new value
>
> The modification of 'search' by Client A is now lost in a way

First of all, the same argument would apply to two clients both 
operating on the separate 'search' annotation, or both on the separate 
'color' annotation, if in the same namespace.
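
For reference, the interleaving Georg describes can be reproduced with 
a few lines of Python. This is only a sketch: the in-memory dict stands 
in for the server, the JSON encoding of the folder-config value is an 
assumption (no format has been specified), and the helper names are 
made up for the illustration.

```python
import json

# Hypothetical stand-in for the server-side annotation value.
store = {"folder-config": json.dumps({"search": "old", "color": "red"})}

def read_config():
    """Fetch and decode the whole folder-config value."""
    return json.loads(store["folder-config"])

def write_config(config):
    """Encode and overwrite the whole folder-config value."""
    store["folder-config"] = json.dumps(config)

# Both clients read before either of them writes.
seen_by_a = read_config()
seen_by_b = read_config()

# Client A sets 'search' and writes back its copy.
seen_by_a["search"] = "new"
write_config(seen_by_a)

# Client B sets 'color' and writes back its (now stale) copy.
seen_by_b["color"] = "blue"
write_config(seen_by_b)

final = read_config()
print(final)  # {'search': 'old', 'color': 'blue'}
```

The same loss occurs with a separate annotation as soon as two clients 
write to the same entry from stale copies.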

Second, a more relevant concern would apply to both clients operating 
what will end up being mutually exclusive, separate annotations or 
mutually exclusive options in the values of separate annotations. The 
risk here is greater, because it takes longer to obtain all annotations 
and to parse them and detect conflicts (a 2*n(n-1)/2 mesh of 
comparisons), increasing the interval in which another client could 
supposedly change the original value of the annotation or any other 
annotation.

Third, the same would apply to client A 'deleting' a message client B 
is 'flagging' or any other combination of such.

Fourth, while the one client is writing (to the mailbox path in the 
annotations database), the other client will get a big fat NO response 
-if it attempts to write at the very same moment- as the mailbox 
annotation database would be locked for the submission of the entry by 
the one client. If the other client is stupid enough to use yesterday's 
annotations, or to carry on without polling for updates after a "NO" 
response, notwithstanding the unsolicited METADATA responses the RFC 
defines for after a client issues an ENABLE command with the METADATA 
capability keyword, that is a problem with that client.

Fifth, 'search' is more likely to be a shared annotation (value) 
whereas 'color' is most likely set once in the shared annotation (value) 
for the default, but edited in the private annotation (value).

> that would be
> fairly hard for the support department to track and resolve,

That depends on logging (verbosity) capabilities more so than on using 
separate annotations or one annotation.

>>    - For each annotation, the shared as well as the private values
>
> ...if the annotation is defined private and shared for this use case.
>

No, for each annotation, the shared as well as the private - if only 
the shared or private has been defined in the specification, it needs to 
be cleaned up. You're right if your point was that only *conflicts* need 
to be searched for in those shared and/or private annotations defined in 
the specification. That makes it an x*y(y-1)/2 mesh then, where x is one 
or two and y is the number of annotations defined.
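
To put numbers on that mesh, here is a small sketch. The function name 
is made up for the illustration; it only evaluates x*y(y-1)/2 for a few 
values of y.

```python
def pairwise_checks(annotation_count, scopes=1):
    """Return scopes * y(y-1)/2, the number of pairwise comparisons.

    scopes is 1 (shared or private only) or 2 (shared and private).
    """
    if annotation_count < 0 or scopes not in (1, 2):
        raise ValueError("expected y >= 0 and x in (1, 2)")
    return scopes * annotation_count * (annotation_count - 1) // 2

for y in (2, 5, 10):
    print(y, pairwise_checks(y, 1), pairwise_checks(y, 2))
# 2 1 2
# 5 10 20
# 10 45 90
```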

>>    - With one annotation any potential conflict can be detected both
>> when merely 'visiting' the folder as well as when attempting to
>> 'alter' the folder, whereas with multiple annotations the retrieving
>> of all annotations and values and resolving said conflicts is
>> mandatory,
>
> Actually you only need to retrieve the annotations that pertain to
> whatever it is you're planning to do, e.g. a change of color can
> ignore search.
>

Wrong, as there may be annotations a client is unaware of, that may be 
conflicting with annotations the client is planning on setting. Such can 
be easily circumvented by having the client poll for configuration of 
the folder in one location, where 'type_*' style keywords with '*' 
representing a certain capability can simply be found.

> Only with the large annotations you always must read everything.
>

Wrong, the client *retrieves* the full annotation value but only needs 
to read;

  - the top-level keys (iterate those) to find;

    - keys it understands,
    - keys it doesn't understand ('type_*' style keys?), for which it
      can derive whether or not such keys may be conflicting (naming
      convention).
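
A rough sketch of that iteration, in Python. Everything specific here 
is an assumption made for the example: the JSON encoding of the value, 
the set of keys the client understands, and 'type_' as the prefix that 
marks a potentially conflicting key.

```python
import json

# Hypothetical: the features this particular client implements.
KNOWN_KEYS = {"search", "color"}
# Assumed naming convention for keys that may conflict.
CONFLICT_PREFIX = "type_"

def classify_keys(raw_value):
    """Sort the top-level keys of a folder-config value into three lists."""
    config = json.loads(raw_value)
    understood, maybe_conflicting, ignorable = [], [], []
    for key in config:
        if key in KNOWN_KEYS:
            understood.append(key)
        elif key.startswith(CONFLICT_PREFIX):
            # Unknown key, but the name says it may conflict.
            maybe_conflicting.append(key)
        else:
            ignorable.append(key)
    return understood, maybe_conflicting, ignorable

raw = json.dumps({"color": "blue", "type_search": {"query": "x"}, "alarm": 15})
print(classify_keys(raw))
```

The client never has to interpret the values of keys it does not know, 
only their names.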

>>     If a client not compatible with 'search' specifically were to be
>> able to detect (potential for) conflict, it;
>>
>>       1) would not know to retrieve the '/vendor/kolab/search'
>> annotation, but
>>          - it would also not know what /vendor/kolab/folder-type
>> 'search' was for, and
>>          - any potentially pre-populated search data is completely
>> wasted on said client.
>
> Yes, although this is equally true for the large annotation, as this
> is about the new folder-type idea in KEP 15, and not the question
> whether or not to go with one annotation per use case or one large
> annotation.
>

Wrong, this is not equally true for the "large" annotation, as with the 
large annotation (the client now aware of where to find folder specific 
configuration) the folder-type is freed up and *can* be used for the 
original object type, with pre-population allowing said client to still 
use the saved search - note, *can* be used for the original object type, 
not *must* be used, perhaps it does have a different value, such as 
'mixed' for example.

> If a client does not know about KEP 15, the 'search' annotation and
> the 'search' value in folder-config are both equally lost on the
> client, there are no obvious advantages or disadvantages to either.
>
> As to the new folder-type, if the client does not know about KEP 15,
> it also does not know that it *MUST NEVER* change objects in a
> prepopulated folder, so it would happily allow the user to do this,
> enabling diverging datasets, inconsistencies, and lost data.
>

Changing (copies of) objects (only applicable if the folder is 
pre-populated) in saved search folders by clients not compatible with 
KEP #15 is a mixture of an implementation detail and a saved search 
folder permission problem. It is resolved by not allowing the user to 
edit the contents of said folder at all, *if* and *only if* such a 
folder is to be pre-populated at all. Instead it is being worked around 
in a fashion that makes the client ignore any folder implementing any 
of these new features -which it never does completely?

>> There is a locking mechanism in place for folder annotations,
>> similar to the locking mechanism on IMAP folders, contents and
>> metadata such as flags.
>
> How exactly does it work?
>

http://git.cyrusimap.org/cyrus-imapd/tree/imap/imapd.c#n8019

http://git.cyrusimap.org/cyrus-imapd/tree/imap/imapd.c#n8214

> I guess we then would need to specify that a client would always
> have to do
>
> 	#1: Lock
> 	#2: Re-Read
> 	#3: Modify
> 	#4: Write
> 	#5: Unlock
>
> to safely modify a folder-config annotation.

come on...; "to safely modify *any* annotation", or better yet, "to 
safely execute *any* IMAP operation".
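
For illustration only, the five steps look like this when everything 
lives in one process. The AnnotationStore class and set_key helper are 
hypothetical, and threading.Lock is merely a stand-in for locking: this 
is not how Cyrus IMAP implements it (see the links above for that).

```python
import threading

class AnnotationStore:
    """Hypothetical in-memory stand-in for a mailbox annotation database."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def update(self, entry, modify):
        with self._lock:                                 # 1: lock
            current = dict(self._values.get(entry, {}))  # 2: re-read
            modify(current)                              # 3: modify
            self._values[entry] = current                # 4: write
        # 5: unlock happens when the 'with' block is left

    def read(self, entry):
        with self._lock:
            return dict(self._values.get(entry, {}))

def set_key(key, value):
    """Build a modifier that sets one key and leaves the rest alone."""
    def modify(config):
        config[key] = value
    return modify

store = AnnotationStore()
threads = [
    threading.Thread(target=store.update,
                     args=("folder-config", set_key("search", "new"))),
    threading.Thread(target=store.update,
                     args=("folder-config", set_key("color", "blue"))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = store.read("folder-config")
print(result == {"search": "new", "color": "blue"})  # True
```

Because the re-read happens inside the lock, neither change can 
overwrite the other, whichever thread runs first.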

> The additional read is a bit of network overhead & delay, but
> probably not prohibitive in most scenarios.
>

If this were so much of a concern, I suspect we would have seen a lot 
more issues logged describing exactly this problem, both with Kolab and 
with Cyrus IMAP upstream. In fact, one could consider it an IMAP design 
flaw.

>> How large an annotation is exactly depends on a variety of factors
>> including but not limited to the complexity and brevity of a query
>> language for search, which is yet to be explored / defined.
>
> It depends on that and on what else will then go into this annotation
> in the future provided we define this as the canonical way. So I see
> potential for this annotation to grow beyond 10k easily.
>

Well, off the top of my head;

  - Identity configuration (reply/respond with 'sales at kolabsys.com' 
identity as opposed to 'greve at kolabsys.com', ...)
  - Favourite folder (boolean),
  - local subscription (per application, do we dare do this?),
  - alarm / reminder configuration,
  - z-push / active-sync,
  - horde,
  - ... (other clients)

and whatever else we can come up with that has a purpose / use-case 
valuable enough to pursue. Adding annotations for each of these 1) on 
the server, 2) in every client and 3) in the documentation is a more 
difficult process than if we were to outline a key-value pair in an 
existing annotation.

>> That said, however, all annotations need to be retrieved regardless,
>> for both private and shared.
>
> True.
>
> But only those you actually need at the time, not all of them all the
> time.
>

Again, firstly, with many annotations retrieving any annotations is 
subject to the client's understanding of which annotations are available 
and the server's understanding of what are valid annotations.

Secondly, with many annotations, in order to be able to determine 
whether or not there is a potential conflict, and in order to be able to 
determine which takes precedence when content is retrieved for display, 
all annotations will need to be retrieved.

Now, so far I've only heard the following arguments against a single 
folder-config annotation;

- simultaneous editing - exists for every single operation in IMAP,
- no ability to opt-out server-side - not a requirement / feature 
request,

And one against *potentially* preserving the original folder-type 
value;

- search results potentially having mixed object types as results - but 
'folder-type' MAY still have another annotation value such as 'mixed' or 
'random', so that clients that do not know how to display the contents 
of folders with multiple object types in it MUST / SHOULD / MAY ignore 
the folder entirely.

Am I missing something?

-- 
Kind regards,

Jeroen van Meeuwen

-- 
Senior Engineer, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
t: +44 144 340 9500
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08