Question: Individual annotations vs One large annotation (conceptual riddle for the interested)

Gunnar Wrobel wrobel at horde.org
Thu Sep 15 17:18:57 CEST 2011


Quoting "Jeroen van Meeuwen (Kolab Systems)" <vanmeeuwen at kolabsys.com>:

> On 15.09.2011 13:05, Georg C. F. Greve wrote:
>> On Thursday 15 September 2011 11.47:49 Jeroen van Meeuwen wrote:
>>> The "cost" is not when /etc/imapd.annotations.conf needs to be altered,
>>> *if* the consumer has not edited said file. The *cost* is implied when
>>> the consumer has a copy of that file that is modified outside of package
>>> management - in which case proper packaging methods will not want to
>>> alter the file's contents.
>>
>> True, although I guess we will never be at the point where we won't define
>> *any* additional annotations, even the folder-config annotation would have
>> to be defined.
>>
>> So there will always be SOME edits of the annotation file.
>>
>
> Yes, but pushing them out regularly just because there will always be
> some edits to the annotations file anyway would be a flawed
> justification to just ignore the cost as a downside of this option,
> while we have other options - so I had to point it out.
>
>>>    - Documenting the ability to opt-out of features by removing the
>>> annotation, and documenting opting in, including all combinations of
>>> annotation keys and values, troubleshooting for and resolving issues
>>> with clients that may or may not assume a certain set of annotations to
>>> (not) be available,
>>
>> True.
>>
>> Although the ability to opt-out or block a certain feature installation wide
>> in a reliable way would be an argument for the annotation per use case,
>> because an annotation that is not defined is simply not usable, whereas a
>> key/value pair for folder-config can be set regardless of whether it
>> is meant to be set or not.
>>
>
> Opting in and out of features such as saved searches or color though
> has not been made a requirement or feature request as of yet. Should it
> become a requirement or feature request, we can run in different circles
> about its feasibility to implement either server-side or client-side and
> the value of the -then to be explored- use-cases.
>
>>> For the overwriting part, it is a relatively simple clause to, on a
>>> single annotation, preserve the existing contents;
>>
>> This is not the case that I was concerned about.
>>
>> Think of the following:
>>
>> 	Client A reads folder-config
>>
>> 	Client B reads folder-config
>>
>> 	Client A sets 'search' to new value
>>
>> 	Client B sets 'color' to new value
>>
>> The modification of 'search' by Client A is now lost in a way
>
> First of all, the same argument would apply to both clients operating
> either the 'search' or the 'color' separate annotation -if in the same
> namespace.
>
> Second, a more relevant concern would apply to both clients operating
> what will end up being mutually exclusive, separate annotations or
> mutually exclusive options in the values of separate annotations. The
> risk here is greater, because it takes longer to obtain all annotations,
> 2*n(n-1)/2 parse them (and detect conflicts), increasing the interval in
> which another client could supposedly change the original value of the
> annotation or any other annotation.
>
> Third, the same would apply to client A 'deleting' a message client B
> is 'flagging' or any other combination of such.
>
> Fourth, while the one client is writing (to the mailbox path in the
> annotations database), the other client will get a big fat NO response
> -if it attempts to write at the very same moment- as the mailbox
> annotation database would be locked for the submission of entry by the
> one client. If the other client is stupid enough to use yesterday's
> annotations, or without polling for updates after a "NO" response,
> notwithstanding unsolicited METADATA responses as defined by the RFC in
> section after a client issues an ENABLE command with the METADATA
> capability keyword.
>
> Fifth, 'search' is more likely to be a shared annotation (value)
> whereas 'color' is most likely set once in the shared annotation (value)
> for the default, but edited in the private annotation (value).
>
>> that would be
>> fairly hard for the support department to track and resolve,
>
> That depends on logging (verbosity) capabilities more so than on using
> separate annotations or one annotation.
>
>>>    - For each annotation, the shared as well as the private values
>>
>> ...if the annotation is defined private and shared for this use case.
>>
>
> No, for each annotation, the shared as well as the private - if only
> the shared or private has been defined in the specification, it needs to
> be cleaned up. You're right when your point was only *conflicts* need to
> be searched for in those shared and/or private annotations defined in
> the specification. That makes it an x*y(y-1)/2 mesh then, where x is one
> or two and y is the number of annotations defined.
>
>>>    - With one annotation any potential conflict can be detected both
>>> when merely 'visiting' the folder as well as when attempting to 'alter'
>>> the folder, whereas with multiple annotations the retrieving of all
>>> annotations and values and resolving said conflicts is mandatory,
>>
>> Actually you only need to retrieve the annotations that pertain to whatever
>> it is you're planning to do, e.g. a change of color can ignore search.
>>
>
> Wrong, as there may be annotations a client is unaware of, that may be
> conflicting with annotations the client is planning on setting. Such can
> be easily circumvented by having the client poll for configuration of
> the folder in one location, where 'type_*' style keywords with '*'
> representing a certain capability can simply be found.
>
>> Only with the large annotations you always must read everything.
>>
>
> Wrong, the client *retrieves* the full annotation value but only needs
> to read;
>
>   - the top-level keys (iterate those) to find;
>
>     - keys it understands,
>     - keys it doesn't understand ('type_*' style keys?), for which it
> can derive whether or not such keys may be conflicting (naming
> convention).
>
>>>     If a client not compatible with 'search' specifically were to be
>>> able to detect (potential for) conflict, it;
>>>
>>>       1) would not know to retrieve the '/vendor/kolab/search'
>>> annotation, but
>>>          - it would also not know what /vendor/kolab/folder-type
>>> 'search' was for, and
>>>          - any potentially pre-populated search data is completely
>>> wasted on said client.
>>
>> Yes, although this is equally true for the large annotation, as this is
>> about the new folder-type idea in KEP 15, and not the question whether or
>> not to go with one annotation per use case or one large annotation.
>>
>
> Wrong, this is not equally true for the "large" annotation, as with the
> large annotation (the client now aware of where to find folder specific
> configuration) the folder-type is freed up and *can* be used for the
> original object type, with pre-population allowing said client to still
> use the saved search - note, *can* be used for the original object type,
> not *must* be used, perhaps it does have a different value, such as
> 'mixed' for example.
>
>> If a client does not know about KEP 15, the 'search' annotation and the
>> 'search' value in folder-config are both equally lost on the client, there
>> are no obvious advantages or disadvantages to either.
>>
>> As to the new folder-type, if the client does not know about KEP 15, it also
>> does not know that it *MUST NEVER* change objects in a prepopulated folder, so
>> it would happily allow the user to do this, enabling diverging datasets,
>> inconsistencies, and lost data.
>>
>
> Changing (copies of) objects (only applicable if folder is
> pre-populated) in saved search folders by clients not compatible with
> KEP #15 is a mixture of implementation detail and a saved search folder
> permission problem, resolved by restricting the user to not allow
> editing the contents of said folder at all *if* and *only if* such
> folder is to be pre-populated at all, being worked around in a fashion
> that makes any folder implementing any of these new features be ignored
> by the client -which it never does completely?
>
>>> There is a locking mechanism in place for folder annotations, similar
>>> to the locking mechanism on IMAP folders, contents and metadata such as
>>> flags.
>>
>> How exactly does it work?
>>
>
> http://git.cyrusimap.org/cyrus-imapd/tree/imap/imapd.c#n8019
>
> http://git.cyrusimap.org/cyrus-imapd/tree/imap/imapd.c#n8214
>
>> I guess we then would need to specify that a client would always have to do
>>
>> 	#1: Lock
>> 	#2: Re-Read
>> 	#3: Modify
>> 	#4: Write
>> 	#5: Unlock
>>
>> to safely modify a folder-config annotation.
>
> come on...; "to safely modify *any* annotation", or better yet, "to
> safely execute *any* IMAP operation".
>
>> The additional read is a bit of
>> network overhead & delay, but probably not prohibitive in most
>> scenarios.
>>
>
> If this is so much of a concern, I suspect we would have seen a lot
> more issues logged stating the exact problem as is described could be a
> problem, both with Kolab as well as Cyrus IMAP upstream. In fact, one
> could consider it an IMAP design flaw.
>
>>> How large an annotation is exactly depends on a variety of factors
>>> including but not limited to the complexity and brevity of a query
>>> language for search, which is yet to be explored / defined.
>>
>> It depends on that and on what else will then go into this annotation in the
>> future provided we define this as the canonical way. So I see potential for
>> this annotation to grow beyond 10k easily.
>>
>
> Well, off the top of my head;
>
>   - Identity configuration (reply/respond with 'sales at kolabsys.com'
> identity as opposed to 'greve at kolabsys.com', ...)
>   - Favourite folder (boolean),
>   - local subscription (per application, do we dare do this?),
>   - alarm / reminder configuration,
>   - z-push / active-sync,
>   - horde,
>   - ... (other clients)
>
> and whatever else we can come up with that has a purpose / use-case
> valuable enough to pursue. Adding annotations for each of these 1) on
> the server, 2) in every client and 3) in the documentation is a more
> difficult process than if we were to outline a key-value pair in an
> existing annotation.

This list is exactly what makes me think that separate annotations are
the better choice. All those values are pretty unrelated to each
other. There will be clients only supporting a few of those. Having
distinct, well-defined annotation entries seems a lot more
appropriate. And I don't think the cost of defining them is that high,
compared to the cost of implementing the corresponding features in the
clients.
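
To make the difference concrete, here is a rough sketch of the read /
read / write / write sequence Georg described above, played once against
one large annotation and once against separate entries. It is only an
illustration under stated assumptions: the server is modelled as a plain
dictionary, the large annotation is assumed to hold a JSON object, and
the entry names '/vendor/kolab/folder-config' and '/vendor/kolab/color'
are made up for the example (only '/vendor/kolab/search' appears in this
thread). No locking or re-reading is modelled.

```python
import json

# Hypothetical entry names, used for illustration only.
FOLDER_CONFIG = "/vendor/kolab/folder-config"
SEARCH = "/vendor/kolab/search"
COLOR = "/vendor/kolab/color"


def write_key_from_cached_copy(store, cached_value, key, value):
    """Change one key in a cached copy and write the whole value back."""
    config = json.loads(cached_value)
    config[key] = value
    store[FOLDER_CONFIG] = json.dumps(config, sort_keys=True)


def simulate_single_annotation():
    """Both clients read first, then each writes the full annotation."""
    initial = {"color": "red", "search": "old"}
    store = {FOLDER_CONFIG: json.dumps(initial, sort_keys=True)}

    copy_a = store[FOLDER_CONFIG]  # Client A reads folder-config
    copy_b = store[FOLDER_CONFIG]  # Client B reads folder-config

    write_key_from_cached_copy(store, copy_a, "search", "new")
    # Client B still holds the old 'search' value and writes it back.
    write_key_from_cached_copy(store, copy_b, "color", "blue")

    return json.loads(store[FOLDER_CONFIG])


def simulate_separate_annotations():
    """Each client writes only the entry it actually changes."""
    store = {SEARCH: "old", COLOR: "red"}
    store[SEARCH] = "new"   # Client A
    store[COLOR] = "blue"   # Client B
    return store


if __name__ == "__main__":
    print(simulate_single_annotation())
    print(simulate_separate_annotations())
```

In the first case the 'search' change made by Client A is overwritten by
Client B's stale copy; in the second case both changes survive because
the two clients never touch the same entry. Two clients writing the same
entry would of course still race in either layout.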

>
>>> That said, however, all annotations need to be retrieved regardless,
>>> for both private and shared.
>>
>> True.
>>
>> But only those you actually need at the time, not all of them all the time.
>>
>
> Again, firstly, with many annotations retrieving any annotations is
> subject to the client's understanding of which annotations are available
> and the server's understanding of what are valid annotations.
>
> Secondly, with many annotations, in