Question: Individual annotations vs One large annotation (conceptual riddle for the interested)

Thu Sep 15 17:18:57 CEST 2011

Quoting "Jeroen van Meeuwen (Kolab Systems)" <vanmeeuwen at kolabsys.com>:

> On 15.09.2011 13:05, Georg C. F. Greve wrote:
>> On Thursday 15 September 2011 11.47:49 Jeroen van Meeuwen wrote:
>>> The "cost" is not when /etc/imapd.annotations.conf needs to be
>>> altered, *if* the consumer has not edited said file. The *cost* is
>>> implied when the consumer has a copy of that file that is modified
>>> outside of package management - in which case proper packaging
>>> methods will not want to alter the file's contents.
>>
>> True, although I guess we will never be at the point where we won't
>> define *any* additional annotations, even the folder-config
>> annotation would have to be defined.
>>
>> So there will always be SOME edits of the annotation file.
>>
>
> Yes, but pushing them out regularly just because there will always be
> some edits to the annotations file anyway would be a flawed
> justification to just ignore the cost as a downside of this option,
> while we have other options - so I had to point it out.
>
>>>    - Documenting the ability to opt-out of features by removing the
>>> annotation, and documenting opting in, including all combinations of
>>> annotation keys and values, troubleshooting for and resolving issues
>>> with clients that may or may not assume a certain set of annotations
>>> to (not) be available,
>>
>> True.
>>
>> Although the ability to opt-out or block a certain feature
>> installation wide in a reliable way would be an argument for the
>> annotation per use case, because an annotation that is not defined
>> is simply not usable, whereas a key/value pair for folder-config can
>> be set regardless of whether it is meant to be set or not.
>>
>
> Opting in and out of features such as saved searches or color though
> has not been made a requirement or feature request as of yet. Should it
> become a requirement or feature request, we can run in different circles
> about its feasibility to implement either server-side or client-side and
> the value of the -then to be explored- use-cases.
>
>>> For the overwriting part, it is a relatively simple clause to, on a
>>> single annotation, preserve the existing contents;
>>
>> This is not the case that I was concerned about.
>>
>> Think of the following:
>>
>> 	Client A reads folder-config
>>
>> 	Client B reads folder-config
>>
>> 	Client A sets 'search' to new value
>>
>> 	Client B sets 'color' to new value
>>
>> The modification of 'search' by Client A is now lost in a way
>
> First of all, the same argument would apply to both clients operating
> either the 'search' or the 'color' separate annotation -if in the same
> namespace.
>
> Second, a more relevant concern would apply to both clients operating
> what will end up being mutually exclusive, separate annotations or
> mutually exclusive options in the values of separate annotations. The
> risk here is greater, because it takes longer to obtain all annotations,
> 2*n(n-1)/2 parse them (and detect conflicts), increasing the interval in
> which another client could supposedly change the original value of the
> annotation or any other annotation.
>
> Third, the same would apply to client A 'deleting' a message client B
> is 'flagging' or any other combination of such.
>
> Fourth, while the one client is writing (to the mailbox path in the
> annotations database), the other client will get a big fat NO response
> -if it attempts to write at the very same moment- as the mailbox
> annotation database would be locked for the submission of entry by the
> one client. If the other client is stupid enough to use yesterday's
> annotations, or without polling for updates after a "NO" response,
> notwithstanding unsolicited METADATA responses as defined by the RFC in
> section after a client issues an ENABLE command with the METADATA
> capability keyword.
>
> Fifth, 'search' is more likely to be a shared annotation (value)
> whereas 'color' is most likely set once in the shared annotation (value)
> for the default, but edited in the private annotation (value).
>
>> that would be
>> fairly hard for the support department to track and resolve,
>
> That depends on logging (verbosity) capabilities more so than on using
> separate annotations or one annotation.
>
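
The read/read/write/write sequence quoted above can be sketched as follows. This is a minimal illustration only, assuming folder-config is a single value that each client rewrites as a whole; the `server` dict and the helper names are hypothetical stand-ins and say nothing about how Cyrus IMAP actually stores annotations.

```python
import copy

# Hypothetical in-memory stand-in for the server-side annotation value.
server = {"folder-config": {"search": "old-query", "color": "blue"}}

def read_config():
    # A client always receives its own copy of the whole value.
    return copy.deepcopy(server["folder-config"])

def write_config(value):
    # A write replaces the whole value, not a single key.
    server["folder-config"] = copy.deepcopy(value)

a = read_config()            # Client A reads folder-config
b = read_config()            # Client B reads folder-config
a["search"] = "new-query"
write_config(a)              # Client A sets 'search' to new value
b["color"] = "red"
write_config(b)              # Client B sets 'color', based on its stale copy

result = read_config()
print(result)
```

In this sketch the final value carries Client B's color but the old search, which is the lost modification under discussion.
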
>>>    - For each annotation, the shared as well as the private values
>>
>> ...if the annotation is defined private and shared for this use case.
>>
>
> No, for each annotation, the shared as well as the private - if only
> the shared or private has been defined in the specification, it needs to
> be cleaned up. You're right if your point was that only *conflicts* need to
> be searched for in those shared and/or private annotations defined in
> the specification. That makes it an x*y(y-1)/2 mesh then, where x is one
> or two and y is the number of annotations defined.
>
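
To put a number on the x*y(y-1)/2 mesh quoted above, here is a small sketch that counts the pairwise checks. The annotation names are hypothetical placeholders, and the assumption that every pair within a scope has to be compared once is taken from the formula itself.

```python
from itertools import combinations

def conflict_checks(annotations, scopes):
    # y(y-1)/2 unordered pairs of annotations, checked once per scope (x).
    pairs = list(combinations(annotations, 2))
    return len(scopes) * len(pairs)

# Hypothetical example: four annotations, shared and private scopes.
annotations = ["search", "color", "alarm", "identity"]
scopes = ["shared", "private"]

checks = conflict_checks(annotations, scopes)
x, y = len(scopes), len(annotations)
print(checks, x * y * (y - 1) // 2)
```
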
>>>    - With one annotation any potential conflict can be detected both
>>> when merely 'visiting' the folder as well as when attempting to
>>> 'alter' the folder, whereas with multiple annotations the retrieving
>>> of all annotations and values and resolving said conflicts is
>>> mandatory,
>>
>> Actually you only need to retrieve the annotations that pertain to
>> whatever it is you're planning to do, e.g. a change of color can
>> ignore search.
>>
>
> Wrong, as there may be annotations a client is unaware of, that may be
> conflicting with annotations the client is planning on setting. Such can
> be easily circumvented by having the client poll for configuration of
> the folder in one location, where 'type_*' style keywords with '*'
> representing a certain capability can simply be found.
>
>> Only with the large annotations you always must read everything.
>>
>
> Wrong, the client *retrieves* the full annotation value but only needs
> to read;
>
>   - the top-level keys (iterate those) to find;
>
>     - keys it understands,
>     - keys it doesn't understand ('type_*' style keys?), for which it
> can derive whether or not such keys may be conflicting (naming
> convention).
>
>>>     If a client not compatible with 'search' specifically were to
>>> be able to detect (potential for) conflict, it;
>>>
>>>       1) would not know to retrieve the '/vendor/kolab/search'
>>> annotation, but
>>>          - it would also not know what /vendor/kolab/folder-type
>>> 'search' was for, and
>>>          - any potentially pre-populated search data is completely
>>> wasted on said client.
>>
>> Yes, although this is equally true for the large annotation, as this
>> is about the new folder-type idea in KEP 15, and not the question
>> whether or not to go with one annotation per use case or one large
>> annotation.
>>
>
> Wrong, this is not equally true for the "large" annotation, as with the
> large annotation (the client now aware of where to find folder specific
> configuration) the folder-type is freed up and *can* be used for the
> original object type, with pre-population allowing said client to still
> use the saved search - note, *can* be used for the original object type,
> not *must* be used, perhaps it does have a different value, such as
> 'mixed' for example.
>
>> If a client does not know about KEP 15, the 'search' annotation and
>> the 'search' value in folder-config are both equally lost on the
>> client, there are no obvious advantages or disadvantages to either.
>>
>> As to the new folder-type, if the client does not know about KEP 15,
>> it also does not know that it *MUST NEVER* change objects in a
>> prepopulated folder, so it would happily allow the user to do this,
>> enabling diverging datasets, inconsistencies, and lost data.
>>
>
> Changing (copies of) objects (only applicable if folder is
> pre-populated) in saved search folders by clients not compatible with
> KEP #15 is a mixture of implementation detail and a saved search folder
> permission problem, resolved by restricting the user to not allow
> editing the contents of said folder at all *if* and *only if* such
> folder is to be pre-populated at all, being worked around in a fashion
> that makes any folder implementing any of these new features be ignored
> by the client -which it never does completely?
>
>>> There is a locking mechanism in place for folder annotations,
>>> similar to the locking mechanism on IMAP folders, contents and
>>> metadata such as flags.
>>
>> How exactly does it work?
>>
>
> http://git.cyrusimap.org/cyrus-imapd/tree/imap/imapd.c#n8019
>
> http://git.cyrusimap.org/cyrus-imapd/tree/imap/imapd.c#n8214
>
>> I guess we then would need to specify that a client would always have
>> to do
>>
>> 	#1: Lock
>> 	#2: Re-Read
>> 	#3: Modify
>> 	#4: Write
>> 	#5: Unlock
>>
>> to safely modify a folder-config annotation.
>
> come on...; "to safely modify *any* annotation", or better yet, "to
> safely execute *any* IMAP operation".
>
>> The additional read is a bit of network overhead & delay, but
>> probably not prohibitive in most scenarios.
>>
>
> If this is so much of a concern, I suspect we would have seen a lot
> more issues logged stating the exact problem as is described could be a
> problem, both with Kolab as well as Cyrus IMAP upstream. In fact, one
> could consider it an IMAP design flaw.
>
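
The lock / re-read / modify / write / unlock sequence quoted above can be sketched like this. It is purely an in-process stand-in: the `threading.Lock`, the `_store` dict and `update_key` are hypothetical, and the sketch makes no claim about how Cyrus IMAP or the METADATA extension expose locking to clients.

```python
import threading

_lock = threading.Lock()  # hypothetical stand-in for the annotation lock
_store = {"folder-config": {"search": "old-query", "color": "blue"}}

def update_key(key, value):
    with _lock:                                   # 1: lock
        current = dict(_store["folder-config"])   # 2: re-read
        current[key] = value                      # 3: modify
        _store["folder-config"] = current         # 4: write
    # 5: unlock happens when the 'with' block exits

# Two clients changing different keys at (roughly) the same time.
threads = [
    threading.Thread(target=update_key, args=("search", "new-query")),
    threading.Thread(target=update_key, args=("color", "red")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = _store["folder-config"]
print(final)
```

Because each update re-reads the value while holding the lock, both changes survive in whichever order the two threads happen to run.
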
>>> How large an annotation is exactly depends on a variety of factors
>>> including but not limited to the complexity and brevity of a query
>>> language for search, which is yet to be explored / defined.
>>
>> It depends on that and on what else will then go into this annotation
>> in the future provided we define this as the canonical way. So I see
>> potential for this annotation to grow beyond 10k easily.
>>
>
> Well, off the top of my head;
>
>   - Identity configuration (reply/respond with 'sales at kolabsys.com'
> identity as opposed to 'greve at kolabsys.com', ...)
>   - Favourite folder (boolean),
>   - local subscription (per application, do we dare do this?),
>   - alarm / reminder configuration,
>   - z-push / active-sync,
>   - horde,
>   - ... (other clients)
>
> and whatever else we can come up with that has a purpose / use-case
> valuable enough to pursue. Adding annotations for each of these 1) on
> the server, 2) in every client and 3) in the documentation is a more
> difficult process than if we were to outline a key-value pair in an
> existing annotation.

This list is exactly what makes me think that separate annotations are
the better choice. All those values are pretty unrelated to each
other. There will be clients supporting only a few of those. Having
distinct, well-defined annotation entries seems a lot more
appropriate. And I don't think the cost of defining them is that high -
compared to the cost of implementing the corresponding features in the
clients.

>
>>> That said, however, all annotations need to be retrieved regardless,
>>> for both private and shared.
>>
>> True.
>>
>> But only those you actually need at the time, not all of them all the
>> time.
>>
>
> Again, firstly, with many annotations retrieving any annotations is
> subject to the client's understanding of which annotations are available
> and the server's understanding of what are valid annotations.
>
> Secondly, with many annotations, in order to be able to determine
> whether or not there is a potential conflict, and in order to be able to
> determine which takes precedence when content is retrieved for display,
> all annotations will need to be retrieved.

I don't get that point. Why exactly should a client that only knows
two or three of the whole set of defined annotation types retrieve
all of them?
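
A minimal sketch of what I mean, assuming separate annotation entries: the client only asks for the entries it understands. Apart from /vendor/kolab/search and /vendor/kolab/folder-type, which were mentioned earlier in the thread, the entry names below are hypothetical, as is the helper function.

```python
# Entries defined on the server (mostly hypothetical names).
DEFINED = [
    "/vendor/kolab/folder-type",
    "/vendor/kolab/search",
    "/vendor/kolab/color",
    "/vendor/kolab/alarm",
    "/vendor/kolab/identity",
]

# Entries this particular client knows how to handle.
KNOWN = {"/vendor/kolab/search", "/vendor/kolab/color"}

def entries_to_fetch(defined, known):
    # Only the intersection is ever requested from the server.
    return sorted(set(defined) & set(known))

to_fetch = entries_to_fetch(DEFINED, KNOWN)
print(to_fetch)
```
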

>
> Now, so far I've only heard the following arguments against a single
> folder-config annotation;
>
> - simultaneous editing - exists for every single operation in IMAP,
> - no ability to opt-out server-side - not a requirement / feature
> request,
>
> And one against *potentially* preserving the original folder-type
> value;
>
> - search results potentially having mixed object types as results - but
> 'folder-type' MAY still have another annotation value such as 'mixed' or
> 'random', so that clients that do not know how to display the contents
> of folders with multiple object types in it MUST / SHOULD / MAY ignore
> the folder entirely.
>
> Am I missing something?

I think the size argument is valid as well. If I understood you
correctly you would lump things like the ActiveSync configuration
together with the Horde configuration into the folder configuration.
Why should z-push make the effort of retrieving the Horde
configuration (the latter data may be really large)?
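
As a rough illustration of the size argument, assuming the folder configuration were serialized as JSON (the thread does not define a format) and using made-up values, compare what a z-push style client needs with what it would have to retrieve:

```python
import json

# Hypothetical combined folder configuration; all keys and values are made up.
folder_config = {
    "activesync": {"sync": True, "alarms": False},
    "horde": {"prefs": "x" * 10000},  # placeholder for a large client blob
}

# Bytes the client must retrieve with one large annotation.
whole_bytes = len(json.dumps(folder_config).encode("utf-8"))

# Bytes it would retrieve if its own section were a separate annotation.
own_bytes = len(json.dumps(folder_config["activesync"]).encode("utf-8"))

print(whole_bytes, own_bytes)
```
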

Cheers,

Gunnar


-- 
Core Developer
The Horde Project

e: wrobel at horde.org
t: +49 700 6245 0000
w: http://www.horde.org

pgp: 9703 43BE
tweets: http://twitter.com/pardus_de
blog: http://log.pardus.de