Getting rid of (most) annotations? (Is: Question: Individual annotations vs One large annotation)

Wed Oct 5 12:10:32 CEST 2011

Hi Georg,

thanks for arguing the case. This helps me to understand your reasoning better 
as you have more time to consider the issue.

On Saturday, 1 October 2011 10:26:39, Georg C. F. Greve wrote:
> On Friday 30 September 2011 22.38:36 Bernhard Reiter wrote:

> Ultimately every case where information is strongly associated to a single
> folder, and every case where potentially meta-information is changing over
> hundreds of thousands of objects, e.g. tagging the results of a search,
> would force hundreds of thousands of objects to be deleted, re-written and
> re-parsed by every client connected to that account, and some of these
> objects will be several megabytes large. This is a very heavy operation in
> comparison to noting a single annotation change.

My mental model so far did not include things like "tagging a search";
in fact I believe I do not fully understand what is being done there.
I agree that it sounds like a bad case for in-object configuration.

[..]

> > Yes, I would not save the configuration for an email folder within this
> > email folder. So it has to be saved in a configuration folder, probably
> > in a must-have configuration default folder if there is no other or in a
> > subfolder. But the Kolab Client knows it is there and where to find it.
> > Email clients would not touch those folders, so no risk of breakage.
>
> Because a client does not know which objects have references to this folder
> unless it has parsed every one of them, which otherwise may not be
> necessary,

I agree, though my argument here is:
The client will have to parse those configuration objects anyway
and can exclude some. E.g. all the levels from the folder in question to the 
top would need to be parsed, but not the lower levels of other branches.
Usually I would expect the data size and change frequency to be low,
compared to the rest of the data.
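
As a rough sketch of the lookup described above: a client only needs the
configuration for the folder in question and for each level above it, not for
sibling branches. The function names, the "/" separator and the dictionary
layout are assumptions made for illustration; they are not taken from KEP9 or
from any Kolab client.

```python
def config_scopes(folder_path, sep="/"):
    """Return the folder and each of its ancestors, most specific first.

    The empty string stands for the top level (hypothetical convention).
    """
    parts = [p for p in folder_path.split(sep) if p]
    return [sep.join(parts[:i]) for i in range(len(parts), -1, -1)]


def effective_setting(key, folder_path, configs, sep="/"):
    """Look up `key`, letting the most specific scope win.

    `configs` maps a scope path to a dict of settings, as if each entry had
    been parsed from one configuration object (assumed layout).
    """
    for scope in config_scopes(folder_path, sep):
        settings = configs.get(scope, {})
        if key in settings:
            return settings[key]
    return None


configs = {
    "": {"color": "grey"},
    "shared/projects": {"color": "blue"},
    "shared/other": {"color": "red"},  # sibling branch, never consulted below
}
print(config_scopes("shared/projects/kolab"))
print(effective_setting("color", "shared/projects/kolab", configs))
```

Only four scopes are consulted for "shared/projects/kolab"; the entry for
"shared/other" is never read.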

> and because the client may not have the permissions to adjust 
> the path of the configuration object in shared user scenarios, or may not
> even see it, your argument actually does not counter the point.

Yes, this is a difficulty.
Partly I believe there are some similar issues in the annotation case.
E.g. I have a private configuration option attached to a folder of somebody 
else and this person then renames and moves the folder. Now the folder
might have gotten a new purpose or meaning and my configuration options
may not fit that change.

Maybe we need a notice to all readers of a folder that it has moved from name
and location A to B, so they can adjust, with an optional comment.
Anyway, renames and moves of shared folders are rare operations in my view.

> > There is a hierarchy defined in KEP9, so a setting specifically for one
> > folder does not have a conflict. Conflict resolution for concurrent
> > writes can be handled like all other conflict resolution: the next
> > interactive client preferably asks the user or decides.
>
> I know you're an avid fan of additional user interaction. 

Untrue, I completely agree with your next paragraph:

> Personally I 
> think that for cases where clients *can* behave reasonably without such
> interaction, they should do so in the interest of usability. Making a
> conceptual decision that leads to deteriorated usability is not a good
> option, in my view.

I just disagree that this conceptual decision has that effect.
User interaction, if it cannot be avoided, should happen at the right
point. We are trying to find those points in the concept.

> But your point actually did not really respond to the point made.
>
> The XML space does not know a concept similar to namespaces which
> annotations provide, and trying to emulate it becomes *very* complex.

I'm probably not grasping this fully, but using the IMAP namespace
paths and then the annotation namespace seems to be straightforward
and at the same level of complexity.

> > A subfolder or the default configuration folder would be easy to parse.
> > Quite likely a client would do this close to startup on this account,
> > because the preferred and recommended mode of operation is
> > a fully cached client, [...]
>
> I understand that your points are from the perspective of a particular
> fully cached client and the design decisions it made in its data handling
> backend which might require a change to efficiently deal with annotations.

I'm trying to make my arguments without this perspective.

> But this is *not* the only client, and most users want a web client, as
> well as mobile phone connectivity, both of which are *not* fully cached
> clients and which would suffer from pushing this limitation of one client
> into the server through such a conceptual change.

To me the mobile phone and web clients should be full-blown clients which
cache the necessary information, to keep the concept of a slim, I/O-bound
server. At least that was the original idea and I still believe it is a good
idea here as well.

> > The main point is that in a setting where a user accesses multiple email
> > servers at once from one client, some of them will not be Kolab Servers.
>
> If that server is not a Kolab server, it does not need to support Kolab
> functionality, which is all we're discussing.
>
> > The chance is very high to hit IMAP servers that do not support the
> > annotations like Kolab Server does. Getting rid of annotations makes all
> > these servers accessible as additional Kolab storage and would thus
> > make Kolab much more attractive.
>
> Naturally such servers would not have all the server-side functionality,
> such as free-busy, web client, mobile phone synchronization, resource
> handling, invitation parsing and so on and so forth. So the question
> becomes: Why would someone want to use such a server when a Kolab server
> is just a click away?

For various reasons, many users are still bound to an IMAP server they do
not control, e.g. through their work environment, privately, or in other
organisations using a hoster. For them it can be helpful if they can save some
of their objects there and still use full-blown Kolab Clients.

> This kind of server would be a brain-dead data dump, with no way to know
> folder types from one another, as that requires annotations in the Kolab
> concept. Getting rid of all annotations would in fact break the concept on 
> a very fundamental level.

Well, manually marking a folder as "calendar" already worked.
(This was necessary with some versions of the synckolab add-on for Lightning.)
In the non-annotation world a Kolab Client would just need to find the
top-level configuration folder and could then find everything else by itself.
In fact I guess the top-level configuration folder could be detected easily:
we could give it an object saying that this is what it is. So no user
interaction would be necessary, and no types would be saved in annotations.
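
A minimal sketch of such a scan, assuming the client has already listed the
folders and parsed their objects into dictionaries. The marker type name and
the data layout are hypothetical; nothing like this is defined in the Kolab
format today.

```python
# Hypothetical marker type; not a defined Kolab object type.
MARKER_TYPE = "kolab-configuration-root"


def find_config_root(folders):
    """Return the name of the first folder holding a marker object, or None.

    `folders` maps a folder name to a list of already-parsed objects (dicts).
    Shallow folders are checked first, so the scan usually stops early.
    """
    for name in sorted(folders, key=lambda n: n.count("/")):
        if any(obj.get("type") == MARKER_TYPE for obj in folders[name]):
            return name
    return None


folders = {
    "INBOX": [{"type": "mail"}],
    "Configuration": [{"type": MARKER_TYPE}],
    "INBOX/Calendar": [{"type": "event"}],
}
print(find_config_root(folders))
```

From the folder found this way, the client could then read the folder types
and other settings without touching annotations.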

> The need for doing this through such a mechanism was a strong enough
> requirement that a pre-final version of ANNOTATEMORE was implemented into
> Cyrus. So if the concept had known an efficient and robust way of doing
> this without annotations, I am sure it would have done so when there were
> no annotations available.

AFAIK the Cyrus people tried a number of approaches to the configuration
question and were never really happy with them. One could presume that this is
one reason why the METADATA work has moved so slowly.

> Meanwhile METADATA is an RFC and a standard functionality in all the
> servers we are interested in. It is the basis of various advanced
> functionalities, such as special-use and such, which are being pushed by
> big players on their server sides and currently find their way into the
> various clients.
>
> In fact, special-use will likely be underpinning advanced use cases such as
> the new CalDav server in Cyrus, and be used to identify folder types, so to
> know what is a Calendar folder. Sounds familiar?

Can you point me to a brief description of "special-use" in this context?

> In other words: By moving away from annotations, we'd not only be breaking
> the Kolab concept, we'd also be moving away from the standardized
> approaches to dealing with this kind of data at a time when the standard is
> moving in our direction.

This is mainly an argument from authority, as I read it. You could as well say:
If we were a step ahead of the big players and standards, why not go to the
next step while they are still chewing on the last. ;) Okay, that was
tongue-in-cheek... I am happy to see more IMAP servers moving towards usable
standards here.

> To do so for a very marginal fringe use case (you said this use case was
> the main argument) seems unwise.

To me it is not a fringe case, but maybe I am too pessimistic here, having
dealt with annotation issues for many years without seeing progress.
I still estimate that even if recent versions of Cyrus IMAPD and Dovecot
have the full-blown stuff implemented, there will be many providers and
distributions not using it for a while. So if this is slow, we could make
ourselves independent of that timeline and broaden the user base.

Another advantage of an annotation-free or annotation-light configuration
would be that we could decouple the configuration more from the storage layer.
Or we could potentially use more storage sources, like the upcoming ownCloud or
other email-free services. But that is speculating a bit as well.

Best,
Bernhard

-- 
Managing Director + Owner: www.Intevation.net <- A Free Software Company
Kolabsys.com: Board Member          FSFE.org: Founding GA Member
Intevation GmbH, Osnabrück, DE; Amtsgericht Osnabrück, HRB 18998
Geschäftsführer Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner