[Kolab-devel] 10.000 events in a Resource Calendar

Tue May 22 16:08:00 CEST 2012

On Tuesday, 22 May 2012, 13:26:23, Jeroen van Meeuwen wrote:

Hi Jeroen,

> > No this is not a flaw in any way. A delete operation is handled
> > exactly like
> > and together with write operations. (E.g. EVERY modify is actually a
> > delete+write operation by nature of how Kolab storage works)
> 
> Let's take a step back, because we're confusing the issue in OP.

What does 'OP' mean?

> The following *actually* happens when an event is deleted (whether "the
> idea behind O(1)" design or not);
> 
> - Adding or editing an event to a calendar obviously adds a new object
> to IMAP.

Correct.

> - To remove an event from a calendar, the message could be flagged
> \Deleted in IMAP, and (possibly) the folder is expunged (doesn't
> matter),

Yes. A remove is mapped to a \Deleted flag. Why do you consider the obvious 
worthwhile to mention? How a delete is actually syntactically 
implemented in IMAP does not really matter here. The actual IMAP4 spec uses 
this implementation in order to make the very common delete operation extremely 
fast. (It is fast because a delete does not actually change anything in the 
store except setting a flag. Setting a flag avoids extra seeks and filesystem 
overhead. In order to delete an IMAP message not only semantically but for real, 
a potentially very expensive EXPUNGE command is required.)
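
To illustrate the difference between flagging and expunging, here is a minimal 
sketch in Python. It is a toy in-memory model, not a real IMAP store and not 
Kolab code; the names Folder, mark_deleted and expunge are made up for this 
example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    uid: int
    body: str
    flags: set = field(default_factory=set)


class Folder:
    """Toy in-memory folder (hypothetical, for illustration only)."""

    def __init__(self):
        self.messages = {}   # uid -> Message
        self.next_uid = 1

    def append(self, body):
        uid = self.next_uid
        self.next_uid += 1
        self.messages[uid] = Message(uid, body)
        return uid

    def mark_deleted(self, uid):
        # Cheap: only a flag is set, the stored message is not touched.
        self.messages[uid].flags.add("\\Deleted")

    def expunge(self):
        # Potentially expensive: the whole store is rewritten without
        # the flagged messages.
        self.messages = {
            uid: msg for uid, msg in self.messages.items()
            if "\\Deleted" not in msg.flags
        }
```

After mark_deleted() the message is still present in the store; only expunge() 
actually removes it, and UIDs handed out before are never reused.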

>    - This is *not* a write operation that adds a new object to IMAP.

Yes, it is not a write operation but a delete operation. Delete is technically 
implemented via setting a delete flag in IMAP. Why does this matter with 
regards to scalability of a resource calendar?

>    It
> does bump UIDVALIDITY

No, this is plain wrong. Please reread the IMAP4rev1 RFC 3501, 
http://tools.ietf.org/html/rfc3501#section-2.3.1.1. There it is explained that 
the UIDVALIDITY has nothing to do with either adding items to or removing items 
from an IMAP folder.

The UIDVALIDITY is a property of a folder, not of a message. Historically the 
UIDVALIDITY was implemented in order to make the following uncommon procedure 
safe:

1. Folder "foldername" created
2. Folder "foldername" populated with messages e.g. UID 1,2,3,4,5
3. Client A synchronises with "foldername"
4. Client B deletes messages with UID 1,2,3,4,5
5. Client B removes folder "foldername"
6. Client B creates folder "foldername"
7. Client B populates folder "foldername" with messages e.g. UID 1,2,3,4,5
8. Client A checks folder "foldername" and does not detect that the 
messages with the previously existing UIDs actually did change. (It correctly 
assumes that IMAP has no modify, and therefore wrongly concludes that nothing 
changed.)

Solution:
Whenever a folder is created it does not only get a unique foldername but 
also a unique UIDVALIDITY, in such a manner that the tuple (foldername, 
UIDVALIDITY) is unique for every installation.

In other words, UIDVALIDITY ensures that the triple (foldername, UIDVALIDITY, 
UID) is immutable for any IMAP installation!

Such an immutability guarantee is the foundation for correctness and 
scalability.
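
The procedure above can be sketched in a few lines of Python. This is only a 
toy model under stated assumptions: the classes Server and ClientCache are 
hypothetical, and the UIDVALIDITY is a simple counter here, whereas a real 
server is free to pick the value differently.

```python
class Server:
    """Toy model of folder creation (hypothetical, not a real IMAP server)."""

    def __init__(self):
        self.folders = {}
        self._next_validity = 1   # assumption: a plain counter

    def create(self, name):
        # Every creation gets a fresh UIDVALIDITY, even for a reused name.
        self.folders[name] = {
            "uidvalidity": self._next_validity,
            "next_uid": 1,
            "messages": {},
        }
        self._next_validity += 1

    def delete(self, name):
        del self.folders[name]

    def append(self, name, body):
        folder = self.folders[name]
        uid = folder["next_uid"]
        folder["next_uid"] += 1
        folder["messages"][uid] = body
        return uid


class ClientCache:
    """Caches bodies keyed by the triple (foldername, UIDVALIDITY, UID)."""

    def __init__(self):
        self.entries = {}
        self.fetched = 0   # counts how many bodies had to be transferred

    def sync(self, server, name):
        folder = server.folders[name]
        validity = folder["uidvalidity"]
        # Drop entries that belong to an older incarnation of the folder.
        self.entries = {
            key: body for key, body in self.entries.items()
            if key[0] != name or key[1] == validity
        }
        for uid, body in folder["messages"].items():
            key = (name, validity, uid)
            if key not in self.entries:   # immutable triple: fetch only once
                self.entries[key] = body
                self.fetched += 1
```

With this keying, client A in step 8 sees a different UIDVALIDITY, discards 
its stale entries and fetches the five new messages, although the foldername 
and the UIDs are identical to the ones it had seen before.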

> , but... see below.
> 
> - The *client* is to trigger the Free/Busy update,

Yes, this is implemented this way in order to keep the patchset small and make 
the Kolab solution work with any unmodified standards compliant IMAP4 server.

(An alternative would be to extend either IMAP4 syntax or IMAP4 semantics.)

> - CONDSTORE (required for UIDVALIDITY) is not enabled on Kolab 2.3
> (Cyrus IMAP 2.3) mailboxes by default,

Sorry, this is technically plain wrong. CONDSTORE is no prerequisite of 
UIDVALIDITY.

CONDSTORE is defined in RFC 4551 (June 2006, years after Kolab was designed) 
which happens to be much younger than UIDVALIDITY, which is already defined in 
RFC 2060 (December 1996) and discussed in RFC 2683 (September 1999).

> - The Free/Busy mechanism has little to hold on to, to see what has
> changed, unless it maintains a local cache of at least the UIDs of the
> message it used when it last generated the (partial) Free/Busy,

Keeping such a cache for optimisation purposes is trivial and common practice. 
Actually it is not required for a scalable solution, but this fact is a minor 
detail which could be discussed separately. The cache is a simple list of 
32-bit integers of negligible size, e.g. 40 KB in the case of 10.000 
events.
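
The size estimate is easy to check. A minimal sketch, assuming the UIDs are 
stored as packed unsigned 32-bit integers (the UID values themselves are made 
up for the example):

```python
import struct

# 10.000 hypothetical event UIDs
uids = list(range(1, 10_001))

# "<I" is an unsigned 32-bit integer in standard size, i.e. 4 bytes per UID
packed = struct.pack(f"<{len(uids)}I", *uids)

cache_bytes = len(packed)   # 10.000 * 4 = 40.000 bytes
```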

> - Retrieval of relevant events to the relevant period in time could be
> made faster using sorting and retrieving the newest objects first,

This is common practice and trivial, but doing sorting is plain wrong and slow.

A sorting approach is a typical relational database approach. There is NO need 
to do any sorting if you leverage the IMAP protocol.

IMAP guarantees strictly monotonically increasing UID values. Due to the fact 
that IMAP does NOT know a modify, every modified or new event results in a new 
IMAP message which happens to have a UID > LASTSEENUID. (For brevity I will not 
get into the details of removal).

Therefore a simple "UID FETCH LASTSEENUID+1:*" is sufficient.
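
A minimal sketch of this rule in Python, without talking to a real server. The 
function names are hypothetical, and server_uid_fetch only imitates how a 
server resolves a UID range. One detail from RFC 3501 is worth handling: a 
range "n:*" always includes the UID of the last message in the folder, even if 
n is higher than any assigned UID, so the client filters the result.

```python
def fetch_range(last_seen_uid):
    """Build the UID set for the incremental fetch, e.g. "4:*"."""
    return f"{last_seen_uid + 1}:*"


def server_uid_fetch(existing_uids, uid_set):
    """Imitate how a server resolves "n:*" against the UIDs in a folder."""
    start = int(uid_set.split(":")[0])
    if not existing_uids:
        return []
    highest = max(existing_uids)          # "*" means the highest UID in use
    low, high = sorted((start, highest))  # range order does not matter
    return [uid for uid in sorted(existing_uids) if low <= uid <= high]


def new_uids(returned_uids, last_seen_uid):
    """Keep only UIDs that are really new; no sorting of events needed."""
    return [uid for uid in returned_uids if uid > last_seen_uid]


# Example: the folder holds UIDs 1,2,3,7,9 and the client last saw UID 3.
folder = [1, 2, 3, 7, 9]
fresh = new_uids(server_uid_fetch(folder, fetch_range(3)), 3)
```

The client then stores the highest UID it received as its new LASTSEENUID and 
never has to look at older messages again.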

> - The client triggering Free/Busy does not simply HEAD a URL and
> disconnects

No, this claim is wrong. Of course this is exactly what happens, up to today.

> , as this would impede the slice of time any web server code
> has available to do what it needs to do. Therefore, a client keeps open
> the connection (and uses GET/POST) until the web server performing the
> Free/Busy updating is done. This is considered a blocking operation for
> clients that cannot do this in the background.

This is wrong. Please look at the code.

> >> Euh, as far as I know, it is the client software that triggers an
> >> update of the free/busy, and not the Kolab server itself, and unless
> >> the
> >> client is multi-threaded like Kontact it is also a blocking
> >> operation.
> > 
> > Sorry this is non-sense.
> 
> Thank you for your balanced and well-formulated opinion.

I am sorry, but what else should I call it? This is not an opinion but a 
trivially provable fact: for every Kolab client the trigger by its very nature 
is non-blocking. After all it is a trigger.

> As I've illustrated before, it's not like Kolab uses FPM or any other
> FastCGI-like implementation, 

Don't think in terms of a web developer. Kolab does not require any of those 
implementations in order to have non-blocking free/busy generation. (The 
current implementation uses a daemon approach in order to avoid extra patching 
of upstream resources, though this is an implementation detail.)
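
To make the idea of a non-blocking trigger concrete, here is a minimal sketch 
in Python. It is NOT the Kolab implementation; the class FreeBusyDaemon and 
its methods are hypothetical and only show the general pattern: the trigger 
records a hint and returns at once, while a background worker does the actual 
regeneration.

```python
import queue
import threading


class FreeBusyDaemon:
    """Hypothetical background worker, for illustration only."""

    def __init__(self):
        self._pending = queue.Queue()
        self.generated = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def trigger(self, folder):
        # Returns immediately: it only notes that the folder needs an update.
        self._pending.put(folder)

    def _run(self):
        while True:
            folder = self._pending.get()
            if folder is None:   # sentinel: shut down
                break
            # Stand-in for the real (possibly slow) free/busy generation.
            self.generated.append(f"free/busy regenerated for {folder}")

    def stop(self):
        # Queue the sentinel after all pending hints and wait for the worker.
        self._pending.put(None)
        self._thread.join()
```

The caller of trigger() never waits for the regeneration itself; only stop() 
waits, and it exists here merely so that the example terminates cleanly.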

> and it's not like the client can simply
> HEAD a URI and be done with it (close the connection).

But this is exactly what happens. Therefore I call your assumptions and claims 
nonsense.

> > The main point here is that the Client trigger the update of the
> > partial
> > freebusy data but they never wait nor block. (A trigger is simply an
> > http call which immediately returns and hints the server that the
> > freebusy needs to be
> > updated)
> 
> If only that were true. It sounds very good in theory, but theory is a
> place up north in Narnia. In the real world, there is nothing that
> "hints" the server and nothing to follow up on such "hint". It is the
> client that is actively involved and waiting for the Free/Busy
> information to be updated as part of the trigger URI it is hitting.

I will stop arguing now. Please check the code.

> > Sorry, but you really got things wrong. The basic idea behind Kolab
> > is NOT to think in terms of a relational database including terms of
> > doing queries all
> > the time.
> > 
> > This is the essential clue behind Kolab that it is so extremly
> > scalable.
> > 
> > Introducing all these "query" concepts will lead to loosing this
> > unique
> > property.
> 
> Well, unique != good and most certainly unique != best. At most, unique
> <> common.

In this case unique == good, and I consider it insulting that you claim that 
the existing scalable solution is inferior to your "query" approach while 
denying all evidence as seen in source and existing binaries.

IMHO query is slow, has scalability issues and should be avoided when 
possible. 

Leveraging guaranteed protocol semantics is good practice and upward 
compatible. On the other hand, mapping everything towards a relational 
database even though the underlying problem does not have relational 
properties is abuse and leads at least to scalability issues.

> To be honest, the "extremely scalable" argument is starting to get to
> be completely wasted on me.

I accept that you do not care about scalability but then please don't ask for 
answers to scalability questions like having 10.000 events in a single 
calendar.

From my experience both scalability and security MUST be designed into a 
solution right from the beginning. Adding both later is extremely cumbersome 
and most often not really solvable in a satisfactory manner.

> Every time it is used, it is used as the ultimate argument against
> something

Most of the time it is a well-founded argument against abusing 
traditional web technology. (E.g. large scalable web solutions like Facebook, 
Google or Twitter have moved away from traditional relational databases long 
ago.)

> , but it misses merit in that the scalability parameter to a
> Kolab deployment is never removed nor reduced by any of the developments
> or ideas to move forward. While you may disagree with that, I have to
> conclude "no-SQL storage" is being confused and arbitrarily substituted
> with "caches, possibly in SQL".

This is plain wrong. There seems to be a fundamental misunderstanding.

I hope that I could provide some insight anyway. As I lack both time and 
funding for actually working on Kolab 3, I hereby stop contributing to this 
thread. 

Maybe sometime we can meet at some conference and have a beer together after 
first spending about an hour in front of a blackboard. I am confident 
that you would then understand better what this fuss is all about.

Yours,
-- martin

--  
e r f r a k o n
Erlewein, Frank, Konold & Partner - Beratende Ingenieure und Physiker
Sitz: Adolfstraße 23, 70469 Stuttgart, Partnerschaftsregister Stuttgart PR 126
http://www.erfrakon.com/