[Kolab-devel] 10.000 events in a Resource Calendar
Martin Konold
martin.konold at erfrakon.de
Tue May 22 16:08:00 CEST 2012
On Tuesday, 22 May 2012, 13:26:23, Jeroen van Meeuwen wrote:
Hi Jeroen,
> > No this is not a flaw in any way. A delete operation is handled
> > exactly like
> > and together with write operations. (E.g. EVERY modify is actually a
> > delete+write operation by nature of how Kolab storage works)
>
> Let's take a step back, because we're confusing the issue in OP.
What does 'OP' mean?
> The following *actually* happens when an event is deleted (whether "the
> idea behind O(1)" design or not);
>
> - Adding or editing an event to a calendar obviously adds a new object
> to IMAP.
Correct.
> - To remove an event from a calendar, the message could be flagged
> \Deleted in IMAP, and (possibly) the folder is expunged (doesn't
> matter),
Yes. A remove is mapped to a \Deleted flag. Why do you consider the obvious
worthwhile to mention? How a delete is syntactically implemented in IMAP does
not really matter here. The IMAP4 spec uses this implementation in order to
make the very common delete operation extremely fast. (It is fast because a
delete does not actually change anything in the store except setting a flag.
Setting a flag avoids extra seeks and filesystem overhead. To delete an IMAP
message not only semantically but physically, a potentially very expensive
EXPUNGE command is required.)
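The difference between flagging and expunging can be sketched with a toy
in-memory folder. This is only an illustration: the Folder class and its
method names are hypothetical and model neither Cyrus IMAP nor any client
library.

```python
# Toy in-memory model of an IMAP folder (hypothetical, not a real server API).
class Folder:
    def __init__(self):
        self.messages = {}   # UID -> set of flags
        self.next_uid = 1

    def append(self):
        """Store a new message and return its freshly assigned UID."""
        uid = self.next_uid
        self.messages[uid] = set()
        self.next_uid += 1
        return uid

    def delete(self, uid):
        """Cheap: only a flag is set, nothing is moved or rewritten."""
        self.messages[uid].add("\\Deleted")

    def expunge(self):
        """Potentially expensive: flagged messages are physically removed."""
        gone = [u for u, flags in self.messages.items() if "\\Deleted" in flags]
        for uid in gone:
            del self.messages[uid]
        return gone

    def visible(self):
        """UIDs a client would still treat as existing events."""
        return sorted(u for u, flags in self.messages.items()
                      if "\\Deleted" not in flags)


folder = Folder()
first = folder.append()
second = folder.append()
folder.delete(first)          # semantically gone, physically still stored
```

After the delete, the first message is hidden from visible() but still
occupies its slot until expunge() runs, and a later append() never reuses its
UID.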
> - This is *not* a write operation that adds a new object to IMAP.
Yes, it is not a write operation but a delete operation. Delete is technically
implemented by setting a delete flag in IMAP. Why does this matter with
regard to the scalability of a resource calendar?
> It
> does bump UIDVALIDITY
No, this is plain wrong. Please reread the IMAP4rev1 RFC 3501,
http://tools.ietf.org/html/rfc3501#section-2.3.1.1. There it is explained that
UIDVALIDITY has nothing to do with either adding or removing items from
an IMAP folder.
The UIDVALIDITY is a property of a folder, not of a message. Historically,
UIDVALIDITY was introduced in order to make the following uncommon procedure
safe:
1. Folder "foldername" created
2. Folder "foldername" populated with messages e.g. UID 1,2,3,4,5
3. Client A synchronises with "foldername"
4. Client B deletes messages with UID 1,2,3,4,5
5. Client B removes folder "foldername"
6. Client B creates folder "foldername"
7. Client B populates folder "foldername" with messages e.g. UID 1,2,3,4,5
8. Client A checks folder "foldername" and does not detect that the messages
with the previously existing UIDs actually did change. (It correctly assumes
that IMAP has no modify, so an existing UID never changes its content.)
Solution:
Whenever a folder is created it does not only get a unique foldername but
also a unique UIDVALIDITY in such a manner that the tuple (foldername,
UIDVALIDITY) is unique for every installation.
In other words, UIDVALIDITY guarantees that the triple (foldername,
UIDVALIDITY, UID) is immutable for any IMAP installation!
Such an immutability guarantee is the foundation for correctness and
scalability.
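How a client might use this guarantee in the eight-step scenario above can be
sketched as follows. This is a minimal sketch under assumptions: FolderCache
and sync are hypothetical names, and the server state is passed in as plain
values instead of being read over IMAP.

```python
from dataclasses import dataclass, field


@dataclass
class FolderCache:
    """Client-side cache for one folder (hypothetical structure)."""
    uidvalidity: int
    seen_uids: set = field(default_factory=set)


def sync(cache, server_uidvalidity, server_uids):
    """Return (cache, uids_to_fetch) after comparing against the server state."""
    if cache is None or cache.uidvalidity != server_uidvalidity:
        # The folder was recreated: cached UIDs no longer refer to the
        # same messages, so the cache is discarded.
        cache = FolderCache(server_uidvalidity)
    to_fetch = sorted(set(server_uids) - cache.seen_uids)
    cache.seen_uids = set(server_uids)
    return cache, to_fetch


# Client A synchronises, then the folder is recreated with the same UIDs.
cache, first_fetch = sync(None, 1000, [1, 2, 3, 4, 5])
cache, unchanged = sync(cache, 1000, [1, 2, 3, 4, 5])
cache, after_recreate = sync(cache, 2000, [1, 2, 3, 4, 5])
```

With an unchanged UIDVALIDITY nothing is fetched again. Once the folder has
been recreated, the new UIDVALIDITY forces a full resync even though the UIDs
look identical.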
> , but... see below.
>
> - The *client* is to trigger the Free/Busy update,
Yes, this is implemented this way in order to keep the patchset small and make
the Kolab solution work with any unmodified standards compliant IMAP4 server.
(An alternative would be to extend either IMAP4 syntax or IMAP4 semantics.)
> - CONDSTORE (required for UIDVALIDITY) is not enabled on Kolab 2.3
> (Cyrus IMAP 2.3) mailboxes by default,
Sorry, this is technically plain wrong. CONDSTORE is no prerequisite of
UIDVALIDITY.
CONDSTORE is defined in RFC 4551 (June 2006, years after Kolab was designed),
which is much younger than UIDVALIDITY, which was already defined in
RFC 2060 (December 1996).
> - The Free/Busy mechanism has little to hold on to, to see what has
> changed, unless it maintains a local cache of at least the UIDs of the
> message it used when it last generated the (partial) Free/Busy,
Keeping such a cache for optimisation purposes is trivial and common practice.
Actually it is not required for a scalable solution, but this fact is a minor
detail which could be discussed separately. The size of the cache is
negligible: a simple list of 32-bit integers, e.g. 40K in the case of 10.000
events.
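The 40K figure is simply ten thousand UIDs at 4 bytes each. A short check,
assuming the UIDs are stored back to back as unsigned 32-bit integers (the
variable names are illustrative only):

```python
import struct

EVENTS = 10_000
BYTES_PER_UID = struct.calcsize("<I")   # standard-size unsigned 32-bit integer

cache_bytes = EVENTS * BYTES_PER_UID    # 10,000 * 4 bytes

# Packing the UIDs 1..10000 back to back yields exactly that many bytes.
packed = struct.pack(f"<{EVENTS}I", *range(1, EVENTS + 1))
```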
> - Retrieval of relevant events to the relevant period in time could be
> made faster using sorting and retrieving the newest objects first,
Retrieving only the relevant events is common practice and trivial, but
sorting is plain wrong and slow. Sorting is a typical relational database
approach. There is NO need to do any sorting if you leverage the IMAP protocol.
IMAP guarantees strictly monotonically increasing UID values. Because IMAP
does NOT know a modify, every modified or new event results in a new IMAP
message which has a UID > LASTSEENUID. (For brevity I will not get into the
details of removal.)
Therefore the simple rule "FETCH LASTSEENUID+1:*" is sufficient.
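The rule can be sketched as follows. The helper names are hypothetical, and a
plain dict of UID -> message stands in for the server's reply to the fetch.
One detail from RFC 3501 is worth handling: a UID range ending in "*" always
includes the highest UID in the folder, so a server may return the last
message even when nothing new has arrived.

```python
def fetch_set(last_seen_uid):
    """UID set meaning 'everything newer than what was already seen'."""
    return f"{last_seen_uid + 1}:*"


def new_messages(fetched, last_seen_uid):
    """Keep only genuinely new UIDs from a fetch result.

    fetched maps UID -> message. The filter drops the already known last
    message that an 'n:*' range can return when nothing is new.
    """
    return {uid: msg for uid, msg in fetched.items() if uid > last_seen_uid}


def update_last_seen(fetched, last_seen_uid):
    """Highest UID known after processing a fetch result."""
    return max([last_seen_uid, *fetched])


last_seen = 41
uid_set = fetch_set(last_seen)                       # "42:*"
fresh = new_messages({42: "event A", 43: "event B"}, last_seen)
last_seen = update_last_seen(fresh, last_seen)
```

With Python's imaplib the set could be passed to a connected and selected
session as conn.uid("FETCH", uid_set, "(RFC822)"). No sorting is involved at
any point, because the server hands out UIDs in ascending order.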
> - The client triggering Free/Busy does not simply HEAD a URL and
> disconnects
No, this claim is wrong. Of course this is exactly what happens, up to today.
> , as this would impede the slice of time any web server code
> has available to do what it needs to do. Therefore, a client keeps open
> the connection (and uses GET/POST) until the web server performing the
> Free/Busy updating is done. This is considered a blocking operation for
> clients that cannot do this in the background.
This is wrong. Please look at the code.
> >> Euh, as far as I know, it is the client software that triggers an
> >> update of the free/busy, and not the Kolab server itself, and unless
> >> the
> >> client is multi-threaded like Kontact it is also a blocking
> >> operation.
> >
> > Sorry this is non-sense.
>
> Thank you for your balanced and well-formulated opinion.
I am sorry, but what else should I call it? This is not an opinion but a
trivially provable fact: for every Kolab client the trigger is by its very
nature non-blocking. After all, it is a trigger.
> As I've illustrated before, it's not like Kolab uses FPM or any other
> FastCGI-like implementation,
Don't think in terms of a web developer. Kolab does not require any of those
implementations in order to have non-blocking free/busy generation. (The
current implementation uses a daemon approach in order to avoid extra patching
of upstream resources, though this is an implementation detail.)
> and it's not like the client can simply
> HEAD a URI and be done with it (close the connection).
But this is exactly what happens. Therefore I call your assumptions and claims
nonsense.
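The general shape of a non-blocking trigger served by a background daemon can
be sketched like this. It is not the Kolab code: the queue, the function names
and the folder path are hypothetical, and appending to a list stands in for the
real regeneration of partial free/busy data.

```python
import queue
import threading

pending = queue.Queue()   # folders whose free/busy data is stale
regenerated = []          # stands in for the written free/busy files


def trigger(folder):
    """What a trigger handler could do: enqueue the hint and return at once."""
    pending.put(folder)
    return 200            # the client can disconnect immediately


def worker():
    """Daemon side: regenerate partial free/busy data in the background."""
    while True:
        folder = pending.get()
        regenerated.append(folder)   # placeholder for the real regeneration
        pending.task_done()


threading.Thread(target=worker, daemon=True).start()

status = trigger("user/resource/Calendar")
pending.join()            # only for this demo; a real client would not wait
```

The point of the design is that trigger() costs the client one enqueue, no
matter how long the regeneration itself takes.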
> > The main point here is that the Client trigger the update of the
> > partial
> > freebusy data but they never wait nor block. (A trigger is simply an
> > http call which immediately returns and hints the server that the
> > freebusy needs to be
> > updated)
>
> If only that were true. It sounds very good in theory, but theory is a
> place up north in Narnia. In the real world, there is nothing that
> "hints" the server and nothing to follow up on such "hint". It is the
> client that is actively involved and waiting for the Free/Busy
> information to be updated as part of the trigger URI it is hitting.
I will stop arguing now. Please check the code.
> > Sorry, but you really got things wrong. The basic idea behind Kolab
> > is NOT to think in terms of a relational database including terms of
> > doing queries all
> > the time.
> >
> > This is the essential clue behind Kolab that it is so extremely
> > scalable.
> >
> > Introducing all these "query" concepts will lead to losing this
> > unique
> > property.
>
> Well, unique != good and most certainly unique != best. At most, unique
> <> common.
In this case unique == good, and I consider it insulting that you claim that
the existing scalable solution is inferior to your "query" approach while
denying all evidence as seen in the source and the existing binaries.
IMHO querying is slow, has scalability issues and should be avoided when
possible.
Leveraging guaranteed protocol semantics is good practice and upward
compatible. On the other hand, mapping everything onto a relational
database even though the underlying problem does not have relational
properties is abuse and leads at least to scalability issues.
> To be honest, the "extremely scalable" argument is starting to get to
> be completely wasted on me.
I accept that you do not care about scalability but then please don't ask for
answers to scalability questions like having 10.000 events in a single
calendar.
In my experience both scalability and security MUST be designed into a
solution right from the beginning. Adding either later is extremely cumbersome
and most often not really solvable in a satisfactory manner.
> Every time it is used, it is used as the ultimate argument against
> something
Most of the time it is a well-founded argument against abusing traditional
web technology. (E.g. large scalable web solutions like Facebook, Google or
Twitter moved away from traditional relational databases long ago.)
> , but it misses merit in that the scalability parameter to a
> Kolab deployment is never removed nor reduced by any of the developments
> or ideas to move forward. While you may disagree with that, I have to
> conclude "no-SQL storage" is being confused and arbitrarily substituted
> with "caches, possibly in SQL".
This is plain wrong. There seems to be a fundamental misunderstanding.
I hope that I could provide some insight anyway. As I lack both time and
funding for actually working on Kolab 3, I hereby stop contributing to this
thread.
Maybe sometime we can meet at some conference and have a beer together after
spending about an hour in front of a blackboard. I am confident
that you would then understand better what this fuss is all about.
Yours,
-- martin
--
e r f r a k o n
Erlewein, Frank, Konold & Partner - Beratende Ingenieure und Physiker
Sitz: Adolfstraße 23, 70469 Stuttgart, Partnerschaftsregister Stuttgart PR 126
http://www.erfrakon.com/