[Kolab-devel] 10.000 events in a Resource Calendar

Tue May 22 18:29:35 CEST 2012

On 2012-05-22 15:08, Martin Konold wrote:
> On Tuesday, 22 May 2012, 13:26:23, Jeroen van Meeuwen wrote:
>
> Hi Jeroen,
>
>> > No this is not a flaw in any way. A delete operation is handled
>> > exactly like and together with write operations. (E.g. EVERY modify
>> > is actually a delete+write operation by nature of how Kolab storage
>> > works)
>>
>> Let's take a step back, because we're confusing the issue in OP.
>
> What does 'OP' mean?
>

Original Post(er), which was about possibly finding ways to increase 
the efficiency when operating raw IMAP.

>> The following *actually* happens when an event is deleted (whether
>> "the idea behind O(1)" design or not);
>>
>> - Adding or editing an event to a calendar obviously adds a new
>> object to IMAP.
>
> Correct.
>
>> - To remove an event from a calendar, the message could be flagged
>> \Deleted in IMAP, and (possibly) the folder is expunged (doesn't
>> matter),
>
> Yes. A remove is mapped to a \Deleted flag. Why do you consider the
> obvious worthwhile to mention?

I'm merely adding the background of a line of thought, or my part of it 
anyway, that brought us to where we are in this part of the 
conversation.

>> It does bump UIDVALIDITY
>
> No, this is plain wrong.
>

You're right, that is plain wrong. I'm sorry, I meant HIGHESTMODSEQ, 
not UIDVALIDITY.
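
To make that concrete, here is a minimal sketch (not code from Kolab) of 
how a consumer could use HIGHESTMODSEQ to notice that a folder changed. 
It assumes the server has CONDSTORE enabled and sends the response code 
in the form described in RFC 4551; the function names are made up for 
illustration.

```python
import re

# Response code a CONDSTORE-capable server sends on SELECT, for example:
#   * OK [HIGHESTMODSEQ 715194045007] Highest
_HIGHESTMODSEQ = re.compile(r"\[HIGHESTMODSEQ (\d+)\]", re.IGNORECASE)


def parse_highestmodseq(select_line):
    """Return the HIGHESTMODSEQ value from a SELECT response line, or None."""
    match = _HIGHESTMODSEQ.search(select_line)
    return int(match.group(1)) if match else None


def folder_changed(cached, current):
    """Compare a cached HIGHESTMODSEQ with the current one.

    An unknown value (None) on either side is treated as "changed", so
    that nothing is skipped when the server does not report the value.
    """
    if cached is None or current is None:
        return True
    return current > cached
```

A flag change such as setting \Deleted should raise the value, so a 
cached copy that equals the current one would mean nothing needs to be 
regenerated.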

>> , but... see below.
>>
>> - The *client* is to trigger the Free/Busy update,
>
> Yes, this is implemented this way in order to keep the patchset small
> and make the Kolab solution work with any unmodified standards 
> compliant IMAP4 server.
>
> (An alternative would be to extend either IMAP4 syntax or IMAP4 
> semantics.)
>
>> - CONDSTORE (required for UIDVALIDITY) is not enabled on Kolab 2.3
>> (Cyrus IMAP 2.3) mailboxes by default,
>
> Sorry, this is technically plain wrong.
>

Yes, indeed, you're right again. A daisy chain of errors, I spat onto 
this list. My apologies, again.

>> - The Free/Busy mechanism has little to hold on to, to see what has
>> changed, unless it maintains a local cache of at least the UIDs of
>> the messages it used when it last generated the (partial) Free/Busy,
>
> Keeping such a cache for optimisation purposes is trivial and common
> practice. Actually it is not required for a scalable solution, but
> this fact is a minor detail which could be discussed separately. The
> size of the cache is negligible: a simple list of 32-bit integers,
> e.g. 40K in the case of 10.000 events.
>
>> - Retrieval of relevant events to the relevant period in time could
>> be made faster using sorting and retrieving the newest objects first,
>
> This is common practice and trivial but doing sorting is plain wrong
> and slow.
>
> A sorting approach is a typical relational database approach. There
> is NO need to do any sorting if you leverage upon the IMAP protocol.
>

Server-side sorting obviously does leverage the IMAP protocol.

> IMAP guarantees strictly monotonically increasing UID values. Because
> IMAP does NOT know a modify, every modified or new event results in a
> new IMAP message which happens to have a UID > LASTSEENUID. (For
> brevity I will not get into the details of removal.)
>
> Therefore the simple rule of a "FETCH LASTSEEN+1:*" is sufficient.
>

Naturally.
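
As a minimal sketch of that rule (not the actual Free/Busy code), the 
following uses Python's standard imaplib module. The function names, and 
the idea of passing in a connection with the calendar folder already 
selected, are assumptions for illustration. One detail worth noting: per 
RFC 3501 a UID range of N:* always includes the last message in the 
mailbox, even when its UID is below N, so the result is filtered again 
on the client side.

```python
import imaplib


def uid_range_after(last_seen_uid):
    """Build the UID set "LASTSEEN+1:*" for messages added since the last run."""
    return "%d:*" % (last_seen_uid + 1)


def search_new_uids(conn, last_seen_uid):
    """Return the UIDs greater than last_seen_uid in the selected folder.

    conn is assumed to be an imaplib.IMAP4 (or IMAP4_SSL) object on which
    the calendar folder has already been selected.
    """
    typ, data = conn.uid("SEARCH", "UID", uid_range_after(last_seen_uid))
    if typ != "OK":
        raise imaplib.IMAP4.error("UID SEARCH failed: %r" % (data,))
    raw = data[0] or b""
    # "N:*" always matches the last message, even if its UID is below N,
    # so drop anything that is not actually new.
    return sorted(u for u in map(int, raw.split()) if u > last_seen_uid)
```

The caller would then fetch only those UIDs and store the highest one as 
the new LASTSEEN value.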

>> - The client triggering Free/Busy does not simply HEAD a URL and
>> disconnects
>
> No, this claim is wrong; of course this is the case up to today.
>

I'm not sure this parses. The claim is wrong but it is the case 
to-date?

>> , as this would impede the slice of time any web server code
>> has available to do what it needs to do. Therefore, a client keeps
>> open the connection (and uses GET/POST) until the web server
>> performing the Free/Busy updating is done. This is considered a
>> blocking operation for clients that cannot do this in the background.
>
> This is wrong. Please look at the code.
>

The code of... the fbview Horde fork server-side, or the client (and if 
the latter, which client)?

>> >> Euh, as far as I know, it is the client software that triggers an
>> >> update of the free/busy, and not the Kolab server itself, and
>> >> unless the client is multi-threaded like Kontact it is also a
>> >> blocking operation.
>> >
>> > Sorry, this is nonsense.
>>
>> Thank you for your balanced and well-formulated opinion.
>
> I am sorry but what else should I call it? This is not an opinion but
> a trivially provable fact: for every Kolab client the trigger is by
> its very nature non-blocking. After all, it is a trigger.
>
>> As I've illustrated before, it's not like Kolab uses FPM or any
>> other FastCGI-like implementation,
>
> Don't think in terms of a web developer. Kolab does not require any
> of those implementations in order to have non-blocking fb generation.
> (The current implementation uses a daemon approach in order to avoid
> extra patching of upstream resources, though this is an
> implementation detail.)
>

Could you please explain this statement? As far as I know, there is no 
daemon whatsoever. Perhaps you could point out the package that is 
responsible for deploying such a freebusy daemon in the sources of Kolab 
2.3.4?

   http://files.kolab.org/server/release/kolab-server-2.3.4/sources/

>> and it's not like the client can simply
>> HEAD a URI and be done with it (close the connection).
>
> But this is exactly what happens. Therefore I call your assumptions
> and claims nonsense.
>

Right.

>> > Sorry, but you really got things wrong. The basic idea behind
>> > Kolab is NOT to think in terms of a relational database, including
>> > in terms of doing queries all the time.
>> >
>> > This is the essential reason why Kolab is so extremely scalable.
>> >
>> > Introducing all these "query" concepts will lead to losing this
>> > unique property.
>>
>> Well, unique != good and most certainly unique != best. At most,
>> unique <> common.
>
> In this case unique == good and I consider it insulting that you
> claim that the existing scalable solution is inferior to your "query"
> approach while denying all evidence as seen in source and existing
> binaries.
>

Insulting you certainly wasn't my intention, so I'm sorry if I did.

> IMHO query is slow, has scalability issues and should be avoided when
> possible.
>
> Leveraging guaranteed protocol semantics is good practice and upwards
> compatible. On the other hand, mapping everything towards a
> relational database even though the underlying problem does not have
> relational properties is abuse and leads at least to scalability
> issues.
>

Well, I'd appreciate some elaboration on the insights that:

- "queries are slow",

- databases are not scalable.

>> To be honest, the "extremely scalable" argument is starting to get
>> to be completely wasted on me.
>
> I accept that you do not care about scalability but then please don't
> ask for answers to scalability questions like having 10.000 events in
> a single calendar.
>

Oh, but don't get me wrong, please. I *do* care about scalability. I'm 
just getting tired of hearing "No, not scalable" without the background 
of why it (the suggested idea or development) is not scalable, or not 
sufficiently scalable.

> From my experience both scalability and security MUST be designed
> into a solution right from the beginning. Adding both later is
> extremely cumbersome and most often not really solvable in a
> satisfactory manner.
>

I couldn't agree more.

>> Every time it is used, it is used as the ultimate argument against
>> something
>
> Most of the time it is used well founded as an argument against
> abusing traditional web technology. (E.g. large scalable web
> solutions like Facebook, Google or Twitter have moved away from
> traditional relational databases long ago.)
>

Nobody's setting in stone that it MUST be SQLite, nor arguing that *SQL 
is the only option; it may very well be Cassandra, or whatever 
substitute does the job, or nothing at all. Nobody's threatening the 
holy model of "no-SQL storage". You're completely right when you say I 
haven't got the faintest idea what all the fuss is about.

>> , but it misses merit in that the scalability parameter to a
>> Kolab deployment is never removed nor reduced by any of the
>> developments or ideas to move forward. While you may disagree with
>> that, I have to conclude "no-SQL storage" is being confused and
>> arbitrarily substituted with "caches, possibly in SQL".
>
> This is plain wrong. There seems to be a fundamental
> misunderstanding.
>

There being a misunderstanding was exactly my point, I'm glad you 
agree.

> I hope that I could anyway provide some insight. As I lack both time
> and funding for actually working on Kolab 3 I hereby stop
> contributing to this thread.
>

Well, I'm sorry about that.

> Maybe sometime we can meet at some conference and have a beer
> together after meeting before for about an hour in front of a
> blackboard. I am confident that you would then understand better
> what this fuss is all about.
>

Likewise.

Kind regards,

Jeroen van Meeuwen

-- 
Systems Architect, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08