Kolab2 architecture/format flaws

Thu May 12 23:59:50 CEST 2005

On Thu, 2005-05-12 at 22:47 +0200, Matt Douhan wrote:
> On Thursday 12 May 2005 21.18, Zachariah Mully wrote:
> > 4) Without format changes, I consider the possibility of a workable
> > *scalable* webclient, in any form, dead in Kolab2. Was a robust
> > webclient ever part of the plan for Kolab2?
> >
> 
> How large are your current installations?
> 
> We are running installations with 1000+ users with the webclient and they are 
> yet to complain about it.
> 
> I am not saying it is perfect; I am simply saying that large installations 
> exist and are working.

My current installation? One person, 503 calendar entries, dual PIII 500,
1 GB RAM, external U160 217 GB SCSI array, running Debian Woody. The
minimum load time for any page in the calendar is ~45 sec. I had turned
on squatter for the mailboxes, but since I've found that no optimization
at all is done for retrieving the calendar entries over IMAP, I'm not
surprised this didn't help. Do you have a rough idea of how frequently
the calendar is used and the average number of entries in the
calendars?

I know that your setup has far more CPU cycles to waste on object
processing than mine, but from what I've seen in the code, I'm not sure
that the solution should be to get a faster machine... Regardless, the
method the webclient currently uses to pull calendar entries is
horribly inefficient any way you cut it, and it's my feeling this is
going to bite a lot of people in the ass as soon as they try to scale
it up.

For example, I just created a calendar folder with 3556 entries in it.
Cyrus returns the applicable message ids in <1 sec (since Horde requests
the entire contents of the mailbox). Horde then spends the next 1'52"
processing the messages, even though the events are all 3 months in the
past and would not have been displayed!
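
A minimal sketch of the access pattern described above, not Horde's
actual code. It assumes a simplified stand-in for a Kolab event object
(the make_event helper and its two fields are hypothetical) and shows
that when the date lives only inside the message body, every object has
to be parsed before the client can tell that none of them falls in the
displayed week.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta


def make_event(start):
    # Hypothetical, heavily simplified event body; real Kolab objects
    # carry many more fields.
    return (
        "<event><summary>meeting</summary>"
        f"<start-date>{start.isoformat()}</start-date></event>"
    )


def events_in_window(bodies, first, last):
    """Client-side filtering: each body is parsed before its date is known."""
    parsed = 0
    hits = []
    for body in bodies:
        root = ET.fromstring(body)
        parsed += 1
        start = date.fromisoformat(root.findtext("start-date"))
        if first <= start <= last:
            hits.append(start)
    return hits, parsed


today = date(2005, 5, 12)
# 3556 events, all at least 90 days in the past.
mailbox = [make_event(today - timedelta(days=90 + i % 30)) for i in range(3556)]
hits, parsed = events_in_window(mailbox, today, today + timedelta(days=7))
print(len(hits), parsed)  # 0 3556
```

The work grows with the size of the folder, not with the number of
events actually shown.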

I guess I am a bit mystified why the designers chose to store everything
in an IMAP backend (which is fine), BUT then defined the Kolab objects
in such a way that there is no way (to my knowledge) to optimize the
retrieval of objects from the backend. It's like using a database
backend but then storing all the data you need to select against in a
binary format that the database doesn't understand!

This really has the most impact on the webclient since there isn't a
good way to cache the messages, but it also means that any fat clients
are going to have to do a lot of fetching and pre-processing of messages
before getting any usable information from them.

If the developers expose the contents of the Kolab objects to Cyrus,
then Cyrus' built-in indexing and caching can be used fully. Maybe I'm
blowing this out of proportion, but I think it has the potential to
cause serious performance issues and should be addressed.
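
One way this could look, purely as an illustration: assume a
hypothetical convention (not part of the current Kolab format, as far as
I know) in which the event's start date is mirrored into the message's
Date: header. The standard IMAP SEARCH keys SENTSINCE and SENTBEFORE
would then let the server pick out the week being displayed. The helper
names below are made up for the sketch.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d):
    """Format a date the way IMAP SEARCH expects, e.g. 12-May-2005."""
    return f"{d.day}-{MONTHS[d.month - 1]}-{d.year}"


def window_criteria(first, last_exclusive):
    # SENTSINCE matches Date: headers on or after the given day;
    # SENTBEFORE matches Date: headers strictly before it (RFC 3501).
    return (
        f"(SENTSINCE {imap_date(first)} "
        f"SENTBEFORE {imap_date(last_exclusive)})"
    )


criteria = window_criteria(date(2005, 5, 9), date(2005, 5, 16))
print(criteria)  # (SENTSINCE 9-May-2005 SENTBEFORE 16-May-2005)

# With a live connection (not run here), the standard library call is:
#   conn = imaplib.IMAP4(host); conn.login(user, password)
#   conn.select(folder)
#   typ, data = conn.search(None, criteria)
```

A single date header would not cover recurring or multi-day events, so
this is only meant to show the kind of query that becomes possible once
the server can see the fields being selected on.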

Z