[Kolab-devel] Caching in Roundcube's libkolabxml storage layer

Jeroen van Meeuwen (Kolab Systems) vanmeeuwen at kolabsys.com
Tue Apr 3 23:49:54 CEST 2012


On 2012-04-03 17:34, Thomas Brüderli wrote:
> Hello devs
>
> Since server-side Akonadi most likely won't make it into the Kolab
> 3.0 release, we have to build our own caching engine to improve access
> speed for Kolab objects in Roundcube. The initial plan was to delegate
> reading, caching and syncing Kolab objects to a server-side Akonadi
> instance and keep things rather simple on the Roundcube side. In
> previous versions we used the entire Horde stack to fetch, cache, read
> and write Kolab groupware objects, with all its known downsides. Now
> that reading and writing Kolab 3 objects using the libkolabxml PHP
> bindings basically works, it's time to get our heads into caching in
> order to speed up the listing of contacts, events and such.
>
> Let me start with a short explanation of the new libkolabxml storage
> layer we've created in Roundcube. This is how one can list all
> contacts from an annotated IMAP folder:
>
>   $folders = kolab_storage::get_folders('contact');
>   $folder = $folders[0]; // instance of kolab_storage_folder class;
>
>   foreach ($folder->get_objects() as $contact) {
>     // $contact is a hash array containing contact properties
>     // which are relevant in Roundcube
>     echo $contact['uid'] . "\t" . $contact['name'] . "\n";
>   }
>
> Fetching a contact by its UID and updating it is pretty simple, too:
>
>   $contact = $folder->get_object('<some-uid>');
>   $contact['name'] = 'John Doe';
>   $folder->save($contact);
>
> Without any caching in place, the following procedure is
> executed when reading a contact:
>
> kolab_storage::get_folders()
>   (1) List all IMAP folders with their metadata and post-filter the list
>
> kolab_storage_folder::get_objects()
>   (2a) List all messages of a particular IMAP folder or
>   (2b) search messages by HEADER X-Kolab-Type ...
>
> For each of the returned messages, or when calling
> kolab_storage_folder::get_object(<uid>) directly, the following
> happens:
>
>   (3) Fetch bodystructure, parse it and find XML part.
>   (4) Fetch the XML part from IMAP and parse it with libkolabxml
>   (5) Load object data into a hash array by calling dozens of getter
> methods of the Contact/Event/DistributionList class.

I suppose the libkolabxml PHP bindings (or a wrapper library on top of
them, from PEAR perhaps?) should be creating this hash representation?

>   (6) Compute repeated instances of recurring events
>
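Step (6) is the one item in the list that needs date arithmetic rather than IMAP or XML work. A minimal sketch of it for the simplest case only, a weekly rule with a fixed count, using PHP's built-in DateTimeImmutable and DateInterval. The helper name expand_weekly() is hypothetical, and real recurrence rules (BYDAY, UNTIL, exceptions) need considerably more logic:

```php
<?php
// Hypothetical helper for step (6): expand a plain weekly recurrence
// (fixed count, no BYDAY/UNTIL/exceptions) into concrete instances.
function expand_weekly(DateTimeImmutable $dtstart, DateTimeImmutable $dtend, int $count): array
{
    $duration  = $dtstart->diff($dtend);   // length of a single occurrence
    $weekly    = new DateInterval('P1W');  // one week between occurrences
    $instances = [];
    $current   = $dtstart;
    for ($i = 0; $i < $count; $i++) {
        $instances[] = ['start' => $current, 'end' => $current->add($duration)];
        $current = $current->add($weekly);
    }
    return $instances;
}

$tz = new DateTimeZone('Europe/Zurich');
$events = expand_weekly(
    new DateTimeImmutable('2012-04-03 10:00', $tz),
    new DateTimeImmutable('2012-04-03 11:30', $tz),
    3
);
foreach ($events as $e) {
    echo $e['start']->format('Y-m-d H:i'), ' - ', $e['end']->format('H:i'), "\n";
}
```

Storing DTSTART/DTEND of the series in the cache would at least let a calendar view decide which events fall into a date range before doing any expansion.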
> (...snip...) I'd strongly recommend also caching the interpreted
> object data in addition to the raw XML block, because reading from
> libkolabxml involves plenty of function calls, which are known to be
> rather expensive in PHP.
>
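To illustrate caching both the XML and the interpreted data, a sketch of a read-through cache in which the expensive parsing runs only on a cache miss. The class object_cache, the regex-based stand-in parser and the fetch callback are all hypothetical placeholders for the IMAP fetch and the libkolabxml getter calls:

```php
<?php
// Hypothetical read-through cache: each row holds both the raw XML and the
// interpreted hash array, so parsing (steps 4 and 5) only runs on a miss.
final class object_cache
{
    private array $rows = [];
    public int $parse_calls = 0;

    // $parser: callable turning raw XML into a hash array (stand-in for libkolabxml)
    public function __construct(private $parser) {}

    public function get(string $folder, string $uid, callable $fetch_xml): array
    {
        $key = $folder . '#' . $uid;
        if (!isset($this->rows[$key])) {
            $xml = $fetch_xml($uid);              // stand-in for the IMAP fetch
            $this->parse_calls++;
            $this->rows[$key] = [
                'xml'  => $xml,                   // raw block, kept for re-parsing
                'data' => ($this->parser)($xml),  // interpreted object data
            ];
        }
        return $this->rows[$key]['data'];
    }
}

// Stand-ins for IMAP and libkolabxml, for illustration only.
$fetch  = fn(string $uid): string => "<contact><uid>$uid</uid><name>John Doe</name></contact>";
$parser = function (string $xml): array {
    preg_match('#<uid>(.*?)</uid>#', $xml, $u);
    preg_match('#<name>(.*?)</name>#', $xml, $n);
    return ['uid' => $u[1], 'name' => $n[1]];
};

$cache  = new object_cache($parser);
$folder = 'imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts';
$first  = $cache->get($folder, 'abc-123', $fetch);
$second = $cache->get($folder, 'abc-123', $fetch);  // served from the cache
echo $cache->parse_calls, "\n";
```

Keeping the raw XML alongside the data means the interpreted form can be rebuilt from the cache alone if the hash array layout changes, without going back to IMAP.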

See previous point. I reckon building the hash is faster in C++ than it
is in PHP, right?

> So here's my proposition for a Kolab object cache data structure:
>
> FOLDER:   <fully qualified IMAP folder URI>  // example: [1]
> MSGUID:   <IMAP message UID>    // used for synchronizing
> EXPIRE:   <date-time>           // expiration timestamp
> UID:      <object UID>
> TYPE:     <object type>         // contact/event/distribution-list/etc.
> DATA:     <serialized object data>
> XML:      <raw xml block>
> DTSTART:  <date-time>           // event start (empty for other types)
> DTEND:    <date-time>           // event end
> TAGS:     <object-specific keywords used for filtering>
>
> [1] imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts
>
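The MSGUID column is what makes incremental synchronization cheap: comparing the cached MSGUIDs with the UIDs the server currently reports tells which messages to fetch and which rows to drop. A minimal sketch, assuming a hypothetical sync_plan() helper and that the current UID list comes from something like an IMAP UID SEARCH ALL:

```php
<?php
// Hypothetical sync step: compare the MSGUIDs stored in the cache with the
// UIDs currently present in the IMAP folder.
function sync_plan(array $cached_msguids, array $imap_uids): array
{
    return [
        // new on the server: fetch and parse these (steps 3 to 5)
        'fetch'  => array_values(array_diff($imap_uids, $cached_msguids)),
        // gone from the server (deleted or expunged): drop from the cache
        'delete' => array_values(array_diff($cached_msguids, $imap_uids)),
    ];
}

$plan = sync_plan([101, 102, 105], [102, 105, 106, 107]);
print_r($plan);
```

Since IMAP messages cannot be modified in place, an updated Kolab object arrives as a new message with a new UID, so an edit shows up here as one fetch plus one delete. If the folder's UIDVALIDITY changes, all cached MSGUIDs for it become meaningless and the folder has to be re-read.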
> This structure can easily be reflected in an SQL database. If it's a
> requirement to support different caching backends such as db, memcache,
> file-based, etc., things become slightly more complicated but still
> doable. Is that a requirement?
>

SQL will do, for now, as it's what we'd use for the Roundcube
database already.
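Starting with SQL need not rule out other backends later if the cache sits behind a small interface. A sketch of what that could look like; kolab_cache_backend and memory_cache_backend are hypothetical names, and an SQL implementation would map the same four calls onto the table structure quoted above:

```php
<?php
// Hypothetical backend abstraction: SQL, memcache or file-based
// implementations would all provide these four calls.
interface kolab_cache_backend
{
    public function get(string $folder, int $msguid): ?array;
    public function set(string $folder, int $msguid, array $record): void;
    public function delete(string $folder, int $msguid): void;
    /** @return int[] MSGUIDs currently cached for $folder */
    public function msguids(string $folder): array;
}

// In-memory implementation, e.g. for unit tests.
final class memory_cache_backend implements kolab_cache_backend
{
    private array $store = [];

    public function get(string $folder, int $msguid): ?array
    {
        return $this->store[$folder][$msguid] ?? null;
    }

    public function set(string $folder, int $msguid, array $record): void
    {
        $this->store[$folder][$msguid] = $record;
    }

    public function delete(string $folder, int $msguid): void
    {
        unset($this->store[$folder][$msguid]);
    }

    public function msguids(string $folder): array
    {
        return array_keys($this->store[$folder] ?? []);
    }
}

$backend = new memory_cache_backend();
$folder  = 'imap://mail.kolab.cc/INBOX/Contacts';
$backend->set($folder, 102, ['uid' => 'abc-123', 'type' => 'contact']);
$backend->set($folder, 105, ['uid' => 'def-456', 'type' => 'contact']);
$backend->delete($folder, 102);
```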

> Using full IMAP URIs would make it possible to re-use the cache of a
> shared folder for multiple users who have access to that folder.

That is, unless you include the username in the (folder/message) URI.
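A sketch of how the username could be kept out of the cache key for shared folders. shared_folder_key() is a hypothetical helper built on PHP's parse_url(), and the folder path is made up. It should only be applied to shared folders, since a personal path such as INBOX/Contacts is relative to the logged-in user and would otherwise collide between users:

```php
<?php
// Hypothetical: drop the user part of an IMAP folder URI so that every user
// with access to the same shared folder maps to the same cache rows.
function shared_folder_key(string $uri): string
{
    $p = parse_url($uri);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        throw new InvalidArgumentException("not an absolute URI: $uri");
    }
    $key = $p['scheme'] . '://' . $p['host'];
    if (isset($p['port'])) {
        $key .= ':' . $p['port'];
    }
    return $key . ($p['path'] ?? '');
}

$a = shared_folder_key('imap://bruederli%40kolab.cc@mail.kolab.cc/shared/Team%20Contacts');
$b = shared_folder_key('imap://jane%40kolab.cc@mail.kolab.cc/shared/Team%20Contacts');
echo $a, "\n";
```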

Let's also note that the URI can contain the exact MIME part identifier
of the message payload, which may be another interesting parameter to
consider in caching.
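RFC 5092 already defines IMAP URL syntax for this, covering UIDVALIDITY, the message UID and the section number. A sketch with a hypothetical imap_part_url() helper; the UIDVALIDITY, UID and section values are made up for illustration:

```php
<?php
// Sketch of an RFC 5092 IMAP URL that pins the payload down to one MIME
// part: mailbox, UIDVALIDITY, message UID and section number.
function imap_part_url(string $folder_uri, int $uidvalidity, int $uid, string $section): string
{
    return sprintf('%s;UIDVALIDITY=%d/;UID=%d/;SECTION=%s',
        $folder_uri, $uidvalidity, $uid, $section);
}

$url = imap_part_url('imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts', 1333480194, 105, '2');
echo $url, "\n";
```

Keeping UIDVALIDITY in the URL means a cache key goes stale automatically when the server resets the UIDs of the folder.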

For contacts, it may be worthwhile considering storing the parts that 
make auto-complete work separately from the raw XML payload / serialized 
object data.
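A sketch of such a separate auto-complete index: only the display name and email addresses are tokenized and stored per UID, so a prefix lookup never has to unserialize object data or touch XML. Both functions and the field names name/email are hypothetical; strtolower() only folds ASCII, so a real implementation would want mb_strtolower() for names with umlauts:

```php
<?php
// Hypothetical: the few tokens auto-complete needs, stored apart from DATA/XML.
function autocomplete_words(array $contact): array
{
    $fields = array_merge([$contact['name'] ?? ''], $contact['email'] ?? []);
    $words  = preg_split('/[\s,]+/', strtolower(implode(' ', $fields)), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Return the UIDs whose index contains a word starting with $prefix.
function autocomplete_match(array $index, string $prefix): array
{
    $prefix = strtolower($prefix);
    $hits = [];
    foreach ($index as $uid => $words) {
        foreach ($words as $w) {
            if (str_starts_with($w, $prefix)) {
                $hits[] = $uid;
                break;
            }
        }
    }
    return $hits;
}

$index = [
    'abc-123' => autocomplete_words(['name' => 'John Doe', 'email' => ['john@example.org']]),
    'def-456' => autocomplete_words(['name' => 'Jane Roe', 'email' => ['jane.roe@example.org']]),
];
print_r(autocomplete_match($index, 'ro'));
```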

Also, since other languages do not handle PHP's serialize() format
well, would you consider using JSON instead?
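The difference in a nutshell: serialize() output is PHP-specific, while JSON can be read by practically any language's standard library. A sketch with illustrative field names:

```php
<?php
// Object data as get_object() might return it (field names are illustrative).
$contact = [
    'uid'   => 'abc-123',
    'name'  => 'John Doe',
    'email' => ['john@example.org'],
];

// PHP-specific format: other languages need a port of unserialize().
$php  = serialize($contact);
// JSON: readable by practically any language's standard library.
$json = json_encode($contact, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);

echo $php, "\n";
echo $json, "\n";

// Decoding as an associative array gives back the same hash array.
$decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
var_dump($decoded === $contact);   // bool(true)
```

One caveat: JSON has no notion of PHP objects, so DateTime values (event start and end, for instance) would have to be stored as strings, e.g. ISO 8601, and converted back on read.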

Kind regards,

Jeroen van Meeuwen

-- 
Systems Architect, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08



