[Kolab-devel] Caching in Roundcube's libkolabxml storage layer

Thomas Brüderli bruederli at kolabsys.com
Wed Apr 4 10:54:50 CEST 2012


Jeroen van Meeuwen (Kolab Systems) wrote:
> On 2012-04-03 17:34, Thomas Brüderli wrote:
>> Hello devs
>>
>> [...]
>>
>> Without a specific caching in place, the following procedure is
>> executed when reading a contact:
>>
>> kolab_storage::get_folders()
>>   (1) List all IMAP folders with their metadata and post-filter the list
>>
>> kolab_storage_folder::get_objects()
>>   (2a) List all messages of a particular IMAP folder or
>>   (2b) search messages by HEADER X-Kolab-Type ...
>>
>> For each of the returned messages or when calling
>> kolab_storage_folder::get_object(<uid>) directly, the following
>> happens:
>>
>>   (3) Fetch bodystructure, parse it and find XML part.
>>   (4) Fetch the XML part from IMAP and parse it with libkolabxml
>>   (5) Load object data into a hash array by calling dozens of getter
>> methods of the Contact/Event/DistributionList class.
> 
> I suppose the libkolabxml PHP bindings (or wrapper (PEAR?) library on 
> top of ~) should be creating this hash representation?

It's currently the wrapper classes in the libkolab Roundcube plugin (namely
kolab_format_*) doing that. One can still consider turning them into a PEAR
library later on once the Roundcube core functions are "librificated".
> 
>>   (6) Compute repeated instances of recurring events
>>
>> (...snip...) I'd strongly recommend to also
>> cache the interpreted object data in addition to the raw XML block
>> because reading from libkolabxml involves plenty of function calls
>> which are known to be rather expensive in PHP.
> 
> See previous point. I reckon the hashing is faster in C++ than it is in 
> PHP, right?

Absolutely. But in order to make it really fast one would need to write a
specific PHP module by hand. The wrappers generated by SWIG aren't really
optimized.
> 
>> So here's my proposition for a Kolab object cache data structure:
>>
>> FOLDER:	<fully qualified IMAP folder URI>  // example: [1]
>> MSGUID:	<IMAP message UID>	// used for synchronizing
>> EXPIRE:	<date-time>		// expiration timestamp
>> UID:	<object UID>
>> TYPE:	<object type>		// contact/event/distribution-list/etc.
>> DATA:	<serialized object data>
>> XML:	<raw xml block>
>> DTSTART:	<date-time>	// event start (empty for other types)
>> DTEND:		<date-time>	// event end
>> TAGS:	<object specific keywords used for filtering>
>>
>> [1] imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts
>>
>> This structure can easily be reflected in a SQL database. If it's a
>> requirement to have different caching backends such as db, memcache,
>> file-based, etc., things become slightly more complicated but still
>> doable. Is that a requirement?
> 
> SQL will do - for now - as it's what we'd use for the Roundcube 
> database already.

The question was more whether we have to add another layer of abstraction
for the cache storage.
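
To make the question concrete, here is a minimal sketch of the proposed
structure as a single SQL table, written in Python with sqlite3 purely for
illustration. The table name kolab_cache, the column types, the primary key
and the helper names open_cache/put/get are assumptions, not a decided
schema or an existing API. Keeping all access behind helpers like put() and
get() is one way a second backend could be added later without touching
callers.

```python
import sqlite3

# Hypothetical schema: table name, column types and key are assumptions.
# The columns mirror the fields of the proposed cache structure.
SCHEMA = """
CREATE TABLE kolab_cache (
    folder   TEXT NOT NULL,     -- fully qualified IMAP folder URI
    msguid   INTEGER NOT NULL,  -- IMAP message UID, used for synchronizing
    expire   TEXT,              -- expiration timestamp
    uid      TEXT NOT NULL,     -- object UID
    type     TEXT NOT NULL,     -- contact/event/distribution-list/etc.
    data     TEXT,              -- serialized object data
    xml      TEXT,              -- raw XML block
    dtstart  TEXT,              -- event start (empty for other types)
    dtend    TEXT,              -- event end
    tags     TEXT,              -- object specific keywords for filtering
    PRIMARY KEY (folder, msguid)
)
"""


def open_cache(path=":memory:"):
    """Open a cache database and create the table."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def put(db, row):
    """Insert or overwrite one cache record given as a dict of all fields."""
    db.execute(
        "INSERT OR REPLACE INTO kolab_cache "
        "(folder, msguid, expire, uid, type, data, xml, dtstart, dtend, tags) "
        "VALUES (:folder, :msguid, :expire, :uid, :type, :data, :xml, "
        ":dtstart, :dtend, :tags)",
        row,
    )


def get(db, folder, uid):
    """Return the serialized object data for an object UID, or None."""
    cur = db.execute(
        "SELECT data FROM kolab_cache WHERE folder = ? AND uid = ?",
        (folder, uid),
    )
    hit = cur.fetchone()
    return hit[0] if hit else None
```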
> 
>> Using full IMAP URIs would make it possible to re-use the cache of a
>> shared folder for multiple users who have access to that folder.
> 
> That is, unless you include the username in the (folder/message) URI.

The idea was to add the username for folders in the personal and user
namespaces and omit it for the shared namespace.
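
A minimal sketch of that rule, in Python for illustration only. The function
name folder_cache_uri, the namespace labels "personal", "user" and "shared",
and the shared folder name in the usage note are assumptions; only the URI
shape follows the example given earlier in the thread.

```python
from urllib.parse import quote


def folder_cache_uri(host, path, namespace, user=None):
    """Build the folder URI used as cache key.

    The username is part of the key for the personal and user namespaces
    and is left out for the shared namespace, so that all users with
    access to a shared folder map to the same cache entries.
    """
    encoded_path = quote(path, safe="/")
    if namespace == "shared":
        return "imap://{}/{}".format(host, encoded_path)
    if not user:
        raise ValueError("user is required outside the shared namespace")
    # Percent-encode the login so that '@' in it cannot be mistaken
    # for the user/host separator.
    encoded_user = quote(user, safe="")
    return "imap://{}@{}/{}".format(encoded_user, host, encoded_path)
```

With these assumptions, a personal folder of bruederli@kolab.cc yields the
URI from the earlier example, while a call such as
folder_cache_uri("mail.kolab.cc", "Shared Folders/Contacts", "shared")
produces the same key no matter which user is logged in.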
> 
> Let's also note that the URI can contain the exact message mime part 
> identifier for the payload, which may be another interesting parameter 
> to consider in caching.

Indeed. Thanks for the hint.
> 
> For contacts, it may be worthwhile considering storing the parts that 
> make auto-complete work separately from the raw XML payload / serialized 
> object data.

Good point. That was basically what I had in mind with fulltext searching
but auto-completion is just a reduced set of data which should be indexed.
> 
> Also, since other languages do not speak PHP's serialize really well, 
> would/could you consider using JSON instead?

"serialized" doesn't necessarily mean PHP serialization, though the
advantage with PHP serialize is the ability to transparently handle
instances of native PHP classes. But I haven't decided yet.
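
The trade-off can be illustrated with a small sketch, in Python rather than
PHP and only under stated assumptions: JSON is readable from any language,
but native objects such as date-times have to be tagged and restored
explicitly. The "__datetime__" marker and the encode/decode helper names are
made up for this example.

```python
import json
from datetime import datetime


def encode(obj):
    """Serialize object data to JSON, tagging date-time values."""
    def default(value):
        if isinstance(value, datetime):
            return {"__datetime__": value.isoformat()}
        raise TypeError("cannot serialize {!r}".format(type(value)))
    return json.dumps(obj, default=default)


def decode(text):
    """Restore object data from JSON, reviving tagged date-time values."""
    def hook(mapping):
        if "__datetime__" in mapping:
            return datetime.fromisoformat(mapping["__datetime__"])
        return mapping
    return json.loads(text, object_hook=hook)
```

Every additional native type would need its own tag and its own branch in
both helpers, which is the bookkeeping a language-specific serializer does
transparently.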

~Thomas



