[Kolab-devel] Caching in Roundcube's libkolabxml storage layer

Wed Apr 4 11:20:46 CEST 2012

On 2012-04-04 9:54, Thomas Brüderli wrote:
> Jeroen van Meeuwen (Kolab Systems) wrote:
>> I suppose the libkolabxml PHP bindings (or wrapper (PEAR?) library
>> on top of ~) should be creating this hash representation?
>
> It's currently the wrapper classes in the libkolab Roundcube plugin
> (namely kolab_format_*) doing that. One can still consider turning
> them into a PEAR library later on once the Roundcube core functions
> are "librificated".
>

Right...

On 2012-04-04 9:54, Thomas Brüderli wrote:
> Jeroen van Meeuwen (Kolab Systems) wrote:
>> On 2012-04-03 17:34, Thomas Brüderli wrote:
>>>   (6) Compute repeated instances of recurring events
>>>
>>> (...snip...) I'd strongly recommend to also
>>> cache the interpreted object data in addition to the raw XML block
>>> because reading from libkolabxml involves plenty of function calls
>>> which are known to be rather expensive in PHP.
>>
>> See previous point. I reckon the hashing is faster in C++ than it is
>> in PHP, right?
>
> Absolutely. But in order to make it really fast one would need to
> write a specific PHP module by hand. The wrappers generated by SWIG
> aren't really optimized.
>

Which, the associated effort aside, we're saying is worth it if 1) it 
can be done sustainably, and 2) it speeds up getting the PHP hash 
object by a factor of $x, right?

On 2012-04-04 9:54, Thomas Brüderli wrote:
> Jeroen van Meeuwen (Kolab Systems) wrote:
>> On 2012-04-03 17:34, Thomas Brüderli wrote:
>>> So here's my proposition for a Kolab object cache data structure:
>>>
>>> FOLDER:	<fully qualified IMAP folder URI>  // example: [1]
>>> MSGUID:	<IMAP message UID>	// used for synchronizing
>>> EXPIRE:	<date-time>		// expiration timestamp
>>> UID:	<object UID>
>>> TYPE:	<object type>		// contact/event/distribution-list/etc.
>>> DATA:	<serialized object data>
>>> XML:	<raw xml block>
>>> DTSTART:	<date-time>	// event start (empty for other types)
>>> DTEND:		<date-time>	// event end
>>> TAGS:	<object specific keywords used for filtering>
>>>
>>> [1] imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts
>>>
>>> This structure can easily be reflected in a SQL database. If it's a
>>> requirement to have different caching backends such as db, memcache,
>>> file-based, etc., things become slightly more complicated but still
>>> doable. Is it a requirement?
>>
>> SQL will do - for now - as it's what we'd use for the Roundcube
>> database already.
>
> The question was more whether we have to add another layer of
> abstraction for the cache storage.
>

Alright, let me rephrase: I won't need such an abstraction layer, and I 
doubt anyone else has legitimate reasons to require that we put in the 
effort *right now*.

That is to say, if we can get it (a level of abstraction), that's nice, 
but it's also completely secondary to the primary milestone of having 
*some* caching.
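
For illustration, a minimal sketch of how the proposed structure could 
map onto a single SQL table. It uses Python's sqlite3 as a stand-in for 
the Roundcube database; the table name kolab_cache, the column types, 
the index and the helper function names are all assumptions made for 
this example, not anything decided in this thread.

```python
import sqlite3

# Hypothetical schema mirroring the proposed fields; names and types are assumptions.
SCHEMA = """
CREATE TABLE kolab_cache (
    folder  TEXT NOT NULL,     -- fully qualified IMAP folder URI
    msguid  INTEGER NOT NULL,  -- IMAP message UID, used for synchronizing
    expire  TEXT,              -- expiration timestamp (ISO 8601 string)
    uid     TEXT NOT NULL,     -- object UID
    type    TEXT NOT NULL,     -- contact/event/distribution-list/etc.
    data    TEXT,              -- serialized object data
    xml     TEXT,              -- raw XML block
    dtstart TEXT,              -- event start (NULL for other types)
    dtend   TEXT,              -- event end
    tags    TEXT,              -- object-specific keywords used for filtering
    PRIMARY KEY (folder, msguid)
);
CREATE INDEX kolab_cache_range ON kolab_cache (folder, type, dtstart, dtend);
"""

COLUMNS = ("folder", "msguid", "expire", "uid", "type",
           "data", "xml", "dtstart", "dtend", "tags")


def open_cache(path=":memory:"):
    """Open (and initialize) the cache database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, **fields):
    """Insert or refresh one cache row; missing fields are stored as NULL."""
    row = {name: fields.get(name) for name in COLUMNS}
    placeholders = ", ".join(":" + name for name in COLUMNS)
    conn.execute(
        "INSERT OR REPLACE INTO kolab_cache (%s) VALUES (%s)"
        % (", ".join(COLUMNS), placeholders),
        row,
    )


def events_between(conn, folder, start, end):
    """Return UIDs of cached events in a folder that overlap [start, end]."""
    cur = conn.execute(
        "SELECT uid FROM kolab_cache"
        " WHERE folder = ? AND type = 'event'"
        " AND dtstart <= ? AND dtend >= ?"
        " ORDER BY dtstart",
        (folder, end, start),
    )
    return [r[0] for r in cur]
```

Keeping DTSTART and DTEND as plain columns is what allows a date-range 
query to be answered without touching the XML or the serialized data 
at all.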

>>> Using full IMAP URIs would make it possible to re-use the cache of
>>> a shared folder for multiple users who have access to that folder.
>>
>> That is, unless you include the username in the (folder/message) URI.
>
> The idea was to add the username for folders in the personal and user
> namespaces and leave it out for the shared namespace.
>

Fair enough.
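
A small sketch of that rule, to check I read it correctly. The function 
name and the namespace labels ("personal", "user", "shared") are 
assumptions for this example; only the URI shape is taken from 
example [1] above.

```python
from urllib.parse import quote


def folder_cache_uri(host, folder, namespace, username=None):
    """Build the folder URI used as the cache key.

    The username is included for the personal and user namespaces and
    omitted for the shared namespace, so that all users with access to
    a shared folder end up with the same key.
    """
    path = quote(folder, safe="/")
    if namespace == "shared":
        return f"imap://{host}/{path}"
    if namespace not in ("personal", "user"):
        raise ValueError(f"unknown namespace: {namespace}")
    if not username:
        raise ValueError("username required for personal and user namespaces")
    # '@' in the login name has to be percent-encoded inside the URI.
    return f"imap://{quote(username, safe='')}@{host}/{path}"
```

Called with "mail.kolab.cc", "INBOX/Contacts", "personal" and 
"bruederli@kolab.cc", this yields the URI from example [1].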

>> Let's also note that the URI can contain the exact message MIME part
>> identifier for the payload, which may be another interesting
>> parameter to consider in caching.
>
> Indeed. Thanks for the hint.
>

>> For contacts, it may be worthwhile considering storing the parts
>> that make auto-complete work separately from the raw XML payload /
>> serialized object data.
>
> Good point. That was basically what I had in mind with fulltext
> searching but auto-completion is just a reduced set of data which
> should be indexed.
>

Yeah, I reckon a "xml_payload LIKE '%vanmeeuwen%'" is more expensive 
than a "mail/displayname LIKE '%vanmeeuwen%'" simply because the 
columns being searched (across many, many rows) are smaller.
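
Roughly what I have in mind, sketched with Python's sqlite3 as a 
stand-in for the real database. The table name contacts_cache, the 
column names and the autocomplete() helper are assumptions for this 
example.

```python
import sqlite3


def open_contacts_cache():
    """Create an in-memory table with the XML kept apart from the searchable fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts_cache ("
        " uid TEXT PRIMARY KEY,"
        " xml TEXT,"            # raw XML payload, never searched for auto-complete
        " displayname TEXT,"    # small columns extracted once, at caching time
        " email TEXT)"
    )
    return conn


def add_contact(conn, uid, xml, displayname, email):
    conn.execute(
        "INSERT INTO contacts_cache (uid, xml, displayname, email)"
        " VALUES (?, ?, ?, ?)",
        (uid, xml, displayname, email),
    )


def autocomplete(conn, term):
    """Match only the short extracted columns, not the XML payload.

    Note: '%' and '_' in the search term are not escaped in this sketch.
    """
    pattern = f"%{term}%"
    cur = conn.execute(
        "SELECT uid FROM contacts_cache"
        " WHERE displayname LIKE ? OR email LIKE ?"
        " ORDER BY uid",
        (pattern, pattern),
    )
    return [r[0] for r in cur]
```

A leading-wildcard LIKE still scans every row either way; the gain is 
only in how much data has to be read per row.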

>> Also, since other languages do not speak PHP's serialize format
>> really well, would/could you consider using JSON instead?
>
> "serialized" doesn't necessarily mean PHP serialization, though the
> advantage with PHP serialize is the ability to transparently handle
> instances of native PHP classes. But I haven't decided yet.
>

Right, I think PHP serialization is what Roundcube currently uses for 
message caching, isn't it? I reckon this way of caching becomes 
"problematic" when the underlying class is updated. I've run into this 
once or twice, unintentionally, and in those cases the failure is 
mysterious and no upgrade path is available.

That said, I'm completely unaware of how using JSON would actually 
resolve any such "problem" ;-)
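
To make the trade-off concrete, a small sketch in Python rather than 
PHP: JSON does not handle class instances transparently, so every 
non-primitive value (a date-time here) has to be converted explicitly 
in both directions. The "__datetime__" marker key and the function 
names are made up for this illustration.

```python
import json
from datetime import datetime


def encode_object(obj):
    """Serialize a cached object to JSON, converting datetimes explicitly."""
    def default(value):
        # json.dumps calls this only for values it cannot serialize itself.
        if isinstance(value, datetime):
            return {"__datetime__": value.isoformat()}
        raise TypeError(f"cannot serialize {type(value).__name__}")

    return json.dumps(obj, default=default, sort_keys=True)


def decode_object(text):
    """Parse JSON produced by encode_object, restoring datetimes."""
    def hook(d):
        # Called for every decoded JSON object; turn the marker back into a datetime.
        if set(d) == {"__datetime__"}:
            return datetime.fromisoformat(d["__datetime__"])
        return d

    return json.loads(text, object_hook=hook)
```

The stored text is readable from any language, at the cost of having 
to maintain that conversion by hand.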

Kind regards,

Jeroen van Meeuwen

-- 
Systems Architect, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08