[Kolab-devel] Caching in Roundcube's libkolabxml storage layer
Jeroen van Meeuwen (Kolab Systems)
vanmeeuwen at kolabsys.com
Wed Apr 4 11:20:46 CEST 2012
On 2012-04-04 9:54, Thomas Brüderli wrote:
> Jeroen van Meeuwen (Kolab Systems) wrote:
>> I suppose the libkolabxml PHP bindings (or wrapper (PEAR?) library on
>> top of ~) should be creating this hash representation?
>
> It's currently the wrapper classes in the libkolab Roundcube plugin
> (namely kolab_format_*) doing that. One can still consider turning them
> into a PEAR library later on once the Roundcube core functions are
> "librificated".
>
Right...
On 2012-04-04 9:54, Thomas Brüderli wrote:
> Jeroen van Meeuwen (Kolab Systems) wrote:
>> On 2012-04-03 17:34, Thomas Brüderli wrote:
>>> (6) Compute repeated instances of recurring events
>>>
>>> (...snip...) I'd strongly recommend to also
>>> cache the interpreted object data in addition to the raw XML block
>>> because reading from libkolabxml involves plenty of function calls
>>> which are known to be rather expensive in PHP.
>>
>> See previous point. I reckon the hashing is faster in C++ than it is
>> in PHP, right?
>
> Absolutely. But in order to make it really fast one would need to write
> a specific PHP module by hand. The wrappers generated by SWIG aren't
> really optimized.
>
Which, its associated effort aside, we're saying is worth it if 1) it
can be done sustainably, and 2) it speeds up getting the PHP hash object
by a factor of $x, right?
On 2012-04-04 9:54, Thomas Brüderli wrote:
> Jeroen van Meeuwen (Kolab Systems) wrote:
>> On 2012-04-03 17:34, Thomas Brüderli wrote:
>>> So here's my proposition for a Kolab object cache data structure:
>>>
>>> FOLDER: <fully qualified IMAP folder URI> // example: [1]
>>> MSGUID: <IMAP message UID> // used for synchronizing
>>> EXPIRE: <date-time> // expiration timestamp
>>> UID: <object UID>
>>> TYPE: <object type> // contact/event/distribution-list/etc.
>>> DATA: <serialized object data>
>>> XML: <raw xml block>
>>> DTSTART: <date-time> // event start (empty for other types)
>>> DTEND: <date-time> // event end
>>> TAGS: <object specific keywords used for filtering>
>>>
>>> [1] imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts
>>>
>>> This structure can easily be reflected in a SQL database. If it's a
>>> requirement to have different caching backends such as db, memcache,
>>> file-based, etc. things become slightly more complicated but still
>>> doable. Is that a requirement?
>>
>> SQL will do - for now - as it's what we'd use for the Roundcube
>> database already.
>
> The question was more whether we have to add another layer of
> abstraction for the cache storage.
>
Alright, let me rephrase; I won't need such an abstraction layer, and I
doubt anyone else has legitimate reasons to require that we put in the
effort *right now*.
That is to say, if we can get it (a level of abstraction), that's nice,
but it's also completely secondary to the primary milestone of having
*some* caching.
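For illustration only, the proposed structure could map onto a single SQL table roughly as follows. This is a sketch using Python's sqlite3 module rather than anything in Roundcube; the table name kolab_cache, the column types, the primary key and the helper functions are all assumptions, not something agreed in this thread.

```python
import sqlite3

# Column names follow the proposed structure; the table name, column
# types and keys are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS kolab_cache (
    folder  TEXT    NOT NULL,  -- fully qualified IMAP folder URI
    msguid  INTEGER NOT NULL,  -- IMAP message UID, used for synchronizing
    expire  TEXT,              -- expiration timestamp
    uid     TEXT    NOT NULL,  -- object UID
    type    TEXT    NOT NULL,  -- contact/event/distribution-list/etc.
    data    TEXT,              -- serialized object data
    xml     TEXT,              -- raw XML block
    dtstart TEXT,              -- event start (NULL for other types)
    dtend   TEXT,              -- event end
    tags    TEXT,              -- keywords used for filtering
    PRIMARY KEY (folder, msguid)
);
CREATE INDEX IF NOT EXISTS kolab_cache_uid ON kolab_cache (folder, uid);
"""

COLUMNS = ("folder", "msguid", "expire", "uid", "type",
           "data", "xml", "dtstart", "dtend", "tags")


def open_cache(path=":memory:"):
    """Open the cache database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, **fields):
    """Insert or overwrite the cache row for one IMAP message."""
    row = {name: fields.get(name) for name in COLUMNS}
    conn.execute(
        "INSERT OR REPLACE INTO kolab_cache ({}) VALUES ({})".format(
            ", ".join(COLUMNS), ", ".join(":" + c for c in COLUMNS)),
        row,
    )


def events_between(conn, folder, start, end):
    """Return UIDs of cached events that overlap [start, end].

    Timestamps are compared as strings, so this assumes they are all
    stored in the same ISO 8601 format and timezone.
    """
    rows = conn.execute(
        "SELECT uid FROM kolab_cache WHERE folder = ? AND type = 'event' "
        "AND dtstart <= ? AND dtend >= ? ORDER BY dtstart",
        (folder, end, start),
    )
    return [r[0] for r in rows]
```

Keying rows on (folder, msguid) means a re-synchronized message simply overwrites its old cache entry, while the DTSTART/DTEND columns let a calendar view select events without touching the DATA or XML columns.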
>>> Using full IMAP URIs would make it possible to re-use the cache of a
>>> shared folder for multiple users who have access to that folder.
>>
>> That is, unless you include the username in the (folder/message) URI.
>
> The idea was to add the username for folders in the personal and user
> namespaces and leave it out for the shared namespace.
>
Fair enough.
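As a sketch of that rule: the function below builds the folder key with the username for personal and user namespaces and without it for the shared namespace. The function name, its parameters and the shared folder path in the usage note are hypothetical; only the personal-folder example URI comes from this thread.

```python
from urllib.parse import quote


def cache_folder_uri(host, folder_path, username=None, shared=False):
    """Build the cache key for a folder.

    Shared folders get a user-independent URI so that every user with
    access maps to the same cache rows; personal and user namespace
    folders carry the (percent-encoded) username.
    """
    path = quote(folder_path, safe="/")
    if shared:
        return "imap://{}/{}".format(host, path)
    if not username:
        raise ValueError("username is required outside the shared namespace")
    return "imap://{}@{}/{}".format(quote(username, safe=""), host, path)
```

Called as cache_folder_uri("mail.kolab.cc", "INBOX/Contacts", username="bruederli@kolab.cc") this yields the example URI quoted earlier, whereas a call with shared=True returns the same key regardless of who is logged in.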
>> Let's also note that the URI can contain the exact message mime part
>> identifier for the payload, which may be another interesting parameter
>> to consider in caching.
>
> Indeed. Thanks for the hint.
>
>> For contacts, it may be worthwhile considering storing the parts that
>> make auto-complete work separately from the raw XML payload /
>> serialized object data.
>
> Good point. That was basically what I had in mind with fulltext
> searching but auto-completion is just a reduced set of data which should
> be indexed.
>
Yeah, I reckon a "xml_payload LIKE '%vanmeeuwen%'" is more expensive
than a "mail/displayname LIKE '%vanmeeuwen%'" simply because the size of
the columns (across many, many rows) is smaller.
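A rough sketch of what I mean, again with Python's sqlite3 purely for illustration: the auto-complete fields live in their own short columns next to the XML payload, and the lookup never reads the payload. The table name, column names and the example address are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts_cache ("
    " uid TEXT PRIMARY KEY,"
    " xml TEXT,"            # raw XML payload, potentially large
    " displayname TEXT,"    # short fields extracted for auto-complete
    " email TEXT)"
)


def add_contact(uid, xml, displayname, email):
    """Cache one contact together with its extracted search fields."""
    conn.execute(
        "INSERT OR REPLACE INTO contacts_cache VALUES (?, ?, ?, ?)",
        (uid, xml, displayname, email),
    )


def autocomplete(term):
    """Match on the short columns only; the xml column is not scanned.

    LIKE wildcards inside `term` are not escaped in this sketch.
    """
    pattern = "%{}%".format(term)
    rows = conn.execute(
        "SELECT displayname, email FROM contacts_cache "
        "WHERE displayname LIKE ? OR email LIKE ? ORDER BY displayname",
        (pattern, pattern),
    )
    return rows.fetchall()
```

A leading-wildcard LIKE still has to scan every row either way; the saving here comes only from comparing against a few dozen bytes per row instead of a whole XML document.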
>> Also, since other languages do not speak PHP's serialize really well,
>> would/could you consider using json instead?
>
> "serialized" doesn't necessarily mean PHP serialization, though the
> advantage with PHP serialize is the ability to transparently handle
> instances of native PHP classes. But I haven't decided yet.
>
Right, I think this is what Roundcube currently uses for message
caching, right? I reckon this way of caching becomes "problematic" when
the underlying class is updated - I've tried this once or twice -
unintentionally - but in these cases the failure is mysterious and no
upgrade path is available.
That said, I'm completely unaware of how using JSON would actually
resolve any such "problem" ;-)
Kind regards,
Jeroen van Meeuwen
--
Systems Architect, Kolab Systems AG
e: vanmeeuwen at kolabsys.com
m: +44 74 2516 3817
w: http://www.kolabsys.com
pgp: 9342 BF08