[Kolab-devel] Caching in Roundcube's libkolabxml storage layer
Thomas Brüderli
bruederli at kolabsys.com
Wed Apr 4 10:54:50 CEST 2012
Jeroen van Meeuwen (Kolab Systems) wrote:
> On 2012-04-03 17:34, Thomas Brüderli wrote:
>> Hello devs
>>
>> [...]
>>
>> Without a specific caching in place, the following procedure is
>> executed when reading a contact:
>>
>> kolab_storage::get_folders()
>> (1) List all IMAP folders with their metadata and post-filter the
>> list
>>
>> kolab_storage_folder:get_objects()
>> (2a) List all messages of a particular IMAP folder or
>> (2b) search messages by HEADER X-Kolab-Type ...
>>
>> For each of the returned messages or when calling
>> kolab_storage_folder:get_object(<uid>) directly, the following
>> happens:
>>
>> (3) Fetch bodystructure, parse it and find XML part.
>> (4) Fetch the XML part from IMAP and parse it with libkolabxml
>> (5) Load object data into a hash array by calling dozens of getter
>> methods of the Contact/Event/DistributionList class.
>
> I suppose the libkolabxml PHP bindings (or wrapper (PEAR?) library on
> top of ~) should be creating this hash representation?
It's currently the wrapper classes in the libkolab Roundcube plugin (namely
kolab_format_*) doing that. One can still consider turning them into a PEAR
library later on once the Roundcube core functions are "librificated".
>
>> (6) Compute repeated instances of recurring events
>>
>> (...snip...) I'd strongly recommend also caching
>> the interpreted object data in addition to the raw XML block
>> because reading from libkolabxml involves plenty of function calls,
>> which are known to be rather expensive in PHP.
>
> See previous point. I reckon the hashing is faster in C++ than it is in
> PHP, right?
Absolutely. But in order to make it really fast one would need to write a
specific PHP module by hand. The wrappers generated by SWIG aren't really
optimized.
>
>> So here's my proposition for a Kolab object cache data structure:
>>
>> FOLDER: <fully qualified IMAP folder URI> // example: [1]
>> MSGUID: <IMAP message UID> // used for synchronizing
>> EXPIRE: <date-time> // expiration timestamp
>> UID: <object UID>
>> TYPE: <object type> // contact/event/distribution-list/etc.
>> DATA: <serialized object data>
>> XML: <raw xml block>
>> DTSTART: <date-time> // event start (empty for other types)
>> DTEND: <date-time> // event end
>> TAGS: <object specific keywords used for filtering>
>>
>> [1] imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts
>>
>> This structure can easily be reflected in a SQL database. If it's a
>> requirement to have different caching backends such as db, memcache,
>> file-based, etc., things become slightly more complicated but still
>> doable. Is that a requirement?
>
> SQL will do - for now - as it's what we'd use for the Roundcube
> database already.
The question was more whether we have to add another layer of abstraction
for the cache storage.
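
To make the "reflected in a SQL database" part more concrete, here is a minimal sketch of how the proposed record could map onto one table. It is written in Python with sqlite3 purely for illustration; the table name `kolab_cache`, the column types, the primary key and the helper names are assumptions, not an actual Roundcube or libkolab schema.

```python
import sqlite3

# Hypothetical table mirroring the proposed cache record.
# Names, types and keys are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS kolab_cache (
    folder  TEXT NOT NULL,     -- fully qualified IMAP folder URI
    msguid  INTEGER NOT NULL,  -- IMAP message UID, used for synchronizing
    expire  TEXT,              -- expiration timestamp
    uid     TEXT NOT NULL,     -- object UID
    type    TEXT NOT NULL,     -- contact/event/distribution-list/etc.
    data    TEXT,              -- serialized object data
    xml     TEXT,              -- raw XML block
    dtstart TEXT,              -- event start (NULL for other types)
    dtend   TEXT,              -- event end
    tags    TEXT,              -- object specific keywords used for filtering
    PRIMARY KEY (folder, msguid)
);
CREATE INDEX IF NOT EXISTS kolab_cache_uid ON kolab_cache (folder, uid);
"""


def open_cache(path=":memory:"):
    """Open (or create) the cache database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def put(conn, record):
    """Insert or overwrite one cache record given as a dict with all ten keys."""
    conn.execute(
        "INSERT OR REPLACE INTO kolab_cache "
        "(folder, msguid, expire, uid, type, data, xml, dtstart, dtend, tags) "
        "VALUES (:folder, :msguid, :expire, :uid, :type, :data, :xml, "
        ":dtstart, :dtend, :tags)",
        record,
    )


def get_by_uid(conn, folder, uid):
    """Return (type, data) for an object UID in a folder, or None on a miss."""
    return conn.execute(
        "SELECT type, data FROM kolab_cache WHERE folder = ? AND uid = ?",
        (folder, uid),
    ).fetchone()
```

Keeping every access behind helpers like `put()` and `get_by_uid()` is one way to leave room for another backend later without committing to a full abstraction layer now.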
>
>> Using full IMAP URIs would make it possible to re-use the cache of a
>> shared folder for multiple users who have access to that folder.
>
> That is, unless you include the username in the (folder/message) URI.
The idea was to add the username for folders in the personal and user
namespaces and leave it out for the shared namespace.
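
A rough sketch of that keying rule, in Python for illustration only. The function name `cache_folder_uri` and the namespace labels are hypothetical; the only thing taken from the proposal is the URI shape shown in example [1].

```python
from urllib.parse import quote


def cache_folder_uri(host, folder, namespace, username=None):
    """Build the FOLDER cache key for an IMAP folder.

    namespace is 'personal', 'user' or 'shared' (labels assumed for this
    sketch). Shared folders get a user-independent key so that several
    users can reuse the same cache entries.
    """
    path = quote(folder, safe="/")
    if namespace == "shared":
        return f"imap://{host}/{path}"
    if not username:
        raise ValueError("username is required outside the shared namespace")
    # Percent-encode the user part so '@' in the login does not clash
    # with the '@' that separates user and host.
    return f"imap://{quote(username, safe='')}@{host}/{path}"


personal_key = cache_folder_uri(
    "mail.kolab.cc", "INBOX/Contacts", "personal", "bruederli@kolab.cc"
)
```

With the values from example [1], `personal_key` is `imap://bruederli%40kolab.cc@mail.kolab.cc/INBOX/Contacts`.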
>
> Let's also note that the URI can contain the exact message mime part
> identifier for the payload, which may be another interesting parameter
> to consider in caching.
Indeed. Thanks for the hint.
>
> For contacts, it may be worthwhile considering storing the parts that
> make auto-complete work separately from the raw XML payload / serialized
> object data.
Good point. That was basically what I had in mind with fulltext searching,
but auto-completion only needs a reduced set of data to be indexed.
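
As a sketch of what such a reduced index could look like, the snippet below pulls only name and email tokens out of a contact hash. It is Python for illustration; the field names (`name`, `nickname`, `email`) and both helper functions are assumptions, not the actual kolab_format_* output.

```python
def autocomplete_terms(contact):
    """Collect the lowercase name words and email addresses of a contact.

    The result is the small set of strings auto-completion would have to
    search, stored separately from the XML and the serialized data.
    """
    terms = set()
    for field in ("name", "nickname"):
        value = contact.get(field) or ""
        terms.update(value.lower().split())
    for email in contact.get("email", []):
        terms.add(email.lower())
    return sorted(terms)


def matches(terms, prefix):
    """True if any indexed term starts with the typed prefix."""
    needle = prefix.lower()
    return any(term.startswith(needle) for term in terms)
```

The joined terms could go into the TAGS field or into a dedicated column, depending on how filtering ends up being implemented.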
>
> Also, since other languages do not speak PHP's serialize really well,
> would/could you consider using JSON instead?
"serialized" doesn't necessarily mean PHP serialization, though the
advantage with PHP serialize is the ability to transparently handle
instances of native PHP classes. But I haven't decided yet.
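
The trade-off can be shown with a small sketch. It uses Python, with `pickle` standing in for a language-native serializer such as PHP's serialize; the event hash and its field names are made up for the example.

```python
import json
import pickle
from datetime import datetime

# Hypothetical interpreted object data with one native date object.
event = {
    "uid": "hypothetical-uid-1",
    "summary": "Team meeting",
    "start": datetime(2012, 4, 4, 10, 0),
}


def _json_default(value):
    """JSON has no date type, so datetimes are flattened to ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def to_json(obj):
    """Portable form: readable from any language, but native types are lost."""
    return json.dumps(obj, default=_json_default)


# The JSON round trip yields a plain string for 'start'.
from_json = json.loads(to_json(event))

# The native round trip restores the datetime instance transparently.
from_native = pickle.loads(pickle.dumps(event))
```

With JSON, the reader has to know which fields are dates and convert them back explicitly; with the native format that step disappears, at the cost of tying the cache to one language.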
~Thomas