Request for Input: Storing Searches
Georg C. F. Greve
greve at kolabsys.com
Sat Aug 27 20:39:40 CEST 2011
Dear all,
Some of us have started tossing around thoughts about how to save searches in
one Kolab client so that they are, ideally, re-usable in all others.
After giving it some thought, it turns out this is not a trivial issue, for
a variety of reasons, starting with the tradeoff between being expensive on
the CPU and being expensive on storage. But it is actually a little more
complex than that.
Allow me to highlight a couple of scenarios with advantages and disadvantages:
- Scenario 1: Storage with a new KEP 9 based XML object
One could attempt to model this as a "search" XML object that would
incorporate the fields of the object type searched, plus some special
fields, e.g. folders to search, as well as searches across multiple fields
and search logic (AND/OR etc).
These objects would live in the regular folders for resources, and would
potentially even replace the list object in functionality, as they would
then model a list of recipients as list of address book entries, which is
something that Alain once suggested. [1]
Advantages: Fairly close to existing functionality, and likely not too
hard to implement for most clients (in comparison, at least), no data
duplication anywhere.
Disadvantages: Expensive on the CPU; does not work on all resources
because we cannot store these XML elements in email type mailboxes.
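To make Scenario 1 a bit more concrete, here is a minimal sketch of what such a "search" object and its evaluation could look like. All element and attribute names (search, object-type, folders, match, criterion, logic) are assumptions made up for illustration; nothing like this is defined in any KEP so far, and the substring matching is only a stand-in for whatever search semantics we would agree on.

```python
import xml.etree.ElementTree as ET


def build_search_object(uid, object_type, folders, logic, criteria):
    """Serialize a saved search as XML.

    Every element and attribute name used here is hypothetical.
    `criteria` is a list of (field, value) pairs, `logic` is "AND" or "OR".
    """
    root = ET.Element("search", {"version": "1.0"})
    ET.SubElement(root, "uid").text = uid
    ET.SubElement(root, "object-type").text = object_type
    folders_el = ET.SubElement(root, "folders")
    for name in folders:
        ET.SubElement(folders_el, "folder").text = name
    match_el = ET.SubElement(root, "match", {"logic": logic})
    for field, value in criteria:
        ET.SubElement(match_el, "criterion", {"field": field}).text = value
    return ET.tostring(root, encoding="unicode")


def matches(xml_text, obj):
    """Evaluate a stored search against one object (a dict of field values).

    Uses case-insensitive substring matching as a placeholder semantics.
    """
    match_el = ET.fromstring(xml_text).find("match")
    hits = [
        (c.text or "").lower() in obj.get(c.get("field"), "").lower()
        for c in match_el.findall("criterion")
    ]
    return all(hits) if match_el.get("logic") == "AND" else any(hits)


search_xml = build_search_object(
    "search-1", "contact", ["Contacts"], "AND",
    [("organization", "Kolab"), ("locality", "Zurich")],
)
contact = {"organization": "Kolab Systems AG", "locality": "Zurich"}
print(matches(search_xml, contact))
```

Note that the client has to run something like matches() over every object in every listed folder each time the search is opened, which is where the CPU cost of this scenario comes from.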
- Scenario 2: Creation of new folder type w/KEP #9 annotation for metadata,
create one folder per saved search
In this approach we'd create a new folder of the corresponding resource
folder type for each search which would be identified as a stored search
folder by existence of the /vendor/kolab/saved-search annotation which
carries the metadata for the search in an array, e.g.
    { 'saved_search':
        { 'search_locations': 'blabla',
          'params': 'blabla',
          'filter': 'blabla',
          'fuzzyness': 'blabla',
          'async': '0'
          ....
        }
    }
and the folder would be populated with the results of the search.
This DOES mean data duplication on the client, but Cyrus can deduplicate
entries on the server side, so it would not affect storage there. I am
sure something similar would be possible with Dovecot, so we can for the
moment assume data gets duplicated on the client only.
Advantages: Allows clients without search functionality to use results,
can be automatically regenerated on the server if needs be, least CPU
usage, works on email.
Disadvantages: Data duplication on the client, possible desynchronization
of data sets (e.g. when a contact gets edited in the search results, the
same contact in the main box and in other search result boxes must be
updated, which may be hard to ensure), increased folder clutter, some
folder sharing questions.
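As an illustration of the annotation handling in Scenario 2, the following sketch encodes and decodes the metadata array. The annotation name and the keys are the ones from the example above; encoding the array as JSON is purely an assumption on my side, as are the sample values, and the annotations dict stands in for whatever the client gets back from the server when reading folder annotations.

```python
import json

# Annotation name as proposed above; the JSON encoding is an assumption.
ANNOTATION = "/vendor/kolab/saved-search"


def encode_saved_search(search_locations, params, filter_, fuzzyness, async_="0"):
    """Build the annotation value for a saved-search folder."""
    payload = {
        "saved_search": {
            "search_locations": search_locations,
            "params": params,
            "filter": filter_,
            "fuzzyness": fuzzyness,
            "async": async_,
        }
    }
    return json.dumps(payload, sort_keys=True)


def is_saved_search_folder(annotations):
    """A folder counts as a stored search if the annotation exists and is non-empty."""
    return bool(annotations.get(ANNOTATION))


def decode_saved_search(annotations):
    """Return the search metadata dict from a folder's annotations."""
    return json.loads(annotations[ANNOTATION])["saved_search"]


# Hypothetical sample values, only to show the round trip.
folder_annotations = {
    ANNOTATION: encode_saved_search("Contacts", "kolab", "organization", "0"),
}
if is_saved_search_folder(folder_annotations):
    print(decode_saved_search(folder_annotations)["search_locations"])
```

A client without any search functionality would ignore the annotation and simply show the folder contents, which is the main attraction of this scenario.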
- Scenario 3: Map searches with tags
As a Kolab object, each search will carry an ID. If we were to introduce
a new email header flag in storage that can carry an arbitrary number
of tags, we could tag each object with the ID of every search that it
matches.
IMAP searching for header fields should make it comparatively easy
and fast to find all objects of type X that match a certain tag Y,
especially if we ask the server to cache this header field.
This would be complemented by a KEP 9 compatible object to describe
the search, which could then be automatically applied to new objects on
the server, or performed by the client, based on the scenario.
Advantages: Low CPU & storage requirements, allows Kolab clients to apply
a tag concept over all object types including email, with potential
server-side tagging of incoming email.
Disadvantages: New concept, some questions around shared folders, e.g.
what if a client sees a shared object tagged with an ID for a search it
does not know because it does not have access to the folder where that
search is defined?
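A rough sketch of the tagging side of Scenario 3, to show how little is needed on the client. The header name X-Kolab-Search-Tags and the comma-separated format are hypothetical; the proposal only says "a new email header" carrying an arbitrary number of tags. The search criteria follow the standard IMAP SEARCH HEADER key, which does substring matching, so search IDs would have to be chosen such that none is a substring of another.

```python
from email.message import EmailMessage

# Hypothetical header name; not defined anywhere in the Kolab format.
TAG_HEADER = "X-Kolab-Search-Tags"


def get_tags(msg):
    """Return the list of search IDs an object is tagged with."""
    raw = msg.get(TAG_HEADER, "")
    return [t.strip() for t in raw.split(",") if t.strip()]


def add_tag(msg, search_id):
    """Tag an object with a search ID, keeping existing tags and avoiding duplicates."""
    tags = get_tags(msg)
    if search_id not in tags:
        tags.append(search_id)
    del msg[TAG_HEADER]  # no error if the header is not present yet
    msg[TAG_HEADER] = ", ".join(tags)


def imap_search_criteria(search_id):
    """Criteria for an IMAP SEARCH finding all objects tagged with search_id.

    Could be passed to imaplib as: conn.search(None, *imap_search_criteria(sid))
    """
    return ("HEADER", TAG_HEADER, f'"{search_id}"')


msg = EmailMessage()
msg["Subject"] = "Some Kolab object"
add_tag(msg, "search-1")
add_tag(msg, "search-2")
print(get_tags(msg))
print(imap_search_criteria("search-1"))
```

This does not answer the shared folder question: a client that finds an unknown ID in the header can only ignore it or preserve it untouched.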
There may be other advantages and disadvantages that I did not list.
Please help us identify them all, so we can come to a good decision.
Likewise, if you can think of a scenario that should be considered in addition
to the ones listed here, please let me know. As for the scenarios listed,
there are two questions in particular that I wonder about:
(a) Compatibility with clients, in particular: How will this integrate (or
not) with the new Nepomuk/Akonadi KDE Kontact basis?
(b) Query language: How do we best formulate/store the query in these
scenarios?
Anyhow, these are my thoughts on the matter right now.
I'd be happy to start drafting something once we've identified which
direction things should go in, but right now I am still not sure which
path to take. So input is VERY appreciated.
Best regards,
Georg
[1] http://kolab.org/pipermail/kolab-format/2011-July/001415.html
--
Georg C. F. Greve
Chief Executive Officer
Kolab Systems AG
Zürich, Switzerland
e: greve at kolabsys.com
t: +41 78 904 43 33
w: http://kolabsys.com
pgp: 86574ACA Georg C. F. Greve