Request for Input: Storing Searches

Tue Aug 30 11:47:30 CEST 2011

On Saturday, August 27, 2011 08:39:40 PM Georg C. F. Greve wrote:
> Dear all,
> 
> Some of us have started tossing around thoughts about how to save searches
> in one Kolab client in a way that they are re-usable in all others,
> ideally.
> 
> When giving it some brainspace, it turns out this is not a trivial issue,
> for a variety of reasons, starting with there being a tradeoff decision
> between being expensive for the CPU or storage, for instance. But it is a
> little bit more complex than that, actually.
> 
> Allow me to highlight a couple of scenarios with advantages and
> disadvantages:
> 
>  - Scenario 1: Storage with a new KEP 9 based XML object
> 
> 	One could attempt to model this as a "search" XML object that would
> 	incorporate the fields of the object type searched, plus some special
> 	fields, e.g. folders to search, as well as searches across multiple fields
> 	and search logic (AND/OR etc.).
> 
> 	These objects would live in the regular folders for resources, and would
> 	potentially even replace the list object in functionality, as they would
> 	then model a list of recipients as list of address book entries, which is
> 	something that Alain once suggested. [1]
> 
> 	Advantages: Fairly close to existing functionality, and likely not too
> 	hard to implement for most clients (in comparison, at least), no data
> 	duplication anywhere.
> 
> 	Disadvantages: Expensive on the CPU; does not work on all resources
> 	because we cannot store these XML elements in email type mailboxes.
> 
> 
>  - Scenario 2: Creation of new folder type w/KEP #9 annotation for metadata,
> create one folder per saved search
> 
> 	In this approach we'd create a new folder of the corresponding resource
> 	folder type for each search which would be identified as a stored search
> 	folder by existence of the /vendor/kolab/saved-search annotation which
> 	carries the metadata for the search in an array, e.g.
> 
> 	{ 'saved_search':
> 		{ 'search_locations': 'blabla',
> 		  'params': 'blabla',
> 		  'filter': 'blabla',
> 		  'fuzzyness': 'blabla',
> 		  'async': '0'
> 		  ....
> 		}
> 	}
> 
> 	and the folder would be populated with the results of the search.
> 
> 	This DOES mean data duplication on the client, but Cyrus can
> 	deduplicate entries on the server side, so it would not affect storage
> 	there. I am sure something similar would be possible with Dovecot, so
> 	we can for the moment assume data gets duplicated on the client only.
> 
> 	Advantages: Allows clients without search functionality to use results,
> 	can be automatically regenerated on the server if needs be, least CPU
> 	usage, works on email.
> 
> 	Disadvantages: Data duplication on the client, possible data set
> 	desynchronization (e.g. contact gets edited in search results, same contact
> 	in main box and other search results boxes must be updated, this may be
> 	hard to ensure), increases folder clutter, some folder sharing questions.
> 
> 
>  - Scenario 3: Map searches with tags
> 
> 	As a Kolab object, each search will carry an ID. If we were to introduce
> 	a new email header flag in storage that can carry an arbitrary number
> 	of tags, we could tag each object with the ID of every search that it
> 	matches.
> 
> 	IMAP searching for header fields should make it comparatively easy
> 	and fast to find all objects of type X that match a certain tag Y,
>  	especially if we ask the server to cache this header field.
> 
> 	This would be complemented by a KEP 9 compatible object to describe
> 	the search, which could then be automatically applied to new objects on
> 	the server, or performed by the client, based on the scenario.
> 
> 	Advantages: Low CPU & storage requirements, allows Kolab clients to apply
> 	a tag concept over all object types including email with potential server
> 	side tagging of incoming email
> 
> 	Disadvantages: New concept, some questions around shared folders, e.g.
> 	what if a client sees a shared object tagged with an ID for a search it
> 	does not know because it does not have access to the folder where that
> 	search is defined?
> 

Something along the lines of tagging the search results via the ANNOTATE
extension and using an XML object will probably give us the most flexibility.

There are, however, some existing problems:

On the Akonadi side, searches are, afaik, implemented as virtual folders which
are populated based on a Nepomuk query (SPARQL). Searches don't belong to a
single resource, however, since you can search across various resources.
This means we would have to implement that functionality outside of the
Kolab resource, I reckon.

What I think is possible is something along the lines of this:

- Add an option to search folders "Share this search", where you can choose a 
resource which supports shared searches (Kolab).

- The action would then pick the results from the search which are in this
resource, tag them via ANNOTATE, and create an XML object with the search
info.

From here we're going down the same path whether we are consumer or creator
of the search:

- The resource creates a virtual folder inside the resource, e.g. "Shared
Searches/Last Search" (the name can be edited and is stored in the XML
object), based on the XML object.

- The resource populates the virtual folder with virtual items based on the 
tags.

=>  	- No data duplication 
	- items are directly editable 
	- you get only the results for which you have permission.
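
As a rough sketch of the flow above (Python, purely illustrative): the class
and function names are made up, a plain set stands in for the ANNOTATE-based
tags, and "visible items" stands in for whatever the server lets the user read.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    uid: str
    folder: str
    # Hypothetical stand-in for tags stored via ANNOTATE.
    tags: set = field(default_factory=set)


@dataclass
class SharedSearch:
    # Hypothetical stand-in for the XML object describing the search.
    search_id: str
    name: str


def share_search(search, results, resource_folders):
    """Tag only those results that live in the sharing (Kolab) resource."""
    tagged = [item for item in results if item.folder in resource_folders]
    for item in tagged:
        item.tags.add(search.search_id)
    return tagged


def populate_virtual_folder(search, visible_items):
    """Build the virtual folder from the tags on items the user may read."""
    return [item.uid for item in visible_items if search.search_id in item.tags]


search = SharedSearch(search_id="search-1", name="Last Search")
items = [
    Item("a", "kolab/Contacts"),
    Item("b", "kolab/Calendar"),
    Item("c", "local/Notes"),  # result from another resource, never tagged
]
share_search(search, items, resource_folders={"kolab/Contacts", "kolab/Calendar"})

# A consumer who can only read kolab/Contacts gets only what they may see.
visible = [item for item in items if item.folder == "kolab/Contacts"]
virtual_folder = populate_virtual_folder(search, visible)
```

The virtual folder holds references (uids) rather than copies, which is where
the "no data duplication" and "directly editable" points come from.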

Also we can implement most of the code in the kolab resource, and the only 
additional thing which we need is the action which creates the saved search.

At a later stage we could even rerun the query inside the resource to find
additional matches which were not available to the creator of the search, or
new matches, but users would not necessarily even expect that.
If we have a revision number in the XML object, the search could even be
modified and updated.
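
To illustrate the revision idea, a minimal sketch in Python. The element names
(search, uid, name, query, revision) are assumptions made up for this example
and not a proposed format.

```python
import xml.etree.ElementTree as ET


def build_search_xml(search_id, name, query, revision):
    """Serialize a hypothetical shared-search object to XML."""
    root = ET.Element("search")
    ET.SubElement(root, "uid").text = search_id
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "query").text = query
    ET.SubElement(root, "revision").text = str(revision)
    return ET.tostring(root, encoding="unicode")


def needs_refresh(local_xml, server_xml):
    """Return True if the server copy has a newer revision than the local one."""
    local_rev = int(ET.fromstring(local_xml).findtext("revision"))
    server_rev = int(ET.fromstring(server_xml).findtext("revision"))
    return server_rev > local_rev
```

A client would compare its cached copy against the one on the server and only
rebuild the virtual folder when needs_refresh() returns True.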

I think this gives us a pretty flexible and powerful way of sharing search
folders which is at least fully compatible with Akonadi. No idea, however, if
something like this can also be implemented in an MS Exchange client.

Btw. we could even think of adding a read-only folder with copies for the
limited clients (along the lines of option #2). I.e. on Akonadi this folder
would be hidden, so you get the full functionality including editing via
virtual folders. On the smartphone (or just any other IMAP client) you get
only the read-only version, but at least you have something for reference. But
this, again, can be added at a later stage and is fully optional (as long as
we have a field in the XML object to store this folder).

I don't see any implications for Nepomuk with either of those solutions.

With my best regards,

Christian

> 
> There may be other advantages and disadvantages that I did not list.
> 
> Please help us identify them all, so we can come to a good decision.
> 
> Likewise, if you can think of a scenario that should be considered in
> addition to the ones listed here, please let me know. As for the scenarios
> listed, there are two questions in particular that I wonder about:
> 
>  (a) Compatibility with clients, in particular: How will this integrate (or
> 	not) with the new Nepomuk/Akonadi KDE Kontact basis
> 
>  (b) Query language: How do we best formulate/store the query in these
> 	scenarios?
> 
> Anyhow, these are my thoughts on the matter right now.
> 
> I'd be happy to start drafting on something once we've identified which
> direction things should go into, but right now I still am not sure which is
> the path to go. So input is VERY appreciated.
> 
> Best regards,
> Georg
> 
> 
> 
> [1] http://kolab.org/pipermail/kolab-format/2011-July/001415.html
-- 
Christian Mollekopf
Software Engineer

Kolab Systems AG
Zürich, Switzerland

e: mollekopf at kolabsys.com
w: http://kolabsys.com

pgp: EA657400 Christian Mollekopf