Request for Input: Storing Searches

Georg C. F. Greve greve at kolabsys.com
Sat Aug 27 20:39:40 CEST 2011


Dear all,

Some of us have started tossing around thoughts about how to save searches in 
one Kolab client so that, ideally, they are reusable in all others.

On closer thought, this turns out not to be a trivial issue, for a variety of
reasons, starting with the tradeoff between being expensive for the CPU and
being expensive for storage. But it is a little more complex than that.

Allow me to highlight a couple of scenarios with advantages and disadvantages:

 - Scenario 1: Storage with a new KEP 9 based XML object

	One could attempt to model this as a "search" XML object that would
	incorporate the fields of the object type searched, plus some special
	fields, e.g. folders to search, as well as searches across multiple fields
	and search logic (AND/OR etc.).

	These objects would live in the regular folders for resources, and would
	potentially even replace the list object in functionality, as they would
	then model a list of recipients as list of address book entries, which is
	something that Alain once suggested. [1]

	Advantages: Fairly close to existing functionality, and likely not too
	hard to implement for most clients (in comparison, at least), no data
	duplication anywhere.

	Disadvantages: Expensive on the CPU; does not work on all resources
	because we cannot store these XML elements in email type mailboxes.
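
To make Scenario 1 a bit more concrete, here is a minimal sketch in Python of
what such a "search" object could look like and how a client might evaluate
it. All element and attribute names (search, object-type, folders, criteria,
match, logic) are made up for illustration and are not part of any existing
Kolab format; the matching rule (case-insensitive substring) is likewise just
an assumption.

```python
import xml.etree.ElementTree as ET


def build_search_object(uid, object_type, folders, logic, criteria):
    """Serialize a saved search as XML. All element names are hypothetical."""
    root = ET.Element("search", {"version": "1.0"})
    ET.SubElement(root, "uid").text = uid
    ET.SubElement(root, "object-type").text = object_type
    folders_el = ET.SubElement(root, "folders")
    for folder in folders:
        ET.SubElement(folders_el, "folder").text = folder
    # 'logic' is "AND" or "OR" and applies to all <match> children.
    criteria_el = ET.SubElement(root, "criteria", {"logic": logic})
    for field, value in criteria:
        ET.SubElement(criteria_el, "match", {"field": field}).text = value
    return ET.tostring(root, encoding="unicode")


def matches(xml_text, obj):
    """Evaluate the stored search against one object (a dict of fields)."""
    criteria_el = ET.fromstring(xml_text).find("criteria")
    hits = [
        m.text.lower() in obj.get(m.get("field"), "").lower()
        for m in criteria_el.findall("match")
    ]
    combine = all if criteria_el.get("logic") == "AND" else any
    return combine(hits)
```

In this sketch nothing is stored except the query itself, so there is no data
duplication, but a client has to run matches() over every object in every
listed folder each time the search is opened, which is where the CPU cost
comes from.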


 - Scenario 2: Creation of a new folder type with a KEP 9 annotation for
	metadata, one folder per saved search

	In this approach we'd create, for each search, a new folder of the
	corresponding resource folder type. It would be identified as a stored
	search folder by the existence of the /vendor/kolab/saved-search
	annotation, which carries the metadata for the search in an array, e.g.

	{ 'saved_search':
		{ 'search_locations': 'blabla',
		  'params': 'blabla',
		  'filter': 'blabla',
		  'fuzzyness': 'blabla',
		  'async': '0'
		  ....
		}
	}

	and the folder would be populated with the results of the search.

	This DOES mean data duplication on the client, but Cyrus allows entries
	to be deduplicated on the server side, so it would not affect storage
	there. I am sure something similar would be possible with Dovecot, so
	for the moment we can assume data gets duplicated on the client only.

	Advantages: Allows clients without search functionality to use results,
	can be automatically regenerated on the server if need be, least CPU
	usage, works on email.

	Disadvantages: Data duplication on the client, possible data set
	desynchronization (e.g. when a contact gets edited in the search results,
	the same contact in the main box and in other search result boxes must be
	updated, which may be hard to ensure), increased folder clutter, some
	folder sharing questions.
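
As a rough illustration of Scenario 2, the following Python sketch shows how
a client could write and read the metadata array. It assumes the annotation
value is serialized as JSON, which is only one possible encoding and has not
been decided anywhere; the function names are hypothetical, and the keys are
the ones from the example above.

```python
import json

SAVED_SEARCH_ANNOTATION = "/vendor/kolab/saved-search"


def encode_saved_search(search_locations, params, filter_, fuzzyness, async_="0"):
    """Build the annotation value for a saved-search folder (JSON is assumed)."""
    payload = {
        "saved_search": {
            "search_locations": search_locations,
            "params": params,
            "filter": filter_,
            "fuzzyness": fuzzyness,
            "async": async_,
        }
    }
    return json.dumps(payload, sort_keys=True)


def decode_saved_search(annotations):
    """Return the search metadata if the folder is a saved-search folder.

    'annotations' is a dict mapping annotation names to their string values,
    as a client might hold them after reading the folder's metadata.
    Returns None for ordinary folders.
    """
    raw = annotations.get(SAVED_SEARCH_ANNOTATION)
    if raw is None:
        return None
    return json.loads(raw)["saved_search"]
```

A client without search functionality would simply ignore the annotation and
show the folder contents, while a search-capable client could use the decoded
metadata to regenerate the folder.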
	

 - Scenario 3: Map searches with tags

	As a Kolab object, each search will carry an ID. If we were to introduce
	a new email header field in storage that can carry an arbitrary number
	of tags, we could tag each object with the ID of every search that it
	matches.

	IMAP searching for header fields should make it comparatively easy
	and fast to find all objects of type X that match a certain tag Y,
	especially if we ask the server to cache this header field.

	This would be complemented by a KEP 9 compatible object to describe
	the search, which could then be automatically applied to new objects on
	the server, or performed by the client, depending on the scenario.

	Advantages: Low CPU & storage requirements, allows Kolab clients to apply
	a tag concept over all object types including email, with potential
	server-side tagging of incoming email.

	Disadvantages: New concept, some questions around shared folders, e.g.
	what if a client sees a shared object tagged with an ID for a search it
	does not know because it does not have access to the folder where that
	search is defined?
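
The tagging idea in Scenario 3 could be sketched in Python as follows. The
header name X-Kolab-Search-Tags and the comma-separated layout are purely
hypothetical; the only part taken from the IMAP standard is the HEADER search
key.

```python
from email.message import EmailMessage

TAG_HEADER = "X-Kolab-Search-Tags"  # hypothetical header name


def get_tags(msg):
    """Return the list of search IDs an object is tagged with."""
    raw = str(msg.get(TAG_HEADER, ""))
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def add_tag(msg, search_id):
    """Tag an object with a search ID, keeping a single header line."""
    tags = get_tags(msg)
    if search_id not in tags:
        tags.append(search_id)
    del msg[TAG_HEADER]  # no-op if the header is not present yet
    msg[TAG_HEADER] = ", ".join(tags)


def imap_search_criteria(search_id):
    """Build the IMAP SEARCH criteria for all objects tagged with search_id."""
    return f'HEADER {TAG_HEADER} "{search_id}"'
```

For example, imap_search_criteria("search-1") yields
'HEADER X-Kolab-Search-Tags "search-1"', which could be handed to an IMAP
library as the search criterion. Note that the IMAP HEADER key matches
substrings, so an ID "search-1" would also hit an object tagged only with
"search-10"; a client would need to re-check the results with something like
get_tags(), or the IDs would have to be chosen so that none is a substring of
another.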


There may be other advantages and disadvantages that I did not list.

Please help us identify them all, so we can come to a good decision.

Likewise, if you can think of a scenario that should be considered in addition 
to the ones listed here, please let me know. As for the scenarios listed, 
there are two questions in particular that I wonder about:

 (a) Compatibility with clients, in particular: How will this integrate (or
	not) with the new Nepomuk/Akonadi KDE Kontact basis?

 (b) Query language: How do we best formulate/store the query in these
	scenarios?

Anyhow, these are my thoughts on the matter right now.

I'd be happy to start drafting something once we've identified which 
direction things should go in, but right now I am still not sure which 
path to take. So input is VERY appreciated.

Best regards,
Georg



[1] http://kolab.org/pipermail/kolab-format/2011-July/001415.html


-- 
Georg C. F. Greve
Chief Executive Officer

Kolab Systems AG
Zürich, Switzerland

e: greve at kolabsys.com
t: +41 78 904 43 33
w: http://kolabsys.com

pgp: 86574ACA Georg C. F. Greve