Request for Input: Storing Searches

Wed Aug 31 14:13:24 CEST 2011

Christian Mollekopf wrote:
> > > - The action would then pick the results from the search which are in
> > > this resource, and tag them via ANNOTATE and create an xml object with
> > > the search info.
> > 
> > Where would this XML object be put?
> 
> I'd imagine that we put those objects in the rootfolder or in a "Saved
> Searches" folder. In the "Saved Searches" folder we could then also create
> the optional server-side populated search directories.
> 

There's no such thing as a 'root folder'.

If a 'Saved Searches' folder were to be used, all 'saved search' Kolab XML 
objects would go into that one folder.

Sharing any particular saved search now becomes a problem,

Clients not compatible with KEP #9 are now helpless, since a top-level folder 
of unknown type is encountered, but they have to descend in order to get to 
any sub-folder,

The saved searches folder(s) cannot be pre-populated unless the Kolab XML 
object for saved searches also states where any pre-populating should go out 
to, which naturally is subject to too much change,

Keeping 'reference' objects in a 'saved search' folder creates the same 
'subject to change' problem... if a reference object were to say, 
user/john.doe/Contacts@example.org?uid=blabla, renaming the Contacts folder to 
Kontacten would create a reference issue; any reference should be completely 
referential (OLAP),

etc.
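
To illustrate the rename problem from the last point, here is a minimal sketch 
in Python. It is not Kolab code: the in-memory `folders` layout and the helpers 
`resolve_by_path` and `resolve_by_uid` are hypothetical, made up purely to show 
why a reference that embeds the folder name goes stale.

```python
# Hypothetical in-memory model of a mailbox: folder path -> {uid: object}.
folders = {
    "user/john.doe/Contacts@example.org": {"blabla": {"name": "Jane Roe"}},
}


def resolve_by_path(reference):
    """Resolve a 'folder?uid=...' reference; depends on the folder name."""
    path, _, uid = reference.partition("?uid=")
    return folders.get(path, {}).get(uid)


def resolve_by_uid(uid):
    """Resolve by object UID alone, scanning every folder (assumed unique)."""
    for objects in folders.values():
        if uid in objects:
            return objects[uid]
    return None


reference = "user/john.doe/Contacts@example.org?uid=blabla"
before = resolve_by_path(reference)

# Rename the Contacts folder to Kontacten.
folders["user/john.doe/Kontacten@example.org"] = folders.pop(
    "user/john.doe/Contacts@example.org"
)

after_path = resolve_by_path(reference)  # the stored reference is now stale
after_uid = resolve_by_uid("blabla")     # a name-independent lookup still works
```

The UID-only lookup survives the rename, but in this sketch it pays for that by 
scanning every folder, which is its own cost on large stores.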

> > > - The resource populates the virtual folder with virtual items based on
> > > the tags.
> > > 
> > > =>  	- No data duplication
> > 
> > This has always been "optional"; a saved search folder *could* be pre-
> > populated.
> > 
> > Imagine a saved search across 5 contact folders with 10.000 contacts on
> > average.
> > 
> > When NOT pre-populating the saved search folder with the search results,
> > you pay the cost every time the folder is opened. Maybe this cost is not
> > so great for a fat client with a local cache to query
> > (Kontact/akonadi/nepomuk/Disconnected IMAP), but for a web-interface...
> > well...
> 
> Yes, since akonadi can create virtual folders it only has to populate them
> once AFAIK, and then result is then cached.
> For a webclient I guess you're right. But if we have the dataduplication

Again, the data duplication is *at one's option*. For a web interface, I say 
one should pre-populate the search folder. For clients like Kontact (with 
client-side, local caches), perhaps it's feasible to allow them to ignore the 
cached results and go with a real-time search.

> and it should also be writable it looks somewhat error prone to me.

A clause in the KEP for these types of folders can be that the content of the 
folder SHOULD NOT be made writeable for any event other than 'update saved 
search'.

> Especially if we have to implement that for every client.
> 

We have to implement everything and anything for every client, in case you 
haven't noticed.

> I reckon using akonadi as a cache for the webinterface would solve that
> problem?
> 

No, the caching layer is moot unless you also assume that all clients use 
akonadi. Making akonadi the one cache (in one location) to rule them all is 
not necessarily the best way to go with this. This, however, is a different 
topic, and we should talk about caching separately from the saved searches 
topic.

> > When pre-populating the saved search folder with the search results, you
> > pay the cost in "duplicate" storage (as explained, not on the server
> > side, perhaps on the client side if it's not intelligent enough to
> > de-duplicate).
> 
> Well, the argument for doing it server side would be that it is available
> on any client (i.e. smartphone), but then read-only is the only option i
> see. If it is on the client side, this ends up to be essentially the same
> as the akonadi virtual folders.
> 

Note that the "problem" or "difficulty" is not the editing of an object from 
within the saved search, not for a client and not for a user.

It is the occurrence of said object two or more times in or across all 
readable folders that is the first problem.

> > Another penalty in pre-populating the saved search folder could,
> > arguably, be that perhaps there's results in said saved search folder
> > that the person using the folder would otherwise not have access to.
> > However, this can also be considered a feature; "Share all contacts from
> > Vendors folder tagged with 'ict' with helpdesk personnel"
> 
> If it is being populated on the client side I don't see how you could get
> access to items you shouldn't have access to.
> 

You want to avoid having each of your 350.000 clients iterate and re-iterate 
against most of your infrastructure components themselves, just to 
pre-populate and update their saved (contact) searches cache(s), if you can do 
so periodically on the server-side, under your own control.

Please note that the folders of type 'contact' that the user has access to are 
not the only things that are subject to change. Please also note that these 
folders and other resources can be *huge*.

Saved searches are often not the smallest set of search results. They often 
include a type of query that is a little less specific than 
(mail=john.doe@example.org), and inherently can include a lot of attributes 
that are not (cannot be?) indexed anywhere.

As such, saved searches are *hugely* expensive to execute.

When they happen on the client side, and on one particular client only, 
configurable per client, then we're all fine. Kontact for instance can do 
saved searches in a reasonable fashion because of Akonadi. A web interface 
such as Horde or Roundcube, however, cannot. Please note that a saved search 
for these interfaces will have to:

1) execute within the PHP execution timeout (30 seconds),

2) stay within the memory_limit (64/128/192 MB),

3) not use the user's credentials / privileges to query resources other 
than IMAP.
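
As a rough illustration of the first constraint, the sketch below shows a search 
that stops when a time budget runs out and reports whether it finished. It is 
written in Python rather than PHP for brevity, and everything in it is an 
assumption for the sake of the example: the function name `search_with_budget`, 
its parameters, and the representation of folders as plain lists of dicts.

```python
import time


def search_with_budget(folders, predicate, budget_seconds=30.0,
                       clock=time.monotonic):
    """Scan folders until done or until the time budget is spent.

    Returns (matches, complete); complete is False if the scan was cut short.
    """
    deadline = clock() + budget_seconds
    matches = []
    for folder in folders:
        for item in folder:
            # Check the deadline before each item so a huge folder
            # cannot push the request past the budget unnoticed.
            if clock() >= deadline:
                return matches, False
            if predicate(item):
                matches.append(item)
    return matches, True


# Hypothetical data: two small contact folders.
contact_folders = [
    [{"name": "A", "tag": "ict"}, {"name": "B", "tag": "sales"}],
    [{"name": "C", "tag": "ict"}],
]
full, full_done = search_with_budget(contact_folders,
                                     lambda c: c["tag"] == "ict")
cut, cut_done = search_with_budget(contact_folders,
                                   lambda c: c["tag"] == "ict",
                                   budget_seconds=0.0)
```

An incomplete result is exactly what a pre-populated folder avoids: the 
expensive scan happens outside the request, and the web interface only reads 
what is already there.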

Furthermore, interesting saved searches discussions can be held over 
particular types of Kolab clients such as Z-Push (and its clients). Not all 
Kolab-consuming applications have a need for, an interface for, or a place for 
saved searches or server-side caching of said saved searches.

Similarly, not all clients have a need for, or place for, a server-side cache; 
Disconnected IMAP in Kontact is one example.

That said, a folder clearly configured as a 'saved_search' folder can be 
ignored (Z-Push?) or the contents thereof can be ignored (Kontact?), while the 
contents may be the periodically updated search results, or the folder be 
empty. Both cases serve all clients well, KEP #9 compatible or not.
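
The per-client choice described above could be sketched roughly as follows. 
This is an illustration only: the function `plan_for_folder`, the capability 
flags `shows_saved_searches` and `has_local_cache`, and the returned labels are 
all hypothetical and not part of any KEP or client.

```python
def plan_for_folder(folder_type, client):
    """Decide what a client does with a folder, given its type.

    `client` is a dict of hypothetical capability flags.
    Returns one of 'normal', 'skip', 'live' or 'cached'.
    """
    if folder_type != "saved_search":
        return "normal"   # ordinary folder, handled as usual
    if not client.get("shows_saved_searches", False):
        return "skip"     # no interface for saved searches: ignore the folder
    if client.get("has_local_cache", False):
        return "live"     # ignore stored results, search the local cache
    return "cached"       # rely on the periodically updated contents


sync_gateway = {}
fat_client = {"shows_saved_searches": True, "has_local_cache": True}
web_client = {"shows_saved_searches": True}

plans = [
    plan_for_folder("saved_search", sync_gateway),
    plan_for_folder("saved_search", fat_client),
    plan_for_folder("saved_search", web_client),
    plan_for_folder("contact", web_client),
]
```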

Kind regards,

Jeroen van Meeuwen

-- 
Senior Engineer, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
t: +44 144 340 9500
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08