Request for Input: Storing Searches

Mon Sep 5 14:27:02 CEST 2011

On Wednesday 31 August 2011 13.13:24 Jeroen van Meeuwen wrote:
> Christian Mollekopf wrote:
> > > > - The action would then pick the results from the search which
> > > > are in this resource, tag them via ANNOTATE, and create an XML
> > > > object with the search info.
> > > 
> > > Where would this XML object be put?
> > 
> > I'd imagine that we put those objects in the rootfolder or in a "Saved
> > Searches" folder. In the "Saved Searches" folder we could then also
> > create the optional server-side populated search directories.
> 
> There's no such thing as a 'root folder'.

Indeed, let's take the INBOX folder as the root folder then.

I just realized that we are probably thinking of slightly different use cases.

I was mainly thinking about the use case where a user saves a search for
himself for use on another client.
So the search folder would be a subfolder of the user's INBOX and the search is
mainly meaningful for that user only. E.g. I search for all emails belonging
to a project I'm working on (and then I might want to share that search with
one or two fellow workers).

Sharing the search would then be more along the lines of copying the query to
another user's INBOX folder so he can execute the same query.

That's also why I thought doing the search on the client side isn't a huge
performance hog. If we have a shared search for 300'000 users, it is a
different story.

I assume you were thinking more of something like a shared search for, e.g.,
all representatives of a company living in Switzerland, which could be used as
a dynamically updated distribution list for the whole company. Such searches
do not belong to a particular user, and here what you say makes a lot of sense.

I guess it's desirable to cover both use cases?
I think they are really quite different because in the first case the search
belongs to a user and is optionally shared, while in the second use case
the search doesn't belong to anyone and is global.

Also, in the first use case read/write rights are probably more important than
in the second one. Imagine I have notes in this search, so write rights are
mandatory.
In the second case editing is more of a "nice to have" because the search serves
more as a reference/directory.
So it's IMO mainly the first use case where it makes sense to replace the
prepopulated results with something like the akonadi virtual folders, giving
full read/write rights.

> 
> If a 'Saved Searches' folder were to be used, all 'saved search' Kolab XML
> objects would go into that one folder.
> 
> Sharing any particular saved search now becomes a problem,
> 
> Clients not compatible with KEP #9 are now helpless, since a top-level
> folder of unknown type is encountered, but they have to descend in order to
> get to any sub-folder,
> 
> The saved searches folder(s) cannot be pre-populated unless the Kolab XML
> object for saved searches also states where any pre-populating should go out
> to, which naturally is subject to too much change,
> 
> Keeping 'reference' objects in a 'saved search' folder creates the same
> 'subject to change' problem... if a reference object were to say,
> user/john.doe/Contacts@example.org?uid=blabla, renaming the Contacts folder
> to Kontacten would create a reference issue; any reference should be
> completely referential (OLAP),
> 
> etc.

OK, then it would probably make sense to forget about the XML objects and have
a 'saved search' folder containing all the public shared searches, each being
a folder which is populated with the results. For "personal" saved searches
(first use case above) the client can ignore the prepopulated items and execute
the query, which is stored in an annotation, to provide a virtual folder with
read/write access.
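
To make that concrete, here is a minimal sketch of the client-side decision in
Python. Everything in it is an assumption for illustration: the annotation key,
the SearchFolder/FolderView types and the run_query callback are hypothetical
names, not anything defined in a KEP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical annotation key; the thread does not name one.
QUERY_ANNOTATION = "/vendor/kolab/saved-search-query"


@dataclass
class SearchFolder:
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)
    # UIDs of items the server placed in the folder (may be empty).
    prepopulated: List[str] = field(default_factory=list)


@dataclass
class FolderView:
    items: List[str]
    writable: bool


def open_search_folder(
    folder: SearchFolder,
    run_query: Optional[Callable[[str], List[str]]] = None,
) -> FolderView:
    """Return what a client would show for a saved search folder.

    A client that can execute the query itself (run_query given) ignores
    the prepopulated items and gets a writable virtual folder. Any other
    client falls back to the server-populated items, read-only.
    """
    query = folder.annotations.get(QUERY_ANNOTATION)
    if run_query is not None and query is not None:
        return FolderView(items=run_query(query), writable=True)
    return FolderView(items=list(folder.prepopulated), writable=False)
```

A plain IMAP client would take the second branch without knowing anything
about the annotation, which is the point of keeping the prepopulated items.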

> 
> > > > - The resource populates the virtual folder with virtual items
> > > > based on the tags.
> > > > 
> > > > =>  	- No data duplication
> > > 
> > > This has always been "optional"; a saved search folder *could* be
> > > pre-populated.
> > > 
> > > Imagine a saved search across 5 contact folders with 10.000 contacts
> > > on average.
> > > 
> > > When NOT pre-populating the saved search folder with the search
> > > results, you pay the cost every time the folder is opened. Maybe
> > > this cost is not so great for a fat client with a local cache to
> > > query (Kontact/akonadi/nepomuk/Disconnected IMAP), but for a
> > > web-interface... well...
> > 
> > Yes, since akonadi can create virtual folders it only has to populate
> > them once AFAIK, and the result is then cached.
> > For a webclient I guess you're right. But if we have the data duplication
> 
> Again, the data duplication is *at one's option*. For a web interface, I say
> one should pre-populate the search folder. For clients like Kontact (with
> client-side, local caches), perhaps it's feasible to allow them to ignore
> the cached results and go with a real-time search.
> 
> > and it should also be writable it looks somewhat error prone to me.
> 
> A clause in the KEP for these types of folders can be that the content of
> the folder SHOULD NOT be made writeable for any event other than 'update
> saved search'.
> 
Indeed.
> > Especially if we have to implement that for every client.
> 
> We have to implement everything and anything for every client, in case you
> haven't noticed.
> 

If we have server-side populated search folders, that should work with any IMAP
client and we don't have to implement anything on the client.

> > I reckon using akonadi as a cache for the webinterface would solve that
> > problem?
> 
> No, the caching layer is moot unless you also consider all clients use
> akonadi. One cache (in one location) to rule them all being akonadi is not
> necessarily the best way to go with this. This, however, is a different
> topic, and we should talk about caching separately from the saved searches
> topic.
> > > When pre-populating the saved search folder with the search results,
> > > you pay the cost in "duplicate" storage (as explained, not on the
> > > server side, perhaps on the client side if it's not intelligent
> > > enough to de-duplicate).
> > 
> > Well, the argument for doing it server side would be that it is
> > available on any client (i.e. smartphone), but then read-only is the
> > only option I see. If it is on the client side, this ends up being
> > essentially the same as the akonadi virtual folders.
> 
> Note that the "problem" or "difficulty" is not the editing of an object from
> within the saved search, not for a client and not for a user.
> 
> It is the occurrence of said object twice or more times in or across all
> readable folders that is the first problem.
> 

Yes, that's why I say that server-side populated search folders should be
read-only.
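
For a client that reads both the source folders and a populated search folder,
the duplicate occurrence could be handled roughly as below. This is only a
sketch under my own assumptions: the Item type and the from_search_folder flag
are hypothetical, and it assumes copies of the same object share a UID.

```python
from typing import Dict, Iterable, List, NamedTuple


class Item(NamedTuple):
    uid: str
    folder: str
    # True if this copy was seen in a server-populated search folder.
    from_search_folder: bool


def deduplicate(items: Iterable[Item]) -> List[Item]:
    """Keep one copy per UID, preferring the copy in the source folder.

    The search-folder copy is treated as a read-only mirror, so the
    copy from the real folder wins whenever both are visible.
    """
    chosen: Dict[str, Item] = {}
    for item in items:
        current = chosen.get(item.uid)
        if current is None:
            chosen[item.uid] = item
        elif current.from_search_folder and not item.from_search_folder:
            chosen[item.uid] = item
    return list(chosen.values())
```

Edits would then always go to the copy in the real folder, and the search
folder itself is never written to by the client.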

> > > Another penalty in pre-populating the saved search folder could,
> > > arguably, be that perhaps there are results in said saved search
> > > folder that the person using the folder would otherwise not have
> > > access to. However, this can also be considered a feature; "Share
> > > all contacts from Vendors folder tagged with 'ict' with helpdesk
> > > personnel"
> > 
> > If it is being populated on the client side I don't see how you could
> > get access to items you shouldn't have access to.
> 
> You want to avoid each of your 350.000 clients having to iterate and
> re-iterate against most of your infrastructure components themselves, just to
> pre-populate and update their saved (contact) searches cache(s), if you can
> do so periodically on the server-side, under your own control.
> 

I agree for globally shared searches (second use case). For personal saved
searches I'm not sure it makes sense to run the queries of 350'000 clients
on the server if most of the queries are only relevant for one particular
user.

> Please note that the folders of type 'contact' that the user has access to
> are not the only things that are subject to change. Please also note that
> these folders and other resources can be *huge*.
> 
> Saved searches are often not the smallest set of search results. They often
> include a type of query that is a little less specific than
> (mail=john.doe@example.org), and inherently can include a lot of attributes
> that are not (cannot be?) indexed anywhere.
> 
> As such, saved searches are *hugely* expensive to execute.
> 
> When they happen on the client side, and on one particular client only,
> configurable per client, then we're all fine. Kontact for instance can do
> saved searches in a reasonable fashion because of Akonadi. A web interface
> such as Horde or Roundcube however cannot. Please note a saved search for
> these interfaces will have to:
> 
> 1) execute within the PHP execution timeout (30 seconds),
> 
> 2) stay within the memory_limit (64/128/192 MB),
> 
> 3) not use the user's credentials / privileges to query resources other
> than IMAP
> 

I thought if we had a cache like akonadi below the web interface we could
circumvent that problem, so that we'd be in about the same position as with
Kontact, but I don't know enough about Roundcube or Horde to say if that makes
sense.

> Furthermore, interesting saved searches discussions can be held over
> particular types of Kolab clients such as Z-Push (and its clients). Not all
> Kolab-consuming applications have a need for, an interface for, or a place
> for saved searches or server-side caching of said saved searches.
> 
> Similarly, not all clients have a need for, or place for, a server-side
> cache, such as Disconnected IMAP in Kontact.
> 
> That said, a folder clearly configured as a 'saved_search' folder can be
> ignored (Z-Push?) or the contents thereof can be ignored (Kontact?), while
> the contents may be the periodically updated search results, or the folder
> be empty. Both cases serve all clients well, KEP #9 compatible or not.
> 

Agreed, I think we should have a 'saved_search' folder for global searches and
one for each user to accommodate both use cases. I just don't know how to share
the query for a personal saved search yet.
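
As a rough illustration of the layout I have in mind (a sketch only: the
"shared/" prefix and the search_folder_path helper are assumptions of mine,
the "user/<name>/" form follows the reference example quoted above, and the
"Saved Searches" name is the one floated earlier in this thread):

```python
from typing import Optional

# Folder name floated earlier in the thread; not fixed by any KEP.
SEARCH_ROOT = "Saved Searches"


def search_folder_path(search_name: str, owner: Optional[str] = None) -> str:
    """Build the folder path for a saved search.

    With an owner, the search lives below that user's INBOX (personal
    use case). Without one, it lives in a global, shared location.
    """
    if owner is None:
        return f"shared/{SEARCH_ROOT}/{search_name}"
    return f"user/{owner}/{SEARCH_ROOT}/{search_name}"
```

So search_folder_path("project-x", owner="john.doe") would give a folder under
john.doe's INBOX, while search_folder_path("swiss-reps") would give a global
one.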

Cheers,

Christian

> Kind regards,
> 
> Jeroen van Meeuwen