Event UID in the Subject?

Wed Jul 14 18:26:14 CEST 2004

On Wednesday, 14 July 2004 17:01, Bernhard Reiter wrote:
> Any operation on the server done by many clients is significant.
> Our design should consider 100,000 or more users.

We're not aiming for a Horde/Webclient system with numbers as high as this. 
Perhaps sometime in the future, when we've worked out some distributed 
processing problems, but at the moment we're targeting the webclient for a 
significantly smaller userbase (either overall or as a small percentage of 
the total Kolab-system userbase). This is another reason why I do not 
consider the search operations to be that significant a performance hit.

We're thinking of the webclient as a tool for the small number of users in a 
company who are 'on the road' to access their groupware features wherever 
they are, as long as they have internet access. We're not aiming for this to 
be *the* primary client for an organisation, but merely a useful fallback 
tool for remote access.

We do have clients using the webclient as their primary client, but only for 
the mail portion of it; none of them are using the full groupware suite.

> > You see, it's not just "one additional value" that I would have to store
> > - I can't add arbitrary values to the event hashes (well I can, but I
> > can't retrieve them later), meaning I would have to implement an EUID ->
> > IUID mapping (EUID = Event UID, IUID = Imap UID) within the PHP session.
> > As I've said previously - all I get is a "retrieve the object with UID X"
> > call.
>
> This is about Horde design.
> In other circumstances I would have said: just add this as a variable to the
> links you create for each event and save it in the page on the webclient
> this way.

Yes. Unfortunately Horde does not work this way, and I don't see them changing 
anytime soon. Horde was not designed to work off an IMAP server as the data 
backend, but rather a SQL database. They have the ability to use drivers that 
access any backend imaginable, through a plugin mechanism, but the internals 
are still optimised to work in a SQL-like fashion.

> > The UIDs that are used in Horde are at a minimum 32 characters long
> > (MD5sums), however it is often the case that a UID is 64 or possibly more
> > characters long, as we also use the UID to resolve which share (message
> > folder) the object (mail) is actually stored in (*).
>
> I guess I also do not know how the EUIDs get calculated initially.
> My understanding was that other clients might also come up with them
> and Horde needs to cope with them if it reads them in an email.
> Do you also need a mapping for those?

When reading, modifying, or deleting local objects, I don't care what format 
the EUID has, which is why the code works with any EUID value (including EUIDs 
generated by other clients), as long as it is unique within the current 
mailbox. This is where I do the search when I get the "load object w/ UID X" 
call.
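A minimal sketch of that lookup, written in Python against the standard `imaplib` interface rather than Horde's PHP driver, lets Cyrus do the scanning with a single `UID SEARCH HEADER Subject` command. The function name `find_iuid` is hypothetical, and it assumes the EUID sits verbatim in the Subject header. Because an IMAP HEADER search is a case-insensitive substring match (RFC 3501), more than one hit is treated as "not unique" rather than guessed at.

```python
def find_iuid(conn, mailbox, euid):
    """Return the IMAP UID of the message whose Subject carries `euid`.

    `conn` is an already logged-in imaplib.IMAP4 / IMAP4_SSL connection
    (or anything with the same select() and uid() methods).
    Assumption: the EUID is stored verbatim in the Subject header and is
    unique within `mailbox`.
    """
    typ, _ = conn.select(mailbox, readonly=True)
    if typ != "OK":
        raise RuntimeError("cannot select %r" % mailbox)
    # imaplib does not quote arguments itself, so build an IMAP quoted
    # string, escaping backslashes and double quotes as RFC 3501 requires.
    quoted = '"' + euid.replace("\\", "\\\\").replace('"', '\\"') + '"'
    typ, data = conn.uid("SEARCH", "HEADER", "Subject", quoted)
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed in %r" % mailbox)
    uids = data[0].split() if data and data[0] else []
    if len(uids) != 1:
        return None  # not found, or the EUID is not unique in this mailbox
    return int(uids[0])
```

In real use the connection would come from `imaplib.IMAP4_SSL(host)` followed by `login()`. Since nothing is cached between calls, the lookup is unaffected by a change of UIDVALIDITY.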

With new objects that are created in Horde, we just generate MD5 UIDs.
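Generating such a UID needs nothing beyond an MD5 hash. The Python sketch below uses `hashlib`; the random seed and the `<share>!<md5>` layout for tagging the share (folder) an object lives in are hypothetical illustrations, since the actual Horde encoding is not spelled out here.

```python
import hashlib
import os
import time


def new_euid(share=None):
    """Return a fresh MD5-based event UID, optionally tagged with its share.

    Hypothetical format: a bare 32-character hex digest, or "<share>!<md5>"
    when the folder should be recoverable from the UID itself.
    """
    # Only uniqueness matters, so hash random bytes plus the current time.
    seed = os.urandom(16) + repr(time.time()).encode()
    digest = hashlib.md5(seed).hexdigest()  # 32 lowercase hex characters
    return digest if share is None else "%s!%s" % (share, digest)
```

This also shows why such UIDs are "at a minimum 32 characters long": the share prefix only ever makes them longer.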

> > And the problem is, this mapping would have to cater for every message I
> > read, within every message folder I touch, for every user of the
> > webclient, for a reasonable time period as I do not know when the next
> > page load (message request) will come through, as again, I've said
> > before.
>
> There is a limit to how long information on screen can remain valid.
> I'd say 30 minutes should be enough. People usually understand this.

However, this cannot really work either, as I have no way of knowing whether a 
user will even come back and view another page, or whether s/he will do so in 
the next 5 minutes or whatever. And again, my scripts aren't running as some 
sort of daemon, so I cannot perform timed caching through that.

While the user is accessing the system I could perhaps implement timed caching 
on the mailboxes that are not being used, but then the operations the user is 
currently performing would seem even more sluggish while I manage the other 
sub-systems (e.g. while the user is accessing the calendar, I would be 
managing the tasks/notes, etc.). Not something I would like to do.

The only effective way of timing out data is to just let the entire cache be 
cleared when the session is killed. Then again, there's no guarantee that a 
session will be properly killed (e.g. if a user does not click "log off" 
before browsing to another site).

> > Oh yes,
> > and then there's the UIDValidity problem as well.
>
> When the IUID is outdated, then you need to check for new messages
> anyway.

With the search scheme, I don't have to care whether the UIDVALIDITY value has 
changed. My search will work regardless.

However, with the caching scheme, it's quite a different and more complex 
problem. When accessing a specific object I'd first have to check that the 
IUIDs in my map are still valid before performing the operation; if they are 
no longer valid (i.e. UIDVALIDITY has changed), I would have to rebuild my 
entire mapping (by re-reading every message), so performance would be 
abysmal in cases such as this.
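The extra bookkeeping the caching scheme needs can be sketched as a per-mailbox map tagged with the UIDVALIDITY it was built under (the server reports the current value in its response to SELECT, per RFC 3501). The class and its `rebuild` callback are hypothetical; `rebuild` stands in for the expensive step of re-reading every message.

```python
class UidMapCache:
    """Per-mailbox EUID -> IUID maps, discarded whenever UIDVALIDITY changes."""

    def __init__(self, rebuild):
        # rebuild(mailbox) -> {euid: iuid}. Hypothetical callback: in a real
        # driver this re-reads every message in the mailbox, the costly part.
        self._rebuild = rebuild
        self._maps = {}  # mailbox -> (uidvalidity, {euid: iuid})
        self.rebuilds = 0

    def lookup(self, mailbox, uidvalidity, euid):
        """Return the IUID for `euid`, rebuilding the map first if it is stale."""
        cached = self._maps.get(mailbox)
        if cached is None or cached[0] != uidvalidity:
            # Every IUID recorded under an old UIDVALIDITY is meaningless now.
            self._maps[mailbox] = (uidvalidity, self._rebuild(mailbox))
            self.rebuilds += 1
        return self._maps[mailbox][1].get(euid)
```

Even this ignores messages added after the map was built; catching those would need separate new-message detection on top.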

Another problem I forgot to mention is that the revered C-Client library has 
no means of reading all new messages in one go (e.g. "FETCH X:Y", a range of 
messages, as you would normally do). You have to fetch each individual message 
in the new UID range and test whether the operation succeeded to see if you 
actually read a message.
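The workaround just described amounts to probing every candidate UID individually. A hypothetical Python sketch follows: `fetch_one` stands in for a single-message fetch that reports failure, and `uid_next` is the mailbox's UIDNEXT value (every UID the server has already assigned is below it). A library exposing raw IMAP commands could instead send one `UID FETCH <last+1>:*` and get every new message in a single round trip.

```python
def probe_new_messages(fetch_one, last_seen_uid, uid_next):
    """Read messages added since `last_seen_uid` by probing each UID in turn.

    fetch_one(uid) returns the message, or None if no message has that UID
    (it was expunged, or the UID was never used in this mailbox).
    """
    found = {}
    for uid in range(last_seen_uid + 1, uid_next):
        msg = fetch_one(uid)  # one round trip per candidate UID
        if msg is not None:
            found[uid] = msg
    return found
```

The cost grows with the width of the UID gap, not with the number of messages actually found.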

> > Now *this* "puts a lot of load on the server and limits scalability", as
> > this would be done through PHP scripting in the session cache. Not a very
> > practical/desirable solution.
>
> Could be another misunderstanding on my part.
> I thought that it would be possible to let PHP for the webclient run
> on a different machine and only point to the IMAP server.
> Best would be if it could just use disconnected IMAP like
> any other client, but online IMAP also works.
> This makes the client a client process running on a server machine,
> but not on _the_ Kolab server machine.

It should technically be possible to separate the two by just changing some 
IP addresses. Unfortunately we haven't tried this yet, so we're not sure 
whether there will be problems with it, or what we could change in the code to 
take advantage of the separation.

> > I'm not sure if you've used the webclient yet, but without a PHP
> > accelerator it's really not that great an experience. Horde is
> > already a massive system without this gargantuan caching mechanism that
> > has been proposed. That is why I would like to offload a lot of this
> > probable "caching" performance hit to the IMAP server (which translates to
> > a small hit on Cyrus, à la the SEARCH), and let Horde concentrate on
> > providing web-based groupware functionality, as opposed to mirroring the
> > data that Cyrus already holds.
>
> Mirroring with on demand syncs is a lot more scalable, if a different
> machine can be used.

Perhaps we will get around to exploring these options some day. Unfortunately 
we have neither the time nor the resources to do this now.

> Hmmm, so you cannot deal with an answer about a UID that you did not create
> yourself? Also, when the event is moved from one calendar folder to
> another, would you lose this connection?

Any answer I receive to a request made by the webclient will always have the 
share UID in the EUID as well, because I can change the EUID when modifying 
the event (in Horde, the only way to send out meeting requests/updates is by 
modifying the event).

Any requests that I receive outside of this (for example, a user sends a 
meeting request from Kontact and processes the reply from any attendees 
within Horde) will unfortunately require the full search operation to locate 
the specified event (unless, if I'm not mistaken, Kontact/Outlook can only 
send out meeting requests from the primary (default) calendar, in which case 
there is no problem. Or am I getting confused with the free/busy 
generation?). In any case, I judge this to be a relatively rare situation; 
I can't see users switching clients like this.
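The two cases above can be combined into one lookup routine: use the share encoded in a Horde-created EUID when there is one, and fall back to searching every calendar folder otherwise. This is a hypothetical Python sketch; the `<share>!<md5>` layout and the `search_folder` callback (for example, one Subject SEARCH per folder) are assumptions, not the actual Horde driver.

```python
def locate_event(euid, calendar_folders, search_folder, sep="!"):
    """Return (folder, iuid) for `euid`, or None if it cannot be found.

    search_folder(folder, euid) -> IMAP UID or None.
    Assumption: Horde-created EUIDs carry their share as a hypothetical
    "<share>!<md5>" prefix; EUIDs from other clients carry no such hint.
    """
    share, found_sep, _ = euid.rpartition(sep)
    # Only trust the prefix if it names a known folder, so a foreign EUID
    # that happens to contain the separator is not misread.
    if found_sep and share in calendar_folders:
        iuid = search_folder(share, euid)  # fast path: a single SEARCH
        if iuid is not None:
            return share, iuid
    # Slow path: a foreign EUID, or the event has moved; try every folder.
    for folder in calendar_folders:
        iuid = search_folder(folder, euid)
        if iuid is not None:
            return folder, iuid
    return None
```

The slow path is the "full search operation" case, so its cost only matters when users mix clients as described.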

Another question, unrelated to the current discussion, is how the other 
clients handle meeting requests/replies from Horde. This will be an 
interesting part of the interoperability testing - getting requests 
operational between the three. More fun after the format is stable.

> Both problems sound like the Horde code needs to set up caches on disk
> to get faster, which would also solve many of those problems.

Caches on disk? Do you mean caching the IMAP data, or the UID mapping in the 
session? The session data is already stored on disk, or SQL, or wherever else 
(Horde has several session backends). Caching the IMAP data would however 
involve quite a bit more work than we're planning on spending.

> I agree with Martin that using those headers leans towards a design
> that looks simple and might lead to many requests to Cyrus which
> could be avoided with better design. We certainly should fix the design
> problems, but we cannot do this quickly.

Unfortunately I am in a similar boat to Joon, where I cannot change the parent 
application. Unlike Joon I do have access to the code and can make minor 
changes, however I don't believe the Horde guys would like me changing their 
fundamental designs, just so our backend drivers work more effectively.

If you're thinking of the webclient as a primary client, then perhaps Horde is 
not the best choice for the problem; you'd probably be forced to write your 
own "dedicated" Kolab webclient, as there really isn't anything out there 
that can cater to this.

So yes, unfortunately we're limited by the Horde design. This is not a problem 
for us since, as I've said, we're not aiming for the webclient to be a primary 
client.

> So for this reason and that it cannot hurt
> I think we can have that UID in the subject.
> However, the problem of handling non-Horde-created EUIDs
> must be solved somehow.

There is no problem with non-Horde EUIDs, as I described above.

Cheers,

-- 
Stuart Bingë
Code Fusion cc.

Office: +27 11 673 0411
Mobile: +27 83 298 9727
Email: s.binge at codefusion.co.za

Tailored email solutions; Kolab specialists.
http://www.codefusion.co.za/