A summary of sorts (was: Re: Why and when storing local time? (Re: Basic rationale of the KEP #2 design))

Mon Apr 4 10:50:22 CEST 2011

Georg,

On Thursday, 31 March 2011 at 18:16:19, Georg C. F. Greve wrote:
> 
> On Thursday 31 March 2011 15.49:08 Florian v. Samson wrote:
> > Please provide a pointer to the DST switching dates of UTC:
>
> I never said that UTC switches.

1. Semantically UTCWND (="UTC with no DST") implies that regular UTC has 
DST, which it does not: <http://en.wikipedia.org/wiki/UTC#Daylight_saving>

2. In your example below the opposite is true: UTC has no DST (correct), but 
your "UTCWND" has a (supposedly arbitrary, in your example the German) DST 
offset added to UTC (in "summer"), so it is actually rather "UTCWSD" (="UTC 
with some DST").

> What I said is that local time is expressed in variable UTC offsets.

O.K., this is true for sure (DST comes and goes, TZs become redefined etc.), 
but you never wrote it that way before, IIRC.  Having read this email 
completely, I now understand that this was just confusion in terms (no 
matter on whose side).

> > Until then I stick to believe UTC == "UTCWND"
>
> For UTC == UTCWND to hold true, the conversion to either one from local
> time and back would have to be identical. 

3. Hence it is not to be called "UTC*" anymore, as it loses the property of 
being a *universal* time everywhere around the globe; it seems to have a 
specific, local DST-delta added.

4. I still fail to see what the definition of "UTCWND"/"UTCWSD" shows, 
proves or adds, except confusion.
First you define your "UTCWND" (or rather "UTCWSD") to have certain 
properties, then you show that your "UTCWND" really has these specific 
properties: IMHO this is a circular argument, leading to nothing.
Or am I missing something?

> But UTCWND was specifically 
> defined to not behave identically, as it is supposed to behave as if DST
> did not exist.

O.K., *you* defined your "UTCWND" the way *you* want to.
But I still fail to understand what that strange mishmash of UTC with a 
DST-dependent local offset helps to explain: IMHO nothing.

> That is what I tried to explain in
> http://kolab.org/pipermail/kolab-format/2011-March/001287.html

I sure read it (multiple times, actually), but copying the same stuff again, 
without providing any new explanation what this is about or intended to 
show, cannot provide new insights.

> Copying from there: Considering that 10:00 in Europe/Berlin
>
>  * in the winter translates to
> 	 09:00 UTC
> 	 09:00 UTCWND
>
>  * in the summer translates to
> 	 08:00 UTC
> 	 09:00 UTCWND
>
> So any local time with DST regime maps to different times for UTC and
> UTCWND, which means they are *not* identical, nor were they defined to
> be.

But it does not show / prove / explain anything, IMO.  It is just some 
arbitrary definition of an artificial "Georg's mishmash time".
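
For anyone who wants to retrace the arithmetic in the quoted example, here 
is a minimal Python sketch.  It hard-codes the two Europe/Berlin offsets 
(CET = UTC+1, CEST = UTC+2) instead of querying a TZ-database, and the names 
`to_utc` and `to_utcwnd` are merely my labels for illustration, not anything 
defined on this list:

```python
from datetime import datetime, timedelta

BASE_OFFSET = timedelta(hours=1)  # CET: Europe/Berlin without DST
DST_DELTA = timedelta(hours=1)    # additional hour while CEST is in effect


def to_utc(local, dst_active):
    """Convert a Europe/Berlin wall-clock time to real UTC."""
    offset = BASE_OFFSET + (DST_DELTA if dst_active else timedelta(0))
    return local - offset


def to_utcwnd(local):
    """'UTCWND' as defined in the quoted mail: pretend DST does not exist."""
    return local - BASE_OFFSET


winter_local = datetime(2011, 1, 15, 10, 0)
summer_local = datetime(2011, 7, 15, 10, 0)

for label, local, dst in (("winter", winter_local, False),
                          ("summer", summer_local, True)):
    print(label,
          to_utc(local, dst).strftime("%H:%M"), "UTC /",
          to_utcwnd(local).strftime("%H:%M"), "UTCWND")
```

It merely reproduces the table above: both values coincide in winter and 
differ by the DST-delta in summer.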

> > As discussed here numerous times: everybody usually thinks in local
> > time, so  that is what is meant, no matter if currently DST is active
> > or not.
>
> Exactly. So we need storage that preserves or at the very least allows us
> to restore that user intent from the stored data.

Yes, this is what I always proposed: store the TZ-data for the TZ identified 
by the TZ-ID, which we already agreed (hopefully) to store there.  This 
will allow "to restore that user intent from the stored data.", AFAICS.

> > > but it adds one step to get to local time for all fields past,
> > > present and future.
> >
> > Yes, but it can retain compatibility with older versions of the
> > Kolab-format: a big pro, IMHO.
>
> No, actually, because older clients stored data in UTC.

Sure "older clients stored data in UTC", so there is no change in their 
behaviour: This is what backward compatibility is about, hence I do not 
understand your "No".  Can you please explain?

> As shown above, UTC != UTCWND, so clients will interpret data wrongly.

This is no change from the situation today, thus no drawback, rather "as 
good as it can get" for those older clients.

> We agree that compatibility is a major issue.

Good (and the best is still to come, IMO).

> Because the old client behaviour was broken due to format
> insufficiencies, one- way compatibility is the best we can ultimately
> hope for. Older clients will never correctly interpret a full set of
> newer data regardless of which solution we choose.

Ack, +1.

> So the best we can do is a solution where newer clients will read and
> interpret older data correctly. Because the old format is a very limited
> subset of RFC3339, local time storage in RFC3339 fits that bill.

In order to retain compatibility with older clients, we would have to stick 
to that "very limited subset of RFC3339", which implies stating date-time in 
Zulu (UTC), AFAICS, and add new ("top level", as Bernhard would interject 
here) XML-tags.

> UTCWND on the other hand has no compatibility in either direction.

Yes, this is why I never understood your digression about "UTCWND".  Maybe I 
just missed somebody's statement which triggered that, but I had the 
impression "UTCWND" was solely your invention.

> > > It can also not be stored as RFC3339, as that explicitly specifies
> > > UTC.
> >
> > So let it be it Zulu-time (UTC).
>
> As explained before, that requires an additional point of metadata,
> namely whether or not the client assumed that DST would be in effect for
> this appointment.
>
> So it's UTC + TZ-ID + DST Assumption.

Bernhard and I (I hope correctly) had the impression that "DST Assumption" 
just means "the TZ-data for the TZ identified by the TZ-ID", so this 
translates to:

UTC + TZ-ID + TZ-data

This is exactly what I proposed, and what Joon proposed as well (the way I 
understood him) quite a while ago.   Hurrah, I would have never thought 
that such a large part of this lengthy discussion thread was just happening 
due to misunderstandings. :-))

> Clients would then have to adjust the stored UTC value accordingly if the
> DST assumption turns out to be wrong. Some more work would have to go
> into this to ensure that possible things like a changed base offset gets
> calculated correctly.

Ack, +1, even though I do not really understand what is addressed 
with "ensure that possible things like a changed base offset gets 
calculated correctly".  
(Your first sentence in the above paragraph covers what I tried to express 
a couple of times before, and obviously failed to accomplish.)
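
To make "UTC + TZ-ID + TZ-data" and the adjustment step a bit more 
concrete, here is a rough Python sketch.  It assumes that the TZ-data boils 
down to the single UTC offset the writing client assumed for the event (a 
simplification; real TZ-data would carry the full rule set), and 
`StoredEvent`, `intended_local` and `refresh` are hypothetical names, not 
part of any Kolab specification:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredEvent:
    utc: datetime              # naive datetime, to be read as Zulu (UTC)
    tz_id: str                 # e.g. "Europe/Berlin"
    assumed_offset: timedelta  # UTC offset the writing client assumed


def intended_local(event):
    """Restore the wall-clock time the user originally meant."""
    return event.utc + event.assumed_offset


def refresh(event, current_offset):
    """Re-derive the UTC value if the assumed offset turned out to be wrong."""
    if current_offset == event.assumed_offset:
        return event
    local = intended_local(event)
    return StoredEvent(utc=local - current_offset,
                       tz_id=event.tz_id,
                       assumed_offset=current_offset)


# 10:00 local time, stored under the assumption that CEST (UTC+2) applies.
event = StoredEvent(datetime(2011, 7, 15, 8, 0), "Europe/Berlin",
                    timedelta(hours=2))
# Suppose newer TZ-data says that only UTC+1 applies on that date.
fixed = refresh(event, timedelta(hours=1))
print(intended_local(fixed), fixed.utc)
```

As the sketch stores the total offset, a changed base offset would be 
handled by the same code path as a changed DST rule, if I am not mistaken.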

> That is clearly a possibility, but much more complex than local time +
> TZ-ID.

"Yes" *and* "No": I think we both could extract a lengthy list of pros and 
cons WRT the properties and the technical implications of either 
solution from the lively discussion throughout the last couple of months on 
this mailing list.
Maybe we should put the two proposals, that list and the reasoning for the 
final decision in the KEP2.  This would also dispel my criticism WRT 
transparency of the process.

BTW, I disagree with your "much" above, and how is backward compatibility 
ensured, then? ;-)

> > No, this is only correct for clients in the same time zone as the event
> > itself;
>
> You misunderstood what I tried to say: Finding out what the user intended
> to store requires zero steps when storing the intended local time.
>
> That from there on you have multiple calculations to get to the end
> result is correct, but also true for all other approaches.

Yes, so what?  As this is no difference, it is neither a "pro" nor a "con" 
for any solution.

> > > Nobody seems willing to advocate for it strongly,
> >
> > I did and still do, in case you forgot.  An I still believe it should
> > be the  primary source of TZ-data for the Clients accessing that
> > Kolab-object, and the rules to dynamically update that in-format
> > TZ-data should be well defined.
>
> Alright. As you may have seen, using static information as the primary
> source of information is not shared by most people.

a. I refrain from jumping at the "static" (again), as I probably have 
triggered that by emphasising that in-format TZ-data is updateable and IMO 
should be updated as well.

Can we both agree, that a local zoneinfo-database and in-format TZ-data both 
are possible sources of TZ-data for a Kolab-client, but with different 
properties?
Maybe we can even agree that the crucial properties are:
- Local TZ-database:
  Pros: 1. supposed to be automatically updated with / by the OS, hence 
           Kolab-clients do not have to take care of the freshness of 
           TZ-data.
        2. No need to store any TZ-data in Kolab-objects, thus saving 
           disk space.
           (A very rough estimate: ca. 100 Bytes of TZ-data in a couple 
            of thousand events per user = a couple of hundred KBytes.)
  Cons: 1. Kolab Format-parsers may not have a TZ-database at hand 
           (= easily in reach), hence a PIM-Client using such a parser 
           is precluded from conforming to and properly utilising KEP2, 
           under both technical *and* behavioural aspects.
        2. As every Kolab-client uses its own copy of TZ-data, displayed 
           event times may deviate due to the varying freshness of those 
           TZ-databases, depending on the update frequency of each single 
           client PC and the OS / OS-distribution used (resp. the freshness 
           and correctness of their distribution of a TZ-database).
    I am sure I forgot some points: everybody feel free to add some.
- In-format TZ-data
  Pros: 1. As it is a single, central source of TZ-data for all 
           Kolab-clients accessing a Kolab-object, all those clients 
           can display identical date-times.  Consequently switching to 
           more recent TZ-data for an event (date, task) is an atomic 
           operation / event for all those clients.
        2. Not all clients must have access to a TZ-database.
  Cons: 1. Updating in-format TZ-data is a process which must be properly 
           specified, in order to avoid "update wars" in each Kolab-object 
           (e.g. via a time-stamp or some kind of a version string) and in 
           IMAP-directories (clients thrashing the IMAP-server by 
           concurrently trying to update Kolab-objects, because their local 
           TZ-databases were updated simultaneously).  The latter can be 
           alleviated if every client waits a random number of minutes and 
           then, for each writable IMAP-directory containing Kolab-objects, 
           checks that each Kolab-object's TZ-data is still stale 
           immediately before updating it.
        2. Even though the additional disk space occupied by TZ-data in 
           thousands of Kolab-objects is of minor size, for most usages 
           the TZ-data fields will mostly contain identical, hence 
           duplicate information.
    Again, I sure missed a couple of points.

And the different points have different weights.
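
The "wait a random time, then re-check before writing" idea from the cons 
of in-format TZ-data could look roughly like the following Python sketch.  
It assumes that the TZ-data version is a plain comparable number (a 
time-stamp would do as well); `refresh_if_stale` and the two callbacks are 
hypothetical stand-ins for whatever a client uses to read and write a 
Kolab-object via IMAP:

```python
import random
import time


def refresh_if_stale(local_version, read_version, write_tz_data,
                     max_delay_minutes=30.0, sleep=time.sleep):
    """Write newer TZ-data into one Kolab-object unless somebody already did."""
    # Random back-off, so clients whose TZ-database was updated at the
    # same moment do not all hit the IMAP-server at once.
    sleep(random.uniform(0, max_delay_minutes) * 60)
    # Re-check immediately before writing: another client may have been faster.
    if read_version() >= local_version:
        return False
    write_tz_data(local_version)
    return True


# Usage with an in-memory stand-in for a Kolab-object; sleeping is disabled.
kolab_object = {"tz_version": 3}
for client in ("A", "B"):
    wrote = refresh_if_stale(
        5,
        lambda: kolab_object["tz_version"],
        lambda version: kolab_object.update(tz_version=version),
        sleep=lambda seconds: None,
    )
    print(client, "wrote" if wrote else "skipped")
```

Note that check-then-write is not atomic, so this only narrows the window 
for concurrent updates; it does not close it.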

b. Yes, I read you and Bernhard clearly arguing against it, but IIRC even 
Jeroen has not made a concise statement WRT the priority of in-format vs. 
local sources of TZ-data, yet (and we seem to have left everybody else 
behind, see below).  

I have the impression many of the other subscribers of this mailing-list are 
tired of the whole ongoing KEP2 discussion, even though at least you, 
Bernhard, Jeroen and I seem to agree that properties and implications of 
each proposed solution must be very well understood, in order to avoid 
mishaps like the very reason for KEP2: a significant flaw in the 
Kolab-format specification. 
This is one of various reasons why I believe a transparent compilation of 
technical and behavioural pros and cons in KEP2 for each proposed solution 
is crucial, so people not following the discussion closely (anymore) are 
still able to form a substantiated opinion without retracing the whole 
discussion (to be honest, that was my original plan for KEP2, after I read 
KEP1 for the first time; I ended up retracing via the web-frontend more 
than once, but for a good part this is due to my leaky memory).

> Would you be willing to live with a way to encode this format in KEP 2 as
> a cache that is subordinate to the dynamic TZ-ID based lookup, but may be
> used where such lookup is too hard and can be updated under certain
> conditions?

Actually I think that using in-format TZ-data as a cache ("subordinate" to a 
local TZ-ID based look-up of TZ-data) is a reasonable compromise, which 
dispels my concerns WRT incapable Kolab-format parsers.  This would also 
allow for switching the priority of the in-format vs. the local sources of 
TZ-data just by changing the specification (and consequently the client 
behaviour), but without changing the format of Kolab-objects, in case the 
weight of the pros and cons is assessed differently at some later point: 
nice.
(Even then some of the cons of both the in-format and the local TZ-data 
sources still apply, but I think the combined pros outweigh these by far.)
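
A minimal Python sketch of this "cache" priority, assuming Python 3.9+ with 
its `zoneinfo` module as the local TZ-database and, again simplified, a 
single cached UTC offset as the in-format TZ-data (`utc_offset` is a 
hypothetical helper name):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def utc_offset(tz_id, local, cached_offset):
    """Prefer the local TZ-database; fall back to the in-format cache."""
    try:
        return local.replace(tzinfo=ZoneInfo(tz_id)).utcoffset()
    except (ZoneInfoNotFoundError, ValueError):
        # No usable local TZ-data for this TZ-ID: use what the object carries.
        return cached_offset


appointment = datetime(2011, 7, 15, 10, 0)
print(utc_offset("Europe/Berlin", appointment, timedelta(hours=2)))
print(utc_offset("Nowhere/Invalid", appointment, timedelta(hours=2)))
```

Switching the priority later would then only mean swapping the order of the 
two sources in the client, without touching the stored objects.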

But as discussed above, then we also have all ingredients ("DST assumption") 
needed in every Kolab-object to retain backward compatibility by sticking 
to the date-time format specified in the existing Kolab-format 
specification (Zulu only, no milliseconds etc.).  
So as this is such a low-hanging fruit, why not grab it?

Cheers
        Florian

P.S.: For the definition of in-format TZ-data, Bernhard suggested 
(<http://kolab.org/pipermail/kolab-format/2010-December/001161.html>) 
looking at <http://tools.ietf.org/html/draft-douglass-timezone-xml>, which 
is also what somebody else suggested later on this list ("do not reinvent 
the wheel", using the experience and efforts condensed in recent 
iCal-RFCs).
One technical design detail in the timezone-xml specification worries me a 
little: the use of XML-namespaces, which are a quite modern XML-feature and 
might not be well supported yet in all XML-libraries in use.  If that 
worry should turn out to be substantiated, an algorithmic transformation of 
the XML-namespaces defined in the timezone-xml specification to XML-subtags 
(e.g. per a provided RegEx) may be a solution.
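
Just to illustrate what I mean by such a RegEx-based transformation, here 
is a rough Python sketch.  The element names and the namespace URI in the 
sample are made up (they are *not* taken from the timezone-xml draft), 
folding the prefix into the tag name is only one possible reading of 
"subtags", and a RegEx like this is certainly not a full XML-parser:

```python
import re

# Matches the prefix part of an opening or closing tag, e.g. "<tz:id" or "</tz:id".
PREFIXED_TAG = re.compile(r"(</?)([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)")
# Matches namespace declarations such as xmlns="..." or xmlns:tz="...".
XMLNS_DECL = re.compile(r"\s+xmlns(?::[A-Za-z_][\w.-]*)?=(\"[^\"]*\"|'[^']*')")


def flatten_namespaces(xml_text):
    """Turn <prefix:name> into <prefix-name> and drop xmlns declarations."""
    without_decls = XMLNS_DECL.sub("", xml_text)
    return PREFIXED_TAG.sub(r"\1\2-\3", without_decls)


# Hypothetical sample, not valid timezone-xml.
sample = ('<tz:zone xmlns:tz="urn:example:tz">'
          '<tz:id>Europe/Berlin</tz:id></tz:zone>')
print(flatten_namespaces(sample))
```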