Basic rationale of the KEP #2 design

Jeroen van Meeuwen (Kolab Systems) vanmeeuwen at kolabsys.com
Mon Mar 14 23:04:51 CET 2011


Georg C. F. Greve wrote:
> On Monday 14 March 2011 17.34:22 Hendrik Helwich wrote:
> > The proposal for RFC3339 came from you.
> 
> Actually the initial proposal for RFC3339 came from Jeroen van Meeuwen, if I
> remember correctly. And because in standardization re-invention of the wheel
> is usually a very bad idea, and because RFC3339 and its super-set ISO8601 are
> the widely spread standards to express this kind of data, this seemed like a
> sane approach to follow.
> 

First and foremost, it seems a very bad idea to disallow RFC 3339 compatible 
datetime notations just because parsers may not be readily available, only to 
then come up with a format used almost exclusively by Kolab, for which the 
lack of readily available parsers can only be a more prominent problem rather 
than a lesser one.

That said, when I originally recommended allowing any RFC 3339 compatible 
format, the KEP still stated that UTC datetime stamps were to be used.

Also, for any client to have been capable of parsing the relevant parts of any 
RFC 3339 compatible datetime notation (given that the datetime stamp was in 
UTC), using a substring of static length would have sufficed. A client whose 
Kolab XML parsing libraries are compatible with the newer version of the 
format specification would also have taken the 'tz' attribute into account, as 
required by that newer version; where this attribute is not available, it 
would interpret the datetime (or the same static-length substring) as UTC, 
matching the behaviour of older clients. Additionally, clients are then not 
required to have fully featured RFC 3339 datetime parsing routines in their 
system libraries. Have them? Do parsing. Don't have them? Take the string and 
see how far you get. Not getting far (because of +0000, perhaps)? Take a 
substring.
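
To make that fallback concrete, here is a minimal sketch in Python (purely my 
own illustration, not taken from the KEP or from any client code; it assumes 
Python 3.11 or later, where datetime.fromisoformat() accepts most RFC 3339 
notations):

from datetime import datetime, timezone

def parse_stamp(text):
    """Loose parsing as described above: try the system parser first; if it
    chokes, fall back to the static-length substring and read it as UTC."""
    try:
        # fromisoformat() accepts 'Z' and numeric offsets on Python 3.11+
        parsed = datetime.fromisoformat(text)
        if parsed.tzinfo is None:
            # No offset given at all: treat as UTC, like the older clients
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    except ValueError:
        # "Not getting far? Take sub-string": YYYY-MM-DDTHH:MM:SS, read as UTC
        return datetime.strptime(text[:19], "%Y-%m-%dT%H:%M:%S").replace(
            tzinfo=timezone.utc)

print(parse_stamp("2011-03-14T18:00:00Z"))
print(parse_stamp("2011-03-14T18:00:00+01:00"))
print(parse_stamp("2011-03-14T18:00:00GMT"))  # odd suffix: fallback kicks in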

Part of that argument for RFC 3339 compatible *parsing* (writing would have 
been limited to a single notation, by simple recommendation) was based on the 
then-current UTC datetime, not local time; with UTC datetimes only, it does 
not matter which notation format is used, as they all indicate UTC regardless, 
and the level of redundancy or ambiguity in the stored information is minimal. 
The ability to parse everything RFC 3339 compatible would allow clients to 
write out whatever was easiest for them, even though the KEP may have 
recommended format X, Y or Z (out of those RFC 3339 compatible). The 
recommended practice of writing strictly and parsing loosely is, I think, a 
good one.

The argument for UTC timestamps has also been that legacy clients would remain 
compatible with the new format - but without gaining the fix for the 
long-standing recurrence/DST shortcoming. Some client format-parsing libraries 
apparently do not preserve unknown attributes and tags, even though preserving 
them was specified in previous versions of the format specification; such 
libraries indeed break the new specification (as they did the old one), but 
they also ultimately revert new clients' functionality back to the old 
behaviour for whatever they touch - and break.

> RFC3339 basedness has then been modified/strengthened/weakened in different
> versions of the KEP, which at the time you are quoting was still based on UTC
> + time zone, because most people, including myself, erroneously believed this
> was sufficient to resolve the issue.
> 

It was not erroneous to believe that using UTC solves the issue, because it 
can;

You stated there is no way to restore the originally intended local time when 
UTC datetimes are used, but it seems the last-modification datetime has been 
overlooked. This datetime does indicate whether the client was experiencing 
DST in a given timezone when the XML was written out - and it can be expanded 
by storing the local timezone alongside the last-modification datetime. 
However, this is not required at all as long as the parsing logic is 
consistent.
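
For instance, a minimal sketch in Python (again just my illustration, assuming 
the last-modification stamp is stored in UTC and the timezone name is a tz 
database name such as Europe/Berlin):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+ standard library

def dst_in_effect(last_modified_utc, tz_name):
    """True if tz_name was observing DST at the last-modification instant."""
    local = last_modified_utc.astimezone(ZoneInfo(tz_name))
    return bool(local.dst())  # dst() is a non-zero timedelta while DST applies

# An event last written out in July versus January, from a client in Berlin
print(dst_in_effect(datetime(2011, 7, 1, 12, 0, tzinfo=timezone.utc),
                    "Europe/Berlin"))  # True  (CEST)
print(dst_in_effect(datetime(2011, 1, 1, 12, 0, tzinfo=timezone.utc),
                    "Europe/Berlin"))  # False (CET)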

You stated clients cannot foretell the future, but back in the days of that 
discussion, when the 'tz' attribute was only supposed to indicate the 
authoritative timezone to be used, a timestamp of 18:00 UTC with a 'tz' 
attribute set to Europe/Berlin consistently said the same thing at the data 
storage level - the client did not need to predict the future beyond any 
feasible horizon; DST changes or not, the data storage would not require any 
change. It was a matter of defining the parser logic that resulted in what was 
perceived as erroneous behaviour. A very simple fix would have been to specify 
that clients must write out the datetime as if DST was never invented, using 
only their baseline offset; two situations for writing the data out exist (a 
sketch of this scheme follows after the examples below):

- During the summer, the user decides he wants a weekly 11am company meeting 
and is in Zurich.
- He writes out 10 UTC, Europe/Zurich, as the baseline offset is +1 - and does 
not write out 9 UTC accounting for the current DST offset to UTC.

- During the winter, the user decides he wants a weekly 11am company meeting 
and is in Zurich.
- He writes out 10 UTC, Europe/Zurich, as the baseline offset is +1.

- All clients know the baseline UTC offset for Zurich is +1, and the DST 
offset for Zurich is +2.
- The local baseline UTC offset is known (0 for me in this case), and the DST 
information for my location is available too: +1. In my case, this cancels the 
+1 Zurich is also experiencing.

- An employee in South Africa/Cape Town must call in at 11am in the summer; 
knowing the UTC offset for Zurich at that moment is +2, it is the intended 
Zurich time (11:00) minus 2 (negate the Zurich offset) plus 2 (the Cape Town 
offset).

- An employee in South Africa/Cape Town must call in at 12:00 noon in the 
winter; knowing the UTC offset for Zurich at that moment is +1, it is the 
intended Zurich time (11:00) minus 1 (negate the Zurich offset) plus 2 (the 
Cape Town offset).
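
To illustrate the examples above with working code, a minimal sketch in Python 
(my own illustration, not anything from the KEP; it assumes Python's zoneinfo 
module and uses Africa/Johannesburg as the tz database zone for Cape Town, 
since both observe SAST, UTC+2):

from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+ standard library

def baseline_offset(tz_name, around):
    """Baseline (non-DST) UTC offset of a zone near the given wall time."""
    local = around.replace(tzinfo=ZoneInfo(tz_name))
    return local.utcoffset() - local.dst()

def write_stamp(wall_time, tz_name):
    """Store the intended wall time 'as if DST was never invented':
    subtract only the baseline offset, never the DST shift."""
    return wall_time - baseline_offset(tz_name, wall_time)

def occurrence_in(stored, event_tz, viewer_tz):
    """Add the baseline offset back to recover the intended wall time in the
    event's zone, then map that to the viewer's zone with the real,
    DST-aware rules."""
    intended = (stored + baseline_offset(event_tz, stored)).replace(
        tzinfo=ZoneInfo(event_tz))
    return intended.astimezone(ZoneInfo(viewer_tz))

# The weekly 11:00 Zurich meeting is stored as 10:00 'baseline UTC' all year
summer = write_stamp(datetime(2011, 7, 4, 11, 0), "Europe/Zurich")
winter = write_stamp(datetime(2011, 12, 5, 11, 0), "Europe/Zurich")
print(summer.time(), winter.time())  # 10:00:00 10:00:00

# Cape Town (SAST, UTC+2, no DST) calls in at 11:00 in summer, 12:00 in winter
print(occurrence_in(summer, "Europe/Zurich", "Africa/Johannesburg").time())
print(occurrence_in(winter, "Europe/Zurich", "Africa/Johannesburg").time())

The stored timestamp never changes across DST transitions; only the 
interpretation step consults the zone rules, which is the point being made 
above.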

While the specification could just say that datetimes should always be written 
out as if no DST was ever invented, I'm sure that leaves too much room for 
error.

While the specification could just say to use the local time, this builds an 
information set containing either ambiguous or duplicate information: a 
timestamp with a (useless) +0200 offset attached to an event whose 'tz' 
attribute is 'Europe/Berlin' is ambiguous for a recurring event instance in 
December, and duplicate for an instance of the same recurring event in July.

While the specification could just say to use the local time but not include 
the offset, this builds a one-off notation format unique to Kolab.

The third and fourth points seem to have been derived from the first two more 
than being valid points in themselves.

FYI, with local times being stored, the exact notation becomes the culprit 
more so than the calculation of baseline and local offsets, which is virtually 
the same (a sketch follows below):

- 11am with tz /Zurich: the client knows the current offset for Zurich to be 
either +1 (winter) or +2 (summer), subtracts it to obtain the current UTC 
timestamp (if needed), and adds the local current offset (if needed).
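
A sketch of that interpretation, under the same assumptions as the earlier 
example (Python's zoneinfo, with Africa/Johannesburg standing in for Cape 
Town):

from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+ standard library

def occurrence_from_local(wall_time, event_tz, viewer_tz):
    """Attach the event's 'tz' attribute to the stored local wall time (the
    zone rules supply the +1/+2 offset), then convert to the viewer's zone."""
    local = wall_time.replace(tzinfo=ZoneInfo(event_tz))
    return local.astimezone(ZoneInfo(viewer_tz))

# The same 11:00 Zurich meeting, stored this time as local time plus the
# 'tz' attribute Europe/Zurich
print(occurrence_from_local(datetime(2011, 7, 4, 11, 0),
                            "Europe/Zurich", "Africa/Johannesburg").time())
print(occurrence_from_local(datetime(2011, 12, 5, 11, 0),
                            "Europe/Zurich", "Africa/Johannesburg").time())
# 11:00:00 in summer, 12:00:00 in winter - the same results as with the
# baseline-UTC scheme above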

I had concerns about storing the local time and not UTC right from the 
beginning, and I've attempted to express those, but having failed to come up 
with constructive feedback on my part I decided to refrain from making any 
further comments on the subject.

I figured either way we would come to a working solution and ultimately the 
specification would define how the interpretation of data was to be 
undertaken, and how the interpretation could lead to proper presentation of 
the event. With UTC, this is regularly "what's my offset", or with a 'tz'-
attribute "what's the intended offset and what's my offset". With local time, 
I figured, this is much the same -but not exactly the same.

Anyway, I hope the above sheds some light on why I suggested clients should be 
able to parse any RFC 3339 stamp in the first place. While I disagree with the 
argument about complexity in terms of code, using local times in any RFC 3339 
compatible format indeed does not improve human readability, nor does it 
reduce ambiguity or duplication of information - especially if the 
recommendation on strict writing is loosely parsed - no pun intended ;-)

Kind regards,

-- 
Jeroen van Meeuwen
Senior Engineer, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
t: +316 42 801 403
w: http://www.kolabsys.com

pgp: 9342 BF08



