Basic rationale of the KEP #2 design
Jeroen van Meeuwen (Kolab Systems)
vanmeeuwen at kolabsys.com
Mon Mar 14 23:04:51 CET 2011
Georg C. F. Greve wrote:
> On Monday 14 March 2011 17.34:22 Hendrik Helwich wrote:
> > The proposal for RFC3339 came from you.
>
> Actually the initial proposal for RFC3339 came from Jeroen van Meeuwen, if I
> remember correctly. And because in standardization re-invention of the wheel
> is usually a very bad idea, and because RFC3339 and its super-set ISO8601 are
> the widely spread standards to express this kind of data, this seemed like a
> sane approach to follow.
>
First and foremost, it seems a very bad idea to disallow RFC 3339 compatible
datetime notations just because parsers may not be readily available, only to
then come up with a format used almost exclusively by Kolab, for which the lack
of readily available parsers can only be a more prominent problem, not a
lesser one.
That said, when I originally recommended allowing any RFC 3339 compatible
format, the KEP stated that UTC datetime stamps were to be used.
Also, for any client to parse the relevant parts of any RFC 3339 compatible
datetime notation (given that the datetime stamp was in UTC), using a substring
of static length should have sufficed. Note that a client's Kolab XML format
parsing libraries, if compatible with the newer version of the format
specification, would also have taken into account the 'tz' attribute, as the
newer version of the specification requires. Where that attribute is not
available, they would interpret the datetime (or the same static-length
substring) as UTC, which is compatible with the behaviour of older clients.
Additionally, clients are then not required to have fully featured RFC 3339
datetime parsing routines in their system libraries. Have them? Do parsing.
Don't have them? Take the string and see how far you get. Not getting far
(because of +0000, perhaps)? Take a substring.
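The "parse if you can, otherwise take the substring" chain could look roughly
like this. It is a sketch under assumptions: the function names are
hypothetical, and the fallback assumes the writer used a `YYYY-MM-DDTHH:MM:SS`
prefix and that, absent a usable offset, the value is UTC (as the older KEP
stated).

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 "date-time": full-date "T" partial-time time-offset (Z or +HH:MM / -HH:MM).
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt ](\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?"
    r"([Zz]|[+-]\d{2}:\d{2})"
)


def parse_rfc3339(text):
    """Full parse of an RFC 3339 date-time; returns an aware datetime in UTC."""
    m = _RFC3339.fullmatch(text)
    if m is None:
        raise ValueError("not an RFC 3339 date-time: %r" % text)
    year, month, day, hour, minute, second = (int(g) for g in m.group(1, 2, 3, 4, 5, 6))
    micro = int((m.group(7) or "0")[:6].ljust(6, "0"))  # keep at most microseconds
    offset = m.group(8)
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = -1 if offset[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    return datetime(year, month, day, hour, minute, second, micro, tz).astimezone(timezone.utc)


def parse_loose(text):
    """Try a full parse; failing that, read the static-length prefix as UTC."""
    try:
        return parse_rfc3339(text)
    except ValueError:
        # e.g. "2011-03-14T18:00:00+0000" (no colon in the offset) ends up here.
        prefix = text[:19]  # "YYYY-MM-DDTHH:MM:SS"
        return datetime.strptime(prefix, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
```

With this, `Z`, `+01:00` and the non-conforming `+0000` spellings of the same
instant all come back as the same UTC datetime.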
Part of that argument for RFC 3339 compatible *parsing* (writing would have
been limited to any one notation, by simple recommendation) was based on the
then-current use of UTC datetimes, not local time. With UTC datetimes only, it
does not matter which notation is used, as they all indicate UTC regardless,
and there is minimal redundancy or ambiguity in the information stored. The
ability to parse everything RFC 3339 compatible would allow clients to write
out whatever was easiest for them, even though the KEP may have recommended
format X, Y or Z (out of those that are RFC 3339 compatible). Strict writing
combined with loose parsing is, I think, good practice.
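Strict writing can be as small as normalising to one notation before
serialising. A minimal sketch, assuming the recommended notation is
`YYYY-MM-DDTHH:MM:SSZ` in UTC (the KEP's actual recommendation may differ) and
using a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def write_strict(dt):
    """Always emit one notation: UTC with a 'Z' suffix, no fractional seconds."""
    if dt.tzinfo is None:
        # A naive datetime carries no offset; refuse rather than guess.
        raise ValueError("naive datetime; offset unknown")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Three RFC 3339-equivalent spellings of the same instant...
same_instant = [
    datetime(2011, 3, 14, 18, 0, tzinfo=timezone.utc),
    datetime(2011, 3, 14, 19, 0, tzinfo=timezone(timedelta(hours=1))),
    datetime(2011, 3, 14, 13, 0, tzinfo=timezone(timedelta(hours=-5))),
]
# ...all serialise to the single strict form.
for dt in same_instant:
    print(write_strict(dt))  # 2011-03-14T18:00:00Z
```

Because every input is UTC-equivalent, the choice of written notation carries
no information of its own, which is why loose parsing costs nothing here.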
The argument for UTC timestamps has also been that legacy clients would remain
compatible with the new format, though without gaining the fix for the
long-standing recurrence/DST shortcoming. Some client libraries that parse the
format apparently do not preserve unknown attributes and tags, even though
previous versions of the format specification required them to. Such libraries
indeed break the new specification (as they did the old one), and they also
revert new clients' functionality to the old behaviour for whatever they
touch, breaking it in the process.
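The difference between preserving and dropping unknown attributes can be shown
with Python's standard `xml.etree.ElementTree`. The XML fragment and its
element names are hypothetical, not quoted from the Kolab format; the point is
only that editing the parsed tree in place keeps attributes the client does not
understand (such as 'tz'), while rebuilding the event from known fields loses
them.

```python
import xml.etree.ElementTree as ET

# Hypothetical event fragment; element names are illustrative only.
EVENT = ('<event><summary>Meeting</summary>'
         '<start-date tz="Europe/Zurich">2011-03-14T10:00:00Z</start-date></event>')


def update_start_preserving(xml_text, new_start):
    """Edit the parsed tree in place, so unknown attributes and tags survive."""
    root = ET.fromstring(xml_text)
    root.find("start-date").text = new_start
    return ET.tostring(root, encoding="unicode")


def update_start_lossy(xml_text, new_start):
    """Rebuild the event from the fields this client knows; anything else is dropped."""
    root = ET.fromstring(xml_text)
    event = ET.Element("event")
    ET.SubElement(event, "summary").text = root.findtext("summary")
    ET.SubElement(event, "start-date").text = new_start
    return ET.tostring(event, encoding="unicode")


print(update_start_preserving(EVENT, "2011-03-21T10:00:00Z"))  # keeps tz="Europe/Zurich"
print(update_start_lossy(EVENT, "2011-03-21T10:00:00Z"))       # tz attribute is gone
```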
> RFC3339 basedness has then been modified/strengthened/weakened in different
> versions of the KEP, which at the time you are quoting was still based on UTC
> + time zone, because most people, including myself, erroneously believed this
> was sufficient to resolve the issue.
>
It was not erroneous to believe that using UTC solved the issue, because it
can. You stated there is no way to restore the originally intended local time
when UTC datetimes are used, but the last-modification datetime seems to have
been overlooked. That datetime does indicate whether the client was observing
DST in a given time zone when the XML was written out, and it could be extended
by storing the local time zone along with the last-modification datetime.
However, this is not required at all as long as the parsing logic is
consistent.
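The recovery argued for above can be sketched as follows. This is a
hypothetical illustration, not logic taken from the KEP: it assumes the writer
converted the local start time to UTC using the offset in force at the moment
of writing (the last-modification datetime). To stay self-contained it
hard-codes the EU summer-time rule for Europe/Berlin (baseline +1) instead of
consulting a time zone database. All function names are made up for the
example.

```python
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Naive midnight of the last Sunday of `month` (valid for months 1..11)."""
    d = datetime(year, month + 1, 1) - timedelta(days=1)
    return d - timedelta(days=(d.weekday() - 6) % 7)


def eu_offset(base_hours, when_utc):
    """Assumed EU rule: +1 hour from 01:00 UTC, last Sunday of March to last Sunday of October."""
    start = last_sunday(when_utc.year, 3).replace(hour=1, tzinfo=timezone.utc)
    end = last_sunday(when_utc.year, 10).replace(hour=1, tzinfo=timezone.utc)
    return timedelta(hours=base_hours + (1 if start <= when_utc < end else 0))


def intended_local(start_utc, last_modified_utc, base_hours):
    """Wall-clock time the writer meant: UTC start plus the offset in force when written."""
    return (start_utc + eu_offset(base_hours, last_modified_utc)).replace(tzinfo=None)


def instance_utc(local_wall, base_hours):
    """UTC instant of a recurrence instance that keeps the same wall-clock time."""
    probe = local_wall.replace(tzinfo=timezone.utc)  # adequate away from DST transitions
    return probe - eu_offset(base_hours, probe)


# Written in July (Berlin on +2): an 11:00 weekly meeting was stored as 09:00 UTC.
start = datetime(2011, 7, 4, 9, 0, tzinfo=timezone.utc)
modified = datetime(2011, 7, 1, 12, 0, tzinfo=timezone.utc)
local = intended_local(start, modified, base_hours=1)   # 2011-07-04 11:00
december = local.replace(month=12, day=5)
print(instance_utc(december, base_hours=1))              # 2011-12-05 10:00:00+00:00
```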
You stated that clients cannot foretell the future, but at the time of that
discussion, when the 'tz' attribute was only supposed to indicate the
authoritative time zone to be used, a timestamp of 18:00 UTC with a 'tz'
attribute set to Europe/Berlin consistently said the same thing at the data
storage level. The client did not need to predict the future beyond any
feasible horizon: DST changes or not, the stored data would not require any
change. What was perceived as erroneous was a matter of how the parser logic
was defined. A very simple fix would have been to specify that clients must
write out the datetime as if DST had never been invented, using only their
baseline offset. Two situations for writing the data out exist:
- During the summer, a user in Zurich decides he wants a weekly 11am company
  meeting.
  - He writes out 10 UTC, Europe/Zurich, as the baseline offset is +1, and
    does not write out 9 UTC, which would account for the current DST offset.
- During the winter, a user in Zurich decides he wants a weekly 11am company
  meeting.
  - He writes out 10 UTC, Europe/Zurich, as the baseline offset is +1.
- All clients know that the baseline UTC offset for Zurich is +1, and that the
  DST offset for Zurich is +2.
- The local baseline UTC offset is known (0 for me in this case), and the DST
  information for my location is available too: +1. In my case, this cancels
  out the +1 that Zurich is also experiencing.
- An employee in Cape Town, South Africa must call in at 11am in the summer.
  Knowing that the UTC offset for Zurich at that moment is +2, the time is the
  intended Zurich time (the stored 10 UTC plus the baseline +1, i.e. 11:00),
  minus 2 (negating the Zurich offset), plus 2 (the Cape Town offset): 11:00.
- An employee in Cape Town, South Africa must call in at 12pm (noon) in the
  winter. Knowing that the UTC offset for Zurich at that moment is +1, the time
  is the intended Zurich time (11:00), minus 1 (negating the Zurich offset),
  plus 2 (the Cape Town offset): 12:00.
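The baseline-offset scheme from the list above can be sketched in a few lines.
This is a hypothetical illustration under stated assumptions: the EU
summer-time rule is hard-coded for Zurich (baseline +1) instead of using a time
zone database, Cape Town is taken as a fixed +2 (South Africa observes no DST),
and the function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Naive midnight of the last Sunday of `month` (valid for months 1..11)."""
    d = datetime(year, month + 1, 1) - timedelta(days=1)
    return d - timedelta(days=(d.weekday() - 6) % 7)


def eu_offset(base_hours, when_utc):
    """Assumed EU rule: +1 hour from 01:00 UTC, last Sunday of March to last Sunday of October."""
    start = last_sunday(when_utc.year, 3).replace(hour=1, tzinfo=timezone.utc)
    end = last_sunday(when_utc.year, 10).replace(hour=1, tzinfo=timezone.utc)
    return timedelta(hours=base_hours + (1 if start <= when_utc < end else 0))


ZURICH_BASE = 1                  # CET; CEST (+2) in summer
CAPE_TOWN = timedelta(hours=2)   # SAST, no DST


def write_baseline(local_wall, base_hours):
    """Store the wall-clock time as if DST had never been invented."""
    return local_wall - timedelta(hours=base_hours)


def read_for_viewer(stored_utc, base_hours, viewer_offset):
    """Baseline UTC -> intended Zurich time -> real UTC instant -> viewer's wall clock."""
    zurich_wall = stored_utc + timedelta(hours=base_hours)
    probe = zurich_wall.replace(tzinfo=timezone.utc)  # adequate away from DST transitions
    return zurich_wall - eu_offset(base_hours, probe) + viewer_offset


summer = write_baseline(datetime(2011, 7, 4, 11, 0), ZURICH_BASE)   # 10:00, not 09:00
winter = write_baseline(datetime(2011, 12, 5, 11, 0), ZURICH_BASE)  # also 10:00
print(read_for_viewer(summer, ZURICH_BASE, CAPE_TOWN))  # 2011-07-04 11:00:00
print(read_for_viewer(winter, ZURICH_BASE, CAPE_TOWN))  # 2011-12-05 12:00:00
```

The stored value never changes between summer and winter; only the reader's
arithmetic does, which is the point of the proposal.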
While the specification could simply say that datetimes should always be
written out as if DST had never been invented, I am sure that leaves too much
room for error.
While the specification could simply say to use the local time, this yields an
information set with either ambiguous or duplicate information: a timestamp
with a (useless) +0200 offset attached to an event with 'tz' attribute
'Europe/Berlin' is ambiguous for a recurring event instance in December, and
duplicate for the same recurring event with an instance in July.
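The ambiguous/duplicate distinction can be made mechanical: compare the written
offset with the offset the 'tz' attribute implies for the instance. A sketch
under assumptions: the EU summer-time rule for Europe/Berlin (baseline +1) is
hard-coded rather than looked up in a time zone database, and
`classify_offset` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Naive midnight of the last Sunday of `month` (valid for months 1..11)."""
    d = datetime(year, month + 1, 1) - timedelta(days=1)
    return d - timedelta(days=(d.weekday() - 6) % 7)


def eu_offset(base_hours, when_utc):
    """Assumed EU rule: +1 hour from 01:00 UTC, last Sunday of March to last Sunday of October."""
    start = last_sunday(when_utc.year, 3).replace(hour=1, tzinfo=timezone.utc)
    end = last_sunday(when_utc.year, 10).replace(hour=1, tzinfo=timezone.utc)
    return timedelta(hours=base_hours + (1 if start <= when_utc < end else 0))


def classify_offset(written_hours, base_hours, instance_utc):
    """Compare an offset written next to a local time with what 'tz' implies."""
    if timedelta(hours=written_hours) == eu_offset(base_hours, instance_utc):
        return "duplicate"   # the offset merely repeats what the 'tz' attribute implies
    return "ambiguous"       # the offset contradicts 'tz': which of the two is right?


# 11:00+02:00 with tz Europe/Berlin, for a July and a December instance.
print(classify_offset(2, 1, datetime(2011, 7, 4, 9, 0, tzinfo=timezone.utc)))   # duplicate
print(classify_offset(2, 1, datetime(2011, 12, 5, 9, 0, tzinfo=timezone.utc)))  # ambiguous
```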
While the specification could simply say to use the local time but not include
the offset, this creates a one-off notation format unique to Kolab.
The third and fourth points seem to have been derived from the first two, more
so than being valid points in themselves.
FYI, with local times being stored, the exact notation becomes the culprit,
more so than the calculation of baseline offsets and local offsets, which is
virtually the same:
- 11am with tz /Zurich: the client knows the current offset for Zurich to be
  either +1 (winter) or +2 (summer), subtracts it to get the UTC timestamp (if
  needed), and adds the current local offset (if needed).
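To illustrate "virtually the same", here is a sketch that reads the same
meeting stored both ways, as local time (11:00, tz Zurich) and as baseline UTC
(10:00), for a Cape Town viewer. Assumptions as before: the EU summer-time rule
for Zurich (baseline +1) is hard-coded instead of using a time zone database,
Cape Town is a fixed +2, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Naive midnight of the last Sunday of `month` (valid for months 1..11)."""
    d = datetime(year, month + 1, 1) - timedelta(days=1)
    return d - timedelta(days=(d.weekday() - 6) % 7)


def eu_offset(base_hours, when_utc):
    """Assumed EU rule: +1 hour from 01:00 UTC, last Sunday of March to last Sunday of October."""
    start = last_sunday(when_utc.year, 3).replace(hour=1, tzinfo=timezone.utc)
    end = last_sunday(when_utc.year, 10).replace(hour=1, tzinfo=timezone.utc)
    return timedelta(hours=base_hours + (1 if start <= when_utc < end else 0))


CAPE_TOWN = timedelta(hours=2)  # SAST, no DST


def from_local(local_wall, base_hours, viewer_offset):
    """Local-time storage: subtract Zurich's current offset, add the viewer's."""
    probe = local_wall.replace(tzinfo=timezone.utc)  # adequate away from DST transitions
    return local_wall - eu_offset(base_hours, probe) + viewer_offset


def from_baseline(stored_utc, base_hours, viewer_offset):
    """Baseline storage: add the baseline offset back first, then proceed as above."""
    return from_local(stored_utc + timedelta(hours=base_hours), base_hours, viewer_offset)


for day in (datetime(2011, 7, 4), datetime(2011, 12, 5)):
    local = day.replace(hour=11)      # stored as local time: 11:00
    baseline = day.replace(hour=10)   # stored as baseline UTC: 10:00
    print(from_local(local, 1, CAPE_TOWN), from_baseline(baseline, 1, CAPE_TOWN))
```

Both storage choices yield 11:00 in July and 12:00 in December for the Cape
Town viewer; the two differ only by one fixed addition of the baseline offset.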
I had concerns about storing the local time rather than UTC right from the
beginning, and I attempted to express them, but having failed to come up with
constructive feedback of my own, I decided to refrain from any further comments
on the subject.
I figured that either way we would arrive at a working solution, and that
ultimately the specification would define how the data was to be interpreted,
and how that interpretation would lead to proper presentation of the event.
With UTC, this is usually "what's my offset?", or with a 'tz' attribute
"what's the intended offset, and what's my offset?". With local time, I
figured, this is much the same, but not exactly the same.
Anyway, I hope the above sheds some light on why I suggested that clients
should be able to parse any RFC 3339 stamp in the first place. While I disagree
with the argument about code complexity, it is indeed true that using local
times in any RFC 3339 compatible format does not improve human readability,
disambiguation or deduplication of information. Especially if the
recommendation on strict writing is loosely parsed (no pun intended ;-)
Kind regards,
--
Jeroen van Meeuwen
Senior Engineer, Kolab Systems AG
e: vanmeeuwen at kolabsys.com
t: +316 42 801 403
w: http://www.kolabsys.com
pgp: 9342 BF08
More information about the format mailing list