[Tarantool-patches] [PATCH v5 2/8] lua: built-in module datetime
Vladimir Davydov
vdavydov at tarantool.org
Wed Aug 18 14:45:29 MSK 2021
On Wed, Aug 18, 2021 at 01:03:27PM +0300, Safin Timur wrote:
> > > +void
> > > +datetime_now(struct datetime *now)
> > > +{
> > > + struct timeval tv;
> > > + gettimeofday(&tv, NULL);
> > > + now->secs = tv.tv_sec;
> > > + now->nsec = tv.tv_usec * 1000;
> > > +
> > > + time_t now_seconds;
> > > + time(&now_seconds);
> >
> > Can't you use tv.tv_sec here?
>
> Nope. :)
>
> Actually I don't want to know value returned from time() - all I need is to
> eventually call localtime[_r]() to get local timezone. If you know simpler
> way to determine local timezone without time, then please give me know.
AFAIU time() returns exactly the same value as tv.tv_sec, where tv is
the value returned by gettimeofday(), no?
Come to think of it, I'm not so sure it's a good idea to call
localtime() on each call to get the timezone. In comparison to
gettimeofday() it 10x slower. Maybe, it's worth caching the current
timezone?
vlad at esperanza:~/tmp$ cat bench.lua
clock = require('clock')
datetime = require('datetime')
LOOPS = 10000000
t1 = clock.monotonic()
for i = 1, LOOPS do
datetime.now()
end
t2 = clock.monotonic()
print(string.format('datetime.now: %f', t2 - t1))
t1 = clock.monotonic()
for i = 1, LOOPS do
clock.realtime()
end
t2 = clock.monotonic()
print(string.format('clock.realtime: %f', t2 - t1))
os.exit(0)
vlad at esperanza:~/tmp$ tarantool bench.lua
datetime.now: 2.190355
clock.realtime: 0.201188
> > > +#define DT_EPOCH_1970_OFFSET 719163
> >
> > I don't understand what this constant stores. Please add a comment.
>
> I've documented them after Serge request this way:
>
> ------------------------------------------------------
> /**
> * We count dates since so called "Rata Die" date
> * January 1, 0001, Monday (as Day 1).
> * But datetime structure keeps seconds since
> * Unix "Epoch" date:
> * Unix, January 1, 1970, Thursday
> *
> * The difference between Epoch (1970-01-01)
> * and Rata Die (0001-01-01) is 719163 days.
> */
>
> #ifndef SECS_PER_DAY
> #define SECS_PER_DAY 86400
> #define DT_EPOCH_1970_OFFSET 719163
> #endif
>
> /**
> * c-dt library uses int as type for dt value, which
> * represents the number of days since Rata Die date.
> * This implies limits to the number of seconds we
> * could safely store in our structures and then safely
> * pass to c-dt functions.
> *
> * So supported ranges will be
> * - for seconds [-185604722870400 .. 185480451417600]
> * - for dates [-5879610-06-22T00:00Z .. 5879611-07-11T00:00Z]
> */
> #define MAX_DT_DAY_VALUE (int64_t)INT_MAX
> #define MIN_DT_DAY_VALUE (int64_t)INT_MIN
> #define SECS_EPOCH_1970_OFFSET \
> ((int64_t)DT_EPOCH_1970_OFFSET * SECS_PER_DAY)
> #define MAX_EPOCH_SECS_VALUE \
> (MAX_DT_DAY_VALUE * SECS_PER_DAY - SECS_EPOCH_1970_OFFSET)
> #define MIN_EPOCH_SECS_VALUE \
> (MIN_DT_DAY_VALUE * SECS_PER_DAY - SECS_EPOCH_1970_OFFSET)
I'd name them DT_MAX_DAY_VALUE, etc, with DT_ prefix. For consistency
with DT_EPOCH_1970_OFFSET and so as to emphasize that these constants
are related to datetime objects. SECS_PER_DAY, MINS_PER_DAY, etc are
fine without prefix, because they are universal.
> > > +local function datetime_new_dt(dt, secs, frac, offset)
> >
> > What's 'frac'? Please either add a comment or give it a better name.
>
> fraction part. Renamed to full fraction.
It's unclear what fraction is of. I'd rename it to 'nsecs', because
the function it is passed to (datetime_new_raw) actually takes
nanoseconds.
> > > +-- create datetime given attribute values from obj
> > > +-- in the "easy mode", providing builder with
> > > +-- .secs, .nsec, .offset
> > > +local function datetime_new_obj(obj, ...)
> > > + if obj == nil or type(obj) ~= 'table' then
> >
> > type(obj) ~= 'table' implies 'obj' == nil
>
> I meant that obj may be simple number, not builder object.
I understand, but
if type(obj) ~= 'table' then
will work for obj == nil (because type(obj) is nil ~= 'table).
That is, 'obj == nil' check is redundant.
>
> >
> > > + return datetime_new_raw(obj, ...)
> > > + end
> > > + local secs = 0
> > > + local nsec = 0
> > > + local offset = 0
> > > +
> > > + for key, value in pairs(obj) do
> > > + if key == 'secs' then
> > > + secs = value
> > > + elseif key == 'nsec' then
> > > + nsec = value
> > > + elseif key == 'offset' then
> > > + offset = value
> > > + else
> > > + error(('unknown attribute %s'):format(key), 2)
> > > + end
> >
> > Hmm, do we really need this check? IMO this is less clear and probably
> > slower than accessing the table directly:
> >
> > local secs = obj.secs
> > local nsecs = obj.nsecs
> > ...
>
> So here is dilema:
> - we could directly pass initialization object to the ffi.new. But allow all
> kinds of unknown attributes;
> - or we could check all attributes in a slightly slower loop, but provide
> nice user visible diagnostics for an error.
>
> What would you recommend? (I prefer implementation with checks and
> human-understandable errors)
I understand the dilemma. I don't know what we should do. Probably ask
someone who writes a lot of Lua code. I, personally, haven't seen checks
like this in our Lua code - we never check if there's bogus positional
arguments AFAICS. E.g. if you pass {tmieout = 123} to net.box.call() in
options, there will be no error (and apparently no timeout because of
the typo). I think that this is the Lua way, but I can't say for sure.
> > > +local function datetime_new(obj)
> > > + if obj == nil or type(obj) ~= 'table' then
> > > + return datetime_new_raw(0, 0, 0)
> > > + end
> > > + local y = 0
> > > + local M = 0
> > > + local d = 0
> > > + local ymd = false
> > > +
> > > + local h = 0
> > > + local m = 0
> > > + local s = 0
> > > + local frac = 0
> > > + local hms = false
> > > + local offset = 0
> > > +
> > > + local dt = 0
> > > +
> > > + for key, value in pairs(obj) do
> > > + if key == 'year' then
> > > + check_range(value, {1, 9999}, key)
> > > + y = value
> > > + ymd = true
> > > + elseif key == 'month' then
> > > + check_range(value, {1, 12}, key)
> > > + M = value
> > > + ymd = true
> > > + elseif key == 'day' then
> > > + check_range(value, {1, 31}, key)
> > > + d = value
> > > + ymd = true
> > > + elseif key == 'hour' then
> > > + check_range(value, {0, 23}, key)
> > > + h = value
> > > + hms = true
> > > + elseif key == 'min' or key == 'minute' then
> > > + check_range(value, {0, 59}, key)
> > > + m = value
> > > + hms = true
> > > + elseif key == 'sec' or key == 'second' then
> >
> > I don't think we should support both 'sec'/'min' and 'second'/'minute'
> > here. Since you want to be consistent with os.date() output, I would
> > leave 'sec'/'min' only.
>
> Originally, there were only human-readable names like 'second' or 'minute',
> os.date compatibility added only lately. Because, it costs nothing
> (comparing to all other overhead).
>
> I'd prefer to have ability to use full readable names, not only those from
> set of os.date. It costs nothing, but make it's friendlier.
'min' and 'sec' are common abbreviations so I wouldn't say they are not
human-readable.
> > > +local function datetime_index(self, key)
> > > + if key == 'epoch' or key == 'unixtime' then
> > > + return self.secs
> > > + elseif key == 'ts' or key == 'timestamp' then
> > > + return self.secs + self.nsec / 1e9
> > > + elseif key == 'ns' or key == 'nanoseconds' then
> > > + return self.secs * 1e9 + self.nsec
> > > + elseif key == 'us' or key == 'microseconds' then
> > > + return self.secs * 1e6 + self.nsec / 1e3
> > > + elseif key == 'ms' or key == 'milliseconds' then
> > > + return self.secs * 1e3 + self.nsec / 1e6
> > > + elseif key == 's' or key == 'seconds' then
> > > + return self.secs + self.nsec / 1e9
> > > + elseif key == 'm' or key == 'min' or key == 'minutes' then
> > > + return (self.secs + self.nsec / 1e9) / 60
> > > + elseif key == 'hr' or key == 'hours' then
> > > + return (self.secs + self.nsec / 1e9) / (60 * 60)
> > > + elseif key == 'd' or key == 'days' then
> > > + return (self.secs + self.nsec / 1e9) / (24 * 60 * 60)
> >
> > 1. There's so many ways to get the same information, for example m,
> > min, minutes. There should be exactly one way.
>
> Disagreed. I prefer friendlier approach.
I wouldn't call it 'friendly'. What exactly, as a user, should I use,
'minutes', 'min', or 'm'? Freedom of choice is a pain.
>
> >
> > 2. I'd expect datetime.hour return the number of hours in the current
> > day, like os.date(), but it returns the number of hours since the
> > Unix epoch instead. This is confusing and useless.
>
> Well, at least this is consistent with seconds and their fractions of
> different units. If there would be demand for os.date compatible table
> format, we might extend interface with corresponding accessor. But I do not
> foresee it, because os.date is totally broken if we need correct handling of
> timezone information. AFAIK it's always using local time context.
It's just that I don't see any point in knowing the number of hours that
passed since 1970. Better remove this method altogether then. To be
discussed.
> >
> > 3. Please document the behavior somewhere in the comments to the code.
>
> Documented.
>
> >
> > 4. These if-else's are inefficient AFAIU.
>
> Quite the contrary (if we compare to the idiomatic closure handlers case).
> If/ifelse is the fastest way to handle such cases in Lua.
> Here is earlier bench we have discussed with Vlad and Oleg Babin, and after
> which I've changed from using hash based handlers to the series of ifs.
>
> https://gist.github.com/tsafin/31cc9b0872b6015904fcc90d97740770
The test creates a new hashtable on each function invocation - no
surprise it's slow. The hashtable should be global.
>
>
> >
> > > + elseif key == 'to_utc' then
> > > + return function(self)
> > > + return datetime_to_tz(self, 0)
> > > + end
> > > + elseif key == 'to_tz' then
> > > + return function(self, offset)
> > > + return datetime_to_tz(self, offset)
> > > + end
> > > + else
> > > + error(('unknown attribute %s'):format(key), 2)
> > > + end
> > > +end
> > > +
> > > +local function datetime_newindex(self, key, value)
> > > + if key == 'epoch' or key == 'unixtime' then
> > > + self.secs = value
> > > + self.nsec, self.offset = 0, 0
> > > + elseif key == 'ts' or key == 'timestamp' then
> > > + local secs, frac = math_modf(value)
> > > + self.secs = secs
> > > + self.nsec = frac * 1e9
> > > + self.offset = 0
> >
> > Do we really want the datetime object to be mutable? If so, allowing to
> > set its value only to the time since the unix epoch doesn't look
> > particularly user-friendly.
>
> I do want. That was request from customer - to have ability to modify by
> timestamp from Epoch.
Why can't the user create a new datetime object instead?
To be discussed.
> > > +local datetime_mt = {
> > > + __serialize = datetime_serialize,
> > > + __eq = datetime_eq,
> > > + __lt = datetime_lt,
> > > + __le = datetime_le,
> > > + __index = datetime_index,
> > > + __newindex = datetime_newindex,
> > > +}
> > > +
> > > +local interval_mt = {
> > > + __serialize = interval_serialize,
> > > + __eq = datetime_eq,
> > > + __lt = datetime_lt,
> > > + __le = datetime_le,
> > > + __index = datetime_index,
> >
> > Why does datetime_mt has __newindex while interval_mt doesn't?
>
> I did not foresee the need. Should we?
>
> And semantically datetime has persistence and life-time, but intervals have
> not. [And besides all we already provide a plenty of helpers for creation of
> intervals]
You're right - there's no point in __newindex for interval. Resolved.
>
> >
> > > +}
> > > +
> > > +ffi.metatype(interval_t, interval_mt)
> > > +ffi.metatype(datetime_t, datetime_mt)
> > > +
> > > +return setmetatable(
> > > + {
> > > + new = datetime_new,
> > > + new_raw = datetime_new_obj,
> >
> > I'm not sure, we need to make the 'raw' function public.
>
> That's for second kind of constructors using initialization object but with
> low level attributes {secs = 0, nsec = 0, offset}. It's not for everybody,
> but still provides some ergonomics.
You're probably right. Still, to be discussed.
> > > +
> > > + parse = parse,
> > > + parse_date = parse_date,
> >
> > > + parse_time = parse_time,
> > > + parse_zone = parse_zone,
> >
> > Creating a datetime object without a date sounds confusing.
>
> :) But I do foresee the need to parse parts of datetime string, and I did
> not want to establish special type for that.
>
> Now you have asked it I realized we may better create interval objects of
> approapriate time. But in this case we need to add timezone to interval
> record. So it will be able to represent timezone shifts.
>
> Ok, heer is the deal:
> - at the moment those partial parsed objects store their data to the partial
> datetime object created. That's confusing but do not require modifications
> in interval arithmetic.
> - we may start to store partial time and timezone information into generic
> intervals. But it would require extension of interval arithmetic to be ready
> to shift by timezone delta. [Does it even make any sense?]
>
> What do you think?
No, I don't like the idea of storing a timezone shift in time interval.
IMO better introduce separate 'time' and 'date' objects, which would
store the time in the datetime struct under the hood. This needs to be
discussed.
>
> >
> > > +
> > > + now = local_now,
> > > + strftime = strftime,
> > > + asctime = asctime,
> > > + ctime = ctime,
> > > +
> > > + is_datetime = is_datetime,
> > > + is_interval = is_interval,
> >
> > I don't see any point in making these functions public.
>
> I was asked by customer to introduce such.
Okay, please add tests for them.
More information about the Tarantool-patches
mailing list