[Tarantool-patches] [PATCH v5 2/8] lua: built-in module datetime

Vladimir Davydov vdavydov at tarantool.org
Wed Aug 18 14:45:29 MSK 2021


On Wed, Aug 18, 2021 at 01:03:27PM +0300, Safin Timur wrote:
> > > +void
> > > +datetime_now(struct datetime *now)
> > > +{
> > > +	struct timeval tv;
> > > +	gettimeofday(&tv, NULL);
> > > +	now->secs = tv.tv_sec;
> > > +	now->nsec = tv.tv_usec * 1000;
> > > +
> > > +	time_t now_seconds;
> > > +	time(&now_seconds);
> > 
> > Can't you use tv.tv_sec here?
> 
> Nope. :)
> 
> Actually I don't want to know value returned from time() - all I need is to
> eventually call localtime[_r]() to get local timezone. If you know simpler
> way to determine local timezone without time, then please give me know.

AFAIU time() returns exactly the same value as tv.tv_sec, where tv is
the value returned by gettimeofday(), no?

Come to think of it, I'm not so sure it's a good idea to call
localtime() on each call to get the timezone. In comparison to
gettimeofday() it 10x slower. Maybe, it's worth caching the current
timezone?

vlad at esperanza:~/tmp$ cat bench.lua
clock = require('clock')
datetime = require('datetime')

LOOPS = 10000000

t1 = clock.monotonic()
for i = 1, LOOPS do
    datetime.now()
end
t2 = clock.monotonic()
print(string.format('datetime.now: %f', t2 - t1))

t1 = clock.monotonic()
for i = 1, LOOPS do
    clock.realtime()
end
t2 = clock.monotonic()
print(string.format('clock.realtime: %f', t2 - t1))

os.exit(0)
vlad at esperanza:~/tmp$ tarantool bench.lua
datetime.now: 2.190355
clock.realtime: 0.201188

> > > +#define DT_EPOCH_1970_OFFSET  719163
> > 
> > I don't understand what this constant stores. Please add a comment.
> 
> I've documented them after Serge request this way:
> 
> ------------------------------------------------------
> /**
>  * We count dates since so called "Rata Die" date
>  * January 1, 0001, Monday (as Day 1).
>  * But datetime structure keeps seconds since
>  * Unix "Epoch" date:
>  * Unix, January 1, 1970, Thursday
>  *
>  * The difference between Epoch (1970-01-01)
>  * and Rata Die (0001-01-01) is 719163 days.
>  */
> 
> #ifndef SECS_PER_DAY
> #define SECS_PER_DAY          86400
> #define DT_EPOCH_1970_OFFSET  719163
> #endif
> 
> /**
>  * c-dt library uses int as type for dt value, which
>  * represents the number of days since Rata Die date.
>  * This implies limits to the number of seconds we
>  * could safely store in our structures and then safely
>  * pass to c-dt functions.
>  *
>  * So supported ranges will be
>  * - for seconds [-185604722870400 .. 185480451417600]
>  * - for dates   [-5879610-06-22T00:00Z .. 5879611-07-11T00:00Z]
>  */
> #define MAX_DT_DAY_VALUE (int64_t)INT_MAX
> #define MIN_DT_DAY_VALUE (int64_t)INT_MIN
> #define SECS_EPOCH_1970_OFFSET 	\
> 	((int64_t)DT_EPOCH_1970_OFFSET * SECS_PER_DAY)
> #define MAX_EPOCH_SECS_VALUE    \
> 	(MAX_DT_DAY_VALUE * SECS_PER_DAY - SECS_EPOCH_1970_OFFSET)
> #define MIN_EPOCH_SECS_VALUE    \
> 	(MIN_DT_DAY_VALUE * SECS_PER_DAY - SECS_EPOCH_1970_OFFSET)

I'd name them DT_MAX_DAY_VALUE, etc, with DT_ prefix. For consistency
with DT_EPOCH_1970_OFFSET and so as to emphasize that these constants
are related to datetime objects. SECS_PER_DAY, MINS_PER_DAY, etc are
fine without prefix, because they are universal.

> > > +local function datetime_new_dt(dt, secs, frac, offset)
> > 
> > What's 'frac'? Please either add a comment or give it a better name.
> 
> fraction part. Renamed to full fraction.

It's unclear what fraction is of. I'd rename it to 'nsecs', because
the function it is passed to (datetime_new_raw) actually takes
nanoseconds.

> > > +-- create datetime given attribute values from obj
> > > +-- in the "easy mode", providing builder with
> > > +-- .secs, .nsec, .offset
> > > +local function datetime_new_obj(obj, ...)
> > > +    if obj == nil or type(obj) ~= 'table' then
> > 
> > type(obj) ~= 'table' implies 'obj' == nil
> 
> I meant that obj may be simple number, not builder object.

I understand, but

  if type(obj) ~= 'table' then

will work for obj == nil (because type(obj) is nil ~= 'table).
That is, 'obj == nil' check is redundant.

> 
> > 
> > > +        return datetime_new_raw(obj, ...)
> > > +    end
> > > +    local secs = 0
> > > +    local nsec = 0
> > > +    local offset = 0
> > > +
> > > +    for key, value in pairs(obj) do
> > > +        if key == 'secs' then
> > > +            secs = value
> > > +        elseif key == 'nsec' then
> > > +            nsec = value
> > > +        elseif key == 'offset' then
> > > +            offset = value
> > > +        else
> > > +            error(('unknown attribute %s'):format(key), 2)
> > > +        end
> > 
> > Hmm, do we really need this check? IMO this is less clear and probably
> > slower than accessing the table directly:
> > 
> > local secs = obj.secs
> > local nsecs = obj.nsecs
> > ...
> 
> So here is dilema:
> - we could directly pass initialization object to the ffi.new. But allow all
> kinds of unknown attributes;
> - or we could check all attributes in a slightly slower loop, but provide
> nice user visible diagnostics for an error.
> 
> What would you recommend? (I prefer implementation with checks and
> human-understandable errors)

I understand the dilemma. I don't know what we should do. Probably ask
someone who writes a lot of Lua code. I, personally, haven't seen checks
like this in our Lua code - we never check if there's bogus positional
arguments AFAICS. E.g. if you pass {tmieout = 123} to net.box.call() in
options, there will be no error (and apparently no timeout because of
the typo). I think that this is the Lua way, but I can't say for sure.

> > > +local function datetime_new(obj)
> > > +    if obj == nil or type(obj) ~= 'table' then
> > > +        return datetime_new_raw(0, 0, 0)
> > > +    end
> > > +    local y = 0
> > > +    local M = 0
> > > +    local d = 0
> > > +    local ymd = false
> > > +
> > > +    local h = 0
> > > +    local m = 0
> > > +    local s = 0
> > > +    local frac = 0
> > > +    local hms = false
> > > +    local offset = 0
> > > +
> > > +    local dt = 0
> > > +
> > > +    for key, value in pairs(obj) do
> > > +        if key == 'year' then
> > > +            check_range(value, {1, 9999}, key)
> > > +            y = value
> > > +            ymd = true
> > > +        elseif key == 'month' then
> > > +            check_range(value, {1, 12}, key)
> > > +            M = value
> > > +            ymd = true
> > > +        elseif key == 'day' then
> > > +            check_range(value, {1, 31}, key)
> > > +            d = value
> > > +            ymd = true
> > > +        elseif key == 'hour' then
> > > +            check_range(value, {0, 23}, key)
> > > +            h = value
> > > +            hms = true
> > > +        elseif key == 'min' or key == 'minute' then
> > > +            check_range(value, {0, 59}, key)
> > > +            m = value
> > > +            hms = true
> > > +        elseif key == 'sec' or key == 'second' then
> > 
> > I don't think we should support both 'sec'/'min' and 'second'/'minute'
> > here. Since you want to be consistent with os.date() output, I would
> > leave 'sec'/'min' only.
> 
> Originally, there were only human-readable names like 'second' or 'minute',
> os.date compatibility added only lately. Because, it costs nothing
> (comparing to all other overhead).
> 
> I'd prefer to have ability to use full readable names, not only those from
> set of os.date. It costs nothing, but make it's friendlier.

'min' and 'sec' are common abbreviations so I wouldn't say they are not
human-readable.

> > > +local function datetime_index(self, key)
> > > +    if key == 'epoch' or key == 'unixtime' then
> > > +        return self.secs
> > > +    elseif key == 'ts' or key == 'timestamp' then
> > > +        return self.secs + self.nsec / 1e9
> > > +    elseif key == 'ns' or key == 'nanoseconds' then
> > > +        return self.secs * 1e9 + self.nsec
> > > +    elseif key == 'us' or key == 'microseconds' then
> > > +        return self.secs * 1e6 + self.nsec / 1e3
> > > +    elseif key == 'ms' or key == 'milliseconds' then
> > > +        return self.secs * 1e3 + self.nsec / 1e6
> > > +    elseif key == 's' or key == 'seconds' then
> > > +        return self.secs + self.nsec / 1e9
> > > +    elseif key == 'm' or key == 'min' or key == 'minutes' then
> > > +        return (self.secs + self.nsec / 1e9) / 60
> > > +    elseif key == 'hr' or key == 'hours' then
> > > +        return (self.secs + self.nsec / 1e9) / (60 * 60)
> > > +    elseif key == 'd' or key == 'days' then
> > > +        return (self.secs + self.nsec / 1e9) / (24 * 60 * 60)
> > 
> >   1. There's so many ways to get the same information, for example m,
> >      min, minutes. There should be exactly one way.
> 
> Disagreed. I prefer friendlier approach.

I wouldn't call it 'friendly'. What exactly, as a user, should I use,
'minutes', 'min', or 'm'? Freedom of choice is a pain.

> 
> > 
> >   2. I'd expect datetime.hour return the number of hours in the current
> >      day, like os.date(), but it returns the number of hours since the
> >      Unix epoch instead. This is confusing and useless.
> 
> Well, at least this is consistent with seconds and their fractions of
> different units. If there would be demand for os.date compatible table
> format, we might extend interface with corresponding accessor. But I do not
> foresee it, because os.date is totally broken if we need correct handling of
> timezone information. AFAIK it's always using local time context.

It's just that I don't see any point in knowing the number of hours that
passed since 1970. Better remove this method altogether then. To be
discussed.

> > 
> >   3. Please document the behavior somewhere in the comments to the code.
> 
> Documented.
> 
> > 
> >   4. These if-else's are inefficient AFAIU.
> 
> Quite the contrary (if we compare to the idiomatic closure handlers case).
> If/ifelse is the fastest way to handle such cases in Lua.
> Here is earlier bench we have discussed with Vlad and Oleg Babin, and after
> which I've changed from using hash based handlers to the series of ifs.
> 
> https://gist.github.com/tsafin/31cc9b0872b6015904fcc90d97740770

The test creates a new hashtable on each function invocation - no
surprise it's slow. The hashtable should be global.

> 
> 
> > 
> > > +    elseif key == 'to_utc' then
> > > +        return function(self)
> > > +            return datetime_to_tz(self, 0)
> > > +        end
> > > +    elseif key == 'to_tz' then
> > > +        return function(self, offset)
> > > +            return datetime_to_tz(self, offset)
> > > +        end
> > > +    else
> > > +        error(('unknown attribute %s'):format(key), 2)
> > > +    end
> > > +end
> > > +
> > > +local function datetime_newindex(self, key, value)
> > > +    if key == 'epoch' or key == 'unixtime' then
> > > +        self.secs = value
> > > +        self.nsec, self.offset = 0, 0
> > > +    elseif key == 'ts' or key == 'timestamp' then
> > > +        local secs, frac = math_modf(value)
> > > +        self.secs = secs
> > > +        self.nsec = frac * 1e9
> > > +        self.offset = 0
> > 
> > Do we really want the datetime object to be mutable? If so, allowing to
> > set its value only to the time since the unix epoch doesn't look
> > particularly user-friendly.
> 
> I do want. That was request from customer - to have ability to modify by
> timestamp from Epoch.

Why can't the user create a new datetime object instead?
To be discussed.

> > > +local datetime_mt = {
> > > +    __serialize = datetime_serialize,
> > > +    __eq = datetime_eq,
> > > +    __lt = datetime_lt,
> > > +    __le = datetime_le,
> > > +    __index = datetime_index,
> > > +    __newindex = datetime_newindex,
> > > +}
> > > +
> > > +local interval_mt = {
> > > +    __serialize = interval_serialize,
> > > +    __eq = datetime_eq,
> > > +    __lt = datetime_lt,
> > > +    __le = datetime_le,
> > > +    __index = datetime_index,
> > 
> > Why does datetime_mt has __newindex while interval_mt doesn't?
> 
> I did not foresee the need. Should we?
> 
> And semantically datetime has persistence and life-time, but intervals have
> not. [And besides all we already provide a plenty of helpers for creation of
> intervals]

You're right - there's no point in __newindex for interval. Resolved.

> 
> > 
> > > +}
> > > +
> > > +ffi.metatype(interval_t, interval_mt)
> > > +ffi.metatype(datetime_t, datetime_mt)
> > > +
> > > +return setmetatable(
> > > +    {
> > > +        new         = datetime_new,
> > > +        new_raw     = datetime_new_obj,
> > 
> > I'm not sure, we need to make the 'raw' function public.
> 
> That's for second kind of constructors using initialization object but with
> low level attributes {secs = 0, nsec = 0, offset}. It's not for everybody,
> but still provides some ergonomics.

You're probably right. Still, to be discussed.

> > > +
> > > +        parse       = parse,
> > > +        parse_date  = parse_date,
> > 
> > > +        parse_time  = parse_time,
> > > +        parse_zone  = parse_zone,
> > 
> > Creating a datetime object without a date sounds confusing.
> 
> :) But I do foresee the need to parse parts of datetime string, and I did
> not want to establish special type for that.
> 
> Now you have asked it I realized we may better create interval objects of
> approapriate time. But in this case we need to add timezone to interval
> record. So it will be able to represent timezone shifts.
> 
> Ok, heer is the deal:
> - at the moment those partial parsed objects store their data to the partial
> datetime object created. That's confusing but do not require modifications
> in interval arithmetic.
> - we may start to store partial time and timezone information into generic
> intervals. But it would require extension of interval arithmetic to be ready
> to shift by timezone delta. [Does it even make any sense?]
> 
> What do you think?

No, I don't like the idea of storing a timezone shift in time interval.
IMO better introduce separate 'time' and 'date' objects, which would
store the time in the datetime struct under the hood. This needs to be
discussed.

> 
> > 
> > > +
> > > +        now         = local_now,
> > > +        strftime    = strftime,
> > > +        asctime     = asctime,
> > > +        ctime       = ctime,
> > > +
> > > +        is_datetime = is_datetime,
> > > +        is_interval = is_interval,
> > 
> > I don't see any point in making these functions public.
> 
> I was asked by customer to introduce such.

Okay, please add tests for them.


More information about the Tarantool-patches mailing list