[Tarantool-patches] [PATCH resend v2 02/11] lua: built-in module datetime

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Jul 30 02:36:33 MSK 2021


Thanks for the patch!

This is a bad question to ask now, but why did you choose a dead library
from 2015 with just 25 stars, without any tests, and without any README as
a basic block for our datetime types? Is this the correct link I am
using - https://github.com/chansen/c-dt?

See 31 comment below.

On 28.07.2021 12:34, Timur Safin wrote:
> * created a new Tarantool built-in module `datetime`;
> * register cdef types for this module;
> * export some `dt_*` functions from `c-dt` library;
> * lua implementationis of `asctime` and `strftime`;

1. implementationis -> implementations

> * datetime parsing unit tests, with and withput timezones;

2. withput -> without

> diff --git a/src/exports.h b/src/exports.h
> index 5bb3e6a2b..db40c03a4 100644
> --- a/src/exports.h
> +++ b/src/exports.h
> @@ -531,3 +531,24 @@ EXPORT(uri_format)
>  EXPORT(uri_parse)
>  EXPORT(uuid_nil)
>  EXPORT(uuid_unpack)
> +EXPORT(dt_from_rdn)
> +EXPORT(dt_from_yd)
> +EXPORT(dt_from_ymd)
> +EXPORT(dt_from_yqd)
> +EXPORT(dt_from_ywd)
> +EXPORT(dt_to_yd)
> +EXPORT(dt_to_ymd)
> +EXPORT(dt_to_yqd)
> +EXPORT(dt_to_ywd)
> +EXPORT(dt_rdn)
> +EXPORT(dt_dow)
> +EXPORT(dt_parse_iso_date)
> +EXPORT(dt_parse_iso_time)
> +EXPORT(dt_parse_iso_time_basic)
> +EXPORT(dt_parse_iso_time_extended)
> +EXPORT(dt_parse_iso_zone)
> +EXPORT(dt_parse_iso_zone_basic)
> +EXPORT(dt_parse_iso_zone_extended)
> +EXPORT(dt_parse_iso_zone_lenient)
> +EXPORT(dt_from_struct_tm)
> +EXPORT(dt_to_struct_tm)

3. Please, read the comment in the beginning of the file. The
first sentence would be:

	Keep the symbols sorted by name for search and addition
	simplicity, to avoid duplicates.

> diff --git a/src/lib/core/datetime.h b/src/lib/core/datetime.h
> new file mode 100644
> index 000000000..403bf1c64
> --- /dev/null
> +++ b/src/lib/core/datetime.h
> @@ -0,0 +1,61 @@

<...>

> +#include <c-dt/dt_core.h>

4. Please, use "" instead of <> for non-system headers.

> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif /* defined(__cplusplus) */
> +
> +/**
> + * datetime structure consisting of:

5. Well, this comment isn't really helpful. Also we do not
use _t suffix ever except for typedefs. Please, drop both
these things.

The same for datetime_interval_t.

> + */
> +struct datetime_t {
> +	int64_t secs;	///< seconds since epoch
> +	int32_t nsec;	///< nanoseconds if any
> +	int32_t offset; ///< offset in minutes from GMT

6. We never use // nor /// nor ///< nor we write comments
on the same line as the code (except for old legacy code we
inherited from sqlite and some super old tarantool code.

The same for datetime_interval_t.

> +};
> +
> +/**
> + * Date/time delta structure
> + */
> +struct datetime_interval_t {
> +	int64_t secs; ///< relative seconds delta
> +	int32_t nsec; ///< nanoseconds delta
> +};
> +

7. Why do you need this file? It is not included anywhere.
And you don't need to define the structs in C if you are
using them in Lua only. You can just define them in Lua
using ffi.cdef like it is done in some other places.

> diff --git a/src/lua/datetime.lua b/src/lua/datetime.lua
> new file mode 100644
> index 000000000..0996ca5a2
> --- /dev/null
> +++ b/src/lua/datetime.lua
> @@ -0,0 +1,581 @@
> +local ffi = require('ffi')
> +local cdt = ffi.C
> +
> +ffi.cdef [[
> +
> +    typedef int dt_t;
> +
> +    // dt_core.h
> +    typedef enum {
> +        DT_MON       = 1,
> +        DT_MONDAY    = 1,
> +        DT_TUE       = 2,
> +        DT_TUESDAY   = 2,
> +        DT_WED       = 3,
> +        DT_WEDNESDAY = 3,
> +        DT_THU       = 4,
> +        DT_THURSDAY  = 4,
> +        DT_FRI       = 5,
> +        DT_FRIDAY    = 5,
> +        DT_SAT       = 6,
> +        DT_SATURDAY  = 6,
> +        DT_SUN       = 7,
> +        DT_SUNDAY    = 7,
> +    } dt_dow_t;
> +
> +    dt_t     dt_from_rdn     (int n);
> +    dt_t     dt_from_yd      (int y, int d);
> +    dt_t     dt_from_ymd     (int y, int m, int d);
> +    dt_t     dt_from_yqd     (int y, int q, int d);
> +    dt_t     dt_from_ywd     (int y, int w, int d);
> +
> +    void     dt_to_yd        (dt_t dt, int *y, int *d);
> +    void     dt_to_ymd       (dt_t dt, int *y, int *m, int *d);
> +    void     dt_to_yqd       (dt_t dt, int *y, int *q, int *d);
> +    void     dt_to_ywd       (dt_t dt, int *y, int *w, int *d);
> +
> +    int      dt_rdn          (dt_t dt);
> +    dt_dow_t dt_dow          (dt_t dt);
> +
> +    // dt_parse_iso.h
> +    size_t dt_parse_iso_date          (const char *str, size_t len, dt_t *dt);
> +
> +    size_t dt_parse_iso_time          (const char *str, size_t len, int *sod, int *nsec);
> +    size_t dt_parse_iso_time_basic    (const char *str, size_t len, int *sod, int *nsec);
> +    size_t dt_parse_iso_time_extended (const char *str, size_t len, int *sod, int *nsec);
> +
> +    size_t dt_parse_iso_zone          (const char *str, size_t len, int *offset);
> +    size_t dt_parse_iso_zone_basic    (const char *str, size_t len, int *offset);
> +    size_t dt_parse_iso_zone_extended (const char *str, size_t len, int *offset);
> +    size_t dt_parse_iso_zone_lenient  (const char *str, size_t len, int *offset);
> +
> +    // dt_tm.h
> +    dt_t    dt_from_struct_tm  (const struct tm *tm);
> +    void    dt_to_struct_tm    (dt_t dt, struct tm *tm);
> +
> +    // <asm-generic/posix_types.h>
> +    typedef long            __kernel_long_t;
> +    typedef unsigned long   __kernel_ulong_t;
> +    // /usr/include/x86_64-linux-gnu/bits/types/time_t.h
> +    typedef long            time_t;
> +
> +
> +    // <time.h>
> +    typedef __kernel_long_t	__kernel_time_t;
> +    typedef __kernel_long_t	__kernel_suseconds_t;
> +
> +    struct timespec {
> +        __kernel_time_t	        tv_sec;     /* seconds */
> +        long                    tv_nsec;    /* nanoseconds */
> +    };
> +
> +    struct timeval {
> +        __kernel_time_t	        tv_sec;	    /* seconds */
> +        __kernel_suseconds_t    tv_usec;    /* microseconds */
> +    };
> +
> +    struct timezone {
> +        int	tz_minuteswest;     /* minutes west of Greenwich */
> +        int	tz_dsttime;	        /* type of dst correction */
> +    };
> +
> +    // /usr/include/x86_64-linux-gnu/sys/time.h
> +    typedef struct timezone * __timezone_ptr_t;
> +
> +    /* Get the current time of day and timezone information,
> +       putting it into *TV and *TZ.  If TZ is NULL, *TZ is not filled.
> +       Returns 0 on success, -1 on errors.
> +
> +       NOTE: This form of timezone information is obsolete.
> +       Use the functions and variables declared in <time.h> instead.  */
> +    int gettimeofday (struct timeval *__tv, struct timezone * __tz);
> +
> +    // /usr/include/x86_64-linux-gnu/bits/types/struct_tm.h
> +    /* ISO C `broken-down time' structure.  */
> +    struct tm
> +    {
> +        int tm_sec;	        /* Seconds.	[0-60] (1 leap second) */
> +        int tm_min;	        /* Minutes.	[0-59] */
> +        int tm_hour;        /* Hours.	[0-23] */
> +        int tm_mday;        /* Day.		[1-31] */
> +        int tm_mon;	        /* Month.	[0-11] */
> +        int tm_year;        /* Year	- 1900.  */
> +        int tm_wday;        /* Day of week.	[0-6] */
> +        int tm_yday;        /* Days in year.[0-365]	*/
> +        int tm_isdst;       /* DST.		[-1/0/1]*/
> +
> +        long int tm_gmtoff; /* Seconds east of UTC.  */
> +        const char *tm_zone;/* Timezone abbreviation.  */
> +    };
> +
> +    // <time.h>
> +    /* Return the current time and put it in *TIMER if TIMER is not NULL.  */
> +    time_t time (time_t *__timer);
> +
> +    /* Format TP into S according to FORMAT.
> +    Write no more than MAXSIZE characters and return the number
> +    of characters written, or 0 if it would exceed MAXSIZE.  */
> +    size_t strftime (char * __s, size_t __maxsize, const char * __format,
> +                     const struct tm * __tp);
> +
> +    /* Parse S according to FORMAT and store binary time information in TP.
> +    The return value is a pointer to the first unparsed character in S.  */
> +    char *strptime (const char * __s, const char * __fmt, struct tm *__tp);
> +
> +    /* Return the `struct tm' representation of *TIMER in UTC,
> +    using *TP to store the result.  */
> +    struct tm *gmtime_r (const time_t * __timer, struct tm * __tp);
> +
> +    /* Return the `struct tm' representation of *TIMER in local time,
> +    using *TP to store the result.  */
> +    struct tm *localtime_r (const time_t * __timer, struct tm * __tp);
> +
> +    /* Return a string of the form "Day Mon dd hh:mm:ss yyyy\n"
> +    that is the representation of TP in this format.  */
> +    char *asctime (const struct tm *__tp);
> +
> +    /* Equivalent to `asctime (localtime (timer))'.  */
> +    char *ctime (const time_t *__timer);
> +

8. The amount of platform-specific stuff you got here really bothers me.
For instance, struct tm from /usr/include/x86_64-linux-gnu/bits/types/struct_tm.h.

Is it said anywhere that the struct has only these members and only in this
order? The standard says "includes at least the following members". Not
only them and no specific order.

I would probably go with Lua C implementation if I would start this
module. Too strong dependecy on C. Also I wouldn't be so sure FFI is faster than
Lua C: https://github.com/tarantool/tarantool/issues/5896.

> +]]
> +
> +local native = ffi.C

9. You already have cdt in the beginning of the file.

> +
> +local SECS_PER_DAY     = 86400
> +local NANOS_PER_SEC    = 1000000000LL
> +
> +-- c-dt/dt_config.h
> +
> +-- Unix, January 1, 1970, Thursday
> +local DT_EPOCH_1970_OFFSET = 719163LL

10. I remember there was a good question somewhere - how are these
datetimes supposed to store dates before 1970? For example, a
database of historical events or documents.

Can this library store dates before Christ?

> +
> +
> +local datetime_t = ffi.typeof('struct datetime_t')
> +local interval_t = ffi.typeof('struct datetime_interval_t')
> +
> +local function interval_new()
> +    local interval = ffi.new(interval_t)
> +    return interval
> +end
> +
> +local function adjusted_secs(dt)
> +    return dt.secs - dt.offset * 60
> +end
> +
> +local function datetime_sub(lhs, rhs)
> +    local s1 = adjusted_secs(lhs)
> +    local s2 = adjusted_secs(rhs)

11. Did you think about storing the time in a unified
format with offset already applied and add the offset only
when you print the time or convert to something? It seems
these `dt.offset * 60` might be expensive on each arith
operation done again and again.

> +    local d = interval_new()
> +    d.secs = s2 - s1
> +    d.nsec = rhs.nsec - lhs.nsec
> +    if d.nsec < 0 then

12. AFAIK, in Lua index operator '.' is costly.
You might want to cache its values like this:

	local secs = s2 - s1
	local nsec = rhs.nsec - lhs.nsec
	if nsec < 0 then
		d.secs = secs - 1
		d.nsec = nsec + NANOS_PER_SEC
	else
		d.secs = secs
		d.nsec = nsec
	end

The same in the other places where index operation
is repeated more than once: datetime_lt, datetime_le,
etc.

> +        d.secs = d.secs - 1
> +        d.nsec = d.nsec + NANOS_PER_SEC
> +    end
> +    return d
> +end
> +
> +local function datetime_add(lhs, rhs)
> +    local s1 = adjusted_secs(lhs)
> +    local s2 = adjusted_secs(rhs)
> +    local d = interval_new()
> +    d.secs = s2 + s1
> +    d.nsec = rhs.nsec + lhs.nsec
> +    if d.nsec >= NANOS_PER_SEC then
> +        d.secs = d.secs + 1
> +        d.nsec = d.nsec - NANOS_PER_SEC
> +    end
> +    return d
> +end
> +
> +local function datetime_eq(lhs, rhs)
> +    -- we usually don't need to check nullness
> +    -- but older tarantool console will call us checking for equality to nil
> +    if rhs == nil then
> +        return false
> +    end
> +    return (lhs.secs == rhs.secs) and (lhs.nsec == rhs.nsec)

13. You do not need () around the comparison operators. The same in all places
below. Also you could make it shorter:

	return rhs ~= nil and lhs.secs == rhs.secs and lhs.nsec == rhs.nsec

14. Why don't you take 'offset' into account?

> +end
> +
> +
> +local function datetime_lt(lhs, rhs)
> +    return (lhs.secs < rhs.secs) or
> +           (lhs.secs == rhs.secs and lhs.nsec < rhs.nsec)
> +end
> +
> +local function datetime_le(lhs, rhs)
> +    return (lhs.secs <= rhs.secs) or
> +           (lhs.secs == rhs.secs and lhs.nsec <= rhs.nsec)
> +end
> +
> +local function datetime_serialize(self)
> +    -- Allow YAML, MsgPack and JSON to dump objects with sockets
> +    return { secs = self.secs, nsec = self.nsec, tz = self.offset }

15. Why is 'offset' serialized as 'tz' and not just 'offset'?

> +end
> +
> +local function interval_serialize(self)
> +    -- Allow YAML and JSON to dump objects with sockets
> +    return { secs = self.secs, nsec = self.nsec }
> +end
> +
> +local datetime_mt = {
> +    -- __tostring = datetime_tostring,

16. Why is it commented out? Is it even tested then?

> +    __serialize = datetime_serialize,
> +    __eq = datetime_eq,
> +    __lt = datetime_lt,
> +    __le = datetime_le,
> +    __sub = datetime_sub,
> +    __add = datetime_add,
> +
> +    nanoseconds = function(self)
> +        return tonumber(self.secs*NANOS_PER_SEC + self.nsec)
> +    end,

17. Please, separate function definitions with an empty line
between them.

> +    microseconds = function(self)
> +        return tonumber(self.secs*1e6 + self.nsec*1e3)

18. Please, add whitespaces around binary operators.

19. Why do you do tonumber() for all results? You are loosing
precision for big values. The same in all the places where you
use tonumber().

> +    end,
> +    seconds = function(self)
> +        return tonumber(self.secs + self.nsec*1e3)
> +    end,
> +    minutes = function(self)
> +        return tonumber((self._ticks/(1e6*60))%60)
> +    end,
> +    hours = function(self)
> +        return tonumber(self._ticks/(1e6*60*60))

20. I don't see _ticks defined anywhere in our source code in
any of the files. What is it? Is it tested?

> +    end,
> +

21. Extra empty line.

> +}
> +
> +local interval_mt = {
> +    -- __tostring = interval_tostring,

22. Ditto. Why is it commented out and not tested?

> +    __serialize = interval_serialize,
> +    __eq = datetime_eq,
> +    __lt = datetime_lt,
> +    __le = datetime_le,
> +}
> +
> +local function datetime_new_raw(secs, nsec, offset)
> +    local dt_obj = ffi.new(datetime_t)
> +    dt_obj.secs = secs
> +    dt_obj.nsec = nsec
> +    dt_obj.offset = offset
> +    return dt_obj
> +end
> +
> +local function local_rd(o)

23. What is 'rd' and 'dt'?

> +    return math.floor(tonumber(o.secs / SECS_PER_DAY)) + DT_EPOCH_1970_OFFSET
> +end
> +
> +local function local_dt(o)
> +    return cdt.dt_from_rdn(local_rd(o))
> +end
> +
> +local function mk_timestamp(dt, sp, fp, offset)
> +    local epochV = dt ~= nil and (cdt.dt_rdn(dt) - DT_EPOCH_1970_OFFSET) * SECS_PER_DAY or 0
> +    local spV = sp ~= nil and sp or 0
> +    local fpV = fp ~= nil and fp or 0
> +    local ofsV = offset ~= nil and offset or 0
> +    return datetime_new_raw (epochV + spV - ofsV * 60, fpV, ofsV)
> +end
> +
> +-- create @datetime_t given object @o fields
> +local function datetime_new(o)

24. Why such a wierd name 'o'? It looks like 0, super
confusing.

> +    if o == nil then
> +        return datetime_new_raw(0, 0, 0)
> +    end
> +    local secs = 0
> +    local nsec = 0
> +    local offset = 0
> +    local easy_way = false
> +    local y, M, d, ymd
> +    y, M, d, ymd = 0, 0, 0, false
> +
> +    local h, m, s, frac, hms
> +    h, m, s, frac, hms = 0, 0, 0, 0, false

25. Please, try to define the variables on their own lines.
Like you did a few lines above. Otherwise it is practically
unreadable.

> +
> +    local dt = 0
> +
> +    for key, value in pairs(o) do
> +        local handlers = {
> +            secs = function(v)
> +                secs = v
> +                easy_way = true

26. I don't understand what 'easy way' means. Please, elaborate
what you are trying to achieve.

> +            end,

27. The usage of closures here might render all your FFI efforts
useless, killing the performance. Please, try to define all
methods of all objects only once in the root namespace of the
file. Closure usage might be justified only for rarely created
long living heavy objects like netbox.

> +
> +            nsec = function(v)
> +                nsec = v
> +                easy_way = true
> +            end,
> +
> +            offset = function (v)
> +                offset = v
> +                easy_way = true
> +            end,
> +
> +            year = function(v)
> +                assert(v > 0 and v < 10000)
> +                y = v
> +                ymd = true
> +            end,
> +
> +            month = function(v)
> +                assert(v > 0 and v < 12 )
> +                M = v
> +                ymd = true
> +            end,
> +
> +            day = function(v)
> +                assert(v > 0 and v < 32)
> +                d = v
> +                ymd = true
> +            end,
> +
> +            hour = function(v)
> +                assert(v >= 0 and v < 24)
> +                h = v
> +                hms = true
> +            end,
> +
> +            minute = function(v)
> +                assert(v >= 0 and v < 60)
> +                m = v
> +                hms = true
> +            end,
> +
> +            second = function(v)
> +                assert(v >= 0 and v < 61)
> +                frac = v % 1
> +                if frac then
> +                    s = v - (v % 1)
> +                else
> +                    s = v
> +                end
> +                hms = true
> +            end,
> +
> +            -- tz offset in minutes
> +            tz = function(v)
> +                assert(v >= 0 and v <= 720)
> +                offset = v
> +            end
> +        }
> +        handlers[key](value)
> +    end
> +
> +    -- .sec, .nsec, .offset
> +    if easy_way then
> +        return datetime_new_raw(secs, nsec, offset)
> +    end
> +
> +    -- .year, .month, .day
> +    if ymd then
> +        dt = dt + cdt.dt_from_ymd(y, M, d)
> +    end
> +
> +    -- .hour, .minute, .second
> +    if hms then
> +        secs = h * 3600 + m * 60 + s
> +    end
> +
> +    return mk_timestamp(dt, secs, frac, offset)
> +end
> +
> +
> +-- simple parse functions:
> +-- parse_date/parse_time/parse_zone

28. I wouldn't be so sure they are 'simple' and I can read
their names below anyway. This makes this comment impractical.

> +
> +--[[
> +    Basic      Extended
> +    20121224   2012-12-24   Calendar date   (ISO 8601)
> +    2012359    2012-359     Ordinal date    (ISO 8601)
> +    2012W521   2012-W52-1   Week date       (ISO 8601)
> +    2012Q485   2012-Q4-85   Quarter date
> +]]
> +
> +local function parse_date(str)
> +    local dt = ffi.new('dt_t[1]')
> +    local rc = cdt.dt_parse_iso_date(str, #str, dt)
> +    assert(rc > 0)

29. Er ... And what if I pass a string in a wrong format? Users
should not get assertions.

	tarantool> datetime.parse_date('121r1 r1r 13 13 13')
	---
	- error: 'builtin/datetime.lua:397: assertion failed!'
	...

This shall not ever happen. Would you mind implementing proper
error handling. The same in all the places where the errors are
ignored.

> +    return mk_timestamp(dt[0])
> +end
> +
> +--[[
> +    Basic               Extended
> +    T12                 N/A
> +    T1230               T12:30
> +    T123045             T12:30:45
> +    T123045.123456789   T12:30:45.123456789
> +    T123045,123456789   T12:30:45,123456789
> +
> +    The time designator [T] may be omitted.
> +]]
> +local function parse_time(str)
> +    local sp = ffi.new('int[1]')
> +    local fp = ffi.new('int[1]')
> +    local rc = cdt.dt_parse_iso_time(str, #str, sp, fp)
> +    assert(rc > 0)
> +    return mk_timestamp(nil, sp[0], fp[0])
> +end
> +
> +--[[
> +    Basic    Extended
> +    Z        N/A
> +    ±hh      N/A
> +    ±hhmm    ±hh:mm

30. I would recommend to avoid non-ASCII symbls in the code. I
remember I used unicode table borders to build nice schemas
but people were complaining it ruins their terminal's output.

> +]]
> +local function parse_zone(str)
> +    local offset = ffi.new('int[1]')
> +    local len = cdt.dt_parse_iso_zone_lenient(str, #str, offset)
> +    return len > 0 and mk_timestamp(nil, nil, nil, offset[0]) or nil, tonumber(len)

31. Please, keep 80 symbols border.

I am going to stop here, will return later.


More information about the Tarantool-patches mailing list