[Tarantool-patches] [PATCH resend v2 02/11] lua: built-in module datetime

Timur Safin tsafin at tarantool.org
Fri Jul 30 18:39:25 MSK 2021


: From: Vladislav Shpilevoy <v.shpilevoy at tarantool.org>
: Subject: Re: [PATCH resend v2 02/11] lua: built-in module datetime
: 
: Thanks for the patch!
: 
: This is a bad question to ask now, but why did you choose a dead library
: from 2015 with just 25 stars, without any tests, and without any README as
: a basic block for our datetime types? Is this the correct link I am
: using - https://github.com/chansen/c-dt?

Good you asked :) That was data-driven decision and you could see details here
https://github.com/tsafin/bench-timestamps

In short c-dt is as almost as feature-rich as ICU or CCTZ, but much, much 
faster (and we know how to make it even faster if we have to).
Originally I've learned about c-dt from perl' perl5-time-moment mentioned by Mons
(which module was the fastest date/time implementation in Perl ecosystem).

: 
: See 31 comment below.
: 
: On 28.07.2021 12:34, Timur Safin wrote:
: > * created a new Tarantool built-in module `datetime`;
: > * register cdef types for this module;
: > * export some `dt_*` functions from `c-dt` library;
: > * lua implementationis of `asctime` and `strftime`;
: 
: 1. implementationis -> implementations

: 
: > * datetime parsing unit tests, with and withput timezones;
: 
: 2. withput -> without

Thanks!

: 
: > diff --git a/src/exports.h b/src/exports.h
: > index 5bb3e6a2b..db40c03a4 100644
: > --- a/src/exports.h
: > +++ b/src/exports.h
: > @@ -531,3 +531,24 @@ EXPORT(uri_format)
: >  EXPORT(uri_parse)
: >  EXPORT(uuid_nil)
: >  EXPORT(uuid_unpack)
: > +EXPORT(dt_from_rdn)
: > +EXPORT(dt_from_yd)
: > +EXPORT(dt_from_ymd)
: > +EXPORT(dt_from_yqd)
: > +EXPORT(dt_from_ywd)
: > +EXPORT(dt_to_yd)
: > +EXPORT(dt_to_ymd)
: > +EXPORT(dt_to_yqd)
: > +EXPORT(dt_to_ywd)
: > +EXPORT(dt_rdn)
: > +EXPORT(dt_dow)
: > +EXPORT(dt_parse_iso_date)
: > +EXPORT(dt_parse_iso_time)
: > +EXPORT(dt_parse_iso_time_basic)
: > +EXPORT(dt_parse_iso_time_extended)
: > +EXPORT(dt_parse_iso_zone)
: > +EXPORT(dt_parse_iso_zone_basic)
: > +EXPORT(dt_parse_iso_zone_extended)
: > +EXPORT(dt_parse_iso_zone_lenient)
: > +EXPORT(dt_from_struct_tm)
: > +EXPORT(dt_to_struct_tm)
: 
: 3. Please, read the comment in the beginning of the file. The
: first sentence would be:
: 
: 	Keep the symbols sorted by name for search and addition
: 	simplicity, to avoid duplicates.

Yes, I knew it, but forgot. 
BTW, when I tried to sort the whole block I have discovered that
there are multiple of symbols, which are already in incorrect order
but I decided to not touch irrelevant chunks, and only sorted d*
See in the next iteration...
: 
: > diff --git a/src/lib/core/datetime.h b/src/lib/core/datetime.h
: > new file mode 100644
: > index 000000000..403bf1c64
: > --- /dev/null
: > +++ b/src/lib/core/datetime.h
: > @@ -0,0 +1,61 @@
: 
: <...>
: 
: > +#include <c-dt/dt_core.h>
: 
: 4. Please, use "" instead of <> for non-system headers.

Ok.

: 
: > +#include <stdint.h>
: > +#include <stdbool.h>
: > +
: > +#if defined(__cplusplus)
: > +extern "C" {
: > +#endif /* defined(__cplusplus) */
: > +
: > +/**
: > + * datetime structure consisting of:
: 
: 5. Well, this comment isn't really helpful. Also we do not
: use _t suffix ever except for typedefs. Please, drop both
: these things.
: 
: The same for datetime_interval_t.

Will rename to `struct datetime`, and `struct datetime_interval` 
Accordingly.

: 
: > + */
: > +struct datetime_t {
: > +	int64_t secs;	///< seconds since epoch
: > +	int32_t nsec;	///< nanoseconds if any
: > +	int32_t offset; ///< offset in minutes from GMT
: 
: 6. We never use // nor /// nor ///< nor we write comments
: on the same line as the code (except for old legacy code we
: inherited from sqlite and some super old tarantool code.

BTW, why? Single line comment // is the legal C construction 
since C99 (and always have been acceptable by Gnu C and major 
other compilers). 

Also did you know that ///< is doxygen construction for 
documentation of a struct member? (Asking, just to make sure)

: 
: The same for datetime_interval_t.
: 
: > +};
: > +
: > +/**
: > + * Date/time delta structure
: > + */
: > +struct datetime_interval_t {
: > +	int64_t secs; ///< relative seconds delta
: > +	int32_t nsec; ///< nanoseconds delta
: > +};
: > +
: 
: 7. Why do you need this file? It is not included anywhere.
: And you don't need to define the structs in C if you are
: using them in Lua only. You can just define them in Lua
: using ffi.cdef like it is done in some other places.

I'll need to use it in C very soon, once I get to SQL parser
INTERVAL 'xx' support.

: 
: > diff --git a/src/lua/datetime.lua b/src/lua/datetime.lua
: > new file mode 100644
: > index 000000000..0996ca5a2
: > --- /dev/null
: > +++ b/src/lua/datetime.lua
: > @@ -0,0 +1,581 @@
: > +local ffi = require('ffi')
: > +local cdt = ffi.C
: > +
: > +ffi.cdef [[
: > +
: > +    typedef int dt_t;
: > +
: > +    // dt_core.h
: > +    typedef enum {
: > +        DT_MON       = 1,
: > +        DT_MONDAY    = 1,
: > +        DT_TUE       = 2,
: > +        DT_TUESDAY   = 2,
: > +        DT_WED       = 3,
: > +        DT_WEDNESDAY = 3,
: > +        DT_THU       = 4,
: > +        DT_THURSDAY  = 4,
: > +        DT_FRI       = 5,
: > +        DT_FRIDAY    = 5,
: > +        DT_SAT       = 6,
: > +        DT_SATURDAY  = 6,
: > +        DT_SUN       = 7,
: > +        DT_SUNDAY    = 7,
: > +    } dt_dow_t;
: > +
: > +    dt_t     dt_from_rdn     (int n);
: > +    dt_t     dt_from_yd      (int y, int d);
: > +    dt_t     dt_from_ymd     (int y, int m, int d);
: > +    dt_t     dt_from_yqd     (int y, int q, int d);
: > +    dt_t     dt_from_ywd     (int y, int w, int d);
: > +
: > +    void     dt_to_yd        (dt_t dt, int *y, int *d);
: > +    void     dt_to_ymd       (dt_t dt, int *y, int *m, int *d);
: > +    void     dt_to_yqd       (dt_t dt, int *y, int *q, int *d);
: > +    void     dt_to_ywd       (dt_t dt, int *y, int *w, int *d);
: > +
: > +    int      dt_rdn          (dt_t dt);
: > +    dt_dow_t dt_dow          (dt_t dt);
: > +
: > +    // dt_parse_iso.h
: > +    size_t dt_parse_iso_date          (const char *str, size_t len, dt_t
: *dt);
: > +
: > +    size_t dt_parse_iso_time          (const char *str, size_t len, int
: *sod, int *nsec);
: > +    size_t dt_parse_iso_time_basic    (const char *str, size_t len, int
: *sod, int *nsec);
: > +    size_t dt_parse_iso_time_extended (const char *str, size_t len, int
: *sod, int *nsec);
: > +
: > +    size_t dt_parse_iso_zone          (const char *str, size_t len, int
: *offset);
: > +    size_t dt_parse_iso_zone_basic    (const char *str, size_t len, int
: *offset);
: > +    size_t dt_parse_iso_zone_extended (const char *str, size_t len, int
: *offset);
: > +    size_t dt_parse_iso_zone_lenient  (const char *str, size_t len, int
: *offset);
: > +
: > +    // dt_tm.h
: > +    dt_t    dt_from_struct_tm  (const struct tm *tm);
: > +    void    dt_to_struct_tm    (dt_t dt, struct tm *tm);
: > +
: > +    // <asm-generic/posix_types.h>
: > +    typedef long            __kernel_long_t;
: > +    typedef unsigned long   __kernel_ulong_t;
: > +    // /usr/include/x86_64-linux-gnu/bits/types/time_t.h
: > +    typedef long            time_t;
: > +
: > +
: > +    // <time.h>
: > +    typedef __kernel_long_t	__kernel_time_t;
: > +    typedef __kernel_long_t	__kernel_suseconds_t;
: > +
: > +    struct timespec {
: > +        __kernel_time_t	        tv_sec;     /* seconds */
: > +        long                    tv_nsec;    /* nanoseconds */
: > +    };
: > +
: > +    struct timeval {
: > +        __kernel_time_t	        tv_sec;	    /* seconds */
: > +        __kernel_suseconds_t    tv_usec;    /* microseconds */
: > +    };
: > +
: > +    struct timezone {
: > +        int	tz_minuteswest;     /* minutes west of Greenwich */
: > +        int	tz_dsttime;	        /* type of dst correction */
: > +    };
: > +
: > +    // /usr/include/x86_64-linux-gnu/sys/time.h
: > +    typedef struct timezone * __timezone_ptr_t;
: > +
: > +    /* Get the current time of day and timezone information,
: > +       putting it into *TV and *TZ.  If TZ is NULL, *TZ is not filled.
: > +       Returns 0 on success, -1 on errors.
: > +
: > +       NOTE: This form of timezone information is obsolete.
: > +       Use the functions and variables declared in <time.h> instead.  */
: > +    int gettimeofday (struct timeval *__tv, struct timezone * __tz);
: > +
: > +    // /usr/include/x86_64-linux-gnu/bits/types/struct_tm.h
: > +    /* ISO C `broken-down time' structure.  */
: > +    struct tm
: > +    {
: > +        int tm_sec;	        /* Seconds.	[0-60] (1 leap second) */
: > +        int tm_min;	        /* Minutes.	[0-59] */
: > +        int tm_hour;        /* Hours.	[0-23] */
: > +        int tm_mday;        /* Day.		[1-31] */
: > +        int tm_mon;	        /* Month.	[0-11] */
: > +        int tm_year;        /* Year	- 1900.  */
: > +        int tm_wday;        /* Day of week.	[0-6] */
: > +        int tm_yday;        /* Days in year.[0-365]	*/
: > +        int tm_isdst;       /* DST.		[-1/0/1]*/
: > +
: > +        long int tm_gmtoff; /* Seconds east of UTC.  */
: > +        const char *tm_zone;/* Timezone abbreviation.  */
: > +    };
: > +
: > +    // <time.h>
: > +    /* Return the current time and put it in *TIMER if TIMER is not NULL.
: */
: > +    time_t time (time_t *__timer);
: > +
: > +    /* Format TP into S according to FORMAT.
: > +    Write no more than MAXSIZE characters and return the number
: > +    of characters written, or 0 if it would exceed MAXSIZE.  */
: > +    size_t strftime (char * __s, size_t __maxsize, const char * __format,
: > +                     const struct tm * __tp);
: > +
: > +    /* Parse S according to FORMAT and store binary time information in
: TP.
: > +    The return value is a pointer to the first unparsed character in S.
: */
: > +    char *strptime (const char * __s, const char * __fmt, struct tm
: *__tp);
: > +
: > +    /* Return the `struct tm' representation of *TIMER in UTC,
: > +    using *TP to store the result.  */
: > +    struct tm *gmtime_r (const time_t * __timer, struct tm * __tp);
: > +
: > +    /* Return the `struct tm' representation of *TIMER in local time,
: > +    using *TP to store the result.  */
: > +    struct tm *localtime_r (const time_t * __timer, struct tm * __tp);
: > +
: > +    /* Return a string of the form "Day Mon dd hh:mm:ss yyyy\n"
: > +    that is the representation of TP in this format.  */
: > +    char *asctime (const struct tm *__tp);
: > +
: > +    /* Equivalent to `asctime (localtime (timer))'.  */
: > +    char *ctime (const time_t *__timer);
: > +
: 
: 8. The amount of platform-specific stuff you got here really bothers me.
: For instance, struct tm from /usr/include/x86_64-linux-
: gnu/bits/types/struct_tm.h.
: 
: Is it said anywhere that the struct has only these members and only in this
: order? The standard says "includes at least the following members". Not
: only them and no specific order.

Agreed that despite the fact that `struct tm` is defined pretty similarly 
in POSIX/SUSV https://pubs.opengroup.org/onlinepubs/009696699/basedefs/time.h.html
actual layout of (at least last fields tm_gmtoff and tm_zone) might vary
significantly. And we better hide this layout specific code (i.e. allocation
and manipulations with fields) into accompanying C code. 

: 
: I would probably go with Lua C implementation if I would start this
: module. Too strong dependecy on C. Also I wouldn't be so sure FFI is faster
: than
: Lua C: https://github.com/tarantool/tarantool/issues/5896.

As you know - who is faster is complicated question. There are some
subtle bug in LuaJIT with structure allocations which I believe was the
major contributor to slowness reported in Decimal branch 
https://github.com/tarantool/tarantool/issues/692#issuecomment-505043929 
but which has been narrowed down by Sergos and Mike has already committed
patch to upstream in https://github.com/LuaJIT/LuaJIT/commit/ad65934fa08a65bfb0eb9528731a4394842cc173
but, I believe, it's not yet in our LuaJIT fork. 

I'd probably postpone total redesign from ffi to LuaC of this 1st version
of module (because performance is not yet problem, functionality is), but
rather wait and see how things would change once this struct-of-NYI would
be landed to Tarantool code. 
: 
: > +]]
: > +
: > +local native = ffi.C
: 
: 9. You already have cdt in the beginning of the file.

Yes, thanks

: 
: > +
: > +local SECS_PER_DAY     = 86400
: > +local NANOS_PER_SEC    = 1000000000LL
: > +
: > +-- c-dt/dt_config.h
: > +
: > +-- Unix, January 1, 1970, Thursday
: > +local DT_EPOCH_1970_OFFSET = 719163LL
: 
: 10. I remember there was a good question somewhere - how are these
: datetimes supposed to store dates before 1970? For example, a
: database of historical events or documents.
: 
: Can this library store dates before Christ?

Yes, please see relevant discussion with Peter https://github.com/tarantool/tarantool/discussions/6244#discussioncomment-1055996
c-dt is properly handling positive and negative dt values, 
where dt is number of dates since Rata Die, January 1, 0001.
For better MessagePack compactness of our typical data we shift beginning
of our 0 to the Unix Epoch (1970-01-01), and our 0 is actually 
dt=719163.
So our epoch-based second 0 would become dt value (719163) via this simple
formula (secs / 86400) + 719163. Where 719163 is an offset of Unix Epoch
(1970-01-01) since Rata Die (0001-01-01) in dates.

We might consider to further shift, though, our relative 0 
to any date in time like 2000-01-01, whatever, if we would decide that
that different base would provide us more compact storage. But I believe
unix-epoch date, from statistical point of view, is more convenient for 
storing and using at runtime.

: 
: > +
: > +
: > +local datetime_t = ffi.typeof('struct datetime_t')
: > +local interval_t = ffi.typeof('struct datetime_interval_t')
: > +
: > +local function interval_new()
: > +    local interval = ffi.new(interval_t)
: > +    return interval
: > +end
: > +
: > +local function adjusted_secs(dt)
: > +    return dt.secs - dt.offset * 60
: > +end
: > +
: > +local function datetime_sub(lhs, rhs)
: > +    local s1 = adjusted_secs(lhs)
: > +    local s2 = adjusted_secs(rhs)
: 
: 11. Did you think about storing the time in a unified
: format with offset already applied and add the offset only
: when you print the time or convert to something? It seems
: these `dt.offset * 60` might be expensive on each arith
: operation done again and again.

Exactly! That was bug in the early patches, which has been fixed 
in later version of this code. Sorry for the confusion created. 
Yes, secs are UC normalized, and offset if nominal (used only
For output).

: 
: > +    local d = interval_new()
: > +    d.secs = s2 - s1
: > +    d.nsec = rhs.nsec - lhs.nsec
: > +    if d.nsec < 0 then
: 
: 12. AFAIK, in Lua index operator '.' is costly.
: You might want to cache its values like this:
: 
: 	local secs = s2 - s1
: 	local nsec = rhs.nsec - lhs.nsec
: 	if nsec < 0 then
: 		d.secs = secs - 1
: 		d.nsec = nsec + NANOS_PER_SEC
: 	else
: 		d.secs = secs
: 		d.nsec = nsec
: 	end
: 
: The same in the other places where index operation
: is repeated more than once: datetime_lt, datetime_le,
: etc.

Thanks, I'll review all relevant places and extract
read references to locals.

: 
: > +        d.secs = d.secs - 1
: > +        d.nsec = d.nsec + NANOS_PER_SEC
: > +    end
: > +    return d
: > +end
: > +
: > +local function datetime_add(lhs, rhs)
: > +    local s1 = adjusted_secs(lhs)
: > +    local s2 = adjusted_secs(rhs)
: > +    local d = interval_new()
: > +    d.secs = s2 + s1
: > +    d.nsec = rhs.nsec + lhs.nsec
: > +    if d.nsec >= NANOS_PER_SEC then
: > +        d.secs = d.secs + 1
: > +        d.nsec = d.nsec - NANOS_PER_SEC
: > +    end
: > +    return d
: > +end
: > +
: > +local function datetime_eq(lhs, rhs)
: > +    -- we usually don't need to check nullness
: > +    -- but older tarantool console will call us checking for equality to
: nil
: > +    if rhs == nil then
: > +        return false
: > +    end
: > +    return (lhs.secs == rhs.secs) and (lhs.nsec == rhs.nsec)
: 
: 13. You do not need () around the comparison operators. The same in all
: places
: below. Also you could make it shorter:
: 
: 	return rhs ~= nil and lhs.secs == rhs.secs and lhs.nsec == rhs.nsec

Nice! Thanks!

: 
: 14. Why don't you take 'offset' into account?

Because it's not needed due to UTC-normalization. Sorry for the confusion 
with adjusted_sescs function. 

: 
: > +end
: > +
: > +
: > +local function datetime_lt(lhs, rhs)
: > +    return (lhs.secs < rhs.secs) or
: > +           (lhs.secs == rhs.secs and lhs.nsec < rhs.nsec)
: > +end
: > +
: > +local function datetime_le(lhs, rhs)
: > +    return (lhs.secs <= rhs.secs) or
: > +           (lhs.secs == rhs.secs and lhs.nsec <= rhs.nsec)
: > +end
: > +
: > +local function datetime_serialize(self)
: > +    -- Allow YAML, MsgPack and JSON to dump objects with sockets
: > +    return { secs = self.secs, nsec = self.nsec, tz = self.offset }
: 
: 15. Why is 'offset' serialized as 'tz' and not just 'offset'?

Yes, now you've asked it and it's looking confusing after rename.

: 
: > +end
: > +
: > +local function interval_serialize(self)
: > +    -- Allow YAML and JSON to dump objects with sockets
: > +    return { secs = self.secs, nsec = self.nsec }
: > +end
: > +
: > +local datetime_mt = {
: > +    -- __tostring = datetime_tostring,
: 
: 16. Why is it commented out? Is it even tested then?

It's implemented in the later patch in patchset series.

: 
: > +    __serialize = datetime_serialize,
: > +    __eq = datetime_eq,
: > +    __lt = datetime_lt,
: > +    __le = datetime_le,
: > +    __sub = datetime_sub,
: > +    __add = datetime_add,
: > +
: > +    nanoseconds = function(self)
: > +        return tonumber(self.secs*NANOS_PER_SEC + self.nsec)
: > +    end,
: 
: 17. Please, separate function definitions with an empty line
: between them.

Done! (Will be pushed later) 

: 
: > +    microseconds = function(self)
: > +        return tonumber(self.secs*1e6 + self.nsec*1e3)
: 
: 18. Please, add whitespaces around binary operators.

Done! It was already such in the later commits which were part
of in patchset. [Yes, apparently I need to squash few more commits,
as Oleg Babin has asked elsewhere]

: 
: 19. Why do you do tonumber() for all results? You are loosing
: precision for big values. The same in all the places where you
: use tonumber().

To be able to use it in places which expect numbers like %d in
string.format. Though at the end of day I've came up to the version
where I convert to Lua numbers not all (potentially fraction point) 
values, but with exception of nanoseconds / microseconds / milliseconds
(which return integer values). 

: 
: > +    end,
: > +    seconds = function(self)
: > +        return tonumber(self.secs + self.nsec*1e3)
: > +    end,
: > +    minutes = function(self)
: > +        return tonumber((self._ticks/(1e6*60))%60)
: > +    end,
: > +    hours = function(self)
: > +        return tonumber(self._ticks/(1e6*60*60))
: 
: 20. I don't see _ticks defined anywhere in our source code in
: any of the files. What is it? Is it tested?

Yes, that was miscopy from elsewhere. Fixed in newer commits. 
Sorry for the confusion.

: 
: > +    end,
: > +
: 
: 21. Extra empty line.

Yup, but later, well, you know ...

: 
: > +}
: > +
: > +local interval_mt = {
: > +    -- __tostring = interval_tostring,
: 
: 22. Ditto. Why is it commented out and not tested?

Tostring conversion was implemented later in a few 
commits. One for datetime stringization, and another 
one for interval.

: 
: > +    __serialize = interval_serialize,
: > +    __eq = datetime_eq,
: > +    __lt = datetime_lt,
: > +    __le = datetime_le,
: > +}
: > +
: > +local function datetime_new_raw(secs, nsec, offset)
: > +    local dt_obj = ffi.new(datetime_t)
: > +    dt_obj.secs = secs
: > +    dt_obj.nsec = nsec
: > +    dt_obj.offset = offset
: > +    return dt_obj
: > +end
: > +
: > +local function local_rd(o)
: 
: 23. What is 'rd' and 'dt'?

`dt` is type - data type from c-dt (alias to signed integer).
Which shows a number of dates since Rata Die date (0001-01-01).
It's signed 32-bit value, thus could represent huge range before 
0001-01-01 as well. 

: 
: > +    return math.floor(tonumber(o.secs / SECS_PER_DAY)) +
: DT_EPOCH_1970_OFFSET
: > +end
: > +
: > +local function local_dt(o)
: > +    return cdt.dt_from_rdn(local_rd(o))
: > +end
: > +
: > +local function mk_timestamp(dt, sp, fp, offset)
: > +    local epochV = dt ~= nil and (cdt.dt_rdn(dt) - DT_EPOCH_1970_OFFSET)
: * SECS_PER_DAY or 0
: > +    local spV = sp ~= nil and sp or 0
: > +    local fpV = fp ~= nil and fp or 0
: > +    local ofsV = offset ~= nil and offset or 0
: > +    return datetime_new_raw (epochV + spV - ofsV * 60, fpV, ofsV)
: > +end
: > +
: > +-- create @datetime_t given object @o fields
: > +local function datetime_new(o)
: 
: 24. Why such a wierd name 'o'? It looks like 0, super
: confusing.

o means object (with attributes) here. Will make it look 
longer, i.e. obj.

: 
: > +    if o == nil then
: > +        return datetime_new_raw(0, 0, 0)
: > +    end
: > +    local secs = 0
: > +    local nsec = 0
: > +    local offset = 0
: > +    local easy_way = false
: > +    local y, M, d, ymd
: > +    y, M, d, ymd = 0, 0, 0, false
: > +
: > +    local h, m, s, frac, hms
: > +    h, m, s, frac, hms = 0, 0, 0, 0, false
: 
: 25. Please, try to define the variables on their own lines.
: Like you did a few lines above. Otherwise it is practically
: unreadable.

Ok

: 
: > +
: > +    local dt = 0
: > +
: > +    for key, value in pairs(o) do
: > +        local handlers = {
: > +            secs = function(v)
: > +                secs = v
: > +                easy_way = true
: 
: 26. I don't understand what 'easy way' means. Please, elaborate
: what you are trying to achieve.

We could create datetime object 2 ways:
- using raw attributes, without any conversion - i.e. secs, nsec, offset;
- or using human-friendly year/month/day/hour/minutes/seconds;

The easy_way is using raw mode - defining .secs, .nsec, .offset, but using
The same object constructor interface. i.e.

	date = require 'datetime'
	T = date{ secs = 0, nsec = 0, offset = 0}

Will create datetime object representing 1970-01-01T00:00Z moment.

: 
: > +            end,
: 
: 27. The usage of closures here might render all your FFI efforts
: useless, killing the performance. Please, try to define all
: methods of all objects only once in the root namespace of the
: file. Closure usage might be justified only for rarely created
: long living heavy objects like netbox.

Agreed that creating closures in each iteration of a loop was
bad idea (as Oleg Babin has already pin-pointed it elsewhere).
And we have to ate least to remove loop-invariant outside of cycle.
But at the moment I do not consider the code creating object 
as very performance critical path, because I do not expect this
path used frequently, more like after parsing of ido-8601 literals,
which uses raw object construction approach.

[I'll bench though the difference between using invariant closures 
and multiple ifs to see the size of a problem]

: 
: > +
: > +            nsec = function(v)
: > +                nsec = v
: > +                easy_way = true
: > +            end,
: > +
: > +            offset = function (v)
: > +                offset = v
: > +                easy_way = true
: > +            end,
: > +
: > +            year = function(v)
: > +                assert(v > 0 and v < 10000)
: > +                y = v
: > +                ymd = true
: > +            end,
: > +
: > +            month = function(v)
: > +                assert(v > 0 and v < 12 )
: > +                M = v
: > +                ymd = true
: > +            end,
: > +
: > +            day = function(v)
: > +                assert(v > 0 and v < 32)
: > +                d = v
: > +                ymd = true
: > +            end,
: > +
: > +            hour = function(v)
: > +                assert(v >= 0 and v < 24)
: > +                h = v
: > +                hms = true
: > +            end,
: > +
: > +            minute = function(v)
: > +                assert(v >= 0 and v < 60)
: > +                m = v
: > +                hms = true
: > +            end,
: > +
: > +            second = function(v)
: > +                assert(v >= 0 and v < 61)
: > +                frac = v % 1
: > +                if frac then
: > +                    s = v - (v % 1)
: > +                else
: > +                    s = v
: > +                end
: > +                hms = true
: > +            end,
: > +
: > +            -- tz offset in minutes
: > +            tz = function(v)
: > +                assert(v >= 0 and v <= 720)
: > +                offset = v
: > +            end
: > +        }
: > +        handlers[key](value)
: > +    end
: > +
: > +    -- .sec, .nsec, .offset
: > +    if easy_way then
: > +        return datetime_new_raw(secs, nsec, offset)
: > +    end
: > +
: > +    -- .year, .month, .day
: > +    if ymd then
: > +        dt = dt + cdt.dt_from_ymd(y, M, d)
: > +    end
: > +
: > +    -- .hour, .minute, .second
: > +    if hms then
: > +        secs = h * 3600 + m * 60 + s
: > +    end
: > +
: > +    return mk_timestamp(dt, secs, frac, offset)
: > +end
: > +
: > +
: > +-- simple parse functions:
: > +-- parse_date/parse_time/parse_zone
: 
: 28. I wouldn't be so sure they are 'simple' and I can read
: their names below anyway. This makes this comment impractical.

:) ok

: 
: > +
: > +--[[
: > +    Basic      Extended
: > +    20121224   2012-12-24   Calendar date   (ISO 8601)
: > +    2012359    2012-359     Ordinal date    (ISO 8601)
: > +    2012W521   2012-W52-1   Week date       (ISO 8601)
: > +    2012Q485   2012-Q4-85   Quarter date
: > +]]
: > +
: > +local function parse_date(str)
: > +    local dt = ffi.new('dt_t[1]')
: > +    local rc = cdt.dt_parse_iso_date(str, #str, dt)
: > +    assert(rc > 0)
: 
: 29. Er ... And what if I pass a string in a wrong format? Users
: should not get assertions.
: 
: 	tarantool> datetime.parse_date('121r1 r1r 13 13 13')
: 	---
: 	- error: 'builtin/datetime.lua:397: assertion failed!'
: 	...
: 
: This shall not ever happen. Would you mind implementing proper
: error handling. The same in all the places where the errors are
: ignored.


Now, I've reworked approach how they handle errors (you could see
It in later commits in the patch-set), and now it returns tuple - 
- parsed value
- and the number of accepted input characters.

	tarantool> date.parse_date('121r1 r1r 13 13 13')
	---
	- null
	- 0
	...

Sometimes, while parsing composed strings using these partial
methods (parse_date / parse_time / parse_time_zone) we need to
know the number of symbols we have successfully accepted. That's
why there is seconds value in returned tuple.

Please give me know if there is better/more idiomatic approach
used for such scenarios?

: 
: > +    return mk_timestamp(dt[0])
: > +end
: > +
: > +--[[
: > +    Basic               Extended
: > +    T12                 N/A
: > +    T1230               T12:30
: > +    T123045             T12:30:45
: > +    T123045.123456789   T12:30:45.123456789
: > +    T123045,123456789   T12:30:45,123456789
: > +
: > +    The time designator [T] may be omitted.
: > +]]
: > +local function parse_time(str)
: > +    local sp = ffi.new('int[1]')
: > +    local fp = ffi.new('int[1]')
: > +    local rc = cdt.dt_parse_iso_time(str, #str, sp, fp)
: > +    assert(rc > 0)
: > +    return mk_timestamp(nil, sp[0], fp[0])
: > +end
: > +
: > +--[[
: > +    Basic    Extended
: > +    Z        N/A
: > +    ±hh      N/A
: > +    ±hhmm    ±hh:mm
: 
: 30. I would recommend to avoid non-ASCII symbls in the code. I
: remember I used unicode table borders to build nice schemas
: but people were complaining it ruins their terminal's output.

Even if it's properly utf-8 encoded? That's odd. 
Will replace with + and - separately.

: 
: > +]]
: > +local function parse_zone(str)
: > +    local offset = ffi.new('int[1]')
: > +    local len = cdt.dt_parse_iso_zone_lenient(str, #str, offset)
: > +    return len > 0 and mk_timestamp(nil, nil, nil, offset[0]) or nil,
: tonumber(len)
: 
: 31. Please, keep 80 symbols border.

Yup

: 
: I am going to stop here, will return later.

Thanks!
Timur



More information about the Tarantool-patches mailing list