[Tarantool-patches] [PATCH v5 2/8] lua: built-in module datetime

Safin Timur tsafin at tarantool.org
Wed Aug 18 13:03:27 MSK 2021


On 17.08.2021 19:52, Vladimir Davydov via Tarantool-patches wrote:
> On Mon, Aug 16, 2021 at 02:59:36AM +0300, Timur Safin via
> Tarantool-patches wrote:
>> diff --git a/extra/exports b/extra/exports
>> index 9eaba1282..80eb92abd 100644
>> --- a/extra/exports
>> +++ b/extra/exports
>> @@ -148,8 +148,34 @@ csv_feed
>>   csv_iterator_create
>>   csv_next
>>   csv_setopt
>> +datetime_asctime
>> +datetime_ctime
>> +datetime_now
>> +datetime_strftime
>> +decimal_unpack
> 
> decimal_unpack?

Yes, bad copy paste. Supposed to become datetime_unpack. Has been 
changed later in the patchset. But now, after Serge complain, I've 
reshuffled and fixed original patch, not later.

> 
>>   decimal_from_string
>>   decimal_unpack
>> +tnt_dt_dow
>> +tnt_dt_from_rdn
>> +tnt_dt_from_struct_tm
>> +tnt_dt_from_yd
>> +tnt_dt_from_ymd
>> +tnt_dt_from_yqd
>> +tnt_dt_from_ywd
>> +tnt_dt_parse_iso_date
>> +tnt_dt_parse_iso_time_basic
>> +tnt_dt_parse_iso_time_extended
>> +tnt_dt_parse_iso_time
>> +tnt_dt_parse_iso_zone_basic
>> +tnt_dt_parse_iso_zone_extended
>> +tnt_dt_parse_iso_zone_lenient
>> +tnt_dt_parse_iso_zone
>> +tnt_dt_rdn
>> +tnt_dt_to_struct_tm
>> +tnt_dt_to_yd
>> +tnt_dt_to_ymd
>> +tnt_dt_to_yqd
>> +tnt_dt_to_ywd
>>   error_ref
>>   error_set_prev
>>   error_unref

...

>> new file mode 100644
>> index 000000000..c48295a6f
>> --- /dev/null
>> +++ b/src/lib/core/datetime.c
>> @@ -0,0 +1,96 @@
>> +/*
>> + * Copyright 2021, Tarantool AUTHORS, please see AUTHORS file.
>> + *
>> + * Redistribution and use in source and binary forms, with or
>> + * without modification, are permitted provided that the following
>> + * conditions are met:
>> + *
>> + * 1. Redistributions of source code must retain the above
>> + *    copyright notice, this list of conditions and the
>> + *    following disclaimer.
>> + *
>> + * 2. Redistributions in binary form must reproduce the above
>> + *    copyright notice, this list of conditions and the following
>> + *    disclaimer in the documentation and/or other materials
>> + *    provided with the distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
>> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
>> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
>> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
>> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
>> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>> + * SUCH DAMAGE.
>> + */
>> +
>> +#include <string.h>
>> +#include <time.h>
>> +
>> +#include "trivia/util.h"
>> +#include "datetime.h"
>> +
>> +static int
>> +local_dt(int64_t secs)
>> +{
>> +	return dt_from_rdn((int)(secs / SECS_PER_DAY) +
> DT_EPOCH_1970_OFFSET);
>> +}
> 
> I don't understand what this function does. Please add a comment.

Good point. Added comment.

> 
>> +
>> +static struct tm *
>> +datetime_to_tm(const struct datetime *date)
>> +{
>> +	static struct tm tm;
>> +
>> +	memset(&tm, 0, sizeof(tm));
>> +	int64_t secs = date->secs;
>> +	dt_to_struct_tm(local_dt(secs), &tm);
>> +
>> +	int seconds_of_day = (int64_t)date->secs % SECS_PER_DAY;
>> +	tm.tm_hour = (seconds_of_day / 3600) % 24;
>> +	tm.tm_min = (seconds_of_day / 60) % 60;
>> +	tm.tm_sec = seconds_of_day % 60;
> 
> To make the code easier for understanding, please define and use
> constants here and everywhere else in this patch: HOURS_PER_DAY,
> MINUTES_PER_HOUR, NSECS_PER_USEC, etc.
> 
>> +
>> +	return &tm;
>> +}
>> +
>> +void
>> +datetime_now(struct datetime *now)
>> +{
>> +	struct timeval tv;
>> +	gettimeofday(&tv, NULL);
>> +	now->secs = tv.tv_sec;
>> +	now->nsec = tv.tv_usec * 1000;
>> +
>> +	time_t now_seconds;
>> +	time(&now_seconds);
> 
> Can't you use tv.tv_sec here?

Nope. :)

Actually I don't want to know value returned from time() - all I need is 
to eventually call localtime[_r]() to get local timezone. If you know 
simpler way to determine local timezone without time, then please give 
me know.


>> new file mode 100644
>> index 000000000..1a8d7e34f
>> --- /dev/null
>> +++ b/src/lib/core/datetime.h
>> @@ -0,0 +1,95 @@
>> +#pragma once
>> +/*
>> + * Copyright 2021, Tarantool AUTHORS, please see AUTHORS file.
>> + *
>> + * Redistribution and use in source and binary forms, with or
>> + * without modification, are permitted provided that the following
>> + * conditions are met:
>> + *
>> + * 1. Redistributions of source code must retain the above
>> + *    copyright notice, this list of conditions and the
>> + *    following disclaimer.
>> + *
>> + * 2. Redistributions in binary form must reproduce the above
>> + *    copyright notice, this list of conditions and the following
>> + *    disclaimer in the documentation and/or other materials
>> + *    provided with the distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
>> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
>> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
>> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
>> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
>> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>> + * SUCH DAMAGE.
>> + */
>> +
>> +#include <stdint.h>
>> +#include <stdbool.h>
>> +#include <stdio.h>
>> +#include "c-dt/dt.h"
>> +
>> +#if defined(__cplusplus)
>> +extern "C"
>> +{
>> +#endif /* defined(__cplusplus) */
>> +
>> +#ifndef SECS_PER_DAY
>> +#define SECS_PER_DAY          86400
>> +#define DT_EPOCH_1970_OFFSET  719163
> 
> I don't understand what this constant stores. Please add a comment.

I've documented them after Serge request this way:

------------------------------------------------------
/**
  * We count dates since so called "Rata Die" date
  * January 1, 0001, Monday (as Day 1).
  * But datetime structure keeps seconds since
  * Unix "Epoch" date:
  * Unix, January 1, 1970, Thursday
  *
  * The difference between Epoch (1970-01-01)
  * and Rata Die (0001-01-01) is 719163 days.
  */

#ifndef SECS_PER_DAY
#define SECS_PER_DAY          86400
#define DT_EPOCH_1970_OFFSET  719163
#endif

/**
  * c-dt library uses int as type for dt value, which
  * represents the number of days since Rata Die date.
  * This implies limits to the number of seconds we
  * could safely store in our structures and then safely
  * pass to c-dt functions.
  *
  * So supported ranges will be
  * - for seconds [-185604722870400 .. 185480451417600]
  * - for dates   [-5879610-06-22T00:00Z .. 5879611-07-11T00:00Z]
  */
#define MAX_DT_DAY_VALUE (int64_t)INT_MAX
#define MIN_DT_DAY_VALUE (int64_t)INT_MIN
#define SECS_EPOCH_1970_OFFSET 	\
	((int64_t)DT_EPOCH_1970_OFFSET * SECS_PER_DAY)
#define MAX_EPOCH_SECS_VALUE    \
	(MAX_DT_DAY_VALUE * SECS_PER_DAY - SECS_EPOCH_1970_OFFSET)
#define MIN_EPOCH_SECS_VALUE    \
	(MIN_DT_DAY_VALUE * SECS_PER_DAY - SECS_EPOCH_1970_OFFSET)
------------------------------------------------------

This is more than you probably were asking for at the moment, but bear 
with me - we will need those MAX_EPOCH_SECS_VALUE/MIN_EPOCH_SECS_VALUE 
in the messagepack serialization/deserialization code.

> 
>> +#endif
>> +
>> +/**
>> + * Full datetime structure representing moments
>> + * since Unix Epoch (1970-01-01).
>> + * Time is kept normalized to UTC, time-zone offset
>> + * is informative only.
>> + */
>> +struct datetime {
>> +	/** seconds since epoch */
>> +	double secs;
> 
> Please add a comment explaining why you use 'double' instead of
> an integer type.
> 
>> +	/** nanoseconds if any */
>> +	int32_t nsec;
> 
> Why 'nsec', but 'secs'? This is inconsistent. Should be 'nsec' and 'sec'
> or 'nsecs' and 'secs'.

Good point, will rname to use 'secs' and 'nsecs'. But rename will be 
massive, so no incremental patch so far.

> 
>> +	/** offset in minutes from UTC */
>> +	int32_t offset;
> 
> Why do you use int32_t instead of int for these two members?

Because I need signed integer larger than 16-bit, but less or equal to 
32-bit signed value. Simple int is not exactly what I want, depending on 
platfrom it could be 4byte or 8bytes long. For example, ILP32, and LP64 
do have 4-bytes long int, but there are(were) ILP64 hardware platforms 
(like Cray) where int was as 64-bit as long or pointer.

int is not as specific as int32_t in this particular case.

> 
> Same comments for the datetime_interval struct.
> 
>> +};
>> +
>> +/**
>> + * Date/time interval structure
>> + */
>> +struct datetime_interval {
>> +	/** relative seconds delta */
>> +	double secs;
>> +	/** nanoseconds delta */
>> +	int32_t nsec;
>> +};
>> +
>> +/**
>> + * Convert datetime to string using default asctime format
>> + * "Sun Sep 16 01:03:52 1973\n\0"
>> + * Wrapper around reenterable asctime_r() version of POSIX function
>> + * @param date source datetime value
>> + * @sa datetime_ctime
>> + */
>> +char *
>> +datetime_asctime(const struct datetime *date, char *buf);
>> +
>> +char *
>> +datetime_ctime(const struct datetime *date, char *buf);
>> +
>> +size_t
>> +datetime_strftime(const struct datetime *date, const char *fmt, char
> *buf,
>> +		  uint32_t len);
>> +
>> +void
>> +datetime_now(struct datetime * now);
> 
> Extra space after '*'.

Fixed.

> 
> Please add comments to all these functions, like you did for
> datetime_asctime.

Added.

> 
>> +
>> +#if defined(__cplusplus)
>> +} /* extern "C" */
>> +#endif /* defined(__cplusplus) */
>> diff --git a/src/lua/datetime.lua b/src/lua/datetime.lua
>> new file mode 100644
>> index 000000000..ce579828f
>> --- /dev/null
>> +++ b/src/lua/datetime.lua
>> @@ -0,0 +1,500 @@
>> +local ffi = require('ffi')
>> +
>> +ffi.cdef [[
>> +
>> +    /*
>> +    `c-dt` library functions handles properly both positive and
> negative `dt`
>> +    values, where `dt` is a number of dates since Rata Die date
> (0001-01-01).
>> +
>> +    For better compactness of our typical data in MessagePack stream we
> shift
>> +    root of our time to the Unix Epoch date (1970-01-01), thus our 0 is
>> +    actually dt = 719163.
>> +
>> +    So here is a simple formula how convert our epoch-based seconds to
> dt values
>> +        dt = (secs / 86400) + 719163
>> +    Where 719163 is an offset of Unix Epoch (1970-01-01) since Rata Die
>> +    (0001-01-01) in dates.
>> +
>> +    */
>> +    typedef int dt_t;
>> +
>> +    // dt_core.h
>> +    dt_t     tnt_dt_from_rdn     (int n);
>> +    dt_t     tnt_dt_from_ymd     (int y, int m, int d);
>> +
>> +    int      tnt_dt_rdn          (dt_t dt);
>> +
>> +    // dt_parse_iso.h
>> +    size_t tnt_dt_parse_iso_date          (const char *str, size_t len,
> dt_t *dt);
>> +    size_t tnt_dt_parse_iso_time          (const char *str, size_t len,
> int *sod, int *nsec);
>> +    size_t tnt_dt_parse_iso_zone_lenient  (const char *str, size_t len,
> int *offset);
> 
> The line is too long. Please ensure that all lines are <= 80 characters
> long.

Reformatted.

> 
>> +
>> +    // datetime.c
>> +    int
>> +    datetime_to_string(const struct datetime * date, char *buf,
> uint32_t len);
>> +
>> +    char *
>> +    datetime_asctime(const struct datetime *date, char *buf);
>> +
>> +    char *
>> +    datetime_ctime(const struct datetime *date, char *buf);
>> +
>> +    size_t
>> +    datetime_strftime(const struct datetime *date, const char *fmt,
> char *buf,
>> +                      uint32_t len);
>> +
>> +    void
>> +    datetime_now(struct datetime * now);
> 
> Extra space after '*'.

Removed

> 
>> +
>> +]]
>> +
>> +local builtin = ffi.C
>> +local math_modf = math.modf
>> +
>> +local SECS_PER_DAY     = 86400
>> +
>> +-- c-dt/dt_config.h
>> +
>> +-- Unix, January 1, 1970, Thursday
>> +local DT_EPOCH_1970_OFFSET = 719163
>> +
>> +
>> +local datetime_t = ffi.typeof('struct datetime')
>> +local interval_t = ffi.typeof('struct datetime_interval')
>> +
>> +local function is_interval(o)
>> +    return type(o) == 'cdata' and ffi.istype(interval_t, o)
> 
> The check for 'cdata' is redundant. ffi.istype alone should be enough
> AFAIK.

Wanted to object, but have verified and indeed, in that particular order 
(first ffi metatype reference, then value) it's accepting every type. 
Apparently I'd got wrong impression when I passed arguments in the wrong 
order. Updated.


> 
>> +end
>> +
>> +local function is_datetime(o)
>> +    return type(o) == 'cdata' and ffi.istype(datetime_t, o)
>> +end
>> +
>> +local function is_date_interval(o)
>> +    return type(o) == 'cdata' and
>> +           (ffi.istype(interval_t, o) or ffi.istype(datetime_t, o))
>> +end
>> +
>> +local function interval_new()
>> +    local interval = ffi.new(interval_t)
>> +    return interval
>> +end
>> +
>> +local function check_date(o, message)
>> +    if not is_datetime(o) then
>> +        return error(("%s: expected datetime, but received %s"):
>> +                     format(message, o), 2)
>> +    end
>> +end
>> +
>> +local function check_str(s, message)
>> +    if not type(s) == 'string' then
>> +        return error(("%s: expected string, but received %s"):
>> +                     format(message, s), 2)
>> +    end
>> +end
>> +
>> +local function check_range(v, range, txt)
>> +    assert(#range == 2)
>> +    if v < range[1] or v > range[2] then
>> +        error(('value %d of %s is out of allowed range [%d, %d]'):
>> +              format(v, txt, range[1], range[2]), 4)
>> +    end
>> +end
>> +
>> +local function datetime_cmp(lhs, rhs)
>> +    if not is_date_interval(lhs) or
>> +       not is_date_interval(rhs) then
>> +       return nil
> 
> Bad indentation.

Fixed.

> 
>> +    end
>> +    local sdiff = lhs.secs - rhs.secs
>> +    return sdiff ~= 0 and sdiff or (lhs.nsec - rhs.nsec)
>> +end
>> +
>> +local function datetime_eq(lhs, rhs)
>> +    local rc = datetime_cmp(lhs, rhs)
>> +    return rc ~= nil and rc == 0
>> +end
>> +
>> +local function datetime_lt(lhs, rhs)
>> +    local rc = datetime_cmp(lhs, rhs)
>> +    return rc == nil and error('incompatible types for comparison', 2)
> or
>> +           rc < 0
>> +end
>> +
>> +local function datetime_le(lhs, rhs)
>> +    local rc = datetime_cmp(lhs, rhs)
>> +    return rc == nil and error('incompatible types for comparison', 2)
> or
>> +           rc <= 0
>> +end
>> +
>> +local function datetime_serialize(self)
>> +    return { secs = self.secs, nsec = self.nsec, offset = self.offset }
>> +end
>> +
>> +local function interval_serialize(self)
>> +    return { secs = self.secs, nsec = self.nsec }
>> +end
>> +
>> +local function datetime_new_raw(secs, nsec, offset)
>> +    local dt_obj = ffi.new(datetime_t)
>> +    dt_obj.secs = secs
>> +    dt_obj.nsec = nsec
>> +    dt_obj.offset = offset
>> +    return dt_obj
>> +end
>> +
>> +local function datetime_new_dt(dt, secs, frac, offset)
> 
> What's 'frac'? Please either add a comment or give it a better name.

fraction part. Renamed to full fraction.

> 
>> +    local epochV = dt ~= nil and (builtin.tnt_dt_rdn(dt) -
> DT_EPOCH_1970_OFFSET) *
>> +                   SECS_PER_DAY or 0
> 
> AFAIK we don't use camel-case for naming local variables in Lua code.
> 
>> +    local secsV = secs ~= nil and secs or 0
> 
> This is equivalent to 'secs = secs or 0'

Indeed. Fixed.

> 
>> +    local fracV = frac ~= nil and frac or 0
>> +    local ofsV = offset ~= nil and offset or 0
>> +    return datetime_new_raw(epochV + secsV - ofsV * 60, fracV, ofsV)
>> +end
>> +
>> +-- create datetime given attribute values from obj
>> +-- in the "easy mode", providing builder with
>> +-- .secs, .nsec, .offset
>> +local function datetime_new_obj(obj, ...)
>> +    if obj == nil or type(obj) ~= 'table' then
> 
> type(obj) ~= 'table' implies 'obj' == nil

I meant that obj may be simple number, not builder object.

> 
>> +        return datetime_new_raw(obj, ...)
>> +    end
>> +    local secs = 0
>> +    local nsec = 0
>> +    local offset = 0
>> +
>> +    for key, value in pairs(obj) do
>> +        if key == 'secs' then
>> +            secs = value
>> +        elseif key == 'nsec' then
>> +            nsec = value
>> +        elseif key == 'offset' then
>> +            offset = value
>> +        else
>> +            error(('unknown attribute %s'):format(key), 2)
>> +        end
> 
> Hmm, do we really need this check? IMO this is less clear and probably
> slower than accessing the table directly:
> 
> local secs = obj.secs
> local nsecs = obj.nsecs
> ...

Ok, let me put you to the context, where we have been iteration before. 
Here we see smaller part of larger code which used to be emulating named 
arguments for creating datetime using attributes from passed. It's a 
standard idiom for emulating named arguments in a languages where there 
is no such syntax, but there is json-like table/object/hash facilities.

https://www.lua.org/pil/5.3.html

The idea is to allow constructions like below (with any set of 
attributes passed)

	date = require 'datetime'
	T = date.new{ year = 20201, month = 11, day = 15,
		      hour = 1, minute = 10, second = 01, tz = 180}

The problem was that there is set of attributes, which we want to use 
for initialization via this same named attributes idiom, but directly 
using .secs, .nsec, .offset attributes when we know what we are doing.

	T = date.new_raw{ secs = 0, nsec = 0, offset = 180}

That's why this silly separate version has been created which strangely 
parsing set of attributes accessible in the cdata object. (If there is 
passed cdata object, but it's not)

[Worth to note, that at the moment in addition to initialization object 
`date.new_raw` now handles gracefully this case of calling it without 
creating initialization object, but passing arguments:

	T = date.new_raw(0, 0, 180)

This is why there is check for object in obj `type(obj) ~= 'table'`
]


Now you have asked that I realized that ffi.new could do the 
initialization magic for us, and instead of this for-loop we could 
simply pass an object to the 2nd argument of ffi.new.

	tarantool> o = { secs = 0, nsec = 0, offset = 180}
	---
	...
	
	tarantool> ffi = require 'ffi'
	---
	...
	
	tarantool> T = ffi.new('struct datetime', o)
	---
	...
	
	tarantool> T
	---
	- 1970-01-01T03:00+03:00
	...

The problem is - it will be silently ignoring bogus attributes we may 
pass to this initialization object, which do not map directly to the set 
of known fields in a structure initialized.

Which is not a problem for the code above, where we verify set of 
attributes in an object, and bail out if there is any unknown attribute.

Performance-wise I do not expect such fancy initialization of objects 
would be on a critical path. The expected scenario for average scenario 
- creating datetime objects using textual strings (from logs). And this 
is very fast, due to c-dt parsing speed.


So here is dilema:
- we could directly pass initialization object to the ffi.new. But allow 
all kinds of unknown attributes;
- or we could check all attributes in a slightly slower loop, but 
provide nice user visible diagnostics for an error.

What would you recommend? (I prefer implementation with checks and 
human-understandable errors)


> 
> I never saw we do anything like this in Lua code.
> 
> Same comment for datetime_new and other places where you use pairs()
> like this.
> 

It's still same approach to use initialization object for "named 
arguments" idiom.

>> +    end
>> +
>> +    return datetime_new_raw(secs, nsec, offset)
>> +end
>> +
>> +-- create datetime given attribute values from obj
> 
> Bad comment - what's 'obj'?

Yes, probably the confusion created because I was too terse here, and 
not explained that obj is initialization object for "named arguments" 
approach.

> 
>> +local function datetime_new(obj)
>> +    if obj == nil or type(obj) ~= 'table' then
>> +        return datetime_new_raw(0, 0, 0)
>> +    end
>> +    local y = 0
>> +    local M = 0
>> +    local d = 0
>> +    local ymd = false
>> +
>> +    local h = 0
>> +    local m = 0
>> +    local s = 0
>> +    local frac = 0
>> +    local hms = false
>> +    local offset = 0
>> +
>> +    local dt = 0
>> +
>> +    for key, value in pairs(obj) do
>> +        if key == 'year' then
>> +            check_range(value, {1, 9999}, key)
>> +            y = value
>> +            ymd = true
>> +        elseif key == 'month' then
>> +            check_range(value, {1, 12}, key)
>> +            M = value
>> +            ymd = true
>> +        elseif key == 'day' then
>> +            check_range(value, {1, 31}, key)
>> +            d = value
>> +            ymd = true
>> +        elseif key == 'hour' then
>> +            check_range(value, {0, 23}, key)
>> +            h = value
>> +            hms = true
>> +        elseif key == 'min' or key == 'minute' then
>> +            check_range(value, {0, 59}, key)
>> +            m = value
>> +            hms = true
>> +        elseif key == 'sec' or key == 'second' then
> 
> I don't think we should support both 'sec'/'min' and 'second'/'minute'
> here. Since you want to be consistent with os.date() output, I would
> leave 'sec'/'min' only.

Originally, there were only human-readable names like 'second' or 
'minute', os.date compatibility added only lately. Because, it costs 
nothing (comparing to all other overhead).

I'd prefer to have ability to use full readable names, not only those 
from set of os.date. It costs nothing, but make it's friendlier.

> 
>> +            check_range(value, {0, 60}, key)
>> +            s, frac = math_modf(value)
>> +            frac = frac * 1e9 -- convert fraction to nanoseconds
> 
> So 'frac' actually stores nanoseconds. Please rename accordingly.

Yes, this is fractional part of seconds, but represented as integers in 
nanoseconds units. Renamed.

>> +            hms = true
>> +        elseif key == 'tz' then
>> +        -- tz offset in minutes
> 
> Bad indentation.

Fixed.

> 
>> +            check_range(value, {0, 720}, key)
>> +            offset = value
>> +        elseif key == 'isdst' or key == 'wday' or key =='yday' then --
> luacheck: ignore 542
> 
> Missing space between '==' and 'yday'.

Fixed.

> 
>> +            -- ignore unused os.date attributes
>> +        else
>> +            error(('unknown attribute %s'):format(key), 2)
>> +        end
>> +    end
>> +
>> +    -- .year, .month, .day
>> +    if ymd then
>> +        dt = builtin.tnt_dt_from_ymd(y, M, d)
>> +    end
>> +
>> +    -- .hour, .minute, .second
>> +    local secs = 0
>> +    if hms then
>> +        secs = h * 3600 + m * 60 + s
>> +    end
>> +
>> +    return datetime_new_dt(dt, secs, frac, offset)
>> +end
>> +
>> +--[[
>> +    Basic      Extended
>> +    20121224   2012-12-24   Calendar date   (ISO 8601)
>> +    2012359    2012-359     Ordinal date    (ISO 8601)
>> +    2012W521   2012-W52-1   Week date       (ISO 8601)
>> +    2012Q485   2012-Q4-85   Quarter date
>> +]]
> 
> Please mention in the comment what this function returns.
> Same for other 'parse' functions.

Good point. Clarified.

> 
>> +
>> +local function parse_date(str)
>> +    check_str("datetime.parse_date()")
> 
> check_str(str, ...)
> 
> Here and everywhere else.
> 
>> +    local dt = ffi.new('dt_t[1]')
>> +    local len = builtin.tnt_dt_parse_iso_date(str, #str, dt)
>> +    return len > 0 and datetime_new_dt(dt[0]) or nil, tonumber(len)
> 
> Is this tonumber() really necessary?

Yup, size_t is boxed in cdata.
```
tarantool> ffi = require 'ffi'
---
...

tarantool> ffi.cdef [[ size_t tnt_dt_parse_iso_date (const char *str, 
size_t len, dt_t *dt); ]]
---
...

tarantool> ffi.cdef [[ typedef int dt_t; ]]
---
...


tarantool> dt = ffi.new 'dt_t[1]'
---
...

tarantool> r = ffi.C.tnt_dt_parse_iso_date('1970-01-01', 10, dt)
---
...

tarantool> dt[0]
---
- 719163
...

tarantool> type(r)
---
- cdata
...

tarantool> ffi.typeof(r)
---
- ctype<uint64_t>
...
```

> 
>> +end
>> +
>> +--[[
>> +    Basic               Extended
>> +    T12                 N/A
>> +    T1230               T12:30
>> +    T123045             T12:30:45
>> +    T123045.123456789   T12:30:45.123456789
>> +    T123045,123456789   T12:30:45,123456789
>> +
>> +    The time designator [T] may be omitted.
>> +]]
>> +local function parse_time(str)
>> +    check_str("datetime.parse_time()")
>> +    local sp = ffi.new('int[1]')
>> +    local fp = ffi.new('int[1]')
>> +    local len = builtin.tnt_dt_parse_iso_time(str, #str, sp, fp)
>> +    return len > 0 and datetime_new_dt(nil, sp[0], fp[0]) or nil,
>> +           tonumber(len)
>> +end
>> +
>> +--[[
>> +    Basic    Extended
>> +    Z        N/A
>> +    +hh      N/A
>> +    -hh      N/A
>> +    +hhmm    +hh:mm
>> +    -hhmm    -hh:mm
>> +]]
>> +local function parse_zone(str)
>> +    check_str("datetime.parse_zone()")
>> +    local offset = ffi.new('int[1]')
>> +    local len = builtin.tnt_dt_parse_iso_zone_lenient(str, #str,
> offset)
>> +    return len > 0 and datetime_new_dt(nil, nil, nil, offset[0]) or
> nil,
>> +           tonumber(len)
>> +end
>> +
> 
> Extra new line.

Removed

> 
>> +
>> +--[[
>> +    aggregated parse functions
>> +    assumes to deal with date T time time_zone
>> +    at once
>> +
>> +    date [T] time [ ] time_zone
>> +]]
>> +local function parse(str)
>> +    check_str("datetime.parse()")
>> +    local dt = ffi.new('dt_t[1]')
>> +    local len = #str
>> +    local n = builtin.tnt_dt_parse_iso_date(str, len, dt)
>> +    local dt_ = dt[0]
>> +    if n == 0 or len == n then
>> +        return datetime_new_dt(dt_)
>> +    end
>> +
>> +    str = str:sub(tonumber(n) + 1)
>> +
>> +    local ch = str:sub(1,1)
> 
> Missing space after ','.

Fixed

> 
>> +    if ch:match('[Tt ]') == nil then
>> +        return datetime_new_dt(dt_)
>> +    end
>> +
>> +    str = str:sub(2)
>> +    len = #str
>> +
>> +    local sp = ffi.new('int[1]')
>> +    local fp = ffi.new('int[1]')
>> +    local n = builtin.tnt_dt_parse_iso_time(str, len, sp, fp)
>> +    if n == 0 then
>> +        return datetime_new_dt(dt_)
>> +    end
>> +    local sp_ = sp[0]
>> +    local fp_ = fp[0]
>> +    if len == n then
>> +        return datetime_new_dt(dt_, sp_, fp_)
>> +    end
>> +
>> +    str = str:sub(tonumber(n) + 1)
>> +
>> +    if str:sub(1,1) == ' ' then
> 
> Missing space after ','.

Fixed

> 
>> +        str = str:sub(2)
>> +    end
>> +
>> +    len = #str
>> +
>> +    local offset = ffi.new('int[1]')
>> +    n = builtin.tnt_dt_parse_iso_zone_lenient(str, len, offset)
>> +    if n == 0 then
>> +        return datetime_new_dt(dt_, sp_, fp_)
>> +    end
>> +    return datetime_new_dt(dt_, sp_, fp_, offset[0])
>> +end
>> +
>> +local function datetime_from(o)
> 
> Please add a comment explaining what this function does.

Added.

> 
>> +    if o == nil or type(o) == 'table' then
>> +        return datetime_new(o)
>> +    elseif type(o) == 'string' then
>> +        return parse(o)
>> +    end
>> +end
>> +
>> +local function local_now()
> 
> Please add a comment explaining what this function does.

Added.

> 
>> +    local d = datetime_new_raw(0, 0, 0)
>> +    builtin.datetime_now(d)
>> +    return d
>> +end
>> +
>> +-- Change the time-zone to the provided target_offset
>> +-- Time `.secs`/`.nsec` are always UTC normalized, we need only to
>> +-- reattribute object with different `.offset`
> 
> It doesn't change the time-zone of the given object. It creates a new
> object with the new timezone. Please fix the comment to avoid confusion.

Indeed. Implementation has been changed to make original object immune 
to the timezone changes, but comment has not been updated. Updated now.

> 
>> +local function datetime_to_tz(self, tgt_ofs)
>> +    if self.offset == tgt_ofs then
>> +        return self
>> +    end
>> +    if type(tgt_ofs) == 'string' then
>> +        local obj = parse_zone(tgt_ofs)
>> +        if obj == nil then
>> +            error(('%s: invalid time-zone format %s'):format(self,
> tgt_ofs), 2)
>> +        else
>> +            tgt_ofs = obj.offset
> 
> target_offset. Please don't use confusing abbreviations.
> 
>> +        end
>> +    end
>> +    return datetime_new_raw(self.secs, self.nsec, tgt_ofs)
>> +end
>> +
>> +local function datetime_index(self, key)
>> +    if key == 'epoch' or key == 'unixtime' then
>> +        return self.secs
>> +    elseif key == 'ts' or key == 'timestamp' then
>> +        return self.secs + self.nsec / 1e9
>> +    elseif key == 'ns' or key == 'nanoseconds' then
>> +        return self.secs * 1e9 + self.nsec
>> +    elseif key == 'us' or key == 'microseconds' then
>> +        return self.secs * 1e6 + self.nsec / 1e3
>> +    elseif key == 'ms' or key == 'milliseconds' then
>> +        return self.secs * 1e3 + self.nsec / 1e6
>> +    elseif key == 's' or key == 'seconds' then
>> +        return self.secs + self.nsec / 1e9
>> +    elseif key == 'm' or key == 'min' or key == 'minutes' then
>> +        return (self.secs + self.nsec / 1e9) / 60
>> +    elseif key == 'hr' or key == 'hours' then
>> +        return (self.secs + self.nsec / 1e9) / (60 * 60)
>> +    elseif key == 'd' or key == 'days' then
>> +        return (self.secs + self.nsec / 1e9) / (24 * 60 * 60)
> 
>   1. There's so many ways to get the same information, for example m,
>      min, minutes. There should be exactly one way.

Disagreed. I prefer friendlier approach.

> 
>   2. I'd expect datetime.hour return the number of hours in the current
>      day, like os.date(), but it returns the number of hours since the
>      Unix epoch instead. This is confusing and useless.

Well, at least this is consistent with seconds and their fractions of 
different units. If there would be demand for os.date compatible table 
format, we might extend interface with corresponding accessor. But I do 
not foresee it, because os.date is totally broken if we need correct 
handling of timezone information. AFAIK it's always using local time 
context.
> 
>   3. Please document the behavior somewhere in the comments to the code.

Documented.

> 
>   4. These if-else's are inefficient AFAIU.

Quite the contrary (if we compare to the idiomatic closure handlers 
case). If/ifelse is the fastest way to handle such cases in Lua.
Here is earlier bench we have discussed with Vlad and Oleg Babin, and 
after which I've changed from using hash based handlers to the series of 
ifs.

https://gist.github.com/tsafin/31cc9b0872b6015904fcc90d97740770


> 
>> +    elseif key == 'to_utc' then
>> +        return function(self)
>> +            return datetime_to_tz(self, 0)
>> +        end
>> +    elseif key == 'to_tz' then
>> +        return function(self, offset)
>> +            return datetime_to_tz(self, offset)
>> +        end
>> +    else
>> +        error(('unknown attribute %s'):format(key), 2)
>> +    end
>> +end
>> +
>> +local function datetime_newindex(self, key, value)
>> +    if key == 'epoch' or key == 'unixtime' then
>> +        self.secs = value
>> +        self.nsec, self.offset = 0, 0
>> +    elseif key == 'ts' or key == 'timestamp' then
>> +        local secs, frac = math_modf(value)
>> +        self.secs = secs
>> +        self.nsec = frac * 1e9
>> +        self.offset = 0
> 
> Do we really want the datetime object to be mutable? If so, allowing to
> set its value only to the time since the unix epoch doesn't look
> particularly user-friendly.

I do want. That was request from customer - to have ability to modify by 
timestamp from Epoch.

> 
>> +    else
>> +        error(('assigning to unknown attribute %s'):format(key), 2)
>> +    end
>> +end
>> +
>> +-- sizeof("Wed Jun 30 21:49:08 1993\n")
>> +local buf_len = 26
> 
> What if year > 9999?

Good point. IIRC both BSD libc and glibc have hardcoded length limit at 
around this size and not ready to handle such huge dates properly.

> 
>> +
>> +local function asctime(o)
>> +    check_date(o, "datetime:asctime()")
>> +    local buf = ffi.new('char[?]', buf_len)
> 
> I think it would be more efficient to define the buffer in a global
> variable - we don't yield in this code so this should be fine. Also,
> please give the buffer an appropriate name that would say that it is
> used for datetime formatting.

Good point. Changed to using outer variables in asctime and ctime functions.

> 
>> +    return ffi.string(builtin.datetime_asctime(o, buf))
>> +end
>> +
>> +local function ctime(o)
>> +    check_date(o, "datetime:ctime()")
>> +    local buf = ffi.new('char[?]', buf_len)
>> +    return ffi.string(builtin.datetime_ctime(o, buf))
>> +end
>> +
>> +local function strftime(fmt, o)
>> +    check_date(o, "datetime.strftime()")
>> +    local sz = 128
>> +    local buff = ffi.new('char[?]', sz)
>> +    builtin.datetime_strftime(o, fmt, buff, sz)
>> +    return ffi.string(buff)
>> +end
>> +
>> +local datetime_mt = {
>> +    __serialize = datetime_serialize,
>> +    __eq = datetime_eq,
>> +    __lt = datetime_lt,
>> +    __le = datetime_le,
>> +    __index = datetime_index,
>> +    __newindex = datetime_newindex,
>> +}
>> +
>> +local interval_mt = {
>> +    __serialize = interval_serialize,
>> +    __eq = datetime_eq,
>> +    __lt = datetime_lt,
>> +    __le = datetime_le,
>> +    __index = datetime_index,
> 
> Why does datetime_mt has __newindex while interval_mt doesn't?

I did not foresee the need. Should we?

And semantically datetime has persistence and life-time, but intervals 
have not. [And besides all we already provide a plenty of helpers for 
creation of intervals]

> 
>> +}
>> +
>> +ffi.metatype(interval_t, interval_mt)
>> +ffi.metatype(datetime_t, datetime_mt)
>> +
>> +return setmetatable(
>> +    {
>> +        new         = datetime_new,
>> +        new_raw     = datetime_new_obj,
> 
> I'm not sure, we need to make the 'raw' function public.

That's for second kind of constructors using initialization object but 
with low level attributes {secs = 0, nsec = 0, offset}. It's not for 
everybody, but still provides some ergonomics.

> 
>> +        interval    = interval_new,
> 
> Why would anyone want to create a 0 interval?

Good point, especiall taking into account that there is no way to modify 
it, beyond directly modifying .secs and .nsec fields of cdata object.

Will delete it. We already have a plenty of helpers to construct 
different kinds of intervals.

> 
>> +
>> +        parse       = parse,
>> +        parse_date  = parse_date,
> 
>> +        parse_time  = parse_time,
>> +        parse_zone  = parse_zone,
> 
> Creating a datetime object without a date sounds confusing.

:) But I do foresee the need to parse parts of datetime string, and I 
did not want to establish special type for that.

Now you have asked it I realized we may better create interval objects 
of approapriate time. But in this case we need to add timezone to 
interval record. So it will be able to represent timezone shifts.

Ok, heer is the deal:
- at the moment those partial parsed objects store their data to the 
partial datetime object created. That's confusing but do not require 
modifications in interval arithmetic.
- we may start to store partial time and timezone information into 
generic intervals. But it would require extension of interval arithmetic 
to be ready to shift by timezone delta. [Does it even make any sense?]

What do you think?

> 
>> +
>> +        now         = local_now,
>> +        strftime    = strftime,
>> +        asctime     = asctime,
>> +        ctime       = ctime,
>> +
>> +        is_datetime = is_datetime,
>> +        is_interval = is_interval,
> 
> I don't see any point in making these functions public.

I was asked by customer to introduce such.

> 
>> +    }, {
>> +        __call = function(self, ...) return datetime_from(...) end
>> +    }
>> +)
>> diff --git a/src/lua/init.c b/src/lua/init.c
>> index f9738025d..127e935d7 100644
>> --- a/src/lua/init.c
>> +++ b/src/lua/init.c
>> @@ -129,7 +129,8 @@ extern char strict_lua[],
>>   	parse_lua[],
>>   	process_lua[],
>>   	humanize_lua[],
>> -	memprof_lua[]
>> +	memprof_lua[],
>> +	datetime_lua[]
>>   ;
>>   
>>   static const char *lua_modules[] = {
>> @@ -184,6 +185,7 @@ static const char *lua_modules[] = {
>>   	"memprof.process", process_lua,
>>   	"memprof.humanize", humanize_lua,
>>   	"memprof", memprof_lua,
>> +	"datetime", datetime_lua,
>>   	NULL
>>   };
>>   
>> diff --git a/src/lua/utils.c b/src/lua/utils.c
>> index c71cd4857..2c89326f3 100644
>> --- a/src/lua/utils.c
>> +++ b/src/lua/utils.c
>> @@ -48,6 +48,9 @@ static uint32_t CTID_STRUCT_IBUF_PTR;
>>   uint32_t CTID_CHAR_PTR;
>>   uint32_t CTID_CONST_CHAR_PTR;
>>   uint32_t CTID_UUID;
>> +uint32_t CTID_DATETIME = 0;
>> +uint32_t CTID_INTERVAL = 0;
> 
> Should be called CTID_DATETIME_INTERVAL.
> 
> Anyway, it's never used. Please remove.

It's used in later patches. Wil lsquash and rename.

> 
>> +
>>   
>>   void *
>>   luaL_pushcdata(struct lua_State *L, uint32_t ctypeid)
>> @@ -120,6 +123,12 @@ luaL_pushuuidstr(struct lua_State *L, const struct
> tt_uuid *uuid)
>>   	lua_pushlstring(L, str, UUID_STR_LEN);
>>   }
>>   
>> +struct datetime *
>> +luaL_pushdatetime(struct lua_State *L)
>> +{
>> +	return luaL_pushcdata(L, CTID_DATETIME);
>> +}
>> +
> 
> This function isn't used in this patch. Please move it to patch 4.
> 

Looks like it will be used once squashed.

Thanks,
Timur

P.S.

I need to jump to the different location, so the rest of responses will 
be send in a couple of hours. Please answer some questions - I 'm 
interested in your opinion.


More information about the Tarantool-patches mailing list