[Tarantool-patches] [PATCH v3 2/9] lua: built-in module datetime

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Aug 10 15:20:51 MSK 2021


Hi! Thanks for the fixes!

On 08.08.2021 19:35, Safin Timur wrote:
> Much respect that you've found some time for review, even while being on vacation! Thanks!
> 
> On 08.08.2021 14:26, Vladislav Shpilevoy wrote:
>>>>> +local datetime_index_handlers = {
>>>>> +    unixtime = function(self)
>>>>> +        return self.secs
>>>>> +    end,
>>>>> +
>>>>> +    timestamp = function(self)
>>>>> +        return tonumber(self.secs) + self.nsec / 1e9
>>>>
>>>> 11. If you are saying the Lua number value range is enough,
>>>> why do you store them as 64bit integers in 'self' and
>>>> convert to a number only for the external API?
>>>
>>> May be I misunderstood your sentence, but let me elaborate here.
>>> I need seconds and nanoseconds kept separately for their efficient and precise handling, and for passing to c-dt.
>>>
>>> If we would keep 32-bit integers in seconds then we would be able to handle only dates upto 2038 year, thus we need 64-bit seconds for extended range.
>>>
>>> OTOH, not whole 64-bit range is practical, and required for expected in real-life datetime range values. It's not a problem that Lua number and int64_t have very different range for precise integer values. Lua number could handle integer values upto 9e15, which is corresponding to ...
>>
>> I know all that. The question is why do you keep storing cdata 64 bits numbers
>> inside of the datetime objects, if you convert them all to Lua numbers before
>> return? You could store self.secs as just number. And the other fields too. Lua
>> number is double, it does not loose precision for integers < 2^53, which should
>> be enough for the ranges you want to support. Correct?
> 
> I keep using 64-bit because the primary code operating with fields is on C-side, and we need Lua number only on rare cases when user asked for composed attribute date.timestamp. Whenever we deal with seconds within arthmetic operations or transfer to c-dt, we need integer C type, larger than 32-bit. It's good fit for int64_t.

But it is slower. Notably slower last time I benched, when I also
thought integers should be faster than doubles. But cdata 64 bit integers
are slower than plain Lua numbers. Perhaps because they involve too much
work with metatables for everything. Besides, doubles are larger than 32
bits - they are 53 bits until they start loosing int precision. And it is
just enough for you, isn't it?

>>>> 19. It can ignore some fields (here it ignores 'year'):
>>>>
>>>> tarantool> d1 = datetime({secs = 100, year = 1})
>>>> ---
>>>> ...
>>>>
>>>> tarantool> d2 = datetime({secs = 100, year = 2})
>>>> ---
>>>> ...
>>>>
>>>> tarantool> d1 == d2
>>>> ---
>>>> - true
>>>> ...
>>>>
>>>
>>> Yes, and that's by design. .secs defines 'easy_way' of initializing object via secs/nsec/offset, while .year selects 'ymd' mode. And `easy_way` has precedence. It looks reasonable for me, please, suggest your idea how to handle it even better without much redesign?
>>
>> You could for instance, by looking at a first key you see in the dictionary,
>> decide how will you parse the entire dict. If you see first 'secs', you parse
>> only secs, nsecs and offset. Everything else raises an error. If you see
>> first 'year', you raise an error for the other non-related keys.
>>
>> Ignorance of some keys does not look acceptable.
>>
>> And by the way, does it mean I can't specify a date using year, month, day,
>> hours, and secs? Why isn't it valid?
> 
> Here is the tricky part, agreed that this may be confusing. The "easy mode" operates on secs, nsec, offset, the human-readable mode operates on year, month, day, hour, minute, **second**, and tz.
> 
> There is second in human-readable mode, it's rather named differently (i.e. complete word, non-plural).
> 
> The more important question though - how to implement reasonably diagnosing of such mix-in of raw attributes, with human-radable attributes?
> 
> - I tend to believe that we should **not accept them both** in the single function, but rather accept .secs/.nsec/.offset via different entries, like datetime_new_raw vs datetime_new
> 
> ```
> date = require('datetime')
> T = date.new{year = 1970, month = 1, day = 1, minute = 0, seconds = 15, tz = 180}
> T2 = date.new_raw{secs = 0, nsec = 0, offset = 180} -- will be the same as new(0, 0, 180)
> ```
> 
> In this case we physically prevent mixing of different modes with different allowed attributes in the builder object.
> 
> What do you think?

I agree that these should not be allowed to mix. But the names above
look strange. They both seem raw. Non-raw is a string. Intuitively.

	datetime.new_date()
	datetime.new_stamp()

?

Or still you could have one new() and inside of it allow to set all the
fields. Like I asked above and like you also showed in your example:

	T = date.new{year = 1970, month = 1, day = 1, minute = 0, seconds = 15, tz = 180}

Because I still don't understand, why can't I specify an exact date with
year, day, month, and seconds inside of the day.

What if I want the precision does to nsecs? That is, the max
precision: year, month, day, nsecs.

>>>> 20. Serialization does not seem to be working in the console
>>>> despite you tried to implement it above:
>>>>
>>>> tarantool> d = datetime({secs = 100})
>>>> ---
>>>> ...
>>>>
>>>> tarantool> d
>>>> ---
>>>> - 'cdata<struct datetime_t>: 0x0103ad9c50'
>>>> ...
>>>
>>> Yup, redefining of __serialize is not enough for cdata object by whatever reason (I didn't look deep). But after the next patch(es) it's start to stringize properly.
>>
>> But don't you have __serialize implementation in this commit already?
>> (I don't remember and didn't see the latest code yet.)
> 
> Yes, I did have __serialize there, but that was working strangely and only started to behave properly (well, more or less) once I've added __tostring in the following commit. I'm not very well informed about expected console behaviour (regarding those 2 meta-methods, at least), so took that for granted. Probably, there is some bug, may be it's expected behaviour for cdata based data structures?

Does not matter much. I am not against serialization done in a separate
commit. I only don't like when it is added half-way in the second commit
and starts working at some point later.

I propose to make it atomic. Either the serialization is added here and
starts working here and is tested here, or all is done in a separate
commit.

>>>> 22. Did you really test this function?
>>>>
>>>> tarantool> dt = require('datetime')
>>>> ---
>>>> ...
>>>>
>>>> tarantool> d1 = dt()
>>>> ---
>>>> ...
>>>>
>>>> tarantool> d2 - 1
>>>> ---
>>>> - error: 'builtin/datetime.lua:487: bad argument #1 to ''typeof'' (C type expected,
>>>>       got number)'
>>>> ...
>>>
>>> All kinds of arguments checking implemented in the separate patch about interval arithmetic operations, please find a moment to look into them.
>>
>> Please, do not introduce broken functions. Make the correct ones right away and
>> add tests. The new code does not need to introduce full functionality if it
>> can be done in an atomic way, but it must not be visibly broken at least. Here
>> it is.
> 
> Ok, here is how it's looking in patchset: in 1 commit we introduce arithmetic support, but not handle well all corner cases (especially there is no support for proper year or month arithmetics). Special types for year, and months date intervals have been added in the next commit, where we handle all corner cases with bogus data.
> 
> We could:
> - either squash interval support to the original module;
> - or remove arithmetic altogether from original patch creating the module datetime.lua.
> 
> 
> What is the recommended way?

I would leave the intervals in their own commit, in which I would also
add the arithmetics. Working correctly right away.

>>>>> +
>>>>> +local function datetime_from(o)
>>>>> +    if o == nil or type(o) == 'table' then
>>>>> +        return datetime_new(o)
>>>>> +    elseif type(o) == 'string' then
>>>>> +        return parse(o)
>>>>> +    end
>>>>> +end
>>>>> +
>>>>> +local function local_now()
>>>>> +    local p_tv = ffi.new('struct timeval [1]')
>>>>> +    local rc = builtin.gettimeofday(p_tv, nil)
>>>>
>>>> 24. Why can't you use fiber time? It should be enough in
>>>> nearly all usecases and is much faster. For real-real time
>>>> talking directly to the kernel we have 'clock' Lua module
>>>> already. You could check if results of clock functions can
>>>> be passed into your module to create 'real' timestamps not
>>>> depending on the eventloop.
>>>
>>> Fiber time, which actually return cached ev_rt_now, looks imprecise and depends on event loop circumstances, though eventually it calls the same clock_gettime(). So fiber-time is looking like a cached timestamp value from [hopefully, non-distant] past.
>>>
>>> I'd rather prefer to have more realtime values, and clock_realtime*/clock_monotonic* look relevant, they call clock_gettime() for their purposes, which in turn calls the same gettimeofday(). So in this case we call same api, but with 2 extra call layers above.
>>
>> By using gettimeofday you drag a lot of cdef garbage including the function
>> itself and what it returns. Please, don't. Anyway, the standard says struct
>> timeval can have more members than the usually used seconds and microseconds:
>>
>>     https://pubs.opengroup.org/onlinepubs/7908799/xsh/systime.h.html
>>
>>     The <sys/time.h> header defines the timeval structure that includes
>>     **at least** the following members
>>
>>> IMHO, not worth extra complexity if we could just simply call gettimeofday().
>>
>> Now you added a lot of extra complexity exactly because of this function.
> 
> Once again, this code is in C now, there is no any non-portable timeval definition of ffi to gettimeofday in Lua - everything is in C, which extra complexity you are talking about? I do not understand.

Sorry, I didn't see the latest version at the time I wrote this
comment.


More information about the Tarantool-patches mailing list