[Tarantool-patches] [PATCH v3 2/9] lua: built-in module datetime

Sun Aug 8 19:35:35 MSK 2021

Much respect that you've found some time for review, even while being on 
vacation! Thanks!

On 08.08.2021 14:26, Vladislav Shpilevoy wrote:
>>>> +local datetime_index_handlers = {
>>>> +    unixtime = function(self)
>>>> +        return self.secs
>>>> +    end,
>>>> +
>>>> +    timestamp = function(self)
>>>> +        return tonumber(self.secs) + self.nsec / 1e9
>>>
>>> 11. If you are saying the Lua number value range is enough,
>>> why do you store them as 64bit integers in 'self' and
>>> convert to a number only for the external API?
>>
>> May be I misunderstood your sentence, but let me elaborate here.
>> I need seconds and nanoseconds kept separately for their efficient and precise handling, and for passing to c-dt.
>>
>> If we would keep 32-bit integers in seconds then we would be able to handle only dates upto 2038 year, thus we need 64-bit seconds for extended range.
>>
>> OTOH, not whole 64-bit range is practical, and required for expected in real-life datetime range values. It's not a problem that Lua number and int64_t have very different range for precise integer values. Lua number could handle integer values upto 9e15, which is corresponding to ...
> 
> I know all that. The question is why do you keep storing cdata 64 bits numbers
> inside of the datetime objects, if you convert them all to Lua numbers before
> return? You could store self.secs as just number. And the other fields too. Lua
> number is double, it does not loose precision for integers < 2^53, which should
> be enough for the ranges you want to support. Correct?

I keep using 64-bit because the primary code operating with fields is on 
C-side, and we need Lua number only on rare cases when user asked for 
composed attribute date.timestamp. Whenever we deal with seconds within 
arthmetic operations or transfer to c-dt, we need integer C type, larger 
than 32-bit. It's good fit for int64_t.


> 
>>>> +local function datetime_new(o)
>>>> +    if o == nil then
>>>> +        return datetime_new_raw(0, 0, 0)
>>>> +    end
>>>> +    local secs = 0
>>>> +    local nsec = 0
>>>> +    local offset = 0
>>>> +    local easy_way = false
>>>> +    local y = 0
>>>> +    local M = 0
>>>> +    local d = 0
>>>> +    local ymd = false
>>>> +
>>>> +    local h = 0
>>>> +    local m = 0
>>>> +    local s = 0
>>>> +    local frac = 0
>>>> +    local hms = false
>>>> +
>>>> +    local dt = 0
>>>> +
>>>> +    local handlers = {
>>>
>>> 17. Please, stop using closures in the constructor of a
>>> potentionally very frequently created/deleted object >
>>
>> I still disagreed that datetime_new constructor will be on critical path, the most frequent case expected to be parse*() -> datetime_new_dt() -> datetime_new_raw(). i.e. when we parse millions of timestamps in logs and then proceed them.
>>
>> datetime_new(builder_object) expected to be called in rare cases when object got created from Lua code, not on performance critical paths, thus convenience here is more important than performance.
>>
>> And BTW I've created simple benchmark for comparing datetime_new_dt vs datetime_new(from builder object) vs datetime_new_2 which would be not using hash object with handlers, but rather sequence of ifs.
>>
>> https://gist.github.com/tsafin/31cc9b0872b6015904fcc90d97740770
>> ```
>>   ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3]
>> 00:39 $ ./build/src/tarantool ../bench-hash-dt.lua
>> datetime_new_dt 0.002718448638916
>> datetime_new    5.1318607330322
>> datetime_new(precreated)        4.4448664188385
>> datetime_new_2(ifs)     4.4977836608887
>> ```
>>
>> Note, that creation from builder object is much, much slower than raw cdata object creation (that's expected). If we move object out of loop, and precreate it then we save some time (as expected) 5.1 vs 4.4.
>> But if we change implementation from hash with closure handlers to sequence of ifs - we actually degrade performance (a bit) from 4.44 to 4.49.
>>
>> Biggest performance impact here - traversing of object attributes, but whether it's using closures or not using them, it makes no much difference.
>>
>> But using closures makes this code look much more elegant. So keeping it here for today.
> 
> I want to see the version you created with ifs, please. With the closures
> you do the same calculations + create the functions, it can't be slower
> unless something is wrong with the version you did with ifs.

I've provided you the link to this benchmark - please see gist 
https://gist.github.com/tsafin/31cc9b0872b6015904fcc90d97740770

But now looked into this code again, and discovered that I didn't call 
datetime_new_2 (version with ifs) in the last case, but rather 
copy-pasted datetime_new (original version), and timings should be 
actually the same as in precreated version, where we created object with 
attributes before loop, and then used it as invariant.

So here are local timings with corrected code. Ran 3 times:

```
✔ ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3|…1⚑ 3]
18:01 $ ./build/src/tarantool ../bench-hash-dt.lua
datetime_new_dt 0.001617431640625
datetime_new    3.0791389942169
datetime_new(precreated)        3.1896848678589
datetime_new_2(ifs)     0.15672922134399
✔ ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3|…1⚑ 3]
18:02 $ ./build/src/tarantool ../bench-hash-dt.lua
datetime_new_dt 0.0017032623291016
datetime_new    3.0739498138428
datetime_new(precreated)        2.9590804576874
datetime_new_2(ifs)     0.14570164680481
✔ ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3|…1⚑ 3]
18:12 $ ./build/src/tarantool ../bench-hash-dt.lua
datetime_new_dt 0.0016193389892578
datetime_new    3.4707424640656
datetime_new(precreated)        3.1298229694366
datetime_new_2(ifs)     0.14686107635498
```

So yes, you were correct, version with ifs is 20x times faster than more 
readable version using hash with closures.


> 
>>> 19. It can ignore some fields (here it ignores 'year'):
>>>
>>> tarantool> d1 = datetime({secs = 100, year = 1})
>>> ---
>>> ...
>>>
>>> tarantool> d2 = datetime({secs = 100, year = 2})
>>> ---
>>> ...
>>>
>>> tarantool> d1 == d2
>>> ---
>>> - true
>>> ...
>>>
>>
>> Yes, and that's by design. .secs defines 'easy_way' of initializing object via secs/nsec/offset, while .year selects 'ymd' mode. And `easy_way` has precedence. It looks reasonable for me, please, suggest your idea how to handle it even better without much redesign?
> 
> You could for instance, by looking at a first key you see in the dictionary,
> decide how will you parse the entire dict. If you see first 'secs', you parse
> only secs, nsecs and offset. Everything else raises an error. If you see
> first 'year', you raise an error for the other non-related keys.
> 
> Ignorance of some keys does not look acceptable.
> 
> And by the way, does it mean I can't specify a date using year, month, day,
> hours, and secs? Why isn't it valid?

Here is the tricky part, agreed that this may be confusing. The "easy 
mode" operates on secs, nsec, offset, the human-readable mode operates 
on year, month, day, hour, minute, **second**, and tz.

There is second in human-readable mode, it's rather named differently 
(i.e. complete word, non-plural).

The more important question though - how to implement reasonably 
diagnosing of such mix-in of raw attributes, with human-radable attributes?

- I tend to believe that we should **not accept them both** in the 
single function, but rather accept .secs/.nsec/.offset via different 
entries, like datetime_new_raw vs datetime_new

```
date = require('datetime')
T = date.new{year = 1970, month = 1, day = 1, minute = 0, seconds = 15, 
tz = 180}
T2 = date.new_raw{secs = 0, nsec = 0, offset = 180} -- will be the same 
as new(0, 0, 180)
```

In this case we physically prevent mixing of different modes with 
different allowed attributes in the builder object.

What do you think?


> 
>>> 20. Serialization does not seem to be working in the console
>>> despite you tried to implement it above:
>>>
>>> tarantool> d = datetime({secs = 100})
>>> ---
>>> ...
>>>
>>> tarantool> d
>>> ---
>>> - 'cdata<struct datetime_t>: 0x0103ad9c50'
>>> ...
>>
>> Yup, redefining of __serialize is not enough for cdata object by whatever reason (I didn't look deep). But after the next patch(es) it's start to stringize properly.
> 
> But don't you have __serialize implementation in this commit already?
> (I don't remember and didn't see the latest code yet.)

Yes, I did have __serialize there, but that was working strangely and 
only started to behave properly (well, more or less) once I've added 
__tostring in the following commit. I'm not very well informed about 
expected console behaviour (regarding those 2 meta-methods, at least), 
so took that for granted. Probably, there is some bug, may be it's 
expected behaviour for cdata based data structures?


> 
>>>> +            o = right_t == datetime_t and interval_new() or datetime_new()
>>>> +            o.secs, o.nsec = _normalize_nsec(lhs.secs - rhs.secs,
>>>> +                                            lhs.nsec - rhs.nsec)
>>>> +            return o
>>>> +        else
>>>> +            error_incompatible("operator -")
>>>> +        end
>>>> +    -- both left and right are generic intervals
>>>> +    elseif left_t == interval_t and right_t == interval_t then
>>>> +        o = interval_new()
>>>> +        o.secs, o.nsec = _normalize_nsec(lhs.secs - rhs.secs,
>>>> +                                        lhs.nsec - rhs.nsec)
>>>> +        return o
>>>> +    else
>>>> +        error_incompatible("operator -")
>>>> +    end
>>>> +end
>>>
>>> 22. Did you really test this function?
>>>
>>> tarantool> dt = require('datetime')
>>> ---
>>> ...
>>>
>>> tarantool> d1 = dt()
>>> ---
>>> ...
>>>
>>> tarantool> d2 - 1
>>> ---
>>> - error: 'builtin/datetime.lua:487: bad argument #1 to ''typeof'' (C type expected,
>>>       got number)'
>>> ...
>>
>> All kinds of arguments checking implemented in the separate patch about interval arithmetic operations, please find a moment to look into them.
> 
> Please, do not introduce broken functions. Make the correct ones right away and
> add tests. The new code does not need to introduce full functionality if it
> can be done in an atomic way, but it must not be visibly broken at least. Here
> it is.

Ok, here is how it's looking in patchset: in 1 commit we introduce 
arithmetic support, but not handle well all corner cases (especially 
there is no support for proper year or month arithmetics). Special types 
for year, and months date intervals have been added in the next commit, 
where we handle all corner cases with bogus data.

We could:
- either squash interval support to the original module;
- or remove arithmetic altogether from original patch creating the 
module datetime.lua.


What is the recommended way?

> 
>>>> +
>>>> +local function datetime_from(o)
>>>> +    if o == nil or type(o) == 'table' then
>>>> +        return datetime_new(o)
>>>> +    elseif type(o) == 'string' then
>>>> +        return parse(o)
>>>> +    end
>>>> +end
>>>> +
>>>> +local function local_now()
>>>> +    local p_tv = ffi.new('struct timeval [1]')
>>>> +    local rc = builtin.gettimeofday(p_tv, nil)
>>>
>>> 24. Why can't you use fiber time? It should be enough in
>>> nearly all usecases and is much faster. For real-real time
>>> talking directly to the kernel we have 'clock' Lua module
>>> already. You could check if results of clock functions can
>>> be passed into your module to create 'real' timestamps not
>>> depending on the eventloop.
>>
>> Fiber time, which actually return cached ev_rt_now, looks imprecise and depends on event loop circumstances, though eventually it calls the same clock_gettime(). So fiber-time is looking like a cached timestamp value from [hopefully, non-distant] past.
>>
>> I'd rather prefer to have more realtime values, and clock_realtime*/clock_monotonic* look relevant, they call clock_gettime() for their purposes, which in turn calls the same gettimeofday(). So in this case we call same api, but with 2 extra call layers above.
> 
> By using gettimeofday you drag a lot of cdef garbage including the function
> itself and what it returns. Please, don't. Anyway, the standard says struct
> timeval can have more members than the usually used seconds and microseconds:
> 
> 	https://pubs.opengroup.org/onlinepubs/7908799/xsh/systime.h.html
> 
> 	The <sys/time.h> header defines the timeval structure that includes
> 	**at least** the following members
> 
>> IMHO, not worth extra complexity if we could just simply call gettimeofday().
> 
> Now you added a lot of extra complexity exactly because of this function.

Once again, this code is in C now, there is no any non-portable timeval 
definition of ffi to gettimeofday in Lua - everything is in C, which 
extra complexity you are talking about? I do not understand.


> 
>>> 25. Functions like ctime can't be used in Lua, because according
>>> to their Linux man:
>>>
>>>      The return value points to a statically allocated string
>>>      which might be overwritten by subsequent calls to any of the
>>>      date and time functions
>>>
>>> Which means they use a global buffer. If you will
>>> use datetime module in any of __gc handlers, it will
>>> return garbage sometimes or just override results
>>> somethere. That is exactly the same problem as existed
>>> with tarantool's static buffer.
>>
>> I do not believe it's the same problem - in this case content of static buffer is immediately interned in Lua via ffi.string()
> 
> Well, it is exactly the same problem. How do you think the static
> buffer used to be copied to Lua until we purged it from everywhere? Using
> the same ffi.string(). Before ffi.string() will copy anything, it
> will start GC on demand, and inside of GC the string might be overwritten.
> So when GC ends, ffi.string() will copy garbage or some new foreign string.
> 
>> https://luajit.org/ext_ffi_api.html
>> "The Lua string is an (interned) copy of the data and bears no relation to the original data area anymore."
> 
> It does not say anything about when GC is started, and that was the
> problem with already 2 global buffers - our static one and one from
> luacjson library. I don't see how a third global buffer is different.
> It is broken in the same way precisely.
> 
> Look at this commit: acf8745ed8fef47e6d1f1c31708c7c9d6324d2f3
> ("uuid: drop tt_uuid_str() from Lua"). It is your case exactly.
> A global buffer is returned to Lua via ffi and then ffi.string()
> was called. There is a test which breaks when you do so. Strings
> become wrong and/or garbage.

Thanks for the hint, that's very nice test, and quite convincing! I've 
added datetime.ctime and datetime.asctime checkes there:
==========================================================

diff --git a/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua 
b/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua
index 84c2944e5..7eb05f9e8 100755
--- a/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua
+++ b/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua
@@ -14,6 +14,7 @@ local uri = require('uri')
  local json = require('json')
  local msgpackffi = require('msgpackffi')
  local decimal = require('decimal')
+local datetime = require('datetime')

  local function test_uuid(test)
      test:plan(1)
@@ -256,15 +257,86 @@ local function test_decimal(test)
      test:ok(is_success, 'decimal str in gc')
  end

+local function test_datetime_asctime(test)
+    test:plan(1)
+
+    local gc_count = 100
+    local iter_count = 1000
+    local is_success = true
+
+    local T1 = datetime('1970-01-01')
+    local T2 = datetime('2000-01-01')
+
+    local function datetime_asctime()
+        local str1 = datetime.asctime(T1)
+        local str2 = datetime.asctime(T2)
+        local str3 = datetime.asctime(T1)
+        local str4 = datetime.asctime(T2)
+        if str1 ~= str3 or str2 ~= str4 then
+            is_success = false
+        end
+    end
+
+    local function create_gc()
+        for _ = 1, gc_count do
+            ffi.gc(ffi.new('char[1]'), function() datetime_asctime() end)
+        end
+    end
+
+    for _ = 1, iter_count do
+        create_gc()
+        datetime_asctime()
+    end
+
+    test:ok(is_success, 'info datetime in gc')
+end
+
+local function test_datetime_ctime(test)
+    test:plan(1)
+
+    local gc_count = 100
+    local iter_count = 1000
+    local is_success = true
+
+    local T1 = datetime('1970-01-01')
+    local T2 = datetime('2000-01-01')
+
+    local function datetime_ctime()
+        local str1 = datetime.ctime(T1)
+        local str2 = datetime.ctime(T2)
+        local str3 = datetime.ctime(T1)
+        local str4 = datetime.ctime(T2)
+        if str1 ~= str3 or str2 ~= str4 then
+            is_success = false
+        end
+    end
+
+    local function create_gc()
+        for _ = 1, gc_count do
+            ffi.gc(ffi.new('char[1]'), function() datetime_ctime() end)
+        end
+    end
+
+    for _ = 1, iter_count do
+        create_gc()
+        datetime_ctime()
+    end
+
+    test:ok(is_success, 'info datetime in gc')
+end
+
+
  box.cfg{}

  local test = tap.test('gh-5632-6050-6259-gc-buf-reuse')
-test:plan(6)
+test:plan(8)
  test:test('uuid in __gc', test_uuid)
  test:test('uri in __gc', test_uri)
  test:test('msgpackffi in __gc', test_msgpackffi)
  test:test('json in __gc', test_json)
  test:test('info uuid in __gc', test_info_uuid)
  test:test('decimal str in __gc', test_decimal)
+test:test('datetime.asctime in __gc', test_datetime_asctime)
+test:test('datetime.ctime in __gc', test_datetime_ctime)

  os.exit(test:check() and 0 or 1)
==========================================================

And regardless of a number of times I ran it in Debug mode - it was 
never failing. But...

Once, I've recompiled branch in Release mode - it start to faile evrry 
next moment. I'll play with reentrant versions of ctime (ctime_r) and 
asctime (asctime_r), making sure that LuaJIT allocators will know about 
buffers passed to and returned from those functions.

> 
>>>> diff --git a/test/unit/datetime.c b/test/unit/datetime.c
>>>> new file mode 100644
>>>> index 000000000..b6f568c03
>>>> --- /dev/null
>>>> +++ b/test/unit/datetime.c
>>>
>>> 26. This file tests dt library. It does not test your code. Please, add
>>> tests for everything you implemented in this commit. For c-dt library
>>> the tests must be added in the previous commit, and the test file
>>> must be called test/unit/cdt.c then.
>>
>> Unit-test has been moved to the commit introducing c-dt. And in turn, initial datetime test (checking parser and ctime/asctime/now) squashed to the same commit where we introduce module with parser.
> 
> Thanks, that should be better.
> 

Thanks,
Timur