From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id A95206EC40; Sun, 8 Aug 2021 19:35:40 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org A95206EC40 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1628440540; bh=H86+v9dWfijoscMv1XfMkTOgc5onk5LfIDNaVJvbjRY=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=v1SbM05ZuNJbcro2hd6x/ORfGxT3JaWuT1lTKbVajoMIb1CwqE1OpyLU7q67r0a2l uk49F8ICs6N5nviDr/RuzbsPn+9QiaFKef2GL0DUAOVf7ZgkUP0l8IXtl2tsOATiOG jStYFJ0dALcZTLsQhbWZC7qxrADCZgw2fNt7tVPc= Received: from smtp46.i.mail.ru (smtp46.i.mail.ru [94.100.177.106]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id CED7A6EC40 for ; Sun, 8 Aug 2021 19:35:38 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org CED7A6EC40 Received: by smtp46.i.mail.ru with esmtpa (envelope-from ) id 1mCllx-0003GZ-8o; Sun, 08 Aug 2021 19:35:37 +0300 To: Vladislav Shpilevoy , olegrok@tarantool.org References: <449cb4c7-c738-ed07-8517-61b663ad9f79@tarantool.org> <0931c9db-bd8e-0687-2c96-73fac1aeff7e@tarantool.org> <18971894-830a-6bbd-a87b-83cd6abe5f34@tarantool.org> Message-ID: <3349aff9-a95c-cab5-4610-ca1089c1e762@tarantool.org> Date: Sun, 8 Aug 2021 19:35:35 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 In-Reply-To: <18971894-830a-6bbd-a87b-83cd6abe5f34@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD92087353F0EC44DD972FF4A7D76DB5E242D14FEF1BD8BF4AC182A05F53808504048BB584EB8D018B0BE56B3BB70B0C404225D6B3F3E2FD17548377CEF3DB2BBDA X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE737BB76880A4CA9A4EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637B100969577675F2D8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D814F2B2C49D6F635B50F8EB6A462D4601117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8BAA867293B0326636D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EEC65AC60A1F0286FE86A01F665E7D37C5D8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE3FE4D9CDE3FF759CFC0837EA9F3D19764C4224003CC836476EA7A3FFF5B025636E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F79006371F24DFF1B2961425731C566533BA786AA5CC5B56E945C8DA X-C1DE0DAB: 0D63561A33F958A55D7D8BC828009DA1FFBE7CF8E93F7FBA5576E9AB32BB968FD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75AF0B556A5A327A45410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34A150564C13026C18441E5DE0EC26F7644F712690F9DAC4B58A7D508039AB2C1BE571890392553B8A1D7E09C32AA3244C203E18D6DF5341CEB33AA646D3DF78D2C3B3ADDA61883BB583B48618A63566E0 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojMTMPlNJj3Sjwptmjlwt3UQ== X-Mailru-Sender: B5B6A6EBBD94DAD807A6B1700956AF3194A2131B5B5F9B276A011C8B4F5FF1CE1FE834A55A7494F41EC9E4A2C82A33BC8C24925A86E657CE0C70AEE3C9A96FBAB3D7EE8ED63280BE112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v3 2/9] lua: built-in module datetime X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Safin Timur via Tarantool-patches Reply-To: Safin Timur Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Much respect that you've found some time for review, even while being on vacation! Thanks! On 08.08.2021 14:26, Vladislav Shpilevoy wrote: >>>> +local datetime_index_handlers = { >>>> +    unixtime = function(self) >>>> +        return self.secs >>>> +    end, >>>> + >>>> +    timestamp = function(self) >>>> +        return tonumber(self.secs) + self.nsec / 1e9 >>> >>> 11. If you are saying the Lua number value range is enough, >>> why do you store them as 64bit integers in 'self' and >>> convert to a number only for the external API? >> >> May be I misunderstood your sentence, but let me elaborate here. >> I need seconds and nanoseconds kept separately for their efficient and precise handling, and for passing to c-dt. >> >> If we would keep 32-bit integers in seconds then we would be able to handle only dates upto 2038 year, thus we need 64-bit seconds for extended range. >> >> OTOH, not whole 64-bit range is practical, and required for expected in real-life datetime range values. It's not a problem that Lua number and int64_t have very different range for precise integer values. Lua number could handle integer values upto 9e15, which is corresponding to ... > > I know all that. The question is why do you keep storing cdata 64 bits numbers > inside of the datetime objects, if you convert them all to Lua numbers before > return? You could store self.secs as just number. And the other fields too. Lua > number is double, it does not loose precision for integers < 2^53, which should > be enough for the ranges you want to support. Correct? I keep using 64-bit because the primary code operating with fields is on C-side, and we need Lua number only on rare cases when user asked for composed attribute date.timestamp. Whenever we deal with seconds within arthmetic operations or transfer to c-dt, we need integer C type, larger than 32-bit. It's good fit for int64_t. > >>>> +local function datetime_new(o) >>>> +    if o == nil then >>>> +        return datetime_new_raw(0, 0, 0) >>>> +    end >>>> +    local secs = 0 >>>> +    local nsec = 0 >>>> +    local offset = 0 >>>> +    local easy_way = false >>>> +    local y = 0 >>>> +    local M = 0 >>>> +    local d = 0 >>>> +    local ymd = false >>>> + >>>> +    local h = 0 >>>> +    local m = 0 >>>> +    local s = 0 >>>> +    local frac = 0 >>>> +    local hms = false >>>> + >>>> +    local dt = 0 >>>> + >>>> +    local handlers = { >>> >>> 17. Please, stop using closures in the constructor of a >>> potentionally very frequently created/deleted object > >> >> I still disagreed that datetime_new constructor will be on critical path, the most frequent case expected to be parse*() -> datetime_new_dt() -> datetime_new_raw(). i.e. when we parse millions of timestamps in logs and then proceed them. >> >> datetime_new(builder_object) expected to be called in rare cases when object got created from Lua code, not on performance critical paths, thus convenience here is more important than performance. >> >> And BTW I've created simple benchmark for comparing datetime_new_dt vs datetime_new(from builder object) vs datetime_new_2 which would be not using hash object with handlers, but rather sequence of ifs. >> >> https://gist.github.com/tsafin/31cc9b0872b6015904fcc90d97740770 >> ``` >>  ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3] >> 00:39 $ ./build/src/tarantool ../bench-hash-dt.lua >> datetime_new_dt 0.002718448638916 >> datetime_new    5.1318607330322 >> datetime_new(precreated)        4.4448664188385 >> datetime_new_2(ifs)     4.4977836608887 >> ``` >> >> Note, that creation from builder object is much, much slower than raw cdata object creation (that's expected). If we move object out of loop, and precreate it then we save some time (as expected) 5.1 vs 4.4. >> But if we change implementation from hash with closure handlers to sequence of ifs - we actually degrade performance (a bit) from 4.44 to 4.49. >> >> Biggest performance impact here - traversing of object attributes, but whether it's using closures or not using them, it makes no much difference. >> >> But using closures makes this code look much more elegant. So keeping it here for today. > > I want to see the version you created with ifs, please. With the closures > you do the same calculations + create the functions, it can't be slower > unless something is wrong with the version you did with ifs. I've provided you the link to this benchmark - please see gist https://gist.github.com/tsafin/31cc9b0872b6015904fcc90d97740770 But now looked into this code again, and discovered that I didn't call datetime_new_2 (version with ifs) in the last case, but rather copy-pasted datetime_new (original version), and timings should be actually the same as in precreated version, where we created object with attributes before loop, and then used it as invariant. So here are local timings with corrected code. Ran 3 times: ``` ✔ ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3|…1⚑ 3] 18:01 $ ./build/src/tarantool ../bench-hash-dt.lua datetime_new_dt 0.001617431640625 datetime_new 3.0791389942169 datetime_new(precreated) 3.1896848678589 datetime_new_2(ifs) 0.15672922134399 ✔ ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3|…1⚑ 3] 18:02 $ ./build/src/tarantool ../bench-hash-dt.lua datetime_new_dt 0.0017032623291016 datetime_new 3.0739498138428 datetime_new(precreated) 2.9590804576874 datetime_new_2(ifs) 0.14570164680481 ✔ ~/datetime/tarantoolt [tsafin/gh-5941-datetime-v3|…1⚑ 3] 18:12 $ ./build/src/tarantool ../bench-hash-dt.lua datetime_new_dt 0.0016193389892578 datetime_new 3.4707424640656 datetime_new(precreated) 3.1298229694366 datetime_new_2(ifs) 0.14686107635498 ``` So yes, you were correct, version with ifs is 20x times faster than more readable version using hash with closures. > >>> 19. It can ignore some fields (here it ignores 'year'): >>> >>> tarantool> d1 = datetime({secs = 100, year = 1}) >>> --- >>> ... >>> >>> tarantool> d2 = datetime({secs = 100, year = 2}) >>> --- >>> ... >>> >>> tarantool> d1 == d2 >>> --- >>> - true >>> ... >>> >> >> Yes, and that's by design. .secs defines 'easy_way' of initializing object via secs/nsec/offset, while .year selects 'ymd' mode. And `easy_way` has precedence. It looks reasonable for me, please, suggest your idea how to handle it even better without much redesign? > > You could for instance, by looking at a first key you see in the dictionary, > decide how will you parse the entire dict. If you see first 'secs', you parse > only secs, nsecs and offset. Everything else raises an error. If you see > first 'year', you raise an error for the other non-related keys. > > Ignorance of some keys does not look acceptable. > > And by the way, does it mean I can't specify a date using year, month, day, > hours, and secs? Why isn't it valid? Here is the tricky part, agreed that this may be confusing. The "easy mode" operates on secs, nsec, offset, the human-readable mode operates on year, month, day, hour, minute, **second**, and tz. There is second in human-readable mode, it's rather named differently (i.e. complete word, non-plural). The more important question though - how to implement reasonably diagnosing of such mix-in of raw attributes, with human-radable attributes? - I tend to believe that we should **not accept them both** in the single function, but rather accept .secs/.nsec/.offset via different entries, like datetime_new_raw vs datetime_new ``` date = require('datetime') T = date.new{year = 1970, month = 1, day = 1, minute = 0, seconds = 15, tz = 180} T2 = date.new_raw{secs = 0, nsec = 0, offset = 180} -- will be the same as new(0, 0, 180) ``` In this case we physically prevent mixing of different modes with different allowed attributes in the builder object. What do you think? > >>> 20. Serialization does not seem to be working in the console >>> despite you tried to implement it above: >>> >>> tarantool> d = datetime({secs = 100}) >>> --- >>> ... >>> >>> tarantool> d >>> --- >>> - 'cdata: 0x0103ad9c50' >>> ... >> >> Yup, redefining of __serialize is not enough for cdata object by whatever reason (I didn't look deep). But after the next patch(es) it's start to stringize properly. > > But don't you have __serialize implementation in this commit already? > (I don't remember and didn't see the latest code yet.) Yes, I did have __serialize there, but that was working strangely and only started to behave properly (well, more or less) once I've added __tostring in the following commit. I'm not very well informed about expected console behaviour (regarding those 2 meta-methods, at least), so took that for granted. Probably, there is some bug, may be it's expected behaviour for cdata based data structures? > >>>> +            o = right_t == datetime_t and interval_new() or datetime_new() >>>> +            o.secs, o.nsec = _normalize_nsec(lhs.secs - rhs.secs, >>>> +                                            lhs.nsec - rhs.nsec) >>>> +            return o >>>> +        else >>>> +            error_incompatible("operator -") >>>> +        end >>>> +    -- both left and right are generic intervals >>>> +    elseif left_t == interval_t and right_t == interval_t then >>>> +        o = interval_new() >>>> +        o.secs, o.nsec = _normalize_nsec(lhs.secs - rhs.secs, >>>> +                                        lhs.nsec - rhs.nsec) >>>> +        return o >>>> +    else >>>> +        error_incompatible("operator -") >>>> +    end >>>> +end >>> >>> 22. Did you really test this function? >>> >>> tarantool> dt = require('datetime') >>> --- >>> ... >>> >>> tarantool> d1 = dt() >>> --- >>> ... >>> >>> tarantool> d2 - 1 >>> --- >>> - error: 'builtin/datetime.lua:487: bad argument #1 to ''typeof'' (C type expected, >>>      got number)' >>> ... >> >> All kinds of arguments checking implemented in the separate patch about interval arithmetic operations, please find a moment to look into them. > > Please, do not introduce broken functions. Make the correct ones right away and > add tests. The new code does not need to introduce full functionality if it > can be done in an atomic way, but it must not be visibly broken at least. Here > it is. Ok, here is how it's looking in patchset: in 1 commit we introduce arithmetic support, but not handle well all corner cases (especially there is no support for proper year or month arithmetics). Special types for year, and months date intervals have been added in the next commit, where we handle all corner cases with bogus data. We could: - either squash interval support to the original module; - or remove arithmetic altogether from original patch creating the module datetime.lua. What is the recommended way? > >>>> + >>>> +local function datetime_from(o) >>>> +    if o == nil or type(o) == 'table' then >>>> +        return datetime_new(o) >>>> +    elseif type(o) == 'string' then >>>> +        return parse(o) >>>> +    end >>>> +end >>>> + >>>> +local function local_now() >>>> +    local p_tv = ffi.new('struct timeval [1]') >>>> +    local rc = builtin.gettimeofday(p_tv, nil) >>> >>> 24. Why can't you use fiber time? It should be enough in >>> nearly all usecases and is much faster. For real-real time >>> talking directly to the kernel we have 'clock' Lua module >>> already. You could check if results of clock functions can >>> be passed into your module to create 'real' timestamps not >>> depending on the eventloop. >> >> Fiber time, which actually return cached ev_rt_now, looks imprecise and depends on event loop circumstances, though eventually it calls the same clock_gettime(). So fiber-time is looking like a cached timestamp value from [hopefully, non-distant] past. >> >> I'd rather prefer to have more realtime values, and clock_realtime*/clock_monotonic* look relevant, they call clock_gettime() for their purposes, which in turn calls the same gettimeofday(). So in this case we call same api, but with 2 extra call layers above. > > By using gettimeofday you drag a lot of cdef garbage including the function > itself and what it returns. Please, don't. Anyway, the standard says struct > timeval can have more members than the usually used seconds and microseconds: > > https://pubs.opengroup.org/onlinepubs/7908799/xsh/systime.h.html > > The header defines the timeval structure that includes > **at least** the following members > >> IMHO, not worth extra complexity if we could just simply call gettimeofday(). > > Now you added a lot of extra complexity exactly because of this function. Once again, this code is in C now, there is no any non-portable timeval definition of ffi to gettimeofday in Lua - everything is in C, which extra complexity you are talking about? I do not understand. > >>> 25. Functions like ctime can't be used in Lua, because according >>> to their Linux man: >>> >>>     The return value points to a statically allocated string >>>     which might be overwritten by subsequent calls to any of the >>>     date and time functions >>> >>> Which means they use a global buffer. If you will >>> use datetime module in any of __gc handlers, it will >>> return garbage sometimes or just override results >>> somethere. That is exactly the same problem as existed >>> with tarantool's static buffer. >> >> I do not believe it's the same problem - in this case content of static buffer is immediately interned in Lua via ffi.string() > > Well, it is exactly the same problem. How do you think the static > buffer used to be copied to Lua until we purged it from everywhere? Using > the same ffi.string(). Before ffi.string() will copy anything, it > will start GC on demand, and inside of GC the string might be overwritten. > So when GC ends, ffi.string() will copy garbage or some new foreign string. > >> https://luajit.org/ext_ffi_api.html >> "The Lua string is an (interned) copy of the data and bears no relation to the original data area anymore." > > It does not say anything about when GC is started, and that was the > problem with already 2 global buffers - our static one and one from > luacjson library. I don't see how a third global buffer is different. > It is broken in the same way precisely. > > Look at this commit: acf8745ed8fef47e6d1f1c31708c7c9d6324d2f3 > ("uuid: drop tt_uuid_str() from Lua"). It is your case exactly. > A global buffer is returned to Lua via ffi and then ffi.string() > was called. There is a test which breaks when you do so. Strings > become wrong and/or garbage. Thanks for the hint, that's very nice test, and quite convincing! I've added datetime.ctime and datetime.asctime checkes there: ========================================================== diff --git a/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua b/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua index 84c2944e5..7eb05f9e8 100755 --- a/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua +++ b/test/app-tap/gh-5632-6050-6259-gc-buf-reuse.test.lua @@ -14,6 +14,7 @@ local uri = require('uri') local json = require('json') local msgpackffi = require('msgpackffi') local decimal = require('decimal') +local datetime = require('datetime') local function test_uuid(test) test:plan(1) @@ -256,15 +257,86 @@ local function test_decimal(test) test:ok(is_success, 'decimal str in gc') end +local function test_datetime_asctime(test) + test:plan(1) + + local gc_count = 100 + local iter_count = 1000 + local is_success = true + + local T1 = datetime('1970-01-01') + local T2 = datetime('2000-01-01') + + local function datetime_asctime() + local str1 = datetime.asctime(T1) + local str2 = datetime.asctime(T2) + local str3 = datetime.asctime(T1) + local str4 = datetime.asctime(T2) + if str1 ~= str3 or str2 ~= str4 then + is_success = false + end + end + + local function create_gc() + for _ = 1, gc_count do + ffi.gc(ffi.new('char[1]'), function() datetime_asctime() end) + end + end + + for _ = 1, iter_count do + create_gc() + datetime_asctime() + end + + test:ok(is_success, 'info datetime in gc') +end + +local function test_datetime_ctime(test) + test:plan(1) + + local gc_count = 100 + local iter_count = 1000 + local is_success = true + + local T1 = datetime('1970-01-01') + local T2 = datetime('2000-01-01') + + local function datetime_ctime() + local str1 = datetime.ctime(T1) + local str2 = datetime.ctime(T2) + local str3 = datetime.ctime(T1) + local str4 = datetime.ctime(T2) + if str1 ~= str3 or str2 ~= str4 then + is_success = false + end + end + + local function create_gc() + for _ = 1, gc_count do + ffi.gc(ffi.new('char[1]'), function() datetime_ctime() end) + end + end + + for _ = 1, iter_count do + create_gc() + datetime_ctime() + end + + test:ok(is_success, 'info datetime in gc') +end + + box.cfg{} local test = tap.test('gh-5632-6050-6259-gc-buf-reuse') -test:plan(6) +test:plan(8) test:test('uuid in __gc', test_uuid) test:test('uri in __gc', test_uri) test:test('msgpackffi in __gc', test_msgpackffi) test:test('json in __gc', test_json) test:test('info uuid in __gc', test_info_uuid) test:test('decimal str in __gc', test_decimal) +test:test('datetime.asctime in __gc', test_datetime_asctime) +test:test('datetime.ctime in __gc', test_datetime_ctime) os.exit(test:check() and 0 or 1) ========================================================== And regardless of a number of times I ran it in Debug mode - it was never failing. But... Once, I've recompiled branch in Release mode - it start to faile evrry next moment. I'll play with reentrant versions of ctime (ctime_r) and asctime (asctime_r), making sure that LuaJIT allocators will know about buffers passed to and returned from those functions. > >>>> diff --git a/test/unit/datetime.c b/test/unit/datetime.c >>>> new file mode 100644 >>>> index 000000000..b6f568c03 >>>> --- /dev/null >>>> +++ b/test/unit/datetime.c >>> >>> 26. This file tests dt library. It does not test your code. Please, add >>> tests for everything you implemented in this commit. For c-dt library >>> the tests must be added in the previous commit, and the test file >>> must be called test/unit/cdt.c then. >> >> Unit-test has been moved to the commit introducing c-dt. And in turn, initial datetime test (checking parser and ctime/asctime/now) squashed to the same commit where we introduce module with parser. > > Thanks, that should be better. > Thanks, Timur