[Tarantool-patches] [PATCH resend v2 02/11] lua: built-in module datetime

Sun Aug 1 20:01:09 MSK 2021

Hi! Thanks for the fixes! Although I would prefer you
sending updates right under each non-trivial comment and
pushing the changes. Otherwise I can't check if you really
fixed the comments and if the fixes are better than it was
before.

On 30.07.2021 17:39, Timur Safin wrote:
> : From: Vladislav Shpilevoy <v.shpilevoy at tarantool.org>
> : Subject: Re: [PATCH resend v2 02/11] lua: built-in module datetime
> : 
> : Thanks for the patch!
> : 
> : This is a bad question to ask now, but why did you choose a dead library
> : from 2015 with just 25 stars, without any tests, and without any README as
> : a basic block for our datetime types? Is this the correct link I am
> : using - https://github.com/chansen/c-dt?

In all your responses the previous email's text is prefixed with ':' sign
unlike it is done in other people's responses. In my email client it turns the
email into a mess where I can't see where is the new text. Could you possibly
start using '>' prefix?

> : > + */
> : > +struct datetime_t {
> : > +	int64_t secs;	///< seconds since epoch
> : > +	int32_t nsec;	///< nanoseconds if any
> : > +	int32_t offset; ///< offset in minutes from GMT
> : 
> : 6. We never use // nor /// nor ///< nor we write comments
> : on the same line as the code (except for old legacy code we
> : inherited from sqlite and some super old tarantool code.
> 
> BTW, why? Single line comment // is the legal C construction 
> since C99 (and always have been acceptable by Gnu C and major 
> other compilers). 
> 
> Also did you know that ///< is doxygen construction for 
> documentation of a struct member? (Asking, just to make sure)

The fact that something is a legal C construction does not
mean it fits our code style. For doxygen in our rules we have
/** as a comment opening.

> : 
> : The same for datetime_interval_t.
> : 
> : > +};
> : > +
> : > +/**
> : > + * Date/time delta structure
> : > + */
> : > +struct datetime_interval_t {
> : > +	int64_t secs; ///< relative seconds delta
> : > +	int32_t nsec; ///< nanoseconds delta
> : > +};
> : > +
> : 
> : 7. Why do you need this file? It is not included anywhere.
> : And you don't need to define the structs in C if you are
> : using them in Lua only. You can just define them in Lua
> : using ffi.cdef like it is done in some other places.
> 
> I'll need to use it in C very soon, once I get to SQL parser
> INTERVAL 'xx' support.

You will need to introduce it in the same commit in which you
will use it. We do not split patches into commits just to make
them smaller. The split needs to make sense in terms of
atomicity.

> : > +local datetime_mt = {
> : > +    -- __tostring = datetime_tostring,
> : 
> : 16. Why is it commented out? Is it even tested then?
> 
> It's implemented in the later patch in patchset series.

Well, then it should not be here in this commit.

> : > +    microseconds = function(self)
> : > +        return tonumber(self.secs*1e6 + self.nsec*1e3)
> : 
> : 18. Please, add whitespaces around binary operators.
> 
> Done! It was already such in the later commits which were part
> of in patchset. [Yes, apparently I need to squash few more commits,
> as Oleg Babin has asked elsewhere]

It is not just about squashes. It is only about atomicity (and
as a consequence - review simplicity). You can't just squash all
the commits into one. You will need to change the older commits
to make them right, then rebase the newer commits.

> : 
> : 19. Why do you do tonumber() for all results? You are loosing
> : precision for big values. The same in all the places where you
> : use tonumber().
> 
> To be able to use it in places which expect numbers like %d in
> string.format.

That does not look like a good reason - you loose precision just
to simplify the logs. Better make the logs use tonumber() than
make the functions return imprecise results.

> Though at the end of day I've came up to the version
> where I convert to Lua numbers not all (potentially fraction point) 
> values, but with exception of nanoseconds / microseconds / milliseconds
> (which return integer values). 
> 
> : 
> : > +    end,
> : > +
> : 
> : 21. Extra empty line.
> 
> Yup, but later, well, you know ...

No, I couldn't understand. What 'later'?

> : 
> : > +}
> : > +
> : > +local interval_mt = {
> : > +    -- __tostring = interval_tostring,
> : 
> : 22. Ditto. Why is it commented out and not tested?
> 
> Tostring conversion was implemented later in a few 
> commits. One for datetime stringization, and another 
> one for interval.

Then it should be in the corresponding commit.

> : 
> : > +    __serialize = interval_serialize,
> : > +    __eq = datetime_eq,
> : > +    __lt = datetime_lt,
> : > +    __le = datetime_le,
> : > +}
> : > +
> : > +local function datetime_new_raw(secs, nsec, offset)
> : > +    local dt_obj = ffi.new(datetime_t)
> : > +    dt_obj.secs = secs
> : > +    dt_obj.nsec = nsec
> : > +    dt_obj.offset = offset
> : > +    return dt_obj
> : > +end
> : > +
> : > +local function local_rd(o)
> : 
> : 23. What is 'rd' and 'dt'?
> 
> `dt` is type - data type from c-dt (alias to signed integer).
> Which shows a number of dates since Rata Die date (0001-01-01).
> It's signed 32-bit value, thus could represent huge range before 
> 0001-01-01 as well. 

You didn't say what is 'rd'.

> : 
> : > +            end,
> : 
> : 27. The usage of closures here might render all your FFI efforts
> : useless, killing the performance. Please, try to define all
> : methods of all objects only once in the root namespace of the
> : file. Closure usage might be justified only for rarely created
> : long living heavy objects like netbox.
> 
> Agreed that creating closures in each iteration of a loop was
> bad idea (as Oleg Babin has already pin-pointed it elsewhere).

It is not only about in the loop. It is about having closures at all.
I don't see why are they necessary here. It is not a complex state
machine like netbox which needs closures. Please, rework the function
not to have any closures in them.

> : > +
> : > +--[[
> : > +    Basic      Extended
> : > +    20121224   2012-12-24   Calendar date   (ISO 8601)
> : > +    2012359    2012-359     Ordinal date    (ISO 8601)
> : > +    2012W521   2012-W52-1   Week date       (ISO 8601)
> : > +    2012Q485   2012-Q4-85   Quarter date
> : > +]]
> : > +
> : > +local function parse_date(str)
> : > +    local dt = ffi.new('dt_t[1]')
> : > +    local rc = cdt.dt_parse_iso_date(str, #str, dt)
> : > +    assert(rc > 0)
> : 
> : 29. Er ... And what if I pass a string in a wrong format? Users
> : should not get assertions.
> : 
> : 	tarantool> datetime.parse_date('121r1 r1r 13 13 13')
> : 	---
> : 	- error: 'builtin/datetime.lua:397: assertion failed!'
> : 	...
> : 
> : This shall not ever happen. Would you mind implementing proper
> : error handling. The same in all the places where the errors are
> : ignored.
> 
> 
> Now, I've reworked approach how they handle errors (you could see
> It in later commits in the patch-set), and now it returns tuple - 
> - parsed value
> - and the number of accepted input characters.
> 
> 	tarantool> date.parse_date('121r1 r1r 13 13 13')
> 	---
> 	- null
> 	- 0
> 	...
> 
> Sometimes, while parsing composed strings using these partial
> methods (parse_date / parse_time / parse_time_zone) we need to
> know the number of symbols we have successfully accepted. That's
> why there is seconds value in returned tuple.
> 
> Please give me know if there is better/more idiomatic approach
> used for such scenarios?

In the new code which is a part of the core we use exceptions.
https://github.com/tarantool/doc/issues/1727

But I see it is still not documented, so for me 'null, pos' look
fine so far.