[Tarantool-patches] [PATCH resend v2 02/11] lua: built-in module datetime

Timur Safin tsafin at tarantool.org
Sun Aug 1 23:23:32 MSK 2021


Hello Vlad,

Ok, ok, despite of 20 years (!!) long personal tradition (which I've 
learned from Larry Wall email style, e.g. https://www.nntp.perl.org/group/perl.perl5.porters/1999/09/msg444.html),
I'm finally changing my citation style back to the usual `>`, so your
mail client will be happy again. 

> From: Vladislav Shpilevoy <v.shpilevoy at tarantool.org>
> Subject: Re: [PATCH resend v2 02/11] lua: built-in module datetime
> 
> Hi! Thanks for the fixes! Although I would prefer you
> sending updates right under each non-trivial comment and
> pushing the changes. Otherwise I can't check if you really
> fixed the comments and if the fixes are better than it was
> before.

I understand, but hard to follow (at least yet for me) unless
I've reshuffled all changes back to the reasonable set of changes,
which I'm ok to show indefinitely attached to the GitHub issue page.
Personally I hate when thousands of temporary changes got attached 
forever to the issue, it's hard then to find the relevant ones.

> In all your responses the previous email's text is prefixed with ':' sign
> unlike it is done in other people's responses. In my email client it turns
> the
> email into a mess where I can't see where is the new text. Could you
> possibly
> start using '>' prefix?

Now, I've switched. 

> 
> > : > + */
> > : > +struct datetime_t {
> > : > +	int64_t secs;	///< seconds since epoch
> > : > +	int32_t nsec;	///< nanoseconds if any
> > : > +	int32_t offset; ///< offset in minutes from GMT
> > :
> > : 6. We never use // nor /// nor ///< nor we write comments
> > : on the same line as the code (except for old legacy code we
> > : inherited from sqlite and some super old tarantool code.
> >
> > BTW, why? Single line comment // is the legal C construction
> > since C99 (and always have been acceptable by Gnu C and major
> > other compilers).
> >
> > Also did you know that ///< is doxygen construction for
> > documentation of a struct member? (Asking, just to make sure)
> 
> The fact that something is a legal C construction does not
> mean it fits our code style. For doxygen in our rules we have
> /** as a comment opening.

Ok, ok, I've switched back to /**< ... */ here

> 
> > :
> > : The same for datetime_interval_t.
> > :
> > : > +};
> > : > +
> > : > +/**
> > : > + * Date/time delta structure
> > : > + */
> > : > +struct datetime_interval_t {
> > : > +	int64_t secs; ///< relative seconds delta
> > : > +	int32_t nsec; ///< nanoseconds delta
> > : > +};
> > : > +
> > :
> > : 7. Why do you need this file? It is not included anywhere.
> > : And you don't need to define the structs in C if you are
> > : using them in Lua only. You can just define them in Lua
> > : using ffi.cdef like it is done in some other places.
> >
> > I'll need to use it in C very soon, once I get to SQL parser
> > INTERVAL 'xx' support.
> 
> You will need to introduce it in the same commit in which you
> will use it. We do not split patches into commits just to make
> them smaller. The split needs to make sense in terms of
> atomicity.
> 

Ok


> > : > +local datetime_mt = {
> > : > +    -- __tostring = datetime_tostring,
> > :
> > : 16. Why is it commented out? Is it even tested then?
> >
> > It's implemented in the later patch in patchset series.
> 
> Well, then it should not be here in this commit.
> 
> > : > +    microseconds = function(self)
> > : > +        return tonumber(self.secs*1e6 + self.nsec*1e3)
> > :
> > : 18. Please, add whitespaces around binary operators.
> >
> > Done! It was already such in the later commits which were part
> > of in patchset. [Yes, apparently I need to squash few more commits,
> > as Oleg Babin has asked elsewhere]
> 
> It is not just about squashes. It is only about atomicity (and
> as a consequence - review simplicity). You can't just squash all
> the commits into one. You will need to change the older commits
> to make them right, then rebase the newer commits.

I see.

BTW, what is the simplest approach here you could recommend?
I mean what tools do you use for shuffling code between version layers?
How to make it simple enough and painless? (I'd love to see training 
Here - I know how to do it the harder way in interactive rebase, but
I suspect there are some helpers, which I missed, which allow to 
Make it easy way. If any?)

> 
> > :
> > : 19. Why do you do tonumber() for all results? You are loosing
> > : precision for big values. The same in all the places where you
> > : use tonumber().
> >
> > To be able to use it in places which expect numbers like %d in
> > string.format.
> 
> That does not look like a good reason - you loose precision just
> to simplify the logs. Better make the logs use tonumber() than
> make the functions return imprecise results.

FWIW, maximal integer value, which may be precisely saved to Lua 
Number, is ~9e15, which, if being assigned to datetime `.secs`, 
could represent this date:

	tarantool> T = date.new()
	---
	...
	
	tarantool> M = 9e15
	---
	...
	
	tarantool> M
	---
	- 9000000000000000
	...
	
	tarantool> T.secs = M
	---
	...
	
	tarantool> T
	---
	- 2979311-04-03T16:00Z
	...

It's _well beyond_ currently planned supported date range.

So better ergonomics while working with those calculated attributed
does deserve some extra cast upon return of a value. We do not loss
any precision if we work with expected date range.

At the moment the claimed supported date range is [0000...9999] for
Lua runtime, and is narrower [208BC...4147] if you need to store value,
and keep it correctly sorted in index.

> 
> > Though at the end of day I've came up to the version
> > where I convert to Lua numbers not all (potentially fraction point)
> > values, but with exception of nanoseconds / microseconds / milliseconds
> > (which return integer values).
> >
> > :
> > : > +    end,
> > : > +
> > :
> > : 21. Extra empty line.
> >
> > Yup, but later, well, you know ...
> 
> No, I couldn't understand. What 'later'?

That was referring to some later commit. 

> 
> > :
> > : > +}
> > : > +
> > : > +local interval_mt = {
> > : > +    -- __tostring = interval_tostring,
> > :
> > : 22. Ditto. Why is it commented out and not tested?
> >
> > Tostring conversion was implemented later in a few
> > commits. One for datetime stringization, and another
> > one for interval.
> 
> Then it should be in the corresponding commit.

Yup.

> 
> > :
> > : > +    __serialize = interval_serialize,
> > : > +    __eq = datetime_eq,
> > : > +    __lt = datetime_lt,
> > : > +    __le = datetime_le,
> > : > +}
> > : > +
> > : > +local function datetime_new_raw(secs, nsec, offset)
> > : > +    local dt_obj = ffi.new(datetime_t)
> > : > +    dt_obj.secs = secs
> > : > +    dt_obj.nsec = nsec
> > : > +    dt_obj.offset = offset
> > : > +    return dt_obj
> > : > +end
> > : > +
> > : > +local function local_rd(o)
> > :
> > : 23. What is 'rd' and 'dt'?
> >
> > `dt` is type - data type from c-dt (alias to signed integer).
> > Which shows a number of dates since Rata Die date (0001-01-01).
> > It's signed 32-bit value, thus could represent huge range before
> > 0001-01-01 as well.
> 
> You didn't say what is 'rd'.

Rata Die date.

> 
> > :
> > : > +            end,
> > :
> > : 27. The usage of closures here might render all your FFI efforts
> > : useless, killing the performance. Please, try to define all
> > : methods of all objects only once in the root namespace of the
> > : file. Closure usage might be justified only for rarely created
> > : long living heavy objects like netbox.
> >
> > Agreed that creating closures in each iteration of a loop was
> > bad idea (as Oleg Babin has already pin-pointed it elsewhere).
> 
> It is not only about in the loop. It is about having closures at all.
> I don't see why are they necessary here. It is not a complex state
> machine like netbox which needs closures. Please, rework the function
> not to have any closures in them.

datetime_new() is a good example of a function using closures
called via tag attributes names, where I do need to have access 
to the locals from outer function, i.e. easy_way, ymd, hms variables
which defines submode. 
Converting this code to the series of ifs may make dispatch
slower (and less elegant), and after change which we have discussed 
with Oleg Babin elsewhere, when table creation as loop invariant
moved to the module scope of above of a loop, then performance
impact becoming negligible. While still keeping code ergonomics.

Here is the deal - let me send fully feature complete patchset
still using closures at a couple of places, we proceed with review
before feature freeze, then we return back to performance/closure
optimizations, but as a follow-up commit?

> 
> > : > +
> > : > +--[[
> > : > +    Basic      Extended
> > : > +    20121224   2012-12-24   Calendar date   (ISO 8601)
> > : > +    2012359    2012-359     Ordinal date    (ISO 8601)
> > : > +    2012W521   2012-W52-1   Week date       (ISO 8601)
> > : > +    2012Q485   2012-Q4-85   Quarter date
> > : > +]]
> > : > +
> > : > +local function parse_date(str)
> > : > +    local dt = ffi.new('dt_t[1]')
> > : > +    local rc = cdt.dt_parse_iso_date(str, #str, dt)
> > : > +    assert(rc > 0)
> > :
> > : 29. Er ... And what if I pass a string in a wrong format? Users
> > : should not get assertions.
> > :
> > : 	tarantool> datetime.parse_date('121r1 r1r 13 13 13')
> > : 	---
> > : 	- error: 'builtin/datetime.lua:397: assertion failed!'
> > : 	...
> > :
> > : This shall not ever happen. Would you mind implementing proper
> > : error handling. The same in all the places where the errors are
> > : ignored.
> >
> >
> > Now, I've reworked approach how they handle errors (you could see
> > It in later commits in the patch-set), and now it returns tuple -
> > - parsed value
> > - and the number of accepted input characters.
> >
> > 	tarantool> date.parse_date('121r1 r1r 13 13 13')
> > 	---
> > 	- null
> > 	- 0
> > 	...
> >
> > Sometimes, while parsing composed strings using these partial
> > methods (parse_date / parse_time / parse_time_zone) we need to
> > know the number of symbols we have successfully accepted. That's
> > why there is seconds value in returned tuple.
> >
> > Please give me know if there is better/more idiomatic approach
> > used for such scenarios?
> 
> In the new code which is a part of the core we use exceptions.
> https://github.com/tarantool/doc/issues/1727
> 
> But I see it is still not documented, so for me 'null, pos' look
> fine so far.

Nice!

I still need a time top move portions of a code between different 
patches in a new version of patchset, but I definitely plan to send
it later tonight. Stay tuned!

Thanks,
Timur



More information about the Tarantool-patches mailing list