From: Sergey Kaplun <skaplun@tarantool.org> To: imeevma@tarantool.org Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH v1 1/1] misc: introduce misc.tonumber64() Date: Wed, 25 Nov 2020 22:11:51 +0300 [thread overview] Message-ID: <20201125191151.GA13309@root> (raw) In-Reply-To: <d7e33570be67b846c72d18da046b7a73929e054e.1605519162.git.imeevma@gmail.com> Mergen, Thanks for the patch! Please consider my comments below. Please use "luajit" prefix in the subject for LuaJIT-related patches in the future (see that patch series [1] for example). It's not obligatory, just looks more convenient. On 16.11.20, imeevma@tarantool.org wrote: > This patch introduces function misc.tonumber64() that accepts number, > cdata number or string and returns 64-bit integer value. > > Part of tarantool/tarantool#4738 > --- It is totally unclear to me what is the behaviour of types different from `CTID_INT64` and `CTID_UINT64`. I suppose this fogged zone should be well described in commit message and documentation. > https://github.com/tarantool/tarantool/issues/4738 > https://github.com/tarantool/luajit/tree/imeevma/gh-4738-introduce-misc-tonumber64 > > src/lib_misc.c | 173 ++++++++++++++++++ > ...-4738-incorrect-result-tonumber64.test.lua | 38 ++++ > 2 files changed, 211 insertions(+) > create mode 100755 test/gh-4738-incorrect-result-tonumber64.test.lua > > diff --git a/src/lib_misc.c b/src/lib_misc.c I observe these warnings at compile, please fix them in the next version: | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:203:38: warning: 'val_u' may be used uninitialized in this function [-Wmaybe-uninitialized] | 203 | *(uint64_t *) cdataptr(cd) = val_u; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ | lib_misc.c:205:37: warning: 'val_i' may be used uninitialized in this function [-Wmaybe-uninitialized] | 205 | *(int64_t *) cdataptr(cd) = val_i; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:203:38: warning: 'val_u' may be used uninitialized in this function [-Wmaybe-uninitialized] | 203 | *(uint64_t *) cdataptr(cd) = val_u; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ | lib_misc.c:205:37: warning: 'val_i' may be used uninitialized in this function [-Wmaybe-uninitialized] | 205 | *(int64_t *) cdataptr(cd) = val_i; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ I've compiled with the following command: | make -j CCDEBUG=" -g -ggdb3" XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT" Also, with non-debug build I see the following errors: | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:100:5: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration] | 100 | assert(size != CTSIZE_INVALID); | | ^~~~~~ | lib_misc.c:25:1: note: 'assert' is defined in header '<assert.h>'; did you forget to '#include <assert.h>'? | 24 | #include "lj_state.h" | +++ |+#include <assert.h> | 25 | | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:100:5: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration] | 100 | assert(size != CTSIZE_INVALID); | | ^~~~~~ | lib_misc.c:25:1: note: 'assert' is defined in header '<assert.h>'; did you forget to '#include <assert.h>'? | 24 | #include "lj_state.h" | +++ |+#include <assert.h> | 25 | I've compiled with the following command: | make -j Please use `lua_assert()` instead of `assert()` inside LuaJIT code. I suppose that should partially fix red CI [2]. > index 6f7b9a9..4228836 100644 > --- a/src/lib_misc.c > +++ b/src/lib_misc.c > @@ -10,11 +10,18 @@ > > #include "lua.h" > #include "lmisclib.h" > +#include "lauxlib.h" > +#include <errno.h> > +#include <ctype.h> Please, include system-wide headers before local ones. > > #include "lj_obj.h" > #include "lj_str.h" > #include "lj_tab.h" > #include "lj_lib.h" > +#include "lj_err.h" > +#include "lj_ctype.h" > +#include "lj_cdata.h" Usually these two headers includes only if LuaJIT build with FFI. Please add | #if LJ_HASFFI directive. > +#include "lj_state.h" Please, update <Makefile.dep> with these headers and their dependencies recursively. > > /* ------------------------------------------------------------------------ */ > > @@ -63,6 +70,172 @@ LJLIB_CF(misc_getmetrics) > return 1; > } > > +LJLIB_CF(misc_tonumber64) What is the function's behaviour when LuaJIT build without FFI? > +{ > + TValue *o = L->base; > + if (o >= L->top) > + lj_err_arg(L, 1, LJ_ERR_NOVAL); It seems like `lj_lib_checkany()` does what you want here, doesn't it? > + int base = luaL_optint(L, 2, -1); Please declare variables at the beginning of a block here and below, considering LuaJIT code style (you can use these flags | -Wall -Wextra -Wdeclaration-after-statement -Wredundant-decls -Wshadow -Wpointer-arith to check yourself, see <src/Makefile>). Why do you use the public interface instead `lj_lib_optint()`? > + luaL_argcheck(L, (2 <= base && base <= 36) || base == -1, 2, > + "base out of range"); Let's use quarter-fake tab here. See <lib_base.c> indentation style, for example. Also, you may use `lj_err_arg()` here. > + uint32_t ctypeid = 0; > + GCcdata *cd; > + CTSize size; > + CTState *cts; > + double val_d; AFAICS, `val_d` value uses only inside `LUA_TNUMBER` case-block and can be declared inside it. Also, I see no reason to use `double` instead `lua_Number` here. > + switch (lua_type(L, 1)) { > + case LUA_TNUMBER: > + base = (base == -1 ? 10 : base); > + if (base != 10) > + return luaL_argerror(L, 1, "string expected"); Nit: AFAICS here and below we expect string if `base != 10`. IMHO, code structure like this a little bit clearer: | const int32_t base = lj_lib_optint(L, 2, 10); | if (base == 10) { | TValue *o = lj_lib_checkany(L, 1); | /* ... */ | } else { | const char *arg = strdata(lj_lib_checkstr(L, 1)); | /* ... */ | Feel free to ignore. > + val_d = numV(L->base); > + if (val_d < (double)INT64_MIN || val_d >= (double)UINT64_MAX) > + return luaL_argerror(L, 1, "cannot convert to 64-bit integer"); It's a little bit confusing -- we throw an error here, but for the "same" string returns nil. Both ways break backward compatibility (as far as `tonumber64` just yield `lua_Number` before, although this is not described in the documentation), but at least we should be consistent here, shouldn't we? | $ src/luajit -e 'print(misc.tonumber64(18446744073709551616))' | src/luajit: (command line):1: bad argument #1 to 'tonumber64' (cannot convert to 64-bit integer) | stack traceback: | [C]: in function 'tonumber64' | (command line):1: in main chunk | [C]: at 0x55f47d5541d0 | | ################################################################ | | $ src/luajit -e 'print(misc.tonumber64("18446744073709551616"))' | nil Yields nil in both ways looks more consistent to me. Thoughts? > + cts = ctype_cts(L); Unfortunately, ctype state may be not initialized here. It's initialized at opening of ffi library. This will lead to crush if we call `misc.tonumber64()` before load of ffi. | $ src/luajit -e 'print(misc.tonumber64(1))' | Segmentation fault (core dumped) > + if (val_d < 0.0) > + ctypeid = CTID_INT64; > + else > + ctypeid = CTID_UINT64; IMHO, the ternary operator is prettier here, isn't it? | ctypeid = (val_d < 0) ? CTID_INT64 : CTID_UINT64; > + lj_ctype_info(cts, ctypeid, &size); This call looks redundant -- we already know that size == 8. > + assert(size != CTSIZE_INVALID); > + cd = lj_cdata_new(cts, ctypeid, size); Is the usage of `lj_cdata_new_()` better here? > + o = L->top; > + setcdataV(L, o, cd); I suppose you forgot to use `lj_gc_check(L)` here and below when return cdata. We need to check used memory to start GC cycle if necessary. > + incr_top(L); We shouldn't check for growing stack here and below. We have 1 or 2 argument and return 1 value. > + if (val_d < 0.0) This check and the following cast look redundant, don't they? > + *(int64_t *) cdataptr(cd) = (int64_t)val_d; > + else > + *(uint64_t *) cdataptr(cd) = (uint64_t)val_d; > + return 1; > + case LUA_TSTRING: { > + GCstr *s = strV(o); > + const char *arg = strdata(s); > + size_t argl = s->len; > + while (argl > 0 && isspace(arg[argl - 1])) Is there any reason why do you prefer `isspace()` over `lj_char_isspace()`? > + argl--; > + while (isspace(*arg)) { Ditto. > + arg++; > + argl--; > + } > + > + /* > + * Check if we're parsing custom format: Please use LuaJIT comment format here and below (for example, see any copyright header). > + * 1) '0x' or '0X' trim in case of base == 16 or base == -1 > + * 2) '0b' or '0B' trim in case of base == 2 or base == -1 > + * 3) '-' for negative numbers > + * 4) LL, ULL, LLU - trim, but only for base == 2 or > + * base == 16 or base == -1. For consistency do not bother > + * with any non-common bases, since user may have specified > + * base >= 22, in which case 'L' will be a digit. > + */ Side note: AFAIK <lj_strscan.c> has a lot of functions to parse different formats. May be it is more consistent to use LuaJIT's parsers here. Thoughts? > + char negative = 0; > + if (arg[0] == '-') { > + arg++; > + argl--; > + negative = 1; > + } > + if (argl > 2 && arg[0] == '0') { > + if ((arg[1] == 'x' || arg[1] == 'X') && I suppose, that you can use `casecmp()` macro like in <src/lj_strscan.c> here and below. > + (base == 16 || base == -1)) { > + base = 16; arg += 2; argl -= 2; > + } else if ((arg[1] == 'b' || arg[1] == 'B') && > + (base == 2 || base == -1)) { > + base = 2; arg += 2; argl -= 2; > + } > + } > + int ull = 0; > + if (argl > 2 && (base == 2 || base == 16 || base == -1)) { > + if (arg[argl - 1] == 'u' || arg[argl - 1] == 'U') { > + ull = 1; > + --argl; It will be nice to see an explanation about behaviour of working with integers less that `INT64_MIN` for example. Should `tonumber64` returns nil here or same value as LuaJIT parsers for corresponding input string? IMHO, the value should overlap (behaviour must be same as for LuaJIT's parser). > + } > + if ((arg[argl - 1] == 'l' || arg[argl - 1] == 'L') && > + (arg[argl - 2] == 'l' || arg[argl - 2] == 'L')) { > + argl -= 2; > + if (ull == 0 && (arg[argl - 1] == 'u' || arg[argl - 1] == 'U')) { > + ull = 1; > + --argl; > + } > + } else { > + ull = 0; > + } > + } I see no check for strings like "1.5" or "1e6" and so on. Why are these formats not supported? > + base = (base == -1 ? 10 : base); > + errno = 0; > + char *arge; > + unsigned long long result = strtoull(arg, &arge, base); > + if (errno == 0 && arge == arg + argl) { > + if (argl == 0) { > + lua_pushnil(L); > + } else if (negative) { > + if (ull == 0 && result != 0 && result - 1 > INT64_MAX) { Will we save this behaviour? I suppose it should be the same as for LuaJIT parser. Thoughts? > + lua_pushnil(L); > + return 1; > + } > + cts = ctype_cts(L); > + /* This comment should be highter, shouldn't it? > + * To test overflow, consider > + * result > -INT64_MIN; > + * result - 1 > -INT64_MIN - 1; > + * Assumption: > + * INT64_MAX == -(INT64_MIN + 1); > + * Finally, > + * result - 1 > INT64_MAX; > + */ Does this cast differ from? | (uint64_t)-(int64_t)result > + int64_t val_i; > + uint64_t val_u; > + if (ull != 0) { > + val_u = (UINT64_MAX - result) + 1; > + ctypeid = CTID_UINT64; > + } else { > + val_i = -result; > + ctypeid = CTID_INT64; > + } > + > + lj_ctype_info(cts, ctypeid, &size); This call looks redundant -- we already know that size == 8. > + assert(size != CTSIZE_INVALID); > + > + cd = lj_cdata_new(cts, ctypeid, size); > + TValue *o = L->top; > + setcdataV(L, o, cd); > + incr_top(L); > + if (ull != 0) > + *(uint64_t *) cdataptr(cd) = val_u; > + else > + *(int64_t *) cdataptr(cd) = val_i; > + } else { The code here duplicates the code above. I suppose that it can be done much simpler. Something like that: | *(uint64_t *)cdataptr(cd) = neg ? (uint64_t)-(int64_t)result : result; All these comments return us to question about <lj_strscan.*> parsers. At least it will be nice to see an explanation in commit message, why they are not suitable here (if they are not). > + cts = ctype_cts(L); > + uint64_t val_u = result; > + ctypeid = CTID_UINT64; > + lj_ctype_info(cts, ctypeid, &size); > + assert(size != CTSIZE_INVALID); > + > + cd = lj_cdata_new(cts, ctypeid, size); > + TValue *o = L->top; > + setcdataV(L, o, cd); > + incr_top(L); > + *(uint64_t *) cdataptr(cd) = val_u; > + } > + return 1; > + } > + break; > + } > + case LUA_TCDATA: > + base = (base == -1 ? 10 : base); > + if (base != 10) > + return luaL_argerror(L, 1, "string expected"); > + cd = cdataV(L->base); > + ctypeid = cd->ctypeid; > + if (ctypeid >= CTID_INT8 && ctypeid <= CTID_DOUBLE) { > + lua_pushvalue(L, 1); > + return 1; > + } > + break; > + } > + lua_pushnil(L); > + return 1; > +} > + What's about LuaC API? > /* ------------------------------------------------------------------------ */ > > #include "lj_libdef.h" > diff --git a/test/gh-4738-incorrect-result-tonumber64.test.lua b/test/gh-4738-incorrect-result-tonumber64.test.lua Nit: It is new functionality for LuaJIT maybe it will be better to use the same naming for the test like `getmetrics()` and leave a comment with issue reference nearby corresponding test. Feel free to ignore. > new file mode 100755 > index 0000000..1d6da4a > --- /dev/null > +++ b/test/gh-4738-incorrect-result-tonumber64.test.lua > @@ -0,0 +1,38 @@ > +#!/usr/bin/env tarantool > + > +-- Miscellaneous test for LuaJIT bugs I don't understand this comment. This test checks `misc.tonumber64()` only, doesn't it? > +local tap = require('tap') > +local misc = require('misc') It is not necessary to require this builtin library (ffi library is the only one exception). > + > +local test = tap.test("gh-4738-incorrect-result-tonumber64") > +test:plan(24) > +-- > +-- gh-4738: Make sure that tonumber64 always returns cdata. > +-- > +test:ok(misc.tonumber64(1) == 1) > +test:ok(misc.tonumber64(-1) == -1) > +test:ok(misc.tonumber64(1.5) == 1) > +test:ok(misc.tonumber64(-1.5) == -1) > +test:ok(misc.tonumber64(1LL) == 1) > +test:ok(misc.tonumber64(1ULL) == 1) > +test:ok(misc.tonumber64(1LLU) == 1) > +test:ok(misc.tonumber64(-1ULL) == 18446744073709551615ULL) > +test:ok(misc.tonumber64('1') == 1) > +test:ok(misc.tonumber64('1LL') == 1) > +test:ok(misc.tonumber64('1ULL') == 1) > +test:ok(misc.tonumber64('-1ULL') == 18446744073709551615ULL) > + > +test:is(type(misc.tonumber64(1)), 'cdata') Minor: we check each number/cdata/string twice: to value and its type. May be it well be better to use 12 subtests that test value and type. Feel free to ignore. > +test:is(type(misc.tonumber64(-1)), 'cdata') > +test:is(type(misc.tonumber64(1.5)), 'cdata') > +test:is(type(misc.tonumber64(-1.5)), 'cdata') > +test:is(type(misc.tonumber64(1LL)), 'cdata') > +test:is(type(misc.tonumber64(1ULL)), 'cdata') > +test:is(type(misc.tonumber64(1LLU)), 'cdata') > +test:is(type(misc.tonumber64(-1ULL)), 'cdata') > +test:is(type(misc.tonumber64('1')), 'cdata') > +test:is(type(misc.tonumber64('1LL')), 'cdata') > +test:is(type(misc.tonumber64('1ULL')), 'cdata') > +test:is(type(misc.tonumber64('-1ULL')), 'cdata') It would be great to add several tests for `ffi.typeof()` since there are complaints [3] on return value type. > + > +os.exit(test:check() and 0 or 1) > -- > 2.25.1 > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-January/013801.html [2]: https://gitlab.com/tarantool/tarantool/-/pipelines/216530551/failures [3]: https://github.com/tarantool/tarantool/issues/4738#issuecomment-687241952 -- Best regards, Sergey Kaplun
prev parent reply other threads:[~2020-11-25 19:12 UTC|newest] Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-16 9:34 imeevma 2020-11-25 19:11 ` Sergey Kaplun [this message]
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20201125191151.GA13309@root \ --to=skaplun@tarantool.org \ --cc=imeevma@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v1 1/1] misc: introduce misc.tonumber64()' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox