From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp47.i.mail.ru (smtp47.i.mail.ru [94.100.177.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 60868469710 for ; Wed, 25 Nov 2020 22:12:29 +0300 (MSK) Date: Wed, 25 Nov 2020 22:11:51 +0300 From: Sergey Kaplun Message-ID: <20201125191151.GA13309@root> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH v1 1/1] misc: introduce misc.tonumber64() List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: imeevma@tarantool.org Cc: tarantool-patches@dev.tarantool.org Mergen, Thanks for the patch! Please consider my comments below. Please use "luajit" prefix in the subject for LuaJIT-related patches in the future (see that patch series [1] for example). It's not obligatory, just looks more convenient. On 16.11.20, imeevma@tarantool.org wrote: > This patch introduces function misc.tonumber64() that accepts number, > cdata number or string and returns 64-bit integer value. > > Part of tarantool/tarantool#4738 > --- It is totally unclear to me what is the behaviour of types different from `CTID_INT64` and `CTID_UINT64`. I suppose this fogged zone should be well described in commit message and documentation. > https://github.com/tarantool/tarantool/issues/4738 > https://github.com/tarantool/luajit/tree/imeevma/gh-4738-introduce-misc-tonumber64 > > src/lib_misc.c | 173 ++++++++++++++++++ > ...-4738-incorrect-result-tonumber64.test.lua | 38 ++++ > 2 files changed, 211 insertions(+) > create mode 100755 test/gh-4738-incorrect-result-tonumber64.test.lua > > diff --git a/src/lib_misc.c b/src/lib_misc.c I observe these warnings at compile, please fix them in the next version: | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:203:38: warning: 'val_u' may be used uninitialized in this function [-Wmaybe-uninitialized] | 203 | *(uint64_t *) cdataptr(cd) = val_u; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ | lib_misc.c:205:37: warning: 'val_i' may be used uninitialized in this function [-Wmaybe-uninitialized] | 205 | *(int64_t *) cdataptr(cd) = val_i; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:203:38: warning: 'val_u' may be used uninitialized in this function [-Wmaybe-uninitialized] | 203 | *(uint64_t *) cdataptr(cd) = val_u; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ | lib_misc.c:205:37: warning: 'val_i' may be used uninitialized in this function [-Wmaybe-uninitialized] | 205 | *(int64_t *) cdataptr(cd) = val_i; | | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ I've compiled with the following command: | make -j CCDEBUG=" -g -ggdb3" XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT" Also, with non-debug build I see the following errors: | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:100:5: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration] | 100 | assert(size != CTSIZE_INVALID); | | ^~~~~~ | lib_misc.c:25:1: note: 'assert' is defined in header ''; did you forget to '#include '? | 24 | #include "lj_state.h" | +++ |+#include | 25 | | lib_misc.c: In function 'lj_cf_misc_tonumber64': | lib_misc.c:100:5: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration] | 100 | assert(size != CTSIZE_INVALID); | | ^~~~~~ | lib_misc.c:25:1: note: 'assert' is defined in header ''; did you forget to '#include '? | 24 | #include "lj_state.h" | +++ |+#include | 25 | I've compiled with the following command: | make -j Please use `lua_assert()` instead of `assert()` inside LuaJIT code. I suppose that should partially fix red CI [2]. > index 6f7b9a9..4228836 100644 > --- a/src/lib_misc.c > +++ b/src/lib_misc.c > @@ -10,11 +10,18 @@ > > #include "lua.h" > #include "lmisclib.h" > +#include "lauxlib.h" > +#include > +#include Please, include system-wide headers before local ones. > > #include "lj_obj.h" > #include "lj_str.h" > #include "lj_tab.h" > #include "lj_lib.h" > +#include "lj_err.h" > +#include "lj_ctype.h" > +#include "lj_cdata.h" Usually these two headers includes only if LuaJIT build with FFI. Please add | #if LJ_HASFFI directive. > +#include "lj_state.h" Please, update with these headers and their dependencies recursively. > > /* ------------------------------------------------------------------------ */ > > @@ -63,6 +70,172 @@ LJLIB_CF(misc_getmetrics) > return 1; > } > > +LJLIB_CF(misc_tonumber64) What is the function's behaviour when LuaJIT build without FFI? > +{ > + TValue *o = L->base; > + if (o >= L->top) > + lj_err_arg(L, 1, LJ_ERR_NOVAL); It seems like `lj_lib_checkany()` does what you want here, doesn't it? > + int base = luaL_optint(L, 2, -1); Please declare variables at the beginning of a block here and below, considering LuaJIT code style (you can use these flags | -Wall -Wextra -Wdeclaration-after-statement -Wredundant-decls -Wshadow -Wpointer-arith to check yourself, see ). Why do you use the public interface instead `lj_lib_optint()`? > + luaL_argcheck(L, (2 <= base && base <= 36) || base == -1, 2, > + "base out of range"); Let's use quarter-fake tab here. See indentation style, for example. Also, you may use `lj_err_arg()` here. > + uint32_t ctypeid = 0; > + GCcdata *cd; > + CTSize size; > + CTState *cts; > + double val_d; AFAICS, `val_d` value uses only inside `LUA_TNUMBER` case-block and can be declared inside it. Also, I see no reason to use `double` instead `lua_Number` here. > + switch (lua_type(L, 1)) { > + case LUA_TNUMBER: > + base = (base == -1 ? 10 : base); > + if (base != 10) > + return luaL_argerror(L, 1, "string expected"); Nit: AFAICS here and below we expect string if `base != 10`. IMHO, code structure like this a little bit clearer: | const int32_t base = lj_lib_optint(L, 2, 10); | if (base == 10) { | TValue *o = lj_lib_checkany(L, 1); | /* ... */ | } else { | const char *arg = strdata(lj_lib_checkstr(L, 1)); | /* ... */ | Feel free to ignore. > + val_d = numV(L->base); > + if (val_d < (double)INT64_MIN || val_d >= (double)UINT64_MAX) > + return luaL_argerror(L, 1, "cannot convert to 64-bit integer"); It's a little bit confusing -- we throw an error here, but for the "same" string returns nil. Both ways break backward compatibility (as far as `tonumber64` just yield `lua_Number` before, although this is not described in the documentation), but at least we should be consistent here, shouldn't we? | $ src/luajit -e 'print(misc.tonumber64(18446744073709551616))' | src/luajit: (command line):1: bad argument #1 to 'tonumber64' (cannot convert to 64-bit integer) | stack traceback: | [C]: in function 'tonumber64' | (command line):1: in main chunk | [C]: at 0x55f47d5541d0 | | ################################################################ | | $ src/luajit -e 'print(misc.tonumber64("18446744073709551616"))' | nil Yields nil in both ways looks more consistent to me. Thoughts? > + cts = ctype_cts(L); Unfortunately, ctype state may be not initialized here. It's initialized at opening of ffi library. This will lead to crush if we call `misc.tonumber64()` before load of ffi. | $ src/luajit -e 'print(misc.tonumber64(1))' | Segmentation fault (core dumped) > + if (val_d < 0.0) > + ctypeid = CTID_INT64; > + else > + ctypeid = CTID_UINT64; IMHO, the ternary operator is prettier here, isn't it? | ctypeid = (val_d < 0) ? CTID_INT64 : CTID_UINT64; > + lj_ctype_info(cts, ctypeid, &size); This call looks redundant -- we already know that size == 8. > + assert(size != CTSIZE_INVALID); > + cd = lj_cdata_new(cts, ctypeid, size); Is the usage of `lj_cdata_new_()` better here? > + o = L->top; > + setcdataV(L, o, cd); I suppose you forgot to use `lj_gc_check(L)` here and below when return cdata. We need to check used memory to start GC cycle if necessary. > + incr_top(L); We shouldn't check for growing stack here and below. We have 1 or 2 argument and return 1 value. > + if (val_d < 0.0) This check and the following cast look redundant, don't they? > + *(int64_t *) cdataptr(cd) = (int64_t)val_d; > + else > + *(uint64_t *) cdataptr(cd) = (uint64_t)val_d; > + return 1; > + case LUA_TSTRING: { > + GCstr *s = strV(o); > + const char *arg = strdata(s); > + size_t argl = s->len; > + while (argl > 0 && isspace(arg[argl - 1])) Is there any reason why do you prefer `isspace()` over `lj_char_isspace()`? > + argl--; > + while (isspace(*arg)) { Ditto. > + arg++; > + argl--; > + } > + > + /* > + * Check if we're parsing custom format: Please use LuaJIT comment format here and below (for example, see any copyright header). > + * 1) '0x' or '0X' trim in case of base == 16 or base == -1 > + * 2) '0b' or '0B' trim in case of base == 2 or base == -1 > + * 3) '-' for negative numbers > + * 4) LL, ULL, LLU - trim, but only for base == 2 or > + * base == 16 or base == -1. For consistency do not bother > + * with any non-common bases, since user may have specified > + * base >= 22, in which case 'L' will be a digit. > + */ Side note: AFAIK has a lot of functions to parse different formats. May be it is more consistent to use LuaJIT's parsers here. Thoughts? > + char negative = 0; > + if (arg[0] == '-') { > + arg++; > + argl--; > + negative = 1; > + } > + if (argl > 2 && arg[0] == '0') { > + if ((arg[1] == 'x' || arg[1] == 'X') && I suppose, that you can use `casecmp()` macro like in here and below. > + (base == 16 || base == -1)) { > + base = 16; arg += 2; argl -= 2; > + } else if ((arg[1] == 'b' || arg[1] == 'B') && > + (base == 2 || base == -1)) { > + base = 2; arg += 2; argl -= 2; > + } > + } > + int ull = 0; > + if (argl > 2 && (base == 2 || base == 16 || base == -1)) { > + if (arg[argl - 1] == 'u' || arg[argl - 1] == 'U') { > + ull = 1; > + --argl; It will be nice to see an explanation about behaviour of working with integers less that `INT64_MIN` for example. Should `tonumber64` returns nil here or same value as LuaJIT parsers for corresponding input string? IMHO, the value should overlap (behaviour must be same as for LuaJIT's parser). > + } > + if ((arg[argl - 1] == 'l' || arg[argl - 1] == 'L') && > + (arg[argl - 2] == 'l' || arg[argl - 2] == 'L')) { > + argl -= 2; > + if (ull == 0 && (arg[argl - 1] == 'u' || arg[argl - 1] == 'U')) { > + ull = 1; > + --argl; > + } > + } else { > + ull = 0; > + } > + } I see no check for strings like "1.5" or "1e6" and so on. Why are these formats not supported? > + base = (base == -1 ? 10 : base); > + errno = 0; > + char *arge; > + unsigned long long result = strtoull(arg, &arge, base); > + if (errno == 0 && arge == arg + argl) { > + if (argl == 0) { > + lua_pushnil(L); > + } else if (negative) { > + if (ull == 0 && result != 0 && result - 1 > INT64_MAX) { Will we save this behaviour? I suppose it should be the same as for LuaJIT parser. Thoughts? > + lua_pushnil(L); > + return 1; > + } > + cts = ctype_cts(L); > + /* This comment should be highter, shouldn't it? > + * To test overflow, consider > + * result > -INT64_MIN; > + * result - 1 > -INT64_MIN - 1; > + * Assumption: > + * INT64_MAX == -(INT64_MIN + 1); > + * Finally, > + * result - 1 > INT64_MAX; > + */ Does this cast differ from? | (uint64_t)-(int64_t)result > + int64_t val_i; > + uint64_t val_u; > + if (ull != 0) { > + val_u = (UINT64_MAX - result) + 1; > + ctypeid = CTID_UINT64; > + } else { > + val_i = -result; > + ctypeid = CTID_INT64; > + } > + > + lj_ctype_info(cts, ctypeid, &size); This call looks redundant -- we already know that size == 8. > + assert(size != CTSIZE_INVALID); > + > + cd = lj_cdata_new(cts, ctypeid, size); > + TValue *o = L->top; > + setcdataV(L, o, cd); > + incr_top(L); > + if (ull != 0) > + *(uint64_t *) cdataptr(cd) = val_u; > + else > + *(int64_t *) cdataptr(cd) = val_i; > + } else { The code here duplicates the code above. I suppose that it can be done much simpler. Something like that: | *(uint64_t *)cdataptr(cd) = neg ? (uint64_t)-(int64_t)result : result; All these comments return us to question about parsers. At least it will be nice to see an explanation in commit message, why they are not suitable here (if they are not). > + cts = ctype_cts(L); > + uint64_t val_u = result; > + ctypeid = CTID_UINT64; > + lj_ctype_info(cts, ctypeid, &size); > + assert(size != CTSIZE_INVALID); > + > + cd = lj_cdata_new(cts, ctypeid, size); > + TValue *o = L->top; > + setcdataV(L, o, cd); > + incr_top(L); > + *(uint64_t *) cdataptr(cd) = val_u; > + } > + return 1; > + } > + break; > + } > + case LUA_TCDATA: > + base = (base == -1 ? 10 : base); > + if (base != 10) > + return luaL_argerror(L, 1, "string expected"); > + cd = cdataV(L->base); > + ctypeid = cd->ctypeid; > + if (ctypeid >= CTID_INT8 && ctypeid <= CTID_DOUBLE) { > + lua_pushvalue(L, 1); > + return 1; > + } > + break; > + } > + lua_pushnil(L); > + return 1; > +} > + What's about LuaC API? > /* ------------------------------------------------------------------------ */ > > #include "lj_libdef.h" > diff --git a/test/gh-4738-incorrect-result-tonumber64.test.lua b/test/gh-4738-incorrect-result-tonumber64.test.lua Nit: It is new functionality for LuaJIT maybe it will be better to use the same naming for the test like `getmetrics()` and leave a comment with issue reference nearby corresponding test. Feel free to ignore. > new file mode 100755 > index 0000000..1d6da4a > --- /dev/null > +++ b/test/gh-4738-incorrect-result-tonumber64.test.lua > @@ -0,0 +1,38 @@ > +#!/usr/bin/env tarantool > + > +-- Miscellaneous test for LuaJIT bugs I don't understand this comment. This test checks `misc.tonumber64()` only, doesn't it? > +local tap = require('tap') > +local misc = require('misc') It is not necessary to require this builtin library (ffi library is the only one exception). > + > +local test = tap.test("gh-4738-incorrect-result-tonumber64") > +test:plan(24) > +-- > +-- gh-4738: Make sure that tonumber64 always returns cdata. > +-- > +test:ok(misc.tonumber64(1) == 1) > +test:ok(misc.tonumber64(-1) == -1) > +test:ok(misc.tonumber64(1.5) == 1) > +test:ok(misc.tonumber64(-1.5) == -1) > +test:ok(misc.tonumber64(1LL) == 1) > +test:ok(misc.tonumber64(1ULL) == 1) > +test:ok(misc.tonumber64(1LLU) == 1) > +test:ok(misc.tonumber64(-1ULL) == 18446744073709551615ULL) > +test:ok(misc.tonumber64('1') == 1) > +test:ok(misc.tonumber64('1LL') == 1) > +test:ok(misc.tonumber64('1ULL') == 1) > +test:ok(misc.tonumber64('-1ULL') == 18446744073709551615ULL) > + > +test:is(type(misc.tonumber64(1)), 'cdata') Minor: we check each number/cdata/string twice: to value and its type. May be it well be better to use 12 subtests that test value and type. Feel free to ignore. > +test:is(type(misc.tonumber64(-1)), 'cdata') > +test:is(type(misc.tonumber64(1.5)), 'cdata') > +test:is(type(misc.tonumber64(-1.5)), 'cdata') > +test:is(type(misc.tonumber64(1LL)), 'cdata') > +test:is(type(misc.tonumber64(1ULL)), 'cdata') > +test:is(type(misc.tonumber64(1LLU)), 'cdata') > +test:is(type(misc.tonumber64(-1ULL)), 'cdata') > +test:is(type(misc.tonumber64('1')), 'cdata') > +test:is(type(misc.tonumber64('1LL')), 'cdata') > +test:is(type(misc.tonumber64('1ULL')), 'cdata') > +test:is(type(misc.tonumber64('-1ULL')), 'cdata') It would be great to add several tests for `ffi.typeof()` since there are complaints [3] on return value type. > + > +os.exit(test:check() and 0 or 1) > -- > 2.25.1 > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-January/013801.html [2]: https://gitlab.com/tarantool/tarantool/-/pipelines/216530551/failures [3]: https://github.com/tarantool/tarantool/issues/4738#issuecomment-687241952 -- Best regards, Sergey Kaplun