[Tarantool-patches] [PATCH v1 1/1] misc: introduce misc.tonumber64()
Sergey Kaplun
skaplun at tarantool.org
Wed Nov 25 22:11:51 MSK 2020
Mergen,
Thanks for the patch! Please consider my comments below.
Please use "luajit" prefix in the subject for LuaJIT-related patches in the
future (see that patch series [1] for example). It's not obligatory, just looks
more convenient.
On 16.11.20, imeevma at tarantool.org wrote:
> This patch introduces function misc.tonumber64() that accepts number,
> cdata number or string and returns 64-bit integer value.
>
> Part of tarantool/tarantool#4738
> ---
It is totally unclear to me what is the behaviour of types different from
`CTID_INT64` and `CTID_UINT64`. I suppose this fogged zone should be
well described in commit message and documentation.
> https://github.com/tarantool/tarantool/issues/4738
> https://github.com/tarantool/luajit/tree/imeevma/gh-4738-introduce-misc-tonumber64
>
> src/lib_misc.c | 173 ++++++++++++++++++
> ...-4738-incorrect-result-tonumber64.test.lua | 38 ++++
> 2 files changed, 211 insertions(+)
> create mode 100755 test/gh-4738-incorrect-result-tonumber64.test.lua
>
> diff --git a/src/lib_misc.c b/src/lib_misc.c
I observe these warnings at compile, please fix them in the next version:
| lib_misc.c: In function 'lj_cf_misc_tonumber64':
| lib_misc.c:203:38: warning: 'val_u' may be used uninitialized in this function [-Wmaybe-uninitialized]
| 203 | *(uint64_t *) cdataptr(cd) = val_u;
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
| lib_misc.c:205:37: warning: 'val_i' may be used uninitialized in this function [-Wmaybe-uninitialized]
| 205 | *(int64_t *) cdataptr(cd) = val_i;
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
| lib_misc.c: In function 'lj_cf_misc_tonumber64':
| lib_misc.c:203:38: warning: 'val_u' may be used uninitialized in this function [-Wmaybe-uninitialized]
| 203 | *(uint64_t *) cdataptr(cd) = val_u;
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
| lib_misc.c:205:37: warning: 'val_i' may be used uninitialized in this function [-Wmaybe-uninitialized]
| 205 | *(int64_t *) cdataptr(cd) = val_i;
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
I've compiled with the following command:
| make -j CCDEBUG=" -g -ggdb3" XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT"
Also, with non-debug build I see the following errors:
| lib_misc.c: In function 'lj_cf_misc_tonumber64':
| lib_misc.c:100:5: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration]
| 100 | assert(size != CTSIZE_INVALID);
| | ^~~~~~
| lib_misc.c:25:1: note: 'assert' is defined in header '<assert.h>'; did you forget to '#include <assert.h>'?
| 24 | #include "lj_state.h"
| +++ |+#include <assert.h>
| 25 |
| lib_misc.c: In function 'lj_cf_misc_tonumber64':
| lib_misc.c:100:5: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration]
| 100 | assert(size != CTSIZE_INVALID);
| | ^~~~~~
| lib_misc.c:25:1: note: 'assert' is defined in header '<assert.h>'; did you forget to '#include <assert.h>'?
| 24 | #include "lj_state.h"
| +++ |+#include <assert.h>
| 25 |
I've compiled with the following command:
| make -j
Please use `lua_assert()` instead of `assert()` inside LuaJIT code.
I suppose that should partially fix red CI [2].
> index 6f7b9a9..4228836 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c
> @@ -10,11 +10,18 @@
>
> #include "lua.h"
> #include "lmisclib.h"
> +#include "lauxlib.h"
> +#include <errno.h>
> +#include <ctype.h>
Please, include system-wide headers before local ones.
>
> #include "lj_obj.h"
> #include "lj_str.h"
> #include "lj_tab.h"
> #include "lj_lib.h"
> +#include "lj_err.h"
> +#include "lj_ctype.h"
> +#include "lj_cdata.h"
Usually these two headers includes only if LuaJIT build with FFI. Please add
| #if LJ_HASFFI
directive.
> +#include "lj_state.h"
Please, update <Makefile.dep> with these headers and their dependencies
recursively.
>
> /* ------------------------------------------------------------------------ */
>
> @@ -63,6 +70,172 @@ LJLIB_CF(misc_getmetrics)
> return 1;
> }
>
> +LJLIB_CF(misc_tonumber64)
What is the function's behaviour when LuaJIT build without FFI?
> +{
> + TValue *o = L->base;
> + if (o >= L->top)
> + lj_err_arg(L, 1, LJ_ERR_NOVAL);
It seems like `lj_lib_checkany()` does what you want here, doesn't it?
> + int base = luaL_optint(L, 2, -1);
Please declare variables at the beginning of a block here and below,
considering LuaJIT code style (you can use these flags
| -Wall -Wextra -Wdeclaration-after-statement -Wredundant-decls -Wshadow -Wpointer-arith
to check yourself, see <src/Makefile>).
Why do you use the public interface instead `lj_lib_optint()`?
> + luaL_argcheck(L, (2 <= base && base <= 36) || base == -1, 2,
> + "base out of range");
Let's use quarter-fake tab here. See <lib_base.c> indentation style, for
example.
Also, you may use `lj_err_arg()` here.
> + uint32_t ctypeid = 0;
> + GCcdata *cd;
> + CTSize size;
> + CTState *cts;
> + double val_d;
AFAICS, `val_d` value uses only inside `LUA_TNUMBER` case-block and can
be declared inside it. Also, I see no reason to use `double` instead
`lua_Number` here.
> + switch (lua_type(L, 1)) {
> + case LUA_TNUMBER:
> + base = (base == -1 ? 10 : base);
> + if (base != 10)
> + return luaL_argerror(L, 1, "string expected");
Nit: AFAICS here and below we expect string if `base != 10`.
IMHO, code structure like this a little bit clearer:
| const int32_t base = lj_lib_optint(L, 2, 10);
| if (base == 10) {
| TValue *o = lj_lib_checkany(L, 1);
| /* ... */
| } else {
| const char *arg = strdata(lj_lib_checkstr(L, 1));
| /* ... */
|
Feel free to ignore.
> + val_d = numV(L->base);
> + if (val_d < (double)INT64_MIN || val_d >= (double)UINT64_MAX)
> + return luaL_argerror(L, 1, "cannot convert to 64-bit integer");
It's a little bit confusing -- we throw an error here, but for the "same"
string returns nil. Both ways break backward compatibility (as far as
`tonumber64` just yield `lua_Number` before, although this is not described in
the documentation), but at least we should be consistent here, shouldn't we?
| $ src/luajit -e 'print(misc.tonumber64(18446744073709551616))'
| src/luajit: (command line):1: bad argument #1 to 'tonumber64' (cannot convert to 64-bit integer)
| stack traceback:
| [C]: in function 'tonumber64'
| (command line):1: in main chunk
| [C]: at 0x55f47d5541d0
|
| ################################################################
|
| $ src/luajit -e 'print(misc.tonumber64("18446744073709551616"))'
| nil
Yields nil in both ways looks more consistent to me. Thoughts?
> + cts = ctype_cts(L);
Unfortunately, ctype state may be not initialized here. It's initialized at
opening of ffi library. This will lead to crush if we call `misc.tonumber64()`
before load of ffi.
| $ src/luajit -e 'print(misc.tonumber64(1))'
| Segmentation fault (core dumped)
> + if (val_d < 0.0)
> + ctypeid = CTID_INT64;
> + else
> + ctypeid = CTID_UINT64;
IMHO, the ternary operator is prettier here, isn't it?
| ctypeid = (val_d < 0) ? CTID_INT64 : CTID_UINT64;
> + lj_ctype_info(cts, ctypeid, &size);
This call looks redundant -- we already know that size == 8.
> + assert(size != CTSIZE_INVALID);
> + cd = lj_cdata_new(cts, ctypeid, size);
Is the usage of `lj_cdata_new_()` better here?
> + o = L->top;
> + setcdataV(L, o, cd);
I suppose you forgot to use `lj_gc_check(L)` here and below when return cdata.
We need to check used memory to start GC cycle if necessary.
> + incr_top(L);
We shouldn't check for growing stack here and below. We have 1 or 2 argument
and return 1 value.
> + if (val_d < 0.0)
This check and the following cast look redundant, don't they?
> + *(int64_t *) cdataptr(cd) = (int64_t)val_d;
> + else
> + *(uint64_t *) cdataptr(cd) = (uint64_t)val_d;
> + return 1;
> + case LUA_TSTRING: {
> + GCstr *s = strV(o);
> + const char *arg = strdata(s);
> + size_t argl = s->len;
> + while (argl > 0 && isspace(arg[argl - 1]))
Is there any reason why do you prefer `isspace()` over `lj_char_isspace()`?
> + argl--;
> + while (isspace(*arg)) {
Ditto.
> + arg++;
> + argl--;
> + }
> +
> + /*
> + * Check if we're parsing custom format:
Please use LuaJIT comment format here and below (for example, see any copyright
header).
> + * 1) '0x' or '0X' trim in case of base == 16 or base == -1
> + * 2) '0b' or '0B' trim in case of base == 2 or base == -1
> + * 3) '-' for negative numbers
> + * 4) LL, ULL, LLU - trim, but only for base == 2 or
> + * base == 16 or base == -1. For consistency do not bother
> + * with any non-common bases, since user may have specified
> + * base >= 22, in which case 'L' will be a digit.
> + */
Side note: AFAIK <lj_strscan.c> has a lot of functions to parse different formats.
May be it is more consistent to use LuaJIT's parsers here.
Thoughts?
> + char negative = 0;
> + if (arg[0] == '-') {
> + arg++;
> + argl--;
> + negative = 1;
> + }
> + if (argl > 2 && arg[0] == '0') {
> + if ((arg[1] == 'x' || arg[1] == 'X') &&
I suppose, that you can use `casecmp()` macro like in <src/lj_strscan.c> here
and below.
> + (base == 16 || base == -1)) {
> + base = 16; arg += 2; argl -= 2;
> + } else if ((arg[1] == 'b' || arg[1] == 'B') &&
> + (base == 2 || base == -1)) {
> + base = 2; arg += 2; argl -= 2;
> + }
> + }
> + int ull = 0;
> + if (argl > 2 && (base == 2 || base == 16 || base == -1)) {
> + if (arg[argl - 1] == 'u' || arg[argl - 1] == 'U') {
> + ull = 1;
> + --argl;
It will be nice to see an explanation about behaviour of working with integers
less that `INT64_MIN` for example. Should `tonumber64` returns nil here or same
value as LuaJIT parsers for corresponding input string?
IMHO, the value should overlap (behaviour must be same as for LuaJIT's parser).
> + }
> + if ((arg[argl - 1] == 'l' || arg[argl - 1] == 'L') &&
> + (arg[argl - 2] == 'l' || arg[argl - 2] == 'L')) {
> + argl -= 2;
> + if (ull == 0 && (arg[argl - 1] == 'u' || arg[argl - 1] == 'U')) {
> + ull = 1;
> + --argl;
> + }
> + } else {
> + ull = 0;
> + }
> + }
I see no check for strings like "1.5" or "1e6" and so on.
Why are these formats not supported?
> + base = (base == -1 ? 10 : base);
> + errno = 0;
> + char *arge;
> + unsigned long long result = strtoull(arg, &arge, base);
> + if (errno == 0 && arge == arg + argl) {
> + if (argl == 0) {
> + lua_pushnil(L);
> + } else if (negative) {
> + if (ull == 0 && result != 0 && result - 1 > INT64_MAX) {
Will we save this behaviour? I suppose it should be the same as
for LuaJIT parser.
Thoughts?
> + lua_pushnil(L);
> + return 1;
> + }
> + cts = ctype_cts(L);
> + /*
This comment should be highter, shouldn't it?
> + * To test overflow, consider
> + * result > -INT64_MIN;
> + * result - 1 > -INT64_MIN - 1;
> + * Assumption:
> + * INT64_MAX == -(INT64_MIN + 1);
> + * Finally,
> + * result - 1 > INT64_MAX;
> + */
Does this cast differ from?
| (uint64_t)-(int64_t)result
> + int64_t val_i;
> + uint64_t val_u;
> + if (ull != 0) {
> + val_u = (UINT64_MAX - result) + 1;
> + ctypeid = CTID_UINT64;
> + } else {
> + val_i = -result;
> + ctypeid = CTID_INT64;
> + }
> +
> + lj_ctype_info(cts, ctypeid, &size);
This call looks redundant -- we already know that size == 8.
> + assert(size != CTSIZE_INVALID);
> +
> + cd = lj_cdata_new(cts, ctypeid, size);
> + TValue *o = L->top;
> + setcdataV(L, o, cd);
> + incr_top(L);
> + if (ull != 0)
> + *(uint64_t *) cdataptr(cd) = val_u;
> + else
> + *(int64_t *) cdataptr(cd) = val_i;
> + } else {
The code here duplicates the code above. I suppose that it can be done much
simpler.
Something like that:
| *(uint64_t *)cdataptr(cd) = neg ? (uint64_t)-(int64_t)result : result;
All these comments return us to question about <lj_strscan.*> parsers.
At least it will be nice to see an explanation in commit message, why
they are not suitable here (if they are not).
> + cts = ctype_cts(L);
> + uint64_t val_u = result;
> + ctypeid = CTID_UINT64;
> + lj_ctype_info(cts, ctypeid, &size);
> + assert(size != CTSIZE_INVALID);
> +
> + cd = lj_cdata_new(cts, ctypeid, size);
> + TValue *o = L->top;
> + setcdataV(L, o, cd);
> + incr_top(L);
> + *(uint64_t *) cdataptr(cd) = val_u;
> + }
> + return 1;
> + }
> + break;
> + }
> + case LUA_TCDATA:
> + base = (base == -1 ? 10 : base);
> + if (base != 10)
> + return luaL_argerror(L, 1, "string expected");
> + cd = cdataV(L->base);
> + ctypeid = cd->ctypeid;
> + if (ctypeid >= CTID_INT8 && ctypeid <= CTID_DOUBLE) {
> + lua_pushvalue(L, 1);
> + return 1;
> + }
> + break;
> + }
> + lua_pushnil(L);
> + return 1;
> +}
> +
What's about LuaC API?
> /* ------------------------------------------------------------------------ */
>
> #include "lj_libdef.h"
> diff --git a/test/gh-4738-incorrect-result-tonumber64.test.lua b/test/gh-4738-incorrect-result-tonumber64.test.lua
Nit: It is new functionality for LuaJIT maybe it will be better to use the same
naming for the test like `getmetrics()` and leave a comment with issue reference
nearby corresponding test.
Feel free to ignore.
> new file mode 100755
> index 0000000..1d6da4a
> --- /dev/null
> +++ b/test/gh-4738-incorrect-result-tonumber64.test.lua
> @@ -0,0 +1,38 @@
> +#!/usr/bin/env tarantool
> +
> +-- Miscellaneous test for LuaJIT bugs
I don't understand this comment. This test checks `misc.tonumber64()` only,
doesn't it?
> +local tap = require('tap')
> +local misc = require('misc')
It is not necessary to require this builtin library (ffi library is the only
one exception).
> +
> +local test = tap.test("gh-4738-incorrect-result-tonumber64")
> +test:plan(24)
> +--
> +-- gh-4738: Make sure that tonumber64 always returns cdata.
> +--
> +test:ok(misc.tonumber64(1) == 1)
> +test:ok(misc.tonumber64(-1) == -1)
> +test:ok(misc.tonumber64(1.5) == 1)
> +test:ok(misc.tonumber64(-1.5) == -1)
> +test:ok(misc.tonumber64(1LL) == 1)
> +test:ok(misc.tonumber64(1ULL) == 1)
> +test:ok(misc.tonumber64(1LLU) == 1)
> +test:ok(misc.tonumber64(-1ULL) == 18446744073709551615ULL)
> +test:ok(misc.tonumber64('1') == 1)
> +test:ok(misc.tonumber64('1LL') == 1)
> +test:ok(misc.tonumber64('1ULL') == 1)
> +test:ok(misc.tonumber64('-1ULL') == 18446744073709551615ULL)
> +
> +test:is(type(misc.tonumber64(1)), 'cdata')
Minor: we check each number/cdata/string twice: to value and its type.
May be it well be better to use 12 subtests that test value and type.
Feel free to ignore.
> +test:is(type(misc.tonumber64(-1)), 'cdata')
> +test:is(type(misc.tonumber64(1.5)), 'cdata')
> +test:is(type(misc.tonumber64(-1.5)), 'cdata')
> +test:is(type(misc.tonumber64(1LL)), 'cdata')
> +test:is(type(misc.tonumber64(1ULL)), 'cdata')
> +test:is(type(misc.tonumber64(1LLU)), 'cdata')
> +test:is(type(misc.tonumber64(-1ULL)), 'cdata')
> +test:is(type(misc.tonumber64('1')), 'cdata')
> +test:is(type(misc.tonumber64('1LL')), 'cdata')
> +test:is(type(misc.tonumber64('1ULL')), 'cdata')
> +test:is(type(misc.tonumber64('-1ULL')), 'cdata')
It would be great to add several tests for `ffi.typeof()` since there are
complaints [3] on return value type.
> +
> +os.exit(test:check() and 0 or 1)
> --
> 2.25.1
>
[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-January/013801.html
[2]: https://gitlab.com/tarantool/tarantool/-/pipelines/216530551/failures
[3]: https://github.com/tarantool/tarantool/issues/4738#issuecomment-687241952
--
Best regards,
Sergey Kaplun
More information about the Tarantool-patches
mailing list