From: Maxim Kokryashkin via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] Avoid negation of signed integers in C that may hold INT*_MIN.
Date: Mon, 1 Jul 2024 12:11:59 +0300
Message-ID: <gtg2uuqs7c4pbyyxkh3ufk7dxdfg6oq6wcjh5p67ldpercv7g4@tq3l6yy44rlm>
In-Reply-To: <e586c7e8418c500e190b330529d51fac32fa6df5.1719329795.git.skaplun@tarantool.org>

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few nits below.

On Tue, Jun 25, 2024 at 06:54:25PM GMT, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Reported by minoki.
> Recent C compilers 'take advantage' of the undefined behavior.
> This completely changes the meaning of expressions like (k == -k).
>
> (cherry picked from commit 8a5e398c52c7f8ca3e1a0e574cc2ba38224b759b)
>
> This patch changes all possibly dangerous -x operations on integers to
> the corresponding two's complement. Also, it removes all related UBSAN
> suppressions, since they are fixed.
>
> Also, this patch limits the `bit.tohex()` result to 254 characters.
>
> There is no testcase for `strscan_oct()`, `strscan_dec()`, or the
> `STRSCAN_U32` format: when parsing C syntax, the unary minus is
> parsed first and the number itself only afterwards, so the error is
> raised in `cp_expr_prefix()` instead. There is also no testcase for
> parsing the exponent header, since the power is limited by
> `STRSCAN_MAXEXP`.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#9924
> Relates to tarantool/tarantool#8473
> ---
>  src/lib_base.c                                |   2 +-
>  src/lib_bit.c                                 |   3 +-
>  src/lj_asm_mips.h                             |   2 +-
>  src/lj_carith.c                               |   7 +-
>  src/lj_cparse.c                               |   2 +-
>  src/lj_crecord.c                              |   3 +-
>  src/lj_ctype.c                                |   2 +-
>  src/lj_emit_arm.h                             |   2 +-
>  src/lj_emit_arm64.h                           |   9 +-
>  src/lj_obj.h                                  |   2 +-
>  src/lj_opt_fold.c                             |   6 +-
>  src/lj_parse.c                                |  17 +--
>  src/lj_strfmt.c                               |   9 +-
>  src/lj_strscan.c                              |  26 ++--
>  src/lj_vmmath.c                               |   6 +-
>  .../lj-928-int-min-negation.test.lua          | 121 ++++++++++++++++++
>  16 files changed, 164 insertions(+), 55 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-928-int-min-negation.test.lua
>
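For readers of the thread, here is a minimal sketch of the idea described
in the commit message: replace `-k` with the two's complement computed in
unsigned arithmetic, which wraps instead of overflowing. The helper names
are hypothetical and are not taken from the patch. The cast back to
`int32_t` is implementation-defined before C23, but it gives the expected
value on two's complement targets.

```c
#include <stdint.h>

/* Hypothetical helper: negate with wraparound; INT32_MIN maps to itself. */
static int32_t neg_wrap_i32(int32_t k)
{
  /* Unsigned arithmetic wraps modulo 2^32, so no signed overflow occurs. */
  return (int32_t)(~(uint32_t)k + 1u);
}

/* Hypothetical helper: overflow-free replacement for (k == -k). */
static int is_zero_or_min_i32(int32_t k)
{
  uint32_t u = (uint32_t)k;
  /* Only 0 and 0x80000000 are equal to their own two's complement. */
  return u == (uint32_t)(~u + 1u);
}
```

With the plain `(k == -k)` form, a compiler may assume that signed overflow
never happens and fold the check to `k == 0`, which is the change in
meaning the commit message refers to.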

<snipped>

> diff --git a/test/tarantool-tests/lj-928-int-min-negation.test.lua b/test/tarantool-tests/lj-928-int-min-negation.test.lua
> new file mode 100644
> index 00000000..26f4ed8e
> --- /dev/null
> +++ b/test/tarantool-tests/lj-928-int-min-negation.test.lua
> @@ -0,0 +1,121 @@
> +local tap = require('tap')
> +
> +-- Test file to demonstrate LuaJIT's UBSan failures during
> +-- `INT*_MIN` negation.
> +-- See also: https://github.com/LuaJIT/LuaJIT/issues/928.
> +
> +local test = tap.test('lj-928-int-min-negation.'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +local INT32_MIN = -0x80000000
> +local INT64_MIN = -0x8000000000000000
> +local TOBIT_CHAR_MAX = 254
> +
> +-- XXX: Many tests (`tonumber()`-related) are failed under UBSan
Typo: s/are failed/are failing/
> +-- with DUALNUM enabled. They are included to avoid regressions in
> +-- the future if such a build becomes the default.
> +local ffi = require('ffi')
> +local LL_T = ffi.typeof(1LL)
> +
> +test:plan(14)
> +
> +jit.opt.start('hotloop=1')
> +
> +-- Temporary variable for the results.
> +local r
Let's name it `result` or `tmp_result` then.

> +
> +-- <src/lj_vmmath.c>:`lj_vm_modi()`
> +for _ = 1, 4 do
> +  -- Use additional variables to avoid folding during parsing.
> +  -- Operands should be constants on the trace.
> +  local x = -0x80000000
> +  local y = -0x80000000
> +  r = x % y
> +end
> +test:is(r, 0, 'no UB during lj_vm_modi')
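As an illustration of what this case exercises, below is a sketch of a
floored modulo (Lua semantics: the result takes the sign of the divisor)
that takes magnitudes in unsigned arithmetic. `floor_mod_i32` is a
hypothetical name and this is not the code from <src/lj_vmmath.c>. It
assumes `b != 0` and a two's complement target.

```c
#include <stdint.h>

/* Hypothetical helper: floored modulo without negating a signed value. */
static int32_t floor_mod_i32(int32_t a, int32_t b)
{
  /* Magnitudes are computed in unsigned arithmetic, so INT32_MIN is fine. */
  uint32_t ua = a < 0 ? ~(uint32_t)a + 1u : (uint32_t)a;
  uint32_t ub = b < 0 ? ~(uint32_t)b + 1u : (uint32_t)b;
  uint32_t r = ua % ub;  /* Magnitude of the truncated remainder. */
  if (r != 0 && (a ^ b) < 0)
    r = ub - r;          /* Signs differ: move the remainder to b's side. */
  /* The result takes the sign of the divisor. */
  return b < 0 ? (int32_t)(~r + 1u) : (int32_t)r;
}
```

For `a == b == INT32_MIN` both magnitudes are 2^31, the remainder is 0,
and no signed negation is ever evaluated.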
> +
> +-- <src/lj_strfmt.c>:`lj_strfmt_wint()`
> +for _ = 1, 4 do
> +  -- Operand should be the constant on the trace.
Typo: s/the constant/a constant/
> +  r = tostring(bit.tobit(0x80000000))
> +end
> +test:is(r, '-2147483648', 'no UB during lj_strfmt_wint')
> +
> +-- <src/lj_strfmt.c>:`lj_strfmt_putfxint()`
> +test:is(('%d'):format(INT64_MIN), '-9223372036854775808',
> +        'no UB during lj_strfmt_putfxint')
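The same pattern applies to formatting: the digits come from the unsigned
magnitude, so `INT64_MIN` needs no special case. A rough sketch follows.
`fmt_i64` is a hypothetical helper and not the LuaJIT implementation, and
the caller is assumed to provide at least 21 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write k in decimal into buf, return the length. */
static size_t fmt_i64(char *buf, int64_t k)
{
  char tmp[20];  /* 2^63 has 19 decimal digits. */
  size_t n = 0, len = 0;
  /* Take the magnitude in unsigned arithmetic instead of evaluating -k. */
  uint64_t u = k < 0 ? ~(uint64_t)k + 1u : (uint64_t)k;
  do {
    tmp[n++] = (char)('0' + (int)(u % 10));  /* Least significant first. */
    u /= 10;
  } while (u != 0);
  if (k < 0)
    buf[len++] = '-';
  while (n > 0)
    buf[len++] = tmp[--n];  /* Reverse into the output buffer. */
  buf[len] = '\0';
  return len;
}
```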
> +
> +-- <src/lj_parse.c>:`bcemit_unop()`
> +local int64_min_cdata = -0x8000000000000000LL
> +test:ok(true, 'no UB during bcemit_unop')
> +
> +-- <src/lj_carith.c>:`carith_int64()`
> +-- Use the additional variable to avoid folding during
> +-- `bcemit_unop()`.
> +test:is(-int64_min_cdata, int64_min_cdata, 'no UB during carith_int64')
> +
> +-- <src/lj_ctype.c>:`lj_ctype_repr_int64()`
> +-- Use cast to separate the test case from `bcemit_unop()`.
> +test:is(tostring(LL_T(INT64_MIN)), '-9223372036854775808LL',
> +        'no UB during lj_ctype_repr_int64')
> +
> +local TOHEX_EXPECTED = ('0'):rep(TOBIT_CHAR_MAX)
> +-- <src/lib_bit.c>:`bit_tohex()`
> +-- The second argument is the number of bytes to be represented.
> +-- The negative value stands for uppercase.
> +test:is(bit.tohex(0, INT32_MIN), TOHEX_EXPECTED, 'no UB during bit_tohex')
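To make the expected value easier to follow: a negative digit count
selects uppercase, its magnitude is the width, and the width is capped at
254. The sketch below only models that logic. `hex_width` and
`HEX_WIDTH_MAX` are hypothetical names mirroring `TOBIT_CHAR_MAX` in the
test, not identifiers from <src/lib_bit.c>.

```c
#include <stdint.h>

/* Assumption: the cap matches TOBIT_CHAR_MAX from the test above. */
#define HEX_WIDTH_MAX 254u

/* Hypothetical helper: derive the output width and the case flag. */
static uint32_t hex_width(int32_t n, int *upper)
{
  uint32_t w = (uint32_t)n;
  *upper = n < 0;   /* A negative count requests uppercase digits. */
  if (n < 0)
    w = ~w + 1u;    /* Magnitude without evaluating -n. */
  if (w > HEX_WIDTH_MAX)
    w = HEX_WIDTH_MAX;
  return w;
}
```

For `INT32_MIN` the magnitude is 2^31, which is clamped to 254, hence the
string of 254 zeros.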
> +
> +-- <src/lj_crecord.c>:`recff_bit64_tohex()`
> +-- The second argument is the number of bytes to be represented.
> +-- The negative value stands for uppercase.
> +for _ = 1, 4 do
> +  -- The second argument should be the constant on the trace.
> +  r = bit.tohex(0, -0x80000000)
> +end
> +test:is(r, TOHEX_EXPECTED, 'no UB during recording bit.tohex')
> +
> +-- <src/lj_opt_fold.c>:`simplify_intsub_k()`
> +r = 0
> +for _ = 1, 4 do
> +  r = r - 0x8000000000000000LL
> +end
> +test:is(r, 0LL, 'no UB during simplify_intsub_k')
> +
> +-- <src/lj_strscan.c>:`strscan_hex()`
> +test:is(tonumber('-0x80000000'), INT32_MIN, 'no UB during strscan_hex')
> +
> +-- <src/lj_strscan.c>:`strscan_bin()`
> +test:is(tonumber('-0b10000000000000000000000000000000'), INT32_MIN,
> +        'no UB during strscan_bin')
> +
> +-- <src/lj_strscan.c>:`lj_strscan_scan()`
> +test:is(tonumber('-2147483648'), INT32_MIN, 'no UB during strscan_scan')
> +
> +-- Test for 32bit long, just in case.
> +-- <src/lib_base.c>:`tonumber()`
> +test:is(tonumber('-2000000000000000', 4), INT32_MIN,
> +        'no UB during tonumber, base 4')
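The arithmetic behind this input: in base 4 the digits are a 2 followed
by fifteen zeros, i.e. 2 * 4^15 = 2 * 2^30 = 2^31, so the negated value
is exactly INT32_MIN. A sketch of a parser that accumulates the magnitude
as unsigned and applies the sign at the end is below. `parse_i32` is
hypothetical, handles only digits 0-9 and bases 2..10, and is not the
code from <src/lib_base.c>.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on success, 0 on error or overflow. */
static int parse_i32(const char *s, unsigned base, int32_t *out)
{
  uint32_t acc = 0;
  int neg = (*s == '-');
  if (base < 2 || base > 10)
    return 0;
  if (neg)
    s++;
  if (*s == '\0')
    return 0;
  for (; *s != '\0'; s++) {
    unsigned d = (unsigned)(*s - '0');
    if (d >= base)
      return 0;  /* Not a digit in this base. */
    if (acc > (0x80000000u - d) / base)
      return 0;  /* Magnitude would exceed 2^31. */
    acc = acc * base + d;
  }
  if (!neg && acc > 0x7fffffffu)
    return 0;    /* 2^31 is only representable when negative. */
  /* Apply the sign in unsigned arithmetic instead of evaluating -acc. */
  *out = neg ? (int32_t)(~acc + 1u) : (int32_t)acc;
  return 1;
}
```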
> +
> +-- <src/lj_cparse.c>:`cp_expr_prefix()`
> +-- According to ISO/IEC 9899:2023 [1]:
> +-- | Each constant expression shall evaluate to a constant that is
> +-- | in the range of representable values for its type.
> +-- It means that since 0x80000000 does not fit in the int32_t
> +-- range, -0x80000000 does not fit in the int32_t range either.
> +--
> +-- In the case when the enumeration has no fixed underlying type,
> +-- the type of the enum is implementation defined [2][3].
> +--
> +-- Hence, we used -INT32_MAX - 1 since both values fit into
> +-- int32_t, so it can't be ambiguous.
> +--
> +-- luacheck: ignore (too long line)
> +-- [1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf#subsection.6.2.6
> +-- [2]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf#%5B%7B%22num%22%3A232%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22Fit%22%7D%5D
> +-- [3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf#subsubsection.6.7.2.2
> +ffi.cdef[[typedef enum {enum_int32_min = -0x7fffffff - 1} enum_t;]]
> +test:is(ffi.new('enum_t', 'enum_int32_min'), LL_T(INT32_MIN),
> +        'no UB during cp_expr_prefix')
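The reasoning in the comment can be checked in plain C. Assuming a 32-bit
`int`, as on mainstream ABIs, the constant 0x80000000 does not fit into
`int` and gets an unsigned type, so negating it yields a large positive
value rather than INT32_MIN. The function names below are hypothetical.

```c
#include <stdint.h>

/* Same spelling as in the test: both operands fit into a 32-bit int. */
typedef enum { enum_int32_min = -0x7fffffff - 1 } enum_t;

/* Hypothetical check: -0x80000000 is unsigned here, hence positive. */
static int negated_hex_is_positive(void)
{
  return -0x80000000 > 0;
}

/* Hypothetical check: the safe spelling is exactly INT32_MIN. */
static int safe_spelling_is_int32_min(void)
{
  return (-0x7fffffff - 1) == INT32_MIN;
}
```

Some compilers warn about the first comparison, which is the point of the
illustration.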
> +
> +test:done(true)
> --
> 2.45.1
>
