From: Sergey Bronnikov via Tarantool-patches
Reply-To: Sergey Bronnikov
Date: Thu, 4 Jul 2024 11:08:22 +0300
To: Sergey Kaplun, Maxim Kokryashkin
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] Avoid negation of signed integers in C that may hold INT*_MIN.

Hi, Sergey,

thanks for the patch! LGTM with a nit below.


On 25.06.2024 18:54, Sergey Kaplun wrote:
From: Mike Pall <mike>

Reported by minoki.
Recent C compilers 'take advantage' of the undefined behavior.
This completely changes the meaning of expressions like (k == -k).

(cherry picked from commit 8a5e398c52c7f8ca3e1a0e574cc2ba38224b759b)

This patch changes all possibly dangerous -x operations on integers to
the corresponding two's complement. Also, it removes all related UBSAN
suppressions, since they are fixed.
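
For readers less familiar with the idiom, here is a minimal sketch of what "the corresponding two's complement" could look like in C. The helper names `neg_i32` and `is_self_negating` are hypothetical and are not taken from the LuaJIT sources. The sketch assumes a two's complement target, where converting an out-of-range `uint32_t` back to `int32_t` is implementation-defined rather than undefined.

```c
#include <stdint.h>

/*
** Hypothetical helper: negate through unsigned arithmetic, which wraps
** modulo 2^32 and therefore never overflows, even for INT32_MIN.
*/
static int32_t neg_i32(int32_t k)
{
  return (int32_t)(uint32_t)(0u - (uint32_t)k);
}

/*
** Hypothetical helper: a well-defined replacement for (k == -k).
** It is true only for 0 and INT32_MIN. With the signed form, a compiler
** may assume that -k never overflows and reduce the check to (k == 0).
*/
static int is_self_negating(int32_t k)
{
  return (uint32_t)k == (uint32_t)(0u - (uint32_t)k);
}
```

With these helpers, `neg_i32(INT32_MIN)` yields `INT32_MIN` again instead of invoking undefined behavior, which matches the `-int64_min_cdata == int64_min_cdata` expectation in the test for the 64-bit case.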

Also, this patch limits the `bit.tohex()` result to 254 characters.
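
To illustrate the limit, a rough sketch of a digit-count clamp follows. It is an assumption about the general shape of such code, not the actual `bit_tohex()` implementation. The names `tohex_sketch` and `TOHEX_MAX_DIGITS` are hypothetical, and the caller is assumed to provide a buffer of at least 255 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* The limit named in the commit message; the macro name is hypothetical. */
#define TOHEX_MAX_DIGITS 254u

/*
** Hypothetical helper: write |n| hex digits of v into out, uppercase
** when n is negative. Returns the number of digits written.
*/
static size_t tohex_sketch(uint64_t v, int32_t n, char *out)
{
  const char *digits = "0123456789abcdef";
  uint32_t count = (uint32_t)n;
  uint32_t i;
  if (n < 0) {
    /* Unsigned negation: INT32_MIN becomes 0x80000000 without UB. */
    count = (uint32_t)(0u - count);
    digits = "0123456789ABCDEF";
  }
  /* Clamp in the unsigned domain, so huge counts are cut down. */
  if (count > TOHEX_MAX_DIGITS) count = TOHEX_MAX_DIGITS;
  for (i = 0; i < count; i++) {
    uint32_t shift = 4u * (count - 1u - i);
    /* Positions beyond the 64 bits of v are padded with zero digits. */
    uint32_t nibble = shift < 64u ? (uint32_t)((v >> shift) & 0xfu) : 0u;
    out[i] = digits[nibble];
  }
  out[count] = '\0';
  return count;
}
```

Under these assumptions, a call with `n == INT32_MIN` produces 254 digits, which is the length the test below expects from `bit.tohex(0, INT32_MIN)`.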

There is no testcase for `strscan_oct()`, `strscan_dec()`, or the
`STRSCAN_U32` format: when C syntax is parsed, the unary minus is parsed
first and the number itself only afterwards, so the error is raised in
`cp_expr_prefix()` instead. There is also no testcase for parsing the
exponent header, since the power is limited by `STRSCAN_MAXEXP`.
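
The point about the sign being handled separately from the digits can be sketched as follows. This is only an illustration under my own assumptions, not the `lj_strscan.c` or `lj_cparse.c` code, and `parse_i32_sketch` is a hypothetical name. The magnitude is accumulated as an unsigned value and the sign is applied at the end via unsigned negation, so the negation step is the only place where `INT32_MIN` needs care.

```c
#include <stdint.h>

/*
** Hypothetical helper: parse an optionally negative decimal string.
** Returns 1 and stores the value on success, 0 on bad input or overflow.
*/
static int parse_i32_sketch(const char *s, int32_t *out)
{
  int neg = (*s == '-');
  uint32_t mag = 0;
  uint32_t limit;
  if (neg) s++;
  if (*s == '\0') return 0;  /* No digits at all. */
  /* The negative range has one more value than the positive range. */
  limit = neg ? 0x80000000u : 0x7fffffffu;
  for (; *s; s++) {
    uint32_t d;
    if (*s < '0' || *s > '9') return 0;
    d = (uint32_t)(*s - '0');
    /* Reject the digit if mag*10 + d would exceed the limit. */
    if (mag > (limit - d) / 10u) return 0;
    mag = mag * 10u + d;
  }
  /* Apply the sign with unsigned negation; well-defined for 0x80000000. */
  *out = (int32_t)(neg ? (uint32_t)(0u - mag) : mag);
  return 1;
}
```

In this sketch `"-2147483648"` parses to `INT32_MIN`, while `"2147483648"` is rejected as out of range.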

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9924
Relates to tarantool/tarantool#8473
---

<snipped>


diff --git a/test/tarantool-tests/lj-928-int-min-negation.test.lua b/test/tarantool-tests/lj-928-int-min-negation.test.lua
new file mode 100644
index 00000000..26f4ed8e
--- /dev/null
+++ b/test/tarantool-tests/lj-928-int-min-negation.test.lua
@@ -0,0 +1,121 @@
+local tap = require('tap')
+
+-- Test file to demonstrate LuaJIT's UBSan failures during
+-- `INT*_MIN` negation.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/928.
+
+local test = tap.test('lj-928-int-min-negation.'):skipcond({

Nit: the trailing dot in the test name can be omitted.


+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+local INT32_MIN = -0x80000000
+local INT64_MIN = -0x8000000000000000
+local TOBIT_CHAR_MAX = 254
+
+-- XXX: Many tests (`tonumber()`-related) fail under UBSan
+-- with DUALNUM enabled. They are included to avoid regressions in
+-- the future if such a build becomes the default.
+local ffi = require('ffi')
+local LL_T = ffi.typeof(1LL)
+
+test:plan(14)
+
+jit.opt.start('hotloop=1')
+
+-- Temporary variable for the results.
+local r
+
+-- <src/lj_vmmath.c>:`lj_vm_modi()`
+for _ = 1, 4 do
+  -- Use additional variables to avoid folding during parsing.
+  -- Operands should be constants on the trace.
+  local x = -0x80000000
+  local y = -0x80000000
+  r = x % y
+end
+test:is(r, 0, 'no UB during lj_vm_modi')
+
+-- <src/lj_strfmt.c>:`lj_strfmt_wint()`
+for _ = 1, 4 do
+  -- Operand should be the constant on the trace.
+  r = tostring(bit.tobit(0x80000000))
+end
+test:is(r, '-2147483648', 'no UB during lj_strfmt_wint')
+
+-- <src/lj_strfmt.c>:`lj_strfmt_putfxint()`
+test:is(('%d'):format(INT64_MIN), '-9223372036854775808',
+        'no UB during lj_strfmt_putfxint')
+
+-- <src/lj_parse.c>:`bcemit_unop()`
+local int64_min_cdata = -0x8000000000000000LL
+test:ok(true, 'no UB during bcemit_unop')
+
+-- <src/lj_carith.c>:`carith_int64()`
+-- Use the additional variable to avoid folding during
+-- `bcemit_unop()`.
+test:is(-int64_min_cdata, int64_min_cdata, 'no UB during carith_int64')
+
+-- <src/lj_ctype.c>:`lj_ctype_repr_int64()`
+-- Use cast to separate the test case from `bcemit_unop()`.
+test:is(tostring(LL_T(INT64_MIN)), '-9223372036854775808LL',
+        'no UB during lj_ctype_repr_int64')
+
+local TOHEX_EXPECTED = ('0'):rep(TOBIT_CHAR_MAX)
+-- <src/lib_bit.c>:`bit_tohex()`
+-- The second argument is the number of hex digits to be represented.
+-- The negative value stands for uppercase.
+test:is(bit.tohex(0, INT32_MIN), TOHEX_EXPECTED, 'no UB during bit_tohex')
+
+-- <src/lj_crecord.c>:`recff_bit64_tohex()`
+-- The second argument is the number of hex digits to be represented.
+-- The negative value stands for uppercase.
+for _ = 1, 4 do
+  -- The second argument should be the constant on the trace.
+  r = bit.tohex(0, -0x80000000)
+end
+test:is(r, TOHEX_EXPECTED, 'no UB during recording bit.tohex')
+
+-- <src/lj_opt_fold.c>:`simplify_intsub_k()`
+r = 0
+for _ = 1, 4 do
+  r = r - 0x8000000000000000LL
+end
+test:is(r, 0LL, 'no UB during simplify_intsub_k')
+
+-- <src/lj_strscan.c>:`strscan_hex()`
+test:is(tonumber('-0x80000000'), INT32_MIN, 'no UB during strscan_hex')
+
+-- <src/lj_strscan.c>:`strscan_bin()`
+test:is(tonumber('-0b10000000000000000000000000000000'), INT32_MIN,
+        'no UB during strscan_bin')
+
+-- <src/lj_strscan.c>:`lj_strscan_scan()`
+test:is(tonumber('-2147483648'), INT32_MIN, 'no UB during strscan_scan')
+
+-- Test for 32bit long, just in case.
+-- <src/lib_base.c>:`tonumber()`
+test:is(tonumber('-2000000000000000', 4), INT32_MIN,
+        'no UB during tonumber, base 4')
+
+-- <src/lj_cparse.c>:`cp_expr_prefix()`
+-- According to ISO/IEC 9899:2023 [1]:
+-- | Each constant expression shall evaluate to a constant that is
+-- | in the range of representable values for its type.
+-- It means that since 0x80000000 does not fit in the int32_t
+-- range, -0x80000000 does not fit in the int32_t range either.
+--
+-- In the case when the enumeration has no fixed underlying type,
+-- the type of the enum is implementation defined [2][3].
+--
+-- Hence, we used -INT32_MAX - 1 since both values fit into
+-- int32_t, so it can't be ambiguous.
+--
+-- luacheck: ignore (too long line)
+-- [1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf#subsection.6.2.6
+-- [2]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf#%5B%7B%22num%22%3A232%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22Fit%22%7D%5D
+-- [3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf#subsubsection.6.7.2.2
+ffi.cdef[[typedef enum {enum_int32_min = -0x7fffffff - 1} enum_t;]]
+test:is(ffi.new('enum_t', 'enum_int32_min'), LL_T(INT32_MIN),
+        'no UB during cp_expr_prefix')
+
+test:done(true)