* [Tarantool-patches] [PATCH luajit v2] x64: Fix 64 bit shift code generation.
@ 2023-06-13 12:42 Maxim Kokryashkin via Tarantool-patches
2023-06-28 10:57 ` Sergey Kaplun via Tarantool-patches
From: Maxim Kokryashkin via Tarantool-patches @ 2023-06-13 12:42 UTC
To: tarantool-patches, skaplun, sergeyb
From: Mike Pall <mike>
Reported by Philipp Kutin.
Fix contributed by Peter Cawley.
(cherry-picked from commit 03a7ebca4f6819658cdaa12ba3af54a17b8035e9)
When a left bitwise rotation by a variable amount with a 64-bit
result is recorded on an x86-64 processor and the result is
supposed to end up in the `rcx` register, the value could be
written into `ecx` instead and thus truncated to 32 bits. This
patch fixes that behavior, so the value is now written into
`rcx`.
Resulting assembly changes from the following before the patch:
| rol rsi, cl
| mov ecx, esi
to the following after the patch:
| rol rsi, cl
| mov rcx, rsi
Note that the same problem cannot occur with a right rotation
on machines with BMI2 support: there is a BMI2 instruction for
it, so it is handled differently.
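The truncation described in this commit message can be sketched in
plain C. This is only a model under stated assumptions, not the code
LuaJIT emits: the helper names `rotl64`, `mov32`, `mov64` and
`check_truncation` are hypothetical, and the shift counts `i + 32`
for `i = 1..4` are taken from the test in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits; n is reduced modulo 64. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
	n &= 63;
	return n ? (x << n) | (x >> (64 - n)) : x;
}

/*
 * Models `mov ecx, esi`: a 32-bit move keeps only the low half of
 * the source and zeroes the upper 32 bits of the destination.
 */
static uint64_t mov32(uint64_t src)
{
	return (uint64_t)(uint32_t)src;
}

/* Models `mov rcx, rsi`: a 64-bit move copies the value intact. */
static uint64_t mov64(uint64_t src)
{
	return src;
}

/* Checks the shift counts used by the test (i + 32, i = 1..4). */
static void check_truncation(void)
{
	for (unsigned i = 1; i <= 4; i++) {
		uint64_t r = rotl64(1, i + 32);
		/* The 64-bit move preserves the rotated bit. */
		assert(mov64(r) == (UINT64_C(1) << (i + 32)));
		/* The 32-bit move loses it: the bit is above bit 31. */
		assert(mov32(r) == 0);
	}
}
```

Rotating `1ULL` by more than 32 positions puts the only set bit in the
upper half of the register, which is why the test uses such counts: any
32-bit move of the result yields zero and the mismatch becomes visible.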
Maxim Kokryashkin:
* added the description and the test for the problem
Part of tarantool/tarantool#8516
---
Changes in v2:
- Fixed comments as per review by Sergey
Branch: https://github.com/tarantool/luajit/tree/fckxorg/fix-bit-shift-generation
PR: https://github.com/tarantool/tarantool/pull/8727
src/lj_asm_x86.h | 2 +-
test/tarantool-tests/CMakeLists.txt | 1 +
.../fix-bit-shift-generation.test.lua | 48 +++++++++++++++++++
.../fix-bit-shift-generation/CMakeLists.txt | 1 +
.../libtestbitshift.c | 8 ++++
5 files changed, 59 insertions(+), 1 deletion(-)
create mode 100644 test/tarantool-tests/fix-bit-shift-generation.test.lua
create mode 100644 test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt
create mode 100644 test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c
diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index e6c42c6d..63d332ca 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -2328,7 +2328,7 @@ static void asm_bitshift(ASMState *as, IRIns *ir, x86Shift xs, x86Op xv)
dest = ra_dest(as, ir, rset_exclude(RSET_GPR, RID_ECX));
if (dest == RID_ECX) {
dest = ra_scratch(as, rset_exclude(RSET_GPR, RID_ECX));
- emit_rr(as, XO_MOV, RID_ECX, dest);
+ emit_rr(as, XO_MOV, REX_64IR(ir, RID_ECX), dest);
}
right = irr->r;
if (ra_noreg(right))
diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
index a428d009..d36271f1 100644
--- a/test/tarantool-tests/CMakeLists.txt
+++ b/test/tarantool-tests/CMakeLists.txt
@@ -54,6 +54,7 @@ macro(BuildTestCLib lib sources)
endmacro()
add_subdirectory(ffi-ccall)
+add_subdirectory(fix-bit-shift-generation)
add_subdirectory(gh-4427-ffi-sandwich)
add_subdirectory(gh-5813-resolving-of-c-symbols/both)
add_subdirectory(gh-5813-resolving-of-c-symbols/hash)
diff --git a/test/tarantool-tests/fix-bit-shift-generation.test.lua b/test/tarantool-tests/fix-bit-shift-generation.test.lua
new file mode 100644
index 00000000..e3f30eae
--- /dev/null
+++ b/test/tarantool-tests/fix-bit-shift-generation.test.lua
@@ -0,0 +1,48 @@
+local tap = require('tap')
+local test = tap.test('fix-bit-shift-generation'):skipcond({
+ ['Test requires JIT enabled'] = not jit.status(),
+})
+
+local NTESTS = 4
+
+test:plan(NTESTS)
+
+local ffi = require('ffi')
+local bit = require('bit')
+local rol = bit.rol
+local shl = bit.lshift
+
+local testbitshift = ffi.load('testbitshift')
+ffi.cdef[[
+uint64_t
+testbitshift
+(const int arg1, const int arg2, const int arg3, const uint64_t arg4)
+]]
+
+local result = {}
+jit.opt.start('hotloop=1')
+
+for i = 1, NTESTS do
+ -- The rotation is performed beyond the 32-bit size, for
+ -- truncation to become noticeable. `testbitshift` is used to
+ -- ensure that the result of rotation goes into the `rcx`,
+ -- corresponding to the x86_64 ABI. Although it is possible to
+ -- use a function from the C standard library for that, all of
+ -- the suitable ones are variadic, and variadics are recorded
+ -- incorrectly on Apple Silicon.
+ result[i] = testbitshift.testbitshift(1, 1, 1, rol(1ULL, i + 32))
+ -- Resulting assembly for the `rol` instruction above changes
+ -- from the following before the patch:
+ -- | rol rsi, cl
+ -- | mov ecx, esi
+ --
+ -- to the following after the patch:
+ -- | rol rsi, cl
+ -- | mov rcx, rsi
+end
+
+for i = 1, NTESTS do
+ test:ok(result[i] == shl(1ULL, i + 32), 'valid rol')
+end
+
+os.exit(test:check() and 0 or 1)
diff --git a/test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt b/test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt
new file mode 100644
index 00000000..f85f875b
--- /dev/null
+++ b/test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt
@@ -0,0 +1 @@
+BuildTestCLib(libtestbitshift libtestbitshift.c)
diff --git a/test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c b/test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c
new file mode 100644
index 00000000..0785ebba
--- /dev/null
+++ b/test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c
@@ -0,0 +1,8 @@
+#include <stdint.h>
+
+uint64_t
+testbitshift
+(const int arg1, const int arg2, const int arg3, const uint64_t arg4)
+{
+ return arg4;
+}
--
2.40.1
* Re: [Tarantool-patches] [PATCH luajit v2] x64: Fix 64 bit shift code generation.
From: Sergey Kaplun via Tarantool-patches @ 2023-06-28 10:57 UTC
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the fixes!
LGTM, with an ignorable nit below.
Also, I suppose it is better to rebase on the newest master (since
merged C tests may conflict with your changes).
On 13.06.23, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
>
> Reported by Philipp Kutin.
> Fix contributed by Peter Cawley.
>
> (cherry-picked from commit 03a7ebca4f6819658cdaa12ba3af54a17b8035e9)
<snipped>
>
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/fix-bit-shift-generation
> PR: https://github.com/tarantool/tarantool/pull/8727
>
> src/lj_asm_x86.h | 2 +-
> test/tarantool-tests/CMakeLists.txt | 1 +
> .../fix-bit-shift-generation.test.lua | 48 +++++++++++++++++++
> .../fix-bit-shift-generation/CMakeLists.txt | 1 +
> .../libtestbitshift.c | 8 ++++
> 5 files changed, 59 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/fix-bit-shift-generation.test.lua
> create mode 100644 test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt
> create mode 100644 test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c
>
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index e6c42c6d..63d332ca 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
<snipped>
> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
> index a428d009..d36271f1 100644
> --- a/test/tarantool-tests/CMakeLists.txt
> +++ b/test/tarantool-tests/CMakeLists.txt
<snipped>
> diff --git a/test/tarantool-tests/fix-bit-shift-generation.test.lua b/test/tarantool-tests/fix-bit-shift-generation.test.lua
> new file mode 100644
> index 00000000..e3f30eae
> --- /dev/null
> +++ b/test/tarantool-tests/fix-bit-shift-generation.test.lua
<snipped>
> +local testbitshift = ffi.load('testbitshift')
> +ffi.cdef[[
> +uint64_t
> +testbitshift
> +(const int arg1, const int arg2, const int arg3, const uint64_t arg4)
Side note: See the ignorable comment for the testbitshift lib below.
> +]]
> +
> +local result = {}
> +jit.opt.start('hotloop=1')
> +
> +for i = 1, NTESTS do
> + -- The rotation is performed beyond the 32-bit size, for
> + -- truncation to become noticeable. `testbitshift` is used to
> + -- ensure that the result of rotation goes into the `rcx`,
> + -- corresponding to the x86_64 ABI. Although it is possible to
> + -- use a function from the C standard library for that, all of
> + -- the suitable ones are variadic, and variadics are recorded
> + -- incorrectly on Apple Silicon.
Side note: I still think that we should also fix the M1 behaviour (but
in a different patch set, obviously). And we shouldn't forget to update
this test afterwards, I believe.
> + result[i] = testbitshift.testbitshift(1, 1, 1, rol(1ULL, i + 32))
<snipped>
> diff --git a/test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt b/test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt
> new file mode 100644
> index 00000000..f85f875b
> --- /dev/null
> +++ b/test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt
<snipped>
> diff --git a/test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c b/test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c
> new file mode 100644
> index 00000000..0785ebba
> --- /dev/null
> +++ b/test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c
> @@ -0,0 +1,8 @@
> +#include <stdint.h>
> +
> +uint64_t
> +testbitshift
> +(const int arg1, const int arg2, const int arg3, const uint64_t arg4)
> +{
> + return arg4;
> +}
I suggest using a more meaningful name (one showing that we only need
the 4th argument of the function). It also helps with the line width.
| uint64_t
| pick4(const int arg1, const int arg2, const int arg3, const uint64_t res)
| {
| return res;
| }
Feel free to ignore.
But if you prefer the new naming -- don't forget to change it in the Lua
test too.
Also, it is strange that the compiler doesn't warn about unused
arguments, but OK.
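On the missing warning: GCC and Clang enable `-Wunused-parameter` with
`-Wextra`, not with `-Wall` alone, so the silence is expected if the
test libraries are built without `-Wextra` (an assumption; the build
flags are not shown in this thread). A minimal sketch of the helper
that stays quiet either way, using the hypothetical `pick4` name
suggested above:

```c
#include <stdint.h>

/* Hypothetical variant of the helper; returns its 4th argument. */
uint64_t
pick4(const int arg1, const int arg2, const int arg3, const uint64_t res)
{
	/* Mark the first three arguments as intentionally unused. */
	(void)arg1;
	(void)arg2;
	(void)arg3;
	return res;
}
```

The `(void)` casts do not change the generated code; they only document
that the leading arguments exist to push `res` into the 4th argument
slot.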
> --
> 2.40.1
>
--
Best regards,
Sergey Kaplun
* Re: [Tarantool-patches] [PATCH luajit v2] x64: Fix 64 bit shift code generation.
From: Igor Munkin via Tarantool-patches @ 2023-06-29 8:59 UTC
To: Maxim Kokryashkin; +Cc: tarantool-patches
Max,
Thanks for the patch! LGTM, but I've applied minor comments left by
Sergey in the thread.
--
Best regards,
IM
* Re: [Tarantool-patches] [PATCH luajit v2] x64: Fix 64 bit shift code generation.
From: Igor Munkin via Tarantool-patches @ 2023-07-04 17:09 UTC
To: Maxim Kokryashkin; +Cc: tarantool-patches
Max,
I've checked the patchset into all long-term branches in
tarantool/luajit and bumped a new version in master, release/2.11 and
release/2.10.
On 13.06.23, Maxim Kokryashkin via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Reported by Philipp Kutin.
> Fix contributed by Peter Cawley.
>
> (cherry-picked from commit 03a7ebca4f6819658cdaa12ba3af54a17b8035e9)
>
> When a left bitwise rotation by a variable amount with a 64-bit
> result is recorded on an x86-64 processor and the result is
> supposed to end up in the `rcx` register, the value could be
> written into `ecx` instead and thus truncated to 32 bits. This
> patch fixes that behavior, so the value is now written into
> `rcx`.
>
> Resulting assembly changes from the following before the patch:
> | rol rsi, cl
> | mov ecx, esi
>
> to the following after the patch:
> | rol rsi, cl
> | mov rcx, rsi
>
> Note that the same problem cannot occur with a right rotation
> on machines with BMI2 support: there is a BMI2 instruction for
> it, so it is handled differently.
>
> Maxim Kokryashkin:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8516
> ---
> Changes in v2:
> - Fixed comments as per review by Sergey
>
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/fix-bit-shift-generation
> PR: https://github.com/tarantool/tarantool/pull/8727
>
> src/lj_asm_x86.h | 2 +-
> test/tarantool-tests/CMakeLists.txt | 1 +
> .../fix-bit-shift-generation.test.lua | 48 +++++++++++++++++++
> .../fix-bit-shift-generation/CMakeLists.txt | 1 +
> .../libtestbitshift.c | 8 ++++
> 5 files changed, 59 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/fix-bit-shift-generation.test.lua
> create mode 100644 test/tarantool-tests/fix-bit-shift-generation/CMakeLists.txt
> create mode 100644 test/tarantool-tests/fix-bit-shift-generation/libtestbitshift.c
>
<snipped>
> --
> 2.40.1
>
--
Best regards,
IM
* Re: [Tarantool-patches] [PATCH luajit v2] x64: Fix 64 bit shift code generation.
From: Sergey Kaplun via Tarantool-patches @ 2023-06-09 9:50 UTC
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the fixes!
About the patch itself: just 2 typos in the comments to the test.
But I see [1] failing tests on M1 with JIT enabled *.
It may be related to [2].
Anyway, good to see that the test shows something :).
*: I suggest using `test:is()` instead of `test:ok()` to show the `got`
and `expected` values.
On 09.06.23, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
<snipped>
> +
> +for i = 1, NITERATIONS do
> + -- Rotation is performed beyond the 32-bit size, for truncation
Typo: s/Rotation/The rotation/
> + -- to become noticeable. Sprintf is used to ensure that the
> + -- result of rotation goes into the `rcx`, corresponing to
Typo: s/corresponing/corresponding/
> + -- the x86_64 ABI.
<snipped>
> --
> 2.40.1
>
[1]: https://github.com/tarantool/luajit/actions/runs/5210890238/jobs/9402526109#step:7:907
[2]: https://github.com/tarantool/tarantool/issues/6097
--
Best regards,
Sergey Kaplun
* [Tarantool-patches] [PATCH luajit v2] x64: Fix 64 bit shift code generation.
From: Maxim Kokryashkin via Tarantool-patches @ 2023-06-09 9:13 UTC
To: tarantool-patches, skaplun, sergeyb
From: Mike Pall <mike>
Reported by Philipp Kutin.
Fix contributed by Peter Cawley.
(cherry-picked from commit 03a7ebca4f6819658cdaa12ba3af54a17b8035e9)
When a left bitwise rotation by a variable amount with a 64-bit
result is recorded on an x86-64 processor and the result is
supposed to end up in the `rcx` register, the value could be
written into `ecx` instead and thus truncated to 32 bits. This
patch fixes that behavior, so the value is now written into
`rcx`.
Resulting assembly changes from the following before the patch:
| rol rsi, cl
| mov ecx, esi
to the following after the patch:
| rol rsi, cl
| mov rcx, rsi
Note that the same problem cannot occur with a right rotation
on machines with BMI2 support: there is a BMI2 instruction for
it, so it is handled differently.
Maxim Kokryashkin:
* added the description and the test for the problem
Part of tarantool/tarantool#8516
---
Changes in v2:
- Fixed comments as per review by Sergey Kaplun
PR: https://github.com/tarantool/tarantool/pull/8727
Branch: https://github.com/tarantool/luajit/tree/fckxorg/fix-bit-shift-generation
src/lj_asm_x86.h | 2 +-
.../fix-bit-shift-generation.test.lua | 50 +++++++++++++++++++
2 files changed, 51 insertions(+), 1 deletion(-)
create mode 100644 test/tarantool-tests/fix-bit-shift-generation.test.lua
diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index e6c42c6d..63d332ca 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -2328,7 +2328,7 @@ static void asm_bitshift(ASMState *as, IRIns *ir, x86Shift xs, x86Op xv)
dest = ra_dest(as, ir, rset_exclude(RSET_GPR, RID_ECX));
if (dest == RID_ECX) {
dest = ra_scratch(as, rset_exclude(RSET_GPR, RID_ECX));
- emit_rr(as, XO_MOV, RID_ECX, dest);
+ emit_rr(as, XO_MOV, REX_64IR(ir, RID_ECX), dest);
}
right = irr->r;
if (ra_noreg(right))
diff --git a/test/tarantool-tests/fix-bit-shift-generation.test.lua b/test/tarantool-tests/fix-bit-shift-generation.test.lua
new file mode 100644
index 00000000..9f14a9e3
--- /dev/null
+++ b/test/tarantool-tests/fix-bit-shift-generation.test.lua
@@ -0,0 +1,50 @@
+local tap = require('tap')
+local test = tap.test('fix-bit-shift-generation'):skipcond({
+ ['Test requires JIT enabled'] = not jit.status(),
+})
+
+local NITERATIONS = 4
+local NTESTS = NITERATIONS * 2
+
+test:plan(NTESTS)
+
+local ffi = require('ffi')
+local bit = require('bit')
+local rol = bit.rol
+local shl = bit.lshift
+
+ffi.cdef('int snprintf(char *str, size_t n, const char *format, ...);')
+
+-- Buffer size is adjusted to fit `(1 << 36)`,
+-- which has exactly 11 digits.
+local BUFFER_SIZE = 12
+local bufs = {}
+for i = 1, NTESTS do
+ bufs[i] = ffi.new('char[?]', BUFFER_SIZE)
+end
+
+local result = {}
+jit.opt.start('hotloop=1')
+
+for i = 1, NITERATIONS do
+ -- Rotation is performed beyond the 32-bit size, for truncation
+ -- to become noticeable. Sprintf is used to ensure that the
+ -- result of rotation goes into the `rcx`, corresponing to
+ -- the x86_64 ABI.
+ result[i] = ffi.C.snprintf(bufs[i], BUFFER_SIZE, '%llu', rol(1ULL, i + 32))
+ -- Resulting assembly for the `rol` instruction above changes
+ -- from the following before the patch:
+ -- | rol rsi, cl
+ -- | mov ecx, esi
+ --
+ -- to the following after the patch:
+ -- | rol rsi, cl
+ -- | mov rcx, rsi
+end
+
+for i = 1, NITERATIONS do
+ test:ok(result[i] > 1, '64-bit value was not truncated')
+ test:ok(tonumber(ffi.string(bufs[i])) == shl(1ULL, i + 32), 'valid rol')
+end
+
+os.exit(test:check() and 0 or 1)
--
2.40.1
Thread overview: 6+ messages
2023-06-13 12:42 [Tarantool-patches] [PATCH luajit v2] x64: Fix 64 bit shift code generation Maxim Kokryashkin via Tarantool-patches
2023-06-28 10:57 ` Sergey Kaplun via Tarantool-patches
2023-06-29 8:59 ` Igor Munkin via Tarantool-patches
2023-07-04 17:09 ` Igor Munkin via Tarantool-patches
-- strict thread matches above, loose matches on Subject: below --
2023-06-09 9:13 Maxim Kokryashkin via Tarantool-patches
2023-06-09 9:50 ` Sergey Kaplun via Tarantool-patches