Hi, Sergey,  thanks for the fixes! LGTM On 15.01.2025 16:06, Sergey Kaplun wrote: > Hi, Sergey! > Thanks for the review! > Please consider my answers below. > > On 14.01.25, Sergey Bronnikov wrote: >> Hi, Sergey! >> >> Thanks for the patch! >> >> >> On 14.01.2025 14:06, Sergey Kaplun wrote: >>> From: Mike Pall >>> >>> See the discussion in the corresponding ticket for the rationale. >>> >>> (cherry picked from commit de2e1ca9d3d87e74c0c20c1e4ad3c32b31a5875b) >>> >>> For the modulo operation, the arm64 VM uses `fmsub` [1] instruction, >>> which is the fused multiply-add (FMA [2]) operation (more precisely, >>> multiply-sub). Hence, it may produce different results compared to the >>> unfused one. This patch fixes the behaviour by using the unfused >>> instructions by default. However, the new JIT optimization flag (fma) is >>> introduced to make it possible to take advantage of the FMA >>> optimizations. >>> >>> Sergey Kaplun: >>> * added the description and the test for the problem >>> >>> [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB >>> [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation >>> >>> Part of tarantool/tarantool#10709 >>> --- > > >>> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua >>> new file mode 100644 >>> index 00000000..55ec7b98 >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua >>> @@ -0,0 +1,36 @@ >>> +local tap = require('tap') >>> + >>> +-- Test file to demonstrate consistent behaviour for JIT and the >>> +-- VM regarding FMA optimization (disabled by default). >>> +-- XXX: The VM behaviour is checked in the >>> +-- . >>> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918. >>> +local test = tap.test('lj-918-fma-numerical-accuracy-jit'):skipcond({ >>> + ['Test requires JIT enabled'] = not jit.status(), >>> +}) >>> + >>> +test:plan(1) >>> + >>> +local _2pow52 = 2 ^ 52 >>> + >>> +-- IEEE754 components to double: >>> +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). >>> +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) >>> +assert(a == 2197541395358679800) >>> + >>> +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1) >>> +assert(b == -1005065126.3690554) >>> + >> Please add a comment with explanation why exactly these testcases >> >> are used. >> >> As I got it right, the idea is to calculate negative and positive >> number, right? > I've added the corresponding comment to avoid confusion. Branch is > force-pushed. > =================================================================== > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > index 55ec7b98..8b16d4c3 100644 > --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > @@ -13,6 +13,18 @@ test:plan(1) > > local _2pow52 = 2 ^ 52 > > +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1] > +-- instruction for the modulo operation, which is the fused > +-- multiply-add (FMA [2]) operation (more precisely, > +-- multiply-sub). Hence, it may produce different results compared > +-- to the unfused one. For the test, let's just use 2 numbers in > +-- modulo for which the single rounding is different from the > +-- double rounding. The numbers from the original issue are good > +-- enough. > +-- > +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB > +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation > +-- > -- IEEE754 components to double: > -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). > local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > index a3775d6d..25b59707 100644 > --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > @@ -11,6 +11,18 @@ test:plan(2) > > local _2pow52 = 2 ^ 52 > > +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1] > +-- instruction for the modulo operation, which is the fused > +-- multiply-add (FMA [2]) operation (more precisely, > +-- multiply-sub). Hence, it may produce different results compared > +-- to the unfused one. For the test, let's just use 2 numbers in > +-- modulo for which the single rounding is different from the > +-- double rounding. The numbers from the original issue are good > +-- enough. > +-- > +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB > +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation > +-- > -- IEEE754 components to double: > -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). > local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) > =================================================================== > >> Why do you think two examples are enough for testing that behavior for >> JIT and the VM >> >> is consistent? > Since we test for modulo operation consistency, I suppose it is enough > to check similar cases for the JIT and the VM. I distinguished them in > the separate files to avoid skipping the test for the VM when JIT is > disabled. > >> Should we check more corner cases? >> >> * Standard/Normal arithmetic >> * Subnormal arithmetic >> * Infinite arithmetic >> * NaN arithmetic >> * Zero arithmetic > All these checks are good but not really relevant to this particular > issue. I suppose we may continue this activity as a part of the > corresponding issue (please create one if it isn't created already), as > we discussed offline, with test vectors for floating point values. > >>> +local results = {} >>> + >>> +jit.opt.start('hotloop=1') >>> +for i = 1, 4 do >>> + results[i] = a % b >>> +end >>> + >>> +-- XXX: The test doesn't fail before the commit. But it is >> Please add a commit hash and it's short description. > We usually meand this particular commit in the tests (commit when test > is introduced). I rephrase it like the following to avoid confusion: > > =================================================================== > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > index 55ec7b98..8b16d4c3 100644 > --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > @@ -28,7 +40,7 @@ for i = 1, 4 do > results[i] = a % b > end > > --- XXX: The test doesn't fail before the commit. But it is > +-- XXX: The test doesn't fail before this commit. But it is > -- required to be sure that there are no inconsistencies after the > -- commit. > test:samevalues(results, 'consistent behaviour between the JIT and the VM') > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > index a3775d6d..25b59707 100644 > --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > @@ -19,7 +31,7 @@ assert(a == 2197541395358679800) > local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1) > assert(b == -1005065126.3690554) > > --- These tests fail on ARM64 before the patch or with FMA > +-- These tests fail on ARM64 before this patch or with FMA > -- optimization enabled. > -- The first test may not fail if the compiler doesn't generate > -- an ARM64 FMA operation in `lj_vm_foldarith()`. > =================================================================== > >>> +-- required to be sure that there are no inconsistencies after the >>> +-- commit. >>> +test:samevalues(results, 'consistent behaviour between the JIT and the VM') >>> + >>> +test:done(true) >>> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua >>> new file mode 100644 >>> index 00000000..a3775d6d >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua >>> @@ -0,0 +1,31 @@ >>> +local tap = require('tap') >>> + >>> +-- Test file to demonstrate possible numerical inaccuracy if FMA >>> +-- optimization takes place. >> I suppose we don't need to test FMA itself, but we should >> >> check that FMA is actually enabled when it's option >> >> is enabled. Right? if yes I would merge test >> lj-918-fma-numerical-accuracy.test.lua >> >> and test lj-918-fma-optimization.test.lua. > I would rather avoid this since the FMA is more like the -mfma compiler > option and affects only the JIT behaviour and can be enabled for the > performance reason. I used this canary test to check that this option > exists. > >> >>> +-- XXX: The JIT consistency is checked in the >>> +-- . >>> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918. >>> +local test = tap.test('lj-918-fma-numerical-accuracy') >>> + >>> +test:plan(2) >>> + >>> +local _2pow52 = 2 ^ 52 >>> + >>> +-- IEEE754 components to double: >>> +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). >>> +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) >>> +assert(a == 2197541395358679800) >>> + >>> +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1) >>> +assert(b == -1005065126.3690554) >> The same questions as above. > Added the comment. > >>> + >>> +-- These tests fail on ARM64 before the patch or with FMA >>> +-- optimization enabled. >>> +-- The first test may not fail if the compiler doesn't generate >>> +-- an ARM64 FMA operation in `lj_vm_foldarith()`. >>> +test:is(2197541395358679800 % -1005065126.3690554, -606337536, >>> + 'FMA in the lj_vm_foldarith() during parsing') >>> + >>> +test:is(a % b, -606337536, 'FMA in the VM') >>> + >>> +test:done(true) >>> diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua >>> new file mode 100644 >>> index 00000000..af749eb5 >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua >>> @@ -0,0 +1,25 @@ >>> +local tap = require('tap') >>> +local test = tap.test('lj-918-fma-optimization'):skipcond({ >>> + ['Test requires JIT enabled'] = not jit.status(), >>> +}) >>> + >>> +test:plan(3) >>> + >>> +local function jit_opt_is_on(needed) >> why `needed` and not something like "flag"? > Replaced with `flag`: > > =================================================================== > diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua > index af749eb5..9396e558 100644 > --- a/test/tarantool-tests/lj-918-fma-optimization.test.lua > +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua > @@ -5,9 +5,9 @@ local test = tap.test('lj-918-fma-optimization'):skipcond({ > > test:plan(3) > > -local function jit_opt_is_on(needed) > +local function jit_opt_is_on(flag) > for _, opt in ipairs({jit.status()}) do > - if opt == needed then > + if opt == flag then > return true > end > end > =================================================================== > >>> + for _, opt in ipairs({jit.status()}) do >>> + if opt == needed then >>> + return true >>> + end >>> + end > >