Hi, Sergey,

  thanks for the fixes! LGTM

On 15.01.2025 16:06, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
> Please consider my answers below.
>
> On 14.01.25, Sergey Bronnikov wrote:
>> Hi, Sergey!
>>
>> Thanks for the patch!
>>
>>
>> On 14.01.2025 14:06, Sergey Kaplun wrote:
>>> From: Mike Pall <mike>
>>>
>>> See the discussion in the corresponding ticket for the rationale.
>>>
>>> (cherry picked from commit de2e1ca9d3d87e74c0c20c1e4ad3c32b31a5875b)
>>>
>>> For the modulo operation, the arm64 VM uses `fmsub` [1] instruction,
>>> which is the fused multiply-add (FMA [2]) operation (more precisely,
>>> multiply-sub). Hence, it may produce different results compared to the
>>> unfused one. This patch fixes the behaviour by using the unfused
>>> instructions by default. However, the new JIT optimization flag (fma) is
>>> introduced to make it possible to take advantage of the FMA
>>> optimizations.
>>>
>>> Sergey Kaplun:
>>> * added the description and the test for the problem
>>>
>>> [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
>>> [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
>>>
>>> Part of tarantool/tarantool#10709
>>> ---
> <snipped>
>
>>> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
>>> new file mode 100644
>>> index 00000000..55ec7b98
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
>>> @@ -0,0 +1,36 @@
>>> +local tap = require('tap')
>>> +
>>> +-- Test file to demonstrate consistent behaviour for JIT and the
>>> +-- VM regarding FMA optimization (disabled by default).
>>> +-- XXX: The VM behaviour is checked in the
>>> +-- <lj-918-fma-numerical-accuracy.test.lua>.
>>> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
>>> +local test = tap.test('lj-918-fma-numerical-accuracy-jit'):skipcond({
>>> +  ['Test requires JIT enabled'] = not jit.status(),
>>> +})
>>> +
>>> +test:plan(1)
>>> +
>>> +local _2pow52 = 2 ^ 52
>>> +
>>> +-- IEEE754 components to double:
>>> +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
>>> +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
>>> +assert(a == 2197541395358679800)
>>> +
>>> +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
>>> +assert(b == -1005065126.3690554)
>>> +
>> Please add a comment with explanation why exactly these testcases
>>
>> are used.
>>
>> As I got it right, the idea is to calculate negative and positive
>> number, right?
> I've added the corresponding comment to avoid confusion. Branch is
> force-pushed.
> ===================================================================
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> index 55ec7b98..8b16d4c3 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> @@ -13,6 +13,18 @@ test:plan(1)
>   
>   local _2pow52 = 2 ^ 52
>   
> +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
> +-- instruction for the modulo operation, which is the fused
> +-- multiply-add (FMA [2]) operation (more precisely,
> +-- multiply-sub). Hence, it may produce different results compared
> +-- to the unfused one. For the test, let's just use 2 numbers in
> +-- modulo for which the single rounding is different from the
> +-- double rounding. The numbers from the original issue are good
> +-- enough.
> +--
> +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
> +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
> +--
>   -- IEEE754 components to double:
>   -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
>   local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> index a3775d6d..25b59707 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> @@ -11,6 +11,18 @@ test:plan(2)
>   
>   local _2pow52 = 2 ^ 52
>   
> +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
> +-- instruction for the modulo operation, which is the fused
> +-- multiply-add (FMA [2]) operation (more precisely,
> +-- multiply-sub). Hence, it may produce different results compared
> +-- to the unfused one. For the test, let's just use 2 numbers in
> +-- modulo for which the single rounding is different from the
> +-- double rounding. The numbers from the original issue are good
> +-- enough.
> +--
> +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
> +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
> +--
>   -- IEEE754 components to double:
>   -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
>   local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> ===================================================================
>
>> Why do you think two examples are enough for testing that behavior for
>> JIT and the VM
>>
>> is consistent?
> Since we test for modulo operation consistency, I suppose it is enough
> to check similar cases for the JIT and the VM. I distinguished them in
> the separate files to avoid skipping the test for the VM when JIT is
> disabled.
>
>> Should we check more corner cases?
>>
>>    * Standard/Normal arithmetic
>>    * Subnormal arithmetic
>>    * Infinite arithmetic
>>    * NaN arithmetic
>>    * Zero arithmetic
> All these checks are good but not really relevant to this particular
> issue. I suppose we may continue this activity as a part of the
> corresponding issue (please create one if it isn't created already), as
> we discussed offline, with test vectors for floating point values.
>
>>> +local results = {}
>>> +
>>> +jit.opt.start('hotloop=1')
>>> +for i = 1, 4 do
>>> +  results[i] = a % b
>>> +end
>>> +
>>> +-- XXX: The test doesn't fail before the commit. But it is
>> Please add a commit hash and it's short description.
> We usually meand this particular commit in the tests (commit when test
> is introduced). I rephrase it like the following to avoid confusion:
>
> ===================================================================
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> index 55ec7b98..8b16d4c3 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> @@ -28,7 +40,7 @@ for i = 1, 4 do
>     results[i] = a % b
>   end
>   
> --- XXX: The test doesn't fail before the commit. But it is
> +-- XXX: The test doesn't fail before this commit. But it is
>   -- required to be sure that there are no inconsistencies after the
>   -- commit.
>   test:samevalues(results, 'consistent behaviour between the JIT and the VM')
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> index a3775d6d..25b59707 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> @@ -19,7 +31,7 @@ assert(a == 2197541395358679800)
>   local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
>   assert(b == -1005065126.3690554)
>   
> --- These tests fail on ARM64 before the patch or with FMA
> +-- These tests fail on ARM64 before this patch or with FMA
>   -- optimization enabled.
>   -- The first test may not fail if the compiler doesn't generate
>   -- an ARM64 FMA operation in `lj_vm_foldarith()`.
> ===================================================================
>
>>> +-- required to be sure that there are no inconsistencies after the
>>> +-- commit.
>>> +test:samevalues(results, 'consistent behaviour between the JIT and the VM')
>>> +
>>> +test:done(true)
>>> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
>>> new file mode 100644
>>> index 00000000..a3775d6d
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
>>> @@ -0,0 +1,31 @@
>>> +local tap = require('tap')
>>> +
>>> +-- Test file to demonstrate possible numerical inaccuracy if FMA
>>> +-- optimization takes place.
>> I suppose we don't need to test FMA itself, but we should
>>
>> check that FMA is actually enabled when it's option
>>
>> is enabled. Right? if yes I would merge test
>> lj-918-fma-numerical-accuracy.test.lua
>>
>> and test lj-918-fma-optimization.test.lua.
> I would rather avoid this since the FMA is more like the -mfma compiler
> option and affects only the JIT behaviour and can be enabled for the
> performance reason. I used this canary test to check that this option
> exists.
>
>>
>>> +-- XXX: The JIT consistency is checked in the
>>> +-- <lj-918-fma-numerical-accuracy-jit.test.lua>.
>>> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
>>> +local test = tap.test('lj-918-fma-numerical-accuracy')
>>> +
>>> +test:plan(2)
>>> +
>>> +local _2pow52 = 2 ^ 52
>>> +
>>> +-- IEEE754 components to double:
>>> +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
>>> +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
>>> +assert(a == 2197541395358679800)
>>> +
>>> +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
>>> +assert(b == -1005065126.3690554)
>> The same questions as above.
> Added the comment.
>
>>> +
>>> +-- These tests fail on ARM64 before the patch or with FMA
>>> +-- optimization enabled.
>>> +-- The first test may not fail if the compiler doesn't generate
>>> +-- an ARM64 FMA operation in `lj_vm_foldarith()`.
>>> +test:is(2197541395358679800 % -1005065126.3690554, -606337536,
>>> +        'FMA in the lj_vm_foldarith() during parsing')
>>> +
>>> +test:is(a % b, -606337536, 'FMA in the VM')
>>> +
>>> +test:done(true)
>>> diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
>>> new file mode 100644
>>> index 00000000..af749eb5
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
>>> @@ -0,0 +1,25 @@
>>> +local tap = require('tap')
>>> +local test = tap.test('lj-918-fma-optimization'):skipcond({
>>> +  ['Test requires JIT enabled'] = not jit.status(),
>>> +})
>>> +
>>> +test:plan(3)
>>> +
>>> +local function jit_opt_is_on(needed)
>> why `needed` and not something like "flag"?
> Replaced with `flag`:
>
> ===================================================================
> diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> index af749eb5..9396e558 100644
> --- a/test/tarantool-tests/lj-918-fma-optimization.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> @@ -5,9 +5,9 @@ local test = tap.test('lj-918-fma-optimization'):skipcond({
>   
>   test:plan(3)
>   
> -local function jit_opt_is_on(needed)
> +local function jit_opt_is_on(flag)
>     for _, opt in ipairs({jit.status()}) do
> -    if opt == needed then
> +    if opt == flag then
>         return true
>       end
>     end
> ===================================================================
>
>>> +  for _, opt in ipairs({jit.status()}) do
>>> +    if opt == needed then
>>> +      return true
>>> +    end
>>> +  end
> <snipped>
>