From: Sergey Bronnikov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.
Date: Thu, 16 Jan 2025 16:19:33 +0300 [thread overview]
Message-ID: <d2874f66-e25d-4277-bcae-41416e65c2f6@tarantool.org> (raw)
In-Reply-To: <Z4eyy5LqtO5AT900@root>
[-- Attachment #1: Type: text/plain, Size: 11788 bytes --]
Hi, Sergey,
thanks for the fixes! LGTM
On 15.01.2025 16:06, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
> Please consider my answers below.
>
> On 14.01.25, Sergey Bronnikov wrote:
>> Hi, Sergey!
>>
>> Thanks for the patch!
>>
>>
>> On 14.01.2025 14:06, Sergey Kaplun wrote:
>>> From: Mike Pall <mike>
>>>
>>> See the discussion in the corresponding ticket for the rationale.
>>>
>>> (cherry picked from commit de2e1ca9d3d87e74c0c20c1e4ad3c32b31a5875b)
>>>
>>> For the modulo operation, the arm64 VM uses `fmsub` [1] instruction,
>>> which is the fused multiply-add (FMA [2]) operation (more precisely,
>>> multiply-sub). Hence, it may produce different results compared to the
>>> unfused one. This patch fixes the behaviour by using the unfused
>>> instructions by default. However, the new JIT optimization flag (fma) is
>>> introduced to make it possible to take advantage of the FMA
>>> optimizations.
>>>
>>> Sergey Kaplun:
>>> * added the description and the test for the problem
>>>
>>> [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
>>> [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
>>>
>>> Part of tarantool/tarantool#10709
>>> ---
> <snipped>
>
>>> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
>>> new file mode 100644
>>> index 00000000..55ec7b98
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
>>> @@ -0,0 +1,36 @@
>>> +local tap = require('tap')
>>> +
>>> +-- Test file to demonstrate consistent behaviour for JIT and the
>>> +-- VM regarding FMA optimization (disabled by default).
>>> +-- XXX: The VM behaviour is checked in the
>>> +-- <lj-918-fma-numerical-accuracy.test.lua>.
>>> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
>>> +local test = tap.test('lj-918-fma-numerical-accuracy-jit'):skipcond({
>>> + ['Test requires JIT enabled'] = not jit.status(),
>>> +})
>>> +
>>> +test:plan(1)
>>> +
>>> +local _2pow52 = 2 ^ 52
>>> +
>>> +-- IEEE754 components to double:
>>> +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
>>> +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
>>> +assert(a == 2197541395358679800)
>>> +
>>> +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
>>> +assert(b == -1005065126.3690554)
>>> +
>> Please add a comment with explanation why exactly these testcases
>>
>> are used.
>>
>> As I got it right, the idea is to calculate negative and positive
>> number, right?
> I've added the corresponding comment to avoid confusion. Branch is
> force-pushed.
> ===================================================================
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> index 55ec7b98..8b16d4c3 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> @@ -13,6 +13,18 @@ test:plan(1)
>
> local _2pow52 = 2 ^ 52
>
> +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
> +-- instruction for the modulo operation, which is the fused
> +-- multiply-add (FMA [2]) operation (more precisely,
> +-- multiply-sub). Hence, it may produce different results compared
> +-- to the unfused one. For the test, let's just use 2 numbers in
> +-- modulo for which the single rounding is different from the
> +-- double rounding. The numbers from the original issue are good
> +-- enough.
> +--
> +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
> +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
> +--
> -- IEEE754 components to double:
> -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
> local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> index a3775d6d..25b59707 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> @@ -11,6 +11,18 @@ test:plan(2)
>
> local _2pow52 = 2 ^ 52
>
> +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
> +-- instruction for the modulo operation, which is the fused
> +-- multiply-add (FMA [2]) operation (more precisely,
> +-- multiply-sub). Hence, it may produce different results compared
> +-- to the unfused one. For the test, let's just use 2 numbers in
> +-- modulo for which the single rounding is different from the
> +-- double rounding. The numbers from the original issue are good
> +-- enough.
> +--
> +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
> +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
> +--
> -- IEEE754 components to double:
> -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
> local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> ===================================================================
>
>> Why do you think two examples are enough for testing that behavior for
>> JIT and the VM
>>
>> is consistent?
> Since we test for modulo operation consistency, I suppose it is enough
> to check similar cases for the JIT and the VM. I distinguished them in
> the separate files to avoid skipping the test for the VM when JIT is
> disabled.
>
>> Should we check more corner cases?
>>
>> * Standard/Normal arithmetic
>> * Subnormal arithmetic
>> * Infinite arithmetic
>> * NaN arithmetic
>> * Zero arithmetic
> All these checks are good but not really relevant to this particular
> issue. I suppose we may continue this activity as a part of the
> corresponding issue (please create one if it isn't created already), as
> we discussed offline, with test vectors for floating point values.
>
>>> +local results = {}
>>> +
>>> +jit.opt.start('hotloop=1')
>>> +for i = 1, 4 do
>>> + results[i] = a % b
>>> +end
>>> +
>>> +-- XXX: The test doesn't fail before the commit. But it is
>> Please add a commit hash and it's short description.
> We usually meand this particular commit in the tests (commit when test
> is introduced). I rephrase it like the following to avoid confusion:
>
> ===================================================================
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> index 55ec7b98..8b16d4c3 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> @@ -28,7 +40,7 @@ for i = 1, 4 do
> results[i] = a % b
> end
>
> --- XXX: The test doesn't fail before the commit. But it is
> +-- XXX: The test doesn't fail before this commit. But it is
> -- required to be sure that there are no inconsistencies after the
> -- commit.
> test:samevalues(results, 'consistent behaviour between the JIT and the VM')
> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> index a3775d6d..25b59707 100644
> --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> @@ -19,7 +31,7 @@ assert(a == 2197541395358679800)
> local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
> assert(b == -1005065126.3690554)
>
> --- These tests fail on ARM64 before the patch or with FMA
> +-- These tests fail on ARM64 before this patch or with FMA
> -- optimization enabled.
> -- The first test may not fail if the compiler doesn't generate
> -- an ARM64 FMA operation in `lj_vm_foldarith()`.
> ===================================================================
>
>>> +-- required to be sure that there are no inconsistencies after the
>>> +-- commit.
>>> +test:samevalues(results, 'consistent behaviour between the JIT and the VM')
>>> +
>>> +test:done(true)
>>> diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
>>> new file mode 100644
>>> index 00000000..a3775d6d
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
>>> @@ -0,0 +1,31 @@
>>> +local tap = require('tap')
>>> +
>>> +-- Test file to demonstrate possible numerical inaccuracy if FMA
>>> +-- optimization takes place.
>> I suppose we don't need to test FMA itself, but we should
>>
>> check that FMA is actually enabled when it's option
>>
>> is enabled. Right? if yes I would merge test
>> lj-918-fma-numerical-accuracy.test.lua
>>
>> and test lj-918-fma-optimization.test.lua.
> I would rather avoid this since the FMA is more like the -mfma compiler
> option and affects only the JIT behaviour and can be enabled for the
> performance reason. I used this canary test to check that this option
> exists.
>
>>
>>> +-- XXX: The JIT consistency is checked in the
>>> +-- <lj-918-fma-numerical-accuracy-jit.test.lua>.
>>> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
>>> +local test = tap.test('lj-918-fma-numerical-accuracy')
>>> +
>>> +test:plan(2)
>>> +
>>> +local _2pow52 = 2 ^ 52
>>> +
>>> +-- IEEE754 components to double:
>>> +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
>>> +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
>>> +assert(a == 2197541395358679800)
>>> +
>>> +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
>>> +assert(b == -1005065126.3690554)
>> The same questions as above.
> Added the comment.
>
>>> +
>>> +-- These tests fail on ARM64 before the patch or with FMA
>>> +-- optimization enabled.
>>> +-- The first test may not fail if the compiler doesn't generate
>>> +-- an ARM64 FMA operation in `lj_vm_foldarith()`.
>>> +test:is(2197541395358679800 % -1005065126.3690554, -606337536,
>>> + 'FMA in the lj_vm_foldarith() during parsing')
>>> +
>>> +test:is(a % b, -606337536, 'FMA in the VM')
>>> +
>>> +test:done(true)
>>> diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
>>> new file mode 100644
>>> index 00000000..af749eb5
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
>>> @@ -0,0 +1,25 @@
>>> +local tap = require('tap')
>>> +local test = tap.test('lj-918-fma-optimization'):skipcond({
>>> + ['Test requires JIT enabled'] = not jit.status(),
>>> +})
>>> +
>>> +test:plan(3)
>>> +
>>> +local function jit_opt_is_on(needed)
>> why `needed` and not something like "flag"?
> Replaced with `flag`:
>
> ===================================================================
> diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> index af749eb5..9396e558 100644
> --- a/test/tarantool-tests/lj-918-fma-optimization.test.lua
> +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> @@ -5,9 +5,9 @@ local test = tap.test('lj-918-fma-optimization'):skipcond({
>
> test:plan(3)
>
> -local function jit_opt_is_on(needed)
> +local function jit_opt_is_on(flag)
> for _, opt in ipairs({jit.status()}) do
> - if opt == needed then
> + if opt == flag then
> return true
> end
> end
> ===================================================================
>
>>> + for _, opt in ipairs({jit.status()}) do
>>> + if opt == needed then
>>> + return true
>>> + end
>>> + end
> <snipped>
>
[-- Attachment #2: Type: text/html, Size: 14470 bytes --]
prev parent reply other threads:[~2025-01-16 13:19 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-14 11:06 [Tarantool-patches] [PATCH luajit 0/2] Refactoring and FMA optimizations Sergey Kaplun via Tarantool-patches
2025-01-14 11:06 ` [Tarantool-patches] [PATCH luajit 1/2] Cleanup CPU detection and tuning for old CPUs Sergey Kaplun via Tarantool-patches
2025-01-14 11:25 ` Sergey Bronnikov via Tarantool-patches
2025-01-15 13:10 ` Sergey Kaplun via Tarantool-patches
2025-01-16 12:47 ` Sergey Bronnikov via Tarantool-patches
2025-01-14 11:06 ` [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable Sergey Kaplun via Tarantool-patches
2025-01-14 12:45 ` Sergey Bronnikov via Tarantool-patches
2025-01-15 13:06 ` Sergey Kaplun via Tarantool-patches
2025-01-16 13:19 ` Sergey Bronnikov via Tarantool-patches [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d2874f66-e25d-4277-bcae-41416e65c2f6@tarantool.org \
--to=tarantool-patches@dev.tarantool.org \
--cc=sergeyb@tarantool.org \
--cc=skaplun@tarantool.org \
--subject='Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox