Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Bronnikov <sergeyb@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.
Date: Wed, 15 Jan 2025 16:06:19 +0300	[thread overview]
Message-ID: <Z4eyy5LqtO5AT900@root> (raw)
In-Reply-To: <75dbc9ca-5332-42b4-93b8-471d27370feb@tarantool.org>

Hi, Sergey!
Thanks for the review!
Please consider my answers below.

On 14.01.25, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> Thanks for the patch!
> 
> 
> On 14.01.2025 14:06, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > See the discussion in the corresponding ticket for the rationale.
> >
> > (cherry picked from commit de2e1ca9d3d87e74c0c20c1e4ad3c32b31a5875b)
> >
> > For the modulo operation, the arm64 VM uses `fmsub` [1] instruction,
> > which is the fused multiply-add (FMA [2]) operation (more precisely,
> > multiply-sub). Hence, it may produce different results compared to the
> > unfused one. This patch fixes the behaviour by using the unfused
> > instructions by default. However, the new JIT optimization flag (fma) is
> > introduced to make it possible to take advantage of the FMA
> > optimizations.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
> > [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
> >
> > Part of tarantool/tarantool#10709
> > ---

<snipped>

> > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> > new file mode 100644
> > index 00000000..55ec7b98
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> > @@ -0,0 +1,36 @@
> > +local tap = require('tap')
> > +
> > +-- Test file to demonstrate consistent behaviour for JIT and the
> > +-- VM regarding FMA optimization (disabled by default).
> > +-- XXX: The VM behaviour is checked in the
> > +-- <lj-918-fma-numerical-accuracy.test.lua>.
> > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
> > +local test = tap.test('lj-918-fma-numerical-accuracy-jit'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +local _2pow52 = 2 ^ 52
> > +
> > +-- IEEE754 components to double:
> > +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
> > +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> > +assert(a == 2197541395358679800)
> > +
> > +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
> > +assert(b == -1005065126.3690554)
> > +
> 
> Please add a comment with explanation why exactly these testcases
> 
> are used.
> 
> As I got it right, the idea is to calculate negative and positive 
> number, right?

I've added the corresponding comment to avoid confusion. Branch is
force-pushed.
===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
index 55ec7b98..8b16d4c3 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
@@ -13,6 +13,18 @@ test:plan(1)
 
 local _2pow52 = 2 ^ 52
 
+-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
+-- instruction for the modulo operation, which is the fused
+-- multiply-add (FMA [2]) operation (more precisely,
+-- multiply-sub). Hence, it may produce different results compared
+-- to the unfused one. For the test, let's just use 2 numbers in
+-- modulo for which the single rounding is different from the
+-- double rounding. The numbers from the original issue are good
+-- enough.
+--
+-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
+-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
+--
 -- IEEE754 components to double:
 -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
 local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
index a3775d6d..25b59707 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
@@ -11,6 +11,18 @@ test:plan(2)
 
 local _2pow52 = 2 ^ 52
 
+-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
+-- instruction for the modulo operation, which is the fused
+-- multiply-add (FMA [2]) operation (more precisely,
+-- multiply-sub). Hence, it may produce different results compared
+-- to the unfused one. For the test, let's just use 2 numbers in
+-- modulo for which the single rounding is different from the
+-- double rounding. The numbers from the original issue are good
+-- enough.
+--
+-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
+-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
+--
 -- IEEE754 components to double:
 -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
 local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
===================================================================

> 
> Why do you think two examples are enough for testing that behavior for 
> JIT and the VM
> 
> is consistent?

Since we test for modulo operation consistency, I suppose it is enough
to check similar cases for the JIT and the VM. I distinguished them in
the separate files to avoid skipping the test for the VM when JIT is
disabled.

> 
> Should we check more corner cases?
> 
>   * Standard/Normal arithmetic
>   * Subnormal arithmetic
>   * Infinite arithmetic
>   * NaN arithmetic
>   * Zero arithmetic

All these checks are good but not really relevant to this particular
issue. I suppose we may continue this activity as a part of the
corresponding issue (please create one if it isn't created already), as
we discussed offline, with test vectors for floating point values.

> 
> > +local results = {}
> > +
> > +jit.opt.start('hotloop=1')
> > +for i = 1, 4 do
> > +  results[i] = a % b
> > +end
> > +
> > +-- XXX: The test doesn't fail before the commit. But it is
> Please add a commit hash and it's short description.

We usually meand this particular commit in the tests (commit when test
is introduced). I rephrase it like the following to avoid confusion:

===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
index 55ec7b98..8b16d4c3 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
@@ -28,7 +40,7 @@ for i = 1, 4 do
   results[i] = a % b
 end
 
--- XXX: The test doesn't fail before the commit. But it is
+-- XXX: The test doesn't fail before this commit. But it is
 -- required to be sure that there are no inconsistencies after the
 -- commit.
 test:samevalues(results, 'consistent behaviour between the JIT and the VM')
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
index a3775d6d..25b59707 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
@@ -19,7 +31,7 @@ assert(a == 2197541395358679800)
 local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
 assert(b == -1005065126.3690554)
 
--- These tests fail on ARM64 before the patch or with FMA
+-- These tests fail on ARM64 before this patch or with FMA
 -- optimization enabled.
 -- The first test may not fail if the compiler doesn't generate
 -- an ARM64 FMA operation in `lj_vm_foldarith()`.
===================================================================

> > +-- required to be sure that there are no inconsistencies after the
> > +-- commit.
> > +test:samevalues(results, 'consistent behaviour between the JIT and the VM')
> > +
> > +test:done(true)
> > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> > new file mode 100644
> > index 00000000..a3775d6d
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> > @@ -0,0 +1,31 @@
> > +local tap = require('tap')
> > +
> > +-- Test file to demonstrate possible numerical inaccuracy if FMA
> > +-- optimization takes place.
> 
> I suppose we don't need to test FMA itself, but we should
> 
> check that FMA is actually enabled when it's option
> 
> is enabled. Right? if yes I would merge test 
> lj-918-fma-numerical-accuracy.test.lua
> 
> and test lj-918-fma-optimization.test.lua.

I would rather avoid this since the FMA is more like the -mfma compiler
option and affects only the JIT behaviour and can be enabled for the
performance reason. I used this canary test to check that this option
exists.

> 
> 
> > +-- XXX: The JIT consistency is checked in the
> > +-- <lj-918-fma-numerical-accuracy-jit.test.lua>.
> > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
> > +local test = tap.test('lj-918-fma-numerical-accuracy')
> > +
> > +test:plan(2)
> > +
> > +local _2pow52 = 2 ^ 52
> > +
> > +-- IEEE754 components to double:
> > +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
> > +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> > +assert(a == 2197541395358679800)
> > +
> > +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
> > +assert(b == -1005065126.3690554)
> The same questions as above.

Added the comment.

> > +
> > +-- These tests fail on ARM64 before the patch or with FMA
> > +-- optimization enabled.
> > +-- The first test may not fail if the compiler doesn't generate
> > +-- an ARM64 FMA operation in `lj_vm_foldarith()`.
> > +test:is(2197541395358679800 % -1005065126.3690554, -606337536,
> > +        'FMA in the lj_vm_foldarith() during parsing')
> > +
> > +test:is(a % b, -606337536, 'FMA in the VM')
> > +
> > +test:done(true)
> > diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> > new file mode 100644
> > index 00000000..af749eb5
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> > @@ -0,0 +1,25 @@
> > +local tap = require('tap')
> > +local test = tap.test('lj-918-fma-optimization'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(3)
> > +
> > +local function jit_opt_is_on(needed)
> why `needed` and not something like "flag"?

Replaced with `flag`:

===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
index af749eb5..9396e558 100644
--- a/test/tarantool-tests/lj-918-fma-optimization.test.lua
+++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
@@ -5,9 +5,9 @@ local test = tap.test('lj-918-fma-optimization'):skipcond({
 
 test:plan(3)
 
-local function jit_opt_is_on(needed)
+local function jit_opt_is_on(flag)
   for _, opt in ipairs({jit.status()}) do
-    if opt == needed then
+    if opt == flag then
       return true
     end
   end
===================================================================

> > +  for _, opt in ipairs({jit.status()}) do
> > +    if opt == needed then
> > +      return true
> > +    end
> > +  end

<snipped>

-- 
Best regards,
Sergey Kaplun

next prev parent reply	other threads:[~2025-01-15 13:07 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-14 11:06 [Tarantool-patches] [PATCH luajit 0/2] Refactoring and FMA optimizations Sergey Kaplun via Tarantool-patches
2025-01-14 11:06 ` [Tarantool-patches] [PATCH luajit 1/2] Cleanup CPU detection and tuning for old CPUs Sergey Kaplun via Tarantool-patches
2025-01-14 11:25   ` Sergey Bronnikov via Tarantool-patches
2025-01-15 13:10     ` Sergey Kaplun via Tarantool-patches
2025-01-16 12:47       ` Sergey Bronnikov via Tarantool-patches
2025-01-14 11:06 ` [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable Sergey Kaplun via Tarantool-patches
2025-01-14 12:45   ` Sergey Bronnikov via Tarantool-patches
2025-01-15 13:06     ` Sergey Kaplun via Tarantool-patches [this message]
2025-01-16 13:19       ` Sergey Bronnikov via Tarantool-patches
2025-01-31  9:28 ` [Tarantool-patches] [PATCH luajit 0/2] Refactoring and FMA optimizations Sergey Kaplun via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4eyy5LqtO5AT900@root \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=sergeyb@tarantool.org \
    --cc=skaplun@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox