From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Sergey Bronnikov <sergeyb@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable. Date: Wed, 15 Jan 2025 16:06:19 +0300 [thread overview] Message-ID: <Z4eyy5LqtO5AT900@root> (raw) In-Reply-To: <75dbc9ca-5332-42b4-93b8-471d27370feb@tarantool.org> Hi, Sergey! Thanks for the review! Please consider my answers below. On 14.01.25, Sergey Bronnikov wrote: > Hi, Sergey! > > Thanks for the patch! > > > On 14.01.2025 14:06, Sergey Kaplun wrote: > > From: Mike Pall <mike> > > > > See the discussion in the corresponding ticket for the rationale. > > > > (cherry picked from commit de2e1ca9d3d87e74c0c20c1e4ad3c32b31a5875b) > > > > For the modulo operation, the arm64 VM uses `fmsub` [1] instruction, > > which is the fused multiply-add (FMA [2]) operation (more precisely, > > multiply-sub). Hence, it may produce different results compared to the > > unfused one. This patch fixes the behaviour by using the unfused > > instructions by default. However, the new JIT optimization flag (fma) is > > introduced to make it possible to take advantage of the FMA > > optimizations. > > > > Sergey Kaplun: > > * added the description and the test for the problem > > > > [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB > > [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation > > > > Part of tarantool/tarantool#10709 > > --- <snipped> > > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > > new file mode 100644 > > index 00000000..55ec7b98 > > --- /dev/null > > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua > > @@ -0,0 +1,36 @@ > > +local tap = require('tap') > > + > > +-- Test file to demonstrate consistent behaviour for JIT and the > > +-- VM regarding FMA optimization (disabled by default). > > +-- XXX: The VM behaviour is checked in the > > +-- <lj-918-fma-numerical-accuracy.test.lua>. > > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918. > > +local test = tap.test('lj-918-fma-numerical-accuracy-jit'):skipcond({ > > + ['Test requires JIT enabled'] = not jit.status(), > > +}) > > + > > +test:plan(1) > > + > > +local _2pow52 = 2 ^ 52 > > + > > +-- IEEE754 components to double: > > +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). > > +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) > > +assert(a == 2197541395358679800) > > + > > +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1) > > +assert(b == -1005065126.3690554) > > + > > Please add a comment with explanation why exactly these testcases > > are used. > > As I got it right, the idea is to calculate negative and positive > number, right? I've added the corresponding comment to avoid confusion. Branch is force-pushed. =================================================================== diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua index 55ec7b98..8b16d4c3 100644 --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua @@ -13,6 +13,18 @@ test:plan(1) local _2pow52 = 2 ^ 52 +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1] +-- instruction for the modulo operation, which is the fused +-- multiply-add (FMA [2]) operation (more precisely, +-- multiply-sub). Hence, it may produce different results compared +-- to the unfused one. For the test, let's just use 2 numbers in +-- modulo for which the single rounding is different from the +-- double rounding. The numbers from the original issue are good +-- enough. +-- +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation +-- -- IEEE754 components to double: -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua index a3775d6d..25b59707 100644 --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua @@ -11,6 +11,18 @@ test:plan(2) local _2pow52 = 2 ^ 52 +-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1] +-- instruction for the modulo operation, which is the fused +-- multiply-add (FMA [2]) operation (more precisely, +-- multiply-sub). Hence, it may produce different results compared +-- to the unfused one. For the test, let's just use 2 numbers in +-- modulo for which the single rounding is different from the +-- double rounding. The numbers from the original issue are good +-- enough. +-- +-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB +-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation +-- -- IEEE754 components to double: -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) =================================================================== > > Why do you think two examples are enough for testing that behavior for > JIT and the VM > > is consistent? Since we test for modulo operation consistency, I suppose it is enough to check similar cases for the JIT and the VM. I distinguished them in the separate files to avoid skipping the test for the VM when JIT is disabled. > > Should we check more corner cases? > > * Standard/Normal arithmetic > * Subnormal arithmetic > * Infinite arithmetic > * NaN arithmetic > * Zero arithmetic All these checks are good but not really relevant to this particular issue. I suppose we may continue this activity as a part of the corresponding issue (please create one if it isn't created already), as we discussed offline, with test vectors for floating point values. > > > +local results = {} > > + > > +jit.opt.start('hotloop=1') > > +for i = 1, 4 do > > + results[i] = a % b > > +end > > + > > +-- XXX: The test doesn't fail before the commit. But it is > Please add a commit hash and it's short description. We usually meand this particular commit in the tests (commit when test is introduced). I rephrase it like the following to avoid confusion: =================================================================== diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua index 55ec7b98..8b16d4c3 100644 --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua @@ -28,7 +40,7 @@ for i = 1, 4 do results[i] = a % b end --- XXX: The test doesn't fail before the commit. But it is +-- XXX: The test doesn't fail before this commit. But it is -- required to be sure that there are no inconsistencies after the -- commit. test:samevalues(results, 'consistent behaviour between the JIT and the VM') diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua index a3775d6d..25b59707 100644 --- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua @@ -19,7 +31,7 @@ assert(a == 2197541395358679800) local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1) assert(b == -1005065126.3690554) --- These tests fail on ARM64 before the patch or with FMA +-- These tests fail on ARM64 before this patch or with FMA -- optimization enabled. -- The first test may not fail if the compiler doesn't generate -- an ARM64 FMA operation in `lj_vm_foldarith()`. =================================================================== > > +-- required to be sure that there are no inconsistencies after the > > +-- commit. > > +test:samevalues(results, 'consistent behaviour between the JIT and the VM') > > + > > +test:done(true) > > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > > new file mode 100644 > > index 00000000..a3775d6d > > --- /dev/null > > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua > > @@ -0,0 +1,31 @@ > > +local tap = require('tap') > > + > > +-- Test file to demonstrate possible numerical inaccuracy if FMA > > +-- optimization takes place. > > I suppose we don't need to test FMA itself, but we should > > check that FMA is actually enabled when it's option > > is enabled. Right? if yes I would merge test > lj-918-fma-numerical-accuracy.test.lua > > and test lj-918-fma-optimization.test.lua. I would rather avoid this since the FMA is more like the -mfma compiler option and affects only the JIT behaviour and can be enabled for the performance reason. I used this canary test to check that this option exists. > > > > +-- XXX: The JIT consistency is checked in the > > +-- <lj-918-fma-numerical-accuracy-jit.test.lua>. > > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918. > > +local test = tap.test('lj-918-fma-numerical-accuracy') > > + > > +test:plan(2) > > + > > +local _2pow52 = 2 ^ 52 > > + > > +-- IEEE754 components to double: > > +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal). > > +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1) > > +assert(a == 2197541395358679800) > > + > > +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1) > > +assert(b == -1005065126.3690554) > The same questions as above. Added the comment. > > + > > +-- These tests fail on ARM64 before the patch or with FMA > > +-- optimization enabled. > > +-- The first test may not fail if the compiler doesn't generate > > +-- an ARM64 FMA operation in `lj_vm_foldarith()`. > > +test:is(2197541395358679800 % -1005065126.3690554, -606337536, > > + 'FMA in the lj_vm_foldarith() during parsing') > > + > > +test:is(a % b, -606337536, 'FMA in the VM') > > + > > +test:done(true) > > diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua > > new file mode 100644 > > index 00000000..af749eb5 > > --- /dev/null > > +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua > > @@ -0,0 +1,25 @@ > > +local tap = require('tap') > > +local test = tap.test('lj-918-fma-optimization'):skipcond({ > > + ['Test requires JIT enabled'] = not jit.status(), > > +}) > > + > > +test:plan(3) > > + > > +local function jit_opt_is_on(needed) > why `needed` and not something like "flag"? Replaced with `flag`: =================================================================== diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua index af749eb5..9396e558 100644 --- a/test/tarantool-tests/lj-918-fma-optimization.test.lua +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua @@ -5,9 +5,9 @@ local test = tap.test('lj-918-fma-optimization'):skipcond({ test:plan(3) -local function jit_opt_is_on(needed) +local function jit_opt_is_on(flag) for _, opt in ipairs({jit.status()}) do - if opt == needed then + if opt == flag then return true end end =================================================================== > > + for _, opt in ipairs({jit.status()}) do > > + if opt == needed then > > + return true > > + end > > + end <snipped> -- Best regards, Sergey Kaplun
prev parent reply other threads:[~2025-01-15 13:07 UTC|newest] Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top 2025-01-14 11:06 [Tarantool-patches] [PATCH luajit 0/2] Refactoring and FMA optimizations Sergey Kaplun via Tarantool-patches 2025-01-14 11:06 ` [Tarantool-patches] [PATCH luajit 1/2] Cleanup CPU detection and tuning for old CPUs Sergey Kaplun via Tarantool-patches 2025-01-14 11:25 ` Sergey Bronnikov via Tarantool-patches 2025-01-15 13:10 ` Sergey Kaplun via Tarantool-patches 2025-01-14 11:06 ` [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable Sergey Kaplun via Tarantool-patches 2025-01-14 12:45 ` Sergey Bronnikov via Tarantool-patches 2025-01-15 13:06 ` Sergey Kaplun via Tarantool-patches [this message]
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=Z4eyy5LqtO5AT900@root \ --to=tarantool-patches@dev.tarantool.org \ --cc=sergeyb@tarantool.org \ --cc=skaplun@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox