From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 16 Jan 2025 16:19:33 +0300
From: Sergey Bronnikov via Tarantool-patches
Reply-To: Sergey Bronnikov
To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.
References: <4cdba52a1ba1a1f2a8ccb4624f00fe156c3088c6.1736779534.git.skaplun@tarantool.org> <75dbc9ca-5332-42b4-93b8-471d27370feb@tarantool.org>
List-Id: Tarantool development patches
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------sRcVRcppzDo01cJdRE0DgRzB"

--------------sRcVRcppzDo01cJdRE0DgRzB
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi, Sergey,

 thanks for the fixes! LGTM

On 15.01.2025 16:06, Sergey Kaplun wrote:
Hi, Sergey!
Thanks for the review!
Please consider my answers below.

On 14.01.25, Sergey Bronnikov wrote:
Hi, Sergey!

Thanks for the patch!


On 14.01.2025 14:06, Sergey Kaplun wrote:
From: Mike Pall <mike>

See the discussion in the corresponding ticket for the rationale.

(cherry picked from commit de2e1ca9d3d87e74c0c20c1e4ad3c32b31a5875b)

For the modulo operation, the arm64 VM uses `fmsub` [1] instruction,
which is the fused multiply-add (FMA [2]) operation (more precisely,
multiply-sub). Hence, it may produce different results compared to the
unfused one. This patch fixes the behaviour by using the unfused
instructions by default. However, the new JIT optimization flag (fma) is
introduced to make it possible to take advantage of the FMA
optimizations.

Sergey Kaplun:
* added the description and the test for the problem

[1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
[2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation

Part of tarantool/tarantool#10709
---
<snipped>

diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
new file mode 100644
index 00000000..55ec7b98
--- /dev/null
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
@@ -0,0 +1,36 @@
+local tap = require('tap')
+
+-- Test file to demonstrate consistent behaviour for JIT and the
+-- VM regarding FMA optimization (disabled by default).
+-- XXX: The VM behaviour is checked in the
+-- <lj-918-fma-numerical-accuracy.test.lua>.
+-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
+local test = tap.test('lj-918-fma-numerical-accuracy-jit'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(1)
+
+local _2pow52 = 2 ^ 52
+
+-- IEEE754 components to double:
+-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
+local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
+assert(a == 2197541395358679800)
+
+local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
+assert(b == -1005065126.3690554)
+
Please add a comment explaining why exactly these test cases are used.

If I got it right, the idea is to construct one positive and one
negative number, right?
I've added the corresponding comment to avoid confusion. Branch is
force-pushed.
===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
index 55ec7b98..8b16d4c3 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
@@ -13,6 +13,18 @@ test:plan(1)
 
 local _2pow52 = 2 ^ 52
 
+-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
+-- instruction for the modulo operation, which is the fused
+-- multiply-add (FMA [2]) operation (more precisely,
+-- multiply-sub). Hence, it may produce different results compared
+-- to the unfused one. For the test, let's just use 2 numbers in
+-- modulo for which the single rounding is different from the
+-- double rounding. The numbers from the original issue are good
+-- enough.
+--
+-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
+-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
+--
 -- IEEE754 components to double:
 -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
 local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
index a3775d6d..25b59707 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
@@ -11,6 +11,18 @@ test:plan(2)
 
 local _2pow52 = 2 ^ 52
 
+-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
+-- instruction for the modulo operation, which is the fused
+-- multiply-add (FMA [2]) operation (more precisely,
+-- multiply-sub). Hence, it may produce different results compared
+-- to the unfused one. For the test, let's just use 2 numbers in
+-- modulo for which the single rounding is different from the
+-- double rounding. The numbers from the original issue are good
+-- enough.
+--
+-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
+-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
+--
 -- IEEE754 components to double:
 -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
 local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
===================================================================
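
To make the single-versus-double rounding point concrete, here is a minimal sketch in plain Lua. It spells out the floored modulo as `a - floor(a / b) * b`, where the product is rounded before the subtraction (the unfused form), and compares it with a value that is rounded only once. The single-rounding value is obtained via `math.fmod()`, which is only a stand-in chosen for this sketch and not how the VM or the `fmsub` instruction computes it; the helper names are hypothetical.

```lua
-- Operands from the test, built from their IEEE 754 components.
local _2pow52 = 2 ^ 52
local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)

-- Unfused floored modulo: the product q * y is rounded to a double
-- first, then subtracted from x (two rounding steps in total).
local function mod_unfused(x, y)
  local q = math.floor(x / y)
  local product = q * y
  return x - product
end

-- Stand-in for the fused result (an assumption of this sketch):
-- math.fmod() returns the exact truncated remainder, so adjusting it
-- by y when the signs differ involves a single rounding only.
local function mod_single_rounding(x, y)
  local r = math.fmod(x, y)
  if r ~= 0 and (r < 0) ~= (y < 0) then
    r = r + y
  end
  return r
end

local unfused = mod_unfused(a, b)
local single = mod_single_rounding(a, b)
print(string.format('%.17g', unfused))
print(string.format('%.17g', single))
```

The unfused value is the one the tests in this patch expect (-606337536). The product `q * b` lies between 2^60 and 2^61, where doubles are spaced 256 apart, so its rounding error alone can move the unfused result by up to 128 relative to the singly rounded one.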

Why do you think two examples are enough to test that the behavior of
the JIT and the VM is consistent?
Since we test for modulo operation consistency, I suppose it is enough
to check similar cases for the JIT and the VM. I split them into
separate files to avoid skipping the test for the VM when JIT is
disabled.
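
For illustration, the consistency check can be sketched without the test harness. This is a rough sketch, not the test itself: `all_same()` is a hypothetical replacement for `test:samevalues()`, and the `jit` guard is there only so the snippet also runs under plain Lua. With `hotloop=1` the first iterations of the loop run in the interpreter and the later ones in compiled code, so comparing all four results checks that both paths agree.

```lua
local _2pow52 = 2 ^ 52
local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)

-- Hypothetical helper: true if every element equals the first one.
local function all_same(values)
  for i = 2, #values do
    if values[i] ~= values[1] then
      return false
    end
  end
  return true
end

-- The `jit` table exists only under LuaJIT; plain Lua skips this.
if jit then
  jit.opt.start('hotloop=1')
end

local results = {}
for i = 1, 4 do
  results[i] = a % b
end

print(all_same(results))
```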

Should we check more corner cases?

  * Standard/Normal arithmetic
  * Subnormal arithmetic
  * Infinite arithmetic
  * NaN arithmetic
  * Zero arithmetic
All these checks are good but not really relevant to this particular
issue. I suppose we may continue this activity as a part of the
corresponding issue (please create one if it isn't created already), as
we discussed offline, with test vectors for floating point values.
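
As a rough idea of what such test vectors might look like, here is a small table-driven sketch in plain Lua. It is an assumption of mine, not part of the patch: the modulo is written out as `x - floor(x / y) * y` so the snippet behaves the same on Lua versions whose `%` is implemented differently, and the expected values follow from that formula under IEEE 754 arithmetic. Whether the VM and the JIT agree with them is exactly what a real test would have to check.

```lua
-- Floored modulo spelled out explicitly.
local function floored_mod(x, y)
  return x - math.floor(x / y) * y
end

local inf = math.huge
local nan = 0 / 0
-- 2^-1024 is below the smallest normal double (2^-1022).
local subnormal = 2 ^ -1022 / 4

local function is_nan(v)
  return v ~= v
end

-- Each vector: dividend, divisor, expected result.
local vectors = {
  {5.5, 2, 1.5},                -- normal
  {-5.5, 2, 0.5},               -- normal, negative dividend
  {subnormal, 1, subnormal},    -- subnormal dividend
  {inf, 5, nan},                -- inf - inf
  {5, inf, nan},                -- 0 * inf
  {nan, 1, nan},                -- NaN propagates
  {0, 5, 0},                    -- zero dividend
  {5, 0, nan},                  -- zero divisor: inf * 0
}

local function check(vec)
  local got = floored_mod(vec[1], vec[2])
  local want = vec[3]
  if is_nan(want) then
    return is_nan(got)
  end
  return got == want
end

local failures = 0
for _, vec in ipairs(vectors) do
  if not check(vec) then
    failures = failures + 1
  end
end
print('failed vectors: ' .. failures)
```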


        
+local results = {}
+
+jit.opt.start('hotloop=1')
+for i = 1, 4 do
+  results[i] = a % b
+end
+
+-- XXX: The test doesn't fail before the commit. But it is
Please add a commit hash and its short description.
We usually mean this particular commit in the tests (the commit in which
the test is introduced). I've rephrased it as follows to avoid confusion:

===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
index 55ec7b98..8b16d4c3 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
@@ -28,7 +40,7 @@ for i = 1, 4 do
   results[i] = a % b
 end
 
--- XXX: The test doesn't fail before the commit. But it is
+-- XXX: The test doesn't fail before this commit. But it is
 -- required to be sure that there are no inconsistencies after the
 -- commit.
 test:samevalues(results, 'consistent behaviour between the JIT and the VM')
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
index a3775d6d..25b59707 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
@@ -19,7 +31,7 @@ assert(a == 2197541395358679800)
 local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
 assert(b == -1005065126.3690554)
 
--- These tests fail on ARM64 before the patch or with FMA
+-- These tests fail on ARM64 before this patch or with FMA
 -- optimization enabled.
 -- The first test may not fail if the compiler doesn't generate
 -- an ARM64 FMA operation in `lj_vm_foldarith()`.
===================================================================

+-- required to be sure that there are no inconsistencies after the
+-- commit.
+test:samevalues(results, 'consistent behaviour between the JIT and the VM')
+
+test:done(true)
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
new file mode 100644
index 00000000..a3775d6d
--- /dev/null
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
@@ -0,0 +1,31 @@
+local tap = require('tap')
+
+-- Test file to demonstrate possible numerical inaccuracy if FMA
+-- optimization takes place.
I suppose we don't need to test FMA itself, but we should check that
FMA is actually enabled when its option is enabled. Right? If yes, I
would merge the test lj-918-fma-numerical-accuracy.test.lua and the
test lj-918-fma-optimization.test.lua.
I would rather avoid this, since the FMA flag is more like the -mfma
compiler option: it affects only the JIT behaviour and can be enabled
for performance reasons. I used this canary test to check that this
option exists.


+-- XXX: The JIT consistency is checked in the
+-- <lj-918-fma-numerical-accuracy-jit.test.lua>.
+-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
+local test = tap.test('lj-918-fma-numerical-accuracy')
+
+test:plan(2)
+
+local _2pow52 = 2 ^ 52
+
+-- IEEE754 components to double:
+-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
+local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
+assert(a == 2197541395358679800)
+
+local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
+assert(b == -1005065126.3690554)
The same questions as above.
Added the comment.
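
For reference, the component formula used for `a` and `b` can be inverted to recover the sign, biased exponent and mantissa of a given double. The sketch below is a hypothetical helper pair in plain Lua (neither function is part of the patch) and handles only finite, non-zero, normal doubles.

```lua
local _2pow52 = 2 ^ 52

-- Build a double from its sign (1 or -1), biased exponent and 52-bit
-- mantissa. Normal numbers only: the implicit leading bit is 1.
local function compose(sign, exp, mantissa)
  return sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + 1)
end

-- Inverse of compose() for finite, non-zero, normal doubles.
local function decompose(x)
  assert(x == x and x ~= 0 and math.abs(x) ~= math.huge)
  local sign = 1
  if x < 0 then
    sign = -1
    x = -x
  end
  local exp = 0
  -- Scaling by 2 is exact for normal doubles, so no bits are lost.
  while x >= 2 do
    x = x / 2
    exp = exp + 1
  end
  while x < 1 do
    x = x * 2
    exp = exp - 1
  end
  -- Now x is in [1, 2); its fractional part is the mantissa.
  return sign, exp + 1023, (x - 1) * _2pow52
end

local a = compose(1, 1083, 4080546448249347)
local b = compose(-1, 1052, 3927497732209973)
print(decompose(a))
print(decompose(b))
```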

+
+-- These tests fail on ARM64 before the patch or with FMA
+-- optimization enabled.
+-- The first test may not fail if the compiler doesn't generate
+-- an ARM64 FMA operation in `lj_vm_foldarith()`.
+test:is(2197541395358679800 % -1005065126.3690554, -606337536,
+        'FMA in the lj_vm_foldarith() during parsing')
+
+test:is(a % b, -606337536, 'FMA in the VM')
+
+test:done(true)
diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
new file mode 100644
index 00000000..af749eb5
--- /dev/null
+++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
@@ -0,0 +1,25 @@
+local tap = require('tap')
+local test = tap.test('lj-918-fma-optimization'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(3)
+
+local function jit_opt_is_on(needed)
Why `needed` and not something like "flag"?
Replaced with `flag`:

===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
index af749eb5..9396e558 100644
--- a/test/tarantool-tests/lj-918-fma-optimization.test.lua
+++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
@@ -5,9 +5,9 @@ local test = tap.test('lj-918-fma-optimization'):skipcond({
 
 test:plan(3)
 
-local function jit_opt_is_on(needed)
+local function jit_opt_is_on(flag)
   for _, opt in ipairs({jit.status()}) do
-    if opt == needed then
+    if opt == flag then
       return true
     end
   end
===================================================================
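
A standalone variant of this check might look as follows. It is only a sketch under a few assumptions: `opt_is_on()` is a hypothetical variant of the helper that takes the `jit.status()` values as arguments so it can be exercised without LuaJIT, and the toggle sequence is my guess at what a canary test could do, not a copy of the snipped part of the test. The `pcall()` guards builds where `fma` is not a known optimization flag.

```lua
-- True if `flag` appears among the values returned by jit.status(),
-- which are passed in as varargs to keep the helper testable.
local function opt_is_on(flag, ...)
  for _, opt in ipairs({...}) do
    if opt == flag then
      return true
    end
  end
  return false
end

-- The `jit` table exists only under LuaJIT; plain Lua skips this.
if jit then
  print('fma by default:', opt_is_on('fma', jit.status()))
  if pcall(jit.opt.start, '+fma') then
    print('fma after +fma:', opt_is_on('fma', jit.status()))
    -- Restore the default so later code is not affected.
    jit.opt.start('-fma')
  end
end
```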

+  for _, opt in ipairs({jit.status()}) do
+    if opt == needed then
+      return true
+    end
+  end
<snipped>

--------------sRcVRcppzDo01cJdRE0DgRzB--