Date: Wed, 15 Jan 2025 16:06:19 +0300
To: Sergey Bronnikov
Cc: tarantool-patches@dev.tarantool.org
References: <4cdba52a1ba1a1f2a8ccb4624f00fe156c3088c6.1736779534.git.skaplun@tarantool.org> <75dbc9ca-5332-42b4-93b8-471d27370feb@tarantool.org>
In-Reply-To: <75dbc9ca-5332-42b4-93b8-471d27370feb@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] Disable FMA by default. Use -Ofma or jit.opt.start("+fma") to enable.
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun

Hi, Sergey!
Thanks for the review!
Please consider my answers below.

On 14.01.25, Sergey Bronnikov wrote:
> Hi, Sergey!
>
> Thanks for the patch!
>
>
> On 14.01.2025 14:06, Sergey Kaplun wrote:
> > From: Mike Pall
> >
> > See the discussion in the corresponding ticket for the rationale.
> >
> > (cherry picked from commit de2e1ca9d3d87e74c0c20c1e4ad3c32b31a5875b)
> >
> > For the modulo operation, the arm64 VM uses `fmsub` [1] instruction,
> > which is the fused multiply-add (FMA [2]) operation (more precisely,
> > multiply-sub). Hence, it may produce different results compared to the
> > unfused one. This patch fixes the behaviour by using the unfused
> > instructions by default. However, the new JIT optimization flag (fma) is
> > introduced to make it possible to take advantage of the FMA
> > optimizations.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
> > [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
> >
> > Part of tarantool/tarantool#10709
> > ---
> > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> > new file mode 100644
> > index 00000000..55ec7b98
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
> > @@ -0,0 +1,36 @@
> > +local tap = require('tap')
> > +
> > +-- Test file to demonstrate consistent behaviour for JIT and the
> > +-- VM regarding FMA optimization (disabled by default).
> > +-- XXX: The VM behaviour is checked in the
> > +-- .
> > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
> > +local test = tap.test('lj-918-fma-numerical-accuracy-jit'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +local _2pow52 = 2 ^ 52
> > +
> > +-- IEEE754 components to double:
> > +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
> > +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> > +assert(a == 2197541395358679800)
> > +
> > +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
> > +assert(b == -1005065126.3690554)
> > +
> > Please add a comment with explanation why exactly these testcases
> > are used.
> > As I got it right, the idea is to calculate negative and positive
> number, right?

I've added the corresponding comment to avoid confusion.
Branch is force-pushed.

===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
index 55ec7b98..8b16d4c3 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
@@ -13,6 +13,18 @@ test:plan(1)
 
 local _2pow52 = 2 ^ 52
 
+-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
+-- instruction for the modulo operation, which is the fused
+-- multiply-add (FMA [2]) operation (more precisely,
+-- multiply-sub). Hence, it may produce different results compared
+-- to the unfused one. For the test, let's just use 2 numbers in
+-- modulo for which the single rounding is different from the
+-- double rounding. The numbers from the original issue are good
+-- enough.
+--
+-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
+-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
+--
 -- IEEE754 components to double:
 -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
 local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
index a3775d6d..25b59707 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
@@ -11,6 +11,18 @@ test:plan(2)
 
 local _2pow52 = 2 ^ 52
 
+-- XXX: Before this commit the LuaJIT arm64 VM uses `fmsub` [1]
+-- instruction for the modulo operation, which is the fused
+-- multiply-add (FMA [2]) operation (more precisely,
+-- multiply-sub). Hence, it may produce different results compared
+-- to the unfused one. For the test, let's just use 2 numbers in
+-- modulo for which the single rounding is different from the
+-- double rounding. The numbers from the original issue are good
+-- enough.
+--
+-- [1]:https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
+-- [2]:https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
+--
 -- IEEE754 components to double:
 -- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
 local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
===================================================================

> > Why do you think two examples are enough for testing that behavior for
> JIT and the VM
> > is consistent?

Since we test the modulo operation for consistency, I suppose it is
enough to check similar cases for the JIT and the VM. I split them into
separate files to avoid skipping the VM test when the JIT is disabled.

> > Should we check more corner cases?
> > * Standard/Normal arithmetic
> * Subnormal arithmetic
> * Infinite arithmetic
> * NaN arithmetic
> * Zero arithmetic

All these checks are good but not really relevant to this particular
issue. I suppose we may continue this activity as a part of the
corresponding issue (please create one if it isn't created already), as
we discussed offline, with test vectors for floating-point values.

> >
> > +local results = {}
> > +
> > +jit.opt.start('hotloop=1')
> > +for i = 1, 4 do
> > +  results[i] = a % b
> > +end
> > +
> > +-- XXX: The test doesn't fail before the commit. But it is
> Please add a commit hash and it's short description.

In the tests, "the commit" usually means this particular commit (the
one in which the test is introduced). I rephrased it as follows to
avoid confusion:

===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
index 55ec7b98..8b16d4c3 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy-jit.test.lua
@@ -28,7 +40,7 @@ for i = 1, 4 do
   results[i] = a % b
 end
 
--- XXX: The test doesn't fail before the commit. But it is
+-- XXX: The test doesn't fail before this commit. But it is
 -- required to be sure that there are no inconsistencies after the
 -- commit.
 test:samevalues(results, 'consistent behaviour between the JIT and the VM')
diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
index a3775d6d..25b59707 100644
--- a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
+++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
@@ -19,7 +31,7 @@ assert(a == 2197541395358679800)
 local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
 assert(b == -1005065126.3690554)
 
--- These tests fail on ARM64 before the patch or with FMA
+-- These tests fail on ARM64 before this patch or with FMA
 -- optimization enabled.
 -- The first test may not fail if the compiler doesn't generate
 -- an ARM64 FMA operation in `lj_vm_foldarith()`.
===================================================================

> > +-- required to be sure that there are no inconsistencies after the
> > +-- commit.
> > +test:samevalues(results, 'consistent behaviour between the JIT and the VM')
> > +
> > +test:done(true)
> > diff --git a/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> > new file mode 100644
> > index 00000000..a3775d6d
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-918-fma-numerical-accuracy.test.lua
> > @@ -0,0 +1,31 @@
> > +local tap = require('tap')
> > +
> > +-- Test file to demonstrate possible numerical inaccuracy if FMA
> > +-- optimization takes place.
> > I suppose we don't need to test FMA itself, but we should
> > check that FMA is actually enabled when it's option
> > is enabled. Right? if yes I would merge test
> lj-918-fma-numerical-accuracy.test.lua
> > and test lj-918-fma-optimization.test.lua.

I would rather avoid this: the FMA flag is more like the -mfma compiler
option. It affects only the JIT behaviour and can be enabled for
performance reasons. I used this canary test to check that the option
exists.
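
For illustration only (this is not part of the patch): a minimal sketch
of such a canary check, assuming a LuaJIT build with the JIT available.
The `jit_flags` helper and the pcall() guard are additions made for the
sketch, so that it does not raise on a build that lacks the `fma` flag;
in that case the toggle step is simply skipped.

```lua
-- Collect everything jit.status() returns. The list is empty when the
-- script is run outside of LuaJIT (hypothetical helper for the sketch).
local function jit_flags()
  if type(jit) ~= 'table' then
    return {}
  end
  return {jit.status()}
end

-- Same idea as jit_opt_is_on() in the test: scan the status list for
-- the name of the optimization flag.
local function jit_opt_is_on(flag)
  for _, opt in ipairs(jit_flags()) do
    if opt == flag then
      return true
    end
  end
  return false
end

-- pcall() keeps the sketch from raising on builds that do not know
-- the `fma` flag or have no jit.opt module at all.
if type(jit) == 'table' and jit.opt and pcall(jit.opt.start, '+fma') then
  assert(jit_opt_is_on('fma'))
  jit.opt.start('-fma')
  assert(not jit_opt_is_on('fma'))
end
```

The real test reports the results via tap instead of bare asserts; the
sketch only shows that toggling the flag is visible in jit.status().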
> >
> > +-- XXX: The JIT consistency is checked in the
> > +-- .
> > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/918.
> > +local test = tap.test('lj-918-fma-numerical-accuracy')
> > +
> > +test:plan(2)
> > +
> > +local _2pow52 = 2 ^ 52
> > +
> > +-- IEEE754 components to double:
> > +-- sign * (2 ^ (exp - 1023)) * (mantissa / _2pow52 + normal).
> > +local a = 1 * (2 ^ (1083 - 1023)) * (4080546448249347 / _2pow52 + 1)
> > +assert(a == 2197541395358679800)
> > +
> > +local b = -1 * (2 ^ (1052 - 1023)) * (3927497732209973 / _2pow52 + 1)
> > +assert(b == -1005065126.3690554)
> The same questions as above.

Added the comment.

> > +
> > +-- These tests fail on ARM64 before the patch or with FMA
> > +-- optimization enabled.
> > +-- The first test may not fail if the compiler doesn't generate
> > +-- an ARM64 FMA operation in `lj_vm_foldarith()`.
> > +test:is(2197541395358679800 % -1005065126.3690554, -606337536,
> > +        'FMA in the lj_vm_foldarith() during parsing')
> > +
> > +test:is(a % b, -606337536, 'FMA in the VM')
> > +
> > +test:done(true)
> > diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> > new file mode 100644
> > index 00000000..af749eb5
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
> > @@ -0,0 +1,25 @@
> > +local tap = require('tap')
> > +local test = tap.test('lj-918-fma-optimization'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(3)
> > +
> > +local function jit_opt_is_on(needed)
> why `needed` and not something like "flag"?

Replaced with `flag`:

===================================================================
diff --git a/test/tarantool-tests/lj-918-fma-optimization.test.lua b/test/tarantool-tests/lj-918-fma-optimization.test.lua
index af749eb5..9396e558 100644
--- a/test/tarantool-tests/lj-918-fma-optimization.test.lua
+++ b/test/tarantool-tests/lj-918-fma-optimization.test.lua
@@ -5,9 +5,9 @@ local test = tap.test('lj-918-fma-optimization'):skipcond({
 
 test:plan(3)
 
-local function jit_opt_is_on(needed)
+local function jit_opt_is_on(flag)
   for _, opt in ipairs({jit.status()}) do
-    if opt == needed then
+    if opt == flag then
       return true
     end
   end
===================================================================

> > +  for _, opt in ipairs({jit.status()}) do
> > +    if opt == needed then
> > +      return true
> > +    end
> > +  end

-- 
Best regards,
Sergey Kaplun