From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 1D2E26EFDA; Thu, 2 Jun 2022 11:52:21 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 1D2E26EFDA DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1654159941; bh=Kl6GJ9HN1cbnIok3rU1afUvPahOy/sqnwvWDA+HXvY4=; h=In-Reply-To:Date:References:To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=Th8IhV1YugBjCKTzxmSekzZ/jBC4ogygxJdpb55G+mI/1bkshwBXDJKz4KCb2aa37 iY0jjacPRs/eM4XBnkLvIyV/gTCdHkFA/e1iiiFKeyZU9AVKKMu5FfAFat03T7DCCP T9zysBF5n3OWSpuKCbN8ljrdL1R1RZDyjWJIVj18= Received: from smtp56.i.mail.ru (smtp56.i.mail.ru [217.69.128.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id B86B36EFDB for ; Thu, 2 Jun 2022 11:51:54 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org B86B36EFDB Received: by smtp56.i.mail.ru with esmtpa (envelope-from ) id 1nwgYc-0002yX-0J; Thu, 02 Jun 2022 11:51:54 +0300 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.80.82.1.1\)) In-Reply-To: Date: Thu, 2 Jun 2022 11:51:53 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Sergey Kaplun X-Mailer: Apple Mail (2.3696.80.82.1.1) X-7564579A: EEAE043A70213CC8 X-77F55803: 4F1203BC0FB41BD93C5C90AC751DDC8C9A8909CE9D77FD1A819BC97AD7C120D91313CFAB8367EF908E2BE116634AD74D2CFF057E16357A9A0AD086E50C9ADC64DFE7C7AAA25F05614C00A71A96C76CBF X-8FC586DF: 6EFBBC1D9D64D975 X-C1DE0DAB: 0D63561A33F958A5DD0A41B034C953FA0943653F6339DFD0D4D7FAEE76669D6CD59269BC5F550898D99A6476B3ADF6B4886A5961035A09600383DAD389E261318FB05168BE4CE3AF X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3450C5E6D685282BA1BB3FADF74CC88E2E9D7C379E68CE5A2970EA2CDBFBF7FDED1A9F00FD07225E5A1D7E09C32AA3244C0B7C1DC624B41C6972B1F267FB4FA8A3B4DF56057A86259FFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojRxBGqISLOzC3qN2/2aMn6g== X-Mailru-Sender: 11C2EC085EDE56FA38FD4C59F7EFE4074599D2217B083E966DB19F1A1AA4413406F365114516B2B619381EE24192DF5555834048F03EF5D4C9A814A92B2E3B1BA4250FC3964EA4964198E0F3ECE9B5443453F38A29522196 X-Mras: OK Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: sergos via Tarantool-patches Reply-To: sergos Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Thanks for the patch. Some minor updates to the wording, test updates requested. Regards, Sergos > On 9 Dec 2021, at 13:24, Sergey Kaplun wrote: >=20 > From: Mike Pall >=20 > (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224) >=20 > The arm64 parameters passing rules for Homogeneous Floating-point > Aggregates (HFA) are the following [1]: > * If the argument type is an HFA or an HVA, then the argument is used There=E2=80=99s no explanation of HVA here, unlike HFA. Should it help? > unmodified. > * If the argument is an HFA and there are sufficient unallocated > Floating-point registers, then the argument is allocated to > Floating-pointRegisters (with one register per member of the HFA). >=20 > Also, if the argument type is a Composite Type then the size of the > argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI > backend makes this rounding unconditionally for arm64 architecture and > uses the result value to determine the necessary amount of registers > for the call. So for the HFA composed of 2n + 1 (< 4) float members = the I suppose you mean 32bit FP value here.=20 > extra one register is supposed as occupied. This leads to the wrong > value (0 due to the corresponding memset in `cconv_struct_init()`) in > this register if the other one parameter is passed to the procedure. If I read it correct: =E2=80=9Cthe HFA compliance zero-filled padding =20= can be referred to as a next argument passed=E2=80=9D? >=20 > This patch fixes this case by using the real size of structure in the > calculation of necessary amount of registers to use. >=20 > Sergey Kaplun: > * added the description and the test for the problem >=20 > [1]: https://developer.arm.com/documentation/ihi0055/b/ The link is 404. ARM is known for shuffling the docs, dunno how to pin = it. >=20 > Part of tarantool/tarantool#6548 > --- > OK, there is an important note before you proceed with the patch: >=20 > `isfp` variable in arm64 may takes the following values: > 0 - for the pointer argument > 1 - each part of the argument (i.e. field in a structure or floating > point number itself) takes exactly one register > 2 - the structure is HFA (including complex floats) and compact, i.e. > two fields may be saved into one register >=20 > Patch fixes the behaviour in the last case, when the structure is HFA > and takes 8*n + 4 bytes. The variable can't take another value exept > mentioned before, IINM (I've checked it several times). So the magic I don=E2=80=99t see any tests using half precision FP, say _Float16. It = is available since GCC 7 at least. It can help with more variations in the size of = the HFA - say 2 or 6. =20 > `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no > idea why Mike's prefered such interesting way to fix the behaviour in > this case... May be he has its own branch with different `isfp` values > and this is necessary for compatibility. >=20 > Side note: I choose a general name (without mentioning arm64) for this = C > "library" in purpose. It may be adjusted in the future for brute force > testing of FFI calling conventions for different architectures. > See also: = https://github.com/facebookarchive/luaffifb/blob/master/test.lua > This link has an interesting example of FFI tests. May be we can = create > something similar with autogeneration of functions, structures and > tests (not right now, but later). >=20 > Branch: = https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-cca= ll-fp-convention-full-ci > Tarantool branch: = https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-c= call-fp-convention-full-ci >=20 > src/lj_ccall.c | 3 +- > test/tarantool-tests/CMakeLists.txt | 1 + > .../arm64-ccall-fp-convention.test.lua | 65 +++++++++++++++++++ > test/tarantool-tests/ffi-ccall/CMakeLists.txt | 1 + > test/tarantool-tests/ffi-ccall/libfficcall.c | 28 ++++++++ > 5 files changed, 97 insertions(+), 1 deletion(-) > create mode 100644 = test/tarantool-tests/arm64-ccall-fp-convention.test.lua > create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt > create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c >=20 > diff --git a/src/lj_ccall.c b/src/lj_ccall.c > index f26f1fea..d39ff861 100644 > --- a/src/lj_ccall.c > +++ b/src/lj_ccall.c > @@ -337,7 +337,8 @@ > if (LJ_TARGET_OSX && isva) { \ > /* IOS: All variadic arguments are on the stack. */ \ > } else if (isfp) { /* Try to pass argument in FPRs. */ \ > - int n2 =3D ctype_isvector(d->info) ? 1 : n*isfp; \ > + int n2 =3D ctype_isvector(d->info) ? 1 : \ > + isfp =3D=3D 1 ? n : (d->size >> (4-isfp)); \ > if (nfpr + n2 <=3D CCALL_NARG_FPR) { \ > dp =3D &cc->fpr[nfpr]; \ > nfpr +=3D n2; \ > diff --git a/test/tarantool-tests/CMakeLists.txt = b/test/tarantool-tests/CMakeLists.txt > index 992556d9..50769cda 100644 > --- a/test/tarantool-tests/CMakeLists.txt > +++ b/test/tarantool-tests/CMakeLists.txt > @@ -56,6 +56,7 @@ macro(BuildTestCLib lib sources) > set(LD_LIBRARY_PATH "${CMAKE_CURRENT_BINARY_DIR}:${LD_LIBRARY_PATH}" = PARENT_SCOPE) > endmacro() >=20 > +add_subdirectory(ffi-ccall) > add_subdirectory(gh-4427-ffi-sandwich) > add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64) > add_subdirectory(gh-6189-cur_L) > diff --git a/test/tarantool-tests/arm64-ccall-fp-convention.test.lua = b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua > new file mode 100644 > index 00000000..c0d2236a > --- /dev/null > +++ b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua > @@ -0,0 +1,65 @@ > +local ffi =3D require('ffi') > +local tap =3D require('tap') > + > +local ffi_ccall =3D ffi.load('libfficcall') > + > +local test =3D tap.test('ffi c call convention') > +test:plan(3) > + > +ffi.cdef[[ > +struct sz12_t { > + float f1; > + float f2; > + float f3; > +}; > + > +/* Just return the value itself. */ > +struct sz12_t retsz12(struct sz12_t a); > + > +/* Return sum of two. */ > +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b); > + > +/* Return sum of three. */ > +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct = sz12_t c); > +]] > + > +-- For pretty error messages. > +local sz12_mt =3D {__tostring =3D function(sz12) > + return string.format('{%d,%d,%d}', sz12.f1, sz12.f2, sz12.f3) > +end} > + > +local sz12_t =3D ffi.metatype('struct sz12_t', sz12_mt) > + > +local function new_sz12(f1, f2, f3) > + return sz12_t(f1, f2, f3) > +end > + > +-- For pretty error messages. > +local expected_mt =3D {__tostring =3D function(t) > + return '{'..table.concat(t, ',')..'}' > +end} > + > +local function test_sz12(got, expected) > + assert(type(got) =3D=3D 'cdata') > + assert(type(expected) =3D=3D 'table') > + expected =3D setmetatable(expected, expected_mt) > + for i =3D 1, #expected do > + if got['f'..i] ~=3D expected[i] then > + error(('bad sz12_t value: got %s, expected %s'):format( > + tostring(got), tostring(expected) > + )) > + end > + end > + return true > +end > + > +-- Test arm64 calling convention for HFA structures. > +-- Need structure which size is not multiple by 8. > +local sz12_111 =3D new_sz12(1, 1, 1) > +test:ok(test_sz12(sz12_111, {1, 1, 1}), 'base') > +local sz12_222 =3D ffi_ccall.add2sz12(sz12_111, sz12_111) > +test:ok(test_sz12(sz12_222, {2, 2, 2}), '2 structures as args') > +local sz12_333 =3D ffi_ccall.add3sz12(sz12_111, sz12_111, sz12_111) > +test:ok(test_sz12(sz12_333, {3, 3, 3}), '3 structures as args') > + > +os.exit(test:check() and 0 or 1) > diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt = b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > new file mode 100644 > index 00000000..df937bf8 > --- /dev/null > +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > @@ -0,0 +1 @@ > +BuildTestCLib(libfficcall libfficcall.c) > diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c = b/test/tarantool-tests/ffi-ccall/libfficcall.c > new file mode 100644 > index 00000000..0c043a68 > --- /dev/null > +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c > @@ -0,0 +1,28 @@ > +struct sz12_t { > + float f1; > + float f2; > + float f3; > +}; > + > +struct sz12_t retsz12(struct sz12_t a) > +{ > + return a; > +} > + > +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b) > +{ > + struct sz12_t res =3D {0}; > + res.f1 =3D a.f1 + b.f1; > + res.f2 =3D a.f2 + b.f2; > + res.f3 =3D a.f3 + b.f3; > + return res; > +} > + > +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct = sz12_t c) > +{ > + struct sz12_t res =3D {0}; > + res.f1 =3D a.f1 + b.f1 + c.f1; > + res.f2 =3D a.f2 + b.f2 + c.f2; > + res.f3 =3D a.f3 + b.f3 + c.f3; > + return res; > +} > --=20 > 2.33.1 >=20