Date: Tue, 28 Jun 2022 22:05:09 +0300
To: sergos
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions.
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org

Hi, Sergos!

Thanks for the review!

I've updated the commit message to the following:

===================================================================
FFI/ARM64: Fix pass-by-value struct calling conventions.

(cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)

If the argument type is a Composite Type, then the size of the argument
is rounded up to the nearest multiple of 8 bytes. The LuaJIT FFI backend
does this rounding unconditionally for the arm64 architecture and uses
the resulting value to determine the number of registers needed for the
call. The arm64 parameter passing rules for Homogeneous Floating-point
Aggregates (HFA) are the following [1]:

| If the argument is an HFA and there are sufficient unallocated
| Floating-point registers, then the argument is allocated to
| Floating-point registers (with one register per member of the HFA).

So for an HFA composed of three 32-bit float members, one extra (4th)
register is assumed to be occupied. When a second parameter is passed to
the function, its fields are loaded starting from the 5th register.
Since the callee uses 6 registers according to the ABI, this leads to a
wrong value of the second parameter in the callee (0 due to the
corresponding memset in `cconv_struct_init()`).

This patch fixes this case by using the real size of the structure when
calculating the number of registers to use.

Sergey Kaplun:
* added the description and the test for the problem

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing-rules

Part of tarantool/tarantool#6548
===================================================================

On 02.06.22, sergos wrote:
> Hi!
>
> Thanks for the patch.
> Some minor updates to the wording, test updates requested.
>
> Regards,
> Sergos
>
> > On 9 Dec 2021, at 13:24, Sergey Kaplun wrote:
> >
> > From: Mike Pall
> >
> > (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)
> >
> > The arm64 parameters passing rules for Homogeneous Floating-point
> > Aggregates (HFA) are the following [1]:
> > * If the argument type is an HFA or an HVA, then the argument is used
>
> There’s no explanation of HVA here, unlike HFA. Should it help?
>
> >   unmodified.
> > * If the argument is an HFA and there are sufficient unallocated
> >   Floating-point registers, then the argument is allocated to
> >   Floating-point Registers (with one register per member of the HFA).
> >
> > Also, if the argument type is a Composite Type then the size of the
> > argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI
> > backend makes this rounding unconditionally for arm64 architecture and
> > uses the result value to determine the necessary amount of registers
> > for the call. So for the HFA composed of 2n + 1 (< 4) float members the
>
> I suppose you mean 32bit FP value here.
>
> > extra one register is supposed as occupied. This leads to the wrong
> > value (0 due to the corresponding memset in `cconv_struct_init()`) in
> > this register if the other one parameter is passed to the procedure.
>
> If I read it correct: “the HFA compliance zero-filled padding
> can be referred to as a next argument passed”?
>
> >
> > This patch fixes this case by using the real size of structure in the
> > calculation of necessary amount of registers to use.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > [1]: https://developer.arm.com/documentation/ihi0055/b/
>
> The link is 404. ARM is known for shuffling the docs, dunno how to pin it.
>
> >
> > Part of tarantool/tarantool#6548
> > ---
> > OK, there is an important note before you proceed with the patch:
> >
> > `isfp` variable in arm64 may take the following values:
> > 0 - for the pointer argument
> > 1 - each part of the argument (i.e. field in a structure or floating
> >     point number itself) takes exactly one register
> > 2 - the structure is HFA (including complex floats) and compact, i.e.
> >     two fields may be saved into one register
> >
> > Patch fixes the behaviour in the last case, when the structure is HFA
> > and takes 8*n + 4 bytes. The variable can't take another value except
> > mentioned before, IINM (I've checked it several times). So the magic
>
> I don’t see any tests using half precision FP, say _Float16. It is available
> since GCC 7 at least. It can help with more variations in the size of the
> HFA - say 2 or 6.

Yes, but the LuaJIT FFI doesn't know about them :).
This kind of HFA structure is not yet implemented (NIY) in LuaJIT.

>
> > `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no
> > idea why Mike preferred such an interesting way to fix the behaviour in
> > this case... Maybe he has his own branch with different `isfp` values
> > and this is necessary for compatibility.
> >
> > Side note: I chose a general name (without mentioning arm64) for this C
> > "library" on purpose. It may be adjusted in the future for brute force
> > testing of FFI calling conventions for different architectures.
> > See also: https://github.com/facebookarchive/luaffifb/blob/master/test.lua
> > This link has an interesting example of FFI tests. Maybe we can create
> > something similar with autogeneration of functions, structures and
> > tests (not right now, but later).
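
To illustrate the arithmetic from the note above, here is a minimal
sketch. It is not the code from lj_ccall.c: the helper names
`fpr_count_rounded()` and `fpr_count_exact()` are hypothetical, and it
assumes a compact HFA of 32-bit floats (`isfp == 2`) with
`sizeof(float) == 4`. It compares the register count derived from the
size rounded up to 8 bytes with the count derived from the real size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers, not taken from lj_ccall.c. They only model the
 * register arithmetic for a compact HFA of 32-bit floats (isfp == 2).
 */

/* Old behaviour: round the size up to 8 bytes, then count float slots. */
static size_t fpr_count_rounded(size_t size)
{
	size_t rounded = (size + 7) & ~(size_t)7;
	return rounded / sizeof(float);
}

/* Fixed behaviour: use the real size; for isfp == 2 the shift is size / 4. */
static size_t fpr_count_exact(size_t size, int isfp)
{
	assert(isfp == 2); /* Only the compact HFA case is modelled here. */
	return size >> (4 - isfp);
}
```

For a structure of three floats (12 bytes) the rounded count is 4 and
the exact count is 3, which matches the extra 4th register described in
the commit message. For sizes that are already a multiple of 8 both
counts agree, so only the 8*n + 4 case is affected.
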
> >
> > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
> > Tarantool branch: https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
> >
> >  src/lj_ccall.c                                |  3 +-
> >  test/tarantool-tests/CMakeLists.txt           |  1 +
> >  .../arm64-ccall-fp-convention.test.lua        | 65 +++++++++++++++++++
> >  test/tarantool-tests/ffi-ccall/CMakeLists.txt |  1 +
> >  test/tarantool-tests/ffi-ccall/libfficcall.c  | 28 ++++++++
> >  5 files changed, 97 insertions(+), 1 deletion(-)
> >  create mode 100644 test/tarantool-tests/arm64-ccall-fp-convention.test.lua
> >  create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt
> >  create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c
> >
> > --
> > 2.33.1
> >
>

--
Best regards,
Sergey Kaplun