Thanks!  LGTM. Best regards, Sergos  Tuesday, 28 June 2022, 22:07 +0300 from Kaplun Sergey : >Hi, Sergos! > >Thanks for the review! > >I've updated commit message to the following: > >=================================================================== >FFI/ARM64: Fix pass-by-value struct calling conventions. > >(cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224) > >If the argument type is a Composite Type then the size of the >argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI >backend makes this rounding unconditionally for arm64 architecture and >uses the result value to determine the necessary amount of registers >for the call. > >The arm64 parameters passing rules for Homogeneous Floating-point >Aggregates (HFA) are the following [1]: >| If the argument is an HFA and there are sufficient unallocated >| Floating-point registers, then the argument is allocated to >| Floating-point registers (with one register per member of the HFA). > >So for the HFA composed of 3 32-bit float members the extra one (4th) >register is supposed as occupied. > >When the second parameter passed to the function these fields are loaded >starting from 5th register. As far as the procedure to call uses 6 >registers according to the ABI it leads to the wrong value of the second >parameter passed to the callee (0 due to the corresponding memset in >`cconv_struct_init()`). > >This patch fixes this case by using the real size of structure in the >calculation of necessary amount of registers to use. > >Sergey Kaplun: >* added the description and the test for the problem > >[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing-rules > >Part of tarantool/tarantool#6548 >=================================================================== > >On 02.06.22, sergos wrote: >> Hi! >> >> Thanks for the patch. >> Some minor updates to the wording, test updates requested. >> >> Regards, >> Sergos >> >> > On 9 Dec 2021, at 13:24, Sergey Kaplun < skaplun@tarantool.org > wrote: >> > >> > From: Mike Pall >> > >> > (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224) >> > >> > The arm64 parameters passing rules for Homogeneous Floating-point >> > Aggregates (HFA) are the following [1]: >> > * If the argument type is an HFA or an HVA, then the argument is used >> >> There’s no explanation of HVA here, unlike HFA. Should it help? >> >> > unmodified. >> > * If the argument is an HFA and there are sufficient unallocated >> > Floating-point registers, then the argument is allocated to >> > Floating-pointRegisters (with one register per member of the HFA). >> > >> > Also, if the argument type is a Composite Type then the size of the >> > argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI >> > backend makes this rounding unconditionally for arm64 architecture and >> > uses the result value to determine the necessary amount of registers >> > for the call. So for the HFA composed of 2n + 1 (< 4) float members the >> >> I suppose you mean 32bit FP value here. >> >> > extra one register is supposed as occupied. This leads to the wrong >> > value (0 due to the corresponding memset in `cconv_struct_init()`) in >> > this register if the other one parameter is passed to the procedure. >> >> If I read it correct: “the HFA compliance zero-filled padding >> can be referred to as a next argument passed”? >> >> > >> > This patch fixes this case by using the real size of structure in the >> > calculation of necessary amount of registers to use. >> > >> > Sergey Kaplun: >> > * added the description and the test for the problem >> > >> > [1]: https://developer.arm.com/documentation/ihi0055/b/ >> >> The link is 404. ARM is known for shuffling the docs, dunno how to pin it. >> >> > >> > Part of tarantool/tarantool#6548 >> > --- >> > OK, there is an important note before you proceed with the patch: >> > >> > `isfp` variable in arm64 may takes the following values: >> > 0 - for the pointer argument >> > 1 - each part of the argument (i.e. field in a structure or floating >> > point number itself) takes exactly one register >> > 2 - the structure is HFA (including complex floats) and compact, i.e. >> > two fields may be saved into one register >> > >> > Patch fixes the behaviour in the last case, when the structure is HFA >> > and takes 8*n + 4 bytes. The variable can't take another value exept >> > mentioned before, IINM (I've checked it several times). So the magic >> >> I don’t see any tests using half precision FP, say _Float16. It is available >> since GCC 7 at least. It can help with more variations in the size of the >> HFA - say 2 or 6. > >Yes, LuaJIT FFI interface doesn't know about them :). This kind of >HFA structures is NIY in LuaJIT. > >> >> > `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no >> > idea why Mike's prefered such interesting way to fix the behaviour in >> > this case... May be he has its own branch with different `isfp` values >> > and this is necessary for compatibility. >> > >> > Side note: I choose a general name (without mentioning arm64) for this C >> > "library" in purpose. It may be adjusted in the future for brute force >> > testing of FFI calling conventions for different architectures. >> > See also: https://github.com/facebookarchive/luaffifb/blob/master/test.lua >> > This link has an interesting example of FFI tests. May be we can create >> > something similar with autogeneration of functions, structures and >> > tests (not right now, but later). >> > >> > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci >> > Tarantool branch: https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci >> > >> > src/lj_ccall.c | 3 +- >> > test/tarantool-tests/CMakeLists.txt | 1 + >> > .../arm64-ccall-fp-convention.test.lua | 65 +++++++++++++++++++ >> > test/tarantool-tests/ffi-ccall/CMakeLists.txt | 1 + >> > test/tarantool-tests/ffi-ccall/libfficcall.c | 28 ++++++++ >> > 5 files changed, 97 insertions(+), 1 deletion(-) >> > create mode 100644 test/tarantool-tests/arm64-ccall-fp-convention.test.lua >> > create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt >> > create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c >> > > > > >> > -- >> > 2.33.1 >> > >> > >-- >Best regards, >Sergey Kaplun