[Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions.
Sergey Ostanevich
sergos at tarantool.org
Tue Jun 28 22:47:26 MSK 2022
Thanks!
LGTM.
Best regards,
Sergos
Tuesday, 28 June 2022, 22:07 +0300 from Kaplun Sergey <skaplun at tarantool.org>:
>Hi, Sergos!
>
>Thanks for the review!
>
>I've updated commit message to the following:
>
>===================================================================
>FFI/ARM64: Fix pass-by-value struct calling conventions.
>
>(cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)
>
>If the argument type is a Composite Type then the size of the
>argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI
>backend makes this rounding unconditionally for arm64 architecture and
>uses the result value to determine the necessary amount of registers
>for the call.
>
>The arm64 parameters passing rules for Homogeneous Floating-point
>Aggregates (HFA) are the following [1]:
>| If the argument is an HFA and there are sufficient unallocated
>| Floating-point registers, then the argument is allocated to
>| Floating-point registers (with one register per member of the HFA).
>
>So for the HFA composed of 3 32-bit float members the extra one (4th)
>register is supposed as occupied.
>
>When the second parameter passed to the function these fields are loaded
>starting from 5th register. As far as the procedure to call uses 6
>registers according to the ABI it leads to the wrong value of the second
>parameter passed to the callee (0 due to the corresponding memset in
>`cconv_struct_init()`).
>
>This patch fixes this case by using the real size of structure in the
>calculation of necessary amount of registers to use.
>
>Sergey Kaplun:
>* added the description and the test for the problem
>
>[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing-rules
>
>Part of tarantool/tarantool#6548
>===================================================================
>
>On 02.06.22, sergos wrote:
>> Hi!
>>
>> Thanks for the patch.
>> Some minor updates to the wording, test updates requested.
>>
>> Regards,
>> Sergos
>>
>> > On 9 Dec 2021, at 13:24, Sergey Kaplun < skaplun at tarantool.org > wrote:
>> >
>> > From: Mike Pall <mike>
>> >
>> > (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)
>> >
>> > The arm64 parameters passing rules for Homogeneous Floating-point
>> > Aggregates (HFA) are the following [1]:
>> > * If the argument type is an HFA or an HVA, then the argument is used
>>
>> There’s no explanation of HVA here, unlike HFA. Should it help?
>>
>> > unmodified.
>> > * If the argument is an HFA and there are sufficient unallocated
>> > Floating-point registers, then the argument is allocated to
>> > Floating-pointRegisters (with one register per member of the HFA).
>> >
>> > Also, if the argument type is a Composite Type then the size of the
>> > argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI
>> > backend makes this rounding unconditionally for arm64 architecture and
>> > uses the result value to determine the necessary amount of registers
>> > for the call. So for the HFA composed of 2n + 1 (< 4) float members the
>>
>> I suppose you mean 32bit FP value here.
>>
>> > extra one register is supposed as occupied. This leads to the wrong
>> > value (0 due to the corresponding memset in `cconv_struct_init()`) in
>> > this register if the other one parameter is passed to the procedure.
>>
>> If I read it correct: “the HFA compliance zero-filled padding
>> can be referred to as a next argument passed”?
>>
>> >
>> > This patch fixes this case by using the real size of structure in the
>> > calculation of necessary amount of registers to use.
>> >
>> > Sergey Kaplun:
>> > * added the description and the test for the problem
>> >
>> > [1]: https://developer.arm.com/documentation/ihi0055/b/
>>
>> The link is 404. ARM is known for shuffling the docs, dunno how to pin it.
>>
>> >
>> > Part of tarantool/tarantool#6548
>> > ---
>> > OK, there is an important note before you proceed with the patch:
>> >
>> > `isfp` variable in arm64 may takes the following values:
>> > 0 - for the pointer argument
>> > 1 - each part of the argument (i.e. field in a structure or floating
>> > point number itself) takes exactly one register
>> > 2 - the structure is HFA (including complex floats) and compact, i.e.
>> > two fields may be saved into one register
>> >
>> > Patch fixes the behaviour in the last case, when the structure is HFA
>> > and takes 8*n + 4 bytes. The variable can't take another value exept
>> > mentioned before, IINM (I've checked it several times). So the magic
>>
>> I don’t see any tests using half precision FP, say _Float16. It is available
>> since GCC 7 at least. It can help with more variations in the size of the
>> HFA - say 2 or 6.
>
>Yes, LuaJIT FFI interface doesn't know about them :). This kind of
>HFA structures is NIY in LuaJIT.
>
>>
>> > `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no
>> > idea why Mike's prefered such interesting way to fix the behaviour in
>> > this case... May be he has its own branch with different `isfp` values
>> > and this is necessary for compatibility.
>> >
>> > Side note: I choose a general name (without mentioning arm64) for this C
>> > "library" in purpose. It may be adjusted in the future for brute force
>> > testing of FFI calling conventions for different architectures.
>> > See also: https://github.com/facebookarchive/luaffifb/blob/master/test.lua
>> > This link has an interesting example of FFI tests. May be we can create
>> > something similar with autogeneration of functions, structures and
>> > tests (not right now, but later).
>> >
>> > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
>> > Tarantool branch: https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
>> >
>> > src/lj_ccall.c | 3 +-
>> > test/tarantool-tests/CMakeLists.txt | 1 +
>> > .../arm64-ccall-fp-convention.test.lua | 65 +++++++++++++++++++
>> > test/tarantool-tests/ffi-ccall/CMakeLists.txt | 1 +
>> > test/tarantool-tests/ffi-ccall/libfficcall.c | 28 ++++++++
>> > 5 files changed, 97 insertions(+), 1 deletion(-)
>> > create mode 100644 test/tarantool-tests/arm64-ccall-fp-convention.test.lua
>> > create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt
>> > create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c
>> >
>
><snipped>
>
>> > --
>> > 2.33.1
>> >
>>
>
>--
>Best regards,
>Sergey Kaplun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20220628/e6e739e1/attachment.htm>
More information about the Tarantool-patches
mailing list