[Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions.
sergos
sergos at tarantool.org
Thu Jun 2 11:51:53 MSK 2022
Hi!
Thanks for the patch.
Some minor updates to the wording, test updates requested.
Regards,
Sergos
> On 9 Dec 2021, at 13:24, Sergey Kaplun <skaplun at tarantool.org> wrote:
>
> From: Mike Pall <mike>
>
> (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)
>
> The arm64 parameters passing rules for Homogeneous Floating-point
> Aggregates (HFA) are the following [1]:
> * If the argument type is an HFA or an HVA, then the argument is used
There’s no explanation of HVA here, unlike HFA. Should it help?
> unmodified.
> * If the argument is an HFA and there are sufficient unallocated
> Floating-point registers, then the argument is allocated to
> Floating-pointRegisters (with one register per member of the HFA).
>
> Also, if the argument type is a Composite Type then the size of the
> argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI
> backend makes this rounding unconditionally for arm64 architecture and
> uses the result value to determine the necessary amount of registers
> for the call. So for the HFA composed of 2n + 1 (< 4) float members the
I suppose you mean 32bit FP value here.
> extra one register is supposed as occupied. This leads to the wrong
> value (0 due to the corresponding memset in `cconv_struct_init()`) in
> this register if the other one parameter is passed to the procedure.
If I read it correct: “the HFA compliance zero-filled padding
can be referred to as a next argument passed”?
>
> This patch fixes this case by using the real size of structure in the
> calculation of necessary amount of registers to use.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> [1]: https://developer.arm.com/documentation/ihi0055/b/
The link is 404. ARM is known for shuffling the docs, dunno how to pin it.
>
> Part of tarantool/tarantool#6548
> ---
> OK, there is an important note before you proceed with the patch:
>
> `isfp` variable in arm64 may takes the following values:
> 0 - for the pointer argument
> 1 - each part of the argument (i.e. field in a structure or floating
> point number itself) takes exactly one register
> 2 - the structure is HFA (including complex floats) and compact, i.e.
> two fields may be saved into one register
>
> Patch fixes the behaviour in the last case, when the structure is HFA
> and takes 8*n + 4 bytes. The variable can't take another value exept
> mentioned before, IINM (I've checked it several times). So the magic
I don’t see any tests using half precision FP, say _Float16. It is available
since GCC 7 at least. It can help with more variations in the size of the
HFA - say 2 or 6.
> `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no
> idea why Mike's prefered such interesting way to fix the behaviour in
> this case... May be he has its own branch with different `isfp` values
> and this is necessary for compatibility.
>
> Side note: I choose a general name (without mentioning arm64) for this C
> "library" in purpose. It may be adjusted in the future for brute force
> testing of FFI calling conventions for different architectures.
> See also: https://github.com/facebookarchive/luaffifb/blob/master/test.lua
> This link has an interesting example of FFI tests. May be we can create
> something similar with autogeneration of functions, structures and
> tests (not right now, but later).
>
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
> Tarantool branch: https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
>
> src/lj_ccall.c | 3 +-
> test/tarantool-tests/CMakeLists.txt | 1 +
> .../arm64-ccall-fp-convention.test.lua | 65 +++++++++++++++++++
> test/tarantool-tests/ffi-ccall/CMakeLists.txt | 1 +
> test/tarantool-tests/ffi-ccall/libfficcall.c | 28 ++++++++
> 5 files changed, 97 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/arm64-ccall-fp-convention.test.lua
> create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt
> create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c
>
> diff --git a/src/lj_ccall.c b/src/lj_ccall.c
> index f26f1fea..d39ff861 100644
> --- a/src/lj_ccall.c
> +++ b/src/lj_ccall.c
> @@ -337,7 +337,8 @@
> if (LJ_TARGET_OSX && isva) { \
> /* IOS: All variadic arguments are on the stack. */ \
> } else if (isfp) { /* Try to pass argument in FPRs. */ \
> - int n2 = ctype_isvector(d->info) ? 1 : n*isfp; \
> + int n2 = ctype_isvector(d->info) ? 1 : \
> + isfp == 1 ? n : (d->size >> (4-isfp)); \
> if (nfpr + n2 <= CCALL_NARG_FPR) { \
> dp = &cc->fpr[nfpr]; \
> nfpr += n2; \
> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
> index 992556d9..50769cda 100644
> --- a/test/tarantool-tests/CMakeLists.txt
> +++ b/test/tarantool-tests/CMakeLists.txt
> @@ -56,6 +56,7 @@ macro(BuildTestCLib lib sources)
> set(LD_LIBRARY_PATH "${CMAKE_CURRENT_BINARY_DIR}:${LD_LIBRARY_PATH}" PARENT_SCOPE)
> endmacro()
>
> +add_subdirectory(ffi-ccall)
> add_subdirectory(gh-4427-ffi-sandwich)
> add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64)
> add_subdirectory(gh-6189-cur_L)
> diff --git a/test/tarantool-tests/arm64-ccall-fp-convention.test.lua b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua
> new file mode 100644
> index 00000000..c0d2236a
> --- /dev/null
> +++ b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua
> @@ -0,0 +1,65 @@
> +local ffi = require('ffi')
> +local tap = require('tap')
> +
> +local ffi_ccall = ffi.load('libfficcall')
> +
> +local test = tap.test('ffi c call convention')
> +test:plan(3)
> +
> +ffi.cdef[[
> +struct sz12_t {
> + float f1;
> + float f2;
> + float f3;
> +};
> +
> +/* Just return the value itself. */
> +struct sz12_t retsz12(struct sz12_t a);
> +
> +/* Return sum of two. */
> +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b);
> +
> +/* Return sum of three. */
> +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c);
> +]]
> +
> +-- For pretty error messages.
> +local sz12_mt = {__tostring = function(sz12)
> + return string.format('{%d,%d,%d}', sz12.f1, sz12.f2, sz12.f3)
> +end}
> +
> +local sz12_t = ffi.metatype('struct sz12_t', sz12_mt)
> +
> +local function new_sz12(f1, f2, f3)
> + return sz12_t(f1, f2, f3)
> +end
> +
> +-- For pretty error messages.
> +local expected_mt = {__tostring = function(t)
> + return '{'..table.concat(t, ',')..'}'
> +end}
> +
> +local function test_sz12(got, expected)
> + assert(type(got) == 'cdata')
> + assert(type(expected) == 'table')
> + expected = setmetatable(expected, expected_mt)
> + for i = 1, #expected do
> + if got['f'..i] ~= expected[i] then
> + error(('bad sz12_t value: got %s, expected %s'):format(
> + tostring(got), tostring(expected)
> + ))
> + end
> + end
> + return true
> +end
> +
> +-- Test arm64 calling convention for HFA structures.
> +-- Need structure which size is not multiple by 8.
> +local sz12_111 = new_sz12(1, 1, 1)
> +test:ok(test_sz12(sz12_111, {1, 1, 1}), 'base')
> +local sz12_222 = ffi_ccall.add2sz12(sz12_111, sz12_111)
> +test:ok(test_sz12(sz12_222, {2, 2, 2}), '2 structures as args')
> +local sz12_333 = ffi_ccall.add3sz12(sz12_111, sz12_111, sz12_111)
> +test:ok(test_sz12(sz12_333, {3, 3, 3}), '3 structures as args')
> +
> +os.exit(test:check() and 0 or 1)
> diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt
> new file mode 100644
> index 00000000..df937bf8
> --- /dev/null
> +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt
> @@ -0,0 +1 @@
> +BuildTestCLib(libfficcall libfficcall.c)
> diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c
> new file mode 100644
> index 00000000..0c043a68
> --- /dev/null
> +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c
> @@ -0,0 +1,28 @@
> +struct sz12_t {
> + float f1;
> + float f2;
> + float f3;
> +};
> +
> +struct sz12_t retsz12(struct sz12_t a)
> +{
> + return a;
> +}
> +
> +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b)
> +{
> + struct sz12_t res = {0};
> + res.f1 = a.f1 + b.f1;
> + res.f2 = a.f2 + b.f2;
> + res.f3 = a.f3 + b.f3;
> + return res;
> +}
> +
> +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c)
> +{
> + struct sz12_t res = {0};
> + res.f1 = a.f1 + b.f1 + c.f1;
> + res.f2 = a.f2 + b.f2 + c.f2;
> + res.f3 = a.f3 + b.f3 + c.f3;
> + return res;
> +}
> --
> 2.33.1
>
More information about the Tarantool-patches
mailing list