Re: [Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions.

Tarantool development patches archive
 help / color / mirror / Atom feed

From: sergos via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions.
Date: Thu, 2 Jun 2022 11:51:53 +0300	[thread overview]
Message-ID: <CF55B3FC-4A3E-4B1F-B42D-E5CDEB6D510F@tarantool.org> (raw)
In-Reply-To: <f30b7aabffa7c9c2f72bc356bb304175a51edee7.1639039919.git.skaplun@tarantool.org>

Hi!

Thanks for the patch.
Some minor updates to the wording, test updates requested.

Regards,
Sergos

> On 9 Dec 2021, at 13:24, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> From: Mike Pall <mike>
> 
> (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)
> 
> The arm64 parameters passing rules for Homogeneous Floating-point
> Aggregates (HFA) are the following [1]:
> * If the argument type is an HFA or an HVA, then the argument is used

There’s no explanation of HVA here, unlike HFA. Should it help?

>  unmodified.
> * If the argument is an HFA and there are sufficient unallocated
>  Floating-point registers, then the argument is allocated to
>  Floating-pointRegisters (with one register per member of the HFA).
> 
> Also, if the argument type is a Composite Type then the size of the
> argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI
> backend makes this rounding unconditionally for arm64 architecture and
> uses the result value to determine the necessary amount of registers
> for the call. So for the HFA composed of 2n + 1 (< 4) float members the

I suppose you mean 32bit FP value here. 

> extra one register is supposed as occupied. This leads to the wrong
> value (0 due to the corresponding memset in `cconv_struct_init()`) in
> this register if the other one parameter is passed to the procedure.

If I read it correct: “the HFA compliance zero-filled padding  
can be referred to as a next argument passed”?

> 
> This patch fixes this case by using the real size of structure in the
> calculation of necessary amount of registers to use.
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> [1]: https://developer.arm.com/documentation/ihi0055/b/

The link is 404. ARM is known for shuffling the docs, dunno how to pin it.

> 
> Part of tarantool/tarantool#6548
> ---
> OK, there is an important note before you proceed with the patch:
> 
> `isfp` variable in arm64 may takes the following values:
> 0 - for the pointer argument
> 1 - each part of the argument (i.e. field in a structure or floating
>    point number itself) takes exactly one register
> 2 - the structure is HFA (including complex floats) and compact, i.e.
>    two fields may be saved into one register
> 
> Patch fixes the behaviour in the last case, when the structure is HFA
> and takes 8*n + 4 bytes. The variable can't take another value exept
> mentioned before, IINM (I've checked it several times). So the magic

I don’t see any tests using half precision FP, say _Float16. It is available
since GCC 7 at least. It can help with more variations in the size of the
HFA - say 2 or 6.
 
> `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no
> idea why Mike's prefered such interesting way to fix the behaviour in
> this case... May be he has its own branch with different `isfp` values
> and this is necessary for compatibility.
> 
> Side note: I choose a general name (without mentioning arm64) for this C
> "library" in purpose. It may be adjusted in the future for brute force
> testing of FFI calling conventions for different architectures.
> See also: https://github.com/facebookarchive/luaffifb/blob/master/test.lua
> This link has an interesting example of FFI tests. May be we can create
> something similar with autogeneration of functions, structures and
> tests (not right now, but later).
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
> Tarantool branch: https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci
> 
> src/lj_ccall.c                                |  3 +-
> test/tarantool-tests/CMakeLists.txt           |  1 +
> .../arm64-ccall-fp-convention.test.lua        | 65 +++++++++++++++++++
> test/tarantool-tests/ffi-ccall/CMakeLists.txt |  1 +
> test/tarantool-tests/ffi-ccall/libfficcall.c  | 28 ++++++++
> 5 files changed, 97 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/arm64-ccall-fp-convention.test.lua
> create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt
> create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c
> 
> diff --git a/src/lj_ccall.c b/src/lj_ccall.c
> index f26f1fea..d39ff861 100644
> --- a/src/lj_ccall.c
> +++ b/src/lj_ccall.c
> @@ -337,7 +337,8 @@
>   if (LJ_TARGET_OSX && isva) { \
>     /* IOS: All variadic arguments are on the stack. */ \
>   } else if (isfp) {  /* Try to pass argument in FPRs. */ \
> -    int n2 = ctype_isvector(d->info) ? 1 : n*isfp; \
> +    int n2 = ctype_isvector(d->info) ? 1 : \
> +	     isfp == 1 ? n : (d->size >> (4-isfp)); \
>     if (nfpr + n2 <= CCALL_NARG_FPR) { \
>       dp = &cc->fpr[nfpr]; \
>       nfpr += n2; \
> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
> index 992556d9..50769cda 100644
> --- a/test/tarantool-tests/CMakeLists.txt
> +++ b/test/tarantool-tests/CMakeLists.txt
> @@ -56,6 +56,7 @@ macro(BuildTestCLib lib sources)
>   set(LD_LIBRARY_PATH "${CMAKE_CURRENT_BINARY_DIR}:${LD_LIBRARY_PATH}" PARENT_SCOPE)
> endmacro()
> 
> +add_subdirectory(ffi-ccall)
> add_subdirectory(gh-4427-ffi-sandwich)
> add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64)
> add_subdirectory(gh-6189-cur_L)
> diff --git a/test/tarantool-tests/arm64-ccall-fp-convention.test.lua b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua
> new file mode 100644
> index 00000000..c0d2236a
> --- /dev/null
> +++ b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua
> @@ -0,0 +1,65 @@
> +local ffi = require('ffi')
> +local tap = require('tap')
> +
> +local ffi_ccall = ffi.load('libfficcall')
> +
> +local test = tap.test('ffi c call convention')
> +test:plan(3)
> +
> +ffi.cdef[[
> +struct sz12_t {
> +	float f1;
> +	float f2;
> +	float f3;
> +};
> +
> +/* Just return the value itself. */
> +struct sz12_t retsz12(struct sz12_t a);
> +
> +/* Return sum of two. */
> +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b);
> +
> +/* Return sum of three. */
> +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c);
> +]]
> +
> +-- For pretty error messages.
> +local sz12_mt = {__tostring = function(sz12)
> +  return string.format('{%d,%d,%d}', sz12.f1, sz12.f2, sz12.f3)
> +end}
> +
> +local sz12_t = ffi.metatype('struct sz12_t', sz12_mt)
> +
> +local function new_sz12(f1, f2, f3)
> +  return sz12_t(f1, f2, f3)
> +end
> +
> +-- For pretty error messages.
> +local expected_mt = {__tostring = function(t)
> +  return '{'..table.concat(t, ',')..'}'
> +end}
> +
> +local function test_sz12(got, expected)
> +  assert(type(got) == 'cdata')
> +  assert(type(expected) == 'table')
> +  expected = setmetatable(expected, expected_mt)
> +  for i = 1, #expected do
> +    if got['f'..i] ~= expected[i] then
> +      error(('bad sz12_t value: got %s, expected %s'):format(
> +        tostring(got), tostring(expected)
> +      ))
> +    end
> +  end
> +  return true
> +end
> +
> +-- Test arm64 calling convention for HFA structures.
> +-- Need structure which size is not multiple by 8.
> +local sz12_111 = new_sz12(1, 1, 1)
> +test:ok(test_sz12(sz12_111, {1, 1, 1}), 'base')
> +local sz12_222 = ffi_ccall.add2sz12(sz12_111, sz12_111)
> +test:ok(test_sz12(sz12_222, {2, 2, 2}), '2 structures as args')
> +local sz12_333 = ffi_ccall.add3sz12(sz12_111, sz12_111, sz12_111)
> +test:ok(test_sz12(sz12_333, {3, 3, 3}), '3 structures as args')
> +
> +os.exit(test:check() and 0 or 1)
> diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt
> new file mode 100644
> index 00000000..df937bf8
> --- /dev/null
> +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt
> @@ -0,0 +1 @@
> +BuildTestCLib(libfficcall libfficcall.c)
> diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c
> new file mode 100644
> index 00000000..0c043a68
> --- /dev/null
> +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c
> @@ -0,0 +1,28 @@
> +struct sz12_t {
> +	float f1;
> +	float f2;
> +	float f3;
> +};
> +
> +struct sz12_t retsz12(struct sz12_t a)
> +{
> +	return a;
> +}
> +
> +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b)
> +{
> +	struct sz12_t res = {0};
> +	res.f1 = a.f1 + b.f1;
> +	res.f2 = a.f2 + b.f2;
> +	res.f3 = a.f3 + b.f3;
> +	return res;
> +}
> +
> +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c)
> +{
> +	struct sz12_t res = {0};
> +	res.f1 = a.f1 + b.f1 + c.f1;
> +	res.f2 = a.f2 + b.f2 + c.f2;
> +	res.f3 = a.f3 + b.f3 + c.f3;
> +	return res;
> +}
> -- 
> 2.33.1
>

next prev parent reply	other threads:[~2022-06-02  8:52 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 10:24 [Tarantool-patches] [PATCH luajit 0/2] FFI: fix arm64 call for HFA Sergey Kaplun via Tarantool-patches
2021-12-09 10:24 ` [Tarantool-patches] [PATCH luajit 1/2] test: set DYLD_LIBRARY_PATH environment variable Sergey Kaplun via Tarantool-patches
2022-06-02  8:51   ` sergos via Tarantool-patches
2022-06-29  8:31   ` Igor Munkin via Tarantool-patches
2021-12-09 10:24 ` [Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions Sergey Kaplun via Tarantool-patches
2022-06-02  8:51   ` sergos via Tarantool-patches [this message]
2022-06-28 19:05     ` Sergey Kaplun via Tarantool-patches
2022-06-28 19:47       ` Sergey Ostanevich via Tarantool-patches
2022-06-29  8:45       ` Igor Munkin via Tarantool-patches
2022-06-30 12:11 ` [Tarantool-patches] [PATCH luajit 0/2] FFI: fix arm64 call for HFA Igor Munkin via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CF55B3FC-4A3E-4B1F-B42D-E5CDEB6D510F@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=sergos@tarantool.org \
    --cc=skaplun@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox