* [Tarantool-patches] [PATCH luajit 0/5] Various FFI ABI calling conventions fixes
@ 2026-05-30 16:04 Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 1/5] FFI: Unify stack setup for C calls in interpreter Sergey Kaplun via Tarantool-patches
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2026-05-30 16:04 UTC (permalink / raw)
To: Sergey Bronnikov, Evgeniy Temirgaleev; +Cc: tarantool-patches
This patch set provides various improvements to the FFI.
The first two patches add support for the OSX calling conventions for
vararg functions and fix the JIT-compiled code for vararg functions on
macOS arm64. The third patch adds handling for HFA structures that
contain an array. The fourth patch adds many fixups for the x64/arm64
calling conventions. The last patch is a follow-up that fixes the macOS
regression introduced by the fourth patch.
Branch: https://github.com/tarantool/luajit/tree/skaplun/ffi-c-call-conventions
Related issues:
* https://github.com/tarantool/tarantool/issues/12480
* https://github.com/tarantool/tarantool/issues/6097
* https://github.com/LuaJIT/LuaJIT/issues/205
* https://github.com/LuaJIT/LuaJIT/issues/1357
* https://github.com/LuaJIT/LuaJIT/issues/1455
Mike Pall (5):
FFI: Unify stack setup for C calls in interpreter.
FFI/ARM64/OSX: Handle non-standard OSX C calling conventions.
ARM64: Fix pass-by-value struct calling conventions.
FFI: Various ABI and calling convention fixes.
FFI/MacOS: Fix calling convention for enums.
src/lj_asm_arm64.h | 75 +-
src/lj_ccall.c | 133 ++--
src/lj_ccall.h | 13 +-
src/lj_cparse.c | 9 +-
src/lj_crecord.c | 27 +
src/lj_ctype.h | 2 +-
src/vm_arm.dasc | 8 +-
src/vm_arm64.dasc | 8 +-
src/vm_mips.dasc | 1 -
src/vm_mips64.dasc | 1 -
src/vm_ppc.dasc | 3 +-
src/vm_x64.dasc | 8 +-
src/vm_x86.dasc | 22 +-
.../ffi-call-empty-struct.test.lua | 47 ++
test/tarantool-tests/ffi-ccall/CMakeLists.txt | 13 +-
test/tarantool-tests/ffi-ccall/libfficcall.c | 650 ++++++++++++++++++
.../ffi-vector-arguments.test.lua | 62 ++
.../gh-6097-arm64-osx-ffi-vararg.test.lua | 43 ++
...57-arm64-struct-array-pass-by-val.test.lua | 23 +
.../lj-1455-arm64-ffi-ccall-hfa.test.lua | 82 +++
.../lj-1455-bitfield0-a16.test.lua | 27 +
.../lj-1455-ffi-conventions.test.lua | 441 ++++++++++++
.../lj-205-arm64-osx-ffi-enum-arg.test.lua | 63 ++
.../lj-205-arm64-osx-ffi-small-arg.test.lua | 29 +
24 files changed, 1703 insertions(+), 87 deletions(-)
create mode 100644 test/tarantool-tests/ffi-call-empty-struct.test.lua
create mode 100644 test/tarantool-tests/ffi-vector-arguments.test.lua
create mode 100644 test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua
create mode 100644 test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua
create mode 100644 test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua
create mode 100644 test/tarantool-tests/lj-1455-bitfield0-a16.test.lua
create mode 100644 test/tarantool-tests/lj-1455-ffi-conventions.test.lua
create mode 100644 test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua
create mode 100644 test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua
--
2.54.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Tarantool-patches] [PATCH luajit 1/5] FFI: Unify stack setup for C calls in interpreter.
2026-05-30 16:04 [Tarantool-patches] [PATCH luajit 0/5] Various FFI ABI calling conventions fixes Sergey Kaplun via Tarantool-patches
@ 2026-05-30 16:04 ` Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 2/5] FFI/ARM64/OSX: Handle non-standard OSX C calling conventions Sergey Kaplun via Tarantool-patches
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2026-05-30 16:04 UTC (permalink / raw)
To: Sergey Bronnikov, Evgeniy Temirgaleev; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry picked from commit cf903edb30e0cbd620ebd4bac02d4e2b4410fd02)

This patch refactors the FFI CCallState structure. The `nsp` field now
holds the number of bytes occupied on the stack instead of the number of
stack slots. As a side effect, the maximum number of stack slots
available for the callee's arguments decreases from 32 to 31. This
patch is required for the next commit.

Sergey Kaplun:
* added the description for the patch

Part of tarantool/tarantool#12480
---
src/lj_ccall.c | 57 +++++++++++++++++++++++++---------------------
src/lj_ccall.h | 7 +++---
src/vm_arm.dasc | 8 +++----
src/vm_arm64.dasc | 8 +++----
src/vm_mips.dasc | 1 -
src/vm_mips64.dasc | 1 -
src/vm_ppc.dasc | 3 +--
src/vm_x64.dasc | 8 +++----
src/vm_x86.dasc | 22 +++++++++++-------
9 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/src/lj_ccall.c b/src/lj_ccall.c
index c3b27572..394255eb 100644
--- a/src/lj_ccall.c
+++ b/src/lj_ccall.c
@@ -20,12 +20,15 @@
 #if LJ_TARGET_X86
 /* -- x86 calling conventions --------------------------------------------- */

+#define CCALL_PUSH(arg) \
+  *(GPRArg *)((uint8_t *)cc->stack + nsp) = (GPRArg)(arg), nsp += CTSIZE_PTR
+
 #if LJ_ABI_WIN

 #define CCALL_HANDLE_STRUCTRET \
   /* Return structs bigger than 8 by reference (on stack only).
*/ \ cc->retref = (sz > 8); \ - if (cc->retref) cc->stack[nsp++] = (GPRArg)dp; + if (cc->retref) CCALL_PUSH(dp); #define CCALL_HANDLE_COMPLEXRET CCALL_HANDLE_STRUCTRET @@ -40,7 +43,7 @@ if (ngpr < maxgpr) \ cc->gpr[ngpr++] = (GPRArg)dp; \ else \ - cc->stack[nsp++] = (GPRArg)dp; \ + CCALL_PUSH(dp); \ } else { /* Struct with single FP field ends up in FPR. */ \ cc->resx87 = ccall_classify_struct(cts, ctr); \ } @@ -56,7 +59,7 @@ if (ngpr < maxgpr) \ cc->gpr[ngpr++] = (GPRArg)dp; \ else \ - cc->stack[nsp++] = (GPRArg)dp; + CCALL_PUSH(dp); #endif @@ -67,7 +70,7 @@ if (ngpr < maxgpr) \ cc->gpr[ngpr++] = (GPRArg)dp; \ else \ - cc->stack[nsp++] = (GPRArg)dp; \ + CCALL_PUSH(dp); \ } #endif @@ -278,8 +281,8 @@ if (ngpr < maxgpr) { \ dp = &cc->gpr[ngpr]; \ if (ngpr + n > maxgpr) { \ - nsp += ngpr + n - maxgpr; /* Assumes contiguous gpr/stack fields. */ \ - if (nsp > CCALL_MAXSTACK) goto err_nyi; /* Too many arguments. */ \ + nsp += (ngpr + n - maxgpr) * CTSIZE_PTR; /* Assumes contiguous gpr/stack fields. */ \ + if (nsp > CCALL_SIZE_STACK) goto err_nyi; /* Too many arguments. */ \ ngpr = maxgpr; \ } else { \ ngpr += n; \ @@ -471,8 +474,8 @@ if (ngpr < maxgpr) { \ dp = &cc->gpr[ngpr]; \ if (ngpr + n > maxgpr) { \ - nsp += ngpr + n - maxgpr; /* Assumes contiguous gpr/stack fields. */ \ - if (nsp > CCALL_MAXSTACK) goto err_nyi; /* Too many arguments. */ \ + nsp += (ngpr + n - maxgpr) * CTSIZE_PTR; /* Assumes contiguous gpr/stack fields. */ \ + if (nsp > CCALL_SIZE_STACK) goto err_nyi; /* Too many arguments. */ \ ngpr = maxgpr; \ } else { \ ngpr += n; \ @@ -565,8 +568,8 @@ if (ngpr < maxgpr) { \ dp = &cc->gpr[ngpr]; \ if (ngpr + n > maxgpr) { \ - nsp += ngpr + n - maxgpr; /* Assumes contiguous gpr/stack fields. */ \ - if (nsp > CCALL_MAXSTACK) goto err_nyi; /* Too many arguments. */ \ + nsp += (ngpr + n - maxgpr) * CTSIZE_PTR; /* Assumes contiguous gpr/stack fields. */ \ + if (nsp > CCALL_SIZE_STACK) goto err_nyi; /* Too many arguments. 
*/ \ ngpr = maxgpr; \ } else { \ ngpr += n; \ @@ -698,10 +701,11 @@ static int ccall_struct_arg(CCallState *cc, CTState *cts, CType *d, int *rcl, lj_cconv_ct_tv(cts, d, (uint8_t *)dp, o, CCF_ARG(narg)); if (ccall_struct_reg(cc, cts, dp, rcl)) { /* Register overflow? Pass on stack. */ - MSize nsp = cc->nsp, n = rcl[1] ? 2 : 1; - if (nsp + n > CCALL_MAXSTACK) return 1; /* Too many arguments. */ - cc->nsp = nsp + n; - memcpy(&cc->stack[nsp], dp, n*CTSIZE_PTR); + MSize nsp = cc->nsp, sz = rcl[1] ? 2*CTSIZE_PTR : CTSIZE_PTR; + if (nsp + sz > CCALL_SIZE_STACK) + return 1; /* Too many arguments. */ + cc->nsp = nsp + sz; + memcpy((uint8_t *)cc->stack + nsp, dp, sz); } return 0; /* Ok. */ } @@ -1026,22 +1030,23 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, } else { sz = CTSIZE_PTR; } - sz = (sz + CTSIZE_PTR-1) & ~(CTSIZE_PTR-1); - n = sz / CTSIZE_PTR; /* Number of GPRs or stack slots needed. */ + n = (sz + CTSIZE_PTR-1) / CTSIZE_PTR; /* Number of GPRs or stack slots needed. */ CCALL_HANDLE_REGARG /* Handle register arguments. */ /* Otherwise pass argument on stack. */ - if (CCALL_ALIGN_STACKARG && !rp && (d->info & CTF_ALIGN) > CTALIGN_PTR) { - MSize align = (1u << ctype_align(d->info-CTALIGN_PTR)) -1; - nsp = (nsp + align) & ~align; /* Align argument on stack. */ + if (CCALL_ALIGN_STACKARG) { /* Align argument on stack. */ + MSize align = (1u << ctype_align(d->info)) - 1; + if (rp) + align = CTSIZE_PTR-1; + nsp = (nsp + align) & ~align; } - if (nsp + n > CCALL_MAXSTACK) { /* Too many arguments. */ + dp = ((uint8_t *)cc->stack) + nsp; + nsp += n * CTSIZE_PTR; + if (nsp > CCALL_SIZE_STACK) { /* Too many arguments. */ err_nyi: lj_err_caller(L, LJ_ERR_FFI_NYICALL); } - dp = &cc->stack[nsp]; - nsp += n; isva = 0; done: @@ -1103,10 +1108,10 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, #if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP) cc->nfpr = nfpr; /* Required for vararg functions. 
*/ #endif - cc->nsp = nsp; - cc->spadj = (CCALL_SPS_FREE + CCALL_SPS_EXTRA)*CTSIZE_PTR; - if (nsp > CCALL_SPS_FREE) - cc->spadj += (((nsp-CCALL_SPS_FREE)*CTSIZE_PTR + 15u) & ~15u); + cc->nsp = (nsp + CTSIZE_PTR-1) & ~(CTSIZE_PTR-1); + cc->spadj = (CCALL_SPS_FREE + CCALL_SPS_EXTRA) * CTSIZE_PTR; + if (cc->nsp > CCALL_SPS_FREE * CTSIZE_PTR) + cc->spadj += (((cc->nsp - CCALL_SPS_FREE * CTSIZE_PTR) + 15u) & ~15u); return gcsteps; } diff --git a/src/lj_ccall.h b/src/lj_ccall.h index 6efa48c7..af7a8e84 100644 --- a/src/lj_ccall.h +++ b/src/lj_ccall.h @@ -152,14 +152,15 @@ typedef union FPRArg { LJ_STATIC_ASSERT(CCALL_NUM_GPR <= CCALL_MAX_GPR); LJ_STATIC_ASSERT(CCALL_NUM_FPR <= CCALL_MAX_FPR); -#define CCALL_MAXSTACK 32 +#define CCALL_NUM_STACK 31 +#define CCALL_SIZE_STACK (CCALL_NUM_STACK * CTSIZE_PTR) /* -- C call state -------------------------------------------------------- */ typedef LJ_ALIGN(CCALL_ALIGN_CALLSTATE) struct CCallState { void (*func)(void); /* Pointer to called function. */ uint32_t spadj; /* Stack pointer adjustment. */ - uint8_t nsp; /* Number of stack slots. */ + uint8_t nsp; /* Number of bytes on stack. */ uint8_t retref; /* Return value by reference. */ #if LJ_TARGET_X64 uint8_t ngpr; /* Number of arguments in GPRs. */ @@ -178,7 +179,7 @@ typedef LJ_ALIGN(CCALL_ALIGN_CALLSTATE) struct CCallState { FPRArg fpr[CCALL_NUM_FPR]; /* Arguments/results in FPRs. */ #endif GPRArg gpr[CCALL_NUM_GPR]; /* Arguments/results in GPRs. */ - GPRArg stack[CCALL_MAXSTACK]; /* Stack slots. */ + GPRArg stack[CCALL_NUM_STACK]; /* Stack slots. */ } CCallState; /* -- C call handling ----------------------------------------------------- */ diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc index 628c1c24..7ed555f8 100644 --- a/src/vm_arm.dasc +++ b/src/vm_arm.dasc @@ -2513,16 +2513,16 @@ static void build_subroutines(BuildCtx *ctx) |.endif | mov r11, sp | sub sp, sp, CARG1 // Readjust stack. 
- | subs CARG2, CARG2, #1 + | subs CARG2, CARG2, #4 |.if HFABI | vldm RB, {d0-d7} |.endif | ldr RB, CCSTATE->func | bmi >2 |1: // Copy stack slots. - | ldr CARG4, [CARG3, CARG2, lsl #2] - | str CARG4, [sp, CARG2, lsl #2] - | subs CARG2, CARG2, #1 + | ldr CARG4, [CARG3, CARG2] + | str CARG4, [sp, CARG2] + | subs CARG2, CARG2, #4 | bpl <1 |2: | ldrd CARG12, CCSTATE->gpr[0] diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc index c35eaf12..57131140 100644 --- a/src/vm_arm64.dasc +++ b/src/vm_arm64.dasc @@ -2142,14 +2142,14 @@ static void build_subroutines(BuildCtx *ctx) | ldr TMP0w, CCSTATE:x0->spadj | ldrb TMP1w, CCSTATE->nsp | add TMP2, CCSTATE, #offsetof(CCallState, stack) - | subs TMP1, TMP1, #1 + | subs TMP1, TMP1, #8 | ldr TMP3, CCSTATE->func | sub sp, sp, TMP0 | bmi >2 |1: // Copy stack slots - | ldr TMP0, [TMP2, TMP1, lsl #3] - | str TMP0, [sp, TMP1, lsl #3] - | subs TMP1, TMP1, #1 + | ldr TMP0, [TMP2, TMP1] + | str TMP0, [sp, TMP1] + | subs TMP1, TMP1, #8 | bpl <1 |2: | ldp x0, x1, CCSTATE->gpr[0] diff --git a/src/vm_mips.dasc b/src/vm_mips.dasc index 52366b88..4db9308f 100644 --- a/src/vm_mips.dasc +++ b/src/vm_mips.dasc @@ -2836,7 +2836,6 @@ static void build_subroutines(BuildCtx *ctx) | move TMP2, sp | subu sp, sp, TMP1 | sw ra, -4(TMP2) - | sll CARG2, CARG2, 2 | sw r16, -8(TMP2) | sw CCSTATE, -12(TMP2) | move r16, TMP2 diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc index c41b27f4..87e240f7 100644 --- a/src/vm_mips64.dasc +++ b/src/vm_mips64.dasc @@ -2963,7 +2963,6 @@ static void build_subroutines(BuildCtx *ctx) | move TMP2, sp | dsubu sp, sp, TMP1 | sd ra, -8(TMP2) - | sll CARG2, CARG2, 3 | sd r16, -16(TMP2) | sd CCSTATE, -24(TMP2) | move r16, TMP2 diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc index edae7b98..23d6e316 100644 --- a/src/vm_ppc.dasc +++ b/src/vm_ppc.dasc @@ -3275,14 +3275,13 @@ static void build_subroutines(BuildCtx *ctx) | stw TMP0, 4(sp) | cmpwi cr1, CARG3, 0 | mr TMP2, sp - | addic. CARG2, CARG2, -1 + | addic. 
CARG2, CARG2, -4 | stwux sp, sp, TMP1 | crnot 4*cr1+eq, 4*cr1+eq // For vararg calls. | stw r14, -4(TMP2) | stw CCSTATE, -8(TMP2) | mr r14, TMP2 | la TMP1, CCSTATE->stack - | slwi CARG2, CARG2, 2 | blty >2 | la TMP2, 8(sp) |1: diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc index 6ac88d70..7f0da677 100644 --- a/src/vm_x64.dasc +++ b/src/vm_x64.dasc @@ -2773,12 +2773,12 @@ static void build_subroutines(BuildCtx *ctx) | | // Copy stack slots. | movzx ecx, byte CCSTATE->nsp - | sub ecx, 1 + | sub ecx, 8 | js >2 |1: - | mov rax, [CCSTATE+rcx*8+offsetof(CCallState, stack)] - | mov [rsp+rcx*8+CCALL_SPS_EXTRA*8], rax - | sub ecx, 1 + | mov rax, [CCSTATE+rcx+offsetof(CCallState, stack)] + | mov [rsp+rcx+CCALL_SPS_EXTRA*8], rax + | sub ecx, 8 | jns <1 |2: | diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc index d9234f3b..98546550 100644 --- a/src/vm_x86.dasc +++ b/src/vm_x86.dasc @@ -3348,19 +3348,25 @@ static void build_subroutines(BuildCtx *ctx) | | // Copy stack slots. | movzx ecx, byte CCSTATE->nsp - | sub ecx, 1 + |.if X64 + | sub ecx, 8 | js >2 |1: - |.if X64 - | mov rax, [CCSTATE+rcx*8+offsetof(CCallState, stack)] - | mov [rsp+rcx*8+CCALL_SPS_EXTRA*8], rax + | mov rax, [CCSTATE+rcx+offsetof(CCallState, stack)] + | mov [rsp+rcx+CCALL_SPS_EXTRA*8], rax + | sub ecx, 8 + | jns <1 + |2: |.else - | mov eax, [CCSTATE+ecx*4+offsetof(CCallState, stack)] - | mov [esp+ecx*4], eax - |.endif - | sub ecx, 1 + | sub ecx, 4 + | js >2 + |1: + | mov eax, [CCSTATE+ecx+offsetof(CCallState, stack)] + | mov [esp+ecx], eax + | sub ecx, 4 | jns <1 |2: + |.endif | |.if X64 | movzx eax, byte CCSTATE->nfpr -- 2.54.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
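The byte-based accounting described in the commit message above can be sketched in isolation. This is only an illustration under stated assumptions: `PTR_SIZE`, `NUM_STACK`, `SIZE_STACK`, `call_state_t` and `push_stack_arg` are hypothetical names, not LuaJIT identifiers, and the pointer size is fixed at 8 bytes. It follows the align-then-bump pattern visible in the `ccall_set_args` hunk. It also shows one plausible reason for the 32 to 31 change: a `uint8_t` byte counter cannot represent 32 * 8 = 256, while 31 * 8 = 248 fits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for CTSIZE_PTR, CCALL_NUM_STACK, CCALL_SIZE_STACK. */
#define PTR_SIZE   8u
#define NUM_STACK  31u
#define SIZE_STACK (NUM_STACK * PTR_SIZE)  /* 248 bytes, fits into uint8_t. */

/* Simplified call state: only the stack part is modelled here. */
typedef struct call_state_t {
  uint8_t nsp;                /* Number of bytes used on the stack. */
  uint64_t stack[NUM_STACK];  /* Outgoing stack area. */
} call_state_t;

/*
** Copy an argument of `sz` bytes to the outgoing stack area, aligned to
** `align` bytes (a power of two). Returns the byte offset of the argument
** or -1 if there is no room left (too many arguments).
*/
static int push_stack_arg(call_state_t *cs, const void *arg,
                          unsigned sz, unsigned align)
{
  unsigned nsp = cs->nsp;
  unsigned n = (sz + PTR_SIZE - 1) / PTR_SIZE;  /* Stack slots needed. */
  unsigned ofs;
  assert(align != 0 && (align & (align - 1)) == 0);
  nsp = (nsp + (align - 1)) & ~(align - 1);     /* Align the byte offset. */
  ofs = nsp;
  nsp += n * PTR_SIZE;                          /* Bump by whole slots. */
  if (nsp > SIZE_STACK)
    return -1;                                  /* Too many arguments. */
  memcpy((uint8_t *)cs->stack + ofs, arg, sz);
  cs->nsp = (uint8_t)nsp;                       /* At most 248, no overflow. */
  return (int)ofs;
}
```

With a zero-initialized `call_state_t`, pushing 8-byte values returns the offsets 0, 8, 16 and so on. The 32nd push is rejected, because 256 bytes would exceed `SIZE_STACK`.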
* [Tarantool-patches] [PATCH luajit 2/5] FFI/ARM64/OSX: Handle non-standard OSX C calling conventions.
2026-05-30 16:04 [Tarantool-patches] [PATCH luajit 0/5] Various FFI ABI calling conventions fixes Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 1/5] FFI: Unify stack setup for C calls in interpreter Sergey Kaplun via Tarantool-patches
@ 2026-05-30 16:04 ` Sergey Kaplun via Tarantool-patches
2026-06-01 11:40 ` Sergey Bronnikov via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 3/5] ARM64: Fix pass-by-value struct " Sergey Kaplun via Tarantool-patches
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2026-05-30 16:04 UTC (permalink / raw)
To: Sergey Bronnikov, Evgeniy Temirgaleev; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Peter Cawley.

(cherry picked from commit 83954100dba9fc0cf5eeaf122f007df35ec9a604)

This patch adds FFI support for passing small (< 8 bytes) parameters on
the stack on the OSX arm64 architecture. Also, it fixes the compilation
of FFI vararg functions, since before the patch arguments were passed in
registers instead of the stack [1] for them. JIT machinery now uses
`TREF_NIL` as a marker for the slot from which the variadic arguments
begin.

[1]: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Update-code-that-passes-arguments-to-variadic-functions

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#6097
Part of tarantool/tarantool#12480
---
src/lj_asm_arm64.h | 75 +++++++++++++++----
src/lj_ccall.c | 11 ++-
src/lj_ccall.h | 6 ++
src/lj_crecord.c | 27 +++++++
test/tarantool-tests/ffi-ccall/CMakeLists.txt | 8 +-
test/tarantool-tests/ffi-ccall/libfficcall.c | 51 +++++++++++++
.../gh-6097-arm64-osx-ffi-vararg.test.lua | 43 +++++++++++
.../lj-205-arm64-osx-ffi-enum-arg.test.lua | 63 ++++++++++++++++
.../lj-205-arm64-osx-ffi-small-arg.test.lua | 29 +++++++
9 files changed, 291 insertions(+), 22 deletions(-)
create mode 100644 test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua
create mode 100644 test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua
create mode 100644 test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua

diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
index 313b4a96..f731ab05 100644
--- a/src/lj_asm_arm64.h
+++ b/src/lj_asm_arm64.h
@@ -416,7 +416,7 @@ static int asm_fuseorshift(ASMState *as, IRIns *ir)
 static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
 {
   uint32_t n, nargs = CCI_XNARGS(ci);
-  int32_t ofs = 0;
+  int32_t spofs = 0, spalign = LJ_HASFFI && LJ_TARGET_OSX ? 0 : 7;
   Reg gpr, fpr = REGARG_FIRSTFPR;
   if ((void *)ci->func)
     emit_call(as, (void *)ci->func);
@@ -435,8 +435,14 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
           fpr++;
         } else {
           Reg r = ra_alloc1(as, ref, RSET_FPR);
-          emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isnum(ir->t)) ? 4 : 0));
-          ofs += 8;
+          int32_t al = spalign;
+#if LJ_HASFFI && LJ_TARGET_OSX
+          al |= irt_isnum(ir->t) ?
7 : 3; +#endif + spofs = (spofs + al) & ~al; + if (LJ_BE && al >= 7 && !irt_isnum(ir->t)) spofs += 4, al -= 4; + emit_spstore(as, ir, r, spofs); + spofs += al + 1; } } else { if (gpr <= REGARG_LASTGPR) { @@ -446,10 +452,27 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args) gpr++; } else { Reg r = ra_alloc1(as, ref, RSET_GPR); - emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_is64(ir->t)) ? 4 : 0)); - ofs += 8; + int32_t al = spalign; +#if LJ_HASFFI && LJ_TARGET_OSX + al |= irt_size(ir->t) - 1; +#endif + spofs = (spofs + al) & ~al; + if (al >= 3) { + if (LJ_BE && al >= 7 && !irt_is64(ir->t)) spofs += 4, al -= 4; + emit_spstore(as, ir, r, spofs); + } else { + lj_assertA(al == 0 || al == 1, "size %d unexpected", al + 1); + emit_lso(as, al ? A64I_STRH : A64I_STRB, r, RID_SP, spofs); + } + spofs += al + 1; } } +#if LJ_HASFFI && LJ_TARGET_OSX + } else { /* Marker for start of varargs. */ + gpr = REGARG_LASTGPR+1; + fpr = REGARG_LASTFPR+1; + spalign = 7; +#endif } } } @@ -1928,19 +1951,41 @@ static void asm_tail_prep(ASMState *as) /* Ensure there are enough stack slots for call arguments. */ static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci) { - IRRef args[CCI_NARGS_MAX*2]; +#if LJ_HASFFI uint32_t i, nargs = CCI_XNARGS(ci); - int nslots = 0, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR; - asm_collectargs(as, ir, ci, args); - for (i = 0; i < nargs; i++) { - if (args[i] && irt_isfp(IR(args[i])->t)) { - if (nfpr > 0) nfpr--; else nslots += 2; - } else { - if (ngpr > 0) ngpr--; else nslots += 2; + if (nargs > (REGARG_NUMGPR < REGARG_NUMFPR ? REGARG_NUMGPR : REGARG_NUMFPR) || + (LJ_TARGET_OSX && (ci->flags & CCI_VARARG))) { + IRRef args[CCI_NARGS_MAX*2]; + int ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR; + int spofs = 0, spalign = LJ_TARGET_OSX ? 0 : 7, nslots; + asm_collectargs(as, ir, ci, args); + for (i = 0; i < nargs; i++) { + int al = spalign; + if (!args[i]) { +#if LJ_TARGET_OSX + /* Marker for start of varaargs. 
*/ + nfpr = 0; + ngpr = 0; + spalign = 7; +#endif + } else if (irt_isfp(IR(args[i])->t)) { + if (nfpr > 0) { nfpr--; continue; } +#if LJ_TARGET_OSX + al |= irt_isnum(IR(args[i])->t) ? 7 : 3; +#endif + } else { + if (ngpr > 0) { ngpr--; continue; } +#if LJ_TARGET_OSX + al |= irt_size(IR(args[i])->t) - 1; +#endif + } + spofs = (spofs + 2*al+1) & ~al; /* Align and bump stack pointer. */ } + nslots = (spofs + 3) >> 2; + if (nslots > as->evenspill) /* Leave room for args in stack slots. */ + as->evenspill = nslots; } - if (nslots > as->evenspill) /* Leave room for args in stack slots. */ - as->evenspill = nslots; +#endif return REGSP_HINT(RID_RET); } diff --git a/src/lj_ccall.c b/src/lj_ccall.c index 394255eb..b2705de5 100644 --- a/src/lj_ccall.c +++ b/src/lj_ccall.c @@ -348,7 +348,6 @@ goto done; \ } else { \ nfpr = CCALL_NARG_FPR; /* Prevent reordering. */ \ - if (LJ_TARGET_OSX && d->size < 8) goto err_nyi; \ } \ } else { /* Try to pass argument in GPRs. */ \ if (!LJ_TARGET_OSX && (d->info & CTF_ALIGN) > CTALIGN_PTR) \ @@ -359,7 +358,6 @@ goto done; \ } else { \ ngpr = maxgpr; /* Prevent reordering. */ \ - if (LJ_TARGET_OSX && d->size < 8) goto err_nyi; \ } \ } @@ -1027,7 +1025,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, CCALL_HANDLE_STRUCTARG } else if (ctype_iscomplex(d->info)) { CCALL_HANDLE_COMPLEXARG - } else { + } else if (!(CCALL_PACK_STACKARG && ctype_isenum(d->info))) { sz = CTSIZE_PTR; } n = (sz + CTSIZE_PTR-1) / CTSIZE_PTR; /* Number of GPRs or stack slots needed. */ @@ -1037,12 +1035,12 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, /* Otherwise pass argument on stack. */ if (CCALL_ALIGN_STACKARG) { /* Align argument on stack. */ MSize align = (1u << ctype_align(d->info)) - 1; - if (rp) + if (rp || (CCALL_PACK_STACKARG && isva && align < CTSIZE_PTR-1)) align = CTSIZE_PTR-1; nsp = (nsp + align) & ~align; } dp = ((uint8_t *)cc->stack) + nsp; - nsp += n * CTSIZE_PTR; + nsp += CCALL_PACK_STACKARG ? 
sz : n * CTSIZE_PTR; if (nsp > CCALL_SIZE_STACK) { /* Too many arguments. */ err_nyi: lj_err_caller(L, LJ_ERR_FFI_NYICALL); @@ -1057,7 +1055,8 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, } lj_cconv_ct_tv(cts, d, (uint8_t *)dp, o, CCF_ARG(narg)); /* Extend passed integers to 32 bits at least. */ - if (ctype_isinteger_or_bool(d->info) && d->size < 4) { + if (ctype_isinteger_or_bool(d->info) && d->size < 4 && + (!CCALL_PACK_STACKARG || !((uintptr_t)dp & 3))) { /* Assumes LJ_LE. */ if (d->info & CTF_UNSIGNED) *(uint32_t *)dp = d->size == 1 ? (uint32_t)*(uint8_t *)dp : (uint32_t)*(uint16_t *)dp; diff --git a/src/lj_ccall.h b/src/lj_ccall.h index af7a8e84..10d93b65 100644 --- a/src/lj_ccall.h +++ b/src/lj_ccall.h @@ -75,6 +75,9 @@ typedef union FPRArg { #define CCALL_NARG_FPR 8 #define CCALL_NRET_FPR 4 #define CCALL_SPS_FREE 0 +#if LJ_TARGET_OSX +#define CCALL_PACK_STACKARG 1 +#endif typedef intptr_t GPRArg; typedef union FPRArg { @@ -139,6 +142,9 @@ typedef union FPRArg { #ifndef CCALL_ALIGN_STACKARG #define CCALL_ALIGN_STACKARG 1 #endif +#ifndef CCALL_PACK_STACKARG +#define CCALL_PACK_STACKARG 0 +#endif #ifndef CCALL_ALIGN_CALLSTATE #define CCALL_ALIGN_CALLSTATE 8 #endif diff --git a/src/lj_crecord.c b/src/lj_crecord.c index d486ee85..7d9421a6 100644 --- a/src/lj_crecord.c +++ b/src/lj_crecord.c @@ -1122,6 +1122,12 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, ngpr = 1; else if (ctype_cconv(info) == CTCC_FASTCALL) ngpr = 2; +#elif LJ_TARGET_ARM64 +#if LJ_ABI_WIN +#error "NYI: ARM64 Windows ABI calling conventions" +#elif LJ_TARGET_OSX + int ngpr = CCALL_NARG_GPR; +#endif #endif /* Skip initial attributes. */ @@ -1147,6 +1153,14 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, } else { if (!(info & CTF_VARARG)) lj_trace_err(J, LJ_TRERR_NYICALL); /* Too many arguments. */ +#if LJ_TARGET_ARM64 && LJ_TARGET_OSX + if (ngpr >= 0) { + ngpr = -1; + args[n++] = TREF_NIL; /* Marker for start of varargs. 
*/ + if (n >= CCI_NARGS_MAX) + lj_trace_err(J, LJ_TRERR_NYICALL); + } +#endif did = lj_ccall_ctid_vararg(cts, o); /* Infer vararg type. */ } d = ctype_raw(cts, did); @@ -1155,6 +1169,15 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, lj_trace_err(J, LJ_TRERR_NYICALL); tr = crec_ct_tv(J, d, 0, *base, o); if (ctype_isinteger_or_bool(d->info)) { +#if LJ_TARGET_ARM64 && LJ_TARGET_OSX + if (!ngpr) { + /* Fixed args passed on the stack use their unpromoted size. */ + if (d->size != lj_ir_type_size[tref_type(tr)]) { + lj_assertJ(d->size == 1 || d->size==2, "unexpected size %d", d->size); + tr = emitconv(tr, d->size==1 ? IRT_U8 : IRT_U16, tref_type(tr), 0); + } + } else +#endif if (d->size < 4) { if ((d->info & CTF_UNSIGNED)) tr = emitconv(tr, IRT_INT, d->size==1 ? IRT_U8 : IRT_U16, 0); @@ -1192,6 +1215,10 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, } } #endif +#elif LJ_TARGET_ARM64 && LJ_TARGET_OSX + if (!ctype_isfp(d->info) && ngpr) { + ngpr--; + } #endif args[n] = tr; } diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt index 8acd8fe4..27de07ac 100644 --- a/test/tarantool-tests/ffi-ccall/CMakeLists.txt +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt @@ -1 +1,7 @@ -BuildTestCLib(libfficcall libfficcall.c ffi-ccall-arm64-fp-convention.test.lua) +list(APPEND tests + ffi-ccall-arm64-fp-convention.test.lua + lj-205-arm64-osx-ffi-enum-arg.test.lua + lj-205-arm64-osx-ffi-small-arg.test.lua +) + +BuildTestCLib(libfficcall libfficcall.c "${tests}") diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c index 6c23f7d1..fd2d4711 100644 --- a/test/tarantool-tests/ffi-ccall/libfficcall.c +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c @@ -1,3 +1,5 @@ +#include <stdint.h> + struct sz12_t { float f1; float f2; @@ -26,3 +28,52 @@ struct sz12_t sum3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c) res.f3 = a.f3 + b.f3 + c.f3; return res; 
} + +/****************************************************************/ +/* Enums. */ +/****************************************************************/ + +typedef enum { + E1 = 1, + E2 = 2, + E3 = 3, + E4 = 4, + E5 = 5, + E6 = 6, + E7 = 7, + E8 = 8, + E9 = 9, + E10 = 10, + E11 = 11 +} enum_t; + +int test_enum_reg(enum_t e1, enum_t e2, enum_t e3) +{ + return e1 + e2 + e3; +} + +int test_enum_stack(enum_t e1, enum_t e2, enum_t e3, enum_t e4, enum_t e5, + enum_t e6, enum_t e7, enum_t e8, enum_t e9, enum_t e10, + enum_t e11) +{ + return e1 + e2 + e3 + e4 + e5 + e6 + e7 + e8 + e9 + e10 + e11; +} + +/****************************************************************/ +/* Basic types (< 8 bytes). */ +/****************************************************************/ + +uint8_t test_u8_stack(uint8_t u1, uint8_t u2, uint8_t u3, uint8_t u4, + uint8_t u5, uint8_t u6, uint8_t u7, uint8_t u8, + uint8_t u9, uint8_t u10, uint8_t u11) +{ + return u1 + u2 + u3 + u4 + u5 + u6 + u7 + u8 + u9 + u10 + u11; +} + +float test_float_stack(float f1, float f2, float f3, float f4, float f5, + float f6, float f7, float f8, float f9, float f10, + float f11) +{ + return f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 + f9 + f10 + f11; +} + diff --git a/test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua b/test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua new file mode 100644 index 00000000..fc44d253 --- /dev/null +++ b/test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua @@ -0,0 +1,43 @@ +local tap = require('tap') + +-- The test file to demonstrate LuaJIT incorrect FFI vararg call +-- on macOS M1. +-- See also: https://github.com/tarantool/tarantool/issues/6097. 
+local test = tap.test('gh-6097-arm64-osx-ffi-vararg'):skipcond({ + ['Test requires JIT enabled'] = not jit.status(), +}) + +test:plan(4) + +local ffi = require('ffi') + +ffi.cdef('int sprintf(char *str, const char *format, ...)') + +local EXPECTED = '1' +local EXPECTED_LEN = #EXPECTED + +local str = ffi.new(string.format('char[256]')) + +jit.opt.start('hotloop=1') + +local results = {} +for i = 1, 4 do + local strlen = ffi.C.sprintf(str, '%d', 1LL) + assert(strlen == EXPECTED_LEN, 'correct string length for result') + results[i] = ffi.string(str) +end + +test:is(results[1], EXPECTED, 'correct result of FFI vararg call for int') +test:samevalues(results, 'consistent behaviour JIT and VM for vararg int arg') + +results = {} +for i = 1, 4 do + local strlen = ffi.C.sprintf(str, '%c', ffi.new('char', string.byte('1'))) + assert(strlen == EXPECTED_LEN, 'correct string length for result') + results[i] = ffi.string(str) +end + +test:is(results[1], EXPECTED, 'correct result of FFI vararg call for char') +test:samevalues(results, 'consistent behaviour JIT and VM for vararg char arg') + +test:done(true) diff --git a/test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua b/test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua new file mode 100644 index 00000000..4ba4f69d --- /dev/null +++ b/test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua @@ -0,0 +1,63 @@ +local ffi = require('ffi') +local tap = require('tap') + +local ffi_ccall = ffi.load('libfficcall') + +-- The test file to check the FFI call for enum arguments. +-- See also: https://github.com/LuaJIT/LuaJIT/issues/205. 
+local test = tap.test('lj-205-arm64-osx-ffi-enum-arg'):skipcond({ + ['Test requires JIT enabled'] = not jit.status(), +}) + +test:plan(4) + +ffi.cdef[[ + int sprintf(char *str, const char *format, ...); + + typedef enum { + E1 = 1, + E2 = 2, + E3 = 3, + E4 = 4, + E5 = 5, + E6 = 6, + E7 = 7, + E8 = 8, + E9 = 9, + E10 = 10, + E11 = 11 + } enum_t; + + int test_enum_reg(enum_t e1, enum_t e2, enum_t e3); + + int test_enum_stack(enum_t e1, enum_t e2, enum_t e3, enum_t e4, enum_t e5, + enum_t e6, enum_t e7, enum_t e8, enum_t e9, enum_t e10, + enum_t e11); +]] + + +local str = ffi.new(string.format('char[256]')) + +jit.opt.start('hotloop=1') + +local enum_t = ffi.typeof('enum_t') + +local results = {} +for i = 1, 4 do + local strlen = ffi.C.sprintf(str, '%d', enum_t(1)) + assert(strlen == 1, 'correct string length for result') + results[i] = ffi.string(str) +end + +test:is(results[1], '1', 'correct result of FFI vararg call for enum') +test:samevalues(results, 'consistent behaviour JIT and VM for vararg enum arg') + +test:is(ffi_ccall.test_enum_reg(enum_t(1), enum_t(2), enum_t(3)), 6, + 'correct enum reg pass') + +test:is(ffi_ccall.test_enum_stack(enum_t(1), enum_t(2), enum_t(3), enum_t(4), + enum_t(5), enum_t(6), enum_t(7), enum_t(8), + enum_t(9), enum_t(10), enum_t(11)), + 66, 'correct enum stack pass') + +test:done(true) diff --git a/test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua b/test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua new file mode 100644 index 00000000..be60de93 --- /dev/null +++ b/test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua @@ -0,0 +1,29 @@ +local ffi = require('ffi') +local tap = require('tap') + +local ffi_ccall = ffi.load('libfficcall') + +-- The test file to check the FFI call for small (<8 bytes) +-- arguments give on stack. +-- See also: https://github.com/LuaJIT/LuaJIT/issues/205. 
+local test = tap.test('lj-205-arm64-osx-ffi-small-arg') +test:plan(2) + +ffi.cdef[[ + uint8_t test_u8_stack(uint8_t u1, uint8_t u2, uint8_t u3, uint8_t u4, + uint8_t u5, uint8_t u6, uint8_t u7, uint8_t u8, + uint8_t u9, uint8_t u10, uint8_t u11); + + float test_float_stack(float f1, float f2, float f3, float f4, float f5, + float f6, float f7, float f8, float f9, float f10, + float f11); +]] + +test:is(ffi_ccall.test_u8_stack(1ULL, 2ULL, 3ULL, 4ULL, 5ULL, 6ULL, 7ULL, + 8ULL, 9ULL, 10ULL, 11ULL), + 66, 'correct uint8_t stack pass') + +test:is(ffi_ccall.test_float_stack(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), 66, + 'correct float stack pass') + +test:done(true) -- 2.54.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
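The stack offset arithmetic used for packed arguments in the `asm_gencall` hunk above can be reproduced in a few lines. This is a minimal sketch, not the patch's code: `place_stack_arg` is a hypothetical helper, `spalign` plays the role of the variable with the same name in the patch (0 for fixed arguments on Apple arm64, 7 for 8-byte slots, including everything after the varargs marker), and only the power-of-two sizes 1, 2, 4 and 8 are assumed.

```c
#include <assert.h>
#include <stddef.h>

/*
** Compute the stack offset for one argument and advance the running
** offset `*spofs`. `size` is the argument size in bytes (1, 2, 4 or 8).
** `spalign` is the minimum alignment mask: 0 packs arguments at their
** natural alignment, 7 forces every argument into its own 8-byte slot.
*/
static size_t place_stack_arg(size_t *spofs, size_t size, size_t spalign)
{
  size_t al, ofs;
  assert(size != 0 && (size & (size - 1)) == 0 && size <= 8);
  al = spalign | (size - 1);        /* Effective alignment mask. */
  ofs = (*spofs + al) & ~al;        /* Align the current offset. */
  *spofs = ofs + al + 1;            /* Bump past the argument. */
  return ofs;
}
```

With `spalign = 0`, three one-byte arguments followed by a four-byte one land at offsets 0, 1, 2 and 4. With `spalign = 7`, the same one-byte arguments land at 0, 8 and 16, which matches the 8-byte slot layout used for variadic arguments.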
* Re: [Tarantool-patches] [PATCH luajit 2/5] FFI/ARM64/OSX: Handle non-standard OSX C calling conventions.
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 2/5] FFI/ARM64/OSX: Handle non-standard OSX C calling conventions Sergey Kaplun via Tarantool-patches
@ 2026-06-01 11:40 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 10+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2026-06-01 11:40 UTC (permalink / raw)
To: Sergey Kaplun, Evgeniy Temirgaleev; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 18638 bytes --]

Hi, Sergey,

thanks for the patch! LGTM with two minor comments.

Sergey

On 5/30/26 19:04, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Peter Cawley.
>
> (cherry picked from commit 83954100dba9fc0cf5eeaf122f007df35ec9a604)
>
> This patch adds FFI support for passing small (< 8 bytes) parameters on
> the stack on the OSX arm64 architecture. Also, it fixes the compilation
> of FFI vararg functions, since before the patch arguments were passed in
> registers instead of the stack [1] for them. JIT machinery now uses
> `TREF_NIL` as a marker for the slot from which the variadic arguments
> begin.
> > [1]:https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Update-code-that-passes-arguments-to-variadic-functions > > Sergey Kaplun: > * added the description and the test for the problem > > Resolves tarantool/tarantool#6097 > Part of tarantool/tarantool#12480 > --- > src/lj_asm_arm64.h | 75 +++++++++++++++---- > src/lj_ccall.c | 11 ++- > src/lj_ccall.h | 6 ++ > src/lj_crecord.c | 27 +++++++ > test/tarantool-tests/ffi-ccall/CMakeLists.txt | 8 +- > test/tarantool-tests/ffi-ccall/libfficcall.c | 51 +++++++++++++ > .../gh-6097-arm64-osx-ffi-vararg.test.lua | 43 +++++++++++ > .../lj-205-arm64-osx-ffi-enum-arg.test.lua | 63 ++++++++++++++++ > .../lj-205-arm64-osx-ffi-small-arg.test.lua | 29 +++++++ > 9 files changed, 291 insertions(+), 22 deletions(-) > create mode 100644 test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua > create mode 100644 test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua > create mode 100644 test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua > > diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h > index 313b4a96..f731ab05 100644 > --- a/src/lj_asm_arm64.h > +++ b/src/lj_asm_arm64.h > @@ -416,7 +416,7 @@ static int asm_fuseorshift(ASMState *as, IRIns *ir) > static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args) > { > uint32_t n, nargs = CCI_XNARGS(ci); > - int32_t ofs = 0; > + int32_t spofs = 0, spalign = LJ_HASFFI && LJ_TARGET_OSX ? 0 : 7; > Reg gpr, fpr = REGARG_FIRSTFPR; > if ((void *)ci->func) > emit_call(as, (void *)ci->func); > @@ -435,8 +435,14 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args) > fpr++; > } else { > Reg r = ra_alloc1(as, ref, RSET_FPR); > - emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isnum(ir->t)) ? 4 : 0)); > - ofs += 8; > + int32_t al = spalign; > +#if LJ_HASFFI && LJ_TARGET_OSX > + al |= irt_isnum(ir->t) ? 
7 : 3; > +#endif > + spofs = (spofs + al) & ~al; > + if (LJ_BE && al >= 7 && !irt_isnum(ir->t)) spofs += 4, al -= 4; > + emit_spstore(as, ir, r, spofs); > + spofs += al + 1; > } > } else { > if (gpr <= REGARG_LASTGPR) { > @@ -446,10 +452,27 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args) > gpr++; > } else { > Reg r = ra_alloc1(as, ref, RSET_GPR); > - emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_is64(ir->t)) ? 4 : 0)); > - ofs += 8; > + int32_t al = spalign; > +#if LJ_HASFFI && LJ_TARGET_OSX > + al |= irt_size(ir->t) - 1; > +#endif > + spofs = (spofs + al) & ~al; > + if (al >= 3) { > + if (LJ_BE && al >= 7 && !irt_is64(ir->t)) spofs += 4, al -= 4; > + emit_spstore(as, ir, r, spofs); > + } else { > + lj_assertA(al == 0 || al == 1, "size %d unexpected", al + 1); > + emit_lso(as, al ? A64I_STRH : A64I_STRB, r, RID_SP, spofs); > + } > + spofs += al + 1; > } > } > +#if LJ_HASFFI && LJ_TARGET_OSX > + } else { /* Marker for start of varargs. */ > + gpr = REGARG_LASTGPR+1; > + fpr = REGARG_LASTFPR+1; > + spalign = 7; > +#endif > } > } > } > @@ -1928,19 +1951,41 @@ static void asm_tail_prep(ASMState *as) > /* Ensure there are enough stack slots for call arguments. */ > static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci) > { > - IRRef args[CCI_NARGS_MAX*2]; > +#if LJ_HASFFI > uint32_t i, nargs = CCI_XNARGS(ci); > - int nslots = 0, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR; > - asm_collectargs(as, ir, ci, args); > - for (i = 0; i < nargs; i++) { > - if (args[i] && irt_isfp(IR(args[i])->t)) { > - if (nfpr > 0) nfpr--; else nslots += 2; > - } else { > - if (ngpr > 0) ngpr--; else nslots += 2; > + if (nargs > (REGARG_NUMGPR < REGARG_NUMFPR ? REGARG_NUMGPR : REGARG_NUMFPR) || > + (LJ_TARGET_OSX && (ci->flags & CCI_VARARG))) { > + IRRef args[CCI_NARGS_MAX*2]; > + int ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR; > + int spofs = 0, spalign = LJ_TARGET_OSX ? 
0 : 7, nslots; > + asm_collectargs(as, ir, ci, args); > + for (i = 0; i < nargs; i++) { > + int al = spalign; > + if (!args[i]) { > +#if LJ_TARGET_OSX > + /* Marker for start of varaargs. */ > + nfpr = 0; > + ngpr = 0; > + spalign = 7; > +#endif > + } else if (irt_isfp(IR(args[i])->t)) { > + if (nfpr > 0) { nfpr--; continue; } > +#if LJ_TARGET_OSX > + al |= irt_isnum(IR(args[i])->t) ? 7 : 3; > +#endif > + } else { > + if (ngpr > 0) { ngpr--; continue; } > +#if LJ_TARGET_OSX > + al |= irt_size(IR(args[i])->t) - 1; > +#endif > + } > + spofs = (spofs + 2*al+1) & ~al; /* Align and bump stack pointer. */ > } > + nslots = (spofs + 3) >> 2; > + if (nslots > as->evenspill) /* Leave room for args in stack slots. */ > + as->evenspill = nslots; > } > - if (nslots > as->evenspill) /* Leave room for args in stack slots. */ > - as->evenspill = nslots; > +#endif > return REGSP_HINT(RID_RET); > } > > diff --git a/src/lj_ccall.c b/src/lj_ccall.c > index 394255eb..b2705de5 100644 > --- a/src/lj_ccall.c > +++ b/src/lj_ccall.c > @@ -348,7 +348,6 @@ > goto done; \ > } else { \ > nfpr = CCALL_NARG_FPR; /* Prevent reordering. */ \ > - if (LJ_TARGET_OSX && d->size < 8) goto err_nyi; \ > } \ > } else { /* Try to pass argument in GPRs. */ \ > if (!LJ_TARGET_OSX && (d->info & CTF_ALIGN) > CTALIGN_PTR) \ > @@ -359,7 +358,6 @@ > goto done; \ > } else { \ > ngpr = maxgpr; /* Prevent reordering. */ \ > - if (LJ_TARGET_OSX && d->size < 8) goto err_nyi; \ > } \ > } > > @@ -1027,7 +1025,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, > CCALL_HANDLE_STRUCTARG > } else if (ctype_iscomplex(d->info)) { > CCALL_HANDLE_COMPLEXARG > - } else { > + } else if (!(CCALL_PACK_STACKARG && ctype_isenum(d->info))) { > sz = CTSIZE_PTR; > } > n = (sz + CTSIZE_PTR-1) / CTSIZE_PTR; /* Number of GPRs or stack slots needed. */ > @@ -1037,12 +1035,12 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, > /* Otherwise pass argument on stack. 
*/ > if (CCALL_ALIGN_STACKARG) { /* Align argument on stack. */ > MSize align = (1u << ctype_align(d->info)) - 1; > - if (rp) > + if (rp || (CCALL_PACK_STACKARG && isva && align < CTSIZE_PTR-1)) > align = CTSIZE_PTR-1; > nsp = (nsp + align) & ~align; > } > dp = ((uint8_t *)cc->stack) + nsp; > - nsp += n * CTSIZE_PTR; > + nsp += CCALL_PACK_STACKARG ? sz : n * CTSIZE_PTR; > if (nsp > CCALL_SIZE_STACK) { /* Too many arguments. */ > err_nyi: > lj_err_caller(L, LJ_ERR_FFI_NYICALL); > @@ -1057,7 +1055,8 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, > } > lj_cconv_ct_tv(cts, d, (uint8_t *)dp, o, CCF_ARG(narg)); > /* Extend passed integers to 32 bits at least. */ > - if (ctype_isinteger_or_bool(d->info) && d->size < 4) { > + if (ctype_isinteger_or_bool(d->info) && d->size < 4 && > + (!CCALL_PACK_STACKARG || !((uintptr_t)dp & 3))) { /* Assumes LJ_LE. */ > if (d->info & CTF_UNSIGNED) > *(uint32_t *)dp = d->size == 1 ? (uint32_t)*(uint8_t *)dp : > (uint32_t)*(uint16_t *)dp; > diff --git a/src/lj_ccall.h b/src/lj_ccall.h > index af7a8e84..10d93b65 100644 > --- a/src/lj_ccall.h > +++ b/src/lj_ccall.h > @@ -75,6 +75,9 @@ typedef union FPRArg { > #define CCALL_NARG_FPR 8 > #define CCALL_NRET_FPR 4 > #define CCALL_SPS_FREE 0 > +#if LJ_TARGET_OSX > +#define CCALL_PACK_STACKARG 1 > +#endif > > typedef intptr_t GPRArg; > typedef union FPRArg { > @@ -139,6 +142,9 @@ typedef union FPRArg { > #ifndef CCALL_ALIGN_STACKARG > #define CCALL_ALIGN_STACKARG 1 > #endif > +#ifndef CCALL_PACK_STACKARG > +#define CCALL_PACK_STACKARG 0 > +#endif > #ifndef CCALL_ALIGN_CALLSTATE > #define CCALL_ALIGN_CALLSTATE 8 > #endif > diff --git a/src/lj_crecord.c b/src/lj_crecord.c > index d486ee85..7d9421a6 100644 > --- a/src/lj_crecord.c > +++ b/src/lj_crecord.c > @@ -1122,6 +1122,12 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, > ngpr = 1; > else if (ctype_cconv(info) == CTCC_FASTCALL) > ngpr = 2; > +#elif LJ_TARGET_ARM64 > +#if LJ_ABI_WIN > +#error "NYI: ARM64 Windows 
ABI calling conventions" > +#elif LJ_TARGET_OSX > + int ngpr = CCALL_NARG_GPR; > +#endif > #endif > > /* Skip initial attributes. */ > @@ -1147,6 +1153,14 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, > } else { > if (!(info & CTF_VARARG)) > lj_trace_err(J, LJ_TRERR_NYICALL); /* Too many arguments. */ > +#if LJ_TARGET_ARM64 && LJ_TARGET_OSX > + if (ngpr >= 0) { > + ngpr = -1; > + args[n++] = TREF_NIL; /* Marker for start of varargs. */ > + if (n >= CCI_NARGS_MAX) > + lj_trace_err(J, LJ_TRERR_NYICALL); > + } > +#endif > did = lj_ccall_ctid_vararg(cts, o); /* Infer vararg type. */ > } > d = ctype_raw(cts, did); > @@ -1155,6 +1169,15 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, > lj_trace_err(J, LJ_TRERR_NYICALL); > tr = crec_ct_tv(J, d, 0, *base, o); > if (ctype_isinteger_or_bool(d->info)) { > +#if LJ_TARGET_ARM64 && LJ_TARGET_OSX > + if (!ngpr) { > + /* Fixed args passed on the stack use their unpromoted size. */ > + if (d->size != lj_ir_type_size[tref_type(tr)]) { > + lj_assertJ(d->size == 1 || d->size==2, "unexpected size %d", d->size); > + tr = emitconv(tr, d->size==1 ? IRT_U8 : IRT_U16, tref_type(tr), 0); > + } > + } else > +#endif > if (d->size < 4) { > if ((d->info & CTF_UNSIGNED)) > tr = emitconv(tr, IRT_INT, d->size==1 ? 
IRT_U8 : IRT_U16, 0); > @@ -1192,6 +1215,10 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd, > } > } > #endif > +#elif LJ_TARGET_ARM64 && LJ_TARGET_OSX > + if (!ctype_isfp(d->info) && ngpr) { > + ngpr--; > + } > #endif > args[n] = tr; > } > diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > index 8acd8fe4..27de07ac 100644 > --- a/test/tarantool-tests/ffi-ccall/CMakeLists.txt > +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > @@ -1 +1,7 @@ > -BuildTestCLib(libfficcall libfficcall.c ffi-ccall-arm64-fp-convention.test.lua) > +list(APPEND tests > + ffi-ccall-arm64-fp-convention.test.lua > + lj-205-arm64-osx-ffi-enum-arg.test.lua > + lj-205-arm64-osx-ffi-small-arg.test.lua > +) > + > +BuildTestCLib(libfficcall libfficcall.c "${tests}") > diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c > index 6c23f7d1..fd2d4711 100644 > --- a/test/tarantool-tests/ffi-ccall/libfficcall.c > +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c > @@ -1,3 +1,5 @@ > +#include <stdint.h> > + > struct sz12_t { > float f1; > float f2; > @@ -26,3 +28,52 @@ struct sz12_t sum3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c) > res.f3 = a.f3 + b.f3 + c.f3; > return res; > } > + > +/****************************************************************/ > +/* Enums. 
*/ > +/****************************************************************/ > + > +typedef enum { > + E1 = 1, > + E2 = 2, > + E3 = 3, > + E4 = 4, > + E5 = 5, > + E6 = 6, > + E7 = 7, > + E8 = 8, > + E9 = 9, > + E10 = 10, > + E11 = 11 > +} enum_t; > + > +int test_enum_reg(enum_t e1, enum_t e2, enum_t e3) > +{ > + return e1 + e2 + e3; > +} > + > +int test_enum_stack(enum_t e1, enum_t e2, enum_t e3, enum_t e4, enum_t e5, > + enum_t e6, enum_t e7, enum_t e8, enum_t e9, enum_t e10, > + enum_t e11) > +{ > + return e1 + e2 + e3 + e4 + e5 + e6 + e7 + e8 + e9 + e10 + e11; > +} > + > +/****************************************************************/ > +/* Basic types (< 8 bytes). */ > +/****************************************************************/ > + > +uint8_t test_u8_stack(uint8_t u1, uint8_t u2, uint8_t u3, uint8_t u4, > + uint8_t u5, uint8_t u6, uint8_t u7, uint8_t u8, > + uint8_t u9, uint8_t u10, uint8_t u11) > +{ > + return u1 + u2 + u3 + u4 + u5 + u6 + u7 + u8 + u9 + u10 + u11; > +} > + > +float test_float_stack(float f1, float f2, float f3, float f4, float f5, > + float f6, float f7, float f8, float f9, float f10, > + float f11) > +{ > + return f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 + f9 + f10 + f11; > +} > + newline is not needed > diff --git a/test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua b/test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua > new file mode 100644 > index 00000000..fc44d253 > --- /dev/null > +++ b/test/tarantool-tests/gh-6097-arm64-osx-ffi-vararg.test.lua > @@ -0,0 +1,43 @@ > +local tap = require('tap') > + > +-- The test file to demonstrate LuaJIT incorrect FFI vararg call > +-- on macOS M1. > +-- See also:https://github.com/tarantool/tarantool/issues/6097. 
> +local test = tap.test('gh-6097-arm64-osx-ffi-vararg'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +test:plan(4) > + > +local ffi = require('ffi') > + > +ffi.cdef('int sprintf(char *str, const char *format, ...)') > + > +local EXPECTED = '1' > +local EXPECTED_LEN = #EXPECTED > + > +local str = ffi.new(string.format('char[256]')) > + > +jit.opt.start('hotloop=1') > + > +local results = {} > +for i = 1, 4 do > + local strlen = ffi.C.sprintf(str, '%d', 1LL) honestly, I didn't get why the resulted buffer is named "strlen". The same is below. > + assert(strlen == EXPECTED_LEN, 'correct string length for result') > + results[i] = ffi.string(str) > +end > + > +test:is(results[1], EXPECTED, 'correct result of FFI vararg call for int') > +test:samevalues(results, 'consistent behaviour JIT and VM for vararg int arg') > + > +results = {} > +for i = 1, 4 do > + local strlen = ffi.C.sprintf(str, '%c', ffi.new('char', string.byte('1'))) > + assert(strlen == EXPECTED_LEN, 'correct string length for result') > + results[i] = ffi.string(str) > +end > + > +test:is(results[1], EXPECTED, 'correct result of FFI vararg call for char') > +test:samevalues(results, 'consistent behaviour JIT and VM for vararg char arg') > + > +test:done(true) > diff --git a/test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua b/test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua > new file mode 100644 > index 00000000..4ba4f69d > --- /dev/null > +++ b/test/tarantool-tests/lj-205-arm64-osx-ffi-enum-arg.test.lua > @@ -0,0 +1,63 @@ > +local ffi = require('ffi') > +local tap = require('tap') > + > +local ffi_ccall = ffi.load('libfficcall') > + > +-- The test file to check the FFI call for enum arguments. > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/205. 
> +local test = tap.test('lj-205-arm64-osx-ffi-enum-arg'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +test:plan(4) > + > +ffi.cdef[[ > + int sprintf(char *str, const char *format, ...); > + > + typedef enum { > + E1 = 1, > + E2 = 2, > + E3 = 3, > + E4 = 4, > + E5 = 5, > + E6 = 6, > + E7 = 7, > + E8 = 8, > + E9 = 9, > + E10 = 10, > + E11 = 11 > + } enum_t; > + > + int test_enum_reg(enum_t e1, enum_t e2, enum_t e3); > + > + int test_enum_stack(enum_t e1, enum_t e2, enum_t e3, enum_t e4, enum_t e5, > + enum_t e6, enum_t e7, enum_t e8, enum_t e9, enum_t e10, > + enum_t e11); > +]] > + > + > +local str = ffi.new(string.format('char[256]')) > + > +jit.opt.start('hotloop=1') > + > +local enum_t = ffi.typeof('enum_t') > + > +local results = {} > +for i = 1, 4 do > + local strlen = ffi.C.sprintf(str, '%d', enum_t(1)) > + assert(strlen == 1, 'correct string length for result') > + results[i] = ffi.string(str) > +end > + > +test:is(results[1], '1', 'correct result of FFI vararg call for enum') > +test:samevalues(results, 'consistent behaviour JIT and VM for vararg enum arg') > + > +test:is(ffi_ccall.test_enum_reg(enum_t(1), enum_t(2), enum_t(3)), 6, > + 'correct enum reg pass') > + > +test:is(ffi_ccall.test_enum_stack(enum_t(1), enum_t(2), enum_t(3), enum_t(4), > + enum_t(5), enum_t(6), enum_t(7), enum_t(8), > + enum_t(9), enum_t(10), enum_t(11)), > + 66, 'correct enum stack pass') > + > +test:done(true) > diff --git a/test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua b/test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua > new file mode 100644 > index 00000000..be60de93 > --- /dev/null > +++ b/test/tarantool-tests/lj-205-arm64-osx-ffi-small-arg.test.lua > @@ -0,0 +1,29 @@ > +local ffi = require('ffi') > +local tap = require('tap') > + > +local ffi_ccall = ffi.load('libfficcall') > + > +-- The test file to check the FFI call for small (<8 bytes) > +-- arguments give on stack. 
> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/205. > +local test = tap.test('lj-205-arm64-osx-ffi-small-arg') > +test:plan(2) > + > +ffi.cdef[[ > + uint8_t test_u8_stack(uint8_t u1, uint8_t u2, uint8_t u3, uint8_t u4, > + uint8_t u5, uint8_t u6, uint8_t u7, uint8_t u8, > + uint8_t u9, uint8_t u10, uint8_t u11); > + > + float test_float_stack(float f1, float f2, float f3, float f4, float f5, > + float f6, float f7, float f8, float f9, float f10, > + float f11); > +]] > + > +test:is(ffi_ccall.test_u8_stack(1ULL, 2ULL, 3ULL, 4ULL, 5ULL, 6ULL, 7ULL, > + 8ULL, 9ULL, 10ULL, 11ULL), > + 66, 'correct uint8_t stack pass') > + > +test:is(ffi_ccall.test_float_stack(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), 66, > + 'correct float stack pass') > + > +test:done(true) [-- Attachment #2: Type: text/html, Size: 19222 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
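For readers following the `spofs`/`spalign` arithmetic in the quoted `asm_gencall()` hunk, the offset computation can be modelled in isolation. The helper below is only a sketch: `next_stack_arg` and its parameters are hypothetical names and are not part of LuaJIT. It assumes argument sizes are powers of two and uses a minimum alignment mask of 0 for packed fixed arguments on macOS arm64 and 7 for the usual 8-byte slots (other arm64 targets, or varargs on macOS).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not LuaJIT code: returns the stack offset for the
 * next argument of `size` bytes and advances *spofs past it.
 * `min_align_mask` plays the role of `spalign` in the patch.
 */
static size_t next_stack_arg(size_t *spofs, size_t size, size_t min_align_mask)
{
  size_t al, ofs;
  assert(size != 0 && (size & (size - 1)) == 0);  /* Power of two only. */
  al = min_align_mask | (size - 1);  /* Effective alignment mask. */
  ofs = (*spofs + al) & ~al;         /* Round the offset up. */
  *spofs = ofs + al + 1;             /* Bump by the aligned slot size. */
  return ofs;
}
```

With a mask of 0, two `uint8_t` arguments land at offsets 0 and 1. With a mask of 7, they land at offsets 0 and 8, which matches the layout used before the patch.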
* [Tarantool-patches] [PATCH luajit 3/5] ARM64: Fix pass-by-value struct calling conventions.
2026-05-30 16:04 [Tarantool-patches] [PATCH luajit 0/5] Various FFI ABI calling conventions fixes Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 1/5] FFI: Unify stack setup for C calls in interpreter Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 2/5] FFI/ARM64/OSX: Handle non-standard OSX C calling conventions Sergey Kaplun via Tarantool-patches
@ 2026-05-30 16:04 ` Sergey Kaplun via Tarantool-patches
2026-06-01 12:27 ` Sergey Bronnikov via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 4/5] FFI: Various ABI and calling convention fixes Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 5/5] FFI/MacOS: Fix calling convention for enums Sergey Kaplun via Tarantool-patches
4 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2026-05-30 16:04 UTC (permalink / raw)
To: Sergey Bronnikov, Evgeniy Temirgaleev; +Cc: tarantool-patches

From: Mike Pall <mike>

Reported by AnthonyK213.

(cherry picked from commit c262976486e1e007b56380b6a36bfbea5f51d470)

An FFI call to a function that takes a pass-by-value structure
containing HFA arrays works incorrectly because
`ccall_classify_struct()` does not handle the array case.

This patch adds the corresponding branch to check single-dimensional
arrays. However, multidimensional arrays are not handled yet. This will
be fixed in the next commit.
Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#12480 --- src/lj_ccall.c | 18 +++++++++++---- test/tarantool-tests/ffi-ccall/CMakeLists.txt | 1 + test/tarantool-tests/ffi-ccall/libfficcall.c | 12 ++++++++++ ...57-arm64-struct-array-pass-by-val.test.lua | 23 +++++++++++++++++++ 4 files changed, 49 insertions(+), 5 deletions(-) create mode 100644 test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua diff --git a/src/lj_ccall.c b/src/lj_ccall.c index b2705de5..104c9d34 100644 --- a/src/lj_ccall.c +++ b/src/lj_ccall.c @@ -781,17 +781,24 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) { CTSize sz = ct->size; unsigned int r = 0, n = 0, isu = (ct->info & CTF_UNION); - while (ct->sib) { + while (ct->sib && n <= 4) { + unsigned int m = 1; CType *sct; ct = ctype_get(cts, ct->sib); if (ctype_isfield(ct->info)) { sct = ctype_rawchild(cts, ct); + if (ctype_isarray(sct->info)) { + CType *cct = ctype_rawchild(cts, sct); + if (!cct->size) continue; + m = sct->size / cct->size; + sct = cct; + } if (ctype_isfp(sct->info)) { r |= sct->size; - if (!isu) n++; else if (n == 0) n = 1; + if (!isu) n += m; else if (n < m) n = m; } else if (ctype_iscomplex(sct->info)) { r |= (sct->size >> 1); - if (!isu) n += 2; else if (n < 2) n = 2; + if (!isu) n += 2*m; else if (n < 2*m) n = 2*m; } else if (ctype_isstruct(sct->info)) { goto substruct; } else { @@ -803,10 +810,11 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) sct = ctype_rawchild(cts, ct); substruct: if (sct->size > 0) { - unsigned int s = ccall_classify_struct(cts, sct); + unsigned int s = ccall_classify_struct(cts, sct), sn; if (s <= 1) goto noth; r |= (s & 255); - if (!isu) n += (s >> 8); else if (n < (s >>8)) n = (s >> 8); + sn = (s >> 8) * m; + if (!isu) n += sn; else if (n < sn) n = sn; } } } diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt index 27de07ac..1d004591 
100644 --- a/test/tarantool-tests/ffi-ccall/CMakeLists.txt +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt @@ -2,6 +2,7 @@ list(APPEND tests ffi-ccall-arm64-fp-convention.test.lua lj-205-arm64-osx-ffi-enum-arg.test.lua lj-205-arm64-osx-ffi-small-arg.test.lua + lj-1357-arm64-struct-array-pass-by-val.test.lua ) BuildTestCLib(libfficcall libfficcall.c "${tests}") diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c index fd2d4711..ecb21752 100644 --- a/test/tarantool-tests/ffi-ccall/libfficcall.c +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c @@ -77,3 +77,15 @@ float test_float_stack(float f1, float f2, float f3, float f4, float f5, return f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 + f9 + f10 + f11; } +/****************************************************************/ +/* Homogeneous Floating-Point Aggregate (HFA) argument. */ +/****************************************************************/ + +typedef struct hfa_float2 { + float v[2]; +} hfa_float2; + +float hfa_float2_sum(hfa_float2 h) +{ + return h.v[0] + h.v[1]; +} diff --git a/test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua b/test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua new file mode 100644 index 00000000..bb500de1 --- /dev/null +++ b/test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua @@ -0,0 +1,23 @@ +local ffi = require('ffi') +local tap = require('tap') + +-- The test file to demonstrate incorrect FFI pass-by-value +-- structure with an array HFA member. +-- See also: https://github.com/LuaJIT/LuaJIT/issues/1357. 
+local test = tap.test('lj-1357-arm64-struct-array-pass-by-val') + +test:plan(1) + +local ffi_ccall = ffi.load('libfficcall') + +ffi.cdef[[ + typedef struct hfa_float2 { + float v[2]; + } hfa_float2; + + float hfa_float2_sum(hfa_float2 h); +]] + +test:is(ffi_ccall.hfa_float2_sum({{1, 2}}), 3, 'HFA float correct') + +test:done(true) -- 2.54.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
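The counting rule that the `ccall_classify_struct()` change introduces (an array member contributes its element count, and the aggregate qualifies only with one to four floating-point elements of a single size) can be sketched on a simplified model. The `member` type and `hfa_elements()` below are hypothetical and do not mirror LuaJIT's `CType` layout. Unions, nested structs and complex members are left out on purpose, and zero-length arrays are excluded by a precondition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified member description (not LuaJIT's CType). */
typedef struct member {
  int is_fp;           /* Non-zero for float/double elements. */
  unsigned elem_size;  /* 4 for float, 8 for double. */
  unsigned count;      /* 1 for a scalar, N for an array of N elements. */
} member;

/*
 * Returns the number of HFA elements (1..4), or 0 if the described
 * struct is not a homogeneous floating-point aggregate.
 */
static unsigned hfa_elements(const member *m, size_t nmembers)
{
  unsigned sizes = 0, n = 0;
  size_t i;
  /* Like the patched loop, stop early once more than 4 elements are seen. */
  for (i = 0; i < nmembers && n <= 4; i++) {
    assert(m[i].count > 0);        /* Zero-length arrays are out of scope. */
    if (!m[i].is_fp) return 0;     /* Any non-FP member disqualifies. */
    sizes |= m[i].elem_size;       /* Collect the element sizes seen. */
    n += m[i].count;               /* An array counts once per element. */
  }
  if (n == 0 || n > 4) return 0;           /* Empty or too many elements. */
  if (sizes != 4 && sizes != 8) return 0;  /* Mixed float/double. */
  return n;
}
```

Under this model, `struct { float v[2]; }` from the test yields 2, while a struct with `float v[5]` yields 0 and is treated as an ordinary composite.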
* Re: [Tarantool-patches] [PATCH luajit 3/5] ARM64: Fix pass-by-value struct calling conventions.
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 3/5] ARM64: Fix pass-by-value struct " Sergey Kaplun via Tarantool-patches
@ 2026-06-01 12:27 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 10+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2026-06-01 12:27 UTC (permalink / raw)
To: Sergey Kaplun, Evgeniy Temirgaleev; +Cc: tarantool-patches

Hi, Sergey,

thanks for the patch! Please see my comments.

Sergey

On 5/30/26 19:04, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Reported by AnthonyK213.
>
> (cherry picked from commit c262976486e1e007b56380b6a36bfbea5f51d470)
>
> The FFI call to the function with the pass-by-value structure containing
> the HFA arrays works incorrectly due to the `ccall_classify_struct()`
> lacking the handling of the array case.
>
> This patch adds the corresponding branch to check the single-dimensional
> array. However, the multidimensional arrays are not handled. This will
> be fixed in the next commit.
> > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#12480 > --- > src/lj_ccall.c | 18 +++++++++++---- > test/tarantool-tests/ffi-ccall/CMakeLists.txt | 1 + > test/tarantool-tests/ffi-ccall/libfficcall.c | 12 ++++++++++ > ...57-arm64-struct-array-pass-by-val.test.lua | 23 +++++++++++++++++++ > 4 files changed, 49 insertions(+), 5 deletions(-) > create mode 100644 test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua > > diff --git a/src/lj_ccall.c b/src/lj_ccall.c > index b2705de5..104c9d34 100644 > --- a/src/lj_ccall.c > +++ b/src/lj_ccall.c > @@ -781,17 +781,24 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) > { > CTSize sz = ct->size; > unsigned int r = 0, n = 0, isu = (ct->info & CTF_UNION); > - while (ct->sib) { > + while (ct->sib && n <= 4) { The patch adds a condition that strictly checks a number of elements with the same type (n <= 4). I would also add a test for this with the following `n`: 3/4/5. > + unsigned int m = 1; > CType *sct; > ct = ctype_get(cts, ct->sib); > if (ctype_isfield(ct->info)) { > sct = ctype_rawchild(cts, ct); > + if (ctype_isarray(sct->info)) { > + CType *cct = ctype_rawchild(cts, sct); > + if (!cct->size) continue; > + m = sct->size / cct->size; > + sct = cct; > + } > if (ctype_isfp(sct->info)) { > r |= sct->size; > - if (!isu) n++; else if (n == 0) n = 1; > + if (!isu) n += m; else if (n < m) n = m; The patch also touches a logic for unions (here and below), and it is desired to test it as well. 
This change was not caught by our regression tests on Apple M2: --- a/src/lj_ccall.c +++ b/src/lj_ccall.c @@ -742,7 +742,7 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct, CType *ctf) sct = ctype_rawchild(cts, ct); if (ctype_isfp(sct->info)) { r |= sct->size; - if (!isu) n++; else if (n == 0) n = 1; + if (!isu) n--; else if (n == 0) n = 1; } else if (ctype_iscomplex(sct->info)) { r |= (sct->size >> 1); if (!isu) n += 2; else if (n < 2) n = 2; > } else if (ctype_iscomplex(sct->info)) { > r |= (sct->size >> 1); > - if (!isu) n += 2; else if (n < 2) n = 2; > + if (!isu) n += 2*m; else if (n < 2*m) n = 2*m; > } else if (ctype_isstruct(sct->info)) { > goto substruct; > } else { > @@ -803,10 +810,11 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) > sct = ctype_rawchild(cts, ct); > substruct: > if (sct->size > 0) { > - unsigned int s = ccall_classify_struct(cts, sct); > + unsigned int s = ccall_classify_struct(cts, sct), sn; > if (s <= 1) goto noth; > r |= (s & 255); > - if (!isu) n += (s >> 8); else if (n < (s >>8)) n = (s >> 8); > + sn = (s >> 8) * m; > + if (!isu) n += sn; else if (n < sn) n = sn; > } > } > } > diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > index 27de07ac..1d004591 100644 > --- a/test/tarantool-tests/ffi-ccall/CMakeLists.txt > +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > @@ -2,6 +2,7 @@ list(APPEND tests > ffi-ccall-arm64-fp-convention.test.lua > lj-205-arm64-osx-ffi-enum-arg.test.lua > lj-205-arm64-osx-ffi-small-arg.test.lua > + lj-1357-arm64-struct-array-pass-by-val.test.lua > ) > > BuildTestCLib(libfficcall libfficcall.c "${tests}") > diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c > index fd2d4711..ecb21752 100644 > --- a/test/tarantool-tests/ffi-ccall/libfficcall.c > +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c > @@ -77,3 +77,15 @@ float test_float_stack(float f1, float f2, 
float f3, float f4, float f5, > return f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 + f9 + f10 + f11; > } > > +/****************************************************************/ > +/* Homogeneous Floating-Point Aggregate (HFA) argument. */ > +/****************************************************************/ > + > +typedef struct hfa_float2 { > + float v[2]; > +} hfa_float2; > + > +float hfa_float2_sum(hfa_float2 h) > +{ > + return h.v[0] + h.v[1]; > +} > diff --git a/test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua b/test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua > new file mode 100644 > index 00000000..bb500de1 > --- /dev/null > +++ b/test/tarantool-tests/lj-1357-arm64-struct-array-pass-by-val.test.lua > @@ -0,0 +1,23 @@ > +local ffi = require('ffi') > +local tap = require('tap') > + > +-- The test file to demonstrate incorrect FFI pass-by-value > +-- structure with an array HFA member. > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1357. > +local test = tap.test('lj-1357-arm64-struct-array-pass-by-val') > + > +test:plan(1) > + > +local ffi_ccall = ffi.load('libfficcall') > + > +ffi.cdef[[ > + typedef struct hfa_float2 { > + float v[2]; > + } hfa_float2; > + > + float hfa_float2_sum(hfa_float2 h); > +]] > + > +test:is(ffi_ccall.hfa_float2_sum({{1, 2}}), 3, 'HFA float correct') > + > +test:done(true) [-- Attachment #2: Type: text/html, Size: 6746 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
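The review above asks for coverage of aggregates with 3, 4 and 5 elements and of the union path. One possible shape for the C side of such tests is sketched below. All type and function names here are hypothetical and are not taken from the patch set. The 5-element struct is there to cover a case that is expected to fall outside the HFA rules, since an HFA has at most four members.

```c
#include <assert.h>

/* Hypothetical test types: 3 and 4 elements should classify as HFA. */
typedef struct hfa_float3 { float v[3]; } hfa_float3;
typedef struct hfa_float4 { float v[4]; } hfa_float4;
/* 5 elements: more than an HFA may contain. */
typedef struct float5_t { float v[5]; } float5_t;
/* Union of FP arrays: the element count is the largest member's count. */
typedef union hfa_union_t { float v[2]; float w[4]; } hfa_union_t;

/* Sum the first n floats of an array. */
static float sum_floats(const float *v, int n)
{
  float s = 0.0f;
  int i;
  assert(n > 0);
  for (i = 0; i < n; i++) s += v[i];
  return s;
}

float hfa_float3_sum(hfa_float3 h) { return sum_floats(h.v, 3); }
float hfa_float4_sum(hfa_float4 h) { return sum_floats(h.v, 4); }
float float5_sum(float5_t h) { return sum_floats(h.v, 5); }
float hfa_union_sum(hfa_union_t u) { return sum_floats(u.w, 4); }
```

On the Lua side, each helper would need a matching `ffi.cdef` declaration and a `test:is()` check against the expected sum, in the same way as the existing `hfa_float2_sum` test.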
* [Tarantool-patches] [PATCH luajit 4/5] FFI: Various ABI and calling convention fixes.
2026-05-30 16:04 [Tarantool-patches] [PATCH luajit 0/5] Various FFI ABI calling conventions fixes Sergey Kaplun via Tarantool-patches
` (2 preceding siblings ...)
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 3/5] ARM64: Fix pass-by-value struct " Sergey Kaplun via Tarantool-patches
@ 2026-05-30 16:04 ` Sergey Kaplun via Tarantool-patches
2026-06-01 13:02 ` Sergey Bronnikov via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 5/5] FFI/MacOS: Fix calling convention for enums Sergey Kaplun via Tarantool-patches
4 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2026-05-30 16:04 UTC (permalink / raw)
To: Sergey Bronnikov, Evgeniy Temirgaleev; +Cc: tarantool-patches

From: Mike Pall <mike>

Thanks to Sergey Kaplun.

(cherry picked from commit 5b2e51db2c5e445cb98e026fc1e290c14eca67c1)

This patch fixes several issues at once:

1) On x64, passing a small structure by value on the stack lacks the
alignment check. This patch fixes that.

2) According to the AAPCS64, the alignment of an argument passed by
value is determined by its natural alignment [1] instead of the
alignment of the type (see B.6 [2]). This does not apply to OSX. To fix
this, the field alignment must be stored for the `CT_FIELD` ctype,
since field alignment determines natural alignment. On OSX, the
"packed" stack rules [3] apply to non-variadic functions too. This
patch fixes that. Unfortunately, it breaks the calling convention for
native types passed on the stack to non-variadic functions, which makes
the tests added in the previous commit fail. This will be fixed in the
next commit.

3) The x64 ABI allows reordering of arguments. Due to the lack of a
corresponding check, structures that should be passed on the stack may
be placed in registers anyway.
This patch fixes that by adding the corresponding flag in
`ccall_set_args()`.

4) Also, this patch fixes the zero-sized bitfield behaviour to be
consistent with GCC (after 12.1 [4]) and Clang. Be aware that the
alignment of zero-sized fields is not applied to each field of the
structure.

5) It fixes handling for multidimensional HFA structures on arm64.
Structures with zero-sized arrays are considered non-HFA.

Also, this commit adds tests for vector arguments and empty structures
to make sure that their behaviour is still valid after the changes.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#510composite-types
[2]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules
[3]: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Pass-arguments-to-functions-correctly
[4]: https://gcc.gnu.org/gcc-12/changes.html

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#12480
---
src/lj_ccall.c | 59 +-
src/lj_cparse.c | 9 +-
src/lj_ctype.h | 2 +-
.../ffi-call-empty-struct.test.lua | 47 ++
test/tarantool-tests/ffi-ccall/CMakeLists.txt | 4 +
test/tarantool-tests/ffi-ccall/libfficcall.c | 587 ++++++++++++++++++
.../ffi-vector-arguments.test.lua | 62 ++
.../lj-1455-arm64-ffi-ccall-hfa.test.lua | 82 +++
.../lj-1455-bitfield0-a16.test.lua | 27 +
.../lj-1455-ffi-conventions.test.lua | 441 +++++++++++++
10 files changed, 1307 insertions(+), 13 deletions(-)
create mode 100644 test/tarantool-tests/ffi-call-empty-struct.test.lua
create mode 100644 test/tarantool-tests/ffi-vector-arguments.test.lua
create mode 100644 test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua
create mode 100644 test/tarantool-tests/lj-1455-bitfield0-a16.test.lua
create mode 100644 test/tarantool-tests/lj-1455-ffi-conventions.test.lua

diff --git a/src/lj_ccall.c b/src/lj_ccall.c
index 104c9d34..1beccc10 100644
--- a/src/lj_ccall.c
+++ b/src/lj_ccall.c
@@ -168,7 +168,9 @@ if
(ccall_struct_arg(cc, cts, d, rcl, o, narg)) goto err_nyi; \ nsp = cc->nsp; ngpr = cc->ngpr; nfpr = cc->nfpr; \ continue; \ - } /* Pass all other structs by value on stack. */ + } else { /* Pass all other structs by value on stack. */ \ + onstack = 1; \ + } #define CCALL_HANDLE_COMPLEXARG \ isfp = 2; /* Pass complex in FPRs or on stack. Needs postprocessing. */ @@ -183,7 +185,7 @@ } \ } else { /* Try to pass argument in GPRs. */ \ /* Note that reordering is explicitly allowed in the x64 ABI. */ \ - if (n <= 2 && ngpr + n <= maxgpr) { \ + if (!onstack && n <= 2 && ngpr + n <= maxgpr) { \ dp = &cc->gpr[ngpr]; \ ngpr += n; \ goto done; \ @@ -350,7 +352,7 @@ nfpr = CCALL_NARG_FPR; /* Prevent reordering. */ \ } \ } else { /* Try to pass argument in GPRs. */ \ - if (!LJ_TARGET_OSX && (d->info & CTF_ALIGN) > CTALIGN_PTR) \ + if (!LJ_TARGET_OSX && !rp && ccall_struct_align(cts, d) > CTALIGN_PTR) \ ngpr = (ngpr + 1u) & ~1u; /* Align to regpair. */ \ if (ngpr + n <= maxgpr) { \ dp = &cc->gpr[ngpr]; \ @@ -661,7 +663,7 @@ static int ccall_classify_struct(CTState *cts, CType *ct, int *rcl, CTSize ofs) fofs = ofs+ct->size; if (ctype_isfield(ct->info)) ccall_classify_ct(cts, ctype_rawchild(cts, ct), rcl, fofs); - else if (ctype_isbitfield(ct->info)) + else if (ctype_isbitfield(ct->info) && ctype_bitbsz(ct->info)) rcl[(fofs >= 8)] |= CCALL_RCL_INT; /* NYI: unaligned bitfields? */ else if (ctype_isxattrib(ct->info, CTA_SUBTYPE)) ccall_classify_struct(cts, ctype_rawchild(cts, ct), rcl, fofs); @@ -700,8 +702,11 @@ static int ccall_struct_arg(CCallState *cc, CTState *cts, CType *d, int *rcl, if (ccall_struct_reg(cc, cts, dp, rcl)) { /* Register overflow? Pass on stack. */ MSize nsp = cc->nsp, sz = rcl[1] ? 2*CTSIZE_PTR : CTSIZE_PTR; + MSize align = (1u << ctype_align(d->info)) - 1; if (nsp + sz > CCALL_SIZE_STACK) return 1; /* Too many arguments. */ + if (CCALL_ALIGN_STACKARG && align > CTSIZE_PTR-1) + nsp = (nsp + align) & ~align; /* Align argument on stack. 
*/ cc->nsp = nsp + sz; memcpy((uint8_t *)cc->stack + nsp, dp, sz); } @@ -776,6 +781,31 @@ noth: /* Not a homogeneous float/double aggregate. */ #if LJ_TARGET_ARM64 +#if !LJ_TARGET_OSX +/* Alignment of pass-by-value structs: 8 or 16. */ +static CTInfo ccall_struct_align_arm64(CTState *cts, CType *ct) +{ + CTSize sz; + if (ct->sib) { + while (ct->sib) { + ct = ctype_get(cts, ct->sib); + if (ctype_isfield(ct->info)) { + if ((ct->info & CTF_ALIGN) > CTALIGN_PTR) return CTALIGN(4); + } else if (ctype_isxattrib(ct->info, CTA_SUBTYPE)) { + CType *sct = ctype_rawchild(cts, ct); + CTInfo info = lj_ctype_info(cts, ctype_typeid(cts, sct), &sz); + if ((info & CTF_ALIGN) > CTALIGN_PTR) return CTALIGN(4); + } + } + } else { + CTInfo info = lj_ctype_info(cts, ctype_typeid(cts, ct), &sz); + if ((info & CTF_ALIGN) > CTALIGN_PTR) return CTALIGN(4); + } + return CTALIGN_PTR; +} +#define ccall_struct_align(cts, ct) ccall_struct_align_arm64((cts), (ct)) +#endif + /* Classify a struct based on its fields. */ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) { @@ -787,10 +817,10 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) ct = ctype_get(cts, ct->sib); if (ctype_isfield(ct->info)) { sct = ctype_rawchild(cts, ct); - if (ctype_isarray(sct->info)) { + if (ctype_isarray(sct->info) && !sct->size) goto noth; + while (ctype_isarray(sct->info)) { CType *cct = ctype_rawchild(cts, sct); - if (!cct->size) continue; - m = sct->size / cct->size; + m *= sct->size / cct->size; sct = cct; } if (ctype_isfp(sct->info)) { @@ -804,7 +834,7 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) } else { goto noth; } - } else if (ctype_isbitfield(ct->info)) { + } else if (ctype_isbitfield(ct->info) && ctype_bitbsz(ct->info)) { goto noth; } else if (ctype_isxattrib(ct->info, CTA_SUBTYPE)) { sct = ctype_rawchild(cts, ct); @@ -898,6 +928,11 @@ void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft) #endif +#ifndef ccall_struct_align 
+/* Alignment of pass-by-value structs. */ +#define ccall_struct_align(cts, ct) ((ct)->info & CTF_ALIGN) +#endif + /* -- Common C call handling ---------------------------------------------- */ /* Infer the destination CTypeID for a vararg argument. @@ -1004,6 +1039,9 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, CTSize sz; MSize n, isfp = 0, isva = 0; void *dp, *rp = NULL; +#if LJ_TARGET_X64 && !LJ_ABI_WIN + int onstack = 0; +#endif if (fid) { /* Get argument type from field. */ CType *ctf = ctype_get(cts, fid); @@ -1042,7 +1080,10 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, /* Otherwise pass argument on stack. */ if (CCALL_ALIGN_STACKARG) { /* Align argument on stack. */ - MSize align = (1u << ctype_align(d->info)) - 1; + MSize align = (1u << ctype_align(ccall_struct_align(cts, d))) - 1; +#if LJ_TARGET_ARM64 && LJ_TARGET_OSX + isva = 1; +#endif if (rp || (CCALL_PACK_STACKARG && isva && align < CTSIZE_PTR-1)) align = CTSIZE_PTR-1; nsp = (nsp + align) & ~align; diff --git a/src/lj_cparse.c b/src/lj_cparse.c index 9f3b032a..ff23b44b 100644 --- a/src/lj_cparse.c +++ b/src/lj_cparse.c @@ -1320,14 +1320,16 @@ static void cp_struct_layout(CPState *cp, CTypeID sid, CTInfo sattr) align = ctype_align(attr); if (cp->packstack[cp->curpack] < align) align = cp->packstack[cp->curpack]; - if (align > maxalign) maxalign = align; + bsz = ctype_bitcsz(ct->info); /* Bitfield size (temp.). */ + if (align > maxalign && bsz) maxalign = align; amask = (8u << align) - 1; - bsz = ctype_bitcsz(ct->info); /* Bitfield size (temp.). */ if (bsz == CTBSZ_FIELD || !ctype_isfield(ct->info)) { bsz = csz; /* Regular fields or subtypes always fill the container. */ bofs = (bofs + amask) & ~amask; /* Start new aligned field. */ ct->size = (bofs >> 3); /* Store field offset. */ + if (ctype_isfield(ct->info)) + ct->info = CTINFO(CT_FIELD, ctype_cid(ct->info)) + CTALIGN(align); } else { /* Bitfield. 
*/ if (bsz == 0 || (attr & CTFP_ALIGNED) || (!((attr|sattr) & CTFP_PACKED) && (bofs & amask) + bsz > csz)) @@ -1335,7 +1337,8 @@ static void cp_struct_layout(CPState *cp, CTypeID sid, CTInfo sattr) /* Prefer regular field over bitfield. */ if (bsz == csz && (bofs & amask) == 0) { - ct->info = CTINFO(CT_FIELD, ctype_cid(ct->info)); + ct->info = CTINFO(CT_FIELD, ctype_cid(ct->info)) + + CTALIGN(lj_fls(sz)); ct->size = (bofs >> 3); /* Store field offset. */ } else { ct->info = CTINFO(CT_BITFIELD, diff --git a/src/lj_ctype.h b/src/lj_ctype.h index 2d393eb9..bee3f72a 100644 --- a/src/lj_ctype.h +++ b/src/lj_ctype.h @@ -51,7 +51,7 @@ LJ_STATIC_ASSERT(((int)CT_STRUCT & (int)CT_ARRAY) == CT_STRUCT); ** |FUNC ....VS.. cc cid | nargs | field | name? | name? | ** |TYPEDEF cid | | | name | name | ** |ATTRIB attrnum cid | attr | sib? | type? | | -** |FIELD cid | offset | field | | name? | +** |FIELD A cid | offset | field | | name? | ** |BITFIELD B.cvU csz bsz pos | offset | field | | name? | ** |CONSTVAL c cid | value | const | name | name | ** |EXTERN cid | | sib? | name | name | diff --git a/test/tarantool-tests/ffi-call-empty-struct.test.lua b/test/tarantool-tests/ffi-call-empty-struct.test.lua new file mode 100644 index 00000000..cb7d3ea2 --- /dev/null +++ b/test/tarantool-tests/ffi-call-empty-struct.test.lua @@ -0,0 +1,47 @@ +local ffi = require('ffi') +local tap = require('tap') + +-- The test file to check FFI correctness for the empty structs. 
+local test = tap.test('ffi-call-empty-struct') + +test:plan(6) + +local ffi_ccall = ffi.load('libfficcall') + +ffi.cdef[[ + struct empty {}; + struct super_empty {int arg[0];}; + struct sort_of_empty {struct super_empty;}; + + struct empty empty_ret(void); + struct super_empty super_empty_ret(void); + struct sort_of_empty sort_of_empty_ret(void); + + int super_empty_arg(struct super_empty e, int a); + int sort_of_empty_arg(struct sort_of_empty e, int a); + int empty_arg(struct empty e, int a); +]] + +local MAGIC = 42LL + +local empty_t = ffi.typeof('struct empty') +local super_empty_t = ffi.typeof('struct super_empty') +local sort_of_empty_t = ffi.typeof('struct sort_of_empty') + +test:is(ffi.typeof(ffi_ccall.empty_ret()), empty_t, 'correct empty ret type') +test:is(ffi.typeof(ffi_ccall.super_empty_ret()), super_empty_t, + 'correct super_empty ret type') +test:is(ffi.typeof(ffi_ccall.sort_of_empty_ret()), sort_of_empty_t, + 'correct sort_of_empty ret type') + +local empty_o = empty_t() +local super_empty_o = super_empty_t() +local sort_of_empty_o = sort_of_empty_t() + +test:is(ffi_ccall.empty_arg(empty_o, MAGIC), MAGIC, 'correct empty arg handle') +test:is(ffi_ccall.super_empty_arg(super_empty_o, MAGIC), MAGIC, + 'correct super_empty arg handle') +test:is(ffi_ccall.sort_of_empty_arg(sort_of_empty_o, MAGIC), MAGIC, + 'correct sort_of_empty arg handle') + +test:done(true) diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt index 1d004591..dfd58bd2 100644 --- a/test/tarantool-tests/ffi-ccall/CMakeLists.txt +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt @@ -1,8 +1,12 @@ list(APPEND tests + ffi-call-empty-struct.test.lua + ffi-vector-arguments.test.lua ffi-ccall-arm64-fp-convention.test.lua lj-205-arm64-osx-ffi-enum-arg.test.lua lj-205-arm64-osx-ffi-small-arg.test.lua lj-1357-arm64-struct-array-pass-by-val.test.lua + lj-1455-arm64-ffi-ccall-hfa.test.lua + lj-1455-ffi-conventions.test.lua ) 
BuildTestCLib(libfficcall libfficcall.c "${tests}") diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c index ecb21752..60d30f3c 100644 --- a/test/tarantool-tests/ffi-ccall/libfficcall.c +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c @@ -1,4 +1,13 @@ #include <stdint.h> +#include <stdarg.h> + +#define lengthof(a) (sizeof(a) / sizeof((a)[0])) + +#define UNUSED(x) ((void)(x)) + +#if defined(__clang__) +#undef __GNUC__ +#endif struct sz12_t { float f1; @@ -85,7 +94,585 @@ typedef struct hfa_float2 { float v[2]; } hfa_float2; +typedef struct hfa_float22 { + float v[2][2]; +} hfa_float22; + +typedef struct non_hfa_float222 { + float v[2][2][2]; +} non_hfa_float222; + +typedef struct hfa_double2 { + double v[2]; +} hfa_double2; + +typedef struct hfa_double2_a16 { + __attribute__((__aligned__(16))) double v[2]; +} hfa_double2_a16; + +typedef struct hfa_double2_a32 { + __attribute__((__aligned__(32))) double v[4]; +} hfa_double2_a32; + float hfa_float2_sum(hfa_float2 h) { return h.v[0] + h.v[1]; } + +float hfa_float22_sum(hfa_float22 h) +{ + return h.v[0][0] + h.v[0][1] + h.v[1][0] + h.v[1][1]; +} + +float non_hfa_float222_sum(non_hfa_float222 h) +{ + return h.v[0][0][0] + h.v[0][0][1] + h.v[0][1][0] + h.v[0][1][1] + + h.v[1][0][0] + h.v[1][0][1] + h.v[1][1][0] + h.v[1][1][1]; +} + +/* + * Incorrect GCC behaviour. + * See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125023. 
+ */ +#if !defined(__GNUC__) +typedef struct hfa_float_hole { + float x; + float hole[0][2][2]; + float y; +} hfa_float_hole; + +float hfa_float_hole_sum(hfa_float_hole h) +{ + return h.x + h.y; +} +#endif /* GCC */ + +double hfa_double2_sum(hfa_double2 h) +{ + return h.v[0] + h.v[1]; +} + +double hfa_double2_a16_sum(hfa_double2_a16 h) +{ + return h.v[0] + h.v[1]; +} + +double hfa_double2_a32_sum(hfa_double2_a32 h) +{ + return h.v[0] + h.v[1] + h.v[2] + h.v[3]; +} + +/* + * Enable only for GCC >= 12.0.0 + * See the first paragraph about the 0 bitfield: + * https://gcc.gnu.org/gcc-12/changes.html. + */ +#if !defined(__GNUC__) || __GNUC__ >= 12 +typedef struct hfa_0bitfield { + float x; + int : 0; + float y; + float z; +} hfa_0bitfield; + +float hfa_0bitfield_sum(hfa_0bitfield h) +{ + return h.x + h.y + h.z; +} +#endif /* GNUC >= 12.0.0 */ + +/****************************************************************/ +/* Empty structures. */ +/****************************************************************/ + +struct empty {}; + +struct super_empty { + int arr[0]; +}; + +struct sort_of_empty { + struct super_empty e; +}; + +struct empty empty_ret(void) +{ + struct empty e; + return e; +} + +struct super_empty super_empty_ret(void) +{ + struct super_empty e; + return e; +} + +struct sort_of_empty sort_of_empty_ret(void) +{ + struct sort_of_empty e; + return e; +} + +int empty_arg(struct empty e, int a) +{ + return a; +} + +int super_empty_arg(struct super_empty e, int a) +{ + return a; +} + +int sort_of_empty_arg(struct sort_of_empty e, int a) +{ + return a; +} + +/****************************************************************/ +/* Vector passing. */ +/****************************************************************/ + +/* Test direct vector passing. */ +typedef float vfloatx2 __attribute__ ((__vector_size__ (8))); +typedef float vfloatx4 __attribute__ ((__vector_size__ (16))); + +/* Return the given value without change. 
*/ +vfloatx2 vfloatx2_call(vfloatx2 x) { return x; } +vfloatx4 vfloatx4_call(vfloatx4 x) { return x; } + +typedef int int32x4_t __attribute__((__vector_size__ (4 * 4))); + +int32x4_t test_hva_varg(int n, ...) +{ + va_list vl; + va_start(vl, n); + int32x4_t a = va_arg(vl, int32x4_t); + int32x4_t b = va_arg(vl, int32x4_t); + va_end(vl); + int32x4_t res = a + b; + return res; +} + +/****************************************************************/ +/* Various argument types. */ +/****************************************************************/ + +/* Testing alignment with aggregates. */ + +/* + * HFA, aggregates with size <= 16 bytes and aggregates with + * size > 16 bytes. + */ +typedef struct hfa_floatx4_a16 { + float v[4]; +} __attribute__((aligned(16))) hfa_floatx4_a16; + +float test_2_align_hfa(int i, hfa_floatx4_a16 s1, hfa_floatx4_a16 s2) +{ + UNUSED(i); + const float *v1 = s1.v; + const float *v2 = s2.v; + return v1[0] + v1[1] + v1[2] + v1[3] + v2[0] + v2[1] + v2[2] + v2[3]; +} + +/* Testing 16-byte aggregate. */ +typedef struct intx4_a16 { + int v[4]; +} __attribute__((aligned(16))) intx4_a16; + +int test_2_intx4_a16(int i, intx4_a16 s1, intx4_a16 s2) +{ + const int *v1 = s1.v; + const int *v2 = s2.v; + return i + v1[0] + v1[1] + v1[2] + v1[3] + + v2[0] + v2[1] + v2[2] + v2[3]; +} + +/* Testing large aggregate. 
*/ +typedef struct large_agg_a16 { + int v[18]; +} __attribute__((aligned(16))) large_agg_a16; + +int test_2_large_agg_a16(int x, large_agg_a16 s1, large_agg_a16 s2) +{ + const int *v1 = s1.v; + const int *v2 = s2.v; + int sum = x; + for (int i = 0; i < lengthof(s1.v); i++) { + sum += v1[i] + v2[i]; + } + return sum; +} + +typedef struct intx3_0bitfield { + int x; + int : 0; + int y; + int z; +} intx3_0bitfield; + +int test_2_intx3_0bitfield_reg(int i, intx3_0bitfield s1, intx3_0bitfield s2) +{ + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; +} + +int test_2_intx3_0bitfield_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, intx3_0bitfield s1, + intx3_0bitfield s2) +{ + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + + s2.x + s2.y + s2.z; +} + +typedef struct intx3_0bitfield_a16 { + int x; + int : 0 __attribute__((aligned(16))); + int y; + int z; +} intx3_0bitfield_a16; + +int test_2_intx3_0bitfield_a16_reg(int i, intx3_0bitfield_a16 s1, + intx3_0bitfield_a16 s2) +{ + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; +} + +int test_2_intx3_0bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + intx3_0bitfield_a16 s1, + intx3_0bitfield_a16 s2) +{ + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + + s2.x + s2.y + s2.z; +} + +typedef struct intx3_full_bitfield_a16 { + int x; + int y: 32 __attribute__((aligned(16))); + int z; +} intx3_full_bitfield_a16; + +int test_2_intx3_full_bitfield_a16_reg(int i, intx3_full_bitfield_a16 s1, + intx3_full_bitfield_a16 s2) +{ + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; +} + +int test_2_intx3_full_bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + intx3_full_bitfield_a16 s1, + intx3_full_bitfield_a16 s2) +{ + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + + s2.x + s2.y + s2.z; +} + +typedef struct intx3_half_bitfield { + int x : 16; + 
int y : 16; + int z; +} intx3_half_bitfield; + +int test_2_intx3_half_bitfield_reg(int i, intx3_half_bitfield s1, + intx3_half_bitfield s2) +{ + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; +} + +int test_2_intx3_half_bitfield_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + intx3_half_bitfield s1, + intx3_half_bitfield s2) +{ + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + + s2.x + s2.y + s2.z; +} + +typedef struct intx3_half_bitfield_a16 { + int x : 16; + int y : 16 __attribute__((aligned(16))); + int z; +} intx3_half_bitfield_a16; + +int test_2_intx3_half_bitfield_a16_reg(int i, intx3_half_bitfield_a16 s1, + intx3_half_bitfield_a16 s2) +{ + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; +} + +int test_2_intx3_half_bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + intx3_half_bitfield_a16 s1, + intx3_half_bitfield_a16 s2) +{ + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + + s2.x + s2.y + s2.z; +} + +/* Increased natural alignment. */ +typedef struct la16l { + long long x __attribute__((aligned(16))); + long long y; +} la16l; + +int test_2_la16l_reg(int i, la16l s1, la16l s2) +{ + return s1.x + s2.x + i + s1.y + s2.y; +} + +int test_2_la16l_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, + int i8, int i9, la16l s1, la16l s2) +{ + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + + s2.y; +} + +/****************************************************************/ +/* Transparent structures. 
*/ +/****************************************************************/ + +typedef struct a16_tsp { + struct { + long long x; + long long y; + } __attribute__((aligned(16))); +} a16_tsp; + +int test_2_a16_tsp_reg(int i, a16_tsp s1, a16_tsp s2) +{ + return s1.x + s2.x + i + s1.y + s2.y; +} + +int test_2_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, + int i8, int i9, a16_tsp s1, a16_tsp s2) +{ + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + + s2.y; +} + +typedef struct f_a16_tsp { + struct { + long long x __attribute__((aligned(16))); + long long y; + }; +} f_a16_tsp; + +int test_2_f_a16_tsp_reg(int i, f_a16_tsp s1, f_a16_tsp s2) +{ + return s1.x + s2.x + i + s1.y + s2.y; +} + +int test_2_f_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, f_a16_tsp s1, f_a16_tsp s2) +{ + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + + s2.y; +} + +/* + * Test passing structs with size < 8, < 16 and > 16 + * with alignment of 16 and without. + * + * Structs with size <= 8 bytes, without alignment attribute + * passed as i64 regardless of the align attribute. + */ +typedef struct is_no_align { + int i; + short s; +} is_no_align; + +int test_2_is_no_align_reg(int i, is_no_align s1, is_no_align s2) +{ + return s1.i + s2.i + i + s1.s + s2.s; +} + +int test_2_is_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, is_no_align s1, + is_no_align s2) +{ + return s1.i + s2.i + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.s + + s2.s; +} + +/* Structs with size <= 8 bytes, with alignment attribute. 
*/ +typedef struct is_a16 { + int i; + short s; +} __attribute__((aligned(16))) is_a16; + +int test_2_is_a16_reg(int i, is_a16 s1, is_a16 s2) +{ + return s1.i + s2.i + i + s1.s + s2.s; +} + +int test_2_is_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, + int i8, int i9, is_a16 s1, is_a16 s2) +{ + return s1.i + s2.i + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.s + + s2.s; +} + +/* Structs with size <= 16 bytes, without alignment attribute. */ +typedef struct isis_no_align { + int i; + short s; + int i2; + short s2; +} isis_no_align; + +int test_2_isis_no_align_reg(int i, isis_no_align s1, isis_no_align s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + i + s1.s + s2.s + s1.s2 + s2.s2; +} + +int test_2_isis_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, isis_no_align s1, + isis_no_align s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2; +} + +/* Structs with size <= 16 bytes, with alignment attribute. */ +typedef struct isis_a16 { + int i; + short s; + int i2; + short s2; +} __attribute__((aligned(16))) isis_a16; + +int test_2_isis_a16_reg(int i, isis_a16 s1, isis_a16 s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + i + s1.s + s2.s + s1.s2 + s2.s2; +} + +int test_2_isis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, + int i8, int i9, isis_a16 s1, isis_a16 s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2; +} + +/* structs with size > 16 bytes, without alignment attribute. 
*/ +typedef struct isisis { + int i; + short s; + int i2; + short s2; + int i3; + short s3; +} isisis_no_align; + +int test_2_isisis_no_align_reg(int i, isisis_no_align s1, isisis_no_align s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + s1.s + s2.s + + s1.s2 + s2.s2 + s1.s3 + s2.s3; +} + +int test_2_isisis_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, isisis_no_align s1, + isisis_no_align s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + i2 + i3 + i4 + + i5 + i6 + i7 + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2 + s1.s3 + + s2.s3; +} + +/* Structs with size > 16 bytes, with alignment attribute. */ +typedef struct isisis_a16 { + int i; + short s; + int i2; + short s2; + int i3; + short s3; +} __attribute__((aligned(16))) isisis_a16; + +int test_2_isisis_a16_reg(int i, isisis_a16 s1, isisis_a16 s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + s1.s + s2.s + + s1.s2 + s2.s2 + s1.s3 + s2.s3; +} + +int test_2_isisis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, isisis_a16 s1, + isisis_a16 s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + i2 + i3 + i4 + + i5 + i6 + i7 + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2 + s1.s3 + + s2.s3; +} + +/* We should not split struct argument between regs and stack. */ +int test_2_isis_no_align_split(int i, int i2, int i3, int i4, int i5, int i6, + int i7, isis_no_align s1, isis_no_align s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + + s1.s + s2.s + s1.s2 + s2.s2; +} + +int test_2_isis_a16_split(int i, int i2, int i3, int i4, int i5, int i6, int i7, + isis_a16 s1, isis_a16 s2) +{ + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + + s1.s + s2.s + s1.s2 + s2.s2; +} + +/****************************************************************/ +/* Packed structures. 
*/ +/****************************************************************/ + +typedef struct ill_packed { + int x; + long long y; +} __attribute__((packed)) ill_packed; + +typedef struct ii { + int x; + int y; +} ii; + +/* Passing structs with unaligned fields, not in registers. */ +int test_2_ill_packed(int i, ill_packed s1, ill_packed s2) +{ + return s1.x + s2.x + i + s1.y + s2.y; +} + +int test_2_ill_packed_reord(int i, ill_packed s1, ill_packed s2, int i2, ii s3) +{ + return s1.x + s2.x + i + s1.y + s2.y + i2 + s3.x + s3.y; +} + +int test_2_ill_packed_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, ill_packed s1, + ill_packed s2) +{ + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + + s2.y; +} + +/* Packed structure, overaligned, same as above. */ +typedef struct ill_packed_a16 { + int x; + long long y; +} __attribute__((packed, aligned(16))) ill_packed_a16; + +/* Passing structs with unaligned fields not in registers. */ +int test_2_ill_packed_a16(int i, ill_packed_a16 s1, ill_packed_a16 s2) +{ + return s1.x + s2.x + i + s1.y + s2.y; +} + +int test_2_ill_packed_a16_reord(int i, ill_packed_a16 s1, ill_packed_a16 s2, + int i2, ii s3) +{ + return s1.x + s2.x + i + s1.y + s2.y + i2 + s3.x + s3.y; +} + +int test_2_ill_packed_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, ill_packed_a16 s1, + ill_packed_a16 s2) +{ + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + + s2.y; +} diff --git a/test/tarantool-tests/ffi-vector-arguments.test.lua b/test/tarantool-tests/ffi-vector-arguments.test.lua new file mode 100644 index 00000000..330ec991 --- /dev/null +++ b/test/tarantool-tests/ffi-vector-arguments.test.lua @@ -0,0 +1,62 @@ +local ffi = require('ffi') +local tap = require('tap') + +-- The test file to check FFI vector passing correctness. 
+local test = tap.test('ffi-vector-arguments'):skipcond({ + ['NYI for non x64 arches'] = jit.arch ~= 'x64', +}) + +local SIZING +-- Only those are implemented. +if jit.arch == 'x64' then + SIZING = {2, 4,} +else + SIZING = {} +end + +test:plan(#SIZING + 1) + +local ffi_ccall = ffi.load('libfficcall') + +ffi.cdef[[ + typedef float vfloatx2 __attribute__ ((__vector_size__ (8))); + typedef float vfloatx4 __attribute__ ((__vector_size__ (16))); + + vfloatx2 vfloatx2_call(vfloatx2 x); + vfloatx4 vfloatx4_call(vfloatx4 x); + + typedef int int32x4_t __attribute__((__vector_size__ (4 * 4))); + int32x4_t test_hva_varg(int n, ...); +]] + +local function test_self_ret_vector(subtest, nelem) + subtest:plan(1) + local typestr = 'vfloatx' .. nelem + local f = ffi_ccall[typestr .. '_call'] + local arg = {} + for i = 1, nelem do + arg[i - 1] = i + 0LL + end + local res = f(arg) + local table_res = {} + for i = 0, nelem - 1 do + table_res[i] = res[i] + end + subtest:is_deeply(table_res, arg, + 'correct result for ' .. nelem .. '-sized vec') +end + +for i = 1, #SIZING do + test:test('vec-' .. SIZING[i], test_self_ret_vector, SIZING[i]) +end + +local hva_arg_vec = ffi.new('int32x4_t', {0LL, 1LL, 2LL, 3LL}) +local hva_res = ffi_ccall.test_hva_varg(0LL, hva_arg_vec, hva_arg_vec) +local hva_res_tab = {} +local hva_expected = {[0] = 0, 2, 4, 6} +for i = 0, 3 do + hva_res_tab[i] = hva_res[i] +end +test:is_deeply(hva_res_tab, hva_expected, 'correct hva with the int type') + +test:done(true) diff --git a/test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua b/test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua new file mode 100644 index 00000000..8552a5f0 --- /dev/null +++ b/test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua @@ -0,0 +1,82 @@ +local ffi = require('ffi') +local tap = require('tap') + +-- The test file to test various FFI C call conventions for HFA +-- aggregates. +-- See also: https://github.com/LuaJIT/LuaJIT/issues/1455. 
+local test = tap.test('lj-1455-arm64-ffi-ccall-hfa') + +test:plan(7) + +local ffi_ccall = ffi.load('libfficcall') + +ffi.cdef[[ + typedef struct hfa_float22 { + float v[2][2]; + } hfa_float22; + + + typedef struct non_hfa_float222 { + float v[2][2][2]; + } non_hfa_float222; + + typedef struct hfa_float_hole { + float x; + float hole[0][2][2]; + float y; + } hfa_float_hole; + + typedef struct hfa_double2 { + double v[2]; + } hfa_double2; + + typedef struct hfa_double2_a16 { + __attribute__((__aligned__(16))) double v[2]; + } hfa_double2_a16; + + typedef struct hfa_double2_a32 { + __attribute__((__aligned__(32))) double v[4]; + } hfa_double2_a32; + + float hfa_float22_sum(hfa_float22 h); + double hfa_double2_sum(hfa_double2 h); + double hfa_double2_a16_sum(hfa_double2_a16 h); + double hfa_double2_a32_sum(hfa_double2_a32 h); + + typedef struct hfa_0bitfield { + float x; + int : 0; + float y; + float z; + } hfa_0bitfield; + + float hfa_0bitfield_sum(hfa_0bitfield h); + + float non_hfa_float222_sum(non_hfa_float222 h); + + float hfa_float_hole_sum(hfa_float_hole h); +]] + +test:is(ffi_ccall.hfa_float22_sum({{{1, 2}, {3, 4}}}), 10, + 'HFA 2 dimensional correct') +test:is(ffi_ccall.non_hfa_float222_sum({{{{1, 2}, {3, 4}},{{5, 6}, {7, 8}}}}), + 36, 'non HFA array correct') +local supported, func = pcall(function() + return ffi_ccall.hfa_float_hole_sum +end) +if supported then + test:is(func({x = 1, y = 2}), 3, 'HFA float hole correct') +else + test:skip('HFA float hole -- Unsupported by C compiler') +end +test:is(ffi_ccall.hfa_double2_sum({{1, 2}}), 3, 'HFA double correct') +test:is(ffi_ccall.hfa_double2_a16_sum({{1, 2}}), 3, 'align 16 correct') +test:is(ffi_ccall.hfa_double2_a32_sum({{1, 2, 3, 4}}), 10, 'align 32 correct') +supported, func = pcall(function() return ffi_ccall.hfa_0bitfield_sum end) +if supported then + test:is(func({x = 1, y = 2, z = 3}), 6, 'HFA 0 bitfield correct') +else + test:skip('HFA 0 bitfield -- Unsupported by C compiler') +end + +test:done(true) 
diff --git a/test/tarantool-tests/lj-1455-bitfield0-a16.test.lua b/test/tarantool-tests/lj-1455-bitfield0-a16.test.lua new file mode 100644 index 00000000..6f8e9aac --- /dev/null +++ b/test/tarantool-tests/lj-1455-bitfield0-a16.test.lua @@ -0,0 +1,27 @@ +local ffi = require('ffi') +local tap = require('tap') + +-- The test file demonstrates incorrect FFI attributes for the +-- structure with a zero-sized bitfield. +-- See also: https://github.com/LuaJIT/LuaJIT/issues/1455. +local test = tap.test('lj-1455-ffi-conventions') + +test:plan(3) + +ffi.cdef[[ + typedef struct { + int x; + int : 0 __attribute__((aligned(16))); + int y; + int z; + } intx3_0bitfield_a16; +]] + +test:is(ffi.sizeof(ffi.new('intx3_0bitfield_a16')), 24, + 'correct size of struct with 0 bitfield') +test:is(ffi.offsetof('intx3_0bitfield_a16', 'y'), 16, + 'correct offset of field after 0 bitfield') +test:is(ffi.alignof('intx3_0bitfield_a16'), 4, + 'correct total align of struct with 0 bitfield') + +test:done(true) diff --git a/test/tarantool-tests/lj-1455-ffi-conventions.test.lua b/test/tarantool-tests/lj-1455-ffi-conventions.test.lua new file mode 100644 index 00000000..e7a45736 --- /dev/null +++ b/test/tarantool-tests/lj-1455-ffi-conventions.test.lua @@ -0,0 +1,441 @@ +local ffi = require('ffi') +local tap = require('tap') + +-- The test file to test various FFI C call conventions. +-- See also: https://github.com/LuaJIT/LuaJIT/issues/1455. 
+local test = tap.test('lj-1455-ffi-conventions') + +test:plan(39) + +local ffi_ccall = ffi.load('libfficcall') + +ffi.cdef[[ + typedef struct hfa_floatx4_a16 { + float v[4]; + } __attribute__((aligned(16))) hfa_floatx4_a16; + + float test_2_align_hfa(int i, hfa_floatx4_a16 s1, hfa_floatx4_a16 s2); + + typedef struct intx4_a16 { + int v[4]; + } __attribute__((aligned(16))) intx4_a16; + + int test_2_intx4_a16(int i, intx4_a16 s1, intx4_a16 s2); + + typedef struct large_agg_a16 { + int v[18]; + } __attribute__((aligned(16))) large_agg_a16; + + int test_2_large_agg_a16(int x, large_agg_a16 s1, large_agg_a16 s2); + + typedef struct intx3_0bitfield { + int x; + int : 0; + int y; + int z; + } intx3_0bitfield; + + int test_2_intx3_0bitfield_reg(int i, intx3_0bitfield s1, intx3_0bitfield s2); + int test_2_intx3_0bitfield_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + intx3_0bitfield s1, intx3_0bitfield s2); + + typedef struct intx3_0bitfield_a16 { + int x; + int : 0 __attribute__((aligned(16))); + int y; + int z; + } intx3_0bitfield_a16; + + int test_2_intx3_0bitfield_a16_reg(int i, intx3_0bitfield_a16 s1, + intx3_0bitfield_a16 s2); + int test_2_intx3_0bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + intx3_0bitfield_a16 s1, + intx3_0bitfield_a16 s2); + + typedef struct intx3_full_bitfield_a16 { + int x; + int y: 32 __attribute__((aligned(16))); + int z; + } intx3_full_bitfield_a16; + + int test_2_intx3_full_bitfield_a16_reg(int i, intx3_full_bitfield_a16 s1, + intx3_full_bitfield_a16 s2); + int test_2_intx3_full_bitfield_a16_stack(int i, int i2, int i3, int i4, + int i5, int i6, int i7, int i8, + int i9, intx3_full_bitfield_a16 s1, + intx3_full_bitfield_a16 s2); + + typedef struct intx3_half_bitfield { + int x : 16; + int y : 16; + int z; + } intx3_half_bitfield; + + int test_2_intx3_half_bitfield_reg(int i, intx3_half_bitfield s1, + intx3_half_bitfield s2); + int 
test_2_intx3_half_bitfield_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + intx3_half_bitfield s1, + intx3_half_bitfield s2); + + typedef struct intx3_half_bitfield_a16 { + int x : 16; + int y : 16 __attribute__((aligned(16))); + int z; + } intx3_half_bitfield_a16; + + int test_2_intx3_half_bitfield_a16_reg(int i, intx3_half_bitfield_a16 s1, + intx3_half_bitfield_a16 s2); + int test_2_intx3_half_bitfield_a16_stack(int i, int i2, int i3, int i4, + int i5, int i6, int i7, int i8, + int i9, intx3_half_bitfield_a16 s1, + intx3_half_bitfield_a16 s2); + + typedef struct la16l { + long long x __attribute__((aligned(16))); + long long y; + } la16l; + + int test_2_la16l_reg(int i, la16l s1, la16l s2); + int test_2_la16l_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, + int i8, int i9, la16l s1, la16l s2); + + typedef struct a16_tsp { + struct { + long long x; + long long y; + } __attribute__((aligned(16))); + } a16_tsp; + + int test_2_a16_tsp_reg(int i, a16_tsp s1, a16_tsp s2); + int test_2_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, a16_tsp s1, a16_tsp s2); + + typedef struct f_a16_tsp { + struct { + long long x __attribute__((aligned(16))); + long long y; + }; + } f_a16_tsp; + + int test_2_f_a16_tsp_reg(int i, f_a16_tsp s1, f_a16_tsp s2); + int test_2_f_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, f_a16_tsp s1, + f_a16_tsp s2); + + typedef struct is_no_align { + int i; + short s; + } is_no_align; + + int test_2_is_no_align_reg(int i, is_no_align s1, is_no_align s2); + int test_2_is_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, is_no_align s1, + is_no_align s2); + + typedef struct is_a16 { + int i; + short s; + } __attribute__((aligned(16))) is_a16; + + int test_2_is_a16_reg(int i, is_a16 s1, is_a16 s2); + int test_2_is_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, + int i8, int i9, 
is_a16 s1, is_a16 s2); + + typedef struct isis_no_align { + int i; + short s; + int i2; + short s2; + } isis_no_align; + + int test_2_isis_no_align_reg(int i, isis_no_align s1, isis_no_align s2); + int test_2_isis_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, isis_no_align s1, + isis_no_align s2); + + typedef struct isis_a16 + { + int i; + short s; + int i2; + short s2; + } __attribute__((aligned(16))) isis_a16; + + int test_2_isis_a16_reg(int i, isis_a16 s1, isis_a16 s2); + int test_2_isis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, isis_a16 s1, isis_a16 s2); + + typedef struct isisis + { + int i; + short s; + int i2; + short s2; + int i3; + short s3; + } isisis_no_align; + + int test_2_isisis_no_align_reg(int i, isisis_no_align s1, isisis_no_align s2); + int test_2_isisis_no_align_stack(int i, int i2, int i3, int i4, int i5, + int i6, int i7, int i8, int i9, + isisis_no_align s1, isisis_no_align s2); + + typedef struct isisis_a16 + { + int i; + short s; + int i2; + short s2; + int i3; + short s3; + } __attribute__((aligned(16))) isisis_a16; + + int test_2_isisis_a16_reg(int i, isisis_a16 s1, isisis_a16 s2); + int test_2_isisis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, isisis_a16 s1, + isisis_a16 s2); + + int test_2_isis_no_align_split(int i, int i2, int i3, int i4, int i5, int i6, + int i7, isis_no_align s1, isis_no_align s2); + int test_2_isis_a16_split(int i, int i2, int i3, int i4, int i5, int i6, + int i7, isis_a16 s1, isis_a16 s2); + + typedef struct ill_packed { + int x; + long long y; + } __attribute__((packed)) ill_packed; + + typedef struct ii { + int x; + int y; + } ii; + + int test_2_ill_packed(int i, ill_packed s1, ill_packed s2); + int test_2_ill_packed_reord(int i, ill_packed s1, ill_packed s2, int i2, + ii s3); + int test_2_ill_packed_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, ill_packed s1, + 
ill_packed s2); + + typedef struct ill_packed_a16 { + int x; + long long y; + } __attribute__((packed, aligned(16))) ill_packed_a16; + + int test_2_ill_packed_a16(int i, ill_packed_a16 s1, ill_packed_a16 s2); + int test_2_ill_packed_a16_reord(int i, ill_packed_a16 s1, ill_packed_a16 s2, + int i2, ii s3); + int test_2_ill_packed_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, + int i7, int i8, int i9, ill_packed_a16 s1, + ill_packed_a16 s2); +]] + +test:is(ffi_ccall.test_2_align_hfa(0LL, + {{0, 1, 2, 3}}, {{4, 5, 6, 7}}), + 28, 'correct align hfa') + +test:is(ffi_ccall.test_2_intx4_a16(0LL, + {{0LL, 1LL, 2LL, 3LL}}, {{4LL, 5LL, 6LL, 7LL}}), + 28, 'correct align hva') + +local LARGE_HVA_SZ = 18 +local large_agg_sum = 0LL +local large_agg1 = {} +local large_agg2 = {} +for i = 0, LARGE_HVA_SZ - 1 do + large_agg1[i] = i + 0LL + large_agg2[i] = LARGE_HVA_SZ + i + 0LL + large_agg_sum = large_agg_sum + large_agg1[i] + large_agg2[i] +end + +test:is(ffi_ccall.test_2_large_agg_a16(0LL, {large_agg1}, {large_agg2}), + large_agg_sum, 'correct large align agg') + +test:is(ffi_ccall.test_2_intx3_0bitfield_reg(0LL, + {x = 1LL, y = 2LL, z = 3LL}, {x = 4LL, y = 5LL, z = 6LL}), + 21, 'correct intx3 0 bitfield reg') + +test:is(ffi_ccall.test_2_intx3_0bitfield_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL, z = 12LL}, + {x = 13LL, y = 14LL, z = 15LL}), + 120, 'correct intx3 0 bitfield stack') + +test:is(ffi_ccall.test_2_intx3_0bitfield_a16_reg(0LL, + {x = 1LL, y = 2LL, z = 3LL}, + {x = 4LL, y = 5LL, z = 6LL}), + 21, 'correct intx3 0 bitfield align 16 reg') + +test:is(ffi_ccall.test_2_intx3_0bitfield_a16_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL, z = 12LL}, + {x = 13LL, y = 14LL, z = 15LL}), + 120, 'correct intx3 0 bitfield align 16 stack') + +test:is(ffi_ccall.test_2_intx3_full_bitfield_a16_reg( + 0LL, + {x = 1LL, y = 2LL, z = 3LL}, + {x = 4LL, y = 5LL, z = 6LL}), + 21, 'correct intx3 0 full bitfield align 16 reg') 
+ +test:is(ffi_ccall.test_2_intx3_full_bitfield_a16_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL, z = 12LL}, + {x = 13LL, y = 14LL, z = 15LL}), + 120, 'correct intx3 full bitfield align 16 stack') + +test:is(ffi_ccall.test_2_intx3_half_bitfield_reg(0LL, + {x = 1LL, y = 2LL, z = 3LL}, + {x = 4LL, y = 5LL, z = 6LL}), + 21, 'correct intx3 0 half bitfield reg') + +test:is(ffi_ccall.test_2_intx3_half_bitfield_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL, z = 12LL}, + {x = 13LL, y = 14LL, z = 15LL}), + 120, 'correct intx3 half bitfield stack') + +test:is(ffi_ccall.test_2_intx3_half_bitfield_a16_reg(0LL, + {x = 1LL, y = 2LL, z = 3LL}, + {x = 4LL, y = 5LL, z = 6LL}), + 21, 'correct intx3 0 half bitfield align 16 reg') + +test:is(ffi_ccall.test_2_intx3_half_bitfield_a16_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL, z = 12LL}, + {x = 13LL, y = 14LL, z = 15LL}), + 120, 'correct intx3 half bitfield align 16 stack') + +test:is(ffi_ccall.test_2_la16l_reg(0LL, {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), + 10, 'correct la16l reg') + +test:is(ffi_ccall.test_2_la16l_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), + 91, 'correct la16l stack') + +test:is(ffi_ccall.test_2_a16_tsp_reg(0LL, + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), + 10, 'correct tsp reg') + +test:is(ffi_ccall.test_2_a16_tsp_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), + 91, 'correct tsp stack') + +test:is(ffi_ccall.test_2_f_a16_tsp_reg(0LL, + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), + 10, 'correct tsp aligned field reg') + +test:is(ffi_ccall.test_2_f_a16_tsp_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), + 91, 'correct tsp aligned field stack') + +test:is(ffi_ccall.test_2_is_no_align_reg(0LL, + {i = 1LL, s = 2LL}, {i = 3LL, s = 4LL}), + 10, 'correct is no align reg') + 
+test:is(ffi_ccall.test_2_is_no_align_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {i = 10LL, s = 11LL}, {i = 12LL, s = 13LL}), + 91, 'correct is no align stack') + +test:is(ffi_ccall.test_2_is_a16_reg(0LL, + {i = 1LL, s = 2LL}, {i = 3LL, s = 4LL}), + 10, 'correct is align 16 reg') + +test:is(ffi_ccall.test_2_is_a16_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {i = 10LL, s = 11LL}, {i = 12LL, s = 13LL}), + 91, 'correct is align 16 stack') + +test:is(ffi_ccall.test_2_isis_no_align_reg(0LL, + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL}, + {i = 5LL, s = 6LL, i2 = 7LL, s2 = 8LL}), + 36, 'correct isis no align reg') + +test:is(ffi_ccall.test_2_isis_no_align_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL}, + {i = 14LL, s = 15LL, i2 = 16LL, s2 = 17LL}), + 153, 'correct isis no align stack') + +test:is(ffi_ccall.test_2_isis_a16_reg(0LL, + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL}, + {i = 5LL, s = 6LL, i2 = 7LL, s2 = 8LL}), + 36, 'correct isis align 16 reg') + +test:is(ffi_ccall.test_2_isis_a16_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL}, + {i = 14LL, s = 15LL, i2 = 16LL, s2 = 17LL}), + 153, 'correct isis align 16 stack') + +test:is(ffi_ccall.test_2_isisis_no_align_reg(0LL, + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL, i3 = 5LL, s3 = 6LL}, + {i = 7LL, s = 8LL, i2 = 9LL, s2 = 10LL}), + 55, 'correct isisis no align reg') + +test:is(ffi_ccall.test_2_isisis_no_align_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL, i3 = 14LL, s3 = 15LL}, + {i = 16LL, s = 17LL, i2 = 18LL, s2 = 19LL, i3 = 20LL, s3 = 21LL}), + 231, 'correct isisis no align stack') + +test:is(ffi_ccall.test_2_isisis_a16_reg(0LL, + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL, i3 = 5LL, s3 = 6LL}, + {i = 7LL, s = 8LL, i2 = 9LL, s2 = 10LL}), + 55, 'correct isisis align 16 reg') + +test:is(ffi_ccall.test_2_isisis_a16_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 
8LL, 9LL, + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL, i3 = 14LL, s3 = 15LL}, + {i = 16LL, s = 17LL, i2 = 18LL, s2 = 19LL, i3 = 20LL, s3 = 21LL}), + 231, 'correct isisis align 16 stack') + + +test:is(ffi_ccall.test_2_isis_no_align_split( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, + {i = 8LL, s = 9LL, i2 = 10LL, s2 = 11LL}, + {i = 12LL, s = 13LL, i2 = 14LL, s2 = 15LL}), + 120, 'correct isis no align split') + +test:is(ffi_ccall.test_2_isis_a16_split( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, + {i = 8LL, s = 9LL, i2 = 10LL, s2 = 11LL}, + {i = 12LL, s = 13LL, i2 = 14LL, s2 = 15LL}), + 120, 'correct isis a16 split') + +test:is(ffi_ccall.test_2_ill_packed(0LL, + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), + 10, 'correct ill packed') + +test:is(ffi_ccall.test_2_ill_packed_reord(0LL, + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}, + 5LL, {x = 6LL, y = 7LL}), + 28, 'correct ill packed reord') + +test:is(ffi_ccall.test_2_ill_packed_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), + 91, 'correct ill packed stack') + +test:is(ffi_ccall.test_2_ill_packed_a16(0LL, + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), + 10, 'correct ill packed a16') + +test:is(ffi_ccall.test_2_ill_packed_a16_reord(0LL, + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}, + 5LL, {x = 6LL, y = 7LL}), + 28, 'correct ill packed a16 reord') + +test:is(ffi_ccall.test_2_ill_packed_a16_stack( + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), + 91, 'correct ill packed a16 stack') + +test:done(true) -- 2.54.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
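The `_reg` and `_stack` variants in the test above differ only in the number of leading `int` arguments. Nine leading integers exceed the integer argument registers of both the System V x86-64 ABI (six) and AAPCS64 (eight), so the two structures that follow have to be placed on the stack, while in the `_reg` variants they can still be passed in registers. The expected values are plain sums of all arguments. A minimal native C sketch of that arithmetic, assuming the library functions for `isis_no_align` simply add up their arguments like the other functions in libfficcall.c (the names `sum_reg`, `sum_stack` and `run_checks` are hypothetical and are not part of the patch):

```c
#include <assert.h>

/* Same field list as `isis_no_align` in the test's cdef. */
typedef struct isis_no_align {
  int i;
  short s;
  int i2;
  short s2;
} isis_no_align;

/* Hypothetical stand-in for test_2_isis_no_align_reg(). */
int sum_reg(int i, isis_no_align s1, isis_no_align s2)
{
  return i + s1.i + s1.s + s1.i2 + s1.s2 + s2.i + s2.s + s2.i2 + s2.s2;
}

/*
 * Hypothetical stand-in for test_2_isis_no_align_stack(): the nine
 * leading ints use up the integer argument registers first.
 */
int sum_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7,
              int i8, int i9, isis_no_align s1, isis_no_align s2)
{
  return i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 +
         s1.i + s1.s + s1.i2 + s1.s2 + s2.i + s2.s + s2.i2 + s2.s2;
}

/* Reproduce the two expected values used by the Lua test. */
int run_checks(void)
{
  isis_no_align a = {1, 2, 3, 4};
  isis_no_align b = {5, 6, 7, 8};
  isis_no_align c = {10, 11, 12, 13};
  isis_no_align d = {14, 15, 16, 17};
  assert(sum_reg(0, a, b) == 36);                              /* 0 + 1 + ... + 8 */
  assert(sum_stack(1, 2, 3, 4, 5, 6, 7, 8, 9, c, d) == 153);   /* 1 + ... + 17 */
  return 0;
}
```

A native C compiler gets these sums right by construction. The Lua test checks that LuaJIT's FFI places the same arguments where the callee expects them.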
* Re: [Tarantool-patches] [PATCH luajit 4/5] FFI: Various ABI and calling convention fixes.
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 4/5] FFI: Various ABI and calling convention fixes Sergey Kaplun via Tarantool-patches
@ 2026-06-01 13:02 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 10+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2026-06-01 13:02 UTC (permalink / raw)
To: Sergey Kaplun, Evgeniy Temirgaleev; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 55011 bytes --]

Hi, Sergey,

thanks for the patch! See my comments below.

It looks like you tried to test the patch as thoroughly as possible.
Could you tell me how you compiled your test list? Is there a matrix of
possible types and structures you used? How do you assess the
completeness of testing?

Sergey

On 5/30/26 19:04, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Sergey Kaplun.
>
> (cherry picked from commit 5b2e51db2c5e445cb98e026fc1e290c14eca67c1)
>
> This patch fixes several issues at once:
>
> 1) On x64, passing a small structure by value on the stack lacks the
>    alignment check. This patch fixes that.
>
> 2) According to the AAPCS64, the alignment of the argument to be passed
>    by value is determined by its natural alignment [1] instead of the
>    alignment of the type (see B.6 [2]). Not applicable to OSX. To fix
>    this, we need to store the field alignment for the `CT_FIELD` ctype
>    since field alignment determines natural alignment.
>
>    On OSX the "packed" stack rules [3] apply to non-variadic
>    functions too. This patch fixes that. Unfortunately, it breaks the
>    calling convention for native types passed on the stack for
>    non-variadic functions. This leads to the failure of tests added in
>    the previous commit. This will be fixed in the next commit.
>
> 3) The x64 ABI allows reordering of arguments.
>    Due to the lack of a corresponding check, structures that should be
>    passed on the stack may be placed in registers anyway. This patch
>    fixes that by adding the corresponding flag in `ccall_set_args()`.
>
> 4) Also, this patch fixes the zero-sized bitfield behaviour to be
>    consistent with GCC (after 12.1 [4]) and Clang. Be aware that the
>    alignment of zero-sized fields is not applied to each field of the
>    structure.
>
> 5) It fixes the handling of multidimensional HFA structures on arm64.
>    Structures with zero-sized arrays are considered non-HFA.
>
> Also, this commit adds tests for vector arguments and empty structures
> to make sure that their behaviour is still valid after the changes.
>
> [1]:https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#510composite-types
> [2]:https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules
> [3]:https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Pass-arguments-to-functions-correctly
> [4]:https://gcc.gnu.org/gcc-12/changes.html
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#12480
> ---
> src/lj_ccall.c | 59 +-
> src/lj_cparse.c | 9 +-
> src/lj_ctype.h | 2 +-
> .../ffi-call-empty-struct.test.lua | 47 ++
> test/tarantool-tests/ffi-ccall/CMakeLists.txt | 4 +
> test/tarantool-tests/ffi-ccall/libfficcall.c | 587 ++++++++++++++++++
> .../ffi-vector-arguments.test.lua | 62 ++
> .../lj-1455-arm64-ffi-ccall-hfa.test.lua | 82 +++
> .../lj-1455-bitfield0-a16.test.lua | 27 +
> .../lj-1455-ffi-conventions.test.lua | 441 +++++++++++++
> 10 files changed, 1307 insertions(+), 13 deletions(-)
> create mode 100644 test/tarantool-tests/ffi-call-empty-struct.test.lua
> create mode 100644 test/tarantool-tests/ffi-vector-arguments.test.lua
> create mode 100644 test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua
> create mode 100644 test/tarantool-tests/lj-1455-bitfield0-a16.test.lua
> create mode 100644
test/tarantool-tests/lj-1455-ffi-conventions.test.lua > > diff --git a/src/lj_ccall.c b/src/lj_ccall.c > index 104c9d34..1beccc10 100644 > --- a/src/lj_ccall.c > +++ b/src/lj_ccall.c > @@ -168,7 +168,9 @@ > if (ccall_struct_arg(cc, cts, d, rcl, o, narg)) goto err_nyi; \ > nsp = cc->nsp; ngpr = cc->ngpr; nfpr = cc->nfpr; \ > continue; \ > - } /* Pass all other structs by value on stack. */ > + } else { /* Pass all other structs by value on stack. */ \ > + onstack = 1; \ > + } > > #define CCALL_HANDLE_COMPLEXARG \ > isfp = 2; /* Pass complex in FPRs or on stack. Needs postprocessing. */ > @@ -183,7 +185,7 @@ > } \ > } else { /* Try to pass argument in GPRs. */ \ > /* Note that reordering is explicitly allowed in the x64 ABI. */ \ > - if (n <= 2 && ngpr + n <= maxgpr) { \ > + if (!onstack && n <= 2 && ngpr + n <= maxgpr) { \ > dp = &cc->gpr[ngpr]; \ > ngpr += n; \ > goto done; \ > @@ -350,7 +352,7 @@ > nfpr = CCALL_NARG_FPR; /* Prevent reordering. */ \ > } \ > } else { /* Try to pass argument in GPRs. */ \ > - if (!LJ_TARGET_OSX && (d->info & CTF_ALIGN) > CTALIGN_PTR) \ > + if (!LJ_TARGET_OSX && !rp && ccall_struct_align(cts, d) > CTALIGN_PTR) \ > ngpr = (ngpr + 1u) & ~1u; /* Align to regpair. */ \ > if (ngpr + n <= maxgpr) { \ > dp = &cc->gpr[ngpr]; \ > @@ -661,7 +663,7 @@ static int ccall_classify_struct(CTState *cts, CType *ct, int *rcl, CTSize ofs) > fofs = ofs+ct->size; > if (ctype_isfield(ct->info)) > ccall_classify_ct(cts, ctype_rawchild(cts, ct), rcl, fofs); > - else if (ctype_isbitfield(ct->info)) > + else if (ctype_isbitfield(ct->info) && ctype_bitbsz(ct->info)) > rcl[(fofs >= 8)] |= CCALL_RCL_INT; /* NYI: unaligned bitfields? */ > else if (ctype_isxattrib(ct->info, CTA_SUBTYPE)) > ccall_classify_struct(cts, ctype_rawchild(cts, ct), rcl, fofs); > @@ -700,8 +702,11 @@ static int ccall_struct_arg(CCallState *cc, CTState *cts, CType *d, int *rcl, > if (ccall_struct_reg(cc, cts, dp, rcl)) { > /* Register overflow? Pass on stack. 
*/ > MSize nsp = cc->nsp, sz = rcl[1] ? 2*CTSIZE_PTR : CTSIZE_PTR; > + MSize align = (1u << ctype_align(d->info)) - 1; > if (nsp + sz > CCALL_SIZE_STACK) > return 1; /* Too many arguments. */ > + if (CCALL_ALIGN_STACKARG && align > CTSIZE_PTR-1) > + nsp = (nsp + align) & ~align; /* Align argument on stack. */ > cc->nsp = nsp + sz; > memcpy((uint8_t *)cc->stack + nsp, dp, sz); > } > @@ -776,6 +781,31 @@ noth: /* Not a homogeneous float/double aggregate. */ > > #if LJ_TARGET_ARM64 > > +#if !LJ_TARGET_OSX > +/* Alignment of pass-by-value structs: 8 or 16. */ > +static CTInfo ccall_struct_align_arm64(CTState *cts, CType *ct) > +{ > + CTSize sz; > + if (ct->sib) { > + while (ct->sib) { > + ct = ctype_get(cts, ct->sib); > + if (ctype_isfield(ct->info)) { > + if ((ct->info & CTF_ALIGN) > CTALIGN_PTR) return CTALIGN(4); > + } else if (ctype_isxattrib(ct->info, CTA_SUBTYPE)) { > + CType *sct = ctype_rawchild(cts, ct); > + CTInfo info = lj_ctype_info(cts, ctype_typeid(cts, sct), &sz); > + if ((info & CTF_ALIGN) > CTALIGN_PTR) return CTALIGN(4); > + } > + } > + } else { > + CTInfo info = lj_ctype_info(cts, ctype_typeid(cts, ct), &sz); > + if ((info & CTF_ALIGN) > CTALIGN_PTR) return CTALIGN(4); > + } > + return CTALIGN_PTR; > +} > +#define ccall_struct_align(cts, ct) ccall_struct_align_arm64((cts), (ct)) > +#endif > + > /* Classify a struct based on its fields. 
*/ > static unsigned int ccall_classify_struct(CTState *cts, CType *ct) > { > @@ -787,10 +817,10 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) > ct = ctype_get(cts, ct->sib); > if (ctype_isfield(ct->info)) { > sct = ctype_rawchild(cts, ct); > - if (ctype_isarray(sct->info)) { > + if (ctype_isarray(sct->info) && !sct->size) goto noth; > + while (ctype_isarray(sct->info)) { > CType *cct = ctype_rawchild(cts, sct); > - if (!cct->size) continue; > - m = sct->size / cct->size; > + m *= sct->size / cct->size; > sct = cct; > } > if (ctype_isfp(sct->info)) { > @@ -804,7 +834,7 @@ static unsigned int ccall_classify_struct(CTState *cts, CType *ct) > } else { > goto noth; > } > - } else if (ctype_isbitfield(ct->info)) { > + } else if (ctype_isbitfield(ct->info) && ctype_bitbsz(ct->info)) { > goto noth; > } else if (ctype_isxattrib(ct->info, CTA_SUBTYPE)) { > sct = ctype_rawchild(cts, ct); > @@ -898,6 +928,11 @@ void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft) > > #endif > > +#ifndef ccall_struct_align > +/* Alignment of pass-by-value structs. */ > +#define ccall_struct_align(cts, ct) ((ct)->info & CTF_ALIGN) > +#endif > + > /* -- Common C call handling ---------------------------------------------- */ > > /* Infer the destination CTypeID for a vararg argument. > @@ -1004,6 +1039,9 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, > CTSize sz; > MSize n, isfp = 0, isva = 0; > void *dp, *rp = NULL; > +#if LJ_TARGET_X64 && !LJ_ABI_WIN > + int onstack = 0; > +#endif > > if (fid) { /* Get argument type from field. */ > CType *ctf = ctype_get(cts, fid); > @@ -1042,7 +1080,10 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, > > /* Otherwise pass argument on stack. */ > if (CCALL_ALIGN_STACKARG) { /* Align argument on stack. 
*/ > - MSize align = (1u << ctype_align(d->info)) - 1; > + MSize align = (1u << ctype_align(ccall_struct_align(cts, d))) - 1; > +#if LJ_TARGET_ARM64 && LJ_TARGET_OSX > + isva = 1; > +#endif > if (rp || (CCALL_PACK_STACKARG && isva && align < CTSIZE_PTR-1)) > align = CTSIZE_PTR-1; > nsp = (nsp + align) & ~align; > diff --git a/src/lj_cparse.c b/src/lj_cparse.c > index 9f3b032a..ff23b44b 100644 > --- a/src/lj_cparse.c > +++ b/src/lj_cparse.c > @@ -1320,14 +1320,16 @@ static void cp_struct_layout(CPState *cp, CTypeID sid, CTInfo sattr) > align = ctype_align(attr); > if (cp->packstack[cp->curpack] < align) > align = cp->packstack[cp->curpack]; > - if (align > maxalign) maxalign = align; > + bsz = ctype_bitcsz(ct->info); /* Bitfield size (temp.). */ > + if (align > maxalign && bsz) maxalign = align; > amask = (8u << align) - 1; > > - bsz = ctype_bitcsz(ct->info); /* Bitfield size (temp.). */ > if (bsz == CTBSZ_FIELD || !ctype_isfield(ct->info)) { > bsz = csz; /* Regular fields or subtypes always fill the container. */ > bofs = (bofs + amask) & ~amask; /* Start new aligned field. */ > ct->size = (bofs >> 3); /* Store field offset. */ > + if (ctype_isfield(ct->info)) > + ct->info = CTINFO(CT_FIELD, ctype_cid(ct->info)) + CTALIGN(align); > } else { /* Bitfield. */ > if (bsz == 0 || (attr & CTFP_ALIGNED) || > (!((attr|sattr) & CTFP_PACKED) && (bofs & amask) + bsz > csz)) > @@ -1335,7 +1337,8 @@ static void cp_struct_layout(CPState *cp, CTypeID sid, CTInfo sattr) > > /* Prefer regular field over bitfield. */ > if (bsz == csz && (bofs & amask) == 0) { > - ct->info = CTINFO(CT_FIELD, ctype_cid(ct->info)); > + ct->info = CTINFO(CT_FIELD, ctype_cid(ct->info)) + > + CTALIGN(lj_fls(sz)); > ct->size = (bofs >> 3); /* Store field offset. 
*/ > } else { > ct->info = CTINFO(CT_BITFIELD, > diff --git a/src/lj_ctype.h b/src/lj_ctype.h > index 2d393eb9..bee3f72a 100644 > --- a/src/lj_ctype.h > +++ b/src/lj_ctype.h > @@ -51,7 +51,7 @@ LJ_STATIC_ASSERT(((int)CT_STRUCT & (int)CT_ARRAY) == CT_STRUCT); > ** |FUNC ....VS.. cc cid | nargs | field | name? | name? | > ** |TYPEDEF cid | | | name | name | > ** |ATTRIB attrnum cid | attr | sib? | type? | | > -** |FIELD cid | offset | field | | name? | > +** |FIELD A cid | offset | field | | name? | > ** |BITFIELD B.cvU csz bsz pos | offset | field | | name? | > ** |CONSTVAL c cid | value | const | name | name | > ** |EXTERN cid | | sib? | name | name | > diff --git a/test/tarantool-tests/ffi-call-empty-struct.test.lua b/test/tarantool-tests/ffi-call-empty-struct.test.lua > new file mode 100644 > index 00000000..cb7d3ea2 > --- /dev/null > +++ b/test/tarantool-tests/ffi-call-empty-struct.test.lua > @@ -0,0 +1,47 @@ > +local ffi = require('ffi') > +local tap = require('tap') > + > +-- The test file to check FFI correctness for the empty structs. 
> +local test = tap.test('ffi-call-empty-struct') > + > +test:plan(6) > + > +local ffi_ccall = ffi.load('libfficcall') > + > +ffi.cdef[[ > + struct empty {}; > + struct super_empty {int arg[0];}; > + struct sort_of_empty {struct super_empty;}; > + > + struct empty empty_ret(void); > + struct super_empty super_empty_ret(void); > + struct sort_of_empty sort_of_empty_ret(void); > + > + int super_empty_arg(struct super_empty e, int a); > + int sort_of_empty_arg(struct sort_of_empty e, int a); > + int empty_arg(struct empty e, int a); > +]] > + > +local MAGIC = 42LL > + > +local empty_t = ffi.typeof('struct empty') > +local super_empty_t = ffi.typeof('struct super_empty') > +local sort_of_empty_t = ffi.typeof('struct sort_of_empty') > + > +test:is(ffi.typeof(ffi_ccall.empty_ret()), empty_t, 'correct empty ret type') > +test:is(ffi.typeof(ffi_ccall.super_empty_ret()), super_empty_t, > + 'correct super_empty ret type') > +test:is(ffi.typeof(ffi_ccall.sort_of_empty_ret()), sort_of_empty_t, > + 'correct sort_of_empty ret type') > + > +local empty_o = empty_t() > +local super_empty_o = super_empty_t() > +local sort_of_empty_o = sort_of_empty_t() > + > +test:is(ffi_ccall.empty_arg(empty_o, MAGIC), MAGIC, 'correct empty arg handle') > +test:is(ffi_ccall.super_empty_arg(super_empty_o, MAGIC), MAGIC, > + 'correct super_empty arg handle') > +test:is(ffi_ccall.sort_of_empty_arg(sort_of_empty_o, MAGIC), MAGIC, > + 'correct sort_of_empty arg handle') > + > +test:done(true) > diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > index 1d004591..dfd58bd2 100644 > --- a/test/tarantool-tests/ffi-ccall/CMakeLists.txt > +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt > @@ -1,8 +1,12 @@ > list(APPEND tests > + ffi-call-empty-struct.test.lua > + ffi-vector-arguments.test.lua > ffi-ccall-arm64-fp-convention.test.lua > lj-205-arm64-osx-ffi-enum-arg.test.lua > lj-205-arm64-osx-ffi-small-arg.test.lua > 
lj-1357-arm64-struct-array-pass-by-val.test.lua
> +  lj-1455-arm64-ffi-ccall-hfa.test.lua
> +  lj-1455-ffi-conventions.test.lua
>  )

The list is not sorted alphabetically:

--- a/test/tarantool-tests/ffi-ccall/CMakeLists.txt
+++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt
@@ -1,12 +1,12 @@
 list(APPEND tests
   ffi-call-empty-struct.test.lua
-  ffi-vector-arguments.test.lua
   ffi-ccall-arm64-fp-convention.test.lua
-  lj-205-arm64-osx-ffi-enum-arg.test.lua
-  lj-205-arm64-osx-ffi-small-arg.test.lua
+  ffi-vector-arguments.test.lua
   lj-1357-arm64-struct-array-pass-by-val.test.lua
   lj-1455-arm64-ffi-ccall-hfa.test.lua
   lj-1455-ffi-conventions.test.lua
+  lj-205-arm64-osx-ffi-enum-arg.test.lua
+  lj-205-arm64-osx-ffi-small-arg.test.lua
 )

>
>  BuildTestCLib(libfficcall libfficcall.c "${tests}")
> diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c
> index ecb21752..60d30f3c 100644
> --- a/test/tarantool-tests/ffi-ccall/libfficcall.c
> +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c
> @@ -1,4 +1,13 @@
>  #include <stdint.h>
> +#include <stdarg.h>
> +
> +#define lengthof(a) (sizeof(a) / sizeof((a)[0]))
> +
> +#define UNUSED(x) ((void)(x))
> +
> +#if defined(__clang__)
> +#undef __GNUC__
> +#endif
>
>  struct sz12_t {
>    float f1;
> @@ -85,7 +94,585 @@ typedef struct hfa_float2 {
>    float v[2];
>  } hfa_float2;
>
> +typedef struct hfa_float22 {
> +  float v[2][2];
> +} hfa_float22;
> +
> +typedef struct non_hfa_float222 {
> +  float v[2][2][2];
> +} non_hfa_float222;
> +
> +typedef struct hfa_double2 {
> +  double v[2];
> +} hfa_double2;
> +
> +typedef struct hfa_double2_a16 {
> +  __attribute__((__aligned__(16))) double v[2];
> +} hfa_double2_a16;
> +
> +typedef struct hfa_double2_a32 {
> +  __attribute__((__aligned__(32))) double v[4];
> +} hfa_double2_a32;
> +
>  float hfa_float2_sum(hfa_float2 h)
>  {
>    return h.v[0] + h.v[1];
>  }
> +
> +float hfa_float22_sum(hfa_float22 h)
> +{
> +  return h.v[0][0] + h.v[0][1] + h.v[1][0] + h.v[1][1];
> +}
> +
> 
+float non_hfa_float222_sum(non_hfa_float222 h) > +{ > + return h.v[0][0][0] + h.v[0][0][1] + h.v[0][1][0] + h.v[0][1][1] + > + h.v[1][0][0] + h.v[1][0][1] + h.v[1][1][0] + h.v[1][1][1]; > +} > + > +/* > + * Incorrect GCC behaviour. > + * See:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125023. > + */ > +#if !defined(__GNUC__) > +typedef struct hfa_float_hole { > + float x; > + float hole[0][2][2]; > + float y; > +} hfa_float_hole; > + > +float hfa_float_hole_sum(hfa_float_hole h) > +{ > + return h.x + h.y; > +} > +#endif /* GCC */ > + > +double hfa_double2_sum(hfa_double2 h) > +{ > + return h.v[0] + h.v[1]; > +} > + > +double hfa_double2_a16_sum(hfa_double2_a16 h) > +{ > + return h.v[0] + h.v[1]; > +} > + > +double hfa_double2_a32_sum(hfa_double2_a32 h) > +{ > + return h.v[0] + h.v[1] + h.v[2] + h.v[3]; > +} > + > +/* > + * Enable only for GCC >= 12.0.0 > + * See the first paragraph about the 0 bitfield: > + *https://gcc.gnu.org/gcc-12/changes.html. > + */ > +#if !defined(__GNUC__) || __GNUC__ >= 12 > +typedef struct hfa_0bitfield { > + float x; > + int : 0; > + float y; > + float z; > +} hfa_0bitfield; > + > +float hfa_0bitfield_sum(hfa_0bitfield h) > +{ > + return h.x + h.y + h.z; > +} > +#endif /* GNUC >= 12.0.0 */ > + > +/****************************************************************/ > +/* Empty structures. 
*/ > +/****************************************************************/ > + > +struct empty {}; > + > +struct super_empty { > + int arr[0]; > +}; > + > +struct sort_of_empty { > + struct super_empty e; > +}; > + > +struct empty empty_ret(void) > +{ > + struct empty e; > + return e; > +} > + > +struct super_empty super_empty_ret(void) > +{ > + struct super_empty e; > + return e; > +} > + > +struct sort_of_empty sort_of_empty_ret(void) > +{ > + struct sort_of_empty e; > + return e; > +} > + > +int empty_arg(struct empty e, int a) > +{ > + return a; > +} > + > +int super_empty_arg(struct super_empty e, int a) > +{ > + return a; > +} > + > +int sort_of_empty_arg(struct sort_of_empty e, int a) > +{ > + return a; > +} > + > +/****************************************************************/ > +/* Vector passing. */ > +/****************************************************************/ > + > +/* Test direct vector passing. */ > +typedef float vfloatx2 __attribute__ ((__vector_size__ (8))); > +typedef float vfloatx4 __attribute__ ((__vector_size__ (16))); > + > +/* Return the given value without change. */ > +vfloatx2 vfloatx2_call(vfloatx2 x) { return x; } > +vfloatx4 vfloatx4_call(vfloatx4 x) { return x; } > + > +typedef int int32x4_t __attribute__((__vector_size__ (4 * 4))); > + > +int32x4_t test_hva_varg(int n, ...) > +{ > + va_list vl; > + va_start(vl, n); > + int32x4_t a = va_arg(vl, int32x4_t); > + int32x4_t b = va_arg(vl, int32x4_t); > + va_end(vl); > + int32x4_t res = a + b; > + return res; > +} > + > +/****************************************************************/ > +/* Various argument types. */ > +/****************************************************************/ > + > +/* Testing alignment with aggregates. */ > + > +/* > + * HFA, aggregates with size <= 16 bytes and aggregates with > + * size > 16 bytes. 
> + */ > +typedef struct hfa_floatx4_a16 { > + float v[4]; > +} __attribute__((aligned(16))) hfa_floatx4_a16; > + > +float test_2_align_hfa(int i, hfa_floatx4_a16 s1, hfa_floatx4_a16 s2) > +{ > + UNUSED(i); > + const float *v1 = s1.v; > + const float *v2 = s2.v; > + return v1[0] + v1[1] + v1[2] + v1[3] + v2[0] + v2[1] + v2[2] + v2[3]; > +} > + > +/* Testing 16-byte aggregate. */ > +typedef struct intx4_a16 { > + int v[4]; > +} __attribute__((aligned(16))) intx4_a16; > + > +int test_2_intx4_a16(int i, intx4_a16 s1, intx4_a16 s2) > +{ > + const int *v1 = s1.v; > + const int *v2 = s2.v; > + return i + v1[0] + v1[1] + v1[2] + v1[3] + > + v2[0] + v2[1] + v2[2] + v2[3]; > +} > + > +/* Testing large aggregate. */ > +typedef struct large_agg_a16 { > + int v[18]; > +} __attribute__((aligned(16))) large_agg_a16; > + > +int test_2_large_agg_a16(int x, large_agg_a16 s1, large_agg_a16 s2) > +{ > + const int *v1 = s1.v; > + const int *v2 = s2.v; > + int sum = x; > + for (int i = 0; i < lengthof(s1.v); i++) { > + sum += v1[i] + v2[i]; > + } > + return sum; > +} > + > +typedef struct intx3_0bitfield { > + int x; > + int : 0; > + int y; > + int z; > +} intx3_0bitfield; > + > +int test_2_intx3_0bitfield_reg(int i, intx3_0bitfield s1, intx3_0bitfield s2) > +{ > + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; > +} > + > +int test_2_intx3_0bitfield_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, intx3_0bitfield s1, > + intx3_0bitfield s2) > +{ > + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + > + s2.x + s2.y + s2.z; > +} > + > +typedef struct intx3_0bitfield_a16 { > + int x; > + int : 0 __attribute__((aligned(16))); > + int y; > + int z; > +} intx3_0bitfield_a16; > + > +int test_2_intx3_0bitfield_a16_reg(int i, intx3_0bitfield_a16 s1, > + intx3_0bitfield_a16 s2) > +{ > + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; > +} > + > +int test_2_intx3_0bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, > + int i6, 
int i7, int i8, int i9, > + intx3_0bitfield_a16 s1, > + intx3_0bitfield_a16 s2) > +{ > + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + > + s2.x + s2.y + s2.z; > +} > + > +typedef struct intx3_full_bitfield_a16 { > + int x; > + int y: 32 __attribute__((aligned(16))); > + int z; > +} intx3_full_bitfield_a16; > + > +int test_2_intx3_full_bitfield_a16_reg(int i, intx3_full_bitfield_a16 s1, > + intx3_full_bitfield_a16 s2) > +{ > + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; > +} > + > +int test_2_intx3_full_bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, > + int i6, int i7, int i8, int i9, > + intx3_full_bitfield_a16 s1, > + intx3_full_bitfield_a16 s2) > +{ > + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + > + s2.x + s2.y + s2.z; > +} > + > +typedef struct intx3_half_bitfield { > + int x : 16; > + int y : 16; > + int z; > +} intx3_half_bitfield; > + > +int test_2_intx3_half_bitfield_reg(int i, intx3_half_bitfield s1, > + intx3_half_bitfield s2) > +{ > + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; > +} > + > +int test_2_intx3_half_bitfield_stack(int i, int i2, int i3, int i4, int i5, > + int i6, int i7, int i8, int i9, > + intx3_half_bitfield s1, > + intx3_half_bitfield s2) > +{ > + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + > + s2.x + s2.y + s2.z; > +} > + > +typedef struct intx3_half_bitfield_a16 { > + int x : 16; > + int y : 16 __attribute__((aligned(16))); > + int z; > +} intx3_half_bitfield_a16; > + > +int test_2_intx3_half_bitfield_a16_reg(int i, intx3_half_bitfield_a16 s1, > + intx3_half_bitfield_a16 s2) > +{ > + return i + s1.x + s1.y + s1.z + s2.x + s2.y + s2.z; > +} > + > +int test_2_intx3_half_bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, > + int i6, int i7, int i8, int i9, > + intx3_half_bitfield_a16 s1, > + intx3_half_bitfield_a16 s2) > +{ > + return i + s1.x + s1.y + s1.z + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + > + s2.x + s2.y + s2.z; > 
+} > + > +/* Increased natural alignment. */ > +typedef struct la16l { > + long long x __attribute__((aligned(16))); > + long long y; > +} la16l; > + > +int test_2_la16l_reg(int i, la16l s1, la16l s2) > +{ > + return s1.x + s2.x + i + s1.y + s2.y; > +} > + > +int test_2_la16l_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, > + int i8, int i9, la16l s1, la16l s2) > +{ > + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + > + s2.y; > +} > + > +/****************************************************************/ > +/* Transparent structures. */ > +/****************************************************************/ > + > +typedef struct a16_tsp { > + struct { > + long long x; > + long long y; > + } __attribute__((aligned(16))); > +} a16_tsp; > + > +int test_2_a16_tsp_reg(int i, a16_tsp s1, a16_tsp s2) > +{ > + return s1.x + s2.x + i + s1.y + s2.y; > +} > + > +int test_2_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, > + int i8, int i9, a16_tsp s1, a16_tsp s2) > +{ > + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + > + s2.y; > +} > + > +typedef struct f_a16_tsp { > + struct { > + long long x __attribute__((aligned(16))); > + long long y; > + }; > +} f_a16_tsp; > + > +int test_2_f_a16_tsp_reg(int i, f_a16_tsp s1, f_a16_tsp s2) > +{ > + return s1.x + s2.x + i + s1.y + s2.y; > +} > + > +int test_2_f_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, f_a16_tsp s1, f_a16_tsp s2) > +{ > + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + > + s2.y; > +} > + > +/* > + * Test passing structs with size < 8, < 16 and > 16 > + * with alignment of 16 and without. > + * > + * Structs with size <= 8 bytes, without alignment attribute > + * passed as i64 regardless of the align attribute. 
> + */ > +typedef struct is_no_align { > + int i; > + short s; > +} is_no_align; > + > +int test_2_is_no_align_reg(int i, is_no_align s1, is_no_align s2) > +{ > + return s1.i + s2.i + i + s1.s + s2.s; > +} > + > +int test_2_is_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, is_no_align s1, > + is_no_align s2) > +{ > + return s1.i + s2.i + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.s + > + s2.s; > +} > + > +/* Structs with size <= 8 bytes, with alignment attribute. */ > +typedef struct is_a16 { > + int i; > + short s; > +} __attribute__((aligned(16))) is_a16; > + > +int test_2_is_a16_reg(int i, is_a16 s1, is_a16 s2) > +{ > + return s1.i + s2.i + i + s1.s + s2.s; > +} > + > +int test_2_is_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, > + int i8, int i9, is_a16 s1, is_a16 s2) > +{ > + return s1.i + s2.i + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.s + > + s2.s; > +} > + > +/* Structs with size <= 16 bytes, without alignment attribute. */ > +typedef struct isis_no_align { > + int i; > + short s; > + int i2; > + short s2; > +} isis_no_align; > + > +int test_2_isis_no_align_reg(int i, isis_no_align s1, isis_no_align s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + i + s1.s + s2.s + s1.s2 + s2.s2; > +} > + > +int test_2_isis_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, isis_no_align s1, > + isis_no_align s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + > + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2; > +} > + > +/* Structs with size <= 16 bytes, with alignment attribute. 
*/ > +typedef struct isis_a16 { > + int i; > + short s; > + int i2; > + short s2; > +} __attribute__((aligned(16))) isis_a16; > + > +int test_2_isis_a16_reg(int i, isis_a16 s1, isis_a16 s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + i + s1.s + s2.s + s1.s2 + s2.s2; > +} > + > +int test_2_isis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, > + int i8, int i9, isis_a16 s1, isis_a16 s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + > + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2; > +} > + > +/* structs with size > 16 bytes, without alignment attribute. */ > +typedef struct isisis { > + int i; > + short s; > + int i2; > + short s2; > + int i3; > + short s3; > +} isisis_no_align; > + > +int test_2_isisis_no_align_reg(int i, isisis_no_align s1, isisis_no_align s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + s1.s + s2.s + > + s1.s2 + s2.s2 + s1.s3 + s2.s3; > +} > + > +int test_2_isisis_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, isisis_no_align s1, > + isisis_no_align s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + i2 + i3 + i4 + > + i5 + i6 + i7 + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2 + s1.s3 + > + s2.s3; > +} > + > +/* Structs with size > 16 bytes, with alignment attribute. 
*/ > +typedef struct isisis_a16 { > + int i; > + short s; > + int i2; > + short s2; > + int i3; > + short s3; > +} __attribute__((aligned(16))) isisis_a16; > + > +int test_2_isisis_a16_reg(int i, isisis_a16 s1, isisis_a16 s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + s1.s + s2.s + > + s1.s2 + s2.s2 + s1.s3 + s2.s3; > +} > + > +int test_2_isisis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, isisis_a16 s1, > + isisis_a16 s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + s1.i3 + s2.i3 + i + i2 + i3 + i4 + > + i5 + i6 + i7 + i8 + i9 + s1.s + s2.s + s1.s2 + s2.s2 + s1.s3 + > + s2.s3; > +} > + > +/* We should not split struct argument between regs and stack. */ > +int test_2_isis_no_align_split(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, isis_no_align s1, isis_no_align s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + > + s1.s + s2.s + s1.s2 + s2.s2; > +} > + > +int test_2_isis_a16_split(int i, int i2, int i3, int i4, int i5, int i6, int i7, > + isis_a16 s1, isis_a16 s2) > +{ > + return s1.i + s2.i + s1.i2 + s2.i2 + i + i2 + i3 + i4 + i5 + i6 + i7 + > + s1.s + s2.s + s1.s2 + s2.s2; > +} > + > +/****************************************************************/ > +/* Packed structures. */ > +/****************************************************************/ > + > +typedef struct ill_packed { > + int x; > + long long y; > +} __attribute__((packed)) ill_packed; > + > +typedef struct ii { > + int x; > + int y; > +} ii; > + > +/* Passing structs with unaligned fields, not in registers. 
*/ > +int test_2_ill_packed(int i, ill_packed s1, ill_packed s2) > +{ > + return s1.x + s2.x + i + s1.y + s2.y; > +} > + > +int test_2_ill_packed_reord(int i, ill_packed s1, ill_packed s2, int i2, ii s3) > +{ > + return s1.x + s2.x + i + s1.y + s2.y + i2 + s3.x + s3.y; > +} > + > +int test_2_ill_packed_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, ill_packed s1, > + ill_packed s2) > +{ > + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + > + s2.y; > +} > + > +/* Packed structure, overaligned, same as above. */ > +typedef struct ill_packed_a16 { > + int x; > + long long y; > +} __attribute__((packed, aligned(16))) ill_packed_a16; > + > +/* Passing structs with unaligned fields not in registers. */ > +int test_2_ill_packed_a16(int i, ill_packed_a16 s1, ill_packed_a16 s2) > +{ > + return s1.x + s2.x + i + s1.y + s2.y; > +} > + > +int test_2_ill_packed_a16_reord(int i, ill_packed_a16 s1, ill_packed_a16 s2, > + int i2, ii s3) > +{ > + return s1.x + s2.x + i + s1.y + s2.y + i2 + s3.x + s3.y; > +} > + > +int test_2_ill_packed_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, ill_packed_a16 s1, > + ill_packed_a16 s2) > +{ > + return s1.x + s2.x + i + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + s1.y + > + s2.y; > +} > diff --git a/test/tarantool-tests/ffi-vector-arguments.test.lua b/test/tarantool-tests/ffi-vector-arguments.test.lua > new file mode 100644 > index 00000000..330ec991 > --- /dev/null > +++ b/test/tarantool-tests/ffi-vector-arguments.test.lua > @@ -0,0 +1,62 @@ > +local ffi = require('ffi') > +local tap = require('tap') > + > +-- The test file to check FFI vector passing correctness. > +local test = tap.test('ffi-vector-arguments'):skipcond({ > + ['NYI for non x64 arches'] = jit.arch ~= 'x64', s/non x64/non-x64/ > +}) > + > +local SIZING > +-- Only those are implemented. 
> +if jit.arch == 'x64' then > + SIZING = {2, 4,} s/,}/}/ > +else > + SIZING = {} > +end > + > +test:plan(#SIZING + 1) > + > +local ffi_ccall = ffi.load('libfficcall') > + > +ffi.cdef[[ > + typedef float vfloatx2 __attribute__ ((__vector_size__ (8))); > + typedef float vfloatx4 __attribute__ ((__vector_size__ (16))); > + > + vfloatx2 vfloatx2_call(vfloatx2 x); > + vfloatx4 vfloatx4_call(vfloatx4 x); > + > + typedef int int32x4_t __attribute__((__vector_size__ (4 * 4))); > + int32x4_t test_hva_varg(int n, ...); > +]] > + > +local function test_self_ret_vector(subtest, nelem) > +subtest:plan(1) > + local typestr = 'vfloatx' .. nelem > + local f = ffi_ccall[typestr .. '_call'] > + local arg = {} > + for i = 1, nelem do > + arg[i - 1] = i + 0LL > + end > + local res = f(arg) > + local table_res = {} > + for i = 0, nelem - 1 do > + table_res[i] = res[i] > + end > +subtest:is_deeply(table_res, arg, > + 'correct result for ' .. nelem .. '-sized vec') > +end > + > +for i = 1, #SIZING do > +test:test('vec-' .. SIZING[i], test_self_ret_vector, SIZING[i]) > +end > + > +local hva_arg_vec = ffi.new('int32x4_t', {0LL, 1LL, 2LL, 3LL}) > +local hva_res = ffi_ccall.test_hva_varg(0LL, hva_arg_vec, hva_arg_vec) > +local hva_res_tab = {} > +local hva_expected = {[0] = 0, 2, 4, 6} > +for i = 0, 3 do > + hva_res_tab[i] = hva_res[i] > +end > +test:is_deeply(hva_res_tab, hva_expected, 'correct hva with the int type') > + > +test:done(true) > diff --git a/test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua b/test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua > new file mode 100644 > index 00000000..8552a5f0 > --- /dev/null > +++ b/test/tarantool-tests/lj-1455-arm64-ffi-ccall-hfa.test.lua > @@ -0,0 +1,82 @@ > +local ffi = require('ffi') > +local tap = require('tap') > + > +-- The test file to test various FFI C call conventions for HFA > +-- aggregates. > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1455. 
> +local test = tap.test('lj-1455-arm64-ffi-ccall-hfa')
> +
> +test:plan(7)
> +
> +local ffi_ccall = ffi.load('libfficcall')
> +
> +ffi.cdef[[
> +  typedef struct hfa_float22 {
> +    float v[2][2];
> +  } hfa_float22;
> +
> +
> +  typedef struct non_hfa_float222 {
> +    float v[2][2][2];
> +  } non_hfa_float222;
> +
> +  typedef struct hfa_float_hole {
> +    float x;
> +    float hole[0][2][2];
> +    float y;
> +  } hfa_float_hole;
> +
> +  typedef struct hfa_double2 {
> +    double v[2];
> +  } hfa_double2;
> +
> +  typedef struct hfa_double2_a16 {
> +    __attribute__((__aligned__(16))) double v[2];
> +  } hfa_double2_a16;
> +
> +  typedef struct hfa_double2_a32 {
> +    __attribute__((__aligned__(32))) double v[4];
> +  } hfa_double2_a32;
> +
> +  float hfa_float22_sum(hfa_float22 h);
> +  double hfa_double2_sum(hfa_double2 h);
> +  double hfa_double2_a16_sum(hfa_double2_a16 h);
> +  double hfa_double2_a32_sum(hfa_double2_a32 h);
> +
> +  typedef struct hfa_0bitfield {
> +    float x;
> +    int : 0;
> +    float y;
> +    float z;
> +  } hfa_0bitfield;
> +
> +  float hfa_0bitfield_sum(hfa_0bitfield h);
> +
> +  float non_hfa_float222_sum(non_hfa_float222 h);
> +
> +  float hfa_float_hole_sum(hfa_float_hole h);
> +]]
> +
> +test:is(ffi_ccall.hfa_float22_sum({{{1, 2}, {3, 4}}}), 10,
> +        'HFA 2 dimensional correct')
> +test:is(ffi_ccall.non_hfa_float222_sum({{{{1, 2}, {3, 4}},{{5, 6}, {7, 8}}}}),
> +        36, 'non HFA array correct')
> +local supported, func = pcall(function()
> +  return ffi_ccall.hfa_float_hole_sum
> +end)
> +if supported then
> +  test:is(func({x = 1, y = 2}), 3, 'HFA float hole correct')
> +else
> +  test:skip('HFA float hole -- Unsupported by C compiler')
> +end
> +test:is(ffi_ccall.hfa_double2_sum({{1, 2}}), 3, 'HFA double correct')
> +test:is(ffi_ccall.hfa_double2_a16_sum({{1, 2}}), 3, 'align 16 correct')
> +test:is(ffi_ccall.hfa_double2_a32_sum({{1, 2, 3, 4}}), 10, 'align 32 correct')
> +supported, func = pcall(function() return ffi_ccall.hfa_0bitfield_sum end)
> +if supported then
> +  test:is(func({x = 1, y = 2, z = 3}), 6, 'HFA 0 bitfield correct')
> +else
> +  test:skip('HFA 0 bitfield -- Unsupported by C compiler')
> +end
> +
> +test:done(true)
> diff --git a/test/tarantool-tests/lj-1455-bitfield0-a16.test.lua b/test/tarantool-tests/lj-1455-bitfield0-a16.test.lua
> new file mode 100644
> index 00000000..6f8e9aac
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1455-bitfield0-a16.test.lua
> @@ -0,0 +1,27 @@
> +local ffi = require('ffi')
> +local tap = require('tap')
> +
> +-- The test file demonstrates incorrect FFI attributes for the
> +-- structure with a zero-sized bitfield.
> +-- See also: https://github.com/LuaJIT/LuaJIT/issues/1455.
> +local test = tap.test('lj-1455-ffi-conventions')
> +
> +test:plan(3)
> +
> +ffi.cdef[[
> +  typedef struct {
> +    int x;
> +    int : 0 __attribute__((aligned(16)));
> +    int y;
> +    int z;
> +  } intx3_0bitfield_a16;
> +]]
> +
> +test:is(ffi.sizeof(ffi.new('intx3_0bitfield_a16')), 24,
> +        'correct size of struct with 0 bitfield')
> +test:is(ffi.offsetof('intx3_0bitfield_a16', 'y'), 16,
> +        'correct offset of field after 0 bitfield')
> +test:is(ffi.alignof('intx3_0bitfield_a16'), 4,
> +        'correct total align of struct with 0 bitfield')
> +
> +test:done(true)
> diff --git a/test/tarantool-tests/lj-1455-ffi-conventions.test.lua b/test/tarantool-tests/lj-1455-ffi-conventions.test.lua
> new file mode 100644
> index 00000000..e7a45736
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1455-ffi-conventions.test.lua
> @@ -0,0 +1,441 @@
> +local ffi = require('ffi')
> +local tap = require('tap')
> +
> +-- The test file to test various FFI C call conventions.
> +-- See also: https://github.com/LuaJIT/LuaJIT/issues/1455.
> +local test = tap.test('lj-1455-ffi-conventions') > + > +test:plan(39) > + > +local ffi_ccall = ffi.load('libfficcall') > + > +ffi.cdef[[ > + typedef struct hfa_floatx4_a16 { > + float v[4]; > + } __attribute__((aligned(16))) hfa_floatx4_a16; > + > + float test_2_align_hfa(int i, hfa_floatx4_a16 s1, hfa_floatx4_a16 s2); > + > + typedef struct intx4_a16 { > + int v[4]; > + } __attribute__((aligned(16))) intx4_a16; > + > + int test_2_intx4_a16(int i, intx4_a16 s1, intx4_a16 s2); > + > + typedef struct large_agg_a16 { > + int v[18]; > + } __attribute__((aligned(16))) large_agg_a16; > + > + int test_2_large_agg_a16(int x, large_agg_a16 s1, large_agg_a16 s2); > + > + typedef struct intx3_0bitfield { > + int x; > + int : 0; > + int y; > + int z; > + } intx3_0bitfield; > + > + int test_2_intx3_0bitfield_reg(int i, intx3_0bitfield s1, intx3_0bitfield s2); > + int test_2_intx3_0bitfield_stack(int i, int i2, int i3, int i4, int i5, > + int i6, int i7, int i8, int i9, > + intx3_0bitfield s1, intx3_0bitfield s2); > + > + typedef struct intx3_0bitfield_a16 { > + int x; > + int : 0 __attribute__((aligned(16))); > + int y; > + int z; > + } intx3_0bitfield_a16; > + > + int test_2_intx3_0bitfield_a16_reg(int i, intx3_0bitfield_a16 s1, > + intx3_0bitfield_a16 s2); > + int test_2_intx3_0bitfield_a16_stack(int i, int i2, int i3, int i4, int i5, > + int i6, int i7, int i8, int i9, > + intx3_0bitfield_a16 s1, > + intx3_0bitfield_a16 s2); > + > + typedef struct intx3_full_bitfield_a16 { > + int x; > + int y: 32 __attribute__((aligned(16))); > + int z; > + } intx3_full_bitfield_a16; > + > + int test_2_intx3_full_bitfield_a16_reg(int i, intx3_full_bitfield_a16 s1, > + intx3_full_bitfield_a16 s2); > + int test_2_intx3_full_bitfield_a16_stack(int i, int i2, int i3, int i4, > + int i5, int i6, int i7, int i8, > + int i9, intx3_full_bitfield_a16 s1, > + intx3_full_bitfield_a16 s2); > + > + typedef struct intx3_half_bitfield { > + int x : 16; > + int y : 16; > + int z; > + } 
intx3_half_bitfield; > + > + int test_2_intx3_half_bitfield_reg(int i, intx3_half_bitfield s1, > + intx3_half_bitfield s2); > + int test_2_intx3_half_bitfield_stack(int i, int i2, int i3, int i4, int i5, > + int i6, int i7, int i8, int i9, > + intx3_half_bitfield s1, > + intx3_half_bitfield s2); > + > + typedef struct intx3_half_bitfield_a16 { > + int x : 16; > + int y : 16 __attribute__((aligned(16))); > + int z; > + } intx3_half_bitfield_a16; > + > + int test_2_intx3_half_bitfield_a16_reg(int i, intx3_half_bitfield_a16 s1, > + intx3_half_bitfield_a16 s2); > + int test_2_intx3_half_bitfield_a16_stack(int i, int i2, int i3, int i4, > + int i5, int i6, int i7, int i8, > + int i9, intx3_half_bitfield_a16 s1, > + intx3_half_bitfield_a16 s2); > + > + typedef struct la16l { > + long long x __attribute__((aligned(16))); > + long long y; > + } la16l; > + > + int test_2_la16l_reg(int i, la16l s1, la16l s2); > + int test_2_la16l_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, > + int i8, int i9, la16l s1, la16l s2); > + > + typedef struct a16_tsp { > + struct { > + long long x; > + long long y; > + } __attribute__((aligned(16))); > + } a16_tsp; > + > + int test_2_a16_tsp_reg(int i, a16_tsp s1, a16_tsp s2); > + int test_2_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, a16_tsp s1, a16_tsp s2); > + > + typedef struct f_a16_tsp { > + struct { > + long long x __attribute__((aligned(16))); > + long long y; > + }; > + } f_a16_tsp; > + > + int test_2_f_a16_tsp_reg(int i, f_a16_tsp s1, f_a16_tsp s2); > + int test_2_f_a16_tsp_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, f_a16_tsp s1, > + f_a16_tsp s2); > + > + typedef struct is_no_align { > + int i; > + short s; > + } is_no_align; > + > + int test_2_is_no_align_reg(int i, is_no_align s1, is_no_align s2); > + int test_2_is_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, is_no_align s1, > + is_no_align 
s2); > + > + typedef struct is_a16 { > + int i; > + short s; > + } __attribute__((aligned(16))) is_a16; > + > + int test_2_is_a16_reg(int i, is_a16 s1, is_a16 s2); > + int test_2_is_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, int i7, > + int i8, int i9, is_a16 s1, is_a16 s2); > + > + typedef struct isis_no_align { > + int i; > + short s; > + int i2; > + short s2; > + } isis_no_align; > + > + int test_2_isis_no_align_reg(int i, isis_no_align s1, isis_no_align s2); > + int test_2_isis_no_align_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, isis_no_align s1, > + isis_no_align s2); > + > + typedef struct isis_a16 > + { > + int i; > + short s; > + int i2; > + short s2; > + } __attribute__((aligned(16))) isis_a16; > + > + int test_2_isis_a16_reg(int i, isis_a16 s1, isis_a16 s2); > + int test_2_isis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, isis_a16 s1, isis_a16 s2); > + > + typedef struct isisis > + { > + int i; > + short s; > + int i2; > + short s2; > + int i3; > + short s3; > + } isisis_no_align; > + > + int test_2_isisis_no_align_reg(int i, isisis_no_align s1, isisis_no_align s2); > + int test_2_isisis_no_align_stack(int i, int i2, int i3, int i4, int i5, > + int i6, int i7, int i8, int i9, > + isisis_no_align s1, isisis_no_align s2); > + > + typedef struct isisis_a16 > + { > + int i; > + short s; > + int i2; > + short s2; > + int i3; > + short s3; > + } __attribute__((aligned(16))) isisis_a16; > + > + int test_2_isisis_a16_reg(int i, isisis_a16 s1, isisis_a16 s2); > + int test_2_isisis_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, isisis_a16 s1, > + isisis_a16 s2); > + > + int test_2_isis_no_align_split(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, isis_no_align s1, isis_no_align s2); > + int test_2_isis_a16_split(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, isis_a16 s1, isis_a16 s2); > + > + typedef struct 
ill_packed { > + int x; > + long long y; > + } __attribute__((packed)) ill_packed; > + > + typedef struct ii { > + int x; > + int y; > + } ii; > + > + int test_2_ill_packed(int i, ill_packed s1, ill_packed s2); > + int test_2_ill_packed_reord(int i, ill_packed s1, ill_packed s2, int i2, > + ii s3); > + int test_2_ill_packed_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, ill_packed s1, > + ill_packed s2); > + > + typedef struct ill_packed_a16 { > + int x; > + long long y; > + } __attribute__((packed, aligned(16))) ill_packed_a16; > + > + int test_2_ill_packed_a16(int i, ill_packed_a16 s1, ill_packed_a16 s2); > + int test_2_ill_packed_a16_reord(int i, ill_packed_a16 s1, ill_packed_a16 s2, > + int i2, ii s3); > + int test_2_ill_packed_a16_stack(int i, int i2, int i3, int i4, int i5, int i6, > + int i7, int i8, int i9, ill_packed_a16 s1, > + ill_packed_a16 s2); > +]] > + > +test:is(ffi_ccall.test_2_align_hfa(0LL, > + {{0, 1, 2, 3}}, {{4, 5, 6, 7}}), > + 28, 'correct align hfa') > + > +test:is(ffi_ccall.test_2_intx4_a16(0LL, > + {{0LL, 1LL, 2LL, 3LL}}, {{4LL, 5LL, 6LL, 7LL}}), > + 28, 'correct align hva') > + > +local LARGE_HVA_SZ = 18 > +local large_agg_sum = 0LL > +local large_agg1 = {} > +local large_agg2 = {} > +for i = 0, LARGE_HVA_SZ - 1 do > + large_agg1[i] = i + 0LL > + large_agg2[i] = LARGE_HVA_SZ + i + 0LL > + large_agg_sum = large_agg_sum + large_agg1[i] + large_agg2[i] > +end > + > +test:is(ffi_ccall.test_2_large_agg_a16(0LL, {large_agg1}, {large_agg2}), > + large_agg_sum, 'correct large align agg') > + > +test:is(ffi_ccall.test_2_intx3_0bitfield_reg(0LL, > + {x = 1LL, y = 2LL, z = 3LL}, {x = 4LL, y = 5LL, z = 6LL}), > + 21, 'correct intx3 0 bitfield reg') > + > +test:is(ffi_ccall.test_2_intx3_0bitfield_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL, z = 12LL}, > + {x = 13LL, y = 14LL, z = 15LL}), > + 120, 'correct intx3 0 bitfield stack') > + > 
+test:is(ffi_ccall.test_2_intx3_0bitfield_a16_reg(0LL, > + {x = 1LL, y = 2LL, z = 3LL}, > + {x = 4LL, y = 5LL, z = 6LL}), > + 21, 'correct intx3 0 bitfield align 16 reg') > + > +test:is(ffi_ccall.test_2_intx3_0bitfield_a16_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL, z = 12LL}, > + {x = 13LL, y = 14LL, z = 15LL}), > + 120, 'correct intx3 0 bitfield align 16 stack') > + > +test:is(ffi_ccall.test_2_intx3_full_bitfield_a16_reg( > + 0LL, > + {x = 1LL, y = 2LL, z = 3LL}, > + {x = 4LL, y = 5LL, z = 6LL}), > + 21, 'correct intx3 0 full bitfield align 16 reg') > + > +test:is(ffi_ccall.test_2_intx3_full_bitfield_a16_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL, z = 12LL}, > + {x = 13LL, y = 14LL, z = 15LL}), > + 120, 'correct intx3 full bitfield align 16 stack') > + > +test:is(ffi_ccall.test_2_intx3_half_bitfield_reg(0LL, > + {x = 1LL, y = 2LL, z = 3LL}, > + {x = 4LL, y = 5LL, z = 6LL}), > + 21, 'correct intx3 0 half bitfield reg') > + > +test:is(ffi_ccall.test_2_intx3_half_bitfield_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL, z = 12LL}, > + {x = 13LL, y = 14LL, z = 15LL}), > + 120, 'correct intx3 half bitfield stack') > + > +test:is(ffi_ccall.test_2_intx3_half_bitfield_a16_reg(0LL, > + {x = 1LL, y = 2LL, z = 3LL}, > + {x = 4LL, y = 5LL, z = 6LL}), > + 21, 'correct intx3 0 half bitfield align 16 reg') > + > +test:is(ffi_ccall.test_2_intx3_half_bitfield_a16_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL, z = 12LL}, > + {x = 13LL, y = 14LL, z = 15LL}), > + 120, 'correct intx3 half bitfield align 16 stack') > + > +test:is(ffi_ccall.test_2_la16l_reg(0LL, {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), > + 10, 'correct la16l reg') > + > +test:is(ffi_ccall.test_2_la16l_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), > + 91, 'correct la16l stack') > + > 
+test:is(ffi_ccall.test_2_a16_tsp_reg(0LL, > + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), > + 10, 'correct tsp reg') > + > +test:is(ffi_ccall.test_2_a16_tsp_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), > + 91, 'correct tsp stack') > + > +test:is(ffi_ccall.test_2_f_a16_tsp_reg(0LL, > + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), > + 10, 'correct tsp aligned field reg') > + > +test:is(ffi_ccall.test_2_f_a16_tsp_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), > + 91, 'correct tsp aligned field stack') > + > +test:is(ffi_ccall.test_2_is_no_align_reg(0LL, > + {i = 1LL, s = 2LL}, {i = 3LL, s = 4LL}), > + 10, 'correct is no align reg') > + > +test:is(ffi_ccall.test_2_is_no_align_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {i = 10LL, s = 11LL}, {i = 12LL, s = 13LL}), > + 91, 'correct is no align stack') > + > +test:is(ffi_ccall.test_2_is_a16_reg(0LL, > + {i = 1LL, s = 2LL}, {i = 3LL, s = 4LL}), > + 10, 'correct is align 16 reg') > + > +test:is(ffi_ccall.test_2_is_a16_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {i = 10LL, s = 11LL}, {i = 12LL, s = 13LL}), > + 91, 'correct is align 16 stack') > + > +test:is(ffi_ccall.test_2_isis_no_align_reg(0LL, > + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL}, > + {i = 5LL, s = 6LL, i2 = 7LL, s2 = 8LL}), > + 36, 'correct isis no align reg') > + > +test:is(ffi_ccall.test_2_isis_no_align_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL}, > + {i = 14LL, s = 15LL, i2 = 16LL, s2 = 17LL}), > + 153, 'correct isis no align stack') > + > +test:is(ffi_ccall.test_2_isis_a16_reg(0LL, > + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL}, > + {i = 5LL, s = 6LL, i2 = 7LL, s2 = 8LL}), > + 36, 'correct isis align 16 reg') > + > +test:is(ffi_ccall.test_2_isis_a16_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL}, > + {i = 14LL, 
s = 15LL, i2 = 16LL, s2 = 17LL}), > + 153, 'correct isis align 16 stack') > + > +test:is(ffi_ccall.test_2_isisis_no_align_reg(0LL, > + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL, i3 = 5LL, s3 = 6LL}, > + {i = 7LL, s = 8LL, i2 = 9LL, s2 = 10LL}), > + 55, 'correct isisis no align reg') > + > +test:is(ffi_ccall.test_2_isisis_no_align_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL, i3 = 14LL, s3 = 15LL}, > + {i = 16LL, s = 17LL, i2 = 18LL, s2 = 19LL, i3 = 20LL, s3 = 21LL}), > + 231, 'correct isisis no align stack') > + > +test:is(ffi_ccall.test_2_isisis_a16_reg(0LL, > + {i = 1LL, s = 2LL, i2 = 3LL, s2 = 4LL, i3 = 5LL, s3 = 6LL}, > + {i = 7LL, s = 8LL, i2 = 9LL, s2 = 10LL}), > + 55, 'correct isisis align 16 reg') > + > +test:is(ffi_ccall.test_2_isisis_a16_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {i = 10LL, s = 11LL, i2 = 12LL, s2 = 13LL, i3 = 14LL, s3 = 15LL}, > + {i = 16LL, s = 17LL, i2 = 18LL, s2 = 19LL, i3 = 20LL, s3 = 21LL}), > + 231, 'correct isisis align 16 stack') > + > + > +test:is(ffi_ccall.test_2_isis_no_align_split( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, > + {i = 8LL, s = 9LL, i2 = 10LL, s2 = 11LL}, > + {i = 12LL, s = 13LL, i2 = 14LL, s2 = 15LL}), > + 120, 'correct isis no align split') > + > +test:is(ffi_ccall.test_2_isis_a16_split( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, > + {i = 8LL, s = 9LL, i2 = 10LL, s2 = 11LL}, > + {i = 12LL, s = 13LL, i2 = 14LL, s2 = 15LL}), > + 120, 'correct isis a16 split') > + > +test:is(ffi_ccall.test_2_ill_packed(0LL, > + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}), > + 10, 'correct ill packed') > + > +test:is(ffi_ccall.test_2_ill_packed_reord(0LL, > + {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}, > + 5LL, {x = 6LL, y = 7LL}), > + 28, 'correct ill packed reord') > + > +test:is(ffi_ccall.test_2_ill_packed_stack( > + 1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL, > + {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}), > + 91, 'correct ill packed stack') > + > 
+test:is(ffi_ccall.test_2_ill_packed_a16(0LL,
> +        {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL}),
> +        10, 'correct ill packed a16')
> +
> +test:is(ffi_ccall.test_2_ill_packed_a16_reord(0LL,
> +        {x = 1LL, y = 2LL}, {x = 3LL, y = 4LL},
> +        5LL, {x = 6LL, y = 7LL}),
> +        28, 'correct ill packed a16 reord')
> +
> +test:is(ffi_ccall.test_2_ill_packed_a16_stack(
> +        1LL, 2LL, 3LL, 4LL, 5LL, 6LL, 7LL, 8LL, 9LL,
> +        {x = 10LL, y = 11LL}, {x = 12LL, y = 13LL}),
> +        91, 'correct ill packed a16 stack')
> +
> +test:done(true)
[-- Attachment #2: Type: text/html, Size: 55312 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Tarantool-patches] [PATCH luajit 5/5] FFI/MacOS: Fix calling convention for enums.
2026-05-30 16:04 [Tarantool-patches] [PATCH luajit 0/5] Various FFI ABI calling conventions fixes Sergey Kaplun via Tarantool-patches
` (3 preceding siblings ...)
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 4/5] FFI: Various ABI and calling convention fixes Sergey Kaplun via Tarantool-patches
@ 2026-05-30 16:04 ` Sergey Kaplun via Tarantool-patches
2026-06-01 13:07 ` Sergey Bronnikov via Tarantool-patches
4 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2026-05-30 16:04 UTC (permalink / raw)
To: Sergey Bronnikov, Evgeniy Temirgaleev; +Cc: tarantool-patches

From: Mike Pall <mike>

Thanks to Sergey Kaplun.

(cherry picked from commit b925b3e3fc6771171602323b45fbe9fb8fc90369)

This patch fixes the regression in the arm64 OSX FFI behaviour when the
fundamental C types are passed via the stack. The given arguments should
occupy the "packed" size on the stack if they are fundamental C
types [1]. Structures require the pointer-size alignment like it was done
in the previous commit. This patch fixes that (not only for enums but for
other data types too). This fixes the tests that were failing after the
previous commit.

[1]: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Pass-arguments-to-functions-correctly

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#12480
---
 src/lj_ccall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lj_ccall.c b/src/lj_ccall.c
index 1beccc10..7c3ec1e5 100644
--- a/src/lj_ccall.c
+++ b/src/lj_ccall.c
@@ -1082,7 +1082,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
     if (CCALL_ALIGN_STACKARG) {  /* Align argument on stack. */
       MSize align = (1u << ctype_align(ccall_struct_align(cts, d))) - 1;
 #if LJ_TARGET_ARM64 && LJ_TARGET_OSX
-      isva = 1;
+      isva = ctype_isstruct(d->info);
 #endif
       if (rp || (CCALL_PACK_STACKARG && isva && align < CTSIZE_PTR-1))
         align = CTSIZE_PTR-1;
-- 
2.54.0
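
Editorial illustration, not part of the patch: the stack placement rule described in the commit message can be sketched in plain C. This is a simplified model under stated assumptions, not LuaJIT code. The names `StackArg`, `layout_stack_args`, `PTR_SIZE` and `widen_all` are hypothetical. Setting `widen_all` to 1 mimics the pre-fix `isva = 1` behaviour, where every stack argument is raised to pointer-size alignment; setting it to 0 mimics `isva = ctype_isstruct(d->info)`, where only structures are. The sketch does not model register passing, by-reference arguments, or any rounding of the argument size.

```c
#include <assert.h>
#include <stddef.h>

#define PTR_SIZE 8u  /* Assumed pointer size on arm64. */

/* Hypothetical descriptor for one argument passed on the stack. */
typedef struct {
  size_t size;    /* Size of the argument type in bytes. */
  size_t align;   /* Natural alignment in bytes, a power of two. */
  int is_struct;  /* Non-zero if the argument is a structure. */
} StackArg;

/*
** Compute the stack offset of each argument.
** widen_all != 0: every argument gets at least pointer-size alignment
**                 (models the behaviour before this patch).
** widen_all == 0: only structures get at least pointer-size alignment,
**                 fundamental types stay packed (models this patch).
*/
static void layout_stack_args(const StackArg *args, size_t n,
                              int widen_all, size_t *ofs_out)
{
  size_t top = 0;  /* Next free byte in the stack argument area. */
  size_t i;
  for (i = 0; i < n; i++) {
    size_t align = args[i].align;
    assert(align != 0 && (align & (align - 1)) == 0);
    if ((widen_all || args[i].is_struct) && align < PTR_SIZE)
      align = PTR_SIZE;
    top = (top + align - 1) & ~(align - 1);  /* Round up to alignment. */
    ofs_out[i] = top;
    top += args[i].size;
  }
}
```

For the argument list `char`, `short`, `int`, `struct { int x; }`, described as `{1,1,0}`, `{2,2,0}`, `{4,4,0}`, `{4,4,1}`, the model yields offsets 0, 2, 4, 8 with `widen_all = 0` and 0, 8, 16, 24 with `widen_all = 1`. The second layout is the kind of mismatch with the callee's expectations that the commit message calls a regression.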
* Re: [Tarantool-patches] [PATCH luajit 5/5] FFI/MacOS: Fix calling convention for enums.
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 5/5] FFI/MacOS: Fix calling convention for enums Sergey Kaplun via Tarantool-patches
@ 2026-06-01 13:07 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 10+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2026-06-01 13:07 UTC (permalink / raw)
To: Sergey Kaplun, Evgeniy Temirgaleev; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1567 bytes --]

Hi, Sergey,

thanks for the patch! LGTM

Sergey

On 5/30/26 19:04, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Sergey Kaplun.
>
> (cherry picked from commit b925b3e3fc6771171602323b45fbe9fb8fc90369)
>
> This patch fixes the regression in the arm64 OSX FFI behaviour when the
> fundamental C types are passed via the stack. The given arguments should
> occupy the "packed" size on the stack if they are fundamental C
> types [1]. Structures require the pointer-size alignment like it was done
> in the previous commit. This patch fixes that (not only for enums but for
> other data types too). This fixes the tests that were failing after the
> previous commit.
>
> [1]:https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Pass-arguments-to-functions-correctly
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#12480
> ---
>  src/lj_ccall.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/lj_ccall.c b/src/lj_ccall.c
> index 1beccc10..7c3ec1e5 100644
> --- a/src/lj_ccall.c
> +++ b/src/lj_ccall.c
> @@ -1082,7 +1082,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
>      if (CCALL_ALIGN_STACKARG) {  /* Align argument on stack. */
>        MSize align = (1u << ctype_align(ccall_struct_align(cts, d))) - 1;
>  #if LJ_TARGET_ARM64 && LJ_TARGET_OSX
> -      isva = 1;
> +      isva = ctype_isstruct(d->info);
>  #endif
>        if (rp || (CCALL_PACK_STACKARG && isva && align < CTSIZE_PTR-1))
>          align = CTSIZE_PTR-1;

[-- Attachment #2: Type: text/html, Size: 2169 bytes --]
end of thread, other threads:[~2026-06-01 13:07 UTC | newest]

Thread overview: 10+ messages
2026-05-30 16:04 [Tarantool-patches] [PATCH luajit 0/5] Various FFI ABI calling conventions fixes Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 1/5] FFI: Unify stack setup for C calls in interpreter Sergey Kaplun via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 2/5] FFI/ARM64/OSX: Handle non-standard OSX C calling conventions Sergey Kaplun via Tarantool-patches
2026-06-01 11:40 ` Sergey Bronnikov via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 3/5] ARM64: Fix pass-by-value struct " Sergey Kaplun via Tarantool-patches
2026-06-01 12:27 ` Sergey Bronnikov via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 4/5] FFI: Various ABI and calling convention fixes Sergey Kaplun via Tarantool-patches
2026-06-01 13:02 ` Sergey Bronnikov via Tarantool-patches
2026-05-30 16:04 ` [Tarantool-patches] [PATCH luajit 5/5] FFI/MacOS: Fix calling convention for enums Sergey Kaplun via Tarantool-patches
2026-06-01 13:07 ` Sergey Bronnikov via Tarantool-patches