<HTML><BODY><div id="composeWebView_editable_content" data-mailruapp-compose-id="composeWebView_editable_content" style="text-align: left;"><div>Thanks! </div><div><br></div><div>LGTM.</div><div><br></div><div id="mail-app-auto-signature">


Best regards,<div>Sergos </div>

</div><br><br>Tuesday, 28 June 2022, 22:07 +0300 from Kaplun Sergey  <skaplun@tarantool.org>:<br><div id="composeWebView_previouse_content" data-mailruapp-compose-id="composeWebView_previouse_content" data-darkosha-id="transformer_0" style=""><blockquote id="mail-app-auto-quote" style="border-left-width: 1px; border-left-style: solid; border-left-color: rgb(11, 102, 249); margin: 10px 0px 10px 5px; padding: 0px 0px 0px 10px; display: inherit;"><div class="js-helper js-readmsg-msg" style="">

        <style type="text/css"></style>

        <div style="">

                <base target="_self" href="https://e.mail.ru/" style="">

                
            <div id="style_16564432451772749730_BODY" style="">Hi, Sergos!<br style="">

<br style="">

Thanks for the review!<br style="">

<br style="">

I've updated commit message to the following:<br style="">

<br style="">

===================================================================<br style="">

FFI/ARM64: Fix pass-by-value struct calling conventions.<br style="">

<br style="">

(cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)<br style="">

<br style="">

If the argument type is a Composite Type then the size of the<br style="">

argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI<br style="">

backend makes this rounding unconditionally for arm64 architecture and<br style="">

uses the result value to determine the necessary amount of registers<br style="">

for the call.<br style="">

<br style="">

The arm64 parameters passing rules for Homogeneous Floating-point<br style="">

Aggregates (HFA) are the following [1]:<br style="">

| If the argument is an HFA and there are sufficient unallocated<br style="">

| Floating-point registers, then the argument is allocated to<br style="">

| Floating-point registers (with one register per member of the HFA).<br style="">

<br style="">

So for the HFA composed of 3 32-bit float members the extra one (4th)<br style="">

register is supposed as occupied.<br style="">

<br style="">

When the second parameter passed to the function these fields are loaded<br style="">

starting from 5th register. As far as the procedure to call uses 6<br style="">

registers according to the ABI it leads to the wrong value of the second<br style="">

parameter passed to the callee (0 due to the corresponding memset in<br style="">

`cconv_struct_init()`).<br style="">

<br style="">

This patch fixes this case by using the real size of structure in the<br style="">

calculation of necessary amount of registers to use.<br style="">

<br style="">

Sergey Kaplun:<br style="">

* added the description and the test for the problem<br style="">

<br style="">

[1]: <a href="https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing-rules" target="_blank" style="">https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing-rules</a><br style="">

<br style="">

Part of tarantool/tarantool#6548<br style="">

===================================================================<br style="">

<br style="">

On 02.06.22, sergos wrote:<br style="">

                                 > Hi!<br style="">

> <br style="">

> Thanks for the patch.<br style="">

> Some minor updates to the wording, test updates requested.<br style="">

> <br style="">

> Regards,<br style="">

> Sergos<br style="">

> <br style="">

> > On 9 Dec 2021, at 13:24, Sergey Kaplun <<a href="mailto:skaplun@tarantool.org" style="">skaplun@tarantool.org</a>> wrote:<br style="">

> > <br style="">

> > From: Mike Pall <mike><br style="">

> > <br style="">

> > (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224)<br style="">

> > <br style="">

> > The arm64 parameters passing rules for Homogeneous Floating-point<br style="">

> > Aggregates (HFA) are the following [1]:<br style="">

> > * If the argument type is an HFA or an HVA, then the argument is used<br style="">

> <br style="">

> There’s no explanation of HVA here, unlike HFA. Should it help?<br style="">

> <br style="">

> >  unmodified.<br style="">

> > * If the argument is an HFA and there are sufficient unallocated<br style="">

> >  Floating-point registers, then the argument is allocated to<br style="">

> >  Floating-pointRegisters (with one register per member of the HFA).<br style="">

> > <br style="">

> > Also, if the argument type is a Composite Type then the size of the<br style="">

> > argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI<br style="">

> > backend makes this rounding unconditionally for arm64 architecture and<br style="">

> > uses the result value to determine the necessary amount of registers<br style="">

> > for the call. So for the HFA composed of 2n + 1 (< 4) float members the<br style="">

> <br style="">

> I suppose you mean 32bit FP value here. <br style="">

> <br style="">

> > extra one register is supposed as occupied. This leads to the wrong<br style="">

> > value (0 due to the corresponding memset in `cconv_struct_init()`) in<br style="">

> > this register if the other one parameter is passed to the procedure.<br style="">

> <br style="">

> If I read it correct: “the HFA compliance zero-filled padding  <br style="">

> can be referred to as a next argument passed”?<br style="">

> <br style="">

> > <br style="">

> > This patch fixes this case by using the real size of structure in the<br style="">

> > calculation of necessary amount of registers to use.<br style="">

> > <br style="">

> > Sergey Kaplun:<br style="">

> > * added the description and the test for the problem<br style="">

> > <br style="">

> > [1]: <a href="https://developer.arm.com/documentation/ihi0055/b/" target="_blank" style="">https://developer.arm.com/documentation/ihi0055/b/</a><br style="">

> <br style="">

> The link is 404. ARM is known for shuffling the docs, dunno how to pin it.<br style="">

> <br style="">

> > <br style="">

> > Part of tarantool/tarantool#6548<br style="">

> > ---<br style="">

> > OK, there is an important note before you proceed with the patch:<br style="">

> > <br style="">

> > `isfp` variable in arm64 may takes the following values:<br style="">

> > 0 - for the pointer argument<br style="">

> > 1 - each part of the argument (i.e. field in a structure or floating<br style="">

> >    point number itself) takes exactly one register<br style="">

> > 2 - the structure is HFA (including complex floats) and compact, i.e.<br style="">

> >    two fields may be saved into one register<br style="">

> > <br style="">

> > Patch fixes the behaviour in the last case, when the structure is HFA<br style="">

> > and takes 8*n + 4 bytes. The variable can't take another value exept<br style="">

> > mentioned before, IINM (I've checked it several times). So the magic<br style="">

> <br style="">

> I don’t see any tests using half precision FP, say _Float16. It is available<br style="">

> since GCC 7 at least. It can help with more variations in the size of the<br style="">

> HFA - say 2 or 6.<br style="">

      <br style="">

Yes, LuaJIT FFI interface doesn't know about them :). This kind of<br style="">

HFA structures is NIY in LuaJIT.<br style="">

<br style="">

>  <br style="">

> > `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no<br style="">

> > idea why Mike's prefered such interesting way to fix the behaviour in<br style="">

> > this case... May be he has its own branch with different `isfp` values<br style="">

> > and this is necessary for compatibility.<br style="">

> > <br style="">

> > Side note: I choose a general name (without mentioning arm64) for this C<br style="">

> > "library" in purpose. It may be adjusted in the future for brute force<br style="">

> > testing of FFI calling conventions for different architectures.<br style="">

> > See also: <a href="https://github.com/facebookarchive/luaffifb/blob/master/test.lua" target="_blank" style="">https://github.com/facebookarchive/luaffifb/blob/master/test.lua</a><br style="">

> > This link has an interesting example of FFI tests. May be we can create<br style="">

> > something similar with autogeneration of functions, structures and<br style="">

> > tests (not right now, but later).<br style="">

> > <br style="">

> > Branch: <a href="https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci" target="_blank" style="">https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci</a><br style="">

> > Tarantool branch: <a href="https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci" target="_blank" style="">https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci</a><br style="">

> > <br style="">

> > src/lj_ccall.c                                |  3 +-<br style="">

> > test/tarantool-tests/CMakeLists.txt           |  1 +<br style="">

> > .../arm64-ccall-fp-convention.test.lua        | 65 +++++++++++++++++++<br style="">

> > test/tarantool-tests/ffi-ccall/CMakeLists.txt |  1 +<br style="">

> > test/tarantool-tests/ffi-ccall/libfficcall.c  | 28 ++++++++<br style="">

> > 5 files changed, 97 insertions(+), 1 deletion(-)<br style="">

> > create mode 100644 test/tarantool-tests/arm64-ccall-fp-convention.test.lua<br style="">

> > create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt<br style="">

> > create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c<br style="">

> > <br style="">

<br style="">

<snipped><br style="">

<br style="">

> > -- <br style="">

> > 2.33.1<br style="">

> > <br style="">

> <br style="">

<br style="">

-- <br style="">

Best regards,<br style="">

Sergey Kaplun</div>

            
                <base target="_self" href="https://e.mail.ru/" style="">

        </div>


</div></blockquote></div></div></BODY></HTML>