From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 36E956ECC0; Thu, 9 Dec 2021 13:27:03 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 36E956ECC0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1639045623; bh=yMfX26bDWpny6VDqNSuq2EDtCLSPa7vK7t2gdB2aGo8=; h=To:Date:In-Reply-To:References:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=BqeEqhmgFk3iQYiek+Pslh/YUe5Fk2/NDO4LkVyBoEglwMGJuc9dzatigxiOIk9ce IG/CaJCF7zgzlTj2uHUTcq33iIPEYXMfuPj4P8bqTEpSfRVUaKE34ESAHKIfN0owvq Nd9QZbQBZXkpC8InRcsSSJwFG6Nyz+eIKHf41SCQ= Received: from smtp50.i.mail.ru (smtp50.i.mail.ru [94.100.177.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 9FBC06ECDB for ; Thu, 9 Dec 2021 13:26:05 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 9FBC06ECDB Received: by smtp50.i.mail.ru with esmtpa (envelope-from ) id 1mvGcm-0003gI-N1; Thu, 09 Dec 2021 13:26:05 +0300 To: Sergey Ostanevich , Igor Munkin Date: Thu, 9 Dec 2021 13:24:03 +0300 Message-Id: X-Mailer: git-send-email 2.33.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-4EC0790: 10 X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD99F281DB1F96F126D842ED54683F193CB18B7EFB3A8B8627300894C459B0CD1B9652967D9E74735DC691016E0B736A3F48D213B971117D8B442A26DAC8B975E28 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7705F446BE41E38A1EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006370487B2B06A9A4E218638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D899825D2D5424914FB2C83303DD991B85117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC2EE5AD8F952D28FBA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352033AC447995A7AD186FD1C55BDD38FC3FD2E47CDBA5A96583BA9C0B312567BB231DD303D21008E29813377AFFFEAFD269A417C69337E82CC2E827F84554CEF50127C277FBC8AE2E8BA83251EDC214901ED5E8D9A59859A8B613439FA09F3DCB32089D37D7C0E48F6C5571747095F342E88FB05168BE4CE3AF X-C1DE0DAB: C20DE7B7AB408E4181F030C43753B8186998911F362727C4C7A0BC55FA0FE5FCAA813A7B1C2A557F7EB84C0F41898730DFC5024D78A0F5BEB1881A6453793CE9C32612AADDFBE061C61BE10805914D3804EBA3D8E7E5B87ABF8C51168CD8EBDBF87214F1A954108EDC48ACC2A39D04F89CDFB48F4795C241BDAD6C7F3747799A X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D348B409C2D257583DF4BFB78E87C8FD8556C05D47C9F56A04D06082DD946082B901602983B1B92D3D61D7E09C32AA3244CED34E4362D98172158ED2881311E189933C9DC155518937F927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioja3eBVivI1t0e/Ec06Unlwg== X-Mailru-Sender: 3B9A0136629DC91206CBC582EFEF4CB46EA2E13EA80C3DA0FB6B013CB891BD5129F442182F6C0BAEF2400F607609286E924004A7DEC283833C7120B22964430C52B393F8C72A41A84198E0F3ECE9B5443453F38A29522196 X-Mras: Ok Subject: [Tarantool-patches] [PATCH luajit 2/2] FFI/ARM64: Fix pass-by-value struct calling conventions. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Kaplun via Tarantool-patches Reply-To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" From: Mike Pall (cherry picked from commit 9143e86498436892cb4316550be4d45b68a61224) The arm64 parameters passing rules for Homogeneous Floating-point Aggregates (HFA) are the following [1]: * If the argument type is an HFA or an HVA, then the argument is used unmodified. * If the argument is an HFA and there are sufficient unallocated Floating-point registers, then the argument is allocated to Floating-pointRegisters (with one register per member of the HFA). Also, if the argument type is a Composite Type then the size of the argument is rounded up to the nearest multiple of 8 bytes. LuaJIT FFI backend makes this rounding unconditionally for arm64 architecture and uses the result value to determine the necessary amount of registers for the call. So for the HFA composed of 2n + 1 (< 4) float members the extra one register is supposed as occupied. This leads to the wrong value (0 due to the corresponding memset in `cconv_struct_init()`) in this register if the other one parameter is passed to the procedure. This patch fixes this case by using the real size of structure in the calculation of necessary amount of registers to use. Sergey Kaplun: * added the description and the test for the problem [1]: https://developer.arm.com/documentation/ihi0055/b/ Part of tarantool/tarantool#6548 --- OK, there is an important note before you proceed with the patch: `isfp` variable in arm64 may takes the following values: 0 - for the pointer argument 1 - each part of the argument (i.e. field in a structure or floating point number itself) takes exactly one register 2 - the structure is HFA (including complex floats) and compact, i.e. two fields may be saved into one register Patch fixes the behaviour in the last case, when the structure is HFA and takes 8*n + 4 bytes. The variable can't take another value exept mentioned before, IINM (I've checked it several times). So the magic `d->size >> (4-isfp)` is exactly `d->size/sizeof(float)`. I have no idea why Mike's prefered such interesting way to fix the behaviour in this case... May be he has its own branch with different `isfp` values and this is necessary for compatibility. Side note: I choose a general name (without mentioning arm64) for this C "library" in purpose. It may be adjusted in the future for brute force testing of FFI calling conventions for different architectures. See also: https://github.com/facebookarchive/luaffifb/blob/master/test.lua This link has an interesting example of FFI tests. May be we can create something similar with autogeneration of functions, structures and tests (not right now, but later). Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci Tarantool branch: https://github.com/tarantool/luajit/tree/tarantool/gh-noticket-arm64-ffi-ccall-fp-convention-full-ci src/lj_ccall.c | 3 +- test/tarantool-tests/CMakeLists.txt | 1 + .../arm64-ccall-fp-convention.test.lua | 65 +++++++++++++++++++ test/tarantool-tests/ffi-ccall/CMakeLists.txt | 1 + test/tarantool-tests/ffi-ccall/libfficcall.c | 28 ++++++++ 5 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 test/tarantool-tests/arm64-ccall-fp-convention.test.lua create mode 100644 test/tarantool-tests/ffi-ccall/CMakeLists.txt create mode 100644 test/tarantool-tests/ffi-ccall/libfficcall.c diff --git a/src/lj_ccall.c b/src/lj_ccall.c index f26f1fea..d39ff861 100644 --- a/src/lj_ccall.c +++ b/src/lj_ccall.c @@ -337,7 +337,8 @@ if (LJ_TARGET_OSX && isva) { \ /* IOS: All variadic arguments are on the stack. */ \ } else if (isfp) { /* Try to pass argument in FPRs. */ \ - int n2 = ctype_isvector(d->info) ? 1 : n*isfp; \ + int n2 = ctype_isvector(d->info) ? 1 : \ + isfp == 1 ? n : (d->size >> (4-isfp)); \ if (nfpr + n2 <= CCALL_NARG_FPR) { \ dp = &cc->fpr[nfpr]; \ nfpr += n2; \ diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt index 992556d9..50769cda 100644 --- a/test/tarantool-tests/CMakeLists.txt +++ b/test/tarantool-tests/CMakeLists.txt @@ -56,6 +56,7 @@ macro(BuildTestCLib lib sources) set(LD_LIBRARY_PATH "${CMAKE_CURRENT_BINARY_DIR}:${LD_LIBRARY_PATH}" PARENT_SCOPE) endmacro() +add_subdirectory(ffi-ccall) add_subdirectory(gh-4427-ffi-sandwich) add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64) add_subdirectory(gh-6189-cur_L) diff --git a/test/tarantool-tests/arm64-ccall-fp-convention.test.lua b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua new file mode 100644 index 00000000..c0d2236a --- /dev/null +++ b/test/tarantool-tests/arm64-ccall-fp-convention.test.lua @@ -0,0 +1,65 @@ +local ffi = require('ffi') +local tap = require('tap') + +local ffi_ccall = ffi.load('libfficcall') + +local test = tap.test('ffi c call convention') +test:plan(3) + +ffi.cdef[[ +struct sz12_t { + float f1; + float f2; + float f3; +}; + +/* Just return the value itself. */ +struct sz12_t retsz12(struct sz12_t a); + +/* Return sum of two. */ +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b); + +/* Return sum of three. */ +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c); +]] + +-- For pretty error messages. +local sz12_mt = {__tostring = function(sz12) + return string.format('{%d,%d,%d}', sz12.f1, sz12.f2, sz12.f3) +end} + +local sz12_t = ffi.metatype('struct sz12_t', sz12_mt) + +local function new_sz12(f1, f2, f3) + return sz12_t(f1, f2, f3) +end + +-- For pretty error messages. +local expected_mt = {__tostring = function(t) + return '{'..table.concat(t, ',')..'}' +end} + +local function test_sz12(got, expected) + assert(type(got) == 'cdata') + assert(type(expected) == 'table') + expected = setmetatable(expected, expected_mt) + for i = 1, #expected do + if got['f'..i] ~= expected[i] then + error(('bad sz12_t value: got %s, expected %s'):format( + tostring(got), tostring(expected) + )) + end + end + return true +end + +-- Test arm64 calling convention for HFA structures. +-- Need structure which size is not multiple by 8. +local sz12_111 = new_sz12(1, 1, 1) +test:ok(test_sz12(sz12_111, {1, 1, 1}), 'base') +local sz12_222 = ffi_ccall.add2sz12(sz12_111, sz12_111) +test:ok(test_sz12(sz12_222, {2, 2, 2}), '2 structures as args') +local sz12_333 = ffi_ccall.add3sz12(sz12_111, sz12_111, sz12_111) +test:ok(test_sz12(sz12_333, {3, 3, 3}), '3 structures as args') + +os.exit(test:check() and 0 or 1) diff --git a/test/tarantool-tests/ffi-ccall/CMakeLists.txt b/test/tarantool-tests/ffi-ccall/CMakeLists.txt new file mode 100644 index 00000000..df937bf8 --- /dev/null +++ b/test/tarantool-tests/ffi-ccall/CMakeLists.txt @@ -0,0 +1 @@ +BuildTestCLib(libfficcall libfficcall.c) diff --git a/test/tarantool-tests/ffi-ccall/libfficcall.c b/test/tarantool-tests/ffi-ccall/libfficcall.c new file mode 100644 index 00000000..0c043a68 --- /dev/null +++ b/test/tarantool-tests/ffi-ccall/libfficcall.c @@ -0,0 +1,28 @@ +struct sz12_t { + float f1; + float f2; + float f3; +}; + +struct sz12_t retsz12(struct sz12_t a) +{ + return a; +} + +struct sz12_t add2sz12(struct sz12_t a, struct sz12_t b) +{ + struct sz12_t res = {0}; + res.f1 = a.f1 + b.f1; + res.f2 = a.f2 + b.f2; + res.f3 = a.f3 + b.f3; + return res; +} + +struct sz12_t add3sz12(struct sz12_t a, struct sz12_t b, struct sz12_t c) +{ + struct sz12_t res = {0}; + res.f1 = a.f1 + b.f1 + c.f1; + res.f2 = a.f2 + b.f2 + c.f2; + res.f3 = a.f3 + b.f3 + c.f3; + return res; +} -- 2.33.1