* [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 9:36 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:25 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
` (20 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Contributed by Djordje Kovacevic and Stefan Pejic.
(cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
Without the aforementioned checks, some non-branch instructions may be
interpreted as some branch due to memory address collisions. This patch
adds the corresponding comparisons masked values with instruction
opcodes used in the LuaJIT:
* `MIPSI_BEQ` for `beq` and `bne`,
* `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
* `MIPSI_BC1F` for `bc1f` and `bc1t`,
see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
details.
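As an illustration only (not a part of this patch), the same check can
be sketched in Lua via LuaJIT's bit library; the opcode constants below
are assumptions mirroring <src/lj_target_mips.h>:

  local bit = require('bit')
  local band, bxor = bit.band, bit.bxor

  -- Assumed opcode encodings (see <src/lj_target_mips.h>).
  local MIPSI_BEQ  = 0x10000000
  local MIPSI_BLTZ = 0x04000000
  local MIPSI_BC1F = 0x45000000

  -- <ins> is a 32-bit instruction word, <delta> is the expected
  -- PC-relative offset (in instructions) to the exit stub.
  local function is_exitstub_branch(ins, delta)
    -- The lowest 16 bits must hold the expected branch offset...
    if band(bxor(ins, delta), 0xffff) ~= 0 then return false end
    -- ...and the opcode bits must belong to one of the branch groups
    -- used by LuaJIT (the check added by this patch).
    return band(ins, 0xf0000000) == MIPSI_BEQ
        or band(ins, 0xfc1e0000) == MIPSI_BLTZ
        or band(ins, 0xffe00000) == MIPSI_BC1F
  end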
To reproduce this failure, we need specific memory mapping, so testcase
is omitted.
Since MIPS architecture is not supported by Tarantool (at the moment)
this patch is not necessary for backport. OTOH, it gives to us the
following benefits:
* Be in sync with the LuaJIT upstream not only for x86_64, arm64
architectures.
* Avoid conflicts during the future backporting.
So, it's more useful to backport some of the patches to avoid conflicts
with the future patch series.
[1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
Sergey Kaplun:
* added the description for the problem
Part of tarantool/tarantool#8825
---
src/lj_asm_mips.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index 03417013..03215821 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -2472,7 +2472,11 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
MCode tjump = MIPSI_J|(((uintptr_t)target>>2)&0x03ffffffu);
for (p++; p < pe; p++) {
if (*p == exitload) { /* Look for load of exit number. */
- if (((p[-1] ^ (px-p)) & 0xffffu) == 0) { /* Look for exitstub branch. */
+ /* Look for exitstub branch. Yes, this covers all used branch variants. */
+ if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
+ ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
+ (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
+ (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
ptrdiff_t delta = target - p;
if (((delta + 0x8000) >> 16) == 0) { /* Patch in-range branch. */
patchbranch:
--
2.41.0
* Re: [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
@ 2023-08-15 9:36 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 12:40 ` Sergey Kaplun via Tarantool-patches
2023-08-16 13:25 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 9:36 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:50PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic.
>
> (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
>
> Without the aforementioned checks, some non-branch instructions may be
> interpreted as some branch due to memory address collisions. This patch
Please add a more comprehensive description of behavior before the patch.
Because of magic values it is not obvious that the difference between the
current PC and the jump address is XORed with the opcode, to make sure
that this is a branching instruction.
Typo: s/some branch/branches/
> adds the corresponding comparisons masked values with instruction
Typo: s/comparisons masked values/mask values for comparisons/
> opcodes used in the LuaJIT:
> * `MIPSI_BEQ` for `beq` and `bne`,
> * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
> * `MIPSI_BC1F` for `bc1f` and `bc1t`,
> see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
> details.
>
> To reproduce this failure, we need specific memory mapping, so testcase
Typo: s/testcase/the test case/
> is omitted.
>
> Since MIPS architecture is not supported by Tarantool (at the moment)
> this patch is not necessary for backport. OTOH, it gives to us the
Typo: s/gives to us/gives us/
> following benefits:
> * Be in sync with the LuaJIT upstream not only for x86_64, arm64
> architectures.
> * Avoid conflicts during the future backporting.
Typo: s/during the future/during future/
> So, it's more useful to backport some of the patches to avoid conflicts
> with the future patch series.
>
> [1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_asm_mips.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 03417013..03215821 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -2472,7 +2472,11 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
> MCode tjump = MIPSI_J|(((uintptr_t)target>>2)&0x03ffffffu);
> for (p++; p < pe; p++) {
> if (*p == exitload) { /* Look for load of exit number. */
> - if (((p[-1] ^ (px-p)) & 0xffffu) == 0) { /* Look for exitstub branch. */
> + /* Look for exitstub branch. Yes, this covers all used branch variants. */
> + if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
> + ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
> + (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
> + (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
> ptrdiff_t delta = target - p;
> if (((delta + 0x8000) >> 16) == 0) { /* Patch in-range branch. */
> patchbranch:
> --
> 2.41.0
>
Best regards,
Maxim Kokryashkin
* Re: [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
2023-08-15 9:36 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 12:40 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 12:40 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> LGTM, except for a few comments below.
>
> On Wed, Aug 09, 2023 at 06:35:50PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Djordje Kovacevic and Stefan Pejic.
> >
> > (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
> >
> > Without the aforementioned checks, some non-branch instructions may be
> > interpreted as some branch due to memory address collisions. This patch
> Please add a more comprehensive description of behavior before the patch.
> Because of magic values it is not obvious that the difference between the
> current PC and the jump address is XORed with the opcode, to make sure
> that this is a branching instruction.
Added. The new commit message is the following:
| MIPS: Use precise search for exit jump patching.
|
| Contributed by Djordje Kovacevic and Stefan Pejic.
|
| (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
|
| The branch instruction encodes a PC-relative mcode address in its
| lowest 16 bits. To ensure that it is a branch instruction, we check
| that the difference between the address of the current instruction
| and the jump target matches the lowest 16 bits of the instruction.
| But there is no check that the opcode of this instruction is a branch
| opcode. Without the
| aforementioned checks, some non-branch instructions may be interpreted
| as branches due to memory address collisions. This patch adds the
| corresponding mask values for comparisons with instruction opcodes used
| in the LuaJIT:
| * `MIPSI_BEQ` for `beq` and `bne`,
| * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
| * `MIPSI_BC1F` for `bc1f` and `bc1t`,
| see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
| details.
|
| To reproduce this failure, we need specific memory mapping, so the test
| case is omitted.
|
| Since MIPS architecture is not supported by Tarantool (at the moment)
| this patch is not necessary for backport. OTOH, it gives us the
| following benefits:
| * Be in sync with the LuaJIT upstream not only for x86_64, arm64
| architectures.
| * Avoid conflicts during future backporting.
| So, it's more useful to backport some of the patches to avoid conflicts
| with the future patch series.
|
| [1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
|
| Sergey Kaplun:
| * added the description for the problem
|
| Part of tarantool/tarantool#8825
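To make the mentioned collision concrete, here is a small Lua sketch
(illustration only; the instruction encoding is an assumption): a
non-branch `addiu` word whose lowest 16 bits happen to equal the
expected offset passes the old delta-only check, but fails the added
opcode comparisons.

  local bit = require('bit')
  local band, bxor = bit.band, bit.bxor

  -- Assume the expected offset to the exit stub is 0x123 instructions
  -- and the word before the exit number load is `addiu v0, v0, 0x123`
  -- (encoded as 0x24420123), i.e. not a branch at all.
  local delta = 0x0123
  local ins = 0x24420123

  -- The old check passes, since only the lowest 16 bits are compared.
  assert(band(bxor(ins, delta), 0xffff) == 0)

  -- The new check rejects it, since the opcode bits match none of the
  -- branch groups (mask values assumed from <src/lj_target_mips.h>).
  assert(band(ins, 0xf0000000) ~= 0x10000000) -- Not the BEQ group.
  assert(band(ins, 0xfc1e0000) ~= 0x04000000) -- Not the BLTZ group.
  assert(band(ins, 0xffe00000) ~= 0x45000000) -- Not the BC1F group.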
>
> Typo: s/some branch/branches/
Fixed.
> > adds the corresponding comparisons masked values with instruction
> Typo: s/comparisons masked values/mask values for comparisons/
Fixed.
> > opcodes used in the LuaJIT:
> > * `MIPSI_BEQ` for `beq` and `bne`,
> > * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
> > * `MIPSI_BC1F` for `bc1f` and `bc1t`,
> > see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
> > details.
> >
> > To reproduce this failure, we need specific memory mapping, so testcase
> Typo: s/testcase/the test case/
Fixed.
> > is omitted.
> >
> > Since MIPS architecture is not supported by Tarantool (at the moment)
> > this patch is not necessary for backport. OTOH, it gives to us the
> Typo: s/gives to us/gives us/
Fixed.
> > following benefits:
> > * Be in sync with the LuaJIT upstream not only for x86_64, arm64
> > architectures.
> > * Avoid conflicts during the future backporting.
> Typo: s/during the future/during future/
Fixed.
> > So, it's more useful to backport some of the patches to avoid conflicts
<snipped>
> >
> Best regards,
> Maxim Kokryashkin
--
Best regards,
Sergey Kaplun
* Re: [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
2023-08-15 9:36 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:25 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 13:25 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey
Thanks for the patch! LGTM
On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic.
>
> (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
>
> Without the aforementioned checks, some non-branch instructions may be
> interpreted as some branch due to memory address collisions. This patch
> adds the corresponding comparisons masked values with instruction
> opcodes used in the LuaJIT:
> * `MIPSI_BEQ` for `beq` and `bne`,
> * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
> * `MIPSI_BC1F` for `bc1f` and `bc1t`,
> see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
> details.
>
> To reproduce this failure, we need specific memory mapping, so testcase
> is omitted.
>
> Since MIPS architecture is not supported by Tarantool (at the moment)
> this patch is not necessary for backport. OTOH, it gives to us the
> following benefits:
> * Be in sync with the LuaJIT upstream not only for x86_64, arm64
> architectures.
> * Avoid conflicts during the future backporting.
> So, it's more useful to backport some of the patches to avoid conflicts
> with the future patch series.
>
> [1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_asm_mips.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 03417013..03215821 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -2472,7 +2472,11 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
> MCode tjump = MIPSI_J|(((uintptr_t)target>>2)&0x03ffffffu);
> for (p++; p < pe; p++) {
> if (*p == exitload) { /* Look for load of exit number. */
> - if (((p[-1] ^ (px-p)) & 0xffffu) == 0) { /* Look for exitstub branch. */
> + /* Look for exitstub branch. Yes, this covers all used branch variants. */
> + if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
> + ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
> + (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
> + (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
> ptrdiff_t delta = target - p;
> if (((delta + 0x8000) >> 16) == 0) { /* Patch in-range branch. */
> patchbranch:
* [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 10:14 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:32 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
` (19 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
depends on a particular offset of the side trace mcode relative to the
parent trace. Before this commit it just ran some number of functions
to generate traces filling the required mcode range. Unfortunately,
this approach is not robust, since sometimes a trace is not recorded
due to "leaving loop in root trace" errors observed because of
hotcount collisions.
This patch introduces the following helpers:
* `frontend.gettraceno(func)` -- returns the traceno for the given
function, assuming that there is a compiled trace for its prototype
(i.e. the 0th bytecode is JFUNC).
* `jit.generators.fillmcode(traceno, size)` -- fills the mcode area of
the given size starting from the given trace. It is useful to generate
some mcode to test jumps to side traces remote enough from the parent.
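Below is a usage sketch (illustration only, mirroring the updated
test). It assumes that the <utils> module from <test/tarantool-tests>
is on the package path, that the maxtrace/maxmcode JIT options are
large enough for ~1Mb of traces, and that a few calls with `hotloop=1`
suffice to compile a root trace starting at the function header
(JFUNC).

  local generators = require('utils').jit.generators
  local frontend = require('utils').frontend

  jit.opt.start('hotloop=1')
  local function cbool(cond)
    if cond then return 1 else return 0 end
  end
  -- Warm up the function header, so it is compiled into a root trace.
  cbool(true)
  cbool(true)
  cbool(true)

  -- The traceno is decoded from the RD operand of the JFUNC header.
  local cbool_traceno = frontend.gettraceno(cbool)
  -- Generate ~1Mb of mcode after that trace, so that a later side
  -- trace of <cbool> lands far away from its parent trace.
  generators.fillmcode(cbool_traceno, 1024 * 1024)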
---
...8-fix-side-exit-patching-on-arm64.test.lua | 78 ++----------
test/tarantool-tests/utils/frontend.lua | 24 ++++
test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
3 files changed, 150 insertions(+), 67 deletions(-)
create mode 100644 test/tarantool-tests/utils/jit/generators.lua
diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
index 93db3041..678ac914 100644
--- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
+++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
@@ -1,8 +1,12 @@
local tap = require('tap')
local test = tap.test('gh-6098-fix-side-exit-patching-on-arm64'):skipcond({
['Test requires JIT enabled'] = not jit.status(),
+ ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
})
+local generators = require('utils').jit.generators
+local frontend = require('utils').frontend
+
test:plan(1)
-- The function to be tested for side exit patching:
@@ -20,52 +24,6 @@ local function cbool(cond)
end
end
--- XXX: Function template below produces 8Kb mcode for ARM64, so
--- we need to compile at least 128 traces to exceed 1Mb delta
--- between <cbool> root trace side exit and <cbool> side trace.
--- Unfortunately, we have no other option for extending this jump
--- delta, since the base of the current mcode area (J->mcarea) is
--- used as a hint for mcode allocator (see lj_mcode.c for info).
-local FUNCS = 128
-local recfuncs = { }
-for i = 1, FUNCS do
- -- This is a quite heavy workload (though it doesn't look like
- -- one at first). Each load from a table is type guarded. Each
- -- table lookup (for both stores and loads) is guarded for table
- -- <hmask> value and metatable presence. The code below results
- -- to 8Kb of mcode for ARM64 in practice.
- recfuncs[i] = assert(load(([[
- return function(src)
- local p = %d
- local tmp = { }
- local dst = { }
- for i = 1, 3 do
- tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
- tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
- tmp.c = src.c * p tmp.l = src.l * p tmp.u = src.u * p
- tmp.d = src.d * p tmp.m = src.m * p tmp.v = src.v * p
- tmp.e = src.e * p tmp.n = src.n * p tmp.w = src.w * p
- tmp.f = src.f * p tmp.o = src.o * p tmp.x = src.x * p
- tmp.g = src.g * p tmp.p = src.p * p tmp.y = src.y * p
- tmp.h = src.h * p tmp.q = src.q * p tmp.z = src.z * p
- tmp.i = src.i * p tmp.r = src.r * p
-
- dst.a = tmp.z + p dst.j = tmp.q + p dst.s = tmp.h + p
- dst.b = tmp.y + p dst.k = tmp.p + p dst.t = tmp.g + p
- dst.c = tmp.x + p dst.l = tmp.o + p dst.u = tmp.f + p
- dst.d = tmp.w + p dst.m = tmp.n + p dst.v = tmp.e + p
- dst.e = tmp.v + p dst.n = tmp.m + p dst.w = tmp.d + p
- dst.f = tmp.u + p dst.o = tmp.l + p dst.x = tmp.c + p
- dst.g = tmp.t + p dst.p = tmp.k + p dst.y = tmp.b + p
- dst.h = tmp.s + p dst.q = tmp.j + p dst.z = tmp.a + p
- dst.i = tmp.r + p dst.r = tmp.i + p
- end
- dst.tmp = tmp
- return dst
- end
- ]]):format(i)), ('Syntax error in function recfuncs[%d]'):format(i))()
-end
-
-- Make compiler work hard:
-- * No optimizations at all to produce more mcode.
-- * Try to compile all compiled paths as early as JIT can.
@@ -78,27 +36,13 @@ cbool(true)
-- a root trace for <cbool>.
cbool(true)
-for i = 1, FUNCS do
- -- XXX: FNEW is NYI, hence loop recording fails at this point.
- -- The recording is aborted on purpose: we are going to record
- -- <FUNCS> number of traces for functions in <recfuncs>.
- -- Otherwise, loop recording might lead to a very long trace
- -- error (via return to a lower frame), or a trace with lots of
- -- side traces. We need neither of this, but just bunch of
- -- traces filling the available mcode area.
- local function tnew(p)
- return {
- a = p + 1, f = p + 6, k = p + 11, p = p + 16, u = p + 21, z = p + 26,
- b = p + 2, g = p + 7, l = p + 12, q = p + 17, v = p + 22,
- c = p + 3, h = p + 8, m = p + 13, r = p + 18, w = p + 23,
- d = p + 4, i = p + 9, n = p + 14, s = p + 19, x = p + 24,
- e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
- }
- end
- -- Each function call produces a trace (see the template for the
- -- function definition above).
- recfuncs[i](tnew(i))
-end
+local cbool_traceno = frontend.gettraceno(cbool)
+
+-- XXX: Unfortunately, we have no other option for extending
+-- this jump delta, since the base of the current mcode area
+-- (J->mcarea) is used as a hint for mcode allocator (see
+-- lj_mcode.c for info).
+generators.fillmcode(cbool_traceno, 1024 * 1024)
-- XXX: I tried to make the test in pure Lua, but I failed to
-- implement the robust solution. As a result I've implemented a
diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
index 2afebbb2..414257fd 100644
--- a/test/tarantool-tests/utils/frontend.lua
+++ b/test/tarantool-tests/utils/frontend.lua
@@ -1,6 +1,10 @@
local M = {}
local bc = require('jit.bc')
+local jutil = require('jit.util')
+local vmdef = require('jit.vmdef')
+local bcnames = vmdef.bcnames
+local band, rshift = bit.band, bit.rshift
function M.hasbc(f, bytecode)
assert(type(f) == 'function', 'argument #1 should be a function')
@@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
return hasbc
end
+-- Get traceno of the trace assotiated for the given function.
+function M.gettraceno(func)
+ assert(type(func) == 'function', 'argument #1 should be a function')
+
+ -- The 0th BC is the header.
+ local func_ins = jutil.funcbc(func, 0)
+ local BC_NAME_LENGTH = 6
+ local RD_SHIFT = 16
+
+ -- Calculate index in `bcnames` string.
+ local op_idx = BC_NAME_LENGTH * band(func_ins, 0xff)
+ -- Get the name of the operation.
+ local op_name = string.sub(bcnames, op_idx + 1, op_idx + BC_NAME_LENGTH)
+ assert(op_name:match('JFUNC'),
+ 'The given function has non-jitted header: ' .. op_name)
+
+ -- RD contains the traceno.
+ return rshift(func_ins, RD_SHIFT)
+end
+
return M
diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
new file mode 100644
index 00000000..62b6e0ef
--- /dev/null
+++ b/test/tarantool-tests/utils/jit/generators.lua
@@ -0,0 +1,115 @@
+local M = {}
+
+local jutil = require('jit.util')
+
+local function getlast_traceno()
+ return misc.getmetrics().jit_trace_num
+end
+
+-- Convert addr to positive value if needed.
+local function canonize_address(addr)
+ if addr < 0 then addr = addr + 2 ^ 32 end
+ return addr
+end
+
+-- Need some storage to avoid functions and traces to be
+-- collected.
+local recfuncs = {}
+local last_i = 0
+-- This function generates a table of functions with heavy mcode
+-- payload with tab arithmetics to fill the mcode area from the
+-- one trace mcode by the some given size. This size is usually
+-- big enough, because we want to check long jump side exits from
+-- some traces.
+-- Assumes, that maxmcode and maxtrace options are set to be sure,
+-- that we can produce such amount of mcode.
+function M.fillmcode(trace_from, size)
+ local mcode, addr_from = jutil.tracemc(trace_from)
+ assert(mcode, 'the #1 argument should be an existed trace number')
+ addr_from = canonize_address(addr_from)
+ local required_diff = size + #mcode
+
+ -- Marker to check that traces are not flushed.
+ local maxtraceno = getlast_traceno()
+ local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
+
+ local _, last_addr = jutil.tracemc(maxtraceno)
+ last_addr = canonize_address(last_addr)
+
+ -- Addresses of traces may increase or decrease depending on OS,
+ -- so use absolute diff.
+ while math.abs(last_addr - addr_from) > required_diff do
+ last_i = last_i + 1
+ -- This is a quite heavy workload (though it doesn't look like
+ -- one at first). Each load from a table is type guarded. Each
+ -- table lookup (for both stores and loads) is guarded for
+ -- table <hmask> value and presence of the metatable. The code
+ -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
+ -- practice.
+ local fname = ('fillmcode[%d]'):format(last_i)
+ recfuncs[last_i] = assert(loadstring(([[
+ return function(src)
+ local p = %d
+ local tmp = { }
+ local dst = { }
+ -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
+ -- in root trace) errors due to hotcount collisions.
+ for i = 1, 5 do
+ tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
+ tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
+ tmp.c = src.c * p tmp.l = src.l * p tmp.u = src.u * p
+ tmp.d = src.d * p tmp.m = src.m * p tmp.v = src.v * p
+ tmp.e = src.e * p tmp.n = src.n * p tmp.w = src.w * p
+ tmp.f = src.f * p tmp.o = src.o * p tmp.x = src.x * p
+ tmp.g = src.g * p tmp.p = src.p * p tmp.y = src.y * p
+ tmp.h = src.h * p tmp.q = src.q * p tmp.z = src.z * p
+ tmp.i = src.i * p tmp.r = src.r * p
+
+ dst.a = tmp.z + p dst.j = tmp.q + p dst.s = tmp.h + p
+ dst.b = tmp.y + p dst.k = tmp.p + p dst.t = tmp.g + p
+ dst.c = tmp.x + p dst.l = tmp.o + p dst.u = tmp.f + p
+ dst.d = tmp.w + p dst.m = tmp.n + p dst.v = tmp.e + p
+ dst.e = tmp.v + p dst.n = tmp.m + p dst.w = tmp.d + p
+ dst.f = tmp.u + p dst.o = tmp.l + p dst.x = tmp.c + p
+ dst.g = tmp.t + p dst.p = tmp.k + p dst.y = tmp.b + p
+ dst.h = tmp.s + p dst.q = tmp.j + p dst.z = tmp.a + p
+ dst.i = tmp.r + p dst.r = tmp.i + p
+ end
+ dst.tmp = tmp
+ return dst
+ end
+ ]]):format(last_i), fname), ('Syntax error in function %s'):format(fname))()
+ -- XXX: FNEW is NYI, hence loop recording fails at this point.
+ -- The recording is aborted on purpose: the whole loop
+ -- recording might lead to a very long trace error (via return
+ -- to a lower frame), or a trace with lots of side traces. We
+ -- need neither of this, but just a bunch of traces filling
+ -- the available mcode area.
+ local function tnew(p)
+ return {
+ a = p + 1, f = p + 6, k = p + 11, p = p + 16, u = p + 21, z = p + 26,
+ b = p + 2, g = p + 7, l = p + 12, q = p + 17, v = p + 22,
+ c = p + 3, h = p + 8, m = p + 13, r = p + 18, w = p + 23,
+ d = p + 4, i = p + 9, n = p + 14, s = p + 19, x = p + 24,
+ e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
+ }
+ end
+ -- Each function call produces a trace (see the template for
+ -- the function definition above).
+ recfuncs[last_i](tnew(last_i))
+ local last_traceno = getlast_traceno()
+ if last_traceno < maxtraceno then
+ error(FLUSH_ERR)
+ end
+
+ -- Calculate the address of the last trace start.
+ maxtraceno = last_traceno
+ _, last_addr = jutil.tracemc(last_traceno)
+ if not last_addr then
+ error(FLUSH_ERR)
+ end
+ last_addr = canonize_address(last_addr)
+ end
+end
+
+return M
--
2.41.0
* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
@ 2023-08-15 10:14 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 12:55 ` Sergey Kaplun via Tarantool-patches
2023-08-16 14:32 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 10:14 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
Please consider my comments below.
On Wed, Aug 09, 2023 at 06:35:51PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
> depends on particular offset of mcode for side trace regarding the
> parent trace. Before this commit just run some amount of functions to
> generate traces to fill the required mcode range. Unfortunately, this
> approach is not robust, since sometimes trace is not recorded due to
> errors "leaving loop in root trace" observed because of hotcount
> collisions.
>
> This patch introduces the following helpers:
> * `frontend.gettraceno(func)` -- returns the traceno for the given
> function, assumming that there is compiled trace for its prototype
> (i.e. the 0th bytecode is JFUNC).
> * `jit.generators.fillmcode(traceno, size)` fills mcode area of the
> given size from the given trace. It is useful to generate some mcode
> to test jumps to side traces remote enough from the parent.
> ---
> ...8-fix-side-exit-patching-on-arm64.test.lua | 78 ++----------
> test/tarantool-tests/utils/frontend.lua | 24 ++++
> test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
> 3 files changed, 150 insertions(+), 67 deletions(-)
> create mode 100644 test/tarantool-tests/utils/jit/generators.lua
>
> diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> index 93db3041..678ac914 100644
> --- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> +++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> @@ -1,8 +1,12 @@
> local tap = require('tap')
> local test = tap.test('gh-6098-fix-side-exit-patching-on-arm64'):skipcond({
> ['Test requires JIT enabled'] = not jit.status(),
> + ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
> })
>
> +local generators = require('utils').jit.generators
> +local frontend = require('utils').frontend
> +
> test:plan(1)
>
> -- The function to be tested for side exit patching:
> @@ -20,52 +24,6 @@ local function cbool(cond)
> end
> end
>
> --- XXX: Function template below produces 8Kb mcode for ARM64, so
> --- we need to compile at least 128 traces to exceed 1Mb delta
> --- between <cbool> root trace side exit and <cbool> side trace.
> --- Unfortunately, we have no other option for extending this jump
> --- delta, since the base of the current mcode area (J->mcarea) is
> --- used as a hint for mcode allocator (see lj_mcode.c for info).
> -local FUNCS = 128
> -local recfuncs = { }
> -for i = 1, FUNCS do
> - -- This is a quite heavy workload (though it doesn't look like
> - -- one at first). Each load from a table is type guarded. Each
> - -- table lookup (for both stores and loads) is guarded for table
> - -- <hmask> value and metatable presence. The code below results
> - -- to 8Kb of mcode for ARM64 in practice.
> - recfuncs[i] = assert(load(([[
> - return function(src)
> - local p = %d
> - local tmp = { }
> - local dst = { }
> - for i = 1, 3 do
> - tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
> - tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
> - tmp.c = src.c * p tmp.l = src.l * p tmp.u = src.u * p
> - tmp.d = src.d * p tmp.m = src.m * p tmp.v = src.v * p
> - tmp.e = src.e * p tmp.n = src.n * p tmp.w = src.w * p
> - tmp.f = src.f * p tmp.o = src.o * p tmp.x = src.x * p
> - tmp.g = src.g * p tmp.p = src.p * p tmp.y = src.y * p
> - tmp.h = src.h * p tmp.q = src.q * p tmp.z = src.z * p
> - tmp.i = src.i * p tmp.r = src.r * p
> -
> - dst.a = tmp.z + p dst.j = tmp.q + p dst.s = tmp.h + p
> - dst.b = tmp.y + p dst.k = tmp.p + p dst.t = tmp.g + p
> - dst.c = tmp.x + p dst.l = tmp.o + p dst.u = tmp.f + p
> - dst.d = tmp.w + p dst.m = tmp.n + p dst.v = tmp.e + p
> - dst.e = tmp.v + p dst.n = tmp.m + p dst.w = tmp.d + p
> - dst.f = tmp.u + p dst.o = tmp.l + p dst.x = tmp.c + p
> - dst.g = tmp.t + p dst.p = tmp.k + p dst.y = tmp.b + p
> - dst.h = tmp.s + p dst.q = tmp.j + p dst.z = tmp.a + p
> - dst.i = tmp.r + p dst.r = tmp.i + p
> - end
> - dst.tmp = tmp
> - return dst
> - end
> - ]]):format(i)), ('Syntax error in function recfuncs[%d]'):format(i))()
> -end
> -
> -- Make compiler work hard:
> -- * No optimizations at all to produce more mcode.
> -- * Try to compile all compiled paths as early as JIT can.
> @@ -78,27 +36,13 @@ cbool(true)
> -- a root trace for <cbool>.
> cbool(true)
>
> -for i = 1, FUNCS do
> - -- XXX: FNEW is NYI, hence loop recording fails at this point.
> - -- The recording is aborted on purpose: we are going to record
> - -- <FUNCS> number of traces for functions in <recfuncs>.
> - -- Otherwise, loop recording might lead to a very long trace
> - -- error (via return to a lower frame), or a trace with lots of
> - -- side traces. We need neither of this, but just bunch of
> - -- traces filling the available mcode area.
> - local function tnew(p)
> - return {
> - a = p + 1, f = p + 6, k = p + 11, p = p + 16, u = p + 21, z = p + 26,
> - b = p + 2, g = p + 7, l = p + 12, q = p + 17, v = p + 22,
> - c = p + 3, h = p + 8, m = p + 13, r = p + 18, w = p + 23,
> - d = p + 4, i = p + 9, n = p + 14, s = p + 19, x = p + 24,
> - e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
> - }
> - end
> - -- Each function call produces a trace (see the template for the
> - -- function definition above).
> - recfuncs[i](tnew(i))
> -end
> +local cbool_traceno = frontend.gettraceno(cbool)
> +
> +-- XXX: Unfortunately, we have no other option for extending
> +-- this jump delta, since the base of the current mcode area
> +-- (J->mcarea) is used as a hint for mcode allocator (see
> +-- lj_mcode.c for info).
> +generators.fillmcode(cbool_traceno, 1024 * 1024)
>
> -- XXX: I tried to make the test in pure Lua, but I failed to
> -- implement the robust solution. As a result I've implemented a
> diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> index 2afebbb2..414257fd 100644
> --- a/test/tarantool-tests/utils/frontend.lua
> +++ b/test/tarantool-tests/utils/frontend.lua
> @@ -1,6 +1,10 @@
> local M = {}
>
> local bc = require('jit.bc')
> +local jutil = require('jit.util')
> +local vmdef = require('jit.vmdef')
> +local bcnames = vmdef.bcnames
> +local band, rshift = bit.band, bit.rshift
>
> function M.hasbc(f, bytecode)
> assert(type(f) == 'function', 'argument #1 should be a function')
> @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
> return hasbc
> end
>
> +-- Get traceno of the trace assotiated for the given function.
> +function M.gettraceno(func)
> + assert(type(func) == 'function', 'argument #1 should be a function')
> +
> + -- The 0th BC is the header.
> + local func_ins = jutil.funcbc(func, 0)
> + local BC_NAME_LENGTH = 6
> + local RD_SHIFT = 16
> +
> + -- Calculate index in `bcnames` string.
> + local op_idx = BC_NAME_LENGTH * band(func_ins, 0xff)
> + -- Get the name of the operation.
> + local op_name = string.sub(bcnames, op_idx + 1, op_idx + BC_NAME_LENGTH)
> + assert(op_name:match('JFUNC'),
> + 'The given function has non-jitted header: ' .. op_name)
> +
> + -- RD contains the traceno.
> + return rshift(func_ins, RD_SHIFT)
> +end
> +
> return M
> diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
> new file mode 100644
> index 00000000..62b6e0ef
> --- /dev/null
> +++ b/test/tarantool-tests/utils/jit/generators.lua
> @@ -0,0 +1,115 @@
> +local M = {}
> +
> +local jutil = require('jit.util')
> +
> +local function getlast_traceno()
> + return misc.getmetrics().jit_trace_num
> +end
> +
> +-- Convert addr to positive value if needed.
> +local function canonize_address(addr)
Nit: most of the time, the `canonize` variant is used in theological materials,
while the `canonicalize` is more common in the sphere of software development.
Feel free to ignore.
> + if addr < 0 then addr = addr + 2 ^ 32 end
> + return addr
> +end
> +
> +-- Need some storage to avoid functions and traces to be
> +-- collected.
Typo: s/Need/We need/ or s/Need some storage/Some storage is needed/
Typo: s/to be collected/being collected/
> +local recfuncs = {}
> +local last_i = 0
> +-- This function generates a table of functions with heavy mcode
> +-- payload with tab arithmetics to fill the mcode area from the
> +-- one trace mcode by the some given size. This size is usually
Typo: s/by the some/by some/
> +-- big enough, because we want to check long jump side exits from
> +-- some traces.
> +-- Assumes, that maxmcode and maxtrace options are set to be sure,
Typo: s/that/that the/
> +-- that we can produce such amount of mcode.
> +function M.fillmcode(trace_from, size)
> + local mcode, addr_from = jutil.tracemc(trace_from)
> + assert(mcode, 'the #1 argument should be an existed trace number')
Typo: s/existed/existing/
> + addr_from = canonize_address(addr_from)
> + local required_diff = size + #mcode
> +
> + -- Marker to check that traces are not flushed.
> + local maxtraceno = getlast_traceno()
> + local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
> +
> + local _, last_addr = jutil.tracemc(maxtraceno)
> + last_addr = canonize_address(last_addr)
> +
> + -- Addresses of traces may increase or decrease depending on OS,
> + -- so use absolute diff.
> + while math.abs(last_addr - addr_from) > required_diff do
> + last_i = last_i + 1
> + -- This is a quite heavy workload (though it doesn't look like
Typo: s/This is a quite/This is quite a/
> + -- one at first). Each load from a table is type guarded. Each
> + -- table lookup (for both stores and loads) is guarded for
> + -- table <hmask> value and presence of the metatable. The code
Typo: s/and presence/and the presence/
> + -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
Typo: s/results to/results in/
> + -- practice.
> + local fname = ('fillmcode[%d]'):format(last_i)
> + recfuncs[last_i] = assert(loadstring(([[
> + return function(src)
> + local p = %d
Nit: Poor naming, a more descriptive name is preferred.
> + local tmp = { }
> + local dst = { }
> + -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
Typo: s/as stop/as a stop/
> + -- in root trace) errors due to hotcount collisions.
> + for i = 1, 5 do
> + tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
> + tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
> + tmp.c = src.c * p tmp.l = src.l * p tmp.u = src.u * p
> + tmp.d = src.d * p tmp.m = src.m * p tmp.v = src.v * p
> + tmp.e = src.e * p tmp.n = src.n * p tmp.w = src.w * p
> + tmp.f = src.f * p tmp.o = src.o * p tmp.x = src.x * p
> + tmp.g = src.g * p tmp.p = src.p * p tmp.y = src.y * p
> + tmp.h = src.h * p tmp.q = src.q * p tmp.z = src.z * p
> + tmp.i = src.i * p tmp.r = src.r * p
> +
> + dst.a = tmp.z + p dst.j = tmp.q + p dst.s = tmp.h + p
> + dst.b = tmp.y + p dst.k = tmp.p + p dst.t = tmp.g + p
> + dst.c = tmp.x + p dst.l = tmp.o + p dst.u = tmp.f + p
> + dst.d = tmp.w + p dst.m = tmp.n + p dst.v = tmp.e + p
> + dst.e = tmp.v + p dst.n = tmp.m + p dst.w = tmp.d + p
> + dst.f = tmp.u + p dst.o = tmp.l + p dst.x = tmp.c + p
> + dst.g = tmp.t + p dst.p = tmp.k + p dst.y = tmp.b + p
> + dst.h = tmp.s + p dst.q = tmp.j + p dst.z = tmp.a + p
> + dst.i = tmp.r + p dst.r = tmp.i + p
> + end
> + dst.tmp = tmp
> + return dst
> + end
> + ]]):format(last_i), fname), ('Syntax error in function %s'):format(fname))()
> + -- XXX: FNEW is NYI, hence loop recording fails at this point.
> + -- The recording is aborted on purpose: the whole loop
> + -- recording might lead to a very long trace error (via return
> + -- to a lower frame), or a trace with lots of side traces. We
> + -- need neither of this, but just a bunch of traces filling
> + -- the available mcode area.
> + local function tnew(p)
Nit: same issue with naming.
> + return {
> + a = p + 1, f = p + 6, k = p + 11, p = p + 16, u = p + 21, z = p + 26,
> + b = p + 2, g = p + 7, l = p + 12, q = p + 17, v = p + 22,
> + c = p + 3, h = p + 8, m = p + 13, r = p + 18, w = p + 23,
> + d = p + 4, i = p + 9, n = p + 14, s = p + 19, x = p + 24,
> + e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
> + }
> + end
> + -- Each function call produces a trace (see the template for
> + -- the function definition above).
> + recfuncs[last_i](tnew(last_i))
> + local last_traceno = getlast_traceno()
> + if last_traceno < maxtraceno then
> + error(FLUSH_ERR)
> + end
> +
> + -- Calculate the address of the last trace start.
> + maxtraceno = last_traceno
> + _, last_addr = jutil.tracemc(last_traceno)
> + if not last_addr then
> + error(FLUSH_ERR)
> + end
> + last_addr = canonize_address(last_addr)
> + end
> +end
> +
> +return M
> --
> 2.41.0
Best regards,
Maxim Kokryashkin
>
* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
2023-08-15 10:14 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 12:55 ` Sergey Kaplun via Tarantool-patches
2023-08-16 13:06 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 12:55 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Please, see my replies below.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
>
> On Wed, Aug 09, 2023 at 06:35:51PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
> > depends on particular offset of mcode for side trace regarding the
> > parent trace. Before this commit just run some amount of functions to
> > generate traces to fill the required mcode range. Unfortunately, this
> > approach is not robust, since sometimes trace is not recorded due to
> > errors "leaving loop in root trace" observed because of hotcount
> > collisions.
> >
> > This patch introduces the following helpers:
> > * `frontend.gettraceno(func)` -- returns the traceno for the given
> > function, assumming that there is compiled trace for its prototype
> > (i.e. the 0th bytecode is JFUNC).
> > * `jit.generators.fillmcode(traceno, size)` fills mcode area of the
> > given size from the given trace. It is useful to generate some mcode
> > to test jumps to side traces remote enough from the parent.
> > ---
> > ...8-fix-side-exit-patching-on-arm64.test.lua | 78 ++----------
> > test/tarantool-tests/utils/frontend.lua | 24 ++++
> > test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
> > 3 files changed, 150 insertions(+), 67 deletions(-)
> > create mode 100644 test/tarantool-tests/utils/jit/generators.lua
> >
> > diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> > index 93db3041..678ac914 100644
> > --- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> > +++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
<snipped>
> > diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> > index 2afebbb2..414257fd 100644
> > --- a/test/tarantool-tests/utils/frontend.lua
> > +++ b/test/tarantool-tests/utils/frontend.lua
<snipped>
> > diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
> > new file mode 100644
> > index 00000000..62b6e0ef
> > --- /dev/null
> > +++ b/test/tarantool-tests/utils/jit/generators.lua
> > @@ -0,0 +1,115 @@
> > +local M = {}
> > +
> > +local jutil = require('jit.util')
> > +
> > +local function getlast_traceno()
> > + return misc.getmetrics().jit_trace_num
> > +end
> > +
> > +-- Convert addr to positive value if needed.
> > +local function canonize_address(addr)
> Nit: most of the time, the `canonize` variant is used in theological materials,
> while the `canonicalize` is more common in the sphere of software development.
> Feel free to ignore.
Fixed, thanks.
> > + if addr < 0 then addr = addr + 2 ^ 32 end
> > + return addr
> > +end
> > +
> > +-- Need some storage to avoid functions and traces to be
> > +-- collected.
> Typo: s/Need/We need/ or s/Need some storage/Some storage is needed/
> Typo: s/to be collected/being collected/
Fixed.
> > +local recfuncs = {}
> > +local last_i = 0
> > +-- This function generates a table of functions with heavy mcode
> > +-- payload with tab arithmetics to fill the mcode area from the
> > +-- one trace mcode by the some given size. This size is usually
> Typo: s/by the some/by some/
Fixed, thanks!
> > +-- big enough, because we want to check long jump side exits from
> > +-- some traces.
> > +-- Assumes, that maxmcode and maxtrace options are set to be sure,
> Typo: s/that/that the/
Fixed.
> > +-- that we can produce such amount of mcode.
> > +function M.fillmcode(trace_from, size)
> > + local mcode, addr_from = jutil.tracemc(trace_from)
> > + assert(mcode, 'the #1 argument should be an existed trace number')
> Typo: s/existed/existing/
Fixed, thanks!
> > + addr_from = canonize_address(addr_from)
> > + local required_diff = size + #mcode
> > +
> > + -- Marker to check that traces are not flushed.
> > + local maxtraceno = getlast_traceno()
> > + local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
> > +
> > + local _, last_addr = jutil.tracemc(maxtraceno)
> > + last_addr = canonize_address(last_addr)
> > +
> > + -- Addresses of traces may increase or decrease depending on OS,
> > + -- so use absolute diff.
> > + while math.abs(last_addr - addr_from) > required_diff do
> > + last_i = last_i + 1
> > + -- This is a quite heavy workload (though it doesn't look like
> Typo: s/This is a quite/This is quite a/
Fixed.
> > + -- one at first). Each load from a table is type guarded. Each
> > + -- table lookup (for both stores and loads) is guarded for
> > + -- table <hmask> value and presence of the metatable. The code
> Typo: s/and presence/and the presence/
Fixed.
> > + -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
> Typo: s/results to/results in/
Fixed.
> > + -- practice.
> > + local fname = ('fillmcode[%d]'):format(last_i)
> > + recfuncs[last_i] = assert(loadstring(([[
> > + return function(src)
> > + local p = %d
> Nit: Poor naming, a more descriptive name is preferred.
It doesn't make much sense, because we really don't care about the
function's content. Since it's just a moved part of the code, I prefer
to leave it as is.
Ignoring for now.
> > + local tmp = { }
> > + local dst = { }
> > + -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
> Typo: s/as stop/as a stop/
Fixed, thanks!
> > + -- in root trace) errors due to hotcount collisions.
> > + for i = 1, 5 do
<snipped>
> > + local function tnew(p)
> Nit: same issue with naming.
Ditto.
> > + return {
<snipped>
See the iterative patch below:
===================================================================
diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
index 62b6e0ef..65abfdaa 100644
--- a/test/tarantool-tests/utils/jit/generators.lua
+++ b/test/tarantool-tests/utils/jit/generators.lua
@@ -7,26 +7,26 @@ local function getlast_traceno()
end
-- Convert addr to positive value if needed.
-local function canonize_address(addr)
+local function canonicalize_address(addr)
if addr < 0 then addr = addr + 2 ^ 32 end
return addr
end
--- Need some storage to avoid functions and traces to be
+-- Some storage is needed to avoid functions and traces being
-- collected.
local recfuncs = {}
local last_i = 0
-- This function generates a table of functions with heavy mcode
-- payload with tab arithmetics to fill the mcode area from the
--- one trace mcode by the some given size. This size is usually
--- big enough, because we want to check long jump side exits from
--- some traces.
--- Assumes, that maxmcode and maxtrace options are set to be sure,
--- that we can produce such amount of mcode.
+-- one trace mcode by some given size. This size is usually big
+-- enough, because we want to check long jump side exits from some
+-- traces.
+-- Assumes, that the maxmcode and maxtrace options are set to be
+-- sure, that we can produce such amount of mcode.
function M.fillmcode(trace_from, size)
local mcode, addr_from = jutil.tracemc(trace_from)
- assert(mcode, 'the #1 argument should be an existed trace number')
- addr_from = canonize_address(addr_from)
+ assert(mcode, 'the #1 argument should be an existing trace number')
+ addr_from = canonicalize_address(addr_from)
local required_diff = size + #mcode
-- Marker to check that traces are not flushed.
@@ -34,17 +34,17 @@ function M.fillmcode(trace_from, size)
local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
local _, last_addr = jutil.tracemc(maxtraceno)
- last_addr = canonize_address(last_addr)
+ last_addr = canonicalize_address(last_addr)
-- Addresses of traces may increase or decrease depending on OS,
-- so use absolute diff.
while math.abs(last_addr - addr_from) > required_diff do
last_i = last_i + 1
- -- This is a quite heavy workload (though it doesn't look like
+ -- This is quite a heavy workload (though it doesn't look like
-- one at first). Each load from a table is type guarded. Each
-- table lookup (for both stores and loads) is guarded for
- -- table <hmask> value and presence of the metatable. The code
- -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
+ -- table <hmask> value and the presence of the metatable. The
+ -- code below results in ~8Kb of mcode for ARM64 and MIPS64 in
-- practice.
local fname = ('fillmcode[%d]'):format(last_i)
recfuncs[last_i] = assert(loadstring(([[
@@ -52,8 +52,8 @@ function M.fillmcode(trace_from, size)
local p = %d
local tmp = { }
local dst = { }
- -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
- -- in root trace) errors due to hotcount collisions.
+ -- XXX: use 5 as a stop index to reduce LLEAVE (leaving
+ -- loop in root trace) errors due to hotcount collisions.
for i = 1, 5 do
tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
@@ -108,7 +108,7 @@ function M.fillmcode(trace_from, size)
if not last_addr then
error(FLUSH_ERR)
end
- last_addr = canonize_address(last_addr)
+ last_addr = canonicalize_address(last_addr)
end
end
===================================================================
> > +end
> > +
> > +return M
> > --
> > 2.41.0
> Best regards,
> Maxim Kokryashkin
> >
--
Best regards,
Sergey Kaplun
* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
2023-08-16 12:55 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 13:06 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16 13:06 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin
>Wednesday, August 16, 2023, 16:00 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
>
>Hi, Maxim!
>Thanks for the review!
>Please, see my replies below.
>
>On 15.08.23, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the patch!
>> Please consider my comments below.
>>
>> On Wed, Aug 09, 2023 at 06:35:51PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> > The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
>> > depends on particular offset of mcode for side trace regarding the
>> > parent trace. Before this commit just run some amount of functions to
>> > generate traces to fill the required mcode range. Unfortunately, this
>> > approach is not robust, since sometimes trace is not recorded due to
>> > errors "leaving loop in root trace" observed because of hotcount
>> > collisions.
>> >
>> > This patch introduces the following helpers:
>> > * `frontend.gettraceno(func)` -- returns the traceno for the given
>> > function, assumming that there is compiled trace for its prototype
>> > (i.e. the 0th bytecode is JFUNC).
>> > * `jit.generators.fillmcode(traceno, size)` fills mcode area of the
>> > given size from the given trace. It is useful to generate some mcode
>> > to test jumps to side traces remote enough from the parent.
>> > ---
>> > ...8-fix-side-exit-patching-on-arm64.test.lua | 78 ++----------
>> > test/tarantool-tests/utils/frontend.lua | 24 ++++
>> > test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
>> > 3 files changed, 150 insertions(+), 67 deletions(-)
>> > create mode 100644 test/tarantool-tests/utils/jit/generators.lua
>> >
>> > diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
>> > index 93db3041..678ac914 100644
>> > --- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
>> > +++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
>
><snipped>
>
>> > diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
>> > index 2afebbb2..414257fd 100644
>> > --- a/test/tarantool-tests/utils/frontend.lua
>> > +++ b/test/tarantool-tests/utils/frontend.lua
>
><snipped>
>
>> > diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
>> > new file mode 100644
>> > index 00000000..62b6e0ef
>> > --- /dev/null
>> > +++ b/test/tarantool-tests/utils/jit/generators.lua
>> > @@ -0,0 +1,115 @@
>> > +local M = {}
>> > +
>> > +local jutil = require('jit.util')
>> > +
>> > +local function getlast_traceno()
>> > + return misc.getmetrics().jit_trace_num
>> > +end
>> > +
>> > +-- Convert addr to positive value if needed.
>> > +local function canonize_address(addr)
>> Nit: most of the time, the `canonize` variant is used in theological materials,
>> while the `canonicalize` is more common in the sphere of software development.
>> Feel free to ignore.
>
>Fixed, thanks.
>
>> > + if addr < 0 then addr = addr + 2 ^ 32 end
>> > + return addr
>> > +end
>> > +
>> > +-- Need some storage to avoid functions and traces to be
>> > +-- collected.
>> Typo: s/Need/We need/ or s/Need some storage/Some storage is needed/
>> Typo: s/to be collected/being collected/
>
>Fixed.
>
>> > +local recfuncs = {}
>> > +local last_i = 0
>> > +-- This function generates a table of functions with heavy mcode
>> > +-- payload with tab arithmetics to fill the mcode area from the
>> > +-- one trace mcode by the some given size. This size is usually
>> Typo: s/by the some/by some/
>
>Fixed, thanks!
>
>> > +-- big enough, because we want to check long jump side exits from
>> > +-- some traces.
>> > +-- Assumes, that maxmcode and maxtrace options are set to be sure,
>> Typo: s/that/that the/
>
>Fixed.
>
>> > +-- that we can produce such amount of mcode.
>> > +function M.fillmcode(trace_from, size)
>> > + local mcode, addr_from = jutil.tracemc(trace_from)
>> > + assert(mcode, 'the #1 argument should be an existed trace number')
>> Typo: s/existed/existing/
>
>Fixed, thanks!
>
>> > + addr_from = canonize_address(addr_from)
>> > + local required_diff = size + #mcode
>> > +
>> > + -- Marker to check that traces are not flushed.
>> > + local maxtraceno = getlast_traceno()
>> > + local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
>> > +
>> > + local _, last_addr = jutil.tracemc(maxtraceno)
>> > + last_addr = canonize_address(last_addr)
>> > +
>> > + -- Addresses of traces may increase or decrease depending on OS,
>> > + -- so use absolute diff.
>> > + while math.abs(last_addr - addr_from) > required_diff do
>> > + last_i = last_i + 1
>> > + -- This is a quite heavy workload (though it doesn't look like
>> Typo: s/This is a quite/This is quite a/
>
>Fixed.
>
>> > + -- one at first). Each load from a table is type guarded. Each
>> > + -- table lookup (for both stores and loads) is guarded for
>> > + -- table <hmask> value and presence of the metatable. The code
>> Typo: s/and presence/and the presence/
>
>Fixed.
>
>> > + -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
>> Typo: s/results to/results in/
>
>Fixed.
>
>> > + -- practice.
>> > + local fname = ('fillmcode[%d]'):format(last_i)
>> > + recfuncs[last_i] = assert(loadstring(([[
>> > + return function(src)
>> > + local p = %d
>> Nit: Poor naming, a more descriptive name is preferred.
>
>It doesn't make much sense, because we really don't care about the
>function's content. Since it's just a moved part of the code, I prefer
>to leave it as is.
>
>Ignoring for now.
>
>> > + local tmp = { }
>> > + local dst = { }
>> > + -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
>> Typo: s/as stop/as a stop/
>
>Fixed, thanks!
>
>> > + -- in root trace) errors due to hotcount collisions.
>> > + for i = 1, 5 do
>
><snipped>
>
>> > + local function tnew(p)
>> Nit: same issue with naming.
>
>Ditto.
>
>> > + return {
>
><snipped>
>
>See the iterative patch below:
>
>===================================================================
>diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
>index 62b6e0ef..65abfdaa 100644
>--- a/test/tarantool-tests/utils/jit/generators.lua
>+++ b/test/tarantool-tests/utils/jit/generators.lua
>@@ -7,26 +7,26 @@ local function getlast_traceno()
> end
>
> -- Convert addr to positive value if needed.
>-local function canonize_address(addr)
>+local function canonicalize_address(addr)
> if addr < 0 then addr = addr + 2 ^ 32 end
> return addr
> end
>
>--- Need some storage to avoid functions and traces to be
>+-- Some storage is needed to avoid functions and traces being
> -- collected.
> local recfuncs = {}
> local last_i = 0
> -- This function generates a table of functions with heavy mcode
> -- payload with tab arithmetics to fill the mcode area from the
>--- one trace mcode by the some given size. This size is usually
>--- big enough, because we want to check long jump side exits from
>--- some traces.
>--- Assumes, that maxmcode and maxtrace options are set to be sure,
>--- that we can produce such amount of mcode.
>+-- one trace mcode by some given size. This size is usually big
>+-- enough, because we want to check long jump side exits from some
>+-- traces.
>+-- Assumes, that the maxmcode and maxtrace options are set to be
>+-- sure, that we can produce such amount of mcode.
> function M.fillmcode(trace_from, size)
> local mcode, addr_from = jutil.tracemc(trace_from)
>- assert(mcode, 'the #1 argument should be an existed trace number')
>- addr_from = canonize_address(addr_from)
>+ assert(mcode, 'the #1 argument should be an existing trace number')
>+ addr_from = canonicalize_address(addr_from)
> local required_diff = size + #mcode
>
> -- Marker to check that traces are not flushed.
>@@ -34,17 +34,17 @@ function M.fillmcode(trace_from, size)
> local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
>
> local _, last_addr = jutil.tracemc(maxtraceno)
>- last_addr = canonize_address(last_addr)
>+ last_addr = canonicalize_address(last_addr)
>
> -- Addresses of traces may increase or decrease depending on OS,
> -- so use absolute diff.
> while math.abs(last_addr - addr_from) > required_diff do
> last_i = last_i + 1
>- -- This is a quite heavy workload (though it doesn't look like
>+ -- This is quite a heavy workload (though it doesn't look like
> -- one at first). Each load from a table is type guarded. Each
> -- table lookup (for both stores and loads) is guarded for
>- -- table <hmask> value and presence of the metatable. The code
>- -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
>+ -- table <hmask> value and the presence of the metatable. The
>+ -- code below results in ~8Kb of mcode for ARM64 and MIPS64 in
> -- practice.
> local fname = ('fillmcode[%d]'):format(last_i)
> recfuncs[last_i] = assert(loadstring(([[
>@@ -52,8 +52,8 @@ function M.fillmcode(trace_from, size)
> local p = %d
> local tmp = { }
> local dst = { }
>- -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
>- -- in root trace) errors due to hotcount collisions.
>+ -- XXX: use 5 as a stop index to reduce LLEAVE (leaving
>+ -- loop in root trace) errors due to hotcount collisions.
> for i = 1, 5 do
> tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
> tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
>@@ -108,7 +108,7 @@ function M.fillmcode(trace_from, size)
> if not last_addr then
> error(FLUSH_ERR)
> end
>- last_addr = canonize_address(last_addr)
>+ last_addr = canonicalize_address(last_addr)
> end
> end
>
>===================================================================
>
>> > +end
>> > +
>> > +return M
>> > --
>> > 2.41.0
>> Best regards,
>> Maxim Kokryashkin
>> >
>
>--
>Best regards,
>Sergey Kaplun
[-- Attachment #2: Type: text/html, Size: 12464 bytes --]
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
2023-08-15 10:14 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:32 ` Sergey Bronnikov via Tarantool-patches
2023-08-16 15:20 ` Sergey Kaplun via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 14:32 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey
Thanks for the patch!
Sergey
On 8/9/23 18:35, Sergey Kaplun wrote:
<snipped>
> ls/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> index 2afebbb2..414257fd 100644
> --- a/test/tarantool-tests/utils/frontend.lua
> +++ b/test/tarantool-tests/utils/frontend.lua
> @@ -1,6 +1,10 @@
> local M = {}
>
> local bc = require('jit.bc')
> +local jutil = require('jit.util')
> +local vmdef = require('jit.vmdef')
> +local bcnames = vmdef.bcnames
> +local band, rshift = bit.band, bit.rshift
>
> function M.hasbc(f, bytecode)
> assert(type(f) == 'function', 'argument #1 should be a function')
> @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
> return hasbc
> end
>
> +-- Get traceno of the trace assotiated for the given function.
> +function M.gettraceno(func)
> + assert(type(func) == 'function', 'argument #1 should be a function')
> +
> + -- The 0th BC is the header.
> + local func_ins = jutil.funcbc(func, 0)
> + local BC_NAME_LENGTH = 6
> + local RD_SHIFT = 16
Nit: AFAIK we usually leave a comment with the source of constants.
<snipped>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
2023-08-16 14:32 ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-16 15:20 ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:08 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:20 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
On 16.08.23, Sergey Bronnikov wrote:
> Hi, Sergey
>
>
> Thanks for the patch!
>
> Sergey
>
> On 8/9/23 18:35, Sergey Kaplun wrote:
>
> <snipped>
>
>
> > ls/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> > index 2afebbb2..414257fd 100644
> > --- a/test/tarantool-tests/utils/frontend.lua
> > +++ b/test/tarantool-tests/utils/frontend.lua
> > @@ -1,6 +1,10 @@
> > local M = {}
> >
> > local bc = require('jit.bc')
> > +local jutil = require('jit.util')
> > +local vmdef = require('jit.vmdef')
> > +local bcnames = vmdef.bcnames
> > +local band, rshift = bit.band, bit.rshift
> >
> > function M.hasbc(f, bytecode)
> > assert(type(f) == 'function', 'argument #1 should be a function')
> > @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
> > return hasbc
> > end
> >
> > +-- Get traceno of the trace assotiated for the given function.
> > +function M.gettraceno(func)
> > + assert(type(func) == 'function', 'argument #1 should be a function')
> > +
> > + -- The 0th BC is the header.
> > + local func_ins = jutil.funcbc(func, 0)
> > + local BC_NAME_LENGTH = 6
> > + local RD_SHIFT = 16
>
>
> Nit: AFAIK we usually leave a comment with the source of constants.
Unfortunately, there are no real sources for these constants,
but the code is similar to <src/jit/bc.lua>. However, I'm not sure
that it is worth mentioning.
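Just for illustration (my own sketch, not an excerpt from any source),
the constants follow from the bytecode instruction layout and from the
way `vmdef.bcnames` is encoded, roughly mirroring the accessors in
<src/lj_bc.h>:

/* Rough sketch of the 32-bit BCIns layout, cf. <src/lj_bc.h>:
**   |    D    | A  | OP |   (format AD)
**   MSB               LSB
** The opcode lives in the lowest byte and the RD/D operand in the
** upper halfword -- hence the shift by 16 in the Lua helper.
*/
#include <stdint.h>

typedef uint32_t BCIns;

#define bc_op(i)  ((i) & 0xff)  /* Opcode: an index into vmdef.bcnames. */
#define bc_d(i)   ((i) >> 16)   /* RD/D operand of the instruction.     */

/* vmdef.bcnames is a single string with every opcode name padded to
** 6 characters, which is where BC_NAME_LENGTH = 6 comes from.
*/
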
>
> <snipped>
>
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
2023-08-16 15:20 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 16:08 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:08 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Ok, thanks.
LGTM now
On 8/16/23 18:20, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
>
> On 16.08.23, Sergey Bronnikov wrote:
>> Hi, Sergey
>>
>>
>> Thanks for the patch!
>>
>> Sergey
>>
>> On 8/9/23 18:35, Sergey Kaplun wrote:
>>
>> <snipped>
>>
>>
>>> ls/frontend.lua b/test/tarantool-tests/utils/frontend.lua
>>> index 2afebbb2..414257fd 100644
>>> --- a/test/tarantool-tests/utils/frontend.lua
>>> +++ b/test/tarantool-tests/utils/frontend.lua
>>> @@ -1,6 +1,10 @@
>>> local M = {}
>>>
>>> local bc = require('jit.bc')
>>> +local jutil = require('jit.util')
>>> +local vmdef = require('jit.vmdef')
>>> +local bcnames = vmdef.bcnames
>>> +local band, rshift = bit.band, bit.rshift
>>>
>>> function M.hasbc(f, bytecode)
>>> assert(type(f) == 'function', 'argument #1 should be a function')
>>> @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
>>> return hasbc
>>> end
>>>
>>> +-- Get traceno of the trace assotiated for the given function.
>>> +function M.gettraceno(func)
>>> + assert(type(func) == 'function', 'argument #1 should be a function')
>>> +
>>> + -- The 0th BC is the header.
>>> + local func_ins = jutil.funcbc(func, 0)
>>> + local BC_NAME_LENGTH = 6
>>> + local RD_SHIFT = 16
>>
>> Nit: AFAIK we usually leave a comment with the source of constants.
> Unfortunately, there are no real sources for these constants,
> but the code is similar to <src/jit/bc.lua>. However, I'm not sure
> that it is worth mentioning.
>
>> <snipped>
>>
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 11:13 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 15:02 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
` (18 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Contributed by Djordje Kovacevic and Stefan Pejic.
(cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
`asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
check in `asm_sparejump_setup()`, so mcode bottom is not updated.
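(A minimal standalone illustration, not part of the patch itself: both
`MCLink` fields are 8 bytes wide on MIPS64, so the hardcoded offset of
8 can never match there.)

#include <stdio.h>
#include <stdint.h>

typedef uint32_t MCode;   /* A single machine instruction on MIPS. */

typedef struct MCLink {
  MCode *next;            /* Next area: 8 bytes on a 64-bit target. */
  size_t size;            /* Size of current area: 8 bytes as well. */
} MCLink;

int main(void)
{
  /* Prints 16 on MIPS64 (8 on 32-bit targets), so the first spare
  ** jump slot starts at offset sizeof(MCLink), not at the hardcoded 8.
  */
  printf("%zu\n", sizeof(MCLink));
  return 0;
}
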
This patch fixes check of the MCLink offset from the mcbot.
Nevertheless, the emitting of spare jump slots is still incorrect, so
the introduced test still fails due to incorrect iteration through the
sparce table (the last slot is out of mcode range).
This should be fixed via backporting of the commit
dbb78630169a8106b355a5be8af627e98c362f1e ("MIPS: Fix handling of
long-range spare jumps."). But it triggers the new unconditional
assert, that is added in this patch, mentioning that sizemcode is too
bit. So some workaround should be found, when this test will be enabled
for MIPS.
Since test also validates the behaviour of long-range jumps to side
traces for arm64 and x64, and we have no testing for MIPS64 (yet), we
can leave it as is without a skipcond.
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#8825
---
src/lj_asm_mips.h | 9 +--
src/lj_jit.h | 6 ++
src/lj_mcode.c | 6 --
...x-mips64-spare-side-exit-patching.test.lua | 65 +++++++++++++++++++
4 files changed, 76 insertions(+), 10 deletions(-)
create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index 03215821..0e60fc07 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -65,10 +65,9 @@ static Reg ra_alloc2(ASMState *as, IRIns *ir, RegSet allow)
static void asm_sparejump_setup(ASMState *as)
{
MCode *mxp = as->mcbot;
- /* Assumes sizeof(MCLink) == 8. */
- if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == 8) {
+ if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == sizeof(MCLink)) {
lua_assert(MIPSI_NOP == 0);
- memset(mxp+2, 0, MIPS_SPAREJUMP*8);
+ memset(mxp, 0, MIPS_SPAREJUMP*2*sizeof(MCode));
mxp += MIPS_SPAREJUMP*2;
lua_assert(mxp < as->mctop);
lj_mcode_sync(as->mcbot, mxp);
@@ -2486,7 +2485,9 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
if (!cstart) cstart = p-1;
} else { /* Branch out of range. Use spare jump slot in mcarea. */
int i;
- for (i = 2; i < 2+MIPS_SPAREJUMP*2; i += 2) {
+ for (i = (int)(sizeof(MCLink)/sizeof(MCode));
+ i < (int)(sizeof(MCLink)/sizeof(MCode)+MIPS_SPAREJUMP*2);
+ i += 2) {
if (mcarea[i] == tjump) {
delta = mcarea+i - p;
goto patchbranch;
diff --git a/src/lj_jit.h b/src/lj_jit.h
index f2ad3c6e..cc8efd20 100644
--- a/src/lj_jit.h
+++ b/src/lj_jit.h
@@ -158,6 +158,12 @@ typedef uint8_t MCode;
typedef uint32_t MCode;
#endif
+/* Linked list of MCode areas. */
+typedef struct MCLink {
+ MCode *next; /* Next area. */
+ size_t size; /* Size of current area. */
+} MCLink;
+
/* Stack snapshot header. */
typedef struct SnapShot {
uint32_t mapofs; /* Offset into snapshot map. */
diff --git a/src/lj_mcode.c b/src/lj_mcode.c
index 7184d3b4..c6361018 100644
--- a/src/lj_mcode.c
+++ b/src/lj_mcode.c
@@ -272,12 +272,6 @@ static void *mcode_alloc(jit_State *J, size_t sz)
/* -- MCode area management ----------------------------------------------- */
-/* Linked list of MCode areas. */
-typedef struct MCLink {
- MCode *next; /* Next area. */
- size_t size; /* Size of current area. */
-} MCLink;
-
/* Allocate a new MCode area. */
static void mcode_allocarea(jit_State *J)
{
diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
new file mode 100644
index 00000000..fdc826cb
--- /dev/null
+++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
@@ -0,0 +1,65 @@
+local tap = require('tap')
+local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
+ ['Test requires JIT enabled'] = not jit.status(),
+ ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
+ -- Need to fix the MIPS behaviour first.
+ ['Disabled for MIPS architectures'] = jit.arch:match('mips'),
+})
+
+local generators = require('utils').jit.generators
+local frontend = require('utils').frontend
+
+test:plan(1)
+
+-- Make compiler work hard.
+jit.opt.start(
+ -- No optimizations at all to produce more mcode.
+ 0,
+ -- Try to compile all compiled paths as early as JIT can.
+ 'hotloop=1',
+ 'hotexit=1',
+ -- Allow to use 2000 traces to avoid flushes.
+ 'maxtrace=2000',
+ -- Allow to compile 8Mb of mcode to be sure the issue occurs.
+ 'maxmcode=8192',
+ -- Use big mcode area for traces to avoid using different
+ -- spare slots.
+ 'sizemcode=256'
+)
+
+local MAX_SPARE_SLOT = 4
+local function parent(marker)
+ -- Use several side exit to fill spare exit space (default is
+ -- 4 slots, each slot has 2 instructions -- jump and nop).
+ -- luacheck: ignore
+ if marker > MAX_SPARE_SLOT then end
+ if marker > 3 then end
+ if marker > 2 then end
+ if marker > 1 then end
+ if marker > 0 then end
+ -- XXX: use `fmod()` to avoid leaving the function and use
+ -- stitching here.
+ return math.fmod(1, 1)
+end
+
+-- Compile parent trace first.
+parent(0)
+parent(0)
+
+local parent_traceno = frontend.gettraceno(parent)
+local last_traceno = parent_traceno
+
+-- Now generate some mcode to forcify long jump with a spare slot.
+-- Each iteration provide different addresses and uses a different
+-- spare slot. After it compile and execute new side trace.
+for i = 1, MAX_SPARE_SLOT + 1 do
+ generators.fillmcode(last_traceno, 1024 * 1024)
+ parent(i)
+ parent(i)
+ parent(i)
+ last_traceno = misc.getmetrics().jit_trace_num
+end
+
+test:ok(true, 'all traces executed correctly')
+
+test:done(true)
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:13 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:05 ` Sergey Kaplun via Tarantool-patches
2023-08-16 15:02 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:13 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:52PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic.
>
> (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
>
> `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
Typo: s/to incorrect/to an incorrect/
> check in `asm_sparejump_setup()`, so mcode bottom is not updated.
Typo: s/so mcode/so the mcode/
>
> This patch fixes check of the MCLink offset from the mcbot.
Typo: s/fixes check/fixes the check/
> Nevertheless, the emitting of spare jump slots is still incorrect, so
> the introduced test still fails due to incorrect iteration through the
Typo: s/due to/due to the/
> sparce table (the last slot is out of mcode range).
>
> This should be fixed via backporting of the commit
> dbb78630169a8106b355a5be8af627e98c362f1e ("MIPS: Fix handling of
> long-range spare jumps."). But it triggers the new unconditional
> assert, that is added in this patch, mentioning that sizemcode is too
> bit. So some workaround should be found, when this test will be enabled
Typo: s/bit/big/
Typo: s/will be/is/
> for MIPS.
>
> Since test also validates the behaviour of long-range jumps to side
> traces for arm64 and x64, and we have no testing for MIPS64 (yet), we
> can leave it as is without a skipcond.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_asm_mips.h | 9 +--
> src/lj_jit.h | 6 ++
> src/lj_mcode.c | 6 --
> ...x-mips64-spare-side-exit-patching.test.lua | 65 +++++++++++++++++++
> 4 files changed, 76 insertions(+), 10 deletions(-)
> create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 03215821..0e60fc07 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -65,10 +65,9 @@ static Reg ra_alloc2(ASMState *as, IRIns *ir, RegSet allow)
> static void asm_sparejump_setup(ASMState *as)
> {
> MCode *mxp = as->mcbot;
> - /* Assumes sizeof(MCLink) == 8. */
> - if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == 8) {
> + if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == sizeof(MCLink)) {
> lua_assert(MIPSI_NOP == 0);
> - memset(mxp+2, 0, MIPS_SPAREJUMP*8);
> + memset(mxp, 0, MIPS_SPAREJUMP*2*sizeof(MCode));
> mxp += MIPS_SPAREJUMP*2;
> lua_assert(mxp < as->mctop);
> lj_mcode_sync(as->mcbot, mxp);
> @@ -2486,7 +2485,9 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
> if (!cstart) cstart = p-1;
> } else { /* Branch out of range. Use spare jump slot in mcarea. */
> int i;
> - for (i = 2; i < 2+MIPS_SPAREJUMP*2; i += 2) {
> + for (i = (int)(sizeof(MCLink)/sizeof(MCode));
> + i < (int)(sizeof(MCLink)/sizeof(MCode)+MIPS_SPAREJUMP*2);
> + i += 2) {
> if (mcarea[i] == tjump) {
> delta = mcarea+i - p;
> goto patchbranch;
> diff --git a/src/lj_jit.h b/src/lj_jit.h
> index f2ad3c6e..cc8efd20 100644
> --- a/src/lj_jit.h
> +++ b/src/lj_jit.h
> @@ -158,6 +158,12 @@ typedef uint8_t MCode;
> typedef uint32_t MCode;
> #endif
>
> +/* Linked list of MCode areas. */
> +typedef struct MCLink {
> + MCode *next; /* Next area. */
> + size_t size; /* Size of current area. */
> +} MCLink;
> +
> /* Stack snapshot header. */
> typedef struct SnapShot {
> uint32_t mapofs; /* Offset into snapshot map. */
> diff --git a/src/lj_mcode.c b/src/lj_mcode.c
> index 7184d3b4..c6361018 100644
> --- a/src/lj_mcode.c
> +++ b/src/lj_mcode.c
> @@ -272,12 +272,6 @@ static void *mcode_alloc(jit_State *J, size_t sz)
>
> /* -- MCode area management ----------------------------------------------- */
>
> -/* Linked list of MCode areas. */
> -typedef struct MCLink {
> - MCode *next; /* Next area. */
> - size_t size; /* Size of current area. */
> -} MCLink;
> -
> /* Allocate a new MCode area. */
> static void mcode_allocarea(jit_State *J)
> {
> diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> new file mode 100644
> index 00000000..fdc826cb
> --- /dev/null
> +++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> @@ -0,0 +1,65 @@
> +local tap = require('tap')
> +local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
> + ['Test requires JIT enabled'] = not jit.status(),
> + ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
> + -- Need to fix the MIPS behaviour first.
Typo: s/Need to/We need to/
> + ['Disabled for MIPS architectures'] = jit.arch:match('mips'),
> +})
> +
> +local generators = require('utils').jit.generators
> +local frontend = require('utils').frontend
> +
> +test:plan(1)
> +
> +-- Make compiler work hard.
> +jit.opt.start(
> + -- No optimizations at all to produce more mcode.
> + 0,
> + -- Try to compile all compiled paths as early as JIT can.
> + 'hotloop=1',
> + 'hotexit=1',
> + -- Allow to use 2000 traces to avoid flushes.
Typo: s/to use/compilation of up to/
> + 'maxtrace=2000',
> + -- Allow to compile 8Mb of mcode to be sure the issue occurs.
Typo: s/to compile/compilation of up to/
> + 'maxmcode=8192',
> + -- Use big mcode area for traces to avoid using different
Typo: s/using/usage of/
> + -- spare slots.
> + 'sizemcode=256'
> +)
> +
> +local MAX_SPARE_SLOT = 4
A link to the definition in `lj_asm_mips.h` would be nice to have.
> +local function parent(marker)
> + -- Use several side exit to fill spare exit space (default is
Typo: s/side exit/side exits/
> + -- 4 slots, each slot has 2 instructions -- jump and nop).
> + -- luacheck: ignore
> + if marker > MAX_SPARE_SLOT then end
> + if marker > 3 then end
> + if marker > 2 then end
> + if marker > 1 then end
> + if marker > 0 then end
> + -- XXX: use `fmod()` to avoid leaving the function and use
> + -- stitching here.
> + return math.fmod(1, 1)
> +end
> +
> +-- Compile parent trace first.
> +parent(0)
> +parent(0)
> +
> +local parent_traceno = frontend.gettraceno(parent)
> +local last_traceno = parent_traceno
> +
> +-- Now generate some mcode to forcify long jump with a spare slot.
> +-- Each iteration provide different addresses and uses a different
Typo: s/provide/provides/
> +-- spare slot. After it compile and execute new side trace.
Typo: s/After it compile and execute/After that, compiles and executes a/
> +for i = 1, MAX_SPARE_SLOT + 1 do
> + generators.fillmcode(last_traceno, 1024 * 1024)
> + parent(i)
> + parent(i)
> + parent(i)
> + last_traceno = misc.getmetrics().jit_trace_num
> +end
> +
> +test:ok(true, 'all traces executed correctly')
> +
> +test:done(true)
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
2023-08-15 11:13 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:05 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:05 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:52PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Djordje Kovacevic and Stefan Pejic.
> >
> > (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
> >
> > `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> > is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
> Typo: s/to incorrect/to an incorrect/
Fixed.
> > check in `asm_sparejump_setup()`, so mcode bottom is not updated.
> Typo: s/so mcode/so the mcode/
Fixed.
> >
> > This patch fixes check of the MCLink offset from the mcbot.
> Typo: s/fixes check/fixes the check/
Fixed.
> > Nevertheless, the emitting of spare jump slots is still incorrect, so
> > the introduced test still fails due to incorrect iteration through the
> Typo: s/due to/due to the/
Fixed.
> > sparce table (the last slot is out of mcode range).
> >
> > This should be fixed via backporting of the commit
> > dbb78630169a8106b355a5be8af627e98c362f1e ("MIPS: Fix handling of
> > long-range spare jumps."). But it triggers the new unconditional
> > assert, that is added in this patch, mentioning that sizemcode is too
> > bit. So some workaround should be found, when this test will be enabled
> Typo: s/bit/big/
> Typo: s/will be/is/
Fixed, thanks!
> > for MIPS.
> >
> > Since test also validates the behaviour of long-range jumps to side
> > traces for arm64 and x64, and we have no testing for MIPS64 (yet), we
> > can leave it as is without a skipcond.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#8825
> > ---
> > src/lj_asm_mips.h | 9 +--
> > src/lj_jit.h | 6 ++
> > src/lj_mcode.c | 6 --
> > ...x-mips64-spare-side-exit-patching.test.lua | 65 +++++++++++++++++++
> > 4 files changed, 76 insertions(+), 10 deletions(-)
> > create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> >
<snipped>
> > diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> > new file mode 100644
> > index 00000000..fdc826cb
> > --- /dev/null
> > +++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> > @@ -0,0 +1,65 @@
> > +local tap = require('tap')
> > +local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
> > + ['Test requires JIT enabled'] = not jit.status(),
> > + ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
> > + -- Need to fix the MIPS behaviour first.
> Typo: s/Need to/We need to/
Fixed.
> > + ['Disabled for MIPS architectures'] = jit.arch:match('mips'),
<snipped>
> > + -- Allow to use 2000 traces to avoid flushes.
> Typo: s/to use/compilation of up to/
Fixed.
> > + 'maxtrace=2000',
> > + -- Allow to compile 8Mb of mcode to be sure the issue occurs.
> Typo: s/to compile/compilation of up to/
Fixed.
> > + 'maxmcode=8192',
> > + -- Use big mcode area for traces to avoid using different
> Typo: s/using/usage of/
Fixed.
> > + -- spare slots.
> > + 'sizemcode=256'
> > +)
> > +
> > +local MAX_SPARE_SLOT = 4
> A link to the definition in `lj_asm_mips.h` would be nice to have.
Added.
>
> > +local function parent(marker)
> > + -- Use several side exit to fill spare exit space (default is
> Typo: s/side exit/side exits/
Fixed, thanks!
> > + -- 4 slots, each slot has 2 instructions -- jump and nop).
> > + -- luacheck: ignore
> > + if marker > MAX_SPARE_SLOT then end
> > + if marker > 3 then end
> > + if marker > 2 then end
> > + if marker > 1 then end
> > + if marker > 0 then end
> > + -- XXX: use `fmod()` to avoid leaving the function and use
> > + -- stitching here.
> > + return math.fmod(1, 1)
> > +end
> > +
> > +-- Compile parent trace first.
> > +parent(0)
> > +parent(0)
> > +
> > +local parent_traceno = frontend.gettraceno(parent)
> > +local last_traceno = parent_traceno
> > +
> > +-- Now generate some mcode to forcify long jump with a spare slot.
> > +-- Each iteration provide different addresses and uses a different
> Typo: s/provide/provides/
Fixed, thanks!
> > +-- spare slot. After it compile and execute new side trace.
> Typo: s/After it compile and execute/After that, compiles and executes a/
Fixed.
See the iterative patch below.
===================================================================
diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
index fdc826cb..62933df9 100644
--- a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
+++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
@@ -2,7 +2,7 @@ local tap = require('tap')
local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
['Test requires JIT enabled'] = not jit.status(),
['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
- -- Need to fix the MIPS behaviour first.
+ -- We need to fix the MIPS behaviour first.
['Disabled for MIPS architectures'] = jit.arch:match('mips'),
})
@@ -18,18 +18,19 @@ jit.opt.start(
-- Try to compile all compiled paths as early as JIT can.
'hotloop=1',
'hotexit=1',
- -- Allow to use 2000 traces to avoid flushes.
+ -- Allow compilation of up to 2000 traces to avoid flushes.
'maxtrace=2000',
-- Allow to compile 8Mb of mcode to be sure the issue occurs.
'maxmcode=8192',
- -- Use big mcode area for traces to avoid using different
+ -- Use big mcode area for traces to avoid usage of different
-- spare slots.
'sizemcode=256'
)
+-- See the define in the <src/lj_asm_mips.h>.
local MAX_SPARE_SLOT = 4
local function parent(marker)
- -- Use several side exit to fill spare exit space (default is
+ -- Use several side exits to fill spare exit space (default is
-- 4 slots, each slot has 2 instructions -- jump and nop).
-- luacheck: ignore
if marker > MAX_SPARE_SLOT then end
@@ -50,8 +51,9 @@ local parent_traceno = frontend.gettraceno(parent)
local last_traceno = parent_traceno
-- Now generate some mcode to forcify long jump with a spare slot.
--- Each iteration provide different addresses and uses a different
--- spare slot. After it compile and execute new side trace.
+-- Each iteration provides different addresses and uses a
+-- different spare slot. After that, compiles and executes a new
+-- side trace.
for i = 1, MAX_SPARE_SLOT + 1 do
generators.fillmcode(last_traceno, 1024 * 1024)
parent(i)
===================================================================
<snipped>
> > 2.41.0
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
2023-08-15 11:13 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 15:02 ` Sergey Bronnikov via Tarantool-patches
2023-08-16 15:32 ` Sergey Kaplun via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 15:02 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey
thanks for the patch!
The test passed after reverting the patch, and I suspect it is expected
because the behaviour was broken only for MIPS, right?
See a minor comment below.
On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic.
>
> (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
>
> `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
> check in `asm_sparejump_setup()`, so mcode bottom is not updated.
>
> This patch fixes check of the MCLink offset from the mcbot.
> Nevertheless, the emitting of spare jump slots is still incorrect, so
> the introduced test still fails due to incorrect iteration through the
> sparce table (the last slot is out of mcode range).
"sparce" -> "sparse"?
<snipped >
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
2023-08-16 15:02 ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-16 15:32 ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:08 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:32 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
On 16.08.23, Sergey Bronnikov wrote:
> Hi, Sergey
>
>
> thanks for the patch!
>
> The test passed after reverting the patch, and I suspect it is expected
>
> because the behaviour was broken only for MIPS, right?
Yes, it's true.
>
> See a minor comment below.
>
>
> On 8/9/23 18:35, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Djordje Kovacevic and Stefan Pejic.
> >
> > (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
> >
> > `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> > is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
> > check in `asm_sparejump_setup()`, so mcode bottom is not updated.
> >
> > This patch fixes check of the MCLink offset from the mcbot.
> > Nevertheless, the emitting of spare jump slots is still incorrect, so
> > the introduced test still fails due to incorrect iteration through the
> > sparce table (the last slot is out of mcode range).
> "sparce" -> "sparse"?
Changed to the "spare slots".
> <snipped >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
2023-08-16 15:32 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 16:08 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:08 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Thanks! LGTM now
On 8/16/23 18:32, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
>
> On 16.08.23, Sergey Bronnikov wrote:
>> Hi, Sergey
>>
>>
>> thanks for the patch!
>>
>> The test passed after reverting the patch, and I suspect it is expected
>>
>> because the behaviour was broken only for MIPS, right?
> Yes, it's true.
>
>> See a minor comment below.
>>
>>
>> On 8/9/23 18:35, Sergey Kaplun wrote:
>>> From: Mike Pall <mike>
>>>
>>> Contributed by Djordje Kovacevic and Stefan Pejic.
>>>
>>> (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
>>>
>>> `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
>>> is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
>>> check in `asm_sparejump_setup()`, so mcode bottom is not updated.
>>>
>>> This patch fixes check of the MCLink offset from the mcbot.
>>> Nevertheless, the emitting of spare jump slots is still incorrect, so
>>> the introduced test still fails due to incorrect iteration through the
>>> sparce table (the last slot is out of mcode range).
>> "sparce" -> "sparse"?
> Changed to the "spare slots".
>
>> <snipped >
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (2 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 11:27 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 16:07 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
` (17 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
Sponsored by Cisco Systems, Inc.
(cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)
The software floating point library is used on machines which do not
have hardware support for floating point [1]. This patch enables
support for such machines in JIT compiler backend for MIPS64.
This includes:
* `vm_tointg()` helper is added in <src/vm_mips64.dasm> to convert FP
number to integer with a check for the soft-float support (called from
JIT).
* `sfmin/max()` helpers are added in <src/vm_mips64.dasm> for min/max
operations with a check for the soft-float support (called from JIT).
* The `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS
instead of `LJ_SOFTFP` (see the sketch after this list).
* All fp-depending paths are instrumented with `LJ_SOFTFP` or
`LJ_SOFTFP32` macro.
* The corresponding function calls in <src/lj_ircall.h> are marked as
`XA_FP32`, `XA2_FP32`, i.e. as required extra arguments on the stack
for soft-FP on 32-bit MIPS.
[1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
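For illustration only (this sketch is not part of the patch), the
relation between the two macros and the typical guard pattern in the
backend; the `#define`s below are taken from the <src/lj_arch.h> hunk
of this patch, while the `#if` chain is just an example of their use:

#define LJ_SOFTFP   (!LJ_ARCH_HASFPU)    /* Any soft-float target.      */
#define LJ_SOFTFP32 (LJ_SOFTFP && LJ_32) /* Soft-float on 32-bit only.  */

/* Typical guard pattern after this patch: */
#if LJ_SOFTFP32
  /* 32-bit soft-float: FP IR is split into HIOP pairs by the SPLIT pass. */
#elif LJ_SOFTFP
  /* 64-bit soft-float (MIPS64): FP IR is lowered to soft-float calls. */
#else
  /* Hard-float: regular FPU instructions are emitted. */
#endif
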
Sergey Kaplun:
* added the description for the feature
Part of tarantool/tarantool#8825
---
src/lj_arch.h | 4 +-
src/lj_asm.c | 8 +-
src/lj_asm_mips.h | 217 +++++++++++++++++++++++++++++++++++++--------
src/lj_crecord.c | 4 +-
src/lj_emit_mips.h | 2 +
src/lj_ffrecord.c | 2 +-
src/lj_ircall.h | 43 ++++++---
src/lj_iropt.h | 2 +-
src/lj_jit.h | 4 +-
src/lj_obj.h | 3 +
src/lj_opt_split.c | 2 +-
src/lj_snap.c | 21 +++--
src/vm_mips64.dasc | 49 ++++++++++
13 files changed, 286 insertions(+), 75 deletions(-)
diff --git a/src/lj_arch.h b/src/lj_arch.h
index 5276ae56..c39526ea 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -349,9 +349,6 @@
#define LJ_ARCH_BITS 32
#define LJ_TARGET_MIPS32 1
#else
-#if LJ_ABI_SOFTFP || !LJ_ARCH_HASFPU
-#define LJ_ARCH_NOJIT 1 /* NYI */
-#endif
#define LJ_ARCH_BITS 64
#define LJ_TARGET_MIPS64 1
#define LJ_TARGET_GC64 1
@@ -528,6 +525,7 @@
#define LJ_ABI_SOFTFP 0
#endif
#define LJ_SOFTFP (!LJ_ARCH_HASFPU)
+#define LJ_SOFTFP32 (LJ_SOFTFP && LJ_32)
#if LJ_ARCH_ENDIAN == LUAJIT_BE
#define LJ_LE 0
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 0bfa44ed..15de7e33 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -341,7 +341,7 @@ static Reg ra_rematk(ASMState *as, IRRef ref)
ra_modified(as, r);
ir->r = RID_INIT; /* Do not keep any hint. */
RA_DBGX((as, "remat $i $r", ir, r));
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
if (ir->o == IR_KNUM) {
emit_loadk64(as, r, ir);
} else
@@ -1356,7 +1356,7 @@ static void asm_call(ASMState *as, IRIns *ir)
asm_gencall(as, ci, args);
}
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
static void asm_fppow(ASMState *as, IRIns *ir, IRRef lref, IRRef rref)
{
const CCallInfo *ci = &lj_ir_callinfo[IRCALL_pow];
@@ -1703,10 +1703,10 @@ static void asm_ir(ASMState *as, IRIns *ir)
case IR_MUL: asm_mul(as, ir); break;
case IR_MOD: asm_mod(as, ir); break;
case IR_NEG: asm_neg(as, ir); break;
-#if LJ_SOFTFP
+#if LJ_SOFTFP32
case IR_DIV: case IR_POW: case IR_ABS:
case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
- lua_assert(0); /* Unused for LJ_SOFTFP. */
+ lua_assert(0); /* Unused for LJ_SOFTFP32. */
break;
#else
case IR_DIV: asm_div(as, ir); break;
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index 0e60fc07..a26a82cd 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -290,7 +290,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
{
ra_leftov(as, gpr, ref);
gpr++;
-#if LJ_64
+#if LJ_64 && !LJ_SOFTFP
fpr++;
#endif
}
@@ -301,7 +301,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
emit_spstore(as, ir, r, ofs);
ofs += irt_isnum(ir->t) ? 8 : 4;
#else
- emit_spstore(as, ir, r, ofs + ((LJ_BE && (LJ_SOFTFP || r < RID_MAX_GPR) && !irt_is64(ir->t)) ? 4 : 0));
+ emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isfp(ir->t) && !irt_is64(ir->t)) ? 4 : 0));
ofs += 8;
#endif
}
@@ -312,7 +312,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
#endif
if (gpr <= REGARG_LASTGPR) {
gpr++;
-#if LJ_64
+#if LJ_64 && !LJ_SOFTFP
fpr++;
#endif
} else {
@@ -461,12 +461,36 @@ static void asm_tobit(ASMState *as, IRIns *ir)
emit_tg(as, MIPSI_MFC1, dest, tmp);
emit_fgh(as, MIPSI_ADD_D, tmp, left, right);
}
+#elif LJ_64 /* && LJ_SOFTFP */
+static void asm_tointg(ASMState *as, IRIns *ir, Reg r)
+{
+ /* The modified regs must match with the *.dasc implementation. */
+ RegSet drop = RID2RSET(REGARG_FIRSTGPR)|RID2RSET(RID_RET)|RID2RSET(RID_RET+1)|
+ RID2RSET(RID_R1)|RID2RSET(RID_R12);
+ if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
+ ra_evictset(as, drop);
+ /* Return values are in RID_RET (converted value) and RID_RET+1 (status). */
+ ra_destreg(as, ir, RID_RET);
+ asm_guard(as, MIPSI_BNE, RID_RET+1, RID_ZERO);
+ emit_call(as, (void *)lj_ir_callinfo[IRCALL_lj_vm_tointg].func, 0);
+ if (r == RID_NONE)
+ ra_leftov(as, REGARG_FIRSTGPR, ir->op1);
+ else if (r != REGARG_FIRSTGPR)
+ emit_move(as, REGARG_FIRSTGPR, r);
+}
+
+static void asm_tobit(ASMState *as, IRIns *ir)
+{
+ Reg dest = ra_dest(as, ir, RSET_GPR);
+ emit_dta(as, MIPSI_SLL, dest, dest, 0);
+ asm_callid(as, ir, IRCALL_lj_vm_tobit);
+}
#endif
static void asm_conv(ASMState *as, IRIns *ir)
{
IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
int stfp = (st == IRT_NUM || st == IRT_FLOAT);
#endif
#if LJ_64
@@ -477,12 +501,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
lua_assert(!(irt_isint64(ir->t) ||
(st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
#endif
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP32
/* FP conversions are handled by SPLIT. */
lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
/* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
#else
lua_assert(irt_type(ir->t) != st);
+#if !LJ_SOFTFP
if (irt_isfp(ir->t)) {
Reg dest = ra_dest(as, ir, RSET_FPR);
if (stfp) { /* FP to FP conversion. */
@@ -608,6 +633,42 @@ static void asm_conv(ASMState *as, IRIns *ir)
}
}
} else
+#else
+ if (irt_isfp(ir->t)) {
+#if LJ_64 && LJ_HASFFI
+ if (stfp) { /* FP to FP conversion. */
+ asm_callid(as, ir, irt_isnum(ir->t) ? IRCALL_softfp_f2d :
+ IRCALL_softfp_d2f);
+ } else { /* Integer to FP conversion. */
+ IRCallID cid = ((IRT_IS64 >> st) & 1) ?
+ (irt_isnum(ir->t) ?
+ (st == IRT_I64 ? IRCALL_fp64_l2d : IRCALL_fp64_ul2d) :
+ (st == IRT_I64 ? IRCALL_fp64_l2f : IRCALL_fp64_ul2f)) :
+ (irt_isnum(ir->t) ?
+ (st == IRT_INT ? IRCALL_softfp_i2d : IRCALL_softfp_ui2d) :
+ (st == IRT_INT ? IRCALL_softfp_i2f : IRCALL_softfp_ui2f));
+ asm_callid(as, ir, cid);
+ }
+#else
+ asm_callid(as, ir, IRCALL_softfp_i2d);
+#endif
+ } else if (stfp) { /* FP to integer conversion. */
+ if (irt_isguard(ir->t)) {
+ /* Checked conversions are only supported from number to int. */
+ lua_assert(irt_isint(ir->t) && st == IRT_NUM);
+ asm_tointg(as, ir, RID_NONE);
+ } else {
+ IRCallID cid = irt_is64(ir->t) ?
+ ((st == IRT_NUM) ?
+ (irt_isi64(ir->t) ? IRCALL_fp64_d2l : IRCALL_fp64_d2ul) :
+ (irt_isi64(ir->t) ? IRCALL_fp64_f2l : IRCALL_fp64_f2ul)) :
+ ((st == IRT_NUM) ?
+ (irt_isint(ir->t) ? IRCALL_softfp_d2i : IRCALL_softfp_d2ui) :
+ (irt_isint(ir->t) ? IRCALL_softfp_f2i : IRCALL_softfp_f2ui));
+ asm_callid(as, ir, cid);
+ }
+ } else
+#endif
#endif
{
Reg dest = ra_dest(as, ir, RSET_GPR);
@@ -665,7 +726,7 @@ static void asm_strto(ASMState *as, IRIns *ir)
const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
IRRef args[2];
int32_t ofs = 0;
-#if LJ_SOFTFP
+#if LJ_SOFTFP32
ra_evictset(as, RSET_SCRATCH);
if (ra_used(ir)) {
if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
@@ -806,7 +867,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
MCLabel l_end, l_loop, l_next;
rset_clear(allow, tab);
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP32
if (!isk) {
key = ra_alloc1(as, refkey, allow);
rset_clear(allow, key);
@@ -826,7 +887,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
}
}
#else
- if (irt_isnum(kt)) {
+ if (!LJ_SOFTFP && irt_isnum(kt)) {
key = ra_alloc1(as, refkey, RSET_FPR);
tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
} else if (!irt_ispri(kt)) {
@@ -882,6 +943,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
+ } else if (LJ_SOFTFP && irt_isnum(kt)) {
+ emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
+ emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
} else if (irt_isaddr(kt)) {
Reg refk = tmp2;
if (isk) {
@@ -960,7 +1024,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
emit_dta(as, MIPSI_ROTR, dest, tmp1, (-HASH_ROT1)&31);
if (irt_isnum(kt)) {
emit_dst(as, MIPSI_ADDU, tmp1, tmp1, tmp1);
- emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 0);
+ emit_dta(as, MIPSI_DSRA32, tmp1, LJ_SOFTFP ? key : tmp1, 0);
emit_dta(as, MIPSI_SLL, tmp2, LJ_SOFTFP ? key : tmp1, 0);
#if !LJ_SOFTFP
emit_tg(as, MIPSI_DMFC1, tmp1, key);
@@ -1123,7 +1187,7 @@ static MIPSIns asm_fxloadins(IRIns *ir)
case IRT_U8: return MIPSI_LBU;
case IRT_I16: return MIPSI_LH;
case IRT_U16: return MIPSI_LHU;
- case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_LDC1;
+ case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
}
@@ -1134,7 +1198,7 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
switch (irt_type(ir->t)) {
case IRT_I8: case IRT_U8: return MIPSI_SB;
case IRT_I16: case IRT_U16: return MIPSI_SH;
- case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_SDC1;
+ case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
}
@@ -1199,7 +1263,7 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
static void asm_ahuvload(ASMState *as, IRIns *ir)
{
- int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
+ int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
Reg dest = RID_NONE, type = RID_TMP, idx;
RegSet allow = RSET_GPR;
int32_t ofs = 0;
@@ -1212,7 +1276,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
}
}
if (ra_used(ir)) {
- lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
+ lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
irt_isint(ir->t) || irt_isaddr(ir->t));
dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
rset_clear(allow, dest);
@@ -1261,10 +1325,10 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
int32_t ofs = 0;
if (ir->r == RID_SINK)
return;
- if (!LJ_SOFTFP && irt_isnum(ir->t)) {
- src = ra_alloc1(as, ir->op2, RSET_FPR);
+ if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+ src = ra_alloc1(as, ir->op2, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
- emit_hsi(as, MIPSI_SDC1, src, idx, ofs);
+ emit_hsi(as, LJ_SOFTFP ? MIPSI_SD : MIPSI_SDC1, src, idx, ofs);
} else {
#if LJ_32
if (!irt_ispri(ir->t)) {
@@ -1312,7 +1376,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
IRType1 t = ir->t;
#if LJ_32
int32_t ofs = 8*((int32_t)ir->op1-1) + ((ir->op2 & IRSLOAD_FRAME) ? 4 : 0);
- int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
+ int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
if (hiop)
t.irt = IRT_NUM;
#else
@@ -1320,7 +1384,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
#endif
lua_assert(!(ir->op2 & IRSLOAD_PARENT)); /* Handled by asm_head_side(). */
lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP32
lua_assert(!(ir->op2 & IRSLOAD_CONVERT)); /* Handled by LJ_SOFTFP SPLIT. */
if (hiop && ra_used(ir+1)) {
type = ra_dest(as, ir+1, allow);
@@ -1328,29 +1392,44 @@ static void asm_sload(ASMState *as, IRIns *ir)
}
#else
if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
- dest = ra_scratch(as, RSET_FPR);
+ dest = ra_scratch(as, LJ_SOFTFP ? allow : RSET_FPR);
asm_tointg(as, ir, dest);
t.irt = IRT_NUM; /* Continue with a regular number type check. */
} else
#endif
if (ra_used(ir)) {
- lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
+ lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
irt_isint(ir->t) || irt_isaddr(ir->t));
dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
rset_clear(allow, dest);
base = ra_alloc1(as, REF_BASE, allow);
rset_clear(allow, base);
- if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
+ if (!LJ_SOFTFP32 && (ir->op2 & IRSLOAD_CONVERT)) {
if (irt_isint(t)) {
- Reg tmp = ra_scratch(as, RSET_FPR);
+ Reg tmp = ra_scratch(as, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
+#if LJ_SOFTFP
+ ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
+ ra_destreg(as, ir, RID_RET);
+ emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_d2i].func, 0);
+ if (tmp != REGARG_FIRSTGPR)
+ emit_move(as, REGARG_FIRSTGPR, tmp);
+#else
emit_tg(as, MIPSI_MFC1, dest, tmp);
emit_fg(as, MIPSI_TRUNC_W_D, tmp, tmp);
+#endif
dest = tmp;
t.irt = IRT_NUM; /* Check for original type. */
} else {
Reg tmp = ra_scratch(as, RSET_GPR);
+#if LJ_SOFTFP
+ ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
+ ra_destreg(as, ir, RID_RET);
+ emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_i2d].func, 0);
+ emit_dta(as, MIPSI_SLL, REGARG_FIRSTGPR, tmp, 0);
+#else
emit_fg(as, MIPSI_CVT_D_W, dest, dest);
emit_tg(as, MIPSI_MTC1, tmp, dest);
+#endif
dest = tmp;
t.irt = IRT_INT; /* Check for original type. */
}
@@ -1399,7 +1478,7 @@ dotypecheck:
if (irt_isnum(t)) {
asm_guard(as, MIPSI_BEQ, RID_TMP, RID_ZERO);
emit_tsi(as, MIPSI_SLTIU, RID_TMP, RID_TMP, (int32_t)LJ_TISNUM);
- if (ra_hasreg(dest))
+ if (!LJ_SOFTFP && ra_hasreg(dest))
emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
} else {
asm_guard(as, MIPSI_BNE, RID_TMP,
@@ -1409,7 +1488,7 @@ dotypecheck:
}
emit_tsi(as, MIPSI_LD, type, base, ofs);
} else if (ra_hasreg(dest)) {
- if (irt_isnum(t))
+ if (!LJ_SOFTFP && irt_isnum(t))
emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
else
emit_tsi(as, irt_isint(t) ? MIPSI_LW : MIPSI_LD, dest, base,
@@ -1554,26 +1633,40 @@ static void asm_fpunary(ASMState *as, IRIns *ir, MIPSIns mi)
Reg left = ra_hintalloc(as, ir->op1, dest, RSET_FPR);
emit_fg(as, mi, dest, left);
}
+#endif
+#if !LJ_SOFTFP32
static void asm_fpmath(ASMState *as, IRIns *ir)
{
if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir))
return;
+#if !LJ_SOFTFP
if (ir->op2 <= IRFPM_TRUNC)
asm_callround(as, ir, IRCALL_lj_vm_floor + ir->op2);
else if (ir->op2 == IRFPM_SQRT)
asm_fpunary(as, ir, MIPSI_SQRT_D);
else
+#endif
asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
}
#endif
+#if !LJ_SOFTFP
+#define asm_fpadd(as, ir) asm_fparith(as, ir, MIPSI_ADD_D)
+#define asm_fpsub(as, ir) asm_fparith(as, ir, MIPSI_SUB_D)
+#define asm_fpmul(as, ir) asm_fparith(as, ir, MIPSI_MUL_D)
+#elif LJ_64 /* && LJ_SOFTFP */
+#define asm_fpadd(as, ir) asm_callid(as, ir, IRCALL_softfp_add)
+#define asm_fpsub(as, ir) asm_callid(as, ir, IRCALL_softfp_sub)
+#define asm_fpmul(as, ir) asm_callid(as, ir, IRCALL_softfp_mul)
+#endif
+
static void asm_add(ASMState *as, IRIns *ir)
{
IRType1 t = ir->t;
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
if (irt_isnum(t)) {
- asm_fparith(as, ir, MIPSI_ADD_D);
+ asm_fpadd(as, ir);
} else
#endif
{
@@ -1595,9 +1688,9 @@ static void asm_add(ASMState *as, IRIns *ir)
static void asm_sub(ASMState *as, IRIns *ir)
{
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
if (irt_isnum(ir->t)) {
- asm_fparith(as, ir, MIPSI_SUB_D);
+ asm_fpsub(as, ir);
} else
#endif
{
@@ -1611,9 +1704,9 @@ static void asm_sub(ASMState *as, IRIns *ir)
static void asm_mul(ASMState *as, IRIns *ir)
{
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
if (irt_isnum(ir->t)) {
- asm_fparith(as, ir, MIPSI_MUL_D);
+ asm_fpmul(as, ir);
} else
#endif
{
@@ -1640,7 +1733,7 @@ static void asm_mod(ASMState *as, IRIns *ir)
asm_callid(as, ir, IRCALL_lj_vm_modi);
}
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
static void asm_pow(ASMState *as, IRIns *ir)
{
#if LJ_64 && LJ_HASFFI
@@ -1660,7 +1753,11 @@ static void asm_div(ASMState *as, IRIns *ir)
IRCALL_lj_carith_divu64);
else
#endif
+#if !LJ_SOFTFP
asm_fparith(as, ir, MIPSI_DIV_D);
+#else
+ asm_callid(as, ir, IRCALL_softfp_div);
+#endif
}
#endif
@@ -1670,6 +1767,13 @@ static void asm_neg(ASMState *as, IRIns *ir)
if (irt_isnum(ir->t)) {
asm_fpunary(as, ir, MIPSI_NEG_D);
} else
+#elif LJ_64 /* && LJ_SOFTFP */
+ if (irt_isnum(ir->t)) {
+ Reg dest = ra_dest(as, ir, RSET_GPR);
+ Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
+ emit_dst(as, MIPSI_XOR, dest, left,
+ ra_allock(as, 0x8000000000000000ll, rset_exclude(RSET_GPR, dest)));
+ } else
#endif
{
Reg dest = ra_dest(as, ir, RSET_GPR);
@@ -1679,7 +1783,17 @@ static void asm_neg(ASMState *as, IRIns *ir)
}
}
+#if !LJ_SOFTFP
#define asm_abs(as, ir) asm_fpunary(as, ir, MIPSI_ABS_D)
+#elif LJ_64 /* && LJ_SOFTFP */
+static void asm_abs(ASMState *as, IRIns *ir)
+{
+ Reg dest = ra_dest(as, ir, RSET_GPR);
+ Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
+ emit_tsml(as, MIPSI_DEXTM, dest, left, 30, 0);
+}
+#endif
+
#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
@@ -1924,15 +2038,21 @@ static void asm_bror(ASMState *as, IRIns *ir)
}
}
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP
static void asm_sfpmin_max(ASMState *as, IRIns *ir)
{
CCallInfo ci = lj_ir_callinfo[(IROp)ir->o == IR_MIN ? IRCALL_lj_vm_sfmin : IRCALL_lj_vm_sfmax];
+#if LJ_64
+ IRRef args[2];
+ args[0] = ir->op1;
+ args[1] = ir->op2;
+#else
IRRef args[4];
args[0^LJ_BE] = ir->op1;
args[1^LJ_BE] = (ir+1)->op1;
args[2^LJ_BE] = ir->op2;
args[3^LJ_BE] = (ir+1)->op2;
+#endif
asm_setupresult(as, ir, &ci);
emit_call(as, (void *)ci.func, 0);
ci.func = NULL;
@@ -1942,7 +2062,10 @@ static void asm_sfpmin_max(ASMState *as, IRIns *ir)
static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
{
- if (!LJ_SOFTFP && irt_isnum(ir->t)) {
+ if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+ asm_sfpmin_max(as, ir);
+#else
Reg dest = ra_dest(as, ir, RSET_FPR);
Reg right, left = ra_alloc2(as, ir, RSET_FPR);
right = (left >> 8); left &= 255;
@@ -1953,6 +2076,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
}
emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
+#endif
} else {
Reg dest = ra_dest(as, ir, RSET_GPR);
Reg right, left = ra_alloc2(as, ir, RSET_GPR);
@@ -1973,18 +2097,24 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
/* -- Comparisons --------------------------------------------------------- */
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP
/* SFP comparisons. */
static void asm_sfpcomp(ASMState *as, IRIns *ir)
{
const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
RegSet drop = RSET_SCRATCH;
Reg r;
+#if LJ_64
+ IRRef args[2];
+ args[0] = ir->op1;
+ args[1] = ir->op2;
+#else
IRRef args[4];
args[LJ_LE ? 0 : 1] = ir->op1; args[LJ_LE ? 1 : 0] = (ir+1)->op1;
args[LJ_LE ? 2 : 3] = ir->op2; args[LJ_LE ? 3 : 2] = (ir+1)->op2;
+#endif
- for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
+ for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+(LJ_64?1:3); r++) {
if (!rset_test(as->freeset, r) &&
regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
rset_clear(drop, r);
@@ -2038,11 +2168,15 @@ static void asm_comp(ASMState *as, IRIns *ir)
{
/* ORDER IR: LT GE LE GT ULT UGE ULE UGT. */
IROp op = ir->o;
- if (!LJ_SOFTFP && irt_isnum(ir->t)) {
+ if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+ asm_sfpcomp(as, ir);
+#else
Reg right, left = ra_alloc2(as, ir, RSET_FPR);
right = (left >> 8); left &= 255;
asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
+#endif
} else {
Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
if (op == IR_ABC) op = IR_UGT;
@@ -2074,9 +2208,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
Reg right, left = ra_alloc2(as, ir, (!LJ_SOFTFP && irt_isnum(ir->t)) ?
RSET_FPR : RSET_GPR);
right = (left >> 8); left &= 255;
- if (!LJ_SOFTFP && irt_isnum(ir->t)) {
+ if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+ asm_sfpcomp(as, ir);
+#else
asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
+#endif
} else {
asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
}
@@ -2269,7 +2407,7 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
if ((sn & SNAP_NORESTORE))
continue;
if (irt_isnum(ir->t)) {
-#if LJ_SOFTFP
+#if LJ_SOFTFP32
Reg tmp;
RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
lua_assert(irref_isk(ref)); /* LJ_SOFTFP: must be a number constant. */
@@ -2278,6 +2416,9 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
emit_tsi(as, MIPSI_SW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
+#elif LJ_SOFTFP /* && LJ_64 */
+ Reg src = ra_alloc1(as, ref, rset_exclude(RSET_GPR, RID_BASE));
+ emit_tsi(as, MIPSI_SD, src, RID_BASE, ofs);
#else
Reg src = ra_alloc1(as, ref, RSET_FPR);
emit_hsi(as, MIPSI_SDC1, src, RID_BASE, ofs);
diff --git a/src/lj_crecord.c b/src/lj_crecord.c
index ffe995f4..804cdbf4 100644
--- a/src/lj_crecord.c
+++ b/src/lj_crecord.c
@@ -212,7 +212,7 @@ static void crec_copy_emit(jit_State *J, CRecMemList *ml, MSize mlp,
ml[i].trval = emitir(IRT(IR_XLOAD, ml[i].tp), trsptr, 0);
ml[i].trofs = trofs;
i++;
- rwin += (LJ_SOFTFP && ml[i].tp == IRT_NUM) ? 2 : 1;
+ rwin += (LJ_SOFTFP32 && ml[i].tp == IRT_NUM) ? 2 : 1;
if (rwin >= CREC_COPY_REGWIN || i >= mlp) { /* Flush buffered stores. */
rwin = 0;
for ( ; j < i; j++) {
@@ -1152,7 +1152,7 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd,
else
tr = emitconv(tr, IRT_INT, d->size==1 ? IRT_I8 : IRT_I16,IRCONV_SEXT);
}
- } else if (LJ_SOFTFP && ctype_isfp(d->info) && d->size > 4) {
+ } else if (LJ_SOFTFP32 && ctype_isfp(d->info) && d->size > 4) {
lj_needsplit(J);
}
#if LJ_TARGET_X86
diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
index 8a9ee24d..bb6593ae 100644
--- a/src/lj_emit_mips.h
+++ b/src/lj_emit_mips.h
@@ -12,6 +12,8 @@ static intptr_t get_k64val(IRIns *ir)
return (intptr_t)ir_kgc(ir);
} else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) {
return (intptr_t)ir_kptr(ir);
+ } else if (LJ_SOFTFP && ir->o == IR_KNUM) {
+ return (intptr_t)ir_knum(ir)->u64;
} else {
lua_assert(ir->o == IR_KINT || ir->o == IR_KNULL);
return ir->i; /* Sign-extended. */
diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
index 8af9da1d..0746ec64 100644
--- a/src/lj_ffrecord.c
+++ b/src/lj_ffrecord.c
@@ -986,7 +986,7 @@ static void LJ_FASTCALL recff_string_format(jit_State *J, RecordFFData *rd)
handle_num:
tra = lj_ir_tonum(J, tra);
tr = lj_ir_call(J, id, tr, trsf, tra);
- if (LJ_SOFTFP) lj_needsplit(J);
+ if (LJ_SOFTFP32) lj_needsplit(J);
break;
case STRFMT_STR:
if (!tref_isstr(tra)) {
diff --git a/src/lj_ircall.h b/src/lj_ircall.h
index aa06b273..c1ac29d1 100644
--- a/src/lj_ircall.h
+++ b/src/lj_ircall.h
@@ -52,7 +52,7 @@ typedef struct CCallInfo {
#define CCI_XARGS(ci) (((ci)->flags >> CCI_XARGS_SHIFT) & 3)
#define CCI_XA (1u << CCI_XARGS_SHIFT)
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
#define CCI_XNARGS(ci) (CCI_NARGS((ci)) + CCI_XARGS((ci)))
#else
#define CCI_XNARGS(ci) CCI_NARGS((ci))
@@ -79,13 +79,19 @@ typedef struct CCallInfo {
#define IRCALLCOND_SOFTFP_FFI(x) NULL
#endif
-#if LJ_SOFTFP && LJ_TARGET_MIPS32
+#if LJ_SOFTFP && LJ_TARGET_MIPS
#define IRCALLCOND_SOFTFP_MIPS(x) x
#else
#define IRCALLCOND_SOFTFP_MIPS(x) NULL
#endif
-#define LJ_NEED_FP64 (LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS32)
+#if LJ_SOFTFP && LJ_TARGET_MIPS64
+#define IRCALLCOND_SOFTFP_MIPS64(x) x
+#else
+#define IRCALLCOND_SOFTFP_MIPS64(x) NULL
+#endif
+
+#define LJ_NEED_FP64 (LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS)
#if LJ_HASFFI && (LJ_SOFTFP || LJ_NEED_FP64)
#define IRCALLCOND_FP64_FFI(x) x
@@ -113,6 +119,14 @@ typedef struct CCallInfo {
#define XA2_FP 0
#endif
+#if LJ_SOFTFP32
+#define XA_FP32 CCI_XA
+#define XA2_FP32 (CCI_XA+CCI_XA)
+#else
+#define XA_FP32 0
+#define XA2_FP32 0
+#endif
+
#if LJ_32
#define XA_64 CCI_XA
#define XA2_64 (CCI_XA+CCI_XA)
@@ -185,20 +199,21 @@ typedef struct CCallInfo {
_(ANY, pow, 2, N, NUM, XA2_FP) \
_(ANY, atan2, 2, N, NUM, XA2_FP) \
_(ANY, ldexp, 2, N, NUM, XA_FP) \
- _(SOFTFP, lj_vm_tobit, 2, N, INT, 0) \
- _(SOFTFP, softfp_add, 4, N, NUM, 0) \
- _(SOFTFP, softfp_sub, 4, N, NUM, 0) \
- _(SOFTFP, softfp_mul, 4, N, NUM, 0) \
- _(SOFTFP, softfp_div, 4, N, NUM, 0) \
- _(SOFTFP, softfp_cmp, 4, N, NIL, 0) \
+ _(SOFTFP, lj_vm_tobit, 1, N, INT, XA_FP32) \
+ _(SOFTFP, softfp_add, 2, N, NUM, XA2_FP32) \
+ _(SOFTFP, softfp_sub, 2, N, NUM, XA2_FP32) \
+ _(SOFTFP, softfp_mul, 2, N, NUM, XA2_FP32) \
+ _(SOFTFP, softfp_div, 2, N, NUM, XA2_FP32) \
+ _(SOFTFP, softfp_cmp, 2, N, NIL, XA2_FP32) \
_(SOFTFP, softfp_i2d, 1, N, NUM, 0) \
- _(SOFTFP, softfp_d2i, 2, N, INT, 0) \
- _(SOFTFP_MIPS, lj_vm_sfmin, 4, N, NUM, 0) \
- _(SOFTFP_MIPS, lj_vm_sfmax, 4, N, NUM, 0) \
+ _(SOFTFP, softfp_d2i, 1, N, INT, XA_FP32) \
+ _(SOFTFP_MIPS, lj_vm_sfmin, 2, N, NUM, XA2_FP32) \
+ _(SOFTFP_MIPS, lj_vm_sfmax, 2, N, NUM, XA2_FP32) \
+ _(SOFTFP_MIPS64, lj_vm_tointg, 1, N, INT, 0) \
_(SOFTFP_FFI, softfp_ui2d, 1, N, NUM, 0) \
_(SOFTFP_FFI, softfp_f2d, 1, N, NUM, 0) \
- _(SOFTFP_FFI, softfp_d2ui, 2, N, INT, 0) \
- _(SOFTFP_FFI, softfp_d2f, 2, N, FLOAT, 0) \
+ _(SOFTFP_FFI, softfp_d2ui, 1, N, INT, XA_FP32) \
+ _(SOFTFP_FFI, softfp_d2f, 1, N, FLOAT, XA_FP32) \
_(SOFTFP_FFI, softfp_i2f, 1, N, FLOAT, 0) \
_(SOFTFP_FFI, softfp_ui2f, 1, N, FLOAT, 0) \
_(SOFTFP_FFI, softfp_f2i, 1, N, INT, 0) \
diff --git a/src/lj_iropt.h b/src/lj_iropt.h
index 73aef0ef..a59ba3f4 100644
--- a/src/lj_iropt.h
+++ b/src/lj_iropt.h
@@ -150,7 +150,7 @@ LJ_FUNC IRType lj_opt_narrow_forl(jit_State *J, cTValue *forbase);
/* Optimization passes. */
LJ_FUNC void lj_opt_dce(jit_State *J);
LJ_FUNC int lj_opt_loop(jit_State *J);
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
LJ_FUNC void lj_opt_split(jit_State *J);
#else
#define lj_opt_split(J) UNUSED(J)
diff --git a/src/lj_jit.h b/src/lj_jit.h
index cc8efd20..c06829ab 100644
--- a/src/lj_jit.h
+++ b/src/lj_jit.h
@@ -375,7 +375,7 @@ enum {
((TValue *)(((intptr_t)&J->ksimd[2*(n)] + 15) & ~(intptr_t)15))
/* Set/reset flag to activate the SPLIT pass for the current trace. */
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
#define lj_needsplit(J) (J->needsplit = 1)
#define lj_resetsplit(J) (J->needsplit = 0)
#else
@@ -438,7 +438,7 @@ typedef struct jit_State {
MSize sizesnapmap; /* Size of temp. snapshot map buffer. */
PostProc postproc; /* Required post-processing after execution. */
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
uint8_t needsplit; /* Need SPLIT pass. */
#endif
uint8_t retryrec; /* Retry recording. */
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 45507e0d..bf95e1eb 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -984,6 +984,9 @@ static LJ_AINLINE void copyTV(lua_State *L, TValue *o1, const TValue *o2)
#if LJ_SOFTFP
LJ_ASMF int32_t lj_vm_tobit(double x);
+#if LJ_TARGET_MIPS64
+LJ_ASMF int32_t lj_vm_tointg(double x);
+#endif
#endif
static LJ_AINLINE int32_t lj_num2bit(lua_Number n)
diff --git a/src/lj_opt_split.c b/src/lj_opt_split.c
index c0788106..2fc36b8d 100644
--- a/src/lj_opt_split.c
+++ b/src/lj_opt_split.c
@@ -8,7 +8,7 @@
#include "lj_obj.h"
-#if LJ_HASJIT && (LJ_SOFTFP || (LJ_32 && LJ_HASFFI))
+#if LJ_HASJIT && (LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI))
#include "lj_err.h"
#include "lj_buf.h"
diff --git a/src/lj_snap.c b/src/lj_snap.c
index a063c316..9146cddc 100644
--- a/src/lj_snap.c
+++ b/src/lj_snap.c
@@ -93,7 +93,7 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
(ir->op2 & (IRSLOAD_READONLY|IRSLOAD_PARENT)) != IRSLOAD_PARENT)
sn |= SNAP_NORESTORE;
}
- if (LJ_SOFTFP && irt_isnum(ir->t))
+ if (LJ_SOFTFP32 && irt_isnum(ir->t))
sn |= SNAP_SOFTFPNUM;
map[n++] = sn;
}
@@ -379,7 +379,7 @@ IRIns *lj_snap_regspmap(GCtrace *T, SnapNo snapno, IRIns *ir)
break;
}
}
- } else if (LJ_SOFTFP && ir->o == IR_HIOP) {
+ } else if (LJ_SOFTFP32 && ir->o == IR_HIOP) {
ref++;
} else if (ir->o == IR_PVAL) {
ref = ir->op1 + REF_BIAS;
@@ -491,7 +491,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
} else {
IRType t = irt_type(ir->t);
uint32_t mode = IRSLOAD_INHERIT|IRSLOAD_PARENT;
- if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
+ if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
if (ir->o == IR_SLOAD) mode |= (ir->op2 & IRSLOAD_READONLY);
tr = emitir_raw(IRT(IR_SLOAD, t), s, mode);
}
@@ -525,7 +525,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
if (irs->r == RID_SINK && snap_sunk_store(T, ir, irs)) {
if (snap_pref(J, T, map, nent, seen, irs->op2) == 0)
snap_pref(J, T, map, nent, seen, T->ir[irs->op2].op1);
- else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
+ else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
irs+1 < irlast && (irs+1)->o == IR_HIOP)
snap_pref(J, T, map, nent, seen, (irs+1)->op2);
}
@@ -584,10 +584,10 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
lua_assert(irc->o == IR_CONV && irc->op2 == IRCONV_NUM_INT);
val = snap_pref(J, T, map, nent, seen, irc->op1);
val = emitir(IRTN(IR_CONV), val, IRCONV_NUM_INT);
- } else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
+ } else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
irs+1 < irlast && (irs+1)->o == IR_HIOP) {
IRType t = IRT_I64;
- if (LJ_SOFTFP && irt_type((irs+1)->t) == IRT_SOFTFP)
+ if (LJ_SOFTFP32 && irt_type((irs+1)->t) == IRT_SOFTFP)
t = IRT_NUM;
lj_needsplit(J);
if (irref_isk(irs->op2) && irref_isk((irs+1)->op2)) {
@@ -645,7 +645,7 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
int32_t *sps = &ex->spill[regsp_spill(rs)];
if (irt_isinteger(t)) {
setintV(o, *sps);
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
} else if (irt_isnum(t)) {
o->u64 = *(uint64_t *)sps;
#endif
@@ -670,6 +670,9 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
#if !LJ_SOFTFP
} else if (irt_isnum(t)) {
setnumV(o, ex->fpr[r-RID_MIN_FPR]);
+#elif LJ_64 /* && LJ_SOFTFP */
+ } else if (irt_isnum(t)) {
+ o->u64 = ex->gpr[r-RID_MIN_GPR];
#endif
#if LJ_64 && !LJ_GC64
} else if (irt_is64(t)) {
@@ -823,7 +826,7 @@ static void snap_unsink(jit_State *J, GCtrace *T, ExitState *ex,
val = lj_tab_set(J->L, t, &tmp);
/* NOBARRIER: The table is new (marked white). */
snap_restoreval(J, T, ex, snapno, rfilt, irs->op2, val);
- if (LJ_SOFTFP && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
+ if (LJ_SOFTFP32 && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
snap_restoreval(J, T, ex, snapno, rfilt, (irs+1)->op2, &tmp);
val->u32.hi = tmp.u32.lo;
}
@@ -884,7 +887,7 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
continue;
}
snap_restoreval(J, T, ex, snapno, rfilt, ref, o);
- if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
+ if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
TValue tmp;
snap_restoreval(J, T, ex, snapno, rfilt, ref+1, &tmp);
o->u32.hi = tmp.u32.lo;
diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
index 04be38f0..9839b5ac 100644
--- a/src/vm_mips64.dasc
+++ b/src/vm_mips64.dasc
@@ -1984,6 +1984,38 @@ static void build_subroutines(BuildCtx *ctx)
|1:
| jr ra
|. move CRET1, r0
+ |
+ |// FP number to int conversion with a check for soft-float.
+ |// Modifies CARG1, CRET1, CRET2, TMP0, AT.
+ |->vm_tointg:
+ |.if JIT
+ | dsll CRET2, CARG1, 1
+ | beqz CRET2, >2
+ |. li TMP0, 1076
+ | dsrl AT, CRET2, 53
+ | dsubu TMP0, TMP0, AT
+ | sltiu AT, TMP0, 54
+ | beqz AT, >1
+ |. dextm CRET2, CRET2, 0, 20
+ | dinsu CRET2, AT, 21, 21
+ | slt AT, CARG1, r0
+ | dsrlv CRET1, CRET2, TMP0
+ | dsubu CARG1, r0, CRET1
+ | movn CRET1, CARG1, AT
+ | li CARG1, 64
+ | subu TMP0, CARG1, TMP0
+ | dsllv CRET2, CRET2, TMP0 // Integer check.
+ | sextw AT, CRET1
+ | xor AT, CRET1, AT // Range check.
+ | jr ra
+ |. movz CRET2, AT, CRET2
+ |1:
+ | jr ra
+ |. li CRET2, 1
+ |2:
+ | jr ra
+ |. move CRET1, r0
+ |.endif
|.endif
|
|.macro .ffunc_bit, name
@@ -2669,6 +2701,23 @@ static void build_subroutines(BuildCtx *ctx)
|. li CRET1, 0
|.endif
|
+ |.macro sfmin_max, name, intins
+ |->vm_sf .. name:
+ |.if JIT and not FPU
+ | move TMP2, ra
+ | bal ->vm_sfcmpolt
+ |. nop
+ | move ra, TMP2
+ | move TMP0, CRET1
+ | move CRET1, CARG1
+ | jr ra
+ |. intins CRET1, CARG2, TMP0
+ |.endif
+ |.endmacro
+ |
+ | sfmin_max min, movz
+ | sfmin_max max, movn
+ |
|//-----------------------------------------------------------------------
|//-- Miscellaneous functions --------------------------------------------
|//-----------------------------------------------------------------------
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:27 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:10 ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:07 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:27 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:53PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in JIT compiler backend for MIPS64.
Typo: s/in JIT/in the JIT/
> This includes:
> * `vm_tointg()` helper is added in <src/vm_mips64.dasm> to convert FP
> number to integer with a check for the soft-float support (called from
> JIT).
> * `sfmin/max()` helpers are added in <src/vm_mips64.dasm> for min/max
> operations with a check for the soft-float support (called from JIT).
Typo: s/the soft-float/soft-float/
> * `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS instead
> `LJ_SOFTFP`.
> * All fp-depending paths are instrumented with `LJ_SOFTFP` or
Typo: s/fp-depending/fp-dependent/
> `LJ_SOFTFP32` macro.
Typo: s/macro/macros/
> * The corresponding function calls in <src/lj_ircall.h> are marked as
> `XA_FP32`, `XA2_FP32`, i.e. as required extra arguments on the stack
> for soft-FP on 32-bit MIPS.
Shouldn't we also mention the `asm_tobit` function?
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_arch.h | 4 +-
> src/lj_asm.c | 8 +-
> src/lj_asm_mips.h | 217 +++++++++++++++++++++++++++++++++++++--------
> src/lj_crecord.c | 4 +-
> src/lj_emit_mips.h | 2 +
> src/lj_ffrecord.c | 2 +-
> src/lj_ircall.h | 43 ++++++---
> src/lj_iropt.h | 2 +-
> src/lj_jit.h | 4 +-
> src/lj_obj.h | 3 +
> src/lj_opt_split.c | 2 +-
> src/lj_snap.c | 21 +++--
> src/vm_mips64.dasc | 49 ++++++++++
> 13 files changed, 286 insertions(+), 75 deletions(-)
>
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 5276ae56..c39526ea 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -349,9 +349,6 @@
> #define LJ_ARCH_BITS 32
> #define LJ_TARGET_MIPS32 1
> #else
> -#if LJ_ABI_SOFTFP || !LJ_ARCH_HASFPU
> -#define LJ_ARCH_NOJIT 1 /* NYI */
> -#endif
> #define LJ_ARCH_BITS 64
> #define LJ_TARGET_MIPS64 1
> #define LJ_TARGET_GC64 1
> @@ -528,6 +525,7 @@
> #define LJ_ABI_SOFTFP 0
> #endif
> #define LJ_SOFTFP (!LJ_ARCH_HASFPU)
> +#define LJ_SOFTFP32 (LJ_SOFTFP && LJ_32)
>
> #if LJ_ARCH_ENDIAN == LUAJIT_BE
> #define LJ_LE 0
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 0bfa44ed..15de7e33 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -341,7 +341,7 @@ static Reg ra_rematk(ASMState *as, IRRef ref)
> ra_modified(as, r);
> ir->r = RID_INIT; /* Do not keep any hint. */
> RA_DBGX((as, "remat $i $r", ir, r));
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (ir->o == IR_KNUM) {
> emit_loadk64(as, r, ir);
> } else
> @@ -1356,7 +1356,7 @@ static void asm_call(ASMState *as, IRIns *ir)
> asm_gencall(as, ci, args);
> }
>
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> static void asm_fppow(ASMState *as, IRIns *ir, IRRef lref, IRRef rref)
> {
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_pow];
> @@ -1703,10 +1703,10 @@ static void asm_ir(ASMState *as, IRIns *ir)
> case IR_MUL: asm_mul(as, ir); break;
> case IR_MOD: asm_mod(as, ir); break;
> case IR_NEG: asm_neg(as, ir); break;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
> case IR_DIV: case IR_POW: case IR_ABS:
> case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
> - lua_assert(0); /* Unused for LJ_SOFTFP. */
> + lua_assert(0); /* Unused for LJ_SOFTFP32. */
> break;
> #else
> case IR_DIV: asm_div(as, ir); break;
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 0e60fc07..a26a82cd 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -290,7 +290,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> {
> ra_leftov(as, gpr, ref);
> gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
> fpr++;
> #endif
> }
> @@ -301,7 +301,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> emit_spstore(as, ir, r, ofs);
> ofs += irt_isnum(ir->t) ? 8 : 4;
> #else
> - emit_spstore(as, ir, r, ofs + ((LJ_BE && (LJ_SOFTFP || r < RID_MAX_GPR) && !irt_is64(ir->t)) ? 4 : 0));
> + emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isfp(ir->t) && !irt_is64(ir->t)) ? 4 : 0));
> ofs += 8;
> #endif
> }
> @@ -312,7 +312,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> #endif
> if (gpr <= REGARG_LASTGPR) {
> gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
> fpr++;
> #endif
> } else {
> @@ -461,12 +461,36 @@ static void asm_tobit(ASMState *as, IRIns *ir)
> emit_tg(as, MIPSI_MFC1, dest, tmp);
> emit_fgh(as, MIPSI_ADD_D, tmp, left, right);
> }
> +#elif LJ_64 /* && LJ_SOFTFP */
> +static void asm_tointg(ASMState *as, IRIns *ir, Reg r)
> +{
> + /* The modified regs must match with the *.dasc implementation. */
> + RegSet drop = RID2RSET(REGARG_FIRSTGPR)|RID2RSET(RID_RET)|RID2RSET(RID_RET+1)|
> + RID2RSET(RID_R1)|RID2RSET(RID_R12);
> + if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
> + ra_evictset(as, drop);
> + /* Return values are in RID_RET (converted value) and RID_RET+1 (status). */
> + ra_destreg(as, ir, RID_RET);
> + asm_guard(as, MIPSI_BNE, RID_RET+1, RID_ZERO);
> + emit_call(as, (void *)lj_ir_callinfo[IRCALL_lj_vm_tointg].func, 0);
> + if (r == RID_NONE)
> + ra_leftov(as, REGARG_FIRSTGPR, ir->op1);
> + else if (r != REGARG_FIRSTGPR)
> + emit_move(as, REGARG_FIRSTGPR, r);
> +}
> +
> +static void asm_tobit(ASMState *as, IRIns *ir)
> +{
> + Reg dest = ra_dest(as, ir, RSET_GPR);
> + emit_dta(as, MIPSI_SLL, dest, dest, 0);
> + asm_callid(as, ir, IRCALL_lj_vm_tobit);
> +}
> #endif
>
> static void asm_conv(ASMState *as, IRIns *ir)
> {
> IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> int stfp = (st == IRT_NUM || st == IRT_FLOAT);
> #endif
> #if LJ_64
> @@ -477,12 +501,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
> lua_assert(!(irt_isint64(ir->t) ||
> (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
> #endif
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
> /* FP conversions are handled by SPLIT. */
> lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
> /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
> #else
> lua_assert(irt_type(ir->t) != st);
> +#if !LJ_SOFTFP
> if (irt_isfp(ir->t)) {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> if (stfp) { /* FP to FP conversion. */
> @@ -608,6 +633,42 @@ static void asm_conv(ASMState *as, IRIns *ir)
> }
> }
> } else
> +#else
> + if (irt_isfp(ir->t)) {
> +#if LJ_64 && LJ_HASFFI
> + if (stfp) { /* FP to FP conversion. */
> + asm_callid(as, ir, irt_isnum(ir->t) ? IRCALL_softfp_f2d :
> + IRCALL_softfp_d2f);
> + } else { /* Integer to FP conversion. */
> + IRCallID cid = ((IRT_IS64 >> st) & 1) ?
> + (irt_isnum(ir->t) ?
> + (st == IRT_I64 ? IRCALL_fp64_l2d : IRCALL_fp64_ul2d) :
> + (st == IRT_I64 ? IRCALL_fp64_l2f : IRCALL_fp64_ul2f)) :
> + (irt_isnum(ir->t) ?
> + (st == IRT_INT ? IRCALL_softfp_i2d : IRCALL_softfp_ui2d) :
> + (st == IRT_INT ? IRCALL_softfp_i2f : IRCALL_softfp_ui2f));
> + asm_callid(as, ir, cid);
> + }
> +#else
> + asm_callid(as, ir, IRCALL_softfp_i2d);
> +#endif
> + } else if (stfp) { /* FP to integer conversion. */
> + if (irt_isguard(ir->t)) {
> + /* Checked conversions are only supported from number to int. */
> + lua_assert(irt_isint(ir->t) && st == IRT_NUM);
> + asm_tointg(as, ir, RID_NONE);
> + } else {
> + IRCallID cid = irt_is64(ir->t) ?
> + ((st == IRT_NUM) ?
> + (irt_isi64(ir->t) ? IRCALL_fp64_d2l : IRCALL_fp64_d2ul) :
> + (irt_isi64(ir->t) ? IRCALL_fp64_f2l : IRCALL_fp64_f2ul)) :
> + ((st == IRT_NUM) ?
> + (irt_isint(ir->t) ? IRCALL_softfp_d2i : IRCALL_softfp_d2ui) :
> + (irt_isint(ir->t) ? IRCALL_softfp_f2i : IRCALL_softfp_f2ui));
> + asm_callid(as, ir, cid);
> + }
> + } else
> +#endif
> #endif
> {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> @@ -665,7 +726,7 @@ static void asm_strto(ASMState *as, IRIns *ir)
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
> IRRef args[2];
> int32_t ofs = 0;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
> ra_evictset(as, RSET_SCRATCH);
> if (ra_used(ir)) {
> if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> @@ -806,7 +867,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> MCLabel l_end, l_loop, l_next;
>
> rset_clear(allow, tab);
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
> if (!isk) {
> key = ra_alloc1(as, refkey, allow);
> rset_clear(allow, key);
> @@ -826,7 +887,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> }
> }
> #else
> - if (irt_isnum(kt)) {
> + if (!LJ_SOFTFP && irt_isnum(kt)) {
> key = ra_alloc1(as, refkey, RSET_FPR);
> tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
> } else if (!irt_ispri(kt)) {
> @@ -882,6 +943,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
> emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
> emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> + } else if (LJ_SOFTFP && irt_isnum(kt)) {
> + emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> + emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> } else if (irt_isaddr(kt)) {
> Reg refk = tmp2;
> if (isk) {
> @@ -960,7 +1024,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_dta(as, MIPSI_ROTR, dest, tmp1, (-HASH_ROT1)&31);
> if (irt_isnum(kt)) {
> emit_dst(as, MIPSI_ADDU, tmp1, tmp1, tmp1);
> - emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 0);
> + emit_dta(as, MIPSI_DSRA32, tmp1, LJ_SOFTFP ? key : tmp1, 0);
> emit_dta(as, MIPSI_SLL, tmp2, LJ_SOFTFP ? key : tmp1, 0);
> #if !LJ_SOFTFP
> emit_tg(as, MIPSI_DMFC1, tmp1, key);
> @@ -1123,7 +1187,7 @@ static MIPSIns asm_fxloadins(IRIns *ir)
> case IRT_U8: return MIPSI_LBU;
> case IRT_I16: return MIPSI_LH;
> case IRT_U16: return MIPSI_LHU;
> - case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_LDC1;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
> case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
> default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
> }
> @@ -1134,7 +1198,7 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
> switch (irt_type(ir->t)) {
> case IRT_I8: case IRT_U8: return MIPSI_SB;
> case IRT_I16: case IRT_U16: return MIPSI_SH;
> - case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_SDC1;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
> case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
> default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
> }
> @@ -1199,7 +1263,7 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
>
> static void asm_ahuvload(ASMState *as, IRIns *ir)
> {
> - int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> + int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
> Reg dest = RID_NONE, type = RID_TMP, idx;
> RegSet allow = RSET_GPR;
> int32_t ofs = 0;
> @@ -1212,7 +1276,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
> }
> }
> if (ra_used(ir)) {
> - lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> + lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
> irt_isint(ir->t) || irt_isaddr(ir->t));
> dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> @@ -1261,10 +1325,10 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
> int32_t ofs = 0;
> if (ir->r == RID_SINK)
> return;
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> - src = ra_alloc1(as, ir->op2, RSET_FPR);
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> + src = ra_alloc1(as, ir->op2, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
> idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> - emit_hsi(as, MIPSI_SDC1, src, idx, ofs);
> + emit_hsi(as, LJ_SOFTFP ? MIPSI_SD : MIPSI_SDC1, src, idx, ofs);
> } else {
> #if LJ_32
> if (!irt_ispri(ir->t)) {
> @@ -1312,7 +1376,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
> IRType1 t = ir->t;
> #if LJ_32
> int32_t ofs = 8*((int32_t)ir->op1-1) + ((ir->op2 & IRSLOAD_FRAME) ? 4 : 0);
> - int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> + int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
> if (hiop)
> t.irt = IRT_NUM;
> #else
> @@ -1320,7 +1384,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
> #endif
> lua_assert(!(ir->op2 & IRSLOAD_PARENT)); /* Handled by asm_head_side(). */
> lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
> lua_assert(!(ir->op2 & IRSLOAD_CONVERT)); /* Handled by LJ_SOFTFP SPLIT. */
> if (hiop && ra_used(ir+1)) {
> type = ra_dest(as, ir+1, allow);
> @@ -1328,29 +1392,44 @@ static void asm_sload(ASMState *as, IRIns *ir)
> }
> #else
> if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
> - dest = ra_scratch(as, RSET_FPR);
> + dest = ra_scratch(as, LJ_SOFTFP ? allow : RSET_FPR);
> asm_tointg(as, ir, dest);
> t.irt = IRT_NUM; /* Continue with a regular number type check. */
> } else
> #endif
> if (ra_used(ir)) {
> - lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> + lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
> irt_isint(ir->t) || irt_isaddr(ir->t));
> dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> base = ra_alloc1(as, REF_BASE, allow);
> rset_clear(allow, base);
> - if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
> + if (!LJ_SOFTFP32 && (ir->op2 & IRSLOAD_CONVERT)) {
> if (irt_isint(t)) {
> - Reg tmp = ra_scratch(as, RSET_FPR);
> + Reg tmp = ra_scratch(as, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
> +#if LJ_SOFTFP
> + ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> + ra_destreg(as, ir, RID_RET);
> + emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_d2i].func, 0);
> + if (tmp != REGARG_FIRSTGPR)
> + emit_move(as, REGARG_FIRSTGPR, tmp);
> +#else
> emit_tg(as, MIPSI_MFC1, dest, tmp);
> emit_fg(as, MIPSI_TRUNC_W_D, tmp, tmp);
> +#endif
> dest = tmp;
> t.irt = IRT_NUM; /* Check for original type. */
> } else {
> Reg tmp = ra_scratch(as, RSET_GPR);
> +#if LJ_SOFTFP
> + ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> + ra_destreg(as, ir, RID_RET);
> + emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_i2d].func, 0);
> + emit_dta(as, MIPSI_SLL, REGARG_FIRSTGPR, tmp, 0);
> +#else
> emit_fg(as, MIPSI_CVT_D_W, dest, dest);
> emit_tg(as, MIPSI_MTC1, tmp, dest);
> +#endif
> dest = tmp;
> t.irt = IRT_INT; /* Check for original type. */
> }
> @@ -1399,7 +1478,7 @@ dotypecheck:
> if (irt_isnum(t)) {
> asm_guard(as, MIPSI_BEQ, RID_TMP, RID_ZERO);
> emit_tsi(as, MIPSI_SLTIU, RID_TMP, RID_TMP, (int32_t)LJ_TISNUM);
> - if (ra_hasreg(dest))
> + if (!LJ_SOFTFP && ra_hasreg(dest))
> emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
> } else {
> asm_guard(as, MIPSI_BNE, RID_TMP,
> @@ -1409,7 +1488,7 @@ dotypecheck:
> }
> emit_tsi(as, MIPSI_LD, type, base, ofs);
> } else if (ra_hasreg(dest)) {
> - if (irt_isnum(t))
> + if (!LJ_SOFTFP && irt_isnum(t))
> emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
> else
> emit_tsi(as, irt_isint(t) ? MIPSI_LW : MIPSI_LD, dest, base,
> @@ -1554,26 +1633,40 @@ static void asm_fpunary(ASMState *as, IRIns *ir, MIPSIns mi)
> Reg left = ra_hintalloc(as, ir->op1, dest, RSET_FPR);
> emit_fg(as, mi, dest, left);
> }
> +#endif
>
> +#if !LJ_SOFTFP32
> static void asm_fpmath(ASMState *as, IRIns *ir)
> {
> if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir))
> return;
> +#if !LJ_SOFTFP
> if (ir->op2 <= IRFPM_TRUNC)
> asm_callround(as, ir, IRCALL_lj_vm_floor + ir->op2);
> else if (ir->op2 == IRFPM_SQRT)
> asm_fpunary(as, ir, MIPSI_SQRT_D);
> else
> +#endif
> asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
> }
> #endif
>
> +#if !LJ_SOFTFP
> +#define asm_fpadd(as, ir) asm_fparith(as, ir, MIPSI_ADD_D)
> +#define asm_fpsub(as, ir) asm_fparith(as, ir, MIPSI_SUB_D)
> +#define asm_fpmul(as, ir) asm_fparith(as, ir, MIPSI_MUL_D)
> +#elif LJ_64 /* && LJ_SOFTFP */
> +#define asm_fpadd(as, ir) asm_callid(as, ir, IRCALL_softfp_add)
> +#define asm_fpsub(as, ir) asm_callid(as, ir, IRCALL_softfp_sub)
> +#define asm_fpmul(as, ir) asm_callid(as, ir, IRCALL_softfp_mul)
> +#endif
> +
> static void asm_add(ASMState *as, IRIns *ir)
> {
> IRType1 t = ir->t;
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (irt_isnum(t)) {
> - asm_fparith(as, ir, MIPSI_ADD_D);
> + asm_fpadd(as, ir);
> } else
> #endif
> {
> @@ -1595,9 +1688,9 @@ static void asm_add(ASMState *as, IRIns *ir)
>
> static void asm_sub(ASMState *as, IRIns *ir)
> {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (irt_isnum(ir->t)) {
> - asm_fparith(as, ir, MIPSI_SUB_D);
> + asm_fpsub(as, ir);
> } else
> #endif
> {
> @@ -1611,9 +1704,9 @@ static void asm_sub(ASMState *as, IRIns *ir)
>
> static void asm_mul(ASMState *as, IRIns *ir)
> {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (irt_isnum(ir->t)) {
> - asm_fparith(as, ir, MIPSI_MUL_D);
> + asm_fpmul(as, ir);
> } else
> #endif
> {
> @@ -1640,7 +1733,7 @@ static void asm_mod(ASMState *as, IRIns *ir)
> asm_callid(as, ir, IRCALL_lj_vm_modi);
> }
>
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> static void asm_pow(ASMState *as, IRIns *ir)
> {
> #if LJ_64 && LJ_HASFFI
> @@ -1660,7 +1753,11 @@ static void asm_div(ASMState *as, IRIns *ir)
> IRCALL_lj_carith_divu64);
> else
> #endif
> +#if !LJ_SOFTFP
> asm_fparith(as, ir, MIPSI_DIV_D);
> +#else
> + asm_callid(as, ir, IRCALL_softfp_div);
> +#endif
> }
> #endif
>
> @@ -1670,6 +1767,13 @@ static void asm_neg(ASMState *as, IRIns *ir)
> if (irt_isnum(ir->t)) {
> asm_fpunary(as, ir, MIPSI_NEG_D);
> } else
> +#elif LJ_64 /* && LJ_SOFTFP */
> + if (irt_isnum(ir->t)) {
> + Reg dest = ra_dest(as, ir, RSET_GPR);
> + Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> + emit_dst(as, MIPSI_XOR, dest, left,
> + ra_allock(as, 0x8000000000000000ll, rset_exclude(RSET_GPR, dest)));
> + } else
> #endif
> {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> @@ -1679,7 +1783,17 @@ static void asm_neg(ASMState *as, IRIns *ir)
> }
> }
>
> +#if !LJ_SOFTFP
> #define asm_abs(as, ir) asm_fpunary(as, ir, MIPSI_ABS_D)
> +#elif LJ_64 /* && LJ_SOFTFP */
> +static void asm_abs(ASMState *as, IRIns *ir)
> +{
> + Reg dest = ra_dest(as, ir, RSET_GPR);
> + Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> + emit_tsml(as, MIPSI_DEXTM, dest, left, 30, 0);
> +}
> +#endif
> +
> #define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
> #define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
>
> @@ -1924,15 +2038,21 @@ static void asm_bror(ASMState *as, IRIns *ir)
> }
> }
>
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
> static void asm_sfpmin_max(ASMState *as, IRIns *ir)
> {
> CCallInfo ci = lj_ir_callinfo[(IROp)ir->o == IR_MIN ? IRCALL_lj_vm_sfmin : IRCALL_lj_vm_sfmax];
> +#if LJ_64
> + IRRef args[2];
> + args[0] = ir->op1;
> + args[1] = ir->op2;
> +#else
> IRRef args[4];
> args[0^LJ_BE] = ir->op1;
> args[1^LJ_BE] = (ir+1)->op1;
> args[2^LJ_BE] = ir->op2;
> args[3^LJ_BE] = (ir+1)->op2;
> +#endif
> asm_setupresult(as, ir, &ci);
> emit_call(as, (void *)ci.func, 0);
> ci.func = NULL;
> @@ -1942,7 +2062,10 @@ static void asm_sfpmin_max(ASMState *as, IRIns *ir)
>
> static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
> {
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + asm_sfpmin_max(as, ir);
> +#else
> Reg dest = ra_dest(as, ir, RSET_FPR);
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> right = (left >> 8); left &= 255;
> @@ -1953,6 +2076,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
> if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
> }
> emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
> +#endif
> } else {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg right, left = ra_alloc2(as, ir, RSET_GPR);
> @@ -1973,18 +2097,24 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>
> /* -- Comparisons --------------------------------------------------------- */
>
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
> /* SFP comparisons. */
> static void asm_sfpcomp(ASMState *as, IRIns *ir)
> {
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
> RegSet drop = RSET_SCRATCH;
> Reg r;
> +#if LJ_64
> + IRRef args[2];
> + args[0] = ir->op1;
> + args[1] = ir->op2;
> +#else
> IRRef args[4];
> args[LJ_LE ? 0 : 1] = ir->op1; args[LJ_LE ? 1 : 0] = (ir+1)->op1;
> args[LJ_LE ? 2 : 3] = ir->op2; args[LJ_LE ? 3 : 2] = (ir+1)->op2;
> +#endif
>
> - for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> + for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+(LJ_64?1:3); r++) {
> if (!rset_test(as->freeset, r) &&
> regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
> rset_clear(drop, r);
> @@ -2038,11 +2168,15 @@ static void asm_comp(ASMState *as, IRIns *ir)
> {
> /* ORDER IR: LT GE LE GT ULT UGE ULE UGT. */
> IROp op = ir->o;
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + asm_sfpcomp(as, ir);
> +#else
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> right = (left >> 8); left &= 255;
> asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
> emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
> +#endif
> } else {
> Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
> if (op == IR_ABC) op = IR_UGT;
> @@ -2074,9 +2208,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
> Reg right, left = ra_alloc2(as, ir, (!LJ_SOFTFP && irt_isnum(ir->t)) ?
> RSET_FPR : RSET_GPR);
> right = (left >> 8); left &= 255;
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + asm_sfpcomp(as, ir);
> +#else
> asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
> emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
> +#endif
> } else {
> asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
> }
> @@ -2269,7 +2407,7 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if ((sn & SNAP_NORESTORE))
> continue;
> if (irt_isnum(ir->t)) {
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
> Reg tmp;
> RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> lua_assert(irref_isk(ref)); /* LJ_SOFTFP: must be a number constant. */
> @@ -2278,6 +2416,9 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
> tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
> emit_tsi(as, MIPSI_SW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#elif LJ_SOFTFP /* && LJ_64 */
> + Reg src = ra_alloc1(as, ref, rset_exclude(RSET_GPR, RID_BASE));
> + emit_tsi(as, MIPSI_SD, src, RID_BASE, ofs);
> #else
> Reg src = ra_alloc1(as, ref, RSET_FPR);
> emit_hsi(as, MIPSI_SDC1, src, RID_BASE, ofs);
> diff --git a/src/lj_crecord.c b/src/lj_crecord.c
> index ffe995f4..804cdbf4 100644
> --- a/src/lj_crecord.c
> +++ b/src/lj_crecord.c
> @@ -212,7 +212,7 @@ static void crec_copy_emit(jit_State *J, CRecMemList *ml, MSize mlp,
> ml[i].trval = emitir(IRT(IR_XLOAD, ml[i].tp), trsptr, 0);
> ml[i].trofs = trofs;
> i++;
> - rwin += (LJ_SOFTFP && ml[i].tp == IRT_NUM) ? 2 : 1;
> + rwin += (LJ_SOFTFP32 && ml[i].tp == IRT_NUM) ? 2 : 1;
> if (rwin >= CREC_COPY_REGWIN || i >= mlp) { /* Flush buffered stores. */
> rwin = 0;
> for ( ; j < i; j++) {
> @@ -1152,7 +1152,7 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd,
> else
> tr = emitconv(tr, IRT_INT, d->size==1 ? IRT_I8 : IRT_I16,IRCONV_SEXT);
> }
> - } else if (LJ_SOFTFP && ctype_isfp(d->info) && d->size > 4) {
> + } else if (LJ_SOFTFP32 && ctype_isfp(d->info) && d->size > 4) {
> lj_needsplit(J);
> }
> #if LJ_TARGET_X86
> diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
> index 8a9ee24d..bb6593ae 100644
> --- a/src/lj_emit_mips.h
> +++ b/src/lj_emit_mips.h
> @@ -12,6 +12,8 @@ static intptr_t get_k64val(IRIns *ir)
> return (intptr_t)ir_kgc(ir);
> } else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) {
> return (intptr_t)ir_kptr(ir);
> + } else if (LJ_SOFTFP && ir->o == IR_KNUM) {
> + return (intptr_t)ir_knum(ir)->u64;
> } else {
> lua_assert(ir->o == IR_KINT || ir->o == IR_KNULL);
> return ir->i; /* Sign-extended. */
> diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
> index 8af9da1d..0746ec64 100644
> --- a/src/lj_ffrecord.c
> +++ b/src/lj_ffrecord.c
> @@ -986,7 +986,7 @@ static void LJ_FASTCALL recff_string_format(jit_State *J, RecordFFData *rd)
> handle_num:
> tra = lj_ir_tonum(J, tra);
> tr = lj_ir_call(J, id, tr, trsf, tra);
> - if (LJ_SOFTFP) lj_needsplit(J);
> + if (LJ_SOFTFP32) lj_needsplit(J);
> break;
> case STRFMT_STR:
> if (!tref_isstr(tra)) {
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index aa06b273..c1ac29d1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -52,7 +52,7 @@ typedef struct CCallInfo {
> #define CCI_XARGS(ci) (((ci)->flags >> CCI_XARGS_SHIFT) & 3)
> #define CCI_XA (1u << CCI_XARGS_SHIFT)
>
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> #define CCI_XNARGS(ci) (CCI_NARGS((ci)) + CCI_XARGS((ci)))
> #else
> #define CCI_XNARGS(ci) CCI_NARGS((ci))
> @@ -79,13 +79,19 @@ typedef struct CCallInfo {
> #define IRCALLCOND_SOFTFP_FFI(x) NULL
> #endif
>
> -#if LJ_SOFTFP && LJ_TARGET_MIPS32
> +#if LJ_SOFTFP && LJ_TARGET_MIPS
> #define IRCALLCOND_SOFTFP_MIPS(x) x
> #else
> #define IRCALLCOND_SOFTFP_MIPS(x) NULL
> #endif
>
> -#define LJ_NEED_FP64 (LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS32)
> +#if LJ_SOFTFP && LJ_TARGET_MIPS64
> +#define IRCALLCOND_SOFTFP_MIPS64(x) x
> +#else
> +#define IRCALLCOND_SOFTFP_MIPS64(x) NULL
> +#endif
> +
> +#define LJ_NEED_FP64 (LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS)
>
> #if LJ_HASFFI && (LJ_SOFTFP || LJ_NEED_FP64)
> #define IRCALLCOND_FP64_FFI(x) x
> @@ -113,6 +119,14 @@ typedef struct CCallInfo {
> #define XA2_FP 0
> #endif
>
> +#if LJ_SOFTFP32
> +#define XA_FP32 CCI_XA
> +#define XA2_FP32 (CCI_XA+CCI_XA)
> +#else
> +#define XA_FP32 0
> +#define XA2_FP32 0
> +#endif
> +
> #if LJ_32
> #define XA_64 CCI_XA
> #define XA2_64 (CCI_XA+CCI_XA)
> @@ -185,20 +199,21 @@ typedef struct CCallInfo {
> _(ANY, pow, 2, N, NUM, XA2_FP) \
> _(ANY, atan2, 2, N, NUM, XA2_FP) \
> _(ANY, ldexp, 2, N, NUM, XA_FP) \
> - _(SOFTFP, lj_vm_tobit, 2, N, INT, 0) \
> - _(SOFTFP, softfp_add, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_sub, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_mul, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_div, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_cmp, 4, N, NIL, 0) \
> + _(SOFTFP, lj_vm_tobit, 1, N, INT, XA_FP32) \
> + _(SOFTFP, softfp_add, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_sub, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_mul, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_div, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_cmp, 2, N, NIL, XA2_FP32) \
> _(SOFTFP, softfp_i2d, 1, N, NUM, 0) \
> - _(SOFTFP, softfp_d2i, 2, N, INT, 0) \
> - _(SOFTFP_MIPS, lj_vm_sfmin, 4, N, NUM, 0) \
> - _(SOFTFP_MIPS, lj_vm_sfmax, 4, N, NUM, 0) \
> + _(SOFTFP, softfp_d2i, 1, N, INT, XA_FP32) \
> + _(SOFTFP_MIPS, lj_vm_sfmin, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP_MIPS, lj_vm_sfmax, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP_MIPS64, lj_vm_tointg, 1, N, INT, 0) \
> _(SOFTFP_FFI, softfp_ui2d, 1, N, NUM, 0) \
> _(SOFTFP_FFI, softfp_f2d, 1, N, NUM, 0) \
> - _(SOFTFP_FFI, softfp_d2ui, 2, N, INT, 0) \
> - _(SOFTFP_FFI, softfp_d2f, 2, N, FLOAT, 0) \
> + _(SOFTFP_FFI, softfp_d2ui, 1, N, INT, XA_FP32) \
> + _(SOFTFP_FFI, softfp_d2f, 1, N, FLOAT, XA_FP32) \
> _(SOFTFP_FFI, softfp_i2f, 1, N, FLOAT, 0) \
> _(SOFTFP_FFI, softfp_ui2f, 1, N, FLOAT, 0) \
> _(SOFTFP_FFI, softfp_f2i, 1, N, INT, 0) \
> diff --git a/src/lj_iropt.h b/src/lj_iropt.h
> index 73aef0ef..a59ba3f4 100644
> --- a/src/lj_iropt.h
> +++ b/src/lj_iropt.h
> @@ -150,7 +150,7 @@ LJ_FUNC IRType lj_opt_narrow_forl(jit_State *J, cTValue *forbase);
> /* Optimization passes. */
> LJ_FUNC void lj_opt_dce(jit_State *J);
> LJ_FUNC int lj_opt_loop(jit_State *J);
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> LJ_FUNC void lj_opt_split(jit_State *J);
> #else
> #define lj_opt_split(J) UNUSED(J)
> diff --git a/src/lj_jit.h b/src/lj_jit.h
> index cc8efd20..c06829ab 100644
> --- a/src/lj_jit.h
> +++ b/src/lj_jit.h
> @@ -375,7 +375,7 @@ enum {
> ((TValue *)(((intptr_t)&J->ksimd[2*(n)] + 15) & ~(intptr_t)15))
>
> /* Set/reset flag to activate the SPLIT pass for the current trace. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> #define lj_needsplit(J) (J->needsplit = 1)
> #define lj_resetsplit(J) (J->needsplit = 0)
> #else
> @@ -438,7 +438,7 @@ typedef struct jit_State {
> MSize sizesnapmap; /* Size of temp. snapshot map buffer. */
>
> PostProc postproc; /* Required post-processing after execution. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> uint8_t needsplit; /* Need SPLIT pass. */
> #endif
> uint8_t retryrec; /* Retry recording. */
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 45507e0d..bf95e1eb 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -984,6 +984,9 @@ static LJ_AINLINE void copyTV(lua_State *L, TValue *o1, const TValue *o2)
>
> #if LJ_SOFTFP
> LJ_ASMF int32_t lj_vm_tobit(double x);
> +#if LJ_TARGET_MIPS64
> +LJ_ASMF int32_t lj_vm_tointg(double x);
> +#endif
> #endif
>
> static LJ_AINLINE int32_t lj_num2bit(lua_Number n)
> diff --git a/src/lj_opt_split.c b/src/lj_opt_split.c
> index c0788106..2fc36b8d 100644
> --- a/src/lj_opt_split.c
> +++ b/src/lj_opt_split.c
> @@ -8,7 +8,7 @@
>
> #include "lj_obj.h"
>
> -#if LJ_HASJIT && (LJ_SOFTFP || (LJ_32 && LJ_HASFFI))
> +#if LJ_HASJIT && (LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI))
>
> #include "lj_err.h"
> #include "lj_buf.h"
> diff --git a/src/lj_snap.c b/src/lj_snap.c
> index a063c316..9146cddc 100644
> --- a/src/lj_snap.c
> +++ b/src/lj_snap.c
> @@ -93,7 +93,7 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
> (ir->op2 & (IRSLOAD_READONLY|IRSLOAD_PARENT)) != IRSLOAD_PARENT)
> sn |= SNAP_NORESTORE;
> }
> - if (LJ_SOFTFP && irt_isnum(ir->t))
> + if (LJ_SOFTFP32 && irt_isnum(ir->t))
> sn |= SNAP_SOFTFPNUM;
> map[n++] = sn;
> }
> @@ -379,7 +379,7 @@ IRIns *lj_snap_regspmap(GCtrace *T, SnapNo snapno, IRIns *ir)
> break;
> }
> }
> - } else if (LJ_SOFTFP && ir->o == IR_HIOP) {
> + } else if (LJ_SOFTFP32 && ir->o == IR_HIOP) {
> ref++;
> } else if (ir->o == IR_PVAL) {
> ref = ir->op1 + REF_BIAS;
> @@ -491,7 +491,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
> } else {
> IRType t = irt_type(ir->t);
> uint32_t mode = IRSLOAD_INHERIT|IRSLOAD_PARENT;
> - if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
> + if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
> if (ir->o == IR_SLOAD) mode |= (ir->op2 & IRSLOAD_READONLY);
> tr = emitir_raw(IRT(IR_SLOAD, t), s, mode);
> }
> @@ -525,7 +525,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
> if (irs->r == RID_SINK && snap_sunk_store(T, ir, irs)) {
> if (snap_pref(J, T, map, nent, seen, irs->op2) == 0)
> snap_pref(J, T, map, nent, seen, T->ir[irs->op2].op1);
> - else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> + else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
> irs+1 < irlast && (irs+1)->o == IR_HIOP)
> snap_pref(J, T, map, nent, seen, (irs+1)->op2);
> }
> @@ -584,10 +584,10 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
> lua_assert(irc->o == IR_CONV && irc->op2 == IRCONV_NUM_INT);
> val = snap_pref(J, T, map, nent, seen, irc->op1);
> val = emitir(IRTN(IR_CONV), val, IRCONV_NUM_INT);
> - } else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> + } else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
> irs+1 < irlast && (irs+1)->o == IR_HIOP) {
> IRType t = IRT_I64;
> - if (LJ_SOFTFP && irt_type((irs+1)->t) == IRT_SOFTFP)
> + if (LJ_SOFTFP32 && irt_type((irs+1)->t) == IRT_SOFTFP)
> t = IRT_NUM;
> lj_needsplit(J);
> if (irref_isk(irs->op2) && irref_isk((irs+1)->op2)) {
> @@ -645,7 +645,7 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
> int32_t *sps = &ex->spill[regsp_spill(rs)];
> if (irt_isinteger(t)) {
> setintV(o, *sps);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> } else if (irt_isnum(t)) {
> o->u64 = *(uint64_t *)sps;
> #endif
> @@ -670,6 +670,9 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
> #if !LJ_SOFTFP
> } else if (irt_isnum(t)) {
> setnumV(o, ex->fpr[r-RID_MIN_FPR]);
> +#elif LJ_64 /* && LJ_SOFTFP */
> + } else if (irt_isnum(t)) {
> + o->u64 = ex->gpr[r-RID_MIN_GPR];
> #endif
> #if LJ_64 && !LJ_GC64
> } else if (irt_is64(t)) {
> @@ -823,7 +826,7 @@ static void snap_unsink(jit_State *J, GCtrace *T, ExitState *ex,
> val = lj_tab_set(J->L, t, &tmp);
> /* NOBARRIER: The table is new (marked white). */
> snap_restoreval(J, T, ex, snapno, rfilt, irs->op2, val);
> - if (LJ_SOFTFP && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
> + if (LJ_SOFTFP32 && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
> snap_restoreval(J, T, ex, snapno, rfilt, (irs+1)->op2, &tmp);
> val->u32.hi = tmp.u32.lo;
> }
> @@ -884,7 +887,7 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
> continue;
> }
> snap_restoreval(J, T, ex, snapno, rfilt, ref, o);
> - if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
> + if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
> TValue tmp;
> snap_restoreval(J, T, ex, snapno, rfilt, ref+1, &tmp);
> o->u32.hi = tmp.u32.lo;
> diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
> index 04be38f0..9839b5ac 100644
> --- a/src/vm_mips64.dasc
> +++ b/src/vm_mips64.dasc
> @@ -1984,6 +1984,38 @@ static void build_subroutines(BuildCtx *ctx)
> |1:
> | jr ra
> |. move CRET1, r0
> + |
> + |// FP number to int conversion with a check for soft-float.
> + |// Modifies CARG1, CRET1, CRET2, TMP0, AT.
> + |->vm_tointg:
> + |.if JIT
> + | dsll CRET2, CARG1, 1
> + | beqz CRET2, >2
> + |. li TMP0, 1076
> + | dsrl AT, CRET2, 53
> + | dsubu TMP0, TMP0, AT
> + | sltiu AT, TMP0, 54
> + | beqz AT, >1
> + |. dextm CRET2, CRET2, 0, 20
> + | dinsu CRET2, AT, 21, 21
> + | slt AT, CARG1, r0
> + | dsrlv CRET1, CRET2, TMP0
> + | dsubu CARG1, r0, CRET1
> + | movn CRET1, CARG1, AT
> + | li CARG1, 64
> + | subu TMP0, CARG1, TMP0
> + | dsllv CRET2, CRET2, TMP0 // Integer check.
> + | sextw AT, CRET1
> + | xor AT, CRET1, AT // Range check.
> + | jr ra
> + |. movz CRET2, AT, CRET2
> + |1:
> + | jr ra
> + |. li CRET2, 1
> + |2:
> + | jr ra
> + |. move CRET1, r0
> + |.endif
> |.endif
> |
> |.macro .ffunc_bit, name
> @@ -2669,6 +2701,23 @@ static void build_subroutines(BuildCtx *ctx)
> |. li CRET1, 0
> |.endif
> |
> + |.macro sfmin_max, name, intins
> + |->vm_sf .. name:
> + |.if JIT and not FPU
> + | move TMP2, ra
> + | bal ->vm_sfcmpolt
> + |. nop
> + | move ra, TMP2
> + | move TMP0, CRET1
> + | move CRET1, CARG1
> + | jr ra
> + |. intins CRET1, CARG2, TMP0
> + |.endif
> + |.endmacro
> + |
> + | sfmin_max min, movz
> + | sfmin_max max, movn
> + |
> |//-----------------------------------------------------------------------
> |//-- Miscellaneous functions --------------------------------------------
> |//-----------------------------------------------------------------------
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
2023-08-15 11:27 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:10 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:10 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Your comments are addressed inline below.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:53PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> > Sponsored by Cisco Systems, Inc.
> >
> > (cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)
> >
> > The software floating point library is used on machines which do not
> > have hardware support for floating point [1]. This patch enables
> > support for such machines in JIT compiler backend for MIPS64.
> Typo: s/in JIT/in the JIT/
Fixed.
> > This includes:
> > * `vm_tointg()` helper is added in <src/vm_mips64.dasm> to convert FP
> > number to integer with a check for the soft-float support (called from
> > JIT).
> > * `sfmin/max()` helpers are added in <src/vm_mips64.dasm> for min/max
> > operations with a check for the soft-float support (called from JIT).
> Typo: s/the soft-float/soft-float/
Fixed.
> > * `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS instead
> > `LJ_SOFTFP`.
> > * All fp-depending paths are instrumented with `LJ_SOFTFP` or
> Typo: s/fp-depending/fp-dependent/
Fixed.
> > `LJ_SOFTFP32` macro.
> Typo: s/macro/macros/
Fixed.
> > * The corresponding function calls in <src/lj_ircall.h> are marked as
> > `XA_FP32`, `XA2_FP32`, i.e. as required extra arguments on the stack
> > for soft-FP on 32-bit MIPS.
>
> Shouldn't we also mention the `asm_tobit` function?
I suppose not, since it is still just another implementation of the same
operation for the SOFTFP && LJ_64 mode.
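For the record, the new soft-float 64-bit variant is tiny anyway; here is a
commented recap of the hunk above (the comments are mine, not part of the
patch):

  static void asm_tobit(ASMState *as, IRIns *ir)
  {
    Reg dest = ra_dest(as, ir, RSET_GPR);
    /* The assembler emits machine code backwards, so this `sll dest, dest, 0`
    ** lands *after* the call below and just sign-extends its 32-bit result. */
    emit_dta(as, MIPSI_SLL, dest, dest, 0);
    /* The conversion itself is delegated to the lj_vm_tobit() helper. */
    asm_callid(as, ir, IRCALL_lj_vm_tobit);
  }

So it is already covered by the generic "all fp-dependent paths are
instrumented" bullet.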
> >
> > [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> >
> > Sergey Kaplun:
> > * added the description for the feature
> >
> > Part of tarantool/tarantool#8825
> > ---
<snipped>
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
2023-08-15 11:27 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 16:07 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:07 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey
LGTM
On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in JIT compiler backend for MIPS64.
> This includes:
> * `vm_tointg()` helper is added in <src/vm_mips64.dasm> to convert FP
> number to integer with a check for the soft-float support (called from
> JIT).
> * `sfmin/max()` helpers are added in <src/vm_mips64.dasm> for min/max
> operations with a check for the soft-float support (called from JIT).
> * `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS instead
> `LJ_SOFTFP`.
> * All fp-depending paths are instrumented with `LJ_SOFTFP` or
> `LJ_SOFTFP32` macro.
> * The corresponding function calls in <src/lj_ircall.h> are marked as
> `XA_FP32`, `XA2_FP32`, i.e. as required extra arguments on the stack
> for soft-FP on 32-bit MIPS.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_arch.h | 4 +-
> src/lj_asm.c | 8 +-
> src/lj_asm_mips.h | 217 +++++++++++++++++++++++++++++++++++++--------
> src/lj_crecord.c | 4 +-
> src/lj_emit_mips.h | 2 +
> src/lj_ffrecord.c | 2 +-
> src/lj_ircall.h | 43 ++++++---
> src/lj_iropt.h | 2 +-
> src/lj_jit.h | 4 +-
> src/lj_obj.h | 3 +
> src/lj_opt_split.c | 2 +-
> src/lj_snap.c | 21 +++--
> src/vm_mips64.dasc | 49 ++++++++++
> 13 files changed, 286 insertions(+), 75 deletions(-)
>
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 5276ae56..c39526ea 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -349,9 +349,6 @@
> #define LJ_ARCH_BITS 32
> #define LJ_TARGET_MIPS32 1
> #else
> -#if LJ_ABI_SOFTFP || !LJ_ARCH_HASFPU
> -#define LJ_ARCH_NOJIT 1 /* NYI */
> -#endif
> #define LJ_ARCH_BITS 64
> #define LJ_TARGET_MIPS64 1
> #define LJ_TARGET_GC64 1
> @@ -528,6 +525,7 @@
> #define LJ_ABI_SOFTFP 0
> #endif
> #define LJ_SOFTFP (!LJ_ARCH_HASFPU)
> +#define LJ_SOFTFP32 (LJ_SOFTFP && LJ_32)
>
> #if LJ_ARCH_ENDIAN == LUAJIT_BE
> #define LJ_LE 0
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 0bfa44ed..15de7e33 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -341,7 +341,7 @@ static Reg ra_rematk(ASMState *as, IRRef ref)
> ra_modified(as, r);
> ir->r = RID_INIT; /* Do not keep any hint. */
> RA_DBGX((as, "remat $i $r", ir, r));
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (ir->o == IR_KNUM) {
> emit_loadk64(as, r, ir);
> } else
> @@ -1356,7 +1356,7 @@ static void asm_call(ASMState *as, IRIns *ir)
> asm_gencall(as, ci, args);
> }
>
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> static void asm_fppow(ASMState *as, IRIns *ir, IRRef lref, IRRef rref)
> {
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_pow];
> @@ -1703,10 +1703,10 @@ static void asm_ir(ASMState *as, IRIns *ir)
> case IR_MUL: asm_mul(as, ir); break;
> case IR_MOD: asm_mod(as, ir); break;
> case IR_NEG: asm_neg(as, ir); break;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
> case IR_DIV: case IR_POW: case IR_ABS:
> case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
> - lua_assert(0); /* Unused for LJ_SOFTFP. */
> + lua_assert(0); /* Unused for LJ_SOFTFP32. */
> break;
> #else
> case IR_DIV: asm_div(as, ir); break;
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 0e60fc07..a26a82cd 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -290,7 +290,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> {
> ra_leftov(as, gpr, ref);
> gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
> fpr++;
> #endif
> }
> @@ -301,7 +301,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> emit_spstore(as, ir, r, ofs);
> ofs += irt_isnum(ir->t) ? 8 : 4;
> #else
> - emit_spstore(as, ir, r, ofs + ((LJ_BE && (LJ_SOFTFP || r < RID_MAX_GPR) && !irt_is64(ir->t)) ? 4 : 0));
> + emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isfp(ir->t) && !irt_is64(ir->t)) ? 4 : 0));
> ofs += 8;
> #endif
> }
> @@ -312,7 +312,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> #endif
> if (gpr <= REGARG_LASTGPR) {
> gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
> fpr++;
> #endif
> } else {
> @@ -461,12 +461,36 @@ static void asm_tobit(ASMState *as, IRIns *ir)
> emit_tg(as, MIPSI_MFC1, dest, tmp);
> emit_fgh(as, MIPSI_ADD_D, tmp, left, right);
> }
> +#elif LJ_64 /* && LJ_SOFTFP */
> +static void asm_tointg(ASMState *as, IRIns *ir, Reg r)
> +{
> + /* The modified regs must match with the *.dasc implementation. */
> + RegSet drop = RID2RSET(REGARG_FIRSTGPR)|RID2RSET(RID_RET)|RID2RSET(RID_RET+1)|
> + RID2RSET(RID_R1)|RID2RSET(RID_R12);
> + if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
> + ra_evictset(as, drop);
> + /* Return values are in RID_RET (converted value) and RID_RET+1 (status). */
> + ra_destreg(as, ir, RID_RET);
> + asm_guard(as, MIPSI_BNE, RID_RET+1, RID_ZERO);
> + emit_call(as, (void *)lj_ir_callinfo[IRCALL_lj_vm_tointg].func, 0);
> + if (r == RID_NONE)
> + ra_leftov(as, REGARG_FIRSTGPR, ir->op1);
> + else if (r != REGARG_FIRSTGPR)
> + emit_move(as, REGARG_FIRSTGPR, r);
> +}
> +
> +static void asm_tobit(ASMState *as, IRIns *ir)
> +{
> + Reg dest = ra_dest(as, ir, RSET_GPR);
> + emit_dta(as, MIPSI_SLL, dest, dest, 0);
> + asm_callid(as, ir, IRCALL_lj_vm_tobit);
> +}
> #endif
>
> static void asm_conv(ASMState *as, IRIns *ir)
> {
> IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> int stfp = (st == IRT_NUM || st == IRT_FLOAT);
> #endif
> #if LJ_64
> @@ -477,12 +501,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
> lua_assert(!(irt_isint64(ir->t) ||
> (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
> #endif
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
> /* FP conversions are handled by SPLIT. */
> lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
> /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
> #else
> lua_assert(irt_type(ir->t) != st);
> +#if !LJ_SOFTFP
> if (irt_isfp(ir->t)) {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> if (stfp) { /* FP to FP conversion. */
> @@ -608,6 +633,42 @@ static void asm_conv(ASMState *as, IRIns *ir)
> }
> }
> } else
> +#else
> + if (irt_isfp(ir->t)) {
> +#if LJ_64 && LJ_HASFFI
> + if (stfp) { /* FP to FP conversion. */
> + asm_callid(as, ir, irt_isnum(ir->t) ? IRCALL_softfp_f2d :
> + IRCALL_softfp_d2f);
> + } else { /* Integer to FP conversion. */
> + IRCallID cid = ((IRT_IS64 >> st) & 1) ?
> + (irt_isnum(ir->t) ?
> + (st == IRT_I64 ? IRCALL_fp64_l2d : IRCALL_fp64_ul2d) :
> + (st == IRT_I64 ? IRCALL_fp64_l2f : IRCALL_fp64_ul2f)) :
> + (irt_isnum(ir->t) ?
> + (st == IRT_INT ? IRCALL_softfp_i2d : IRCALL_softfp_ui2d) :
> + (st == IRT_INT ? IRCALL_softfp_i2f : IRCALL_softfp_ui2f));
> + asm_callid(as, ir, cid);
> + }
> +#else
> + asm_callid(as, ir, IRCALL_softfp_i2d);
> +#endif
> + } else if (stfp) { /* FP to integer conversion. */
> + if (irt_isguard(ir->t)) {
> + /* Checked conversions are only supported from number to int. */
> + lua_assert(irt_isint(ir->t) && st == IRT_NUM);
> + asm_tointg(as, ir, RID_NONE);
> + } else {
> + IRCallID cid = irt_is64(ir->t) ?
> + ((st == IRT_NUM) ?
> + (irt_isi64(ir->t) ? IRCALL_fp64_d2l : IRCALL_fp64_d2ul) :
> + (irt_isi64(ir->t) ? IRCALL_fp64_f2l : IRCALL_fp64_f2ul)) :
> + ((st == IRT_NUM) ?
> + (irt_isint(ir->t) ? IRCALL_softfp_d2i : IRCALL_softfp_d2ui) :
> + (irt_isint(ir->t) ? IRCALL_softfp_f2i : IRCALL_softfp_f2ui));
> + asm_callid(as, ir, cid);
> + }
> + } else
> +#endif
> #endif
> {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> @@ -665,7 +726,7 @@ static void asm_strto(ASMState *as, IRIns *ir)
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
> IRRef args[2];
> int32_t ofs = 0;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
> ra_evictset(as, RSET_SCRATCH);
> if (ra_used(ir)) {
> if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> @@ -806,7 +867,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> MCLabel l_end, l_loop, l_next;
>
> rset_clear(allow, tab);
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
> if (!isk) {
> key = ra_alloc1(as, refkey, allow);
> rset_clear(allow, key);
> @@ -826,7 +887,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> }
> }
> #else
> - if (irt_isnum(kt)) {
> + if (!LJ_SOFTFP && irt_isnum(kt)) {
> key = ra_alloc1(as, refkey, RSET_FPR);
> tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
> } else if (!irt_ispri(kt)) {
> @@ -882,6 +943,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
> emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
> emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> + } else if (LJ_SOFTFP && irt_isnum(kt)) {
> + emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> + emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> } else if (irt_isaddr(kt)) {
> Reg refk = tmp2;
> if (isk) {
> @@ -960,7 +1024,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_dta(as, MIPSI_ROTR, dest, tmp1, (-HASH_ROT1)&31);
> if (irt_isnum(kt)) {
> emit_dst(as, MIPSI_ADDU, tmp1, tmp1, tmp1);
> - emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 0);
> + emit_dta(as, MIPSI_DSRA32, tmp1, LJ_SOFTFP ? key : tmp1, 0);
> emit_dta(as, MIPSI_SLL, tmp2, LJ_SOFTFP ? key : tmp1, 0);
> #if !LJ_SOFTFP
> emit_tg(as, MIPSI_DMFC1, tmp1, key);
> @@ -1123,7 +1187,7 @@ static MIPSIns asm_fxloadins(IRIns *ir)
> case IRT_U8: return MIPSI_LBU;
> case IRT_I16: return MIPSI_LH;
> case IRT_U16: return MIPSI_LHU;
> - case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_LDC1;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
> case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
> default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
> }
> @@ -1134,7 +1198,7 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
> switch (irt_type(ir->t)) {
> case IRT_I8: case IRT_U8: return MIPSI_SB;
> case IRT_I16: case IRT_U16: return MIPSI_SH;
> - case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_SDC1;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
> case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
> default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
> }
> @@ -1199,7 +1263,7 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
>
> static void asm_ahuvload(ASMState *as, IRIns *ir)
> {
> - int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> + int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
> Reg dest = RID_NONE, type = RID_TMP, idx;
> RegSet allow = RSET_GPR;
> int32_t ofs = 0;
> @@ -1212,7 +1276,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
> }
> }
> if (ra_used(ir)) {
> - lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> + lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
> irt_isint(ir->t) || irt_isaddr(ir->t));
> dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> @@ -1261,10 +1325,10 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
> int32_t ofs = 0;
> if (ir->r == RID_SINK)
> return;
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> - src = ra_alloc1(as, ir->op2, RSET_FPR);
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> + src = ra_alloc1(as, ir->op2, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
> idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> - emit_hsi(as, MIPSI_SDC1, src, idx, ofs);
> + emit_hsi(as, LJ_SOFTFP ? MIPSI_SD : MIPSI_SDC1, src, idx, ofs);
> } else {
> #if LJ_32
> if (!irt_ispri(ir->t)) {
> @@ -1312,7 +1376,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
> IRType1 t = ir->t;
> #if LJ_32
> int32_t ofs = 8*((int32_t)ir->op1-1) + ((ir->op2 & IRSLOAD_FRAME) ? 4 : 0);
> - int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> + int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
> if (hiop)
> t.irt = IRT_NUM;
> #else
> @@ -1320,7 +1384,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
> #endif
> lua_assert(!(ir->op2 & IRSLOAD_PARENT)); /* Handled by asm_head_side(). */
> lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
> lua_assert(!(ir->op2 & IRSLOAD_CONVERT)); /* Handled by LJ_SOFTFP SPLIT. */
> if (hiop && ra_used(ir+1)) {
> type = ra_dest(as, ir+1, allow);
> @@ -1328,29 +1392,44 @@ static void asm_sload(ASMState *as, IRIns *ir)
> }
> #else
> if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
> - dest = ra_scratch(as, RSET_FPR);
> + dest = ra_scratch(as, LJ_SOFTFP ? allow : RSET_FPR);
> asm_tointg(as, ir, dest);
> t.irt = IRT_NUM; /* Continue with a regular number type check. */
> } else
> #endif
> if (ra_used(ir)) {
> - lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> + lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
> irt_isint(ir->t) || irt_isaddr(ir->t));
> dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> base = ra_alloc1(as, REF_BASE, allow);
> rset_clear(allow, base);
> - if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
> + if (!LJ_SOFTFP32 && (ir->op2 & IRSLOAD_CONVERT)) {
> if (irt_isint(t)) {
> - Reg tmp = ra_scratch(as, RSET_FPR);
> + Reg tmp = ra_scratch(as, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
> +#if LJ_SOFTFP
> + ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> + ra_destreg(as, ir, RID_RET);
> + emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_d2i].func, 0);
> + if (tmp != REGARG_FIRSTGPR)
> + emit_move(as, REGARG_FIRSTGPR, tmp);
> +#else
> emit_tg(as, MIPSI_MFC1, dest, tmp);
> emit_fg(as, MIPSI_TRUNC_W_D, tmp, tmp);
> +#endif
> dest = tmp;
> t.irt = IRT_NUM; /* Check for original type. */
> } else {
> Reg tmp = ra_scratch(as, RSET_GPR);
> +#if LJ_SOFTFP
> + ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> + ra_destreg(as, ir, RID_RET);
> + emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_i2d].func, 0);
> + emit_dta(as, MIPSI_SLL, REGARG_FIRSTGPR, tmp, 0);
> +#else
> emit_fg(as, MIPSI_CVT_D_W, dest, dest);
> emit_tg(as, MIPSI_MTC1, tmp, dest);
> +#endif
> dest = tmp;
> t.irt = IRT_INT; /* Check for original type. */
> }
> @@ -1399,7 +1478,7 @@ dotypecheck:
> if (irt_isnum(t)) {
> asm_guard(as, MIPSI_BEQ, RID_TMP, RID_ZERO);
> emit_tsi(as, MIPSI_SLTIU, RID_TMP, RID_TMP, (int32_t)LJ_TISNUM);
> - if (ra_hasreg(dest))
> + if (!LJ_SOFTFP && ra_hasreg(dest))
> emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
> } else {
> asm_guard(as, MIPSI_BNE, RID_TMP,
> @@ -1409,7 +1488,7 @@ dotypecheck:
> }
> emit_tsi(as, MIPSI_LD, type, base, ofs);
> } else if (ra_hasreg(dest)) {
> - if (irt_isnum(t))
> + if (!LJ_SOFTFP && irt_isnum(t))
> emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
> else
> emit_tsi(as, irt_isint(t) ? MIPSI_LW : MIPSI_LD, dest, base,
> @@ -1554,26 +1633,40 @@ static void asm_fpunary(ASMState *as, IRIns *ir, MIPSIns mi)
> Reg left = ra_hintalloc(as, ir->op1, dest, RSET_FPR);
> emit_fg(as, mi, dest, left);
> }
> +#endif
>
> +#if !LJ_SOFTFP32
> static void asm_fpmath(ASMState *as, IRIns *ir)
> {
> if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir))
> return;
> +#if !LJ_SOFTFP
> if (ir->op2 <= IRFPM_TRUNC)
> asm_callround(as, ir, IRCALL_lj_vm_floor + ir->op2);
> else if (ir->op2 == IRFPM_SQRT)
> asm_fpunary(as, ir, MIPSI_SQRT_D);
> else
> +#endif
> asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
> }
> #endif
>
> +#if !LJ_SOFTFP
> +#define asm_fpadd(as, ir) asm_fparith(as, ir, MIPSI_ADD_D)
> +#define asm_fpsub(as, ir) asm_fparith(as, ir, MIPSI_SUB_D)
> +#define asm_fpmul(as, ir) asm_fparith(as, ir, MIPSI_MUL_D)
> +#elif LJ_64 /* && LJ_SOFTFP */
> +#define asm_fpadd(as, ir) asm_callid(as, ir, IRCALL_softfp_add)
> +#define asm_fpsub(as, ir) asm_callid(as, ir, IRCALL_softfp_sub)
> +#define asm_fpmul(as, ir) asm_callid(as, ir, IRCALL_softfp_mul)
> +#endif
> +
> static void asm_add(ASMState *as, IRIns *ir)
> {
> IRType1 t = ir->t;
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (irt_isnum(t)) {
> - asm_fparith(as, ir, MIPSI_ADD_D);
> + asm_fpadd(as, ir);
> } else
> #endif
> {
> @@ -1595,9 +1688,9 @@ static void asm_add(ASMState *as, IRIns *ir)
>
> static void asm_sub(ASMState *as, IRIns *ir)
> {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (irt_isnum(ir->t)) {
> - asm_fparith(as, ir, MIPSI_SUB_D);
> + asm_fpsub(as, ir);
> } else
> #endif
> {
> @@ -1611,9 +1704,9 @@ static void asm_sub(ASMState *as, IRIns *ir)
>
> static void asm_mul(ASMState *as, IRIns *ir)
> {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> if (irt_isnum(ir->t)) {
> - asm_fparith(as, ir, MIPSI_MUL_D);
> + asm_fpmul(as, ir);
> } else
> #endif
> {
> @@ -1640,7 +1733,7 @@ static void asm_mod(ASMState *as, IRIns *ir)
> asm_callid(as, ir, IRCALL_lj_vm_modi);
> }
>
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> static void asm_pow(ASMState *as, IRIns *ir)
> {
> #if LJ_64 && LJ_HASFFI
> @@ -1660,7 +1753,11 @@ static void asm_div(ASMState *as, IRIns *ir)
> IRCALL_lj_carith_divu64);
> else
> #endif
> +#if !LJ_SOFTFP
> asm_fparith(as, ir, MIPSI_DIV_D);
> +#else
> + asm_callid(as, ir, IRCALL_softfp_div);
> +#endif
> }
> #endif
>
> @@ -1670,6 +1767,13 @@ static void asm_neg(ASMState *as, IRIns *ir)
> if (irt_isnum(ir->t)) {
> asm_fpunary(as, ir, MIPSI_NEG_D);
> } else
> +#elif LJ_64 /* && LJ_SOFTFP */
> + if (irt_isnum(ir->t)) {
> + Reg dest = ra_dest(as, ir, RSET_GPR);
> + Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> + emit_dst(as, MIPSI_XOR, dest, left,
> + ra_allock(as, 0x8000000000000000ll, rset_exclude(RSET_GPR, dest)));
> + } else
> #endif
> {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> @@ -1679,7 +1783,17 @@ static void asm_neg(ASMState *as, IRIns *ir)
> }
> }
>
> +#if !LJ_SOFTFP
> #define asm_abs(as, ir) asm_fpunary(as, ir, MIPSI_ABS_D)
> +#elif LJ_64 /* && LJ_SOFTFP */
> +static void asm_abs(ASMState *as, IRIns *ir)
> +{
> + Reg dest = ra_dest(as, ir, RSET_GPR);
> + Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> + emit_tsml(as, MIPSI_DEXTM, dest, left, 30, 0);
> +}
> +#endif
> +
> #define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
> #define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
>
> @@ -1924,15 +2038,21 @@ static void asm_bror(ASMState *as, IRIns *ir)
> }
> }
>
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
> static void asm_sfpmin_max(ASMState *as, IRIns *ir)
> {
> CCallInfo ci = lj_ir_callinfo[(IROp)ir->o == IR_MIN ? IRCALL_lj_vm_sfmin : IRCALL_lj_vm_sfmax];
> +#if LJ_64
> + IRRef args[2];
> + args[0] = ir->op1;
> + args[1] = ir->op2;
> +#else
> IRRef args[4];
> args[0^LJ_BE] = ir->op1;
> args[1^LJ_BE] = (ir+1)->op1;
> args[2^LJ_BE] = ir->op2;
> args[3^LJ_BE] = (ir+1)->op2;
> +#endif
> asm_setupresult(as, ir, &ci);
> emit_call(as, (void *)ci.func, 0);
> ci.func = NULL;
> @@ -1942,7 +2062,10 @@ static void asm_sfpmin_max(ASMState *as, IRIns *ir)
>
> static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
> {
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + asm_sfpmin_max(as, ir);
> +#else
> Reg dest = ra_dest(as, ir, RSET_FPR);
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> right = (left >> 8); left &= 255;
> @@ -1953,6 +2076,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
> if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
> }
> emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
> +#endif
> } else {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg right, left = ra_alloc2(as, ir, RSET_GPR);
> @@ -1973,18 +2097,24 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>
> /* -- Comparisons --------------------------------------------------------- */
>
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
> /* SFP comparisons. */
> static void asm_sfpcomp(ASMState *as, IRIns *ir)
> {
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
> RegSet drop = RSET_SCRATCH;
> Reg r;
> +#if LJ_64
> + IRRef args[2];
> + args[0] = ir->op1;
> + args[1] = ir->op2;
> +#else
> IRRef args[4];
> args[LJ_LE ? 0 : 1] = ir->op1; args[LJ_LE ? 1 : 0] = (ir+1)->op1;
> args[LJ_LE ? 2 : 3] = ir->op2; args[LJ_LE ? 3 : 2] = (ir+1)->op2;
> +#endif
>
> - for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> + for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+(LJ_64?1:3); r++) {
> if (!rset_test(as->freeset, r) &&
> regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
> rset_clear(drop, r);
> @@ -2038,11 +2168,15 @@ static void asm_comp(ASMState *as, IRIns *ir)
> {
> /* ORDER IR: LT GE LE GT ULT UGE ULE UGT. */
> IROp op = ir->o;
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + asm_sfpcomp(as, ir);
> +#else
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> right = (left >> 8); left &= 255;
> asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
> emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
> +#endif
> } else {
> Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
> if (op == IR_ABC) op = IR_UGT;
> @@ -2074,9 +2208,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
> Reg right, left = ra_alloc2(as, ir, (!LJ_SOFTFP && irt_isnum(ir->t)) ?
> RSET_FPR : RSET_GPR);
> right = (left >> 8); left &= 255;
> - if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + asm_sfpcomp(as, ir);
> +#else
> asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
> emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
> +#endif
> } else {
> asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
> }
> @@ -2269,7 +2407,7 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if ((sn & SNAP_NORESTORE))
> continue;
> if (irt_isnum(ir->t)) {
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
> Reg tmp;
> RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> lua_assert(irref_isk(ref)); /* LJ_SOFTFP: must be a number constant. */
> @@ -2278,6 +2416,9 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
> tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
> emit_tsi(as, MIPSI_SW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#elif LJ_SOFTFP /* && LJ_64 */
> + Reg src = ra_alloc1(as, ref, rset_exclude(RSET_GPR, RID_BASE));
> + emit_tsi(as, MIPSI_SD, src, RID_BASE, ofs);
> #else
> Reg src = ra_alloc1(as, ref, RSET_FPR);
> emit_hsi(as, MIPSI_SDC1, src, RID_BASE, ofs);
> diff --git a/src/lj_crecord.c b/src/lj_crecord.c
> index ffe995f4..804cdbf4 100644
> --- a/src/lj_crecord.c
> +++ b/src/lj_crecord.c
> @@ -212,7 +212,7 @@ static void crec_copy_emit(jit_State *J, CRecMemList *ml, MSize mlp,
> ml[i].trval = emitir(IRT(IR_XLOAD, ml[i].tp), trsptr, 0);
> ml[i].trofs = trofs;
> i++;
> - rwin += (LJ_SOFTFP && ml[i].tp == IRT_NUM) ? 2 : 1;
> + rwin += (LJ_SOFTFP32 && ml[i].tp == IRT_NUM) ? 2 : 1;
> if (rwin >= CREC_COPY_REGWIN || i >= mlp) { /* Flush buffered stores. */
> rwin = 0;
> for ( ; j < i; j++) {
> @@ -1152,7 +1152,7 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd,
> else
> tr = emitconv(tr, IRT_INT, d->size==1 ? IRT_I8 : IRT_I16,IRCONV_SEXT);
> }
> - } else if (LJ_SOFTFP && ctype_isfp(d->info) && d->size > 4) {
> + } else if (LJ_SOFTFP32 && ctype_isfp(d->info) && d->size > 4) {
> lj_needsplit(J);
> }
> #if LJ_TARGET_X86
> diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
> index 8a9ee24d..bb6593ae 100644
> --- a/src/lj_emit_mips.h
> +++ b/src/lj_emit_mips.h
> @@ -12,6 +12,8 @@ static intptr_t get_k64val(IRIns *ir)
> return (intptr_t)ir_kgc(ir);
> } else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) {
> return (intptr_t)ir_kptr(ir);
> + } else if (LJ_SOFTFP && ir->o == IR_KNUM) {
> + return (intptr_t)ir_knum(ir)->u64;
> } else {
> lua_assert(ir->o == IR_KINT || ir->o == IR_KNULL);
> return ir->i; /* Sign-extended. */
> diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
> index 8af9da1d..0746ec64 100644
> --- a/src/lj_ffrecord.c
> +++ b/src/lj_ffrecord.c
> @@ -986,7 +986,7 @@ static void LJ_FASTCALL recff_string_format(jit_State *J, RecordFFData *rd)
> handle_num:
> tra = lj_ir_tonum(J, tra);
> tr = lj_ir_call(J, id, tr, trsf, tra);
> - if (LJ_SOFTFP) lj_needsplit(J);
> + if (LJ_SOFTFP32) lj_needsplit(J);
> break;
> case STRFMT_STR:
> if (!tref_isstr(tra)) {
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index aa06b273..c1ac29d1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -52,7 +52,7 @@ typedef struct CCallInfo {
> #define CCI_XARGS(ci) (((ci)->flags >> CCI_XARGS_SHIFT) & 3)
> #define CCI_XA (1u << CCI_XARGS_SHIFT)
>
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> #define CCI_XNARGS(ci) (CCI_NARGS((ci)) + CCI_XARGS((ci)))
> #else
> #define CCI_XNARGS(ci) CCI_NARGS((ci))
> @@ -79,13 +79,19 @@ typedef struct CCallInfo {
> #define IRCALLCOND_SOFTFP_FFI(x) NULL
> #endif
>
> -#if LJ_SOFTFP && LJ_TARGET_MIPS32
> +#if LJ_SOFTFP && LJ_TARGET_MIPS
> #define IRCALLCOND_SOFTFP_MIPS(x) x
> #else
> #define IRCALLCOND_SOFTFP_MIPS(x) NULL
> #endif
>
> -#define LJ_NEED_FP64 (LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS32)
> +#if LJ_SOFTFP && LJ_TARGET_MIPS64
> +#define IRCALLCOND_SOFTFP_MIPS64(x) x
> +#else
> +#define IRCALLCOND_SOFTFP_MIPS64(x) NULL
> +#endif
> +
> +#define LJ_NEED_FP64 (LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS)
>
> #if LJ_HASFFI && (LJ_SOFTFP || LJ_NEED_FP64)
> #define IRCALLCOND_FP64_FFI(x) x
> @@ -113,6 +119,14 @@ typedef struct CCallInfo {
> #define XA2_FP 0
> #endif
>
> +#if LJ_SOFTFP32
> +#define XA_FP32 CCI_XA
> +#define XA2_FP32 (CCI_XA+CCI_XA)
> +#else
> +#define XA_FP32 0
> +#define XA2_FP32 0
> +#endif
> +
> #if LJ_32
> #define XA_64 CCI_XA
> #define XA2_64 (CCI_XA+CCI_XA)
> @@ -185,20 +199,21 @@ typedef struct CCallInfo {
> _(ANY, pow, 2, N, NUM, XA2_FP) \
> _(ANY, atan2, 2, N, NUM, XA2_FP) \
> _(ANY, ldexp, 2, N, NUM, XA_FP) \
> - _(SOFTFP, lj_vm_tobit, 2, N, INT, 0) \
> - _(SOFTFP, softfp_add, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_sub, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_mul, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_div, 4, N, NUM, 0) \
> - _(SOFTFP, softfp_cmp, 4, N, NIL, 0) \
> + _(SOFTFP, lj_vm_tobit, 1, N, INT, XA_FP32) \
> + _(SOFTFP, softfp_add, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_sub, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_mul, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_div, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP, softfp_cmp, 2, N, NIL, XA2_FP32) \
> _(SOFTFP, softfp_i2d, 1, N, NUM, 0) \
> - _(SOFTFP, softfp_d2i, 2, N, INT, 0) \
> - _(SOFTFP_MIPS, lj_vm_sfmin, 4, N, NUM, 0) \
> - _(SOFTFP_MIPS, lj_vm_sfmax, 4, N, NUM, 0) \
> + _(SOFTFP, softfp_d2i, 1, N, INT, XA_FP32) \
> + _(SOFTFP_MIPS, lj_vm_sfmin, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP_MIPS, lj_vm_sfmax, 2, N, NUM, XA2_FP32) \
> + _(SOFTFP_MIPS64, lj_vm_tointg, 1, N, INT, 0) \
> _(SOFTFP_FFI, softfp_ui2d, 1, N, NUM, 0) \
> _(SOFTFP_FFI, softfp_f2d, 1, N, NUM, 0) \
> - _(SOFTFP_FFI, softfp_d2ui, 2, N, INT, 0) \
> - _(SOFTFP_FFI, softfp_d2f, 2, N, FLOAT, 0) \
> + _(SOFTFP_FFI, softfp_d2ui, 1, N, INT, XA_FP32) \
> + _(SOFTFP_FFI, softfp_d2f, 1, N, FLOAT, XA_FP32) \
> _(SOFTFP_FFI, softfp_i2f, 1, N, FLOAT, 0) \
> _(SOFTFP_FFI, softfp_ui2f, 1, N, FLOAT, 0) \
> _(SOFTFP_FFI, softfp_f2i, 1, N, INT, 0) \
> diff --git a/src/lj_iropt.h b/src/lj_iropt.h
> index 73aef0ef..a59ba3f4 100644
> --- a/src/lj_iropt.h
> +++ b/src/lj_iropt.h
> @@ -150,7 +150,7 @@ LJ_FUNC IRType lj_opt_narrow_forl(jit_State *J, cTValue *forbase);
> /* Optimization passes. */
> LJ_FUNC void lj_opt_dce(jit_State *J);
> LJ_FUNC int lj_opt_loop(jit_State *J);
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> LJ_FUNC void lj_opt_split(jit_State *J);
> #else
> #define lj_opt_split(J) UNUSED(J)
> diff --git a/src/lj_jit.h b/src/lj_jit.h
> index cc8efd20..c06829ab 100644
> --- a/src/lj_jit.h
> +++ b/src/lj_jit.h
> @@ -375,7 +375,7 @@ enum {
> ((TValue *)(((intptr_t)&J->ksimd[2*(n)] + 15) & ~(intptr_t)15))
>
> /* Set/reset flag to activate the SPLIT pass for the current trace. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> #define lj_needsplit(J) (J->needsplit = 1)
> #define lj_resetsplit(J) (J->needsplit = 0)
> #else
> @@ -438,7 +438,7 @@ typedef struct jit_State {
> MSize sizesnapmap; /* Size of temp. snapshot map buffer. */
>
> PostProc postproc; /* Required post-processing after execution. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
> uint8_t needsplit; /* Need SPLIT pass. */
> #endif
> uint8_t retryrec; /* Retry recording. */
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 45507e0d..bf95e1eb 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -984,6 +984,9 @@ static LJ_AINLINE void copyTV(lua_State *L, TValue *o1, const TValue *o2)
>
> #if LJ_SOFTFP
> LJ_ASMF int32_t lj_vm_tobit(double x);
> +#if LJ_TARGET_MIPS64
> +LJ_ASMF int32_t lj_vm_tointg(double x);
> +#endif
> #endif
>
> static LJ_AINLINE int32_t lj_num2bit(lua_Number n)
> diff --git a/src/lj_opt_split.c b/src/lj_opt_split.c
> index c0788106..2fc36b8d 100644
> --- a/src/lj_opt_split.c
> +++ b/src/lj_opt_split.c
> @@ -8,7 +8,7 @@
>
> #include "lj_obj.h"
>
> -#if LJ_HASJIT && (LJ_SOFTFP || (LJ_32 && LJ_HASFFI))
> +#if LJ_HASJIT && (LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI))
>
> #include "lj_err.h"
> #include "lj_buf.h"
> diff --git a/src/lj_snap.c b/src/lj_snap.c
> index a063c316..9146cddc 100644
> --- a/src/lj_snap.c
> +++ b/src/lj_snap.c
> @@ -93,7 +93,7 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
> (ir->op2 & (IRSLOAD_READONLY|IRSLOAD_PARENT)) != IRSLOAD_PARENT)
> sn |= SNAP_NORESTORE;
> }
> - if (LJ_SOFTFP && irt_isnum(ir->t))
> + if (LJ_SOFTFP32 && irt_isnum(ir->t))
> sn |= SNAP_SOFTFPNUM;
> map[n++] = sn;
> }
> @@ -379,7 +379,7 @@ IRIns *lj_snap_regspmap(GCtrace *T, SnapNo snapno, IRIns *ir)
> break;
> }
> }
> - } else if (LJ_SOFTFP && ir->o == IR_HIOP) {
> + } else if (LJ_SOFTFP32 && ir->o == IR_HIOP) {
> ref++;
> } else if (ir->o == IR_PVAL) {
> ref = ir->op1 + REF_BIAS;
> @@ -491,7 +491,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
> } else {
> IRType t = irt_type(ir->t);
> uint32_t mode = IRSLOAD_INHERIT|IRSLOAD_PARENT;
> - if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
> + if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
> if (ir->o == IR_SLOAD) mode |= (ir->op2 & IRSLOAD_READONLY);
> tr = emitir_raw(IRT(IR_SLOAD, t), s, mode);
> }
> @@ -525,7 +525,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
> if (irs->r == RID_SINK && snap_sunk_store(T, ir, irs)) {
> if (snap_pref(J, T, map, nent, seen, irs->op2) == 0)
> snap_pref(J, T, map, nent, seen, T->ir[irs->op2].op1);
> - else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> + else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
> irs+1 < irlast && (irs+1)->o == IR_HIOP)
> snap_pref(J, T, map, nent, seen, (irs+1)->op2);
> }
> @@ -584,10 +584,10 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
> lua_assert(irc->o == IR_CONV && irc->op2 == IRCONV_NUM_INT);
> val = snap_pref(J, T, map, nent, seen, irc->op1);
> val = emitir(IRTN(IR_CONV), val, IRCONV_NUM_INT);
> - } else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> + } else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
> irs+1 < irlast && (irs+1)->o == IR_HIOP) {
> IRType t = IRT_I64;
> - if (LJ_SOFTFP && irt_type((irs+1)->t) == IRT_SOFTFP)
> + if (LJ_SOFTFP32 && irt_type((irs+1)->t) == IRT_SOFTFP)
> t = IRT_NUM;
> lj_needsplit(J);
> if (irref_isk(irs->op2) && irref_isk((irs+1)->op2)) {
> @@ -645,7 +645,7 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
> int32_t *sps = &ex->spill[regsp_spill(rs)];
> if (irt_isinteger(t)) {
> setintV(o, *sps);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
> } else if (irt_isnum(t)) {
> o->u64 = *(uint64_t *)sps;
> #endif
> @@ -670,6 +670,9 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
> #if !LJ_SOFTFP
> } else if (irt_isnum(t)) {
> setnumV(o, ex->fpr[r-RID_MIN_FPR]);
> +#elif LJ_64 /* && LJ_SOFTFP */
> + } else if (irt_isnum(t)) {
> + o->u64 = ex->gpr[r-RID_MIN_GPR];
> #endif
> #if LJ_64 && !LJ_GC64
> } else if (irt_is64(t)) {
> @@ -823,7 +826,7 @@ static void snap_unsink(jit_State *J, GCtrace *T, ExitState *ex,
> val = lj_tab_set(J->L, t, &tmp);
> /* NOBARRIER: The table is new (marked white). */
> snap_restoreval(J, T, ex, snapno, rfilt, irs->op2, val);
> - if (LJ_SOFTFP && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
> + if (LJ_SOFTFP32 && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
> snap_restoreval(J, T, ex, snapno, rfilt, (irs+1)->op2, &tmp);
> val->u32.hi = tmp.u32.lo;
> }
> @@ -884,7 +887,7 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
> continue;
> }
> snap_restoreval(J, T, ex, snapno, rfilt, ref, o);
> - if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
> + if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
> TValue tmp;
> snap_restoreval(J, T, ex, snapno, rfilt, ref+1, &tmp);
> o->u32.hi = tmp.u32.lo;
> diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
> index 04be38f0..9839b5ac 100644
> --- a/src/vm_mips64.dasc
> +++ b/src/vm_mips64.dasc
> @@ -1984,6 +1984,38 @@ static void build_subroutines(BuildCtx *ctx)
> |1:
> | jr ra
> |. move CRET1, r0
> + |
> + |// FP number to int conversion with a check for soft-float.
> + |// Modifies CARG1, CRET1, CRET2, TMP0, AT.
> + |->vm_tointg:
> + |.if JIT
> + | dsll CRET2, CARG1, 1
> + | beqz CRET2, >2
> + |. li TMP0, 1076
> + | dsrl AT, CRET2, 53
> + | dsubu TMP0, TMP0, AT
> + | sltiu AT, TMP0, 54
> + | beqz AT, >1
> + |. dextm CRET2, CRET2, 0, 20
> + | dinsu CRET2, AT, 21, 21
> + | slt AT, CARG1, r0
> + | dsrlv CRET1, CRET2, TMP0
> + | dsubu CARG1, r0, CRET1
> + | movn CRET1, CARG1, AT
> + | li CARG1, 64
> + | subu TMP0, CARG1, TMP0
> + | dsllv CRET2, CRET2, TMP0 // Integer check.
> + | sextw AT, CRET1
> + | xor AT, CRET1, AT // Range check.
> + | jr ra
> + |. movz CRET2, AT, CRET2
> + |1:
> + | jr ra
> + |. li CRET2, 1
> + |2:
> + | jr ra
> + |. move CRET1, r0
> + |.endif
> |.endif
> |
> |.macro .ffunc_bit, name
> @@ -2669,6 +2701,23 @@ static void build_subroutines(BuildCtx *ctx)
> |. li CRET1, 0
> |.endif
> |
> + |.macro sfmin_max, name, intins
> + |->vm_sf .. name:
> + |.if JIT and not FPU
> + | move TMP2, ra
> + | bal ->vm_sfcmpolt
> + |. nop
> + | move ra, TMP2
> + | move TMP0, CRET1
> + | move CRET1, CARG1
> + | jr ra
> + |. intins CRET1, CARG2, TMP0
> + |.endif
> + |.endmacro
> + |
> + | sfmin_max min, movz
> + | sfmin_max max, movn
> + |
> |//-----------------------------------------------------------------------
> |//-- Miscellaneous functions --------------------------------------------
> |//-----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (3 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 11:40 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 14:53 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
` (16 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
Sponsored by Cisco Systems, Inc.
(cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
The software floating-point library is used on machines that have no
hardware floating-point support [1]. This patch enables support for
such machines in the PowerPC VM (interpreter).
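
As background (and not part of the patch), here is a minimal sketch of
what [1] refers to: on a soft-float target the compiler lowers every
double operation into a call to a libgcc helper. The declarations below
are illustrative only; the names follow the GCC soft-float routine
naming, and `__ledf2` is the comparison helper this patch calls from
the interpreter for `math.min`/`math.max`.

/* Illustrative sketch, assuming a soft-float (-msoft-float) toolchain. */
extern double __adddf3(double a, double b); /* result of a + b */
extern int __ledf2(double a, double b);     /* <= 0 when a <= b (no NaNs) */

double add_one(double x)
{
  /* On a soft-float build this '+' compiles into a call to __adddf3(). */
  return x + 1.0;
}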
The changes include:
* All loads/stores of double values go through 32-bit registers, using
the `lo` and `hi` parts of the TValue union (a minimal C sketch follows
this list).
* The `.FPU` macro is added to skip instructions needed only for
hard-float builds (for example, loading/storing floating-point
registers on the stack when entering/leaving the VM).
* r25, now named `SAVE1`, is used as an additional saved temporary
register (needed in several fast functions).
* The `sfi2d` macro is introduced to convert an integer to a double in
soft-float representation. It takes destination and source registers
and uses `TMP0` and `TMP1`.
* The `sfpmod` macro is introduced for the soft-float `fmod` built-in.
* `ins_arith` now takes a third parameter: the operation to use for
soft-float arithmetic.
* The `LJ_ARCH_HASFPU` and `LJ_ABI_SOFTFP` macros are introduced to
reflect whether `_SOFT_FLOAT` or `_SOFT_DOUBLE` is defined.
`LJ_ARCH_NUMMODE` is set to `LJ_NUMMODE_DUAL` when `LJ_ABI_SOFTFP` is
true.
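
For illustration only (not from the patch): a minimal C sketch of the
paired 32-bit accesses described in the first item above. The union is
a simplified stand-in for LuaJIT's TValue, and the field names and
lo/hi order are assumptions of this sketch (the real layout depends on
endianness).

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for a stack/table slot; not LuaJIT's real TValue. */
typedef union SlotTV {
  double n;                          /* number payload */
  struct { uint32_t lo, hi; } u32;   /* 32-bit halves; order is endian-dependent */
} SlotTV;

/* Soft-float path: copy a double-sized slot with two 32-bit moves,
 * mirroring the lwz/stw pairs that replace lfd/stfd in the patch. */
static void copy_slot_soft(SlotTV *dst, const SlotTV *src)
{
  dst->u32.lo = src->u32.lo;
  dst->u32.hi = src->u32.hi;
}

int main(void)
{
  SlotTV a = { .n = 3.25 }, b;
  copy_slot_soft(&b, &a);
  printf("%g\n", b.n); /* prints 3.25 */
  return 0;
}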
Support for soft-float in the JIT compiler backend will be added in the
next patch.
[1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
Sergey Kaplun:
* added the description for the feature
Part of tarantool/tarantool#8825
---
src/host/buildvm_asm.c | 2 +-
src/lj_arch.h | 29 +-
src/lj_ccall.c | 38 +-
src/lj_ccall.h | 4 +-
src/lj_ccallback.c | 30 +-
src/lj_frame.h | 2 +-
src/lj_ircall.h | 2 +-
src/vm_ppc.dasc | 1249 +++++++++++++++++++++++++++++++++-------
8 files changed, 1101 insertions(+), 255 deletions(-)
diff --git a/src/host/buildvm_asm.c b/src/host/buildvm_asm.c
index ffd14903..43595b31 100644
--- a/src/host/buildvm_asm.c
+++ b/src/host/buildvm_asm.c
@@ -338,7 +338,7 @@ void emit_asm(BuildCtx *ctx)
#if !(LJ_TARGET_PS3 || LJ_TARGET_PSVITA)
fprintf(ctx->fp, "\t.section .note.GNU-stack,\"\"," ELFASM_PX "progbits\n");
#endif
-#if LJ_TARGET_PPC && !LJ_TARGET_PS3
+#if LJ_TARGET_PPC && !LJ_TARGET_PS3 && !LJ_ABI_SOFTFP
/* Hard-float ABI. */
fprintf(ctx->fp, "\t.gnu_attribute 4, 1\n");
#endif
diff --git a/src/lj_arch.h b/src/lj_arch.h
index c39526ea..8bb8757d 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -262,6 +262,29 @@
#else
#define LJ_ARCH_BITS 32
#define LJ_ARCH_NAME "ppc"
+
+#if !defined(LJ_ARCH_HASFPU)
+#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
+#define LJ_ARCH_HASFPU 0
+#else
+#define LJ_ARCH_HASFPU 1
+#endif
+#endif
+
+#if !defined(LJ_ABI_SOFTFP)
+#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
+#define LJ_ABI_SOFTFP 1
+#else
+#define LJ_ABI_SOFTFP 0
+#endif
+#endif
+#endif
+
+#if LJ_ABI_SOFTFP
+#define LJ_ARCH_NOJIT 1 /* NYI */
+#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL
+#else
+#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
#endif
#define LJ_TARGET_PPC 1
@@ -271,7 +294,6 @@
#define LJ_TARGET_MASKSHIFT 0
#define LJ_TARGET_MASKROT 1
#define LJ_TARGET_UNIFYROT 1 /* Want only IR_BROL. */
-#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
#if LJ_TARGET_CONSOLE
#define LJ_ARCH_PPC32ON64 1
@@ -431,16 +453,13 @@
#error "No support for ILP32 model on ARM64"
#endif
#elif LJ_TARGET_PPC
-#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
-#error "No support for PowerPC CPUs without double-precision FPU"
-#endif
#if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
#error "No support for little-endian PPC32"
#endif
#if LJ_ARCH_PPC64
#error "No support for PowerPC 64 bit mode (yet)"
#endif
-#ifdef __NO_FPRS__
+#if defined(__NO_FPRS__) && !defined(_SOFT_FLOAT)
#error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
#endif
#elif LJ_TARGET_MIPS32
diff --git a/src/lj_ccall.c b/src/lj_ccall.c
index d39ff861..c1e12f56 100644
--- a/src/lj_ccall.c
+++ b/src/lj_ccall.c
@@ -388,6 +388,24 @@
#define CCALL_HANDLE_COMPLEXARG \
/* Pass complex by value in 2 or 4 GPRs. */
+#define CCALL_HANDLE_GPR \
+ /* Try to pass argument in GPRs. */ \
+ if (n > 1) { \
+ lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \
+ if (ctype_isinteger(d->info) || ctype_isfp(d->info)) \
+ ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
+ else if (ngpr + n > maxgpr) \
+ ngpr = maxgpr; /* Prevent reordering. */ \
+ } \
+ if (ngpr + n <= maxgpr) { \
+ dp = &cc->gpr[ngpr]; \
+ ngpr += n; \
+ goto done; \
+ } \
+
+#if LJ_ABI_SOFTFP
+#define CCALL_HANDLE_REGARG CCALL_HANDLE_GPR
+#else
#define CCALL_HANDLE_REGARG \
if (isfp) { /* Try to pass argument in FPRs. */ \
if (nfpr + 1 <= CCALL_NARG_FPR) { \
@@ -396,24 +414,16 @@
d = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */ \
goto done; \
} \
- } else { /* Try to pass argument in GPRs. */ \
- if (n > 1) { \
- lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \
- if (ctype_isinteger(d->info)) \
- ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
- else if (ngpr + n > maxgpr) \
- ngpr = maxgpr; /* Prevent reordering. */ \
- } \
- if (ngpr + n <= maxgpr) { \
- dp = &cc->gpr[ngpr]; \
- ngpr += n; \
- goto done; \
- } \
+ } else { \
+ CCALL_HANDLE_GPR \
}
+#endif
+#if !LJ_ABI_SOFTFP
#define CCALL_HANDLE_RET \
if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
ctr = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */
+#endif
#elif LJ_TARGET_MIPS32
/* -- MIPS o32 calling conventions ---------------------------------------- */
@@ -1081,7 +1091,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
}
if (fid) lj_err_caller(L, LJ_ERR_FFI_NUMARG); /* Too few arguments. */
-#if LJ_TARGET_X64 || LJ_TARGET_PPC
+#if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP)
cc->nfpr = nfpr; /* Required for vararg functions. */
#endif
cc->nsp = nsp;
diff --git a/src/lj_ccall.h b/src/lj_ccall.h
index 59f66481..6efa48c7 100644
--- a/src/lj_ccall.h
+++ b/src/lj_ccall.h
@@ -86,9 +86,9 @@ typedef union FPRArg {
#elif LJ_TARGET_PPC
#define CCALL_NARG_GPR 8
-#define CCALL_NARG_FPR 8
+#define CCALL_NARG_FPR (LJ_ABI_SOFTFP ? 0 : 8)
#define CCALL_NRET_GPR 4 /* For complex double. */
-#define CCALL_NRET_FPR 1
+#define CCALL_NRET_FPR (LJ_ABI_SOFTFP ? 0 : 1)
#define CCALL_SPS_EXTRA 4
#define CCALL_SPS_FREE 0
diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
index 224b6b94..c33190d7 100644
--- a/src/lj_ccallback.c
+++ b/src/lj_ccallback.c
@@ -419,6 +419,23 @@ void lj_ccallback_mcode_free(CTState *cts)
#elif LJ_TARGET_PPC
+#define CALLBACK_HANDLE_GPR \
+ if (n > 1) { \
+ lua_assert(((LJ_ABI_SOFTFP && ctype_isnum(cta->info)) || /* double. */ \
+ ctype_isinteger(cta->info)) && n == 2); /* int64_t. */ \
+ ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
+ } \
+ if (ngpr + n <= maxgpr) { \
+ sp = &cts->cb.gpr[ngpr]; \
+ ngpr += n; \
+ goto done; \
+ }
+
+#if LJ_ABI_SOFTFP
+#define CALLBACK_HANDLE_REGARG \
+ CALLBACK_HANDLE_GPR \
+ UNUSED(isfp);
+#else
#define CALLBACK_HANDLE_REGARG \
if (isfp) { \
if (nfpr + 1 <= CCALL_NARG_FPR) { \
@@ -427,20 +444,15 @@ void lj_ccallback_mcode_free(CTState *cts)
goto done; \
} \
} else { /* Try to pass argument in GPRs. */ \
- if (n > 1) { \
- lua_assert(ctype_isinteger(cta->info) && n == 2); /* int64_t. */ \
- ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
- } \
- if (ngpr + n <= maxgpr) { \
- sp = &cts->cb.gpr[ngpr]; \
- ngpr += n; \
- goto done; \
- } \
+ CALLBACK_HANDLE_GPR \
}
+#endif
+#if !LJ_ABI_SOFTFP
#define CALLBACK_HANDLE_RET \
if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
*(double *)dp = *(float *)dp; /* FPRs always hold doubles. */
+#endif
#elif LJ_TARGET_MIPS32
diff --git a/src/lj_frame.h b/src/lj_frame.h
index 2bdf3c48..5cb3d639 100644
--- a/src/lj_frame.h
+++ b/src/lj_frame.h
@@ -226,7 +226,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK }; /* Special continuations. */
#define CFRAME_OFS_L 36
#define CFRAME_OFS_PC 32
#define CFRAME_OFS_MULTRES 28
-#define CFRAME_SIZE 272
+#define CFRAME_SIZE (LJ_ARCH_HASFPU ? 272 : 128)
#define CFRAME_SHIFT_MULTRES 3
#endif
#elif LJ_TARGET_MIPS32
diff --git a/src/lj_ircall.h b/src/lj_ircall.h
index c1ac29d1..bbad35b1 100644
--- a/src/lj_ircall.h
+++ b/src/lj_ircall.h
@@ -291,7 +291,7 @@ LJ_DATA const CCallInfo lj_ir_callinfo[IRCALL__MAX+1];
#define fp64_f2l __aeabi_f2lz
#define fp64_f2ul __aeabi_f2ulz
#endif
-#elif LJ_TARGET_MIPS
+#elif LJ_TARGET_MIPS || LJ_TARGET_PPC
#define softfp_add __adddf3
#define softfp_sub __subdf3
#define softfp_mul __muldf3
diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
index 7ad8df37..980ad897 100644
--- a/src/vm_ppc.dasc
+++ b/src/vm_ppc.dasc
@@ -103,6 +103,18 @@
|// Fixed register assignments for the interpreter.
|// Don't use: r1 = sp, r2 and r13 = reserved (TOC, TLS or SDATA)
|
+|.macro .FPU, a, b
+|.if FPU
+| a, b
+|.endif
+|.endmacro
+|
+|.macro .FPU, a, b, c
+|.if FPU
+| a, b, c
+|.endif
+|.endmacro
+|
|// The following must be C callee-save (but BASE is often refetched).
|.define BASE, r14 // Base of current Lua stack frame.
|.define KBASE, r15 // Constants of current Lua function.
@@ -116,8 +128,10 @@
|.define TISNUM, r22
|.define TISNIL, r23
|.define ZERO, r24
+|.if FPU
|.define TOBIT, f30 // 2^52 + 2^51.
|.define TONUM, f31 // 2^52 + 2^51 + 2^31.
+|.endif
|
|// The following temporaries are not saved across C calls, except for RA.
|.define RA, r20 // Callee-save.
@@ -133,6 +147,7 @@
|
|// Saved temporaries.
|.define SAVE0, r21
+|.define SAVE1, r25
|
|// Calling conventions.
|.define CARG1, r3
@@ -141,8 +156,10 @@
|.define CARG4, r6 // Overlaps TMP3.
|.define CARG5, r7 // Overlaps INS.
|
+|.if FPU
|.define FARG1, f1
|.define FARG2, f2
+|.endif
|
|.define CRET1, r3
|.define CRET2, r4
@@ -213,10 +230,16 @@
|.endif
|.else
|
+|.if FPU
|.define SAVE_LR, 276(sp)
|.define CFRAME_SPACE, 272 // Delta for sp.
|// Back chain for sp: 272(sp) <-- sp entering interpreter
|.define SAVE_FPR_, 128 // .. 128+18*8: 64 bit FPR saves.
+|.else
+|.define SAVE_LR, 132(sp)
+|.define CFRAME_SPACE, 128 // Delta for sp.
+|// Back chain for sp: 128(sp) <-- sp entering interpreter
+|.endif
|.define SAVE_GPR_, 56 // .. 56+18*4: 32 bit GPR saves.
|.define SAVE_CR, 52(sp) // 32 bit CR save.
|.define SAVE_ERRF, 48(sp) // 32 bit C frame info.
@@ -226,16 +249,25 @@
|.define SAVE_PC, 32(sp)
|.define SAVE_MULTRES, 28(sp)
|.define UNUSED1, 24(sp)
+|.if FPU
|.define TMPD_LO, 20(sp)
|.define TMPD_HI, 16(sp)
|.define TONUM_LO, 12(sp)
|.define TONUM_HI, 8(sp)
+|.else
+|.define SFSAVE_4, 20(sp)
+|.define SFSAVE_3, 16(sp)
+|.define SFSAVE_2, 12(sp)
+|.define SFSAVE_1, 8(sp)
+|.endif
|// Next frame lr: 4(sp)
|// Back chain for sp: 0(sp) <-- sp while in interpreter
|
+|.if FPU
|.define TMPD_BLO, 23(sp)
|.define TMPD, TMPD_HI
|.define TONUM_D, TONUM_HI
+|.endif
|
|.endif
|
@@ -245,7 +277,7 @@
|.else
| stw r..reg, SAVE_GPR_+(reg-14)*4(sp)
|.endif
-| stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
+| .FPU stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
|.endmacro
|.macro rest_, reg
|.if GPR64
@@ -253,7 +285,7 @@
|.else
| lwz r..reg, SAVE_GPR_+(reg-14)*4(sp)
|.endif
-| lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
+| .FPU lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
|.endmacro
|
|.macro saveregs
@@ -323,6 +355,7 @@
|// Trap for not-yet-implemented parts.
|.macro NYI; tw 4, sp, sp; .endmacro
|
+|.if FPU
|// int/FP conversions.
|.macro tonum_i, freg, reg
| xoris reg, reg, 0x8000
@@ -346,6 +379,7 @@
|.macro toint, reg, freg
| toint reg, freg, freg
|.endmacro
+|.endif
|
|//-----------------------------------------------------------------------
|
@@ -533,9 +567,19 @@ static void build_subroutines(BuildCtx *ctx)
| beq >2
|1:
| addic. TMP1, TMP1, -8
+ |.if FPU
| lfd f0, 0(RA)
+ |.else
+ | lwz CARG1, 0(RA)
+ | lwz CARG2, 4(RA)
+ |.endif
| addi RA, RA, 8
+ |.if FPU
| stfd f0, 0(BASE)
+ |.else
+ | stw CARG1, 0(BASE)
+ | stw CARG2, 4(BASE)
+ |.endif
| addi BASE, BASE, 8
| bney <1
|
@@ -613,23 +657,23 @@ static void build_subroutines(BuildCtx *ctx)
| .toc ld TOCREG, SAVE_TOC
| li TISNUM, LJ_TISNUM // Setup type comparison constants.
| lp BASE, L->base
- | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
+ | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
| lwz DISPATCH, L->glref // Setup pointer to dispatch table.
| li ZERO, 0
- | stw TMP3, TMPD
+ | .FPU stw TMP3, TMPD
| li TMP1, LJ_TFALSE
- | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
+ | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
| li TISNIL, LJ_TNIL
| li_vmstate INTERP
- | lfs TOBIT, TMPD
+ | .FPU lfs TOBIT, TMPD
| lwz PC, FRAME_PC(BASE) // Fetch PC of previous frame.
| la RA, -8(BASE) // Results start at BASE-8.
- | stw TMP3, TMPD
+ | .FPU stw TMP3, TMPD
| addi DISPATCH, DISPATCH, GG_G2DISP
| stw TMP1, 0(RA) // Prepend false to error message.
| li RD, 16 // 2 results: false + error message.
| st_vmstate
- | lfs TONUM, TMPD
+ | .FPU lfs TONUM, TMPD
| b ->vm_returnc
|
|//-----------------------------------------------------------------------
@@ -690,22 +734,22 @@ static void build_subroutines(BuildCtx *ctx)
| li TISNUM, LJ_TISNUM // Setup type comparison constants.
| lp TMP1, L->top
| lwz PC, FRAME_PC(BASE)
- | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
+ | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
| stb CARG3, L->status
- | stw TMP3, TMPD
- | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
- | lfs TOBIT, TMPD
+ | .FPU stw TMP3, TMPD
+ | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
+ | .FPU lfs TOBIT, TMPD
| sub RD, TMP1, BASE
- | stw TMP3, TMPD
- | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
+ | .FPU stw TMP3, TMPD
+ | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
| addi RD, RD, 8
- | stw TMP0, TONUM_HI
+ | .FPU stw TMP0, TONUM_HI
| li_vmstate INTERP
| li ZERO, 0
| st_vmstate
| andix. TMP0, PC, FRAME_TYPE
| mr MULTRES, RD
- | lfs TONUM, TMPD
+ | .FPU lfs TONUM, TMPD
| li TISNIL, LJ_TNIL
| beq ->BC_RET_Z
| b ->vm_return
@@ -739,19 +783,19 @@ static void build_subroutines(BuildCtx *ctx)
| lp TMP2, L->base // TMP2 = old base (used in vmeta_call).
| li TISNUM, LJ_TISNUM // Setup type comparison constants.
| lp TMP1, L->top
- | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
+ | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
| add PC, PC, BASE
- | stw TMP3, TMPD
+ | .FPU stw TMP3, TMPD
| li ZERO, 0
- | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
- | lfs TOBIT, TMPD
+ | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
+ | .FPU lfs TOBIT, TMPD
| sub PC, PC, TMP2 // PC = frame delta + frame type
- | stw TMP3, TMPD
- | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
+ | .FPU stw TMP3, TMPD
+ | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
| sub NARGS8:RC, TMP1, BASE
- | stw TMP0, TONUM_HI
+ | .FPU stw TMP0, TONUM_HI
| li_vmstate INTERP
- | lfs TONUM, TMPD
+ | .FPU lfs TONUM, TMPD
| li TISNIL, LJ_TNIL
| st_vmstate
|
@@ -839,15 +883,30 @@ static void build_subroutines(BuildCtx *ctx)
| lwz INS, -4(PC)
| subi CARG2, RB, 16
| decode_RB8 SAVE0, INS
+ |.if FPU
| lfd f0, 0(RA)
+ |.else
+ | lwz TMP2, 0(RA)
+ | lwz TMP3, 4(RA)
+ |.endif
| add TMP1, BASE, SAVE0
| stp BASE, L->base
| cmplw TMP1, CARG2
| sub CARG3, CARG2, TMP1
| decode_RA8 RA, INS
+ |.if FPU
| stfd f0, 0(CARG2)
+ |.else
+ | stw TMP2, 0(CARG2)
+ | stw TMP3, 4(CARG2)
+ |.endif
| bney ->BC_CAT_Z
+ |.if FPU
| stfdx f0, BASE, RA
+ |.else
+ | stwux TMP2, RA, BASE
+ | stw TMP3, 4(RA)
+ |.endif
| b ->cont_nop
|
|//-- Table indexing metamethods -----------------------------------------
@@ -900,9 +959,19 @@ static void build_subroutines(BuildCtx *ctx)
| // Returns TValue * (finished) or NULL (metamethod).
| cmplwi CRET1, 0
| beq >3
+ |.if FPU
| lfd f0, 0(CRET1)
+ |.else
+ | lwz TMP0, 0(CRET1)
+ | lwz TMP1, 4(CRET1)
+ |.endif
| ins_next1
+ |.if FPU
| stfdx f0, BASE, RA
+ |.else
+ | stwux TMP0, RA, BASE
+ | stw TMP1, 4(RA)
+ |.endif
| ins_next2
|
|3: // Call __index metamethod.
@@ -920,7 +989,12 @@ static void build_subroutines(BuildCtx *ctx)
| // Returns cTValue * or NULL.
| cmplwi CRET1, 0
| beq >1
+ |.if FPU
| lfd f14, 0(CRET1)
+ |.else
+ | lwz SAVE0, 0(CRET1)
+ | lwz SAVE1, 4(CRET1)
+ |.endif
| b ->BC_TGETR_Z
|1:
| stwx TISNIL, BASE, RA
@@ -975,11 +1049,21 @@ static void build_subroutines(BuildCtx *ctx)
| bl extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k)
| // Returns TValue * (finished) or NULL (metamethod).
| cmplwi CRET1, 0
+ |.if FPU
| lfdx f0, BASE, RA
+ |.else
+ | lwzux TMP2, RA, BASE
+ | lwz TMP3, 4(RA)
+ |.endif
| beq >3
| // NOBARRIER: lj_meta_tset ensures the table is not black.
| ins_next1
+ |.if FPU
| stfd f0, 0(CRET1)
+ |.else
+ | stw TMP2, 0(CRET1)
+ | stw TMP3, 4(CRET1)
+ |.endif
| ins_next2
|
|3: // Call __newindex metamethod.
@@ -990,7 +1074,12 @@ static void build_subroutines(BuildCtx *ctx)
| add PC, TMP1, BASE
| lwz LFUNC:RB, FRAME_FUNC(BASE) // Guaranteed to be a function here.
| li NARGS8:RC, 24 // 3 args for func(t, k, v)
+ |.if FPU
| stfd f0, 16(BASE) // Copy value to third argument.
+ |.else
+ | stw TMP2, 16(BASE)
+ | stw TMP3, 20(BASE)
+ |.endif
| b ->vm_call_dispatch_f
|
|->vmeta_tsetr:
@@ -999,7 +1088,12 @@ static void build_subroutines(BuildCtx *ctx)
| stw PC, SAVE_PC
| bl extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key)
| // Returns TValue *.
+ |.if FPU
| stfd f14, 0(CRET1)
+ |.else
+ | stw SAVE0, 0(CRET1)
+ | stw SAVE1, 4(CRET1)
+ |.endif
| b ->cont_nop
|
|//-- Comparison metamethods ---------------------------------------------
@@ -1038,9 +1132,19 @@ static void build_subroutines(BuildCtx *ctx)
|
|->cont_ra: // RA = resultptr
| lwz INS, -4(PC)
+ |.if FPU
| lfd f0, 0(RA)
+ |.else
+ | lwz CARG1, 0(RA)
+ | lwz CARG2, 4(RA)
+ |.endif
| decode_RA8 TMP1, INS
+ |.if FPU
| stfdx f0, BASE, TMP1
+ |.else
+ | stwux CARG1, TMP1, BASE
+ | stw CARG2, 4(TMP1)
+ |.endif
| b ->cont_nop
|
|->cont_condt: // RA = resultptr
@@ -1246,22 +1350,32 @@ static void build_subroutines(BuildCtx *ctx)
|.macro .ffunc_n, name
|->ff_ .. name:
| cmplwi NARGS8:RC, 8
- | lwz CARG3, 0(BASE)
+ | lwz CARG1, 0(BASE)
+ |.if FPU
| lfd FARG1, 0(BASE)
+ |.else
+ | lwz CARG2, 4(BASE)
+ |.endif
| blt ->fff_fallback
- | checknum CARG3; bge ->fff_fallback
+ | checknum CARG1; bge ->fff_fallback
|.endmacro
|
|.macro .ffunc_nn, name
|->ff_ .. name:
| cmplwi NARGS8:RC, 16
- | lwz CARG3, 0(BASE)
+ | lwz CARG1, 0(BASE)
+ |.if FPU
| lfd FARG1, 0(BASE)
- | lwz CARG4, 8(BASE)
+ | lwz CARG3, 8(BASE)
| lfd FARG2, 8(BASE)
+ |.else
+ | lwz CARG2, 4(BASE)
+ | lwz CARG3, 8(BASE)
+ | lwz CARG4, 12(BASE)
+ |.endif
| blt ->fff_fallback
+ | checknum CARG1; bge ->fff_fallback
| checknum CARG3; bge ->fff_fallback
- | checknum CARG4; bge ->fff_fallback
|.endmacro
|
|// Inlined GC threshold check. Caveat: uses TMP0 and TMP1.
@@ -1282,14 +1396,21 @@ static void build_subroutines(BuildCtx *ctx)
| bge cr1, ->fff_fallback
| stw CARG3, 0(RA)
| addi RD, NARGS8:RC, 8 // Compute (nresults+1)*8.
+ | addi TMP1, BASE, 8
+ | add TMP2, RA, NARGS8:RC
| stw CARG1, 4(RA)
| beq ->fff_res // Done if exactly 1 argument.
- | li TMP1, 8
- | subi RC, RC, 8
|1:
- | cmplw TMP1, RC
- | lfdx f0, BASE, TMP1
- | stfdx f0, RA, TMP1
+ | cmplw TMP1, TMP2
+ |.if FPU
+ | lfd f0, 0(TMP1)
+ | stfd f0, 0(TMP1)
+ |.else
+ | lwz CARG1, 0(TMP1)
+ | lwz CARG2, 4(TMP1)
+ | stw CARG1, -8(TMP1)
+ | stw CARG2, -4(TMP1)
+ |.endif
| addi TMP1, TMP1, 8
| bney <1
| b ->fff_res
@@ -1304,8 +1425,14 @@ static void build_subroutines(BuildCtx *ctx)
| orc TMP1, TMP2, TMP0
| addi TMP1, TMP1, ~LJ_TISNUM+1
| slwi TMP1, TMP1, 3
+ |.if FPU
| la TMP2, CFUNC:RB->upvalue
| lfdx FARG1, TMP2, TMP1
+ |.else
+ | add TMP1, CFUNC:RB, TMP1
+ | lwz CARG1, CFUNC:TMP1->upvalue[0].u32.hi
+ | lwz CARG2, CFUNC:TMP1->upvalue[0].u32.lo
+ |.endif
| b ->fff_resn
|
|//-- Base library: getters and setters ---------------------------------
@@ -1383,7 +1510,12 @@ static void build_subroutines(BuildCtx *ctx)
| mr CARG1, L
| bl extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key)
| // Returns cTValue *.
+ |.if FPU
| lfd FARG1, 0(CRET1)
+ |.else
+ | lwz CARG2, 4(CRET1)
+ | lwz CARG1, 0(CRET1) // Caveat: CARG1 == CRET1.
+ |.endif
| b ->fff_resn
|
|//-- Base library: conversions ------------------------------------------
@@ -1392,7 +1524,11 @@ static void build_subroutines(BuildCtx *ctx)
| // Only handles the number case inline (without a base argument).
| cmplwi NARGS8:RC, 8
| lwz CARG1, 0(BASE)
+ |.if FPU
| lfd FARG1, 0(BASE)
+ |.else
+ | lwz CARG2, 4(BASE)
+ |.endif
| bne ->fff_fallback // Exactly one argument.
| checknum CARG1; bgt ->fff_fallback
| b ->fff_resn
@@ -1443,12 +1579,23 @@ static void build_subroutines(BuildCtx *ctx)
| cmplwi CRET1, 0
| li CARG3, LJ_TNIL
| beq ->fff_restv // End of traversal: return nil.
- | lfd f0, 8(BASE) // Copy key and value to results.
| la RA, -8(BASE)
+ |.if FPU
+ | lfd f0, 8(BASE) // Copy key and value to results.
| lfd f1, 16(BASE)
| stfd f0, 0(RA)
- | li RD, (2+1)*8
| stfd f1, 8(RA)
+ |.else
+ | lwz CARG1, 8(BASE)
+ | lwz CARG2, 12(BASE)
+ | lwz CARG3, 16(BASE)
+ | lwz CARG4, 20(BASE)
+ | stw CARG1, 0(RA)
+ | stw CARG2, 4(RA)
+ | stw CARG3, 8(RA)
+ | stw CARG4, 12(RA)
+ |.endif
+ | li RD, (2+1)*8
| b ->fff_res
|
|.ffunc_1 pairs
@@ -1457,17 +1604,32 @@ static void build_subroutines(BuildCtx *ctx)
| bne ->fff_fallback
#if LJ_52
| lwz TAB:TMP2, TAB:CARG1->metatable
+ |.if FPU
| lfd f0, CFUNC:RB->upvalue[0]
+ |.else
+ | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+ | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+ |.endif
| cmplwi TAB:TMP2, 0
| la RA, -8(BASE)
| bne ->fff_fallback
#else
+ |.if FPU
| lfd f0, CFUNC:RB->upvalue[0]
+ |.else
+ | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+ | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+ |.endif
| la RA, -8(BASE)
#endif
| stw TISNIL, 8(BASE)
| li RD, (3+1)*8
+ |.if FPU
| stfd f0, 0(RA)
+ |.else
+ | stw TMP0, 0(RA)
+ | stw TMP1, 4(RA)
+ |.endif
| b ->fff_res
|
|.ffunc ipairs_aux
@@ -1513,14 +1675,24 @@ static void build_subroutines(BuildCtx *ctx)
| stfd FARG2, 0(RA)
|.endif
| ble >2 // Not in array part?
+ |.if FPU
| lwzx TMP2, TMP1, TMP3
| lfdx f0, TMP1, TMP3
+ |.else
+ | lwzux TMP2, TMP1, TMP3
+ | lwz TMP3, 4(TMP1)
+ |.endif
|1:
| checknil TMP2
| li RD, (0+1)*8
| beq ->fff_res // End of iteration, return 0 results.
| li RD, (2+1)*8
+ |.if FPU
| stfd f0, 8(RA)
+ |.else
+ | stw TMP2, 8(RA)
+ | stw TMP3, 12(RA)
+ |.endif
| b ->fff_res
|2: // Check for empty hash part first. Otherwise call C function.
| lwz TMP0, TAB:CARG1->hmask
@@ -1534,7 +1706,11 @@ static void build_subroutines(BuildCtx *ctx)
| li RD, (0+1)*8
| beq ->fff_res
| lwz TMP2, 0(CRET1)
+ |.if FPU
| lfd f0, 0(CRET1)
+ |.else
+ | lwz TMP3, 4(CRET1)
+ |.endif
| b <1
|
|.ffunc_1 ipairs
@@ -1543,12 +1719,22 @@ static void build_subroutines(BuildCtx *ctx)
| bne ->fff_fallback
#if LJ_52
| lwz TAB:TMP2, TAB:CARG1->metatable
+ |.if FPU
| lfd f0, CFUNC:RB->upvalue[0]
+ |.else
+ | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+ | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+ |.endif
| cmplwi TAB:TMP2, 0
| la RA, -8(BASE)
| bne ->fff_fallback
#else
+ |.if FPU
| lfd f0, CFUNC:RB->upvalue[0]
+ |.else
+ | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+ | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+ |.endif
| la RA, -8(BASE)
#endif
|.if DUALNUM
@@ -1558,7 +1744,12 @@ static void build_subroutines(BuildCtx *ctx)
|.endif
| stw ZERO, 12(BASE)
| li RD, (3+1)*8
+ |.if FPU
| stfd f0, 0(RA)
+ |.else
+ | stw TMP0, 0(RA)
+ | stw TMP1, 4(RA)
+ |.endif
| b ->fff_res
|
|//-- Base library: catch errors ----------------------------------------
@@ -1577,19 +1768,32 @@ static void build_subroutines(BuildCtx *ctx)
|
|.ffunc xpcall
| cmplwi NARGS8:RC, 16
- | lwz CARG4, 8(BASE)
+ | lwz CARG3, 8(BASE)
+ |.if FPU
| lfd FARG2, 8(BASE)
| lfd FARG1, 0(BASE)
+ |.else
+ | lwz CARG1, 0(BASE)
+ | lwz CARG2, 4(BASE)
+ | lwz CARG4, 12(BASE)
+ |.endif
| blt ->fff_fallback
| lbz TMP1, DISPATCH_GL(hookmask)(DISPATCH)
| mr TMP2, BASE
- | checkfunc CARG4; bne ->fff_fallback // Traceback must be a function.
+ | checkfunc CARG3; bne ->fff_fallback // Traceback must be a function.
| la BASE, 16(BASE)
| // Remember active hook before pcall.
| rlwinm TMP1, TMP1, 32-HOOK_ACTIVE_SHIFT, 31, 31
+ |.if FPU
| stfd FARG2, 0(TMP2) // Swap function and traceback.
- | subi NARGS8:RC, NARGS8:RC, 16
| stfd FARG1, 8(TMP2)
+ |.else
+ | stw CARG3, 0(TMP2)
+ | stw CARG4, 4(TMP2)
+ | stw CARG1, 8(TMP2)
+ | stw CARG2, 12(TMP2)
+ |.endif
+ | subi NARGS8:RC, NARGS8:RC, 16
| addi PC, TMP1, 16+FRAME_PCALL
| b ->vm_call_dispatch
|
@@ -1632,9 +1836,21 @@ static void build_subroutines(BuildCtx *ctx)
| stp BASE, L->top
|2: // Move args to coroutine.
| cmpw TMP1, NARGS8:RC
+ |.if FPU
| lfdx f0, BASE, TMP1
+ |.else
+ | add CARG3, BASE, TMP1
+ | lwz TMP2, 0(CARG3)
+ | lwz TMP3, 4(CARG3)
+ |.endif
| beq >3
+ |.if FPU
| stfdx f0, CARG2, TMP1
+ |.else
+ | add CARG3, CARG2, TMP1
+ | stw TMP2, 0(CARG3)
+ | stw TMP3, 4(CARG3)
+ |.endif
| addi TMP1, TMP1, 8
| b <2
|3:
@@ -1665,8 +1881,17 @@ static void build_subroutines(BuildCtx *ctx)
| stp TMP2, L:SAVE0->top // Clear coroutine stack.
|5: // Move results from coroutine.
| cmplw TMP1, TMP3
+ |.if FPU
| lfdx f0, TMP2, TMP1
| stfdx f0, BASE, TMP1
+ |.else
+ | add CARG3, TMP2, TMP1
+ | lwz CARG1, 0(CARG3)
+ | lwz CARG2, 4(CARG3)
+ | add CARG3, BASE, TMP1
+ | stw CARG1, 0(CARG3)
+ | stw CARG2, 4(CARG3)
+ |.endif
| addi TMP1, TMP1, 8
| bne <5
|6:
@@ -1691,12 +1916,22 @@ static void build_subroutines(BuildCtx *ctx)
| andix. TMP0, PC, FRAME_TYPE
| la TMP3, -8(TMP3)
| li TMP1, LJ_TFALSE
+ |.if FPU
| lfd f0, 0(TMP3)
+ |.else
+ | lwz CARG1, 0(TMP3)
+ | lwz CARG2, 4(TMP3)
+ |.endif
| stp TMP3, L:SAVE0->top // Remove error from coroutine stack.
| li RD, (2+1)*8
| stw TMP1, -8(BASE) // Prepend false to results.
| la RA, -8(BASE)
+ |.if FPU
| stfd f0, 0(BASE) // Copy error message.
+ |.else
+ | stw CARG1, 0(BASE) // Copy error message.
+ | stw CARG2, 4(BASE)
+ |.endif
| b <7
|.else
| mr CARG1, L
@@ -1875,7 +2110,12 @@ static void build_subroutines(BuildCtx *ctx)
| lus CARG1, 0x8000 // -(2^31).
| beqy ->fff_resi
|5:
+ |.if FPU
| lfd FARG1, 0(BASE)
+ |.else
+ | lwz CARG1, 0(BASE)
+ | lwz CARG2, 4(BASE)
+ |.endif
| blex func
| b ->fff_resn
|.endmacro
@@ -1899,10 +2139,14 @@ static void build_subroutines(BuildCtx *ctx)
|
|.ffunc math_log
| cmplwi NARGS8:RC, 8
- | lwz CARG3, 0(BASE)
- | lfd FARG1, 0(BASE)
+ | lwz CARG1, 0(BASE)
| bne ->fff_fallback // Need exactly 1 argument.
- | checknum CARG3; bge ->fff_fallback
+ | checknum CARG1; bge ->fff_fallback
+ |.if FPU
+ | lfd FARG1, 0(BASE)
+ |.else
+ | lwz CARG2, 4(BASE)
+ |.endif
| blex log
| b ->fff_resn
|
@@ -1924,17 +2168,24 @@ static void build_subroutines(BuildCtx *ctx)
|.if DUALNUM
|.ffunc math_ldexp
| cmplwi NARGS8:RC, 16
- | lwz CARG3, 0(BASE)
+ | lwz TMP0, 0(BASE)
+ |.if FPU
| lfd FARG1, 0(BASE)
- | lwz CARG4, 8(BASE)
+ |.else
+ | lwz CARG1, 0(BASE)
+ | lwz CARG2, 4(BASE)
+ |.endif
+ | lwz TMP1, 8(BASE)
|.if GPR64
| lwz CARG2, 12(BASE)
- |.else
+ |.elif FPU
| lwz CARG1, 12(BASE)
+ |.else
+ | lwz CARG3, 12(BASE)
|.endif
| blt ->fff_fallback
- | checknum CARG3; bge ->fff_fallback
- | checknum CARG4; bne ->fff_fallback
+ | checknum TMP0; bge ->fff_fallback
+ | checknum TMP1; bne ->fff_fallback
|.else
|.ffunc_nn math_ldexp
|.if GPR64
@@ -1949,8 +2200,10 @@ static void build_subroutines(BuildCtx *ctx)
|.ffunc_n math_frexp
|.if GPR64
| la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
- |.else
+ |.elif FPU
| la CARG1, DISPATCH_GL(tmptv)(DISPATCH)
+ |.else
+ | la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
|.endif
| lwz PC, FRAME_PC(BASE)
| blex frexp
@@ -1959,7 +2212,12 @@ static void build_subroutines(BuildCtx *ctx)
|.if not DUALNUM
| tonum_i FARG2, TMP1
|.endif
+ |.if FPU
| stfd FARG1, 0(RA)
+ |.else
+ | stw CRET1, 0(RA)
+ | stw CRET2, 4(RA)
+ |.endif
| li RD, (2+1)*8
|.if DUALNUM
| stw TISNUM, 8(RA)
@@ -1972,13 +2230,20 @@ static void build_subroutines(BuildCtx *ctx)
|.ffunc_n math_modf
|.if GPR64
| la CARG2, -8(BASE)
- |.else
+ |.elif FPU
| la CARG1, -8(BASE)
+ |.else
+ | la CARG3, -8(BASE)
|.endif
| lwz PC, FRAME_PC(BASE)
| blex modf
| la RA, -8(BASE)
+ |.if FPU
| stfd FARG1, 0(BASE)
+ |.else
+ | stw CRET1, 0(BASE)
+ | stw CRET2, 4(BASE)
+ |.endif
| li RD, (2+1)*8
| b ->fff_res
|
@@ -1986,13 +2251,13 @@ static void build_subroutines(BuildCtx *ctx)
|.if DUALNUM
| .ffunc_1 name
| checknum CARG3
- | addi TMP1, BASE, 8
- | add TMP2, BASE, NARGS8:RC
+ | addi SAVE0, BASE, 8
+ | add SAVE1, BASE, NARGS8:RC
| bne >4
|1: // Handle integers.
- | lwz CARG4, 0(TMP1)
- | cmplw cr1, TMP1, TMP2
- | lwz CARG2, 4(TMP1)
+ | lwz CARG4, 0(SAVE0)
+ | cmplw cr1, SAVE0, SAVE1
+ | lwz CARG2, 4(SAVE0)
| bge cr1, ->fff_resi
| checknum CARG4
| xoris TMP0, CARG1, 0x8000
@@ -2009,36 +2274,76 @@ static void build_subroutines(BuildCtx *ctx)
|.if GPR64
| rldicl CARG1, CARG1, 0, 32
|.endif
- | addi TMP1, TMP1, 8
+ | addi SAVE0, SAVE0, 8
| b <1
|3:
| bge ->fff_fallback
| // Convert intermediate result to number and continue below.
+ |.if FPU
| tonum_i FARG1, CARG1
- | lfd FARG2, 0(TMP1)
+ | lfd FARG2, 0(SAVE0)
+ |.else
+ | mr CARG2, CARG1
+ | bl ->vm_sfi2d_1
+ | lwz CARG3, 0(SAVE0)
+ | lwz CARG4, 4(SAVE0)
+ |.endif
| b >6
|4:
+ |.if FPU
| lfd FARG1, 0(BASE)
+ |.else
+ | lwz CARG1, 0(BASE)
+ | lwz CARG2, 4(BASE)
+ |.endif
| bge ->fff_fallback
|5: // Handle numbers.
- | lwz CARG4, 0(TMP1)
- | cmplw cr1, TMP1, TMP2
- | lfd FARG2, 0(TMP1)
+ | lwz CARG3, 0(SAVE0)
+ | cmplw cr1, SAVE0, SAVE1
+ |.if FPU
+ | lfd FARG2, 0(SAVE0)
+ |.else
+ | lwz CARG4, 4(SAVE0)
+ |.endif
| bge cr1, ->fff_resn
- | checknum CARG4; bge >7
+ | checknum CARG3; bge >7
|6:
+ | addi SAVE0, SAVE0, 8
+ |.if FPU
| fsub f0, FARG1, FARG2
- | addi TMP1, TMP1, 8
|.if ismax
| fsel FARG1, f0, FARG1, FARG2
|.else
| fsel FARG1, f0, FARG2, FARG1
|.endif
+ |.else
+ | stw CARG1, SFSAVE_1
+ | stw CARG2, SFSAVE_2
+ | stw CARG3, SFSAVE_3
+ | stw CARG4, SFSAVE_4
+ | blex __ledf2
+ | cmpwi CRET1, 0
+ |.if ismax
+ | blt >8
+ |.else
+ | bge >8
+ |.endif
+ | lwz CARG1, SFSAVE_1
+ | lwz CARG2, SFSAVE_2
+ | b <5
+ |8:
+ | lwz CARG1, SFSAVE_3
+ | lwz CARG2, SFSAVE_4
+ |.endif
| b <5
|7: // Convert integer to number and continue above.
- | lwz CARG2, 4(TMP1)
+ | lwz CARG3, 4(SAVE0)
| bne ->fff_fallback
- | tonum_i FARG2, CARG2
+ |.if FPU
+ | tonum_i FARG2, CARG3
+ |.else
+ | bl ->vm_sfi2d_2
+ |.endif
| b <6
|.else
| .ffunc_n name
@@ -2238,28 +2543,37 @@ static void build_subroutines(BuildCtx *ctx)
|
|.macro .ffunc_bit_op, name, ins
| .ffunc_bit name
- | addi TMP1, BASE, 8
- | add TMP2, BASE, NARGS8:RC
+ | addi SAVE0, BASE, 8
+ | add SAVE1, BASE, NARGS8:RC
|1:
- | lwz CARG4, 0(TMP1)
- | cmplw cr1, TMP1, TMP2
+ | lwz CARG4, 0(SAVE0)
+ | cmplw cr1, SAVE0, SAVE1
|.if DUALNUM
- | lwz CARG2, 4(TMP1)
+ | lwz CARG2, 4(SAVE0)
|.else
- | lfd FARG1, 0(TMP1)
+ | lfd FARG1, 0(SAVE0)
|.endif
| bgey cr1, ->fff_resi
| checknum CARG4
|.if DUALNUM
+ |.if FPU
| bnel ->fff_bitop_fb
|.else
+ | beq >3
+ | stw CARG1, SFSAVE_1
+ | bl ->fff_bitop_fb
+ | mr CARG2, CARG1
+ | lwz CARG1, SFSAVE_1
+ |3:
+ |.endif
+ |.else
| fadd FARG1, FARG1, TOBIT
| bge ->fff_fallback
| stfd FARG1, TMPD
| lwz CARG2, TMPD_LO
|.endif
| ins CARG1, CARG1, CARG2
- | addi TMP1, TMP1, 8
+ | addi SAVE0, SAVE0, 8
| b <1
|.endmacro
|
@@ -2281,7 +2595,14 @@ static void build_subroutines(BuildCtx *ctx)
|.macro .ffunc_bit_sh, name, ins, shmod
|.if DUALNUM
| .ffunc_2 bit_..name
+ |.if FPU
| checknum CARG3; bnel ->fff_tobit_fb
+ |.else
+ | checknum CARG3; beq >1
+ | bl ->fff_tobit_fb
+ | lwz CARG2, 12(BASE) // Conversion polluted CARG2.
+ |1:
+ |.endif
| // Note: no inline conversion from number for 2nd argument!
| checknum CARG4; bne ->fff_fallback
|.else
@@ -2318,27 +2639,77 @@ static void build_subroutines(BuildCtx *ctx)
|->fff_resn:
| lwz PC, FRAME_PC(BASE)
| la RA, -8(BASE)
+ |.if FPU
| stfd FARG1, -8(BASE)
+ |.else
+ | stw CARG1, -8(BASE)
+ | stw CARG2, -4(BASE)
+ |.endif
| b ->fff_res1
|
|// Fallback FP number to bit conversion.
|->fff_tobit_fb:
|.if DUALNUM
+ |.if FPU
| lfd FARG1, 0(BASE)
| bgt ->fff_fallback
| fadd FARG1, FARG1, TOBIT
| stfd FARG1, TMPD
| lwz CARG1, TMPD_LO
| blr
+ |.else
+ | bgt ->fff_fallback
+ | mr CARG2, CARG1
+ | mr CARG1, CARG3
+ |// Modifies: CARG1, CARG2, TMP0, TMP1, TMP2.
+ |->vm_tobit:
+ | slwi TMP2, CARG1, 1
+ | addis TMP2, TMP2, 0x0020
+ | cmpwi TMP2, 0
+ | bge >2
+ | li TMP1, 0x3e0
+ | srawi TMP2, TMP2, 21
+ | not TMP1, TMP1
+ | sub. TMP2, TMP1, TMP2
+ | cmpwi cr7, CARG1, 0
+ | blt >1
+ | slwi TMP1, CARG1, 11
+ | srwi TMP0, CARG2, 21
+ | oris TMP1, TMP1, 0x8000
+ | or TMP1, TMP1, TMP0
+ | srw CARG1, TMP1, TMP2
+ | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
+ | neg CARG1, CARG1
+ | blr
+ |1:
+ | addi TMP2, TMP2, 21
+ | srw TMP1, CARG2, TMP2
+ | slwi CARG2, CARG1, 12
+ | subfic TMP2, TMP2, 20
+ | slw TMP0, CARG2, TMP2
+ | or CARG1, TMP1, TMP0
+ | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
+ | neg CARG1, CARG1
+ | blr
+ |2:
+ | li CARG1, 0
+ | blr
+ |.endif
|.endif
|->fff_bitop_fb:
|.if DUALNUM
- | lfd FARG1, 0(TMP1)
+ |.if FPU
+ | lfd FARG1, 0(SAVE0)
| bgt ->fff_fallback
| fadd FARG1, FARG1, TOBIT
| stfd FARG1, TMPD
| lwz CARG2, TMPD_LO
| blr
+ |.else
+ | bgt ->fff_fallback
+ | mr CARG1, CARG4
+ | b ->vm_tobit
+ |.endif
|.endif
|
|//-----------------------------------------------------------------------
@@ -2531,10 +2902,21 @@ static void build_subroutines(BuildCtx *ctx)
| decode_RA8 RC, INS // Call base.
| beq >2
|1: // Move results down.
+ |.if FPU
| lfd f0, 0(RA)
+ |.else
+ | lwz CARG1, 0(RA)
+ | lwz CARG2, 4(RA)
+ |.endif
| addic. TMP1, TMP1, -8
| addi RA, RA, 8
+ |.if FPU
| stfdx f0, BASE, RC
+ |.else
+ | add CARG3, BASE, RC
+ | stw CARG1, 0(CARG3)
+ | stw CARG2, 4(CARG3)
+ |.endif
| addi RC, RC, 8
| bne <1
|2:
@@ -2587,10 +2969,12 @@ static void build_subroutines(BuildCtx *ctx)
|//-----------------------------------------------------------------------
|
|.macro savex_, a, b, c, d
+ |.if FPU
| stfd f..a, 16+a*8(sp)
| stfd f..b, 16+b*8(sp)
| stfd f..c, 16+c*8(sp)
| stfd f..d, 16+d*8(sp)
+ |.endif
|.endmacro
|
|->vm_exit_handler:
@@ -2662,16 +3046,16 @@ static void build_subroutines(BuildCtx *ctx)
| lwz KBASE, PC2PROTO(k)(TMP1)
| // Setup type comparison constants.
| li TISNUM, LJ_TISNUM
- | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
- | stw TMP3, TMPD
+ | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
+ | .FPU stw TMP3, TMPD
| li ZERO, 0
- | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
- | lfs TOBIT, TMPD
- | stw TMP3, TMPD
- | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
+ | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
+ | .FPU lfs TOBIT, TMPD
+ | .FPU stw TMP3, TMPD
+ | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
| li TISNIL, LJ_TNIL
- | stw TMP0, TONUM_HI
- | lfs TONUM, TMPD
+ | .FPU stw TMP0, TONUM_HI
+ | .FPU lfs TONUM, TMPD
| // Modified copy of ins_next which handles function header dispatch, too.
| lwz INS, 0(PC)
| addi PC, PC, 4
@@ -2716,7 +3100,35 @@ static void build_subroutines(BuildCtx *ctx)
|//-- Math helper functions ----------------------------------------------
|//-----------------------------------------------------------------------
|
- |// NYI: Use internal implementations of floor, ceil, trunc.
+ |// NYI: Use internal implementations of floor, ceil, trunc, sfcmp.
+ |
+ |.macro sfi2d, AHI, ALO
+ |.if not FPU
+ | mr. AHI, ALO
+ | bclr 12, 2 // Handle zero first.
+ | srawi TMP0, ALO, 31
+ | xor TMP1, ALO, TMP0
+ | sub TMP1, TMP1, TMP0 // Absolute value in TMP1.
+ | cntlzw AHI, TMP1
+ | andix. TMP0, TMP0, 0x800 // Mask sign bit.
+ | slw TMP1, TMP1, AHI // Align mantissa left with leading 1.
+ | subfic AHI, AHI, 0x3ff+31-1 // Exponent -1 in AHI.
+ | slwi ALO, TMP1, 21
+ | or AHI, AHI, TMP0 // Sign | Exponent.
+ | srwi TMP1, TMP1, 11
+ | slwi AHI, AHI, 20 // Align left.
+ | add AHI, AHI, TMP1 // Add mantissa, increment exponent.
+ | blr
+ |.endif
+ |.endmacro
+ |
+ |// Input: CARG2. Output: CARG1, CARG2. Temporaries: TMP0, TMP1.
+ |->vm_sfi2d_1:
+ | sfi2d CARG1, CARG2
+ |
+ |// Input: CARG4. Output: CARG3, CARG4. Temporaries: TMP0, TMP1.
+ |->vm_sfi2d_2:
+ | sfi2d CARG3, CARG4
|
|->vm_modi:
| divwo. TMP0, CARG1, CARG2
@@ -2784,21 +3196,21 @@ static void build_subroutines(BuildCtx *ctx)
| addi DISPATCH, r12, GG_G2DISP
| stw r11, CTSTATE->cb.slot
| stw r3, CTSTATE->cb.gpr[0]
- | stfd f1, CTSTATE->cb.fpr[0]
+ | .FPU stfd f1, CTSTATE->cb.fpr[0]
| stw r4, CTSTATE->cb.gpr[1]
- | stfd f2, CTSTATE->cb.fpr[1]
+ | .FPU stfd f2, CTSTATE->cb.fpr[1]
| stw r5, CTSTATE->cb.gpr[2]
- | stfd f3, CTSTATE->cb.fpr[2]
+ | .FPU stfd f3, CTSTATE->cb.fpr[2]
| stw r6, CTSTATE->cb.gpr[3]
- | stfd f4, CTSTATE->cb.fpr[3]
+ | .FPU stfd f4, CTSTATE->cb.fpr[3]
| stw r7, CTSTATE->cb.gpr[4]
- | stfd f5, CTSTATE->cb.fpr[4]
+ | .FPU stfd f5, CTSTATE->cb.fpr[4]
| stw r8, CTSTATE->cb.gpr[5]
- | stfd f6, CTSTATE->cb.fpr[5]
+ | .FPU stfd f6, CTSTATE->cb.fpr[5]
| stw r9, CTSTATE->cb.gpr[6]
- | stfd f7, CTSTATE->cb.fpr[6]
+ | .FPU stfd f7, CTSTATE->cb.fpr[6]
| stw r10, CTSTATE->cb.gpr[7]
- | stfd f8, CTSTATE->cb.fpr[7]
+ | .FPU stfd f8, CTSTATE->cb.fpr[7]
| addi TMP0, sp, CFRAME_SPACE+8
| stw TMP0, CTSTATE->cb.stack
| mr CARG1, CTSTATE
@@ -2809,21 +3221,21 @@ static void build_subroutines(BuildCtx *ctx)
| lp BASE, L:CRET1->base
| li TISNUM, LJ_TISNUM // Setup type comparison constants.
| lp RC, L:CRET1->top
- | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
+ | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
| li ZERO, 0
| mr L, CRET1
- | stw TMP3, TMPD
- | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
+ | .FPU stw TMP3, TMPD
+ | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
| lwz LFUNC:RB, FRAME_FUNC(BASE)
- | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
- | stw TMP0, TONUM_HI
+ | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
+ | .FPU stw TMP0, TONUM_HI
| li TISNIL, LJ_TNIL
| li_vmstate INTERP
- | lfs TOBIT, TMPD
- | stw TMP3, TMPD
+ | .FPU lfs TOBIT, TMPD
+ | .FPU stw TMP3, TMPD
| sub RC, RC, BASE
| st_vmstate
- | lfs TONUM, TMPD
+ | .FPU lfs TONUM, TMPD
| ins_callt
|.endif
|
@@ -2837,7 +3249,7 @@ static void build_subroutines(BuildCtx *ctx)
| mr CARG2, RA
| bl extern lj_ccallback_leave // (CTState *cts, TValue *o)
| lwz CRET1, CTSTATE->cb.gpr[0]
- | lfd FARG1, CTSTATE->cb.fpr[0]
+ | .FPU lfd FARG1, CTSTATE->cb.fpr[0]
| lwz CRET2, CTSTATE->cb.gpr[1]
| b ->vm_leave_unw
|.endif
@@ -2871,14 +3283,14 @@ static void build_subroutines(BuildCtx *ctx)
| bge <1
|2:
| bney cr1, >3
- | lfd f1, CCSTATE->fpr[0]
- | lfd f2, CCSTATE->fpr[1]
- | lfd f3, CCSTATE->fpr[2]
- | lfd f4, CCSTATE->fpr[3]
- | lfd f5, CCSTATE->fpr[4]
- | lfd f6, CCSTATE->fpr[5]
- | lfd f7, CCSTATE->fpr[6]
- | lfd f8, CCSTATE->fpr[7]
+ | .FPU lfd f1, CCSTATE->fpr[0]
+ | .FPU lfd f2, CCSTATE->fpr[1]
+ | .FPU lfd f3, CCSTATE->fpr[2]
+ | .FPU lfd f4, CCSTATE->fpr[3]
+ | .FPU lfd f5, CCSTATE->fpr[4]
+ | .FPU lfd f6, CCSTATE->fpr[5]
+ | .FPU lfd f7, CCSTATE->fpr[6]
+ | .FPU lfd f8, CCSTATE->fpr[7]
|3:
| lp TMP0, CCSTATE->func
| lwz CARG2, CCSTATE->gpr[1]
@@ -2895,7 +3307,7 @@ static void build_subroutines(BuildCtx *ctx)
| lwz TMP2, -4(r14)
| lwz TMP0, 4(r14)
| stw CARG1, CCSTATE:TMP1->gpr[0]
- | stfd FARG1, CCSTATE:TMP1->fpr[0]
+ | .FPU stfd FARG1, CCSTATE:TMP1->fpr[0]
| stw CARG2, CCSTATE:TMP1->gpr[1]
| mtlr TMP0
| stw CARG3, CCSTATE:TMP1->gpr[2]
@@ -2924,19 +3336,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
| // RA = src1*8, RD = src2*8, JMP with RD = target
|.if DUALNUM
- | lwzux TMP0, RA, BASE
+ | lwzux CARG1, RA, BASE
| addi PC, PC, 4
| lwz CARG2, 4(RA)
- | lwzux TMP1, RD, BASE
+ | lwzux CARG3, RD, BASE
| lwz TMP2, -4(PC)
- | checknum cr0, TMP0
- | lwz CARG3, 4(RD)
+ | checknum cr0, CARG1
+ | lwz CARG4, 4(RD)
| decode_RD4 TMP2, TMP2
- | checknum cr1, TMP1
- | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+ | checknum cr1, CARG3
+ | addis SAVE0, TMP2, -(BCBIAS_J*4 >> 16)
| bne cr0, >7
| bne cr1, >8
- | cmpw CARG2, CARG3
+ | cmpw CARG2, CARG4
if (op == BC_ISLT) {
| bge >2
} else if (op == BC_ISGE) {
@@ -2947,28 +3359,41 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| ble >2
}
|1:
- | add PC, PC, TMP2
+ | add PC, PC, SAVE0
|2:
| ins_next
|
|7: // RA is not an integer.
| bgt cr0, ->vmeta_comp
| // RA is a number.
- | lfd f0, 0(RA)
+ | .FPU lfd f0, 0(RA)
| bgt cr1, ->vmeta_comp
| blt cr1, >4
| // RA is a number, RD is an integer.
- | tonum_i f1, CARG3
+ |.if FPU
+ | tonum_i f1, CARG4
+ |.else
+ | bl ->vm_sfi2d_2
+ |.endif
| b >5
|
|8: // RA is an integer, RD is not an integer.
| bgt cr1, ->vmeta_comp
| // RA is an integer, RD is a number.
+ |.if FPU
| tonum_i f0, CARG2
+ |.else
+ | bl ->vm_sfi2d_1
+ |.endif
|4:
- | lfd f1, 0(RD)
+ | .FPU lfd f1, 0(RD)
|5:
+ |.if FPU
| fcmpu cr0, f0, f1
+ |.else
+ | blex __ledf2
+ | cmpwi CRET1, 0
+ |.endif
if (op == BC_ISLT) {
| bge <2
} else if (op == BC_ISGE) {
@@ -3016,42 +3441,42 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
vk = op == BC_ISEQV;
| // RA = src1*8, RD = src2*8, JMP with RD = target
|.if DUALNUM
- | lwzux TMP0, RA, BASE
+ | lwzux CARG1, RA, BASE
| addi PC, PC, 4
| lwz CARG2, 4(RA)
- | lwzux TMP1, RD, BASE
- | checknum cr0, TMP0
- | lwz TMP2, -4(PC)
- | checknum cr1, TMP1
- | decode_RD4 TMP2, TMP2
- | lwz CARG3, 4(RD)
+ | lwzux CARG3, RD, BASE
+ | checknum cr0, CARG1
+ | lwz SAVE0, -4(PC)
+ | checknum cr1, CARG3
+ | decode_RD4 SAVE0, SAVE0
+ | lwz CARG4, 4(RD)
| cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
- | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+ | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
if (vk) {
| ble cr7, ->BC_ISEQN_Z
} else {
| ble cr7, ->BC_ISNEN_Z
}
|.else
- | lwzux TMP0, RA, BASE
- | lwz TMP2, 0(PC)
+ | lwzux CARG1, RA, BASE
+ | lwz SAVE0, 0(PC)
| lfd f0, 0(RA)
| addi PC, PC, 4
- | lwzux TMP1, RD, BASE
- | checknum cr0, TMP0
- | decode_RD4 TMP2, TMP2
+ | lwzux CARG3, RD, BASE
+ | checknum cr0, CARG1
+ | decode_RD4 SAVE0, SAVE0
| lfd f1, 0(RD)
- | checknum cr1, TMP1
- | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+ | checknum cr1, CARG3
+ | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
| bge cr0, >5
| bge cr1, >5
| fcmpu cr0, f0, f1
if (vk) {
| bne >1
- | add PC, PC, TMP2
+ | add PC, PC, SAVE0
} else {
| beq >1
- | add PC, PC, TMP2
+ | add PC, PC, SAVE0
}
|1:
| ins_next
@@ -3059,36 +3484,36 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|5: // Either or both types are not numbers.
|.if not DUALNUM
| lwz CARG2, 4(RA)
- | lwz CARG3, 4(RD)
+ | lwz CARG4, 4(RD)
|.endif
|.if FFI
- | cmpwi cr7, TMP0, LJ_TCDATA
- | cmpwi cr5, TMP1, LJ_TCDATA
+ | cmpwi cr7, CARG1, LJ_TCDATA
+ | cmpwi cr5, CARG3, LJ_TCDATA
|.endif
- | not TMP3, TMP0
- | cmplw TMP0, TMP1
- | cmplwi cr1, TMP3, ~LJ_TISPRI // Primitive?
+ | not TMP2, CARG1
+ | cmplw CARG1, CARG3
+ | cmplwi cr1, TMP2, ~LJ_TISPRI // Primitive?
|.if FFI
| cror 4*cr7+eq, 4*cr7+eq, 4*cr5+eq
|.endif
- | cmplwi cr6, TMP3, ~LJ_TISTABUD // Table or userdata?
+ | cmplwi cr6, TMP2, ~LJ_TISTABUD // Table or userdata?
|.if FFI
| beq cr7, ->vmeta_equal_cd
|.endif
- | cmplw cr5, CARG2, CARG3
+ | cmplw cr5, CARG2, CARG4
| crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt // 2: Same type and primitive.
| crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq // 1: Same tv or different type.
| crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq // 0: Same type and same tv.
- | mr SAVE0, PC
+ | mr SAVE1, PC
| cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt // 0 or 2.
| cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt // 1 or 2.
if (vk) {
| bne cr0, >6
- | add PC, PC, TMP2
+ | add PC, PC, SAVE0
|6:
} else {
| beq cr0, >6
- | add PC, PC, TMP2
+ | add PC, PC, SAVE0
|6:
}
|.if DUALNUM
@@ -3103,6 +3528,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|
| // Different tables or userdatas. Need to check __eq metamethod.
| // Field metatable must be at same offset for GCtab and GCudata!
+ | mr CARG3, CARG4
| lwz TAB:TMP2, TAB:CARG2->metatable
| li CARG4, 1-vk // ne = 0 or 1.
| cmplwi TAB:TMP2, 0
@@ -3110,7 +3536,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lbz TMP2, TAB:TMP2->nomm
| andix. TMP2, TMP2, 1<<MM_eq
| bne <1 // Or 'no __eq' flag set?
- | mr PC, SAVE0 // Restore old PC.
+ | mr PC, SAVE1 // Restore old PC.
| b ->vmeta_equal // Handle __eq metamethod.
break;
@@ -3151,16 +3577,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
vk = op == BC_ISEQN;
| // RA = src*8, RD = num_const*8, JMP with RD = target
|.if DUALNUM
- | lwzux TMP0, RA, BASE
+ | lwzux CARG1, RA, BASE
| addi PC, PC, 4
| lwz CARG2, 4(RA)
- | lwzux TMP1, RD, KBASE
- | checknum cr0, TMP0
- | lwz TMP2, -4(PC)
- | checknum cr1, TMP1
- | decode_RD4 TMP2, TMP2
- | lwz CARG3, 4(RD)
- | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+ | lwzux CARG3, RD, KBASE
+ | checknum cr0, CARG1
+ | lwz SAVE0, -4(PC)
+ | checknum cr1, CARG3
+ | decode_RD4 SAVE0, SAVE0
+ | lwz CARG4, 4(RD)
+ | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
if (vk) {
|->BC_ISEQN_Z:
} else {
@@ -3168,7 +3594,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
}
| bne cr0, >7
| bne cr1, >8
- | cmpw CARG2, CARG3
+ | cmpw CARG2, CARG4
|4:
|.else
if (vk) {
@@ -3176,20 +3602,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
} else {
|->BC_ISNEN_Z: // Dummy label.
}
- | lwzx TMP0, BASE, RA
+ | lwzx CARG1, BASE, RA
| addi PC, PC, 4
| lfdx f0, BASE, RA
- | lwz TMP2, -4(PC)
+ | lwz SAVE0, -4(PC)
| lfdx f1, KBASE, RD
- | decode_RD4 TMP2, TMP2
- | checknum TMP0
- | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+ | decode_RD4 SAVE0, SAVE0
+ | checknum CARG1
+ | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
| bge >3
| fcmpu cr0, f0, f1
|.endif
if (vk) {
| bne >1
- | add PC, PC, TMP2
+ | add PC, PC, SAVE0
|1:
|.if not FFI
|3:
@@ -3200,13 +3626,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|.if not FFI
|3:
|.endif
- | add PC, PC, TMP2
+ | add PC, PC, SAVE0
|2:
}
| ins_next
|.if FFI
|3:
- | cmpwi TMP0, LJ_TCDATA
+ | cmpwi CARG1, LJ_TCDATA
| beq ->vmeta_equal_cd
| b <1
|.endif
@@ -3214,18 +3640,31 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|7: // RA is not an integer.
| bge cr0, <3
| // RA is a number.
- | lfd f0, 0(RA)
+ | .FPU lfd f0, 0(RA)
| blt cr1, >1
| // RA is a number, RD is an integer.
- | tonum_i f1, CARG3
+ |.if FPU
+ | tonum_i f1, CARG4
+ |.else
+ | bl ->vm_sfi2d_2
+ |.endif
| b >2
|
|8: // RA is an integer, RD is a number.
+ |.if FPU
| tonum_i f0, CARG2
+ |.else
+ | bl ->vm_sfi2d_1
+ |.endif
|1:
- | lfd f1, 0(RD)
+ | .FPU lfd f1, 0(RD)
|2:
+ |.if FPU
| fcmpu cr0, f0, f1
+ |.else
+ | blex __ledf2
+ | cmpwi CRET1, 0
+ |.endif
| b <4
|.endif
break;
@@ -3280,7 +3719,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| add PC, PC, TMP2
} else {
| li TMP1, LJ_TFALSE
+ |.if FPU
| lfdx f0, BASE, RD
+ |.else
+ | lwzux CARG1, RD, BASE
+ | lwz CARG2, 4(RD)
+ |.endif
| cmplw TMP0, TMP1
if (op == BC_ISTC) {
| bge >1
@@ -3289,7 +3733,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
}
| addis PC, PC, -(BCBIAS_J*4 >> 16)
| decode_RD4 TMP2, INS
+ |.if FPU
| stfdx f0, BASE, RA
+ |.else
+ | stwux CARG1, RA, BASE
+ | stw CARG2, 4(RA)
+ |.endif
| add PC, PC, TMP2
|1:
}
@@ -3324,8 +3773,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
case BC_MOV:
| // RA = dst*8, RD = src*8
| ins_next1
+ |.if FPU
| lfdx f0, BASE, RD
| stfdx f0, BASE, RA
+ |.else
+ | lwzux TMP0, RD, BASE
+ | lwz TMP1, 4(RD)
+ | stwux TMP0, RA, BASE
+ | stw TMP1, 4(RA)
+ |.endif
| ins_next2
break;
case BC_NOT:
@@ -3427,44 +3883,65 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
||switch (vk) {
||case 0:
- | lwzx TMP1, BASE, RB
+ | lwzx CARG1, BASE, RB
| .if DUALNUM
- | lwzx TMP2, KBASE, RC
+ | lwzx CARG3, KBASE, RC
| .endif
+ | .if FPU
| lfdx f14, BASE, RB
| lfdx f15, KBASE, RC
+ | .else
+ | add TMP1, BASE, RB
+ | add TMP2, KBASE, RC
+ | lwz CARG2, 4(TMP1)
+ | lwz CARG4, 4(TMP2)
+ | .endif
| .if DUALNUM
- | checknum cr0, TMP1
- | checknum cr1, TMP2
+ | checknum cr0, CARG1
+ | checknum cr1, CARG3
| crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
| bge ->vmeta_arith_vn
| .else
- | checknum TMP1; bge ->vmeta_arith_vn
+ | checknum CARG1; bge ->vmeta_arith_vn
| .endif
|| break;
||case 1:
- | lwzx TMP1, BASE, RB
+ | lwzx CARG1, BASE, RB
| .if DUALNUM
- | lwzx TMP2, KBASE, RC
+ | lwzx CARG3, KBASE, RC
| .endif
+ | .if FPU
| lfdx f15, BASE, RB
| lfdx f14, KBASE, RC
+ | .else
+ | add TMP1, BASE, RB
+ | add TMP2, KBASE, RC
+ | lwz CARG2, 4(TMP1)
+ | lwz CARG4, 4(TMP2)
+ | .endif
| .if DUALNUM
- | checknum cr0, TMP1
- | checknum cr1, TMP2
+ | checknum cr0, CARG1
+ | checknum cr1, CARG3
| crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
| bge ->vmeta_arith_nv
| .else
- | checknum TMP1; bge ->vmeta_arith_nv
+ | checknum CARG1; bge ->vmeta_arith_nv
| .endif
|| break;
||default:
- | lwzx TMP1, BASE, RB
- | lwzx TMP2, BASE, RC
+ | lwzx CARG1, BASE, RB
+ | lwzx CARG3, BASE, RC
+ | .if FPU
| lfdx f14, BASE, RB
| lfdx f15, BASE, RC
- | checknum cr0, TMP1
- | checknum cr1, TMP2
+ | .else
+ | add TMP1, BASE, RB
+ | add TMP2, BASE, RC
+ | lwz CARG2, 4(TMP1)
+ | lwz CARG4, 4(TMP2)
+ | .endif
+ | checknum cr0, CARG1
+ | checknum cr1, CARG3
| crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
| bge ->vmeta_arith_vv
|| break;
@@ -3498,48 +3975,78 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| fsub a, b, a // b - floor(b/c)*c
|.endmacro
|
+ |.macro sfpmod
+ |->BC_MODVN_Z:
+ | stw CARG1, SFSAVE_1
+ | stw CARG2, SFSAVE_2
+ | mr SAVE0, CARG3
+ | mr SAVE1, CARG4
+ | blex __divdf3
+ | blex floor
+ | mr CARG3, SAVE0
+ | mr CARG4, SAVE1
+ | blex __muldf3
+ | mr CARG3, CRET1
+ | mr CARG4, CRET2
+ | lwz CARG1, SFSAVE_1
+ | lwz CARG2, SFSAVE_2
+ | blex __subdf3
+ |.endmacro
+ |
|.macro ins_arithfp, fpins
| ins_arithpre
|.if "fpins" == "fpmod_"
| b ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
- |.else
+ |.elif FPU
| fpins f0, f14, f15
| ins_next1
| stfdx f0, BASE, RA
| ins_next2
+ |.else
+ | blex __divdf3 // Only soft-float div uses this macro.
+ | ins_next1
+ | stwux CRET1, RA, BASE
+ | stw CRET2, 4(RA)
+ | ins_next2
|.endif
|.endmacro
|
- |.macro ins_arithdn, intins, fpins
+ |.macro ins_arithdn, intins, fpins, fpcall
| // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8
||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
||switch (vk) {
||case 0:
- | lwzux TMP1, RB, BASE
- | lwzux TMP2, RC, KBASE
- | lwz CARG1, 4(RB)
- | checknum cr0, TMP1
- | lwz CARG2, 4(RC)
+ | lwzux CARG1, RB, BASE
+ | lwzux CARG3, RC, KBASE
+ | lwz CARG2, 4(RB)
+ | checknum cr0, CARG1
+ | lwz CARG4, 4(RC)
+ | checknum cr1, CARG3
|| break;
||case 1:
- | lwzux TMP1, RB, BASE
- | lwzux TMP2, RC, KBASE
- | lwz CARG2, 4(RB)
- | checknum cr0, TMP1
- | lwz CARG1, 4(RC)
+ | lwzux CARG3, RB, BASE
+ | lwzux CARG1, RC, KBASE
+ | lwz CARG4, 4(RB)
+ | checknum cr0, CARG3
+ | lwz CARG2, 4(RC)
+ | checknum cr1, CARG1
|| break;
||default:
- | lwzux TMP1, RB, BASE
- | lwzux TMP2, RC, BASE
- | lwz CARG1, 4(RB)
- | checknum cr0, TMP1
- | lwz CARG2, 4(RC)
+ | lwzux CARG1, RB, BASE
+ | lwzux CARG3, RC, BASE
+ | lwz CARG2, 4(RB)
+ | checknum cr0, CARG1
+ | lwz CARG4, 4(RC)
+ | checknum cr1, CARG3
|| break;
||}
- | checknum cr1, TMP2
| bne >5
| bne cr1, >5
- | intins CARG1, CARG1, CARG2
+ |.if "intins" == "intmod"
+ | mr CARG1, CARG2
+ | mr CARG2, CARG4
+ |.endif
+ | intins CARG1, CARG2, CARG4
| bso >4
|1:
| ins_next1
@@ -3551,29 +4058,40 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| checkov TMP0, <1 // Ignore unrelated overflow.
| ins_arithfallback b
|5: // FP variant.
+ |.if FPU
||if (vk == 1) {
| lfd f15, 0(RB)
- | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
| lfd f14, 0(RC)
||} else {
| lfd f14, 0(RB)
- | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
| lfd f15, 0(RC)
||}
+ |.endif
+ | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
| ins_arithfallback bge
|.if "fpins" == "fpmod_"
| b ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
|.else
+ |.if FPU
| fpins f0, f14, f15
- | ins_next1
| stfdx f0, BASE, RA
+ |.else
+ |.if "fpcall" == "sfpmod"
+ | sfpmod
+ |.else
+ | blex fpcall
+ |.endif
+ | stwux CRET1, RA, BASE
+ | stw CRET2, 4(RA)
+ |.endif
+ | ins_next1
| b <2
|.endif
|.endmacro
|
- |.macro ins_arith, intins, fpins
+ |.macro ins_arith, intins, fpins, fpcall
|.if DUALNUM
- | ins_arithdn intins, fpins
+ | ins_arithdn intins, fpins, fpcall
|.else
| ins_arithfp fpins
|.endif
@@ -3588,9 +4106,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| addo. TMP0, TMP0, TMP3
| add y, a, b
|.endmacro
- | ins_arith addo32., fadd
+ | ins_arith addo32., fadd, __adddf3
|.else
- | ins_arith addo., fadd
+ | ins_arith addo., fadd, __adddf3
|.endif
break;
case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
@@ -3602,36 +4120,48 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| subo. TMP0, TMP0, TMP3
| sub y, a, b
|.endmacro
- | ins_arith subo32., fsub
+ | ins_arith subo32., fsub, __subdf3
|.else
- | ins_arith subo., fsub
+ | ins_arith subo., fsub, __subdf3
|.endif
break;
case BC_MULVN: case BC_MULNV: case BC_MULVV:
- | ins_arith mullwo., fmul
+ | ins_arith mullwo., fmul, __muldf3
break;
case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
| ins_arithfp fdiv
break;
case BC_MODVN:
- | ins_arith intmod, fpmod
+ | ins_arith intmod, fpmod, sfpmod
break;
case BC_MODNV: case BC_MODVV:
- | ins_arith intmod, fpmod_
+ | ins_arith intmod, fpmod_, sfpmod
break;
case BC_POW:
| // NYI: (partial) integer arithmetic.
- | lwzx TMP1, BASE, RB
+ | lwzx CARG1, BASE, RB
+ | lwzx CARG3, BASE, RC
+ |.if FPU
| lfdx FARG1, BASE, RB
- | lwzx TMP2, BASE, RC
| lfdx FARG2, BASE, RC
- | checknum cr0, TMP1
- | checknum cr1, TMP2
+ |.else
+ | add TMP1, BASE, RB
+ | add TMP2, BASE, RC
+ | lwz CARG2, 4(TMP1)
+ | lwz CARG4, 4(TMP2)
+ |.endif
+ | checknum cr0, CARG1
+ | checknum cr1, CARG3
| crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
| bge ->vmeta_arith_vv
| blex pow
| ins_next1
+ |.if FPU
| stfdx FARG1, BASE, RA
+ |.else
+ | stwux CARG1, RA, BASE
+ | stw CARG2, 4(RA)
+ |.endif
| ins_next2
break;
@@ -3651,8 +4181,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lp BASE, L->base
| bne ->vmeta_binop
| ins_next1
+ |.if FPU
| lfdx f0, BASE, SAVE0 // Copy result from RB to RA.
| stfdx f0, BASE, RA
+ |.else
+ | lwzux TMP0, SAVE0, BASE
+ | lwz TMP1, 4(SAVE0)
+ | stwux TMP0, RA, BASE
+ | stw TMP1, 4(RA)
+ |.endif
| ins_next2
break;
@@ -3715,8 +4252,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
case BC_KNUM:
| // RA = dst*8, RD = num_const*8
| ins_next1
+ |.if FPU
| lfdx f0, KBASE, RD
| stfdx f0, BASE, RA
+ |.else
+ | lwzux TMP0, RD, KBASE
+ | lwz TMP1, 4(RD)
+ | stwux TMP0, RA, BASE
+ | stw TMP1, 4(RA)
+ |.endif
| ins_next2
break;
case BC_KPRI:
@@ -3749,8 +4293,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lwzx UPVAL:RB, LFUNC:RB, RD
| ins_next1
| lwz TMP1, UPVAL:RB->v
+ |.if FPU
| lfd f0, 0(TMP1)
| stfdx f0, BASE, RA
+ |.else
+ | lwz TMP2, 0(TMP1)
+ | lwz TMP3, 4(TMP1)
+ | stwux TMP2, RA, BASE
+ | stw TMP3, 4(RA)
+ |.endif
| ins_next2
break;
case BC_USETV:
@@ -3758,14 +4309,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lwz LFUNC:RB, FRAME_FUNC(BASE)
| srwi RA, RA, 1
| addi RA, RA, offsetof(GCfuncL, uvptr)
+ |.if FPU
| lfdux f0, RD, BASE
+ |.else
+ | lwzux CARG1, RD, BASE
+ | lwz CARG3, 4(RD)
+ |.endif
| lwzx UPVAL:RB, LFUNC:RB, RA
| lbz TMP3, UPVAL:RB->marked
| lwz CARG2, UPVAL:RB->v
| andix. TMP3, TMP3, LJ_GC_BLACK // isblack(uv)
| lbz TMP0, UPVAL:RB->closed
| lwz TMP2, 0(RD)
+ |.if FPU
| stfd f0, 0(CARG2)
+ |.else
+ | stw CARG1, 0(CARG2)
+ | stw CARG3, 4(CARG2)
+ |.endif
| cmplwi cr1, TMP0, 0
| lwz TMP1, 4(RD)
| cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
@@ -3821,11 +4382,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lwz LFUNC:RB, FRAME_FUNC(BASE)
| srwi RA, RA, 1
| addi RA, RA, offsetof(GCfuncL, uvptr)
+ |.if FPU
| lfdx f0, KBASE, RD
+ |.else
+ | lwzux TMP2, RD, KBASE
+ | lwz TMP3, 4(RD)
+ |.endif
| lwzx UPVAL:RB, LFUNC:RB, RA
| ins_next1
| lwz TMP1, UPVAL:RB->v
+ |.if FPU
| stfd f0, 0(TMP1)
+ |.else
+ | stw TMP2, 0(TMP1)
+ | stw TMP3, 4(TMP1)
+ |.endif
| ins_next2
break;
case BC_USETP:
@@ -3973,11 +4544,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|.endif
| ble ->vmeta_tgetv // Integer key and in array part?
| lwzx TMP0, TMP1, TMP2
+ |.if FPU
| lfdx f14, TMP1, TMP2
+ |.else
+ | lwzux SAVE0, TMP1, TMP2
+ | lwz SAVE1, 4(TMP1)
+ |.endif
| checknil TMP0; beq >2
|1:
| ins_next1
+ |.if FPU
| stfdx f14, BASE, RA
+ |.else
+ | stwux SAVE0, RA, BASE
+ | stw SAVE1, 4(RA)
+ |.endif
| ins_next2
|
|2: // Check for __index if table value is nil.
@@ -4053,12 +4634,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lwz TMP1, TAB:RB->asize
| lwz TMP2, TAB:RB->array
| cmplw TMP0, TMP1; bge ->vmeta_tgetb
+ |.if FPU
| lwzx TMP1, TMP2, RC
| lfdx f0, TMP2, RC
+ |.else
+ | lwzux TMP1, TMP2, RC
+ | lwz TMP3, 4(TMP2)
+ |.endif
| checknil TMP1; beq >5
|1:
| ins_next1
+ |.if FPU
| stfdx f0, BASE, RA
+ |.else
+ | stwux TMP1, RA, BASE
+ | stw TMP3, 4(RA)
+ |.endif
| ins_next2
|
|5: // Check for __index if table value is nil.
@@ -4088,10 +4679,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| cmplw TMP0, CARG2
| slwi TMP2, CARG2, 3
| ble ->vmeta_tgetr // In array part?
+ |.if FPU
| lfdx f14, TMP1, TMP2
+ |.else
+ | lwzux SAVE0, TMP2, TMP1
+ | lwz SAVE1, 4(TMP2)
+ |.endif
|->BC_TGETR_Z:
| ins_next1
+ |.if FPU
| stfdx f14, BASE, RA
+ |.else
+ | stwux SAVE0, RA, BASE
+ | stw SAVE1, 4(RA)
+ |.endif
| ins_next2
break;
@@ -4132,11 +4733,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| ble ->vmeta_tsetv // Integer key and in array part?
| lwzx TMP2, TMP1, TMP0
| lbz TMP3, TAB:RB->marked
+ |.if FPU
| lfdx f14, BASE, RA
+ |.else
+ | add SAVE1, BASE, RA
+ | lwz SAVE0, 0(SAVE1)
+ | lwz SAVE1, 4(SAVE1)
+ |.endif
| checknil TMP2; beq >3
|1:
| andix. TMP2, TMP3, LJ_GC_BLACK // isblack(table)
+ |.if FPU
| stfdx f14, TMP1, TMP0
+ |.else
+ | stwux SAVE0, TMP1, TMP0
+ | stw SAVE1, 4(TMP1)
+ |.endif
| bne >7
|2:
| ins_next
@@ -4177,7 +4789,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lwz NODE:TMP2, TAB:RB->node
| stb ZERO, TAB:RB->nomm // Clear metamethod cache.
| and TMP1, TMP1, TMP0 // idx = str->hash & tab->hmask
+ |.if FPU
| lfdx f14, BASE, RA
+ |.else
+ | add CARG2, BASE, RA
+ | lwz SAVE0, 0(CARG2)
+ | lwz SAVE1, 4(CARG2)
+ |.endif
| slwi TMP0, TMP1, 5
| slwi TMP1, TMP1, 3
| sub TMP1, TMP0, TMP1
@@ -4193,7 +4811,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| checknil CARG2; beq >4 // Key found, but nil value?
|2:
| andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
+ |.if FPU
| stfd f14, NODE:TMP2->val
+ |.else
+ | stw SAVE0, NODE:TMP2->val.u32.hi
+ | stw SAVE1, NODE:TMP2->val.u32.lo
+ |.endif
| bne >7
|3:
| ins_next
@@ -4232,7 +4855,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| bl extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k)
| // Returns TValue *.
| lp BASE, L->base
+ |.if FPU
| stfd f14, 0(CRET1)
+ |.else
+ | stw SAVE0, 0(CRET1)
+ | stw SAVE1, 4(CRET1)
+ |.endif
| b <3 // No 2nd write barrier needed.
|
|7: // Possible table write barrier for the value. Skip valiswhite check.
@@ -4249,13 +4877,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| lwz TMP2, TAB:RB->array
| lbz TMP3, TAB:RB->marked
| cmplw TMP0, TMP1
+ |.if FPU
| lfdx f14, BASE, RA
+ |.else
+ | add CARG2, BASE, RA
+ | lwz SAVE0, 0(CARG2)
+ | lwz SAVE1, 4(CARG2)
+ |.endif
| bge ->vmeta_tsetb
| lwzx TMP1, TMP2, RC
| checknil TMP1; beq >5
|1:
| andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
+ |.if FPU
| stfdx f14, TMP2, RC
+ |.else
+ | stwux SAVE0, RC, TMP2
+ | stw SAVE1, 4(RC)
+ |.endif
| bne >7
|2:
| ins_next
@@ -4295,10 +4934,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|2:
| cmplw TMP0, CARG3
| slwi TMP2, CARG3, 3
+ |.if FPU
| lfdx f14, BASE, RA
+ |.else
+ | lwzux SAVE0, RA, BASE
+ | lwz SAVE1, 4(RA)
+ |.endif
| ble ->vmeta_tsetr // In array part?
| ins_next1
+ |.if FPU
| stfdx f14, TMP1, TMP2
+ |.else
+ | stwux SAVE0, TMP1, TMP2
+ | stw SAVE1, 4(TMP1)
+ |.endif
| ins_next2
|
|7: // Possible table write barrier for the value. Skip valiswhite check.
@@ -4328,10 +4977,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| add TMP1, TMP1, TMP0
| andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
|3: // Copy result slots to table.
+ |.if FPU
| lfd f0, 0(RA)
+ |.else
+ | lwz SAVE0, 0(RA)
+ | lwz SAVE1, 4(RA)
+ |.endif
| addi RA, RA, 8
| cmpw cr1, RA, TMP2
+ |.if FPU
| stfd f0, 0(TMP1)
+ |.else
+ | stw SAVE0, 0(TMP1)
+ | stw SAVE1, 4(TMP1)
+ |.endif
| addi TMP1, TMP1, 8
| blt cr1, <3
| bne >7
@@ -4398,9 +5057,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| beq cr1, >3
|2:
| addi TMP3, TMP2, 8
+ |.if FPU
| lfdx f0, RA, TMP2
+ |.else
+ | add CARG3, RA, TMP2
+ | lwz CARG1, 0(CARG3)
+ | lwz CARG2, 4(CARG3)
+ |.endif
| cmplw cr1, TMP3, NARGS8:RC
+ |.if FPU
| stfdx f0, BASE, TMP2
+ |.else
+ | stwux CARG1, TMP2, BASE
+ | stw CARG2, 4(TMP2)
+ |.endif
| mr TMP2, TMP3
| bne cr1, <2
|3:
@@ -4433,14 +5103,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| add BASE, BASE, RA
| lwz TMP1, -24(BASE)
| lwz LFUNC:RB, -20(BASE)
+ |.if FPU
| lfd f1, -8(BASE)
| lfd f0, -16(BASE)
+ |.else
+ | lwz CARG1, -8(BASE)
+ | lwz CARG2, -4(BASE)
+ | lwz CARG3, -16(BASE)
+ | lwz CARG4, -12(BASE)
+ |.endif
| stw TMP1, 0(BASE) // Copy callable.
| stw LFUNC:RB, 4(BASE)
| checkfunc TMP1
- | stfd f1, 16(BASE) // Copy control var.
| li NARGS8:RC, 16 // Iterators get 2 arguments.
+ |.if FPU
+ | stfd f1, 16(BASE) // Copy control var.
| stfdu f0, 8(BASE) // Copy state.
+ |.else
+ | stw CARG1, 16(BASE) // Copy control var.
+ | stw CARG2, 20(BASE)
+ | stwu CARG3, 8(BASE) // Copy state.
+ | stw CARG4, 4(BASE)
+ |.endif
| bne ->vmeta_call
| ins_call
break;
@@ -4461,7 +5145,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| slwi TMP3, RC, 3
| bge >5 // Index points after array part?
| lwzx TMP2, TMP1, TMP3
+ |.if FPU
| lfdx f0, TMP1, TMP3
+ |.else
+ | lwzux CARG1, TMP3, TMP1
+ | lwz CARG2, 4(TMP3)
+ |.endif
| checknil TMP2
| lwz INS, -4(PC)
| beq >4
@@ -4473,7 +5162,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|.endif
| addi RC, RC, 1
| addis TMP3, PC, -(BCBIAS_J*4 >> 16)
+ |.if FPU
| stfd f0, 8(RA)
+ |.else
+ | stw CARG1, 8(RA)
+ | stw CARG2, 12(RA)
+ |.endif
| decode_RD4 TMP1, INS
| stw RC, -4(RA) // Update control var.
| add PC, TMP1, TMP3
@@ -4498,17 +5192,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| slwi RB, RC, 3
| sub TMP3, TMP3, RB
| lwzx RB, TMP2, TMP3
+ |.if FPU
| lfdx f0, TMP2, TMP3
+ |.else
+ | add CARG3, TMP2, TMP3
+ | lwz CARG1, 0(CARG3)
+ | lwz CARG2, 4(CARG3)
+ |.endif
| add NODE:TMP3, TMP2, TMP3
| checknil RB
| lwz INS, -4(PC)
| beq >7
+ |.if FPU
| lfd f1, NODE:TMP3->key
+ |.else
+ | lwz CARG3, NODE:TMP3->key.u32.hi
+ | lwz CARG4, NODE:TMP3->key.u32.lo
+ |.endif
| addis TMP2, PC, -(BCBIAS_J*4 >> 16)
+ |.if FPU
| stfd f0, 8(RA)
+ |.else
+ | stw CARG1, 8(RA)
+ | stw CARG2, 12(RA)
+ |.endif
| add RC, RC, TMP0
| decode_RD4 TMP1, INS
+ |.if FPU
| stfd f1, 0(RA)
+ |.else
+ | stw CARG3, 0(RA)
+ | stw CARG4, 4(RA)
+ |.endif
| addi RC, RC, 1
| add PC, TMP1, TMP2
| stw RC, -4(RA) // Update control var.
@@ -4574,9 +5289,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| subi TMP2, TMP2, 16
| ble >2 // No vararg slots?
|1: // Copy vararg slots to destination slots.
+ |.if FPU
| lfd f0, 0(RC)
+ |.else
+ | lwz CARG1, 0(RC)
+ | lwz CARG2, 4(RC)
+ |.endif
| addi RC, RC, 8
+ |.if FPU
| stfd f0, 0(RA)
+ |.else
+ | stw CARG1, 0(RA)
+ | stw CARG2, 4(RA)
+ |.endif
| cmplw RA, TMP2
| cmplw cr1, RC, TMP3
| bge >3 // All destination slots filled?
@@ -4599,9 +5324,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| addi MULTRES, TMP1, 8
| bgt >7
|6:
+ |.if FPU
| lfd f0, 0(RC)
+ |.else
+ | lwz CARG1, 0(RC)
+ | lwz CARG2, 4(RC)
+ |.endif
| addi RC, RC, 8
+ |.if FPU
| stfd f0, 0(RA)
+ |.else
+ | stw CARG1, 0(RA)
+ | stw CARG2, 4(RA)
+ |.endif
| cmplw RC, TMP3
| addi RA, RA, 8
| blt <6 // More vararg slots?
@@ -4652,14 +5387,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| li TMP1, 0
|2:
| addi TMP3, TMP1, 8
+ |.if FPU
| lfdx f0, RA, TMP1
+ |.else
+ | add CARG3, RA, TMP1
+ | lwz CARG1, 0(CARG3)
+ | lwz CARG2, 4(CARG3)
+ |.endif
| cmpw TMP3, RC
+ |.if FPU
| stfdx f0, TMP2, TMP1
+ |.else
+ | add CARG3, TMP2, TMP1
+ | stw CARG1, 0(CARG3)
+ | stw CARG2, 4(CARG3)
+ |.endif
| beq >3
| addi TMP1, TMP3, 8
+ |.if FPU
| lfdx f1, RA, TMP3
+ |.else
+ | add CARG3, RA, TMP3
+ | lwz CARG1, 0(CARG3)
+ | lwz CARG2, 4(CARG3)
+ |.endif
| cmpw TMP1, RC
+ |.if FPU
| stfdx f1, TMP2, TMP3
+ |.else
+ | add CARG3, TMP2, TMP3
+ | stw CARG1, 0(CARG3)
+ | stw CARG2, 4(CARG3)
+ |.endif
| bne <2
|3:
|5:
@@ -4701,8 +5460,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
| subi TMP2, BASE, 8
| decode_RB8 RB, INS
if (op == BC_RET1) {
+ |.if FPU
| lfd f0, 0(RA)
| stfd f0, 0(TMP2)
+ |.else
+ | lwz CARG1, 0(RA)
+ | lwz CARG2, 4(RA)
+ | stw CARG1, 0(TMP2)
+ | stw CARG2, 4(TMP2)
+ |.endif
}
|5:
| cmplw RB, RD
@@ -4763,11 +5529,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
|4:
| stw CARG1, FORL_IDX*8+4(RA)
} else {
- | lwz TMP3, FORL_STEP*8(RA)
+ | lwz SAVE0, FORL_STEP*8(RA)
| lwz CARG3, FORL_STEP*8+4(RA)
| lwz TMP2, FORL_STOP*8(RA)
| lwz CARG2, FORL_STOP*8+4(RA)
- | cmplw cr7, TMP3, TISNUM
+ | cmplw cr7, SAVE0, TISNUM
| cmplw cr1, TMP2, TISNUM
| crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
| crand 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
@@ -4810,41 +5576,80 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
if (vk) {
|.if DUALNUM
|9: // FP loop.
+ |.if FPU
| lfd f1, FORL_IDX*8(RA)
|.else
+ | lwz CARG1, FORL_IDX*8(RA)
+ | lwz CARG2, FORL_IDX*8+4(RA)
+ |.endif
+ |.else
| lfdux f1, RA, BASE
|.endif
+ |.if FPU
| lfd f3, FORL_STEP*8(RA)
| lfd f2, FORL_STOP*8(RA)
- | lwz TMP3, FORL_STEP*8(RA)
| fadd f1, f1, f3
| stfd f1, FORL_IDX*8(RA)
+ |.else
+ | lwz CARG3, FORL_STEP*8(RA)
+ | lwz CARG4, FORL_STEP*8+4(RA)
+ | mr SAVE1, RD
+ | blex __adddf3
+ | mr RD, SAVE1
+ | stw CRET1, FORL_IDX*8(RA)
+ | stw CRET2, FORL_IDX*8+4(RA)
+ | lwz CARG3, FORL_STOP*8(RA)
+ | lwz CARG4, FORL_STOP*8+4(RA)
+ |.endif
+ | lwz SAVE0, FORL_STEP*8(RA)
} else {
|.if DUALNUM
|9: // FP loop.
|.else
| lwzux TMP1, RA, BASE
- | lwz TMP3, FORL_STEP*8(RA)
+ | lwz SAVE0, FORL_STEP*8(RA)
| lwz TMP2, FORL_STOP*8(RA)
| cmplw cr0, TMP1, TISNUM
- | cmplw cr7, TMP3, TISNUM
+ | cmplw cr7, SAVE0, TISNUM
| cmplw cr1, TMP2, TISNUM
|.endif
+ |.if FPU
| lfd f1, FORL_IDX*8(RA)
+ |.else
+ | lwz CARG1, FORL_IDX*8(RA)
+ | lwz CARG2, FORL_IDX*8+4(RA)
+ |.endif
| crand 4*cr0+lt, 4*cr0+lt, 4*cr7+lt
| crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
+ |.if FPU
| lfd f2, FORL_STOP*8(RA)
+ |.else
+ | lwz CARG3, FORL_STOP*8(RA)
+ | lwz CARG4, FORL_STOP*8+4(RA)
+ |.endif
| bge ->vmeta_for
}
- | cmpwi cr6, TMP3, 0
+ | cmpwi cr6, SAVE0, 0
if (op != BC_JFORL) {
| srwi RD, RD, 1
}
+ |.if FPU
| stfd f1, FORL_EXT*8(RA)
+ |.else
+ | stw CARG1, FORL_EXT*8(RA)
+ | stw CARG2, FORL_EXT*8+4(RA)
+ |.endif
if (op != BC_JFORL) {
| add RD, PC, RD
}
+ |.if FPU
| fcmpu cr0, f1, f2
+ |.else
+ | mr SAVE1, RD
+ | blex __ledf2
+ | cmpwi CRET1, 0
+ | mr RD, SAVE1
+ |.endif
if (op == BC_JFORI) {
| addis PC, RD, -(BCBIAS_J*4 >> 16)
}
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:40 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:13 ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:53 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:40 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:54PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the VM for powerpc.
Typo: s/powerpc/PowerPC/
> This includes:
> * Any loads/storages of double values use load/storage through 32-bit
Typo: s/storages/stores/. Feel free to ignore, though.
> registers of `lo` and `hi` part of the TValue union.
> * Macro .FPU is added to skip instructions necessary only for
> hard-float operations (load/store floating point registers from/on the
> stack, when leave/enter VM, for example).
Typo: s|when leave/enter VM|when leaving/entering the VM|
> * Now r25 named as `SAVE1` is used as saved temporary register (used in
> different fast functions)
> * `sfi2d` macro is introduced to convert integer, that represents a
Typo: s/convert/convert an/
> soft-float, to double. Receives destination and source registers, uses
Typo: s/to double/to a double/
> `TMP0` and `TMP1`.
> * `sfpmod` macro is introduced for soft-float point `fmod` built-in.
> * `ins_arith` now receives the third parameter -- operation to use for
> soft-float point.
> * `LJ_ARCH_HASFPU`, `LJ_ABI_SOFTFP` macros are introduced to mark that
> there is defined `_SOFT_FLOAT` or `_SOFT_DOUBLE`. `LJ_ARCH_NUMMODE` is
> set to the `LJ_NUMMODE_DUAL`, when `LJ_ABI_SOFTFP` is true.
>
> Support of soft-float point for the JIT compiler will be added in the
> next patch.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
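Side note for readers who are new to the soft-float path: on !FPU builds
every double lives in a hi/lo pair of GPRs, and the arithmetic itself is
done by the libgcc soft-float routines (__adddf3, __muldf3, __ledf2, ...)
reached via `blex`. The `sfi2d` macro builds the two IEEE-754 binary64
words from an int32 by hand. Below is a minimal C model of that
conversion, purely for illustration -- the helper name is made up and it
leans on the GCC __builtin_clz() builtin, where the assembly uses cntlzw:

  #include <stdint.h>

  /* Illustrative model of the sfi2d conversion: int32 -> hi/lo words of
  ** an IEEE-754 double. Not part of the patch.
  */
  static void sfi2d_model(int32_t i, uint32_t *hi, uint32_t *lo)
  {
    uint32_t sign, m;
    int lz;
    if (i == 0) { *hi = 0; *lo = 0; return; }      /* +0.0, handled first. */
    sign = (i < 0) ? 0x80000000u : 0u;
    m = (i < 0) ? 0u - (uint32_t)i : (uint32_t)i;  /* |i|, no overflow. */
    lz = __builtin_clz(m);                         /* Leading zeros, 0..31. */
    m <<= lz;                                      /* Leading 1 at bit 31. */
    *hi = sign | ((uint32_t)(1054 - lz) << 20)     /* Biased exponent. */
               | ((m >> 11) & 0x000fffffu);        /* Top 20 fraction bits. */
    *lo = m << 21;                                 /* Low 11 fraction bits. */
  }

The assembly reaches the same result with one trick: it keeps the implicit
leading mantissa bit in place and lets the final `add` carry it into the
exponent field instead of masking it off.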
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> src/host/buildvm_asm.c | 2 +-
> src/lj_arch.h | 29 +-
> src/lj_ccall.c | 38 +-
> src/lj_ccall.h | 4 +-
> src/lj_ccallback.c | 30 +-
> src/lj_frame.h | 2 +-
> src/lj_ircall.h | 2 +-
> src/vm_ppc.dasc | 1249 +++++++++++++++++++++++++++++++++-------
> 8 files changed, 1101 insertions(+), 255 deletions(-)
>
> diff --git a/src/host/buildvm_asm.c b/src/host/buildvm_asm.c
> index ffd14903..43595b31 100644
> --- a/src/host/buildvm_asm.c
> +++ b/src/host/buildvm_asm.c
> @@ -338,7 +338,7 @@ void emit_asm(BuildCtx *ctx)
> #if !(LJ_TARGET_PS3 || LJ_TARGET_PSVITA)
> fprintf(ctx->fp, "\t.section .note.GNU-stack,\"\"," ELFASM_PX "progbits\n");
> #endif
> -#if LJ_TARGET_PPC && !LJ_TARGET_PS3
> +#if LJ_TARGET_PPC && !LJ_TARGET_PS3 && !LJ_ABI_SOFTFP
> /* Hard-float ABI. */
> fprintf(ctx->fp, "\t.gnu_attribute 4, 1\n");
> #endif
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index c39526ea..8bb8757d 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -262,6 +262,29 @@
> #else
> #define LJ_ARCH_BITS 32
> #define LJ_ARCH_NAME "ppc"
> +
> +#if !defined(LJ_ARCH_HASFPU)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ARCH_HASFPU 0
> +#else
> +#define LJ_ARCH_HASFPU 1
> +#endif
> +#endif
> +
> +#if !defined(LJ_ABI_SOFTFP)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ABI_SOFTFP 1
> +#else
> +#define LJ_ABI_SOFTFP 0
> +#endif
> +#endif
> +#endif
> +
> +#if LJ_ABI_SOFTFP
> +#define LJ_ARCH_NOJIT 1 /* NYI */
> +#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL
> +#else
> +#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
> #endif
>
> #define LJ_TARGET_PPC 1
> @@ -271,7 +294,6 @@
> #define LJ_TARGET_MASKSHIFT 0
> #define LJ_TARGET_MASKROT 1
> #define LJ_TARGET_UNIFYROT 1 /* Want only IR_BROL. */
> -#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
>
> #if LJ_TARGET_CONSOLE
> #define LJ_ARCH_PPC32ON64 1
> @@ -431,16 +453,13 @@
> #error "No support for ILP32 model on ARM64"
> #endif
> #elif LJ_TARGET_PPC
> -#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> -#error "No support for PowerPC CPUs without double-precision FPU"
> -#endif
> #if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
> #error "No support for little-endian PPC32"
> #endif
> #if LJ_ARCH_PPC64
> #error "No support for PowerPC 64 bit mode (yet)"
> #endif
> -#ifdef __NO_FPRS__
> +#if defined(__NO_FPRS__) && !defined(_SOFT_FLOAT)
> #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
> #endif
> #elif LJ_TARGET_MIPS32
> diff --git a/src/lj_ccall.c b/src/lj_ccall.c
> index d39ff861..c1e12f56 100644
> --- a/src/lj_ccall.c
> +++ b/src/lj_ccall.c
> @@ -388,6 +388,24 @@
> #define CCALL_HANDLE_COMPLEXARG \
> /* Pass complex by value in 2 or 4 GPRs. */
>
> +#define CCALL_HANDLE_GPR \
> + /* Try to pass argument in GPRs. */ \
> + if (n > 1) { \
> + lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \
> + if (ctype_isinteger(d->info) || ctype_isfp(d->info)) \
> + ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> + else if (ngpr + n > maxgpr) \
> + ngpr = maxgpr; /* Prevent reordering. */ \
> + } \
> + if (ngpr + n <= maxgpr) { \
> + dp = &cc->gpr[ngpr]; \
> + ngpr += n; \
> + goto done; \
> + } \
> +
> +#if LJ_ABI_SOFTFP
> +#define CCALL_HANDLE_REGARG CCALL_HANDLE_GPR
> +#else
> #define CCALL_HANDLE_REGARG \
> if (isfp) { /* Try to pass argument in FPRs. */ \
> if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -396,24 +414,16 @@
> d = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */ \
> goto done; \
> } \
> - } else { /* Try to pass argument in GPRs. */ \
> - if (n > 1) { \
> - lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \
> - if (ctype_isinteger(d->info)) \
> - ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> - else if (ngpr + n > maxgpr) \
> - ngpr = maxgpr; /* Prevent reordering. */ \
> - } \
> - if (ngpr + n <= maxgpr) { \
> - dp = &cc->gpr[ngpr]; \
> - ngpr += n; \
> - goto done; \
> - } \
> + } else { \
> + CCALL_HANDLE_GPR \
> }
> +#endif
>
> +#if !LJ_ABI_SOFTFP
> #define CCALL_HANDLE_RET \
> if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
> ctr = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */
> +#endif
>
> #elif LJ_TARGET_MIPS32
> /* -- MIPS o32 calling conventions ---------------------------------------- */
> @@ -1081,7 +1091,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
> }
> if (fid) lj_err_caller(L, LJ_ERR_FFI_NUMARG); /* Too few arguments. */
>
> -#if LJ_TARGET_X64 || LJ_TARGET_PPC
> +#if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP)
> cc->nfpr = nfpr; /* Required for vararg functions. */
> #endif
> cc->nsp = nsp;
> diff --git a/src/lj_ccall.h b/src/lj_ccall.h
> index 59f66481..6efa48c7 100644
> --- a/src/lj_ccall.h
> +++ b/src/lj_ccall.h
> @@ -86,9 +86,9 @@ typedef union FPRArg {
> #elif LJ_TARGET_PPC
>
> #define CCALL_NARG_GPR 8
> -#define CCALL_NARG_FPR 8
> +#define CCALL_NARG_FPR (LJ_ABI_SOFTFP ? 0 : 8)
> #define CCALL_NRET_GPR 4 /* For complex double. */
> -#define CCALL_NRET_FPR 1
> +#define CCALL_NRET_FPR (LJ_ABI_SOFTFP ? 0 : 1)
> #define CCALL_SPS_EXTRA 4
> #define CCALL_SPS_FREE 0
>
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index 224b6b94..c33190d7 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -419,6 +419,23 @@ void lj_ccallback_mcode_free(CTState *cts)
>
> #elif LJ_TARGET_PPC
>
> +#define CALLBACK_HANDLE_GPR \
> + if (n > 1) { \
> + lua_assert(((LJ_ABI_SOFTFP && ctype_isnum(cta->info)) || /* double. */ \
> + ctype_isinteger(cta->info)) && n == 2); /* int64_t. */ \
> + ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> + } \
> + if (ngpr + n <= maxgpr) { \
> + sp = &cts->cb.gpr[ngpr]; \
> + ngpr += n; \
> + goto done; \
> + }
> +
> +#if LJ_ABI_SOFTFP
> +#define CALLBACK_HANDLE_REGARG \
> + CALLBACK_HANDLE_GPR \
> + UNUSED(isfp);
> +#else
> #define CALLBACK_HANDLE_REGARG \
> if (isfp) { \
> if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -427,20 +444,15 @@ void lj_ccallback_mcode_free(CTState *cts)
> goto done; \
> } \
> } else { /* Try to pass argument in GPRs. */ \
> - if (n > 1) { \
> - lua_assert(ctype_isinteger(cta->info) && n == 2); /* int64_t. */ \
> - ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> - } \
> - if (ngpr + n <= maxgpr) { \
> - sp = &cts->cb.gpr[ngpr]; \
> - ngpr += n; \
> - goto done; \
> - } \
> + CALLBACK_HANDLE_GPR \
> }
> +#endif
>
> +#if !LJ_ABI_SOFTFP
> #define CALLBACK_HANDLE_RET \
> if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
> *(double *)dp = *(float *)dp; /* FPRs always hold doubles. */
> +#endif
>
> #elif LJ_TARGET_MIPS32
>
> diff --git a/src/lj_frame.h b/src/lj_frame.h
> index 2bdf3c48..5cb3d639 100644
> --- a/src/lj_frame.h
> +++ b/src/lj_frame.h
> @@ -226,7 +226,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK }; /* Special continuations. */
> #define CFRAME_OFS_L 36
> #define CFRAME_OFS_PC 32
> #define CFRAME_OFS_MULTRES 28
> -#define CFRAME_SIZE 272
> +#define CFRAME_SIZE (LJ_ARCH_HASFPU ? 272 : 128)
> #define CFRAME_SHIFT_MULTRES 3
> #endif
> #elif LJ_TARGET_MIPS32
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index c1ac29d1..bbad35b1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -291,7 +291,7 @@ LJ_DATA const CCallInfo lj_ir_callinfo[IRCALL__MAX+1];
> #define fp64_f2l __aeabi_f2lz
> #define fp64_f2ul __aeabi_f2ulz
> #endif
> -#elif LJ_TARGET_MIPS
> +#elif LJ_TARGET_MIPS || LJ_TARGET_PPC
> #define softfp_add __adddf3
> #define softfp_sub __subdf3
> #define softfp_mul __muldf3
> diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
> index 7ad8df37..980ad897 100644
> --- a/src/vm_ppc.dasc
> +++ b/src/vm_ppc.dasc
> @@ -103,6 +103,18 @@
> |// Fixed register assignments for the interpreter.
> |// Don't use: r1 = sp, r2 and r13 = reserved (TOC, TLS or SDATA)
> |
> +|.macro .FPU, a, b
> +|.if FPU
> +| a, b
> +|.endif
> +|.endmacro
> +|
> +|.macro .FPU, a, b, c
> +|.if FPU
> +| a, b, c
> +|.endif
> +|.endmacro
> +|
> |// The following must be C callee-save (but BASE is often refetched).
> |.define BASE, r14 // Base of current Lua stack frame.
> |.define KBASE, r15 // Constants of current Lua function.
> @@ -116,8 +128,10 @@
> |.define TISNUM, r22
> |.define TISNIL, r23
> |.define ZERO, r24
> +|.if FPU
> |.define TOBIT, f30 // 2^52 + 2^51.
> |.define TONUM, f31 // 2^52 + 2^51 + 2^31.
> +|.endif
> |
> |// The following temporaries are not saved across C calls, except for RA.
> |.define RA, r20 // Callee-save.
> @@ -133,6 +147,7 @@
> |
> |// Saved temporaries.
> |.define SAVE0, r21
> +|.define SAVE1, r25
> |
> |// Calling conventions.
> |.define CARG1, r3
> @@ -141,8 +156,10 @@
> |.define CARG4, r6 // Overlaps TMP3.
> |.define CARG5, r7 // Overlaps INS.
> |
> +|.if FPU
> |.define FARG1, f1
> |.define FARG2, f2
> +|.endif
> |
> |.define CRET1, r3
> |.define CRET2, r4
> @@ -213,10 +230,16 @@
> |.endif
> |.else
> |
> +|.if FPU
> |.define SAVE_LR, 276(sp)
> |.define CFRAME_SPACE, 272 // Delta for sp.
> |// Back chain for sp: 272(sp) <-- sp entering interpreter
> |.define SAVE_FPR_, 128 // .. 128+18*8: 64 bit FPR saves.
> +|.else
> +|.define SAVE_LR, 132(sp)
> +|.define CFRAME_SPACE, 128 // Delta for sp.
> +|// Back chain for sp: 128(sp) <-- sp entering interpreter
> +|.endif
> |.define SAVE_GPR_, 56 // .. 56+18*4: 32 bit GPR saves.
> |.define SAVE_CR, 52(sp) // 32 bit CR save.
> |.define SAVE_ERRF, 48(sp) // 32 bit C frame info.
> @@ -226,16 +249,25 @@
> |.define SAVE_PC, 32(sp)
> |.define SAVE_MULTRES, 28(sp)
> |.define UNUSED1, 24(sp)
> +|.if FPU
> |.define TMPD_LO, 20(sp)
> |.define TMPD_HI, 16(sp)
> |.define TONUM_LO, 12(sp)
> |.define TONUM_HI, 8(sp)
> +|.else
> +|.define SFSAVE_4, 20(sp)
> +|.define SFSAVE_3, 16(sp)
> +|.define SFSAVE_2, 12(sp)
> +|.define SFSAVE_1, 8(sp)
> +|.endif
> |// Next frame lr: 4(sp)
> |// Back chain for sp: 0(sp) <-- sp while in interpreter
> |
> +|.if FPU
> |.define TMPD_BLO, 23(sp)
> |.define TMPD, TMPD_HI
> |.define TONUM_D, TONUM_HI
> +|.endif
> |
> |.endif
> |
> @@ -245,7 +277,7 @@
> |.else
> | stw r..reg, SAVE_GPR_+(reg-14)*4(sp)
> |.endif
> -| stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +| .FPU stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> |.endmacro
> |.macro rest_, reg
> |.if GPR64
> @@ -253,7 +285,7 @@
> |.else
> | lwz r..reg, SAVE_GPR_+(reg-14)*4(sp)
> |.endif
> -| lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +| .FPU lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> |.endmacro
> |
> |.macro saveregs
> @@ -323,6 +355,7 @@
> |// Trap for not-yet-implemented parts.
> |.macro NYI; tw 4, sp, sp; .endmacro
> |
> +|.if FPU
> |// int/FP conversions.
> |.macro tonum_i, freg, reg
> | xoris reg, reg, 0x8000
> @@ -346,6 +379,7 @@
> |.macro toint, reg, freg
> | toint reg, freg, freg
> |.endmacro
> +|.endif
> |
> |//-----------------------------------------------------------------------
> |
> @@ -533,9 +567,19 @@ static void build_subroutines(BuildCtx *ctx)
> | beq >2
> |1:
> | addic. TMP1, TMP1, -8
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + |.endif
> | addi RA, RA, 8
> + |.if FPU
> | stfd f0, 0(BASE)
> + |.else
> + | stw CARG1, 0(BASE)
> + | stw CARG2, 4(BASE)
> + |.endif
> | addi BASE, BASE, 8
> | bney <1
> |
> @@ -613,23 +657,23 @@ static void build_subroutines(BuildCtx *ctx)
> | .toc ld TOCREG, SAVE_TOC
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp BASE, L->base
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | lwz DISPATCH, L->glref // Setup pointer to dispatch table.
> | li ZERO, 0
> - | stw TMP3, TMPD
> + | .FPU stw TMP3, TMPD
> | li TMP1, LJ_TFALSE
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> | li TISNIL, LJ_TNIL
> | li_vmstate INTERP
> - | lfs TOBIT, TMPD
> + | .FPU lfs TOBIT, TMPD
> | lwz PC, FRAME_PC(BASE) // Fetch PC of previous frame.
> | la RA, -8(BASE) // Results start at BASE-8.
> - | stw TMP3, TMPD
> + | .FPU stw TMP3, TMPD
> | addi DISPATCH, DISPATCH, GG_G2DISP
> | stw TMP1, 0(RA) // Prepend false to error message.
> | li RD, 16 // 2 results: false + error message.
> | st_vmstate
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | b ->vm_returnc
> |
> |//-----------------------------------------------------------------------
> @@ -690,22 +734,22 @@ static void build_subroutines(BuildCtx *ctx)
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp TMP1, L->top
> | lwz PC, FRAME_PC(BASE)
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | stb CARG3, L->status
> - | stw TMP3, TMPD
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU lfs TOBIT, TMPD
> | sub RD, TMP1, BASE
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | addi RD, RD, 8
> - | stw TMP0, TONUM_HI
> + | .FPU stw TMP0, TONUM_HI
> | li_vmstate INTERP
> | li ZERO, 0
> | st_vmstate
> | andix. TMP0, PC, FRAME_TYPE
> | mr MULTRES, RD
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | li TISNIL, LJ_TNIL
> | beq ->BC_RET_Z
> | b ->vm_return
> @@ -739,19 +783,19 @@ static void build_subroutines(BuildCtx *ctx)
> | lp TMP2, L->base // TMP2 = old base (used in vmeta_call).
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp TMP1, L->top
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | add PC, PC, BASE
> - | stw TMP3, TMPD
> + | .FPU stw TMP3, TMPD
> | li ZERO, 0
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | lfs TOBIT, TMPD
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU lfs TOBIT, TMPD
> | sub PC, PC, TMP2 // PC = frame delta + frame type
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | sub NARGS8:RC, TMP1, BASE
> - | stw TMP0, TONUM_HI
> + | .FPU stw TMP0, TONUM_HI
> | li_vmstate INTERP
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | li TISNIL, LJ_TNIL
> | st_vmstate
> |
> @@ -839,15 +883,30 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz INS, -4(PC)
> | subi CARG2, RB, 16
> | decode_RB8 SAVE0, INS
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz TMP2, 0(RA)
> + | lwz TMP3, 4(RA)
> + |.endif
> | add TMP1, BASE, SAVE0
> | stp BASE, L->base
> | cmplw TMP1, CARG2
> | sub CARG3, CARG2, TMP1
> | decode_RA8 RA, INS
> + |.if FPU
> | stfd f0, 0(CARG2)
> + |.else
> + | stw TMP2, 0(CARG2)
> + | stw TMP3, 4(CARG2)
> + |.endif
> | bney ->BC_CAT_Z
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux TMP2, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | b ->cont_nop
> |
> |//-- Table indexing metamethods -----------------------------------------
> @@ -900,9 +959,19 @@ static void build_subroutines(BuildCtx *ctx)
> | // Returns TValue * (finished) or NULL (metamethod).
> | cmplwi CRET1, 0
> | beq >3
> + |.if FPU
> | lfd f0, 0(CRET1)
> + |.else
> + | lwz TMP0, 0(CRET1)
> + | lwz TMP1, 4(CRET1)
> + |.endif
> | ins_next1
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> |
> |3: // Call __index metamethod.
> @@ -920,7 +989,12 @@ static void build_subroutines(BuildCtx *ctx)
> | // Returns cTValue * or NULL.
> | cmplwi CRET1, 0
> | beq >1
> + |.if FPU
> | lfd f14, 0(CRET1)
> + |.else
> + | lwz SAVE0, 0(CRET1)
> + | lwz SAVE1, 4(CRET1)
> + |.endif
> | b ->BC_TGETR_Z
> |1:
> | stwx TISNIL, BASE, RA
> @@ -975,11 +1049,21 @@ static void build_subroutines(BuildCtx *ctx)
> | bl extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k)
> | // Returns TValue * (finished) or NULL (metamethod).
> | cmplwi CRET1, 0
> + |.if FPU
> | lfdx f0, BASE, RA
> + |.else
> + | lwzux TMP2, RA, BASE
> + | lwz TMP3, 4(RA)
> + |.endif
> | beq >3
> | // NOBARRIER: lj_meta_tset ensures the table is not black.
> | ins_next1
> + |.if FPU
> | stfd f0, 0(CRET1)
> + |.else
> + | stw TMP2, 0(CRET1)
> + | stw TMP3, 4(CRET1)
> + |.endif
> | ins_next2
> |
> |3: // Call __newindex metamethod.
> @@ -990,7 +1074,12 @@ static void build_subroutines(BuildCtx *ctx)
> | add PC, TMP1, BASE
> | lwz LFUNC:RB, FRAME_FUNC(BASE) // Guaranteed to be a function here.
> | li NARGS8:RC, 24 // 3 args for func(t, k, v)
> + |.if FPU
> | stfd f0, 16(BASE) // Copy value to third argument.
> + |.else
> + | stw TMP2, 16(BASE)
> + | stw TMP3, 20(BASE)
> + |.endif
> | b ->vm_call_dispatch_f
> |
> |->vmeta_tsetr:
> @@ -999,7 +1088,12 @@ static void build_subroutines(BuildCtx *ctx)
> | stw PC, SAVE_PC
> | bl extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key)
> | // Returns TValue *.
> + |.if FPU
> | stfd f14, 0(CRET1)
> + |.else
> + | stw SAVE0, 0(CRET1)
> + | stw SAVE1, 4(CRET1)
> + |.endif
> | b ->cont_nop
> |
> |//-- Comparison metamethods ---------------------------------------------
> @@ -1038,9 +1132,19 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |->cont_ra: // RA = resultptr
> | lwz INS, -4(PC)
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + |.endif
> | decode_RA8 TMP1, INS
> + |.if FPU
> | stfdx f0, BASE, TMP1
> + |.else
> + | stwux CARG1, TMP1, BASE
> + | stw CARG2, 4(TMP1)
> + |.endif
> | b ->cont_nop
> |
> |->cont_condt: // RA = resultptr
> @@ -1246,22 +1350,32 @@ static void build_subroutines(BuildCtx *ctx)
> |.macro .ffunc_n, name
> |->ff_ .. name:
> | cmplwi NARGS8:RC, 8
> - | lwz CARG3, 0(BASE)
> + | lwz CARG1, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blt ->fff_fallback
> - | checknum CARG3; bge ->fff_fallback
> + | checknum CARG1; bge ->fff_fallback
> |.endmacro
> |
> |.macro .ffunc_nn, name
> |->ff_ .. name:
> | cmplwi NARGS8:RC, 16
> - | lwz CARG3, 0(BASE)
> + | lwz CARG1, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> - | lwz CARG4, 8(BASE)
> + | lwz CARG3, 8(BASE)
> | lfd FARG2, 8(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + | lwz CARG3, 8(BASE)
> + | lwz CARG4, 12(BASE)
> + |.endif
> | blt ->fff_fallback
> + | checknum CARG1; bge ->fff_fallback
> | checknum CARG3; bge ->fff_fallback
> - | checknum CARG4; bge ->fff_fallback
> |.endmacro
> |
> |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1.
> @@ -1282,14 +1396,21 @@ static void build_subroutines(BuildCtx *ctx)
> | bge cr1, ->fff_fallback
> | stw CARG3, 0(RA)
> | addi RD, NARGS8:RC, 8 // Compute (nresults+1)*8.
> + | addi TMP1, BASE, 8
> + | add TMP2, RA, NARGS8:RC
> | stw CARG1, 4(RA)
> | beq ->fff_res // Done if exactly 1 argument.
> - | li TMP1, 8
> - | subi RC, RC, 8
> |1:
> - | cmplw TMP1, RC
> - | lfdx f0, BASE, TMP1
> - | stfdx f0, RA, TMP1
> + | cmplw TMP1, TMP2
> + |.if FPU
> + | lfd f0, 0(TMP1)
> + | stfd f0, 0(TMP1)
> + |.else
> + | lwz CARG1, 0(TMP1)
> + | lwz CARG2, 4(TMP1)
> + | stw CARG1, -8(TMP1)
> + | stw CARG2, -4(TMP1)
> + |.endif
> | addi TMP1, TMP1, 8
> | bney <1
> | b ->fff_res
> @@ -1304,8 +1425,14 @@ static void build_subroutines(BuildCtx *ctx)
> | orc TMP1, TMP2, TMP0
> | addi TMP1, TMP1, ~LJ_TISNUM+1
> | slwi TMP1, TMP1, 3
> + |.if FPU
> | la TMP2, CFUNC:RB->upvalue
> | lfdx FARG1, TMP2, TMP1
> + |.else
> + | add TMP1, CFUNC:RB, TMP1
> + | lwz CARG1, CFUNC:TMP1->upvalue[0].u32.hi
> + | lwz CARG2, CFUNC:TMP1->upvalue[0].u32.lo
> + |.endif
> | b ->fff_resn
> |
> |//-- Base library: getters and setters ---------------------------------
> @@ -1383,7 +1510,12 @@ static void build_subroutines(BuildCtx *ctx)
> | mr CARG1, L
> | bl extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key)
> | // Returns cTValue *.
> + |.if FPU
> | lfd FARG1, 0(CRET1)
> + |.else
> + | lwz CARG2, 4(CRET1)
> + | lwz CARG1, 0(CRET1) // Caveat: CARG1 == CRET1.
> + |.endif
> | b ->fff_resn
> |
> |//-- Base library: conversions ------------------------------------------
> @@ -1392,7 +1524,11 @@ static void build_subroutines(BuildCtx *ctx)
> | // Only handles the number case inline (without a base argument).
> | cmplwi NARGS8:RC, 8
> | lwz CARG1, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + |.endif
> | bne ->fff_fallback // Exactly one argument.
> | checknum CARG1; bgt ->fff_fallback
> | b ->fff_resn
> @@ -1443,12 +1579,23 @@ static void build_subroutines(BuildCtx *ctx)
> | cmplwi CRET1, 0
> | li CARG3, LJ_TNIL
> | beq ->fff_restv // End of traversal: return nil.
> - | lfd f0, 8(BASE) // Copy key and value to results.
> | la RA, -8(BASE)
> + |.if FPU
> + | lfd f0, 8(BASE) // Copy key and value to results.
> | lfd f1, 16(BASE)
> | stfd f0, 0(RA)
> - | li RD, (2+1)*8
> | stfd f1, 8(RA)
> + |.else
> + | lwz CARG1, 8(BASE)
> + | lwz CARG2, 12(BASE)
> + | lwz CARG3, 16(BASE)
> + | lwz CARG4, 20(BASE)
> + | stw CARG1, 0(RA)
> + | stw CARG2, 4(RA)
> + | stw CARG3, 8(RA)
> + | stw CARG4, 12(RA)
> + |.endif
> + | li RD, (2+1)*8
> | b ->fff_res
> |
> |.ffunc_1 pairs
> @@ -1457,17 +1604,32 @@ static void build_subroutines(BuildCtx *ctx)
> | bne ->fff_fallback
> #if LJ_52
> | lwz TAB:TMP2, TAB:CARG1->metatable
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | cmplwi TAB:TMP2, 0
> | la RA, -8(BASE)
> | bne ->fff_fallback
> #else
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | la RA, -8(BASE)
> #endif
> | stw TISNIL, 8(BASE)
> | li RD, (3+1)*8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw TMP0, 0(RA)
> + | stw TMP1, 4(RA)
> + |.endif
> | b ->fff_res
> |
> |.ffunc ipairs_aux
> @@ -1513,14 +1675,24 @@ static void build_subroutines(BuildCtx *ctx)
> | stfd FARG2, 0(RA)
> |.endif
> | ble >2 // Not in array part?
> + |.if FPU
> | lwzx TMP2, TMP1, TMP3
> | lfdx f0, TMP1, TMP3
> + |.else
> + | lwzux TMP2, TMP1, TMP3
> + | lwz TMP3, 4(TMP1)
> + |.endif
> |1:
> | checknil TMP2
> | li RD, (0+1)*8
> | beq ->fff_res // End of iteration, return 0 results.
> | li RD, (2+1)*8
> + |.if FPU
> | stfd f0, 8(RA)
> + |.else
> + | stw TMP2, 8(RA)
> + | stw TMP3, 12(RA)
> + |.endif
> | b ->fff_res
> |2: // Check for empty hash part first. Otherwise call C function.
> | lwz TMP0, TAB:CARG1->hmask
> @@ -1534,7 +1706,11 @@ static void build_subroutines(BuildCtx *ctx)
> | li RD, (0+1)*8
> | beq ->fff_res
> | lwz TMP2, 0(CRET1)
> + |.if FPU
> | lfd f0, 0(CRET1)
> + |.else
> + | lwz TMP3, 4(CRET1)
> + |.endif
> | b <1
> |
> |.ffunc_1 ipairs
> @@ -1543,12 +1719,22 @@ static void build_subroutines(BuildCtx *ctx)
> | bne ->fff_fallback
> #if LJ_52
> | lwz TAB:TMP2, TAB:CARG1->metatable
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | cmplwi TAB:TMP2, 0
> | la RA, -8(BASE)
> | bne ->fff_fallback
> #else
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | la RA, -8(BASE)
> #endif
> |.if DUALNUM
> @@ -1558,7 +1744,12 @@ static void build_subroutines(BuildCtx *ctx)
> |.endif
> | stw ZERO, 12(BASE)
> | li RD, (3+1)*8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw TMP0, 0(RA)
> + | stw TMP1, 4(RA)
> + |.endif
> | b ->fff_res
> |
> |//-- Base library: catch errors ----------------------------------------
> @@ -1577,19 +1768,32 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.ffunc xpcall
> | cmplwi NARGS8:RC, 16
> - | lwz CARG4, 8(BASE)
> + | lwz CARG3, 8(BASE)
> + |.if FPU
> | lfd FARG2, 8(BASE)
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + | lwz CARG4, 12(BASE)
> + |.endif
> | blt ->fff_fallback
> | lbz TMP1, DISPATCH_GL(hookmask)(DISPATCH)
> | mr TMP2, BASE
> - | checkfunc CARG4; bne ->fff_fallback // Traceback must be a function.
> + | checkfunc CARG3; bne ->fff_fallback // Traceback must be a function.
> | la BASE, 16(BASE)
> | // Remember active hook before pcall.
> | rlwinm TMP1, TMP1, 32-HOOK_ACTIVE_SHIFT, 31, 31
> + |.if FPU
> | stfd FARG2, 0(TMP2) // Swap function and traceback.
> - | subi NARGS8:RC, NARGS8:RC, 16
> | stfd FARG1, 8(TMP2)
> + |.else
> + | stw CARG3, 0(TMP2)
> + | stw CARG4, 4(TMP2)
> + | stw CARG1, 8(TMP2)
> + | stw CARG2, 12(TMP2)
> + |.endif
> + | subi NARGS8:RC, NARGS8:RC, 16
> | addi PC, TMP1, 16+FRAME_PCALL
> | b ->vm_call_dispatch
> |
> @@ -1632,9 +1836,21 @@ static void build_subroutines(BuildCtx *ctx)
> | stp BASE, L->top
> |2: // Move args to coroutine.
> | cmpw TMP1, NARGS8:RC
> + |.if FPU
> | lfdx f0, BASE, TMP1
> + |.else
> + | add CARG3, BASE, TMP1
> + | lwz TMP2, 0(CARG3)
> + | lwz TMP3, 4(CARG3)
> + |.endif
> | beq >3
> + |.if FPU
> | stfdx f0, CARG2, TMP1
> + |.else
> + | add CARG3, CARG2, TMP1
> + | stw TMP2, 0(CARG3)
> + | stw TMP3, 4(CARG3)
> + |.endif
> | addi TMP1, TMP1, 8
> | b <2
> |3:
> @@ -1665,8 +1881,17 @@ static void build_subroutines(BuildCtx *ctx)
> | stp TMP2, L:SAVE0->top // Clear coroutine stack.
> |5: // Move results from coroutine.
> | cmplw TMP1, TMP3
> + |.if FPU
> | lfdx f0, TMP2, TMP1
> | stfdx f0, BASE, TMP1
> + |.else
> + | add CARG3, TMP2, TMP1
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + | add CARG3, BASE, TMP1
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | addi TMP1, TMP1, 8
> | bne <5
> |6:
> @@ -1691,12 +1916,22 @@ static void build_subroutines(BuildCtx *ctx)
> | andix. TMP0, PC, FRAME_TYPE
> | la TMP3, -8(TMP3)
> | li TMP1, LJ_TFALSE
> + |.if FPU
> | lfd f0, 0(TMP3)
> + |.else
> + | lwz CARG1, 0(TMP3)
> + | lwz CARG2, 4(TMP3)
> + |.endif
> | stp TMP3, L:SAVE0->top // Remove error from coroutine stack.
> | li RD, (2+1)*8
> | stw TMP1, -8(BASE) // Prepend false to results.
> | la RA, -8(BASE)
> + |.if FPU
> | stfd f0, 0(BASE) // Copy error message.
> + |.else
> + | stw CARG1, 0(BASE) // Copy error message.
> + | stw CARG2, 4(BASE)
> + |.endif
> | b <7
> |.else
> | mr CARG1, L
> @@ -1875,7 +2110,12 @@ static void build_subroutines(BuildCtx *ctx)
> | lus CARG1, 0x8000 // -(2^31).
> | beqy ->fff_resi
> |5:
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blex func
> | b ->fff_resn
> |.endmacro
> @@ -1899,10 +2139,14 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.ffunc math_log
> | cmplwi NARGS8:RC, 8
> - | lwz CARG3, 0(BASE)
> - | lfd FARG1, 0(BASE)
> + | lwz CARG1, 0(BASE)
> | bne ->fff_fallback // Need exactly 1 argument.
> - | checknum CARG3; bge ->fff_fallback
> + | checknum CARG1; bge ->fff_fallback
> + |.if FPU
> + | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blex log
> | b ->fff_resn
> |
> @@ -1924,17 +2168,24 @@ static void build_subroutines(BuildCtx *ctx)
> |.if DUALNUM
> |.ffunc math_ldexp
> | cmplwi NARGS8:RC, 16
> - | lwz CARG3, 0(BASE)
> + | lwz TMP0, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> - | lwz CARG4, 8(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> + | lwz TMP1, 8(BASE)
> |.if GPR64
> | lwz CARG2, 12(BASE)
> - |.else
> + |.elif FPU
> | lwz CARG1, 12(BASE)
> + |.else
> + | lwz CARG3, 12(BASE)
> |.endif
> | blt ->fff_fallback
> - | checknum CARG3; bge ->fff_fallback
> - | checknum CARG4; bne ->fff_fallback
> + | checknum TMP0; bge ->fff_fallback
> + | checknum TMP1; bne ->fff_fallback
> |.else
> |.ffunc_nn math_ldexp
> |.if GPR64
> @@ -1949,8 +2200,10 @@ static void build_subroutines(BuildCtx *ctx)
> |.ffunc_n math_frexp
> |.if GPR64
> | la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
> - |.else
> + |.elif FPU
> | la CARG1, DISPATCH_GL(tmptv)(DISPATCH)
> + |.else
> + | la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
> |.endif
> | lwz PC, FRAME_PC(BASE)
> | blex frexp
> @@ -1959,7 +2212,12 @@ static void build_subroutines(BuildCtx *ctx)
> |.if not DUALNUM
> | tonum_i FARG2, TMP1
> |.endif
> + |.if FPU
> | stfd FARG1, 0(RA)
> + |.else
> + | stw CRET1, 0(RA)
> + | stw CRET2, 4(RA)
> + |.endif
> | li RD, (2+1)*8
> |.if DUALNUM
> | stw TISNUM, 8(RA)
> @@ -1972,13 +2230,20 @@ static void build_subroutines(BuildCtx *ctx)
> |.ffunc_n math_modf
> |.if GPR64
> | la CARG2, -8(BASE)
> - |.else
> + |.elif FPU
> | la CARG1, -8(BASE)
> + |.else
> + | la CARG3, -8(BASE)
> |.endif
> | lwz PC, FRAME_PC(BASE)
> | blex modf
> | la RA, -8(BASE)
> + |.if FPU
> | stfd FARG1, 0(BASE)
> + |.else
> + | stw CRET1, 0(BASE)
> + | stw CRET2, 4(BASE)
> + |.endif
> | li RD, (2+1)*8
> | b ->fff_res
> |
> @@ -1986,13 +2251,13 @@ static void build_subroutines(BuildCtx *ctx)
> |.if DUALNUM
> | .ffunc_1 name
> | checknum CARG3
> - | addi TMP1, BASE, 8
> - | add TMP2, BASE, NARGS8:RC
> + | addi SAVE0, BASE, 8
> + | add SAVE1, BASE, NARGS8:RC
> | bne >4
> |1: // Handle integers.
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> + | lwz CARG2, 4(SAVE0)
> | bge cr1, ->fff_resi
> | checknum CARG4
> | xoris TMP0, CARG1, 0x8000
> @@ -2009,36 +2274,76 @@ static void build_subroutines(BuildCtx *ctx)
> |.if GPR64
> | rldicl CARG1, CARG1, 0, 32
> |.endif
> - | addi TMP1, TMP1, 8
> + | addi SAVE0, SAVE0, 8
> | b <1
> |3:
> | bge ->fff_fallback
> | // Convert intermediate result to number and continue below.
> + |.if FPU
> | tonum_i FARG1, CARG1
> - | lfd FARG2, 0(TMP1)
> + | lfd FARG2, 0(SAVE0)
> + |.else
> + | mr CARG2, CARG1
> + | bl ->vm_sfi2d_1
> + | lwz CARG3, 0(SAVE0)
> + | lwz CARG4, 4(SAVE0)
> + |.endif
> | b >6
> |4:
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> | bge ->fff_fallback
> |5: // Handle numbers.
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> - | lfd FARG2, 0(TMP1)
> + | lwz CARG3, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> + |.if FPU
> + | lfd FARG2, 0(SAVE0)
> + |.else
> + | lwz CARG4, 4(SAVE0)
> + |.endif
> | bge cr1, ->fff_resn
> - | checknum CARG4; bge >7
> + | checknum CARG3; bge >7
> |6:
> + | addi SAVE0, SAVE0, 8
> + |.if FPU
> | fsub f0, FARG1, FARG2
> - | addi TMP1, TMP1, 8
> |.if ismax
> | fsel FARG1, f0, FARG1, FARG2
> |.else
> | fsel FARG1, f0, FARG2, FARG1
> |.endif
> + |.else
> + | stw CARG1, SFSAVE_1
> + | stw CARG2, SFSAVE_2
> + | stw CARG3, SFSAVE_3
> + | stw CARG4, SFSAVE_4
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.if ismax
> + | blt >8
> + |.else
> + | bge >8
> + |.endif
> + | lwz CARG1, SFSAVE_1
> + | lwz CARG2, SFSAVE_2
> + | b <5
> + |8:
> + | lwz CARG1, SFSAVE_3
> + | lwz CARG2, SFSAVE_4
> + |.endif
> | b <5
> |7: // Convert integer to number and continue above.
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG3, 4(SAVE0)
> | bne ->fff_fallback
> - | tonum_i FARG2, CARG2
> + |.if FPU
> + | tonum_i FARG2, CARG3
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b <6
> |.else
> | .ffunc_n name
> @@ -2238,28 +2543,37 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.macro .ffunc_bit_op, name, ins
> | .ffunc_bit name
> - | addi TMP1, BASE, 8
> - | add TMP2, BASE, NARGS8:RC
> + | addi SAVE0, BASE, 8
> + | add SAVE1, BASE, NARGS8:RC
> |1:
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> + | lwz CARG4, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> |.if DUALNUM
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG2, 4(SAVE0)
> |.else
> - | lfd FARG1, 0(TMP1)
> + | lfd FARG1, 0(SAVE0)
> |.endif
> | bgey cr1, ->fff_resi
> | checknum CARG4
> |.if DUALNUM
> + |.if FPU
> | bnel ->fff_bitop_fb
> |.else
> + | beq >3
> + | stw CARG1, SFSAVE_1
> + | bl ->fff_bitop_fb
> + | mr CARG2, CARG1
> + | lwz CARG1, SFSAVE_1
> + |3:
> + |.endif
> + |.else
> | fadd FARG1, FARG1, TOBIT
> | bge ->fff_fallback
> | stfd FARG1, TMPD
> | lwz CARG2, TMPD_LO
> |.endif
> | ins CARG1, CARG1, CARG2
> - | addi TMP1, TMP1, 8
> + | addi SAVE0, SAVE0, 8
> | b <1
> |.endmacro
> |
> @@ -2281,7 +2595,14 @@ static void build_subroutines(BuildCtx *ctx)
> |.macro .ffunc_bit_sh, name, ins, shmod
> |.if DUALNUM
> | .ffunc_2 bit_..name
> + |.if FPU
> | checknum CARG3; bnel ->fff_tobit_fb
> + |.else
> + | checknum CARG3; beq >1
> + | bl ->fff_tobit_fb
> + | lwz CARG2, 12(BASE) // Conversion polluted CARG2.
> + |1:
> + |.endif
> | // Note: no inline conversion from number for 2nd argument!
> | checknum CARG4; bne ->fff_fallback
> |.else
> @@ -2318,27 +2639,77 @@ static void build_subroutines(BuildCtx *ctx)
> |->fff_resn:
> | lwz PC, FRAME_PC(BASE)
> | la RA, -8(BASE)
> + |.if FPU
> | stfd FARG1, -8(BASE)
> + |.else
> + | stw CARG1, -8(BASE)
> + | stw CARG2, -4(BASE)
> + |.endif
> | b ->fff_res1
> |
> |// Fallback FP number to bit conversion.
> |->fff_tobit_fb:
> |.if DUALNUM
> + |.if FPU
> | lfd FARG1, 0(BASE)
> | bgt ->fff_fallback
> | fadd FARG1, FARG1, TOBIT
> | stfd FARG1, TMPD
> | lwz CARG1, TMPD_LO
> | blr
> + |.else
> + | bgt ->fff_fallback
> + | mr CARG2, CARG1
> + | mr CARG1, CARG3
> + |// Modifies: CARG1, CARG2, TMP0, TMP1, TMP2.
> + |->vm_tobit:
> + | slwi TMP2, CARG1, 1
> + | addis TMP2, TMP2, 0x0020
> + | cmpwi TMP2, 0
> + | bge >2
> + | li TMP1, 0x3e0
> + | srawi TMP2, TMP2, 21
> + | not TMP1, TMP1
> + | sub. TMP2, TMP1, TMP2
> + | cmpwi cr7, CARG1, 0
> + | blt >1
> + | slwi TMP1, CARG1, 11
> + | srwi TMP0, CARG2, 21
> + | oris TMP1, TMP1, 0x8000
> + | or TMP1, TMP1, TMP0
> + | srw CARG1, TMP1, TMP2
> + | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
> + | neg CARG1, CARG1
> + | blr
> + |1:
> + | addi TMP2, TMP2, 21
> + | srw TMP1, CARG2, TMP2
> + | slwi CARG2, CARG1, 12
> + | subfic TMP2, TMP2, 20
> + | slw TMP0, CARG2, TMP2
> + | or CARG1, TMP1, TMP0
> + | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
> + | neg CARG1, CARG1
> + | blr
> + |2:
> + | li CARG1, 0
> + | blr
> + |.endif
> |.endif
> |->fff_bitop_fb:
> |.if DUALNUM
> - | lfd FARG1, 0(TMP1)
> + |.if FPU
> + | lfd FARG1, 0(SAVE0)
> | bgt ->fff_fallback
> | fadd FARG1, FARG1, TOBIT
> | stfd FARG1, TMPD
> | lwz CARG2, TMPD_LO
> | blr
> + |.else
> + | bgt ->fff_fallback
> + | mr CARG1, CARG4
> + | b ->vm_tobit
> + |.endif
> |.endif
> |
> |//-----------------------------------------------------------------------
> @@ -2531,10 +2902,21 @@ static void build_subroutines(BuildCtx *ctx)
> | decode_RA8 RC, INS // Call base.
> | beq >2
> |1: // Move results down.
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + |.endif
> | addic. TMP1, TMP1, -8
> | addi RA, RA, 8
> + |.if FPU
> | stfdx f0, BASE, RC
> + |.else
> + | add CARG3, BASE, RC
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | addi RC, RC, 8
> | bne <1
> |2:
> @@ -2587,10 +2969,12 @@ static void build_subroutines(BuildCtx *ctx)
> |//-----------------------------------------------------------------------
> |
> |.macro savex_, a, b, c, d
> + |.if FPU
> | stfd f..a, 16+a*8(sp)
> | stfd f..b, 16+b*8(sp)
> | stfd f..c, 16+c*8(sp)
> | stfd f..d, 16+d*8(sp)
> + |.endif
> |.endmacro
> |
> |->vm_exit_handler:
> @@ -2662,16 +3046,16 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz KBASE, PC2PROTO(k)(TMP1)
> | // Setup type comparison constants.
> | li TISNUM, LJ_TISNUM
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> - | stw TMP3, TMPD
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU stw TMP3, TMPD
> | li ZERO, 0
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | lfs TOBIT, TMPD
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | li TISNIL, LJ_TNIL
> - | stw TMP0, TONUM_HI
> - | lfs TONUM, TMPD
> + | .FPU stw TMP0, TONUM_HI
> + | .FPU lfs TONUM, TMPD
> | // Modified copy of ins_next which handles function header dispatch, too.
> | lwz INS, 0(PC)
> | addi PC, PC, 4
> @@ -2716,7 +3100,35 @@ static void build_subroutines(BuildCtx *ctx)
> |//-- Math helper functions ----------------------------------------------
> |//-----------------------------------------------------------------------
> |
> - |// NYI: Use internal implementations of floor, ceil, trunc.
> + |// NYI: Use internal implementations of floor, ceil, trunc, sfcmp.
> + |
> + |.macro sfi2d, AHI, ALO
> + |.if not FPU
> + | mr. AHI, ALO
> + | bclr 12, 2 // Handle zero first.
> + | srawi TMP0, ALO, 31
> + | xor TMP1, ALO, TMP0
> + | sub TMP1, TMP1, TMP0 // Absolute value in TMP1.
> + | cntlzw AHI, TMP1
> + | andix. TMP0, TMP0, 0x800 // Mask sign bit.
> + | slw TMP1, TMP1, AHI // Align mantissa left with leading 1.
> + | subfic AHI, AHI, 0x3ff+31-1 // Exponent -1 in AHI.
> + | slwi ALO, TMP1, 21
> + | or AHI, AHI, TMP0 // Sign | Exponent.
> + | srwi TMP1, TMP1, 11
> + | slwi AHI, AHI, 20 // Align left.
> + | add AHI, AHI, TMP1 // Add mantissa, increment exponent.
> + | blr
> + |.endif
> + |.endmacro
> + |
> + |// Input: CARG2. Output: CARG1, CARG2. Temporaries: TMP0, TMP1.
> + |->vm_sfi2d_1:
> + | sfi2d CARG1, CARG2
> + |
> + |// Input: CARG4. Output: CARG3, CARG4. Temporaries: TMP0, TMP1.
> + |->vm_sfi2d_2:
> + | sfi2d CARG3, CARG4
> |
> |->vm_modi:
> | divwo. TMP0, CARG1, CARG2
> @@ -2784,21 +3196,21 @@ static void build_subroutines(BuildCtx *ctx)
> | addi DISPATCH, r12, GG_G2DISP
> | stw r11, CTSTATE->cb.slot
> | stw r3, CTSTATE->cb.gpr[0]
> - | stfd f1, CTSTATE->cb.fpr[0]
> + | .FPU stfd f1, CTSTATE->cb.fpr[0]
> | stw r4, CTSTATE->cb.gpr[1]
> - | stfd f2, CTSTATE->cb.fpr[1]
> + | .FPU stfd f2, CTSTATE->cb.fpr[1]
> | stw r5, CTSTATE->cb.gpr[2]
> - | stfd f3, CTSTATE->cb.fpr[2]
> + | .FPU stfd f3, CTSTATE->cb.fpr[2]
> | stw r6, CTSTATE->cb.gpr[3]
> - | stfd f4, CTSTATE->cb.fpr[3]
> + | .FPU stfd f4, CTSTATE->cb.fpr[3]
> | stw r7, CTSTATE->cb.gpr[4]
> - | stfd f5, CTSTATE->cb.fpr[4]
> + | .FPU stfd f5, CTSTATE->cb.fpr[4]
> | stw r8, CTSTATE->cb.gpr[5]
> - | stfd f6, CTSTATE->cb.fpr[5]
> + | .FPU stfd f6, CTSTATE->cb.fpr[5]
> | stw r9, CTSTATE->cb.gpr[6]
> - | stfd f7, CTSTATE->cb.fpr[6]
> + | .FPU stfd f7, CTSTATE->cb.fpr[6]
> | stw r10, CTSTATE->cb.gpr[7]
> - | stfd f8, CTSTATE->cb.fpr[7]
> + | .FPU stfd f8, CTSTATE->cb.fpr[7]
> | addi TMP0, sp, CFRAME_SPACE+8
> | stw TMP0, CTSTATE->cb.stack
> | mr CARG1, CTSTATE
> @@ -2809,21 +3221,21 @@ static void build_subroutines(BuildCtx *ctx)
> | lp BASE, L:CRET1->base
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp RC, L:CRET1->top
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | li ZERO, 0
> | mr L, CRET1
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | stw TMP0, TONUM_HI
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU stw TMP0, TONUM_HI
> | li TISNIL, LJ_TNIL
> | li_vmstate INTERP
> - | lfs TOBIT, TMPD
> - | stw TMP3, TMPD
> + | .FPU lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> | sub RC, RC, BASE
> | st_vmstate
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | ins_callt
> |.endif
> |
> @@ -2837,7 +3249,7 @@ static void build_subroutines(BuildCtx *ctx)
> | mr CARG2, RA
> | bl extern lj_ccallback_leave // (CTState *cts, TValue *o)
> | lwz CRET1, CTSTATE->cb.gpr[0]
> - | lfd FARG1, CTSTATE->cb.fpr[0]
> + | .FPU lfd FARG1, CTSTATE->cb.fpr[0]
> | lwz CRET2, CTSTATE->cb.gpr[1]
> | b ->vm_leave_unw
> |.endif
> @@ -2871,14 +3283,14 @@ static void build_subroutines(BuildCtx *ctx)
> | bge <1
> |2:
> | bney cr1, >3
> - | lfd f1, CCSTATE->fpr[0]
> - | lfd f2, CCSTATE->fpr[1]
> - | lfd f3, CCSTATE->fpr[2]
> - | lfd f4, CCSTATE->fpr[3]
> - | lfd f5, CCSTATE->fpr[4]
> - | lfd f6, CCSTATE->fpr[5]
> - | lfd f7, CCSTATE->fpr[6]
> - | lfd f8, CCSTATE->fpr[7]
> + | .FPU lfd f1, CCSTATE->fpr[0]
> + | .FPU lfd f2, CCSTATE->fpr[1]
> + | .FPU lfd f3, CCSTATE->fpr[2]
> + | .FPU lfd f4, CCSTATE->fpr[3]
> + | .FPU lfd f5, CCSTATE->fpr[4]
> + | .FPU lfd f6, CCSTATE->fpr[5]
> + | .FPU lfd f7, CCSTATE->fpr[6]
> + | .FPU lfd f8, CCSTATE->fpr[7]
> |3:
> | lp TMP0, CCSTATE->func
> | lwz CARG2, CCSTATE->gpr[1]
> @@ -2895,7 +3307,7 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz TMP2, -4(r14)
> | lwz TMP0, 4(r14)
> | stw CARG1, CCSTATE:TMP1->gpr[0]
> - | stfd FARG1, CCSTATE:TMP1->fpr[0]
> + | .FPU stfd FARG1, CCSTATE:TMP1->fpr[0]
> | stw CARG2, CCSTATE:TMP1->gpr[1]
> | mtlr TMP0
> | stw CARG3, CCSTATE:TMP1->gpr[2]
> @@ -2924,19 +3336,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
> | // RA = src1*8, RD = src2*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, BASE
> + | lwzux CARG3, RD, BASE
> | lwz TMP2, -4(PC)
> - | checknum cr0, TMP0
> - | lwz CARG3, 4(RD)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RD)
> | decode_RD4 TMP2, TMP2
> - | checknum cr1, TMP1
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | checknum cr1, CARG3
> + | addis SAVE0, TMP2, -(BCBIAS_J*4 >> 16)
> | bne cr0, >7
> | bne cr1, >8
> - | cmpw CARG2, CARG3
> + | cmpw CARG2, CARG4
> if (op == BC_ISLT) {
> | bge >2
> } else if (op == BC_ISGE) {
> @@ -2947,28 +3359,41 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | ble >2
> }
> |1:
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |2:
> | ins_next
> |
> |7: // RA is not an integer.
> | bgt cr0, ->vmeta_comp
> | // RA is a number.
> - | lfd f0, 0(RA)
> + | .FPU lfd f0, 0(RA)
> | bgt cr1, ->vmeta_comp
> | blt cr1, >4
> | // RA is a number, RD is an integer.
> - | tonum_i f1, CARG3
> + |.if FPU
> + | tonum_i f1, CARG4
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b >5
> |
> |8: // RA is an integer, RD is not an integer.
> | bgt cr1, ->vmeta_comp
> | // RA is an integer, RD is a number.
> + |.if FPU
> | tonum_i f0, CARG2
> + |.else
> + | bl ->vm_sfi2d_1
> + |.endif
> |4:
> - | lfd f1, 0(RD)
> + | .FPU lfd f1, 0(RD)
> |5:
> + |.if FPU
> | fcmpu cr0, f0, f1
> + |.else
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.endif
> if (op == BC_ISLT) {
> | bge <2
> } else if (op == BC_ISGE) {
> @@ -3016,42 +3441,42 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> vk = op == BC_ISEQV;
> | // RA = src1*8, RD = src2*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, BASE
> - | checknum cr0, TMP0
> - | lwz TMP2, -4(PC)
> - | checknum cr1, TMP1
> - | decode_RD4 TMP2, TMP2
> - | lwz CARG3, 4(RD)
> + | lwzux CARG3, RD, BASE
> + | checknum cr0, CARG1
> + | lwz SAVE0, -4(PC)
> + | checknum cr1, CARG3
> + | decode_RD4 SAVE0, SAVE0
> + | lwz CARG4, 4(RD)
> | cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> if (vk) {
> | ble cr7, ->BC_ISEQN_Z
> } else {
> | ble cr7, ->BC_ISNEN_Z
> }
> |.else
> - | lwzux TMP0, RA, BASE
> - | lwz TMP2, 0(PC)
> + | lwzux CARG1, RA, BASE
> + | lwz SAVE0, 0(PC)
> | lfd f0, 0(RA)
> | addi PC, PC, 4
> - | lwzux TMP1, RD, BASE
> - | checknum cr0, TMP0
> - | decode_RD4 TMP2, TMP2
> + | lwzux CARG3, RD, BASE
> + | checknum cr0, CARG1
> + | decode_RD4 SAVE0, SAVE0
> | lfd f1, 0(RD)
> - | checknum cr1, TMP1
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | checknum cr1, CARG3
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> | bge cr0, >5
> | bge cr1, >5
> | fcmpu cr0, f0, f1
> if (vk) {
> | bne >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> } else {
> | beq >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> }
> |1:
> | ins_next
> @@ -3059,36 +3484,36 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |5: // Either or both types are not numbers.
> |.if not DUALNUM
> | lwz CARG2, 4(RA)
> - | lwz CARG3, 4(RD)
> + | lwz CARG4, 4(RD)
> |.endif
> |.if FFI
> - | cmpwi cr7, TMP0, LJ_TCDATA
> - | cmpwi cr5, TMP1, LJ_TCDATA
> + | cmpwi cr7, CARG1, LJ_TCDATA
> + | cmpwi cr5, CARG3, LJ_TCDATA
> |.endif
> - | not TMP3, TMP0
> - | cmplw TMP0, TMP1
> - | cmplwi cr1, TMP3, ~LJ_TISPRI // Primitive?
> + | not TMP2, CARG1
> + | cmplw CARG1, CARG3
> + | cmplwi cr1, TMP2, ~LJ_TISPRI // Primitive?
> |.if FFI
> | cror 4*cr7+eq, 4*cr7+eq, 4*cr5+eq
> |.endif
> - | cmplwi cr6, TMP3, ~LJ_TISTABUD // Table or userdata?
> + | cmplwi cr6, TMP2, ~LJ_TISTABUD // Table or userdata?
> |.if FFI
> | beq cr7, ->vmeta_equal_cd
> |.endif
> - | cmplw cr5, CARG2, CARG3
> + | cmplw cr5, CARG2, CARG4
> | crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt // 2: Same type and primitive.
> | crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq // 1: Same tv or different type.
> | crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq // 0: Same type and same tv.
> - | mr SAVE0, PC
> + | mr SAVE1, PC
> | cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt // 0 or 2.
> | cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt // 1 or 2.
> if (vk) {
> | bne cr0, >6
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |6:
> } else {
> | beq cr0, >6
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |6:
> }
> |.if DUALNUM
> @@ -3103,6 +3528,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |
> | // Different tables or userdatas. Need to check __eq metamethod.
> | // Field metatable must be at same offset for GCtab and GCudata!
> + | mr CARG3, CARG4
> | lwz TAB:TMP2, TAB:CARG2->metatable
> | li CARG4, 1-vk // ne = 0 or 1.
> | cmplwi TAB:TMP2, 0
> @@ -3110,7 +3536,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lbz TMP2, TAB:TMP2->nomm
> | andix. TMP2, TMP2, 1<<MM_eq
> | bne <1 // Or 'no __eq' flag set?
> - | mr PC, SAVE0 // Restore old PC.
> + | mr PC, SAVE1 // Restore old PC.
> | b ->vmeta_equal // Handle __eq metamethod.
> break;
>
> @@ -3151,16 +3577,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> vk = op == BC_ISEQN;
> | // RA = src*8, RD = num_const*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, KBASE
> - | checknum cr0, TMP0
> - | lwz TMP2, -4(PC)
> - | checknum cr1, TMP1
> - | decode_RD4 TMP2, TMP2
> - | lwz CARG3, 4(RD)
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | lwzux CARG3, RD, KBASE
> + | checknum cr0, CARG1
> + | lwz SAVE0, -4(PC)
> + | checknum cr1, CARG3
> + | decode_RD4 SAVE0, SAVE0
> + | lwz CARG4, 4(RD)
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> if (vk) {
> |->BC_ISEQN_Z:
> } else {
> @@ -3168,7 +3594,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> }
> | bne cr0, >7
> | bne cr1, >8
> - | cmpw CARG2, CARG3
> + | cmpw CARG2, CARG4
> |4:
> |.else
> if (vk) {
> @@ -3176,20 +3602,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> } else {
> |->BC_ISNEN_Z: // Dummy label.
> }
> - | lwzx TMP0, BASE, RA
> + | lwzx CARG1, BASE, RA
> | addi PC, PC, 4
> | lfdx f0, BASE, RA
> - | lwz TMP2, -4(PC)
> + | lwz SAVE0, -4(PC)
> | lfdx f1, KBASE, RD
> - | decode_RD4 TMP2, TMP2
> - | checknum TMP0
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | decode_RD4 SAVE0, SAVE0
> + | checknum CARG1
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> | bge >3
> | fcmpu cr0, f0, f1
> |.endif
> if (vk) {
> | bne >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |1:
> |.if not FFI
> |3:
> @@ -3200,13 +3626,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.if not FFI
> |3:
> |.endif
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |2:
> }
> | ins_next
> |.if FFI
> |3:
> - | cmpwi TMP0, LJ_TCDATA
> + | cmpwi CARG1, LJ_TCDATA
> | beq ->vmeta_equal_cd
> | b <1
> |.endif
> @@ -3214,18 +3640,31 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |7: // RA is not an integer.
> | bge cr0, <3
> | // RA is a number.
> - | lfd f0, 0(RA)
> + | .FPU lfd f0, 0(RA)
> | blt cr1, >1
> | // RA is a number, RD is an integer.
> - | tonum_i f1, CARG3
> + |.if FPU
> + | tonum_i f1, CARG4
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b >2
> |
> |8: // RA is an integer, RD is a number.
> + |.if FPU
> | tonum_i f0, CARG2
> + |.else
> + | bl ->vm_sfi2d_1
> + |.endif
> |1:
> - | lfd f1, 0(RD)
> + | .FPU lfd f1, 0(RD)
> |2:
> + |.if FPU
> | fcmpu cr0, f0, f1
> + |.else
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.endif
> | b <4
> |.endif
> break;
> @@ -3280,7 +3719,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | add PC, PC, TMP2
> } else {
> | li TMP1, LJ_TFALSE
> + |.if FPU
> | lfdx f0, BASE, RD
> + |.else
> + | lwzux CARG1, RD, BASE
> + | lwz CARG2, 4(RD)
> + |.endif
> | cmplw TMP0, TMP1
> if (op == BC_ISTC) {
> | bge >1
> @@ -3289,7 +3733,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> }
> | addis PC, PC, -(BCBIAS_J*4 >> 16)
> | decode_RD4 TMP2, INS
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux CARG1, RA, BASE
> + | stw CARG2, 4(RA)
> + |.endif
> | add PC, PC, TMP2
> |1:
> }
> @@ -3324,8 +3773,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_MOV:
> | // RA = dst*8, RD = src*8
> | ins_next1
> + |.if FPU
> | lfdx f0, BASE, RD
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, RD, BASE
> + | lwz TMP1, 4(RD)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_NOT:
> @@ -3427,44 +3883,65 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
> ||switch (vk) {
> ||case 0:
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> | .if DUALNUM
> - | lwzx TMP2, KBASE, RC
> + | lwzx CARG3, KBASE, RC
> | .endif
> + | .if FPU
> | lfdx f14, BASE, RB
> | lfdx f15, KBASE, RC
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, KBASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> | .if DUALNUM
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vn
> | .else
> - | checknum TMP1; bge ->vmeta_arith_vn
> + | checknum CARG1; bge ->vmeta_arith_vn
> | .endif
> || break;
> ||case 1:
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> | .if DUALNUM
> - | lwzx TMP2, KBASE, RC
> + | lwzx CARG3, KBASE, RC
> | .endif
> + | .if FPU
> | lfdx f15, BASE, RB
> | lfdx f14, KBASE, RC
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, KBASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> | .if DUALNUM
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_nv
> | .else
> - | checknum TMP1; bge ->vmeta_arith_nv
> + | checknum CARG1; bge ->vmeta_arith_nv
> | .endif
> || break;
> ||default:
> - | lwzx TMP1, BASE, RB
> - | lwzx TMP2, BASE, RC
> + | lwzx CARG1, BASE, RB
> + | lwzx CARG3, BASE, RC
> + | .if FPU
> | lfdx f14, BASE, RB
> | lfdx f15, BASE, RC
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, BASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vv
> || break;
> @@ -3498,48 +3975,78 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | fsub a, b, a // b - floor(b/c)*c
> |.endmacro
> |
> + |.macro sfpmod
> + |->BC_MODVN_Z:
> + | stw CARG1, SFSAVE_1
> + | stw CARG2, SFSAVE_2
> + | mr SAVE0, CARG3
> + | mr SAVE1, CARG4
> + | blex __divdf3
> + | blex floor
> + | mr CARG3, SAVE0
> + | mr CARG4, SAVE1
> + | blex __muldf3
> + | mr CARG3, CRET1
> + | mr CARG4, CRET2
> + | lwz CARG1, SFSAVE_1
> + | lwz CARG2, SFSAVE_2
> + | blex __subdf3
> + |.endmacro
> + |
> |.macro ins_arithfp, fpins
> | ins_arithpre
> |.if "fpins" == "fpmod_"
> | b ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
> - |.else
> + |.elif FPU
> | fpins f0, f14, f15
> | ins_next1
> | stfdx f0, BASE, RA
> | ins_next2
> + |.else
> + | blex __divdf3 // Only soft-float div uses this macro.
> + | ins_next1
> + | stwux CRET1, RA, BASE
> + | stw CRET2, 4(RA)
> + | ins_next2
> |.endif
> |.endmacro
> |
> - |.macro ins_arithdn, intins, fpins
> + |.macro ins_arithdn, intins, fpins, fpcall
> | // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8
> ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
> ||switch (vk) {
> ||case 0:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, KBASE
> - | lwz CARG1, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG2, 4(RC)
> + | lwzux CARG1, RB, BASE
> + | lwzux CARG3, RC, KBASE
> + | lwz CARG2, 4(RB)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RC)
> + | checknum cr1, CARG3
> || break;
> ||case 1:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, KBASE
> - | lwz CARG2, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG1, 4(RC)
> + | lwzux CARG3, RB, BASE
> + | lwzux CARG1, RC, KBASE
> + | lwz CARG4, 4(RB)
> + | checknum cr0, CARG3
> + | lwz CARG2, 4(RC)
> + | checknum cr1, CARG1
> || break;
> ||default:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, BASE
> - | lwz CARG1, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG2, 4(RC)
> + | lwzux CARG1, RB, BASE
> + | lwzux CARG3, RC, BASE
> + | lwz CARG2, 4(RB)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RC)
> + | checknum cr1, CARG3
> || break;
> ||}
> - | checknum cr1, TMP2
> | bne >5
> | bne cr1, >5
> - | intins CARG1, CARG1, CARG2
> + |.if "intins" == "intmod"
> + | mr CARG1, CARG2
> + | mr CARG2, CARG4
> + |.endif
> + | intins CARG1, CARG2, CARG4
> | bso >4
> |1:
> | ins_next1
> @@ -3551,29 +4058,40 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | checkov TMP0, <1 // Ignore unrelated overflow.
> | ins_arithfallback b
> |5: // FP variant.
> + |.if FPU
> ||if (vk == 1) {
> | lfd f15, 0(RB)
> - | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | lfd f14, 0(RC)
> ||} else {
> | lfd f14, 0(RB)
> - | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | lfd f15, 0(RC)
> ||}
> + |.endif
> + | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | ins_arithfallback bge
> |.if "fpins" == "fpmod_"
> | b ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
> |.else
> + |.if FPU
> | fpins f0, f14, f15
> - | ins_next1
> | stfdx f0, BASE, RA
> + |.else
> + |.if "fpcall" == "sfpmod"
> + | sfpmod
> + |.else
> + | blex fpcall
> + |.endif
> + | stwux CRET1, RA, BASE
> + | stw CRET2, 4(RA)
> + |.endif
> + | ins_next1
> | b <2
> |.endif
> |.endmacro
> |
> - |.macro ins_arith, intins, fpins
> + |.macro ins_arith, intins, fpins, fpcall
> |.if DUALNUM
> - | ins_arithdn intins, fpins
> + | ins_arithdn intins, fpins, fpcall
> |.else
> | ins_arithfp fpins
> |.endif
> @@ -3588,9 +4106,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | addo. TMP0, TMP0, TMP3
> | add y, a, b
> |.endmacro
> - | ins_arith addo32., fadd
> + | ins_arith addo32., fadd, __adddf3
> |.else
> - | ins_arith addo., fadd
> + | ins_arith addo., fadd, __adddf3
> |.endif
> break;
> case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
> @@ -3602,36 +4120,48 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | subo. TMP0, TMP0, TMP3
> | sub y, a, b
> |.endmacro
> - | ins_arith subo32., fsub
> + | ins_arith subo32., fsub, __subdf3
> |.else
> - | ins_arith subo., fsub
> + | ins_arith subo., fsub, __subdf3
> |.endif
> break;
> case BC_MULVN: case BC_MULNV: case BC_MULVV:
> - | ins_arith mullwo., fmul
> + | ins_arith mullwo., fmul, __muldf3
> break;
> case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
> | ins_arithfp fdiv
> break;
> case BC_MODVN:
> - | ins_arith intmod, fpmod
> + | ins_arith intmod, fpmod, sfpmod
> break;
> case BC_MODNV: case BC_MODVV:
> - | ins_arith intmod, fpmod_
> + | ins_arith intmod, fpmod_, sfpmod
> break;
> case BC_POW:
> | // NYI: (partial) integer arithmetic.
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> + | lwzx CARG3, BASE, RC
> + |.if FPU
> | lfdx FARG1, BASE, RB
> - | lwzx TMP2, BASE, RC
> | lfdx FARG2, BASE, RC
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + |.else
> + | add TMP1, BASE, RB
> + | add TMP2, BASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + |.endif
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vv
> | blex pow
> | ins_next1
> + |.if FPU
> | stfdx FARG1, BASE, RA
> + |.else
> + | stwux CARG1, RA, BASE
> + | stw CARG2, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -3651,8 +4181,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lp BASE, L->base
> | bne ->vmeta_binop
> | ins_next1
> + |.if FPU
> | lfdx f0, BASE, SAVE0 // Copy result from RB to RA.
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, SAVE0, BASE
> + | lwz TMP1, 4(SAVE0)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -3715,8 +4252,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_KNUM:
> | // RA = dst*8, RD = num_const*8
> | ins_next1
> + |.if FPU
> | lfdx f0, KBASE, RD
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, RD, KBASE
> + | lwz TMP1, 4(RD)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_KPRI:
> @@ -3749,8 +4293,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwzx UPVAL:RB, LFUNC:RB, RD
> | ins_next1
> | lwz TMP1, UPVAL:RB->v
> + |.if FPU
> | lfd f0, 0(TMP1)
> | stfdx f0, BASE, RA
> + |.else
> + | lwz TMP2, 0(TMP1)
> + | lwz TMP3, 4(TMP1)
> + | stwux TMP2, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_USETV:
> @@ -3758,14 +4309,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> | srwi RA, RA, 1
> | addi RA, RA, offsetof(GCfuncL, uvptr)
> + |.if FPU
> | lfdux f0, RD, BASE
> + |.else
> + | lwzux CARG1, RD, BASE
> + | lwz CARG3, 4(RD)
> + |.endif
> | lwzx UPVAL:RB, LFUNC:RB, RA
> | lbz TMP3, UPVAL:RB->marked
> | lwz CARG2, UPVAL:RB->v
> | andix. TMP3, TMP3, LJ_GC_BLACK // isblack(uv)
> | lbz TMP0, UPVAL:RB->closed
> | lwz TMP2, 0(RD)
> + |.if FPU
> | stfd f0, 0(CARG2)
> + |.else
> + | stw CARG1, 0(CARG2)
> + | stw CARG3, 4(CARG2)
> + |.endif
> | cmplwi cr1, TMP0, 0
> | lwz TMP1, 4(RD)
> | cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -3821,11 +4382,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> | srwi RA, RA, 1
> | addi RA, RA, offsetof(GCfuncL, uvptr)
> + |.if FPU
> | lfdx f0, KBASE, RD
> + |.else
> + | lwzux TMP2, RD, KBASE
> + | lwz TMP3, 4(RD)
> + |.endif
> | lwzx UPVAL:RB, LFUNC:RB, RA
> | ins_next1
> | lwz TMP1, UPVAL:RB->v
> + |.if FPU
> | stfd f0, 0(TMP1)
> + |.else
> + | stw TMP2, 0(TMP1)
> + | stw TMP3, 4(TMP1)
> + |.endif
> | ins_next2
> break;
> case BC_USETP:
> @@ -3973,11 +4544,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.endif
> | ble ->vmeta_tgetv // Integer key and in array part?
> | lwzx TMP0, TMP1, TMP2
> + |.if FPU
> | lfdx f14, TMP1, TMP2
> + |.else
> + | lwzux SAVE0, TMP1, TMP2
> + | lwz SAVE1, 4(TMP1)
> + |.endif
> | checknil TMP0; beq >2
> |1:
> | ins_next1
> + |.if FPU
> | stfdx f14, BASE, RA
> + |.else
> + | stwux SAVE0, RA, BASE
> + | stw SAVE1, 4(RA)
> + |.endif
> | ins_next2
> |
> |2: // Check for __index if table value is nil.
> @@ -4053,12 +4634,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz TMP1, TAB:RB->asize
> | lwz TMP2, TAB:RB->array
> | cmplw TMP0, TMP1; bge ->vmeta_tgetb
> + |.if FPU
> | lwzx TMP1, TMP2, RC
> | lfdx f0, TMP2, RC
> + |.else
> + | lwzux TMP1, TMP2, RC
> + | lwz TMP3, 4(TMP2)
> + |.endif
> | checknil TMP1; beq >5
> |1:
> | ins_next1
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux TMP1, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | ins_next2
> |
> |5: // Check for __index if table value is nil.
> @@ -4088,10 +4679,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | cmplw TMP0, CARG2
> | slwi TMP2, CARG2, 3
> | ble ->vmeta_tgetr // In array part?
> + |.if FPU
> | lfdx f14, TMP1, TMP2
> + |.else
> + | lwzux SAVE0, TMP2, TMP1
> + | lwz SAVE1, 4(TMP2)
> + |.endif
> |->BC_TGETR_Z:
> | ins_next1
> + |.if FPU
> | stfdx f14, BASE, RA
> + |.else
> + | stwux SAVE0, RA, BASE
> + | stw SAVE1, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -4132,11 +4733,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | ble ->vmeta_tsetv // Integer key and in array part?
> | lwzx TMP2, TMP1, TMP0
> | lbz TMP3, TAB:RB->marked
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | add SAVE1, BASE, RA
> + | lwz SAVE0, 0(SAVE1)
> + | lwz SAVE1, 4(SAVE1)
> + |.endif
> | checknil TMP2; beq >3
> |1:
> | andix. TMP2, TMP3, LJ_GC_BLACK // isblack(table)
> + |.if FPU
> | stfdx f14, TMP1, TMP0
> + |.else
> + | stwux SAVE0, TMP1, TMP0
> + | stw SAVE1, 4(TMP1)
> + |.endif
> | bne >7
> |2:
> | ins_next
> @@ -4177,7 +4789,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz NODE:TMP2, TAB:RB->node
> | stb ZERO, TAB:RB->nomm // Clear metamethod cache.
> | and TMP1, TMP1, TMP0 // idx = str->hash & tab->hmask
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | add CARG2, BASE, RA
> + | lwz SAVE0, 0(CARG2)
> + | lwz SAVE1, 4(CARG2)
> + |.endif
> | slwi TMP0, TMP1, 5
> | slwi TMP1, TMP1, 3
> | sub TMP1, TMP0, TMP1
> @@ -4193,7 +4811,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | checknil CARG2; beq >4 // Key found, but nil value?
> |2:
> | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
> + |.if FPU
> | stfd f14, NODE:TMP2->val
> + |.else
> + | stw SAVE0, NODE:TMP2->val.u32.hi
> + | stw SAVE1, NODE:TMP2->val.u32.lo
> + |.endif
> | bne >7
> |3:
> | ins_next
> @@ -4232,7 +4855,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | bl extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k)
> | // Returns TValue *.
> | lp BASE, L->base
> + |.if FPU
> | stfd f14, 0(CRET1)
> + |.else
> + | stw SAVE0, 0(CRET1)
> + | stw SAVE1, 4(CRET1)
> + |.endif
> | b <3 // No 2nd write barrier needed.
> |
> |7: // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4249,13 +4877,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz TMP2, TAB:RB->array
> | lbz TMP3, TAB:RB->marked
> | cmplw TMP0, TMP1
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | add CARG2, BASE, RA
> + | lwz SAVE0, 0(CARG2)
> + | lwz SAVE1, 4(CARG2)
> + |.endif
> | bge ->vmeta_tsetb
> | lwzx TMP1, TMP2, RC
> | checknil TMP1; beq >5
> |1:
> | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
> + |.if FPU
> | stfdx f14, TMP2, RC
> + |.else
> + | stwux SAVE0, RC, TMP2
> + | stw SAVE1, 4(RC)
> + |.endif
> | bne >7
> |2:
> | ins_next
> @@ -4295,10 +4934,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |2:
> | cmplw TMP0, CARG3
> | slwi TMP2, CARG3, 3
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | lwzux SAVE0, RA, BASE
> + | lwz SAVE1, 4(RA)
> + |.endif
> | ble ->vmeta_tsetr // In array part?
> | ins_next1
> + |.if FPU
> | stfdx f14, TMP1, TMP2
> + |.else
> + | stwux SAVE0, TMP1, TMP2
> + | stw SAVE1, 4(TMP1)
> + |.endif
> | ins_next2
> |
> |7: // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4328,10 +4977,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | add TMP1, TMP1, TMP0
> | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
> |3: // Copy result slots to table.
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz SAVE0, 0(RA)
> + | lwz SAVE1, 4(RA)
> + |.endif
> | addi RA, RA, 8
> | cmpw cr1, RA, TMP2
> + |.if FPU
> | stfd f0, 0(TMP1)
> + |.else
> + | stw SAVE0, 0(TMP1)
> + | stw SAVE1, 4(TMP1)
> + |.endif
> | addi TMP1, TMP1, 8
> | blt cr1, <3
> | bne >7
> @@ -4398,9 +5057,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | beq cr1, >3
> |2:
> | addi TMP3, TMP2, 8
> + |.if FPU
> | lfdx f0, RA, TMP2
> + |.else
> + | add CARG3, RA, TMP2
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | cmplw cr1, TMP3, NARGS8:RC
> + |.if FPU
> | stfdx f0, BASE, TMP2
> + |.else
> + | stwux CARG1, TMP2, BASE
> + | stw CARG2, 4(TMP2)
> + |.endif
> | mr TMP2, TMP3
> | bne cr1, <2
> |3:
> @@ -4433,14 +5103,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | add BASE, BASE, RA
> | lwz TMP1, -24(BASE)
> | lwz LFUNC:RB, -20(BASE)
> + |.if FPU
> | lfd f1, -8(BASE)
> | lfd f0, -16(BASE)
> + |.else
> + | lwz CARG1, -8(BASE)
> + | lwz CARG2, -4(BASE)
> + | lwz CARG3, -16(BASE)
> + | lwz CARG4, -12(BASE)
> + |.endif
> | stw TMP1, 0(BASE) // Copy callable.
> | stw LFUNC:RB, 4(BASE)
> | checkfunc TMP1
> - | stfd f1, 16(BASE) // Copy control var.
> | li NARGS8:RC, 16 // Iterators get 2 arguments.
> + |.if FPU
> + | stfd f1, 16(BASE) // Copy control var.
> | stfdu f0, 8(BASE) // Copy state.
> + |.else
> + | stw CARG1, 16(BASE) // Copy control var.
> + | stw CARG2, 20(BASE)
> + | stwu CARG3, 8(BASE) // Copy state.
> + | stw CARG4, 4(BASE)
> + |.endif
> | bne ->vmeta_call
> | ins_call
> break;
> @@ -4461,7 +5145,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | slwi TMP3, RC, 3
> | bge >5 // Index points after array part?
> | lwzx TMP2, TMP1, TMP3
> + |.if FPU
> | lfdx f0, TMP1, TMP3
> + |.else
> + | lwzux CARG1, TMP3, TMP1
> + | lwz CARG2, 4(TMP3)
> + |.endif
> | checknil TMP2
> | lwz INS, -4(PC)
> | beq >4
> @@ -4473,7 +5162,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.endif
> | addi RC, RC, 1
> | addis TMP3, PC, -(BCBIAS_J*4 >> 16)
> + |.if FPU
> | stfd f0, 8(RA)
> + |.else
> + | stw CARG1, 8(RA)
> + | stw CARG2, 12(RA)
> + |.endif
> | decode_RD4 TMP1, INS
> | stw RC, -4(RA) // Update control var.
> | add PC, TMP1, TMP3
> @@ -4498,17 +5192,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | slwi RB, RC, 3
> | sub TMP3, TMP3, RB
> | lwzx RB, TMP2, TMP3
> + |.if FPU
> | lfdx f0, TMP2, TMP3
> + |.else
> + | add CARG3, TMP2, TMP3
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | add NODE:TMP3, TMP2, TMP3
> | checknil RB
> | lwz INS, -4(PC)
> | beq >7
> + |.if FPU
> | lfd f1, NODE:TMP3->key
> + |.else
> + | lwz CARG3, NODE:TMP3->key.u32.hi
> + | lwz CARG4, NODE:TMP3->key.u32.lo
> + |.endif
> | addis TMP2, PC, -(BCBIAS_J*4 >> 16)
> + |.if FPU
> | stfd f0, 8(RA)
> + |.else
> + | stw CARG1, 8(RA)
> + | stw CARG2, 12(RA)
> + |.endif
> | add RC, RC, TMP0
> | decode_RD4 TMP1, INS
> + |.if FPU
> | stfd f1, 0(RA)
> + |.else
> + | stw CARG3, 0(RA)
> + | stw CARG4, 4(RA)
> + |.endif
> | addi RC, RC, 1
> | add PC, TMP1, TMP2
> | stw RC, -4(RA) // Update control var.
> @@ -4574,9 +5289,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | subi TMP2, TMP2, 16
> | ble >2 // No vararg slots?
> |1: // Copy vararg slots to destination slots.
> + |.if FPU
> | lfd f0, 0(RC)
> + |.else
> + | lwz CARG1, 0(RC)
> + | lwz CARG2, 4(RC)
> + |.endif
> | addi RC, RC, 8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw CARG1, 0(RA)
> + | stw CARG2, 4(RA)
> + |.endif
> | cmplw RA, TMP2
> | cmplw cr1, RC, TMP3
> | bge >3 // All destination slots filled?
> @@ -4599,9 +5324,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | addi MULTRES, TMP1, 8
> | bgt >7
> |6:
> + |.if FPU
> | lfd f0, 0(RC)
> + |.else
> + | lwz CARG1, 0(RC)
> + | lwz CARG2, 4(RC)
> + |.endif
> | addi RC, RC, 8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw CARG1, 0(RA)
> + | stw CARG2, 4(RA)
> + |.endif
> | cmplw RC, TMP3
> | addi RA, RA, 8
> | blt <6 // More vararg slots?
> @@ -4652,14 +5387,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | li TMP1, 0
> |2:
> | addi TMP3, TMP1, 8
> + |.if FPU
> | lfdx f0, RA, TMP1
> + |.else
> + | add CARG3, RA, TMP1
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | cmpw TMP3, RC
> + |.if FPU
> | stfdx f0, TMP2, TMP1
> + |.else
> + | add CARG3, TMP2, TMP1
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | beq >3
> | addi TMP1, TMP3, 8
> + |.if FPU
> | lfdx f1, RA, TMP3
> + |.else
> + | add CARG3, RA, TMP3
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | cmpw TMP1, RC
> + |.if FPU
> | stfdx f1, TMP2, TMP3
> + |.else
> + | add CARG3, TMP2, TMP3
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | bne <2
> |3:
> |5:
> @@ -4701,8 +5460,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | subi TMP2, BASE, 8
> | decode_RB8 RB, INS
> if (op == BC_RET1) {
> + |.if FPU
> | lfd f0, 0(RA)
> | stfd f0, 0(TMP2)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + | stw CARG1, 0(TMP2)
> + | stw CARG2, 4(TMP2)
> + |.endif
> }
> |5:
> | cmplw RB, RD
> @@ -4763,11 +5529,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |4:
> | stw CARG1, FORL_IDX*8+4(RA)
> } else {
> - | lwz TMP3, FORL_STEP*8(RA)
> + | lwz SAVE0, FORL_STEP*8(RA)
> | lwz CARG3, FORL_STEP*8+4(RA)
> | lwz TMP2, FORL_STOP*8(RA)
> | lwz CARG2, FORL_STOP*8+4(RA)
> - | cmplw cr7, TMP3, TISNUM
> + | cmplw cr7, SAVE0, TISNUM
> | cmplw cr1, TMP2, TISNUM
> | crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
> | crand 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -4810,41 +5576,80 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> if (vk) {
> |.if DUALNUM
> |9: // FP loop.
> + |.if FPU
> | lfd f1, FORL_IDX*8(RA)
> |.else
> + | lwz CARG1, FORL_IDX*8(RA)
> + | lwz CARG2, FORL_IDX*8+4(RA)
> + |.endif
> + |.else
> | lfdux f1, RA, BASE
> |.endif
> + |.if FPU
> | lfd f3, FORL_STEP*8(RA)
> | lfd f2, FORL_STOP*8(RA)
> - | lwz TMP3, FORL_STEP*8(RA)
> | fadd f1, f1, f3
> | stfd f1, FORL_IDX*8(RA)
> + |.else
> + | lwz CARG3, FORL_STEP*8(RA)
> + | lwz CARG4, FORL_STEP*8+4(RA)
> + | mr SAVE1, RD
> + | blex __adddf3
> + | mr RD, SAVE1
> + | stw CRET1, FORL_IDX*8(RA)
> + | stw CRET2, FORL_IDX*8+4(RA)
> + | lwz CARG3, FORL_STOP*8(RA)
> + | lwz CARG4, FORL_STOP*8+4(RA)
> + |.endif
> + | lwz SAVE0, FORL_STEP*8(RA)
> } else {
> |.if DUALNUM
> |9: // FP loop.
> |.else
> | lwzux TMP1, RA, BASE
> - | lwz TMP3, FORL_STEP*8(RA)
> + | lwz SAVE0, FORL_STEP*8(RA)
> | lwz TMP2, FORL_STOP*8(RA)
> | cmplw cr0, TMP1, TISNUM
> - | cmplw cr7, TMP3, TISNUM
> + | cmplw cr7, SAVE0, TISNUM
> | cmplw cr1, TMP2, TISNUM
> |.endif
> + |.if FPU
> | lfd f1, FORL_IDX*8(RA)
> + |.else
> + | lwz CARG1, FORL_IDX*8(RA)
> + | lwz CARG2, FORL_IDX*8+4(RA)
> + |.endif
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr7+lt
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> + |.if FPU
> | lfd f2, FORL_STOP*8(RA)
> + |.else
> + | lwz CARG3, FORL_STOP*8(RA)
> + | lwz CARG4, FORL_STOP*8+4(RA)
> + |.endif
> | bge ->vmeta_for
> }
> - | cmpwi cr6, TMP3, 0
> + | cmpwi cr6, SAVE0, 0
> if (op != BC_JFORL) {
> | srwi RD, RD, 1
> }
> + |.if FPU
> | stfd f1, FORL_EXT*8(RA)
> + |.else
> + | stw CARG1, FORL_EXT*8(RA)
> + | stw CARG2, FORL_EXT*8+4(RA)
> + |.endif
> if (op != BC_JFORL) {
> | add RD, PC, RD
> }
> + |.if FPU
> | fcmpu cr0, f1, f2
> + |.else
> + | mr SAVE1, RD
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + | mr RD, SAVE1
> + |.endif
> if (op == BC_JFORI) {
> | addis PC, RD, -(BCBIAS_J*4 >> 16)
> }
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
2023-08-15 11:40 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:13 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:13 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:54PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> > Sponsored by Cisco Systems, Inc.
> >
> > (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
> >
> > The software floating point library is used on machines which do not
> > have hardware support for floating point [1]. This patch enables
> > support for such machines in the VM for powerpc.
> Typo: s/powerpc/PowerPC/
Fixed, thanks.
> > This includes:
> > * Any loads/storages of double values use load/storage through 32-bit
> Typo: s/storages/stores/ Feel free to ignore, though.
Fixed, thanks.
> > registers of `lo` and `hi` part of the TValue union.
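Side note, in case it is useful while reading the patch: without an FPU a
TValue number is simply moved around as two 32-bit words, so every lfd/stfd
pair becomes a pair of lwz/stw for the hi and lo halves. Roughly, in C (a
sketch only -- the union and helper names are made up; the real layout is the
TValue in <src/lj_obj.h>, whose u32.hi/u32.lo fields the patch already uses):

#include <stdint.h>

/* Sketch only: how a TValue number travels without an FPU.  The union
** and helper names are made up; the real TValue (lj_obj.h) exposes the
** same u32.hi/u32.lo words that the patch loads and stores.
*/
typedef union SketchTV {
  double n;                         /* FPU path: one lfd/stfd.           */
  struct { uint32_t hi, lo; } u32;  /* Soft-float path (PPC big-endian). */
} SketchTV;

static void copy_slot(SketchTV *dst, const SketchTV *src)
{
  dst->u32.hi = src->u32.hi;  /* lwz 0(src) -> stw 0(dst) */
  dst->u32.lo = src->u32.lo;  /* lwz 4(src) -> stw 4(dst) */
}
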
> > * Macro .FPU is added to skip instructions necessary only for
> > hard-float operations (load/store floating point registers from/on the
> > stack, when leave/enter VM, for example).
> Typo: s/leave/enter/leaving/entering/
Fixed, thanks!
> > * Now r25 named as `SAVE1` is used as saved temporary register (used in
> > different fast functions)
> > * `sfi2d` macro is introduced to convert integer, that represents a
> Typo: s/convert/convert an/
Fixed.
> > soft-float, to double. Receives destination and source registers, uses
> Typo: s/to double/to a double/
Fixed.
> > `TMP0` and `TMP1`.
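JFYI, the conversion is perhaps easier to follow in C form; `sfi2d` computes
roughly the following (a sketch only, `sfi2d_sketch` is a made-up helper --
the macro keeps everything in GPRs and leaves the hi/lo words in the given
destination registers):

#include <stdint.h>

/* Sketch only: build the hi/lo words of an IEEE-754 double from a
** 32-bit integer, i.e. what the sfi2d macro computes (exact, since any
** int32_t fits into the 52-bit mantissa).
*/
static void sfi2d_sketch(int32_t n, uint32_t *hi, uint32_t *lo)
{
  uint32_t sign, mag, lz;
  if (n == 0) { *hi = 0; *lo = 0; return; }      /* Zero is all-zero bits. */
  sign = n < 0 ? 0x80000000u : 0u;
  mag = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;  /* |n|, INT32_MIN-safe. */
  lz = __builtin_clz(mag);                       /* cntlzw (GCC builtin). */
  mag <<= lz;                                    /* Leading 1 at bit 31. */
  *hi = sign | ((0x3ffu + 31 - lz) << 20)        /* Sign | biased exponent */
       | ((mag >> 11) & 0xfffffu);               /* | top 20 mantissa bits. */
  *lo = mag << 21;                               /* Remaining mantissa bits. */
}
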
> > * `sfpmod` macro is introduced for soft-float point `fmod` built-in.
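And `sfpmod` simply chains the libgcc soft-float routines (__divdf3,
__muldf3, __subdf3) plus floor() to compute Lua's floored modulo, roughly:

#include <math.h>

/* Sketch only: the value sfpmod produces by chaining __divdf3, floor(),
** __muldf3 and __subdf3.  This is Lua's floored modulo, not C fmod().
*/
static double sfpmod_sketch(double a, double b)
{
  return a - floor(a / b) * b;
}
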
> > * `ins_arith` now receives the third parameter -- operation to use for
> > soft-float point.
> > * `LJ_ARCH_HASFPU`, `LJ_ABI_SOFTFP` macros are introduced to mark that
> > there is defined `_SOFT_FLOAT` or `_SOFT_DOUBLE`. `LJ_ARCH_NUMMODE` is
> > set to the `LJ_NUMMODE_DUAL`, when `LJ_ABI_SOFTFP` is true.
> >
> > Support of soft-float point for the JIT compiler will be added in the
> > next patch.
> >
> > [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> >
> > Sergey Kaplun:
> > * added the description for the feature
> >
> > Part of tarantool/tarantool#8825
> > ---
<snipped>
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
2023-08-15 11:40 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 14:53 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:53 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch! LGTM.
On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the VM for powerpc.
> This includes:
> * Any loads/storages of double values use load/storage through 32-bit
> registers of `lo` and `hi` part of the TValue union.
> * Macro .FPU is added to skip instructions necessary only for
> hard-float operations (load/store floating point registers from/on the
> stack, when leave/enter VM, for example).
> * Now r25 named as `SAVE1` is used as saved temporary register (used in
> different fast functions)
> * `sfi2d` macro is introduced to convert integer, that represents a
> soft-float, to double. Receives destination and source registers, uses
> `TMP0` and `TMP1`.
> * `sfpmod` macro is introduced for soft-float point `fmod` built-in.
> * `ins_arith` now receives the third parameter -- operation to use for
> soft-float point.
> * `LJ_ARCH_HASFPU`, `LJ_ABI_SOFTFP` macros are introduced to mark that
> there is defined `_SOFT_FLOAT` or `_SOFT_DOUBLE`. `LJ_ARCH_NUMMODE` is
> set to the `LJ_NUMMODE_DUAL`, when `LJ_ABI_SOFTFP` is true.
>
> Support of soft-float point for the JIT compiler will be added in the
> next patch.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> src/host/buildvm_asm.c | 2 +-
> src/lj_arch.h | 29 +-
> src/lj_ccall.c | 38 +-
> src/lj_ccall.h | 4 +-
> src/lj_ccallback.c | 30 +-
> src/lj_frame.h | 2 +-
> src/lj_ircall.h | 2 +-
> src/vm_ppc.dasc | 1249 +++++++++++++++++++++++++++++++++-------
> 8 files changed, 1101 insertions(+), 255 deletions(-)
>
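A minimal C sketch of what the soft-float mode described above means in practice (not part of the patch; SoftSlot, copy_slot and slot_add are hypothetical names used only for illustration):

#include <stdint.h>

/* Sketch: with LJ_ABI_SOFTFP a double is handled as two 32-bit words, so
** slot copies become integer lwz/stw pairs instead of lfd/stfd, and FP
** arithmetic is compiled into libgcc calls such as __adddf3 (see [1]).
** SoftSlot is a hypothetical stand-in for the hi/lo view of LuaJIT's
** TValue on big-endian PPC32.
*/
typedef union {
  double n;
  struct { uint32_t hi, lo; } u32;
} SoftSlot;

static void copy_slot(SoftSlot *dst, const SoftSlot *src)
{
  dst->u32.hi = src->u32.hi;
  dst->u32.lo = src->u32.lo;
}

static double slot_add(const SoftSlot *a, const SoftSlot *b)
{
  return a->n + b->n;  /* emitted as a call to __adddf3 on soft-float */
}
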
> diff --git a/src/host/buildvm_asm.c b/src/host/buildvm_asm.c
> index ffd14903..43595b31 100644
> --- a/src/host/buildvm_asm.c
> +++ b/src/host/buildvm_asm.c
> @@ -338,7 +338,7 @@ void emit_asm(BuildCtx *ctx)
> #if !(LJ_TARGET_PS3 || LJ_TARGET_PSVITA)
> fprintf(ctx->fp, "\t.section .note.GNU-stack,\"\"," ELFASM_PX "progbits\n");
> #endif
> -#if LJ_TARGET_PPC && !LJ_TARGET_PS3
> +#if LJ_TARGET_PPC && !LJ_TARGET_PS3 && !LJ_ABI_SOFTFP
> /* Hard-float ABI. */
> fprintf(ctx->fp, "\t.gnu_attribute 4, 1\n");
> #endif
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index c39526ea..8bb8757d 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -262,6 +262,29 @@
> #else
> #define LJ_ARCH_BITS 32
> #define LJ_ARCH_NAME "ppc"
> +
> +#if !defined(LJ_ARCH_HASFPU)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ARCH_HASFPU 0
> +#else
> +#define LJ_ARCH_HASFPU 1
> +#endif
> +#endif
> +
> +#if !defined(LJ_ABI_SOFTFP)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ABI_SOFTFP 1
> +#else
> +#define LJ_ABI_SOFTFP 0
> +#endif
> +#endif
> +#endif
> +
> +#if LJ_ABI_SOFTFP
> +#define LJ_ARCH_NOJIT 1 /* NYI */
> +#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL
> +#else
> +#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
> #endif
>
> #define LJ_TARGET_PPC 1
> @@ -271,7 +294,6 @@
> #define LJ_TARGET_MASKSHIFT 0
> #define LJ_TARGET_MASKROT 1
> #define LJ_TARGET_UNIFYROT 1 /* Want only IR_BROL. */
> -#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
>
> #if LJ_TARGET_CONSOLE
> #define LJ_ARCH_PPC32ON64 1
> @@ -431,16 +453,13 @@
> #error "No support for ILP32 model on ARM64"
> #endif
> #elif LJ_TARGET_PPC
> -#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> -#error "No support for PowerPC CPUs without double-precision FPU"
> -#endif
> #if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
> #error "No support for little-endian PPC32"
> #endif
> #if LJ_ARCH_PPC64
> #error "No support for PowerPC 64 bit mode (yet)"
> #endif
> -#ifdef __NO_FPRS__
> +#if defined(__NO_FPRS__) && !defined(_SOFT_FLOAT)
> #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
> #endif
> #elif LJ_TARGET_MIPS32
> diff --git a/src/lj_ccall.c b/src/lj_ccall.c
> index d39ff861..c1e12f56 100644
> --- a/src/lj_ccall.c
> +++ b/src/lj_ccall.c
> @@ -388,6 +388,24 @@
> #define CCALL_HANDLE_COMPLEXARG \
> /* Pass complex by value in 2 or 4 GPRs. */
>
> +#define CCALL_HANDLE_GPR \
> + /* Try to pass argument in GPRs. */ \
> + if (n > 1) { \
> + lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \
> + if (ctype_isinteger(d->info) || ctype_isfp(d->info)) \
> + ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> + else if (ngpr + n > maxgpr) \
> + ngpr = maxgpr; /* Prevent reordering. */ \
> + } \
> + if (ngpr + n <= maxgpr) { \
> + dp = &cc->gpr[ngpr]; \
> + ngpr += n; \
> + goto done; \
> + } \
> +
> +#if LJ_ABI_SOFTFP
> +#define CCALL_HANDLE_REGARG CCALL_HANDLE_GPR
> +#else
> #define CCALL_HANDLE_REGARG \
> if (isfp) { /* Try to pass argument in FPRs. */ \
> if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -396,24 +414,16 @@
> d = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */ \
> goto done; \
> } \
> - } else { /* Try to pass argument in GPRs. */ \
> - if (n > 1) { \
> - lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \
> - if (ctype_isinteger(d->info)) \
> - ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> - else if (ngpr + n > maxgpr) \
> - ngpr = maxgpr; /* Prevent reordering. */ \
> - } \
> - if (ngpr + n <= maxgpr) { \
> - dp = &cc->gpr[ngpr]; \
> - ngpr += n; \
> - goto done; \
> - } \
> + } else { \
> + CCALL_HANDLE_GPR \
> }
> +#endif
>
> +#if !LJ_ABI_SOFTFP
> #define CCALL_HANDLE_RET \
> if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
> ctr = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */
> +#endif
>
> #elif LJ_TARGET_MIPS32
> /* -- MIPS o32 calling conventions ---------------------------------------- */
> @@ -1081,7 +1091,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
> }
> if (fid) lj_err_caller(L, LJ_ERR_FFI_NUMARG); /* Too few arguments. */
>
> -#if LJ_TARGET_X64 || LJ_TARGET_PPC
> +#if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP)
> cc->nfpr = nfpr; /* Required for vararg functions. */
> #endif
> cc->nsp = nsp;
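For readers less familiar with the PPC32 SysV ABI detail behind CCALL_HANDLE_GPR above, a tiny sketch of the regpair rounding (next_gpr_pair is a hypothetical helper, not from the patch):

/* 64-bit values (and, with LJ_ABI_SOFTFP, doubles) are passed in an
** even/odd GPR pair, so the next free GPR index is rounded up to an
** even number before the argument is placed.
*/
static unsigned next_gpr_pair(unsigned ngpr)
{
  return (ngpr + 1u) & ~1u;  /* 0->0, 1->2, 2->2, 3->4, ... */
}
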
> diff --git a/src/lj_ccall.h b/src/lj_ccall.h
> index 59f66481..6efa48c7 100644
> --- a/src/lj_ccall.h
> +++ b/src/lj_ccall.h
> @@ -86,9 +86,9 @@ typedef union FPRArg {
> #elif LJ_TARGET_PPC
>
> #define CCALL_NARG_GPR 8
> -#define CCALL_NARG_FPR 8
> +#define CCALL_NARG_FPR (LJ_ABI_SOFTFP ? 0 : 8)
> #define CCALL_NRET_GPR 4 /* For complex double. */
> -#define CCALL_NRET_FPR 1
> +#define CCALL_NRET_FPR (LJ_ABI_SOFTFP ? 0 : 1)
> #define CCALL_SPS_EXTRA 4
> #define CCALL_SPS_FREE 0
>
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index 224b6b94..c33190d7 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -419,6 +419,23 @@ void lj_ccallback_mcode_free(CTState *cts)
>
> #elif LJ_TARGET_PPC
>
> +#define CALLBACK_HANDLE_GPR \
> + if (n > 1) { \
> + lua_assert(((LJ_ABI_SOFTFP && ctype_isnum(cta->info)) || /* double. */ \
> + ctype_isinteger(cta->info)) && n == 2); /* int64_t. */ \
> + ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> + } \
> + if (ngpr + n <= maxgpr) { \
> + sp = &cts->cb.gpr[ngpr]; \
> + ngpr += n; \
> + goto done; \
> + }
> +
> +#if LJ_ABI_SOFTFP
> +#define CALLBACK_HANDLE_REGARG \
> + CALLBACK_HANDLE_GPR \
> + UNUSED(isfp);
> +#else
> #define CALLBACK_HANDLE_REGARG \
> if (isfp) { \
> if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -427,20 +444,15 @@ void lj_ccallback_mcode_free(CTState *cts)
> goto done; \
> } \
> } else { /* Try to pass argument in GPRs. */ \
> - if (n > 1) { \
> - lua_assert(ctype_isinteger(cta->info) && n == 2); /* int64_t. */ \
> - ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \
> - } \
> - if (ngpr + n <= maxgpr) { \
> - sp = &cts->cb.gpr[ngpr]; \
> - ngpr += n; \
> - goto done; \
> - } \
> + CALLBACK_HANDLE_GPR \
> }
> +#endif
>
> +#if !LJ_ABI_SOFTFP
> #define CALLBACK_HANDLE_RET \
> if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
> *(double *)dp = *(float *)dp; /* FPRs always hold doubles. */
> +#endif
>
> #elif LJ_TARGET_MIPS32
>
> diff --git a/src/lj_frame.h b/src/lj_frame.h
> index 2bdf3c48..5cb3d639 100644
> --- a/src/lj_frame.h
> +++ b/src/lj_frame.h
> @@ -226,7 +226,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK }; /* Special continuations. */
> #define CFRAME_OFS_L 36
> #define CFRAME_OFS_PC 32
> #define CFRAME_OFS_MULTRES 28
> -#define CFRAME_SIZE 272
> +#define CFRAME_SIZE (LJ_ARCH_HASFPU ? 272 : 128)
> #define CFRAME_SHIFT_MULTRES 3
> #endif
> #elif LJ_TARGET_MIPS32
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index c1ac29d1..bbad35b1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -291,7 +291,7 @@ LJ_DATA const CCallInfo lj_ir_callinfo[IRCALL__MAX+1];
> #define fp64_f2l __aeabi_f2lz
> #define fp64_f2ul __aeabi_f2ulz
> #endif
> -#elif LJ_TARGET_MIPS
> +#elif LJ_TARGET_MIPS || LJ_TARGET_PPC
> #define softfp_add __adddf3
> #define softfp_sub __subdf3
> #define softfp_mul __muldf3
> diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
> index 7ad8df37..980ad897 100644
> --- a/src/vm_ppc.dasc
> +++ b/src/vm_ppc.dasc
> @@ -103,6 +103,18 @@
> |// Fixed register assignments for the interpreter.
> |// Don't use: r1 = sp, r2 and r13 = reserved (TOC, TLS or SDATA)
> |
> +|.macro .FPU, a, b
> +|.if FPU
> +| a, b
> +|.endif
> +|.endmacro
> +|
> +|.macro .FPU, a, b, c
> +|.if FPU
> +| a, b, c
> +|.endif
> +|.endmacro
> +|
> |// The following must be C callee-save (but BASE is often refetched).
> |.define BASE, r14 // Base of current Lua stack frame.
> |.define KBASE, r15 // Constants of current Lua function.
> @@ -116,8 +128,10 @@
> |.define TISNUM, r22
> |.define TISNIL, r23
> |.define ZERO, r24
> +|.if FPU
> |.define TOBIT, f30 // 2^52 + 2^51.
> |.define TONUM, f31 // 2^52 + 2^51 + 2^31.
> +|.endif
> |
> |// The following temporaries are not saved across C calls, except for RA.
> |.define RA, r20 // Callee-save.
> @@ -133,6 +147,7 @@
> |
> |// Saved temporaries.
> |.define SAVE0, r21
> +|.define SAVE1, r25
> |
> |// Calling conventions.
> |.define CARG1, r3
> @@ -141,8 +156,10 @@
> |.define CARG4, r6 // Overlaps TMP3.
> |.define CARG5, r7 // Overlaps INS.
> |
> +|.if FPU
> |.define FARG1, f1
> |.define FARG2, f2
> +|.endif
> |
> |.define CRET1, r3
> |.define CRET2, r4
> @@ -213,10 +230,16 @@
> |.endif
> |.else
> |
> +|.if FPU
> |.define SAVE_LR, 276(sp)
> |.define CFRAME_SPACE, 272 // Delta for sp.
> |// Back chain for sp: 272(sp) <-- sp entering interpreter
> |.define SAVE_FPR_, 128 // .. 128+18*8: 64 bit FPR saves.
> +|.else
> +|.define SAVE_LR, 132(sp)
> +|.define CFRAME_SPACE, 128 // Delta for sp.
> +|// Back chain for sp: 128(sp) <-- sp entering interpreter
> +|.endif
> |.define SAVE_GPR_, 56 // .. 56+18*4: 32 bit GPR saves.
> |.define SAVE_CR, 52(sp) // 32 bit CR save.
> |.define SAVE_ERRF, 48(sp) // 32 bit C frame info.
> @@ -226,16 +249,25 @@
> |.define SAVE_PC, 32(sp)
> |.define SAVE_MULTRES, 28(sp)
> |.define UNUSED1, 24(sp)
> +|.if FPU
> |.define TMPD_LO, 20(sp)
> |.define TMPD_HI, 16(sp)
> |.define TONUM_LO, 12(sp)
> |.define TONUM_HI, 8(sp)
> +|.else
> +|.define SFSAVE_4, 20(sp)
> +|.define SFSAVE_3, 16(sp)
> +|.define SFSAVE_2, 12(sp)
> +|.define SFSAVE_1, 8(sp)
> +|.endif
> |// Next frame lr: 4(sp)
> |// Back chain for sp: 0(sp) <-- sp while in interpreter
> |
> +|.if FPU
> |.define TMPD_BLO, 23(sp)
> |.define TMPD, TMPD_HI
> |.define TONUM_D, TONUM_HI
> +|.endif
> |
> |.endif
> |
> @@ -245,7 +277,7 @@
> |.else
> | stw r..reg, SAVE_GPR_+(reg-14)*4(sp)
> |.endif
> -| stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +| .FPU stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> |.endmacro
> |.macro rest_, reg
> |.if GPR64
> @@ -253,7 +285,7 @@
> |.else
> | lwz r..reg, SAVE_GPR_+(reg-14)*4(sp)
> |.endif
> -| lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +| .FPU lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> |.endmacro
> |
> |.macro saveregs
> @@ -323,6 +355,7 @@
> |// Trap for not-yet-implemented parts.
> |.macro NYI; tw 4, sp, sp; .endmacro
> |
> +|.if FPU
> |// int/FP conversions.
> |.macro tonum_i, freg, reg
> | xoris reg, reg, 0x8000
> @@ -346,6 +379,7 @@
> |.macro toint, reg, freg
> | toint reg, freg, freg
> |.endmacro
> +|.endif
> |
> |//-----------------------------------------------------------------------
> |
> @@ -533,9 +567,19 @@ static void build_subroutines(BuildCtx *ctx)
> | beq >2
> |1:
> | addic. TMP1, TMP1, -8
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + |.endif
> | addi RA, RA, 8
> + |.if FPU
> | stfd f0, 0(BASE)
> + |.else
> + | stw CARG1, 0(BASE)
> + | stw CARG2, 4(BASE)
> + |.endif
> | addi BASE, BASE, 8
> | bney <1
> |
> @@ -613,23 +657,23 @@ static void build_subroutines(BuildCtx *ctx)
> | .toc ld TOCREG, SAVE_TOC
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp BASE, L->base
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | lwz DISPATCH, L->glref // Setup pointer to dispatch table.
> | li ZERO, 0
> - | stw TMP3, TMPD
> + | .FPU stw TMP3, TMPD
> | li TMP1, LJ_TFALSE
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> | li TISNIL, LJ_TNIL
> | li_vmstate INTERP
> - | lfs TOBIT, TMPD
> + | .FPU lfs TOBIT, TMPD
> | lwz PC, FRAME_PC(BASE) // Fetch PC of previous frame.
> | la RA, -8(BASE) // Results start at BASE-8.
> - | stw TMP3, TMPD
> + | .FPU stw TMP3, TMPD
> | addi DISPATCH, DISPATCH, GG_G2DISP
> | stw TMP1, 0(RA) // Prepend false to error message.
> | li RD, 16 // 2 results: false + error message.
> | st_vmstate
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | b ->vm_returnc
> |
> |//-----------------------------------------------------------------------
> @@ -690,22 +734,22 @@ static void build_subroutines(BuildCtx *ctx)
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp TMP1, L->top
> | lwz PC, FRAME_PC(BASE)
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | stb CARG3, L->status
> - | stw TMP3, TMPD
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU lfs TOBIT, TMPD
> | sub RD, TMP1, BASE
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | addi RD, RD, 8
> - | stw TMP0, TONUM_HI
> + | .FPU stw TMP0, TONUM_HI
> | li_vmstate INTERP
> | li ZERO, 0
> | st_vmstate
> | andix. TMP0, PC, FRAME_TYPE
> | mr MULTRES, RD
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | li TISNIL, LJ_TNIL
> | beq ->BC_RET_Z
> | b ->vm_return
> @@ -739,19 +783,19 @@ static void build_subroutines(BuildCtx *ctx)
> | lp TMP2, L->base // TMP2 = old base (used in vmeta_call).
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp TMP1, L->top
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | add PC, PC, BASE
> - | stw TMP3, TMPD
> + | .FPU stw TMP3, TMPD
> | li ZERO, 0
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | lfs TOBIT, TMPD
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU lfs TOBIT, TMPD
> | sub PC, PC, TMP2 // PC = frame delta + frame type
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | sub NARGS8:RC, TMP1, BASE
> - | stw TMP0, TONUM_HI
> + | .FPU stw TMP0, TONUM_HI
> | li_vmstate INTERP
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | li TISNIL, LJ_TNIL
> | st_vmstate
> |
> @@ -839,15 +883,30 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz INS, -4(PC)
> | subi CARG2, RB, 16
> | decode_RB8 SAVE0, INS
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz TMP2, 0(RA)
> + | lwz TMP3, 4(RA)
> + |.endif
> | add TMP1, BASE, SAVE0
> | stp BASE, L->base
> | cmplw TMP1, CARG2
> | sub CARG3, CARG2, TMP1
> | decode_RA8 RA, INS
> + |.if FPU
> | stfd f0, 0(CARG2)
> + |.else
> + | stw TMP2, 0(CARG2)
> + | stw TMP3, 4(CARG2)
> + |.endif
> | bney ->BC_CAT_Z
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux TMP2, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | b ->cont_nop
> |
> |//-- Table indexing metamethods -----------------------------------------
> @@ -900,9 +959,19 @@ static void build_subroutines(BuildCtx *ctx)
> | // Returns TValue * (finished) or NULL (metamethod).
> | cmplwi CRET1, 0
> | beq >3
> + |.if FPU
> | lfd f0, 0(CRET1)
> + |.else
> + | lwz TMP0, 0(CRET1)
> + | lwz TMP1, 4(CRET1)
> + |.endif
> | ins_next1
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> |
> |3: // Call __index metamethod.
> @@ -920,7 +989,12 @@ static void build_subroutines(BuildCtx *ctx)
> | // Returns cTValue * or NULL.
> | cmplwi CRET1, 0
> | beq >1
> + |.if FPU
> | lfd f14, 0(CRET1)
> + |.else
> + | lwz SAVE0, 0(CRET1)
> + | lwz SAVE1, 4(CRET1)
> + |.endif
> | b ->BC_TGETR_Z
> |1:
> | stwx TISNIL, BASE, RA
> @@ -975,11 +1049,21 @@ static void build_subroutines(BuildCtx *ctx)
> | bl extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k)
> | // Returns TValue * (finished) or NULL (metamethod).
> | cmplwi CRET1, 0
> + |.if FPU
> | lfdx f0, BASE, RA
> + |.else
> + | lwzux TMP2, RA, BASE
> + | lwz TMP3, 4(RA)
> + |.endif
> | beq >3
> | // NOBARRIER: lj_meta_tset ensures the table is not black.
> | ins_next1
> + |.if FPU
> | stfd f0, 0(CRET1)
> + |.else
> + | stw TMP2, 0(CRET1)
> + | stw TMP3, 4(CRET1)
> + |.endif
> | ins_next2
> |
> |3: // Call __newindex metamethod.
> @@ -990,7 +1074,12 @@ static void build_subroutines(BuildCtx *ctx)
> | add PC, TMP1, BASE
> | lwz LFUNC:RB, FRAME_FUNC(BASE) // Guaranteed to be a function here.
> | li NARGS8:RC, 24 // 3 args for func(t, k, v)
> + |.if FPU
> | stfd f0, 16(BASE) // Copy value to third argument.
> + |.else
> + | stw TMP2, 16(BASE)
> + | stw TMP3, 20(BASE)
> + |.endif
> | b ->vm_call_dispatch_f
> |
> |->vmeta_tsetr:
> @@ -999,7 +1088,12 @@ static void build_subroutines(BuildCtx *ctx)
> | stw PC, SAVE_PC
> | bl extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key)
> | // Returns TValue *.
> + |.if FPU
> | stfd f14, 0(CRET1)
> + |.else
> + | stw SAVE0, 0(CRET1)
> + | stw SAVE1, 4(CRET1)
> + |.endif
> | b ->cont_nop
> |
> |//-- Comparison metamethods ---------------------------------------------
> @@ -1038,9 +1132,19 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |->cont_ra: // RA = resultptr
> | lwz INS, -4(PC)
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + |.endif
> | decode_RA8 TMP1, INS
> + |.if FPU
> | stfdx f0, BASE, TMP1
> + |.else
> + | stwux CARG1, TMP1, BASE
> + | stw CARG2, 4(TMP1)
> + |.endif
> | b ->cont_nop
> |
> |->cont_condt: // RA = resultptr
> @@ -1246,22 +1350,32 @@ static void build_subroutines(BuildCtx *ctx)
> |.macro .ffunc_n, name
> |->ff_ .. name:
> | cmplwi NARGS8:RC, 8
> - | lwz CARG3, 0(BASE)
> + | lwz CARG1, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blt ->fff_fallback
> - | checknum CARG3; bge ->fff_fallback
> + | checknum CARG1; bge ->fff_fallback
> |.endmacro
> |
> |.macro .ffunc_nn, name
> |->ff_ .. name:
> | cmplwi NARGS8:RC, 16
> - | lwz CARG3, 0(BASE)
> + | lwz CARG1, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> - | lwz CARG4, 8(BASE)
> + | lwz CARG3, 8(BASE)
> | lfd FARG2, 8(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + | lwz CARG3, 8(BASE)
> + | lwz CARG4, 12(BASE)
> + |.endif
> | blt ->fff_fallback
> + | checknum CARG1; bge ->fff_fallback
> | checknum CARG3; bge ->fff_fallback
> - | checknum CARG4; bge ->fff_fallback
> |.endmacro
> |
> |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1.
> @@ -1282,14 +1396,21 @@ static void build_subroutines(BuildCtx *ctx)
> | bge cr1, ->fff_fallback
> | stw CARG3, 0(RA)
> | addi RD, NARGS8:RC, 8 // Compute (nresults+1)*8.
> + | addi TMP1, BASE, 8
> + | add TMP2, RA, NARGS8:RC
> | stw CARG1, 4(RA)
> | beq ->fff_res // Done if exactly 1 argument.
> - | li TMP1, 8
> - | subi RC, RC, 8
> |1:
> - | cmplw TMP1, RC
> - | lfdx f0, BASE, TMP1
> - | stfdx f0, RA, TMP1
> + | cmplw TMP1, TMP2
> + |.if FPU
> + | lfd f0, 0(TMP1)
> + | stfd f0, 0(TMP1)
> + |.else
> + | lwz CARG1, 0(TMP1)
> + | lwz CARG2, 4(TMP1)
> + | stw CARG1, -8(TMP1)
> + | stw CARG2, -4(TMP1)
> + |.endif
> | addi TMP1, TMP1, 8
> | bney <1
> | b ->fff_res
> @@ -1304,8 +1425,14 @@ static void build_subroutines(BuildCtx *ctx)
> | orc TMP1, TMP2, TMP0
> | addi TMP1, TMP1, ~LJ_TISNUM+1
> | slwi TMP1, TMP1, 3
> + |.if FPU
> | la TMP2, CFUNC:RB->upvalue
> | lfdx FARG1, TMP2, TMP1
> + |.else
> + | add TMP1, CFUNC:RB, TMP1
> + | lwz CARG1, CFUNC:TMP1->upvalue[0].u32.hi
> + | lwz CARG2, CFUNC:TMP1->upvalue[0].u32.lo
> + |.endif
> | b ->fff_resn
> |
> |//-- Base library: getters and setters ---------------------------------
> @@ -1383,7 +1510,12 @@ static void build_subroutines(BuildCtx *ctx)
> | mr CARG1, L
> | bl extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key)
> | // Returns cTValue *.
> + |.if FPU
> | lfd FARG1, 0(CRET1)
> + |.else
> + | lwz CARG2, 4(CRET1)
> + | lwz CARG1, 0(CRET1) // Caveat: CARG1 == CRET1.
> + |.endif
> | b ->fff_resn
> |
> |//-- Base library: conversions ------------------------------------------
> @@ -1392,7 +1524,11 @@ static void build_subroutines(BuildCtx *ctx)
> | // Only handles the number case inline (without a base argument).
> | cmplwi NARGS8:RC, 8
> | lwz CARG1, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + |.endif
> | bne ->fff_fallback // Exactly one argument.
> | checknum CARG1; bgt ->fff_fallback
> | b ->fff_resn
> @@ -1443,12 +1579,23 @@ static void build_subroutines(BuildCtx *ctx)
> | cmplwi CRET1, 0
> | li CARG3, LJ_TNIL
> | beq ->fff_restv // End of traversal: return nil.
> - | lfd f0, 8(BASE) // Copy key and value to results.
> | la RA, -8(BASE)
> + |.if FPU
> + | lfd f0, 8(BASE) // Copy key and value to results.
> | lfd f1, 16(BASE)
> | stfd f0, 0(RA)
> - | li RD, (2+1)*8
> | stfd f1, 8(RA)
> + |.else
> + | lwz CARG1, 8(BASE)
> + | lwz CARG2, 12(BASE)
> + | lwz CARG3, 16(BASE)
> + | lwz CARG4, 20(BASE)
> + | stw CARG1, 0(RA)
> + | stw CARG2, 4(RA)
> + | stw CARG3, 8(RA)
> + | stw CARG4, 12(RA)
> + |.endif
> + | li RD, (2+1)*8
> | b ->fff_res
> |
> |.ffunc_1 pairs
> @@ -1457,17 +1604,32 @@ static void build_subroutines(BuildCtx *ctx)
> | bne ->fff_fallback
> #if LJ_52
> | lwz TAB:TMP2, TAB:CARG1->metatable
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | cmplwi TAB:TMP2, 0
> | la RA, -8(BASE)
> | bne ->fff_fallback
> #else
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | la RA, -8(BASE)
> #endif
> | stw TISNIL, 8(BASE)
> | li RD, (3+1)*8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw TMP0, 0(RA)
> + | stw TMP1, 4(RA)
> + |.endif
> | b ->fff_res
> |
> |.ffunc ipairs_aux
> @@ -1513,14 +1675,24 @@ static void build_subroutines(BuildCtx *ctx)
> | stfd FARG2, 0(RA)
> |.endif
> | ble >2 // Not in array part?
> + |.if FPU
> | lwzx TMP2, TMP1, TMP3
> | lfdx f0, TMP1, TMP3
> + |.else
> + | lwzux TMP2, TMP1, TMP3
> + | lwz TMP3, 4(TMP1)
> + |.endif
> |1:
> | checknil TMP2
> | li RD, (0+1)*8
> | beq ->fff_res // End of iteration, return 0 results.
> | li RD, (2+1)*8
> + |.if FPU
> | stfd f0, 8(RA)
> + |.else
> + | stw TMP2, 8(RA)
> + | stw TMP3, 12(RA)
> + |.endif
> | b ->fff_res
> |2: // Check for empty hash part first. Otherwise call C function.
> | lwz TMP0, TAB:CARG1->hmask
> @@ -1534,7 +1706,11 @@ static void build_subroutines(BuildCtx *ctx)
> | li RD, (0+1)*8
> | beq ->fff_res
> | lwz TMP2, 0(CRET1)
> + |.if FPU
> | lfd f0, 0(CRET1)
> + |.else
> + | lwz TMP3, 4(CRET1)
> + |.endif
> | b <1
> |
> |.ffunc_1 ipairs
> @@ -1543,12 +1719,22 @@ static void build_subroutines(BuildCtx *ctx)
> | bne ->fff_fallback
> #if LJ_52
> | lwz TAB:TMP2, TAB:CARG1->metatable
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | cmplwi TAB:TMP2, 0
> | la RA, -8(BASE)
> | bne ->fff_fallback
> #else
> + |.if FPU
> | lfd f0, CFUNC:RB->upvalue[0]
> + |.else
> + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> + |.endif
> | la RA, -8(BASE)
> #endif
> |.if DUALNUM
> @@ -1558,7 +1744,12 @@ static void build_subroutines(BuildCtx *ctx)
> |.endif
> | stw ZERO, 12(BASE)
> | li RD, (3+1)*8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw TMP0, 0(RA)
> + | stw TMP1, 4(RA)
> + |.endif
> | b ->fff_res
> |
> |//-- Base library: catch errors ----------------------------------------
> @@ -1577,19 +1768,32 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.ffunc xpcall
> | cmplwi NARGS8:RC, 16
> - | lwz CARG4, 8(BASE)
> + | lwz CARG3, 8(BASE)
> + |.if FPU
> | lfd FARG2, 8(BASE)
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + | lwz CARG4, 12(BASE)
> + |.endif
> | blt ->fff_fallback
> | lbz TMP1, DISPATCH_GL(hookmask)(DISPATCH)
> | mr TMP2, BASE
> - | checkfunc CARG4; bne ->fff_fallback // Traceback must be a function.
> + | checkfunc CARG3; bne ->fff_fallback // Traceback must be a function.
> | la BASE, 16(BASE)
> | // Remember active hook before pcall.
> | rlwinm TMP1, TMP1, 32-HOOK_ACTIVE_SHIFT, 31, 31
> + |.if FPU
> | stfd FARG2, 0(TMP2) // Swap function and traceback.
> - | subi NARGS8:RC, NARGS8:RC, 16
> | stfd FARG1, 8(TMP2)
> + |.else
> + | stw CARG3, 0(TMP2)
> + | stw CARG4, 4(TMP2)
> + | stw CARG1, 8(TMP2)
> + | stw CARG2, 12(TMP2)
> + |.endif
> + | subi NARGS8:RC, NARGS8:RC, 16
> | addi PC, TMP1, 16+FRAME_PCALL
> | b ->vm_call_dispatch
> |
> @@ -1632,9 +1836,21 @@ static void build_subroutines(BuildCtx *ctx)
> | stp BASE, L->top
> |2: // Move args to coroutine.
> | cmpw TMP1, NARGS8:RC
> + |.if FPU
> | lfdx f0, BASE, TMP1
> + |.else
> + | add CARG3, BASE, TMP1
> + | lwz TMP2, 0(CARG3)
> + | lwz TMP3, 4(CARG3)
> + |.endif
> | beq >3
> + |.if FPU
> | stfdx f0, CARG2, TMP1
> + |.else
> + | add CARG3, CARG2, TMP1
> + | stw TMP2, 0(CARG3)
> + | stw TMP3, 4(CARG3)
> + |.endif
> | addi TMP1, TMP1, 8
> | b <2
> |3:
> @@ -1665,8 +1881,17 @@ static void build_subroutines(BuildCtx *ctx)
> | stp TMP2, L:SAVE0->top // Clear coroutine stack.
> |5: // Move results from coroutine.
> | cmplw TMP1, TMP3
> + |.if FPU
> | lfdx f0, TMP2, TMP1
> | stfdx f0, BASE, TMP1
> + |.else
> + | add CARG3, TMP2, TMP1
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + | add CARG3, BASE, TMP1
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | addi TMP1, TMP1, 8
> | bne <5
> |6:
> @@ -1691,12 +1916,22 @@ static void build_subroutines(BuildCtx *ctx)
> | andix. TMP0, PC, FRAME_TYPE
> | la TMP3, -8(TMP3)
> | li TMP1, LJ_TFALSE
> + |.if FPU
> | lfd f0, 0(TMP3)
> + |.else
> + | lwz CARG1, 0(TMP3)
> + | lwz CARG2, 4(TMP3)
> + |.endif
> | stp TMP3, L:SAVE0->top // Remove error from coroutine stack.
> | li RD, (2+1)*8
> | stw TMP1, -8(BASE) // Prepend false to results.
> | la RA, -8(BASE)
> + |.if FPU
> | stfd f0, 0(BASE) // Copy error message.
> + |.else
> + | stw CARG1, 0(BASE) // Copy error message.
> + | stw CARG2, 4(BASE)
> + |.endif
> | b <7
> |.else
> | mr CARG1, L
> @@ -1875,7 +2110,12 @@ static void build_subroutines(BuildCtx *ctx)
> | lus CARG1, 0x8000 // -(2^31).
> | beqy ->fff_resi
> |5:
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blex func
> | b ->fff_resn
> |.endmacro
> @@ -1899,10 +2139,14 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.ffunc math_log
> | cmplwi NARGS8:RC, 8
> - | lwz CARG3, 0(BASE)
> - | lfd FARG1, 0(BASE)
> + | lwz CARG1, 0(BASE)
> | bne ->fff_fallback // Need exactly 1 argument.
> - | checknum CARG3; bge ->fff_fallback
> + | checknum CARG1; bge ->fff_fallback
> + |.if FPU
> + | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blex log
> | b ->fff_resn
> |
> @@ -1924,17 +2168,24 @@ static void build_subroutines(BuildCtx *ctx)
> |.if DUALNUM
> |.ffunc math_ldexp
> | cmplwi NARGS8:RC, 16
> - | lwz CARG3, 0(BASE)
> + | lwz TMP0, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> - | lwz CARG4, 8(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> + | lwz TMP1, 8(BASE)
> |.if GPR64
> | lwz CARG2, 12(BASE)
> - |.else
> + |.elif FPU
> | lwz CARG1, 12(BASE)
> + |.else
> + | lwz CARG3, 12(BASE)
> |.endif
> | blt ->fff_fallback
> - | checknum CARG3; bge ->fff_fallback
> - | checknum CARG4; bne ->fff_fallback
> + | checknum TMP0; bge ->fff_fallback
> + | checknum TMP1; bne ->fff_fallback
> |.else
> |.ffunc_nn math_ldexp
> |.if GPR64
> @@ -1949,8 +2200,10 @@ static void build_subroutines(BuildCtx *ctx)
> |.ffunc_n math_frexp
> |.if GPR64
> | la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
> - |.else
> + |.elif FPU
> | la CARG1, DISPATCH_GL(tmptv)(DISPATCH)
> + |.else
> + | la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
> |.endif
> | lwz PC, FRAME_PC(BASE)
> | blex frexp
> @@ -1959,7 +2212,12 @@ static void build_subroutines(BuildCtx *ctx)
> |.if not DUALNUM
> | tonum_i FARG2, TMP1
> |.endif
> + |.if FPU
> | stfd FARG1, 0(RA)
> + |.else
> + | stw CRET1, 0(RA)
> + | stw CRET2, 4(RA)
> + |.endif
> | li RD, (2+1)*8
> |.if DUALNUM
> | stw TISNUM, 8(RA)
> @@ -1972,13 +2230,20 @@ static void build_subroutines(BuildCtx *ctx)
> |.ffunc_n math_modf
> |.if GPR64
> | la CARG2, -8(BASE)
> - |.else
> + |.elif FPU
> | la CARG1, -8(BASE)
> + |.else
> + | la CARG3, -8(BASE)
> |.endif
> | lwz PC, FRAME_PC(BASE)
> | blex modf
> | la RA, -8(BASE)
> + |.if FPU
> | stfd FARG1, 0(BASE)
> + |.else
> + | stw CRET1, 0(BASE)
> + | stw CRET2, 4(BASE)
> + |.endif
> | li RD, (2+1)*8
> | b ->fff_res
> |
> @@ -1986,13 +2251,13 @@ static void build_subroutines(BuildCtx *ctx)
> |.if DUALNUM
> | .ffunc_1 name
> | checknum CARG3
> - | addi TMP1, BASE, 8
> - | add TMP2, BASE, NARGS8:RC
> + | addi SAVE0, BASE, 8
> + | add SAVE1, BASE, NARGS8:RC
> | bne >4
> |1: // Handle integers.
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> + | lwz CARG2, 4(SAVE0)
> | bge cr1, ->fff_resi
> | checknum CARG4
> | xoris TMP0, CARG1, 0x8000
> @@ -2009,36 +2274,76 @@ static void build_subroutines(BuildCtx *ctx)
> |.if GPR64
> | rldicl CARG1, CARG1, 0, 32
> |.endif
> - | addi TMP1, TMP1, 8
> + | addi SAVE0, SAVE0, 8
> | b <1
> |3:
> | bge ->fff_fallback
> | // Convert intermediate result to number and continue below.
> + |.if FPU
> | tonum_i FARG1, CARG1
> - | lfd FARG2, 0(TMP1)
> + | lfd FARG2, 0(SAVE0)
> + |.else
> + | mr CARG2, CARG1
> + | bl ->vm_sfi2d_1
> + | lwz CARG3, 0(SAVE0)
> + | lwz CARG4, 4(SAVE0)
> + |.endif
> | b >6
> |4:
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> | bge ->fff_fallback
> |5: // Handle numbers.
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> - | lfd FARG2, 0(TMP1)
> + | lwz CARG3, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> + |.if FPU
> + | lfd FARG2, 0(SAVE0)
> + |.else
> + | lwz CARG4, 4(SAVE0)
> + |.endif
> | bge cr1, ->fff_resn
> - | checknum CARG4; bge >7
> + | checknum CARG3; bge >7
> |6:
> + | addi SAVE0, SAVE0, 8
> + |.if FPU
> | fsub f0, FARG1, FARG2
> - | addi TMP1, TMP1, 8
> |.if ismax
> | fsel FARG1, f0, FARG1, FARG2
> |.else
> | fsel FARG1, f0, FARG2, FARG1
> |.endif
> + |.else
> + | stw CARG1, SFSAVE_1
> + | stw CARG2, SFSAVE_2
> + | stw CARG3, SFSAVE_3
> + | stw CARG4, SFSAVE_4
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.if ismax
> + | blt >8
> + |.else
> + | bge >8
> + |.endif
> + | lwz CARG1, SFSAVE_1
> + | lwz CARG2, SFSAVE_2
> + | b <5
> + |8:
> + | lwz CARG1, SFSAVE_3
> + | lwz CARG2, SFSAVE_4
> + |.endif
> | b <5
> |7: // Convert integer to number and continue above.
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG3, 4(SAVE0)
> | bne ->fff_fallback
> - | tonum_i FARG2, CARG2
> + |.if FPU
> + | tonum_i FARG2, CARG3
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b <6
> |.else
> | .ffunc_n name
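For reference, the soft-float min/max path above compares through libgcc's __ledf2, whose result is checked against zero much like fcmpu's condition bits would be. A hedged C sketch (sf_max is a hypothetical helper; __ledf2 is the libgcc routine):

extern int __ledf2(double a, double b);  /* libgcc soft-float compare */

/* __ledf2(a, b) is negative, zero or positive for a < b, a == b, a > b,
** and positive when either operand is NaN, so "keep the larger" becomes:
*/
static double sf_max(double a, double b)
{
  return __ledf2(a, b) < 0 ? b : a;
}
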
> @@ -2238,28 +2543,37 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.macro .ffunc_bit_op, name, ins
> | .ffunc_bit name
> - | addi TMP1, BASE, 8
> - | add TMP2, BASE, NARGS8:RC
> + | addi SAVE0, BASE, 8
> + | add SAVE1, BASE, NARGS8:RC
> |1:
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> + | lwz CARG4, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> |.if DUALNUM
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG2, 4(SAVE0)
> |.else
> - | lfd FARG1, 0(TMP1)
> + | lfd FARG1, 0(SAVE0)
> |.endif
> | bgey cr1, ->fff_resi
> | checknum CARG4
> |.if DUALNUM
> + |.if FPU
> | bnel ->fff_bitop_fb
> |.else
> + | beq >3
> + | stw CARG1, SFSAVE_1
> + | bl ->fff_bitop_fb
> + | mr CARG2, CARG1
> + | lwz CARG1, SFSAVE_1
> + |3:
> + |.endif
> + |.else
> | fadd FARG1, FARG1, TOBIT
> | bge ->fff_fallback
> | stfd FARG1, TMPD
> | lwz CARG2, TMPD_LO
> |.endif
> | ins CARG1, CARG1, CARG2
> - | addi TMP1, TMP1, 8
> + | addi SAVE0, SAVE0, 8
> | b <1
> |.endmacro
> |
> @@ -2281,7 +2595,14 @@ static void build_subroutines(BuildCtx *ctx)
> |.macro .ffunc_bit_sh, name, ins, shmod
> |.if DUALNUM
> | .ffunc_2 bit_..name
> + |.if FPU
> | checknum CARG3; bnel ->fff_tobit_fb
> + |.else
> + | checknum CARG3; beq >1
> + | bl ->fff_tobit_fb
> + | lwz CARG2, 12(BASE) // Conversion polluted CARG2.
> + |1:
> + |.endif
> | // Note: no inline conversion from number for 2nd argument!
> | checknum CARG4; bne ->fff_fallback
> |.else
> @@ -2318,27 +2639,77 @@ static void build_subroutines(BuildCtx *ctx)
> |->fff_resn:
> | lwz PC, FRAME_PC(BASE)
> | la RA, -8(BASE)
> + |.if FPU
> | stfd FARG1, -8(BASE)
> + |.else
> + | stw CARG1, -8(BASE)
> + | stw CARG2, -4(BASE)
> + |.endif
> | b ->fff_res1
> |
> |// Fallback FP number to bit conversion.
> |->fff_tobit_fb:
> |.if DUALNUM
> + |.if FPU
> | lfd FARG1, 0(BASE)
> | bgt ->fff_fallback
> | fadd FARG1, FARG1, TOBIT
> | stfd FARG1, TMPD
> | lwz CARG1, TMPD_LO
> | blr
> + |.else
> + | bgt ->fff_fallback
> + | mr CARG2, CARG1
> + | mr CARG1, CARG3
> + |// Modifies: CARG1, CARG2, TMP0, TMP1, TMP2.
> + |->vm_tobit:
> + | slwi TMP2, CARG1, 1
> + | addis TMP2, TMP2, 0x0020
> + | cmpwi TMP2, 0
> + | bge >2
> + | li TMP1, 0x3e0
> + | srawi TMP2, TMP2, 21
> + | not TMP1, TMP1
> + | sub. TMP2, TMP1, TMP2
> + | cmpwi cr7, CARG1, 0
> + | blt >1
> + | slwi TMP1, CARG1, 11
> + | srwi TMP0, CARG2, 21
> + | oris TMP1, TMP1, 0x8000
> + | or TMP1, TMP1, TMP0
> + | srw CARG1, TMP1, TMP2
> + | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
> + | neg CARG1, CARG1
> + | blr
> + |1:
> + | addi TMP2, TMP2, 21
> + | srw TMP1, CARG2, TMP2
> + | slwi CARG2, CARG1, 12
> + | subfic TMP2, TMP2, 20
> + | slw TMP0, CARG2, TMP2
> + | or CARG1, TMP1, TMP0
> + | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
> + | neg CARG1, CARG1
> + | blr
> + |2:
> + | li CARG1, 0
> + | blr
> + |.endif
> |.endif
> |->fff_bitop_fb:
> |.if DUALNUM
> - | lfd FARG1, 0(TMP1)
> + |.if FPU
> + | lfd FARG1, 0(SAVE0)
> | bgt ->fff_fallback
> | fadd FARG1, FARG1, TOBIT
> | stfd FARG1, TMPD
> | lwz CARG2, TMPD_LO
> | blr
> + |.else
> + | bgt ->fff_fallback
> + | mr CARG1, CARG4
> + | b ->vm_tobit
> + |.endif
> |.endif
> |
> |//-----------------------------------------------------------------------
> @@ -2531,10 +2902,21 @@ static void build_subroutines(BuildCtx *ctx)
> | decode_RA8 RC, INS // Call base.
> | beq >2
> |1: // Move results down.
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + |.endif
> | addic. TMP1, TMP1, -8
> | addi RA, RA, 8
> + |.if FPU
> | stfdx f0, BASE, RC
> + |.else
> + | add CARG3, BASE, RC
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | addi RC, RC, 8
> | bne <1
> |2:
> @@ -2587,10 +2969,12 @@ static void build_subroutines(BuildCtx *ctx)
> |//-----------------------------------------------------------------------
> |
> |.macro savex_, a, b, c, d
> + |.if FPU
> | stfd f..a, 16+a*8(sp)
> | stfd f..b, 16+b*8(sp)
> | stfd f..c, 16+c*8(sp)
> | stfd f..d, 16+d*8(sp)
> + |.endif
> |.endmacro
> |
> |->vm_exit_handler:
> @@ -2662,16 +3046,16 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz KBASE, PC2PROTO(k)(TMP1)
> | // Setup type comparison constants.
> | li TISNUM, LJ_TISNUM
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> - | stw TMP3, TMPD
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU stw TMP3, TMPD
> | li ZERO, 0
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | lfs TOBIT, TMPD
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | li TISNIL, LJ_TNIL
> - | stw TMP0, TONUM_HI
> - | lfs TONUM, TMPD
> + | .FPU stw TMP0, TONUM_HI
> + | .FPU lfs TONUM, TMPD
> | // Modified copy of ins_next which handles function header dispatch, too.
> | lwz INS, 0(PC)
> | addi PC, PC, 4
> @@ -2716,7 +3100,35 @@ static void build_subroutines(BuildCtx *ctx)
> |//-- Math helper functions ----------------------------------------------
> |//-----------------------------------------------------------------------
> |
> - |// NYI: Use internal implementations of floor, ceil, trunc.
> + |// NYI: Use internal implementations of floor, ceil, trunc, sfcmp.
> + |
> + |.macro sfi2d, AHI, ALO
> + |.if not FPU
> + | mr. AHI, ALO
> + | bclr 12, 2 // Handle zero first.
> + | srawi TMP0, ALO, 31
> + | xor TMP1, ALO, TMP0
> + | sub TMP1, TMP1, TMP0 // Absolute value in TMP1.
> + | cntlzw AHI, TMP1
> + | andix. TMP0, TMP0, 0x800 // Mask sign bit.
> + | slw TMP1, TMP1, AHI // Align mantissa left with leading 1.
> + | subfic AHI, AHI, 0x3ff+31-1 // Exponent -1 in AHI.
> + | slwi ALO, TMP1, 21
> + | or AHI, AHI, TMP0 // Sign | Exponent.
> + | srwi TMP1, TMP1, 11
> + | slwi AHI, AHI, 20 // Align left.
> + | add AHI, AHI, TMP1 // Add mantissa, increment exponent.
> + | blr
> + |.endif
> + |.endmacro
> + |
> + |// Input: CARG2. Output: CARG1, CARG2. Temporaries: TMP0, TMP1.
> + |->vm_sfi2d_1:
> + | sfi2d CARG1, CARG2
> + |
> + |// Input: CARG4. Output: CARG3, CARG4. Temporaries: TMP0, TMP1.
> + |->vm_sfi2d_2:
> + | sfi2d CARG3, CARG4
> |
> |->vm_modi:
> | divwo. TMP0, CARG1, CARG2
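A C sketch of what the sfi2d macro above computes (int_to_double_words is hypothetical and for illustration only; plain soft-float C code would get __floatsidf from libgcc instead):

#include <stdint.h>

/* Build the hi/lo words of the IEEE-754 double representing a 32-bit
** integer: sign, leading-zero normalization, biased exponent 0x3ff.
*/
static void int_to_double_words(int32_t k, uint32_t *hi, uint32_t *lo)
{
  uint32_t m = (uint32_t)k;
  uint32_t sign = 0;
  if (k < 0) { sign = 0x80000000u; m = 0u - m; }
  if (m == 0) { *hi = 0; *lo = 0; return; }      /* +0.0, handled first */
  int z = __builtin_clz(m);                      /* cntlzw */
  uint32_t e = 0x3ffu + 31u - (uint32_t)z;       /* biased exponent */
  uint32_t norm = m << z;                        /* leading 1 at bit 31 */
  *hi = sign | (e << 20) | ((norm >> 11) & 0xfffffu);
  *lo = norm << 21;
}
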
> @@ -2784,21 +3196,21 @@ static void build_subroutines(BuildCtx *ctx)
> | addi DISPATCH, r12, GG_G2DISP
> | stw r11, CTSTATE->cb.slot
> | stw r3, CTSTATE->cb.gpr[0]
> - | stfd f1, CTSTATE->cb.fpr[0]
> + | .FPU stfd f1, CTSTATE->cb.fpr[0]
> | stw r4, CTSTATE->cb.gpr[1]
> - | stfd f2, CTSTATE->cb.fpr[1]
> + | .FPU stfd f2, CTSTATE->cb.fpr[1]
> | stw r5, CTSTATE->cb.gpr[2]
> - | stfd f3, CTSTATE->cb.fpr[2]
> + | .FPU stfd f3, CTSTATE->cb.fpr[2]
> | stw r6, CTSTATE->cb.gpr[3]
> - | stfd f4, CTSTATE->cb.fpr[3]
> + | .FPU stfd f4, CTSTATE->cb.fpr[3]
> | stw r7, CTSTATE->cb.gpr[4]
> - | stfd f5, CTSTATE->cb.fpr[4]
> + | .FPU stfd f5, CTSTATE->cb.fpr[4]
> | stw r8, CTSTATE->cb.gpr[5]
> - | stfd f6, CTSTATE->cb.fpr[5]
> + | .FPU stfd f6, CTSTATE->cb.fpr[5]
> | stw r9, CTSTATE->cb.gpr[6]
> - | stfd f7, CTSTATE->cb.fpr[6]
> + | .FPU stfd f7, CTSTATE->cb.fpr[6]
> | stw r10, CTSTATE->cb.gpr[7]
> - | stfd f8, CTSTATE->cb.fpr[7]
> + | .FPU stfd f8, CTSTATE->cb.fpr[7]
> | addi TMP0, sp, CFRAME_SPACE+8
> | stw TMP0, CTSTATE->cb.stack
> | mr CARG1, CTSTATE
> @@ -2809,21 +3221,21 @@ static void build_subroutines(BuildCtx *ctx)
> | lp BASE, L:CRET1->base
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp RC, L:CRET1->top
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | li ZERO, 0
> | mr L, CRET1
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | stw TMP0, TONUM_HI
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU stw TMP0, TONUM_HI
> | li TISNIL, LJ_TNIL
> | li_vmstate INTERP
> - | lfs TOBIT, TMPD
> - | stw TMP3, TMPD
> + | .FPU lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> | sub RC, RC, BASE
> | st_vmstate
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | ins_callt
> |.endif
> |
> @@ -2837,7 +3249,7 @@ static void build_subroutines(BuildCtx *ctx)
> | mr CARG2, RA
> | bl extern lj_ccallback_leave // (CTState *cts, TValue *o)
> | lwz CRET1, CTSTATE->cb.gpr[0]
> - | lfd FARG1, CTSTATE->cb.fpr[0]
> + | .FPU lfd FARG1, CTSTATE->cb.fpr[0]
> | lwz CRET2, CTSTATE->cb.gpr[1]
> | b ->vm_leave_unw
> |.endif
> @@ -2871,14 +3283,14 @@ static void build_subroutines(BuildCtx *ctx)
> | bge <1
> |2:
> | bney cr1, >3
> - | lfd f1, CCSTATE->fpr[0]
> - | lfd f2, CCSTATE->fpr[1]
> - | lfd f3, CCSTATE->fpr[2]
> - | lfd f4, CCSTATE->fpr[3]
> - | lfd f5, CCSTATE->fpr[4]
> - | lfd f6, CCSTATE->fpr[5]
> - | lfd f7, CCSTATE->fpr[6]
> - | lfd f8, CCSTATE->fpr[7]
> + | .FPU lfd f1, CCSTATE->fpr[0]
> + | .FPU lfd f2, CCSTATE->fpr[1]
> + | .FPU lfd f3, CCSTATE->fpr[2]
> + | .FPU lfd f4, CCSTATE->fpr[3]
> + | .FPU lfd f5, CCSTATE->fpr[4]
> + | .FPU lfd f6, CCSTATE->fpr[5]
> + | .FPU lfd f7, CCSTATE->fpr[6]
> + | .FPU lfd f8, CCSTATE->fpr[7]
> |3:
> | lp TMP0, CCSTATE->func
> | lwz CARG2, CCSTATE->gpr[1]
> @@ -2895,7 +3307,7 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz TMP2, -4(r14)
> | lwz TMP0, 4(r14)
> | stw CARG1, CCSTATE:TMP1->gpr[0]
> - | stfd FARG1, CCSTATE:TMP1->fpr[0]
> + | .FPU stfd FARG1, CCSTATE:TMP1->fpr[0]
> | stw CARG2, CCSTATE:TMP1->gpr[1]
> | mtlr TMP0
> | stw CARG3, CCSTATE:TMP1->gpr[2]
> @@ -2924,19 +3336,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
> | // RA = src1*8, RD = src2*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, BASE
> + | lwzux CARG3, RD, BASE
> | lwz TMP2, -4(PC)
> - | checknum cr0, TMP0
> - | lwz CARG3, 4(RD)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RD)
> | decode_RD4 TMP2, TMP2
> - | checknum cr1, TMP1
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | checknum cr1, CARG3
> + | addis SAVE0, TMP2, -(BCBIAS_J*4 >> 16)
> | bne cr0, >7
> | bne cr1, >8
> - | cmpw CARG2, CARG3
> + | cmpw CARG2, CARG4
> if (op == BC_ISLT) {
> | bge >2
> } else if (op == BC_ISGE) {
> @@ -2947,28 +3359,41 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | ble >2
> }
> |1:
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |2:
> | ins_next
> |
> |7: // RA is not an integer.
> | bgt cr0, ->vmeta_comp
> | // RA is a number.
> - | lfd f0, 0(RA)
> + | .FPU lfd f0, 0(RA)
> | bgt cr1, ->vmeta_comp
> | blt cr1, >4
> | // RA is a number, RD is an integer.
> - | tonum_i f1, CARG3
> + |.if FPU
> + | tonum_i f1, CARG4
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b >5
> |
> |8: // RA is an integer, RD is not an integer.
> | bgt cr1, ->vmeta_comp
> | // RA is an integer, RD is a number.
> + |.if FPU
> | tonum_i f0, CARG2
> + |.else
> + | bl ->vm_sfi2d_1
> + |.endif
> |4:
> - | lfd f1, 0(RD)
> + | .FPU lfd f1, 0(RD)
> |5:
> + |.if FPU
> | fcmpu cr0, f0, f1
> + |.else
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.endif
> if (op == BC_ISLT) {
> | bge <2
> } else if (op == BC_ISGE) {
> @@ -3016,42 +3441,42 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> vk = op == BC_ISEQV;
> | // RA = src1*8, RD = src2*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, BASE
> - | checknum cr0, TMP0
> - | lwz TMP2, -4(PC)
> - | checknum cr1, TMP1
> - | decode_RD4 TMP2, TMP2
> - | lwz CARG3, 4(RD)
> + | lwzux CARG3, RD, BASE
> + | checknum cr0, CARG1
> + | lwz SAVE0, -4(PC)
> + | checknum cr1, CARG3
> + | decode_RD4 SAVE0, SAVE0
> + | lwz CARG4, 4(RD)
> | cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> if (vk) {
> | ble cr7, ->BC_ISEQN_Z
> } else {
> | ble cr7, ->BC_ISNEN_Z
> }
> |.else
> - | lwzux TMP0, RA, BASE
> - | lwz TMP2, 0(PC)
> + | lwzux CARG1, RA, BASE
> + | lwz SAVE0, 0(PC)
> | lfd f0, 0(RA)
> | addi PC, PC, 4
> - | lwzux TMP1, RD, BASE
> - | checknum cr0, TMP0
> - | decode_RD4 TMP2, TMP2
> + | lwzux CARG3, RD, BASE
> + | checknum cr0, CARG1
> + | decode_RD4 SAVE0, SAVE0
> | lfd f1, 0(RD)
> - | checknum cr1, TMP1
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | checknum cr1, CARG3
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> | bge cr0, >5
> | bge cr1, >5
> | fcmpu cr0, f0, f1
> if (vk) {
> | bne >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> } else {
> | beq >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> }
> |1:
> | ins_next
> @@ -3059,36 +3484,36 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |5: // Either or both types are not numbers.
> |.if not DUALNUM
> | lwz CARG2, 4(RA)
> - | lwz CARG3, 4(RD)
> + | lwz CARG4, 4(RD)
> |.endif
> |.if FFI
> - | cmpwi cr7, TMP0, LJ_TCDATA
> - | cmpwi cr5, TMP1, LJ_TCDATA
> + | cmpwi cr7, CARG1, LJ_TCDATA
> + | cmpwi cr5, CARG3, LJ_TCDATA
> |.endif
> - | not TMP3, TMP0
> - | cmplw TMP0, TMP1
> - | cmplwi cr1, TMP3, ~LJ_TISPRI // Primitive?
> + | not TMP2, CARG1
> + | cmplw CARG1, CARG3
> + | cmplwi cr1, TMP2, ~LJ_TISPRI // Primitive?
> |.if FFI
> | cror 4*cr7+eq, 4*cr7+eq, 4*cr5+eq
> |.endif
> - | cmplwi cr6, TMP3, ~LJ_TISTABUD // Table or userdata?
> + | cmplwi cr6, TMP2, ~LJ_TISTABUD // Table or userdata?
> |.if FFI
> | beq cr7, ->vmeta_equal_cd
> |.endif
> - | cmplw cr5, CARG2, CARG3
> + | cmplw cr5, CARG2, CARG4
> | crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt // 2: Same type and primitive.
> | crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq // 1: Same tv or different type.
> | crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq // 0: Same type and same tv.
> - | mr SAVE0, PC
> + | mr SAVE1, PC
> | cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt // 0 or 2.
> | cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt // 1 or 2.
> if (vk) {
> | bne cr0, >6
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |6:
> } else {
> | beq cr0, >6
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |6:
> }
> |.if DUALNUM
> @@ -3103,6 +3528,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |
> | // Different tables or userdatas. Need to check __eq metamethod.
> | // Field metatable must be at same offset for GCtab and GCudata!
> + | mr CARG3, CARG4
> | lwz TAB:TMP2, TAB:CARG2->metatable
> | li CARG4, 1-vk // ne = 0 or 1.
> | cmplwi TAB:TMP2, 0
> @@ -3110,7 +3536,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lbz TMP2, TAB:TMP2->nomm
> | andix. TMP2, TMP2, 1<<MM_eq
> | bne <1 // Or 'no __eq' flag set?
> - | mr PC, SAVE0 // Restore old PC.
> + | mr PC, SAVE1 // Restore old PC.
> | b ->vmeta_equal // Handle __eq metamethod.
> break;
>
> @@ -3151,16 +3577,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> vk = op == BC_ISEQN;
> | // RA = src*8, RD = num_const*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, KBASE
> - | checknum cr0, TMP0
> - | lwz TMP2, -4(PC)
> - | checknum cr1, TMP1
> - | decode_RD4 TMP2, TMP2
> - | lwz CARG3, 4(RD)
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | lwzux CARG3, RD, KBASE
> + | checknum cr0, CARG1
> + | lwz SAVE0, -4(PC)
> + | checknum cr1, CARG3
> + | decode_RD4 SAVE0, SAVE0
> + | lwz CARG4, 4(RD)
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> if (vk) {
> |->BC_ISEQN_Z:
> } else {
> @@ -3168,7 +3594,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> }
> | bne cr0, >7
> | bne cr1, >8
> - | cmpw CARG2, CARG3
> + | cmpw CARG2, CARG4
> |4:
> |.else
> if (vk) {
> @@ -3176,20 +3602,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> } else {
> |->BC_ISNEN_Z: // Dummy label.
> }
> - | lwzx TMP0, BASE, RA
> + | lwzx CARG1, BASE, RA
> | addi PC, PC, 4
> | lfdx f0, BASE, RA
> - | lwz TMP2, -4(PC)
> + | lwz SAVE0, -4(PC)
> | lfdx f1, KBASE, RD
> - | decode_RD4 TMP2, TMP2
> - | checknum TMP0
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | decode_RD4 SAVE0, SAVE0
> + | checknum CARG1
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> | bge >3
> | fcmpu cr0, f0, f1
> |.endif
> if (vk) {
> | bne >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |1:
> |.if not FFI
> |3:
> @@ -3200,13 +3626,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.if not FFI
> |3:
> |.endif
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |2:
> }
> | ins_next
> |.if FFI
> |3:
> - | cmpwi TMP0, LJ_TCDATA
> + | cmpwi CARG1, LJ_TCDATA
> | beq ->vmeta_equal_cd
> | b <1
> |.endif
> @@ -3214,18 +3640,31 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |7: // RA is not an integer.
> | bge cr0, <3
> | // RA is a number.
> - | lfd f0, 0(RA)
> + | .FPU lfd f0, 0(RA)
> | blt cr1, >1
> | // RA is a number, RD is an integer.
> - | tonum_i f1, CARG3
> + |.if FPU
> + | tonum_i f1, CARG4
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b >2
> |
> |8: // RA is an integer, RD is a number.
> + |.if FPU
> | tonum_i f0, CARG2
> + |.else
> + | bl ->vm_sfi2d_1
> + |.endif
> |1:
> - | lfd f1, 0(RD)
> + | .FPU lfd f1, 0(RD)
> |2:
> + |.if FPU
> | fcmpu cr0, f0, f1
> + |.else
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.endif
> | b <4
> |.endif
> break;
> @@ -3280,7 +3719,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | add PC, PC, TMP2
> } else {
> | li TMP1, LJ_TFALSE
> + |.if FPU
> | lfdx f0, BASE, RD
> + |.else
> + | lwzux CARG1, RD, BASE
> + | lwz CARG2, 4(RD)
> + |.endif
> | cmplw TMP0, TMP1
> if (op == BC_ISTC) {
> | bge >1
> @@ -3289,7 +3733,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> }
> | addis PC, PC, -(BCBIAS_J*4 >> 16)
> | decode_RD4 TMP2, INS
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux CARG1, RA, BASE
> + | stw CARG2, 4(RA)
> + |.endif
> | add PC, PC, TMP2
> |1:
> }
> @@ -3324,8 +3773,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_MOV:
> | // RA = dst*8, RD = src*8
> | ins_next1
> + |.if FPU
> | lfdx f0, BASE, RD
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, RD, BASE
> + | lwz TMP1, 4(RD)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_NOT:
> @@ -3427,44 +3883,65 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
> ||switch (vk) {
> ||case 0:
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> | .if DUALNUM
> - | lwzx TMP2, KBASE, RC
> + | lwzx CARG3, KBASE, RC
> | .endif
> + | .if FPU
> | lfdx f14, BASE, RB
> | lfdx f15, KBASE, RC
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, KBASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> | .if DUALNUM
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vn
> | .else
> - | checknum TMP1; bge ->vmeta_arith_vn
> + | checknum CARG1; bge ->vmeta_arith_vn
> | .endif
> || break;
> ||case 1:
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> | .if DUALNUM
> - | lwzx TMP2, KBASE, RC
> + | lwzx CARG3, KBASE, RC
> | .endif
> + | .if FPU
> | lfdx f15, BASE, RB
> | lfdx f14, KBASE, RC
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, KBASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> | .if DUALNUM
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_nv
> | .else
> - | checknum TMP1; bge ->vmeta_arith_nv
> + | checknum CARG1; bge ->vmeta_arith_nv
> | .endif
> || break;
> ||default:
> - | lwzx TMP1, BASE, RB
> - | lwzx TMP2, BASE, RC
> + | lwzx CARG1, BASE, RB
> + | lwzx CARG3, BASE, RC
> + | .if FPU
> | lfdx f14, BASE, RB
> | lfdx f15, BASE, RC
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, BASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vv
> || break;
> @@ -3498,48 +3975,78 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | fsub a, b, a // b - floor(b/c)*c
> |.endmacro
> |
> + |.macro sfpmod
> + |->BC_MODVN_Z:
> + | stw CARG1, SFSAVE_1
> + | stw CARG2, SFSAVE_2
> + | mr SAVE0, CARG3
> + | mr SAVE1, CARG4
> + | blex __divdf3
> + | blex floor
> + | mr CARG3, SAVE0
> + | mr CARG4, SAVE1
> + | blex __muldf3
> + | mr CARG3, CRET1
> + | mr CARG4, CRET2
> + | lwz CARG1, SFSAVE_1
> + | lwz CARG2, SFSAVE_2
> + | blex __subdf3
> + |.endmacro
> + |
> |.macro ins_arithfp, fpins
> | ins_arithpre
> |.if "fpins" == "fpmod_"
> | b ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
> - |.else
> + |.elif FPU
> | fpins f0, f14, f15
> | ins_next1
> | stfdx f0, BASE, RA
> | ins_next2
> + |.else
> + | blex __divdf3 // Only soft-float div uses this macro.
> + | ins_next1
> + | stwux CRET1, RA, BASE
> + | stw CRET2, 4(RA)
> + | ins_next2
> |.endif
> |.endmacro
> |
> - |.macro ins_arithdn, intins, fpins
> + |.macro ins_arithdn, intins, fpins, fpcall
> | // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8
> ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
> ||switch (vk) {
> ||case 0:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, KBASE
> - | lwz CARG1, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG2, 4(RC)
> + | lwzux CARG1, RB, BASE
> + | lwzux CARG3, RC, KBASE
> + | lwz CARG2, 4(RB)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RC)
> + | checknum cr1, CARG3
> || break;
> ||case 1:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, KBASE
> - | lwz CARG2, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG1, 4(RC)
> + | lwzux CARG3, RB, BASE
> + | lwzux CARG1, RC, KBASE
> + | lwz CARG4, 4(RB)
> + | checknum cr0, CARG3
> + | lwz CARG2, 4(RC)
> + | checknum cr1, CARG1
> || break;
> ||default:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, BASE
> - | lwz CARG1, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG2, 4(RC)
> + | lwzux CARG1, RB, BASE
> + | lwzux CARG3, RC, BASE
> + | lwz CARG2, 4(RB)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RC)
> + | checknum cr1, CARG3
> || break;
> ||}
> - | checknum cr1, TMP2
> | bne >5
> | bne cr1, >5
> - | intins CARG1, CARG1, CARG2
> + |.if "intins" == "intmod"
> + | mr CARG1, CARG2
> + | mr CARG2, CARG4
> + |.endif
> + | intins CARG1, CARG2, CARG4
> | bso >4
> |1:
> | ins_next1
> @@ -3551,29 +4058,40 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | checkov TMP0, <1 // Ignore unrelated overflow.
> | ins_arithfallback b
> |5: // FP variant.
> + |.if FPU
> ||if (vk == 1) {
> | lfd f15, 0(RB)
> - | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | lfd f14, 0(RC)
> ||} else {
> | lfd f14, 0(RB)
> - | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | lfd f15, 0(RC)
> ||}
> + |.endif
> + | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | ins_arithfallback bge
> |.if "fpins" == "fpmod_"
> | b ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
> |.else
> + |.if FPU
> | fpins f0, f14, f15
> - | ins_next1
> | stfdx f0, BASE, RA
> + |.else
> + |.if "fpcall" == "sfpmod"
> + | sfpmod
> + |.else
> + | blex fpcall
> + |.endif
> + | stwux CRET1, RA, BASE
> + | stw CRET2, 4(RA)
> + |.endif
> + | ins_next1
> | b <2
> |.endif
> |.endmacro
> |
> - |.macro ins_arith, intins, fpins
> + |.macro ins_arith, intins, fpins, fpcall
> |.if DUALNUM
> - | ins_arithdn intins, fpins
> + | ins_arithdn intins, fpins, fpcall
> |.else
> | ins_arithfp fpins
> |.endif
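A one-line C sketch of what the sfpmod macro above computes (lua_mod is a hypothetical helper, not from the patch):

#include <math.h>

/* The Lua modulo a - floor(a/b)*b; on a soft-float build the compiler
** lowers each FP operator to the matching libgcc call, which is what the
** explicit __divdf3/floor/__muldf3/__subdf3 sequence spells out.
*/
static double lua_mod(double a, double b)
{
  return a - floor(a / b) * b;
}
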
> @@ -3588,9 +4106,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | addo. TMP0, TMP0, TMP3
> | add y, a, b
> |.endmacro
> - | ins_arith addo32., fadd
> + | ins_arith addo32., fadd, __adddf3
> |.else
> - | ins_arith addo., fadd
> + | ins_arith addo., fadd, __adddf3
> |.endif
> break;
> case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
> @@ -3602,36 +4120,48 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | subo. TMP0, TMP0, TMP3
> | sub y, a, b
> |.endmacro
> - | ins_arith subo32., fsub
> + | ins_arith subo32., fsub, __subdf3
> |.else
> - | ins_arith subo., fsub
> + | ins_arith subo., fsub, __subdf3
> |.endif
> break;
> case BC_MULVN: case BC_MULNV: case BC_MULVV:
> - | ins_arith mullwo., fmul
> + | ins_arith mullwo., fmul, __muldf3
> break;
> case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
> | ins_arithfp fdiv
> break;
> case BC_MODVN:
> - | ins_arith intmod, fpmod
> + | ins_arith intmod, fpmod, sfpmod
> break;
> case BC_MODNV: case BC_MODVV:
> - | ins_arith intmod, fpmod_
> + | ins_arith intmod, fpmod_, sfpmod
> break;
> case BC_POW:
> | // NYI: (partial) integer arithmetic.
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> + | lwzx CARG3, BASE, RC
> + |.if FPU
> | lfdx FARG1, BASE, RB
> - | lwzx TMP2, BASE, RC
> | lfdx FARG2, BASE, RC
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + |.else
> + | add TMP1, BASE, RB
> + | add TMP2, BASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + |.endif
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vv
> | blex pow
> | ins_next1
> + |.if FPU
> | stfdx FARG1, BASE, RA
> + |.else
> + | stwux CARG1, RA, BASE
> + | stw CARG2, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -3651,8 +4181,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lp BASE, L->base
> | bne ->vmeta_binop
> | ins_next1
> + |.if FPU
> | lfdx f0, BASE, SAVE0 // Copy result from RB to RA.
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, SAVE0, BASE
> + | lwz TMP1, 4(SAVE0)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -3715,8 +4252,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_KNUM:
> | // RA = dst*8, RD = num_const*8
> | ins_next1
> + |.if FPU
> | lfdx f0, KBASE, RD
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, RD, KBASE
> + | lwz TMP1, 4(RD)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_KPRI:
> @@ -3749,8 +4293,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwzx UPVAL:RB, LFUNC:RB, RD
> | ins_next1
> | lwz TMP1, UPVAL:RB->v
> + |.if FPU
> | lfd f0, 0(TMP1)
> | stfdx f0, BASE, RA
> + |.else
> + | lwz TMP2, 0(TMP1)
> + | lwz TMP3, 4(TMP1)
> + | stwux TMP2, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_USETV:
> @@ -3758,14 +4309,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> | srwi RA, RA, 1
> | addi RA, RA, offsetof(GCfuncL, uvptr)
> + |.if FPU
> | lfdux f0, RD, BASE
> + |.else
> + | lwzux CARG1, RD, BASE
> + | lwz CARG3, 4(RD)
> + |.endif
> | lwzx UPVAL:RB, LFUNC:RB, RA
> | lbz TMP3, UPVAL:RB->marked
> | lwz CARG2, UPVAL:RB->v
> | andix. TMP3, TMP3, LJ_GC_BLACK // isblack(uv)
> | lbz TMP0, UPVAL:RB->closed
> | lwz TMP2, 0(RD)
> + |.if FPU
> | stfd f0, 0(CARG2)
> + |.else
> + | stw CARG1, 0(CARG2)
> + | stw CARG3, 4(CARG2)
> + |.endif
> | cmplwi cr1, TMP0, 0
> | lwz TMP1, 4(RD)
> | cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -3821,11 +4382,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> | srwi RA, RA, 1
> | addi RA, RA, offsetof(GCfuncL, uvptr)
> + |.if FPU
> | lfdx f0, KBASE, RD
> + |.else
> + | lwzux TMP2, RD, KBASE
> + | lwz TMP3, 4(RD)
> + |.endif
> | lwzx UPVAL:RB, LFUNC:RB, RA
> | ins_next1
> | lwz TMP1, UPVAL:RB->v
> + |.if FPU
> | stfd f0, 0(TMP1)
> + |.else
> + | stw TMP2, 0(TMP1)
> + | stw TMP3, 4(TMP1)
> + |.endif
> | ins_next2
> break;
> case BC_USETP:
> @@ -3973,11 +4544,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.endif
> | ble ->vmeta_tgetv // Integer key and in array part?
> | lwzx TMP0, TMP1, TMP2
> + |.if FPU
> | lfdx f14, TMP1, TMP2
> + |.else
> + | lwzux SAVE0, TMP1, TMP2
> + | lwz SAVE1, 4(TMP1)
> + |.endif
> | checknil TMP0; beq >2
> |1:
> | ins_next1
> + |.if FPU
> | stfdx f14, BASE, RA
> + |.else
> + | stwux SAVE0, RA, BASE
> + | stw SAVE1, 4(RA)
> + |.endif
> | ins_next2
> |
> |2: // Check for __index if table value is nil.
> @@ -4053,12 +4634,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz TMP1, TAB:RB->asize
> | lwz TMP2, TAB:RB->array
> | cmplw TMP0, TMP1; bge ->vmeta_tgetb
> + |.if FPU
> | lwzx TMP1, TMP2, RC
> | lfdx f0, TMP2, RC
> + |.else
> + | lwzux TMP1, TMP2, RC
> + | lwz TMP3, 4(TMP2)
> + |.endif
> | checknil TMP1; beq >5
> |1:
> | ins_next1
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux TMP1, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | ins_next2
> |
> |5: // Check for __index if table value is nil.
> @@ -4088,10 +4679,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | cmplw TMP0, CARG2
> | slwi TMP2, CARG2, 3
> | ble ->vmeta_tgetr // In array part?
> + |.if FPU
> | lfdx f14, TMP1, TMP2
> + |.else
> + | lwzux SAVE0, TMP2, TMP1
> + | lwz SAVE1, 4(TMP2)
> + |.endif
> |->BC_TGETR_Z:
> | ins_next1
> + |.if FPU
> | stfdx f14, BASE, RA
> + |.else
> + | stwux SAVE0, RA, BASE
> + | stw SAVE1, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -4132,11 +4733,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | ble ->vmeta_tsetv // Integer key and in array part?
> | lwzx TMP2, TMP1, TMP0
> | lbz TMP3, TAB:RB->marked
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | add SAVE1, BASE, RA
> + | lwz SAVE0, 0(SAVE1)
> + | lwz SAVE1, 4(SAVE1)
> + |.endif
> | checknil TMP2; beq >3
> |1:
> | andix. TMP2, TMP3, LJ_GC_BLACK // isblack(table)
> + |.if FPU
> | stfdx f14, TMP1, TMP0
> + |.else
> + | stwux SAVE0, TMP1, TMP0
> + | stw SAVE1, 4(TMP1)
> + |.endif
> | bne >7
> |2:
> | ins_next
> @@ -4177,7 +4789,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz NODE:TMP2, TAB:RB->node
> | stb ZERO, TAB:RB->nomm // Clear metamethod cache.
> | and TMP1, TMP1, TMP0 // idx = str->hash & tab->hmask
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | add CARG2, BASE, RA
> + | lwz SAVE0, 0(CARG2)
> + | lwz SAVE1, 4(CARG2)
> + |.endif
> | slwi TMP0, TMP1, 5
> | slwi TMP1, TMP1, 3
> | sub TMP1, TMP0, TMP1
> @@ -4193,7 +4811,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | checknil CARG2; beq >4 // Key found, but nil value?
> |2:
> | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
> + |.if FPU
> | stfd f14, NODE:TMP2->val
> + |.else
> + | stw SAVE0, NODE:TMP2->val.u32.hi
> + | stw SAVE1, NODE:TMP2->val.u32.lo
> + |.endif
> | bne >7
> |3:
> | ins_next
> @@ -4232,7 +4855,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | bl extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k)
> | // Returns TValue *.
> | lp BASE, L->base
> + |.if FPU
> | stfd f14, 0(CRET1)
> + |.else
> + | stw SAVE0, 0(CRET1)
> + | stw SAVE1, 4(CRET1)
> + |.endif
> | b <3 // No 2nd write barrier needed.
> |
> |7: // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4249,13 +4877,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz TMP2, TAB:RB->array
> | lbz TMP3, TAB:RB->marked
> | cmplw TMP0, TMP1
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | add CARG2, BASE, RA
> + | lwz SAVE0, 0(CARG2)
> + | lwz SAVE1, 4(CARG2)
> + |.endif
> | bge ->vmeta_tsetb
> | lwzx TMP1, TMP2, RC
> | checknil TMP1; beq >5
> |1:
> | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
> + |.if FPU
> | stfdx f14, TMP2, RC
> + |.else
> + | stwux SAVE0, RC, TMP2
> + | stw SAVE1, 4(RC)
> + |.endif
> | bne >7
> |2:
> | ins_next
> @@ -4295,10 +4934,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |2:
> | cmplw TMP0, CARG3
> | slwi TMP2, CARG3, 3
> + |.if FPU
> | lfdx f14, BASE, RA
> + |.else
> + | lwzux SAVE0, RA, BASE
> + | lwz SAVE1, 4(RA)
> + |.endif
> | ble ->vmeta_tsetr // In array part?
> | ins_next1
> + |.if FPU
> | stfdx f14, TMP1, TMP2
> + |.else
> + | stwux SAVE0, TMP1, TMP2
> + | stw SAVE1, 4(TMP1)
> + |.endif
> | ins_next2
> |
> |7: // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4328,10 +4977,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | add TMP1, TMP1, TMP0
> | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table)
> |3: // Copy result slots to table.
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz SAVE0, 0(RA)
> + | lwz SAVE1, 4(RA)
> + |.endif
> | addi RA, RA, 8
> | cmpw cr1, RA, TMP2
> + |.if FPU
> | stfd f0, 0(TMP1)
> + |.else
> + | stw SAVE0, 0(TMP1)
> + | stw SAVE1, 4(TMP1)
> + |.endif
> | addi TMP1, TMP1, 8
> | blt cr1, <3
> | bne >7
> @@ -4398,9 +5057,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | beq cr1, >3
> |2:
> | addi TMP3, TMP2, 8
> + |.if FPU
> | lfdx f0, RA, TMP2
> + |.else
> + | add CARG3, RA, TMP2
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | cmplw cr1, TMP3, NARGS8:RC
> + |.if FPU
> | stfdx f0, BASE, TMP2
> + |.else
> + | stwux CARG1, TMP2, BASE
> + | stw CARG2, 4(TMP2)
> + |.endif
> | mr TMP2, TMP3
> | bne cr1, <2
> |3:
> @@ -4433,14 +5103,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | add BASE, BASE, RA
> | lwz TMP1, -24(BASE)
> | lwz LFUNC:RB, -20(BASE)
> + |.if FPU
> | lfd f1, -8(BASE)
> | lfd f0, -16(BASE)
> + |.else
> + | lwz CARG1, -8(BASE)
> + | lwz CARG2, -4(BASE)
> + | lwz CARG3, -16(BASE)
> + | lwz CARG4, -12(BASE)
> + |.endif
> | stw TMP1, 0(BASE) // Copy callable.
> | stw LFUNC:RB, 4(BASE)
> | checkfunc TMP1
> - | stfd f1, 16(BASE) // Copy control var.
> | li NARGS8:RC, 16 // Iterators get 2 arguments.
> + |.if FPU
> + | stfd f1, 16(BASE) // Copy control var.
> | stfdu f0, 8(BASE) // Copy state.
> + |.else
> + | stw CARG1, 16(BASE) // Copy control var.
> + | stw CARG2, 20(BASE)
> + | stwu CARG3, 8(BASE) // Copy state.
> + | stw CARG4, 4(BASE)
> + |.endif
> | bne ->vmeta_call
> | ins_call
> break;
> @@ -4461,7 +5145,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | slwi TMP3, RC, 3
> | bge >5 // Index points after array part?
> | lwzx TMP2, TMP1, TMP3
> + |.if FPU
> | lfdx f0, TMP1, TMP3
> + |.else
> + | lwzux CARG1, TMP3, TMP1
> + | lwz CARG2, 4(TMP3)
> + |.endif
> | checknil TMP2
> | lwz INS, -4(PC)
> | beq >4
> @@ -4473,7 +5162,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.endif
> | addi RC, RC, 1
> | addis TMP3, PC, -(BCBIAS_J*4 >> 16)
> + |.if FPU
> | stfd f0, 8(RA)
> + |.else
> + | stw CARG1, 8(RA)
> + | stw CARG2, 12(RA)
> + |.endif
> | decode_RD4 TMP1, INS
> | stw RC, -4(RA) // Update control var.
> | add PC, TMP1, TMP3
> @@ -4498,17 +5192,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | slwi RB, RC, 3
> | sub TMP3, TMP3, RB
> | lwzx RB, TMP2, TMP3
> + |.if FPU
> | lfdx f0, TMP2, TMP3
> + |.else
> + | add CARG3, TMP2, TMP3
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | add NODE:TMP3, TMP2, TMP3
> | checknil RB
> | lwz INS, -4(PC)
> | beq >7
> + |.if FPU
> | lfd f1, NODE:TMP3->key
> + |.else
> + | lwz CARG3, NODE:TMP3->key.u32.hi
> + | lwz CARG4, NODE:TMP3->key.u32.lo
> + |.endif
> | addis TMP2, PC, -(BCBIAS_J*4 >> 16)
> + |.if FPU
> | stfd f0, 8(RA)
> + |.else
> + | stw CARG1, 8(RA)
> + | stw CARG2, 12(RA)
> + |.endif
> | add RC, RC, TMP0
> | decode_RD4 TMP1, INS
> + |.if FPU
> | stfd f1, 0(RA)
> + |.else
> + | stw CARG3, 0(RA)
> + | stw CARG4, 4(RA)
> + |.endif
> | addi RC, RC, 1
> | add PC, TMP1, TMP2
> | stw RC, -4(RA) // Update control var.
> @@ -4574,9 +5289,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | subi TMP2, TMP2, 16
> | ble >2 // No vararg slots?
> |1: // Copy vararg slots to destination slots.
> + |.if FPU
> | lfd f0, 0(RC)
> + |.else
> + | lwz CARG1, 0(RC)
> + | lwz CARG2, 4(RC)
> + |.endif
> | addi RC, RC, 8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw CARG1, 0(RA)
> + | stw CARG2, 4(RA)
> + |.endif
> | cmplw RA, TMP2
> | cmplw cr1, RC, TMP3
> | bge >3 // All destination slots filled?
> @@ -4599,9 +5324,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | addi MULTRES, TMP1, 8
> | bgt >7
> |6:
> + |.if FPU
> | lfd f0, 0(RC)
> + |.else
> + | lwz CARG1, 0(RC)
> + | lwz CARG2, 4(RC)
> + |.endif
> | addi RC, RC, 8
> + |.if FPU
> | stfd f0, 0(RA)
> + |.else
> + | stw CARG1, 0(RA)
> + | stw CARG2, 4(RA)
> + |.endif
> | cmplw RC, TMP3
> | addi RA, RA, 8
> | blt <6 // More vararg slots?
> @@ -4652,14 +5387,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | li TMP1, 0
> |2:
> | addi TMP3, TMP1, 8
> + |.if FPU
> | lfdx f0, RA, TMP1
> + |.else
> + | add CARG3, RA, TMP1
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | cmpw TMP3, RC
> + |.if FPU
> | stfdx f0, TMP2, TMP1
> + |.else
> + | add CARG3, TMP2, TMP1
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | beq >3
> | addi TMP1, TMP3, 8
> + |.if FPU
> | lfdx f1, RA, TMP3
> + |.else
> + | add CARG3, RA, TMP3
> + | lwz CARG1, 0(CARG3)
> + | lwz CARG2, 4(CARG3)
> + |.endif
> | cmpw TMP1, RC
> + |.if FPU
> | stfdx f1, TMP2, TMP3
> + |.else
> + | add CARG3, TMP2, TMP3
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | bne <2
> |3:
> |5:
> @@ -4701,8 +5460,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | subi TMP2, BASE, 8
> | decode_RB8 RB, INS
> if (op == BC_RET1) {
> + |.if FPU
> | lfd f0, 0(RA)
> | stfd f0, 0(TMP2)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + | stw CARG1, 0(TMP2)
> + | stw CARG2, 4(TMP2)
> + |.endif
> }
> |5:
> | cmplw RB, RD
> @@ -4763,11 +5529,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |4:
> | stw CARG1, FORL_IDX*8+4(RA)
> } else {
> - | lwz TMP3, FORL_STEP*8(RA)
> + | lwz SAVE0, FORL_STEP*8(RA)
> | lwz CARG3, FORL_STEP*8+4(RA)
> | lwz TMP2, FORL_STOP*8(RA)
> | lwz CARG2, FORL_STOP*8+4(RA)
> - | cmplw cr7, TMP3, TISNUM
> + | cmplw cr7, SAVE0, TISNUM
> | cmplw cr1, TMP2, TISNUM
> | crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
> | crand 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -4810,41 +5576,80 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> if (vk) {
> |.if DUALNUM
> |9: // FP loop.
> + |.if FPU
> | lfd f1, FORL_IDX*8(RA)
> |.else
> + | lwz CARG1, FORL_IDX*8(RA)
> + | lwz CARG2, FORL_IDX*8+4(RA)
> + |.endif
> + |.else
> | lfdux f1, RA, BASE
> |.endif
> + |.if FPU
> | lfd f3, FORL_STEP*8(RA)
> | lfd f2, FORL_STOP*8(RA)
> - | lwz TMP3, FORL_STEP*8(RA)
> | fadd f1, f1, f3
> | stfd f1, FORL_IDX*8(RA)
> + |.else
> + | lwz CARG3, FORL_STEP*8(RA)
> + | lwz CARG4, FORL_STEP*8+4(RA)
> + | mr SAVE1, RD
> + | blex __adddf3
> + | mr RD, SAVE1
> + | stw CRET1, FORL_IDX*8(RA)
> + | stw CRET2, FORL_IDX*8+4(RA)
> + | lwz CARG3, FORL_STOP*8(RA)
> + | lwz CARG4, FORL_STOP*8+4(RA)
> + |.endif
> + | lwz SAVE0, FORL_STEP*8(RA)
> } else {
> |.if DUALNUM
> |9: // FP loop.
> |.else
> | lwzux TMP1, RA, BASE
> - | lwz TMP3, FORL_STEP*8(RA)
> + | lwz SAVE0, FORL_STEP*8(RA)
> | lwz TMP2, FORL_STOP*8(RA)
> | cmplw cr0, TMP1, TISNUM
> - | cmplw cr7, TMP3, TISNUM
> + | cmplw cr7, SAVE0, TISNUM
> | cmplw cr1, TMP2, TISNUM
> |.endif
> + |.if FPU
> | lfd f1, FORL_IDX*8(RA)
> + |.else
> + | lwz CARG1, FORL_IDX*8(RA)
> + | lwz CARG2, FORL_IDX*8+4(RA)
> + |.endif
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr7+lt
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> + |.if FPU
> | lfd f2, FORL_STOP*8(RA)
> + |.else
> + | lwz CARG3, FORL_STOP*8(RA)
> + | lwz CARG4, FORL_STOP*8+4(RA)
> + |.endif
> | bge ->vmeta_for
> }
> - | cmpwi cr6, TMP3, 0
> + | cmpwi cr6, SAVE0, 0
> if (op != BC_JFORL) {
> | srwi RD, RD, 1
> }
> + |.if FPU
> | stfd f1, FORL_EXT*8(RA)
> + |.else
> + | stw CARG1, FORL_EXT*8(RA)
> + | stw CARG2, FORL_EXT*8+4(RA)
> + |.endif
> if (op != BC_JFORL) {
> | add RD, PC, RD
> }
> + |.if FPU
> | fcmpu cr0, f1, f2
> + |.else
> + | mr SAVE1, RD
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + | mr RD, SAVE1
> + |.endif
> if (op == BC_JFORI) {
> | addis PC, RD, -(BCBIAS_J*4 >> 16)
> }
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (4 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 11:46 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 14:33 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
` (15 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
Sponsored by Cisco Systems, Inc.
(cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
The software floating point library is used on machines which do not
have hardware support for floating point [1]. This patch enables
support for such machines in the JIT compiler for powerpc.
This includes:
* All fp-depending paths are instrumented with `LJ_SOFTFP` macro.
* `asm_sfpmin_max()` is introduced for min/max operations on soft-float
numbers.
* `asm_sfpcomp()` is introduced for soft-float comparisons.
[1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
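For illustration only (this sketch is not part of the patch): on a
soft-float target a C compiler already lowers double arithmetic and
comparisons to calls into the library above, and the JIT backend now has
to emit the same kind of calls instead of FPU instructions, e.g.:

  /* Hypothetical example, built with -msoft-float. */
  double add(double a, double b)
  {
    return a + b;   /* becomes a call to __adddf3(a, b) */
  }

  int le(double a, double b)
  {
    return a <= b;  /* becomes __ledf2(a, b) plus an integer compare
                       against zero */
  }

On 32-bit PPC the halves of each double are passed in GPR pairs, so the
soft-float paths below allocate GPRs (with the high word tracked via
HIOP IR instructions) where the hard-float code uses FPRs.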
Sergey Kaplun:
* added the description for the feature
Part of tarantool/tarantool#8825
---
src/lj_arch.h | 1 -
src/lj_asm_ppc.h | 321 ++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 278 insertions(+), 44 deletions(-)
diff --git a/src/lj_arch.h b/src/lj_arch.h
index 8bb8757d..7397492e 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -281,7 +281,6 @@
#endif
#if LJ_ABI_SOFTFP
-#define LJ_ARCH_NOJIT 1 /* NYI */
#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL
#else
#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
index aa2d45c0..6cb608f7 100644
--- a/src/lj_asm_ppc.h
+++ b/src/lj_asm_ppc.h
@@ -226,6 +226,7 @@ static void asm_fusexrefx(ASMState *as, PPCIns pi, Reg rt, IRRef ref,
emit_tab(as, pi, rt, left, right);
}
+#if !LJ_SOFTFP
/* Fuse to multiply-add/sub instruction. */
static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
{
@@ -245,6 +246,7 @@ static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
}
return 0;
}
+#endif
/* -- Calls --------------------------------------------------------------- */
@@ -253,13 +255,17 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
{
uint32_t n, nargs = CCI_XNARGS(ci);
int32_t ofs = 8;
- Reg gpr = REGARG_FIRSTGPR, fpr = REGARG_FIRSTFPR;
+ Reg gpr = REGARG_FIRSTGPR;
+#if !LJ_SOFTFP
+ Reg fpr = REGARG_FIRSTFPR;
+#endif
if ((void *)ci->func)
emit_call(as, (void *)ci->func);
for (n = 0; n < nargs; n++) { /* Setup args. */
IRRef ref = args[n];
if (ref) {
IRIns *ir = IR(ref);
+#if !LJ_SOFTFP
if (irt_isfp(ir->t)) {
if (fpr <= REGARG_LASTFPR) {
lua_assert(rset_test(as->freeset, fpr)); /* Already evicted. */
@@ -271,7 +277,9 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
emit_spstore(as, ir, r, ofs);
ofs += irt_isnum(ir->t) ? 8 : 4;
}
- } else {
+ } else
+#endif
+ {
if (gpr <= REGARG_LASTGPR) {
lua_assert(rset_test(as->freeset, gpr)); /* Already evicted. */
ra_leftov(as, gpr, ref);
@@ -290,8 +298,10 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
}
checkmclim(as);
}
+#if !LJ_SOFTFP
if ((ci->flags & CCI_VARARG)) /* Vararg calls need to know about FPR use. */
emit_tab(as, fpr == REGARG_FIRSTFPR ? PPCI_CRXOR : PPCI_CREQV, 6, 6, 6);
+#endif
}
/* Setup result reg/sp for call. Evict scratch regs. */
@@ -299,8 +309,10 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
{
RegSet drop = RSET_SCRATCH;
int hiop = ((ir+1)->o == IR_HIOP && !irt_isnil((ir+1)->t));
+#if !LJ_SOFTFP
if ((ci->flags & CCI_NOFPRCLOBBER))
drop &= ~RSET_FPR;
+#endif
if (ra_hasreg(ir->r))
rset_clear(drop, ir->r); /* Dest reg handled below. */
if (hiop && ra_hasreg((ir+1)->r))
@@ -308,7 +320,7 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
ra_evictset(as, drop); /* Evictions must be performed first. */
if (ra_used(ir)) {
lua_assert(!irt_ispri(ir->t));
- if (irt_isfp(ir->t)) {
+ if (!LJ_SOFTFP && irt_isfp(ir->t)) {
if ((ci->flags & CCI_CASTU64)) {
/* Use spill slot or temp slots. */
int32_t ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
@@ -377,6 +389,7 @@ static void asm_retf(ASMState *as, IRIns *ir)
/* -- Type conversions ---------------------------------------------------- */
+#if !LJ_SOFTFP
static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
{
RegSet allow = RSET_FPR;
@@ -409,15 +422,23 @@ static void asm_tobit(ASMState *as, IRIns *ir)
emit_fai(as, PPCI_STFD, tmp, RID_SP, SPOFS_TMP);
emit_fab(as, PPCI_FADD, tmp, left, right);
}
+#endif
static void asm_conv(ASMState *as, IRIns *ir)
{
IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
+#if !LJ_SOFTFP
int stfp = (st == IRT_NUM || st == IRT_FLOAT);
+#endif
IRRef lref = ir->op1;
- lua_assert(irt_type(ir->t) != st);
lua_assert(!(irt_isint64(ir->t) ||
(st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
+#if LJ_SOFTFP
+ /* FP conversions are handled by SPLIT. */
+ lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
+ /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
+#else
+ lua_assert(irt_type(ir->t) != st);
if (irt_isfp(ir->t)) {
Reg dest = ra_dest(as, ir, RSET_FPR);
if (stfp) { /* FP to FP conversion. */
@@ -476,7 +497,9 @@ static void asm_conv(ASMState *as, IRIns *ir)
emit_fb(as, PPCI_FCTIWZ, tmp, left);
}
}
- } else {
+ } else
+#endif
+ {
Reg dest = ra_dest(as, ir, RSET_GPR);
if (st >= IRT_I8 && st <= IRT_U16) { /* Extend to 32 bit integer. */
Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
@@ -496,17 +519,41 @@ static void asm_strto(ASMState *as, IRIns *ir)
{
const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
IRRef args[2];
- int32_t ofs;
+ int32_t ofs = SPOFS_TMP;
+#if LJ_SOFTFP
+ ra_evictset(as, RSET_SCRATCH);
+ if (ra_used(ir)) {
+ if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
+ (ir->s & 1) == LJ_BE && (ir->s ^ 1) == (ir+1)->s) {
+ int i;
+ for (i = 0; i < 2; i++) {
+ Reg r = (ir+i)->r;
+ if (ra_hasreg(r)) {
+ ra_free(as, r);
+ ra_modified(as, r);
+ emit_spload(as, ir+i, r, sps_scale((ir+i)->s));
+ }
+ }
+ ofs = sps_scale(ir->s & ~1);
+ } else {
+ Reg rhi = ra_dest(as, ir+1, RSET_GPR);
+ Reg rlo = ra_dest(as, ir, rset_exclude(RSET_GPR, rhi));
+ emit_tai(as, PPCI_LWZ, rhi, RID_SP, ofs);
+ emit_tai(as, PPCI_LWZ, rlo, RID_SP, ofs+4);
+ }
+ }
+#else
RegSet drop = RSET_SCRATCH;
if (ra_hasreg(ir->r)) rset_set(drop, ir->r); /* Spill dest reg (if any). */
ra_evictset(as, drop);
+ if (ir->s) ofs = sps_scale(ir->s);
+#endif
asm_guardcc(as, CC_EQ);
emit_ai(as, PPCI_CMPWI, RID_RET, 0); /* Test return status. */
args[0] = ir->op1; /* GCstr *str */
args[1] = ASMREF_TMP1; /* TValue *n */
asm_gencall(as, ci, args);
/* Store the result to the spill slot or temp slots. */
- ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
emit_tai(as, PPCI_ADDI, ra_releasetmp(as, ASMREF_TMP1), RID_SP, ofs);
}
@@ -530,7 +577,10 @@ static void asm_tvptr(ASMState *as, Reg dest, IRRef ref)
Reg src = ra_alloc1(as, ref, allow);
emit_setgl(as, src, tmptv.gcr);
}
- type = ra_allock(as, irt_toitype(ir->t), allow);
+ if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
+ type = ra_alloc1(as, ref+1, allow);
+ else
+ type = ra_allock(as, irt_toitype(ir->t), allow);
emit_setgl(as, type, tmptv.it);
}
}
@@ -574,11 +624,27 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
Reg tisnum = RID_NONE, tmpnum = RID_NONE;
IRRef refkey = ir->op2;
IRIns *irkey = IR(refkey);
+ int isk = irref_isk(refkey);
IRType1 kt = irkey->t;
uint32_t khash;
MCLabel l_end, l_loop, l_next;
rset_clear(allow, tab);
+#if LJ_SOFTFP
+ if (!isk) {
+ key = ra_alloc1(as, refkey, allow);
+ rset_clear(allow, key);
+ if (irkey[1].o == IR_HIOP) {
+ if (ra_hasreg((irkey+1)->r)) {
+ tmpnum = (irkey+1)->r;
+ ra_noweak(as, tmpnum);
+ } else {
+ tmpnum = ra_allocref(as, refkey+1, allow);
+ }
+ rset_clear(allow, tmpnum);
+ }
+ }
+#else
if (irt_isnum(kt)) {
key = ra_alloc1(as, refkey, RSET_FPR);
tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
@@ -588,6 +654,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
key = ra_alloc1(as, refkey, allow);
rset_clear(allow, key);
}
+#endif
tmp2 = ra_scratch(as, allow);
rset_clear(allow, tmp2);
@@ -610,7 +677,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
asm_guardcc(as, CC_EQ);
else
emit_condbranch(as, PPCI_BC|PPCF_Y, CC_EQ, l_end);
- if (irt_isnum(kt)) {
+ if (!LJ_SOFTFP && irt_isnum(kt)) {
emit_fab(as, PPCI_FCMPU, 0, tmpnum, key);
emit_condbranch(as, PPCI_BC, CC_GE, l_next);
emit_ab(as, PPCI_CMPLW, tmp1, tisnum);
@@ -620,7 +687,10 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
emit_ab(as, PPCI_CMPW, tmp2, key);
emit_condbranch(as, PPCI_BC, CC_NE, l_next);
}
- emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
+ if (LJ_SOFTFP && ra_hasreg(tmpnum))
+ emit_ab(as, PPCI_CMPW, tmp1, tmpnum);
+ else
+ emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
if (!irt_ispri(kt))
emit_tai(as, PPCI_LWZ, tmp2, dest, (int32_t)offsetof(Node, key.gcr));
}
@@ -629,19 +699,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
(((char *)as->mcp-(char *)l_loop) & 0xffffu);
/* Load main position relative to tab->node into dest. */
- khash = irref_isk(refkey) ? ir_khash(irkey) : 1;
+ khash = isk ? ir_khash(irkey) : 1;
if (khash == 0) {
emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
} else {
Reg tmphash = tmp1;
- if (irref_isk(refkey))
+ if (isk)
tmphash = ra_allock(as, khash, allow);
emit_tab(as, PPCI_ADD, dest, dest, tmp1);
emit_tai(as, PPCI_MULLI, tmp1, tmp1, sizeof(Node));
emit_asb(as, PPCI_AND, tmp1, tmp2, tmphash);
emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
emit_tai(as, PPCI_LWZ, tmp2, tab, (int32_t)offsetof(GCtab, hmask));
- if (irref_isk(refkey)) {
+ if (isk) {
/* Nothing to do. */
} else if (irt_isstr(kt)) {
emit_tai(as, PPCI_LWZ, tmp1, key, (int32_t)offsetof(GCstr, hash));
@@ -651,13 +721,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
emit_asb(as, PPCI_XOR, tmp1, tmp1, tmp2);
emit_rotlwi(as, tmp1, tmp1, (HASH_ROT2+HASH_ROT1)&31);
emit_tab(as, PPCI_SUBF, tmp2, dest, tmp2);
- if (irt_isnum(kt)) {
+ if (LJ_SOFTFP ? (irkey[1].o == IR_HIOP) : irt_isnum(kt)) {
+#if LJ_SOFTFP
+ emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
+ emit_rotlwi(as, dest, tmp1, HASH_ROT1);
+ emit_tab(as, PPCI_ADD, tmp1, tmpnum, tmpnum);
+#else
int32_t ofs = ra_spill(as, irkey);
emit_asb(as, PPCI_XOR, tmp2, tmp2, tmp1);
emit_rotlwi(as, dest, tmp1, HASH_ROT1);
emit_tab(as, PPCI_ADD, tmp1, tmp1, tmp1);
emit_tai(as, PPCI_LWZ, tmp2, RID_SP, ofs+4);
emit_tai(as, PPCI_LWZ, tmp1, RID_SP, ofs);
+#endif
} else {
emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
emit_rotlwi(as, dest, tmp1, HASH_ROT1);
@@ -784,8 +860,8 @@ static PPCIns asm_fxloadins(IRIns *ir)
case IRT_U8: return PPCI_LBZ;
case IRT_I16: return PPCI_LHA;
case IRT_U16: return PPCI_LHZ;
- case IRT_NUM: return PPCI_LFD;
- case IRT_FLOAT: return PPCI_LFS;
+ case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_LFD;
+ case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_LFS;
default: return PPCI_LWZ;
}
}
@@ -795,8 +871,8 @@ static PPCIns asm_fxstoreins(IRIns *ir)
switch (irt_type(ir->t)) {
case IRT_I8: case IRT_U8: return PPCI_STB;
case IRT_I16: case IRT_U16: return PPCI_STH;
- case IRT_NUM: return PPCI_STFD;
- case IRT_FLOAT: return PPCI_STFS;
+ case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_STFD;
+ case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_STFS;
default: return PPCI_STW;
}
}
@@ -839,7 +915,8 @@ static void asm_fstore(ASMState *as, IRIns *ir)
static void asm_xload(ASMState *as, IRIns *ir)
{
- Reg dest = ra_dest(as, ir, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
+ Reg dest = ra_dest(as, ir,
+ (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
if (irt_isi8(ir->t))
emit_as(as, PPCI_EXTSB, dest, dest);
@@ -857,7 +934,8 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
Reg src = ra_alloc1(as, irb->op1, RSET_GPR);
asm_fusexrefx(as, PPCI_STWBRX, src, ir->op1, rset_exclude(RSET_GPR, src));
} else {
- Reg src = ra_alloc1(as, ir->op2, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
+ Reg src = ra_alloc1(as, ir->op2,
+ (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
asm_fusexref(as, asm_fxstoreins(ir), src, ir->op1,
rset_exclude(RSET_GPR, src), ofs);
}
@@ -871,10 +949,19 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
Reg dest = RID_NONE, type = RID_TMP, tmp = RID_TMP, idx;
RegSet allow = RSET_GPR;
int32_t ofs = AHUREF_LSX;
+ if (LJ_SOFTFP && (ir+1)->o == IR_HIOP) {
+ t.irt = IRT_NUM;
+ if (ra_used(ir+1)) {
+ type = ra_dest(as, ir+1, allow);
+ rset_clear(allow, type);
+ }
+ ofs = 0;
+ }
if (ra_used(ir)) {
- lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
- if (!irt_isnum(t)) ofs = 0;
- dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
+ lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
+ irt_isint(ir->t) || irt_isaddr(ir->t));
+ if (LJ_SOFTFP || !irt_isnum(t)) ofs = 0;
+ dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
rset_clear(allow, dest);
}
idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
@@ -883,12 +970,13 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
asm_guardcc(as, CC_GE);
emit_ab(as, PPCI_CMPLW, type, tisnum);
if (ra_hasreg(dest)) {
- if (ofs == AHUREF_LSX) {
+ if (!LJ_SOFTFP && ofs == AHUREF_LSX) {
tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_GPR,
(idx&255)), (idx>>8)));
emit_fab(as, PPCI_LFDX, dest, (idx&255), tmp);
} else {
- emit_fai(as, PPCI_LFD, dest, idx, ofs);
+ emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest, idx,
+ ofs+4*LJ_SOFTFP);
}
}
} else {
@@ -911,7 +999,7 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
int32_t ofs = AHUREF_LSX;
if (ir->r == RID_SINK)
return;
- if (irt_isnum(ir->t)) {
+ if (!LJ_SOFTFP && irt_isnum(ir->t)) {
src = ra_alloc1(as, ir->op2, RSET_FPR);
} else {
if (!irt_ispri(ir->t)) {
@@ -919,11 +1007,14 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
rset_clear(allow, src);
ofs = 0;
}
- type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
+ if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
+ type = ra_alloc1(as, (ir+1)->op2, allow);
+ else
+ type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
rset_clear(allow, type);
}
idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
- if (irt_isnum(ir->t)) {
+ if (!LJ_SOFTFP && irt_isnum(ir->t)) {
if (ofs == AHUREF_LSX) {
emit_fab(as, PPCI_STFDX, src, (idx&255), RID_TMP);
emit_slwi(as, RID_TMP, (idx>>8), 3);
@@ -948,21 +1039,33 @@ static void asm_sload(ASMState *as, IRIns *ir)
IRType1 t = ir->t;
Reg dest = RID_NONE, type = RID_NONE, base;
RegSet allow = RSET_GPR;
+ int hiop = (LJ_SOFTFP && (ir+1)->o == IR_HIOP);
+ if (hiop)
+ t.irt = IRT_NUM;
lua_assert(!(ir->op2 & IRSLOAD_PARENT)); /* Handled by asm_head_side(). */
- lua_assert(irt_isguard(t) || !(ir->op2 & IRSLOAD_TYPECHECK));
+ lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
lua_assert(LJ_DUALNUM ||
!irt_isint(t) || (ir->op2 & (IRSLOAD_CONVERT|IRSLOAD_FRAME)));
+#if LJ_SOFTFP
+ lua_assert(!(ir->op2 & IRSLOAD_CONVERT)); /* Handled by LJ_SOFTFP SPLIT. */
+ if (hiop && ra_used(ir+1)) {
+ type = ra_dest(as, ir+1, allow);
+ rset_clear(allow, type);
+ }
+#else
if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
dest = ra_scratch(as, RSET_FPR);
asm_tointg(as, ir, dest);
t.irt = IRT_NUM; /* Continue with a regular number type check. */
- } else if (ra_used(ir)) {
+ } else
+#endif
+ if (ra_used(ir)) {
lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
- dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
+ dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
rset_clear(allow, dest);
base = ra_alloc1(as, REF_BASE, allow);
rset_clear(allow, base);
- if ((ir->op2 & IRSLOAD_CONVERT)) {
+ if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
if (irt_isint(t)) {
emit_tai(as, PPCI_LWZ, dest, RID_SP, SPOFS_TMPLO);
dest = ra_scratch(as, RSET_FPR);
@@ -994,10 +1097,13 @@ dotypecheck:
if ((ir->op2 & IRSLOAD_TYPECHECK)) {
Reg tisnum = ra_allock(as, (int32_t)LJ_TISNUM, allow);
asm_guardcc(as, CC_GE);
- emit_ab(as, PPCI_CMPLW, RID_TMP, tisnum);
+#if !LJ_SOFTFP
type = RID_TMP;
+#endif
+ emit_ab(as, PPCI_CMPLW, type, tisnum);
}
- if (ra_hasreg(dest)) emit_fai(as, PPCI_LFD, dest, base, ofs-4);
+ if (ra_hasreg(dest)) emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest,
+ base, ofs-(LJ_SOFTFP?0:4));
} else {
if ((ir->op2 & IRSLOAD_TYPECHECK)) {
asm_guardcc(as, CC_NE);
@@ -1122,6 +1228,7 @@ static void asm_obar(ASMState *as, IRIns *ir)
/* -- Arithmetic and logic operations ------------------------------------- */
+#if !LJ_SOFTFP
static void asm_fparith(ASMState *as, IRIns *ir, PPCIns pi)
{
Reg dest = ra_dest(as, ir, RSET_FPR);
@@ -1149,13 +1256,17 @@ static void asm_fpmath(ASMState *as, IRIns *ir)
else
asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
}
+#endif
static void asm_add(ASMState *as, IRIns *ir)
{
+#if !LJ_SOFTFP
if (irt_isnum(ir->t)) {
if (!asm_fusemadd(as, ir, PPCI_FMADD, PPCI_FMADD))
asm_fparith(as, ir, PPCI_FADD);
- } else {
+ } else
+#endif
+ {
Reg dest = ra_dest(as, ir, RSET_GPR);
Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
PPCIns pi;
@@ -1194,10 +1305,13 @@ static void asm_add(ASMState *as, IRIns *ir)
static void asm_sub(ASMState *as, IRIns *ir)
{
+#if !LJ_SOFTFP
if (irt_isnum(ir->t)) {
if (!asm_fusemadd(as, ir, PPCI_FMSUB, PPCI_FNMSUB))
asm_fparith(as, ir, PPCI_FSUB);
- } else {
+ } else
+#endif
+ {
PPCIns pi = PPCI_SUBF;
Reg dest = ra_dest(as, ir, RSET_GPR);
Reg left, right;
@@ -1223,9 +1337,12 @@ static void asm_sub(ASMState *as, IRIns *ir)
static void asm_mul(ASMState *as, IRIns *ir)
{
+#if !LJ_SOFTFP
if (irt_isnum(ir->t)) {
asm_fparith(as, ir, PPCI_FMUL);
- } else {
+ } else
+#endif
+ {
PPCIns pi = PPCI_MULLW;
Reg dest = ra_dest(as, ir, RSET_GPR);
Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
@@ -1253,9 +1370,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
static void asm_neg(ASMState *as, IRIns *ir)
{
+#if !LJ_SOFTFP
if (irt_isnum(ir->t)) {
asm_fpunary(as, ir, PPCI_FNEG);
- } else {
+ } else
+#endif
+ {
Reg dest, left;
PPCIns pi = PPCI_NEG;
if (as->flagmcp == as->mcp) {
@@ -1566,9 +1686,40 @@ static void asm_bitshift(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pik)
PPCI_RLWINM|PPCF_MB(0)|PPCF_ME(31))
#define asm_bror(as, ir) lua_assert(0)
+#if LJ_SOFTFP
+static void asm_sfpmin_max(ASMState *as, IRIns *ir)
+{
+ CCallInfo ci = lj_ir_callinfo[IRCALL_softfp_cmp];
+ IRRef args[4];
+ MCLabel l_right, l_end;
+ Reg desthi = ra_dest(as, ir, RSET_GPR), destlo = ra_dest(as, ir+1, RSET_GPR);
+ Reg righthi, lefthi = ra_alloc2(as, ir, RSET_GPR);
+ Reg rightlo, leftlo = ra_alloc2(as, ir+1, RSET_GPR);
+ PPCCC cond = (IROp)ir->o == IR_MIN ? CC_EQ : CC_NE;
+ righthi = (lefthi >> 8); lefthi &= 255;
+ rightlo = (leftlo >> 8); leftlo &= 255;
+ args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
+ args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
+ l_end = emit_label(as);
+ if (desthi != righthi) emit_mr(as, desthi, righthi);
+ if (destlo != rightlo) emit_mr(as, destlo, rightlo);
+ l_right = emit_label(as);
+ if (l_end != l_right) emit_jmp(as, l_end);
+ if (desthi != lefthi) emit_mr(as, desthi, lefthi);
+ if (destlo != leftlo) emit_mr(as, destlo, leftlo);
+ if (l_right == as->mcp+1) {
+ cond ^= 4; l_right = l_end; ++as->mcp;
+ }
+ emit_condbranch(as, PPCI_BC, cond, l_right);
+ ra_evictset(as, RSET_SCRATCH);
+ emit_cmpi(as, RID_RET, 1);
+ asm_gencall(as, &ci, args);
+}
+#endif
+
static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
{
- if (irt_isnum(ir->t)) {
+ if (!LJ_SOFTFP && irt_isnum(ir->t)) {
Reg dest = ra_dest(as, ir, RSET_FPR);
Reg tmp = dest;
Reg right, left = ra_alloc2(as, ir, RSET_FPR);
@@ -1656,7 +1807,7 @@ static void asm_intcomp_(ASMState *as, IRRef lref, IRRef rref, Reg cr, PPCCC cc)
static void asm_comp(ASMState *as, IRIns *ir)
{
PPCCC cc = asm_compmap[ir->o];
- if (irt_isnum(ir->t)) {
+ if (!LJ_SOFTFP && irt_isnum(ir->t)) {
Reg right, left = ra_alloc2(as, ir, RSET_FPR);
right = (left >> 8); left &= 255;
asm_guardcc(as, (cc >> 4));
@@ -1677,6 +1828,44 @@ static void asm_comp(ASMState *as, IRIns *ir)
#define asm_equal(as, ir) asm_comp(as, ir)
+#if LJ_SOFTFP
+/* SFP comparisons. */
+static void asm_sfpcomp(ASMState *as, IRIns *ir)
+{
+ const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
+ RegSet drop = RSET_SCRATCH;
+ Reg r;
+ IRRef args[4];
+ args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
+ args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
+
+ for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
+ if (!rset_test(as->freeset, r) &&
+ regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
+ rset_clear(drop, r);
+ }
+ ra_evictset(as, drop);
+ asm_setupresult(as, ir, ci);
+ switch ((IROp)ir->o) {
+ case IR_ULT:
+ asm_guardcc(as, CC_EQ);
+ emit_ai(as, PPCI_CMPWI, RID_RET, 0);
+ case IR_ULE:
+ asm_guardcc(as, CC_EQ);
+ emit_ai(as, PPCI_CMPWI, RID_RET, 1);
+ break;
+ case IR_GE: case IR_GT:
+ asm_guardcc(as, CC_EQ);
+ emit_ai(as, PPCI_CMPWI, RID_RET, 2);
+ default:
+ asm_guardcc(as, (asm_compmap[ir->o] & 0xf));
+ emit_ai(as, PPCI_CMPWI, RID_RET, 0);
+ break;
+ }
+ asm_gencall(as, ci, args);
+}
+#endif
+
#if LJ_HASFFI
/* 64 bit integer comparisons. */
static void asm_comp64(ASMState *as, IRIns *ir)
@@ -1706,19 +1895,36 @@ static void asm_comp64(ASMState *as, IRIns *ir)
/* Hiword op of a split 64 bit op. Previous op must be the loword op. */
static void asm_hiop(ASMState *as, IRIns *ir)
{
-#if LJ_HASFFI
+#if LJ_HASFFI || LJ_SOFTFP
/* HIOP is marked as a store because it needs its own DCE logic. */
int uselo = ra_used(ir-1), usehi = ra_used(ir); /* Loword/hiword used? */
if (LJ_UNLIKELY(!(as->flags & JIT_F_OPT_DCE))) uselo = usehi = 1;
if ((ir-1)->o == IR_CONV) { /* Conversions to/from 64 bit. */
as->curins--; /* Always skip the CONV. */
+#if LJ_HASFFI && !LJ_SOFTFP
if (usehi || uselo)
asm_conv64(as, ir);
return;
+#endif
} else if ((ir-1)->o <= IR_NE) { /* 64 bit integer comparisons. ORDER IR. */
as->curins--; /* Always skip the loword comparison. */
+#if LJ_SOFTFP
+ if (!irt_isint(ir->t)) {
+ asm_sfpcomp(as, ir-1);
+ return;
+ }
+#endif
+#if LJ_HASFFI
asm_comp64(as, ir);
+#endif
+ return;
+#if LJ_SOFTFP
+ } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
+ as->curins--; /* Always skip the loword min/max. */
+ if (uselo || usehi)
+ asm_sfpmin_max(as, ir-1);
return;
+#endif
} else if ((ir-1)->o == IR_XSTORE) {
as->curins--; /* Handle both stores here. */
if ((ir-1)->r != RID_SINK) {
@@ -1729,14 +1935,27 @@ static void asm_hiop(ASMState *as, IRIns *ir)
}
if (!usehi) return; /* Skip unused hiword op for all remaining ops. */
switch ((ir-1)->o) {
+#if LJ_HASFFI
case IR_ADD: as->curins--; asm_add64(as, ir); break;
case IR_SUB: as->curins--; asm_sub64(as, ir); break;
case IR_NEG: as->curins--; asm_neg64(as, ir); break;
+#endif
+#if LJ_SOFTFP
+ case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
+ case IR_STRTO:
+ if (!uselo)
+ ra_allocref(as, ir->op1, RSET_GPR); /* Mark lo op as used. */
+ break;
+#endif
case IR_CALLN:
+ case IR_CALLS:
case IR_CALLXS:
if (!uselo)
ra_allocref(as, ir->op1, RID2RSET(RID_RETLO)); /* Mark lo op as used. */
break;
+#if LJ_SOFTFP
+ case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
+#endif
case IR_CNEWI:
/* Nothing to do here. Handled by lo op itself. */
break;
@@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
if ((sn & SNAP_NORESTORE))
continue;
if (irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+ Reg tmp;
+ RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
+ lua_assert(irref_isk(ref)); /* LJ_SOFTFP: must be a number constant. */
+ tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.lo, allow);
+ emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?4:0));
+ if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
+ tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
+ emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
+#else
Reg src = ra_alloc1(as, ref, RSET_FPR);
emit_fai(as, PPCI_STFD, src, RID_BASE, ofs);
+#endif
} else {
Reg type;
RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
@@ -1814,6 +2044,10 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
if ((sn & (SNAP_CONT|SNAP_FRAME))) {
if (s == 0) continue; /* Do not overwrite link to previous frame. */
type = ra_allock(as, (int32_t)(*flinks--), allow);
+#if LJ_SOFTFP
+ } else if ((sn & SNAP_SOFTFPNUM)) {
+ type = ra_alloc1(as, ref+1, rset_exclude(RSET_GPR, RID_BASE));
+#endif
} else {
type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
}
@@ -1950,14 +2184,15 @@ static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci)
int nslots = 2, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR;
asm_collectargs(as, ir, ci, args);
for (i = 0; i < nargs; i++)
- if (args[i] && irt_isfp(IR(args[i])->t)) {
+ if (!LJ_SOFTFP && args[i] && irt_isfp(IR(args[i])->t)) {
if (nfpr > 0) nfpr--; else nslots = (nslots+3) & ~1;
} else {
if (ngpr > 0) ngpr--; else nslots++;
}
if (nslots > as->evenspill) /* Leave room for args in stack slots. */
as->evenspill = nslots;
- return irt_isfp(ir->t) ? REGSP_HINT(RID_FPRET) : REGSP_HINT(RID_RET);
+ return (!LJ_SOFTFP && irt_isfp(ir->t)) ? REGSP_HINT(RID_FPRET) :
+ REGSP_HINT(RID_RET);
}
static void asm_setup_target(ASMState *as)
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:46 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:21 ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:33 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:46 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few typos and a single question below.
On Wed, Aug 09, 2023 at 06:35:55PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the JIT compiler for powerpc.
Typo: s/powerpc/PowerPC/
> This includes:
> * All fp-depending paths are instrumented with `LJ_SOFTFP` macro.
Typo: s/fp-depending/fp-dependent/
> * `asm_sfpmin_max()` is introduced for min/max operations on soft-float
> numbers.
> * `asm_sfpcomp()` is introduced for soft-float comparisons.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_arch.h | 1 -
> src/lj_asm_ppc.h | 321 ++++++++++++++++++++++++++++++++++++++++-------
> 2 files changed, 278 insertions(+), 44 deletions(-)
>
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 8bb8757d..7397492e 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -281,7 +281,6 @@
> #endif
>
> #if LJ_ABI_SOFTFP
> -#define LJ_ARCH_NOJIT 1 /* NYI */
> #define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL
> #else
> #define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index aa2d45c0..6cb608f7 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -226,6 +226,7 @@ static void asm_fusexrefx(ASMState *as, PPCIns pi, Reg rt, IRRef ref,
> emit_tab(as, pi, rt, left, right);
> }
>
> +#if !LJ_SOFTFP
> /* Fuse to multiply-add/sub instruction. */
> static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
> {
> @@ -245,6 +246,7 @@ static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
> }
> return 0;
> }
> +#endif
>
> /* -- Calls --------------------------------------------------------------- */
>
> @@ -253,13 +255,17 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> {
> uint32_t n, nargs = CCI_XNARGS(ci);
> int32_t ofs = 8;
> - Reg gpr = REGARG_FIRSTGPR, fpr = REGARG_FIRSTFPR;
> + Reg gpr = REGARG_FIRSTGPR;
> +#if !LJ_SOFTFP
> + Reg fpr = REGARG_FIRSTFPR;
> +#endif
> if ((void *)ci->func)
> emit_call(as, (void *)ci->func);
> for (n = 0; n < nargs; n++) { /* Setup args. */
> IRRef ref = args[n];
> if (ref) {
> IRIns *ir = IR(ref);
> +#if !LJ_SOFTFP
> if (irt_isfp(ir->t)) {
> if (fpr <= REGARG_LASTFPR) {
> lua_assert(rset_test(as->freeset, fpr)); /* Already evicted. */
> @@ -271,7 +277,9 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> emit_spstore(as, ir, r, ofs);
> ofs += irt_isnum(ir->t) ? 8 : 4;
> }
> - } else {
> + } else
> +#endif
> + {
> if (gpr <= REGARG_LASTGPR) {
> lua_assert(rset_test(as->freeset, gpr)); /* Already evicted. */
> ra_leftov(as, gpr, ref);
> @@ -290,8 +298,10 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> }
> checkmclim(as);
> }
> +#if !LJ_SOFTFP
> if ((ci->flags & CCI_VARARG)) /* Vararg calls need to know about FPR use. */
> emit_tab(as, fpr == REGARG_FIRSTFPR ? PPCI_CRXOR : PPCI_CREQV, 6, 6, 6);
> +#endif
> }
>
> /* Setup result reg/sp for call. Evict scratch regs. */
> @@ -299,8 +309,10 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
> {
> RegSet drop = RSET_SCRATCH;
> int hiop = ((ir+1)->o == IR_HIOP && !irt_isnil((ir+1)->t));
> +#if !LJ_SOFTFP
> if ((ci->flags & CCI_NOFPRCLOBBER))
> drop &= ~RSET_FPR;
> +#endif
> if (ra_hasreg(ir->r))
> rset_clear(drop, ir->r); /* Dest reg handled below. */
> if (hiop && ra_hasreg((ir+1)->r))
> @@ -308,7 +320,7 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
> ra_evictset(as, drop); /* Evictions must be performed first. */
> if (ra_used(ir)) {
> lua_assert(!irt_ispri(ir->t));
> - if (irt_isfp(ir->t)) {
> + if (!LJ_SOFTFP && irt_isfp(ir->t)) {
> if ((ci->flags & CCI_CASTU64)) {
> /* Use spill slot or temp slots. */
> int32_t ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
> @@ -377,6 +389,7 @@ static void asm_retf(ASMState *as, IRIns *ir)
>
> /* -- Type conversions ---------------------------------------------------- */
>
> +#if !LJ_SOFTFP
> static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
> {
> RegSet allow = RSET_FPR;
> @@ -409,15 +422,23 @@ static void asm_tobit(ASMState *as, IRIns *ir)
> emit_fai(as, PPCI_STFD, tmp, RID_SP, SPOFS_TMP);
> emit_fab(as, PPCI_FADD, tmp, left, right);
> }
> +#endif
>
> static void asm_conv(ASMState *as, IRIns *ir)
> {
> IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> +#if !LJ_SOFTFP
> int stfp = (st == IRT_NUM || st == IRT_FLOAT);
> +#endif
> IRRef lref = ir->op1;
> - lua_assert(irt_type(ir->t) != st);
> lua_assert(!(irt_isint64(ir->t) ||
> (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
> +#if LJ_SOFTFP
> + /* FP conversions are handled by SPLIT. */
> + lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
> + /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
> +#else
> + lua_assert(irt_type(ir->t) != st);
> if (irt_isfp(ir->t)) {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> if (stfp) { /* FP to FP conversion. */
> @@ -476,7 +497,9 @@ static void asm_conv(ASMState *as, IRIns *ir)
> emit_fb(as, PPCI_FCTIWZ, tmp, left);
> }
> }
> - } else {
> + } else
> +#endif
> + {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> if (st >= IRT_I8 && st <= IRT_U16) { /* Extend to 32 bit integer. */
> Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> @@ -496,17 +519,41 @@ static void asm_strto(ASMState *as, IRIns *ir)
> {
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
> IRRef args[2];
> - int32_t ofs;
> + int32_t ofs = SPOFS_TMP;
> +#if LJ_SOFTFP
> + ra_evictset(as, RSET_SCRATCH);
> + if (ra_used(ir)) {
> + if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> + (ir->s & 1) == LJ_BE && (ir->s ^ 1) == (ir+1)->s) {
> + int i;
> + for (i = 0; i < 2; i++) {
> + Reg r = (ir+i)->r;
> + if (ra_hasreg(r)) {
> + ra_free(as, r);
> + ra_modified(as, r);
> + emit_spload(as, ir+i, r, sps_scale((ir+i)->s));
> + }
> + }
> + ofs = sps_scale(ir->s & ~1);
> + } else {
> + Reg rhi = ra_dest(as, ir+1, RSET_GPR);
> + Reg rlo = ra_dest(as, ir, rset_exclude(RSET_GPR, rhi));
> + emit_tai(as, PPCI_LWZ, rhi, RID_SP, ofs);
> + emit_tai(as, PPCI_LWZ, rlo, RID_SP, ofs+4);
> + }
> + }
> +#else
> RegSet drop = RSET_SCRATCH;
> if (ra_hasreg(ir->r)) rset_set(drop, ir->r); /* Spill dest reg (if any). */
> ra_evictset(as, drop);
> + if (ir->s) ofs = sps_scale(ir->s);
> +#endif
> asm_guardcc(as, CC_EQ);
> emit_ai(as, PPCI_CMPWI, RID_RET, 0); /* Test return status. */
> args[0] = ir->op1; /* GCstr *str */
> args[1] = ASMREF_TMP1; /* TValue *n */
> asm_gencall(as, ci, args);
> /* Store the result to the spill slot or temp slots. */
> - ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
> emit_tai(as, PPCI_ADDI, ra_releasetmp(as, ASMREF_TMP1), RID_SP, ofs);
> }
>
> @@ -530,7 +577,10 @@ static void asm_tvptr(ASMState *as, Reg dest, IRRef ref)
> Reg src = ra_alloc1(as, ref, allow);
> emit_setgl(as, src, tmptv.gcr);
> }
> - type = ra_allock(as, irt_toitype(ir->t), allow);
> + if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> + type = ra_alloc1(as, ref+1, allow);
> + else
> + type = ra_allock(as, irt_toitype(ir->t), allow);
> emit_setgl(as, type, tmptv.it);
> }
> }
> @@ -574,11 +624,27 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> Reg tisnum = RID_NONE, tmpnum = RID_NONE;
> IRRef refkey = ir->op2;
> IRIns *irkey = IR(refkey);
> + int isk = irref_isk(refkey);
> IRType1 kt = irkey->t;
> uint32_t khash;
> MCLabel l_end, l_loop, l_next;
>
> rset_clear(allow, tab);
> +#if LJ_SOFTFP
> + if (!isk) {
> + key = ra_alloc1(as, refkey, allow);
> + rset_clear(allow, key);
> + if (irkey[1].o == IR_HIOP) {
> + if (ra_hasreg((irkey+1)->r)) {
> + tmpnum = (irkey+1)->r;
> + ra_noweak(as, tmpnum);
> + } else {
> + tmpnum = ra_allocref(as, refkey+1, allow);
> + }
> + rset_clear(allow, tmpnum);
> + }
> + }
> +#else
> if (irt_isnum(kt)) {
> key = ra_alloc1(as, refkey, RSET_FPR);
> tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
> @@ -588,6 +654,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> key = ra_alloc1(as, refkey, allow);
> rset_clear(allow, key);
> }
> +#endif
> tmp2 = ra_scratch(as, allow);
> rset_clear(allow, tmp2);
>
> @@ -610,7 +677,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> asm_guardcc(as, CC_EQ);
> else
> emit_condbranch(as, PPCI_BC|PPCF_Y, CC_EQ, l_end);
> - if (irt_isnum(kt)) {
> + if (!LJ_SOFTFP && irt_isnum(kt)) {
> emit_fab(as, PPCI_FCMPU, 0, tmpnum, key);
> emit_condbranch(as, PPCI_BC, CC_GE, l_next);
> emit_ab(as, PPCI_CMPLW, tmp1, tisnum);
> @@ -620,7 +687,10 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_ab(as, PPCI_CMPW, tmp2, key);
> emit_condbranch(as, PPCI_BC, CC_NE, l_next);
> }
> - emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
> + if (LJ_SOFTFP && ra_hasreg(tmpnum))
> + emit_ab(as, PPCI_CMPW, tmp1, tmpnum);
> + else
> + emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
> if (!irt_ispri(kt))
> emit_tai(as, PPCI_LWZ, tmp2, dest, (int32_t)offsetof(Node, key.gcr));
> }
> @@ -629,19 +699,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> (((char *)as->mcp-(char *)l_loop) & 0xffffu);
>
> /* Load main position relative to tab->node into dest. */
> - khash = irref_isk(refkey) ? ir_khash(irkey) : 1;
> + khash = isk ? ir_khash(irkey) : 1;
> if (khash == 0) {
> emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
> } else {
> Reg tmphash = tmp1;
> - if (irref_isk(refkey))
> + if (isk)
> tmphash = ra_allock(as, khash, allow);
> emit_tab(as, PPCI_ADD, dest, dest, tmp1);
> emit_tai(as, PPCI_MULLI, tmp1, tmp1, sizeof(Node));
> emit_asb(as, PPCI_AND, tmp1, tmp2, tmphash);
> emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
> emit_tai(as, PPCI_LWZ, tmp2, tab, (int32_t)offsetof(GCtab, hmask));
> - if (irref_isk(refkey)) {
> + if (isk) {
> /* Nothing to do. */
> } else if (irt_isstr(kt)) {
> emit_tai(as, PPCI_LWZ, tmp1, key, (int32_t)offsetof(GCstr, hash));
> @@ -651,13 +721,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_asb(as, PPCI_XOR, tmp1, tmp1, tmp2);
> emit_rotlwi(as, tmp1, tmp1, (HASH_ROT2+HASH_ROT1)&31);
> emit_tab(as, PPCI_SUBF, tmp2, dest, tmp2);
> - if (irt_isnum(kt)) {
> + if (LJ_SOFTFP ? (irkey[1].o == IR_HIOP) : irt_isnum(kt)) {
> +#if LJ_SOFTFP
> + emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
> + emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> + emit_tab(as, PPCI_ADD, tmp1, tmpnum, tmpnum);
> +#else
> int32_t ofs = ra_spill(as, irkey);
> emit_asb(as, PPCI_XOR, tmp2, tmp2, tmp1);
> emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> emit_tab(as, PPCI_ADD, tmp1, tmp1, tmp1);
> emit_tai(as, PPCI_LWZ, tmp2, RID_SP, ofs+4);
> emit_tai(as, PPCI_LWZ, tmp1, RID_SP, ofs);
> +#endif
> } else {
> emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
> emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> @@ -784,8 +860,8 @@ static PPCIns asm_fxloadins(IRIns *ir)
> case IRT_U8: return PPCI_LBZ;
> case IRT_I16: return PPCI_LHA;
> case IRT_U16: return PPCI_LHZ;
> - case IRT_NUM: return PPCI_LFD;
> - case IRT_FLOAT: return PPCI_LFS;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_LFD;
> + case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_LFS;
> default: return PPCI_LWZ;
> }
> }
> @@ -795,8 +871,8 @@ static PPCIns asm_fxstoreins(IRIns *ir)
> switch (irt_type(ir->t)) {
> case IRT_I8: case IRT_U8: return PPCI_STB;
> case IRT_I16: case IRT_U16: return PPCI_STH;
> - case IRT_NUM: return PPCI_STFD;
> - case IRT_FLOAT: return PPCI_STFS;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_STFD;
> + case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_STFS;
> default: return PPCI_STW;
> }
> }
> @@ -839,7 +915,8 @@ static void asm_fstore(ASMState *as, IRIns *ir)
>
> static void asm_xload(ASMState *as, IRIns *ir)
> {
> - Reg dest = ra_dest(as, ir, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> + Reg dest = ra_dest(as, ir,
> + (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
> lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
> if (irt_isi8(ir->t))
> emit_as(as, PPCI_EXTSB, dest, dest);
> @@ -857,7 +934,8 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
> Reg src = ra_alloc1(as, irb->op1, RSET_GPR);
> asm_fusexrefx(as, PPCI_STWBRX, src, ir->op1, rset_exclude(RSET_GPR, src));
> } else {
> - Reg src = ra_alloc1(as, ir->op2, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> + Reg src = ra_alloc1(as, ir->op2,
> + (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
> asm_fusexref(as, asm_fxstoreins(ir), src, ir->op1,
> rset_exclude(RSET_GPR, src), ofs);
> }
> @@ -871,10 +949,19 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
> Reg dest = RID_NONE, type = RID_TMP, tmp = RID_TMP, idx;
> RegSet allow = RSET_GPR;
> int32_t ofs = AHUREF_LSX;
> + if (LJ_SOFTFP && (ir+1)->o == IR_HIOP) {
> + t.irt = IRT_NUM;
> + if (ra_used(ir+1)) {
> + type = ra_dest(as, ir+1, allow);
> + rset_clear(allow, type);
> + }
> + ofs = 0;
> + }
> if (ra_used(ir)) {
> - lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> - if (!irt_isnum(t)) ofs = 0;
> - dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> + lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> + irt_isint(ir->t) || irt_isaddr(ir->t));
> + if (LJ_SOFTFP || !irt_isnum(t)) ofs = 0;
> + dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> }
> idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> @@ -883,12 +970,13 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
> asm_guardcc(as, CC_GE);
> emit_ab(as, PPCI_CMPLW, type, tisnum);
> if (ra_hasreg(dest)) {
> - if (ofs == AHUREF_LSX) {
> + if (!LJ_SOFTFP && ofs == AHUREF_LSX) {
> tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_GPR,
> (idx&255)), (idx>>8)));
> emit_fab(as, PPCI_LFDX, dest, (idx&255), tmp);
> } else {
> - emit_fai(as, PPCI_LFD, dest, idx, ofs);
> + emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest, idx,
> + ofs+4*LJ_SOFTFP);
> }
> }
> } else {
> @@ -911,7 +999,7 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
> int32_t ofs = AHUREF_LSX;
> if (ir->r == RID_SINK)
> return;
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> src = ra_alloc1(as, ir->op2, RSET_FPR);
> } else {
> if (!irt_ispri(ir->t)) {
> @@ -919,11 +1007,14 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
> rset_clear(allow, src);
> ofs = 0;
> }
> - type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> + if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> + type = ra_alloc1(as, (ir+1)->op2, allow);
> + else
> + type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> rset_clear(allow, type);
> }
> idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> if (ofs == AHUREF_LSX) {
> emit_fab(as, PPCI_STFDX, src, (idx&255), RID_TMP);
> emit_slwi(as, RID_TMP, (idx>>8), 3);
> @@ -948,21 +1039,33 @@ static void asm_sload(ASMState *as, IRIns *ir)
> IRType1 t = ir->t;
> Reg dest = RID_NONE, type = RID_NONE, base;
> RegSet allow = RSET_GPR;
> + int hiop = (LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> + if (hiop)
> + t.irt = IRT_NUM;
> lua_assert(!(ir->op2 & IRSLOAD_PARENT)); /* Handled by asm_head_side(). */
> - lua_assert(irt_isguard(t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> + lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> lua_assert(LJ_DUALNUM ||
> !irt_isint(t) || (ir->op2 & (IRSLOAD_CONVERT|IRSLOAD_FRAME)));
> +#if LJ_SOFTFP
> + lua_assert(!(ir->op2 & IRSLOAD_CONVERT)); /* Handled by LJ_SOFTFP SPLIT. */
> + if (hiop && ra_used(ir+1)) {
> + type = ra_dest(as, ir+1, allow);
> + rset_clear(allow, type);
> + }
> +#else
> if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
> dest = ra_scratch(as, RSET_FPR);
> asm_tointg(as, ir, dest);
> t.irt = IRT_NUM; /* Continue with a regular number type check. */
> - } else if (ra_used(ir)) {
> + } else
> +#endif
> + if (ra_used(ir)) {
> lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> - dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> + dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> base = ra_alloc1(as, REF_BASE, allow);
> rset_clear(allow, base);
> - if ((ir->op2 & IRSLOAD_CONVERT)) {
> + if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
> if (irt_isint(t)) {
> emit_tai(as, PPCI_LWZ, dest, RID_SP, SPOFS_TMPLO);
> dest = ra_scratch(as, RSET_FPR);
> @@ -994,10 +1097,13 @@ dotypecheck:
> if ((ir->op2 & IRSLOAD_TYPECHECK)) {
> Reg tisnum = ra_allock(as, (int32_t)LJ_TISNUM, allow);
> asm_guardcc(as, CC_GE);
> - emit_ab(as, PPCI_CMPLW, RID_TMP, tisnum);
> +#if !LJ_SOFTFP
> type = RID_TMP;
> +#endif
> + emit_ab(as, PPCI_CMPLW, type, tisnum);
> }
> - if (ra_hasreg(dest)) emit_fai(as, PPCI_LFD, dest, base, ofs-4);
> + if (ra_hasreg(dest)) emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest,
> + base, ofs-(LJ_SOFTFP?0:4));
> } else {
> if ((ir->op2 & IRSLOAD_TYPECHECK)) {
> asm_guardcc(as, CC_NE);
> @@ -1122,6 +1228,7 @@ static void asm_obar(ASMState *as, IRIns *ir)
>
> /* -- Arithmetic and logic operations ------------------------------------- */
>
> +#if !LJ_SOFTFP
> static void asm_fparith(ASMState *as, IRIns *ir, PPCIns pi)
> {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> @@ -1149,13 +1256,17 @@ static void asm_fpmath(ASMState *as, IRIns *ir)
> else
> asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
> }
> +#endif
>
> static void asm_add(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> if (!asm_fusemadd(as, ir, PPCI_FMADD, PPCI_FMADD))
> asm_fparith(as, ir, PPCI_FADD);
> - } else {
> + } else
> +#endif
> + {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> PPCIns pi;
> @@ -1194,10 +1305,13 @@ static void asm_add(ASMState *as, IRIns *ir)
>
> static void asm_sub(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> if (!asm_fusemadd(as, ir, PPCI_FMSUB, PPCI_FNMSUB))
> asm_fparith(as, ir, PPCI_FSUB);
> - } else {
> + } else
> +#endif
> + {
> PPCIns pi = PPCI_SUBF;
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg left, right;
> @@ -1223,9 +1337,12 @@ static void asm_sub(ASMState *as, IRIns *ir)
>
> static void asm_mul(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> asm_fparith(as, ir, PPCI_FMUL);
> - } else {
> + } else
> +#endif
> + {
> PPCIns pi = PPCI_MULLW;
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> @@ -1253,9 +1370,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
>
> static void asm_neg(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> asm_fpunary(as, ir, PPCI_FNEG);
> - } else {
> + } else
> +#endif
> + {
> Reg dest, left;
> PPCIns pi = PPCI_NEG;
> if (as->flagmcp == as->mcp) {
> @@ -1566,9 +1686,40 @@ static void asm_bitshift(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pik)
> PPCI_RLWINM|PPCF_MB(0)|PPCF_ME(31))
> #define asm_bror(as, ir) lua_assert(0)
>
> +#if LJ_SOFTFP
> +static void asm_sfpmin_max(ASMState *as, IRIns *ir)
> +{
> + CCallInfo ci = lj_ir_callinfo[IRCALL_softfp_cmp];
> + IRRef args[4];
> + MCLabel l_right, l_end;
> + Reg desthi = ra_dest(as, ir, RSET_GPR), destlo = ra_dest(as, ir+1, RSET_GPR);
> + Reg righthi, lefthi = ra_alloc2(as, ir, RSET_GPR);
> + Reg rightlo, leftlo = ra_alloc2(as, ir+1, RSET_GPR);
> + PPCCC cond = (IROp)ir->o == IR_MIN ? CC_EQ : CC_NE;
> + righthi = (lefthi >> 8); lefthi &= 255;
> + rightlo = (leftlo >> 8); leftlo &= 255;
> + args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> + args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> + l_end = emit_label(as);
> + if (desthi != righthi) emit_mr(as, desthi, righthi);
> + if (destlo != rightlo) emit_mr(as, destlo, rightlo);
> + l_right = emit_label(as);
> + if (l_end != l_right) emit_jmp(as, l_end);
> + if (desthi != lefthi) emit_mr(as, desthi, lefthi);
> + if (destlo != leftlo) emit_mr(as, destlo, leftlo);
> + if (l_right == as->mcp+1) {
> + cond ^= 4; l_right = l_end; ++as->mcp;
> + }
> + emit_condbranch(as, PPCI_BC, cond, l_right);
> + ra_evictset(as, RSET_SCRATCH);
> + emit_cmpi(as, RID_RET, 1);
> + asm_gencall(as, &ci, args);
> +}
> +#endif
> +
> static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
> {
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> Reg tmp = dest;
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> @@ -1656,7 +1807,7 @@ static void asm_intcomp_(ASMState *as, IRRef lref, IRRef rref, Reg cr, PPCCC cc)
> static void asm_comp(ASMState *as, IRIns *ir)
> {
> PPCCC cc = asm_compmap[ir->o];
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> right = (left >> 8); left &= 255;
> asm_guardcc(as, (cc >> 4));
> @@ -1677,6 +1828,44 @@ static void asm_comp(ASMState *as, IRIns *ir)
>
> #define asm_equal(as, ir) asm_comp(as, ir)
>
> +#if LJ_SOFTFP
> +/* SFP comparisons. */
> +static void asm_sfpcomp(ASMState *as, IRIns *ir)
> +{
> + const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
> + RegSet drop = RSET_SCRATCH;
> + Reg r;
> + IRRef args[4];
> + args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> + args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> +
> + for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> + if (!rset_test(as->freeset, r) &&
> + regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
> + rset_clear(drop, r);
> + }
> + ra_evictset(as, drop);
> + asm_setupresult(as, ir, ci);
> + switch ((IROp)ir->o) {
> + case IR_ULT:
> + asm_guardcc(as, CC_EQ);
> + emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> + case IR_ULE:
> + asm_guardcc(as, CC_EQ);
> + emit_ai(as, PPCI_CMPWI, RID_RET, 1);
> + break;
> + case IR_GE: case IR_GT:
> + asm_guardcc(as, CC_EQ);
> + emit_ai(as, PPCI_CMPWI, RID_RET, 2);
> + default:
> + asm_guardcc(as, (asm_compmap[ir->o] & 0xf));
> + emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> + break;
> + }
> + asm_gencall(as, ci, args);
> +}
> +#endif
> +
> #if LJ_HASFFI
> /* 64 bit integer comparisons. */
> static void asm_comp64(ASMState *as, IRIns *ir)
> @@ -1706,19 +1895,36 @@ static void asm_comp64(ASMState *as, IRIns *ir)
> /* Hiword op of a split 64 bit op. Previous op must be the loword op. */
> static void asm_hiop(ASMState *as, IRIns *ir)
> {
> -#if LJ_HASFFI
> +#if LJ_HASFFI || LJ_SOFTFP
> /* HIOP is marked as a store because it needs its own DCE logic. */
> int uselo = ra_used(ir-1), usehi = ra_used(ir); /* Loword/hiword used? */
> if (LJ_UNLIKELY(!(as->flags & JIT_F_OPT_DCE))) uselo = usehi = 1;
> if ((ir-1)->o == IR_CONV) { /* Conversions to/from 64 bit. */
> as->curins--; /* Always skip the CONV. */
> +#if LJ_HASFFI && !LJ_SOFTFP
> if (usehi || uselo)
> asm_conv64(as, ir);
> return;
> +#endif
> } else if ((ir-1)->o <= IR_NE) { /* 64 bit integer comparisons. ORDER IR. */
> as->curins--; /* Always skip the loword comparison. */
> +#if LJ_SOFTFP
> + if (!irt_isint(ir->t)) {
> + asm_sfpcomp(as, ir-1);
> + return;
> + }
> +#endif
> +#if LJ_HASFFI
> asm_comp64(as, ir);
> +#endif
> + return;
> +#if LJ_SOFTFP
> + } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
> + as->curins--; /* Always skip the loword min/max. */
> + if (uselo || usehi)
> + asm_sfpmin_max(as, ir-1);
> return;
> +#endif
> } else if ((ir-1)->o == IR_XSTORE) {
> as->curins--; /* Handle both stores here. */
> if ((ir-1)->r != RID_SINK) {
> @@ -1729,14 +1935,27 @@ static void asm_hiop(ASMState *as, IRIns *ir)
> }
> if (!usehi) return; /* Skip unused hiword op for all remaining ops. */
> switch ((ir-1)->o) {
> +#if LJ_HASFFI
> case IR_ADD: as->curins--; asm_add64(as, ir); break;
> case IR_SUB: as->curins--; asm_sub64(as, ir); break;
> case IR_NEG: as->curins--; asm_neg64(as, ir); break;
> +#endif
> +#if LJ_SOFTFP
> + case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
> + case IR_STRTO:
Why are those fp-dependent? Should we write an explanation?
> + if (!uselo)
> + ra_allocref(as, ir->op1, RSET_GPR); /* Mark lo op as used. */
> + break;
> +#endif
> case IR_CALLN:
> + case IR_CALLS:
> case IR_CALLXS:
> if (!uselo)
> ra_allocref(as, ir->op1, RID2RSET(RID_RETLO)); /* Mark lo op as used. */
> break;
> +#if LJ_SOFTFP
> + case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
> +#endif
> case IR_CNEWI:
> /* Nothing to do here. Handled by lo op itself. */
> break;
> @@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if ((sn & SNAP_NORESTORE))
> continue;
> if (irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + Reg tmp;
> + RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> + lua_assert(irref_isk(ref)); /* LJ_SOFTFP: must be a number constant. */
> + tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.lo, allow);
> + emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?4:0));
> + if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
> + tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
> + emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#else
> Reg src = ra_alloc1(as, ref, RSET_FPR);
> emit_fai(as, PPCI_STFD, src, RID_BASE, ofs);
> +#endif
> } else {
> Reg type;
> RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> @@ -1814,6 +2044,10 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if ((sn & (SNAP_CONT|SNAP_FRAME))) {
> if (s == 0) continue; /* Do not overwrite link to previous frame. */
> type = ra_allock(as, (int32_t)(*flinks--), allow);
> +#if LJ_SOFTFP
> + } else if ((sn & SNAP_SOFTFPNUM)) {
> + type = ra_alloc1(as, ref+1, rset_exclude(RSET_GPR, RID_BASE));
> +#endif
> } else {
> type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> }
> @@ -1950,14 +2184,15 @@ static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci)
> int nslots = 2, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR;
> asm_collectargs(as, ir, ci, args);
> for (i = 0; i < nargs; i++)
> - if (args[i] && irt_isfp(IR(args[i])->t)) {
> + if (!LJ_SOFTFP && args[i] && irt_isfp(IR(args[i])->t)) {
> if (nfpr > 0) nfpr--; else nslots = (nslots+3) & ~1;
> } else {
> if (ngpr > 0) ngpr--; else nslots++;
> }
> if (nslots > as->evenspill) /* Leave room for args in stack slots. */
> as->evenspill = nslots;
> - return irt_isfp(ir->t) ? REGSP_HINT(RID_FPRET) : REGSP_HINT(RID_RET);
> + return (!LJ_SOFTFP && irt_isfp(ir->t)) ? REGSP_HINT(RID_FPRET) :
> + REGSP_HINT(RID_RET);
> }
>
> static void asm_setup_target(ASMState *as)
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
2023-08-15 11:46 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:21 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:21 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
See my answers below.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few typos and a single question below.
> On Wed, Aug 09, 2023 at 06:35:55PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> > Sponsored by Cisco Systems, Inc.
> >
> > (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
> >
> > The software floating point library is used on machines which do not
> > have hardware support for floating point [1]. This patch enables
> > support for such machines in the JIT compiler for powerpc.
> Typo: s/powerpc/PowerPC/
Fixed, thanks!
> > This includes:
> > * All fp-depending paths are instrumented with `LJ_SOFTFP` macro.
> Typo: s/fp-depending/fp-dependent/
Fixed.
> > * `asm_sfpmin_max()` is introduced for min/max operations on soft-float
> > point.
> > * `asm_sfpcomp()` is introduced for soft-float point comparisons.
> >
> > [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> >
> > Sergey Kaplun:
> > * added the description for the feature
> >
> > Part of tarantool/tarantool#8825
> > ---
<snipped>
> > +#if LJ_SOFTFP
> > + case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
> > + case IR_STRTO:
> Why are those fp-dependent? Should we write an explanation?
I suppose that the lo op is used here for the possible half of an FP value.
Also, there is no need for this on hard-float machines.
I suppose the comment as-is is OK.
Same for the stores.
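Just to illustrate what I mean (my own sketch, not code from the patch):
under LJ_SOFTFP a number value is handled as two 32-bit words, so the hi
op of such a load/store only makes sense together with its lo counterpart,
roughly like this:

#include <stdint.h>

/* Sketch only: the loword/hiword pair that the paired lo/hi IR ops
** refer to. Field order depends on endianness (cf. LJ_BE in the patch).
*/
typedef union SFPNum {
  double n;                         /* The full IEEE-754 double. */
  struct { uint32_t lo, hi; } u32;  /* The two 32-bit halves. */
} SFPNum;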
> > + if (!uselo)
> > + ra_allocref(as, ir->op1, RSET_GPR); /* Mark lo op as used. */
> > + break;
> > +#endif
> > case IR_CALLN:
> > + case IR_CALLS:
> > case IR_CALLXS:
> > if (!uselo)
> > ra_allocref(as, ir->op1, RID2RSET(RID_RETLO)); /* Mark lo op as used. */
> > break;
> > +#if LJ_SOFTFP
> > + case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
> > +#endif
> > case IR_CNEWI:
> > /* Nothing to do here. Handled by lo op itself. */
> > break;
> > @@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> > if ((sn & SNAP_NORESTORE))
> > continue;
> > if (irt_isnum(ir->t)) {
<snipped>
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
2023-08-15 11:46 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 14:33 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:33 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
thanks for the patch! LGTM
On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the JIT compiler for powerpc.
> This includes:
> * All fp-depending paths are instrumented with `LJ_SOFTFP` macro.
> * `asm_sfpmin_max()` is introduced for min/max operations on soft-float
> point.
> * `asm_sfpcomp()` is introduced for soft-float point comparisons.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_arch.h | 1 -
> src/lj_asm_ppc.h | 321 ++++++++++++++++++++++++++++++++++++++++-------
> 2 files changed, 278 insertions(+), 44 deletions(-)
>
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 8bb8757d..7397492e 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -281,7 +281,6 @@
> #endif
>
> #if LJ_ABI_SOFTFP
> -#define LJ_ARCH_NOJIT 1 /* NYI */
> #define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL
> #else
> #define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index aa2d45c0..6cb608f7 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -226,6 +226,7 @@ static void asm_fusexrefx(ASMState *as, PPCIns pi, Reg rt, IRRef ref,
> emit_tab(as, pi, rt, left, right);
> }
>
> +#if !LJ_SOFTFP
> /* Fuse to multiply-add/sub instruction. */
> static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
> {
> @@ -245,6 +246,7 @@ static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
> }
> return 0;
> }
> +#endif
>
> /* -- Calls --------------------------------------------------------------- */
>
> @@ -253,13 +255,17 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> {
> uint32_t n, nargs = CCI_XNARGS(ci);
> int32_t ofs = 8;
> - Reg gpr = REGARG_FIRSTGPR, fpr = REGARG_FIRSTFPR;
> + Reg gpr = REGARG_FIRSTGPR;
> +#if !LJ_SOFTFP
> + Reg fpr = REGARG_FIRSTFPR;
> +#endif
> if ((void *)ci->func)
> emit_call(as, (void *)ci->func);
> for (n = 0; n < nargs; n++) { /* Setup args. */
> IRRef ref = args[n];
> if (ref) {
> IRIns *ir = IR(ref);
> +#if !LJ_SOFTFP
> if (irt_isfp(ir->t)) {
> if (fpr <= REGARG_LASTFPR) {
> lua_assert(rset_test(as->freeset, fpr)); /* Already evicted. */
> @@ -271,7 +277,9 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> emit_spstore(as, ir, r, ofs);
> ofs += irt_isnum(ir->t) ? 8 : 4;
> }
> - } else {
> + } else
> +#endif
> + {
> if (gpr <= REGARG_LASTGPR) {
> lua_assert(rset_test(as->freeset, gpr)); /* Already evicted. */
> ra_leftov(as, gpr, ref);
> @@ -290,8 +298,10 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
> }
> checkmclim(as);
> }
> +#if !LJ_SOFTFP
> if ((ci->flags & CCI_VARARG)) /* Vararg calls need to know about FPR use. */
> emit_tab(as, fpr == REGARG_FIRSTFPR ? PPCI_CRXOR : PPCI_CREQV, 6, 6, 6);
> +#endif
> }
>
> /* Setup result reg/sp for call. Evict scratch regs. */
> @@ -299,8 +309,10 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
> {
> RegSet drop = RSET_SCRATCH;
> int hiop = ((ir+1)->o == IR_HIOP && !irt_isnil((ir+1)->t));
> +#if !LJ_SOFTFP
> if ((ci->flags & CCI_NOFPRCLOBBER))
> drop &= ~RSET_FPR;
> +#endif
> if (ra_hasreg(ir->r))
> rset_clear(drop, ir->r); /* Dest reg handled below. */
> if (hiop && ra_hasreg((ir+1)->r))
> @@ -308,7 +320,7 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
> ra_evictset(as, drop); /* Evictions must be performed first. */
> if (ra_used(ir)) {
> lua_assert(!irt_ispri(ir->t));
> - if (irt_isfp(ir->t)) {
> + if (!LJ_SOFTFP && irt_isfp(ir->t)) {
> if ((ci->flags & CCI_CASTU64)) {
> /* Use spill slot or temp slots. */
> int32_t ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
> @@ -377,6 +389,7 @@ static void asm_retf(ASMState *as, IRIns *ir)
>
> /* -- Type conversions ---------------------------------------------------- */
>
> +#if !LJ_SOFTFP
> static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
> {
> RegSet allow = RSET_FPR;
> @@ -409,15 +422,23 @@ static void asm_tobit(ASMState *as, IRIns *ir)
> emit_fai(as, PPCI_STFD, tmp, RID_SP, SPOFS_TMP);
> emit_fab(as, PPCI_FADD, tmp, left, right);
> }
> +#endif
>
> static void asm_conv(ASMState *as, IRIns *ir)
> {
> IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> +#if !LJ_SOFTFP
> int stfp = (st == IRT_NUM || st == IRT_FLOAT);
> +#endif
> IRRef lref = ir->op1;
> - lua_assert(irt_type(ir->t) != st);
> lua_assert(!(irt_isint64(ir->t) ||
> (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
> +#if LJ_SOFTFP
> + /* FP conversions are handled by SPLIT. */
> + lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
> + /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
> +#else
> + lua_assert(irt_type(ir->t) != st);
> if (irt_isfp(ir->t)) {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> if (stfp) { /* FP to FP conversion. */
> @@ -476,7 +497,9 @@ static void asm_conv(ASMState *as, IRIns *ir)
> emit_fb(as, PPCI_FCTIWZ, tmp, left);
> }
> }
> - } else {
> + } else
> +#endif
> + {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> if (st >= IRT_I8 && st <= IRT_U16) { /* Extend to 32 bit integer. */
> Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> @@ -496,17 +519,41 @@ static void asm_strto(ASMState *as, IRIns *ir)
> {
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
> IRRef args[2];
> - int32_t ofs;
> + int32_t ofs = SPOFS_TMP;
> +#if LJ_SOFTFP
> + ra_evictset(as, RSET_SCRATCH);
> + if (ra_used(ir)) {
> + if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> + (ir->s & 1) == LJ_BE && (ir->s ^ 1) == (ir+1)->s) {
> + int i;
> + for (i = 0; i < 2; i++) {
> + Reg r = (ir+i)->r;
> + if (ra_hasreg(r)) {
> + ra_free(as, r);
> + ra_modified(as, r);
> + emit_spload(as, ir+i, r, sps_scale((ir+i)->s));
> + }
> + }
> + ofs = sps_scale(ir->s & ~1);
> + } else {
> + Reg rhi = ra_dest(as, ir+1, RSET_GPR);
> + Reg rlo = ra_dest(as, ir, rset_exclude(RSET_GPR, rhi));
> + emit_tai(as, PPCI_LWZ, rhi, RID_SP, ofs);
> + emit_tai(as, PPCI_LWZ, rlo, RID_SP, ofs+4);
> + }
> + }
> +#else
> RegSet drop = RSET_SCRATCH;
> if (ra_hasreg(ir->r)) rset_set(drop, ir->r); /* Spill dest reg (if any). */
> ra_evictset(as, drop);
> + if (ir->s) ofs = sps_scale(ir->s);
> +#endif
> asm_guardcc(as, CC_EQ);
> emit_ai(as, PPCI_CMPWI, RID_RET, 0); /* Test return status. */
> args[0] = ir->op1; /* GCstr *str */
> args[1] = ASMREF_TMP1; /* TValue *n */
> asm_gencall(as, ci, args);
> /* Store the result to the spill slot or temp slots. */
> - ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
> emit_tai(as, PPCI_ADDI, ra_releasetmp(as, ASMREF_TMP1), RID_SP, ofs);
> }
>
> @@ -530,7 +577,10 @@ static void asm_tvptr(ASMState *as, Reg dest, IRRef ref)
> Reg src = ra_alloc1(as, ref, allow);
> emit_setgl(as, src, tmptv.gcr);
> }
> - type = ra_allock(as, irt_toitype(ir->t), allow);
> + if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> + type = ra_alloc1(as, ref+1, allow);
> + else
> + type = ra_allock(as, irt_toitype(ir->t), allow);
> emit_setgl(as, type, tmptv.it);
> }
> }
> @@ -574,11 +624,27 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> Reg tisnum = RID_NONE, tmpnum = RID_NONE;
> IRRef refkey = ir->op2;
> IRIns *irkey = IR(refkey);
> + int isk = irref_isk(refkey);
> IRType1 kt = irkey->t;
> uint32_t khash;
> MCLabel l_end, l_loop, l_next;
>
> rset_clear(allow, tab);
> +#if LJ_SOFTFP
> + if (!isk) {
> + key = ra_alloc1(as, refkey, allow);
> + rset_clear(allow, key);
> + if (irkey[1].o == IR_HIOP) {
> + if (ra_hasreg((irkey+1)->r)) {
> + tmpnum = (irkey+1)->r;
> + ra_noweak(as, tmpnum);
> + } else {
> + tmpnum = ra_allocref(as, refkey+1, allow);
> + }
> + rset_clear(allow, tmpnum);
> + }
> + }
> +#else
> if (irt_isnum(kt)) {
> key = ra_alloc1(as, refkey, RSET_FPR);
> tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
> @@ -588,6 +654,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> key = ra_alloc1(as, refkey, allow);
> rset_clear(allow, key);
> }
> +#endif
> tmp2 = ra_scratch(as, allow);
> rset_clear(allow, tmp2);
>
> @@ -610,7 +677,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> asm_guardcc(as, CC_EQ);
> else
> emit_condbranch(as, PPCI_BC|PPCF_Y, CC_EQ, l_end);
> - if (irt_isnum(kt)) {
> + if (!LJ_SOFTFP && irt_isnum(kt)) {
> emit_fab(as, PPCI_FCMPU, 0, tmpnum, key);
> emit_condbranch(as, PPCI_BC, CC_GE, l_next);
> emit_ab(as, PPCI_CMPLW, tmp1, tisnum);
> @@ -620,7 +687,10 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_ab(as, PPCI_CMPW, tmp2, key);
> emit_condbranch(as, PPCI_BC, CC_NE, l_next);
> }
> - emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
> + if (LJ_SOFTFP && ra_hasreg(tmpnum))
> + emit_ab(as, PPCI_CMPW, tmp1, tmpnum);
> + else
> + emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
> if (!irt_ispri(kt))
> emit_tai(as, PPCI_LWZ, tmp2, dest, (int32_t)offsetof(Node, key.gcr));
> }
> @@ -629,19 +699,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> (((char *)as->mcp-(char *)l_loop) & 0xffffu);
>
> /* Load main position relative to tab->node into dest. */
> - khash = irref_isk(refkey) ? ir_khash(irkey) : 1;
> + khash = isk ? ir_khash(irkey) : 1;
> if (khash == 0) {
> emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
> } else {
> Reg tmphash = tmp1;
> - if (irref_isk(refkey))
> + if (isk)
> tmphash = ra_allock(as, khash, allow);
> emit_tab(as, PPCI_ADD, dest, dest, tmp1);
> emit_tai(as, PPCI_MULLI, tmp1, tmp1, sizeof(Node));
> emit_asb(as, PPCI_AND, tmp1, tmp2, tmphash);
> emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
> emit_tai(as, PPCI_LWZ, tmp2, tab, (int32_t)offsetof(GCtab, hmask));
> - if (irref_isk(refkey)) {
> + if (isk) {
> /* Nothing to do. */
> } else if (irt_isstr(kt)) {
> emit_tai(as, PPCI_LWZ, tmp1, key, (int32_t)offsetof(GCstr, hash));
> @@ -651,13 +721,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_asb(as, PPCI_XOR, tmp1, tmp1, tmp2);
> emit_rotlwi(as, tmp1, tmp1, (HASH_ROT2+HASH_ROT1)&31);
> emit_tab(as, PPCI_SUBF, tmp2, dest, tmp2);
> - if (irt_isnum(kt)) {
> + if (LJ_SOFTFP ? (irkey[1].o == IR_HIOP) : irt_isnum(kt)) {
> +#if LJ_SOFTFP
> + emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
> + emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> + emit_tab(as, PPCI_ADD, tmp1, tmpnum, tmpnum);
> +#else
> int32_t ofs = ra_spill(as, irkey);
> emit_asb(as, PPCI_XOR, tmp2, tmp2, tmp1);
> emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> emit_tab(as, PPCI_ADD, tmp1, tmp1, tmp1);
> emit_tai(as, PPCI_LWZ, tmp2, RID_SP, ofs+4);
> emit_tai(as, PPCI_LWZ, tmp1, RID_SP, ofs);
> +#endif
> } else {
> emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
> emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> @@ -784,8 +860,8 @@ static PPCIns asm_fxloadins(IRIns *ir)
> case IRT_U8: return PPCI_LBZ;
> case IRT_I16: return PPCI_LHA;
> case IRT_U16: return PPCI_LHZ;
> - case IRT_NUM: return PPCI_LFD;
> - case IRT_FLOAT: return PPCI_LFS;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_LFD;
> + case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_LFS;
> default: return PPCI_LWZ;
> }
> }
> @@ -795,8 +871,8 @@ static PPCIns asm_fxstoreins(IRIns *ir)
> switch (irt_type(ir->t)) {
> case IRT_I8: case IRT_U8: return PPCI_STB;
> case IRT_I16: case IRT_U16: return PPCI_STH;
> - case IRT_NUM: return PPCI_STFD;
> - case IRT_FLOAT: return PPCI_STFS;
> + case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_STFD;
> + case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_STFS;
> default: return PPCI_STW;
> }
> }
> @@ -839,7 +915,8 @@ static void asm_fstore(ASMState *as, IRIns *ir)
>
> static void asm_xload(ASMState *as, IRIns *ir)
> {
> - Reg dest = ra_dest(as, ir, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> + Reg dest = ra_dest(as, ir,
> + (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
> lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
> if (irt_isi8(ir->t))
> emit_as(as, PPCI_EXTSB, dest, dest);
> @@ -857,7 +934,8 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
> Reg src = ra_alloc1(as, irb->op1, RSET_GPR);
> asm_fusexrefx(as, PPCI_STWBRX, src, ir->op1, rset_exclude(RSET_GPR, src));
> } else {
> - Reg src = ra_alloc1(as, ir->op2, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> + Reg src = ra_alloc1(as, ir->op2,
> + (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
> asm_fusexref(as, asm_fxstoreins(ir), src, ir->op1,
> rset_exclude(RSET_GPR, src), ofs);
> }
> @@ -871,10 +949,19 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
> Reg dest = RID_NONE, type = RID_TMP, tmp = RID_TMP, idx;
> RegSet allow = RSET_GPR;
> int32_t ofs = AHUREF_LSX;
> + if (LJ_SOFTFP && (ir+1)->o == IR_HIOP) {
> + t.irt = IRT_NUM;
> + if (ra_used(ir+1)) {
> + type = ra_dest(as, ir+1, allow);
> + rset_clear(allow, type);
> + }
> + ofs = 0;
> + }
> if (ra_used(ir)) {
> - lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> - if (!irt_isnum(t)) ofs = 0;
> - dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> + lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> + irt_isint(ir->t) || irt_isaddr(ir->t));
> + if (LJ_SOFTFP || !irt_isnum(t)) ofs = 0;
> + dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> }
> idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> @@ -883,12 +970,13 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
> asm_guardcc(as, CC_GE);
> emit_ab(as, PPCI_CMPLW, type, tisnum);
> if (ra_hasreg(dest)) {
> - if (ofs == AHUREF_LSX) {
> + if (!LJ_SOFTFP && ofs == AHUREF_LSX) {
> tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_GPR,
> (idx&255)), (idx>>8)));
> emit_fab(as, PPCI_LFDX, dest, (idx&255), tmp);
> } else {
> - emit_fai(as, PPCI_LFD, dest, idx, ofs);
> + emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest, idx,
> + ofs+4*LJ_SOFTFP);
> }
> }
> } else {
> @@ -911,7 +999,7 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
> int32_t ofs = AHUREF_LSX;
> if (ir->r == RID_SINK)
> return;
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> src = ra_alloc1(as, ir->op2, RSET_FPR);
> } else {
> if (!irt_ispri(ir->t)) {
> @@ -919,11 +1007,14 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
> rset_clear(allow, src);
> ofs = 0;
> }
> - type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> + if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> + type = ra_alloc1(as, (ir+1)->op2, allow);
> + else
> + type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> rset_clear(allow, type);
> }
> idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> if (ofs == AHUREF_LSX) {
> emit_fab(as, PPCI_STFDX, src, (idx&255), RID_TMP);
> emit_slwi(as, RID_TMP, (idx>>8), 3);
> @@ -948,21 +1039,33 @@ static void asm_sload(ASMState *as, IRIns *ir)
> IRType1 t = ir->t;
> Reg dest = RID_NONE, type = RID_NONE, base;
> RegSet allow = RSET_GPR;
> + int hiop = (LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> + if (hiop)
> + t.irt = IRT_NUM;
> lua_assert(!(ir->op2 & IRSLOAD_PARENT)); /* Handled by asm_head_side(). */
> - lua_assert(irt_isguard(t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> + lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> lua_assert(LJ_DUALNUM ||
> !irt_isint(t) || (ir->op2 & (IRSLOAD_CONVERT|IRSLOAD_FRAME)));
> +#if LJ_SOFTFP
> + lua_assert(!(ir->op2 & IRSLOAD_CONVERT)); /* Handled by LJ_SOFTFP SPLIT. */
> + if (hiop && ra_used(ir+1)) {
> + type = ra_dest(as, ir+1, allow);
> + rset_clear(allow, type);
> + }
> +#else
> if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
> dest = ra_scratch(as, RSET_FPR);
> asm_tointg(as, ir, dest);
> t.irt = IRT_NUM; /* Continue with a regular number type check. */
> - } else if (ra_used(ir)) {
> + } else
> +#endif
> + if (ra_used(ir)) {
> lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> - dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> + dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
> rset_clear(allow, dest);
> base = ra_alloc1(as, REF_BASE, allow);
> rset_clear(allow, base);
> - if ((ir->op2 & IRSLOAD_CONVERT)) {
> + if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
> if (irt_isint(t)) {
> emit_tai(as, PPCI_LWZ, dest, RID_SP, SPOFS_TMPLO);
> dest = ra_scratch(as, RSET_FPR);
> @@ -994,10 +1097,13 @@ dotypecheck:
> if ((ir->op2 & IRSLOAD_TYPECHECK)) {
> Reg tisnum = ra_allock(as, (int32_t)LJ_TISNUM, allow);
> asm_guardcc(as, CC_GE);
> - emit_ab(as, PPCI_CMPLW, RID_TMP, tisnum);
> +#if !LJ_SOFTFP
> type = RID_TMP;
> +#endif
> + emit_ab(as, PPCI_CMPLW, type, tisnum);
> }
> - if (ra_hasreg(dest)) emit_fai(as, PPCI_LFD, dest, base, ofs-4);
> + if (ra_hasreg(dest)) emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest,
> + base, ofs-(LJ_SOFTFP?0:4));
> } else {
> if ((ir->op2 & IRSLOAD_TYPECHECK)) {
> asm_guardcc(as, CC_NE);
> @@ -1122,6 +1228,7 @@ static void asm_obar(ASMState *as, IRIns *ir)
>
> /* -- Arithmetic and logic operations ------------------------------------- */
>
> +#if !LJ_SOFTFP
> static void asm_fparith(ASMState *as, IRIns *ir, PPCIns pi)
> {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> @@ -1149,13 +1256,17 @@ static void asm_fpmath(ASMState *as, IRIns *ir)
> else
> asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
> }
> +#endif
>
> static void asm_add(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> if (!asm_fusemadd(as, ir, PPCI_FMADD, PPCI_FMADD))
> asm_fparith(as, ir, PPCI_FADD);
> - } else {
> + } else
> +#endif
> + {
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> PPCIns pi;
> @@ -1194,10 +1305,13 @@ static void asm_add(ASMState *as, IRIns *ir)
>
> static void asm_sub(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> if (!asm_fusemadd(as, ir, PPCI_FMSUB, PPCI_FNMSUB))
> asm_fparith(as, ir, PPCI_FSUB);
> - } else {
> + } else
> +#endif
> + {
> PPCIns pi = PPCI_SUBF;
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg left, right;
> @@ -1223,9 +1337,12 @@ static void asm_sub(ASMState *as, IRIns *ir)
>
> static void asm_mul(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> asm_fparith(as, ir, PPCI_FMUL);
> - } else {
> + } else
> +#endif
> + {
> PPCIns pi = PPCI_MULLW;
> Reg dest = ra_dest(as, ir, RSET_GPR);
> Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> @@ -1253,9 +1370,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
>
> static void asm_neg(ASMState *as, IRIns *ir)
> {
> +#if !LJ_SOFTFP
> if (irt_isnum(ir->t)) {
> asm_fpunary(as, ir, PPCI_FNEG);
> - } else {
> + } else
> +#endif
> + {
> Reg dest, left;
> PPCIns pi = PPCI_NEG;
> if (as->flagmcp == as->mcp) {
> @@ -1566,9 +1686,40 @@ static void asm_bitshift(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pik)
> PPCI_RLWINM|PPCF_MB(0)|PPCF_ME(31))
> #define asm_bror(as, ir) lua_assert(0)
>
> +#if LJ_SOFTFP
> +static void asm_sfpmin_max(ASMState *as, IRIns *ir)
> +{
> + CCallInfo ci = lj_ir_callinfo[IRCALL_softfp_cmp];
> + IRRef args[4];
> + MCLabel l_right, l_end;
> + Reg desthi = ra_dest(as, ir, RSET_GPR), destlo = ra_dest(as, ir+1, RSET_GPR);
> + Reg righthi, lefthi = ra_alloc2(as, ir, RSET_GPR);
> + Reg rightlo, leftlo = ra_alloc2(as, ir+1, RSET_GPR);
> + PPCCC cond = (IROp)ir->o == IR_MIN ? CC_EQ : CC_NE;
> + righthi = (lefthi >> 8); lefthi &= 255;
> + rightlo = (leftlo >> 8); leftlo &= 255;
> + args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> + args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> + l_end = emit_label(as);
> + if (desthi != righthi) emit_mr(as, desthi, righthi);
> + if (destlo != rightlo) emit_mr(as, destlo, rightlo);
> + l_right = emit_label(as);
> + if (l_end != l_right) emit_jmp(as, l_end);
> + if (desthi != lefthi) emit_mr(as, desthi, lefthi);
> + if (destlo != leftlo) emit_mr(as, destlo, leftlo);
> + if (l_right == as->mcp+1) {
> + cond ^= 4; l_right = l_end; ++as->mcp;
> + }
> + emit_condbranch(as, PPCI_BC, cond, l_right);
> + ra_evictset(as, RSET_SCRATCH);
> + emit_cmpi(as, RID_RET, 1);
> + asm_gencall(as, &ci, args);
> +}
> +#endif
> +
> static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
> {
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> Reg dest = ra_dest(as, ir, RSET_FPR);
> Reg tmp = dest;
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> @@ -1656,7 +1807,7 @@ static void asm_intcomp_(ASMState *as, IRRef lref, IRRef rref, Reg cr, PPCCC cc)
> static void asm_comp(ASMState *as, IRIns *ir)
> {
> PPCCC cc = asm_compmap[ir->o];
> - if (irt_isnum(ir->t)) {
> + if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> right = (left >> 8); left &= 255;
> asm_guardcc(as, (cc >> 4));
> @@ -1677,6 +1828,44 @@ static void asm_comp(ASMState *as, IRIns *ir)
>
> #define asm_equal(as, ir) asm_comp(as, ir)
>
> +#if LJ_SOFTFP
> +/* SFP comparisons. */
> +static void asm_sfpcomp(ASMState *as, IRIns *ir)
> +{
> + const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
> + RegSet drop = RSET_SCRATCH;
> + Reg r;
> + IRRef args[4];
> + args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> + args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> +
> + for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> + if (!rset_test(as->freeset, r) &&
> + regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
> + rset_clear(drop, r);
> + }
> + ra_evictset(as, drop);
> + asm_setupresult(as, ir, ci);
> + switch ((IROp)ir->o) {
> + case IR_ULT:
> + asm_guardcc(as, CC_EQ);
> + emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> + case IR_ULE:
> + asm_guardcc(as, CC_EQ);
> + emit_ai(as, PPCI_CMPWI, RID_RET, 1);
> + break;
> + case IR_GE: case IR_GT:
> + asm_guardcc(as, CC_EQ);
> + emit_ai(as, PPCI_CMPWI, RID_RET, 2);
> + default:
> + asm_guardcc(as, (asm_compmap[ir->o] & 0xf));
> + emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> + break;
> + }
> + asm_gencall(as, ci, args);
> +}
> +#endif
> +
> #if LJ_HASFFI
> /* 64 bit integer comparisons. */
> static void asm_comp64(ASMState *as, IRIns *ir)
> @@ -1706,19 +1895,36 @@ static void asm_comp64(ASMState *as, IRIns *ir)
> /* Hiword op of a split 64 bit op. Previous op must be the loword op. */
> static void asm_hiop(ASMState *as, IRIns *ir)
> {
> -#if LJ_HASFFI
> +#if LJ_HASFFI || LJ_SOFTFP
> /* HIOP is marked as a store because it needs its own DCE logic. */
> int uselo = ra_used(ir-1), usehi = ra_used(ir); /* Loword/hiword used? */
> if (LJ_UNLIKELY(!(as->flags & JIT_F_OPT_DCE))) uselo = usehi = 1;
> if ((ir-1)->o == IR_CONV) { /* Conversions to/from 64 bit. */
> as->curins--; /* Always skip the CONV. */
> +#if LJ_HASFFI && !LJ_SOFTFP
> if (usehi || uselo)
> asm_conv64(as, ir);
> return;
> +#endif
> } else if ((ir-1)->o <= IR_NE) { /* 64 bit integer comparisons. ORDER IR. */
> as->curins--; /* Always skip the loword comparison. */
> +#if LJ_SOFTFP
> + if (!irt_isint(ir->t)) {
> + asm_sfpcomp(as, ir-1);
> + return;
> + }
> +#endif
> +#if LJ_HASFFI
> asm_comp64(as, ir);
> +#endif
> + return;
> +#if LJ_SOFTFP
> + } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
> + as->curins--; /* Always skip the loword min/max. */
> + if (uselo || usehi)
> + asm_sfpmin_max(as, ir-1);
> return;
> +#endif
> } else if ((ir-1)->o == IR_XSTORE) {
> as->curins--; /* Handle both stores here. */
> if ((ir-1)->r != RID_SINK) {
> @@ -1729,14 +1935,27 @@ static void asm_hiop(ASMState *as, IRIns *ir)
> }
> if (!usehi) return; /* Skip unused hiword op for all remaining ops. */
> switch ((ir-1)->o) {
> +#if LJ_HASFFI
> case IR_ADD: as->curins--; asm_add64(as, ir); break;
> case IR_SUB: as->curins--; asm_sub64(as, ir); break;
> case IR_NEG: as->curins--; asm_neg64(as, ir); break;
> +#endif
> +#if LJ_SOFTFP
> + case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
> + case IR_STRTO:
> + if (!uselo)
> + ra_allocref(as, ir->op1, RSET_GPR); /* Mark lo op as used. */
> + break;
> +#endif
> case IR_CALLN:
> + case IR_CALLS:
> case IR_CALLXS:
> if (!uselo)
> ra_allocref(as, ir->op1, RID2RSET(RID_RETLO)); /* Mark lo op as used. */
> break;
> +#if LJ_SOFTFP
> + case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
> +#endif
> case IR_CNEWI:
> /* Nothing to do here. Handled by lo op itself. */
> break;
> @@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if ((sn & SNAP_NORESTORE))
> continue;
> if (irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> + Reg tmp;
> + RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> + lua_assert(irref_isk(ref)); /* LJ_SOFTFP: must be a number constant. */
> + tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.lo, allow);
> + emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?4:0));
> + if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
> + tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
> + emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#else
> Reg src = ra_alloc1(as, ref, RSET_FPR);
> emit_fai(as, PPCI_STFD, src, RID_BASE, ofs);
> +#endif
> } else {
> Reg type;
> RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> @@ -1814,6 +2044,10 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> if ((sn & (SNAP_CONT|SNAP_FRAME))) {
> if (s == 0) continue; /* Do not overwrite link to previous frame. */
> type = ra_allock(as, (int32_t)(*flinks--), allow);
> +#if LJ_SOFTFP
> + } else if ((sn & SNAP_SOFTFPNUM)) {
> + type = ra_alloc1(as, ref+1, rset_exclude(RSET_GPR, RID_BASE));
> +#endif
> } else {
> type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> }
> @@ -1950,14 +2184,15 @@ static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci)
> int nslots = 2, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR;
> asm_collectargs(as, ir, ci, args);
> for (i = 0; i < nargs; i++)
> - if (args[i] && irt_isfp(IR(args[i])->t)) {
> + if (!LJ_SOFTFP && args[i] && irt_isfp(IR(args[i])->t)) {
> if (nfpr > 0) nfpr--; else nslots = (nslots+3) & ~1;
> } else {
> if (ngpr > 0) ngpr--; else nslots++;
> }
> if (nslots > as->evenspill) /* Leave room for args in stack slots. */
> as->evenspill = nslots;
> - return irt_isfp(ir->t) ? REGSP_HINT(RID_FPRET) : REGSP_HINT(RID_RET);
> + return (!LJ_SOFTFP && irt_isfp(ir->t)) ? REGSP_HINT(RID_FPRET) :
> + REGSP_HINT(RID_RET);
> }
>
> static void asm_setup_target(ASMState *as)
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (5 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 11:58 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 14:31 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
` (14 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
This patch is a follow-up for the commit
a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
timer from lj_profile"). It moves the timer machinery to the separate
module. Unfortunately, the `profile_{un}lock()` calls for Windows and
PS3 wasn't updated to access `lj_profile_timer` structure instead of
`ProfileState`.
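For clarity, the layout the fixed macros rely on looks roughly like this
(a sketch of the POSIX flavour, not part of the diff below; the
Windows/PS3 variants differ only in the lock type):

#include <pthread.h>

/* Sketch only: the lock now lives inside the embedded timer, so the
** profile_lock()/profile_unlock() macros must go through ps->timer.
*/
typedef struct lj_profile_timer {
  pthread_mutex_t lock;    /* Moved here from ProfileState. */
  /* ... opaque timer state (id, interval, handler) ... */
} lj_profile_timer;

typedef struct ProfileState {
  /* ... other profiler state ... */
  lj_profile_timer timer;  /* profile_lock(ps) -> &ps->timer.lock */
} ProfileState;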
Also, it is a follow-up to the commit
f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
section"). Since this commit the system-dependent header <unistd.h> and
`write()`, `open()`, `close()` functions are used. They are undefining
on Windows, so this leads to error during the build.
This patch fixes the aforementioned misbehaviour. After it our fork may
be built on Windows at least.
---
src/lib_misc.c | 16 ++++++++++++----
src/lj_profile_timer.h | 8 ++++----
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/src/lib_misc.c b/src/lib_misc.c
index c18d297e..1913a622 100644
--- a/src/lib_misc.c
+++ b/src/lib_misc.c
@@ -8,10 +8,6 @@
#define lib_misc_c
#define LUA_LIB
-#include <errno.h>
-#include <fcntl.h>
-#include <unistd.h>
-
#include "lua.h"
#include "lmisclib.h"
#include "lauxlib.h"
@@ -25,6 +21,12 @@
#include "lj_memprof.h"
+#include <errno.h>
+#include <fcntl.h>
+#if !LJ_TARGET_WINDOWS
+#include <unistd.h>
+#endif
+
/* ------------------------------------------------------------------------ */
static LJ_AINLINE void setnumfield(struct lua_State *L, GCtab *t,
@@ -78,6 +80,7 @@ LJLIB_CF(misc_getmetrics)
/* --------- profile common section --------------------------------------- */
+#if !LJ_TARGET_WINDOWS
/*
** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
*/
@@ -434,6 +437,7 @@ LJLIB_CF(misc_memprof_stop)
lua_pushboolean(L, 1);
return 1;
}
+#endif /* !LJ_TARGET_WINDOWS */
#include "lj_libdef.h"
@@ -441,6 +445,7 @@ LJLIB_CF(misc_memprof_stop)
LUALIB_API int luaopen_misc(struct lua_State *L)
{
+#if !LJ_TARGET_WINDOWS
luaM_sysprof_set_writer(buffer_writer_default);
luaM_sysprof_set_on_stop(on_stop_cb_default);
/*
@@ -448,9 +453,12 @@ LUALIB_API int luaopen_misc(struct lua_State *L)
** backtracing function.
*/
luaM_sysprof_set_backtracer(NULL);
+#endif /* !LJ_TARGET_WINDOWS */
LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
+#if !LJ_TARGET_WINDOWS
LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
LJ_LIB_REG(L, LUAM_MISCLIBNAME ".sysprof", misc_sysprof);
+#endif /* !LJ_TARGET_WINDOWS */
return 1;
}
diff --git a/src/lj_profile_timer.h b/src/lj_profile_timer.h
index 1deeea53..b3e1a6e9 100644
--- a/src/lj_profile_timer.h
+++ b/src/lj_profile_timer.h
@@ -25,8 +25,8 @@
#if LJ_TARGET_PS3
#include <sys/timer.h>
#endif
-#define profile_lock(ps) pthread_mutex_lock(&ps->lock)
-#define profile_unlock(ps) pthread_mutex_unlock(&ps->lock)
+#define profile_lock(ps) pthread_mutex_lock(&ps->timer.lock)
+#define profile_unlock(ps) pthread_mutex_unlock(&ps->timer.lock)
#elif LJ_PROFILE_WTHREAD
@@ -38,8 +38,8 @@
#include <windows.h>
#endif
typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int);
-#define profile_lock(ps) EnterCriticalSection(&ps->lock)
-#define profile_unlock(ps) LeaveCriticalSection(&ps->lock)
+#define profile_lock(ps) EnterCriticalSection(&ps->timer.lock)
+#define profile_unlock(ps) LeaveCriticalSection(&ps->timer.lock)
#endif
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:58 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:40 ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:31 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:58 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few typos below.
On Wed, Aug 09, 2023 at 06:35:56PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> This patch is a follow-up for the commit
> a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
> timer from lj_profile"). It moves the timer machinery to the separate
Typo: s/to the/to a/
> module. Unfortunately, the `profile_{un}lock()` calls for Windows and
> PS3 wasn't updated to access `lj_profile_timer` structure instead of
Typo: s/wasn't/weren't/
> `ProfileState`.
>
> Also, it is a follow-up to the commit
> f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
> section"). Since this commit the system-dependent header <unistd.h> and
> `write()`, `open()`, `close()` functions are used. They are undefining
Typo: s/undefining/undefined/
> on Windows, so this leads to error during the build.
Typo: s/error/errors/
>
> This patch fixes the aforementioned misbehaviour. After it our fork may
> be built on Windows at least.
> ---
> src/lib_misc.c | 16 ++++++++++++----
> src/lj_profile_timer.h | 8 ++++----
> 2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index c18d297e..1913a622 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c
> @@ -8,10 +8,6 @@
> #define lib_misc_c
> #define LUA_LIB
>
> -#include <errno.h>
> -#include <fcntl.h>
> -#include <unistd.h>
> -
> #include "lua.h"
> #include "lmisclib.h"
> #include "lauxlib.h"
> @@ -25,6 +21,12 @@
>
> #include "lj_memprof.h"
>
> +#include <errno.h>
> +#include <fcntl.h>
> +#if !LJ_TARGET_WINDOWS
> +#include <unistd.h>
> +#endif
> +
> /* ------------------------------------------------------------------------ */
>
> static LJ_AINLINE void setnumfield(struct lua_State *L, GCtab *t,
> @@ -78,6 +80,7 @@ LJLIB_CF(misc_getmetrics)
>
> /* --------- profile common section --------------------------------------- */
>
> +#if !LJ_TARGET_WINDOWS
> /*
> ** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
> */
> @@ -434,6 +437,7 @@ LJLIB_CF(misc_memprof_stop)
> lua_pushboolean(L, 1);
> return 1;
> }
> +#endif /* !LJ_TARGET_WINDOWS */
>
> #include "lj_libdef.h"
>
> @@ -441,6 +445,7 @@ LJLIB_CF(misc_memprof_stop)
>
> LUALIB_API int luaopen_misc(struct lua_State *L)
> {
> +#if !LJ_TARGET_WINDOWS
> luaM_sysprof_set_writer(buffer_writer_default);
> luaM_sysprof_set_on_stop(on_stop_cb_default);
> /*
> @@ -448,9 +453,12 @@ LUALIB_API int luaopen_misc(struct lua_State *L)
> ** backtracing function.
> */
> luaM_sysprof_set_backtracer(NULL);
> +#endif /* !LJ_TARGET_WINDOWS */
>
> LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
> +#if !LJ_TARGET_WINDOWS
> LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
> LJ_LIB_REG(L, LUAM_MISCLIBNAME ".sysprof", misc_sysprof);
> +#endif /* !LJ_TARGET_WINDOWS */
> return 1;
> }
> diff --git a/src/lj_profile_timer.h b/src/lj_profile_timer.h
> index 1deeea53..b3e1a6e9 100644
> --- a/src/lj_profile_timer.h
> +++ b/src/lj_profile_timer.h
> @@ -25,8 +25,8 @@
> #if LJ_TARGET_PS3
> #include <sys/timer.h>
> #endif
> -#define profile_lock(ps) pthread_mutex_lock(&ps->lock)
> -#define profile_unlock(ps) pthread_mutex_unlock(&ps->lock)
> +#define profile_lock(ps) pthread_mutex_lock(&ps->timer.lock)
> +#define profile_unlock(ps) pthread_mutex_unlock(&ps->timer.lock)
>
> #elif LJ_PROFILE_WTHREAD
>
> @@ -38,8 +38,8 @@
> #include <windows.h>
> #endif
> typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int);
> -#define profile_lock(ps) EnterCriticalSection(&ps->lock)
> -#define profile_unlock(ps) LeaveCriticalSection(&ps->lock)
> +#define profile_lock(ps) EnterCriticalSection(&ps->timer.lock)
> +#define profile_unlock(ps) LeaveCriticalSection(&ps->timer.lock)
>
> #endif
>
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
2023-08-15 11:58 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:40 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:40 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few typos below.
> On Wed, Aug 09, 2023 at 06:35:56PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > This patch is a follow-up for the commit
> > a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
> > timer from lj_profile"). It moves the timer machinery to the separate
> Typo: s/to the/to a/
Fixed.
> > module. Unfortunately, the `profile_{un}lock()` calls for Windows and
> > PS3 wasn't updated to access `lj_profile_timer` structure instead of
> Typo: s/wasn't/weren't/
Fixed, thanks.
> > `ProfileState`.
> >
> > Also, it is a follow-up to the commit
> > f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
> > section"). Since this commit the system-dependent header <unistd.h> and
> > `write()`, `open()`, `close()` functions are used. They are undefining
> Typo: s/undefining/undefined/
Fixed.
> > on Windows, so this leads to error during the build.
> Typo: s/error/errors/
Fixed.
> >
> > This patch fixes the aforementioned misbehaviour. After it our fork may
> > be built on Windows at least.
> > ---
<snipped>
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
2023-08-15 11:58 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 14:31 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:31 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey
thanks for the patch! LGTM
On 8/9/23 18:35, Sergey Kaplun wrote:
> This patch is a follow-up for the commit
> a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
> timer from lj_profile"). It moves the timer machinery to the separate
> module. Unfortunately, the `profile_{un}lock()` calls for Windows and
> PS3 wasn't updated to access `lj_profile_timer` structure instead of
> `ProfileState`.
>
> Also, it is a follow-up to the commit
> f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
> section"). Since this commit the system-dependent header <unistd.h> and
> `write()`, `open()`, `close()` functions are used. They are undefining
> on Windows, so this leads to error during the build.
>
> This patch fixes the aforementioned misbehaviour. After it our fork may
> be built on Windows at least.
> ---
> src/lib_misc.c | 16 ++++++++++++----
> src/lj_profile_timer.h | 8 ++++----
> 2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index c18d297e..1913a622 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c
> @@ -8,10 +8,6 @@
> #define lib_misc_c
> #define LUA_LIB
>
> -#include <errno.h>
> -#include <fcntl.h>
> -#include <unistd.h>
> -
> #include "lua.h"
> #include "lmisclib.h"
> #include "lauxlib.h"
> @@ -25,6 +21,12 @@
>
> #include "lj_memprof.h"
>
> +#include <errno.h>
> +#include <fcntl.h>
> +#if !LJ_TARGET_WINDOWS
> +#include <unistd.h>
> +#endif
> +
> /* ------------------------------------------------------------------------ */
>
> static LJ_AINLINE void setnumfield(struct lua_State *L, GCtab *t,
> @@ -78,6 +80,7 @@ LJLIB_CF(misc_getmetrics)
>
> /* --------- profile common section --------------------------------------- */
>
> +#if !LJ_TARGET_WINDOWS
> /*
> ** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
> */
> @@ -434,6 +437,7 @@ LJLIB_CF(misc_memprof_stop)
> lua_pushboolean(L, 1);
> return 1;
> }
> +#endif /* !LJ_TARGET_WINDOWS */
>
> #include "lj_libdef.h"
>
> @@ -441,6 +445,7 @@ LJLIB_CF(misc_memprof_stop)
>
> LUALIB_API int luaopen_misc(struct lua_State *L)
> {
> +#if !LJ_TARGET_WINDOWS
> luaM_sysprof_set_writer(buffer_writer_default);
> luaM_sysprof_set_on_stop(on_stop_cb_default);
> /*
> @@ -448,9 +453,12 @@ LUALIB_API int luaopen_misc(struct lua_State *L)
> ** backtracing function.
> */
> luaM_sysprof_set_backtracer(NULL);
> +#endif /* !LJ_TARGET_WINDOWS */
>
> LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
> +#if !LJ_TARGET_WINDOWS
> LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
> LJ_LIB_REG(L, LUAM_MISCLIBNAME ".sysprof", misc_sysprof);
> +#endif /* !LJ_TARGET_WINDOWS */
> return 1;
> }
> diff --git a/src/lj_profile_timer.h b/src/lj_profile_timer.h
> index 1deeea53..b3e1a6e9 100644
> --- a/src/lj_profile_timer.h
> +++ b/src/lj_profile_timer.h
> @@ -25,8 +25,8 @@
> #if LJ_TARGET_PS3
> #include <sys/timer.h>
> #endif
> -#define profile_lock(ps) pthread_mutex_lock(&ps->lock)
> -#define profile_unlock(ps) pthread_mutex_unlock(&ps->lock)
> +#define profile_lock(ps) pthread_mutex_lock(&ps->timer.lock)
> +#define profile_unlock(ps) pthread_mutex_unlock(&ps->timer.lock)
>
> #elif LJ_PROFILE_WTHREAD
>
> @@ -38,8 +38,8 @@
> #include <windows.h>
> #endif
> typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int);
> -#define profile_lock(ps) EnterCriticalSection(&ps->lock)
> -#define profile_unlock(ps) LeaveCriticalSection(&ps->lock)
> +#define profile_lock(ps) EnterCriticalSection(&ps->timer.lock)
> +#define profile_unlock(ps) LeaveCriticalSection(&ps->timer.lock)
>
> #endif
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (6 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 12:09 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 16:40 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes Sergey Kaplun via Tarantool-patches
` (13 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Contributed by Ben Pye.
(cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
This patch adds partial support for the Universal Windows Platform [1]
in LuaJIT.
This includes:
* `LJ_TARGET_UWP` is introduced to mark that target is Universal Windows
Platform.
* `LJ_WIN_VALLOC()` macro is introduced to use instead of
`VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
* `LJ_WIN_VPROTECT()` macro is introduced to use instead of
`VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
* `LJ_WIN_LOADLIBA()` macro is introduced to use instead of
`LoadLibraryExA()` [6] (custom implementation using
`LoadPackagedLibrary()` [7] for UWP).
Note that the following features are not implemented for UWP:
* `io.popen()`.
* LuaJIT profiler's (`jit.p`) timer for Windows has not very high
resolution since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are
not used, because the <winmm.dll> library isn't loaded.
[1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
[2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
[3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
[4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
[5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
[6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
[7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
[8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
[9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
Sergey Kaplun:
* added the description for the feature
Part of tarantool/tarantool#8825
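---
As an illustration of how the new wrappers are meant to be used, below is a
minimal sketch that mirrors the `callback_mcode_new()` hunk further down.
`alloc_exec_block()` is a hypothetical helper and not part of the patch:

#include <windows.h>
#include "lj_arch.h"  /* Defines LJ_WIN_VALLOC()/LJ_WIN_VPROTECT(), see below. */

/* Hypothetical helper, for illustration only: allocate a block, fill it
** with machine code, then flip it to executable. On UWP builds the macros
** expand to VirtualAllocFromApp()/VirtualProtectFromApp(), otherwise to
** VirtualAlloc()/VirtualProtect(). */
static void *alloc_exec_block(size_t sz)
{
  DWORD oprot;
  void *p = LJ_WIN_VALLOC(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
  if (p == NULL)
    return NULL;
  /* ... emit machine code into p here ... */
  LJ_WIN_VPROTECT(p, sz, PAGE_EXECUTE_READ, &oprot);
  return p;
}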
---
doc/ext_ffi_api.html | 2 ++
src/lib_ffi.c | 3 +++
src/lib_io.c | 4 ++--
src/lib_package.c | 24 +++++++++++++++++++++++-
src/lj_alloc.c | 6 +++---
src/lj_arch.h | 19 +++++++++++++++++++
src/lj_ccallback.c | 4 ++--
src/lj_clib.c | 20 ++++++++++++++++----
src/lj_mcode.c | 8 ++++----
src/lj_profile_timer.c | 8 ++++----
10 files changed, 78 insertions(+), 20 deletions(-)
diff --git a/doc/ext_ffi_api.html b/doc/ext_ffi_api.html
index 91af2e1d..c72191d1 100644
--- a/doc/ext_ffi_api.html
+++ b/doc/ext_ffi_api.html
@@ -469,6 +469,8 @@ otherwise. The following parameters are currently defined:
<tr class="odd">
<td class="abiparam">win</td><td class="abidesc">Windows variant of the standard ABI</td></tr>
<tr class="even">
+<td class="abiparam">uwp</td><td class="abidesc">Universal Windows Platform</td></tr>
+<tr class="odd">
<td class="abiparam">gc64</td><td class="abidesc">64 bit GC references</td></tr>
</table>
diff --git a/src/lib_ffi.c b/src/lib_ffi.c
index 136e98e8..d1fe1a14 100644
--- a/src/lib_ffi.c
+++ b/src/lib_ffi.c
@@ -746,6 +746,9 @@ LJLIB_CF(ffi_abi) LJLIB_REC(.)
#endif
#if LJ_ABI_WIN
case H_(4ab624a8,4ab624a8): b = 1; break; /* win */
+#endif
+#if LJ_TARGET_UWP
+ case H_(a40f0bcb,a40f0bcb): b = 1; break; /* uwp */
#endif
case H_(3af93066,1f001464): b = 1; break; /* le/be */
#if LJ_GC64
diff --git a/src/lib_io.c b/src/lib_io.c
index f0108227..db995ae6 100644
--- a/src/lib_io.c
+++ b/src/lib_io.c
@@ -99,7 +99,7 @@ static int io_file_close(lua_State *L, IOFileUD *iof)
int stat = -1;
#if LJ_TARGET_POSIX
stat = pclose(iof->fp);
-#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE
+#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP
stat = _pclose(iof->fp);
#else
lua_assert(0);
@@ -414,7 +414,7 @@ LJLIB_CF(io_open)
LJLIB_CF(io_popen)
{
-#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE)
+#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP)
const char *fname = strdata(lj_lib_checkstr(L, 1));
GCstr *s = lj_lib_optstr(L, 2);
const char *mode = s ? strdata(s) : "r";
diff --git a/src/lib_package.c b/src/lib_package.c
index 67959a10..b49f0209 100644
--- a/src/lib_package.c
+++ b/src/lib_package.c
@@ -76,6 +76,20 @@ static const char *ll_bcsym(void *lib, const char *sym)
BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
#endif
+#if LJ_TARGET_UWP
+void *LJ_WIN_LOADLIBA(const char *path)
+{
+ DWORD err = GetLastError();
+ wchar_t wpath[256];
+ HANDLE lib = NULL;
+ if (MultiByteToWideChar(CP_ACP, 0, path, -1, wpath, 256) > 0) {
+ lib = LoadPackagedLibrary(wpath, 0);
+ }
+ SetLastError(err);
+ return lib;
+}
+#endif
+
#undef setprogdir
static void setprogdir(lua_State *L)
@@ -119,7 +133,7 @@ static void ll_unloadlib(void *lib)
static void *ll_load(lua_State *L, const char *path, int gl)
{
- HINSTANCE lib = LoadLibraryExA(path, NULL, 0);
+ HINSTANCE lib = LJ_WIN_LOADLIBA(path);
if (lib == NULL) pusherror(L);
UNUSED(gl);
return lib;
@@ -132,17 +146,25 @@ static lua_CFunction ll_sym(lua_State *L, void *lib, const char *sym)
return f;
}
+#if LJ_TARGET_UWP
+EXTERN_C IMAGE_DOS_HEADER __ImageBase;
+#endif
+
static const char *ll_bcsym(void *lib, const char *sym)
{
if (lib) {
return (const char *)GetProcAddress((HINSTANCE)lib, sym);
} else {
+#if LJ_TARGET_UWP
+ return (const char *)GetProcAddress((HINSTANCE)&__ImageBase, sym);
+#else
HINSTANCE h = GetModuleHandleA(NULL);
const char *p = (const char *)GetProcAddress(h, sym);
if (p == NULL && GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
(const char *)ll_bcsym, &h))
p = (const char *)GetProcAddress(h, sym);
return p;
+#endif
}
}
diff --git a/src/lj_alloc.c b/src/lj_alloc.c
index f7039b5b..9e2fb1f6 100644
--- a/src/lj_alloc.c
+++ b/src/lj_alloc.c
@@ -167,7 +167,7 @@ static void *DIRECT_MMAP(size_t size)
static void *CALL_MMAP(size_t size)
{
DWORD olderr = GetLastError();
- void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
+ void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
SetLastError(olderr);
return ptr ? ptr : MFAIL;
}
@@ -176,8 +176,8 @@ static void *CALL_MMAP(size_t size)
static void *DIRECT_MMAP(size_t size)
{
DWORD olderr = GetLastError();
- void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
- PAGE_READWRITE);
+ void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
+ PAGE_READWRITE);
SetLastError(olderr);
return ptr ? ptr : MFAIL;
}
diff --git a/src/lj_arch.h b/src/lj_arch.h
index 7397492e..0351e046 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -141,6 +141,13 @@
#define LJ_TARGET_GC64 1
#endif
+#ifdef _UWP
+#define LJ_TARGET_UWP 1
+#if LUAJIT_TARGET == LUAJIT_ARCH_X64
+#define LJ_TARGET_GC64 1
+#endif
+#endif
+
#define LJ_NUMMODE_SINGLE 0 /* Single-number mode only. */
#define LJ_NUMMODE_SINGLE_DUAL 1 /* Default to single-number mode. */
#define LJ_NUMMODE_DUAL 2 /* Dual-number mode only. */
@@ -586,6 +593,18 @@
#define LJ_ABI_WIN 0
#endif
+#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_UWP
+#define LJ_WIN_VALLOC VirtualAllocFromApp
+#define LJ_WIN_VPROTECT VirtualProtectFromApp
+extern void *LJ_WIN_LOADLIBA(const char *path);
+#else
+#define LJ_WIN_VALLOC VirtualAlloc
+#define LJ_WIN_VPROTECT VirtualProtect
+#define LJ_WIN_LOADLIBA(path) LoadLibraryExA((path), NULL, 0)
+#endif
+#endif
+
#if defined(LUAJIT_NO_UNWIND) || __GNU_COMPACT_EH__ || defined(__symbian__) || LJ_TARGET_IOS || LJ_TARGET_PS3 || LJ_TARGET_PS4
#define LJ_NO_UNWIND 1
#endif
diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
index c33190d7..37edd00f 100644
--- a/src/lj_ccallback.c
+++ b/src/lj_ccallback.c
@@ -267,7 +267,7 @@ static void callback_mcode_new(CTState *cts)
if (CALLBACK_MAX_SLOT == 0)
lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
#if LJ_TARGET_WINDOWS
- p = VirtualAlloc(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
+ p = LJ_WIN_VALLOC(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
if (!p)
lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
#elif LJ_TARGET_POSIX
@@ -285,7 +285,7 @@ static void callback_mcode_new(CTState *cts)
#if LJ_TARGET_WINDOWS
{
DWORD oprot;
- VirtualProtect(p, sz, PAGE_EXECUTE_READ, &oprot);
+ LJ_WIN_VPROTECT(p, sz, PAGE_EXECUTE_READ, &oprot);
}
#elif LJ_TARGET_POSIX
mprotect(p, sz, (PROT_READ|PROT_EXEC));
diff --git a/src/lj_clib.c b/src/lj_clib.c
index c06c0915..a8672052 100644
--- a/src/lj_clib.c
+++ b/src/lj_clib.c
@@ -158,11 +158,13 @@ BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
/* Default libraries. */
enum {
CLIB_HANDLE_EXE,
+#if !LJ_TARGET_UWP
CLIB_HANDLE_DLL,
CLIB_HANDLE_CRT,
CLIB_HANDLE_KERNEL32,
CLIB_HANDLE_USER32,
CLIB_HANDLE_GDI32,
+#endif
CLIB_HANDLE_MAX
};
@@ -208,7 +210,7 @@ static const char *clib_extname(lua_State *L, const char *name)
static void *clib_loadlib(lua_State *L, const char *name, int global)
{
DWORD oldwerr = GetLastError();
- void *h = (void *)LoadLibraryExA(clib_extname(L, name), NULL, 0);
+ void *h = LJ_WIN_LOADLIBA(clib_extname(L, name));
if (!h) clib_error(L, "cannot load module " LUA_QS ": %s", name);
SetLastError(oldwerr);
UNUSED(global);
@@ -218,6 +220,7 @@ static void *clib_loadlib(lua_State *L, const char *name, int global)
static void clib_unloadlib(CLibrary *cl)
{
if (cl->handle == CLIB_DEFHANDLE) {
+#if !LJ_TARGET_UWP
MSize i;
for (i = CLIB_HANDLE_KERNEL32; i < CLIB_HANDLE_MAX; i++) {
void *h = clib_def_handle[i];
@@ -226,11 +229,16 @@ static void clib_unloadlib(CLibrary *cl)
FreeLibrary((HINSTANCE)h);
}
}
+#endif
} else if (cl->handle) {
FreeLibrary((HINSTANCE)cl->handle);
}
}
+#if LJ_TARGET_UWP
+EXTERN_C IMAGE_DOS_HEADER __ImageBase;
+#endif
+
static void *clib_getsym(CLibrary *cl, const char *name)
{
void *p = NULL;
@@ -239,6 +247,9 @@ static void *clib_getsym(CLibrary *cl, const char *name)
for (i = 0; i < CLIB_HANDLE_MAX; i++) {
HINSTANCE h = (HINSTANCE)clib_def_handle[i];
if (!(void *)h) { /* Resolve default library handles (once). */
+#if LJ_TARGET_UWP
+ h = (HINSTANCE)&__ImageBase;
+#else
switch (i) {
case CLIB_HANDLE_EXE: GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &h); break;
case CLIB_HANDLE_DLL:
@@ -249,11 +260,12 @@ static void *clib_getsym(CLibrary *cl, const char *name)
GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
(const char *)&_fmode, &h);
break;
- case CLIB_HANDLE_KERNEL32: h = LoadLibraryExA("kernel32.dll", NULL, 0); break;
- case CLIB_HANDLE_USER32: h = LoadLibraryExA("user32.dll", NULL, 0); break;
- case CLIB_HANDLE_GDI32: h = LoadLibraryExA("gdi32.dll", NULL, 0); break;
+ case CLIB_HANDLE_KERNEL32: h = LJ_WIN_LOADLIBA("kernel32.dll"); break;
+ case CLIB_HANDLE_USER32: h = LJ_WIN_LOADLIBA("user32.dll"); break;
+ case CLIB_HANDLE_GDI32: h = LJ_WIN_LOADLIBA("gdi32.dll"); break;
}
if (!h) continue;
+#endif
clib_def_handle[i] = (void *)h;
}
p = (void *)GetProcAddress(h, name);
diff --git a/src/lj_mcode.c b/src/lj_mcode.c
index c6361018..10db4457 100644
--- a/src/lj_mcode.c
+++ b/src/lj_mcode.c
@@ -66,8 +66,8 @@ void lj_mcode_sync(void *start, void *end)
static void *mcode_alloc_at(jit_State *J, uintptr_t hint, size_t sz, DWORD prot)
{
- void *p = VirtualAlloc((void *)hint, sz,
- MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
+ void *p = LJ_WIN_VALLOC((void *)hint, sz,
+ MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
if (!p && !hint)
lj_trace_err(J, LJ_TRERR_MCODEAL);
return p;
@@ -82,7 +82,7 @@ static void mcode_free(jit_State *J, void *p, size_t sz)
static int mcode_setprot(void *p, size_t sz, DWORD prot)
{
DWORD oprot;
- return !VirtualProtect(p, sz, prot, &oprot);
+ return !LJ_WIN_VPROTECT(p, sz, prot, &oprot);
}
#elif LJ_TARGET_POSIX
@@ -255,7 +255,7 @@ static void *mcode_alloc(jit_State *J, size_t sz)
/* All memory addresses are reachable by relative jumps. */
static void *mcode_alloc(jit_State *J, size_t sz)
{
-#ifdef __OpenBSD__
+#if defined(__OpenBSD__) || LJ_TARGET_UWP
/* Allow better executable memory allocation for OpenBSD W^X mode. */
void *p = mcode_alloc_at(J, 0, sz, MCPROT_RUN);
if (p && mcode_setprot(p, sz, MCPROT_GEN)) {
diff --git a/src/lj_profile_timer.c b/src/lj_profile_timer.c
index 056fd1f7..0b859457 100644
--- a/src/lj_profile_timer.c
+++ b/src/lj_profile_timer.c
@@ -84,7 +84,7 @@ static DWORD WINAPI timer_thread(void *timerx)
{
lj_profile_timer *timer = (lj_profile_timer *)timerx;
int interval = timer->opt.interval_msec;
-#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
timer->wmm_tbp(interval);
#endif
while (1) {
@@ -92,7 +92,7 @@ static DWORD WINAPI timer_thread(void *timerx)
if (timer->abort) break;
timer->opt.handler();
}
-#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
timer->wmm_tep(interval);
#endif
return 0;
@@ -101,9 +101,9 @@ static DWORD WINAPI timer_thread(void *timerx)
/* Start profiling timer thread. */
void lj_profile_timer_start(lj_profile_timer *timer)
{
-#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
if (!timer->wmm) { /* Load WinMM library on-demand. */
- timer->wmm = LoadLibraryExA("winmm.dll", NULL, 0);
+ timer->wmm = LJ_WIN_LOADLIBA("winmm.dll");
if (timer->wmm) {
timer->wmm_tbp =
(WMM_TPFUNC)GetProcAddress(timer->wmm, "timeBeginPeriod");
--
2.41.0
* Re: [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
@ 2023-08-15 12:09 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:50 ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:40 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 12:09 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:57PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Contributed by Ben Pye.
>
> (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
>
> This patch adds partial support for the Universal Windows Platform [1]
> in LuaJIT.
> This includes:
> * `LJ_TARGET_UWP` is introduced to mark that target is Universal Windows
Typo: s/is Universal/is the Universal/
> Platform.
> * `LJ_WIN_VALLOC()` macro is introduced to use instead of
Typo: s/to use/to be used/
> `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
> * `LJ_WIN_VPROTECT()` macro is introduced to use instead of
Typo: s/to use/to be used/
> `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
> * `LJ_WIN_LOADLIBA()` macro is introduced to use instead of
Typo: s/to use/to be used/
> `LoadLibraryExA()` [6] (custom implementation using
> `LoadPackagedLibrary()` [7] for UWP).
>
> Note that the following features are not implemented for UWP:
> * `io.popen()`.
> * LuaJIT profiler's (`jit.p`) timer for Windows has not very high
> resolution since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are
Typo: s/not very high/a low/
> not used, because the <winmm.dll> library isn't loaded.
>
> [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
> [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
> [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
> [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
> [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
> [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
> [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
> [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
> [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> doc/ext_ffi_api.html | 2 ++
> src/lib_ffi.c | 3 +++
> src/lib_io.c | 4 ++--
> src/lib_package.c | 24 +++++++++++++++++++++++-
> src/lj_alloc.c | 6 +++---
> src/lj_arch.h | 19 +++++++++++++++++++
> src/lj_ccallback.c | 4 ++--
> src/lj_clib.c | 20 ++++++++++++++++----
> src/lj_mcode.c | 8 ++++----
> src/lj_profile_timer.c | 8 ++++----
> 10 files changed, 78 insertions(+), 20 deletions(-)
>
> diff --git a/doc/ext_ffi_api.html b/doc/ext_ffi_api.html
> index 91af2e1d..c72191d1 100644
> --- a/doc/ext_ffi_api.html
> +++ b/doc/ext_ffi_api.html
> @@ -469,6 +469,8 @@ otherwise. The following parameters are currently defined:
> <tr class="odd">
> <td class="abiparam">win</td><td class="abidesc">Windows variant of the standard ABI</td></tr>
> <tr class="even">
> +<td class="abiparam">uwp</td><td class="abidesc">Universal Windows Platform</td></tr>
> +<tr class="odd">
> <td class="abiparam">gc64</td><td class="abidesc">64 bit GC references</td></tr>
> </table>
>
> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
> index 136e98e8..d1fe1a14 100644
> --- a/src/lib_ffi.c
> +++ b/src/lib_ffi.c
> @@ -746,6 +746,9 @@ LJLIB_CF(ffi_abi) LJLIB_REC(.)
> #endif
> #if LJ_ABI_WIN
> case H_(4ab624a8,4ab624a8): b = 1; break; /* win */
> +#endif
> +#if LJ_TARGET_UWP
> + case H_(a40f0bcb,a40f0bcb): b = 1; break; /* uwp */
> #endif
It is not obvious what happens here and it is not mentioned in the commit message.
Please add a description of this change too.
> case H_(3af93066,1f001464): b = 1; break; /* le/be */
> #if LJ_GC64
> diff --git a/src/lib_io.c b/src/lib_io.c
> index f0108227..db995ae6 100644
> --- a/src/lib_io.c
> +++ b/src/lib_io.c
> @@ -99,7 +99,7 @@ static int io_file_close(lua_State *L, IOFileUD *iof)
> int stat = -1;
> #if LJ_TARGET_POSIX
> stat = pclose(iof->fp);
> -#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE
> +#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP
> stat = _pclose(iof->fp);
> #else
> lua_assert(0);
> @@ -414,7 +414,7 @@ LJLIB_CF(io_open)
>
> LJLIB_CF(io_popen)
> {
> -#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE)
> +#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP)
> const char *fname = strdata(lj_lib_checkstr(L, 1));
> GCstr *s = lj_lib_optstr(L, 2);
> const char *mode = s ? strdata(s) : "r";
> diff --git a/src/lib_package.c b/src/lib_package.c
> index 67959a10..b49f0209 100644
> --- a/src/lib_package.c
> +++ b/src/lib_package.c
> @@ -76,6 +76,20 @@ static const char *ll_bcsym(void *lib, const char *sym)
> BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
> #endif
>
> +#if LJ_TARGET_UWP
> +void *LJ_WIN_LOADLIBA(const char *path)
> +{
> + DWORD err = GetLastError();
> + wchar_t wpath[256];
> + HANDLE lib = NULL;
> + if (MultiByteToWideChar(CP_ACP, 0, path, -1, wpath, 256) > 0) {
> + lib = LoadPackagedLibrary(wpath, 0);
> + }
> + SetLastError(err);
> + return lib;
> +}
> +#endif
> +
> #undef setprogdir
>
> static void setprogdir(lua_State *L)
> @@ -119,7 +133,7 @@ static void ll_unloadlib(void *lib)
>
> static void *ll_load(lua_State *L, const char *path, int gl)
> {
> - HINSTANCE lib = LoadLibraryExA(path, NULL, 0);
> + HINSTANCE lib = LJ_WIN_LOADLIBA(path);
> if (lib == NULL) pusherror(L);
> UNUSED(gl);
> return lib;
> @@ -132,17 +146,25 @@ static lua_CFunction ll_sym(lua_State *L, void *lib, const char *sym)
> return f;
> }
>
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
> static const char *ll_bcsym(void *lib, const char *sym)
> {
> if (lib) {
> return (const char *)GetProcAddress((HINSTANCE)lib, sym);
> } else {
> +#if LJ_TARGET_UWP
> + return (const char *)GetProcAddress((HINSTANCE)&__ImageBase, sym);
> +#else
> HINSTANCE h = GetModuleHandleA(NULL);
> const char *p = (const char *)GetProcAddress(h, sym);
> if (p == NULL && GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
> (const char *)ll_bcsym, &h))
> p = (const char *)GetProcAddress(h, sym);
> return p;
> +#endif
> }
> }
>
> diff --git a/src/lj_alloc.c b/src/lj_alloc.c
> index f7039b5b..9e2fb1f6 100644
> --- a/src/lj_alloc.c
> +++ b/src/lj_alloc.c
> @@ -167,7 +167,7 @@ static void *DIRECT_MMAP(size_t size)
> static void *CALL_MMAP(size_t size)
> {
> DWORD olderr = GetLastError();
> - void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> + void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> SetLastError(olderr);
> return ptr ? ptr : MFAIL;
> }
> @@ -176,8 +176,8 @@ static void *CALL_MMAP(size_t size)
> static void *DIRECT_MMAP(size_t size)
> {
> DWORD olderr = GetLastError();
> - void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> - PAGE_READWRITE);
> + void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> + PAGE_READWRITE);
> SetLastError(olderr);
> return ptr ? ptr : MFAIL;
> }
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 7397492e..0351e046 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -141,6 +141,13 @@
> #define LJ_TARGET_GC64 1
> #endif
>
> +#ifdef _UWP
> +#define LJ_TARGET_UWP 1
> +#if LUAJIT_TARGET == LUAJIT_ARCH_X64
> +#define LJ_TARGET_GC64 1
> +#endif
> +#endif
> +
> #define LJ_NUMMODE_SINGLE 0 /* Single-number mode only. */
> #define LJ_NUMMODE_SINGLE_DUAL 1 /* Default to single-number mode. */
> #define LJ_NUMMODE_DUAL 2 /* Dual-number mode only. */
> @@ -586,6 +593,18 @@
> #define LJ_ABI_WIN 0
> #endif
>
> +#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_UWP
> +#define LJ_WIN_VALLOC VirtualAllocFromApp
> +#define LJ_WIN_VPROTECT VirtualProtectFromApp
> +extern void *LJ_WIN_LOADLIBA(const char *path);
> +#else
> +#define LJ_WIN_VALLOC VirtualAlloc
> +#define LJ_WIN_VPROTECT VirtualProtect
> +#define LJ_WIN_LOADLIBA(path) LoadLibraryExA((path), NULL, 0)
> +#endif
> +#endif
> +
> #if defined(LUAJIT_NO_UNWIND) || __GNU_COMPACT_EH__ || defined(__symbian__) || LJ_TARGET_IOS || LJ_TARGET_PS3 || LJ_TARGET_PS4
> #define LJ_NO_UNWIND 1
> #endif
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index c33190d7..37edd00f 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -267,7 +267,7 @@ static void callback_mcode_new(CTState *cts)
> if (CALLBACK_MAX_SLOT == 0)
> lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
> #if LJ_TARGET_WINDOWS
> - p = VirtualAlloc(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> + p = LJ_WIN_VALLOC(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> if (!p)
> lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
> #elif LJ_TARGET_POSIX
> @@ -285,7 +285,7 @@ static void callback_mcode_new(CTState *cts)
> #if LJ_TARGET_WINDOWS
> {
> DWORD oprot;
> - VirtualProtect(p, sz, PAGE_EXECUTE_READ, &oprot);
> + LJ_WIN_VPROTECT(p, sz, PAGE_EXECUTE_READ, &oprot);
> }
> #elif LJ_TARGET_POSIX
> mprotect(p, sz, (PROT_READ|PROT_EXEC));
> diff --git a/src/lj_clib.c b/src/lj_clib.c
> index c06c0915..a8672052 100644
> --- a/src/lj_clib.c
> +++ b/src/lj_clib.c
> @@ -158,11 +158,13 @@ BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
> /* Default libraries. */
> enum {
> CLIB_HANDLE_EXE,
> +#if !LJ_TARGET_UWP
> CLIB_HANDLE_DLL,
> CLIB_HANDLE_CRT,
> CLIB_HANDLE_KERNEL32,
> CLIB_HANDLE_USER32,
> CLIB_HANDLE_GDI32,
> +#endif
> CLIB_HANDLE_MAX
> };
>
> @@ -208,7 +210,7 @@ static const char *clib_extname(lua_State *L, const char *name)
> static void *clib_loadlib(lua_State *L, const char *name, int global)
> {
> DWORD oldwerr = GetLastError();
> - void *h = (void *)LoadLibraryExA(clib_extname(L, name), NULL, 0);
> + void *h = LJ_WIN_LOADLIBA(clib_extname(L, name));
> if (!h) clib_error(L, "cannot load module " LUA_QS ": %s", name);
> SetLastError(oldwerr);
> UNUSED(global);
> @@ -218,6 +220,7 @@ static void *clib_loadlib(lua_State *L, const char *name, int global)
> static void clib_unloadlib(CLibrary *cl)
> {
> if (cl->handle == CLIB_DEFHANDLE) {
> +#if !LJ_TARGET_UWP
> MSize i;
> for (i = CLIB_HANDLE_KERNEL32; i < CLIB_HANDLE_MAX; i++) {
> void *h = clib_def_handle[i];
> @@ -226,11 +229,16 @@ static void clib_unloadlib(CLibrary *cl)
> FreeLibrary((HINSTANCE)h);
> }
> }
> +#endif
> } else if (cl->handle) {
> FreeLibrary((HINSTANCE)cl->handle);
> }
> }
>
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
> static void *clib_getsym(CLibrary *cl, const char *name)
> {
> void *p = NULL;
> @@ -239,6 +247,9 @@ static void *clib_getsym(CLibrary *cl, const char *name)
> for (i = 0; i < CLIB_HANDLE_MAX; i++) {
> HINSTANCE h = (HINSTANCE)clib_def_handle[i];
> if (!(void *)h) { /* Resolve default library handles (once). */
> +#if LJ_TARGET_UWP
> + h = (HINSTANCE)&__ImageBase;
> +#else
> switch (i) {
> case CLIB_HANDLE_EXE: GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &h); break;
> case CLIB_HANDLE_DLL:
> @@ -249,11 +260,12 @@ static void *clib_getsym(CLibrary *cl, const char *name)
> GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
> (const char *)&_fmode, &h);
> break;
> - case CLIB_HANDLE_KERNEL32: h = LoadLibraryExA("kernel32.dll", NULL, 0); break;
> - case CLIB_HANDLE_USER32: h = LoadLibraryExA("user32.dll", NULL, 0); break;
> - case CLIB_HANDLE_GDI32: h = LoadLibraryExA("gdi32.dll", NULL, 0); break;
> + case CLIB_HANDLE_KERNEL32: h = LJ_WIN_LOADLIBA("kernel32.dll"); break;
> + case CLIB_HANDLE_USER32: h = LJ_WIN_LOADLIBA("user32.dll"); break;
> + case CLIB_HANDLE_GDI32: h = LJ_WIN_LOADLIBA("gdi32.dll"); break;
> }
> if (!h) continue;
> +#endif
> clib_def_handle[i] = (void *)h;
> }
> p = (void *)GetProcAddress(h, name);
> diff --git a/src/lj_mcode.c b/src/lj_mcode.c
> index c6361018..10db4457 100644
> --- a/src/lj_mcode.c
> +++ b/src/lj_mcode.c
> @@ -66,8 +66,8 @@ void lj_mcode_sync(void *start, void *end)
>
> static void *mcode_alloc_at(jit_State *J, uintptr_t hint, size_t sz, DWORD prot)
> {
> - void *p = VirtualAlloc((void *)hint, sz,
> - MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
> + void *p = LJ_WIN_VALLOC((void *)hint, sz,
> + MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
> if (!p && !hint)
> lj_trace_err(J, LJ_TRERR_MCODEAL);
> return p;
> @@ -82,7 +82,7 @@ static void mcode_free(jit_State *J, void *p, size_t sz)
> static int mcode_setprot(void *p, size_t sz, DWORD prot)
> {
> DWORD oprot;
> - return !VirtualProtect(p, sz, prot, &oprot);
> + return !LJ_WIN_VPROTECT(p, sz, prot, &oprot);
> }
>
> #elif LJ_TARGET_POSIX
> @@ -255,7 +255,7 @@ static void *mcode_alloc(jit_State *J, size_t sz)
> /* All memory addresses are reachable by relative jumps. */
> static void *mcode_alloc(jit_State *J, size_t sz)
> {
> -#ifdef __OpenBSD__
> +#if defined(__OpenBSD__) || LJ_TARGET_UWP
> /* Allow better executable memory allocation for OpenBSD W^X mode. */
> void *p = mcode_alloc_at(J, 0, sz, MCPROT_RUN);
> if (p && mcode_setprot(p, sz, MCPROT_GEN)) {
> diff --git a/src/lj_profile_timer.c b/src/lj_profile_timer.c
> index 056fd1f7..0b859457 100644
> --- a/src/lj_profile_timer.c
> +++ b/src/lj_profile_timer.c
> @@ -84,7 +84,7 @@ static DWORD WINAPI timer_thread(void *timerx)
> {
> lj_profile_timer *timer = (lj_profile_timer *)timerx;
> int interval = timer->opt.interval_msec;
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
> timer->wmm_tbp(interval);
> #endif
> while (1) {
> @@ -92,7 +92,7 @@ static DWORD WINAPI timer_thread(void *timerx)
> if (timer->abort) break;
> timer->opt.handler();
> }
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
> timer->wmm_tep(interval);
> #endif
> return 0;
> @@ -101,9 +101,9 @@ static DWORD WINAPI timer_thread(void *timerx)
> /* Start profiling timer thread. */
> void lj_profile_timer_start(lj_profile_timer *timer)
> {
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
> if (!timer->wmm) { /* Load WinMM library on-demand. */
> - timer->wmm = LoadLibraryExA("winmm.dll", NULL, 0);
> + timer->wmm = LJ_WIN_LOADLIBA("winmm.dll");
> if (timer->wmm) {
> timer->wmm_tbp =
> (WMM_TPFUNC)GetProcAddress(timer->wmm, "timeBeginPeriod");
> --
> 2.41.0
>
* Re: [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
2023-08-15 12:09 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:50 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:50 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:57PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Ben Pye.
> >
> > (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
> >
> > This patch adds partial support for the Universal Windows Platform [1]
> > in LuaJIT.
> > This includes:
> > * `LJ_TARGET_UWP` is introduced to mark that target is Universal Windows
> Typo: s/is Universal/is the Universal/
Fixed.
> > Platform.
> > * `LJ_WIN_VALLOC()` macro is introduced to use instead of
> Typo: s/to use/to be used/
Fixed.
> > `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
> > * `LJ_WIN_VPROTECT()` macro is introduced to use instead of
> Typo: s/to use/to be used/
Fixed.
> > `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
> > * `LJ_WIN_LOADLIBA()` macro is introduced to use instead of
> Typo: s/to use/to be used/
Fixed.
> > `LoadLibraryExA()` [6] (custom implementation using
> > `LoadPackagedLibrary()` [7] for UWP).
> >
> > Note that the following features are not implemented for UWP:
> > * `io.popen()`.
> > * LuaJIT profiler's (`jit.p`) timer for Windows has not very high
> > resolution since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are
> Typo: s/not very high/a low/
Fixed.
> > not used, because the <winmm.dll> library isn't loaded.
> >
> > [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
> > [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
> > [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
> > [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
> > [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
> > [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
> > [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
> > [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
> > [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
> >
> > Sergey Kaplun:
> > * added the description for the feature
> >
> > Part of tarantool/tarantool#8825
> > ---
<snipped>
> > +#if LJ_TARGET_UWP
> > + case H_(a40f0bcb,a40f0bcb): b = 1; break; /* uwp */
> > #endif
> It is not obvious what happens here and it is not mentioned in the commit message.
> Please add a description of this change too.
Added.
> > case H_(3af93066,1f001464): b = 1; break; /* le/be */
The new commit message is the following:
| Windows: Add UWP support, part 1.
|
| Contributed by Ben Pye.
|
| (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
|
| This patch adds partial support for the Universal Windows Platform [1]
| in LuaJIT.
| This includes:
| * `LJ_TARGET_UWP` is introduced to mark that target is the Universal
| Windows Platform.
| * `LJ_WIN_VALLOC()` macro is introduced to be used instead of
| `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
| * `LJ_WIN_VPROTECT()` macro is introduced to be used instead of
| `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
| * `LJ_WIN_LOADLIBA()` macro is introduced to be used instead of
| `LoadLibraryExA()` [6] (custom implementation using
| `LoadPackagedLibrary()` [7] for UWP).
| * Now `ffi.abi()` also provides information about the "uwp" parameter
|   for the target ABI.
|
| Note that the following features are not implemented for UWP:
| * `io.popen()`.
| * LuaJIT profiler's (`jit.p`) timer for Windows has a low resolution
| since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are not used,
| because the <winmm.dll> library isn't loaded.
|
| [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
| [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
| [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
| [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
| [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
| [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
| [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
| [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
| [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
|
| Sergey Kaplun:
| * added the description for the feature
|
| Part of tarantool/tarantool#8825
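A short aside on the new `ffi.abi()` bullet above: the added case label is
just the precomputed hash of the string "uwp", picked per endianness by the
`H_` helper already defined in <src/lib_ffi.c>. Below is a standalone sketch
of the idea. It is an illustration only: `abi_is_uwp()` is a made-up name,
the stubbed `LJ_ENDIAN_SELECT` assumes a little-endian build, and the real
code compares the hash of the Lua argument (`s->hash`).

/* Illustration only, not part of the patch. */
#define LJ_ENDIAN_SELECT(le, be) (le)  /* Little-endian build assumed. */
#define H_(le, be) LJ_ENDIAN_SELECT(0x##le, 0x##be)

static int abi_is_uwp(unsigned int arghash)
{
  switch (arghash) {
  case H_(a40f0bcb,a40f0bcb): return 1;  /* Precomputed hash of "uwp". */
  default: return 0;
  }
}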
<snipped>
> >
--
Best regards,
Sergey Kaplun
* Re: [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
2023-08-15 12:09 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 16:40 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:40 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey
Thanks for the patch! LGTM
On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Ben Pye.
>
> (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
>
> This patch adds partial support for the Universal Windows Platform [1]
> in LuaJIT.
> This includes:
> * `LJ_TARGET_UWP` is introduced to mark that target is Universal Windows
> Platform.
> * `LJ_WIN_VALLOC()` macro is introduced to use instead of
> `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
> * `LJ_WIN_VPROTECT()` macro is introduced to use instead of
> `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
> * `LJ_WIN_LOADLIBA()` macro is introduced to use instead of
> `LoadLibraryExA()` [6] (custom implementation using
> `LoadPackagedLibrary()` [7] for UWP).
>
> Note that the following features are not implemented for UWP:
> * `io.popen()`.
> * LuaJIT profiler's (`jit.p`) timer for Windows has not very high
> resolution since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are
> not used, because the <winmm.dll> library isn't loaded.
>
> [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
> [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
> [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
> [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
> [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
> [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
> [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
> [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
> [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
> doc/ext_ffi_api.html | 2 ++
> src/lib_ffi.c | 3 +++
> src/lib_io.c | 4 ++--
> src/lib_package.c | 24 +++++++++++++++++++++++-
> src/lj_alloc.c | 6 +++---
> src/lj_arch.h | 19 +++++++++++++++++++
> src/lj_ccallback.c | 4 ++--
> src/lj_clib.c | 20 ++++++++++++++++----
> src/lj_mcode.c | 8 ++++----
> src/lj_profile_timer.c | 8 ++++----
> 10 files changed, 78 insertions(+), 20 deletions(-)
>
> diff --git a/doc/ext_ffi_api.html b/doc/ext_ffi_api.html
> index 91af2e1d..c72191d1 100644
> --- a/doc/ext_ffi_api.html
> +++ b/doc/ext_ffi_api.html
> @@ -469,6 +469,8 @@ otherwise. The following parameters are currently defined:
> <tr class="odd">
> <td class="abiparam">win</td><td class="abidesc">Windows variant of the standard ABI</td></tr>
> <tr class="even">
> +<td class="abiparam">uwp</td><td class="abidesc">Universal Windows Platform</td></tr>
> +<tr class="odd">
> <td class="abiparam">gc64</td><td class="abidesc">64 bit GC references</td></tr>
> </table>
>
> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
> index 136e98e8..d1fe1a14 100644
> --- a/src/lib_ffi.c
> +++ b/src/lib_ffi.c
> @@ -746,6 +746,9 @@ LJLIB_CF(ffi_abi) LJLIB_REC(.)
> #endif
> #if LJ_ABI_WIN
> case H_(4ab624a8,4ab624a8): b = 1; break; /* win */
> +#endif
> +#if LJ_TARGET_UWP
> + case H_(a40f0bcb,a40f0bcb): b = 1; break; /* uwp */
> #endif
> case H_(3af93066,1f001464): b = 1; break; /* le/be */
> #if LJ_GC64
> diff --git a/src/lib_io.c b/src/lib_io.c
> index f0108227..db995ae6 100644
> --- a/src/lib_io.c
> +++ b/src/lib_io.c
> @@ -99,7 +99,7 @@ static int io_file_close(lua_State *L, IOFileUD *iof)
> int stat = -1;
> #if LJ_TARGET_POSIX
> stat = pclose(iof->fp);
> -#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE
> +#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP
> stat = _pclose(iof->fp);
> #else
> lua_assert(0);
> @@ -414,7 +414,7 @@ LJLIB_CF(io_open)
>
> LJLIB_CF(io_popen)
> {
> -#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE)
> +#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP)
> const char *fname = strdata(lj_lib_checkstr(L, 1));
> GCstr *s = lj_lib_optstr(L, 2);
> const char *mode = s ? strdata(s) : "r";
> diff --git a/src/lib_package.c b/src/lib_package.c
> index 67959a10..b49f0209 100644
> --- a/src/lib_package.c
> +++ b/src/lib_package.c
> @@ -76,6 +76,20 @@ static const char *ll_bcsym(void *lib, const char *sym)
> BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
> #endif
>
> +#if LJ_TARGET_UWP
> +void *LJ_WIN_LOADLIBA(const char *path)
> +{
> + DWORD err = GetLastError();
> + wchar_t wpath[256];
> + HANDLE lib = NULL;
> + if (MultiByteToWideChar(CP_ACP, 0, path, -1, wpath, 256) > 0) {
> + lib = LoadPackagedLibrary(wpath, 0);
> + }
> + SetLastError(err);
> + return lib;
> +}
> +#endif
> +
> #undef setprogdir
>
> static void setprogdir(lua_State *L)
> @@ -119,7 +133,7 @@ static void ll_unloadlib(void *lib)
>
> static void *ll_load(lua_State *L, const char *path, int gl)
> {
> - HINSTANCE lib = LoadLibraryExA(path, NULL, 0);
> + HINSTANCE lib = LJ_WIN_LOADLIBA(path);
> if (lib == NULL) pusherror(L);
> UNUSED(gl);
> return lib;
> @@ -132,17 +146,25 @@ static lua_CFunction ll_sym(lua_State *L, void *lib, const char *sym)
> return f;
> }
>
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
> static const char *ll_bcsym(void *lib, const char *sym)
> {
> if (lib) {
> return (const char *)GetProcAddress((HINSTANCE)lib, sym);
> } else {
> +#if LJ_TARGET_UWP
> + return (const char *)GetProcAddress((HINSTANCE)&__ImageBase, sym);
> +#else
> HINSTANCE h = GetModuleHandleA(NULL);
> const char *p = (const char *)GetProcAddress(h, sym);
> if (p == NULL && GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
> (const char *)ll_bcsym, &h))
> p = (const char *)GetProcAddress(h, sym);
> return p;
> +#endif
> }
> }
>
> diff --git a/src/lj_alloc.c b/src/lj_alloc.c
> index f7039b5b..9e2fb1f6 100644
> --- a/src/lj_alloc.c
> +++ b/src/lj_alloc.c
> @@ -167,7 +167,7 @@ static void *DIRECT_MMAP(size_t size)
> static void *CALL_MMAP(size_t size)
> {
> DWORD olderr = GetLastError();
> - void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> + void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> SetLastError(olderr);
> return ptr ? ptr : MFAIL;
> }
> @@ -176,8 +176,8 @@ static void *CALL_MMAP(size_t size)
> static void *DIRECT_MMAP(size_t size)
> {
> DWORD olderr = GetLastError();
> - void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> - PAGE_READWRITE);
> + void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> + PAGE_READWRITE);
> SetLastError(olderr);
> return ptr ? ptr : MFAIL;
> }
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 7397492e..0351e046 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -141,6 +141,13 @@
> #define LJ_TARGET_GC64 1
> #endif
>
> +#ifdef _UWP
> +#define LJ_TARGET_UWP 1
> +#if LUAJIT_TARGET == LUAJIT_ARCH_X64
> +#define LJ_TARGET_GC64 1
> +#endif
> +#endif
> +
> #define LJ_NUMMODE_SINGLE 0 /* Single-number mode only. */
> #define LJ_NUMMODE_SINGLE_DUAL 1 /* Default to single-number mode. */
> #define LJ_NUMMODE_DUAL 2 /* Dual-number mode only. */
> @@ -586,6 +593,18 @@
> #define LJ_ABI_WIN 0
> #endif
>
> +#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_UWP
> +#define LJ_WIN_VALLOC VirtualAllocFromApp
> +#define LJ_WIN_VPROTECT VirtualProtectFromApp
> +extern void *LJ_WIN_LOADLIBA(const char *path);
> +#else
> +#define LJ_WIN_VALLOC VirtualAlloc
> +#define LJ_WIN_VPROTECT VirtualProtect
> +#define LJ_WIN_LOADLIBA(path) LoadLibraryExA((path), NULL, 0)
> +#endif
> +#endif
> +
> #if defined(LUAJIT_NO_UNWIND) || __GNU_COMPACT_EH__ || defined(__symbian__) || LJ_TARGET_IOS || LJ_TARGET_PS3 || LJ_TARGET_PS4
> #define LJ_NO_UNWIND 1
> #endif
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index c33190d7..37edd00f 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -267,7 +267,7 @@ static void callback_mcode_new(CTState *cts)
> if (CALLBACK_MAX_SLOT == 0)
> lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
> #if LJ_TARGET_WINDOWS
> - p = VirtualAlloc(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> + p = LJ_WIN_VALLOC(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> if (!p)
> lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
> #elif LJ_TARGET_POSIX
> @@ -285,7 +285,7 @@ static void callback_mcode_new(CTState *cts)
> #if LJ_TARGET_WINDOWS
> {
> DWORD oprot;
> - VirtualProtect(p, sz, PAGE_EXECUTE_READ, &oprot);
> + LJ_WIN_VPROTECT(p, sz, PAGE_EXECUTE_READ, &oprot);
> }
> #elif LJ_TARGET_POSIX
> mprotect(p, sz, (PROT_READ|PROT_EXEC));
> diff --git a/src/lj_clib.c b/src/lj_clib.c
> index c06c0915..a8672052 100644
> --- a/src/lj_clib.c
> +++ b/src/lj_clib.c
> @@ -158,11 +158,13 @@ BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
> /* Default libraries. */
> enum {
> CLIB_HANDLE_EXE,
> +#if !LJ_TARGET_UWP
> CLIB_HANDLE_DLL,
> CLIB_HANDLE_CRT,
> CLIB_HANDLE_KERNEL32,
> CLIB_HANDLE_USER32,
> CLIB_HANDLE_GDI32,
> +#endif
> CLIB_HANDLE_MAX
> };
>
> @@ -208,7 +210,7 @@ static const char *clib_extname(lua_State *L, const char *name)
> static void *clib_loadlib(lua_State *L, const char *name, int global)
> {
> DWORD oldwerr = GetLastError();
> - void *h = (void *)LoadLibraryExA(clib_extname(L, name), NULL, 0);
> + void *h = LJ_WIN_LOADLIBA(clib_extname(L, name));
> if (!h) clib_error(L, "cannot load module " LUA_QS ": %s", name);
> SetLastError(oldwerr);
> UNUSED(global);
> @@ -218,6 +220,7 @@ static void *clib_loadlib(lua_State *L, const char *name, int global)
> static void clib_unloadlib(CLibrary *cl)
> {
> if (cl->handle == CLIB_DEFHANDLE) {
> +#if !LJ_TARGET_UWP
> MSize i;
> for (i = CLIB_HANDLE_KERNEL32; i < CLIB_HANDLE_MAX; i++) {
> void *h = clib_def_handle[i];
> @@ -226,11 +229,16 @@ static void clib_unloadlib(CLibrary *cl)
> FreeLibrary((HINSTANCE)h);
> }
> }
> +#endif
> } else if (cl->handle) {
> FreeLibrary((HINSTANCE)cl->handle);
> }
> }
>
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
> static void *clib_getsym(CLibrary *cl, const char *name)
> {
> void *p = NULL;
> @@ -239,6 +247,9 @@ static void *clib_getsym(CLibrary *cl, const char *name)
> for (i = 0; i < CLIB_HANDLE_MAX; i++) {
> HINSTANCE h = (HINSTANCE)clib_def_handle[i];
> if (!(void *)h) { /* Resolve default library handles (once). */
> +#if LJ_TARGET_UWP
> + h = (HINSTANCE)&__ImageBase;
> +#else
> switch (i) {
> case CLIB_HANDLE_EXE: GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &h); break;
> case CLIB_HANDLE_DLL:
> @@ -249,11 +260,12 @@ static void *clib_getsym(CLibrary *cl, const char *name)
> GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
> (const char *)&_fmode, &h);
> break;
> - case CLIB_HANDLE_KERNEL32: h = LoadLibraryExA("kernel32.dll", NULL, 0); break;
> - case CLIB_HANDLE_USER32: h = LoadLibraryExA("user32.dll", NULL, 0); break;
> - case CLIB_HANDLE_GDI32: h = LoadLibraryExA("gdi32.dll", NULL, 0); break;
> + case CLIB_HANDLE_KERNEL32: h = LJ_WIN_LOADLIBA("kernel32.dll"); break;
> + case CLIB_HANDLE_USER32: h = LJ_WIN_LOADLIBA("user32.dll"); break;
> + case CLIB_HANDLE_GDI32: h = LJ_WIN_LOADLIBA("gdi32.dll"); break;
> }
> if (!h) continue;
> +#endif
> clib_def_handle[i] = (void *)h;
> }
> p = (void *)GetProcAddress(h, name);
> diff --git a/src/lj_mcode.c b/src/lj_mcode.c
> index c6361018..10db4457 100644
> --- a/src/lj_mcode.c
> +++ b/src/lj_mcode.c
> @@ -66,8 +66,8 @@ void lj_mcode_sync(void *start, void *end)
>
> static void *mcode_alloc_at(jit_State *J, uintptr_t hint, size_t sz, DWORD prot)
> {
> - void *p = VirtualAlloc((void *)hint, sz,
> - MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
> + void *p = LJ_WIN_VALLOC((void *)hint, sz,
> + MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
> if (!p && !hint)
> lj_trace_err(J, LJ_TRERR_MCODEAL);
> return p;
> @@ -82,7 +82,7 @@ static void mcode_free(jit_State *J, void *p, size_t sz)
> static int mcode_setprot(void *p, size_t sz, DWORD prot)
> {
> DWORD oprot;
> - return !VirtualProtect(p, sz, prot, &oprot);
> + return !LJ_WIN_VPROTECT(p, sz, prot, &oprot);
> }
>
> #elif LJ_TARGET_POSIX
> @@ -255,7 +255,7 @@ static void *mcode_alloc(jit_State *J, size_t sz)
> /* All memory addresses are reachable by relative jumps. */
> static void *mcode_alloc(jit_State *J, size_t sz)
> {
> -#ifdef __OpenBSD__
> +#if defined(__OpenBSD__) || LJ_TARGET_UWP
> /* Allow better executable memory allocation for OpenBSD W^X mode. */
> void *p = mcode_alloc_at(J, 0, sz, MCPROT_RUN);
> if (p && mcode_setprot(p, sz, MCPROT_GEN)) {
> diff --git a/src/lj_profile_timer.c b/src/lj_profile_timer.c
> index 056fd1f7..0b859457 100644
> --- a/src/lj_profile_timer.c
> +++ b/src/lj_profile_timer.c
> @@ -84,7 +84,7 @@ static DWORD WINAPI timer_thread(void *timerx)
> {
> lj_profile_timer *timer = (lj_profile_timer *)timerx;
> int interval = timer->opt.interval_msec;
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
> timer->wmm_tbp(interval);
> #endif
> while (1) {
> @@ -92,7 +92,7 @@ static DWORD WINAPI timer_thread(void *timerx)
> if (timer->abort) break;
> timer->opt.handler();
> }
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
> timer->wmm_tep(interval);
> #endif
> return 0;
> @@ -101,9 +101,9 @@ static DWORD WINAPI timer_thread(void *timerx)
> /* Start profiling timer thread. */
> void lj_profile_timer_start(lj_profile_timer *timer)
> {
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
> if (!timer->wmm) { /* Load WinMM library on-demand. */
> - timer->wmm = LoadLibraryExA("winmm.dll", NULL, 0);
> + timer->wmm = LJ_WIN_LOADLIBA("winmm.dll");
> if (timer->wmm) {
> timer->wmm_tbp =
> (WMM_TPFUNC)GetProcAddress(timer->wmm, "timeBeginPeriod");
* [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (7 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:07 ` Maxim Kokryashkin via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
` (12 subsequent siblings)
21 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
(cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)
This patch refactors FFI parsing of supported C attributes and pragmas,
`ffi.abi()` parameter check. It replaces usage of comparison (with
hardcoded string hashes) with search in the given string with the
format: "\XXXattribute1\XXXattribute2", where `\XXX` is the length of
"attribute" name.
Sergey Kaplun:
* added the description for the commit
Part of tarantool/tarantool#8825
---
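To make the match-string format above concrete, here is a small standalone
sketch of the lookup logic. It is an illustration only: `case_index()` takes
a plain C string and its length, while the real `lj_cparse_case()` takes a
`GCstr *`.

#include <stdio.h>
#include <string.h>

/* Each list entry is one length byte followed by that many characters.
** The return value is the zero-based index of the matching entry, or -1. */
static int case_index(const char *s, size_t slen, const char *match)
{
  size_t len;
  int n;
  for (n = 0; (len = (unsigned char)*match++); n++, match += len)
    if (slen == len && !memcmp(match, s, len))
      return n;
  return -1;
}

int main(void)
{
  const char *abi = "\00564bit" "\003fpu" "\006softfp";
  printf("%d\n", case_index("fpu", 3, abi));  /* Prints 1 (second entry). */
  printf("%d\n", case_index("sse", 3, abi));  /* Prints -1 (no match). */
  return 0;
}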
src/lib_ffi.c | 35 ++++++++++------------
src/lj_cparse.c | 77 +++++++++++++++++++++++++++++++------------------
src/lj_cparse.h | 2 ++
3 files changed, 67 insertions(+), 47 deletions(-)
diff --git a/src/lib_ffi.c b/src/lib_ffi.c
index d1fe1a14..62af54c1 100644
--- a/src/lib_ffi.c
+++ b/src/lib_ffi.c
@@ -720,50 +720,47 @@ LJLIB_CF(ffi_fill) LJLIB_REC(.)
return 0;
}
-#define H_(le, be) LJ_ENDIAN_SELECT(0x##le, 0x##be)
-
/* Test ABI string. */
LJLIB_CF(ffi_abi) LJLIB_REC(.)
{
GCstr *s = lj_lib_checkstr(L, 1);
- int b = 0;
- switch (s->hash) {
+ int b = lj_cparse_case(s,
#if LJ_64
- case H_(849858eb,ad35fd06): b = 1; break; /* 64bit */
+ "\00564bit"
#else
- case H_(662d3c79,d0e22477): b = 1; break; /* 32bit */
+ "\00532bit"
#endif
#if LJ_ARCH_HASFPU
- case H_(e33ee463,e33ee463): b = 1; break; /* fpu */
+ "\003fpu"
#endif
#if LJ_ABI_SOFTFP
- case H_(61211a23,c2e8c81c): b = 1; break; /* softfp */
+ "\006softfp"
#else
- case H_(539417a8,8ce0812f): b = 1; break; /* hardfp */
+ "\006hardfp"
#endif
#if LJ_ABI_EABI
- case H_(2182df8f,f2ed1152): b = 1; break; /* eabi */
+ "\004eabi"
#endif
#if LJ_ABI_WIN
- case H_(4ab624a8,4ab624a8): b = 1; break; /* win */
+ "\003win"
#endif
#if LJ_TARGET_UWP
- case H_(a40f0bcb,a40f0bcb): b = 1; break; /* uwp */
+ "\003uwp"
+#endif
+#if LJ_LE
+ "\002le"
+#else
+ "\002be"
#endif
- case H_(3af93066,1f001464): b = 1; break; /* le/be */
#if LJ_GC64
- case H_(9e89d2c9,13c83c92): b = 1; break; /* gc64 */
+ "\004gc64"
#endif
- default:
- break;
- }
+ ) >= 0;
setboolV(L->top-1, b);
setboolV(&G(L)->tmptv2, b); /* Remember for trace recorder. */
return 1;
}
-#undef H_
-
LJLIB_PUSH(top-8) LJLIB_SET(!) /* Store reference to miscmap table. */
LJLIB_CF(ffi_metatype)
diff --git a/src/lj_cparse.c b/src/lj_cparse.c
index fb440567..07c643d4 100644
--- a/src/lj_cparse.c
+++ b/src/lj_cparse.c
@@ -28,6 +28,24 @@
** If in doubt, please check the input against your favorite C compiler.
*/
+/* -- Miscellaneous ------------------------------------------------------- */
+
+/* Match string against a C literal. */
+#define cp_str_is(str, k) \
+ ((str)->len == sizeof(k)-1 && !memcmp(strdata(str), k, sizeof(k)-1))
+
+/* Check string against a linear list of matches. */
+int lj_cparse_case(GCstr *str, const char *match)
+{
+ MSize len;
+ int n;
+ for (n = 0; (len = (MSize)*match++); n++, match += len) {
+ if (str->len == len && !memcmp(match, strdata(str), len))
+ return n;
+ }
+ return -1;
+}
+
/* -- C lexer ------------------------------------------------------------- */
/* C lexer token names. */
@@ -930,8 +948,6 @@ static CTypeID cp_decl_intern(CPState *cp, CPDecl *decl)
/* -- C declaration parser ------------------------------------------------ */
-#define H_(le, be) LJ_ENDIAN_SELECT(0x##le, 0x##be)
-
/* Reset declaration state to declaration specifier. */
static void cp_decl_reset(CPDecl *decl)
{
@@ -1071,44 +1087,57 @@ static void cp_decl_gccattribute(CPState *cp, CPDecl *decl)
attrstr = lj_str_new(cp->L, c+2, attrstr->len-4);
#endif
cp_next(cp);
- switch (attrstr->hash) {
- case H_(64a9208e,8ce14319): case H_(8e6331b2,95a282af): /* aligned */
+ switch (lj_cparse_case(attrstr,
+ "\007aligned" "\013__aligned__"
+ "\006packed" "\012__packed__"
+ "\004mode" "\010__mode__"
+ "\013vector_size" "\017__vector_size__"
+#if LJ_TARGET_X86
+ "\007regparm" "\013__regparm__"
+ "\005cdecl" "\011__cdecl__"
+ "\010thiscall" "\014__thiscall__"
+ "\010fastcall" "\014__fastcall__"
+ "\007stdcall" "\013__stdcall__"
+ "\012sseregparm" "\016__sseregparm__"
+#endif
+ )) {
+ case 0: case 1: /* aligned */
cp_decl_align(cp, decl);
break;
- case H_(42eb47de,f0ede26c): case H_(29f48a09,cf383e0c): /* packed */
+ case 2: case 3: /* packed */
decl->attr |= CTFP_PACKED;
break;
- case H_(0a84eef6,8dfab04c): case H_(995cf92c,d5696591): /* mode */
+ case 4: case 5: /* mode */
cp_decl_mode(cp, decl);
break;
- case H_(0ab31997,2d5213fa): case H_(bf875611,200e9990): /* vector_size */
+ case 6: case 7: /* vector_size */
{
CTSize vsize = cp_decl_sizeattr(cp);
if (vsize) CTF_INSERT(decl->attr, VSIZEP, lj_fls(vsize));
}
break;
#if LJ_TARGET_X86
- case H_(5ad22db8,c689b848): case H_(439150fa,65ea78cb): /* regparm */
+ case 8: case 9: /* regparm */
CTF_INSERT(decl->fattr, REGPARM, cp_decl_sizeattr(cp));
decl->fattr |= CTFP_CCONV;
break;
- case H_(18fc0b98,7ff4c074): case H_(4e62abed,0a747424): /* cdecl */
+ case 10: case 11: /* cdecl */
CTF_INSERT(decl->fattr, CCONV, CTCC_CDECL);
decl->fattr |= CTFP_CCONV;
break;
- case H_(72b2e41b,494c5a44): case H_(f2356d59,f25fc9bd): /* thiscall */
+ case 12: case 13: /* thiscall */
CTF_INSERT(decl->fattr, CCONV, CTCC_THISCALL);
decl->fattr |= CTFP_CCONV;
break;
- case H_(0d0ffc42,ab746f88): case H_(21c54ba1,7f0ca7e3): /* fastcall */
+ case 14: case 15: /* fastcall */
CTF_INSERT(decl->fattr, CCONV, CTCC_FASTCALL);
decl->fattr |= CTFP_CCONV;
break;
- case H_(ef76b040,9412e06a): case H_(de56697b,c750e6e1): /* stdcall */
+ case 16: case 17: /* stdcall */
CTF_INSERT(decl->fattr, CCONV, CTCC_STDCALL);
decl->fattr |= CTFP_CCONV;
break;
- case H_(ea78b622,f234bd8e): case H_(252ffb06,8d50f34b): /* sseregparm */
+ case 18: case 19: /* sseregparm */
decl->fattr |= CTF_SSEREGPARM;
decl->fattr |= CTFP_CCONV;
break;
@@ -1140,16 +1169,13 @@ static void cp_decl_msvcattribute(CPState *cp, CPDecl *decl)
while (cp->tok == CTOK_IDENT) {
GCstr *attrstr = cp->str;
cp_next(cp);
- switch (attrstr->hash) {
- case H_(bc2395fa,98f267f8): /* align */
+ if (cp_str_is(attrstr, "align")) {
cp_decl_align(cp, decl);
- break;
- default: /* Ignore all other attributes. */
+ } else { /* Ignore all other attributes. */
if (cp_opt(cp, '(')) {
while (cp->tok != ')' && cp->tok != CTOK_EOF) cp_next(cp);
cp_check(cp, ')');
}
- break;
}
}
cp_check(cp, ')');
@@ -1729,17 +1755,16 @@ static CTypeID cp_decl_abstract(CPState *cp)
static void cp_pragma(CPState *cp, BCLine pragmaline)
{
cp_next(cp);
- if (cp->tok == CTOK_IDENT &&
- cp->str->hash == H_(e79b999f,42ca3e85)) { /* pack */
+ if (cp->tok == CTOK_IDENT && cp_str_is(cp->str, "pack")) {
cp_next(cp);
cp_check(cp, '(');
if (cp->tok == CTOK_IDENT) {
- if (cp->str->hash == H_(738e923c,a1b65954)) { /* push */
+ if (cp_str_is(cp->str, "push")) {
if (cp->curpack < CPARSE_MAX_PACKSTACK) {
cp->packstack[cp->curpack+1] = cp->packstack[cp->curpack];
cp->curpack++;
}
- } else if (cp->str->hash == H_(6c71cf27,6c71cf27)) { /* pop */
+ } else if (cp_str_is(cp->str, "pop")) {
if (cp->curpack > 0) cp->curpack--;
} else {
cp_errmsg(cp, cp->tok, LJ_ERR_XSYMBOL);
@@ -1788,13 +1813,11 @@ static void cp_decl_multi(CPState *cp)
if (tok == CTOK_INTEGER) {
cp_line(cp, hashline);
continue;
- } else if (tok == CTOK_IDENT &&
- cp->str->hash == H_(187aab88,fcb60b42)) { /* line */
+ } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "line")) {
if (cp_next(cp) != CTOK_INTEGER) cp_err_token(cp, tok);
cp_line(cp, hashline);
continue;
- } else if (tok == CTOK_IDENT &&
- cp->str->hash == H_(f5e6b4f8,1d509107)) { /* pragma */
+ } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "pragma")) {
cp_pragma(cp, hashline);
continue;
} else {
@@ -1865,8 +1888,6 @@ static void cp_decl_single(CPState *cp)
if (cp->tok != CTOK_EOF) cp_err_token(cp, CTOK_EOF);
}
-#undef H_
-
/* ------------------------------------------------------------------------ */
/* Protected callback for C parser. */
diff --git a/src/lj_cparse.h b/src/lj_cparse.h
index bad1060b..e40b4047 100644
--- a/src/lj_cparse.h
+++ b/src/lj_cparse.h
@@ -60,6 +60,8 @@ typedef struct CPState {
LJ_FUNC int lj_cparse(CPState *cp);
+LJ_FUNC int lj_cparse_case(GCstr *str, const char *match);
+
#endif
#endif
--
2.41.0
* Re: [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:07 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:52 ` Sergey Kaplun via Tarantool-patches
2023-08-16 17:04 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 2 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:07 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:58PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)
>
> This patch refactors FFI parsing of supported C attributes and pragmas,
> `ffi.abi()` parameter check. It replaces usage of comparison (with
Typo: s/usage/the usage/
> hardcoded string hashes) with search in the given string with the
Typo: s/with search/with a search/
> format: "\XXXattribute1\XXXattribute2", where `\XXX` is the length of
> "attribute" name.
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
> src/lib_ffi.c | 35 ++++++++++------------
> src/lj_cparse.c | 77 +++++++++++++++++++++++++++++++------------------
> src/lj_cparse.h | 2 ++
> 3 files changed, 67 insertions(+), 47 deletions(-)
>
> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
> index d1fe1a14..62af54c1 100644
> --- a/src/lib_ffi.c
> +++ b/src/lib_ffi.c
> @@ -720,50 +720,47 @@ LJLIB_CF(ffi_fill) LJLIB_REC(.)
> return 0;
> }
>
> -#define H_(le, be) LJ_ENDIAN_SELECT(0x##le, 0x##be)
> -
> /* Test ABI string. */
> LJLIB_CF(ffi_abi) LJLIB_REC(.)
> {
> GCstr *s = lj_lib_checkstr(L, 1);
> - int b = 0;
> - switch (s->hash) {
> + int b = lj_cparse_case(s,
> #if LJ_64
> - case H_(849858eb,ad35fd06): b = 1; break; /* 64bit */
> + "\00564bit"
> #else
> - case H_(662d3c79,d0e22477): b = 1; break; /* 32bit */
> + "\00532bit"
> #endif
> #if LJ_ARCH_HASFPU
> - case H_(e33ee463,e33ee463): b = 1; break; /* fpu */
> + "\003fpu"
> #endif
> #if LJ_ABI_SOFTFP
> - case H_(61211a23,c2e8c81c): b = 1; break; /* softfp */
> + "\006softfp"
> #else
> - case H_(539417a8,8ce0812f): b = 1; break; /* hardfp */
> + "\006hardfp"
> #endif
> #if LJ_ABI_EABI
> - case H_(2182df8f,f2ed1152): b = 1; break; /* eabi */
> + "\004eabi"
> #endif
> #if LJ_ABI_WIN
> - case H_(4ab624a8,4ab624a8): b = 1; break; /* win */
> + "\003win"
> #endif
> #if LJ_TARGET_UWP
> - case H_(a40f0bcb,a40f0bcb): b = 1; break; /* uwp */
> + "\003uwp"
> +#endif
> +#if LJ_LE
> + "\002le"
> +#else
> + "\002be"
> #endif
> - case H_(3af93066,1f001464): b = 1; break; /* le/be */
> #if LJ_GC64
> - case H_(9e89d2c9,13c83c92): b = 1; break; /* gc64 */
> + "\004gc64"
> #endif
> - default:
> - break;
> - }
> + ) >= 0;
> setboolV(L->top-1, b);
> setboolV(&G(L)->tmptv2, b); /* Remember for trace recorder. */
> return 1;
> }
>
> -#undef H_
> -
> LJLIB_PUSH(top-8) LJLIB_SET(!) /* Store reference to miscmap table. */
>
> LJLIB_CF(ffi_metatype)
> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
> index fb440567..07c643d4 100644
> --- a/src/lj_cparse.c
> +++ b/src/lj_cparse.c
> @@ -28,6 +28,24 @@
> ** If in doubt, please check the input against your favorite C compiler.
> */
>
> +/* -- Miscellaneous ------------------------------------------------------- */
> +
> +/* Match string against a C literal. */
> +#define cp_str_is(str, k) \
> + ((str)->len == sizeof(k)-1 && !memcmp(strdata(str), k, sizeof(k)-1))
> +
> +/* Check string against a linear list of matches. */
> +int lj_cparse_case(GCstr *str, const char *match)
> +{
> + MSize len;
> + int n;
> + for (n = 0; (len = (MSize)*match++); n++, match += len) {
> + if (str->len == len && !memcmp(match, strdata(str), len))
> + return n;
> + }
> + return -1;
> +}
> +
> /* -- C lexer ------------------------------------------------------------- */
>
> /* C lexer token names. */
> @@ -930,8 +948,6 @@ static CTypeID cp_decl_intern(CPState *cp, CPDecl *decl)
>
> /* -- C declaration parser ------------------------------------------------ */
>
> -#define H_(le, be) LJ_ENDIAN_SELECT(0x##le, 0x##be)
> -
> /* Reset declaration state to declaration specifier. */
> static void cp_decl_reset(CPDecl *decl)
> {
> @@ -1071,44 +1087,57 @@ static void cp_decl_gccattribute(CPState *cp, CPDecl *decl)
> attrstr = lj_str_new(cp->L, c+2, attrstr->len-4);
> #endif
> cp_next(cp);
> - switch (attrstr->hash) {
> - case H_(64a9208e,8ce14319): case H_(8e6331b2,95a282af): /* aligned */
> + switch (lj_cparse_case(attrstr,
> + "\007aligned" "\013__aligned__"
> + "\006packed" "\012__packed__"
> + "\004mode" "\010__mode__"
> + "\013vector_size" "\017__vector_size__"
> +#if LJ_TARGET_X86
> + "\007regparm" "\013__regparm__"
> + "\005cdecl" "\011__cdecl__"
> + "\010thiscall" "\014__thiscall__"
> + "\010fastcall" "\014__fastcall__"
> + "\007stdcall" "\013__stdcall__"
> + "\012sseregparm" "\016__sseregparm__"
> +#endif
> + )) {
> + case 0: case 1: /* aligned */
> cp_decl_align(cp, decl);
> break;
> - case H_(42eb47de,f0ede26c): case H_(29f48a09,cf383e0c): /* packed */
> + case 2: case 3: /* packed */
> decl->attr |= CTFP_PACKED;
> break;
> - case H_(0a84eef6,8dfab04c): case H_(995cf92c,d5696591): /* mode */
> + case 4: case 5: /* mode */
> cp_decl_mode(cp, decl);
> break;
> - case H_(0ab31997,2d5213fa): case H_(bf875611,200e9990): /* vector_size */
> + case 6: case 7: /* vector_size */
> {
> CTSize vsize = cp_decl_sizeattr(cp);
> if (vsize) CTF_INSERT(decl->attr, VSIZEP, lj_fls(vsize));
> }
> break;
> #if LJ_TARGET_X86
> - case H_(5ad22db8,c689b848): case H_(439150fa,65ea78cb): /* regparm */
> + case 8: case 9: /* regparm */
> CTF_INSERT(decl->fattr, REGPARM, cp_decl_sizeattr(cp));
> decl->fattr |= CTFP_CCONV;
> break;
> - case H_(18fc0b98,7ff4c074): case H_(4e62abed,0a747424): /* cdecl */
> + case 10: case 11: /* cdecl */
> CTF_INSERT(decl->fattr, CCONV, CTCC_CDECL);
> decl->fattr |= CTFP_CCONV;
> break;
> - case H_(72b2e41b,494c5a44): case H_(f2356d59,f25fc9bd): /* thiscall */
> + case 12: case 13: /* thiscall */
> CTF_INSERT(decl->fattr, CCONV, CTCC_THISCALL);
> decl->fattr |= CTFP_CCONV;
> break;
> - case H_(0d0ffc42,ab746f88): case H_(21c54ba1,7f0ca7e3): /* fastcall */
> + case 14: case 15: /* fastcall */
> CTF_INSERT(decl->fattr, CCONV, CTCC_FASTCALL);
> decl->fattr |= CTFP_CCONV;
> break;
> - case H_(ef76b040,9412e06a): case H_(de56697b,c750e6e1): /* stdcall */
> + case 16: case 17: /* stdcall */
> CTF_INSERT(decl->fattr, CCONV, CTCC_STDCALL);
> decl->fattr |= CTFP_CCONV;
> break;
> - case H_(ea78b622,f234bd8e): case H_(252ffb06,8d50f34b): /* sseregparm */
> + case 18: case 19: /* sseregparm */
> decl->fattr |= CTF_SSEREGPARM;
> decl->fattr |= CTFP_CCONV;
> break;
> @@ -1140,16 +1169,13 @@ static void cp_decl_msvcattribute(CPState *cp, CPDecl *decl)
> while (cp->tok == CTOK_IDENT) {
> GCstr *attrstr = cp->str;
> cp_next(cp);
> - switch (attrstr->hash) {
> - case H_(bc2395fa,98f267f8): /* align */
> + if (cp_str_is(attrstr, "align")) {
> cp_decl_align(cp, decl);
> - break;
> - default: /* Ignore all other attributes. */
> + } else { /* Ignore all other attributes. */
> if (cp_opt(cp, '(')) {
> while (cp->tok != ')' && cp->tok != CTOK_EOF) cp_next(cp);
> cp_check(cp, ')');
> }
> - break;
> }
> }
> cp_check(cp, ')');
> @@ -1729,17 +1755,16 @@ static CTypeID cp_decl_abstract(CPState *cp)
> static void cp_pragma(CPState *cp, BCLine pragmaline)
> {
> cp_next(cp);
> - if (cp->tok == CTOK_IDENT &&
> - cp->str->hash == H_(e79b999f,42ca3e85)) { /* pack */
> + if (cp->tok == CTOK_IDENT && cp_str_is(cp->str, "pack")) {
> cp_next(cp);
> cp_check(cp, '(');
> if (cp->tok == CTOK_IDENT) {
> - if (cp->str->hash == H_(738e923c,a1b65954)) { /* push */
> + if (cp_str_is(cp->str, "push")) {
> if (cp->curpack < CPARSE_MAX_PACKSTACK) {
> cp->packstack[cp->curpack+1] = cp->packstack[cp->curpack];
> cp->curpack++;
> }
> - } else if (cp->str->hash == H_(6c71cf27,6c71cf27)) { /* pop */
> + } else if (cp_str_is(cp->str, "pop")) {
> if (cp->curpack > 0) cp->curpack--;
> } else {
> cp_errmsg(cp, cp->tok, LJ_ERR_XSYMBOL);
> @@ -1788,13 +1813,11 @@ static void cp_decl_multi(CPState *cp)
> if (tok == CTOK_INTEGER) {
> cp_line(cp, hashline);
> continue;
> - } else if (tok == CTOK_IDENT &&
> - cp->str->hash == H_(187aab88,fcb60b42)) { /* line */
> + } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "line")) {
> if (cp_next(cp) != CTOK_INTEGER) cp_err_token(cp, tok);
> cp_line(cp, hashline);
> continue;
> - } else if (tok == CTOK_IDENT &&
> - cp->str->hash == H_(f5e6b4f8,1d509107)) { /* pragma */
> + } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "pragma")) {
> cp_pragma(cp, hashline);
> continue;
> } else {
> @@ -1865,8 +1888,6 @@ static void cp_decl_single(CPState *cp)
> if (cp->tok != CTOK_EOF) cp_err_token(cp, CTOK_EOF);
> }
>
> -#undef H_
> -
> /* ------------------------------------------------------------------------ */
>
> /* Protected callback for C parser. */
> diff --git a/src/lj_cparse.h b/src/lj_cparse.h
> index bad1060b..e40b4047 100644
> --- a/src/lj_cparse.h
> +++ b/src/lj_cparse.h
> @@ -60,6 +60,8 @@ typedef struct CPState {
>
> LJ_FUNC int lj_cparse(CPState *cp);
>
> +LJ_FUNC int lj_cparse_case(GCstr *str, const char *match);
> +
> #endif
>
> #endif
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
2023-08-15 13:07 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:52 ` Sergey Kaplun via Tarantool-patches
2023-08-16 17:04 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:52 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
>
> On Wed, Aug 09, 2023 at 06:35:58PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > (cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)
> >
> > This patch refactors FFI parsing of supported C attributes and pragmas,
> > `ffi.abi()` parameter check. It replaces usage of comparison (with
> Typo: s/usage/the usage/
Fixed.
> > hardcoded string hashes) with search in the given string with the
> Typo: s/with search/with a search/
Fixed.
> > format: "\XXXattribute1\XXXattribute2", where `\XXX` is the length of
> > "attribute" name.
> >
> > Sergey Kaplun:
> > * added the description for the commit
> >
> > Part of tarantool/tarantool#8825
> > ---
<snipped>
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
2023-08-15 13:07 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:52 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 17:04 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 17:04 UTC (permalink / raw)
To: Maxim Kokryashkin, Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey
Thanks for the patch! LGTM
On 8/15/23 16:07, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
>
> On Wed, Aug 09, 2023 at 06:35:58PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> From: Mike Pall <mike>
>>
>> (cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)
>>
>> This patch refactors FFI parsing of supported C attributes and pragmas,
>> `ffi.abi()` parameter check. It replaces usage of comparison (with
> Typo: s/usage/the usage/
>> hardcoded string hashes) with search in the given string with the
> Typo: s/with search/with a search/
>> format: "\XXXattribute1\XXXattribute2", where `\XXX` is the length of
>> "attribute" name.
>>
>> Sergey Kaplun:
>> * added the description for the commit
>>
>> Part of tarantool/tarantool#8825
>> ---
>> src/lib_ffi.c | 35 ++++++++++------------
>> src/lj_cparse.c | 77 +++++++++++++++++++++++++++++++------------------
>> src/lj_cparse.h | 2 ++
>> 3 files changed, 67 insertions(+), 47 deletions(-)
>>
>> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
>> index d1fe1a14..62af54c1 100644
>> --- a/src/lib_ffi.c
>> +++ b/src/lib_ffi.c
>> @@ -720,50 +720,47 @@ LJLIB_CF(ffi_fill) LJLIB_REC(.)
>> return 0;
>> }
>>
>> -#define H_(le, be) LJ_ENDIAN_SELECT(0x##le, 0x##be)
>> -
>> /* Test ABI string. */
>> LJLIB_CF(ffi_abi) LJLIB_REC(.)
>> {
>> GCstr *s = lj_lib_checkstr(L, 1);
>> - int b = 0;
>> - switch (s->hash) {
>> + int b = lj_cparse_case(s,
>> #if LJ_64
>> - case H_(849858eb,ad35fd06): b = 1; break; /* 64bit */
>> + "\00564bit"
>> #else
>> - case H_(662d3c79,d0e22477): b = 1; break; /* 32bit */
>> + "\00532bit"
>> #endif
>> #if LJ_ARCH_HASFPU
>> - case H_(e33ee463,e33ee463): b = 1; break; /* fpu */
>> + "\003fpu"
>> #endif
>> #if LJ_ABI_SOFTFP
>> - case H_(61211a23,c2e8c81c): b = 1; break; /* softfp */
>> + "\006softfp"
>> #else
>> - case H_(539417a8,8ce0812f): b = 1; break; /* hardfp */
>> + "\006hardfp"
>> #endif
>> #if LJ_ABI_EABI
>> - case H_(2182df8f,f2ed1152): b = 1; break; /* eabi */
>> + "\004eabi"
>> #endif
>> #if LJ_ABI_WIN
>> - case H_(4ab624a8,4ab624a8): b = 1; break; /* win */
>> + "\003win"
>> #endif
>> #if LJ_TARGET_UWP
>> - case H_(a40f0bcb,a40f0bcb): b = 1; break; /* uwp */
>> + "\003uwp"
>> +#endif
>> +#if LJ_LE
>> + "\002le"
>> +#else
>> + "\002be"
>> #endif
>> - case H_(3af93066,1f001464): b = 1; break; /* le/be */
>> #if LJ_GC64
>> - case H_(9e89d2c9,13c83c92): b = 1; break; /* gc64 */
>> + "\004gc64"
>> #endif
>> - default:
>> - break;
>> - }
>> + ) >= 0;
>> setboolV(L->top-1, b);
>> setboolV(&G(L)->tmptv2, b); /* Remember for trace recorder. */
>> return 1;
>> }
>>
>> -#undef H_
>> -
>> LJLIB_PUSH(top-8) LJLIB_SET(!) /* Store reference to miscmap table. */
>>
>> LJLIB_CF(ffi_metatype)
>> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
>> index fb440567..07c643d4 100644
>> --- a/src/lj_cparse.c
>> +++ b/src/lj_cparse.c
>> @@ -28,6 +28,24 @@
>> ** If in doubt, please check the input against your favorite C compiler.
>> */
>>
>> +/* -- Miscellaneous ------------------------------------------------------- */
>> +
>> +/* Match string against a C literal. */
>> +#define cp_str_is(str, k) \
>> + ((str)->len == sizeof(k)-1 && !memcmp(strdata(str), k, sizeof(k)-1))
>> +
>> +/* Check string against a linear list of matches. */
>> +int lj_cparse_case(GCstr *str, const char *match)
>> +{
>> + MSize len;
>> + int n;
>> + for (n = 0; (len = (MSize)*match++); n++, match += len) {
>> + if (str->len == len && !memcmp(match, strdata(str), len))
>> + return n;
>> + }
>> + return -1;
>> +}
>> +
>> /* -- C lexer ------------------------------------------------------------- */
>>
>> /* C lexer token names. */
>> @@ -930,8 +948,6 @@ static CTypeID cp_decl_intern(CPState *cp, CPDecl *decl)
>>
>> /* -- C declaration parser ------------------------------------------------ */
>>
>> -#define H_(le, be) LJ_ENDIAN_SELECT(0x##le, 0x##be)
>> -
>> /* Reset declaration state to declaration specifier. */
>> static void cp_decl_reset(CPDecl *decl)
>> {
>> @@ -1071,44 +1087,57 @@ static void cp_decl_gccattribute(CPState *cp, CPDecl *decl)
>> attrstr = lj_str_new(cp->L, c+2, attrstr->len-4);
>> #endif
>> cp_next(cp);
>> - switch (attrstr->hash) {
>> - case H_(64a9208e,8ce14319): case H_(8e6331b2,95a282af): /* aligned */
>> + switch (lj_cparse_case(attrstr,
>> + "\007aligned" "\013__aligned__"
>> + "\006packed" "\012__packed__"
>> + "\004mode" "\010__mode__"
>> + "\013vector_size" "\017__vector_size__"
>> +#if LJ_TARGET_X86
>> + "\007regparm" "\013__regparm__"
>> + "\005cdecl" "\011__cdecl__"
>> + "\010thiscall" "\014__thiscall__"
>> + "\010fastcall" "\014__fastcall__"
>> + "\007stdcall" "\013__stdcall__"
>> + "\012sseregparm" "\016__sseregparm__"
>> +#endif
>> + )) {
>> + case 0: case 1: /* aligned */
>> cp_decl_align(cp, decl);
>> break;
>> - case H_(42eb47de,f0ede26c): case H_(29f48a09,cf383e0c): /* packed */
>> + case 2: case 3: /* packed */
>> decl->attr |= CTFP_PACKED;
>> break;
>> - case H_(0a84eef6,8dfab04c): case H_(995cf92c,d5696591): /* mode */
>> + case 4: case 5: /* mode */
>> cp_decl_mode(cp, decl);
>> break;
>> - case H_(0ab31997,2d5213fa): case H_(bf875611,200e9990): /* vector_size */
>> + case 6: case 7: /* vector_size */
>> {
>> CTSize vsize = cp_decl_sizeattr(cp);
>> if (vsize) CTF_INSERT(decl->attr, VSIZEP, lj_fls(vsize));
>> }
>> break;
>> #if LJ_TARGET_X86
>> - case H_(5ad22db8,c689b848): case H_(439150fa,65ea78cb): /* regparm */
>> + case 8: case 9: /* regparm */
>> CTF_INSERT(decl->fattr, REGPARM, cp_decl_sizeattr(cp));
>> decl->fattr |= CTFP_CCONV;
>> break;
>> - case H_(18fc0b98,7ff4c074): case H_(4e62abed,0a747424): /* cdecl */
>> + case 10: case 11: /* cdecl */
>> CTF_INSERT(decl->fattr, CCONV, CTCC_CDECL);
>> decl->fattr |= CTFP_CCONV;
>> break;
>> - case H_(72b2e41b,494c5a44): case H_(f2356d59,f25fc9bd): /* thiscall */
>> + case 12: case 13: /* thiscall */
>> CTF_INSERT(decl->fattr, CCONV, CTCC_THISCALL);
>> decl->fattr |= CTFP_CCONV;
>> break;
>> - case H_(0d0ffc42,ab746f88): case H_(21c54ba1,7f0ca7e3): /* fastcall */
>> + case 14: case 15: /* fastcall */
>> CTF_INSERT(decl->fattr, CCONV, CTCC_FASTCALL);
>> decl->fattr |= CTFP_CCONV;
>> break;
>> - case H_(ef76b040,9412e06a): case H_(de56697b,c750e6e1): /* stdcall */
>> + case 16: case 17: /* stdcall */
>> CTF_INSERT(decl->fattr, CCONV, CTCC_STDCALL);
>> decl->fattr |= CTFP_CCONV;
>> break;
>> - case H_(ea78b622,f234bd8e): case H_(252ffb06,8d50f34b): /* sseregparm */
>> + case 18: case 19: /* sseregparm */
>> decl->fattr |= CTF_SSEREGPARM;
>> decl->fattr |= CTFP_CCONV;
>> break;
>> @@ -1140,16 +1169,13 @@ static void cp_decl_msvcattribute(CPState *cp, CPDecl *decl)
>> while (cp->tok == CTOK_IDENT) {
>> GCstr *attrstr = cp->str;
>> cp_next(cp);
>> - switch (attrstr->hash) {
>> - case H_(bc2395fa,98f267f8): /* align */
>> + if (cp_str_is(attrstr, "align")) {
>> cp_decl_align(cp, decl);
>> - break;
>> - default: /* Ignore all other attributes. */
>> + } else { /* Ignore all other attributes. */
>> if (cp_opt(cp, '(')) {
>> while (cp->tok != ')' && cp->tok != CTOK_EOF) cp_next(cp);
>> cp_check(cp, ')');
>> }
>> - break;
>> }
>> }
>> cp_check(cp, ')');
>> @@ -1729,17 +1755,16 @@ static CTypeID cp_decl_abstract(CPState *cp)
>> static void cp_pragma(CPState *cp, BCLine pragmaline)
>> {
>> cp_next(cp);
>> - if (cp->tok == CTOK_IDENT &&
>> - cp->str->hash == H_(e79b999f,42ca3e85)) { /* pack */
>> + if (cp->tok == CTOK_IDENT && cp_str_is(cp->str, "pack")) {
>> cp_next(cp);
>> cp_check(cp, '(');
>> if (cp->tok == CTOK_IDENT) {
>> - if (cp->str->hash == H_(738e923c,a1b65954)) { /* push */
>> + if (cp_str_is(cp->str, "push")) {
>> if (cp->curpack < CPARSE_MAX_PACKSTACK) {
>> cp->packstack[cp->curpack+1] = cp->packstack[cp->curpack];
>> cp->curpack++;
>> }
>> - } else if (cp->str->hash == H_(6c71cf27,6c71cf27)) { /* pop */
>> + } else if (cp_str_is(cp->str, "pop")) {
>> if (cp->curpack > 0) cp->curpack--;
>> } else {
>> cp_errmsg(cp, cp->tok, LJ_ERR_XSYMBOL);
>> @@ -1788,13 +1813,11 @@ static void cp_decl_multi(CPState *cp)
>> if (tok == CTOK_INTEGER) {
>> cp_line(cp, hashline);
>> continue;
>> - } else if (tok == CTOK_IDENT &&
>> - cp->str->hash == H_(187aab88,fcb60b42)) { /* line */
>> + } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "line")) {
>> if (cp_next(cp) != CTOK_INTEGER) cp_err_token(cp, tok);
>> cp_line(cp, hashline);
>> continue;
>> - } else if (tok == CTOK_IDENT &&
>> - cp->str->hash == H_(f5e6b4f8,1d509107)) { /* pragma */
>> + } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "pragma")) {
>> cp_pragma(cp, hashline);
>> continue;
>> } else {
>> @@ -1865,8 +1888,6 @@ static void cp_decl_single(CPState *cp)
>> if (cp->tok != CTOK_EOF) cp_err_token(cp, CTOK_EOF);
>> }
>>
>> -#undef H_
>> -
>> /* ------------------------------------------------------------------------ */
>>
>> /* Protected callback for C parser. */
>> diff --git a/src/lj_cparse.h b/src/lj_cparse.h
>> index bad1060b..e40b4047 100644
>> --- a/src/lj_cparse.h
>> +++ b/src/lj_cparse.h
>> @@ -60,6 +60,8 @@ typedef struct CPState {
>>
>> LJ_FUNC int lj_cparse(CPState *cp);
>>
>> +LJ_FUNC int lj_cparse_case(GCstr *str, const char *match);
>> +
>> #endif
>>
>> #endif
>> --
>> 2.41.0
>>
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (8 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
2023-08-11 8:06 ` Sergey Kaplun via Tarantool-patches
` (2 more replies)
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
` (11 subsequent siblings)
21 siblings, 3 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
(cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)
This patch backports the aforementioned patch for mips and ppc, because
those architectures were stripped during the backporting via
71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math function
compilation and fix inconsistencies."). This applies these missed diffs
to prevent conflicts when backporting future patches.
This patch just removes macros that are no longer in use.
Sergey Kaplun:
* added the description for the problem
Part of tarantool/tarantool#8825
---
src/lj_asm_mips.h | 1 -
src/lj_asm_ppc.h | 1 -
2 files changed, 2 deletions(-)
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index a26a82cd..c27d8413 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
}
#endif
-#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
static void asm_arithov(ASMState *as, IRIns *ir)
diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
index 6cb608f7..6aaed058 100644
--- a/src/lj_asm_ppc.h
+++ b/src/lj_asm_ppc.h
@@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
}
#define asm_abs(as, ir) asm_fpunary(as, ir, PPCI_FABS)
-#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
@ 2023-08-11 8:06 ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:10 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 17:15 ` Sergey Bronnikov via Tarantool-patches
2 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-11 8:06 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
Hi, folks!
I found that some changes (see <src/lj_asm.c>) are missing, so I've updated
the patch and force-pushed the branch.
===================================================================
Cleanup math function compilation and fix inconsistencies.
(cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)
This patch backports the aforementioned patch for mips and ppc, because
those architectures were stripped during the backporting via
71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math function
compilation and fix inconsistencies."). This applies these missed diffs
to prevent conflicts when backporting future patches.
This patch just removes macros that are no longer in use. Also, it
removes the `IR_ATAN2` usage, since that IR is no longer defined.
Sergey Kaplun:
* added the description for the problem
Part of tarantool/tarantool#8825
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 15de7e33..ff68f79b 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -1705,7 +1705,7 @@ static void asm_ir(ASMState *as, IRIns *ir)
case IR_NEG: asm_neg(as, ir); break;
#if LJ_SOFTFP32
case IR_DIV: case IR_POW: case IR_ABS:
- case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
+ case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
lua_assert(0); /* Unused for LJ_SOFTFP32. */
break;
#else
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index a26a82cd..c27d8413 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
}
#endif
-#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
static void asm_arithov(ASMState *as, IRIns *ir)
diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
index 6cb608f7..6aaed058 100644
--- a/src/lj_asm_ppc.h
+++ b/src/lj_asm_ppc.h
@@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
}
#define asm_abs(as, ir) asm_fpunary(as, ir, PPCI_FABS)
-#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)
===================================================================
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
2023-08-11 8:06 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:10 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 17:15 ` Sergey Bronnikov via Tarantool-patches
2 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:10 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM
On Wed, Aug 09, 2023 at 06:35:59PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)
>
> This patch backports the aforementioned patch for mips and ppc, because
> those architectures were stripped during the backporting via
> 71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math function
> compilation and fix inconsistencies."). This applies these missed diffs
> to prevent conflicts when backporting future patches.
>
> This patch just removes macros that are no longer in use.
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_asm_mips.h | 1 -
> src/lj_asm_ppc.h | 1 -
> 2 files changed, 2 deletions(-)
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index a26a82cd..c27d8413 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
> }
> #endif
>
> -#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
> #define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
>
> static void asm_arithov(ASMState *as, IRIns *ir)
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index 6cb608f7..6aaed058 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
> }
>
> #define asm_abs(as, ir) asm_fpunary(as, ir, PPCI_FABS)
> -#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
> #define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
>
> static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
2023-08-11 8:06 ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:10 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 17:15 ` Sergey Bronnikov via Tarantool-patches
2 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 17:15 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM
On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)
>
> This patch backports the aforementioned patch for mips and ppc, because
> those architectures were stripped during the backporting via
> 71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math function
> compilation and fix inconsistencies."). This applies these missed diffs
> to prevent conflicts when backporting future patches.
>
> This patch just removes macros that are no longer in use.
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_asm_mips.h | 1 -
> src/lj_asm_ppc.h | 1 -
> 2 files changed, 2 deletions(-)
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index a26a82cd..c27d8413 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
> }
> #endif
>
> -#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
> #define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
>
> static void asm_arithov(ASMState *as, IRIns *ir)
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index 6cb608f7..6aaed058 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
> }
>
> #define asm_abs(as, ir) asm_fpunary(as, ir, PPCI_FABS)
> -#define asm_atan2(as, ir) asm_callid(as, ir, IRCALL_atan2)
> #define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
>
> static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (9 preceding siblings ...)
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:17 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 7:37 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
` (10 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
(cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)
This patch adds the `/* fallthrough */` where it may trigger the
`-Wimplicit-fallthrough` [1] warning. Some cases still not covered by
this comment and will be fixed in the future commits.
[1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
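For context, a minimal hypothetical example of what the warning complains about; it is not taken from this patch, and any C file compiled with GCC 7+ and -Wextra (or -Wimplicit-fallthrough) behaves the same way:

#include <stdio.h>

/* Without the comment before "case 1", GCC reports
** "this statement may fall through [-Wimplicit-fallthrough=]".
** The bare comment documents that the fall through is intended and
** silences the warning without changing the generated code.
*/
static int classify(int c)
{
  int score = 0;
  switch (c) {
  case 0:
    score += 1;
    /* fallthrough */
  case 1:
    score += 10;
    break;
  default:
    score = -1;
    break;
  }
  return score;
}

int main(void)
{
  printf("%d %d %d\n", classify(0), classify(1), classify(2)); /* 11 10 -1 */
  return 0;
}

At its default level (-Wimplicit-fallthrough=3, the one enabled by -Wextra) GCC recognizes a small set of comment spellings, `/* fallthrough */` among them, which is why adding comments alone is enough here and no attributes are needed.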
Sergey Kaplun:
* added the description for the commit
Part of tarantool/tarantool#8825
---
dynasm/dasm_arm.h | 2 ++
dynasm/dasm_mips.h | 1 +
dynasm/dasm_ppc.h | 1 +
dynasm/dasm_x86.h | 18 ++++++++++++++----
src/lj_asm.c | 7 ++++++-
src/lj_cparse.c | 10 ++++++++++
src/lj_err.c | 1 +
src/lj_opt_sink.c | 2 +-
src/lj_parse.c | 3 ++-
src/luajit.c | 1 +
10 files changed, 39 insertions(+), 7 deletions(-)
diff --git a/dynasm/dasm_arm.h b/dynasm/dasm_arm.h
index a43f7c66..1d404ccd 100644
--- a/dynasm/dasm_arm.h
+++ b/dynasm/dasm_arm.h
@@ -254,6 +254,7 @@ void dasm_put(Dst_DECL, int start, ...)
case DASM_IMMV8:
CK((n & 3) == 0, RANGE_I);
n >>= 2;
+ /* fallthrough */
case DASM_IMML8:
case DASM_IMML12:
CK(n >= 0 ? ((n>>((ins>>5)&31)) == 0) :
@@ -371,6 +372,7 @@ int dasm_encode(Dst_DECL, void *buffer)
break;
case DASM_REL_LG:
CK(n >= 0, UNDEF_LG);
+ /* fallthrough */
case DASM_REL_PC:
CK(n >= 0, UNDEF_PC);
n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) - 4;
diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
index 4b49fd8c..71a835b2 100644
--- a/dynasm/dasm_mips.h
+++ b/dynasm/dasm_mips.h
@@ -350,6 +350,7 @@ int dasm_encode(Dst_DECL, void *buffer)
break;
case DASM_REL_LG:
CK(n >= 0, UNDEF_LG);
+ /* fallthrough */
case DASM_REL_PC:
CK(n >= 0, UNDEF_PC);
n = *DASM_POS2PTR(D, n);
diff --git a/dynasm/dasm_ppc.h b/dynasm/dasm_ppc.h
index 3a7ee9b0..83fc030a 100644
--- a/dynasm/dasm_ppc.h
+++ b/dynasm/dasm_ppc.h
@@ -354,6 +354,7 @@ int dasm_encode(Dst_DECL, void *buffer)
break;
case DASM_REL_LG:
CK(n >= 0, UNDEF_LG);
+ /* fallthrough */
case DASM_REL_PC:
CK(n >= 0, UNDEF_PC);
n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base);
diff --git a/dynasm/dasm_x86.h b/dynasm/dasm_x86.h
index bc636357..2a276042 100644
--- a/dynasm/dasm_x86.h
+++ b/dynasm/dasm_x86.h
@@ -194,12 +194,13 @@ void dasm_put(Dst_DECL, int start, ...)
switch (action) {
case DASM_DISP:
if (n == 0) { if (mrm < 0) mrm = p[-2]; if ((mrm&7) != 5) break; }
- case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob;
+ /* fallthrough */
+ case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
case DASM_REL_A: /* Assumes ptrdiff_t is int. !x64 */
case DASM_IMM_D: ofs += 4; break;
case DASM_IMM_S: CK(((n+128)&-256) == 0, RANGE_I); goto ob;
case DASM_IMM_B: CK((n&-256) == 0, RANGE_I); ob: ofs++; break;
- case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob;
+ case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
case DASM_IMM_W: CK((n&-65536) == 0, RANGE_I); ofs += 2; break;
case DASM_SPACE: p++; ofs += n; break;
case DASM_SETLABEL: b[pos-2] = -0x40000000; break; /* Neg. label ofs. */
@@ -207,8 +208,8 @@ void dasm_put(Dst_DECL, int start, ...)
if (*p < 0x40 && p[1] == DASM_DISP) mrm = n;
if (*p < 0x20 && (n&7) == 4) ofs++;
switch ((*p++ >> 3) & 3) {
- case 3: n |= b[pos-3];
- case 2: n |= b[pos-2];
+ case 3: n |= b[pos-3]; /* fallthrough */
+ case 2: n |= b[pos-2]; /* fallthrough */
case 1: if (n <= 7) { b[pos-1] |= 0x10; ofs--; }
}
continue;
@@ -329,11 +330,14 @@ int dasm_link(Dst_DECL, size_t *szp)
pos += 2;
break;
}
+ /* fallthrough */
case DASM_SPACE: case DASM_IMM_LG: case DASM_VREG: p++;
+ /* fallthrough */
case DASM_DISP: case DASM_IMM_S: case DASM_IMM_B: case DASM_IMM_W:
case DASM_IMM_D: case DASM_IMM_WB: case DASM_IMM_DB:
case DASM_SETLABEL: case DASM_REL_A: case DASM_IMM_PC: pos++; break;
case DASM_LABEL_LG: p++;
+ /* fallthrough */
case DASM_LABEL_PC: b[pos++] += ofs; break; /* Fix label offset. */
case DASM_ALIGN: ofs -= (b[pos++]+ofs)&*p++; break; /* Adjust ofs. */
case DASM_EXTERN: p += 2; break;
@@ -391,12 +395,15 @@ int dasm_encode(Dst_DECL, void *buffer)
if (mrm != 5) { mm[-1] -= 0x80; break; } }
if (((n+128) & -256) != 0) goto wd; else mm[-1] -= 0x40;
}
+ /* fallthrough */
case DASM_IMM_S: case DASM_IMM_B: wb: dasmb(n); break;
case DASM_IMM_DB: if (((n+128)&-256) == 0) {
db: if (!mark) mark = cp; mark[-2] += 2; mark = NULL; goto wb;
} else mark = NULL;
+ /* fallthrough */
case DASM_IMM_D: wd: dasmd(n); break;
case DASM_IMM_WB: if (((n+128)&-256) == 0) goto db; else mark = NULL;
+ /* fallthrough */
case DASM_IMM_W: dasmw(n); break;
case DASM_VREG: {
int t = *p++;
@@ -421,6 +428,7 @@ int dasm_encode(Dst_DECL, void *buffer)
}
case DASM_REL_LG: p++; if (n >= 0) goto rel_pc;
b++; n = (int)(ptrdiff_t)D->globals[-n];
+ /* fallthrough */
case DASM_REL_A: rel_a: n -= (int)(ptrdiff_t)(cp+4); goto wd; /* !x64 */
case DASM_REL_PC: rel_pc: {
int shrink = *b++;
@@ -432,6 +440,7 @@ int dasm_encode(Dst_DECL, void *buffer)
}
case DASM_IMM_LG:
p++; if (n < 0) { n = (int)(ptrdiff_t)D->globals[-n]; goto wd; }
+ /* fallthrough */
case DASM_IMM_PC: {
int *pb = DASM_POS2PTR(D, n);
n = *pb < 0 ? pb[1] : (*pb + (int)(ptrdiff_t)base);
@@ -452,6 +461,7 @@ int dasm_encode(Dst_DECL, void *buffer)
case DASM_EXTERN: n = DASM_EXTERN(Dst, cp, p[1], *p); p += 2; goto wd;
case DASM_MARK: mark = cp; break;
case DASM_ESC: action = *p++;
+ /* fallthrough */
default: *cp++ = action; break;
case DASM_SECTION: case DASM_STOP: goto stop;
}
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 15de7e33..2d570bb9 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -2188,9 +2188,12 @@ static void asm_setup_regsp(ASMState *as)
if (ir->op2 != REF_NIL && as->evenspill < 4)
as->evenspill = 4; /* lj_cdata_newv needs 4 args. */
}
+ /* fallthrough */
#else
+ /* fallthrough */
case IR_CNEW:
#endif
+ /* fallthrough */
case IR_TNEW: case IR_TDUP: case IR_CNEWI: case IR_TOSTR:
case IR_BUFSTR:
ir->prev = REGSP_HINT(RID_RET);
@@ -2206,6 +2209,7 @@ static void asm_setup_regsp(ASMState *as)
case IR_LDEXP:
#endif
#endif
+ /* fallthrough */
case IR_POW:
if (!LJ_SOFTFP && irt_isnum(ir->t)) {
if (inloop)
@@ -2217,7 +2221,7 @@ static void asm_setup_regsp(ASMState *as)
continue;
#endif
}
- /* fallthrough for integer POW */
+ /* fallthrough */ /* for integer POW */
case IR_DIV: case IR_MOD:
if (!irt_isnum(ir->t)) {
ir->prev = REGSP_HINT(RID_RET);
@@ -2254,6 +2258,7 @@ static void asm_setup_regsp(ASMState *as)
case IR_BSHL: case IR_BSHR: case IR_BSAR:
if ((as->flags & JIT_F_BMI2)) /* Except if BMI2 is available. */
break;
+ /* fallthrough */
case IR_BROL: case IR_BROR:
if (!irref_isk(ir->op2) && !ra_hashint(IR(ir->op2)->r)) {
IR(ir->op2)->r = REGSP_HINT(RID_ECX);
diff --git a/src/lj_cparse.c b/src/lj_cparse.c
index 07c643d4..cd032b8e 100644
--- a/src/lj_cparse.c
+++ b/src/lj_cparse.c
@@ -595,28 +595,34 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
k->id = k2.id > k3.id ? k2.id : k3.id;
continue;
}
+ /* fallthrough */
case 1:
if (cp_opt(cp, CTOK_OROR)) {
cp_expr_sub(cp, &k2, 2); k->i32 = k->u32 || k2.u32; k->id = CTID_INT32;
continue;
}
+ /* fallthrough */
case 2:
if (cp_opt(cp, CTOK_ANDAND)) {
cp_expr_sub(cp, &k2, 3); k->i32 = k->u32 && k2.u32; k->id = CTID_INT32;
continue;
}
+ /* fallthrough */
case 3:
if (cp_opt(cp, '|')) {
cp_expr_sub(cp, &k2, 4); k->u32 = k->u32 | k2.u32; goto arith_result;
}
+ /* fallthrough */
case 4:
if (cp_opt(cp, '^')) {
cp_expr_sub(cp, &k2, 5); k->u32 = k->u32 ^ k2.u32; goto arith_result;
}
+ /* fallthrough */
case 5:
if (cp_opt(cp, '&')) {
cp_expr_sub(cp, &k2, 6); k->u32 = k->u32 & k2.u32; goto arith_result;
}
+ /* fallthrough */
case 6:
if (cp_opt(cp, CTOK_EQ)) {
cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 == k2.u32; k->id = CTID_INT32;
@@ -625,6 +631,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 != k2.u32; k->id = CTID_INT32;
continue;
}
+ /* fallthrough */
case 7:
if (cp_opt(cp, '<')) {
cp_expr_sub(cp, &k2, 8);
@@ -659,6 +666,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
k->id = CTID_INT32;
continue;
}
+ /* fallthrough */
case 8:
if (cp_opt(cp, CTOK_SHL)) {
cp_expr_sub(cp, &k2, 9); k->u32 = k->u32 << k2.u32;
@@ -671,6 +679,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
k->u32 = k->u32 >> k2.u32;
continue;
}
+ /* fallthrough */
case 9:
if (cp_opt(cp, '+')) {
cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 + k2.u32;
@@ -680,6 +689,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
} else if (cp_opt(cp, '-')) {
cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 - k2.u32; goto arith_result;
}
+ /* fallthrough */
case 10:
if (cp_opt(cp, '*')) {
cp_expr_unary(cp, &k2); k->u32 = k->u32 * k2.u32; goto arith_result;
diff --git a/src/lj_err.c b/src/lj_err.c
index 9903d273..8d7134d9 100644
--- a/src/lj_err.c
+++ b/src/lj_err.c
@@ -167,6 +167,7 @@ static void *err_unwind(lua_State *L, void *stopcf, int errcode)
case FRAME_CONT: /* Continuation frame. */
if (frame_iscont_fficb(frame))
goto unwind_c;
+ /* fallthrough */
case FRAME_VARG: /* Vararg frame. */
frame = frame_prevd(frame);
break;
diff --git a/src/lj_opt_sink.c b/src/lj_opt_sink.c
index a16d112f..c16363e7 100644
--- a/src/lj_opt_sink.c
+++ b/src/lj_opt_sink.c
@@ -100,8 +100,8 @@ static void sink_mark_ins(jit_State *J)
(LJ_32 && ir+1 < irlast && (ir+1)->o == IR_HIOP &&
!sink_checkphi(J, ir, (ir+1)->op2))))
irt_setmark(ir->t); /* Mark ineligible allocation. */
- /* fallthrough */
#endif
+ /* fallthrough */
case IR_USTORE:
irt_setmark(IR(ir->op2)->t); /* Mark stored value. */
break;
diff --git a/src/lj_parse.c b/src/lj_parse.c
index 343fa797..e238afa3 100644
--- a/src/lj_parse.c
+++ b/src/lj_parse.c
@@ -2684,7 +2684,8 @@ static int parse_stmt(LexState *ls)
lj_lex_next(ls);
parse_goto(ls);
break;
- } /* else: fallthrough */
+ }
+ /* fallthrough */
default:
parse_call_assign(ls);
break;
diff --git a/src/luajit.c b/src/luajit.c
index 1ca24301..3a3ec247 100644
--- a/src/luajit.c
+++ b/src/luajit.c
@@ -421,6 +421,7 @@ static int collectargs(char **argv, int *flags)
break;
case 'e':
*flags |= FLAGS_EXEC;
+ /* fallthrough */
case 'j': /* LuaJIT extension */
case 'l':
*flags |= FLAGS_OPTION;
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:17 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:59 ` Sergey Kaplun via Tarantool-patches
2023-08-17 7:37 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:17 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM as trivial, except for a few comments regarding the commit message below.
On Wed, Aug 09, 2023 at 06:36:00PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)
>
> This patch adds the `/* fallthrough */` where it may trigger the
> `-Wimplicit-fallthrough` [1] warning. Some cases still not covered by
Typo: s/cases still/cases are still/
> this comment and will be fixed in the future commits.
Typo: s/in the/in/
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
> dynasm/dasm_arm.h | 2 ++
> dynasm/dasm_mips.h | 1 +
> dynasm/dasm_ppc.h | 1 +
> dynasm/dasm_x86.h | 18 ++++++++++++++----
> src/lj_asm.c | 7 ++++++-
> src/lj_cparse.c | 10 ++++++++++
> src/lj_err.c | 1 +
> src/lj_opt_sink.c | 2 +-
> src/lj_parse.c | 3 ++-
> src/luajit.c | 1 +
> 10 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/dynasm/dasm_arm.h b/dynasm/dasm_arm.h
> index a43f7c66..1d404ccd 100644
> --- a/dynasm/dasm_arm.h
> +++ b/dynasm/dasm_arm.h
> @@ -254,6 +254,7 @@ void dasm_put(Dst_DECL, int start, ...)
> case DASM_IMMV8:
> CK((n & 3) == 0, RANGE_I);
> n >>= 2;
> + /* fallthrough */
> case DASM_IMML8:
> case DASM_IMML12:
> CK(n >= 0 ? ((n>>((ins>>5)&31)) == 0) :
> @@ -371,6 +372,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) - 4;
> diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
> index 4b49fd8c..71a835b2 100644
> --- a/dynasm/dasm_mips.h
> +++ b/dynasm/dasm_mips.h
> @@ -350,6 +350,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n);
> diff --git a/dynasm/dasm_ppc.h b/dynasm/dasm_ppc.h
> index 3a7ee9b0..83fc030a 100644
> --- a/dynasm/dasm_ppc.h
> +++ b/dynasm/dasm_ppc.h
> @@ -354,6 +354,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base);
> diff --git a/dynasm/dasm_x86.h b/dynasm/dasm_x86.h
> index bc636357..2a276042 100644
> --- a/dynasm/dasm_x86.h
> +++ b/dynasm/dasm_x86.h
> @@ -194,12 +194,13 @@ void dasm_put(Dst_DECL, int start, ...)
> switch (action) {
> case DASM_DISP:
> if (n == 0) { if (mrm < 0) mrm = p[-2]; if ((mrm&7) != 5) break; }
> - case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob;
> + /* fallthrough */
> + case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
> case DASM_REL_A: /* Assumes ptrdiff_t is int. !x64 */
> case DASM_IMM_D: ofs += 4; break;
> case DASM_IMM_S: CK(((n+128)&-256) == 0, RANGE_I); goto ob;
> case DASM_IMM_B: CK((n&-256) == 0, RANGE_I); ob: ofs++; break;
> - case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob;
> + case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
> case DASM_IMM_W: CK((n&-65536) == 0, RANGE_I); ofs += 2; break;
> case DASM_SPACE: p++; ofs += n; break;
> case DASM_SETLABEL: b[pos-2] = -0x40000000; break; /* Neg. label ofs. */
> @@ -207,8 +208,8 @@ void dasm_put(Dst_DECL, int start, ...)
> if (*p < 0x40 && p[1] == DASM_DISP) mrm = n;
> if (*p < 0x20 && (n&7) == 4) ofs++;
> switch ((*p++ >> 3) & 3) {
> - case 3: n |= b[pos-3];
> - case 2: n |= b[pos-2];
> + case 3: n |= b[pos-3]; /* fallthrough */
> + case 2: n |= b[pos-2]; /* fallthrough */
> case 1: if (n <= 7) { b[pos-1] |= 0x10; ofs--; }
> }
> continue;
> @@ -329,11 +330,14 @@ int dasm_link(Dst_DECL, size_t *szp)
> pos += 2;
> break;
> }
> + /* fallthrough */
> case DASM_SPACE: case DASM_IMM_LG: case DASM_VREG: p++;
> + /* fallthrough */
> case DASM_DISP: case DASM_IMM_S: case DASM_IMM_B: case DASM_IMM_W:
> case DASM_IMM_D: case DASM_IMM_WB: case DASM_IMM_DB:
> case DASM_SETLABEL: case DASM_REL_A: case DASM_IMM_PC: pos++; break;
> case DASM_LABEL_LG: p++;
> + /* fallthrough */
> case DASM_LABEL_PC: b[pos++] += ofs; break; /* Fix label offset. */
> case DASM_ALIGN: ofs -= (b[pos++]+ofs)&*p++; break; /* Adjust ofs. */
> case DASM_EXTERN: p += 2; break;
> @@ -391,12 +395,15 @@ int dasm_encode(Dst_DECL, void *buffer)
> if (mrm != 5) { mm[-1] -= 0x80; break; } }
> if (((n+128) & -256) != 0) goto wd; else mm[-1] -= 0x40;
> }
> + /* fallthrough */
> case DASM_IMM_S: case DASM_IMM_B: wb: dasmb(n); break;
> case DASM_IMM_DB: if (((n+128)&-256) == 0) {
> db: if (!mark) mark = cp; mark[-2] += 2; mark = NULL; goto wb;
> } else mark = NULL;
> + /* fallthrough */
> case DASM_IMM_D: wd: dasmd(n); break;
> case DASM_IMM_WB: if (((n+128)&-256) == 0) goto db; else mark = NULL;
> + /* fallthrough */
> case DASM_IMM_W: dasmw(n); break;
> case DASM_VREG: {
> int t = *p++;
> @@ -421,6 +428,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> }
> case DASM_REL_LG: p++; if (n >= 0) goto rel_pc;
> b++; n = (int)(ptrdiff_t)D->globals[-n];
> + /* fallthrough */
> case DASM_REL_A: rel_a: n -= (int)(ptrdiff_t)(cp+4); goto wd; /* !x64 */
> case DASM_REL_PC: rel_pc: {
> int shrink = *b++;
> @@ -432,6 +440,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> }
> case DASM_IMM_LG:
> p++; if (n < 0) { n = (int)(ptrdiff_t)D->globals[-n]; goto wd; }
> + /* fallthrough */
> case DASM_IMM_PC: {
> int *pb = DASM_POS2PTR(D, n);
> n = *pb < 0 ? pb[1] : (*pb + (int)(ptrdiff_t)base);
> @@ -452,6 +461,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> case DASM_EXTERN: n = DASM_EXTERN(Dst, cp, p[1], *p); p += 2; goto wd;
> case DASM_MARK: mark = cp; break;
> case DASM_ESC: action = *p++;
> + /* fallthrough */
> default: *cp++ = action; break;
> case DASM_SECTION: case DASM_STOP: goto stop;
> }
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 15de7e33..2d570bb9 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -2188,9 +2188,12 @@ static void asm_setup_regsp(ASMState *as)
> if (ir->op2 != REF_NIL && as->evenspill < 4)
> as->evenspill = 4; /* lj_cdata_newv needs 4 args. */
> }
> + /* fallthrough */
> #else
> + /* fallthrough */
> case IR_CNEW:
> #endif
> + /* fallthrough */
> case IR_TNEW: case IR_TDUP: case IR_CNEWI: case IR_TOSTR:
> case IR_BUFSTR:
> ir->prev = REGSP_HINT(RID_RET);
> @@ -2206,6 +2209,7 @@ static void asm_setup_regsp(ASMState *as)
> case IR_LDEXP:
> #endif
> #endif
> + /* fallthrough */
> case IR_POW:
> if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> if (inloop)
> @@ -2217,7 +2221,7 @@ static void asm_setup_regsp(ASMState *as)
> continue;
> #endif
> }
> - /* fallthrough for integer POW */
> + /* fallthrough */ /* for integer POW */
> case IR_DIV: case IR_MOD:
> if (!irt_isnum(ir->t)) {
> ir->prev = REGSP_HINT(RID_RET);
> @@ -2254,6 +2258,7 @@ static void asm_setup_regsp(ASMState *as)
> case IR_BSHL: case IR_BSHR: case IR_BSAR:
> if ((as->flags & JIT_F_BMI2)) /* Except if BMI2 is available. */
> break;
> + /* fallthrough */
> case IR_BROL: case IR_BROR:
> if (!irref_isk(ir->op2) && !ra_hashint(IR(ir->op2)->r)) {
> IR(ir->op2)->r = REGSP_HINT(RID_ECX);
> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
> index 07c643d4..cd032b8e 100644
> --- a/src/lj_cparse.c
> +++ b/src/lj_cparse.c
> @@ -595,28 +595,34 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> k->id = k2.id > k3.id ? k2.id : k3.id;
> continue;
> }
> + /* fallthrough */
> case 1:
> if (cp_opt(cp, CTOK_OROR)) {
> cp_expr_sub(cp, &k2, 2); k->i32 = k->u32 || k2.u32; k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 2:
> if (cp_opt(cp, CTOK_ANDAND)) {
> cp_expr_sub(cp, &k2, 3); k->i32 = k->u32 && k2.u32; k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 3:
> if (cp_opt(cp, '|')) {
> cp_expr_sub(cp, &k2, 4); k->u32 = k->u32 | k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 4:
> if (cp_opt(cp, '^')) {
> cp_expr_sub(cp, &k2, 5); k->u32 = k->u32 ^ k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 5:
> if (cp_opt(cp, '&')) {
> cp_expr_sub(cp, &k2, 6); k->u32 = k->u32 & k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 6:
> if (cp_opt(cp, CTOK_EQ)) {
> cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 == k2.u32; k->id = CTID_INT32;
> @@ -625,6 +631,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 != k2.u32; k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 7:
> if (cp_opt(cp, '<')) {
> cp_expr_sub(cp, &k2, 8);
> @@ -659,6 +666,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 8:
> if (cp_opt(cp, CTOK_SHL)) {
> cp_expr_sub(cp, &k2, 9); k->u32 = k->u32 << k2.u32;
> @@ -671,6 +679,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> k->u32 = k->u32 >> k2.u32;
> continue;
> }
> + /* fallthrough */
> case 9:
> if (cp_opt(cp, '+')) {
> cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 + k2.u32;
> @@ -680,6 +689,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> } else if (cp_opt(cp, '-')) {
> cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 - k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 10:
> if (cp_opt(cp, '*')) {
> cp_expr_unary(cp, &k2); k->u32 = k->u32 * k2.u32; goto arith_result;
> diff --git a/src/lj_err.c b/src/lj_err.c
> index 9903d273..8d7134d9 100644
> --- a/src/lj_err.c
> +++ b/src/lj_err.c
> @@ -167,6 +167,7 @@ static void *err_unwind(lua_State *L, void *stopcf, int errcode)
> case FRAME_CONT: /* Continuation frame. */
> if (frame_iscont_fficb(frame))
> goto unwind_c;
> + /* fallthrough */
> case FRAME_VARG: /* Vararg frame. */
> frame = frame_prevd(frame);
> break;
> diff --git a/src/lj_opt_sink.c b/src/lj_opt_sink.c
> index a16d112f..c16363e7 100644
> --- a/src/lj_opt_sink.c
> +++ b/src/lj_opt_sink.c
> @@ -100,8 +100,8 @@ static void sink_mark_ins(jit_State *J)
> (LJ_32 && ir+1 < irlast && (ir+1)->o == IR_HIOP &&
> !sink_checkphi(J, ir, (ir+1)->op2))))
> irt_setmark(ir->t); /* Mark ineligible allocation. */
> - /* fallthrough */
> #endif
> + /* fallthrough */
> case IR_USTORE:
> irt_setmark(IR(ir->op2)->t); /* Mark stored value. */
> break;
> diff --git a/src/lj_parse.c b/src/lj_parse.c
> index 343fa797..e238afa3 100644
> --- a/src/lj_parse.c
> +++ b/src/lj_parse.c
> @@ -2684,7 +2684,8 @@ static int parse_stmt(LexState *ls)
> lj_lex_next(ls);
> parse_goto(ls);
> break;
> - } /* else: fallthrough */
> + }
> + /* fallthrough */
> default:
> parse_call_assign(ls);
> break;
> diff --git a/src/luajit.c b/src/luajit.c
> index 1ca24301..3a3ec247 100644
> --- a/src/luajit.c
> +++ b/src/luajit.c
> @@ -421,6 +421,7 @@ static int collectargs(char **argv, int *flags)
> break;
> case 'e':
> *flags |= FLAGS_EXEC;
> + /* fallthrough */
> case 'j': /* LuaJIT extension */
> case 'l':
> *flags |= FLAGS_OPTION;
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-15 13:17 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:59 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:59 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM as trivial, except for a few comments regarding the commit message below.
> On Wed, Aug 09, 2023 at 06:36:00PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > (cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)
> >
> > This patch adds the `/* fallthrough */` where it may trigger the
> > `-Wimplicit-fallthrough` [1] warning. Some cases still not covered by
> Typo: s/cases still/cases are still/
Fixed, thanks!
> > this comment and will be fixed in the future commits.
> Typo: s/in the/in/
Fixed.
> >
> > [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> >
> > Sergey Kaplun:
> > * added the description for the commit
> >
> > Part of tarantool/tarantool#8825
> > ---
<snipped>
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
2023-08-15 13:17 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 7:37 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 7:37 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch! LGTM
On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)
>
> This patch adds the `/* fallthrough */` where it may trigger the
> `-Wimplicit-fallthrough` [1] warning. Some cases still not covered by
> this comment and will be fixed in the future commits.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
> dynasm/dasm_arm.h | 2 ++
> dynasm/dasm_mips.h | 1 +
> dynasm/dasm_ppc.h | 1 +
> dynasm/dasm_x86.h | 18 ++++++++++++++----
> src/lj_asm.c | 7 ++++++-
> src/lj_cparse.c | 10 ++++++++++
> src/lj_err.c | 1 +
> src/lj_opt_sink.c | 2 +-
> src/lj_parse.c | 3 ++-
> src/luajit.c | 1 +
> 10 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/dynasm/dasm_arm.h b/dynasm/dasm_arm.h
> index a43f7c66..1d404ccd 100644
> --- a/dynasm/dasm_arm.h
> +++ b/dynasm/dasm_arm.h
> @@ -254,6 +254,7 @@ void dasm_put(Dst_DECL, int start, ...)
> case DASM_IMMV8:
> CK((n & 3) == 0, RANGE_I);
> n >>= 2;
> + /* fallthrough */
> case DASM_IMML8:
> case DASM_IMML12:
> CK(n >= 0 ? ((n>>((ins>>5)&31)) == 0) :
> @@ -371,6 +372,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) - 4;
> diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
> index 4b49fd8c..71a835b2 100644
> --- a/dynasm/dasm_mips.h
> +++ b/dynasm/dasm_mips.h
> @@ -350,6 +350,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n);
> diff --git a/dynasm/dasm_ppc.h b/dynasm/dasm_ppc.h
> index 3a7ee9b0..83fc030a 100644
> --- a/dynasm/dasm_ppc.h
> +++ b/dynasm/dasm_ppc.h
> @@ -354,6 +354,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base);
> diff --git a/dynasm/dasm_x86.h b/dynasm/dasm_x86.h
> index bc636357..2a276042 100644
> --- a/dynasm/dasm_x86.h
> +++ b/dynasm/dasm_x86.h
> @@ -194,12 +194,13 @@ void dasm_put(Dst_DECL, int start, ...)
> switch (action) {
> case DASM_DISP:
> if (n == 0) { if (mrm < 0) mrm = p[-2]; if ((mrm&7) != 5) break; }
> - case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob;
> + /* fallthrough */
> + case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
> case DASM_REL_A: /* Assumes ptrdiff_t is int. !x64 */
> case DASM_IMM_D: ofs += 4; break;
> case DASM_IMM_S: CK(((n+128)&-256) == 0, RANGE_I); goto ob;
> case DASM_IMM_B: CK((n&-256) == 0, RANGE_I); ob: ofs++; break;
> - case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob;
> + case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
> case DASM_IMM_W: CK((n&-65536) == 0, RANGE_I); ofs += 2; break;
> case DASM_SPACE: p++; ofs += n; break;
> case DASM_SETLABEL: b[pos-2] = -0x40000000; break; /* Neg. label ofs. */
> @@ -207,8 +208,8 @@ void dasm_put(Dst_DECL, int start, ...)
> if (*p < 0x40 && p[1] == DASM_DISP) mrm = n;
> if (*p < 0x20 && (n&7) == 4) ofs++;
> switch ((*p++ >> 3) & 3) {
> - case 3: n |= b[pos-3];
> - case 2: n |= b[pos-2];
> + case 3: n |= b[pos-3]; /* fallthrough */
> + case 2: n |= b[pos-2]; /* fallthrough */
> case 1: if (n <= 7) { b[pos-1] |= 0x10; ofs--; }
> }
> continue;
> @@ -329,11 +330,14 @@ int dasm_link(Dst_DECL, size_t *szp)
> pos += 2;
> break;
> }
> + /* fallthrough */
> case DASM_SPACE: case DASM_IMM_LG: case DASM_VREG: p++;
> + /* fallthrough */
> case DASM_DISP: case DASM_IMM_S: case DASM_IMM_B: case DASM_IMM_W:
> case DASM_IMM_D: case DASM_IMM_WB: case DASM_IMM_DB:
> case DASM_SETLABEL: case DASM_REL_A: case DASM_IMM_PC: pos++; break;
> case DASM_LABEL_LG: p++;
> + /* fallthrough */
> case DASM_LABEL_PC: b[pos++] += ofs; break; /* Fix label offset. */
> case DASM_ALIGN: ofs -= (b[pos++]+ofs)&*p++; break; /* Adjust ofs. */
> case DASM_EXTERN: p += 2; break;
> @@ -391,12 +395,15 @@ int dasm_encode(Dst_DECL, void *buffer)
> if (mrm != 5) { mm[-1] -= 0x80; break; } }
> if (((n+128) & -256) != 0) goto wd; else mm[-1] -= 0x40;
> }
> + /* fallthrough */
> case DASM_IMM_S: case DASM_IMM_B: wb: dasmb(n); break;
> case DASM_IMM_DB: if (((n+128)&-256) == 0) {
> db: if (!mark) mark = cp; mark[-2] += 2; mark = NULL; goto wb;
> } else mark = NULL;
> + /* fallthrough */
> case DASM_IMM_D: wd: dasmd(n); break;
> case DASM_IMM_WB: if (((n+128)&-256) == 0) goto db; else mark = NULL;
> + /* fallthrough */
> case DASM_IMM_W: dasmw(n); break;
> case DASM_VREG: {
> int t = *p++;
> @@ -421,6 +428,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> }
> case DASM_REL_LG: p++; if (n >= 0) goto rel_pc;
> b++; n = (int)(ptrdiff_t)D->globals[-n];
> + /* fallthrough */
> case DASM_REL_A: rel_a: n -= (int)(ptrdiff_t)(cp+4); goto wd; /* !x64 */
> case DASM_REL_PC: rel_pc: {
> int shrink = *b++;
> @@ -432,6 +440,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> }
> case DASM_IMM_LG:
> p++; if (n < 0) { n = (int)(ptrdiff_t)D->globals[-n]; goto wd; }
> + /* fallthrough */
> case DASM_IMM_PC: {
> int *pb = DASM_POS2PTR(D, n);
> n = *pb < 0 ? pb[1] : (*pb + (int)(ptrdiff_t)base);
> @@ -452,6 +461,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> case DASM_EXTERN: n = DASM_EXTERN(Dst, cp, p[1], *p); p += 2; goto wd;
> case DASM_MARK: mark = cp; break;
> case DASM_ESC: action = *p++;
> + /* fallthrough */
> default: *cp++ = action; break;
> case DASM_SECTION: case DASM_STOP: goto stop;
> }
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 15de7e33..2d570bb9 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -2188,9 +2188,12 @@ static void asm_setup_regsp(ASMState *as)
> if (ir->op2 != REF_NIL && as->evenspill < 4)
> as->evenspill = 4; /* lj_cdata_newv needs 4 args. */
> }
> + /* fallthrough */
> #else
> + /* fallthrough */
> case IR_CNEW:
> #endif
> + /* fallthrough */
> case IR_TNEW: case IR_TDUP: case IR_CNEWI: case IR_TOSTR:
> case IR_BUFSTR:
> ir->prev = REGSP_HINT(RID_RET);
> @@ -2206,6 +2209,7 @@ static void asm_setup_regsp(ASMState *as)
> case IR_LDEXP:
> #endif
> #endif
> + /* fallthrough */
> case IR_POW:
> if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> if (inloop)
> @@ -2217,7 +2221,7 @@ static void asm_setup_regsp(ASMState *as)
> continue;
> #endif
> }
> - /* fallthrough for integer POW */
> + /* fallthrough */ /* for integer POW */
> case IR_DIV: case IR_MOD:
> if (!irt_isnum(ir->t)) {
> ir->prev = REGSP_HINT(RID_RET);
> @@ -2254,6 +2258,7 @@ static void asm_setup_regsp(ASMState *as)
> case IR_BSHL: case IR_BSHR: case IR_BSAR:
> if ((as->flags & JIT_F_BMI2)) /* Except if BMI2 is available. */
> break;
> + /* fallthrough */
> case IR_BROL: case IR_BROR:
> if (!irref_isk(ir->op2) && !ra_hashint(IR(ir->op2)->r)) {
> IR(ir->op2)->r = REGSP_HINT(RID_ECX);
> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
> index 07c643d4..cd032b8e 100644
> --- a/src/lj_cparse.c
> +++ b/src/lj_cparse.c
> @@ -595,28 +595,34 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> k->id = k2.id > k3.id ? k2.id : k3.id;
> continue;
> }
> + /* fallthrough */
> case 1:
> if (cp_opt(cp, CTOK_OROR)) {
> cp_expr_sub(cp, &k2, 2); k->i32 = k->u32 || k2.u32; k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 2:
> if (cp_opt(cp, CTOK_ANDAND)) {
> cp_expr_sub(cp, &k2, 3); k->i32 = k->u32 && k2.u32; k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 3:
> if (cp_opt(cp, '|')) {
> cp_expr_sub(cp, &k2, 4); k->u32 = k->u32 | k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 4:
> if (cp_opt(cp, '^')) {
> cp_expr_sub(cp, &k2, 5); k->u32 = k->u32 ^ k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 5:
> if (cp_opt(cp, '&')) {
> cp_expr_sub(cp, &k2, 6); k->u32 = k->u32 & k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 6:
> if (cp_opt(cp, CTOK_EQ)) {
> cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 == k2.u32; k->id = CTID_INT32;
> @@ -625,6 +631,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 != k2.u32; k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 7:
> if (cp_opt(cp, '<')) {
> cp_expr_sub(cp, &k2, 8);
> @@ -659,6 +666,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> k->id = CTID_INT32;
> continue;
> }
> + /* fallthrough */
> case 8:
> if (cp_opt(cp, CTOK_SHL)) {
> cp_expr_sub(cp, &k2, 9); k->u32 = k->u32 << k2.u32;
> @@ -671,6 +679,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> k->u32 = k->u32 >> k2.u32;
> continue;
> }
> + /* fallthrough */
> case 9:
> if (cp_opt(cp, '+')) {
> cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 + k2.u32;
> @@ -680,6 +689,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
> } else if (cp_opt(cp, '-')) {
> cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 - k2.u32; goto arith_result;
> }
> + /* fallthrough */
> case 10:
> if (cp_opt(cp, '*')) {
> cp_expr_unary(cp, &k2); k->u32 = k->u32 * k2.u32; goto arith_result;
> diff --git a/src/lj_err.c b/src/lj_err.c
> index 9903d273..8d7134d9 100644
> --- a/src/lj_err.c
> +++ b/src/lj_err.c
> @@ -167,6 +167,7 @@ static void *err_unwind(lua_State *L, void *stopcf, int errcode)
> case FRAME_CONT: /* Continuation frame. */
> if (frame_iscont_fficb(frame))
> goto unwind_c;
> + /* fallthrough */
> case FRAME_VARG: /* Vararg frame. */
> frame = frame_prevd(frame);
> break;
> diff --git a/src/lj_opt_sink.c b/src/lj_opt_sink.c
> index a16d112f..c16363e7 100644
> --- a/src/lj_opt_sink.c
> +++ b/src/lj_opt_sink.c
> @@ -100,8 +100,8 @@ static void sink_mark_ins(jit_State *J)
> (LJ_32 && ir+1 < irlast && (ir+1)->o == IR_HIOP &&
> !sink_checkphi(J, ir, (ir+1)->op2))))
> irt_setmark(ir->t); /* Mark ineligible allocation. */
> - /* fallthrough */
> #endif
> + /* fallthrough */
> case IR_USTORE:
> irt_setmark(IR(ir->op2)->t); /* Mark stored value. */
> break;
> diff --git a/src/lj_parse.c b/src/lj_parse.c
> index 343fa797..e238afa3 100644
> --- a/src/lj_parse.c
> +++ b/src/lj_parse.c
> @@ -2684,7 +2684,8 @@ static int parse_stmt(LexState *ls)
> lj_lex_next(ls);
> parse_goto(ls);
> break;
> - } /* else: fallthrough */
> + }
> + /* fallthrough */
> default:
> parse_call_assign(ls);
> break;
> diff --git a/src/luajit.c b/src/luajit.c
> index 1ca24301..3a3ec247 100644
> --- a/src/luajit.c
> +++ b/src/luajit.c
> @@ -421,6 +421,7 @@ static int collectargs(char **argv, int *flags)
> break;
> case 'e':
> *flags |= FLAGS_EXEC;
> + /* fallthrough */
> case 'j': /* LuaJIT extension */
> case 'l':
> *flags |= FLAGS_OPTION;
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (10 preceding siblings ...)
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:21 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 7:39 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
` (9 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
(cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
This patch adds the `/* fallthrough */` comments elsewhere, where it was
missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
is trigerred.
[1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
Sergey Kaplun:
* added the description for the commit
Part of tarantool/tarantool#8825
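For readers unfamiliar with the warning: GCC 7 reports "this statement may
fall through" for a case label reached by falling off the end of the previous
case, unless a marker comment such as `/* fallthrough */` precedes the label.
Below is a minimal standalone sketch (illustrative only, not taken from the
LuaJIT sources) of the pattern and the marker:
===================================================================
#include <stdio.h>

/* Build with: gcc -Wimplicit-fallthrough -c fallthrough.c */
static const char *describe(int n)
{
  switch (n) {
  case 0:
    puts("zero");
    /* fallthrough */
  case 1:
    /* Without the marker above, GCC 7 reports "this statement may fall
     * through" for the puts() in case 0. */
    return "small";
  default:
    return "big";
  }
}

int main(void)
{
  printf("%s\n", describe(0));  /* Prints "zero", then "small". */
  return 0;
}
===================================================================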
---
dynasm/dasm_arm64.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/dynasm/dasm_arm64.h b/dynasm/dasm_arm64.h
index 47e1e074..ff21236d 100644
--- a/dynasm/dasm_arm64.h
+++ b/dynasm/dasm_arm64.h
@@ -427,6 +427,7 @@ int dasm_encode(Dst_DECL, void *buffer)
break;
case DASM_REL_LG:
CK(n >= 0, UNDEF_LG);
+ /* fallthrough */
case DASM_REL_PC:
CK(n >= 0, UNDEF_PC);
n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4;
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:21 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:01 ` Sergey Kaplun via Tarantool-patches
2023-08-17 7:39 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:21 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM as trivial, except for the single comment regarding the commit message below.
On Wed, Aug 09, 2023 at 06:36:01PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
>
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> is trigerred.
Since there are no 'comments', but the single 'comment', I believe a better phrasing
would be:
| This patch adds the `/* fallthrough */` comment to dynasm/dasm_arm64.h, so the
| `-Wimplicit-fallthrough` [1] warning is not trigerred anymore for the ARM64 build.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
> dynasm/dasm_arm64.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/dynasm/dasm_arm64.h b/dynasm/dasm_arm64.h
> index 47e1e074..ff21236d 100644
> --- a/dynasm/dasm_arm64.h
> +++ b/dynasm/dasm_arm64.h
> @@ -427,6 +427,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4;
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
2023-08-15 13:21 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:01 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:01 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Rephrased the commit message, as you've suggested.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM as trivial, except for the single comment regarding the commit message below.
> On Wed, Aug 09, 2023 at 06:36:01PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
> >
> > This patch adds the `/* fallthrough */` comments elsewhere, where it was
> > missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> > is trigerred.
> Since there are no 'comments', but the single 'comment', I believe a better phrasing
> would be:
> | This patch adds the `/* fallthrough */` comment to dynasm/dasm_arm64.h, so the
> | `-Wimplicit-fallthrough` [1] warning is not trigerred anymore for the ARM64 build.
Fixed, thanks!
> >
> > [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> >
> > Sergey Kaplun:
> > * added the description for the commit
> >
> > Part of tarantool/tarantool#8825
> > ---
<snipped>
> > 2.41.0
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
2023-08-15 13:21 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 7:39 ` Sergey Bronnikov via Tarantool-patches
2023-08-17 7:51 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 7:39 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch! LGTM
On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
>
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> is trigerred.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
> dynasm/dasm_arm64.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/dynasm/dasm_arm64.h b/dynasm/dasm_arm64.h
> index 47e1e074..ff21236d 100644
> --- a/dynasm/dasm_arm64.h
> +++ b/dynasm/dasm_arm64.h
> @@ -427,6 +427,7 @@ int dasm_encode(Dst_DECL, void *buffer)
> break;
> case DASM_REL_LG:
> CK(n >= 0, UNDEF_LG);
> + /* fallthrough */
> case DASM_REL_PC:
> CK(n >= 0, UNDEF_PC);
> n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4;
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
2023-08-17 7:39 ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17 7:51 ` Sergey Bronnikov via Tarantool-patches
2023-08-17 7:58 ` Sergey Kaplun via Tarantool-patches
0 siblings, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 7:51 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey, again
On 8/17/23 10:39, Sergey Bronnikov via Tarantool-patches wrote:
> Hi, Sergey!
>
>
> Thanks for the patch! LGTM
>
>
> On 8/9/23 18:36, Sergey Kaplun wrote:
>> From: Mike Pall <mike>
>>
>> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
>>
>> This patch adds the `/* fallthrough */` comments elsewhere, where it was
>> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
>> is trigerred.
>>
Typo: triggered
<snipped>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
2023-08-17 7:51 ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17 7:58 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17 7:58 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey, again
>
> On 8/17/23 10:39, Sergey Bronnikov via Tarantool-patches wrote:
> > Hi, Sergey!
> >
> >
> > Thanks for the patch! LGTM
> >
> >
> > On 8/9/23 18:36, Sergey Kaplun wrote:
> >> From: Mike Pall <mike>
> >>
> >> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
> >>
> >> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> >> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> >> is trigerred.
> >>
> Typo: triggered
Fixed.
>
>
> <snipped>
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (11 preceding siblings ...)
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:25 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 7:44 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
` (8 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
(cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)
This patch adds the `/* fallthrough */` comments elsewhere, where it was
missing for the ARM build, so the `-Wimplicit-fallthrough` [1] warning
is trigerred.
Also, this commits sets the correspoinding flag in the
<cmake/SetTargetFlags.cmake>.
[1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
Sergey Kaplun:
* added the description for the commit
Part of tarantool/tarantool#8825
---
cmake/SetTargetFlags.cmake | 6 ++++++
src/lj_asm.c | 2 +-
src/lj_asm_arm.h | 4 ++--
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
index 3b9e481d..d309989e 100644
--- a/cmake/SetTargetFlags.cmake
+++ b/cmake/SetTargetFlags.cmake
@@ -8,6 +8,12 @@
include(CheckUnwindTables)
+# Clang does not recognize comment markers.
+if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
+ AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
+ AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
+endif()
+
if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
set(BUILDVM_MODE machasm)
else() # Linux and FreeBSD.
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 2d570bb9..25b96264 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -2176,8 +2176,8 @@ static void asm_setup_regsp(ASMState *as)
#if LJ_SOFTFP
case IR_MIN: case IR_MAX:
if ((ir+1)->o != IR_HIOP) break;
- /* fallthrough */
#endif
+ /* fallthrough */
/* C calls evict all scratch regs and return results in RID_RET. */
case IR_SNEW: case IR_XSNEW: case IR_NEWREF: case IR_BUFPUT:
if (REGARG_NUMGPR < 3 && as->evenspill < 3)
diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h
index 6ae6e2f2..2894e5c9 100644
--- a/src/lj_asm_arm.h
+++ b/src/lj_asm_arm.h
@@ -979,7 +979,7 @@ static ARMIns asm_fxloadins(IRIns *ir)
case IRT_I16: return ARMI_LDRSH;
case IRT_U16: return ARMI_LDRH;
case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VLDR_D;
- case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S;
+ case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S; /* fallthrough */
default: return ARMI_LDR;
}
}
@@ -990,7 +990,7 @@ static ARMIns asm_fxstoreins(IRIns *ir)
case IRT_I8: case IRT_U8: return ARMI_STRB;
case IRT_I16: case IRT_U16: return ARMI_STRH;
case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VSTR_D;
- case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S;
+ case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S; /* fallthrough */
default: return ARMI_STR;
}
}
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:25 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:08 ` Sergey Kaplun via Tarantool-patches
2023-08-17 7:44 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:25 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM as trivial, except for a few nits, regarding the commit message.
On Wed, Aug 09, 2023 at 06:36:02PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)
>
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
Typo: s/where it was/where they were/
> missing for the ARM build, so the `-Wimplicit-fallthrough` [1] warning
> is trigerred.
Typo: s/is trigerred/is not triggered/
>
> Also, this commits sets the correspoinding flag in the
Typo: s/commits/commit/
Typo: s/in the/in/
> <cmake/SetTargetFlags.cmake>.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
> cmake/SetTargetFlags.cmake | 6 ++++++
> src/lj_asm.c | 2 +-
> src/lj_asm_arm.h | 4 ++--
> 3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
> index 3b9e481d..d309989e 100644
> --- a/cmake/SetTargetFlags.cmake
> +++ b/cmake/SetTargetFlags.cmake
> @@ -8,6 +8,12 @@
>
> include(CheckUnwindTables)
>
> +# Clang does not recognize comment markers.
> +if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
> + AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
> + AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
> +endif()
> +
> if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
> set(BUILDVM_MODE machasm)
> else() # Linux and FreeBSD.
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 2d570bb9..25b96264 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -2176,8 +2176,8 @@ static void asm_setup_regsp(ASMState *as)
> #if LJ_SOFTFP
> case IR_MIN: case IR_MAX:
> if ((ir+1)->o != IR_HIOP) break;
> - /* fallthrough */
> #endif
> + /* fallthrough */
> /* C calls evict all scratch regs and return results in RID_RET. */
> case IR_SNEW: case IR_XSNEW: case IR_NEWREF: case IR_BUFPUT:
> if (REGARG_NUMGPR < 3 && as->evenspill < 3)
> diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h
> index 6ae6e2f2..2894e5c9 100644
> --- a/src/lj_asm_arm.h
> +++ b/src/lj_asm_arm.h
> @@ -979,7 +979,7 @@ static ARMIns asm_fxloadins(IRIns *ir)
> case IRT_I16: return ARMI_LDRSH;
> case IRT_U16: return ARMI_LDRH;
> case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VLDR_D;
> - case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S;
> + case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S; /* fallthrough */
> default: return ARMI_LDR;
> }
> }
> @@ -990,7 +990,7 @@ static ARMIns asm_fxstoreins(IRIns *ir)
> case IRT_I8: case IRT_U8: return ARMI_STRB;
> case IRT_I16: case IRT_U16: return ARMI_STRH;
> case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VSTR_D;
> - case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S;
> + case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S; /* fallthrough */
> default: return ARMI_STR;
> }
> }
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-15 13:25 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:08 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:08 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM as trivial, except for a few nits, regarding the commit message.
> On Wed, Aug 09, 2023 at 06:36:02PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > (cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)
> >
> > This patch adds the `/* fallthrough */` comments elsewhere, where it was
> Typo: s/where it was/where they were/
Fixed.
> > missing for the ARM build, so the `-Wimplicit-fallthrough` [1] warning
> > is trigerred.
> Typo: s/is trigerred/is not triggered/
Fixed, thanks!
> >
> > Also, this commits sets the correspoinding flag in the
> Typo: s/commits/commit/
> Typo: s/in the/in/
Fixed.
> > <cmake/SetTargetFlags.cmake>.
> >
> > [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> >
> > Sergey Kaplun:
> > * added the description for the commit
> >
> > Part of tarantool/tarantool#8825
> > ---
Also added the following patch to be consistent with our codestyle in
CMake.
===================================================================
diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
index d309989e..d6ee1693 100644
--- a/cmake/SetTargetFlags.cmake
+++ b/cmake/SetTargetFlags.cmake
@@ -9,7 +9,7 @@
include(CheckUnwindTables)
# Clang does not recognize comment markers.
-if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
+if(CMAKE_C_COMPILER_ID STREQUAL "GNU"
AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
endif()
===================================================================
<snipped>
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
2023-08-15 13:25 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 7:44 ` Sergey Bronnikov via Tarantool-patches
2023-08-17 8:01 ` Sergey Kaplun via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 7:44 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
LGTM, but I have a question inline
On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)
>
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> missing for the ARM build, so the `-Wimplicit-fallthrough` [1] warning
> is trigerred.
>
> Also, this commits sets the correspoinding flag in the
> <cmake/SetTargetFlags.cmake>.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
> cmake/SetTargetFlags.cmake | 6 ++++++
> src/lj_asm.c | 2 +-
> src/lj_asm_arm.h | 4 ++--
> 3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
> index 3b9e481d..d309989e 100644
> --- a/cmake/SetTargetFlags.cmake
> +++ b/cmake/SetTargetFlags.cmake
> @@ -8,6 +8,12 @@
>
> include(CheckUnwindTables)
>
> +# Clang does not recognize comment markers.
> +if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
> + AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
GCC 7.1 because there was no 7.0 in release series 7 [1], right?
1. https://gcc.gnu.org/gcc-7/
> + AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
> +endif()
> +
> if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
> set(BUILDVM_MODE machasm)
> else() # Linux and FreeBSD.
<snipped>
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (12 preceding siblings ...)
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:35 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 8:29 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
` (7 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Thanks to Sergey Ostanevich.
(cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
This patch just reverts the commit
48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
debug.getinfo(1,'>S')") and applies the one from the main repo for the
consistency with the upstream.
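For context, the '>' option mirrors the Lua/C API convention: `lua_getinfo()`
with a leading '>' in `what` takes the function to inspect from the top of the
stack instead of an activation record, and the fixed check below simply
returns 0 when that slot does not hold a function. A minimal host-program
sketch (illustrative only, not part of this patch) of the intended '>' usage:
===================================================================
#include <stdio.h>
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>

int main(void)
{
  lua_State *L = luaL_newstate();
  lua_Debug ar;
  luaL_openlibs(L);
  lua_getglobal(L, "print");      /* Push the function to inspect. */
  if (lua_getinfo(L, ">S", &ar))  /* '>' pops the pushed function. */
    printf("what=%s source=%s\n", ar.what, ar.short_src);
  lua_close(L);
  return 0;
}
===================================================================
Passing a stack level instead of a function, as in `debug.getinfo(1, '>S')`,
is exactly the misuse this check guards against.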
---
src/lj_debug.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/src/lj_debug.c b/src/lj_debug.c
index 654dc913..c4edcabb 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -431,16 +431,12 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
TValue *frame = NULL;
TValue *nextframe = NULL;
GCfunc *fn;
- if (*what == '>') { /* we have to have an extra arg on stack */
- if (lua_gettop(L) > 2) {
- TValue *func = L->top - 1;
- api_check(L, tvisfunc(func));
- fn = funcV(func);
- L->top--;
- what++;
- } else { /* need better error to display? */
- return 0;
- }
+ if (*what == '>') {
+ TValue *func = L->top - 1;
+ if (!tvisfunc(func)) return 0;
+ fn = funcV(func);
+ L->top--;
+ what++;
} else {
uint32_t offset = (uint32_t)ar->i_ci & 0xffff;
uint32_t size = (uint32_t)ar->i_ci >> 16;
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:35 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:20 ` Sergey Kaplun via Tarantool-patches
2023-08-17 8:29 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:35 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
Please consider my comments below.
On Wed, Aug 09, 2023 at 06:36:03PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Thanks to Sergey Ostanevich.
>
> (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
>
> This patch just reverts the commit
> 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
> debug.getinfo(1,'>S')") and applies the one from the main repo for the
Typo: s/for the/for/
> consistency with the upstream.
> ---
> src/lj_debug.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
Since there were no test with the original fix, it would be nice to
add one.
>
> diff --git a/src/lj_debug.c b/src/lj_debug.c
> index 654dc913..c4edcabb 100644
> --- a/src/lj_debug.c
> +++ b/src/lj_debug.c
> @@ -431,16 +431,12 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
> TValue *frame = NULL;
> TValue *nextframe = NULL;
> GCfunc *fn;
> - if (*what == '>') { /* we have to have an extra arg on stack */
> - if (lua_gettop(L) > 2) {
> - TValue *func = L->top - 1;
> - api_check(L, tvisfunc(func));
> - fn = funcV(func);
> - L->top--;
> - what++;
> - } else { /* need better error to display? */
> - return 0;
> - }
> + if (*what == '>') {
> + TValue *func = L->top - 1;
> + if (!tvisfunc(func)) return 0;
> + fn = funcV(func);
> + L->top--;
> + what++;
> } else {
> uint32_t offset = (uint32_t)ar->i_ci & 0xffff;
> uint32_t size = (uint32_t)ar->i_ci >> 16;
> --
> 2.41.0
Best regards,
Maxim Kokryashkin
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
2023-08-15 13:35 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:20 ` Sergey Kaplun via Tarantool-patches
2023-08-16 20:13 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:20 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
>
> On Wed, Aug 09, 2023 at 06:36:03PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Thanks to Sergey Ostanevich.
> >
> > (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
> >
> > This patch just reverts the commit
> > 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
> > debug.getinfo(1,'>S')") and applies the one from the main repo for the
> Typo: s/for the/for/
Fixed.
> > consistency with the upstream.
> > ---
> > src/lj_debug.c | 16 ++++++----------
> > 1 file changed, 6 insertions(+), 10 deletions(-)
>
> Since there were no test with the original fix, it would be nice to
> add one.
Added, see iterative diff below:
===================================================================
diff --git a/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
new file mode 100644
index 00000000..a50b80e4
--- /dev/null
+++ b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
@@ -0,0 +1,13 @@
+local tap = require('tap')
+
+-- Test file to demonstrate crash in the `debug.getinfo()` call.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/509.
+local test = tap.test('lj-509-debug-getinfo-arguments-check.test.lua')
+test:plan(2)
+
+-- '>' expects to have an extra argument on the stack.
+local res, err = pcall(debug.getinfo, 1, '>S')
+test:ok(not res, 'check result of the call with invalid arguments')
+test:like(err, 'bad argument', 'check the error message')
+
+test:done(true)
===================================================================
> >
<snipped>
> > 2.41.0
> Best regards,
> Maxim Kokryashkin
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
2023-08-16 14:20 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 20:13 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16 20:13 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin
>Wednesday, August 16, 2023, 17:25 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
>
>Hi, Maxim!
>Thanks for the review!
>
>On 15.08.23, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the patch!
>> Please consider my comments below.
>>
>> On Wed, Aug 09, 2023 at 06:36:03PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> > From: Mike Pall <mike>
>> >
>> > Thanks to Sergey Ostanevich.
>> >
>> > (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
>> >
>> > This patch just reverts the commit
>> > 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
>> > debug.getinfo(1,'>S')") and applies the one from the main repo for the
>> Typo: s/for the/for/
>
>Fixed.
>
>> > consistency with the upstream.
>> > ---
>> > src/lj_debug.c | 16 ++++++----------
>> > 1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> Since there were no test with the original fix, it would be nice to
>> add one.
>
>Added, see iterative diff below:
>
>===================================================================
>diff --git a/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
>new file mode 100644
>index 00000000..a50b80e4
>--- /dev/null
>+++ b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
>@@ -0,0 +1,13 @@
>+local tap = require('tap')
>+
>+-- Test file to demonstrate crash in the `debug.getinfo()` call.
>+-- See also: https://github.com/LuaJIT/LuaJIT/issues/509.
>+local test = tap.test('lj-509-debug-getinfo-arguments-check.test.lua')
>+test:plan(2)
>+
>+-- '>' expects to have an extra argument on the stack.
>+local res, err = pcall(debug.getinfo, 1, '>S')
>+test:ok(not res, 'check result of the call with invalid arguments')
>+test:like(err, 'bad argument', 'check the error message')
>+
>+test:done(true)
>===================================================================
>
>> >
>
><snipped>
>
>> > 2.41.0
>> Best regards,
>> Maxim Kokryashkin
>> >
>
>--
>Best regards,
>Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
2023-08-15 13:35 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 8:29 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 8:29 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
LGTM
On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Sergey Ostanevich.
>
> (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
>
> This patch just reverts the commit
> 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
> debug.getinfo(1,'>S')") and applies the one from the main repo for the
> consistency with the upstream.
> ---
> src/lj_debug.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/src/lj_debug.c b/src/lj_debug.c
> index 654dc913..c4edcabb 100644
> --- a/src/lj_debug.c
> +++ b/src/lj_debug.c
> @@ -431,16 +431,12 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
> TValue *frame = NULL;
> TValue *nextframe = NULL;
> GCfunc *fn;
> - if (*what == '>') { /* we have to have an extra arg on stack */
> - if (lua_gettop(L) > 2) {
> - TValue *func = L->top - 1;
> - api_check(L, tvisfunc(func));
> - fn = funcV(func);
> - L->top--;
> - what++;
> - } else { /* need better error to display? */
> - return 0;
> - }
> + if (*what == '>') {
> + TValue *func = L->top - 1;
> + if (!tvisfunc(func)) return 0;
> + fn = funcV(func);
> + L->top--;
> + what++;
> } else {
> uint32_t offset = (uint32_t)ar->i_ci & 0xffff;
> uint32_t size = (uint32_t)ar->i_ci >> 16;
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (13 preceding siblings ...)
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
2023-08-15 14:07 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 8:57 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
` (6 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Thanks to Yichun Zhang.
(cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
This patch is predecessor for the commit
944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
check.") It tries to fix the issue, when `J->baseslot == LJ_MAX_JSLOTS`,
that leading to the assertion failure. Since the predecessor patch,
there are no places, that can lead to the condition failure, since we
always check that new baseslot + framesize (+ vargframe) >=
`LJ_MAX_JSLOTS`. As far as minimum framesize is 1 (see <src/lj_parse.c>
for details), we can't obtain this assertion failure. This patch is
added for the consistency with the upstream.
Since the predecessor patch fixes the issue, there is no new test case
to add.
Sergey Kaplun:
* added the description for the problem
Part of tarantool/tarantool#8825
---
src/lj_record.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/lj_record.c b/src/lj_record.c
index 02d9db9e..6030f77c 100644
--- a/src/lj_record.c
+++ b/src/lj_record.c
@@ -87,9 +87,9 @@ static void rec_check_slots(jit_State *J)
BCReg s, nslots = J->baseslot + J->maxslot;
int32_t depth = 0;
cTValue *base = J->L->base - J->baseslot;
- lua_assert(J->baseslot >= 1+LJ_FR2 && J->baseslot < LJ_MAX_JSLOTS);
+ lua_assert(J->baseslot >= 1+LJ_FR2);
lua_assert(J->baseslot == 1+LJ_FR2 || (J->slot[J->baseslot-1] & TREF_FRAME));
- lua_assert(nslots < LJ_MAX_JSLOTS);
+ lua_assert(nslots <= LJ_MAX_JSLOTS);
for (s = 0; s < nslots; s++) {
TRef tr = J->slot[s];
if (tr) {
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
@ 2023-08-15 14:07 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:22 ` Sergey Kaplun via Tarantool-patches
2023-08-17 8:57 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 14:07 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:36:04PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Thanks to Yichun Zhang.
>
> (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
>
> This patch is predecessor for the commit
Typo: s/is predecessor for the/is the predecessor to/
> 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> check.") It tries to fix the issue, when `J->baseslot == LJ_MAX_JSLOTS`,
> that leading to the assertion failure. Since the predecessor patch,
Typo: s/leading/leads/
> there are no places, that can lead to the condition failure, since we
> always check that new baseslot + framesize (+ vargframe) >=
> `LJ_MAX_JSLOTS`. As far as minimum framesize is 1 (see <src/lj_parse.c>
Typo: s/as minimum/as the minimum/
> for details), we can't obtain this assertion failure. This patch is
> added for the consistency with the upstream.
Typo: s/the consistency/consistency/
>
> Since the predecessor patch fixes the issue, there is no new test case
> to add.
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_record.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/lj_record.c b/src/lj_record.c
> index 02d9db9e..6030f77c 100644
> --- a/src/lj_record.c
> +++ b/src/lj_record.c
> @@ -87,9 +87,9 @@ static void rec_check_slots(jit_State *J)
> BCReg s, nslots = J->baseslot + J->maxslot;
> int32_t depth = 0;
> cTValue *base = J->L->base - J->baseslot;
> - lua_assert(J->baseslot >= 1+LJ_FR2 && J->baseslot < LJ_MAX_JSLOTS);
> + lua_assert(J->baseslot >= 1+LJ_FR2);
> lua_assert(J->baseslot == 1+LJ_FR2 || (J->slot[J->baseslot-1] & TREF_FRAME));
> - lua_assert(nslots < LJ_MAX_JSLOTS);
> + lua_assert(nslots <= LJ_MAX_JSLOTS);
> for (s = 0; s < nslots; s++) {
> TRef tr = J->slot[s];
> if (tr) {
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
2023-08-15 14:07 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:22 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:22 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Fixed your comments inline.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:36:04PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Thanks to Yichun Zhang.
> >
> > (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
> >
> > This patch is predecessor for the commit
> Typo: s/is predecessor for the/is the predecessor to/
Fixed.
> > 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> > check.") It tries to fix the issue, when `J->baseslot == LJ_MAX_JSLOTS`,
> > that leading to the assertion failure. Since the predecessor patch,
> Typo: s/leading/leads/
Fixed, thanks!
> > there are no places, that can lead to the condition failure, since we
> > always check that new baseslot + framesize (+ vargframe) >=
> > `LJ_MAX_JSLOTS`. As far as minimum framesize is 1 (see <src/lj_parse.c>
> Typo: s/as minimum/as the minimum/
Fixed.
> > for details), we can't obtain this assertion failure. This patch is
> > added for the consistency with the upstream.
> Typo: s/the consistency/consistency/
Fixed.
> >
> > Since the predecessor patch fixes the issue, there is no new test case
> > to add.
> >
> > Sergey Kaplun:
> > * added the description for the problem
> >
> > Part of tarantool/tarantool#8825
> > ---
> > src/lj_record.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/lj_record.c b/src/lj_record.c
> > index 02d9db9e..6030f77c 100644
> > --- a/src/lj_record.c
> > +++ b/src/lj_record.c
<snipped>
> > --
> > 2.41.0
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
2023-08-15 14:07 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 8:57 ` Sergey Bronnikov via Tarantool-patches
2023-08-17 8:57 ` Sergey Kaplun via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 8:57 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Yichun Zhang.
>
> (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
>
> This patch is predecessor for the commit
> 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> check.") It tries to fix the issue, when `J->baseslot == LJ_MAX_JSLOTS`,
> that leading to the assertion failure. Since the predecessor patch,
> there are no places, that can lead to the condition failure, since we
> always check that new baseslot + framesize (+ vargframe) >=
> `LJ_MAX_JSLOTS`. As far as minimum framesize is 1 (see <src/lj_parse.c>
> for details), we can't obtain this assertion failure. This patch is
> added for the consistency with the upstream.
>
> Since the predecessor patch fixes the issue, there is no new test case
> to add.
>
> Sergey Kaplun:
> * added the description for the problem
Test for backported patch is missing. Why?
>
> Part of tarantool/tarantool#8825
<snipped>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
2023-08-17 8:57 ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17 8:57 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17 8:57 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey!
>
> On 8/9/23 18:36, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Thanks to Yichun Zhang.
> >
> > (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
> >
> > This patch is predecessor for the commit
> > 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> > check.") It tries to fix the issue, when `J->baseslot == LJ_MAX_JSLOTS`,
> > that leading to the assertion failure. Since the predecessor patch,
> > there are no places, that can lead to the condition failure, since we
> > always check that new baseslot + framesize (+ vargframe) >=
> > `LJ_MAX_JSLOTS`. As far as minimum framesize is 1 (see <src/lj_parse.c>
> > for details), we can't obtain this assertion failure. This patch is
> > added for the consistency with the upstream.
> >
> > Since the predecessor patch fixes the issue, there is no new test case
> > to add.
> >
> > Sergey Kaplun:
> > * added the description for the problem
> Test for backported patch is missing. Why?
As mentioned above, there are two separate commits (the current one and
this one [1]). Since the second fixes the issue and was backported
earlier, there is no new testcase provided (see the commit message).
> >
> > Part of tarantool/tarantool#8825
> <snipped>
[1]: https://github.com/LuaJIT/LuaJIT/commit/630ff319
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (14 preceding siblings ...)
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
2023-08-15 14:38 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 10:53 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF Sergey Kaplun via Tarantool-patches
` (5 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
(cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)
This commit fixes possible integer overflow of the separator's length
counter during parsing long strings. It may lead to the fact, that
parser considers a string with unbalanced long brackets to be correct.
Since this is pointless to parse too long string separators in the hope,
that the string is correct, just use hardcoded limit (2 ^ 25 is enough).
Be aware that this limit is different for Lua 5.1.
We can't check the string overflow itself without a really large file,
because the ERR_MEM error will be raised, due to the string buffer
reallocations during parsing. Keep such huge file in the repo is
pointless, so just check that we don't parse long string after
aforementioned separator length.
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#8825
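For context, the "separator" here is the run of '=' characters inside a long
bracket: `[[ ]]` has level 0, `[==[ ]==]` has level 2, and the opening and
closing levels must match. A minimal sketch (illustrative only, not the actual
LuaJIT lexer) of the bounded counting the patch introduces:
===================================================================
#include <stdio.h>

/* Count the '=' run of a long-bracket separator, capped so the signed
 * counter cannot overflow on absurdly long runs (same cap as in the
 * patch). Returns nonzero if a bracket follows the run. */
static int skip_eq(const char *p, int *level)
{
  int count = 0;
  while (*p == '=' && count < 0x20000000) {
    count++;
    p++;
  }
  *level = count;
  return *p == '[' || *p == ']';
}

int main(void)
{
  int level;
  if (skip_eq("==[ a long string", &level))
    printf("long bracket of level %d\n", level);  /* Prints level 2. */
  return 0;
}
===================================================================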
---
src/lj_lex.c | 2 +-
.../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
2 files changed, 32 insertions(+), 1 deletion(-)
create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
diff --git a/src/lj_lex.c b/src/lj_lex.c
index 52856912..c66660d7 100644
--- a/src/lj_lex.c
+++ b/src/lj_lex.c
@@ -138,7 +138,7 @@ static int lex_skipeq(LexState *ls)
int count = 0;
LexChar s = ls->c;
lua_assert(s == '[' || s == ']');
- while (lex_savenext(ls) == '=')
+ while (lex_savenext(ls) == '=' && count < 0x20000000)
count++;
return (ls->c == s) ? count : (-count) - 1;
}
diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
new file mode 100644
index 00000000..fda69d17
--- /dev/null
+++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
@@ -0,0 +1,31 @@
+local tap = require('tap')
+
+-- Test to check that we avoid parsing of too long separator
+-- for long strings.
+-- See also the discussion in the
+-- https://github.com/LuaJIT/LuaJIT/issues/812.
+
+local test = tap.test('lj-812-too-long-string-separator'):skipcond({
+ ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
+})
+test:plan(2)
+
+-- We can't check the string overflow itself without a really
+-- large file, because the ERR_MEM error will be raised, due to
+-- the string buffer reallocations during parsing.
+-- Keep such huge file in the repo is pointless, so just check
+-- that we don't parse long string after some separator length.
+-- Be aware that this limit is different for Lua 5.1.
+
+-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
+local separator = string.rep('=', 0x20000000 + 1)
+local test_str = ('return [%s[]%s]'):format(separator, separator)
+
+local f, err = loadstring(test_str, 'empty_str_f')
+test:ok(not f, 'correct status when parsing string with too long separator')
+
+-- Check error message.
+test:ok(tostring(err):match('invalid long string delimiter'),
+ 'correct error when parsing string with too long separator')
+
+test:done(true)
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
@ 2023-08-15 14:38 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:52 ` Sergey Kaplun via Tarantool-patches
2023-08-17 10:53 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 14:38 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:36:05PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)
>
> This commit fixes possible integer overflow of the separator's length
Typo: s/possible/a possible/
> counter during parsing long strings. It may lead to the fact, that
> parser considers a string with unbalanced long brackets to be correct.
Typo: s/parser/the parser/
> Since this is pointless to parse too long string separators in the hope,
Typo: s/this is/it is/
> that the string is correct, just use hardcoded limit (2 ^ 25 is enough).
Typo: s/use hardcoded/use the hardcoded/
>
> Be aware that this limit is different for Lua 5.1.
>
> We can't check the string overflow itself without a really large file,
> because the ERR_MEM error will be raised, due to the string buffer
> reallocations during parsing. Keep such huge file in the repo is
Typo: s/Keep such/Keeping such a/
> pointless, so just check that we don't parse long string after
Typo: s/long string/long strings/
> aforementioned separator length.
Typo: s/aforementioned/the aforementioned/
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_lex.c | 2 +-
> .../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
> 2 files changed, 32 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
>
> diff --git a/src/lj_lex.c b/src/lj_lex.c
> index 52856912..c66660d7 100644
> --- a/src/lj_lex.c
> +++ b/src/lj_lex.c
> @@ -138,7 +138,7 @@ static int lex_skipeq(LexState *ls)
> int count = 0;
> LexChar s = ls->c;
> lua_assert(s == '[' || s == ']');
> - while (lex_savenext(ls) == '=')
> + while (lex_savenext(ls) == '=' && count < 0x20000000)
> count++;
> return (ls->c == s) ? count : (-count) - 1;
> }
> diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> new file mode 100644
> index 00000000..fda69d17
> --- /dev/null
> +++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> @@ -0,0 +1,31 @@
> +local tap = require('tap')
> +
> +-- Test to check that we avoid parsing of too long separator
Typo: s/parsing of/parsing/
Typo: s/separator/separators/
> +-- for long strings.
> +-- See also the discussion in the
> +-- https://github.com/LuaJIT/LuaJIT/issues/812.
> +
> +local test = tap.test('lj-812-too-long-string-separator'):skipcond({
> + ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
Please write a more detailed description of how it can be tested for non-GC64 build
and why it is disabled now, as we have discussed offline.
> +})
> +test:plan(2)
> +
> +-- We can't check the string overflow itself without a really
> +-- large file, because the ERR_MEM error will be raised, due to
> +-- the string buffer reallocations during parsing.
> +-- Keep such huge file in the repo is pointless, so just check
> +-- that we don't parse long string after some separator length.
> +-- Be aware that this limit is different for Lua 5.1.
Please fix the same typos as in the commit message here.
> +
> +-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
> +local separator = string.rep('=', 0x20000000 + 1)
> +local test_str = ('return [%s[]%s]'):format(separator, separator)
> +
> +local f, err = loadstring(test_str, 'empty_str_f')
> +test:ok(not f, 'correct status when parsing string with too long separator')
> +
> +-- Check error message.
> +test:ok(tostring(err):match('invalid long string delimiter'),
> + 'correct error when parsing string with too long separator')
> +
> +test:done(true)
> --
> 2.41.0
>
Best regards,
Maxim Kokryashkin
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
2023-08-15 14:38 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:52 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:52 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
See my answers below.
On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
>
> On Wed, Aug 09, 2023 at 06:36:05PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > (cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)
> >
> > This commit fixes possible integer overflow of the separator's length
> Typo: s/possible/a possible/
Fixed.
> > counter during parsing long strings. It may lead to the fact, that
> > parser considers a string with unbalanced long brackets to be correct.
> Typo: s/parser/the parser/
Fixed.
> > Since this is pointless to parse too long string separators in the hope,
> Typo: s/this is/it is/
Fixed.
> > that the string is correct, just use hardcoded limit (2 ^ 25 is enough).
> Typo: s/use hardcoded/use the hardcoded/
Fixed.
> >
> > Be aware that this limit is different for Lua 5.1.
> >
> > We can't check the string overflow itself without a really large file,
> > because the ERR_MEM error will be raised, due to the string buffer
> > reallocations during parsing. Keep such huge file in the repo is
> Typo: s/Keep such/Keeping such a/
Fixed.
> > pointless, so just check that we don't parse long string after
> Typo: s/long string/long strings/
Fixed.
> > aforementioned separator length.
> Typo: s/aforementioned/the aforementioned/
Fixed.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#8825
> > ---
> > src/lj_lex.c | 2 +-
> > .../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
> > 2 files changed, 32 insertions(+), 1 deletion(-)
> > create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> >
> > diff --git a/src/lj_lex.c b/src/lj_lex.c
> > index 52856912..c66660d7 100644
> > --- a/src/lj_lex.c
> > +++ b/src/lj_lex.c
<snipped>
> > diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> > new file mode 100644
> > index 00000000..fda69d17
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> > @@ -0,0 +1,31 @@
> > +local tap = require('tap')
> > +
> > +-- Test to check that we avoid parsing of too long separator
> Typo: s/parsing of/parsing/
> Typo: s/separator/separators/
Fixed.
> > +-- for long strings.
> > +-- See also the discussion in the
> > +-- https://github.com/LuaJIT/LuaJIT/issues/812.
> > +
> > +local test = tap.test('lj-812-too-long-string-separator'):skipcond({
> > + ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
> Please write a more detailed description of how it can be tested for non-GC64 build
> and why it is disabled now, as we have discussed offline.
Added, see the diff below.
>
> > +})
> > +test:plan(2)
> > +
> > +-- We can't check the string overflow itself without a really
> > +-- large file, because the ERR_MEM error will be raised, due to
> > +-- the string buffer reallocations during parsing.
> > +-- Keep such huge file in the repo is pointless, so just check
> > +-- that we don't parse long string after some separator length.
> > +-- Be aware that this limit is different for Lua 5.1.
> Please fix the same typos as in the commit message here.
Fixed.
> > +
> > +-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
> > +local separator = string.rep('=', 0x20000000 + 1)
> > +local test_str = ('return [%s[]%s]'):format(separator, separator)
> > +
> > +local f, err = loadstring(test_str, 'empty_str_f')
> > +test:ok(not f, 'correct status when parsing string with too long separator')
> > +
> > +-- Check error message.
> > +test:ok(tostring(err):match('invalid long string delimiter'),
> > + 'correct error when parsing string with too long separator')
Also, I changed this part to `test:like()`, since it is more readable
and has the same behaviour.
See the iterative patch below:
===================================================================
diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
index fda69d17..380e26f0 100644
--- a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
+++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
@@ -1,11 +1,17 @@
local tap = require('tap')
--- Test to check that we avoid parsing of too long separator
--- for long strings.
+-- Test to check that we avoid parsing too long separators for
+-- long strings.
-- See also the discussion in the
-- https://github.com/LuaJIT/LuaJIT/issues/812.
local test = tap.test('lj-812-too-long-string-separator'):skipcond({
+ -- In non-GC64 mode, we get the OOM error since we need memory
+ -- for the string to load and the same amount of memory for the
+ -- string buffer. So, the only option is to create a big file
+ -- in the repo and keep it, or generate it and remove each time.
+ -- These options are kinda pointless, so let's check the
+ -- behaviour only for GC64 mode.
['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
})
test:plan(2)
@@ -13,8 +19,9 @@ test:plan(2)
-- We can't check the string overflow itself without a really
-- large file, because the ERR_MEM error will be raised, due to
-- the string buffer reallocations during parsing.
--- Keep such huge file in the repo is pointless, so just check
--- that we don't parse long string after some separator length.
+-- Keeping such a huge file in the repo is pointless, so just
+-- check that we don't parse long strings after some separator
+-- length.
-- Be aware that this limit is different for Lua 5.1.
-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
@@ -25,7 +32,7 @@ local f, err = loadstring(test_str, 'empty_str_f')
test:ok(not f, 'correct status when parsing string with too long separator')
-- Check error message.
-test:ok(tostring(err):match('invalid long string delimiter'),
- 'correct error when parsing string with too long separator')
+test:like(err, 'invalid long string delimiter',
+ 'correct error when parsing string with too long separator')
test:done(true)
===================================================================
> > +
> > +test:done(true)
> > --
> > 2.41.0
> >
> Best regards,
> Maxim Kokryashkin
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
2023-08-15 14:38 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 10:53 ` Sergey Bronnikov via Tarantool-patches
2023-08-17 13:57 ` Sergey Kaplun via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 10:53 UTC (permalink / raw)
To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches
Hi, Sergey!
thanks for the patch!
test duration is about 7 sec, I propose to add a print before string.rep:
print('# test generation requires about 7 sec')
Otherwise it looks like test hang. Feel free to ignore.
LGTM after fixing comments from Max.
Sergey
On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)
>
> This commit fixes possible integer overflow of the separator's length
> counter during parsing long strings. It may lead to the fact, that
> parser considers a string with unbalanced long brackets to be correct.
> Since this is pointless to parse too long string separators in the hope,
> that the string is correct, just use hardcoded limit (2 ^ 25 is enough).
>
> Be aware that this limit is different for Lua 5.1.
>
> We can't check the string overflow itself without a really large file,
> because the ERR_MEM error will be raised, due to the string buffer
> reallocations during parsing. Keep such huge file in the repo is
> pointless, so just check that we don't parse long string after
> aforementioned separator length.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_lex.c | 2 +-
> .../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
> 2 files changed, 32 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
>
> diff --git a/src/lj_lex.c b/src/lj_lex.c
> index 52856912..c66660d7 100644
> --- a/src/lj_lex.c
> +++ b/src/lj_lex.c
> @@ -138,7 +138,7 @@ static int lex_skipeq(LexState *ls)
> int count = 0;
> LexChar s = ls->c;
> lua_assert(s == '[' || s == ']');
> - while (lex_savenext(ls) == '=')
> + while (lex_savenext(ls) == '=' && count < 0x20000000)
> count++;
> return (ls->c == s) ? count : (-count) - 1;
> }
> diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> new file mode 100644
> index 00000000..fda69d17
> --- /dev/null
> +++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> @@ -0,0 +1,31 @@
> +local tap = require('tap')
> +
> +-- Test to check that we avoid parsing of too long separator
> +-- for long strings.
> +-- See also the discussion in the
> +-- https://github.com/LuaJIT/LuaJIT/issues/812.
> +
> +local test = tap.test('lj-812-too-long-string-separator'):skipcond({
> + ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
> +})
> +test:plan(2)
> +
> +-- We can't check the string overflow itself without a really
> +-- large file, because the ERR_MEM error will be raised, due to
> +-- the string buffer reallocations during parsing.
> +-- Keep such huge file in the repo is pointless, so just check
> +-- that we don't parse long string after some separator length.
> +-- Be aware that this limit is different for Lua 5.1.
> +
> +-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
> +local separator = string.rep('=', 0x20000000 + 1)
> +local test_str = ('return [%s[]%s]'):format(separator, separator)
> +
> +local f, err = loadstring(test_str, 'empty_str_f')
> +test:ok(not f, 'correct status when parsing string with too long separator')
> +
> +-- Check error message.
> +test:ok(tostring(err):match('invalid long string delimiter'),
> + 'correct error when parsing string with too long separator')
> +
> +test:done(true)
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
2023-08-17 10:53 ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17 13:57 ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:28 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17 13:57 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey!
>
>
> thanks for the patch!
>
> test duration is about 7 sec, I propose to add a print before string.rep:
>
> print('# test generation requires about 7 sec')
I suppose it is a good thing to do, but there is another test
(gh-7745-oom-on-trace.test.lua) which takes a long time too.
Maybe it's better to create a group of "long tests", or something like
that. Also, we may introduce a `test:comment()` helper that provides the
same behaviour as `test:diag('# ' .. fmt, ...)` and use it in all
long tests. But I prefer to do it in a separate patch set, since this
one is already huge enough :).
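For the record, a rough sketch of what such a helper might look like
(not a part of this patch set; it relies only on the
`test:diag('# ' .. fmt, ...)` call mentioned above, everything else,
including the names, is hypothetical):
| -- Hypothetical test:comment() helper: mirrors the
| -- `test:diag('# ' .. fmt, ...)` workaround from above.
| local function comment(test_obj, fmt, ...)
|   test_obj:diag('# ' .. fmt, ...)
| end
| -- Possible usage in a long-running test:
| -- comment(test, 'test generation requires about %d sec', 7)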
>
> Otherwise it looks like test hang. Feel free to ignore.
>
>
> LGTM after fixing comments from Max.
>
>
> Sergey
<snipped>
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
2023-08-17 13:57 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-17 14:28 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:28 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, again
On 8/17/23 16:57, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
>
> On 17.08.23, Sergey Bronnikov wrote:
>> Hi, Sergey!
>>
>>
>> thanks for the patch!
>>
>> test duration is about 7 sec, I propose to add a print before string.rep:
>>
>> print('# test generation requires about 7 sec')
> I suppose it is a good thing to do, but there is another test
> (gh-7745-oom-on-trace.test.lua) which takes a long time too.
> Maybe it's better to create a group of "long tests", or something like
> that. Also, we may introduce a `test:comment()` helper that provides the
> same behaviour as `test:diag('# ' .. fmt, ...)` and use it in all
> long tests. But I prefer to do it in a separate patch set, since this
> one is already huge enough :).
Okey, I'll not insist. LGTM
>
>> Otherwise it looks like test hang. Feel free to ignore.
>>
>>
>> LGTM after fixing comments from Max.
>>
>>
>> Sergey
> <snipped>
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
` (15 preceding siblings ...)
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
2023-08-16 9:01 ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 11:06 ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable Sergey Kaplun via Tarantool-patches
` (4 subsequent siblings)
21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Contributed by James Cowgill.
(cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
The issue is observed for the following merged IRs:
| p64 HREF 0001 "a" ; or other keys
| > p64 EQ 0002 [0x4002d0c528] ; nilnode
Sometimes, when we need to rematerialize a constant during evicting of
the register. So, the instruction related to constant rematerialization
is placed in the delay branch slot, which suppose to contain the loads
of trace exit number to the `$ra` register. The resulting assembly is
the following (for example):
| beq ra, r1, 0x400abee9b0 ->exit
| lui r1, 65531 ; delay slot without setting of the `ra`
This leading to the assertion failure during trace exit in
`lj_trace_exit()`, since a trace number is incorrect.
This patch moves the constant register allocations above the main
instruction emitting code in `asm_href()`.
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#8825
---
src/lj_asm_mips.h | 42 +++++---
...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
2 files changed, 126 insertions(+), 17 deletions(-)
create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index c27d8413..23ffc3aa 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
Reg dest = ra_dest(as, ir, allow);
Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
+#if LJ_64
+ Reg cmp64 = RID_NONE;
+#endif
IRRef refkey = ir->op2;
IRIns *irkey = IR(refkey);
int isk = irref_isk(refkey);
@@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
#endif
tmp2 = ra_scratch(as, allow);
rset_clear(allow, tmp2);
+#if LJ_64
+ if (LJ_SOFTFP || !irt_isnum(kt)) {
+ /* Allocate cmp64 register used for 64-bit comparisons */
+ if (LJ_SOFTFP && irt_isnum(kt)) {
+ cmp64 = key;
+ } else if (!isk && irt_isaddr(kt)) {
+ cmp64 = tmp2;
+ } else {
+ int64_t k;
+ if (isk && irt_isaddr(kt)) {
+ k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
+ } else {
+ lua_assert(irt_ispri(kt) && !irt_isnil(kt));
+ k = ~((int64_t)~irt_toitype(ir->t) << 47);
+ }
+ cmp64 = ra_allock(as, k, allow);
+ rset_clear(allow, cmp64);
+ }
+ }
+#endif
/* Key not found in chain: jump to exit (if merged) or load niltv. */
l_end = emit_label(as);
@@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
- } else if (LJ_SOFTFP && irt_isnum(kt)) {
- emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
- emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
- } else if (irt_isaddr(kt)) {
- Reg refk = tmp2;
- if (isk) {
- int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
- refk = ra_allock(as, k, allow);
- rset_clear(allow, refk);
- }
- emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
- emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
} else {
- Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
- rset_clear(allow, pri);
- lua_assert(irt_ispri(kt) && !irt_isnil(kt));
- emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
- emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
+ emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
+ emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
}
*l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
if (!isk && irt_isaddr(kt)) {
diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
new file mode 100644
index 00000000..8c75e69c
--- /dev/null
+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
@@ -0,0 +1,101 @@
+local tap = require('tap')
+-- Test file to demonstrate the incorrect JIT behaviour for HREF
+-- IR compilation on mips64.
+-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
+local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
+ ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(1)
+
+-- To reproduce the issue we need to compile a trace with
+-- `IR_HREF`, with a lookup of constant hash key GC value. To
+-- prevent an `IR_HREFK` to be emitted instead, we need a table
+-- with a huge hash part. Delta of address between the start of
+-- the hash part of the table and the current node to lookup must
+-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- See <src/lj_record.c>, for details.
+-- XXX: This constant is well suited to prevent test to be flaky,
+-- because the aforementioned delta is always large enough.
+-- Also, this constant avoids table rehashing, when inserting new
+-- keys.
+local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
+
+-- XXX: don't set `hotexit` to prevent compilation of trace after
+-- exiting the main test cycle.
+jit.opt.start('hotloop=1')
+
+-- Don't use `table.new()`, here by intence -- this leads to the
+-- allocation failure for the mcode memory, so traces are not
+-- compiled.
+local filled_tab = {}
+-- Filling-up the table with GC values to minimize the amount of
+-- hash collisions and increase delta between the start of the
+-- hash part of the table and currently stored node.
+for _ = 1, N_HASH_FIELDS do
+ filled_tab[1LL] = 1
+end
+
+-- luacheck: no unused
+local tab_value_a
+local tab_value_b
+local tab_value_c
+local tab_value_d
+local tab_value_e
+local tab_value_f
+local tab_value_g
+local tab_value_h
+local tab_value_i
+
+-- The function for this trace has a bunch of the following IRs:
+-- p64 HREF 0001 "a" ; or other keys
+-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
+-- Sometimes, when we need to rematerialize a constant during
+-- evicting of the register. So, the instruction related to
+-- constant rematerialization is placed in the delay branch slot,
+-- which suppose to contain the loads of trace exit number to the
+-- `$ra` register. This leading to the assertion failure during
+-- trace exit in `lj_trace_exit()`, since a trace number is
+-- incorrect. The amount of the side exit to check is empirical
+-- (even a little bit more, than necessary just in case).
+local function href_const(tab)
+ tab_value_a = tab.a
+ tab_value_b = tab.b
+ tab_value_c = tab.c
+ tab_value_d = tab.d
+ tab_value_e = tab.e
+ tab_value_f = tab.f
+ tab_value_g = tab.g
+ tab_value_h = tab.h
+ tab_value_i = tab.i
+end
+
+-- Compile main trace first.
+href_const(filled_tab)
+href_const(filled_tab)
+
+-- Now brute-force side exits to check that they are compiled
+-- correct. Take side exits in the reverse order to take a new
+-- side exit each time.
+filled_tab.i = 'i'
+href_const(filled_tab)
+filled_tab.h = 'h'
+href_const(filled_tab)
+filled_tab.g = 'g'
+href_const(filled_tab)
+filled_tab.f = 'f'
+href_const(filled_tab)
+filled_tab.e = 'e'
+href_const(filled_tab)
+filled_tab.d = 'd'
+href_const(filled_tab)
+filled_tab.c = 'c'
+href_const(filled_tab)
+filled_tab.b = 'b'
+href_const(filled_tab)
+filled_tab.a = 'a'
+href_const(filled_tab)
+
+test:ok(true, 'no assertion failures during trace exits')
+
+test:done(true)
--
2.41.0
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF Sergey Kaplun via Tarantool-patches
@ 2023-08-16 9:01 ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 15:17 ` Sergey Kaplun via Tarantool-patches
2023-08-17 11:06 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16 9:01 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the patch!
Please consider my comments below.
On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> Contributed by James Cowgill.
>
> (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
>
> The issue is observed for the following merged IRs:
> | p64 HREF 0001 "a" ; or other keys
> | > p64 EQ 0002 [0x4002d0c528] ; nilnode
> Sometimes, when we need to rematerialize a constant during evicting of
Typo: s/during evicting/during the eviction/
> the register. So, the instruction related to constant rematerialization
Sometimes happens what? The sentence looks kind of chopped.
> is placed in the delay branch slot, which suppose to contain the loads
Typo: s/which suppose/which is supposed/
> of trace exit number to the `$ra` register. The resulting assembly is
Typo: s/number/numbers/ (because of `loads` being in the plural form)
> the following (for example):
> | beq ra, r1, 0x400abee9b0 ->exit
> | lui r1, 65531 ; delay slot without setting of the `ra`
> This leading to the assertion failure during trace exit in
Typo: s/leading/leads/
> `lj_trace_exit()`, since a trace number is incorrect.
>
> This patch moves the constant register allocations above the main
> instruction emitting code in `asm_href()`.
AFAICS, It is not just moved, the register allocation logic has changed too.
Before the patch, there were a few cases of inplace emissions, which
disappeared after the patch. I believe it is important to mention it, along
with a more detailed description of the logic changes.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8825
> ---
> src/lj_asm_mips.h | 42 +++++---
> ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
> 2 files changed, 126 insertions(+), 17 deletions(-)
> create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index c27d8413..23ffc3aa 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> Reg dest = ra_dest(as, ir, allow);
> Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
> Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
> +#if LJ_64
> + Reg cmp64 = RID_NONE;
> +#endif
> IRRef refkey = ir->op2;
> IRIns *irkey = IR(refkey);
> int isk = irref_isk(refkey);
> @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> #endif
> tmp2 = ra_scratch(as, allow);
> rset_clear(allow, tmp2);
> +#if LJ_64
> + if (LJ_SOFTFP || !irt_isnum(kt)) {
> + /* Allocate cmp64 register used for 64-bit comparisons */
> + if (LJ_SOFTFP && irt_isnum(kt)) {
> + cmp64 = key;
> + } else if (!isk && irt_isaddr(kt)) {
> + cmp64 = tmp2;
> + } else {
> + int64_t k;
> + if (isk && irt_isaddr(kt)) {
> + k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> + } else {
> + lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> + k = ~((int64_t)~irt_toitype(ir->t) << 47);
> + }
> + cmp64 = ra_allock(as, k, allow);
> + rset_clear(allow, cmp64);
> + }
> + }
> +#endif
>
> /* Key not found in chain: jump to exit (if merged) or load niltv. */
> l_end = emit_label(as);
> @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
> emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
> emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> - } else if (LJ_SOFTFP && irt_isnum(kt)) {
> - emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> - emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> - } else if (irt_isaddr(kt)) {
> - Reg refk = tmp2;
> - if (isk) {
> - int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> - refk = ra_allock(as, k, allow);
> - rset_clear(allow, refk);
> - }
> - emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
> - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> } else {
> - Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> - rset_clear(allow, pri);
> - lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> - emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
> - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> + emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
> + emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> }
> *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
> if (!isk && irt_isaddr(kt)) {
> diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> new file mode 100644
> index 00000000..8c75e69c
> --- /dev/null
> +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> @@ -0,0 +1,101 @@
> +local tap = require('tap')
> +-- Test file to demonstrate the incorrect JIT behaviour for HREF
> +-- IR compilation on mips64.
> +-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
> +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> + ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +test:plan(1)
> +
> +-- To reproduce the issue we need to compile a trace with
> +-- `IR_HREF`, with a lookup of constant hash key GC value. To
Typo: s/constant/a constant/
> +-- prevent an `IR_HREFK` to be emitted instead, we need a table
Typo: s/to be/from being/
> +-- with a huge hash part. Delta of address between the start of
Typo: s/Delta/The delta/
> +-- the hash part of the table and the current node to lookup must
> +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
Typo: s/more/greater/
> +-- See <src/lj_record.c>, for details.
> +-- XXX: This constant is well suited to prevent test to be flaky,
Typo: s/to be/from being/
> +-- because the aforementioned delta is always large enough.
> +-- Also, this constant avoids table rehashing, when inserting new
> +-- keys.
> +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
> +
> +-- XXX: don't set `hotexit` to prevent compilation of trace after
> +-- exiting the main test cycle.
I suggest rephrasing it the following way:
| The `hotexit` option is not set to prevent the compilation of traces
| after the emission of the main test cycle.
> +jit.opt.start('hotloop=1')
> +
> +-- Don't use `table.new()`, here by intence -- this leads to the
Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/
> +-- allocation failure for the mcode memory, so traces are not
> +-- compiled.
> +local filled_tab = {}
> +-- Filling-up the table with GC values to minimize the amount of
Typo: s/Filling-up/Fill up/
> +-- hash collisions and increase delta between the start of the
Typo: s/delta/the delta/
> +-- hash part of the table and currently stored node.
Typo: s/currently/the currently/
> +for _ = 1, N_HASH_FIELDS do
> + filled_tab[1LL] = 1
> +end
> +
> +-- luacheck: no unused
> +local tab_value_a
> +local tab_value_b
> +local tab_value_c
> +local tab_value_d
> +local tab_value_e
> +local tab_value_f
> +local tab_value_g
> +local tab_value_h
> +local tab_value_i
> +
> +-- The function for this trace has a bunch of the following IRs:
> +-- p64 HREF 0001 "a" ; or other keys
> +-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
> +-- Sometimes, when we need to rematerialize a constant during
> +-- evicting of the register. So, the instruction related to
Typo: s/evicting/the eviction/
Again, sometimes happens what?
> +-- constant rematerialization is placed in the delay branch slot,
> +-- which suppose to contain the loads of trace exit number to the
Typo: s/which suppose/which is supposed/
Typo: s/number/numbers/
> +-- `$ra` register. This leading to the assertion failure during
Typo: s/leading/leads/
> +-- trace exit in `lj_trace_exit()`, since a trace number is
> +-- incorrect. The amount of the side exit to check is empirical
Typo: s/exit/exits/
> +-- (even a little bit more, than necessary just in case).
Typo: s/more/greater/
> +local function href_const(tab)
> + tab_value_a = tab.a
> + tab_value_b = tab.b
> + tab_value_c = tab.c
> + tab_value_d = tab.d
> + tab_value_e = tab.e
> + tab_value_f = tab.f
> + tab_value_g = tab.g
> + tab_value_h = tab.h
> + tab_value_i = tab.i
> +end
> +
> +-- Compile main trace first.
Typo: s/main/the main/
> +href_const(filled_tab)
> +href_const(filled_tab)
> +
> +-- Now brute-force side exits to check that they are compiled
> +-- correct. Take side exits in the reverse order to take a new
Typo: s/correct/correctly/
Typo: s/the reverse/reverse/
> +-- side exit each time.
> +filled_tab.i = 'i'
> +href_const(filled_tab)
> +filled_tab.h = 'h'
> +href_const(filled_tab)
> +filled_tab.g = 'g'
> +href_const(filled_tab)
> +filled_tab.f = 'f'
> +href_const(filled_tab)
> +filled_tab.e = 'e'
> +href_const(filled_tab)
> +filled_tab.d = 'd'
> +href_const(filled_tab)
> +filled_tab.c = 'c'
> +href_const(filled_tab)
> +filled_tab.b = 'b'
> +href_const(filled_tab)
> +filled_tab.a = 'a'
> +href_const(filled_tab)
> +
> +test:ok(true, 'no assertion failures during trace exits')
> +
> +test:done(true)
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
2023-08-16 9:01 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 15:17 ` Sergey Kaplun via Tarantool-patches
2023-08-16 20:14 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:17 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
Please, see my answers below.
On 16.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by James Cowgill.
> >
> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
> >
> > The issue is observed for the following merged IRs:
> > | p64 HREF 0001 "a" ; or other keys
> > | > p64 EQ 0002 [0x4002d0c528] ; nilnode
> > Sometimes, when we need to rematerialize a constant during evicting of
> Typo: s/during evicting/during the eviction/
Fixed.
> > the register. So, the instruction related to constant rematerialization
> Sometimes happens what? The sentence looks kind of chopped.
The "when" is misleading here. Dropped it.
> > is placed in the delay branch slot, which suppose to contain the loads
> Typo: s/which suppose/which is supposed/
Fixed.
> > of trace exit number to the `$ra` register. The resulting assembly is
> Typo: s/number/numbers/ (because of `loads` being in the plural form)
Fixed.
> > the following (for example):
> > | beq ra, r1, 0x400abee9b0 ->exit
> > | lui r1, 65531 ; delay slot without setting of the `ra`
> > This leading to the assertion failure during trace exit in
> Typo: s/leading/leads/
Fixed.
> > `lj_trace_exit()`, since a trace number is incorrect.
> >
> > This patch moves the constant register allocations above the main
> > instruction emitting code in `asm_href()`.
> AFAICS, It is not just moved, the register allocation logic has changed too.
> Before the patch, there were a few cases of inplace emissions, which
> disappeared after the patch. I believe it is important to mention it, along
> with a more detailed description of the logic changes.
No, the logic is just the same, we just choose the register earlier.
Since we now use the `cmp64` register everywhere, there is no need for
duplicate code in the if / else if / else chunks.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#8825
> > ---
> > src/lj_asm_mips.h | 42 +++++---
> > ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
> > 2 files changed, 126 insertions(+), 17 deletions(-)
> > create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> >
> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> > index c27d8413..23ffc3aa 100644
> > --- a/src/lj_asm_mips.h
> > +++ b/src/lj_asm_mips.h
> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> > Reg dest = ra_dest(as, ir, allow);
> > Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
> > Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
> > +#if LJ_64
> > + Reg cmp64 = RID_NONE;
> > +#endif
> > IRRef refkey = ir->op2;
> > IRIns *irkey = IR(refkey);
> > int isk = irref_isk(refkey);
> > @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> > #endif
> > tmp2 = ra_scratch(as, allow);
> > rset_clear(allow, tmp2);
> > +#if LJ_64
> > + if (LJ_SOFTFP || !irt_isnum(kt)) {
> > + /* Allocate cmp64 register used for 64-bit comparisons */
> > + if (LJ_SOFTFP && irt_isnum(kt)) {
> > + cmp64 = key;
> > + } else if (!isk && irt_isaddr(kt)) {
> > + cmp64 = tmp2;
> > + } else {
> > + int64_t k;
> > + if (isk && irt_isaddr(kt)) {
> > + k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > + } else {
> > + lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > + k = ~((int64_t)~irt_toitype(ir->t) << 47);
> > + }
> > + cmp64 = ra_allock(as, k, allow);
> > + rset_clear(allow, cmp64);
> > + }
> > + }
> > +#endif
> >
> > /* Key not found in chain: jump to exit (if merged) or load niltv. */
> > l_end = emit_label(as);
> > @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> > emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
> > emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
> > emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > - } else if (LJ_SOFTFP && irt_isnum(kt)) {
> > - emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> > - emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > - } else if (irt_isaddr(kt)) {
> > - Reg refk = tmp2;
> > - if (isk) {
> > - int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > - refk = ra_allock(as, k, allow);
> > - rset_clear(allow, refk);
> > - }
> > - emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> > } else {
> > - Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> > - rset_clear(allow, pri);
> > - lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > - emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> > + emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
> > + emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > }
> > *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
> > if (!isk && irt_isaddr(kt)) {
> > diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > new file mode 100644
> > index 00000000..8c75e69c
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > @@ -0,0 +1,101 @@
> > +local tap = require('tap')
> > +-- Test file to demonstrate the incorrect JIT behaviour for HREF
> > +-- IR compilation on mips64.
> > +-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
> > +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> > + ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +-- To reproduce the issue we need to compile a trace with
> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To
> Typo: s/constant/a constant/
Fixed.
> > +-- prevent an `IR_HREFK` to be emitted instead, we need a table
> Typo: s/to be/from being/
Fixed.
> > +-- with a huge hash part. Delta of address between the start of
> Typo: s/Delta/The delta/
Fixed.
> > +-- the hash part of the table and the current node to lookup must
> > +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
> Typo: s/more/greater/
Fixed.
> > +-- See <src/lj_record.c>, for details.
> > +-- XXX: This constant is well suited to prevent test to be flaky,
> Typo: s/to be/from being/
Fixed.
> > +-- because the aforementioned delta is always large enough.
> > +-- Also, this constant avoids table rehashing, when inserting new
> > +-- keys.
> > +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
> > +
> > +-- XXX: don't set `hotexit` to prevent compilation of trace after
> > +-- exiting the main test cycle.
> I suggest rephrasing it the following way:
> | The `hotexit` option is not set to prevent the compilation of traces
> | after the emission of the main test cycle.
Rephrased.
> > +jit.opt.start('hotloop=1')
> > +
> > +-- Don't use `table.new()`, here by intence -- this leads to the
> Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/
Fixed.
> > +-- allocation failure for the mcode memory, so traces are not
> > +-- compiled.
> > +local filled_tab = {}
> > +-- Filling-up the table with GC values to minimize the amount of
> Typo: s/Filling-up/Fill up/
Fixed.
> > +-- hash collisions and increase delta between the start of the
> Typo: s/delta/the delta/
Fixed.
> > +-- hash part of the table and currently stored node.
> Typo: s/currently/the currently/
Fixed.
> > +for _ = 1, N_HASH_FIELDS do
> > + filled_tab[1LL] = 1
> > +end
> > +
> > +-- luacheck: no unused
> > +local tab_value_a
> > +local tab_value_b
> > +local tab_value_c
> > +local tab_value_d
> > +local tab_value_e
> > +local tab_value_f
> > +local tab_value_g
> > +local tab_value_h
> > +local tab_value_i
> > +
> > +-- The function for this trace has a bunch of the following IRs:
> > +-- p64 HREF 0001 "a" ; or other keys
> > +-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
> > +-- Sometimes, when we need to rematerialize a constant during
> > +-- evicting of the register. So, the instruction related to
> Typo: s/evicting/the eviction/
Fixed.
> Again, sometimes happens what?
The "when" is misleading here. Dropped it.
> > +-- constant rematerialization is placed in the delay branch slot,
> > +-- which suppose to contain the loads of trace exit number to the
> Typo: s/which suppose/which is supposed/
Fixed.
> Typo: s/number/numbers/
Fixed.
> > +-- `$ra` register. This leading to the assertion failure during
> Typo: s/leading/leads/
Fixed.
> > +-- trace exit in `lj_trace_exit()`, since a trace number is
> > +-- incorrect. The amount of the side exit to check is empirical
> Typo: s/exit/exits/
Fixed.
> > +-- (even a little bit more, than necessary just in case).
> Typo: s/more/greater/
Fixed.
> > +local function href_const(tab)
> > + tab_value_a = tab.a
> > + tab_value_b = tab.b
> > + tab_value_c = tab.c
> > + tab_value_d = tab.d
> > + tab_value_e = tab.e
> > + tab_value_f = tab.f
> > + tab_value_g = tab.g
> > + tab_value_h = tab.h
> > + tab_value_i = tab.i
> > +end
> > +
> > +-- Compile main trace first.
> Typo: s/main/the main/
Fixed.
> > +href_const(filled_tab)
> > +href_const(filled_tab)
> > +
> > +-- Now brute-force side exits to check that they are compiled
> > +-- correct. Take side exits in the reverse order to take a new
> Typo: s/correct/correctly/
> Typo: s/the reverse/reverse/
Fixed.
<snipped>
See the iterative patch below:
===================================================================
diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
index 8c75e69c..b4ee9e2b 100644
--- a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
@@ -9,29 +9,29 @@ local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
test:plan(1)
-- To reproduce the issue we need to compile a trace with
--- `IR_HREF`, with a lookup of constant hash key GC value. To
--- prevent an `IR_HREFK` to be emitted instead, we need a table
--- with a huge hash part. Delta of address between the start of
--- the hash part of the table and the current node to lookup must
--- be more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- `IR_HREF`, with a lookup of a constant hash key GC value. To
+-- prevent an `IR_HREFK` from being emitted instead, we need a
+-- table with a huge hash part. The delta of address between the
+-- start of the hash part of the table and the current node to
+-- lookup must be greater than `(1024 * 64 - 1) * sizeof(Node)`.
-- See <src/lj_record.c>, for details.
--- XXX: This constant is well suited to prevent test to be flaky,
--- because the aforementioned delta is always large enough.
+-- XXX: This constant is well suited to prevent test from being
+-- flaky, because the aforementioned delta is always large enough.
-- Also, this constant avoids table rehashing, when inserting new
-- keys.
local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
--- XXX: don't set `hotexit` to prevent compilation of trace after
--- exiting the main test cycle.
+-- XXX: The `hotexit` option is not set to prevent the compilation
+-- of traces after the emission of the main test cycle.
jit.opt.start('hotloop=1')
--- Don't use `table.new()`, here by intence -- this leads to the
--- allocation failure for the mcode memory, so traces are not
+-- `table.new()` is not used here by intention -- this leads to
+-- the allocation failure for the mcode memory, so traces are not
-- compiled.
local filled_tab = {}
--- Filling-up the table with GC values to minimize the amount of
--- hash collisions and increase delta between the start of the
--- hash part of the table and currently stored node.
+-- Fill up the table with GC values to minimize the amount of hash
+-- collisions and increase the delta between the start of the hash
+-- part of the table and the currently stored node.
for _ = 1, N_HASH_FIELDS do
filled_tab[1LL] = 1
end
@@ -50,14 +50,14 @@ local tab_value_i
-- The function for this trace has a bunch of the following IRs:
-- p64 HREF 0001 "a" ; or other keys
-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
--- Sometimes, when we need to rematerialize a constant during
--- evicting of the register. So, the instruction related to
+-- Sometimes, we need to rematerialize a constant during the
+-- eviction of the register. So, the instruction related to
-- constant rematerialization is placed in the delay branch slot,
--- which suppose to contain the loads of trace exit number to the
--- `$ra` register. This leading to the assertion failure during
--- trace exit in `lj_trace_exit()`, since a trace number is
--- incorrect. The amount of the side exit to check is empirical
--- (even a little bit more, than necessary just in case).
+-- which is supposed to contain the load of the trace exit number
+-- to the `$ra` register. This leads to the assertion failure
+-- during trace exit in `lj_trace_exit()`, since a trace number is
+-- incorrect. The amount of the side exits to check is empirical
+-- (even a little bit greater, than necessary just in case).
local function href_const(tab)
tab_value_a = tab.a
tab_value_b = tab.b
@@ -70,13 +70,13 @@ local function href_const(tab)
tab_value_i = tab.i
end
--- Compile main trace first.
+-- Compile the main trace first.
href_const(filled_tab)
href_const(filled_tab)
-- Now brute-force side exits to check that they are compiled
--- correct. Take side exits in the reverse order to take a new
--- side exit each time.
+-- correctly. Take side exits in reverse order to take a new side
+-- exit each time.
filled_tab.i = 'i'
href_const(filled_tab)
filled_tab.h = 'h'
===================================================================
> >
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 97+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
2023-08-16 15:17 ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 20:14 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16 20:14 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 14702 bytes --]
Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin
>Wednesday, August 16, 2023, 18:22 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
>
>Hi, Maxim!
>Thanks for the review!
>Please, see my answers below.
>
>On 16.08.23, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the patch!
>> Please consider my comments below.
>> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> > From: Mike Pall <mike>
>> >
>> > Contributed by James Cowgill.
>> >
>> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
>> >
>> > The issue is observed for the following merged IRs:
>> > | p64 HREF 0001 "a" ; or other keys
>> > | > p64 EQ 0002 [0x4002d0c528] ; nilnode
>> > Sometimes, when we need to rematerialize a constant during evicting of
>> Typo: s/during evicting/during the eviction/
>
>Fixed.
>
>> > the register. So, the instruction related to constant rematerialization
>> Sometimes happens what? The sentence looks kind of chopped.
>
>The "when" is misleading here. Dropped it.
>
>> > is placed in the delay branch slot, which suppose to contain the loads
>> Typo: s/which suppose/which is supposed/
>
>Fixed.
>
>> > of trace exit number to the `$ra` register. The resulting assembly is
>> Typo: s/number/numbers/ (because of `loads` being in the plural form)
>
>Fixed.
>
>> > the following (for example):
>> > | beq ra, r1, 0x400abee9b0 ->exit
>> > | lui r1, 65531 ; delay slot without setting of the `ra`
>> > This leading to the assertion failure during trace exit in
>> Typo: s/leading/leads/
>
>Fixed.
>
>> > `lj_trace_exit()`, since a trace number is incorrect.
>> >
>> > This patch moves the constant register allocations above the main
>> > instruction emitting code in `asm_href()`.
>> AFAICS, It is not just moved, the register allocation logic has changed too.
>> Before the patch, there were a few cases of inplace emissions, which
>> disappeared after the patch. I believe it is important to mention it, along
>> with a more detailed description of the logic changes.
>
>No, the logic is just the same, we just choose the register earlier.
>Since we now use the `cmp64` register everywhere, there is no need for
>duplicate code in the if / else if / else chunks.
>
>> >
>> > Sergey Kaplun:
>> > * added the description and the test for the problem
>> >
>> > Part of tarantool/tarantool#8825
>> > ---
>> > src/lj_asm_mips.h | 42 +++++---
>> > ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
>> > 2 files changed, 126 insertions(+), 17 deletions(-)
>> > create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>> >
>> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
>> > index c27d8413..23ffc3aa 100644
>> > --- a/src/lj_asm_mips.h
>> > +++ b/src/lj_asm_mips.h
>> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>> > Reg dest = ra_dest(as, ir, allow);
>> > Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
>> > Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
>> >