Tarantool development patches archive
* [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions
@ 2023-08-09 15:35 Sergey Kaplun via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
                   ` (21 more replies)
  0 siblings, 22 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

This patch-set contains all commits that are necessary to avoid
conflicts during the backporting of 8ae5170c "Improve assertions" [1].
These conflicts are caused by our fork lagging behind the upstream.

This patch-set:
- includes several new ports (see patches 4-6, 8, 19) -- only a
  description is added for such patches.
- fixes some MIPS misbehaviour (1, 3, 17) -- tests are included (*),
  except for the first one, since it depends on the memory mapping.
- fixes the non-Linux/macOS build (7)
- backports patches that were excluded or partially stripped before
  (10, 14, 15)
- includes refactoring (2, 9, 18)
- fixes general bugs (16)
- fixes gcc 7.1 -Wimplicit-fallthrough warnings (11 - 13)

Note that only patches 3, 16 and 17 add new tests. The other patches
only get a description, and patch 13 adds -Wimplicit-fallthrough for
GCC (>= 7.1) builds.

The patches are backported in arbitrary order, since they are unrelated
to each other.

(*) To run tests for mips64 in qemu:

Compile with the following command:

| make -j -f Makefile.original HOST_CC="gcc " \
|         CROSS=mips64el-unknown-linux-gnu- \
|         CCDEBUG=" -g -ggdb3" CFLAGS=" -O0" \
|         XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT "

Be aware that mips64el-unknown-linux-gnu-gcc should use the n64 ABI by
default.
Side note: it was installed on Gentoo with the following command:
| crossdev -t mips64el --abis n64 --ex-gdb

Then run the corresponding test (-g 7776 starts a GDB server on port
7776):
| LUA_PATH="src/?.lua;test/tarantool-tests/?.lua;test/tarantool-tests/?/init.lua;;" \
| LD_LIBRARY_PATH="/usr/lib/gcc/mips64el-unknown-linux-gnu/13/" \
|   qemu-mips64el  -g 7776 -L /usr/mips64el-unknown-linux-gnu/ \
|     src/luajit -jdump=ta test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua

If you want to connect to the running test from multiarch-gdb:
| mips64el-unknown-linux-gnu-gdb src/luajit
| (gdb) target remote 0.0.0.0:7776
| ...
| 0x000000400297fd00 in __start () from /usr/mips64el-unknown-linux-gnu/lib64/ld.so.1
| (gdb) c

[1]: https://github.com/LuaJIT/LuaJIT/commit/8ae5170c

Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-8825-mips-ppc-refactoring
PR: https://github.com/tarantool/tarantool/pull/8969
Related Issues:
* https://github.com/tarantool/tarantool/issues/8825
* https://github.com/LuaJIT/LuaJIT/pull/362
* https://github.com/LuaJIT/LuaJIT/issues/812

Mike Pall (17):
  MIPS: Use precise search for exit jump patching.
  MIPS: Fix handling of spare long-range jump slots.
  MIPS64: Add soft-float support to JIT compiler backend.
  PPC: Add soft-float support to interpreter.
  PPC: Add soft-float support to JIT compiler backend.
  Windows: Add UWP support, part 1.
  FFI: Eliminate hardcoded string hashes.
  Cleanup math function compilation and fix inconsistencies.
  Fix GCC 7 -Wimplicit-fallthrough warnings.
  DynASM: Fix warning.
  ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
  Fix debug.getinfo() argument check.
  Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
  Prevent integer overflow while parsing long strings.
  MIPS64: Fix register allocation in assembly of HREF.
  DynASM/MIPS: Fix shadowed variable.
  MIPS: Add MIPS64 R6 port.

Sergey Kaplun (2):
  test: introduce mcode generator for tests
  build: fix non-Linux/macOS builds

 cmake/SetDynASMFlags.cmake                    |    5 +
 cmake/SetTargetFlags.cmake                    |    6 +
 doc/ext_ffi_api.html                          |    2 +
 dynasm/dasm_arm.h                             |    2 +
 dynasm/dasm_arm64.h                           |    1 +
 dynasm/dasm_mips.h                            |   14 +-
 dynasm/dasm_mips.lua                          |  629 ++++++---
 dynasm/dasm_ppc.h                             |    1 +
 dynasm/dasm_x86.h                             |   18 +-
 dynasm/dynasm.lua                             |    1 +
 src/Makefile.original                         |    3 +
 src/host/buildvm_asm.c                        |    2 +-
 src/jit/bcsave.lua                            |   84 +-
 src/jit/dis_mips.lua                          |  293 +++-
 src/jit/dis_mips64r6.lua                      |   17 +
 src/jit/dis_mips64r6el.lua                    |   17 +
 src/lib_ffi.c                                 |   36 +-
 src/lib_io.c                                  |    4 +-
 src/lib_misc.c                                |   16 +-
 src/lib_package.c                             |   24 +-
 src/lj_alloc.c                                |    6 +-
 src/lj_arch.h                                 |   80 +-
 src/lj_asm.c                                  |   19 +-
 src/lj_asm_arm.h                              |    4 +-
 src/lj_asm_mips.h                             |  379 ++++-
 src/lj_asm_ppc.h                              |  322 ++++-
 src/lj_ccall.c                                |   38 +-
 src/lj_ccall.h                                |    4 +-
 src/lj_ccallback.c                            |   34 +-
 src/lj_clib.c                                 |   20 +-
 src/lj_cparse.c                               |   87 +-
 src/lj_cparse.h                               |    2 +
 src/lj_crecord.c                              |    4 +-
 src/lj_debug.c                                |   16 +-
 src/lj_emit_mips.h                            |   17 +-
 src/lj_err.c                                  |    1 +
 src/lj_ffrecord.c                             |    2 +-
 src/lj_frame.h                                |    2 +-
 src/lj_ircall.h                               |   45 +-
 src/lj_iropt.h                                |    2 +-
 src/lj_jit.h                                  |   18 +-
 src/lj_lex.c                                  |    2 +-
 src/lj_mcode.c                                |   14 +-
 src/lj_obj.h                                  |    3 +
 src/lj_opt_sink.c                             |    2 +-
 src/lj_opt_split.c                            |    2 +-
 src/lj_parse.c                                |    3 +-
 src/lj_profile_timer.c                        |    8 +-
 src/lj_profile_timer.h                        |    8 +-
 src/lj_record.c                               |    4 +-
 src/lj_snap.c                                 |   21 +-
 src/lj_target_mips.h                          |   52 +-
 src/luajit.c                                  |    1 +
 src/vm_mips64.dasc                            |  413 +++++-
 src/vm_ppc.dasc                               | 1249 ++++++++++++++---
 ...x-mips64-spare-side-exit-patching.test.lua |   65 +
 ...8-fix-side-exit-patching-on-arm64.test.lua |   78 +-
 ...-mips64-href-delay-slot-side-exit.test.lua |  101 ++
 .../lj-812-too-long-string-separator.test.lua |   31 +
 test/tarantool-tests/utils/frontend.lua       |   24 +
 test/tarantool-tests/utils/jit/generators.lua |  115 ++
 61 files changed, 3565 insertions(+), 908 deletions(-)
 create mode 100644 src/jit/dis_mips64r6.lua
 create mode 100644 src/jit/dis_mips64r6el.lua
 create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
 create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
 create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
 create mode 100644 test/tarantool-tests/utils/jit/generators.lua

-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15  9:36   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:25   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
                   ` (20 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Djordje Kovacevic and Stefan Pejic.

(cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)

Without a check of the opcode, some non-branch instructions may be
interpreted as branches due to memory address collisions: only the low
16 bits of the instruction are compared with the expected branch offset.
This patch adds the corresponding comparisons of the masked instruction
with the branch opcodes used in LuaJIT:
* `MIPSI_BEQ` for `beq`, `bne`, `blez` and `bgtz`,
* `MIPSI_BLTZ` for `bltz` and `bgez`,
* `MIPSI_BC1F` for `bc1f` and `bc1t`,
see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
details.
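
The masked comparison can be tried out in isolation. The sketch below is
not part of the patch: the helper names are made up for illustration,
and the opcode values are taken from the MIPS32 encoding under the
assumption that they match the same-named constants in
<src/lj_target_mips.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode patterns per the MIPS32 encoding; assumed to match the
** same-named constants in src/lj_target_mips.h. */
#define MIPSI_BEQ   0x10000000u
#define MIPSI_BLTZ  0x04000000u
#define MIPSI_BC1F  0x45000000u

/* Hypothetical helper: nonzero if `ins` is one of the conditional
** branches accepted by the masks used in the patch. */
static int is_exit_branch(uint32_t ins)
{
  return (ins & 0xf0000000u) == MIPSI_BEQ ||   /* Major opcodes 0x04..0x07. */
         (ins & 0xfc1e0000u) == MIPSI_BLTZ ||  /* REGIMM, rt == 0 or 1. */
         (ins & 0xffe00000u) == MIPSI_BC1F;    /* COP1 BC, any cc/nd/tf. */
}

/* Hypothetical helper: the complete test, i.e. the low 16 bits match
** the expected branch offset and the opcode is a branch. */
static int is_exitstub_branch(uint32_t ins, int32_t offset)
{
  return ((ins ^ (uint32_t)offset) & 0xffffu) == 0 && is_exit_branch(ins);
}
```

With the offset comparison alone, an `addiu` whose immediate happens to
equal the branch offset would be accepted; with the opcode masks it is
rejected.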

Reproducing this failure requires a specific memory mapping, so the
test case is omitted.

Since the MIPS architecture is not supported by Tarantool (at the
moment), this patch is not required for the backport. OTOH, it gives us
the following benefits:
* Being in sync with the LuaJIT upstream not only for the x86_64 and
  arm64 architectures.
* Avoiding conflicts during future backporting.
So, it's more useful to backport some of the patches to avoid conflicts
with the future patch series.

[1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#8825
---
 src/lj_asm_mips.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index 03417013..03215821 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -2472,7 +2472,11 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
   MCode tjump = MIPSI_J|(((uintptr_t)target>>2)&0x03ffffffu);
   for (p++; p < pe; p++) {
     if (*p == exitload) {  /* Look for load of exit number. */
-      if (((p[-1] ^ (px-p)) & 0xffffu) == 0) {  /* Look for exitstub branch. */
+      /* Look for exitstub branch. Yes, this covers all used branch variants. */
+      if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
+	  ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
+	   (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
+	   (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
 	ptrdiff_t delta = target - p;
 	if (((delta + 0x8000) >> 16) == 0) {  /* Patch in-range branch. */
 	patchbranch:
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 10:14   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 14:32   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
                   ` (19 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
depends on a particular offset of the side trace mcode relative to the
parent trace. Before this commit, the test just ran a number of
functions to generate traces that fill the required mcode range.
Unfortunately, this approach is not robust, since sometimes a trace is
not recorded due to "leaving loop in root trace" errors caused by
hotcount collisions.

This patch introduces the following helpers:
* `frontend.gettraceno(func)` -- returns the traceno for the given
  function, assuming that there is a compiled trace for its prototype
  (i.e. the 0th bytecode is JFUNC).
* `jit.generators.fillmcode(traceno, size)` fills mcode area of the
  given size from the given trace. It is useful to generate some mcode
  to test jumps to side traces remote enough from the parent.
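
The decoding done by `frontend.gettraceno()` boils down to two bit
operations on the 32-bit bytecode instruction. A minimal C sketch of the
same arithmetic follows; the helper names are made up for illustration,
and the field layout (opcode in the low 8 bits, operand D in the upper
16 bits, names packed as 6-character records) is taken from the
constants used in the Lua helper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BCIns;  /* One bytecode instruction. */

/* Opcode: the low 8 bits (the 0xff mask in gettraceno). */
static uint32_t ins_op(BCIns ins) { return ins & 0xffu; }

/* Operand D: the upper 16 bits (RD_SHIFT == 16). For a JFUNC header
** this operand holds the trace number. */
static uint32_t ins_d(BCIns ins) { return ins >> 16; }

/* Offset of the opcode name in the packed names string, where every
** name occupies BC_NAME_LENGTH == 6 characters. */
static uint32_t ins_name_offset(BCIns ins) { return 6u * ins_op(ins); }
```

For an instruction word built as `(42 << 16) | (3 << 8) | 0x5f`,
`ins_op()` yields 0x5f and `ins_d()` yields 42.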
---
 ...8-fix-side-exit-patching-on-arm64.test.lua |  78 ++----------
 test/tarantool-tests/utils/frontend.lua       |  24 ++++
 test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
 3 files changed, 150 insertions(+), 67 deletions(-)
 create mode 100644 test/tarantool-tests/utils/jit/generators.lua

diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
index 93db3041..678ac914 100644
--- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
+++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
@@ -1,8 +1,12 @@
 local tap = require('tap')
 local test = tap.test('gh-6098-fix-side-exit-patching-on-arm64'):skipcond({
   ['Test requires JIT enabled'] = not jit.status(),
+  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
 })
 
+local generators = require('utils').jit.generators
+local frontend = require('utils').frontend
+
 test:plan(1)
 
 -- The function to be tested for side exit patching:
@@ -20,52 +24,6 @@ local function cbool(cond)
   end
 end
 
--- XXX: Function template below produces 8Kb mcode for ARM64, so
--- we need to compile at least 128 traces to exceed 1Mb delta
--- between <cbool> root trace side exit and <cbool> side trace.
--- Unfortunately, we have no other option for extending this jump
--- delta, since the base of the current mcode area (J->mcarea) is
--- used as a hint for mcode allocator (see lj_mcode.c for info).
-local FUNCS = 128
-local recfuncs = { }
-for i = 1, FUNCS do
-  -- This is a quite heavy workload (though it doesn't look like
-  -- one at first). Each load from a table is type guarded. Each
-  -- table lookup (for both stores and loads) is guarded for table
-  -- <hmask> value and metatable presence. The code below results
-  -- to 8Kb of mcode for ARM64 in practice.
-  recfuncs[i] = assert(load(([[
-    return function(src)
-      local p = %d
-      local tmp = { }
-      local dst = { }
-      for i = 1, 3 do
-        tmp.a = src.a * p   tmp.j = src.j * p   tmp.s = src.s * p
-        tmp.b = src.b * p   tmp.k = src.k * p   tmp.t = src.t * p
-        tmp.c = src.c * p   tmp.l = src.l * p   tmp.u = src.u * p
-        tmp.d = src.d * p   tmp.m = src.m * p   tmp.v = src.v * p
-        tmp.e = src.e * p   tmp.n = src.n * p   tmp.w = src.w * p
-        tmp.f = src.f * p   tmp.o = src.o * p   tmp.x = src.x * p
-        tmp.g = src.g * p   tmp.p = src.p * p   tmp.y = src.y * p
-        tmp.h = src.h * p   tmp.q = src.q * p   tmp.z = src.z * p
-        tmp.i = src.i * p   tmp.r = src.r * p
-
-        dst.a = tmp.z + p   dst.j = tmp.q + p   dst.s = tmp.h + p
-        dst.b = tmp.y + p   dst.k = tmp.p + p   dst.t = tmp.g + p
-        dst.c = tmp.x + p   dst.l = tmp.o + p   dst.u = tmp.f + p
-        dst.d = tmp.w + p   dst.m = tmp.n + p   dst.v = tmp.e + p
-        dst.e = tmp.v + p   dst.n = tmp.m + p   dst.w = tmp.d + p
-        dst.f = tmp.u + p   dst.o = tmp.l + p   dst.x = tmp.c + p
-        dst.g = tmp.t + p   dst.p = tmp.k + p   dst.y = tmp.b + p
-        dst.h = tmp.s + p   dst.q = tmp.j + p   dst.z = tmp.a + p
-        dst.i = tmp.r + p   dst.r = tmp.i + p
-      end
-      dst.tmp = tmp
-      return dst
-    end
-  ]]):format(i)), ('Syntax error in function recfuncs[%d]'):format(i))()
-end
-
 -- Make compiler work hard:
 -- * No optimizations at all to produce more mcode.
 -- * Try to compile all compiled paths as early as JIT can.
@@ -78,27 +36,13 @@ cbool(true)
 -- a root trace for <cbool>.
 cbool(true)
 
-for i = 1, FUNCS do
-  -- XXX: FNEW is NYI, hence loop recording fails at this point.
-  -- The recording is aborted on purpose: we are going to record
-  -- <FUNCS> number of traces for functions in <recfuncs>.
-  -- Otherwise, loop recording might lead to a very long trace
-  -- error (via return to a lower frame), or a trace with lots of
-  -- side traces. We need neither of this, but just bunch of
-  -- traces filling the available mcode area.
-  local function tnew(p)
-    return {
-      a = p + 1, f = p + 6,  k = p + 11, p = p + 16, u = p + 21, z = p + 26,
-      b = p + 2, g = p + 7,  l = p + 12, q = p + 17, v = p + 22,
-      c = p + 3, h = p + 8,  m = p + 13, r = p + 18, w = p + 23,
-      d = p + 4, i = p + 9,  n = p + 14, s = p + 19, x = p + 24,
-      e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
-    }
-  end
-  -- Each function call produces a trace (see the template for the
-  -- function definition above).
-  recfuncs[i](tnew(i))
-end
+local cbool_traceno = frontend.gettraceno(cbool)
+
+-- XXX: Unfortunately, we have no other option for extending
+-- this jump delta, since the base of the current mcode area
+-- (J->mcarea) is used as a hint for mcode allocator (see
+-- lj_mcode.c for info).
+generators.fillmcode(cbool_traceno, 1024 * 1024)
 
 -- XXX: I tried to make the test in pure Lua, but I failed to
 -- implement the robust solution. As a result I've implemented a
diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
index 2afebbb2..414257fd 100644
--- a/test/tarantool-tests/utils/frontend.lua
+++ b/test/tarantool-tests/utils/frontend.lua
@@ -1,6 +1,10 @@
 local M = {}
 
 local bc = require('jit.bc')
+local jutil = require('jit.util')
+local vmdef = require('jit.vmdef')
+local bcnames = vmdef.bcnames
+local band, rshift = bit.band, bit.rshift
 
 function M.hasbc(f, bytecode)
   assert(type(f) == 'function', 'argument #1 should be a function')
@@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
   return hasbc
 end
 
+-- Get traceno of the trace associated with the given function.
+function M.gettraceno(func)
+  assert(type(func) == 'function', 'argument #1 should be a function')
+
+  -- The 0th BC is the header.
+  local func_ins = jutil.funcbc(func, 0)
+  local BC_NAME_LENGTH = 6
+  local RD_SHIFT = 16
+
+  -- Calculate index in `bcnames` string.
+  local op_idx = BC_NAME_LENGTH * band(func_ins, 0xff)
+  -- Get the name of the operation.
+  local op_name = string.sub(bcnames, op_idx + 1, op_idx + BC_NAME_LENGTH)
+  assert(op_name:match('JFUNC'),
+         'The given function has non-jitted header: ' .. op_name)
+
+  -- RD contains the traceno.
+  return rshift(func_ins, RD_SHIFT)
+end
+
 return M
diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
new file mode 100644
index 00000000..62b6e0ef
--- /dev/null
+++ b/test/tarantool-tests/utils/jit/generators.lua
@@ -0,0 +1,115 @@
+local M = {}
+
+local jutil = require('jit.util')
+
+local function getlast_traceno()
+  return misc.getmetrics().jit_trace_num
+end
+
+-- Convert addr to positive value if needed.
+local function canonize_address(addr)
+  if addr < 0 then addr = addr + 2 ^ 32 end
+  return addr
+end
+
+-- Need some storage to keep functions and traces from being
+-- collected.
+local recfuncs = {}
+local last_i = 0
+-- This function generates a table of functions with a heavy mcode
+-- payload (table arithmetic) to fill the mcode area, starting from
+-- the mcode of the given trace, by the given size. This size is
+-- usually big enough, because we want to check long jump side
+-- exits from some traces.
+-- Assumes that the maxmcode and maxtrace options are set, to be
+-- sure that we can produce such an amount of mcode.
+function M.fillmcode(trace_from, size)
+  local mcode, addr_from = jutil.tracemc(trace_from)
+  assert(mcode, 'the #1 argument should be an existed trace number')
+  addr_from = canonize_address(addr_from)
+  local required_diff = size + #mcode
+
+  -- Marker to check that traces are not flushed.
+  local maxtraceno = getlast_traceno()
+  local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
+
+  local _, last_addr = jutil.tracemc(maxtraceno)
+  last_addr = canonize_address(last_addr)
+
+  -- Addresses of traces may increase or decrease depending on OS,
+  -- so use absolute diff.
+  while math.abs(last_addr - addr_from) > required_diff do
+    last_i = last_i + 1
+    -- This is a quite heavy workload (though it doesn't look like
+    -- one at first). Each load from a table is type guarded. Each
+    -- table lookup (for both stores and loads) is guarded for
+    -- table <hmask> value and presence of the metatable. The code
+    -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
+    -- practice.
+    local fname = ('fillmcode[%d]'):format(last_i)
+    recfuncs[last_i] = assert(loadstring(([[
+      return function(src)
+        local p = %d
+        local tmp = { }
+        local dst = { }
+        -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
+        -- in root trace) errors due to hotcount collisions.
+        for i = 1, 5 do
+          tmp.a = src.a * p   tmp.j = src.j * p   tmp.s = src.s * p
+          tmp.b = src.b * p   tmp.k = src.k * p   tmp.t = src.t * p
+          tmp.c = src.c * p   tmp.l = src.l * p   tmp.u = src.u * p
+          tmp.d = src.d * p   tmp.m = src.m * p   tmp.v = src.v * p
+          tmp.e = src.e * p   tmp.n = src.n * p   tmp.w = src.w * p
+          tmp.f = src.f * p   tmp.o = src.o * p   tmp.x = src.x * p
+          tmp.g = src.g * p   tmp.p = src.p * p   tmp.y = src.y * p
+          tmp.h = src.h * p   tmp.q = src.q * p   tmp.z = src.z * p
+          tmp.i = src.i * p   tmp.r = src.r * p
+
+          dst.a = tmp.z + p   dst.j = tmp.q + p   dst.s = tmp.h + p
+          dst.b = tmp.y + p   dst.k = tmp.p + p   dst.t = tmp.g + p
+          dst.c = tmp.x + p   dst.l = tmp.o + p   dst.u = tmp.f + p
+          dst.d = tmp.w + p   dst.m = tmp.n + p   dst.v = tmp.e + p
+          dst.e = tmp.v + p   dst.n = tmp.m + p   dst.w = tmp.d + p
+          dst.f = tmp.u + p   dst.o = tmp.l + p   dst.x = tmp.c + p
+          dst.g = tmp.t + p   dst.p = tmp.k + p   dst.y = tmp.b + p
+          dst.h = tmp.s + p   dst.q = tmp.j + p   dst.z = tmp.a + p
+          dst.i = tmp.r + p   dst.r = tmp.i + p
+        end
+        dst.tmp = tmp
+        return dst
+      end
+    ]]):format(last_i), fname), ('Syntax error in function %s'):format(fname))()
+    -- XXX: FNEW is NYI, hence loop recording fails at this point.
+    -- The recording is aborted on purpose: the whole loop
+    -- recording might lead to a very long trace error (via return
+    -- to a lower frame), or a trace with lots of side traces. We
+    -- need neither of this, but just a bunch of traces filling
+    -- the available mcode area.
+    local function tnew(p)
+      return {
+        a = p + 1, f = p + 6,  k = p + 11, p = p + 16, u = p + 21, z = p + 26,
+        b = p + 2, g = p + 7,  l = p + 12, q = p + 17, v = p + 22,
+        c = p + 3, h = p + 8,  m = p + 13, r = p + 18, w = p + 23,
+        d = p + 4, i = p + 9,  n = p + 14, s = p + 19, x = p + 24,
+        e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
+      }
+    end
+    -- Each function call produces a trace (see the template for
+    -- the function definition above).
+    recfuncs[last_i](tnew(last_i))
+    local last_traceno = getlast_traceno()
+    if last_traceno < maxtraceno then
+      error(FLUSH_ERR)
+    end
+
+    -- Calculate the address of the last trace start.
+    maxtraceno = last_traceno
+    _, last_addr = jutil.tracemc(last_traceno)
+    if not last_addr then
+      error(FLUSH_ERR)
+    end
+    last_addr = canonize_address(last_addr)
+  end
+end
+
+return M
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 11:13   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 15:02   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
                   ` (18 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Djordje Kovacevic and Stefan Pejic.

(cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)

`asm_sparejump_setup()` in <src/lj_asm_mips.h> presumes that
`sizeof(MCLink)` is 8 bytes, but for MIPS64 its size is 16 bytes. This
makes its check of the mcode bottom offset incorrect, so the mcode
bottom is not updated.
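
The size dependency is easy to see in a standalone sketch. The struct
below repeats the `MCLink` definition from the patch; the helper names
are hypothetical, and `MIPS_SPAREJUMP` is assumed to be 4, the default
mentioned in the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MCode;   /* One MIPS instruction. */

typedef struct MCLink {   /* Header at the start of each mcode area. */
  MCode *next;            /* Next area. */
  size_t size;            /* Size of current area. */
} MCLink;

#define MIPS_SPAREJUMP 4  /* Assumed default number of spare slots. */

/* Hypothetical helper: index (in MCode units) of the first spare jump
** slot, which follows the MCLink header: 2 with 32-bit pointers, 4 with
** 64-bit pointers. */
static size_t sparejump_first(void)
{
  return sizeof(MCLink) / sizeof(MCode);
}

/* Hypothetical helper: index just past the last spare jump slot. Every
** slot takes two instructions (jump and nop). */
static size_t sparejump_end(void)
{
  return sparejump_first() + MIPS_SPAREJUMP * 2;
}
```

The old code hardcoded the 32-bit values (offset 8, first index 2), so
on MIPS64 it would have looked for slots inside the header.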

This patch fixes the check of the MCLink offset from the mcbot.
Nevertheless, the emitting of spare jump slots is still incorrect, so
the introduced test still fails due to incorrect iteration through the
spare slots table (the last slot is out of the mcode range).

This should be fixed by backporting the commit
dbb78630169a8106b355a5be8af627e98c362f1e ("MIPS: Fix handling of
long-range spare jumps."). However, it triggers the new unconditional
assert added in this patch, which reports that sizemcode is too big. So
a workaround has to be found before this test is enabled for MIPS.

Since the test also validates the behaviour of long-range jumps to side
traces for arm64 and x64, and we have no testing for MIPS64 (yet), we
can leave it as is without a skipcond.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8825
---
 src/lj_asm_mips.h                             |  9 +--
 src/lj_jit.h                                  |  6 ++
 src/lj_mcode.c                                |  6 --
 ...x-mips64-spare-side-exit-patching.test.lua | 65 +++++++++++++++++++
 4 files changed, 76 insertions(+), 10 deletions(-)
 create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua

diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index 03215821..0e60fc07 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -65,10 +65,9 @@ static Reg ra_alloc2(ASMState *as, IRIns *ir, RegSet allow)
 static void asm_sparejump_setup(ASMState *as)
 {
   MCode *mxp = as->mcbot;
-  /* Assumes sizeof(MCLink) == 8. */
-  if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == 8) {
+  if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == sizeof(MCLink)) {
     lua_assert(MIPSI_NOP == 0);
-    memset(mxp+2, 0, MIPS_SPAREJUMP*8);
+    memset(mxp, 0, MIPS_SPAREJUMP*2*sizeof(MCode));
     mxp += MIPS_SPAREJUMP*2;
     lua_assert(mxp < as->mctop);
     lj_mcode_sync(as->mcbot, mxp);
@@ -2486,7 +2485,9 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
 	  if (!cstart) cstart = p-1;
 	} else {  /* Branch out of range. Use spare jump slot in mcarea. */
 	  int i;
-	  for (i = 2; i < 2+MIPS_SPAREJUMP*2; i += 2) {
+	  for (i = (int)(sizeof(MCLink)/sizeof(MCode));
+	       i < (int)(sizeof(MCLink)/sizeof(MCode)+MIPS_SPAREJUMP*2);
+	       i += 2) {
 	    if (mcarea[i] == tjump) {
 	      delta = mcarea+i - p;
 	      goto patchbranch;
diff --git a/src/lj_jit.h b/src/lj_jit.h
index f2ad3c6e..cc8efd20 100644
--- a/src/lj_jit.h
+++ b/src/lj_jit.h
@@ -158,6 +158,12 @@ typedef uint8_t MCode;
 typedef uint32_t MCode;
 #endif
 
+/* Linked list of MCode areas. */
+typedef struct MCLink {
+  MCode *next;		/* Next area. */
+  size_t size;		/* Size of current area. */
+} MCLink;
+
 /* Stack snapshot header. */
 typedef struct SnapShot {
   uint32_t mapofs;	/* Offset into snapshot map. */
diff --git a/src/lj_mcode.c b/src/lj_mcode.c
index 7184d3b4..c6361018 100644
--- a/src/lj_mcode.c
+++ b/src/lj_mcode.c
@@ -272,12 +272,6 @@ static void *mcode_alloc(jit_State *J, size_t sz)
 
 /* -- MCode area management ----------------------------------------------- */
 
-/* Linked list of MCode areas. */
-typedef struct MCLink {
-  MCode *next;		/* Next area. */
-  size_t size;		/* Size of current area. */
-} MCLink;
-
 /* Allocate a new MCode area. */
 static void mcode_allocarea(jit_State *J)
 {
diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
new file mode 100644
index 00000000..fdc826cb
--- /dev/null
+++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
@@ -0,0 +1,65 @@
+local tap = require('tap')
+local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
+  -- Need to fix the MIPS behaviour first.
+  ['Disabled for MIPS architectures'] = jit.arch:match('mips'),
+})
+
+local generators = require('utils').jit.generators
+local frontend = require('utils').frontend
+
+test:plan(1)
+
+-- Make compiler work hard.
+jit.opt.start(
+  -- No optimizations at all to produce more mcode.
+  0,
+  -- Try to compile all compiled paths as early as JIT can.
+  'hotloop=1',
+  'hotexit=1',
+  -- Allow to use 2000 traces to avoid flushes.
+  'maxtrace=2000',
+  -- Allow to compile 8Mb of mcode to be sure the issue occurs.
+  'maxmcode=8192',
+  -- Use big mcode area for traces to avoid using different
+  -- spare slots.
+  'sizemcode=256'
+)
+
+local MAX_SPARE_SLOT = 4
+local function parent(marker)
+  -- Use several side exit to fill spare exit space (default is
+  -- 4 slots, each slot has 2 instructions -- jump and nop).
+  -- luacheck: ignore
+  if marker > MAX_SPARE_SLOT then end
+  if marker > 3 then end
+  if marker > 2 then end
+  if marker > 1 then end
+  if marker > 0 then end
+  -- XXX: use `fmod()` to avoid leaving the function and use
+  -- stitching here.
+  return math.fmod(1, 1)
+end
+
+-- Compile parent trace first.
+parent(0)
+parent(0)
+
+local parent_traceno = frontend.gettraceno(parent)
+local last_traceno = parent_traceno
+
+-- Now generate some mcode to force a long jump via a spare slot.
+-- Each iteration yields different addresses and uses a different
+-- spare slot. After that, compile and execute a new side trace.
+for i = 1, MAX_SPARE_SLOT + 1 do
+  generators.fillmcode(last_traceno, 1024 * 1024)
+  parent(i)
+  parent(i)
+  parent(i)
+  last_traceno = misc.getmetrics().jit_trace_num
+end
+
+test:ok(true, 'all traces executed correctly')
+
+test:done(true)
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (2 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 11:27   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 16:07   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
                   ` (17 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
Sponsored by Cisco Systems, Inc.

(cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)

The software floating-point library is used on machines that have no
hardware floating-point support [1]. This patch enables support for
such machines in the JIT compiler backend for MIPS64.
This includes:
* The `vm_tointg()` helper is added in <src/vm_mips64.dasc> to perform
  a checked FP number to integer conversion for soft-float builds
  (called from JIT).
* The `sfmin/max()` helpers are added in <src/vm_mips64.dasc> for
  min/max operations for soft-float builds (called from JIT).
* The `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS
  instead of `LJ_SOFTFP`.
* All FP-dependent paths are guarded by the `LJ_SOFTFP` or
  `LJ_SOFTFP32` macro.
* The corresponding function calls in <src/lj_ircall.h> are marked as
  `XA_FP32`, `XA2_FP32`, i.e. as requiring extra arguments on the
  stack for soft-FP on 32-bit MIPS.
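
As an illustration of the checked conversion mentioned in the first item,
here is a minimal portable C sketch. It is not the assembly from
<src/vm_mips64.dasc>: the names `SketchToIntResult` and `sketch_tointg`
are hypothetical, and corner cases such as -0 may be handled differently
by the real helper. It only mirrors the contract visible in
`asm_tointg()`: one result carries the converted value, the other a
status that is zero on success and non-zero when the guard must fail.

```c
#include <stdint.h>

/* Hypothetical result pair: converted value plus status.
** status == 0 means the number was exactly representable as int32_t,
** any other value means the guard should take the side exit.
*/
typedef struct SketchToIntResult {
  int32_t value;
  int status;
} SketchToIntResult;

/* Checked number to int32_t conversion (illustrative only). */
static SketchToIntResult sketch_tointg(double x)
{
  SketchToIntResult res = { 0, 1 };  /* Assume failure by default. */
  /* Range check first, so the cast below is well defined.
  ** NaN fails both comparisons and is rejected here as well.
  */
  if (x >= -2147483648.0 && x <= 2147483647.0) {
    int32_t i = (int32_t)x;  /* Truncates towards zero. */
    if ((double)i == x) {    /* No fractional part was lost. */
      res.value = i;
      res.status = 0;
    }
  }
  return res;
}
```

A caller would test `status` first and use `value` only when it is zero,
which is the role the `MIPSI_BNE` guard on `RID_RET+1` plays in the JIT
backend.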

[1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
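
On a 64-bit soft-float target a number is kept in a general-purpose
register as its raw IEEE 754 bit pattern, so some operations need no
library call at all. The following C sketch shows the idea behind the
negation and absolute value paths in this patch (XOR with
0x8000000000000000 and extraction of bits 0..62). All function names
here are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as the 64-bit pattern a GPR would hold. */
static uint64_t sketch_num_to_bits(double x)
{
  uint64_t u;
  memcpy(&u, &x, sizeof(u));
  return u;
}

/* Reinterpret a 64-bit pattern as a double again. */
static double sketch_bits_to_num(uint64_t u)
{
  double x;
  memcpy(&x, &u, sizeof(x));
  return x;
}

/* Negation: flip the sign bit (bit 63), leave everything else alone. */
static uint64_t sketch_softfp_neg(uint64_t u)
{
  return u ^ UINT64_C(0x8000000000000000);
}

/* Absolute value: keep bits 0..62, i.e. clear the sign bit. */
static uint64_t sketch_softfp_abs(uint64_t u)
{
  return u & UINT64_C(0x7fffffffffffffff);
}
```

Both helpers work purely on integer registers, which is why the backend
can allocate such values from `RSET_GPR` instead of `RSET_FPR`.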

Sergey Kaplun:
* added the description for the feature

Part of tarantool/tarantool#8825
---
 src/lj_arch.h      |   4 +-
 src/lj_asm.c       |   8 +-
 src/lj_asm_mips.h  | 217 +++++++++++++++++++++++++++++++++++++--------
 src/lj_crecord.c   |   4 +-
 src/lj_emit_mips.h |   2 +
 src/lj_ffrecord.c  |   2 +-
 src/lj_ircall.h    |  43 ++++++---
 src/lj_iropt.h     |   2 +-
 src/lj_jit.h       |   4 +-
 src/lj_obj.h       |   3 +
 src/lj_opt_split.c |   2 +-
 src/lj_snap.c      |  21 +++--
 src/vm_mips64.dasc |  49 ++++++++++
 13 files changed, 286 insertions(+), 75 deletions(-)

diff --git a/src/lj_arch.h b/src/lj_arch.h
index 5276ae56..c39526ea 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -349,9 +349,6 @@
 #define LJ_ARCH_BITS		32
 #define LJ_TARGET_MIPS32	1
 #else
-#if LJ_ABI_SOFTFP || !LJ_ARCH_HASFPU
-#define LJ_ARCH_NOJIT		1	/* NYI */
-#endif
 #define LJ_ARCH_BITS		64
 #define LJ_TARGET_MIPS64	1
 #define LJ_TARGET_GC64		1
@@ -528,6 +525,7 @@
 #define LJ_ABI_SOFTFP		0
 #endif
 #define LJ_SOFTFP		(!LJ_ARCH_HASFPU)
+#define LJ_SOFTFP32		(LJ_SOFTFP && LJ_32)
 
 #if LJ_ARCH_ENDIAN == LUAJIT_BE
 #define LJ_LE			0
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 0bfa44ed..15de7e33 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -341,7 +341,7 @@ static Reg ra_rematk(ASMState *as, IRRef ref)
   ra_modified(as, r);
   ir->r = RID_INIT;  /* Do not keep any hint. */
   RA_DBGX((as, "remat     $i $r", ir, r));
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
   if (ir->o == IR_KNUM) {
     emit_loadk64(as, r, ir);
   } else
@@ -1356,7 +1356,7 @@ static void asm_call(ASMState *as, IRIns *ir)
   asm_gencall(as, ci, args);
 }
 
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
 static void asm_fppow(ASMState *as, IRIns *ir, IRRef lref, IRRef rref)
 {
   const CCallInfo *ci = &lj_ir_callinfo[IRCALL_pow];
@@ -1703,10 +1703,10 @@ static void asm_ir(ASMState *as, IRIns *ir)
   case IR_MUL: asm_mul(as, ir); break;
   case IR_MOD: asm_mod(as, ir); break;
   case IR_NEG: asm_neg(as, ir); break;
-#if LJ_SOFTFP
+#if LJ_SOFTFP32
   case IR_DIV: case IR_POW: case IR_ABS:
   case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
-    lua_assert(0);  /* Unused for LJ_SOFTFP. */
+    lua_assert(0);  /* Unused for LJ_SOFTFP32. */
     break;
 #else
   case IR_DIV: asm_div(as, ir); break;
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index 0e60fc07..a26a82cd 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -290,7 +290,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
 	  {
 	    ra_leftov(as, gpr, ref);
 	    gpr++;
-#if LJ_64
+#if LJ_64 && !LJ_SOFTFP
 	    fpr++;
 #endif
 	  }
@@ -301,7 +301,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
 	  emit_spstore(as, ir, r, ofs);
 	  ofs += irt_isnum(ir->t) ? 8 : 4;
 #else
-	  emit_spstore(as, ir, r, ofs + ((LJ_BE && (LJ_SOFTFP || r < RID_MAX_GPR) && !irt_is64(ir->t)) ? 4 : 0));
+	  emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isfp(ir->t) && !irt_is64(ir->t)) ? 4 : 0));
 	  ofs += 8;
 #endif
 	}
@@ -312,7 +312,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
 #endif
       if (gpr <= REGARG_LASTGPR) {
 	gpr++;
-#if LJ_64
+#if LJ_64 && !LJ_SOFTFP
 	fpr++;
 #endif
       } else {
@@ -461,12 +461,36 @@ static void asm_tobit(ASMState *as, IRIns *ir)
   emit_tg(as, MIPSI_MFC1, dest, tmp);
   emit_fgh(as, MIPSI_ADD_D, tmp, left, right);
 }
+#elif LJ_64  /* && LJ_SOFTFP */
+static void asm_tointg(ASMState *as, IRIns *ir, Reg r)
+{
+  /* The modified regs must match with the *.dasc implementation. */
+  RegSet drop = RID2RSET(REGARG_FIRSTGPR)|RID2RSET(RID_RET)|RID2RSET(RID_RET+1)|
+		RID2RSET(RID_R1)|RID2RSET(RID_R12);
+  if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
+  ra_evictset(as, drop);
+  /* Return values are in RID_RET (converted value) and RID_RET+1 (status). */
+  ra_destreg(as, ir, RID_RET);
+  asm_guard(as, MIPSI_BNE, RID_RET+1, RID_ZERO);
+  emit_call(as, (void *)lj_ir_callinfo[IRCALL_lj_vm_tointg].func, 0);
+  if (r == RID_NONE)
+    ra_leftov(as, REGARG_FIRSTGPR, ir->op1);
+  else if (r != REGARG_FIRSTGPR)
+    emit_move(as, REGARG_FIRSTGPR, r);
+}
+
+static void asm_tobit(ASMState *as, IRIns *ir)
+{
+  Reg dest = ra_dest(as, ir, RSET_GPR);
+  emit_dta(as, MIPSI_SLL, dest, dest, 0);
+  asm_callid(as, ir, IRCALL_lj_vm_tobit);
+}
 #endif
 
 static void asm_conv(ASMState *as, IRIns *ir)
 {
   IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
   int stfp = (st == IRT_NUM || st == IRT_FLOAT);
 #endif
 #if LJ_64
@@ -477,12 +501,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
   lua_assert(!(irt_isint64(ir->t) ||
 	       (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
 #endif
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP32
   /* FP conversions are handled by SPLIT. */
   lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
   /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
 #else
   lua_assert(irt_type(ir->t) != st);
+#if !LJ_SOFTFP
   if (irt_isfp(ir->t)) {
     Reg dest = ra_dest(as, ir, RSET_FPR);
     if (stfp) {  /* FP to FP conversion. */
@@ -608,6 +633,42 @@ static void asm_conv(ASMState *as, IRIns *ir)
       }
     }
   } else
+#else
+  if (irt_isfp(ir->t)) {
+#if LJ_64 && LJ_HASFFI
+    if (stfp) {  /* FP to FP conversion. */
+      asm_callid(as, ir, irt_isnum(ir->t) ? IRCALL_softfp_f2d :
+					    IRCALL_softfp_d2f);
+    } else {  /* Integer to FP conversion. */
+      IRCallID cid = ((IRT_IS64 >> st) & 1) ?
+	(irt_isnum(ir->t) ?
+	 (st == IRT_I64 ? IRCALL_fp64_l2d : IRCALL_fp64_ul2d) :
+	 (st == IRT_I64 ? IRCALL_fp64_l2f : IRCALL_fp64_ul2f)) :
+	(irt_isnum(ir->t) ?
+	 (st == IRT_INT ? IRCALL_softfp_i2d : IRCALL_softfp_ui2d) :
+	 (st == IRT_INT ? IRCALL_softfp_i2f : IRCALL_softfp_ui2f));
+      asm_callid(as, ir, cid);
+    }
+#else
+    asm_callid(as, ir, IRCALL_softfp_i2d);
+#endif
+  } else if (stfp) {  /* FP to integer conversion. */
+    if (irt_isguard(ir->t)) {
+      /* Checked conversions are only supported from number to int. */
+      lua_assert(irt_isint(ir->t) && st == IRT_NUM);
+      asm_tointg(as, ir, RID_NONE);
+    } else {
+      IRCallID cid = irt_is64(ir->t) ?
+	((st == IRT_NUM) ?
+	 (irt_isi64(ir->t) ? IRCALL_fp64_d2l : IRCALL_fp64_d2ul) :
+	 (irt_isi64(ir->t) ? IRCALL_fp64_f2l : IRCALL_fp64_f2ul)) :
+	((st == IRT_NUM) ?
+	 (irt_isint(ir->t) ? IRCALL_softfp_d2i : IRCALL_softfp_d2ui) :
+	 (irt_isint(ir->t) ? IRCALL_softfp_f2i : IRCALL_softfp_f2ui));
+      asm_callid(as, ir, cid);
+    }
+  } else
+#endif
 #endif
   {
     Reg dest = ra_dest(as, ir, RSET_GPR);
@@ -665,7 +726,7 @@ static void asm_strto(ASMState *as, IRIns *ir)
   const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
   IRRef args[2];
   int32_t ofs = 0;
-#if LJ_SOFTFP
+#if LJ_SOFTFP32
   ra_evictset(as, RSET_SCRATCH);
   if (ra_used(ir)) {
     if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
@@ -806,7 +867,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
   MCLabel l_end, l_loop, l_next;
 
   rset_clear(allow, tab);
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP32
   if (!isk) {
     key = ra_alloc1(as, refkey, allow);
     rset_clear(allow, key);
@@ -826,7 +887,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
     }
   }
 #else
-  if (irt_isnum(kt)) {
+  if (!LJ_SOFTFP && irt_isnum(kt)) {
     key = ra_alloc1(as, refkey, RSET_FPR);
     tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
   } else if (!irt_ispri(kt)) {
@@ -882,6 +943,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
     emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
     emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
     emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
+  } else if (LJ_SOFTFP && irt_isnum(kt)) {
+    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
+    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
   } else if (irt_isaddr(kt)) {
     Reg refk = tmp2;
     if (isk) {
@@ -960,7 +1024,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
       emit_dta(as, MIPSI_ROTR, dest, tmp1, (-HASH_ROT1)&31);
       if (irt_isnum(kt)) {
 	emit_dst(as, MIPSI_ADDU, tmp1, tmp1, tmp1);
-	emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 0);
+	emit_dta(as, MIPSI_DSRA32, tmp1, LJ_SOFTFP ? key : tmp1, 0);
 	emit_dta(as, MIPSI_SLL, tmp2, LJ_SOFTFP ? key : tmp1, 0);
 #if !LJ_SOFTFP
 	emit_tg(as, MIPSI_DMFC1, tmp1, key);
@@ -1123,7 +1187,7 @@ static MIPSIns asm_fxloadins(IRIns *ir)
   case IRT_U8: return MIPSI_LBU;
   case IRT_I16: return MIPSI_LH;
   case IRT_U16: return MIPSI_LHU;
-  case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_LDC1;
+  case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
   case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
   default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
   }
@@ -1134,7 +1198,7 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
   switch (irt_type(ir->t)) {
   case IRT_I8: case IRT_U8: return MIPSI_SB;
   case IRT_I16: case IRT_U16: return MIPSI_SH;
-  case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_SDC1;
+  case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
   case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
   default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
   }
@@ -1199,7 +1263,7 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
 
 static void asm_ahuvload(ASMState *as, IRIns *ir)
 {
-  int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
+  int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
   Reg dest = RID_NONE, type = RID_TMP, idx;
   RegSet allow = RSET_GPR;
   int32_t ofs = 0;
@@ -1212,7 +1276,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
     }
   }
   if (ra_used(ir)) {
-    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
+    lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
 	       irt_isint(ir->t) || irt_isaddr(ir->t));
     dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
     rset_clear(allow, dest);
@@ -1261,10 +1325,10 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
   int32_t ofs = 0;
   if (ir->r == RID_SINK)
     return;
-  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
-    src = ra_alloc1(as, ir->op2, RSET_FPR);
+  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+    src = ra_alloc1(as, ir->op2, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
     idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
-    emit_hsi(as, MIPSI_SDC1, src, idx, ofs);
+    emit_hsi(as, LJ_SOFTFP ? MIPSI_SD : MIPSI_SDC1, src, idx, ofs);
   } else {
 #if LJ_32
     if (!irt_ispri(ir->t)) {
@@ -1312,7 +1376,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
   IRType1 t = ir->t;
 #if LJ_32
   int32_t ofs = 8*((int32_t)ir->op1-1) + ((ir->op2 & IRSLOAD_FRAME) ? 4 : 0);
-  int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
+  int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
   if (hiop)
     t.irt = IRT_NUM;
 #else
@@ -1320,7 +1384,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
 #endif
   lua_assert(!(ir->op2 & IRSLOAD_PARENT));  /* Handled by asm_head_side(). */
   lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP32
   lua_assert(!(ir->op2 & IRSLOAD_CONVERT));  /* Handled by LJ_SOFTFP SPLIT. */
   if (hiop && ra_used(ir+1)) {
     type = ra_dest(as, ir+1, allow);
@@ -1328,29 +1392,44 @@ static void asm_sload(ASMState *as, IRIns *ir)
   }
 #else
   if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
-    dest = ra_scratch(as, RSET_FPR);
+    dest = ra_scratch(as, LJ_SOFTFP ? allow : RSET_FPR);
     asm_tointg(as, ir, dest);
     t.irt = IRT_NUM;  /* Continue with a regular number type check. */
   } else
 #endif
   if (ra_used(ir)) {
-    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
+    lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
 	       irt_isint(ir->t) || irt_isaddr(ir->t));
     dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
     rset_clear(allow, dest);
     base = ra_alloc1(as, REF_BASE, allow);
     rset_clear(allow, base);
-    if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
+    if (!LJ_SOFTFP32 && (ir->op2 & IRSLOAD_CONVERT)) {
       if (irt_isint(t)) {
-	Reg tmp = ra_scratch(as, RSET_FPR);
+	Reg tmp = ra_scratch(as, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
+#if LJ_SOFTFP
+	ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
+	ra_destreg(as, ir, RID_RET);
+	emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_d2i].func, 0);
+	if (tmp != REGARG_FIRSTGPR)
+	  emit_move(as, REGARG_FIRSTGPR, tmp);
+#else
 	emit_tg(as, MIPSI_MFC1, dest, tmp);
 	emit_fg(as, MIPSI_TRUNC_W_D, tmp, tmp);
+#endif
 	dest = tmp;
 	t.irt = IRT_NUM;  /* Check for original type. */
       } else {
 	Reg tmp = ra_scratch(as, RSET_GPR);
+#if LJ_SOFTFP
+	ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
+	ra_destreg(as, ir, RID_RET);
+	emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_i2d].func, 0);
+	emit_dta(as, MIPSI_SLL, REGARG_FIRSTGPR, tmp, 0);
+#else
 	emit_fg(as, MIPSI_CVT_D_W, dest, dest);
 	emit_tg(as, MIPSI_MTC1, tmp, dest);
+#endif
 	dest = tmp;
 	t.irt = IRT_INT;  /* Check for original type. */
       }
@@ -1399,7 +1478,7 @@ dotypecheck:
       if (irt_isnum(t)) {
 	asm_guard(as, MIPSI_BEQ, RID_TMP, RID_ZERO);
 	emit_tsi(as, MIPSI_SLTIU, RID_TMP, RID_TMP, (int32_t)LJ_TISNUM);
-	if (ra_hasreg(dest))
+	if (!LJ_SOFTFP && ra_hasreg(dest))
 	  emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
       } else {
 	asm_guard(as, MIPSI_BNE, RID_TMP,
@@ -1409,7 +1488,7 @@ dotypecheck:
     }
     emit_tsi(as, MIPSI_LD, type, base, ofs);
   } else if (ra_hasreg(dest)) {
-    if (irt_isnum(t))
+    if (!LJ_SOFTFP && irt_isnum(t))
       emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
     else
       emit_tsi(as, irt_isint(t) ? MIPSI_LW : MIPSI_LD, dest, base,
@@ -1554,26 +1633,40 @@ static void asm_fpunary(ASMState *as, IRIns *ir, MIPSIns mi)
   Reg left = ra_hintalloc(as, ir->op1, dest, RSET_FPR);
   emit_fg(as, mi, dest, left);
 }
+#endif
 
+#if !LJ_SOFTFP32
 static void asm_fpmath(ASMState *as, IRIns *ir)
 {
   if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir))
     return;
+#if !LJ_SOFTFP
   if (ir->op2 <= IRFPM_TRUNC)
     asm_callround(as, ir, IRCALL_lj_vm_floor + ir->op2);
   else if (ir->op2 == IRFPM_SQRT)
     asm_fpunary(as, ir, MIPSI_SQRT_D);
   else
+#endif
     asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
 }
 #endif
 
+#if !LJ_SOFTFP
+#define asm_fpadd(as, ir)	asm_fparith(as, ir, MIPSI_ADD_D)
+#define asm_fpsub(as, ir)	asm_fparith(as, ir, MIPSI_SUB_D)
+#define asm_fpmul(as, ir)	asm_fparith(as, ir, MIPSI_MUL_D)
+#elif LJ_64  /* && LJ_SOFTFP */
+#define asm_fpadd(as, ir)	asm_callid(as, ir, IRCALL_softfp_add)
+#define asm_fpsub(as, ir)	asm_callid(as, ir, IRCALL_softfp_sub)
+#define asm_fpmul(as, ir)	asm_callid(as, ir, IRCALL_softfp_mul)
+#endif
+
 static void asm_add(ASMState *as, IRIns *ir)
 {
   IRType1 t = ir->t;
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
   if (irt_isnum(t)) {
-    asm_fparith(as, ir, MIPSI_ADD_D);
+    asm_fpadd(as, ir);
   } else
 #endif
   {
@@ -1595,9 +1688,9 @@ static void asm_add(ASMState *as, IRIns *ir)
 
 static void asm_sub(ASMState *as, IRIns *ir)
 {
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
   if (irt_isnum(ir->t)) {
-    asm_fparith(as, ir, MIPSI_SUB_D);
+    asm_fpsub(as, ir);
   } else
 #endif
   {
@@ -1611,9 +1704,9 @@ static void asm_sub(ASMState *as, IRIns *ir)
 
 static void asm_mul(ASMState *as, IRIns *ir)
 {
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
   if (irt_isnum(ir->t)) {
-    asm_fparith(as, ir, MIPSI_MUL_D);
+    asm_fpmul(as, ir);
   } else
 #endif
   {
@@ -1640,7 +1733,7 @@ static void asm_mod(ASMState *as, IRIns *ir)
     asm_callid(as, ir, IRCALL_lj_vm_modi);
 }
 
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
 static void asm_pow(ASMState *as, IRIns *ir)
 {
 #if LJ_64 && LJ_HASFFI
@@ -1660,7 +1753,11 @@ static void asm_div(ASMState *as, IRIns *ir)
 					  IRCALL_lj_carith_divu64);
   else
 #endif
+#if !LJ_SOFTFP
     asm_fparith(as, ir, MIPSI_DIV_D);
+#else
+  asm_callid(as, ir, IRCALL_softfp_div);
+#endif
 }
 #endif
 
@@ -1670,6 +1767,13 @@ static void asm_neg(ASMState *as, IRIns *ir)
   if (irt_isnum(ir->t)) {
     asm_fpunary(as, ir, MIPSI_NEG_D);
   } else
+#elif LJ_64  /* && LJ_SOFTFP */
+  if (irt_isnum(ir->t)) {
+    Reg dest = ra_dest(as, ir, RSET_GPR);
+    Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
+    emit_dst(as, MIPSI_XOR, dest, left,
+	    ra_allock(as, 0x8000000000000000ll, rset_exclude(RSET_GPR, dest)));
+  } else
 #endif
   {
     Reg dest = ra_dest(as, ir, RSET_GPR);
@@ -1679,7 +1783,17 @@ static void asm_neg(ASMState *as, IRIns *ir)
   }
 }
 
+#if !LJ_SOFTFP
 #define asm_abs(as, ir)		asm_fpunary(as, ir, MIPSI_ABS_D)
+#elif LJ_64   /* && LJ_SOFTFP */
+static void asm_abs(ASMState *as, IRIns *ir)
+{
+  Reg dest = ra_dest(as, ir, RSET_GPR);
+  Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
+  emit_tsml(as, MIPSI_DEXTM, dest, left, 30, 0);
+}
+#endif
+
 #define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
 #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
 
@@ -1924,15 +2038,21 @@ static void asm_bror(ASMState *as, IRIns *ir)
   }
 }
 
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP
 static void asm_sfpmin_max(ASMState *as, IRIns *ir)
 {
   CCallInfo ci = lj_ir_callinfo[(IROp)ir->o == IR_MIN ? IRCALL_lj_vm_sfmin : IRCALL_lj_vm_sfmax];
+#if LJ_64
+  IRRef args[2];
+  args[0] = ir->op1;
+  args[1] = ir->op2;
+#else
   IRRef args[4];
   args[0^LJ_BE] = ir->op1;
   args[1^LJ_BE] = (ir+1)->op1;
   args[2^LJ_BE] = ir->op2;
   args[3^LJ_BE] = (ir+1)->op2;
+#endif
   asm_setupresult(as, ir, &ci);
   emit_call(as, (void *)ci.func, 0);
   ci.func = NULL;
@@ -1942,7 +2062,10 @@ static void asm_sfpmin_max(ASMState *as, IRIns *ir)
 
 static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
 {
-  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
+  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+    asm_sfpmin_max(as, ir);
+#else
     Reg dest = ra_dest(as, ir, RSET_FPR);
     Reg right, left = ra_alloc2(as, ir, RSET_FPR);
     right = (left >> 8); left &= 255;
@@ -1953,6 +2076,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
       if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
     }
     emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
+#endif
   } else {
     Reg dest = ra_dest(as, ir, RSET_GPR);
     Reg right, left = ra_alloc2(as, ir, RSET_GPR);
@@ -1973,18 +2097,24 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
 
 /* -- Comparisons --------------------------------------------------------- */
 
-#if LJ_32 && LJ_SOFTFP
+#if LJ_SOFTFP
 /* SFP comparisons. */
 static void asm_sfpcomp(ASMState *as, IRIns *ir)
 {
   const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
   RegSet drop = RSET_SCRATCH;
   Reg r;
+#if LJ_64
+  IRRef args[2];
+  args[0] = ir->op1;
+  args[1] = ir->op2;
+#else
   IRRef args[4];
   args[LJ_LE ? 0 : 1] = ir->op1; args[LJ_LE ? 1 : 0] = (ir+1)->op1;
   args[LJ_LE ? 2 : 3] = ir->op2; args[LJ_LE ? 3 : 2] = (ir+1)->op2;
+#endif
 
-  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
+  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+(LJ_64?1:3); r++) {
     if (!rset_test(as->freeset, r) &&
 	regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
       rset_clear(drop, r);
@@ -2038,11 +2168,15 @@ static void asm_comp(ASMState *as, IRIns *ir)
 {
   /* ORDER IR: LT GE LE GT  ULT UGE ULE UGT. */
   IROp op = ir->o;
-  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
+  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+    asm_sfpcomp(as, ir);
+#else
     Reg right, left = ra_alloc2(as, ir, RSET_FPR);
     right = (left >> 8); left &= 255;
     asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
     emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
+#endif
   } else {
     Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
     if (op == IR_ABC) op = IR_UGT;
@@ -2074,9 +2208,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
   Reg right, left = ra_alloc2(as, ir, (!LJ_SOFTFP && irt_isnum(ir->t)) ?
 				       RSET_FPR : RSET_GPR);
   right = (left >> 8); left &= 255;
-  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
+  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+    asm_sfpcomp(as, ir);
+#else
     asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
     emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
+#endif
   } else {
     asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
   }
@@ -2269,7 +2407,7 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
     if ((sn & SNAP_NORESTORE))
       continue;
     if (irt_isnum(ir->t)) {
-#if LJ_SOFTFP
+#if LJ_SOFTFP32
       Reg tmp;
       RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
       lua_assert(irref_isk(ref));  /* LJ_SOFTFP: must be a number constant. */
@@ -2278,6 +2416,9 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
       if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
       tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
       emit_tsi(as, MIPSI_SW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
+#elif LJ_SOFTFP  /* && LJ_64 */
+      Reg src = ra_alloc1(as, ref, rset_exclude(RSET_GPR, RID_BASE));
+      emit_tsi(as, MIPSI_SD, src, RID_BASE, ofs);
 #else
       Reg src = ra_alloc1(as, ref, RSET_FPR);
       emit_hsi(as, MIPSI_SDC1, src, RID_BASE, ofs);
diff --git a/src/lj_crecord.c b/src/lj_crecord.c
index ffe995f4..804cdbf4 100644
--- a/src/lj_crecord.c
+++ b/src/lj_crecord.c
@@ -212,7 +212,7 @@ static void crec_copy_emit(jit_State *J, CRecMemList *ml, MSize mlp,
     ml[i].trval = emitir(IRT(IR_XLOAD, ml[i].tp), trsptr, 0);
     ml[i].trofs = trofs;
     i++;
-    rwin += (LJ_SOFTFP && ml[i].tp == IRT_NUM) ? 2 : 1;
+    rwin += (LJ_SOFTFP32 && ml[i].tp == IRT_NUM) ? 2 : 1;
     if (rwin >= CREC_COPY_REGWIN || i >= mlp) {  /* Flush buffered stores. */
       rwin = 0;
       for ( ; j < i; j++) {
@@ -1152,7 +1152,7 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd,
 	else
 	  tr = emitconv(tr, IRT_INT, d->size==1 ? IRT_I8 : IRT_I16,IRCONV_SEXT);
       }
-    } else if (LJ_SOFTFP && ctype_isfp(d->info) && d->size > 4) {
+    } else if (LJ_SOFTFP32 && ctype_isfp(d->info) && d->size > 4) {
       lj_needsplit(J);
     }
 #if LJ_TARGET_X86
diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
index 8a9ee24d..bb6593ae 100644
--- a/src/lj_emit_mips.h
+++ b/src/lj_emit_mips.h
@@ -12,6 +12,8 @@ static intptr_t get_k64val(IRIns *ir)
     return (intptr_t)ir_kgc(ir);
   } else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) {
     return (intptr_t)ir_kptr(ir);
+  } else if (LJ_SOFTFP && ir->o == IR_KNUM) {
+    return (intptr_t)ir_knum(ir)->u64;
   } else {
     lua_assert(ir->o == IR_KINT || ir->o == IR_KNULL);
     return ir->i;  /* Sign-extended. */
diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
index 8af9da1d..0746ec64 100644
--- a/src/lj_ffrecord.c
+++ b/src/lj_ffrecord.c
@@ -986,7 +986,7 @@ static void LJ_FASTCALL recff_string_format(jit_State *J, RecordFFData *rd)
     handle_num:
       tra = lj_ir_tonum(J, tra);
       tr = lj_ir_call(J, id, tr, trsf, tra);
-      if (LJ_SOFTFP) lj_needsplit(J);
+      if (LJ_SOFTFP32) lj_needsplit(J);
       break;
     case STRFMT_STR:
       if (!tref_isstr(tra)) {
diff --git a/src/lj_ircall.h b/src/lj_ircall.h
index aa06b273..c1ac29d1 100644
--- a/src/lj_ircall.h
+++ b/src/lj_ircall.h
@@ -52,7 +52,7 @@ typedef struct CCallInfo {
 #define CCI_XARGS(ci)		(((ci)->flags >> CCI_XARGS_SHIFT) & 3)
 #define CCI_XA			(1u << CCI_XARGS_SHIFT)
 
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
 #define CCI_XNARGS(ci)		(CCI_NARGS((ci)) + CCI_XARGS((ci)))
 #else
 #define CCI_XNARGS(ci)		CCI_NARGS((ci))
@@ -79,13 +79,19 @@ typedef struct CCallInfo {
 #define IRCALLCOND_SOFTFP_FFI(x)	NULL
 #endif
 
-#if LJ_SOFTFP && LJ_TARGET_MIPS32
+#if LJ_SOFTFP && LJ_TARGET_MIPS
 #define IRCALLCOND_SOFTFP_MIPS(x)	x
 #else
 #define IRCALLCOND_SOFTFP_MIPS(x)	NULL
 #endif
 
-#define LJ_NEED_FP64	(LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS32)
+#if LJ_SOFTFP && LJ_TARGET_MIPS64
+#define IRCALLCOND_SOFTFP_MIPS64(x)	x
+#else
+#define IRCALLCOND_SOFTFP_MIPS64(x)	NULL
+#endif
+
+#define LJ_NEED_FP64	(LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS)
 
 #if LJ_HASFFI && (LJ_SOFTFP || LJ_NEED_FP64)
 #define IRCALLCOND_FP64_FFI(x)		x
@@ -113,6 +119,14 @@ typedef struct CCallInfo {
 #define XA2_FP		0
 #endif
 
+#if LJ_SOFTFP32
+#define XA_FP32		CCI_XA
+#define XA2_FP32	(CCI_XA+CCI_XA)
+#else
+#define XA_FP32		0
+#define XA2_FP32	0
+#endif
+
 #if LJ_32
 #define XA_64		CCI_XA
 #define XA2_64		(CCI_XA+CCI_XA)
@@ -185,20 +199,21 @@ typedef struct CCallInfo {
   _(ANY,	pow,			2,   N, NUM, XA2_FP) \
   _(ANY,	atan2,			2,   N, NUM, XA2_FP) \
   _(ANY,	ldexp,			2,   N, NUM, XA_FP) \
-  _(SOFTFP,	lj_vm_tobit,		2,   N, INT, 0) \
-  _(SOFTFP,	softfp_add,		4,   N, NUM, 0) \
-  _(SOFTFP,	softfp_sub,		4,   N, NUM, 0) \
-  _(SOFTFP,	softfp_mul,		4,   N, NUM, 0) \
-  _(SOFTFP,	softfp_div,		4,   N, NUM, 0) \
-  _(SOFTFP,	softfp_cmp,		4,   N, NIL, 0) \
+  _(SOFTFP,	lj_vm_tobit,		1,   N, INT, XA_FP32) \
+  _(SOFTFP,	softfp_add,		2,   N, NUM, XA2_FP32) \
+  _(SOFTFP,	softfp_sub,		2,   N, NUM, XA2_FP32) \
+  _(SOFTFP,	softfp_mul,		2,   N, NUM, XA2_FP32) \
+  _(SOFTFP,	softfp_div,		2,   N, NUM, XA2_FP32) \
+  _(SOFTFP,	softfp_cmp,		2,   N, NIL, XA2_FP32) \
   _(SOFTFP,	softfp_i2d,		1,   N, NUM, 0) \
-  _(SOFTFP,	softfp_d2i,		2,   N, INT, 0) \
-  _(SOFTFP_MIPS, lj_vm_sfmin,		4,   N, NUM, 0) \
-  _(SOFTFP_MIPS, lj_vm_sfmax,		4,   N, NUM, 0) \
+  _(SOFTFP,	softfp_d2i,		1,   N, INT, XA_FP32) \
+  _(SOFTFP_MIPS, lj_vm_sfmin,		2,   N, NUM, XA2_FP32) \
+  _(SOFTFP_MIPS, lj_vm_sfmax,		2,   N, NUM, XA2_FP32) \
+  _(SOFTFP_MIPS64, lj_vm_tointg,	1,   N, INT, 0) \
   _(SOFTFP_FFI,	softfp_ui2d,		1,   N, NUM, 0) \
   _(SOFTFP_FFI,	softfp_f2d,		1,   N, NUM, 0) \
-  _(SOFTFP_FFI,	softfp_d2ui,		2,   N, INT, 0) \
-  _(SOFTFP_FFI,	softfp_d2f,		2,   N, FLOAT, 0) \
+  _(SOFTFP_FFI,	softfp_d2ui,		1,   N, INT, XA_FP32) \
+  _(SOFTFP_FFI,	softfp_d2f,		1,   N, FLOAT, XA_FP32) \
   _(SOFTFP_FFI,	softfp_i2f,		1,   N, FLOAT, 0) \
   _(SOFTFP_FFI,	softfp_ui2f,		1,   N, FLOAT, 0) \
   _(SOFTFP_FFI,	softfp_f2i,		1,   N, INT, 0) \
diff --git a/src/lj_iropt.h b/src/lj_iropt.h
index 73aef0ef..a59ba3f4 100644
--- a/src/lj_iropt.h
+++ b/src/lj_iropt.h
@@ -150,7 +150,7 @@ LJ_FUNC IRType lj_opt_narrow_forl(jit_State *J, cTValue *forbase);
 /* Optimization passes. */
 LJ_FUNC void lj_opt_dce(jit_State *J);
 LJ_FUNC int lj_opt_loop(jit_State *J);
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
 LJ_FUNC void lj_opt_split(jit_State *J);
 #else
 #define lj_opt_split(J)		UNUSED(J)
diff --git a/src/lj_jit.h b/src/lj_jit.h
index cc8efd20..c06829ab 100644
--- a/src/lj_jit.h
+++ b/src/lj_jit.h
@@ -375,7 +375,7 @@ enum {
   ((TValue *)(((intptr_t)&J->ksimd[2*(n)] + 15) & ~(intptr_t)15))
 
 /* Set/reset flag to activate the SPLIT pass for the current trace. */
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
 #define lj_needsplit(J)		(J->needsplit = 1)
 #define lj_resetsplit(J)	(J->needsplit = 0)
 #else
@@ -438,7 +438,7 @@ typedef struct jit_State {
   MSize sizesnapmap;	/* Size of temp. snapshot map buffer. */
 
   PostProc postproc;	/* Required post-processing after execution. */
-#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
+#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
   uint8_t needsplit;	/* Need SPLIT pass. */
 #endif
   uint8_t retryrec;	/* Retry recording. */
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 45507e0d..bf95e1eb 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -984,6 +984,9 @@ static LJ_AINLINE void copyTV(lua_State *L, TValue *o1, const TValue *o2)
 
 #if LJ_SOFTFP
 LJ_ASMF int32_t lj_vm_tobit(double x);
+#if LJ_TARGET_MIPS64
+LJ_ASMF int32_t lj_vm_tointg(double x);
+#endif
 #endif
 
 static LJ_AINLINE int32_t lj_num2bit(lua_Number n)
diff --git a/src/lj_opt_split.c b/src/lj_opt_split.c
index c0788106..2fc36b8d 100644
--- a/src/lj_opt_split.c
+++ b/src/lj_opt_split.c
@@ -8,7 +8,7 @@
 
 #include "lj_obj.h"
 
-#if LJ_HASJIT && (LJ_SOFTFP || (LJ_32 && LJ_HASFFI))
+#if LJ_HASJIT && (LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI))
 
 #include "lj_err.h"
 #include "lj_buf.h"
diff --git a/src/lj_snap.c b/src/lj_snap.c
index a063c316..9146cddc 100644
--- a/src/lj_snap.c
+++ b/src/lj_snap.c
@@ -93,7 +93,7 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
 	    (ir->op2 & (IRSLOAD_READONLY|IRSLOAD_PARENT)) != IRSLOAD_PARENT)
 	  sn |= SNAP_NORESTORE;
       }
-      if (LJ_SOFTFP && irt_isnum(ir->t))
+      if (LJ_SOFTFP32 && irt_isnum(ir->t))
 	sn |= SNAP_SOFTFPNUM;
       map[n++] = sn;
     }
@@ -379,7 +379,7 @@ IRIns *lj_snap_regspmap(GCtrace *T, SnapNo snapno, IRIns *ir)
 	  break;
 	}
       }
-    } else if (LJ_SOFTFP && ir->o == IR_HIOP) {
+    } else if (LJ_SOFTFP32 && ir->o == IR_HIOP) {
       ref++;
     } else if (ir->o == IR_PVAL) {
       ref = ir->op1 + REF_BIAS;
@@ -491,7 +491,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
     } else {
       IRType t = irt_type(ir->t);
       uint32_t mode = IRSLOAD_INHERIT|IRSLOAD_PARENT;
-      if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
+      if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
       if (ir->o == IR_SLOAD) mode |= (ir->op2 & IRSLOAD_READONLY);
       tr = emitir_raw(IRT(IR_SLOAD, t), s, mode);
     }
@@ -525,7 +525,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
 	    if (irs->r == RID_SINK && snap_sunk_store(T, ir, irs)) {
 	      if (snap_pref(J, T, map, nent, seen, irs->op2) == 0)
 		snap_pref(J, T, map, nent, seen, T->ir[irs->op2].op1);
-	      else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
+	      else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
 		       irs+1 < irlast && (irs+1)->o == IR_HIOP)
 		snap_pref(J, T, map, nent, seen, (irs+1)->op2);
 	    }
@@ -584,10 +584,10 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
 		lua_assert(irc->o == IR_CONV && irc->op2 == IRCONV_NUM_INT);
 		val = snap_pref(J, T, map, nent, seen, irc->op1);
 		val = emitir(IRTN(IR_CONV), val, IRCONV_NUM_INT);
-	      } else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
+	      } else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
 			 irs+1 < irlast && (irs+1)->o == IR_HIOP) {
 		IRType t = IRT_I64;
-		if (LJ_SOFTFP && irt_type((irs+1)->t) == IRT_SOFTFP)
+		if (LJ_SOFTFP32 && irt_type((irs+1)->t) == IRT_SOFTFP)
 		  t = IRT_NUM;
 		lj_needsplit(J);
 		if (irref_isk(irs->op2) && irref_isk((irs+1)->op2)) {
@@ -645,7 +645,7 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
     int32_t *sps = &ex->spill[regsp_spill(rs)];
     if (irt_isinteger(t)) {
       setintV(o, *sps);
-#if !LJ_SOFTFP
+#if !LJ_SOFTFP32
     } else if (irt_isnum(t)) {
       o->u64 = *(uint64_t *)sps;
 #endif
@@ -670,6 +670,9 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
 #if !LJ_SOFTFP
     } else if (irt_isnum(t)) {
       setnumV(o, ex->fpr[r-RID_MIN_FPR]);
+#elif LJ_64  /* && LJ_SOFTFP */
+    } else if (irt_isnum(t)) {
+      o->u64 = ex->gpr[r-RID_MIN_GPR];
 #endif
 #if LJ_64 && !LJ_GC64
     } else if (irt_is64(t)) {
@@ -823,7 +826,7 @@ static void snap_unsink(jit_State *J, GCtrace *T, ExitState *ex,
 	  val = lj_tab_set(J->L, t, &tmp);
 	  /* NOBARRIER: The table is new (marked white). */
 	  snap_restoreval(J, T, ex, snapno, rfilt, irs->op2, val);
-	  if (LJ_SOFTFP && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
+	  if (LJ_SOFTFP32 && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
 	    snap_restoreval(J, T, ex, snapno, rfilt, (irs+1)->op2, &tmp);
 	    val->u32.hi = tmp.u32.lo;
 	  }
@@ -884,7 +887,7 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
 	continue;
       }
       snap_restoreval(J, T, ex, snapno, rfilt, ref, o);
-      if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
+      if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
 	TValue tmp;
 	snap_restoreval(J, T, ex, snapno, rfilt, ref+1, &tmp);
 	o->u32.hi = tmp.u32.lo;
diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
index 04be38f0..9839b5ac 100644
--- a/src/vm_mips64.dasc
+++ b/src/vm_mips64.dasc
@@ -1984,6 +1984,38 @@ static void build_subroutines(BuildCtx *ctx)
   |1:
   |  jr ra
   |.  move CRET1, r0
+  |
+  |// FP number to int conversion with a check for soft-float.
+  |// Modifies CARG1, CRET1, CRET2, TMP0, AT.
+  |->vm_tointg:
+  |.if JIT
+  |  dsll CRET2, CARG1, 1
+  |  beqz CRET2, >2
+  |.  li TMP0, 1076
+  |  dsrl AT, CRET2, 53
+  |  dsubu TMP0, TMP0, AT
+  |  sltiu AT, TMP0, 54
+  |  beqz AT, >1
+  |.  dextm CRET2, CRET2, 0, 20
+  |  dinsu CRET2, AT, 21, 21
+  |  slt AT, CARG1, r0
+  |  dsrlv CRET1, CRET2, TMP0
+  |  dsubu CARG1, r0, CRET1
+  |  movn CRET1, CARG1, AT
+  |  li CARG1, 64
+  |  subu TMP0, CARG1, TMP0
+  |  dsllv CRET2, CRET2, TMP0	// Integer check.
+  |  sextw AT, CRET1
+  |  xor AT, CRET1, AT		// Range check.
+  |  jr ra
+  |.  movz CRET2, AT, CRET2
+  |1:
+  |  jr ra
+  |.  li CRET2, 1
+  |2:
+  |  jr ra
+  |.  move CRET1, r0
+  |.endif
   |.endif
   |
   |.macro .ffunc_bit, name
@@ -2669,6 +2701,23 @@ static void build_subroutines(BuildCtx *ctx)
   |.  li CRET1, 0
   |.endif
   |
+  |.macro sfmin_max, name, intins
+  |->vm_sf .. name:
+  |.if JIT and not FPU
+  |  move TMP2, ra
+  |  bal ->vm_sfcmpolt
+  |.  nop
+  |  move ra, TMP2
+  |  move TMP0, CRET1
+  |  move CRET1, CARG1
+  |  jr ra
+  |.  intins CRET1, CARG2, TMP0
+  |.endif
+  |.endmacro
+  |
+  |  sfmin_max min, movz
+  |  sfmin_max max, movn
+  |
   |//-----------------------------------------------------------------------
   |//-- Miscellaneous functions --------------------------------------------
   |//-----------------------------------------------------------------------
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (3 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 11:40   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 14:53   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
Sponsored by Cisco Systems, Inc.

(cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)

The software floating point library is used on machines which do not
have hardware support for floating point [1]. This patch enables
support for such machines in the VM for PowerPC.
This includes:
* Any loads/stores of double values go through 32-bit registers, using
  the `lo` and `hi` parts of the TValue union.
* The `.FPU` macro is added to skip instructions necessary only for
  hard-float operations (for example, saving/restoring floating point
  registers on the stack when entering/leaving the VM).
* r25, named `SAVE1`, is now used as a saved temporary register (used
  in different fast functions).
* The `sfi2d` macro is introduced to convert an integer that represents
  a soft-float to a double. It receives destination and source
  registers and uses `TMP0` and `TMP1`.
* The `sfpmod` macro is introduced for the soft-float `fmod` built-in.
* `ins_arith` now receives a third parameter -- the operation to use
  for soft-float.
* The `LJ_ARCH_HASFPU` and `LJ_ABI_SOFTFP` macros are introduced to
  reflect whether `_SOFT_FLOAT` or `_SOFT_DOUBLE` is defined.
  `LJ_ARCH_NUMMODE` is set to `LJ_NUMMODE_DUAL` when `LJ_ABI_SOFTFP`
  is true.

Soft-float support for the JIT compiler will be added in the next
patch.
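
For illustration only, here is a minimal C sketch of two ideas from the
list above: moving a double as two 32-bit words instead of through an
FPR, and aligning a 64-bit argument to an even GPR pair, as the
soft-float branch of the FFI call handling does. `SketchSlot`,
`slot_copy_words`, `slot_copy_fpr`, `gpr_alloc` and `SKETCH_SOFTFP` are
hypothetical names, not part of LuaJIT. The slot is a simplified
stand-in for TValue, and the allocator covers only the int64_t/double
case (the complex case of the real macro is left out).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the build-time check added to lj_arch.h. */
#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
#define SKETCH_SOFTFP 1
#else
#define SKETCH_SOFTFP 0
#endif

/* Simplified 8-byte stack slot; the real TValue union has more members. */
typedef union SketchSlot {
  double n;
  uint32_t w[2];  /* Which word is hi/lo depends on endianness. */
  uint64_t u64;
} SketchSlot;

/* Soft-float style copy: two 32-bit loads, then two 32-bit stores. */
static void slot_copy_words(SketchSlot *dst, const SketchSlot *src)
{
  uint32_t w0 = src->w[0];  /* Like: lwz CARG1, 0(RA) */
  uint32_t w1 = src->w[1];  /* Like: lwz CARG2, 4(RA) */
  dst->w[0] = w0;           /* Like: stw CARG1, 0(BASE) */
  dst->w[1] = w1;           /* Like: stw CARG2, 4(BASE) */
}

/* Hard-float style copy: one 64-bit move through an FPR. */
static void slot_copy_fpr(SketchSlot *dst, const SketchSlot *src)
{
  dst->n = src->n;          /* Like: lfd f0, 0(RA); stfd f0, 0(BASE) */
}

/*
** Allocate n consecutive GPRs (n == 1, or n == 2 for int64_t/double).
** Returns the index of the first register, or -1 if the argument does
** not fit and would have to go to the stack.
*/
static int gpr_alloc(unsigned *ngpr, unsigned n, unsigned maxgpr)
{
  if (n > 1)
    *ngpr = (*ngpr + 1u) & ~1u;  /* Align to an even register pair. */
  if (*ngpr + n <= maxgpr) {
    int first = (int)*ngpr;
    *ngpr += n;
    return first;
  }
  return -1;
}
```

Both copy functions leave the same bit pattern in the destination slot;
the word-wise variant simply never touches a floating point register,
which is why it can be used when no FPU is available.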

[1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html

Sergey Kaplun:
* added the description for the feature

Part of tarantool/tarantool#8825
---
 src/host/buildvm_asm.c |    2 +-
 src/lj_arch.h          |   29 +-
 src/lj_ccall.c         |   38 +-
 src/lj_ccall.h         |    4 +-
 src/lj_ccallback.c     |   30 +-
 src/lj_frame.h         |    2 +-
 src/lj_ircall.h        |    2 +-
 src/vm_ppc.dasc        | 1249 +++++++++++++++++++++++++++++++++-------
 8 files changed, 1101 insertions(+), 255 deletions(-)

diff --git a/src/host/buildvm_asm.c b/src/host/buildvm_asm.c
index ffd14903..43595b31 100644
--- a/src/host/buildvm_asm.c
+++ b/src/host/buildvm_asm.c
@@ -338,7 +338,7 @@ void emit_asm(BuildCtx *ctx)
 #if !(LJ_TARGET_PS3 || LJ_TARGET_PSVITA)
     fprintf(ctx->fp, "\t.section .note.GNU-stack,\"\"," ELFASM_PX "progbits\n");
 #endif
-#if LJ_TARGET_PPC && !LJ_TARGET_PS3
+#if LJ_TARGET_PPC && !LJ_TARGET_PS3 && !LJ_ABI_SOFTFP
     /* Hard-float ABI. */
     fprintf(ctx->fp, "\t.gnu_attribute 4, 1\n");
 #endif
diff --git a/src/lj_arch.h b/src/lj_arch.h
index c39526ea..8bb8757d 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -262,6 +262,29 @@
 #else
 #define LJ_ARCH_BITS		32
 #define LJ_ARCH_NAME		"ppc"
+
+#if !defined(LJ_ARCH_HASFPU)
+#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
+#define LJ_ARCH_HASFPU		0
+#else
+#define LJ_ARCH_HASFPU		1
+#endif
+#endif
+
+#if !defined(LJ_ABI_SOFTFP)
+#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
+#define LJ_ABI_SOFTFP		1
+#else
+#define LJ_ABI_SOFTFP		0
+#endif
+#endif
+#endif
+
+#if LJ_ABI_SOFTFP
+#define LJ_ARCH_NOJIT		1  /* NYI */
+#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
+#else
+#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
 #endif
 
 #define LJ_TARGET_PPC		1
@@ -271,7 +294,6 @@
 #define LJ_TARGET_MASKSHIFT	0
 #define LJ_TARGET_MASKROT	1
 #define LJ_TARGET_UNIFYROT	1	/* Want only IR_BROL. */
-#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
 
 #if LJ_TARGET_CONSOLE
 #define LJ_ARCH_PPC32ON64	1
@@ -431,16 +453,13 @@
 #error "No support for ILP32 model on ARM64"
 #endif
 #elif LJ_TARGET_PPC
-#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
-#error "No support for PowerPC CPUs without double-precision FPU"
-#endif
 #if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
 #error "No support for little-endian PPC32"
 #endif
 #if LJ_ARCH_PPC64
 #error "No support for PowerPC 64 bit mode (yet)"
 #endif
-#ifdef __NO_FPRS__
+#if defined(__NO_FPRS__) && !defined(_SOFT_FLOAT)
 #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
 #endif
 #elif LJ_TARGET_MIPS32
diff --git a/src/lj_ccall.c b/src/lj_ccall.c
index d39ff861..c1e12f56 100644
--- a/src/lj_ccall.c
+++ b/src/lj_ccall.c
@@ -388,6 +388,24 @@
 #define CCALL_HANDLE_COMPLEXARG \
   /* Pass complex by value in 2 or 4 GPRs. */
 
+#define CCALL_HANDLE_GPR \
+  /* Try to pass argument in GPRs. */ \
+  if (n > 1) { \
+    lua_assert(n == 2 || n == 4);  /* int64_t or complex (float). */ \
+    if (ctype_isinteger(d->info) || ctype_isfp(d->info)) \
+      ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
+    else if (ngpr + n > maxgpr) \
+      ngpr = maxgpr;  /* Prevent reordering. */ \
+  } \
+  if (ngpr + n <= maxgpr) { \
+    dp = &cc->gpr[ngpr]; \
+    ngpr += n; \
+    goto done; \
+  } \
+
+#if LJ_ABI_SOFTFP
+#define CCALL_HANDLE_REGARG  CCALL_HANDLE_GPR
+#else
 #define CCALL_HANDLE_REGARG \
   if (isfp) {  /* Try to pass argument in FPRs. */ \
     if (nfpr + 1 <= CCALL_NARG_FPR) { \
@@ -396,24 +414,16 @@
       d = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */ \
       goto done; \
     } \
-  } else {  /* Try to pass argument in GPRs. */ \
-    if (n > 1) { \
-      lua_assert(n == 2 || n == 4);  /* int64_t or complex (float). */ \
-      if (ctype_isinteger(d->info)) \
-	ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
-      else if (ngpr + n > maxgpr) \
-	ngpr = maxgpr;  /* Prevent reordering. */ \
-    } \
-    if (ngpr + n <= maxgpr) { \
-      dp = &cc->gpr[ngpr]; \
-      ngpr += n; \
-      goto done; \
-    } \
+  } else { \
+    CCALL_HANDLE_GPR \
   }
+#endif
 
+#if !LJ_ABI_SOFTFP
 #define CCALL_HANDLE_RET \
   if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
     ctr = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */
+#endif
 
 #elif LJ_TARGET_MIPS32
 /* -- MIPS o32 calling conventions ---------------------------------------- */
@@ -1081,7 +1091,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
   }
   if (fid) lj_err_caller(L, LJ_ERR_FFI_NUMARG);  /* Too few arguments. */
 
-#if LJ_TARGET_X64 || LJ_TARGET_PPC
+#if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP)
   cc->nfpr = nfpr;  /* Required for vararg functions. */
 #endif
   cc->nsp = nsp;
diff --git a/src/lj_ccall.h b/src/lj_ccall.h
index 59f66481..6efa48c7 100644
--- a/src/lj_ccall.h
+++ b/src/lj_ccall.h
@@ -86,9 +86,9 @@ typedef union FPRArg {
 #elif LJ_TARGET_PPC
 
 #define CCALL_NARG_GPR		8
-#define CCALL_NARG_FPR		8
+#define CCALL_NARG_FPR		(LJ_ABI_SOFTFP ? 0 : 8)
 #define CCALL_NRET_GPR		4	/* For complex double. */
-#define CCALL_NRET_FPR		1
+#define CCALL_NRET_FPR		(LJ_ABI_SOFTFP ? 0 : 1)
 #define CCALL_SPS_EXTRA		4
 #define CCALL_SPS_FREE		0
 
diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
index 224b6b94..c33190d7 100644
--- a/src/lj_ccallback.c
+++ b/src/lj_ccallback.c
@@ -419,6 +419,23 @@ void lj_ccallback_mcode_free(CTState *cts)
 
 #elif LJ_TARGET_PPC
 
+#define CALLBACK_HANDLE_GPR \
+  if (n > 1) { \
+    lua_assert(((LJ_ABI_SOFTFP && ctype_isnum(cta->info)) ||  /* double. */ \
+		ctype_isinteger(cta->info)) && n == 2);  /* int64_t. */ \
+    ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
+  } \
+  if (ngpr + n <= maxgpr) { \
+    sp = &cts->cb.gpr[ngpr]; \
+    ngpr += n; \
+    goto done; \
+  }
+
+#if LJ_ABI_SOFTFP
+#define CALLBACK_HANDLE_REGARG \
+  CALLBACK_HANDLE_GPR \
+  UNUSED(isfp);
+#else
 #define CALLBACK_HANDLE_REGARG \
   if (isfp) { \
     if (nfpr + 1 <= CCALL_NARG_FPR) { \
@@ -427,20 +444,15 @@ void lj_ccallback_mcode_free(CTState *cts)
       goto done; \
     } \
   } else {  /* Try to pass argument in GPRs. */ \
-    if (n > 1) { \
-      lua_assert(ctype_isinteger(cta->info) && n == 2);  /* int64_t. */ \
-      ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
-    } \
-    if (ngpr + n <= maxgpr) { \
-      sp = &cts->cb.gpr[ngpr]; \
-      ngpr += n; \
-      goto done; \
-    } \
+    CALLBACK_HANDLE_GPR \
   }
+#endif
 
+#if !LJ_ABI_SOFTFP
 #define CALLBACK_HANDLE_RET \
   if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
     *(double *)dp = *(float *)dp;  /* FPRs always hold doubles. */
+#endif
 
 #elif LJ_TARGET_MIPS32
 
diff --git a/src/lj_frame.h b/src/lj_frame.h
index 2bdf3c48..5cb3d639 100644
--- a/src/lj_frame.h
+++ b/src/lj_frame.h
@@ -226,7 +226,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_OFS_L		36
 #define CFRAME_OFS_PC		32
 #define CFRAME_OFS_MULTRES	28
-#define CFRAME_SIZE		272
+#define CFRAME_SIZE		(LJ_ARCH_HASFPU ? 272 : 128)
 #define CFRAME_SHIFT_MULTRES	3
 #endif
 #elif LJ_TARGET_MIPS32
diff --git a/src/lj_ircall.h b/src/lj_ircall.h
index c1ac29d1..bbad35b1 100644
--- a/src/lj_ircall.h
+++ b/src/lj_ircall.h
@@ -291,7 +291,7 @@ LJ_DATA const CCallInfo lj_ir_callinfo[IRCALL__MAX+1];
 #define fp64_f2l __aeabi_f2lz
 #define fp64_f2ul __aeabi_f2ulz
 #endif
-#elif LJ_TARGET_MIPS
+#elif LJ_TARGET_MIPS || LJ_TARGET_PPC
 #define softfp_add __adddf3
 #define softfp_sub __subdf3
 #define softfp_mul __muldf3
diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
index 7ad8df37..980ad897 100644
--- a/src/vm_ppc.dasc
+++ b/src/vm_ppc.dasc
@@ -103,6 +103,18 @@
 |// Fixed register assignments for the interpreter.
 |// Don't use: r1 = sp, r2 and r13 = reserved (TOC, TLS or SDATA)
 |
+|.macro .FPU, a, b
+|.if FPU
+|  a, b
+|.endif
+|.endmacro
+|
+|.macro .FPU, a, b, c
+|.if FPU
+|  a, b, c
+|.endif
+|.endmacro
+|
 |// The following must be C callee-save (but BASE is often refetched).
 |.define BASE,		r14	// Base of current Lua stack frame.
 |.define KBASE,		r15	// Constants of current Lua function.
@@ -116,8 +128,10 @@
 |.define TISNUM,	r22
 |.define TISNIL,	r23
 |.define ZERO,		r24
+|.if FPU
 |.define TOBIT,		f30	// 2^52 + 2^51.
 |.define TONUM,		f31	// 2^52 + 2^51 + 2^31.
+|.endif
 |
 |// The following temporaries are not saved across C calls, except for RA.
 |.define RA,		r20	// Callee-save.
@@ -133,6 +147,7 @@
 |
 |// Saved temporaries.
 |.define SAVE0,		r21
+|.define SAVE1,		r25
 |
 |// Calling conventions.
 |.define CARG1,		r3
@@ -141,8 +156,10 @@
 |.define CARG4,		r6	// Overlaps TMP3.
 |.define CARG5,		r7	// Overlaps INS.
 |
+|.if FPU
 |.define FARG1,		f1
 |.define FARG2,		f2
+|.endif
 |
 |.define CRET1,		r3
 |.define CRET2,		r4
@@ -213,10 +230,16 @@
 |.endif
 |.else
 |
+|.if FPU
 |.define SAVE_LR,	276(sp)
 |.define CFRAME_SPACE,	272     // Delta for sp.
 |// Back chain for sp:	272(sp) <-- sp entering interpreter
 |.define SAVE_FPR_,	128     // .. 128+18*8: 64 bit FPR saves.
+|.else
+|.define SAVE_LR,	132(sp)
+|.define CFRAME_SPACE,	128     // Delta for sp.
+|// Back chain for sp:	128(sp) <-- sp entering interpreter
+|.endif
 |.define SAVE_GPR_,	56      // .. 56+18*4: 32 bit GPR saves.
 |.define SAVE_CR,	52(sp)  // 32 bit CR save.
 |.define SAVE_ERRF,	48(sp)  // 32 bit C frame info.
@@ -226,16 +249,25 @@
 |.define SAVE_PC,	32(sp)
 |.define SAVE_MULTRES,	28(sp)
 |.define UNUSED1,	24(sp)
+|.if FPU
 |.define TMPD_LO,	20(sp)
 |.define TMPD_HI,	16(sp)
 |.define TONUM_LO,	12(sp)
 |.define TONUM_HI,	8(sp)
+|.else
+|.define SFSAVE_4,	20(sp)
+|.define SFSAVE_3,	16(sp)
+|.define SFSAVE_2,	12(sp)
+|.define SFSAVE_1,	8(sp)
+|.endif
 |// Next frame lr:	4(sp)
 |// Back chain for sp:	0(sp)	<-- sp while in interpreter
 |
+|.if FPU
 |.define TMPD_BLO,	23(sp)
 |.define TMPD,		TMPD_HI
 |.define TONUM_D,	TONUM_HI
+|.endif
 |
 |.endif
 |
@@ -245,7 +277,7 @@
 |.else
 |  stw r..reg, SAVE_GPR_+(reg-14)*4(sp)
 |.endif
-|  stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
+|  .FPU stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
 |.endmacro
 |.macro rest_, reg
 |.if GPR64
@@ -253,7 +285,7 @@
 |.else
 |  lwz r..reg, SAVE_GPR_+(reg-14)*4(sp)
 |.endif
-|  lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
+|  .FPU lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
 |.endmacro
 |
 |.macro saveregs
@@ -323,6 +355,7 @@
 |// Trap for not-yet-implemented parts.
 |.macro NYI; tw 4, sp, sp; .endmacro
 |
+|.if FPU
 |// int/FP conversions.
 |.macro tonum_i, freg, reg
 |  xoris reg, reg, 0x8000
@@ -346,6 +379,7 @@
 |.macro toint, reg, freg
 |  toint reg, freg, freg
 |.endmacro
+|.endif
 |
 |//-----------------------------------------------------------------------
 |
@@ -533,9 +567,19 @@ static void build_subroutines(BuildCtx *ctx)
   |  beq >2
   |1:
   |  addic. TMP1, TMP1, -8
+  |.if FPU
   |   lfd f0, 0(RA)
+  |.else
+  |   lwz CARG1, 0(RA)
+  |   lwz CARG2, 4(RA)
+  |.endif
   |    addi RA, RA, 8
+  |.if FPU
   |   stfd f0, 0(BASE)
+  |.else
+  |   stw CARG1, 0(BASE)
+  |   stw CARG2, 4(BASE)
+  |.endif
   |    addi BASE, BASE, 8
   |  bney <1
   |
@@ -613,23 +657,23 @@ static void build_subroutines(BuildCtx *ctx)
   |  .toc ld TOCREG, SAVE_TOC
   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
   |  lp BASE, L->base
-  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
+  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
   |   lwz DISPATCH, L->glref		// Setup pointer to dispatch table.
   |     li ZERO, 0
-  |     stw TMP3, TMPD
+  |     .FPU stw TMP3, TMPD
   |  li TMP1, LJ_TFALSE
-  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
+  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
   |     li TISNIL, LJ_TNIL
   |    li_vmstate INTERP
-  |     lfs TOBIT, TMPD
+  |     .FPU lfs TOBIT, TMPD
   |  lwz PC, FRAME_PC(BASE)		// Fetch PC of previous frame.
   |  la RA, -8(BASE)			// Results start at BASE-8.
-  |     stw TMP3, TMPD
+  |     .FPU stw TMP3, TMPD
   |   addi DISPATCH, DISPATCH, GG_G2DISP
   |  stw TMP1, 0(RA)			// Prepend false to error message.
   |  li RD, 16				// 2 results: false + error message.
   |    st_vmstate
-  |     lfs TONUM, TMPD
+  |     .FPU lfs TONUM, TMPD
   |  b ->vm_returnc
   |
   |//-----------------------------------------------------------------------
@@ -690,22 +734,22 @@ static void build_subroutines(BuildCtx *ctx)
   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
   |   lp TMP1, L->top
   |  lwz PC, FRAME_PC(BASE)
-  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
+  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
   |    stb CARG3, L->status
-  |     stw TMP3, TMPD
-  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
-  |     lfs TOBIT, TMPD
+  |     .FPU stw TMP3, TMPD
+  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
+  |     .FPU lfs TOBIT, TMPD
   |   sub RD, TMP1, BASE
-  |     stw TMP3, TMPD
-  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
+  |     .FPU stw TMP3, TMPD
+  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
   |   addi RD, RD, 8
-  |     stw TMP0, TONUM_HI
+  |     .FPU stw TMP0, TONUM_HI
   |    li_vmstate INTERP
   |     li ZERO, 0
   |    st_vmstate
   |  andix. TMP0, PC, FRAME_TYPE
   |   mr MULTRES, RD
-  |     lfs TONUM, TMPD
+  |     .FPU lfs TONUM, TMPD
   |     li TISNIL, LJ_TNIL
   |  beq ->BC_RET_Z
   |  b ->vm_return
@@ -739,19 +783,19 @@ static void build_subroutines(BuildCtx *ctx)
   |  lp TMP2, L->base			// TMP2 = old base (used in vmeta_call).
   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
   |   lp TMP1, L->top
-  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
+  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
   |  add PC, PC, BASE
-  |     stw TMP3, TMPD
+  |     .FPU stw TMP3, TMPD
   |     li ZERO, 0
-  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
-  |     lfs TOBIT, TMPD
+  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
+  |     .FPU lfs TOBIT, TMPD
   |  sub PC, PC, TMP2			// PC = frame delta + frame type
-  |     stw TMP3, TMPD
-  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
+  |     .FPU stw TMP3, TMPD
+  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
   |   sub NARGS8:RC, TMP1, BASE
-  |     stw TMP0, TONUM_HI
+  |     .FPU stw TMP0, TONUM_HI
   |    li_vmstate INTERP
-  |     lfs TONUM, TMPD
+  |     .FPU lfs TONUM, TMPD
   |     li TISNIL, LJ_TNIL
   |    st_vmstate
   |
@@ -839,15 +883,30 @@ static void build_subroutines(BuildCtx *ctx)
   |  lwz INS, -4(PC)
   |   subi CARG2, RB, 16
   |  decode_RB8 SAVE0, INS
+  |.if FPU
   |   lfd f0, 0(RA)
+  |.else
+  |   lwz TMP2, 0(RA)
+  |   lwz TMP3, 4(RA)
+  |.endif
   |  add TMP1, BASE, SAVE0
   |   stp BASE, L->base
   |  cmplw TMP1, CARG2
   |   sub CARG3, CARG2, TMP1
   |  decode_RA8 RA, INS
+  |.if FPU
   |   stfd f0, 0(CARG2)
+  |.else
+  |   stw TMP2, 0(CARG2)
+  |   stw TMP3, 4(CARG2)
+  |.endif
   |  bney ->BC_CAT_Z
+  |.if FPU
   |   stfdx f0, BASE, RA
+  |.else
+  |   stwux TMP2, RA, BASE
+  |   stw TMP3, 4(RA)
+  |.endif
   |  b ->cont_nop
   |
   |//-- Table indexing metamethods -----------------------------------------
@@ -900,9 +959,19 @@ static void build_subroutines(BuildCtx *ctx)
   |  // Returns TValue * (finished) or NULL (metamethod).
   |  cmplwi CRET1, 0
   |  beq >3
+  |.if FPU
   |   lfd f0, 0(CRET1)
+  |.else
+  |   lwz TMP0, 0(CRET1)
+  |   lwz TMP1, 4(CRET1)
+  |.endif
   |  ins_next1
+  |.if FPU
   |   stfdx f0, BASE, RA
+  |.else
+  |   stwux TMP0, RA, BASE
+  |   stw TMP1, 4(RA)
+  |.endif
   |  ins_next2
   |
   |3:  // Call __index metamethod.
@@ -920,7 +989,12 @@ static void build_subroutines(BuildCtx *ctx)
   |  // Returns cTValue * or NULL.
   |  cmplwi CRET1, 0
   |  beq >1
+  |.if FPU
   |  lfd f14, 0(CRET1)
+  |.else
+  |  lwz SAVE0, 0(CRET1)
+  |  lwz SAVE1, 4(CRET1)
+  |.endif
   |  b ->BC_TGETR_Z
   |1:
   |  stwx TISNIL, BASE, RA
@@ -975,11 +1049,21 @@ static void build_subroutines(BuildCtx *ctx)
   |  bl extern lj_meta_tset		// (lua_State *L, TValue *o, TValue *k)
   |  // Returns TValue * (finished) or NULL (metamethod).
   |  cmplwi CRET1, 0
+  |.if FPU
   |   lfdx f0, BASE, RA
+  |.else
+  |   lwzux TMP2, RA, BASE
+  |   lwz TMP3, 4(RA)
+  |.endif
   |  beq >3
   |  // NOBARRIER: lj_meta_tset ensures the table is not black.
   |  ins_next1
+  |.if FPU
   |   stfd f0, 0(CRET1)
+  |.else
+  |   stw TMP2, 0(CRET1)
+  |   stw TMP3, 4(CRET1)
+  |.endif
   |  ins_next2
   |
   |3:  // Call __newindex metamethod.
@@ -990,7 +1074,12 @@ static void build_subroutines(BuildCtx *ctx)
   |   add PC, TMP1, BASE
   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
   |   li NARGS8:RC, 24			// 3 args for func(t, k, v)
+  |.if FPU
   |  stfd f0, 16(BASE)			// Copy value to third argument.
+  |.else
+  |  stw TMP2, 16(BASE)
+  |  stw TMP3, 20(BASE)
+  |.endif
   |  b ->vm_call_dispatch_f
   |
   |->vmeta_tsetr:
@@ -999,7 +1088,12 @@ static void build_subroutines(BuildCtx *ctx)
   |  stw PC, SAVE_PC
   |  bl extern lj_tab_setinth  // (lua_State *L, GCtab *t, int32_t key)
   |  // Returns TValue *.
+  |.if FPU
   |  stfd f14, 0(CRET1)
+  |.else
+  |  stw SAVE0, 0(CRET1)
+  |  stw SAVE1, 4(CRET1)
+  |.endif
   |  b ->cont_nop
   |
   |//-- Comparison metamethods ---------------------------------------------
@@ -1038,9 +1132,19 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |->cont_ra:				// RA = resultptr
   |  lwz INS, -4(PC)
+  |.if FPU
   |   lfd f0, 0(RA)
+  |.else
+  |   lwz CARG1, 0(RA)
+  |   lwz CARG2, 4(RA)
+  |.endif
   |  decode_RA8 TMP1, INS
+  |.if FPU
   |   stfdx f0, BASE, TMP1
+  |.else
+  |   stwux CARG1, TMP1, BASE
+  |   stw CARG2, 4(TMP1)
+  |.endif
   |  b ->cont_nop
   |
   |->cont_condt:			// RA = resultptr
@@ -1246,22 +1350,32 @@ static void build_subroutines(BuildCtx *ctx)
   |.macro .ffunc_n, name
   |->ff_ .. name:
   |  cmplwi NARGS8:RC, 8
-  |   lwz CARG3, 0(BASE)
+  |   lwz CARG1, 0(BASE)
+  |.if FPU
   |    lfd FARG1, 0(BASE)
+  |.else
+  |    lwz CARG2, 4(BASE)
+  |.endif
   |  blt ->fff_fallback
-  |  checknum CARG3; bge ->fff_fallback
+  |  checknum CARG1; bge ->fff_fallback
   |.endmacro
   |
   |.macro .ffunc_nn, name
   |->ff_ .. name:
   |  cmplwi NARGS8:RC, 16
-  |   lwz CARG3, 0(BASE)
+  |   lwz CARG1, 0(BASE)
+  |.if FPU
   |    lfd FARG1, 0(BASE)
-  |   lwz CARG4, 8(BASE)
+  |   lwz CARG3, 8(BASE)
   |    lfd FARG2, 8(BASE)
+  |.else
+  |    lwz CARG2, 4(BASE)
+  |   lwz CARG3, 8(BASE)
+  |    lwz CARG4, 12(BASE)
+  |.endif
   |  blt ->fff_fallback
+  |  checknum CARG1; bge ->fff_fallback
   |  checknum CARG3; bge ->fff_fallback
-  |  checknum CARG4; bge ->fff_fallback
   |.endmacro
   |
   |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1.
@@ -1282,14 +1396,21 @@ static void build_subroutines(BuildCtx *ctx)
   |  bge cr1, ->fff_fallback
   |   stw CARG3, 0(RA)
   |  addi RD, NARGS8:RC, 8		// Compute (nresults+1)*8.
+  |  addi TMP1, BASE, 8
+  |  add TMP2, RA, NARGS8:RC
   |   stw CARG1, 4(RA)
   |  beq ->fff_res			// Done if exactly 1 argument.
-  |  li TMP1, 8
-  |  subi RC, RC, 8
   |1:
-  |  cmplw TMP1, RC
-  |   lfdx f0, BASE, TMP1
-  |   stfdx f0, RA, TMP1
+  |  cmplw TMP1, TMP2
+  |.if FPU
+  |   lfd f0, 0(TMP1)
+  |   stfd f0, 0(TMP1)
+  |.else
+  |   lwz CARG1, 0(TMP1)
+  |   lwz CARG2, 4(TMP1)
+  |   stw CARG1, -8(TMP1)
+  |   stw CARG2, -4(TMP1)
+  |.endif
   |    addi TMP1, TMP1, 8
   |  bney <1
   |  b ->fff_res
@@ -1304,8 +1425,14 @@ static void build_subroutines(BuildCtx *ctx)
   |  orc TMP1, TMP2, TMP0
   |  addi TMP1, TMP1, ~LJ_TISNUM+1
   |  slwi TMP1, TMP1, 3
+  |.if FPU
   |   la TMP2, CFUNC:RB->upvalue
   |  lfdx FARG1, TMP2, TMP1
+  |.else
+  |  add TMP1, CFUNC:RB, TMP1
+  |  lwz CARG1, CFUNC:TMP1->upvalue[0].u32.hi
+  |  lwz CARG2, CFUNC:TMP1->upvalue[0].u32.lo
+  |.endif
   |  b ->fff_resn
   |
   |//-- Base library: getters and setters ---------------------------------
@@ -1383,7 +1510,12 @@ static void build_subroutines(BuildCtx *ctx)
   |   mr CARG1, L
   |  bl extern lj_tab_get  // (lua_State *L, GCtab *t, cTValue *key)
   |  // Returns cTValue *.
+  |.if FPU
   |  lfd FARG1, 0(CRET1)
+  |.else
+  |  lwz CARG2, 4(CRET1)
+  |  lwz CARG1, 0(CRET1)	// Caveat: CARG1 == CRET1.
+  |.endif
   |  b ->fff_resn
   |
   |//-- Base library: conversions ------------------------------------------
@@ -1392,7 +1524,11 @@ static void build_subroutines(BuildCtx *ctx)
   |  // Only handles the number case inline (without a base argument).
   |  cmplwi NARGS8:RC, 8
   |   lwz CARG1, 0(BASE)
+  |.if FPU
   |    lfd FARG1, 0(BASE)
+  |.else
+  |    lwz CARG2, 4(BASE)
+  |.endif
   |  bne ->fff_fallback			// Exactly one argument.
   |   checknum CARG1; bgt ->fff_fallback
   |  b ->fff_resn
@@ -1443,12 +1579,23 @@ static void build_subroutines(BuildCtx *ctx)
   |  cmplwi CRET1, 0
   |   li CARG3, LJ_TNIL
   |  beq ->fff_restv			// End of traversal: return nil.
-  |  lfd f0, 8(BASE)			// Copy key and value to results.
   |   la RA, -8(BASE)
+  |.if FPU
+  |  lfd f0, 8(BASE)			// Copy key and value to results.
   |  lfd f1, 16(BASE)
   |  stfd f0, 0(RA)
-  |   li RD, (2+1)*8
   |  stfd f1, 8(RA)
+  |.else
+  |  lwz CARG1, 8(BASE)
+  |  lwz CARG2, 12(BASE)
+  |  lwz CARG3, 16(BASE)
+  |  lwz CARG4, 20(BASE)
+  |  stw CARG1, 0(RA)
+  |  stw CARG2, 4(RA)
+  |  stw CARG3, 8(RA)
+  |  stw CARG4, 12(RA)
+  |.endif
+  |   li RD, (2+1)*8
   |  b ->fff_res
   |
   |.ffunc_1 pairs
@@ -1457,17 +1604,32 @@ static void build_subroutines(BuildCtx *ctx)
   |  bne ->fff_fallback
 #if LJ_52
   |   lwz TAB:TMP2, TAB:CARG1->metatable
+  |.if FPU
   |  lfd f0, CFUNC:RB->upvalue[0]
+  |.else
+  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+  |.endif
   |   cmplwi TAB:TMP2, 0
   |  la RA, -8(BASE)
   |   bne ->fff_fallback
 #else
+  |.if FPU
   |  lfd f0, CFUNC:RB->upvalue[0]
+  |.else
+  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+  |.endif
   |  la RA, -8(BASE)
 #endif
   |   stw TISNIL, 8(BASE)
   |  li RD, (3+1)*8
+  |.if FPU
   |  stfd f0, 0(RA)
+  |.else
+  |  stw TMP0, 0(RA)
+  |  stw TMP1, 4(RA)
+  |.endif
   |  b ->fff_res
   |
   |.ffunc ipairs_aux
@@ -1513,14 +1675,24 @@ static void build_subroutines(BuildCtx *ctx)
   |  stfd FARG2, 0(RA)
   |.endif
   |  ble >2				// Not in array part?
+  |.if FPU
   |  lwzx TMP2, TMP1, TMP3
   |  lfdx f0, TMP1, TMP3
+  |.else
+  |  lwzux TMP2, TMP1, TMP3
+  |  lwz TMP3, 4(TMP1)
+  |.endif
   |1:
   |  checknil TMP2
   |   li RD, (0+1)*8
   |  beq ->fff_res			// End of iteration, return 0 results.
   |   li RD, (2+1)*8
+  |.if FPU
   |  stfd f0, 8(RA)
+  |.else
+  |  stw TMP2, 8(RA)
+  |  stw TMP3, 12(RA)
+  |.endif
   |  b ->fff_res
   |2:  // Check for empty hash part first. Otherwise call C function.
   |  lwz TMP0, TAB:CARG1->hmask
@@ -1534,7 +1706,11 @@ static void build_subroutines(BuildCtx *ctx)
   |   li RD, (0+1)*8
   |  beq ->fff_res
   |  lwz TMP2, 0(CRET1)
+  |.if FPU
   |  lfd f0, 0(CRET1)
+  |.else
+  |  lwz TMP3, 4(CRET1)
+  |.endif
   |  b <1
   |
   |.ffunc_1 ipairs
@@ -1543,12 +1719,22 @@ static void build_subroutines(BuildCtx *ctx)
   |  bne ->fff_fallback
 #if LJ_52
   |   lwz TAB:TMP2, TAB:CARG1->metatable
+  |.if FPU
   |  lfd f0, CFUNC:RB->upvalue[0]
+  |.else
+  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+  |.endif
   |   cmplwi TAB:TMP2, 0
   |  la RA, -8(BASE)
   |   bne ->fff_fallback
 #else
+  |.if FPU
   |  lfd f0, CFUNC:RB->upvalue[0]
+  |.else
+  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
+  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
+  |.endif
   |  la RA, -8(BASE)
 #endif
   |.if DUALNUM
@@ -1558,7 +1744,12 @@ static void build_subroutines(BuildCtx *ctx)
   |.endif
   |   stw ZERO, 12(BASE)
   |  li RD, (3+1)*8
+  |.if FPU
   |  stfd f0, 0(RA)
+  |.else
+  |  stw TMP0, 0(RA)
+  |  stw TMP1, 4(RA)
+  |.endif
   |  b ->fff_res
   |
   |//-- Base library: catch errors ----------------------------------------
@@ -1577,19 +1768,32 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.ffunc xpcall
   |  cmplwi NARGS8:RC, 16
-  |   lwz CARG4, 8(BASE)
+  |   lwz CARG3, 8(BASE)
+  |.if FPU
   |    lfd FARG2, 8(BASE)
   |    lfd FARG1, 0(BASE)
+  |.else
+  |    lwz CARG1, 0(BASE)
+  |    lwz CARG2, 4(BASE)
+  |    lwz CARG4, 12(BASE)
+  |.endif
   |  blt ->fff_fallback
   |  lbz TMP1, DISPATCH_GL(hookmask)(DISPATCH)
   |   mr TMP2, BASE
-  |  checkfunc CARG4; bne ->fff_fallback  // Traceback must be a function.
+  |  checkfunc CARG3; bne ->fff_fallback  // Traceback must be a function.
   |   la BASE, 16(BASE)
   |  // Remember active hook before pcall.
   |  rlwinm TMP1, TMP1, 32-HOOK_ACTIVE_SHIFT, 31, 31
+  |.if FPU
   |    stfd FARG2, 0(TMP2)		// Swap function and traceback.
-  |  subi NARGS8:RC, NARGS8:RC, 16
   |    stfd FARG1, 8(TMP2)
+  |.else
+  |    stw CARG3, 0(TMP2)
+  |    stw CARG4, 4(TMP2)
+  |    stw CARG1, 8(TMP2)
+  |    stw CARG2, 12(TMP2)
+  |.endif
+  |  subi NARGS8:RC, NARGS8:RC, 16
   |  addi PC, TMP1, 16+FRAME_PCALL
   |  b ->vm_call_dispatch
   |
@@ -1632,9 +1836,21 @@ static void build_subroutines(BuildCtx *ctx)
   |  stp BASE, L->top
   |2:  // Move args to coroutine.
   |  cmpw TMP1, NARGS8:RC
+  |.if FPU
   |   lfdx f0, BASE, TMP1
+  |.else
+  |   add CARG3, BASE, TMP1
+  |   lwz TMP2, 0(CARG3)
+  |   lwz TMP3, 4(CARG3)
+  |.endif
   |  beq >3
+  |.if FPU
   |   stfdx f0, CARG2, TMP1
+  |.else
+  |   add CARG3, CARG2, TMP1
+  |   stw TMP2, 0(CARG3)
+  |   stw TMP3, 4(CARG3)
+  |.endif
   |  addi TMP1, TMP1, 8
   |  b <2
   |3:
@@ -1665,8 +1881,17 @@ static void build_subroutines(BuildCtx *ctx)
   |   stp TMP2, L:SAVE0->top		// Clear coroutine stack.
   |5:  // Move results from coroutine.
   |  cmplw TMP1, TMP3
+  |.if FPU
   |   lfdx f0, TMP2, TMP1
   |   stfdx f0, BASE, TMP1
+  |.else
+  |   add CARG3, TMP2, TMP1
+  |   lwz CARG1, 0(CARG3)
+  |   lwz CARG2, 4(CARG3)
+  |   add CARG3, BASE, TMP1
+  |   stw CARG1, 0(CARG3)
+  |   stw CARG2, 4(CARG3)
+  |.endif
   |    addi TMP1, TMP1, 8
   |  bne <5
   |6:
@@ -1691,12 +1916,22 @@ static void build_subroutines(BuildCtx *ctx)
   |  andix. TMP0, PC, FRAME_TYPE
   |  la TMP3, -8(TMP3)
   |   li TMP1, LJ_TFALSE
+  |.if FPU
   |  lfd f0, 0(TMP3)
+  |.else
+  |  lwz CARG1, 0(TMP3)
+  |  lwz CARG2, 4(TMP3)
+  |.endif
   |   stp TMP3, L:SAVE0->top		// Remove error from coroutine stack.
   |    li RD, (2+1)*8
   |   stw TMP1, -8(BASE)		// Prepend false to results.
   |    la RA, -8(BASE)
+  |.if FPU
   |  stfd f0, 0(BASE)			// Copy error message.
+  |.else
+  |  stw CARG1, 0(BASE)			// Copy error message.
+  |  stw CARG2, 4(BASE)
+  |.endif
   |  b <7
   |.else
   |  mr CARG1, L
@@ -1875,7 +2110,12 @@ static void build_subroutines(BuildCtx *ctx)
   |  lus CARG1, 0x8000			// -(2^31).
   |  beqy ->fff_resi
   |5:
+  |.if FPU
   |  lfd FARG1, 0(BASE)
+  |.else
+  |  lwz CARG1, 0(BASE)
+  |  lwz CARG2, 4(BASE)
+  |.endif
   |  blex func
   |  b ->fff_resn
   |.endmacro
@@ -1899,10 +2139,14 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.ffunc math_log
   |  cmplwi NARGS8:RC, 8
-  |   lwz CARG3, 0(BASE)
-  |    lfd FARG1, 0(BASE)
+  |   lwz CARG1, 0(BASE)
   |  bne ->fff_fallback			// Need exactly 1 argument.
-  |  checknum CARG3; bge ->fff_fallback
+  |  checknum CARG1; bge ->fff_fallback
+  |.if FPU
+  |  lfd FARG1, 0(BASE)
+  |.else
+  |  lwz CARG2, 4(BASE)
+  |.endif
   |  blex log
   |  b ->fff_resn
   |
@@ -1924,17 +2168,24 @@ static void build_subroutines(BuildCtx *ctx)
   |.if DUALNUM
   |.ffunc math_ldexp
   |  cmplwi NARGS8:RC, 16
-  |   lwz CARG3, 0(BASE)
+  |   lwz TMP0, 0(BASE)
+  |.if FPU
   |    lfd FARG1, 0(BASE)
-  |   lwz CARG4, 8(BASE)
+  |.else
+  |    lwz CARG1, 0(BASE)
+  |    lwz CARG2, 4(BASE)
+  |.endif
+  |   lwz TMP1, 8(BASE)
   |.if GPR64
   |    lwz CARG2, 12(BASE)
-  |.else
+  |.elif FPU
   |    lwz CARG1, 12(BASE)
+  |.else
+  |    lwz CARG3, 12(BASE)
   |.endif
   |  blt ->fff_fallback
-  |  checknum CARG3; bge ->fff_fallback
-  |  checknum CARG4; bne ->fff_fallback
+  |  checknum TMP0; bge ->fff_fallback
+  |  checknum TMP1; bne ->fff_fallback
   |.else
   |.ffunc_nn math_ldexp
   |.if GPR64
@@ -1949,8 +2200,10 @@ static void build_subroutines(BuildCtx *ctx)
   |.ffunc_n math_frexp
   |.if GPR64
   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-  |.else
+  |.elif FPU
   |  la CARG1, DISPATCH_GL(tmptv)(DISPATCH)
+  |.else
+  |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
   |.endif
   |   lwz PC, FRAME_PC(BASE)
   |  blex frexp
@@ -1959,7 +2212,12 @@ static void build_subroutines(BuildCtx *ctx)
   |.if not DUALNUM
   |   tonum_i FARG2, TMP1
   |.endif
+  |.if FPU
   |  stfd FARG1, 0(RA)
+  |.else
+  |  stw CRET1, 0(RA)
+  |  stw CRET2, 4(RA)
+  |.endif
   |  li RD, (2+1)*8
   |.if DUALNUM
   |   stw TISNUM, 8(RA)
@@ -1972,13 +2230,20 @@ static void build_subroutines(BuildCtx *ctx)
   |.ffunc_n math_modf
   |.if GPR64
   |  la CARG2, -8(BASE)
-  |.else
+  |.elif FPU
   |  la CARG1, -8(BASE)
+  |.else
+  |  la CARG3, -8(BASE)
   |.endif
   |   lwz PC, FRAME_PC(BASE)
   |  blex modf
   |   la RA, -8(BASE)
+  |.if FPU
   |  stfd FARG1, 0(BASE)
+  |.else
+  |  stw CRET1, 0(BASE)
+  |  stw CRET2, 4(BASE)
+  |.endif
   |  li RD, (2+1)*8
   |  b ->fff_res
   |
@@ -1986,13 +2251,13 @@ static void build_subroutines(BuildCtx *ctx)
   |.if DUALNUM
   |  .ffunc_1 name
   |  checknum CARG3
-  |   addi TMP1, BASE, 8
-  |   add TMP2, BASE, NARGS8:RC
+  |   addi SAVE0, BASE, 8
+  |   add SAVE1, BASE, NARGS8:RC
   |  bne >4
   |1:  // Handle integers.
-  |  lwz CARG4, 0(TMP1)
-  |   cmplw cr1, TMP1, TMP2
-  |  lwz CARG2, 4(TMP1)
+  |  lwz CARG4, 0(SAVE0)
+  |   cmplw cr1, SAVE0, SAVE1
+  |  lwz CARG2, 4(SAVE0)
   |   bge cr1, ->fff_resi
   |  checknum CARG4
   |   xoris TMP0, CARG1, 0x8000
@@ -2009,36 +2274,76 @@ static void build_subroutines(BuildCtx *ctx)
   |.if GPR64
   |  rldicl CARG1, CARG1, 0, 32
   |.endif
-  |   addi TMP1, TMP1, 8
+  |   addi SAVE0, SAVE0, 8
   |  b <1
   |3:
   |  bge ->fff_fallback
   |  // Convert intermediate result to number and continue below.
+  |.if FPU
   |  tonum_i FARG1, CARG1
-  |  lfd FARG2, 0(TMP1)
+  |  lfd FARG2, 0(SAVE0)
+  |.else
+  |  mr CARG2, CARG1
+  |  bl ->vm_sfi2d_1
+  |  lwz CARG3, 0(SAVE0)
+  |  lwz CARG4, 4(SAVE0)
+  |.endif
   |  b >6
   |4:
+  |.if FPU
   |   lfd FARG1, 0(BASE)
+  |.else
+  |   lwz CARG1, 0(BASE)
+  |   lwz CARG2, 4(BASE)
+  |.endif
   |  bge ->fff_fallback
   |5:  // Handle numbers.
-  |  lwz CARG4, 0(TMP1)
-  |   cmplw cr1, TMP1, TMP2
-  |  lfd FARG2, 0(TMP1)
+  |  lwz CARG3, 0(SAVE0)
+  |   cmplw cr1, SAVE0, SAVE1
+  |.if FPU
+  |  lfd FARG2, 0(SAVE0)
+  |.else
+  |  lwz CARG4, 4(SAVE0)
+  |.endif
   |   bge cr1, ->fff_resn
-  |  checknum CARG4; bge >7
+  |  checknum CARG3; bge >7
   |6:
+  |   addi SAVE0, SAVE0, 8
+  |.if FPU
   |  fsub f0, FARG1, FARG2
-  |   addi TMP1, TMP1, 8
   |.if ismax
   |  fsel FARG1, f0, FARG1, FARG2
   |.else
   |  fsel FARG1, f0, FARG2, FARG1
   |.endif
+  |.else
+  |  stw CARG1, SFSAVE_1
+  |  stw CARG2, SFSAVE_2
+  |  stw CARG3, SFSAVE_3
+  |  stw CARG4, SFSAVE_4
+  |  blex __ledf2
+  |  cmpwi CRET1, 0
+  |.if ismax
+  |  blt >8
+  |.else
+  |  bge >8
+  |.endif
+  |  lwz CARG1, SFSAVE_1
+  |  lwz CARG2, SFSAVE_2
+  |  b <5
+  |8:
+  |  lwz CARG1, SFSAVE_3
+  |  lwz CARG2, SFSAVE_4
+  |.endif
   |  b <5
   |7:  // Convert integer to number and continue above.
-  |   lwz CARG2, 4(TMP1)
+  |   lwz CARG3, 4(SAVE0)
   |  bne ->fff_fallback
-  |  tonum_i FARG2, CARG2
+  |.if FPU
+  |  tonum_i FARG2, CARG3
+  |.else
+  |  bl ->vm_sfi2d_2
+  |.endif
   |  b <6
   |.else
   |  .ffunc_n name
@@ -2238,28 +2543,37 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc_bit_op, name, ins
   |  .ffunc_bit name
-  |  addi TMP1, BASE, 8
-  |  add TMP2, BASE, NARGS8:RC
+  |  addi SAVE0, BASE, 8
+  |  add SAVE1, BASE, NARGS8:RC
   |1:
-  |  lwz CARG4, 0(TMP1)
-  |   cmplw cr1, TMP1, TMP2
+  |  lwz CARG4, 0(SAVE0)
+  |   cmplw cr1, SAVE0, SAVE1
   |.if DUALNUM
-  |  lwz CARG2, 4(TMP1)
+  |  lwz CARG2, 4(SAVE0)
   |.else
-  |  lfd FARG1, 0(TMP1)
+  |  lfd FARG1, 0(SAVE0)
   |.endif
   |   bgey cr1, ->fff_resi
   |  checknum CARG4
   |.if DUALNUM
+  |.if FPU
   |  bnel ->fff_bitop_fb
   |.else
+  |  beq >3
+  |  stw CARG1, SFSAVE_1
+  |  bl ->fff_bitop_fb
+  |  mr CARG2, CARG1
+  |  lwz CARG1, SFSAVE_1
+  |3:
+  |.endif
+  |.else
   |  fadd FARG1, FARG1, TOBIT
   |  bge ->fff_fallback
   |  stfd FARG1, TMPD
   |  lwz CARG2, TMPD_LO
   |.endif
   |  ins CARG1, CARG1, CARG2
-  |   addi TMP1, TMP1, 8
+  |   addi SAVE0, SAVE0, 8
   |  b <1
   |.endmacro
   |
@@ -2281,7 +2595,14 @@ static void build_subroutines(BuildCtx *ctx)
   |.macro .ffunc_bit_sh, name, ins, shmod
   |.if DUALNUM
   |  .ffunc_2 bit_..name
+  |.if FPU
   |  checknum CARG3; bnel ->fff_tobit_fb
+  |.else
+  |  checknum CARG3; beq >1
+  |  bl ->fff_tobit_fb
+  |  lwz CARG2, 12(BASE)	// Conversion polluted CARG2.
+  |1:
+  |.endif
   |  // Note: no inline conversion from number for 2nd argument!
   |  checknum CARG4; bne ->fff_fallback
   |.else
@@ -2318,27 +2639,77 @@ static void build_subroutines(BuildCtx *ctx)
   |->fff_resn:
   |  lwz PC, FRAME_PC(BASE)
   |  la RA, -8(BASE)
+  |.if FPU
   |  stfd FARG1, -8(BASE)
+  |.else
+  |  stw CARG1, -8(BASE)
+  |  stw CARG2, -4(BASE)
+  |.endif
   |  b ->fff_res1
   |
   |// Fallback FP number to bit conversion.
   |->fff_tobit_fb:
   |.if DUALNUM
+  |.if FPU
   |  lfd FARG1, 0(BASE)
   |  bgt ->fff_fallback
   |  fadd FARG1, FARG1, TOBIT
   |  stfd FARG1, TMPD
   |  lwz CARG1, TMPD_LO
   |  blr
+  |.else
+  |  bgt ->fff_fallback
+  |  mr CARG2, CARG1
+  |  mr CARG1, CARG3
+  |// Modifies: CARG1, CARG2, TMP0, TMP1, TMP2.
+  |->vm_tobit:
+  |  slwi TMP2, CARG1, 1
+  |  addis TMP2, TMP2, 0x0020
+  |  cmpwi TMP2, 0
+  |  bge >2
+  |   li TMP1, 0x3e0
+  |  srawi TMP2, TMP2, 21
+  |   not TMP1, TMP1
+  |  sub. TMP2, TMP1, TMP2
+  |    cmpwi cr7, CARG1, 0
+  |  blt >1
+  |   slwi TMP1, CARG1, 11
+  |    srwi TMP0, CARG2, 21
+  |   oris TMP1, TMP1, 0x8000
+  |   or TMP1, TMP1, TMP0
+  |   srw CARG1, TMP1, TMP2
+  |  bclr 4, 28			// Return if cr7[lt] == 0, no hint.
+  |   neg CARG1, CARG1
+  |  blr
+  |1:
+  |  addi TMP2, TMP2, 21
+  |  srw TMP1, CARG2, TMP2
+  |   slwi CARG2, CARG1, 12
+  |  subfic TMP2, TMP2, 20
+  |   slw TMP0, CARG2, TMP2
+  |   or CARG1, TMP1, TMP0
+  |  bclr 4, 28			// Return if cr7[lt] == 0, no hint.
+  |   neg CARG1, CARG1
+  |  blr
+  |2:
+  |  li CARG1, 0
+  |  blr
+  |.endif
   |.endif
   |->fff_bitop_fb:
   |.if DUALNUM
-  |  lfd FARG1, 0(TMP1)
+  |.if FPU
+  |  lfd FARG1, 0(SAVE0)
   |  bgt ->fff_fallback
   |  fadd FARG1, FARG1, TOBIT
   |  stfd FARG1, TMPD
   |  lwz CARG2, TMPD_LO
   |  blr
+  |.else
+  |  bgt ->fff_fallback
+  |  mr CARG1, CARG4
+  |  b ->vm_tobit
+  |.endif
   |.endif
   |
   |//-----------------------------------------------------------------------
@@ -2531,10 +2902,21 @@ static void build_subroutines(BuildCtx *ctx)
   |  decode_RA8 RC, INS			// Call base.
   |   beq >2
   |1:  // Move results down.
+  |.if FPU
   |  lfd f0, 0(RA)
+  |.else
+  |  lwz CARG1, 0(RA)
+  |  lwz CARG2, 4(RA)
+  |.endif
   |   addic. TMP1, TMP1, -8
   |    addi RA, RA, 8
+  |.if FPU
   |  stfdx f0, BASE, RC
+  |.else
+  |  add CARG3, BASE, RC
+  |  stw CARG1, 0(CARG3)
+  |  stw CARG2, 4(CARG3)
+  |.endif
   |    addi RC, RC, 8
   |   bne <1
   |2:
@@ -2587,10 +2969,12 @@ static void build_subroutines(BuildCtx *ctx)
   |//-----------------------------------------------------------------------
   |
   |.macro savex_, a, b, c, d
+  |.if FPU
   |  stfd f..a, 16+a*8(sp)
   |  stfd f..b, 16+b*8(sp)
   |  stfd f..c, 16+c*8(sp)
   |  stfd f..d, 16+d*8(sp)
+  |.endif
   |.endmacro
   |
   |->vm_exit_handler:
@@ -2662,16 +3046,16 @@ static void build_subroutines(BuildCtx *ctx)
   |  lwz KBASE, PC2PROTO(k)(TMP1)
   |  // Setup type comparison constants.
   |  li TISNUM, LJ_TISNUM
-  |  lus TMP3, 0x59c0			// TOBIT = 2^52 + 2^51 (float).
-  |  stw TMP3, TMPD
+  |  .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
+  |  .FPU stw TMP3, TMPD
   |  li ZERO, 0
-  |  ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
-  |  lfs TOBIT, TMPD
-  |  stw TMP3, TMPD
-  |  lus TMP0, 0x4338			// Hiword of 2^52 + 2^51 (double)
+  |  .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
+  |  .FPU lfs TOBIT, TMPD
+  |  .FPU stw TMP3, TMPD
+  |  .FPU lus TMP0, 0x4338			// Hiword of 2^52 + 2^51 (double)
   |    li TISNIL, LJ_TNIL
-  |  stw TMP0, TONUM_HI
-  |  lfs TONUM, TMPD
+  |  .FPU stw TMP0, TONUM_HI
+  |  .FPU lfs TONUM, TMPD
   |  // Modified copy of ins_next which handles function header dispatch, too.
   |  lwz INS, 0(PC)
   |   addi PC, PC, 4
@@ -2716,7 +3100,35 @@ static void build_subroutines(BuildCtx *ctx)
   |//-- Math helper functions ----------------------------------------------
   |//-----------------------------------------------------------------------
   |
-  |// NYI: Use internal implementations of floor, ceil, trunc.
+  |// NYI: Use internal implementations of floor, ceil, trunc, sfcmp.
+  |
+  |.macro sfi2d, AHI, ALO
+  |.if not FPU
+  |  mr. AHI, ALO
+  |  bclr 12, 2				// Handle zero first.
+  |  srawi TMP0, ALO, 31
+  |  xor TMP1, ALO, TMP0
+  |  sub TMP1, TMP1, TMP0		// Absolute value in TMP1.
+  |  cntlzw AHI, TMP1
+  |  andix. TMP0, TMP0, 0x800		// Mask sign bit.
+  |  slw TMP1, TMP1, AHI		// Align mantissa left with leading 1.
+  |  subfic AHI, AHI, 0x3ff+31-1	// Exponent -1 in AHI.
+  |  slwi ALO, TMP1, 21
+  |  or AHI, AHI, TMP0			// Sign | Exponent.
+  |  srwi TMP1, TMP1, 11
+  |  slwi AHI, AHI, 20			// Align left.
+  |  add AHI, AHI, TMP1			// Add mantissa, increment exponent.
+  |  blr
+  |.endif
+  |.endmacro
+  |
+  |// Input: CARG2. Output: CARG1, CARG2. Temporaries: TMP0, TMP1.
+  |->vm_sfi2d_1:
+  |  sfi2d CARG1, CARG2
+  |
+  |// Input: CARG4. Output: CARG3, CARG4. Temporaries: TMP0, TMP1.
+  |->vm_sfi2d_2:
+  |  sfi2d CARG3, CARG4
   |
   |->vm_modi:
   |  divwo. TMP0, CARG1, CARG2
@@ -2784,21 +3196,21 @@ static void build_subroutines(BuildCtx *ctx)
   |   addi DISPATCH, r12, GG_G2DISP
   |  stw r11, CTSTATE->cb.slot
   |  stw r3, CTSTATE->cb.gpr[0]
-  |   stfd f1, CTSTATE->cb.fpr[0]
+  |   .FPU stfd f1, CTSTATE->cb.fpr[0]
   |  stw r4, CTSTATE->cb.gpr[1]
-  |   stfd f2, CTSTATE->cb.fpr[1]
+  |   .FPU stfd f2, CTSTATE->cb.fpr[1]
   |  stw r5, CTSTATE->cb.gpr[2]
-  |   stfd f3, CTSTATE->cb.fpr[2]
+  |   .FPU stfd f3, CTSTATE->cb.fpr[2]
   |  stw r6, CTSTATE->cb.gpr[3]
-  |   stfd f4, CTSTATE->cb.fpr[3]
+  |   .FPU stfd f4, CTSTATE->cb.fpr[3]
   |  stw r7, CTSTATE->cb.gpr[4]
-  |   stfd f5, CTSTATE->cb.fpr[4]
+  |   .FPU stfd f5, CTSTATE->cb.fpr[4]
   |  stw r8, CTSTATE->cb.gpr[5]
-  |   stfd f6, CTSTATE->cb.fpr[5]
+  |   .FPU stfd f6, CTSTATE->cb.fpr[5]
   |  stw r9, CTSTATE->cb.gpr[6]
-  |   stfd f7, CTSTATE->cb.fpr[6]
+  |   .FPU stfd f7, CTSTATE->cb.fpr[6]
   |  stw r10, CTSTATE->cb.gpr[7]
-  |   stfd f8, CTSTATE->cb.fpr[7]
+  |   .FPU stfd f8, CTSTATE->cb.fpr[7]
   |  addi TMP0, sp, CFRAME_SPACE+8
   |  stw TMP0, CTSTATE->cb.stack
   |   mr CARG1, CTSTATE
@@ -2809,21 +3221,21 @@ static void build_subroutines(BuildCtx *ctx)
   |  lp BASE, L:CRET1->base
   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
   |  lp RC, L:CRET1->top
-  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
+  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
   |     li ZERO, 0
   |   mr L, CRET1
-  |     stw TMP3, TMPD
-  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
+  |     .FPU stw TMP3, TMPD
+  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
   |  lwz LFUNC:RB, FRAME_FUNC(BASE)
-  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
-  |     stw TMP0, TONUM_HI
+  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
+  |     .FPU stw TMP0, TONUM_HI
   |     li TISNIL, LJ_TNIL
   |    li_vmstate INTERP
-  |     lfs TOBIT, TMPD
-  |     stw TMP3, TMPD
+  |     .FPU lfs TOBIT, TMPD
+  |     .FPU stw TMP3, TMPD
   |  sub RC, RC, BASE
   |    st_vmstate
-  |     lfs TONUM, TMPD
+  |     .FPU lfs TONUM, TMPD
   |  ins_callt
   |.endif
   |
@@ -2837,7 +3249,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mr CARG2, RA
   |  bl extern lj_ccallback_leave	// (CTState *cts, TValue *o)
   |  lwz CRET1, CTSTATE->cb.gpr[0]
-  |  lfd FARG1, CTSTATE->cb.fpr[0]
+  |  .FPU lfd FARG1, CTSTATE->cb.fpr[0]
   |  lwz CRET2, CTSTATE->cb.gpr[1]
   |  b ->vm_leave_unw
   |.endif
@@ -2871,14 +3283,14 @@ static void build_subroutines(BuildCtx *ctx)
   |  bge <1
   |2:
   |  bney cr1, >3
-  |  lfd f1, CCSTATE->fpr[0]
-  |  lfd f2, CCSTATE->fpr[1]
-  |  lfd f3, CCSTATE->fpr[2]
-  |  lfd f4, CCSTATE->fpr[3]
-  |  lfd f5, CCSTATE->fpr[4]
-  |  lfd f6, CCSTATE->fpr[5]
-  |  lfd f7, CCSTATE->fpr[6]
-  |  lfd f8, CCSTATE->fpr[7]
+  |  .FPU lfd f1, CCSTATE->fpr[0]
+  |  .FPU lfd f2, CCSTATE->fpr[1]
+  |  .FPU lfd f3, CCSTATE->fpr[2]
+  |  .FPU lfd f4, CCSTATE->fpr[3]
+  |  .FPU lfd f5, CCSTATE->fpr[4]
+  |  .FPU lfd f6, CCSTATE->fpr[5]
+  |  .FPU lfd f7, CCSTATE->fpr[6]
+  |  .FPU lfd f8, CCSTATE->fpr[7]
   |3:
   |   lp TMP0, CCSTATE->func
   |  lwz CARG2, CCSTATE->gpr[1]
@@ -2895,7 +3307,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  lwz TMP2, -4(r14)
   |   lwz TMP0, 4(r14)
   |  stw CARG1, CCSTATE:TMP1->gpr[0]
-  |  stfd FARG1, CCSTATE:TMP1->fpr[0]
+  |  .FPU stfd FARG1, CCSTATE:TMP1->fpr[0]
   |  stw CARG2, CCSTATE:TMP1->gpr[1]
   |   mtlr TMP0
   |  stw CARG3, CCSTATE:TMP1->gpr[2]
@@ -2924,19 +3336,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
   case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
     |  // RA = src1*8, RD = src2*8, JMP with RD = target
     |.if DUALNUM
-    |  lwzux TMP0, RA, BASE
+    |  lwzux CARG1, RA, BASE
     |    addi PC, PC, 4
     |   lwz CARG2, 4(RA)
-    |  lwzux TMP1, RD, BASE
+    |  lwzux CARG3, RD, BASE
     |    lwz TMP2, -4(PC)
-    |  checknum cr0, TMP0
-    |   lwz CARG3, 4(RD)
+    |  checknum cr0, CARG1
+    |   lwz CARG4, 4(RD)
     |    decode_RD4 TMP2, TMP2
-    |  checknum cr1, TMP1
-    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+    |  checknum cr1, CARG3
+    |    addis SAVE0, TMP2, -(BCBIAS_J*4 >> 16)
     |  bne cr0, >7
     |  bne cr1, >8
-    |   cmpw CARG2, CARG3
+    |   cmpw CARG2, CARG4
     if (op == BC_ISLT) {
       |  bge >2
     } else if (op == BC_ISGE) {
@@ -2947,28 +3359,41 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  ble >2
     }
     |1:
-    |  add PC, PC, TMP2
+    |  add PC, PC, SAVE0
     |2:
     |  ins_next
     |
     |7:  // RA is not an integer.
     |  bgt cr0, ->vmeta_comp
     |  // RA is a number.
-    |   lfd f0, 0(RA)
+    |   .FPU lfd f0, 0(RA)
     |  bgt cr1, ->vmeta_comp
     |  blt cr1, >4
     |  // RA is a number, RD is an integer.
-    |  tonum_i f1, CARG3
+    |.if FPU
+    |  tonum_i f1, CARG4
+    |.else
+    |  bl ->vm_sfi2d_2
+    |.endif
     |  b >5
     |
     |8: // RA is an integer, RD is not an integer.
     |  bgt cr1, ->vmeta_comp
     |  // RA is an integer, RD is a number.
+    |.if FPU
     |  tonum_i f0, CARG2
+    |.else
+    |  bl ->vm_sfi2d_1
+    |.endif
     |4:
-    |  lfd f1, 0(RD)
+    |  .FPU lfd f1, 0(RD)
     |5:
+    |.if FPU
     |  fcmpu cr0, f0, f1
+    |.else
+    |  blex __ledf2
+    |  cmpwi CRET1, 0
+    |.endif
     if (op == BC_ISLT) {
       |  bge <2
     } else if (op == BC_ISGE) {
@@ -3016,42 +3441,42 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     vk = op == BC_ISEQV;
     |  // RA = src1*8, RD = src2*8, JMP with RD = target
     |.if DUALNUM
-    |  lwzux TMP0, RA, BASE
+    |  lwzux CARG1, RA, BASE
     |    addi PC, PC, 4
     |   lwz CARG2, 4(RA)
-    |  lwzux TMP1, RD, BASE
-    |  checknum cr0, TMP0
-    |    lwz TMP2, -4(PC)
-    |  checknum cr1, TMP1
-    |    decode_RD4 TMP2, TMP2
-    |   lwz CARG3, 4(RD)
+    |  lwzux CARG3, RD, BASE
+    |  checknum cr0, CARG1
+    |    lwz SAVE0, -4(PC)
+    |  checknum cr1, CARG3
+    |    decode_RD4 SAVE0, SAVE0
+    |   lwz CARG4, 4(RD)
     |  cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
-    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
     if (vk) {
       |  ble cr7, ->BC_ISEQN_Z
     } else {
       |  ble cr7, ->BC_ISNEN_Z
     }
     |.else
-    |  lwzux TMP0, RA, BASE
-    |   lwz TMP2, 0(PC)
+    |  lwzux CARG1, RA, BASE
+    |   lwz SAVE0, 0(PC)
     |    lfd f0, 0(RA)
     |   addi PC, PC, 4
-    |  lwzux TMP1, RD, BASE
-    |  checknum cr0, TMP0
-    |   decode_RD4 TMP2, TMP2
+    |  lwzux CARG3, RD, BASE
+    |  checknum cr0, CARG1
+    |   decode_RD4 SAVE0, SAVE0
     |    lfd f1, 0(RD)
-    |  checknum cr1, TMP1
-    |   addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+    |  checknum cr1, CARG3
+    |   addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
     |  bge cr0, >5
     |  bge cr1, >5
     |  fcmpu cr0, f0, f1
     if (vk) {
       |  bne >1
-      |  add PC, PC, TMP2
+      |  add PC, PC, SAVE0
     } else {
       |  beq >1
-      |  add PC, PC, TMP2
+      |  add PC, PC, SAVE0
     }
     |1:
     |  ins_next
@@ -3059,36 +3484,36 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |5:  // Either or both types are not numbers.
     |.if not DUALNUM
     |    lwz CARG2, 4(RA)
-    |    lwz CARG3, 4(RD)
+    |    lwz CARG4, 4(RD)
     |.endif
     |.if FFI
-    |  cmpwi cr7, TMP0, LJ_TCDATA
-    |  cmpwi cr5, TMP1, LJ_TCDATA
+    |  cmpwi cr7, CARG1, LJ_TCDATA
+    |  cmpwi cr5, CARG3, LJ_TCDATA
     |.endif
-    |   not TMP3, TMP0
-    |  cmplw TMP0, TMP1
-    |   cmplwi cr1, TMP3, ~LJ_TISPRI		// Primitive?
+    |   not TMP2, CARG1
+    |  cmplw CARG1, CARG3
+    |   cmplwi cr1, TMP2, ~LJ_TISPRI		// Primitive?
     |.if FFI
     |  cror 4*cr7+eq, 4*cr7+eq, 4*cr5+eq
     |.endif
-    |   cmplwi cr6, TMP3, ~LJ_TISTABUD		// Table or userdata?
+    |   cmplwi cr6, TMP2, ~LJ_TISTABUD		// Table or userdata?
     |.if FFI
     |  beq cr7, ->vmeta_equal_cd
     |.endif
-    |    cmplw cr5, CARG2, CARG3
+    |    cmplw cr5, CARG2, CARG4
     |  crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt	// 2: Same type and primitive.
     |  crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq	// 1: Same tv or different type.
     |  crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq	// 0: Same type and same tv.
-    |   mr SAVE0, PC
+    |   mr SAVE1, PC
     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt	// 0 or 2.
     |  cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt	// 1 or 2.
     if (vk) {
       |  bne cr0, >6
-      |  add PC, PC, TMP2
+      |  add PC, PC, SAVE0
       |6:
     } else {
       |  beq cr0, >6
-      |  add PC, PC, TMP2
+      |  add PC, PC, SAVE0
       |6:
     }
     |.if DUALNUM
@@ -3103,6 +3528,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |
     |  // Different tables or userdatas. Need to check __eq metamethod.
     |  // Field metatable must be at same offset for GCtab and GCudata!
+    |   mr CARG3, CARG4
     |  lwz TAB:TMP2, TAB:CARG2->metatable
     |   li CARG4, 1-vk			// ne = 0 or 1.
     |  cmplwi TAB:TMP2, 0
@@ -3110,7 +3536,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  lbz TMP2, TAB:TMP2->nomm
     |  andix. TMP2, TMP2, 1<<MM_eq
     |  bne <1				// Or 'no __eq' flag set?
-    |  mr PC, SAVE0			// Restore old PC.
+    |  mr PC, SAVE1			// Restore old PC.
     |  b ->vmeta_equal			// Handle __eq metamethod.
     break;
 
@@ -3151,16 +3577,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     vk = op == BC_ISEQN;
     |  // RA = src*8, RD = num_const*8, JMP with RD = target
     |.if DUALNUM
-    |  lwzux TMP0, RA, BASE
+    |  lwzux CARG1, RA, BASE
     |    addi PC, PC, 4
     |   lwz CARG2, 4(RA)
-    |  lwzux TMP1, RD, KBASE
-    |  checknum cr0, TMP0
-    |    lwz TMP2, -4(PC)
-    |  checknum cr1, TMP1
-    |    decode_RD4 TMP2, TMP2
-    |   lwz CARG3, 4(RD)
-    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+    |  lwzux CARG3, RD, KBASE
+    |  checknum cr0, CARG1
+    |    lwz SAVE0, -4(PC)
+    |  checknum cr1, CARG3
+    |    decode_RD4 SAVE0, SAVE0
+    |   lwz CARG4, 4(RD)
+    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
     if (vk) {
       |->BC_ISEQN_Z:
     } else {
@@ -3168,7 +3594,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     }
     |  bne cr0, >7
     |  bne cr1, >8
-    |   cmpw CARG2, CARG3
+    |   cmpw CARG2, CARG4
     |4:
     |.else
     if (vk) {
@@ -3176,20 +3602,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     } else {
       |->BC_ISNEN_Z:  // Dummy label.
     }
-    |  lwzx TMP0, BASE, RA
+    |  lwzx CARG1, BASE, RA
     |    addi PC, PC, 4
     |   lfdx f0, BASE, RA
-    |    lwz TMP2, -4(PC)
+    |    lwz SAVE0, -4(PC)
     |  lfdx f1, KBASE, RD
-    |    decode_RD4 TMP2, TMP2
-    |  checknum TMP0
-    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
+    |    decode_RD4 SAVE0, SAVE0
+    |  checknum CARG1
+    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
     |  bge >3
     |  fcmpu cr0, f0, f1
     |.endif
     if (vk) {
       |  bne >1
-      |  add PC, PC, TMP2
+      |  add PC, PC, SAVE0
       |1:
       |.if not FFI
       |3:
@@ -3200,13 +3626,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |.if not FFI
       |3:
       |.endif
-      |  add PC, PC, TMP2
+      |  add PC, PC, SAVE0
       |2:
     }
     |  ins_next
     |.if FFI
     |3:
-    |  cmpwi TMP0, LJ_TCDATA
+    |  cmpwi CARG1, LJ_TCDATA
     |  beq ->vmeta_equal_cd
     |  b <1
     |.endif
@@ -3214,18 +3640,31 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |7:  // RA is not an integer.
     |  bge cr0, <3
     |  // RA is a number.
-    |   lfd f0, 0(RA)
+    |   .FPU lfd f0, 0(RA)
     |  blt cr1, >1
     |  // RA is a number, RD is an integer.
-    |  tonum_i f1, CARG3
+    |.if FPU
+    |  tonum_i f1, CARG4
+    |.else
+    |  bl ->vm_sfi2d_2
+    |.endif
     |  b >2
     |
     |8: // RA is an integer, RD is a number.
+    |.if FPU
     |  tonum_i f0, CARG2
+    |.else
+    |  bl ->vm_sfi2d_1
+    |.endif
     |1:
-    |  lfd f1, 0(RD)
+    |  .FPU lfd f1, 0(RD)
     |2:
+    |.if FPU
     |  fcmpu cr0, f0, f1
+    |.else
+    |  blex __ledf2
+    |  cmpwi CRET1, 0
+    |.endif
     |  b <4
     |.endif
     break;
@@ -3280,7 +3719,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  add PC, PC, TMP2
     } else {
       |  li TMP1, LJ_TFALSE
+      |.if FPU
       |   lfdx f0, BASE, RD
+      |.else
+      |   lwzux CARG1, RD, BASE
+      |   lwz CARG2, 4(RD)
+      |.endif
       |  cmplw TMP0, TMP1
       if (op == BC_ISTC) {
 	|  bge >1
@@ -3289,7 +3733,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       }
       |  addis PC, PC, -(BCBIAS_J*4 >> 16)
       |  decode_RD4 TMP2, INS
+      |.if FPU
       |   stfdx f0, BASE, RA
+      |.else
+      |   stwux CARG1, RA, BASE
+      |   stw CARG2, 4(RA)
+      |.endif
       |  add PC, PC, TMP2
       |1:
     }
@@ -3324,8 +3773,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
   case BC_MOV:
     |  // RA = dst*8, RD = src*8
     |  ins_next1
+    |.if FPU
     |  lfdx f0, BASE, RD
     |  stfdx f0, BASE, RA
+    |.else
+    |  lwzux TMP0, RD, BASE
+    |  lwz TMP1, 4(RD)
+    |  stwux TMP0, RA, BASE
+    |  stw TMP1, 4(RA)
+    |.endif
     |  ins_next2
     break;
   case BC_NOT:
@@ -3427,44 +3883,65 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
     ||switch (vk) {
     ||case 0:
-    |   lwzx TMP1, BASE, RB
+    |   lwzx CARG1, BASE, RB
     |   .if DUALNUM
-    |     lwzx TMP2, KBASE, RC
+    |     lwzx CARG3, KBASE, RC
     |   .endif
+    |   .if FPU
     |    lfdx f14, BASE, RB
     |    lfdx f15, KBASE, RC
+    |   .else
+    |    add TMP1, BASE, RB
+    |    add TMP2, KBASE, RC
+    |    lwz CARG2, 4(TMP1)
+    |    lwz CARG4, 4(TMP2)
+    |   .endif
     |   .if DUALNUM
-    |     checknum cr0, TMP1
-    |     checknum cr1, TMP2
+    |     checknum cr0, CARG1
+    |     checknum cr1, CARG3
     |     crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
     |     bge ->vmeta_arith_vn
     |   .else
-    |     checknum TMP1; bge ->vmeta_arith_vn
+    |     checknum CARG1; bge ->vmeta_arith_vn
     |   .endif
     ||  break;
     ||case 1:
-    |   lwzx TMP1, BASE, RB
+    |   lwzx CARG1, BASE, RB
     |   .if DUALNUM
-    |     lwzx TMP2, KBASE, RC
+    |     lwzx CARG3, KBASE, RC
     |   .endif
+    |   .if FPU
     |    lfdx f15, BASE, RB
     |    lfdx f14, KBASE, RC
+    |   .else
+    |    add TMP1, BASE, RB
+    |    add TMP2, KBASE, RC
+    |    lwz CARG2, 4(TMP1)
+    |    lwz CARG4, 4(TMP2)
+    |   .endif
     |   .if DUALNUM
-    |     checknum cr0, TMP1
-    |     checknum cr1, TMP2
+    |     checknum cr0, CARG1
+    |     checknum cr1, CARG3
     |     crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
     |     bge ->vmeta_arith_nv
     |   .else
-    |     checknum TMP1; bge ->vmeta_arith_nv
+    |     checknum CARG1; bge ->vmeta_arith_nv
     |   .endif
     ||  break;
     ||default:
-    |   lwzx TMP1, BASE, RB
-    |   lwzx TMP2, BASE, RC
+    |   lwzx CARG1, BASE, RB
+    |   lwzx CARG3, BASE, RC
+    |   .if FPU
     |    lfdx f14, BASE, RB
     |    lfdx f15, BASE, RC
-    |   checknum cr0, TMP1
-    |   checknum cr1, TMP2
+    |   .else
+    |    add TMP1, BASE, RB
+    |    add TMP2, BASE, RC
+    |    lwz CARG2, 4(TMP1)
+    |    lwz CARG4, 4(TMP2)
+    |   .endif
+    |   checknum cr0, CARG1
+    |   checknum cr1, CARG3
     |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
     |   bge ->vmeta_arith_vv
     ||  break;
@@ -3498,48 +3975,78 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  fsub a, b, a			// b - floor(b/c)*c
     |.endmacro
     |
+    |.macro sfpmod
+    |->BC_MODVN_Z:
+    |  stw CARG1, SFSAVE_1
+    |  stw CARG2, SFSAVE_2
+    |  mr SAVE0, CARG3
+    |  mr SAVE1, CARG4
+    |  blex __divdf3
+    |  blex floor
+    |  mr CARG3, SAVE0
+    |  mr CARG4, SAVE1
+    |  blex __muldf3
+    |  mr CARG3, CRET1
+    |  mr CARG4, CRET2
+    |  lwz CARG1, SFSAVE_1
+    |  lwz CARG2, SFSAVE_2
+    |  blex __subdf3
+    |.endmacro
+    |
     |.macro ins_arithfp, fpins
     |  ins_arithpre
     |.if "fpins" == "fpmod_"
     |  b ->BC_MODVN_Z			// Avoid 3 copies. It's slow anyway.
-    |.else
+    |.elif FPU
     |  fpins f0, f14, f15
     |  ins_next1
     |  stfdx f0, BASE, RA
     |  ins_next2
+    |.else
+    |  blex __divdf3			// Only soft-float div uses this macro.
+    |  ins_next1
+    |  stwux CRET1, RA, BASE
+    |  stw CRET2, 4(RA)
+    |  ins_next2
     |.endif
     |.endmacro
     |
-    |.macro ins_arithdn, intins, fpins
+    |.macro ins_arithdn, intins, fpins, fpcall
     |  // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8
     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
     ||switch (vk) {
     ||case 0:
-    |   lwzux TMP1, RB, BASE
-    |   lwzux TMP2, RC, KBASE
-    |    lwz CARG1, 4(RB)
-    |   checknum cr0, TMP1
-    |    lwz CARG2, 4(RC)
+    |   lwzux CARG1, RB, BASE
+    |   lwzux CARG3, RC, KBASE
+    |    lwz CARG2, 4(RB)
+    |   checknum cr0, CARG1
+    |    lwz CARG4, 4(RC)
+    |   checknum cr1, CARG3
     ||  break;
     ||case 1:
-    |   lwzux TMP1, RB, BASE
-    |   lwzux TMP2, RC, KBASE
-    |    lwz CARG2, 4(RB)
-    |   checknum cr0, TMP1
-    |    lwz CARG1, 4(RC)
+    |   lwzux CARG3, RB, BASE
+    |   lwzux CARG1, RC, KBASE
+    |    lwz CARG4, 4(RB)
+    |   checknum cr0, CARG3
+    |    lwz CARG2, 4(RC)
+    |   checknum cr1, CARG1
     ||  break;
     ||default:
-    |   lwzux TMP1, RB, BASE
-    |   lwzux TMP2, RC, BASE
-    |    lwz CARG1, 4(RB)
-    |   checknum cr0, TMP1
-    |    lwz CARG2, 4(RC)
+    |   lwzux CARG1, RB, BASE
+    |   lwzux CARG3, RC, BASE
+    |    lwz CARG2, 4(RB)
+    |   checknum cr0, CARG1
+    |    lwz CARG4, 4(RC)
+    |   checknum cr1, CARG3
     ||  break;
     ||}
-    |  checknum cr1, TMP2
     |  bne >5
     |  bne cr1, >5
-    |  intins CARG1, CARG1, CARG2
+    |.if "intins" == "intmod"
+    |  mr CARG1, CARG2
+    |  mr CARG2, CARG4
+    |.endif
+    |  intins CARG1, CARG2, CARG4
     |  bso >4
     |1:
     |  ins_next1
@@ -3551,29 +4058,40 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  checkov TMP0, <1			// Ignore unrelated overflow.
     |  ins_arithfallback b
     |5:  // FP variant.
+    |.if FPU
     ||if (vk == 1) {
     |  lfd f15, 0(RB)
-    |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
     |  lfd f14, 0(RC)
     ||} else {
     |  lfd f14, 0(RB)
-    |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
     |  lfd f15, 0(RC)
     ||}
+    |.endif
+    |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
     |   ins_arithfallback bge
     |.if "fpins" == "fpmod_"
     |  b ->BC_MODVN_Z			// Avoid 3 copies. It's slow anyway.
     |.else
+    |.if FPU
     |  fpins f0, f14, f15
-    |  ins_next1
     |  stfdx f0, BASE, RA
+    |.else
+    |.if "fpcall" == "sfpmod"
+    |  sfpmod
+    |.else
+    |  blex fpcall
+    |.endif
+    |  stwux CRET1, RA, BASE
+    |  stw CRET2, 4(RA)
+    |.endif
+    |  ins_next1
     |  b <2
     |.endif
     |.endmacro
     |
-    |.macro ins_arith, intins, fpins
+    |.macro ins_arith, intins, fpins, fpcall
     |.if DUALNUM
-    |  ins_arithdn intins, fpins
+    |  ins_arithdn intins, fpins, fpcall
     |.else
     |  ins_arithfp fpins
     |.endif
@@ -3588,9 +4106,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  addo. TMP0, TMP0, TMP3
     |  add y, a, b
     |.endmacro
-    |  ins_arith addo32., fadd
+    |  ins_arith addo32., fadd, __adddf3
     |.else
-    |  ins_arith addo., fadd
+    |  ins_arith addo., fadd, __adddf3
     |.endif
     break;
   case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
@@ -3602,36 +4120,48 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  subo. TMP0, TMP0, TMP3
     |  sub y, a, b
     |.endmacro
-    |  ins_arith subo32., fsub
+    |  ins_arith subo32., fsub, __subdf3
     |.else
-    |  ins_arith subo., fsub
+    |  ins_arith subo., fsub, __subdf3
     |.endif
     break;
   case BC_MULVN: case BC_MULNV: case BC_MULVV:
-    |  ins_arith mullwo., fmul
+    |  ins_arith mullwo., fmul, __muldf3
     break;
   case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
     |  ins_arithfp fdiv
     break;
   case BC_MODVN:
-    |  ins_arith intmod, fpmod
+    |  ins_arith intmod, fpmod, sfpmod
     break;
   case BC_MODNV: case BC_MODVV:
-    |  ins_arith intmod, fpmod_
+    |  ins_arith intmod, fpmod_, sfpmod
     break;
   case BC_POW:
     |  // NYI: (partial) integer arithmetic.
-    |  lwzx TMP1, BASE, RB
+    |  lwzx CARG1, BASE, RB
+    |  lwzx CARG3, BASE, RC
+    |.if FPU
     |   lfdx FARG1, BASE, RB
-    |  lwzx TMP2, BASE, RC
     |   lfdx FARG2, BASE, RC
-    |  checknum cr0, TMP1
-    |  checknum cr1, TMP2
+    |.else
+    |   add TMP1, BASE, RB
+    |   add TMP2, BASE, RC
+    |   lwz CARG2, 4(TMP1)
+    |   lwz CARG4, 4(TMP2)
+    |.endif
+    |  checknum cr0, CARG1
+    |  checknum cr1, CARG3
     |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
     |  bge ->vmeta_arith_vv
     |  blex pow
     |  ins_next1
+    |.if FPU
     |  stfdx FARG1, BASE, RA
+    |.else
+    |  stwux CARG1, RA, BASE
+    |  stw CARG2, 4(RA)
+    |.endif
     |  ins_next2
     break;
 
@@ -3651,8 +4181,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   lp BASE, L->base
     |  bne ->vmeta_binop
     |  ins_next1
+    |.if FPU
     |  lfdx f0, BASE, SAVE0		// Copy result from RB to RA.
     |  stfdx f0, BASE, RA
+    |.else
+    |  lwzux TMP0, SAVE0, BASE
+    |  lwz TMP1, 4(SAVE0)
+    |  stwux TMP0, RA, BASE
+    |  stw TMP1, 4(RA)
+    |.endif
     |  ins_next2
     break;
 
@@ -3715,8 +4252,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
   case BC_KNUM:
     |  // RA = dst*8, RD = num_const*8
     |  ins_next1
+    |.if FPU
     |  lfdx f0, KBASE, RD
     |  stfdx f0, BASE, RA
+    |.else
+    |  lwzux TMP0, RD, KBASE
+    |  lwz TMP1, 4(RD)
+    |  stwux TMP0, RA, BASE
+    |  stw TMP1, 4(RA)
+    |.endif
     |  ins_next2
     break;
   case BC_KPRI:
@@ -3749,8 +4293,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  lwzx UPVAL:RB, LFUNC:RB, RD
     |  ins_next1
     |  lwz TMP1, UPVAL:RB->v
+    |.if FPU
     |  lfd f0, 0(TMP1)
     |  stfdx f0, BASE, RA
+    |.else
+    |  lwz TMP2, 0(TMP1)
+    |  lwz TMP3, 4(TMP1)
+    |  stwux TMP2, RA, BASE
+    |  stw TMP3, 4(RA)
+    |.endif
     |  ins_next2
     break;
   case BC_USETV:
@@ -3758,14 +4309,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  lwz LFUNC:RB, FRAME_FUNC(BASE)
     |    srwi RA, RA, 1
     |    addi RA, RA, offsetof(GCfuncL, uvptr)
+    |.if FPU
     |   lfdux f0, RD, BASE
+    |.else
+    |   lwzux CARG1, RD, BASE
+    |   lwz CARG3, 4(RD)
+    |.endif
     |  lwzx UPVAL:RB, LFUNC:RB, RA
     |  lbz TMP3, UPVAL:RB->marked
     |   lwz CARG2, UPVAL:RB->v
     |  andix. TMP3, TMP3, LJ_GC_BLACK	// isblack(uv)
     |    lbz TMP0, UPVAL:RB->closed
     |   lwz TMP2, 0(RD)
+    |.if FPU
     |   stfd f0, 0(CARG2)
+    |.else
+    |   stw CARG1, 0(CARG2)
+    |   stw CARG3, 4(CARG2)
+    |.endif
     |    cmplwi cr1, TMP0, 0
     |   lwz TMP1, 4(RD)
     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
@@ -3821,11 +4382,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  lwz LFUNC:RB, FRAME_FUNC(BASE)
     |   srwi RA, RA, 1
     |   addi RA, RA, offsetof(GCfuncL, uvptr)
+    |.if FPU
     |    lfdx f0, KBASE, RD
+    |.else
+    |    lwzux TMP2, RD, KBASE
+    |    lwz TMP3, 4(RD)
+    |.endif
     |  lwzx UPVAL:RB, LFUNC:RB, RA
     |  ins_next1
     |  lwz TMP1, UPVAL:RB->v
+    |.if FPU
     |  stfd f0, 0(TMP1)
+    |.else
+    |  stw TMP2, 0(TMP1)
+    |  stw TMP3, 4(TMP1)
+    |.endif
     |  ins_next2
     break;
   case BC_USETP:
@@ -3973,11 +4544,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.endif
     |  ble ->vmeta_tgetv		// Integer key and in array part?
     |  lwzx TMP0, TMP1, TMP2
+    |.if FPU
     |   lfdx f14, TMP1, TMP2
+    |.else
+    |   lwzux SAVE0, TMP1, TMP2
+    |   lwz SAVE1, 4(TMP1)
+    |.endif
     |  checknil TMP0; beq >2
     |1:
     |  ins_next1
+    |.if FPU
     |   stfdx f14, BASE, RA
+    |.else
+    |   stwux SAVE0, RA, BASE
+    |   stw SAVE1, 4(RA)
+    |.endif
     |  ins_next2
     |
     |2:  // Check for __index if table value is nil.
@@ -4053,12 +4634,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  lwz TMP1, TAB:RB->asize
     |   lwz TMP2, TAB:RB->array
     |  cmplw TMP0, TMP1; bge ->vmeta_tgetb
+    |.if FPU
     |  lwzx TMP1, TMP2, RC
     |   lfdx f0, TMP2, RC
+    |.else
+    |  lwzux TMP1, TMP2, RC
+    |   lwz TMP3, 4(TMP2)
+    |.endif
     |  checknil TMP1; beq >5
     |1:
     |  ins_next1
+    |.if FPU
     |   stfdx f0, BASE, RA
+    |.else
+    |   stwux TMP1, RA, BASE
+    |   stw TMP3, 4(RA)
+    |.endif
     |  ins_next2
     |
     |5:  // Check for __index if table value is nil.
@@ -4088,10 +4679,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  cmplw TMP0, CARG2
     |   slwi TMP2, CARG2, 3
     |  ble ->vmeta_tgetr		// In array part?
+    |.if FPU
     |   lfdx f14, TMP1, TMP2
+    |.else
+    |   lwzux SAVE0, TMP2, TMP1
+    |   lwz SAVE1, 4(TMP2)
+    |.endif
     |->BC_TGETR_Z:
     |  ins_next1
+    |.if FPU
     |   stfdx f14, BASE, RA
+    |.else
+    |   stwux SAVE0, RA, BASE
+    |   stw SAVE1, 4(RA)
+    |.endif
     |  ins_next2
     break;
 
@@ -4132,11 +4733,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ble ->vmeta_tsetv		// Integer key and in array part?
     |   lwzx TMP2, TMP1, TMP0
     |  lbz TMP3, TAB:RB->marked
+    |.if FPU
     |    lfdx f14, BASE, RA
+    |.else
+    |    add SAVE1, BASE, RA
+    |    lwz SAVE0, 0(SAVE1)
+    |    lwz SAVE1, 4(SAVE1)
+    |.endif
     |   checknil TMP2; beq >3
     |1:
     |  andix. TMP2, TMP3, LJ_GC_BLACK	// isblack(table)
+    |.if FPU
     |    stfdx f14, TMP1, TMP0
+    |.else
+    |    stwux SAVE0, TMP1, TMP0
+    |    stw SAVE1, 4(TMP1)
+    |.endif
     |  bne >7
     |2:
     |  ins_next
@@ -4177,7 +4789,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  lwz NODE:TMP2, TAB:RB->node
     |    stb ZERO, TAB:RB->nomm		// Clear metamethod cache.
     |  and TMP1, TMP1, TMP0		// idx = str->hash & tab->hmask
+    |.if FPU
     |    lfdx f14, BASE, RA
+    |.else
+    |    add CARG2, BASE, RA
+    |    lwz SAVE0, 0(CARG2)
+    |    lwz SAVE1, 4(CARG2)
+    |.endif
     |  slwi TMP0, TMP1, 5
     |  slwi TMP1, TMP1, 3
     |  sub TMP1, TMP0, TMP1
@@ -4193,7 +4811,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |    checknil CARG2; beq >4		// Key found, but nil value?
     |2:
     |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
+    |.if FPU
     |    stfd f14, NODE:TMP2->val
+    |.else
+    |    stw SAVE0, NODE:TMP2->val.u32.hi
+    |    stw SAVE1, NODE:TMP2->val.u32.lo
+    |.endif
     |  bne >7
     |3:
     |  ins_next
@@ -4232,7 +4855,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  bl extern lj_tab_newkey		// (lua_State *L, GCtab *t, TValue *k)
     |  // Returns TValue *.
     |  lp BASE, L->base
+    |.if FPU
     |  stfd f14, 0(CRET1)
+    |.else
+    |  stw SAVE0, 0(CRET1)
+    |  stw SAVE1, 4(CRET1)
+    |.endif
     |  b <3				// No 2nd write barrier needed.
     |
     |7:  // Possible table write barrier for the value. Skip valiswhite check.
@@ -4249,13 +4877,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   lwz TMP2, TAB:RB->array
     |    lbz TMP3, TAB:RB->marked
     |  cmplw TMP0, TMP1
+    |.if FPU
     |   lfdx f14, BASE, RA
+    |.else
+    |   add CARG2, BASE, RA
+    |   lwz SAVE0, 0(CARG2)
+    |   lwz SAVE1, 4(CARG2)
+    |.endif
     |  bge ->vmeta_tsetb
     |  lwzx TMP1, TMP2, RC
     |  checknil TMP1; beq >5
     |1:
     |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
+    |.if FPU
     |   stfdx f14, TMP2, RC
+    |.else
+    |   stwux SAVE0, RC, TMP2
+    |   stw SAVE1, 4(RC)
+    |.endif
     |  bne >7
     |2:
     |  ins_next
@@ -4295,10 +4934,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |2:
     |  cmplw TMP0, CARG3
     |   slwi TMP2, CARG3, 3
+    |.if FPU
     |   lfdx f14, BASE, RA
+    |.else
+    |  lwzux SAVE0, RA, BASE
+    |  lwz SAVE1, 4(RA)
+    |.endif
     |  ble ->vmeta_tsetr		// In array part?
     |  ins_next1
+    |.if FPU
     |   stfdx f14, TMP1, TMP2
+    |.else
+    |   stwux SAVE0, TMP1, TMP2
+    |   stw SAVE1, 4(TMP1)
+    |.endif
     |  ins_next2
     |
     |7:  // Possible table write barrier for the value. Skip valiswhite check.
@@ -4328,10 +4977,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   add TMP1, TMP1, TMP0
     |    andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
     |3:  // Copy result slots to table.
+    |.if FPU
     |   lfd f0, 0(RA)
+    |.else
+    |   lwz SAVE0, 0(RA)
+    |   lwz SAVE1, 4(RA)
+    |.endif
     |  addi RA, RA, 8
     |  cmpw cr1, RA, TMP2
+    |.if FPU
     |   stfd f0, 0(TMP1)
+    |.else
+    |   stw SAVE0, 0(TMP1)
+    |   stw SAVE1, 4(TMP1)
+    |.endif
     |    addi TMP1, TMP1, 8
     |  blt cr1, <3
     |  bne >7
@@ -4398,9 +5057,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |    beq cr1, >3
     |2:
     |  addi TMP3, TMP2, 8
+    |.if FPU
     |   lfdx f0, RA, TMP2
+    |.else
+    |   add CARG3, RA, TMP2
+    |   lwz CARG1, 0(CARG3)
+    |   lwz CARG2, 4(CARG3)
+    |.endif
     |  cmplw cr1, TMP3, NARGS8:RC
+    |.if FPU
     |   stfdx f0, BASE, TMP2
+    |.else
+    |   stwux CARG1, TMP2, BASE
+    |   stw CARG2, 4(TMP2)
+    |.endif
     |  mr TMP2, TMP3
     |  bne cr1, <2
     |3:
@@ -4433,14 +5103,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  add BASE, BASE, RA
     |  lwz TMP1, -24(BASE)
     |   lwz LFUNC:RB, -20(BASE)
+    |.if FPU
     |    lfd f1, -8(BASE)
     |    lfd f0, -16(BASE)
+    |.else
+    |    lwz CARG1, -8(BASE)
+    |    lwz CARG2, -4(BASE)
+    |    lwz CARG3, -16(BASE)
+    |    lwz CARG4, -12(BASE)
+    |.endif
     |  stw TMP1, 0(BASE)		// Copy callable.
     |   stw LFUNC:RB, 4(BASE)
     |  checkfunc TMP1
-    |    stfd f1, 16(BASE)		// Copy control var.
     |     li NARGS8:RC, 16		// Iterators get 2 arguments.
+    |.if FPU
+    |    stfd f1, 16(BASE)		// Copy control var.
     |    stfdu f0, 8(BASE)		// Copy state.
+    |.else
+    |    stw CARG1, 16(BASE)		// Copy control var.
+    |    stw CARG2, 20(BASE)
+    |    stwu CARG3, 8(BASE)		// Copy state.
+    |    stw CARG4, 4(BASE)
+    |.endif
     |  bne ->vmeta_call
     |  ins_call
     break;
@@ -4461,7 +5145,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   slwi TMP3, RC, 3
     |  bge >5				// Index points after array part?
     |  lwzx TMP2, TMP1, TMP3
+    |.if FPU
     |   lfdx f0, TMP1, TMP3
+    |.else
+    |   lwzux CARG1, TMP3, TMP1
+    |   lwz CARG2, 4(TMP3)
+    |.endif
     |  checknil TMP2
     |     lwz INS, -4(PC)
     |  beq >4
@@ -4473,7 +5162,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.endif
     |    addi RC, RC, 1
     |     addis TMP3, PC, -(BCBIAS_J*4 >> 16)
+    |.if FPU
     |  stfd f0, 8(RA)
+    |.else
+    |  stw CARG1, 8(RA)
+    |  stw CARG2, 12(RA)
+    |.endif
     |     decode_RD4 TMP1, INS
     |    stw RC, -4(RA)			// Update control var.
     |     add PC, TMP1, TMP3
@@ -4498,17 +5192,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   slwi RB, RC, 3
     |   sub TMP3, TMP3, RB
     |  lwzx RB, TMP2, TMP3
+    |.if FPU
     |  lfdx f0, TMP2, TMP3
+    |.else
+    |  add CARG3, TMP2, TMP3
+    |  lwz CARG1, 0(CARG3)
+    |  lwz CARG2, 4(CARG3)
+    |.endif
     |   add NODE:TMP3, TMP2, TMP3
     |  checknil RB
     |     lwz INS, -4(PC)
     |  beq >7
+    |.if FPU
     |   lfd f1, NODE:TMP3->key
+    |.else
+    |   lwz CARG3, NODE:TMP3->key.u32.hi
+    |   lwz CARG4, NODE:TMP3->key.u32.lo
+    |.endif
     |     addis TMP2, PC, -(BCBIAS_J*4 >> 16)
+    |.if FPU
     |  stfd f0, 8(RA)
+    |.else
+    |  stw CARG1, 8(RA)
+    |  stw CARG2, 12(RA)
+    |.endif
     |    add RC, RC, TMP0
     |     decode_RD4 TMP1, INS
+    |.if FPU
     |   stfd f1, 0(RA)
+    |.else
+    |   stw CARG3, 0(RA)
+    |   stw CARG4, 4(RA)
+    |.endif
     |    addi RC, RC, 1
     |     add PC, TMP1, TMP2
     |    stw RC, -4(RA)			// Update control var.
@@ -4574,9 +5289,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   subi TMP2, TMP2, 16
     |   ble >2				// No vararg slots?
     |1:  // Copy vararg slots to destination slots.
+    |.if FPU
     |  lfd f0, 0(RC)
+    |.else
+    |  lwz CARG1, 0(RC)
+    |  lwz CARG2, 4(RC)
+    |.endif
     |   addi RC, RC, 8
+    |.if FPU
     |  stfd f0, 0(RA)
+    |.else
+    |  stw CARG1, 0(RA)
+    |  stw CARG2, 4(RA)
+    |.endif
     |  cmplw RA, TMP2
     |   cmplw cr1, RC, TMP3
     |  bge >3				// All destination slots filled?
@@ -4599,9 +5324,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   addi MULTRES, TMP1, 8
     |  bgt >7
     |6:
+    |.if FPU
     |  lfd f0, 0(RC)
+    |.else
+    |  lwz CARG1, 0(RC)
+    |  lwz CARG2, 4(RC)
+    |.endif
     |   addi RC, RC, 8
+    |.if FPU
     |  stfd f0, 0(RA)
+    |.else
+    |  stw CARG1, 0(RA)
+    |  stw CARG2, 4(RA)
+    |.endif
     |  cmplw RC, TMP3
     |   addi RA, RA, 8
     |  blt <6				// More vararg slots?
@@ -4652,14 +5387,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   li TMP1, 0
     |2:
     |  addi TMP3, TMP1, 8
+    |.if FPU
     |   lfdx f0, RA, TMP1
+    |.else
+    |   add CARG3, RA, TMP1
+    |   lwz CARG1, 0(CARG3)
+    |   lwz CARG2, 4(CARG3)
+    |.endif
     |  cmpw TMP3, RC
+    |.if FPU
     |   stfdx f0, TMP2, TMP1
+    |.else
+    |   add CARG3, TMP2, TMP1
+    |   stw CARG1, 0(CARG3)
+    |   stw CARG2, 4(CARG3)
+    |.endif
     |  beq >3
     |  addi TMP1, TMP3, 8
+    |.if FPU
     |   lfdx f1, RA, TMP3
+    |.else
+    |   add CARG3, RA, TMP3
+    |   lwz CARG1, 0(CARG3)
+    |   lwz CARG2, 4(CARG3)
+    |.endif
     |  cmpw TMP1, RC
+    |.if FPU
     |   stfdx f1, TMP2, TMP3
+    |.else
+    |   add CARG3, TMP2, TMP3
+    |   stw CARG1, 0(CARG3)
+    |   stw CARG2, 4(CARG3)
+    |.endif
     |  bne <2
     |3:
     |5:
@@ -4701,8 +5460,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   subi TMP2, BASE, 8
     |  decode_RB8 RB, INS
     if (op == BC_RET1) {
+      |.if FPU
       |  lfd f0, 0(RA)
       |  stfd f0, 0(TMP2)
+      |.else
+      |  lwz CARG1, 0(RA)
+      |  lwz CARG2, 4(RA)
+      |  stw CARG1, 0(TMP2)
+      |  stw CARG2, 4(TMP2)
+      |.endif
     }
     |5:
     |  cmplw RB, RD
@@ -4763,11 +5529,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |4:
       |  stw CARG1, FORL_IDX*8+4(RA)
     } else {
-      |  lwz TMP3, FORL_STEP*8(RA)
+      |  lwz SAVE0, FORL_STEP*8(RA)
       |   lwz CARG3, FORL_STEP*8+4(RA)
       |  lwz TMP2, FORL_STOP*8(RA)
       |   lwz CARG2, FORL_STOP*8+4(RA)
-      |  cmplw cr7, TMP3, TISNUM
+      |  cmplw cr7, SAVE0, TISNUM
       |  cmplw cr1, TMP2, TISNUM
       |  crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
       |  crand 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
@@ -4810,41 +5576,80 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     if (vk) {
       |.if DUALNUM
       |9:  // FP loop.
+      |.if FPU
       |  lfd f1, FORL_IDX*8(RA)
       |.else
+      |  lwz CARG1, FORL_IDX*8(RA)
+      |  lwz CARG2, FORL_IDX*8+4(RA)
+      |.endif
+      |.else
       |  lfdux f1, RA, BASE
       |.endif
+      |.if FPU
       |  lfd f3, FORL_STEP*8(RA)
       |  lfd f2, FORL_STOP*8(RA)
-      |   lwz TMP3, FORL_STEP*8(RA)
       |  fadd f1, f1, f3
       |  stfd f1, FORL_IDX*8(RA)
+      |.else
+      |  lwz CARG3, FORL_STEP*8(RA)
+      |  lwz CARG4, FORL_STEP*8+4(RA)
+      |  mr SAVE1, RD
+      |  blex __adddf3
+      |  mr RD, SAVE1
+      |  stw CRET1, FORL_IDX*8(RA)
+      |  stw CRET2, FORL_IDX*8+4(RA)
+      |  lwz CARG3, FORL_STOP*8(RA)
+      |  lwz CARG4, FORL_STOP*8+4(RA)
+      |.endif
+      |   lwz SAVE0, FORL_STEP*8(RA)
     } else {
       |.if DUALNUM
       |9:  // FP loop.
       |.else
       |  lwzux TMP1, RA, BASE
-      |  lwz TMP3, FORL_STEP*8(RA)
+      |  lwz SAVE0, FORL_STEP*8(RA)
       |  lwz TMP2, FORL_STOP*8(RA)
       |  cmplw cr0, TMP1, TISNUM
-      |  cmplw cr7, TMP3, TISNUM
+      |  cmplw cr7, SAVE0, TISNUM
       |  cmplw cr1, TMP2, TISNUM
       |.endif
+      |.if FPU
       |   lfd f1, FORL_IDX*8(RA)
+      |.else
+      |   lwz CARG1, FORL_IDX*8(RA)
+      |   lwz CARG2, FORL_IDX*8+4(RA)
+      |.endif
       |  crand 4*cr0+lt, 4*cr0+lt, 4*cr7+lt
       |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
+      |.if FPU
       |   lfd f2, FORL_STOP*8(RA)
+      |.else
+      |   lwz CARG3, FORL_STOP*8(RA)
+      |   lwz CARG4, FORL_STOP*8+4(RA)
+      |.endif
       |  bge ->vmeta_for
     }
-    |  cmpwi cr6, TMP3, 0
+    |  cmpwi cr6, SAVE0, 0
     if (op != BC_JFORL) {
       |  srwi RD, RD, 1
     }
+    |.if FPU
     |   stfd f1, FORL_EXT*8(RA)
+    |.else
+    |   stw CARG1, FORL_EXT*8(RA)
+    |   stw CARG2, FORL_EXT*8+4(RA)
+    |.endif
     if (op != BC_JFORL) {
       |  add RD, PC, RD
     }
+    |.if FPU
     |  fcmpu cr0, f1, f2
+    |.else
+    |  mr SAVE1, RD
+    |  blex __ledf2
+    |  cmpwi CRET1, 0
+    |  mr RD, SAVE1
+    |.endif
     if (op == BC_JFORI) {
       |  addis PC, RD, -(BCBIAS_J*4 >> 16)
     }
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (4 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 11:46   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 14:33   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
                   ` (15 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
Sponsored by Cisco Systems, Inc.

(cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)

The software floating-point library is used on machines that have no
hardware floating-point support [1]. This patch enables support for such
machines in the JIT compiler backend for PowerPC.
This includes:
* All FP-dependent paths are guarded with the `LJ_SOFTFP` macro.
* `asm_sfpmin_max()` is introduced for soft-float min/max operations.
* `asm_sfpcomp()` is introduced for soft-float comparisons.
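
For illustration only, here is a minimal sketch of the result convention
that a soft-float comparison routine such as `__ledf2` follows according
to [1]: the result is less than or equal to zero when neither argument
is NaN and a <= b. The names `sketch_ledf2()` and `sketch_le()` are
hypothetical and are not part of this patch or of libgcc; the sketch
uses hardware doubles only to model the convention.

```c
#include <math.h>

/*
** Hypothetical model of the __ledf2 result convention:
** <= 0 if neither argument is NaN and a <= b, > 0 otherwise.
*/
static int sketch_ledf2(double a, double b)
{
  if (isnan(a) || isnan(b)) return 1;  /* Unordered: report "not <=". */
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

/* The caller tests the integer result against zero. */
static int sketch_le(double a, double b)
{
  return sketch_ledf2(a, b) <= 0;
}
```

The soft-float branch of the FORL code in the interpreter patch tests
the result the same way: `blex __ledf2` followed by `cmpwi CRET1, 0`.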

[1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
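
As a rough illustration of why the soft-float paths handle numbers as
pairs of 32-bit values (a number result plus its `IR_HIOP`), the sketch
below splits an IEEE 754 double into a high and a low word and joins
them again. `SplitNum`, `split_num()` and `join_num()` are hypothetical
names used only here, and the sketch assumes a 64-bit IEEE 754 `double`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: a double carried as two 32-bit words. */
typedef struct { uint32_t hi, lo; } SplitNum;

/* Split a double into its high and low 32-bit halves. */
static SplitNum split_num(double d)
{
  uint64_t bits;
  SplitNum s;
  memcpy(&bits, &d, sizeof(bits));  /* Bit copy without aliasing issues. */
  s.hi = (uint32_t)(bits >> 32);    /* Sign, exponent, top of mantissa. */
  s.lo = (uint32_t)bits;            /* Rest of the mantissa. */
  return s;
}

/* Reassemble the double from the two halves. */
static double join_num(SplitNum s)
{
  uint64_t bits = ((uint64_t)s.hi << 32) | s.lo;
  double d;
  memcpy(&d, &bits, sizeof(d));
  return d;
}
```

Without FPRs, each half lives in its own GPR or 4-byte memory slot,
which is why the soft-float branches in the hunks below replace a single
`lfd`/`stfd` with a pair of `lwz`/`stw` at offsets 0 and 4.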

Sergey Kaplun:
* added the description for the feature

Part of tarantool/tarantool#8825
---
 src/lj_arch.h    |   1 -
 src/lj_asm_ppc.h | 321 ++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 278 insertions(+), 44 deletions(-)

diff --git a/src/lj_arch.h b/src/lj_arch.h
index 8bb8757d..7397492e 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -281,7 +281,6 @@
 #endif
 
 #if LJ_ABI_SOFTFP
-#define LJ_ARCH_NOJIT		1  /* NYI */
 #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
 #else
 #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
index aa2d45c0..6cb608f7 100644
--- a/src/lj_asm_ppc.h
+++ b/src/lj_asm_ppc.h
@@ -226,6 +226,7 @@ static void asm_fusexrefx(ASMState *as, PPCIns pi, Reg rt, IRRef ref,
   emit_tab(as, pi, rt, left, right);
 }
 
+#if !LJ_SOFTFP
 /* Fuse to multiply-add/sub instruction. */
 static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
 {
@@ -245,6 +246,7 @@ static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
   }
   return 0;
 }
+#endif
 
 /* -- Calls --------------------------------------------------------------- */
 
@@ -253,13 +255,17 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
 {
   uint32_t n, nargs = CCI_XNARGS(ci);
   int32_t ofs = 8;
-  Reg gpr = REGARG_FIRSTGPR, fpr = REGARG_FIRSTFPR;
+  Reg gpr = REGARG_FIRSTGPR;
+#if !LJ_SOFTFP
+  Reg fpr = REGARG_FIRSTFPR;
+#endif
   if ((void *)ci->func)
     emit_call(as, (void *)ci->func);
   for (n = 0; n < nargs; n++) {  /* Setup args. */
     IRRef ref = args[n];
     if (ref) {
       IRIns *ir = IR(ref);
+#if !LJ_SOFTFP
       if (irt_isfp(ir->t)) {
 	if (fpr <= REGARG_LASTFPR) {
 	  lua_assert(rset_test(as->freeset, fpr));  /* Already evicted. */
@@ -271,7 +277,9 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
 	  emit_spstore(as, ir, r, ofs);
 	  ofs += irt_isnum(ir->t) ? 8 : 4;
 	}
-      } else {
+      } else
+#endif
+      {
 	if (gpr <= REGARG_LASTGPR) {
 	  lua_assert(rset_test(as->freeset, gpr));  /* Already evicted. */
 	  ra_leftov(as, gpr, ref);
@@ -290,8 +298,10 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
     }
     checkmclim(as);
   }
+#if !LJ_SOFTFP
   if ((ci->flags & CCI_VARARG))  /* Vararg calls need to know about FPR use. */
     emit_tab(as, fpr == REGARG_FIRSTFPR ? PPCI_CRXOR : PPCI_CREQV, 6, 6, 6);
+#endif
 }
 
 /* Setup result reg/sp for call. Evict scratch regs. */
@@ -299,8 +309,10 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
 {
   RegSet drop = RSET_SCRATCH;
   int hiop = ((ir+1)->o == IR_HIOP && !irt_isnil((ir+1)->t));
+#if !LJ_SOFTFP
   if ((ci->flags & CCI_NOFPRCLOBBER))
     drop &= ~RSET_FPR;
+#endif
   if (ra_hasreg(ir->r))
     rset_clear(drop, ir->r);  /* Dest reg handled below. */
   if (hiop && ra_hasreg((ir+1)->r))
@@ -308,7 +320,7 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
   ra_evictset(as, drop);  /* Evictions must be performed first. */
   if (ra_used(ir)) {
     lua_assert(!irt_ispri(ir->t));
-    if (irt_isfp(ir->t)) {
+    if (!LJ_SOFTFP && irt_isfp(ir->t)) {
       if ((ci->flags & CCI_CASTU64)) {
 	/* Use spill slot or temp slots. */
 	int32_t ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
@@ -377,6 +389,7 @@ static void asm_retf(ASMState *as, IRIns *ir)
 
 /* -- Type conversions ---------------------------------------------------- */
 
+#if !LJ_SOFTFP
 static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
 {
   RegSet allow = RSET_FPR;
@@ -409,15 +422,23 @@ static void asm_tobit(ASMState *as, IRIns *ir)
   emit_fai(as, PPCI_STFD, tmp, RID_SP, SPOFS_TMP);
   emit_fab(as, PPCI_FADD, tmp, left, right);
 }
+#endif
 
 static void asm_conv(ASMState *as, IRIns *ir)
 {
   IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
+#if !LJ_SOFTFP
   int stfp = (st == IRT_NUM || st == IRT_FLOAT);
+#endif
   IRRef lref = ir->op1;
-  lua_assert(irt_type(ir->t) != st);
   lua_assert(!(irt_isint64(ir->t) ||
 	       (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
+#if LJ_SOFTFP
+  /* FP conversions are handled by SPLIT. */
+  lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
+  /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
+#else
+  lua_assert(irt_type(ir->t) != st);
   if (irt_isfp(ir->t)) {
     Reg dest = ra_dest(as, ir, RSET_FPR);
     if (stfp) {  /* FP to FP conversion. */
@@ -476,7 +497,9 @@ static void asm_conv(ASMState *as, IRIns *ir)
 	emit_fb(as, PPCI_FCTIWZ, tmp, left);
       }
     }
-  } else {
+  } else
+#endif
+  {
     Reg dest = ra_dest(as, ir, RSET_GPR);
     if (st >= IRT_I8 && st <= IRT_U16) {  /* Extend to 32 bit integer. */
       Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
@@ -496,17 +519,41 @@ static void asm_strto(ASMState *as, IRIns *ir)
 {
   const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
   IRRef args[2];
-  int32_t ofs;
+  int32_t ofs = SPOFS_TMP;
+#if LJ_SOFTFP
+  ra_evictset(as, RSET_SCRATCH);
+  if (ra_used(ir)) {
+    if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
+	(ir->s & 1) == LJ_BE && (ir->s ^ 1) == (ir+1)->s) {
+      int i;
+      for (i = 0; i < 2; i++) {
+	Reg r = (ir+i)->r;
+	if (ra_hasreg(r)) {
+	  ra_free(as, r);
+	  ra_modified(as, r);
+	  emit_spload(as, ir+i, r, sps_scale((ir+i)->s));
+	}
+      }
+      ofs = sps_scale(ir->s & ~1);
+    } else {
+      Reg rhi = ra_dest(as, ir+1, RSET_GPR);
+      Reg rlo = ra_dest(as, ir, rset_exclude(RSET_GPR, rhi));
+      emit_tai(as, PPCI_LWZ, rhi, RID_SP, ofs);
+      emit_tai(as, PPCI_LWZ, rlo, RID_SP, ofs+4);
+    }
+  }
+#else
   RegSet drop = RSET_SCRATCH;
   if (ra_hasreg(ir->r)) rset_set(drop, ir->r);  /* Spill dest reg (if any). */
   ra_evictset(as, drop);
+  if (ir->s) ofs = sps_scale(ir->s);
+#endif
   asm_guardcc(as, CC_EQ);
   emit_ai(as, PPCI_CMPWI, RID_RET, 0);  /* Test return status. */
   args[0] = ir->op1;      /* GCstr *str */
   args[1] = ASMREF_TMP1;  /* TValue *n  */
   asm_gencall(as, ci, args);
   /* Store the result to the spill slot or temp slots. */
-  ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
   emit_tai(as, PPCI_ADDI, ra_releasetmp(as, ASMREF_TMP1), RID_SP, ofs);
 }
 
@@ -530,7 +577,10 @@ static void asm_tvptr(ASMState *as, Reg dest, IRRef ref)
       Reg src = ra_alloc1(as, ref, allow);
       emit_setgl(as, src, tmptv.gcr);
     }
-    type = ra_allock(as, irt_toitype(ir->t), allow);
+    if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
+      type = ra_alloc1(as, ref+1, allow);
+    else
+      type = ra_allock(as, irt_toitype(ir->t), allow);
     emit_setgl(as, type, tmptv.it);
   }
 }
@@ -574,11 +624,27 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
   Reg tisnum = RID_NONE, tmpnum = RID_NONE;
   IRRef refkey = ir->op2;
   IRIns *irkey = IR(refkey);
+  int isk = irref_isk(refkey);
   IRType1 kt = irkey->t;
   uint32_t khash;
   MCLabel l_end, l_loop, l_next;
 
   rset_clear(allow, tab);
+#if LJ_SOFTFP
+  if (!isk) {
+    key = ra_alloc1(as, refkey, allow);
+    rset_clear(allow, key);
+    if (irkey[1].o == IR_HIOP) {
+      if (ra_hasreg((irkey+1)->r)) {
+	tmpnum = (irkey+1)->r;
+	ra_noweak(as, tmpnum);
+      } else {
+	tmpnum = ra_allocref(as, refkey+1, allow);
+      }
+      rset_clear(allow, tmpnum);
+    }
+  }
+#else
   if (irt_isnum(kt)) {
     key = ra_alloc1(as, refkey, RSET_FPR);
     tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
@@ -588,6 +654,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
     key = ra_alloc1(as, refkey, allow);
     rset_clear(allow, key);
   }
+#endif
   tmp2 = ra_scratch(as, allow);
   rset_clear(allow, tmp2);
 
@@ -610,7 +677,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
     asm_guardcc(as, CC_EQ);
   else
     emit_condbranch(as, PPCI_BC|PPCF_Y, CC_EQ, l_end);
-  if (irt_isnum(kt)) {
+  if (!LJ_SOFTFP && irt_isnum(kt)) {
     emit_fab(as, PPCI_FCMPU, 0, tmpnum, key);
     emit_condbranch(as, PPCI_BC, CC_GE, l_next);
     emit_ab(as, PPCI_CMPLW, tmp1, tisnum);
@@ -620,7 +687,10 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
       emit_ab(as, PPCI_CMPW, tmp2, key);
       emit_condbranch(as, PPCI_BC, CC_NE, l_next);
     }
-    emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
+    if (LJ_SOFTFP && ra_hasreg(tmpnum))
+      emit_ab(as, PPCI_CMPW, tmp1, tmpnum);
+    else
+      emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
     if (!irt_ispri(kt))
       emit_tai(as, PPCI_LWZ, tmp2, dest, (int32_t)offsetof(Node, key.gcr));
   }
@@ -629,19 +699,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
 	    (((char *)as->mcp-(char *)l_loop) & 0xffffu);
 
   /* Load main position relative to tab->node into dest. */
-  khash = irref_isk(refkey) ? ir_khash(irkey) : 1;
+  khash = isk ? ir_khash(irkey) : 1;
   if (khash == 0) {
     emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
   } else {
     Reg tmphash = tmp1;
-    if (irref_isk(refkey))
+    if (isk)
       tmphash = ra_allock(as, khash, allow);
     emit_tab(as, PPCI_ADD, dest, dest, tmp1);
     emit_tai(as, PPCI_MULLI, tmp1, tmp1, sizeof(Node));
     emit_asb(as, PPCI_AND, tmp1, tmp2, tmphash);
     emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
     emit_tai(as, PPCI_LWZ, tmp2, tab, (int32_t)offsetof(GCtab, hmask));
-    if (irref_isk(refkey)) {
+    if (isk) {
       /* Nothing to do. */
     } else if (irt_isstr(kt)) {
       emit_tai(as, PPCI_LWZ, tmp1, key, (int32_t)offsetof(GCstr, hash));
@@ -651,13 +721,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
       emit_asb(as, PPCI_XOR, tmp1, tmp1, tmp2);
       emit_rotlwi(as, tmp1, tmp1, (HASH_ROT2+HASH_ROT1)&31);
       emit_tab(as, PPCI_SUBF, tmp2, dest, tmp2);
-      if (irt_isnum(kt)) {
+      if (LJ_SOFTFP ? (irkey[1].o == IR_HIOP) : irt_isnum(kt)) {
+#if LJ_SOFTFP
+	emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
+	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
+	emit_tab(as, PPCI_ADD, tmp1, tmpnum, tmpnum);
+#else
 	int32_t ofs = ra_spill(as, irkey);
 	emit_asb(as, PPCI_XOR, tmp2, tmp2, tmp1);
 	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
 	emit_tab(as, PPCI_ADD, tmp1, tmp1, tmp1);
 	emit_tai(as, PPCI_LWZ, tmp2, RID_SP, ofs+4);
 	emit_tai(as, PPCI_LWZ, tmp1, RID_SP, ofs);
+#endif
       } else {
 	emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
 	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
@@ -784,8 +860,8 @@ static PPCIns asm_fxloadins(IRIns *ir)
   case IRT_U8: return PPCI_LBZ;
   case IRT_I16: return PPCI_LHA;
   case IRT_U16: return PPCI_LHZ;
-  case IRT_NUM: return PPCI_LFD;
-  case IRT_FLOAT: return PPCI_LFS;
+  case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_LFD;
+  case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_LFS;
   default: return PPCI_LWZ;
   }
 }
@@ -795,8 +871,8 @@ static PPCIns asm_fxstoreins(IRIns *ir)
   switch (irt_type(ir->t)) {
   case IRT_I8: case IRT_U8: return PPCI_STB;
   case IRT_I16: case IRT_U16: return PPCI_STH;
-  case IRT_NUM: return PPCI_STFD;
-  case IRT_FLOAT: return PPCI_STFS;
+  case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_STFD;
+  case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_STFS;
   default: return PPCI_STW;
   }
 }
@@ -839,7 +915,8 @@ static void asm_fstore(ASMState *as, IRIns *ir)
 
 static void asm_xload(ASMState *as, IRIns *ir)
 {
-  Reg dest = ra_dest(as, ir, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
+  Reg dest = ra_dest(as, ir,
+    (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
   lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
   if (irt_isi8(ir->t))
     emit_as(as, PPCI_EXTSB, dest, dest);
@@ -857,7 +934,8 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
     Reg src = ra_alloc1(as, irb->op1, RSET_GPR);
     asm_fusexrefx(as, PPCI_STWBRX, src, ir->op1, rset_exclude(RSET_GPR, src));
   } else {
-    Reg src = ra_alloc1(as, ir->op2, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
+    Reg src = ra_alloc1(as, ir->op2,
+      (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
     asm_fusexref(as, asm_fxstoreins(ir), src, ir->op1,
 		 rset_exclude(RSET_GPR, src), ofs);
   }
@@ -871,10 +949,19 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
   Reg dest = RID_NONE, type = RID_TMP, tmp = RID_TMP, idx;
   RegSet allow = RSET_GPR;
   int32_t ofs = AHUREF_LSX;
+  if (LJ_SOFTFP && (ir+1)->o == IR_HIOP) {
+    t.irt = IRT_NUM;
+    if (ra_used(ir+1)) {
+      type = ra_dest(as, ir+1, allow);
+      rset_clear(allow, type);
+    }
+    ofs = 0;
+  }
   if (ra_used(ir)) {
-    lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
-    if (!irt_isnum(t)) ofs = 0;
-    dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
+    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
+	       irt_isint(ir->t) || irt_isaddr(ir->t));
+    if (LJ_SOFTFP || !irt_isnum(t)) ofs = 0;
+    dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
     rset_clear(allow, dest);
   }
   idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
@@ -883,12 +970,13 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
     asm_guardcc(as, CC_GE);
     emit_ab(as, PPCI_CMPLW, type, tisnum);
     if (ra_hasreg(dest)) {
-      if (ofs == AHUREF_LSX) {
+      if (!LJ_SOFTFP && ofs == AHUREF_LSX) {
 	tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_GPR,
 						       (idx&255)), (idx>>8)));
 	emit_fab(as, PPCI_LFDX, dest, (idx&255), tmp);
       } else {
-	emit_fai(as, PPCI_LFD, dest, idx, ofs);
+	emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest, idx,
+		 ofs+4*LJ_SOFTFP);
       }
     }
   } else {
@@ -911,7 +999,7 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
   int32_t ofs = AHUREF_LSX;
   if (ir->r == RID_SINK)
     return;
-  if (irt_isnum(ir->t)) {
+  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
     src = ra_alloc1(as, ir->op2, RSET_FPR);
   } else {
     if (!irt_ispri(ir->t)) {
@@ -919,11 +1007,14 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
       rset_clear(allow, src);
       ofs = 0;
     }
-    type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
+    if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
+      type = ra_alloc1(as, (ir+1)->op2, allow);
+    else
+      type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
     rset_clear(allow, type);
   }
   idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
-  if (irt_isnum(ir->t)) {
+  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
     if (ofs == AHUREF_LSX) {
       emit_fab(as, PPCI_STFDX, src, (idx&255), RID_TMP);
       emit_slwi(as, RID_TMP, (idx>>8), 3);
@@ -948,21 +1039,33 @@ static void asm_sload(ASMState *as, IRIns *ir)
   IRType1 t = ir->t;
   Reg dest = RID_NONE, type = RID_NONE, base;
   RegSet allow = RSET_GPR;
+  int hiop = (LJ_SOFTFP && (ir+1)->o == IR_HIOP);
+  if (hiop)
+    t.irt = IRT_NUM;
   lua_assert(!(ir->op2 & IRSLOAD_PARENT));  /* Handled by asm_head_side(). */
-  lua_assert(irt_isguard(t) || !(ir->op2 & IRSLOAD_TYPECHECK));
+  lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
   lua_assert(LJ_DUALNUM ||
 	     !irt_isint(t) || (ir->op2 & (IRSLOAD_CONVERT|IRSLOAD_FRAME)));
+#if LJ_SOFTFP
+  lua_assert(!(ir->op2 & IRSLOAD_CONVERT));  /* Handled by LJ_SOFTFP SPLIT. */
+  if (hiop && ra_used(ir+1)) {
+    type = ra_dest(as, ir+1, allow);
+    rset_clear(allow, type);
+  }
+#else
   if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
     dest = ra_scratch(as, RSET_FPR);
     asm_tointg(as, ir, dest);
     t.irt = IRT_NUM;  /* Continue with a regular number type check. */
-  } else if (ra_used(ir)) {
+  } else
+#endif
+  if (ra_used(ir)) {
     lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
-    dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
+    dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
     rset_clear(allow, dest);
     base = ra_alloc1(as, REF_BASE, allow);
     rset_clear(allow, base);
-    if ((ir->op2 & IRSLOAD_CONVERT)) {
+    if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
       if (irt_isint(t)) {
 	emit_tai(as, PPCI_LWZ, dest, RID_SP, SPOFS_TMPLO);
 	dest = ra_scratch(as, RSET_FPR);
@@ -994,10 +1097,13 @@ dotypecheck:
     if ((ir->op2 & IRSLOAD_TYPECHECK)) {
       Reg tisnum = ra_allock(as, (int32_t)LJ_TISNUM, allow);
       asm_guardcc(as, CC_GE);
-      emit_ab(as, PPCI_CMPLW, RID_TMP, tisnum);
+#if !LJ_SOFTFP
       type = RID_TMP;
+#endif
+      emit_ab(as, PPCI_CMPLW, type, tisnum);
     }
-    if (ra_hasreg(dest)) emit_fai(as, PPCI_LFD, dest, base, ofs-4);
+    if (ra_hasreg(dest)) emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest,
+				  base, ofs-(LJ_SOFTFP?0:4));
   } else {
     if ((ir->op2 & IRSLOAD_TYPECHECK)) {
       asm_guardcc(as, CC_NE);
@@ -1122,6 +1228,7 @@ static void asm_obar(ASMState *as, IRIns *ir)
 
 /* -- Arithmetic and logic operations ------------------------------------- */
 
+#if !LJ_SOFTFP
 static void asm_fparith(ASMState *as, IRIns *ir, PPCIns pi)
 {
   Reg dest = ra_dest(as, ir, RSET_FPR);
@@ -1149,13 +1256,17 @@ static void asm_fpmath(ASMState *as, IRIns *ir)
   else
     asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
 }
+#endif
 
 static void asm_add(ASMState *as, IRIns *ir)
 {
+#if !LJ_SOFTFP
   if (irt_isnum(ir->t)) {
     if (!asm_fusemadd(as, ir, PPCI_FMADD, PPCI_FMADD))
       asm_fparith(as, ir, PPCI_FADD);
-  } else {
+  } else
+#endif
+  {
     Reg dest = ra_dest(as, ir, RSET_GPR);
     Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
     PPCIns pi;
@@ -1194,10 +1305,13 @@ static void asm_add(ASMState *as, IRIns *ir)
 
 static void asm_sub(ASMState *as, IRIns *ir)
 {
+#if !LJ_SOFTFP
   if (irt_isnum(ir->t)) {
     if (!asm_fusemadd(as, ir, PPCI_FMSUB, PPCI_FNMSUB))
       asm_fparith(as, ir, PPCI_FSUB);
-  } else {
+  } else
+#endif
+  {
     PPCIns pi = PPCI_SUBF;
     Reg dest = ra_dest(as, ir, RSET_GPR);
     Reg left, right;
@@ -1223,9 +1337,12 @@ static void asm_sub(ASMState *as, IRIns *ir)
 
 static void asm_mul(ASMState *as, IRIns *ir)
 {
+#if !LJ_SOFTFP
   if (irt_isnum(ir->t)) {
     asm_fparith(as, ir, PPCI_FMUL);
-  } else {
+  } else
+#endif
+  {
     PPCIns pi = PPCI_MULLW;
     Reg dest = ra_dest(as, ir, RSET_GPR);
     Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
@@ -1253,9 +1370,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
 
 static void asm_neg(ASMState *as, IRIns *ir)
 {
+#if !LJ_SOFTFP
   if (irt_isnum(ir->t)) {
     asm_fpunary(as, ir, PPCI_FNEG);
-  } else {
+  } else
+#endif
+  {
     Reg dest, left;
     PPCIns pi = PPCI_NEG;
     if (as->flagmcp == as->mcp) {
@@ -1566,9 +1686,40 @@ static void asm_bitshift(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pik)
 		       PPCI_RLWINM|PPCF_MB(0)|PPCF_ME(31))
 #define asm_bror(as, ir)	lua_assert(0)
 
+#if LJ_SOFTFP
+static void asm_sfpmin_max(ASMState *as, IRIns *ir)
+{
+  CCallInfo ci = lj_ir_callinfo[IRCALL_softfp_cmp];
+  IRRef args[4];
+  MCLabel l_right, l_end;
+  Reg desthi = ra_dest(as, ir, RSET_GPR), destlo = ra_dest(as, ir+1, RSET_GPR);
+  Reg righthi, lefthi = ra_alloc2(as, ir, RSET_GPR);
+  Reg rightlo, leftlo = ra_alloc2(as, ir+1, RSET_GPR);
+  PPCCC cond = (IROp)ir->o == IR_MIN ? CC_EQ : CC_NE;
+  righthi = (lefthi >> 8); lefthi &= 255;
+  rightlo = (leftlo >> 8); leftlo &= 255;
+  args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
+  args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
+  l_end = emit_label(as);
+  if (desthi != righthi) emit_mr(as, desthi, righthi);
+  if (destlo != rightlo) emit_mr(as, destlo, rightlo);
+  l_right = emit_label(as);
+  if (l_end != l_right) emit_jmp(as, l_end);
+  if (desthi != lefthi) emit_mr(as, desthi, lefthi);
+  if (destlo != leftlo) emit_mr(as, destlo, leftlo);
+  if (l_right == as->mcp+1) {
+    cond ^= 4; l_right = l_end; ++as->mcp;
+  }
+  emit_condbranch(as, PPCI_BC, cond, l_right);
+  ra_evictset(as, RSET_SCRATCH);
+  emit_cmpi(as, RID_RET, 1);
+  asm_gencall(as, &ci, args);
+}
+#endif
+
 static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
 {
-  if (irt_isnum(ir->t)) {
+  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
     Reg dest = ra_dest(as, ir, RSET_FPR);
     Reg tmp = dest;
     Reg right, left = ra_alloc2(as, ir, RSET_FPR);
@@ -1656,7 +1807,7 @@ static void asm_intcomp_(ASMState *as, IRRef lref, IRRef rref, Reg cr, PPCCC cc)
 static void asm_comp(ASMState *as, IRIns *ir)
 {
   PPCCC cc = asm_compmap[ir->o];
-  if (irt_isnum(ir->t)) {
+  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
     Reg right, left = ra_alloc2(as, ir, RSET_FPR);
     right = (left >> 8); left &= 255;
     asm_guardcc(as, (cc >> 4));
@@ -1677,6 +1828,44 @@ static void asm_comp(ASMState *as, IRIns *ir)
 
 #define asm_equal(as, ir)	asm_comp(as, ir)
 
+#if LJ_SOFTFP
+/* SFP comparisons. */
+static void asm_sfpcomp(ASMState *as, IRIns *ir)
+{
+  const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
+  RegSet drop = RSET_SCRATCH;
+  Reg r;
+  IRRef args[4];
+  args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
+  args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
+
+  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
+    if (!rset_test(as->freeset, r) &&
+	regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
+      rset_clear(drop, r);
+  }
+  ra_evictset(as, drop);
+  asm_setupresult(as, ir, ci);
+  switch ((IROp)ir->o) {
+  case IR_ULT:
+    asm_guardcc(as, CC_EQ);
+    emit_ai(as, PPCI_CMPWI, RID_RET, 0);
+  case IR_ULE:
+    asm_guardcc(as, CC_EQ);
+    emit_ai(as, PPCI_CMPWI, RID_RET, 1);
+    break;
+  case IR_GE: case IR_GT:
+    asm_guardcc(as, CC_EQ);
+    emit_ai(as, PPCI_CMPWI, RID_RET, 2);
+  default:
+    asm_guardcc(as, (asm_compmap[ir->o] & 0xf));
+    emit_ai(as, PPCI_CMPWI, RID_RET, 0);
+    break;
+  }
+  asm_gencall(as, ci, args);
+}
+#endif
+
 #if LJ_HASFFI
 /* 64 bit integer comparisons. */
 static void asm_comp64(ASMState *as, IRIns *ir)
@@ -1706,19 +1895,36 @@ static void asm_comp64(ASMState *as, IRIns *ir)
 /* Hiword op of a split 64 bit op. Previous op must be the loword op. */
 static void asm_hiop(ASMState *as, IRIns *ir)
 {
-#if LJ_HASFFI
+#if LJ_HASFFI || LJ_SOFTFP
   /* HIOP is marked as a store because it needs its own DCE logic. */
   int uselo = ra_used(ir-1), usehi = ra_used(ir);  /* Loword/hiword used? */
   if (LJ_UNLIKELY(!(as->flags & JIT_F_OPT_DCE))) uselo = usehi = 1;
   if ((ir-1)->o == IR_CONV) {  /* Conversions to/from 64 bit. */
     as->curins--;  /* Always skip the CONV. */
+#if LJ_HASFFI && !LJ_SOFTFP
     if (usehi || uselo)
       asm_conv64(as, ir);
     return;
+#endif
   } else if ((ir-1)->o <= IR_NE) {  /* 64 bit integer comparisons. ORDER IR. */
     as->curins--;  /* Always skip the loword comparison. */
+#if LJ_SOFTFP
+    if (!irt_isint(ir->t)) {
+      asm_sfpcomp(as, ir-1);
+      return;
+    }
+#endif
+#if LJ_HASFFI
     asm_comp64(as, ir);
+#endif
+    return;
+#if LJ_SOFTFP
+  } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
+      as->curins--;  /* Always skip the loword min/max. */
+    if (uselo || usehi)
+      asm_sfpmin_max(as, ir-1);
     return;
+#endif
   } else if ((ir-1)->o == IR_XSTORE) {
     as->curins--;  /* Handle both stores here. */
     if ((ir-1)->r != RID_SINK) {
@@ -1729,14 +1935,27 @@ static void asm_hiop(ASMState *as, IRIns *ir)
   }
   if (!usehi) return;  /* Skip unused hiword op for all remaining ops. */
   switch ((ir-1)->o) {
+#if LJ_HASFFI
   case IR_ADD: as->curins--; asm_add64(as, ir); break;
   case IR_SUB: as->curins--; asm_sub64(as, ir); break;
   case IR_NEG: as->curins--; asm_neg64(as, ir); break;
+#endif
+#if LJ_SOFTFP
+  case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
+  case IR_STRTO:
+    if (!uselo)
+      ra_allocref(as, ir->op1, RSET_GPR);  /* Mark lo op as used. */
+    break;
+#endif
   case IR_CALLN:
+  case IR_CALLS:
   case IR_CALLXS:
     if (!uselo)
       ra_allocref(as, ir->op1, RID2RSET(RID_RETLO));  /* Mark lo op as used. */
     break;
+#if LJ_SOFTFP
+  case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
+#endif
   case IR_CNEWI:
     /* Nothing to do here. Handled by lo op itself. */
     break;
@@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
     if ((sn & SNAP_NORESTORE))
       continue;
     if (irt_isnum(ir->t)) {
+#if LJ_SOFTFP
+      Reg tmp;
+      RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
+      lua_assert(irref_isk(ref));  /* LJ_SOFTFP: must be a number constant. */
+      tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.lo, allow);
+      emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?4:0));
+      if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
+      tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
+      emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
+#else
       Reg src = ra_alloc1(as, ref, RSET_FPR);
       emit_fai(as, PPCI_STFD, src, RID_BASE, ofs);
+#endif
     } else {
       Reg type;
       RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
@@ -1814,6 +2044,10 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
       if ((sn & (SNAP_CONT|SNAP_FRAME))) {
 	if (s == 0) continue;  /* Do not overwrite link to previous frame. */
 	type = ra_allock(as, (int32_t)(*flinks--), allow);
+#if LJ_SOFTFP
+      } else if ((sn & SNAP_SOFTFPNUM)) {
+	type = ra_alloc1(as, ref+1, rset_exclude(RSET_GPR, RID_BASE));
+#endif
       } else {
 	type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
       }
@@ -1950,14 +2184,15 @@ static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci)
   int nslots = 2, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR;
   asm_collectargs(as, ir, ci, args);
   for (i = 0; i < nargs; i++)
-    if (args[i] && irt_isfp(IR(args[i])->t)) {
+    if (!LJ_SOFTFP && args[i] && irt_isfp(IR(args[i])->t)) {
       if (nfpr > 0) nfpr--; else nslots = (nslots+3) & ~1;
     } else {
       if (ngpr > 0) ngpr--; else nslots++;
     }
   if (nslots > as->evenspill)  /* Leave room for args in stack slots. */
     as->evenspill = nslots;
-  return irt_isfp(ir->t) ? REGSP_HINT(RID_FPRET) : REGSP_HINT(RID_RET);
+  return (!LJ_SOFTFP && irt_isfp(ir->t)) ? REGSP_HINT(RID_FPRET) :
+					   REGSP_HINT(RID_RET);
 }
 
 static void asm_setup_target(ASMState *as)
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (5 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 11:58   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 14:31   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

This patch is a follow-up to the commit
a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
timer from lj_profile"). That commit moved the timer machinery to a
separate module. Unfortunately, the `profile_{un}lock()` macros for
Windows and PS3 weren't updated to access the `lj_profile_timer`
structure instead of `ProfileState`.
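
The lock fix can be illustrated with a minimal sketch. All the
`sketch_*` names below are hypothetical stand-ins, not the real LuaJIT
structures; the only assumption taken from the patch is that the lock
is reached as `ps->timer.lock` rather than `ps->lock`:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for lj_profile_timer: it owns the lock. */
typedef struct sketch_timer {
  pthread_mutex_t lock;
} sketch_timer;

/* Hypothetical stand-in for ProfileState: the timer is embedded. */
typedef struct SketchProfileState {
  sketch_timer timer;
  int samples;
} SketchProfileState;

/* The lock macros must go through the embedded timer structure. */
#define sketch_profile_lock(ps)   pthread_mutex_lock(&(ps)->timer.lock)
#define sketch_profile_unlock(ps) pthread_mutex_unlock(&(ps)->timer.lock)

/* Record one sample under the lock and return the new sample count. */
static int sketch_record_sample(SketchProfileState *ps)
{
  int n;
  assert(ps != NULL);
  sketch_profile_lock(ps);
  n = ++ps->samples;
  sketch_profile_unlock(ps);
  return n;
}
```

With the old `&ps->lock` spelling such macros would not compile, since
the outer structure no longer has a `lock` member.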

It is also a follow-up to the commit
f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
section"). Since that commit, the system-dependent header <unistd.h> and
the `write()`, `open()`, `close()` functions are used. They are not
available on Windows, which leads to a build error.
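
A minimal sketch of the guard pattern follows. It assumes `_WIN32` as a
stand-in for `LJ_TARGET_WINDOWS`; `SKETCH_HAS_POSIX_IO` and
`sketch_flush()` are hypothetical names that do not exist in the patch:

```c
#include <stddef.h>

#if !defined(_WIN32)
#include <unistd.h>  /* POSIX-only: write(), ssize_t. */
#define SKETCH_HAS_POSIX_IO 1
#else
#define SKETCH_HAS_POSIX_IO 0
#endif

/*
** Write the whole buffer to fd. Returns the number of bytes written,
** or -1 on error or when POSIX I/O is not compiled in.
** Interrupted writes (EINTR) are reported as errors in this sketch.
*/
static long sketch_flush(int fd, const char *buf, size_t len)
{
#if SKETCH_HAS_POSIX_IO
  size_t done = 0;
  while (done < len) {
    ssize_t n = write(fd, buf + done, len - done);
    if (n < 0) return -1;
    done += (size_t)n;
  }
  return (long)done;
#else
  (void)fd; (void)buf; (void)len;
  return -1;  /* The feature is compiled out on this platform. */
#endif
}
```

The patch itself goes further and compiles out the whole profiler
section on Windows instead of providing a fallback.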

This patch fixes both problems. With it applied, our fork can at least
be built on Windows.
---
 src/lib_misc.c         | 16 ++++++++++++----
 src/lj_profile_timer.h |  8 ++++----
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/lib_misc.c b/src/lib_misc.c
index c18d297e..1913a622 100644
--- a/src/lib_misc.c
+++ b/src/lib_misc.c
@@ -8,10 +8,6 @@
 #define lib_misc_c
 #define LUA_LIB
 
-#include <errno.h>
-#include <fcntl.h>
-#include <unistd.h>
-
 #include "lua.h"
 #include "lmisclib.h"
 #include "lauxlib.h"
@@ -25,6 +21,12 @@
 
 #include "lj_memprof.h"
 
+#include <errno.h>
+#include <fcntl.h>
+#if !LJ_TARGET_WINDOWS
+#include <unistd.h>
+#endif
+
 /* ------------------------------------------------------------------------ */
 
 static LJ_AINLINE void setnumfield(struct lua_State *L, GCtab *t,
@@ -78,6 +80,7 @@ LJLIB_CF(misc_getmetrics)
 
 /* --------- profile common section --------------------------------------- */
 
+#if !LJ_TARGET_WINDOWS
 /*
 ** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
 */
@@ -434,6 +437,7 @@ LJLIB_CF(misc_memprof_stop)
   lua_pushboolean(L, 1);
   return 1;
 }
+#endif /* !LJ_TARGET_WINDOWS */
 
 #include "lj_libdef.h"
 
@@ -441,6 +445,7 @@ LJLIB_CF(misc_memprof_stop)
 
 LUALIB_API int luaopen_misc(struct lua_State *L)
 {
+#if !LJ_TARGET_WINDOWS
   luaM_sysprof_set_writer(buffer_writer_default);
   luaM_sysprof_set_on_stop(on_stop_cb_default);
   /*
@@ -448,9 +453,12 @@ LUALIB_API int luaopen_misc(struct lua_State *L)
   ** backtracing function.
   */
   luaM_sysprof_set_backtracer(NULL);
+#endif /* !LJ_TARGET_WINDOWS */
 
   LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
+#if !LJ_TARGET_WINDOWS
   LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
   LJ_LIB_REG(L, LUAM_MISCLIBNAME ".sysprof", misc_sysprof);
+#endif /* !LJ_TARGET_WINDOWS */
   return 1;
 }
diff --git a/src/lj_profile_timer.h b/src/lj_profile_timer.h
index 1deeea53..b3e1a6e9 100644
--- a/src/lj_profile_timer.h
+++ b/src/lj_profile_timer.h
@@ -25,8 +25,8 @@
 #if LJ_TARGET_PS3
 #include <sys/timer.h>
 #endif
-#define profile_lock(ps)	pthread_mutex_lock(&ps->lock)
-#define profile_unlock(ps)	pthread_mutex_unlock(&ps->lock)
+#define profile_lock(ps)	pthread_mutex_lock(&ps->timer.lock)
+#define profile_unlock(ps)	pthread_mutex_unlock(&ps->timer.lock)
 
 #elif LJ_PROFILE_WTHREAD
 
@@ -38,8 +38,8 @@
 #include <windows.h>
 #endif
 typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int);
-#define profile_lock(ps)	EnterCriticalSection(&ps->lock)
-#define profile_unlock(ps)	LeaveCriticalSection(&ps->lock)
+#define profile_lock(ps)	EnterCriticalSection(&ps->timer.lock)
+#define profile_unlock(ps)	LeaveCriticalSection(&ps->timer.lock)
 
 #endif
 
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (6 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 12:09   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 16:40   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes Sergey Kaplun via Tarantool-patches
                   ` (13 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Ben Pye.

(cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)

This patch adds partial support for the Universal Windows Platform [1]
to LuaJIT.
It includes the following changes:
* `LJ_TARGET_UWP` is introduced to mark that the target is the Universal
  Windows Platform.
* The `LJ_WIN_VALLOC()` macro is introduced to be used instead of
  `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP).
* The `LJ_WIN_VPROTECT()` macro is introduced to be used instead of
  `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP).
* The `LJ_WIN_LOADLIBA()` macro is introduced to be used instead of
  `LoadLibraryExA()` [6] (a custom implementation using
  `LoadPackagedLibrary()` [7] for UWP).

Note the following limitations for UWP:
* `io.popen()` is not implemented.
* The timer of the LuaJIT profiler (`jit.p`) has a lower resolution,
  since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are not used:
  the <winmm.dll> library isn't loaded.
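
The compile-time dispatch behind the `LJ_WIN_*` macros can be sketched
portably. Everything named `sketch_*` or `SKETCH_*` below is
hypothetical; the two allocators merely stand in for `VirtualAlloc()`
and `VirtualAllocFromApp()` so that the example builds on any platform:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef SKETCH_TARGET_UWP
/* Hypothetical stand-in for the restricted (app container) allocator. */
static void *sketch_alloc_app(size_t sz) { return calloc(1, sz); }
#define SKETCH_VALLOC	sketch_alloc_app
#else
/* Hypothetical stand-in for the regular desktop allocator. */
static void *sketch_alloc_desktop(size_t sz) { return malloc(sz); }
#define SKETCH_VALLOC	sketch_alloc_desktop
#endif

/* Call sites use only the macro and need no per-target #if blocks. */
static void *sketch_reserve(size_t sz)
{
  assert(sz > 0);
  return SKETCH_VALLOC(sz);
}
```

The point of the indirection is that callers such as the allocator or
the machine code manager stay free of target checks.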

[1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
[2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
[3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
[4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
[5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
[6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
[7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
[8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
[9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod

Sergey Kaplun:
* added the description for the feature

Part of tarantool/tarantool#8825
---
 doc/ext_ffi_api.html   |  2 ++
 src/lib_ffi.c          |  3 +++
 src/lib_io.c           |  4 ++--
 src/lib_package.c      | 24 +++++++++++++++++++++++-
 src/lj_alloc.c         |  6 +++---
 src/lj_arch.h          | 19 +++++++++++++++++++
 src/lj_ccallback.c     |  4 ++--
 src/lj_clib.c          | 20 ++++++++++++++++----
 src/lj_mcode.c         |  8 ++++----
 src/lj_profile_timer.c |  8 ++++----
 10 files changed, 78 insertions(+), 20 deletions(-)

diff --git a/doc/ext_ffi_api.html b/doc/ext_ffi_api.html
index 91af2e1d..c72191d1 100644
--- a/doc/ext_ffi_api.html
+++ b/doc/ext_ffi_api.html
@@ -469,6 +469,8 @@ otherwise. The following parameters are currently defined:
 <tr class="odd">
 <td class="abiparam">win</td><td class="abidesc">Windows variant of the standard ABI</td></tr>
 <tr class="even">
+<td class="abiparam">uwp</td><td class="abidesc">Universal Windows Platform</td></tr>
+<tr class="odd">
 <td class="abiparam">gc64</td><td class="abidesc">64 bit GC references</td></tr>
 </table>
 
diff --git a/src/lib_ffi.c b/src/lib_ffi.c
index 136e98e8..d1fe1a14 100644
--- a/src/lib_ffi.c
+++ b/src/lib_ffi.c
@@ -746,6 +746,9 @@ LJLIB_CF(ffi_abi)	LJLIB_REC(.)
 #endif
 #if LJ_ABI_WIN
   case H_(4ab624a8,4ab624a8): b = 1; break;  /* win */
+#endif
+#if LJ_TARGET_UWP
+  case H_(a40f0bcb,a40f0bcb): b = 1; break;  /* uwp */
 #endif
   case H_(3af93066,1f001464): b = 1; break;  /* le/be */
 #if LJ_GC64
diff --git a/src/lib_io.c b/src/lib_io.c
index f0108227..db995ae6 100644
--- a/src/lib_io.c
+++ b/src/lib_io.c
@@ -99,7 +99,7 @@ static int io_file_close(lua_State *L, IOFileUD *iof)
     int stat = -1;
 #if LJ_TARGET_POSIX
     stat = pclose(iof->fp);
-#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE
+#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP
     stat = _pclose(iof->fp);
 #else
     lua_assert(0);
@@ -414,7 +414,7 @@ LJLIB_CF(io_open)
 
 LJLIB_CF(io_popen)
 {
-#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE)
+#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP)
   const char *fname = strdata(lj_lib_checkstr(L, 1));
   GCstr *s = lj_lib_optstr(L, 2);
   const char *mode = s ? strdata(s) : "r";
diff --git a/src/lib_package.c b/src/lib_package.c
index 67959a10..b49f0209 100644
--- a/src/lib_package.c
+++ b/src/lib_package.c
@@ -76,6 +76,20 @@ static const char *ll_bcsym(void *lib, const char *sym)
 BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
 #endif
 
+#if LJ_TARGET_UWP
+void *LJ_WIN_LOADLIBA(const char *path)
+{
+  DWORD err = GetLastError();
+  wchar_t wpath[256];
+  HANDLE lib = NULL;
+  if (MultiByteToWideChar(CP_ACP, 0, path, -1, wpath, 256) > 0) {
+    lib = LoadPackagedLibrary(wpath, 0);
+  }
+  SetLastError(err);
+  return lib;
+}
+#endif
+
 #undef setprogdir
 
 static void setprogdir(lua_State *L)
@@ -119,7 +133,7 @@ static void ll_unloadlib(void *lib)
 
 static void *ll_load(lua_State *L, const char *path, int gl)
 {
-  HINSTANCE lib = LoadLibraryExA(path, NULL, 0);
+  HINSTANCE lib = LJ_WIN_LOADLIBA(path);
   if (lib == NULL) pusherror(L);
   UNUSED(gl);
   return lib;
@@ -132,17 +146,25 @@ static lua_CFunction ll_sym(lua_State *L, void *lib, const char *sym)
   return f;
 }
 
+#if LJ_TARGET_UWP
+EXTERN_C IMAGE_DOS_HEADER __ImageBase;
+#endif
+
 static const char *ll_bcsym(void *lib, const char *sym)
 {
   if (lib) {
     return (const char *)GetProcAddress((HINSTANCE)lib, sym);
   } else {
+#if LJ_TARGET_UWP
+    return (const char *)GetProcAddress((HINSTANCE)&__ImageBase, sym);
+#else
     HINSTANCE h = GetModuleHandleA(NULL);
     const char *p = (const char *)GetProcAddress(h, sym);
     if (p == NULL && GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
 					(const char *)ll_bcsym, &h))
       p = (const char *)GetProcAddress(h, sym);
     return p;
+#endif
   }
 }
 
diff --git a/src/lj_alloc.c b/src/lj_alloc.c
index f7039b5b..9e2fb1f6 100644
--- a/src/lj_alloc.c
+++ b/src/lj_alloc.c
@@ -167,7 +167,7 @@ static void *DIRECT_MMAP(size_t size)
 static void *CALL_MMAP(size_t size)
 {
   DWORD olderr = GetLastError();
-  void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
+  void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
   SetLastError(olderr);
   return ptr ? ptr : MFAIL;
 }
@@ -176,8 +176,8 @@ static void *CALL_MMAP(size_t size)
 static void *DIRECT_MMAP(size_t size)
 {
   DWORD olderr = GetLastError();
-  void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
-			   PAGE_READWRITE);
+  void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
+			    PAGE_READWRITE);
   SetLastError(olderr);
   return ptr ? ptr : MFAIL;
 }
diff --git a/src/lj_arch.h b/src/lj_arch.h
index 7397492e..0351e046 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -141,6 +141,13 @@
 #define LJ_TARGET_GC64		1
 #endif
 
+#ifdef _UWP
+#define LJ_TARGET_UWP		1
+#if LUAJIT_TARGET == LUAJIT_ARCH_X64
+#define LJ_TARGET_GC64		1
+#endif
+#endif
+
 #define LJ_NUMMODE_SINGLE	0	/* Single-number mode only. */
 #define LJ_NUMMODE_SINGLE_DUAL	1	/* Default to single-number mode. */
 #define LJ_NUMMODE_DUAL		2	/* Dual-number mode only. */
@@ -586,6 +593,18 @@
 #define LJ_ABI_WIN		0
 #endif
 
+#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_UWP
+#define LJ_WIN_VALLOC	VirtualAllocFromApp
+#define LJ_WIN_VPROTECT	VirtualProtectFromApp
+extern void *LJ_WIN_LOADLIBA(const char *path);
+#else
+#define LJ_WIN_VALLOC	VirtualAlloc
+#define LJ_WIN_VPROTECT	VirtualProtect
+#define LJ_WIN_LOADLIBA(path)	LoadLibraryExA((path), NULL, 0)
+#endif
+#endif
+
 #if defined(LUAJIT_NO_UNWIND) || __GNU_COMPACT_EH__ || defined(__symbian__) || LJ_TARGET_IOS || LJ_TARGET_PS3 || LJ_TARGET_PS4
 #define LJ_NO_UNWIND		1
 #endif
diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
index c33190d7..37edd00f 100644
--- a/src/lj_ccallback.c
+++ b/src/lj_ccallback.c
@@ -267,7 +267,7 @@ static void callback_mcode_new(CTState *cts)
   if (CALLBACK_MAX_SLOT == 0)
     lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
 #if LJ_TARGET_WINDOWS
-  p = VirtualAlloc(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
+  p = LJ_WIN_VALLOC(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
   if (!p)
     lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
 #elif LJ_TARGET_POSIX
@@ -285,7 +285,7 @@ static void callback_mcode_new(CTState *cts)
 #if LJ_TARGET_WINDOWS
   {
     DWORD oprot;
-    VirtualProtect(p, sz, PAGE_EXECUTE_READ, &oprot);
+    LJ_WIN_VPROTECT(p, sz, PAGE_EXECUTE_READ, &oprot);
   }
 #elif LJ_TARGET_POSIX
   mprotect(p, sz, (PROT_READ|PROT_EXEC));
diff --git a/src/lj_clib.c b/src/lj_clib.c
index c06c0915..a8672052 100644
--- a/src/lj_clib.c
+++ b/src/lj_clib.c
@@ -158,11 +158,13 @@ BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
 /* Default libraries. */
 enum {
   CLIB_HANDLE_EXE,
+#if !LJ_TARGET_UWP
   CLIB_HANDLE_DLL,
   CLIB_HANDLE_CRT,
   CLIB_HANDLE_KERNEL32,
   CLIB_HANDLE_USER32,
   CLIB_HANDLE_GDI32,
+#endif
   CLIB_HANDLE_MAX
 };
 
@@ -208,7 +210,7 @@ static const char *clib_extname(lua_State *L, const char *name)
 static void *clib_loadlib(lua_State *L, const char *name, int global)
 {
   DWORD oldwerr = GetLastError();
-  void *h = (void *)LoadLibraryExA(clib_extname(L, name), NULL, 0);
+  void *h = LJ_WIN_LOADLIBA(clib_extname(L, name));
   if (!h) clib_error(L, "cannot load module " LUA_QS ": %s", name);
   SetLastError(oldwerr);
   UNUSED(global);
@@ -218,6 +220,7 @@ static void *clib_loadlib(lua_State *L, const char *name, int global)
 static void clib_unloadlib(CLibrary *cl)
 {
   if (cl->handle == CLIB_DEFHANDLE) {
+#if !LJ_TARGET_UWP
     MSize i;
     for (i = CLIB_HANDLE_KERNEL32; i < CLIB_HANDLE_MAX; i++) {
       void *h = clib_def_handle[i];
@@ -226,11 +229,16 @@ static void clib_unloadlib(CLibrary *cl)
 	FreeLibrary((HINSTANCE)h);
       }
     }
+#endif
   } else if (cl->handle) {
     FreeLibrary((HINSTANCE)cl->handle);
   }
 }
 
+#if LJ_TARGET_UWP
+EXTERN_C IMAGE_DOS_HEADER __ImageBase;
+#endif
+
 static void *clib_getsym(CLibrary *cl, const char *name)
 {
   void *p = NULL;
@@ -239,6 +247,9 @@ static void *clib_getsym(CLibrary *cl, const char *name)
     for (i = 0; i < CLIB_HANDLE_MAX; i++) {
       HINSTANCE h = (HINSTANCE)clib_def_handle[i];
       if (!(void *)h) {  /* Resolve default library handles (once). */
+#if LJ_TARGET_UWP
+	h = (HINSTANCE)&__ImageBase;
+#else
 	switch (i) {
 	case CLIB_HANDLE_EXE: GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &h); break;
 	case CLIB_HANDLE_DLL:
@@ -249,11 +260,12 @@ static void *clib_getsym(CLibrary *cl, const char *name)
 	  GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
 			     (const char *)&_fmode, &h);
 	  break;
-	case CLIB_HANDLE_KERNEL32: h = LoadLibraryExA("kernel32.dll", NULL, 0); break;
-	case CLIB_HANDLE_USER32: h = LoadLibraryExA("user32.dll", NULL, 0); break;
-	case CLIB_HANDLE_GDI32: h = LoadLibraryExA("gdi32.dll", NULL, 0); break;
+	case CLIB_HANDLE_KERNEL32: h = LJ_WIN_LOADLIBA("kernel32.dll"); break;
+	case CLIB_HANDLE_USER32: h = LJ_WIN_LOADLIBA("user32.dll"); break;
+	case CLIB_HANDLE_GDI32: h = LJ_WIN_LOADLIBA("gdi32.dll"); break;
 	}
 	if (!h) continue;
+#endif
 	clib_def_handle[i] = (void *)h;
       }
       p = (void *)GetProcAddress(h, name);
diff --git a/src/lj_mcode.c b/src/lj_mcode.c
index c6361018..10db4457 100644
--- a/src/lj_mcode.c
+++ b/src/lj_mcode.c
@@ -66,8 +66,8 @@ void lj_mcode_sync(void *start, void *end)
 
 static void *mcode_alloc_at(jit_State *J, uintptr_t hint, size_t sz, DWORD prot)
 {
-  void *p = VirtualAlloc((void *)hint, sz,
-			 MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
+  void *p = LJ_WIN_VALLOC((void *)hint, sz,
+			  MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
   if (!p && !hint)
     lj_trace_err(J, LJ_TRERR_MCODEAL);
   return p;
@@ -82,7 +82,7 @@ static void mcode_free(jit_State *J, void *p, size_t sz)
 static int mcode_setprot(void *p, size_t sz, DWORD prot)
 {
   DWORD oprot;
-  return !VirtualProtect(p, sz, prot, &oprot);
+  return !LJ_WIN_VPROTECT(p, sz, prot, &oprot);
 }
 
 #elif LJ_TARGET_POSIX
@@ -255,7 +255,7 @@ static void *mcode_alloc(jit_State *J, size_t sz)
 /* All memory addresses are reachable by relative jumps. */
 static void *mcode_alloc(jit_State *J, size_t sz)
 {
-#ifdef __OpenBSD__
+#if defined(__OpenBSD__) || LJ_TARGET_UWP
   /* Allow better executable memory allocation for OpenBSD W^X mode. */
   void *p = mcode_alloc_at(J, 0, sz, MCPROT_RUN);
   if (p && mcode_setprot(p, sz, MCPROT_GEN)) {
diff --git a/src/lj_profile_timer.c b/src/lj_profile_timer.c
index 056fd1f7..0b859457 100644
--- a/src/lj_profile_timer.c
+++ b/src/lj_profile_timer.c
@@ -84,7 +84,7 @@ static DWORD WINAPI timer_thread(void *timerx)
 {
   lj_profile_timer *timer = (lj_profile_timer *)timerx;
   int interval = timer->opt.interval_msec;
-#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
   timer->wmm_tbp(interval);
 #endif
   while (1) {
@@ -92,7 +92,7 @@ static DWORD WINAPI timer_thread(void *timerx)
     if (timer->abort) break;
     timer->opt.handler();
   }
-#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
   timer->wmm_tep(interval);
 #endif
   return 0;
@@ -101,9 +101,9 @@ static DWORD WINAPI timer_thread(void *timerx)
 /* Start profiling timer thread. */
 void lj_profile_timer_start(lj_profile_timer *timer)
 {
-#if LJ_TARGET_WINDOWS
+#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
   if (!timer->wmm) { /* Load WinMM library on-demand. */
-    timer->wmm = LoadLibraryExA("winmm.dll", NULL, 0);
+    timer->wmm = LJ_WIN_LOADLIBA("winmm.dll");
     if (timer->wmm) {
       timer->wmm_tbp =
 	(WMM_TPFUNC)GetProcAddress(timer->wmm, "timeBeginPeriod");
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (7 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 13:07   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)

This patch refactors the FFI parsing of supported C attributes and
pragmas, as well as the `ffi.abi()` parameter check. It replaces
comparisons against hardcoded string hashes with a linear search in a
match string of the format "\XXXattribute1\XXXattribute2", where `\XXX`
is the length of the following attribute name, written as an octal
escape.

Sergey Kaplun:
* added the description for the commit

Part of tarantool/tarantool#8825
---
 src/lib_ffi.c   | 35 ++++++++++------------
 src/lj_cparse.c | 77 +++++++++++++++++++++++++++++++------------------
 src/lj_cparse.h |  2 ++
 3 files changed, 67 insertions(+), 47 deletions(-)

diff --git a/src/lib_ffi.c b/src/lib_ffi.c
index d1fe1a14..62af54c1 100644
--- a/src/lib_ffi.c
+++ b/src/lib_ffi.c
@@ -720,50 +720,47 @@ LJLIB_CF(ffi_fill)	LJLIB_REC(.)
   return 0;
 }
 
-#define H_(le, be)	LJ_ENDIAN_SELECT(0x##le, 0x##be)
-
 /* Test ABI string. */
 LJLIB_CF(ffi_abi)	LJLIB_REC(.)
 {
   GCstr *s = lj_lib_checkstr(L, 1);
-  int b = 0;
-  switch (s->hash) {
+  int b = lj_cparse_case(s,
 #if LJ_64
-  case H_(849858eb,ad35fd06): b = 1; break;  /* 64bit */
+    "\00564bit"
 #else
-  case H_(662d3c79,d0e22477): b = 1; break;  /* 32bit */
+    "\00532bit"
 #endif
 #if LJ_ARCH_HASFPU
-  case H_(e33ee463,e33ee463): b = 1; break;  /* fpu */
+    "\003fpu"
 #endif
 #if LJ_ABI_SOFTFP
-  case H_(61211a23,c2e8c81c): b = 1; break;  /* softfp */
+    "\006softfp"
 #else
-  case H_(539417a8,8ce0812f): b = 1; break;  /* hardfp */
+    "\006hardfp"
 #endif
 #if LJ_ABI_EABI
-  case H_(2182df8f,f2ed1152): b = 1; break;  /* eabi */
+    "\004eabi"
 #endif
 #if LJ_ABI_WIN
-  case H_(4ab624a8,4ab624a8): b = 1; break;  /* win */
+    "\003win"
 #endif
 #if LJ_TARGET_UWP
-  case H_(a40f0bcb,a40f0bcb): b = 1; break;  /* uwp */
+    "\003uwp"
+#endif
+#if LJ_LE
+    "\002le"
+#else
+    "\002be"
 #endif
-  case H_(3af93066,1f001464): b = 1; break;  /* le/be */
 #if LJ_GC64
-  case H_(9e89d2c9,13c83c92): b = 1; break;  /* gc64 */
+    "\004gc64"
 #endif
-  default:
-    break;
-  }
+  ) >= 0;
   setboolV(L->top-1, b);
   setboolV(&G(L)->tmptv2, b);  /* Remember for trace recorder. */
   return 1;
 }
 
-#undef H_
-
 LJLIB_PUSH(top-8) LJLIB_SET(!)  /* Store reference to miscmap table. */
 
 LJLIB_CF(ffi_metatype)
diff --git a/src/lj_cparse.c b/src/lj_cparse.c
index fb440567..07c643d4 100644
--- a/src/lj_cparse.c
+++ b/src/lj_cparse.c
@@ -28,6 +28,24 @@
 ** If in doubt, please check the input against your favorite C compiler.
 */
 
+/* -- Miscellaneous ------------------------------------------------------- */
+
+/* Match string against a C literal. */
+#define cp_str_is(str, k) \
+  ((str)->len == sizeof(k)-1 && !memcmp(strdata(str), k, sizeof(k)-1))
+
+/* Check string against a linear list of matches. */
+int lj_cparse_case(GCstr *str, const char *match)
+{
+  MSize len;
+  int n;
+  for  (n = 0; (len = (MSize)*match++); n++, match += len) {
+    if (str->len == len && !memcmp(match, strdata(str), len))
+      return n;
+  }
+  return -1;
+}
+
 /* -- C lexer ------------------------------------------------------------- */
 
 /* C lexer token names. */
@@ -930,8 +948,6 @@ static CTypeID cp_decl_intern(CPState *cp, CPDecl *decl)
 
 /* -- C declaration parser ------------------------------------------------ */
 
-#define H_(le, be)	LJ_ENDIAN_SELECT(0x##le, 0x##be)
-
 /* Reset declaration state to declaration specifier. */
 static void cp_decl_reset(CPDecl *decl)
 {
@@ -1071,44 +1087,57 @@ static void cp_decl_gccattribute(CPState *cp, CPDecl *decl)
 	attrstr = lj_str_new(cp->L, c+2, attrstr->len-4);
 #endif
       cp_next(cp);
-      switch (attrstr->hash) {
-      case H_(64a9208e,8ce14319): case H_(8e6331b2,95a282af):  /* aligned */
+      switch (lj_cparse_case(attrstr,
+		"\007aligned" "\013__aligned__"
+		"\006packed" "\012__packed__"
+		"\004mode" "\010__mode__"
+		"\013vector_size" "\017__vector_size__"
+#if LJ_TARGET_X86
+		"\007regparm" "\013__regparm__"
+		"\005cdecl"  "\011__cdecl__"
+		"\010thiscall" "\014__thiscall__"
+		"\010fastcall" "\014__fastcall__"
+		"\007stdcall" "\013__stdcall__"
+		"\012sseregparm" "\016__sseregparm__"
+#endif
+	      )) {
+      case 0: case 1: /* aligned */
 	cp_decl_align(cp, decl);
 	break;
-      case H_(42eb47de,f0ede26c): case H_(29f48a09,cf383e0c):  /* packed */
+      case 2: case 3: /* packed */
 	decl->attr |= CTFP_PACKED;
 	break;
-      case H_(0a84eef6,8dfab04c): case H_(995cf92c,d5696591):  /* mode */
+      case 4: case 5: /* mode */
 	cp_decl_mode(cp, decl);
 	break;
-      case H_(0ab31997,2d5213fa): case H_(bf875611,200e9990):  /* vector_size */
+      case 6: case 7: /* vector_size */
 	{
 	  CTSize vsize = cp_decl_sizeattr(cp);
 	  if (vsize) CTF_INSERT(decl->attr, VSIZEP, lj_fls(vsize));
 	}
 	break;
 #if LJ_TARGET_X86
-      case H_(5ad22db8,c689b848): case H_(439150fa,65ea78cb):  /* regparm */
+      case 8: case 9: /* regparm */
 	CTF_INSERT(decl->fattr, REGPARM, cp_decl_sizeattr(cp));
 	decl->fattr |= CTFP_CCONV;
 	break;
-      case H_(18fc0b98,7ff4c074): case H_(4e62abed,0a747424):  /* cdecl */
+      case 10: case 11: /* cdecl */
 	CTF_INSERT(decl->fattr, CCONV, CTCC_CDECL);
 	decl->fattr |= CTFP_CCONV;
 	break;
-      case H_(72b2e41b,494c5a44): case H_(f2356d59,f25fc9bd):  /* thiscall */
+      case 12: case 13: /* thiscall */
 	CTF_INSERT(decl->fattr, CCONV, CTCC_THISCALL);
 	decl->fattr |= CTFP_CCONV;
 	break;
-      case H_(0d0ffc42,ab746f88): case H_(21c54ba1,7f0ca7e3):  /* fastcall */
+      case 14: case 15: /* fastcall */
 	CTF_INSERT(decl->fattr, CCONV, CTCC_FASTCALL);
 	decl->fattr |= CTFP_CCONV;
 	break;
-      case H_(ef76b040,9412e06a): case H_(de56697b,c750e6e1):  /* stdcall */
+      case 16: case 17: /* stdcall */
 	CTF_INSERT(decl->fattr, CCONV, CTCC_STDCALL);
 	decl->fattr |= CTFP_CCONV;
 	break;
-      case H_(ea78b622,f234bd8e): case H_(252ffb06,8d50f34b):  /* sseregparm */
+      case 18: case 19: /* sseregparm */
 	decl->fattr |= CTF_SSEREGPARM;
 	decl->fattr |= CTFP_CCONV;
 	break;
@@ -1140,16 +1169,13 @@ static void cp_decl_msvcattribute(CPState *cp, CPDecl *decl)
   while (cp->tok == CTOK_IDENT) {
     GCstr *attrstr = cp->str;
     cp_next(cp);
-    switch (attrstr->hash) {
-    case H_(bc2395fa,98f267f8):  /* align */
+    if (cp_str_is(attrstr, "align")) {
       cp_decl_align(cp, decl);
-      break;
-    default:  /* Ignore all other attributes. */
+    } else {  /* Ignore all other attributes. */
       if (cp_opt(cp, '(')) {
 	while (cp->tok != ')' && cp->tok != CTOK_EOF) cp_next(cp);
 	cp_check(cp, ')');
       }
-      break;
     }
   }
   cp_check(cp, ')');
@@ -1729,17 +1755,16 @@ static CTypeID cp_decl_abstract(CPState *cp)
 static void cp_pragma(CPState *cp, BCLine pragmaline)
 {
   cp_next(cp);
-  if (cp->tok == CTOK_IDENT &&
-      cp->str->hash == H_(e79b999f,42ca3e85))  {  /* pack */
+  if (cp->tok == CTOK_IDENT && cp_str_is(cp->str, "pack"))  {
     cp_next(cp);
     cp_check(cp, '(');
     if (cp->tok == CTOK_IDENT) {
-      if (cp->str->hash == H_(738e923c,a1b65954)) {  /* push */
+      if (cp_str_is(cp->str, "push")) {
 	if (cp->curpack < CPARSE_MAX_PACKSTACK) {
 	  cp->packstack[cp->curpack+1] = cp->packstack[cp->curpack];
 	  cp->curpack++;
 	}
-      } else if (cp->str->hash == H_(6c71cf27,6c71cf27)) {  /* pop */
+      } else if (cp_str_is(cp->str, "pop")) {
 	if (cp->curpack > 0) cp->curpack--;
       } else {
 	cp_errmsg(cp, cp->tok, LJ_ERR_XSYMBOL);
@@ -1788,13 +1813,11 @@ static void cp_decl_multi(CPState *cp)
       if (tok == CTOK_INTEGER) {
 	cp_line(cp, hashline);
 	continue;
-      } else if (tok == CTOK_IDENT &&
-		 cp->str->hash == H_(187aab88,fcb60b42)) { /* line */
+      } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "line")) {
 	if (cp_next(cp) != CTOK_INTEGER) cp_err_token(cp, tok);
 	cp_line(cp, hashline);
 	continue;
-      } else if (tok == CTOK_IDENT &&
-	  cp->str->hash == H_(f5e6b4f8,1d509107)) { /* pragma */
+      } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "pragma")) {
 	cp_pragma(cp, hashline);
 	continue;
       } else {
@@ -1865,8 +1888,6 @@ static void cp_decl_single(CPState *cp)
   if (cp->tok != CTOK_EOF) cp_err_token(cp, CTOK_EOF);
 }
 
-#undef H_
-
 /* ------------------------------------------------------------------------ */
 
 /* Protected callback for C parser. */
diff --git a/src/lj_cparse.h b/src/lj_cparse.h
index bad1060b..e40b4047 100644
--- a/src/lj_cparse.h
+++ b/src/lj_cparse.h
@@ -60,6 +60,8 @@ typedef struct CPState {
 
 LJ_FUNC int lj_cparse(CPState *cp);
 
+LJ_FUNC int lj_cparse_case(GCstr *str, const char *match);
+
 #endif
 
 #endif
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (8 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-11  8:06   ` Sergey Kaplun via Tarantool-patches
                     ` (2 more replies)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
                   ` (11 subsequent siblings)
  21 siblings, 3 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)

This patch backports the MIPS and PPC parts of the aforementioned
commit, because those architectures were stripped during the
backporting via 71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math
function compilation and fix inconsistencies."). Applying the missed
diffs prevents conflicts when backporting future patches.

This patch just removes macros that are no longer in use.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#8825
---
 src/lj_asm_mips.h | 1 -
 src/lj_asm_ppc.h  | 1 -
 2 files changed, 2 deletions(-)

diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index a26a82cd..c27d8413 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
 }
 #endif
 
-#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
 #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
 
 static void asm_arithov(ASMState *as, IRIns *ir)
diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
index 6cb608f7..6aaed058 100644
--- a/src/lj_asm_ppc.h
+++ b/src/lj_asm_ppc.h
@@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
 }
 
 #define asm_abs(as, ir)		asm_fpunary(as, ir, PPCI_FABS)
-#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
 #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
 
 static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (9 preceding siblings ...)
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 13:17   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17  7:37   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
                   ` (10 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)

This patch adds the `/* fallthrough */` comment where an implicit
fallthrough may trigger the `-Wimplicit-fallthrough` [1] warning. Some
cases are still not covered by this comment and will be fixed in future
commits.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough

Sergey Kaplun:
* added the description for the commit

Part of tarantool/tarantool#8825
---
 dynasm/dasm_arm.h  |  2 ++
 dynasm/dasm_mips.h |  1 +
 dynasm/dasm_ppc.h  |  1 +
 dynasm/dasm_x86.h  | 18 ++++++++++++++----
 src/lj_asm.c       |  7 ++++++-
 src/lj_cparse.c    | 10 ++++++++++
 src/lj_err.c       |  1 +
 src/lj_opt_sink.c  |  2 +-
 src/lj_parse.c     |  3 ++-
 src/luajit.c       |  1 +
 10 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/dynasm/dasm_arm.h b/dynasm/dasm_arm.h
index a43f7c66..1d404ccd 100644
--- a/dynasm/dasm_arm.h
+++ b/dynasm/dasm_arm.h
@@ -254,6 +254,7 @@ void dasm_put(Dst_DECL, int start, ...)
       case DASM_IMMV8:
 	CK((n & 3) == 0, RANGE_I);
 	n >>= 2;
+	/* fallthrough */
       case DASM_IMML8:
       case DASM_IMML12:
 	CK(n >= 0 ? ((n>>((ins>>5)&31)) == 0) :
@@ -371,6 +372,7 @@ int dasm_encode(Dst_DECL, void *buffer)
 	  break;
 	case DASM_REL_LG:
 	  CK(n >= 0, UNDEF_LG);
+	  /* fallthrough */
 	case DASM_REL_PC:
 	  CK(n >= 0, UNDEF_PC);
 	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) - 4;
diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
index 4b49fd8c..71a835b2 100644
--- a/dynasm/dasm_mips.h
+++ b/dynasm/dasm_mips.h
@@ -350,6 +350,7 @@ int dasm_encode(Dst_DECL, void *buffer)
 	  break;
 	case DASM_REL_LG:
 	  CK(n >= 0, UNDEF_LG);
+	  /* fallthrough */
 	case DASM_REL_PC:
 	  CK(n >= 0, UNDEF_PC);
 	  n = *DASM_POS2PTR(D, n);
diff --git a/dynasm/dasm_ppc.h b/dynasm/dasm_ppc.h
index 3a7ee9b0..83fc030a 100644
--- a/dynasm/dasm_ppc.h
+++ b/dynasm/dasm_ppc.h
@@ -354,6 +354,7 @@ int dasm_encode(Dst_DECL, void *buffer)
 	  break;
 	case DASM_REL_LG:
 	  CK(n >= 0, UNDEF_LG);
+	  /* fallthrough */
 	case DASM_REL_PC:
 	  CK(n >= 0, UNDEF_PC);
 	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base);
diff --git a/dynasm/dasm_x86.h b/dynasm/dasm_x86.h
index bc636357..2a276042 100644
--- a/dynasm/dasm_x86.h
+++ b/dynasm/dasm_x86.h
@@ -194,12 +194,13 @@ void dasm_put(Dst_DECL, int start, ...)
       switch (action) {
       case DASM_DISP:
 	if (n == 0) { if (mrm < 0) mrm = p[-2]; if ((mrm&7) != 5) break; }
-      case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob;
+	/* fallthrough */
+      case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
       case DASM_REL_A: /* Assumes ptrdiff_t is int. !x64 */
       case DASM_IMM_D: ofs += 4; break;
       case DASM_IMM_S: CK(((n+128)&-256) == 0, RANGE_I); goto ob;
       case DASM_IMM_B: CK((n&-256) == 0, RANGE_I); ob: ofs++; break;
-      case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob;
+      case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
       case DASM_IMM_W: CK((n&-65536) == 0, RANGE_I); ofs += 2; break;
       case DASM_SPACE: p++; ofs += n; break;
       case DASM_SETLABEL: b[pos-2] = -0x40000000; break;  /* Neg. label ofs. */
@@ -207,8 +208,8 @@ void dasm_put(Dst_DECL, int start, ...)
 	if (*p < 0x40 && p[1] == DASM_DISP) mrm = n;
 	if (*p < 0x20 && (n&7) == 4) ofs++;
 	switch ((*p++ >> 3) & 3) {
-	case 3: n |= b[pos-3];
-	case 2: n |= b[pos-2];
+	case 3: n |= b[pos-3]; /* fallthrough */
+	case 2: n |= b[pos-2]; /* fallthrough */
 	case 1: if (n <= 7) { b[pos-1] |= 0x10; ofs--; }
 	}
 	continue;
@@ -329,11 +330,14 @@ int dasm_link(Dst_DECL, size_t *szp)
 	  pos += 2;
 	  break;
 	}
+	  /* fallthrough */
 	case DASM_SPACE: case DASM_IMM_LG: case DASM_VREG: p++;
+	  /* fallthrough */
 	case DASM_DISP: case DASM_IMM_S: case DASM_IMM_B: case DASM_IMM_W:
 	case DASM_IMM_D: case DASM_IMM_WB: case DASM_IMM_DB:
 	case DASM_SETLABEL: case DASM_REL_A: case DASM_IMM_PC: pos++; break;
 	case DASM_LABEL_LG: p++;
+	  /* fallthrough */
 	case DASM_LABEL_PC: b[pos++] += ofs; break; /* Fix label offset. */
 	case DASM_ALIGN: ofs -= (b[pos++]+ofs)&*p++; break; /* Adjust ofs. */
 	case DASM_EXTERN: p += 2; break;
@@ -391,12 +395,15 @@ int dasm_encode(Dst_DECL, void *buffer)
 	    if (mrm != 5) { mm[-1] -= 0x80; break; } }
 	  if (((n+128) & -256) != 0) goto wd; else mm[-1] -= 0x40;
 	}
+	  /* fallthrough */
 	case DASM_IMM_S: case DASM_IMM_B: wb: dasmb(n); break;
 	case DASM_IMM_DB: if (((n+128)&-256) == 0) {
 	    db: if (!mark) mark = cp; mark[-2] += 2; mark = NULL; goto wb;
 	  } else mark = NULL;
+	  /* fallthrough */
 	case DASM_IMM_D: wd: dasmd(n); break;
 	case DASM_IMM_WB: if (((n+128)&-256) == 0) goto db; else mark = NULL;
+	  /* fallthrough */
 	case DASM_IMM_W: dasmw(n); break;
 	case DASM_VREG: {
 	  int t = *p++;
@@ -421,6 +428,7 @@ int dasm_encode(Dst_DECL, void *buffer)
 	}
 	case DASM_REL_LG: p++; if (n >= 0) goto rel_pc;
 	  b++; n = (int)(ptrdiff_t)D->globals[-n];
+	  /* fallthrough */
 	case DASM_REL_A: rel_a: n -= (int)(ptrdiff_t)(cp+4); goto wd; /* !x64 */
 	case DASM_REL_PC: rel_pc: {
 	  int shrink = *b++;
@@ -432,6 +440,7 @@ int dasm_encode(Dst_DECL, void *buffer)
 	}
 	case DASM_IMM_LG:
 	  p++; if (n < 0) { n = (int)(ptrdiff_t)D->globals[-n]; goto wd; }
+	  /* fallthrough */
 	case DASM_IMM_PC: {
 	  int *pb = DASM_POS2PTR(D, n);
 	  n = *pb < 0 ? pb[1] : (*pb + (int)(ptrdiff_t)base);
@@ -452,6 +461,7 @@ int dasm_encode(Dst_DECL, void *buffer)
 	case DASM_EXTERN: n = DASM_EXTERN(Dst, cp, p[1], *p); p += 2; goto wd;
 	case DASM_MARK: mark = cp; break;
 	case DASM_ESC: action = *p++;
+	  /* fallthrough */
 	default: *cp++ = action; break;
 	case DASM_SECTION: case DASM_STOP: goto stop;
 	}
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 15de7e33..2d570bb9 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -2188,9 +2188,12 @@ static void asm_setup_regsp(ASMState *as)
 	if (ir->op2 != REF_NIL && as->evenspill < 4)
 	  as->evenspill = 4;  /* lj_cdata_newv needs 4 args. */
       }
+      /* fallthrough */
 #else
+      /* fallthrough */
     case IR_CNEW:
 #endif
+      /* fallthrough */
     case IR_TNEW: case IR_TDUP: case IR_CNEWI: case IR_TOSTR:
     case IR_BUFSTR:
       ir->prev = REGSP_HINT(RID_RET);
@@ -2206,6 +2209,7 @@ static void asm_setup_regsp(ASMState *as)
     case IR_LDEXP:
 #endif
 #endif
+      /* fallthrough */
     case IR_POW:
       if (!LJ_SOFTFP && irt_isnum(ir->t)) {
 	if (inloop)
@@ -2217,7 +2221,7 @@ static void asm_setup_regsp(ASMState *as)
 	continue;
 #endif
       }
-      /* fallthrough for integer POW */
+      /* fallthrough */ /* for integer POW */
     case IR_DIV: case IR_MOD:
       if (!irt_isnum(ir->t)) {
 	ir->prev = REGSP_HINT(RID_RET);
@@ -2254,6 +2258,7 @@ static void asm_setup_regsp(ASMState *as)
     case IR_BSHL: case IR_BSHR: case IR_BSAR:
       if ((as->flags & JIT_F_BMI2))  /* Except if BMI2 is available. */
 	break;
+      /* fallthrough */
     case IR_BROL: case IR_BROR:
       if (!irref_isk(ir->op2) && !ra_hashint(IR(ir->op2)->r)) {
 	IR(ir->op2)->r = REGSP_HINT(RID_ECX);
diff --git a/src/lj_cparse.c b/src/lj_cparse.c
index 07c643d4..cd032b8e 100644
--- a/src/lj_cparse.c
+++ b/src/lj_cparse.c
@@ -595,28 +595,34 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
 	k->id = k2.id > k3.id ? k2.id : k3.id;
 	continue;
       }
+      /* fallthrough */
     case 1:
       if (cp_opt(cp, CTOK_OROR)) {
 	cp_expr_sub(cp, &k2, 2); k->i32 = k->u32 || k2.u32; k->id = CTID_INT32;
 	continue;
       }
+      /* fallthrough */
     case 2:
       if (cp_opt(cp, CTOK_ANDAND)) {
 	cp_expr_sub(cp, &k2, 3); k->i32 = k->u32 && k2.u32; k->id = CTID_INT32;
 	continue;
       }
+      /* fallthrough */
     case 3:
       if (cp_opt(cp, '|')) {
 	cp_expr_sub(cp, &k2, 4); k->u32 = k->u32 | k2.u32; goto arith_result;
       }
+      /* fallthrough */
     case 4:
       if (cp_opt(cp, '^')) {
 	cp_expr_sub(cp, &k2, 5); k->u32 = k->u32 ^ k2.u32; goto arith_result;
       }
+      /* fallthrough */
     case 5:
       if (cp_opt(cp, '&')) {
 	cp_expr_sub(cp, &k2, 6); k->u32 = k->u32 & k2.u32; goto arith_result;
       }
+      /* fallthrough */
     case 6:
       if (cp_opt(cp, CTOK_EQ)) {
 	cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 == k2.u32; k->id = CTID_INT32;
@@ -625,6 +631,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
 	cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 != k2.u32; k->id = CTID_INT32;
 	continue;
       }
+      /* fallthrough */
     case 7:
       if (cp_opt(cp, '<')) {
 	cp_expr_sub(cp, &k2, 8);
@@ -659,6 +666,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
 	k->id = CTID_INT32;
 	continue;
       }
+      /* fallthrough */
     case 8:
       if (cp_opt(cp, CTOK_SHL)) {
 	cp_expr_sub(cp, &k2, 9); k->u32 = k->u32 << k2.u32;
@@ -671,6 +679,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
 	  k->u32 = k->u32 >> k2.u32;
 	continue;
       }
+      /* fallthrough */
     case 9:
       if (cp_opt(cp, '+')) {
 	cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 + k2.u32;
@@ -680,6 +689,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
       } else if (cp_opt(cp, '-')) {
 	cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 - k2.u32; goto arith_result;
       }
+      /* fallthrough */
     case 10:
       if (cp_opt(cp, '*')) {
 	cp_expr_unary(cp, &k2); k->u32 = k->u32 * k2.u32; goto arith_result;
diff --git a/src/lj_err.c b/src/lj_err.c
index 9903d273..8d7134d9 100644
--- a/src/lj_err.c
+++ b/src/lj_err.c
@@ -167,6 +167,7 @@ static void *err_unwind(lua_State *L, void *stopcf, int errcode)
     case FRAME_CONT:  /* Continuation frame. */
       if (frame_iscont_fficb(frame))
 	goto unwind_c;
+      /* fallthrough */
     case FRAME_VARG:  /* Vararg frame. */
       frame = frame_prevd(frame);
       break;
diff --git a/src/lj_opt_sink.c b/src/lj_opt_sink.c
index a16d112f..c16363e7 100644
--- a/src/lj_opt_sink.c
+++ b/src/lj_opt_sink.c
@@ -100,8 +100,8 @@ static void sink_mark_ins(jit_State *J)
 	   (LJ_32 && ir+1 < irlast && (ir+1)->o == IR_HIOP &&
 	    !sink_checkphi(J, ir, (ir+1)->op2))))
 	irt_setmark(ir->t);  /* Mark ineligible allocation. */
-      /* fallthrough */
 #endif
+      /* fallthrough */
     case IR_USTORE:
       irt_setmark(IR(ir->op2)->t);  /* Mark stored value. */
       break;
diff --git a/src/lj_parse.c b/src/lj_parse.c
index 343fa797..e238afa3 100644
--- a/src/lj_parse.c
+++ b/src/lj_parse.c
@@ -2684,7 +2684,8 @@ static int parse_stmt(LexState *ls)
       lj_lex_next(ls);
       parse_goto(ls);
       break;
-    }  /* else: fallthrough */
+    }
+    /* fallthrough */
   default:
     parse_call_assign(ls);
     break;
diff --git a/src/luajit.c b/src/luajit.c
index 1ca24301..3a3ec247 100644
--- a/src/luajit.c
+++ b/src/luajit.c
@@ -421,6 +421,7 @@ static int collectargs(char **argv, int *flags)
       break;
     case 'e':
       *flags |= FLAGS_EXEC;
+      /* fallthrough */
     case 'j':  /* LuaJIT extension */
     case 'l':
       *flags |= FLAGS_OPTION;
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (10 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 13:21   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17  7:39   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
                   ` (9 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)

This patch adds the `/* fallthrough */` comment in one more place where
it was missing for the ARM64 build and thus triggered the
`-Wimplicit-fallthrough` [1] warning.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough

Sergey Kaplun:
* added the description for the commit

Part of tarantool/tarantool#8825
---
 dynasm/dasm_arm64.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dynasm/dasm_arm64.h b/dynasm/dasm_arm64.h
index 47e1e074..ff21236d 100644
--- a/dynasm/dasm_arm64.h
+++ b/dynasm/dasm_arm64.h
@@ -427,6 +427,7 @@ int dasm_encode(Dst_DECL, void *buffer)
 	  break;
 	case DASM_REL_LG:
 	  CK(n >= 0, UNDEF_LG);
+	  /* fallthrough */
 	case DASM_REL_PC:
 	  CK(n >= 0, UNDEF_PC);
 	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4;
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (11 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 13:25   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17  7:44   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
                   ` (8 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)

This patch adds the `/* fallthrough */` comments in the remaining places
where they were missing for the ARM build and thus triggered the
`-Wimplicit-fallthrough` [1] warning.

Also, this commit sets the corresponding flag in
<cmake/SetTargetFlags.cmake>.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough

Sergey Kaplun:
* added the description for the commit

Part of tarantool/tarantool#8825
---
 cmake/SetTargetFlags.cmake | 6 ++++++
 src/lj_asm.c               | 2 +-
 src/lj_asm_arm.h           | 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
index 3b9e481d..d309989e 100644
--- a/cmake/SetTargetFlags.cmake
+++ b/cmake/SetTargetFlags.cmake
@@ -8,6 +8,12 @@
 
 include(CheckUnwindTables)
 
+# Clang does not recognize comment markers.
+if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
+    AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
+  AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
+endif()
+
 if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
   set(BUILDVM_MODE machasm)
 else() # Linux and FreeBSD.
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 2d570bb9..25b96264 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -2176,8 +2176,8 @@ static void asm_setup_regsp(ASMState *as)
 #if LJ_SOFTFP
     case IR_MIN: case IR_MAX:
       if ((ir+1)->o != IR_HIOP) break;
-      /* fallthrough */
 #endif
+    /* fallthrough */
     /* C calls evict all scratch regs and return results in RID_RET. */
     case IR_SNEW: case IR_XSNEW: case IR_NEWREF: case IR_BUFPUT:
       if (REGARG_NUMGPR < 3 && as->evenspill < 3)
diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h
index 6ae6e2f2..2894e5c9 100644
--- a/src/lj_asm_arm.h
+++ b/src/lj_asm_arm.h
@@ -979,7 +979,7 @@ static ARMIns asm_fxloadins(IRIns *ir)
   case IRT_I16: return ARMI_LDRSH;
   case IRT_U16: return ARMI_LDRH;
   case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VLDR_D;
-  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S;
+  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S;  /* fallthrough */
   default: return ARMI_LDR;
   }
 }
@@ -990,7 +990,7 @@ static ARMIns asm_fxstoreins(IRIns *ir)
   case IRT_I8: case IRT_U8: return ARMI_STRB;
   case IRT_I16: case IRT_U16: return ARMI_STRH;
   case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VSTR_D;
-  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S;
+  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S;  /* fallthrough */
   default: return ARMI_STR;
   }
 }
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (12 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 13:35   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17  8:29   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
                   ` (7 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Thanks to Sergey Ostanevich.

(cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)

This patch just reverts the commit
48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
debug.getinfo(1,'>S')") and applies the one from the main repo for
consistency with the upstream.
---
 src/lj_debug.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/src/lj_debug.c b/src/lj_debug.c
index 654dc913..c4edcabb 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -431,16 +431,12 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
   TValue *frame = NULL;
   TValue *nextframe = NULL;
   GCfunc *fn;
-  if (*what == '>') { /* we have to have an extra arg on stack */
-    if (lua_gettop(L) > 2) {
-      TValue *func = L->top - 1;
-      api_check(L, tvisfunc(func));
-      fn = funcV(func);
-      L->top--;
-      what++;
-    } else { /* need better error to display? */
-      return 0;
-    }
+  if (*what == '>') {
+    TValue *func = L->top - 1;
+    if (!tvisfunc(func)) return 0;
+    fn = funcV(func);
+    L->top--;
+    what++;
   } else {
     uint32_t offset = (uint32_t)ar->i_ci & 0xffff;
     uint32_t size = (uint32_t)ar->i_ci >> 16;
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (13 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 14:07   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17  8:57   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
                   ` (6 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Thanks to Yichun Zhang.

(cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)

This patch is a follow-up to commit
944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
check."). It tries to fix the issue where `J->baseslot == LJ_MAX_JSLOTS`
leads to an assertion failure. Since the predecessor patch, no code
path can trigger that failure, because we always check whether the new
baseslot + framesize (+ vargframe) >= `LJ_MAX_JSLOTS`. Since the minimum
framesize is 1 (see <src/lj_parse.c> for details), this assertion
failure cannot occur. This patch is added for consistency with the
upstream.

Since the predecessor patch fixes the issue, there is no new test case
to add.
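
The bound argument above can be sketched in C. This is only a model, not
the recorder code: `frame_fits()` and `check_accepted_frames()` are
hypothetical helpers, and the limit of 250 is an assumption about the
value of `LJ_MAX_JSLOTS`.

```c
#include <assert.h>

/* Assumed value; the real LJ_MAX_JSLOTS comes from the LuaJIT sources. */
#define MAX_JSLOTS 250u

/*
** Hypothetical model of the overflow check described above: a new frame
** is rejected when baseslot + framesize (+ vargframe) >= MAX_JSLOTS.
*/
static int frame_fits(unsigned baseslot, unsigned framesize,
                      unsigned vargframe)
{
  return baseslot + framesize + vargframe < MAX_JSLOTS;
}

/*
** Walk over every (baseslot, framesize) pair with framesize >= 1 and
** assert that an accepted frame never has baseslot == MAX_JSLOTS.
** Returns the number of accepted pairs.
*/
static unsigned check_accepted_frames(void)
{
  unsigned baseslot, framesize, accepted = 0;
  for (baseslot = 1; baseslot <= MAX_JSLOTS; baseslot++) {
    for (framesize = 1; framesize <= MAX_JSLOTS; framesize++) {
      if (frame_fits(baseslot, framesize, 0)) {
        assert(baseslot < MAX_JSLOTS);
        assert(baseslot + framesize <= MAX_JSLOTS);
        accepted++;
      }
    }
  }
  return accepted;
}
```

Calling `check_accepted_frames()` passes without tripping an assertion in
this model, which matches the claim that the relaxed assertions only
bring the code in line with the upstream.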

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#8825
---
 src/lj_record.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/lj_record.c b/src/lj_record.c
index 02d9db9e..6030f77c 100644
--- a/src/lj_record.c
+++ b/src/lj_record.c
@@ -87,9 +87,9 @@ static void rec_check_slots(jit_State *J)
   BCReg s, nslots = J->baseslot + J->maxslot;
   int32_t depth = 0;
   cTValue *base = J->L->base - J->baseslot;
-  lua_assert(J->baseslot >= 1+LJ_FR2 && J->baseslot < LJ_MAX_JSLOTS);
+  lua_assert(J->baseslot >= 1+LJ_FR2);
   lua_assert(J->baseslot == 1+LJ_FR2 || (J->slot[J->baseslot-1] & TREF_FRAME));
-  lua_assert(nslots < LJ_MAX_JSLOTS);
+  lua_assert(nslots <= LJ_MAX_JSLOTS);
   for (s = 0; s < nslots; s++) {
     TRef tr = J->slot[s];
     if (tr) {
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (14 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-15 14:38   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 10:53   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF Sergey Kaplun via Tarantool-patches
                   ` (5 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)

This commit fixes a possible integer overflow of the separator's length
counter while parsing long strings. The overflow may cause the parser
to accept a string with unbalanced long brackets as correct. Since it is
pointless to parse overly long string separators in the hope that the
string is correct, just use a hardcoded limit (0x20000000, i.e. 2 ^ 29,
is enough).

Be aware that this limit is different for Lua 5.1.

We can't check the string overflow itself without a really large file,
because the ERR_MEM error will be raised due to the string buffer
reallocations during parsing. Keeping such a huge file in the repo is
pointless, so just check that we don't parse a long string once the
separator exceeds the aforementioned length.
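
The capped counting loop can be sketched on a plain C string. This is a
simplified model, not the lexer itself: `skipeq_model()` and its `limit`
parameter are hypothetical, and the real `lex_skipeq()` reads from the
lexer state and uses the fixed limit.

```c
/*
** Model of the separator scan. `p` must point at the opening '[' or the
** closing ']' of a long bracket. Returns the number of '=' characters
** if the bracket is completed by the same character, and (-count) - 1
** otherwise, mirroring the return convention of lex_skipeq().
*/
static int skipeq_model(const char *p, int limit)
{
  char s = *p;    /* Bracket character that must close the separator. */
  int count = 0;
  /* Stop at the first non-'=' character or when the cap is reached. */
  while (*++p == '=' && count < limit)
    count++;
  return (*p == s) ? count : (-count) - 1;
}
```

With a small cap the effect is easy to see: `skipeq_model("[==[", 3)`
yields 2, while `skipeq_model("[=====[", 3)` stops on an `=` after three
steps and yields -4, so an overlong separator is reported as invalid
instead of letting the counter grow without bound.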

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8825
---
 src/lj_lex.c                                  |  2 +-
 .../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua

diff --git a/src/lj_lex.c b/src/lj_lex.c
index 52856912..c66660d7 100644
--- a/src/lj_lex.c
+++ b/src/lj_lex.c
@@ -138,7 +138,7 @@ static int lex_skipeq(LexState *ls)
   int count = 0;
   LexChar s = ls->c;
   lua_assert(s == '[' || s == ']');
-  while (lex_savenext(ls) == '=')
+  while (lex_savenext(ls) == '=' && count < 0x20000000)
     count++;
   return (ls->c == s) ? count : (-count) - 1;
 }
diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
new file mode 100644
index 00000000..fda69d17
--- /dev/null
+++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
@@ -0,0 +1,31 @@
+local tap = require('tap')
+
+-- Test to check that we avoid parsing of too long separator
+-- for long strings.
+-- See also the discussion in the
+-- https://github.com/LuaJIT/LuaJIT/issues/812.
+
+local test = tap.test('lj-812-too-long-string-separator'):skipcond({
+  ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
+})
+test:plan(2)
+
+-- We can't check the string overflow itself without a really
+-- large file, because the ERR_MEM error will be raised, due to
+-- the string buffer reallocations during parsing.
+-- Keeping such a huge file in the repo is pointless, so check
+-- that we don't parse long strings after some separator length.
+-- Be aware that this limit is different for Lua 5.1.
+
+-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
+local separator = string.rep('=', 0x20000000 + 1)
+local test_str = ('return [%s[]%s]'):format(separator, separator)
+
+local f, err = loadstring(test_str, 'empty_str_f')
+test:ok(not f, 'correct status when parsing string with too long separator')
+
+-- Check error message.
+test:ok(tostring(err):match('invalid long string delimiter'),
+        'correct error when parsing string with too long separator')
+
+test:done(true)
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (15 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-16  9:01   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 11:06   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable Sergey Kaplun via Tarantool-patches
                   ` (4 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by James Cowgill.

(cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)

The issue is observed for the following merged IRs:
|    p64 HREF   0001  "a"            ; or other keys
| >  p64 EQ     0002  [0x4002d0c528] ; nilnode
Sometimes we need to rematerialize a constant while evicting a
register. In that case, the instruction that rematerializes the constant
is placed in the branch delay slot, which is supposed to contain the
load of the trace exit number into the `$ra` register. The resulting
assembly is the following (for example):
| beq     ra, r1, 0x400abee9b0  ->exit
| lui     r1, 65531   ; delay slot without setting of the `ra`
This leads to the assertion failure during trace exit in
`lj_trace_exit()`, since the trace number is incorrect.

This patch moves the constant register allocations above the main
instruction emitting code in `asm_href()`.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8825
---
 src/lj_asm_mips.h                             |  42 +++++---
 ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
 2 files changed, 126 insertions(+), 17 deletions(-)
 create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua

diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index c27d8413..23ffc3aa 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
   Reg dest = ra_dest(as, ir, allow);
   Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
   Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
+#if LJ_64
+  Reg cmp64 = RID_NONE;
+#endif
   IRRef refkey = ir->op2;
   IRIns *irkey = IR(refkey);
   int isk = irref_isk(refkey);
@@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
 #endif
   tmp2 = ra_scratch(as, allow);
   rset_clear(allow, tmp2);
+#if LJ_64
+  if (LJ_SOFTFP || !irt_isnum(kt)) {
+    /* Allocate cmp64 register used for 64-bit comparisons */
+    if (LJ_SOFTFP && irt_isnum(kt)) {
+      cmp64 = key;
+    } else if (!isk && irt_isaddr(kt)) {
+      cmp64 = tmp2;
+    } else {
+      int64_t k;
+      if (isk && irt_isaddr(kt)) {
+	k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
+      } else {
+	lua_assert(irt_ispri(kt) && !irt_isnil(kt));
+	k = ~((int64_t)~irt_toitype(ir->t) << 47);
+      }
+      cmp64 = ra_allock(as, k, allow);
+      rset_clear(allow, cmp64);
+    }
+  }
+#endif
 
   /* Key not found in chain: jump to exit (if merged) or load niltv. */
   l_end = emit_label(as);
@@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
     emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
     emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
     emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
-  } else if (LJ_SOFTFP && irt_isnum(kt)) {
-    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
-    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
-  } else if (irt_isaddr(kt)) {
-    Reg refk = tmp2;
-    if (isk) {
-      int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
-      refk = ra_allock(as, k, allow);
-      rset_clear(allow, refk);
-    }
-    emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
-    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
   } else {
-    Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
-    rset_clear(allow, pri);
-    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
-    emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
-    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
+    emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
+    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
   }
   *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
   if (!isk && irt_isaddr(kt)) {
diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
new file mode 100644
index 00000000..8c75e69c
--- /dev/null
+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
@@ -0,0 +1,101 @@
+local tap = require('tap')
+-- Test file to demonstrate the incorrect JIT behaviour for HREF
+-- IR compilation on mips64.
+-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
+local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(1)
+
+-- To reproduce the issue we need to compile a trace with
+-- `IR_HREF`, with a lookup of constant hash key GC value. To
+-- prevent an `IR_HREFK` from being emitted instead, we need a
+-- table with a huge hash part. Delta of address between the
+-- start of the hash part of the table and the current node to
+-- lookup must be more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- See <src/lj_record.c>, for details.
+-- XXX: This constant is well suited to keep the test from being
+-- flaky, because the aforementioned delta is always large
+-- enough. Also, this constant avoids table rehashing, when
+-- inserting new keys.
+local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
+
+-- XXX: don't set `hotexit` to prevent compilation of trace after
+-- exiting the main test cycle.
+jit.opt.start('hotloop=1')
+
+-- Don't use `table.new()` here intentionally -- this leads to
+-- the allocation failure for the mcode memory, so traces are
+-- not compiled.
+local filled_tab = {}
+-- Filling-up the table with GC values to minimize the amount of
+-- hash collisions and increase delta between the start of the
+-- hash part of the table and currently stored node.
+for _ = 1, N_HASH_FIELDS do
+  filled_tab[1LL] = 1
+end
+
+-- luacheck: no unused
+local tab_value_a
+local tab_value_b
+local tab_value_c
+local tab_value_d
+local tab_value_e
+local tab_value_f
+local tab_value_g
+local tab_value_h
+local tab_value_i
+
+-- The function for this trace has a bunch of the following IRs:
+--    p64 HREF   0001  "a"            ; or other keys
+-- >  p64 EQ     0002  [0x4002d0c528] ; nilnode
+-- Sometimes we need to rematerialize a constant while evicting
+-- a register. In that case, the instruction that rematerializes
+-- the constant is placed in the branch delay slot, which is
+-- supposed to contain the load of the trace exit number into the
+-- `$ra` register. This leads to the assertion failure during
+-- trace exit in `lj_trace_exit()`, since the trace number is
+-- incorrect. The number of side exits to check is empirical
+-- (even a little more than necessary, just in case).
+local function href_const(tab)
+  tab_value_a = tab.a
+  tab_value_b = tab.b
+  tab_value_c = tab.c
+  tab_value_d = tab.d
+  tab_value_e = tab.e
+  tab_value_f = tab.f
+  tab_value_g = tab.g
+  tab_value_h = tab.h
+  tab_value_i = tab.i
+end
+
+-- Compile main trace first.
+href_const(filled_tab)
+href_const(filled_tab)
+
+-- Now brute-force side exits to check that they are compiled
+-- correctly. Take side exits in the reverse order to take a new
+-- side exit each time.
+filled_tab.i = 'i'
+href_const(filled_tab)
+filled_tab.h = 'h'
+href_const(filled_tab)
+filled_tab.g = 'g'
+href_const(filled_tab)
+filled_tab.f = 'f'
+href_const(filled_tab)
+filled_tab.e = 'e'
+href_const(filled_tab)
+filled_tab.d = 'd'
+href_const(filled_tab)
+filled_tab.c = 'c'
+href_const(filled_tab)
+filled_tab.b = 'b'
+href_const(filled_tab)
+filled_tab.a = 'a'
+href_const(filled_tab)
+
+test:ok(true, 'no assertion failures during trace exits')
+
+test:done(true)
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (16 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-16  9:03   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 12:01   ` Sergey Bronnikov via Tarantool-patches
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port Sergey Kaplun via Tarantool-patches
                   ` (3 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Cleanup only, bug cannot trigger.
Thanks to Domingo Alvarez Duarte.

(cherry-picked from commit 5c911998a3c85d024a8006feafc68d0b4c962fd8)

This patch fixes the shadowed local variable `n` in the `template__`
function from <dynasm/dasm_mips.lua> by renaming the inner variable to
`m`. Since the bug cannot be triggered, no test is provided.
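
The effect of such shadowing can be sketched in C (the original code is
Lua, so this is only an analogy, and both helper functions and the value
40 are hypothetical). While the inner `n` is in scope, `n = n + 1`
updates the inner variable, so the outer index does not advance.

```c
/* Shadowed variant: the increment lands on the inner `n`. */
static int index_after_shadowed_update(void)
{
  int n = 1;             /* Outer parameter index. */
  {
    int offset = 40;     /* Hypothetical label offset. */
    int n = offset;      /* Shadows the outer `n`. */
    n = n + 1;           /* Updates only the inner `n`. */
    (void)n;
  }
  return n;              /* The outer index is unchanged. */
}

/* Renamed variant: the increment reaches the outer `n`. */
static int index_after_renamed_update(void)
{
  int n = 1;             /* Outer parameter index. */
  {
    int m = 40;          /* Renamed inner variable, no shadowing. */
    m = m + 2048;
    (void)m;
    n = n + 1;           /* Advances the outer index as intended. */
  }
  return n;
}
```

In this model the shadowed variant leaves the index at 1 and the renamed
variant advances it to 2. The difference is only observable if another
parameter is read after the label, which is presumably why the upstream
commit calls this a cleanup.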

Sergey Kaplun:
* added the description for the problem
---
 dynasm/dasm_mips.lua | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dynasm/dasm_mips.lua b/dynasm/dasm_mips.lua
index 78a4e34a..bd2a2b43 100644
--- a/dynasm/dasm_mips.lua
+++ b/dynasm/dasm_mips.lua
@@ -809,9 +809,9 @@ map_op[".template__"] = function(params, template, nparams)
     elseif p == "X" then
       op = op + parse_index(params[n]); n = n + 1
     elseif p == "B" or p == "J" then
-      local mode, n, s = parse_label(params[n], false)
-      if p == "B" then n = n + 2048 end
-      waction("REL_"..mode, n, s, 1)
+      local mode, m, s = parse_label(params[n], false)
+      if p == "B" then m = m + 2048 end
+      waction("REL_"..mode, m, s, 1)
       n = n + 1
     elseif p == "A" then
       op = op + parse_imm(params[n], 5, 6, 0, false); n = n + 1
-- 
2.41.0



* [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port.
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (17 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable Sergey Kaplun via Tarantool-patches
@ 2023-08-09 15:36 ` Sergey Kaplun via Tarantool-patches
  2023-08-16  9:16   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 13:03   ` Sergey Bronnikov via Tarantool-patches
  2023-08-16 15:35 ` [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (2 subsequent siblings)
  21 siblings, 2 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-09 15:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Contributed by Hua Zhang, YunQiang Su from Wave Computing,
and Radovan Birdic from RT-RK.
Sponsored by Wave Computing.

(cherry-picked from commit 94d0b53004a5fa368defa4307a17edcdb87fe727)

This patch adds support for MIPS Release 6 [1] for the 64-bit build.
This includes:
* The global `_map_def` value is set in <dynasm/dynasm.lua>. Its `MIPSR6`
  key specifies the corresponding instruction set support. Also, `MIPSR6`
  is defined in `DYNASM_FLAGS` (`DASM_AFLAGS`).
* New instructions are added within <dynasm/dasm_mips.lua>, they are
  used if the aforementioned key is set.
* Obsolete instructions (those no longer used in r6) are used in the
  opposite case (if `MIPSR6` isn't set).
* New opcode maps are added into <src/jit/dis_mips.lua>.
* The `map_arch` table in <src/jit/bcsave.lua> is refactored for more
  convenient usage. Now each arch key contains a table with the
  corresponding info about the supported architecture:
    - `e`: endianness; "le" or "be"
    - `b`: bit-width of the supported architecture; 32 or 64
    - `m`: machine specification (see `e_machine` in man elf)
    - `f`: processor-specific flags (see `e_flags` in man elf)
    - `p`: number that identifies the type of target machine [2] for
      Portable Executable format [3].
* New `LJ_TARGET_MIPSR6` define is set for MIPSR6 in <src/lj_arch.h>.
* The corresponding "MIPS32R6", "MIPS64R6" CPU strings are added to
  <src/lj_jit.h>.
* MIPSR6 instructions are added to <src/lj_target_mips.h>, some
  obsolete instructions are removed or defined only for the non-MIPSR6
  build.
* All release-dependent instructions in <src/lj_asm_mips.h> are
  instrumented with the `LJ_TARGET_MIPSR6` macro.
* `f20`, `f21`, `f22` FP registers are defined as `FTMP0`, `FTMP1`,
  `FTMP2` respectively in the VM.
* All release-dependent instructions in <src/vm_mips64.dasc> are
  instrumented with the `MIPSR6` macro.
* The `sfmin_max` macro now takes a third operand for the MIPSR6 build.
* Fix implicit fallthrough warning for `LJ_SOFTFP && !LJ_NEED_FP64`
  build in <src/lj_asm.c>.

Note that 32-bit r6 targets are still unsupported, because supporting
them is difficult and most available r6 CPUs are 64-bit.
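
The per-architecture record described for `map_arch` above can be
sketched as a C struct (the real table is a Lua table; `ArchInfo`,
`arch_table` and `arch_lookup()` are hypothetical names). Only the values
that are standard ELF and PE constants are filled in; the MIPS `e_flags`
value is deliberately left as a placeholder here.

```c
#include <stddef.h>
#include <string.h>

/* One record per architecture, mirroring the e/b/m/f/p keys. */
typedef struct ArchInfo {
  const char *name;  /* Architecture key. */
  const char *e;     /* Endianness: "le" or "be". */
  int b;             /* Bit-width: 32 or 64. */
  unsigned m;        /* ELF e_machine value. */
  unsigned f;        /* ELF e_flags; 0 when not set in this sketch. */
  unsigned p;        /* PE machine type; 0 when PE is not supported. */
} ArchInfo;

static const ArchInfo arch_table[] = {
  /* EM_X86_64 is 62, IMAGE_FILE_MACHINE_AMD64 is 0x8664. */
  { "x64",        "le", 64, 62, 0, 0x8664 },
  /* EM_MIPS is 8; the e_flags placeholder is not the real value. */
  { "mips64r6el", "le", 64, 8,  0, 0 },
};

/* Find a record by architecture name; returns NULL if it is unknown. */
static const ArchInfo *arch_lookup(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof(arch_table) / sizeof(arch_table[0]); i++) {
    if (strcmp(arch_table[i].name, name) == 0)
      return &arch_table[i];
  }
  return NULL;
}
```

Keeping all properties of a target in one record means that adding the
r6 variants is a matter of adding entries, instead of extending several
parallel tables.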

[1]: https://www.mips.com/products/architectures/mips64/
[2]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#machine-types
[3]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format

Sergey Kaplun:
* added the description for the feature

Part of tarantool/tarantool#8825
---
 cmake/SetDynASMFlags.cmake |   5 +
 dynasm/dasm_mips.h         |  13 +-
 dynasm/dasm_mips.lua       | 625 +++++++++++++++++++++++--------------
 dynasm/dynasm.lua          |   1 +
 src/Makefile.original      |   3 +
 src/jit/bcsave.lua         |  84 ++---
 src/jit/dis_mips.lua       | 293 +++++++++++++++--
 src/jit/dis_mips64r6.lua   |  17 +
 src/jit/dis_mips64r6el.lua |  17 +
 src/lj_arch.h              |  29 +-
 src/lj_asm.c               |   2 +-
 src/lj_asm_mips.h          | 114 ++++++-
 src/lj_emit_mips.h         |  15 +-
 src/lj_jit.h               |   8 +
 src/lj_target_mips.h       |  52 ++-
 src/vm_mips64.dasc         | 370 ++++++++++++++++++++--
 16 files changed, 1301 insertions(+), 347 deletions(-)
 create mode 100644 src/jit/dis_mips64r6.lua
 create mode 100644 src/jit/dis_mips64r6el.lua

diff --git a/cmake/SetDynASMFlags.cmake b/cmake/SetDynASMFlags.cmake
index 142d7e64..7eead6e9 100644
--- a/cmake/SetDynASMFlags.cmake
+++ b/cmake/SetDynASMFlags.cmake
@@ -64,6 +64,11 @@ elseif(LUAJIT_ARCH STREQUAL "mips")
   endif()
 endif()
 
+string(FIND "${TESTARCH}" "LJ_TARGET_MIPSR6" FOUND)
+if(NOT FOUND EQUAL -1)
+  AppendFlags(DYNASM_FLAGS -D MIPSR6)
+endif()
+
 string(FIND "${TESTARCH}" "LJ_LE 1" FOUND)
 if(NOT FOUND EQUAL -1)
   list(APPEND DYNASM_FLAGS -D ENDIAN_LE)
diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
index 71a835b2..7d06aa72 100644
--- a/dynasm/dasm_mips.h
+++ b/dynasm/dasm_mips.h
@@ -355,14 +355,15 @@ int dasm_encode(Dst_DECL, void *buffer)
 	  CK(n >= 0, UNDEF_PC);
 	  n = *DASM_POS2PTR(D, n);
 	  if (ins & 2048)
-	    n = n - (int)((char *)cp - base);
-	  else
 	    n = (n + (int)(size_t)base) & 0x0fffffff;
-	patchrel:
+	  else
+	    n = n - (int)((char *)cp - base);
+	patchrel: {
+	  unsigned int e = 16 + ((ins >> 12) & 15);
 	  CK((n & 3) == 0 &&
-	     ((n + ((ins & 2048) ? 0x00020000 : 0)) >>
-	       ((ins & 2048) ? 18 : 28)) == 0, RANGE_REL);
-	  cp[-1] |= ((n>>2) & ((ins & 2048) ? 0x0000ffff: 0x03ffffff));
+	     ((n + ((ins & 2048) ? 0 : (1<<(e+1)))) >> (e+2)) == 0, RANGE_REL);
+	  cp[-1] |= ((n>>2) & ((1<<e)-1));
+	  }
 	  break;
 	case DASM_LABEL_LG:
 	  ins &= 2047; if (ins >= 20) D->globals[ins-10] = (void *)(base + n);
diff --git a/dynasm/dasm_mips.lua b/dynasm/dasm_mips.lua
index bd2a2b43..ccdc53cd 100644
--- a/dynasm/dasm_mips.lua
+++ b/dynasm/dasm_mips.lua
@@ -6,6 +6,7 @@
 ------------------------------------------------------------------------------
 
 local mips64 = mips64
+local mipsr6 = _map_def.MIPSR6
 
 -- Module information:
 local _info = {
@@ -238,7 +239,6 @@ local map_op = {
   bne_3 =	"14000000STB",
   blez_2 =	"18000000SB",
   bgtz_2 =	"1c000000SB",
-  addi_3 =	"20000000TSI",
   li_2 =	"24000000TI",
   addiu_3 =	"24000000TSI",
   slti_3 =	"28000000TSI",
@@ -248,40 +248,22 @@ local map_op = {
   ori_3 =	"34000000TSU",
   xori_3 =	"38000000TSU",
   lui_2 =	"3c000000TU",
-  beqzl_2 =	"50000000SB",
-  beql_3 =	"50000000STB",
-  bnezl_2 =	"54000000SB",
-  bnel_3 =	"54000000STB",
-  blezl_2 =	"58000000SB",
-  bgtzl_2 =	"5c000000SB",
-  daddi_3 =	mips64 and "60000000TSI",
   daddiu_3 =	mips64 and "64000000TSI",
   ldl_2 =	mips64 and "68000000TO",
   ldr_2 =	mips64 and "6c000000TO",
   lb_2 =	"80000000TO",
   lh_2 =	"84000000TO",
-  lwl_2 =	"88000000TO",
   lw_2 =	"8c000000TO",
   lbu_2 =	"90000000TO",
   lhu_2 =	"94000000TO",
-  lwr_2 =	"98000000TO",
   lwu_2 =	mips64 and "9c000000TO",
   sb_2 =	"a0000000TO",
   sh_2 =	"a4000000TO",
-  swl_2 =	"a8000000TO",
   sw_2 =	"ac000000TO",
-  sdl_2 =	mips64 and "b0000000TO",
-  sdr_2 =	mips64 and "b1000000TO",
-  swr_2 =	"b8000000TO",
-  cache_2 =	"bc000000NO",
-  ll_2 =	"c0000000TO",
   lwc1_2 =	"c4000000HO",
-  pref_2 =	"cc000000NO",
   ldc1_2 =	"d4000000HO",
   ld_2 =	mips64 and "dc000000TO",
-  sc_2 =	"e0000000TO",
   swc1_2 =	"e4000000HO",
-  scd_2 =	mips64 and "f0000000TO",
   sdc1_2 =	"f4000000HO",
   sd_2 =	mips64 and "fc000000TO",
 
@@ -289,10 +271,6 @@ local map_op = {
   nop_0 =	"00000000",
   sll_3 =	"00000000DTA",
   sextw_2 =	"00000000DT",
-  movf_2 =	"00000001DS",
-  movf_3 =	"00000001DSC",
-  movt_2 =	"00010001DS",
-  movt_3 =	"00010001DSC",
   srl_3 =	"00000002DTA",
   rotr_3 =	"00200002DTA",
   sra_3 =	"00000003DTA",
@@ -301,31 +279,16 @@ local map_op = {
   rotrv_3 =	"00000046DTS",
   drotrv_3 =	mips64 and "00000056DTS",
   srav_3 =	"00000007DTS",
-  jr_1 =	"00000008S",
   jalr_1 =	"0000f809S",
   jalr_2 =	"00000009DS",
-  movz_3 =	"0000000aDST",
-  movn_3 =	"0000000bDST",
   syscall_0 =	"0000000c",
   syscall_1 =	"0000000cY",
   break_0 =	"0000000d",
   break_1 =	"0000000dY",
   sync_0 =	"0000000f",
-  mfhi_1 =	"00000010D",
-  mthi_1 =	"00000011S",
-  mflo_1 =	"00000012D",
-  mtlo_1 =	"00000013S",
   dsllv_3 =	mips64 and "00000014DTS",
   dsrlv_3 =	mips64 and "00000016DTS",
   dsrav_3 =	mips64 and "00000017DTS",
-  mult_2 =	"00000018ST",
-  multu_2 =	"00000019ST",
-  div_2 =	"0000001aST",
-  divu_2 =	"0000001bST",
-  dmult_2 =	mips64 and "0000001cST",
-  dmultu_2 =	mips64 and "0000001dST",
-  ddiv_2 =	mips64 and "0000001eST",
-  ddivu_2 =	mips64 and "0000001fST",
   add_3 =	"00000020DST",
   move_2 =	mips64 and "00000025DS" or "00000021DS",
   addu_3 =	"00000021DST",
@@ -369,32 +332,9 @@ local map_op = {
   bgez_2 =	"04010000SB",
   bltzl_2 =	"04020000SB",
   bgezl_2 =	"04030000SB",
-  tgei_2 =	"04080000SI",
-  tgeiu_2 =	"04090000SI",
-  tlti_2 =	"040a0000SI",
-  tltiu_2 =	"040b0000SI",
-  teqi_2 =	"040c0000SI",
-  tnei_2 =	"040e0000SI",
-  bltzal_2 =	"04100000SB",
   bal_1 =	"04110000B",
-  bgezal_2 =	"04110000SB",
-  bltzall_2 =	"04120000SB",
-  bgezall_2 =	"04130000SB",
   synci_1 =	"041f0000O",
 
-  -- Opcode SPECIAL2.
-  madd_2 =	"70000000ST",
-  maddu_2 =	"70000001ST",
-  mul_3 =	"70000002DST",
-  msub_2 =	"70000004ST",
-  msubu_2 =	"70000005ST",
-  clz_2 =	"70000020DS=",
-  clo_2 =	"70000021DS=",
-  dclz_2 =	mips64 and "70000024DS=",
-  dclo_2 =	mips64 and "70000025DS=",
-  sdbbp_0 =	"7000003f",
-  sdbbp_1 =	"7000003fY",
-
   -- Opcode SPECIAL3.
   ext_4 =	"7c000000TSAM", -- Note: last arg is msbd = size-1
   dextm_4 =	mips64 and "7c000001TSAM", -- Args: pos    | size-1-32
@@ -445,15 +385,6 @@ local map_op = {
   ctc1_2 =	"44c00000TG",
   mthc1_2 =	"44e00000TG",
 
-  bc1f_1 =	"45000000B",
-  bc1f_2 =	"45000000CB",
-  bc1t_1 =	"45010000B",
-  bc1t_2 =	"45010000CB",
-  bc1fl_1 =	"45020000B",
-  bc1fl_2 =	"45020000CB",
-  bc1tl_1 =	"45030000B",
-  bc1tl_2 =	"45030000CB",
-
   ["add.s_3"] =		"46000000FGH",
   ["sub.s_3"] =		"46000001FGH",
   ["mul.s_3"] =		"46000002FGH",
@@ -470,51 +401,11 @@ local map_op = {
   ["trunc.w.s_2"] =	"4600000dFG",
   ["ceil.w.s_2"] =	"4600000eFG",
   ["floor.w.s_2"] =	"4600000fFG",
-  ["movf.s_2"] =	"46000011FG",
-  ["movf.s_3"] =	"46000011FGC",
-  ["movt.s_2"] =	"46010011FG",
-  ["movt.s_3"] =	"46010011FGC",
-  ["movz.s_3"] =	"46000012FGT",
-  ["movn.s_3"] =	"46000013FGT",
   ["recip.s_2"] =	"46000015FG",
   ["rsqrt.s_2"] =	"46000016FG",
   ["cvt.d.s_2"] =	"46000021FG",
   ["cvt.w.s_2"] =	"46000024FG",
   ["cvt.l.s_2"] =	"46000025FG",
-  ["cvt.ps.s_3"] =	"46000026FGH",
-  ["c.f.s_2"] =		"46000030GH",
-  ["c.f.s_3"] =		"46000030VGH",
-  ["c.un.s_2"] =	"46000031GH",
-  ["c.un.s_3"] =	"46000031VGH",
-  ["c.eq.s_2"] =	"46000032GH",
-  ["c.eq.s_3"] =	"46000032VGH",
-  ["c.ueq.s_2"] =	"46000033GH",
-  ["c.ueq.s_3"] =	"46000033VGH",
-  ["c.olt.s_2"] =	"46000034GH",
-  ["c.olt.s_3"] =	"46000034VGH",
-  ["c.ult.s_2"] =	"46000035GH",
-  ["c.ult.s_3"] =	"46000035VGH",
-  ["c.ole.s_2"] =	"46000036GH",
-  ["c.ole.s_3"] =	"46000036VGH",
-  ["c.ule.s_2"] =	"46000037GH",
-  ["c.ule.s_3"] =	"46000037VGH",
-  ["c.sf.s_2"] =	"46000038GH",
-  ["c.sf.s_3"] =	"46000038VGH",
-  ["c.ngle.s_2"] =	"46000039GH",
-  ["c.ngle.s_3"] =	"46000039VGH",
-  ["c.seq.s_2"] =	"4600003aGH",
-  ["c.seq.s_3"] =	"4600003aVGH",
-  ["c.ngl.s_2"] =	"4600003bGH",
-  ["c.ngl.s_3"] =	"4600003bVGH",
-  ["c.lt.s_2"] =	"4600003cGH",
-  ["c.lt.s_3"] =	"4600003cVGH",
-  ["c.nge.s_2"] =	"4600003dGH",
-  ["c.nge.s_3"] =	"4600003dVGH",
-  ["c.le.s_2"] =	"4600003eGH",
-  ["c.le.s_3"] =	"4600003eVGH",
-  ["c.ngt.s_2"] =	"4600003fGH",
-  ["c.ngt.s_3"] =	"4600003fVGH",
-
   ["add.d_3"] =		"46200000FGH",
   ["sub.d_3"] =		"46200001FGH",
   ["mul.d_3"] =		"46200002FGH",
@@ -531,130 +422,410 @@ local map_op = {
   ["trunc.w.d_2"] =	"4620000dFG",
   ["ceil.w.d_2"] =	"4620000eFG",
   ["floor.w.d_2"] =	"4620000fFG",
-  ["movf.d_2"] =	"46200011FG",
-  ["movf.d_3"] =	"46200011FGC",
-  ["movt.d_2"] =	"46210011FG",
-  ["movt.d_3"] =	"46210011FGC",
-  ["movz.d_3"] =	"46200012FGT",
-  ["movn.d_3"] =	"46200013FGT",
   ["recip.d_2"] =	"46200015FG",
   ["rsqrt.d_2"] =	"46200016FG",
   ["cvt.s.d_2"] =	"46200020FG",
   ["cvt.w.d_2"] =	"46200024FG",
   ["cvt.l.d_2"] =	"46200025FG",
-  ["c.f.d_2"] =		"46200030GH",
-  ["c.f.d_3"] =		"46200030VGH",
-  ["c.un.d_2"] =	"46200031GH",
-  ["c.un.d_3"] =	"46200031VGH",
-  ["c.eq.d_2"] =	"46200032GH",
-  ["c.eq.d_3"] =	"46200032VGH",
-  ["c.ueq.d_2"] =	"46200033GH",
-  ["c.ueq.d_3"] =	"46200033VGH",
-  ["c.olt.d_2"] =	"46200034GH",
-  ["c.olt.d_3"] =	"46200034VGH",
-  ["c.ult.d_2"] =	"46200035GH",
-  ["c.ult.d_3"] =	"46200035VGH",
-  ["c.ole.d_2"] =	"46200036GH",
-  ["c.ole.d_3"] =	"46200036VGH",
-  ["c.ule.d_2"] =	"46200037GH",
-  ["c.ule.d_3"] =	"46200037VGH",
-  ["c.sf.d_2"] =	"46200038GH",
-  ["c.sf.d_3"] =	"46200038VGH",
-  ["c.ngle.d_2"] =	"46200039GH",
-  ["c.ngle.d_3"] =	"46200039VGH",
-  ["c.seq.d_2"] =	"4620003aGH",
-  ["c.seq.d_3"] =	"4620003aVGH",
-  ["c.ngl.d_2"] =	"4620003bGH",
-  ["c.ngl.d_3"] =	"4620003bVGH",
-  ["c.lt.d_2"] =	"4620003cGH",
-  ["c.lt.d_3"] =	"4620003cVGH",
-  ["c.nge.d_2"] =	"4620003dGH",
-  ["c.nge.d_3"] =	"4620003dVGH",
-  ["c.le.d_2"] =	"4620003eGH",
-  ["c.le.d_3"] =	"4620003eVGH",
-  ["c.ngt.d_2"] =	"4620003fGH",
-  ["c.ngt.d_3"] =	"4620003fVGH",
-
-  ["add.ps_3"] =	"46c00000FGH",
-  ["sub.ps_3"] =	"46c00001FGH",
-  ["mul.ps_3"] =	"46c00002FGH",
-  ["abs.ps_2"] =	"46c00005FG",
-  ["mov.ps_2"] =	"46c00006FG",
-  ["neg.ps_2"] =	"46c00007FG",
-  ["movf.ps_2"] =	"46c00011FG",
-  ["movf.ps_3"] =	"46c00011FGC",
-  ["movt.ps_2"] =	"46c10011FG",
-  ["movt.ps_3"] =	"46c10011FGC",
-  ["movz.ps_3"] =	"46c00012FGT",
-  ["movn.ps_3"] =	"46c00013FGT",
-  ["cvt.s.pu_2"] =	"46c00020FG",
-  ["cvt.s.pl_2"] =	"46c00028FG",
-  ["pll.ps_3"] =	"46c0002cFGH",
-  ["plu.ps_3"] =	"46c0002dFGH",
-  ["pul.ps_3"] =	"46c0002eFGH",
-  ["puu.ps_3"] =	"46c0002fFGH",
-  ["c.f.ps_2"] =	"46c00030GH",
-  ["c.f.ps_3"] =	"46c00030VGH",
-  ["c.un.ps_2"] =	"46c00031GH",
-  ["c.un.ps_3"] =	"46c00031VGH",
-  ["c.eq.ps_2"] =	"46c00032GH",
-  ["c.eq.ps_3"] =	"46c00032VGH",
-  ["c.ueq.ps_2"] =	"46c00033GH",
-  ["c.ueq.ps_3"] =	"46c00033VGH",
-  ["c.olt.ps_2"] =	"46c00034GH",
-  ["c.olt.ps_3"] =	"46c00034VGH",
-  ["c.ult.ps_2"] =	"46c00035GH",
-  ["c.ult.ps_3"] =	"46c00035VGH",
-  ["c.ole.ps_2"] =	"46c00036GH",
-  ["c.ole.ps_3"] =	"46c00036VGH",
-  ["c.ule.ps_2"] =	"46c00037GH",
-  ["c.ule.ps_3"] =	"46c00037VGH",
-  ["c.sf.ps_2"] =	"46c00038GH",
-  ["c.sf.ps_3"] =	"46c00038VGH",
-  ["c.ngle.ps_2"] =	"46c00039GH",
-  ["c.ngle.ps_3"] =	"46c00039VGH",
-  ["c.seq.ps_2"] =	"46c0003aGH",
-  ["c.seq.ps_3"] =	"46c0003aVGH",
-  ["c.ngl.ps_2"] =	"46c0003bGH",
-  ["c.ngl.ps_3"] =	"46c0003bVGH",
-  ["c.lt.ps_2"] =	"46c0003cGH",
-  ["c.lt.ps_3"] =	"46c0003cVGH",
-  ["c.nge.ps_2"] =	"46c0003dGH",
-  ["c.nge.ps_3"] =	"46c0003dVGH",
-  ["c.le.ps_2"] =	"46c0003eGH",
-  ["c.le.ps_3"] =	"46c0003eVGH",
-  ["c.ngt.ps_2"] =	"46c0003fGH",
-  ["c.ngt.ps_3"] =	"46c0003fVGH",
-
   ["cvt.s.w_2"] =	"46800020FG",
   ["cvt.d.w_2"] =	"46800021FG",
-
   ["cvt.s.l_2"] =	"46a00020FG",
   ["cvt.d.l_2"] =	"46a00021FG",
-
-  -- Opcode COP1X.
-  lwxc1_2 =		"4c000000FX",
-  ldxc1_2 =		"4c000001FX",
-  luxc1_2 =		"4c000005FX",
-  swxc1_2 =		"4c000008FX",
-  sdxc1_2 =		"4c000009FX",
-  suxc1_2 =		"4c00000dFX",
-  prefx_2 =		"4c00000fMX",
-  ["alnv.ps_4"] =	"4c00001eFGHS",
-  ["madd.s_4"] =	"4c000020FRGH",
-  ["madd.d_4"] =	"4c000021FRGH",
-  ["madd.ps_4"] =	"4c000026FRGH",
-  ["msub.s_4"] =	"4c000028FRGH",
-  ["msub.d_4"] =	"4c000029FRGH",
-  ["msub.ps_4"] =	"4c00002eFRGH",
-  ["nmadd.s_4"] =	"4c000030FRGH",
-  ["nmadd.d_4"] =	"4c000031FRGH",
-  ["nmadd.ps_4"] =	"4c000036FRGH",
-  ["nmsub.s_4"] =	"4c000038FRGH",
-  ["nmsub.d_4"] =	"4c000039FRGH",
-  ["nmsub.ps_4"] =	"4c00003eFRGH",
 }
 
+if mipsr6 then -- Instructions added with MIPSR6.
+
+  for k,v in pairs({
+
+    -- Add immediate to upper bits.
+    aui_3 =	"3c000000TSI",
+    daui_3 =	mips64 and "74000000TSI",
+    dahi_2 =	mips64 and "04060000SI",
+    dati_2 =	mips64 and "041e0000SI",
+
+    -- TODO: addiupc, auipc, aluipc, lwpc, lwupc, ldpc.
+
+    -- Compact branches.
+    blezalc_2 =	"18000000TB",	-- rt != 0.
+    bgezalc_2 =	"18000000T=SB",	-- rt != 0.
+    bgtzalc_2 =	"1c000000TB",	-- rt != 0.
+    bltzalc_2 =	"1c000000T=SB",	-- rt != 0.
+
+    blezc_2 =	"58000000TB",	-- rt != 0.
+    bgezc_2 =	"58000000T=SB",	-- rt != 0.
+    bgec_3 =	"58000000STB",	-- rs != rt.
+    blec_3 =	"58000000TSB",	-- rt != rs.
+
+    bgtzc_2 =	"5c000000TB",	-- rt != 0.
+    bltzc_2 =	"5c000000T=SB",	-- rt != 0.
+    bltc_3 =	"5c000000STB",	-- rs != rt.
+    bgtc_3 =	"5c000000TSB",	-- rt != rs.
+
+    bgeuc_3 =	"18000000STB",	-- rs != rt.
+    bleuc_3 =	"18000000TSB",	-- rt != rs.
+    bltuc_3 =	"1c000000STB",	-- rs != rt.
+    bgtuc_3 =	"1c000000TSB",	-- rt != rs.
+
+    beqzalc_2 =	"20000000TB",	-- rt != 0.
+    bnezalc_2 =	"60000000TB",	-- rt != 0.
+    beqc_3 =	"20000000STB",	-- rs < rt.
+    bnec_3 =	"60000000STB",	-- rs < rt.
+    bovc_3 =	"20000000STB",	-- rs >= rt.
+    bnvc_3 =	"60000000STB",	-- rs >= rt.
+
+    beqzc_2 =	"d8000000SK",	-- rs != 0.
+    bnezc_2 =	"f8000000SK",	-- rs != 0.
+    jic_2 =	"d8000000TI",
+    jialc_2 =	"f8000000TI",
+    bc_1 =	"c8000000L",
+    balc_1 =	"e8000000L",
+
+    -- Opcode SPECIAL.
+    jr_1 =	"00000009S",
+    sdbbp_0 =	"0000000e",
+    sdbbp_1 =	"0000000eY",
+    lsa_4 =	"00000005DSTA",
+    dlsa_4 =	mips64 and "00000015DSTA",
+    seleqz_3 =	"00000035DST",
+    selnez_3 =	"00000037DST",
+    clz_2 =	"00000050DS",
+    clo_2 =	"00000051DS",
+    dclz_2 =	mips64 and "00000052DS",
+    dclo_2 =	mips64 and "00000053DS",
+    mul_3 =	"00000098DST",
+    muh_3 =	"000000d8DST",
+    mulu_3 =	"00000099DST",
+    muhu_3 =	"000000d9DST",
+    div_3 =	"0000009aDST",
+    mod_3 =	"000000daDST",
+    divu_3 =	"0000009bDST",
+    modu_3 =	"000000dbDST",
+    dmul_3 =	mips64 and "0000009cDST",
+    dmuh_3 =	mips64 and "000000dcDST",
+    dmulu_3 =	mips64 and "0000009dDST",
+    dmuhu_3 =	mips64 and "000000ddDST",
+    ddiv_3 =	mips64 and "0000009eDST",
+    dmod_3 =	mips64 and "000000deDST",
+    ddivu_3 =	mips64 and "0000009fDST",
+    dmodu_3 =	mips64 and "000000dfDST",
+
+    -- Opcode SPECIAL3.
+    align_4 =		"7c000220DSTA",
+    dalign_4 =		mips64 and "7c000224DSTA",
+    bitswap_2 =		"7c000020DT",
+    dbitswap_2 =	mips64 and "7c000024DT",
+
+    -- Opcode COP1.
+    bc1eqz_2 =	"45200000HB",
+    bc1nez_2 =	"45a00000HB",
+
+    ["sel.s_3"] =	"46000010FGH",
+    ["seleqz.s_3"] =	"46000014FGH",
+    ["selnez.s_3"] =	"46000017FGH",
+    ["maddf.s_3"] =	"46000018FGH",
+    ["msubf.s_3"] =	"46000019FGH",
+    ["rint.s_2"] =	"4600001aFG",
+    ["class.s_2"] =	"4600001bFG",
+    ["min.s_3"] =	"4600001cFGH",
+    ["mina.s_3"] =	"4600001dFGH",
+    ["max.s_3"] =	"4600001eFGH",
+    ["maxa.s_3"] =	"4600001fFGH",
+    ["cmp.af.s_3"] =	"46800000FGH",
+    ["cmp.un.s_3"] =	"46800001FGH",
+    ["cmp.or.s_3"] =	"46800011FGH",
+    ["cmp.eq.s_3"] =	"46800002FGH",
+    ["cmp.une.s_3"] =	"46800012FGH",
+    ["cmp.ueq.s_3"] =	"46800003FGH",
+    ["cmp.ne.s_3"] =	"46800013FGH",
+    ["cmp.lt.s_3"] =	"46800004FGH",
+    ["cmp.ult.s_3"] =	"46800005FGH",
+    ["cmp.le.s_3"] =	"46800006FGH",
+    ["cmp.ule.s_3"] =	"46800007FGH",
+    ["cmp.saf.s_3"] =	"46800008FGH",
+    ["cmp.sun.s_3"] =	"46800009FGH",
+    ["cmp.sor.s_3"] =	"46800019FGH",
+    ["cmp.seq.s_3"] =	"4680000aFGH",
+    ["cmp.sune.s_3"] =	"4680001aFGH",
+    ["cmp.sueq.s_3"] =	"4680000bFGH",
+    ["cmp.sne.s_3"] =	"4680001bFGH",
+    ["cmp.slt.s_3"] =	"4680000cFGH",
+    ["cmp.sult.s_3"] =	"4680000dFGH",
+    ["cmp.sle.s_3"] =	"4680000eFGH",
+    ["cmp.sule.s_3"] =	"4680000fFGH",
+
+    ["sel.d_3"] =	"46200010FGH",
+    ["seleqz.d_3"] =	"46200014FGH",
+    ["selnez.d_3"] =	"46200017FGH",
+    ["maddf.d_3"] =	"46200018FGH",
+    ["msubf.d_3"] =	"46200019FGH",
+    ["rint.d_2"] =	"4620001aFG",
+    ["class.d_2"] =	"4620001bFG",
+    ["min.d_3"] =	"4620001cFGH",
+    ["mina.d_3"] =	"4620001dFGH",
+    ["max.d_3"] =	"4620001eFGH",
+    ["maxa.d_3"] =	"4620001fFGH",
+    ["cmp.af.d_3"] =	"46a00000FGH",
+    ["cmp.un.d_3"] =	"46a00001FGH",
+    ["cmp.or.d_3"] =	"46a00011FGH",
+    ["cmp.eq.d_3"] =	"46a00002FGH",
+    ["cmp.une.d_3"] =	"46a00012FGH",
+    ["cmp.ueq.d_3"] =	"46a00003FGH",
+    ["cmp.ne.d_3"] =	"46a00013FGH",
+    ["cmp.lt.d_3"] =	"46a00004FGH",
+    ["cmp.ult.d_3"] =	"46a00005FGH",
+    ["cmp.le.d_3"] =	"46a00006FGH",
+    ["cmp.ule.d_3"] =	"46a00007FGH",
+    ["cmp.saf.d_3"] =	"46a00008FGH",
+    ["cmp.sun.d_3"] =	"46a00009FGH",
+    ["cmp.sor.d_3"] =	"46a00019FGH",
+    ["cmp.seq.d_3"] =	"46a0000aFGH",
+    ["cmp.sune.d_3"] =	"46a0001aFGH",
+    ["cmp.sueq.d_3"] =	"46a0000bFGH",
+    ["cmp.sne.d_3"] =	"46a0001bFGH",
+    ["cmp.slt.d_3"] =	"46a0000cFGH",
+    ["cmp.sult.d_3"] =	"46a0000dFGH",
+    ["cmp.sle.d_3"] =	"46a0000eFGH",
+    ["cmp.sule.d_3"] =	"46a0000fFGH",
+
+  }) do map_op[k] = v end
+
+else -- Instructions removed by MIPSR6.
+
+  for k,v in pairs({
+    -- Traps, don't use.
+    addi_3 =	"20000000TSI",
+    daddi_3 =	mips64 and "60000000TSI",
+
+    -- Branch on likely, don't use.
+    beqzl_2 =	"50000000SB",
+    beql_3 =	"50000000STB",
+    bnezl_2 =	"54000000SB",
+    bnel_3 =	"54000000STB",
+    blezl_2 =	"58000000SB",
+    bgtzl_2 =	"5c000000SB",
+
+    lwl_2 =	"88000000TO",
+    lwr_2 =	"98000000TO",
+    swl_2 =	"a8000000TO",
+    sdl_2 =	mips64 and "b0000000TO",
+    sdr_2 =	mips64 and "b1000000TO",
+    swr_2 =	"b8000000TO",
+    cache_2 =	"bc000000NO",
+    ll_2 =	"c0000000TO",
+    pref_2 =	"cc000000NO",
+    sc_2 =	"e0000000TO",
+    scd_2 =	mips64 and "f0000000TO",
+
+    -- Opcode SPECIAL.
+    movf_2 =	"00000001DS",
+    movf_3 =	"00000001DSC",
+    movt_2 =	"00010001DS",
+    movt_3 =	"00010001DSC",
+    jr_1 =	"00000008S",
+    movz_3 =	"0000000aDST",
+    movn_3 =	"0000000bDST",
+    mfhi_1 =	"00000010D",
+    mthi_1 =	"00000011S",
+    mflo_1 =	"00000012D",
+    mtlo_1 =	"00000013S",
+    mult_2 =	"00000018ST",
+    multu_2 =	"00000019ST",
+    div_3 =	"0000001aST",
+    divu_3 =	"0000001bST",
+    ddiv_3 =	mips64 and "0000001eST",
+    ddivu_3 =	mips64 and "0000001fST",
+    dmult_2 =	mips64 and "0000001cST",
+    dmultu_2 =	mips64 and "0000001dST",
+
+    -- Opcode REGIMM.
+    tgei_2 =	"04080000SI",
+    tgeiu_2 =	"04090000SI",
+    tlti_2 =	"040a0000SI",
+    tltiu_2 =	"040b0000SI",
+    teqi_2 =	"040c0000SI",
+    tnei_2 =	"040e0000SI",
+    bltzal_2 =	"04100000SB",
+    bgezal_2 =	"04110000SB",
+    bltzall_2 =	"04120000SB",
+    bgezall_2 =	"04130000SB",
+
+    -- Opcode SPECIAL2.
+    madd_2 =	"70000000ST",
+    maddu_2 =	"70000001ST",
+    mul_3 =	"70000002DST",
+    msub_2 =	"70000004ST",
+    msubu_2 =	"70000005ST",
+    clz_2 =	"70000020D=TS",
+    clo_2 =	"70000021D=TS",
+    dclz_2 =	mips64 and "70000024D=TS",
+    dclo_2 =	mips64 and "70000025D=TS",
+    sdbbp_0 =	"7000003f",
+    sdbbp_1 =	"7000003fY",
+
+    -- Opcode COP1.
+    bc1f_1 =	"45000000B",
+    bc1f_2 =	"45000000CB",
+    bc1t_1 =	"45010000B",
+    bc1t_2 =	"45010000CB",
+    bc1fl_1 =	"45020000B",
+    bc1fl_2 =	"45020000CB",
+    bc1tl_1 =	"45030000B",
+    bc1tl_2 =	"45030000CB",
+
+    ["movf.s_2"] =	"46000011FG",
+    ["movf.s_3"] =	"46000011FGC",
+    ["movt.s_2"] =	"46010011FG",
+    ["movt.s_3"] =	"46010011FGC",
+    ["movz.s_3"] =	"46000012FGT",
+    ["movn.s_3"] =	"46000013FGT",
+    ["cvt.ps.s_3"] =	"46000026FGH",
+    ["c.f.s_2"] =	"46000030GH",
+    ["c.f.s_3"] =	"46000030VGH",
+    ["c.un.s_2"] =	"46000031GH",
+    ["c.un.s_3"] =	"46000031VGH",
+    ["c.eq.s_2"] =	"46000032GH",
+    ["c.eq.s_3"] =	"46000032VGH",
+    ["c.ueq.s_2"] =	"46000033GH",
+    ["c.ueq.s_3"] =	"46000033VGH",
+    ["c.olt.s_2"] =	"46000034GH",
+    ["c.olt.s_3"] =	"46000034VGH",
+    ["c.ult.s_2"] =	"46000035GH",
+    ["c.ult.s_3"] =	"46000035VGH",
+    ["c.ole.s_2"] =	"46000036GH",
+    ["c.ole.s_3"] =	"46000036VGH",
+    ["c.ule.s_2"] =	"46000037GH",
+    ["c.ule.s_3"] =	"46000037VGH",
+    ["c.sf.s_2"] =	"46000038GH",
+    ["c.sf.s_3"] =	"46000038VGH",
+    ["c.ngle.s_2"] =	"46000039GH",
+    ["c.ngle.s_3"] =	"46000039VGH",
+    ["c.seq.s_2"] =	"4600003aGH",
+    ["c.seq.s_3"] =	"4600003aVGH",
+    ["c.ngl.s_2"] =	"4600003bGH",
+    ["c.ngl.s_3"] =	"4600003bVGH",
+    ["c.lt.s_2"] =	"4600003cGH",
+    ["c.lt.s_3"] =	"4600003cVGH",
+    ["c.nge.s_2"] =	"4600003dGH",
+    ["c.nge.s_3"] =	"4600003dVGH",
+    ["c.le.s_2"] =	"4600003eGH",
+    ["c.le.s_3"] =	"4600003eVGH",
+    ["c.ngt.s_2"] =	"4600003fGH",
+    ["c.ngt.s_3"] =	"4600003fVGH",
+    ["movf.d_2"] =	"46200011FG",
+    ["movf.d_3"] =	"46200011FGC",
+    ["movt.d_2"] =	"46210011FG",
+    ["movt.d_3"] =	"46210011FGC",
+    ["movz.d_3"] =	"46200012FGT",
+    ["movn.d_3"] =	"46200013FGT",
+    ["c.f.d_2"] =	"46200030GH",
+    ["c.f.d_3"] =	"46200030VGH",
+    ["c.un.d_2"] =	"46200031GH",
+    ["c.un.d_3"] =	"46200031VGH",
+    ["c.eq.d_2"] =	"46200032GH",
+    ["c.eq.d_3"] =	"46200032VGH",
+    ["c.ueq.d_2"] =	"46200033GH",
+    ["c.ueq.d_3"] =	"46200033VGH",
+    ["c.olt.d_2"] =	"46200034GH",
+    ["c.olt.d_3"] =	"46200034VGH",
+    ["c.ult.d_2"] =	"46200035GH",
+    ["c.ult.d_3"] =	"46200035VGH",
+    ["c.ole.d_2"] =	"46200036GH",
+    ["c.ole.d_3"] =	"46200036VGH",
+    ["c.ule.d_2"] =	"46200037GH",
+    ["c.ule.d_3"] =	"46200037VGH",
+    ["c.sf.d_2"] =	"46200038GH",
+    ["c.sf.d_3"] =	"46200038VGH",
+    ["c.ngle.d_2"] =	"46200039GH",
+    ["c.ngle.d_3"] =	"46200039VGH",
+    ["c.seq.d_2"] =	"4620003aGH",
+    ["c.seq.d_3"] =	"4620003aVGH",
+    ["c.ngl.d_2"] =	"4620003bGH",
+    ["c.ngl.d_3"] =	"4620003bVGH",
+    ["c.lt.d_2"] =	"4620003cGH",
+    ["c.lt.d_3"] =	"4620003cVGH",
+    ["c.nge.d_2"] =	"4620003dGH",
+    ["c.nge.d_3"] =	"4620003dVGH",
+    ["c.le.d_2"] =	"4620003eGH",
+    ["c.le.d_3"] =	"4620003eVGH",
+    ["c.ngt.d_2"] =	"4620003fGH",
+    ["c.ngt.d_3"] =	"4620003fVGH",
+    ["add.ps_3"] =	"46c00000FGH",
+    ["sub.ps_3"] =	"46c00001FGH",
+    ["mul.ps_3"] =	"46c00002FGH",
+    ["abs.ps_2"] =	"46c00005FG",
+    ["mov.ps_2"] =	"46c00006FG",
+    ["neg.ps_2"] =	"46c00007FG",
+    ["movf.ps_2"] =	"46c00011FG",
+    ["movf.ps_3"] =	"46c00011FGC",
+    ["movt.ps_2"] =	"46c10011FG",
+    ["movt.ps_3"] =	"46c10011FGC",
+    ["movz.ps_3"] =	"46c00012FGT",
+    ["movn.ps_3"] =	"46c00013FGT",
+    ["cvt.s.pu_2"] =	"46c00020FG",
+    ["cvt.s.pl_2"] =	"46c00028FG",
+    ["pll.ps_3"] =	"46c0002cFGH",
+    ["plu.ps_3"] =	"46c0002dFGH",
+    ["pul.ps_3"] =	"46c0002eFGH",
+    ["puu.ps_3"] =	"46c0002fFGH",
+    ["c.f.ps_2"] =	"46c00030GH",
+    ["c.f.ps_3"] =	"46c00030VGH",
+    ["c.un.ps_2"] =	"46c00031GH",
+    ["c.un.ps_3"] =	"46c00031VGH",
+    ["c.eq.ps_2"] =	"46c00032GH",
+    ["c.eq.ps_3"] =	"46c00032VGH",
+    ["c.ueq.ps_2"] =	"46c00033GH",
+    ["c.ueq.ps_3"] =	"46c00033VGH",
+    ["c.olt.ps_2"] =	"46c00034GH",
+    ["c.olt.ps_3"] =	"46c00034VGH",
+    ["c.ult.ps_2"] =	"46c00035GH",
+    ["c.ult.ps_3"] =	"46c00035VGH",
+    ["c.ole.ps_2"] =	"46c00036GH",
+    ["c.ole.ps_3"] =	"46c00036VGH",
+    ["c.ule.ps_2"] =	"46c00037GH",
+    ["c.ule.ps_3"] =	"46c00037VGH",
+    ["c.sf.ps_2"] =	"46c00038GH",
+    ["c.sf.ps_3"] =	"46c00038VGH",
+    ["c.ngle.ps_2"] =	"46c00039GH",
+    ["c.ngle.ps_3"] =	"46c00039VGH",
+    ["c.seq.ps_2"] =	"46c0003aGH",
+    ["c.seq.ps_3"] =	"46c0003aVGH",
+    ["c.ngl.ps_2"] =	"46c0003bGH",
+    ["c.ngl.ps_3"] =	"46c0003bVGH",
+    ["c.lt.ps_2"] =	"46c0003cGH",
+    ["c.lt.ps_3"] =	"46c0003cVGH",
+    ["c.nge.ps_2"] =	"46c0003dGH",
+    ["c.nge.ps_3"] =	"46c0003dVGH",
+    ["c.le.ps_2"] =	"46c0003eGH",
+    ["c.le.ps_3"] =	"46c0003eVGH",
+    ["c.ngt.ps_2"] =	"46c0003fGH",
+    ["c.ngt.ps_3"] =	"46c0003fVGH",
+
+    -- Opcode COP1X.
+    lwxc1_2 =	"4c000000FX",
+    ldxc1_2 =	"4c000001FX",
+    luxc1_2 =	"4c000005FX",
+    swxc1_2 =	"4c000008FX",
+    sdxc1_2 =	"4c000009FX",
+    suxc1_2 =	"4c00000dFX",
+    prefx_2 =	"4c00000fMX",
+    ["alnv.ps_4"] =	"4c00001eFGHS",
+    ["madd.s_4"] =	"4c000020FRGH",
+    ["madd.d_4"] =	"4c000021FRGH",
+    ["madd.ps_4"] =	"4c000026FRGH",
+    ["msub.s_4"] =	"4c000028FRGH",
+    ["msub.d_4"] =	"4c000029FRGH",
+    ["msub.ps_4"] =	"4c00002eFRGH",
+    ["nmadd.s_4"] =	"4c000030FRGH",
+    ["nmadd.d_4"] =	"4c000031FRGH",
+    ["nmadd.ps_4"] =	"4c000036FRGH",
+    ["nmsub.s_4"] =	"4c000038FRGH",
+    ["nmsub.d_4"] =	"4c000039FRGH",
+    ["nmsub.ps_4"] =	"4c00003eFRGH",
+
+  }) do map_op[k] = v end
+
+end
+
 ------------------------------------------------------------------------------
 
 local function parse_gpr(expr)
@@ -808,9 +979,11 @@ map_op[".template__"] = function(params, template, nparams)
       op = op + parse_disp(params[n]); n = n + 1
     elseif p == "X" then
       op = op + parse_index(params[n]); n = n + 1
-    elseif p == "B" or p == "J" then
+    elseif p == "B" or p == "J" or p == "K" or p == "L" then
       local mode, m, s = parse_label(params[n], false)
-      if p == "B" then m = m + 2048 end
+      if p == "J" then m = m + 0xa800
+      elseif p == "K" then m = m + 0x5000
+      elseif p == "L" then m = m + 0xa000 end
       waction("REL_"..mode, m, s, 1)
       n = n + 1
     elseif p == "A" then
@@ -833,7 +1006,7 @@ map_op[".template__"] = function(params, template, nparams)
     elseif p == "Z" then
       op = op + parse_imm(params[n], 10, 6, 0, false); n = n + 1
     elseif p == "=" then
-      op = op + shl(band(op, 0xf800), 5) -- Copy D to T for clz, clo.
+      n = n - 1 -- Re-use previous parameter for next template char.
     else
       assert(false)
     end
diff --git a/dynasm/dynasm.lua b/dynasm/dynasm.lua
index 5ec21a79..46ebfca8 100644
--- a/dynasm/dynasm.lua
+++ b/dynasm/dynasm.lua
@@ -630,6 +630,7 @@ end
 -- Load architecture-specific module.
 local function loadarch(arch)
   if not match(arch, "^[%w_]+$") then return "bad arch name" end
+  _G._map_def = map_def
   local ok, m_arch = pcall(require, "dasm_"..arch)
   if not ok then return "cannot load module: "..m_arch end
   g_arch = m_arch
diff --git a/src/Makefile.original b/src/Makefile.original
index aedaaa73..22d36a27 100644
--- a/src/Makefile.original
+++ b/src/Makefile.original
@@ -455,6 +455,9 @@ ifeq (arm,$(TARGET_LJARCH))
     DASM_AFLAGS+= -D IOS
   endif
 else
+ifneq (,$(findstring LJ_TARGET_MIPSR6 ,$(TARGET_TESTARCH)))
+  DASM_AFLAGS+= -D MIPSR6
+endif
 ifeq (ppc,$(TARGET_LJARCH))
   ifneq (,$(findstring LJ_ARCH_SQRT 1,$(TARGET_TESTARCH)))
     DASM_AFLAGS+= -D SQRT
diff --git a/src/jit/bcsave.lua b/src/jit/bcsave.lua
index 2553d97e..41081184 100644
--- a/src/jit/bcsave.lua
+++ b/src/jit/bcsave.lua
@@ -17,6 +17,10 @@ local bit = require("bit")
 -- Symbol name prefix for LuaJIT bytecode.
 local LJBC_PREFIX = "luaJIT_BC_"
 
+local type, assert = type, assert
+local format = string.format
+local tremove, tconcat = table.remove, table.concat
+
 ------------------------------------------------------------------------------
 
 local function usage()
@@ -63,8 +67,18 @@ local map_type = {
 }
 
 local map_arch = {
-  x86 = true, x64 = true, arm = true, arm64 = true, arm64be = true,
-  ppc = true, mips = true, mipsel = true,
+  x86 =		{ e = "le", b = 32, m = 3, p = 0x14c, },
+  x64 =		{ e = "le", b = 64, m = 62, p = 0x8664, },
+  arm =		{ e = "le", b = 32, m = 40, p = 0x1c0, },
+  arm64 =	{ e = "le", b = 64, m = 183, p = 0xaa64, },
+  arm64be =	{ e = "be", b = 64, m = 183, },
+  ppc =		{ e = "be", b = 32, m = 20, },
+  mips =	{ e = "be", b = 32, m = 8, f = 0x50001006, },
+  mipsel =	{ e = "le", b = 32, m = 8, f = 0x50001006, },
+  mips64 =	{ e = "be", b = 64, m = 8, f = 0x80000007, },
+  mips64el =	{ e = "le", b = 64, m = 8, f = 0x80000007, },
+  mips64r6 =	{ e = "be", b = 64, m = 8, f = 0xa0000407, },
+  mips64r6el =	{ e = "le", b = 64, m = 8, f = 0xa0000407, },
 }
 
 local map_os = {
@@ -73,33 +87,33 @@ local map_os = {
 }
 
 local function checkarg(str, map, err)
-  str = string.lower(str)
+  str = str:lower()
   local s = check(map[str], "unknown ", err)
-  return s == true and str or s
+  return type(s) == "string" and s or str
 end
 
 local function detecttype(str)
-  local ext = string.match(string.lower(str), "%.(%a+)$")
+  local ext = str:lower():match("%.(%a+)$")
   return map_type[ext] or "raw"
 end
 
 local function checkmodname(str)
-  check(string.match(str, "^[%w_.%-]+$"), "bad module name")
-  return string.gsub(str, "[%.%-]", "_")
+  check(str:match("^[%w_.%-]+$"), "bad module name")
+  return str:gsub("[%.%-]", "_")
 end
 
 local function detectmodname(str)
   if type(str) == "string" then
-    local tail = string.match(str, "[^/\\]+$")
+    local tail = str:match("[^/\\]+$")
     if tail then str = tail end
-    local head = string.match(str, "^(.*)%.[^.]*$")
+    local head = str:match("^(.*)%.[^.]*$")
     if head then str = head end
-    str = string.match(str, "^[%w_.%-]+")
+    str = str:match("^[%w_.%-]+")
   else
     str = nil
   end
   check(str, "cannot derive module name, use -n name")
-  return string.gsub(str, "[%.%-]", "_")
+  return str:gsub("[%.%-]", "_")
 end
 
 ------------------------------------------------------------------------------
@@ -118,7 +132,7 @@ end
 local function bcsave_c(ctx, output, s)
   local fp = savefile(output, "w")
   if ctx.type == "c" then
-    fp:write(string.format([[
+    fp:write(format([[
 #ifdef _cplusplus
 extern "C"
 #endif
@@ -128,7 +142,7 @@ __declspec(dllexport)
 const unsigned char %s%s[] = {
 ]], LJBC_PREFIX, ctx.modname))
   else
-    fp:write(string.format([[
+    fp:write(format([[
 #define %s%s_SIZE %d
 static const unsigned char %s%s[] = {
 ]], LJBC_PREFIX, ctx.modname, #s, LJBC_PREFIX, ctx.modname))
@@ -138,13 +152,13 @@ static const unsigned char %s%s[] = {
     local b = tostring(string.byte(s, i))
     m = m + #b + 1
     if m > 78 then
-      fp:write(table.concat(t, ",", 1, n), ",\n")
+      fp:write(tconcat(t, ",", 1, n), ",\n")
       n, m = 0, #b + 1
     end
     n = n + 1
     t[n] = b
   end
-  bcsave_tail(fp, output, table.concat(t, ",", 1, n).."\n};\n")
+  bcsave_tail(fp, output, tconcat(t, ",", 1, n).."\n};\n")
 end
 
 local function bcsave_elfobj(ctx, output, s, ffi)
@@ -199,12 +213,8 @@ typedef struct {
 } ELF64obj;
 ]]
   local symname = LJBC_PREFIX..ctx.modname
-  local is64, isbe = false, false
-  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" then
-    is64 = true
-  elseif ctx.arch == "ppc" or ctx.arch == "mips" then
-    isbe = true
-  end
+  local ai = assert(map_arch[ctx.arch])
+  local is64, isbe = ai.b == 64, ai.e == "be"
 
   -- Handle different host/target endianess.
   local function f32(x) return x end
@@ -237,10 +247,8 @@ typedef struct {
   hdr.eendian = isbe and 2 or 1
   hdr.eversion = 1
   hdr.type = f16(1)
-  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, mips=8, mipsel=8 })[ctx.arch])
-  if ctx.arch == "mips" or ctx.arch == "mipsel" then
-    hdr.flags = f32(0x50001006)
-  end
+  hdr.machine = f16(ai.m)
+  hdr.flags = f32(ai.f or 0)
   hdr.version = f32(1)
   hdr.shofs = fofs(ffi.offsetof(o, "sect"))
   hdr.ehsize = f16(ffi.sizeof(hdr))
@@ -336,12 +344,8 @@ typedef struct {
 } PEobj;
 ]]
   local symname = LJBC_PREFIX..ctx.modname
-  local is64 = false
-  if ctx.arch == "x86" then
-    symname = "_"..symname
-  elseif ctx.arch == "x64" then
-    is64 = true
-  end
+  local ai = assert(map_arch[ctx.arch])
+  local is64 = ai.b == 64
   local symexport = "   /EXPORT:"..symname..",DATA "
 
   -- The file format is always little-endian. Swap if the host is big-endian.
@@ -355,7 +359,7 @@ typedef struct {
   -- Create PE object and fill in header.
   local o = ffi.new("PEobj")
   local hdr = o.hdr
-  hdr.arch = f16(({ x86=0x14c, x64=0x8664, arm=0x1c0, ppc=0x1f2, mips=0x366, mipsel=0x366 })[ctx.arch])
+  hdr.arch = f16(assert(ai.p))
   hdr.nsects = f16(2)
   hdr.symtabofs = f32(ffi.offsetof(o, "sym0"))
   hdr.nsyms = f32(6)
@@ -605,16 +609,16 @@ local function docmd(...)
   local n = 1
   local list = false
   local ctx = {
-    strip = true, arch = jit.arch, os = string.lower(jit.os),
+    strip = true, arch = jit.arch, os = jit.os:lower(),
     type = false, modname = false,
   }
   while n <= #arg do
     local a = arg[n]
-    if type(a) == "string" and string.sub(a, 1, 1) == "-" and a ~= "-" then
-      table.remove(arg, n)
+    if type(a) == "string" and a:sub(1, 1) == "-" and a ~= "-" then
+      tremove(arg, n)
       if a == "--" then break end
       for m=2,#a do
-	local opt = string.sub(a, m, m)
+	local opt = a:sub(m, m)
 	if opt == "l" then
 	  list = true
 	elseif opt == "s" then
@@ -627,13 +631,13 @@ local function docmd(...)
 	    if n ~= 1 then usage() end
 	    arg[1] = check(loadstring(arg[1]))
 	  elseif opt == "n" then
-	    ctx.modname = checkmodname(table.remove(arg, n))
+	    ctx.modname = checkmodname(tremove(arg, n))
 	  elseif opt == "t" then
-	    ctx.type = checkarg(table.remove(arg, n), map_type, "file type")
+	    ctx.type = checkarg(tremove(arg, n), map_type, "file type")
 	  elseif opt == "a" then
-	    ctx.arch = checkarg(table.remove(arg, n), map_arch, "architecture")
+	    ctx.arch = checkarg(tremove(arg, n), map_arch, "architecture")
 	  elseif opt == "o" then
-	    ctx.os = checkarg(table.remove(arg, n), map_os, "OS name")
+	    ctx.os = checkarg(tremove(arg, n), map_os, "OS name")
 	  else
 	    usage()
 	  end
diff --git a/src/jit/dis_mips.lua b/src/jit/dis_mips.lua
index a12b8e62..c003b984 100644
--- a/src/jit/dis_mips.lua
+++ b/src/jit/dis_mips.lua
@@ -19,13 +19,34 @@ local band, bor, tohex = bit.band, bit.bor, bit.tohex
 local lshift, rshift, arshift = bit.lshift, bit.rshift, bit.arshift
 
 ------------------------------------------------------------------------------
--- Primary and extended opcode maps
+-- Extended opcode maps common to all MIPS releases
 ------------------------------------------------------------------------------
 
-local map_movci = { shift = 16, mask = 1, [0] = "movfDSC", "movtDSC", }
 local map_srl = { shift = 21, mask = 1, [0] = "srlDTA", "rotrDTA", }
 local map_srlv = { shift = 6, mask = 1, [0] = "srlvDTS", "rotrvDTS", }
 
+local map_cop0 = {
+  shift = 25, mask = 1,
+  [0] = {
+    shift = 21, mask = 15,
+    [0] = "mfc0TDW", [4] = "mtc0TDW",
+    [10] = "rdpgprDT",
+    [11] = { shift = 5, mask = 1, [0] = "diT0", "eiT0", },
+    [14] = "wrpgprDT",
+  }, {
+    shift = 0, mask = 63,
+    [1] = "tlbr", [2] = "tlbwi", [6] = "tlbwr", [8] = "tlbp",
+    [24] = "eret", [31] = "deret",
+    [32] = "wait",
+  },
+}
+
+------------------------------------------------------------------------------
+-- Primary and extended opcode maps for MIPS R1-R5
+------------------------------------------------------------------------------
+
+local map_movci = { shift = 16, mask = 1, [0] = "movfDSC", "movtDSC", }
+
 local map_special = {
   shift = 0, mask = 63,
   [0] = { shift = 0, mask = -1, [0] = "nop", _ = "sllDTA" },
@@ -87,22 +108,6 @@ local map_regimm = {
   false,	false,		false,		"synciSO",
 }
 
-local map_cop0 = {
-  shift = 25, mask = 1,
-  [0] = {
-    shift = 21, mask = 15,
-    [0] = "mfc0TDW", [4] = "mtc0TDW",
-    [10] = "rdpgprDT",
-    [11] = { shift = 5, mask = 1, [0] = "diT0", "eiT0", },
-    [14] = "wrpgprDT",
-  }, {
-    shift = 0, mask = 63,
-    [1] = "tlbr", [2] = "tlbwi", [6] = "tlbwr", [8] = "tlbp",
-    [24] = "eret", [31] = "deret",
-    [32] = "wait",
-  },
-}
-
 local map_cop1s = {
   shift = 0, mask = 63,
   [0] = "add.sFGH",	"sub.sFGH",	"mul.sFGH",	"div.sFGH",
@@ -233,6 +238,208 @@ local map_pri = {
   false,	"sdc1HSO",	"sdc2TSO",	"sdTSO",
 }
 
+------------------------------------------------------------------------------
+-- Primary and extended opcode maps for MIPS R6
+------------------------------------------------------------------------------
+
+local map_mul_r6 =   { shift = 6, mask = 3, [2] = "mulDST",   [3] = "muhDST" }
+local map_mulu_r6 =  { shift = 6, mask = 3, [2] = "muluDST",  [3] = "muhuDST" }
+local map_div_r6 =   { shift = 6, mask = 3, [2] = "divDST",   [3] = "modDST" }
+local map_divu_r6 =  { shift = 6, mask = 3, [2] = "divuDST",  [3] = "moduDST" }
+local map_dmul_r6 =  { shift = 6, mask = 3, [2] = "dmulDST",  [3] = "dmuhDST" }
+local map_dmulu_r6 = { shift = 6, mask = 3, [2] = "dmuluDST", [3] = "dmuhuDST" }
+local map_ddiv_r6 =  { shift = 6, mask = 3, [2] = "ddivDST",  [3] = "dmodDST" }
+local map_ddivu_r6 = { shift = 6, mask = 3, [2] = "ddivuDST", [3] = "dmoduDST" }
+
+local map_special_r6 = {
+  shift = 0, mask = 63,
+  [0] = { shift = 0, mask = -1, [0] = "nop", _ = "sllDTA" },
+  false,	map_srl,	"sraDTA",
+  "sllvDTS",	false,		map_srlv,	"sravDTS",
+  "jrS",	"jalrD1S",	false,		false,
+  "syscallY",	"breakY",	false,		"sync",
+  "clzDS",	"cloDS",	"dclzDS",	"dcloDS",
+  "dsllvDST",	"dlsaDSTA",	"dsrlvDST",	"dsravDST",
+  map_mul_r6,	map_mulu_r6,	map_div_r6,	map_divu_r6,
+  map_dmul_r6,	map_dmulu_r6,	map_ddiv_r6,	map_ddivu_r6,
+  "addDST",	"addu|moveDST0", "subDST",	"subu|neguDS0T",
+  "andDST",	"or|moveDST0",	"xorDST",	"nor|notDST0",
+  false,	false,		"sltDST",	"sltuDST",
+  "daddDST",	"dadduDST",	"dsubDST",	"dsubuDST",
+  "tgeSTZ",	"tgeuSTZ",	"tltSTZ",	"tltuSTZ",
+  "teqSTZ",	"seleqzDST",	"tneSTZ",	"selnezDST",
+  "dsllDTA",	false,		"dsrlDTA",	"dsraDTA",
+  "dsll32DTA",	false,		"dsrl32DTA",	"dsra32DTA",
+}
+
+local map_bshfl_r6 = {
+  shift = 9, mask = 3,
+  [1] = "alignDSTa",
+  _ = {
+    shift = 6, mask = 31,
+    [0] = "bitswapDT",
+    [2] = "wsbhDT",
+    [16] = "sebDT",
+    [24] = "sehDT",
+  }
+}
+
+local map_dbshfl_r6 = {
+  shift = 9, mask = 3,
+  [1] = "dalignDSTa",
+  _ = {
+    shift = 6, mask = 31,
+    [0] = "dbitswapDT",
+    [2] = "dsbhDT",
+    [5] = "dshdDT",
+  }
+}
+
+local map_special3_r6 = {
+  shift = 0, mask = 63,
+  [0]  = "extTSAK", [1]  = "dextmTSAP", [3]  = "dextTSAK",
+  [4]  = "insTSAL", [6]  = "dinsuTSEQ", [7]  = "dinsTSAL",
+  [32] = map_bshfl_r6, [36] = map_dbshfl_r6,  [59] = "rdhwrTD",
+}
+
+local map_regimm_r6 = {
+  shift = 16, mask = 31,
+  [0] = "bltzSB", [1] = "bgezSB",
+  [6] = "dahiSI", [30] = "datiSI",
+  [23] = "sigrieI", [31] = "synciSO",
+}
+
+local map_pcrel_r6 = {
+  shift = 19, mask = 3,
+  [0] = "addiupcS2", "lwpcS2", "lwupcS2", {
+    shift = 18, mask = 1,
+    [0] = "ldpcS3", { shift = 16, mask = 3, [2] = "auipcSI", [3] = "aluipcSI" }
+  }
+}
+
+local map_cop1s_r6 = {
+  shift = 0, mask = 63,
+  [0] = "add.sFGH",	"sub.sFGH",	"mul.sFGH",	"div.sFGH",
+  "sqrt.sFG",		"abs.sFG",	"mov.sFG",	"neg.sFG",
+  "round.l.sFG",	"trunc.l.sFG",	"ceil.l.sFG",	"floor.l.sFG",
+  "round.w.sFG",	"trunc.w.sFG",	"ceil.w.sFG",	"floor.w.sFG",
+  "sel.sFGH",		false,		false,		false,
+  "seleqz.sFGH",	"recip.sFG",	"rsqrt.sFG",	"selnez.sFGH",
+  "maddf.sFGH",		"msubf.sFGH",	"rint.sFG",	"class.sFG",
+  "min.sFGH",		"mina.sFGH",	"max.sFGH",	"maxa.sFGH",
+  false,		"cvt.d.sFG",	false,		false,
+  "cvt.w.sFG",		"cvt.l.sFG",
+}
+
+local map_cop1d_r6 = {
+  shift = 0, mask = 63,
+  [0] = "add.dFGH",	"sub.dFGH",	"mul.dFGH",	"div.dFGH",
+  "sqrt.dFG",		"abs.dFG",	"mov.dFG",	"neg.dFG",
+  "round.l.dFG",	"trunc.l.dFG",	"ceil.l.dFG",	"floor.l.dFG",
+  "round.w.dFG",	"trunc.w.dFG",	"ceil.w.dFG",	"floor.w.dFG",
+  "sel.dFGH",		false,		false,		false,
+  "seleqz.dFGH",	"recip.dFG",	"rsqrt.dFG",	"selnez.dFGH",
+  "maddf.dFGH",		"msubf.dFGH",	"rint.dFG",	"class.dFG",
+  "min.dFGH",		"mina.dFGH",	"max.dFGH",	"maxa.dFGH",
+  "cvt.s.dFG",		false,		false,		false,
+  "cvt.w.dFG",		"cvt.l.dFG",
+}
+
+local map_cop1w_r6 = {
+  shift = 0, mask = 63,
+  [0] = "cmp.af.sFGH",	"cmp.un.sFGH",	"cmp.eq.sFGH",	"cmp.ueq.sFGH",
+  "cmp.lt.sFGH",	"cmp.ult.sFGH",	"cmp.le.sFGH",	"cmp.ule.sFGH",
+  "cmp.saf.sFGH",	"cmp.sun.sFGH",	"cmp.seq.sFGH",	"cmp.sueq.sFGH",
+  "cmp.slt.sFGH",	"cmp.sult.sFGH",	"cmp.sle.sFGH",	"cmp.sule.sFGH",
+  false,		"cmp.or.sFGH",	"cmp.une.sFGH",	"cmp.ne.sFGH",
+  false,		false,		false,		false,
+  false,		"cmp.sor.sFGH",	"cmp.sune.sFGH",	"cmp.sne.sFGH",
+  false,		false,		false,		false,
+  "cvt.s.wFG", "cvt.d.wFG",
+}
+
+local map_cop1l_r6 = {
+  shift = 0, mask = 63,
+  [0] = "cmp.af.dFGH",	"cmp.un.dFGH",	"cmp.eq.dFGH",	"cmp.ueq.dFGH",
+  "cmp.lt.dFGH",	"cmp.ult.dFGH",	"cmp.le.dFGH",	"cmp.ule.dFGH",
+  "cmp.saf.dFGH",	"cmp.sun.dFGH",	"cmp.seq.dFGH",	"cmp.sueq.dFGH",
+  "cmp.slt.dFGH",	"cmp.sult.dFGH",	"cmp.sle.dFGH",	"cmp.sule.dFGH",
+  false,		"cmp.or.dFGH",	"cmp.une.dFGH",	"cmp.ne.dFGH",
+  false,		false,		false,		false,
+  false,		"cmp.sor.dFGH",	"cmp.sune.dFGH",	"cmp.sne.dFGH",
+  false,		false,		false,		false,
+  "cvt.s.lFG", "cvt.d.lFG",
+}
+
+local map_cop1_r6 = {
+  shift = 21, mask = 31,
+  [0] = "mfc1TG", "dmfc1TG",	"cfc1TG",	"mfhc1TG",
+  "mtc1TG",	"dmtc1TG",	"ctc1TG",	"mthc1TG",
+  false,	"bc1eqzHB",	false,		false,
+  false,	"bc1nezHB",	false,		false,
+  map_cop1s_r6,	map_cop1d_r6,	false,		false,
+  map_cop1w_r6,	map_cop1l_r6,
+}
+
+local function maprs_popTS(rs, rt)
+  if rt == 0 then return 0 elseif rs == 0 then return 1
+  elseif rs == rt then return 2 else return 3 end
+end
+
+local map_pop06_r6 = {
+  maprs = maprs_popTS, [0] = "blezSB", "blezalcTB", "bgezalcTB", "bgeucSTB"
+}
+local map_pop07_r6 = {
+  maprs = maprs_popTS, [0] = "bgtzSB", "bgtzalcTB", "bltzalcTB", "bltucSTB"
+}
+local map_pop26_r6 = {
+  maprs = maprs_popTS, "blezcTB", "bgezcTB", "bgecSTB"
+}
+local map_pop27_r6 = {
+  maprs = maprs_popTS, "bgtzcTB", "bltzcTB", "bltcSTB"
+}
+
+local function maprs_popS(rs, rt)
+  if rs == 0 then return 0 else return 1 end
+end
+
+local map_pop66_r6 = {
+  maprs = maprs_popS, [0] = "jicTI", "beqzcSb"
+}
+local map_pop76_r6 = {
+  maprs = maprs_popS, [0] = "jialcTI", "bnezcSb"
+}
+
+local function maprs_popST(rs, rt)
+  if rs >= rt then return 0 elseif rs == 0 then return 1 else return 2 end
+end
+
+local map_pop10_r6 = {
+  maprs = maprs_popST, [0] = "bovcSTB", "beqzalcTB", "beqcSTB"
+}
+local map_pop30_r6 = {
+  maprs = maprs_popST, [0] = "bnvcSTB", "bnezalcTB", "bnecSTB"
+}
+
+local map_pri_r6 = {
+  [0] = map_special_r6,	map_regimm_r6,	"jJ",	"jalJ",
+  "beq|beqz|bST00B",	"bne|bnezST0B",		map_pop06_r6,	map_pop07_r6,
+  map_pop10_r6,	"addiu|liTS0I",	"sltiTSI",	"sltiuTSI",
+  "andiTSU",	"ori|liTS0U",	"xoriTSU",	"aui|luiTS0U",
+  map_cop0,	map_cop1_r6,	false,		false,
+  false,	false,		map_pop26_r6,	map_pop27_r6,
+  map_pop30_r6,	"daddiuTSI",	false,		false,
+  false,	"dauiTSI",	false,		map_special3_r6,
+  "lbTSO",	"lhTSO",	false,		"lwTSO",
+  "lbuTSO",	"lhuTSO",	false,		false,
+  "sbTSO",	"shTSO",	false,		"swTSO",
+  false,	false,		false,		false,
+  false,	"lwc1HSO",	"bc#",		false,
+  false,	"ldc1HSO",	map_pop66_r6,	"ldTSO",
+  false,	"swc1HSO",	"balc#",	map_pcrel_r6,
+  false,	"sdc1HSO",	map_pop76_r6,	"sdTSO",
+}
+
 ------------------------------------------------------------------------------
 
 local map_gpr = {
@@ -287,10 +494,14 @@ local function disass_ins(ctx)
   ctx.op = op
   ctx.rel = nil
 
-  local opat = map_pri[rshift(op, 26)]
+  local opat = ctx.map_pri[rshift(op, 26)]
   while type(opat) ~= "string" do
     if not opat then return unknown(ctx) end
-    opat = opat[band(rshift(op, opat.shift), opat.mask)] or opat._
+    if opat.maprs then
+      opat = opat[opat.maprs(band(rshift(op,21),31), band(rshift(op,16),31))]
+    else
+      opat = opat[band(rshift(op, opat.shift), opat.mask)] or opat._
+    end
   end
   local name, pat = match(opat, "^([a-z0-9_.]*)(.*)")
   local altname, pat2 = match(pat, "|([a-z0-9_.|]*)(.*)")
@@ -314,6 +525,8 @@ local function disass_ins(ctx)
       x = "f"..band(rshift(op, 21), 31)
     elseif p == "A" then
       x = band(rshift(op, 6), 31)
+    elseif p == "a" then
+      x = band(rshift(op, 6), 7)
     elseif p == "E" then
       x = band(rshift(op, 6), 31) + 32
     elseif p == "M" then
@@ -333,6 +546,10 @@ local function disass_ins(ctx)
       x = band(rshift(op, 11), 31) - last + 33
     elseif p == "I" then
       x = arshift(lshift(op, 16), 16)
+    elseif p == "2" then
+      x = arshift(lshift(op, 13), 11)
+    elseif p == "3" then
+      x = arshift(lshift(op, 14), 11)
     elseif p == "U" then
       x = band(op, 0xffff)
     elseif p == "O" then
@@ -342,7 +559,15 @@ local function disass_ins(ctx)
       local index = map_gpr[band(rshift(op, 16), 31)]
       operands[#operands] = format("%s(%s)", index, last)
     elseif p == "B" then
-      x = ctx.addr + ctx.pos + arshift(lshift(op, 16), 16)*4 + 4
+      x = ctx.addr + ctx.pos + arshift(lshift(op, 16), 14) + 4
+      ctx.rel = x
+      x = format("0x%08x", x)
+    elseif p == "b" then
+      x = ctx.addr + ctx.pos + arshift(lshift(op, 11), 9) + 4
+      ctx.rel = x
+      x = format("0x%08x", x)
+    elseif p == "#" then
+      x = ctx.addr + ctx.pos + arshift(lshift(op, 6), 4) + 4
       ctx.rel = x
       x = format("0x%08x", x)
     elseif p == "J" then
@@ -408,6 +633,7 @@ local function create(code, addr, out)
   ctx.disass = disass_block
   ctx.hexdump = 8
   ctx.get = get_be
+  ctx.map_pri = map_pri
   return ctx
 end
 
@@ -417,6 +643,19 @@ local function create_el(code, addr, out)
   return ctx
 end
 
+local function create_r6(code, addr, out)
+  local ctx = create(code, addr, out)
+  ctx.map_pri = map_pri_r6
+  return ctx
+end
+
+local function create_r6_el(code, addr, out)
+  local ctx = create(code, addr, out)
+  ctx.get = get_le
+  ctx.map_pri = map_pri_r6
+  return ctx
+end
+
 -- Simple API: disassemble code (a string) at address and output via out.
 local function disass(code, addr, out)
   create(code, addr, out):disass()
@@ -426,6 +665,14 @@ local function disass_el(code, addr, out)
   create_el(code, addr, out):disass()
 end
 
+local function disass_r6(code, addr, out)
+  create_r6(code, addr, out):disass()
+end
+
+local function disass_r6_el(code, addr, out)
+  create_r6_el(code, addr, out):disass()
+end
+
 -- Return register name for RID.
 local function regname(r)
   if r < 32 then return map_gpr[r] end
@@ -436,8 +683,12 @@ end
 return {
   create = create,
   create_el = create_el,
+  create_r6 = create_r6,
+  create_r6_el = create_r6_el,
   disass = disass,
   disass_el = disass_el,
+  disass_r6 = disass_r6,
+  disass_r6_el = disass_r6_el,
   regname = regname
 }
 
diff --git a/src/jit/dis_mips64r6.lua b/src/jit/dis_mips64r6.lua
new file mode 100644
index 00000000..023c05ab
--- /dev/null
+++ b/src/jit/dis_mips64r6.lua
@@ -0,0 +1,17 @@
+----------------------------------------------------------------------------
+-- LuaJIT MIPS64R6 disassembler wrapper module.
+--
+-- Copyright (C) 2005-2017 Mike Pall. All rights reserved.
+-- Released under the MIT license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+-- This module just exports the r6 big-endian functions from the
+-- MIPS disassembler module. All the interesting stuff is there.
+------------------------------------------------------------------------------
+
+local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips")
+return {
+  create = dis_mips.create_r6,
+  disass = dis_mips.disass_r6,
+  regname = dis_mips.regname
+}
+
diff --git a/src/jit/dis_mips64r6el.lua b/src/jit/dis_mips64r6el.lua
new file mode 100644
index 00000000..f2988339
--- /dev/null
+++ b/src/jit/dis_mips64r6el.lua
@@ -0,0 +1,17 @@
+----------------------------------------------------------------------------
+-- LuaJIT MIPS64R6EL disassembler wrapper module.
+--
+-- Copyright (C) 2005-2017 Mike Pall. All rights reserved.
+-- Released under the MIT license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+-- This module just exports the r6 little-endian functions from the
+-- MIPS disassembler module. All the interesting stuff is there.
+------------------------------------------------------------------------------
+
+local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips")
+return {
+  create = dis_mips.create_r6_el,
+  disass = dis_mips.disass_r6_el,
+  regname = dis_mips.regname
+}
+
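Note on the disassembler changes above (not part of the patch): the "B", "b" and "#" operand patterns in dis_mips.lua all compute a PC-relative target the same way. They sign-extend an offset field of 16, 21 or 26 bits, scale it by 4 and add it to the address of the following instruction. The rewritten arshift(lshift(op, 16), 14) is just the old arshift(lshift(op, 16), 16)*4 folded into one shift. A minimal C sketch of that arithmetic follows; the helper names sext and branch_target are hypothetical and do not exist in LuaJIT.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v (hypothetical helper, 1 <= bits <= 31). */
int32_t sext(uint32_t v, unsigned bits)
{
  uint32_t m;
  assert(bits >= 1 && bits <= 31);
  m = 1u << (bits - 1);            /* Sign bit of the offset field. */
  v &= (m << 1) - 1u;              /* Drop everything above the field. */
  return (int32_t)(v ^ m) - (int32_t)m;
}

/* Target of a PC-relative branch at address pc whose word offset sits in
** the low `bits` bits of op: 16 for "B", 21 for "b", 26 for "#".
** Mirrors ctx.addr + ctx.pos + offset*4 + 4 from the Lua code.
*/
uint32_t branch_target(uint32_t pc, uint32_t op, unsigned bits)
{
  return pc + 4u + (uint32_t)sext(op, bits) * 4u;
}
```

With this sketch, an op whose 16-bit field is 0xffff (offset -1) at pc 0x1000 resolves to 0x1000 again, i.e. a branch to itself.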
diff --git a/src/lj_arch.h b/src/lj_arch.h
index 0351e046..cf31a291 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -342,18 +342,38 @@
 #elif LUAJIT_TARGET == LUAJIT_ARCH_MIPS32 || LUAJIT_TARGET == LUAJIT_ARCH_MIPS64
 
 #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL)
+#if __mips_isa_rev >= 6
+#define LJ_TARGET_MIPSR6	1
+#define LJ_TARGET_UNALIGNED	1
+#endif
 #if LUAJIT_TARGET == LUAJIT_ARCH_MIPS32
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME		"mips32r6el"
+#else
 #define LJ_ARCH_NAME		"mipsel"
+#endif
+#else
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME		"mips64r6el"
 #else
 #define LJ_ARCH_NAME		"mips64el"
 #endif
+#endif
 #define LJ_ARCH_ENDIAN		LUAJIT_LE
 #else
 #if LUAJIT_TARGET == LUAJIT_ARCH_MIPS32
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME		"mips32r6"
+#else
 #define LJ_ARCH_NAME		"mips"
+#endif
+#else
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME		"mips64r6"
 #else
 #define LJ_ARCH_NAME		"mips64"
 #endif
+#endif
 #define LJ_ARCH_ENDIAN		LUAJIT_BE
 #endif
 
@@ -390,7 +410,9 @@
 #define LJ_TARGET_UNIFYROT	2	/* Want only IR_BROR. */
 #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
 
-#if _MIPS_ARCH_MIPS32R2 || _MIPS_ARCH_MIPS64R2
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_VERSION		60
+#elif _MIPS_ARCH_MIPS32R2 || _MIPS_ARCH_MIPS64R2
 #define LJ_ARCH_VERSION		20
 #else
 #define LJ_ARCH_VERSION		10
@@ -472,8 +494,13 @@
 #if !((defined(_MIPS_SIM_ABI32) && _MIPS_SIM == _MIPS_SIM_ABI32) || (defined(_ABIO32) && _MIPS_SIM == _ABIO32))
 #error "Only o32 ABI supported for MIPS32"
 #endif
+#if LJ_TARGET_MIPSR6
+/* Not that useful, since most available r6 CPUs are 64 bit. */
+#error "No support for MIPS32R6"
+#endif
 #elif LJ_TARGET_MIPS64
 #if !((defined(_MIPS_SIM_ABI64) && _MIPS_SIM == _MIPS_SIM_ABI64) || (defined(_ABI64) && _MIPS_SIM == _ABI64))
+/* MIPS32ON64 aka n32 ABI support might be desirable, but difficult. */
 #error "Only n64 ABI supported for MIPS64"
 #endif
 #endif
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 25b96264..96b8c032 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -2159,8 +2159,8 @@ static void asm_setup_regsp(ASMState *as)
 	  ir->prev = REGSP_HINT(RID_FPRET);
 	  continue;
 	}
-	/* fallthrough */
 #endif
+      /* fallthrough */
       case IR_CALLN: case IR_CALLXS:
 #if LJ_SOFTFP
       case IR_MIN: case IR_MAX:
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index 23ffc3aa..4626507b 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -101,7 +101,12 @@ static void asm_guard(ASMState *as, MIPSIns mi, Reg rs, Reg rt)
     as->invmcp = NULL;
     as->loopinv = 1;
     as->mcp = p+1;
+#if !LJ_TARGET_MIPSR6
     mi = mi ^ ((mi>>28) == 1 ? 0x04000000u : 0x00010000u);  /* Invert cond. */
+#else
+    mi = mi ^ ((mi>>28) == 1 ? 0x04000000u :
+	       (mi>>28) == 4 ? 0x00800000u : 0x00010000u);  /* Invert cond. */
+#endif
     target = p;  /* Patch target later in asm_loop_fixup. */
   }
   emit_ti(as, MIPSI_LI, RID_TMP, as->snapno);
@@ -410,7 +415,11 @@ static void asm_callround(ASMState *as, IRIns *ir, IRCallID id)
 {
   /* The modified regs must match with the *.dasc implementation. */
   RegSet drop = RID2RSET(RID_R1)|RID2RSET(RID_R12)|RID2RSET(RID_FPRET)|
-		RID2RSET(RID_F2)|RID2RSET(RID_F4)|RID2RSET(REGARG_FIRSTFPR);
+		RID2RSET(RID_F2)|RID2RSET(RID_F4)|RID2RSET(REGARG_FIRSTFPR)
+#if LJ_TARGET_MIPSR6
+		|RID2RSET(RID_F21)
+#endif
+		;
   if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
   ra_evictset(as, drop);
   ra_destreg(as, ir, RID_FPRET);
@@ -444,8 +453,13 @@ static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
 {
   Reg tmp = ra_scratch(as, rset_exclude(RSET_FPR, left));
   Reg dest = ra_dest(as, ir, RSET_GPR);
+#if !LJ_TARGET_MIPSR6
   asm_guard(as, MIPSI_BC1F, 0, 0);
   emit_fgh(as, MIPSI_C_EQ_D, 0, tmp, left);
+#else
+  asm_guard(as, MIPSI_BC1EQZ, 0, (tmp&31));
+  emit_fgh(as, MIPSI_CMP_EQ_D, tmp, tmp, left);
+#endif
   emit_fg(as, MIPSI_CVT_D_W, tmp, tmp);
   emit_tg(as, MIPSI_MFC1, dest, tmp);
   emit_fg(as, MIPSI_CVT_W_D, tmp, left);
@@ -599,8 +613,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
 		     (void *)&as->J->k64[LJ_K64_M2P64],
 		     rset_exclude(RSET_GPR, dest));
 	  emit_fg(as, MIPSI_TRUNC_L_D, tmp, left);  /* Delay slot. */
-	  emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
-	  emit_fgh(as, MIPSI_C_OLT_D, 0, left, tmp);
+#if !LJ_TARGET_MIPSR6
+	  emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
+	  emit_fgh(as, MIPSI_C_OLT_D, 0, left, tmp);
+#else
+	  emit_branch(as, MIPSI_BC1NEZ, 0, (left&31), l_end);
+	  emit_fgh(as, MIPSI_CMP_LT_D, left, left, tmp);
+#endif
 	  emit_lsptr(as, MIPSI_LDC1, (tmp & 31),
 		     (void *)&as->J->k64[LJ_K64_2P63],
 		     rset_exclude(RSET_GPR, dest));
@@ -611,8 +630,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
 		     (void *)&as->J->k32[LJ_K32_M2P64],
 		     rset_exclude(RSET_GPR, dest));
 	  emit_fg(as, MIPSI_TRUNC_L_S, tmp, left);  /* Delay slot. */
-	  emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
-	  emit_fgh(as, MIPSI_C_OLT_S, 0, left, tmp);
+#if !LJ_TARGET_MIPSR6
+	  emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
+	  emit_fgh(as, MIPSI_C_OLT_S, 0, left, tmp);
+#else
+	  emit_branch(as, MIPSI_BC1NEZ, 0, (left&31), l_end);
+	  emit_fgh(as, MIPSI_CMP_LT_S, left, left, tmp);
+#endif
 	  emit_lsptr(as, MIPSI_LWC1, (tmp & 31),
 		     (void *)&as->J->k32[LJ_K32_2P63],
 		     rset_exclude(RSET_GPR, dest));
@@ -840,8 +864,12 @@ static void asm_aref(ASMState *as, IRIns *ir)
   }
   base = ra_alloc1(as, ir->op1, RSET_GPR);
   idx = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, base));
+#if !LJ_TARGET_MIPSR6
   emit_dst(as, MIPSI_AADDU, dest, RID_TMP, base);
   emit_dta(as, MIPSI_SLL, RID_TMP, idx, 3);
+#else
+  emit_dst(as, MIPSI_ALSA | MIPSF_A(3-1), dest, idx, base);
+#endif
 }
 
 /* Inlined hash lookup. Specialized for key type and for const keys.
@@ -944,8 +972,13 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
     l_end = asm_exitstub_addr(as);
   }
   if (!LJ_SOFTFP && irt_isnum(kt)) {
+#if !LJ_TARGET_MIPSR6
     emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
     emit_fgh(as, MIPSI_C_EQ_D, 0, tmpnum, key);
+#else
+    emit_branch(as, MIPSI_BC1NEZ, 0, (tmpnum&31), l_end);
+    emit_fgh(as, MIPSI_CMP_EQ_D, tmpnum, tmpnum, key);
+#endif
     *--as->mcp = MIPSI_NOP;  /* Avoid NaN comparison overhead. */
     emit_branch(as, MIPSI_BEQ, tmp1, RID_ZERO, l_next);
     emit_tsi(as, MIPSI_SLTIU, tmp1, tmp1, (int32_t)LJ_TISNUM);
@@ -1196,7 +1229,9 @@ static MIPSIns asm_fxloadins(IRIns *ir)
   case IRT_I16: return MIPSI_LH;
   case IRT_U16: return MIPSI_LHU;
   case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
+  /* fallthrough */
   case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
+  /* fallthrough */
   default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
   }
 }
@@ -1207,7 +1242,9 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
   case IRT_I8: case IRT_U8: return MIPSI_SB;
   case IRT_I16: case IRT_U16: return MIPSI_SH;
   case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
+  /* fallthrough */
   case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
+  /* fallthrough */
   default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
   }
 }
@@ -1253,7 +1290,7 @@ static void asm_xload(ASMState *as, IRIns *ir)
 {
   Reg dest = ra_dest(as, ir,
     (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
-  lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
+  lua_assert(LJ_TARGET_UNALIGNED || !(ir->op2 & IRXLOAD_UNALIGNED));
   asm_fusexref(as, asm_fxloadins(ir), dest, ir->op1, RSET_GPR, 0);
 }
 
@@ -1545,7 +1582,7 @@ static void asm_cnew(ASMState *as, IRIns *ir)
       ofs -= 4; if (LJ_BE) ir++; else ir--;
     }
 #else
-    emit_tsi(as, MIPSI_SD, ra_alloc1(as, ir->op2, allow),
+    emit_tsi(as, sz == 8 ? MIPSI_SD : MIPSI_SW, ra_alloc1(as, ir->op2, allow),
 	     RID_RET, sizeof(GCcdata));
 #endif
     lua_assert(sz == 4 || sz == 8);
@@ -1678,6 +1715,7 @@ static void asm_add(ASMState *as, IRIns *ir)
   } else
 #endif
   {
+    /* TODO MIPSR6: Fuse ADD(BSHL(a,1-4),b) or ADD(ADD(a,a),b) to MIPSI_ALSA. */
     Reg dest = ra_dest(as, ir, RSET_GPR);
     Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
     if (irref_isk(ir->op2)) {
@@ -1722,8 +1760,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
     Reg right, left = ra_alloc2(as, ir, RSET_GPR);
     right = (left >> 8); left &= 255;
     if (LJ_64 && irt_is64(ir->t)) {
+#if !LJ_TARGET_MIPSR6
       emit_dst(as, MIPSI_MFLO, dest, 0, 0);
       emit_dst(as, MIPSI_DMULT, 0, left, right);
+#else
+      emit_dst(as, MIPSI_DMUL, dest, left, right);
+#endif
     } else {
       emit_dst(as, MIPSI_MUL, dest, left, right);
     }
@@ -1806,6 +1848,7 @@ static void asm_abs(ASMState *as, IRIns *ir)
 
 static void asm_arithov(ASMState *as, IRIns *ir)
 {
+  /* TODO MIPSR6: bovc/bnvc. Caveat: no delay slot to load RID_TMP. */
   Reg right, left, tmp, dest = ra_dest(as, ir, RSET_GPR);
   lua_assert(!irt_is64(ir->t));
   if (irref_isk(ir->op2)) {
@@ -1850,9 +1893,14 @@ static void asm_mulov(ASMState *as, IRIns *ir)
 						 right), dest));
   asm_guard(as, MIPSI_BNE, RID_TMP, tmp);
   emit_dta(as, MIPSI_SRA, RID_TMP, dest, 31);
+#if !LJ_TARGET_MIPSR6
   emit_dst(as, MIPSI_MFHI, tmp, 0, 0);
   emit_dst(as, MIPSI_MFLO, dest, 0, 0);
   emit_dst(as, MIPSI_MULT, 0, left, right);
+#else
+  emit_dst(as, MIPSI_MUL, dest, left, right);
+  emit_dst(as, MIPSI_MUH, tmp, left, right);
+#endif
 }
 
 #if LJ_32 && LJ_HASFFI
@@ -2076,6 +2124,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
     Reg dest = ra_dest(as, ir, RSET_FPR);
     Reg right, left = ra_alloc2(as, ir, RSET_FPR);
     right = (left >> 8); left &= 255;
+#if !LJ_TARGET_MIPSR6
     if (dest == left) {
       emit_fg(as, MIPSI_MOVT_D, dest, right);
     } else {
@@ -2083,19 +2132,37 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
       if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
     }
     emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
+#else
+    emit_fgh(as, ismax ? MIPSI_MAX_D : MIPSI_MIN_D, dest, left, right);
+#endif
 #endif
   } else {
     Reg dest = ra_dest(as, ir, RSET_GPR);
     Reg right, left = ra_alloc2(as, ir, RSET_GPR);
     right = (left >> 8); left &= 255;
-    if (dest == left) {
-      emit_dst(as, MIPSI_MOVN, dest, right, RID_TMP);
+    if (left == right) {
+      if (dest != left) emit_move(as, dest, left);
     } else {
-      emit_dst(as, MIPSI_MOVZ, dest, left, RID_TMP);
-      if (dest != right) emit_move(as, dest, right);
+#if !LJ_TARGET_MIPSR6
+      if (dest == left) {
+	emit_dst(as, MIPSI_MOVN, dest, right, RID_TMP);
+      } else {
+	emit_dst(as, MIPSI_MOVZ, dest, left, RID_TMP);
+	if (dest != right) emit_move(as, dest, right);
+      }
+#else
+      emit_dst(as, MIPSI_OR, dest, dest, RID_TMP);
+      if (dest != right) {
+	emit_dst(as, MIPSI_SELNEZ, RID_TMP, right, RID_TMP);
+	emit_dst(as, MIPSI_SELEQZ, dest, left, RID_TMP);
+      } else {
+	emit_dst(as, MIPSI_SELEQZ, RID_TMP, left, RID_TMP);
+	emit_dst(as, MIPSI_SELNEZ, dest, right, RID_TMP);
+      }
+#endif
+      emit_dst(as, MIPSI_SLT, RID_TMP,
+	       ismax ? left : right, ismax ? right : left);
     }
-    emit_dst(as, MIPSI_SLT, RID_TMP,
-	     ismax ? left : right, ismax ? right : left);
   }
 }
 
@@ -2179,10 +2246,18 @@ static void asm_comp(ASMState *as, IRIns *ir)
 #if LJ_SOFTFP
     asm_sfpcomp(as, ir);
 #else
+#if !LJ_TARGET_MIPSR6
     Reg right, left = ra_alloc2(as, ir, RSET_FPR);
     right = (left >> 8); left &= 255;
     asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
     emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
+#else
+    Reg tmp, right, left = ra_alloc2(as, ir, RSET_FPR);
+    right = (left >> 8); left &= 255;
+    tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_FPR, left), right));
+    asm_guard(as, (op&1) ? MIPSI_BC1NEZ : MIPSI_BC1EQZ, 0, (tmp&31));
+    emit_fgh(as, MIPSI_CMP_LT_D + ((op&3) ^ ((op>>2)&1)), tmp, left, right);
+#endif
 #endif
   } else {
     Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
@@ -2218,9 +2293,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
   if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
 #if LJ_SOFTFP
     asm_sfpcomp(as, ir);
-#else
+#elif !LJ_TARGET_MIPSR6
     asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
     emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
+#else
+    Reg tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_FPR, left), right));
+    asm_guard(as, (ir->o & 1) ? MIPSI_BC1NEZ : MIPSI_BC1EQZ, 0, (tmp&31));
+    emit_fgh(as, MIPSI_CMP_EQ_D, tmp, left, right);
 #endif
   } else {
     asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
@@ -2623,7 +2702,12 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
       if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
 	  ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
 	   (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
-	   (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
+#if !LJ_TARGET_MIPSR6
+	   (p[-1] & 0xffe00000u) == MIPSI_BC1F
+#else
+	   (p[-1] & 0xff600000u) == MIPSI_BC1EQZ
+#endif
+	  )) {
 	ptrdiff_t delta = target - p;
 	if (((delta + 0x8000) >> 16) == 0) {  /* Patch in-range branch. */
 	patchbranch:
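Note on the asm_guard() and lj_asm_patchexit() changes above (not part of the patch): on R6 the FP branches BC1EQZ/BC1NEZ differ only in bit 23, so the condition inversion needs a third XOR mask, and the exit-patching check masks that bit out to accept either form. A minimal C sketch, assuming the BEQ, BC1EQZ and BC1NEZ values from lj_target_mips.h and the standard MIPS encodings for BNE, BLTZ and BGEZ; the function names are hypothetical.

```c
#include <stdint.h>

enum {
  EX_BEQ = 0x10000000, EX_BNE = 0x14000000,
  EX_BLTZ = 0x04000000, EX_BGEZ = 0x04010000,
  EX_BC1EQZ = 0x45200000, EX_BC1NEZ = 0x45a00000
};

/* Same expression as the LJ_TARGET_MIPSR6 branch of asm_guard(). */
uint32_t invert_cond_r6(uint32_t mi)
{
  uint32_t top = mi >> 28;
  return mi ^ (top == 1 ? 0x04000000u :   /* beq <-> bne */
	       top == 4 ? 0x00800000u :   /* bc1eqz <-> bc1nez */
			  0x00010000u);   /* bltz <-> bgez */
}

/* Same mask as the R6 check in lj_asm_patchexit(): bit 23 is ignored. */
int is_bc1_branch_r6(uint32_t ins)
{
  return (ins & 0xff600000u) == (uint32_t)EX_BC1EQZ;
}
```

Applying invert_cond_r6() twice returns the original opcode, which is a quick sanity check for any new mask added to the chain.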
diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
index bb6593ae..313d030a 100644
--- a/src/lj_emit_mips.h
+++ b/src/lj_emit_mips.h
@@ -138,6 +138,7 @@ static void emit_loadu64(ASMState *as, Reg r, uint64_t u64)
     } else if (emit_kdelta1(as, r, (intptr_t)u64)) {
       return;
     } else {
+      /* TODO MIPSR6: Use DAHI & DATI. Caveat: sign-extension. */
       if ((u64 & 0xffff)) {
 	emit_tsi(as, MIPSI_ORI, r, r, u64 & 0xffff);
       }
@@ -236,10 +237,22 @@ static void emit_jmp(ASMState *as, MCode *target)
 static void emit_call(ASMState *as, void *target, int needcfa)
 {
   MCode *p = as->mcp;
-  *--p = MIPSI_NOP;
+#if LJ_TARGET_MIPSR6
+  ptrdiff_t delta = (char *)target - (char *)p;
+  if ((((delta>>2) + 0x02000000) >> 26) == 0) {  /* Try compact call first. */
+    *--p = MIPSI_BALC | (((uintptr_t)delta >>2) & 0x03ffffffu);
+    as->mcp = p;
+    return;
+  }
+#endif
+  *--p = MIPSI_NOP;  /* Delay slot. */
   if ((((uintptr_t)target ^ (uintptr_t)p) >> 28) == 0) {
+#if !LJ_TARGET_MIPSR6
     *--p = (((uintptr_t)target & 1) ? MIPSI_JALX : MIPSI_JAL) |
 	   (((uintptr_t)target >>2) & 0x03ffffffu);
+#else
+    *--p = MIPSI_JAL | (((uintptr_t)target >>2) & 0x03ffffffu);
+#endif
   } else {  /* Target out of range: need indirect call. */
     *--p = MIPSI_JALR | MIPSF_S(RID_CFUNCADDR);
     needcfa = 1;
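Note on the emit_call() change above (not part of the patch): the compact call is used only when the byte delta, taken as a word offset, fits the signed 26-bit field of balc. The test (((delta>>2) + 0x02000000) >> 26) == 0 is a biased range check. The C sketch below spells the same condition out as an explicit comparison and shows the encoding step; it assumes delta is a multiple of 4, and the names balc_in_range and encode_balc are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_BALC 0xe8000000u  /* MIPSI_BALC from lj_target_mips.h. */

/* Does the byte delta fit the signed 26-bit word offset of balc? */
int balc_in_range(ptrdiff_t delta)
{
  ptrdiff_t words = delta / 4;  /* Exact, since delta is a multiple of 4. */
  return words >= -0x02000000 && words < 0x02000000;
}

/* Encode the instruction word the same way the patch does. */
uint32_t encode_balc(ptrdiff_t delta)
{
  return EX_BALC | (uint32_t)(((uintptr_t)delta >> 2) & 0x03ffffffu);
}
```

Because balc has no delay slot, this path returns before the NOP is emitted, which is why the check has to come first in emit_call().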
diff --git a/src/lj_jit.h b/src/lj_jit.h
index c06829ab..a8b6f9a7 100644
--- a/src/lj_jit.h
+++ b/src/lj_jit.h
@@ -51,10 +51,18 @@
 /* Names for the CPU-specific flags. Must match the order above. */
 #define JIT_F_CPU_FIRST		JIT_F_MIPSXXR2
 #if LJ_TARGET_MIPS32
+#if LJ_TARGET_MIPSR6
+#define JIT_F_CPUSTRING		"\010MIPS32R6"
+#else
 #define JIT_F_CPUSTRING		"\010MIPS32R2"
+#endif
+#else
+#if LJ_TARGET_MIPSR6
+#define JIT_F_CPUSTRING		"\010MIPS64R6"
 #else
 #define JIT_F_CPUSTRING		"\010MIPS64R2"
 #endif
+#endif
 #else
 #define JIT_F_CPU_FIRST		0
 #define JIT_F_CPUSTRING		""
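Note on JIT_F_CPUSTRING above (not part of the patch): the leading \010 is an octal length byte, 8, which is the length of "MIPS64R6" and "MIPS32R6". The sketch below assumes the string is walked as a list of length-prefixed names terminated by a zero byte, one name per flag bit, which is what the "Must match the order above" comment suggests. The helper flag_name_is is hypothetical and only illustrates the layout.

```c
#include <string.h>

/* Return nonzero if the n-th (0-based) name in a length-prefixed list
** such as "\010MIPS64R6" equals `name`.
*/
int flag_name_is(const char *list, unsigned n, const char *name)
{
  const unsigned char *s = (const unsigned char *)list;
  for (; *s; s += 1 + *s, n--)  /* Skip the length byte plus the name. */
    if (n == 0)
      return strlen(name) == *s && memcmp(s + 1, name, *s) == 0;
  return 0;  /* List ended before the n-th entry. */
}
```

A new name with a different length would need a matching prefix byte, otherwise every following entry in the list is misparsed.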
diff --git a/src/lj_target_mips.h b/src/lj_target_mips.h
index 740687b3..84db6012 100644
--- a/src/lj_target_mips.h
+++ b/src/lj_target_mips.h
@@ -223,6 +223,8 @@ typedef enum MIPSIns {
   MIPSI_ADDIU = 0x24000000,
   MIPSI_SUB = 0x00000022,
   MIPSI_SUBU = 0x00000023,
+
+#if !LJ_TARGET_MIPSR6
   MIPSI_MUL = 0x70000002,
   MIPSI_DIV = 0x0000001a,
   MIPSI_DIVU = 0x0000001b,
@@ -232,6 +234,15 @@ typedef enum MIPSIns {
   MIPSI_MFHI = 0x00000010,
   MIPSI_MFLO = 0x00000012,
   MIPSI_MULT = 0x00000018,
+#else
+  MIPSI_MUL = 0x00000098,
+  MIPSI_MUH = 0x000000d8,
+  MIPSI_DIV = 0x0000009a,
+  MIPSI_DIVU = 0x0000009b,
+
+  MIPSI_SELEQZ = 0x00000035,
+  MIPSI_SELNEZ = 0x00000037,
+#endif
 
   MIPSI_SLL = 0x00000000,
   MIPSI_SRL = 0x00000002,
@@ -253,8 +264,13 @@ typedef enum MIPSIns {
   MIPSI_B = 0x10000000,
   MIPSI_J = 0x08000000,
   MIPSI_JAL = 0x0c000000,
+#if !LJ_TARGET_MIPSR6
   MIPSI_JALX = 0x74000000,
   MIPSI_JR = 0x00000008,
+#else
+  MIPSI_JR = 0x00000009,
+  MIPSI_BALC = 0xe8000000,
+#endif
   MIPSI_JALR = 0x0000f809,
 
   MIPSI_BEQ = 0x10000000,
@@ -282,15 +298,23 @@ typedef enum MIPSIns {
 
   /* MIPS64 instructions. */
   MIPSI_DADD = 0x0000002c,
-  MIPSI_DADDI = 0x60000000,
   MIPSI_DADDU = 0x0000002d,
   MIPSI_DADDIU = 0x64000000,
   MIPSI_DSUB = 0x0000002e,
   MIPSI_DSUBU = 0x0000002f,
+#if !LJ_TARGET_MIPSR6
   MIPSI_DDIV = 0x0000001e,
   MIPSI_DDIVU = 0x0000001f,
   MIPSI_DMULT = 0x0000001c,
   MIPSI_DMULTU = 0x0000001d,
+#else
+  MIPSI_DDIV = 0x0000009e,
+  MIPSI_DMOD = 0x000000de,
+  MIPSI_DDIVU = 0x0000009f,
+  MIPSI_DMODU = 0x000000df,
+  MIPSI_DMUL = 0x0000009c,
+  MIPSI_DMUH = 0x000000dc,
+#endif
 
   MIPSI_DSLL = 0x00000038,
   MIPSI_DSRL = 0x0000003a,
@@ -308,6 +332,11 @@ typedef enum MIPSIns {
   MIPSI_ASUBU = LJ_32 ? MIPSI_SUBU : MIPSI_DSUBU,
   MIPSI_AL = LJ_32 ? MIPSI_LW : MIPSI_LD,
   MIPSI_AS = LJ_32 ? MIPSI_SW : MIPSI_SD,
+#if LJ_TARGET_MIPSR6
+  MIPSI_LSA = 0x00000005,
+  MIPSI_DLSA = 0x00000015,
+  MIPSI_ALSA = LJ_32 ? MIPSI_LSA : MIPSI_DLSA,
+#endif
 
   /* Extract/insert instructions. */
   MIPSI_DEXTM = 0x7c000001,
@@ -317,18 +346,19 @@ typedef enum MIPSIns {
   MIPSI_DINSU = 0x7c000006,
   MIPSI_DINS = 0x7c000007,
 
-  MIPSI_RINT_D = 0x4620001a,
-  MIPSI_RINT_S = 0x4600001a,
-  MIPSI_RINT = 0x4400001a,
   MIPSI_FLOOR_D = 0x4620000b,
-  MIPSI_CEIL_D = 0x4620000a,
-  MIPSI_ROUND_D = 0x46200008,
 
   /* FP instructions. */
   MIPSI_MOV_S = 0x46000006,
   MIPSI_MOV_D = 0x46200006,
+#if !LJ_TARGET_MIPSR6
   MIPSI_MOVT_D = 0x46210011,
   MIPSI_MOVF_D = 0x46200011,
+#else
+  MIPSI_MIN_D = 0x4620001C,
+  MIPSI_MAX_D = 0x4620001E,
+  MIPSI_SEL_D = 0x46200010,
+#endif
 
   MIPSI_ABS_D = 0x46200005,
   MIPSI_NEG_D = 0x46200007,
@@ -363,15 +393,23 @@ typedef enum MIPSIns {
   MIPSI_DMTC1 = 0x44a00000,
   MIPSI_DMFC1 = 0x44200000,
 
+#if !LJ_TARGET_MIPSR6
   MIPSI_BC1F = 0x45000000,
   MIPSI_BC1T = 0x45010000,
-
   MIPSI_C_EQ_D = 0x46200032,
   MIPSI_C_OLT_S = 0x46000034,
   MIPSI_C_OLT_D = 0x46200034,
   MIPSI_C_ULT_D = 0x46200035,
   MIPSI_C_OLE_D = 0x46200036,
   MIPSI_C_ULE_D = 0x46200037,
+#else
+  MIPSI_BC1EQZ = 0x45200000,
+  MIPSI_BC1NEZ = 0x45a00000,
+  MIPSI_CMP_EQ_D = 0x46a00002,
+  MIPSI_CMP_LT_S = 0x46800004,
+  MIPSI_CMP_LT_D = 0x46a00004,
+#endif
+
 } MIPSIns;
 
 #endif
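Note on MIPSI_SELEQZ/MIPSI_SELNEZ above (not part of the patch): R6 removes the conditional moves movn/movz, so both the assembler backend and vm_mips64.dasc replace each of them with a select pair plus an or. A minimal C model of the register semantics follows; the function names are hypothetical.

```c
#include <stdint.h>

/* seleqz rd, rs, rt: rd = (rt == 0) ? rs : 0 */
uint64_t seleqz(uint64_t rs, uint64_t rt) { return rt == 0 ? rs : 0; }

/* selnez rd, rs, rt: rd = (rt != 0) ? rs : 0 */
uint64_t selnez(uint64_t rs, uint64_t rt) { return rt != 0 ? rs : 0; }

/* Pre-R6 movn rd, rs, rt: rd = rs if rt != 0, else rd is unchanged.
** R6 sequence: selnez tmp, rs, rt; seleqz rd, rd, rt; or rd, rd, tmp.
*/
uint64_t movn_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return selnez(rs, rt) | seleqz(rd, rt);
}

/* Pre-R6 movz rd, rs, rt: rd = rs if rt == 0, else rd is unchanged. */
uint64_t movz_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return seleqz(rs, rt) | selnez(rd, rt);
}
```

Exactly one of the two selects yields a nonzero-capable value and the other yields 0, so the or acts as a merge. The cost is an extra temporary register, which is why several R6 sequences in vm_mips64.dasc clobber a TMP register that the movn/movz versions left alone.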
diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
index 9839b5ac..44fba36c 100644
--- a/src/vm_mips64.dasc
+++ b/src/vm_mips64.dasc
@@ -83,6 +83,10 @@
 |
 |.define FRET1,		f0
 |.define FRET2,		f2
+|
+|.define FTMP0,		f20
+|.define FTMP1,		f21
+|.define FTMP2,		f22
 |.endif
 |
 |// Stack layout while in interpreter. Must match with lj_frame.h.
@@ -310,10 +314,10 @@
 |.endmacro
 |
 |// Assumes DISPATCH is relative to GL.
-#define DISPATCH_GL(field)      (GG_DISP2G + (int)offsetof(global_State, field))
-#define DISPATCH_J(field)       (GG_DISP2J + (int)offsetof(jit_State, field))
-#define GG_DISP2GOT             (GG_OFS(got) - GG_OFS(dispatch))
-#define DISPATCH_GOT(name)      (GG_DISP2GOT + sizeof(void*)*LJ_GOT_##name)
+#define DISPATCH_GL(field)	(GG_DISP2G + (int)offsetof(global_State, field))
+#define DISPATCH_J(field)	(GG_DISP2J + (int)offsetof(jit_State, field))
+#define GG_DISP2GOT		(GG_OFS(got) - GG_OFS(dispatch))
+#define DISPATCH_GOT(name)	(GG_DISP2GOT + sizeof(void*)*LJ_GOT_##name)
 |
 #define PC2PROTO(field)  ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
 |
@@ -492,8 +496,15 @@ static void build_subroutines(BuildCtx *ctx)
   |7:  // Less results wanted.
   |  subu TMP0, RD, TMP2
   |  dsubu TMP0, BASE, TMP0		// Either keep top or shrink it.
+  |.if MIPSR6
+  |  selnez TMP0, TMP0, TMP2		// LUA_MULTRET+1 case?
+  |  seleqz BASE, BASE, TMP2
+  |  b <3
+  |.  or BASE, BASE, TMP0
+  |.else
   |  b <3
   |.  movn BASE, TMP0, TMP2		// LUA_MULTRET+1 case?
+  |.endif
   |
   |8:  // Corner case: need to grow stack for filling up results.
   |  // This can happen if:
@@ -1125,11 +1136,16 @@ static void build_subroutines(BuildCtx *ctx)
   |.endmacro
   |
   |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1 and has delay slot!
+  |// MIPSR6: no delay slot, but a forbidden slot.
   |.macro ffgccheck
   |  ld TMP0, DISPATCH_GL(gc.total)(DISPATCH)
   |  ld TMP1, DISPATCH_GL(gc.threshold)(DISPATCH)
   |  dsubu AT, TMP0, TMP1
+  |.if MIPSR6
+  |  bgezalc AT, ->fff_gcstep
+  |.else
   |  bgezal AT, ->fff_gcstep
+  |.endif
   |.endmacro
   |
   |//-- Base library: checks -----------------------------------------------
@@ -1157,7 +1173,13 @@ static void build_subroutines(BuildCtx *ctx)
   |  sltu TMP1, TISNUM, TMP0
   |  not TMP2, TMP0
   |  li TMP3, ~LJ_TISNUM
+  |.if MIPSR6
+  |  selnez TMP2, TMP2, TMP1
+  |  seleqz TMP3, TMP3, TMP1
+  |  or TMP2, TMP2, TMP3
+  |.else
   |  movz TMP2, TMP3, TMP1
+  |.endif
   |  dsll TMP2, TMP2, 3
   |  daddu TMP2, CFUNC:RB, TMP2
   |  b ->fff_restv
@@ -1169,7 +1191,11 @@ static void build_subroutines(BuildCtx *ctx)
   |  gettp TMP2, CARG1
   |  daddiu TMP0, TMP2, -LJ_TTAB
   |  daddiu TMP1, TMP2, -LJ_TUDATA
+  |.if MIPSR6
+  |  selnez TMP0, TMP1, TMP0
+  |.else
   |  movn TMP0, TMP1, TMP0
+  |.endif
   |  bnez TMP0, >6
   |.  cleartp TAB:CARG1
   |1:  // Field metatable must be at same offset for GCtab and GCudata!
@@ -1208,7 +1234,13 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |6:
   |  sltiu AT, TMP2, LJ_TISNUM
+  |.if MIPSR6
+  |  selnez TMP0, TISNUM, AT
+  |  seleqz AT, TMP2, AT
+  |  or TMP2, TMP0, AT
+  |.else
   |  movn TMP2, TISNUM, AT
+  |.endif
   |  dsll TMP2, TMP2, 3
   |   dsubu TMP0, DISPATCH, TMP2
   |  b <2
@@ -1270,8 +1302,13 @@ static void build_subroutines(BuildCtx *ctx)
   |  or TMP0, TMP0, TMP1
   |  bnez TMP0, ->fff_fallback
   |.  sd BASE, L->base			// Add frame since C call can throw.
+  |.if MIPSR6
+  |  sd PC, SAVE_PC			// Redundant (but a defined value).
+  |  ffgccheck
+  |.else
   |  ffgccheck
   |.  sd PC, SAVE_PC			// Redundant (but a defined value).
+  |.endif
   |  load_got lj_strfmt_number
   |  move CARG1, L
   |  call_intern lj_strfmt_number	// (lua_State *L, cTValue *o)
@@ -1441,8 +1478,15 @@ static void build_subroutines(BuildCtx *ctx)
   |  addiu AT, TMP0, -LUA_YIELD
   |    daddu CARG3, CARG2, TMP0
   |   daddiu TMP3, CARG2, 8
+  |.if MIPSR6
+  |  seleqz CARG2, CARG2, AT
+  |  selnez TMP3, TMP3, AT
+  |  bgtz AT, ->fff_fallback		// st > LUA_YIELD?
+  |.  or CARG2, TMP3, CARG2
+  |.else
   |  bgtz AT, ->fff_fallback		// st > LUA_YIELD?
   |.  movn CARG2, TMP3, AT
+  |.endif
   |   xor TMP2, TMP2, CARG3
   |  bnez TMP1, ->fff_fallback		// cframe != 0?
   |.  or AT, TMP2, TMP0
@@ -1754,7 +1798,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  b ->fff_res
   |.  li RD, (2+1)*8
   |
-  |.macro math_minmax, name, intins, fpins
+  |.macro math_minmax, name, intins, intinsc, fpins
   |  .ffunc_1 name
   |  daddu TMP3, BASE, NARGS8:RC
   |  checkint CARG1, >5
@@ -1766,7 +1810,13 @@ static void build_subroutines(BuildCtx *ctx)
   |.  sextw CARG1, CARG1
   |  lw CARG2, LO(TMP2)
   |.  slt AT, CARG1, CARG2
+  |.if MIPSR6
+  |  intins TMP1, CARG2, AT
+  |  intinsc CARG1, CARG1, AT
+  |  or CARG1, CARG1, TMP1
+  |.else
   |  intins CARG1, CARG2, AT
+  |.endif
   |  daddiu TMP2, TMP2, 8
   |  zextw CARG1, CARG1
   |  b <1
@@ -1802,13 +1852,23 @@ static void build_subroutines(BuildCtx *ctx)
   |.  nop
   |7:
   |.if FPU
+  |.if MIPSR6
+  |  fpins FRET1, FRET1, FARG1
+  |.else
   |  c.olt.d FRET1, FARG1
   |  fpins FRET1, FARG1
+  |.endif
   |.else
   |  bal ->vm_sfcmpolt
   |.  nop
+  |.if MIPSR6
+  |  intins AT, CARG2, CRET1
+  |  intinsc CARG1, CARG1, CRET1
+  |  or CARG1, CARG1, AT
+  |.else
   |  intins CARG1, CARG2, CRET1
   |.endif
+  |.endif
   |  b <6
   |.  daddiu TMP2, TMP2, 8
   |
@@ -1828,8 +1888,13 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.endmacro
   |
-  |  math_minmax math_min, movz, movf.d
-  |  math_minmax math_max, movn, movt.d
+  |.if MIPSR6
+  |  math_minmax math_min, seleqz, selnez, min.d
+  |  math_minmax math_max, selnez, seleqz, max.d
+  |.else
+  |  math_minmax math_min, movz, _, movf.d
+  |  math_minmax math_max, movn, _, movt.d
+  |.endif
   |
   |//-- String library -----------------------------------------------------
   |
@@ -1854,7 +1919,9 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.ffunc string_char			// Only handle the 1-arg case here.
   |  ffgccheck
+  |.if not MIPSR6
   |.  nop
+  |.endif
   |  ld CARG1, 0(BASE)
   |  gettp TMP0, CARG1
   |  xori AT, NARGS8:RC, 8		// Exactly 1 argument.
@@ -1884,7 +1951,9 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.ffunc string_sub
   |  ffgccheck
+  |.if not MIPSR6
   |.  nop
+  |.endif
   |  addiu AT, NARGS8:RC, -16
   |  ld TMP0, 0(BASE)
   |  bltz AT, ->fff_fallback
@@ -1907,8 +1976,30 @@ static void build_subroutines(BuildCtx *ctx)
   |  addiu TMP0, CARG2, 1
   |  addu TMP1, CARG4, TMP0
   |   slt TMP3, CARG3, r0
+  |.if MIPSR6
+  |  seleqz CARG4, CARG4, AT
+  |  selnez TMP1, TMP1, AT
+  |  or CARG4, TMP1, CARG4		// if (end < 0) end += len+1
+  |.else
   |  movn CARG4, TMP1, AT		// if (end < 0) end += len+1
+  |.endif
   |   addu TMP1, CARG3, TMP0
+  |.if MIPSR6
+  |   selnez TMP1, TMP1, TMP3
+  |   seleqz CARG3, CARG3, TMP3
+  |   or CARG3, TMP1, CARG3		// if (start < 0) start += len+1
+  |   li TMP2, 1
+  |  slt AT, CARG4, r0
+  |   slt TMP3, r0, CARG3
+  |  seleqz CARG4, CARG4, AT		// if (end < 0) end = 0
+  |   selnez CARG3, CARG3, TMP3
+  |   seleqz TMP2, TMP2, TMP3
+  |   or CARG3, TMP2, CARG3		// if (start < 1) start = 1
+  |  slt AT, CARG2, CARG4
+  |  seleqz CARG4, CARG4, AT
+  |  selnez CARG2, CARG2, AT
+  |  or CARG4, CARG2, CARG4		// if (end > len) end = len
+  |.else
   |   movn CARG3, TMP1, TMP3		// if (start < 0) start += len+1
   |   li TMP2, 1
   |  slt AT, CARG4, r0
@@ -1917,6 +2008,7 @@ static void build_subroutines(BuildCtx *ctx)
   |   movz CARG3, TMP2, TMP3		// if (start < 1) start = 1
   |  slt AT, CARG2, CARG4
   |  movn CARG4, CARG2, AT		// if (end > len) end = len
+  |.endif
   |   daddu CARG2, STR:CARG1, CARG3
   |  subu CARG3, CARG4, CARG3		// len = end - start
   |   daddiu CARG2, CARG2, sizeof(GCstr)-1
@@ -1978,7 +2070,13 @@ static void build_subroutines(BuildCtx *ctx)
   |  slt AT, CARG1, r0
   |  dsrlv CRET1, TMP0, CARG3
   |  dsubu TMP0, r0, CRET1
+  |.if MIPSR6
+  |  selnez TMP0, TMP0, AT
+  |  seleqz CRET1, CRET1, AT
+  |  or CRET1, CRET1, TMP0
+  |.else
   |  movn CRET1, TMP0, AT
+  |.endif
   |  jr ra
   |.  zextw CRET1, CRET1
   |1:
@@ -2001,14 +2099,28 @@ static void build_subroutines(BuildCtx *ctx)
   |  slt AT, CARG1, r0
   |  dsrlv CRET1, CRET2, TMP0
   |  dsubu CARG1, r0, CRET1
+  |.if MIPSR6
+  |  seleqz CRET1, CRET1, AT
+  |  selnez CARG1, CARG1, AT
+  |  or CRET1, CRET1, CARG1
+  |.else
   |  movn CRET1, CARG1, AT
+  |.endif
   |  li CARG1, 64
   |  subu TMP0, CARG1, TMP0
   |  dsllv CRET2, CRET2, TMP0	// Integer check.
   |  sextw AT, CRET1
   |  xor AT, CRET1, AT		// Range check.
   |  jr ra
+  |.if MIPSR6
+  |  seleqz AT, AT, CRET2
+  |  selnez CRET2, CRET2, CRET2
+  |  jr ra
+  |.  or CRET2, AT, CRET2
+  |.else
+  |  jr ra
   |.  movz CRET2, AT, CRET2
+  |.endif
   |1:
   |  jr ra
   |.  li CRET2, 1
@@ -2518,15 +2630,22 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |// Hard-float round to integer.
   |// Modifies AT, TMP0, FRET1, FRET2, f4. Keeps all others incl. FARG1.
+  |// MIPSR6: Modifies FTMP1, too.
   |.macro vm_round_hf, func
   |  lui TMP0, 0x4330			// Hiword of 2^52 (double).
   |  dsll TMP0, TMP0, 32
   |  dmtc1 TMP0, f4
   |  abs.d FRET2, FARG1			// |x|
   |    dmfc1 AT, FARG1
+  |.if MIPSR6
+  |  cmp.lt.d FTMP1, FRET2, f4
+  |   add.d FRET1, FRET2, f4		// (|x| + 2^52) - 2^52
+  |  bc1eqz FTMP1, >1			// Truncate only if |x| < 2^52.
+  |.else
   |  c.olt.d 0, FRET2, f4
   |   add.d FRET1, FRET2, f4		// (|x| + 2^52) - 2^52
   |  bc1f 0, >1				// Truncate only if |x| < 2^52.
+  |.endif
   |.  sub.d FRET1, FRET1, f4
   |    slt AT, AT, r0
   |.if "func" == "ceil"
@@ -2537,16 +2656,38 @@ static void build_subroutines(BuildCtx *ctx)
   |.if "func" == "trunc"
   |   dsll TMP0, TMP0, 32
   |   dmtc1 TMP0, f4
+  |.if MIPSR6
+  |  cmp.lt.d FTMP1, FRET2, FRET1	// |x| < result?
+  |   sub.d FRET2, FRET1, f4
+  |  sel.d  FTMP1, FRET1, FRET2		// If yes, subtract +1.
+  |  dmtc1 AT, FRET1
+  |  neg.d FRET2, FTMP1
+  |  jr ra
+  |.  sel.d FRET1, FTMP1, FRET2		// Merge sign bit back in.
+  |.else
   |  c.olt.d 0, FRET2, FRET1		// |x| < result?
   |   sub.d FRET2, FRET1, f4
   |  movt.d FRET1, FRET2, 0		// If yes, subtract +1.
   |  neg.d FRET2, FRET1
   |  jr ra
   |.  movn.d FRET1, FRET2, AT		// Merge sign bit back in.
+  |.endif
   |.else
   |  neg.d FRET2, FRET1
   |   dsll TMP0, TMP0, 32
   |   dmtc1 TMP0, f4
+  |.if MIPSR6
+  |  dmtc1 AT, FTMP1
+  |  sel.d FTMP1, FRET1, FRET2
+  |.if "func" == "ceil"
+  |  cmp.lt.d FRET1, FTMP1, FARG1	// x > result?
+  |.else
+  |  cmp.lt.d FRET1, FARG1, FTMP1	// x < result?
+  |.endif
+  |   sub.d FRET2, FTMP1, f4		// If yes, subtract +-1.
+  |  jr ra
+  |.  sel.d FRET1, FTMP1, FRET2
+  |.else
   |  movn.d FRET1, FRET2, AT		// Merge sign bit back in.
   |.if "func" == "ceil"
   |  c.olt.d 0, FRET1, FARG1		// x > result?
@@ -2557,6 +2698,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  jr ra
   |.  movt.d FRET1, FRET2, 0
   |.endif
+  |.endif
   |1:
   |  jr ra
   |.  mov.d FRET1, FARG1
@@ -2701,7 +2843,7 @@ static void build_subroutines(BuildCtx *ctx)
   |.  li CRET1, 0
   |.endif
   |
-  |.macro sfmin_max, name, intins
+  |.macro sfmin_max, name, intins, intinsc
   |->vm_sf .. name:
   |.if JIT and not FPU
   |  move TMP2, ra
@@ -2710,13 +2852,25 @@ static void build_subroutines(BuildCtx *ctx)
   |  move ra, TMP2
   |  move TMP0, CRET1
   |  move CRET1, CARG1
+  |.if MIPSR6
+  |  intins CRET1, CRET1, TMP0
+  |  intinsc TMP0, CARG2, TMP0
+  |  jr ra
+  |.  or CRET1, CRET1, TMP0
+  |.else
   |  jr ra
   |.  intins CRET1, CARG2, TMP0
   |.endif
+  |.endif
   |.endmacro
   |
-  |  sfmin_max min, movz
-  |  sfmin_max max, movn
+  |.if MIPSR6
+  |  sfmin_max min, selnez, seleqz
+  |  sfmin_max max, seleqz, selnez
+  |.else
+  |  sfmin_max min, movz, _
+  |  sfmin_max max, movn, _
+  |.endif
   |
   |//-----------------------------------------------------------------------
   |//-- Miscellaneous functions --------------------------------------------
@@ -2885,7 +3039,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |    lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
     |  slt AT, CARG1, CARG2
     |    addu TMP2, TMP2, TMP3
+    |.if MIPSR6
+    |  movop TMP2, TMP2, AT
+    |.else
     |  movop TMP2, r0, AT
+    |.endif
     |1:
     |  daddu PC, PC, TMP2
     |  ins_next
@@ -2903,16 +3061,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.endif
     |3:  // RA and RD are both numbers.
     |.if FPU
-    |  fcomp f20, f22
+    |.if MIPSR6
+    |  fcomp FTMP0, FTMP0, FTMP2
+    |   addu TMP2, TMP2, TMP3
+    |  mfc1 TMP3, FTMP0
+    |  b <1
+    |.  fmovop TMP2, TMP2, TMP3
+    |.else
+    |  fcomp FTMP0, FTMP2
     |   addu TMP2, TMP2, TMP3
     |  b <1
     |.  fmovop TMP2, r0
+    |.endif
     |.else
     |  bal sfcomp
     |.   addu TMP2, TMP2, TMP3
     |  b <1
+    |.if MIPSR6
+    |.  movop TMP2, TMP2, CRET1
+    |.else
     |.  movop TMP2, r0, CRET1
     |.endif
+    |.endif
     |
     |4:  // RA is a number, RD is not a number.
     |  bne CARG4, TISNUM, ->vmeta_comp
@@ -2959,15 +3129,27 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.endif
     |.endmacro
     |
+    |.if MIPSR6
+    if (op == BC_ISLT) {
+      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, selnez, selnez, cmp.lt.d, ->vm_sfcmpolt
+    } else if (op == BC_ISGE) {
+      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, seleqz, seleqz, cmp.lt.d, ->vm_sfcmpolt
+    } else if (op == BC_ISLE) {
+      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, seleqz, seleqz, cmp.ult.d, ->vm_sfcmpult
+    } else {
+      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, selnez, selnez, cmp.ult.d, ->vm_sfcmpult
+    }
+    |.else
     if (op == BC_ISLT) {
-      |  bc_comp f20, f22, CARG1, CARG2, movz, movf, c.olt.d, ->vm_sfcmpolt
+      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, movz, movf, c.olt.d, ->vm_sfcmpolt
     } else if (op == BC_ISGE) {
-      |  bc_comp f20, f22, CARG1, CARG2, movn, movt, c.olt.d, ->vm_sfcmpolt
+      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, movn, movt, c.olt.d, ->vm_sfcmpolt
     } else if (op == BC_ISLE) {
-      |  bc_comp f22, f20, CARG2, CARG1, movn, movt, c.ult.d, ->vm_sfcmpult
+      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, movn, movt, c.ult.d, ->vm_sfcmpult
     } else {
-      |  bc_comp f22, f20, CARG2, CARG1, movz, movf, c.ult.d, ->vm_sfcmpult
+      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, movz, movf, c.ult.d, ->vm_sfcmpult
     }
+    |.endif
     break;
 
   case BC_ISEQV: case BC_ISNEV:
@@ -3013,7 +3195,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |2:  // Check if the tags are the same and it's a table or userdata.
     |  xor AT, CARG3, CARG4			// Same type?
     |  sltiu TMP0, CARG3, LJ_TISTABUD+1		// Table or userdata?
+    |.if MIPSR6
+    |  seleqz TMP0, TMP0, AT
+    |.else
     |  movn TMP0, r0, AT
+    |.endif
     if (vk) {
       |  beqz TMP0, <1
     } else {
@@ -3063,11 +3249,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
     |  xor TMP1, CARG1, CARG2
     |   addu TMP2, TMP2, TMP3
+    |.if MIPSR6
+    if (vk) {
+      |  seleqz TMP2, TMP2, TMP1
+    } else {
+      |  selnez TMP2, TMP2, TMP1
+    }
+    |.else
     if (vk) {
       |  movn TMP2, r0, TMP1
     } else {
       |  movz TMP2, r0, TMP1
     }
+    |.endif
     |  daddu PC, PC, TMP2
     |  ins_next
     break;
@@ -3094,6 +3288,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  bne CARG4, TISNUM, >6
     |.   addu TMP2, TMP2, TMP3
     |  xor AT, CARG1, CARG2
+    |.if MIPSR6
+    if (vk) {
+      | seleqz TMP2, TMP2, AT
+      |1:
+      |  daddu PC, PC, TMP2
+      |2:
+    } else {
+      |  selnez TMP2, TMP2, AT
+      |1:
+      |2:
+      |  daddu PC, PC, TMP2
+    }
+    |.else
     if (vk) {
       | movn TMP2, r0, AT
       |1:
@@ -3105,6 +3312,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |2:
       |  daddu PC, PC, TMP2
     }
+    |.endif
     |  ins_next
     |
     |3:  // RA is not an integer.
@@ -3117,30 +3325,49 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.   addu TMP2, TMP2, TMP3
     |  sltu AT, CARG4, TISNUM
     |.if FPU
-    |  ldc1 f20, 0(RA)
-    |   ldc1 f22, 0(RD)
+    |  ldc1 FTMP0, 0(RA)
+    |   ldc1 FTMP2, 0(RD)
     |.endif
     |  beqz AT, >5
     |.  nop
     |4:  // RA and RD are both numbers.
     |.if FPU
-    |  c.eq.d f20, f22
+    |.if MIPSR6
+    |  cmp.eq.d FTMP0, FTMP0, FTMP2
+    |  dmfc1 TMP1, FTMP0
+    |  b <1
+    if (vk) {
+      |.  selnez TMP2, TMP2, TMP1
+    } else {
+      |.  seleqz TMP2, TMP2, TMP1
+    }
+    |.else
+    |  c.eq.d FTMP0, FTMP2
     |  b <1
     if (vk) {
       |.  movf TMP2, r0
     } else {
       |.  movt TMP2, r0
     }
+    |.endif
     |.else
     |  bal ->vm_sfcmpeq
     |.  nop
     |  b <1
+    |.if MIPSR6
+    if (vk) {
+      |.  selnez TMP2, TMP2, CRET1
+    } else {
+      |.  seleqz TMP2, TMP2, CRET1
+    }
+    |.else
     if (vk) {
       |.  movz TMP2, r0, CRET1
     } else {
       |.  movn TMP2, r0, CRET1
     }
     |.endif
+    |.endif
     |
     |5:  // RA is a number, RD is not a number.
     |.if FFI
@@ -3150,9 +3377,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.endif
     |  // RA is a number, RD is an integer. Convert RD to a number.
     |.if FPU
-    |.  lwc1 f22, LO(RD)
+    |.  lwc1 FTMP2, LO(RD)
     |  b <4
-    |.  cvt.d.w f22, f22
+    |.  cvt.d.w FTMP2, FTMP2
     |.else
     |.  sextw CARG2, CARG2
     |  bal ->vm_sfi2d_2
@@ -3170,10 +3397,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.endif
     |  // RA is an integer, RD is a number. Convert RA to a number.
     |.if FPU
-    |.  lwc1 f20, LO(RA)
-    |   ldc1 f22, 0(RD)
+    |.  lwc1 FTMP0, LO(RA)
+    |   ldc1 FTMP2, 0(RD)
     |  b <4
-    |   cvt.d.w f20, f20
+    |   cvt.d.w FTMP0, FTMP0
     |.else
     |.  sextw CARG1, CARG1
     |  bal ->vm_sfi2d_1
@@ -3216,11 +3443,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  decode_RD4b TMP2
     |  lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
     |  addu TMP2, TMP2, TMP3
+    |.if MIPSR6
+    if (vk) {
+      |  seleqz TMP2, TMP2, TMP0
+    } else {
+      |  selnez TMP2, TMP2, TMP0
+    }
+    |.else
     if (vk) {
       |  movn TMP2, r0, TMP0
     } else {
       |  movz TMP2, r0, TMP0
     }
+    |.endif
     |  daddu PC, PC, TMP2
     |  ins_next
     break;
@@ -3239,11 +3474,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |   decode_RD4b TMP2
       |   lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
       |   addu TMP2, TMP2, TMP3
+      |.if MIPSR6
+      if (op == BC_IST) {
+	|  selnez TMP2, TMP2, TMP0;
+      } else {
+	|  seleqz TMP2, TMP2, TMP0;
+      }
+      |.else
       if (op == BC_IST) {
 	|  movz TMP2, r0, TMP0
       } else {
 	|  movn TMP2, r0, TMP0
       }
+      |.endif
       |  daddu PC, PC, TMP2
     } else {
       |  ld CRET1, 0(RD)
@@ -3486,9 +3729,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  bltz TMP1, ->vmeta_arith
     |.  daddu RA, BASE, RA
     |.elif "intins" == "mult"
+    |.if MIPSR6
+    |.  nop
+    |  mul CRET1, CARG3, CARG4
+    |  muh TMP2, CARG3, CARG4
+    |.else
     |.  intins CARG3, CARG4
     |  mflo CRET1
     |  mfhi TMP2
+    |.endif
     |  sra TMP1, CRET1, 31
     |  bne TMP1, TMP2, ->vmeta_arith
     |.  daddu RA, BASE, RA
@@ -3511,16 +3760,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |.endif
     |
     |5:  // Check for two numbers.
-    |  .FPU ldc1 f20, 0(RB)
+    |  .FPU ldc1 FTMP0, 0(RB)
     |  sltu AT, TMP0, TISNUM
     |   sltu TMP0, TMP1, TISNUM
-    |  .FPU ldc1 f22, 0(RC)
+    |  .FPU ldc1 FTMP2, 0(RC)
     |   and AT, AT, TMP0
     |   beqz AT, ->vmeta_arith
     |.   daddu RA, BASE, RA
     |
     |.if FPU
-    |  fpins FRET1, f20, f22
+    |  fpins FRET1, FTMP0, FTMP2
     |.elif "fpcall" == "sfpmod"
     |  sfpmod
     |.else
@@ -3850,7 +4099,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  li TMP0, 0x801
       |  addiu AT, CARG2, -0x7ff
       |   srl CARG3, RD, 14
+      |.if MIPSR6
+      |  seleqz TMP0, TMP0, AT
+      |  selnez CARG2, CARG2, AT
+      |  or CARG2, CARG2, TMP0
+      |.else
       |  movz CARG2, TMP0, AT
+      |.endif
       |  // (lua_State *L, int32_t asize, uint32_t hbits)
       |  call_intern lj_tab_new
       |.  move CARG1, L
@@ -4131,7 +4386,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  daddu NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
     |   settp STR:RC, TMP3		// Tagged key to look for.
     |.if FPU
-    |   ldc1 f20, 0(RA)
+    |   ldc1 FTMP0, 0(RA)
     |.else
     |   ld CRET1, 0(RA)
     |.endif
@@ -4147,7 +4402,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  andi AT, TMP3, LJ_GC_BLACK	// isblack(table)
     |  bnez AT, >7
     |.if FPU
-    |.  sdc1 f20, NODE:TMP2->val
+    |.  sdc1 FTMP0, NODE:TMP2->val
     |.else
     |.  sd CRET1, NODE:TMP2->val
     |.endif
@@ -4188,7 +4443,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ld BASE, L->base
     |.if FPU
     |  b <3				// No 2nd write barrier needed.
-    |.  sdc1 f20, 0(CRET1)
+    |.  sdc1 FTMP0, 0(CRET1)
     |.else
     |  ld CARG1, 0(RA)
     |  b <3				// No 2nd write barrier needed.
@@ -4531,7 +4786,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ld CARG1, 0(RC)
     |  sltu AT, RC, TMP3
     |    daddiu RC, RC, 8
+    |.if MIPSR6
+    |  selnez CARG1, CARG1, AT
+    |  seleqz AT, TISNIL, AT
+    |  or CARG1, CARG1, AT
+    |.else
     |  movz CARG1, TISNIL, AT
+    |.endif
     |  sd CARG1, 0(RA)
     |  sltu AT, RA, TMP2
     |  bnez AT, <1
@@ -4720,7 +4981,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  dext AT, CRET1, 31, 0
       |  slt CRET1, CARG2, CARG3
       |  slt TMP1, CARG3, CARG2
+      |.if MIPSR6
+      |  selnez TMP1, TMP1, AT
+      |  seleqz CRET1, CRET1, AT
+      |  or CRET1, CRET1, TMP1
+      |.else
       |  movn CRET1, TMP1, AT
+      |.endif
     } else {
       |  bne CARG3, TISNUM, >5
       |.  ld CARG2, FORL_STEP*8(RA)	// STEP CARG2 - CARG4 type
@@ -4736,20 +5003,34 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  slt CRET1, CRET1, CARG1
       |  slt AT, CARG2, r0
       |   slt TMP0, TMP0, r0		// ((y^a) & (y^b)) < 0: overflow.
+      |.if MIPSR6
+      |  selnez TMP1, TMP1, AT
+      |  seleqz CRET1, CRET1, AT
+      |  or CRET1, CRET1, TMP1
+      |.else
       |  movn CRET1, TMP1, AT
+      |.endif
       |   or CRET1, CRET1, TMP0
       |  zextw CARG1, CARG1
       |  settp CARG1, TISNUM
     }
     |1:
     if (op == BC_FORI) {
+      |.if MIPSR6
+      |  selnez TMP2, TMP2, CRET1
+      |.else
       |  movz TMP2, r0, CRET1
+      |.endif
       |  daddu PC, PC, TMP2
     } else if (op == BC_JFORI) {
       |  daddu PC, PC, TMP2
       |  lhu RD, -4+OFS_RD(PC)
     } else if (op == BC_IFORL) {
+      |.if MIPSR6
+      |  seleqz TMP2, TMP2, CRET1
+      |.else
       |  movn TMP2, r0, CRET1
+      |.endif
       |  daddu PC, PC, TMP2
     }
     if (vk) {
@@ -4779,6 +5060,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  and AT, AT, TMP0
       |  beqz AT, ->vmeta_for
       |.  slt TMP3, TMP3, r0
+      |.if MIPSR6
+      |   dmtc1 TMP3, FTMP2
+      |  cmp.lt.d FTMP0, f0, f2
+      |  cmp.lt.d FTMP1, f2, f0
+      |  sel.d FTMP2, FTMP1, FTMP0
+      |  b <1
+      |.  dmfc1 CRET1, FTMP2
+      |.else
       |  c.ole.d 0, f0, f2
       |  c.ole.d 1, f2, f0
       |  li CRET1, 1
@@ -4786,12 +5075,25 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  movt AT, r0, 1
       |  b <1
       |.  movn CRET1, AT, TMP3
+      |.endif
     } else {
       |  ldc1 f0, FORL_IDX*8(RA)
       |   ldc1 f4, FORL_STEP*8(RA)
       |    ldc1 f2, FORL_STOP*8(RA)
       |   ld TMP3, FORL_STEP*8(RA)
       |  add.d f0, f0, f4
+      |.if MIPSR6
+      |   slt TMP3, TMP3, r0
+      |   dmtc1 TMP3, FTMP2
+      |  cmp.lt.d FTMP0, f0, f2
+      |  cmp.lt.d FTMP1, f2, f0
+      |  sel.d FTMP2, FTMP1, FTMP0
+      |  dmfc1 CRET1, FTMP2
+      if (op == BC_IFORL) {
+	|  seleqz TMP2, TMP2, CRET1
+	|  daddu PC, PC, TMP2
+      }
+      |.else
       |  c.ole.d 0, f0, f2
       |  c.ole.d 1, f2, f0
       |   slt TMP3, TMP3, r0
@@ -4804,6 +5106,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
 	|  movn TMP2, r0, CRET1
 	|  daddu PC, PC, TMP2
       }
+      |.endif
       |  sdc1 f0, FORL_IDX*8(RA)
       |  ins_next1
       |  b <2
@@ -4979,8 +5282,17 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ld TMP0, 0(RA)
     |  sltu AT, RA, RC			// Less args than parameters?
     |  move CARG1, TMP0
+    |.if MIPSR6
+    |  selnez TMP0, TMP0, AT
+    |  seleqz TMP3, TISNIL, AT
+    |  or TMP0, TMP0, TMP3
+    |  seleqz TMP3, CARG1, AT
+    |  selnez CARG1, TISNIL, AT
+    |  or CARG1, CARG1, TMP3
+    |.else
     |  movz TMP0, TISNIL, AT		// Clear missing parameters.
     |  movn CARG1, TISNIL, AT		// Clear old fixarg slot (help the GC).
+    |.endif
     |    addiu TMP2, TMP2, -1
     |  sd TMP0, 16(TMP1)
     |    daddiu TMP1, TMP1, 8
-- 
2.41.0
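
Note: the rewrite that recurs throughout this patch can be modelled in
C. This is only a rough sketch of the instruction semantics, not code
from the patch; the helper names `movn_r6` and `movz_r6` are invented
for illustration. MIPS R6 has no `movn`/`movz`, so each conditional move
becomes a `selnez`/`seleqz` pair merged with `or`. When the moved value
is `r0`, a single select is enough.

```c
#include <stdint.h>

/* Pre-R6 conditional moves: rd is overwritten with rs only if the
 * condition on rt holds, otherwise rd keeps its old value. */
static inline uint64_t movn(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return rt != 0 ? rs : rd;
}

static inline uint64_t movz(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return rt == 0 ? rs : rd;
}

/* R6 selects: the result is either rs or zero, depending on rt. */
static inline uint64_t seleqz(uint64_t rs, uint64_t rt)
{
  return rt == 0 ? rs : 0;
}

static inline uint64_t selnez(uint64_t rs, uint64_t rt)
{
  return rt != 0 ? rs : 0;
}

/* Hypothetical helper: R6 sequence standing in for `movn rd, rs, rt`. */
static inline uint64_t movn_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  uint64_t t = selnez(rs, rt);  /* selnez t, rs, rt */
  rd = seleqz(rd, rt);          /* seleqz rd, rd, rt */
  return rd | t;                /* or rd, rd, t */
}

/* Hypothetical helper: R6 sequence standing in for `movz rd, rs, rt`. */
static inline uint64_t movz_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  uint64_t t = seleqz(rs, rt);  /* seleqz t, rs, rt */
  rd = selnez(rd, rt);          /* selnez rd, rd, rt */
  return rd | t;                /* or rd, rd, t */
}
```

With `rs == r0` the pair collapses: `movn rd, r0, rt` matches
`seleqz rd, rd, rt`, and `movz rd, r0, rt` matches `selnez rd, rd, rt`.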



* Re: [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
@ 2023-08-11  8:06   ` Sergey Kaplun via Tarantool-patches
  2023-08-15 13:10   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 17:15   ` Sergey Bronnikov via Tarantool-patches
  2 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-11  8:06 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

Hi, folks!
I found that some changes (see <src/lj_asm.c>) were missing, so I've
updated the patch and force-pushed the branch.

===================================================================
Cleanup math function compilation and fix inconsistencies.

(cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)

This patch backports the MIPS and PPC parts of the aforementioned
commit, because those architectures were stripped during the original
backport in 71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math
function compilation and fix inconsistencies."). Applying the missed
diffs prevents conflicts when backporting future patches.

This patch just removes macros that are no longer in use. It also
removes the usage of `IR_ATAN2`, which is not defined.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#8825

diff --git a/src/lj_asm.c b/src/lj_asm.c
index 15de7e33..ff68f79b 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -1705,7 +1705,7 @@ static void asm_ir(ASMState *as, IRIns *ir)
   case IR_NEG: asm_neg(as, ir); break;
 #if LJ_SOFTFP32
   case IR_DIV: case IR_POW: case IR_ABS:
-  case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
+  case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
     lua_assert(0);  /* Unused for LJ_SOFTFP32. */
     break;
 #else
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index a26a82cd..c27d8413 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
 }
 #endif
 
-#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
 #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
 
 static void asm_arithov(ASMState *as, IRIns *ir)
diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
index 6cb608f7..6aaed058 100644
--- a/src/lj_asm_ppc.h
+++ b/src/lj_asm_ppc.h
@@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
 }
 
 #define asm_abs(as, ir)		asm_fpunary(as, ir, PPCI_FABS)
-#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
 #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
 
 static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)
===================================================================

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
@ 2023-08-15  9:36   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 12:40     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 13:25   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15  9:36 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
LGTM, except for a few comments below.

On Wed, Aug 09, 2023 at 06:35:50PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by Djordje Kovacevic and Stefan Pejic.
> 
> (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
> 
> Without the aforementioned checks, some non-branch instructions may be
> interpreted as some branch due to memory address collisions. This patch
Please add a more comprehensive description of the behavior before the
patch. Because of the magic values, it is not obvious that the difference
between the current PC and the jump address is XORed with the opcode to
make sure that this is a branching instruction.
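
To make the point concrete, here is a minimal C sketch of the patched
condition. It is an illustration only: `is_exit_branch` is a made-up
name, and the three opcode constants are written out from the standard
MIPS encodings on the assumption that they match <src/lj_target_mips.h>.
The XOR only proves that the low 16 bits of the word equal the expected
branch offset; the masks then prove that the word is a branch at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match src/lj_target_mips.h (standard MIPS encodings). */
#define MIPSI_BEQ   0x10000000u
#define MIPSI_BLTZ  0x04000000u
#define MIPSI_BC1F  0x45000000u

/* Hypothetical helper: is `ins` a branch whose 16-bit offset field
 * equals `delta`, the distance to the exit stub in instructions? */
static inline int is_exit_branch(uint32_t ins, ptrdiff_t delta)
{
  /* Old check: only the low 16 bits are compared with the offset. */
  if (((ins ^ (uint32_t)delta) & 0xffffu) != 0)
    return 0;
  /* New check: the upper bits must encode a branch opcode. */
  return (ins & 0xf0000000u) == MIPSI_BEQ ||   /* Opcodes 0b0001xx. */
         (ins & 0xfc1e0000u) == MIPSI_BLTZ ||  /* REGIMM, rt bit 16 ignored. */
         (ins & 0xffe00000u) == MIPSI_BC1F;    /* COP1 BC, tf/cc ignored. */
}
```

For example, `addiu` with an immediate of 0x10 (0x24420010) passes the
old check for a delta of 0x10, but is rejected by the masks.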

Typo: s/some branch/branches/
> adds the corresponding comparisons masked values with instruction
Typo: s/comparisons masked values/mask values for comparisons/
> opcodes used in the LuaJIT:
> * `MIPSI_BEQ` for `beq` and `bne`,
> * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
> * `MIPSI_BC1F` for `bc1f` and `bc1t`,
> see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
> details.
> 
> To reproduce this failure, we need specific memory mapping, so testcase
Typo: s/testcase/the test case/
> is omitted.
> 
> Since MIPS architecture is not supported by Tarantool (at the moment)
> this patch is not necessary for backport. OTOH, it gives to us the
Typo: s/gives to us/gives us/
> following benefits:
> * Be in sync with the LuaJIT upstream not only for x86_64, arm64
>   architectures.
> * Avoid conflicts during the future backporting.
Typo: s/during the future/during future/
> So, it's more useful to backport some of the patches to avoid conflicts
> with the future patch series.
> 
> [1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
> 
> Sergey Kaplun:
> * added the description for the problem
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_asm_mips.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 03417013..03215821 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -2472,7 +2472,11 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
>    MCode tjump = MIPSI_J|(((uintptr_t)target>>2)&0x03ffffffu);
>    for (p++; p < pe; p++) {
>      if (*p == exitload) {  /* Look for load of exit number. */
> -      if (((p[-1] ^ (px-p)) & 0xffffu) == 0) {  /* Look for exitstub branch. */
> +      /* Look for exitstub branch. Yes, this covers all used branch variants. */
> +      if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
> +	  ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
> +	   (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
> +	   (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
>  	ptrdiff_t delta = target - p;
>  	if (((delta + 0x8000) >> 16) == 0) {  /* Patch in-range branch. */
>  	patchbranch:
> -- 
> 2.41.0
> 
Best regards,
Maxim Kokryashkin


* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
@ 2023-08-15 10:14   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 12:55     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 14:32   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 10:14 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
Please consider my comments below.

On Wed, Aug 09, 2023 at 06:35:51PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
> depends on particular offset of mcode for side trace regarding the
> parent trace. Before this commit just run some amount of functions to
> generate traces to fill the required mcode range. Unfortunately, this
> approach is not robust, since sometimes trace is not recorded due to
> errors "leaving loop in root trace" observed because of hotcount
> collisions.
> 
> This patch introduces the following helpers:
> * `frontend.gettraceno(func)` -- returns the traceno for the given
>   function, assumming that there is compiled trace for its prototype
>   (i.e. the 0th bytecode is JFUNC).
> * `jit.generators.fillmcode(traceno, size)` fills mcode area of the
>   given size from the given trace. It is useful to generate some mcode
>   to test jumps to side traces remote enough from the parent.
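
For readers unfamiliar with the bytecode layout that `gettraceno` relies
on, the decoding can be sketched in C. This is a sketch under stated
assumptions: the helper names are hypothetical, and the name table used
in the usage note is a toy stand-in, not the real `bcnames` string from
`jit.vmdef`. The only facts taken from the patch are that the opcode is
in the low 8 bits, RD is in the upper 16 bits, and names are stored as
fixed 6-character fields.

```c
#include <stdint.h>
#include <string.h>

#define BC_NAME_LENGTH 6  /* Width of one name field in the table. */
#define RD_SHIFT 16       /* RD occupies the upper 16 bits. */

/* Hypothetical helper: the opcode is the low byte of the instruction. */
static inline unsigned bc_op(uint32_t ins)
{
  return ins & 0xffu;
}

/* Hypothetical helper: the RD operand (traceno for a JFUNC header). */
static inline unsigned bc_rd(uint32_t ins)
{
  return ins >> RD_SHIFT;
}

/* Hypothetical helper: copy the space-padded name of opcode `op` from
 * a table of fixed-width names into `out` and NUL-terminate it. */
static inline void bc_name(const char *names, unsigned op,
                           char out[BC_NAME_LENGTH + 1])
{
  memcpy(out, names + (size_t)op * BC_NAME_LENGTH, BC_NAME_LENGTH);
  out[BC_NAME_LENGTH] = '\0';
}
```

With a toy table "ISLT  ISGE  JFUNC " and the instruction word
`(42u << 16) | 2u`, `bc_op` gives 2, `bc_name` gives "JFUNC " and
`bc_rd` gives 42.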
> ---
>  ...8-fix-side-exit-patching-on-arm64.test.lua |  78 ++----------
>  test/tarantool-tests/utils/frontend.lua       |  24 ++++
>  test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
>  3 files changed, 150 insertions(+), 67 deletions(-)
>  create mode 100644 test/tarantool-tests/utils/jit/generators.lua
> 
> diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> index 93db3041..678ac914 100644
> --- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> +++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> @@ -1,8 +1,12 @@
>  local tap = require('tap')
>  local test = tap.test('gh-6098-fix-side-exit-patching-on-arm64'):skipcond({
>    ['Test requires JIT enabled'] = not jit.status(),
> +  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
>  })
>  
> +local generators = require('utils').jit.generators
> +local frontend = require('utils').frontend
> +
>  test:plan(1)
>  
>  -- The function to be tested for side exit patching:
> @@ -20,52 +24,6 @@ local function cbool(cond)
>    end
>  end
>  
> --- XXX: Function template below produces 8Kb mcode for ARM64, so
> --- we need to compile at least 128 traces to exceed 1Mb delta
> --- between <cbool> root trace side exit and <cbool> side trace.
> --- Unfortunately, we have no other option for extending this jump
> --- delta, since the base of the current mcode area (J->mcarea) is
> --- used as a hint for mcode allocator (see lj_mcode.c for info).
> -local FUNCS = 128
> -local recfuncs = { }
> -for i = 1, FUNCS do
> -  -- This is a quite heavy workload (though it doesn't look like
> -  -- one at first). Each load from a table is type guarded. Each
> -  -- table lookup (for both stores and loads) is guarded for table
> -  -- <hmask> value and metatable presence. The code below results
> -  -- to 8Kb of mcode for ARM64 in practice.
> -  recfuncs[i] = assert(load(([[
> -    return function(src)
> -      local p = %d
> -      local tmp = { }
> -      local dst = { }
> -      for i = 1, 3 do
> -        tmp.a = src.a * p   tmp.j = src.j * p   tmp.s = src.s * p
> -        tmp.b = src.b * p   tmp.k = src.k * p   tmp.t = src.t * p
> -        tmp.c = src.c * p   tmp.l = src.l * p   tmp.u = src.u * p
> -        tmp.d = src.d * p   tmp.m = src.m * p   tmp.v = src.v * p
> -        tmp.e = src.e * p   tmp.n = src.n * p   tmp.w = src.w * p
> -        tmp.f = src.f * p   tmp.o = src.o * p   tmp.x = src.x * p
> -        tmp.g = src.g * p   tmp.p = src.p * p   tmp.y = src.y * p
> -        tmp.h = src.h * p   tmp.q = src.q * p   tmp.z = src.z * p
> -        tmp.i = src.i * p   tmp.r = src.r * p
> -
> -        dst.a = tmp.z + p   dst.j = tmp.q + p   dst.s = tmp.h + p
> -        dst.b = tmp.y + p   dst.k = tmp.p + p   dst.t = tmp.g + p
> -        dst.c = tmp.x + p   dst.l = tmp.o + p   dst.u = tmp.f + p
> -        dst.d = tmp.w + p   dst.m = tmp.n + p   dst.v = tmp.e + p
> -        dst.e = tmp.v + p   dst.n = tmp.m + p   dst.w = tmp.d + p
> -        dst.f = tmp.u + p   dst.o = tmp.l + p   dst.x = tmp.c + p
> -        dst.g = tmp.t + p   dst.p = tmp.k + p   dst.y = tmp.b + p
> -        dst.h = tmp.s + p   dst.q = tmp.j + p   dst.z = tmp.a + p
> -        dst.i = tmp.r + p   dst.r = tmp.i + p
> -      end
> -      dst.tmp = tmp
> -      return dst
> -    end
> -  ]]):format(i)), ('Syntax error in function recfuncs[%d]'):format(i))()
> -end
> -
>  -- Make compiler work hard:
>  -- * No optimizations at all to produce more mcode.
>  -- * Try to compile all compiled paths as early as JIT can.
> @@ -78,27 +36,13 @@ cbool(true)
>  -- a root trace for <cbool>.
>  cbool(true)
>  
> -for i = 1, FUNCS do
> -  -- XXX: FNEW is NYI, hence loop recording fails at this point.
> -  -- The recording is aborted on purpose: we are going to record
> -  -- <FUNCS> number of traces for functions in <recfuncs>.
> -  -- Otherwise, loop recording might lead to a very long trace
> -  -- error (via return to a lower frame), or a trace with lots of
> -  -- side traces. We need neither of this, but just bunch of
> -  -- traces filling the available mcode area.
> -  local function tnew(p)
> -    return {
> -      a = p + 1, f = p + 6,  k = p + 11, p = p + 16, u = p + 21, z = p + 26,
> -      b = p + 2, g = p + 7,  l = p + 12, q = p + 17, v = p + 22,
> -      c = p + 3, h = p + 8,  m = p + 13, r = p + 18, w = p + 23,
> -      d = p + 4, i = p + 9,  n = p + 14, s = p + 19, x = p + 24,
> -      e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
> -    }
> -  end
> -  -- Each function call produces a trace (see the template for the
> -  -- function definition above).
> -  recfuncs[i](tnew(i))
> -end
> +local cbool_traceno = frontend.gettraceno(cbool)
> +
> +-- XXX: Unfortunately, we have no other option for extending
> +-- this jump delta, since the base of the current mcode area
> +-- (J->mcarea) is used as a hint for mcode allocator (see
> +-- lj_mcode.c for info).
> +generators.fillmcode(cbool_traceno, 1024 * 1024)
>  
>  -- XXX: I tried to make the test in pure Lua, but I failed to
>  -- implement the robust solution. As a result I've implemented a
> diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> index 2afebbb2..414257fd 100644
> --- a/test/tarantool-tests/utils/frontend.lua
> +++ b/test/tarantool-tests/utils/frontend.lua
> @@ -1,6 +1,10 @@
>  local M = {}
>  
>  local bc = require('jit.bc')
> +local jutil = require('jit.util')
> +local vmdef = require('jit.vmdef')
> +local bcnames = vmdef.bcnames
> +local band, rshift = bit.band, bit.rshift
>  
>  function M.hasbc(f, bytecode)
>    assert(type(f) == 'function', 'argument #1 should be a function')
> @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
>    return hasbc
>  end
>  
> +-- Get traceno of the trace assotiated for the given function.
> +function M.gettraceno(func)
> +  assert(type(func) == 'function', 'argument #1 should be a function')
> +
> +  -- The 0th BC is the header.
> +  local func_ins = jutil.funcbc(func, 0)
> +  local BC_NAME_LENGTH = 6
> +  local RD_SHIFT = 16
> +
> +  -- Calculate index in `bcnames` string.
> +  local op_idx = BC_NAME_LENGTH * band(func_ins, 0xff)
> +  -- Get the name of the operation.
> +  local op_name = string.sub(bcnames, op_idx + 1, op_idx + BC_NAME_LENGTH)
> +  assert(op_name:match('JFUNC'),
> +         'The given function has non-jitted header: ' .. op_name)
> +
> +  -- RD contains the traceno.
> +  return rshift(func_ins, RD_SHIFT)
> +end
> +
>  return M
> diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
> new file mode 100644
> index 00000000..62b6e0ef
> --- /dev/null
> +++ b/test/tarantool-tests/utils/jit/generators.lua
> @@ -0,0 +1,115 @@
> +local M = {}
> +
> +local jutil = require('jit.util')
> +
> +local function getlast_traceno()
> +  return misc.getmetrics().jit_trace_num
> +end
> +
> +-- Convert addr to positive value if needed.
> +local function canonize_address(addr)
Nit: most of the time, the `canonize` variant is used in theological materials,
while `canonicalize` is more common in software development.
Feel free to ignore.
> +  if addr < 0 then addr = addr + 2 ^ 32 end
> +  return addr
> +end
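Side note: a small C sketch of what this helper does, assuming the address
can come back as a negative number when it does not fit a signed 32-bit
value (that reason is my assumption; the function names are hypothetical).

```c
#include <stdint.h>

/* Map a possibly negative value into [0, 2^32), like the Lua helper. */
static double canonical_address(double addr)
{
  if (addr < 0)
    addr += 4294967296.0;  /* 2^32 */
  return addr;
}

/* The same mapping for a value that is already a signed 32-bit integer. */
static uint32_t as_unsigned(int32_t addr)
{
  return (uint32_t)addr;  /* Conversion to unsigned wraps modulo 2^32. */
}
```

Both variants agree for any input in the signed 32-bit range.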
> +
> +-- Need some storage to avoid functions and traces to be
> +-- collected.
Typo: s/Need/We need/ or s/Need some storage/Some storage is needed/
Typo: s/to be collected/being collected/
> +local recfuncs = {}
> +local last_i = 0
> +-- This function generates a table of functions with heavy mcode
> +-- payload with tab arithmetics to fill the mcode area from the
> +-- one trace mcode by the some given size. This size is usually
Typo: s/by the some/by some/
> +-- big enough, because we want to check long jump side exits from
> +-- some traces.
> +-- Assumes, that maxmcode and maxtrace options are set to be sure,
Typo: s/that/that the/
> +-- that we can produce such amount of mcode.
> +function M.fillmcode(trace_from, size)
> +  local mcode, addr_from = jutil.tracemc(trace_from)
> +  assert(mcode, 'the #1 argument should be an existed trace number')
Typo: s/existed/existing/
> +  addr_from = canonize_address(addr_from)
> +  local required_diff = size + #mcode
> +
> +  -- Marker to check that traces are not flushed.
> +  local maxtraceno = getlast_traceno()
> +  local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
> +
> +  local _, last_addr = jutil.tracemc(maxtraceno)
> +  last_addr = canonize_address(last_addr)
> +
> +  -- Addresses of traces may increase or decrease depending on OS,
> +  -- so use absolute diff.
> +  while math.abs(last_addr - addr_from) > required_diff do
> +    last_i = last_i + 1
> +    -- This is a quite heavy workload (though it doesn't look like
Typo: s/This is a quite/This is quite a/
> +    -- one at first). Each load from a table is type guarded. Each
> +    -- table lookup (for both stores and loads) is guarded for
> +    -- table <hmask> value and presence of the metatable. The code
Typo: s/and presence/and the presence/
> +    -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
Typo: s/results to/results in/
> +    -- practice.
> +    local fname = ('fillmcode[%d]'):format(last_i)
> +    recfuncs[last_i] = assert(loadstring(([[
> +      return function(src)
> +        local p = %d
Nit: Poor naming, a more descriptive name is preferred.
> +        local tmp = { }
> +        local dst = { }
> +        -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
Typo: s/as stop/as a stop/
> +        -- in root trace) errors due to hotcount collisions.
> +        for i = 1, 5 do
> +          tmp.a = src.a * p   tmp.j = src.j * p   tmp.s = src.s * p
> +          tmp.b = src.b * p   tmp.k = src.k * p   tmp.t = src.t * p
> +          tmp.c = src.c * p   tmp.l = src.l * p   tmp.u = src.u * p
> +          tmp.d = src.d * p   tmp.m = src.m * p   tmp.v = src.v * p
> +          tmp.e = src.e * p   tmp.n = src.n * p   tmp.w = src.w * p
> +          tmp.f = src.f * p   tmp.o = src.o * p   tmp.x = src.x * p
> +          tmp.g = src.g * p   tmp.p = src.p * p   tmp.y = src.y * p
> +          tmp.h = src.h * p   tmp.q = src.q * p   tmp.z = src.z * p
> +          tmp.i = src.i * p   tmp.r = src.r * p
> +
> +          dst.a = tmp.z + p   dst.j = tmp.q + p   dst.s = tmp.h + p
> +          dst.b = tmp.y + p   dst.k = tmp.p + p   dst.t = tmp.g + p
> +          dst.c = tmp.x + p   dst.l = tmp.o + p   dst.u = tmp.f + p
> +          dst.d = tmp.w + p   dst.m = tmp.n + p   dst.v = tmp.e + p
> +          dst.e = tmp.v + p   dst.n = tmp.m + p   dst.w = tmp.d + p
> +          dst.f = tmp.u + p   dst.o = tmp.l + p   dst.x = tmp.c + p
> +          dst.g = tmp.t + p   dst.p = tmp.k + p   dst.y = tmp.b + p
> +          dst.h = tmp.s + p   dst.q = tmp.j + p   dst.z = tmp.a + p
> +          dst.i = tmp.r + p   dst.r = tmp.i + p
> +        end
> +        dst.tmp = tmp
> +        return dst
> +      end
> +    ]]):format(last_i), fname), ('Syntax error in function %s'):format(fname))()
> +    -- XXX: FNEW is NYI, hence loop recording fails at this point.
> +    -- The recording is aborted on purpose: the whole loop
> +    -- recording might lead to a very long trace error (via return
> +    -- to a lower frame), or a trace with lots of side traces. We
> +    -- need neither of this, but just a bunch of traces filling
> +    -- the available mcode area.
> +    local function tnew(p)
Nit: same issue with naming.
> +      return {
> +        a = p + 1, f = p + 6,  k = p + 11, p = p + 16, u = p + 21, z = p + 26,
> +        b = p + 2, g = p + 7,  l = p + 12, q = p + 17, v = p + 22,
> +        c = p + 3, h = p + 8,  m = p + 13, r = p + 18, w = p + 23,
> +        d = p + 4, i = p + 9,  n = p + 14, s = p + 19, x = p + 24,
> +        e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
> +      }
> +    end
> +    -- Each function call produces a trace (see the template for
> +    -- the function definition above).
> +    recfuncs[last_i](tnew(last_i))
> +    local last_traceno = getlast_traceno()
> +    if last_traceno < maxtraceno then
> +      error(FLUSH_ERR)
> +    end
> +
> +    -- Calculate the address of the last trace start.
> +    maxtraceno = last_traceno
> +    _, last_addr = jutil.tracemc(last_traceno)
> +    if not last_addr then
> +      error(FLUSH_ERR)
> +    end
> +    last_addr = canonize_address(last_addr)
> +  end
> +end
> +
> +return M
> -- 
> 2.41.0
Best regards,
Maxim Kokryashkin
> 

* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:13   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:05     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 15:02   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:13 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.

On Wed, Aug 09, 2023 at 06:35:52PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by Djordje Kovacevic and Stefan Pejic.
> 
> (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
> 
> `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
Typo: s/to incorrect/to an incorrect/
> check in `asm_sparejump_setup()`, so mcode bottom is not updated.
Typo: s/so mcode/so the mcode/
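Side note: the size mismatch is easy to see in isolation. The sketch below
copies the `MCLink` layout from this patch; `PAGESIZE` and the helper name
are hypothetical stand-ins, not the LuaJIT definitions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MCode;  /* Fixed-width instruction, as on MIPS. */

/* Linked list of MCode areas. */
typedef struct MCLink {
  MCode *next;  /* Next area. */
  size_t size;  /* Size of current area. */
} MCLink;

#define PAGESIZE 4096u  /* Hypothetical page size for this sketch. */

/* True if `p` points right behind an MCLink header at a page start. */
static int is_first_after_link(uintptr_t p)
{
  return (p & (PAGESIZE - 1)) == sizeof(MCLink);
}
```

With 8-byte pointers and an 8-byte `size_t` the header takes 16 bytes, so a
comparison against the literal 8 never matches on such a target.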
> 
> This patch fixes check of the MCLink offset from the mcbot.
Typo: s/fixes check/fixes the check/
> Nevertheless, the emitting of spare jump slots is still incorrect, so
> the introduced test still fails due to incorrect iteration through the
Typo: s/due to/due to the/
> sparce table (the last slot is out of mcode range).
> 
> This should be fixed via backporting of the commit
> dbb78630169a8106b355a5be8af627e98c362f1e ("MIPS: Fix handling of
> long-range spare jumps."). But it triggers the new unconditional
> assert, that is added in this patch, mentioning that sizemcode is too
> bit. So some workaround should be found, when this test will be enabled
Typo: s/bit/big/
Typo: s/will be/is/
> for MIPS.
> 
> Since test also validates the behaviour of long-range jumps to side
> traces for arm64 and x64, and we have no testing for MIPS64 (yet), we
> can leave it as is without a skipcond.
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_asm_mips.h                             |  9 +--
>  src/lj_jit.h                                  |  6 ++
>  src/lj_mcode.c                                |  6 --
>  ...x-mips64-spare-side-exit-patching.test.lua | 65 +++++++++++++++++++
>  4 files changed, 76 insertions(+), 10 deletions(-)
>  create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> 
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 03215821..0e60fc07 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -65,10 +65,9 @@ static Reg ra_alloc2(ASMState *as, IRIns *ir, RegSet allow)
>  static void asm_sparejump_setup(ASMState *as)
>  {
>    MCode *mxp = as->mcbot;
> -  /* Assumes sizeof(MCLink) == 8. */
> -  if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == 8) {
> +  if (((uintptr_t)mxp & (LJ_PAGESIZE-1)) == sizeof(MCLink)) {
>      lua_assert(MIPSI_NOP == 0);
> -    memset(mxp+2, 0, MIPS_SPAREJUMP*8);
> +    memset(mxp, 0, MIPS_SPAREJUMP*2*sizeof(MCode));
>      mxp += MIPS_SPAREJUMP*2;
>      lua_assert(mxp < as->mctop);
>      lj_mcode_sync(as->mcbot, mxp);
> @@ -2486,7 +2485,9 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
>  	  if (!cstart) cstart = p-1;
>  	} else {  /* Branch out of range. Use spare jump slot in mcarea. */
>  	  int i;
> -	  for (i = 2; i < 2+MIPS_SPAREJUMP*2; i += 2) {
> +	  for (i = (int)(sizeof(MCLink)/sizeof(MCode));
> +	       i < (int)(sizeof(MCLink)/sizeof(MCode)+MIPS_SPAREJUMP*2);
> +	       i += 2) {
>  	    if (mcarea[i] == tjump) {
>  	      delta = mcarea+i - p;
>  	      goto patchbranch;
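Side note: a worked example of the new loop bounds, assuming 4 spare slots
(the default mentioned in the test comment) and 4-byte instructions. The
names below are hypothetical.

```c
#include <stddef.h>

#define SPAREJUMP 4  /* Assumed number of spare slots. */

/* Index (in instruction units) of the first spare slot behind the header. */
static int spare_first(size_t link_size, size_t ins_size)
{
  return (int)(link_size / ins_size);
}

/* One-past-the-end index; each slot takes two instructions. */
static int spare_end(size_t link_size, size_t ins_size)
{
  return spare_first(link_size, ins_size) + SPAREJUMP * 2;
}
```

For an 8-byte header this gives the old range [2, 10), and for a 16-byte
header it gives [4, 12).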
> diff --git a/src/lj_jit.h b/src/lj_jit.h
> index f2ad3c6e..cc8efd20 100644
> --- a/src/lj_jit.h
> +++ b/src/lj_jit.h
> @@ -158,6 +158,12 @@ typedef uint8_t MCode;
>  typedef uint32_t MCode;
>  #endif
>  
> +/* Linked list of MCode areas. */
> +typedef struct MCLink {
> +  MCode *next;		/* Next area. */
> +  size_t size;		/* Size of current area. */
> +} MCLink;
> +
>  /* Stack snapshot header. */
>  typedef struct SnapShot {
>    uint32_t mapofs;	/* Offset into snapshot map. */
> diff --git a/src/lj_mcode.c b/src/lj_mcode.c
> index 7184d3b4..c6361018 100644
> --- a/src/lj_mcode.c
> +++ b/src/lj_mcode.c
> @@ -272,12 +272,6 @@ static void *mcode_alloc(jit_State *J, size_t sz)
>  
>  /* -- MCode area management ----------------------------------------------- */
>  
> -/* Linked list of MCode areas. */
> -typedef struct MCLink {
> -  MCode *next;		/* Next area. */
> -  size_t size;		/* Size of current area. */
> -} MCLink;
> -
>  /* Allocate a new MCode area. */
>  static void mcode_allocarea(jit_State *J)
>  {
> diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> new file mode 100644
> index 00000000..fdc826cb
> --- /dev/null
> +++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> @@ -0,0 +1,65 @@
> +local tap = require('tap')
> +local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
> +  -- Need to fix the MIPS behaviour first.
Typo: s/Need to/We need to/
> +  ['Disabled for MIPS architectures'] = jit.arch:match('mips'),
> +})
> +
> +local generators = require('utils').jit.generators
> +local frontend = require('utils').frontend
> +
> +test:plan(1)
> +
> +-- Make compiler work hard.
> +jit.opt.start(
> +  -- No optimizations at all to produce more mcode.
> +  0,
> +  -- Try to compile all compiled paths as early as JIT can.
> +  'hotloop=1',
> +  'hotexit=1',
> +  -- Allow to use 2000 traces to avoid flushes.
Typo: s/to use/compilation of up to/
> +  'maxtrace=2000',
> +  -- Allow to compile 8Mb of mcode to be sure the issue occurs.
Typo: s/to compile/compilation of up to/
> +  'maxmcode=8192',
> +  -- Use big mcode area for traces to avoid using different
Typo: s/using/usage of/
> +  -- spare slots.
> +  'sizemcode=256'
> +)
> +
> +local MAX_SPARE_SLOT = 4
Nit: a reference to the `MIPS_SPAREJUMP` definition in <src/lj_asm_mips.h>
would be nice to have.

> +local function parent(marker)
> +  -- Use several side exit to fill spare exit space (default is
Typo: s/side exit/side exits/
> +  -- 4 slots, each slot has 2 instructions -- jump and nop).
> +  -- luacheck: ignore
> +  if marker > MAX_SPARE_SLOT then end
> +  if marker > 3 then end
> +  if marker > 2 then end
> +  if marker > 1 then end
> +  if marker > 0 then end
> +  -- XXX: use `fmod()` to avoid leaving the function and use
> +  -- stitching here.
> +  return math.fmod(1, 1)
> +end
> +
> +-- Compile parent trace first.
> +parent(0)
> +parent(0)
> +
> +local parent_traceno = frontend.gettraceno(parent)
> +local last_traceno = parent_traceno
> +
> +-- Now generate some mcode to forcify long jump with a spare slot.
> +-- Each iteration provide different addresses and uses a different
Typo: s/provide/provides/
> +-- spare slot. After it compile and execute new side trace.
Typo: s/After it compile and execute/After that, compiles and executes a/
> +for i = 1, MAX_SPARE_SLOT + 1 do
> +  generators.fillmcode(last_traceno, 1024 * 1024)
> +  parent(i)
> +  parent(i)
> +  parent(i)
> +  last_traceno = misc.getmetrics().jit_trace_num
> +end
> +
> +test:ok(true, 'all traces executed correctly')
> +
> +test:done(true)
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:27   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:10     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 16:07   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:27 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
LGTM, except for a few comments below.

On Wed, Aug 09, 2023 at 06:35:53PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
> 
> (cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)
> 
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in JIT compiler backend for MIPS64.
Typo: s/in JIT/in the JIT/
> This includes:
> * `vm_tointg()` helper is added in <src/vm_mips64.dasm> to convert FP
>   number to integer with a check for the soft-float support (called from
>   JIT).
> * `sfmin/max()` helpers are added in <src/vm_mips64.dasm> for min/max
>   operations with a check for the soft-float support (called from JIT).
Typo: s/the soft-float/soft-float/
> * `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS instead
>   `LJ_SOFTFP`.
> * All fp-depending paths are instrumented with `LJ_SOFTFP` or
Typo: s/fp-depending/fp-dependent/
>   `LJ_SOFTFP32` macro.
Typo: s/macro/macros/
> * The corresponding function calls in <src/lj_ircall.h> are marked as
>   `XA_FP32`, `XA2_FP32`, i.e. as required extra arguments on the stack
>   for soft-FP on 32-bit MIPS.

Shouldn't we also mention the `asm_tobit` function?
> 
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> 
> Sergey Kaplun:
> * added the description for the feature
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_arch.h      |   4 +-
>  src/lj_asm.c       |   8 +-
>  src/lj_asm_mips.h  | 217 +++++++++++++++++++++++++++++++++++++--------
>  src/lj_crecord.c   |   4 +-
>  src/lj_emit_mips.h |   2 +
>  src/lj_ffrecord.c  |   2 +-
>  src/lj_ircall.h    |  43 ++++++---
>  src/lj_iropt.h     |   2 +-
>  src/lj_jit.h       |   4 +-
>  src/lj_obj.h       |   3 +
>  src/lj_opt_split.c |   2 +-
>  src/lj_snap.c      |  21 +++--
>  src/vm_mips64.dasc |  49 ++++++++++
>  13 files changed, 286 insertions(+), 75 deletions(-)
> 
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 5276ae56..c39526ea 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -349,9 +349,6 @@
>  #define LJ_ARCH_BITS		32
>  #define LJ_TARGET_MIPS32	1
>  #else
> -#if LJ_ABI_SOFTFP || !LJ_ARCH_HASFPU
> -#define LJ_ARCH_NOJIT		1	/* NYI */
> -#endif
>  #define LJ_ARCH_BITS		64
>  #define LJ_TARGET_MIPS64	1
>  #define LJ_TARGET_GC64		1
> @@ -528,6 +525,7 @@
>  #define LJ_ABI_SOFTFP		0
>  #endif
>  #define LJ_SOFTFP		(!LJ_ARCH_HASFPU)
> +#define LJ_SOFTFP32		(LJ_SOFTFP && LJ_32)
>  
>  #if LJ_ARCH_ENDIAN == LUAJIT_BE
>  #define LJ_LE			0
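Side note: the relation between the two macros, written as plain functions
for illustration (hypothetical names; the real ones are preprocessor
macros).

```c
/* Soft-float is in effect whenever there is no FPU. */
static int softfp(int has_fpu)
{
  return !has_fpu;
}

/* The 32-bit flavour additionally requires a 32-bit target. */
static int softfp32(int has_fpu, int is_32bit)
{
  return softfp(has_fpu) && is_32bit;
}
```

So a 64-bit target without an FPU has soft-float enabled while the 32-bit
flavour stays off, which is what the `#if !LJ_SOFTFP32` guards in the rest of
the patch rely on.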
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 0bfa44ed..15de7e33 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -341,7 +341,7 @@ static Reg ra_rematk(ASMState *as, IRRef ref)
>    ra_modified(as, r);
>    ir->r = RID_INIT;  /* Do not keep any hint. */
>    RA_DBGX((as, "remat     $i $r", ir, r));
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>    if (ir->o == IR_KNUM) {
>      emit_loadk64(as, r, ir);
>    } else
> @@ -1356,7 +1356,7 @@ static void asm_call(ASMState *as, IRIns *ir)
>    asm_gencall(as, ci, args);
>  }
>  
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>  static void asm_fppow(ASMState *as, IRIns *ir, IRRef lref, IRRef rref)
>  {
>    const CCallInfo *ci = &lj_ir_callinfo[IRCALL_pow];
> @@ -1703,10 +1703,10 @@ static void asm_ir(ASMState *as, IRIns *ir)
>    case IR_MUL: asm_mul(as, ir); break;
>    case IR_MOD: asm_mod(as, ir); break;
>    case IR_NEG: asm_neg(as, ir); break;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
>    case IR_DIV: case IR_POW: case IR_ABS:
>    case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
> -    lua_assert(0);  /* Unused for LJ_SOFTFP. */
> +    lua_assert(0);  /* Unused for LJ_SOFTFP32. */
>      break;
>  #else
>    case IR_DIV: asm_div(as, ir); break;
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 0e60fc07..a26a82cd 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -290,7 +290,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>  	  {
>  	    ra_leftov(as, gpr, ref);
>  	    gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
>  	    fpr++;
>  #endif
>  	  }
> @@ -301,7 +301,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>  	  emit_spstore(as, ir, r, ofs);
>  	  ofs += irt_isnum(ir->t) ? 8 : 4;
>  #else
> -	  emit_spstore(as, ir, r, ofs + ((LJ_BE && (LJ_SOFTFP || r < RID_MAX_GPR) && !irt_is64(ir->t)) ? 4 : 0));
> +	  emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isfp(ir->t) && !irt_is64(ir->t)) ? 4 : 0));
>  	  ofs += 8;
>  #endif
>  	}
> @@ -312,7 +312,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>  #endif
>        if (gpr <= REGARG_LASTGPR) {
>  	gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
>  	fpr++;
>  #endif
>        } else {
> @@ -461,12 +461,36 @@ static void asm_tobit(ASMState *as, IRIns *ir)
>    emit_tg(as, MIPSI_MFC1, dest, tmp);
>    emit_fgh(as, MIPSI_ADD_D, tmp, left, right);
>  }
> +#elif LJ_64  /* && LJ_SOFTFP */
> +static void asm_tointg(ASMState *as, IRIns *ir, Reg r)
> +{
> +  /* The modified regs must match with the *.dasc implementation. */
> +  RegSet drop = RID2RSET(REGARG_FIRSTGPR)|RID2RSET(RID_RET)|RID2RSET(RID_RET+1)|
> +		RID2RSET(RID_R1)|RID2RSET(RID_R12);
> +  if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
> +  ra_evictset(as, drop);
> +  /* Return values are in RID_RET (converted value) and RID_RET+1 (status). */
> +  ra_destreg(as, ir, RID_RET);
> +  asm_guard(as, MIPSI_BNE, RID_RET+1, RID_ZERO);
> +  emit_call(as, (void *)lj_ir_callinfo[IRCALL_lj_vm_tointg].func, 0);
> +  if (r == RID_NONE)
> +    ra_leftov(as, REGARG_FIRSTGPR, ir->op1);
> +  else if (r != REGARG_FIRSTGPR)
> +    emit_move(as, REGARG_FIRSTGPR, r);
> +}
> +
> +static void asm_tobit(ASMState *as, IRIns *ir)
> +{
> +  Reg dest = ra_dest(as, ir, RSET_GPR);
> +  emit_dta(as, MIPSI_SLL, dest, dest, 0);
> +  asm_callid(as, ir, IRCALL_lj_vm_tobit);
> +}
>  #endif
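Side note: as far as I understand, `sll` with a zero shift amount is used in
the soft-float `asm_tobit()` to sign-extend the low 32 bits of the call
result on MIPS64. A C sketch of that operation (hypothetical helper name):

```c
#include <stdint.h>

/* Keep the low 32 bits of `x` and sign-extend them to 64 bits. */
static int64_t sext32(uint64_t x)
{
  /* The narrowing cast wraps on two's complement targets. */
  return (int64_t)(int32_t)(uint32_t)x;
}
```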
>  
>  static void asm_conv(ASMState *as, IRIns *ir)
>  {
>    IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>    int stfp = (st == IRT_NUM || st == IRT_FLOAT);
>  #endif
>  #if LJ_64
> @@ -477,12 +501,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
>    lua_assert(!(irt_isint64(ir->t) ||
>  	       (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
>  #endif
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
>    /* FP conversions are handled by SPLIT. */
>    lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
>    /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
>  #else
>    lua_assert(irt_type(ir->t) != st);
> +#if !LJ_SOFTFP
>    if (irt_isfp(ir->t)) {
>      Reg dest = ra_dest(as, ir, RSET_FPR);
>      if (stfp) {  /* FP to FP conversion. */
> @@ -608,6 +633,42 @@ static void asm_conv(ASMState *as, IRIns *ir)
>        }
>      }
>    } else
> +#else
> +  if (irt_isfp(ir->t)) {
> +#if LJ_64 && LJ_HASFFI
> +    if (stfp) {  /* FP to FP conversion. */
> +      asm_callid(as, ir, irt_isnum(ir->t) ? IRCALL_softfp_f2d :
> +					    IRCALL_softfp_d2f);
> +    } else {  /* Integer to FP conversion. */
> +      IRCallID cid = ((IRT_IS64 >> st) & 1) ?
> +	(irt_isnum(ir->t) ?
> +	 (st == IRT_I64 ? IRCALL_fp64_l2d : IRCALL_fp64_ul2d) :
> +	 (st == IRT_I64 ? IRCALL_fp64_l2f : IRCALL_fp64_ul2f)) :
> +	(irt_isnum(ir->t) ?
> +	 (st == IRT_INT ? IRCALL_softfp_i2d : IRCALL_softfp_ui2d) :
> +	 (st == IRT_INT ? IRCALL_softfp_i2f : IRCALL_softfp_ui2f));
> +      asm_callid(as, ir, cid);
> +    }
> +#else
> +    asm_callid(as, ir, IRCALL_softfp_i2d);
> +#endif
> +  } else if (stfp) {  /* FP to integer conversion. */
> +    if (irt_isguard(ir->t)) {
> +      /* Checked conversions are only supported from number to int. */
> +      lua_assert(irt_isint(ir->t) && st == IRT_NUM);
> +      asm_tointg(as, ir, RID_NONE);
> +    } else {
> +      IRCallID cid = irt_is64(ir->t) ?
> +	((st == IRT_NUM) ?
> +	 (irt_isi64(ir->t) ? IRCALL_fp64_d2l : IRCALL_fp64_d2ul) :
> +	 (irt_isi64(ir->t) ? IRCALL_fp64_f2l : IRCALL_fp64_f2ul)) :
> +	((st == IRT_NUM) ?
> +	 (irt_isint(ir->t) ? IRCALL_softfp_d2i : IRCALL_softfp_d2ui) :
> +	 (irt_isint(ir->t) ? IRCALL_softfp_f2i : IRCALL_softfp_f2ui));
> +      asm_callid(as, ir, cid);
> +    }
> +  } else
> +#endif
>  #endif
>    {
>      Reg dest = ra_dest(as, ir, RSET_GPR);
> @@ -665,7 +726,7 @@ static void asm_strto(ASMState *as, IRIns *ir)
>    const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
>    IRRef args[2];
>    int32_t ofs = 0;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
>    ra_evictset(as, RSET_SCRATCH);
>    if (ra_used(ir)) {
>      if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> @@ -806,7 +867,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>    MCLabel l_end, l_loop, l_next;
>  
>    rset_clear(allow, tab);
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
>    if (!isk) {
>      key = ra_alloc1(as, refkey, allow);
>      rset_clear(allow, key);
> @@ -826,7 +887,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>      }
>    }
>  #else
> -  if (irt_isnum(kt)) {
> +  if (!LJ_SOFTFP && irt_isnum(kt)) {
>      key = ra_alloc1(as, refkey, RSET_FPR);
>      tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
>    } else if (!irt_ispri(kt)) {
> @@ -882,6 +943,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>      emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
>      emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
>      emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> +  } else if (LJ_SOFTFP && irt_isnum(kt)) {
> +    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> +    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>    } else if (irt_isaddr(kt)) {
>      Reg refk = tmp2;
>      if (isk) {
> @@ -960,7 +1024,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>        emit_dta(as, MIPSI_ROTR, dest, tmp1, (-HASH_ROT1)&31);
>        if (irt_isnum(kt)) {
>  	emit_dst(as, MIPSI_ADDU, tmp1, tmp1, tmp1);
> -	emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 0);
> +	emit_dta(as, MIPSI_DSRA32, tmp1, LJ_SOFTFP ? key : tmp1, 0);
>  	emit_dta(as, MIPSI_SLL, tmp2, LJ_SOFTFP ? key : tmp1, 0);
>  #if !LJ_SOFTFP
>  	emit_tg(as, MIPSI_DMFC1, tmp1, key);
> @@ -1123,7 +1187,7 @@ static MIPSIns asm_fxloadins(IRIns *ir)
>    case IRT_U8: return MIPSI_LBU;
>    case IRT_I16: return MIPSI_LH;
>    case IRT_U16: return MIPSI_LHU;
> -  case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_LDC1;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
>    case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
>    default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
>    }
> @@ -1134,7 +1198,7 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
>    switch (irt_type(ir->t)) {
>    case IRT_I8: case IRT_U8: return MIPSI_SB;
>    case IRT_I16: case IRT_U16: return MIPSI_SH;
> -  case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_SDC1;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
>    case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
>    default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
>    }
> @@ -1199,7 +1263,7 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
>  
>  static void asm_ahuvload(ASMState *as, IRIns *ir)
>  {
> -  int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> +  int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
>    Reg dest = RID_NONE, type = RID_TMP, idx;
>    RegSet allow = RSET_GPR;
>    int32_t ofs = 0;
> @@ -1212,7 +1276,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
>      }
>    }
>    if (ra_used(ir)) {
> -    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> +    lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
>  	       irt_isint(ir->t) || irt_isaddr(ir->t));
>      dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>      rset_clear(allow, dest);
> @@ -1261,10 +1325,10 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
>    int32_t ofs = 0;
>    if (ir->r == RID_SINK)
>      return;
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> -    src = ra_alloc1(as, ir->op2, RSET_FPR);
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +    src = ra_alloc1(as, ir->op2, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
>      idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> -    emit_hsi(as, MIPSI_SDC1, src, idx, ofs);
> +    emit_hsi(as, LJ_SOFTFP ? MIPSI_SD : MIPSI_SDC1, src, idx, ofs);
>    } else {
>  #if LJ_32
>      if (!irt_ispri(ir->t)) {
> @@ -1312,7 +1376,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
>    IRType1 t = ir->t;
>  #if LJ_32
>    int32_t ofs = 8*((int32_t)ir->op1-1) + ((ir->op2 & IRSLOAD_FRAME) ? 4 : 0);
> -  int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> +  int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
>    if (hiop)
>      t.irt = IRT_NUM;
>  #else
> @@ -1320,7 +1384,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
>  #endif
>    lua_assert(!(ir->op2 & IRSLOAD_PARENT));  /* Handled by asm_head_side(). */
>    lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
>    lua_assert(!(ir->op2 & IRSLOAD_CONVERT));  /* Handled by LJ_SOFTFP SPLIT. */
>    if (hiop && ra_used(ir+1)) {
>      type = ra_dest(as, ir+1, allow);
> @@ -1328,29 +1392,44 @@ static void asm_sload(ASMState *as, IRIns *ir)
>    }
>  #else
>    if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
> -    dest = ra_scratch(as, RSET_FPR);
> +    dest = ra_scratch(as, LJ_SOFTFP ? allow : RSET_FPR);
>      asm_tointg(as, ir, dest);
>      t.irt = IRT_NUM;  /* Continue with a regular number type check. */
>    } else
>  #endif
>    if (ra_used(ir)) {
> -    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> +    lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
>  	       irt_isint(ir->t) || irt_isaddr(ir->t));
>      dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>      rset_clear(allow, dest);
>      base = ra_alloc1(as, REF_BASE, allow);
>      rset_clear(allow, base);
> -    if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
> +    if (!LJ_SOFTFP32 && (ir->op2 & IRSLOAD_CONVERT)) {
>        if (irt_isint(t)) {
> -	Reg tmp = ra_scratch(as, RSET_FPR);
> +	Reg tmp = ra_scratch(as, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
> +#if LJ_SOFTFP
> +	ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> +	ra_destreg(as, ir, RID_RET);
> +	emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_d2i].func, 0);
> +	if (tmp != REGARG_FIRSTGPR)
> +	  emit_move(as, REGARG_FIRSTGPR, tmp);
> +#else
>  	emit_tg(as, MIPSI_MFC1, dest, tmp);
>  	emit_fg(as, MIPSI_TRUNC_W_D, tmp, tmp);
> +#endif
>  	dest = tmp;
>  	t.irt = IRT_NUM;  /* Check for original type. */
>        } else {
>  	Reg tmp = ra_scratch(as, RSET_GPR);
> +#if LJ_SOFTFP
> +	ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> +	ra_destreg(as, ir, RID_RET);
> +	emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_i2d].func, 0);
> +	emit_dta(as, MIPSI_SLL, REGARG_FIRSTGPR, tmp, 0);
> +#else
>  	emit_fg(as, MIPSI_CVT_D_W, dest, dest);
>  	emit_tg(as, MIPSI_MTC1, tmp, dest);
> +#endif
>  	dest = tmp;
>  	t.irt = IRT_INT;  /* Check for original type. */
>        }
> @@ -1399,7 +1478,7 @@ dotypecheck:
>        if (irt_isnum(t)) {
>  	asm_guard(as, MIPSI_BEQ, RID_TMP, RID_ZERO);
>  	emit_tsi(as, MIPSI_SLTIU, RID_TMP, RID_TMP, (int32_t)LJ_TISNUM);
> -	if (ra_hasreg(dest))
> +	if (!LJ_SOFTFP && ra_hasreg(dest))
>  	  emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
>        } else {
>  	asm_guard(as, MIPSI_BNE, RID_TMP,
> @@ -1409,7 +1488,7 @@ dotypecheck:
>      }
>      emit_tsi(as, MIPSI_LD, type, base, ofs);
>    } else if (ra_hasreg(dest)) {
> -    if (irt_isnum(t))
> +    if (!LJ_SOFTFP && irt_isnum(t))
>        emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
>      else
>        emit_tsi(as, irt_isint(t) ? MIPSI_LW : MIPSI_LD, dest, base,
> @@ -1554,26 +1633,40 @@ static void asm_fpunary(ASMState *as, IRIns *ir, MIPSIns mi)
>    Reg left = ra_hintalloc(as, ir->op1, dest, RSET_FPR);
>    emit_fg(as, mi, dest, left);
>  }
> +#endif
>  
> +#if !LJ_SOFTFP32
>  static void asm_fpmath(ASMState *as, IRIns *ir)
>  {
>    if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir))
>      return;
> +#if !LJ_SOFTFP
>    if (ir->op2 <= IRFPM_TRUNC)
>      asm_callround(as, ir, IRCALL_lj_vm_floor + ir->op2);
>    else if (ir->op2 == IRFPM_SQRT)
>      asm_fpunary(as, ir, MIPSI_SQRT_D);
>    else
> +#endif
>      asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
>  }
>  #endif
>  
> +#if !LJ_SOFTFP
> +#define asm_fpadd(as, ir)	asm_fparith(as, ir, MIPSI_ADD_D)
> +#define asm_fpsub(as, ir)	asm_fparith(as, ir, MIPSI_SUB_D)
> +#define asm_fpmul(as, ir)	asm_fparith(as, ir, MIPSI_MUL_D)
> +#elif LJ_64  /* && LJ_SOFTFP */
> +#define asm_fpadd(as, ir)	asm_callid(as, ir, IRCALL_softfp_add)
> +#define asm_fpsub(as, ir)	asm_callid(as, ir, IRCALL_softfp_sub)
> +#define asm_fpmul(as, ir)	asm_callid(as, ir, IRCALL_softfp_mul)
> +#endif
> +
>  static void asm_add(ASMState *as, IRIns *ir)
>  {
>    IRType1 t = ir->t;
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>    if (irt_isnum(t)) {
> -    asm_fparith(as, ir, MIPSI_ADD_D);
> +    asm_fpadd(as, ir);
>    } else
>  #endif
>    {
> @@ -1595,9 +1688,9 @@ static void asm_add(ASMState *as, IRIns *ir)
>  
>  static void asm_sub(ASMState *as, IRIns *ir)
>  {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>    if (irt_isnum(ir->t)) {
> -    asm_fparith(as, ir, MIPSI_SUB_D);
> +    asm_fpsub(as, ir);
>    } else
>  #endif
>    {
> @@ -1611,9 +1704,9 @@ static void asm_sub(ASMState *as, IRIns *ir)
>  
>  static void asm_mul(ASMState *as, IRIns *ir)
>  {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>    if (irt_isnum(ir->t)) {
> -    asm_fparith(as, ir, MIPSI_MUL_D);
> +    asm_fpmul(as, ir);
>    } else
>  #endif
>    {
> @@ -1640,7 +1733,7 @@ static void asm_mod(ASMState *as, IRIns *ir)
>      asm_callid(as, ir, IRCALL_lj_vm_modi);
>  }
>  
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>  static void asm_pow(ASMState *as, IRIns *ir)
>  {
>  #if LJ_64 && LJ_HASFFI
> @@ -1660,7 +1753,11 @@ static void asm_div(ASMState *as, IRIns *ir)
>  					  IRCALL_lj_carith_divu64);
>    else
>  #endif
> +#if !LJ_SOFTFP
>      asm_fparith(as, ir, MIPSI_DIV_D);
> +#else
> +  asm_callid(as, ir, IRCALL_softfp_div);
> +#endif
>  }
>  #endif
>  
> @@ -1670,6 +1767,13 @@ static void asm_neg(ASMState *as, IRIns *ir)
>    if (irt_isnum(ir->t)) {
>      asm_fpunary(as, ir, MIPSI_NEG_D);
>    } else
> +#elif LJ_64  /* && LJ_SOFTFP */
> +  if (irt_isnum(ir->t)) {
> +    Reg dest = ra_dest(as, ir, RSET_GPR);
> +    Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> +    emit_dst(as, MIPSI_XOR, dest, left,
> +	    ra_allock(as, 0x8000000000000000ll, rset_exclude(RSET_GPR, dest)));
> +  } else
>  #endif
>    {
>      Reg dest = ra_dest(as, ir, RSET_GPR);
> @@ -1679,7 +1783,17 @@ static void asm_neg(ASMState *as, IRIns *ir)
>    }
>  }
>  
> +#if !LJ_SOFTFP
>  #define asm_abs(as, ir)		asm_fpunary(as, ir, MIPSI_ABS_D)
> +#elif LJ_64   /* && LJ_SOFTFP */
> +static void asm_abs(ASMState *as, IRIns *ir)
> +{
> +  Reg dest = ra_dest(as, ir, RSET_GPR);
> +  Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> +  emit_tsml(as, MIPSI_DEXTM, dest, left, 30, 0);
> +}
> +#endif
> +
>  #define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
>  #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
>  
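Side note on the soft-float asm_neg/asm_abs paths above: with the number held in a GPR, both reduce to a single bit operation on the IEEE 754 pattern. A minimal sketch in portable C, assuming the DEXTM operands (30, 0) select the low 63 bits; the helper names (d2bits, bits2d, sf_neg, sf_abs) are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIGN_BIT 0x8000000000000000ull

/* Reinterpret a double as its 64-bit IEEE 754 bit pattern. */
static uint64_t d2bits(double d)
{
  uint64_t u;
  assert(sizeof(d) == sizeof(u));
  memcpy(&u, &d, sizeof(u));
  return u;
}

/* Reinterpret a 64-bit pattern as a double. */
static double bits2d(uint64_t u)
{
  double d;
  memcpy(&d, &u, sizeof(d));
  return d;
}

/* Negation: flip bit 63, which is what XOR with the sign constant does. */
static double sf_neg(double d)
{
  return bits2d(d2bits(d) ^ SIGN_BIT);
}

/* Absolute value: keep bits 0..62 and drop the sign bit. */
static double sf_abs(double d)
{
  return bits2d(d2bits(d) & ~SIGN_BIT);
}
```

Neither operation needs a libgcc call, which is presumably why these two are emitted inline while add/sub/mul/div go through softfp_* helpers.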
> @@ -1924,15 +2038,21 @@ static void asm_bror(ASMState *as, IRIns *ir)
>    }
>  }
>  
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
>  static void asm_sfpmin_max(ASMState *as, IRIns *ir)
>  {
>    CCallInfo ci = lj_ir_callinfo[(IROp)ir->o == IR_MIN ? IRCALL_lj_vm_sfmin : IRCALL_lj_vm_sfmax];
> +#if LJ_64
> +  IRRef args[2];
> +  args[0] = ir->op1;
> +  args[1] = ir->op2;
> +#else
>    IRRef args[4];
>    args[0^LJ_BE] = ir->op1;
>    args[1^LJ_BE] = (ir+1)->op1;
>    args[2^LJ_BE] = ir->op2;
>    args[3^LJ_BE] = (ir+1)->op2;
> +#endif
>    asm_setupresult(as, ir, &ci);
>    emit_call(as, (void *)ci.func, 0);
>    ci.func = NULL;
> @@ -1942,7 +2062,10 @@ static void asm_sfpmin_max(ASMState *as, IRIns *ir)
>  
>  static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>  {
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +    asm_sfpmin_max(as, ir);
> +#else
>      Reg dest = ra_dest(as, ir, RSET_FPR);
>      Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>      right = (left >> 8); left &= 255;
> @@ -1953,6 +2076,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>        if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
>      }
>      emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
> +#endif
>    } else {
>      Reg dest = ra_dest(as, ir, RSET_GPR);
>      Reg right, left = ra_alloc2(as, ir, RSET_GPR);
> @@ -1973,18 +2097,24 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>  
>  /* -- Comparisons --------------------------------------------------------- */
>  
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
>  /* SFP comparisons. */
>  static void asm_sfpcomp(ASMState *as, IRIns *ir)
>  {
>    const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
>    RegSet drop = RSET_SCRATCH;
>    Reg r;
> +#if LJ_64
> +  IRRef args[2];
> +  args[0] = ir->op1;
> +  args[1] = ir->op2;
> +#else
>    IRRef args[4];
>    args[LJ_LE ? 0 : 1] = ir->op1; args[LJ_LE ? 1 : 0] = (ir+1)->op1;
>    args[LJ_LE ? 2 : 3] = ir->op2; args[LJ_LE ? 3 : 2] = (ir+1)->op2;
> +#endif
>  
> -  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> +  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+(LJ_64?1:3); r++) {
>      if (!rset_test(as->freeset, r) &&
>  	regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
>        rset_clear(drop, r);
> @@ -2038,11 +2168,15 @@ static void asm_comp(ASMState *as, IRIns *ir)
>  {
>    /* ORDER IR: LT GE LE GT  ULT UGE ULE UGT. */
>    IROp op = ir->o;
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +    asm_sfpcomp(as, ir);
> +#else
>      Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>      right = (left >> 8); left &= 255;
>      asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
>      emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
> +#endif
>    } else {
>      Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
>      if (op == IR_ABC) op = IR_UGT;
> @@ -2074,9 +2208,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
>    Reg right, left = ra_alloc2(as, ir, (!LJ_SOFTFP && irt_isnum(ir->t)) ?
>  				       RSET_FPR : RSET_GPR);
>    right = (left >> 8); left &= 255;
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +    asm_sfpcomp(as, ir);
> +#else
>      asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
>      emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
> +#endif
>    } else {
>      asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
>    }
> @@ -2269,7 +2407,7 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>      if ((sn & SNAP_NORESTORE))
>        continue;
>      if (irt_isnum(ir->t)) {
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
>        Reg tmp;
>        RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
>        lua_assert(irref_isk(ref));  /* LJ_SOFTFP: must be a number constant. */
> @@ -2278,6 +2416,9 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>        if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
>        tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
>        emit_tsi(as, MIPSI_SW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#elif LJ_SOFTFP  /* && LJ_64 */
> +      Reg src = ra_alloc1(as, ref, rset_exclude(RSET_GPR, RID_BASE));
> +      emit_tsi(as, MIPSI_SD, src, RID_BASE, ofs);
>  #else
>        Reg src = ra_alloc1(as, ref, RSET_FPR);
>        emit_hsi(as, MIPSI_SDC1, src, RID_BASE, ofs);
> diff --git a/src/lj_crecord.c b/src/lj_crecord.c
> index ffe995f4..804cdbf4 100644
> --- a/src/lj_crecord.c
> +++ b/src/lj_crecord.c
> @@ -212,7 +212,7 @@ static void crec_copy_emit(jit_State *J, CRecMemList *ml, MSize mlp,
>      ml[i].trval = emitir(IRT(IR_XLOAD, ml[i].tp), trsptr, 0);
>      ml[i].trofs = trofs;
>      i++;
> -    rwin += (LJ_SOFTFP && ml[i].tp == IRT_NUM) ? 2 : 1;
> +    rwin += (LJ_SOFTFP32 && ml[i].tp == IRT_NUM) ? 2 : 1;
>      if (rwin >= CREC_COPY_REGWIN || i >= mlp) {  /* Flush buffered stores. */
>        rwin = 0;
>        for ( ; j < i; j++) {
> @@ -1152,7 +1152,7 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd,
>  	else
>  	  tr = emitconv(tr, IRT_INT, d->size==1 ? IRT_I8 : IRT_I16,IRCONV_SEXT);
>        }
> -    } else if (LJ_SOFTFP && ctype_isfp(d->info) && d->size > 4) {
> +    } else if (LJ_SOFTFP32 && ctype_isfp(d->info) && d->size > 4) {
>        lj_needsplit(J);
>      }
>  #if LJ_TARGET_X86
> diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
> index 8a9ee24d..bb6593ae 100644
> --- a/src/lj_emit_mips.h
> +++ b/src/lj_emit_mips.h
> @@ -12,6 +12,8 @@ static intptr_t get_k64val(IRIns *ir)
>      return (intptr_t)ir_kgc(ir);
>    } else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) {
>      return (intptr_t)ir_kptr(ir);
> +  } else if (LJ_SOFTFP && ir->o == IR_KNUM) {
> +    return (intptr_t)ir_knum(ir)->u64;
>    } else {
>      lua_assert(ir->o == IR_KINT || ir->o == IR_KNULL);
>      return ir->i;  /* Sign-extended. */
> diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
> index 8af9da1d..0746ec64 100644
> --- a/src/lj_ffrecord.c
> +++ b/src/lj_ffrecord.c
> @@ -986,7 +986,7 @@ static void LJ_FASTCALL recff_string_format(jit_State *J, RecordFFData *rd)
>      handle_num:
>        tra = lj_ir_tonum(J, tra);
>        tr = lj_ir_call(J, id, tr, trsf, tra);
> -      if (LJ_SOFTFP) lj_needsplit(J);
> +      if (LJ_SOFTFP32) lj_needsplit(J);
>        break;
>      case STRFMT_STR:
>        if (!tref_isstr(tra)) {
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index aa06b273..c1ac29d1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -52,7 +52,7 @@ typedef struct CCallInfo {
>  #define CCI_XARGS(ci)		(((ci)->flags >> CCI_XARGS_SHIFT) & 3)
>  #define CCI_XA			(1u << CCI_XARGS_SHIFT)
>  
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>  #define CCI_XNARGS(ci)		(CCI_NARGS((ci)) + CCI_XARGS((ci)))
>  #else
>  #define CCI_XNARGS(ci)		CCI_NARGS((ci))
> @@ -79,13 +79,19 @@ typedef struct CCallInfo {
>  #define IRCALLCOND_SOFTFP_FFI(x)	NULL
>  #endif
>  
> -#if LJ_SOFTFP && LJ_TARGET_MIPS32
> +#if LJ_SOFTFP && LJ_TARGET_MIPS
>  #define IRCALLCOND_SOFTFP_MIPS(x)	x
>  #else
>  #define IRCALLCOND_SOFTFP_MIPS(x)	NULL
>  #endif
>  
> -#define LJ_NEED_FP64	(LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS32)
> +#if LJ_SOFTFP && LJ_TARGET_MIPS64
> +#define IRCALLCOND_SOFTFP_MIPS64(x)	x
> +#else
> +#define IRCALLCOND_SOFTFP_MIPS64(x)	NULL
> +#endif
> +
> +#define LJ_NEED_FP64	(LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS)
>  
>  #if LJ_HASFFI && (LJ_SOFTFP || LJ_NEED_FP64)
>  #define IRCALLCOND_FP64_FFI(x)		x
> @@ -113,6 +119,14 @@ typedef struct CCallInfo {
>  #define XA2_FP		0
>  #endif
>  
> +#if LJ_SOFTFP32
> +#define XA_FP32		CCI_XA
> +#define XA2_FP32	(CCI_XA+CCI_XA)
> +#else
> +#define XA_FP32		0
> +#define XA2_FP32	0
> +#endif
> +
>  #if LJ_32
>  #define XA_64		CCI_XA
>  #define XA2_64		(CCI_XA+CCI_XA)
> @@ -185,20 +199,21 @@ typedef struct CCallInfo {
>    _(ANY,	pow,			2,   N, NUM, XA2_FP) \
>    _(ANY,	atan2,			2,   N, NUM, XA2_FP) \
>    _(ANY,	ldexp,			2,   N, NUM, XA_FP) \
> -  _(SOFTFP,	lj_vm_tobit,		2,   N, INT, 0) \
> -  _(SOFTFP,	softfp_add,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_sub,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_mul,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_div,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_cmp,		4,   N, NIL, 0) \
> +  _(SOFTFP,	lj_vm_tobit,		1,   N, INT, XA_FP32) \
> +  _(SOFTFP,	softfp_add,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_sub,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_mul,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_div,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_cmp,		2,   N, NIL, XA2_FP32) \
>    _(SOFTFP,	softfp_i2d,		1,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_d2i,		2,   N, INT, 0) \
> -  _(SOFTFP_MIPS, lj_vm_sfmin,		4,   N, NUM, 0) \
> -  _(SOFTFP_MIPS, lj_vm_sfmax,		4,   N, NUM, 0) \
> +  _(SOFTFP,	softfp_d2i,		1,   N, INT, XA_FP32) \
> +  _(SOFTFP_MIPS, lj_vm_sfmin,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP_MIPS, lj_vm_sfmax,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP_MIPS64, lj_vm_tointg,	1,   N, INT, 0) \
>    _(SOFTFP_FFI,	softfp_ui2d,		1,   N, NUM, 0) \
>    _(SOFTFP_FFI,	softfp_f2d,		1,   N, NUM, 0) \
> -  _(SOFTFP_FFI,	softfp_d2ui,		2,   N, INT, 0) \
> -  _(SOFTFP_FFI,	softfp_d2f,		2,   N, FLOAT, 0) \
> +  _(SOFTFP_FFI,	softfp_d2ui,		1,   N, INT, XA_FP32) \
> +  _(SOFTFP_FFI,	softfp_d2f,		1,   N, FLOAT, XA_FP32) \
>    _(SOFTFP_FFI,	softfp_i2f,		1,   N, FLOAT, 0) \
>    _(SOFTFP_FFI,	softfp_ui2f,		1,   N, FLOAT, 0) \
>    _(SOFTFP_FFI,	softfp_f2i,		1,   N, INT, 0) \
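Side note on the table change above: the base argument count now describes logical arguments, and the XA_FP32/XA2_FP32 flags add the extra 32-bit slots only for 32-bit soft-float. A rough sketch of that accounting; the shift value 14 and all names here are assumptions for illustration, not copied from lj_ircall.h:

```c
#include <assert.h>
#include <stdint.h>

#define XARGS_SHIFT 14u               /* Assumed position of the extra-args field. */
#define XA          (1u << XARGS_SHIFT)

/* Base argument count lives in the low byte of the flags. */
static unsigned base_nargs(uint32_t flags)
{
  return flags & 0xffu;
}

/* Extra argument slots live in a 2-bit field. */
static unsigned extra_nargs(uint32_t flags)
{
  return (flags >> XARGS_SHIFT) & 3u;
}

/* Flags for a helper taking ndouble double arguments. */
static uint32_t call_flags(unsigned ndouble, int softfp32)
{
  assert(ndouble <= 2);
  /* On 32-bit soft-float each double needs one additional 32-bit slot. */
  return ndouble | (softfp32 ? ndouble * XA : 0u);
}

/* Total number of register/stack slots the call occupies. */
static unsigned total_slots(uint32_t flags)
{
  return base_nargs(flags) + extra_nargs(flags);
}
```

With this split, softfp_add still occupies four slots on a 32-bit soft-float target but only two on MIPS64, which matches the args[2] vs. args[4] arrays in asm_sfpmin_max and asm_sfpcomp.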
> diff --git a/src/lj_iropt.h b/src/lj_iropt.h
> index 73aef0ef..a59ba3f4 100644
> --- a/src/lj_iropt.h
> +++ b/src/lj_iropt.h
> @@ -150,7 +150,7 @@ LJ_FUNC IRType lj_opt_narrow_forl(jit_State *J, cTValue *forbase);
>  /* Optimization passes. */
>  LJ_FUNC void lj_opt_dce(jit_State *J);
>  LJ_FUNC int lj_opt_loop(jit_State *J);
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>  LJ_FUNC void lj_opt_split(jit_State *J);
>  #else
>  #define lj_opt_split(J)		UNUSED(J)
> diff --git a/src/lj_jit.h b/src/lj_jit.h
> index cc8efd20..c06829ab 100644
> --- a/src/lj_jit.h
> +++ b/src/lj_jit.h
> @@ -375,7 +375,7 @@ enum {
>    ((TValue *)(((intptr_t)&J->ksimd[2*(n)] + 15) & ~(intptr_t)15))
>  
>  /* Set/reset flag to activate the SPLIT pass for the current trace. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>  #define lj_needsplit(J)		(J->needsplit = 1)
>  #define lj_resetsplit(J)	(J->needsplit = 0)
>  #else
> @@ -438,7 +438,7 @@ typedef struct jit_State {
>    MSize sizesnapmap;	/* Size of temp. snapshot map buffer. */
>  
>    PostProc postproc;	/* Required post-processing after execution. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>    uint8_t needsplit;	/* Need SPLIT pass. */
>  #endif
>    uint8_t retryrec;	/* Retry recording. */
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 45507e0d..bf95e1eb 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -984,6 +984,9 @@ static LJ_AINLINE void copyTV(lua_State *L, TValue *o1, const TValue *o2)
>  
>  #if LJ_SOFTFP
>  LJ_ASMF int32_t lj_vm_tobit(double x);
> +#if LJ_TARGET_MIPS64
> +LJ_ASMF int32_t lj_vm_tointg(double x);
> +#endif
>  #endif
>  
>  static LJ_AINLINE int32_t lj_num2bit(lua_Number n)
> diff --git a/src/lj_opt_split.c b/src/lj_opt_split.c
> index c0788106..2fc36b8d 100644
> --- a/src/lj_opt_split.c
> +++ b/src/lj_opt_split.c
> @@ -8,7 +8,7 @@
>  
>  #include "lj_obj.h"
>  
> -#if LJ_HASJIT && (LJ_SOFTFP || (LJ_32 && LJ_HASFFI))
> +#if LJ_HASJIT && (LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI))
>  
>  #include "lj_err.h"
>  #include "lj_buf.h"
> diff --git a/src/lj_snap.c b/src/lj_snap.c
> index a063c316..9146cddc 100644
> --- a/src/lj_snap.c
> +++ b/src/lj_snap.c
> @@ -93,7 +93,7 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
>  	    (ir->op2 & (IRSLOAD_READONLY|IRSLOAD_PARENT)) != IRSLOAD_PARENT)
>  	  sn |= SNAP_NORESTORE;
>        }
> -      if (LJ_SOFTFP && irt_isnum(ir->t))
> +      if (LJ_SOFTFP32 && irt_isnum(ir->t))
>  	sn |= SNAP_SOFTFPNUM;
>        map[n++] = sn;
>      }
> @@ -379,7 +379,7 @@ IRIns *lj_snap_regspmap(GCtrace *T, SnapNo snapno, IRIns *ir)
>  	  break;
>  	}
>        }
> -    } else if (LJ_SOFTFP && ir->o == IR_HIOP) {
> +    } else if (LJ_SOFTFP32 && ir->o == IR_HIOP) {
>        ref++;
>      } else if (ir->o == IR_PVAL) {
>        ref = ir->op1 + REF_BIAS;
> @@ -491,7 +491,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
>      } else {
>        IRType t = irt_type(ir->t);
>        uint32_t mode = IRSLOAD_INHERIT|IRSLOAD_PARENT;
> -      if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
> +      if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
>        if (ir->o == IR_SLOAD) mode |= (ir->op2 & IRSLOAD_READONLY);
>        tr = emitir_raw(IRT(IR_SLOAD, t), s, mode);
>      }
> @@ -525,7 +525,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
>  	    if (irs->r == RID_SINK && snap_sunk_store(T, ir, irs)) {
>  	      if (snap_pref(J, T, map, nent, seen, irs->op2) == 0)
>  		snap_pref(J, T, map, nent, seen, T->ir[irs->op2].op1);
> -	      else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> +	      else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
>  		       irs+1 < irlast && (irs+1)->o == IR_HIOP)
>  		snap_pref(J, T, map, nent, seen, (irs+1)->op2);
>  	    }
> @@ -584,10 +584,10 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
>  		lua_assert(irc->o == IR_CONV && irc->op2 == IRCONV_NUM_INT);
>  		val = snap_pref(J, T, map, nent, seen, irc->op1);
>  		val = emitir(IRTN(IR_CONV), val, IRCONV_NUM_INT);
> -	      } else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> +	      } else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
>  			 irs+1 < irlast && (irs+1)->o == IR_HIOP) {
>  		IRType t = IRT_I64;
> -		if (LJ_SOFTFP && irt_type((irs+1)->t) == IRT_SOFTFP)
> +		if (LJ_SOFTFP32 && irt_type((irs+1)->t) == IRT_SOFTFP)
>  		  t = IRT_NUM;
>  		lj_needsplit(J);
>  		if (irref_isk(irs->op2) && irref_isk((irs+1)->op2)) {
> @@ -645,7 +645,7 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
>      int32_t *sps = &ex->spill[regsp_spill(rs)];
>      if (irt_isinteger(t)) {
>        setintV(o, *sps);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>      } else if (irt_isnum(t)) {
>        o->u64 = *(uint64_t *)sps;
>  #endif
> @@ -670,6 +670,9 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
>  #if !LJ_SOFTFP
>      } else if (irt_isnum(t)) {
>        setnumV(o, ex->fpr[r-RID_MIN_FPR]);
> +#elif LJ_64  /* && LJ_SOFTFP */
> +    } else if (irt_isnum(t)) {
> +      o->u64 = ex->gpr[r-RID_MIN_GPR];
>  #endif
>  #if LJ_64 && !LJ_GC64
>      } else if (irt_is64(t)) {
> @@ -823,7 +826,7 @@ static void snap_unsink(jit_State *J, GCtrace *T, ExitState *ex,
>  	  val = lj_tab_set(J->L, t, &tmp);
>  	  /* NOBARRIER: The table is new (marked white). */
>  	  snap_restoreval(J, T, ex, snapno, rfilt, irs->op2, val);
> -	  if (LJ_SOFTFP && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
> +	  if (LJ_SOFTFP32 && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
>  	    snap_restoreval(J, T, ex, snapno, rfilt, (irs+1)->op2, &tmp);
>  	    val->u32.hi = tmp.u32.lo;
>  	  }
> @@ -884,7 +887,7 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
>  	continue;
>        }
>        snap_restoreval(J, T, ex, snapno, rfilt, ref, o);
> -      if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
> +      if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
>  	TValue tmp;
>  	snap_restoreval(J, T, ex, snapno, rfilt, ref+1, &tmp);
>  	o->u32.hi = tmp.u32.lo;
> diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
> index 04be38f0..9839b5ac 100644
> --- a/src/vm_mips64.dasc
> +++ b/src/vm_mips64.dasc
> @@ -1984,6 +1984,38 @@ static void build_subroutines(BuildCtx *ctx)
>    |1:
>    |  jr ra
>    |.  move CRET1, r0
> +  |
> +  |// FP number to int conversion with a check for soft-float.
> +  |// Modifies CARG1, CRET1, CRET2, TMP0, AT.
> +  |->vm_tointg:
> +  |.if JIT
> +  |  dsll CRET2, CARG1, 1
> +  |  beqz CRET2, >2
> +  |.  li TMP0, 1076
> +  |  dsrl AT, CRET2, 53
> +  |  dsubu TMP0, TMP0, AT
> +  |  sltiu AT, TMP0, 54
> +  |  beqz AT, >1
> +  |.  dextm CRET2, CRET2, 0, 20
> +  |  dinsu CRET2, AT, 21, 21
> +  |  slt AT, CARG1, r0
> +  |  dsrlv CRET1, CRET2, TMP0
> +  |  dsubu CARG1, r0, CRET1
> +  |  movn CRET1, CARG1, AT
> +  |  li CARG1, 64
> +  |  subu TMP0, CARG1, TMP0
> +  |  dsllv CRET2, CRET2, TMP0	// Integer check.
> +  |  sextw AT, CRET1
> +  |  xor AT, CRET1, AT		// Range check.
> +  |  jr ra
> +  |.  movz CRET2, AT, CRET2
> +  |1:
> +  |  jr ra
> +  |.  li CRET2, 1
> +  |2:
> +  |  jr ra
> +  |.  move CRET1, r0
> +  |.endif
>    |.endif
>    |
>    |.macro .ffunc_bit, name
> @@ -2669,6 +2701,23 @@ static void build_subroutines(BuildCtx *ctx)
>    |.  li CRET1, 0
>    |.endif
>    |
> +  |.macro sfmin_max, name, intins
> +  |->vm_sf .. name:
> +  |.if JIT and not FPU
> +  |  move TMP2, ra
> +  |  bal ->vm_sfcmpolt
> +  |.  nop
> +  |  move ra, TMP2
> +  |  move TMP0, CRET1
> +  |  move CRET1, CARG1
> +  |  jr ra
> +  |.  intins CRET1, CARG2, TMP0
> +  |.endif
> +  |.endmacro
> +  |
> +  |  sfmin_max min, movz
> +  |  sfmin_max max, movn
> +  |
>    |//-----------------------------------------------------------------------
>    |//-- Miscellaneous functions --------------------------------------------
>    |//-----------------------------------------------------------------------
> -- 
> 2.41.0
> 
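Side note on vm_tointg: as far as I can tell from the assembly, it returns the integer in CRET1 and a status in CRET2 that is zero only when the input is an exact int32 value. A minimal C sketch of that contract (the function name and the out-parameter are hypothetical, and this does not mirror the bit-level algorithm):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 and stores the result if d is exactly representable as an
** int32_t, otherwise returns 1 and leaves *out untouched.
*/
static int sf_tointg(double d, int32_t *out)
{
  int32_t i;
  assert(out != NULL);
  /* Range check first; this also rejects NaN, since comparisons fail. */
  if (!(d >= -2147483648.0 && d <= 2147483647.0))
    return 1;
  i = (int32_t)d;  /* In range, so the truncating cast is well defined. */
  /* Integer check: a fractional part does not survive the round trip. */
  if ((double)i != d)
    return 1;
  *out = i;
  return 0;
}
```

The assembly performs the same two checks (marked "Integer check" and "Range check") on the raw bit pattern, without any FPU instruction.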


* Re: [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:40   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:13     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 14:53   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:40 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:54PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
> 
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
> 
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the VM for powerpc.
Typo: s/powerpc/PowerPC/
> This includes:
> * Any loads/storages of double values use load/storage through 32-bit
Typo: s/storages/stores/ Feel free to ignore, though.
>   registers of `lo` and `hi` part of the TValue union.
> * Macro .FPU is added to skip instructions necessary only for
>   hard-float operations (load/store floating point registers from/on the
>   stack, when leave/enter VM, for example).
Typo: s/leave\/enter/leaving\/entering/
> * Now r25 named as `SAVE1` is used as saved temporary register (used in
>   different fast functions)
> * `sfi2d` macro is introduced to convert integer, that represents a
Typo: s/convert/convert an/
>   soft-float, to double. Receives destination and source registers, uses
Typo: s/to double/to a double/
>   `TMP0` and `TMP1`.
> * `sfpmod` macro is introduced for soft-float point `fmod` built-in.
> * `ins_arith` now receives the third parameter -- operation to use for
>   soft-float point.
> * `LJ_ARCH_HASFPU`, `LJ_ABI_SOFTFP` macros are introduced to mark that
>   there is defined `_SOFT_FLOAT` or `_SOFT_DOUBLE`. `LJ_ARCH_NUMMODE` is
>   set to the `LJ_NUMMODE_DUAL`, when `LJ_ABI_SOFTFP` is true.
> 
> Support of soft-float point for the JIT compiler will be added in the
> next patch.
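To illustrate the first bullet (doubles handled through 32-bit `lo` and `hi` parts), here is a minimal sketch in C. The Halves type and both helpers are hypothetical names for illustration; the real code works on the TValue union directly:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The two 32-bit words a soft-float VM moves instead of one FPR. */
typedef struct Halves {
  uint32_t hi;  /* Sign, exponent and upper mantissa bits. */
  uint32_t lo;  /* Lower 32 mantissa bits. */
} Halves;

/* Split a double into its hi and lo words. */
static Halves split_double(double d)
{
  uint64_t u;
  Halves h;
  assert(sizeof(d) == sizeof(u));
  memcpy(&u, &d, sizeof(u));
  h.hi = (uint32_t)(u >> 32);
  h.lo = (uint32_t)u;
  return h;
}

/* Reassemble the double from its two words. */
static double join_double(Halves h)
{
  uint64_t u = ((uint64_t)h.hi << 32) | h.lo;
  double d;
  memcpy(&d, &u, sizeof(d));
  return d;
}
```

The shifts make the sketch endian-neutral. In memory, which word comes first depends on the target's byte order.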
> 
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> 
> Sergey Kaplun:
> * added the description for the feature
> 
> Part of tarantool/tarantool#8825
> ---
>  src/host/buildvm_asm.c |    2 +-
>  src/lj_arch.h          |   29 +-
>  src/lj_ccall.c         |   38 +-
>  src/lj_ccall.h         |    4 +-
>  src/lj_ccallback.c     |   30 +-
>  src/lj_frame.h         |    2 +-
>  src/lj_ircall.h        |    2 +-
>  src/vm_ppc.dasc        | 1249 +++++++++++++++++++++++++++++++++-------
>  8 files changed, 1101 insertions(+), 255 deletions(-)
> 
> diff --git a/src/host/buildvm_asm.c b/src/host/buildvm_asm.c
> index ffd14903..43595b31 100644
> --- a/src/host/buildvm_asm.c
> +++ b/src/host/buildvm_asm.c
> @@ -338,7 +338,7 @@ void emit_asm(BuildCtx *ctx)
>  #if !(LJ_TARGET_PS3 || LJ_TARGET_PSVITA)
>      fprintf(ctx->fp, "\t.section .note.GNU-stack,\"\"," ELFASM_PX "progbits\n");
>  #endif
> -#if LJ_TARGET_PPC && !LJ_TARGET_PS3
> +#if LJ_TARGET_PPC && !LJ_TARGET_PS3 && !LJ_ABI_SOFTFP
>      /* Hard-float ABI. */
>      fprintf(ctx->fp, "\t.gnu_attribute 4, 1\n");
>  #endif
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index c39526ea..8bb8757d 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -262,6 +262,29 @@
>  #else
>  #define LJ_ARCH_BITS		32
>  #define LJ_ARCH_NAME		"ppc"
> +
> +#if !defined(LJ_ARCH_HASFPU)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ARCH_HASFPU		0
> +#else
> +#define LJ_ARCH_HASFPU		1
> +#endif
> +#endif
> +
> +#if !defined(LJ_ABI_SOFTFP)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ABI_SOFTFP		1
> +#else
> +#define LJ_ABI_SOFTFP		0
> +#endif
> +#endif
> +#endif
> +
> +#if LJ_ABI_SOFTFP
> +#define LJ_ARCH_NOJIT		1  /* NYI */
> +#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
> +#else
> +#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
>  #endif
>  
>  #define LJ_TARGET_PPC		1
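The detection pattern in this hunk can be tried in isolation. A minimal sketch, with hypothetical DEMO_ names so it does not clash with the real LJ_ macros; the outer !defined() guards are what allow an override such as -DDEMO_HASFPU=0 on the command line:

```c
#include <assert.h>

/* Derive the FPU flag from the compiler's predefined macros unless the
** build already set it explicitly.
*/
#if !defined(DEMO_HASFPU)
#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
#define DEMO_HASFPU 0
#else
#define DEMO_HASFPU 1
#endif
#endif

/* Same scheme for the ABI flag. */
#if !defined(DEMO_SOFTFP)
#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
#define DEMO_SOFTFP 1
#else
#define DEMO_SOFTFP 0
#endif
#endif

/* Human-readable name of the selected floating-point mode. */
static const char *demo_fp_mode(void)
{
  assert(DEMO_SOFTFP == 0 || DEMO_SOFTFP == 1);
  return DEMO_SOFTFP ? "soft-float" : "hard-float";
}
```

Keeping two separate flags leaves room for a CPU without an FPU that still uses a hard-float calling convention, or the reverse, if someone overrides just one of them.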
> @@ -271,7 +294,6 @@
>  #define LJ_TARGET_MASKSHIFT	0
>  #define LJ_TARGET_MASKROT	1
>  #define LJ_TARGET_UNIFYROT	1	/* Want only IR_BROL. */
> -#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
>  
>  #if LJ_TARGET_CONSOLE
>  #define LJ_ARCH_PPC32ON64	1
> @@ -431,16 +453,13 @@
>  #error "No support for ILP32 model on ARM64"
>  #endif
>  #elif LJ_TARGET_PPC
> -#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> -#error "No support for PowerPC CPUs without double-precision FPU"
> -#endif
>  #if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
>  #error "No support for little-endian PPC32"
>  #endif
>  #if LJ_ARCH_PPC64
>  #error "No support for PowerPC 64 bit mode (yet)"
>  #endif
> -#ifdef __NO_FPRS__
> +#if defined(__NO_FPRS__) && !defined(_SOFT_FLOAT)
>  #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
>  #endif
>  #elif LJ_TARGET_MIPS32
> diff --git a/src/lj_ccall.c b/src/lj_ccall.c
> index d39ff861..c1e12f56 100644
> --- a/src/lj_ccall.c
> +++ b/src/lj_ccall.c
> @@ -388,6 +388,24 @@
>  #define CCALL_HANDLE_COMPLEXARG \
>    /* Pass complex by value in 2 or 4 GPRs. */
>  
> +#define CCALL_HANDLE_GPR \
> +  /* Try to pass argument in GPRs. */ \
> +  if (n > 1) { \
> +    lua_assert(n == 2 || n == 4);  /* int64_t or complex (float). */ \
> +    if (ctype_isinteger(d->info) || ctype_isfp(d->info)) \
> +      ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> +    else if (ngpr + n > maxgpr) \
> +      ngpr = maxgpr;  /* Prevent reordering. */ \
> +  } \
> +  if (ngpr + n <= maxgpr) { \
> +    dp = &cc->gpr[ngpr]; \
> +    ngpr += n; \
> +    goto done; \
> +  } \
> +
> +#if LJ_ABI_SOFTFP
> +#define CCALL_HANDLE_REGARG  CCALL_HANDLE_GPR
> +#else
>  #define CCALL_HANDLE_REGARG \
>    if (isfp) {  /* Try to pass argument in FPRs. */ \
>      if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -396,24 +414,16 @@
>        d = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */ \
>        goto done; \
>      } \
> -  } else {  /* Try to pass argument in GPRs. */ \
> -    if (n > 1) { \
> -      lua_assert(n == 2 || n == 4);  /* int64_t or complex (float). */ \
> -      if (ctype_isinteger(d->info)) \
> -	ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> -      else if (ngpr + n > maxgpr) \
> -	ngpr = maxgpr;  /* Prevent reordering. */ \
> -    } \
> -    if (ngpr + n <= maxgpr) { \
> -      dp = &cc->gpr[ngpr]; \
> -      ngpr += n; \
> -      goto done; \
> -    } \
> +  } else { \
> +    CCALL_HANDLE_GPR \
>    }
> +#endif
>  
> +#if !LJ_ABI_SOFTFP
>  #define CCALL_HANDLE_RET \
>    if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
>      ctr = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */
> +#endif
>  
>  #elif LJ_TARGET_MIPS32
>  /* -- MIPS o32 calling conventions ---------------------------------------- */
> @@ -1081,7 +1091,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
>    }
>    if (fid) lj_err_caller(L, LJ_ERR_FFI_NUMARG);  /* Too few arguments. */
>  
> -#if LJ_TARGET_X64 || LJ_TARGET_PPC
> +#if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP)
>    cc->nfpr = nfpr;  /* Required for vararg functions. */
>  #endif
>    cc->nsp = nsp;
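For readers following CCALL_HANDLE_GPR: the interesting part is the register-pair alignment, which under soft-float now applies to doubles as well as to int64_t. A minimal sketch of the allocation logic as a plain function; alloc_gpr, is_scalar64 and MAX_GPR are hypothetical names, and MAX_GPR assumes the value of CCALL_NARG_GPR for PPC:

```c
#include <assert.h>

#define MAX_GPR 8u  /* Assumed to match CCALL_NARG_GPR on PPC. */

/* Returns the index of the first GPR assigned to an argument that needs
** n registers, or -1 if it has to go on the stack. *ngpr is the running
** count of used GPRs.
*/
static int alloc_gpr(unsigned *ngpr, unsigned n, int is_scalar64)
{
  assert(n == 1 || n == 2 || n == 4);
  if (n > 1) {
    if (is_scalar64)
      *ngpr = (*ngpr + 1u) & ~1u;  /* Round up to an even register. */
    else if (*ngpr + n > MAX_GPR)
      *ngpr = MAX_GPR;  /* Block later arguments from jumping ahead. */
  }
  if (*ngpr + n <= MAX_GPR) {
    int first = (int)*ngpr;
    *ngpr += n;
    return first;
  }
  return -1;
}
```

For the sequence (int, double, int, double) this gives registers 0, 2-3, 4 and 6-7: the odd registers 1 and 5 are skipped to keep each pair aligned.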
> diff --git a/src/lj_ccall.h b/src/lj_ccall.h
> index 59f66481..6efa48c7 100644
> --- a/src/lj_ccall.h
> +++ b/src/lj_ccall.h
> @@ -86,9 +86,9 @@ typedef union FPRArg {
>  #elif LJ_TARGET_PPC
>  
>  #define CCALL_NARG_GPR		8
> -#define CCALL_NARG_FPR		8
> +#define CCALL_NARG_FPR		(LJ_ABI_SOFTFP ? 0 : 8)
>  #define CCALL_NRET_GPR		4	/* For complex double. */
> -#define CCALL_NRET_FPR		1
> +#define CCALL_NRET_FPR		(LJ_ABI_SOFTFP ? 0 : 1)
>  #define CCALL_SPS_EXTRA		4
>  #define CCALL_SPS_FREE		0
>  
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index 224b6b94..c33190d7 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -419,6 +419,23 @@ void lj_ccallback_mcode_free(CTState *cts)
>  
>  #elif LJ_TARGET_PPC
>  
> +#define CALLBACK_HANDLE_GPR \
> +  if (n > 1) { \
> +    lua_assert(((LJ_ABI_SOFTFP && ctype_isnum(cta->info)) ||  /* double. */ \
> +		ctype_isinteger(cta->info)) && n == 2);  /* int64_t. */ \
> +    ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> +  } \
> +  if (ngpr + n <= maxgpr) { \
> +    sp = &cts->cb.gpr[ngpr]; \
> +    ngpr += n; \
> +    goto done; \
> +  }
> +
> +#if LJ_ABI_SOFTFP
> +#define CALLBACK_HANDLE_REGARG \
> +  CALLBACK_HANDLE_GPR \
> +  UNUSED(isfp);
> +#else
>  #define CALLBACK_HANDLE_REGARG \
>    if (isfp) { \
>      if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -427,20 +444,15 @@ void lj_ccallback_mcode_free(CTState *cts)
>        goto done; \
>      } \
>    } else {  /* Try to pass argument in GPRs. */ \
> -    if (n > 1) { \
> -      lua_assert(ctype_isinteger(cta->info) && n == 2);  /* int64_t. */ \
> -      ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> -    } \
> -    if (ngpr + n <= maxgpr) { \
> -      sp = &cts->cb.gpr[ngpr]; \
> -      ngpr += n; \
> -      goto done; \
> -    } \
> +    CALLBACK_HANDLE_GPR \
>    }
> +#endif
>  
> +#if !LJ_ABI_SOFTFP
>  #define CALLBACK_HANDLE_RET \
>    if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
>      *(double *)dp = *(float *)dp;  /* FPRs always hold doubles. */
> +#endif
>  
>  #elif LJ_TARGET_MIPS32
>  
> diff --git a/src/lj_frame.h b/src/lj_frame.h
> index 2bdf3c48..5cb3d639 100644
> --- a/src/lj_frame.h
> +++ b/src/lj_frame.h
> @@ -226,7 +226,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
>  #define CFRAME_OFS_L		36
>  #define CFRAME_OFS_PC		32
>  #define CFRAME_OFS_MULTRES	28
> -#define CFRAME_SIZE		272
> +#define CFRAME_SIZE		(LJ_ARCH_HASFPU ? 272 : 128)
>  #define CFRAME_SHIFT_MULTRES	3
>  #endif
>  #elif LJ_TARGET_MIPS32
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index c1ac29d1..bbad35b1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -291,7 +291,7 @@ LJ_DATA const CCallInfo lj_ir_callinfo[IRCALL__MAX+1];
>  #define fp64_f2l __aeabi_f2lz
>  #define fp64_f2ul __aeabi_f2ulz
>  #endif
> -#elif LJ_TARGET_MIPS
> +#elif LJ_TARGET_MIPS || LJ_TARGET_PPC
>  #define softfp_add __adddf3
>  #define softfp_sub __subdf3
>  #define softfp_mul __muldf3
> diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
> index 7ad8df37..980ad897 100644
> --- a/src/vm_ppc.dasc
> +++ b/src/vm_ppc.dasc
> @@ -103,6 +103,18 @@
>  |// Fixed register assignments for the interpreter.
>  |// Don't use: r1 = sp, r2 and r13 = reserved (TOC, TLS or SDATA)
>  |
> +|.macro .FPU, a, b
> +|.if FPU
> +|  a, b
> +|.endif
> +|.endmacro
> +|
> +|.macro .FPU, a, b, c
> +|.if FPU
> +|  a, b, c
> +|.endif
> +|.endmacro
> +|
>  |// The following must be C callee-save (but BASE is often refetched).
>  |.define BASE,		r14	// Base of current Lua stack frame.
>  |.define KBASE,		r15	// Constants of current Lua function.
> @@ -116,8 +128,10 @@
>  |.define TISNUM,	r22
>  |.define TISNIL,	r23
>  |.define ZERO,		r24
> +|.if FPU
>  |.define TOBIT,		f30	// 2^52 + 2^51.
>  |.define TONUM,		f31	// 2^52 + 2^51 + 2^31.
> +|.endif
>  |
>  |// The following temporaries are not saved across C calls, except for RA.
>  |.define RA,		r20	// Callee-save.
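Background on the TOBIT/TONUM registers that become FPU-only here: the constants are built from single-precision bit patterns (0x59c00000 and 0x59c00004 in the setup code), and adding 2^52 + 2^51 to a double leaves the rounded integer in the low mantissa word. A minimal sketch of the usual trick, assuming IEEE 754 doubles and round-to-nearest; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from its raw bit pattern, like lus/ori + stw + lfs. */
static float float_from_bits(uint32_t bits)
{
  float f;
  assert(sizeof(f) == sizeof(bits));
  memcpy(&f, &bits, sizeof(f));
  return f;
}

/* Convert a double to int32 by biasing it with 2^52 + 2^51. */
static int32_t tobit(double x)
{
  /* volatile keeps the sum in double precision on every target. */
  volatile double biased = x + 6755399441055744.0;
  double tmp = biased;
  uint64_t u;
  memcpy(&u, &tmp, sizeof(u));
  /* The low word holds the integer in two's complement form. */
  return (int32_t)(uint32_t)u;
}
```

Without an FPU there is nothing to keep in f30/f31, so the soft-float build has to do such conversions with integer code instead.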
> @@ -133,6 +147,7 @@
>  |
>  |// Saved temporaries.
>  |.define SAVE0,		r21
> +|.define SAVE1,		r25
>  |
>  |// Calling conventions.
>  |.define CARG1,		r3
> @@ -141,8 +156,10 @@
>  |.define CARG4,		r6	// Overlaps TMP3.
>  |.define CARG5,		r7	// Overlaps INS.
>  |
> +|.if FPU
>  |.define FARG1,		f1
>  |.define FARG2,		f2
> +|.endif
>  |
>  |.define CRET1,		r3
>  |.define CRET2,		r4
> @@ -213,10 +230,16 @@
>  |.endif
>  |.else
>  |
> +|.if FPU
>  |.define SAVE_LR,	276(sp)
>  |.define CFRAME_SPACE,	272     // Delta for sp.
>  |// Back chain for sp:	272(sp) <-- sp entering interpreter
>  |.define SAVE_FPR_,	128     // .. 128+18*8: 64 bit FPR saves.
> +|.else
> +|.define SAVE_LR,	132(sp)
> +|.define CFRAME_SPACE,	128     // Delta for sp.
> +|// Back chain for sp:	128(sp) <-- sp entering interpreter
> +|.endif
>  |.define SAVE_GPR_,	56      // .. 56+18*4: 32 bit GPR saves.
>  |.define SAVE_CR,	52(sp)  // 32 bit CR save.
>  |.define SAVE_ERRF,	48(sp)  // 32 bit C frame info.
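The two frame sizes follow from the offsets in this hunk: the GPR save area starts at 56 and holds 18 registers of 4 bytes, and the FPR save area (hard-float only) adds 18 registers of 8 bytes on top. A small sketch that redoes the arithmetic; the enum and function names are hypothetical:

```c
#include <assert.h>

enum {
  SAVE_GPR_OFS = 56,  /* Start of the 32-bit GPR save area. */
  NUM_SAVED    = 18,  /* Callee-saved registers r14..r31 / f14..f31. */
  GPR_BYTES    = 4,
  FPR_BYTES    = 8
};

/* Size of the C frame, i.e. the delta applied to sp. */
static int cframe_size(int has_fpu)
{
  int gpr_end = SAVE_GPR_OFS + NUM_SAVED * GPR_BYTES;
  assert(has_fpu == 0 || has_fpu == 1);
  return has_fpu ? gpr_end + NUM_SAVED * FPR_BYTES : gpr_end;
}

/* Offset of the saved LR: one word above the back chain of the caller. */
static int save_lr_offset(int has_fpu)
{
  return cframe_size(has_fpu) + 4;
}
```

This gives 128 and 132 without an FPU, 272 and 276 with one, in agreement with CFRAME_SPACE, SAVE_LR and the CFRAME_SIZE change in lj_frame.h.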
> @@ -226,16 +249,25 @@
>  |.define SAVE_PC,	32(sp)
>  |.define SAVE_MULTRES,	28(sp)
>  |.define UNUSED1,	24(sp)
> +|.if FPU
>  |.define TMPD_LO,	20(sp)
>  |.define TMPD_HI,	16(sp)
>  |.define TONUM_LO,	12(sp)
>  |.define TONUM_HI,	8(sp)
> +|.else
> +|.define SFSAVE_4,	20(sp)
> +|.define SFSAVE_3,	16(sp)
> +|.define SFSAVE_2,	12(sp)
> +|.define SFSAVE_1,	8(sp)
> +|.endif
>  |// Next frame lr:	4(sp)
>  |// Back chain for sp:	0(sp)	<-- sp while in interpreter
>  |
> +|.if FPU
>  |.define TMPD_BLO,	23(sp)
>  |.define TMPD,		TMPD_HI
>  |.define TONUM_D,	TONUM_HI
> +|.endif
>  |
>  |.endif
>  |
> @@ -245,7 +277,7 @@
>  |.else
>  |  stw r..reg, SAVE_GPR_+(reg-14)*4(sp)
>  |.endif
> -|  stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +|  .FPU stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
>  |.endmacro
>  |.macro rest_, reg
>  |.if GPR64
> @@ -253,7 +285,7 @@
>  |.else
>  |  lwz r..reg, SAVE_GPR_+(reg-14)*4(sp)
>  |.endif
> -|  lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +|  .FPU lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
>  |.endmacro
>  |
>  |.macro saveregs
> @@ -323,6 +355,7 @@
>  |// Trap for not-yet-implemented parts.
>  |.macro NYI; tw 4, sp, sp; .endmacro
>  |
> +|.if FPU
>  |// int/FP conversions.
>  |.macro tonum_i, freg, reg
>  |  xoris reg, reg, 0x8000
> @@ -346,6 +379,7 @@
>  |.macro toint, reg, freg
>  |  toint reg, freg, freg
>  |.endmacro
> +|.endif
>  |
>  |//-----------------------------------------------------------------------
>  |
> @@ -533,9 +567,19 @@ static void build_subroutines(BuildCtx *ctx)
>    |  beq >2
>    |1:
>    |  addic. TMP1, TMP1, -8
> +  |.if FPU
>    |   lfd f0, 0(RA)
> +  |.else
> +  |   lwz CARG1, 0(RA)
> +  |   lwz CARG2, 4(RA)
> +  |.endif
>    |    addi RA, RA, 8
> +  |.if FPU
>    |   stfd f0, 0(BASE)
> +  |.else
> +  |   stw CARG1, 0(BASE)
> +  |   stw CARG2, 4(BASE)
> +  |.endif
>    |    addi BASE, BASE, 8
>    |  bney <1
>    |
> @@ -613,23 +657,23 @@ static void build_subroutines(BuildCtx *ctx)
>    |  .toc ld TOCREG, SAVE_TOC
>    |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>    |  lp BASE, L->base
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>    |   lwz DISPATCH, L->glref		// Setup pointer to dispatch table.
>    |     li ZERO, 0
> -  |     stw TMP3, TMPD
> +  |     .FPU stw TMP3, TMPD
>    |  li TMP1, LJ_TFALSE
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
>    |     li TISNIL, LJ_TNIL
>    |    li_vmstate INTERP
> -  |     lfs TOBIT, TMPD
> +  |     .FPU lfs TOBIT, TMPD
>    |  lwz PC, FRAME_PC(BASE)		// Fetch PC of previous frame.
>    |  la RA, -8(BASE)			// Results start at BASE-8.
> -  |     stw TMP3, TMPD
> +  |     .FPU stw TMP3, TMPD
>    |   addi DISPATCH, DISPATCH, GG_G2DISP
>    |  stw TMP1, 0(RA)			// Prepend false to error message.
>    |  li RD, 16				// 2 results: false + error message.
>    |    st_vmstate
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>    |  b ->vm_returnc
>    |
>    |//-----------------------------------------------------------------------
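Side note on the constants that the hunks above now guard with `.FPU`: the comments say TOBIT is 2^52 + 2^51 and TONUM is 2^52 + 2^51 + 2^31, loaded as single-precision bit patterns 0x59c00000 and 0x59c00004. A small Python check of those bit patterns, plus a rough model of the "add TOBIT, read the low word" trick used on the FPU path, may help when reading the soft-float replacements. The helper names (`f32_from_bits`, `tobit_fpu_model`) are made up for illustration and are not part of the patch.

```python
import struct

def f32_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE-754 single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Bit patterns taken from the `lus`/`ori` pairs in the quoted code.
TOBIT = f32_from_bits(0x59c00000)   # expected: 2^52 + 2^51
TONUM = f32_from_bits(0x59c00004)   # expected: 2^52 + 2^51 + 2^31

def tobit_fpu_model(x):
    """Model of `fadd x, TOBIT; stfd; lwz TMPD_LO`: low word as int32."""
    lo = struct.unpack(">II", struct.pack(">d", x + TOBIT))[1]
    return lo - (1 << 32) if lo & 0x80000000 else lo

if __name__ == "__main__":
    print(TOBIT == 2.0**52 + 2.0**51)
    print(tobit_fpu_model(5.0), tobit_fpu_model(-1.0))
```

Without an FPU none of this is available, which is presumably why the patch drops the TOBIT/TONUM setup for the soft-float build and reuses the stack slots as SFSAVE_1..4.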
> @@ -690,22 +734,22 @@ static void build_subroutines(BuildCtx *ctx)
>    |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>    |   lp TMP1, L->top
>    |  lwz PC, FRAME_PC(BASE)
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>    |    stb CARG3, L->status
> -  |     stw TMP3, TMPD
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |     lfs TOBIT, TMPD
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU lfs TOBIT, TMPD
>    |   sub RD, TMP1, BASE
> -  |     stw TMP3, TMPD
> -  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
>    |   addi RD, RD, 8
> -  |     stw TMP0, TONUM_HI
> +  |     .FPU stw TMP0, TONUM_HI
>    |    li_vmstate INTERP
>    |     li ZERO, 0
>    |    st_vmstate
>    |  andix. TMP0, PC, FRAME_TYPE
>    |   mr MULTRES, RD
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>    |     li TISNIL, LJ_TNIL
>    |  beq ->BC_RET_Z
>    |  b ->vm_return
> @@ -739,19 +783,19 @@ static void build_subroutines(BuildCtx *ctx)
>    |  lp TMP2, L->base			// TMP2 = old base (used in vmeta_call).
>    |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>    |   lp TMP1, L->top
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>    |  add PC, PC, BASE
> -  |     stw TMP3, TMPD
> +  |     .FPU stw TMP3, TMPD
>    |     li ZERO, 0
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |     lfs TOBIT, TMPD
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU lfs TOBIT, TMPD
>    |  sub PC, PC, TMP2			// PC = frame delta + frame type
> -  |     stw TMP3, TMPD
> -  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
>    |   sub NARGS8:RC, TMP1, BASE
> -  |     stw TMP0, TONUM_HI
> +  |     .FPU stw TMP0, TONUM_HI
>    |    li_vmstate INTERP
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>    |     li TISNIL, LJ_TNIL
>    |    st_vmstate
>    |
> @@ -839,15 +883,30 @@ static void build_subroutines(BuildCtx *ctx)
>    |  lwz INS, -4(PC)
>    |   subi CARG2, RB, 16
>    |  decode_RB8 SAVE0, INS
> +  |.if FPU
>    |   lfd f0, 0(RA)
> +  |.else
> +  |   lwz TMP2, 0(RA)
> +  |   lwz TMP3, 4(RA)
> +  |.endif
>    |  add TMP1, BASE, SAVE0
>    |   stp BASE, L->base
>    |  cmplw TMP1, CARG2
>    |   sub CARG3, CARG2, TMP1
>    |  decode_RA8 RA, INS
> +  |.if FPU
>    |   stfd f0, 0(CARG2)
> +  |.else
> +  |   stw TMP2, 0(CARG2)
> +  |   stw TMP3, 4(CARG2)
> +  |.endif
>    |  bney ->BC_CAT_Z
> +  |.if FPU
>    |   stfdx f0, BASE, RA
> +  |.else
> +  |   stwux TMP2, RA, BASE
> +  |   stw TMP3, 4(RA)
> +  |.endif
>    |  b ->cont_nop
>    |
>    |//-- Table indexing metamethods -----------------------------------------
> @@ -900,9 +959,19 @@ static void build_subroutines(BuildCtx *ctx)
>    |  // Returns TValue * (finished) or NULL (metamethod).
>    |  cmplwi CRET1, 0
>    |  beq >3
> +  |.if FPU
>    |   lfd f0, 0(CRET1)
> +  |.else
> +  |   lwz TMP0, 0(CRET1)
> +  |   lwz TMP1, 4(CRET1)
> +  |.endif
>    |  ins_next1
> +  |.if FPU
>    |   stfdx f0, BASE, RA
> +  |.else
> +  |   stwux TMP0, RA, BASE
> +  |   stw TMP1, 4(RA)
> +  |.endif
>    |  ins_next2
>    |
>    |3:  // Call __index metamethod.
> @@ -920,7 +989,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |  // Returns cTValue * or NULL.
>    |  cmplwi CRET1, 0
>    |  beq >1
> +  |.if FPU
>    |  lfd f14, 0(CRET1)
> +  |.else
> +  |  lwz SAVE0, 0(CRET1)
> +  |  lwz SAVE1, 4(CRET1)
> +  |.endif
>    |  b ->BC_TGETR_Z
>    |1:
>    |  stwx TISNIL, BASE, RA
> @@ -975,11 +1049,21 @@ static void build_subroutines(BuildCtx *ctx)
>    |  bl extern lj_meta_tset		// (lua_State *L, TValue *o, TValue *k)
>    |  // Returns TValue * (finished) or NULL (metamethod).
>    |  cmplwi CRET1, 0
> +  |.if FPU
>    |   lfdx f0, BASE, RA
> +  |.else
> +  |   lwzux TMP2, RA, BASE
> +  |   lwz TMP3, 4(RA)
> +  |.endif
>    |  beq >3
>    |  // NOBARRIER: lj_meta_tset ensures the table is not black.
>    |  ins_next1
> +  |.if FPU
>    |   stfd f0, 0(CRET1)
> +  |.else
> +  |   stw TMP2, 0(CRET1)
> +  |   stw TMP3, 4(CRET1)
> +  |.endif
>    |  ins_next2
>    |
>    |3:  // Call __newindex metamethod.
> @@ -990,7 +1074,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |   add PC, TMP1, BASE
>    |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
>    |   li NARGS8:RC, 24			// 3 args for func(t, k, v)
> +  |.if FPU
>    |  stfd f0, 16(BASE)			// Copy value to third argument.
> +  |.else
> +  |  stw TMP2, 16(BASE)
> +  |  stw TMP3, 20(BASE)
> +  |.endif
>    |  b ->vm_call_dispatch_f
>    |
>    |->vmeta_tsetr:
> @@ -999,7 +1088,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |  stw PC, SAVE_PC
>    |  bl extern lj_tab_setinth  // (lua_State *L, GCtab *t, int32_t key)
>    |  // Returns TValue *.
> +  |.if FPU
>    |  stfd f14, 0(CRET1)
> +  |.else
> +  |  stw SAVE0, 0(CRET1)
> +  |  stw SAVE1, 4(CRET1)
> +  |.endif
>    |  b ->cont_nop
>    |
>    |//-- Comparison metamethods ---------------------------------------------
> @@ -1038,9 +1132,19 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |->cont_ra:				// RA = resultptr
>    |  lwz INS, -4(PC)
> +  |.if FPU
>    |   lfd f0, 0(RA)
> +  |.else
> +  |   lwz CARG1, 0(RA)
> +  |   lwz CARG2, 4(RA)
> +  |.endif
>    |  decode_RA8 TMP1, INS
> +  |.if FPU
>    |   stfdx f0, BASE, TMP1
> +  |.else
> +  |   stwux CARG1, TMP1, BASE
> +  |   stw CARG2, 4(TMP1)
> +  |.endif
>    |  b ->cont_nop
>    |
>    |->cont_condt:			// RA = resultptr
> @@ -1246,22 +1350,32 @@ static void build_subroutines(BuildCtx *ctx)
>    |.macro .ffunc_n, name
>    |->ff_ .. name:
>    |  cmplwi NARGS8:RC, 8
> -  |   lwz CARG3, 0(BASE)
> +  |   lwz CARG1, 0(BASE)
> +  |.if FPU
>    |    lfd FARG1, 0(BASE)
> +  |.else
> +  |    lwz CARG2, 4(BASE)
> +  |.endif
>    |  blt ->fff_fallback
> -  |  checknum CARG3; bge ->fff_fallback
> +  |  checknum CARG1; bge ->fff_fallback
>    |.endmacro
>    |
>    |.macro .ffunc_nn, name
>    |->ff_ .. name:
>    |  cmplwi NARGS8:RC, 16
> -  |   lwz CARG3, 0(BASE)
> +  |   lwz CARG1, 0(BASE)
> +  |.if FPU
>    |    lfd FARG1, 0(BASE)
> -  |   lwz CARG4, 8(BASE)
> +  |   lwz CARG3, 8(BASE)
>    |    lfd FARG2, 8(BASE)
> +  |.else
> +  |    lwz CARG2, 4(BASE)
> +  |   lwz CARG3, 8(BASE)
> +  |    lwz CARG4, 12(BASE)
> +  |.endif
>    |  blt ->fff_fallback
> +  |  checknum CARG1; bge ->fff_fallback
>    |  checknum CARG3; bge ->fff_fallback
> -  |  checknum CARG4; bge ->fff_fallback
>    |.endmacro
>    |
>    |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1.
> @@ -1282,14 +1396,21 @@ static void build_subroutines(BuildCtx *ctx)
>    |  bge cr1, ->fff_fallback
>    |   stw CARG3, 0(RA)
>    |  addi RD, NARGS8:RC, 8		// Compute (nresults+1)*8.
> +  |  addi TMP1, BASE, 8
> +  |  add TMP2, RA, NARGS8:RC
>    |   stw CARG1, 4(RA)
>    |  beq ->fff_res			// Done if exactly 1 argument.
> -  |  li TMP1, 8
> -  |  subi RC, RC, 8
>    |1:
> -  |  cmplw TMP1, RC
> -  |   lfdx f0, BASE, TMP1
> -  |   stfdx f0, RA, TMP1
> +  |  cmplw TMP1, TMP2
> +  |.if FPU
> +  |   lfd f0, 0(TMP1)
> +  |   stfd f0, 0(TMP1)
> +  |.else
> +  |   lwz CARG1, 0(TMP1)
> +  |   lwz CARG2, 4(TMP1)
> +  |   stw CARG1, -8(TMP1)
> +  |   stw CARG2, -4(TMP1)
> +  |.endif
>    |    addi TMP1, TMP1, 8
>    |  bney <1
>    |  b ->fff_res
> @@ -1304,8 +1425,14 @@ static void build_subroutines(BuildCtx *ctx)
>    |  orc TMP1, TMP2, TMP0
>    |  addi TMP1, TMP1, ~LJ_TISNUM+1
>    |  slwi TMP1, TMP1, 3
> +  |.if FPU
>    |   la TMP2, CFUNC:RB->upvalue
>    |  lfdx FARG1, TMP2, TMP1
> +  |.else
> +  |  add TMP1, CFUNC:RB, TMP1
> +  |  lwz CARG1, CFUNC:TMP1->upvalue[0].u32.hi
> +  |  lwz CARG2, CFUNC:TMP1->upvalue[0].u32.lo
> +  |.endif
>    |  b ->fff_resn
>    |
>    |//-- Base library: getters and setters ---------------------------------
> @@ -1383,7 +1510,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |   mr CARG1, L
>    |  bl extern lj_tab_get  // (lua_State *L, GCtab *t, cTValue *key)
>    |  // Returns cTValue *.
> +  |.if FPU
>    |  lfd FARG1, 0(CRET1)
> +  |.else
> +  |  lwz CARG2, 4(CRET1)
> +  |  lwz CARG1, 0(CRET1)	// Caveat: CARG1 == CRET1.
> +  |.endif
>    |  b ->fff_resn
>    |
>    |//-- Base library: conversions ------------------------------------------
> @@ -1392,7 +1524,11 @@ static void build_subroutines(BuildCtx *ctx)
>    |  // Only handles the number case inline (without a base argument).
>    |  cmplwi NARGS8:RC, 8
>    |   lwz CARG1, 0(BASE)
> +  |.if FPU
>    |    lfd FARG1, 0(BASE)
> +  |.else
> +  |    lwz CARG2, 4(BASE)
> +  |.endif
>    |  bne ->fff_fallback			// Exactly one argument.
>    |   checknum CARG1; bgt ->fff_fallback
>    |  b ->fff_resn
> @@ -1443,12 +1579,23 @@ static void build_subroutines(BuildCtx *ctx)
>    |  cmplwi CRET1, 0
>    |   li CARG3, LJ_TNIL
>    |  beq ->fff_restv			// End of traversal: return nil.
> -  |  lfd f0, 8(BASE)			// Copy key and value to results.
>    |   la RA, -8(BASE)
> +  |.if FPU
> +  |  lfd f0, 8(BASE)			// Copy key and value to results.
>    |  lfd f1, 16(BASE)
>    |  stfd f0, 0(RA)
> -  |   li RD, (2+1)*8
>    |  stfd f1, 8(RA)
> +  |.else
> +  |  lwz CARG1, 8(BASE)
> +  |  lwz CARG2, 12(BASE)
> +  |  lwz CARG3, 16(BASE)
> +  |  lwz CARG4, 20(BASE)
> +  |  stw CARG1, 0(RA)
> +  |  stw CARG2, 4(RA)
> +  |  stw CARG3, 8(RA)
> +  |  stw CARG4, 12(RA)
> +  |.endif
> +  |   li RD, (2+1)*8
>    |  b ->fff_res
>    |
>    |.ffunc_1 pairs
> @@ -1457,17 +1604,32 @@ static void build_subroutines(BuildCtx *ctx)
>    |  bne ->fff_fallback
>  #if LJ_52
>    |   lwz TAB:TMP2, TAB:CARG1->metatable
> +  |.if FPU
>    |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>    |   cmplwi TAB:TMP2, 0
>    |  la RA, -8(BASE)
>    |   bne ->fff_fallback
>  #else
> +  |.if FPU
>    |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>    |  la RA, -8(BASE)
>  #endif
>    |   stw TISNIL, 8(BASE)
>    |  li RD, (3+1)*8
> +  |.if FPU
>    |  stfd f0, 0(RA)
> +  |.else
> +  |  stw TMP0, 0(RA)
> +  |  stw TMP1, 4(RA)
> +  |.endif
>    |  b ->fff_res
>    |
>    |.ffunc ipairs_aux
> @@ -1513,14 +1675,24 @@ static void build_subroutines(BuildCtx *ctx)
>    |  stfd FARG2, 0(RA)
>    |.endif
>    |  ble >2				// Not in array part?
> +  |.if FPU
>    |  lwzx TMP2, TMP1, TMP3
>    |  lfdx f0, TMP1, TMP3
> +  |.else
> +  |  lwzux TMP2, TMP1, TMP3
> +  |  lwz TMP3, 4(TMP1)
> +  |.endif
>    |1:
>    |  checknil TMP2
>    |   li RD, (0+1)*8
>    |  beq ->fff_res			// End of iteration, return 0 results.
>    |   li RD, (2+1)*8
> +  |.if FPU
>    |  stfd f0, 8(RA)
> +  |.else
> +  |  stw TMP2, 8(RA)
> +  |  stw TMP3, 12(RA)
> +  |.endif
>    |  b ->fff_res
>    |2:  // Check for empty hash part first. Otherwise call C function.
>    |  lwz TMP0, TAB:CARG1->hmask
> @@ -1534,7 +1706,11 @@ static void build_subroutines(BuildCtx *ctx)
>    |   li RD, (0+1)*8
>    |  beq ->fff_res
>    |  lwz TMP2, 0(CRET1)
> +  |.if FPU
>    |  lfd f0, 0(CRET1)
> +  |.else
> +  |  lwz TMP3, 4(CRET1)
> +  |.endif
>    |  b <1
>    |
>    |.ffunc_1 ipairs
> @@ -1543,12 +1719,22 @@ static void build_subroutines(BuildCtx *ctx)
>    |  bne ->fff_fallback
>  #if LJ_52
>    |   lwz TAB:TMP2, TAB:CARG1->metatable
> +  |.if FPU
>    |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>    |   cmplwi TAB:TMP2, 0
>    |  la RA, -8(BASE)
>    |   bne ->fff_fallback
>  #else
> +  |.if FPU
>    |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>    |  la RA, -8(BASE)
>  #endif
>    |.if DUALNUM
> @@ -1558,7 +1744,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |.endif
>    |   stw ZERO, 12(BASE)
>    |  li RD, (3+1)*8
> +  |.if FPU
>    |  stfd f0, 0(RA)
> +  |.else
> +  |  stw TMP0, 0(RA)
> +  |  stw TMP1, 4(RA)
> +  |.endif
>    |  b ->fff_res
>    |
>    |//-- Base library: catch errors ----------------------------------------
> @@ -1577,19 +1768,32 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |.ffunc xpcall
>    |  cmplwi NARGS8:RC, 16
> -  |   lwz CARG4, 8(BASE)
> +  |   lwz CARG3, 8(BASE)
> +  |.if FPU
>    |    lfd FARG2, 8(BASE)
>    |    lfd FARG1, 0(BASE)
> +  |.else
> +  |    lwz CARG1, 0(BASE)
> +  |    lwz CARG2, 4(BASE)
> +  |    lwz CARG4, 12(BASE)
> +  |.endif
>    |  blt ->fff_fallback
>    |  lbz TMP1, DISPATCH_GL(hookmask)(DISPATCH)
>    |   mr TMP2, BASE
> -  |  checkfunc CARG4; bne ->fff_fallback  // Traceback must be a function.
> +  |  checkfunc CARG3; bne ->fff_fallback  // Traceback must be a function.
>    |   la BASE, 16(BASE)
>    |  // Remember active hook before pcall.
>    |  rlwinm TMP1, TMP1, 32-HOOK_ACTIVE_SHIFT, 31, 31
> +  |.if FPU
>    |    stfd FARG2, 0(TMP2)		// Swap function and traceback.
> -  |  subi NARGS8:RC, NARGS8:RC, 16
>    |    stfd FARG1, 8(TMP2)
> +  |.else
> +  |    stw CARG3, 0(TMP2)
> +  |    stw CARG4, 4(TMP2)
> +  |    stw CARG1, 8(TMP2)
> +  |    stw CARG2, 12(TMP2)
> +  |.endif
> +  |  subi NARGS8:RC, NARGS8:RC, 16
>    |  addi PC, TMP1, 16+FRAME_PCALL
>    |  b ->vm_call_dispatch
>    |
> @@ -1632,9 +1836,21 @@ static void build_subroutines(BuildCtx *ctx)
>    |  stp BASE, L->top
>    |2:  // Move args to coroutine.
>    |  cmpw TMP1, NARGS8:RC
> +  |.if FPU
>    |   lfdx f0, BASE, TMP1
> +  |.else
> +  |   add CARG3, BASE, TMP1
> +  |   lwz TMP2, 0(CARG3)
> +  |   lwz TMP3, 4(CARG3)
> +  |.endif
>    |  beq >3
> +  |.if FPU
>    |   stfdx f0, CARG2, TMP1
> +  |.else
> +  |   add CARG3, CARG2, TMP1
> +  |   stw TMP2, 0(CARG3)
> +  |   stw TMP3, 4(CARG3)
> +  |.endif
>    |  addi TMP1, TMP1, 8
>    |  b <2
>    |3:
> @@ -1665,8 +1881,17 @@ static void build_subroutines(BuildCtx *ctx)
>    |   stp TMP2, L:SAVE0->top		// Clear coroutine stack.
>    |5:  // Move results from coroutine.
>    |  cmplw TMP1, TMP3
> +  |.if FPU
>    |   lfdx f0, TMP2, TMP1
>    |   stfdx f0, BASE, TMP1
> +  |.else
> +  |   add CARG3, TMP2, TMP1
> +  |   lwz CARG1, 0(CARG3)
> +  |   lwz CARG2, 4(CARG3)
> +  |   add CARG3, BASE, TMP1
> +  |   stw CARG1, 0(CARG3)
> +  |   stw CARG2, 4(CARG3)
> +  |.endif
>    |    addi TMP1, TMP1, 8
>    |  bne <5
>    |6:
> @@ -1691,12 +1916,22 @@ static void build_subroutines(BuildCtx *ctx)
>    |  andix. TMP0, PC, FRAME_TYPE
>    |  la TMP3, -8(TMP3)
>    |   li TMP1, LJ_TFALSE
> +  |.if FPU
>    |  lfd f0, 0(TMP3)
> +  |.else
> +  |  lwz CARG1, 0(TMP3)
> +  |  lwz CARG2, 4(TMP3)
> +  |.endif
>    |   stp TMP3, L:SAVE0->top		// Remove error from coroutine stack.
>    |    li RD, (2+1)*8
>    |   stw TMP1, -8(BASE)		// Prepend false to results.
>    |    la RA, -8(BASE)
> +  |.if FPU
>    |  stfd f0, 0(BASE)			// Copy error message.
> +  |.else
> +  |  stw CARG1, 0(BASE)			// Copy error message.
> +  |  stw CARG2, 4(BASE)
> +  |.endif
>    |  b <7
>    |.else
>    |  mr CARG1, L
> @@ -1875,7 +2110,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |  lus CARG1, 0x8000			// -(2^31).
>    |  beqy ->fff_resi
>    |5:
> +  |.if FPU
>    |  lfd FARG1, 0(BASE)
> +  |.else
> +  |  lwz CARG1, 0(BASE)
> +  |  lwz CARG2, 4(BASE)
> +  |.endif
>    |  blex func
>    |  b ->fff_resn
>    |.endmacro
> @@ -1899,10 +2139,14 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |.ffunc math_log
>    |  cmplwi NARGS8:RC, 8
> -  |   lwz CARG3, 0(BASE)
> -  |    lfd FARG1, 0(BASE)
> +  |   lwz CARG1, 0(BASE)
>    |  bne ->fff_fallback			// Need exactly 1 argument.
> -  |  checknum CARG3; bge ->fff_fallback
> +  |  checknum CARG1; bge ->fff_fallback
> +  |.if FPU
> +  |  lfd FARG1, 0(BASE)
> +  |.else
> +  |  lwz CARG2, 4(BASE)
> +  |.endif
>    |  blex log
>    |  b ->fff_resn
>    |
> @@ -1924,17 +2168,24 @@ static void build_subroutines(BuildCtx *ctx)
>    |.if DUALNUM
>    |.ffunc math_ldexp
>    |  cmplwi NARGS8:RC, 16
> -  |   lwz CARG3, 0(BASE)
> +  |   lwz TMP0, 0(BASE)
> +  |.if FPU
>    |    lfd FARG1, 0(BASE)
> -  |   lwz CARG4, 8(BASE)
> +  |.else
> +  |    lwz CARG1, 0(BASE)
> +  |    lwz CARG2, 4(BASE)
> +  |.endif
> +  |   lwz TMP1, 8(BASE)
>    |.if GPR64
>    |    lwz CARG2, 12(BASE)
> -  |.else
> +  |.elif FPU
>    |    lwz CARG1, 12(BASE)
> +  |.else
> +  |    lwz CARG3, 12(BASE)
>    |.endif
>    |  blt ->fff_fallback
> -  |  checknum CARG3; bge ->fff_fallback
> -  |  checknum CARG4; bne ->fff_fallback
> +  |  checknum TMP0; bge ->fff_fallback
> +  |  checknum TMP1; bne ->fff_fallback
>    |.else
>    |.ffunc_nn math_ldexp
>    |.if GPR64
> @@ -1949,8 +2200,10 @@ static void build_subroutines(BuildCtx *ctx)
>    |.ffunc_n math_frexp
>    |.if GPR64
>    |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
> -  |.else
> +  |.elif FPU
>    |  la CARG1, DISPATCH_GL(tmptv)(DISPATCH)
> +  |.else
> +  |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
>    |.endif
>    |   lwz PC, FRAME_PC(BASE)
>    |  blex frexp
> @@ -1959,7 +2212,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |.if not DUALNUM
>    |   tonum_i FARG2, TMP1
>    |.endif
> +  |.if FPU
>    |  stfd FARG1, 0(RA)
> +  |.else
> +  |  stw CRET1, 0(RA)
> +  |  stw CRET2, 4(RA)
> +  |.endif
>    |  li RD, (2+1)*8
>    |.if DUALNUM
>    |   stw TISNUM, 8(RA)
> @@ -1972,13 +2230,20 @@ static void build_subroutines(BuildCtx *ctx)
>    |.ffunc_n math_modf
>    |.if GPR64
>    |  la CARG2, -8(BASE)
> -  |.else
> +  |.elif FPU
>    |  la CARG1, -8(BASE)
> +  |.else
> +  |  la CARG3, -8(BASE)
>    |.endif
>    |   lwz PC, FRAME_PC(BASE)
>    |  blex modf
>    |   la RA, -8(BASE)
> +  |.if FPU
>    |  stfd FARG1, 0(BASE)
> +  |.else
> +  |  stw CRET1, 0(BASE)
> +  |  stw CRET2, 4(BASE)
> +  |.endif
>    |  li RD, (2+1)*8
>    |  b ->fff_res
>    |
> @@ -1986,13 +2251,13 @@ static void build_subroutines(BuildCtx *ctx)
>    |.if DUALNUM
>    |  .ffunc_1 name
>    |  checknum CARG3
> -  |   addi TMP1, BASE, 8
> -  |   add TMP2, BASE, NARGS8:RC
> +  |   addi SAVE0, BASE, 8
> +  |   add SAVE1, BASE, NARGS8:RC
>    |  bne >4
>    |1:  // Handle integers.
> -  |  lwz CARG4, 0(TMP1)
> -  |   cmplw cr1, TMP1, TMP2
> -  |  lwz CARG2, 4(TMP1)
> +  |  lwz CARG4, 0(SAVE0)
> +  |   cmplw cr1, SAVE0, SAVE1
> +  |  lwz CARG2, 4(SAVE0)
>    |   bge cr1, ->fff_resi
>    |  checknum CARG4
>    |   xoris TMP0, CARG1, 0x8000
> @@ -2009,36 +2274,76 @@ static void build_subroutines(BuildCtx *ctx)
>    |.if GPR64
>    |  rldicl CARG1, CARG1, 0, 32
>    |.endif
> -  |   addi TMP1, TMP1, 8
> +  |   addi SAVE0, SAVE0, 8
>    |  b <1
>    |3:
>    |  bge ->fff_fallback
>    |  // Convert intermediate result to number and continue below.
> +  |.if FPU
>    |  tonum_i FARG1, CARG1
> -  |  lfd FARG2, 0(TMP1)
> +  |  lfd FARG2, 0(SAVE0)
> +  |.else
> +  |  mr CARG2, CARG1
> +  |  bl ->vm_sfi2d_1
> +  |  lwz CARG3, 0(SAVE0)
> +  |  lwz CARG4, 4(SAVE0)
> +  |.endif
>    |  b >6
>    |4:
> +  |.if FPU
>    |   lfd FARG1, 0(BASE)
> +  |.else
> +  |   lwz CARG1, 0(BASE)
> +  |   lwz CARG2, 4(BASE)
> +  |.endif
>    |  bge ->fff_fallback
>    |5:  // Handle numbers.
> -  |  lwz CARG4, 0(TMP1)
> -  |   cmplw cr1, TMP1, TMP2
> -  |  lfd FARG2, 0(TMP1)
> +  |  lwz CARG3, 0(SAVE0)
> +  |   cmplw cr1, SAVE0, SAVE1
> +  |.if FPU
> +  |  lfd FARG2, 0(SAVE0)
> +  |.else
> +  |  lwz CARG4, 4(SAVE0)
> +  |.endif
>    |   bge cr1, ->fff_resn
> -  |  checknum CARG4; bge >7
> +  |  checknum CARG3; bge >7
>    |6:
> +  |   addi SAVE0, SAVE0, 8
> +  |.if FPU
>    |  fsub f0, FARG1, FARG2
> -  |   addi TMP1, TMP1, 8
>    |.if ismax
>    |  fsel FARG1, f0, FARG1, FARG2
>    |.else
>    |  fsel FARG1, f0, FARG2, FARG1
>    |.endif
> +  |.else
> +  |  stw CARG1, SFSAVE_1
> +  |  stw CARG2, SFSAVE_2
> +  |  stw CARG3, SFSAVE_3
> +  |  stw CARG4, SFSAVE_4
> +  |  blex __ledf2
> +  |  cmpwi CRET1, 0
> +  |.if ismax
> +  |  blt >8
> +  |.else
> +  |  bge >8
> +  |.endif
> +  |  lwz CARG1, SFSAVE_1
> +  |  lwz CARG2, SFSAVE_2
> +  |  b <5
> +  |8:
> +  |  lwz CARG1, SFSAVE_3
> +  |  lwz CARG2, SFSAVE_4
> +  |.endif
>    |  b <5
>    |7:  // Convert integer to number and continue above.
> -  |   lwz CARG2, 4(TMP1)
> +  |   lwz CARG3, 4(SAVE0)
>    |  bne ->fff_fallback
> -  |  tonum_i FARG2, CARG2
> +  |.if FPU
> +  |  tonum_i FARG2, CARG3
> +  |.else
> +  |  bl ->vm_sfi2d_2
> +  |.endif
>    |  b <6
>    |.else
>    |  .ffunc_n name
> @@ -2238,28 +2543,37 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |.macro .ffunc_bit_op, name, ins
>    |  .ffunc_bit name
> -  |  addi TMP1, BASE, 8
> -  |  add TMP2, BASE, NARGS8:RC
> +  |  addi SAVE0, BASE, 8
> +  |  add SAVE1, BASE, NARGS8:RC
>    |1:
> -  |  lwz CARG4, 0(TMP1)
> -  |   cmplw cr1, TMP1, TMP2
> +  |  lwz CARG4, 0(SAVE0)
> +  |   cmplw cr1, SAVE0, SAVE1
>    |.if DUALNUM
> -  |  lwz CARG2, 4(TMP1)
> +  |  lwz CARG2, 4(SAVE0)
>    |.else
> -  |  lfd FARG1, 0(TMP1)
> +  |  lfd FARG1, 0(SAVE0)
>    |.endif
>    |   bgey cr1, ->fff_resi
>    |  checknum CARG4
>    |.if DUALNUM
> +  |.if FPU
>    |  bnel ->fff_bitop_fb
>    |.else
> +  |  beq >3
> +  |  stw CARG1, SFSAVE_1
> +  |  bl ->fff_bitop_fb
> +  |  mr CARG2, CARG1
> +  |  lwz CARG1, SFSAVE_1
> +  |3:
> +  |.endif
> +  |.else
>    |  fadd FARG1, FARG1, TOBIT
>    |  bge ->fff_fallback
>    |  stfd FARG1, TMPD
>    |  lwz CARG2, TMPD_LO
>    |.endif
>    |  ins CARG1, CARG1, CARG2
> -  |   addi TMP1, TMP1, 8
> +  |   addi SAVE0, SAVE0, 8
>    |  b <1
>    |.endmacro
>    |
> @@ -2281,7 +2595,14 @@ static void build_subroutines(BuildCtx *ctx)
>    |.macro .ffunc_bit_sh, name, ins, shmod
>    |.if DUALNUM
>    |  .ffunc_2 bit_..name
> +  |.if FPU
>    |  checknum CARG3; bnel ->fff_tobit_fb
> +  |.else
> +  |  checknum CARG3; beq >1
> +  |  bl ->fff_tobit_fb
> +  |  lwz CARG2, 12(BASE)	// Conversion polluted CARG2.
> +  |1:
> +  |.endif
>    |  // Note: no inline conversion from number for 2nd argument!
>    |  checknum CARG4; bne ->fff_fallback
>    |.else
> @@ -2318,27 +2639,77 @@ static void build_subroutines(BuildCtx *ctx)
>    |->fff_resn:
>    |  lwz PC, FRAME_PC(BASE)
>    |  la RA, -8(BASE)
> +  |.if FPU
>    |  stfd FARG1, -8(BASE)
> +  |.else
> +  |  stw CARG1, -8(BASE)
> +  |  stw CARG2, -4(BASE)
> +  |.endif
>    |  b ->fff_res1
>    |
>    |// Fallback FP number to bit conversion.
>    |->fff_tobit_fb:
>    |.if DUALNUM
> +  |.if FPU
>    |  lfd FARG1, 0(BASE)
>    |  bgt ->fff_fallback
>    |  fadd FARG1, FARG1, TOBIT
>    |  stfd FARG1, TMPD
>    |  lwz CARG1, TMPD_LO
>    |  blr
> +  |.else
> +  |  bgt ->fff_fallback
> +  |  mr CARG2, CARG1
> +  |  mr CARG1, CARG3
> +  |// Modifies: CARG1, CARG2, TMP0, TMP1, TMP2.
> +  |->vm_tobit:
> +  |  slwi TMP2, CARG1, 1
> +  |  addis TMP2, TMP2, 0x0020
> +  |  cmpwi TMP2, 0
> +  |  bge >2
> +  |   li TMP1, 0x3e0
> +  |  srawi TMP2, TMP2, 21
> +  |   not TMP1, TMP1
> +  |  sub. TMP2, TMP1, TMP2
> +  |    cmpwi cr7, CARG1, 0
> +  |  blt >1
> +  |   slwi TMP1, CARG1, 11
> +  |    srwi TMP0, CARG2, 21
> +  |   oris TMP1, TMP1, 0x8000
> +  |   or TMP1, TMP1, TMP0
> +  |   srw CARG1, TMP1, TMP2
> +  |  bclr 4, 28			// Return if cr7[lt] == 0, no hint.
> +  |   neg CARG1, CARG1
> +  |  blr
> +  |1:
> +  |  addi TMP2, TMP2, 21
> +  |  srw TMP1, CARG2, TMP2
> +  |   slwi CARG2, CARG1, 12
> +  |  subfic TMP2, TMP2, 20
> +  |   slw TMP0, CARG2, TMP2
> +  |   or CARG1, TMP1, TMP0
> +  |  bclr 4, 28			// Return if cr7[lt] == 0, no hint.
> +  |   neg CARG1, CARG1
> +  |  blr
> +  |2:
> +  |  li CARG1, 0
> +  |  blr
> +  |.endif
>    |.endif
>    |->fff_bitop_fb:
>    |.if DUALNUM
> -  |  lfd FARG1, 0(TMP1)
> +  |.if FPU
> +  |  lfd FARG1, 0(SAVE0)
>    |  bgt ->fff_fallback
>    |  fadd FARG1, FARG1, TOBIT
>    |  stfd FARG1, TMPD
>    |  lwz CARG2, TMPD_LO
>    |  blr
> +  |.else
> +  |  bgt ->fff_fallback
> +  |  mr CARG1, CARG4
> +  |  b ->vm_tobit
> +  |.endif
>    |.endif
>    |
>    |//-----------------------------------------------------------------------
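To follow the integer-only `->vm_tobit` routine quoted above, here is a rough Python model of it, written instruction by instruction from the hunk. It is only a reading aid under my own assumptions: `srw`/`slw` are modeled as taking the low 6 bits of the shift amount and yielding 0 for amounts of 32 or more, and all names (`vm_tobit_model`, `double_words`) are hypothetical.

```python
import struct

M = 0xffffffff

def double_words(x):
    """(hi, lo) 32-bit words of a double, as loaded by the two lwz."""
    return struct.unpack(">II", struct.pack(">d", x))

def srw(v, n):
    n &= 63                       # assumed PPC semantics: 6-bit amount
    return (v >> n) if n < 32 else 0

def slw(v, n):
    n &= 63
    return ((v << n) & M) if n < 32 else 0

def vm_tobit_model(hi, lo):
    t = ((hi << 1) + 0x00200000) & M      # slwi + addis: exponent+1 on top
    if t < 0x80000000:                    # bge >2: |x| < 1, or Inf/NaN
        return 0
    shift = -0x3e1 - ((t >> 21) - 0x800)  # srawi/not/sub.: 31 - exponent
    if shift >= 0:                        # result fits in 32 bits
        m = ((hi << 11) & M) | 0x80000000 | (lo >> 21)
        r = srw(m, shift)
    else:                                 # label 1: keep low 32 bits only
        s = shift + 21
        r = srw(lo, s) | slw((hi << 12) & M, 20 - s)
    if hi & 0x80000000:                   # cr7[lt]: negate for negatives
        r = (-r) & M
    return r - (1 << 32) if r & 0x80000000 else r

if __name__ == "__main__":
    for x in (5.0, -5.0, 2.9, 0.5, 4294967303.0):
        print(x, vm_tobit_model(*double_words(x)))
```

In this model fractional inputs are truncated toward zero rather than rounded, so it may be worth double-checking that this matches what the tests expect from the soft-float build.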
> @@ -2531,10 +2902,21 @@ static void build_subroutines(BuildCtx *ctx)
>    |  decode_RA8 RC, INS			// Call base.
>    |   beq >2
>    |1:  // Move results down.
> +  |.if FPU
>    |  lfd f0, 0(RA)
> +  |.else
> +  |  lwz CARG1, 0(RA)
> +  |  lwz CARG2, 4(RA)
> +  |.endif
>    |   addic. TMP1, TMP1, -8
>    |    addi RA, RA, 8
> +  |.if FPU
>    |  stfdx f0, BASE, RC
> +  |.else
> +  |  add CARG3, BASE, RC
> +  |  stw CARG1, 0(CARG3)
> +  |  stw CARG2, 4(CARG3)
> +  |.endif
>    |    addi RC, RC, 8
>    |   bne <1
>    |2:
> @@ -2587,10 +2969,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |//-----------------------------------------------------------------------
>    |
>    |.macro savex_, a, b, c, d
> +  |.if FPU
>    |  stfd f..a, 16+a*8(sp)
>    |  stfd f..b, 16+b*8(sp)
>    |  stfd f..c, 16+c*8(sp)
>    |  stfd f..d, 16+d*8(sp)
> +  |.endif
>    |.endmacro
>    |
>    |->vm_exit_handler:
> @@ -2662,16 +3046,16 @@ static void build_subroutines(BuildCtx *ctx)
>    |  lwz KBASE, PC2PROTO(k)(TMP1)
>    |  // Setup type comparison constants.
>    |  li TISNUM, LJ_TISNUM
> -  |  lus TMP3, 0x59c0			// TOBIT = 2^52 + 2^51 (float).
> -  |  stw TMP3, TMPD
> +  |  .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |  .FPU stw TMP3, TMPD
>    |  li ZERO, 0
> -  |  ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |  lfs TOBIT, TMPD
> -  |  stw TMP3, TMPD
> -  |  lus TMP0, 0x4338			// Hiword of 2^52 + 2^51 (double)
> +  |  .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |  .FPU lfs TOBIT, TMPD
> +  |  .FPU stw TMP3, TMPD
> +  |  .FPU lus TMP0, 0x4338			// Hiword of 2^52 + 2^51 (double)
>    |    li TISNIL, LJ_TNIL
> -  |  stw TMP0, TONUM_HI
> -  |  lfs TONUM, TMPD
> +  |  .FPU stw TMP0, TONUM_HI
> +  |  .FPU lfs TONUM, TMPD
>    |  // Modified copy of ins_next which handles function header dispatch, too.
>    |  lwz INS, 0(PC)
>    |   addi PC, PC, 4
> @@ -2716,7 +3100,35 @@ static void build_subroutines(BuildCtx *ctx)
>    |//-- Math helper functions ----------------------------------------------
>    |//-----------------------------------------------------------------------
>    |
> -  |// NYI: Use internal implementations of floor, ceil, trunc.
> +  |// NYI: Use internal implementations of floor, ceil, trunc, sfcmp.
> +  |
> +  |.macro sfi2d, AHI, ALO
> +  |.if not FPU
> +  |  mr. AHI, ALO
> +  |  bclr 12, 2				// Handle zero first.
> +  |  srawi TMP0, ALO, 31
> +  |  xor TMP1, ALO, TMP0
> +  |  sub TMP1, TMP1, TMP0		// Absolute value in TMP1.
> +  |  cntlzw AHI, TMP1
> +  |  andix. TMP0, TMP0, 0x800		// Mask sign bit.
> +  |  slw TMP1, TMP1, AHI		// Align mantissa left with leading 1.
> +  |  subfic AHI, AHI, 0x3ff+31-1	// Exponent -1 in AHI.
> +  |  slwi ALO, TMP1, 21
> +  |  or AHI, AHI, TMP0			// Sign | Exponent.
> +  |  srwi TMP1, TMP1, 11
> +  |  slwi AHI, AHI, 20			// Align left.
> +  |  add AHI, AHI, TMP1			// Add mantissa, increment exponent.
> +  |  blr
> +  |.endif
> +  |.endmacro
> +  |
> +  |// Input: CARG2. Output: CARG1, CARG2. Temporaries: TMP0, TMP1.
> +  |->vm_sfi2d_1:
> +  |  sfi2d CARG1, CARG2
> +  |
> +  |// Input: CARG4. Output: CARG3, CARG4. Temporaries: TMP0, TMP1.
> +  |->vm_sfi2d_2:
> +  |  sfi2d CARG3, CARG4
>    |
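For anyone checking the `sfi2d` macro above by hand: as far as I can tell it builds the two 32-bit words of an IEEE-754 double from a signed 32-bit integer (normalize with `cntlzw`, place "exponent minus one" and the sign, then let the leading mantissa bit carry into the exponent). Below is a minimal Python sketch of that reading, compared against the host's own int-to-double conversion. The names `sfi2d_model` and `double_words` are invented for this note.

```python
import struct

M = 0xffffffff

def double_words(x):
    """(hi, lo) 32-bit words of a double in big-endian word order."""
    return struct.unpack(">II", struct.pack(">d", x))

def sfi2d_model(i):
    """Return (hi, lo) of the double equal to the int32 value i."""
    assert -(1 << 31) <= i < (1 << 31)
    if i == 0:                          # mr. / bclr: zero handled first
        return 0, 0
    sign = 0x800 if i < 0 else 0        # srawi + andix. 0x800
    mag = abs(i)                        # xor/sub: absolute value
    lz = 32 - mag.bit_length()          # cntlzw
    mag = (mag << lz) & M               # leading 1 now at bit 31
    hi = (0x3ff + 31 - 1 - lz) | sign   # exponent - 1, sign at bit 11
    lo = (mag << 21) & M                # low 11 mantissa bits
    hi = ((hi << 20) + (mag >> 11)) & M # leading 1 bumps the exponent
    return hi, lo

if __name__ == "__main__":
    for i in (1, -1, 12345, 2**31 - 1, -2**31):
        print(i, sfi2d_model(i) == double_words(float(i)))
```

The final `add` relies on the implicit leading 1 landing on bit 20, which is why the macro stores "exponent minus one" first.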
>    |->vm_modi:
>    |  divwo. TMP0, CARG1, CARG2
> @@ -2784,21 +3196,21 @@ static void build_subroutines(BuildCtx *ctx)
>    |   addi DISPATCH, r12, GG_G2DISP
>    |  stw r11, CTSTATE->cb.slot
>    |  stw r3, CTSTATE->cb.gpr[0]
> -  |   stfd f1, CTSTATE->cb.fpr[0]
> +  |   .FPU stfd f1, CTSTATE->cb.fpr[0]
>    |  stw r4, CTSTATE->cb.gpr[1]
> -  |   stfd f2, CTSTATE->cb.fpr[1]
> +  |   .FPU stfd f2, CTSTATE->cb.fpr[1]
>    |  stw r5, CTSTATE->cb.gpr[2]
> -  |   stfd f3, CTSTATE->cb.fpr[2]
> +  |   .FPU stfd f3, CTSTATE->cb.fpr[2]
>    |  stw r6, CTSTATE->cb.gpr[3]
> -  |   stfd f4, CTSTATE->cb.fpr[3]
> +  |   .FPU stfd f4, CTSTATE->cb.fpr[3]
>    |  stw r7, CTSTATE->cb.gpr[4]
> -  |   stfd f5, CTSTATE->cb.fpr[4]
> +  |   .FPU stfd f5, CTSTATE->cb.fpr[4]
>    |  stw r8, CTSTATE->cb.gpr[5]
> -  |   stfd f6, CTSTATE->cb.fpr[5]
> +  |   .FPU stfd f6, CTSTATE->cb.fpr[5]
>    |  stw r9, CTSTATE->cb.gpr[6]
> -  |   stfd f7, CTSTATE->cb.fpr[6]
> +  |   .FPU stfd f7, CTSTATE->cb.fpr[6]
>    |  stw r10, CTSTATE->cb.gpr[7]
> -  |   stfd f8, CTSTATE->cb.fpr[7]
> +  |   .FPU stfd f8, CTSTATE->cb.fpr[7]
>    |  addi TMP0, sp, CFRAME_SPACE+8
>    |  stw TMP0, CTSTATE->cb.stack
>    |   mr CARG1, CTSTATE
> @@ -2809,21 +3221,21 @@ static void build_subroutines(BuildCtx *ctx)
>    |  lp BASE, L:CRET1->base
>    |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>    |  lp RC, L:CRET1->top
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>    |     li ZERO, 0
>    |   mr L, CRET1
> -  |     stw TMP3, TMPD
> -  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
>    |  lwz LFUNC:RB, FRAME_FUNC(BASE)
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |     stw TMP0, TONUM_HI
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU stw TMP0, TONUM_HI
>    |     li TISNIL, LJ_TNIL
>    |    li_vmstate INTERP
> -  |     lfs TOBIT, TMPD
> -  |     stw TMP3, TMPD
> +  |     .FPU lfs TOBIT, TMPD
> +  |     .FPU stw TMP3, TMPD
>    |  sub RC, RC, BASE
>    |    st_vmstate
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>    |  ins_callt
>    |.endif
>    |
> @@ -2837,7 +3249,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  mr CARG2, RA
>    |  bl extern lj_ccallback_leave	// (CTState *cts, TValue *o)
>    |  lwz CRET1, CTSTATE->cb.gpr[0]
> -  |  lfd FARG1, CTSTATE->cb.fpr[0]
> +  |  .FPU lfd FARG1, CTSTATE->cb.fpr[0]
>    |  lwz CRET2, CTSTATE->cb.gpr[1]
>    |  b ->vm_leave_unw
>    |.endif
> @@ -2871,14 +3283,14 @@ static void build_subroutines(BuildCtx *ctx)
>    |  bge <1
>    |2:
>    |  bney cr1, >3
> -  |  lfd f1, CCSTATE->fpr[0]
> -  |  lfd f2, CCSTATE->fpr[1]
> -  |  lfd f3, CCSTATE->fpr[2]
> -  |  lfd f4, CCSTATE->fpr[3]
> -  |  lfd f5, CCSTATE->fpr[4]
> -  |  lfd f6, CCSTATE->fpr[5]
> -  |  lfd f7, CCSTATE->fpr[6]
> -  |  lfd f8, CCSTATE->fpr[7]
> +  |  .FPU lfd f1, CCSTATE->fpr[0]
> +  |  .FPU lfd f2, CCSTATE->fpr[1]
> +  |  .FPU lfd f3, CCSTATE->fpr[2]
> +  |  .FPU lfd f4, CCSTATE->fpr[3]
> +  |  .FPU lfd f5, CCSTATE->fpr[4]
> +  |  .FPU lfd f6, CCSTATE->fpr[5]
> +  |  .FPU lfd f7, CCSTATE->fpr[6]
> +  |  .FPU lfd f8, CCSTATE->fpr[7]
>    |3:
>    |   lp TMP0, CCSTATE->func
>    |  lwz CARG2, CCSTATE->gpr[1]
> @@ -2895,7 +3307,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  lwz TMP2, -4(r14)
>    |   lwz TMP0, 4(r14)
>    |  stw CARG1, CCSTATE:TMP1->gpr[0]
> -  |  stfd FARG1, CCSTATE:TMP1->fpr[0]
> +  |  .FPU stfd FARG1, CCSTATE:TMP1->fpr[0]
>    |  stw CARG2, CCSTATE:TMP1->gpr[1]
>    |   mtlr TMP0
>    |  stw CARG3, CCSTATE:TMP1->gpr[2]
> @@ -2924,19 +3336,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>    case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
>      |  // RA = src1*8, RD = src2*8, JMP with RD = target
>      |.if DUALNUM
> -    |  lwzux TMP0, RA, BASE
> +    |  lwzux CARG1, RA, BASE
>      |    addi PC, PC, 4
>      |   lwz CARG2, 4(RA)
> -    |  lwzux TMP1, RD, BASE
> +    |  lwzux CARG3, RD, BASE
>      |    lwz TMP2, -4(PC)
> -    |  checknum cr0, TMP0
> -    |   lwz CARG3, 4(RD)
> +    |  checknum cr0, CARG1
> +    |   lwz CARG4, 4(RD)
>      |    decode_RD4 TMP2, TMP2
> -    |  checknum cr1, TMP1
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |  checknum cr1, CARG3
> +    |    addis SAVE0, TMP2, -(BCBIAS_J*4 >> 16)
>      |  bne cr0, >7
>      |  bne cr1, >8
> -    |   cmpw CARG2, CARG3
> +    |   cmpw CARG2, CARG4
>      if (op == BC_ISLT) {
>        |  bge >2
>      } else if (op == BC_ISGE) {
> @@ -2947,28 +3359,41 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  ble >2
>      }
>      |1:
> -    |  add PC, PC, TMP2
> +    |  add PC, PC, SAVE0
>      |2:
>      |  ins_next
>      |
>      |7:  // RA is not an integer.
>      |  bgt cr0, ->vmeta_comp
>      |  // RA is a number.
> -    |   lfd f0, 0(RA)
> +    |   .FPU lfd f0, 0(RA)
>      |  bgt cr1, ->vmeta_comp
>      |  blt cr1, >4
>      |  // RA is a number, RD is an integer.
> -    |  tonum_i f1, CARG3
> +    |.if FPU
> +    |  tonum_i f1, CARG4
> +    |.else
> +    |  bl ->vm_sfi2d_2
> +    |.endif
>      |  b >5
>      |
>      |8: // RA is an integer, RD is not an integer.
>      |  bgt cr1, ->vmeta_comp
>      |  // RA is an integer, RD is a number.
> +    |.if FPU
>      |  tonum_i f0, CARG2
> +    |.else
> +    |  bl ->vm_sfi2d_1
> +    |.endif
>      |4:
> -    |  lfd f1, 0(RD)
> +    |  .FPU lfd f1, 0(RD)
>      |5:
> +    |.if FPU
>      |  fcmpu cr0, f0, f1
> +    |.else
> +    |  blex __ledf2
> +    |  cmpwi CRET1, 0
> +    |.endif
>      if (op == BC_ISLT) {
>        |  bge <2
>      } else if (op == BC_ISGE) {
> @@ -3016,42 +3441,42 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      vk = op == BC_ISEQV;
>      |  // RA = src1*8, RD = src2*8, JMP with RD = target
>      |.if DUALNUM
> -    |  lwzux TMP0, RA, BASE
> +    |  lwzux CARG1, RA, BASE
>      |    addi PC, PC, 4
>      |   lwz CARG2, 4(RA)
> -    |  lwzux TMP1, RD, BASE
> -    |  checknum cr0, TMP0
> -    |    lwz TMP2, -4(PC)
> -    |  checknum cr1, TMP1
> -    |    decode_RD4 TMP2, TMP2
> -    |   lwz CARG3, 4(RD)
> +    |  lwzux CARG3, RD, BASE
> +    |  checknum cr0, CARG1
> +    |    lwz SAVE0, -4(PC)
> +    |  checknum cr1, CARG3
> +    |    decode_RD4 SAVE0, SAVE0
> +    |   lwz CARG4, 4(RD)
>      |  cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>      if (vk) {
>        |  ble cr7, ->BC_ISEQN_Z
>      } else {
>        |  ble cr7, ->BC_ISNEN_Z
>      }
>      |.else
> -    |  lwzux TMP0, RA, BASE
> -    |   lwz TMP2, 0(PC)
> +    |  lwzux CARG1, RA, BASE
> +    |   lwz SAVE0, 0(PC)
>      |    lfd f0, 0(RA)
>      |   addi PC, PC, 4
> -    |  lwzux TMP1, RD, BASE
> -    |  checknum cr0, TMP0
> -    |   decode_RD4 TMP2, TMP2
> +    |  lwzux CARG3, RD, BASE
> +    |  checknum cr0, CARG1
> +    |   decode_RD4 SAVE0, SAVE0
>      |    lfd f1, 0(RD)
> -    |  checknum cr1, TMP1
> -    |   addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |  checknum cr1, CARG3
> +    |   addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>      |  bge cr0, >5
>      |  bge cr1, >5
>      |  fcmpu cr0, f0, f1
>      if (vk) {
>        |  bne >1
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>      } else {
>        |  beq >1
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>      }
>      |1:
>      |  ins_next
> @@ -3059,36 +3484,36 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |5:  // Either or both types are not numbers.
>      |.if not DUALNUM
>      |    lwz CARG2, 4(RA)
> -    |    lwz CARG3, 4(RD)
> +    |    lwz CARG4, 4(RD)
>      |.endif
>      |.if FFI
> -    |  cmpwi cr7, TMP0, LJ_TCDATA
> -    |  cmpwi cr5, TMP1, LJ_TCDATA
> +    |  cmpwi cr7, CARG1, LJ_TCDATA
> +    |  cmpwi cr5, CARG3, LJ_TCDATA
>      |.endif
> -    |   not TMP3, TMP0
> -    |  cmplw TMP0, TMP1
> -    |   cmplwi cr1, TMP3, ~LJ_TISPRI		// Primitive?
> +    |   not TMP2, CARG1
> +    |  cmplw CARG1, CARG3
> +    |   cmplwi cr1, TMP2, ~LJ_TISPRI		// Primitive?
>      |.if FFI
>      |  cror 4*cr7+eq, 4*cr7+eq, 4*cr5+eq
>      |.endif
> -    |   cmplwi cr6, TMP3, ~LJ_TISTABUD		// Table or userdata?
> +    |   cmplwi cr6, TMP2, ~LJ_TISTABUD		// Table or userdata?
>      |.if FFI
>      |  beq cr7, ->vmeta_equal_cd
>      |.endif
> -    |    cmplw cr5, CARG2, CARG3
> +    |    cmplw cr5, CARG2, CARG4
>      |  crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt	// 2: Same type and primitive.
>      |  crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq	// 1: Same tv or different type.
>      |  crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq	// 0: Same type and same tv.
> -    |   mr SAVE0, PC
> +    |   mr SAVE1, PC
>      |  cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt	// 0 or 2.
>      |  cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt	// 1 or 2.
>      if (vk) {
>        |  bne cr0, >6
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>        |6:
>      } else {
>        |  beq cr0, >6
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>        |6:
>      }
>      |.if DUALNUM
> @@ -3103,6 +3528,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |
>      |  // Different tables or userdatas. Need to check __eq metamethod.
>      |  // Field metatable must be at same offset for GCtab and GCudata!
> +    |   mr CARG3, CARG4
>      |  lwz TAB:TMP2, TAB:CARG2->metatable
>      |   li CARG4, 1-vk			// ne = 0 or 1.
>      |  cmplwi TAB:TMP2, 0
> @@ -3110,7 +3536,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  lbz TMP2, TAB:TMP2->nomm
>      |  andix. TMP2, TMP2, 1<<MM_eq
>      |  bne <1				// Or 'no __eq' flag set?
> -    |  mr PC, SAVE0			// Restore old PC.
> +    |  mr PC, SAVE1			// Restore old PC.
>      |  b ->vmeta_equal			// Handle __eq metamethod.
>      break;
>  
> @@ -3151,16 +3577,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      vk = op == BC_ISEQN;
>      |  // RA = src*8, RD = num_const*8, JMP with RD = target
>      |.if DUALNUM
> -    |  lwzux TMP0, RA, BASE
> +    |  lwzux CARG1, RA, BASE
>      |    addi PC, PC, 4
>      |   lwz CARG2, 4(RA)
> -    |  lwzux TMP1, RD, KBASE
> -    |  checknum cr0, TMP0
> -    |    lwz TMP2, -4(PC)
> -    |  checknum cr1, TMP1
> -    |    decode_RD4 TMP2, TMP2
> -    |   lwz CARG3, 4(RD)
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |  lwzux CARG3, RD, KBASE
> +    |  checknum cr0, CARG1
> +    |    lwz SAVE0, -4(PC)
> +    |  checknum cr1, CARG3
> +    |    decode_RD4 SAVE0, SAVE0
> +    |   lwz CARG4, 4(RD)
> +    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>      if (vk) {
>        |->BC_ISEQN_Z:
>      } else {
> @@ -3168,7 +3594,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      }
>      |  bne cr0, >7
>      |  bne cr1, >8
> -    |   cmpw CARG2, CARG3
> +    |   cmpw CARG2, CARG4
>      |4:
>      |.else
>      if (vk) {
> @@ -3176,20 +3602,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      } else {
>        |->BC_ISNEN_Z:  // Dummy label.
>      }
> -    |  lwzx TMP0, BASE, RA
> +    |  lwzx CARG1, BASE, RA
>      |    addi PC, PC, 4
>      |   lfdx f0, BASE, RA
> -    |    lwz TMP2, -4(PC)
> +    |    lwz SAVE0, -4(PC)
>      |  lfdx f1, KBASE, RD
> -    |    decode_RD4 TMP2, TMP2
> -    |  checknum TMP0
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |    decode_RD4 SAVE0, SAVE0
> +    |  checknum CARG1
> +    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>      |  bge >3
>      |  fcmpu cr0, f0, f1
>      |.endif
>      if (vk) {
>        |  bne >1
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>        |1:
>        |.if not FFI
>        |3:
> @@ -3200,13 +3626,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |.if not FFI
>        |3:
>        |.endif
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>        |2:
>      }
>      |  ins_next
>      |.if FFI
>      |3:
> -    |  cmpwi TMP0, LJ_TCDATA
> +    |  cmpwi CARG1, LJ_TCDATA
>      |  beq ->vmeta_equal_cd
>      |  b <1
>      |.endif
> @@ -3214,18 +3640,31 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |7:  // RA is not an integer.
>      |  bge cr0, <3
>      |  // RA is a number.
> -    |   lfd f0, 0(RA)
> +    |   .FPU lfd f0, 0(RA)
>      |  blt cr1, >1
>      |  // RA is a number, RD is an integer.
> -    |  tonum_i f1, CARG3
> +    |.if FPU
> +    |  tonum_i f1, CARG4
> +    |.else
> +    |  bl ->vm_sfi2d_2
> +    |.endif
>      |  b >2
>      |
>      |8: // RA is an integer, RD is a number.
> +    |.if FPU
>      |  tonum_i f0, CARG2
> +    |.else
> +    |  bl ->vm_sfi2d_1
> +    |.endif
>      |1:
> -    |  lfd f1, 0(RD)
> +    |  .FPU lfd f1, 0(RD)
>      |2:
> +    |.if FPU
>      |  fcmpu cr0, f0, f1
> +    |.else
> +    |  blex __ledf2
> +    |  cmpwi CRET1, 0
> +    |.endif
>      |  b <4
>      |.endif
>      break;
> @@ -3280,7 +3719,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  add PC, PC, TMP2
>      } else {
>        |  li TMP1, LJ_TFALSE
> +      |.if FPU
>        |   lfdx f0, BASE, RD
> +      |.else
> +      |   lwzux CARG1, RD, BASE
> +      |   lwz CARG2, 4(RD)
> +      |.endif
>        |  cmplw TMP0, TMP1
>        if (op == BC_ISTC) {
>  	|  bge >1
> @@ -3289,7 +3733,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        }
>        |  addis PC, PC, -(BCBIAS_J*4 >> 16)
>        |  decode_RD4 TMP2, INS
> +      |.if FPU
>        |   stfdx f0, BASE, RA
> +      |.else
> +      |   stwux CARG1, RA, BASE
> +      |   stw CARG2, 4(RA)
> +      |.endif
>        |  add PC, PC, TMP2
>        |1:
>      }
> @@ -3324,8 +3773,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>    case BC_MOV:
>      |  // RA = dst*8, RD = src*8
>      |  ins_next1
> +    |.if FPU
>      |  lfdx f0, BASE, RD
>      |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwzux TMP0, RD, BASE
> +    |  lwz TMP1, 4(RD)
> +    |  stwux TMP0, RA, BASE
> +    |  stw TMP1, 4(RA)
> +    |.endif
>      |  ins_next2
>      break;
>    case BC_NOT:
> @@ -3427,44 +3883,65 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
>      ||switch (vk) {
>      ||case 0:
> -    |   lwzx TMP1, BASE, RB
> +    |   lwzx CARG1, BASE, RB
>      |   .if DUALNUM
> -    |     lwzx TMP2, KBASE, RC
> +    |     lwzx CARG3, KBASE, RC
>      |   .endif
> +    |   .if FPU
>      |    lfdx f14, BASE, RB
>      |    lfdx f15, KBASE, RC
> +    |   .else
> +    |    add TMP1, BASE, RB
> +    |    add TMP2, KBASE, RC
> +    |    lwz CARG2, 4(TMP1)
> +    |    lwz CARG4, 4(TMP2)
> +    |   .endif
>      |   .if DUALNUM
> -    |     checknum cr0, TMP1
> -    |     checknum cr1, TMP2
> +    |     checknum cr0, CARG1
> +    |     checknum cr1, CARG3
>      |     crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |     bge ->vmeta_arith_vn
>      |   .else
> -    |     checknum TMP1; bge ->vmeta_arith_vn
> +    |     checknum CARG1; bge ->vmeta_arith_vn
>      |   .endif
>      ||  break;
>      ||case 1:
> -    |   lwzx TMP1, BASE, RB
> +    |   lwzx CARG1, BASE, RB
>      |   .if DUALNUM
> -    |     lwzx TMP2, KBASE, RC
> +    |     lwzx CARG3, KBASE, RC
>      |   .endif
> +    |   .if FPU
>      |    lfdx f15, BASE, RB
>      |    lfdx f14, KBASE, RC
> +    |   .else
> +    |    add TMP1, BASE, RB
> +    |    add TMP2, KBASE, RC
> +    |    lwz CARG2, 4(TMP1)
> +    |    lwz CARG4, 4(TMP2)
> +    |   .endif
>      |   .if DUALNUM
> -    |     checknum cr0, TMP1
> -    |     checknum cr1, TMP2
> +    |     checknum cr0, CARG1
> +    |     checknum cr1, CARG3
>      |     crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |     bge ->vmeta_arith_nv
>      |   .else
> -    |     checknum TMP1; bge ->vmeta_arith_nv
> +    |     checknum CARG1; bge ->vmeta_arith_nv
>      |   .endif
>      ||  break;
>      ||default:
> -    |   lwzx TMP1, BASE, RB
> -    |   lwzx TMP2, BASE, RC
> +    |   lwzx CARG1, BASE, RB
> +    |   lwzx CARG3, BASE, RC
> +    |   .if FPU
>      |    lfdx f14, BASE, RB
>      |    lfdx f15, BASE, RC
> -    |   checknum cr0, TMP1
> -    |   checknum cr1, TMP2
> +    |   .else
> +    |    add TMP1, BASE, RB
> +    |    add TMP2, BASE, RC
> +    |    lwz CARG2, 4(TMP1)
> +    |    lwz CARG4, 4(TMP2)
> +    |   .endif
> +    |   checknum cr0, CARG1
> +    |   checknum cr1, CARG3
>      |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |   bge ->vmeta_arith_vv
>      ||  break;
> @@ -3498,48 +3975,78 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  fsub a, b, a			// b - floor(b/c)*c
>      |.endmacro
>      |
> +    |.macro sfpmod
> +    |->BC_MODVN_Z:
> +    |  stw CARG1, SFSAVE_1
> +    |  stw CARG2, SFSAVE_2
> +    |  mr SAVE0, CARG3
> +    |  mr SAVE1, CARG4
> +    |  blex __divdf3
> +    |  blex floor
> +    |  mr CARG3, SAVE0
> +    |  mr CARG4, SAVE1
> +    |  blex __muldf3
> +    |  mr CARG3, CRET1
> +    |  mr CARG4, CRET2
> +    |  lwz CARG1, SFSAVE_1
> +    |  lwz CARG2, SFSAVE_2
> +    |  blex __subdf3
> +    |.endmacro
> +    |
>      |.macro ins_arithfp, fpins
>      |  ins_arithpre
>      |.if "fpins" == "fpmod_"
>      |  b ->BC_MODVN_Z			// Avoid 3 copies. It's slow anyway.
> -    |.else
> +    |.elif FPU
>      |  fpins f0, f14, f15
>      |  ins_next1
>      |  stfdx f0, BASE, RA
>      |  ins_next2
> +    |.else
> +    |  blex __divdf3			// Only soft-float div uses this macro.
> +    |  ins_next1
> +    |  stwux CRET1, RA, BASE
> +    |  stw CRET2, 4(RA)
> +    |  ins_next2
>      |.endif
>      |.endmacro
>      |
> -    |.macro ins_arithdn, intins, fpins
> +    |.macro ins_arithdn, intins, fpins, fpcall
>      |  // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8
>      ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
>      ||switch (vk) {
>      ||case 0:
> -    |   lwzux TMP1, RB, BASE
> -    |   lwzux TMP2, RC, KBASE
> -    |    lwz CARG1, 4(RB)
> -    |   checknum cr0, TMP1
> -    |    lwz CARG2, 4(RC)
> +    |   lwzux CARG1, RB, BASE
> +    |   lwzux CARG3, RC, KBASE
> +    |    lwz CARG2, 4(RB)
> +    |   checknum cr0, CARG1
> +    |    lwz CARG4, 4(RC)
> +    |   checknum cr1, CARG3
>      ||  break;
>      ||case 1:
> -    |   lwzux TMP1, RB, BASE
> -    |   lwzux TMP2, RC, KBASE
> -    |    lwz CARG2, 4(RB)
> -    |   checknum cr0, TMP1
> -    |    lwz CARG1, 4(RC)
> +    |   lwzux CARG3, RB, BASE
> +    |   lwzux CARG1, RC, KBASE
> +    |    lwz CARG4, 4(RB)
> +    |   checknum cr0, CARG3
> +    |    lwz CARG2, 4(RC)
> +    |   checknum cr1, CARG1
>      ||  break;
>      ||default:
> -    |   lwzux TMP1, RB, BASE
> -    |   lwzux TMP2, RC, BASE
> -    |    lwz CARG1, 4(RB)
> -    |   checknum cr0, TMP1
> -    |    lwz CARG2, 4(RC)
> +    |   lwzux CARG1, RB, BASE
> +    |   lwzux CARG3, RC, BASE
> +    |    lwz CARG2, 4(RB)
> +    |   checknum cr0, CARG1
> +    |    lwz CARG4, 4(RC)
> +    |   checknum cr1, CARG3
>      ||  break;
>      ||}
> -    |  checknum cr1, TMP2
>      |  bne >5
>      |  bne cr1, >5
> -    |  intins CARG1, CARG1, CARG2
> +    |.if "intins" == "intmod"
> +    |  mr CARG1, CARG2
> +    |  mr CARG2, CARG4
> +    |.endif
> +    |  intins CARG1, CARG2, CARG4
>      |  bso >4
>      |1:
>      |  ins_next1
> @@ -3551,29 +4058,40 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  checkov TMP0, <1			// Ignore unrelated overflow.
>      |  ins_arithfallback b
>      |5:  // FP variant.
> +    |.if FPU
>      ||if (vk == 1) {
>      |  lfd f15, 0(RB)
> -    |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |  lfd f14, 0(RC)
>      ||} else {
>      |  lfd f14, 0(RB)
> -    |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |  lfd f15, 0(RC)
>      ||}
> +    |.endif
> +    |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |   ins_arithfallback bge
>      |.if "fpins" == "fpmod_"
>      |  b ->BC_MODVN_Z			// Avoid 3 copies. It's slow anyway.
>      |.else
> +    |.if FPU
>      |  fpins f0, f14, f15
> -    |  ins_next1
>      |  stfdx f0, BASE, RA
> +    |.else
> +    |.if "fpcall" == "sfpmod"
> +    |  sfpmod
> +    |.else
> +    |  blex fpcall
> +    |.endif
> +    |  stwux CRET1, RA, BASE
> +    |  stw CRET2, 4(RA)
> +    |.endif
> +    |  ins_next1
>      |  b <2
>      |.endif
>      |.endmacro
>      |
> -    |.macro ins_arith, intins, fpins
> +    |.macro ins_arith, intins, fpins, fpcall
>      |.if DUALNUM
> -    |  ins_arithdn intins, fpins
> +    |  ins_arithdn intins, fpins, fpcall
>      |.else
>      |  ins_arithfp fpins
>      |.endif
> @@ -3588,9 +4106,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  addo. TMP0, TMP0, TMP3
>      |  add y, a, b
>      |.endmacro
> -    |  ins_arith addo32., fadd
> +    |  ins_arith addo32., fadd, __adddf3
>      |.else
> -    |  ins_arith addo., fadd
> +    |  ins_arith addo., fadd, __adddf3
>      |.endif
>      break;
>    case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
> @@ -3602,36 +4120,48 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  subo. TMP0, TMP0, TMP3
>      |  sub y, a, b
>      |.endmacro
> -    |  ins_arith subo32., fsub
> +    |  ins_arith subo32., fsub, __subdf3
>      |.else
> -    |  ins_arith subo., fsub
> +    |  ins_arith subo., fsub, __subdf3
>      |.endif
>      break;
>    case BC_MULVN: case BC_MULNV: case BC_MULVV:
> -    |  ins_arith mullwo., fmul
> +    |  ins_arith mullwo., fmul, __muldf3
>      break;
>    case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
>      |  ins_arithfp fdiv
>      break;
>    case BC_MODVN:
> -    |  ins_arith intmod, fpmod
> +    |  ins_arith intmod, fpmod, sfpmod
>      break;
>    case BC_MODNV: case BC_MODVV:
> -    |  ins_arith intmod, fpmod_
> +    |  ins_arith intmod, fpmod_, sfpmod
>      break;
>    case BC_POW:
>      |  // NYI: (partial) integer arithmetic.
> -    |  lwzx TMP1, BASE, RB
> +    |  lwzx CARG1, BASE, RB
> +    |  lwzx CARG3, BASE, RC
> +    |.if FPU
>      |   lfdx FARG1, BASE, RB
> -    |  lwzx TMP2, BASE, RC
>      |   lfdx FARG2, BASE, RC
> -    |  checknum cr0, TMP1
> -    |  checknum cr1, TMP2
> +    |.else
> +    |   add TMP1, BASE, RB
> +    |   add TMP2, BASE, RC
> +    |   lwz CARG2, 4(TMP1)
> +    |   lwz CARG4, 4(TMP2)
> +    |.endif
> +    |  checknum cr0, CARG1
> +    |  checknum cr1, CARG3
>      |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |  bge ->vmeta_arith_vv
>      |  blex pow
>      |  ins_next1
> +    |.if FPU
>      |  stfdx FARG1, BASE, RA
> +    |.else
> +    |  stwux CARG1, RA, BASE
> +    |  stw CARG2, 4(RA)
> +    |.endif
>      |  ins_next2
>      break;
>  
> @@ -3651,8 +4181,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   lp BASE, L->base
>      |  bne ->vmeta_binop
>      |  ins_next1
> +    |.if FPU
>      |  lfdx f0, BASE, SAVE0		// Copy result from RB to RA.
>      |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwzux TMP0, SAVE0, BASE
> +    |  lwz TMP1, 4(SAVE0)
> +    |  stwux TMP0, RA, BASE
> +    |  stw TMP1, 4(RA)
> +    |.endif
>      |  ins_next2
>      break;
>  
> @@ -3715,8 +4252,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>    case BC_KNUM:
>      |  // RA = dst*8, RD = num_const*8
>      |  ins_next1
> +    |.if FPU
>      |  lfdx f0, KBASE, RD
>      |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwzux TMP0, RD, KBASE
> +    |  lwz TMP1, 4(RD)
> +    |  stwux TMP0, RA, BASE
> +    |  stw TMP1, 4(RA)
> +    |.endif
>      |  ins_next2
>      break;
>    case BC_KPRI:
> @@ -3749,8 +4293,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  lwzx UPVAL:RB, LFUNC:RB, RD
>      |  ins_next1
>      |  lwz TMP1, UPVAL:RB->v
> +    |.if FPU
>      |  lfd f0, 0(TMP1)
>      |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwz TMP2, 0(TMP1)
> +    |  lwz TMP3, 4(TMP1)
> +    |  stwux TMP2, RA, BASE
> +    |  stw TMP3, 4(RA)
> +    |.endif
>      |  ins_next2
>      break;
>    case BC_USETV:
> @@ -3758,14 +4309,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  lwz LFUNC:RB, FRAME_FUNC(BASE)
>      |    srwi RA, RA, 1
>      |    addi RA, RA, offsetof(GCfuncL, uvptr)
> +    |.if FPU
>      |   lfdux f0, RD, BASE
> +    |.else
> +    |   lwzux CARG1, RD, BASE
> +    |   lwz CARG3, 4(RD)
> +    |.endif
>      |  lwzx UPVAL:RB, LFUNC:RB, RA
>      |  lbz TMP3, UPVAL:RB->marked
>      |   lwz CARG2, UPVAL:RB->v
>      |  andix. TMP3, TMP3, LJ_GC_BLACK	// isblack(uv)
>      |    lbz TMP0, UPVAL:RB->closed
>      |   lwz TMP2, 0(RD)
> +    |.if FPU
>      |   stfd f0, 0(CARG2)
> +    |.else
> +    |   stw CARG1, 0(CARG2)
> +    |   stw CARG3, 4(CARG2)
> +    |.endif
>      |    cmplwi cr1, TMP0, 0
>      |   lwz TMP1, 4(RD)
>      |  cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -3821,11 +4382,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  lwz LFUNC:RB, FRAME_FUNC(BASE)
>      |   srwi RA, RA, 1
>      |   addi RA, RA, offsetof(GCfuncL, uvptr)
> +    |.if FPU
>      |    lfdx f0, KBASE, RD
> +    |.else
> +    |    lwzux TMP2, RD, KBASE
> +    |    lwz TMP3, 4(RD)
> +    |.endif
>      |  lwzx UPVAL:RB, LFUNC:RB, RA
>      |  ins_next1
>      |  lwz TMP1, UPVAL:RB->v
> +    |.if FPU
>      |  stfd f0, 0(TMP1)
> +    |.else
> +    |  stw TMP2, 0(TMP1)
> +    |  stw TMP3, 4(TMP1)
> +    |.endif
>      |  ins_next2
>      break;
>    case BC_USETP:
> @@ -3973,11 +4544,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.endif
>      |  ble ->vmeta_tgetv		// Integer key and in array part?
>      |  lwzx TMP0, TMP1, TMP2
> +    |.if FPU
>      |   lfdx f14, TMP1, TMP2
> +    |.else
> +    |   lwzux SAVE0, TMP1, TMP2
> +    |   lwz SAVE1, 4(TMP1)
> +    |.endif
>      |  checknil TMP0; beq >2
>      |1:
>      |  ins_next1
> +    |.if FPU
>      |   stfdx f14, BASE, RA
> +    |.else
> +    |   stwux SAVE0, RA, BASE
> +    |   stw SAVE1, 4(RA)
> +    |.endif
>      |  ins_next2
>      |
>      |2:  // Check for __index if table value is nil.
> @@ -4053,12 +4634,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  lwz TMP1, TAB:RB->asize
>      |   lwz TMP2, TAB:RB->array
>      |  cmplw TMP0, TMP1; bge ->vmeta_tgetb
> +    |.if FPU
>      |  lwzx TMP1, TMP2, RC
>      |   lfdx f0, TMP2, RC
> +    |.else
> +    |  lwzux TMP1, TMP2, RC
> +    |   lwz TMP3, 4(TMP2)
> +    |.endif
>      |  checknil TMP1; beq >5
>      |1:
>      |  ins_next1
> +    |.if FPU
>      |   stfdx f0, BASE, RA
> +    |.else
> +    |   stwux TMP1, RA, BASE
> +    |   stw TMP3, 4(RA)
> +    |.endif
>      |  ins_next2
>      |
>      |5:  // Check for __index if table value is nil.
> @@ -4088,10 +4679,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  cmplw TMP0, CARG2
>      |   slwi TMP2, CARG2, 3
>      |  ble ->vmeta_tgetr		// In array part?
> +    |.if FPU
>      |   lfdx f14, TMP1, TMP2
> +    |.else
> +    |   lwzux SAVE0, TMP2, TMP1
> +    |   lwz SAVE1, 4(TMP2)
> +    |.endif
>      |->BC_TGETR_Z:
>      |  ins_next1
> +    |.if FPU
>      |   stfdx f14, BASE, RA
> +    |.else
> +    |   stwux SAVE0, RA, BASE
> +    |   stw SAVE1, 4(RA)
> +    |.endif
>      |  ins_next2
>      break;
>  
> @@ -4132,11 +4733,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  ble ->vmeta_tsetv		// Integer key and in array part?
>      |   lwzx TMP2, TMP1, TMP0
>      |  lbz TMP3, TAB:RB->marked
> +    |.if FPU
>      |    lfdx f14, BASE, RA
> +    |.else
> +    |    add SAVE1, BASE, RA
> +    |    lwz SAVE0, 0(SAVE1)
> +    |    lwz SAVE1, 4(SAVE1)
> +    |.endif
>      |   checknil TMP2; beq >3
>      |1:
>      |  andix. TMP2, TMP3, LJ_GC_BLACK	// isblack(table)
> +    |.if FPU
>      |    stfdx f14, TMP1, TMP0
> +    |.else
> +    |    stwux SAVE0, TMP1, TMP0
> +    |    stw SAVE1, 4(TMP1)
> +    |.endif
>      |  bne >7
>      |2:
>      |  ins_next
> @@ -4177,7 +4789,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  lwz NODE:TMP2, TAB:RB->node
>      |    stb ZERO, TAB:RB->nomm		// Clear metamethod cache.
>      |  and TMP1, TMP1, TMP0		// idx = str->hash & tab->hmask
> +    |.if FPU
>      |    lfdx f14, BASE, RA
> +    |.else
> +    |    add CARG2, BASE, RA
> +    |    lwz SAVE0, 0(CARG2)
> +    |    lwz SAVE1, 4(CARG2)
> +    |.endif
>      |  slwi TMP0, TMP1, 5
>      |  slwi TMP1, TMP1, 3
>      |  sub TMP1, TMP0, TMP1
> @@ -4193,7 +4811,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |    checknil CARG2; beq >4		// Key found, but nil value?
>      |2:
>      |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
> +    |.if FPU
>      |    stfd f14, NODE:TMP2->val
> +    |.else
> +    |    stw SAVE0, NODE:TMP2->val.u32.hi
> +    |    stw SAVE1, NODE:TMP2->val.u32.lo
> +    |.endif
>      |  bne >7
>      |3:
>      |  ins_next
> @@ -4232,7 +4855,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  bl extern lj_tab_newkey		// (lua_State *L, GCtab *t, TValue *k)
>      |  // Returns TValue *.
>      |  lp BASE, L->base
> +    |.if FPU
>      |  stfd f14, 0(CRET1)
> +    |.else
> +    |  stw SAVE0, 0(CRET1)
> +    |  stw SAVE1, 4(CRET1)
> +    |.endif
>      |  b <3				// No 2nd write barrier needed.
>      |
>      |7:  // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4249,13 +4877,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   lwz TMP2, TAB:RB->array
>      |    lbz TMP3, TAB:RB->marked
>      |  cmplw TMP0, TMP1
> +    |.if FPU
>      |   lfdx f14, BASE, RA
> +    |.else
> +    |   add CARG2, BASE, RA
> +    |   lwz SAVE0, 0(CARG2)
> +    |   lwz SAVE1, 4(CARG2)
> +    |.endif
>      |  bge ->vmeta_tsetb
>      |  lwzx TMP1, TMP2, RC
>      |  checknil TMP1; beq >5
>      |1:
>      |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
> +    |.if FPU
>      |   stfdx f14, TMP2, RC
> +    |.else
> +    |   stwux SAVE0, RC, TMP2
> +    |   stw SAVE1, 4(RC)
> +    |.endif
>      |  bne >7
>      |2:
>      |  ins_next
> @@ -4295,10 +4934,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |2:
>      |  cmplw TMP0, CARG3
>      |   slwi TMP2, CARG3, 3
> +    |.if FPU
>      |   lfdx f14, BASE, RA
> +    |.else
> +    |  lwzux SAVE0, RA, BASE
> +    |  lwz SAVE1, 4(RA)
> +    |.endif
>      |  ble ->vmeta_tsetr		// In array part?
>      |  ins_next1
> +    |.if FPU
>      |   stfdx f14, TMP1, TMP2
> +    |.else
> +    |   stwux SAVE0, TMP1, TMP2
> +    |   stw SAVE1, 4(TMP1)
> +    |.endif
>      |  ins_next2
>      |
>      |7:  // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4328,10 +4977,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   add TMP1, TMP1, TMP0
>      |    andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
>      |3:  // Copy result slots to table.
> +    |.if FPU
>      |   lfd f0, 0(RA)
> +    |.else
> +    |   lwz SAVE0, 0(RA)
> +    |   lwz SAVE1, 4(RA)
> +    |.endif
>      |  addi RA, RA, 8
>      |  cmpw cr1, RA, TMP2
> +    |.if FPU
>      |   stfd f0, 0(TMP1)
> +    |.else
> +    |   stw SAVE0, 0(TMP1)
> +    |   stw SAVE1, 4(TMP1)
> +    |.endif
>      |    addi TMP1, TMP1, 8
>      |  blt cr1, <3
>      |  bne >7
> @@ -4398,9 +5057,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |    beq cr1, >3
>      |2:
>      |  addi TMP3, TMP2, 8
> +    |.if FPU
>      |   lfdx f0, RA, TMP2
> +    |.else
> +    |   add CARG3, RA, TMP2
> +    |   lwz CARG1, 0(CARG3)
> +    |   lwz CARG2, 4(CARG3)
> +    |.endif
>      |  cmplw cr1, TMP3, NARGS8:RC
> +    |.if FPU
>      |   stfdx f0, BASE, TMP2
> +    |.else
> +    |   stwux CARG1, TMP2, BASE
> +    |   stw CARG2, 4(TMP2)
> +    |.endif
>      |  mr TMP2, TMP3
>      |  bne cr1, <2
>      |3:
> @@ -4433,14 +5103,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  add BASE, BASE, RA
>      |  lwz TMP1, -24(BASE)
>      |   lwz LFUNC:RB, -20(BASE)
> +    |.if FPU
>      |    lfd f1, -8(BASE)
>      |    lfd f0, -16(BASE)
> +    |.else
> +    |    lwz CARG1, -8(BASE)
> +    |    lwz CARG2, -4(BASE)
> +    |    lwz CARG3, -16(BASE)
> +    |    lwz CARG4, -12(BASE)
> +    |.endif
>      |  stw TMP1, 0(BASE)		// Copy callable.
>      |   stw LFUNC:RB, 4(BASE)
>      |  checkfunc TMP1
> -    |    stfd f1, 16(BASE)		// Copy control var.
>      |     li NARGS8:RC, 16		// Iterators get 2 arguments.
> +    |.if FPU
> +    |    stfd f1, 16(BASE)		// Copy control var.
>      |    stfdu f0, 8(BASE)		// Copy state.
> +    |.else
> +    |    stw CARG1, 16(BASE)		// Copy control var.
> +    |    stw CARG2, 20(BASE)
> +    |    stwu CARG3, 8(BASE)		// Copy state.
> +    |    stw CARG4, 4(BASE)
> +    |.endif
>      |  bne ->vmeta_call
>      |  ins_call
>      break;
> @@ -4461,7 +5145,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   slwi TMP3, RC, 3
>      |  bge >5				// Index points after array part?
>      |  lwzx TMP2, TMP1, TMP3
> +    |.if FPU
>      |   lfdx f0, TMP1, TMP3
> +    |.else
> +    |   lwzux CARG1, TMP3, TMP1
> +    |   lwz CARG2, 4(TMP3)
> +    |.endif
>      |  checknil TMP2
>      |     lwz INS, -4(PC)
>      |  beq >4
> @@ -4473,7 +5162,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.endif
>      |    addi RC, RC, 1
>      |     addis TMP3, PC, -(BCBIAS_J*4 >> 16)
> +    |.if FPU
>      |  stfd f0, 8(RA)
> +    |.else
> +    |  stw CARG1, 8(RA)
> +    |  stw CARG2, 12(RA)
> +    |.endif
>      |     decode_RD4 TMP1, INS
>      |    stw RC, -4(RA)			// Update control var.
>      |     add PC, TMP1, TMP3
> @@ -4498,17 +5192,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   slwi RB, RC, 3
>      |   sub TMP3, TMP3, RB
>      |  lwzx RB, TMP2, TMP3
> +    |.if FPU
>      |  lfdx f0, TMP2, TMP3
> +    |.else
> +    |  add CARG3, TMP2, TMP3
> +    |  lwz CARG1, 0(CARG3)
> +    |  lwz CARG2, 4(CARG3)
> +    |.endif
>      |   add NODE:TMP3, TMP2, TMP3
>      |  checknil RB
>      |     lwz INS, -4(PC)
>      |  beq >7
> +    |.if FPU
>      |   lfd f1, NODE:TMP3->key
> +    |.else
> +    |   lwz CARG3, NODE:TMP3->key.u32.hi
> +    |   lwz CARG4, NODE:TMP3->key.u32.lo
> +    |.endif
>      |     addis TMP2, PC, -(BCBIAS_J*4 >> 16)
> +    |.if FPU
>      |  stfd f0, 8(RA)
> +    |.else
> +    |  stw CARG1, 8(RA)
> +    |  stw CARG2, 12(RA)
> +    |.endif
>      |    add RC, RC, TMP0
>      |     decode_RD4 TMP1, INS
> +    |.if FPU
>      |   stfd f1, 0(RA)
> +    |.else
> +    |   stw CARG3, 0(RA)
> +    |   stw CARG4, 4(RA)
> +    |.endif
>      |    addi RC, RC, 1
>      |     add PC, TMP1, TMP2
>      |    stw RC, -4(RA)			// Update control var.
> @@ -4574,9 +5289,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   subi TMP2, TMP2, 16
>      |   ble >2				// No vararg slots?
>      |1:  // Copy vararg slots to destination slots.
> +    |.if FPU
>      |  lfd f0, 0(RC)
> +    |.else
> +    |  lwz CARG1, 0(RC)
> +    |  lwz CARG2, 4(RC)
> +    |.endif
>      |   addi RC, RC, 8
> +    |.if FPU
>      |  stfd f0, 0(RA)
> +    |.else
> +    |  stw CARG1, 0(RA)
> +    |  stw CARG2, 4(RA)
> +    |.endif
>      |  cmplw RA, TMP2
>      |   cmplw cr1, RC, TMP3
>      |  bge >3				// All destination slots filled?
> @@ -4599,9 +5324,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   addi MULTRES, TMP1, 8
>      |  bgt >7
>      |6:
> +    |.if FPU
>      |  lfd f0, 0(RC)
> +    |.else
> +    |  lwz CARG1, 0(RC)
> +    |  lwz CARG2, 4(RC)
> +    |.endif
>      |   addi RC, RC, 8
> +    |.if FPU
>      |  stfd f0, 0(RA)
> +    |.else
> +    |  stw CARG1, 0(RA)
> +    |  stw CARG2, 4(RA)
> +    |.endif
>      |  cmplw RC, TMP3
>      |   addi RA, RA, 8
>      |  blt <6				// More vararg slots?
> @@ -4652,14 +5387,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   li TMP1, 0
>      |2:
>      |  addi TMP3, TMP1, 8
> +    |.if FPU
>      |   lfdx f0, RA, TMP1
> +    |.else
> +    |   add CARG3, RA, TMP1
> +    |   lwz CARG1, 0(CARG3)
> +    |   lwz CARG2, 4(CARG3)
> +    |.endif
>      |  cmpw TMP3, RC
> +    |.if FPU
>      |   stfdx f0, TMP2, TMP1
> +    |.else
> +    |   add CARG3, TMP2, TMP1
> +    |   stw CARG1, 0(CARG3)
> +    |   stw CARG2, 4(CARG3)
> +    |.endif
>      |  beq >3
>      |  addi TMP1, TMP3, 8
> +    |.if FPU
>      |   lfdx f1, RA, TMP3
> +    |.else
> +    |   add CARG3, RA, TMP3
> +    |   lwz CARG1, 0(CARG3)
> +    |   lwz CARG2, 4(CARG3)
> +    |.endif
>      |  cmpw TMP1, RC
> +    |.if FPU
>      |   stfdx f1, TMP2, TMP3
> +    |.else
> +    |   add CARG3, TMP2, TMP3
> +    |   stw CARG1, 0(CARG3)
> +    |   stw CARG2, 4(CARG3)
> +    |.endif
>      |  bne <2
>      |3:
>      |5:
> @@ -4701,8 +5460,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   subi TMP2, BASE, 8
>      |  decode_RB8 RB, INS
>      if (op == BC_RET1) {
> +      |.if FPU
>        |  lfd f0, 0(RA)
>        |  stfd f0, 0(TMP2)
> +      |.else
> +      |  lwz CARG1, 0(RA)
> +      |  lwz CARG2, 4(RA)
> +      |  stw CARG1, 0(TMP2)
> +      |  stw CARG2, 4(TMP2)
> +      |.endif
>      }
>      |5:
>      |  cmplw RB, RD
> @@ -4763,11 +5529,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |4:
>        |  stw CARG1, FORL_IDX*8+4(RA)
>      } else {
> -      |  lwz TMP3, FORL_STEP*8(RA)
> +      |  lwz SAVE0, FORL_STEP*8(RA)
>        |   lwz CARG3, FORL_STEP*8+4(RA)
>        |  lwz TMP2, FORL_STOP*8(RA)
>        |   lwz CARG2, FORL_STOP*8+4(RA)
> -      |  cmplw cr7, TMP3, TISNUM
> +      |  cmplw cr7, SAVE0, TISNUM
>        |  cmplw cr1, TMP2, TISNUM
>        |  crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
>        |  crand 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -4810,41 +5576,80 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      if (vk) {
>        |.if DUALNUM
>        |9:  // FP loop.
> +      |.if FPU
>        |  lfd f1, FORL_IDX*8(RA)
>        |.else
> +      |  lwz CARG1, FORL_IDX*8(RA)
> +      |  lwz CARG2, FORL_IDX*8+4(RA)
> +      |.endif
> +      |.else
>        |  lfdux f1, RA, BASE
>        |.endif
> +      |.if FPU
>        |  lfd f3, FORL_STEP*8(RA)
>        |  lfd f2, FORL_STOP*8(RA)
> -      |   lwz TMP3, FORL_STEP*8(RA)
>        |  fadd f1, f1, f3
>        |  stfd f1, FORL_IDX*8(RA)
> +      |.else
> +      |  lwz CARG3, FORL_STEP*8(RA)
> +      |  lwz CARG4, FORL_STEP*8+4(RA)
> +      |  mr SAVE1, RD
> +      |  blex __adddf3
> +      |  mr RD, SAVE1
> +      |  stw CRET1, FORL_IDX*8(RA)
> +      |  stw CRET2, FORL_IDX*8+4(RA)
> +      |  lwz CARG3, FORL_STOP*8(RA)
> +      |  lwz CARG4, FORL_STOP*8+4(RA)
> +      |.endif
> +      |   lwz SAVE0, FORL_STEP*8(RA)
>      } else {
>        |.if DUALNUM
>        |9:  // FP loop.
>        |.else
>        |  lwzux TMP1, RA, BASE
> -      |  lwz TMP3, FORL_STEP*8(RA)
> +      |  lwz SAVE0, FORL_STEP*8(RA)
>        |  lwz TMP2, FORL_STOP*8(RA)
>        |  cmplw cr0, TMP1, TISNUM
> -      |  cmplw cr7, TMP3, TISNUM
> +      |  cmplw cr7, SAVE0, TISNUM
>        |  cmplw cr1, TMP2, TISNUM
>        |.endif
> +      |.if FPU
>        |   lfd f1, FORL_IDX*8(RA)
> +      |.else
> +      |   lwz CARG1, FORL_IDX*8(RA)
> +      |   lwz CARG2, FORL_IDX*8+4(RA)
> +      |.endif
>        |  crand 4*cr0+lt, 4*cr0+lt, 4*cr7+lt
>        |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> +      |.if FPU
>        |   lfd f2, FORL_STOP*8(RA)
> +      |.else
> +      |   lwz CARG3, FORL_STOP*8(RA)
> +      |   lwz CARG4, FORL_STOP*8+4(RA)
> +      |.endif
>        |  bge ->vmeta_for
>      }
> -    |  cmpwi cr6, TMP3, 0
> +    |  cmpwi cr6, SAVE0, 0
>      if (op != BC_JFORL) {
>        |  srwi RD, RD, 1
>      }
> +    |.if FPU
>      |   stfd f1, FORL_EXT*8(RA)
> +    |.else
> +    |   stw CARG1, FORL_EXT*8(RA)
> +    |   stw CARG2, FORL_EXT*8+4(RA)
> +    |.endif
>      if (op != BC_JFORL) {
>        |  add RD, PC, RD
>      }
> +    |.if FPU
>      |  fcmpu cr0, f1, f2
> +    |.else
> +    |  mr SAVE1, RD
> +    |  blex __ledf2
> +    |  cmpwi CRET1, 0
> +    |  mr RD, SAVE1
> +    |.endif
>      if (op == BC_JFORI) {
>        |  addis PC, RD, -(BCBIAS_J*4 >> 16)
>      }
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:46   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:21     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 14:33   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:46 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few typos and a single question below.
On Wed, Aug 09, 2023 at 06:35:55PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
> 
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
> 
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the JIT compiler for powerpc.
Typo: s/powerpc/PowerPC/
> This includes:
> * All fp-depending paths are instrumented with `LJ_SOFTFP` macro.
Typo: s/fp-depending/fp-dependent/
> * `asm_sfpmin_max()` is introduced for min/max operations on soft-float
>   point.
> * `asm_sfpcomp()` is introduced for soft-float point comparisons.
> 
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
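Side note for readers unfamiliar with soft-float: a minimal sketch (not
LuaJIT code) of the idea behind the hi/lo word pairs used throughout this
patch. A double is kept as two 32-bit words in GPRs and compared with
integer instructions only. The names `SoftNum`, `softnum_split()`,
`softnum_join()` and `softnum_cmp()` are hypothetical and exist only for
this illustration. The return convention of `softnum_cmp()` (negative,
zero, positive, and positive for NaN operands) is an assumption modelled
on the comparison routines described in [1], not a copy of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == 8, "IEEE 754 binary64 expected");

/* Hypothetical type: a double viewed as two 32-bit words. */
typedef struct SoftNum {
  uint32_t hi;  /* Sign, exponent and top 20 mantissa bits. */
  uint32_t lo;  /* Low 32 mantissa bits. */
} SoftNum;

/* Split a double into its high and low words (host-endian independent). */
SoftNum softnum_split(double d)
{
  uint64_t bits;
  SoftNum n;
  memcpy(&bits, &d, sizeof(bits));
  n.hi = (uint32_t)(bits >> 32);
  n.lo = (uint32_t)bits;
  return n;
}

/* Reassemble a double from its high and low words. */
double softnum_join(SoftNum n)
{
  uint64_t bits = ((uint64_t)n.hi << 32) | n.lo;
  double d;
  memcpy(&d, &bits, sizeof(d));
  return d;
}

/* NaN: exponent bits all set and a non-zero mantissa. */
int softnum_isnan(SoftNum n)
{
  uint32_t h = n.hi & 0x7fffffffu;
  return h > 0x7ff00000u || (h == 0x7ff00000u && n.lo != 0);
}

/* Map the sign-magnitude encoding to a signed key that orders like the
** represented values. Both +0 and -0 map to 0. */
int64_t softnum_key(SoftNum n)
{
  uint64_t mag = ((uint64_t)(n.hi & 0x7fffffffu) << 32) | n.lo;
  return (n.hi & 0x80000000u) ? -(int64_t)mag : (int64_t)mag;
}

/* Three-way compare using integer operations only.
** Returns -1, 0 or 1; returns 1 if either operand is NaN (assumption). */
int softnum_cmp(SoftNum a, SoftNum b)
{
  int64_t ka, kb;
  if (softnum_isnan(a) || softnum_isnan(b))
    return 1;
  ka = softnum_key(a);
  kb = softnum_key(b);
  return (ka > kb) - (ka < kb);
}
```

With this layout, copying a number between stack slots is two word loads
plus two word stores instead of one FPR load/store pair, and a "less or
equal" guard becomes a call followed by an integer compare of the result
against zero.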
> 
> Sergey Kaplun:
> * added the description for the feature
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_arch.h    |   1 -
>  src/lj_asm_ppc.h | 321 ++++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 278 insertions(+), 44 deletions(-)
> 
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 8bb8757d..7397492e 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -281,7 +281,6 @@
>  #endif
>  
>  #if LJ_ABI_SOFTFP
> -#define LJ_ARCH_NOJIT		1  /* NYI */
>  #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
>  #else
>  #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index aa2d45c0..6cb608f7 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -226,6 +226,7 @@ static void asm_fusexrefx(ASMState *as, PPCIns pi, Reg rt, IRRef ref,
>    emit_tab(as, pi, rt, left, right);
>  }
>  
> +#if !LJ_SOFTFP
>  /* Fuse to multiply-add/sub instruction. */
>  static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
>  {
> @@ -245,6 +246,7 @@ static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
>    }
>    return 0;
>  }
> +#endif
>  
>  /* -- Calls --------------------------------------------------------------- */
>  
> @@ -253,13 +255,17 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>  {
>    uint32_t n, nargs = CCI_XNARGS(ci);
>    int32_t ofs = 8;
> -  Reg gpr = REGARG_FIRSTGPR, fpr = REGARG_FIRSTFPR;
> +  Reg gpr = REGARG_FIRSTGPR;
> +#if !LJ_SOFTFP
> +  Reg fpr = REGARG_FIRSTFPR;
> +#endif
>    if ((void *)ci->func)
>      emit_call(as, (void *)ci->func);
>    for (n = 0; n < nargs; n++) {  /* Setup args. */
>      IRRef ref = args[n];
>      if (ref) {
>        IRIns *ir = IR(ref);
> +#if !LJ_SOFTFP
>        if (irt_isfp(ir->t)) {
>  	if (fpr <= REGARG_LASTFPR) {
>  	  lua_assert(rset_test(as->freeset, fpr));  /* Already evicted. */
> @@ -271,7 +277,9 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>  	  emit_spstore(as, ir, r, ofs);
>  	  ofs += irt_isnum(ir->t) ? 8 : 4;
>  	}
> -      } else {
> +      } else
> +#endif
> +      {
>  	if (gpr <= REGARG_LASTGPR) {
>  	  lua_assert(rset_test(as->freeset, gpr));  /* Already evicted. */
>  	  ra_leftov(as, gpr, ref);
> @@ -290,8 +298,10 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>      }
>      checkmclim(as);
>    }
> +#if !LJ_SOFTFP
>    if ((ci->flags & CCI_VARARG))  /* Vararg calls need to know about FPR use. */
>      emit_tab(as, fpr == REGARG_FIRSTFPR ? PPCI_CRXOR : PPCI_CREQV, 6, 6, 6);
> +#endif
>  }
>  
>  /* Setup result reg/sp for call. Evict scratch regs. */
> @@ -299,8 +309,10 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
>  {
>    RegSet drop = RSET_SCRATCH;
>    int hiop = ((ir+1)->o == IR_HIOP && !irt_isnil((ir+1)->t));
> +#if !LJ_SOFTFP
>    if ((ci->flags & CCI_NOFPRCLOBBER))
>      drop &= ~RSET_FPR;
> +#endif
>    if (ra_hasreg(ir->r))
>      rset_clear(drop, ir->r);  /* Dest reg handled below. */
>    if (hiop && ra_hasreg((ir+1)->r))
> @@ -308,7 +320,7 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
>    ra_evictset(as, drop);  /* Evictions must be performed first. */
>    if (ra_used(ir)) {
>      lua_assert(!irt_ispri(ir->t));
> -    if (irt_isfp(ir->t)) {
> +    if (!LJ_SOFTFP && irt_isfp(ir->t)) {
>        if ((ci->flags & CCI_CASTU64)) {
>  	/* Use spill slot or temp slots. */
>  	int32_t ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
> @@ -377,6 +389,7 @@ static void asm_retf(ASMState *as, IRIns *ir)
>  
>  /* -- Type conversions ---------------------------------------------------- */
>  
> +#if !LJ_SOFTFP
>  static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
>  {
>    RegSet allow = RSET_FPR;
> @@ -409,15 +422,23 @@ static void asm_tobit(ASMState *as, IRIns *ir)
>    emit_fai(as, PPCI_STFD, tmp, RID_SP, SPOFS_TMP);
>    emit_fab(as, PPCI_FADD, tmp, left, right);
>  }
> +#endif
>  
>  static void asm_conv(ASMState *as, IRIns *ir)
>  {
>    IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> +#if !LJ_SOFTFP
>    int stfp = (st == IRT_NUM || st == IRT_FLOAT);
> +#endif
>    IRRef lref = ir->op1;
> -  lua_assert(irt_type(ir->t) != st);
>    lua_assert(!(irt_isint64(ir->t) ||
>  	       (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
> +#if LJ_SOFTFP
> +  /* FP conversions are handled by SPLIT. */
> +  lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
> +  /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
> +#else
> +  lua_assert(irt_type(ir->t) != st);
>    if (irt_isfp(ir->t)) {
>      Reg dest = ra_dest(as, ir, RSET_FPR);
>      if (stfp) {  /* FP to FP conversion. */
> @@ -476,7 +497,9 @@ static void asm_conv(ASMState *as, IRIns *ir)
>  	emit_fb(as, PPCI_FCTIWZ, tmp, left);
>        }
>      }
> -  } else {
> +  } else
> +#endif
> +  {
>      Reg dest = ra_dest(as, ir, RSET_GPR);
>      if (st >= IRT_I8 && st <= IRT_U16) {  /* Extend to 32 bit integer. */
>        Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> @@ -496,17 +519,41 @@ static void asm_strto(ASMState *as, IRIns *ir)
>  {
>    const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
>    IRRef args[2];
> -  int32_t ofs;
> +  int32_t ofs = SPOFS_TMP;
> +#if LJ_SOFTFP
> +  ra_evictset(as, RSET_SCRATCH);
> +  if (ra_used(ir)) {
> +    if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> +	(ir->s & 1) == LJ_BE && (ir->s ^ 1) == (ir+1)->s) {
> +      int i;
> +      for (i = 0; i < 2; i++) {
> +	Reg r = (ir+i)->r;
> +	if (ra_hasreg(r)) {
> +	  ra_free(as, r);
> +	  ra_modified(as, r);
> +	  emit_spload(as, ir+i, r, sps_scale((ir+i)->s));
> +	}
> +      }
> +      ofs = sps_scale(ir->s & ~1);
> +    } else {
> +      Reg rhi = ra_dest(as, ir+1, RSET_GPR);
> +      Reg rlo = ra_dest(as, ir, rset_exclude(RSET_GPR, rhi));
> +      emit_tai(as, PPCI_LWZ, rhi, RID_SP, ofs);
> +      emit_tai(as, PPCI_LWZ, rlo, RID_SP, ofs+4);
> +    }
> +  }
> +#else
>    RegSet drop = RSET_SCRATCH;
>    if (ra_hasreg(ir->r)) rset_set(drop, ir->r);  /* Spill dest reg (if any). */
>    ra_evictset(as, drop);
> +  if (ir->s) ofs = sps_scale(ir->s);
> +#endif
>    asm_guardcc(as, CC_EQ);
>    emit_ai(as, PPCI_CMPWI, RID_RET, 0);  /* Test return status. */
>    args[0] = ir->op1;      /* GCstr *str */
>    args[1] = ASMREF_TMP1;  /* TValue *n  */
>    asm_gencall(as, ci, args);
>    /* Store the result to the spill slot or temp slots. */
> -  ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
>    emit_tai(as, PPCI_ADDI, ra_releasetmp(as, ASMREF_TMP1), RID_SP, ofs);
>  }
>  
> @@ -530,7 +577,10 @@ static void asm_tvptr(ASMState *as, Reg dest, IRRef ref)
>        Reg src = ra_alloc1(as, ref, allow);
>        emit_setgl(as, src, tmptv.gcr);
>      }
> -    type = ra_allock(as, irt_toitype(ir->t), allow);
> +    if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> +      type = ra_alloc1(as, ref+1, allow);
> +    else
> +      type = ra_allock(as, irt_toitype(ir->t), allow);
>      emit_setgl(as, type, tmptv.it);
>    }
>  }
> @@ -574,11 +624,27 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>    Reg tisnum = RID_NONE, tmpnum = RID_NONE;
>    IRRef refkey = ir->op2;
>    IRIns *irkey = IR(refkey);
> +  int isk = irref_isk(refkey);
>    IRType1 kt = irkey->t;
>    uint32_t khash;
>    MCLabel l_end, l_loop, l_next;
>  
>    rset_clear(allow, tab);
> +#if LJ_SOFTFP
> +  if (!isk) {
> +    key = ra_alloc1(as, refkey, allow);
> +    rset_clear(allow, key);
> +    if (irkey[1].o == IR_HIOP) {
> +      if (ra_hasreg((irkey+1)->r)) {
> +	tmpnum = (irkey+1)->r;
> +	ra_noweak(as, tmpnum);
> +      } else {
> +	tmpnum = ra_allocref(as, refkey+1, allow);
> +      }
> +      rset_clear(allow, tmpnum);
> +    }
> +  }
> +#else
>    if (irt_isnum(kt)) {
>      key = ra_alloc1(as, refkey, RSET_FPR);
>      tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
> @@ -588,6 +654,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>      key = ra_alloc1(as, refkey, allow);
>      rset_clear(allow, key);
>    }
> +#endif
>    tmp2 = ra_scratch(as, allow);
>    rset_clear(allow, tmp2);
>  
> @@ -610,7 +677,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>      asm_guardcc(as, CC_EQ);
>    else
>      emit_condbranch(as, PPCI_BC|PPCF_Y, CC_EQ, l_end);
> -  if (irt_isnum(kt)) {
> +  if (!LJ_SOFTFP && irt_isnum(kt)) {
>      emit_fab(as, PPCI_FCMPU, 0, tmpnum, key);
>      emit_condbranch(as, PPCI_BC, CC_GE, l_next);
>      emit_ab(as, PPCI_CMPLW, tmp1, tisnum);
> @@ -620,7 +687,10 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>        emit_ab(as, PPCI_CMPW, tmp2, key);
>        emit_condbranch(as, PPCI_BC, CC_NE, l_next);
>      }
> -    emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
> +    if (LJ_SOFTFP && ra_hasreg(tmpnum))
> +      emit_ab(as, PPCI_CMPW, tmp1, tmpnum);
> +    else
> +      emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
>      if (!irt_ispri(kt))
>        emit_tai(as, PPCI_LWZ, tmp2, dest, (int32_t)offsetof(Node, key.gcr));
>    }
> @@ -629,19 +699,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>  	    (((char *)as->mcp-(char *)l_loop) & 0xffffu);
>  
>    /* Load main position relative to tab->node into dest. */
> -  khash = irref_isk(refkey) ? ir_khash(irkey) : 1;
> +  khash = isk ? ir_khash(irkey) : 1;
>    if (khash == 0) {
>      emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
>    } else {
>      Reg tmphash = tmp1;
> -    if (irref_isk(refkey))
> +    if (isk)
>        tmphash = ra_allock(as, khash, allow);
>      emit_tab(as, PPCI_ADD, dest, dest, tmp1);
>      emit_tai(as, PPCI_MULLI, tmp1, tmp1, sizeof(Node));
>      emit_asb(as, PPCI_AND, tmp1, tmp2, tmphash);
>      emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
>      emit_tai(as, PPCI_LWZ, tmp2, tab, (int32_t)offsetof(GCtab, hmask));
> -    if (irref_isk(refkey)) {
> +    if (isk) {
>        /* Nothing to do. */
>      } else if (irt_isstr(kt)) {
>        emit_tai(as, PPCI_LWZ, tmp1, key, (int32_t)offsetof(GCstr, hash));
> @@ -651,13 +721,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>        emit_asb(as, PPCI_XOR, tmp1, tmp1, tmp2);
>        emit_rotlwi(as, tmp1, tmp1, (HASH_ROT2+HASH_ROT1)&31);
>        emit_tab(as, PPCI_SUBF, tmp2, dest, tmp2);
> -      if (irt_isnum(kt)) {
> +      if (LJ_SOFTFP ? (irkey[1].o == IR_HIOP) : irt_isnum(kt)) {
> +#if LJ_SOFTFP
> +	emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
> +	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> +	emit_tab(as, PPCI_ADD, tmp1, tmpnum, tmpnum);
> +#else
>  	int32_t ofs = ra_spill(as, irkey);
>  	emit_asb(as, PPCI_XOR, tmp2, tmp2, tmp1);
>  	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
>  	emit_tab(as, PPCI_ADD, tmp1, tmp1, tmp1);
>  	emit_tai(as, PPCI_LWZ, tmp2, RID_SP, ofs+4);
>  	emit_tai(as, PPCI_LWZ, tmp1, RID_SP, ofs);
> +#endif
>        } else {
>  	emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
>  	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> @@ -784,8 +860,8 @@ static PPCIns asm_fxloadins(IRIns *ir)
>    case IRT_U8: return PPCI_LBZ;
>    case IRT_I16: return PPCI_LHA;
>    case IRT_U16: return PPCI_LHZ;
> -  case IRT_NUM: return PPCI_LFD;
> -  case IRT_FLOAT: return PPCI_LFS;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_LFD;
> +  case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_LFS;
>    default: return PPCI_LWZ;
>    }
>  }
> @@ -795,8 +871,8 @@ static PPCIns asm_fxstoreins(IRIns *ir)
>    switch (irt_type(ir->t)) {
>    case IRT_I8: case IRT_U8: return PPCI_STB;
>    case IRT_I16: case IRT_U16: return PPCI_STH;
> -  case IRT_NUM: return PPCI_STFD;
> -  case IRT_FLOAT: return PPCI_STFS;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_STFD;
> +  case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_STFS;
>    default: return PPCI_STW;
>    }
>  }
> @@ -839,7 +915,8 @@ static void asm_fstore(ASMState *as, IRIns *ir)
>  
>  static void asm_xload(ASMState *as, IRIns *ir)
>  {
> -  Reg dest = ra_dest(as, ir, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> +  Reg dest = ra_dest(as, ir,
> +    (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
>    lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
>    if (irt_isi8(ir->t))
>      emit_as(as, PPCI_EXTSB, dest, dest);
> @@ -857,7 +934,8 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
>      Reg src = ra_alloc1(as, irb->op1, RSET_GPR);
>      asm_fusexrefx(as, PPCI_STWBRX, src, ir->op1, rset_exclude(RSET_GPR, src));
>    } else {
> -    Reg src = ra_alloc1(as, ir->op2, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> +    Reg src = ra_alloc1(as, ir->op2,
> +      (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
>      asm_fusexref(as, asm_fxstoreins(ir), src, ir->op1,
>  		 rset_exclude(RSET_GPR, src), ofs);
>    }
> @@ -871,10 +949,19 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
>    Reg dest = RID_NONE, type = RID_TMP, tmp = RID_TMP, idx;
>    RegSet allow = RSET_GPR;
>    int32_t ofs = AHUREF_LSX;
> +  if (LJ_SOFTFP && (ir+1)->o == IR_HIOP) {
> +    t.irt = IRT_NUM;
> +    if (ra_used(ir+1)) {
> +      type = ra_dest(as, ir+1, allow);
> +      rset_clear(allow, type);
> +    }
> +    ofs = 0;
> +  }
>    if (ra_used(ir)) {
> -    lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> -    if (!irt_isnum(t)) ofs = 0;
> -    dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> +    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> +	       irt_isint(ir->t) || irt_isaddr(ir->t));
> +    if (LJ_SOFTFP || !irt_isnum(t)) ofs = 0;
> +    dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>      rset_clear(allow, dest);
>    }
>    idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> @@ -883,12 +970,13 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
>      asm_guardcc(as, CC_GE);
>      emit_ab(as, PPCI_CMPLW, type, tisnum);
>      if (ra_hasreg(dest)) {
> -      if (ofs == AHUREF_LSX) {
> +      if (!LJ_SOFTFP && ofs == AHUREF_LSX) {
>  	tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_GPR,
>  						       (idx&255)), (idx>>8)));
>  	emit_fab(as, PPCI_LFDX, dest, (idx&255), tmp);
>        } else {
> -	emit_fai(as, PPCI_LFD, dest, idx, ofs);
> +	emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest, idx,
> +		 ofs+4*LJ_SOFTFP);
>        }
>      }
>    } else {
> @@ -911,7 +999,7 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
>    int32_t ofs = AHUREF_LSX;
>    if (ir->r == RID_SINK)
>      return;
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>      src = ra_alloc1(as, ir->op2, RSET_FPR);
>    } else {
>      if (!irt_ispri(ir->t)) {
> @@ -919,11 +1007,14 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
>        rset_clear(allow, src);
>        ofs = 0;
>      }
> -    type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> +    if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> +      type = ra_alloc1(as, (ir+1)->op2, allow);
> +    else
> +      type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
>      rset_clear(allow, type);
>    }
>    idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>      if (ofs == AHUREF_LSX) {
>        emit_fab(as, PPCI_STFDX, src, (idx&255), RID_TMP);
>        emit_slwi(as, RID_TMP, (idx>>8), 3);
> @@ -948,21 +1039,33 @@ static void asm_sload(ASMState *as, IRIns *ir)
>    IRType1 t = ir->t;
>    Reg dest = RID_NONE, type = RID_NONE, base;
>    RegSet allow = RSET_GPR;
> +  int hiop = (LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> +  if (hiop)
> +    t.irt = IRT_NUM;
>    lua_assert(!(ir->op2 & IRSLOAD_PARENT));  /* Handled by asm_head_side(). */
> -  lua_assert(irt_isguard(t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> +  lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
>    lua_assert(LJ_DUALNUM ||
>  	     !irt_isint(t) || (ir->op2 & (IRSLOAD_CONVERT|IRSLOAD_FRAME)));
> +#if LJ_SOFTFP
> +  lua_assert(!(ir->op2 & IRSLOAD_CONVERT));  /* Handled by LJ_SOFTFP SPLIT. */
> +  if (hiop && ra_used(ir+1)) {
> +    type = ra_dest(as, ir+1, allow);
> +    rset_clear(allow, type);
> +  }
> +#else
>    if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
>      dest = ra_scratch(as, RSET_FPR);
>      asm_tointg(as, ir, dest);
>      t.irt = IRT_NUM;  /* Continue with a regular number type check. */
> -  } else if (ra_used(ir)) {
> +  } else
> +#endif
> +  if (ra_used(ir)) {
>      lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> -    dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> +    dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>      rset_clear(allow, dest);
>      base = ra_alloc1(as, REF_BASE, allow);
>      rset_clear(allow, base);
> -    if ((ir->op2 & IRSLOAD_CONVERT)) {
> +    if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
>        if (irt_isint(t)) {
>  	emit_tai(as, PPCI_LWZ, dest, RID_SP, SPOFS_TMPLO);
>  	dest = ra_scratch(as, RSET_FPR);
> @@ -994,10 +1097,13 @@ dotypecheck:
>      if ((ir->op2 & IRSLOAD_TYPECHECK)) {
>        Reg tisnum = ra_allock(as, (int32_t)LJ_TISNUM, allow);
>        asm_guardcc(as, CC_GE);
> -      emit_ab(as, PPCI_CMPLW, RID_TMP, tisnum);
> +#if !LJ_SOFTFP
>        type = RID_TMP;
> +#endif
> +      emit_ab(as, PPCI_CMPLW, type, tisnum);
>      }
> -    if (ra_hasreg(dest)) emit_fai(as, PPCI_LFD, dest, base, ofs-4);
> +    if (ra_hasreg(dest)) emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest,
> +				  base, ofs-(LJ_SOFTFP?0:4));
>    } else {
>      if ((ir->op2 & IRSLOAD_TYPECHECK)) {
>        asm_guardcc(as, CC_NE);
> @@ -1122,6 +1228,7 @@ static void asm_obar(ASMState *as, IRIns *ir)
>  
>  /* -- Arithmetic and logic operations ------------------------------------- */
>  
> +#if !LJ_SOFTFP
>  static void asm_fparith(ASMState *as, IRIns *ir, PPCIns pi)
>  {
>    Reg dest = ra_dest(as, ir, RSET_FPR);
> @@ -1149,13 +1256,17 @@ static void asm_fpmath(ASMState *as, IRIns *ir)
>    else
>      asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
>  }
> +#endif
>  
>  static void asm_add(ASMState *as, IRIns *ir)
>  {
> +#if !LJ_SOFTFP
>    if (irt_isnum(ir->t)) {
>      if (!asm_fusemadd(as, ir, PPCI_FMADD, PPCI_FMADD))
>        asm_fparith(as, ir, PPCI_FADD);
> -  } else {
> +  } else
> +#endif
> +  {
>      Reg dest = ra_dest(as, ir, RSET_GPR);
>      Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
>      PPCIns pi;
> @@ -1194,10 +1305,13 @@ static void asm_add(ASMState *as, IRIns *ir)
>  
>  static void asm_sub(ASMState *as, IRIns *ir)
>  {
> +#if !LJ_SOFTFP
>    if (irt_isnum(ir->t)) {
>      if (!asm_fusemadd(as, ir, PPCI_FMSUB, PPCI_FNMSUB))
>        asm_fparith(as, ir, PPCI_FSUB);
> -  } else {
> +  } else
> +#endif
> +  {
>      PPCIns pi = PPCI_SUBF;
>      Reg dest = ra_dest(as, ir, RSET_GPR);
>      Reg left, right;
> @@ -1223,9 +1337,12 @@ static void asm_sub(ASMState *as, IRIns *ir)
>  
>  static void asm_mul(ASMState *as, IRIns *ir)
>  {
> +#if !LJ_SOFTFP
>    if (irt_isnum(ir->t)) {
>      asm_fparith(as, ir, PPCI_FMUL);
> -  } else {
> +  } else
> +#endif
> +  {
>      PPCIns pi = PPCI_MULLW;
>      Reg dest = ra_dest(as, ir, RSET_GPR);
>      Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> @@ -1253,9 +1370,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
>  
>  static void asm_neg(ASMState *as, IRIns *ir)
>  {
> +#if !LJ_SOFTFP
>    if (irt_isnum(ir->t)) {
>      asm_fpunary(as, ir, PPCI_FNEG);
> -  } else {
> +  } else
> +#endif
> +  {
>      Reg dest, left;
>      PPCIns pi = PPCI_NEG;
>      if (as->flagmcp == as->mcp) {
> @@ -1566,9 +1686,40 @@ static void asm_bitshift(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pik)
>  		       PPCI_RLWINM|PPCF_MB(0)|PPCF_ME(31))
>  #define asm_bror(as, ir)	lua_assert(0)
>  
> +#if LJ_SOFTFP
> +static void asm_sfpmin_max(ASMState *as, IRIns *ir)
> +{
> +  CCallInfo ci = lj_ir_callinfo[IRCALL_softfp_cmp];
> +  IRRef args[4];
> +  MCLabel l_right, l_end;
> +  Reg desthi = ra_dest(as, ir, RSET_GPR), destlo = ra_dest(as, ir+1, RSET_GPR);
> +  Reg righthi, lefthi = ra_alloc2(as, ir, RSET_GPR);
> +  Reg rightlo, leftlo = ra_alloc2(as, ir+1, RSET_GPR);
> +  PPCCC cond = (IROp)ir->o == IR_MIN ? CC_EQ : CC_NE;
> +  righthi = (lefthi >> 8); lefthi &= 255;
> +  rightlo = (leftlo >> 8); leftlo &= 255;
> +  args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> +  args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> +  l_end = emit_label(as);
> +  if (desthi != righthi) emit_mr(as, desthi, righthi);
> +  if (destlo != rightlo) emit_mr(as, destlo, rightlo);
> +  l_right = emit_label(as);
> +  if (l_end != l_right) emit_jmp(as, l_end);
> +  if (desthi != lefthi) emit_mr(as, desthi, lefthi);
> +  if (destlo != leftlo) emit_mr(as, destlo, leftlo);
> +  if (l_right == as->mcp+1) {
> +    cond ^= 4; l_right = l_end; ++as->mcp;
> +  }
> +  emit_condbranch(as, PPCI_BC, cond, l_right);
> +  ra_evictset(as, RSET_SCRATCH);
> +  emit_cmpi(as, RID_RET, 1);
> +  asm_gencall(as, &ci, args);
> +}
> +#endif
> +
>  static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>  {
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>      Reg dest = ra_dest(as, ir, RSET_FPR);
>      Reg tmp = dest;
>      Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> @@ -1656,7 +1807,7 @@ static void asm_intcomp_(ASMState *as, IRRef lref, IRRef rref, Reg cr, PPCCC cc)
>  static void asm_comp(ASMState *as, IRIns *ir)
>  {
>    PPCCC cc = asm_compmap[ir->o];
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>      Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>      right = (left >> 8); left &= 255;
>      asm_guardcc(as, (cc >> 4));
> @@ -1677,6 +1828,44 @@ static void asm_comp(ASMState *as, IRIns *ir)
>  
>  #define asm_equal(as, ir)	asm_comp(as, ir)
>  
> +#if LJ_SOFTFP
> +/* SFP comparisons. */
> +static void asm_sfpcomp(ASMState *as, IRIns *ir)
> +{
> +  const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
> +  RegSet drop = RSET_SCRATCH;
> +  Reg r;
> +  IRRef args[4];
> +  args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> +  args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> +
> +  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> +    if (!rset_test(as->freeset, r) &&
> +	regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
> +      rset_clear(drop, r);
> +  }
> +  ra_evictset(as, drop);
> +  asm_setupresult(as, ir, ci);
> +  switch ((IROp)ir->o) {
> +  case IR_ULT:
> +    asm_guardcc(as, CC_EQ);
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> +  case IR_ULE:
> +    asm_guardcc(as, CC_EQ);
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 1);
> +    break;
> +  case IR_GE: case IR_GT:
> +    asm_guardcc(as, CC_EQ);
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 2);
> +  default:
> +    asm_guardcc(as, (asm_compmap[ir->o] & 0xf));
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> +    break;
> +  }
> +  asm_gencall(as, ci, args);
> +}
> +#endif
> +
>  #if LJ_HASFFI
>  /* 64 bit integer comparisons. */
>  static void asm_comp64(ASMState *as, IRIns *ir)
> @@ -1706,19 +1895,36 @@ static void asm_comp64(ASMState *as, IRIns *ir)
>  /* Hiword op of a split 64 bit op. Previous op must be the loword op. */
>  static void asm_hiop(ASMState *as, IRIns *ir)
>  {
> -#if LJ_HASFFI
> +#if LJ_HASFFI || LJ_SOFTFP
>    /* HIOP is marked as a store because it needs its own DCE logic. */
>    int uselo = ra_used(ir-1), usehi = ra_used(ir);  /* Loword/hiword used? */
>    if (LJ_UNLIKELY(!(as->flags & JIT_F_OPT_DCE))) uselo = usehi = 1;
>    if ((ir-1)->o == IR_CONV) {  /* Conversions to/from 64 bit. */
>      as->curins--;  /* Always skip the CONV. */
> +#if LJ_HASFFI && !LJ_SOFTFP
>      if (usehi || uselo)
>        asm_conv64(as, ir);
>      return;
> +#endif
>    } else if ((ir-1)->o <= IR_NE) {  /* 64 bit integer comparisons. ORDER IR. */
>      as->curins--;  /* Always skip the loword comparison. */
> +#if LJ_SOFTFP
> +    if (!irt_isint(ir->t)) {
> +      asm_sfpcomp(as, ir-1);
> +      return;
> +    }
> +#endif
> +#if LJ_HASFFI
>      asm_comp64(as, ir);
> +#endif
> +    return;
> +#if LJ_SOFTFP
> +  } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
> +      as->curins--;  /* Always skip the loword min/max. */
> +    if (uselo || usehi)
> +      asm_sfpmin_max(as, ir-1);
>      return;
> +#endif
>    } else if ((ir-1)->o == IR_XSTORE) {
>      as->curins--;  /* Handle both stores here. */
>      if ((ir-1)->r != RID_SINK) {
> @@ -1729,14 +1935,27 @@ static void asm_hiop(ASMState *as, IRIns *ir)
>    }
>    if (!usehi) return;  /* Skip unused hiword op for all remaining ops. */
>    switch ((ir-1)->o) {
> +#if LJ_HASFFI
>    case IR_ADD: as->curins--; asm_add64(as, ir); break;
>    case IR_SUB: as->curins--; asm_sub64(as, ir); break;
>    case IR_NEG: as->curins--; asm_neg64(as, ir); break;
> +#endif
> +#if LJ_SOFTFP
> +  case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
> +  case IR_STRTO:
Why are these cases soft-float-dependent? Should we add a comment explaining that?
> +    if (!uselo)
> +      ra_allocref(as, ir->op1, RSET_GPR);  /* Mark lo op as used. */
> +    break;
> +#endif
>    case IR_CALLN:
> +  case IR_CALLS:
>    case IR_CALLXS:
>      if (!uselo)
>        ra_allocref(as, ir->op1, RID2RSET(RID_RETLO));  /* Mark lo op as used. */
>      break;
> +#if LJ_SOFTFP
> +  case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
> +#endif
>    case IR_CNEWI:
>      /* Nothing to do here. Handled by lo op itself. */
>      break;
> @@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>      if ((sn & SNAP_NORESTORE))
>        continue;
>      if (irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +      Reg tmp;
> +      RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> +      lua_assert(irref_isk(ref));  /* LJ_SOFTFP: must be a number constant. */
> +      tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.lo, allow);
> +      emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?4:0));
> +      if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
> +      tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
> +      emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#else
>        Reg src = ra_alloc1(as, ref, RSET_FPR);
>        emit_fai(as, PPCI_STFD, src, RID_BASE, ofs);
> +#endif
>      } else {
>        Reg type;
>        RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> @@ -1814,6 +2044,10 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>        if ((sn & (SNAP_CONT|SNAP_FRAME))) {
>  	if (s == 0) continue;  /* Do not overwrite link to previous frame. */
>  	type = ra_allock(as, (int32_t)(*flinks--), allow);
> +#if LJ_SOFTFP
> +      } else if ((sn & SNAP_SOFTFPNUM)) {
> +	type = ra_alloc1(as, ref+1, rset_exclude(RSET_GPR, RID_BASE));
> +#endif
>        } else {
>  	type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
>        }
> @@ -1950,14 +2184,15 @@ static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci)
>    int nslots = 2, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR;
>    asm_collectargs(as, ir, ci, args);
>    for (i = 0; i < nargs; i++)
> -    if (args[i] && irt_isfp(IR(args[i])->t)) {
> +    if (!LJ_SOFTFP && args[i] && irt_isfp(IR(args[i])->t)) {
>        if (nfpr > 0) nfpr--; else nslots = (nslots+3) & ~1;
>      } else {
>        if (ngpr > 0) ngpr--; else nslots++;
>      }
>    if (nslots > as->evenspill)  /* Leave room for args in stack slots. */
>      as->evenspill = nslots;
> -  return irt_isfp(ir->t) ? REGSP_HINT(RID_FPRET) : REGSP_HINT(RID_RET);
> +  return (!LJ_SOFTFP && irt_isfp(ir->t)) ? REGSP_HINT(RID_FPRET) :
> +					   REGSP_HINT(RID_RET);
>  }
>  
>  static void asm_setup_target(ASMState *as)
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
@ 2023-08-15 11:58   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:40     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 14:31   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 11:58 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few typos below.
On Wed, Aug 09, 2023 at 06:35:56PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> This patch is a follow-up for the commit
> a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
> timer from lj_profile"). It moves the timer machinery to the separate
Typo: s/to the/to a/
> module. Unfortunately, the `profile_{un}lock()` calls for Windows and
> PS3 wasn't updated to access `lj_profile_timer` structure instead of
Typo: s/wasn't/weren't/
> `ProfileState`.
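To illustrate the macro change described above, here is a minimal pthread-only
sketch. The names `demo_timer`, `DemoProfileState` and `demo_count_sample` are
hypothetical stand-ins, not the real LuaJIT definitions; only the
`ps->timer.lock` access pattern is taken from the patch.

```c
#include <pthread.h>

/* Hypothetical stand-in for the timer structure that now owns the lock. */
typedef struct demo_timer {
  pthread_mutex_t lock;
  int samples;
} demo_timer;

/* Hypothetical stand-in for the profiler state embedding the timer. */
typedef struct DemoProfileState {
  demo_timer timer;
} DemoProfileState;

/* Same shape as the fixed macros: the lock is reached via the timer member. */
#define profile_lock(ps)	pthread_mutex_lock(&ps->timer.lock)
#define profile_unlock(ps)	pthread_mutex_unlock(&ps->timer.lock)

/* Bump the sample counter under the lock and return the new value. */
static int demo_count_sample(DemoProfileState *ps)
{
  int n;
  profile_lock(ps);
  n = ++ps->timer.samples;
  profile_unlock(ps);
  return n;
}
```

With the old `&ps->lock` form this sketch would not compile, because the state
structure has no `lock` member of its own. That is presumably the same kind of
build failure the Windows and PS3 branches ran into.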
> 
> Also, it is a follow-up to the commit
> f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
> section"). Since this commit the system-dependent header <unistd.h> and
> `write()`, `open()`, `close()` functions are used. They are undefining
Typo: s/undefining/undefined/
> on Windows, so this leads to error during the build.
Typo: s/error/errors/
> 
> This patch fixes the aforementioned misbehaviour. After it our fork may
> be built on Windows at least.
> ---
>  src/lib_misc.c         | 16 ++++++++++++----
>  src/lj_profile_timer.h |  8 ++++----
>  2 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index c18d297e..1913a622 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c
> @@ -8,10 +8,6 @@
>  #define lib_misc_c
>  #define LUA_LIB
>  
> -#include <errno.h>
> -#include <fcntl.h>
> -#include <unistd.h>
> -
>  #include "lua.h"
>  #include "lmisclib.h"
>  #include "lauxlib.h"
> @@ -25,6 +21,12 @@
>  
>  #include "lj_memprof.h"
>  
> +#include <errno.h>
> +#include <fcntl.h>
> +#if !LJ_TARGET_WINDOWS
> +#include <unistd.h>
> +#endif
> +
>  /* ------------------------------------------------------------------------ */
>  
>  static LJ_AINLINE void setnumfield(struct lua_State *L, GCtab *t,
> @@ -78,6 +80,7 @@ LJLIB_CF(misc_getmetrics)
>  
>  /* --------- profile common section --------------------------------------- */
>  
> +#if !LJ_TARGET_WINDOWS
>  /*
>  ** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
>  */
> @@ -434,6 +437,7 @@ LJLIB_CF(misc_memprof_stop)
>    lua_pushboolean(L, 1);
>    return 1;
>  }
> +#endif /* !LJ_TARGET_WINDOWS */
>  
>  #include "lj_libdef.h"
>  
> @@ -441,6 +445,7 @@ LJLIB_CF(misc_memprof_stop)
>  
>  LUALIB_API int luaopen_misc(struct lua_State *L)
>  {
> +#if !LJ_TARGET_WINDOWS
>    luaM_sysprof_set_writer(buffer_writer_default);
>    luaM_sysprof_set_on_stop(on_stop_cb_default);
>    /*
> @@ -448,9 +453,12 @@ LUALIB_API int luaopen_misc(struct lua_State *L)
>    ** backtracing function.
>    */
>    luaM_sysprof_set_backtracer(NULL);
> +#endif /* !LJ_TARGET_WINDOWS */
>  
>    LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
> +#if !LJ_TARGET_WINDOWS
>    LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
>    LJ_LIB_REG(L, LUAM_MISCLIBNAME ".sysprof", misc_sysprof);
> +#endif /* !LJ_TARGET_WINDOWS */
>    return 1;
>  }
> diff --git a/src/lj_profile_timer.h b/src/lj_profile_timer.h
> index 1deeea53..b3e1a6e9 100644
> --- a/src/lj_profile_timer.h
> +++ b/src/lj_profile_timer.h
> @@ -25,8 +25,8 @@
>  #if LJ_TARGET_PS3
>  #include <sys/timer.h>
>  #endif
> -#define profile_lock(ps)	pthread_mutex_lock(&ps->lock)
> -#define profile_unlock(ps)	pthread_mutex_unlock(&ps->lock)
> +#define profile_lock(ps)	pthread_mutex_lock(&ps->timer.lock)
> +#define profile_unlock(ps)	pthread_mutex_unlock(&ps->timer.lock)
>  
>  #elif LJ_PROFILE_WTHREAD
>  
> @@ -38,8 +38,8 @@
>  #include <windows.h>
>  #endif
>  typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int);
> -#define profile_lock(ps)	EnterCriticalSection(&ps->lock)
> -#define profile_unlock(ps)	LeaveCriticalSection(&ps->lock)
> +#define profile_lock(ps)	EnterCriticalSection(&ps->timer.lock)
> +#define profile_unlock(ps)	LeaveCriticalSection(&ps->timer.lock)
>  
>  #endif
>  
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
@ 2023-08-15 12:09   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:50     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 16:40   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 12:09 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:35:57PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by Ben Pye.
> 
> (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
> 
> This patch adds partial support for the Universal Windows Platform [1]
> in LuaJIT.
> This includes:
> * `LJ_TARGET_UWP` is introduced to mark that target is Universal Windows
Typo: s/is Universal/is the Universal/
>   Platform.
> * `LJ_WIN_VALLOC()` macro is introduced to use instead of
Typo: s/to use/to be used/
>   `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
> * `LJ_WIN_VPROTECT()` macro is introduced to use instead of
Typo: s/to use/to be used/
>   `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
> * `LJ_WIN_LOADLIBA()` macro is introduced to use instead of
Typo: s/to use/to be used/
>   `LoadLibraryExA()` [6] (custom implementation using
>   `LoadPackagedLibrary()` [7] for UWP).
> 
> Note that the following features are not implemented for UWP:
> * `io.popen()`.
> * LuaJIT profiler's (`jit.p`) timer for Windows has not very high
>   resolution since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are
Typo: s/not very high/a low/
>   not used, because the <winmm.dll> library isn't loaded.
> 
> [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
> [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
> [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
> [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
> [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
> [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
> [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
> [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
> [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
> 
> Sergey Kaplun:
> * added the description for the feature
> 
> Part of tarantool/tarantool#8825
> ---
>  doc/ext_ffi_api.html   |  2 ++
>  src/lib_ffi.c          |  3 +++
>  src/lib_io.c           |  4 ++--
>  src/lib_package.c      | 24 +++++++++++++++++++++++-
>  src/lj_alloc.c         |  6 +++---
>  src/lj_arch.h          | 19 +++++++++++++++++++
>  src/lj_ccallback.c     |  4 ++--
>  src/lj_clib.c          | 20 ++++++++++++++++----
>  src/lj_mcode.c         |  8 ++++----
>  src/lj_profile_timer.c |  8 ++++----
>  10 files changed, 78 insertions(+), 20 deletions(-)
> 
> diff --git a/doc/ext_ffi_api.html b/doc/ext_ffi_api.html
> index 91af2e1d..c72191d1 100644
> --- a/doc/ext_ffi_api.html
> +++ b/doc/ext_ffi_api.html
> @@ -469,6 +469,8 @@ otherwise. The following parameters are currently defined:
>  <tr class="odd">
>  <td class="abiparam">win</td><td class="abidesc">Windows variant of the standard ABI</td></tr>
>  <tr class="even">
> +<td class="abiparam">uwp</td><td class="abidesc">Universal Windows Platform</td></tr>
> +<tr class="odd">
>  <td class="abiparam">gc64</td><td class="abidesc">64 bit GC references</td></tr>
>  </table>
>  
> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
> index 136e98e8..d1fe1a14 100644
> --- a/src/lib_ffi.c
> +++ b/src/lib_ffi.c
> @@ -746,6 +746,9 @@ LJLIB_CF(ffi_abi)	LJLIB_REC(.)
>  #endif
>  #if LJ_ABI_WIN
>    case H_(4ab624a8,4ab624a8): b = 1; break;  /* win */
> +#endif
> +#if LJ_TARGET_UWP
> +  case H_(a40f0bcb,a40f0bcb): b = 1; break;  /* uwp */
>  #endif
It is not obvious what happens here (a new `uwp` parameter for `ffi.abi()`), and it is not
mentioned in the commit message. Please add a description of this change too.
>    case H_(3af93066,1f001464): b = 1; break;  /* le/be */
>  #if LJ_GC64
> diff --git a/src/lib_io.c b/src/lib_io.c
> index f0108227..db995ae6 100644
> --- a/src/lib_io.c
> +++ b/src/lib_io.c
> @@ -99,7 +99,7 @@ static int io_file_close(lua_State *L, IOFileUD *iof)
>      int stat = -1;
>  #if LJ_TARGET_POSIX
>      stat = pclose(iof->fp);
> -#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE
> +#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP
>      stat = _pclose(iof->fp);
>  #else
>      lua_assert(0);
> @@ -414,7 +414,7 @@ LJLIB_CF(io_open)
>  
>  LJLIB_CF(io_popen)
>  {
> -#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE)
> +#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP)
>    const char *fname = strdata(lj_lib_checkstr(L, 1));
>    GCstr *s = lj_lib_optstr(L, 2);
>    const char *mode = s ? strdata(s) : "r";
> diff --git a/src/lib_package.c b/src/lib_package.c
> index 67959a10..b49f0209 100644
> --- a/src/lib_package.c
> +++ b/src/lib_package.c
> @@ -76,6 +76,20 @@ static const char *ll_bcsym(void *lib, const char *sym)
>  BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
>  #endif
>  
> +#if LJ_TARGET_UWP
> +void *LJ_WIN_LOADLIBA(const char *path)
> +{
> +  DWORD err = GetLastError();
> +  wchar_t wpath[256];
> +  HANDLE lib = NULL;
> +  if (MultiByteToWideChar(CP_ACP, 0, path, -1, wpath, 256) > 0) {
> +    lib = LoadPackagedLibrary(wpath, 0);
> +  }
> +  SetLastError(err);
> +  return lib;
> +}
> +#endif
> +
>  #undef setprogdir
>  
>  static void setprogdir(lua_State *L)
> @@ -119,7 +133,7 @@ static void ll_unloadlib(void *lib)
>  
>  static void *ll_load(lua_State *L, const char *path, int gl)
>  {
> -  HINSTANCE lib = LoadLibraryExA(path, NULL, 0);
> +  HINSTANCE lib = LJ_WIN_LOADLIBA(path);
>    if (lib == NULL) pusherror(L);
>    UNUSED(gl);
>    return lib;
> @@ -132,17 +146,25 @@ static lua_CFunction ll_sym(lua_State *L, void *lib, const char *sym)
>    return f;
>  }
>  
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
>  static const char *ll_bcsym(void *lib, const char *sym)
>  {
>    if (lib) {
>      return (const char *)GetProcAddress((HINSTANCE)lib, sym);
>    } else {
> +#if LJ_TARGET_UWP
> +    return (const char *)GetProcAddress((HINSTANCE)&__ImageBase, sym);
> +#else
>      HINSTANCE h = GetModuleHandleA(NULL);
>      const char *p = (const char *)GetProcAddress(h, sym);
>      if (p == NULL && GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
>  					(const char *)ll_bcsym, &h))
>        p = (const char *)GetProcAddress(h, sym);
>      return p;
> +#endif
>    }
>  }
>  
> diff --git a/src/lj_alloc.c b/src/lj_alloc.c
> index f7039b5b..9e2fb1f6 100644
> --- a/src/lj_alloc.c
> +++ b/src/lj_alloc.c
> @@ -167,7 +167,7 @@ static void *DIRECT_MMAP(size_t size)
>  static void *CALL_MMAP(size_t size)
>  {
>    DWORD olderr = GetLastError();
> -  void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> +  void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
>    SetLastError(olderr);
>    return ptr ? ptr : MFAIL;
>  }
> @@ -176,8 +176,8 @@ static void *CALL_MMAP(size_t size)
>  static void *DIRECT_MMAP(size_t size)
>  {
>    DWORD olderr = GetLastError();
> -  void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> -			   PAGE_READWRITE);
> +  void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> +			    PAGE_READWRITE);
>    SetLastError(olderr);
>    return ptr ? ptr : MFAIL;
>  }
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 7397492e..0351e046 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -141,6 +141,13 @@
>  #define LJ_TARGET_GC64		1
>  #endif
>  
> +#ifdef _UWP
> +#define LJ_TARGET_UWP		1
> +#if LUAJIT_TARGET == LUAJIT_ARCH_X64
> +#define LJ_TARGET_GC64		1
> +#endif
> +#endif
> +
>  #define LJ_NUMMODE_SINGLE	0	/* Single-number mode only. */
>  #define LJ_NUMMODE_SINGLE_DUAL	1	/* Default to single-number mode. */
>  #define LJ_NUMMODE_DUAL		2	/* Dual-number mode only. */
> @@ -586,6 +593,18 @@
>  #define LJ_ABI_WIN		0
>  #endif
>  
> +#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_UWP
> +#define LJ_WIN_VALLOC	VirtualAllocFromApp
> +#define LJ_WIN_VPROTECT	VirtualProtectFromApp
> +extern void *LJ_WIN_LOADLIBA(const char *path);
> +#else
> +#define LJ_WIN_VALLOC	VirtualAlloc
> +#define LJ_WIN_VPROTECT	VirtualProtect
> +#define LJ_WIN_LOADLIBA(path)	LoadLibraryExA((path), NULL, 0)
> +#endif
> +#endif
> +
>  #if defined(LUAJIT_NO_UNWIND) || __GNU_COMPACT_EH__ || defined(__symbian__) || LJ_TARGET_IOS || LJ_TARGET_PS3 || LJ_TARGET_PS4
>  #define LJ_NO_UNWIND		1
>  #endif
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index c33190d7..37edd00f 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -267,7 +267,7 @@ static void callback_mcode_new(CTState *cts)
>    if (CALLBACK_MAX_SLOT == 0)
>      lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
>  #if LJ_TARGET_WINDOWS
> -  p = VirtualAlloc(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> +  p = LJ_WIN_VALLOC(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
>    if (!p)
>      lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
>  #elif LJ_TARGET_POSIX
> @@ -285,7 +285,7 @@ static void callback_mcode_new(CTState *cts)
>  #if LJ_TARGET_WINDOWS
>    {
>      DWORD oprot;
> -    VirtualProtect(p, sz, PAGE_EXECUTE_READ, &oprot);
> +    LJ_WIN_VPROTECT(p, sz, PAGE_EXECUTE_READ, &oprot);
>    }
>  #elif LJ_TARGET_POSIX
>    mprotect(p, sz, (PROT_READ|PROT_EXEC));
> diff --git a/src/lj_clib.c b/src/lj_clib.c
> index c06c0915..a8672052 100644
> --- a/src/lj_clib.c
> +++ b/src/lj_clib.c
> @@ -158,11 +158,13 @@ BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
>  /* Default libraries. */
>  enum {
>    CLIB_HANDLE_EXE,
> +#if !LJ_TARGET_UWP
>    CLIB_HANDLE_DLL,
>    CLIB_HANDLE_CRT,
>    CLIB_HANDLE_KERNEL32,
>    CLIB_HANDLE_USER32,
>    CLIB_HANDLE_GDI32,
> +#endif
>    CLIB_HANDLE_MAX
>  };
>  
> @@ -208,7 +210,7 @@ static const char *clib_extname(lua_State *L, const char *name)
>  static void *clib_loadlib(lua_State *L, const char *name, int global)
>  {
>    DWORD oldwerr = GetLastError();
> -  void *h = (void *)LoadLibraryExA(clib_extname(L, name), NULL, 0);
> +  void *h = LJ_WIN_LOADLIBA(clib_extname(L, name));
>    if (!h) clib_error(L, "cannot load module " LUA_QS ": %s", name);
>    SetLastError(oldwerr);
>    UNUSED(global);
> @@ -218,6 +220,7 @@ static void *clib_loadlib(lua_State *L, const char *name, int global)
>  static void clib_unloadlib(CLibrary *cl)
>  {
>    if (cl->handle == CLIB_DEFHANDLE) {
> +#if !LJ_TARGET_UWP
>      MSize i;
>      for (i = CLIB_HANDLE_KERNEL32; i < CLIB_HANDLE_MAX; i++) {
>        void *h = clib_def_handle[i];
> @@ -226,11 +229,16 @@ static void clib_unloadlib(CLibrary *cl)
>  	FreeLibrary((HINSTANCE)h);
>        }
>      }
> +#endif
>    } else if (cl->handle) {
>      FreeLibrary((HINSTANCE)cl->handle);
>    }
>  }
>  
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
>  static void *clib_getsym(CLibrary *cl, const char *name)
>  {
>    void *p = NULL;
> @@ -239,6 +247,9 @@ static void *clib_getsym(CLibrary *cl, const char *name)
>      for (i = 0; i < CLIB_HANDLE_MAX; i++) {
>        HINSTANCE h = (HINSTANCE)clib_def_handle[i];
>        if (!(void *)h) {  /* Resolve default library handles (once). */
> +#if LJ_TARGET_UWP
> +	h = (HINSTANCE)&__ImageBase;
> +#else
>  	switch (i) {
>  	case CLIB_HANDLE_EXE: GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &h); break;
>  	case CLIB_HANDLE_DLL:
> @@ -249,11 +260,12 @@ static void *clib_getsym(CLibrary *cl, const char *name)
>  	  GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
>  			     (const char *)&_fmode, &h);
>  	  break;
> -	case CLIB_HANDLE_KERNEL32: h = LoadLibraryExA("kernel32.dll", NULL, 0); break;
> -	case CLIB_HANDLE_USER32: h = LoadLibraryExA("user32.dll", NULL, 0); break;
> -	case CLIB_HANDLE_GDI32: h = LoadLibraryExA("gdi32.dll", NULL, 0); break;
> +	case CLIB_HANDLE_KERNEL32: h = LJ_WIN_LOADLIBA("kernel32.dll"); break;
> +	case CLIB_HANDLE_USER32: h = LJ_WIN_LOADLIBA("user32.dll"); break;
> +	case CLIB_HANDLE_GDI32: h = LJ_WIN_LOADLIBA("gdi32.dll"); break;
>  	}
>  	if (!h) continue;
> +#endif
>  	clib_def_handle[i] = (void *)h;
>        }
>        p = (void *)GetProcAddress(h, name);
> diff --git a/src/lj_mcode.c b/src/lj_mcode.c
> index c6361018..10db4457 100644
> --- a/src/lj_mcode.c
> +++ b/src/lj_mcode.c
> @@ -66,8 +66,8 @@ void lj_mcode_sync(void *start, void *end)
>  
>  static void *mcode_alloc_at(jit_State *J, uintptr_t hint, size_t sz, DWORD prot)
>  {
> -  void *p = VirtualAlloc((void *)hint, sz,
> -			 MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
> +  void *p = LJ_WIN_VALLOC((void *)hint, sz,
> +			  MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
>    if (!p && !hint)
>      lj_trace_err(J, LJ_TRERR_MCODEAL);
>    return p;
> @@ -82,7 +82,7 @@ static void mcode_free(jit_State *J, void *p, size_t sz)
>  static int mcode_setprot(void *p, size_t sz, DWORD prot)
>  {
>    DWORD oprot;
> -  return !VirtualProtect(p, sz, prot, &oprot);
> +  return !LJ_WIN_VPROTECT(p, sz, prot, &oprot);
>  }
>  
>  #elif LJ_TARGET_POSIX
> @@ -255,7 +255,7 @@ static void *mcode_alloc(jit_State *J, size_t sz)
>  /* All memory addresses are reachable by relative jumps. */
>  static void *mcode_alloc(jit_State *J, size_t sz)
>  {
> -#ifdef __OpenBSD__
> +#if defined(__OpenBSD__) || LJ_TARGET_UWP
>    /* Allow better executable memory allocation for OpenBSD W^X mode. */
>    void *p = mcode_alloc_at(J, 0, sz, MCPROT_RUN);
>    if (p && mcode_setprot(p, sz, MCPROT_GEN)) {
> diff --git a/src/lj_profile_timer.c b/src/lj_profile_timer.c
> index 056fd1f7..0b859457 100644
> --- a/src/lj_profile_timer.c
> +++ b/src/lj_profile_timer.c
> @@ -84,7 +84,7 @@ static DWORD WINAPI timer_thread(void *timerx)
>  {
>    lj_profile_timer *timer = (lj_profile_timer *)timerx;
>    int interval = timer->opt.interval_msec;
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
>    timer->wmm_tbp(interval);
>  #endif
>    while (1) {
> @@ -92,7 +92,7 @@ static DWORD WINAPI timer_thread(void *timerx)
>      if (timer->abort) break;
>      timer->opt.handler();
>    }
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
>    timer->wmm_tep(interval);
>  #endif
>    return 0;
> @@ -101,9 +101,9 @@ static DWORD WINAPI timer_thread(void *timerx)
>  /* Start profiling timer thread. */
>  void lj_profile_timer_start(lj_profile_timer *timer)
>  {
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
>    if (!timer->wmm) { /* Load WinMM library on-demand. */
> -    timer->wmm = LoadLibraryExA("winmm.dll", NULL, 0);
> +    timer->wmm = LJ_WIN_LOADLIBA("winmm.dll");
>      if (timer->wmm) {
>        timer->wmm_tbp =
>  	(WMM_TPFUNC)GetProcAddress(timer->wmm, "timeBeginPeriod");
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:07   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:52     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 17:04     ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 2 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:07 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.

On Wed, Aug 09, 2023 at 06:35:58PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)
> 
> This patch refactors FFI parsing of supported C attributes and pragmas,
> `ffi.abi()` parameter check. It replaces usage of comparison (with
Typo: s/usage/the usage/
> hardcoded string hashes) with search in the given string with the
Typo: s/with search/with a search/
> format: "\XXXattribute1\XXXattribute2", where `\XXX` is the length of
> "attribute" name.
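A minimal sketch of the length-prefixed search described above may help.
It assumes plain NUL-terminated C strings instead of `GCstr`; the name
`demo_match_case` is hypothetical and only mirrors the loop of
`lj_cparse_case()` from the diff below.

```c
#include <stddef.h>
#include <string.h>

/*
** Search `str` in a list of length-prefixed entries. Each entry is one
** byte holding the length (written as an octal escape), followed by that
** many characters. The terminating NUL of the literal reads as length 0
** and ends the search. Returns the zero-based entry index, or -1.
*/
static int demo_match_case(const char *str, const char *match)
{
  size_t slen = strlen(str);
  size_t len;
  int n;
  for (n = 0; (len = (size_t)(unsigned char)*match++); n++, match += len) {
    if (slen == len && !memcmp(match, str, len))
      return n;  /* Index of the matching entry. */
  }
  return -1;  /* Not found. */
}
```

For the list "\007aligned" "\013__aligned__" "\006packed" this returns 0 for
"aligned", 1 for "__aligned__", 2 for "packed" and -1 for "align". The
returned index is what the `switch` in the patch dispatches on, which is why
each attribute and its `__attribute__` spelling occupy adjacent `case` labels.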
> 
> Sergey Kaplun:
> * added the description for the commit
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lib_ffi.c   | 35 ++++++++++------------
>  src/lj_cparse.c | 77 +++++++++++++++++++++++++++++++------------------
>  src/lj_cparse.h |  2 ++
>  3 files changed, 67 insertions(+), 47 deletions(-)
> 
> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
> index d1fe1a14..62af54c1 100644
> --- a/src/lib_ffi.c
> +++ b/src/lib_ffi.c
> @@ -720,50 +720,47 @@ LJLIB_CF(ffi_fill)	LJLIB_REC(.)
>    return 0;
>  }
>  
> -#define H_(le, be)	LJ_ENDIAN_SELECT(0x##le, 0x##be)
> -
>  /* Test ABI string. */
>  LJLIB_CF(ffi_abi)	LJLIB_REC(.)
>  {
>    GCstr *s = lj_lib_checkstr(L, 1);
> -  int b = 0;
> -  switch (s->hash) {
> +  int b = lj_cparse_case(s,
>  #if LJ_64
> -  case H_(849858eb,ad35fd06): b = 1; break;  /* 64bit */
> +    "\00564bit"
>  #else
> -  case H_(662d3c79,d0e22477): b = 1; break;  /* 32bit */
> +    "\00532bit"
>  #endif
>  #if LJ_ARCH_HASFPU
> -  case H_(e33ee463,e33ee463): b = 1; break;  /* fpu */
> +    "\003fpu"
>  #endif
>  #if LJ_ABI_SOFTFP
> -  case H_(61211a23,c2e8c81c): b = 1; break;  /* softfp */
> +    "\006softfp"
>  #else
> -  case H_(539417a8,8ce0812f): b = 1; break;  /* hardfp */
> +    "\006hardfp"
>  #endif
>  #if LJ_ABI_EABI
> -  case H_(2182df8f,f2ed1152): b = 1; break;  /* eabi */
> +    "\004eabi"
>  #endif
>  #if LJ_ABI_WIN
> -  case H_(4ab624a8,4ab624a8): b = 1; break;  /* win */
> +    "\003win"
>  #endif
>  #if LJ_TARGET_UWP
> -  case H_(a40f0bcb,a40f0bcb): b = 1; break;  /* uwp */
> +    "\003uwp"
> +#endif
> +#if LJ_LE
> +    "\002le"
> +#else
> +    "\002be"
>  #endif
> -  case H_(3af93066,1f001464): b = 1; break;  /* le/be */
>  #if LJ_GC64
> -  case H_(9e89d2c9,13c83c92): b = 1; break;  /* gc64 */
> +    "\004gc64"
>  #endif
> -  default:
> -    break;
> -  }
> +  ) >= 0;
>    setboolV(L->top-1, b);
>    setboolV(&G(L)->tmptv2, b);  /* Remember for trace recorder. */
>    return 1;
>  }
>  
> -#undef H_
> -
>  LJLIB_PUSH(top-8) LJLIB_SET(!)  /* Store reference to miscmap table. */
>  
>  LJLIB_CF(ffi_metatype)
> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
> index fb440567..07c643d4 100644
> --- a/src/lj_cparse.c
> +++ b/src/lj_cparse.c
> @@ -28,6 +28,24 @@
>  ** If in doubt, please check the input against your favorite C compiler.
>  */
>  
> +/* -- Miscellaneous ------------------------------------------------------- */
> +
> +/* Match string against a C literal. */
> +#define cp_str_is(str, k) \
> +  ((str)->len == sizeof(k)-1 && !memcmp(strdata(str), k, sizeof(k)-1))
> +
> +/* Check string against a linear list of matches. */
> +int lj_cparse_case(GCstr *str, const char *match)
> +{
> +  MSize len;
> +  int n;
> +  for  (n = 0; (len = (MSize)*match++); n++, match += len) {
> +    if (str->len == len && !memcmp(match, strdata(str), len))
> +      return n;
> +  }
> +  return -1;
> +}
> +
>  /* -- C lexer ------------------------------------------------------------- */
>  
>  /* C lexer token names. */
> @@ -930,8 +948,6 @@ static CTypeID cp_decl_intern(CPState *cp, CPDecl *decl)
>  
>  /* -- C declaration parser ------------------------------------------------ */
>  
> -#define H_(le, be)	LJ_ENDIAN_SELECT(0x##le, 0x##be)
> -
>  /* Reset declaration state to declaration specifier. */
>  static void cp_decl_reset(CPDecl *decl)
>  {
> @@ -1071,44 +1087,57 @@ static void cp_decl_gccattribute(CPState *cp, CPDecl *decl)
>  	attrstr = lj_str_new(cp->L, c+2, attrstr->len-4);
>  #endif
>        cp_next(cp);
> -      switch (attrstr->hash) {
> -      case H_(64a9208e,8ce14319): case H_(8e6331b2,95a282af):  /* aligned */
> +      switch (lj_cparse_case(attrstr,
> +		"\007aligned" "\013__aligned__"
> +		"\006packed" "\012__packed__"
> +		"\004mode" "\010__mode__"
> +		"\013vector_size" "\017__vector_size__"
> +#if LJ_TARGET_X86
> +		"\007regparm" "\013__regparm__"
> +		"\005cdecl"  "\011__cdecl__"
> +		"\010thiscall" "\014__thiscall__"
> +		"\010fastcall" "\014__fastcall__"
> +		"\007stdcall" "\013__stdcall__"
> +		"\012sseregparm" "\016__sseregparm__"
> +#endif
> +	      )) {
> +      case 0: case 1: /* aligned */
>  	cp_decl_align(cp, decl);
>  	break;
> -      case H_(42eb47de,f0ede26c): case H_(29f48a09,cf383e0c):  /* packed */
> +      case 2: case 3: /* packed */
>  	decl->attr |= CTFP_PACKED;
>  	break;
> -      case H_(0a84eef6,8dfab04c): case H_(995cf92c,d5696591):  /* mode */
> +      case 4: case 5: /* mode */
>  	cp_decl_mode(cp, decl);
>  	break;
> -      case H_(0ab31997,2d5213fa): case H_(bf875611,200e9990):  /* vector_size */
> +      case 6: case 7: /* vector_size */
>  	{
>  	  CTSize vsize = cp_decl_sizeattr(cp);
>  	  if (vsize) CTF_INSERT(decl->attr, VSIZEP, lj_fls(vsize));
>  	}
>  	break;
>  #if LJ_TARGET_X86
> -      case H_(5ad22db8,c689b848): case H_(439150fa,65ea78cb):  /* regparm */
> +      case 8: case 9: /* regparm */
>  	CTF_INSERT(decl->fattr, REGPARM, cp_decl_sizeattr(cp));
>  	decl->fattr |= CTFP_CCONV;
>  	break;
> -      case H_(18fc0b98,7ff4c074): case H_(4e62abed,0a747424):  /* cdecl */
> +      case 10: case 11: /* cdecl */
>  	CTF_INSERT(decl->fattr, CCONV, CTCC_CDECL);
>  	decl->fattr |= CTFP_CCONV;
>  	break;
> -      case H_(72b2e41b,494c5a44): case H_(f2356d59,f25fc9bd):  /* thiscall */
> +      case 12: case 13: /* thiscall */
>  	CTF_INSERT(decl->fattr, CCONV, CTCC_THISCALL);
>  	decl->fattr |= CTFP_CCONV;
>  	break;
> -      case H_(0d0ffc42,ab746f88): case H_(21c54ba1,7f0ca7e3):  /* fastcall */
> +      case 14: case 15: /* fastcall */
>  	CTF_INSERT(decl->fattr, CCONV, CTCC_FASTCALL);
>  	decl->fattr |= CTFP_CCONV;
>  	break;
> -      case H_(ef76b040,9412e06a): case H_(de56697b,c750e6e1):  /* stdcall */
> +      case 16: case 17: /* stdcall */
>  	CTF_INSERT(decl->fattr, CCONV, CTCC_STDCALL);
>  	decl->fattr |= CTFP_CCONV;
>  	break;
> -      case H_(ea78b622,f234bd8e): case H_(252ffb06,8d50f34b):  /* sseregparm */
> +      case 18: case 19: /* sseregparm */
>  	decl->fattr |= CTF_SSEREGPARM;
>  	decl->fattr |= CTFP_CCONV;
>  	break;
> @@ -1140,16 +1169,13 @@ static void cp_decl_msvcattribute(CPState *cp, CPDecl *decl)
>    while (cp->tok == CTOK_IDENT) {
>      GCstr *attrstr = cp->str;
>      cp_next(cp);
> -    switch (attrstr->hash) {
> -    case H_(bc2395fa,98f267f8):  /* align */
> +    if (cp_str_is(attrstr, "align")) {
>        cp_decl_align(cp, decl);
> -      break;
> -    default:  /* Ignore all other attributes. */
> +    } else {  /* Ignore all other attributes. */
>        if (cp_opt(cp, '(')) {
>  	while (cp->tok != ')' && cp->tok != CTOK_EOF) cp_next(cp);
>  	cp_check(cp, ')');
>        }
> -      break;
>      }
>    }
>    cp_check(cp, ')');
> @@ -1729,17 +1755,16 @@ static CTypeID cp_decl_abstract(CPState *cp)
>  static void cp_pragma(CPState *cp, BCLine pragmaline)
>  {
>    cp_next(cp);
> -  if (cp->tok == CTOK_IDENT &&
> -      cp->str->hash == H_(e79b999f,42ca3e85))  {  /* pack */
> +  if (cp->tok == CTOK_IDENT && cp_str_is(cp->str, "pack"))  {
>      cp_next(cp);
>      cp_check(cp, '(');
>      if (cp->tok == CTOK_IDENT) {
> -      if (cp->str->hash == H_(738e923c,a1b65954)) {  /* push */
> +      if (cp_str_is(cp->str, "push")) {
>  	if (cp->curpack < CPARSE_MAX_PACKSTACK) {
>  	  cp->packstack[cp->curpack+1] = cp->packstack[cp->curpack];
>  	  cp->curpack++;
>  	}
> -      } else if (cp->str->hash == H_(6c71cf27,6c71cf27)) {  /* pop */
> +      } else if (cp_str_is(cp->str, "pop")) {
>  	if (cp->curpack > 0) cp->curpack--;
>        } else {
>  	cp_errmsg(cp, cp->tok, LJ_ERR_XSYMBOL);
> @@ -1788,13 +1813,11 @@ static void cp_decl_multi(CPState *cp)
>        if (tok == CTOK_INTEGER) {
>  	cp_line(cp, hashline);
>  	continue;
> -      } else if (tok == CTOK_IDENT &&
> -		 cp->str->hash == H_(187aab88,fcb60b42)) { /* line */
> +      } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "line")) {
>  	if (cp_next(cp) != CTOK_INTEGER) cp_err_token(cp, tok);
>  	cp_line(cp, hashline);
>  	continue;
> -      } else if (tok == CTOK_IDENT &&
> -	  cp->str->hash == H_(f5e6b4f8,1d509107)) { /* pragma */
> +      } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "pragma")) {
>  	cp_pragma(cp, hashline);
>  	continue;
>        } else {
> @@ -1865,8 +1888,6 @@ static void cp_decl_single(CPState *cp)
>    if (cp->tok != CTOK_EOF) cp_err_token(cp, CTOK_EOF);
>  }
>  
> -#undef H_
> -
>  /* ------------------------------------------------------------------------ */
>  
>  /* Protected callback for C parser. */
> diff --git a/src/lj_cparse.h b/src/lj_cparse.h
> index bad1060b..e40b4047 100644
> --- a/src/lj_cparse.h
> +++ b/src/lj_cparse.h
> @@ -60,6 +60,8 @@ typedef struct CPState {
>  
>  LJ_FUNC int lj_cparse(CPState *cp);
>  
> +LJ_FUNC int lj_cparse_case(GCstr *str, const char *match);
> +
>  #endif
>  
>  #endif
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
  2023-08-11  8:06   ` Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:10   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 17:15   ` Sergey Bronnikov via Tarantool-patches
  2 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:10 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM
On Wed, Aug 09, 2023 at 06:35:59PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)
> 
> This patch backports the aforementioned patch for mips and ppc, because
> those architectures were stripped during the backporting via
> 71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math function
> compilation and fix inconsistencies."). This applies these missed diffs
> to prevent conflict during backporting future patches.
> 
> This patch just removes macros, that are no more in use.
> 
> Sergey Kaplun:
> * added the description for the problem
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_asm_mips.h | 1 -
>  src/lj_asm_ppc.h  | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index a26a82cd..c27d8413 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
>  }
>  #endif
>  
> -#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
>  #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
>  
>  static void asm_arithov(ASMState *as, IRIns *ir)
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index 6cb608f7..6aaed058 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
>  }
>  
>  #define asm_abs(as, ir)		asm_fpunary(as, ir, PPCI_FABS)
> -#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
>  #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
>  
>  static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:17   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:59     ` Sergey Kaplun via Tarantool-patches
  2023-08-17  7:37   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:17 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM as trivial, except for a few comments regarding the commit message below.
On Wed, Aug 09, 2023 at 06:36:00PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)
> 
> This patch adds the `/* fallthrough */` where it may trigger the
> `-Wimplicit-fallthrough` [1] warning. Some cases still not covered by
Typo: s/cases still/cases are still/
> this comment and will be fixed in the future commits.
Typo: s/in the/in/
> 
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> 
> Sergey Kaplun:
> * added the description for the commit
> 
> Part of tarantool/tarantool#8825
> ---
>  dynasm/dasm_arm.h  |  2 ++
>  dynasm/dasm_mips.h |  1 +
>  dynasm/dasm_ppc.h  |  1 +
>  dynasm/dasm_x86.h  | 18 ++++++++++++++----
>  src/lj_asm.c       |  7 ++++++-
>  src/lj_cparse.c    | 10 ++++++++++
>  src/lj_err.c       |  1 +
>  src/lj_opt_sink.c  |  2 +-
>  src/lj_parse.c     |  3 ++-
>  src/luajit.c       |  1 +
>  10 files changed, 39 insertions(+), 7 deletions(-)
> 
> diff --git a/dynasm/dasm_arm.h b/dynasm/dasm_arm.h
> index a43f7c66..1d404ccd 100644
> --- a/dynasm/dasm_arm.h
> +++ b/dynasm/dasm_arm.h
> @@ -254,6 +254,7 @@ void dasm_put(Dst_DECL, int start, ...)
>        case DASM_IMMV8:
>  	CK((n & 3) == 0, RANGE_I);
>  	n >>= 2;
> +	/* fallthrough */
>        case DASM_IMML8:
>        case DASM_IMML12:
>  	CK(n >= 0 ? ((n>>((ins>>5)&31)) == 0) :
> @@ -371,6 +372,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	  break;
>  	case DASM_REL_LG:
>  	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>  	case DASM_REL_PC:
>  	  CK(n >= 0, UNDEF_PC);
>  	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) - 4;
> diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
> index 4b49fd8c..71a835b2 100644
> --- a/dynasm/dasm_mips.h
> +++ b/dynasm/dasm_mips.h
> @@ -350,6 +350,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	  break;
>  	case DASM_REL_LG:
>  	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>  	case DASM_REL_PC:
>  	  CK(n >= 0, UNDEF_PC);
>  	  n = *DASM_POS2PTR(D, n);
> diff --git a/dynasm/dasm_ppc.h b/dynasm/dasm_ppc.h
> index 3a7ee9b0..83fc030a 100644
> --- a/dynasm/dasm_ppc.h
> +++ b/dynasm/dasm_ppc.h
> @@ -354,6 +354,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	  break;
>  	case DASM_REL_LG:
>  	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>  	case DASM_REL_PC:
>  	  CK(n >= 0, UNDEF_PC);
>  	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base);
> diff --git a/dynasm/dasm_x86.h b/dynasm/dasm_x86.h
> index bc636357..2a276042 100644
> --- a/dynasm/dasm_x86.h
> +++ b/dynasm/dasm_x86.h
> @@ -194,12 +194,13 @@ void dasm_put(Dst_DECL, int start, ...)
>        switch (action) {
>        case DASM_DISP:
>  	if (n == 0) { if (mrm < 0) mrm = p[-2]; if ((mrm&7) != 5) break; }
> -      case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob;
> +	/* fallthrough */
> +      case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
>        case DASM_REL_A: /* Assumes ptrdiff_t is int. !x64 */
>        case DASM_IMM_D: ofs += 4; break;
>        case DASM_IMM_S: CK(((n+128)&-256) == 0, RANGE_I); goto ob;
>        case DASM_IMM_B: CK((n&-256) == 0, RANGE_I); ob: ofs++; break;
> -      case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob;
> +      case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
>        case DASM_IMM_W: CK((n&-65536) == 0, RANGE_I); ofs += 2; break;
>        case DASM_SPACE: p++; ofs += n; break;
>        case DASM_SETLABEL: b[pos-2] = -0x40000000; break;  /* Neg. label ofs. */
> @@ -207,8 +208,8 @@ void dasm_put(Dst_DECL, int start, ...)
>  	if (*p < 0x40 && p[1] == DASM_DISP) mrm = n;
>  	if (*p < 0x20 && (n&7) == 4) ofs++;
>  	switch ((*p++ >> 3) & 3) {
> -	case 3: n |= b[pos-3];
> -	case 2: n |= b[pos-2];
> +	case 3: n |= b[pos-3]; /* fallthrough */
> +	case 2: n |= b[pos-2]; /* fallthrough */
>  	case 1: if (n <= 7) { b[pos-1] |= 0x10; ofs--; }
>  	}
>  	continue;
> @@ -329,11 +330,14 @@ int dasm_link(Dst_DECL, size_t *szp)
>  	  pos += 2;
>  	  break;
>  	}
> +	  /* fallthrough */
>  	case DASM_SPACE: case DASM_IMM_LG: case DASM_VREG: p++;
> +	  /* fallthrough */
>  	case DASM_DISP: case DASM_IMM_S: case DASM_IMM_B: case DASM_IMM_W:
>  	case DASM_IMM_D: case DASM_IMM_WB: case DASM_IMM_DB:
>  	case DASM_SETLABEL: case DASM_REL_A: case DASM_IMM_PC: pos++; break;
>  	case DASM_LABEL_LG: p++;
> +	  /* fallthrough */
>  	case DASM_LABEL_PC: b[pos++] += ofs; break; /* Fix label offset. */
>  	case DASM_ALIGN: ofs -= (b[pos++]+ofs)&*p++; break; /* Adjust ofs. */
>  	case DASM_EXTERN: p += 2; break;
> @@ -391,12 +395,15 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	    if (mrm != 5) { mm[-1] -= 0x80; break; } }
>  	  if (((n+128) & -256) != 0) goto wd; else mm[-1] -= 0x40;
>  	}
> +	  /* fallthrough */
>  	case DASM_IMM_S: case DASM_IMM_B: wb: dasmb(n); break;
>  	case DASM_IMM_DB: if (((n+128)&-256) == 0) {
>  	    db: if (!mark) mark = cp; mark[-2] += 2; mark = NULL; goto wb;
>  	  } else mark = NULL;
> +	  /* fallthrough */
>  	case DASM_IMM_D: wd: dasmd(n); break;
>  	case DASM_IMM_WB: if (((n+128)&-256) == 0) goto db; else mark = NULL;
> +	  /* fallthrough */
>  	case DASM_IMM_W: dasmw(n); break;
>  	case DASM_VREG: {
>  	  int t = *p++;
> @@ -421,6 +428,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	}
>  	case DASM_REL_LG: p++; if (n >= 0) goto rel_pc;
>  	  b++; n = (int)(ptrdiff_t)D->globals[-n];
> +	  /* fallthrough */
>  	case DASM_REL_A: rel_a: n -= (int)(ptrdiff_t)(cp+4); goto wd; /* !x64 */
>  	case DASM_REL_PC: rel_pc: {
>  	  int shrink = *b++;
> @@ -432,6 +440,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	}
>  	case DASM_IMM_LG:
>  	  p++; if (n < 0) { n = (int)(ptrdiff_t)D->globals[-n]; goto wd; }
> +	  /* fallthrough */
>  	case DASM_IMM_PC: {
>  	  int *pb = DASM_POS2PTR(D, n);
>  	  n = *pb < 0 ? pb[1] : (*pb + (int)(ptrdiff_t)base);
> @@ -452,6 +461,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	case DASM_EXTERN: n = DASM_EXTERN(Dst, cp, p[1], *p); p += 2; goto wd;
>  	case DASM_MARK: mark = cp; break;
>  	case DASM_ESC: action = *p++;
> +	  /* fallthrough */
>  	default: *cp++ = action; break;
>  	case DASM_SECTION: case DASM_STOP: goto stop;
>  	}
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 15de7e33..2d570bb9 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -2188,9 +2188,12 @@ static void asm_setup_regsp(ASMState *as)
>  	if (ir->op2 != REF_NIL && as->evenspill < 4)
>  	  as->evenspill = 4;  /* lj_cdata_newv needs 4 args. */
>        }
> +      /* fallthrough */
>  #else
> +      /* fallthrough */
>      case IR_CNEW:
>  #endif
> +      /* fallthrough */
>      case IR_TNEW: case IR_TDUP: case IR_CNEWI: case IR_TOSTR:
>      case IR_BUFSTR:
>        ir->prev = REGSP_HINT(RID_RET);
> @@ -2206,6 +2209,7 @@ static void asm_setup_regsp(ASMState *as)
>      case IR_LDEXP:
>  #endif
>  #endif
> +      /* fallthrough */
>      case IR_POW:
>        if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>  	if (inloop)
> @@ -2217,7 +2221,7 @@ static void asm_setup_regsp(ASMState *as)
>  	continue;
>  #endif
>        }
> -      /* fallthrough for integer POW */
> +      /* fallthrough */ /* for integer POW */
>      case IR_DIV: case IR_MOD:
>        if (!irt_isnum(ir->t)) {
>  	ir->prev = REGSP_HINT(RID_RET);
> @@ -2254,6 +2258,7 @@ static void asm_setup_regsp(ASMState *as)
>      case IR_BSHL: case IR_BSHR: case IR_BSAR:
>        if ((as->flags & JIT_F_BMI2))  /* Except if BMI2 is available. */
>  	break;
> +      /* fallthrough */
>      case IR_BROL: case IR_BROR:
>        if (!irref_isk(ir->op2) && !ra_hashint(IR(ir->op2)->r)) {
>  	IR(ir->op2)->r = REGSP_HINT(RID_ECX);
> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
> index 07c643d4..cd032b8e 100644
> --- a/src/lj_cparse.c
> +++ b/src/lj_cparse.c
> @@ -595,28 +595,34 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>  	k->id = k2.id > k3.id ? k2.id : k3.id;
>  	continue;
>        }
> +      /* fallthrough */
>      case 1:
>        if (cp_opt(cp, CTOK_OROR)) {
>  	cp_expr_sub(cp, &k2, 2); k->i32 = k->u32 || k2.u32; k->id = CTID_INT32;
>  	continue;
>        }
> +      /* fallthrough */
>      case 2:
>        if (cp_opt(cp, CTOK_ANDAND)) {
>  	cp_expr_sub(cp, &k2, 3); k->i32 = k->u32 && k2.u32; k->id = CTID_INT32;
>  	continue;
>        }
> +      /* fallthrough */
>      case 3:
>        if (cp_opt(cp, '|')) {
>  	cp_expr_sub(cp, &k2, 4); k->u32 = k->u32 | k2.u32; goto arith_result;
>        }
> +      /* fallthrough */
>      case 4:
>        if (cp_opt(cp, '^')) {
>  	cp_expr_sub(cp, &k2, 5); k->u32 = k->u32 ^ k2.u32; goto arith_result;
>        }
> +      /* fallthrough */
>      case 5:
>        if (cp_opt(cp, '&')) {
>  	cp_expr_sub(cp, &k2, 6); k->u32 = k->u32 & k2.u32; goto arith_result;
>        }
> +      /* fallthrough */
>      case 6:
>        if (cp_opt(cp, CTOK_EQ)) {
>  	cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 == k2.u32; k->id = CTID_INT32;
> @@ -625,6 +631,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>  	cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 != k2.u32; k->id = CTID_INT32;
>  	continue;
>        }
> +      /* fallthrough */
>      case 7:
>        if (cp_opt(cp, '<')) {
>  	cp_expr_sub(cp, &k2, 8);
> @@ -659,6 +666,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>  	k->id = CTID_INT32;
>  	continue;
>        }
> +      /* fallthrough */
>      case 8:
>        if (cp_opt(cp, CTOK_SHL)) {
>  	cp_expr_sub(cp, &k2, 9); k->u32 = k->u32 << k2.u32;
> @@ -671,6 +679,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>  	  k->u32 = k->u32 >> k2.u32;
>  	continue;
>        }
> +      /* fallthrough */
>      case 9:
>        if (cp_opt(cp, '+')) {
>  	cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 + k2.u32;
> @@ -680,6 +689,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>        } else if (cp_opt(cp, '-')) {
>  	cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 - k2.u32; goto arith_result;
>        }
> +      /* fallthrough */
>      case 10:
>        if (cp_opt(cp, '*')) {
>  	cp_expr_unary(cp, &k2); k->u32 = k->u32 * k2.u32; goto arith_result;
> diff --git a/src/lj_err.c b/src/lj_err.c
> index 9903d273..8d7134d9 100644
> --- a/src/lj_err.c
> +++ b/src/lj_err.c
> @@ -167,6 +167,7 @@ static void *err_unwind(lua_State *L, void *stopcf, int errcode)
>      case FRAME_CONT:  /* Continuation frame. */
>        if (frame_iscont_fficb(frame))
>  	goto unwind_c;
> +      /* fallthrough */
>      case FRAME_VARG:  /* Vararg frame. */
>        frame = frame_prevd(frame);
>        break;
> diff --git a/src/lj_opt_sink.c b/src/lj_opt_sink.c
> index a16d112f..c16363e7 100644
> --- a/src/lj_opt_sink.c
> +++ b/src/lj_opt_sink.c
> @@ -100,8 +100,8 @@ static void sink_mark_ins(jit_State *J)
>  	   (LJ_32 && ir+1 < irlast && (ir+1)->o == IR_HIOP &&
>  	    !sink_checkphi(J, ir, (ir+1)->op2))))
>  	irt_setmark(ir->t);  /* Mark ineligible allocation. */
> -      /* fallthrough */
>  #endif
> +      /* fallthrough */
>      case IR_USTORE:
>        irt_setmark(IR(ir->op2)->t);  /* Mark stored value. */
>        break;
> diff --git a/src/lj_parse.c b/src/lj_parse.c
> index 343fa797..e238afa3 100644
> --- a/src/lj_parse.c
> +++ b/src/lj_parse.c
> @@ -2684,7 +2684,8 @@ static int parse_stmt(LexState *ls)
>        lj_lex_next(ls);
>        parse_goto(ls);
>        break;
> -    }  /* else: fallthrough */
> +    }
> +    /* fallthrough */
>    default:
>      parse_call_assign(ls);
>      break;
> diff --git a/src/luajit.c b/src/luajit.c
> index 1ca24301..3a3ec247 100644
> --- a/src/luajit.c
> +++ b/src/luajit.c
> @@ -421,6 +421,7 @@ static int collectargs(char **argv, int *flags)
>        break;
>      case 'e':
>        *flags |= FLAGS_EXEC;
> +      /* fallthrough */
>      case 'j':  /* LuaJIT extension */
>      case 'l':
>        *flags |= FLAGS_OPTION;
> -- 
> 2.41.0
> 
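For illustration only (not part of the patch): a minimal C sketch of the
pattern applied above. The names `option_flags`, `FLAG_EXEC` and
`FLAG_OPTION` are hypothetical, loosely modeled on the `collectargs()` hunk.
As far as I know, GCC at its default warning level accepts the marker only
when the comment consists of the fallthrough keyword and directly precedes
the next `case` or `default` label, which is presumably why the patch splits
`/* fallthrough for integer POW */` into two comments.

```c
#include <assert.h>

/* Hypothetical option flags, loosely modeled on collectargs(). */
enum { FLAG_EXEC = 1, FLAG_OPTION = 2 };

/* Return the flag set for a single option character, 0 if unknown. */
int option_flags(char opt)
{
  int flags = 0;
  switch (opt) {
  case 'e':
    flags |= FLAG_EXEC;
    /* fallthrough */
  case 'j':  /* Adjacent labels without statements need no marker. */
  case 'l':
    flags |= FLAG_OPTION;
    break;
  default:
    break;
  }
  /* Only the two known bits may be set. */
  assert((flags & ~(FLAG_EXEC | FLAG_OPTION)) == 0);
  return flags;
}
```

Compiling this with `gcc -Wimplicit-fallthrough` (GCC >= 7.1) should give no
warning for the `'e'` case; removing the comment should bring it back.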

* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:21   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 14:01     ` Sergey Kaplun via Tarantool-patches
  2023-08-17  7:39   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:21 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM as trivial, except for the single comment regarding the commit message below.
On Wed, Aug 09, 2023 at 06:36:01PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
> 
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> is trigerred.
Since only a single comment is added, I believe a better phrasing
would be:
| This patch adds the `/* fallthrough */` comment to dynasm/dasm_arm64.h, so the
| `-Wimplicit-fallthrough` [1] warning is not triggered anymore for the ARM64 build.
> 
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> 
> Sergey Kaplun:
> * added the description for the commit
> 
> Part of tarantool/tarantool#8825
> ---
>  dynasm/dasm_arm64.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/dynasm/dasm_arm64.h b/dynasm/dasm_arm64.h
> index 47e1e074..ff21236d 100644
> --- a/dynasm/dasm_arm64.h
> +++ b/dynasm/dasm_arm64.h
> @@ -427,6 +427,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	  break;
>  	case DASM_REL_LG:
>  	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>  	case DASM_REL_PC:
>  	  CK(n >= 0, UNDEF_PC);
>  	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4;
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:25   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 14:08     ` Sergey Kaplun via Tarantool-patches
  2023-08-17  7:44   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:25 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM as trivial, except for a few nits regarding the commit message.
On Wed, Aug 09, 2023 at 06:36:02PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)
> 
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
Typo: s/where it was/where they were/
> missing for the ARM build, so the `-Wimplicit-fallthrough` [1] warning
> is trigerred.
Typo: s/is trigerred/is not triggered/
> 
> Also, this commits sets the correspoinding flag in the
Typo: s/commits/commit/
Typo: s/in the/in/
> <cmake/SetTargetFlags.cmake>.
> 
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> 
> Sergey Kaplun:
> * added the description for the commit
> 
> Part of tarantool/tarantool#8825
> ---
>  cmake/SetTargetFlags.cmake | 6 ++++++
>  src/lj_asm.c               | 2 +-
>  src/lj_asm_arm.h           | 4 ++--
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
> index 3b9e481d..d309989e 100644
> --- a/cmake/SetTargetFlags.cmake
> +++ b/cmake/SetTargetFlags.cmake
> @@ -8,6 +8,12 @@
>  
>  include(CheckUnwindTables)
>  
> +# Clang does not recognize comment markers.
> +if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
> +    AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
> +  AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
> +endif()
> +
>  if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
>    set(BUILDVM_MODE machasm)
>  else() # Linux and FreeBSD.
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 2d570bb9..25b96264 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -2176,8 +2176,8 @@ static void asm_setup_regsp(ASMState *as)
>  #if LJ_SOFTFP
>      case IR_MIN: case IR_MAX:
>        if ((ir+1)->o != IR_HIOP) break;
> -      /* fallthrough */
>  #endif
> +    /* fallthrough */
>      /* C calls evict all scratch regs and return results in RID_RET. */
>      case IR_SNEW: case IR_XSNEW: case IR_NEWREF: case IR_BUFPUT:
>        if (REGARG_NUMGPR < 3 && as->evenspill < 3)
> diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h
> index 6ae6e2f2..2894e5c9 100644
> --- a/src/lj_asm_arm.h
> +++ b/src/lj_asm_arm.h
> @@ -979,7 +979,7 @@ static ARMIns asm_fxloadins(IRIns *ir)
>    case IRT_I16: return ARMI_LDRSH;
>    case IRT_U16: return ARMI_LDRH;
>    case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VLDR_D;
> -  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S;
> +  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VLDR_S;  /* fallthrough */
>    default: return ARMI_LDR;
>    }
>  }
> @@ -990,7 +990,7 @@ static ARMIns asm_fxstoreins(IRIns *ir)
>    case IRT_I8: case IRT_U8: return ARMI_STRB;
>    case IRT_I16: case IRT_U16: return ARMI_STRH;
>    case IRT_NUM: lua_assert(!LJ_SOFTFP); return ARMI_VSTR_D;
> -  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S;
> +  case IRT_FLOAT: if (!LJ_SOFTFP) return ARMI_VSTR_S;  /* fallthrough */
>    default: return ARMI_STR;
>    }
>  }
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
@ 2023-08-15 13:35   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 14:20     ` Sergey Kaplun via Tarantool-patches
  2023-08-17  8:29   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 13:35 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
Please consider my comments below.

On Wed, Aug 09, 2023 at 06:36:03PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Thanks to Sergey Ostanevich.
> 
> (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
> 
> This patch just reverts the commit
> 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
> debug.getinfo(1,'>S')") and applies the one from the main repo for the
Typo: s/for the/for/
> consistency with the upstream.
> ---
>  src/lj_debug.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)

Since there was no test with the original fix, it would be nice to
add one.
> 
> diff --git a/src/lj_debug.c b/src/lj_debug.c
> index 654dc913..c4edcabb 100644
> --- a/src/lj_debug.c
> +++ b/src/lj_debug.c
> @@ -431,16 +431,12 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
>    TValue *frame = NULL;
>    TValue *nextframe = NULL;
>    GCfunc *fn;
> -  if (*what == '>') { /* we have to have an extra arg on stack */
> -    if (lua_gettop(L) > 2) {
> -      TValue *func = L->top - 1;
> -      api_check(L, tvisfunc(func));
> -      fn = funcV(func);
> -      L->top--;
> -      what++;
> -    } else { /* need better error to display? */
> -      return 0;
> -    }
> +  if (*what == '>') {
> +    TValue *func = L->top - 1;
> +    if (!tvisfunc(func)) return 0;
> +    fn = funcV(func);
> +    L->top--;
> +    what++;
>    } else {
>      uint32_t offset = (uint32_t)ar->i_ci & 0xffff;
>      uint32_t size = (uint32_t)ar->i_ci >> 16;
> -- 
> 2.41.0
Best regards,
Maxim Kokryashkin
> 

* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
@ 2023-08-15 14:07   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 14:22     ` Sergey Kaplun via Tarantool-patches
  2023-08-17  8:57   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 14:07 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.
On Wed, Aug 09, 2023 at 06:36:04PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Thanks to Yichun Zhang.
> 
> (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
> 
> This patch is predecessor for the commit
Typo: s/is predecessor for the/is the predecessor to/
> 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> check.") It tries to fix the issue, when `J->baseslot == LJ_MAX_JSLOTS`,
> that leading to the assertion failure. Since the predecessor patch,
Typo: s/leading/leads/
> there are no places, that can lead to the condition failure, since we
> always check that new baseslot + framesize (+ vargframe) >=
> `LJ_MAX_JSLOTS`. As far as minimum framesize is 1 (see <src/lj_parse.c>
Typo: s/as minimum/as the minimum/
> for details), we can't obtain this assertion failure. This patch is
> added for the consistency with the upstream.
Typo: s/the consistency/consistency/
> 
> Since the predecessor patch fixes the issue, there is no new test case
> to add.
> 
> Sergey Kaplun:
> * added the description for the problem
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_record.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/lj_record.c b/src/lj_record.c
> index 02d9db9e..6030f77c 100644
> --- a/src/lj_record.c
> +++ b/src/lj_record.c
> @@ -87,9 +87,9 @@ static void rec_check_slots(jit_State *J)
>    BCReg s, nslots = J->baseslot + J->maxslot;
>    int32_t depth = 0;
>    cTValue *base = J->L->base - J->baseslot;
> -  lua_assert(J->baseslot >= 1+LJ_FR2 && J->baseslot < LJ_MAX_JSLOTS);
> +  lua_assert(J->baseslot >= 1+LJ_FR2);
>    lua_assert(J->baseslot == 1+LJ_FR2 || (J->slot[J->baseslot-1] & TREF_FRAME));
> -  lua_assert(nslots < LJ_MAX_JSLOTS);
> +  lua_assert(nslots <= LJ_MAX_JSLOTS);
>    for (s = 0; s < nslots; s++) {
>      TRef tr = J->slot[s];
>      if (tr) {
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
@ 2023-08-15 14:38   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 14:52     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 10:53   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-15 14:38 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few comments below.

On Wed, Aug 09, 2023 at 06:36:05PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)
> 
> This commit fixes possible integer overflow of the separator's length
Typo: s/possible/a possible/
> counter during parsing long strings. It may lead to the fact, that
> parser considers a string with unbalanced long brackets to be correct.
Typo: s/parser/the parser/
> Since this is pointless to parse too long string separators in the hope,
Typo: s/this is/it is/
> that the string is correct, just use hardcoded limit (2 ^ 25 is enough).
Typo: s/use hardcoded/use the hardcoded/
> 
> Be aware that this limit is different for Lua 5.1.
> 
> We can't check the string overflow itself without a really large file,
> because the ERR_MEM error will be raised, due to the string buffer
> reallocations during parsing. Keep such huge file in the repo is
Typo: s/Keep such/Keeping such a/
> pointless, so just check that we don't parse long string after
Typo: s/long string/long strings/
> aforementioned separator length.
Typo: s/aforementioned/the aforementioned/
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_lex.c                                  |  2 +-
>  .../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
>  2 files changed, 32 insertions(+), 1 deletion(-)
>  create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> 
> diff --git a/src/lj_lex.c b/src/lj_lex.c
> index 52856912..c66660d7 100644
> --- a/src/lj_lex.c
> +++ b/src/lj_lex.c
> @@ -138,7 +138,7 @@ static int lex_skipeq(LexState *ls)
>    int count = 0;
>    LexChar s = ls->c;
>    lua_assert(s == '[' || s == ']');
> -  while (lex_savenext(ls) == '=')
> +  while (lex_savenext(ls) == '=' && count < 0x20000000)
>      count++;
>    return (ls->c == s) ? count : (-count) - 1;
>  }
> diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> new file mode 100644
> index 00000000..fda69d17
> --- /dev/null
> +++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> @@ -0,0 +1,31 @@
> +local tap = require('tap')
> +
> +-- Test to check that we avoid parsing of too long separator
Typo: s/parsing of/parsing/
Typo: s/separator/separators/
> +-- for long strings.
> +-- See also the discussion in the
> +-- https://github.com/LuaJIT/LuaJIT/issues/812.
> +
> +local test = tap.test('lj-812-too-long-string-separator'):skipcond({
> +  ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
Please write a more detailed description of how it can be tested for a non-GC64
build and why it is disabled now, as we discussed offline.

> +})
> +test:plan(2)
> +
> +-- We can't check the string overflow itself without a really
> +-- large file, because the ERR_MEM error will be raised, due to
> +-- the string buffer reallocations during parsing.
> +-- Keep such huge file in the repo is pointless, so just check
> +-- that we don't parse long string after some separator length.
> +-- Be aware that this limit is different for Lua 5.1.
Please fix the same typos as in the commit message here.
> +
> +-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
> +local separator = string.rep('=', 0x20000000 + 1)
> +local test_str = ('return [%s[]%s]'):format(separator, separator)
> +
> +local f, err = loadstring(test_str, 'empty_str_f')
> +test:ok(not f, 'correct status when parsing string with too long separator')
> +
> +-- Check error message.
> +test:ok(tostring(err):match('invalid long string delimiter'),
> +        'correct error when parsing string with too long separator')
> +
> +test:done(true)
> -- 
> 2.41.0
> 
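For illustration only: a minimal standalone sketch of the bounded counting
loop from the `lex_skipeq()` hunk. The function `skipeq`, its buffer and
position arguments, and the `limit` parameter are assumptions made to keep
the example small and testable; the real function reads from the `LexState`
and hardcodes the limit. The return value is the number of `=` signs when the
closing bracket matches the opening one, and a negative value otherwise.

```c
#include <assert.h>
#include <stddef.h>

#define SEP_LIMIT 0x20000000  /* Same constant as in the patched loop. */

/*
** Hypothetical standalone variant of lex_skipeq(): counts '=' after the
** bracket at s[*pos], but stops once `limit` signs have been counted.
*/
int skipeq(const char *s, size_t *pos, int limit)
{
  int count = 0;
  char open = s[*pos];
  assert(open == '[' || open == ']');
  /* Advance first, then test, like lex_savenext() in the original. */
  while (s[++*pos] == '=' && count < limit)
    count++;
  /* A mismatching character yields a negative result. */
  return (s[*pos] == open) ? count : (-count) - 1;
}
```

With a small limit the effect is easy to see: for `"[===["` and a limit of 2
the loop stops on the third `=`, so the separator is rejected instead of
counted further.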
Best regards,
Maxim Kokryashkin

* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF Sergey Kaplun via Tarantool-patches
@ 2023-08-16  9:01   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 15:17     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 11:06   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16  9:01 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
Please consider my comments below.
On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by James Cowgill.
> 
> (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
> 
> The issue is observed for the following merged IRs:
> |    p64 HREF   0001  "a"            ; or other keys
> | >  p64 EQ     0002  [0x4002d0c528] ; nilnode
> Sometimes, when we need to rematerialize a constant during evicting of
Typo: s/during evicting/during the eviction/
> the register. So, the instruction related to constant rematerialization
What happens sometimes? The sentence looks truncated.
> is placed in the delay branch slot, which suppose to contain the loads
Typo: s/which suppose/which is supposed/
> of trace exit number to the `$ra` register. The resulting assembly is
Typo: s/number/numbers/ (because of `loads` being in the plural form)
> the following (for example):
> | beq     ra, r1, 0x400abee9b0  ->exit
> | lui     r1, 65531   ; delay slot without setting of the `ra`
> This leading to the assertion failure during trace exit in
Typo: s/leading/leads/
> `lj_trace_exit()`, since a trace number is incorrect.
> 
> This patch moves the constant register allocations above the main
> instruction emitting code in `asm_href()`.
AFAICS, it is not just moved: the register allocation logic has changed too.
Before the patch, there were a few cases of in-place emission, which
disappeared after the patch. I believe it is important to mention this too, along
with a more detailed description of the logic changes.
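For reference, the two constants that now go through `cmp64` can be sketched in
plain C. This is only an illustration, assuming the GC64 value layout in which
the internal type tag sits above bit 47. `tag_gcaddr()` and `tag_pri()` are
hypothetical helper names, not LuaJIT functions, and unsigned arithmetic is used
here to avoid shifting negative values.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors `(itype << 47) | address` from the patch.
 * `itype` is the 32-bit internal type tag, `addr` is a 47-bit address. */
static uint64_t tag_gcaddr(uint32_t itype, uint64_t addr)
{
  return ((uint64_t)itype << 47) | addr;
}

/* Hypothetical helper: mirrors `~(~itype << 47)` from the patch.
 * The tag lands in the upper bits and the lower 47 bits are all ones. */
static uint64_t tag_pri(uint32_t itype)
{
  return ~((uint64_t)(uint32_t)~itype << 47);
}
```

Both results are full 64-bit constants, so each of them needs a register of its
own. Per the commit message, that register is now allocated above the main
instruction emitting code.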
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_asm_mips.h                             |  42 +++++---
>  ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
>  2 files changed, 126 insertions(+), 17 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> 
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index c27d8413..23ffc3aa 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>    Reg dest = ra_dest(as, ir, allow);
>    Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
>    Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
> +#if LJ_64
> +  Reg cmp64 = RID_NONE;
> +#endif
>    IRRef refkey = ir->op2;
>    IRIns *irkey = IR(refkey);
>    int isk = irref_isk(refkey);
> @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>  #endif
>    tmp2 = ra_scratch(as, allow);
>    rset_clear(allow, tmp2);
> +#if LJ_64
> +  if (LJ_SOFTFP || !irt_isnum(kt)) {
> +    /* Allocate cmp64 register used for 64-bit comparisons */
> +    if (LJ_SOFTFP && irt_isnum(kt)) {
> +      cmp64 = key;
> +    } else if (!isk && irt_isaddr(kt)) {
> +      cmp64 = tmp2;
> +    } else {
> +      int64_t k;
> +      if (isk && irt_isaddr(kt)) {
> +	k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> +      } else {
> +	lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> +	k = ~((int64_t)~irt_toitype(ir->t) << 47);
> +      }
> +      cmp64 = ra_allock(as, k, allow);
> +      rset_clear(allow, cmp64);
> +    }
> +  }
> +#endif
>  
>    /* Key not found in chain: jump to exit (if merged) or load niltv. */
>    l_end = emit_label(as);
> @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>      emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
>      emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
>      emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> -  } else if (LJ_SOFTFP && irt_isnum(kt)) {
> -    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> -    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> -  } else if (irt_isaddr(kt)) {
> -    Reg refk = tmp2;
> -    if (isk) {
> -      int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> -      refk = ra_allock(as, k, allow);
> -      rset_clear(allow, refk);
> -    }
> -    emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
> -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
>    } else {
> -    Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> -    rset_clear(allow, pri);
> -    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> -    emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
> -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> +    emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
> +    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>    }
>    *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
>    if (!isk && irt_isaddr(kt)) {
> diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> new file mode 100644
> index 00000000..8c75e69c
> --- /dev/null
> +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> @@ -0,0 +1,101 @@
> +local tap = require('tap')
> +-- Test file to demonstrate the incorrect JIT behaviour for HREF
> +-- IR compilation on mips64.
> +-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
> +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +test:plan(1)
> +
> +-- To reproduce the issue we need to compile a trace with
> +-- `IR_HREF`, with a lookup of constant hash key GC value. To
Typo: s/constant/a constant/
> +-- prevent an `IR_HREFK` to be emitted instead, we need a table
Typo: s/to be/from being/
> +-- with a huge hash part. Delta of address between the start of
Typo: s/Delta/The delta/
> +-- the hash part of the table and the current node to lookup must
> +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
Typo: s/more/greater/
> +-- See <src/lj_record.c>, for details.
> +-- XXX: This constant is well suited to prevent test to be flaky,
Typo: s/to be/from being/
> +-- because the aforementioned delta is always large enough.
> +-- Also, this constant avoids table rehashing, when inserting new
> +-- keys.
> +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
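A quick back-of-the-envelope check of the numbers in this comment, written as C.
The node size of 24 bytes used in the note below is my assumption for the sake
of the example, and `hrefk_max_offset()` is a hypothetical name, not something
taken from <src/lj_record.c>.

```c
#include <stddef.h>

/* Number of hash fields the test inserts: 2^16 + 2^15. */
#define N_HASH_FIELDS ((size_t)(1u << 16) + (size_t)(1u << 15))

/* Hypothetical helper: the byte offset from the comment above,
 * (1024 * 64 - 1) * sizeof(Node), for a given node size. */
static size_t hrefk_max_offset(size_t node_size)
{
  return (size_t)(1024 * 64 - 1) * node_size;
}
```

Assuming 24-byte nodes, the limit is 1572840 bytes, that is 65535 node slots,
while the test inserts 98304 keys.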
> +
> +-- XXX: don't set `hotexit` to prevent compilation of trace after
> +-- exiting the main test cycle.
I suggest rephrasing it in the following way:
| The `hotexit` option is not set to prevent the compilation of traces
| after the emission of the main test cycle.
> +jit.opt.start('hotloop=1')
> +
> +-- Don't use `table.new()`, here by intence -- this leads to the
Typo: s/Don't use `table.new()`, here by intence/`table.new()` is intentionally not used here/
> +-- allocation failure for the mcode memory, so traces are not
> +-- compiled.
> +local filled_tab = {}
> +-- Filling-up the table with GC values to minimize the amount of
Typo: s/Filling-up/Fill up/
> +-- hash collisions and increase delta between the start of the
Typo: s/delta/the delta/
> +-- hash part of the table and currently stored node.
Typo: s/currently/the currently/
> +for _ = 1, N_HASH_FIELDS do
> +  filled_tab[1LL] = 1
> +end
> +
> +-- luacheck: no unused
> +local tab_value_a
> +local tab_value_b
> +local tab_value_c
> +local tab_value_d
> +local tab_value_e
> +local tab_value_f
> +local tab_value_g
> +local tab_value_h
> +local tab_value_i
> +
> +-- The function for this trace has a bunch of the following IRs:
> +--    p64 HREF   0001  "a"            ; or other keys
> +-- >  p64 EQ     0002  [0x4002d0c528] ; nilnode
> +-- Sometimes, when we need to rematerialize a constant during
> +-- evicting of the register. So, the instruction related to
Typo: s/evicting/the eviction/
Again, what happens sometimes?
> +-- constant rematerialization is placed in the delay branch slot,
> +-- which suppose to contain the loads of trace exit number to the
Typo: s/which suppose/which is supposed/
Typo: s/number/numbers/
> +-- `$ra` register. This leading to the assertion failure during
Typo: s/leading/leads/
> +-- trace exit in `lj_trace_exit()`, since a trace number is
> +-- incorrect. The amount of the side exit to check is empirical
Typo: s/exit/exits/
> +-- (even a little bit more, than necessary just in case).
Typo: s/more/greater/
> +local function href_const(tab)
> +  tab_value_a = tab.a
> +  tab_value_b = tab.b
> +  tab_value_c = tab.c
> +  tab_value_d = tab.d
> +  tab_value_e = tab.e
> +  tab_value_f = tab.f
> +  tab_value_g = tab.g
> +  tab_value_h = tab.h
> +  tab_value_i = tab.i
> +end
> +
> +-- Compile main trace first.
Typo: s/main/the main/
> +href_const(filled_tab)
> +href_const(filled_tab)
> +
> +-- Now brute-force side exits to check that they are compiled
> +-- correct. Take side exits in the reverse order to take a new
Typo: s/correct/correctly/
Typo: s/the reverse/reverse/
> +-- side exit each time.
> +filled_tab.i = 'i'
> +href_const(filled_tab)
> +filled_tab.h = 'h'
> +href_const(filled_tab)
> +filled_tab.g = 'g'
> +href_const(filled_tab)
> +filled_tab.f = 'f'
> +href_const(filled_tab)
> +filled_tab.e = 'e'
> +href_const(filled_tab)
> +filled_tab.d = 'd'
> +href_const(filled_tab)
> +filled_tab.c = 'c'
> +href_const(filled_tab)
> +filled_tab.b = 'b'
> +href_const(filled_tab)
> +filled_tab.a = 'a'
> +href_const(filled_tab)
> +
> +test:ok(true, 'no assertion failures during trace exits')
> +
> +test:done(true)
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable Sergey Kaplun via Tarantool-patches
@ 2023-08-16  9:03   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 15:22     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 12:01   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16  9:03 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for the single typo in the commit message.
On Wed, Aug 09, 2023 at 06:36:07PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Cleanup only, bug cannot trigger.
> Thanks to Domingo Alvarez Duarte.
> 
> (cherry-picked from commit 5c911998a3c85d024a8006feafc68d0b4c962fd8)
> 
> This patch fixes local shadow variable `n` in `template__` function from
Typo: s/local/the local/
> <dynasm/dasm_mips.lua> by renaming it to `m`. Since this cannot be
> triggered, there is no test provided.
> 
> Sergey Kaplun:
> * added the description for the problem
> ---
>  dynasm/dasm_mips.lua | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/dynasm/dasm_mips.lua b/dynasm/dasm_mips.lua
> index 78a4e34a..bd2a2b43 100644
> --- a/dynasm/dasm_mips.lua
> +++ b/dynasm/dasm_mips.lua
> @@ -809,9 +809,9 @@ map_op[".template__"] = function(params, template, nparams)
>      elseif p == "X" then
>        op = op + parse_index(params[n]); n = n + 1
>      elseif p == "B" or p == "J" then
> -      local mode, n, s = parse_label(params[n], false)
> -      if p == "B" then n = n + 2048 end
> -      waction("REL_"..mode, n, s, 1)
> +      local mode, m, s = parse_label(params[n], false)
> +      if p == "B" then m = m + 2048 end
> +      waction("REL_"..mode, m, s, 1)
>        n = n + 1
>      elseif p == "A" then
>        op = op + parse_imm(params[n], 5, 6, 0, false); n = n + 1
> -- 
> 2.41.0
> 

* Re: [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port Sergey Kaplun via Tarantool-patches
@ 2023-08-16  9:16   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 15:24     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 13:03   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16  9:16 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the patch!
LGTM, except for a few nits regarding the commit message.
On Wed, Aug 09, 2023 at 06:36:08PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by Hua Zhang, YunQiang Su from Wave Computing,
> and Radovan Birdic from RT-RK.
> Sponsored by Wave Computing.
> 
> (cherry-picked from commit 94d0b53004a5fa368defa4307a17edcdb87fe727)
> 
> This patch adds support for MIPS Release 6 [1] for the 64-bit build.
> This includes:
> * Global `_map_def` value is set with <dynasm/dynasm.lua>. `MIPSR6` key
>   specifies the corresponding instruction set support. Also, `MIPSR6` is
>   defined in `DYNASM_FLAGS` (`DASM_AFLAGS`).
> * New instructions are added within <dynasm/dasm_mips.lua>, they are
>   used if the aforementioned key is set.
> * Obsolete instructions (that are no more in use in r6) are used in the
Typo: s/no more/no longer/
>   opposite case (if `MIPSR6` isn't set).
> * New opcode maps are added into  <src/jit/dis_mips.lua>.
Typo: s/into/to/
> * `map_arch` table in <jit/bcsave.lua> is refactored for more convenient
>   usage. Now each arch key contains a table with the corresponding info
>   about supported architecture:
Typo: s/about/about the/
>     - `e`: endianess; "le" or "be"
>     - `b`: bit-width of the supported architecture; 32 or 64
>     - `m`: machine specification (see `e_machine` in man elf)
>     - `f`: processor-specific flags (see `e_flags` in man elf)
>     - `p`: number that identifies the type of target machine [2] for
>       Portable Executable format [3].
> * New `LJ_TARGET_MIPSR6` define is set for MIPSR6 in <src/lj_arch.h>.
> * The corresponding "MIPS32R6", "MIPS64R6" CPU strings are added to the
>   <src/jit.h>
> * MIPSR6 instructions are added to the <src/lj_target_mips.h>, some
>   obsolete instructions are removed or defined only for the non-MIPSR6
>   build.
> * All release-dependent instructions in <src/lj_asm_mips.h> are
>   instrumented with `LJ_TARGET_MIPSR6` macro.
> * `f20`, `f21`, `f22` FP registers are defined as `FTMP0`, `FTMP1`,
>   `FTMP2` correspondingly in the VM.
> * All release-dependent instructions in <src/vm_mips64.dasm> are
>   instrumented with `MIPSR6` macro.
> * `sfmin_max` macro now takes the third operand for the MIPSR6 build.
> * Fix implicit fallthrough warning for `LJ_SOFTFP && !LJ_NEED_FP64`
Typo: s/Fix/Fix the/
>   build in <src/lj_asm.c>.
> 
> Note, that 32-bit r6 targets still unsupported, because it is difficult
Typo: s/targets/targets are/
> and most available r6 CPUs are 64 bit.
> 
> [1]: https://www.mips.com/products/architectures/mips64/
> [2]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#machine-types
> [3]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
> 
> Sergey Kaplun:
> * added the description for the feature
> 
> Part of tarantool/tarantool#8825
> ---
>  cmake/SetDynASMFlags.cmake |   5 +
>  dynasm/dasm_mips.h         |  13 +-
>  dynasm/dasm_mips.lua       | 625 +++++++++++++++++++++++--------------
>  dynasm/dynasm.lua          |   1 +
>  src/Makefile.original      |   3 +
>  src/jit/bcsave.lua         |  84 ++---
>  src/jit/dis_mips.lua       | 293 +++++++++++++++--
>  src/jit/dis_mips64r6.lua   |  17 +
>  src/jit/dis_mips64r6el.lua |  17 +
>  src/lj_arch.h              |  29 +-
>  src/lj_asm.c               |   2 +-
>  src/lj_asm_mips.h          | 114 ++++++-
>  src/lj_emit_mips.h         |  15 +-
>  src/lj_jit.h               |   8 +
>  src/lj_target_mips.h       |  52 ++-
>  src/vm_mips64.dasc         | 370 ++++++++++++++++++++--
>  16 files changed, 1301 insertions(+), 347 deletions(-)
>  create mode 100644 src/jit/dis_mips64r6.lua
>  create mode 100644 src/jit/dis_mips64r6el.lua
> 
> diff --git a/cmake/SetDynASMFlags.cmake b/cmake/SetDynASMFlags.cmake
> index 142d7e64..7eead6e9 100644
> --- a/cmake/SetDynASMFlags.cmake
> +++ b/cmake/SetDynASMFlags.cmake
> @@ -64,6 +64,11 @@ elseif(LUAJIT_ARCH STREQUAL "mips")
>    endif()
>  endif()
>  
> +string(FIND "${TESTARCH}" "LJ_TARGET_MIPSR6" FOUND)
> +if(NOT FOUND EQUAL -1)
> +  AppendFlags(DYNASM_FLAGS -D MIPSR6)
> +endif()
> +
>  string(FIND "${TESTARCH}" "LJ_LE 1" FOUND)
>  if(NOT FOUND EQUAL -1)
>    list(APPEND DYNASM_FLAGS -D ENDIAN_LE)
> diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
> index 71a835b2..7d06aa72 100644
> --- a/dynasm/dasm_mips.h
> +++ b/dynasm/dasm_mips.h
> @@ -355,14 +355,15 @@ int dasm_encode(Dst_DECL, void *buffer)
>  	  CK(n >= 0, UNDEF_PC);
>  	  n = *DASM_POS2PTR(D, n);
>  	  if (ins & 2048)
> -	    n = n - (int)((char *)cp - base);
> -	  else
>  	    n = (n + (int)(size_t)base) & 0x0fffffff;
> -	patchrel:
> +	  else
> +	    n = n - (int)((char *)cp - base);
> +	patchrel: {
> +	  unsigned int e = 16 + ((ins >> 12) & 15);
>  	  CK((n & 3) == 0 &&
> -	     ((n + ((ins & 2048) ? 0x00020000 : 0)) >>
> -	       ((ins & 2048) ? 18 : 28)) == 0, RANGE_REL);
> -	  cp[-1] |= ((n>>2) & ((ins & 2048) ? 0x0000ffff: 0x03ffffff));
> +	     ((n + ((ins & 2048) ? 0 : (1<<(e+1)))) >> (e+2)) == 0, RANGE_REL);
> +	  cp[-1] |= ((n>>2) & ((1<<e)-1));
> +	  }
>  	  break;
>  	case DASM_LABEL_LG:
>  	  ins &= 2047; if (ins >= 20) D->globals[ins-10] = (void *)(base + n);
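Side note on the hunk above: the new `patchrel` check can be read as a range
test for a signed field of `e` bits that holds a word offset. Below is a minimal
sketch of the relative case only, assuming `e` is at most 26. `encode_rel()` is
a hypothetical name, and the explicit comparisons stand in for the shift-based
test used in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: encode the byte offset `n` of a relative branch into
 * a signed field of `e` bits that stores the offset in 4-byte words.
 * Returns 1 and writes the field on success, 0 if `n` is misaligned or
 * out of range. */
static int encode_rel(int32_t n, unsigned e, uint32_t *field)
{
  int64_t limit = (int64_t)1 << (e + 1);  /* Half of the 2^(e+2) byte span. */
  if ((n & 3) != 0) return 0;             /* Must be word-aligned. */
  if (n < -limit || n >= limit) return 0; /* Must fit into the signed field. */
  *field = ((uint32_t)n >> 2) & (((uint32_t)1 << e) - 1);
  return 1;
}
```

For `e = 16` this accepts byte offsets from -131072 up to 131068, which is the
usual 16-bit branch range. Larger values of `e` widen the range accordingly,
presumably for the new R6 branch forms.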
> diff --git a/dynasm/dasm_mips.lua b/dynasm/dasm_mips.lua
> index bd2a2b43..ccdc53cd 100644
> --- a/dynasm/dasm_mips.lua
> +++ b/dynasm/dasm_mips.lua
> @@ -6,6 +6,7 @@
>  ------------------------------------------------------------------------------
>  
>  local mips64 = mips64
> +local mipsr6 = _map_def.MIPSR6
>  
>  -- Module information:
>  local _info = {
> @@ -238,7 +239,6 @@ local map_op = {
>    bne_3 =	"14000000STB",
>    blez_2 =	"18000000SB",
>    bgtz_2 =	"1c000000SB",
> -  addi_3 =	"20000000TSI",
>    li_2 =	"24000000TI",
>    addiu_3 =	"24000000TSI",
>    slti_3 =	"28000000TSI",
> @@ -248,40 +248,22 @@ local map_op = {
>    ori_3 =	"34000000TSU",
>    xori_3 =	"38000000TSU",
>    lui_2 =	"3c000000TU",
> -  beqzl_2 =	"50000000SB",
> -  beql_3 =	"50000000STB",
> -  bnezl_2 =	"54000000SB",
> -  bnel_3 =	"54000000STB",
> -  blezl_2 =	"58000000SB",
> -  bgtzl_2 =	"5c000000SB",
> -  daddi_3 =	mips64 and "60000000TSI",
>    daddiu_3 =	mips64 and "64000000TSI",
>    ldl_2 =	mips64 and "68000000TO",
>    ldr_2 =	mips64 and "6c000000TO",
>    lb_2 =	"80000000TO",
>    lh_2 =	"84000000TO",
> -  lwl_2 =	"88000000TO",
>    lw_2 =	"8c000000TO",
>    lbu_2 =	"90000000TO",
>    lhu_2 =	"94000000TO",
> -  lwr_2 =	"98000000TO",
>    lwu_2 =	mips64 and "9c000000TO",
>    sb_2 =	"a0000000TO",
>    sh_2 =	"a4000000TO",
> -  swl_2 =	"a8000000TO",
>    sw_2 =	"ac000000TO",
> -  sdl_2 =	mips64 and "b0000000TO",
> -  sdr_2 =	mips64 and "b1000000TO",
> -  swr_2 =	"b8000000TO",
> -  cache_2 =	"bc000000NO",
> -  ll_2 =	"c0000000TO",
>    lwc1_2 =	"c4000000HO",
> -  pref_2 =	"cc000000NO",
>    ldc1_2 =	"d4000000HO",
>    ld_2 =	mips64 and "dc000000TO",
> -  sc_2 =	"e0000000TO",
>    swc1_2 =	"e4000000HO",
> -  scd_2 =	mips64 and "f0000000TO",
>    sdc1_2 =	"f4000000HO",
>    sd_2 =	mips64 and "fc000000TO",
>  
> @@ -289,10 +271,6 @@ local map_op = {
>    nop_0 =	"00000000",
>    sll_3 =	"00000000DTA",
>    sextw_2 =	"00000000DT",
> -  movf_2 =	"00000001DS",
> -  movf_3 =	"00000001DSC",
> -  movt_2 =	"00010001DS",
> -  movt_3 =	"00010001DSC",
>    srl_3 =	"00000002DTA",
>    rotr_3 =	"00200002DTA",
>    sra_3 =	"00000003DTA",
> @@ -301,31 +279,16 @@ local map_op = {
>    rotrv_3 =	"00000046DTS",
>    drotrv_3 =	mips64 and "00000056DTS",
>    srav_3 =	"00000007DTS",
> -  jr_1 =	"00000008S",
>    jalr_1 =	"0000f809S",
>    jalr_2 =	"00000009DS",
> -  movz_3 =	"0000000aDST",
> -  movn_3 =	"0000000bDST",
>    syscall_0 =	"0000000c",
>    syscall_1 =	"0000000cY",
>    break_0 =	"0000000d",
>    break_1 =	"0000000dY",
>    sync_0 =	"0000000f",
> -  mfhi_1 =	"00000010D",
> -  mthi_1 =	"00000011S",
> -  mflo_1 =	"00000012D",
> -  mtlo_1 =	"00000013S",
>    dsllv_3 =	mips64 and "00000014DTS",
>    dsrlv_3 =	mips64 and "00000016DTS",
>    dsrav_3 =	mips64 and "00000017DTS",
> -  mult_2 =	"00000018ST",
> -  multu_2 =	"00000019ST",
> -  div_2 =	"0000001aST",
> -  divu_2 =	"0000001bST",
> -  dmult_2 =	mips64 and "0000001cST",
> -  dmultu_2 =	mips64 and "0000001dST",
> -  ddiv_2 =	mips64 and "0000001eST",
> -  ddivu_2 =	mips64 and "0000001fST",
>    add_3 =	"00000020DST",
>    move_2 =	mips64 and "00000025DS" or "00000021DS",
>    addu_3 =	"00000021DST",
> @@ -369,32 +332,9 @@ local map_op = {
>    bgez_2 =	"04010000SB",
>    bltzl_2 =	"04020000SB",
>    bgezl_2 =	"04030000SB",
> -  tgei_2 =	"04080000SI",
> -  tgeiu_2 =	"04090000SI",
> -  tlti_2 =	"040a0000SI",
> -  tltiu_2 =	"040b0000SI",
> -  teqi_2 =	"040c0000SI",
> -  tnei_2 =	"040e0000SI",
> -  bltzal_2 =	"04100000SB",
>    bal_1 =	"04110000B",
> -  bgezal_2 =	"04110000SB",
> -  bltzall_2 =	"04120000SB",
> -  bgezall_2 =	"04130000SB",
>    synci_1 =	"041f0000O",
>  
> -  -- Opcode SPECIAL2.
> -  madd_2 =	"70000000ST",
> -  maddu_2 =	"70000001ST",
> -  mul_3 =	"70000002DST",
> -  msub_2 =	"70000004ST",
> -  msubu_2 =	"70000005ST",
> -  clz_2 =	"70000020DS=",
> -  clo_2 =	"70000021DS=",
> -  dclz_2 =	mips64 and "70000024DS=",
> -  dclo_2 =	mips64 and "70000025DS=",
> -  sdbbp_0 =	"7000003f",
> -  sdbbp_1 =	"7000003fY",
> -
>    -- Opcode SPECIAL3.
>    ext_4 =	"7c000000TSAM", -- Note: last arg is msbd = size-1
>    dextm_4 =	mips64 and "7c000001TSAM", -- Args: pos    | size-1-32
> @@ -445,15 +385,6 @@ local map_op = {
>    ctc1_2 =	"44c00000TG",
>    mthc1_2 =	"44e00000TG",
>  
> -  bc1f_1 =	"45000000B",
> -  bc1f_2 =	"45000000CB",
> -  bc1t_1 =	"45010000B",
> -  bc1t_2 =	"45010000CB",
> -  bc1fl_1 =	"45020000B",
> -  bc1fl_2 =	"45020000CB",
> -  bc1tl_1 =	"45030000B",
> -  bc1tl_2 =	"45030000CB",
> -
>    ["add.s_3"] =		"46000000FGH",
>    ["sub.s_3"] =		"46000001FGH",
>    ["mul.s_3"] =		"46000002FGH",
> @@ -470,51 +401,11 @@ local map_op = {
>    ["trunc.w.s_2"] =	"4600000dFG",
>    ["ceil.w.s_2"] =	"4600000eFG",
>    ["floor.w.s_2"] =	"4600000fFG",
> -  ["movf.s_2"] =	"46000011FG",
> -  ["movf.s_3"] =	"46000011FGC",
> -  ["movt.s_2"] =	"46010011FG",
> -  ["movt.s_3"] =	"46010011FGC",
> -  ["movz.s_3"] =	"46000012FGT",
> -  ["movn.s_3"] =	"46000013FGT",
>    ["recip.s_2"] =	"46000015FG",
>    ["rsqrt.s_2"] =	"46000016FG",
>    ["cvt.d.s_2"] =	"46000021FG",
>    ["cvt.w.s_2"] =	"46000024FG",
>    ["cvt.l.s_2"] =	"46000025FG",
> -  ["cvt.ps.s_3"] =	"46000026FGH",
> -  ["c.f.s_2"] =		"46000030GH",
> -  ["c.f.s_3"] =		"46000030VGH",
> -  ["c.un.s_2"] =	"46000031GH",
> -  ["c.un.s_3"] =	"46000031VGH",
> -  ["c.eq.s_2"] =	"46000032GH",
> -  ["c.eq.s_3"] =	"46000032VGH",
> -  ["c.ueq.s_2"] =	"46000033GH",
> -  ["c.ueq.s_3"] =	"46000033VGH",
> -  ["c.olt.s_2"] =	"46000034GH",
> -  ["c.olt.s_3"] =	"46000034VGH",
> -  ["c.ult.s_2"] =	"46000035GH",
> -  ["c.ult.s_3"] =	"46000035VGH",
> -  ["c.ole.s_2"] =	"46000036GH",
> -  ["c.ole.s_3"] =	"46000036VGH",
> -  ["c.ule.s_2"] =	"46000037GH",
> -  ["c.ule.s_3"] =	"46000037VGH",
> -  ["c.sf.s_2"] =	"46000038GH",
> -  ["c.sf.s_3"] =	"46000038VGH",
> -  ["c.ngle.s_2"] =	"46000039GH",
> -  ["c.ngle.s_3"] =	"46000039VGH",
> -  ["c.seq.s_2"] =	"4600003aGH",
> -  ["c.seq.s_3"] =	"4600003aVGH",
> -  ["c.ngl.s_2"] =	"4600003bGH",
> -  ["c.ngl.s_3"] =	"4600003bVGH",
> -  ["c.lt.s_2"] =	"4600003cGH",
> -  ["c.lt.s_3"] =	"4600003cVGH",
> -  ["c.nge.s_2"] =	"4600003dGH",
> -  ["c.nge.s_3"] =	"4600003dVGH",
> -  ["c.le.s_2"] =	"4600003eGH",
> -  ["c.le.s_3"] =	"4600003eVGH",
> -  ["c.ngt.s_2"] =	"4600003fGH",
> -  ["c.ngt.s_3"] =	"4600003fVGH",
> -
>    ["add.d_3"] =		"46200000FGH",
>    ["sub.d_3"] =		"46200001FGH",
>    ["mul.d_3"] =		"46200002FGH",
> @@ -531,130 +422,410 @@ local map_op = {
>    ["trunc.w.d_2"] =	"4620000dFG",
>    ["ceil.w.d_2"] =	"4620000eFG",
>    ["floor.w.d_2"] =	"4620000fFG",
> -  ["movf.d_2"] =	"46200011FG",
> -  ["movf.d_3"] =	"46200011FGC",
> -  ["movt.d_2"] =	"46210011FG",
> -  ["movt.d_3"] =	"46210011FGC",
> -  ["movz.d_3"] =	"46200012FGT",
> -  ["movn.d_3"] =	"46200013FGT",
>    ["recip.d_2"] =	"46200015FG",
>    ["rsqrt.d_2"] =	"46200016FG",
>    ["cvt.s.d_2"] =	"46200020FG",
>    ["cvt.w.d_2"] =	"46200024FG",
>    ["cvt.l.d_2"] =	"46200025FG",
> -  ["c.f.d_2"] =		"46200030GH",
> -  ["c.f.d_3"] =		"46200030VGH",
> -  ["c.un.d_2"] =	"46200031GH",
> -  ["c.un.d_3"] =	"46200031VGH",
> -  ["c.eq.d_2"] =	"46200032GH",
> -  ["c.eq.d_3"] =	"46200032VGH",
> -  ["c.ueq.d_2"] =	"46200033GH",
> -  ["c.ueq.d_3"] =	"46200033VGH",
> -  ["c.olt.d_2"] =	"46200034GH",
> -  ["c.olt.d_3"] =	"46200034VGH",
> -  ["c.ult.d_2"] =	"46200035GH",
> -  ["c.ult.d_3"] =	"46200035VGH",
> -  ["c.ole.d_2"] =	"46200036GH",
> -  ["c.ole.d_3"] =	"46200036VGH",
> -  ["c.ule.d_2"] =	"46200037GH",
> -  ["c.ule.d_3"] =	"46200037VGH",
> -  ["c.sf.d_2"] =	"46200038GH",
> -  ["c.sf.d_3"] =	"46200038VGH",
> -  ["c.ngle.d_2"] =	"46200039GH",
> -  ["c.ngle.d_3"] =	"46200039VGH",
> -  ["c.seq.d_2"] =	"4620003aGH",
> -  ["c.seq.d_3"] =	"4620003aVGH",
> -  ["c.ngl.d_2"] =	"4620003bGH",
> -  ["c.ngl.d_3"] =	"4620003bVGH",
> -  ["c.lt.d_2"] =	"4620003cGH",
> -  ["c.lt.d_3"] =	"4620003cVGH",
> -  ["c.nge.d_2"] =	"4620003dGH",
> -  ["c.nge.d_3"] =	"4620003dVGH",
> -  ["c.le.d_2"] =	"4620003eGH",
> -  ["c.le.d_3"] =	"4620003eVGH",
> -  ["c.ngt.d_2"] =	"4620003fGH",
> -  ["c.ngt.d_3"] =	"4620003fVGH",
> -
> -  ["add.ps_3"] =	"46c00000FGH",
> -  ["sub.ps_3"] =	"46c00001FGH",
> -  ["mul.ps_3"] =	"46c00002FGH",
> -  ["abs.ps_2"] =	"46c00005FG",
> -  ["mov.ps_2"] =	"46c00006FG",
> -  ["neg.ps_2"] =	"46c00007FG",
> -  ["movf.ps_2"] =	"46c00011FG",
> -  ["movf.ps_3"] =	"46c00011FGC",
> -  ["movt.ps_2"] =	"46c10011FG",
> -  ["movt.ps_3"] =	"46c10011FGC",
> -  ["movz.ps_3"] =	"46c00012FGT",
> -  ["movn.ps_3"] =	"46c00013FGT",
> -  ["cvt.s.pu_2"] =	"46c00020FG",
> -  ["cvt.s.pl_2"] =	"46c00028FG",
> -  ["pll.ps_3"] =	"46c0002cFGH",
> -  ["plu.ps_3"] =	"46c0002dFGH",
> -  ["pul.ps_3"] =	"46c0002eFGH",
> -  ["puu.ps_3"] =	"46c0002fFGH",
> -  ["c.f.ps_2"] =	"46c00030GH",
> -  ["c.f.ps_3"] =	"46c00030VGH",
> -  ["c.un.ps_2"] =	"46c00031GH",
> -  ["c.un.ps_3"] =	"46c00031VGH",
> -  ["c.eq.ps_2"] =	"46c00032GH",
> -  ["c.eq.ps_3"] =	"46c00032VGH",
> -  ["c.ueq.ps_2"] =	"46c00033GH",
> -  ["c.ueq.ps_3"] =	"46c00033VGH",
> -  ["c.olt.ps_2"] =	"46c00034GH",
> -  ["c.olt.ps_3"] =	"46c00034VGH",
> -  ["c.ult.ps_2"] =	"46c00035GH",
> -  ["c.ult.ps_3"] =	"46c00035VGH",
> -  ["c.ole.ps_2"] =	"46c00036GH",
> -  ["c.ole.ps_3"] =	"46c00036VGH",
> -  ["c.ule.ps_2"] =	"46c00037GH",
> -  ["c.ule.ps_3"] =	"46c00037VGH",
> -  ["c.sf.ps_2"] =	"46c00038GH",
> -  ["c.sf.ps_3"] =	"46c00038VGH",
> -  ["c.ngle.ps_2"] =	"46c00039GH",
> -  ["c.ngle.ps_3"] =	"46c00039VGH",
> -  ["c.seq.ps_2"] =	"46c0003aGH",
> -  ["c.seq.ps_3"] =	"46c0003aVGH",
> -  ["c.ngl.ps_2"] =	"46c0003bGH",
> -  ["c.ngl.ps_3"] =	"46c0003bVGH",
> -  ["c.lt.ps_2"] =	"46c0003cGH",
> -  ["c.lt.ps_3"] =	"46c0003cVGH",
> -  ["c.nge.ps_2"] =	"46c0003dGH",
> -  ["c.nge.ps_3"] =	"46c0003dVGH",
> -  ["c.le.ps_2"] =	"46c0003eGH",
> -  ["c.le.ps_3"] =	"46c0003eVGH",
> -  ["c.ngt.ps_2"] =	"46c0003fGH",
> -  ["c.ngt.ps_3"] =	"46c0003fVGH",
> -
>    ["cvt.s.w_2"] =	"46800020FG",
>    ["cvt.d.w_2"] =	"46800021FG",
> -
>    ["cvt.s.l_2"] =	"46a00020FG",
>    ["cvt.d.l_2"] =	"46a00021FG",
> -
> -  -- Opcode COP1X.
> -  lwxc1_2 =		"4c000000FX",
> -  ldxc1_2 =		"4c000001FX",
> -  luxc1_2 =		"4c000005FX",
> -  swxc1_2 =		"4c000008FX",
> -  sdxc1_2 =		"4c000009FX",
> -  suxc1_2 =		"4c00000dFX",
> -  prefx_2 =		"4c00000fMX",
> -  ["alnv.ps_4"] =	"4c00001eFGHS",
> -  ["madd.s_4"] =	"4c000020FRGH",
> -  ["madd.d_4"] =	"4c000021FRGH",
> -  ["madd.ps_4"] =	"4c000026FRGH",
> -  ["msub.s_4"] =	"4c000028FRGH",
> -  ["msub.d_4"] =	"4c000029FRGH",
> -  ["msub.ps_4"] =	"4c00002eFRGH",
> -  ["nmadd.s_4"] =	"4c000030FRGH",
> -  ["nmadd.d_4"] =	"4c000031FRGH",
> -  ["nmadd.ps_4"] =	"4c000036FRGH",
> -  ["nmsub.s_4"] =	"4c000038FRGH",
> -  ["nmsub.d_4"] =	"4c000039FRGH",
> -  ["nmsub.ps_4"] =	"4c00003eFRGH",
>  }
>  
> +if mipsr6 then -- Instructions added with MIPSR6.
> +
> +  for k,v in pairs({
> +
> +    -- Add immediate to upper bits.
> +    aui_3 =	"3c000000TSI",
> +    daui_3 =	mips64 and "74000000TSI",
> +    dahi_2 =	mips64 and "04060000SI",
> +    dati_2 =	mips64 and "041e0000SI",
> +
> +    -- TODO: addiupc, auipc, aluipc, lwpc, lwupc, ldpc.
> +
> +    -- Compact branches.
> +    blezalc_2 =	"18000000TB",	-- rt != 0.
> +    bgezalc_2 =	"18000000T=SB",	-- rt != 0.
> +    bgtzalc_2 =	"1c000000TB",	-- rt != 0.
> +    bltzalc_2 =	"1c000000T=SB",	-- rt != 0.
> +
> +    blezc_2 =	"58000000TB",	-- rt != 0.
> +    bgezc_2 =	"58000000T=SB",	-- rt != 0.
> +    bgec_3 =	"58000000STB",	-- rs != rt.
> +    blec_3 =	"58000000TSB",	-- rt != rs.
> +
> +    bgtzc_2 =	"5c000000TB",	-- rt != 0.
> +    bltzc_2 =	"5c000000T=SB",	-- rt != 0.
> +    bltc_3 =	"5c000000STB",	-- rs != rt.
> +    bgtc_3 =	"5c000000TSB",	-- rt != rs.
> +
> +    bgeuc_3 =	"18000000STB",	-- rs != rt.
> +    bleuc_3 =	"18000000TSB",	-- rt != rs.
> +    bltuc_3 =	"1c000000STB",	-- rs != rt.
> +    bgtuc_3 =	"1c000000TSB",	-- rt != rs.
> +
> +    beqzalc_2 =	"20000000TB",	-- rt != 0.
> +    bnezalc_2 =	"60000000TB",	-- rt != 0.
> +    beqc_3 =	"20000000STB",	-- rs < rt.
> +    bnec_3 =	"60000000STB",	-- rs < rt.
> +    bovc_3 =	"20000000STB",	-- rs >= rt.
> +    bnvc_3 =	"60000000STB",	-- rs >= rt.
> +
> +    beqzc_2 =	"d8000000SK",	-- rs != 0.
> +    bnezc_2 =	"f8000000SK",	-- rs != 0.
> +    jic_2 =	"d8000000TI",
> +    jialc_2 =	"f8000000TI",
> +    bc_1 =	"c8000000L",
> +    balc_1 =	"e8000000L",
> +
> +    -- Opcode SPECIAL.
> +    jr_1 =	"00000009S",
> +    sdbbp_0 =	"0000000e",
> +    sdbbp_1 =	"0000000eY",
> +    lsa_4 =	"00000005DSTA",
> +    dlsa_4 =	mips64 and "00000015DSTA",
> +    seleqz_3 =	"00000035DST",
> +    selnez_3 =	"00000037DST",
> +    clz_2 =	"00000050DS",
> +    clo_2 =	"00000051DS",
> +    dclz_2 =	mips64 and "00000052DS",
> +    dclo_2 =	mips64 and "00000053DS",
> +    mul_3 =	"00000098DST",
> +    muh_3 =	"000000d8DST",
> +    mulu_3 =	"00000099DST",
> +    muhu_3 =	"000000d9DST",
> +    div_3 =	"0000009aDST",
> +    mod_3 =	"000000daDST",
> +    divu_3 =	"0000009bDST",
> +    modu_3 =	"000000dbDST",
> +    dmul_3 =	mips64 and "0000009cDST",
> +    dmuh_3 =	mips64 and "000000dcDST",
> +    dmulu_3 =	mips64 and "0000009dDST",
> +    dmuhu_3 =	mips64 and "000000ddDST",
> +    ddiv_3 =	mips64 and "0000009eDST",
> +    dmod_3 =	mips64 and "000000deDST",
> +    ddivu_3 =	mips64 and "0000009fDST",
> +    dmodu_3 =	mips64 and "000000dfDST",
> +
> +    -- Opcode SPECIAL3.
> +    align_4 =		"7c000220DSTA",
> +    dalign_4 =		mips64 and "7c000224DSTA",
> +    bitswap_2 =		"7c000020DT",
> +    dbitswap_2 =	mips64 and "7c000024DT",
> +
> +    -- Opcode COP1.
> +    bc1eqz_2 =	"45200000HB",
> +    bc1nez_2 =	"45a00000HB",
> +
> +    ["sel.s_3"] =	"46000010FGH",
> +    ["seleqz.s_3"] =	"46000014FGH",
> +    ["selnez.s_3"] =	"46000017FGH",
> +    ["maddf.s_3"] =	"46000018FGH",
> +    ["msubf.s_3"] =	"46000019FGH",
> +    ["rint.s_2"] =	"4600001aFG",
> +    ["class.s_2"] =	"4600001bFG",
> +    ["min.s_3"] =	"4600001cFGH",
> +    ["mina.s_3"] =	"4600001dFGH",
> +    ["max.s_3"] =	"4600001eFGH",
> +    ["maxa.s_3"] =	"4600001fFGH",
> +    ["cmp.af.s_3"] =	"46800000FGH",
> +    ["cmp.un.s_3"] =	"46800001FGH",
> +    ["cmp.or.s_3"] =	"46800011FGH",
> +    ["cmp.eq.s_3"] =	"46800002FGH",
> +    ["cmp.une.s_3"] =	"46800012FGH",
> +    ["cmp.ueq.s_3"] =	"46800003FGH",
> +    ["cmp.ne.s_3"] =	"46800013FGH",
> +    ["cmp.lt.s_3"] =	"46800004FGH",
> +    ["cmp.ult.s_3"] =	"46800005FGH",
> +    ["cmp.le.s_3"] =	"46800006FGH",
> +    ["cmp.ule.s_3"] =	"46800007FGH",
> +    ["cmp.saf.s_3"] =	"46800008FGH",
> +    ["cmp.sun.s_3"] =	"46800009FGH",
> +    ["cmp.sor.s_3"] =	"46800019FGH",
> +    ["cmp.seq.s_3"] =	"4680000aFGH",
> +    ["cmp.sune.s_3"] =	"4680001aFGH",
> +    ["cmp.sueq.s_3"] =	"4680000bFGH",
> +    ["cmp.sne.s_3"] =	"4680001bFGH",
> +    ["cmp.slt.s_3"] =	"4680000cFGH",
> +    ["cmp.sult.s_3"] =	"4680000dFGH",
> +    ["cmp.sle.s_3"] =	"4680000eFGH",
> +    ["cmp.sule.s_3"] =	"4680000fFGH",
> +
> +    ["sel.d_3"] =	"46200010FGH",
> +    ["seleqz.d_3"] =	"46200014FGH",
> +    ["selnez.d_3"] =	"46200017FGH",
> +    ["maddf.d_3"] =	"46200018FGH",
> +    ["msubf.d_3"] =	"46200019FGH",
> +    ["rint.d_2"] =	"4620001aFG",
> +    ["class.d_2"] =	"4620001bFG",
> +    ["min.d_3"] =	"4620001cFGH",
> +    ["mina.d_3"] =	"4620001dFGH",
> +    ["max.d_3"] =	"4620001eFGH",
> +    ["maxa.d_3"] =	"4620001fFGH",
> +    ["cmp.af.d_3"] =	"46a00000FGH",
> +    ["cmp.un.d_3"] =	"46a00001FGH",
> +    ["cmp.or.d_3"] =	"46a00011FGH",
> +    ["cmp.eq.d_3"] =	"46a00002FGH",
> +    ["cmp.une.d_3"] =	"46a00012FGH",
> +    ["cmp.ueq.d_3"] =	"46a00003FGH",
> +    ["cmp.ne.d_3"] =	"46a00013FGH",
> +    ["cmp.lt.d_3"] =	"46a00004FGH",
> +    ["cmp.ult.d_3"] =	"46a00005FGH",
> +    ["cmp.le.d_3"] =	"46a00006FGH",
> +    ["cmp.ule.d_3"] =	"46a00007FGH",
> +    ["cmp.saf.d_3"] =	"46a00008FGH",
> +    ["cmp.sun.d_3"] =	"46a00009FGH",
> +    ["cmp.sor.d_3"] =	"46a00019FGH",
> +    ["cmp.seq.d_3"] =	"46a0000aFGH",
> +    ["cmp.sune.d_3"] =	"46a0001aFGH",
> +    ["cmp.sueq.d_3"] =	"46a0000bFGH",
> +    ["cmp.sne.d_3"] =	"46a0001bFGH",
> +    ["cmp.slt.d_3"] =	"46a0000cFGH",
> +    ["cmp.sult.d_3"] =	"46a0000dFGH",
> +    ["cmp.sle.d_3"] =	"46a0000eFGH",
> +    ["cmp.sule.d_3"] =	"46a0000fFGH",
> +
> +  }) do map_op[k] = v end
> +
> +else -- Instructions removed by MIPSR6.
> +
> +  for k,v in pairs({
> +    -- Traps, don't use.
> +    addi_3 =	"20000000TSI",
> +    daddi_3 =	mips64 and "60000000TSI",
> +
> +    -- Branch on likely, don't use.
> +    beqzl_2 =	"50000000SB",
> +    beql_3 =	"50000000STB",
> +    bnezl_2 =	"54000000SB",
> +    bnel_3 =	"54000000STB",
> +    blezl_2 =	"58000000SB",
> +    bgtzl_2 =	"5c000000SB",
> +
> +    lwl_2 =	"88000000TO",
> +    lwr_2 =	"98000000TO",
> +    swl_2 =	"a8000000TO",
> +    sdl_2 =	mips64 and "b0000000TO",
> +    sdr_2 =	mips64 and "b1000000TO",
> +    swr_2 =	"b8000000TO",
> +    cache_2 =	"bc000000NO",
> +    ll_2 =	"c0000000TO",
> +    pref_2 =	"cc000000NO",
> +    sc_2 =	"e0000000TO",
> +    scd_2 =	mips64 and "f0000000TO",
> +
> +    -- Opcode SPECIAL.
> +    movf_2 =	"00000001DS",
> +    movf_3 =	"00000001DSC",
> +    movt_2 =	"00010001DS",
> +    movt_3 =	"00010001DSC",
> +    jr_1 =	"00000008S",
> +    movz_3 =	"0000000aDST",
> +    movn_3 =	"0000000bDST",
> +    mfhi_1 =	"00000010D",
> +    mthi_1 =	"00000011S",
> +    mflo_1 =	"00000012D",
> +    mtlo_1 =	"00000013S",
> +    mult_2 =	"00000018ST",
> +    multu_2 =	"00000019ST",
> +    div_3 =	"0000001aST",
> +    divu_3 =	"0000001bST",
> +    ddiv_3 =	mips64 and "0000001eST",
> +    ddivu_3 =	mips64 and "0000001fST",
> +    dmult_2 =	mips64 and "0000001cST",
> +    dmultu_2 =	mips64 and "0000001dST",
> +
> +    -- Opcode REGIMM.
> +    tgei_2 =	"04080000SI",
> +    tgeiu_2 =	"04090000SI",
> +    tlti_2 =	"040a0000SI",
> +    tltiu_2 =	"040b0000SI",
> +    teqi_2 =	"040c0000SI",
> +    tnei_2 =	"040e0000SI",
> +    bltzal_2 =	"04100000SB",
> +    bgezal_2 =	"04110000SB",
> +    bltzall_2 =	"04120000SB",
> +    bgezall_2 =	"04130000SB",
> +
> +    -- Opcode SPECIAL2.
> +    madd_2 =	"70000000ST",
> +    maddu_2 =	"70000001ST",
> +    mul_3 =	"70000002DST",
> +    msub_2 =	"70000004ST",
> +    msubu_2 =	"70000005ST",
> +    clz_2 =	"70000020D=TS",
> +    clo_2 =	"70000021D=TS",
> +    dclz_2 =	mips64 and "70000024D=TS",
> +    dclo_2 =	mips64 and "70000025D=TS",
> +    sdbbp_0 =	"7000003f",
> +    sdbbp_1 =	"7000003fY",
> +
> +    -- Opcode COP1.
> +    bc1f_1 =	"45000000B",
> +    bc1f_2 =	"45000000CB",
> +    bc1t_1 =	"45010000B",
> +    bc1t_2 =	"45010000CB",
> +    bc1fl_1 =	"45020000B",
> +    bc1fl_2 =	"45020000CB",
> +    bc1tl_1 =	"45030000B",
> +    bc1tl_2 =	"45030000CB",
> +
> +    ["movf.s_2"] =	"46000011FG",
> +    ["movf.s_3"] =	"46000011FGC",
> +    ["movt.s_2"] =	"46010011FG",
> +    ["movt.s_3"] =	"46010011FGC",
> +    ["movz.s_3"] =	"46000012FGT",
> +    ["movn.s_3"] =	"46000013FGT",
> +    ["cvt.ps.s_3"] =	"46000026FGH",
> +    ["c.f.s_2"] =	"46000030GH",
> +    ["c.f.s_3"] =	"46000030VGH",
> +    ["c.un.s_2"] =	"46000031GH",
> +    ["c.un.s_3"] =	"46000031VGH",
> +    ["c.eq.s_2"] =	"46000032GH",
> +    ["c.eq.s_3"] =	"46000032VGH",
> +    ["c.ueq.s_2"] =	"46000033GH",
> +    ["c.ueq.s_3"] =	"46000033VGH",
> +    ["c.olt.s_2"] =	"46000034GH",
> +    ["c.olt.s_3"] =	"46000034VGH",
> +    ["c.ult.s_2"] =	"46000035GH",
> +    ["c.ult.s_3"] =	"46000035VGH",
> +    ["c.ole.s_2"] =	"46000036GH",
> +    ["c.ole.s_3"] =	"46000036VGH",
> +    ["c.ule.s_2"] =	"46000037GH",
> +    ["c.ule.s_3"] =	"46000037VGH",
> +    ["c.sf.s_2"] =	"46000038GH",
> +    ["c.sf.s_3"] =	"46000038VGH",
> +    ["c.ngle.s_2"] =	"46000039GH",
> +    ["c.ngle.s_3"] =	"46000039VGH",
> +    ["c.seq.s_2"] =	"4600003aGH",
> +    ["c.seq.s_3"] =	"4600003aVGH",
> +    ["c.ngl.s_2"] =	"4600003bGH",
> +    ["c.ngl.s_3"] =	"4600003bVGH",
> +    ["c.lt.s_2"] =	"4600003cGH",
> +    ["c.lt.s_3"] =	"4600003cVGH",
> +    ["c.nge.s_2"] =	"4600003dGH",
> +    ["c.nge.s_3"] =	"4600003dVGH",
> +    ["c.le.s_2"] =	"4600003eGH",
> +    ["c.le.s_3"] =	"4600003eVGH",
> +    ["c.ngt.s_2"] =	"4600003fGH",
> +    ["c.ngt.s_3"] =	"4600003fVGH",
> +    ["movf.d_2"] =	"46200011FG",
> +    ["movf.d_3"] =	"46200011FGC",
> +    ["movt.d_2"] =	"46210011FG",
> +    ["movt.d_3"] =	"46210011FGC",
> +    ["movz.d_3"] =	"46200012FGT",
> +    ["movn.d_3"] =	"46200013FGT",
> +    ["c.f.d_2"] =	"46200030GH",
> +    ["c.f.d_3"] =	"46200030VGH",
> +    ["c.un.d_2"] =	"46200031GH",
> +    ["c.un.d_3"] =	"46200031VGH",
> +    ["c.eq.d_2"] =	"46200032GH",
> +    ["c.eq.d_3"] =	"46200032VGH",
> +    ["c.ueq.d_2"] =	"46200033GH",
> +    ["c.ueq.d_3"] =	"46200033VGH",
> +    ["c.olt.d_2"] =	"46200034GH",
> +    ["c.olt.d_3"] =	"46200034VGH",
> +    ["c.ult.d_2"] =	"46200035GH",
> +    ["c.ult.d_3"] =	"46200035VGH",
> +    ["c.ole.d_2"] =	"46200036GH",
> +    ["c.ole.d_3"] =	"46200036VGH",
> +    ["c.ule.d_2"] =	"46200037GH",
> +    ["c.ule.d_3"] =	"46200037VGH",
> +    ["c.sf.d_2"] =	"46200038GH",
> +    ["c.sf.d_3"] =	"46200038VGH",
> +    ["c.ngle.d_2"] =	"46200039GH",
> +    ["c.ngle.d_3"] =	"46200039VGH",
> +    ["c.seq.d_2"] =	"4620003aGH",
> +    ["c.seq.d_3"] =	"4620003aVGH",
> +    ["c.ngl.d_2"] =	"4620003bGH",
> +    ["c.ngl.d_3"] =	"4620003bVGH",
> +    ["c.lt.d_2"] =	"4620003cGH",
> +    ["c.lt.d_3"] =	"4620003cVGH",
> +    ["c.nge.d_2"] =	"4620003dGH",
> +    ["c.nge.d_3"] =	"4620003dVGH",
> +    ["c.le.d_2"] =	"4620003eGH",
> +    ["c.le.d_3"] =	"4620003eVGH",
> +    ["c.ngt.d_2"] =	"4620003fGH",
> +    ["c.ngt.d_3"] =	"4620003fVGH",
> +    ["add.ps_3"] =	"46c00000FGH",
> +    ["sub.ps_3"] =	"46c00001FGH",
> +    ["mul.ps_3"] =	"46c00002FGH",
> +    ["abs.ps_2"] =	"46c00005FG",
> +    ["mov.ps_2"] =	"46c00006FG",
> +    ["neg.ps_2"] =	"46c00007FG",
> +    ["movf.ps_2"] =	"46c00011FG",
> +    ["movf.ps_3"] =	"46c00011FGC",
> +    ["movt.ps_2"] =	"46c10011FG",
> +    ["movt.ps_3"] =	"46c10011FGC",
> +    ["movz.ps_3"] =	"46c00012FGT",
> +    ["movn.ps_3"] =	"46c00013FGT",
> +    ["cvt.s.pu_2"] =	"46c00020FG",
> +    ["cvt.s.pl_2"] =	"46c00028FG",
> +    ["pll.ps_3"] =	"46c0002cFGH",
> +    ["plu.ps_3"] =	"46c0002dFGH",
> +    ["pul.ps_3"] =	"46c0002eFGH",
> +    ["puu.ps_3"] =	"46c0002fFGH",
> +    ["c.f.ps_2"] =	"46c00030GH",
> +    ["c.f.ps_3"] =	"46c00030VGH",
> +    ["c.un.ps_2"] =	"46c00031GH",
> +    ["c.un.ps_3"] =	"46c00031VGH",
> +    ["c.eq.ps_2"] =	"46c00032GH",
> +    ["c.eq.ps_3"] =	"46c00032VGH",
> +    ["c.ueq.ps_2"] =	"46c00033GH",
> +    ["c.ueq.ps_3"] =	"46c00033VGH",
> +    ["c.olt.ps_2"] =	"46c00034GH",
> +    ["c.olt.ps_3"] =	"46c00034VGH",
> +    ["c.ult.ps_2"] =	"46c00035GH",
> +    ["c.ult.ps_3"] =	"46c00035VGH",
> +    ["c.ole.ps_2"] =	"46c00036GH",
> +    ["c.ole.ps_3"] =	"46c00036VGH",
> +    ["c.ule.ps_2"] =	"46c00037GH",
> +    ["c.ule.ps_3"] =	"46c00037VGH",
> +    ["c.sf.ps_2"] =	"46c00038GH",
> +    ["c.sf.ps_3"] =	"46c00038VGH",
> +    ["c.ngle.ps_2"] =	"46c00039GH",
> +    ["c.ngle.ps_3"] =	"46c00039VGH",
> +    ["c.seq.ps_2"] =	"46c0003aGH",
> +    ["c.seq.ps_3"] =	"46c0003aVGH",
> +    ["c.ngl.ps_2"] =	"46c0003bGH",
> +    ["c.ngl.ps_3"] =	"46c0003bVGH",
> +    ["c.lt.ps_2"] =	"46c0003cGH",
> +    ["c.lt.ps_3"] =	"46c0003cVGH",
> +    ["c.nge.ps_2"] =	"46c0003dGH",
> +    ["c.nge.ps_3"] =	"46c0003dVGH",
> +    ["c.le.ps_2"] =	"46c0003eGH",
> +    ["c.le.ps_3"] =	"46c0003eVGH",
> +    ["c.ngt.ps_2"] =	"46c0003fGH",
> +    ["c.ngt.ps_3"] =	"46c0003fVGH",
> +
> +    -- Opcode COP1X.
> +    lwxc1_2 =	"4c000000FX",
> +    ldxc1_2 =	"4c000001FX",
> +    luxc1_2 =	"4c000005FX",
> +    swxc1_2 =	"4c000008FX",
> +    sdxc1_2 =	"4c000009FX",
> +    suxc1_2 =	"4c00000dFX",
> +    prefx_2 =	"4c00000fMX",
> +    ["alnv.ps_4"] =	"4c00001eFGHS",
> +    ["madd.s_4"] =	"4c000020FRGH",
> +    ["madd.d_4"] =	"4c000021FRGH",
> +    ["madd.ps_4"] =	"4c000026FRGH",
> +    ["msub.s_4"] =	"4c000028FRGH",
> +    ["msub.d_4"] =	"4c000029FRGH",
> +    ["msub.ps_4"] =	"4c00002eFRGH",
> +    ["nmadd.s_4"] =	"4c000030FRGH",
> +    ["nmadd.d_4"] =	"4c000031FRGH",
> +    ["nmadd.ps_4"] =	"4c000036FRGH",
> +    ["nmsub.s_4"] =	"4c000038FRGH",
> +    ["nmsub.d_4"] =	"4c000039FRGH",
> +    ["nmsub.ps_4"] =	"4c00003eFRGH",
> +
> +  }) do map_op[k] = v end
> +
> +end
> +
>  ------------------------------------------------------------------------------
>  
>  local function parse_gpr(expr)
> @@ -808,9 +979,11 @@ map_op[".template__"] = function(params, template, nparams)
>        op = op + parse_disp(params[n]); n = n + 1
>      elseif p == "X" then
>        op = op + parse_index(params[n]); n = n + 1
> -    elseif p == "B" or p == "J" then
> +    elseif p == "B" or p == "J" or p == "K" or p == "L" then
>        local mode, m, s = parse_label(params[n], false)
> -      if p == "B" then m = m + 2048 end
> +      if p == "J" then m = m + 0xa800
> +      elseif p == "K" then m = m + 0x5000
> +      elseif p == "L" then m = m + 0xa000 end
>        waction("REL_"..mode, m, s, 1)
>        n = n + 1
>      elseif p == "A" then
> @@ -833,7 +1006,7 @@ map_op[".template__"] = function(params, template, nparams)
>      elseif p == "Z" then
>        op = op + parse_imm(params[n], 10, 6, 0, false); n = n + 1
>      elseif p == "=" then
> -      op = op + shl(band(op, 0xf800), 5) -- Copy D to T for clz, clo.
> +      n = n - 1 -- Re-use previous parameter for next template char.
>      else
>        assert(false)
>      end
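
A side note on the "=" change in this hunk: the old code copied the D bit field into T after the fact, while the new code steps the parameter index back by one so that the next template character re-reads the same operand. A minimal sketch of that idea in plain Lua is below. walk_template and its output format are made up for illustration and are not part of DynASM; it only records which parameter each template character would consume.

```lua
-- Hypothetical stand-in for the template loop in map_op[".template__"].
-- It does no encoding, it only tracks the parameter index n.
local function walk_template(template, params)
  local n = 1
  local used = {}
  for p in template:gmatch(".") do
    if p == "=" then
      n = n - 1 -- Re-use previous parameter for next template char.
    else
      used[#used + 1] = p .. "=" .. tostring(params[n])
      n = n + 1
    end
  end
  return table.concat(used, " ")
end

-- "D=TS" is the template tail used for the pre-R6 clz/clo entries.
print(walk_template("D=TS", { "r2", "r3" }))
```

With two operands, D and T both see the first one and S sees the second, which is what the pre-R6 clz/clo encoding needs.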
> diff --git a/dynasm/dynasm.lua b/dynasm/dynasm.lua
> index 5ec21a79..46ebfca8 100644
> --- a/dynasm/dynasm.lua
> +++ b/dynasm/dynasm.lua
> @@ -630,6 +630,7 @@ end
>  -- Load architecture-specific module.
>  local function loadarch(arch)
>    if not match(arch, "^[%w_]+$") then return "bad arch name" end
> +  _G._map_def = map_def
>    local ok, m_arch = pcall(require, "dasm_"..arch)
>    if not ok then return "cannot load module: "..m_arch end
>    g_arch = m_arch
> diff --git a/src/Makefile.original b/src/Makefile.original
> index aedaaa73..22d36a27 100644
> --- a/src/Makefile.original
> +++ b/src/Makefile.original
> @@ -455,6 +455,9 @@ ifeq (arm,$(TARGET_LJARCH))
>      DASM_AFLAGS+= -D IOS
>    endif
>  else
> +ifneq (,$(findstring LJ_TARGET_MIPSR6 ,$(TARGET_TESTARCH)))
> +  DASM_AFLAGS+= -D MIPSR6
> +endif
>  ifeq (ppc,$(TARGET_LJARCH))
>    ifneq (,$(findstring LJ_ARCH_SQRT 1,$(TARGET_TESTARCH)))
>      DASM_AFLAGS+= -D SQRT
> diff --git a/src/jit/bcsave.lua b/src/jit/bcsave.lua
> index 2553d97e..41081184 100644
> --- a/src/jit/bcsave.lua
> +++ b/src/jit/bcsave.lua
> @@ -17,6 +17,10 @@ local bit = require("bit")
>  -- Symbol name prefix for LuaJIT bytecode.
>  local LJBC_PREFIX = "luaJIT_BC_"
>  
> +local type, assert = type, assert
> +local format = string.format
> +local tremove, tconcat = table.remove, table.concat
> +
>  ------------------------------------------------------------------------------
>  
>  local function usage()
> @@ -63,8 +67,18 @@ local map_type = {
>  }
>  
>  local map_arch = {
> -  x86 = true, x64 = true, arm = true, arm64 = true, arm64be = true,
> -  ppc = true, mips = true, mipsel = true,
> +  x86 =		{ e = "le", b = 32, m = 3, p = 0x14c, },
> +  x64 =		{ e = "le", b = 64, m = 62, p = 0x8664, },
> +  arm =		{ e = "le", b = 32, m = 40, p = 0x1c0, },
> +  arm64 =	{ e = "le", b = 64, m = 183, p = 0xaa64, },
> +  arm64be =	{ e = "be", b = 64, m = 183, },
> +  ppc =		{ e = "be", b = 32, m = 20, },
> +  mips =	{ e = "be", b = 32, m = 8, f = 0x50001006, },
> +  mipsel =	{ e = "le", b = 32, m = 8, f = 0x50001006, },
> +  mips64 =	{ e = "be", b = 64, m = 8, f = 0x80000007, },
> +  mips64el =	{ e = "le", b = 64, m = 8, f = 0x80000007, },
> +  mips64r6 =	{ e = "be", b = 64, m = 8, f = 0xa0000407, },
> +  mips64r6el =	{ e = "le", b = 64, m = 8, f = 0xa0000407, },
>  }
>  
>  local map_os = {
> @@ -73,33 +87,33 @@ local map_os = {
>  }
>  
>  local function checkarg(str, map, err)
> -  str = string.lower(str)
> +  str = str:lower()
>    local s = check(map[str], "unknown ", err)
> -  return s == true and str or s
> +  return type(s) == "string" and s or str
>  end
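
The changed return in checkarg matters because map_arch entries are now tables rather than `true`. The old expression `s == true and str or s` would hand back the table itself for such entries. A small sketch of the new behaviour follows, assuming a simplified check() that raises an error instead of printing and exiting, and using reduced map tables whose entries are illustrative only.

```lua
-- Hypothetical stand-in for bcsave's check(): return the value or raise.
local function check(ok, ...)
  if ok then return ok end
  error(table.concat({ ... }, ""), 0)
end

-- Reduced tables; the entries are assumptions for this sketch.
local map_type = { o = "obj", obj = "obj" }          -- string values act as aliases
local map_arch = { mips64el = { e = "le", b = 64 } } -- table values carry attributes

local function checkarg(str, map, err)
  str = str:lower()
  local s = check(map[str], "unknown ", err)
  -- A string value is an alias to return; anything else means "use the key".
  return type(s) == "string" and s or str
end

local t = checkarg("O", map_type, "file type")
local a = checkarg("MIPS64EL", map_arch, "architecture")
```

Here t ends up as the alias target and a as the lower-cased key, so ctx.arch stays a string that can be used to index map_arch later.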
>  
>  local function detecttype(str)
> -  local ext = string.match(string.lower(str), "%.(%a+)$")
> +  local ext = str:lower():match("%.(%a+)$")
>    return map_type[ext] or "raw"
>  end
>  
>  local function checkmodname(str)
> -  check(string.match(str, "^[%w_.%-]+$"), "bad module name")
> -  return string.gsub(str, "[%.%-]", "_")
> +  check(str:match("^[%w_.%-]+$"), "bad module name")
> +  return str:gsub("[%.%-]", "_")
>  end
>  
>  local function detectmodname(str)
>    if type(str) == "string" then
> -    local tail = string.match(str, "[^/\\]+$")
> +    local tail = str:match("[^/\\]+$")
>      if tail then str = tail end
> -    local head = string.match(str, "^(.*)%.[^.]*$")
> +    local head = str:match("^(.*)%.[^.]*$")
>      if head then str = head end
> -    str = string.match(str, "^[%w_.%-]+")
> +    str = str:match("^[%w_.%-]+")
>    else
>      str = nil
>    end
>    check(str, "cannot derive module name, use -n name")
> -  return string.gsub(str, "[%.%-]", "_")
> +  return str:gsub("[%.%-]", "_")
>  end
>  
>  ------------------------------------------------------------------------------
> @@ -118,7 +132,7 @@ end
>  local function bcsave_c(ctx, output, s)
>    local fp = savefile(output, "w")
>    if ctx.type == "c" then
> -    fp:write(string.format([[
> +    fp:write(format([[
>  #ifdef _cplusplus
>  extern "C"
>  #endif
> @@ -128,7 +142,7 @@ __declspec(dllexport)
>  const unsigned char %s%s[] = {
>  ]], LJBC_PREFIX, ctx.modname))
>    else
> -    fp:write(string.format([[
> +    fp:write(format([[
>  #define %s%s_SIZE %d
>  static const unsigned char %s%s[] = {
>  ]], LJBC_PREFIX, ctx.modname, #s, LJBC_PREFIX, ctx.modname))
> @@ -138,13 +152,13 @@ static const unsigned char %s%s[] = {
>      local b = tostring(string.byte(s, i))
>      m = m + #b + 1
>      if m > 78 then
> -      fp:write(table.concat(t, ",", 1, n), ",\n")
> +      fp:write(tconcat(t, ",", 1, n), ",\n")
>        n, m = 0, #b + 1
>      end
>      n = n + 1
>      t[n] = b
>    end
> -  bcsave_tail(fp, output, table.concat(t, ",", 1, n).."\n};\n")
> +  bcsave_tail(fp, output, tconcat(t, ",", 1, n).."\n};\n")
>  end
>  
>  local function bcsave_elfobj(ctx, output, s, ffi)
> @@ -199,12 +213,8 @@ typedef struct {
>  } ELF64obj;
>  ]]
>    local symname = LJBC_PREFIX..ctx.modname
> -  local is64, isbe = false, false
> -  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" then
> -    is64 = true
> -  elseif ctx.arch == "ppc" or ctx.arch == "mips" then
> -    isbe = true
> -  end
> +  local ai = assert(map_arch[ctx.arch])
> +  local is64, isbe = ai.b == 64, ai.e == "be"
>  
>    -- Handle different host/target endianess.
>    local function f32(x) return x end
> @@ -237,10 +247,8 @@ typedef struct {
>    hdr.eendian = isbe and 2 or 1
>    hdr.eversion = 1
>    hdr.type = f16(1)
> -  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, mips=8, mipsel=8 })[ctx.arch])
> -  if ctx.arch == "mips" or ctx.arch == "mipsel" then
> -    hdr.flags = f32(0x50001006)
> -  end
> +  hdr.machine = f16(ai.m)
> +  hdr.flags = f32(ai.f or 0)
>    hdr.version = f32(1)
>    hdr.shofs = fofs(ffi.offsetof(o, "sect"))
>    hdr.ehsize = f16(ffi.sizeof(hdr))
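
To make the table-driven lookup easier to follow, here is a rough sketch of how the ELF and PE header inputs fall out of map_arch. The three entries are copied from the patch; elf_params, pe_machine and the error messages are hypothetical helpers for this note, not functions in bcsave.lua.

```lua
-- Subset of map_arch, values copied from the patch.
local map_arch = {
  x64 =        { e = "le", b = 64, m = 62, p = 0x8664 },
  ppc =        { e = "be", b = 32, m = 20 },
  mips64r6el = { e = "le", b = 64, m = 8, f = 0xa0000407 },
}

-- Hypothetical helper: collect the ELF header inputs for one arch.
local function elf_params(arch)
  local ai = assert(map_arch[arch], "unknown arch")
  local isbe = ai.e == "be"
  return {
    is64 = ai.b == 64,
    isbe = isbe,
    eendian = isbe and 2 or 1, -- same expression as hdr.eendian
    machine = ai.m,
    flags = ai.f or 0,         -- archs without an f field get flags 0
  }
end

-- Hypothetical helper: the PE path requires a p field.
local function pe_machine(arch)
  local ai = assert(map_arch[arch], "unknown arch")
  return assert(ai.p, "no PE machine type for this arch")
end
```

One thing this makes visible: the old inline PE table had codes for ppc and mips (0x1f2, 0x366), while the new map_arch has no p field for them, so the assert in the PE path would fire for those targets.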
> @@ -336,12 +344,8 @@ typedef struct {
>  } PEobj;
>  ]]
>    local symname = LJBC_PREFIX..ctx.modname
> -  local is64 = false
> -  if ctx.arch == "x86" then
> -    symname = "_"..symname
> -  elseif ctx.arch == "x64" then
> -    is64 = true
> -  end
> +  local ai = assert(map_arch[ctx.arch])
> +  local is64 = ai.b == 64
>    local symexport = "   /EXPORT:"..symname..",DATA "
>  
>    -- The file format is always little-endian. Swap if the host is big-endian.
> @@ -355,7 +359,7 @@ typedef struct {
>    -- Create PE object and fill in header.
>    local o = ffi.new("PEobj")
>    local hdr = o.hdr
> -  hdr.arch = f16(({ x86=0x14c, x64=0x8664, arm=0x1c0, ppc=0x1f2, mips=0x366, mipsel=0x366 })[ctx.arch])
> +  hdr.arch = f16(assert(ai.p))
>    hdr.nsects = f16(2)
>    hdr.symtabofs = f32(ffi.offsetof(o, "sym0"))
>    hdr.nsyms = f32(6)
> @@ -605,16 +609,16 @@ local function docmd(...)
>    local n = 1
>    local list = false
>    local ctx = {
> -    strip = true, arch = jit.arch, os = string.lower(jit.os),
> +    strip = true, arch = jit.arch, os = jit.os:lower(),
>      type = false, modname = false,
>    }
>    while n <= #arg do
>      local a = arg[n]
> -    if type(a) == "string" and string.sub(a, 1, 1) == "-" and a ~= "-" then
> -      table.remove(arg, n)
> +    if type(a) == "string" and a:sub(1, 1) == "-" and a ~= "-" then
> +      tremove(arg, n)
>        if a == "--" then break end
>        for m=2,#a do
> -	local opt = string.sub(a, m, m)
> +	local opt = a:sub(m, m)
>  	if opt == "l" then
>  	  list = true
>  	elseif opt == "s" then
> @@ -627,13 +631,13 @@ local function docmd(...)
>  	    if n ~= 1 then usage() end
>  	    arg[1] = check(loadstring(arg[1]))
>  	  elseif opt == "n" then
> -	    ctx.modname = checkmodname(table.remove(arg, n))
> +	    ctx.modname = checkmodname(tremove(arg, n))
>  	  elseif opt == "t" then
> -	    ctx.type = checkarg(table.remove(arg, n), map_type, "file type")
> +	    ctx.type = checkarg(tremove(arg, n), map_type, "file type")
>  	  elseif opt == "a" then
> -	    ctx.arch = checkarg(table.remove(arg, n), map_arch, "architecture")
> +	    ctx.arch = checkarg(tremove(arg, n), map_arch, "architecture")
>  	  elseif opt == "o" then
> -	    ctx.os = checkarg(table.remove(arg, n), map_os, "OS name")
> +	    ctx.os = checkarg(tremove(arg, n), map_os, "OS name")
>  	  else
>  	    usage()
>  	  end
> diff --git a/src/jit/dis_mips.lua b/src/jit/dis_mips.lua
> index a12b8e62..c003b984 100644
> --- a/src/jit/dis_mips.lua
> +++ b/src/jit/dis_mips.lua
> @@ -19,13 +19,34 @@ local band, bor, tohex = bit.band, bit.bor, bit.tohex
>  local lshift, rshift, arshift = bit.lshift, bit.rshift, bit.arshift
>  
>  ------------------------------------------------------------------------------
> --- Primary and extended opcode maps
> +-- Extended opcode maps common to all MIPS releases
>  ------------------------------------------------------------------------------
>  
> -local map_movci = { shift = 16, mask = 1, [0] = "movfDSC", "movtDSC", }
>  local map_srl = { shift = 21, mask = 1, [0] = "srlDTA", "rotrDTA", }
>  local map_srlv = { shift = 6, mask = 1, [0] = "srlvDTS", "rotrvDTS", }
>  
> +local map_cop0 = {
> +  shift = 25, mask = 1,
> +  [0] = {
> +    shift = 21, mask = 15,
> +    [0] = "mfc0TDW", [4] = "mtc0TDW",
> +    [10] = "rdpgprDT",
> +    [11] = { shift = 5, mask = 1, [0] = "diT0", "eiT0", },
> +    [14] = "wrpgprDT",
> +  }, {
> +    shift = 0, mask = 63,
> +    [1] = "tlbr", [2] = "tlbwi", [6] = "tlbwr", [8] = "tlbp",
> +    [24] = "eret", [31] = "deret",
> +    [32] = "wait",
> +  },
> +}
> +
> +------------------------------------------------------------------------------
> +-- Primary and extended opcode maps for MIPS R1-R5
> +------------------------------------------------------------------------------
> +
> +local map_movci = { shift = 16, mask = 1, [0] = "movfDSC", "movtDSC", }
> +
>  local map_special = {
>    shift = 0, mask = 63,
>    [0] = { shift = 0, mask = -1, [0] = "nop", _ = "sllDTA" },
> @@ -87,22 +108,6 @@ local map_regimm = {
>    false,	false,		false,		"synciSO",
>  }
>  
> -local map_cop0 = {
> -  shift = 25, mask = 1,
> -  [0] = {
> -    shift = 21, mask = 15,
> -    [0] = "mfc0TDW", [4] = "mtc0TDW",
> -    [10] = "rdpgprDT",
> -    [11] = { shift = 5, mask = 1, [0] = "diT0", "eiT0", },
> -    [14] = "wrpgprDT",
> -  }, {
> -    shift = 0, mask = 63,
> -    [1] = "tlbr", [2] = "tlbwi", [6] = "tlbwr", [8] = "tlbp",
> -    [24] = "eret", [31] = "deret",
> -    [32] = "wait",
> -  },
> -}
> -
>  local map_cop1s = {
>    shift = 0, mask = 63,
>    [0] = "add.sFGH",	"sub.sFGH",	"mul.sFGH",	"div.sFGH",
> @@ -233,6 +238,208 @@ local map_pri = {
>    false,	"sdc1HSO",	"sdc2TSO",	"sdTSO",
>  }
>  
> +------------------------------------------------------------------------------
> +-- Primary and extended opcode maps for MIPS R6
> +------------------------------------------------------------------------------
> +
> +local map_mul_r6 =   { shift = 6, mask = 3, [2] = "mulDST",   [3] = "muhDST" }
> +local map_mulu_r6 =  { shift = 6, mask = 3, [2] = "muluDST",  [3] = "muhuDST" }
> +local map_div_r6 =   { shift = 6, mask = 3, [2] = "divDST",   [3] = "modDST" }
> +local map_divu_r6 =  { shift = 6, mask = 3, [2] = "divuDST",  [3] = "moduDST" }
> +local map_dmul_r6 =  { shift = 6, mask = 3, [2] = "dmulDST",  [3] = "dmuhDST" }
> +local map_dmulu_r6 = { shift = 6, mask = 3, [2] = "dmuluDST", [3] = "dmuhuDST" }
> +local map_ddiv_r6 =  { shift = 6, mask = 3, [2] = "ddivDST",  [3] = "dmodDST" }
> +local map_ddivu_r6 = { shift = 6, mask = 3, [2] = "ddivuDST", [3] = "dmoduDST" }
> +
> +local map_special_r6 = {
> +  shift = 0, mask = 63,
> +  [0] = { shift = 0, mask = -1, [0] = "nop", _ = "sllDTA" },
> +  false,	map_srl,	"sraDTA",
> +  "sllvDTS",	false,		map_srlv,	"sravDTS",
> +  "jrS",	"jalrD1S",	false,		false,
> +  "syscallY",	"breakY",	false,		"sync",
> +  "clzDS",	"cloDS",	"dclzDS",	"dcloDS",
> +  "dsllvDST",	"dlsaDSTA",	"dsrlvDST",	"dsravDST",
> +  map_mul_r6,	map_mulu_r6,	map_div_r6,	map_divu_r6,
> +  map_dmul_r6,	map_dmulu_r6,	map_ddiv_r6,	map_ddivu_r6,
> +  "addDST",	"addu|moveDST0", "subDST",	"subu|neguDS0T",
> +  "andDST",	"or|moveDST0",	"xorDST",	"nor|notDST0",
> +  false,	false,		"sltDST",	"sltuDST",
> +  "daddDST",	"dadduDST",	"dsubDST",	"dsubuDST",
> +  "tgeSTZ",	"tgeuSTZ",	"tltSTZ",	"tltuSTZ",
> +  "teqSTZ",	"seleqzDST",	"tneSTZ",	"selnezDST",
> +  "dsllDTA",	false,		"dsrlDTA",	"dsraDTA",
> +  "dsll32DTA",	false,		"dsrl32DTA",	"dsra32DTA",
> +}
> +
> +local map_bshfl_r6 = {
> +  shift = 9, mask = 3,
> +  [1] = "alignDSTa",
> +  _ = {
> +    shift = 6, mask = 31,
> +    [0] = "bitswapDT",
> +    [2] = "wsbhDT",
> +    [16] = "sebDT",
> +    [24] = "sehDT",
> +  }
> +}
> +
> +local map_dbshfl_r6 = {
> +  shift = 9, mask = 3,
> +  [1] = "dalignDSTa",
> +  _ = {
> +    shift = 6, mask = 31,
> +    [0] = "dbitswapDT",
> +    [2] = "dsbhDT",
> +    [5] = "dshdDT",
> +  }
> +}
> +
> +local map_special3_r6 = {
> +  shift = 0, mask = 63,
> +  [0]  = "extTSAK", [1]  = "dextmTSAP", [3]  = "dextTSAK",
> +  [4]  = "insTSAL", [6]  = "dinsuTSEQ", [7]  = "dinsTSAL",
> +  [32] = map_bshfl_r6, [36] = map_dbshfl_r6,  [59] = "rdhwrTD",
> +}
> +
> +local map_regimm_r6 = {
> +  shift = 16, mask = 31,
> +  [0] = "bltzSB", [1] = "bgezSB",
> +  [6] = "dahiSI", [30] = "datiSI",
> +  [23] = "sigrieI", [31] = "synciSO",
> +}
> +
> +local map_pcrel_r6 = {
> +  shift = 19, mask = 3,
> +  [0] = "addiupcS2", "lwpcS2", "lwupcS2", {
> +    shift = 18, mask = 1,
> +    [0] = "ldpcS3", { shift = 16, mask = 3, [2] = "auipcSI", [3] = "aluipcSI" }
> +  }
> +}
> +
> +local map_cop1s_r6 = {
> +  shift = 0, mask = 63,
> +  [0] = "add.sFGH",	"sub.sFGH",	"mul.sFGH",	"div.sFGH",
> +  "sqrt.sFG",		"abs.sFG",	"mov.sFG",	"neg.sFG",
> +  "round.l.sFG",	"trunc.l.sFG",	"ceil.l.sFG",	"floor.l.sFG",
> +  "round.w.sFG",	"trunc.w.sFG",	"ceil.w.sFG",	"floor.w.sFG",
> +  "sel.sFGH",		false,		false,		false,
> +  "seleqz.sFGH",	"recip.sFG",	"rsqrt.sFG",	"selnez.sFGH",
> +  "maddf.sFGH",		"msubf.sFGH",	"rint.sFG",	"class.sFG",
> +  "min.sFGH",		"mina.sFGH",	"max.sFGH",	"maxa.sFGH",
> +  false,		"cvt.d.sFG",	false,		false,
> +  "cvt.w.sFG",		"cvt.l.sFG",
> +}
> +
> +local map_cop1d_r6 = {
> +  shift = 0, mask = 63,
> +  [0] = "add.dFGH",	"sub.dFGH",	"mul.dFGH",	"div.dFGH",
> +  "sqrt.dFG",		"abs.dFG",	"mov.dFG",	"neg.dFG",
> +  "round.l.dFG",	"trunc.l.dFG",	"ceil.l.dFG",	"floor.l.dFG",
> +  "round.w.dFG",	"trunc.w.dFG",	"ceil.w.dFG",	"floor.w.dFG",
> +  "sel.dFGH",		false,		false,		false,
> +  "seleqz.dFGH",	"recip.dFG",	"rsqrt.dFG",	"selnez.dFGH",
> +  "maddf.dFGH",		"msubf.dFGH",	"rint.dFG",	"class.dFG",
> +  "min.dFGH",		"mina.dFGH",	"max.dFGH",	"maxa.dFGH",
> +  "cvt.s.dFG",		false,		false,		false,
> +  "cvt.w.dFG",		"cvt.l.dFG",
> +}
> +
> +local map_cop1w_r6 = {
> +  shift = 0, mask = 63,
> +  [0] = "cmp.af.sFGH",	"cmp.un.sFGH",	"cmp.eq.sFGH",	"cmp.ueq.sFGH",
> +  "cmp.lt.sFGH",	"cmp.ult.sFGH",	"cmp.le.sFGH",	"cmp.ule.sFGH",
> +  "cmp.saf.sFGH",	"cmp.sun.sFGH",	"cmp.seq.sFGH",	"cmp.sueq.sFGH",
> +  "cmp.slt.sFGH",	"cmp.sult.sFGH",	"cmp.sle.sFGH",	"cmp.sule.sFGH",
> +  false,		"cmp.or.sFGH",	"cmp.une.sFGH",	"cmp.ne.sFGH",
> +  false,		false,		false,		false,
> +  false,		"cmp.sor.sFGH",	"cmp.sune.sFGH",	"cmp.sne.sFGH",
> +  false,		false,		false,		false,
> +  "cvt.s.wFG", "cvt.d.wFG",
> +}
> +
> +local map_cop1l_r6 = {
> +  shift = 0, mask = 63,
> +  [0] = "cmp.af.dFGH",	"cmp.un.dFGH",	"cmp.eq.dFGH",	"cmp.ueq.dFGH",
> +  "cmp.lt.dFGH",	"cmp.ult.dFGH",	"cmp.le.dFGH",	"cmp.ule.dFGH",
> +  "cmp.saf.dFGH",	"cmp.sun.dFGH",	"cmp.seq.dFGH",	"cmp.sueq.dFGH",
> +  "cmp.slt.dFGH",	"cmp.sult.dFGH",	"cmp.sle.dFGH",	"cmp.sule.dFGH",
> +  false,		"cmp.or.dFGH",	"cmp.une.dFGH",	"cmp.ne.dFGH",
> +  false,		false,		false,		false,
> +  false,		"cmp.sor.dFGH",	"cmp.sune.dFGH",	"cmp.sne.dFGH",
> +  false,		false,		false,		false,
> +  "cvt.s.lFG", "cvt.d.lFG",
> +}
> +
> +local map_cop1_r6 = {
> +  shift = 21, mask = 31,
> +  [0] = "mfc1TG", "dmfc1TG",	"cfc1TG",	"mfhc1TG",
> +  "mtc1TG",	"dmtc1TG",	"ctc1TG",	"mthc1TG",
> +  false,	"bc1eqzHB",	false,		false,
> +  false,	"bc1nezHB",	false,		false,
> +  map_cop1s_r6,	map_cop1d_r6,	false,		false,
> +  map_cop1w_r6,	map_cop1l_r6,
> +}
> +
> +local function maprs_popTS(rs, rt)
> +  if rt == 0 then return 0 elseif rs == 0 then return 1
> +  elseif rs == rt then return 2 else return 3 end
> +end
> +
> +local map_pop06_r6 = {
> +  maprs = maprs_popTS, [0] = "blezSB", "blezalcTB", "bgezalcTB", "bgeucSTB"
> +}
> +local map_pop07_r6 = {
> +  maprs = maprs_popTS, [0] = "bgtzSB", "bgtzalcTB", "bltzalcTB", "bltucSTB"
> +}
> +local map_pop26_r6 = {
> +  maprs = maprs_popTS, "blezcTB", "bgezcTB", "bgecSTB"
> +}
> +local map_pop27_r6 = {
> +  maprs = maprs_popTS, "bgtzcTB", "bltzcTB", "bltcSTB"
> +}
> +
> +local function maprs_popS(rs, rt)
> +  if rs == 0 then return 0 else return 1 end
> +end
> +
> +local map_pop66_r6 = {
> +  maprs = maprs_popS, [0] = "jicTI", "beqzcSb"
> +}
> +local map_pop76_r6 = {
> +  maprs = maprs_popS, [0] = "jialcTI", "bnezcSb"
> +}
> +
> +local function maprs_popST(rs, rt)
> +  if rs >= rt then return 0 elseif rs == 0 then return 1 else return 2 end
> +end
> +
> +local map_pop10_r6 = {
> +  maprs = maprs_popST, [0] = "bovcSTB", "beqzalcTB", "beqcSTB"
> +}
> +local map_pop30_r6 = {
> +  maprs = maprs_popST, [0] = "bnvcSTB", "bnezalcTB", "bnecSTB"
> +}
> +
> +local map_pri_r6 = {
> +  [0] = map_special_r6,	map_regimm_r6,	"jJ",	"jalJ",
> +  "beq|beqz|bST00B",	"bne|bnezST0B",		map_pop06_r6,	map_pop07_r6,
> +  map_pop10_r6,	"addiu|liTS0I",	"sltiTSI",	"sltiuTSI",
> +  "andiTSU",	"ori|liTS0U",	"xoriTSU",	"aui|luiTS0U",
> +  map_cop0,	map_cop1_r6,	false,		false,
> +  false,	false,		map_pop26_r6,	map_pop27_r6,
> +  map_pop30_r6,	"daddiuTSI",	false,		false,
> +  false,	"dauiTSI",	false,		map_special3_r6,
> +  "lbTSO",	"lhTSO",	false,		"lwTSO",
> +  "lbuTSO",	"lhuTSO",	false,		false,
> +  "sbTSO",	"shTSO",	false,		"swTSO",
> +  false,	false,		false,		false,
> +  false,	"lwc1HSO",	"bc#",		false,
> +  false,	"ldc1HSO",	map_pop66_r6,	"ldTSO",
> +  false,	"swc1HSO",	"balc#",	map_pcrel_r6,
> +  false,	"sdc1HSO",	map_pop76_r6,	"sdTSO",
> +}
> +
>  ------------------------------------------------------------------------------
>  
>  local map_gpr = {
> @@ -287,10 +494,14 @@ local function disass_ins(ctx)
>    ctx.op = op
>    ctx.rel = nil
>  
> -  local opat = map_pri[rshift(op, 26)]
> +  local opat = ctx.map_pri[rshift(op, 26)]
>    while type(opat) ~= "string" do
>      if not opat then return unknown(ctx) end
> -    opat = opat[band(rshift(op, opat.shift), opat.mask)] or opat._
> +    if opat.maprs then
> +      opat = opat[opat.maprs(band(rshift(op,21),31), band(rshift(op,16),31))]
> +    else
> +      opat = opat[band(rshift(op, opat.shift), opat.mask)] or opat._
> +    end
>    end
>    local name, pat = match(opat, "^([a-z0-9_.]*)(.*)")
>    local altname, pat2 = match(pat, "|([a-z0-9_.|]*)(.*)")
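
For readers who do not have the R6 encoding in their head: the new maprs hook picks a table slot from the relation between the rs and rt fields instead of from a shift/mask pair. A minimal sketch in plain Lua is below. maprs_popTS and map_pop06_r6 are copied from the patch; field, select_pop06 and pop06 are hypothetical helpers, and the bit extraction uses arithmetic so that the snippet does not depend on the bit library.

```lua
local floor = math.floor

-- Extract a bit field with plain arithmetic.
local function field(op, shift, mask)
  return floor(op / 2^shift) % (mask + 1)
end

-- Copied from the patch.
local function maprs_popTS(rs, rt)
  if rt == 0 then return 0 elseif rs == 0 then return 1
  elseif rs == rt then return 2 else return 3 end
end

local map_pop06_r6 = {
  maprs = maprs_popTS, [0] = "blezSB", "blezalcTB", "bgezalcTB", "bgeucSTB"
}

-- Hypothetical helper mirroring the maprs branch of disass_ins.
local function select_pop06(op)
  local rs, rt = field(op, 21, 31), field(op, 16, 31)
  return map_pop06_r6[map_pop06_r6.maprs(rs, rt)]
end

-- Build a word with primary opcode 6 (0x18000000), rs at bit 21, rt at bit 16.
local function pop06(rs, rt)
  return 0x18000000 + rs * 2^21 + rt * 2^16
end
```

So the same primary opcode decodes as blez, blezalc, bgezalc or bgeuc depending only on which of rs and rt are zero or equal.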
> @@ -314,6 +525,8 @@ local function disass_ins(ctx)
>        x = "f"..band(rshift(op, 21), 31)
>      elseif p == "A" then
>        x = band(rshift(op, 6), 31)
> +    elseif p == "a" then
> +      x = band(rshift(op, 6), 7)
>      elseif p == "E" then
>        x = band(rshift(op, 6), 31) + 32
>      elseif p == "M" then
> @@ -333,6 +546,10 @@ local function disass_ins(ctx)
>        x = band(rshift(op, 11), 31) - last + 33
>      elseif p == "I" then
>        x = arshift(lshift(op, 16), 16)
> +    elseif p == "2" then
> +      x = arshift(lshift(op, 13), 11)
> +    elseif p == "3" then
> +      x = arshift(lshift(op, 14), 11)
>      elseif p == "U" then
>        x = band(op, 0xffff)
>      elseif p == "O" then
> @@ -342,7 +559,15 @@ local function disass_ins(ctx)
>        local index = map_gpr[band(rshift(op, 16), 31)]
>        operands[#operands] = format("%s(%s)", index, last)
>      elseif p == "B" then
> -      x = ctx.addr + ctx.pos + arshift(lshift(op, 16), 16)*4 + 4
> +      x = ctx.addr + ctx.pos + arshift(lshift(op, 16), 14) + 4
> +      ctx.rel = x
> +      x = format("0x%08x", x)
> +    elseif p == "b" then
> +      x = ctx.addr + ctx.pos + arshift(lshift(op, 11), 9) + 4
> +      ctx.rel = x
> +      x = format("0x%08x", x)
> +    elseif p == "#" then
> +      x = ctx.addr + ctx.pos + arshift(lshift(op, 6), 4) + 4
>        ctx.rel = x
>        x = format("0x%08x", x)
>      elseif p == "J" then
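
The lshift/arshift pairs in this hunk (and in the "2"/"3" operand cases of the previous hunk) all do the same thing: sign-extend an immediate of some width and scale it to a byte offset. The net shift gives the scale, e.g. lshift 16 followed by arshift 14 is a 16-bit field times 4. A sketch of the equivalent arithmetic is below; pcrel_disp is a hypothetical name and the function is not taken from dis_mips.lua.

```lua
-- Sign-extend the low `bits` bits of op and multiply by `scale`.
local function pcrel_disp(op, bits, scale)
  local v = op % 2^bits
  if v >= 2^(bits - 1) then v = v - 2^bits end
  return v * scale
end

-- Operand letter -> field width and scale, as read off the shift pairs:
--   "B": lshift 16, arshift 14 -> 16 bits, x4
--   "b": lshift 11, arshift 9  -> 21 bits, x4
--   "#": lshift 6,  arshift 4  -> 26 bits, x4
--   "2": lshift 13, arshift 11 -> 19 bits, x4
--   "3": lshift 14, arshift 11 -> 18 bits, x8
local back_one_b = pcrel_disp(0x1000ffff, 16, 4)   -- imm16 = -1
local fwd_three_b = pcrel_disp(0x10000003, 16, 4)  -- imm16 = 3
local back_one_bc = pcrel_disp(0xcbffffff, 26, 4)  -- imm26 = -1
```

For the branch cases the disassembler then adds ctx.addr + ctx.pos + 4, so a displacement of -4 points back at the branch instruction itself.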
> @@ -408,6 +633,7 @@ local function create(code, addr, out)
>    ctx.disass = disass_block
>    ctx.hexdump = 8
>    ctx.get = get_be
> +  ctx.map_pri = map_pri
>    return ctx
>  end
>  
> @@ -417,6 +643,19 @@ local function create_el(code, addr, out)
>    return ctx
>  end
>  
> +local function create_r6(code, addr, out)
> +  local ctx = create(code, addr, out)
> +  ctx.map_pri = map_pri_r6
> +  return ctx
> +end
> +
> +local function create_r6_el(code, addr, out)
> +  local ctx = create(code, addr, out)
> +  ctx.get = get_le
> +  ctx.map_pri = map_pri_r6
> +  return ctx
> +end
> +
>  -- Simple API: disassemble code (a string) at address and output via out.
>  local function disass(code, addr, out)
>    create(code, addr, out):disass()
> @@ -426,6 +665,14 @@ local function disass_el(code, addr, out)
>    create_el(code, addr, out):disass()
>  end
>  
> +local function disass_r6(code, addr, out)
> +  create_r6(code, addr, out):disass()
> +end
> +
> +local function disass_r6_el(code, addr, out)
> +  create_r6_el(code, addr, out):disass()
> +end
> +
>  -- Return register name for RID.
>  local function regname(r)
>    if r < 32 then return map_gpr[r] end
> @@ -436,8 +683,12 @@ end
>  return {
>    create = create,
>    create_el = create_el,
> +  create_r6 = create_r6,
> +  create_r6_el = create_r6_el,
>    disass = disass,
>    disass_el = disass_el,
> +  disass_r6 = disass_r6,
> +  disass_r6_el = disass_r6_el,
>    regname = regname
>  }
>  
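A side note on the new "B", "b" and "#" operand cases above: each one sign-extends a 16-, 21- or 26-bit word offset and scales it to bytes in a single lshift/arshift pair. The same arithmetic can be sketched in C as below. This is only an illustration; the helper names branch_disp and branch_target are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of an instruction word and scale the
** result from words to bytes. Mirrors arshift(lshift(op, 32-bits), 30-bits)
** in the Lua code. Hypothetical helper, not part of the patch. */
static int32_t branch_disp(uint32_t op, int bits)
{
  uint32_t sign = (uint32_t)1 << (bits - 1);
  uint32_t field = op & ((sign << 1) - 1u);
  /* (field ^ sign) - sign maps [sign, 2*sign) onto the negative range. */
  int32_t words = (int32_t)(field ^ sign) - (int32_t)sign;
  return words * 4;
}

/* Branch target as computed for the "B" (16), "b" (21) and "#" (26) cases:
** address of the instruction plus the displacement plus 4. */
static uint32_t branch_target(uint32_t addr, uint32_t pos, uint32_t op, int bits)
{
  return addr + pos + (uint32_t)branch_disp(op, bits) + 4u;
}
```

With bits set to 16 this matches the old `arshift(lshift(op, 16), 16)*4` form; folding the multiplication into the shift count is what allows the 21- and 26-bit variants to reuse the same pattern.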
> diff --git a/src/jit/dis_mips64r6.lua b/src/jit/dis_mips64r6.lua
> new file mode 100644
> index 00000000..023c05ab
> --- /dev/null
> +++ b/src/jit/dis_mips64r6.lua
> @@ -0,0 +1,17 @@
> +----------------------------------------------------------------------------
> +-- LuaJIT MIPS64R6 disassembler wrapper module.
> +--
> +-- Copyright (C) 2005-2017 Mike Pall. All rights reserved.
> +-- Released under the MIT license. See Copyright Notice in luajit.h
> +----------------------------------------------------------------------------
> +-- This module just exports the r6 big-endian functions from the
> +-- MIPS disassembler module. All the interesting stuff is there.
> +------------------------------------------------------------------------------
> +
> +local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips")
> +return {
> +  create = dis_mips.create_r6,
> +  disass = dis_mips.disass_r6,
> +  regname = dis_mips.regname
> +}
> +
> diff --git a/src/jit/dis_mips64r6el.lua b/src/jit/dis_mips64r6el.lua
> new file mode 100644
> index 00000000..f2988339
> --- /dev/null
> +++ b/src/jit/dis_mips64r6el.lua
> @@ -0,0 +1,17 @@
> +----------------------------------------------------------------------------
> +-- LuaJIT MIPS64R6EL disassembler wrapper module.
> +--
> +-- Copyright (C) 2005-2017 Mike Pall. All rights reserved.
> +-- Released under the MIT license. See Copyright Notice in luajit.h
> +----------------------------------------------------------------------------
> +-- This module just exports the r6 little-endian functions from the
> +-- MIPS disassembler module. All the interesting stuff is there.
> +------------------------------------------------------------------------------
> +
> +local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips")
> +return {
> +  create = dis_mips.create_r6_el,
> +  disass = dis_mips.disass_r6_el,
> +  regname = dis_mips.regname
> +}
> +
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 0351e046..cf31a291 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -342,18 +342,38 @@
>  #elif LUAJIT_TARGET == LUAJIT_ARCH_MIPS32 || LUAJIT_TARGET == LUAJIT_ARCH_MIPS64
>  
>  #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL)
> +#if __mips_isa_rev >= 6
> +#define LJ_TARGET_MIPSR6	1
> +#define LJ_TARGET_UNALIGNED	1
> +#endif
>  #if LUAJIT_TARGET == LUAJIT_ARCH_MIPS32
> +#if LJ_TARGET_MIPSR6
> +#define LJ_ARCH_NAME		"mips32r6el"
> +#else
>  #define LJ_ARCH_NAME		"mipsel"
> +#endif
> +#else
> +#if LJ_TARGET_MIPSR6
> +#define LJ_ARCH_NAME		"mips64r6el"
>  #else
>  #define LJ_ARCH_NAME		"mips64el"
>  #endif
> +#endif
>  #define LJ_ARCH_ENDIAN		LUAJIT_LE
>  #else
>  #if LUAJIT_TARGET == LUAJIT_ARCH_MIPS32
> +#if LJ_TARGET_MIPSR6
> +#define LJ_ARCH_NAME		"mips32r6"
> +#else
>  #define LJ_ARCH_NAME		"mips"
> +#endif
> +#else
> +#if LJ_TARGET_MIPSR6
> +#define LJ_ARCH_NAME		"mips64r6"
>  #else
>  #define LJ_ARCH_NAME		"mips64"
>  #endif
> +#endif
>  #define LJ_ARCH_ENDIAN		LUAJIT_BE
>  #endif
>  
> @@ -390,7 +410,9 @@
>  #define LJ_TARGET_UNIFYROT	2	/* Want only IR_BROR. */
>  #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
>  
> -#if _MIPS_ARCH_MIPS32R2 || _MIPS_ARCH_MIPS64R2
> +#if LJ_TARGET_MIPSR6
> +#define LJ_ARCH_VERSION		60
> +#elif _MIPS_ARCH_MIPS32R2 || _MIPS_ARCH_MIPS64R2
>  #define LJ_ARCH_VERSION		20
>  #else
>  #define LJ_ARCH_VERSION		10
> @@ -472,8 +494,13 @@
>  #if !((defined(_MIPS_SIM_ABI32) && _MIPS_SIM == _MIPS_SIM_ABI32) || (defined(_ABIO32) && _MIPS_SIM == _ABIO32))
>  #error "Only o32 ABI supported for MIPS32"
>  #endif
> +#if LJ_TARGET_MIPSR6
> +/* Not that useful, since most available r6 CPUs are 64 bit. */
> +#error "No support for MIPS32R6"
> +#endif
>  #elif LJ_TARGET_MIPS64
>  #if !((defined(_MIPS_SIM_ABI64) && _MIPS_SIM == _MIPS_SIM_ABI64) || (defined(_ABI64) && _MIPS_SIM == _ABI64))
> +/* MIPS32ON64 aka n32 ABI support might be desirable, but difficult. */
>  #error "Only n64 ABI supported for MIPS64"
>  #endif
>  #endif
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 25b96264..96b8c032 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -2159,8 +2159,8 @@ static void asm_setup_regsp(ASMState *as)
>  	  ir->prev = REGSP_HINT(RID_FPRET);
>  	  continue;
>  	}
> -	/* fallthrough */
>  #endif
> +      /* fallthrough */
>        case IR_CALLN: case IR_CALLXS:
>  #if LJ_SOFTFP
>        case IR_MIN: case IR_MAX:
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 23ffc3aa..4626507b 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -101,7 +101,12 @@ static void asm_guard(ASMState *as, MIPSIns mi, Reg rs, Reg rt)
>      as->invmcp = NULL;
>      as->loopinv = 1;
>      as->mcp = p+1;
> +#if !LJ_TARGET_MIPSR6
>      mi = mi ^ ((mi>>28) == 1 ? 0x04000000u : 0x00010000u);  /* Invert cond. */
> +#else
> +    mi = mi ^ ((mi>>28) == 1 ? 0x04000000u :
> +	       (mi>>28) == 4 ? 0x00800000u : 0x00010000u);  /* Invert cond. */
> +#endif
>      target = p;  /* Patch target later in asm_loop_fixup. */
>    }
>    emit_ti(as, MIPSI_LI, RID_TMP, as->snapno);
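For reference, the condition inversion in the R6 branch of asm_guard() flips a single bit whose position depends on the top nibble of the opcode. A minimal sketch of that selection follows. The BC1EQZ/BC1NEZ values are taken from lj_target_mips.h in this patch; the BEQ/BNE and BLTZ/BGEZ values are the standard MIPS encodings and are assumed here. The function name is hypothetical.

```c
#include <stdint.h>

#define OP_BEQ     0x10000000u
#define OP_BNE     0x14000000u  /* Standard encoding, assumed. */
#define OP_BLTZ    0x04000000u
#define OP_BGEZ    0x04010000u  /* Standard encoding, assumed. */
#define OP_BC1EQZ  0x45200000u
#define OP_BC1NEZ  0x45a00000u

/* Invert a branch condition the way the R6 variant of asm_guard() does.
** Hypothetical stand-alone helper for illustration only. */
static uint32_t invert_cond_r6(uint32_t mi)
{
  uint32_t top = mi >> 28;
  return mi ^ (top == 1 ? 0x04000000u :   /* BEQ <-> BNE */
               top == 4 ? 0x00800000u :   /* BC1EQZ <-> BC1NEZ */
                          0x00010000u);   /* BLTZ <-> BGEZ */
}
```

The pre-R6 BC1F/BC1T pair also has 4 in the top nibble but differs in bit 16, which would explain why the two variants are kept under separate #if branches.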
> @@ -410,7 +415,11 @@ static void asm_callround(ASMState *as, IRIns *ir, IRCallID id)
>  {
>    /* The modified regs must match with the *.dasc implementation. */
>    RegSet drop = RID2RSET(RID_R1)|RID2RSET(RID_R12)|RID2RSET(RID_FPRET)|
> -		RID2RSET(RID_F2)|RID2RSET(RID_F4)|RID2RSET(REGARG_FIRSTFPR);
> +		RID2RSET(RID_F2)|RID2RSET(RID_F4)|RID2RSET(REGARG_FIRSTFPR)
> +#if LJ_TARGET_MIPSR6
> +		|RID2RSET(RID_F21)
> +#endif
> +		;
>    if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
>    ra_evictset(as, drop);
>    ra_destreg(as, ir, RID_FPRET);
> @@ -444,8 +453,13 @@ static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
>  {
>    Reg tmp = ra_scratch(as, rset_exclude(RSET_FPR, left));
>    Reg dest = ra_dest(as, ir, RSET_GPR);
> +#if !LJ_TARGET_MIPSR6
>    asm_guard(as, MIPSI_BC1F, 0, 0);
>    emit_fgh(as, MIPSI_C_EQ_D, 0, tmp, left);
> +#else
> +  asm_guard(as, MIPSI_BC1EQZ, 0, (tmp&31));
> +  emit_fgh(as, MIPSI_CMP_EQ_D, tmp, tmp, left);
> +#endif
>    emit_fg(as, MIPSI_CVT_D_W, tmp, tmp);
>    emit_tg(as, MIPSI_MFC1, dest, tmp);
>    emit_fg(as, MIPSI_CVT_W_D, tmp, left);
> @@ -599,8 +613,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
>  		     (void *)&as->J->k64[LJ_K64_M2P64],
>  		     rset_exclude(RSET_GPR, dest));
>  	  emit_fg(as, MIPSI_TRUNC_L_D, tmp, left);  /* Delay slot. */
> -	  emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
> -	  emit_fgh(as, MIPSI_C_OLT_D, 0, left, tmp);
> +#if !LJ_TARGET_MIPSR6
> +	 emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
> +	 emit_fgh(as, MIPSI_C_OLT_D, 0, left, tmp);
> +#else
> +	 emit_branch(as, MIPSI_BC1NEZ, 0, (left&31), l_end);
> +	 emit_fgh(as, MIPSI_CMP_LT_D, left, left, tmp);
> +#endif
>  	  emit_lsptr(as, MIPSI_LDC1, (tmp & 31),
>  		     (void *)&as->J->k64[LJ_K64_2P63],
>  		     rset_exclude(RSET_GPR, dest));
> @@ -611,8 +630,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
>  		     (void *)&as->J->k32[LJ_K32_M2P64],
>  		     rset_exclude(RSET_GPR, dest));
>  	  emit_fg(as, MIPSI_TRUNC_L_S, tmp, left);  /* Delay slot. */
> -	  emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
> -	  emit_fgh(as, MIPSI_C_OLT_S, 0, left, tmp);
> +#if !LJ_TARGET_MIPSR6
> +	 emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
> +	 emit_fgh(as, MIPSI_C_OLT_S, 0, left, tmp);
> +#else
> +	 emit_branch(as, MIPSI_BC1NEZ, 0, (left&31), l_end);
> +	 emit_fgh(as, MIPSI_CMP_LT_S, left, left, tmp);
> +#endif
>  	  emit_lsptr(as, MIPSI_LWC1, (tmp & 31),
>  		     (void *)&as->J->k32[LJ_K32_2P63],
>  		     rset_exclude(RSET_GPR, dest));
> @@ -840,8 +864,12 @@ static void asm_aref(ASMState *as, IRIns *ir)
>    }
>    base = ra_alloc1(as, ir->op1, RSET_GPR);
>    idx = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, base));
> +#if !LJ_TARGET_MIPSR6
>    emit_dst(as, MIPSI_AADDU, dest, RID_TMP, base);
>    emit_dta(as, MIPSI_SLL, RID_TMP, idx, 3);
> +#else
> +  emit_dst(as, MIPSI_ALSA | MIPSF_A(3-1), dest, idx, base);
> +#endif
>  }
>  
>  /* Inlined hash lookup. Specialized for key type and for const keys.
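A note on `MIPSF_A(3-1)` in the asm_aref() hunk: as far as I understand the R6 LSA/DLSA instructions, the shift field holds the shift amount minus one, so a shift by 3 (8-byte slots) is encoded as 2. The sketch below models that under this assumption; dlsa and aref_addr are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Model of DLSA rd, rs, rt, sa under the assumption stated above:
** rd = (rs << (sa + 1)) + rt, with sa being the encoded field value. */
static uint64_t dlsa(uint64_t rs, uint64_t rt, unsigned sa_field)
{
  return (rs << (sa_field + 1)) + rt;
}

/* Array slot address: one DLSA replaces the pre-R6 SLL + AADDU pair. */
static uint64_t aref_addr(uint64_t base, uint64_t idx)
{
  return dlsa(idx, base, 3 - 1);
}
```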
> @@ -944,8 +972,13 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>      l_end = asm_exitstub_addr(as);
>    }
>    if (!LJ_SOFTFP && irt_isnum(kt)) {
> +#if !LJ_TARGET_MIPSR6
>      emit_branch(as, MIPSI_BC1T, 0, 0, l_end);
>      emit_fgh(as, MIPSI_C_EQ_D, 0, tmpnum, key);
> +#else
> +    emit_branch(as, MIPSI_BC1NEZ, 0, (tmpnum&31), l_end);
> +    emit_fgh(as, MIPSI_CMP_EQ_D, tmpnum, tmpnum, key);
> +#endif
>      *--as->mcp = MIPSI_NOP;  /* Avoid NaN comparison overhead. */
>      emit_branch(as, MIPSI_BEQ, tmp1, RID_ZERO, l_next);
>      emit_tsi(as, MIPSI_SLTIU, tmp1, tmp1, (int32_t)LJ_TISNUM);
> @@ -1196,7 +1229,9 @@ static MIPSIns asm_fxloadins(IRIns *ir)
>    case IRT_I16: return MIPSI_LH;
>    case IRT_U16: return MIPSI_LHU;
>    case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
> +  /* fallthrough */
>    case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
> +  /* fallthrough */
>    default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
>    }
>  }
> @@ -1207,7 +1242,9 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
>    case IRT_I8: case IRT_U8: return MIPSI_SB;
>    case IRT_I16: case IRT_U16: return MIPSI_SH;
>    case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
> +  /* fallthrough */
>    case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
> +  /* fallthrough */
>    default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
>    }
>  }
> @@ -1253,7 +1290,7 @@ static void asm_xload(ASMState *as, IRIns *ir)
>  {
>    Reg dest = ra_dest(as, ir,
>      (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
> -  lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
> +  lua_assert(LJ_TARGET_UNALIGNED || !(ir->op2 & IRXLOAD_UNALIGNED));
>    asm_fusexref(as, asm_fxloadins(ir), dest, ir->op1, RSET_GPR, 0);
>  }
>  
> @@ -1545,7 +1582,7 @@ static void asm_cnew(ASMState *as, IRIns *ir)
>        ofs -= 4; if (LJ_BE) ir++; else ir--;
>      }
>  #else
> -    emit_tsi(as, MIPSI_SD, ra_alloc1(as, ir->op2, allow),
> +    emit_tsi(as, sz == 8 ? MIPSI_SD : MIPSI_SW, ra_alloc1(as, ir->op2, allow),
>  	     RID_RET, sizeof(GCcdata));
>  #endif
>      lua_assert(sz == 4 || sz == 8);
> @@ -1678,6 +1715,7 @@ static void asm_add(ASMState *as, IRIns *ir)
>    } else
>  #endif
>    {
> +    /* TODO MIPSR6: Fuse ADD(BSHL(a,1-4),b) or ADD(ADD(a,a),b) to MIPSI_ALSA. */
>      Reg dest = ra_dest(as, ir, RSET_GPR);
>      Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
>      if (irref_isk(ir->op2)) {
> @@ -1722,8 +1760,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
>      Reg right, left = ra_alloc2(as, ir, RSET_GPR);
>      right = (left >> 8); left &= 255;
>      if (LJ_64 && irt_is64(ir->t)) {
> +#if !LJ_TARGET_MIPSR6
>        emit_dst(as, MIPSI_MFLO, dest, 0, 0);
>        emit_dst(as, MIPSI_DMULT, 0, left, right);
> +#else
> +      emit_dst(as, MIPSI_DMUL, dest, left, right);
> +#endif
>      } else {
>        emit_dst(as, MIPSI_MUL, dest, left, right);
>      }
> @@ -1806,6 +1848,7 @@ static void asm_abs(ASMState *as, IRIns *ir)
>  
>  static void asm_arithov(ASMState *as, IRIns *ir)
>  {
> +  /* TODO MIPSR6: bovc/bnvc. Caveat: no delay slot to load RID_TMP. */
>    Reg right, left, tmp, dest = ra_dest(as, ir, RSET_GPR);
>    lua_assert(!irt_is64(ir->t));
>    if (irref_isk(ir->op2)) {
> @@ -1850,9 +1893,14 @@ static void asm_mulov(ASMState *as, IRIns *ir)
>  						 right), dest));
>    asm_guard(as, MIPSI_BNE, RID_TMP, tmp);
>    emit_dta(as, MIPSI_SRA, RID_TMP, dest, 31);
> +#if !LJ_TARGET_MIPSR6
>    emit_dst(as, MIPSI_MFHI, tmp, 0, 0);
>    emit_dst(as, MIPSI_MFLO, dest, 0, 0);
>    emit_dst(as, MIPSI_MULT, 0, left, right);
> +#else
> +  emit_dst(as, MIPSI_MUL, dest, left, right);
> +  emit_dst(as, MIPSI_MUH, tmp, left, right);
> +#endif
>  }
>  
>  #if LJ_32 && LJ_HASFFI
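The asm_mulov() hunk keeps the same overflow guard for both variants: the high word of the product must equal the sign-extension of the low word (SRA dest, 31). On R6 the two halves come from MUH and MUL instead of MFHI and MFLO. A rough C model of the check is shown below; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if a*b does not fit in 32 bits, using the same test as the
** guard: high word != (low word >> 31, arithmetic). Illustration only. */
static int mul_overflows_i32(int32_t a, int32_t b)
{
  uint64_t full = (uint64_t)((int64_t)a * (int64_t)b);
  uint32_t lo = (uint32_t)full;                        /* MUL */
  uint32_t hi = (uint32_t)(full >> 32);                /* MUH */
  uint32_t sign = (lo & 0x80000000u) ? 0xffffffffu : 0u;  /* SRA lo, 31 */
  return hi != sign;                                   /* BNE guard */
}
```

Since the backend emits machine code backwards, the MUH that appears last in the source is the first of the two to execute.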
> @@ -2076,6 +2124,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>      Reg dest = ra_dest(as, ir, RSET_FPR);
>      Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>      right = (left >> 8); left &= 255;
> +#if !LJ_TARGET_MIPSR6
>      if (dest == left) {
>        emit_fg(as, MIPSI_MOVT_D, dest, right);
>      } else {
> @@ -2083,19 +2132,37 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>        if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
>      }
>      emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
> +#else
> +    emit_fgh(as, ismax ? MIPSI_MAX_D : MIPSI_MIN_D, dest, left, right);
> +#endif
>  #endif
>    } else {
>      Reg dest = ra_dest(as, ir, RSET_GPR);
>      Reg right, left = ra_alloc2(as, ir, RSET_GPR);
>      right = (left >> 8); left &= 255;
> -    if (dest == left) {
> -      emit_dst(as, MIPSI_MOVN, dest, right, RID_TMP);
> +    if (left == right) {
> +      if (dest != left) emit_move(as, dest, left);
>      } else {
> -      emit_dst(as, MIPSI_MOVZ, dest, left, RID_TMP);
> -      if (dest != right) emit_move(as, dest, right);
> +#if !LJ_TARGET_MIPSR6
> +      if (dest == left) {
> +	emit_dst(as, MIPSI_MOVN, dest, right, RID_TMP);
> +      } else {
> +	emit_dst(as, MIPSI_MOVZ, dest, left, RID_TMP);
> +	if (dest != right) emit_move(as, dest, right);
> +      }
> +#else
> +      emit_dst(as, MIPSI_OR, dest, dest, RID_TMP);
> +      if (dest != right) {
> +	emit_dst(as, MIPSI_SELNEZ, RID_TMP, right, RID_TMP);
> +	emit_dst(as, MIPSI_SELEQZ, dest, left, RID_TMP);
> +      } else {
> +	emit_dst(as, MIPSI_SELEQZ, RID_TMP, left, RID_TMP);
> +	emit_dst(as, MIPSI_SELNEZ, dest, right, RID_TMP);
> +      }
> +#endif
> +      emit_dst(as, MIPSI_SLT, RID_TMP,
> +	       ismax ? left : right, ismax ? right : left);
>      }
> -    emit_dst(as, MIPSI_SLT, RID_TMP,
> -	     ismax ? left : right, ismax ? right : left);
>    }
>  }
>  
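R6 drops MOVN/MOVZ, so the integer path of asm_min_max() (and many places in vm_mips64.dasc further down) builds a conditional move from SELEQZ, SELNEZ and OR. The sketch below models the `dest != right` case in execution order, which is the reverse of the emit order in the hunk. The semantics of the two select instructions are written down as I understand them; all function names are hypothetical.

```c
#include <stdint.h>

/* seleqz rd, rs, rt: rd = (rt == 0) ? rs : 0 */
static int64_t seleqz(int64_t rs, int64_t rt) { return rt == 0 ? rs : 0; }
/* selnez rd, rs, rt: rd = (rt != 0) ? rs : 0 */
static int64_t selnez(int64_t rs, int64_t rt) { return rt != 0 ? rs : 0; }

/* Integer min/max via selects, in execution order. Illustration only. */
static int64_t minmax_r6(int64_t left, int64_t right, int ismax)
{
  int64_t tmp = ismax ? (left < right) : (right < left);  /* SLT */
  int64_t dest = seleqz(left, tmp);   /* Keep left if the test failed. */
  tmp = selnez(right, tmp);           /* Keep right if the test held. */
  return dest | tmp;                  /* Exactly one side is non-zero. */
}
```

Exactly one of the two selects yields its source operand and the other yields zero, so the final OR merges them without any further masking.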
> @@ -2179,10 +2246,18 @@ static void asm_comp(ASMState *as, IRIns *ir)
>  #if LJ_SOFTFP
>      asm_sfpcomp(as, ir);
>  #else
> +#if !LJ_TARGET_MIPSR6
>      Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>      right = (left >> 8); left &= 255;
>      asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
>      emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
> +#else
> +    Reg tmp, right, left = ra_alloc2(as, ir, RSET_FPR);
> +    right = (left >> 8); left &= 255;
> +    tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_FPR, left), right));
> +    asm_guard(as, (op&1) ? MIPSI_BC1NEZ : MIPSI_BC1EQZ, 0, (tmp&31));
> +    emit_fgh(as, MIPSI_CMP_LT_D + ((op&3) ^ ((op>>2)&1)), tmp, left, right);
> +#endif
>  #endif
>    } else {
>      Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
> @@ -2218,9 +2293,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
>    if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
>  #if LJ_SOFTFP
>      asm_sfpcomp(as, ir);
> -#else
> +#elif !LJ_TARGET_MIPSR6
>      asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
>      emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
> +#else
> +    Reg tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_FPR, left), right));
> +    asm_guard(as, (ir->o & 1) ? MIPSI_BC1NEZ : MIPSI_BC1EQZ, 0, (tmp&31));
> +    emit_fgh(as, MIPSI_CMP_EQ_D, tmp, left, right);
>  #endif
>    } else {
>      asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
> @@ -2623,7 +2702,12 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
>        if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
>  	  ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
>  	   (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
> -	   (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
> +#if !LJ_TARGET_MIPSR6
> +	   (p[-1] & 0xffe00000u) == MIPSI_BC1F
> +#else
> +	   (p[-1] & 0xff600000u) == MIPSI_BC1EQZ
> +#endif
> +	  )) {
>  	ptrdiff_t delta = target - p;
>  	if (((delta + 0x8000) >> 16) == 0) {  /* Patch in-range branch. */
>  	patchbranch:
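About the 0xff600000u mask in lj_asm_patchexit(): it leaves out bit 23, which is the only bit that differs between BC1EQZ and BC1NEZ, so one comparison catches both. A small check of that reading, with a hypothetical helper name and the opcode values from lj_target_mips.h in this patch:

```c
#include <stdint.h>

/* Matches both R6 FP compare-and-branch forms, ignoring the ft register
** field and the 16-bit offset. Illustration only. */
static int is_bc1_r6(uint32_t ins)
{
  return (ins & 0xff600000u) == 0x45200000u;  /* MIPSI_BC1EQZ */
}
```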
> diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
> index bb6593ae..313d030a 100644
> --- a/src/lj_emit_mips.h
> +++ b/src/lj_emit_mips.h
> @@ -138,6 +138,7 @@ static void emit_loadu64(ASMState *as, Reg r, uint64_t u64)
>      } else if (emit_kdelta1(as, r, (intptr_t)u64)) {
>        return;
>      } else {
> +      /* TODO MIPSR6: Use DAHI & DATI. Caveat: sign-extension. */
>        if ((u64 & 0xffff)) {
>  	emit_tsi(as, MIPSI_ORI, r, r, u64 & 0xffff);
>        }
> @@ -236,10 +237,22 @@ static void emit_jmp(ASMState *as, MCode *target)
>  static void emit_call(ASMState *as, void *target, int needcfa)
>  {
>    MCode *p = as->mcp;
> -  *--p = MIPSI_NOP;
> +#if LJ_TARGET_MIPSR6
> +  ptrdiff_t delta = (char *)target - (char *)p;
> +  if ((((delta>>2) + 0x02000000) >> 26) == 0) {  /* Try compact call first. */
> +    *--p = MIPSI_BALC | (((uintptr_t)delta >>2) & 0x03ffffffu);
> +    as->mcp = p;
> +    return;
> +  }
> +#endif
> +  *--p = MIPSI_NOP;  /* Delay slot. */
>    if ((((uintptr_t)target ^ (uintptr_t)p) >> 28) == 0) {
> +#if !LJ_TARGET_MIPSR6
>      *--p = (((uintptr_t)target & 1) ? MIPSI_JALX : MIPSI_JAL) |
>  	   (((uintptr_t)target >>2) & 0x03ffffffu);
> +#else
> +    *--p = MIPSI_JAL | (((uintptr_t)target >>2) & 0x03ffffffu);
> +#endif
>    } else {  /* Target out of range: need indirect call. */
>      *--p = MIPSI_JALR | MIPSF_S(RID_CFUNCADDR);
>      needcfa = 1;
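The compact-call path in emit_call() first checks that the word delta fits into the signed 26-bit field of BALC, using the usual bias-and-shift trick. A sketch of that check and of the encoding is given below. Helper names are hypothetical, and the delta is assumed to be a multiple of 4.

```c
#include <stdint.h>

/* True if delta_bytes/4 fits in a signed 26-bit field. Adding 2^25 moves
** the valid range [-2^25, 2^25) onto [0, 2^26), so any bit at position 26
** or above means "out of range". */
static int balc_in_range(int64_t delta_bytes)
{
  int64_t words = delta_bytes / 4;
  return ((uint64_t)(words + 0x02000000) >> 26) == 0;
}

/* Encode BALC with the low 26 bits of the word delta. */
static uint32_t encode_balc(int64_t delta_bytes)
{
  return 0xe8000000u | ((uint32_t)((uint64_t)delta_bytes >> 2) & 0x03ffffffu);
}
```

In the patch the delta is taken relative to `p` before the decrement, i.e. the address right after the BALC instruction, which matches how R6 compact branches are usually described (offset relative to PC+4).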
> diff --git a/src/lj_jit.h b/src/lj_jit.h
> index c06829ab..a8b6f9a7 100644
> --- a/src/lj_jit.h
> +++ b/src/lj_jit.h
> @@ -51,10 +51,18 @@
>  /* Names for the CPU-specific flags. Must match the order above. */
>  #define JIT_F_CPU_FIRST		JIT_F_MIPSXXR2
>  #if LJ_TARGET_MIPS32
> +#if LJ_TARGET_MIPSR6
> +#define JIT_F_CPUSTRING		"\010MIPS32R6"
> +#else
>  #define JIT_F_CPUSTRING		"\010MIPS32R2"
> +#endif
> +#else
> +#if LJ_TARGET_MIPSR6
> +#define JIT_F_CPUSTRING		"\010MIPS64R6"
>  #else
>  #define JIT_F_CPUSTRING		"\010MIPS64R2"
>  #endif
> +#endif
>  #else
>  #define JIT_F_CPU_FIRST		0
>  #define JIT_F_CPUSTRING		""
> diff --git a/src/lj_target_mips.h b/src/lj_target_mips.h
> index 740687b3..84db6012 100644
> --- a/src/lj_target_mips.h
> +++ b/src/lj_target_mips.h
> @@ -223,6 +223,8 @@ typedef enum MIPSIns {
>    MIPSI_ADDIU = 0x24000000,
>    MIPSI_SUB = 0x00000022,
>    MIPSI_SUBU = 0x00000023,
> +
> +#if !LJ_TARGET_MIPSR6
>    MIPSI_MUL = 0x70000002,
>    MIPSI_DIV = 0x0000001a,
>    MIPSI_DIVU = 0x0000001b,
> @@ -232,6 +234,15 @@ typedef enum MIPSIns {
>    MIPSI_MFHI = 0x00000010,
>    MIPSI_MFLO = 0x00000012,
>    MIPSI_MULT = 0x00000018,
> +#else
> +  MIPSI_MUL = 0x00000098,
> +  MIPSI_MUH = 0x000000d8,
> +  MIPSI_DIV = 0x0000009a,
> +  MIPSI_DIVU = 0x0000009b,
> +
> +  MIPSI_SELEQZ = 0x00000035,
> +  MIPSI_SELNEZ = 0x00000037,
> +#endif
>  
>    MIPSI_SLL = 0x00000000,
>    MIPSI_SRL = 0x00000002,
> @@ -253,8 +264,13 @@ typedef enum MIPSIns {
>    MIPSI_B = 0x10000000,
>    MIPSI_J = 0x08000000,
>    MIPSI_JAL = 0x0c000000,
> +#if !LJ_TARGET_MIPSR6
>    MIPSI_JALX = 0x74000000,
>    MIPSI_JR = 0x00000008,
> +#else
> +  MIPSI_JR = 0x00000009,
> +  MIPSI_BALC = 0xe8000000,
> +#endif
>    MIPSI_JALR = 0x0000f809,
>  
>    MIPSI_BEQ = 0x10000000,
> @@ -282,15 +298,23 @@ typedef enum MIPSIns {
>  
>    /* MIPS64 instructions. */
>    MIPSI_DADD = 0x0000002c,
> -  MIPSI_DADDI = 0x60000000,
>    MIPSI_DADDU = 0x0000002d,
>    MIPSI_DADDIU = 0x64000000,
>    MIPSI_DSUB = 0x0000002e,
>    MIPSI_DSUBU = 0x0000002f,
> +#if !LJ_TARGET_MIPSR6
>    MIPSI_DDIV = 0x0000001e,
>    MIPSI_DDIVU = 0x0000001f,
>    MIPSI_DMULT = 0x0000001c,
>    MIPSI_DMULTU = 0x0000001d,
> +#else
> +  MIPSI_DDIV = 0x0000009e,
> +  MIPSI_DMOD = 0x000000de,
> +  MIPSI_DDIVU = 0x0000009f,
> +  MIPSI_DMODU = 0x000000df,
> +  MIPSI_DMUL = 0x0000009c,
> +  MIPSI_DMUH = 0x000000dc,
> +#endif
>  
>    MIPSI_DSLL = 0x00000038,
>    MIPSI_DSRL = 0x0000003a,
> @@ -308,6 +332,11 @@ typedef enum MIPSIns {
>    MIPSI_ASUBU = LJ_32 ? MIPSI_SUBU : MIPSI_DSUBU,
>    MIPSI_AL = LJ_32 ? MIPSI_LW : MIPSI_LD,
>    MIPSI_AS = LJ_32 ? MIPSI_SW : MIPSI_SD,
> +#if LJ_TARGET_MIPSR6
> +  MIPSI_LSA = 0x00000005,
> +  MIPSI_DLSA = 0x00000015,
> +  MIPSI_ALSA = LJ_32 ? MIPSI_LSA : MIPSI_DLSA,
> +#endif
>  
>    /* Extract/insert instructions. */
>    MIPSI_DEXTM = 0x7c000001,
> @@ -317,18 +346,19 @@ typedef enum MIPSIns {
>    MIPSI_DINSU = 0x7c000006,
>    MIPSI_DINS = 0x7c000007,
>  
> -  MIPSI_RINT_D = 0x4620001a,
> -  MIPSI_RINT_S = 0x4600001a,
> -  MIPSI_RINT = 0x4400001a,
>    MIPSI_FLOOR_D = 0x4620000b,
> -  MIPSI_CEIL_D = 0x4620000a,
> -  MIPSI_ROUND_D = 0x46200008,
>  
>    /* FP instructions. */
>    MIPSI_MOV_S = 0x46000006,
>    MIPSI_MOV_D = 0x46200006,
> +#if !LJ_TARGET_MIPSR6
>    MIPSI_MOVT_D = 0x46210011,
>    MIPSI_MOVF_D = 0x46200011,
> +#else
> +  MIPSI_MIN_D = 0x4620001C,
> +  MIPSI_MAX_D = 0x4620001E,
> +  MIPSI_SEL_D = 0x46200010,
> +#endif
>  
>    MIPSI_ABS_D = 0x46200005,
>    MIPSI_NEG_D = 0x46200007,
> @@ -363,15 +393,23 @@ typedef enum MIPSIns {
>    MIPSI_DMTC1 = 0x44a00000,
>    MIPSI_DMFC1 = 0x44200000,
>  
> +#if !LJ_TARGET_MIPSR6
>    MIPSI_BC1F = 0x45000000,
>    MIPSI_BC1T = 0x45010000,
> -
>    MIPSI_C_EQ_D = 0x46200032,
>    MIPSI_C_OLT_S = 0x46000034,
>    MIPSI_C_OLT_D = 0x46200034,
>    MIPSI_C_ULT_D = 0x46200035,
>    MIPSI_C_OLE_D = 0x46200036,
>    MIPSI_C_ULE_D = 0x46200037,
> +#else
> +  MIPSI_BC1EQZ = 0x45200000,
> +  MIPSI_BC1NEZ = 0x45a00000,
> +  MIPSI_CMP_EQ_D = 0x46a00002,
> +  MIPSI_CMP_LT_S = 0x46800004,
> +  MIPSI_CMP_LT_D = 0x46a00004,
> +#endif
> +
>  } MIPSIns;
>  
>  #endif
> diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
> index 9839b5ac..44fba36c 100644
> --- a/src/vm_mips64.dasc
> +++ b/src/vm_mips64.dasc
> @@ -83,6 +83,10 @@
>  |
>  |.define FRET1,		f0
>  |.define FRET2,		f2
> +|
> +|.define FTMP0,		f20
> +|.define FTMP1,		f21
> +|.define FTMP2,		f22
>  |.endif
>  |
>  |// Stack layout while in interpreter. Must match with lj_frame.h.
> @@ -310,10 +314,10 @@
>  |.endmacro
>  |
>  |// Assumes DISPATCH is relative to GL.
> -#define DISPATCH_GL(field)      (GG_DISP2G + (int)offsetof(global_State, field))
> -#define DISPATCH_J(field)       (GG_DISP2J + (int)offsetof(jit_State, field))
> -#define GG_DISP2GOT             (GG_OFS(got) - GG_OFS(dispatch))
> -#define DISPATCH_GOT(name)      (GG_DISP2GOT + sizeof(void*)*LJ_GOT_##name)
> +#define DISPATCH_GL(field)	(GG_DISP2G + (int)offsetof(global_State, field))
> +#define DISPATCH_J(field)	(GG_DISP2J + (int)offsetof(jit_State, field))
> +#define GG_DISP2GOT		(GG_OFS(got) - GG_OFS(dispatch))
> +#define DISPATCH_GOT(name)	(GG_DISP2GOT + sizeof(void*)*LJ_GOT_##name)
>  |
>  #define PC2PROTO(field)  ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
>  |
> @@ -492,8 +496,15 @@ static void build_subroutines(BuildCtx *ctx)
>    |7:  // Less results wanted.
>    |  subu TMP0, RD, TMP2
>    |  dsubu TMP0, BASE, TMP0		// Either keep top or shrink it.
> +  |.if MIPSR6
> +  |  selnez TMP0, TMP0, TMP2		// LUA_MULTRET+1 case?
> +  |  seleqz BASE, BASE, TMP2
> +  |  b <3
> +  |.  or BASE, BASE, TMP0
> +  |.else
>    |  b <3
>    |.  movn BASE, TMP0, TMP2		// LUA_MULTRET+1 case?
> +  |.endif
>    |
>    |8:  // Corner case: need to grow stack for filling up results.
>    |  // This can happen if:
> @@ -1125,11 +1136,16 @@ static void build_subroutines(BuildCtx *ctx)
>    |.endmacro
>    |
>    |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1 and has delay slot!
> +  |// MIPSR6: no delay slot, but a forbidden slot.
>    |.macro ffgccheck
>    |  ld TMP0, DISPATCH_GL(gc.total)(DISPATCH)
>    |  ld TMP1, DISPATCH_GL(gc.threshold)(DISPATCH)
>    |  dsubu AT, TMP0, TMP1
> +  |.if MIPSR6
> +  |  bgezalc AT, ->fff_gcstep
> +  |.else
>    |  bgezal AT, ->fff_gcstep
> +  |.endif
>    |.endmacro
>    |
>    |//-- Base library: checks -----------------------------------------------
> @@ -1157,7 +1173,13 @@ static void build_subroutines(BuildCtx *ctx)
>    |  sltu TMP1, TISNUM, TMP0
>    |  not TMP2, TMP0
>    |  li TMP3, ~LJ_TISNUM
> +  |.if MIPSR6
> +  |  selnez TMP2, TMP2, TMP1
> +  |  seleqz TMP3, TMP3, TMP1
> +  |  or TMP2, TMP2, TMP3
> +  |.else
>    |  movz TMP2, TMP3, TMP1
> +  |.endif
>    |  dsll TMP2, TMP2, 3
>    |  daddu TMP2, CFUNC:RB, TMP2
>    |  b ->fff_restv
> @@ -1169,7 +1191,11 @@ static void build_subroutines(BuildCtx *ctx)
>    |  gettp TMP2, CARG1
>    |  daddiu TMP0, TMP2, -LJ_TTAB
>    |  daddiu TMP1, TMP2, -LJ_TUDATA
> +  |.if MIPSR6
> +  |  selnez TMP0, TMP1, TMP0
> +  |.else
>    |  movn TMP0, TMP1, TMP0
> +  |.endif
>    |  bnez TMP0, >6
>    |.  cleartp TAB:CARG1
>    |1:  // Field metatable must be at same offset for GCtab and GCudata!
> @@ -1208,7 +1234,13 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |6:
>    |  sltiu AT, TMP2, LJ_TISNUM
> +  |.if MIPSR6
> +  |  selnez TMP0, TISNUM, AT
> +  |  seleqz AT, TMP2, AT
> +  |  or TMP2, TMP0, AT
> +  |.else
>    |  movn TMP2, TISNUM, AT
> +  |.endif
>    |  dsll TMP2, TMP2, 3
>    |   dsubu TMP0, DISPATCH, TMP2
>    |  b <2
> @@ -1270,8 +1302,13 @@ static void build_subroutines(BuildCtx *ctx)
>    |  or TMP0, TMP0, TMP1
>    |  bnez TMP0, ->fff_fallback
>    |.  sd BASE, L->base			// Add frame since C call can throw.
> +  |.if MIPSR6
> +  |  sd PC, SAVE_PC			// Redundant (but a defined value).
> +  |  ffgccheck
> +  |.else
>    |  ffgccheck
>    |.  sd PC, SAVE_PC			// Redundant (but a defined value).
> +  |.endif
>    |  load_got lj_strfmt_number
>    |  move CARG1, L
>    |  call_intern lj_strfmt_number	// (lua_State *L, cTValue *o)
> @@ -1441,8 +1478,15 @@ static void build_subroutines(BuildCtx *ctx)
>    |  addiu AT, TMP0, -LUA_YIELD
>    |    daddu CARG3, CARG2, TMP0
>    |   daddiu TMP3, CARG2, 8
> +  |.if MIPSR6
> +  |  seleqz CARG2, CARG2, AT
> +  |  selnez TMP3, TMP3, AT
> +  |  bgtz AT, ->fff_fallback		// st > LUA_YIELD?
> +  |.  or CARG2, TMP3, CARG2
> +  |.else
>    |  bgtz AT, ->fff_fallback		// st > LUA_YIELD?
>    |.  movn CARG2, TMP3, AT
> +  |.endif
>    |   xor TMP2, TMP2, CARG3
>    |  bnez TMP1, ->fff_fallback		// cframe != 0?
>    |.  or AT, TMP2, TMP0
> @@ -1754,7 +1798,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  b ->fff_res
>    |.  li RD, (2+1)*8
>    |
> -  |.macro math_minmax, name, intins, fpins
> +  |.macro math_minmax, name, intins, intinsc, fpins
>    |  .ffunc_1 name
>    |  daddu TMP3, BASE, NARGS8:RC
>    |  checkint CARG1, >5
> @@ -1766,7 +1810,13 @@ static void build_subroutines(BuildCtx *ctx)
>    |.  sextw CARG1, CARG1
>    |  lw CARG2, LO(TMP2)
>    |.  slt AT, CARG1, CARG2
> +  |.if MIPSR6
> +  |  intins TMP1, CARG2, AT
> +  |  intinsc CARG1, CARG1, AT
> +  |  or CARG1, CARG1, TMP1
> +  |.else
>    |  intins CARG1, CARG2, AT
> +  |.endif
>    |  daddiu TMP2, TMP2, 8
>    |  zextw CARG1, CARG1
>    |  b <1
> @@ -1802,13 +1852,23 @@ static void build_subroutines(BuildCtx *ctx)
>    |.  nop
>    |7:
>    |.if FPU
> +  |.if MIPSR6
> +  |  fpins FRET1, FRET1, FARG1
> +  |.else
>    |  c.olt.d FRET1, FARG1
>    |  fpins FRET1, FARG1
> +  |.endif
>    |.else
>    |  bal ->vm_sfcmpolt
>    |.  nop
> +  |.if MIPSR6
> +  |  intins AT, CARG2, CRET1
> +  |  intinsc CARG1, CARG1, CRET1
> +  |  or CARG1, CARG1, AT
> +  |.else
>    |  intins CARG1, CARG2, CRET1
>    |.endif
> +  |.endif
>    |  b <6
>    |.  daddiu TMP2, TMP2, 8
>    |
> @@ -1828,8 +1888,13 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |.endmacro
>    |
> -  |  math_minmax math_min, movz, movf.d
> -  |  math_minmax math_max, movn, movt.d
> +  |.if MIPSR6
> +  |  math_minmax math_min, seleqz, selnez, min.d
> +  |  math_minmax math_max, selnez, seleqz, max.d
> +  |.else
> +  |  math_minmax math_min, movz, _, movf.d
> +  |  math_minmax math_max, movn, _, movt.d
> +  |.endif
>    |
>    |//-- String library -----------------------------------------------------
>    |
> @@ -1854,7 +1919,9 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |.ffunc string_char			// Only handle the 1-arg case here.
>    |  ffgccheck
> +  |.if not MIPSR6
>    |.  nop
> +  |.endif
>    |  ld CARG1, 0(BASE)
>    |  gettp TMP0, CARG1
>    |  xori AT, NARGS8:RC, 8		// Exactly 1 argument.
> @@ -1884,7 +1951,9 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |.ffunc string_sub
>    |  ffgccheck
> +  |.if not MIPSR6
>    |.  nop
> +  |.endif
>    |  addiu AT, NARGS8:RC, -16
>    |  ld TMP0, 0(BASE)
>    |  bltz AT, ->fff_fallback
> @@ -1907,8 +1976,30 @@ static void build_subroutines(BuildCtx *ctx)
>    |  addiu TMP0, CARG2, 1
>    |  addu TMP1, CARG4, TMP0
>    |   slt TMP3, CARG3, r0
> +  |.if MIPSR6
> +  |  seleqz CARG4, CARG4, AT
> +  |  selnez TMP1, TMP1, AT
> +  |  or CARG4, TMP1, CARG4		// if (end < 0) end += len+1
> +  |.else
>    |  movn CARG4, TMP1, AT		// if (end < 0) end += len+1
> +  |.endif
>    |   addu TMP1, CARG3, TMP0
> +  |.if MIPSR6
> +  |   selnez TMP1, TMP1, TMP3
> +  |   seleqz CARG3, CARG3, TMP3
> +  |   or CARG3, TMP1, CARG3		// if (start < 0) start += len+1
> +  |   li TMP2, 1
> +  |  slt AT, CARG4, r0
> +  |   slt TMP3, r0, CARG3
> +  |  seleqz CARG4, CARG4, AT		// if (end < 0) end = 0
> +  |   selnez CARG3, CARG3, TMP3
> +  |   seleqz TMP2, TMP2, TMP3
> +  |   or CARG3, TMP2, CARG3		// if (start < 1) start = 1
> +  |  slt AT, CARG2, CARG4
> +  |  seleqz CARG4, CARG4, AT
> +  |  selnez CARG2, CARG2, AT
> +  |  or CARG4, CARG2, CARG4		// if (end > len) end = len
> +  |.else
>    |   movn CARG3, TMP1, TMP3		// if (start < 0) start += len+1
>    |   li TMP2, 1
>    |  slt AT, CARG4, r0
> @@ -1917,6 +2008,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |   movz CARG3, TMP2, TMP3		// if (start < 1) start = 1
>    |  slt AT, CARG2, CARG4
>    |  movn CARG4, CARG2, AT		// if (end > len) end = len
> +  |.endif
>    |   daddu CARG2, STR:CARG1, CARG3
>    |  subu CARG3, CARG4, CARG3		// len = end - start
>    |   daddiu CARG2, CARG2, sizeof(GCstr)-1
> @@ -1978,7 +2070,13 @@ static void build_subroutines(BuildCtx *ctx)
>    |  slt AT, CARG1, r0
>    |  dsrlv CRET1, TMP0, CARG3
>    |  dsubu TMP0, r0, CRET1
> +  |.if MIPSR6
> +  |  selnez TMP0, TMP0, AT
> +  |  seleqz CRET1, CRET1, AT
> +  |  or CRET1, CRET1, TMP0
> +  |.else
>    |  movn CRET1, TMP0, AT
> +  |.endif
>    |  jr ra
>    |.  zextw CRET1, CRET1
>    |1:
> @@ -2001,14 +2099,28 @@ static void build_subroutines(BuildCtx *ctx)
>    |  slt AT, CARG1, r0
>    |  dsrlv CRET1, CRET2, TMP0
>    |  dsubu CARG1, r0, CRET1
> +  |.if MIPSR6
> +  |  seleqz CRET1, CRET1, AT
> +  |  selnez CARG1, CARG1, AT
> +  |  or CRET1, CRET1, CARG1
> +  |.else
>    |  movn CRET1, CARG1, AT
> +  |.endif
>    |  li CARG1, 64
>    |  subu TMP0, CARG1, TMP0
>    |  dsllv CRET2, CRET2, TMP0	// Integer check.
>    |  sextw AT, CRET1
>    |  xor AT, CRET1, AT		// Range check.
>    |  jr ra
> +  |.if MIPSR6
> +  |  seleqz AT, AT, CRET2
> +  |  selnez CRET2, CRET2, CRET2
> +  |  jr ra
> +  |.  or CRET2, AT, CRET2
> +  |.else
> +  |  jr ra
>    |.  movz CRET2, AT, CRET2
> +  |.endif
>    |1:
>    |  jr ra
>    |.  li CRET2, 1
> @@ -2518,15 +2630,22 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |// Hard-float round to integer.
>    |// Modifies AT, TMP0, FRET1, FRET2, f4. Keeps all others incl. FARG1.
> +  |// MIPSR6: Modifies FTMP1, too.
>    |.macro vm_round_hf, func
>    |  lui TMP0, 0x4330			// Hiword of 2^52 (double).
>    |  dsll TMP0, TMP0, 32
>    |  dmtc1 TMP0, f4
>    |  abs.d FRET2, FARG1			// |x|
>    |    dmfc1 AT, FARG1
> +  |.if MIPSR6
> +  |  cmp.lt.d FTMP1, FRET2, f4
> +  |   add.d FRET1, FRET2, f4		// (|x| + 2^52) - 2^52
> +  |  bc1eqz FTMP1, >1			// Truncate only if |x| < 2^52.
> +  |.else
>    |  c.olt.d 0, FRET2, f4
>    |   add.d FRET1, FRET2, f4		// (|x| + 2^52) - 2^52
>    |  bc1f 0, >1				// Truncate only if |x| < 2^52.
> +  |.endif
>    |.  sub.d FRET1, FRET1, f4
>    |    slt AT, AT, r0
>    |.if "func" == "ceil"
> @@ -2537,16 +2656,38 @@ static void build_subroutines(BuildCtx *ctx)
>    |.if "func" == "trunc"
>    |   dsll TMP0, TMP0, 32
>    |   dmtc1 TMP0, f4
> +  |.if MIPSR6
> +  |  cmp.lt.d FTMP1, FRET2, FRET1	// |x| < result?
> +  |   sub.d FRET2, FRET1, f4
> +  |  sel.d  FTMP1, FRET1, FRET2		// If yes, subtract +1.
> +  |  dmtc1 AT, FRET1
> +  |  neg.d FRET2, FTMP1
> +  |  jr ra
> +  |.  sel.d FRET1, FTMP1, FRET2		// Merge sign bit back in.
> +  |.else
>    |  c.olt.d 0, FRET2, FRET1		// |x| < result?
>    |   sub.d FRET2, FRET1, f4
>    |  movt.d FRET1, FRET2, 0		// If yes, subtract +1.
>    |  neg.d FRET2, FRET1
>    |  jr ra
>    |.  movn.d FRET1, FRET2, AT		// Merge sign bit back in.
> +  |.endif
>    |.else
>    |  neg.d FRET2, FRET1
>    |   dsll TMP0, TMP0, 32
>    |   dmtc1 TMP0, f4
> +  |.if MIPSR6
> +  |  dmtc1 AT, FTMP1
> +  |  sel.d FTMP1, FRET1, FRET2
> +  |.if "func" == "ceil"
> +  |  cmp.lt.d FRET1, FTMP1, FARG1	// x > result?
> +  |.else
> +  |  cmp.lt.d FRET1, FARG1, FTMP1	// x < result?
> +  |.endif
> +  |   sub.d FRET2, FTMP1, f4		// If yes, subtract +-1.
> +  |  jr ra
> +  |.  sel.d FRET1, FTMP1, FRET2
> +  |.else
>    |  movn.d FRET1, FRET2, AT		// Merge sign bit back in.
>    |.if "func" == "ceil"
>    |  c.olt.d 0, FRET1, FARG1		// x > result?
> @@ -2557,6 +2698,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  jr ra
>    |.  movt.d FRET1, FRET2, 0
>    |.endif
> +  |.endif
>    |1:
>    |  jr ra
>    |.  mov.d FRET1, FARG1
> @@ -2701,7 +2843,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |.  li CRET1, 0
>    |.endif
>    |
> -  |.macro sfmin_max, name, intins
> +  |.macro sfmin_max, name, intins, intinsc
>    |->vm_sf .. name:
>    |.if JIT and not FPU
>    |  move TMP2, ra
> @@ -2710,13 +2852,25 @@ static void build_subroutines(BuildCtx *ctx)
>    |  move ra, TMP2
>    |  move TMP0, CRET1
>    |  move CRET1, CARG1
> +  |.if MIPSR6
> +  |  intins CRET1, CRET1, TMP0
> +  |  intinsc TMP0, CARG2, TMP0
> +  |  jr ra
> +  |.  or CRET1, CRET1, TMP0
> +  |.else
>    |  jr ra
>    |.  intins CRET1, CARG2, TMP0
>    |.endif
> +  |.endif
>    |.endmacro
>    |
> -  |  sfmin_max min, movz
> -  |  sfmin_max max, movn
> +  |.if MIPSR6
> +  |  sfmin_max min, selnez, seleqz
> +  |  sfmin_max max, seleqz, selnez
> +  |.else
> +  |  sfmin_max min, movz, _
> +  |  sfmin_max max, movn, _
> +  |.endif
>    |
>    |//-----------------------------------------------------------------------
>    |//-- Miscellaneous functions --------------------------------------------
> @@ -2885,7 +3039,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |    lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
>      |  slt AT, CARG1, CARG2
>      |    addu TMP2, TMP2, TMP3
> +    |.if MIPSR6
> +    |  movop TMP2, TMP2, AT
> +    |.else
>      |  movop TMP2, r0, AT
> +    |.endif
>      |1:
>      |  daddu PC, PC, TMP2
>      |  ins_next
> @@ -2903,16 +3061,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.endif
>      |3:  // RA and RD are both numbers.
>      |.if FPU
> -    |  fcomp f20, f22
> +    |.if MIPSR6
> +    |  fcomp FTMP0, FTMP0, FTMP2
> +    |   addu TMP2, TMP2, TMP3
> +    |  mfc1 TMP3, FTMP0
> +    |  b <1
> +    |.  fmovop TMP2, TMP2, TMP3
> +    |.else
> +    |  fcomp FTMP0, FTMP2
>      |   addu TMP2, TMP2, TMP3
>      |  b <1
>      |.  fmovop TMP2, r0
> +    |.endif
>      |.else
>      |  bal sfcomp
>      |.   addu TMP2, TMP2, TMP3
>      |  b <1
> +    |.if MIPSR6
> +    |.  movop TMP2, TMP2, CRET1
> +    |.else
>      |.  movop TMP2, r0, CRET1
>      |.endif
> +    |.endif
>      |
>      |4:  // RA is a number, RD is not a number.
>      |  bne CARG4, TISNUM, ->vmeta_comp
> @@ -2959,15 +3129,27 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.endif
>      |.endmacro
>      |
> +    |.if MIPSR6
> +    if (op == BC_ISLT) {
> +      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, selnez, selnez, cmp.lt.d, ->vm_sfcmpolt
> +    } else if (op == BC_ISGE) {
> +      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, seleqz, seleqz, cmp.lt.d, ->vm_sfcmpolt
> +    } else if (op == BC_ISLE) {
> +      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, seleqz, seleqz, cmp.ult.d, ->vm_sfcmpult
> +    } else {
> +      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, selnez, selnez, cmp.ult.d, ->vm_sfcmpult
> +    }
> +    |.else
>      if (op == BC_ISLT) {
> -      |  bc_comp f20, f22, CARG1, CARG2, movz, movf, c.olt.d, ->vm_sfcmpolt
> +      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, movz, movf, c.olt.d, ->vm_sfcmpolt
>      } else if (op == BC_ISGE) {
> -      |  bc_comp f20, f22, CARG1, CARG2, movn, movt, c.olt.d, ->vm_sfcmpolt
> +      |  bc_comp FTMP0, FTMP2, CARG1, CARG2, movn, movt, c.olt.d, ->vm_sfcmpolt
>      } else if (op == BC_ISLE) {
> -      |  bc_comp f22, f20, CARG2, CARG1, movn, movt, c.ult.d, ->vm_sfcmpult
> +      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, movn, movt, c.ult.d, ->vm_sfcmpult
>      } else {
> -      |  bc_comp f22, f20, CARG2, CARG1, movz, movf, c.ult.d, ->vm_sfcmpult
> +      |  bc_comp FTMP2, FTMP0, CARG2, CARG1, movz, movf, c.ult.d, ->vm_sfcmpult
>      }
> +    |.endif
>      break;
>  
>    case BC_ISEQV: case BC_ISNEV:
> @@ -3013,7 +3195,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |2:  // Check if the tags are the same and it's a table or userdata.
>      |  xor AT, CARG3, CARG4			// Same type?
>      |  sltiu TMP0, CARG3, LJ_TISTABUD+1		// Table or userdata?
> +    |.if MIPSR6
> +    |  seleqz TMP0, TMP0, AT
> +    |.else
>      |  movn TMP0, r0, AT
> +    |.endif
>      if (vk) {
>        |  beqz TMP0, <1
>      } else {
> @@ -3063,11 +3249,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |   lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
>      |  xor TMP1, CARG1, CARG2
>      |   addu TMP2, TMP2, TMP3
> +    |.if MIPSR6
> +    if (vk) {
> +      |  seleqz TMP2, TMP2, TMP1
> +    } else {
> +      |  selnez TMP2, TMP2, TMP1
> +    }
> +    |.else
>      if (vk) {
>        |  movn TMP2, r0, TMP1
>      } else {
>        |  movz TMP2, r0, TMP1
>      }
> +    |.endif
>      |  daddu PC, PC, TMP2
>      |  ins_next
>      break;
> @@ -3094,6 +3288,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  bne CARG4, TISNUM, >6
>      |.   addu TMP2, TMP2, TMP3
>      |  xor AT, CARG1, CARG2
> +    |.if MIPSR6
> +    if (vk) {
> +      | seleqz TMP2, TMP2, AT
> +      |1:
> +      |  daddu PC, PC, TMP2
> +      |2:
> +    } else {
> +      |  selnez TMP2, TMP2, AT
> +      |1:
> +      |2:
> +      |  daddu PC, PC, TMP2
> +    }
> +    |.else
>      if (vk) {
>        | movn TMP2, r0, AT
>        |1:
> @@ -3105,6 +3312,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |2:
>        |  daddu PC, PC, TMP2
>      }
> +    |.endif
>      |  ins_next
>      |
>      |3:  // RA is not an integer.
> @@ -3117,30 +3325,49 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.   addu TMP2, TMP2, TMP3
>      |  sltu AT, CARG4, TISNUM
>      |.if FPU
> -    |  ldc1 f20, 0(RA)
> -    |   ldc1 f22, 0(RD)
> +    |  ldc1 FTMP0, 0(RA)
> +    |   ldc1 FTMP2, 0(RD)
>      |.endif
>      |  beqz AT, >5
>      |.  nop
>      |4:  // RA and RD are both numbers.
>      |.if FPU
> -    |  c.eq.d f20, f22
> +    |.if MIPSR6
> +    |  cmp.eq.d FTMP0, FTMP0, FTMP2
> +    |  dmfc1 TMP1, FTMP0
> +    |  b <1
> +    if (vk) {
> +      |.  selnez TMP2, TMP2, TMP1
> +    } else {
> +      |.  seleqz TMP2, TMP2, TMP1
> +    }
> +    |.else
> +    |  c.eq.d FTMP0, FTMP2
>      |  b <1
>      if (vk) {
>        |.  movf TMP2, r0
>      } else {
>        |.  movt TMP2, r0
>      }
> +    |.endif
>      |.else
>      |  bal ->vm_sfcmpeq
>      |.  nop
>      |  b <1
> +    |.if MIPSR6
> +    if (vk) {
> +      |.  selnez TMP2, TMP2, CRET1
> +    } else {
> +      |.  seleqz TMP2, TMP2, CRET1
> +    }
> +    |.else
>      if (vk) {
>        |.  movz TMP2, r0, CRET1
>      } else {
>        |.  movn TMP2, r0, CRET1
>      }
>      |.endif
> +    |.endif
>      |
>      |5:  // RA is a number, RD is not a number.
>      |.if FFI
> @@ -3150,9 +3377,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.endif
>      |  // RA is a number, RD is an integer. Convert RD to a number.
>      |.if FPU
> -    |.  lwc1 f22, LO(RD)
> +    |.  lwc1 FTMP2, LO(RD)
>      |  b <4
> -    |.  cvt.d.w f22, f22
> +    |.  cvt.d.w FTMP2, FTMP2
>      |.else
>      |.  sextw CARG2, CARG2
>      |  bal ->vm_sfi2d_2
> @@ -3170,10 +3397,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.endif
>      |  // RA is an integer, RD is a number. Convert RA to a number.
>      |.if FPU
> -    |.  lwc1 f20, LO(RA)
> -    |   ldc1 f22, 0(RD)
> +    |.  lwc1 FTMP0, LO(RA)
> +    |   ldc1 FTMP2, 0(RD)
>      |  b <4
> -    |   cvt.d.w f20, f20
> +    |   cvt.d.w FTMP0, FTMP0
>      |.else
>      |.  sextw CARG1, CARG1
>      |  bal ->vm_sfi2d_1
> @@ -3216,11 +3443,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  decode_RD4b TMP2
>      |  lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
>      |  addu TMP2, TMP2, TMP3
> +    |.if MIPSR6
> +    if (vk) {
> +      |  seleqz TMP2, TMP2, TMP0
> +    } else {
> +      |  selnez TMP2, TMP2, TMP0
> +    }
> +    |.else
>      if (vk) {
>        |  movn TMP2, r0, TMP0
>      } else {
>        |  movz TMP2, r0, TMP0
>      }
> +    |.endif
>      |  daddu PC, PC, TMP2
>      |  ins_next
>      break;
> @@ -3239,11 +3474,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |   decode_RD4b TMP2
>        |   lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535)
>        |   addu TMP2, TMP2, TMP3
> +      |.if MIPSR6
> +      if (op == BC_IST) {
> +	|  selnez TMP2, TMP2, TMP0;
> +      } else {
> +	|  seleqz TMP2, TMP2, TMP0;
> +      }
> +      |.else
>        if (op == BC_IST) {
>  	|  movz TMP2, r0, TMP0
>        } else {
>  	|  movn TMP2, r0, TMP0
>        }
> +      |.endif
>        |  daddu PC, PC, TMP2
>      } else {
>        |  ld CRET1, 0(RD)
> @@ -3486,9 +3729,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  bltz TMP1, ->vmeta_arith
>      |.  daddu RA, BASE, RA
>      |.elif "intins" == "mult"
> +    |.if MIPSR6
> +    |.  nop
> +    |  mul CRET1, CARG3, CARG4
> +    |  muh TMP2, CARG3, CARG4
> +    |.else
>      |.  intins CARG3, CARG4
>      |  mflo CRET1
>      |  mfhi TMP2
> +    |.endif
>      |  sra TMP1, CRET1, 31
>      |  bne TMP1, TMP2, ->vmeta_arith
>      |.  daddu RA, BASE, RA
> @@ -3511,16 +3760,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |.endif
>      |
>      |5:  // Check for two numbers.
> -    |  .FPU ldc1 f20, 0(RB)
> +    |  .FPU ldc1 FTMP0, 0(RB)
>      |  sltu AT, TMP0, TISNUM
>      |   sltu TMP0, TMP1, TISNUM
> -    |  .FPU ldc1 f22, 0(RC)
> +    |  .FPU ldc1 FTMP2, 0(RC)
>      |   and AT, AT, TMP0
>      |   beqz AT, ->vmeta_arith
>      |.   daddu RA, BASE, RA
>      |
>      |.if FPU
> -    |  fpins FRET1, f20, f22
> +    |  fpins FRET1, FTMP0, FTMP2
>      |.elif "fpcall" == "sfpmod"
>      |  sfpmod
>      |.else
> @@ -3850,7 +4099,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  li TMP0, 0x801
>        |  addiu AT, CARG2, -0x7ff
>        |   srl CARG3, RD, 14
> +      |.if MIPSR6
> +      |  seleqz TMP0, TMP0, AT
> +      |  selnez CARG2, CARG2, AT
> +      |  or CARG2, CARG2, TMP0
> +      |.else
>        |  movz CARG2, TMP0, AT
> +      |.endif
>        |  // (lua_State *L, int32_t asize, uint32_t hbits)
>        |  call_intern lj_tab_new
>        |.  move CARG1, L
> @@ -4131,7 +4386,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  daddu NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
>      |   settp STR:RC, TMP3		// Tagged key to look for.
>      |.if FPU
> -    |   ldc1 f20, 0(RA)
> +    |   ldc1 FTMP0, 0(RA)
>      |.else
>      |   ld CRET1, 0(RA)
>      |.endif
> @@ -4147,7 +4402,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  andi AT, TMP3, LJ_GC_BLACK	// isblack(table)
>      |  bnez AT, >7
>      |.if FPU
> -    |.  sdc1 f20, NODE:TMP2->val
> +    |.  sdc1 FTMP0, NODE:TMP2->val
>      |.else
>      |.  sd CRET1, NODE:TMP2->val
>      |.endif
> @@ -4188,7 +4443,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  ld BASE, L->base
>      |.if FPU
>      |  b <3				// No 2nd write barrier needed.
> -    |.  sdc1 f20, 0(CRET1)
> +    |.  sdc1 FTMP0, 0(CRET1)
>      |.else
>      |  ld CARG1, 0(RA)
>      |  b <3				// No 2nd write barrier needed.
> @@ -4531,7 +4786,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  ld CARG1, 0(RC)
>      |  sltu AT, RC, TMP3
>      |    daddiu RC, RC, 8
> +    |.if MIPSR6
> +    |  selnez CARG1, CARG1, AT
> +    |  seleqz AT, TISNIL, AT
> +    |  or CARG1, CARG1, AT
> +    |.else
>      |  movz CARG1, TISNIL, AT
> +    |.endif
>      |  sd CARG1, 0(RA)
>      |  sltu AT, RA, TMP2
>      |  bnez AT, <1
> @@ -4720,7 +4981,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  dext AT, CRET1, 31, 0
>        |  slt CRET1, CARG2, CARG3
>        |  slt TMP1, CARG3, CARG2
> +      |.if MIPSR6
> +      |  selnez TMP1, TMP1, AT
> +      |  seleqz CRET1, CRET1, AT
> +      |  or CRET1, CRET1, TMP1
> +      |.else
>        |  movn CRET1, TMP1, AT
> +      |.endif
>      } else {
>        |  bne CARG3, TISNUM, >5
>        |.  ld CARG2, FORL_STEP*8(RA)	// STEP CARG2 - CARG4 type
> @@ -4736,20 +5003,34 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  slt CRET1, CRET1, CARG1
>        |  slt AT, CARG2, r0
>        |   slt TMP0, TMP0, r0		// ((y^a) & (y^b)) < 0: overflow.
> +      |.if MIPSR6
> +      |  selnez TMP1, TMP1, AT
> +      |  seleqz CRET1, CRET1, AT
> +      |  or CRET1, CRET1, TMP1
> +      |.else
>        |  movn CRET1, TMP1, AT
> +      |.endif
>        |   or CRET1, CRET1, TMP0
>        |  zextw CARG1, CARG1
>        |  settp CARG1, TISNUM
>      }
>      |1:
>      if (op == BC_FORI) {
> +      |.if MIPSR6
> +      |  selnez TMP2, TMP2, CRET1
> +      |.else
>        |  movz TMP2, r0, CRET1
> +      |.endif
>        |  daddu PC, PC, TMP2
>      } else if (op == BC_JFORI) {
>        |  daddu PC, PC, TMP2
>        |  lhu RD, -4+OFS_RD(PC)
>      } else if (op == BC_IFORL) {
> +      |.if MIPSR6
> +      |  seleqz TMP2, TMP2, CRET1
> +      |.else
>        |  movn TMP2, r0, CRET1
> +      |.endif
>        |  daddu PC, PC, TMP2
>      }
>      if (vk) {
> @@ -4779,6 +5060,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  and AT, AT, TMP0
>        |  beqz AT, ->vmeta_for
>        |.  slt TMP3, TMP3, r0
> +      |.if MIPSR6
> +      |   dmtc1 TMP3, FTMP2
> +      |  cmp.lt.d FTMP0, f0, f2
> +      |  cmp.lt.d FTMP1, f2, f0
> +      |  sel.d FTMP2, FTMP1, FTMP0
> +      |  b <1
> +      |.  dmfc1 CRET1, FTMP2
> +      |.else
>        |  c.ole.d 0, f0, f2
>        |  c.ole.d 1, f2, f0
>        |  li CRET1, 1
> @@ -4786,12 +5075,25 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  movt AT, r0, 1
>        |  b <1
>        |.  movn CRET1, AT, TMP3
> +      |.endif
>      } else {
>        |  ldc1 f0, FORL_IDX*8(RA)
>        |   ldc1 f4, FORL_STEP*8(RA)
>        |    ldc1 f2, FORL_STOP*8(RA)
>        |   ld TMP3, FORL_STEP*8(RA)
>        |  add.d f0, f0, f4
> +      |.if MIPSR6
> +      |   slt TMP3, TMP3, r0
> +      |   dmtc1 TMP3, FTMP2
> +      |  cmp.lt.d FTMP0, f0, f2
> +      |  cmp.lt.d FTMP1, f2, f0
> +      |  sel.d FTMP2, FTMP1, FTMP0
> +      |  dmfc1 CRET1, FTMP2
> +      if (op == BC_IFORL) {
> +	|  seleqz TMP2, TMP2, CRET1
> +	|  daddu PC, PC, TMP2
> +      }
> +      |.else
>        |  c.ole.d 0, f0, f2
>        |  c.ole.d 1, f2, f0
>        |   slt TMP3, TMP3, r0
> @@ -4804,6 +5106,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>  	|  movn TMP2, r0, CRET1
>  	|  daddu PC, PC, TMP2
>        }
> +      |.endif
>        |  sdc1 f0, FORL_IDX*8(RA)
>        |  ins_next1
>        |  b <2
> @@ -4979,8 +5282,17 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  ld TMP0, 0(RA)
>      |  sltu AT, RA, RC			// Less args than parameters?
>      |  move CARG1, TMP0
> +    |.if MIPSR6
> +    |  selnez TMP0, TMP0, AT
> +    |  seleqz TMP3, TISNIL, AT
> +    |  or TMP0, TMP0, TMP3
> +    |  seleqz TMP3, CARG1, AT
> +    |  selnez CARG1, TISNIL, AT
> +    |  or CARG1, CARG1, TMP3
> +    |.else
>      |  movz TMP0, TISNIL, AT		// Clear missing parameters.
>      |  movn CARG1, TISNIL, AT		// Clear old fixarg slot (help the GC).
> +    |.endif
>      |    addiu TMP2, TMP2, -1
>      |  sd TMP0, 16(TMP1)
>      |    daddiu TMP1, TMP1, 8
> -- 
> 2.41.0
> 
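
For readers following the MIPSR6 hunks above: R6 has no conditional
moves (movn/movz), so the patch composes them from selnez/seleqz plus
or. Below is a minimal C model of that equivalence. It is only a
sketch: the helper names are made up for illustration and are not part
of LuaJIT, and registers are modelled as plain 64-bit values.

```c
#include <stdint.h>

/* R6 selects: keep rs if the condition holds, otherwise produce 0. */
static uint64_t selnez(uint64_t rs, uint64_t rt) { return rt != 0 ? rs : 0; }
static uint64_t seleqz(uint64_t rs, uint64_t rt) { return rt == 0 ? rs : 0; }

/* Pre-R6 conditional moves: rd is overwritten only if the condition holds. */
static uint64_t movn(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return rt != 0 ? rs : rd;
}

static uint64_t movz(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return rt == 0 ? rs : rd;
}

/* movn rd, rs, rt  ==>  selnez tmp, rs, rt; seleqz rd, rd, rt; or rd, rd, tmp */
static uint64_t movn_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  uint64_t tmp = selnez(rs, rt);
  rd = seleqz(rd, rt);
  return rd | tmp;
}

/* movz rd, rs, rt  ==>  seleqz tmp, rs, rt; selnez rd, rd, rt; or rd, rd, tmp */
static uint64_t movz_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  uint64_t tmp = seleqz(rs, rt);
  rd = selnez(rd, rt);
  return rd | tmp;
}
```

When the moved value is r0, one select is enough: `movn rd, r0, rt`
becomes `seleqz rd, rd, rt`, and `movz rd, r0, rt` becomes
`selnez rd, rd, rt`. This is why several hunks replace one instruction
with one instruction rather than three.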

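The `(|x| + 2^52) - 2^52` comments in vm_round_hf may be easier to
follow with a small C model of the "trunc" case. This is a sketch
under my own assumptions (default round-to-nearest mode, IEEE doubles,
a hypothetical function name), not the code the VM generates. It also
drops the sign of zero, which the assembly keeps.

```c
/* Truncate x toward zero using the 2^52 trick. */
static double trunc_via_2p52(double x)
{
  const double two52 = 4503599627370496.0;  /* 2^52 */
  double ax = x < 0 ? -x : x;               /* |x| */
  volatile double sum;                      /* Force rounding to double. */
  double r;

  /* Values with |x| >= 2^52 are already integral; NaN also lands here. */
  if (!(ax < two52))
    return x;

  /* Doubles in [2^52, 2^53) are spaced 1 apart, so the fraction is lost. */
  sum = ax + two52;
  r = sum - two52;          /* Nearest integer to |x|. */

  if (ax < r)               /* Rounded up? Then step back by one. */
    r -= 1.0;

  return x < 0 ? -r : r;    /* Merge the sign back in. */
}
```

The floor and ceil variants differ only in which comparison decides
whether to subtract the +-1 correction, as the `cmp.lt.d` pair in the
R6 branch shows.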

* Re: [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
  2023-08-15  9:36   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 12:40     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 12:40 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> LGTM, except for a few comments below.
> 
> On Wed, Aug 09, 2023 at 06:35:50PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Djordje Kovacevic and Stefan Pejic.
> > 
> > (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
> > 
> > Without the aforementioned checks, some non-branch instructions may be
> > interpreted as some branch due to memory address collisions. This patch
> Please add a more comprehensive description of behavior before the patch.
> Because of magic values it is not obvious that the difference between the
> current PC and the jump address is XORed with the opcode, to make sure
> that this is a branching instruction.

Added. The new commit message is the following:

| MIPS: Use precise search for exit jump patching.
|
| Contributed by Djordje Kovacevic and Stefan Pejic.
|
| (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
|
| A branch instruction holds the PC-relative mcode offset in its lowest
| 16 bits. To find the exit branch, we check that the difference between
| the address of the current instruction and the jump target is contained
| in the lowest 16 bits of the instruction. But there is no check that
| the opcode of this instruction is a branch opcode. Without such a
| check, some non-branch instructions may be interpreted as branches due
| to memory address collisions. This patch adds the corresponding mask
| values for comparisons with the instruction opcodes used in LuaJIT:
| * `MIPSI_BEQ` for `beq` and `bne`,
| * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
| * `MIPSI_BC1F` for `bc1f` and `bc1t`,
| see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
| details.
|
| To reproduce this failure, we need specific memory mapping, so the test
| case is omitted.
|
| Since the MIPS architecture is not supported by Tarantool (at the
| moment), this patch is not necessary for backport. OTOH, it gives us
| the following benefits:
| * Be in sync with the LuaJIT upstream, not only for the x86_64 and
|   arm64 architectures.
| * Avoid conflicts during future backporting.
| So, it's more useful to backport some of the patches to avoid conflicts
| with the future patch series.
|
| [1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
|
| Sergey Kaplun:
| * added the description for the problem
|
| Part of tarantool/tarantool#8825
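
A rough C sketch of the check described in the message above, in case
it helps the review. The opcode patterns and masks here are my
assumptions, derived from the MIPS32 encoding tables, and the function
name is hypothetical. See <src/lj_asm_mips.h> for the real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed opcode patterns (MIPS32 encoding), named after the commit message. */
#define SKETCH_BEQ   0x10000000u  /* Opcodes 0x04..0x07: beq, bne, blez, bgtz. */
#define SKETCH_BLTZ  0x04000000u  /* REGIMM with rt = 0/1: bltz, bgez. */
#define SKETCH_BC1F  0x45000000u  /* COP1 branch on condition: bc1f, bc1t. */

/*
** Return true if ins looks like a branch whose 16-bit offset field
** equals delta (the distance to the exit stub, in instructions).
*/
static bool looks_like_exit_branch(uint32_t ins, int32_t delta)
{
  /* The old check: only the low 16 bits are compared with the offset. */
  if (((ins ^ (uint32_t)delta) & 0xffffu) != 0)
    return false;
  /* The added check: the opcode bits must match a branch pattern. */
  return (ins & 0xf0000000u) == SKETCH_BEQ ||
         (ins & 0xfc1e0000u) == SKETCH_BLTZ ||
         (ins & 0xffe00000u) == SKETCH_BC1F;
}
```

With only the first comparison, a word such as 0x24000005 (an `addiu`
with immediate 5) would be taken for a branch over 5 instructions. The
opcode masks reject it.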

> 
> Typo: s/some branch/branches/

Fixed.

> > adds the corresponding comparisons masked values with instruction
> Typo: s/comparisons masked values/mask values for comparisons/

Fixed.

> > opcodes used in the LuaJIT:
> > * `MIPSI_BEQ` for `beq` and `bne`,
> > * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
> > * `MIPSI_BC1F` for `bc1f` and `bc1t`,
> > see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
> > details.
> > 
> > To reproduce this failure, we need specific memory mapping, so testcase
> Typo: s/testcase/the test case/

Fixed.

> > is omitted.
> > 
> > Since MIPS architecture is not supported by Tarantool (at the moment)
> > this patch is not necessary for backport. OTOH, it gives to us the
> Typo: s/gives to us/gives us/

Fixed.

> > following benefits:
> > * Be in sync with the LuaJIT upstream not only for x86_64, arm64
> >   architectures.
> > * Avoid conflicts during the future backporting.
> Typo: s/during the future/during future/

Fixed.

> > So, it's more useful to backport some of the patches to avoid conflicts

<snipped>

> > 
> Best regards,
> Maxim Kokryashkin

-- 
Best regards,
Sergey Kaplun

* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
  2023-08-15 10:14   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 12:55     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 13:06       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 12:55 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Please, see my replies below.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
> 
> On Wed, Aug 09, 2023 at 06:35:51PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
> > depends on particular offset of mcode for side trace regarding the
> > parent trace. Before this commit just run some amount of functions to
> > generate traces to fill the required mcode range. Unfortunately, this
> > approach is not robust, since sometimes trace is not recorded due to
> > errors "leaving loop in root trace" observed because of hotcount
> > collisions.
> > 
> > This patch introduces the following helpers:
> > * `frontend.gettraceno(func)` -- returns the traceno for the given
> >   function, assumming that there is compiled trace for its prototype
> >   (i.e. the 0th bytecode is JFUNC).
> > * `jit.generators.fillmcode(traceno, size)` fills mcode area of the
> >   given size from the given trace. It is useful to generate some mcode
> >   to test jumps to side traces remote enough from the parent.
> > ---
> >  ...8-fix-side-exit-patching-on-arm64.test.lua |  78 ++----------
> >  test/tarantool-tests/utils/frontend.lua       |  24 ++++
> >  test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
> >  3 files changed, 150 insertions(+), 67 deletions(-)
> >  create mode 100644 test/tarantool-tests/utils/jit/generators.lua
> > 
> > diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> > index 93db3041..678ac914 100644
> > --- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
> > +++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua

<snipped>

> > diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> > index 2afebbb2..414257fd 100644
> > --- a/test/tarantool-tests/utils/frontend.lua
> > +++ b/test/tarantool-tests/utils/frontend.lua

<snipped>

> > diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
> > new file mode 100644
> > index 00000000..62b6e0ef
> > --- /dev/null
> > +++ b/test/tarantool-tests/utils/jit/generators.lua
> > @@ -0,0 +1,115 @@
> > +local M = {}
> > +
> > +local jutil = require('jit.util')
> > +
> > +local function getlast_traceno()
> > +  return misc.getmetrics().jit_trace_num
> > +end
> > +
> > +-- Convert addr to positive value if needed.
> > +local function canonize_address(addr)
> Nit: most of the time, the `canonize` variant is used in theological materials,
> while the `canonicalize` is more common in the sphere of software development.
> Feel free to ignore.

Fixed, thanks.

> > +  if addr < 0 then addr = addr + 2 ^ 32 end
> > +  return addr
> > +end
> > +
> > +-- Need some storage to avoid functions and traces to be
> > +-- collected.
> Typo: s/Need/We need/ or s/Need some storage/Some storage is needed/
> Typo: s/to be collected/being collected/

Fixed.

> > +local recfuncs = {}
> > +local last_i = 0
> > +-- This function generates a table of functions with heavy mcode
> > +-- payload with tab arithmetics to fill the mcode area from the
> > +-- one trace mcode by the some given size. This size is usually
> Typo: s/by the some/by some/

Fixed, thanks!

> > +-- big enough, because we want to check long jump side exits from
> > +-- some traces.
> > +-- Assumes, that maxmcode and maxtrace options are set to be sure,
> Typo: s/that/that the/

Fixed.

> > +-- that we can produce such amount of mcode.
> > +function M.fillmcode(trace_from, size)
> > +  local mcode, addr_from = jutil.tracemc(trace_from)
> > +  assert(mcode, 'the #1 argument should be an existed trace number')
> Typo: s/existed/existing/

Fixed, thanks!

> > +  addr_from = canonize_address(addr_from)
> > +  local required_diff = size + #mcode
> > +
> > +  -- Marker to check that traces are not flushed.
> > +  local maxtraceno = getlast_traceno()
> > +  local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
> > +
> > +  local _, last_addr = jutil.tracemc(maxtraceno)
> > +  last_addr = canonize_address(last_addr)
> > +
> > +  -- Addresses of traces may increase or decrease depending on OS,
> > +  -- so use absolute diff.
> > +  while math.abs(last_addr - addr_from) > required_diff do
> > +    last_i = last_i + 1
> > +    -- This is a quite heavy workload (though it doesn't look like
> Typo: s/This is a quite/This is quite a/

Fixed.

> > +    -- one at first). Each load from a table is type guarded. Each
> > +    -- table lookup (for both stores and loads) is guarded for
> > +    -- table <hmask> value and presence of the metatable. The code
> Typo: s/and presence/and the presence/

Fixed.

> > +    -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
> Typo: s/results to/results in/

Fixed.

> > +    -- practice.
> > +    local fname = ('fillmcode[%d]'):format(last_i)
> > +    recfuncs[last_i] = assert(loadstring(([[
> > +      return function(src)
> > +        local p = %d
> Nit: Poor naming, a more descriptive name is preferred.

It doesn't make much sense, because we don't really care about the
function's content. Since this is just a moved part of the code, I
prefer to leave it as is.

Ignoring for now.

> > +        local tmp = { }
> > +        local dst = { }
> > +        -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
> Typo: s/as stop/as a stop/

Fixed, thanks!

> > +        -- in root trace) errors due to hotcount collisions.
> > +        for i = 1, 5 do

<snipped>

> > +    local function tnew(p)
> Nit: same issue with naming.

Ditto.

> > +      return {

<snipped>

See the iterative patch below:

===================================================================
diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
index 62b6e0ef..65abfdaa 100644
--- a/test/tarantool-tests/utils/jit/generators.lua
+++ b/test/tarantool-tests/utils/jit/generators.lua
@@ -7,26 +7,26 @@ local function getlast_traceno()
 end
 
 -- Convert addr to positive value if needed.
-local function canonize_address(addr)
+local function canonicalize_address(addr)
   if addr < 0 then addr = addr + 2 ^ 32 end
   return addr
 end
 
--- Need some storage to avoid functions and traces to be
+-- Some storage is needed to avoid functions and traces being
 -- collected.
 local recfuncs = {}
 local last_i = 0
 -- This function generates a table of functions with heavy mcode
 -- payload with tab arithmetics to fill the mcode area from the
--- one trace mcode by the some given size. This size is usually
--- big enough, because we want to check long jump side exits from
--- some traces.
--- Assumes, that maxmcode and maxtrace options are set to be sure,
--- that we can produce such amount of mcode.
+-- one trace mcode by some given size. This size is usually big
+-- enough, because we want to check long jump side exits from some
+-- traces.
+-- Assumes, that the maxmcode and maxtrace options are set to be
+-- sure, that we can produce such amount of mcode.
 function M.fillmcode(trace_from, size)
   local mcode, addr_from = jutil.tracemc(trace_from)
-  assert(mcode, 'the #1 argument should be an existed trace number')
-  addr_from = canonize_address(addr_from)
+  assert(mcode, 'the #1 argument should be an existing trace number')
+  addr_from = canonicalize_address(addr_from)
   local required_diff = size + #mcode
 
   -- Marker to check that traces are not flushed.
@@ -34,17 +34,17 @@ function M.fillmcode(trace_from, size)
   local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
 
   local _, last_addr = jutil.tracemc(maxtraceno)
-  last_addr = canonize_address(last_addr)
+  last_addr = canonicalize_address(last_addr)
 
   -- Addresses of traces may increase or decrease depending on OS,
   -- so use absolute diff.
   while math.abs(last_addr - addr_from) > required_diff do
     last_i = last_i + 1
-    -- This is a quite heavy workload (though it doesn't look like
+    -- This is quite a heavy workload (though it doesn't look like
     -- one at first). Each load from a table is type guarded. Each
     -- table lookup (for both stores and loads) is guarded for
-    -- table <hmask> value and presence of the metatable. The code
-    -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
+    -- table <hmask> value and the presence of the metatable. The
+    -- code below results in ~8Kb of mcode for ARM64 and MIPS64 in
     -- practice.
     local fname = ('fillmcode[%d]'):format(last_i)
     recfuncs[last_i] = assert(loadstring(([[
@@ -52,8 +52,8 @@ function M.fillmcode(trace_from, size)
         local p = %d
         local tmp = { }
         local dst = { }
-        -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
-        -- in root trace) errors due to hotcount collisions.
+        -- XXX: use 5 as a stop index to reduce LLEAVE (leaving
+        -- loop in root trace) errors due to hotcount collisions.
         for i = 1, 5 do
           tmp.a = src.a * p   tmp.j = src.j * p   tmp.s = src.s * p
           tmp.b = src.b * p   tmp.k = src.k * p   tmp.t = src.t * p
@@ -108,7 +108,7 @@ function M.fillmcode(trace_from, size)
     if not last_addr then
       error(FLUSH_ERR)
     end
-    last_addr = canonize_address(last_addr)
+    last_addr = canonicalize_address(last_addr)
   end
 end
 
===================================================================
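
As a side note on the address handling in the helper: the arithmetic
can be modelled in C as below. This is an illustration only. It assumes
the address may arrive as a negative (sign-wrapped 32-bit) number,
which is what the "Convert addr to positive value if needed" comment
suggests, and the `mcode_distance` name is hypothetical.

```c
#include <stdint.h>

/* Map a possibly negative address into [0, 2^32), like the Lua helper. */
static int64_t canonicalize_address(int64_t addr)
{
  if (addr < 0)
    addr += (int64_t)1 << 32;
  return addr;
}

/*
** Absolute distance between two mcode addresses. The sign of the raw
** difference depends on whether the OS hands out increasing or
** decreasing addresses, hence the absolute value.
*/
static int64_t mcode_distance(int64_t from, int64_t to)
{
  int64_t diff = canonicalize_address(to) - canonicalize_address(from);
  return diff < 0 ? -diff : diff;
}
```

For example, the raw value -4096 is canonicalized to 0xfffff000, so
its distance to address 4096 is 0xffffe000 bytes, not 8192.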

> > +end
> > +
> > +return M
> > -- 
> > 2.41.0
> Best regards,
> Maxim Kokryashkin
> > 

-- 
Best regards,
Sergey Kaplun

* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
  2023-08-15 11:13   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:05     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:05 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:52PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Djordje Kovacevic and Stefan Pejic.
> > 
> > (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
> > 
> > `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> > is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
> Typo: s/to incorrect/to an incorrect/

Fixed.

> > check in `asm_sparejump_setup()`, so mcode bottom is not updated.
> Typo: s/so mcode/so the mcode/

Fixed.

> > 
> > This patch fixes check of the MCLink offset from the mcbot.
> Typo: s/fixes check/fixes the check/

Fixed.

> > Nevertheless, the emitting of spare jump slots is still incorrect, so
> > the introduced test still fails due to incorrect iteration through the
> Typo: s/due to/due to the/

Fixed.

> > sparce table (the last slot is out of mcode range).
> > 
> > This should be fixed via backporting of the commit
> > dbb78630169a8106b355a5be8af627e98c362f1e ("MIPS: Fix handling of
> > long-range spare jumps."). But it triggers the new unconditional
> > assert, that is added in this patch, mentioning that sizemcode is too
> > bit. So some workaround should be found, when this test will be enabled
> Typo: s/bit/big/
> Typo: s/will be/is/

Fixed, thanks!

> > for MIPS.
> > 
> > Since test also validates the behaviour of long-range jumps to side
> > traces for arm64 and x64, and we have no testing for MIPS64 (yet), we
> > can leave it as is without a skipcond.
> > 
> > Sergey Kaplun:
> > * added the description and the test for the problem
> > 
> > Part of tarantool/tarantool#8825
> > ---
> >  src/lj_asm_mips.h                             |  9 +--
> >  src/lj_jit.h                                  |  6 ++
> >  src/lj_mcode.c                                |  6 --
> >  ...x-mips64-spare-side-exit-patching.test.lua | 65 +++++++++++++++++++
> >  4 files changed, 76 insertions(+), 10 deletions(-)
> >  create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> > 

<snipped>
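
The size mismatch described in the commit message can be illustrated
with a minimal C sketch. The `Link` structure below is a hypothetical
stand-in: it assumes a header made of one pointer and one `size_t`, so
its size follows the pointer width of the target. It is not copied
from the LuaJIT sources, and the helper names are made up for the
example.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Insn;  /* One 32-bit machine instruction (assumption). */

/* Hypothetical header placed at the bottom of a machine code area. */
typedef struct Link {
  Insn *next;   /* Next area. */
  size_t size;  /* Size of this area. */
} Link;

/* Number of instruction slots occupied by the header. */
static size_t link_slots(void)
{
  return sizeof(Link) / sizeof(Insn);
}

/* First usable slot: derived from sizeof, not hardcoded to 8 bytes. */
static Insn *first_slot(Insn *area_bottom)
{
  return area_bottom + link_slots();
}
```

On a 32-bit target such a header would take 2 slots (8 bytes), while
on a typical 64-bit target it takes 4 slots (16 bytes), so any check
that hardcodes the 8-byte offset looks at the wrong place.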

> > diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> > new file mode 100644
> > index 00000000..fdc826cb
> > --- /dev/null
> > +++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> > @@ -0,0 +1,65 @@
> > +local tap = require('tap')
> > +local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
> > +  -- Need to fix the MIPS behaviour first.
> Typo: s/Need to/We need to/

Fixed.

> > +  ['Disabled for MIPS architectures'] = jit.arch:match('mips'),

<snipped>

> > +  -- Allow to use 2000 traces to avoid flushes.
> Typo: s/to use/compilation of up to/

Fixed.

> > +  'maxtrace=2000',
> > +  -- Allow to compile 8Mb of mcode to be sure the issue occurs.
> Typo: s/to compile/compilation of up to/

Fixed.

> > +  'maxmcode=8192',
> > +  -- Use big mcode area for traces to avoid using different
> Typo: s/using/usage of/

Fixed.

> > +  -- spare slots.
> > +  'sizemcode=256'
> > +)
> > +
> > +local MAX_SPARE_SLOT = 4
> A link to the definition in `lj_asm_mips.h` would be nice to have.

Added.

> 
> > +local function parent(marker)
> > +  -- Use several side exit to fill spare exit space (default is
> Typo: s/side exit/side exits/

Fixed, thanks!

> > +  -- 4 slots, each slot has 2 instructions -- jump and nop).
> > +  -- luacheck: ignore
> > +  if marker > MAX_SPARE_SLOT then end
> > +  if marker > 3 then end
> > +  if marker > 2 then end
> > +  if marker > 1 then end
> > +  if marker > 0 then end
> > +  -- XXX: use `fmod()` to avoid leaving the function and use
> > +  -- stitching here.
> > +  return math.fmod(1, 1)
> > +end
> > +
> > +-- Compile parent trace first.
> > +parent(0)
> > +parent(0)
> > +
> > +local parent_traceno = frontend.gettraceno(parent)
> > +local last_traceno = parent_traceno
> > +
> > +-- Now generate some mcode to forcify long jump with a spare slot.
> > +-- Each iteration provide different addresses and uses a different
> Typo: s/provide/provides/

Fixed, thanks!

> > +-- spare slot. After it compile and execute new side trace.
> Typo: s/After it compile and execute/After that, compiles and executes a/

Fixed.

See the iterative patch below.

===================================================================
diff --git a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
index fdc826cb..62933df9 100644
--- a/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
+++ b/test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
@@ -2,7 +2,7 @@ local tap = require('tap')
 local test = tap.test('fix-mips64-spare-side-exit-patching'):skipcond({
   ['Test requires JIT enabled'] = not jit.status(),
   ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
-  -- Need to fix the MIPS behaviour first.
+  -- We need to fix the MIPS behaviour first.
   ['Disabled for MIPS architectures'] = jit.arch:match('mips'),
 })
 
@@ -18,18 +18,19 @@ jit.opt.start(
   -- Try to compile all compiled paths as early as JIT can.
   'hotloop=1',
   'hotexit=1',
-  -- Allow to use 2000 traces to avoid flushes.
+  -- Allow compilation of up to 2000 traces to avoid flushes.
   'maxtrace=2000',
   -- Allow to compile 8Mb of mcode to be sure the issue occurs.
   'maxmcode=8192',
-  -- Use big mcode area for traces to avoid using different
+  -- Use big mcode area for traces to avoid usage of different
   -- spare slots.
   'sizemcode=256'
 )
 
+-- See the define in the <src/lj_asm_mips.h>.
 local MAX_SPARE_SLOT = 4
 local function parent(marker)
-  -- Use several side exit to fill spare exit space (default is
+  -- Use several side exits to fill spare exit space (default is
   -- 4 slots, each slot has 2 instructions -- jump and nop).
   -- luacheck: ignore
   if marker > MAX_SPARE_SLOT then end
@@ -50,8 +51,9 @@ local parent_traceno = frontend.gettraceno(parent)
 local last_traceno = parent_traceno
 
 -- Now generate some mcode to forcify long jump with a spare slot.
--- Each iteration provide different addresses and uses a different
--- spare slot. After it compile and execute new side trace.
+-- Each iteration provides different addresses and uses a
+-- different spare slot. After that, compiles and executes a new
+-- side trace.
 for i = 1, MAX_SPARE_SLOT + 1 do
   generators.fillmcode(last_traceno, 1024 * 1024)
   parent(i)
===================================================================

<snipped>

> > 2.41.0
> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches]  [PATCH luajit 02/19] test: introduce mcode generator for tests
  2023-08-16 12:55     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 13:06       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16 13:06 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches



Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin
 
  
>Wednesday, 16 August 2023, 16:00 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
> 
>Hi, Maxim!
>Thanks for the review!
>Please, see my replies below.
>
>On 15.08.23, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the patch!
>> Please consider my comments below.
>>
>> On Wed, Aug 09, 2023 at 06:35:51PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> > The test <test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64>
>> > depends on particular offset of mcode for side trace regarding the
>> > parent trace. Before this commit just run some amount of functions to
>> > generate traces to fill the required mcode range. Unfortunately, this
>> > approach is not robust, since sometimes trace is not recorded due to
>> > errors "leaving loop in root trace" observed because of hotcount
>> > collisions.
>> >
>> > This patch introduces the following helpers:
>> > * `frontend.gettraceno(func)` -- returns the traceno for the given
>> > function, assumming that there is compiled trace for its prototype
>> > (i.e. the 0th bytecode is JFUNC).
>> > * `jit.generators.fillmcode(traceno, size)` fills mcode area of the
>> > given size from the given trace. It is useful to generate some mcode
>> > to test jumps to side traces remote enough from the parent.
>> > ---
>> > ...8-fix-side-exit-patching-on-arm64.test.lua | 78 ++----------
>> > test/tarantool-tests/utils/frontend.lua | 24 ++++
>> > test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
>> > 3 files changed, 150 insertions(+), 67 deletions(-)
>> > create mode 100644 test/tarantool-tests/utils/jit/generators.lua
>> >
>> > diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
>> > index 93db3041..678ac914 100644
>> > --- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
>> > +++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
>
><snipped>
>
>> > diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
>> > index 2afebbb2..414257fd 100644
>> > --- a/test/tarantool-tests/utils/frontend.lua
>> > +++ b/test/tarantool-tests/utils/frontend.lua
>
><snipped>
>
>> > diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
>> > new file mode 100644
>> > index 00000000..62b6e0ef
>> > --- /dev/null
>> > +++ b/test/tarantool-tests/utils/jit/generators.lua
>> > @@ -0,0 +1,115 @@
>> > +local M = {}
>> > +
>> > +local jutil = require('jit.util')
>> > +
>> > +local function getlast_traceno()
>> > + return misc.getmetrics().jit_trace_num
>> > +end
>> > +
>> > +-- Convert addr to positive value if needed.
>> > +local function canonize_address(addr)
>> Nit: most of the time, the `canonize` variant is used in theological materials,
>> while the `canonicalize` is more common in the sphere of software development.
>> Feel free to ignore.
>
>Fixed, thanks.
>
>> > + if addr < 0 then addr = addr + 2 ^ 32 end
>> > + return addr
>> > +end
>> > +
>> > +-- Need some storage to avoid functions and traces to be
>> > +-- collected.
>> Typo: s/Need/We need/ or s/Need some storage/Some storage is needed/
>> Typo: s/to be collected/being collected/
>
>Fixed.
>
>> > +local recfuncs = {}
>> > +local last_i = 0
>> > +-- This function generates a table of functions with heavy mcode
>> > +-- payload with tab arithmetics to fill the mcode area from the
>> > +-- one trace mcode by the some given size. This size is usually
>> Typo: s/by the some/by some/
>
>Fixed, thanks!
>
>> > +-- big enough, because we want to check long jump side exits from
>> > +-- some traces.
>> > +-- Assumes, that maxmcode and maxtrace options are set to be sure,
>> Typo: s/that/that the/
>
>Fixed.
>
>> > +-- that we can produce such amount of mcode.
>> > +function M.fillmcode(trace_from, size)
>> > + local mcode, addr_from = jutil.tracemc(trace_from)
>> > + assert(mcode, 'the #1 argument should be an existed trace number')
>> Typo: s/existed/existing/
>
>Fixed, thanks!
>
>> > + addr_from = canonize_address(addr_from)
>> > + local required_diff = size + #mcode
>> > +
>> > + -- Marker to check that traces are not flushed.
>> > + local maxtraceno = getlast_traceno()
>> > + local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
>> > +
>> > + local _, last_addr = jutil.tracemc(maxtraceno)
>> > + last_addr = canonize_address(last_addr)
>> > +
>> > + -- Addresses of traces may increase or decrease depending on OS,
>> > + -- so use absolute diff.
>> > + while math.abs(last_addr - addr_from) > required_diff do
>> > + last_i = last_i + 1
>> > + -- This is a quite heavy workload (though it doesn't look like
>> Typo: s/This is a quite/This is quite a/
>
>Fixed.
>
>> > + -- one at first). Each load from a table is type guarded. Each
>> > + -- table lookup (for both stores and loads) is guarded for
>> > + -- table <hmask> value and presence of the metatable. The code
>> Typo: s/and presence/and the presence/
>
>Fixed.
>
>> > + -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
>> Typo: s/results to/results in/
>
>Fixed.
>
>> > + -- practice.
>> > + local fname = ('fillmcode[%d]'):format(last_i)
>> > + recfuncs[last_i] = assert(loadstring(([[
>> > + return function(src)
>> > + local p = %d
>> Nit: Poor naming, a more descriptive name is preferred.
>
>It doesn't make much sense, because we really don't care about the
>function's content. Since it's just a moved part of the code, I prefer
>to leave it as is.
>
>Ignoring for now.
>
>> > + local tmp = { }
>> > + local dst = { }
>> > + -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
>> Typo: s/as stop/as a stop/
>
>Fixed, thanks!
>
>> > + -- in root trace) errors due to hotcount collisions.
>> > + for i = 1, 5 do
>
><snipped>
>
>> > + local function tnew(p)
>> Nit: same issue with naming.
>
>Ditto.
>
>> > + return {
>
><snipped>
>
>See the iterative patch below:
>
>===================================================================
>diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
>index 62b6e0ef..65abfdaa 100644
>--- a/test/tarantool-tests/utils/jit/generators.lua
>+++ b/test/tarantool-tests/utils/jit/generators.lua
>@@ -7,26 +7,26 @@ local function getlast_traceno()
> end
> 
> -- Convert addr to positive value if needed.
>-local function canonize_address(addr)
>+local function canonicalize_address(addr)
>   if addr < 0 then addr = addr + 2 ^ 32 end
>   return addr
> end
> 
>--- Need some storage to avoid functions and traces to be
>+-- Some storage is needed to avoid functions and traces being
> -- collected.
> local recfuncs = {}
> local last_i = 0
> -- This function generates a table of functions with heavy mcode
> -- payload with tab arithmetics to fill the mcode area from the
>--- one trace mcode by the some given size. This size is usually
>--- big enough, because we want to check long jump side exits from
>--- some traces.
>--- Assumes, that maxmcode and maxtrace options are set to be sure,
>--- that we can produce such amount of mcode.
>+-- one trace mcode by some given size. This size is usually big
>+-- enough, because we want to check long jump side exits from some
>+-- traces.
>+-- Assumes, that the maxmcode and maxtrace options are set to be
>+-- sure, that we can produce such amount of mcode.
> function M.fillmcode(trace_from, size)
>   local mcode, addr_from = jutil.tracemc(trace_from)
>-  assert(mcode, 'the #1 argument should be an existed trace number')
>-  addr_from = canonize_address(addr_from)
>+  assert(mcode, 'the #1 argument should be an existing trace number')
>+  addr_from = canonicalize_address(addr_from)
>   local required_diff = size + #mcode
> 
>   -- Marker to check that traces are not flushed.
>@@ -34,17 +34,17 @@ function M.fillmcode(trace_from, size)
>   local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
> 
>   local _, last_addr = jutil.tracemc(maxtraceno)
>-  last_addr = canonize_address(last_addr)
>+  last_addr = canonicalize_address(last_addr)
> 
>   -- Addresses of traces may increase or decrease depending on OS,
>   -- so use absolute diff.
>   while math.abs(last_addr - addr_from) > required_diff do
>     last_i = last_i + 1
>-    -- This is a quite heavy workload (though it doesn't look like
>+    -- This is quite a heavy workload (though it doesn't look like
>     -- one at first). Each load from a table is type guarded. Each
>     -- table lookup (for both stores and loads) is guarded for
>-    -- table <hmask> value and presence of the metatable. The code
>-    -- below results to ~8Kb of mcode for ARM64 and MIPS64 in
>+    -- table <hmask> value and the presence of the metatable. The
>+    -- code below results in ~8Kb of mcode for ARM64 and MIPS64 in
>     -- practice.
>     local fname = ('fillmcode[%d]'):format(last_i)
>     recfuncs[last_i] = assert(loadstring(([[
>@@ -52,8 +52,8 @@ function M.fillmcode(trace_from, size)
>         local p = %d
>         local tmp = { }
>         local dst = { }
>-        -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
>-        -- in root trace) errors due to hotcount collisions.
>+        -- XXX: use 5 as a stop index to reduce LLEAVE (leaving
>+        -- loop in root trace) errors due to hotcount collisions.
>         for i = 1, 5 do
>           tmp.a = src.a * p   tmp.j = src.j * p   tmp.s = src.s * p
>           tmp.b = src.b * p   tmp.k = src.k * p   tmp.t = src.t * p
>@@ -108,7 +108,7 @@ function M.fillmcode(trace_from, size)
>     if not last_addr then
>       error(FLUSH_ERR)
>     end
>-    last_addr = canonize_address(last_addr)
>+    last_addr = canonicalize_address(last_addr)
>   end
> end
> 
>===================================================================
>
>> > +end
>> > +
>> > +return M
>> > --
>> > 2.41.0
>> Best regards,
>> Maxim Kokryashkin
>> >
>
>--
>Best regards,
>Sergey Kaplun
 



* Re: [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
  2023-08-15 11:27   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:10     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:10 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:53PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> > Sponsored by Cisco Systems, Inc.
> > 
> > (cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)
> > 
> > The software floating point library is used on machines which do not
> > have hardware support for floating point [1]. This patch enables
> > support for such machines in JIT compiler backend for MIPS64.
> Typo: s/in JIT/in the JIT/

Fixed.

> > This includes:
> > * `vm_tointg()` helper is added in <src/vm_mips64.dasm> to convert FP
> >   number to integer with a check for the soft-float support (called from
> >   JIT).
> > * `sfmin/max()` helpers are added in <src/vm_mips64.dasm> for min/max
> >   operations with a check for the soft-float support (called from JIT).
> Typo: s/the soft-float/soft-float/

Fixed.

> > * `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS instead
> >   `LJ_SOFTFP`.
> > * All fp-depending paths are instrumented with `LJ_SOFTFP` or
> Typo: s/fp-depending/fp-dependent/

Fixed.

> >   `LJ_SOFTFP32` macro.
> Typo: s/macro/macros/

Fixed.

> > * The corresponding function calls in <src/lj_ircall.h> are marked as
> >   `XA_FP32`, `XA2_FP32`, i.e. as required extra arguments on the stack
> >   for soft-FP on 32-bit MIPS.
> 
> Shouldn't we also mention the `asm_tobit` function?

I suppose not, since it is still just another implementation for the
SOFTFP && LJ_64 mode.

> > 
> > [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> > 
> > Sergey Kaplun:
> > * added the description for the feature
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>
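
To illustrate the kind of check a number-to-integer helper has to
perform, here is a minimal C sketch. The function
`num_to_int_checked()` is hypothetical and is not the code of
`vm_tointg()`; it only shows the general idea of converting a double
to a 32-bit integer and reporting whether the conversion was exact.

```c
#include <stdint.h>

/*
** Convert n to a 32-bit integer. Returns 1 and stores the result in
** *out if n is an integral value representable as int32_t, otherwise
** returns 0 and leaves *out untouched.
*/
static int num_to_int_checked(double n, int32_t *out)
{
  int32_t i;
  /* Reject NaN and values outside the int32_t range before the cast. */
  if (!(n >= -2147483648.0 && n <= 2147483647.0))
    return 0;
  i = (int32_t)n;  /* Truncates towards zero. */
  if ((double)i != n)
    return 0;  /* A fractional part was lost. */
  *out = i;
  return 1;
}
```

On a soft-float target each floating-point comparison and conversion
in such a helper is performed by library routines instead of FPU
instructions, which is why the JIT calls a dedicated helper.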

> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
  2023-08-15 11:40   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:13     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:13 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:54PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> > Sponsored by Cisco Systems, Inc.
> > 
> > (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
> > 
> > The software floating point library is used on machines which do not
> > have hardware support for floating point [1]. This patch enables
> > support for such machines in the VM for powerpc.
> Typo: s/powerpc/PowerPC/

Fixed, thanks.

> > This includes:
> > * Any loads/storages of double values use load/storage through 32-bit
> Typo: s/storages/stores/ Feel free to ignore, though.

Fixed, thanks.

> >   registers of `lo` and `hi` part of the TValue union.
> > * Macro .FPU is added to skip instructions necessary only for
> >   hard-float operations (load/store floating point registers from/on the
> >   stack, when leave/enter VM, for example).
> > Typo: s/leave\/enter/leaving\/entering/

Fixed, thanks!

> > * Now r25 named as `SAVE1` is used as saved temporary register (used in
> >   different fast functions)
> > * `sfi2d` macro is introduced to convert integer, that represents a
> Typo: s/convert/convert an/

Fixed.

> >   soft-float, to double. Receives destination and source registers, uses
> Typo: s/to double/to a double/

Fixed.

> >   `TMP0` and `TMP1`.
> > * `sfpmod` macro is introduced for soft-float point `fmod` built-in.
> > * `ins_arith` now receives the third parameter -- operation to use for
> >   soft-float point.
> > * `LJ_ARCH_HASFPU`, `LJ_ABI_SOFTFP` macros are introduced to mark that
> >   there is defined `_SOFT_FLOAT` or `_SOFT_DOUBLE`. `LJ_ARCH_NUMMODE` is
> >   set to the `LJ_NUMMODE_DUAL`, when `LJ_ABI_SOFTFP` is true.
> > 
> > Support of soft-float point for the JIT compiler will be added in the
> > next patch.
> > 
> > [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> > 
> > Sergey Kaplun:
> > * added the description for the feature
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>
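
The handling of doubles through the `lo` and `hi` 32-bit parts can be
sketched in C as follows. This is an illustration under assumptions:
the `NumWords` type and both functions are hypothetical names, and the
code assumes an IEEE 754 binary64 `double`. It is not taken from the
VM sources.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical pair of 32-bit words holding one double. */
typedef struct NumWords {
  uint32_t hi;  /* Sign, exponent and upper mantissa bits. */
  uint32_t lo;  /* Lower mantissa bits. */
} NumWords;

/* Split a double into two 32-bit words without using FP registers. */
static NumWords split_double(double d)
{
  uint64_t bits;
  NumWords w;
  memcpy(&bits, &d, sizeof(bits));
  w.hi = (uint32_t)(bits >> 32);
  w.lo = (uint32_t)(bits & 0xffffffffu);
  return w;
}

/* Reassemble the double from its two 32-bit words. */
static double join_double(NumWords w)
{
  uint64_t bits = ((uint64_t)w.hi << 32) | w.lo;
  double d;
  memcpy(&d, &bits, sizeof(d));
  return d;
}
```

A soft-float VM moves values around in exactly this word-wise fashion,
since there are no floating-point registers to load into or store
from.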

> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
  2023-08-15 11:46   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:21     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:21 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
See my answers below.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few typos and a single question below.
> On Wed, Aug 09, 2023 at 06:35:55PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> > Sponsored by Cisco Systems, Inc.
> > 
> > (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
> > 
> > The software floating point library is used on machines which do not
> > have hardware support for floating point [1]. This patch enables
> > support for such machines in the JIT compiler for powerpc.
> Typo: s/powerpc/PowerPC/

Fixed, thanks!

> > This includes:
> > * All fp-depending paths are instrumented with `LJ_SOFTFP` macro.
> Typo: s/fp-depending/fp-dependent/

Fixed.

> > * `asm_sfpmin_max()` is introduced for min/max operations on soft-float
> >   point.
> > * `asm_sfpcomp()` is introduced for soft-float point comparisons.
> > 
> > [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> > 
> > Sergey Kaplun:
> > * added the description for the feature
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>

> > +#if LJ_SOFTFP
> > +  case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
> > +  case IR_STRTO:
> Why are those fp-dependent? Should we write an explanation?

I suppose that lo is used here for a possible half of an FP value.
Also, there is no need to use it on hard-float machines.
I suppose that the comment is OK as is.
The same applies to the stores.

> > +    if (!uselo)
> > +      ra_allocref(as, ir->op1, RSET_GPR);  /* Mark lo op as used. */
> > +    break;
> > +#endif
> >    case IR_CALLN:
> > +  case IR_CALLS:
> >    case IR_CALLXS:
> >      if (!uselo)
> >        ra_allocref(as, ir->op1, RID2RSET(RID_RETLO));  /* Mark lo op as used. */
> >      break;
> > +#if LJ_SOFTFP
> > +  case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
> > +#endif
> >    case IR_CNEWI:
> >      /* Nothing to do here. Handled by lo op itself. */
> >      break;
> > @@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
> >      if ((sn & SNAP_NORESTORE))
> >        continue;
> >      if (irt_isnum(ir->t)) {

<snipped>

> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
  2023-08-15  9:36   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:25   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 13:25 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey

Thanks for the patch! LGTM

On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic.
>
> (cherry-picked from commit 7381b620358c2561e8690149f1d25828fdad6675)
>
> Without the aforementioned checks, some non-branch instructions may be
> interpreted as some branch due to memory address collisions. This patch
> adds the corresponding comparisons masked values with instruction
> opcodes used in the LuaJIT:
> * `MIPSI_BEQ` for `beq` and `bne`,
> * `MIPSI_BLTZ` for `bltz`, `blez`, `bgtz` and `bgez`,
> * `MIPSI_BC1F` for `bc1f` and `bc1t`,
> see <src/lj_target_mips.h> and MIPS Instruction Set Manual [1] for
> details.
>
> To reproduce this failure, we need specific memory mapping, so testcase
> is omitted.
>
> Since MIPS architecture is not supported by Tarantool (at the moment)
> this patch is not necessary for backport. OTOH, it gives to us the
> following benefits:
> * Be in sync with the LuaJIT upstream not only for x86_64, arm64
>    architectures.
> * Avoid conflicts during the future backporting.
> So, it's more useful to backport some of the patches to avoid conflicts
> with the future patch series.
>
> [1]: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#8825
> ---
>   src/lj_asm_mips.h | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 03417013..03215821 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -2472,7 +2472,11 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
>     MCode tjump = MIPSI_J|(((uintptr_t)target>>2)&0x03ffffffu);
>     for (p++; p < pe; p++) {
>       if (*p == exitload) {  /* Look for load of exit number. */
> -      if (((p[-1] ^ (px-p)) & 0xffffu) == 0) {  /* Look for exitstub branch. */
> +      /* Look for exitstub branch. Yes, this covers all used branch variants. */
> +      if (((p[-1] ^ (px-p)) & 0xffffu) == 0 &&
> +	  ((p[-1] & 0xf0000000u) == MIPSI_BEQ ||
> +	   (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ ||
> +	   (p[-1] & 0xffe00000u) == MIPSI_BC1F)) {
>   	ptrdiff_t delta = target - p;
>   	if (((delta + 0x8000) >> 16) == 0) {  /* Patch in-range branch. */
>   	patchbranch:
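
The opcode part of the check added by the patch can be tried in
isolation with the following C sketch. The `OP_*` values are the base
encodings derived from the MIPS32 instruction formats and are assumed
to match the corresponding `MIPSI_*` constants; the function name is
made up for the example. The sketch covers only the opcode masks, not
the comparison of the low 16 bits with the exit stub offset.

```c
#include <stdint.h>

/* Base encodings per the MIPS32 instruction formats (assumed values). */
#define OP_BEQ   0x10000000u
#define OP_BLTZ  0x04000000u
#define OP_BC1F  0x45000000u

/*
** Return non-zero if the instruction word matches one of the masked
** branch encodings used in the patch above.
*/
static int is_exit_branch(uint32_t ins)
{
  return (ins & 0xf0000000u) == OP_BEQ ||
         (ins & 0xfc1e0000u) == OP_BLTZ ||
         (ins & 0xffe00000u) == OP_BC1F;
}
```

Without such a test any instruction whose low 16 bits happen to equal
the expected offset (an `addiu` with a matching immediate, for
example) would be taken for an exit branch.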


* Re: [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
  2023-08-15 11:58   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:40     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:40 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few typos below.
> On Wed, Aug 09, 2023 at 06:35:56PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > This patch is a follow-up for the commit
> > a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
> > timer from lj_profile"). It moves the timer machinery to the separate
> Typo: s/to the/to a/

Fixed.

> > module. Unfortunately, the `profile_{un}lock()` calls for Windows and
> > PS3 wasn't updated to access `lj_profile_timer` structure instead of
> Typo: s/wasn't/weren't/

Fixed, thanks.

> > `ProfileState`.
> > 
> > Also, it is a follow-up to the commit
> > f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
> > section"). Since this commit the system-dependent header <unistd.h> and
> > `write()`, `open()`, `close()` functions are used. They are undefining
> Typo: s/undefining/undefined/

Fixed.

> > on Windows, so this leads to error during the build.
> Typo: s/error/errors/

Fixed.

> > 
> > This patch fixes the aforementioned misbehaviour. After it our fork may
> > be built on Windows at least.
> > ---

<snipped>

> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
  2023-08-15 12:09   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:50     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:50 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:35:57PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Ben Pye.
> > 
> > (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
> > 
> > This patch adds partial support for the Universal Windows Platform [1]
> > in LuaJIT.
> > This includes:
> > * `LJ_TARGET_UWP` is introduced to mark that target is Universal Windows
> > Typo: s/is Universal/is the Universal/

Fixed.

> >   Platform.
> > * `LJ_WIN_VALLOC()` macro is introduced to use instead of
> Typo: s/to use/to be used/

Fixed.

> >   `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
> > * `LJ_WIN_VPROTECT()` macro is introduced to use instead of
> Typo: s/to use/to be used/

Fixed.

> >   `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
> > * `LJ_WIN_LOADLIBA()` macro is introduced to use instead of
> Typo: s/to use/to be used/

Fixed.

> >   `LoadLibraryExA()` [6] (custom implementation using
> >   `LoadPackagedLibrary()` [7] for UWP).
> > 
> > Note that the following features are not implemented for UWP:
> > * `io.popen()`.
> > * LuaJIT profiler's (`jit.p`) timer for Windows has not very high
> >   resolution since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are
> Typo: s/not very high/a low/

Fixed.

> >   not used, because the <winmm.dll> library isn't loaded.
> > 
> > [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
> > [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
> > [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
> > [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
> > [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
> > [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
> > [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
> > [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
> > [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
> > 
> > Sergey Kaplun:
> > * added the description for the feature
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>

> > +#if LJ_TARGET_UWP
> > +  case H_(a40f0bcb,a40f0bcb): b = 1; break;  /* uwp */
> >  #endif
> It is not obvious what happens here and it is not mentioned in the commit message.
> Please add a description of this change too.

Added.

> >    case H_(3af93066,1f001464): b = 1; break;  /* le/be */

The new commit message is the following:

| Windows: Add UWP support, part 1.
|
| Contributed by Ben Pye.
|
| (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
|
| This patch adds partial support for the Universal Windows Platform [1]
| in LuaJIT.
| This includes:
| * `LJ_TARGET_UWP` is introduced to mark that target is the Universal
|   Windows Platform.
| * `LJ_WIN_VALLOC()` macro is introduced to be used instead of
|   `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
| * `LJ_WIN_VPROTECT()` macro is introduced to be used instead of
|   `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
| * `LJ_WIN_LOADLIBA()` macro is introduced to be used instead of
|   `LoadLibraryExA()` [6] (custom implementation using
|   `LoadPackagedLibrary()` [7] for UWP).
| * Now `ffi.abi()` also provides information about "uwp" parameter for
|   target ABI.
|
| Note that the following features are not implemented for UWP:
| * `io.popen()`.
| * LuaJIT profiler's (`jit.p`) timer for Windows has a low resolution
|   since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are not used,
|   because the <winmm.dll> library isn't loaded.
|
| [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
| [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
| [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
| [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
| [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
| [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
| [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
| [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
| [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
|
| Sergey Kaplun:
| * added the description for the feature
|
| Part of tarantool/tarantool#8825


<snipped>

> > 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
  2023-08-15 13:07   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:52     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 17:04     ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:52 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> 
> On Wed, Aug 09, 2023 at 06:35:58PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > (cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)
> > 
> > This patch refactors FFI parsing of supported C attributes and pragmas,
> > `ffi.abi()` parameter check. It replaces usage of comparison (with
> Typo: s/usage/the usage/

Fixed.

> > hardcoded string hashes) with search in the given string with the
> Typo: s/with search/with a search/

Fixed.

> > format: "\XXXattribute1\XXXattribute2", where `\XXX` is the length of
> > "attribute" name.
> > 
> > Sergey Kaplun:
> > * added the description for the commit
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>

> > 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-15 13:17   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 13:59     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 13:59 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM as trivial, except for a few comments regarding the commit message below.
> On Wed, Aug 09, 2023 at 06:36:00PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > (cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)
> > 
> > This patch adds the `/* fallthrough */` where it may trigger the
> > `-Wimplicit-fallthrough` [1] warning. Some cases still not covered by
> Typo: s/cases still/cases are still/

Fixed, thanks!

> > this comment and will be fixed in the future commits.
> Typo: s/in the/in/

Fixed.

> > 
> > [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> > 
> > Sergey Kaplun:
> > * added the description for the commit
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>

> > 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
  2023-08-15 13:21   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:01     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:01 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Rephrased the commit message, as you've suggested.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM as trivial, except for the single comment regarding the commit message below.
> On Wed, Aug 09, 2023 at 06:36:01PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
> > 
> > This patch adds the `/* fallthrough */` comments elsewhere, where it was
> > missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> > is trigerred.
> Since there are no 'comments', but the single 'comment', I believe a better phrasing
> would be:
> | This patch adds the `/* fallthrough */` comment to dynasm/dasm_arm64.h, so the
> | `-Wimplicit-fallthrough` [1] warning is not triggered anymore for the ARM64 build.

Fixed, thanks!

> > 
> > [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> > 
> > Sergey Kaplun:
> > * added the description for the commit
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>

> > 2.41.0
> > 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-15 13:25   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:08     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:08 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM as trivial, except for a few nits, regarding the commit message.
> On Wed, Aug 09, 2023 at 06:36:02PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > (cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)
> > 
> > This patch adds the `/* fallthrough */` comments elsewhere, where it was
> Typo: s/where it was/where they were/

Fixed.

> > missing for the ARM build, so the `-Wimplicit-fallthrough` [1] warning
> > is trigerred.
> Typo: s/is trigerred/is not triggered/

Fixed, thanks!

> > 
> > Also, this commits sets the correspoinding flag in the
> Typo: s/commits/commit/
> Typo: s/in the/in/

Fixed.

> > <cmake/SetTargetFlags.cmake>.
> > 
> > [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
> > 
> > Sergey Kaplun:
> > * added the description for the commit
> > 
> > Part of tarantool/tarantool#8825
> > ---

Also added the following patch to be consistent with our code style in
CMake.
===================================================================
diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
index d309989e..d6ee1693 100644
--- a/cmake/SetTargetFlags.cmake
+++ b/cmake/SetTargetFlags.cmake
@@ -9,7 +9,7 @@
 include(CheckUnwindTables)
 
 # Clang does not recognize comment markers.
-if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
+if(CMAKE_C_COMPILER_ID STREQUAL "GNU"
     AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
   AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
 endif()
===================================================================

<snipped>

> > 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
  2023-08-15 13:35   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:20     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 20:13       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:20 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
> 
> On Wed, Aug 09, 2023 at 06:36:03PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Thanks to Sergey Ostanevich.
> > 
> > (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
> > 
> > This patch just reverts the commit
> > 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
> > debug.getinfo(1,'>S')") and applies the one from the main repo for the
> Typo: s/for the/for/

Fixed.

> > consistency with the upstream.
> > ---
> >  src/lj_debug.c | 16 ++++++----------
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> Since there were no test with the original fix, it would be nice to
> add one.

Added, see the iterative diff below:

===================================================================
diff --git a/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
new file mode 100644
index 00000000..a50b80e4
--- /dev/null
+++ b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
@@ -0,0 +1,13 @@
+local tap = require('tap')
+
+-- Test file to demonstrate crash in the `debug.getinfo()` call.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/509.
+local test = tap.test('lj-509-debug-getinfo-arguments-check.test.lua')
+test:plan(2)
+
+-- '>' expects to have an extra argument on the stack.
+local res, err = pcall(debug.getinfo, 1, '>S')
+test:ok(not res, 'check result of the call with invalid arguments')
+test:like(err, 'bad argument', 'check the error message')
+
+test:done(true)
===================================================================

> > 

<snipped>

> > 2.41.0
> Best regards,
> Maxim Kokryashkin
> > 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
  2023-08-15 14:07   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:22     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:22 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Fixed your comments inline.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> On Wed, Aug 09, 2023 at 06:36:04PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Thanks to Yichun Zhang.
> > 
> > (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
> > 
> > This patch is predecessor for the commit
> Typo: s/is predecessor for the/is the predecessor to/

Fixed.

> > 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> > check.") It tries to fix the issue, when `J->baseslot == LJ_MAX_JSLOTS`,
> > that leading to the assertion failure. Since the predecessor patch,
> Typo: s/leading/leads/

Fixed, thanks!

> > there are no places, that can lead to the condition failure, since we
> > always check that new baseslot + framesize (+ vargframe) >=
> > `LJ_MAX_JSLOTS`. As far as minimum framesize is 1 (see <src/lj_parse.c>
> Typo: s/as minimum/as the minimum/

Fixed.

> > for details), we can't obtain this assertion failure. This patch is
> > added for the consistency with the upstream.
> Typo: s/the consistency/consistency/

Fixed.

> > 
> > Since the predecessor patch fixes the issue, there is no new test case
> > to add.
> > 
> > Sergey Kaplun:
> > * added the description for the problem
> > 
> > Part of tarantool/tarantool#8825
> > ---
> >  src/lj_record.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/lj_record.c b/src/lj_record.c
> > index 02d9db9e..6030f77c 100644
> > --- a/src/lj_record.c
> > +++ b/src/lj_record.c

<snipped>

> > -- 
> > 2.41.0
> > 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
  2023-08-15 10:14   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:32   ` Sergey Bronnikov via Tarantool-patches
  2023-08-16 15:20     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 14:32 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey


Thanks for the patch!

Sergey

On 8/9/23 18:35, Sergey Kaplun wrote:

<snipped>


> ls/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> index 2afebbb2..414257fd 100644
> --- a/test/tarantool-tests/utils/frontend.lua
> +++ b/test/tarantool-tests/utils/frontend.lua
> @@ -1,6 +1,10 @@
>   local M = {}
>   
>   local bc = require('jit.bc')
> +local jutil = require('jit.util')
> +local vmdef = require('jit.vmdef')
> +local bcnames = vmdef.bcnames
> +local band, rshift = bit.band, bit.rshift
>   
>   function M.hasbc(f, bytecode)
>     assert(type(f) == 'function', 'argument #1 should be a function')
> @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
>     return hasbc
>   end
>   
> +-- Get traceno of the trace assotiated for the given function.
> +function M.gettraceno(func)
> +  assert(type(func) == 'function', 'argument #1 should be a function')
> +
> +  -- The 0th BC is the header.
> +  local func_ins = jutil.funcbc(func, 0)
> +  local BC_NAME_LENGTH = 6
> +  local RD_SHIFT = 16


Nit: AFAIK, we usually leave a comment with the source of such constants.



<snipped>


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
  2023-08-15 14:38   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 14:52     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 14:52 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
See my answers below.

On 15.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
> 
> On Wed, Aug 09, 2023 at 06:36:05PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > (cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)
> > 
> > This commit fixes possible integer overflow of the separator's length
> Typo: s/possible/a possible/

Fixed.

> > counter during parsing long strings. It may lead to the fact, that
> > parser considers a string with unbalanced long brackets to be correct.
> Typo: s/parser/the parser/

Fixed.

> > Since this is pointless to parse too long string separators in the hope,
> Typo: s/this is/it is/

Fixed.

> > that the string is correct, just use hardcoded limit (2 ^ 25 is enough).
> Typo: s/use hardcoded/use the hardcoded/

Fixed.

> > 
> > Be aware that this limit is different for Lua 5.1.
> > 
> > We can't check the string overflow itself without a really large file,
> > because the ERR_MEM error will be raised, due to the string buffer
> > reallocations during parsing. Keep such huge file in the repo is
> Typo: s/Keep such/Keeping such a/

Fixed.

> > pointless, so just check that we don't parse long string after
> Typo: s/long string/long strings/

Fixed.

> > aforementioned separator length.
> Typo: s/aforementioned/the aforementioned/

Fixed.

> > 
> > Sergey Kaplun:
> > * added the description and the test for the problem
> > 
> > Part of tarantool/tarantool#8825
> > ---
> >  src/lj_lex.c                                  |  2 +-
> >  .../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
> >  2 files changed, 32 insertions(+), 1 deletion(-)
> >  create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> > 
> > diff --git a/src/lj_lex.c b/src/lj_lex.c
> > index 52856912..c66660d7 100644
> > --- a/src/lj_lex.c
> > +++ b/src/lj_lex.c

<snipped>

> > diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> > new file mode 100644
> > index 00000000..fda69d17
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> > @@ -0,0 +1,31 @@
> > +local tap = require('tap')
> > +
> > +-- Test to check that we avoid parsing of too long separator
> Typo: s/parsing of/parsing/
> Typo: s/separator/separators/

Fixed.

> > +-- for long strings.
> > +-- See also the discussion in the
> > +-- https://github.com/LuaJIT/LuaJIT/issues/812.
> > +
> > +local test = tap.test('lj-812-too-long-string-separator'):skipcond({
> > +  ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
> Please write a more detailed description of how it can be tested for non-GC64 build
> and why it is disabled now, as we have discussed offline.

Added, see the diff below.

> 
> > +})
> > +test:plan(2)
> > +
> > +-- We can't check the string overflow itself without a really
> > +-- large file, because the ERR_MEM error will be raised, due to
> > +-- the string buffer reallocations during parsing.
> > +-- Keep such huge file in the repo is pointless, so just check
> > +-- that we don't parse long string after some separator length.
> > +-- Be aware that this limit is different for Lua 5.1.
> Please fix the same typos as in the commit message here.

Fixed.

> > +
> > +-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
> > +local separator = string.rep('=', 0x20000000 + 1)
> > +local test_str = ('return [%s[]%s]'):format(separator, separator)
> > +
> > +local f, err = loadstring(test_str, 'empty_str_f')
> > +test:ok(not f, 'correct status when parsing string with too long separator')
> > +
> > +-- Check error message.
> > +test:ok(tostring(err):match('invalid long string delimiter'),
> > +        'correct error when parsing string with too long separator')

Also, changed this part to use `test:like()`, since it is more readable
and has the same behaviour.

See the iterative patch below:

===================================================================
diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
index fda69d17..380e26f0 100644
--- a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
+++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
@@ -1,11 +1,17 @@
 local tap = require('tap')
 
--- Test to check that we avoid parsing of too long separator
--- for long strings.
+-- Test to check that we avoid parsing too long separators for
+-- long strings.
 -- See also the discussion in the
 -- https://github.com/LuaJIT/LuaJIT/issues/812.
 
 local test = tap.test('lj-812-too-long-string-separator'):skipcond({
+  -- In non-GC64 mode, we get the OOM error since we need memory
+  -- for the string to load and the same amount of memory for the
+  -- string buffer. So, the only option is to create a big file
+  -- in the repo and keep it, or generate it and remove each time.
+  -- These options are kinda pointless, so let's check the
+  -- behaviour only for GC64 mode.
   ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
 })
 test:plan(2)
@@ -13,8 +19,9 @@ test:plan(2)
 -- We can't check the string overflow itself without a really
 -- large file, because the ERR_MEM error will be raised, due to
 -- the string buffer reallocations during parsing.
--- Keep such huge file in the repo is pointless, so just check
--- that we don't parse long string after some separator length.
+-- Keeping such a huge file in the repo is pointless, so just
+-- check that we don't parse long strings after some separator
+-- length.
 -- Be aware that this limit is different for Lua 5.1.
 
 -- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
@@ -25,7 +32,7 @@ local f, err = loadstring(test_str, 'empty_str_f')
 test:ok(not f, 'correct status when parsing string with too long separator')
 
 -- Check error message.
-test:ok(tostring(err):match('invalid long string delimiter'),
-        'correct error when parsing string with too long separator')
+test:like(err, 'invalid long string delimiter',
+          'correct error when parsing string with too long separator')
 
 test:done(true)
===================================================================

> > +
> > +test:done(true)
> > -- 
> > 2.41.0
> > 
> Best regards,
> Maxim Kokryashkin

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
  2023-08-15 11:13   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 15:02   ` Sergey Bronnikov via Tarantool-patches
  2023-08-16 15:32     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 15:02 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey


Thanks for the patch!

The test passes after reverting the patch, and I suspect this is expected
because the behaviour was broken for MIPS only, right?

See a minor comment below.


On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic.
>
> (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
>
> `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
> check in `asm_sparejump_setup()`, so mcode bottom is not updated.
>
> This patch fixes check of the MCLink offset from the mcbot.
> Nevertheless, the emitting of spare jump slots is still incorrect, so
> the introduced test still fails due to incorrect iteration through the
> sparce table (the last slot is out of mcode range).
"sparce" -> "sparse"?
<snipped>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
  2023-08-16  9:01   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 15:17     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 20:14       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:17 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Please, see my answers below.

On 16.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by James Cowgill.
> > 
> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
> > 
> > The issue is observed for the following merged IRs:
> > |    p64 HREF   0001  "a"            ; or other keys
> > | >  p64 EQ     0002  [0x4002d0c528] ; nilnode
> > Sometimes, when we need to rematerialize a constant during evicting of
> Typo: s/during evicting/during the eviction/

Fixed.

> > the register. So, the instruction related to constant rematerialization
> Sometimes happens what? The sentence looks kind of chopped.

The "when" is misleading here. Dropped it.

> > is placed in the delay branch slot, which suppose to contain the loads
> Typo: s/which suppose/which is supposed/

Fixed.

> > of trace exit number to the `$ra` register. The resulting assembly is
> Typo: s/number/numbers/ (because of `loads` being in the plural form)

Fixed.

> > the following (for example):
> > | beq     ra, r1, 0x400abee9b0  ->exit
> > | lui     r1, 65531   ; delay slot without setting of the `ra`
> > This leading to the assertion failure during trace exit in
> Typo: s/leading/leads/

Fixed.

> > `lj_trace_exit()`, since a trace number is incorrect.
> > 
> > This patch moves the constant register allocations above the main
> > instruction emitting code in `asm_href()`.
> AFAICS, It is not just moved, the register allocation logic has changed too.
> Before the patch, there were a few cases of inplace emissions, which
> disappeared after the patch. I believe it is important to mention that too, along
> with a more detailed description of the logic changes.

No, the logic is the same; we just choose the register earlier.
Since the `cmp64` register is now used everywhere, there is no need for
duplicated code in the if / else if / else branches.

> > 
> > Sergey Kaplun:
> > * added the description and the test for the problem
> > 
> > Part of tarantool/tarantool#8825
> > ---
> >  src/lj_asm_mips.h                             |  42 +++++---
> >  ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
> >  2 files changed, 126 insertions(+), 17 deletions(-)
> >  create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > 
> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> > index c27d8413..23ffc3aa 100644
> > --- a/src/lj_asm_mips.h
> > +++ b/src/lj_asm_mips.h
> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> >    Reg dest = ra_dest(as, ir, allow);
> >    Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
> >    Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
> > +#if LJ_64
> > +  Reg cmp64 = RID_NONE;
> > +#endif
> >    IRRef refkey = ir->op2;
> >    IRIns *irkey = IR(refkey);
> >    int isk = irref_isk(refkey);
> > @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> >  #endif
> >    tmp2 = ra_scratch(as, allow);
> >    rset_clear(allow, tmp2);
> > +#if LJ_64
> > +  if (LJ_SOFTFP || !irt_isnum(kt)) {
> > +    /* Allocate cmp64 register used for 64-bit comparisons */
> > +    if (LJ_SOFTFP && irt_isnum(kt)) {
> > +      cmp64 = key;
> > +    } else if (!isk && irt_isaddr(kt)) {
> > +      cmp64 = tmp2;
> > +    } else {
> > +      int64_t k;
> > +      if (isk && irt_isaddr(kt)) {
> > +	k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > +      } else {
> > +	lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > +	k = ~((int64_t)~irt_toitype(ir->t) << 47);
> > +      }
> > +      cmp64 = ra_allock(as, k, allow);
> > +      rset_clear(allow, cmp64);
> > +    }
> > +  }
> > +#endif
> >  
> >    /* Key not found in chain: jump to exit (if merged) or load niltv. */
> >    l_end = emit_label(as);
> > @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> >      emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
> >      emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
> >      emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > -  } else if (LJ_SOFTFP && irt_isnum(kt)) {
> > -    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > -  } else if (irt_isaddr(kt)) {
> > -    Reg refk = tmp2;
> > -    if (isk) {
> > -      int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > -      refk = ra_allock(as, k, allow);
> > -      rset_clear(allow, refk);
> > -    }
> > -    emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> >    } else {
> > -    Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> > -    rset_clear(allow, pri);
> > -    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > -    emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> > +    emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
> > +    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> >    }
> >    *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
> >    if (!isk && irt_isaddr(kt)) {
> > diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > new file mode 100644
> > index 00000000..8c75e69c
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > @@ -0,0 +1,101 @@
> > +local tap = require('tap')
> > +-- Test file to demonstrate the incorrect JIT behaviour for HREF
> > +-- IR compilation on mips64.
> > +-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
> > +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +-- To reproduce the issue we need to compile a trace with
> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To
> Typo: s/constant/a constant/

Fixed.

> > +-- prevent an `IR_HREFK` to be emitted instead, we need a table
> Typo: s/to be/from being/

Fixed.

> > +-- with a huge hash part. Delta of address between the start of
> Typo: s/Delta/The delta/

Fixed.

> > +-- the hash part of the table and the current node to lookup must
> > +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
> Typo: s/more/greater/

Fixed.

> > +-- See <src/lj_record.c>, for details.
> > +-- XXX: This constant is well suited to prevent test to be flaky,
> Typo: s/to be/from being/

Fixed.

> > +-- because the aforementioned delta is always large enough.
> > +-- Also, this constant avoids table rehashing, when inserting new
> > +-- keys.
> > +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
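
To put the bound from the comment above into numbers, here is a small
C sketch that works out `(1024 * 64 - 1) * sizeof(Node)`. It is not code
from the patch: the 24-byte `NODE_SIZE` is an assumption for a 64-bit
build (the real value is whatever `sizeof(Node)` is for the target), and
the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed sizeof(Node) for a 64-bit build; not taken from the patch. */
#define NODE_SIZE 24u

/* The bound quoted in the test comment: (1024 * 64 - 1) * sizeof(Node). */
static uint64_t hrefk_max_offset(uint64_t node_size)
{
  return (uint64_t)(1024 * 64 - 1) * node_size;
}

/* Byte offset of the node with the given index from the hash part start. */
static uint64_t node_offset(uint64_t index, uint64_t node_size)
{
  return index * node_size;
}

/* Non-zero if the node at this index lies beyond the quoted bound. */
static int exceeds_hrefk_bound(uint64_t index, uint64_t node_size)
{
  return node_offset(index, node_size) > hrefk_max_offset(node_size);
}
```

Under that assumption the bound is 1572840 bytes, so only nodes with an
index above 65535 are far enough from the start of the hash part.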
> > +
> > +-- XXX: don't set `hotexit` to prevent compilation of trace after
> > +-- exiting the main test cycle.
> I suggest rephrasing it the following way:
> | The `hotexit` option is not set to prevent the compilation of traces
> | after the emission of the main test cycle.

Rephrased.

> > +jit.opt.start('hotloop=1')
> > +
> > +-- Don't use `table.new()`, here by intence -- this leads to the
> Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/

Fixed.

> > +-- allocation failure for the mcode memory, so traces are not
> > +-- compiled.
> > +local filled_tab = {}
> > +-- Filling-up the table with GC values to minimize the amount of
> Typo: s/Filling-up/Fill up/

Fixed.

> > +-- hash collisions and increase delta between the start of the
> Typo: s/delta/the delta/

Fixed.

> > +-- hash part of the table and currently stored node.
> Typo: s/currently/the currently/

Fixed.

> > +for _ = 1, N_HASH_FIELDS do
> > +  filled_tab[1LL] = 1
> > +end
> > +
> > +-- luacheck: no unused
> > +local tab_value_a
> > +local tab_value_b
> > +local tab_value_c
> > +local tab_value_d
> > +local tab_value_e
> > +local tab_value_f
> > +local tab_value_g
> > +local tab_value_h
> > +local tab_value_i
> > +
> > +-- The function for this trace has a bunch of the following IRs:
> > +--    p64 HREF   0001  "a"            ; or other keys
> > +-- >  p64 EQ     0002  [0x4002d0c528] ; nilnode
> > +-- Sometimes, when we need to rematerialize a constant during
> > +-- evicting of the register. So, the instruction related to
> Typo: s/evicting/the eviction/

Fixed.

> Again, sometimes happens what?

The "when" is misleading here. Dropped it.

> > +-- constant rematerialization is placed in the delay branch slot,
> > +-- which suppose to contain the loads of trace exit number to the
> Typo: s/which suppose/which is supposed/

Fixed.

> Typo: s/number/numbers/

Fixed.

> > +-- `$ra` register. This leading to the assertion failure during
> Typo: s/leading/leads/

Fixed.

> > +-- trace exit in `lj_trace_exit()`, since a trace number is
> > +-- incorrect. The amount of the side exit to check is empirical
> Typo: s/exit/exits/

Fixed.

> > +-- (even a little bit more, than necessary just in case).
> Typo: s/more/greater/

Fixed.

> > +local function href_const(tab)
> > +  tab_value_a = tab.a
> > +  tab_value_b = tab.b
> > +  tab_value_c = tab.c
> > +  tab_value_d = tab.d
> > +  tab_value_e = tab.e
> > +  tab_value_f = tab.f
> > +  tab_value_g = tab.g
> > +  tab_value_h = tab.h
> > +  tab_value_i = tab.i
> > +end
> > +
> > +-- Compile main trace first.
> Typo: s/main/the main/

Fixed.

> > +href_const(filled_tab)
> > +href_const(filled_tab)
> > +
> > +-- Now brute-force side exits to check that they are compiled
> > +-- correct. Take side exits in the reverse order to take a new
> Typo: s/correct/correctly/
> Typo: s/the reverse/reverse/

Fixed.

<snipped>

See the iterative patch below:

===================================================================
diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
index 8c75e69c..b4ee9e2b 100644
--- a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
@@ -9,29 +9,29 @@ local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
 test:plan(1)
 
 -- To reproduce the issue we need to compile a trace with
--- `IR_HREF`, with a lookup of constant hash key GC value. To
--- prevent an `IR_HREFK` to be emitted instead, we need a table
--- with a huge hash part. Delta of address between the start of
--- the hash part of the table and the current node to lookup must
--- be more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- `IR_HREF`, with a lookup of a constant hash key GC value. To
+-- prevent an `IR_HREFK` from being emitted instead, we need a
+-- table with a huge hash part. The delta of address between the
+-- start of the hash part of the table and the current node to
+-- lookup must be greater than `(1024 * 64 - 1) * sizeof(Node)`.
 -- See <src/lj_record.c>, for details.
--- XXX: This constant is well suited to prevent test to be flaky,
--- because the aforementioned delta is always large enough.
+-- XXX: This constant is well suited to prevent test from being
+-- flaky, because the aforementioned delta is always large enough.
 -- Also, this constant avoids table rehashing, when inserting new
 -- keys.
 local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
 
--- XXX: don't set `hotexit` to prevent compilation of trace after
--- exiting the main test cycle.
+-- XXX: The `hotexit` option is not set to prevent the compilation
+-- of traces after the emission of the main test cycle.
 jit.opt.start('hotloop=1')
 
--- Don't use `table.new()`, here by intence -- this leads to the
--- allocation failure for the mcode memory, so traces are not
+-- `table.new()` is not used here by intention -- this leads to
+-- the allocation failure for the mcode memory, so traces are not
 -- compiled.
 local filled_tab = {}
--- Filling-up the table with GC values to minimize the amount of
--- hash collisions and increase delta between the start of the
--- hash part of the table and currently stored node.
+-- Fill up the table with GC values to minimize the amount of hash
+-- collisions and increase the delta between the start of the hash
+-- part of the table and the currently stored node.
 for _ = 1, N_HASH_FIELDS do
   filled_tab[1LL] = 1
 end
@@ -50,14 +50,14 @@ local tab_value_i
 -- The function for this trace has a bunch of the following IRs:
 --    p64 HREF   0001  "a"            ; or other keys
 -- >  p64 EQ     0002  [0x4002d0c528] ; nilnode
--- Sometimes, when we need to rematerialize a constant during
--- evicting of the register. So, the instruction related to
+-- Sometimes, we need to rematerialize a constant during the
+-- eviction of the register. So, the instruction related to
 -- constant rematerialization is placed in the delay branch slot,
--- which suppose to contain the loads of trace exit number to the
--- `$ra` register. This leading to the assertion failure during
--- trace exit in `lj_trace_exit()`, since a trace number is
--- incorrect. The amount of the side exit to check is empirical
--- (even a little bit more, than necessary just in case).
+-- which is supposed to contain the load of the trace exit number
+-- to the `$ra` register. This leads to the assertion failure
+-- during trace exit in `lj_trace_exit()`, since a trace number is
+-- incorrect. The amount of the side exits to check is empirical
+-- (even a little bit greater, than necessary just in case).
 local function href_const(tab)
   tab_value_a = tab.a
   tab_value_b = tab.b
@@ -70,13 +70,13 @@ local function href_const(tab)
   tab_value_i = tab.i
 end
 
--- Compile main trace first.
+-- Compile the main trace first.
 href_const(filled_tab)
 href_const(filled_tab)
 
 -- Now brute-force side exits to check that they are compiled
--- correct. Take side exits in the reverse order to take a new
--- side exit each time.
+-- correctly. Take side exits in reverse order to take a new side
+-- exit each time.
 filled_tab.i = 'i'
 href_const(filled_tab)
 filled_tab.h = 'h'
===================================================================

> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
  2023-08-16 14:32   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-16 15:20     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 16:08       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:20 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 16.08.23, Sergey Bronnikov wrote:
> Hi, Sergey
> 
> 
> Thanks for the patch!
> 
> Sergey
> 
> On 8/9/23 18:35, Sergey Kaplun wrote:
> 
> <snipped>
> 
> 
> > ls/frontend.lua b/test/tarantool-tests/utils/frontend.lua
> > index 2afebbb2..414257fd 100644
> > --- a/test/tarantool-tests/utils/frontend.lua
> > +++ b/test/tarantool-tests/utils/frontend.lua
> > @@ -1,6 +1,10 @@
> >   local M = {}
> >   
> >   local bc = require('jit.bc')
> > +local jutil = require('jit.util')
> > +local vmdef = require('jit.vmdef')
> > +local bcnames = vmdef.bcnames
> > +local band, rshift = bit.band, bit.rshift
> >   
> >   function M.hasbc(f, bytecode)
> >     assert(type(f) == 'function', 'argument #1 should be a function')
> > @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
> >     return hasbc
> >   end
> >   
> > +-- Get traceno of the trace assotiated for the given function.
> > +function M.gettraceno(func)
> > +  assert(type(func) == 'function', 'argument #1 should be a function')
> > +
> > +  -- The 0th BC is the header.
> > +  local func_ins = jutil.funcbc(func, 0)
> > +  local BC_NAME_LENGTH = 6
> > +  local RD_SHIFT = 16
> 
> 
> Nit: AFAIK we usually leave a comment with the source of constants.

Unfortunately, there are no real sources for these constants, but the
code is similar to <src/jit/bc.lua>. I'm not sure that it is worth
mentioning.
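
For reference, the decoding these two constants stand for can be
sketched in C. This is only an illustration, assuming the usual LuaJIT
instruction layout (opcode in the lowest byte, operand D in the upper 16
bits) and a name table with 6 characters per opcode; the function names
and the sample table in the note below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define BC_NAME_LENGTH 6   /* Each name in the table is padded to 6 chars. */
#define RD_SHIFT       16  /* Operand D occupies the upper 16 bits. */

/* The opcode is stored in the lowest byte of the instruction word. */
static uint32_t bc_opcode(uint32_t ins)
{
  return ins & 0xffu;
}

/* Operand D is stored in the upper 16 bits. */
static uint32_t bc_operand_d(uint32_t ins)
{
  return ins >> RD_SHIFT;
}

/* Copy the padded opcode name into out (at least BC_NAME_LENGTH + 1 bytes). */
static void bc_name(const char *names, uint32_t ins, char *out)
{
  memcpy(out, names + BC_NAME_LENGTH * bc_opcode(ins), BC_NAME_LENGTH);
  out[BC_NAME_LENGTH] = '\0';
}
```

With a made-up table "ISLT  ISGE  JFUNCF" and the instruction word
`(42u << 16) | 2u`, `bc_name()` yields "JFUNCF" and `bc_operand_d()`
yields 42.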

> 
> <snipped>
> 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable.
  2023-08-16  9:03   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 15:22     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:22 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!

On 16.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for the single typo in the commit message.
> On Wed, Aug 09, 2023 at 06:36:07PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Cleanup only, bug cannot trigger.
> > Thanks to Domingo Alvarez Duarte.
> > 
> > (cherry-picked from commit 5c911998a3c85d024a8006feafc68d0b4c962fd8)
> > 
> > This patch fixes local shadow variable `n` in `template__` function from
> Typo: s/local/the local/

Fixed, thanks!

> > <dynasm/dasm_mips.lua> by renaming it to `m`. Since this cannot be
> > triggered, there is no test provided.
> > 
> > Sergey Kaplun:
> > * added the description for the problem
> > ---

<snipped>

> > 2.41.0
> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port.
  2023-08-16  9:16   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 15:24     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:24 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
Addressed your comments inline.

On 16.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few nits regarding the commit message.
> On Wed, Aug 09, 2023 at 06:36:08PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Hua Zhang, YunQiang Su from Wave Computing,
> > and Radovan Birdic from RT-RK.
> > Sponsored by Wave Computing.
> > 
> > (cherry-picked from commit 94d0b53004a5fa368defa4307a17edcdb87fe727)
> > 
> > This patch adds support for MIPS Release 6 [1] for the 64-bit build.
> > This includes:
> > * Global `_map_def` value is set with <dynasm/dynasm.lua>. `MIPSR6` key
> >   specifies the corresponding instruction set support. Also, `MIPSR6` is
> >   defined in `DYNASM_FLAGS` (`DASM_AFLAGS`).
> > * New instructions are added within <dynasm/dasm_mips.lua>, they are
> >   used if the aforementioned key is set.
> > * Obsolete instructions (that are no more in use in r6) are used in the
> Typo: s/no more/no longer/

Fixed.

> >   opposite case (if `MIPSR6` isn't set).
> > * New opcode maps are added into  <src/jit/dis_mips.lua>.
> Typo: s/into/to/

Fixed.

> > * `map_arch` table in <jit/bcsave.lua> is refactored for more convenient
> >   usage. Now each arch key contains a table with the corresponding info
> >   about supported architecture:
> Typo: s/about/about the/

Fixed.

> >     - `e`: endianness; "le" or "be"
> >     - `b`: bit-width of the supported architecture; 32 or 64
> >     - `m`: machine specification (see `e_machine` in man elf)
> >     - `f`: processor-specific flags (see `e_flags` in man elf)
> >     - `p`: number that identifies the type of target machine [2] for
> >       Portable Executable format [3].
> > * New `LJ_TARGET_MIPSR6` define is set for MIPSR6 in <src/lj_arch.h>.
> > * The corresponding "MIPS32R6", "MIPS64R6" CPU strings are added to the
> >   <src/jit.h>
> > * MIPSR6 instructions are added to the <src/lj_target_mips.h>, some
> >   obsolete instructions are removed or defined only for the non-MIPSR6
> >   build.
> > * All release-dependent instructions in <src/lj_asm_mips.h> are
> >   instrumented with `LJ_TARGET_MIPSR6` macro.
> > * `f20`, `f21`, `f22` FP registers are defined as `FTMP0`, `FTMP1`,
> >   `FTMP2` correspondingly in the VM.
> > * All release-dependent instructions in <src/vm_mips64.dasm> are
> >   instrumented with `MIPSR6` macro.
> > * `sfmin_max` macro now takes the third operand for the MIPSR6 build.
> > * Fix implicit fallthrough warning for `LJ_SOFTFP && !LJ_NEED_FP64`
> Typo: s/Fix/Fix the/

Fixed.

> >   build in <src/lj_asm.c>.
> > 
> > Note, that 32-bit r6 targets still unsupported, because it is difficult
> Typo: s/targets/targets are/

Fixed.

> > and most available r6 CPUs are 64 bit.
> > 
> > [1]: https://www.mips.com/products/architectures/mips64/
> > [2]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#machine-types
> > [3]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
> > 
> > Sergey Kaplun:
> > * added the description for the feature
> > 
> > Part of tarantool/tarantool#8825
> > ---

<snipped>

> > 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
  2023-08-16 15:02   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-16 15:32     ` Sergey Kaplun via Tarantool-patches
  2023-08-16 16:08       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:32 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 16.08.23, Sergey Bronnikov wrote:
> Hi, Sergey
> 
> 
> thanks for the patch!
> 
> Test has passed after reverting a patch and I suspect it is expected because
> 
> behaviour was broken for MIPS only, right?

Yes, that's true.

> 
> See a minor comment below.
> 
> 
> On 8/9/23 18:35, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Djordje Kovacevic and Stefan Pejic.
> >
> > (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
> >
> > `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
> > is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to incorrect
> > check in `asm_sparejump_setup()`, so mcode bottom is not updated.
> >
> > This patch fixes check of the MCLink offset from the mcbot.
> > Nevertheless, the emitting of spare jump slots is still incorrect, so
> > the introduced test still fails due to incorrect iteration through the
> > sparce table (the last slot is out of mcode range).
> "sparce" -> "sparse"?

Changed to "spare slots".
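
The size mismatch described in the commit message is easy to see in
isolation. The sketch below assumes that `MCLink` is a pointer plus a
`size_t` (an assumption about its layout, not a quote from the patch);
`mclink_slots()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MCode;  /* MIPS instructions are 32 bits wide. */

/* Assumed layout of the header that links machine code areas. */
typedef struct MCLink {
  MCode *next;   /* Next area. */
  size_t size;   /* Size of the current area. */
} MCLink;

/* Number of MCode-sized slots the header occupies in an area. */
static size_t mclink_slots(void)
{
  return sizeof(MCLink) / sizeof(MCode);
}
```

With this layout the header takes 8 bytes (2 slots) where pointers are
32 bits wide and 16 bytes (4 slots) where they are 64 bits wide, so a
check that hardcodes 8 bytes would be off on MIPS64.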

> <snipped>

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (18 preceding siblings ...)
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port Sergey Kaplun via Tarantool-patches
@ 2023-08-16 15:35 ` Sergey Kaplun via Tarantool-patches
  2023-08-17 14:06   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-17 14:38 ` Sergey Bronnikov via Tarantool-patches
  2023-08-31 15:17 ` Igor Munkin via Tarantool-patches
  21 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-16 15:35 UTC (permalink / raw)
  To: Igor Munkin, Sergey Bronnikov; +Cc: tarantool-patches

Hi, folks!
I've addressed Maxim's comments for all patches and rebased my branch
onto tarantool/master.
The branch is force-pushed.

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
  2023-08-15 11:27   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 16:07   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:07 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey


LGTM

On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit a057a07ab702e225e21848d4f918886c5b0ac06b)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in JIT compiler backend for MIPS64.
> This includes:
> * `vm_tointg()` helper is added in <src/vm_mips64.dasm> to convert FP
>    number to integer with a check for the soft-float support (called from
>    JIT).
> * `sfmin/max()` helpers are added in <src/vm_mips64.dasm> for min/max
>    operations with a check for the soft-float support (called from JIT).
> * `LJ_SOFTFP32` macro is introduced to be used for 32-bit MIPS instead
>    `LJ_SOFTFP`.
> * All fp-depending paths are instrumented with `LJ_SOFTFP` or
>    `LJ_SOFTFP32` macro.
> * The corresponding function calls in <src/lj_ircall.h> are marked as
>    `XA_FP32`, `XA2_FP32`, i.e. as required extra arguments on the stack
>    for soft-FP on 32-bit MIPS.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
>   src/lj_arch.h      |   4 +-
>   src/lj_asm.c       |   8 +-
>   src/lj_asm_mips.h  | 217 +++++++++++++++++++++++++++++++++++++--------
>   src/lj_crecord.c   |   4 +-
>   src/lj_emit_mips.h |   2 +
>   src/lj_ffrecord.c  |   2 +-
>   src/lj_ircall.h    |  43 ++++++---
>   src/lj_iropt.h     |   2 +-
>   src/lj_jit.h       |   4 +-
>   src/lj_obj.h       |   3 +
>   src/lj_opt_split.c |   2 +-
>   src/lj_snap.c      |  21 +++--
>   src/vm_mips64.dasc |  49 ++++++++++
>   13 files changed, 286 insertions(+), 75 deletions(-)
>
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 5276ae56..c39526ea 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -349,9 +349,6 @@
>   #define LJ_ARCH_BITS		32
>   #define LJ_TARGET_MIPS32	1
>   #else
> -#if LJ_ABI_SOFTFP || !LJ_ARCH_HASFPU
> -#define LJ_ARCH_NOJIT		1	/* NYI */
> -#endif
>   #define LJ_ARCH_BITS		64
>   #define LJ_TARGET_MIPS64	1
>   #define LJ_TARGET_GC64		1
> @@ -528,6 +525,7 @@
>   #define LJ_ABI_SOFTFP		0
>   #endif
>   #define LJ_SOFTFP		(!LJ_ARCH_HASFPU)
> +#define LJ_SOFTFP32		(LJ_SOFTFP && LJ_32)
>   
>   #if LJ_ARCH_ENDIAN == LUAJIT_BE
>   #define LJ_LE			0
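
A standalone sketch of how the two macros relate, for readers skimming
the hunk above. The `LJ_ARCH_HASFPU` and `LJ_32` values here are
illustrative stand-ins (a 64-bit target without an FPU), and `fp_mode()`
is a hypothetical helper, not something from the patch.

```c
/* Stand-ins for the real configuration macros; values are illustrative. */
#ifndef LJ_ARCH_HASFPU
#define LJ_ARCH_HASFPU 0   /* Pretend the target has no FPU. */
#endif
#ifndef LJ_32
#define LJ_32 0            /* Pretend this is a 64-bit build. */
#endif

#define LJ_SOFTFP   (!LJ_ARCH_HASFPU)
#define LJ_SOFTFP32 (LJ_SOFTFP && LJ_32)

/* Report which of the three configurations is selected. */
static const char *fp_mode(void)
{
#if LJ_SOFTFP32
  return "softfp32";  /* Soft-float on a 32-bit target. */
#elif LJ_SOFTFP
  return "softfp64";  /* Soft-float on a 64-bit target. */
#else
  return "hardfp";    /* Hardware FPU is available. */
#endif
}
```

With these stand-in values `LJ_SOFTFP` is 1 and `LJ_SOFTFP32` is 0, so
code guarded by `#if !LJ_SOFTFP32` is compiled in for the 64-bit
soft-float build, while code guarded by `#if !LJ_SOFTFP` is not.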
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 0bfa44ed..15de7e33 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -341,7 +341,7 @@ static Reg ra_rematk(ASMState *as, IRRef ref)
>     ra_modified(as, r);
>     ir->r = RID_INIT;  /* Do not keep any hint. */
>     RA_DBGX((as, "remat     $i $r", ir, r));
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>     if (ir->o == IR_KNUM) {
>       emit_loadk64(as, r, ir);
>     } else
> @@ -1356,7 +1356,7 @@ static void asm_call(ASMState *as, IRIns *ir)
>     asm_gencall(as, ci, args);
>   }
>   
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>   static void asm_fppow(ASMState *as, IRIns *ir, IRRef lref, IRRef rref)
>   {
>     const CCallInfo *ci = &lj_ir_callinfo[IRCALL_pow];
> @@ -1703,10 +1703,10 @@ static void asm_ir(ASMState *as, IRIns *ir)
>     case IR_MUL: asm_mul(as, ir); break;
>     case IR_MOD: asm_mod(as, ir); break;
>     case IR_NEG: asm_neg(as, ir); break;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
>     case IR_DIV: case IR_POW: case IR_ABS:
>     case IR_ATAN2: case IR_LDEXP: case IR_FPMATH: case IR_TOBIT:
> -    lua_assert(0);  /* Unused for LJ_SOFTFP. */
> +    lua_assert(0);  /* Unused for LJ_SOFTFP32. */
>       break;
>   #else
>     case IR_DIV: asm_div(as, ir); break;
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index 0e60fc07..a26a82cd 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -290,7 +290,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>   	  {
>   	    ra_leftov(as, gpr, ref);
>   	    gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
>   	    fpr++;
>   #endif
>   	  }
> @@ -301,7 +301,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>   	  emit_spstore(as, ir, r, ofs);
>   	  ofs += irt_isnum(ir->t) ? 8 : 4;
>   #else
> -	  emit_spstore(as, ir, r, ofs + ((LJ_BE && (LJ_SOFTFP || r < RID_MAX_GPR) && !irt_is64(ir->t)) ? 4 : 0));
> +	  emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isfp(ir->t) && !irt_is64(ir->t)) ? 4 : 0));
>   	  ofs += 8;
>   #endif
>   	}
> @@ -312,7 +312,7 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>   #endif
>         if (gpr <= REGARG_LASTGPR) {
>   	gpr++;
> -#if LJ_64
> +#if LJ_64 && !LJ_SOFTFP
>   	fpr++;
>   #endif
>         } else {
> @@ -461,12 +461,36 @@ static void asm_tobit(ASMState *as, IRIns *ir)
>     emit_tg(as, MIPSI_MFC1, dest, tmp);
>     emit_fgh(as, MIPSI_ADD_D, tmp, left, right);
>   }
> +#elif LJ_64  /* && LJ_SOFTFP */
> +static void asm_tointg(ASMState *as, IRIns *ir, Reg r)
> +{
> +  /* The modified regs must match with the *.dasc implementation. */
> +  RegSet drop = RID2RSET(REGARG_FIRSTGPR)|RID2RSET(RID_RET)|RID2RSET(RID_RET+1)|
> +		RID2RSET(RID_R1)|RID2RSET(RID_R12);
> +  if (ra_hasreg(ir->r)) rset_clear(drop, ir->r);
> +  ra_evictset(as, drop);
> +  /* Return values are in RID_RET (converted value) and RID_RET+1 (status). */
> +  ra_destreg(as, ir, RID_RET);
> +  asm_guard(as, MIPSI_BNE, RID_RET+1, RID_ZERO);
> +  emit_call(as, (void *)lj_ir_callinfo[IRCALL_lj_vm_tointg].func, 0);
> +  if (r == RID_NONE)
> +    ra_leftov(as, REGARG_FIRSTGPR, ir->op1);
> +  else if (r != REGARG_FIRSTGPR)
> +    emit_move(as, REGARG_FIRSTGPR, r);
> +}
> +
> +static void asm_tobit(ASMState *as, IRIns *ir)
> +{
> +  Reg dest = ra_dest(as, ir, RSET_GPR);
> +  emit_dta(as, MIPSI_SLL, dest, dest, 0);
> +  asm_callid(as, ir, IRCALL_lj_vm_tobit);
> +}
>   #endif
>   
>   static void asm_conv(ASMState *as, IRIns *ir)
>   {
>     IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>     int stfp = (st == IRT_NUM || st == IRT_FLOAT);
>   #endif
>   #if LJ_64
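
A note on the `MIPSI_SLL` with a zero shift in the soft-float
`asm_tobit()` above: on MIPS64 a 32-bit shift leaves a sign-extended
32-bit result in the 64-bit register, so `sll rd, rt, 0` is a common
way to normalize a value to 32 bits. Since the backend emits machine
code backwards, the instruction presumably runs after the call and
normalizes its result. The C model below only illustrates the
arithmetic; `mips64_sll0()` is a hypothetical name.

```c
#include <stdint.h>

/*
** Model of `sll rd, rt, 0` on MIPS64: keep the low 32 bits of rt and
** sign-extend them into the full 64-bit register.
*/
static int64_t mips64_sll0(int64_t rt)
{
  return (int64_t)(int32_t)(uint32_t)rt;
}
```

For instance, `mips64_sll0(0x00000000ffffffffLL)` gives -1, and any
garbage in the upper 32 bits of the input is discarded.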
> @@ -477,12 +501,13 @@ static void asm_conv(ASMState *as, IRIns *ir)
>     lua_assert(!(irt_isint64(ir->t) ||
>   	       (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
>   #endif
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
>     /* FP conversions are handled by SPLIT. */
>     lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
>     /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
>   #else
>     lua_assert(irt_type(ir->t) != st);
> +#if !LJ_SOFTFP
>     if (irt_isfp(ir->t)) {
>       Reg dest = ra_dest(as, ir, RSET_FPR);
>       if (stfp) {  /* FP to FP conversion. */
> @@ -608,6 +633,42 @@ static void asm_conv(ASMState *as, IRIns *ir)
>         }
>       }
>     } else
> +#else
> +  if (irt_isfp(ir->t)) {
> +#if LJ_64 && LJ_HASFFI
> +    if (stfp) {  /* FP to FP conversion. */
> +      asm_callid(as, ir, irt_isnum(ir->t) ? IRCALL_softfp_f2d :
> +					    IRCALL_softfp_d2f);
> +    } else {  /* Integer to FP conversion. */
> +      IRCallID cid = ((IRT_IS64 >> st) & 1) ?
> +	(irt_isnum(ir->t) ?
> +	 (st == IRT_I64 ? IRCALL_fp64_l2d : IRCALL_fp64_ul2d) :
> +	 (st == IRT_I64 ? IRCALL_fp64_l2f : IRCALL_fp64_ul2f)) :
> +	(irt_isnum(ir->t) ?
> +	 (st == IRT_INT ? IRCALL_softfp_i2d : IRCALL_softfp_ui2d) :
> +	 (st == IRT_INT ? IRCALL_softfp_i2f : IRCALL_softfp_ui2f));
> +      asm_callid(as, ir, cid);
> +    }
> +#else
> +    asm_callid(as, ir, IRCALL_softfp_i2d);
> +#endif
> +  } else if (stfp) {  /* FP to integer conversion. */
> +    if (irt_isguard(ir->t)) {
> +      /* Checked conversions are only supported from number to int. */
> +      lua_assert(irt_isint(ir->t) && st == IRT_NUM);
> +      asm_tointg(as, ir, RID_NONE);
> +    } else {
> +      IRCallID cid = irt_is64(ir->t) ?
> +	((st == IRT_NUM) ?
> +	 (irt_isi64(ir->t) ? IRCALL_fp64_d2l : IRCALL_fp64_d2ul) :
> +	 (irt_isi64(ir->t) ? IRCALL_fp64_f2l : IRCALL_fp64_f2ul)) :
> +	((st == IRT_NUM) ?
> +	 (irt_isint(ir->t) ? IRCALL_softfp_d2i : IRCALL_softfp_d2ui) :
> +	 (irt_isint(ir->t) ? IRCALL_softfp_f2i : IRCALL_softfp_f2ui));
> +      asm_callid(as, ir, cid);
> +    }
> +  } else
> +#endif
>   #endif
>     {
>       Reg dest = ra_dest(as, ir, RSET_GPR);
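
The `(IRT_IS64 >> st) & 1` test and the nested ternaries in the
integer to FP branch above can be unrolled as follows. This is a sketch
only: the type numbering and the `T_IS64` mask are hypothetical (the
real values live in the IR headers), and the helper returns the call
names as strings instead of `IRCallID` values.

```c
/* Hypothetical type numbering, for illustration only. */
enum { T_INT = 0, T_U32, T_I64, T_U64, T_NUM, T_FLOAT };

/* One bit per type that occupies 64 bits. */
#define T_IS64 ((1u << T_I64) | (1u << T_U64) | (1u << T_NUM))

/* Bitmask membership test: shift the mask by the type and take bit 0. */
static int type_is64(unsigned t)
{
  return (int)((T_IS64 >> t) & 1u);
}

/* Pick the helper for an integer source type st and an FP result type. */
static const char *int2fp_helper(unsigned st, int to_double)
{
  if (type_is64(st))
    return to_double ? (st == T_I64 ? "fp64_l2d" : "fp64_ul2d")
                     : (st == T_I64 ? "fp64_l2f" : "fp64_ul2f");
  return to_double ? (st == T_INT ? "softfp_i2d" : "softfp_ui2d")
                   : (st == T_INT ? "softfp_i2f" : "softfp_ui2f");
}
```

So a 64-bit unsigned source converted to a double selects `fp64_ul2d`,
and a 32-bit signed source converted to a float selects `softfp_i2f`.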
> @@ -665,7 +726,7 @@ static void asm_strto(ASMState *as, IRIns *ir)
>     const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
>     IRRef args[2];
>     int32_t ofs = 0;
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
>     ra_evictset(as, RSET_SCRATCH);
>     if (ra_used(ir)) {
>       if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> @@ -806,7 +867,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>     MCLabel l_end, l_loop, l_next;
>   
>     rset_clear(allow, tab);
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
>     if (!isk) {
>       key = ra_alloc1(as, refkey, allow);
>       rset_clear(allow, key);
> @@ -826,7 +887,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>       }
>     }
>   #else
> -  if (irt_isnum(kt)) {
> +  if (!LJ_SOFTFP && irt_isnum(kt)) {
>       key = ra_alloc1(as, refkey, RSET_FPR);
>       tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
>     } else if (!irt_ispri(kt)) {
> @@ -882,6 +943,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>       emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
>       emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
>       emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> +  } else if (LJ_SOFTFP && irt_isnum(kt)) {
> +    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> +    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>     } else if (irt_isaddr(kt)) {
>       Reg refk = tmp2;
>       if (isk) {
> @@ -960,7 +1024,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>         emit_dta(as, MIPSI_ROTR, dest, tmp1, (-HASH_ROT1)&31);
>         if (irt_isnum(kt)) {
>   	emit_dst(as, MIPSI_ADDU, tmp1, tmp1, tmp1);
> -	emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 0);
> +	emit_dta(as, MIPSI_DSRA32, tmp1, LJ_SOFTFP ? key : tmp1, 0);
>   	emit_dta(as, MIPSI_SLL, tmp2, LJ_SOFTFP ? key : tmp1, 0);
>   #if !LJ_SOFTFP
>   	emit_tg(as, MIPSI_DMFC1, tmp1, key);
> @@ -1123,7 +1187,7 @@ static MIPSIns asm_fxloadins(IRIns *ir)
>     case IRT_U8: return MIPSI_LBU;
>     case IRT_I16: return MIPSI_LH;
>     case IRT_U16: return MIPSI_LHU;
> -  case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_LDC1;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1;
>     case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1;
>     default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_LD : MIPSI_LW;
>     }
> @@ -1134,7 +1198,7 @@ static MIPSIns asm_fxstoreins(IRIns *ir)
>     switch (irt_type(ir->t)) {
>     case IRT_I8: case IRT_U8: return MIPSI_SB;
>     case IRT_I16: case IRT_U16: return MIPSI_SH;
> -  case IRT_NUM: lua_assert(!LJ_SOFTFP); return MIPSI_SDC1;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1;
>     case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1;
>     default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW;
>     }
> @@ -1199,7 +1263,7 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
>   
>   static void asm_ahuvload(ASMState *as, IRIns *ir)
>   {
> -  int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> +  int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
>     Reg dest = RID_NONE, type = RID_TMP, idx;
>     RegSet allow = RSET_GPR;
>     int32_t ofs = 0;
> @@ -1212,7 +1276,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
>       }
>     }
>     if (ra_used(ir)) {
> -    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> +    lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
>   	       irt_isint(ir->t) || irt_isaddr(ir->t));
>       dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>       rset_clear(allow, dest);
> @@ -1261,10 +1325,10 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
>     int32_t ofs = 0;
>     if (ir->r == RID_SINK)
>       return;
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> -    src = ra_alloc1(as, ir->op2, RSET_FPR);
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +    src = ra_alloc1(as, ir->op2, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
>       idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> -    emit_hsi(as, MIPSI_SDC1, src, idx, ofs);
> +    emit_hsi(as, LJ_SOFTFP ? MIPSI_SD : MIPSI_SDC1, src, idx, ofs);
>     } else {
>   #if LJ_32
>       if (!irt_ispri(ir->t)) {
> @@ -1312,7 +1376,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
>     IRType1 t = ir->t;
>   #if LJ_32
>     int32_t ofs = 8*((int32_t)ir->op1-1) + ((ir->op2 & IRSLOAD_FRAME) ? 4 : 0);
> -  int hiop = (LJ_32 && LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> +  int hiop = (LJ_SOFTFP32 && (ir+1)->o == IR_HIOP);
>     if (hiop)
>       t.irt = IRT_NUM;
>   #else
> @@ -1320,7 +1384,7 @@ static void asm_sload(ASMState *as, IRIns *ir)
>   #endif
>     lua_assert(!(ir->op2 & IRSLOAD_PARENT));  /* Handled by asm_head_side(). */
>     lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP32
>     lua_assert(!(ir->op2 & IRSLOAD_CONVERT));  /* Handled by LJ_SOFTFP SPLIT. */
>     if (hiop && ra_used(ir+1)) {
>       type = ra_dest(as, ir+1, allow);
> @@ -1328,29 +1392,44 @@ static void asm_sload(ASMState *as, IRIns *ir)
>     }
>   #else
>     if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
> -    dest = ra_scratch(as, RSET_FPR);
> +    dest = ra_scratch(as, LJ_SOFTFP ? allow : RSET_FPR);
>       asm_tointg(as, ir, dest);
>       t.irt = IRT_NUM;  /* Continue with a regular number type check. */
>     } else
>   #endif
>     if (ra_used(ir)) {
> -    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> +    lua_assert((LJ_SOFTFP32 ? 0 : irt_isnum(ir->t)) ||
>   	       irt_isint(ir->t) || irt_isaddr(ir->t));
>       dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>       rset_clear(allow, dest);
>       base = ra_alloc1(as, REF_BASE, allow);
>       rset_clear(allow, base);
> -    if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
> +    if (!LJ_SOFTFP32 && (ir->op2 & IRSLOAD_CONVERT)) {
>         if (irt_isint(t)) {
> -	Reg tmp = ra_scratch(as, RSET_FPR);
> +	Reg tmp = ra_scratch(as, LJ_SOFTFP ? RSET_GPR : RSET_FPR);
> +#if LJ_SOFTFP
> +	ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> +	ra_destreg(as, ir, RID_RET);
> +	emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_d2i].func, 0);
> +	if (tmp != REGARG_FIRSTGPR)
> +	  emit_move(as, REGARG_FIRSTGPR, tmp);
> +#else
>   	emit_tg(as, MIPSI_MFC1, dest, tmp);
>   	emit_fg(as, MIPSI_TRUNC_W_D, tmp, tmp);
> +#endif
>   	dest = tmp;
>   	t.irt = IRT_NUM;  /* Check for original type. */
>         } else {
>   	Reg tmp = ra_scratch(as, RSET_GPR);
> +#if LJ_SOFTFP
> +	ra_evictset(as, rset_exclude(RSET_SCRATCH, dest));
> +	ra_destreg(as, ir, RID_RET);
> +	emit_call(as, (void *)lj_ir_callinfo[IRCALL_softfp_i2d].func, 0);
> +	emit_dta(as, MIPSI_SLL, REGARG_FIRSTGPR, tmp, 0);
> +#else
>   	emit_fg(as, MIPSI_CVT_D_W, dest, dest);
>   	emit_tg(as, MIPSI_MTC1, tmp, dest);
> +#endif
>   	dest = tmp;
>   	t.irt = IRT_INT;  /* Check for original type. */
>         }
> @@ -1399,7 +1478,7 @@ dotypecheck:
>         if (irt_isnum(t)) {
>   	asm_guard(as, MIPSI_BEQ, RID_TMP, RID_ZERO);
>   	emit_tsi(as, MIPSI_SLTIU, RID_TMP, RID_TMP, (int32_t)LJ_TISNUM);
> -	if (ra_hasreg(dest))
> +	if (!LJ_SOFTFP && ra_hasreg(dest))
>   	  emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
>         } else {
>   	asm_guard(as, MIPSI_BNE, RID_TMP,
> @@ -1409,7 +1488,7 @@ dotypecheck:
>       }
>       emit_tsi(as, MIPSI_LD, type, base, ofs);
>     } else if (ra_hasreg(dest)) {
> -    if (irt_isnum(t))
> +    if (!LJ_SOFTFP && irt_isnum(t))
>         emit_hsi(as, MIPSI_LDC1, dest, base, ofs);
>       else
>         emit_tsi(as, irt_isint(t) ? MIPSI_LW : MIPSI_LD, dest, base,
> @@ -1554,26 +1633,40 @@ static void asm_fpunary(ASMState *as, IRIns *ir, MIPSIns mi)
>     Reg left = ra_hintalloc(as, ir->op1, dest, RSET_FPR);
>     emit_fg(as, mi, dest, left);
>   }
> +#endif
>   
> +#if !LJ_SOFTFP32
>   static void asm_fpmath(ASMState *as, IRIns *ir)
>   {
>     if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir))
>       return;
> +#if !LJ_SOFTFP
>     if (ir->op2 <= IRFPM_TRUNC)
>       asm_callround(as, ir, IRCALL_lj_vm_floor + ir->op2);
>     else if (ir->op2 == IRFPM_SQRT)
>       asm_fpunary(as, ir, MIPSI_SQRT_D);
>     else
> +#endif
>       asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
>   }
>   #endif
>   
> +#if !LJ_SOFTFP
> +#define asm_fpadd(as, ir)	asm_fparith(as, ir, MIPSI_ADD_D)
> +#define asm_fpsub(as, ir)	asm_fparith(as, ir, MIPSI_SUB_D)
> +#define asm_fpmul(as, ir)	asm_fparith(as, ir, MIPSI_MUL_D)
> +#elif LJ_64  /* && LJ_SOFTFP */
> +#define asm_fpadd(as, ir)	asm_callid(as, ir, IRCALL_softfp_add)
> +#define asm_fpsub(as, ir)	asm_callid(as, ir, IRCALL_softfp_sub)
> +#define asm_fpmul(as, ir)	asm_callid(as, ir, IRCALL_softfp_mul)
> +#endif
> +
>   static void asm_add(ASMState *as, IRIns *ir)
>   {
>     IRType1 t = ir->t;
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>     if (irt_isnum(t)) {
> -    asm_fparith(as, ir, MIPSI_ADD_D);
> +    asm_fpadd(as, ir);
>     } else
>   #endif
>     {
> @@ -1595,9 +1688,9 @@ static void asm_add(ASMState *as, IRIns *ir)
>   
>   static void asm_sub(ASMState *as, IRIns *ir)
>   {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>     if (irt_isnum(ir->t)) {
> -    asm_fparith(as, ir, MIPSI_SUB_D);
> +    asm_fpsub(as, ir);
>     } else
>   #endif
>     {
> @@ -1611,9 +1704,9 @@ static void asm_sub(ASMState *as, IRIns *ir)
>   
>   static void asm_mul(ASMState *as, IRIns *ir)
>   {
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>     if (irt_isnum(ir->t)) {
> -    asm_fparith(as, ir, MIPSI_MUL_D);
> +    asm_fpmul(as, ir);
>     } else
>   #endif
>     {
> @@ -1640,7 +1733,7 @@ static void asm_mod(ASMState *as, IRIns *ir)
>       asm_callid(as, ir, IRCALL_lj_vm_modi);
>   }
>   
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>   static void asm_pow(ASMState *as, IRIns *ir)
>   {
>   #if LJ_64 && LJ_HASFFI
> @@ -1660,7 +1753,11 @@ static void asm_div(ASMState *as, IRIns *ir)
>   					  IRCALL_lj_carith_divu64);
>     else
>   #endif
> +#if !LJ_SOFTFP
>       asm_fparith(as, ir, MIPSI_DIV_D);
> +#else
> +  asm_callid(as, ir, IRCALL_softfp_div);
> +#endif
>   }
>   #endif
>   
> @@ -1670,6 +1767,13 @@ static void asm_neg(ASMState *as, IRIns *ir)
>     if (irt_isnum(ir->t)) {
>       asm_fpunary(as, ir, MIPSI_NEG_D);
>     } else
> +#elif LJ_64  /* && LJ_SOFTFP */
> +  if (irt_isnum(ir->t)) {
> +    Reg dest = ra_dest(as, ir, RSET_GPR);
> +    Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> +    emit_dst(as, MIPSI_XOR, dest, left,
> +	    ra_allock(as, 0x8000000000000000ll, rset_exclude(RSET_GPR, dest)));
> +  } else
>   #endif
>     {
>       Reg dest = ra_dest(as, ir, RSET_GPR);
> @@ -1679,7 +1783,17 @@ static void asm_neg(ASMState *as, IRIns *ir)
>     }
>   }
>   
> +#if !LJ_SOFTFP
>   #define asm_abs(as, ir)		asm_fpunary(as, ir, MIPSI_ABS_D)
> +#elif LJ_64   /* && LJ_SOFTFP */
> +static void asm_abs(ASMState *as, IRIns *ir)
> +{
> +  Reg dest = ra_dest(as, ir, RSET_GPR);
> +  Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> +  emit_tsml(as, MIPSI_DEXTM, dest, left, 30, 0);
> +}
> +#endif
> +
>   #define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
>   #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
>   
> @@ -1924,15 +2038,21 @@ static void asm_bror(ASMState *as, IRIns *ir)
>     }
>   }
>   
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
>   static void asm_sfpmin_max(ASMState *as, IRIns *ir)
>   {
>     CCallInfo ci = lj_ir_callinfo[(IROp)ir->o == IR_MIN ? IRCALL_lj_vm_sfmin : IRCALL_lj_vm_sfmax];
> +#if LJ_64
> +  IRRef args[2];
> +  args[0] = ir->op1;
> +  args[1] = ir->op2;
> +#else
>     IRRef args[4];
>     args[0^LJ_BE] = ir->op1;
>     args[1^LJ_BE] = (ir+1)->op1;
>     args[2^LJ_BE] = ir->op2;
>     args[3^LJ_BE] = (ir+1)->op2;
> +#endif
>     asm_setupresult(as, ir, &ci);
>     emit_call(as, (void *)ci.func, 0);
>     ci.func = NULL;
> @@ -1942,7 +2062,10 @@ static void asm_sfpmin_max(ASMState *as, IRIns *ir)
>   
>   static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>   {
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +    asm_sfpmin_max(as, ir);
> +#else
>       Reg dest = ra_dest(as, ir, RSET_FPR);
>       Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>       right = (left >> 8); left &= 255;
> @@ -1953,6 +2076,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>         if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right);
>       }
>       emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left);
> +#endif
>     } else {
>       Reg dest = ra_dest(as, ir, RSET_GPR);
>       Reg right, left = ra_alloc2(as, ir, RSET_GPR);
> @@ -1973,18 +2097,24 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>   
>   /* -- Comparisons --------------------------------------------------------- */
>   
> -#if LJ_32 && LJ_SOFTFP
> +#if LJ_SOFTFP
>   /* SFP comparisons. */
>   static void asm_sfpcomp(ASMState *as, IRIns *ir)
>   {
>     const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
>     RegSet drop = RSET_SCRATCH;
>     Reg r;
> +#if LJ_64
> +  IRRef args[2];
> +  args[0] = ir->op1;
> +  args[1] = ir->op2;
> +#else
>     IRRef args[4];
>     args[LJ_LE ? 0 : 1] = ir->op1; args[LJ_LE ? 1 : 0] = (ir+1)->op1;
>     args[LJ_LE ? 2 : 3] = ir->op2; args[LJ_LE ? 3 : 2] = (ir+1)->op2;
> +#endif
>   
> -  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> +  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+(LJ_64?1:3); r++) {
>       if (!rset_test(as->freeset, r) &&
>   	regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
>         rset_clear(drop, r);
> @@ -2038,11 +2168,15 @@ static void asm_comp(ASMState *as, IRIns *ir)
>   {
>     /* ORDER IR: LT GE LE GT  ULT UGE ULE UGT. */
>     IROp op = ir->o;
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +    asm_sfpcomp(as, ir);
> +#else
>       Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>       right = (left >> 8); left &= 255;
>       asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
>       emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right);
> +#endif
>     } else {
>       Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR);
>       if (op == IR_ABC) op = IR_UGT;
> @@ -2074,9 +2208,13 @@ static void asm_equal(ASMState *as, IRIns *ir)
>     Reg right, left = ra_alloc2(as, ir, (!LJ_SOFTFP && irt_isnum(ir->t)) ?
>   				       RSET_FPR : RSET_GPR);
>     right = (left >> 8); left &= 255;
> -  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP32 && irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +    asm_sfpcomp(as, ir);
> +#else
>       asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0);
>       emit_fgh(as, MIPSI_C_EQ_D, 0, left, right);
> +#endif
>     } else {
>       asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right);
>     }
> @@ -2269,7 +2407,7 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>       if ((sn & SNAP_NORESTORE))
>         continue;
>       if (irt_isnum(ir->t)) {
> -#if LJ_SOFTFP
> +#if LJ_SOFTFP32
>         Reg tmp;
>         RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
>         lua_assert(irref_isk(ref));  /* LJ_SOFTFP: must be a number constant. */
> @@ -2278,6 +2416,9 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>         if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
>         tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
>         emit_tsi(as, MIPSI_SW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#elif LJ_SOFTFP  /* && LJ_64 */
> +      Reg src = ra_alloc1(as, ref, rset_exclude(RSET_GPR, RID_BASE));
> +      emit_tsi(as, MIPSI_SD, src, RID_BASE, ofs);
>   #else
>         Reg src = ra_alloc1(as, ref, RSET_FPR);
>         emit_hsi(as, MIPSI_SDC1, src, RID_BASE, ofs);
> diff --git a/src/lj_crecord.c b/src/lj_crecord.c
> index ffe995f4..804cdbf4 100644
> --- a/src/lj_crecord.c
> +++ b/src/lj_crecord.c
> @@ -212,7 +212,7 @@ static void crec_copy_emit(jit_State *J, CRecMemList *ml, MSize mlp,
>       ml[i].trval = emitir(IRT(IR_XLOAD, ml[i].tp), trsptr, 0);
>       ml[i].trofs = trofs;
>       i++;
> -    rwin += (LJ_SOFTFP && ml[i].tp == IRT_NUM) ? 2 : 1;
> +    rwin += (LJ_SOFTFP32 && ml[i].tp == IRT_NUM) ? 2 : 1;
>       if (rwin >= CREC_COPY_REGWIN || i >= mlp) {  /* Flush buffered stores. */
>         rwin = 0;
>         for ( ; j < i; j++) {
> @@ -1152,7 +1152,7 @@ static TRef crec_call_args(jit_State *J, RecordFFData *rd,
>   	else
>   	  tr = emitconv(tr, IRT_INT, d->size==1 ? IRT_I8 : IRT_I16,IRCONV_SEXT);
>         }
> -    } else if (LJ_SOFTFP && ctype_isfp(d->info) && d->size > 4) {
> +    } else if (LJ_SOFTFP32 && ctype_isfp(d->info) && d->size > 4) {
>         lj_needsplit(J);
>       }
>   #if LJ_TARGET_X86
> diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h
> index 8a9ee24d..bb6593ae 100644
> --- a/src/lj_emit_mips.h
> +++ b/src/lj_emit_mips.h
> @@ -12,6 +12,8 @@ static intptr_t get_k64val(IRIns *ir)
>       return (intptr_t)ir_kgc(ir);
>     } else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) {
>       return (intptr_t)ir_kptr(ir);
> +  } else if (LJ_SOFTFP && ir->o == IR_KNUM) {
> +    return (intptr_t)ir_knum(ir)->u64;
>     } else {
>       lua_assert(ir->o == IR_KINT || ir->o == IR_KNULL);
>       return ir->i;  /* Sign-extended. */
> diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
> index 8af9da1d..0746ec64 100644
> --- a/src/lj_ffrecord.c
> +++ b/src/lj_ffrecord.c
> @@ -986,7 +986,7 @@ static void LJ_FASTCALL recff_string_format(jit_State *J, RecordFFData *rd)
>       handle_num:
>         tra = lj_ir_tonum(J, tra);
>         tr = lj_ir_call(J, id, tr, trsf, tra);
> -      if (LJ_SOFTFP) lj_needsplit(J);
> +      if (LJ_SOFTFP32) lj_needsplit(J);
>         break;
>       case STRFMT_STR:
>         if (!tref_isstr(tra)) {
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index aa06b273..c1ac29d1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -52,7 +52,7 @@ typedef struct CCallInfo {
>   #define CCI_XARGS(ci)		(((ci)->flags >> CCI_XARGS_SHIFT) & 3)
>   #define CCI_XA			(1u << CCI_XARGS_SHIFT)
>   
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>   #define CCI_XNARGS(ci)		(CCI_NARGS((ci)) + CCI_XARGS((ci)))
>   #else
>   #define CCI_XNARGS(ci)		CCI_NARGS((ci))
> @@ -79,13 +79,19 @@ typedef struct CCallInfo {
>   #define IRCALLCOND_SOFTFP_FFI(x)	NULL
>   #endif
>   
> -#if LJ_SOFTFP && LJ_TARGET_MIPS32
> +#if LJ_SOFTFP && LJ_TARGET_MIPS
>   #define IRCALLCOND_SOFTFP_MIPS(x)	x
>   #else
>   #define IRCALLCOND_SOFTFP_MIPS(x)	NULL
>   #endif
>   
> -#define LJ_NEED_FP64	(LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS32)
> +#if LJ_SOFTFP && LJ_TARGET_MIPS64
> +#define IRCALLCOND_SOFTFP_MIPS64(x)	x
> +#else
> +#define IRCALLCOND_SOFTFP_MIPS64(x)	NULL
> +#endif
> +
> +#define LJ_NEED_FP64	(LJ_TARGET_ARM || LJ_TARGET_PPC || LJ_TARGET_MIPS)
>   
>   #if LJ_HASFFI && (LJ_SOFTFP || LJ_NEED_FP64)
>   #define IRCALLCOND_FP64_FFI(x)		x
> @@ -113,6 +119,14 @@ typedef struct CCallInfo {
>   #define XA2_FP		0
>   #endif
>   
> +#if LJ_SOFTFP32
> +#define XA_FP32		CCI_XA
> +#define XA2_FP32	(CCI_XA+CCI_XA)
> +#else
> +#define XA_FP32		0
> +#define XA2_FP32	0
> +#endif
> +
>   #if LJ_32
>   #define XA_64		CCI_XA
>   #define XA2_64		(CCI_XA+CCI_XA)
> @@ -185,20 +199,21 @@ typedef struct CCallInfo {
>     _(ANY,	pow,			2,   N, NUM, XA2_FP) \
>     _(ANY,	atan2,			2,   N, NUM, XA2_FP) \
>     _(ANY,	ldexp,			2,   N, NUM, XA_FP) \
> -  _(SOFTFP,	lj_vm_tobit,		2,   N, INT, 0) \
> -  _(SOFTFP,	softfp_add,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_sub,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_mul,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_div,		4,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_cmp,		4,   N, NIL, 0) \
> +  _(SOFTFP,	lj_vm_tobit,		1,   N, INT, XA_FP32) \
> +  _(SOFTFP,	softfp_add,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_sub,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_mul,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_div,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP,	softfp_cmp,		2,   N, NIL, XA2_FP32) \
>     _(SOFTFP,	softfp_i2d,		1,   N, NUM, 0) \
> -  _(SOFTFP,	softfp_d2i,		2,   N, INT, 0) \
> -  _(SOFTFP_MIPS, lj_vm_sfmin,		4,   N, NUM, 0) \
> -  _(SOFTFP_MIPS, lj_vm_sfmax,		4,   N, NUM, 0) \
> +  _(SOFTFP,	softfp_d2i,		1,   N, INT, XA_FP32) \
> +  _(SOFTFP_MIPS, lj_vm_sfmin,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP_MIPS, lj_vm_sfmax,		2,   N, NUM, XA2_FP32) \
> +  _(SOFTFP_MIPS64, lj_vm_tointg,	1,   N, INT, 0) \
>     _(SOFTFP_FFI,	softfp_ui2d,		1,   N, NUM, 0) \
>     _(SOFTFP_FFI,	softfp_f2d,		1,   N, NUM, 0) \
> -  _(SOFTFP_FFI,	softfp_d2ui,		2,   N, INT, 0) \
> -  _(SOFTFP_FFI,	softfp_d2f,		2,   N, FLOAT, 0) \
> +  _(SOFTFP_FFI,	softfp_d2ui,		1,   N, INT, XA_FP32) \
> +  _(SOFTFP_FFI,	softfp_d2f,		1,   N, FLOAT, XA_FP32) \
>     _(SOFTFP_FFI,	softfp_i2f,		1,   N, FLOAT, 0) \
>     _(SOFTFP_FFI,	softfp_ui2f,		1,   N, FLOAT, 0) \
>     _(SOFTFP_FFI,	softfp_f2i,		1,   N, INT, 0) \
> diff --git a/src/lj_iropt.h b/src/lj_iropt.h
> index 73aef0ef..a59ba3f4 100644
> --- a/src/lj_iropt.h
> +++ b/src/lj_iropt.h
> @@ -150,7 +150,7 @@ LJ_FUNC IRType lj_opt_narrow_forl(jit_State *J, cTValue *forbase);
>   /* Optimization passes. */
>   LJ_FUNC void lj_opt_dce(jit_State *J);
>   LJ_FUNC int lj_opt_loop(jit_State *J);
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>   LJ_FUNC void lj_opt_split(jit_State *J);
>   #else
>   #define lj_opt_split(J)		UNUSED(J)
> diff --git a/src/lj_jit.h b/src/lj_jit.h
> index cc8efd20..c06829ab 100644
> --- a/src/lj_jit.h
> +++ b/src/lj_jit.h
> @@ -375,7 +375,7 @@ enum {
>     ((TValue *)(((intptr_t)&J->ksimd[2*(n)] + 15) & ~(intptr_t)15))
>   
>   /* Set/reset flag to activate the SPLIT pass for the current trace. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>   #define lj_needsplit(J)		(J->needsplit = 1)
>   #define lj_resetsplit(J)	(J->needsplit = 0)
>   #else
> @@ -438,7 +438,7 @@ typedef struct jit_State {
>     MSize sizesnapmap;	/* Size of temp. snapshot map buffer. */
>   
>     PostProc postproc;	/* Required post-processing after execution. */
> -#if LJ_SOFTFP || (LJ_32 && LJ_HASFFI)
> +#if LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)
>     uint8_t needsplit;	/* Need SPLIT pass. */
>   #endif
>     uint8_t retryrec;	/* Retry recording. */
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 45507e0d..bf95e1eb 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -984,6 +984,9 @@ static LJ_AINLINE void copyTV(lua_State *L, TValue *o1, const TValue *o2)
>   
>   #if LJ_SOFTFP
>   LJ_ASMF int32_t lj_vm_tobit(double x);
> +#if LJ_TARGET_MIPS64
> +LJ_ASMF int32_t lj_vm_tointg(double x);
> +#endif
>   #endif
>   
>   static LJ_AINLINE int32_t lj_num2bit(lua_Number n)
> diff --git a/src/lj_opt_split.c b/src/lj_opt_split.c
> index c0788106..2fc36b8d 100644
> --- a/src/lj_opt_split.c
> +++ b/src/lj_opt_split.c
> @@ -8,7 +8,7 @@
>   
>   #include "lj_obj.h"
>   
> -#if LJ_HASJIT && (LJ_SOFTFP || (LJ_32 && LJ_HASFFI))
> +#if LJ_HASJIT && (LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI))
>   
>   #include "lj_err.h"
>   #include "lj_buf.h"
> diff --git a/src/lj_snap.c b/src/lj_snap.c
> index a063c316..9146cddc 100644
> --- a/src/lj_snap.c
> +++ b/src/lj_snap.c
> @@ -93,7 +93,7 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
>   	    (ir->op2 & (IRSLOAD_READONLY|IRSLOAD_PARENT)) != IRSLOAD_PARENT)
>   	  sn |= SNAP_NORESTORE;
>         }
> -      if (LJ_SOFTFP && irt_isnum(ir->t))
> +      if (LJ_SOFTFP32 && irt_isnum(ir->t))
>   	sn |= SNAP_SOFTFPNUM;
>         map[n++] = sn;
>       }
> @@ -379,7 +379,7 @@ IRIns *lj_snap_regspmap(GCtrace *T, SnapNo snapno, IRIns *ir)
>   	  break;
>   	}
>         }
> -    } else if (LJ_SOFTFP && ir->o == IR_HIOP) {
> +    } else if (LJ_SOFTFP32 && ir->o == IR_HIOP) {
>         ref++;
>       } else if (ir->o == IR_PVAL) {
>         ref = ir->op1 + REF_BIAS;
> @@ -491,7 +491,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
>       } else {
>         IRType t = irt_type(ir->t);
>         uint32_t mode = IRSLOAD_INHERIT|IRSLOAD_PARENT;
> -      if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
> +      if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM)) t = IRT_NUM;
>         if (ir->o == IR_SLOAD) mode |= (ir->op2 & IRSLOAD_READONLY);
>         tr = emitir_raw(IRT(IR_SLOAD, t), s, mode);
>       }
> @@ -525,7 +525,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
>   	    if (irs->r == RID_SINK && snap_sunk_store(T, ir, irs)) {
>   	      if (snap_pref(J, T, map, nent, seen, irs->op2) == 0)
>   		snap_pref(J, T, map, nent, seen, T->ir[irs->op2].op1);
> -	      else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> +	      else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
>   		       irs+1 < irlast && (irs+1)->o == IR_HIOP)
>   		snap_pref(J, T, map, nent, seen, (irs+1)->op2);
>   	    }
> @@ -584,10 +584,10 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
>   		lua_assert(irc->o == IR_CONV && irc->op2 == IRCONV_NUM_INT);
>   		val = snap_pref(J, T, map, nent, seen, irc->op1);
>   		val = emitir(IRTN(IR_CONV), val, IRCONV_NUM_INT);
> -	      } else if ((LJ_SOFTFP || (LJ_32 && LJ_HASFFI)) &&
> +	      } else if ((LJ_SOFTFP32 || (LJ_32 && LJ_HASFFI)) &&
>   			 irs+1 < irlast && (irs+1)->o == IR_HIOP) {
>   		IRType t = IRT_I64;
> -		if (LJ_SOFTFP && irt_type((irs+1)->t) == IRT_SOFTFP)
> +		if (LJ_SOFTFP32 && irt_type((irs+1)->t) == IRT_SOFTFP)
>   		  t = IRT_NUM;
>   		lj_needsplit(J);
>   		if (irref_isk(irs->op2) && irref_isk((irs+1)->op2)) {
> @@ -645,7 +645,7 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
>       int32_t *sps = &ex->spill[regsp_spill(rs)];
>       if (irt_isinteger(t)) {
>         setintV(o, *sps);
> -#if !LJ_SOFTFP
> +#if !LJ_SOFTFP32
>       } else if (irt_isnum(t)) {
>         o->u64 = *(uint64_t *)sps;
>   #endif
> @@ -670,6 +670,9 @@ static void snap_restoreval(jit_State *J, GCtrace *T, ExitState *ex,
>   #if !LJ_SOFTFP
>       } else if (irt_isnum(t)) {
>         setnumV(o, ex->fpr[r-RID_MIN_FPR]);
> +#elif LJ_64  /* && LJ_SOFTFP */
> +    } else if (irt_isnum(t)) {
> +      o->u64 = ex->gpr[r-RID_MIN_GPR];
>   #endif
>   #if LJ_64 && !LJ_GC64
>       } else if (irt_is64(t)) {
> @@ -823,7 +826,7 @@ static void snap_unsink(jit_State *J, GCtrace *T, ExitState *ex,
>   	  val = lj_tab_set(J->L, t, &tmp);
>   	  /* NOBARRIER: The table is new (marked white). */
>   	  snap_restoreval(J, T, ex, snapno, rfilt, irs->op2, val);
> -	  if (LJ_SOFTFP && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
> +	  if (LJ_SOFTFP32 && irs+1 < T->ir + T->nins && (irs+1)->o == IR_HIOP) {
>   	    snap_restoreval(J, T, ex, snapno, rfilt, (irs+1)->op2, &tmp);
>   	    val->u32.hi = tmp.u32.lo;
>   	  }
> @@ -884,7 +887,7 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
>   	continue;
>         }
>         snap_restoreval(J, T, ex, snapno, rfilt, ref, o);
> -      if (LJ_SOFTFP && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
> +      if (LJ_SOFTFP32 && (sn & SNAP_SOFTFPNUM) && tvisint(o)) {
>   	TValue tmp;
>   	snap_restoreval(J, T, ex, snapno, rfilt, ref+1, &tmp);
>   	o->u32.hi = tmp.u32.lo;
> diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
> index 04be38f0..9839b5ac 100644
> --- a/src/vm_mips64.dasc
> +++ b/src/vm_mips64.dasc
> @@ -1984,6 +1984,38 @@ static void build_subroutines(BuildCtx *ctx)
>     |1:
>     |  jr ra
>     |.  move CRET1, r0
> +  |
> +  |// FP number to int conversion with a check for soft-float.
> +  |// Modifies CARG1, CRET1, CRET2, TMP0, AT.
> +  |->vm_tointg:
> +  |.if JIT
> +  |  dsll CRET2, CARG1, 1
> +  |  beqz CRET2, >2
> +  |.  li TMP0, 1076
> +  |  dsrl AT, CRET2, 53
> +  |  dsubu TMP0, TMP0, AT
> +  |  sltiu AT, TMP0, 54
> +  |  beqz AT, >1
> +  |.  dextm CRET2, CRET2, 0, 20
> +  |  dinsu CRET2, AT, 21, 21
> +  |  slt AT, CARG1, r0
> +  |  dsrlv CRET1, CRET2, TMP0
> +  |  dsubu CARG1, r0, CRET1
> +  |  movn CRET1, CARG1, AT
> +  |  li CARG1, 64
> +  |  subu TMP0, CARG1, TMP0
> +  |  dsllv CRET2, CRET2, TMP0	// Integer check.
> +  |  sextw AT, CRET1
> +  |  xor AT, CRET1, AT		// Range check.
> +  |  jr ra
> +  |.  movz CRET2, AT, CRET2
> +  |1:
> +  |  jr ra
> +  |.  li CRET2, 1
> +  |2:
> +  |  jr ra
> +  |.  move CRET1, r0
> +  |.endif
>     |.endif
>     |
>     |.macro .ffunc_bit, name
> @@ -2669,6 +2701,23 @@ static void build_subroutines(BuildCtx *ctx)
>     |.  li CRET1, 0
>     |.endif
>     |
> +  |.macro sfmin_max, name, intins
> +  |->vm_sf .. name:
> +  |.if JIT and not FPU
> +  |  move TMP2, ra
> +  |  bal ->vm_sfcmpolt
> +  |.  nop
> +  |  move ra, TMP2
> +  |  move TMP0, CRET1
> +  |  move CRET1, CARG1
> +  |  jr ra
> +  |.  intins CRET1, CARG2, TMP0
> +  |.endif
> +  |.endmacro
> +  |
> +  |  sfmin_max min, movz
> +  |  sfmin_max max, movn
> +  |
>     |//-----------------------------------------------------------------------
>     |//-- Miscellaneous functions --------------------------------------------
>     |//-----------------------------------------------------------------------
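
For readers following the 64-bit soft-float path in the diff above: with no FPU, a number lives in a GPR as its raw IEEE-754 bit pattern, so negation and absolute value reduce to sign-bit operations (the XOR with 0x8000000000000000 in asm_neg(), and the DEXTM in asm_abs(), which appears to extract the low 63 bits). The following is only a minimal C sketch of that idea; all `sketch_*` names and the two conversion helpers are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its raw 64-bit IEEE-754 pattern. */
static uint64_t double_to_bits(double d)
{
  uint64_t u;
  assert(sizeof(d) == sizeof(u));  /* Assumes a 64-bit double. */
  memcpy(&u, &d, sizeof(u));
  return u;
}

/* Reinterpret a raw 64-bit pattern as a double. */
static double bits_to_double(uint64_t u)
{
  double d;
  memcpy(&d, &u, sizeof(d));
  return d;
}

/* Negation: flip the sign bit (bit 63), leaving exponent and mantissa. */
static uint64_t sketch_softfp_neg(uint64_t bits)
{
  return bits ^ UINT64_C(0x8000000000000000);
}

/* Absolute value: keep the low 63 bits, i.e. clear the sign bit. */
static uint64_t sketch_softfp_abs(uint64_t bits)
{
  return bits & UINT64_C(0x7fffffffffffffff);
}
```

Neither operation needs a call into a soft-float library, which is presumably why these two cases are emitted inline while add, sub, mul and div go through asm_callid().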

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests
  2023-08-16 15:20     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 16:08       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:08 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Ok, thanks.

LGTM now

On 8/16/23 18:20, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
>
> On 16.08.23, Sergey Bronnikov wrote:
>> Hi, Sergey
>>
>>
>> Thanks for the patch!
>>
>> Sergey
>>
>> On 8/9/23 18:35, Sergey Kaplun wrote:
>>
>> <snipped>
>>
>>
>>> ls/frontend.lua b/test/tarantool-tests/utils/frontend.lua
>>> index 2afebbb2..414257fd 100644
>>> --- a/test/tarantool-tests/utils/frontend.lua
>>> +++ b/test/tarantool-tests/utils/frontend.lua
>>> @@ -1,6 +1,10 @@
>>>    local M = {}
>>>    
>>>    local bc = require('jit.bc')
>>> +local jutil = require('jit.util')
>>> +local vmdef = require('jit.vmdef')
>>> +local bcnames = vmdef.bcnames
>>> +local band, rshift = bit.band, bit.rshift
>>>    
>>>    function M.hasbc(f, bytecode)
>>>      assert(type(f) == 'function', 'argument #1 should be a function')
>>> @@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
>>>      return hasbc
>>>    end
>>>    
>>> +-- Get traceno of the trace associated with the given function.
>>> +function M.gettraceno(func)
>>> +  assert(type(func) == 'function', 'argument #1 should be a function')
>>> +
>>> +  -- The 0th BC is the header.
>>> +  local func_ins = jutil.funcbc(func, 0)
>>> +  local BC_NAME_LENGTH = 6
>>> +  local RD_SHIFT = 16
>>
>> Nit: AFAIK we usually leave a comment naming the source of such constants.
> Unfortunately, there is no real source for these constants,
> but the code is similar to <src/jit/bc.lua>. I'm not sure
> that it is worth mentioning.
>
>> <snipped>
>>
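
On the two constants discussed above: they seem to follow from the 32-bit bytecode instruction layout (opcode in the low 8 bits, operand A in the next 8, operand D in the upper 16) and from the opcode name table being stored as fixed-width 6-character entries. A minimal C sketch of that decoding, assuming this layout; the `sketch_*` helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
** Assumed instruction layout (32 bits):
**   bits  0..7   opcode
**   bits  8..15  operand A
**   bits 16..31  operand D
*/
static uint32_t sketch_bc_op(uint32_t ins) { return ins & 0xffu; }
static uint32_t sketch_bc_a(uint32_t ins) { return (ins >> 8) & 0xffu; }
static uint32_t sketch_bc_d(uint32_t ins) { return ins >> 16; }

/* Pack an opcode and its A and D operands into one instruction word. */
static uint32_t sketch_bc_make_ad(uint32_t op, uint32_t a, uint32_t d)
{
  assert(op <= 0xffu && a <= 0xffu && d <= 0xffffu);
  return op | (a << 8) | (d << 16);
}

/* Offset of the opcode's name in a table of 6-character entries. */
static uint32_t sketch_bc_name_offset(uint32_t ins)
{
  return 6u * sketch_bc_op(ins);
}
```

Under these assumptions, RD_SHIFT = 16 selects operand D and BC_NAME_LENGTH = 6 is the width of one name table entry.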

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots.
  2023-08-16 15:32     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 16:08       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:08 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Thanks! LGTM now

On 8/16/23 18:32, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
>
> On 16.08.23, Sergey Bronnikov wrote:
>> Hi, Sergey
>>
>>
>> thanks for the patch!
>>
>> The test passed after reverting the patch, and I suspect this is expected
>> because the behaviour was broken for MIPS only, right?
> Yes, that's true.
>
>> See a minor comment below.
>>
>>
>> On 8/9/23 18:35, Sergey Kaplun wrote:
>>> From: Mike Pall <mike>
>>>
>>> Contributed by Djordje Kovacevic and Stefan Pejic.
>>>
>>> (cherry-picked from commit c7c3c4da432ddb543d4b0a9abbb245f11b26afd0)
>>>
>>> `asm_setup_jump()` in <src/lj_asm_mips.h> presumes that `sizeof(MCLink)`
>>> is 8 bytes, but for MIPS64 its size is 16 bytes. This leads to an incorrect
>>> check in `asm_sparejump_setup()`, so the mcode bottom is not updated.
>>>
>>> This patch fixes the check of the MCLink offset from mcbot.
>>> Nevertheless, the emitting of spare jump slots is still incorrect, so
>>> the introduced test still fails due to incorrect iteration through the
>>> sparce table (the last slot is out of mcode range).
>> "sparce" -> "sparse"?
> Changed to "spare slots".
>
>> <snipped>
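
To illustrate the size assumption described in the commit message: a link header made of a pointer and a size is 8 bytes on a 32-bit target but 16 bytes on a 64-bit one, so an offset hardcoded for the former points into the header on the latter. The C sketch below uses a hypothetical `SketchLink` struct as a stand-in; it is not the real MCLink definition or the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a link header at the bottom of an mcode area. */
typedef struct SketchLink {
  uint32_t *next;  /* Next mcode area. */
  size_t size;     /* Size of the current area. */
} SketchLink;

/* Number of 4-byte instruction slots occupied by the link header. */
static size_t sketch_link_slots(void)
{
  assert(sizeof(SketchLink) % sizeof(uint32_t) == 0);
  return sizeof(SketchLink) / sizeof(uint32_t);
}

/* First instruction slot after the header, derived from sizeof(). */
static uint32_t *sketch_after_link(uint32_t *bottom)
{
  return bottom + sketch_link_slots();
}

/* Check whether a byte offset from the area start is just past the header. */
static int sketch_is_after_link(size_t offset)
{
  return offset == sizeof(SketchLink);
}
```

Deriving the offset from sizeof() rather than from a literal keeps the check valid for both pointer widths.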

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
  2023-08-15 12:09   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 16:40   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 16:40 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey


Thanks for the patch! LGTM


On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Ben Pye.
>
> (cherry-picked from commit c3c54ce1aef782823936808a75460e6b53aada2c)
>
> This patch adds partial support for the Universal Windows Platform [1]
> in LuaJIT.
> This includes:
> * `LJ_TARGET_UWP` is introduced to mark that the target is the Universal
>    Windows Platform.
> * `LJ_WIN_VALLOC()` macro is introduced to use instead of
>    `VirtualAlloc()` [2] (`VirtualAllocFromApp()` [3] for UWP)
> * `LJ_WIN_VPROTECT()` macro is introduced to use instead of
>    `VirtualProtect()` [4] (`VirtualProtectFromApp()` [5] for UWP)
> * `LJ_WIN_LOADLIBA()` macro is introduced to use instead of
>    `LoadLibraryExA()` [6] (custom implementation using
>    `LoadPackagedLibrary()` [7] for UWP).
>
> Note that the following features are not implemented for UWP:
> * `io.popen()`.
> * The LuaJIT profiler's (`jit.p`) timer for Windows has a lower
>    resolution, since `timeBeginPeriod()` [8] and `timeEndPeriod()` [9] are
>    not used, because the <winmm.dll> library isn't loaded.
>
> [1]: https://learn.microsoft.com/en-us/windows/uwp/get-started/universal-application-platform-guide
> [2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc
> [3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocfromapp
> [4]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect
> [5]: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotectfromapp
> [6]: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
> [7]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-loadpackagedlibrary
> [8]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
> [9]: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timeendperiod
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
>   doc/ext_ffi_api.html   |  2 ++
>   src/lib_ffi.c          |  3 +++
>   src/lib_io.c           |  4 ++--
>   src/lib_package.c      | 24 +++++++++++++++++++++++-
>   src/lj_alloc.c         |  6 +++---
>   src/lj_arch.h          | 19 +++++++++++++++++++
>   src/lj_ccallback.c     |  4 ++--
>   src/lj_clib.c          | 20 ++++++++++++++++----
>   src/lj_mcode.c         |  8 ++++----
>   src/lj_profile_timer.c |  8 ++++----
>   10 files changed, 78 insertions(+), 20 deletions(-)
>
> diff --git a/doc/ext_ffi_api.html b/doc/ext_ffi_api.html
> index 91af2e1d..c72191d1 100644
> --- a/doc/ext_ffi_api.html
> +++ b/doc/ext_ffi_api.html
> @@ -469,6 +469,8 @@ otherwise. The following parameters are currently defined:
>   <tr class="odd">
>   <td class="abiparam">win</td><td class="abidesc">Windows variant of the standard ABI</td></tr>
>   <tr class="even">
> +<td class="abiparam">uwp</td><td class="abidesc">Universal Windows Platform</td></tr>
> +<tr class="odd">
>   <td class="abiparam">gc64</td><td class="abidesc">64 bit GC references</td></tr>
>   </table>
>   
> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
> index 136e98e8..d1fe1a14 100644
> --- a/src/lib_ffi.c
> +++ b/src/lib_ffi.c
> @@ -746,6 +746,9 @@ LJLIB_CF(ffi_abi)	LJLIB_REC(.)
>   #endif
>   #if LJ_ABI_WIN
>     case H_(4ab624a8,4ab624a8): b = 1; break;  /* win */
> +#endif
> +#if LJ_TARGET_UWP
> +  case H_(a40f0bcb,a40f0bcb): b = 1; break;  /* uwp */
>   #endif
>     case H_(3af93066,1f001464): b = 1; break;  /* le/be */
>   #if LJ_GC64
> diff --git a/src/lib_io.c b/src/lib_io.c
> index f0108227..db995ae6 100644
> --- a/src/lib_io.c
> +++ b/src/lib_io.c
> @@ -99,7 +99,7 @@ static int io_file_close(lua_State *L, IOFileUD *iof)
>       int stat = -1;
>   #if LJ_TARGET_POSIX
>       stat = pclose(iof->fp);
> -#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE
> +#elif LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP
>       stat = _pclose(iof->fp);
>   #else
>       lua_assert(0);
> @@ -414,7 +414,7 @@ LJLIB_CF(io_open)
>   
>   LJLIB_CF(io_popen)
>   {
> -#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE)
> +#if LJ_TARGET_POSIX || (LJ_TARGET_WINDOWS && !LJ_TARGET_XBOXONE && !LJ_TARGET_UWP)
>     const char *fname = strdata(lj_lib_checkstr(L, 1));
>     GCstr *s = lj_lib_optstr(L, 2);
>     const char *mode = s ? strdata(s) : "r";
> diff --git a/src/lib_package.c b/src/lib_package.c
> index 67959a10..b49f0209 100644
> --- a/src/lib_package.c
> +++ b/src/lib_package.c
> @@ -76,6 +76,20 @@ static const char *ll_bcsym(void *lib, const char *sym)
>   BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
>   #endif
>   
> +#if LJ_TARGET_UWP
> +void *LJ_WIN_LOADLIBA(const char *path)
> +{
> +  DWORD err = GetLastError();
> +  wchar_t wpath[256];
> +  HANDLE lib = NULL;
> +  if (MultiByteToWideChar(CP_ACP, 0, path, -1, wpath, 256) > 0) {
> +    lib = LoadPackagedLibrary(wpath, 0);
> +  }
> +  SetLastError(err);
> +  return lib;
> +}
> +#endif
> +
>   #undef setprogdir
>   
>   static void setprogdir(lua_State *L)
> @@ -119,7 +133,7 @@ static void ll_unloadlib(void *lib)
>   
>   static void *ll_load(lua_State *L, const char *path, int gl)
>   {
> -  HINSTANCE lib = LoadLibraryExA(path, NULL, 0);
> +  HINSTANCE lib = LJ_WIN_LOADLIBA(path);
>     if (lib == NULL) pusherror(L);
>     UNUSED(gl);
>     return lib;
> @@ -132,17 +146,25 @@ static lua_CFunction ll_sym(lua_State *L, void *lib, const char *sym)
>     return f;
>   }
>   
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
>   static const char *ll_bcsym(void *lib, const char *sym)
>   {
>     if (lib) {
>       return (const char *)GetProcAddress((HINSTANCE)lib, sym);
>     } else {
> +#if LJ_TARGET_UWP
> +    return (const char *)GetProcAddress((HINSTANCE)&__ImageBase, sym);
> +#else
>       HINSTANCE h = GetModuleHandleA(NULL);
>       const char *p = (const char *)GetProcAddress(h, sym);
>       if (p == NULL && GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
>   					(const char *)ll_bcsym, &h))
>         p = (const char *)GetProcAddress(h, sym);
>       return p;
> +#endif
>     }
>   }
>   
> diff --git a/src/lj_alloc.c b/src/lj_alloc.c
> index f7039b5b..9e2fb1f6 100644
> --- a/src/lj_alloc.c
> +++ b/src/lj_alloc.c
> @@ -167,7 +167,7 @@ static void *DIRECT_MMAP(size_t size)
>   static void *CALL_MMAP(size_t size)
>   {
>     DWORD olderr = GetLastError();
> -  void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> +  void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
>     SetLastError(olderr);
>     return ptr ? ptr : MFAIL;
>   }
> @@ -176,8 +176,8 @@ static void *CALL_MMAP(size_t size)
>   static void *DIRECT_MMAP(size_t size)
>   {
>     DWORD olderr = GetLastError();
> -  void *ptr = VirtualAlloc(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> -			   PAGE_READWRITE);
> +  void *ptr = LJ_WIN_VALLOC(0, size, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN,
> +			    PAGE_READWRITE);
>     SetLastError(olderr);
>     return ptr ? ptr : MFAIL;
>   }
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 7397492e..0351e046 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -141,6 +141,13 @@
>   #define LJ_TARGET_GC64		1
>   #endif
>   
> +#ifdef _UWP
> +#define LJ_TARGET_UWP		1
> +#if LUAJIT_TARGET == LUAJIT_ARCH_X64
> +#define LJ_TARGET_GC64		1
> +#endif
> +#endif
> +
>   #define LJ_NUMMODE_SINGLE	0	/* Single-number mode only. */
>   #define LJ_NUMMODE_SINGLE_DUAL	1	/* Default to single-number mode. */
>   #define LJ_NUMMODE_DUAL		2	/* Dual-number mode only. */
> @@ -586,6 +593,18 @@
>   #define LJ_ABI_WIN		0
>   #endif
>   
> +#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_UWP
> +#define LJ_WIN_VALLOC	VirtualAllocFromApp
> +#define LJ_WIN_VPROTECT	VirtualProtectFromApp
> +extern void *LJ_WIN_LOADLIBA(const char *path);
> +#else
> +#define LJ_WIN_VALLOC	VirtualAlloc
> +#define LJ_WIN_VPROTECT	VirtualProtect
> +#define LJ_WIN_LOADLIBA(path)	LoadLibraryExA((path), NULL, 0)
> +#endif
> +#endif
> +
>   #if defined(LUAJIT_NO_UNWIND) || __GNU_COMPACT_EH__ || defined(__symbian__) || LJ_TARGET_IOS || LJ_TARGET_PS3 || LJ_TARGET_PS4
>   #define LJ_NO_UNWIND		1
>   #endif
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index c33190d7..37edd00f 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -267,7 +267,7 @@ static void callback_mcode_new(CTState *cts)
>     if (CALLBACK_MAX_SLOT == 0)
>       lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
>   #if LJ_TARGET_WINDOWS
> -  p = VirtualAlloc(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
> +  p = LJ_WIN_VALLOC(NULL, sz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
>     if (!p)
>       lj_err_caller(cts->L, LJ_ERR_FFI_CBACKOV);
>   #elif LJ_TARGET_POSIX
> @@ -285,7 +285,7 @@ static void callback_mcode_new(CTState *cts)
>   #if LJ_TARGET_WINDOWS
>     {
>       DWORD oprot;
> -    VirtualProtect(p, sz, PAGE_EXECUTE_READ, &oprot);
> +    LJ_WIN_VPROTECT(p, sz, PAGE_EXECUTE_READ, &oprot);
>     }
>   #elif LJ_TARGET_POSIX
>     mprotect(p, sz, (PROT_READ|PROT_EXEC));
> diff --git a/src/lj_clib.c b/src/lj_clib.c
> index c06c0915..a8672052 100644
> --- a/src/lj_clib.c
> +++ b/src/lj_clib.c
> @@ -158,11 +158,13 @@ BOOL WINAPI GetModuleHandleExA(DWORD, LPCSTR, HMODULE*);
>   /* Default libraries. */
>   enum {
>     CLIB_HANDLE_EXE,
> +#if !LJ_TARGET_UWP
>     CLIB_HANDLE_DLL,
>     CLIB_HANDLE_CRT,
>     CLIB_HANDLE_KERNEL32,
>     CLIB_HANDLE_USER32,
>     CLIB_HANDLE_GDI32,
> +#endif
>     CLIB_HANDLE_MAX
>   };
>   
> @@ -208,7 +210,7 @@ static const char *clib_extname(lua_State *L, const char *name)
>   static void *clib_loadlib(lua_State *L, const char *name, int global)
>   {
>     DWORD oldwerr = GetLastError();
> -  void *h = (void *)LoadLibraryExA(clib_extname(L, name), NULL, 0);
> +  void *h = LJ_WIN_LOADLIBA(clib_extname(L, name));
>     if (!h) clib_error(L, "cannot load module " LUA_QS ": %s", name);
>     SetLastError(oldwerr);
>     UNUSED(global);
> @@ -218,6 +220,7 @@ static void *clib_loadlib(lua_State *L, const char *name, int global)
>   static void clib_unloadlib(CLibrary *cl)
>   {
>     if (cl->handle == CLIB_DEFHANDLE) {
> +#if !LJ_TARGET_UWP
>       MSize i;
>       for (i = CLIB_HANDLE_KERNEL32; i < CLIB_HANDLE_MAX; i++) {
>         void *h = clib_def_handle[i];
> @@ -226,11 +229,16 @@ static void clib_unloadlib(CLibrary *cl)
>   	FreeLibrary((HINSTANCE)h);
>         }
>       }
> +#endif
>     } else if (cl->handle) {
>       FreeLibrary((HINSTANCE)cl->handle);
>     }
>   }
>   
> +#if LJ_TARGET_UWP
> +EXTERN_C IMAGE_DOS_HEADER __ImageBase;
> +#endif
> +
>   static void *clib_getsym(CLibrary *cl, const char *name)
>   {
>     void *p = NULL;
> @@ -239,6 +247,9 @@ static void *clib_getsym(CLibrary *cl, const char *name)
>       for (i = 0; i < CLIB_HANDLE_MAX; i++) {
>         HINSTANCE h = (HINSTANCE)clib_def_handle[i];
>         if (!(void *)h) {  /* Resolve default library handles (once). */
> +#if LJ_TARGET_UWP
> +	h = (HINSTANCE)&__ImageBase;
> +#else
>   	switch (i) {
>   	case CLIB_HANDLE_EXE: GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &h); break;
>   	case CLIB_HANDLE_DLL:
> @@ -249,11 +260,12 @@ static void *clib_getsym(CLibrary *cl, const char *name)
>   	  GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS|GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
>   			     (const char *)&_fmode, &h);
>   	  break;
> -	case CLIB_HANDLE_KERNEL32: h = LoadLibraryExA("kernel32.dll", NULL, 0); break;
> -	case CLIB_HANDLE_USER32: h = LoadLibraryExA("user32.dll", NULL, 0); break;
> -	case CLIB_HANDLE_GDI32: h = LoadLibraryExA("gdi32.dll", NULL, 0); break;
> +	case CLIB_HANDLE_KERNEL32: h = LJ_WIN_LOADLIBA("kernel32.dll"); break;
> +	case CLIB_HANDLE_USER32: h = LJ_WIN_LOADLIBA("user32.dll"); break;
> +	case CLIB_HANDLE_GDI32: h = LJ_WIN_LOADLIBA("gdi32.dll"); break;
>   	}
>   	if (!h) continue;
> +#endif
>   	clib_def_handle[i] = (void *)h;
>         }
>         p = (void *)GetProcAddress(h, name);
> diff --git a/src/lj_mcode.c b/src/lj_mcode.c
> index c6361018..10db4457 100644
> --- a/src/lj_mcode.c
> +++ b/src/lj_mcode.c
> @@ -66,8 +66,8 @@ void lj_mcode_sync(void *start, void *end)
>   
>   static void *mcode_alloc_at(jit_State *J, uintptr_t hint, size_t sz, DWORD prot)
>   {
> -  void *p = VirtualAlloc((void *)hint, sz,
> -			 MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
> +  void *p = LJ_WIN_VALLOC((void *)hint, sz,
> +			  MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, prot);
>     if (!p && !hint)
>       lj_trace_err(J, LJ_TRERR_MCODEAL);
>     return p;
> @@ -82,7 +82,7 @@ static void mcode_free(jit_State *J, void *p, size_t sz)
>   static int mcode_setprot(void *p, size_t sz, DWORD prot)
>   {
>     DWORD oprot;
> -  return !VirtualProtect(p, sz, prot, &oprot);
> +  return !LJ_WIN_VPROTECT(p, sz, prot, &oprot);
>   }
>   
>   #elif LJ_TARGET_POSIX
> @@ -255,7 +255,7 @@ static void *mcode_alloc(jit_State *J, size_t sz)
>   /* All memory addresses are reachable by relative jumps. */
>   static void *mcode_alloc(jit_State *J, size_t sz)
>   {
> -#ifdef __OpenBSD__
> +#if defined(__OpenBSD__) || LJ_TARGET_UWP
>     /* Allow better executable memory allocation for OpenBSD W^X mode. */
>     void *p = mcode_alloc_at(J, 0, sz, MCPROT_RUN);
>     if (p && mcode_setprot(p, sz, MCPROT_GEN)) {
> diff --git a/src/lj_profile_timer.c b/src/lj_profile_timer.c
> index 056fd1f7..0b859457 100644
> --- a/src/lj_profile_timer.c
> +++ b/src/lj_profile_timer.c
> @@ -84,7 +84,7 @@ static DWORD WINAPI timer_thread(void *timerx)
>   {
>     lj_profile_timer *timer = (lj_profile_timer *)timerx;
>     int interval = timer->opt.interval_msec;
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
>     timer->wmm_tbp(interval);
>   #endif
>     while (1) {
> @@ -92,7 +92,7 @@ static DWORD WINAPI timer_thread(void *timerx)
>       if (timer->abort) break;
>       timer->opt.handler();
>     }
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
>     timer->wmm_tep(interval);
>   #endif
>     return 0;
> @@ -101,9 +101,9 @@ static DWORD WINAPI timer_thread(void *timerx)
>   /* Start profiling timer thread. */
>   void lj_profile_timer_start(lj_profile_timer *timer)
>   {
> -#if LJ_TARGET_WINDOWS
> +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP
>     if (!timer->wmm) { /* Load WinMM library on-demand. */
> -    timer->wmm = LoadLibraryExA("winmm.dll", NULL, 0);
> +    timer->wmm = LJ_WIN_LOADLIBA("winmm.dll");
>       if (timer->wmm) {
>         timer->wmm_tbp =
>   	(WMM_TPFUNC)GetProcAddress(timer->wmm, "timeBeginPeriod");
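
The `mcode_alloc()` hunk above extends the OpenBSD W^X path to UWP: the region is first allocated with the run-time protection (`MCPROT_RUN`) and only then switched to the generation protection (`MCPROT_GEN`). A rough POSIX analogue of that sequence follows. It assumes that `MCPROT_RUN` means read+execute and `MCPROT_GEN` means read+write; `wx_alloc_for_gen` and `wx_seal` are hypothetical names, and the UWP build goes through `VirtualAllocFromApp`/`VirtualProtectFromApp` rather than `mmap`/`mprotect`.

```c
#define _DEFAULT_SOURCE  /* Exposes MAP_ANONYMOUS on glibc in strict modes. */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
** Map `sz` bytes as read+execute first, then flip them to read+write
** for code generation. Returns NULL if either step fails.
*/
void *wx_alloc_for_gen(size_t sz)
{
  void *p;
  assert(sz > 0);
  p = mmap(NULL, sz, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mprotect(p, sz, PROT_READ|PROT_WRITE) != 0) {
    munmap(p, sz);  /* Do not leak the mapping on failure. */
    return NULL;
  }
  return p;
}

/* Switch a filled region back to read+execute; returns 0 on success. */
int wx_seal(void *p, size_t sz)
{
  return mprotect(p, sz, PROT_READ|PROT_EXEC);
}
```

At no point is the region writable and executable at the same time, which is the property a W^X policy enforces.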


* Re: [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes.
  2023-08-15 13:07   ` Maxim Kokryashkin via Tarantool-patches
  2023-08-16 13:52     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 17:04     ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 17:04 UTC (permalink / raw)
  To: Maxim Kokryashkin, Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey


Thanks for the patch! LGTM

On 8/15/23 16:07, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> LGTM, except for a few comments below.
>
> On Wed, Aug 09, 2023 at 06:35:58PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> From: Mike Pall <mike>
>>
>> (cherry-picked from commit 70f4b15ee45a6137fe6b48b941faea79d72f7159)
>>
>> This patch refactors FFI parsing of supported C attributes and pragmas,
>> `ffi.abi()` parameter check. It replaces usage of comparison (with
> Typo: s/usage/the usage/
>> hardcoded string hashes) with search in the given string with the
> Typo: s/with search/with a search/
>> format: "\XXXattribute1\XXXattribute2", where `\XXX` is the length of
>> "attribute" name.
>>
>> Sergey Kaplun:
>> * added the description for the commit
>>
>> Part of tarantool/tarantool#8825
>> ---
>>   src/lib_ffi.c   | 35 ++++++++++------------
>>   src/lj_cparse.c | 77 +++++++++++++++++++++++++++++++------------------
>>   src/lj_cparse.h |  2 ++
>>   3 files changed, 67 insertions(+), 47 deletions(-)
>>
>> diff --git a/src/lib_ffi.c b/src/lib_ffi.c
>> index d1fe1a14..62af54c1 100644
>> --- a/src/lib_ffi.c
>> +++ b/src/lib_ffi.c
>> @@ -720,50 +720,47 @@ LJLIB_CF(ffi_fill)	LJLIB_REC(.)
>>     return 0;
>>   }
>>   
>> -#define H_(le, be)	LJ_ENDIAN_SELECT(0x##le, 0x##be)
>> -
>>   /* Test ABI string. */
>>   LJLIB_CF(ffi_abi)	LJLIB_REC(.)
>>   {
>>     GCstr *s = lj_lib_checkstr(L, 1);
>> -  int b = 0;
>> -  switch (s->hash) {
>> +  int b = lj_cparse_case(s,
>>   #if LJ_64
>> -  case H_(849858eb,ad35fd06): b = 1; break;  /* 64bit */
>> +    "\00564bit"
>>   #else
>> -  case H_(662d3c79,d0e22477): b = 1; break;  /* 32bit */
>> +    "\00532bit"
>>   #endif
>>   #if LJ_ARCH_HASFPU
>> -  case H_(e33ee463,e33ee463): b = 1; break;  /* fpu */
>> +    "\003fpu"
>>   #endif
>>   #if LJ_ABI_SOFTFP
>> -  case H_(61211a23,c2e8c81c): b = 1; break;  /* softfp */
>> +    "\006softfp"
>>   #else
>> -  case H_(539417a8,8ce0812f): b = 1; break;  /* hardfp */
>> +    "\006hardfp"
>>   #endif
>>   #if LJ_ABI_EABI
>> -  case H_(2182df8f,f2ed1152): b = 1; break;  /* eabi */
>> +    "\004eabi"
>>   #endif
>>   #if LJ_ABI_WIN
>> -  case H_(4ab624a8,4ab624a8): b = 1; break;  /* win */
>> +    "\003win"
>>   #endif
>>   #if LJ_TARGET_UWP
>> -  case H_(a40f0bcb,a40f0bcb): b = 1; break;  /* uwp */
>> +    "\003uwp"
>> +#endif
>> +#if LJ_LE
>> +    "\002le"
>> +#else
>> +    "\002be"
>>   #endif
>> -  case H_(3af93066,1f001464): b = 1; break;  /* le/be */
>>   #if LJ_GC64
>> -  case H_(9e89d2c9,13c83c92): b = 1; break;  /* gc64 */
>> +    "\004gc64"
>>   #endif
>> -  default:
>> -    break;
>> -  }
>> +  ) >= 0;
>>     setboolV(L->top-1, b);
>>     setboolV(&G(L)->tmptv2, b);  /* Remember for trace recorder. */
>>     return 1;
>>   }
>>   
>> -#undef H_
>> -
>>   LJLIB_PUSH(top-8) LJLIB_SET(!)  /* Store reference to miscmap table. */
>>   
>>   LJLIB_CF(ffi_metatype)
>> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
>> index fb440567..07c643d4 100644
>> --- a/src/lj_cparse.c
>> +++ b/src/lj_cparse.c
>> @@ -28,6 +28,24 @@
>>   ** If in doubt, please check the input against your favorite C compiler.
>>   */
>>   
>> +/* -- Miscellaneous ------------------------------------------------------- */
>> +
>> +/* Match string against a C literal. */
>> +#define cp_str_is(str, k) \
>> +  ((str)->len == sizeof(k)-1 && !memcmp(strdata(str), k, sizeof(k)-1))
>> +
>> +/* Check string against a linear list of matches. */
>> +int lj_cparse_case(GCstr *str, const char *match)
>> +{
>> +  MSize len;
>> +  int n;
>> +  for  (n = 0; (len = (MSize)*match++); n++, match += len) {
>> +    if (str->len == len && !memcmp(match, strdata(str), len))
>> +      return n;
>> +  }
>> +  return -1;
>> +}
>> +
>>   /* -- C lexer ------------------------------------------------------------- */
>>   
>>   /* C lexer token names. */
>> @@ -930,8 +948,6 @@ static CTypeID cp_decl_intern(CPState *cp, CPDecl *decl)
>>   
>>   /* -- C declaration parser ------------------------------------------------ */
>>   
>> -#define H_(le, be)	LJ_ENDIAN_SELECT(0x##le, 0x##be)
>> -
>>   /* Reset declaration state to declaration specifier. */
>>   static void cp_decl_reset(CPDecl *decl)
>>   {
>> @@ -1071,44 +1087,57 @@ static void cp_decl_gccattribute(CPState *cp, CPDecl *decl)
>>   	attrstr = lj_str_new(cp->L, c+2, attrstr->len-4);
>>   #endif
>>         cp_next(cp);
>> -      switch (attrstr->hash) {
>> -      case H_(64a9208e,8ce14319): case H_(8e6331b2,95a282af):  /* aligned */
>> +      switch (lj_cparse_case(attrstr,
>> +		"\007aligned" "\013__aligned__"
>> +		"\006packed" "\012__packed__"
>> +		"\004mode" "\010__mode__"
>> +		"\013vector_size" "\017__vector_size__"
>> +#if LJ_TARGET_X86
>> +		"\007regparm" "\013__regparm__"
>> +		"\005cdecl"  "\011__cdecl__"
>> +		"\010thiscall" "\014__thiscall__"
>> +		"\010fastcall" "\014__fastcall__"
>> +		"\007stdcall" "\013__stdcall__"
>> +		"\012sseregparm" "\016__sseregparm__"
>> +#endif
>> +	      )) {
>> +      case 0: case 1: /* aligned */
>>   	cp_decl_align(cp, decl);
>>   	break;
>> -      case H_(42eb47de,f0ede26c): case H_(29f48a09,cf383e0c):  /* packed */
>> +      case 2: case 3: /* packed */
>>   	decl->attr |= CTFP_PACKED;
>>   	break;
>> -      case H_(0a84eef6,8dfab04c): case H_(995cf92c,d5696591):  /* mode */
>> +      case 4: case 5: /* mode */
>>   	cp_decl_mode(cp, decl);
>>   	break;
>> -      case H_(0ab31997,2d5213fa): case H_(bf875611,200e9990):  /* vector_size */
>> +      case 6: case 7: /* vector_size */
>>   	{
>>   	  CTSize vsize = cp_decl_sizeattr(cp);
>>   	  if (vsize) CTF_INSERT(decl->attr, VSIZEP, lj_fls(vsize));
>>   	}
>>   	break;
>>   #if LJ_TARGET_X86
>> -      case H_(5ad22db8,c689b848): case H_(439150fa,65ea78cb):  /* regparm */
>> +      case 8: case 9: /* regparm */
>>   	CTF_INSERT(decl->fattr, REGPARM, cp_decl_sizeattr(cp));
>>   	decl->fattr |= CTFP_CCONV;
>>   	break;
>> -      case H_(18fc0b98,7ff4c074): case H_(4e62abed,0a747424):  /* cdecl */
>> +      case 10: case 11: /* cdecl */
>>   	CTF_INSERT(decl->fattr, CCONV, CTCC_CDECL);
>>   	decl->fattr |= CTFP_CCONV;
>>   	break;
>> -      case H_(72b2e41b,494c5a44): case H_(f2356d59,f25fc9bd):  /* thiscall */
>> +      case 12: case 13: /* thiscall */
>>   	CTF_INSERT(decl->fattr, CCONV, CTCC_THISCALL);
>>   	decl->fattr |= CTFP_CCONV;
>>   	break;
>> -      case H_(0d0ffc42,ab746f88): case H_(21c54ba1,7f0ca7e3):  /* fastcall */
>> +      case 14: case 15: /* fastcall */
>>   	CTF_INSERT(decl->fattr, CCONV, CTCC_FASTCALL);
>>   	decl->fattr |= CTFP_CCONV;
>>   	break;
>> -      case H_(ef76b040,9412e06a): case H_(de56697b,c750e6e1):  /* stdcall */
>> +      case 16: case 17: /* stdcall */
>>   	CTF_INSERT(decl->fattr, CCONV, CTCC_STDCALL);
>>   	decl->fattr |= CTFP_CCONV;
>>   	break;
>> -      case H_(ea78b622,f234bd8e): case H_(252ffb06,8d50f34b):  /* sseregparm */
>> +      case 18: case 19: /* sseregparm */
>>   	decl->fattr |= CTF_SSEREGPARM;
>>   	decl->fattr |= CTFP_CCONV;
>>   	break;
>> @@ -1140,16 +1169,13 @@ static void cp_decl_msvcattribute(CPState *cp, CPDecl *decl)
>>     while (cp->tok == CTOK_IDENT) {
>>       GCstr *attrstr = cp->str;
>>       cp_next(cp);
>> -    switch (attrstr->hash) {
>> -    case H_(bc2395fa,98f267f8):  /* align */
>> +    if (cp_str_is(attrstr, "align")) {
>>         cp_decl_align(cp, decl);
>> -      break;
>> -    default:  /* Ignore all other attributes. */
>> +    } else {  /* Ignore all other attributes. */
>>         if (cp_opt(cp, '(')) {
>>   	while (cp->tok != ')' && cp->tok != CTOK_EOF) cp_next(cp);
>>   	cp_check(cp, ')');
>>         }
>> -      break;
>>       }
>>     }
>>     cp_check(cp, ')');
>> @@ -1729,17 +1755,16 @@ static CTypeID cp_decl_abstract(CPState *cp)
>>   static void cp_pragma(CPState *cp, BCLine pragmaline)
>>   {
>>     cp_next(cp);
>> -  if (cp->tok == CTOK_IDENT &&
>> -      cp->str->hash == H_(e79b999f,42ca3e85))  {  /* pack */
>> +  if (cp->tok == CTOK_IDENT && cp_str_is(cp->str, "pack"))  {
>>       cp_next(cp);
>>       cp_check(cp, '(');
>>       if (cp->tok == CTOK_IDENT) {
>> -      if (cp->str->hash == H_(738e923c,a1b65954)) {  /* push */
>> +      if (cp_str_is(cp->str, "push")) {
>>   	if (cp->curpack < CPARSE_MAX_PACKSTACK) {
>>   	  cp->packstack[cp->curpack+1] = cp->packstack[cp->curpack];
>>   	  cp->curpack++;
>>   	}
>> -      } else if (cp->str->hash == H_(6c71cf27,6c71cf27)) {  /* pop */
>> +      } else if (cp_str_is(cp->str, "pop")) {
>>   	if (cp->curpack > 0) cp->curpack--;
>>         } else {
>>   	cp_errmsg(cp, cp->tok, LJ_ERR_XSYMBOL);
>> @@ -1788,13 +1813,11 @@ static void cp_decl_multi(CPState *cp)
>>         if (tok == CTOK_INTEGER) {
>>   	cp_line(cp, hashline);
>>   	continue;
>> -      } else if (tok == CTOK_IDENT &&
>> -		 cp->str->hash == H_(187aab88,fcb60b42)) { /* line */
>> +      } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "line")) {
>>   	if (cp_next(cp) != CTOK_INTEGER) cp_err_token(cp, tok);
>>   	cp_line(cp, hashline);
>>   	continue;
>> -      } else if (tok == CTOK_IDENT &&
>> -	  cp->str->hash == H_(f5e6b4f8,1d509107)) { /* pragma */
>> +      } else if (tok == CTOK_IDENT && cp_str_is(cp->str, "pragma")) {
>>   	cp_pragma(cp, hashline);
>>   	continue;
>>         } else {
>> @@ -1865,8 +1888,6 @@ static void cp_decl_single(CPState *cp)
>>     if (cp->tok != CTOK_EOF) cp_err_token(cp, CTOK_EOF);
>>   }
>>   
>> -#undef H_
>> -
>>   /* ------------------------------------------------------------------------ */
>>   
>>   /* Protected callback for C parser. */
>> diff --git a/src/lj_cparse.h b/src/lj_cparse.h
>> index bad1060b..e40b4047 100644
>> --- a/src/lj_cparse.h
>> +++ b/src/lj_cparse.h
>> @@ -60,6 +60,8 @@ typedef struct CPState {
>>   
>>   LJ_FUNC int lj_cparse(CPState *cp);
>>   
>> +LJ_FUNC int lj_cparse_case(GCstr *str, const char *match);
>> +
>>   #endif
>>   
>>   #endif
>> -- 
>> 2.41.0
>>
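
As an illustration of the "\XXXattribute1\XXXattribute2" format described in the commit message, here is a minimal standalone sketch of a length-prefixed list lookup. It follows the shape of `lj_cparse_case()` from the quoted diff but takes a plain C string and a length instead of a `GCstr`; the names `match_case`, `attr_list` and `attr_id` are made up for this example and are not part of LuaJIT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
** Find `str` (of length `len`) in a packed list of candidates.
** Each entry is one length byte followed by that many characters;
** the list ends at the literal's terminating NUL (a zero length).
** Returns the zero-based index of the match, or -1 if none matches.
*/
int match_case(const char *str, size_t len, const char *match)
{
  size_t mlen;
  int n;
  assert(str != NULL && match != NULL);
  for (n = 0; (mlen = (unsigned char)*match++) != 0; n++, match += mlen) {
    if (len == mlen && memcmp(match, str, mlen) == 0)
      return n;
  }
  return -1;
}

/* Hypothetical attribute list: plain and double-underscore spellings. */
static const char attr_list[] =
  "\007aligned" "\013__aligned__"
  "\006packed" "\012__packed__";

/* Map a list index to an attribute id; both spellings share one id. */
int attr_id(const char *name)
{
  int idx = match_case(name, strlen(name), attr_list);
  return idx < 0 ? -1 : idx / 2;
}
```

Because the plain and `__x__` spellings are adjacent in the list, the index pairs (0, 1) and (2, 3) collapse to one id each, which is why the quoted `switch` uses `case 0: case 1:`. The length prefix is a single byte, so one entry cannot exceed 255 characters.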


* Re: [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
  2023-08-11  8:06   ` Sergey Kaplun via Tarantool-patches
  2023-08-15 13:10   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-16 17:15   ` Sergey Bronnikov via Tarantool-patches
  2 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-16 17:15 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


Thanks for the patch! LGTM


On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry picked from commit 5655be4546d9177890c69f0d0accac4773ff0887)
>
> This patch backports the parts of the aforementioned commit for MIPS and
> PPC, because those architectures were stripped during the original
> backport in 71ec8eb232d4dfa8df2cbbae65b799b2ce493979 ("Cleanup math
> function compilation and fix inconsistencies."). Applying the missed
> diffs prevents conflicts when backporting future patches.
>
> This patch just removes macros that are no longer in use.
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#8825
> ---
>   src/lj_asm_mips.h | 1 -
>   src/lj_asm_ppc.h  | 1 -
>   2 files changed, 2 deletions(-)
>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index a26a82cd..c27d8413 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -1794,7 +1794,6 @@ static void asm_abs(ASMState *as, IRIns *ir)
>   }
>   #endif
>   
> -#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
>   #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
>   
>   static void asm_arithov(ASMState *as, IRIns *ir)
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index 6cb608f7..6aaed058 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -1390,7 +1390,6 @@ static void asm_neg(ASMState *as, IRIns *ir)
>   }
>   
>   #define asm_abs(as, ir)		asm_fpunary(as, ir, PPCI_FABS)
> -#define asm_atan2(as, ir)	asm_callid(as, ir, IRCALL_atan2)
>   #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
>   
>   static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi)


* Re: [Tarantool-patches]  [PATCH luajit 14/19] Fix debug.getinfo() argument check.
  2023-08-16 14:20     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 20:13       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16 20:13 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches



Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin
 
  
>Wednesday, 16 August 2023, 17:25 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
> 
>Hi, Maxim!
>Thanks for the review!
>
>On 15.08.23, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the patch!
>> Please consider my comments below.
>>
>> On Wed, Aug 09, 2023 at 06:36:03PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> > From: Mike Pall <mike>
>> >
>> > Thanks to Sergey Ostanevich.
>> >
>> > (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
>> >
>> > This patch just reverts the commit
>> > 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
>> > debug.getinfo(1,'>S')") and applies the one from the main repo for the
>> Typo: s/for the/for/
>
>Fixed.
>
>> > consistency with the upstream.
>> > ---
>> > src/lj_debug.c | 16 ++++++----------
>> > 1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> Since there was no test with the original fix, it would be nice to
>> add one.
>
>Added, see iterative diff below:
>
>===================================================================
>diff --git a/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
>new file mode 100644
>index 00000000..a50b80e4
>--- /dev/null
>+++ b/test/tarantool-tests/lj-509-debug-getinfo-arguments-check.test.lua
>@@ -0,0 +1,13 @@
>+local tap = require('tap')
>+
>+-- Test file to demonstrate crash in the `debug.getinfo()` call.
>+-- See also:  https://github.com/LuaJIT/LuaJIT/issues/509 .
>+local test = tap.test('lj-509-debug-getinfo-arguments-check.test.lua')
>+test:plan(2)
>+
>+-- '>' expects to have an extra argument on the stack.
>+local res, err = pcall(debug.getinfo, 1, '>S')
>+test:ok(not res, 'check result of the call with invalid arguments')
>+test:like(err, 'bad argument', 'check the error message')
>+
>+test:done(true)
>===================================================================
>
>> >
>
><snipped>
>
>> > 2.41.0
>> Best regards,
>> Maxim Kokryashkin
>> >
>
>--
>Best regards,
>Sergey Kaplun
 



* Re: [Tarantool-patches]  [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
  2023-08-16 15:17     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-16 20:14       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-16 20:14 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches



Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin
 
  
>Wednesday, 16 August 2023, 18:22 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
> 
>Hi, Maxim!
>Thanks for the review!
>Please, see my answers below.
>
>On 16.08.23, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the patch!
>> Please consider my comments below.
>> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> > From: Mike Pall <mike>
>> >
>> > Contributed by James Cowgill.
>> >
>> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
>> >
>> > The issue is observed for the following merged IRs:
>> > | p64 HREF 0001 "a" ; or other keys
>> > | > p64 EQ 0002 [0x4002d0c528] ; nilnode
>> > Sometimes, when we need to rematerialize a constant during evicting of
>> Typo: s/during evicting/during the eviction/
>
>Fixed.
>
>> > the register. So, the instruction related to constant rematerialization
>> Sometimes what happens? The sentence looks chopped off.
>
>The "when" is misleading here. Dropped it.
>
>> > is placed in the delay branch slot, which suppose to contain the loads
>> Typo: s/which suppose/which is supposed/
>
>Fixed.
>
>> > of trace exit number to the `$ra` register. The resulting assembly is
>> Typo: s/number/numbers/ (because of `loads` being in the plural form)
>
>Fixed.
>
>> > the following (for example):
>> > | beq ra, r1, 0x400abee9b0 ->exit
>> > | lui r1, 65531 ; delay slot without setting of the `ra`
>> > This leading to the assertion failure during trace exit in
>> Typo: s/leading/leads/
>
>Fixed.
>
>> > `lj_trace_exit()`, since a trace number is incorrect.
>> >
>> > This patch moves the constant register allocations above the main
>> > instruction emitting code in `asm_href()`.
>> AFAICS, it is not just moved: the register allocation logic has changed too.
>> Before the patch, there were a few cases of in-place emission, which
>> disappeared after the patch. I believe it is important to mention this too,
>> along with a more detailed description of the logic changes.
>
>No, the logic is the same; we just choose the register earlier.
>Since the `cmp64` register is now used everywhere, there is no need to
>duplicate code across the if / else if / else branches.
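
For reference, a standalone sketch of how the two 64-bit constants in the quoted hunk below are formed, assuming the LJ_GC64 value layout (a 17-bit type tag above a 47-bit payload). The helper names and the tag values are assumptions for illustration only, and unsigned arithmetic replaces the signed shifts of the diff so that the sketch stays within defined C behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed internal type tags: small negative numbers (~index). */
#define ITYPE_TRUE  (-3)  /* assumption: ~2 */
#define ITYPE_STR   (-5)  /* assumption: ~4 */

/* Constant GC key: tag in the top 17 bits, 47-bit address below it. */
uint64_t key_const_gc(int32_t itype, uint64_t addr)
{
  assert((addr >> 47) == 0);  /* The address must fit into 47 bits. */
  return ((uint64_t)(int64_t)itype << 47) | addr;
}

/* Primitive key (e.g. true): tag on top, all 47 payload bits set. */
uint64_t key_const_pri(int32_t itype)
{
  return ~((uint64_t)(int64_t)~itype << 47);
}
```

Either way the result is a single 64-bit immediate, so the node key can be checked with one `BEQ` against a register that already holds it, provided that register is allocated before the branch is emitted.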
>
>> >
>> > Sergey Kaplun:
>> > * added the description and the test for the problem
>> >
>> > Part of tarantool/tarantool#8825
>> > ---
>> > src/lj_asm_mips.h | 42 +++++---
>> > ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
>> > 2 files changed, 126 insertions(+), 17 deletions(-)
>> > create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>> >
>> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
>> > index c27d8413..23ffc3aa 100644
>> > --- a/src/lj_asm_mips.h
>> > +++ b/src/lj_asm_mips.h
>> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>> > Reg dest = ra_dest(as, ir, allow);
>> > Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
>> > Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
>> > +#if LJ_64
>> > + Reg cmp64 = RID_NONE;
>> > +#endif
>> > IRRef refkey = ir->op2;
>> > IRIns *irkey = IR(refkey);
>> > int isk = irref_isk(refkey);
>> > @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>> > #endif
>> > tmp2 = ra_scratch(as, allow);
>> > rset_clear(allow, tmp2);
>> > +#if LJ_64
>> > + if (LJ_SOFTFP || !irt_isnum(kt)) {
>> > + /* Allocate cmp64 register used for 64-bit comparisons */
>> > + if (LJ_SOFTFP && irt_isnum(kt)) {
>> > + cmp64 = key;
>> > + } else if (!isk && irt_isaddr(kt)) {
>> > + cmp64 = tmp2;
>> > + } else {
>> > + int64_t k;
>> > + if (isk && irt_isaddr(kt)) {
>> > + k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
>> > + } else {
>> > + lua_assert(irt_ispri(kt) && !irt_isnil(kt));
>> > + k = ~((int64_t)~irt_toitype(ir->t) << 47);
>> > + }
>> > + cmp64 = ra_allock(as, k, allow);
>> > + rset_clear(allow, cmp64);
>> > + }
>> > + }
>> > +#endif
>> >
>> > /* Key not found in chain: jump to exit (if merged) or load niltv. */
>> > l_end = emit_label(as);
>> > @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>> > emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
>> > emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
>> > emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>> > - } else if (LJ_SOFTFP && irt_isnum(kt)) {
>> > - emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
>> > - emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>> > - } else if (irt_isaddr(kt)) {
>> > - Reg refk = tmp2;
>> > - if (isk) {
>> > - int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
>> > - refk = ra_allock(as, k, allow);
>> > - rset_clear(allow, refk);
>> > - }
>> > - emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
>> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
>> > } else {
>> > - Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
>> > - rset_clear(allow, pri);
>> > - lua_assert(irt_ispri(kt) && !irt_isnil(kt));
>> > - emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
>> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
>> > + emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
>> > + emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>> > }
>> > *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
>> > if (!isk && irt_isaddr(kt)) {
>> > diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>> > new file mode 100644
>> > index 00000000..8c75e69c
>> > --- /dev/null
>> > +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>> > @@ -0,0 +1,101 @@
>> > +local tap = require('tap')
>> > +-- Test file to demonstrate the incorrect JIT behaviour for HREF
>> > +-- IR compilation on mips64.
>> > +-- See also  https://github.com/LuaJIT/LuaJIT/pull/362 .
>> > +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
>> > + ['Test requires JIT enabled'] = not jit.status(),
>> > +})
>> > +
>> > +test:plan(1)
>> > +
>> > +-- To reproduce the issue we need to compile a trace with
>> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To
>> Typo: s/constant/a constant/
>
>Fixed.
>
>> > +-- prevent an `IR_HREFK` to be emitted instead, we need a table
>> Typo: s/to be/from being/
>
>Fixed.
>
>> > +-- with a huge hash part. Delta of address between the start of
>> Typo: s/Delta/The delta/
>
>Fixed.
>
>> > +-- the hash part of the table and the current node to lookup must
>> > +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
>> Typo: s/more/greater/
>
>Fixed.
>
>> > +-- See <src/lj_record.c>, for details.
>> > +-- XXX: This constant is well suited to prevent test to be flaky,
>> Typo: s/to be/from being/
>
>Fixed.
>
>> > +-- because the aforementioned delta is always large enough.
>> > +-- Also, this constant avoids table rehashing, when inserting new
>> > +-- keys.
>> > +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
>> > +
>> > +-- XXX: don't set `hotexit` to prevent compilation of trace after
>> > +-- exiting the main test cycle.
>> I suggest rephrasing it the following way:
>> | The `hotexit` option is not set to prevent the compilation of traces
>> | after the emission of the main test cycle.
>
>Rephrased.
>
>> > +jit.opt.start('hotloop=1')
>> > +
>> > +-- Don't use `table.new()`, here by intence -- this leads to the
>> Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/
>
>Fixed.
>
>> > +-- allocation failure for the mcode memory, so traces are not
>> > +-- compiled.
>> > +local filled_tab = {}
>> > +-- Filling-up the table with GC values to minimize the amount of
>> Typo: s/Filling-up/Fill up/
>
>Fixed.
>
>> > +-- hash collisions and increase delta between the start of the
>> Typo: s/delta/the delta/
>
>Fixed.
>
>> > +-- hash part of the table and currently stored node.
>> Typo: s/currently/the currently/
>
>Fixed.
>
>> > +for _ = 1, N_HASH_FIELDS do
>> > + filled_tab[1LL] = 1
>> > +end
>> > +
>> > +-- luacheck: no unused
>> > +local tab_value_a
>> > +local tab_value_b
>> > +local tab_value_c
>> > +local tab_value_d
>> > +local tab_value_e
>> > +local tab_value_f
>> > +local tab_value_g
>> > +local tab_value_h
>> > +local tab_value_i
>> > +
>> > +-- The function for this trace has a bunch of the following IRs:
>> > +-- p64 HREF 0001 "a" ; or other keys
>> > +-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
>> > +-- Sometimes, when we need to rematerialize a constant during
>> > +-- evicting of the register. So, the instruction related to
>> Typo: s/evicting/the eviction/
>
>Fixed.
>
>> Again, sometimes happens what?
>
>The "when" is misleading here. Dropped it.
>
>> > +-- constant rematerialization is placed in the delay branch slot,
>> > +-- which suppose to contain the loads of trace exit number to the
>> Typo: s/which suppose/which is supposed/
>
>Fixed.
>
>> Typo: s/number/numbers/
>
>Fixed.
>
>> > +-- `$ra` register. This leading to the assertion failure during
>> Typo: s/leading/leads/
>
>Fixed.
>
>> > +-- trace exit in `lj_trace_exit()`, since a trace number is
>> > +-- incorrect. The amount of the side exit to check is empirical
>> Typo: s/exit/exits/
>
>Fixed.
>
>> > +-- (even a little bit more, than necessary just in case).
>> Typo: s/more/greater/
>
>Fixed.
>
>> > +local function href_const(tab)
>> > + tab_value_a = tab.a
>> > + tab_value_b = tab.b
>> > + tab_value_c = tab.c
>> > + tab_value_d = tab.d
>> > + tab_value_e = tab.e
>> > + tab_value_f = tab.f
>> > + tab_value_g = tab.g
>> > + tab_value_h = tab.h
>> > + tab_value_i = tab.i
>> > +end
>> > +
>> > +-- Compile main trace first.
>> Typo: s/main/the main/
>
>Fixed.
>
>> > +href_const(filled_tab)
>> > +href_const(filled_tab)
>> > +
>> > +-- Now brute-force side exits to check that they are compiled
>> > +-- correct. Take side exits in the reverse order to take a new
>> Typo: s/correct/correctly/
>> Typo: s/the reverse/reverse/
>
>Fixed.
>
><snipped>
>
>See the iterative patch below:
>
>===================================================================
>diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>index 8c75e69c..b4ee9e2b 100644
>--- a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>@@ -9,29 +9,29 @@ local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> test:plan(1)
> 
> -- To reproduce the issue we need to compile a trace with
>--- `IR_HREF`, with a lookup of constant hash key GC value. To
>--- prevent an `IR_HREFK` to be emitted instead, we need a table
>--- with a huge hash part. Delta of address between the start of
>--- the hash part of the table and the current node to lookup must
>--- be more than `(1024 * 64 - 1) * sizeof(Node)`.
>+-- `IR_HREF`, with a lookup of a constant hash key GC value. To
>+-- prevent an `IR_HREFK` from being emitted instead, we need a
>+-- table with a huge hash part. The delta of address between the
>+-- start of the hash part of the table and the current node to
>+-- lookup must be greater than `(1024 * 64 - 1) * sizeof(Node)`.
> -- See <src/lj_record.c>, for details.
>--- XXX: This constant is well suited to prevent test to be flaky,
>--- because the aforementioned delta is always large enough.
>+-- XXX: This constant is well suited to prevent test from being
>+-- flaky, because the aforementioned delta is always large enough.
> -- Also, this constant avoids table rehashing, when inserting new
> -- keys.
> local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
> 
>--- XXX: don't set `hotexit` to prevent compilation of trace after
>--- exiting the main test cycle.
>+-- XXX: The `hotexit` option is not set to prevent the compilation
>+-- of traces after the emission of the main test cycle.
> jit.opt.start('hotloop=1')
> 
>--- Don't use `table.new()`, here by intence -- this leads to the
>--- allocation failure for the mcode memory, so traces are not
>+-- `table.new()` is not used here by intention -- this leads to
>+-- the allocation failure for the mcode memory, so traces are not
> -- compiled.
> local filled_tab = {}
>--- Filling-up the table with GC values to minimize the amount of
>--- hash collisions and increase delta between the start of the
>--- hash part of the table and currently stored node.
>+-- Fill up the table with GC values to minimize the amount of hash
>+-- collisions and increase the delta between the start of the hash
>+-- part of the table and the currently stored node.
> for _ = 1, N_HASH_FIELDS do
>   filled_tab[1LL] = 1
> end
>@@ -50,14 +50,14 @@ local tab_value_i
> -- The function for this trace has a bunch of the following IRs:
> -- p64 HREF 0001 "a" ; or other keys
> -- > p64 EQ 0002 [0x4002d0c528] ; nilnode
>--- Sometimes, when we need to rematerialize a constant during
>--- evicting of the register. So, the instruction related to
>+-- Sometimes, we need to rematerialize a constant during the
>+-- eviction of the register. So, the instruction related to
> -- constant rematerialization is placed in the delay branch slot,
>--- which suppose to contain the loads of trace exit number to the
>--- `$ra` register. This leading to the assertion failure during
>--- trace exit in `lj_trace_exit()`, since a trace number is
>--- incorrect. The amount of the side exit to check is empirical
>--- (even a little bit more, than necessary just in case).
>+-- which is supposed to contain the load of the trace exit number
>+-- to the `$ra` register. This leads to the assertion failure
>+-- during trace exit in `lj_trace_exit()`, since a trace number is
>+-- incorrect. The amount of the side exits to check is empirical
>+-- (even a little bit greater, than necessary just in case).
> local function href_const(tab)
>   tab_value_a = tab.a
>   tab_value_b = tab.b
>@@ -70,13 +70,13 @@ local function href_const(tab)
>   tab_value_i = tab.i
> end
> 
>--- Compile main trace first.
>+-- Compile the main trace first.
> href_const(filled_tab)
> href_const(filled_tab)
> 
> -- Now brute-force side exits to check that they are compiled
>--- correct. Take side exits in the reverse order to take a new
>--- side exit each time.
>+-- correctly. Take side exits in reverse order to take a new side
>+-- exit each time.
> filled_tab.i = 'i'
> href_const(filled_tab)
> filled_tab.h = 'h'
>===================================================================
>
>> >
>
>--
>Best regards,
>Sergey Kaplun
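
A side note on the `cmp64` constant from the asm_href() hunk quoted at the
top of this message: the two `k = ...` expressions build the 64-bit key that
the emitted BEQ compares against. Below is a minimal sketch of that
arithmetic, assuming the usual LJ_GC64 layout (type tag in the upper 17 bits,
47-bit payload) and assuming the internal tag values LJ_TFALSE = ~1u,
LJ_TTRUE = ~2u, LJ_TSTR = ~4u. The names TAG_*, key_for_gcobj() and
key_for_pri() are hypothetical and exist only for illustration; unsigned
arithmetic is used here instead of the signed shifts in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed internal type tags (hypothetical names, values as in LJ_GC64). */
#define TAG_FALSE (~1u)
#define TAG_TRUE  (~2u)
#define TAG_STR   (~4u)

/* Key for a GC object constant: tag in bits 47..63, pointer below it. */
static uint64_t key_for_gcobj(uint32_t itype, uint64_t gcptr)
{
  assert((gcptr >> 47) == 0);  /* The payload must fit into 47 bits. */
  return ((uint64_t)itype << 47) | gcptr;
}

/* Key for a primitive (false/true): tag on top, all payload bits set. */
static uint64_t key_for_pri(uint32_t itype)
{
  return ~((uint64_t)(uint32_t)~itype << 47);
}
```

For example, key_for_gcobj(TAG_STR, 0x1234) mirrors the `isk && irt_isaddr(kt)`
branch, and key_for_pri(TAG_TRUE) mirrors the primitive branch.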
 



* Re: [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
  2023-08-15 13:17   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17  7:37   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17  7:37 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


Thanks for the patch! LGTM


On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit d4ee80342770d1281e2ce877f8ae8ab1d99e6528)
>
> This patch adds the `/* fallthrough */` where it may trigger the
> `-Wimplicit-fallthrough` [1] warning. Some cases still not covered by
> this comment and will be fixed in the future commits.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
>   dynasm/dasm_arm.h  |  2 ++
>   dynasm/dasm_mips.h |  1 +
>   dynasm/dasm_ppc.h  |  1 +
>   dynasm/dasm_x86.h  | 18 ++++++++++++++----
>   src/lj_asm.c       |  7 ++++++-
>   src/lj_cparse.c    | 10 ++++++++++
>   src/lj_err.c       |  1 +
>   src/lj_opt_sink.c  |  2 +-
>   src/lj_parse.c     |  3 ++-
>   src/luajit.c       |  1 +
>   10 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/dynasm/dasm_arm.h b/dynasm/dasm_arm.h
> index a43f7c66..1d404ccd 100644
> --- a/dynasm/dasm_arm.h
> +++ b/dynasm/dasm_arm.h
> @@ -254,6 +254,7 @@ void dasm_put(Dst_DECL, int start, ...)
>         case DASM_IMMV8:
>   	CK((n & 3) == 0, RANGE_I);
>   	n >>= 2;
> +	/* fallthrough */
>         case DASM_IMML8:
>         case DASM_IMML12:
>   	CK(n >= 0 ? ((n>>((ins>>5)&31)) == 0) :
> @@ -371,6 +372,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	  break;
>   	case DASM_REL_LG:
>   	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>   	case DASM_REL_PC:
>   	  CK(n >= 0, UNDEF_PC);
>   	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) - 4;
> diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
> index 4b49fd8c..71a835b2 100644
> --- a/dynasm/dasm_mips.h
> +++ b/dynasm/dasm_mips.h
> @@ -350,6 +350,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	  break;
>   	case DASM_REL_LG:
>   	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>   	case DASM_REL_PC:
>   	  CK(n >= 0, UNDEF_PC);
>   	  n = *DASM_POS2PTR(D, n);
> diff --git a/dynasm/dasm_ppc.h b/dynasm/dasm_ppc.h
> index 3a7ee9b0..83fc030a 100644
> --- a/dynasm/dasm_ppc.h
> +++ b/dynasm/dasm_ppc.h
> @@ -354,6 +354,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	  break;
>   	case DASM_REL_LG:
>   	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>   	case DASM_REL_PC:
>   	  CK(n >= 0, UNDEF_PC);
>   	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base);
> diff --git a/dynasm/dasm_x86.h b/dynasm/dasm_x86.h
> index bc636357..2a276042 100644
> --- a/dynasm/dasm_x86.h
> +++ b/dynasm/dasm_x86.h
> @@ -194,12 +194,13 @@ void dasm_put(Dst_DECL, int start, ...)
>         switch (action) {
>         case DASM_DISP:
>   	if (n == 0) { if (mrm < 0) mrm = p[-2]; if ((mrm&7) != 5) break; }
> -      case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob;
> +	/* fallthrough */
> +      case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
>         case DASM_REL_A: /* Assumes ptrdiff_t is int. !x64 */
>         case DASM_IMM_D: ofs += 4; break;
>         case DASM_IMM_S: CK(((n+128)&-256) == 0, RANGE_I); goto ob;
>         case DASM_IMM_B: CK((n&-256) == 0, RANGE_I); ob: ofs++; break;
> -      case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob;
> +      case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob; /* fallthrough */
>         case DASM_IMM_W: CK((n&-65536) == 0, RANGE_I); ofs += 2; break;
>         case DASM_SPACE: p++; ofs += n; break;
>         case DASM_SETLABEL: b[pos-2] = -0x40000000; break;  /* Neg. label ofs. */
> @@ -207,8 +208,8 @@ void dasm_put(Dst_DECL, int start, ...)
>   	if (*p < 0x40 && p[1] == DASM_DISP) mrm = n;
>   	if (*p < 0x20 && (n&7) == 4) ofs++;
>   	switch ((*p++ >> 3) & 3) {
> -	case 3: n |= b[pos-3];
> -	case 2: n |= b[pos-2];
> +	case 3: n |= b[pos-3]; /* fallthrough */
> +	case 2: n |= b[pos-2]; /* fallthrough */
>   	case 1: if (n <= 7) { b[pos-1] |= 0x10; ofs--; }
>   	}
>   	continue;
> @@ -329,11 +330,14 @@ int dasm_link(Dst_DECL, size_t *szp)
>   	  pos += 2;
>   	  break;
>   	}
> +	  /* fallthrough */
>   	case DASM_SPACE: case DASM_IMM_LG: case DASM_VREG: p++;
> +	  /* fallthrough */
>   	case DASM_DISP: case DASM_IMM_S: case DASM_IMM_B: case DASM_IMM_W:
>   	case DASM_IMM_D: case DASM_IMM_WB: case DASM_IMM_DB:
>   	case DASM_SETLABEL: case DASM_REL_A: case DASM_IMM_PC: pos++; break;
>   	case DASM_LABEL_LG: p++;
> +	  /* fallthrough */
>   	case DASM_LABEL_PC: b[pos++] += ofs; break; /* Fix label offset. */
>   	case DASM_ALIGN: ofs -= (b[pos++]+ofs)&*p++; break; /* Adjust ofs. */
>   	case DASM_EXTERN: p += 2; break;
> @@ -391,12 +395,15 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	    if (mrm != 5) { mm[-1] -= 0x80; break; } }
>   	  if (((n+128) & -256) != 0) goto wd; else mm[-1] -= 0x40;
>   	}
> +	  /* fallthrough */
>   	case DASM_IMM_S: case DASM_IMM_B: wb: dasmb(n); break;
>   	case DASM_IMM_DB: if (((n+128)&-256) == 0) {
>   	    db: if (!mark) mark = cp; mark[-2] += 2; mark = NULL; goto wb;
>   	  } else mark = NULL;
> +	  /* fallthrough */
>   	case DASM_IMM_D: wd: dasmd(n); break;
>   	case DASM_IMM_WB: if (((n+128)&-256) == 0) goto db; else mark = NULL;
> +	  /* fallthrough */
>   	case DASM_IMM_W: dasmw(n); break;
>   	case DASM_VREG: {
>   	  int t = *p++;
> @@ -421,6 +428,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	}
>   	case DASM_REL_LG: p++; if (n >= 0) goto rel_pc;
>   	  b++; n = (int)(ptrdiff_t)D->globals[-n];
> +	  /* fallthrough */
>   	case DASM_REL_A: rel_a: n -= (int)(ptrdiff_t)(cp+4); goto wd; /* !x64 */
>   	case DASM_REL_PC: rel_pc: {
>   	  int shrink = *b++;
> @@ -432,6 +440,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	}
>   	case DASM_IMM_LG:
>   	  p++; if (n < 0) { n = (int)(ptrdiff_t)D->globals[-n]; goto wd; }
> +	  /* fallthrough */
>   	case DASM_IMM_PC: {
>   	  int *pb = DASM_POS2PTR(D, n);
>   	  n = *pb < 0 ? pb[1] : (*pb + (int)(ptrdiff_t)base);
> @@ -452,6 +461,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	case DASM_EXTERN: n = DASM_EXTERN(Dst, cp, p[1], *p); p += 2; goto wd;
>   	case DASM_MARK: mark = cp; break;
>   	case DASM_ESC: action = *p++;
> +	  /* fallthrough */
>   	default: *cp++ = action; break;
>   	case DASM_SECTION: case DASM_STOP: goto stop;
>   	}
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index 15de7e33..2d570bb9 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -2188,9 +2188,12 @@ static void asm_setup_regsp(ASMState *as)
>   	if (ir->op2 != REF_NIL && as->evenspill < 4)
>   	  as->evenspill = 4;  /* lj_cdata_newv needs 4 args. */
>         }
> +      /* fallthrough */
>   #else
> +      /* fallthrough */
>       case IR_CNEW:
>   #endif
> +      /* fallthrough */
>       case IR_TNEW: case IR_TDUP: case IR_CNEWI: case IR_TOSTR:
>       case IR_BUFSTR:
>         ir->prev = REGSP_HINT(RID_RET);
> @@ -2206,6 +2209,7 @@ static void asm_setup_regsp(ASMState *as)
>       case IR_LDEXP:
>   #endif
>   #endif
> +      /* fallthrough */
>       case IR_POW:
>         if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>   	if (inloop)
> @@ -2217,7 +2221,7 @@ static void asm_setup_regsp(ASMState *as)
>   	continue;
>   #endif
>         }
> -      /* fallthrough for integer POW */
> +      /* fallthrough */ /* for integer POW */
>       case IR_DIV: case IR_MOD:
>         if (!irt_isnum(ir->t)) {
>   	ir->prev = REGSP_HINT(RID_RET);
> @@ -2254,6 +2258,7 @@ static void asm_setup_regsp(ASMState *as)
>       case IR_BSHL: case IR_BSHR: case IR_BSAR:
>         if ((as->flags & JIT_F_BMI2))  /* Except if BMI2 is available. */
>   	break;
> +      /* fallthrough */
>       case IR_BROL: case IR_BROR:
>         if (!irref_isk(ir->op2) && !ra_hashint(IR(ir->op2)->r)) {
>   	IR(ir->op2)->r = REGSP_HINT(RID_ECX);
> diff --git a/src/lj_cparse.c b/src/lj_cparse.c
> index 07c643d4..cd032b8e 100644
> --- a/src/lj_cparse.c
> +++ b/src/lj_cparse.c
> @@ -595,28 +595,34 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>   	k->id = k2.id > k3.id ? k2.id : k3.id;
>   	continue;
>         }
> +      /* fallthrough */
>       case 1:
>         if (cp_opt(cp, CTOK_OROR)) {
>   	cp_expr_sub(cp, &k2, 2); k->i32 = k->u32 || k2.u32; k->id = CTID_INT32;
>   	continue;
>         }
> +      /* fallthrough */
>       case 2:
>         if (cp_opt(cp, CTOK_ANDAND)) {
>   	cp_expr_sub(cp, &k2, 3); k->i32 = k->u32 && k2.u32; k->id = CTID_INT32;
>   	continue;
>         }
> +      /* fallthrough */
>       case 3:
>         if (cp_opt(cp, '|')) {
>   	cp_expr_sub(cp, &k2, 4); k->u32 = k->u32 | k2.u32; goto arith_result;
>         }
> +      /* fallthrough */
>       case 4:
>         if (cp_opt(cp, '^')) {
>   	cp_expr_sub(cp, &k2, 5); k->u32 = k->u32 ^ k2.u32; goto arith_result;
>         }
> +      /* fallthrough */
>       case 5:
>         if (cp_opt(cp, '&')) {
>   	cp_expr_sub(cp, &k2, 6); k->u32 = k->u32 & k2.u32; goto arith_result;
>         }
> +      /* fallthrough */
>       case 6:
>         if (cp_opt(cp, CTOK_EQ)) {
>   	cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 == k2.u32; k->id = CTID_INT32;
> @@ -625,6 +631,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>   	cp_expr_sub(cp, &k2, 7); k->i32 = k->u32 != k2.u32; k->id = CTID_INT32;
>   	continue;
>         }
> +      /* fallthrough */
>       case 7:
>         if (cp_opt(cp, '<')) {
>   	cp_expr_sub(cp, &k2, 8);
> @@ -659,6 +666,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>   	k->id = CTID_INT32;
>   	continue;
>         }
> +      /* fallthrough */
>       case 8:
>         if (cp_opt(cp, CTOK_SHL)) {
>   	cp_expr_sub(cp, &k2, 9); k->u32 = k->u32 << k2.u32;
> @@ -671,6 +679,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>   	  k->u32 = k->u32 >> k2.u32;
>   	continue;
>         }
> +      /* fallthrough */
>       case 9:
>         if (cp_opt(cp, '+')) {
>   	cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 + k2.u32;
> @@ -680,6 +689,7 @@ static void cp_expr_infix(CPState *cp, CPValue *k, int pri)
>         } else if (cp_opt(cp, '-')) {
>   	cp_expr_sub(cp, &k2, 10); k->u32 = k->u32 - k2.u32; goto arith_result;
>         }
> +      /* fallthrough */
>       case 10:
>         if (cp_opt(cp, '*')) {
>   	cp_expr_unary(cp, &k2); k->u32 = k->u32 * k2.u32; goto arith_result;
> diff --git a/src/lj_err.c b/src/lj_err.c
> index 9903d273..8d7134d9 100644
> --- a/src/lj_err.c
> +++ b/src/lj_err.c
> @@ -167,6 +167,7 @@ static void *err_unwind(lua_State *L, void *stopcf, int errcode)
>       case FRAME_CONT:  /* Continuation frame. */
>         if (frame_iscont_fficb(frame))
>   	goto unwind_c;
> +      /* fallthrough */
>       case FRAME_VARG:  /* Vararg frame. */
>         frame = frame_prevd(frame);
>         break;
> diff --git a/src/lj_opt_sink.c b/src/lj_opt_sink.c
> index a16d112f..c16363e7 100644
> --- a/src/lj_opt_sink.c
> +++ b/src/lj_opt_sink.c
> @@ -100,8 +100,8 @@ static void sink_mark_ins(jit_State *J)
>   	   (LJ_32 && ir+1 < irlast && (ir+1)->o == IR_HIOP &&
>   	    !sink_checkphi(J, ir, (ir+1)->op2))))
>   	irt_setmark(ir->t);  /* Mark ineligible allocation. */
> -      /* fallthrough */
>   #endif
> +      /* fallthrough */
>       case IR_USTORE:
>         irt_setmark(IR(ir->op2)->t);  /* Mark stored value. */
>         break;
> diff --git a/src/lj_parse.c b/src/lj_parse.c
> index 343fa797..e238afa3 100644
> --- a/src/lj_parse.c
> +++ b/src/lj_parse.c
> @@ -2684,7 +2684,8 @@ static int parse_stmt(LexState *ls)
>         lj_lex_next(ls);
>         parse_goto(ls);
>         break;
> -    }  /* else: fallthrough */
> +    }
> +    /* fallthrough */
>     default:
>       parse_call_assign(ls);
>       break;
> diff --git a/src/luajit.c b/src/luajit.c
> index 1ca24301..3a3ec247 100644
> --- a/src/luajit.c
> +++ b/src/luajit.c
> @@ -421,6 +421,7 @@ static int collectargs(char **argv, int *flags)
>         break;
>       case 'e':
>         *flags |= FLAGS_EXEC;
> +      /* fallthrough */
>       case 'j':  /* LuaJIT extension */
>       case 'l':
>         *flags |= FLAGS_OPTION;
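
For reference, the pattern this patch applies can be sketched in isolation.
The snippet below is only an illustration modelled on the collectargs() hunk
above, not the actual code from <src/luajit.c>: collect_flag() and the flag
values are hypothetical. With GCC >= 7.1 and `-Wimplicit-fallthrough`, the
marker comment placed right before the next `case` label is what silences the
warning for the intentional fallthrough from 'e'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, named after the ones in the hunk above. */
enum { FLAGS_EXEC = 1, FLAGS_OPTION = 2 };

/* Classify one option letter; returns 0 on success, -1 if it is unknown. */
static int collect_flag(char opt, int *flags)
{
  assert(flags != NULL);
  switch (opt) {
  case 'e':
    *flags |= FLAGS_EXEC;
    /* fallthrough */
  case 'j':  /* Adjacent labels without statements need no marker. */
  case 'l':
    *flags |= FLAGS_OPTION;
    break;
  default:
    return -1;
  }
  return 0;
}
```

Calling collect_flag('e', &flags) sets both bits, while 'j' and 'l' set only
FLAGS_OPTION.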


* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
  2023-08-15 13:21   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17  7:39   ` Sergey Bronnikov via Tarantool-patches
  2023-08-17  7:51     ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17  7:39 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


Thanks for the patch! LGTM


On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
>
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> is trigerred.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
>   dynasm/dasm_arm64.h | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/dynasm/dasm_arm64.h b/dynasm/dasm_arm64.h
> index 47e1e074..ff21236d 100644
> --- a/dynasm/dasm_arm64.h
> +++ b/dynasm/dasm_arm64.h
> @@ -427,6 +427,7 @@ int dasm_encode(Dst_DECL, void *buffer)
>   	  break;
>   	case DASM_REL_LG:
>   	  CK(n >= 0, UNDEF_LG);
> +	  /* fallthrough */
>   	case DASM_REL_PC:
>   	  CK(n >= 0, UNDEF_PC);
>   	  n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4;


* Re: [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
  2023-08-15 13:25   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17  7:44   ` Sergey Bronnikov via Tarantool-patches
  2023-08-17  8:01     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17  7:44 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!

LGTM, but I have a question inline


On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 9bd5a722bee2ee2c5b159a89937778b81be49915)
>
> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> missing for the ARM build, so the `-Wimplicit-fallthrough` [1] warning
> is triggered.
>
> Also, this commit sets the corresponding flag in the
> <cmake/SetTargetFlags.cmake>.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough
>
> Sergey Kaplun:
> * added the description for the commit
>
> Part of tarantool/tarantool#8825
> ---
>   cmake/SetTargetFlags.cmake | 6 ++++++
>   src/lj_asm.c               | 2 +-
>   src/lj_asm_arm.h           | 4 ++--
>   3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/cmake/SetTargetFlags.cmake b/cmake/SetTargetFlags.cmake
> index 3b9e481d..d309989e 100644
> --- a/cmake/SetTargetFlags.cmake
> +++ b/cmake/SetTargetFlags.cmake
> @@ -8,6 +8,12 @@
>   
>   include(CheckUnwindTables)
>   
> +# Clang does not recognize comment markers.
> +if (CMAKE_C_COMPILER_ID STREQUAL "GNU"
> +    AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")

GCC 7.1 because there was no 7.0 in release series 7 [1], right?

1. https://gcc.gnu.org/gcc-7/

> +  AppendFlags(TARGET_C_FLAGS -Wimplicit-fallthrough)
> +endif()
> +
>   if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
>     set(BUILDVM_MODE machasm)
>   else() # Linux and FreeBSD.


<snipped>



* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
  2023-08-17  7:39   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17  7:51     ` Sergey Bronnikov via Tarantool-patches
  2023-08-17  7:58       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17  7:51 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey, again

On 8/17/23 10:39, Sergey Bronnikov via Tarantool-patches wrote:
> Hi, Sergey!
>
>
> Thanks for the patch! LGTM
>
>
> On 8/9/23 18:36, Sergey Kaplun wrote:
>> From: Mike Pall <mike>
>>
>> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
>>
>> This patch adds the `/* fallthrough */` comments elsewhere, where it was
>> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
>> is trigerred.
>>
Typo: triggered


<snipped>


* Re: [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning.
  2023-08-17  7:51     ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17  7:58       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17  7:58 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey, again
> 
> On 8/17/23 10:39, Sergey Bronnikov via Tarantool-patches wrote:
> > Hi, Sergey!
> >
> >
> > Thanks for the patch! LGTM
> >
> >
> > On 8/9/23 18:36, Sergey Kaplun wrote:
> >> From: Mike Pall <mike>
> >>
> >> (cherry-picked from commit 9b41062156779160b88fe5e1eb1ece1ee1fe6a74)
> >>
> >> This patch adds the `/* fallthrough */` comments elsewhere, where it was
> >> missing for the ARM64 build, so the `-Wimplicit-fallthrough` [1] warning
> >> is trigerred.
> >>
> Typo: triggered

Fixed.

> 
> 
> <snipped>

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
  2023-08-17  7:44   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17  8:01     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17  8:01 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> LGTM, but I have a question inline
> 
> 

<snipped>

> > +    AND CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL "7.1")
> 
> GCC 7.1 because there was no 7.0 in release series 7 [1], right?
> 
> 1. https://gcc.gnu.org/gcc-7/

Yes, just the first version with this option.
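
As an illustration only (this is not part of the patch), the same condition
can be written with the compiler's predefined macros. This is a sketch under
the assumption that one wants a compile-time equivalent of the CMake check:
HAVE_WIMPLICIT_FALLTHROUGH and gcc_has_implicit_fallthrough() are
hypothetical names. Clang is excluded explicitly, since it also defines
`__GNUC__`.

```c
#include <assert.h>

/* 1 when compiled by GCC >= 7.1; Clang defines __GNUC__ too, so skip it. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 7 || (__GNUC__ == 7 && __GNUC_MINOR__ >= 1))
#define HAVE_WIMPLICIT_FALLTHROUGH 1
#else
#define HAVE_WIMPLICIT_FALLTHROUGH 0
#endif

/* The same version comparison as a plain function, checkable at run time. */
static int gcc_has_implicit_fallthrough(int major, int minor)
{
  assert(major >= 0 && minor >= 0);
  return major > 7 || (major == 7 && minor >= 1);
}
```

The function form mirrors `VERSION_GREATER_EQUAL "7.1"`: it accepts 7.1 and
anything newer, and rejects every 6.x release.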

<snipped>

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
  2023-08-15 13:35   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17  8:29   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17  8:29 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


LGTM

On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Sergey Ostanevich.
>
> (cherry-picked from commit 0cd643d7cfc21bc8b6153d42b86a71d557270988)
>
> This patch just reverts the commit
> 48f463e613db6264bfa9acb581fe1ca702ea38eb ("luajit: fox for
> debug.getinfo(1,'>S')") and applies the one from the main repo for the
> consistency with the upstream.
> ---
>   src/lj_debug.c | 16 ++++++----------
>   1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/src/lj_debug.c b/src/lj_debug.c
> index 654dc913..c4edcabb 100644
> --- a/src/lj_debug.c
> +++ b/src/lj_debug.c
> @@ -431,16 +431,12 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
>     TValue *frame = NULL;
>     TValue *nextframe = NULL;
>     GCfunc *fn;
> -  if (*what == '>') { /* we have to have an extra arg on stack */
> -    if (lua_gettop(L) > 2) {
> -      TValue *func = L->top - 1;
> -      api_check(L, tvisfunc(func));
> -      fn = funcV(func);
> -      L->top--;
> -      what++;
> -    } else { /* need better error to display? */
> -      return 0;
> -    }
> +  if (*what == '>') {
> +    TValue *func = L->top - 1;
> +    if (!tvisfunc(func)) return 0;
> +    fn = funcV(func);
> +    L->top--;
> +    what++;
>     } else {
>       uint32_t offset = (uint32_t)ar->i_ci & 0xffff;
>       uint32_t size = (uint32_t)ar->i_ci >> 16;


* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
  2023-08-17  8:57   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17  8:57     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17  8:57 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> On 8/9/23 18:36, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Thanks to Yichun Zhang.
> >
> > (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
> >
> > This patch is a predecessor of the commit
> > 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> > check."). It tries to fix the issue when `J->baseslot == LJ_MAX_JSLOTS`,
> > which leads to the assertion failure. Since the predecessor patch,
> > there are no places that can lead to the condition failure, since we
> > always check that new baseslot + framesize (+ vargframe) >=
> > `LJ_MAX_JSLOTS`. As the minimum framesize is 1 (see <src/lj_parse.c>
> > for details), we can't obtain this assertion failure. This patch is
> > added for consistency with the upstream.
> >
> > Since the predecessor patch fixes the issue, there is no new test case
> > to add.
> >
> > Sergey Kaplun:
> > * added the description for the problem
> Test for backported patch is missing. Why?

As mentioned above, there are two separate commits (the current one and
this one [1]). Since the second one fixes the issue and was backported
earlier, no new test case is provided (see the commit message).

> >
> > Part of tarantool/tarantool#8825
> <snipped>

[1]: https://github.com/LuaJIT/LuaJIT/commit/630ff319

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
  2023-08-15 14:07   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17  8:57   ` Sergey Bronnikov via Tarantool-patches
  2023-08-17  8:57     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17  8:57 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!

On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Yichun Zhang.
>
> (cherry-picked from commit 1c89933f129dde76944336c6bfd05297b8d67730)
>
> This patch is a predecessor of the commit
> 944d32afd6ddd9dbac1cddf64bf81333efeb9e30 ("Add missing LJ_MAX_JSLOTS
> check."). It tries to fix the issue when `J->baseslot == LJ_MAX_JSLOTS`,
> which leads to the assertion failure. Since the predecessor patch,
> there are no places that can lead to the condition failure, since we
> always check whether new baseslot + framesize (+ vargframe) >=
> `LJ_MAX_JSLOTS`. Since the minimum framesize is 1 (see <src/lj_parse.c>
> for details), we can't obtain this assertion failure. This patch is
> added for consistency with the upstream.
>
> Since the predecessor patch fixes the issue, there is no new test case
> to add.
>
> Sergey Kaplun:
> * added the description for the problem
Test for backported patch is missing. Why?
>
> Part of tarantool/tarantool#8825
<snipped>
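
The slot arithmetic from the commit message can be checked with a small sketch. This is not LuaJIT code: `MAX_JSLOTS` (set to 250 here as an assumption), `try_enter_frame()` and `max_accepted_baseslot()` are hypothetical names, and the abort condition is only my reading of the description above.

```c
#include <assert.h>

/* Assumed limit, for illustration only. */
#define MAX_JSLOTS 250

/*
** Hypothetical model of the check described above: a new frame is
** rejected when newbase + framesize reaches the limit.
** Returns 1 and updates *baseslot when the frame is accepted.
*/
int try_enter_frame(int *baseslot, int newbase, int framesize)
{
  assert(framesize >= 1);  /* Minimum frame size per the description. */
  if (newbase + framesize >= MAX_JSLOTS)
    return 0;  /* The recorder would abort the trace here. */
  *baseslot = newbase;
  return 1;
}

/* Largest baseslot that survives the check for any frame size >= 1. */
int max_accepted_baseslot(void)
{
  int best = 0, baseslot = 0, newbase, framesize;
  for (newbase = 0; newbase <= MAX_JSLOTS; newbase++)
    for (framesize = 1; framesize <= 8; framesize++)
      if (try_enter_frame(&baseslot, newbase, framesize) && baseslot > best)
        best = baseslot;
  return best;
}
```

Under these assumptions the largest accepted value is `MAX_JSLOTS - 2`, so a state with `baseslot == MAX_JSLOTS` is never reached through this check.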

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
  2023-08-15 14:38   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 10:53   ` Sergey Bronnikov via Tarantool-patches
  2023-08-17 13:57     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 10:53 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


thanks for the patch!

The test takes about 7 sec, so I propose adding a print before string.rep:

print('# test generation requires about 7 sec')

Otherwise it looks like the test hangs. Feel free to ignore.


LGTM after fixing comments from Max.


Sergey

On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 16e5605eec2e3882d709c6b123a644f6a8023945)
>
> This commit fixes a possible integer overflow of the separator's length
> counter while parsing long strings. The overflow may lead the parser to
> consider a string with unbalanced long brackets to be correct. Since it
> is pointless to parse overly long string separators in the hope that the
> string is correct, just use a hardcoded limit (0x20000000, i.e. 2 ^ 29,
> is enough).
>
> Be aware that this limit is different for Lua 5.1.
>
> We can't check the string overflow itself without a really large file,
> because the ERR_MEM error will be raised due to the string buffer
> reallocations during parsing. Keeping such a huge file in the repo is
> pointless, so just check that we don't parse a long string once the
> separator exceeds the aforementioned length.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8825
> ---
>   src/lj_lex.c                                  |  2 +-
>   .../lj-812-too-long-string-separator.test.lua | 31 +++++++++++++++++++
>   2 files changed, 32 insertions(+), 1 deletion(-)
>   create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
>
> diff --git a/src/lj_lex.c b/src/lj_lex.c
> index 52856912..c66660d7 100644
> --- a/src/lj_lex.c
> +++ b/src/lj_lex.c
> @@ -138,7 +138,7 @@ static int lex_skipeq(LexState *ls)
>     int count = 0;
>     LexChar s = ls->c;
>     lua_assert(s == '[' || s == ']');
> -  while (lex_savenext(ls) == '=')
> +  while (lex_savenext(ls) == '=' && count < 0x20000000)
>       count++;
>     return (ls->c == s) ? count : (-count) - 1;
>   }
> diff --git a/test/tarantool-tests/lj-812-too-long-string-separator.test.lua b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> new file mode 100644
> index 00000000..fda69d17
> --- /dev/null
> +++ b/test/tarantool-tests/lj-812-too-long-string-separator.test.lua
> @@ -0,0 +1,31 @@
> +local tap = require('tap')
> +
> +-- Test to check that we avoid parsing of too long separator
> +-- for long strings.
> +-- See also the discussion in the
> +-- https://github.com/LuaJIT/LuaJIT/issues/812.
> +
> +local test = tap.test('lj-812-too-long-string-separator'):skipcond({
> +  ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
> +})
> +test:plan(2)
> +
> +-- We can't check the string overflow itself without a really
> +-- large file, because the ERR_MEM error will be raised, due to
> +-- the string buffer reallocations during parsing.
> +-- Keep such huge file in the repo is pointless, so just check
> +-- that we don't parse long string after some separator length.
> +-- Be aware that this limit is different for Lua 5.1.
> +
> +-- Use the hardcoded limit. The same as in the <src/lj_lex.c>.
> +local separator = string.rep('=', 0x20000000 + 1)
> +local test_str = ('return [%s[]%s]'):format(separator, separator)
> +
> +local f, err = loadstring(test_str, 'empty_str_f')
> +test:ok(not f, 'correct status when parsing string with too long separator')
> +
> +-- Check error message.
> +test:ok(tostring(err):match('invalid long string delimiter'),
> +        'correct error when parsing string with too long separator')
> +
> +test:done(true)
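
The bounded counting loop from the `lex_skipeq()` hunk can be sketched on its own. This is a simplified model, not the LuaJIT lexer: `Cursor` and `skip_eq()` are hypothetical stand-ins for `LexState` and `lex_skipeq()`, and the limit is passed as a parameter so that the behaviour can be seen on short strings.

```c
#include <assert.h>

/* Hypothetical stand-in for the lexer state: a cursor over a C string. */
typedef struct {
  const char *p;
} Cursor;

/*
** Count the '=' characters after an opening or closing bracket, but
** never count beyond `limit`. Returns the count (>= 0) if the run of
** '=' is followed by the same bracket, or a negative value otherwise.
*/
int skip_eq(Cursor *c, int limit)
{
  int count = 0;
  char s = *c->p;
  assert(s == '[' || s == ']');
  c->p++;  /* Step over the first bracket. */
  while (*c->p == '=' && count < limit) {
    count++;
    c->p++;
  }
  /* If the limit stopped the loop, *c->p is still '=' and cannot match. */
  return (*c->p == s) ? count : (-count) - 1;
}
```

With a limit of 2, the input `[===[` yields -3: the loop stops on the third `=`, which does not match the bracket, so the delimiter is reported as invalid and the counter never grows past the limit.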

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF Sergey Kaplun via Tarantool-patches
  2023-08-16  9:01   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 11:06   ` Sergey Bronnikov via Tarantool-patches
  2023-08-17 13:50     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 11:06 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


Thanks for the patch!

The test still passes after reverting the patch.


On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by James Cowgill.
>
> (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
>
> The issue is observed for the following merged IRs:
> |    p64 HREF   0001  "a"            ; or other keys
> | >  p64 EQ     0002  [0x4002d0c528] ; nilnode
> Sometimes we need to rematerialize a constant while evicting a
> register. In that case, the instruction that rematerializes the constant
> is placed in the branch delay slot, which is supposed to contain the load
> of the trace exit number into the `$ra` register. The resulting assembly
> is the following (for example):
> | beq     ra, r1, 0x400abee9b0  ->exit
> | lui     r1, 65531   ; delay slot without setting of the `ra`
> This leads to an assertion failure during trace exit in
> `lj_trace_exit()`, since the trace number is incorrect.
>
> This patch moves the constant register allocations above the main
> instruction emitting code in `asm_href()`.
>
> Sergey Kaplun:
> * added the description and the test for the problem

The test still passes after reverting the patch. LuaJIT was built with and
without GC64.


<snipped>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable Sergey Kaplun via Tarantool-patches
  2023-08-16  9:03   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 12:01   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 12:01 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey


thanks for the patch! LGTM


(To be honest, I'm a bit confused why a single warning was fixed when
luacheck reports about 112 warnings in the dynasm/ directory. But this
question is out of scope for backporting.)


Sergey

On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Cleanup only, bug cannot trigger.
> Thanks to Domingo Alvarez Duarte.
>
> (cherry-picked from commit 5c911998a3c85d024a8006feafc68d0b4c962fd8)
>
> This patch fixes the shadowed local variable `n` in the `template__`
> function from <dynasm/dasm_mips.lua> by renaming it to `m`. Since this
> cannot be triggered, there is no test provided.
>
> Sergey Kaplun:
> * added the description for the problem
> ---
>   dynasm/dasm_mips.lua | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/dynasm/dasm_mips.lua b/dynasm/dasm_mips.lua
> index 78a4e34a..bd2a2b43 100644
> --- a/dynasm/dasm_mips.lua
> +++ b/dynasm/dasm_mips.lua
> @@ -809,9 +809,9 @@ map_op[".template__"] = function(params, template, nparams)
>       elseif p == "X" then
>         op = op + parse_index(params[n]); n = n + 1
>       elseif p == "B" or p == "J" then
> -      local mode, n, s = parse_label(params[n], false)
> -      if p == "B" then n = n + 2048 end
> -      waction("REL_"..mode, n, s, 1)
> +      local mode, m, s = parse_label(params[n], false)
> +      if p == "B" then m = m + 2048 end
> +      waction("REL_"..mode, m, s, 1)
>         n = n + 1
>       elseif p == "A" then
>         op = op + parse_imm(params[n], 5, 6, 0, false); n = n + 1
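
The effect of the shadowing can be reproduced in C, where a declaration in an inner block hides a parameter in the same way a Lua `local` does. This is only an analogy: `advance_shadowed()` and `advance_renamed()` are hypothetical functions, `label_ofs` stands in for the value returned by `parse_label()`, and the "B" case is assumed.

```c
/*
** Shape of the code before the fix: the inner `n` hides the parameter
** index, so the final increment changes the inner variable only.
*/
int advance_shadowed(int n, int label_ofs)
{
  {
    int n = label_ofs;  /* Shadows the outer index. */
    n = n + 2048;
    n = n + 1;          /* Meant for the outer index, but it is lost. */
    (void)n;
  }
  return n;
}

/* Shape of the code after the fix: the offset gets its own name. */
int advance_renamed(int n, int label_ofs)
{
  {
    int m = label_ofs;
    m = m + 2048;
    (void)m;
    n = n + 1;          /* Now advances the outer index. */
  }
  return n;
}
```

In this model the shadowed variant returns the index unchanged, while the renamed variant advances it by one.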

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port.
  2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port Sergey Kaplun via Tarantool-patches
  2023-08-16  9:16   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 13:03   ` Sergey Bronnikov via Tarantool-patches
  2023-08-17 13:59     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 13:03 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


thanks for the patch!

LGTM, after fixing extra whitespace in commit message.


Sergey

On 8/9/23 18:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Hua Zhang, YunQiang Su from Wave Computing,
> and Radovan Birdic from RT-RK.
> Sponsored by Wave Computing.
>
> (cherry-picked from commit 94d0b53004a5fa368defa4307a17edcdb87fe727)
>
> This patch adds support for MIPS Release 6 [1] for the 64-bit build.
> This includes:
> * Global `_map_def` value is set with <dynasm/dynasm.lua>. `MIPSR6` key
>    specifies the corresponding instruction set support. Also, `MIPSR6` is
>    defined in `DYNASM_FLAGS` (`DASM_AFLAGS`).
> * New instructions are added within <dynasm/dasm_mips.lua>, they are
>    used if the aforementioned key is set.
> * Obsolete instructions (that are no more in use in r6) are used in the
>    opposite case (if `MIPSR6` isn't set).
> * New opcode maps are added into  <src/jit/dis_mips.lua>.
Nit: double whitespace before "<".


<snipped>


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
  2023-08-17 11:06   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17 13:50     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 14:30       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17 13:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> 
> Thanks for the patch!
> 
> Test is passed after reverting the patch.
> 
> 
> On 8/9/23 18:36, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by James Cowgill.
> >
> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
> >
> > The issue is observed for the following merged IRs:
> > |    p64 HREF   0001  "a"            ; or other keys
> > | >  p64 EQ     0002  [0x4002d0c528] ; nilnode
> > Sometimes we need to rematerialize a constant while evicting a
> > register. In that case, the instruction that rematerializes the constant
> > is placed in the branch delay slot, which is supposed to contain the load
> > of the trace exit number into the `$ra` register. The resulting assembly
> > is the following (for example):
> > | beq     ra, r1, 0x400abee9b0  ->exit
> > | lui     r1, 65531   ; delay slot without setting of the `ra`
> > This leads to an assertion failure during trace exit in
> > `lj_trace_exit()`, since the trace number is incorrect.
> >
> > This patch moves the constant register allocations above the main
> > instruction emitting code in `asm_href()`.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> 
> Test is passed after reverting the patch. LuaJIT was built with and 
> without GC64.

The test case targets MIPS, since the changes affect only MIPS.
But in general it is good practice to run it on other arches too, to
catch any inconsistencies.

> 
> 
> <snipped>

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
  2023-08-17 10:53   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17 13:57     ` Sergey Kaplun via Tarantool-patches
  2023-08-17 14:28       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17 13:57 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> 
> thanks for the patch!
> 
> test duration is about 7 sec, I propose to add a print before string.rep:
> 
> print('# test generation requires about 7 sec')

I suppose it is a good thing to do, but there is another test
(gh-7745-oom-on-trace.test.lua) which takes a long time too.
Maybe it's better to create a group of "long tests", or something like
that. Also, we may introduce a `test:comment()` helper which provides the
same behaviour as `test:diag('# ' .. fmt, ...)` and use it in all
long tests. But I prefer to do it in a separate patch set, since this
one is already huge enough :).

> 
> Otherwise it looks like test hang. Feel free to ignore.
> 
> 
> LGTM after fixing comments from Max.
> 
> 
> Sergey

<snipped>

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port.
  2023-08-17 13:03   ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-17 13:59     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-08-17 13:59 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!

On 17.08.23, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> 
> thanks for the patch!
> 
> LGTM, after fixing extra whitespace in commit message.
> 
> 
> Sergey
> 
> On 8/9/23 18:36, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by Hua Zhang, YunQiang Su from Wave Computing,
> > and Radovan Birdic from RT-RK.
> > Sponsored by Wave Computing.
> >
> > (cherry-picked from commit 94d0b53004a5fa368defa4307a17edcdb87fe727)
> >
> > This patch adds support for MIPS Release 6 [1] for the 64-bit build.
> > This includes:
> > * Global `_map_def` value is set with <dynasm/dynasm.lua>. `MIPSR6` key
> >    specifies the corresponding instruction set support. Also, `MIPSR6` is
> >    defined in `DYNASM_FLAGS` (`DASM_AFLAGS`).
> > * New instructions are added within <dynasm/dasm_mips.lua>, they are
> >    used if the aforementioned key is set.
> > * Obsolete instructions (that are no more in use in r6) are used in the
> >    opposite case (if `MIPSR6` isn't set).
> > * New opcode maps are added into  <src/jit/dis_mips.lua>.
> Nit: double whitespace before "<".
> 

Fixed, thanks!

Branch is force-pushed.

> 
> <snipped>
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions
  2023-08-16 15:35 ` [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
@ 2023-08-17 14:06   ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-08-17 14:06 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches



Hi, Sergey!
Thanks for the fixes!
The whole patchset is LGTM now.
--
Best regards,
Maxim Kokryashkin
 
 
> 
>>Hi, folks!
>>I've fixed Maxim's comments for all patches, and rebased my branch to
>>tarantool/master.
>>Branch is force-pushed.
>>
>>--
>>Best regards,
>>Sergey Kaplun
> 


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings.
  2023-08-17 13:57     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-17 14:28       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:28 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, again

On 8/17/23 16:57, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
>
> On 17.08.23, Sergey Bronnikov wrote:
>> Hi, Sergey!
>>
>>
>> thanks for the patch!
>>
>> test duration is about 7 sec, I propose to add a print before string.rep:
>>
>> print('# test generation requires about 7 sec')
> I suppose it is a good thing to do, but there is another one test
> (gh-7745-oom-on-trace.test.lua) which takes a long time too.
> Maybe it's better to create a group of "long tests", or something like
> that. Also, we may introduce `test:comment()` helper which provides the
> same behaviour as the `test:diag('# ' .. fmt, ...)` and use it in all
> long tests. But I prefer to do it in the separate patch set, since this
> one is already huge enough :).
Okay, I won't insist. LGTM
>
>> Otherwise it looks like test hang. Feel free to ignore.
>>
>>
>> LGTM after fixing comments from Max.
>>
>>
>> Sergey
> <snipped>
>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
  2023-08-17 13:50     ` Sergey Kaplun via Tarantool-patches
@ 2023-08-17 14:30       ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:30 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi,

On 8/17/23 16:50, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
>
> On 17.08.23, Sergey Bronnikov wrote:
>> Hi, Sergey!
>>
>>
>> Thanks for the patch!
>>
>> Test is passed after reverting the patch.
>>
>>
>> On 8/9/23 18:36, Sergey Kaplun wrote:
>>> From: Mike Pall <mike>
>>>
>>> Contributed by James Cowgill.
>>>
>>> (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
>>>
>>> The issue is observed for the following merged IRs:
>>> |    p64 HREF   0001  "a"            ; or other keys
>>> | >  p64 EQ     0002  [0x4002d0c528] ; nilnode
>>> Sometimes we need to rematerialize a constant while evicting a
>>> register. In that case, the instruction that rematerializes the constant
>>> is placed in the branch delay slot, which is supposed to contain the load
>>> of the trace exit number into the `$ra` register. The resulting assembly
>>> is the following (for example):
>>> | beq     ra, r1, 0x400abee9b0  ->exit
>>> | lui     r1, 65531   ; delay slot without setting of the `ra`
>>> This leads to an assertion failure during trace exit in
>>> `lj_trace_exit()`, since the trace number is incorrect.
>>>
>>> This patch moves the constant register allocations above the main
>>> instruction emitting code in `asm_href()`.
>>>
>>> Sergey Kaplun:
>>> * added the description and the test for the problem
>> Test is passed after reverting the patch. LuaJIT was built with and
>> without GC64.
> The test case is for MIPS, since the changes are only for MIPS too.
> But in general it is good practise to test other arches too, for
> observing any inconsistencies.

Okay, LGTM now.


>>
>> <snipped>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
  2023-08-15 11:58   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 14:31   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:31 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey


thanks for the patch! LGTM

On 8/9/23 18:35, Sergey Kaplun wrote:
> This patch is a follow-up to the commit
> a170eb8be9475295f4f67a086e25ed665b95c8ea ("core: separate the profiling
> timer from lj_profile"). That commit moves the timer machinery to a
> separate module. Unfortunately, the `profile_{un}lock()` calls for
> Windows and PS3 weren't updated to access the `lj_profile_timer`
> structure instead of `ProfileState`.
>
> Also, it is a follow-up to the commit
> f8fa8f4bbd103ab07697487ca5cab08d57cdebf5 ("memprof: add profile common
> section"). Since that commit, the system-dependent header <unistd.h> and
> the `write()`, `open()`, `close()` functions are used. They are undefined
> on Windows, so this leads to an error during the build.
>
> This patch fixes the aforementioned misbehaviour. After it, our fork can
> at least be built on Windows.
> ---
>   src/lib_misc.c         | 16 ++++++++++++----
>   src/lj_profile_timer.h |  8 ++++----
>   2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index c18d297e..1913a622 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c
> @@ -8,10 +8,6 @@
>   #define lib_misc_c
>   #define LUA_LIB
>   
> -#include <errno.h>
> -#include <fcntl.h>
> -#include <unistd.h>
> -
>   #include "lua.h"
>   #include "lmisclib.h"
>   #include "lauxlib.h"
> @@ -25,6 +21,12 @@
>   
>   #include "lj_memprof.h"
>   
> +#include <errno.h>
> +#include <fcntl.h>
> +#if !LJ_TARGET_WINDOWS
> +#include <unistd.h>
> +#endif
> +
>   /* ------------------------------------------------------------------------ */
>   
>   static LJ_AINLINE void setnumfield(struct lua_State *L, GCtab *t,
> @@ -78,6 +80,7 @@ LJLIB_CF(misc_getmetrics)
>   
>   /* --------- profile common section --------------------------------------- */
>   
> +#if !LJ_TARGET_WINDOWS
>   /*
>   ** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
>   */
> @@ -434,6 +437,7 @@ LJLIB_CF(misc_memprof_stop)
>     lua_pushboolean(L, 1);
>     return 1;
>   }
> +#endif /* !LJ_TARGET_WINDOWS */
>   
>   #include "lj_libdef.h"
>   
> @@ -441,6 +445,7 @@ LJLIB_CF(misc_memprof_stop)
>   
>   LUALIB_API int luaopen_misc(struct lua_State *L)
>   {
> +#if !LJ_TARGET_WINDOWS
>     luaM_sysprof_set_writer(buffer_writer_default);
>     luaM_sysprof_set_on_stop(on_stop_cb_default);
>     /*
> @@ -448,9 +453,12 @@ LUALIB_API int luaopen_misc(struct lua_State *L)
>     ** backtracing function.
>     */
>     luaM_sysprof_set_backtracer(NULL);
> +#endif /* !LJ_TARGET_WINDOWS */
>   
>     LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
> +#if !LJ_TARGET_WINDOWS
>     LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
>     LJ_LIB_REG(L, LUAM_MISCLIBNAME ".sysprof", misc_sysprof);
> +#endif /* !LJ_TARGET_WINDOWS */
>     return 1;
>   }
> diff --git a/src/lj_profile_timer.h b/src/lj_profile_timer.h
> index 1deeea53..b3e1a6e9 100644
> --- a/src/lj_profile_timer.h
> +++ b/src/lj_profile_timer.h
> @@ -25,8 +25,8 @@
>   #if LJ_TARGET_PS3
>   #include <sys/timer.h>
>   #endif
> -#define profile_lock(ps)	pthread_mutex_lock(&ps->lock)
> -#define profile_unlock(ps)	pthread_mutex_unlock(&ps->lock)
> +#define profile_lock(ps)	pthread_mutex_lock(&ps->timer.lock)
> +#define profile_unlock(ps)	pthread_mutex_unlock(&ps->timer.lock)
>   
>   #elif LJ_PROFILE_WTHREAD
>   
> @@ -38,8 +38,8 @@
>   #include <windows.h>
>   #endif
>   typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int);
> -#define profile_lock(ps)	EnterCriticalSection(&ps->lock)
> -#define profile_unlock(ps)	LeaveCriticalSection(&ps->lock)
> +#define profile_lock(ps)	EnterCriticalSection(&ps->timer.lock)
> +#define profile_unlock(ps)	LeaveCriticalSection(&ps->timer.lock)
>   
>   #endif
>   
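
The header guard from the <src/lib_misc.c> hunk follows a common pattern, which can be sketched in isolation. The sketch is not taken from the patch: `SKETCH_TARGET_WINDOWS` is a hypothetical macro that stands in for `LJ_TARGET_WINDOWS` (derived here from `_WIN32` as an assumption), and `dump_buffer()` and `profilers_available()` are made-up helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Assumption: _WIN32 plays the role of LJ_TARGET_WINDOWS here. */
#if defined(_WIN32)
#define SKETCH_TARGET_WINDOWS 1
#else
#define SKETCH_TARGET_WINDOWS 0
#include <unistd.h>  /* Included on non-Windows targets only. */
#endif

#if !SKETCH_TARGET_WINDOWS
/* Write the whole buffer to a file; returns 0 on success, -1 on error. */
int dump_buffer(const char *path, const char *buf, size_t len)
{
  int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
  if (fd == -1)
    return -1;
  while (len > 0) {
    ssize_t n = write(fd, buf, len);
    if (n == -1) {
      if (errno == EINTR)
        continue;  /* Interrupted by a signal: retry the write. */
      close(fd);
      return -1;
    }
    buf += n;
    len -= (size_t)n;
  }
  return close(fd);
}
#endif /* !SKETCH_TARGET_WINDOWS */

/* Callers can use this to skip registration of the guarded features. */
int profilers_available(void)
{
  return !SKETCH_TARGET_WINDOWS;
}
```

Everything that relies on `open()`, `write()` and `close()` sits behind the same guard as the header, so a Windows build never sees the undeclared functions.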

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
  2023-08-15 11:46   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 14:33   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:33 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey!


thanks for the patch! LGTM

On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the JIT compiler for powerpc.
> This includes:
> * All FP-dependent paths are instrumented with the `LJ_SOFTFP` macro.
> * `asm_sfpmin_max()` is introduced for soft-float min/max
>    operations.
> * `asm_sfpcomp()` is introduced for soft-float comparisons.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
>   src/lj_arch.h    |   1 -
>   src/lj_asm_ppc.h | 321 ++++++++++++++++++++++++++++++++++++++++-------
>   2 files changed, 278 insertions(+), 44 deletions(-)
>
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index 8bb8757d..7397492e 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -281,7 +281,6 @@
>   #endif
>   
>   #if LJ_ABI_SOFTFP
> -#define LJ_ARCH_NOJIT		1  /* NYI */
>   #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
>   #else
>   #define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
> diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h
> index aa2d45c0..6cb608f7 100644
> --- a/src/lj_asm_ppc.h
> +++ b/src/lj_asm_ppc.h
> @@ -226,6 +226,7 @@ static void asm_fusexrefx(ASMState *as, PPCIns pi, Reg rt, IRRef ref,
>     emit_tab(as, pi, rt, left, right);
>   }
>   
> +#if !LJ_SOFTFP
>   /* Fuse to multiply-add/sub instruction. */
>   static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
>   {
> @@ -245,6 +246,7 @@ static int asm_fusemadd(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pir)
>     }
>     return 0;
>   }
> +#endif
>   
>   /* -- Calls --------------------------------------------------------------- */
>   
> @@ -253,13 +255,17 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>   {
>     uint32_t n, nargs = CCI_XNARGS(ci);
>     int32_t ofs = 8;
> -  Reg gpr = REGARG_FIRSTGPR, fpr = REGARG_FIRSTFPR;
> +  Reg gpr = REGARG_FIRSTGPR;
> +#if !LJ_SOFTFP
> +  Reg fpr = REGARG_FIRSTFPR;
> +#endif
>     if ((void *)ci->func)
>       emit_call(as, (void *)ci->func);
>     for (n = 0; n < nargs; n++) {  /* Setup args. */
>       IRRef ref = args[n];
>       if (ref) {
>         IRIns *ir = IR(ref);
> +#if !LJ_SOFTFP
>         if (irt_isfp(ir->t)) {
>   	if (fpr <= REGARG_LASTFPR) {
>   	  lua_assert(rset_test(as->freeset, fpr));  /* Already evicted. */
> @@ -271,7 +277,9 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>   	  emit_spstore(as, ir, r, ofs);
>   	  ofs += irt_isnum(ir->t) ? 8 : 4;
>   	}
> -      } else {
> +      } else
> +#endif
> +      {
>   	if (gpr <= REGARG_LASTGPR) {
>   	  lua_assert(rset_test(as->freeset, gpr));  /* Already evicted. */
>   	  ra_leftov(as, gpr, ref);
> @@ -290,8 +298,10 @@ static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args)
>       }
>       checkmclim(as);
>     }
> +#if !LJ_SOFTFP
>     if ((ci->flags & CCI_VARARG))  /* Vararg calls need to know about FPR use. */
>       emit_tab(as, fpr == REGARG_FIRSTFPR ? PPCI_CRXOR : PPCI_CREQV, 6, 6, 6);
> +#endif
>   }
>   
>   /* Setup result reg/sp for call. Evict scratch regs. */
> @@ -299,8 +309,10 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
>   {
>     RegSet drop = RSET_SCRATCH;
>     int hiop = ((ir+1)->o == IR_HIOP && !irt_isnil((ir+1)->t));
> +#if !LJ_SOFTFP
>     if ((ci->flags & CCI_NOFPRCLOBBER))
>       drop &= ~RSET_FPR;
> +#endif
>     if (ra_hasreg(ir->r))
>       rset_clear(drop, ir->r);  /* Dest reg handled below. */
>     if (hiop && ra_hasreg((ir+1)->r))
> @@ -308,7 +320,7 @@ static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci)
>     ra_evictset(as, drop);  /* Evictions must be performed first. */
>     if (ra_used(ir)) {
>       lua_assert(!irt_ispri(ir->t));
> -    if (irt_isfp(ir->t)) {
> +    if (!LJ_SOFTFP && irt_isfp(ir->t)) {
>         if ((ci->flags & CCI_CASTU64)) {
>   	/* Use spill slot or temp slots. */
>   	int32_t ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
> @@ -377,6 +389,7 @@ static void asm_retf(ASMState *as, IRIns *ir)
>   
>   /* -- Type conversions ---------------------------------------------------- */
>   
> +#if !LJ_SOFTFP
>   static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
>   {
>     RegSet allow = RSET_FPR;
> @@ -409,15 +422,23 @@ static void asm_tobit(ASMState *as, IRIns *ir)
>     emit_fai(as, PPCI_STFD, tmp, RID_SP, SPOFS_TMP);
>     emit_fab(as, PPCI_FADD, tmp, left, right);
>   }
> +#endif
>   
>   static void asm_conv(ASMState *as, IRIns *ir)
>   {
>     IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK);
> +#if !LJ_SOFTFP
>     int stfp = (st == IRT_NUM || st == IRT_FLOAT);
> +#endif
>     IRRef lref = ir->op1;
> -  lua_assert(irt_type(ir->t) != st);
>     lua_assert(!(irt_isint64(ir->t) ||
>   	       (st == IRT_I64 || st == IRT_U64))); /* Handled by SPLIT. */
> +#if LJ_SOFTFP
> +  /* FP conversions are handled by SPLIT. */
> +  lua_assert(!irt_isfp(ir->t) && !(st == IRT_NUM || st == IRT_FLOAT));
> +  /* Can't check for same types: SPLIT uses CONV int.int + BXOR for sfp NEG. */
> +#else
> +  lua_assert(irt_type(ir->t) != st);
>     if (irt_isfp(ir->t)) {
>       Reg dest = ra_dest(as, ir, RSET_FPR);
>       if (stfp) {  /* FP to FP conversion. */
> @@ -476,7 +497,9 @@ static void asm_conv(ASMState *as, IRIns *ir)
>   	emit_fb(as, PPCI_FCTIWZ, tmp, left);
>         }
>       }
> -  } else {
> +  } else
> +#endif
> +  {
>       Reg dest = ra_dest(as, ir, RSET_GPR);
>       if (st >= IRT_I8 && st <= IRT_U16) {  /* Extend to 32 bit integer. */
>         Reg left = ra_alloc1(as, ir->op1, RSET_GPR);
> @@ -496,17 +519,41 @@ static void asm_strto(ASMState *as, IRIns *ir)
>   {
>     const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num];
>     IRRef args[2];
> -  int32_t ofs;
> +  int32_t ofs = SPOFS_TMP;
> +#if LJ_SOFTFP
> +  ra_evictset(as, RSET_SCRATCH);
> +  if (ra_used(ir)) {
> +    if (ra_hasspill(ir->s) && ra_hasspill((ir+1)->s) &&
> +	(ir->s & 1) == LJ_BE && (ir->s ^ 1) == (ir+1)->s) {
> +      int i;
> +      for (i = 0; i < 2; i++) {
> +	Reg r = (ir+i)->r;
> +	if (ra_hasreg(r)) {
> +	  ra_free(as, r);
> +	  ra_modified(as, r);
> +	  emit_spload(as, ir+i, r, sps_scale((ir+i)->s));
> +	}
> +      }
> +      ofs = sps_scale(ir->s & ~1);
> +    } else {
> +      Reg rhi = ra_dest(as, ir+1, RSET_GPR);
> +      Reg rlo = ra_dest(as, ir, rset_exclude(RSET_GPR, rhi));
> +      emit_tai(as, PPCI_LWZ, rhi, RID_SP, ofs);
> +      emit_tai(as, PPCI_LWZ, rlo, RID_SP, ofs+4);
> +    }
> +  }
> +#else
>     RegSet drop = RSET_SCRATCH;
>     if (ra_hasreg(ir->r)) rset_set(drop, ir->r);  /* Spill dest reg (if any). */
>     ra_evictset(as, drop);
> +  if (ir->s) ofs = sps_scale(ir->s);
> +#endif
>     asm_guardcc(as, CC_EQ);
>     emit_ai(as, PPCI_CMPWI, RID_RET, 0);  /* Test return status. */
>     args[0] = ir->op1;      /* GCstr *str */
>     args[1] = ASMREF_TMP1;  /* TValue *n  */
>     asm_gencall(as, ci, args);
>     /* Store the result to the spill slot or temp slots. */
> -  ofs = ir->s ? sps_scale(ir->s) : SPOFS_TMP;
>     emit_tai(as, PPCI_ADDI, ra_releasetmp(as, ASMREF_TMP1), RID_SP, ofs);
>   }
>   
> @@ -530,7 +577,10 @@ static void asm_tvptr(ASMState *as, Reg dest, IRRef ref)
>         Reg src = ra_alloc1(as, ref, allow);
>         emit_setgl(as, src, tmptv.gcr);
>       }
> -    type = ra_allock(as, irt_toitype(ir->t), allow);
> +    if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> +      type = ra_alloc1(as, ref+1, allow);
> +    else
> +      type = ra_allock(as, irt_toitype(ir->t), allow);
>       emit_setgl(as, type, tmptv.it);
>     }
>   }
> @@ -574,11 +624,27 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>     Reg tisnum = RID_NONE, tmpnum = RID_NONE;
>     IRRef refkey = ir->op2;
>     IRIns *irkey = IR(refkey);
> +  int isk = irref_isk(refkey);
>     IRType1 kt = irkey->t;
>     uint32_t khash;
>     MCLabel l_end, l_loop, l_next;
>   
>     rset_clear(allow, tab);
> +#if LJ_SOFTFP
> +  if (!isk) {
> +    key = ra_alloc1(as, refkey, allow);
> +    rset_clear(allow, key);
> +    if (irkey[1].o == IR_HIOP) {
> +      if (ra_hasreg((irkey+1)->r)) {
> +	tmpnum = (irkey+1)->r;
> +	ra_noweak(as, tmpnum);
> +      } else {
> +	tmpnum = ra_allocref(as, refkey+1, allow);
> +      }
> +      rset_clear(allow, tmpnum);
> +    }
> +  }
> +#else
>     if (irt_isnum(kt)) {
>       key = ra_alloc1(as, refkey, RSET_FPR);
>       tmpnum = ra_scratch(as, rset_exclude(RSET_FPR, key));
> @@ -588,6 +654,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>       key = ra_alloc1(as, refkey, allow);
>       rset_clear(allow, key);
>     }
> +#endif
>     tmp2 = ra_scratch(as, allow);
>     rset_clear(allow, tmp2);
>   
> @@ -610,7 +677,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>       asm_guardcc(as, CC_EQ);
>     else
>       emit_condbranch(as, PPCI_BC|PPCF_Y, CC_EQ, l_end);
> -  if (irt_isnum(kt)) {
> +  if (!LJ_SOFTFP && irt_isnum(kt)) {
>       emit_fab(as, PPCI_FCMPU, 0, tmpnum, key);
>       emit_condbranch(as, PPCI_BC, CC_GE, l_next);
>       emit_ab(as, PPCI_CMPLW, tmp1, tisnum);
> @@ -620,7 +687,10 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>         emit_ab(as, PPCI_CMPW, tmp2, key);
>         emit_condbranch(as, PPCI_BC, CC_NE, l_next);
>       }
> -    emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
> +    if (LJ_SOFTFP && ra_hasreg(tmpnum))
> +      emit_ab(as, PPCI_CMPW, tmp1, tmpnum);
> +    else
> +      emit_ai(as, PPCI_CMPWI, tmp1, irt_toitype(irkey->t));
>       if (!irt_ispri(kt))
>         emit_tai(as, PPCI_LWZ, tmp2, dest, (int32_t)offsetof(Node, key.gcr));
>     }
> @@ -629,19 +699,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>   	    (((char *)as->mcp-(char *)l_loop) & 0xffffu);
>   
>     /* Load main position relative to tab->node into dest. */
> -  khash = irref_isk(refkey) ? ir_khash(irkey) : 1;
> +  khash = isk ? ir_khash(irkey) : 1;
>     if (khash == 0) {
>       emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
>     } else {
>       Reg tmphash = tmp1;
> -    if (irref_isk(refkey))
> +    if (isk)
>         tmphash = ra_allock(as, khash, allow);
>       emit_tab(as, PPCI_ADD, dest, dest, tmp1);
>       emit_tai(as, PPCI_MULLI, tmp1, tmp1, sizeof(Node));
>       emit_asb(as, PPCI_AND, tmp1, tmp2, tmphash);
>       emit_tai(as, PPCI_LWZ, dest, tab, (int32_t)offsetof(GCtab, node));
>       emit_tai(as, PPCI_LWZ, tmp2, tab, (int32_t)offsetof(GCtab, hmask));
> -    if (irref_isk(refkey)) {
> +    if (isk) {
>         /* Nothing to do. */
>       } else if (irt_isstr(kt)) {
>         emit_tai(as, PPCI_LWZ, tmp1, key, (int32_t)offsetof(GCstr, hash));
> @@ -651,13 +721,19 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>         emit_asb(as, PPCI_XOR, tmp1, tmp1, tmp2);
>         emit_rotlwi(as, tmp1, tmp1, (HASH_ROT2+HASH_ROT1)&31);
>         emit_tab(as, PPCI_SUBF, tmp2, dest, tmp2);
> -      if (irt_isnum(kt)) {
> +      if (LJ_SOFTFP ? (irkey[1].o == IR_HIOP) : irt_isnum(kt)) {
> +#if LJ_SOFTFP
> +	emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
> +	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> +	emit_tab(as, PPCI_ADD, tmp1, tmpnum, tmpnum);
> +#else
>   	int32_t ofs = ra_spill(as, irkey);
>   	emit_asb(as, PPCI_XOR, tmp2, tmp2, tmp1);
>   	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
>   	emit_tab(as, PPCI_ADD, tmp1, tmp1, tmp1);
>   	emit_tai(as, PPCI_LWZ, tmp2, RID_SP, ofs+4);
>   	emit_tai(as, PPCI_LWZ, tmp1, RID_SP, ofs);
> +#endif
>         } else {
>   	emit_asb(as, PPCI_XOR, tmp2, key, tmp1);
>   	emit_rotlwi(as, dest, tmp1, HASH_ROT1);
> @@ -784,8 +860,8 @@ static PPCIns asm_fxloadins(IRIns *ir)
>     case IRT_U8: return PPCI_LBZ;
>     case IRT_I16: return PPCI_LHA;
>     case IRT_U16: return PPCI_LHZ;
> -  case IRT_NUM: return PPCI_LFD;
> -  case IRT_FLOAT: return PPCI_LFS;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_LFD;
> +  case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_LFS;
>     default: return PPCI_LWZ;
>     }
>   }
> @@ -795,8 +871,8 @@ static PPCIns asm_fxstoreins(IRIns *ir)
>     switch (irt_type(ir->t)) {
>     case IRT_I8: case IRT_U8: return PPCI_STB;
>     case IRT_I16: case IRT_U16: return PPCI_STH;
> -  case IRT_NUM: return PPCI_STFD;
> -  case IRT_FLOAT: return PPCI_STFS;
> +  case IRT_NUM: lua_assert(!LJ_SOFTFP); return PPCI_STFD;
> +  case IRT_FLOAT: if (!LJ_SOFTFP) return PPCI_STFS;
>     default: return PPCI_STW;
>     }
>   }
> @@ -839,7 +915,8 @@ static void asm_fstore(ASMState *as, IRIns *ir)
>   
>   static void asm_xload(ASMState *as, IRIns *ir)
>   {
> -  Reg dest = ra_dest(as, ir, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> +  Reg dest = ra_dest(as, ir,
> +    (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
>     lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED));
>     if (irt_isi8(ir->t))
>       emit_as(as, PPCI_EXTSB, dest, dest);
> @@ -857,7 +934,8 @@ static void asm_xstore_(ASMState *as, IRIns *ir, int32_t ofs)
>       Reg src = ra_alloc1(as, irb->op1, RSET_GPR);
>       asm_fusexrefx(as, PPCI_STWBRX, src, ir->op1, rset_exclude(RSET_GPR, src));
>     } else {
> -    Reg src = ra_alloc1(as, ir->op2, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR);
> +    Reg src = ra_alloc1(as, ir->op2,
> +      (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR);
>       asm_fusexref(as, asm_fxstoreins(ir), src, ir->op1,
>   		 rset_exclude(RSET_GPR, src), ofs);
>     }
> @@ -871,10 +949,19 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
>     Reg dest = RID_NONE, type = RID_TMP, tmp = RID_TMP, idx;
>     RegSet allow = RSET_GPR;
>     int32_t ofs = AHUREF_LSX;
> +  if (LJ_SOFTFP && (ir+1)->o == IR_HIOP) {
> +    t.irt = IRT_NUM;
> +    if (ra_used(ir+1)) {
> +      type = ra_dest(as, ir+1, allow);
> +      rset_clear(allow, type);
> +    }
> +    ofs = 0;
> +  }
>     if (ra_used(ir)) {
> -    lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> -    if (!irt_isnum(t)) ofs = 0;
> -    dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> +    lua_assert((LJ_SOFTFP ? 0 : irt_isnum(ir->t)) ||
> +	       irt_isint(ir->t) || irt_isaddr(ir->t));
> +    if (LJ_SOFTFP || !irt_isnum(t)) ofs = 0;
> +    dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>       rset_clear(allow, dest);
>     }
>     idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> @@ -883,12 +970,13 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
>       asm_guardcc(as, CC_GE);
>       emit_ab(as, PPCI_CMPLW, type, tisnum);
>       if (ra_hasreg(dest)) {
> -      if (ofs == AHUREF_LSX) {
> +      if (!LJ_SOFTFP && ofs == AHUREF_LSX) {
>   	tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_GPR,
>   						       (idx&255)), (idx>>8)));
>   	emit_fab(as, PPCI_LFDX, dest, (idx&255), tmp);
>         } else {
> -	emit_fai(as, PPCI_LFD, dest, idx, ofs);
> +	emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest, idx,
> +		 ofs+4*LJ_SOFTFP);
>         }
>       }
>     } else {
> @@ -911,7 +999,7 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
>     int32_t ofs = AHUREF_LSX;
>     if (ir->r == RID_SINK)
>       return;
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>       src = ra_alloc1(as, ir->op2, RSET_FPR);
>     } else {
>       if (!irt_ispri(ir->t)) {
> @@ -919,11 +1007,14 @@ static void asm_ahustore(ASMState *as, IRIns *ir)
>         rset_clear(allow, src);
>         ofs = 0;
>       }
> -    type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
> +    if (LJ_SOFTFP && (ir+1)->o == IR_HIOP)
> +      type = ra_alloc1(as, (ir+1)->op2, allow);
> +    else
> +      type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
>       rset_clear(allow, type);
>     }
>     idx = asm_fuseahuref(as, ir->op1, &ofs, allow);
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>       if (ofs == AHUREF_LSX) {
>         emit_fab(as, PPCI_STFDX, src, (idx&255), RID_TMP);
>         emit_slwi(as, RID_TMP, (idx>>8), 3);
> @@ -948,21 +1039,33 @@ static void asm_sload(ASMState *as, IRIns *ir)
>     IRType1 t = ir->t;
>     Reg dest = RID_NONE, type = RID_NONE, base;
>     RegSet allow = RSET_GPR;
> +  int hiop = (LJ_SOFTFP && (ir+1)->o == IR_HIOP);
> +  if (hiop)
> +    t.irt = IRT_NUM;
>     lua_assert(!(ir->op2 & IRSLOAD_PARENT));  /* Handled by asm_head_side(). */
> -  lua_assert(irt_isguard(t) || !(ir->op2 & IRSLOAD_TYPECHECK));
> +  lua_assert(irt_isguard(ir->t) || !(ir->op2 & IRSLOAD_TYPECHECK));
>     lua_assert(LJ_DUALNUM ||
>   	     !irt_isint(t) || (ir->op2 & (IRSLOAD_CONVERT|IRSLOAD_FRAME)));
> +#if LJ_SOFTFP
> +  lua_assert(!(ir->op2 & IRSLOAD_CONVERT));  /* Handled by LJ_SOFTFP SPLIT. */
> +  if (hiop && ra_used(ir+1)) {
> +    type = ra_dest(as, ir+1, allow);
> +    rset_clear(allow, type);
> +  }
> +#else
>     if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) {
>       dest = ra_scratch(as, RSET_FPR);
>       asm_tointg(as, ir, dest);
>       t.irt = IRT_NUM;  /* Continue with a regular number type check. */
> -  } else if (ra_used(ir)) {
> +  } else
> +#endif
> +  if (ra_used(ir)) {
>       lua_assert(irt_isnum(t) || irt_isint(t) || irt_isaddr(t));
> -    dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : RSET_GPR);
> +    dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isnum(t)) ? RSET_FPR : allow);
>       rset_clear(allow, dest);
>       base = ra_alloc1(as, REF_BASE, allow);
>       rset_clear(allow, base);
> -    if ((ir->op2 & IRSLOAD_CONVERT)) {
> +    if (!LJ_SOFTFP && (ir->op2 & IRSLOAD_CONVERT)) {
>         if (irt_isint(t)) {
>   	emit_tai(as, PPCI_LWZ, dest, RID_SP, SPOFS_TMPLO);
>   	dest = ra_scratch(as, RSET_FPR);
> @@ -994,10 +1097,13 @@ dotypecheck:
>       if ((ir->op2 & IRSLOAD_TYPECHECK)) {
>         Reg tisnum = ra_allock(as, (int32_t)LJ_TISNUM, allow);
>         asm_guardcc(as, CC_GE);
> -      emit_ab(as, PPCI_CMPLW, RID_TMP, tisnum);
> +#if !LJ_SOFTFP
>         type = RID_TMP;
> +#endif
> +      emit_ab(as, PPCI_CMPLW, type, tisnum);
>       }
> -    if (ra_hasreg(dest)) emit_fai(as, PPCI_LFD, dest, base, ofs-4);
> +    if (ra_hasreg(dest)) emit_fai(as, LJ_SOFTFP ? PPCI_LWZ : PPCI_LFD, dest,
> +				  base, ofs-(LJ_SOFTFP?0:4));
>     } else {
>       if ((ir->op2 & IRSLOAD_TYPECHECK)) {
>         asm_guardcc(as, CC_NE);
> @@ -1122,6 +1228,7 @@ static void asm_obar(ASMState *as, IRIns *ir)
>   
>   /* -- Arithmetic and logic operations ------------------------------------- */
>   
> +#if !LJ_SOFTFP
>   static void asm_fparith(ASMState *as, IRIns *ir, PPCIns pi)
>   {
>     Reg dest = ra_dest(as, ir, RSET_FPR);
> @@ -1149,13 +1256,17 @@ static void asm_fpmath(ASMState *as, IRIns *ir)
>     else
>       asm_callid(as, ir, IRCALL_lj_vm_floor + ir->op2);
>   }
> +#endif
>   
>   static void asm_add(ASMState *as, IRIns *ir)
>   {
> +#if !LJ_SOFTFP
>     if (irt_isnum(ir->t)) {
>       if (!asm_fusemadd(as, ir, PPCI_FMADD, PPCI_FMADD))
>         asm_fparith(as, ir, PPCI_FADD);
> -  } else {
> +  } else
> +#endif
> +  {
>       Reg dest = ra_dest(as, ir, RSET_GPR);
>       Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
>       PPCIns pi;
> @@ -1194,10 +1305,13 @@ static void asm_add(ASMState *as, IRIns *ir)
>   
>   static void asm_sub(ASMState *as, IRIns *ir)
>   {
> +#if !LJ_SOFTFP
>     if (irt_isnum(ir->t)) {
>       if (!asm_fusemadd(as, ir, PPCI_FMSUB, PPCI_FNMSUB))
>         asm_fparith(as, ir, PPCI_FSUB);
> -  } else {
> +  } else
> +#endif
> +  {
>       PPCIns pi = PPCI_SUBF;
>       Reg dest = ra_dest(as, ir, RSET_GPR);
>       Reg left, right;
> @@ -1223,9 +1337,12 @@ static void asm_sub(ASMState *as, IRIns *ir)
>   
>   static void asm_mul(ASMState *as, IRIns *ir)
>   {
> +#if !LJ_SOFTFP
>     if (irt_isnum(ir->t)) {
>       asm_fparith(as, ir, PPCI_FMUL);
> -  } else {
> +  } else
> +#endif
> +  {
>       PPCIns pi = PPCI_MULLW;
>       Reg dest = ra_dest(as, ir, RSET_GPR);
>       Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR);
> @@ -1253,9 +1370,12 @@ static void asm_mul(ASMState *as, IRIns *ir)
>   
>   static void asm_neg(ASMState *as, IRIns *ir)
>   {
> +#if !LJ_SOFTFP
>     if (irt_isnum(ir->t)) {
>       asm_fpunary(as, ir, PPCI_FNEG);
> -  } else {
> +  } else
> +#endif
> +  {
>       Reg dest, left;
>       PPCIns pi = PPCI_NEG;
>       if (as->flagmcp == as->mcp) {
> @@ -1566,9 +1686,40 @@ static void asm_bitshift(ASMState *as, IRIns *ir, PPCIns pi, PPCIns pik)
>   		       PPCI_RLWINM|PPCF_MB(0)|PPCF_ME(31))
>   #define asm_bror(as, ir)	lua_assert(0)
>   
> +#if LJ_SOFTFP
> +static void asm_sfpmin_max(ASMState *as, IRIns *ir)
> +{
> +  CCallInfo ci = lj_ir_callinfo[IRCALL_softfp_cmp];
> +  IRRef args[4];
> +  MCLabel l_right, l_end;
> +  Reg desthi = ra_dest(as, ir, RSET_GPR), destlo = ra_dest(as, ir+1, RSET_GPR);
> +  Reg righthi, lefthi = ra_alloc2(as, ir, RSET_GPR);
> +  Reg rightlo, leftlo = ra_alloc2(as, ir+1, RSET_GPR);
> +  PPCCC cond = (IROp)ir->o == IR_MIN ? CC_EQ : CC_NE;
> +  righthi = (lefthi >> 8); lefthi &= 255;
> +  rightlo = (leftlo >> 8); leftlo &= 255;
> +  args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> +  args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> +  l_end = emit_label(as);
> +  if (desthi != righthi) emit_mr(as, desthi, righthi);
> +  if (destlo != rightlo) emit_mr(as, destlo, rightlo);
> +  l_right = emit_label(as);
> +  if (l_end != l_right) emit_jmp(as, l_end);
> +  if (desthi != lefthi) emit_mr(as, desthi, lefthi);
> +  if (destlo != leftlo) emit_mr(as, destlo, leftlo);
> +  if (l_right == as->mcp+1) {
> +    cond ^= 4; l_right = l_end; ++as->mcp;
> +  }
> +  emit_condbranch(as, PPCI_BC, cond, l_right);
> +  ra_evictset(as, RSET_SCRATCH);
> +  emit_cmpi(as, RID_RET, 1);
> +  asm_gencall(as, &ci, args);
> +}
> +#endif
> +
>   static void asm_min_max(ASMState *as, IRIns *ir, int ismax)
>   {
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>       Reg dest = ra_dest(as, ir, RSET_FPR);
>       Reg tmp = dest;
>       Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> @@ -1656,7 +1807,7 @@ static void asm_intcomp_(ASMState *as, IRRef lref, IRRef rref, Reg cr, PPCCC cc)
>   static void asm_comp(ASMState *as, IRIns *ir)
>   {
>     PPCCC cc = asm_compmap[ir->o];
> -  if (irt_isnum(ir->t)) {
> +  if (!LJ_SOFTFP && irt_isnum(ir->t)) {
>       Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>       right = (left >> 8); left &= 255;
>       asm_guardcc(as, (cc >> 4));
> @@ -1677,6 +1828,44 @@ static void asm_comp(ASMState *as, IRIns *ir)
>   
>   #define asm_equal(as, ir)	asm_comp(as, ir)
>   
> +#if LJ_SOFTFP
> +/* SFP comparisons. */
> +static void asm_sfpcomp(ASMState *as, IRIns *ir)
> +{
> +  const CCallInfo *ci = &lj_ir_callinfo[IRCALL_softfp_cmp];
> +  RegSet drop = RSET_SCRATCH;
> +  Reg r;
> +  IRRef args[4];
> +  args[0^LJ_BE] = ir->op1; args[1^LJ_BE] = (ir+1)->op1;
> +  args[2^LJ_BE] = ir->op2; args[3^LJ_BE] = (ir+1)->op2;
> +
> +  for (r = REGARG_FIRSTGPR; r <= REGARG_FIRSTGPR+3; r++) {
> +    if (!rset_test(as->freeset, r) &&
> +	regcost_ref(as->cost[r]) == args[r-REGARG_FIRSTGPR])
> +      rset_clear(drop, r);
> +  }
> +  ra_evictset(as, drop);
> +  asm_setupresult(as, ir, ci);
> +  switch ((IROp)ir->o) {
> +  case IR_ULT:
> +    asm_guardcc(as, CC_EQ);
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> +  case IR_ULE:
> +    asm_guardcc(as, CC_EQ);
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 1);
> +    break;
> +  case IR_GE: case IR_GT:
> +    asm_guardcc(as, CC_EQ);
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 2);
> +  default:
> +    asm_guardcc(as, (asm_compmap[ir->o] & 0xf));
> +    emit_ai(as, PPCI_CMPWI, RID_RET, 0);
> +    break;
> +  }
> +  asm_gencall(as, ci, args);
> +}
> +#endif
> +
>   #if LJ_HASFFI
>   /* 64 bit integer comparisons. */
>   static void asm_comp64(ASMState *as, IRIns *ir)
> @@ -1706,19 +1895,36 @@ static void asm_comp64(ASMState *as, IRIns *ir)
>   /* Hiword op of a split 64 bit op. Previous op must be the loword op. */
>   static void asm_hiop(ASMState *as, IRIns *ir)
>   {
> -#if LJ_HASFFI
> +#if LJ_HASFFI || LJ_SOFTFP
>     /* HIOP is marked as a store because it needs its own DCE logic. */
>     int uselo = ra_used(ir-1), usehi = ra_used(ir);  /* Loword/hiword used? */
>     if (LJ_UNLIKELY(!(as->flags & JIT_F_OPT_DCE))) uselo = usehi = 1;
>     if ((ir-1)->o == IR_CONV) {  /* Conversions to/from 64 bit. */
>       as->curins--;  /* Always skip the CONV. */
> +#if LJ_HASFFI && !LJ_SOFTFP
>       if (usehi || uselo)
>         asm_conv64(as, ir);
>       return;
> +#endif
>     } else if ((ir-1)->o <= IR_NE) {  /* 64 bit integer comparisons. ORDER IR. */
>       as->curins--;  /* Always skip the loword comparison. */
> +#if LJ_SOFTFP
> +    if (!irt_isint(ir->t)) {
> +      asm_sfpcomp(as, ir-1);
> +      return;
> +    }
> +#endif
> +#if LJ_HASFFI
>       asm_comp64(as, ir);
> +#endif
> +    return;
> +#if LJ_SOFTFP
> +  } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
> +      as->curins--;  /* Always skip the loword min/max. */
> +    if (uselo || usehi)
> +      asm_sfpmin_max(as, ir-1);
>       return;
> +#endif
>     } else if ((ir-1)->o == IR_XSTORE) {
>       as->curins--;  /* Handle both stores here. */
>       if ((ir-1)->r != RID_SINK) {
> @@ -1729,14 +1935,27 @@ static void asm_hiop(ASMState *as, IRIns *ir)
>     }
>     if (!usehi) return;  /* Skip unused hiword op for all remaining ops. */
>     switch ((ir-1)->o) {
> +#if LJ_HASFFI
>     case IR_ADD: as->curins--; asm_add64(as, ir); break;
>     case IR_SUB: as->curins--; asm_sub64(as, ir); break;
>     case IR_NEG: as->curins--; asm_neg64(as, ir); break;
> +#endif
> +#if LJ_SOFTFP
> +  case IR_SLOAD: case IR_ALOAD: case IR_HLOAD: case IR_ULOAD: case IR_VLOAD:
> +  case IR_STRTO:
> +    if (!uselo)
> +      ra_allocref(as, ir->op1, RSET_GPR);  /* Mark lo op as used. */
> +    break;
> +#endif
>     case IR_CALLN:
> +  case IR_CALLS:
>     case IR_CALLXS:
>       if (!uselo)
>         ra_allocref(as, ir->op1, RID2RSET(RID_RETLO));  /* Mark lo op as used. */
>       break;
> +#if LJ_SOFTFP
> +  case IR_ASTORE: case IR_HSTORE: case IR_USTORE: case IR_TOSTR:
> +#endif
>     case IR_CNEWI:
>       /* Nothing to do here. Handled by lo op itself. */
>       break;
> @@ -1800,8 +2019,19 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>       if ((sn & SNAP_NORESTORE))
>         continue;
>       if (irt_isnum(ir->t)) {
> +#if LJ_SOFTFP
> +      Reg tmp;
> +      RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> +      lua_assert(irref_isk(ref));  /* LJ_SOFTFP: must be a number constant. */
> +      tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.lo, allow);
> +      emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?4:0));
> +      if (rset_test(as->freeset, tmp+1)) allow = RID2RSET(tmp+1);
> +      tmp = ra_allock(as, (int32_t)ir_knum(ir)->u32.hi, allow);
> +      emit_tai(as, PPCI_STW, tmp, RID_BASE, ofs+(LJ_BE?0:4));
> +#else
>         Reg src = ra_alloc1(as, ref, RSET_FPR);
>         emit_fai(as, PPCI_STFD, src, RID_BASE, ofs);
> +#endif
>       } else {
>         Reg type;
>         RegSet allow = rset_exclude(RSET_GPR, RID_BASE);
> @@ -1814,6 +2044,10 @@ static void asm_stack_restore(ASMState *as, SnapShot *snap)
>         if ((sn & (SNAP_CONT|SNAP_FRAME))) {
>   	if (s == 0) continue;  /* Do not overwrite link to previous frame. */
>   	type = ra_allock(as, (int32_t)(*flinks--), allow);
> +#if LJ_SOFTFP
> +      } else if ((sn & SNAP_SOFTFPNUM)) {
> +	type = ra_alloc1(as, ref+1, rset_exclude(RSET_GPR, RID_BASE));
> +#endif
>         } else {
>   	type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow);
>         }
> @@ -1950,14 +2184,15 @@ static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci)
>     int nslots = 2, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR;
>     asm_collectargs(as, ir, ci, args);
>     for (i = 0; i < nargs; i++)
> -    if (args[i] && irt_isfp(IR(args[i])->t)) {
> +    if (!LJ_SOFTFP && args[i] && irt_isfp(IR(args[i])->t)) {
>         if (nfpr > 0) nfpr--; else nslots = (nslots+3) & ~1;
>       } else {
>         if (ngpr > 0) ngpr--; else nslots++;
>       }
>     if (nslots > as->evenspill)  /* Leave room for args in stack slots. */
>       as->evenspill = nslots;
> -  return irt_isfp(ir->t) ? REGSP_HINT(RID_FPRET) : REGSP_HINT(RID_RET);
> +  return (!LJ_SOFTFP && irt_isfp(ir->t)) ? REGSP_HINT(RID_FPRET) :
> +					   REGSP_HINT(RID_RET);
>   }
>   
>   static void asm_setup_target(ASMState *as)
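
The quoted hunks use two recurring idioms to disable hard-float paths: a plain `!LJ_SOFTFP && ...` condition where both branches still compile, and an `#if !LJ_SOFTFP ... } else #endif {` wrapper where the FP branch refers to code that does not exist in a soft-float build. A minimal sketch of both, assuming a hypothetical `SOFTFP` macro and made-up `pick_class()`/`op_add()` helpers that are not part of LuaJIT:

```c
#include <assert.h>

#ifndef SOFTFP
#define SOFTFP 0  /* Hypothetical stand-in for LJ_SOFTFP (0 = hard-float). */
#endif

/* Hypothetical register classes, standing in for RSET_GPR/RSET_FPR. */
enum { CLASS_GPR, CLASS_FPR };

/* Idiom 1: constant-folded condition. Both arms are valid C in either
** configuration, so the compiler simply drops the FPR arm when SOFTFP=1.
*/
static int pick_class(int is_fp)
{
  return (!SOFTFP && is_fp) ? CLASS_FPR : CLASS_GPR;
}

/* Idiom 2: the FP branch is removed by the preprocessor, and the
** dangling "else" is closed by a block that exists in both builds.
*/
static int op_add(int is_num)
{
#if !SOFTFP
  if (is_num) {
    return CLASS_FPR;  /* Hard-float only path. */
  } else
#endif
  {
    (void)is_num;      /* Unused when the FP branch is compiled out. */
    return CLASS_GPR;  /* Integer path, shared by both builds. */
  }
}
```

Building the same file with `-DSOFTFP=1` makes both helpers return `CLASS_GPR` for every input, which is the effect the guards in the patch aim for.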


* Re: [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (19 preceding siblings ...)
  2023-08-16 15:35 ` [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
@ 2023-08-17 14:38 ` Sergey Bronnikov via Tarantool-patches
  2023-08-31 15:17 ` Igor Munkin via Tarantool-patches
  21 siblings, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:38 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey

Thanks for the patch series! LGTM

On 8/9/23 18:35, Sergey Kaplun wrote:
> This patch-set contains all commits that are necessary to avoid conflicts
> during the backporting of 8ae5170c "Improve assertions" [1], which are
> caused by the fork lagging behind the upstream.
>
> This patch-set:
> - includes several new ports (see patches 4-6, 8, 19) -- only a
>    description is added for such patches.
> - fixes some MIPS misbehaviour (1, 3, 17) -- tests (*) are included,
>    except for the first one, since it depends on memory mapping.
> - fixes non-Linux/macOS build (7)
> - backports patches that were excluded or partially stripped before (10,
>    14, 15)
> - includes refactoring (2, 9, 18)
> - fixes general bugs (16)
> - fixes gcc 7.1 -Wimplicit-fallthrough warnings (11 - 13)
>
> Note that only patches 3, 16, 17 add new tests.
> The other patches only provide a description, and patch 13 adds
> -Wimplicit-fallthrough for GCC (>= 7.1) builds.
>
> Patches are backported in arbitrary order, since they are unrelated
> to each other.
>
> (*) To run tests for mips64 in qemu:
>
> Compile with the following command:
>
> | make -j -f Makefile.original HOST_CC="gcc " \
> |         CROSS=mips64el-unknown-linux-gnu- \
> |         CCDEBUG=" -g -ggdb3" CFLAGS=" -O0" \
> |         XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT "
>
> Be aware that mips64el-unknown-linux-gnu-gcc should provide the n64 ABI
> by default.
> Side note: it was installed on Gentoo with the following command:
> | crossdev -t mips64el --abis n64 --ex-gdb
>
> And run the corresponding test (-g 7776 to use GDB server on 7776
> port):
> | LUA_PATH="src/?.lua;test/tarantool-tests/?.lua;test/tarantool-tests/?/init.lua;;" \
> | LD_LIBRARY_PATH="/usr/lib/gcc/mips64el-unknown-linux-gnu/13/" \
> |   qemu-mips64el  -g 7776 -L /usr/mips64el-unknown-linux-gnu/ \
> |     src/luajit -jdump=ta test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
>
> If you want to connect to the running test from multiarch-gdb:
> | mips64el-unknown-linux-gnu-gdb src/luajit
> | (gdb) target remote 0.0.0.0:7776
> | ...
> | 0x000000400297fd00 in __start () from /usr/mips64el-unknown-linux-gnu/lib64/ld.so.1
> | (gdb) c
>
> [1]: https://github.com/LuaJIT/LuaJIT/commit/8ae5170c
>
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-8825-mips-ppc-refactoring
> PR: https://github.com/tarantool/tarantool/pull/8969
> Related Issues:
> * https://github.com/tarantool/tarantool/issues/8825
> * https://github.com/LuaJIT/LuaJIT/pull/362
> * https://github.com/LuaJIT/LuaJIT/issues/812
>
> Mike Pall (17):
>    MIPS: Use precise search for exit jump patching.
>    MIPS: Fix handling of spare long-range jump slots.
>    MIPS64: Add soft-float support to JIT compiler backend.
>    PPC: Add soft-float support to interpreter.
>    PPC: Add soft-float support to JIT compiler backend.
>    Windows: Add UWP support, part 1.
>    FFI: Eliminate hardcoded string hashes.
>    Cleanup math function compilation and fix inconsistencies.
>    Fix GCC 7 -Wimplicit-fallthrough warnings.
>    DynASM: Fix warning.
>    ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
>    Fix debug.getinfo() argument check.
>    Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
>    Prevent integer overflow while parsing long strings.
>    MIPS64: Fix register allocation in assembly of HREF.
>    DynASM/MIPS: Fix shadowed variable.
>    MIPS: Add MIPS64 R6 port.
>
> Sergey Kaplun (2):
>    test: introduce mcode generator for tests
>    build: fix non-Linux/macOS builds
>
>   cmake/SetDynASMFlags.cmake                    |    5 +
>   cmake/SetTargetFlags.cmake                    |    6 +
>   doc/ext_ffi_api.html                          |    2 +
>   dynasm/dasm_arm.h                             |    2 +
>   dynasm/dasm_arm64.h                           |    1 +
>   dynasm/dasm_mips.h                            |   14 +-
>   dynasm/dasm_mips.lua                          |  629 ++++++---
>   dynasm/dasm_ppc.h                             |    1 +
>   dynasm/dasm_x86.h                             |   18 +-
>   dynasm/dynasm.lua                             |    1 +
>   src/Makefile.original                         |    3 +
>   src/host/buildvm_asm.c                        |    2 +-
>   src/jit/bcsave.lua                            |   84 +-
>   src/jit/dis_mips.lua                          |  293 +++-
>   src/jit/dis_mips64r6.lua                      |   17 +
>   src/jit/dis_mips64r6el.lua                    |   17 +
>   src/lib_ffi.c                                 |   36 +-
>   src/lib_io.c                                  |    4 +-
>   src/lib_misc.c                                |   16 +-
>   src/lib_package.c                             |   24 +-
>   src/lj_alloc.c                                |    6 +-
>   src/lj_arch.h                                 |   80 +-
>   src/lj_asm.c                                  |   19 +-
>   src/lj_asm_arm.h                              |    4 +-
>   src/lj_asm_mips.h                             |  379 ++++-
>   src/lj_asm_ppc.h                              |  322 ++++-
>   src/lj_ccall.c                                |   38 +-
>   src/lj_ccall.h                                |    4 +-
>   src/lj_ccallback.c                            |   34 +-
>   src/lj_clib.c                                 |   20 +-
>   src/lj_cparse.c                               |   87 +-
>   src/lj_cparse.h                               |    2 +
>   src/lj_crecord.c                              |    4 +-
>   src/lj_debug.c                                |   16 +-
>   src/lj_emit_mips.h                            |   17 +-
>   src/lj_err.c                                  |    1 +
>   src/lj_ffrecord.c                             |    2 +-
>   src/lj_frame.h                                |    2 +-
>   src/lj_ircall.h                               |   45 +-
>   src/lj_iropt.h                                |    2 +-
>   src/lj_jit.h                                  |   18 +-
>   src/lj_lex.c                                  |    2 +-
>   src/lj_mcode.c                                |   14 +-
>   src/lj_obj.h                                  |    3 +
>   src/lj_opt_sink.c                             |    2 +-
>   src/lj_opt_split.c                            |    2 +-
>   src/lj_parse.c                                |    3 +-
>   src/lj_profile_timer.c                        |    8 +-
>   src/lj_profile_timer.h                        |    8 +-
>   src/lj_record.c                               |    4 +-
>   src/lj_snap.c                                 |   21 +-
>   src/lj_target_mips.h                          |   52 +-
>   src/luajit.c                                  |    1 +
>   src/vm_mips64.dasc                            |  413 +++++-
>   src/vm_ppc.dasc                               | 1249 ++++++++++++++---
>   ...x-mips64-spare-side-exit-patching.test.lua |   65 +
>   ...8-fix-side-exit-patching-on-arm64.test.lua |   78 +-
>   ...-mips64-href-delay-slot-side-exit.test.lua |  101 ++
>   .../lj-812-too-long-string-separator.test.lua |   31 +
>   test/tarantool-tests/utils/frontend.lua       |   24 +
>   test/tarantool-tests/utils/jit/generators.lua |  115 ++
>   61 files changed, 3565 insertions(+), 908 deletions(-)
>   create mode 100644 src/jit/dis_mips64r6.lua
>   create mode 100644 src/jit/dis_mips64r6el.lua
>   create mode 100644 test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
>   create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>   create mode 100644 test/tarantool-tests/lj-812-too-long-string-separator.test.lua
>   create mode 100644 test/tarantool-tests/utils/jit/generators.lua
>


* Re: [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
  2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
  2023-08-15 11:40   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-08-17 14:53   ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 0 replies; 97+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-17 14:53 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi, Sergey,

Thanks for the patch! LGTM.

On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the VM for PowerPC.
> This includes:
> * All loads/stores of double values go through 32-bit registers, using
>    the `lo` and `hi` parts of the TValue union.
> * The `.FPU` macro is added to skip instructions that are necessary
>    only for hard-float operations (for example, saving/restoring
>    floating-point registers on the stack when entering/leaving the VM).
> * r25, named `SAVE1`, is now used as a saved temporary register (in
>    various fast functions).
> * The `sfi2d` macro is introduced to convert an integer, which
>    represents a soft-float, to a double. It receives destination and
>    source registers and uses `TMP0` and `TMP1`.
> * The `sfpmod` macro is introduced for the soft-float `fmod` built-in.
> * `ins_arith` now receives a third parameter: the operation to use for
>    soft-float.
> * The `LJ_ARCH_HASFPU` and `LJ_ABI_SOFTFP` macros are introduced to
>    indicate whether `_SOFT_FLOAT` or `_SOFT_DOUBLE` is defined.
>    `LJ_ARCH_NUMMODE` is set to `LJ_NUMMODE_DUAL` when `LJ_ABI_SOFTFP`
>    is true.
>
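To illustrate the first item of the list above (doubles moved as two 32-bit words), here is a minimal C sketch. It assumes IEEE 754 binary64 doubles. `WordPair`, `split_double` and `join_double` are hypothetical names used only for illustration; they are not LuaJIT identifiers, and the VM does the equivalent with paired `lwz`/`stw` instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the two 32-bit halves of a 64-bit double. */
typedef struct {
  uint32_t hi;  /* Sign, exponent and upper mantissa bits. */
  uint32_t lo;  /* Lower mantissa bits. */
} WordPair;

/* Split a double into its high and low 32-bit words. */
static WordPair split_double(double d)
{
  uint64_t bits;
  WordPair w;
  assert(sizeof(double) == sizeof(uint64_t));
  memcpy(&bits, &d, sizeof(bits));  /* Avoids aliasing problems. */
  w.hi = (uint32_t)(bits >> 32);
  w.lo = (uint32_t)bits;
  return w;
}

/* Rebuild the double from the two words. */
static double join_double(WordPair w)
{
  uint64_t bits = ((uint64_t)w.hi << 32) | w.lo;
  double d;
  memcpy(&d, &bits, sizeof(d));
  return d;
}
```

Splitting through a 64-bit integer keeps the sketch independent of host byte order; in the VM itself the order of the two words in memory follows the target's TValue layout.
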
> Soft-float support for the JIT compiler will be added in the next
> patch.
>
> [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
>
> Sergey Kaplun:
> * added the description for the feature
>
> Part of tarantool/tarantool#8825
> ---
>   src/host/buildvm_asm.c |    2 +-
>   src/lj_arch.h          |   29 +-
>   src/lj_ccall.c         |   38 +-
>   src/lj_ccall.h         |    4 +-
>   src/lj_ccallback.c     |   30 +-
>   src/lj_frame.h         |    2 +-
>   src/lj_ircall.h        |    2 +-
>   src/vm_ppc.dasc        | 1249 +++++++++++++++++++++++++++++++++-------
>   8 files changed, 1101 insertions(+), 255 deletions(-)
>
> diff --git a/src/host/buildvm_asm.c b/src/host/buildvm_asm.c
> index ffd14903..43595b31 100644
> --- a/src/host/buildvm_asm.c
> +++ b/src/host/buildvm_asm.c
> @@ -338,7 +338,7 @@ void emit_asm(BuildCtx *ctx)
>   #if !(LJ_TARGET_PS3 || LJ_TARGET_PSVITA)
>       fprintf(ctx->fp, "\t.section .note.GNU-stack,\"\"," ELFASM_PX "progbits\n");
>   #endif
> -#if LJ_TARGET_PPC && !LJ_TARGET_PS3
> +#if LJ_TARGET_PPC && !LJ_TARGET_PS3 && !LJ_ABI_SOFTFP
>       /* Hard-float ABI. */
>       fprintf(ctx->fp, "\t.gnu_attribute 4, 1\n");
>   #endif
> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index c39526ea..8bb8757d 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h
> @@ -262,6 +262,29 @@
>   #else
>   #define LJ_ARCH_BITS		32
>   #define LJ_ARCH_NAME		"ppc"
> +
> +#if !defined(LJ_ARCH_HASFPU)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ARCH_HASFPU		0
> +#else
> +#define LJ_ARCH_HASFPU		1
> +#endif
> +#endif
> +
> +#if !defined(LJ_ABI_SOFTFP)
> +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> +#define LJ_ABI_SOFTFP		1
> +#else
> +#define LJ_ABI_SOFTFP		0
> +#endif
> +#endif
> +#endif
> +
> +#if LJ_ABI_SOFTFP
> +#define LJ_ARCH_NOJIT		1  /* NYI */
> +#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL
> +#else
> +#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
>   #endif
>   
>   #define LJ_TARGET_PPC		1
> @@ -271,7 +294,6 @@
>   #define LJ_TARGET_MASKSHIFT	0
>   #define LJ_TARGET_MASKROT	1
>   #define LJ_TARGET_UNIFYROT	1	/* Want only IR_BROL. */
> -#define LJ_ARCH_NUMMODE		LJ_NUMMODE_DUAL_SINGLE
>   
>   #if LJ_TARGET_CONSOLE
>   #define LJ_ARCH_PPC32ON64	1
> @@ -431,16 +453,13 @@
>   #error "No support for ILP32 model on ARM64"
>   #endif
>   #elif LJ_TARGET_PPC
> -#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
> -#error "No support for PowerPC CPUs without double-precision FPU"
> -#endif
>   #if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
>   #error "No support for little-endian PPC32"
>   #endif
>   #if LJ_ARCH_PPC64
>   #error "No support for PowerPC 64 bit mode (yet)"
>   #endif
> -#ifdef __NO_FPRS__
> +#if defined(__NO_FPRS__) && !defined(_SOFT_FLOAT)
>   #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
>   #endif
>   #elif LJ_TARGET_MIPS32
> diff --git a/src/lj_ccall.c b/src/lj_ccall.c
> index d39ff861..c1e12f56 100644
> --- a/src/lj_ccall.c
> +++ b/src/lj_ccall.c
> @@ -388,6 +388,24 @@
>   #define CCALL_HANDLE_COMPLEXARG \
>     /* Pass complex by value in 2 or 4 GPRs. */
>   
> +#define CCALL_HANDLE_GPR \
> +  /* Try to pass argument in GPRs. */ \
> +  if (n > 1) { \
> +    lua_assert(n == 2 || n == 4);  /* int64_t or complex (float). */ \
> +    if (ctype_isinteger(d->info) || ctype_isfp(d->info)) \
> +      ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> +    else if (ngpr + n > maxgpr) \
> +      ngpr = maxgpr;  /* Prevent reordering. */ \
> +  } \
> +  if (ngpr + n <= maxgpr) { \
> +    dp = &cc->gpr[ngpr]; \
> +    ngpr += n; \
> +    goto done; \
> +  } \
> +
> +#if LJ_ABI_SOFTFP
> +#define CCALL_HANDLE_REGARG  CCALL_HANDLE_GPR
> +#else
>   #define CCALL_HANDLE_REGARG \
>     if (isfp) {  /* Try to pass argument in FPRs. */ \
>       if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -396,24 +414,16 @@
>         d = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */ \
>         goto done; \
>       } \
> -  } else {  /* Try to pass argument in GPRs. */ \
> -    if (n > 1) { \
> -      lua_assert(n == 2 || n == 4);  /* int64_t or complex (float). */ \
> -      if (ctype_isinteger(d->info)) \
> -	ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> -      else if (ngpr + n > maxgpr) \
> -	ngpr = maxgpr;  /* Prevent reordering. */ \
> -    } \
> -    if (ngpr + n <= maxgpr) { \
> -      dp = &cc->gpr[ngpr]; \
> -      ngpr += n; \
> -      goto done; \
> -    } \
> +  } else { \
> +    CCALL_HANDLE_GPR \
>     }
> +#endif
>   
> +#if !LJ_ABI_SOFTFP
>   #define CCALL_HANDLE_RET \
>     if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
>       ctr = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */
> +#endif
>   
>   #elif LJ_TARGET_MIPS32
>   /* -- MIPS o32 calling conventions ---------------------------------------- */
> @@ -1081,7 +1091,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
>     }
>     if (fid) lj_err_caller(L, LJ_ERR_FFI_NUMARG);  /* Too few arguments. */
>   
> -#if LJ_TARGET_X64 || LJ_TARGET_PPC
> +#if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP)
>     cc->nfpr = nfpr;  /* Required for vararg functions. */
>   #endif
>     cc->nsp = nsp;
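The GPR allocation rule from CCALL_HANDLE_GPR above can be sketched in plain C. This is an illustration only, not the macro itself: `alloc_gpr`, `MAX_GPR` and the `is_scalar64` flag are hypothetical names, and `MAX_GPR` is assumed to equal `CCALL_NARG_GPR` (8), whereas the real code uses the `maxgpr` variable and jumps to `done` instead of returning.

```c
#include <assert.h>

#define MAX_GPR 8u  /* Assumption: mirrors CCALL_NARG_GPR for PPC. */

/*
** Pick the first GPR index for an argument of n 32-bit words.
** is_scalar64 is nonzero for int64_t or, with soft-float, a double.
** Returns the index, or -1 if the argument must go to the stack.
** *ngpr is updated in both cases, as in the macro.
*/
static int alloc_gpr(unsigned *ngpr, unsigned n, int is_scalar64)
{
  unsigned g = *ngpr;
  assert(n == 1 || n == 2 || n == 4);
  if (n > 1) {
    if (is_scalar64)
      g = (g + 1u) & ~1u;  /* Round up to an even index (register pair). */
    else if (g + n > MAX_GPR)
      g = MAX_GPR;  /* Prevent later arguments from reusing free GPRs. */
  }
  if (g + n <= MAX_GPR) {
    *ngpr = g + n;
    return (int)g;
  }
  *ngpr = g;
  return -1;
}
```

For example, with one GPR already used, a two-word scalar skips index 1 and takes indexes 2 and 3.
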
> diff --git a/src/lj_ccall.h b/src/lj_ccall.h
> index 59f66481..6efa48c7 100644
> --- a/src/lj_ccall.h
> +++ b/src/lj_ccall.h
> @@ -86,9 +86,9 @@ typedef union FPRArg {
>   #elif LJ_TARGET_PPC
>   
>   #define CCALL_NARG_GPR		8
> -#define CCALL_NARG_FPR		8
> +#define CCALL_NARG_FPR		(LJ_ABI_SOFTFP ? 0 : 8)
>   #define CCALL_NRET_GPR		4	/* For complex double. */
> -#define CCALL_NRET_FPR		1
> +#define CCALL_NRET_FPR		(LJ_ABI_SOFTFP ? 0 : 1)
>   #define CCALL_SPS_EXTRA		4
>   #define CCALL_SPS_FREE		0
>   
> diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c
> index 224b6b94..c33190d7 100644
> --- a/src/lj_ccallback.c
> +++ b/src/lj_ccallback.c
> @@ -419,6 +419,23 @@ void lj_ccallback_mcode_free(CTState *cts)
>   
>   #elif LJ_TARGET_PPC
>   
> +#define CALLBACK_HANDLE_GPR \
> +  if (n > 1) { \
> +    lua_assert(((LJ_ABI_SOFTFP && ctype_isnum(cta->info)) ||  /* double. */ \
> +		ctype_isinteger(cta->info)) && n == 2);  /* int64_t. */ \
> +    ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> +  } \
> +  if (ngpr + n <= maxgpr) { \
> +    sp = &cts->cb.gpr[ngpr]; \
> +    ngpr += n; \
> +    goto done; \
> +  }
> +
> +#if LJ_ABI_SOFTFP
> +#define CALLBACK_HANDLE_REGARG \
> +  CALLBACK_HANDLE_GPR \
> +  UNUSED(isfp);
> +#else
>   #define CALLBACK_HANDLE_REGARG \
>     if (isfp) { \
>       if (nfpr + 1 <= CCALL_NARG_FPR) { \
> @@ -427,20 +444,15 @@ void lj_ccallback_mcode_free(CTState *cts)
>         goto done; \
>       } \
>     } else {  /* Try to pass argument in GPRs. */ \
> -    if (n > 1) { \
> -      lua_assert(ctype_isinteger(cta->info) && n == 2);  /* int64_t. */ \
> -      ngpr = (ngpr + 1u) & ~1u;  /* Align int64_t to regpair. */ \
> -    } \
> -    if (ngpr + n <= maxgpr) { \
> -      sp = &cts->cb.gpr[ngpr]; \
> -      ngpr += n; \
> -      goto done; \
> -    } \
> +    CALLBACK_HANDLE_GPR \
>     }
> +#endif
>   
> +#if !LJ_ABI_SOFTFP
>   #define CALLBACK_HANDLE_RET \
>     if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
>       *(double *)dp = *(float *)dp;  /* FPRs always hold doubles. */
> +#endif
>   
>   #elif LJ_TARGET_MIPS32
>   
> diff --git a/src/lj_frame.h b/src/lj_frame.h
> index 2bdf3c48..5cb3d639 100644
> --- a/src/lj_frame.h
> +++ b/src/lj_frame.h
> @@ -226,7 +226,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
>   #define CFRAME_OFS_L		36
>   #define CFRAME_OFS_PC		32
>   #define CFRAME_OFS_MULTRES	28
> -#define CFRAME_SIZE		272
> +#define CFRAME_SIZE		(LJ_ARCH_HASFPU ? 272 : 128)
>   #define CFRAME_SHIFT_MULTRES	3
>   #endif
>   #elif LJ_TARGET_MIPS32
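The two CFRAME_SIZE values above appear to follow from the frame layout in vm_ppc.dasc: the FPR save area starts at offset 128 (`SAVE_FPR_`) and holds 18 slots of 8 bytes, and the soft-float frame drops that area. A small sketch of the arithmetic, with hypothetical names (`cframe_size` and the enum constants are not LuaJIT identifiers):

```c
#include <assert.h>

enum {
  FRAME_BASE_SIZE = 128,  /* Offset of SAVE_FPR_; soft-float frame size. */
  SAVED_FPR_COUNT = 18,   /* Callee-saved f14..f31. */
  FPR_SLOT_SIZE = 8       /* Each FPR is saved as a 64-bit value. */
};

/* Frame size in bytes for the 32-bit PPC layout, given FPU presence. */
static int cframe_size(int has_fpu)
{
  assert(has_fpu == 0 || has_fpu == 1);
  return FRAME_BASE_SIZE + (has_fpu ? SAVED_FPR_COUNT * FPR_SLOT_SIZE : 0);
}
```
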
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index c1ac29d1..bbad35b1 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -291,7 +291,7 @@ LJ_DATA const CCallInfo lj_ir_callinfo[IRCALL__MAX+1];
>   #define fp64_f2l __aeabi_f2lz
>   #define fp64_f2ul __aeabi_f2ulz
>   #endif
> -#elif LJ_TARGET_MIPS
> +#elif LJ_TARGET_MIPS || LJ_TARGET_PPC
>   #define softfp_add __adddf3
>   #define softfp_sub __subdf3
>   #define softfp_mul __muldf3
> diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
> index 7ad8df37..980ad897 100644
> --- a/src/vm_ppc.dasc
> +++ b/src/vm_ppc.dasc
> @@ -103,6 +103,18 @@
>   |// Fixed register assignments for the interpreter.
>   |// Don't use: r1 = sp, r2 and r13 = reserved (TOC, TLS or SDATA)
>   |
> +|.macro .FPU, a, b
> +|.if FPU
> +|  a, b
> +|.endif
> +|.endmacro
> +|
> +|.macro .FPU, a, b, c
> +|.if FPU
> +|  a, b, c
> +|.endif
> +|.endmacro
> +|
>   |// The following must be C callee-save (but BASE is often refetched).
>   |.define BASE,		r14	// Base of current Lua stack frame.
>   |.define KBASE,		r15	// Constants of current Lua function.
> @@ -116,8 +128,10 @@
>   |.define TISNUM,	r22
>   |.define TISNIL,	r23
>   |.define ZERO,		r24
> +|.if FPU
>   |.define TOBIT,		f30	// 2^52 + 2^51.
>   |.define TONUM,		f31	// 2^52 + 2^51 + 2^31.
> +|.endif
>   |
>   |// The following temporaries are not saved across C calls, except for RA.
>   |.define RA,		r20	// Callee-save.
> @@ -133,6 +147,7 @@
>   |
>   |// Saved temporaries.
>   |.define SAVE0,		r21
> +|.define SAVE1,		r25
>   |
>   |// Calling conventions.
>   |.define CARG1,		r3
> @@ -141,8 +156,10 @@
>   |.define CARG4,		r6	// Overlaps TMP3.
>   |.define CARG5,		r7	// Overlaps INS.
>   |
> +|.if FPU
>   |.define FARG1,		f1
>   |.define FARG2,		f2
> +|.endif
>   |
>   |.define CRET1,		r3
>   |.define CRET2,		r4
> @@ -213,10 +230,16 @@
>   |.endif
>   |.else
>   |
> +|.if FPU
>   |.define SAVE_LR,	276(sp)
>   |.define CFRAME_SPACE,	272     // Delta for sp.
>   |// Back chain for sp:	272(sp) <-- sp entering interpreter
>   |.define SAVE_FPR_,	128     // .. 128+18*8: 64 bit FPR saves.
> +|.else
> +|.define SAVE_LR,	132(sp)
> +|.define CFRAME_SPACE,	128     // Delta for sp.
> +|// Back chain for sp:	128(sp) <-- sp entering interpreter
> +|.endif
>   |.define SAVE_GPR_,	56      // .. 56+18*4: 32 bit GPR saves.
>   |.define SAVE_CR,	52(sp)  // 32 bit CR save.
>   |.define SAVE_ERRF,	48(sp)  // 32 bit C frame info.
> @@ -226,16 +249,25 @@
>   |.define SAVE_PC,	32(sp)
>   |.define SAVE_MULTRES,	28(sp)
>   |.define UNUSED1,	24(sp)
> +|.if FPU
>   |.define TMPD_LO,	20(sp)
>   |.define TMPD_HI,	16(sp)
>   |.define TONUM_LO,	12(sp)
>   |.define TONUM_HI,	8(sp)
> +|.else
> +|.define SFSAVE_4,	20(sp)
> +|.define SFSAVE_3,	16(sp)
> +|.define SFSAVE_2,	12(sp)
> +|.define SFSAVE_1,	8(sp)
> +|.endif
>   |// Next frame lr:	4(sp)
>   |// Back chain for sp:	0(sp)	<-- sp while in interpreter
>   |
> +|.if FPU
>   |.define TMPD_BLO,	23(sp)
>   |.define TMPD,		TMPD_HI
>   |.define TONUM_D,	TONUM_HI
> +|.endif
>   |
>   |.endif
>   |
> @@ -245,7 +277,7 @@
>   |.else
>   |  stw r..reg, SAVE_GPR_+(reg-14)*4(sp)
>   |.endif
> -|  stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +|  .FPU stfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
>   |.endmacro
>   |.macro rest_, reg
>   |.if GPR64
> @@ -253,7 +285,7 @@
>   |.else
>   |  lwz r..reg, SAVE_GPR_+(reg-14)*4(sp)
>   |.endif
> -|  lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
> +|  .FPU lfd f..reg, SAVE_FPR_+(reg-14)*8(sp)
>   |.endmacro
>   |
>   |.macro saveregs
> @@ -323,6 +355,7 @@
>   |// Trap for not-yet-implemented parts.
>   |.macro NYI; tw 4, sp, sp; .endmacro
>   |
> +|.if FPU
>   |// int/FP conversions.
>   |.macro tonum_i, freg, reg
>   |  xoris reg, reg, 0x8000
> @@ -346,6 +379,7 @@
>   |.macro toint, reg, freg
>   |  toint reg, freg, freg
>   |.endmacro
> +|.endif
>   |
>   |//-----------------------------------------------------------------------
>   |
> @@ -533,9 +567,19 @@ static void build_subroutines(BuildCtx *ctx)
>     |  beq >2
>     |1:
>     |  addic. TMP1, TMP1, -8
> +  |.if FPU
>     |   lfd f0, 0(RA)
> +  |.else
> +  |   lwz CARG1, 0(RA)
> +  |   lwz CARG2, 4(RA)
> +  |.endif
>     |    addi RA, RA, 8
> +  |.if FPU
>     |   stfd f0, 0(BASE)
> +  |.else
> +  |   stw CARG1, 0(BASE)
> +  |   stw CARG2, 4(BASE)
> +  |.endif
>     |    addi BASE, BASE, 8
>     |  bney <1
>     |
> @@ -613,23 +657,23 @@ static void build_subroutines(BuildCtx *ctx)
>     |  .toc ld TOCREG, SAVE_TOC
>     |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>     |  lp BASE, L->base
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>     |   lwz DISPATCH, L->glref		// Setup pointer to dispatch table.
>     |     li ZERO, 0
> -  |     stw TMP3, TMPD
> +  |     .FPU stw TMP3, TMPD
>     |  li TMP1, LJ_TFALSE
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
>     |     li TISNIL, LJ_TNIL
>     |    li_vmstate INTERP
> -  |     lfs TOBIT, TMPD
> +  |     .FPU lfs TOBIT, TMPD
>     |  lwz PC, FRAME_PC(BASE)		// Fetch PC of previous frame.
>     |  la RA, -8(BASE)			// Results start at BASE-8.
> -  |     stw TMP3, TMPD
> +  |     .FPU stw TMP3, TMPD
>     |   addi DISPATCH, DISPATCH, GG_G2DISP
>     |  stw TMP1, 0(RA)			// Prepend false to error message.
>     |  li RD, 16				// 2 results: false + error message.
>     |    st_vmstate
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>     |  b ->vm_returnc
>     |
>     |//-----------------------------------------------------------------------
> @@ -690,22 +734,22 @@ static void build_subroutines(BuildCtx *ctx)
>     |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>     |   lp TMP1, L->top
>     |  lwz PC, FRAME_PC(BASE)
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>     |    stb CARG3, L->status
> -  |     stw TMP3, TMPD
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |     lfs TOBIT, TMPD
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU lfs TOBIT, TMPD
>     |   sub RD, TMP1, BASE
> -  |     stw TMP3, TMPD
> -  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
>     |   addi RD, RD, 8
> -  |     stw TMP0, TONUM_HI
> +  |     .FPU stw TMP0, TONUM_HI
>     |    li_vmstate INTERP
>     |     li ZERO, 0
>     |    st_vmstate
>     |  andix. TMP0, PC, FRAME_TYPE
>     |   mr MULTRES, RD
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>     |     li TISNIL, LJ_TNIL
>     |  beq ->BC_RET_Z
>     |  b ->vm_return
> @@ -739,19 +783,19 @@ static void build_subroutines(BuildCtx *ctx)
>     |  lp TMP2, L->base			// TMP2 = old base (used in vmeta_call).
>     |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>     |   lp TMP1, L->top
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>     |  add PC, PC, BASE
> -  |     stw TMP3, TMPD
> +  |     .FPU stw TMP3, TMPD
>     |     li ZERO, 0
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |     lfs TOBIT, TMPD
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU lfs TOBIT, TMPD
>     |  sub PC, PC, TMP2			// PC = frame delta + frame type
> -  |     stw TMP3, TMPD
> -  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
>     |   sub NARGS8:RC, TMP1, BASE
> -  |     stw TMP0, TONUM_HI
> +  |     .FPU stw TMP0, TONUM_HI
>     |    li_vmstate INTERP
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>     |     li TISNIL, LJ_TNIL
>     |    st_vmstate
>     |
> @@ -839,15 +883,30 @@ static void build_subroutines(BuildCtx *ctx)
>     |  lwz INS, -4(PC)
>     |   subi CARG2, RB, 16
>     |  decode_RB8 SAVE0, INS
> +  |.if FPU
>     |   lfd f0, 0(RA)
> +  |.else
> +  |   lwz TMP2, 0(RA)
> +  |   lwz TMP3, 4(RA)
> +  |.endif
>     |  add TMP1, BASE, SAVE0
>     |   stp BASE, L->base
>     |  cmplw TMP1, CARG2
>     |   sub CARG3, CARG2, TMP1
>     |  decode_RA8 RA, INS
> +  |.if FPU
>     |   stfd f0, 0(CARG2)
> +  |.else
> +  |   stw TMP2, 0(CARG2)
> +  |   stw TMP3, 4(CARG2)
> +  |.endif
>     |  bney ->BC_CAT_Z
> +  |.if FPU
>     |   stfdx f0, BASE, RA
> +  |.else
> +  |   stwux TMP2, RA, BASE
> +  |   stw TMP3, 4(RA)
> +  |.endif
>     |  b ->cont_nop
>     |
>     |//-- Table indexing metamethods -----------------------------------------
> @@ -900,9 +959,19 @@ static void build_subroutines(BuildCtx *ctx)
>     |  // Returns TValue * (finished) or NULL (metamethod).
>     |  cmplwi CRET1, 0
>     |  beq >3
> +  |.if FPU
>     |   lfd f0, 0(CRET1)
> +  |.else
> +  |   lwz TMP0, 0(CRET1)
> +  |   lwz TMP1, 4(CRET1)
> +  |.endif
>     |  ins_next1
> +  |.if FPU
>     |   stfdx f0, BASE, RA
> +  |.else
> +  |   stwux TMP0, RA, BASE
> +  |   stw TMP1, 4(RA)
> +  |.endif
>     |  ins_next2
>     |
>     |3:  // Call __index metamethod.
> @@ -920,7 +989,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |  // Returns cTValue * or NULL.
>     |  cmplwi CRET1, 0
>     |  beq >1
> +  |.if FPU
>     |  lfd f14, 0(CRET1)
> +  |.else
> +  |  lwz SAVE0, 0(CRET1)
> +  |  lwz SAVE1, 4(CRET1)
> +  |.endif
>     |  b ->BC_TGETR_Z
>     |1:
>     |  stwx TISNIL, BASE, RA
> @@ -975,11 +1049,21 @@ static void build_subroutines(BuildCtx *ctx)
>     |  bl extern lj_meta_tset		// (lua_State *L, TValue *o, TValue *k)
>     |  // Returns TValue * (finished) or NULL (metamethod).
>     |  cmplwi CRET1, 0
> +  |.if FPU
>     |   lfdx f0, BASE, RA
> +  |.else
> +  |   lwzux TMP2, RA, BASE
> +  |   lwz TMP3, 4(RA)
> +  |.endif
>     |  beq >3
>     |  // NOBARRIER: lj_meta_tset ensures the table is not black.
>     |  ins_next1
> +  |.if FPU
>     |   stfd f0, 0(CRET1)
> +  |.else
> +  |   stw TMP2, 0(CRET1)
> +  |   stw TMP3, 4(CRET1)
> +  |.endif
>     |  ins_next2
>     |
>     |3:  // Call __newindex metamethod.
> @@ -990,7 +1074,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |   add PC, TMP1, BASE
>     |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
>     |   li NARGS8:RC, 24			// 3 args for func(t, k, v)
> +  |.if FPU
>     |  stfd f0, 16(BASE)			// Copy value to third argument.
> +  |.else
> +  |  stw TMP2, 16(BASE)
> +  |  stw TMP3, 20(BASE)
> +  |.endif
>     |  b ->vm_call_dispatch_f
>     |
>     |->vmeta_tsetr:
> @@ -999,7 +1088,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |  stw PC, SAVE_PC
>     |  bl extern lj_tab_setinth  // (lua_State *L, GCtab *t, int32_t key)
>     |  // Returns TValue *.
> +  |.if FPU
>     |  stfd f14, 0(CRET1)
> +  |.else
> +  |  stw SAVE0, 0(CRET1)
> +  |  stw SAVE1, 4(CRET1)
> +  |.endif
>     |  b ->cont_nop
>     |
>     |//-- Comparison metamethods ---------------------------------------------
> @@ -1038,9 +1132,19 @@ static void build_subroutines(BuildCtx *ctx)
>     |
>     |->cont_ra:				// RA = resultptr
>     |  lwz INS, -4(PC)
> +  |.if FPU
>     |   lfd f0, 0(RA)
> +  |.else
> +  |   lwz CARG1, 0(RA)
> +  |   lwz CARG2, 4(RA)
> +  |.endif
>     |  decode_RA8 TMP1, INS
> +  |.if FPU
>     |   stfdx f0, BASE, TMP1
> +  |.else
> +  |   stwux CARG1, TMP1, BASE
> +  |   stw CARG2, 4(TMP1)
> +  |.endif
>     |  b ->cont_nop
>     |
>     |->cont_condt:			// RA = resultptr
> @@ -1246,22 +1350,32 @@ static void build_subroutines(BuildCtx *ctx)
>     |.macro .ffunc_n, name
>     |->ff_ .. name:
>     |  cmplwi NARGS8:RC, 8
> -  |   lwz CARG3, 0(BASE)
> +  |   lwz CARG1, 0(BASE)
> +  |.if FPU
>     |    lfd FARG1, 0(BASE)
> +  |.else
> +  |    lwz CARG2, 4(BASE)
> +  |.endif
>     |  blt ->fff_fallback
> -  |  checknum CARG3; bge ->fff_fallback
> +  |  checknum CARG1; bge ->fff_fallback
>     |.endmacro
>     |
>     |.macro .ffunc_nn, name
>     |->ff_ .. name:
>     |  cmplwi NARGS8:RC, 16
> -  |   lwz CARG3, 0(BASE)
> +  |   lwz CARG1, 0(BASE)
> +  |.if FPU
>     |    lfd FARG1, 0(BASE)
> -  |   lwz CARG4, 8(BASE)
> +  |   lwz CARG3, 8(BASE)
>     |    lfd FARG2, 8(BASE)
> +  |.else
> +  |    lwz CARG2, 4(BASE)
> +  |   lwz CARG3, 8(BASE)
> +  |    lwz CARG4, 12(BASE)
> +  |.endif
>     |  blt ->fff_fallback
> +  |  checknum CARG1; bge ->fff_fallback
>     |  checknum CARG3; bge ->fff_fallback
> -  |  checknum CARG4; bge ->fff_fallback
>     |.endmacro
>     |
>     |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1.
> @@ -1282,14 +1396,21 @@ static void build_subroutines(BuildCtx *ctx)
>     |  bge cr1, ->fff_fallback
>     |   stw CARG3, 0(RA)
>     |  addi RD, NARGS8:RC, 8		// Compute (nresults+1)*8.
> +  |  addi TMP1, BASE, 8
> +  |  add TMP2, RA, NARGS8:RC
>     |   stw CARG1, 4(RA)
>     |  beq ->fff_res			// Done if exactly 1 argument.
> -  |  li TMP1, 8
> -  |  subi RC, RC, 8
>     |1:
> -  |  cmplw TMP1, RC
> -  |   lfdx f0, BASE, TMP1
> -  |   stfdx f0, RA, TMP1
> +  |  cmplw TMP1, TMP2
> +  |.if FPU
> +  |   lfd f0, 0(TMP1)
> +  |   stfd f0, 0(TMP1)
> +  |.else
> +  |   lwz CARG1, 0(TMP1)
> +  |   lwz CARG2, 4(TMP1)
> +  |   stw CARG1, -8(TMP1)
> +  |   stw CARG2, -4(TMP1)
> +  |.endif
>     |    addi TMP1, TMP1, 8
>     |  bney <1
>     |  b ->fff_res
> @@ -1304,8 +1425,14 @@ static void build_subroutines(BuildCtx *ctx)
>     |  orc TMP1, TMP2, TMP0
>     |  addi TMP1, TMP1, ~LJ_TISNUM+1
>     |  slwi TMP1, TMP1, 3
> +  |.if FPU
>     |   la TMP2, CFUNC:RB->upvalue
>     |  lfdx FARG1, TMP2, TMP1
> +  |.else
> +  |  add TMP1, CFUNC:RB, TMP1
> +  |  lwz CARG1, CFUNC:TMP1->upvalue[0].u32.hi
> +  |  lwz CARG2, CFUNC:TMP1->upvalue[0].u32.lo
> +  |.endif
>     |  b ->fff_resn
>     |
>     |//-- Base library: getters and setters ---------------------------------
> @@ -1383,7 +1510,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |   mr CARG1, L
>     |  bl extern lj_tab_get  // (lua_State *L, GCtab *t, cTValue *key)
>     |  // Returns cTValue *.
> +  |.if FPU
>     |  lfd FARG1, 0(CRET1)
> +  |.else
> +  |  lwz CARG2, 4(CRET1)
> +  |  lwz CARG1, 0(CRET1)	// Caveat: CARG1 == CRET1.
> +  |.endif
>     |  b ->fff_resn
>     |
>     |//-- Base library: conversions ------------------------------------------
> @@ -1392,7 +1524,11 @@ static void build_subroutines(BuildCtx *ctx)
>     |  // Only handles the number case inline (without a base argument).
>     |  cmplwi NARGS8:RC, 8
>     |   lwz CARG1, 0(BASE)
> +  |.if FPU
>     |    lfd FARG1, 0(BASE)
> +  |.else
> +  |    lwz CARG2, 4(BASE)
> +  |.endif
>     |  bne ->fff_fallback			// Exactly one argument.
>     |   checknum CARG1; bgt ->fff_fallback
>     |  b ->fff_resn
> @@ -1443,12 +1579,23 @@ static void build_subroutines(BuildCtx *ctx)
>     |  cmplwi CRET1, 0
>     |   li CARG3, LJ_TNIL
>     |  beq ->fff_restv			// End of traversal: return nil.
> -  |  lfd f0, 8(BASE)			// Copy key and value to results.
>     |   la RA, -8(BASE)
> +  |.if FPU
> +  |  lfd f0, 8(BASE)			// Copy key and value to results.
>     |  lfd f1, 16(BASE)
>     |  stfd f0, 0(RA)
> -  |   li RD, (2+1)*8
>     |  stfd f1, 8(RA)
> +  |.else
> +  |  lwz CARG1, 8(BASE)
> +  |  lwz CARG2, 12(BASE)
> +  |  lwz CARG3, 16(BASE)
> +  |  lwz CARG4, 20(BASE)
> +  |  stw CARG1, 0(RA)
> +  |  stw CARG2, 4(RA)
> +  |  stw CARG3, 8(RA)
> +  |  stw CARG4, 12(RA)
> +  |.endif
> +  |   li RD, (2+1)*8
>     |  b ->fff_res
>     |
>     |.ffunc_1 pairs
> @@ -1457,17 +1604,32 @@ static void build_subroutines(BuildCtx *ctx)
>     |  bne ->fff_fallback
>   #if LJ_52
>     |   lwz TAB:TMP2, TAB:CARG1->metatable
> +  |.if FPU
>     |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>     |   cmplwi TAB:TMP2, 0
>     |  la RA, -8(BASE)
>     |   bne ->fff_fallback
>   #else
> +  |.if FPU
>     |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>     |  la RA, -8(BASE)
>   #endif
>     |   stw TISNIL, 8(BASE)
>     |  li RD, (3+1)*8
> +  |.if FPU
>     |  stfd f0, 0(RA)
> +  |.else
> +  |  stw TMP0, 0(RA)
> +  |  stw TMP1, 4(RA)
> +  |.endif
>     |  b ->fff_res
>     |
>     |.ffunc ipairs_aux
> @@ -1513,14 +1675,24 @@ static void build_subroutines(BuildCtx *ctx)
>     |  stfd FARG2, 0(RA)
>     |.endif
>     |  ble >2				// Not in array part?
> +  |.if FPU
>     |  lwzx TMP2, TMP1, TMP3
>     |  lfdx f0, TMP1, TMP3
> +  |.else
> +  |  lwzux TMP2, TMP1, TMP3
> +  |  lwz TMP3, 4(TMP1)
> +  |.endif
>     |1:
>     |  checknil TMP2
>     |   li RD, (0+1)*8
>     |  beq ->fff_res			// End of iteration, return 0 results.
>     |   li RD, (2+1)*8
> +  |.if FPU
>     |  stfd f0, 8(RA)
> +  |.else
> +  |  stw TMP2, 8(RA)
> +  |  stw TMP3, 12(RA)
> +  |.endif
>     |  b ->fff_res
>     |2:  // Check for empty hash part first. Otherwise call C function.
>     |  lwz TMP0, TAB:CARG1->hmask
> @@ -1534,7 +1706,11 @@ static void build_subroutines(BuildCtx *ctx)
>     |   li RD, (0+1)*8
>     |  beq ->fff_res
>     |  lwz TMP2, 0(CRET1)
> +  |.if FPU
>     |  lfd f0, 0(CRET1)
> +  |.else
> +  |  lwz TMP3, 4(CRET1)
> +  |.endif
>     |  b <1
>     |
>     |.ffunc_1 ipairs
> @@ -1543,12 +1719,22 @@ static void build_subroutines(BuildCtx *ctx)
>     |  bne ->fff_fallback
>   #if LJ_52
>     |   lwz TAB:TMP2, TAB:CARG1->metatable
> +  |.if FPU
>     |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>     |   cmplwi TAB:TMP2, 0
>     |  la RA, -8(BASE)
>     |   bne ->fff_fallback
>   #else
> +  |.if FPU
>     |  lfd f0, CFUNC:RB->upvalue[0]
> +  |.else
> +  |  lwz TMP0, CFUNC:RB->upvalue[0].u32.hi
> +  |  lwz TMP1, CFUNC:RB->upvalue[0].u32.lo
> +  |.endif
>     |  la RA, -8(BASE)
>   #endif
>     |.if DUALNUM
> @@ -1558,7 +1744,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |.endif
>     |   stw ZERO, 12(BASE)
>     |  li RD, (3+1)*8
> +  |.if FPU
>     |  stfd f0, 0(RA)
> +  |.else
> +  |  stw TMP0, 0(RA)
> +  |  stw TMP1, 4(RA)
> +  |.endif
>     |  b ->fff_res
>     |
>     |//-- Base library: catch errors ----------------------------------------
> @@ -1577,19 +1768,32 @@ static void build_subroutines(BuildCtx *ctx)
>     |
>     |.ffunc xpcall
>     |  cmplwi NARGS8:RC, 16
> -  |   lwz CARG4, 8(BASE)
> +  |   lwz CARG3, 8(BASE)
> +  |.if FPU
>     |    lfd FARG2, 8(BASE)
>     |    lfd FARG1, 0(BASE)
> +  |.else
> +  |    lwz CARG1, 0(BASE)
> +  |    lwz CARG2, 4(BASE)
> +  |    lwz CARG4, 12(BASE)
> +  |.endif
>     |  blt ->fff_fallback
>     |  lbz TMP1, DISPATCH_GL(hookmask)(DISPATCH)
>     |   mr TMP2, BASE
> -  |  checkfunc CARG4; bne ->fff_fallback  // Traceback must be a function.
> +  |  checkfunc CARG3; bne ->fff_fallback  // Traceback must be a function.
>     |   la BASE, 16(BASE)
>     |  // Remember active hook before pcall.
>     |  rlwinm TMP1, TMP1, 32-HOOK_ACTIVE_SHIFT, 31, 31
> +  |.if FPU
>     |    stfd FARG2, 0(TMP2)		// Swap function and traceback.
> -  |  subi NARGS8:RC, NARGS8:RC, 16
>     |    stfd FARG1, 8(TMP2)
> +  |.else
> +  |    stw CARG3, 0(TMP2)
> +  |    stw CARG4, 4(TMP2)
> +  |    stw CARG1, 8(TMP2)
> +  |    stw CARG2, 12(TMP2)
> +  |.endif
> +  |  subi NARGS8:RC, NARGS8:RC, 16
>     |  addi PC, TMP1, 16+FRAME_PCALL
>     |  b ->vm_call_dispatch
>     |
> @@ -1632,9 +1836,21 @@ static void build_subroutines(BuildCtx *ctx)
>     |  stp BASE, L->top
>     |2:  // Move args to coroutine.
>     |  cmpw TMP1, NARGS8:RC
> +  |.if FPU
>     |   lfdx f0, BASE, TMP1
> +  |.else
> +  |   add CARG3, BASE, TMP1
> +  |   lwz TMP2, 0(CARG3)
> +  |   lwz TMP3, 4(CARG3)
> +  |.endif
>     |  beq >3
> +  |.if FPU
>     |   stfdx f0, CARG2, TMP1
> +  |.else
> +  |   add CARG3, CARG2, TMP1
> +  |   stw TMP2, 0(CARG3)
> +  |   stw TMP3, 4(CARG3)
> +  |.endif
>     |  addi TMP1, TMP1, 8
>     |  b <2
>     |3:
> @@ -1665,8 +1881,17 @@ static void build_subroutines(BuildCtx *ctx)
>     |   stp TMP2, L:SAVE0->top		// Clear coroutine stack.
>     |5:  // Move results from coroutine.
>     |  cmplw TMP1, TMP3
> +  |.if FPU
>     |   lfdx f0, TMP2, TMP1
>     |   stfdx f0, BASE, TMP1
> +  |.else
> +  |   add CARG3, TMP2, TMP1
> +  |   lwz CARG1, 0(CARG3)
> +  |   lwz CARG2, 4(CARG3)
> +  |   add CARG3, BASE, TMP1
> +  |   stw CARG1, 0(CARG3)
> +  |   stw CARG2, 4(CARG3)
> +  |.endif
>     |    addi TMP1, TMP1, 8
>     |  bne <5
>     |6:
> @@ -1691,12 +1916,22 @@ static void build_subroutines(BuildCtx *ctx)
>     |  andix. TMP0, PC, FRAME_TYPE
>     |  la TMP3, -8(TMP3)
>     |   li TMP1, LJ_TFALSE
> +  |.if FPU
>     |  lfd f0, 0(TMP3)
> +  |.else
> +  |  lwz CARG1, 0(TMP3)
> +  |  lwz CARG2, 4(TMP3)
> +  |.endif
>     |   stp TMP3, L:SAVE0->top		// Remove error from coroutine stack.
>     |    li RD, (2+1)*8
>     |   stw TMP1, -8(BASE)		// Prepend false to results.
>     |    la RA, -8(BASE)
> +  |.if FPU
>     |  stfd f0, 0(BASE)			// Copy error message.
> +  |.else
> +  |  stw CARG1, 0(BASE)			// Copy error message.
> +  |  stw CARG2, 4(BASE)
> +  |.endif
>     |  b <7
>     |.else
>     |  mr CARG1, L
> @@ -1875,7 +2110,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |  lus CARG1, 0x8000			// -(2^31).
>     |  beqy ->fff_resi
>     |5:
> +  |.if FPU
>     |  lfd FARG1, 0(BASE)
> +  |.else
> +  |  lwz CARG1, 0(BASE)
> +  |  lwz CARG2, 4(BASE)
> +  |.endif
>     |  blex func
>     |  b ->fff_resn
>     |.endmacro
> @@ -1899,10 +2139,14 @@ static void build_subroutines(BuildCtx *ctx)
>     |
>     |.ffunc math_log
>     |  cmplwi NARGS8:RC, 8
> -  |   lwz CARG3, 0(BASE)
> -  |    lfd FARG1, 0(BASE)
> +  |   lwz CARG1, 0(BASE)
>     |  bne ->fff_fallback			// Need exactly 1 argument.
> -  |  checknum CARG3; bge ->fff_fallback
> +  |  checknum CARG1; bge ->fff_fallback
> +  |.if FPU
> +  |  lfd FARG1, 0(BASE)
> +  |.else
> +  |  lwz CARG2, 4(BASE)
> +  |.endif
>     |  blex log
>     |  b ->fff_resn
>     |
> @@ -1924,17 +2168,24 @@ static void build_subroutines(BuildCtx *ctx)
>     |.if DUALNUM
>     |.ffunc math_ldexp
>     |  cmplwi NARGS8:RC, 16
> -  |   lwz CARG3, 0(BASE)
> +  |   lwz TMP0, 0(BASE)
> +  |.if FPU
>     |    lfd FARG1, 0(BASE)
> -  |   lwz CARG4, 8(BASE)
> +  |.else
> +  |    lwz CARG1, 0(BASE)
> +  |    lwz CARG2, 4(BASE)
> +  |.endif
> +  |   lwz TMP1, 8(BASE)
>     |.if GPR64
>     |    lwz CARG2, 12(BASE)
> -  |.else
> +  |.elif FPU
>     |    lwz CARG1, 12(BASE)
> +  |.else
> +  |    lwz CARG3, 12(BASE)
>     |.endif
>     |  blt ->fff_fallback
> -  |  checknum CARG3; bge ->fff_fallback
> -  |  checknum CARG4; bne ->fff_fallback
> +  |  checknum TMP0; bge ->fff_fallback
> +  |  checknum TMP1; bne ->fff_fallback
>     |.else
>     |.ffunc_nn math_ldexp
>     |.if GPR64
> @@ -1949,8 +2200,10 @@ static void build_subroutines(BuildCtx *ctx)
>     |.ffunc_n math_frexp
>     |.if GPR64
>     |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
> -  |.else
> +  |.elif FPU
>     |  la CARG1, DISPATCH_GL(tmptv)(DISPATCH)
> +  |.else
> +  |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
>     |.endif
>     |   lwz PC, FRAME_PC(BASE)
>     |  blex frexp
> @@ -1959,7 +2212,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |.if not DUALNUM
>     |   tonum_i FARG2, TMP1
>     |.endif
> +  |.if FPU
>     |  stfd FARG1, 0(RA)
> +  |.else
> +  |  stw CRET1, 0(RA)
> +  |  stw CRET2, 4(RA)
> +  |.endif
>     |  li RD, (2+1)*8
>     |.if DUALNUM
>     |   stw TISNUM, 8(RA)
> @@ -1972,13 +2230,20 @@ static void build_subroutines(BuildCtx *ctx)
>     |.ffunc_n math_modf
>     |.if GPR64
>     |  la CARG2, -8(BASE)
> -  |.else
> +  |.elif FPU
>     |  la CARG1, -8(BASE)
> +  |.else
> +  |  la CARG3, -8(BASE)
>     |.endif
>     |   lwz PC, FRAME_PC(BASE)
>     |  blex modf
>     |   la RA, -8(BASE)
> +  |.if FPU
>     |  stfd FARG1, 0(BASE)
> +  |.else
> +  |  stw CRET1, 0(BASE)
> +  |  stw CRET2, 4(BASE)
> +  |.endif
>     |  li RD, (2+1)*8
>     |  b ->fff_res
>     |
> @@ -1986,13 +2251,13 @@ static void build_subroutines(BuildCtx *ctx)
>     |.if DUALNUM
>     |  .ffunc_1 name
>     |  checknum CARG3
> -  |   addi TMP1, BASE, 8
> -  |   add TMP2, BASE, NARGS8:RC
> +  |   addi SAVE0, BASE, 8
> +  |   add SAVE1, BASE, NARGS8:RC
>     |  bne >4
>     |1:  // Handle integers.
> -  |  lwz CARG4, 0(TMP1)
> -  |   cmplw cr1, TMP1, TMP2
> -  |  lwz CARG2, 4(TMP1)
> +  |  lwz CARG4, 0(SAVE0)
> +  |   cmplw cr1, SAVE0, SAVE1
> +  |  lwz CARG2, 4(SAVE0)
>     |   bge cr1, ->fff_resi
>     |  checknum CARG4
>     |   xoris TMP0, CARG1, 0x8000
> @@ -2009,36 +2274,76 @@ static void build_subroutines(BuildCtx *ctx)
>     |.if GPR64
>     |  rldicl CARG1, CARG1, 0, 32
>     |.endif
> -  |   addi TMP1, TMP1, 8
> +  |   addi SAVE0, SAVE0, 8
>     |  b <1
>     |3:
>     |  bge ->fff_fallback
>     |  // Convert intermediate result to number and continue below.
> +  |.if FPU
>     |  tonum_i FARG1, CARG1
> -  |  lfd FARG2, 0(TMP1)
> +  |  lfd FARG2, 0(SAVE0)
> +  |.else
> +  |  mr CARG2, CARG1
> +  |  bl ->vm_sfi2d_1
> +  |  lwz CARG3, 0(SAVE0)
> +  |  lwz CARG4, 4(SAVE0)
> +  |.endif
>     |  b >6
>     |4:
> +  |.if FPU
>     |   lfd FARG1, 0(BASE)
> +  |.else
> +  |   lwz CARG1, 0(BASE)
> +  |   lwz CARG2, 4(BASE)
> +  |.endif
>     |  bge ->fff_fallback
>     |5:  // Handle numbers.
> -  |  lwz CARG4, 0(TMP1)
> -  |   cmplw cr1, TMP1, TMP2
> -  |  lfd FARG2, 0(TMP1)
> +  |  lwz CARG3, 0(SAVE0)
> +  |   cmplw cr1, SAVE0, SAVE1
> +  |.if FPU
> +  |  lfd FARG2, 0(SAVE0)
> +  |.else
> +  |  lwz CARG4, 4(SAVE0)
> +  |.endif
>     |   bge cr1, ->fff_resn
> -  |  checknum CARG4; bge >7
> +  |  checknum CARG3; bge >7
>     |6:
> +  |   addi SAVE0, SAVE0, 8
> +  |.if FPU
>     |  fsub f0, FARG1, FARG2
> -  |   addi TMP1, TMP1, 8
>     |.if ismax
>     |  fsel FARG1, f0, FARG1, FARG2
>     |.else
>     |  fsel FARG1, f0, FARG2, FARG1
>     |.endif
> +  |.else
> +  |  stw CARG1, SFSAVE_1
> +  |  stw CARG2, SFSAVE_2
> +  |  stw CARG3, SFSAVE_3
> +  |  stw CARG4, SFSAVE_4
> +  |  blex __ledf2
> +  |  cmpwi CRET1, 0
> +  |.if ismax
> +  |  blt >8
> +  |.else
> +  |  bge >8
> +  |.endif
> +  |  lwz CARG1, SFSAVE_1
> +  |  lwz CARG2, SFSAVE_2
> +  |  b <5
> +  |8:
> +  |  lwz CARG1, SFSAVE_3
> +  |  lwz CARG2, SFSAVE_4
> +  |.endif
>     |  b <5
>     |7:  // Convert integer to number and continue above.
> -  |   lwz CARG2, 4(TMP1)
> +  |   lwz CARG3, 4(SAVE0)
>     |  bne ->fff_fallback
> -  |  tonum_i FARG2, CARG2
> +  |.if FPU
> +  |  tonum_i FARG2, CARG3
> +  |.else
> +  |  bl ->vm_sfi2d_2
> +  |.endif
>     |  b <6
>     |.else
>     |  .ffunc_n name
> @@ -2238,28 +2543,37 @@ static void build_subroutines(BuildCtx *ctx)
>     |
>     |.macro .ffunc_bit_op, name, ins
>     |  .ffunc_bit name
> -  |  addi TMP1, BASE, 8
> -  |  add TMP2, BASE, NARGS8:RC
> +  |  addi SAVE0, BASE, 8
> +  |  add SAVE1, BASE, NARGS8:RC
>     |1:
> -  |  lwz CARG4, 0(TMP1)
> -  |   cmplw cr1, TMP1, TMP2
> +  |  lwz CARG4, 0(SAVE0)
> +  |   cmplw cr1, SAVE0, SAVE1
>     |.if DUALNUM
> -  |  lwz CARG2, 4(TMP1)
> +  |  lwz CARG2, 4(SAVE0)
>     |.else
> -  |  lfd FARG1, 0(TMP1)
> +  |  lfd FARG1, 0(SAVE0)
>     |.endif
>     |   bgey cr1, ->fff_resi
>     |  checknum CARG4
>     |.if DUALNUM
> +  |.if FPU
>     |  bnel ->fff_bitop_fb
>     |.else
> +  |  beq >3
> +  |  stw CARG1, SFSAVE_1
> +  |  bl ->fff_bitop_fb
> +  |  mr CARG2, CARG1
> +  |  lwz CARG1, SFSAVE_1
> +  |3:
> +  |.endif
> +  |.else
>     |  fadd FARG1, FARG1, TOBIT
>     |  bge ->fff_fallback
>     |  stfd FARG1, TMPD
>     |  lwz CARG2, TMPD_LO
>     |.endif
>     |  ins CARG1, CARG1, CARG2
> -  |   addi TMP1, TMP1, 8
> +  |   addi SAVE0, SAVE0, 8
>     |  b <1
>     |.endmacro
>     |
> @@ -2281,7 +2595,14 @@ static void build_subroutines(BuildCtx *ctx)
>     |.macro .ffunc_bit_sh, name, ins, shmod
>     |.if DUALNUM
>     |  .ffunc_2 bit_..name
> +  |.if FPU
>     |  checknum CARG3; bnel ->fff_tobit_fb
> +  |.else
> +  |  checknum CARG3; beq >1
> +  |  bl ->fff_tobit_fb
> +  |  lwz CARG2, 12(BASE)	// Conversion polluted CARG2.
> +  |1:
> +  |.endif
>     |  // Note: no inline conversion from number for 2nd argument!
>     |  checknum CARG4; bne ->fff_fallback
>     |.else
> @@ -2318,27 +2639,77 @@ static void build_subroutines(BuildCtx *ctx)
>     |->fff_resn:
>     |  lwz PC, FRAME_PC(BASE)
>     |  la RA, -8(BASE)
> +  |.if FPU
>     |  stfd FARG1, -8(BASE)
> +  |.else
> +  |  stw CARG1, -8(BASE)
> +  |  stw CARG2, -4(BASE)
> +  |.endif
>     |  b ->fff_res1
>     |
>     |// Fallback FP number to bit conversion.
>     |->fff_tobit_fb:
>     |.if DUALNUM
> +  |.if FPU
>     |  lfd FARG1, 0(BASE)
>     |  bgt ->fff_fallback
>     |  fadd FARG1, FARG1, TOBIT
>     |  stfd FARG1, TMPD
>     |  lwz CARG1, TMPD_LO
>     |  blr
> +  |.else
> +  |  bgt ->fff_fallback
> +  |  mr CARG2, CARG1
> +  |  mr CARG1, CARG3
> +  |// Modifies: CARG1, CARG2, TMP0, TMP1, TMP2.
> +  |->vm_tobit:
> +  |  slwi TMP2, CARG1, 1
> +  |  addis TMP2, TMP2, 0x0020
> +  |  cmpwi TMP2, 0
> +  |  bge >2
> +  |   li TMP1, 0x3e0
> +  |  srawi TMP2, TMP2, 21
> +  |   not TMP1, TMP1
> +  |  sub. TMP2, TMP1, TMP2
> +  |    cmpwi cr7, CARG1, 0
> +  |  blt >1
> +  |   slwi TMP1, CARG1, 11
> +  |    srwi TMP0, CARG2, 21
> +  |   oris TMP1, TMP1, 0x8000
> +  |   or TMP1, TMP1, TMP0
> +  |   srw CARG1, TMP1, TMP2
> +  |  bclr 4, 28			// Return if cr7[lt] == 0, no hint.
> +  |   neg CARG1, CARG1
> +  |  blr
> +  |1:
> +  |  addi TMP2, TMP2, 21
> +  |  srw TMP1, CARG2, TMP2
> +  |   slwi CARG2, CARG1, 12
> +  |  subfic TMP2, TMP2, 20
> +  |   slw TMP0, CARG2, TMP2
> +  |   or CARG1, TMP1, TMP0
> +  |  bclr 4, 28			// Return if cr7[lt] == 0, no hint.
> +  |   neg CARG1, CARG1
> +  |  blr
> +  |2:
> +  |  li CARG1, 0
> +  |  blr
> +  |.endif
>     |.endif
>     |->fff_bitop_fb:
>     |.if DUALNUM
> -  |  lfd FARG1, 0(TMP1)
> +  |.if FPU
> +  |  lfd FARG1, 0(SAVE0)
>     |  bgt ->fff_fallback
>     |  fadd FARG1, FARG1, TOBIT
>     |  stfd FARG1, TMPD
>     |  lwz CARG2, TMPD_LO
>     |  blr
> +  |.else
> +  |  bgt ->fff_fallback
> +  |  mr CARG1, CARG4
> +  |  b ->vm_tobit
> +  |.endif
>     |.endif
>     |
>     |//-----------------------------------------------------------------------
> @@ -2531,10 +2902,21 @@ static void build_subroutines(BuildCtx *ctx)
>     |  decode_RA8 RC, INS			// Call base.
>     |   beq >2
>     |1:  // Move results down.
> +  |.if FPU
>     |  lfd f0, 0(RA)
> +  |.else
> +  |  lwz CARG1, 0(RA)
> +  |  lwz CARG2, 4(RA)
> +  |.endif
>     |   addic. TMP1, TMP1, -8
>     |    addi RA, RA, 8
> +  |.if FPU
>     |  stfdx f0, BASE, RC
> +  |.else
> +  |  add CARG3, BASE, RC
> +  |  stw CARG1, 0(CARG3)
> +  |  stw CARG2, 4(CARG3)
> +  |.endif
>     |    addi RC, RC, 8
>     |   bne <1
>     |2:
> @@ -2587,10 +2969,12 @@ static void build_subroutines(BuildCtx *ctx)
>     |//-----------------------------------------------------------------------
>     |
>     |.macro savex_, a, b, c, d
> +  |.if FPU
>     |  stfd f..a, 16+a*8(sp)
>     |  stfd f..b, 16+b*8(sp)
>     |  stfd f..c, 16+c*8(sp)
>     |  stfd f..d, 16+d*8(sp)
> +  |.endif
>     |.endmacro
>     |
>     |->vm_exit_handler:
> @@ -2662,16 +3046,16 @@ static void build_subroutines(BuildCtx *ctx)
>     |  lwz KBASE, PC2PROTO(k)(TMP1)
>     |  // Setup type comparison constants.
>     |  li TISNUM, LJ_TISNUM
> -  |  lus TMP3, 0x59c0			// TOBIT = 2^52 + 2^51 (float).
> -  |  stw TMP3, TMPD
> +  |  .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |  .FPU stw TMP3, TMPD
>     |  li ZERO, 0
> -  |  ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |  lfs TOBIT, TMPD
> -  |  stw TMP3, TMPD
> -  |  lus TMP0, 0x4338			// Hiword of 2^52 + 2^51 (double)
> +  |  .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |  .FPU lfs TOBIT, TMPD
> +  |  .FPU stw TMP3, TMPD
> +  |  .FPU lus TMP0, 0x4338			// Hiword of 2^52 + 2^51 (double)
>     |    li TISNIL, LJ_TNIL
> -  |  stw TMP0, TONUM_HI
> -  |  lfs TONUM, TMPD
> +  |  .FPU stw TMP0, TONUM_HI
> +  |  .FPU lfs TONUM, TMPD
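
For readers who do not have the TOBIT trick in mind: the FPU paths above add the constant 2^52 + 2^51 to a number and then read the low word of the stored double. A minimal C sketch of that idea follows. It assumes IEEE-754 doubles and the default round-to-nearest mode; the name tobit_sketch is made up for illustration and is not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not code from the patch. */
static int32_t tobit_sketch(double x)
{
  /* Adding 2^52 + 2^51 moves the integer part of x into the low
   * mantissa bits; volatile keeps the sum in double precision. */
  volatile double biased = x + 6755399441055744.0;  /* 2^52 + 2^51 */
  double tmp = biased;
  uint64_t bits;
  memcpy(&bits, &tmp, sizeof bits);
  /* The low 32 bits now hold the rounded value modulo 2^32. */
  return (int32_t)(uint32_t)bits;
}
```

Without an FPU there is nothing to add the constant to, which is presumably why the constant setup is guarded by .FPU in this hunk.
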
>     |  // Modified copy of ins_next which handles function header dispatch, too.
>     |  lwz INS, 0(PC)
>     |   addi PC, PC, 4
> @@ -2716,7 +3100,35 @@ static void build_subroutines(BuildCtx *ctx)
>     |//-- Math helper functions ----------------------------------------------
>     |//-----------------------------------------------------------------------
>     |
> -  |// NYI: Use internal implementations of floor, ceil, trunc.
> +  |// NYI: Use internal implementations of floor, ceil, trunc, sfcmp.
> +  |
> +  |.macro sfi2d, AHI, ALO
> +  |.if not FPU
> +  |  mr. AHI, ALO
> +  |  bclr 12, 2				// Handle zero first.
> +  |  srawi TMP0, ALO, 31
> +  |  xor TMP1, ALO, TMP0
> +  |  sub TMP1, TMP1, TMP0		// Absolute value in TMP1.
> +  |  cntlzw AHI, TMP1
> +  |  andix. TMP0, TMP0, 0x800		// Mask sign bit.
> +  |  slw TMP1, TMP1, AHI		// Align mantissa left with leading 1.
> +  |  subfic AHI, AHI, 0x3ff+31-1	// Exponent -1 in AHI.
> +  |  slwi ALO, TMP1, 21
> +  |  or AHI, AHI, TMP0			// Sign | Exponent.
> +  |  srwi TMP1, TMP1, 11
> +  |  slwi AHI, AHI, 20			// Align left.
> +  |  add AHI, AHI, TMP1			// Add mantissa, increment exponent.
> +  |  blr
> +  |.endif
> +  |.endmacro
> +  |
> +  |// Input: CARG2. Output: CARG1, CARG2. Temporaries: TMP0, TMP1.
> +  |->vm_sfi2d_1:
> +  |  sfi2d CARG1, CARG2
> +  |
> +  |// Input: CARG4. Output: CARG3, CARG4. Temporaries: TMP0, TMP1.
> +  |->vm_sfi2d_2:
> +  |  sfi2d CARG3, CARG4
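
As a reading aid for the sfi2d macro above, here is a rough C sketch of the same integer-to-double construction. It assumes IEEE-754 binary64 and mirrors the macro step by step; the names clz32 and sfi2d_sketch are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Count leading zeros of a nonzero 32-bit value (stands in for cntlzw). */
static int clz32(uint32_t x)
{
  int n = 0;
  assert(x != 0);
  while (!(x & 0x80000000u)) { x <<= 1; n++; }
  return n;
}

/* Hypothetical helper: build the double for a 32-bit integer by hand. */
static double sfi2d_sketch(int32_t i)
{
  uint32_t hi = 0, lo = 0;
  if (i != 0) {  /* Zero is handled first, as in the macro. */
    uint32_t sign = i < 0 ? 0x800u : 0u;  /* Sign bit above the exponent. */
    uint32_t mag = i < 0 ? 0u - (uint32_t)i : (uint32_t)i;  /* Absolute value. */
    int lz = clz32(mag);
    uint32_t exp;
    mag <<= lz;                               /* Leading 1 now in bit 31. */
    exp = (uint32_t)(0x3ff + 31 - 1 - lz);    /* Biased exponent minus 1. */
    lo = mag << 21;                           /* Low 11 mantissa bits. */
    hi = ((exp | sign) << 20) + (mag >> 11);  /* Leading 1 bumps the exponent. */
  }
  uint64_t bits = ((uint64_t)hi << 32) | lo;
  double d;
  memcpy(&d, &bits, sizeof d);
  return d;
}
```

The final add is what lets the explicit leading 1 of the mantissa carry into the exponent field, so the exponent is deliberately computed one too low beforehand.
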
>     |
>     |->vm_modi:
>     |  divwo. TMP0, CARG1, CARG2
> @@ -2784,21 +3196,21 @@ static void build_subroutines(BuildCtx *ctx)
>     |   addi DISPATCH, r12, GG_G2DISP
>     |  stw r11, CTSTATE->cb.slot
>     |  stw r3, CTSTATE->cb.gpr[0]
> -  |   stfd f1, CTSTATE->cb.fpr[0]
> +  |   .FPU stfd f1, CTSTATE->cb.fpr[0]
>     |  stw r4, CTSTATE->cb.gpr[1]
> -  |   stfd f2, CTSTATE->cb.fpr[1]
> +  |   .FPU stfd f2, CTSTATE->cb.fpr[1]
>     |  stw r5, CTSTATE->cb.gpr[2]
> -  |   stfd f3, CTSTATE->cb.fpr[2]
> +  |   .FPU stfd f3, CTSTATE->cb.fpr[2]
>     |  stw r6, CTSTATE->cb.gpr[3]
> -  |   stfd f4, CTSTATE->cb.fpr[3]
> +  |   .FPU stfd f4, CTSTATE->cb.fpr[3]
>     |  stw r7, CTSTATE->cb.gpr[4]
> -  |   stfd f5, CTSTATE->cb.fpr[4]
> +  |   .FPU stfd f5, CTSTATE->cb.fpr[4]
>     |  stw r8, CTSTATE->cb.gpr[5]
> -  |   stfd f6, CTSTATE->cb.fpr[5]
> +  |   .FPU stfd f6, CTSTATE->cb.fpr[5]
>     |  stw r9, CTSTATE->cb.gpr[6]
> -  |   stfd f7, CTSTATE->cb.fpr[6]
> +  |   .FPU stfd f7, CTSTATE->cb.fpr[6]
>     |  stw r10, CTSTATE->cb.gpr[7]
> -  |   stfd f8, CTSTATE->cb.fpr[7]
> +  |   .FPU stfd f8, CTSTATE->cb.fpr[7]
>     |  addi TMP0, sp, CFRAME_SPACE+8
>     |  stw TMP0, CTSTATE->cb.stack
>     |   mr CARG1, CTSTATE
> @@ -2809,21 +3221,21 @@ static void build_subroutines(BuildCtx *ctx)
>     |  lp BASE, L:CRET1->base
>     |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
>     |  lp RC, L:CRET1->top
> -  |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
> +  |     .FPU lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
>     |     li ZERO, 0
>     |   mr L, CRET1
> -  |     stw TMP3, TMPD
> -  |     lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
> +  |     .FPU stw TMP3, TMPD
> +  |     .FPU lus TMP0, 0x4338		// Hiword of 2^52 + 2^51 (double)
>     |  lwz LFUNC:RB, FRAME_FUNC(BASE)
> -  |     ori TMP3, TMP3, 0x0004		// TONUM = 2^52 + 2^51 + 2^31 (float).
> -  |     stw TMP0, TONUM_HI
> +  |     .FPU ori TMP3, TMP3, 0x0004	// TONUM = 2^52 + 2^51 + 2^31 (float).
> +  |     .FPU stw TMP0, TONUM_HI
>     |     li TISNIL, LJ_TNIL
>     |    li_vmstate INTERP
> -  |     lfs TOBIT, TMPD
> -  |     stw TMP3, TMPD
> +  |     .FPU lfs TOBIT, TMPD
> +  |     .FPU stw TMP3, TMPD
>     |  sub RC, RC, BASE
>     |    st_vmstate
> -  |     lfs TONUM, TMPD
> +  |     .FPU lfs TONUM, TMPD
>     |  ins_callt
>     |.endif
>     |
> @@ -2837,7 +3249,7 @@ static void build_subroutines(BuildCtx *ctx)
>     |  mr CARG2, RA
>     |  bl extern lj_ccallback_leave	// (CTState *cts, TValue *o)
>     |  lwz CRET1, CTSTATE->cb.gpr[0]
> -  |  lfd FARG1, CTSTATE->cb.fpr[0]
> +  |  .FPU lfd FARG1, CTSTATE->cb.fpr[0]
>     |  lwz CRET2, CTSTATE->cb.gpr[1]
>     |  b ->vm_leave_unw
>     |.endif
> @@ -2871,14 +3283,14 @@ static void build_subroutines(BuildCtx *ctx)
>     |  bge <1
>     |2:
>     |  bney cr1, >3
> -  |  lfd f1, CCSTATE->fpr[0]
> -  |  lfd f2, CCSTATE->fpr[1]
> -  |  lfd f3, CCSTATE->fpr[2]
> -  |  lfd f4, CCSTATE->fpr[3]
> -  |  lfd f5, CCSTATE->fpr[4]
> -  |  lfd f6, CCSTATE->fpr[5]
> -  |  lfd f7, CCSTATE->fpr[6]
> -  |  lfd f8, CCSTATE->fpr[7]
> +  |  .FPU lfd f1, CCSTATE->fpr[0]
> +  |  .FPU lfd f2, CCSTATE->fpr[1]
> +  |  .FPU lfd f3, CCSTATE->fpr[2]
> +  |  .FPU lfd f4, CCSTATE->fpr[3]
> +  |  .FPU lfd f5, CCSTATE->fpr[4]
> +  |  .FPU lfd f6, CCSTATE->fpr[5]
> +  |  .FPU lfd f7, CCSTATE->fpr[6]
> +  |  .FPU lfd f8, CCSTATE->fpr[7]
>     |3:
>     |   lp TMP0, CCSTATE->func
>     |  lwz CARG2, CCSTATE->gpr[1]
> @@ -2895,7 +3307,7 @@ static void build_subroutines(BuildCtx *ctx)
>     |  lwz TMP2, -4(r14)
>     |   lwz TMP0, 4(r14)
>     |  stw CARG1, CCSTATE:TMP1->gpr[0]
> -  |  stfd FARG1, CCSTATE:TMP1->fpr[0]
> +  |  .FPU stfd FARG1, CCSTATE:TMP1->fpr[0]
>     |  stw CARG2, CCSTATE:TMP1->gpr[1]
>     |   mtlr TMP0
>     |  stw CARG3, CCSTATE:TMP1->gpr[2]
> @@ -2924,19 +3336,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
>       |  // RA = src1*8, RD = src2*8, JMP with RD = target
>       |.if DUALNUM
> -    |  lwzux TMP0, RA, BASE
> +    |  lwzux CARG1, RA, BASE
>       |    addi PC, PC, 4
>       |   lwz CARG2, 4(RA)
> -    |  lwzux TMP1, RD, BASE
> +    |  lwzux CARG3, RD, BASE
>       |    lwz TMP2, -4(PC)
> -    |  checknum cr0, TMP0
> -    |   lwz CARG3, 4(RD)
> +    |  checknum cr0, CARG1
> +    |   lwz CARG4, 4(RD)
>       |    decode_RD4 TMP2, TMP2
> -    |  checknum cr1, TMP1
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |  checknum cr1, CARG3
> +    |    addis SAVE0, TMP2, -(BCBIAS_J*4 >> 16)
>       |  bne cr0, >7
>       |  bne cr1, >8
> -    |   cmpw CARG2, CARG3
> +    |   cmpw CARG2, CARG4
>       if (op == BC_ISLT) {
>         |  bge >2
>       } else if (op == BC_ISGE) {
> @@ -2947,28 +3359,41 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>         |  ble >2
>       }
>       |1:
> -    |  add PC, PC, TMP2
> +    |  add PC, PC, SAVE0
>       |2:
>       |  ins_next
>       |
>       |7:  // RA is not an integer.
>       |  bgt cr0, ->vmeta_comp
>       |  // RA is a number.
> -    |   lfd f0, 0(RA)
> +    |   .FPU lfd f0, 0(RA)
>       |  bgt cr1, ->vmeta_comp
>       |  blt cr1, >4
>       |  // RA is a number, RD is an integer.
> -    |  tonum_i f1, CARG3
> +    |.if FPU
> +    |  tonum_i f1, CARG4
> +    |.else
> +    |  bl ->vm_sfi2d_2
> +    |.endif
>       |  b >5
>       |
>       |8: // RA is an integer, RD is not an integer.
>       |  bgt cr1, ->vmeta_comp
>       |  // RA is an integer, RD is a number.
> +    |.if FPU
>       |  tonum_i f0, CARG2
> +    |.else
> +    |  bl ->vm_sfi2d_1
> +    |.endif
>       |4:
> -    |  lfd f1, 0(RD)
> +    |  .FPU lfd f1, 0(RD)
>       |5:
> +    |.if FPU
>       |  fcmpu cr0, f0, f1
> +    |.else
> +    |  blex __ledf2
> +    |  cmpwi CRET1, 0
> +    |.endif
>       if (op == BC_ISLT) {
>         |  bge <2
>       } else if (op == BC_ISGE) {
> @@ -3016,42 +3441,42 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       vk = op == BC_ISEQV;
>       |  // RA = src1*8, RD = src2*8, JMP with RD = target
>       |.if DUALNUM
> -    |  lwzux TMP0, RA, BASE
> +    |  lwzux CARG1, RA, BASE
>       |    addi PC, PC, 4
>       |   lwz CARG2, 4(RA)
> -    |  lwzux TMP1, RD, BASE
> -    |  checknum cr0, TMP0
> -    |    lwz TMP2, -4(PC)
> -    |  checknum cr1, TMP1
> -    |    decode_RD4 TMP2, TMP2
> -    |   lwz CARG3, 4(RD)
> +    |  lwzux CARG3, RD, BASE
> +    |  checknum cr0, CARG1
> +    |    lwz SAVE0, -4(PC)
> +    |  checknum cr1, CARG3
> +    |    decode_RD4 SAVE0, SAVE0
> +    |   lwz CARG4, 4(RD)
>       |  cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>       if (vk) {
>         |  ble cr7, ->BC_ISEQN_Z
>       } else {
>         |  ble cr7, ->BC_ISNEN_Z
>       }
>       |.else
> -    |  lwzux TMP0, RA, BASE
> -    |   lwz TMP2, 0(PC)
> +    |  lwzux CARG1, RA, BASE
> +    |   lwz SAVE0, 0(PC)
>       |    lfd f0, 0(RA)
>       |   addi PC, PC, 4
> -    |  lwzux TMP1, RD, BASE
> -    |  checknum cr0, TMP0
> -    |   decode_RD4 TMP2, TMP2
> +    |  lwzux CARG3, RD, BASE
> +    |  checknum cr0, CARG1
> +    |   decode_RD4 SAVE0, SAVE0
>       |    lfd f1, 0(RD)
> -    |  checknum cr1, TMP1
> -    |   addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |  checknum cr1, CARG3
> +    |   addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>       |  bge cr0, >5
>       |  bge cr1, >5
>       |  fcmpu cr0, f0, f1
>       if (vk) {
>         |  bne >1
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>       } else {
>         |  beq >1
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>       }
>       |1:
>       |  ins_next
> @@ -3059,36 +3484,36 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |5:  // Either or both types are not numbers.
>       |.if not DUALNUM
>       |    lwz CARG2, 4(RA)
> -    |    lwz CARG3, 4(RD)
> +    |    lwz CARG4, 4(RD)
>       |.endif
>       |.if FFI
> -    |  cmpwi cr7, TMP0, LJ_TCDATA
> -    |  cmpwi cr5, TMP1, LJ_TCDATA
> +    |  cmpwi cr7, CARG1, LJ_TCDATA
> +    |  cmpwi cr5, CARG3, LJ_TCDATA
>       |.endif
> -    |   not TMP3, TMP0
> -    |  cmplw TMP0, TMP1
> -    |   cmplwi cr1, TMP3, ~LJ_TISPRI		// Primitive?
> +    |   not TMP2, CARG1
> +    |  cmplw CARG1, CARG3
> +    |   cmplwi cr1, TMP2, ~LJ_TISPRI		// Primitive?
>       |.if FFI
>       |  cror 4*cr7+eq, 4*cr7+eq, 4*cr5+eq
>       |.endif
> -    |   cmplwi cr6, TMP3, ~LJ_TISTABUD		// Table or userdata?
> +    |   cmplwi cr6, TMP2, ~LJ_TISTABUD		// Table or userdata?
>       |.if FFI
>       |  beq cr7, ->vmeta_equal_cd
>       |.endif
> -    |    cmplw cr5, CARG2, CARG3
> +    |    cmplw cr5, CARG2, CARG4
>       |  crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt	// 2: Same type and primitive.
>       |  crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq	// 1: Same tv or different type.
>       |  crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq	// 0: Same type and same tv.
> -    |   mr SAVE0, PC
> +    |   mr SAVE1, PC
>       |  cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt	// 0 or 2.
>       |  cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt	// 1 or 2.
>       if (vk) {
>         |  bne cr0, >6
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>         |6:
>       } else {
>         |  beq cr0, >6
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>         |6:
>       }
>       |.if DUALNUM
> @@ -3103,6 +3528,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |
>       |  // Different tables or userdatas. Need to check __eq metamethod.
>       |  // Field metatable must be at same offset for GCtab and GCudata!
> +    |   mr CARG3, CARG4
>       |  lwz TAB:TMP2, TAB:CARG2->metatable
>       |   li CARG4, 1-vk			// ne = 0 or 1.
>       |  cmplwi TAB:TMP2, 0
> @@ -3110,7 +3536,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  lbz TMP2, TAB:TMP2->nomm
>       |  andix. TMP2, TMP2, 1<<MM_eq
>       |  bne <1				// Or 'no __eq' flag set?
> -    |  mr PC, SAVE0			// Restore old PC.
> +    |  mr PC, SAVE1			// Restore old PC.
>       |  b ->vmeta_equal			// Handle __eq metamethod.
>       break;
>   
> @@ -3151,16 +3577,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       vk = op == BC_ISEQN;
>       |  // RA = src*8, RD = num_const*8, JMP with RD = target
>       |.if DUALNUM
> -    |  lwzux TMP0, RA, BASE
> +    |  lwzux CARG1, RA, BASE
>       |    addi PC, PC, 4
>       |   lwz CARG2, 4(RA)
> -    |  lwzux TMP1, RD, KBASE
> -    |  checknum cr0, TMP0
> -    |    lwz TMP2, -4(PC)
> -    |  checknum cr1, TMP1
> -    |    decode_RD4 TMP2, TMP2
> -    |   lwz CARG3, 4(RD)
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |  lwzux CARG3, RD, KBASE
> +    |  checknum cr0, CARG1
> +    |    lwz SAVE0, -4(PC)
> +    |  checknum cr1, CARG3
> +    |    decode_RD4 SAVE0, SAVE0
> +    |   lwz CARG4, 4(RD)
> +    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>       if (vk) {
>         |->BC_ISEQN_Z:
>       } else {
> @@ -3168,7 +3594,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       }
>       |  bne cr0, >7
>       |  bne cr1, >8
> -    |   cmpw CARG2, CARG3
> +    |   cmpw CARG2, CARG4
>       |4:
>       |.else
>       if (vk) {
> @@ -3176,20 +3602,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       } else {
>         |->BC_ISNEN_Z:  // Dummy label.
>       }
> -    |  lwzx TMP0, BASE, RA
> +    |  lwzx CARG1, BASE, RA
>       |    addi PC, PC, 4
>       |   lfdx f0, BASE, RA
> -    |    lwz TMP2, -4(PC)
> +    |    lwz SAVE0, -4(PC)
>       |  lfdx f1, KBASE, RD
> -    |    decode_RD4 TMP2, TMP2
> -    |  checknum TMP0
> -    |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> +    |    decode_RD4 SAVE0, SAVE0
> +    |  checknum CARG1
> +    |    addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
>       |  bge >3
>       |  fcmpu cr0, f0, f1
>       |.endif
>       if (vk) {
>         |  bne >1
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>         |1:
>         |.if not FFI
>         |3:
> @@ -3200,13 +3626,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>         |.if not FFI
>         |3:
>         |.endif
> -      |  add PC, PC, TMP2
> +      |  add PC, PC, SAVE0
>         |2:
>       }
>       |  ins_next
>       |.if FFI
>       |3:
> -    |  cmpwi TMP0, LJ_TCDATA
> +    |  cmpwi CARG1, LJ_TCDATA
>       |  beq ->vmeta_equal_cd
>       |  b <1
>       |.endif
> @@ -3214,18 +3640,31 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |7:  // RA is not an integer.
>       |  bge cr0, <3
>       |  // RA is a number.
> -    |   lfd f0, 0(RA)
> +    |   .FPU lfd f0, 0(RA)
>       |  blt cr1, >1
>       |  // RA is a number, RD is an integer.
> -    |  tonum_i f1, CARG3
> +    |.if FPU
> +    |  tonum_i f1, CARG4
> +    |.else
> +    |  bl ->vm_sfi2d_2
> +    |.endif
>       |  b >2
>       |
>       |8: // RA is an integer, RD is a number.
> +    |.if FPU
>       |  tonum_i f0, CARG2
> +    |.else
> +    |  bl ->vm_sfi2d_1
> +    |.endif
>       |1:
> -    |  lfd f1, 0(RD)
> +    |  .FPU lfd f1, 0(RD)
>       |2:
> +    |.if FPU
>       |  fcmpu cr0, f0, f1
> +    |.else
> +    |  blex __ledf2
> +    |  cmpwi CRET1, 0
> +    |.endif
>       |  b <4
>       |.endif
>       break;
> @@ -3280,7 +3719,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>         |  add PC, PC, TMP2
>       } else {
>         |  li TMP1, LJ_TFALSE
> +      |.if FPU
>         |   lfdx f0, BASE, RD
> +      |.else
> +      |   lwzux CARG1, RD, BASE
> +      |   lwz CARG2, 4(RD)
> +      |.endif
>         |  cmplw TMP0, TMP1
>         if (op == BC_ISTC) {
>   	|  bge >1
> @@ -3289,7 +3733,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>         }
>         |  addis PC, PC, -(BCBIAS_J*4 >> 16)
>         |  decode_RD4 TMP2, INS
> +      |.if FPU
>         |   stfdx f0, BASE, RA
> +      |.else
> +      |   stwux CARG1, RA, BASE
> +      |   stw CARG2, 4(RA)
> +      |.endif
>         |  add PC, PC, TMP2
>         |1:
>       }
> @@ -3324,8 +3773,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     case BC_MOV:
>       |  // RA = dst*8, RD = src*8
>       |  ins_next1
> +    |.if FPU
>       |  lfdx f0, BASE, RD
>       |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwzux TMP0, RD, BASE
> +    |  lwz TMP1, 4(RD)
> +    |  stwux TMP0, RA, BASE
> +    |  stw TMP1, 4(RA)
> +    |.endif
>       |  ins_next2
>       break;
>     case BC_NOT:
> @@ -3427,44 +3883,65 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
>       ||switch (vk) {
>       ||case 0:
> -    |   lwzx TMP1, BASE, RB
> +    |   lwzx CARG1, BASE, RB
>       |   .if DUALNUM
> -    |     lwzx TMP2, KBASE, RC
> +    |     lwzx CARG3, KBASE, RC
>       |   .endif
> +    |   .if FPU
>       |    lfdx f14, BASE, RB
>       |    lfdx f15, KBASE, RC
> +    |   .else
> +    |    add TMP1, BASE, RB
> +    |    add TMP2, KBASE, RC
> +    |    lwz CARG2, 4(TMP1)
> +    |    lwz CARG4, 4(TMP2)
> +    |   .endif
>       |   .if DUALNUM
> -    |     checknum cr0, TMP1
> -    |     checknum cr1, TMP2
> +    |     checknum cr0, CARG1
> +    |     checknum cr1, CARG3
>       |     crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>       |     bge ->vmeta_arith_vn
>       |   .else
> -    |     checknum TMP1; bge ->vmeta_arith_vn
> +    |     checknum CARG1; bge ->vmeta_arith_vn
>       |   .endif
>       ||  break;
>       ||case 1:
> -    |   lwzx TMP1, BASE, RB
> +    |   lwzx CARG1, BASE, RB
>       |   .if DUALNUM
> -    |     lwzx TMP2, KBASE, RC
> +    |     lwzx CARG3, KBASE, RC
>       |   .endif
> +    |   .if FPU
>       |    lfdx f15, BASE, RB
>       |    lfdx f14, KBASE, RC
> +    |   .else
> +    |    add TMP1, BASE, RB
> +    |    add TMP2, KBASE, RC
> +    |    lwz CARG2, 4(TMP1)
> +    |    lwz CARG4, 4(TMP2)
> +    |   .endif
>       |   .if DUALNUM
> -    |     checknum cr0, TMP1
> -    |     checknum cr1, TMP2
> +    |     checknum cr0, CARG1
> +    |     checknum cr1, CARG3
>       |     crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>       |     bge ->vmeta_arith_nv
>       |   .else
> -    |     checknum TMP1; bge ->vmeta_arith_nv
> +    |     checknum CARG1; bge ->vmeta_arith_nv
>       |   .endif
>       ||  break;
>       ||default:
> -    |   lwzx TMP1, BASE, RB
> -    |   lwzx TMP2, BASE, RC
> +    |   lwzx CARG1, BASE, RB
> +    |   lwzx CARG3, BASE, RC
> +    |   .if FPU
>       |    lfdx f14, BASE, RB
>       |    lfdx f15, BASE, RC
> -    |   checknum cr0, TMP1
> -    |   checknum cr1, TMP2
> +    |   .else
> +    |    add TMP1, BASE, RB
> +    |    add TMP2, BASE, RC
> +    |    lwz CARG2, 4(TMP1)
> +    |    lwz CARG4, 4(TMP2)
> +    |   .endif
> +    |   checknum cr0, CARG1
> +    |   checknum cr1, CARG3
>       |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>       |   bge ->vmeta_arith_vv
>       ||  break;
> @@ -3498,48 +3975,78 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  fsub a, b, a			// b - floor(b/c)*c
>       |.endmacro
>       |
> +    |.macro sfpmod
> +    |->BC_MODVN_Z:
> +    |  stw CARG1, SFSAVE_1
> +    |  stw CARG2, SFSAVE_2
> +    |  mr SAVE0, CARG3
> +    |  mr SAVE1, CARG4
> +    |  blex __divdf3
> +    |  blex floor
> +    |  mr CARG3, SAVE0
> +    |  mr CARG4, SAVE1
> +    |  blex __muldf3
> +    |  mr CARG3, CRET1
> +    |  mr CARG4, CRET2
> +    |  lwz CARG1, SFSAVE_1
> +    |  lwz CARG2, SFSAVE_2
> +    |  blex __subdf3
> +    |.endmacro
> +    |
>       |.macro ins_arithfp, fpins
>       |  ins_arithpre
>       |.if "fpins" == "fpmod_"
>       |  b ->BC_MODVN_Z			// Avoid 3 copies. It's slow anyway.
> -    |.else
> +    |.elif FPU
>       |  fpins f0, f14, f15
>       |  ins_next1
>       |  stfdx f0, BASE, RA
>       |  ins_next2
> +    |.else
> +    |  blex __divdf3			// Only soft-float div uses this macro.
> +    |  ins_next1
> +    |  stwux CRET1, RA, BASE
> +    |  stw CRET2, 4(RA)
> +    |  ins_next2
>       |.endif
>       |.endmacro
>       |
> -    |.macro ins_arithdn, intins, fpins
> +    |.macro ins_arithdn, intins, fpins, fpcall
>       |  // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8
>       ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
>       ||switch (vk) {
>       ||case 0:
> -    |   lwzux TMP1, RB, BASE
> -    |   lwzux TMP2, RC, KBASE
> -    |    lwz CARG1, 4(RB)
> -    |   checknum cr0, TMP1
> -    |    lwz CARG2, 4(RC)
> +    |   lwzux CARG1, RB, BASE
> +    |   lwzux CARG3, RC, KBASE
> +    |    lwz CARG2, 4(RB)
> +    |   checknum cr0, CARG1
> +    |    lwz CARG4, 4(RC)
> +    |   checknum cr1, CARG3
>       ||  break;
>       ||case 1:
> -    |   lwzux TMP1, RB, BASE
> -    |   lwzux TMP2, RC, KBASE
> -    |    lwz CARG2, 4(RB)
> -    |   checknum cr0, TMP1
> -    |    lwz CARG1, 4(RC)
> +    |   lwzux CARG3, RB, BASE
> +    |   lwzux CARG1, RC, KBASE
> +    |    lwz CARG4, 4(RB)
> +    |   checknum cr0, CARG3
> +    |    lwz CARG2, 4(RC)
> +    |   checknum cr1, CARG1
>       ||  break;
>       ||default:
> -    |   lwzux TMP1, RB, BASE
> -    |   lwzux TMP2, RC, BASE
> -    |    lwz CARG1, 4(RB)
> -    |   checknum cr0, TMP1
> -    |    lwz CARG2, 4(RC)
> +    |   lwzux CARG1, RB, BASE
> +    |   lwzux CARG3, RC, BASE
> +    |    lwz CARG2, 4(RB)
> +    |   checknum cr0, CARG1
> +    |    lwz CARG4, 4(RC)
> +    |   checknum cr1, CARG3
>       ||  break;
>       ||}
> -    |  checknum cr1, TMP2
>       |  bne >5
>       |  bne cr1, >5
> -    |  intins CARG1, CARG1, CARG2
> +    |.if "intins" == "intmod"
> +    |  mr CARG1, CARG2
> +    |  mr CARG2, CARG4
> +    |.endif
> +    |  intins CARG1, CARG2, CARG4
>       |  bso >4
>       |1:
>       |  ins_next1
> @@ -3551,29 +4058,40 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  checkov TMP0, <1			// Ignore unrelated overflow.
>       |  ins_arithfallback b
>       |5:  // FP variant.
> +    |.if FPU
>       ||if (vk == 1) {
>       |  lfd f15, 0(RB)
> -    |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>       |  lfd f14, 0(RC)
>       ||} else {
>       |  lfd f14, 0(RB)
> -    |   crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>       |  lfd f15, 0(RC)
>       ||}
> +    |.endif
> +    |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>       |   ins_arithfallback bge
>       |.if "fpins" == "fpmod_"
>       |  b ->BC_MODVN_Z			// Avoid 3 copies. It's slow anyway.
>       |.else
> +    |.if FPU
>       |  fpins f0, f14, f15
> -    |  ins_next1
>       |  stfdx f0, BASE, RA
> +    |.else
> +    |.if "fpcall" == "sfpmod"
> +    |  sfpmod
> +    |.else
> +    |  blex fpcall
> +    |.endif
> +    |  stwux CRET1, RA, BASE
> +    |  stw CRET2, 4(RA)
> +    |.endif
> +    |  ins_next1
>       |  b <2
>       |.endif
>       |.endmacro
>       |
> -    |.macro ins_arith, intins, fpins
> +    |.macro ins_arith, intins, fpins, fpcall
>       |.if DUALNUM
> -    |  ins_arithdn intins, fpins
> +    |  ins_arithdn intins, fpins, fpcall
>       |.else
>       |  ins_arithfp fpins
>       |.endif
> @@ -3588,9 +4106,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  addo. TMP0, TMP0, TMP3
>       |  add y, a, b
>       |.endmacro
> -    |  ins_arith addo32., fadd
> +    |  ins_arith addo32., fadd, __adddf3
>       |.else
> -    |  ins_arith addo., fadd
> +    |  ins_arith addo., fadd, __adddf3
>       |.endif
>       break;
>     case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
> @@ -3602,36 +4120,48 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  subo. TMP0, TMP0, TMP3
>       |  sub y, a, b
>       |.endmacro
> -    |  ins_arith subo32., fsub
> +    |  ins_arith subo32., fsub, __subdf3
>       |.else
> -    |  ins_arith subo., fsub
> +    |  ins_arith subo., fsub, __subdf3
>       |.endif
>       break;
>     case BC_MULVN: case BC_MULNV: case BC_MULVV:
> -    |  ins_arith mullwo., fmul
> +    |  ins_arith mullwo., fmul, __muldf3
>       break;
>     case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
>       |  ins_arithfp fdiv
>       break;
>     case BC_MODVN:
> -    |  ins_arith intmod, fpmod
> +    |  ins_arith intmod, fpmod, sfpmod
>       break;
>     case BC_MODNV: case BC_MODVV:
> -    |  ins_arith intmod, fpmod_
> +    |  ins_arith intmod, fpmod_, sfpmod
>       break;
>     case BC_POW:
>       |  // NYI: (partial) integer arithmetic.
> -    |  lwzx TMP1, BASE, RB
> +    |  lwzx CARG1, BASE, RB
> +    |  lwzx CARG3, BASE, RC
> +    |.if FPU
>       |   lfdx FARG1, BASE, RB
> -    |  lwzx TMP2, BASE, RC
>       |   lfdx FARG2, BASE, RC
> -    |  checknum cr0, TMP1
> -    |  checknum cr1, TMP2
> +    |.else
> +    |   add TMP1, BASE, RB
> +    |   add TMP2, BASE, RC
> +    |   lwz CARG2, 4(TMP1)
> +    |   lwz CARG4, 4(TMP2)
> +    |.endif
> +    |  checknum cr0, CARG1
> +    |  checknum cr1, CARG3
>       |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>       |  bge ->vmeta_arith_vv
>       |  blex pow
>       |  ins_next1
> +    |.if FPU
>       |  stfdx FARG1, BASE, RA
> +    |.else
> +    |  stwux CARG1, RA, BASE
> +    |  stw CARG2, 4(RA)
> +    |.endif
>       |  ins_next2
>       break;
>   
> @@ -3651,8 +4181,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   lp BASE, L->base
>       |  bne ->vmeta_binop
>       |  ins_next1
> +    |.if FPU
>       |  lfdx f0, BASE, SAVE0		// Copy result from RB to RA.
>       |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwzux TMP0, SAVE0, BASE
> +    |  lwz TMP1, 4(SAVE0)
> +    |  stwux TMP0, RA, BASE
> +    |  stw TMP1, 4(RA)
> +    |.endif
>       |  ins_next2
>       break;
>   
> @@ -3715,8 +4252,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     case BC_KNUM:
>       |  // RA = dst*8, RD = num_const*8
>       |  ins_next1
> +    |.if FPU
>       |  lfdx f0, KBASE, RD
>       |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwzux TMP0, RD, KBASE
> +    |  lwz TMP1, 4(RD)
> +    |  stwux TMP0, RA, BASE
> +    |  stw TMP1, 4(RA)
> +    |.endif
>       |  ins_next2
>       break;
>     case BC_KPRI:
> @@ -3749,8 +4293,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  lwzx UPVAL:RB, LFUNC:RB, RD
>       |  ins_next1
>       |  lwz TMP1, UPVAL:RB->v
> +    |.if FPU
>       |  lfd f0, 0(TMP1)
>       |  stfdx f0, BASE, RA
> +    |.else
> +    |  lwz TMP2, 0(TMP1)
> +    |  lwz TMP3, 4(TMP1)
> +    |  stwux TMP2, RA, BASE
> +    |  stw TMP3, 4(RA)
> +    |.endif
>       |  ins_next2
>       break;
>     case BC_USETV:
> @@ -3758,14 +4309,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  lwz LFUNC:RB, FRAME_FUNC(BASE)
>       |    srwi RA, RA, 1
>       |    addi RA, RA, offsetof(GCfuncL, uvptr)
> +    |.if FPU
>       |   lfdux f0, RD, BASE
> +    |.else
> +    |   lwzux CARG1, RD, BASE
> +    |   lwz CARG3, 4(RD)
> +    |.endif
>       |  lwzx UPVAL:RB, LFUNC:RB, RA
>       |  lbz TMP3, UPVAL:RB->marked
>       |   lwz CARG2, UPVAL:RB->v
>       |  andix. TMP3, TMP3, LJ_GC_BLACK	// isblack(uv)
>       |    lbz TMP0, UPVAL:RB->closed
>       |   lwz TMP2, 0(RD)
> +    |.if FPU
>       |   stfd f0, 0(CARG2)
> +    |.else
> +    |   stw CARG1, 0(CARG2)
> +    |   stw CARG3, 4(CARG2)
> +    |.endif
>       |    cmplwi cr1, TMP0, 0
>       |   lwz TMP1, 4(RD)
>       |  cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -3821,11 +4382,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  lwz LFUNC:RB, FRAME_FUNC(BASE)
>       |   srwi RA, RA, 1
>       |   addi RA, RA, offsetof(GCfuncL, uvptr)
> +    |.if FPU
>       |    lfdx f0, KBASE, RD
> +    |.else
> +    |    lwzux TMP2, RD, KBASE
> +    |    lwz TMP3, 4(RD)
> +    |.endif
>       |  lwzx UPVAL:RB, LFUNC:RB, RA
>       |  ins_next1
>       |  lwz TMP1, UPVAL:RB->v
> +    |.if FPU
>       |  stfd f0, 0(TMP1)
> +    |.else
> +    |  stw TMP2, 0(TMP1)
> +    |  stw TMP3, 4(TMP1)
> +    |.endif
>       |  ins_next2
>       break;
>     case BC_USETP:
> @@ -3973,11 +4544,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |.endif
>       |  ble ->vmeta_tgetv		// Integer key and in array part?
>       |  lwzx TMP0, TMP1, TMP2
> +    |.if FPU
>       |   lfdx f14, TMP1, TMP2
> +    |.else
> +    |   lwzux SAVE0, TMP1, TMP2
> +    |   lwz SAVE1, 4(TMP1)
> +    |.endif
>       |  checknil TMP0; beq >2
>       |1:
>       |  ins_next1
> +    |.if FPU
>       |   stfdx f14, BASE, RA
> +    |.else
> +    |   stwux SAVE0, RA, BASE
> +    |   stw SAVE1, 4(RA)
> +    |.endif
>       |  ins_next2
>       |
>       |2:  // Check for __index if table value is nil.
> @@ -4053,12 +4634,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  lwz TMP1, TAB:RB->asize
>       |   lwz TMP2, TAB:RB->array
>       |  cmplw TMP0, TMP1; bge ->vmeta_tgetb
> +    |.if FPU
>       |  lwzx TMP1, TMP2, RC
>       |   lfdx f0, TMP2, RC
> +    |.else
> +    |  lwzux TMP1, TMP2, RC
> +    |   lwz TMP3, 4(TMP2)
> +    |.endif
>       |  checknil TMP1; beq >5
>       |1:
>       |  ins_next1
> +    |.if FPU
>       |   stfdx f0, BASE, RA
> +    |.else
> +    |   stwux TMP1, RA, BASE
> +    |   stw TMP3, 4(RA)
> +    |.endif
>       |  ins_next2
>       |
>       |5:  // Check for __index if table value is nil.
> @@ -4088,10 +4679,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  cmplw TMP0, CARG2
>       |   slwi TMP2, CARG2, 3
>       |  ble ->vmeta_tgetr		// In array part?
> +    |.if FPU
>       |   lfdx f14, TMP1, TMP2
> +    |.else
> +    |   lwzux SAVE0, TMP2, TMP1
> +    |   lwz SAVE1, 4(TMP2)
> +    |.endif
>       |->BC_TGETR_Z:
>       |  ins_next1
> +    |.if FPU
>       |   stfdx f14, BASE, RA
> +    |.else
> +    |   stwux SAVE0, RA, BASE
> +    |   stw SAVE1, 4(RA)
> +    |.endif
>       |  ins_next2
>       break;
>   
> @@ -4132,11 +4733,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  ble ->vmeta_tsetv		// Integer key and in array part?
>       |   lwzx TMP2, TMP1, TMP0
>       |  lbz TMP3, TAB:RB->marked
> +    |.if FPU
>       |    lfdx f14, BASE, RA
> +    |.else
> +    |    add SAVE1, BASE, RA
> +    |    lwz SAVE0, 0(SAVE1)
> +    |    lwz SAVE1, 4(SAVE1)
> +    |.endif
>       |   checknil TMP2; beq >3
>       |1:
>       |  andix. TMP2, TMP3, LJ_GC_BLACK	// isblack(table)
> +    |.if FPU
>       |    stfdx f14, TMP1, TMP0
> +    |.else
> +    |    stwux SAVE0, TMP1, TMP0
> +    |    stw SAVE1, 4(TMP1)
> +    |.endif
>       |  bne >7
>       |2:
>       |  ins_next
> @@ -4177,7 +4789,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  lwz NODE:TMP2, TAB:RB->node
>       |    stb ZERO, TAB:RB->nomm		// Clear metamethod cache.
>       |  and TMP1, TMP1, TMP0		// idx = str->hash & tab->hmask
> +    |.if FPU
>       |    lfdx f14, BASE, RA
> +    |.else
> +    |    add CARG2, BASE, RA
> +    |    lwz SAVE0, 0(CARG2)
> +    |    lwz SAVE1, 4(CARG2)
> +    |.endif
>       |  slwi TMP0, TMP1, 5
>       |  slwi TMP1, TMP1, 3
>       |  sub TMP1, TMP0, TMP1
> @@ -4193,7 +4811,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |    checknil CARG2; beq >4		// Key found, but nil value?
>       |2:
>       |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
> +    |.if FPU
>       |    stfd f14, NODE:TMP2->val
> +    |.else
> +    |    stw SAVE0, NODE:TMP2->val.u32.hi
> +    |    stw SAVE1, NODE:TMP2->val.u32.lo
> +    |.endif
>       |  bne >7
>       |3:
>       |  ins_next
> @@ -4232,7 +4855,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  bl extern lj_tab_newkey		// (lua_State *L, GCtab *t, TValue *k)
>       |  // Returns TValue *.
>       |  lp BASE, L->base
> +    |.if FPU
>       |  stfd f14, 0(CRET1)
> +    |.else
> +    |  stw SAVE0, 0(CRET1)
> +    |  stw SAVE1, 4(CRET1)
> +    |.endif
>       |  b <3				// No 2nd write barrier needed.
>       |
>       |7:  // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4249,13 +4877,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   lwz TMP2, TAB:RB->array
>       |    lbz TMP3, TAB:RB->marked
>       |  cmplw TMP0, TMP1
> +    |.if FPU
>       |   lfdx f14, BASE, RA
> +    |.else
> +    |   add CARG2, BASE, RA
> +    |   lwz SAVE0, 0(CARG2)
> +    |   lwz SAVE1, 4(CARG2)
> +    |.endif
>       |  bge ->vmeta_tsetb
>       |  lwzx TMP1, TMP2, RC
>       |  checknil TMP1; beq >5
>       |1:
>       |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
> +    |.if FPU
>       |   stfdx f14, TMP2, RC
> +    |.else
> +    |   stwux SAVE0, RC, TMP2
> +    |   stw SAVE1, 4(RC)
> +    |.endif
>       |  bne >7
>       |2:
>       |  ins_next
> @@ -4295,10 +4934,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |2:
>       |  cmplw TMP0, CARG3
>       |   slwi TMP2, CARG3, 3
> +    |.if FPU
>       |   lfdx f14, BASE, RA
> +    |.else
> +    |  lwzux SAVE0, RA, BASE
> +    |  lwz SAVE1, 4(RA)
> +    |.endif
>       |  ble ->vmeta_tsetr		// In array part?
>       |  ins_next1
> +    |.if FPU
>       |   stfdx f14, TMP1, TMP2
> +    |.else
> +    |   stwux SAVE0, TMP1, TMP2
> +    |   stw SAVE1, 4(TMP1)
> +    |.endif
>       |  ins_next2
>       |
>       |7:  // Possible table write barrier for the value. Skip valiswhite check.
> @@ -4328,10 +4977,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   add TMP1, TMP1, TMP0
>       |    andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
>       |3:  // Copy result slots to table.
> +    |.if FPU
>       |   lfd f0, 0(RA)
> +    |.else
> +    |   lwz SAVE0, 0(RA)
> +    |   lwz SAVE1, 4(RA)
> +    |.endif
>       |  addi RA, RA, 8
>       |  cmpw cr1, RA, TMP2
> +    |.if FPU
>       |   stfd f0, 0(TMP1)
> +    |.else
> +    |   stw SAVE0, 0(TMP1)
> +    |   stw SAVE1, 4(TMP1)
> +    |.endif
>       |    addi TMP1, TMP1, 8
>       |  blt cr1, <3
>       |  bne >7
> @@ -4398,9 +5057,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |    beq cr1, >3
>       |2:
>       |  addi TMP3, TMP2, 8
> +    |.if FPU
>       |   lfdx f0, RA, TMP2
> +    |.else
> +    |   add CARG3, RA, TMP2
> +    |   lwz CARG1, 0(CARG3)
> +    |   lwz CARG2, 4(CARG3)
> +    |.endif
>       |  cmplw cr1, TMP3, NARGS8:RC
> +    |.if FPU
>       |   stfdx f0, BASE, TMP2
> +    |.else
> +    |   stwux CARG1, TMP2, BASE
> +    |   stw CARG2, 4(TMP2)
> +    |.endif
>       |  mr TMP2, TMP3
>       |  bne cr1, <2
>       |3:
> @@ -4433,14 +5103,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  add BASE, BASE, RA
>       |  lwz TMP1, -24(BASE)
>       |   lwz LFUNC:RB, -20(BASE)
> +    |.if FPU
>       |    lfd f1, -8(BASE)
>       |    lfd f0, -16(BASE)
> +    |.else
> +    |    lwz CARG1, -8(BASE)
> +    |    lwz CARG2, -4(BASE)
> +    |    lwz CARG3, -16(BASE)
> +    |    lwz CARG4, -12(BASE)
> +    |.endif
>       |  stw TMP1, 0(BASE)		// Copy callable.
>       |   stw LFUNC:RB, 4(BASE)
>       |  checkfunc TMP1
> -    |    stfd f1, 16(BASE)		// Copy control var.
>       |     li NARGS8:RC, 16		// Iterators get 2 arguments.
> +    |.if FPU
> +    |    stfd f1, 16(BASE)		// Copy control var.
>       |    stfdu f0, 8(BASE)		// Copy state.
> +    |.else
> +    |    stw CARG1, 16(BASE)		// Copy control var.
> +    |    stw CARG2, 20(BASE)
> +    |    stwu CARG3, 8(BASE)		// Copy state.
> +    |    stw CARG4, 4(BASE)
> +    |.endif
>       |  bne ->vmeta_call
>       |  ins_call
>       break;
> @@ -4461,7 +5145,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   slwi TMP3, RC, 3
>       |  bge >5				// Index points after array part?
>       |  lwzx TMP2, TMP1, TMP3
> +    |.if FPU
>       |   lfdx f0, TMP1, TMP3
> +    |.else
> +    |   lwzux CARG1, TMP3, TMP1
> +    |   lwz CARG2, 4(TMP3)
> +    |.endif
>       |  checknil TMP2
>       |     lwz INS, -4(PC)
>       |  beq >4
> @@ -4473,7 +5162,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |.endif
>       |    addi RC, RC, 1
>       |     addis TMP3, PC, -(BCBIAS_J*4 >> 16)
> +    |.if FPU
>       |  stfd f0, 8(RA)
> +    |.else
> +    |  stw CARG1, 8(RA)
> +    |  stw CARG2, 12(RA)
> +    |.endif
>       |     decode_RD4 TMP1, INS
>       |    stw RC, -4(RA)			// Update control var.
>       |     add PC, TMP1, TMP3
> @@ -4498,17 +5192,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   slwi RB, RC, 3
>       |   sub TMP3, TMP3, RB
>       |  lwzx RB, TMP2, TMP3
> +    |.if FPU
>       |  lfdx f0, TMP2, TMP3
> +    |.else
> +    |  add CARG3, TMP2, TMP3
> +    |  lwz CARG1, 0(CARG3)
> +    |  lwz CARG2, 4(CARG3)
> +    |.endif
>       |   add NODE:TMP3, TMP2, TMP3
>       |  checknil RB
>       |     lwz INS, -4(PC)
>       |  beq >7
> +    |.if FPU
>       |   lfd f1, NODE:TMP3->key
> +    |.else
> +    |   lwz CARG3, NODE:TMP3->key.u32.hi
> +    |   lwz CARG4, NODE:TMP3->key.u32.lo
> +    |.endif
>       |     addis TMP2, PC, -(BCBIAS_J*4 >> 16)
> +    |.if FPU
>       |  stfd f0, 8(RA)
> +    |.else
> +    |  stw CARG1, 8(RA)
> +    |  stw CARG2, 12(RA)
> +    |.endif
>       |    add RC, RC, TMP0
>       |     decode_RD4 TMP1, INS
> +    |.if FPU
>       |   stfd f1, 0(RA)
> +    |.else
> +    |   stw CARG3, 0(RA)
> +    |   stw CARG4, 4(RA)
> +    |.endif
>       |    addi RC, RC, 1
>       |     add PC, TMP1, TMP2
>       |    stw RC, -4(RA)			// Update control var.
> @@ -4574,9 +5289,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   subi TMP2, TMP2, 16
>       |   ble >2				// No vararg slots?
>       |1:  // Copy vararg slots to destination slots.
> +    |.if FPU
>       |  lfd f0, 0(RC)
> +    |.else
> +    |  lwz CARG1, 0(RC)
> +    |  lwz CARG2, 4(RC)
> +    |.endif
>       |   addi RC, RC, 8
> +    |.if FPU
>       |  stfd f0, 0(RA)
> +    |.else
> +    |  stw CARG1, 0(RA)
> +    |  stw CARG2, 4(RA)
> +    |.endif
>       |  cmplw RA, TMP2
>       |   cmplw cr1, RC, TMP3
>       |  bge >3				// All destination slots filled?
> @@ -4599,9 +5324,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   addi MULTRES, TMP1, 8
>       |  bgt >7
>       |6:
> +    |.if FPU
>       |  lfd f0, 0(RC)
> +    |.else
> +    |  lwz CARG1, 0(RC)
> +    |  lwz CARG2, 4(RC)
> +    |.endif
>       |   addi RC, RC, 8
> +    |.if FPU
>       |  stfd f0, 0(RA)
> +    |.else
> +    |  stw CARG1, 0(RA)
> +    |  stw CARG2, 4(RA)
> +    |.endif
>       |  cmplw RC, TMP3
>       |   addi RA, RA, 8
>       |  blt <6				// More vararg slots?
> @@ -4652,14 +5387,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   li TMP1, 0
>       |2:
>       |  addi TMP3, TMP1, 8
> +    |.if FPU
>       |   lfdx f0, RA, TMP1
> +    |.else
> +    |   add CARG3, RA, TMP1
> +    |   lwz CARG1, 0(CARG3)
> +    |   lwz CARG2, 4(CARG3)
> +    |.endif
>       |  cmpw TMP3, RC
> +    |.if FPU
>       |   stfdx f0, TMP2, TMP1
> +    |.else
> +    |   add CARG3, TMP2, TMP1
> +    |   stw CARG1, 0(CARG3)
> +    |   stw CARG2, 4(CARG3)
> +    |.endif
>       |  beq >3
>       |  addi TMP1, TMP3, 8
> +    |.if FPU
>       |   lfdx f1, RA, TMP3
> +    |.else
> +    |   add CARG3, RA, TMP3
> +    |   lwz CARG1, 0(CARG3)
> +    |   lwz CARG2, 4(CARG3)
> +    |.endif
>       |  cmpw TMP1, RC
> +    |.if FPU
>       |   stfdx f1, TMP2, TMP3
> +    |.else
> +    |   add CARG3, TMP2, TMP3
> +    |   stw CARG1, 0(CARG3)
> +    |   stw CARG2, 4(CARG3)
> +    |.endif
>       |  bne <2
>       |3:
>       |5:
> @@ -4701,8 +5460,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |   subi TMP2, BASE, 8
>       |  decode_RB8 RB, INS
>       if (op == BC_RET1) {
> +      |.if FPU
>         |  lfd f0, 0(RA)
>         |  stfd f0, 0(TMP2)
> +      |.else
> +      |  lwz CARG1, 0(RA)
> +      |  lwz CARG2, 4(RA)
> +      |  stw CARG1, 0(TMP2)
> +      |  stw CARG2, 4(TMP2)
> +      |.endif
>       }
>       |5:
>       |  cmplw RB, RD
> @@ -4763,11 +5529,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>         |4:
>         |  stw CARG1, FORL_IDX*8+4(RA)
>       } else {
> -      |  lwz TMP3, FORL_STEP*8(RA)
> +      |  lwz SAVE0, FORL_STEP*8(RA)
>         |   lwz CARG3, FORL_STEP*8+4(RA)
>         |  lwz TMP2, FORL_STOP*8(RA)
>         |   lwz CARG2, FORL_STOP*8+4(RA)
> -      |  cmplw cr7, TMP3, TISNUM
> +      |  cmplw cr7, SAVE0, TISNUM
>         |  cmplw cr1, TMP2, TISNUM
>         |  crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
>         |  crand 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -4810,41 +5576,80 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       if (vk) {
>         |.if DUALNUM
>         |9:  // FP loop.
> +      |.if FPU
>         |  lfd f1, FORL_IDX*8(RA)
>         |.else
> +      |  lwz CARG1, FORL_IDX*8(RA)
> +      |  lwz CARG2, FORL_IDX*8+4(RA)
> +      |.endif
> +      |.else
>         |  lfdux f1, RA, BASE
>         |.endif
> +      |.if FPU
>         |  lfd f3, FORL_STEP*8(RA)
>         |  lfd f2, FORL_STOP*8(RA)
> -      |   lwz TMP3, FORL_STEP*8(RA)
>         |  fadd f1, f1, f3
>         |  stfd f1, FORL_IDX*8(RA)
> +      |.else
> +      |  lwz CARG3, FORL_STEP*8(RA)
> +      |  lwz CARG4, FORL_STEP*8+4(RA)
> +      |  mr SAVE1, RD
> +      |  blex __adddf3
> +      |  mr RD, SAVE1
> +      |  stw CRET1, FORL_IDX*8(RA)
> +      |  stw CRET2, FORL_IDX*8+4(RA)
> +      |  lwz CARG3, FORL_STOP*8(RA)
> +      |  lwz CARG4, FORL_STOP*8+4(RA)
> +      |.endif
> +      |   lwz SAVE0, FORL_STEP*8(RA)
>       } else {
>         |.if DUALNUM
>         |9:  // FP loop.
>         |.else
>         |  lwzux TMP1, RA, BASE
> -      |  lwz TMP3, FORL_STEP*8(RA)
> +      |  lwz SAVE0, FORL_STEP*8(RA)
>         |  lwz TMP2, FORL_STOP*8(RA)
>         |  cmplw cr0, TMP1, TISNUM
> -      |  cmplw cr7, TMP3, TISNUM
> +      |  cmplw cr7, SAVE0, TISNUM
>         |  cmplw cr1, TMP2, TISNUM
>         |.endif
> +      |.if FPU
>         |   lfd f1, FORL_IDX*8(RA)
> +      |.else
> +      |   lwz CARG1, FORL_IDX*8(RA)
> +      |   lwz CARG2, FORL_IDX*8+4(RA)
> +      |.endif
>         |  crand 4*cr0+lt, 4*cr0+lt, 4*cr7+lt
>         |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> +      |.if FPU
>         |   lfd f2, FORL_STOP*8(RA)
> +      |.else
> +      |   lwz CARG3, FORL_STOP*8(RA)
> +      |   lwz CARG4, FORL_STOP*8+4(RA)
> +      |.endif
>         |  bge ->vmeta_for
>       }
> -    |  cmpwi cr6, TMP3, 0
> +    |  cmpwi cr6, SAVE0, 0
>       if (op != BC_JFORL) {
>         |  srwi RD, RD, 1
>       }
> +    |.if FPU
>       |   stfd f1, FORL_EXT*8(RA)
> +    |.else
> +    |   stw CARG1, FORL_EXT*8(RA)
> +    |   stw CARG2, FORL_EXT*8+4(RA)
> +    |.endif
>       if (op != BC_JFORL) {
>         |  add RD, PC, RD
>       }
> +    |.if FPU
>       |  fcmpu cr0, f1, f2
> +    |.else
> +    |  mr SAVE1, RD
> +    |  blex __ledf2
> +    |  cmpwi CRET1, 0
> +    |  mr RD, SAVE1
> +    |.endif
>       if (op == BC_JFORI) {
>         |  addis PC, RD, -(BCBIAS_J*4 >> 16)
>       }
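
A note on the soft-float hunks above: wherever the FPU build moves a
slot with one lfd/stfd pair, the soft-float path moves the same 8 bytes
as two 32-bit words (lwz/lwz, then stw/stw). Below is a minimal Python
sketch of that idea. It is not part of the patch: the helper names
(split_double, join_words, copy_slot_softfloat) and the flat bytearray
standing in for the Lua stack are hypothetical, and big-endian word
order is assumed, as on 32-bit PPC.

```python
import struct


def split_double(value):
    """Return the (hi, lo) 32-bit words of an IEEE-754 double (big-endian)."""
    hi, lo = struct.unpack(">II", struct.pack(">d", value))
    return hi, lo


def join_words(hi, lo):
    """Rebuild a double from its (hi, lo) 32-bit words (big-endian)."""
    return struct.unpack(">d", struct.pack(">II", hi, lo))[0]


def copy_slot_softfloat(memory, src, dst):
    """Copy one 8-byte slot as two word moves instead of one FP move."""
    hi = memory[src:src + 4]        # lwz hi, 0(src)
    lo = memory[src + 4:src + 8]    # lwz lo, 4(src)
    memory[dst:dst + 4] = hi        # stw hi, 0(dst)
    memory[dst + 4:dst + 8] = lo    # stw lo, 4(dst)
```

The copy never interprets the bits, so it works for any slot contents,
not only numbers. Arithmetic and comparisons are a different matter: in
the patch they go through library helpers such as __adddf3 and __ledf2.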


* Re: [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions
  2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
                   ` (20 preceding siblings ...)
  2023-08-17 14:38 ` Sergey Bronnikov via Tarantool-patches
@ 2023-08-31 15:17 ` Igor Munkin via Tarantool-patches
  21 siblings, 0 replies; 97+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2023-08-31 15:17 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

I've checked the patchset into all long-term branches in
tarantool/luajit and bumped a new version in master, release/2.11 and
release/2.10.

On 09.08.23, Sergey Kaplun wrote:
> This patch-set contains all commits that are necessary to avoid conflicts
> during the backporting of 8ae5170c "Improve assertions" [1], which are
> caused by lagging behind the upstream.
> 
> This patch-set:
> - includes several new ports (see patches 4-6, 8, 19) -- only
>   description is added for such patches.
> - fixes some MIPS misbehaviour (1, 3, 17) -- includes tests (*), except
>   the first one, since it depends on memory mapping.
> - fixes non-Linux/macOS build (7)
> - backports patches that were excluded or partially stripped before (10,
>   14, 15)
> - includes refactoring (2, 9, 18)
> - fixes general bugs (16)
> - fixes gcc 7.1 -Wimplicit-fallthrough warnings (11 - 13)
> 
> Note that only patches 3, 16, 17 add new tests.
> Other patches just provide a description, and patch 13 adds
> -Wimplicit-fallthrough for GCC (>= 7.1) builds.
> 
> Patches are backported in arbitrary order, since they are unrelated
> to each other.
> 
> (*) To run tests for mips64 in qemu:
> 
> Compile with the following command:
> 
> | make -j -f Makefile.original HOST_CC="gcc " \
> |         CROSS=mips64el-unknown-linux-gnu- \
> |         CCDEBUG=" -g -ggdb3" CFLAGS=" -O0" \
> |         XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT "
> 
> Be aware that mips64el-unknown-linux-gnu-gcc should provide the n64 ABI by
> default.
> Side note: installed on Gentoo with the following command:
> | crossdev -t mips64el --abis n64 --ex-gdb
> 
> And run the corresponding test (-g 7776 to use GDB server on 7776
> port):
> | LUA_PATH="src/?.lua;test/tarantool-tests/?.lua;test/tarantool-tests/?/init.lua;;" \
> | LD_LIBRARY_PATH="/usr/lib/gcc/mips64el-unknown-linux-gnu/13/" \
> |   qemu-mips64el  -g 7776 -L /usr/mips64el-unknown-linux-gnu/ \
> |     src/luajit -jdump=ta test/tarantool-tests/fix-mips64-spare-side-exit-patching.test.lua
> 
> If you want to connect to the running test from multiarch-gdb:
> | mips64el-unknown-linux-gnu-gdb src/luajit
> | (gdb) target remote 0.0.0.0:7776
> | ...
> | 0x000000400297fd00 in __start () from /usr/mips64el-unknown-linux-gnu/lib64/ld.so.1
> | (gdb) c
> 
> [1]: https://github.com/LuaJIT/LuaJIT/commit/8ae5170c
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-8825-mips-ppc-refactoring
> PR: https://github.com/tarantool/tarantool/pull/8969
> Related Issues:
> * https://github.com/tarantool/tarantool/issues/8825
> * https://github.com/LuaJIT/LuaJIT/pull/362
> * https://github.com/LuaJIT/LuaJIT/issues/812
> 
> Mike Pall (17):
>   MIPS: Use precise search for exit jump patching.
>   MIPS: Fix handling of spare long-range jump slots.
>   MIPS64: Add soft-float support to JIT compiler backend.
>   PPC: Add soft-float support to interpreter.
>   PPC: Add soft-float support to JIT compiler backend.
>   Windows: Add UWP support, part 1.
>   FFI: Eliminate hardcoded string hashes.
>   Cleanup math function compilation and fix inconsistencies.
>   Fix GCC 7 -Wimplicit-fallthrough warnings.
>   DynASM: Fix warning.
>   ARM: Fix GCC 7 -Wimplicit-fallthrough warnings.
>   Fix debug.getinfo() argument check.
>   Fix LJ_MAX_JSLOTS assertion in rec_check_slots().
>   Prevent integer overflow while parsing long strings.
>   MIPS64: Fix register allocation in assembly of HREF.
>   DynASM/MIPS: Fix shadowed variable.
>   MIPS: Add MIPS64 R6 port.
> 
> Sergey Kaplun (2):
>   test: introduce mcode generator for tests
>   build: fix non-Linux/macOS builds
> 

<snipped>

> 
> -- 
> 2.41.0
> 

-- 
Best regards,
IM

Thread overview: 97+ messages
2023-08-09 15:35 [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 01/19] MIPS: Use precise search for exit jump patching Sergey Kaplun via Tarantool-patches
2023-08-15  9:36   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 12:40     ` Sergey Kaplun via Tarantool-patches
2023-08-16 13:25   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests Sergey Kaplun via Tarantool-patches
2023-08-15 10:14   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 12:55     ` Sergey Kaplun via Tarantool-patches
2023-08-16 13:06       ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:32   ` Sergey Bronnikov via Tarantool-patches
2023-08-16 15:20     ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:08       ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 03/19] MIPS: Fix handling of spare long-range jump slots Sergey Kaplun via Tarantool-patches
2023-08-15 11:13   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:05     ` Sergey Kaplun via Tarantool-patches
2023-08-16 15:02   ` Sergey Bronnikov via Tarantool-patches
2023-08-16 15:32     ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:08       ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 04/19] MIPS64: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
2023-08-15 11:27   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:10     ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:07   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter Sergey Kaplun via Tarantool-patches
2023-08-15 11:40   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:13     ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:53   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 06/19] PPC: Add soft-float support to JIT compiler backend Sergey Kaplun via Tarantool-patches
2023-08-15 11:46   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:21     ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:33   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 07/19] build: fix non-Linux/macOS builds Sergey Kaplun via Tarantool-patches
2023-08-15 11:58   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:40     ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:31   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 08/19] Windows: Add UWP support, part 1 Sergey Kaplun via Tarantool-patches
2023-08-15 12:09   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:50     ` Sergey Kaplun via Tarantool-patches
2023-08-16 16:40   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 09/19] FFI: Eliminate hardcoded string hashes Sergey Kaplun via Tarantool-patches
2023-08-15 13:07   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:52     ` Sergey Kaplun via Tarantool-patches
2023-08-16 17:04     ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:35 ` [Tarantool-patches] [PATCH luajit 10/19] Cleanup math function compilation and fix inconsistencies Sergey Kaplun via Tarantool-patches
2023-08-11  8:06   ` Sergey Kaplun via Tarantool-patches
2023-08-15 13:10   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 17:15   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 11/19] Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
2023-08-15 13:17   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 13:59     ` Sergey Kaplun via Tarantool-patches
2023-08-17  7:37   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 12/19] DynASM: Fix warning Sergey Kaplun via Tarantool-patches
2023-08-15 13:21   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:01     ` Sergey Kaplun via Tarantool-patches
2023-08-17  7:39   ` Sergey Bronnikov via Tarantool-patches
2023-08-17  7:51     ` Sergey Bronnikov via Tarantool-patches
2023-08-17  7:58       ` Sergey Kaplun via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 13/19] ARM: Fix GCC 7 -Wimplicit-fallthrough warnings Sergey Kaplun via Tarantool-patches
2023-08-15 13:25   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:08     ` Sergey Kaplun via Tarantool-patches
2023-08-17  7:44   ` Sergey Bronnikov via Tarantool-patches
2023-08-17  8:01     ` Sergey Kaplun via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 14/19] Fix debug.getinfo() argument check Sergey Kaplun via Tarantool-patches
2023-08-15 13:35   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:20     ` Sergey Kaplun via Tarantool-patches
2023-08-16 20:13       ` Maxim Kokryashkin via Tarantool-patches
2023-08-17  8:29   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 15/19] Fix LJ_MAX_JSLOTS assertion in rec_check_slots() Sergey Kaplun via Tarantool-patches
2023-08-15 14:07   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:22     ` Sergey Kaplun via Tarantool-patches
2023-08-17  8:57   ` Sergey Bronnikov via Tarantool-patches
2023-08-17  8:57     ` Sergey Kaplun via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 16/19] Prevent integer overflow while parsing long strings Sergey Kaplun via Tarantool-patches
2023-08-15 14:38   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 14:52     ` Sergey Kaplun via Tarantool-patches
2023-08-17 10:53   ` Sergey Bronnikov via Tarantool-patches
2023-08-17 13:57     ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:28       ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 17/19] MIPS64: Fix register allocation in assembly of HREF Sergey Kaplun via Tarantool-patches
2023-08-16  9:01   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 15:17     ` Sergey Kaplun via Tarantool-patches
2023-08-16 20:14       ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 11:06   ` Sergey Bronnikov via Tarantool-patches
2023-08-17 13:50     ` Sergey Kaplun via Tarantool-patches
2023-08-17 14:30       ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 18/19] DynASM/MIPS: Fix shadowed variable Sergey Kaplun via Tarantool-patches
2023-08-16  9:03   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 15:22     ` Sergey Kaplun via Tarantool-patches
2023-08-17 12:01   ` Sergey Bronnikov via Tarantool-patches
2023-08-09 15:36 ` [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port Sergey Kaplun via Tarantool-patches
2023-08-16  9:16   ` Maxim Kokryashkin via Tarantool-patches
2023-08-16 15:24     ` Sergey Kaplun via Tarantool-patches
2023-08-17 13:03   ` Sergey Bronnikov via Tarantool-patches
2023-08-17 13:59     ` Sergey Kaplun via Tarantool-patches
2023-08-16 15:35 ` [Tarantool-patches] [PATCH luajit 00/19] Prerequisites for improve assertions Sergey Kaplun via Tarantool-patches
2023-08-17 14:06   ` Maxim Kokryashkin via Tarantool-patches
2023-08-17 14:38 ` Sergey Bronnikov via Tarantool-patches
2023-08-31 15:17 ` Igor Munkin via Tarantool-patches
