Tarantool development patches archive
 help / color / mirror / Atom feed
From: Sergey Bronnikov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tarantool-patches@dev.tarantool.org,
	Sergey Kaplun <skaplun@tarantool.org>,
	Maxim Kokryashkin <m.kokryashkin@tarantool.org>
Subject: [Tarantool-patches] [PATCH luajit 3/4][v2] OSX/iOS/ARM64: Fix generation of Mach-O object files.
Date: Thu, 11 Apr 2024 16:22:06 +0300	[thread overview]
Message-ID: <7bdffd2650a785877e03584e6d532e855d09de8a.1712841312.git.sergeyb@tarantool.org> (raw)
In-Reply-To: <cover.1712841312.git.sergeyb@tarantool.org>

From: Mike Pall <mike>

Thanks to Carlo Cabrera.

(cherry picked from commit 3065c910ad6027031aabe2dfd3c26a3d0f014b4f)

The Mach-O FAT object constructed by LuaJIT had an incorrect format.
The problem is reproduced when the target hardware platform has
AVX512F and LuaJIT is compiled with enabled AVX512F instructions.

The problem arises because LuaJIT FFI code for Mach-O file
generation in `bcsave.lua` relies on undefined behavior for
conversions to `uint32_t`. AVX512F has the `vcvttsd2usi`
instruction, which converts `double`/`float` to `uint32_t/uint64_t`.
Earlier architectures (SSE2, AVX2) are sorely lacking such
an instruction, as they only support signed conversions. Unsigned
conversions are done with a signed convert and range shifting -
the exact algorithm depends on the compiler. A side-effect of these
workarounds is that negative `double`/`float` often inadvertently
convert 'as expected', even though this is invoking undefined
behavior. Whereas `vcvttsd2usi` always returns 0x80000000 or
0x8000000000000000 for out-of-range inputs.

The patch fixes the problem, however, the real issue remains unfixed.

Sergey Bronnikov:
* added the description, the test for the problem and
  flavor with AVX512 to `exotic-builds-testing` workflow

Part of tarantool/tarantool#9595
---
 src/jit/bcsave.lua                            |   6 +-
 test/LuaJIT-tests/CMakeLists.txt              |   9 +
 ...-865-cross-generation-mach-o-file.test.lua | 300 ++++++++++++++++++
 3 files changed, 312 insertions(+), 3 deletions(-)
 create mode 100644 test/tarantool-tests/lj-865-cross-generation-mach-o-file.test.lua

diff --git a/src/jit/bcsave.lua b/src/jit/bcsave.lua
index a287d675..7aec1555 100644
--- a/src/jit/bcsave.lua
+++ b/src/jit/bcsave.lua
@@ -446,18 +446,18 @@ typedef struct {
   uint32_t value;
 } mach_nlist;
 typedef struct {
-  uint32_t strx;
+  int32_t strx;
   uint8_t type, sect;
   uint16_t desc;
   uint64_t value;
 } mach_nlist_64;
 typedef struct
 {
-  uint32_t magic, nfat_arch;
+  int32_t magic, nfat_arch;
 } mach_fat_header;
 typedef struct
 {
-  uint32_t cputype, cpusubtype, offset, size, align;
+  int32_t cputype, cpusubtype, offset, size, align;
 } mach_fat_arch;
 typedef struct {
   struct {
diff --git a/test/LuaJIT-tests/CMakeLists.txt b/test/LuaJIT-tests/CMakeLists.txt
index b8e4dfc4..6d073700 100644
--- a/test/LuaJIT-tests/CMakeLists.txt
+++ b/test/LuaJIT-tests/CMakeLists.txt
@@ -52,6 +52,15 @@ if(LUAJIT_NO_UNWIND)
   set(LUAJIT_TEST_TAGS_EXTRA +internal_unwinder)
 endif()
 
+if(CMAKE_C_FLAGS MATCHES "-march=skylake-avx512")
+  # FIXME: Test <bit64.lua> verifies bitwise operations on numbers.
+  # There is a known issue - bitop doesn't work in LuaJIT built
+  # with the enabled AVX512 instruction set, see
+  # https://github.com/tarantool/tarantool/issues/6787.
+  # Hence, skip this when "skylake-avx512" is passed.
+  set(LUAJIT_TEST_TAGS_EXTRA +avx512)
+endif()
+
 set(TEST_SUITE_NAME "LuaJIT-tests")
 
 # XXX: The call produces both test and target <LuaJIT-tests-deps>
diff --git a/test/tarantool-tests/lj-865-cross-generation-mach-o-file.test.lua b/test/tarantool-tests/lj-865-cross-generation-mach-o-file.test.lua
new file mode 100644
index 00000000..04fb5495
--- /dev/null
+++ b/test/tarantool-tests/lj-865-cross-generation-mach-o-file.test.lua
@@ -0,0 +1,300 @@
+local tap = require('tap')
+local test = tap.test('lj-865-cross-generation-mach-o-file')
+local utils = require('utils')
+
+test:plan(1)
+
+-- The test creates an object file in Mach-O format with LuaJIT
+-- bytecode and checks the validity of the object file fields.
+--
+-- The original problem is reproduced with LuaJIT that built with
+-- enabled AVX512F instructions. The support for AVX512F could be
+-- checked in `/proc/cpuinfo` on Linux and
+-- `sysctl hw.optional.avx512f` on Mac. AVX512F must be
+-- implicitly enabled in a C compiler by passing a CPU codename.
+-- Please take a look at the GCC Online Documentation [1] for
+-- available CPU codenames. Also, see the Wikipedia for CPUs with
+-- AVX-512 support [2].
+-- To detect the CPU codename execute:
+-- `gcc -march=native -Q --help=target | grep march`.
+--
+-- 1. https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
+-- 2. https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512
+--
+-- Manual steps for reproducing are the following:
+--
+-- $ CC=gcc TARGET_CFLAGS='skylake-avx512' cmake -S . -B build
+-- $ cmake --build build --parallel
+-- $ echo > test.lua
+-- $ LUA_PATH="src/?.lua;;" luajit -b -o osx -a arm test.lua test.o
+-- $ file test.o
+-- empty.o: DOS executable (block device driver)
+
+local ffi = require('ffi')
+
+-- LuaJIT can generate so called Universal Binary with Lua
+-- bytecode. The Universal Binary format is a format for
+-- executable files that run natively on hardware platforms with
+-- different hardware architectures. This concept is more
+-- generally known as a fat binary.
+--
+-- The format of the Mach-O is described in the document
+-- "OS X ABI Mach-O File Format Reference", published by Apple
+-- company. The copy of the (now removed) official documentation
+-- can be found here [1]. Yet another source of truth is
+-- XNU headers, see the definition of C-structures in:
+-- [2] (`nlist_64`), [3] (`fat_arch` and `fat_header`).
+--
+-- There are a good visual representation of Universal Binary
+-- in "Mac OS X Internals" book (pages 67-68) [5] and in the [6].
+-- Below is the schematic structure of Universal Binary, which
+-- includes two executables for PowerPC and Intel i386 (omitted):
+--
+--   0x0000000 ---------------------------------------
+--             |
+-- struct      | 0xcafebabe  FAT_MAGIC                 magic
+-- fat_header  | -------------------------------------
+--             | 0x00000003                            nfat_arch
+--             ---------------------------------------
+--             | 0x00000012  CPU_TYPE_POWERPC          cputype
+--             | -------------------------------------
+--             | 0x00000000  CPU_SUBTYPE_POWERPC_ALL   cpusubtype
+-- struct      | -------------------------------------
+-- fat_arch    | 0x00001000  4096 bytes                offset
+--             | -------------------------------------
+--             | 0x00004224  16932 bytes               size
+--             | -------------------------------------
+--             | 0x0000000c  2^12 = 4096 bytes         align
+--             ---------------------------------------
+--             ---------------------------------------
+--             | 0x00000007  CPU_TYPE_I386             cputype
+--             | -------------------------------------
+--             | 0x00000003  CPU_SUBTYPE_I386_ALL      cpusubtype
+-- struct      | -------------------------------------
+-- fat_arch    | 0x00006000  24576 bytes               offset
+--             | -------------------------------------
+--             | 0x0000292c  10540 bytes               size
+--             | -------------------------------------
+--             | 0x0000000c  2^12 = 4096 bytes         align
+--             ---------------------------------------
+--               Unused
+-- 0x00001000  ---------------------------------------
+--             | 0xfeedface  MH_MAGIC                  magic
+--             | ------------------------------------
+--             | 0x00000012  CPU_TYPE_POWERPC          cputype
+--             | ------------------------------------
+-- struct      | 0x00000000  CPU_SUBTYPE_POWERPC_ALL   cpusubtype
+-- mach_header | ------------------------------------
+--             | 0x00000002  MH_EXECUTE                filetype
+--             | ------------------------------------
+--             | 0x0000000b  10 load commands          ncmds
+--             | ------------------------------------
+--             | 0x00000574  1396 bytes                sizeofcmds
+--             | ------------------------------------
+--             | 0x00000085  DYLDLINK TWOLEVEL         flags
+--             --------------------------------------
+--               Load commands
+--             ---------------------------------------
+--               Data
+--             ---------------------------------------
+--
+--               < x86 executable >
+--
+-- 1. https://github.com/aidansteele/osx-abi-macho-file-format-reference
+-- 2. https://github.com/apple-oss-distributions/xnu/blob/xnu-10002.1.13/EXTERNAL_HEADERS/mach-o/nlist.h
+-- 3. https://github.com/apple-oss-distributions/xnu/blob/xnu-10002.1.13/EXTERNAL_HEADERS/mach-o/fat.h
+-- 4. https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code
+-- 5. https://reverseengineering.stackexchange.com/a/6357/46029
+-- 6. http://formats.kaitai.io/mach_o/index.html
+--
+-- Using the same declarations as defined in <src/jit/bcsave.lua>.
+ffi.cdef[[
+typedef struct
+{
+  uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags;
+} mach_header;
+
+typedef struct {
+  uint32_t cmd, cmdsize;
+  char segname[16];
+  uint32_t vmaddr, vmsize, fileoff, filesize;
+  uint32_t maxprot, initprot, nsects, flags;
+} mach_segment_command;
+
+typedef struct {
+  char sectname[16], segname[16];
+  uint32_t addr, size;
+  uint32_t offset, align, reloff, nreloc, flags;
+  uint32_t reserved1, reserved2;
+} mach_section;
+
+typedef struct {
+  uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize;
+} mach_symtab_command;
+
+typedef struct {
+  int32_t strx;
+  uint8_t type, sect;
+  int16_t desc;
+  uint32_t value;
+} mach_nlist;
+
+typedef struct
+{
+  int32_t magic, nfat_arch;
+} mach_fat_header;
+
+typedef struct
+{
+  int32_t cputype, cpusubtype, offset, size, align;
+} mach_fat_arch;
+
+typedef struct {
+  mach_fat_header fat;
+  mach_fat_arch fat_arch[2];
+  struct {
+    mach_header hdr;
+    mach_segment_command seg;
+    mach_section sec;
+    mach_symtab_command sym;
+  } arch[2];
+  mach_nlist sym_entry;
+  uint8_t space[4096];
+} mach_fat_obj;
+]]
+
+local function create_obj_file(name, arch)
+  local mach_o_path = os.tmpname() .. '.o'
+  local lua_path = os.getenv('LUA_PATH')
+  local lua_bin = utils.exec.luacmd(arg):match('%S+')
+  local cmd_fmt = 'LUA_PATH="%s" %s -b -n "%s" -o osx -a %s -e "print()" %s'
+  local cmd = (cmd_fmt):format(lua_path, lua_bin, name, arch, mach_o_path)
+  local ret = os.execute(cmd)
+  assert(ret == 0, 'cannot create an object file')
+  return mach_o_path
+end
+
+-- Parses a buffer in the Mach-O format and returns
+-- the FAT magic number and `nfat_arch`.
+local function read_mach_o(buf)
+  local res = {
+    header = {
+      magic = 0,
+      nfat_arch = 0,
+    },
+    fat_arch = {},
+  }
+
+  -- Mach-O FAT object.
+  local mach_fat_obj_type = ffi.typeof('mach_fat_obj *')
+  local obj = ffi.cast(mach_fat_obj_type, buf)
+
+  -- Mach-O FAT object header.
+  local mach_fat_header = obj.fat
+  -- Mach-O FAT is BE, target arch is LE.
+  local be32 = bit.bswap
+  res.header.magic = be32(mach_fat_header.magic)
+  res.header.nfat_arch = be32(mach_fat_header.nfat_arch)
+
+  -- Mach-O FAT object arches.
+  for i = 0, res.header.nfat_arch - 1 do
+    local fat_arch = obj.fat_arch[i]
+    local arch = {
+      cputype = be32(fat_arch.cputype),
+      cpusubtype = be32(fat_arch.cpusubtype),
+    }
+    table.insert(res.fat_arch, arch)
+  end
+
+  return res
+end
+
+-- Universal Binary can contain executables for more than one
+-- CPU architecture. For simplicity, the test compares the sum of
+-- CPU types and CPU subtypes.
+--
+-- <src/jit/bcsave.lua:bcsave_machobj> has the definitions of the
+-- numbers below. The original XNU source code may be found in
+-- <osfmk/mach/machine.h> [1].
+--
+-- 1. https://opensource.apple.com/source/xnu/xnu-4570.41.2/osfmk/mach/machine.h.auto.html
+--
+local SUM_CPUTYPE = {
+  arm = 7 + 12,
+}
+local SUM_CPUSUBTYPE = {
+  arm = 3 + 9,
+}
+
+-- The function builds Mach-O FAT object file and retrieves
+-- its header fields (magic and nfat_arch) and fields of each arch
+-- (cputype, cpusubtype).
+--
+-- The Mach-O FAT object header can be retrieved with `otool` on
+-- macOS:
+--
+-- $ otool -f empty.o
+-- Fat headers
+-- fat_magic 0xcafebabe
+-- nfat_arch 2
+-- <snipped>
+--
+-- CPU type and subtype can be retrieved with `lipo` on macOS:
+--
+-- $ luajit -b -o osx -a arm empty.lua empty.o
+-- $ lipo -archs empty.o
+-- i386 armv7
+-- $ luajit -b -o osx -a arm64 empty.lua empty.o
+-- $ lipo -archs empty.o
+-- x86_64 arm64
+local function build_and_check_mach_o(subtest, hw_arch)
+  assert(hw_arch == 'arm')
+
+  subtest:plan(4)
+  -- FAT_MAGIC is an integer containing the value 0xCAFEBABE in
+  -- big-endian byte order format. On a big-endian host CPU,
+  -- this can be validated using the constant FAT_MAGIC;
+  -- on a little-endian host CPU, it can be validated using
+  -- the constant FAT_CIGAM.
+  --
+  -- FAT_NARCH is an integer specifying the number of fat_arch
+  -- data structures that follow. This is the number of
+  -- architectures contained in this binary.
+  --
+  -- See the aforementioned "OS X ABI Mach-O File Format
+  -- Reference".
+  local FAT_MAGIC = '0xffffffffcafebabe'
+  local FAT_NARCH = 2
+
+  local MODULE_NAME = 'lango_team'
+
+  local mach_o_obj_path = create_obj_file(MODULE_NAME, hw_arch)
+  local mach_o_buf = utils.tools.read_file(mach_o_obj_path)
+  assert(mach_o_buf ~= nil and #mach_o_buf ~= 0, 'cannot read an object file')
+
+  local mach_o = read_mach_o(mach_o_buf)
+
+  -- Teardown.
+  assert(os.remove(mach_o_obj_path), 'remove an object file')
+
+  local magic_str = string.format('%#x', mach_o.header.magic)
+  subtest:is(magic_str, FAT_MAGIC,
+             'fat_magic is correct in Mach-O')
+  subtest:is(mach_o.header.nfat_arch, FAT_NARCH,
+             'nfat_arch is correct in Mach-O')
+
+  local total_cputype = 0
+  local total_cpusubtype = 0
+  for i = 1, FAT_NARCH do
+    total_cputype = total_cputype + mach_o.fat_arch[i].cputype
+    total_cpusubtype = total_cpusubtype + mach_o.fat_arch[i].cpusubtype
+  end
+  subtest:is(total_cputype, SUM_CPUTYPE[hw_arch],
+             'cputype is correct in Mach-O')
+  subtest:is(total_cpusubtype, SUM_CPUSUBTYPE[hw_arch],
+             'cpusubtype is correct in Mach-O')
+end
+
+test:test('arm', build_and_check_mach_o, 'arm')
+
+test:done(true)
-- 
2.34.1


  parent reply	other threads:[~2024-04-11 15:26 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-11 13:22 [Tarantool-patches] [PATCH luajit 0/4][v2] Mach-O generation fixes Sergey Bronnikov via Tarantool-patches
2024-04-11 13:22 ` [Tarantool-patches] [PATCH luajit 1/4][v2] ci: add a workflow for testing with AVX512 Sergey Bronnikov via Tarantool-patches
2024-04-12 10:27   ` Sergey Kaplun via Tarantool-patches
2024-04-16 11:53     ` Sergey Bronnikov via Tarantool-patches
2024-04-18  8:24       ` Sergey Kaplun via Tarantool-patches
2024-05-05 12:29         ` Maxim Kokryashkin via Tarantool-patches
2024-06-14 13:55           ` Sergey Bronnikov via Tarantool-patches
2024-06-14 15:24         ` Sergey Bronnikov via Tarantool-patches
2024-04-11 13:22 ` [Tarantool-patches] [PATCH luajit 2/4][v2] test: introduce a helper read_file Sergey Bronnikov via Tarantool-patches
2024-04-12 10:47   ` Sergey Kaplun via Tarantool-patches
2024-04-16 12:02     ` Sergey Bronnikov via Tarantool-patches
2024-06-20 12:14     ` Sergey Bronnikov via Tarantool-patches
2024-04-11 13:22 ` Sergey Bronnikov via Tarantool-patches [this message]
2024-04-12 11:19   ` [Tarantool-patches] [PATCH luajit 3/4][v2] OSX/iOS/ARM64: Fix generation of Mach-O object files Sergey Kaplun via Tarantool-patches
2024-04-16 15:29     ` Sergey Bronnikov via Tarantool-patches
2024-06-13 15:50       ` Sergey Bronnikov via Tarantool-patches
2024-04-11 13:22 ` [Tarantool-patches] [PATCH luajit 4/4][v2] OSX/iOS/ARM64: Fix bytecode embedding in Mach-O object file Sergey Bronnikov via Tarantool-patches
2024-04-12 11:27   ` Sergey Kaplun via Tarantool-patches
2024-06-13 15:40     ` Sergey Bronnikov via Tarantool-patches
2024-06-13 15:47       ` Sergey Bronnikov via Tarantool-patches
2024-06-20 10:15 ` [Tarantool-patches] [PATCH luajit 0/4][v2] Mach-O generation fixes Sergey Kaplun via Tarantool-patches
2024-07-09  8:07 ` Sergey Kaplun via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7bdffd2650a785877e03584e6d532e855d09de8a.1712841312.git.sergeyb@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=estetus@gmail.com \
    --cc=m.kokryashkin@tarantool.org \
    --cc=skaplun@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH luajit 3/4][v2] OSX/iOS/ARM64: Fix generation of Mach-O object files.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox