[Tarantool-patches] [PATCH luajit 1/2] OSX/iOS/ARM64: Fix generation of Mach-O object files.
Sergey Kaplun
skaplun at tarantool.org
Mon Apr 8 18:01:25 MSK 2024
Hi, Sergey!
Thanks for the patch!
I'll proceed with the version from the branch.
Please consider my comments below.
Please rebase your branch to the tarantool/master and solve the
conflicts.
On 26.03.24, Sergey Bronnikov wrote:
> Subject: [PATCH 2/3] OSX/iOS/ARM64: Fix generation of Mach-O object files.
>
> Thanks to Carlo Cabrera.
>
> (cherry picked from commit 3065c910ad6027031aabe2dfd3c26a3d0f014b4f)
>
<snipped>
>
> Part of tarantool/tarantool#9595
> ---
> .github/workflows/exotic-builds-testing.yml | 6 +-
> src/jit/bcsave.lua | 6 +-
> test/LuaJIT-tests/CMakeLists.txt | 9 +
> test/LuaJIT-tests/lib/ffi/index | 2 +-
> ...generation_of_mach-o_object_files.test.lua | 343 ++++++++++++++++++
> 5 files changed, 361 insertions(+), 5 deletions(-)
> create mode 100644 test/tarantool-tests/lj-865-fix_generation_of_mach-o_object_files.test.lua
>
> diff --git a/.github/workflows/exotic-builds-testing.yml b/.github/workflows/exotic-builds-testing.yml
> index 859603bd..9cace0bc 100644
> --- a/.github/workflows/exotic-builds-testing.yml
> +++ b/.github/workflows/exotic-builds-testing.yml
> @@ -34,7 +34,7 @@ jobs:
> BUILDTYPE: [Debug, Release]
> ARCH: [ARM64, x86_64]
> GC64: [ON, OFF]
> - FLAVOR: [checkhook, dualnum, gdbjit, nojit, nounwind]
> + FLAVOR: [checkhook, dualnum, gdbjit, nojit, nounwind, avx512]
Please sort entries alphabetically.
> include:
> - BUILDTYPE: Debug
> CMAKEFLAGS: -DCMAKE_BUILD_TYPE=Debug -DLUA_USE_ASSERT=ON -DLUA_USE_APICHECK=ON
> @@ -50,12 +50,16 @@ jobs:
> FLAVORFLAGS: -DLUAJIT_USE_GDBJIT=ON
> - FLAVOR: nounwind
> FLAVORFLAGS: -DLUAJIT_NO_UNWIND=ON
> + - FLAVOR: avx512
> + CMAKEFLAGS: -DCMAKE_C_FLAGS=-march=skylake-avx512 -DCMAKE_C_COMPILER=gcc
Is there any guarantee that the given runner has avx512 support?
If not, our tests will fail due to illegal instruction error.
> exclude:
> - ARCH: ARM64
> GC64: OFF
> # DUALNUM is default for ARM64, no need for additional testing.
> - FLAVOR: dualnum
> ARCH: ARM64
> + - FLAVOR: avx512
> + ARCH: ARM64
Please sort entries alphabetically.
> runs-on: [self-hosted, regular, Linux, '${{ matrix.ARCH }}']
> name: >
> LuaJIT ${{ matrix.FLAVOR }}
> diff --git a/src/jit/bcsave.lua b/src/jit/bcsave.lua
> index a287d675..7aec1555 100644
> --- a/src/jit/bcsave.lua
> +++ b/src/jit/bcsave.lua
> @@ -446,18 +446,18 @@ typedef struct {
> uint32_t value;
> } mach_nlist;
> typedef struct {
> - uint32_t strx;
> + int32_t strx;
> uint8_t type, sect;
> uint16_t desc;
> uint64_t value;
> } mach_nlist_64;
> typedef struct
> {
> - uint32_t magic, nfat_arch;
> + int32_t magic, nfat_arch;
> } mach_fat_header;
> typedef struct
> {
> - uint32_t cputype, cpusubtype, offset, size, align;
> + int32_t cputype, cpusubtype, offset, size, align;
> } mach_fat_arch;
> typedef struct {
> struct {
> diff --git a/test/LuaJIT-tests/CMakeLists.txt b/test/LuaJIT-tests/CMakeLists.txt
> index a0fb5440..29146113 100644
> --- a/test/LuaJIT-tests/CMakeLists.txt
> +++ b/test/LuaJIT-tests/CMakeLists.txt
> @@ -64,6 +64,15 @@ if(LUAJIT_NO_UNWIND)
> set(LUAJIT_TEST_TAGS_EXTRA +internal_unwinder)
> endif()
>
> +if(CMAKE_C_FLAGS MATCHES "-march=skylake-avx512")
> + # Test <bit64.lua> verifies bitwise operations on numbers.
Maybe we should add FIXME: tag here. Fill free to ignore.
> + # There is a known issue - bitop doesn't working in LuaJIT
Typo: s/working/work/
> + # built with enabled AVX512 instruction set,
Typo: s/enabled/the enabled/
> + # see https://github.com/tarantool/tarantool/issues/6787.
> + # Hence, skip this when "skylake-avx512" is passed.
> + set(LUAJIT_TEST_TAGS_EXTRA +avx512)
> +endif()
> +
> add_custom_command(TARGET LuaJIT-tests
> COMMENT "Running LuaJIT-tests"
> COMMAND
> diff --git a/test/LuaJIT-tests/lib/ffi/index b/test/LuaJIT-tests/lib/ffi/index
> index e3a34e30..e7c168c2 100644
> --- a/test/LuaJIT-tests/lib/ffi/index
> +++ b/test/LuaJIT-tests/lib/ffi/index
> @@ -1,4 +1,4 @@
> -bit64.lua +luajit>=2.1
> +bit64.lua +luajit>=2.1 -avx512
Do we need to skip all tests instead of only failed one?
> cdata_var.lua
> copy_fill.lua
> err.lua
> diff --git a/test/tarantool-tests/lj-865-fix_generation_of_mach-o_object_files.test.lua b/test/tarantool-tests/lj-865-fix_generation_of_mach-o_object_files.test.lua
Please, use "-" instead of "_" as a separator, according to other test
names.
Also, the test name is too long. I suggest something like the
following:
| lj-865-cross-generation-mach-o-file.test.lua
Don't forget to update the testname below as well.
> new file mode 100644
> index 00000000..40bd1c74
> --- /dev/null
> +++ b/test/tarantool-tests/lj-865-fix_generation_of_mach-o_object_files.test.lua
> @@ -0,0 +1,343 @@
> +local tap = require('tap')
> +local test = tap.test('lj-865-fix_generation_of_mach-o_object_files')
> +
> +test:plan(4)
> +
> +-- The test creates an object file in Mach-O format with LuaJIT
> +-- bytecode and checks the validness of the object file fields.
Typo? s/validness/validity/
IINM, validness is the obsolete form.
> +--
> +-- The original problem is reproduced with LuaJIT that built with
Typo: s/LuaJIT that built/LuaJIT, which is built/
> +-- enabled AVX512F instructions. The support for AVX512F could be
> +-- checked in `/proc/cpuinfo` on Linux and
> +-- `sysctl hw.optional.avx512f` on Mac. AVX512F must be
> +-- implicitly enabled in a C compiler by passing CPU codename.
Typo: s/CPU codename/a CPU codename/
> +-- Please consult for the available model architecture on
> +-- GCC Online Documentation [1] for available CPU codenames.
> +-- and Wikipedia for CPUs with AVX-512 support [2].
I suggest the following rewording:
| Please consult the GCC Online Documentation [1] for available CPU
| codenames. Also, see the Wikipedia for CPUs with AVX-512 support [2].
Or the follwing:
| Please take a look at the GCC Online Documentation [1] for available
| CPU codenames. Also, see the Wikipedia for CPUs with AVX-512 support [2].
Feel free to reword it like you want.
> +-- To detect CPU codename execute:
Typo: s/CPU codename/the CPU codename/
> +-- `gcc -march=native -Q --help=target | grep march`.
> +--
> +-- 1. https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> +-- 2. https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512
> +--
> +-- Manual steps for reproducing are the following:
I suppose you mean "from the original issue". Or we can use
| -DCMAKE_C_FLAGS="-march=skylake-avx512"
> +--
> +-- $ make CC=gcc TARGET_CFLAGS='skylake-avx512' -f Makefile.original
> +-- $ echo > test.lua
> +-- $ LUA_PATH="src/?.lua;;" luajit -b -o osx -a arm test.lua test.o
> +-- $ file test.o
> +-- empty.o: DOS executable (block device driver)
> +
> +local ffi = require('ffi')
> +
> +-- LuaJIT can generate so called Universal Binary with Lua bytetcode.
Comment width is more than 66 symbols.
> +-- The Universal Binary format is a format for executable files
> +-- that run natively on hardware platforms with different hardware
> +-- architectures. This concept is more generally known as a fat binary.
> +--
> +-- Format of the Mach-O is described in the document
Typo: s/Format/The format/
> +-- "OS X ABI Mach-O File Format Reference", published by Apple company.
Typo? s/Reference",/Reference,"/
> +-- Copy of the (now removed) official documentation can be found
Typo: s/Copy/The copy/
> +-- here [1]. Yet another source of thruth are XNU headers,
Typo: s/thruth/truth/
> +-- see definition of C-structures in: [2] (`nlist_64`),
Typo: s/definition/the definition/
> +-- [3] (`fat_arch` and `fat_header`).
> +--
> +-- There are a good visual representation of Universal Binary
Typo: s/are/is/
> +-- in "Mac OS X Internals" book (pages 67-68) [5] and in [6].
Typo: s/in/in the/g
> +-- Below is schematic structure of Universal Binary that
Typo: s/schematic/the schematic/
Typo: s/Binary that/Binary, which/
> +-- includes two executables for PowerPC and Intel i386 (omitted):
> +--
> +-- 0x0000000 ---------------------------------------
> +-- |
> +-- struct | 0xcafebabe FAT_MAGIC magic
> +-- fat_header | -------------------------------------
> +-- | 0x00000003 nfat_arch
> +-- ---------------------------------------
> +-- | 0x00000012 CPU_TYPE_POWERPC cputype
> +-- | -------------------------------------
> +-- | 0x00000000 CPU_SUBTYPE_POWERPC_ALL cpusubtype
> +-- struct | -------------------------------------
> +-- fat_arch | 0x00001000 4096 bytes offset
> +-- | -------------------------------------
> +-- | 0x00004224 16932 bytes size
> +-- | -------------------------------------
> +-- | 0x0000000c 2^12 = 4096 bytes align
> +-- ---------------------------------------
> +-- ---------------------------------------
> +-- | 0x00000007 CPU_TYPE_I386 cputype
> +-- | -------------------------------------
> +-- | 0x00000003 CPU_SUBTYPE_I386_ALL cpusubtype
> +-- struct | -------------------------------------
> +-- fat_arch | 0x00006000 24576 bytes offset
> +-- | -------------------------------------
> +-- | 0x0000292c 10540 bytes size
> +-- | -------------------------------------
> +-- | 0x0000000c 2^12 = 4096 bytes align
> +-- ---------------------------------------
> +-- Unused
> +-- 0x00001000 ---------------------------------------
> +-- | 0xfeedface MH_MAGIC magic
> +-- | ------------------------------------
> +-- | 0x00000012 CPU_TYPE_POWERPC cputype
> +-- | ------------------------------------
> +-- struct | 0x00000000 CPU_SUBTYPE_POWERPC_ALL cpusubtype
> +-- mach_header | ------------------------------------
> +-- | 0x00000002 MH_EXECUTE filetype
> +-- | ------------------------------------
> +-- | 0x0000000b 10 load commands ncmds
> +-- | ------------------------------------
> +-- | 0x00000574 1396 bytes sizeofcmds
> +-- | ------------------------------------
> +-- | 0x00000085 DYLDLINK TWOLEVEL flags
> +-- --------------------------------------
> +-- Load commands
> +-- ---------------------------------------
> +-- Data
> +-- ---------------------------------------
Side note: This diagramm is amazing thanks!
> +--
> +-- < x86 executable >
> +--
> +--
Nit: excess empty line.
> +-- 1. https://github.com/aidansteele/osx-abi-macho-file-format-reference
> +-- 2. https://github.com/apple-oss-distributions/xnu/blob/xnu-10002.1.13/EXTERNAL_HEADERS/mach-o/nlist.h
> +-- 3. https://github.com/apple-oss-distributions/xnu/blob/xnu-10002.1.13/EXTERNAL_HEADERS/mach-o/fat.h
> +-- 4. https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code
> +-- 5. https://reverseengineering.stackexchange.com/a/6357/46029
> +-- 6. http://formats.kaitai.io/mach_o/index.html
> +--
> +-- Using the same declarations as defined in <src/jit/bcsave.lua>.
> +ffi.cdef[[
> +typedef struct
> +{
> + uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags;
> +} mach_header;
> +
> +typedef struct
> +{
> + mach_header; uint32_t reserved;
> +} mach_header_64;
> +
> +typedef struct {
> + uint32_t cmd, cmdsize;
> + char segname[16];
> + uint32_t vmaddr, vmsize, fileoff, filesize;
> + uint32_t maxprot, initprot, nsects, flags;
> +} mach_segment_command;
> +
> +typedef struct {
> + uint32_t cmd, cmdsize;
> + char segname[16];
> + uint64_t vmaddr, vmsize, fileoff, filesize;
> + uint32_t maxprot, initprot, nsects, flags;
> +} mach_segment_command_64;
> +
> +typedef struct {
> + char sectname[16], segname[16];
> + uint32_t addr, size;
> + uint32_t offset, align, reloff, nreloc, flags;
> + uint32_t reserved1, reserved2;
> +} mach_section;
> +
> +typedef struct {
> + char sectname[16], segname[16];
> + uint64_t addr, size;
> + uint32_t offset, align, reloff, nreloc, flags;
> + uint32_t reserved1, reserved2, reserved3;
> +} mach_section_64;
> +
> +typedef struct {
> + uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize;
> +} mach_symtab_command;
> +
> +typedef struct {
> + int32_t strx;
> + uint8_t type, sect;
> + int16_t desc;
> + uint32_t value;
> +} mach_nlist;
> +
> +typedef struct {
> + uint32_t strx;
Should we use here `int32_t` as it is after the changes?
> + uint8_t type, sect;
> + uint16_t desc;
> + uint64_t value;
> +} mach_nlist_64;
> +
> +typedef struct
> +{
> + uint32_t magic, nfat_arch;
Should we use here `int32_t` as it is after the changes?
> +} mach_fat_header;
> +
> +typedef struct
> +{
> + uint32_t cputype, cpusubtype, offset, size, align;
Should we use here `int32_t` as it is after the changes?
> +} mach_fat_arch;
> +
> +typedef struct {
> + mach_fat_header fat;
> + mach_fat_arch fat_arch[2];
> + struct {
> + mach_header hdr;
> + mach_segment_command seg;
> + mach_section sec;
> + mach_symtab_command sym;
> + } arch[2];
> + mach_nlist sym_entry;
> + uint8_t space[4096];
> +} mach_fat_obj;
> +
> +typedef struct {
> + mach_fat_header fat;
> + mach_fat_arch fat_arch[2];
> + struct {
> + mach_header_64 hdr;
> + mach_segment_command_64 seg;
> + mach_section_64 sec;
> + mach_symtab_command sym;
> + } arch[2];
> + mach_nlist_64 sym_entry;
> + uint8_t space[4096];
> +} mach_fat_obj_64;
Should this part be defined in the next patch, since there is no such
cdef at the moment?
OTOH, it makes test more arch-agnostic. So, the adding of the other case
is more simple.
> +]]
> +
> +local function create_obj_file(name, arch)
> + local mach_o_path = os.tmpname() .. '.o'
> + local lua_path = os.getenv('LUA_PATH')
> + local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
> + local cmd_fmt = 'LUA_PATH="%s" %s -b -n "%s" -o osx -a %s -e "print()" %s'
> + local cmd = (cmd_fmt):format(lua_path, lua_bin, name, arch, mach_o_path)
> + local ret = os.execute(cmd)
> + assert(ret == 0, 'cannot create an object file')
> + return mach_o_path
> +end
> +
> +-- Parses a buffer in the Mach-O format and returns
> +-- the FAT magic number and `nfat_arch`.
> +local function read_mach_o(buf, hw_arch)
> + local res = {
> + header = {
> + magic = 0,
> + nfat_arch = 0,
> + },
> + fat_arch = {},
> + }
> +
> + local is64 = hw_arch == 'arm64'
> +
> + -- Mach-O FAT object.
> + local mach_fat_obj_type = ffi.typeof(is64 and
> + 'mach_fat_obj_64 *' or 'mach_fat_obj *')
> + local obj = ffi.cast(mach_fat_obj_type, buf)
> +
> + -- Mach-O FAT object header.
> + local mach_fat_header_type = ffi.typeof('mach_fat_header *')
> + local mach_fat_header = ffi.cast(mach_fat_header_type, obj.fat)
Looks like this cast is excess:
| src/luajit -e '
| local ffi = require"ffi"
| ffi.cdef[[
| struct test {int a; int b;};
| struct test_outer { struct test t; int c;};
| ]]
| local s = ffi.new("struct test_outer", {{0}, 0})
| print(s.t.a)
| '
| 0
> + local be32 = bit.bswap -- Mach-O FAT is BE, target arch is LE.
Please, use the separate line for the comment.
> + res.header.magic = be32(mach_fat_header.magic)
> + res.header.nfat_arch = be32(mach_fat_header.nfat_arch)
> +
> + -- Mach-O FAT object arches.
> + local mach_fat_arch_type = ffi.typeof('mach_fat_arch *')
> + for i = 0, res.header.nfat_arch - 1 do
> + local fat_arch = ffi.cast(mach_fat_arch_type, obj.fat_arch[i])
Looks like this cast is excess.
> + local arch = {
> + cputype = be32(fat_arch.cputype),
> + cpusubtype = be32(fat_arch.cpusubtype),
> + }
> + table.insert(res.fat_arch, arch)
> + end
> +
> + return res
> +end
> +
> +-- Universal Binary can contain executables for more than one
> +-- CPU architecture. For simplicity the test compares
Typo: s/For simplicity/For simplicity,/
Nit: The comment line is "underfilled".
> +-- sum of CPU types and CPU subtypes.
Typo: s/sum/the sum/
> +-- Numbers below are defined in <src/jit/bcsave.lua:bcsave_machobj>.
Typo: s/Numbers/The numbers/
> +-- Original source is XNU source code, <osfmk/mach/machine.h> [1].
Typo: s/Original/The original/
Typo: s/XNU/the XNU/
> +--
> +-- 1. https://opensource.apple.com/source/xnu/xnu-4570.41.2/osfmk/mach/machine.h.auto.html
> +local sum_cputype = {
> + x86 = 7,
> + x64 = 0x01000007,
> + arm = 7 + 12,
> + arm64 = 0x01000007 + 0x0100000c,
> +}
> +local sum_cpusubtype = {
> + x86 = 3,
> + x64 = 3,
> + arm = 3 + 9,
> + arm64 = 3 + 0,
> +}
I suggest the following for the clarification:
| local cputype = {
| x86 = 7,
| x64 = 0x01000007,
| arm = 7,
| arm64 = 0x0100000c,
| }
| local cpusubtype = {
| x86 = 3,
| x64 = 3,
| arm = 9,
| arm64 = 0,
| }
And declare next:
| local sum_cputype = {
| arm = cputype.arm + cputype.x86,
| arm64 = cputype.arm64 + cputype.x64,
| }
| local sum_cpusubtype = {
| arm = cpusubtype.arm + cpusubtype.x86,
| arm64 = cpusubtype.arm64 + cpusubtype.x64,
| }
Also, it is preferable to use upper case since values are constants.
> +
> +-- The function builds Mach-O FAT object file and retrieves
> +-- its header fields (magic and nfat_arch)
> +-- and fields of the each arch (cputype, cpusubtype).
Typo: s/the each/each/
> +--
> +-- Mach-O FAT object header can be retrieved with `otool` on macOS:
Typo: s/Mach-O FAT object/A Mach-O FAT object/
Typo: s/header/headers/
> +--
> +-- $ otool -f empty.o
> +-- Fat headers
> +-- fat_magic 0xcafebabe
> +-- nfat_arch 2
> +-- <snipped>
Minor: it is better to separate quoting in the comment like the
following to simplify reading:
| $ otool -f empty.o
| Fat headers
| fat_magic 0xcafebabe
| nfat_arch 2
| ...
Feel free to ignore.
> +--
> +-- CPU type and subtype can be retrieved with `lipo` on macOS:
> +--
> +-- $ luajit -b -o osx -a arm empty.lua empty.o
> +-- $ lipo -archs empty.o
> +-- i386 armv7
> +-- $ luajit -b -o osx -a arm64 empty.lua empty.o
> +-- $ lipo -archs empty.o
> +-- x86_64 arm64
> +local function build_and_check_mach_o(hw_arch)
Maybe it is better to use a subtest here:
The number of tests is fixed for this function, and it becomes more
clear when we add the additional arch for testing.
> + assert(hw_arch == 'arm' or hw_arch == 'arm64')
> +
> + -- FAT_MAGIC is an integer containing the value 0xCAFEBABE in
> + -- big-endian byte order format. On a big-endian host CPU,
> + -- this can be validated using the constant FAT_MAGIC;
> + -- on a little-endian host CPU, it can be validated using
> + -- the constant FAT_CIGAM.
> + --
> + -- FAT_NARCH is an integer specifying the number of fat_arch
> + -- data structures that follow. This is the number of
> + -- architectures contained in this binary.
> + --
> + -- See aforementioned "OS X ABI Mach-O File Format Reference".
Typo: s/aforementioned/the aforementioned/
> + --
Excess empty line.
> + local FAT_MAGIC = '0xffffffffcafebabe'
> + local FAT_NARCH = 2
> +
> + local MODULE_NAME = 'lango_team'
> +
> + local mach_o_obj_path = create_obj_file(MODULE_NAME, hw_arch)
> + local mach_o_buf = require('utils').tools.read_file(mach_o_obj_path)
Please move `require` of the necessary tool to the start of the file to
be consistent with the code style in tests.
> + assert(mach_o_buf == nil or #mach_o_buf ~= 0, 'cannot read an object file')
Should it be:
| mach_o_buf ~= nil and #mach_o_buf ~= 0
> +
> + local mach_o = read_mach_o(mach_o_buf, hw_arch)
> +
> + -- Teardown.
> + local retcode = os.remove(mach_o_obj_path)
> + assert(retcode == true, 'remove an object file')
Minor: I suggest refactoring this as the following (`retcode` isn't used
anywhere else):
| assert(os.remove(mach_o_obj_path), 'remove an object file')
> +
> + local magic_str = string.format('0x%02x', mach_o.header.magic)
Why do we need 02 in the format string?
Also, you may use '%#x' for the same result.
> + test:is(magic_str, FAT_MAGIC,
> + 'fat_magic is correct in Mach-O, ' .. hw_arch)
> + test:is(mach_o.header.nfat_arch, FAT_NARCH,
> + 'nfat_arch is correct in Mach-O, ' .. hw_arch)
> +
> + local total_cputype = 0
> + local total_cpusubtype = 0
> + for i = 1, mach_o.header.nfat_arch do
Minor: I suggest using `FAT_NARCH` here since we've checked their
equality above.
> + total_cputype = total_cputype + mach_o.fat_arch[i].cputype
> + total_cpusubtype = total_cpusubtype + mach_o.fat_arch[i].cpusubtype
> + end
> + test:is(total_cputype, sum_cputype[hw_arch],
> + 'cputype is correct in Mach-O, ' .. hw_arch)
> + test:is(total_cpusubtype, sum_cpusubtype[hw_arch],
> + 'cpusubtype is correct in Mach-O, ' .. hw_arch)
> +end
> +
> +build_and_check_mach_o('arm')
> +
> +test:done(true)
> --
> 2.44.0
--
Best regards,
Sergey Kaplun
More information about the Tarantool-patches
mailing list