[Tarantool-patches] [PATCH] Fix saved bytecode encapsulated in ELF objects.
Sergey Kaplun
skaplun at tarantool.org
Tue May 23 16:09:10 MSK 2023
Hi, Sergey!
Thanks for the patch!
Please consider my comments below.
On 22.05.23, Sergey Bronnikov wrote:
> From: Sergey Bronnikov <sergeyb at tarantool.org>
>
> Thanks to Dimitry Andric.
>
> (cherry picked from commit 7dbf0b05f1228c1c719866db5e5f3d58f87f74c8)
>
> Function `bcsave.lua:bcsave_elfobj()` produced an object file in ELF
> format with a wrong size size of `.strtab`. Wrong size causes lld to
> show an error message similar to: `ld: error: obj/bytecode.o: string
> table non-null terminated`.
>
> Sergey Bronnikov:
> * added the description and the test for the problem
>
> Signed-off-by: Sergey Bronnikov <sergeyb at tarantool.org>
> ---
> Branch: https://github.com/tarantool/luajit/tree/ligurio/gh-366-strtab-section-correct-size
> PR: https://github.com/tarantool/tarantool/pull/8678
> ---
> src/jit/bcsave.lua | 2 +-
> .../lj-366-strtab-correct-size.test.lua | 148 ++++++++++++++++++
> 2 files changed, 149 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/lj-366-strtab-correct-size.test.lua
>
> diff --git a/src/jit/bcsave.lua b/src/jit/bcsave.lua
> index c17c88e0..2553d97e 100644
> --- a/src/jit/bcsave.lua
> +++ b/src/jit/bcsave.lua
> @@ -275,7 +275,7 @@ typedef struct {
> o.sect[2].size = fofs(ofs)
> o.sect[3].type = f32(3) -- .strtab
> o.sect[3].ofs = fofs(sofs + ofs)
> - o.sect[3].size = fofs(#symname+1)
> + o.sect[3].size = fofs(#symname+2)
> ffi.copy(o.space+ofs+1, symname)
> ofs = ofs + #symname + 2
> o.sect[4].type = f32(1) -- .rodata
> diff --git a/test/tarantool-tests/lj-366-strtab-correct-size.test.lua b/test/tarantool-tests/lj-366-strtab-correct-size.test.lua
> new file mode 100644
> index 00000000..d4b51537
> --- /dev/null
> +++ b/test/tarantool-tests/lj-366-strtab-correct-size.test.lua
> @@ -0,0 +1,148 @@
> +local tap = require('tap')
> +local test = tap.test('lj-366-strtab-correct-size'):skipcond({
> + -- Test is ELF-specific and because LuaJIT exports object files in ELF format
Typo: s/Test/The test/
Typo: s/, and/and/
Nit: Comment line is wider than 66 symbols.
Here and below.
> + -- for all operating systems except macOS and Windows we skip test on
> + -- these OSes. See src/jit/bcsave.lua:bcsave_obj.
> + ['Disabled on Windows'] = jit.os == 'Windows',
Side note: I'm not sure, that we shall mention Windows, when we still
don't support it. Still, this skipcond is necessary, so I'm OK with it.
> + ['Disabled on macOS'] = jit.os == 'OSX',
> +})
> +
> +local ffi = require 'ffi'
> +
> +-- Reference numbers for strtab offset and size could be obtained with
> +-- readelf(1). Note that number system of these number is hexadecimal.
Typo: s/number/numbers/
> +--
> +-- $ luajit -b -n "module_name" -e "print()" xxx.obj
> +-- $ readelf --section-headers xxx.obj
Minor: May you also provide the linker command with the failure
mentioned in the issue?
Side note: I suppose that the starting \0 byte [1] was forgotten to
count. :)
> +-- There are 6 section headers, starting at offset 0x40:
> +--
> +-- Section Headers:
> +-- [Nr] Name Type Address Offset
> +-- Size EntSize Flags Link Info Align
> +-- ...
> +--
> +-- [ 3] .strtab STRTAB 0000000000000000 00000223
> +-- 0000000000000017 0000000000000000 0 0 1
> +-- ...
> +
> +local expected_strtab_size = 23 -- == 0x17
Minor: Also, comment about this size (length of
`LJBC_PREFIX..ctx.modname` + 2) is desirable. Maybe we should copy this
string and calculate the size here as well.
Feel free to ignore.
> +local expected_strtab_offset = 547 -- == 0x223
Why don't declare as 0x17, 0x223?
> +
> +-- Defined in elf.h.
> +local sht_symtab = 2
Minor: Maybe it is better to use UPPER-CASE for constants?
Feel free to ignore.
> +
> +ffi.cdef[[
> +typedef struct {
> + uint8_t emagic[4], eclass, eendian, eversion, eosabi, eabiversion, epad[7];
> + uint16_t type, machine;
> + uint32_t version;
> + uint32_t entry, phofs, shofs;
> + uint32_t flags;
> + uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstridx;
> +} ELF32header;
I suggest to add additional empty line between each declaration for
better readability.
> +typedef struct {
> + uint8_t emagic[4], eclass, eendian, eversion, eosabi, eabiversion, epad[7];
> + uint16_t type, machine;
> + uint32_t version;
> + uint64_t entry, phofs, shofs;
> + uint32_t flags;
> + uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstridx;
> +} ELF64header;
> +typedef struct {
> + uint32_t name, type, flags, addr, ofs, size, link, info, align, entsize;
> +} ELF32sectheader;
> +typedef struct {
> + uint32_t name, type;
> + uint64_t flags, addr, ofs, size;
> + uint32_t link, info;
> + uint64_t align, entsize;
> +} ELF64sectheader;
> +typedef struct {
> + uint32_t name, value, size;
> + uint8_t info, other;
> + uint16_t sectidx;
> +} ELF32symbol;
I have the following declaration in <elf.h>, maybe we should use the the
same declarations ad litteram? Just to avoid inconsistencies in naming?
The same for all other structures defined.
| typedef struct
| {
| Elf32_Word st_name; /* Symbol name (string tbl index) */
| Elf32_Addr st_value; /* Symbol value */
| Elf32_Word st_size; /* Symbol size */
| unsigned char st_info; /* Symbol type and binding */
| unsigned char st_other; /* Symbol visibility */
| Elf32_Section st_shndx; /* Section index */
| } Elf32_Sym;
Or we may use the same as defined in <src/jit/bcsave.lua> (as you did),
but the comment about it is desirable.
> +typedef struct {
> + uint32_t name;
> + uint8_t info, other;
> + uint16_t sectidx;
> + uint64_t value, size;
> +} ELF64symbol;
> +typedef struct {
> + ELF32header hdr;
> + ELF32sectheader sect[6];
> + ELF32symbol sym[2];
> + uint8_t space[4096];
> +} ELF32obj;
> +typedef struct {
> + ELF64header hdr;
> + ELF64sectheader sect[6];
> + ELF64symbol sym[2];
> + uint8_t space[4096];
> +} ELF64obj;
> +]]
> +
> +local is64_arch = {
> + ["x64"] = true,
Nit: please, use single-quotes for strings, here and below.
> + ["arm64"] = true,
> + ["arm64be"] = true,
> + ["ppc"] = false,
> + ["mips"] = false,
Why do we need ppc, and mips definitions here?
> +}
> +
> +local is64 = is64_arch[jit.arch] or false
> +local ELFobj_type = ffi.typeof(is64 and "ELF64obj *" or "ELF32obj *")
> +local ELFsectheader_type = ffi.typeof(is64 and "ELF64sectheader *" or "ELF32sectheader")
Should it be `ELF32sectheader *`?
> +
> +-- Reads a file located in a specified path and returns its content.
> +local function read_file(path)
> + local file = assert(io.open(path), 'cannot open an object file')
> + local content = file:read("*a")
> + file:close()
> + return content
> +end
> +
> +-- Parses a buffer in an ELF format and returns an offset and a size of strtab
> +-- section for a symtab.
> +local function read_elf_strtab(elf_content)
> + local elf = ffi.cast(ELFobj_type, elf_content)
> + local elf_header = elf.hdr
> + local sec_strtab
> + for i = 0, elf_header.shnum do
> + local sec = ffi.cast(ELFsectheader_type, elf.sect[i])
> + if sec.type == sht_symtab then
Why don't just to find .strtab instead?
LuaJIT emits really a small obj file, (it's literally the third section).
> + sec_strtab = ffi.cast(ELFsectheader_type, elf.sect[sec.link])
> + break
> + end
> + end
> + return sec_strtab.size, sec_strtab.ofs
> +end
> +
> +local function lua_bin_path(arg)
Don't get why do we need this function to get luabin?
Why just `luacmd` is not enough?
> + local args_str = require('utils').luacmd(arg)
> + local args = {}
> + for a in args_str:gmatch("%S+") do
> + table.insert(args, a);
> + end
> + return args[1]
Why do we need to parse the whole string if we return only first
occurence of %S+?
> +end
> +
> +test:plan(5)
> +
> +local elf_filename = os.tmpname() .. '.obj'
> +local lua_path = os.getenv('LUA_PATH')
> +local lua_bin = lua_bin_path(arg)
> +local cmd = ('LUA_PATH="%s" %s -b -n "module_name" -e "print()" %s'):format(lua_path, lua_bin, elf_filename)
Nit: Code line is wider than 80 symbols.
> +local ret = os.execute(cmd)
> +test:ok(ret == 0, 'create an object file')
I suggest to use `assert()` for these checks for test correcness.
`test:*()` functions -- to check some bugs
`assert()` -- to verify test correctness or validity
So, we have only 2 test to plan: for size and offset.
> +
> +local elf_content = read_file(elf_filename)
> +test:ok(#elf_content ~= 0, 'read an object file')
> +local strtab_size, strtab_offset = read_elf_strtab(elf_content)
> +test:ok(strtab_size == expected_strtab_size, 'check .strtab size')
> +test:ok(strtab_offset == expected_strtab_offset, 'check .strtab offset')
> +
> +ret = os.execute(("rm -f %s"):format(elf_filename))
I suppose, that we may use `os.remove()` here.
> +test:ok(ret == 0, 'remove an object file')
> +
> +os.exit(test:check() and 0 or 1)
> --
> 2.34.1
>
[1]: https://refspecs.linuxbase.org/elf/gabi4+/ch4.strtab.html
--
Best regards,
Sergey Kaplun
More information about the Tarantool-patches
mailing list