Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit 0/2][v2] Fix embedded bytecode loader
@ 2023-08-31 11:29 Sergey Bronnikov via Tarantool-patches
  2023-08-31 11:30 ` [Tarantool-patches] [PATCH luajit 1/2][v2] " Sergey Bronnikov via Tarantool-patches
  2023-08-31 11:32 ` [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for " Sergey Bronnikov via Tarantool-patches
  0 siblings, 2 replies; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-31 11:29 UTC (permalink / raw)
  To: tarantool-patches, Sergey Kaplun, max.kokryashkin

From: Sergey Bronnikov <sergeyb@tarantool.org>

Tarantool PR: https://github.com/tarantool/tarantool/pull/8913
Related issue: https://github.com/LuaJIT/LuaJIT/issues/549
Branch: https://github.com/tarantool/luajit/tree/ligurio/lj-549-fix-embedded-bytecode-loader
Patches v1: https://lists.tarantool.org/tarantool-patches/cover.1690300762.git.sergeyb@tarantool.org/

Changes v2:
- added a Lua test that reproduces a problem like reproducer do
- added fixes according to comments from Maxim K. and Sergey K.

Mike Pall (2):
  Fix embedded bytecode loader.
  Followup fix for embedded bytecode loader.

 src/lib_package.c                             |   4 +-
 src/lj_bcread.c                               |  10 +-
 src/lj_lex.c                                  |   7 +
 src/lj_lex.h                                  |   1 +
 test/tarantool-c-tests/lj-549-lua_load.test.c | 134 ++++++++++++++++++
 .../lj-549-bytecode-loader.test.lua           |  96 +++++++++++++
 6 files changed, 245 insertions(+), 7 deletions(-)
 create mode 100644 test/tarantool-c-tests/lj-549-lua_load.test.c
 create mode 100644 test/tarantool-tests/lj-549-bytecode-loader.test.lua

-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-08-31 11:29 [Tarantool-patches] [PATCH luajit 0/2][v2] Fix embedded bytecode loader Sergey Bronnikov via Tarantool-patches
@ 2023-08-31 11:30 ` Sergey Bronnikov via Tarantool-patches
  2023-08-31 11:49   ` Sergey Bronnikov via Tarantool-patches
                     ` (3 more replies)
  2023-08-31 11:32 ` [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for " Sergey Bronnikov via Tarantool-patches
  1 sibling, 4 replies; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-31 11:30 UTC (permalink / raw)
  To: tarantool-patches, Sergey Kaplun, max.kokryashkin

From: Sergey Bronnikov <sergeyb@tarantool.org>

(cherry-picked from commit 820339960123dc78a7ce03edf53fcf4fdae0e55d)

The original problem is specific to x32 and is as follows: when a chunk
with a bytecode library is loaded into memory, and the address is higher
than 0x80000100, the `LexState->pe`, that contains an address of the end
of the bytecode chunk in the memory, will wrap around and become smaller
than the address in `LexState->p`, that contains an address of the
beginning of bytecode chunk in the memory. In `bcread_fill()` called by
`bcread_want()`, `memcpy()` is called with a very large size and causes
bus error on x86 and segmentation fault on ARM Android.

The problem cannot be reproduced on platforms supported by Tarantool
(ARM64, x86_64), so test doesn't reproduce a problem without a patch and
tests the patch partially.

Sergey Bronnikov:
* added the description and the test
---
 src/lib_package.c                             |  4 +-
 src/lj_bcread.c                               | 10 +-
 src/lj_lex.c                                  |  6 ++
 src/lj_lex.h                                  |  1 +
 .../lj-549-bytecode-loader.test.lua           | 96 +++++++++++++++++++
 5 files changed, 110 insertions(+), 7 deletions(-)
 create mode 100644 test/tarantool-tests/lj-549-bytecode-loader.test.lua

diff --git a/src/lib_package.c b/src/lib_package.c
index b49f0209..12603038 100644
--- a/src/lib_package.c
+++ b/src/lib_package.c
@@ -260,7 +260,7 @@ static int ll_loadfunc(lua_State *L, const char *path, const char *name, int r)
       const char *bcdata = ll_bcsym(*reg, mksymname(L, name, SYMPREFIX_BC));
       lua_pop(L, 1);
       if (bcdata) {
-	if (luaL_loadbuffer(L, bcdata, LJ_MAX_BUF, name) != 0)
+	if (luaL_loadbuffer(L, bcdata, ~(size_t)0, name) != 0)
 	  return PACKAGE_ERR_LOAD;
 	return 0;
       }
@@ -431,7 +431,7 @@ static int lj_cf_package_loader_preload(lua_State *L)
   if (lua_isnil(L, -1)) {  /* Not found? */
     const char *bcname = mksymname(L, name, SYMPREFIX_BC);
     const char *bcdata = ll_bcsym(NULL, bcname);
-    if (bcdata == NULL || luaL_loadbuffer(L, bcdata, LJ_MAX_BUF, name) != 0)
+    if (bcdata == NULL || luaL_loadbuffer(L, bcdata, ~(size_t)0, name) != 0)
       lua_pushfstring(L, "\n\tno field package.preload['%s']", name);
   }
   return 1;
diff --git a/src/lj_bcread.c b/src/lj_bcread.c
index cddf6ff1..48ec15e4 100644
--- a/src/lj_bcread.c
+++ b/src/lj_bcread.c
@@ -79,6 +79,7 @@ static LJ_NOINLINE void bcread_fill(LexState *ls, MSize len, int need)
       ls->c = -1;  /* Only bad if we get called again. */
       break;
     }
+    if (sz >= LJ_MAX_BUF - n) lj_err_mem(ls->L);
     if (n) {  /* Append to buffer. */
       n += (MSize)sz;
       p = lj_buf_need(&ls->sb, n < len ? len : n);
@@ -90,20 +91,20 @@ static LJ_NOINLINE void bcread_fill(LexState *ls, MSize len, int need)
       ls->p = buf;
       ls->pe = buf + sz;
     }
-  } while (ls->p + len > ls->pe);
+  } while ((MSize)(ls->pe - ls->p) < len);
 }
 
 /* Need a certain number of bytes. */
 static LJ_AINLINE void bcread_need(LexState *ls, MSize len)
 {
-  if (LJ_UNLIKELY(ls->p + len > ls->pe))
+  if (LJ_UNLIKELY((MSize)(ls->pe - ls->p) < len))
     bcread_fill(ls, len, 1);
 }
 
 /* Want to read up to a certain number of bytes, but may need less. */
 static LJ_AINLINE void bcread_want(LexState *ls, MSize len)
 {
-  if (LJ_UNLIKELY(ls->p + len > ls->pe))
+  if (LJ_UNLIKELY((MSize)(ls->pe - ls->p) < len))
     bcread_fill(ls, len, 0);
 }
 
@@ -463,8 +464,7 @@ GCproto *lj_bcread(LexState *ls)
     setprotoV(L, L->top, pt);
     incr_top(L);
   }
-  if ((int32_t)(2*(uint32_t)(ls->pe - ls->p)) > 0 ||
-      L->top-1 != bcread_oldtop(L, ls))
+  if ((ls->pe != ls->p && !ls->endmark) || L->top-1 != bcread_oldtop(L, ls))
     bcread_error(ls, LJ_ERR_BCBAD);
   /* Pop off last prototype. */
   L->top--;
diff --git a/src/lj_lex.c b/src/lj_lex.c
index cef3c683..6291705f 100644
--- a/src/lj_lex.c
+++ b/src/lj_lex.c
@@ -48,6 +48,11 @@ static LJ_NOINLINE LexChar lex_more(LexState *ls)
   size_t sz;
   const char *p = ls->rfunc(ls->L, ls->rdata, &sz);
   if (p == NULL || sz == 0) return LEX_EOF;
+  if (sz >= LJ_MAX_BUF) {
+    if (sz != ~(size_t)0) lj_err_mem(ls->L);
+    sz = ~(uintptr_t)0 - (uintptr_t)p;
+    ls->endmark = 1;
+  }
   ls->pe = p + sz;
   ls->p = p + 1;
   return (LexChar)(uint8_t)p[0];
@@ -408,6 +413,7 @@ int lj_lex_setup(lua_State *L, LexState *ls)
   ls->lookahead = TK_eof;  /* No look-ahead token. */
   ls->linenumber = 1;
   ls->lastline = 1;
+  ls->endmark = 0;
   lex_next(ls);  /* Read-ahead first char. */
   if (ls->c == 0xef && ls->p + 2 <= ls->pe && (uint8_t)ls->p[0] == 0xbb &&
       (uint8_t)ls->p[1] == 0xbf) {  /* Skip UTF-8 BOM (if buffered). */
diff --git a/src/lj_lex.h b/src/lj_lex.h
index ae05a954..a26e504a 100644
--- a/src/lj_lex.h
+++ b/src/lj_lex.h
@@ -73,6 +73,7 @@ typedef struct LexState {
   BCInsLine *bcstack;	/* Stack for bytecode instructions/line numbers. */
   MSize sizebcstack;	/* Size of bytecode stack. */
   uint32_t level;	/* Syntactical nesting level. */
+  int endmark;		/* Trust bytecode end marker, even if not at EOF. */
 } LexState;
 
 LJ_FUNC int lj_lex_setup(lua_State *L, LexState *ls);
diff --git a/test/tarantool-tests/lj-549-bytecode-loader.test.lua b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
new file mode 100644
index 00000000..889be80a
--- /dev/null
+++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
@@ -0,0 +1,96 @@
+local tap = require('tap')
+local ffi = require('ffi')
+local utils = require('utils')
+local test = tap.test('lj-549-bytecode-loader'):skipcond({
+    -- ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
+})
+
+test:plan(1)
+
+-- Test creates a shared library with LuaJIT bytecode,
+-- loads shared library as a Lua module and checks,
+-- that no crashes eliminated.
+--
+-- $ make HOST_CC='gcc -m32' TARGET_CFLAGS='-m32' \
+--                           TARGET_LDFLAGS='-m32' \
+--                           TARGET_SHLDFLAGS='-m32' \
+--                           -f Makefile.original
+-- $ echo 'print("test")' > a.lua
+-- $ LUA_PATH="src/?.lua;;" luajit -b a.lua a.c
+-- $ gcc -m32 -fPIC -shared a.c -o a.so
+-- $ luajit -e "require('a')"
+-- Program received signal SIGBUS, Bus error
+
+local function file_exists(fname)
+   return io.open(fname, 'r') or true and false
+end
+
+local function get_file_name(file)
+    return file:match("[^/]*$")
+end
+
+local stdout_msg = 'Lango team'
+local lua_code = ('print(%q)'):format(stdout_msg)
+local fpath = os.tmpname()
+local path_lua = ('%s.lua'):format(fpath)
+local path_c = ('%s.c'):format(fpath)
+local path_so = ('%s.so'):format(fpath)
+
+-- Create a file with a minimal Lua code.
+local fh = assert(io.open(path_lua, 'w'))
+fh:write(lua_code)
+fh:close()
+
+local module_name = assert(get_file_name(fpath))
+
+local basedir = function(path)
+    local sep = '/'
+    return path:match('(.*' .. sep .. ')') or './'
+end
+
+-- Create a C file with LuaJIT bytecode.
+-- We cannot use utils.makecmd, because command-line generated
+-- by `makecmd` contains `-e` that is incompatible with option `-b`.
+local function create_c_file(pathlua, pathc)
+  local lua_path = os.getenv('LUA_PATH')
+  local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
+  local cmd_fmt = 'LUA_PATH="%s" %s -b %s %s'
+  local cmd = (cmd_fmt):format(lua_path, lua_bin, pathlua, pathc)
+  local ret = os.execute(cmd)
+  assert(ret == 0, 'create a C file with bytecode')
+end
+
+create_c_file(path_lua, path_c)
+assert(file_exists(path_c))
+
+-- Compile C source code with LuaJIT bytecode to a shared library.
+-- `-m64` is not available on ARM64, see
+-- "3.18.1 AArch64 Options in the manual",
+-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
+local cflags_64 = jit.arch == 'arm64' and '-march=armv8-a' or '-m64'
+local cflags = ffi.abi('32bit') and '-m32' or cflags_64
+local cc_cmd = ('cc %s -fPIC -shared %s -o %s'):format(cflags, path_c, path_so)
+local ph = io.popen(cc_cmd)
+ph:close()
+assert(file_exists(path_so))
+
+-- Load shared library as a Lua module.
+local lua_cpath = ('"/tmp/?.so;"'):format(basedir(fpath))
+assert(file_exists(path_so))
+local cmd = utils.exec.makecmd(arg, {
+    script = ('-e "require([[%s]])"'):format(module_name),
+    env = {
+        LUA_CPATH = lua_cpath,
+        -- It is required to cleanup LUA_PATH, otherwise
+        -- LuaJIT loads Lua module, see tarantool-tests/utils/init.lua.
+        LUA_PATH = '',
+    },
+})
+local res = cmd()
+test:ok(res == stdout_msg, 'bytecode loader works')
+
+os.remove(path_lua)
+os.remove(path_c)
+os.remove(path_so)
+
+os.exit(test:check() and 0 or 1)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-08-31 11:29 [Tarantool-patches] [PATCH luajit 0/2][v2] Fix embedded bytecode loader Sergey Bronnikov via Tarantool-patches
  2023-08-31 11:30 ` [Tarantool-patches] [PATCH luajit 1/2][v2] " Sergey Bronnikov via Tarantool-patches
@ 2023-08-31 11:32 ` Sergey Bronnikov via Tarantool-patches
  2023-09-01 10:05   ` Maxim Kokryashkin via Tarantool-patches
  2023-09-05 12:55   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 2 replies; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-31 11:32 UTC (permalink / raw)
  To: tarantool-patches, Sergey Kaplun, max.kokryashkin

From: Sergey Bronnikov <sergeyb@tarantool.org>

(cherry-picked from commit e49863eda13d095b1a78fd4ca0fd3a6a9a17d782)

The patch follows up a previous patch and limits the total size of a
chunk load by `lua_load` with size `LJ_MAX_BUF - 1`.

Sergey Bronnikov:
* added the description and the test
---
 src/lj_lex.c                                  |   1 +
 test/tarantool-c-tests/lj-549-lua_load.test.c | 134 ++++++++++++++++++
 2 files changed, 135 insertions(+)
 create mode 100644 test/tarantool-c-tests/lj-549-lua_load.test.c

diff --git a/src/lj_lex.c b/src/lj_lex.c
index 6291705f..13495c41 100644
--- a/src/lj_lex.c
+++ b/src/lj_lex.c
@@ -51,6 +51,7 @@ static LJ_NOINLINE LexChar lex_more(LexState *ls)
   if (sz >= LJ_MAX_BUF) {
     if (sz != ~(size_t)0) lj_err_mem(ls->L);
     sz = ~(uintptr_t)0 - (uintptr_t)p;
+    if (sz >= LJ_MAX_BUF) sz = LJ_MAX_BUF-1;
     ls->endmark = 1;
   }
   ls->pe = p + sz;
diff --git a/test/tarantool-c-tests/lj-549-lua_load.test.c b/test/tarantool-c-tests/lj-549-lua_load.test.c
new file mode 100644
index 00000000..9baa7a1a
--- /dev/null
+++ b/test/tarantool-c-tests/lj-549-lua_load.test.c
@@ -0,0 +1,134 @@
+#include <assert.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+#include <lua.h>
+#include <lualib.h>
+#include <lauxlib.h>
+
+#include "test.h"
+#include "utils.h"
+
+/* Need for skipcond. */
+#include "lj_arch.h"
+
+/* Defined in lj_def.h. */
+#define LJ_MAX_MEM32    0x7fffff00      /* Max. 32 bit memory allocation. */
+#define LJ_MAX_BUF      LJ_MAX_MEM32    /* Max. buffer length. */
+
+/* Defined in lua.h. */
+/* mark for precompiled code (`<esc>Lua') */
+#define	LUA_SIGNATURE	"\033Lua"
+
+#define UNUSED(x) ((void)(x))
+
+/**
+ * Function generates a huge chunk of "bytecode" with a size bigger than
+ * LJ_MAX_BUF. Generated chunk must enable endmark in a Lex state.
+ */
+static const char *
+bc_reader_with_endmark(lua_State *L, void *data, size_t *size)
+{
+	UNUSED(data);
+	int bc_chunk_size = (size_t)0;
+	static char *bc_chunk = NULL;
+	free(bc_chunk);
+
+	bc_chunk = malloc(bc_chunk_size);
+	assert(bc_chunk != NULL);
+
+	/**
+	 * `lua_load` automatically detects whether the chunk is text or binary,
+	 * and loads it accordingly. We need a trace for bytecode input,
+	 * so it is necessary to deceive a check in lj_lex_setup, that
+	 * makes a sanity check and detects whether input is bytecode or text
+	 * by the first char. Put LUA_SIGNATURE[0] at the beginning of the
+	 * allocated region.
+	 */
+	bc_chunk[0] = LUA_SIGNATURE[0];
+
+	*size = bc_chunk_size;
+
+	return bc_chunk;
+}
+
+static int bc_loader_with_endmark(void *test_state)
+{
+	lua_State *L = test_state;
+	void *ud = NULL;
+	int res = lua_load(L, bc_reader_with_endmark, ud, "endmark");
+
+	/*
+	 * Make sure we passed the condition with lj_err_mem in the function
+	 * `lex_more`.
+	 */
+	assert_true(res != LUA_ERRMEM);
+
+	return TEST_EXIT_SUCCESS;
+}
+
+enum bc_emission_state {
+	EMIT_BC,
+	EMIT_EOF,
+};
+
+typedef struct {
+	enum bc_emission_state state;
+} dt;
+
+/**
+ * Function returns a bytecode chunk on the first call and NULL and size equal
+ * to zero on the second call. Triggers the END_OF_STREAM flag in the function
+ * `lex_more`.
+ */
+static const char *
+bc_reader_with_eof(lua_State *L, void *data, size_t *size)
+{
+	UNUSED(data);
+	UNUSED(L);
+	dt *test_data = (dt *)data;
+	if (test_data->state == EMIT_EOF) {
+		*size = 0;
+		return NULL;
+	}
+
+	static char *bc_chunk = NULL;
+	free(bc_chunk);
+
+	size_t sz = 10;
+	bc_chunk = malloc(sz);
+	bc_chunk[0] = LUA_SIGNATURE[0];
+	*size = sz;
+
+	return bc_chunk;
+}
+
+static int bc_loader_with_eof(void *test_state)
+{
+	lua_State *L = test_state;
+	dt test_data = {0};
+	test_data.state = EMIT_BC;
+	int res = lua_load(L, bc_reader_with_eof, &test_data, "eof");
+	assert_true(res = LUA_ERRSYNTAX);
+	if (res == LUA_OK) {
+		lua_pcall(L, 0, 0, 0);
+	}
+
+	return TEST_EXIT_SUCCESS;
+}
+
+int main(void)
+{
+	lua_State *L = utils_lua_init();
+	const struct test_unit tgroup[] = {
+		test_unit_def(bc_loader_with_endmark),
+		test_unit_def(bc_loader_with_eof)
+	};
+
+	const int test_result = test_run_group(tgroup, L);
+	utils_lua_close(L);
+	return test_result;
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-08-31 11:30 ` [Tarantool-patches] [PATCH luajit 1/2][v2] " Sergey Bronnikov via Tarantool-patches
@ 2023-08-31 11:49   ` Sergey Bronnikov via Tarantool-patches
  2023-09-01  9:42   ` Maxim Kokryashkin via Tarantool-patches
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-08-31 11:49 UTC (permalink / raw)
  To: tarantool-patches, Sergey Kaplun, max.kokryashkin

From: Sergey Bronnikov <sergeyb@tarantool.org>

(cherry-picked from commit 820339960123dc78a7ce03edf53fcf4fdae0e55d)

The original problem is specific to x32 and is as follows: when a chunk
with a bytecode library is loaded into memory, and the address is higher
than 0x80000100, the `LexState->pe`, that contains an address of the end
of the bytecode chunk in the memory, will wrap around and become smaller
than the address in `LexState->p`, that contains an address of the
beginning of bytecode chunk in the memory. In `bcread_fill()` called by
`bcread_want()`, `memcpy()` is called with a very large size and causes
bus error on x86 and segmentation fault on ARM Android.

The problem cannot be reproduced on platforms supported by Tarantool
(ARM64, x86_64), so test doesn't reproduce a problem without a patch and
tests the patch partially.

Sergey Bronnikov:
* added the description and the test
---
 src/lib_package.c                             |  4 +-
 src/lj_bcread.c                               | 10 +-
 src/lj_lex.c                                  |  6 ++
 src/lj_lex.h                                  |  1 +
 .../lj-549-bytecode-loader.test.lua           | 96 +++++++++++++++++++
 5 files changed, 110 insertions(+), 7 deletions(-)
 create mode 100644 test/tarantool-tests/lj-549-bytecode-loader.test.lua

diff --git a/src/lib_package.c b/src/lib_package.c
index b49f0209..12603038 100644
--- a/src/lib_package.c
+++ b/src/lib_package.c
@@ -260,7 +260,7 @@ static int ll_loadfunc(lua_State *L, const char *path, const char *name, int r)
       const char *bcdata = ll_bcsym(*reg, mksymname(L, name, SYMPREFIX_BC));
       lua_pop(L, 1);
       if (bcdata) {
-	if (luaL_loadbuffer(L, bcdata, LJ_MAX_BUF, name) != 0)
+	if (luaL_loadbuffer(L, bcdata, ~(size_t)0, name) != 0)
 	  return PACKAGE_ERR_LOAD;
 	return 0;
       }
@@ -431,7 +431,7 @@ static int lj_cf_package_loader_preload(lua_State *L)
   if (lua_isnil(L, -1)) {  /* Not found? */
     const char *bcname = mksymname(L, name, SYMPREFIX_BC);
     const char *bcdata = ll_bcsym(NULL, bcname);
-    if (bcdata == NULL || luaL_loadbuffer(L, bcdata, LJ_MAX_BUF, name) != 0)
+    if (bcdata == NULL || luaL_loadbuffer(L, bcdata, ~(size_t)0, name) != 0)
       lua_pushfstring(L, "\n\tno field package.preload['%s']", name);
   }
   return 1;
diff --git a/src/lj_bcread.c b/src/lj_bcread.c
index cddf6ff1..48ec15e4 100644
--- a/src/lj_bcread.c
+++ b/src/lj_bcread.c
@@ -79,6 +79,7 @@ static LJ_NOINLINE void bcread_fill(LexState *ls, MSize len, int need)
       ls->c = -1;  /* Only bad if we get called again. */
       break;
     }
+    if (sz >= LJ_MAX_BUF - n) lj_err_mem(ls->L);
     if (n) {  /* Append to buffer. */
       n += (MSize)sz;
       p = lj_buf_need(&ls->sb, n < len ? len : n);
@@ -90,20 +91,20 @@ static LJ_NOINLINE void bcread_fill(LexState *ls, MSize len, int need)
       ls->p = buf;
       ls->pe = buf + sz;
     }
-  } while (ls->p + len > ls->pe);
+  } while ((MSize)(ls->pe - ls->p) < len);
 }
 
 /* Need a certain number of bytes. */
 static LJ_AINLINE void bcread_need(LexState *ls, MSize len)
 {
-  if (LJ_UNLIKELY(ls->p + len > ls->pe))
+  if (LJ_UNLIKELY((MSize)(ls->pe - ls->p) < len))
     bcread_fill(ls, len, 1);
 }
 
 /* Want to read up to a certain number of bytes, but may need less. */
 static LJ_AINLINE void bcread_want(LexState *ls, MSize len)
 {
-  if (LJ_UNLIKELY(ls->p + len > ls->pe))
+  if (LJ_UNLIKELY((MSize)(ls->pe - ls->p) < len))
     bcread_fill(ls, len, 0);
 }
 
@@ -463,8 +464,7 @@ GCproto *lj_bcread(LexState *ls)
     setprotoV(L, L->top, pt);
     incr_top(L);
   }
-  if ((int32_t)(2*(uint32_t)(ls->pe - ls->p)) > 0 ||
-      L->top-1 != bcread_oldtop(L, ls))
+  if ((ls->pe != ls->p && !ls->endmark) || L->top-1 != bcread_oldtop(L, ls))
     bcread_error(ls, LJ_ERR_BCBAD);
   /* Pop off last prototype. */
   L->top--;
diff --git a/src/lj_lex.c b/src/lj_lex.c
index cef3c683..6291705f 100644
--- a/src/lj_lex.c
+++ b/src/lj_lex.c
@@ -48,6 +48,11 @@ static LJ_NOINLINE LexChar lex_more(LexState *ls)
   size_t sz;
   const char *p = ls->rfunc(ls->L, ls->rdata, &sz);
   if (p == NULL || sz == 0) return LEX_EOF;
+  if (sz >= LJ_MAX_BUF) {
+    if (sz != ~(size_t)0) lj_err_mem(ls->L);
+    sz = ~(uintptr_t)0 - (uintptr_t)p;
+    ls->endmark = 1;
+  }
   ls->pe = p + sz;
   ls->p = p + 1;
   return (LexChar)(uint8_t)p[0];
@@ -408,6 +413,7 @@ int lj_lex_setup(lua_State *L, LexState *ls)
   ls->lookahead = TK_eof;  /* No look-ahead token. */
   ls->linenumber = 1;
   ls->lastline = 1;
+  ls->endmark = 0;
   lex_next(ls);  /* Read-ahead first char. */
   if (ls->c == 0xef && ls->p + 2 <= ls->pe && (uint8_t)ls->p[0] == 0xbb &&
       (uint8_t)ls->p[1] == 0xbf) {  /* Skip UTF-8 BOM (if buffered). */
diff --git a/src/lj_lex.h b/src/lj_lex.h
index ae05a954..a26e504a 100644
--- a/src/lj_lex.h
+++ b/src/lj_lex.h
@@ -73,6 +73,7 @@ typedef struct LexState {
   BCInsLine *bcstack;	/* Stack for bytecode instructions/line numbers. */
   MSize sizebcstack;	/* Size of bytecode stack. */
   uint32_t level;	/* Syntactical nesting level. */
+  int endmark;		/* Trust bytecode end marker, even if not at EOF. */
 } LexState;
 
 LJ_FUNC int lj_lex_setup(lua_State *L, LexState *ls);
diff --git a/test/tarantool-tests/lj-549-bytecode-loader.test.lua b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
new file mode 100644
index 00000000..889be80a
--- /dev/null
+++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
@@ -0,0 +1,96 @@
+local tap = require('tap')
+local ffi = require('ffi')
+local utils = require('utils')
+local test = tap.test('lj-549-bytecode-loader'):skipcond({
+    -- ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
+})
+
+test:plan(1)
+
+-- Test creates a shared library with LuaJIT bytecode,
+-- loads shared library as a Lua module and checks,
+-- that no crashes eliminated.
+--
+-- $ make HOST_CC='gcc -m32' TARGET_CFLAGS='-m32' \
+--                           TARGET_LDFLAGS='-m32' \
+--                           TARGET_SHLDFLAGS='-m32' \
+--                           -f Makefile.original
+-- $ echo 'print("test")' > a.lua
+-- $ LUA_PATH="src/?.lua;;" luajit -b a.lua a.c
+-- $ gcc -m32 -fPIC -shared a.c -o a.so
+-- $ luajit -e "require('a')"
+-- Program received signal SIGBUS, Bus error
+
+local function file_exists(fname)
+   return io.open(fname, 'r') or true and false
+end
+
+local function get_file_name(file)
+    return file:match("[^/]*$")
+end
+
+local stdout_msg = 'Lango team'
+local lua_code = ('print(%q)'):format(stdout_msg)
+local fpath = os.tmpname()
+local path_lua = ('%s.lua'):format(fpath)
+local path_c = ('%s.c'):format(fpath)
+local path_so = ('%s.so'):format(fpath)
+
+-- Create a file with a minimal Lua code.
+local fh = assert(io.open(path_lua, 'w'))
+fh:write(lua_code)
+fh:close()
+
+local module_name = assert(get_file_name(fpath))
+
+local basedir = function(path)
+    local sep = '/'
+    return path:match('(.*' .. sep .. ')') or './'
+end
+
+-- Create a C file with LuaJIT bytecode.
+-- We cannot use utils.makecmd, because command-line generated
+-- by `makecmd` contains `-e` that is incompatible with option `-b`.
+local function create_c_file(pathlua, pathc)
+  local lua_path = os.getenv('LUA_PATH')
+  local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
+  local cmd_fmt = 'LUA_PATH="%s" %s -b %s %s'
+  local cmd = (cmd_fmt):format(lua_path, lua_bin, pathlua, pathc)
+  local ret = os.execute(cmd)
+  assert(ret == 0, 'create a C file with bytecode')
+end
+
+create_c_file(path_lua, path_c)
+assert(file_exists(path_c))
+
+-- Compile C source code with LuaJIT bytecode to a shared library.
+-- `-m64` is not available on ARM64, see
+-- "3.18.1 AArch64 Options in the manual",
+-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
+local cflags_64 = jit.arch == 'arm64' and '-march=armv8-a' or '-m64'
+local cflags = ffi.abi('32bit') and '-m32' or cflags_64
+local cc_cmd = ('cc %s -fPIC -shared %s -o %s'):format(cflags, path_c, path_so)
+local ph = io.popen(cc_cmd)
+ph:close()
+assert(file_exists(path_so))
+
+-- Load shared library as a Lua module.
+local lua_cpath = ('"/tmp/?.so;"'):format(basedir(fpath))
+assert(file_exists(path_so))
+local cmd = utils.exec.makecmd(arg, {
+    script = ('-e "require([[%s]])"'):format(module_name),
+    env = {
+        LUA_CPATH = lua_cpath,
+        -- It is required to cleanup LUA_PATH, otherwise
+        -- LuaJIT loads Lua module, see tarantool-tests/utils/init.lua.
+        LUA_PATH = '',
+    },
+})
+local res = cmd()
+test:ok(res == stdout_msg, 'bytecode loader works')
+
+os.remove(path_lua)
+os.remove(path_c)
+os.remove(path_so)
+
+os.exit(test:check() and 0 or 1)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-08-31 11:30 ` [Tarantool-patches] [PATCH luajit 1/2][v2] " Sergey Bronnikov via Tarantool-patches
  2023-08-31 11:49   ` Sergey Bronnikov via Tarantool-patches
@ 2023-09-01  9:42   ` Maxim Kokryashkin via Tarantool-patches
  2023-09-04  9:31     ` Sergey Bronnikov via Tarantool-patches
  2023-09-05 14:10   ` Sergey Kaplun via Tarantool-patches
  2023-09-05 14:12   ` Sergey Kaplun via Tarantool-patches
  3 siblings, 1 reply; 21+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-09-01  9:42 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: max.kokryashkin, tarantool-patches

Hi, Sergey!
Thanks for the patch!
Please consider my comments below.
On Thu, Aug 31, 2023 at 02:30:38PM +0300, Sergey Bronnikov via Tarantool-patches wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
> 
> (cherry-picked from commit 820339960123dc78a7ce03edf53fcf4fdae0e55d)
> 
> The original problem is specific to x32 and is as follows: when a chunk
> with a bytecode library is loaded into memory, and the address is higher
> than 0x80000100, the `LexState->pe`, that contains an address of the end
> of the bytecode chunk in the memory, will wrap around and become smaller
> than the address in `LexState->p`, that contains an address of the
> beginning of bytecode chunk in the memory. In `bcread_fill()` called by
> `bcread_want()`, `memcpy()` is called with a very large size and causes
> bus error on x86 and segmentation fault on ARM Android.
> 
> The problem cannot be reproduced on platforms supported by Tarantool
> (ARM64, x86_64), so test doesn't reproduce a problem without a patch and
> tests the patch partially.
> 
> Sergey Bronnikov:
> * added the description and the test
> ---
>  src/lib_package.c                             |  4 +-
>  src/lj_bcread.c                               | 10 +-
>  src/lj_lex.c                                  |  6 ++
>  src/lj_lex.h                                  |  1 +
>  .../lj-549-bytecode-loader.test.lua           | 96 +++++++++++++++++++
>  5 files changed, 110 insertions(+), 7 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-549-bytecode-loader.test.lua
> 
> diff --git a/src/lib_package.c b/src/lib_package.c
> index b49f0209..12603038 100644
> --- a/src/lib_package.c
> +++ b/src/lib_package.c
<snipped>

> diff --git a/src/lj_bcread.c b/src/lj_bcread.c
> index cddf6ff1..48ec15e4 100644
> --- a/src/lj_bcread.c
> +++ b/src/lj_bcread.c
<snipped>

> diff --git a/src/lj_lex.c b/src/lj_lex.c
> index cef3c683..6291705f 100644
> --- a/src/lj_lex.c
> +++ b/src/lj_lex.c
<snipped>

> diff --git a/src/lj_lex.h b/src/lj_lex.h
> index ae05a954..a26e504a 100644
> --- a/src/lj_lex.h
> +++ b/src/lj_lex.h
<snipped>

> diff --git a/test/tarantool-tests/lj-549-bytecode-loader.test.lua b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
> new file mode 100644
> index 00000000..889be80a
> --- /dev/null
> +++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
> @@ -0,0 +1,96 @@
> +local tap = require('tap')
> +local ffi = require('ffi')
> +local utils = require('utils')
> +local test = tap.test('lj-549-bytecode-loader'):skipcond({
> +    -- ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
Why this skipcond is commented out?
> +})
> +
> +test:plan(1)
> +
> +-- Test creates a shared library with LuaJIT bytecode,
> +-- loads shared library as a Lua module and checks,
> +-- that no crashes eliminated.
> +--
> +-- $ make HOST_CC='gcc -m32' TARGET_CFLAGS='-m32' \
> +--                           TARGET_LDFLAGS='-m32' \
> +--                           TARGET_SHLDFLAGS='-m32' \
> +--                           -f Makefile.original
> +-- $ echo 'print("test")' > a.lua
> +-- $ LUA_PATH="src/?.lua;;" luajit -b a.lua a.c
> +-- $ gcc -m32 -fPIC -shared a.c -o a.so
> +-- $ luajit -e "require('a')"
> +-- Program received signal SIGBUS, Bus error
> +
> +local function file_exists(fname)
> +   return io.open(fname, 'r') or true and false
> +end
> +
> +local function get_file_name(file)
> +    return file:match("[^/]*$")
> +end
> +
> +local stdout_msg = 'Lango team'
> +local lua_code = ('print(%q)'):format(stdout_msg)
> +local fpath = os.tmpname()
> +local path_lua = ('%s.lua'):format(fpath)
> +local path_c = ('%s.c'):format(fpath)
> +local path_so = ('%s.so'):format(fpath)
> +
> +-- Create a file with a minimal Lua code.
> +local fh = assert(io.open(path_lua, 'w'))
> +fh:write(lua_code)
> +fh:close()
> +
> +local module_name = assert(get_file_name(fpath))
> +
> +local basedir = function(path)
> +    local sep = '/'
> +    return path:match('(.*' .. sep .. ')') or './'
> +end
> +
> +-- Create a C file with LuaJIT bytecode.
> +-- We cannot use utils.makecmd, because command-line generated
> +-- by `makecmd` contains `-e` that is incompatible with option `-b`.
> +local function create_c_file(pathlua, pathc)
> +  local lua_path = os.getenv('LUA_PATH')
> +  local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
> +  local cmd_fmt = 'LUA_PATH="%s" %s -b %s %s'
> +  local cmd = (cmd_fmt):format(lua_path, lua_bin, pathlua, pathc)
> +  local ret = os.execute(cmd)
> +  assert(ret == 0, 'create a C file with bytecode')
> +end
> +
> +create_c_file(path_lua, path_c)
> +assert(file_exists(path_c))
> +
> +-- Compile C source code with LuaJIT bytecode to a shared library.
> +-- `-m64` is not available on ARM64, see
> +-- "3.18.1 AArch64 Options in the manual",
> +-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
> +local cflags_64 = jit.arch == 'arm64' and '-march=armv8-a' or '-m64'
> +local cflags = ffi.abi('32bit') and '-m32' or cflags_64
> +local cc_cmd = ('cc %s -fPIC -shared %s -o %s'):format(cflags, path_c, path_so)
> +local ph = io.popen(cc_cmd)
> +ph:close()
I suggest using the os.execute and checking the exit code.
Popen is excessive here.

> +assert(file_exists(path_so))
> +
> +-- Load shared library as a Lua module.
> +local lua_cpath = ('"/tmp/?.so;"'):format(basedir(fpath))
> +assert(file_exists(path_so))
> +local cmd = utils.exec.makecmd(arg, {
> +    script = ('-e "require([[%s]])"'):format(module_name),
> +    env = {
> +        LUA_CPATH = lua_cpath,
> +        -- It is required to cleanup LUA_PATH, otherwise
> +        -- LuaJIT loads Lua module, see tarantool-tests/utils/init.lua.
> +        LUA_PATH = '',
> +    },
> +})
> +local res = cmd()
> +test:ok(res == stdout_msg, 'bytecode loader works')
> +
> +os.remove(path_lua)
> +os.remove(path_c)
> +os.remove(path_so)
> +
> +os.exit(test:check() and 0 or 1)
`test:done` should be used instead.
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-08-31 11:32 ` [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for " Sergey Bronnikov via Tarantool-patches
@ 2023-09-01 10:05   ` Maxim Kokryashkin via Tarantool-patches
  2023-09-04 16:34     ` Sergey Bronnikov via Tarantool-patches
  2023-09-05 12:55   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 21+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-09-01 10:05 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: max.kokryashkin, tarantool-patches

On Thu, Aug 31, 2023 at 02:32:14PM +0300, Sergey Bronnikov via Tarantool-patches wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
> 
> (cherry-picked from commit e49863eda13d095b1a78fd4ca0fd3a6a9a17d782)
> 
> The patch follows up a previous patch and limits the total size of a
> chunk load by `lua_load` with size `LJ_MAX_BUF - 1`.
> 
> Sergey Bronnikov:
> * added the description and the test
> ---
>  src/lj_lex.c                                  |   1 +
>  test/tarantool-c-tests/lj-549-lua_load.test.c | 134 ++++++++++++++++++
>  2 files changed, 135 insertions(+)
>  create mode 100644 test/tarantool-c-tests/lj-549-lua_load.test.c
> 
> diff --git a/src/lj_lex.c b/src/lj_lex.c
> index 6291705f..13495c41 100644
> --- a/src/lj_lex.c
> +++ b/src/lj_lex.c
> @@ -51,6 +51,7 @@ static LJ_NOINLINE LexChar lex_more(LexState *ls)
>    if (sz >= LJ_MAX_BUF) {
>      if (sz != ~(size_t)0) lj_err_mem(ls->L);
>      sz = ~(uintptr_t)0 - (uintptr_t)p;
> +    if (sz >= LJ_MAX_BUF) sz = LJ_MAX_BUF-1;
>      ls->endmark = 1;
>    }
>    ls->pe = p + sz;
> diff --git a/test/tarantool-c-tests/lj-549-lua_load.test.c b/test/tarantool-c-tests/lj-549-lua_load.test.c
> new file mode 100644
> index 00000000..9baa7a1a
> --- /dev/null
> +++ b/test/tarantool-c-tests/lj-549-lua_load.test.c
> @@ -0,0 +1,134 @@
> +#include <assert.h>
> +#include <stdint.h>
> +#include <stddef.h>
> +#include <string.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +#include <lua.h>
> +#include <lualib.h>
> +#include <lauxlib.h>
> +
> +#include "test.h"
> +#include "utils.h"
> +
> +/* Need for skipcond. */
> +#include "lj_arch.h"
> +
> +/* Defined in lj_def.h. */
> +#define LJ_MAX_MEM32    0x7fffff00      /* Max. 32 bit memory allocation. */
> +#define LJ_MAX_BUF      LJ_MAX_MEM32    /* Max. buffer length. */
> +
> +/* Defined in lua.h. */
> +/* mark for precompiled code (`<esc>Lua') */
> +#define	LUA_SIGNATURE	"\033Lua"
> +
> +#define UNUSED(x) ((void)(x))
> +
> +/**
> + * Function generates a huge chunk of "bytecode" with a size bigger than
> + * LJ_MAX_BUF. Generated chunk must enable endmark in a Lex state.
> + */
> +static const char *
> +bc_reader_with_endmark(lua_State *L, void *data, size_t *size)
> +{
> +	UNUSED(data);
> +	int bc_chunk_size = (size_t)0;
> +	static char *bc_chunk = NULL;
> +	free(bc_chunk);
What's the point of free here? Why the buffer is static?
> +
> +	bc_chunk = malloc(bc_chunk_size);
Malloc of zero size doesn't seem to be the thing you wanted to do.
> +	assert(bc_chunk != NULL);
> +
> +	/**
> +	 * `lua_load` automatically detects whether the chunk is text or binary,
> +	 * and loads it accordingly. We need a trace for bytecode input,
> +	 * so it is necessary to deceive a check in lj_lex_setup, that
> +	 * makes a sanity check and detects whether input is bytecode or text
> +	 * by the first char. Put LUA_SIGNATURE[0] at the beginning of the
> +	 * allocated region.
> +	 */
> +	bc_chunk[0] = LUA_SIGNATURE[0];
> +
> +	*size = bc_chunk_size;
> +
> +	return bc_chunk;
> +}
> +
> +static int bc_loader_with_endmark(void *test_state)
> +{
> +	lua_State *L = test_state;
> +	void *ud = NULL;
> +	int res = lua_load(L, bc_reader_with_endmark, ud, "endmark");
> +
> +	/*
> +	 * Make sure we passed the condition with lj_err_mem in the function
> +	 * `lex_more`.
> +	 */
> +	assert_true(res != LUA_ERRMEM);
> +
> +	return TEST_EXIT_SUCCESS;
> +}
> +
> +enum bc_emission_state {
> +	EMIT_BC,
> +	EMIT_EOF,
> +};
> +
> +typedef struct {
> +	enum bc_emission_state state;
> +} dt;
> +
> +/**
> + * Function returns a bytecode chunk on the first call and NULL and size equal
> + * to zero on the second call. Triggers the END_OF_STREAM flag in the function
> + * `lex_more`.
> + */
> +static const char *
> +bc_reader_with_eof(lua_State *L, void *data, size_t *size)
> +{
> +	UNUSED(data);
> +	UNUSED(L);
> +	dt *test_data = (dt *)data;
> +	if (test_data->state == EMIT_EOF) {
This section is unreachable, isn't it?
> +		*size = 0;
> +		return NULL;
> +	}
> +
> +	static char *bc_chunk = NULL;
> +	free(bc_chunk);
Ditto.
> +
> +	size_t sz = 10;
Is there any reason for it to be exactly 10? Drop a comment.
> +	bc_chunk = malloc(sz);
> +	bc_chunk[0] = LUA_SIGNATURE[0];
> +	*size = sz;
> +
> +	return bc_chunk;
> +}
> +
> +static int bc_loader_with_eof(void *test_state)
> +{
> +	lua_State *L = test_state;
> +	dt test_data = {0};
> +	test_data.state = EMIT_BC;
> +	int res = lua_load(L, bc_reader_with_eof, &test_data, "eof");
> +	assert_true(res = LUA_ERRSYNTAX);
> +	if (res == LUA_OK) {
> +		lua_pcall(L, 0, 0, 0);
> +	}
> +
> +	return TEST_EXIT_SUCCESS;
> +}
> +
> +int main(void)
> +{
> +	lua_State *L = utils_lua_init();
> +	const struct test_unit tgroup[] = {
> +		test_unit_def(bc_loader_with_endmark),
> +		test_unit_def(bc_loader_with_eof)
> +	};
> +
> +	const int test_result = test_run_group(tgroup, L);
> +	utils_lua_close(L);
> +	return test_result;
> +}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-09-01  9:42   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-09-04  9:31     ` Sergey Bronnikov via Tarantool-patches
  2023-09-05  6:34       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-09-04  9:31 UTC (permalink / raw)
  To: Maxim Kokryashkin, Sergey Bronnikov; +Cc: max.kokryashkin, tarantool-patches

Hi, Maxim

On 9/1/23 12:42, Maxim Kokryashkin via Tarantool-patches wrote:


<snipped>

>> diff --git a/test/tarantool-tests/lj-549-bytecode-loader.test.lua b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
>> new file mode 100644
>> index 00000000..889be80a
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
>> @@ -0,0 +1,96 @@
>> +local tap = require('tap')
>> +local ffi = require('ffi')
>> +local utils = require('utils')
>> +local test = tap.test('lj-549-bytecode-loader'):skipcond({
>> +    -- ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
> Why this skipcond is commented out?

it is not needed anymore, so I removed it. Iterative patch is below:


--- a/test/tarantool-tests/lj-549-bytecode-loader.test.lua
+++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
@@ -1,9 +1,7 @@
  local tap = require('tap')
  local ffi = require('ffi')
  local utils = require('utils')
-local test = tap.test('lj-549-bytecode-loader'):skipcond({
-    -- ['Test requires GC64 mode enabled'] = not 
require('ffi').abi('gc64'),
-})
+local test = tap.test('lj-549-bytecode-loader')

  test:plan(1)

>> +})
>> +
>> +test:plan(1)
>> +
>> +-- Test creates a shared library with LuaJIT bytecode,
>> +-- loads shared library as a Lua module and checks,
>> +-- that no crashes eliminated.
>> +--
>> +-- $ make HOST_CC='gcc -m32' TARGET_CFLAGS='-m32' \
>> +--                           TARGET_LDFLAGS='-m32' \
>> +--                           TARGET_SHLDFLAGS='-m32' \
>> +--                           -f Makefile.original
>> +-- $ echo 'print("test")' > a.lua
>> +-- $ LUA_PATH="src/?.lua;;" luajit -b a.lua a.c
>> +-- $ gcc -m32 -fPIC -shared a.c -o a.so
>> +-- $ luajit -e "require('a')"
>> +-- Program received signal SIGBUS, Bus error
>> +
>> +local function file_exists(fname)
>> +   return io.open(fname, 'r') or true and false
>> +end
>> +
>> +local function get_file_name(file)
>> +    return file:match("[^/]*$")
>> +end
>> +
>> +local stdout_msg = 'Lango team'
>> +local lua_code = ('print(%q)'):format(stdout_msg)
>> +local fpath = os.tmpname()
>> +local path_lua = ('%s.lua'):format(fpath)
>> +local path_c = ('%s.c'):format(fpath)
>> +local path_so = ('%s.so'):format(fpath)
>> +
>> +-- Create a file with a minimal Lua code.
>> +local fh = assert(io.open(path_lua, 'w'))
>> +fh:write(lua_code)
>> +fh:close()
>> +
>> +local module_name = assert(get_file_name(fpath))
>> +
>> +local basedir = function(path)
>> +    local sep = '/'
>> +    return path:match('(.*' .. sep .. ')') or './'
>> +end
>> +
>> +-- Create a C file with LuaJIT bytecode.
>> +-- We cannot use utils.makecmd, because command-line generated
>> +-- by `makecmd` contains `-e` that is incompatible with option `-b`.
>> +local function create_c_file(pathlua, pathc)
>> +  local lua_path = os.getenv('LUA_PATH')
>> +  local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
>> +  local cmd_fmt = 'LUA_PATH="%s" %s -b %s %s'
>> +  local cmd = (cmd_fmt):format(lua_path, lua_bin, pathlua, pathc)
>> +  local ret = os.execute(cmd)
>> +  assert(ret == 0, 'create a C file with bytecode')
>> +end
>> +
>> +create_c_file(path_lua, path_c)
>> +assert(file_exists(path_c))
>> +
>> +-- Compile C source code with LuaJIT bytecode to a shared library.
>> +-- `-m64` is not available on ARM64, see
>> +-- "3.18.1 AArch64 Options in the manual",
>> +-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
>> +local cflags_64 = jit.arch == 'arm64' and '-march=armv8-a' or '-m64'
>> +local cflags = ffi.abi('32bit') and '-m32' or cflags_64
>> +local cc_cmd = ('cc %s -fPIC -shared %s -o %s'):format(cflags, path_c, path_so)
>> +local ph = io.popen(cc_cmd)
>> +ph:close()
> I suggest using the os.execute and checking the exit code.
> Popen is excessive here.

replaced, iterative patch is below

--- a/test/tarantool-tests/lj-549-bytecode-loader.test.lua
+++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
@@ -68,8 +68,8 @@ assert(file_exists(path_c))
  local cflags_64 = jit.arch == 'arm64' and '-march=armv8-a' or '-m64'
  local cflags = ffi.abi('32bit') and '-m32' or cflags_64
  local cc_cmd = ('cc %s -fPIC -shared %s -o %s'):format(cflags, path_c, 
path_so)
-local ph = io.popen(cc_cmd)
-ph:close()
+local rc = os.execute(cc_cmd)
+assert(rc == 0)
  assert(file_exists(path_so))

  -- Load shared library as a Lua module.

>
>> +assert(file_exists(path_so))
>> +
>> +-- Load shared library as a Lua module.
>> +local lua_cpath = ('"/tmp/?.so;"'):format(basedir(fpath))
>> +assert(file_exists(path_so))
>> +local cmd = utils.exec.makecmd(arg, {
>> +    script = ('-e "require([[%s]])"'):format(module_name),
>> +    env = {
>> +        LUA_CPATH = lua_cpath,
>> +        -- It is required to cleanup LUA_PATH, otherwise
>> +        -- LuaJIT loads Lua module, see tarantool-tests/utils/init.lua.
>> +        LUA_PATH = '',
>> +    },
>> +})
>> +local res = cmd()
>> +test:ok(res == stdout_msg, 'bytecode loader works')
>> +
>> +os.remove(path_lua)
>> +os.remove(path_c)
>> +os.remove(path_so)
>> +
>> +os.exit(test:check() and 0 or 1)
> `test:done` should be used instead.

Fixed.


@@ -91,4 +91,4 @@ os.remove(path_lua)
  os.remove(path_c)
  os.remove(path_so)

-os.exit(test:check() and 0 or 1)
+test:done(true)

>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-09-01 10:05   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-09-04 16:34     ` Sergey Bronnikov via Tarantool-patches
  2023-09-05  6:45       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-09-04 16:34 UTC (permalink / raw)
  To: Maxim Kokryashkin, Sergey Bronnikov; +Cc: max.kokryashkin, tarantool-patches

Hi, Max

On 9/1/23 13:05, Maxim Kokryashkin via Tarantool-patches wrote:
> On Thu, Aug 31, 2023 at 02:32:14PM +0300, Sergey Bronnikov via Tarantool-patches wrote:
>> From: Sergey Bronnikov <sergeyb@tarantool.org>
<snipped>
>> +
>> +/**
>> + * Function generates a huge chunk of "bytecode" with a size bigger than
>> + * LJ_MAX_BUF. Generated chunk must enable endmark in a Lex state.
>> + */
>> +static const char *
>> +bc_reader_with_endmark(lua_State *L, void *data, size_t *size)
>> +{
>> +	UNUSED(data);
>> +	int bc_chunk_size = (size_t)0;
>> +	static char *bc_chunk = NULL;
>> +	free(bc_chunk);
> What's the point of free here? Why the buffer is static?

Because callee (aka Reader) is responsible for buffer, reader 
initializes it once

and then reuse.

 > The block must exist until the reader function is called again.

 > To signal the end of the chunk, the reader must return NULL or set 
size to zero.

 > The reader function may return pieces of any size greater than zero.


1. http://www.lua.org/manual/5.1/manual.html#lua_Reader


>> +
>> +	bc_chunk = malloc(bc_chunk_size);
> Malloc of zero size doesn't seem to be the thing you wanted to do.

Right. Updated:


@@ -33,26 +33,9 @@ static const char *
  bc_reader_with_endmark(lua_State *L, void *data, size_t *size)
  {
      UNUSED(data);
-    int bc_chunk_size = (size_t)0;
-    static char *bc_chunk = NULL;
-    free(bc_chunk);
-
-    bc_chunk = malloc(bc_chunk_size);
-    assert(bc_chunk != NULL);
-
-    /**
-     * `lua_load` automatically detects whether the chunk is text or 
binary,
-     * and loads it accordingly. We need a trace for bytecode input,
-     * so it is necessary to deceive a check in lj_lex_setup, that
-     * makes a sanity check and detects whether input is bytecode or text
-     * by the first char. Put LUA_SIGNATURE[0] at the beginning of the
-     * allocated region.
-     */
-    bc_chunk[0] = LUA_SIGNATURE[0];
-
-    *size = bc_chunk_size;
+    *size = ~(size_t)0;

-    return bc_chunk;
+    return NULL;
  }

  static int bc_loader_with_endmark(void *test_state)

>> +	assert(bc_chunk != NULL);
>> +
>> +	/**
>> +	 * `lua_load` automatically detects whether the chunk is text or binary,
>> +	 * and loads it accordingly. We need a trace for bytecode input,
>> +	 * so it is necessary to deceive a check in lj_lex_setup, that
>> +	 * makes a sanity check and detects whether input is bytecode or text
>> +	 * by the first char. Put LUA_SIGNATURE[0] at the beginning of the
>> +	 * allocated region.
>> +	 */
>> +	bc_chunk[0] = LUA_SIGNATURE[0];
>> +
>> +	*size = bc_chunk_size;
>> +
>> +	return bc_chunk;
>> +}
>> +
>> +static int bc_loader_with_endmark(void *test_state)
>> +{
>> +	lua_State *L = test_state;
>> +	void *ud = NULL;
>> +	int res = lua_load(L, bc_reader_with_endmark, ud, "endmark");
>> +
>> +	/*
>> +	 * Make sure we passed the condition with lj_err_mem in the function
>> +	 * `lex_more`.
>> +	 */
>> +	assert_true(res != LUA_ERRMEM);
>> +
>> +	return TEST_EXIT_SUCCESS;
>> +}
>> +
>> +enum bc_emission_state {
>> +	EMIT_BC,
>> +	EMIT_EOF,
>> +};
>> +
>> +typedef struct {
>> +	enum bc_emission_state state;
>> +} dt;
>> +
>> +/**
>> + * Function returns a bytecode chunk on the first call and NULL and size equal
>> + * to zero on the second call. Triggers the END_OF_STREAM flag in the function
>> + * `lex_more`.
>> + */
>> +static const char *
>> +bc_reader_with_eof(lua_State *L, void *data, size_t *size)
>> +{
>> +	UNUSED(data);
>> +	UNUSED(L);
>> +	dt *test_data = (dt *)data;
>> +	if (test_data->state == EMIT_EOF) {
> This section is unreachable, isn't it?

Right, fixed it.


>> +		*size = 0;
>> +		return NULL;
>> +	}
>> +
>> +	static char *bc_chunk = NULL;
>> +	free(bc_chunk);
> Ditto.
>> +
>> +	size_t sz = 10;
> Is there any reason for it to be exactly 10? Drop a comment.

Set it to 2 and dropped a comment.


@@ -98,10 +81,23 @@ bc_reader_with_eof(lua_State *L, void *data, size_t 
*size)
      static char *bc_chunk = NULL;
      free(bc_chunk);

-    size_t sz = 10;
+    /**
+     * Minimal size of a buffer with bytecode:
+     * signiture (1 byte) and a bytecode itself (1 byte).
+     */
+    size_t sz = 2;
      bc_chunk = malloc(sz);
+    /**
+     * `lua_load` automatically detects whether the chunk is text or 
binary,
+     * and loads it accordingly. We need a trace for *bytecode* input,
+     * so it is necessary to deceive a check in `lj_lex_setup`, that
+     * makes a sanity check and detects whether input is bytecode or text
+     * by the first char. Put `LUA_SIGNATURE[0]` at the beginning of the
+     * allocated region.
+     */
      bc_chunk[0] = LUA_SIGNATURE[0];
      *size = sz;
+    test_data->state = EMIT_EOF;

      return bc_chunk;
  }


<snipped>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-09-04  9:31     ` Sergey Bronnikov via Tarantool-patches
@ 2023-09-05  6:34       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 21+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-09-05  6:34 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: Sergey Bronnikov, max.kokryashkin, tarantool-patches

Hi, Sergey!
Thanks for the fixes!
LGTM

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-09-04 16:34     ` Sergey Bronnikov via Tarantool-patches
@ 2023-09-05  6:45       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 21+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-09-05  6:45 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: Sergey Bronnikov, max.kokryashkin, tarantool-patches

Hi, Sergey!
Thanks for the fixes!
LGTM
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-08-31 11:32 ` [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for " Sergey Bronnikov via Tarantool-patches
  2023-09-01 10:05   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-09-05 12:55   ` Sergey Kaplun via Tarantool-patches
  2023-09-07  7:04     ` Sergey Bronnikov via Tarantool-patches
  1 sibling, 1 reply; 21+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-09-05 12:55 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: max.kokryashkin, tarantool-patches

Hi, Sergey!
Thanks for the patch!
Please, consider my comments below.

On 31.08.23, Sergey Bronnikov wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
>
> (cherry-picked from commit e49863eda13d095b1a78fd4ca0fd3a6a9a17d782)
>
> The patch follows up a previous patch and limits the total size of a
> chunk load by `lua_load` with size `LJ_MAX_BUF - 1`.
>
> Sergey Bronnikov:
> * added the description and the test
> ---
>  src/lj_lex.c                                  |   1 +
>  test/tarantool-c-tests/lj-549-lua_load.test.c | 134 ++++++++++++++++++

I suggest renaming the test to lj-549-lua-load.test.c to be consistent with
other tests.

>  2 files changed, 135 insertions(+)
>  create mode 100644 test/tarantool-c-tests/lj-549-lua_load.test.c
>
> diff --git a/src/lj_lex.c b/src/lj_lex.c
> index 6291705f..13495c41 100644
> --- a/src/lj_lex.c
> +++ b/src/lj_lex.c
> @@ -51,6 +51,7 @@ static LJ_NOINLINE LexChar lex_more(LexState *ls)
>    if (sz >= LJ_MAX_BUF) {
>      if (sz != ~(size_t)0) lj_err_mem(ls->L);
>      sz = ~(uintptr_t)0 - (uintptr_t)p;
> +    if (sz >= LJ_MAX_BUF) sz = LJ_MAX_BUF-1;
>      ls->endmark = 1;
>    }
>    ls->pe = p + sz;
> diff --git a/test/tarantool-c-tests/lj-549-lua_load.test.c b/test/tarantool-c-tests/lj-549-lua_load.test.c
> new file mode 100644
> index 00000000..9baa7a1a
> --- /dev/null
> +++ b/test/tarantool-c-tests/lj-549-lua_load.test.c
> @@ -0,0 +1,134 @@
> +#include <assert.h>

This include is excess.

> +#include <stdint.h>

Ditto.

> +#include <stddef.h>
> +#include <string.h>

Ditto.

> +#include <stdlib.h>
> +#include <stdio.h>

Ditto.

> +
> +#include <lua.h>
> +#include <lualib.h>

This include is excess since all libs are opened via utils.

> +#include <lauxlib.h>

This include is excess since there is no `luaL*` functions or structures
usage (and there is no usage of the `LUA_ERRFILE`, `LUA_NOREF`,
`LUA_REFNIL`).

> +
> +#include "test.h"
> +#include "utils.h"
> +
> +/* Need for skipcond. */
> +#include "lj_arch.h"

There is no skipconditions, so this include may be dropped.

> +
> +/* Defined in lj_def.h. */
> +#define LJ_MAX_MEM32    0x7fffff00      /* Max. 32 bit memory allocation. */
> +#define LJ_MAX_BUF      LJ_MAX_MEM32    /* Max. buffer length. */

Why don't use `#include "lj_def.h"` instead and mention what we need
from it?
Reminder: this is kind of unit tests (or these C tests may implement
unit test). So, we can include internal libraries, and this is OK for
__our C tests__.

> +
> +/* Defined in lua.h. */
> +/* mark for precompiled code (`<esc>Lua') */
> +#define	LUA_SIGNATURE	"\033Lua"

We already included <lua.h>, so this define isn't required.

> +
> +#define UNUSED(x) ((void)(x))
> +
> +/**

There is no need in double '*' outside functions (we're not in Kansas
anymore. :))
I suggest to be consistent with other tests codebase and use just `/*`.

> + * Function generates a huge chunk of "bytecode" with a size bigger than
> + * LJ_MAX_BUF. Generated chunk must enable endmark in a Lex state.

Nit: Comment line width is greater than 66 symbols.
Typo: s/Generated/The generated/

(I'll proceed with the branch verison below.)

| static const char *
| bc_reader_with_endmark(lua_State *L, void *data, size_t *size)

The comment is desirable about the resulting chunk:
According the Lua 5.1 Reference Manual:
| To signal the end of the chunk, the reader must return `NULL` or set
| `size` to zero.

So, since this function returns `NULL`, the resulting chunk should be
treated as "". Which provides the following bytecode:
| "endmark":0-1
| 0000 FUNCV  rbase:   1
| 0001 RET0   rbase:   0 lit:     1

This is also avoids test's failure before the patch: we just return
earlier:

| <src/lj_lex.c:50>
| if (p == NULL || sz == 0) return LEX_EOF;

So, looks like the test doesn't check the patch itself.

> + {
> + 	UNUSED(data);
> + 	*size = ~(size_t)0;
> +
> + 	return NULL;
> + }
> +
> + static int bc_loader_with_endmark(void *test_state)
> + {
> + 	lua_State *L = test_state;
> + 	void *ud = NULL;
> + 	int res = lua_load(L, bc_reader_with_endmark, ud, "endmark");
> +
> + 	/*
> + 	 * Make sure we passed the condition with lj_err_mem in the function

Nit: Comment line width is greater than 66 symbols.

> + 	 * `lex_more`.
> + 	 */
> + 	assert_true(res != LUA_ERRMEM);

Maybe it's better to use here codition res == LUA_OK?

> + 	lua_settop(L, 0);
> +
> + 	return TEST_EXIT_SUCCESS;
> + }
> +
> + enum bc_emission_state {
> + 	EMIT_BC,
> + 	EMIT_EOF,
> + };
> +
> + typedef struct {
> + 	enum bc_emission_state state;
> + } dt;
> +
> + /**

Typo: s</**></*>

> +  * Function returns a bytecode chunk on the first call and NULL
> +  * and size equal to zero on the second call. Triggers the flag
> +  * `END_OF_STREAM` in the function `lex_more`.
> +  */
> + static const char *
> + bc_reader_with_eof(lua_State *L, void *data, size_t *size)
> + {
> + 	UNUSED(L);
> + 	dt *test_data = (dt *)data;
> + 	if (test_data->state == EMIT_EOF) {
> + 		*size = 0;
> + 		return NULL;
> + 	}
> +
> + 	static char *bc_chunk = NULL;
> + 	free(bc_chunk);

This free is called only once, when bc_chunk is already NULL.
I suggest moving the initialization of the `bc_chunk` to the beginning
of the scope and calling `free()` only for the `EMIT_EOF` state (it's
also a little bit more readable -- a reader shouldn't remember that
`free(NULL)` is OK).

> +
> + 	/**

Typo: s</**></*>

> + 	 * Minimal size of a buffer with bytecode:
> + 	 * signiture (1 byte) and a bytecode itself (1 byte).

Typo: s/a bytecode/the bytecode/
Typo: s/signiture/The signature/

> + 	 */
> + 	size_t sz = 2;
> + 	bc_chunk = malloc(sz);
> + 	/**

Typo: s</**></*>

> + 	 * `lua_load` automatically detects whether the chunk is text or binary,

Typo: s/binary,/binary/

> + 	 * and loads it accordingly. We need a trace for *bytecode* input,
> + 	 * so it is necessary to deceive a check in `lj_lex_setup`, that
> + 	 * makes a sanity check and detects whether input is bytecode or text
> + 	 * by the first char. Put `LUA_SIGNATURE[0]` at the beginning of the
> + 	 * allocated region.

Nit: Comment line width is greater than 66 symbols.

> + 	 */
> + 	bc_chunk[0] = LUA_SIGNATURE[0];
> + 	*size = sz;
> + 	test_data->state = EMIT_EOF;
> +
> + 	return bc_chunk;
> + }
> +
> + static int bc_loader_with_eof(void *test_state)
> + {
> + 	lua_State *L = test_state;
> + 	dt test_data = {0};
> + 	test_data.state = EMIT_BC;
> + 	int res = lua_load(L, bc_reader_with_eof, &test_data, "eof");
> + 	assert_true(res = LUA_ERRSYNTAX);

Typo: s/=/==/
But res is indeed `LUA_ERRSYNTAX` for now :).

> + 	lua_settop(L, 0);
> +
> + 	return TEST_EXIT_SUCCESS;
> + }
> +
> + int main(void)
> + {
> + 	lua_State *L = utils_lua_init();
> + 	const struct test_unit tgroup[] = {
> + 		test_unit_def(bc_loader_with_endmark),
> + 		test_unit_def(bc_loader_with_eof)
> + 	};
> +
> + 	const int test_result = test_run_group(tgroup, L);
> + 	utils_lua_close(L);
> + 	return test_result;
> + }

[1]: https://www.lua.org/manual/5.1/manual.html#lua_Reader

--
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-08-31 11:30 ` [Tarantool-patches] [PATCH luajit 1/2][v2] " Sergey Bronnikov via Tarantool-patches
  2023-08-31 11:49   ` Sergey Bronnikov via Tarantool-patches
  2023-09-01  9:42   ` Maxim Kokryashkin via Tarantool-patches
@ 2023-09-05 14:10   ` Sergey Kaplun via Tarantool-patches
  2023-09-07 15:21     ` Sergey Bronnikov via Tarantool-patches
  2023-09-05 14:12   ` Sergey Kaplun via Tarantool-patches
  3 siblings, 1 reply; 21+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-09-05 14:10 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: max.kokryashkin, tarantool-patches

Hi, Sergey!
Thanks for the patch!
Please, consider my comments below.

On 31.08.23, Sergey Bronnikov wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
> 
> (cherry-picked from commit 820339960123dc78a7ce03edf53fcf4fdae0e55d)
> 
> The original problem is specific to x32 and is as follows: when a chunk
> with a bytecode library is loaded into memory, and the address is higher
> than 0x80000100, the `LexState->pe`, that contains an address of the end
> of the bytecode chunk in the memory, will wrap around and become smaller
> than the address in `LexState->p`, that contains an address of the
> beginning of bytecode chunk in the memory. In `bcread_fill()` called by
> `bcread_want()`, `memcpy()` is called with a very large size and causes
> bus error on x86 and segmentation fault on ARM Android.

Typo: s/bus error/the bus error/
Typo: s/segmentation fault/the segmentation fault/

> 
> The problem cannot be reproduced on platforms supported by Tarantool
> (ARM64, x86_64), so test doesn't reproduce a problem without a patch and
> tests the patch partially.
> 
> Sergey Bronnikov:
> * added the description and the test
> ---
>  src/lib_package.c                             |  4 +-
>  src/lj_bcread.c                               | 10 +-
>  src/lj_lex.c                                  |  6 ++
>  src/lj_lex.h                                  |  1 +
>  .../lj-549-bytecode-loader.test.lua           | 96 +++++++++++++++++++
>  5 files changed, 110 insertions(+), 7 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-549-bytecode-loader.test.lua
> 

<snipped>

> diff --git a/test/tarantool-tests/lj-549-bytecode-loader.test.lua b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
> new file mode 100644
> index 00000000..889be80a
> --- /dev/null
> +++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
> @@ -0,0 +1,96 @@
> +local tap = require('tap')
> +local ffi = require('ffi')
> +local utils = require('utils')
> +local test = tap.test('lj-549-bytecode-loader'):skipcond({
> +    -- ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
> +})

Minor: It's better to require ffi and utils after test initialization
via `tap.test()`, see other tests for example.
Also, I suppose that we don't need `utils` itself, but
`utils.exec.makecmd`.

> +
> +test:plan(1)
> +
> +-- Test creates a shared library with LuaJIT bytecode,
> +-- loads shared library as a Lua module and checks,
> +-- that no crashes eliminated.
> +--
> +-- $ make HOST_CC='gcc -m32' TARGET_CFLAGS='-m32' \
> +--                           TARGET_LDFLAGS='-m32' \
> +--                           TARGET_SHLDFLAGS='-m32' \
> +--                           -f Makefile.original
> +-- $ echo 'print("test")' > a.lua
> +-- $ LUA_PATH="src/?.lua;;" luajit -b a.lua a.c
> +-- $ gcc -m32 -fPIC -shared a.c -o a.so
> +-- $ luajit -e "require('a')"
> +-- Program received signal SIGBUS, Bus error
> +
> +local function file_exists(fname)
> +   return io.open(fname, 'r') or true and false

OK, this is a little bit confusing:
If file doesn't exists we go to `or true` and after check `and false`
which is always false. Tricky, but works.

Also, here we don't close file handler.
I suggest it is better to rewrite this as the following:
| local fh = io.open(name, 'r')
| return fh and io.close(fh)

It is simplier to read, and fixes problem with leaking handler.

> +end
> +
> +local function get_file_name(file)
> +    return file:match("[^/]*$")

Minor: it may match the empty string for a directory occasionally:
| src/luajit -e 'print([["]]..("/tmp/"):match("[^/]*$")..[["]])'
| ""

Nit: use single quotes instead of double quotes if possible.

Nit: `[^/\\]` is better since it also covers Windows.
See <test/lua-Harness-tests/314-regex.t:167>
| local dirname = arg[0]:gsub('([^/\\]+)$', '')
Since we don't support Windows feel free to ignore.

> +end
> +
> +local stdout_msg = 'Lango team'
> +local lua_code = ('print(%q)'):format(stdout_msg)
> +local fpath = os.tmpname()
> +local path_lua = ('%s.lua'):format(fpath)
> +local path_c = ('%s.c'):format(fpath)
> +local path_so = ('%s.so'):format(fpath)

Minor: I suppose it should be renamed to `path_shared`, since on macOS
they have the ".dyld" suffix for shared libs. Hence, we need to use the
suffix in format of the shared library name too. You may take some
inspiration from here [1].

> +
> +-- Create a file with a minimal Lua code.
> +local fh = assert(io.open(path_lua, 'w'))
> +fh:write(lua_code)
> +fh:close()
> +
> +local module_name = assert(get_file_name(fpath))
> +
> +local basedir = function(path)
> +    local sep = '/'

Why do we need an additional variable here?

Nit: Indent is 4 spaces instead of 2.

> +    return path:match('(.*' .. sep .. ')') or './'

It's better to mention that the pattern matching is greedy, so we match
until the last separator.

> +end
> +
> +-- Create a C file with LuaJIT bytecode.
> +-- We cannot use utils.makecmd, because command-line generated
> +-- by `makecmd` contains `-e` that is incompatible with option `-b`.

Nit: comment line width is more than 66 symbols

> +local function create_c_file(pathlua, pathc)
> +  local lua_path = os.getenv('LUA_PATH')
> +  local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
> +  local cmd_fmt = 'LUA_PATH="%s" %s -b %s %s'
> +  local cmd = (cmd_fmt):format(lua_path, lua_bin, pathlua, pathc)
> +  local ret = os.execute(cmd)
> +  assert(ret == 0, 'create a C file with bytecode')
> +end
> +
> +create_c_file(path_lua, path_c)
> +assert(file_exists(path_c))

Minor: The test flow is a little bit hard to read due to function
declarations. Maybe it is better to declare all utility functions first
and then use them one by one? This makes control flow easier to read.

> +
> +-- Compile C source code with LuaJIT bytecode to a shared library.
> +-- `-m64` is not available on ARM64, see
> +-- "3.18.1 AArch64 Options in the manual",
> +-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
> +local cflags_64 = jit.arch == 'arm64' and '-march=armv8-a' or '-m64'
> +local cflags = ffi.abi('32bit') and '-m32' or cflags_64
> +local cc_cmd = ('cc %s -fPIC -shared %s -o %s'):format(cflags, path_c, path_so)
> +local ph = io.popen(cc_cmd)
> +ph:close()
> +assert(file_exists(path_so))
> +
> +-- Load shared library as a Lua module.
> +local lua_cpath = ('"/tmp/?.so;"'):format(basedir(fpath))
> +assert(file_exists(path_so))
> +local cmd = utils.exec.makecmd(arg, {
> +    script = ('-e "require([[%s]])"'):format(module_name),

Nit: Indent is 4 spaces instead of 2.

> +    env = {
> +        LUA_CPATH = lua_cpath,

Nit: Indent is 4 spaces instead of 2.

> +        -- It is required to cleanup LUA_PATH, otherwise
> +        -- LuaJIT loads Lua module, see tarantool-tests/utils/init.lua.

Nit: comment line width is more than 66 symbols

Actually I don't understand from the comment what Lua module exactly is
loaded. Maybe it's better to fix this behaviour?
Feel free to ignore.

> +        LUA_PATH = '',
> +    },
> +})
> +local res = cmd()
> +test:ok(res == stdout_msg, 'bytecode loader works')
> +
> +os.remove(path_lua)
> +os.remove(path_c)
> +os.remove(path_so)
> +
> +os.exit(test:check() and 0 or 1)
> -- 
> 2.34.1
> 

[1]: https://github.com/tarantool/tarantool/blob/dc8973c3de6311ab11df8d43520e1d40de4b9c7b/test/box/func_reload.test.lua#L5

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-08-31 11:30 ` [Tarantool-patches] [PATCH luajit 1/2][v2] " Sergey Bronnikov via Tarantool-patches
                     ` (2 preceding siblings ...)
  2023-09-05 14:10   ` Sergey Kaplun via Tarantool-patches
@ 2023-09-05 14:12   ` Sergey Kaplun via Tarantool-patches
  2023-09-07  7:06     ` Sergey Bronnikov via Tarantool-patches
  3 siblings, 1 reply; 21+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-09-05 14:12 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: max.kokryashkin, tarantool-patches

Hi!
One more nit:

On 31.08.23, Sergey Bronnikov wrote:

<snipped>

> +-- Compile C source code with LuaJIT bytecode to a shared library.
> +-- `-m64` is not available on ARM64, see
> +-- "3.18.1 AArch64 Options in the manual",
> +-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

Missed dot at the end of the sentence.

<snipped>

> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-09-05 12:55   ` Sergey Kaplun via Tarantool-patches
@ 2023-09-07  7:04     ` Sergey Bronnikov via Tarantool-patches
  2023-09-11  9:26       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-09-07  7:04 UTC (permalink / raw)
  To: Sergey Kaplun, Sergey Bronnikov; +Cc: tarantool-patches, max.kokryashkin

Hi, Sergey

On 9/5/23 15:55, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please, consider my comments below.
>
> On 31.08.23, Sergey Bronnikov wrote:
>> From: Sergey Bronnikov <sergeyb@tarantool.org>
>>
>> (cherry-picked from commit e49863eda13d095b1a78fd4ca0fd3a6a9a17d782)
>>
>> The patch follows up a previous patch and limits the total size of a
>> chunk load by `lua_load` with size `LJ_MAX_BUF - 1`.
>>
>> Sergey Bronnikov:
>> * added the description and the test
>> ---
>>   src/lj_lex.c                                  |   1 +
>>   test/tarantool-c-tests/lj-549-lua_load.test.c | 134 ++++++++++++++++++
> I suggest renaming the test to lj-549-lua-load.test.c to be consistent with
> other tests.
Renamed.
>
>>   2 files changed, 135 insertions(+)
>>   create mode 100644 test/tarantool-c-tests/lj-549-lua_load.test.c
>>
>> diff --git a/src/lj_lex.c b/src/lj_lex.c
>> index 6291705f..13495c41 100644
>> --- a/src/lj_lex.c
>> +++ b/src/lj_lex.c
>> @@ -51,6 +51,7 @@ static LJ_NOINLINE LexChar lex_more(LexState *ls)
>>     if (sz >= LJ_MAX_BUF) {
>>       if (sz != ~(size_t)0) lj_err_mem(ls->L);
>>       sz = ~(uintptr_t)0 - (uintptr_t)p;
>> +    if (sz >= LJ_MAX_BUF) sz = LJ_MAX_BUF-1;
>>       ls->endmark = 1;
>>     }
>>     ls->pe = p + sz;
>> diff --git a/test/tarantool-c-tests/lj-549-lua_load.test.c b/test/tarantool-c-tests/lj-549-lua_load.test.c
>> new file mode 100644
>> index 00000000..9baa7a1a
>> --- /dev/null
>> +++ b/test/tarantool-c-tests/lj-549-lua_load.test.c
>> @@ -0,0 +1,134 @@
>> +#include <assert.h>
> This include is excess.
Removed.
>
>> +#include <stdint.h>
> Ditto.

Removed.


>
>> +#include <stddef.h>
>> +#include <string.h>
> Ditto.

Removed.


>
>> +#include <stdlib.h>
>> +#include <stdio.h>
> Ditto.
Removed.
>
>> +
>> +#include <lua.h>
>> +#include <lualib.h>
> This include is excess since all libs are opened via utils.
Removed.
>
>> +#include <lauxlib.h>
> This include is excess since there is no `luaL*` functions or structures
> usage (and there is no usage of the `LUA_ERRFILE`, `LUA_NOREF`,
> `LUA_REFNIL`).
Removed.
>
>> +
>> +#include "test.h"
>> +#include "utils.h"
>> +
>> +/* Need for skipcond. */
>> +#include "lj_arch.h"
> There is no skipconditions, so this include may be dropped.
Removed.
>
>> +
>> +/* Defined in lj_def.h. */
>> +#define LJ_MAX_MEM32    0x7fffff00      /* Max. 32 bit memory allocation. */
>> +#define LJ_MAX_BUF      LJ_MAX_MEM32    /* Max. buffer length. */
> Why don't use `#include "lj_def.h"` instead and mention what we need
> from it?
> Reminder: this is kind of unit tests (or these C tests may implement
> unit test). So, we can include internal libraries, and this is OK for
> __our C tests__.
>
Fixed.
>> +
>> +/* Defined in lua.h. */
>> +/* mark for precompiled code (`<esc>Lua') */
>> +#define	LUA_SIGNATURE	"\033Lua"
> We already included <lua.h>, so this define isn't required.
Fixed.
>
>> +
>> +#define UNUSED(x) ((void)(x))
>> +
>> +/**
> There is no need in double '*' outside functions (we're not in Kansas
> anymore. :))
> I suggest to be consistent with other tests codebase and use just `/*`.
Fixed.
>
>> + * Function generates a huge chunk of "bytecode" with a size bigger than
>> + * LJ_MAX_BUF. Generated chunk must enable endmark in a Lex state.
> Nit: Comment line width is greater than 66 symbols.
> Typo: s/Generated/The generated/
Fixed.
>
> (I'll proceed with the branch verison below.)
>
> | static const char *
> | bc_reader_with_endmark(lua_State *L, void *data, size_t *size)
>
> The comment is desirable about the resulting chunk:
> According the Lua 5.1 Reference Manual:
> | To signal the end of the chunk, the reader must return `NULL` or set
> | `size` to zero.
>
> So, since this function returns `NULL`, the resulting chunk should be
> treated as "". Which provides the following bytecode:
> | "endmark":0-1
> | 0000 FUNCV  rbase:   1
> | 0001 RET0   rbase:   0 lit:     1
>
> This is also avoids test's failure before the patch: we just return
> earlier:
>
> | <src/lj_lex.c:50>
> | if (p == NULL || sz == 0) return LEX_EOF;
>
> So, looks like the test doesn't check the patch itself.

This is exactly the case covered by testcase,

note the name of testcase and Reader function contains postfix "_eof".

Yes, doesn't check the patch itself, because it is not trivial to test 
endmark introduced in patch.

I added note about test to the second commit too.

>
>> + {
>> + 	UNUSED(data);
>> + 	*size = ~(size_t)0;
>> +
>> + 	return NULL;
>> + }
>> +
>> + static int bc_loader_with_endmark(void *test_state)
>> + {
>> + 	lua_State *L = test_state;
>> + 	void *ud = NULL;
>> + 	int res = lua_load(L, bc_reader_with_endmark, ud, "endmark");
>> +
>> + 	/*
>> + 	 * Make sure we passed the condition with lj_err_mem in the function
> Nit: Comment line width is greater than 66 symbols.
Fixed.
>
>> + 	 * `lex_more`.
>> + 	 */
>> + 	assert_true(res != LUA_ERRMEM);
> Maybe it's better to use here codition res == LUA_OK?
>
>> + 	lua_settop(L, 0);
>> +
>> + 	return TEST_EXIT_SUCCESS;
>> + }
>> +
>> + enum bc_emission_state {
>> + 	EMIT_BC,
>> + 	EMIT_EOF,
>> + };
>> +
>> + typedef struct {
>> + 	enum bc_emission_state state;
>> + } dt;
>> +
>> + /**
> Typo: s</**></*>

Fixed.


>
>> +  * Function returns a bytecode chunk on the first call and NULL
>> +  * and size equal to zero on the second call. Triggers the flag
>> +  * `END_OF_STREAM` in the function `lex_more`.
>> +  */
>> + static const char *
>> + bc_reader_with_eof(lua_State *L, void *data, size_t *size)
>> + {
>> + 	UNUSED(L);
>> + 	dt *test_data = (dt *)data;
>> + 	if (test_data->state == EMIT_EOF) {
>> + 		*size = 0;
>> + 		return NULL;
>> + 	}
>> +
>> + 	static char *bc_chunk = NULL;
>> + 	free(bc_chunk);
> This free is called only once, when bc_chunk is already NULL.
> I suggest moving the initialization of the `bc_chunk` to the beginning
> of the scope and calling `free()` only for the `EMIT_EOF` state (it's
> also a little bit more readable -- a reader shouldn't remember that
> `free(NULL)` is OK).
Updated.
>
>> +
>> + 	/**
> Typo: s</**></*>
Fixed.
>
>> + 	 * Minimal size of a buffer with bytecode:
>> + 	 * signiture (1 byte) and a bytecode itself (1 byte).
> Typo: s/a bytecode/the bytecode/
> Typo: s/signiture/The signature/


Fixed.

>
>> + 	 */
>> + 	size_t sz = 2;
>> + 	bc_chunk = malloc(sz);
>> + 	/**
> Typo: s</**></*>
Fixed.
>
>> + 	 * `lua_load` automatically detects whether the chunk is text or binary,
> Typo: s/binary,/binary/
Fixed.
>
>> + 	 * and loads it accordingly. We need a trace for *bytecode* input,
>> + 	 * so it is necessary to deceive a check in `lj_lex_setup`, that
>> + 	 * makes a sanity check and detects whether input is bytecode or text
>> + 	 * by the first char. Put `LUA_SIGNATURE[0]` at the beginning of the
>> + 	 * allocated region.
> Nit: Comment line width is greater than 66 symbols.
Fixed.
>
>> + 	 */
>> + 	bc_chunk[0] = LUA_SIGNATURE[0];
>> + 	*size = sz;
>> + 	test_data->state = EMIT_EOF;
>> +
>> + 	return bc_chunk;
>> + }
>> +
>> + static int bc_loader_with_eof(void *test_state)
>> + {
>> + 	lua_State *L = test_state;
>> + 	dt test_data = {0};
>> + 	test_data.state = EMIT_BC;
>> + 	int res = lua_load(L, bc_reader_with_eof, &test_data, "eof");
>> + 	assert_true(res = LUA_ERRSYNTAX);
> Typo: s/=/==/

Fixed.


> But res is indeed `LUA_ERRSYNTAX` for now :).
>
>> + 	lua_settop(L, 0);
>> +
>> + 	return TEST_EXIT_SUCCESS;
>> + }
>> +
>> + int main(void)
>> + {
>> + 	lua_State *L = utils_lua_init();
>> + 	const struct test_unit tgroup[] = {
>> + 		test_unit_def(bc_loader_with_endmark),
>> + 		test_unit_def(bc_loader_with_eof)
>> + 	};
>> +
>> + 	const int test_result = test_run_group(tgroup, L);
>> + 	utils_lua_close(L);
>> + 	return test_result;
>> + }
> [1]: https://www.lua.org/manual/5.1/manual.html#lua_Reader
>
> --
> Best regards,
> Sergey Kaplun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-09-05 14:12   ` Sergey Kaplun via Tarantool-patches
@ 2023-09-07  7:06     ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-09-07  7:06 UTC (permalink / raw)
  To: Sergey Kaplun, Sergey Bronnikov; +Cc: tarantool-patches, max.kokryashkin

Hi,

On 9/5/23 17:12, Sergey Kaplun wrote:
> Hi!
> One more nit:
>
> On 31.08.23, Sergey Bronnikov wrote:
>
> <snipped>
>
>> +-- Compile C source code with LuaJIT bytecode to a shared library.
>> +-- `-m64` is not available on ARM64, see
>> +-- "3.18.1 AArch64 Options in the manual",
>> +-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
> Missed dot at the end of the sentence.

Thanks.

Fixed, commited, squashed to the original commit, and force-pushed.

>
> <snipped>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-09-05 14:10   ` Sergey Kaplun via Tarantool-patches
@ 2023-09-07 15:21     ` Sergey Bronnikov via Tarantool-patches
  2023-09-11  8:45       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-09-07 15:21 UTC (permalink / raw)
  To: Sergey Kaplun, Sergey Bronnikov; +Cc: tarantool-patches, max.kokryashkin

Hi, Sergey


On 9/5/23 17:10, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please, consider my comments below.
>
> On 31.08.23, Sergey Bronnikov wrote:
>> From: Sergey Bronnikov <sergeyb@tarantool.org>
>>
>> (cherry-picked from commit 820339960123dc78a7ce03edf53fcf4fdae0e55d)
>>
>> The original problem is specific to x32 and is as follows: when a chunk
>> with a bytecode library is loaded into memory, and the address is higher
>> than 0x80000100, the `LexState->pe`, that contains an address of the end
>> of the bytecode chunk in the memory, will wrap around and become smaller
>> than the address in `LexState->p`, that contains an address of the
>> beginning of bytecode chunk in the memory. In `bcread_fill()` called by
>> `bcread_want()`, `memcpy()` is called with a very large size and causes
>> bus error on x86 and segmentation fault on ARM Android.
> Typo: s/bus error/the bus error/
> Typo: s/segmentation fault/the segmentation fault/

Fixed.


>
>> The problem cannot be reproduced on platforms supported by Tarantool
>> (ARM64, x86_64), so test doesn't reproduce a problem without a patch and
>> tests the patch partially.
>>
>> Sergey Bronnikov:
>> * added the description and the test
>> ---
>>   src/lib_package.c                             |  4 +-
>>   src/lj_bcread.c                               | 10 +-
>>   src/lj_lex.c                                  |  6 ++
>>   src/lj_lex.h                                  |  1 +
>>   .../lj-549-bytecode-loader.test.lua           | 96 +++++++++++++++++++
>>   5 files changed, 110 insertions(+), 7 deletions(-)
>>   create mode 100644 test/tarantool-tests/lj-549-bytecode-loader.test.lua
>>
> <snipped>
>
>> diff --git a/test/tarantool-tests/lj-549-bytecode-loader.test.lua b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
>> new file mode 100644
>> index 00000000..889be80a
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-549-bytecode-loader.test.lua
>> @@ -0,0 +1,96 @@
>> +local tap = require('tap')
>> +local ffi = require('ffi')
>> +local utils = require('utils')
>> +local test = tap.test('lj-549-bytecode-loader'):skipcond({
>> +    -- ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
>> +})
> Minor: It's better to require ffi and utils after test initialization
> via `tap.test()`, see other tests for example.
> Also, I suppose that we don't need `utils` itself, but
> `utils.exec.makecmd`.

Fixed.


>> +
>> +test:plan(1)
>> +
>> +-- Test creates a shared library with LuaJIT bytecode,
>> +-- loads shared library as a Lua module and checks,
>> +-- that no crashes eliminated.
>> +--
>> +-- $ make HOST_CC='gcc -m32' TARGET_CFLAGS='-m32' \
>> +--                           TARGET_LDFLAGS='-m32' \
>> +--                           TARGET_SHLDFLAGS='-m32' \
>> +--                           -f Makefile.original
>> +-- $ echo 'print("test")' > a.lua
>> +-- $ LUA_PATH="src/?.lua;;" luajit -b a.lua a.c
>> +-- $ gcc -m32 -fPIC -shared a.c -o a.so
>> +-- $ luajit -e "require('a')"
>> +-- Program received signal SIGBUS, Bus error
>> +
>> +local function file_exists(fname)
>> +   return io.open(fname, 'r') or true and false
> OK, this is a little bit confusing:
> If file doesn't exists we go to `or true` and after check `and false`
> which is always false. Tricky, but works.
>
> Also, here we don't close file handler.
> I suggest it is better to rewrite this as the following:
> | local fh = io.open(name, 'r')
> | return fh and io.close(fh)
>
> It is simplier to read, and fixes problem with leaking handler.
Updated.
>
>> +end
>> +
>> +local function get_file_name(file)
>> +    return file:match("[^/]*$")
> Minor: it may match the empty string for a directory occasionally:
> | src/luajit -e 'print([["]]..("/tmp/"):match("[^/]*$")..[["]])'


Fixed.

> | ""
>
> Nit: use single quotes instead of double quotes if possible.

Without context it is difficult to get what is line you talk about.

As I see everything is fine with quotes in version on the branch.


>
> Nit: `[^/\\]` is better since it also covers Windows.
> See <test/lua-Harness-tests/314-regex.t:167>
> | local dirname = arg[0]:gsub('([^/\\]+)$', '')
> Since we don't support Windows feel free to ignore.
>
>> +end
>> +
>> +local stdout_msg = 'Lango team'
>> +local lua_code = ('print(%q)'):format(stdout_msg)
>> +local fpath = os.tmpname()
>> +local path_lua = ('%s.lua'):format(fpath)
>> +local path_c = ('%s.c'):format(fpath)
>> +local path_so = ('%s.so'):format(fpath)
> Minor: I suppose it should be renamed to `path_shared`, since on macOS
> they have the ".dyld" suffix for shared libs. Hence, we need to use the
> suffix in format of the shared library name too. You may take some
> inspiration from here [1].
Fixed.
>> +
>> +-- Create a file with a minimal Lua code.
>> +local fh = assert(io.open(path_lua, 'w'))
>> +fh:write(lua_code)
>> +fh:close()
>> +
>> +local module_name = assert(get_file_name(fpath))
>> +
>> +local basedir = function(path)
>> +    local sep = '/'
> Why do we need an additional variable here?

For clarity.


>
> Nit: Indent is 4 spaces instead of 2.
>
>> +    return path:match('(.*' .. sep .. ')') or './'
> It's better to mention that the pattern matching is greedy, so we match
> until the last separator.


Updated.

>> +end
>> +
>> +-- Create a C file with LuaJIT bytecode.
>> +-- We cannot use utils.makecmd, because command-line generated
>> +-- by `makecmd` contains `-e` that is incompatible with option `-b`.
> Nit: comment line width is more than 66 symbols

Fixed.


>> +local function create_c_file(pathlua, pathc)
>> +  local lua_path = os.getenv('LUA_PATH')
>> +  local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
>> +  local cmd_fmt = 'LUA_PATH="%s" %s -b %s %s'
>> +  local cmd = (cmd_fmt):format(lua_path, lua_bin, pathlua, pathc)
>> +  local ret = os.execute(cmd)
>> +  assert(ret == 0, 'create a C file with bytecode')
>> +end
>> +
>> +create_c_file(path_lua, path_c)
>> +assert(file_exists(path_c))
> Minor: The test flow is a little bit hard to read due to function
> declarations. Maybe it is better to declare all utility functions first
> and then use them one by one? This makes control flow easier to read.
Rearranged, take a look please.
>
>> +
>> +-- Compile C source code with LuaJIT bytecode to a shared library.
>> +-- `-m64` is not available on ARM64, see
>> +-- "3.18.1 AArch64 Options in the manual",
>> +-- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
>> +local cflags_64 = jit.arch == 'arm64' and '-march=armv8-a' or '-m64'
>> +local cflags = ffi.abi('32bit') and '-m32' or cflags_64
>> +local cc_cmd = ('cc %s -fPIC -shared %s -o %s'):format(cflags, path_c, path_so)
>> +local ph = io.popen(cc_cmd)
>> +ph:close()
>> +assert(file_exists(path_so))
>> +
>> +-- Load shared library as a Lua module.
>> +local lua_cpath = ('"/tmp/?.so;"'):format(basedir(fpath))
>> +assert(file_exists(path_so))
>> +local cmd = utils.exec.makecmd(arg, {
>> +    script = ('-e "require([[%s]])"'):format(module_name),
> Nit: Indent is 4 spaces instead of 2.
Fixed.
>
>> +    env = {
>> +        LUA_CPATH = lua_cpath,
> Nit: Indent is 4 spaces instead of 2.
Fixed.
>
>> +        -- It is required to cleanup LUA_PATH, otherwise
>> +        -- LuaJIT loads Lua module, see tarantool-tests/utils/init.lua.
> Nit: comment line width is more than 66 symbols
Fixed too.
>
> Actually I don't understand from the comment what Lua module exactly is
> loaded. Maybe it's better to fix this behaviour?
What behaviour you want to fix?
> Feel free to ignore.
>
>> +        LUA_PATH = '',
>> +    },
>> +})
>> +local res = cmd()
>> +test:ok(res == stdout_msg, 'bytecode loader works')
>> +
>> +os.remove(path_lua)
>> +os.remove(path_c)
>> +os.remove(path_so)
>> +
>> +os.exit(test:check() and 0 or 1)
>> -- 
>> 2.34.1
>>
> [1]: https://github.com/tarantool/tarantool/blob/dc8973c3de6311ab11df8d43520e1d40de4b9c7b/test/box/func_reload.test.lua#L5
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-09-07 15:21     ` Sergey Bronnikov via Tarantool-patches
@ 2023-09-11  8:45       ` Sergey Kaplun via Tarantool-patches
  2023-09-12 10:20         ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-09-11  8:45 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: Sergey Bronnikov, tarantool-patches, max.kokryashkin

Hi, Sergey!
Thanks for the fixes!

On 07.09.23, Sergey Bronnikov wrote:
> Hi, Sergey
> 
> 
> On 9/5/23 17:10, Sergey Kaplun wrote:
> > Hi, Sergey!
> > Thanks for the patch!
> > Please, consider my comments below.
> >
> > On 31.08.23, Sergey Bronnikov wrote:
> >> From: Sergey Bronnikov <sergeyb@tarantool.org>
> >>

<snipped>

> >
> >> +end
> >> +
> >> +local function get_file_name(file)
> >> +    return file:match("[^/]*$")
> > Minor: it may match the empty string for a directory occasionally:
> > | src/luajit -e 'print([["]]..("/tmp/"):match("[^/]*$")..[["]])'
> 
> 
> Fixed.
> 
> > | ""
> >
> > Nit: use single quotes instead of double quotes if possible.
> 
> Without context it is difficult to get what is line you talk about.
> 
> As I see everything is fine with quotes in version on the branch.
> 

I'm talking about the same line (got this version from the branch):

| +local function basename(path)
| +  return path:match("[^/]*$")
| +end

> 
> >
> > Nit: `[^/\\]` is better since it also covers Windows.
> > See <test/lua-Harness-tests/314-regex.t:167>
> > | local dirname = arg[0]:gsub('([^/\\]+)$', '')
> > Since we don't support Windows feel free to ignore.
> >
> >> +end
> >> +
> >> +local stdout_msg = 'Lango team'
> >> +local lua_code = ('print(%q)'):format(stdout_msg)
> >> +local fpath = os.tmpname()
> >> +local path_lua = ('%s.lua'):format(fpath)
> >> +local path_c = ('%s.c'):format(fpath)
> >> +local path_so = ('%s.so'):format(fpath)
> > Minor: I suppose it should be renamed to `path_shared`, since on macOS
> > they have the ".dyld" suffix for shared libs. Hence, we need to use the
> > suffix in format of the shared library name too. You may take some
> > inspiration from here [1].
> Fixed.

Hmm, I don't see these updates on the branch [1]. Maybe you forgot to push
your local changes to the repository (*)?

> >> +

<snipped>

> >> +local basedir = function(path)
> >> +    local sep = '/'
> > Why do we need an additional variable here?
> 
> For clarity.

OK.

> 
> 
> >
> > Nit: Indent is 4 spaces instead of 2.
> >
> >> +    return path:match('(.*' .. sep .. ')') or './'
> > It's better to mention that the pattern matching is greedy, so we match
> > until the last separator.
> 
> 
> Updated.

This is missing too (*).

> 
> >> +end
> >> +
> >> +-- Create a C file with LuaJIT bytecode.
> >> +-- We cannot use utils.makecmd, because command-line generated
> >> +-- by `makecmd` contains `-e` that is incompatible with option `-b`.
> > Nit: comment line width is more than 66 symbols
> 
> Fixed.
> 
> 
> >> +local function create_c_file(pathlua, pathc)
> >> +  local lua_path = os.getenv('LUA_PATH')
> >> +  local lua_bin = require('utils').exec.luacmd(arg):match('%S+')
> >> +  local cmd_fmt = 'LUA_PATH="%s" %s -b %s %s'
> >> +  local cmd = (cmd_fmt):format(lua_path, lua_bin, pathlua, pathc)
> >> +  local ret = os.execute(cmd)
> >> +  assert(ret == 0, 'create a C file with bytecode')
> >> +end
> >> +
> >> +create_c_file(path_lua, path_c)
> >> +assert(file_exists(path_c))
> > Minor: The test flow is a little bit hard to read due to function
> > declarations. Maybe it is better to declare all utility functions first
> > and then use them one by one? This makes control flow easier to read.
> Rearranged, take a look please.

Yes, this is muchc clearer, thanks!

> >
> >> +

<snipped>

> >
> >> +        -- It is required to cleanup LUA_PATH, otherwise
> >> +        -- LuaJIT loads Lua module, see tarantool-tests/utils/init.lua.
> > Nit: comment line width is more than 66 symbols
> Fixed too.
> >
> > Actually I don't understand from the comment what Lua module exactly is
> > loaded. Maybe it's better to fix this behaviour?
> What behaviour you want to fix?

That some non-existing module was loaded with `makecmd()`, but since we
have a workaround, maybe it isn't critical.
Anyway, we should do it in a separate patch set.
Feel free to ignore, as I've said.

> > Feel free to ignore.
> >
> >> +        LUA_PATH = '',

<snipped>

> >

[1]: https://github.com/tarantool/luajit/blob/f7e5e8abe396411065f8941d04879577e7fae175/test/tarantool-tests/lj-549-bytecode-loader.test.lua#L54

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-09-07  7:04     ` Sergey Bronnikov via Tarantool-patches
@ 2023-09-11  9:26       ` Sergey Kaplun via Tarantool-patches
  2023-09-12 10:30         ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-09-11  9:26 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: Sergey Bronnikov, tarantool-patches, max.kokryashkin

Hi, Sergey!
Thanks for the fixes and clarifications!
LGTM, after answering to my comments below.

On 07.09.23, Sergey Bronnikov wrote:
> Hi, Sergey
> 
> On 9/5/23 15:55, Sergey Kaplun wrote:
> > Hi, Sergey!
> > Thanks for the patch!
> > Please, consider my comments below.
> >
> > On 31.08.23, Sergey Bronnikov wrote:
> >> From: Sergey Bronnikov <sergeyb@tarantool.org>
> >>
> >> (cherry-picked from commit e49863eda13d095b1a78fd4ca0fd3a6a9a17d782)
> >>
> >> The patch follows up a previous patch and limits the total size of a
> >> chunk load by `lua_load` with size `LJ_MAX_BUF - 1`.
> >>
> >> Sergey Bronnikov:
> >> * added the description and the test
> >> ---

<snipped>

> >> diff --git a/test/tarantool-c-tests/lj-549-lua_load.test.c b/test/tarantool-c-tests/lj-549-lua_load.test.c
> >> new file mode 100644
> >> index 00000000..9baa7a1a
> >> --- /dev/null
> >> +++ b/test/tarantool-c-tests/lj-549-lua_load.test.c

<snipped>

> >
> >> + 	 * `lex_more`.
> >> + 	 */
> >> + 	assert_true(res != LUA_ERRMEM);
> > Maybe it's better to use here codition res == LUA_OK?
> >
> >> + 	lua_settop(L, 0);
> >> +
> >> + 	return TEST_EXIT_SUCCESS;
> >> + }

<snipped>

> >> + static const char *
> >> + bc_reader_with_eof(lua_State *L, void *data, size_t *size)
> >> + {
> >> + 	UNUSED(L);
> >> + 	dt *test_data = (dt *)data;
> >> + 	if (test_data->state == EMIT_EOF) {
> >> + 		*size = 0;
> >> + 		return NULL;
> >> + 	}
> >> +
> >> + 	static char *bc_chunk = NULL;
> >> + 	free(bc_chunk);
> > This free is called only once, when bc_chunk is already NULL.
> > I suggest moving the initialization of the `bc_chunk` to the beginning
> > of the scope and calling `free()` only for the `EMIT_EOF` state (it's
> > also a little bit more readable -- a reader shouldn't remember that
> > `free(NULL)` is OK).
> Updated.

Unfortunately, I don't see these changes on branch [1].

> >
> >> +

<snipped>

> >
> > --
> > Best regards,
> > Sergey Kaplun

[1]: https://github.com/tarantool/luajit/blob/f7e5e8abe396411065f8941d04879577e7fae175/test/tarantool-c-tests/lj-549-lua-load.test.c#L70

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-09-11  8:45       ` Sergey Kaplun via Tarantool-patches
@ 2023-09-12 10:20         ` Sergey Bronnikov via Tarantool-patches
  2023-10-31 11:30           ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-09-12 10:20 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Sergey Bronnikov, tarantool-patches, max.kokryashkin

Hi,


On 9/11/23 11:45, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the fixes!
>
> On 07.09.23, Sergey Bronnikov wrote:
>> Hi, Sergey
>>

<snipped>

>>
>>> | ""
>>>
>>> Nit: use single quotes instead of double quotes if possible.
>> Without context it is difficult to get what is line you talk about.
>>
>> As I see everything is fine with quotes in version on the branch.
>>
> I'm talking about the same line (got this version from the branch):
>
> | +local function basename(path)
> | +  return path:match("[^/]*$")
> | +end

Fixed.


--- a/test/tarantool-tests/utils/tools.lua
+++ b/test/tarantool-tests/utils/tools.lua
@@ -10,7 +10,7 @@ function M.basedir(path)
  end

  function M.basename(path)
-  return path:match("[^/]*$")
+  return path:match('[^/]*$')
  end

  function M.profilename(name)

>>> Nit: `[^/\\]` is better since it also covers Windows.
>>> See <test/lua-Harness-tests/314-regex.t:167>
>>> | local dirname = arg[0]:gsub('([^/\\]+)$', '')
>>> Since we don't support Windows feel free to ignore.
>>>
>>>> +end
>>>> +
>>>> +local stdout_msg = 'Lango team'
>>>> +local lua_code = ('print(%q)'):format(stdout_msg)
>>>> +local fpath = os.tmpname()
>>>> +local path_lua = ('%s.lua'):format(fpath)
>>>> +local path_c = ('%s.c'):format(fpath)
>>>> +local path_so = ('%s.so'):format(fpath)
>>> Minor: I suppose it should be renamed to `path_shared`, since on macOS
>>> they have the ".dyld" suffix for shared libs. Hence, we need to use the
>>> suffix in format of the shared library name too. You may take some
>>> inspiration from here [1].
>> Fixed.
> Hmm, I don't see these updates on the branch [1]. Maybe you forgot to push
> your local changes to the repository (*)?

Right, I forgot to force-push ;(

Force-pushed now.


<snipped>

>>> Nit: Indent is 4 spaces instead of 2.
>>>
>>>> +    return path:match('(.*' .. sep .. ')') or './'
>>> It's better to mention that the pattern matching is greedy, so we match
>>> until the last separator.
>>
>> Updated.
> This is missing too (*).
Added a comment.
>
<snipped>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for embedded bytecode loader.
  2023-09-11  9:26       ` Sergey Kaplun via Tarantool-patches
@ 2023-09-12 10:30         ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 21+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2023-09-12 10:30 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Sergey Bronnikov, tarantool-patches, max.kokryashkin


On 9/11/23 12:26, Sergey Kaplun wrote:


<snipped>

>>>> + static const char *
>>>> + bc_reader_with_eof(lua_State *L, void *data, size_t *size)
>>>> + {
>>>> + 	UNUSED(L);
>>>> + 	dt *test_data = (dt *)data;
>>>> + 	if (test_data->state == EMIT_EOF) {
>>>> + 		*size = 0;
>>>> + 		return NULL;
>>>> + 	}
>>>> +
>>>> + 	static char *bc_chunk = NULL;
>>>> + 	free(bc_chunk);
>>> This free is called only once, when bc_chunk is already NULL.
>>> I suggest moving the initialization of the `bc_chunk` to the beginning
>>> of the scope and calling `free()` only for the `EMIT_EOF` state (it's
>>> also a little bit more readable -- a reader shouldn't remember that
>>> `free(NULL)` is OK).
>> Updated.
> Unfortunately, I don't see these changes on branch [1].
Force-pushed.


<snipped>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2][v2] Fix embedded bytecode loader.
  2023-09-12 10:20         ` Sergey Bronnikov via Tarantool-patches
@ 2023-10-31 11:30           ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 21+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-10-31 11:30 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: Sergey Bronnikov, tarantool-patches, max.kokryashkin

Hi, Sergey!
Sorry for the late response!
Thanks for the fixes and clarifications!
LGTM!

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-10-31 11:34 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-31 11:29 [Tarantool-patches] [PATCH luajit 0/2][v2] Fix embedded bytecode loader Sergey Bronnikov via Tarantool-patches
2023-08-31 11:30 ` [Tarantool-patches] [PATCH luajit 1/2][v2] " Sergey Bronnikov via Tarantool-patches
2023-08-31 11:49   ` Sergey Bronnikov via Tarantool-patches
2023-09-01  9:42   ` Maxim Kokryashkin via Tarantool-patches
2023-09-04  9:31     ` Sergey Bronnikov via Tarantool-patches
2023-09-05  6:34       ` Maxim Kokryashkin via Tarantool-patches
2023-09-05 14:10   ` Sergey Kaplun via Tarantool-patches
2023-09-07 15:21     ` Sergey Bronnikov via Tarantool-patches
2023-09-11  8:45       ` Sergey Kaplun via Tarantool-patches
2023-09-12 10:20         ` Sergey Bronnikov via Tarantool-patches
2023-10-31 11:30           ` Sergey Kaplun via Tarantool-patches
2023-09-05 14:12   ` Sergey Kaplun via Tarantool-patches
2023-09-07  7:06     ` Sergey Bronnikov via Tarantool-patches
2023-08-31 11:32 ` [Tarantool-patches] [PATCH luajit 2/2][v2] Followup fix for " Sergey Bronnikov via Tarantool-patches
2023-09-01 10:05   ` Maxim Kokryashkin via Tarantool-patches
2023-09-04 16:34     ` Sergey Bronnikov via Tarantool-patches
2023-09-05  6:45       ` Maxim Kokryashkin via Tarantool-patches
2023-09-05 12:55   ` Sergey Kaplun via Tarantool-patches
2023-09-07  7:04     ` Sergey Bronnikov via Tarantool-patches
2023-09-11  9:26       ` Sergey Kaplun via Tarantool-patches
2023-09-12 10:30         ` Sergey Bronnikov via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox