From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
To: Igor Munkin, Sergey Bronnikov
Cc: tarantool-patches@dev.tarantool.org
Date: Wed, 9 Aug 2023 18:35:51 +0300
Message-ID: <67642b3989e440fb554bf60db140828653c59659.1691592488.git.skaplun@tarantool.org>
Subject: [Tarantool-patches] [PATCH luajit 02/19] test: introduce mcode generator for tests

The test depends on a particular offset of the side trace's mcode
relative to the parent trace. Before this commit, the test just ran
a number of functions to generate traces that fill the required
mcode range. Unfortunately, this approach is not robust, since
sometimes a trace is not recorded due to "leaving loop in root
trace" errors caused by hotcount collisions.

This patch introduces the following helpers:
* `frontend.gettraceno(func)` -- returns the traceno for the given
  function, assuming that there is a compiled trace for its
  prototype (i.e. the 0th bytecode is JFUNC).
* `jit.generators.fillmcode(traceno, size)` -- fills the mcode area
  of the given size starting from the given trace. It is useful for
  generating enough mcode to test jumps to side traces that are far
  enough from the parent.
---
 ...8-fix-side-exit-patching-on-arm64.test.lua | 78 ++----------
 test/tarantool-tests/utils/frontend.lua       | 24 ++++
 test/tarantool-tests/utils/jit/generators.lua | 115 ++++++++++++++++++
 3 files changed, 150 insertions(+), 67 deletions(-)
 create mode 100644 test/tarantool-tests/utils/jit/generators.lua

diff --git a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
index 93db3041..678ac914 100644
--- a/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
+++ b/test/tarantool-tests/gh-6098-fix-side-exit-patching-on-arm64.test.lua
@@ -1,8 +1,12 @@
 local tap = require('tap')
 local test = tap.test('gh-6098-fix-side-exit-patching-on-arm64'):skipcond({
   ['Test requires JIT enabled'] = not jit.status(),
+  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
 })
 
+local generators = require('utils').jit.generators
+local frontend = require('utils').frontend
+
 test:plan(1)
 
 -- The function to be tested for side exit patching:
@@ -20,52 +24,6 @@ local function cbool(cond)
   end
 end
 
--- XXX: Function template below produces 8Kb mcode for ARM64, so
--- we need to compile at least 128 traces to exceed 1Mb delta
--- between root trace side exit and side trace.
--- Unfortunately, we have no other option for extending this jump
--- delta, since the base of the current mcode area (J->mcarea) is
--- used as a hint for mcode allocator (see lj_mcode.c for info).
-local FUNCS = 128
-local recfuncs = { }
-for i = 1, FUNCS do
-  -- This is a quite heavy workload (though it doesn't look like
-  -- one at first). Each load from a table is type guarded. Each
-  -- table lookup (for both stores and loads) is guarded for table
-  -- value and metatable presence. The code below results
-  -- to 8Kb of mcode for ARM64 in practice.
-  recfuncs[i] = assert(load(([[
-    return function(src)
-      local p = %d
-      local tmp = { }
-      local dst = { }
-      for i = 1, 3 do
-        tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
-        tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
-        tmp.c = src.c * p tmp.l = src.l * p tmp.u = src.u * p
-        tmp.d = src.d * p tmp.m = src.m * p tmp.v = src.v * p
-        tmp.e = src.e * p tmp.n = src.n * p tmp.w = src.w * p
-        tmp.f = src.f * p tmp.o = src.o * p tmp.x = src.x * p
-        tmp.g = src.g * p tmp.p = src.p * p tmp.y = src.y * p
-        tmp.h = src.h * p tmp.q = src.q * p tmp.z = src.z * p
-        tmp.i = src.i * p tmp.r = src.r * p
-
-        dst.a = tmp.z + p dst.j = tmp.q + p dst.s = tmp.h + p
-        dst.b = tmp.y + p dst.k = tmp.p + p dst.t = tmp.g + p
-        dst.c = tmp.x + p dst.l = tmp.o + p dst.u = tmp.f + p
-        dst.d = tmp.w + p dst.m = tmp.n + p dst.v = tmp.e + p
-        dst.e = tmp.v + p dst.n = tmp.m + p dst.w = tmp.d + p
-        dst.f = tmp.u + p dst.o = tmp.l + p dst.x = tmp.c + p
-        dst.g = tmp.t + p dst.p = tmp.k + p dst.y = tmp.b + p
-        dst.h = tmp.s + p dst.q = tmp.j + p dst.z = tmp.a + p
-        dst.i = tmp.r + p dst.r = tmp.i + p
-      end
-      dst.tmp = tmp
-      return dst
-    end
-  ]]):format(i)), ('Syntax error in function recfuncs[%d]'):format(i))()
-end
-
 -- Make compiler work hard:
 -- * No optimizations at all to produce more mcode.
 -- * Try to compile all compiled paths as early as JIT can.
@@ -78,27 +36,13 @@ cbool(true)
 -- a root trace for <cbool>.
 cbool(true)
 
-for i = 1, FUNCS do
-  -- XXX: FNEW is NYI, hence loop recording fails at this point.
-  -- The recording is aborted on purpose: we are going to record
-  -- number of traces for functions in <recfuncs>.
-  -- Otherwise, loop recording might lead to a very long trace
-  -- error (via return to a lower frame), or a trace with lots of
-  -- side traces. We need neither of this, but just bunch of
-  -- traces filling the available mcode area.
-  local function tnew(p)
-    return {
-      a = p + 1, f = p + 6, k = p + 11, p = p + 16, u = p + 21, z = p + 26,
-      b = p + 2, g = p + 7, l = p + 12, q = p + 17, v = p + 22,
-      c = p + 3, h = p + 8, m = p + 13, r = p + 18, w = p + 23,
-      d = p + 4, i = p + 9, n = p + 14, s = p + 19, x = p + 24,
-      e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
-    }
-  end
-  -- Each function call produces a trace (see the template for the
-  -- function definition above).
-  recfuncs[i](tnew(i))
-end
+local cbool_traceno = frontend.gettraceno(cbool)
+
+-- XXX: Unfortunately, we have no other option for extending
+-- this jump delta, since the base of the current mcode area
+-- (J->mcarea) is used as a hint for mcode allocator (see
+-- lj_mcode.c for info).
+generators.fillmcode(cbool_traceno, 1024 * 1024)
 
 -- XXX: I tried to make the test in pure Lua, but I failed to
 -- implement the robust solution. As a result I've implemented a
diff --git a/test/tarantool-tests/utils/frontend.lua b/test/tarantool-tests/utils/frontend.lua
index 2afebbb2..414257fd 100644
--- a/test/tarantool-tests/utils/frontend.lua
+++ b/test/tarantool-tests/utils/frontend.lua
@@ -1,6 +1,10 @@
 local M = {}
 
 local bc = require('jit.bc')
+local jutil = require('jit.util')
+local vmdef = require('jit.vmdef')
+local bcnames = vmdef.bcnames
+local band, rshift = bit.band, bit.rshift
 
 function M.hasbc(f, bytecode)
   assert(type(f) == 'function', 'argument #1 should be a function')
@@ -22,4 +26,24 @@ function M.hasbc(f, bytecode)
   return hasbc
 end
 
+-- Get traceno of the trace associated with the given function.
+function M.gettraceno(func)
+  assert(type(func) == 'function', 'argument #1 should be a function')
+
+  -- The 0th BC is the header.
+  local func_ins = jutil.funcbc(func, 0)
+  local BC_NAME_LENGTH = 6
+  local RD_SHIFT = 16
+
+  -- Calculate index in `bcnames` string.
+  local op_idx = BC_NAME_LENGTH * band(func_ins, 0xff)
+  -- Get the name of the operation.
+  local op_name = string.sub(bcnames, op_idx + 1, op_idx + BC_NAME_LENGTH)
+  assert(op_name:match('JFUNC'),
+         'The given function has non-jitted header: ' .. op_name)
+
+  -- RD contains the traceno.
+  return rshift(func_ins, RD_SHIFT)
+end
+
 return M
diff --git a/test/tarantool-tests/utils/jit/generators.lua b/test/tarantool-tests/utils/jit/generators.lua
new file mode 100644
index 00000000..62b6e0ef
--- /dev/null
+++ b/test/tarantool-tests/utils/jit/generators.lua
@@ -0,0 +1,115 @@
+local M = {}
+
+local jutil = require('jit.util')
+
+local function getlast_traceno()
+  return misc.getmetrics().jit_trace_num
+end
+
+-- Convert addr to positive value if needed.
+local function canonize_address(addr)
+  if addr < 0 then addr = addr + 2 ^ 32 end
+  return addr
+end
+
+-- Need some storage to keep functions and traces from being
+-- collected.
+local recfuncs = {}
+local last_i = 0
+-- This function generates a table of functions with a heavy
+-- mcode payload (table arithmetic) to fill the mcode area
+-- starting from the mcode of the given trace by the given size.
+-- This size is usually big, because we want to check long jump
+-- side exits from some traces.
+-- Assumes that the maxmcode and maxtrace options are set high
+-- enough to produce such an amount of mcode.
+function M.fillmcode(trace_from, size)
+  local mcode, addr_from = jutil.tracemc(trace_from)
+  assert(mcode, 'the #1 argument should be an existing trace number')
+  addr_from = canonize_address(addr_from)
+  local required_diff = size + #mcode
+
+  -- Marker to check that traces are not flushed.
+  local maxtraceno = getlast_traceno()
+  local FLUSH_ERR = 'Traces are flushed, check your maxtrace, maxmcode options'
+
+  local _, last_addr = jutil.tracemc(maxtraceno)
+  last_addr = canonize_address(last_addr)
+
+  -- Addresses of traces may increase or decrease depending on OS,
+  -- so use absolute diff.
+  while math.abs(last_addr - addr_from) < required_diff do
+    last_i = last_i + 1
+    -- This is a quite heavy workload (though it doesn't look like
+    -- one at first). Each load from a table is type guarded. Each
+    -- table lookup (for both stores and loads) is guarded for
+    -- table value and presence of the metatable. The code
+    -- below results in ~8Kb of mcode for ARM64 and MIPS64 in
+    -- practice.
+    local fname = ('fillmcode[%d]'):format(last_i)
+    recfuncs[last_i] = assert(loadstring(([[
+      return function(src)
+        local p = %d
+        local tmp = { }
+        local dst = { }
+        -- XXX: use 5 as stop index to reduce LLEAVE (leaving loop
+        -- in root trace) errors due to hotcount collisions.
+        for i = 1, 5 do
+          tmp.a = src.a * p tmp.j = src.j * p tmp.s = src.s * p
+          tmp.b = src.b * p tmp.k = src.k * p tmp.t = src.t * p
+          tmp.c = src.c * p tmp.l = src.l * p tmp.u = src.u * p
+          tmp.d = src.d * p tmp.m = src.m * p tmp.v = src.v * p
+          tmp.e = src.e * p tmp.n = src.n * p tmp.w = src.w * p
+          tmp.f = src.f * p tmp.o = src.o * p tmp.x = src.x * p
+          tmp.g = src.g * p tmp.p = src.p * p tmp.y = src.y * p
+          tmp.h = src.h * p tmp.q = src.q * p tmp.z = src.z * p
+          tmp.i = src.i * p tmp.r = src.r * p
+
+          dst.a = tmp.z + p dst.j = tmp.q + p dst.s = tmp.h + p
+          dst.b = tmp.y + p dst.k = tmp.p + p dst.t = tmp.g + p
+          dst.c = tmp.x + p dst.l = tmp.o + p dst.u = tmp.f + p
+          dst.d = tmp.w + p dst.m = tmp.n + p dst.v = tmp.e + p
+          dst.e = tmp.v + p dst.n = tmp.m + p dst.w = tmp.d + p
+          dst.f = tmp.u + p dst.o = tmp.l + p dst.x = tmp.c + p
+          dst.g = tmp.t + p dst.p = tmp.k + p dst.y = tmp.b + p
+          dst.h = tmp.s + p dst.q = tmp.j + p dst.z = tmp.a + p
+          dst.i = tmp.r + p dst.r = tmp.i + p
+        end
+        dst.tmp = tmp
+        return dst
+      end
+    ]]):format(last_i), fname), ('Syntax error in function %s'):format(fname))()
+    -- XXX: FNEW is NYI, hence loop recording fails at this point.
+    -- The recording is aborted on purpose: the whole loop
+    -- recording might lead to a very long trace error (via return
+    -- to a lower frame), or a trace with lots of side traces. We
+    -- need neither of these, just a bunch of traces filling
+    -- the available mcode area.
+    local function tnew(p)
+      return {
+        a = p + 1, f = p + 6, k = p + 11, p = p + 16, u = p + 21, z = p + 26,
+        b = p + 2, g = p + 7, l = p + 12, q = p + 17, v = p + 22,
+        c = p + 3, h = p + 8, m = p + 13, r = p + 18, w = p + 23,
+        d = p + 4, i = p + 9, n = p + 14, s = p + 19, x = p + 24,
+        e = p + 5, j = p + 10, o = p + 15, t = p + 20, y = p + 25,
+      }
+    end
+    -- Each function call produces a trace (see the template for
+    -- the function definition above).
+    recfuncs[last_i](tnew(last_i))
+    local last_traceno = getlast_traceno()
+    if last_traceno < maxtraceno then
+      error(FLUSH_ERR)
+    end
+
+    -- Calculate the address of the last trace start.
+    maxtraceno = last_traceno
+    _, last_addr = jutil.tracemc(last_traceno)
+    if not last_addr then
+      error(FLUSH_ERR)
+    end
+    last_addr = canonize_address(last_addr)
+  end
+end
+
+return M
-- 
2.41.0
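
A note on the decoding done in `frontend.gettraceno()`: the helper takes the low 8 bits of the function header instruction as the opcode, looks its name up in `bcnames` in 6-character slots, and reads the traceno from the RD operand (the bits above bit 16). The sketch below reproduces only that arithmetic in plain Lua, so it can be run without `jit.util` or `jit.vmdef`. The names `toy_bcnames`, `decode_ins` and `opname`, as well as the opcode numbers, are hypothetical and only illustrate the layout; they are not the real opcode table.

```lua
-- Toy stand-in for `vmdef.bcnames`: every name is padded to
-- 6 characters. The names and their order are hypothetical.
local BC_NAME_LENGTH = 6
local toy_bcnames = 'NOP   ADD   JFUNC '

-- Split an instruction word into the opcode (low 8 bits) and
-- the RD operand (bits 16..31) using plain arithmetic instead
-- of bit.band()/bit.rshift().
local function decode_ins(ins)
  local op = ins % 0x100
  local rd = math.floor(ins / 0x10000) % 0x10000
  return op, rd
end

-- Look up the opcode name in a `bcnames`-like string and strip
-- the trailing padding.
local function opname(bcnames, op)
  local idx = BC_NAME_LENGTH * op
  return (bcnames:sub(idx + 1, idx + BC_NAME_LENGTH):gsub('%s+$', ''))
end

-- Hypothetical header instruction: opcode 2 ("JFUNC" in the toy
-- table) with traceno 7 stored in RD.
local ins = 7 * 0x10000 + 2
local op, rd = decode_ins(ins)
local name = opname(toy_bcnames, op)
print(name, rd)
```

In the real helper the instruction comes from `jutil.funcbc(func, 0)`, and the assertion on the `JFUNC` name is what guards against reading RD from a header that has not been patched by the JIT.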