From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id E6FA171218; Wed, 27 Oct 2021 14:06:46 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org E6FA171218 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1635332807; bh=JXiXy2cAHGcLXIZUttboN7xqjVrohQbu+bU1JdWSr+c=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=SOa0A04nVlLH0S+Rk/qBZQFpndQkvuSl1uZn0nblqkYw7QR8hKbRfIl1fcOLo2NyG 7IfJaDOHmS0IJNY9rfpfm+6MBj13+qfCRKY401PT9db/CQyYhScVwTnKx+63Dfr1BN yMpu3y4yX9uZQdoLVbS1TzZ2r0KjXScXNdzSA7H8= Received: from smtpng1.i.mail.ru (smtpng1.i.mail.ru [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 44AE271218 for ; Wed, 27 Oct 2021 14:06:45 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 44AE271218 Received: by smtpng1.m.smailru.net with esmtpa (envelope-from ) id 1mfglY-0003Co-1S; Wed, 27 Oct 2021 14:06:44 +0300 Date: Wed, 27 Oct 2021 14:06:40 +0300 To: Sergey Kaplun Message-ID: <20211027110640.GA8831@tarantool.org> References: <20211022130225.6076-1-skaplun@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20211022130225.6076-1-skaplun@tarantool.org> X-Clacks-Overhead: GNU Terry Pratchett User-Agent: Mutt/1.10.1 (2018-07-13) X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9D1D35DBD2D15487E99C830D5B4EA346169F334C4E4A2904B182A05F538085040F9F3B267F231DC9D26D106200C5D65AEDE140B94CB035E08D8949678347DBF60 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE77A6D8F9467FE415BEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006378D70459434292EC88638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D89B795FAC4E8191573A13EF7B46521168117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC60CDF180582EB8FBA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F4460429728776938767073520B1593CA6EC85F86D28451B159A507268D2E47CDBA5A96583BA9C0B312567BB231DD303D21008E29813377AFFFEAFD269A417C69337E82CC2E827F84554CEF50127C277FBC8AE2E8BA83251EDC214901ED5E8D9A59859A8B6E098993382A367A8089D37D7C0E48F6C5571747095F342E88FB05168BE4CE3AF X-C1DE0DAB: 0D63561A33F958A58CE4E315E74526E29057BBD367AD6589CBB3872429C11549D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA7567C209D01CC1E34B410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34C53D1911AD7E41993918E5D37B94CA1BF84446F86F4BC83F1D61FE9B768555589B4ED8D030F164041D7E09C32AA3244C1CED160EDF437D4ABE40C5A2D3E6FC77FE8DA44ABE2443F7927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojOHwMx23X6B3qnFJe5XLx5A== X-Mailru-Sender: 689FA8AB762F7393C37E3C1AEC41BA5DBFCC036DB852BC85095FA8307638EAFEA7C8D0F45F857DBFE9F1EFEE2F478337FB559BB5D741EB964C8C2C849690F8E70A04DAD6CC59E33667EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Igor Munkin via Tarantool-patches Reply-To: Igor Munkin Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Sergey, Thanks for the patch! LGTM with some nits below. On 22.10.21, Sergey Kaplun wrote: > tuple_bless() uses a tail call to ffi.gc() with return to the caller. > This tail call replaces the current (tuple_bless) frame with the frame > of the callee (ffi.gc). When JIT tries to compile return from `ffi.gc()` > to the frame below it aborts the trace recording with the error "NYI: > return to lower frame". Side note: for the root traces the issue is the same, but the error is different. > > This patch replaces the tail call with using additional local variable Minor: You do not replace tail call, but rather don't give an option for LuaJIT to emit CALLT. Anyway, just being pedantic, feel free to ignore. > returned to the caller right after. > --- > > Actually, this patch become possible thanks to Michael Filonenko and his > benchmarks of TDG runs with jit.dump() enabled. After analysis of this > dump we realize that tuple_bless is not compiled. This uncompiled chunk > of code leads to the JIT cancer for all possible workflows that use > tuple_bless() (i.e. tuple:update() and tuple:upsert()). This change is > really trivial, but adds almost x2 improvement of performance for > tuple:update()/upsert() scenario. Hope, that this patch will be a > stimulus for including benchmarks of our forward products like TDG to > routine performance running with the corresponding profilers dumps. Kekw, one-liner boosting update/upsert in two times -- nice catch! Anyway, please check that your change doesn't affect overall perfomance in interpreter mode too. The bad thing in this, that we have no regular Lua benchmarks at all (even those you provided below), so we can't watch the effect of such changes regularly. > > Benchmarks: > > Before patch: > > Update: > | Tarantool 2.10.0-beta1-90-g31594b427 > | type 'help' for interactive help > | tarantool> local t = {} > | for i = 1, 1e6 do > | table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'}) > | end > | local clock = require"clock" > | local S = clock.proc() > | for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end > | return clock.proc() - S; > | --- > | - 4.208298872 > > Upsert: 4.158661731 > > After patch: > > Update: > | Tarantool 2.10.0-beta1-90-g31594b427 > | type 'help' for interactive help > | tarantool> local t = {} > | for i = 1, 1e6 do > | table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'}) > | end > | local clock = require"clock" > | local S = clock.proc() > | for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end > | return clock.proc() - S; > | --- > | - 2.357670738 > > Upsert: 2.334134195 > > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-tuple-bless-compile > > src/box/lua/tuple.lua | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua > index fa76f4f7f..73446ab22 100644 > --- a/src/box/lua/tuple.lua > +++ b/src/box/lua/tuple.lua > @@ -98,7 +98,14 @@ local tuple_bless = function(tuple) > -- overflow checked by tuple_bless() in C > builtin.box_tuple_ref(tuple) > -- must never fail: > - return ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc) > + -- XXX: If we use tail call (instead creating a new frame for Typo: s/instead/instead of/. > + -- a call just replace the top one) here, then JIT tries Minor: I see "replace" for the second time, but LuaJIT just "use" the caller frame for callee. I propose to s/replace/use/g, but this is neglible, so feel free to ignore. > + -- to compile return from `ffi.gc()` to the frame below. This > + -- abort the trace recording with the error "NYI: return to Typo: s/abort/aborts/. > + -- lower frame". So avoid tail call and use additional stack > + -- slots (for the local variable and the frame). > + local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc) > + return tuple_ref Side note: Ugh... I'm sad we're doing things like this one. Complicating the code, leaving huge comments with the rationale of such complicating to reach the desirable (and what is important, local) performance. I propose to spend your innovative time to try solving the problem in the JIT engine: it will be more fun and allow us to avoid writing the cookbook "How to write super-duper-jittable code in LuaJIT". Here is the valid question: what about other hot places with CALLT in Tarantool? Should they be considered/fixed? I guess a ticket will help to not forget about this problem. Anyway, for now the fix provides the considerable boost, so feel free to proceed with the patch. > end > > local tuple_check = function(tuple, usage) > -- > 2.31.0 > -- Best regards, IM