From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 27 Oct 2021 16:16:43 +0300
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
To: Igor Munkin
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable
References: <20211022130225.6076-1-skaplun@tarantool.org>
 <20211027110640.GA8831@tarantool.org>
In-Reply-To: <20211027110640.GA8831@tarantool.org>
List-Id: Tarantool development patches
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Igor,

Thanks for the review!

On 27.10.21, Igor Munkin wrote:
> Sergey,
>
> Thanks for the patch! LGTM with some nits below.
>
> On 22.10.21, Sergey Kaplun wrote:
> > tuple_bless() uses a tail call to ffi.gc() with return to the caller.
> > This tail call replaces the current (tuple_bless) frame with the frame
> > of the callee (ffi.gc). When JIT tries to compile return from `ffi.gc()`
> > to the frame below it aborts the trace recording with the error "NYI:
> > return to lower frame".
>
> Side note: for the root traces the issue is the same, but the error is
> different.

Yep.

>
> >
> > This patch replaces the tail call with using additional local variable
>
> Minor: You do not replace tail call, but rather don't give an option for
> LuaJIT to emit CALLT. Anyway, just being pedantic, feel free to ignore.

So, the CALLT is replaced with a regular call :).
Ignoring.

>
> > returned to the caller right after.
> > ---
> >
> > Actually, this patch became possible thanks to Michael Filonenko and his
> > benchmarks of TDG runs with jit.dump() enabled. After analysis of this
> > dump we realized that tuple_bless is not compiled. This uncompiled chunk
> > of code leads to the JIT cancer for all possible workflows that use
> > tuple_bless() (i.e. tuple:update() and tuple:upsert()). This change is
> > really trivial, but adds almost x2 improvement of performance for
> > tuple:update()/upsert() scenario. Hope, that this patch will be a
> > stimulus for including benchmarks of our forward products like TDG to
> > routine performance running with the corresponding profilers dumps.
>
> Kekw, one-liner boosting update/upsert in two times -- nice catch!
> Anyway, please check that your change doesn't affect overall performance
> in interpreter mode too.

The new one (without tail call) is 1% slower: 21.2 sec vs 21.0 sec
with jit.off(). This looks like a good trade to me.
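
For readers following the CALLT discussion above, here is a minimal
sketch in plain Lua (not Tarantool code) of how the two call forms
differ on the stack. The names called_from, with_tail_call and
without_tail_call are hypothetical, made up for illustration; only the
standard debug.getinfo() is assumed. With the tail call the wrapper's
frame is already gone when the callee runs, while the local-variable
form keeps it alive, which is the shape tuple_bless() gets after the
patch.

```lua
-- Hypothetical helper: reports whether `expected_caller` is the function
-- whose frame sits directly below this helper on the call stack.
local function called_from(expected_caller)
    -- Level 1 is called_from itself, level 2 is the frame below it.
    local info = debug.getinfo(2, "f")
    return info ~= nil and info.func == expected_caller
end

local with_tail_call, without_tail_call

-- `return f(...)` is a tail call (CALLT in LuaJIT bytecode):
-- the callee reuses the frame of with_tail_call.
with_tail_call = function()
    return called_from(with_tail_call)
end

-- Storing the result in a local first forces a regular call:
-- the frame of without_tail_call stays alive while the callee runs.
without_tail_call = function()
    local result = called_from(without_tail_call)
    return result
end

print(with_tail_call())    --> false
print(without_tail_call()) --> true
```

This only shows the frame behaviour. It says nothing about what the JIT
records, so a jit.dump() run is still needed to confirm that the trace
is no longer aborted.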
>
> The bad thing is that we have no regular Lua benchmarks at all
> (even those you provided below), so we can't watch the effect of such
> changes regularly.

It's true.

>
> >
> > Benchmarks:
> >
> > Before patch:
> >
> > Update:
> > | Tarantool 2.10.0-beta1-90-g31594b427
> > | type 'help' for interactive help
> > | tarantool> local t = {}
> > | for i = 1, 1e6 do
> > |     table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> > | end
> > | local clock = require"clock"
> > | local S = clock.proc()
> > | for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> > | return clock.proc() - S;
> > | ---
> > | - 4.208298872
> >
> > Upsert: 4.158661731
> >
> > After patch:
> >
> > Update:
> > | Tarantool 2.10.0-beta1-90-g31594b427
> > | type 'help' for interactive help
> > | tarantool> local t = {}
> > | for i = 1, 1e6 do
> > |     table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> > | end
> > | local clock = require"clock"
> > | local S = clock.proc()
> > | for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> > | return clock.proc() - S;
> > | ---
> > | - 2.357670738
> >
> > Upsert: 2.334134195
> >
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-tuple-bless-compile
> >
> >  src/box/lua/tuple.lua | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
> > index fa76f4f7f..73446ab22 100644
> > --- a/src/box/lua/tuple.lua
> > +++ b/src/box/lua/tuple.lua
> > @@ -98,7 +98,14 @@ local tuple_bless = function(tuple)
> >      -- overflow checked by tuple_bless() in C
> >      builtin.box_tuple_ref(tuple)
> >      -- must never fail:
> > -    return ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> > +    -- XXX: If we use tail call (instead creating a new frame for
>
> Typo: s/instead/instead of/.
>
> > +    -- a call just replace the top one) here, then JIT tries
>
> Minor: I see "replace" for the second time, but LuaJIT just "use" the
> caller frame for callee.
> I propose to s/replace/use/g, but this is
> negligible, so feel free to ignore.
>
> > +    -- to compile return from `ffi.gc()` to the frame below. This
> > +    -- abort the trace recording with the error "NYI: return to
>
> Typo: s/abort/aborts/.

Fixed per your comments. See the iterative patch below. The branch is
force-pushed.

===================================================================
diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
index 1201c7c34..f47b5926d 100644
--- a/src/box/lua/tuple.lua
+++ b/src/box/lua/tuple.lua
@@ -98,10 +98,10 @@ local tuple_bless = function(tuple)
     -- overflow checked by tuple_bless() in C
     builtin.box_tuple_ref(tuple)
     -- must never fail:
-    -- XXX: If we use tail call (instead creating a new frame for
-    -- a call just replace the top one) here, then JIT tries
-    -- to compile return from `ffi.gc()` to the frame below. This
-    -- abort the trace recording with the error "NYI: return to
+    -- XXX: If we use tail call (instead of creating a new frame
+    -- for a call just use the top one) here, then JIT tries to
+    -- compile return from `ffi.gc()` to the frame below. This
+    -- aborts the trace recording with the error "NYI: return to
     -- lower frame". So avoid tail call and use additional stack
     -- slots (for the local variable and the frame).
     local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
===================================================================

And the new commit message (remove "replace" usage):

===================================================================
tuple: make tuple_bless() compilable

tuple_bless() uses a tail call to ffi.gc() with return to the caller.
This tail call uses the current (tuple_bless) frame instead of creating
the frame for the callee (ffi.gc). When JIT tries to compile return from
`ffi.gc()` to the frame below it aborts the trace recording with the
error "NYI: return to lower frame".

This patch replaces the tail call with using additional local variable
returned to the caller right after.
===================================================================

>
> > +    -- lower frame". So avoid tail call and use additional stack
> > +    -- slots (for the local variable and the frame).
> > +    local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> > +    return tuple_ref
>
> Side note: Ugh... I'm sad we're doing things like this one. Complicating
> the code, leaving huge comments with the rationale of such complicating
> to reach the desirable (and what is important, local) performance. I
> propose to spend your innovative time to try solving the problem in the
> JIT engine: it will be more fun and allow us to avoid writing the
> cookbook "How to write super-duper-jittable code in LuaJIT".

Yes, it is an ugly workaround. The proper way is to resolve the problem
with compiling a return to a frame below the one where the trace was
started.

>
> Here is the valid question: what about other hot places with CALLT in
> Tarantool? Should they be considered/fixed? I guess a ticket will help
> to not forget about this problem.

I suppose the ticket should be created as part of the regular testing of
our most valuable products and "boxes".

>
> Anyway, for now the fix provides the considerable boost, so feel free to
> proceed with the patch.

Thanks!:)

>
> >  end
> >
> >  local tuple_check = function(tuple, usage)
> > --
> > 2.31.0
> >
>
> --
> Best regards,
> IM

--
Best regards,
Sergey Kaplun