From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com [209.85.167.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 3A76846971A for ; Wed, 4 Dec 2019 16:15:34 +0300 (MSK) Received: by mail-lf1-f67.google.com with SMTP id 203so6057222lfa.12 for ; Wed, 04 Dec 2019 05:15:34 -0800 (PST) Date: Wed, 4 Dec 2019 16:15:32 +0300 From: Konstantin Osipov Message-ID: <20191204131532.GC9676@atlas> References: <20191126211701.mhavpytwkemux3vm@tkn_work_nb> <20191127083123.GA2752@atlas> <20191128123445.GC29714@atlas> <20191128130005.GA1214@tarantool.org> <20191128131804.GE29714@atlas> <20191128140340.GB1214@tarantool.org> <20191128155801.GB11584@atlas> <20191128183615.GC1214@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH] Move txn from shema to a separate module (use C API instead of FFI) List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Leonid Vasiliev Cc: tarantool-patches@dev.tarantool.org * Leonid Vasiliev [19/12/04 16:08]: Leonid, thank you for the benchmark. Could you please also provide - explanation for the results (not everyone may understand why the results are the way they are) - your analysis of the results and suggestions for further actions? > So, several benchmarks (sergos/rollback_to_savepoint - > lvasiliev/gh-4427-move-some-stuff-from-ffi-to-c-api): > > BM1: > > local os = require('os') > local clock = require('clock') > > box.cfg() > > local s = box.schema.space.create('test') > s:create_index('primary') > > > local start_clock = clock.monotonic() > > for i = 1,1000000 do > box.begin() > s:replace{1} > box.is_in_txn() > local save1 = box.savepoint() > s:replace{2} > box.is_in_txn() > local save2 = box.savepoint() > s:replace{3} > box.is_in_txn() > local save3 = box.savepoint() > s:replace{4} > box.is_in_txn() > box.rollback_to_savepoint(save3) > box.is_in_txn() > box.rollback_to_savepoint(save2) > box.is_in_txn() > box.rollback_to_savepoint(save1) > box.is_in_txn() > box.commit() > end > > local stop_clock = clock.monotonic() > local work_time = stop_clock - start_clock > print("Result: " .. tostring(work_time)) > > s:drop() > os.exit() > > > BM2: > > local os = require('os') > local clock = require('clock') > > box.cfg() > > local s = box.schema.space.create('test') > s:create_index('primary') > > > local start_clock = clock.monotonic() > > for i = 1,1000000 do > box.is_in_txn() > end > > local stop_clock = clock.monotonic() > local work_time = stop_clock - start_clock > print("Result: " .. tostring(work_time)) > > s:drop() > os.exit() > > > BM3: > > local os = require('os') > local clock = require('clock') > > box.cfg() > > local s = box.schema.space.create('test') > s:create_index('primary') > > > local start_clock = clock.monotonic() > > for i = 1,1000000 do > box.begin() > box.commit() > end > > local stop_clock = clock.monotonic() > local work_time = stop_clock - start_clock > print("Result: " .. tostring(work_time)) > > s:drop() > os.exit() > > > BM4: > > local os = require('os') > local clock = require('clock') > > box.cfg() > > local s = box.schema.space.create('test') > s:create_index('primary') > > > local start_clock = clock.monotonic() > > for i = 1,1000000 do > box.begin() > s:replace{1} > local save1 = box.savepoint() > s:replace{2} > box.rollback_to_savepoint(save1) > box.commit() > end > > local stop_clock = clock.monotonic() > local work_time = stop_clock - start_clock > print("Result: " .. tostring(work_time)) > > s:drop() > os.exit() > > > Results: > > BM Sergo commit(s) Leonid commit(s) Result > bm1 26.7092 26.7897 Sergo commit win 0.3% > bm2 0.0023 0.0195 Sergo commit win x 8.5 > bm3 0.1281 0.1502 Sergo commit win 17.2% > bm4 19.6567 19.8466 Sergo commit win 0.96% > > > On 11/28/19 9:36 PM, Igor Munkin wrote: > > Kostja, > > > > On 28.11.19, Konstantin Osipov wrote: > > > * Igor Munkin [19/11/28 17:08]: > > > > > > > Why should we be aiming at using FFI more? The root cause is that > > > > current fiber machinery (as well as some parts of triggers mechanism) > > > > doesn't respect the Lua coroutine switch semantics, thereby breaking > > > > trace recording. Lua-C API implicitly (or non-intentionally) prevents > > > > breakage by JIT trace aborts when recording FUNCC. > > > > > > It's not correct. The current FFI functions were carefully crafted > > > to never lead to sandwich code: only those functions which can not > > > trigger a return to Lua were implemented as FFI. > > > There was one regression between 1.10 and in 2.3 because we > > > started firing rollback triggers when rolling back to a savepoint, > > > which was spotted by a failing tests. > > > > > > One more time: When FFI bindings were written we were aware of NYI > > > and took it into account. > > > > > > > OK, maybe I said it the wrong way using the word "non-intentionally". I > > mean that Tarantool doesn't use any special handler to asynchroniously > > abort recording, since there is no such outside the LuaJIT internals > > (jit.off blacklists the function regardless the host and guest stacks > > layout). > > > > > > Therefore, I guess we should be aiming either at changing fiber > > > > switching to the one respecting the LuaJIT runtime or at tuning JIT > > > > compiler way more regarding the Lua-C usage. > > > > > > This is actually quite simple - we could easily call a LuaJIT hook > > > whenever switching a fiber, to make sure that it carefully > > > switches the internals as well. Mike Pall refused to cooperate on > > > the matter, but now we (you) control our own destiny. > > > > Unfortunately, I haven't seen the thread where the subj is discussed > > with Mike Pall, but the approach you proposed doesn't seem to be a > > convenient one, however it still solves a problem (as does the move to > > use Lua-C API for the code with possible Lua VM re-entrance underneath). > > > > The major flaw I see in this solution, is introducing the dependency on > > the JIT interface into Tarantool internals. There is already one > > dependency on LuaJIT-2.1.0 presented with internal headers usage for > > several hacks in utils.c. As a result we are not able to simply replace > > the Lua implementation to try another one (e.g. uJIT conforming > > LuaJIT-2.0.5) for comparing each other. > > > > The best proposal we had with Kirill and Sergos is to finalize a trace > > exactly at CALLXS IR, however after some research I found that the > > snapshot to be replayed at the corresponding trace exit will restore the > > guest stack it doesn't relate to. I hope to make further research this > > direction, but it requires a way more time to adjust this behaviour and > > its benefits are doubtful for me now. > > > > For now, there is a partial fix I mentioned before, however it still > > violates the flow I described here[1]. I'm going to proceed with the > > research a bit later and provide another patch. > > > > > > > > > Besides, we can't fully prevent platform failures if there is an FFI > > > > misusage in users code. > > > > > > Tarantool has never been claiming that it prevents people from > > > > Sorry, I simply misread the following: > > |> Why not dig it up to protect from future erosion of the code base? > > |> > > |> This would be more valuable contribution than just falling back to > > |> Lua/C for everything. > > > > > shooting themselves in the foot. Performance is the ultimate > > > design goal, at the cost of safety at times. > > > > > > > Great, we discussed with Leonid and Sasha offline and agreed to make > > several benchmarks to be provided in this thread. With no benchmarks all > > our estimates can be simply wrong. > > > > > > > > > > What should be the rule of thumb in your opinion, ffi or > > > > > lua/c? > > > > > > > > If you want to know my rule of thumb: FFI is for external existing > > > > libraries to be used in Lua code (and all compiler related benefits are > > > > nothing more than a godsend consequence, since all guest stack > > > > manipulations are implemented in LuaJIT runtime, not in an external > > > > code) and Lua-C is a well-designed and well-documented API for embedding > > > > Lua into a host application / extending Lua with external low-lewel > > > > libs. I totally do not insist on my point of view, since everyone has > > > > it's own vision on LuaJIT features. > > > > > > OK, but there must be a single policy though. So far it was: > > > everything that doesn't yield and doesn't call back to Lua > > > uses FFI. Everything else *has* to use Lua/C API, UNTIL there > > > is a way to safely sandwich FFI calls. > > > > > > > I agree with you for the policy existence, but we all see the one you > > mentioned above can introduce bugs leading to a platform failures. So I > > guess we should reconsider it or simply dump somewhere. I think we have > > to make some benchmarks and provide not only stats, but also a > > reproducer with the input data, otherwise JIT tests are IMHO irrelevant. > > > > > -- > > > Konstantin Osipov, Moscow, Russia > > > > [1]: https://github.com/tarantool/tarantool/issues/4427#issuecomment-546056302 > > -- Konstantin Osipov, Moscow, Russia