From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 7891146971A for ; Wed, 4 Dec 2019 16:05:44 +0300 (MSK) References: <156ce93b495648d6f3fd6c879b0d9aaf56754a1e.1574773773.git.lvasiliev@tarantool.org> <20191126210520.GE23422@atlas> <20191126211701.mhavpytwkemux3vm@tkn_work_nb> <20191127083123.GA2752@atlas> <20191128123445.GC29714@atlas> <20191128130005.GA1214@tarantool.org> <20191128131804.GE29714@atlas> <20191128140340.GB1214@tarantool.org> <20191128155801.GB11584@atlas> <20191128183615.GC1214@tarantool.org> From: Leonid Vasiliev Message-ID: Date: Wed, 4 Dec 2019 16:05:42 +0300 MIME-Version: 1.0 In-Reply-To: <20191128183615.GC1214@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH] Move txn from shema to a separate module (use C API instead of FFI) List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Munkin , Konstantin Osipov Cc: tarantool-patches@dev.tarantool.org So, several benchmarks (sergos/rollback_to_savepoint - lvasiliev/gh-4427-move-some-stuff-from-ffi-to-c-api): BM1: local os = require('os') local clock = require('clock') box.cfg() local s = box.schema.space.create('test') s:create_index('primary') local start_clock = clock.monotonic() for i = 1,1000000 do box.begin() s:replace{1} box.is_in_txn() local save1 = box.savepoint() s:replace{2} box.is_in_txn() local save2 = box.savepoint() s:replace{3} box.is_in_txn() local save3 = box.savepoint() s:replace{4} box.is_in_txn() box.rollback_to_savepoint(save3) box.is_in_txn() box.rollback_to_savepoint(save2) box.is_in_txn() box.rollback_to_savepoint(save1) box.is_in_txn() box.commit() end local stop_clock = clock.monotonic() local work_time = stop_clock - start_clock print("Result: " .. tostring(work_time)) s:drop() os.exit() BM2: local os = require('os') local clock = require('clock') box.cfg() local s = box.schema.space.create('test') s:create_index('primary') local start_clock = clock.monotonic() for i = 1,1000000 do box.is_in_txn() end local stop_clock = clock.monotonic() local work_time = stop_clock - start_clock print("Result: " .. tostring(work_time)) s:drop() os.exit() BM3: local os = require('os') local clock = require('clock') box.cfg() local s = box.schema.space.create('test') s:create_index('primary') local start_clock = clock.monotonic() for i = 1,1000000 do box.begin() box.commit() end local stop_clock = clock.monotonic() local work_time = stop_clock - start_clock print("Result: " .. tostring(work_time)) s:drop() os.exit() BM4: local os = require('os') local clock = require('clock') box.cfg() local s = box.schema.space.create('test') s:create_index('primary') local start_clock = clock.monotonic() for i = 1,1000000 do box.begin() s:replace{1} local save1 = box.savepoint() s:replace{2} box.rollback_to_savepoint(save1) box.commit() end local stop_clock = clock.monotonic() local work_time = stop_clock - start_clock print("Result: " .. tostring(work_time)) s:drop() os.exit() Results: BM Sergo commit(s) Leonid commit(s) Result bm1 26.7092 26.7897 Sergo commit win 0.3% bm2 0.0023 0.0195 Sergo commit win x 8.5 bm3 0.1281 0.1502 Sergo commit win 17.2% bm4 19.6567 19.8466 Sergo commit win 0.96% On 11/28/19 9:36 PM, Igor Munkin wrote: > Kostja, > > On 28.11.19, Konstantin Osipov wrote: >> * Igor Munkin [19/11/28 17:08]: >> >>> Why should we be aiming at using FFI more? The root cause is that >>> current fiber machinery (as well as some parts of triggers mechanism) >>> doesn't respect the Lua coroutine switch semantics, thereby breaking >>> trace recording. Lua-C API implicitly (or non-intentionally) prevents >>> breakage by JIT trace aborts when recording FUNCC. >> >> It's not correct. The current FFI functions were carefully crafted >> to never lead to sandwich code: only those functions which can not >> trigger a return to Lua were implemented as FFI. >> There was one regression between 1.10 and in 2.3 because we >> started firing rollback triggers when rolling back to a savepoint, >> which was spotted by a failing tests. >> >> One more time: When FFI bindings were written we were aware of NYI >> and took it into account. >> > > OK, maybe I said it the wrong way using the word "non-intentionally". I > mean that Tarantool doesn't use any special handler to asynchroniously > abort recording, since there is no such outside the LuaJIT internals > (jit.off blacklists the function regardless the host and guest stacks > layout). > >>> Therefore, I guess we should be aiming either at changing fiber >>> switching to the one respecting the LuaJIT runtime or at tuning JIT >>> compiler way more regarding the Lua-C usage. >> >> This is actually quite simple - we could easily call a LuaJIT hook >> whenever switching a fiber, to make sure that it carefully >> switches the internals as well. Mike Pall refused to cooperate on >> the matter, but now we (you) control our own destiny. > > Unfortunately, I haven't seen the thread where the subj is discussed > with Mike Pall, but the approach you proposed doesn't seem to be a > convenient one, however it still solves a problem (as does the move to > use Lua-C API for the code with possible Lua VM re-entrance underneath). > > The major flaw I see in this solution, is introducing the dependency on > the JIT interface into Tarantool internals. There is already one > dependency on LuaJIT-2.1.0 presented with internal headers usage for > several hacks in utils.c. As a result we are not able to simply replace > the Lua implementation to try another one (e.g. uJIT conforming > LuaJIT-2.0.5) for comparing each other. > > The best proposal we had with Kirill and Sergos is to finalize a trace > exactly at CALLXS IR, however after some research I found that the > snapshot to be replayed at the corresponding trace exit will restore the > guest stack it doesn't relate to. I hope to make further research this > direction, but it requires a way more time to adjust this behaviour and > its benefits are doubtful for me now. > > For now, there is a partial fix I mentioned before, however it still > violates the flow I described here[1]. I'm going to proceed with the > research a bit later and provide another patch. > >> >>> Besides, we can't fully prevent platform failures if there is an FFI >>> misusage in users code. >> >> Tarantool has never been claiming that it prevents people from > > Sorry, I simply misread the following: > |> Why not dig it up to protect from future erosion of the code base? > |> > |> This would be more valuable contribution than just falling back to > |> Lua/C for everything. > >> shooting themselves in the foot. Performance is the ultimate >> design goal, at the cost of safety at times. >> > > Great, we discussed with Leonid and Sasha offline and agreed to make > several benchmarks to be provided in this thread. With no benchmarks all > our estimates can be simply wrong. > >> >>>> What should be the rule of thumb in your opinion, ffi or >>>> lua/c? >>> >>> If you want to know my rule of thumb: FFI is for external existing >>> libraries to be used in Lua code (and all compiler related benefits are >>> nothing more than a godsend consequence, since all guest stack >>> manipulations are implemented in LuaJIT runtime, not in an external >>> code) and Lua-C is a well-designed and well-documented API for embedding >>> Lua into a host application / extending Lua with external low-lewel >>> libs. I totally do not insist on my point of view, since everyone has >>> it's own vision on LuaJIT features. >> >> OK, but there must be a single policy though. So far it was: >> everything that doesn't yield and doesn't call back to Lua >> uses FFI. Everything else *has* to use Lua/C API, UNTIL there >> is a way to safely sandwich FFI calls. >> > > I agree with you for the policy existence, but we all see the one you > mentioned above can introduce bugs leading to a platform failures. So I > guess we should reconsider it or simply dump somewhere. I think we have > to make some benchmarks and provide not only stats, but also a > reproducer with the input data, otherwise JIT tests are IMHO irrelevant. > >> -- >> Konstantin Osipov, Moscow, Russia > > [1]: https://github.com/tarantool/tarantool/issues/4427#issuecomment-546056302 >