[Tarantool-patches] [PATCH] Move txn from shema to a separate module (use C API instead of FFI)

Konstantin Osipov kostja.osipov at gmail.com
Wed Dec 4 16:15:32 MSK 2019


* Leonid Vasiliev <lvasiliev at tarantool.org> [19/12/04 16:08]:

Leonid, thank you for the benchmark.

Could you please also provide 
- explanation for the results (not everyone may understand why the
  results are the way they are)
- your analysis of the results and suggestions for further
  actions?

> So, several benchmarks (sergos/rollback_to_savepoint -
> lvasiliev/gh-4427-move-some-stuff-from-ffi-to-c-api):
> 
> BM1:
> 
> local os = require('os')
> local clock = require('clock')
> 
> box.cfg()
> 
> local s = box.schema.space.create('test')
> s:create_index('primary')
> 
> 
> local start_clock = clock.monotonic()
> 
> for i = 1,1000000 do
>     box.begin()
>     s:replace{1}
>     box.is_in_txn()
>     local save1 = box.savepoint()
>     s:replace{2}
>     box.is_in_txn()
>     local save2 = box.savepoint()
>     s:replace{3}
>     box.is_in_txn()
>     local save3 = box.savepoint()
>     s:replace{4}
>     box.is_in_txn()
>     box.rollback_to_savepoint(save3)
>     box.is_in_txn()
>     box.rollback_to_savepoint(save2)
>     box.is_in_txn()
>     box.rollback_to_savepoint(save1)
>     box.is_in_txn()
>     box.commit()
> end
> 
> local stop_clock = clock.monotonic()
> local work_time = stop_clock - start_clock
> print("Result: " .. tostring(work_time))
> 
> s:drop()
> os.exit()
> 
> 
> BM2:
> 
> local os = require('os')
> local clock = require('clock')
> 
> box.cfg()
> 
> local s = box.schema.space.create('test')
> s:create_index('primary')
> 
> 
> local start_clock = clock.monotonic()
> 
> for i = 1,1000000 do
>     box.is_in_txn()
> end
> 
> local stop_clock = clock.monotonic()
> local work_time = stop_clock - start_clock
> print("Result: " .. tostring(work_time))
> 
> s:drop()
> os.exit()
> 
> 
> BM3:
> 
> local os = require('os')
> local clock = require('clock')
> 
> box.cfg()
> 
> local s = box.schema.space.create('test')
> s:create_index('primary')
> 
> 
> local start_clock = clock.monotonic()
> 
> for i = 1,1000000 do
>     box.begin()
>     box.commit()
> end
> 
> local stop_clock = clock.monotonic()
> local work_time = stop_clock - start_clock
> print("Result: " .. tostring(work_time))
> 
> s:drop()
> os.exit()
> 
> 
> BM4:
> 
> local os = require('os')
> local clock = require('clock')
> 
> box.cfg()
> 
> local s = box.schema.space.create('test')
> s:create_index('primary')
> 
> 
> local start_clock = clock.monotonic()
> 
> for i = 1,1000000 do
>     box.begin()
>     s:replace{1}
>     local save1 = box.savepoint()
>     s:replace{2}
>     box.rollback_to_savepoint(save1)
>     box.commit()
> end
> 
> local stop_clock = clock.monotonic()
> local work_time = stop_clock - start_clock
> print("Result: " .. tostring(work_time))
> 
> s:drop()
> os.exit()
> 
> 
> Results:
> 
> BM	Sergo commit(s)	Leonid commit(s)	Result
> bm1	26.7092		26.7897			Sergo commit win 0.3%
> bm2	0.0023		0.0195			Sergo commit win x 8.5
> bm3	0.1281		0.1502			Sergo commit win 17.2%
> bm4	19.6567		19.8466			Sergo commit win 0.96%
> 
> 
> On 11/28/19 9:36 PM, Igor Munkin wrote:
> > Kostja,
> > 
> > On 28.11.19, Konstantin Osipov wrote:
> > > * Igor Munkin <imun at tarantool.org> [19/11/28 17:08]:
> > > 
> > > > Why should we be aiming at using FFI more? The root cause is that
> > > > current fiber machinery (as well as some parts of triggers mechanism)
> > > > doesn't respect the Lua coroutine switch semantics, thereby breaking
> > > > trace recording. Lua-C API implicitly (or non-intentionally) prevents
> > > > breakage by JIT trace aborts when recording FUNCC.
> > > 
> > > It's not correct. The current FFI functions were carefully crafted
> > > to never lead to sandwich code: only those functions which can not
> > > trigger a return to Lua were implemented as FFI.
> > > There was one regression between 1.10 and in 2.3 because we
> > > started firing rollback triggers when rolling back to a savepoint,
> > > which was spotted by a failing tests.
> > > 
> > > One more time: When FFI bindings were written we were aware of NYI
> > > and took it into account.
> > > 
> > 
> > OK, maybe I said it the wrong way using the word "non-intentionally". I
> > mean that Tarantool doesn't use any special handler to asynchroniously
> > abort recording, since there is no such outside the LuaJIT internals
> > (jit.off blacklists the function regardless the host and guest stacks
> > layout).
> > 
> > > > Therefore, I guess we should be aiming either at changing fiber
> > > > switching to the one respecting the LuaJIT runtime or at tuning JIT
> > > > compiler way more regarding the Lua-C usage.
> > > 
> > > This is actually quite simple - we could easily call a LuaJIT hook
> > > whenever switching a fiber, to make sure that it carefully
> > > switches the internals as well. Mike Pall refused to cooperate on
> > > the matter, but now we (you) control our own destiny.
> > 
> > Unfortunately, I haven't seen the thread where the subj is discussed
> > with Mike Pall, but the approach you proposed doesn't seem to be a
> > convenient one, however it still solves a problem (as does the move to
> > use Lua-C API for the code with possible Lua VM re-entrance underneath).
> > 
> > The major flaw I see in this solution, is introducing the dependency on
> > the JIT interface into Tarantool internals. There is already one
> > dependency on LuaJIT-2.1.0 presented with internal headers usage for
> > several hacks in utils.c. As a result we are not able to simply replace
> > the Lua implementation to try another one (e.g. uJIT conforming
> > LuaJIT-2.0.5) for comparing each other.
> > 
> > The best proposal we had with Kirill and Sergos is to finalize a trace
> > exactly at CALLXS IR, however after some research I found that the
> > snapshot to be replayed at the corresponding trace exit will restore the
> > guest stack it doesn't relate to. I hope to make further research this
> > direction, but it requires a way more time to adjust this behaviour and
> > its benefits are doubtful for me now.
> > 
> > For now, there is a partial fix I mentioned before, however it still
> > violates the flow I described here[1]. I'm going to proceed with the
> > research a bit later and provide another patch.
> > 
> > > 
> > > > Besides, we can't fully prevent platform failures if there is an FFI
> > > > misusage in users code.
> > > 
> > > Tarantool has never been claiming that it prevents people from
> > 
> > Sorry, I simply misread the following:
> > |> Why not dig it up to protect from future erosion of the code base?
> > |>
> > |> This would be more valuable contribution than just falling back to
> > |> Lua/C for everything.
> > 
> > > shooting themselves in the foot. Performance is the ultimate
> > > design goal, at the cost of safety at times.
> > > 
> > 
> > Great, we discussed with Leonid and Sasha offline and agreed to make
> > several benchmarks to be provided in this thread. With no benchmarks all
> > our estimates can be simply wrong.
> > 
> > > 
> > > > > What should be the rule of thumb in your opinion, ffi or
> > > > > lua/c?
> > > > 
> > > > If you want to know my rule of thumb: FFI is for external existing
> > > > libraries to be used in Lua code (and all compiler related benefits are
> > > > nothing more than a godsend consequence, since all guest stack
> > > > manipulations are implemented in LuaJIT runtime, not in an external
> > > > code) and Lua-C is a well-designed and well-documented API for embedding
> > > > Lua into a host application / extending Lua with external low-lewel
> > > > libs. I totally do not insist on my point of view, since everyone has
> > > > it's own vision on LuaJIT features.
> > > 
> > > OK, but there must be a single policy though. So far it was:
> > > everything that doesn't yield and doesn't call back to Lua
> > > uses FFI. Everything else *has* to use Lua/C API, UNTIL  there
> > > is a way to safely sandwich FFI calls.
> > > 
> > 
> > I agree with you for the policy existence, but we all see the one you
> > mentioned above can introduce bugs leading to a platform failures. So I
> > guess we should reconsider it or simply dump somewhere. I think we have
> > to make some benchmarks and provide not only stats, but also a
> > reproducer with the input data, otherwise JIT tests are IMHO irrelevant.
> > 
> > > -- 
> > > Konstantin Osipov, Moscow, Russia
> > 
> > [1]: https://github.com/tarantool/tarantool/issues/4427#issuecomment-546056302
> > 

-- 
Konstantin Osipov, Moscow, Russia


More information about the Tarantool-patches mailing list