[Tarantool-patches] [PATCH] Move txn from shema to a separate module (use C API instead of FFI)

Leonid Vasiliev lvasiliev at tarantool.org
Thu Dec 5 11:27:29 MSK 2019


On 12/4/19 4:15 PM, Konstantin Osipov wrote:
> * Leonid Vasiliev <lvasiliev at tarantool.org> [19/12/04 16:08]:
> 
> Leonid, thank you for the benchmark.
> 
> Could you please also provide
> - explanation for the results (not everyone may understand why the
>    results are the way they are)

Results explanation:
In case lvasiliev/gh-4427-move-some-stuff-from-ffi-to-c-api: 
box.begin(), box.is_in_txn(), box.savepoint() use a C-API
In case sergos/rollback_to_savepoint:
box.begin(), box.is_in_txn(), box.savepoint() use a LuaJIT's FFI

> - your analysis of the results and suggestions for further
>    actions?

Look, I'll be honest with you.
I don't have enough competence in Lua performance stuff. As I understand 
- each of these methods has its pros and cons. My (and Igor if I have 
understood correctly) point of view: "We need a some policy of using FFI 
vs. C-API". And I try to force it.

> 
>> So, several benchmarks (sergos/rollback_to_savepoint -
>> lvasiliev/gh-4427-move-some-stuff-from-ffi-to-c-api):
>>
>> BM1:
>>
>> local os = require('os')
>> local clock = require('clock')
>>
>> box.cfg()
>>
>> local s = box.schema.space.create('test')
>> s:create_index('primary')
>>
>>
>> local start_clock = clock.monotonic()
>>
>> for i = 1,1000000 do
>>      box.begin()
>>      s:replace{1}
>>      box.is_in_txn()
>>      local save1 = box.savepoint()
>>      s:replace{2}
>>      box.is_in_txn()
>>      local save2 = box.savepoint()
>>      s:replace{3}
>>      box.is_in_txn()
>>      local save3 = box.savepoint()
>>      s:replace{4}
>>      box.is_in_txn()
>>      box.rollback_to_savepoint(save3)
>>      box.is_in_txn()
>>      box.rollback_to_savepoint(save2)
>>      box.is_in_txn()
>>      box.rollback_to_savepoint(save1)
>>      box.is_in_txn()
>>      box.commit()
>> end
>>
>> local stop_clock = clock.monotonic()
>> local work_time = stop_clock - start_clock
>> print("Result: " .. tostring(work_time))
>>
>> s:drop()
>> os.exit()
>>
>>
>> BM2:
>>
>> local os = require('os')
>> local clock = require('clock')
>>
>> box.cfg()
>>
>> local s = box.schema.space.create('test')
>> s:create_index('primary')
>>
>>
>> local start_clock = clock.monotonic()
>>
>> for i = 1,1000000 do
>>      box.is_in_txn()
>> end
>>
>> local stop_clock = clock.monotonic()
>> local work_time = stop_clock - start_clock
>> print("Result: " .. tostring(work_time))
>>
>> s:drop()
>> os.exit()
>>
>>
>> BM3:
>>
>> local os = require('os')
>> local clock = require('clock')
>>
>> box.cfg()
>>
>> local s = box.schema.space.create('test')
>> s:create_index('primary')
>>
>>
>> local start_clock = clock.monotonic()
>>
>> for i = 1,1000000 do
>>      box.begin()
>>      box.commit()
>> end
>>
>> local stop_clock = clock.monotonic()
>> local work_time = stop_clock - start_clock
>> print("Result: " .. tostring(work_time))
>>
>> s:drop()
>> os.exit()
>>
>>
>> BM4:
>>
>> local os = require('os')
>> local clock = require('clock')
>>
>> box.cfg()
>>
>> local s = box.schema.space.create('test')
>> s:create_index('primary')
>>
>>
>> local start_clock = clock.monotonic()
>>
>> for i = 1,1000000 do
>>      box.begin()
>>      s:replace{1}
>>      local save1 = box.savepoint()
>>      s:replace{2}
>>      box.rollback_to_savepoint(save1)
>>      box.commit()
>> end
>>
>> local stop_clock = clock.monotonic()
>> local work_time = stop_clock - start_clock
>> print("Result: " .. tostring(work_time))
>>
>> s:drop()
>> os.exit()
>>
>>
>> Results:
>>
>> BM	Sergo commit(s)	Leonid commit(s)	Result
>> bm1	26.7092		26.7897			Sergo commit win 0.3%
>> bm2	0.0023		0.0195			Sergo commit win x 8.5
>> bm3	0.1281		0.1502			Sergo commit win 17.2%
>> bm4	19.6567		19.8466			Sergo commit win 0.96%
>>
>>
>> On 11/28/19 9:36 PM, Igor Munkin wrote:
>>> Kostja,
>>>
>>> On 28.11.19, Konstantin Osipov wrote:
>>>> * Igor Munkin <imun at tarantool.org> [19/11/28 17:08]:
>>>>
>>>>> Why should we be aiming at using FFI more? The root cause is that
>>>>> current fiber machinery (as well as some parts of triggers mechanism)
>>>>> doesn't respect the Lua coroutine switch semantics, thereby breaking
>>>>> trace recording. Lua-C API implicitly (or non-intentionally) prevents
>>>>> breakage by JIT trace aborts when recording FUNCC.
>>>>
>>>> It's not correct. The current FFI functions were carefully crafted
>>>> to never lead to sandwich code: only those functions which can not
>>>> trigger a return to Lua were implemented as FFI.
>>>> There was one regression between 1.10 and in 2.3 because we
>>>> started firing rollback triggers when rolling back to a savepoint,
>>>> which was spotted by a failing tests.
>>>>
>>>> One more time: When FFI bindings were written we were aware of NYI
>>>> and took it into account.
>>>>
>>>
>>> OK, maybe I said it the wrong way using the word "non-intentionally". I
>>> mean that Tarantool doesn't use any special handler to asynchroniously
>>> abort recording, since there is no such outside the LuaJIT internals
>>> (jit.off blacklists the function regardless the host and guest stacks
>>> layout).
>>>
>>>>> Therefore, I guess we should be aiming either at changing fiber
>>>>> switching to the one respecting the LuaJIT runtime or at tuning JIT
>>>>> compiler way more regarding the Lua-C usage.
>>>>
>>>> This is actually quite simple - we could easily call a LuaJIT hook
>>>> whenever switching a fiber, to make sure that it carefully
>>>> switches the internals as well. Mike Pall refused to cooperate on
>>>> the matter, but now we (you) control our own destiny.
>>>
>>> Unfortunately, I haven't seen the thread where the subj is discussed
>>> with Mike Pall, but the approach you proposed doesn't seem to be a
>>> convenient one, however it still solves a problem (as does the move to
>>> use Lua-C API for the code with possible Lua VM re-entrance underneath).
>>>
>>> The major flaw I see in this solution, is introducing the dependency on
>>> the JIT interface into Tarantool internals. There is already one
>>> dependency on LuaJIT-2.1.0 presented with internal headers usage for
>>> several hacks in utils.c. As a result we are not able to simply replace
>>> the Lua implementation to try another one (e.g. uJIT conforming
>>> LuaJIT-2.0.5) for comparing each other.
>>>
>>> The best proposal we had with Kirill and Sergos is to finalize a trace
>>> exactly at CALLXS IR, however after some research I found that the
>>> snapshot to be replayed at the corresponding trace exit will restore the
>>> guest stack it doesn't relate to. I hope to make further research this
>>> direction, but it requires a way more time to adjust this behaviour and
>>> its benefits are doubtful for me now.
>>>
>>> For now, there is a partial fix I mentioned before, however it still
>>> violates the flow I described here[1]. I'm going to proceed with the
>>> research a bit later and provide another patch.
>>>
>>>>
>>>>> Besides, we can't fully prevent platform failures if there is an FFI
>>>>> misusage in users code.
>>>>
>>>> Tarantool has never been claiming that it prevents people from
>>>
>>> Sorry, I simply misread the following:
>>> |> Why not dig it up to protect from future erosion of the code base?
>>> |>
>>> |> This would be more valuable contribution than just falling back to
>>> |> Lua/C for everything.
>>>
>>>> shooting themselves in the foot. Performance is the ultimate
>>>> design goal, at the cost of safety at times.
>>>>
>>>
>>> Great, we discussed with Leonid and Sasha offline and agreed to make
>>> several benchmarks to be provided in this thread. With no benchmarks all
>>> our estimates can be simply wrong.
>>>
>>>>
>>>>>> What should be the rule of thumb in your opinion, ffi or
>>>>>> lua/c?
>>>>>
>>>>> If you want to know my rule of thumb: FFI is for external existing
>>>>> libraries to be used in Lua code (and all compiler related benefits are
>>>>> nothing more than a godsend consequence, since all guest stack
>>>>> manipulations are implemented in LuaJIT runtime, not in an external
>>>>> code) and Lua-C is a well-designed and well-documented API for embedding
>>>>> Lua into a host application / extending Lua with external low-lewel
>>>>> libs. I totally do not insist on my point of view, since everyone has
>>>>> it's own vision on LuaJIT features.
>>>>
>>>> OK, but there must be a single policy though. So far it was:
>>>> everything that doesn't yield and doesn't call back to Lua
>>>> uses FFI. Everything else *has* to use Lua/C API, UNTIL  there
>>>> is a way to safely sandwich FFI calls.
>>>>
>>>
>>> I agree with you for the policy existence, but we all see the one you
>>> mentioned above can introduce bugs leading to a platform failures. So I
>>> guess we should reconsider it or simply dump somewhere. I think we have
>>> to make some benchmarks and provide not only stats, but also a
>>> reproducer with the input data, otherwise JIT tests are IMHO irrelevant.
>>>
>>>> -- 
>>>> Konstantin Osipov, Moscow, Russia
>>>
>>> [1]: https://github.com/tarantool/tarantool/issues/4427#issuecomment-546056302
>>>
> 


More information about the Tarantool-patches mailing list