[Tarantool-patches] [PATCH luajit] ARM64: Fix write barrier in BC_USETS.

Sergey Kaplun skaplun at tarantool.org
Sun Aug 1 20:00:33 MSK 2021


Igor,

Thanks for the review!
I've updated the commit message on the branch, considering your comments.

See the answers to your questions below.

On 01.08.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider the comments below.
> 
> On 07.07.21, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> > 
> > Contributed by Javier Guerra Giraldez.
> > 
> > (cherry picked from commit c785131ca5a6d24adc519e5e0bf1b69b671d912f)
> > 
> > Closed upvalues are never gray. So after closed upvalue is marked, it is
> > marked as black. Black objects can't refer white objects, so for storing
> > a white value in closed upvalue, we need to move the barrier forward and
> > color our value to gray by using `lj_gc_barrieruv()`. This function
> > can't be called on closed upvalues with non-white values (at least there
> > is no need to mark it again).
> 
> Minor: Considering the comments in parenthesis, "can't" looks more like
> "shouldn't". Anyway, I looked to the sources, and see the assertion,
> that only white and alive objects need to be marked, so I'm confused
> with your remark.

But the assertion means that it can't. The comment in parentheses only
clarifies why.
Ignoring for now.

> 
> > 
> > USETS bytecode for arm64 architecture has the incorrect instruction to
> > check that upvalue is closed:
> 
> AFAIU, the instruction is correct, but the nzcv value is not.

Fixed.

> 
> > | ccmp TMP0w, #0, #0, ne
> > | beq <1 // branch out from barrier movement
> > `TMP0w` contains `upvalue->closed` field. If it equals NULL (the first
> > `#0`). The second zero is the value of NZCV condition flags set if the
> > condition (`ne`) is FALSE [1][2]. If the set value is not white, then
> > flags are set to zero and branch is not taken (no Zero flag). If it
> > happens at propagate or atomic GC State and the `lj_gc_barrieruv()`
> > function is called then the gray value to set is marked as white. That
> > leads to the assertion failure in the `gc_mark()` function.
> 
> OK, I understand almost nothing from the part above. Here are the
> comments:
> 1. "If it equals NULL (the first `#0`)", then what?

My bad. I meant the following:
If it equals NULL (the first `#0`), then the upvalue is open.
Added this.

> 2. Just to check we are on the same page: the second "immediate"
> mentioned in docs[1] is NZCV?

Yes.

>                               Then beq <1 branch is not taken since
> (TMP0w != 0) is FALSE (i.e. upvalue is not closed), but zero flag in
> NZCV value is not set?

Yes.

>                        So how does the color of the value to be stored
> relate to this control flow?

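This NZCV immediate isn't used if the value to store is white, because
then the condition (`ne`) checked after the following instruction

|    tst TMP1w, #LJ_GC_WHITES	// iswhite(str)

is TRUE. In that case the flags come from comparing `upvalue->closed`
with zero, so the <1 branch is taken only if the upvalue is open. If the
value is not white, the condition is FALSE and the flags are loaded from
the immediate instead.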
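
To make the flag logic concrete, here is a rough C model of the
`tst` / `ccmp` / `beq` sequence. It is only a sketch: the function and
macro names are made up for illustration, and it models nothing but the
Zero flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the flag logic follows the discussion. */
#define NZCV_Z 0x4u  /* Zero flag bit within the 4-bit NZCV immediate. */

/*
** Models:  tst  TMP1w, #LJ_GC_WHITES
**          ccmp TMP0w, #0, #nzcv_imm, ne
**          beq  <1
** Returns true if `beq <1` is taken, i.e. the barrier is skipped.
*/
static bool usets_skips_barrier(bool value_is_white, uint8_t uv_closed,
                                unsigned nzcv_imm)
{
  /* tst sets Z when no white bit is set, so `ne` holds for a white value. */
  bool cond_ne = value_is_white;
  bool z;
  if (cond_ne)
    z = (uv_closed == 0);          /* Flags come from cmp closed, #0. */
  else
    z = (nzcv_imm & NZCV_Z) != 0;  /* Flags come from the immediate. */
  return z;                        /* beq is taken when Z is set. */
}
```

With the old immediate, `usets_skips_barrier(false, 1, 0)` is false, so
the barrier is entered for a non-white value. With `#4` the same call
returns true and the barrier is skipped, while a white value stored to
a closed upvalue still reaches it.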

> 3. AFAICS, if the branch is not taken and <lj_gc_barrieruv> is called at
> propagate or atomic phase, the value is colored either to gray or black.

Yes, that leads to the assertion failure mentioned in the ticket in the
LuaJIT upstream.
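
The failure path can be sketched in C as follows. This is a simplified
model, not the LuaJIT sources: the enums and the function name are
assumptions for illustration. It only encodes what is said above: the
barrier marks the value at the propagate or atomic state, and marking
asserts that the value is white.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for the GC colors and states. */
typedef enum { WHITE, GRAY, BLACK } Color;
typedef enum { GCS_PAUSE, GCS_PROPAGATE, GCS_ATOMIC, GCS_SWEEP } GCState;

/*
** Returns true if calling the barrier for a value of the given color
** at the given GC state would hit the "only white objects are marked"
** assertion.
*/
static bool barrier_would_assert(GCState st, Color value)
{
  /* The value is marked only at the propagate or atomic state. */
  bool marks = (st == GCS_PROPAGATE || st == GCS_ATOMIC);
  return marks && value != WHITE;
}
```

So a gray string constant stored during the propagate phase is enough
to trigger the assertion once the barrier is entered by mistake.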

> 
> > 
> > This patch changes yielded NZCV condition flag to 4 (Zero flag is up) to
> > take the correct branch after `ccmp` instruction.
> > 
> > Sergey Kaplun:
> > * added the description and the test for the problem
> > 
> > [1]: https://developer.arm.com/documentation/dui0801/g/pge1427897656225
> > [2]: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes
> 
> Minor: Why #5629 is not mentioned?

Added.

> 
> > ---
> > 
> > LuaJIT branch: https://github.com/tarantool/luajit/tree/skaplun/lj-426-incorrect-check-closed-uv
> > Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/lj-426-incorrect-check-closed-uv
> > 
> > Assertion failure [1] is not related to the patch (I've reproduced it on
> > master branch). Looks like another one GC64 issue.
> 
> Is this failure described in #6227 fixed by this patch[1]?

Yes.

> 
> > 
> > How to reproduce:
> > 1) Run the following command from the Tarantool repo on Odroid:
> > | $ i=0; while [[ $? == 0 ]]; do i=$(($i+1)); echo $i; make LuaJIT-tests; done
> > 2) Wait (need 4-15 iterations).
> > 
> > [1]: https://github.com/tarantool/tarantool/runs/3009273464#step:4:4013
> > 
> > Side note: Thanks to the Lord, that there is no #0 issue and it is not
> > mentioned that way...
> 
> Heh, GitHub is not ready for ARM64 support, but Tarantool almost is!
> 
> > 
> >  src/vm_arm64.dasc                             |  2 +-
> >  ...6-arm64-incorrect-check-closed-uv.test.lua | 38 +++++++++++++++++++
> >  2 files changed, 39 insertions(+), 1 deletion(-)
> >  create mode 100644 test/tarantool-tests/lj-426-arm64-incorrect-check-closed-uv.test.lua
> > 
> 
> <snipped>
> 
> > diff --git a/test/tarantool-tests/lj-426-arm64-incorrect-check-closed-uv.test.lua b/test/tarantool-tests/lj-426-arm64-incorrect-check-closed-uv.test.lua
> > new file mode 100644
> > index 00000000..b757133f
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-426-arm64-incorrect-check-closed-uv.test.lua
> > @@ -0,0 +1,38 @@
> > +local tap = require('tap')
> > +
> > +local test = tap.test('lj-426-arm64-incorrect-check-closed-uv')
> > +test:plan(1)
> > +
> > +-- Test file to demonstrate LuaJIT USETS bytecode incorrect
> > +-- behaviour on arm64 in case when non-white object is set to
> > +-- closed upvalue.
> > +-- See also, https://github.com/LuaJIT/LuaJIT/issues/426.
> > +
> > +-- First, create a closed upvalue.
> > +do
> 
> Minor: I'm not sure, we need a separate lexical block here. Could you
> please clarify the reason in the comment?

We need a closed upvalue. I suppose that it is the simplest way to
create one. Please provide a simpler example if you know one.

> 
> > +  local uv -- luacheck: no unused
> > +  -- The function's prototype is created with the following
> > +  -- constants at chunk parsing. After adding this constant to
> > +  -- the function's prototype it will be marked as gray during
> > +  -- propagate phase.
> 
> Then what does it test, if the constant is marked as gray? Will this
> string be white later?

It shouldn't be white; it should be gray. Otherwise the aforementioned
condition is TRUE (remember, we need FALSE).

> 
> > +  local function usets() uv = '' end
> > +  _G.usets = usets
> > +end
> > +
> > +-- Set GC state to GCpause.
> > +collectgarbage()
> > +-- Do GC step as often as possible.
> > +collectgarbage('setstepmul', 100)
> 
> Minor: Don't get, why you need to make GC less aggressive for the test.
> The test is run, until propagate phase is finished.

More likely, it runs until the upvalue is marked as black during
traversal (with the bug). I can remove this line if you insist.

> 
> > +
> > +-- We don't know on what exactly step our upvalue is marked as
> > +-- black and USETS become dangerous, so just check it at each
> > +-- step.
> > +-- Don't need to do the full GC cycle step by step.
> > +local old_steps_atomic = misc.getmetrics().gc_steps_atomic
> > +while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
> > +  collectgarbage('step')
> > +  usets() -- luacheck: no global
> > +end
> > +
> > +test:ok(true)
> > +os.exit(test:check() and 0 or 1)
> > -- 
> > 2.31.0
> > 
> 
> [1]: https://lists.tarantool.org/tarantool-patches/20210719073632.12008-1-skaplun@tarantool.org/T/#u
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

