[Tarantool-patches] [PATCH v2 luajit 16/30] test: adapt PUC Lua test for %q in fmt for LuaJIT

Sergey Kaplun skaplun at tarantool.org
Thu Apr 8 11:51:52 MSK 2021


Igor,

On 07.04.21, Igor Munkin wrote:
> Sergey,
> 
> On 07.04.21, Sergey Kaplun wrote:
> > Igor,
> > 
> 
> <snipped>
> 
> > > > > > string.format(): %q reversible.
> > > > > > See also https://luajit.org/extensions.html#lua52.
> > > > > > 
> > > > > > In Lua 5.1 string.format() does not accept string values containing
> > > > > > embedded zeros, except as arguments to the '%q' option.
> > > > > > In Lua 5.2 '\0' is not handled differently from other
> > > > > > control chars in string.format('%q', ...).
> > > > > > See commit 7cc981c14067d4b0e774a6bfb0acfc2f5c911f0d
> > > > > > (string.format("%q", str) is now fully reversible
> > > > > > (from Lua 5.2).).
> > > > > 
> > > > > Well, I honestly don't understand what has changed in *semantics*. I've
> > > > > tried the following command with Lua 5.2, Lua 5.1 and LuaJIT 2.0.5 as an
> > > > > interpreter being tested
> > > > > | <interp> -e 'print(string.format("%q", "\0"))'
> > > > > 
> > > > > I understand the semantics of "%q", but was it just a bug in Lua 5.1?
> > > > 
> > > > A bug with the test for it???
> > > 
> > > Well, I can remind you of the bug with <tonumber> we fixed last year.
> > > There might be no test for it though, but all in all it has not been
> > > fixed in Lua 5.1.
> > 
> > I mean that this behaviour is verified by the test. When the behaviour is
> > changed, the test is changed too.
> > 
> > > 
> > > > 
> > > > > What does "fully reversible" mean in this context?
> > > 
> > > This question is left unaddressed.
> > 
> > I don't know what Mike means by this.
> 
> Then it would be great to describe the changes on your own. I can hardly
> split the comment into your words and the ones taken from the Lua Reference
> Manual or git log, but you are writing that "In Lua 5.2 '\0' is not
> handled differently from other control chars". Could you please clarify
> the difference you are talking about? Or provide the links describing
> it? Maybe some parts from PIL?
> 
> > 
> > > 
> > > > > 
> > > > > I understand only the fact the behaviour differs and you reimplemented
> > > > > the test assertion according to Lua 5.2 testing suite. That's all.
> > > > > 
> > > > > I found not a single word regarding this issue on the Lua bugs [1] page,
> > > > > except for the invalid handling of \r [2]. Is there any issue/page with a more
> > > > 
> > > > It looks unrelated to these changes.
> > > > 
> > > > > verbose explanation what has been changed in 7cc981c?
> > > > 
> > > > I just read these lines in Lua 5.1 reference manual :):
> > > > | This function does not accept string values containing embedded
> > > > | zeros, except as arguments to the q option.
> > > 
> > > So what? This means literally nothing to me... BTW, I can pass such a
> > > string to the function and it can yield any bullshit the developer
> > > wanted to. That's why we decided to comment such places in a clear and
> > > verbose way, didn't we?
> > 
> > I don't get your point here. AFAIU it means that `%q` is the only
> > option that accepts embedded zeros, so it handles them in a special
> > way.
> 
> My point is that nobody except you can understand the comment near the
> change. Hence one needs to repeat the investigation you made. Then what
> is the sense of commenting the changes in the suite?
> 
> > 
> > > 
> > > > 
> > > > As for me, it is just the new behaviour of Lua 5.2 -- patterns now accept
> > > > '\0' as a regular character (see commit
> > > > 4541243355a299a9b75042d207feb87295872c3a (patterns now accept '\0' as a
> > > > regular character) from the Lua repository). So, according to commit
> > > > 658ea8752b979102627e2fede7b7ddfbb67ba6c9 (no need to handle '\0'
> > > > differently from other control chars in format '%q') from the Lua
> > > > repository, this behaviour is redundant.
> > > > 
> > > > Also, it is mentioned here [2].
> > > 
> > > I see nothing regarding this change there.
> > 
> > I am talking about this part:
> > 
> > | Character class %z in patterns is deprecated, as now patterns may
> > | contain '\0' as a regular character.
> 
> Patterns, not options, right?
> 
> > 
> 
> <snipped>

I haven't found any more detailed description of the %q specifier in
PIL, the Lua Reference Manual, the Lua mailing list, or the LuaJIT
mailing list. So I am relying on the commits in the Lua repository.

Going back to the definition of the %q specifier in the Lua 5.1
Reference Manual:
| The q option formats a string in a form suitable to be safely read
| back by the Lua interpreter: ...

Answering your previous question: in my current understanding,
"reversible" means that when the result of `string.format("%q", binary)`
is read back by the interpreter, it yields exactly the original `binary`
string, even though the quoted text may be spelled differently from the
literal originally passed to the interpreter and to `string.format()`
(for example, control characters like '\r' and '\n' may be escaped, and
"\65" turns into "A"). This is why I was referring to zero bytes in
patterns (unfortunately, wrongly).
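To illustrate what "reversible" means, and why a digit following an
escape matters, here is a minimal C sketch of a decoder for the escapes
that %q emits. It is a hypothetical helper (`unquote_lua()` is not part
of the Lua C API), and it assumes, like the Lua lexer, that a decimal
escape consumes at most 3 digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the Lua C API): decode a double-quoted
 * literal of the kind produced by string.format("%q", ...) into raw bytes.
 * Only the escapes %q emits are handled: \" \\ \<newline> \r \n and decimal
 * escapes, of which at most 3 digits are read, as the Lua lexer does.
 * `out` must have room for strlen(lit) bytes.
 * Returns the number of decoded bytes, or -1 on malformed input.
 */
long unquote_lua(const char *lit, char *out)
{
  long n = 0;
  assert(lit != NULL && out != NULL);
  if (*lit++ != '"')
    return -1;
  while (*lit != '"') {
    unsigned char c = (unsigned char)*lit++;
    if (c == '\0')
      return -1;  /* unterminated literal */
    if (c != '\\') {
      out[n++] = (char)c;
      continue;
    }
    c = (unsigned char)*lit++;
    if (isdigit(c)) {
      /* Read at most 3 decimal digits in total. */
      int v = c - '0', i;
      for (i = 1; i < 3 && isdigit((unsigned char)*lit); i++)
        v = v * 10 + (*lit++ - '0');
      if (v > 255)
        return -1;  /* decimal escape too large */
      out[n++] = (char)v;
    } else if (c == '\n' || c == 'n') {
      out[n++] = '\n';
    } else if (c == 'r') {
      out[n++] = '\r';
    } else if (c == '"' || c == '\\') {
      out[n++] = (char)c;
    } else {
      return -1;  /* an escape %q never emits, or a trailing backslash */
    }
  }
  return n;
}
```

With this decoder, the literal "\0002" yields the two bytes '\0' and
'2', while "\02" yields the single byte '\2', and "\65" yields "A".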

Historically, in Lua 5.2, control characters are written as \nnn
when needed, see d62a21b9d379a576bae7426c80039ca1a4d2bb07 ("when
formatting with '%q', all control characters are coded as \nnn.") [1].
Since that commit, the %q specifier writes every control character to
the resulting string as a decimal escape sequence instead of copying
its raw byte as is. If the control character is followed by a digit,
the escape sequence has to be padded to 3 digits so that the lexer
reads it back correctly: otherwise, the bytes '\0' and '2' would be
written as "\02", which is read back as the single byte '\2' and
corrupts the string, while "\0002" is unambiguous.
In that commit, the padding is performed unconditionally (the next
byte is not checked) when the control character is a zero byte, so
'\0' is still always written as "\000". You may notice that this
additional check for '\0' is unnecessary: if a digit follows it, the
branch that pads the escape sequence to 3 digits is selected anyway.
Consistent handling of the zero byte was added in commit
658ea8752b979102627e2fede7b7ddfbb67ba6c9 ("no need to handle '\0'
differently from other control chars in format '%q'") [2]. Note that
these changes do not alter the semantics of the %q specifier: the new
string can still be "safely read back by the Lua interpreter".
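The quoting scheme described above can be sketched in C roughly as
follows. This is a simplified model loosely following addquoted() in
Lua 5.2 lstrlib.c, not the actual Lua or LuaJIT code; `quote_lua52()`
is a hypothetical name, and the caller is assumed to provide a buffer
of at least 4 * len + 3 bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of Lua 5.2-style %q quoting (after commits d62a21b
 * and 658ea87), loosely following addquoted() in Lua 5.2 lstrlib.c.
 * `out` must hold at least 4 * len + 3 bytes: each input byte expands to
 * at most 4 chars ("\ddd"), plus two quotes and a NUL.
 * Returns the length of the quoted string.
 */
size_t quote_lua52(const char *s, size_t len, char *out, size_t cap)
{
  size_t n = 0, i;
  assert(cap >= 4 * len + 3);
  out[n++] = '"';
  for (i = 0; i < len; i++) {
    unsigned char c = (unsigned char)s[i];
    if (c == '"' || c == '\\' || c == '\n') {
      /* Escape quote and backslash; a newline becomes "\<newline>". */
      out[n++] = '\\';
      out[n++] = (char)c;
    } else if (iscntrl(c)) {
      /*
       * '\0' is treated like any other control char: the shortest decimal
       * escape, padded to 3 digits only if the next byte is a digit.
       */
      char esc[8];
      int next_is_digit = i + 1 < len && isdigit((unsigned char)s[i + 1]);
      int k = snprintf(esc, sizeof esc, next_is_digit ? "\\%03d" : "\\%d", c);
      memcpy(out + n, esc, (size_t)k);
      n += (size_t)k;
    } else {
      out[n++] = (char)c;
    }
  }
  out[n++] = '"';
  out[n] = '\0';
  return n;
}
```

Under this model, a lone '\0' is quoted as "\0", the bytes '\0' and '2'
as "\0002", and '\r' followed by 'b' as "\13b".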

As we know, in Lua 5.1 '\0' is converted to "\000" unconditionally.
I suppose this was done partly to avoid unnecessary logic: before
commit d62a21b, the other characters were copied to the new string as
is (apart from a few escaped ones like '"', '\\' and newline), so there
was no inconsistency.
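For comparison, a similar hypothetical sketch of the Lua 5.1 behaviour
(loosely following addquoted() in Lua 5.1.5 lstrlib.c; `quote_lua51()`
is an illustrative name) writes '\0' as "\000" regardless of the next
byte and copies the remaining control characters as is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of Lua 5.1-style %q quoting (loosely following
 * addquoted() in Lua 5.1.5). `out` must hold at least 4 * len + 3 bytes:
 * each input byte expands to at most 4 chars, plus two quotes and a NUL.
 * Returns the length of the quoted string.
 */
size_t quote_lua51(const char *s, size_t len, char *out, size_t cap)
{
  size_t n = 0, i;
  assert(cap >= 4 * len + 3);
  out[n++] = '"';
  for (i = 0; i < len; i++) {
    char c = s[i];
    switch (c) {
    case '"': case '\\': case '\n':
      /* A newline becomes a backslash followed by a raw newline. */
      out[n++] = '\\';
      out[n++] = c;
      break;
    case '\r':
      memcpy(out + n, "\\r", 2);
      n += 2;
      break;
    case '\0':
      /* Always the full 3-digit form, whatever the next byte is. */
      memcpy(out + n, "\\000", 4);
      n += 4;
      break;
    default:
      /* Any other byte, including other control chars, is copied as is. */
      out[n++] = c;
      break;
    }
  }
  out[n++] = '"';
  out[n] = '\0';
  return n;
}
```

Here a lone '\0' becomes "\000" rather than "\0", which is exactly the
difference the adapted test assertion has to account for.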

Is it worth mentioning the first commit, which explains where the
inconsistency comes from, given that it does not directly relate to
the changes? In my opinion, the initial comment reflects the reason
the test fails: in Lua 5.2, '\0' began to be processed in the same way
as other control characters, and LuaJIT adopted this behaviour in a
related commit. A curious reader can search the commit history for
these changes independently. Please let me know if you think these
clarifications are necessary to describe the problem.

> 
> > > > > [1]: https://www.lua.org/bugs.html#5.1
> > > > > [2]: https://www.lua.org/bugs.html#5.1-4
> > > > > 
> > > > > -- 
> > > > > Best regards,
> > > > > IM
> > > > 
> > > > [1]: https://www.lua.org/manual/5.1/manual.html#pdf-string.format
> > > > [2]: https://www.lua.org/manual/5.2/manual.html#8.2
> 
> <snipped>
> 
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 
> -- 
> Best regards,
> IM

[1]: https://github.com/lua/lua/commit/d62a21b9d379a576bae7426c80039ca1a4d2bb07
[2]: https://github.com/lua/lua/commit/658ea8752b979102627e2fede7b7ddfbb67ba6c9

-- 
Best regards,
Sergey Kaplun

