[Tarantool-patches] [PATCH v2 luajit 16/30] test: adapt PUC Lua test for %q in fmt for LuaJIT

Igor Munkin imun at tarantool.org
Mon Apr 12 13:26:07 MSK 2021


Sergey,

On 08.04.21, Sergey Kaplun wrote:
> Igor,
> 
> On 07.04.21, Igor Munkin wrote:
> > Sergey,
> > 
> > On 07.04.21, Sergey Kaplun wrote:
> > > Igor,
> > > 
> > 
> > <snipped>
> > 
> > > > > > > string.format(): %q reversible.
> > > > > > > See also https://luajit.org/extensions.html#lua52.
> > > > > > > 
> > > > > > > In Lua 5.1 string.format() does not accept string values containing
> > > > > > > embedded zeros, except as arguments to the '%q' option.
> > > > > > > In Lua 5.2 '\0' is not handled differently from other
> > > > > > > control chars in string.format('%q', ...).
> > > > > > > See commit 7cc981c14067d4b0e774a6bfb0acfc2f5c911f0d
> > > > > > > (string.format("%q", str) is now fully reversible
> > > > > > > (from Lua 5.2).).
> > > > > > 
> > > > > > Well, I honestly don't understand what is changed in *semantics*. I've
> > > > > > tried the following command with Lua 5.2, Lua 5.1 and LuaJIT 2.0.5 as an
> > > > > > interpreter being tested
> > > > > > | <interp> -e 'print(string.format("%q", "\0"))'
> > > > > > 
> > > > > > I understand the semantics of "%q", but was it just a bug in Lua 5.1?
> > > > > 
> > > > > A bug with the test for it???
> > > > 
> > > > Well, I can remind you of the bug with <tonumber> we fixed last year.
> > > > There might be no test for it though, but all in all it has not been
> > > > fixed in Lua 5.1.
> > > 
> > > I mean that this behaviour is verified by the test. When the behaviour
> > > is changed, the test is changed too.
> > > 
> > > > 
> > > > > 
> > > > > > What does "fully reversible" mean in this context?
> > > > 
> > > > This question is left unaddressed.
> > > 
> > > I don't know what Mike means by this.
> > 
> > Then it would be great to describe the changes on your own. I can hardly
> > split the comment into your words and ones taken from the Lua Reference
> > manual or git log, but you are writing that "In Lua 5.2 '\0' is not
> > handled differently from other control chars". Could you please clarify
> > the difference you are talking about? Or provide the links describing
> > it? Maybe some parts from PIL?
> > 
> > > 
> > > > 
> > > > > > 
> > > > > > I understand only the fact the behaviour differs and you reimplemented
> > > > > > the test assertion according to Lua 5.2 testing suite. That's all.
> > > > > > 
> > > > > > I found not a single word regarding this issue in Lua bugs[1] page,
> > > > > > except invalid handling of \r[2]. Is there any issue/page with a more
> > > > > 
> > > > > It looks unrelated to these changes.
> > > > > 
> > > > > > verbose explanation what has been changed in 7cc981c?
> > > > > 
> > > > > I just read these lines in Lua 5.1 reference manual :):
> > > > > | This function does not accept string values containing embedded
> > > > > | zeros, except as arguments to the q option.
> > > > 
> > > > So what? This means literally nothing to me... BTW, I can pass such
> > > > string to the function and it can yield any bullshit the developer
> > > > wanted to. That's why we decided to comment such places in a clear and
> > > > verbose way, didn't we?
> > > 
> > > I don't get your point here. AFAIU it means that `%q` is the only
> > > option whose argument can contain embedded zeros, so it handles them
> > > in a special way.
> > 
> > My point is that nobody except you can understand the comment near
> > the change. Hence one needs to repeat the investigation you made.
> > Then what is the point of commenting the changes in the suite?
> > 
> > > 
> > > > 
> > > > > 
> > > > > As for me, it is just new behaviour of Lua 5.2 -- patterns now accept
> > > > > '\0' as a regular character (see
> > > > > 4541243355a299a9b75042d207feb87295872c3a (patterns now accept '\0' as a
> > > > > regular character) from Lua repository). So, according to commit
> > > > > 658ea8752b979102627e2fede7b7ddfbb67ba6c9 (no need to handle '\0'
> > > > > differently from other control chars in format '%q') from Lua
> > > > > repository, this special handling is redundant.
> > > > > 
> > > > > Also, it is mentioned here [2].
> > > > 
> > > > I see nothing regarding this change there.
> > > 
> > > I am talking about this part:
> > > 
> > > | Character class %z in patterns is deprecated, as now patterns may
> > > | contain '\0' as a regular character.
> > 
> > Patterns, not options, right?
> > 
> > > 
> > 
> > <snipped>
> 
> I haven't found any more verbose description of the %q specifier
> in the PIL, the Lua Reference Manual, the Lua mailing list, or the
> LuaJIT mailing list.
> So, I am appealing to the commits in the Lua repository.
> 
> Going back to the definition of specifier %q from Lua 5.1 Reference
> manual:
> | The q option formats a string in a form suitable to be safely read
> | back by the Lua interpreter: ...
> 
> Answering your previous question -- "reversible" means (in my
> understanding now) that `string.format("%q", binary)` should return
> a string that looks the same as the `binary` literal originally
> presented to the interpreter and `string.format()` (though it may
> expand control characters like '\r' and '\n', and turn "\65" into "A").
> This is why I was referencing zero bytes in patterns (unfortunately,
> wrongly).
> 
> Historically, in Lua 5.2, control characters are written as \nnn
> when needed, see d62a21b9d379a576bae7426c80039ca1a4d2bb07 ("when
> formatting with '%q', all control characters are coded as \nnn.") [1].
> With that patch, the %q specifier starts writing control characters
> to the new string uniformly as decimal escapes, instead of copying
> their binary representation as is. If the control character is
> followed by a digit, then for the parser to work correctly it is
> necessary to extend the escape sequence to 3 significant characters
> (otherwise, the transition "\0002" -> "\02" corrupts the string).
> In that patch, the expansion is performed unconditionally (the check
> for the next symbol is not performed) if the character is a zero
> byte. You may notice that the additional check for '\0' is
> unnecessary: if a digit comes after it, the branch expanding
> the format to 3 significant characters is selected anyway. Consistent
> zero byte processing was added in the commit
> 658ea8752b979102627e2fede7b7ddfbb67ba6c9 ("no need to handle '\0'
> differently from other control chars in format '%q'") [2]. Note that
> this patch does not change the semantics of the %q specifier: the new
> string can still be "safely read back by the Lua interpreter".
> 
> As we know, in Lua 5.1, '\0' was converted to "\000" unconditionally.
> This was partly done to avoid unnecessary logic, as I suppose, since
> before the d62a21b commit, all other characters were transmitted
> as-is in the new string, and there was no inconsistency.
> 
> Is it worth mentioning the first commit, which clarifies the cause of
> the inconsistency, given that it does not directly relate to the
> changes? The initial comment, in my opinion, reflects the reason for
> the failing test -- '\0' began to be processed in the same way as other
> control characters in Lua 5.2. LuaJIT has adopted this practice in
> a related commit. The curious reader can independently search the
> commit history for these changes. Please let me know if you think
> these clarifications are necessary to describe the problem.

IMHO, it's worth mentioning everything above! Thanks for such deep
digging and clarification: now I understand the original semantics and
your change. It was totally unclear that Lua converts \0 to \000
considering it as an octet. Please enrich the comment with some parts
above (honestly, you can put the whole explanation to the comment if you
want), and after this change, the patch LGTM. Again, thanks a lot for
your investigation!
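
For the record, the escaping rule described above can be modelled
roughly as follows. This is only a sketch in Python, not the code from
the Lua or LuaJIT sources: the names quote_q and read_back are
hypothetical, control characters are assumed to be bytes 0-31 and 127
(iscntrl() in the C locale), and the model follows the behaviour after
658ea87, where '\0' takes the same branch as any other control
character.

```python
def quote_q(data: bytes) -> str:
    """Rough model of the Lua 5.2 '%q' escaping rule (illustrative only)."""
    out = ['"']
    for i, byte in enumerate(data):
        ch = chr(byte)
        if ch in ('"', '\\', '\n'):
            # Quote, backslash and newline are prefixed with a backslash.
            out.append('\\' + ch)
        elif byte < 32 or byte == 127:
            # Control characters, '\0' included, become decimal escapes.
            # The escape is padded to 3 digits only when an ASCII digit
            # follows, so the parser cannot merge that digit into it.
            next_is_digit = i + 1 < len(data) and 0x30 <= data[i + 1] <= 0x39
            out.append('\\%03d' % byte if next_is_digit else '\\%d' % byte)
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)


def read_back(literal: str) -> bytes:
    """Parse the subset of string literal syntax that quote_q emits."""
    body = literal[1:-1]  # strip the surrounding double quotes
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != '\\':
            out.append(ord(body[i]))
            i += 1
            continue
        i += 1  # skip the backslash
        if body[i] in '0123456789':
            # A decimal escape consumes at most 3 digits.
            j = i
            while j < len(body) and j - i < 3 and body[j] in '0123456789':
                j += 1
            out.append(int(body[i:j]))
            i = j
        else:
            # Escaped quote, backslash or newline: take it literally.
            out.append(ord(body[i]))
            i += 1
    return bytes(out)
```

In this model a zero byte followed by "2" is quoted as "\0002" and
reads back as the original two bytes, whereas the unpadded form "\02"
would read back as the single byte 2. This is the corruption mentioned
above.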

> 
> > 
> > > > > > [1]: https://www.lua.org/bugs.html#5.1
> > > > > > [2]: https://www.lua.org/bugs.html#5.1-4
> > > > > > 
> > > > > > -- 
> > > > > > Best regards,
> > > > > > IM
> > > > > 
> > > > > [1]: https://www.lua.org/manual/5.1/manual.html#pdf-string.format
> > > > > [2]: https://www.lua.org/manual/5.2/manual.html#8.2
> > 
> > <snipped>
> > 
> > > 
> > > -- 
> > > Best regards,
> > > Sergey Kaplun
> > 
> > -- 
> > Best regards,
> > IM
> 
> [1]: https://github.com/lua/lua/commit/d62a21b9d379a576bae7426c80039ca1a4d2bb07
> [2]: https://github.com/lua/lua/commit/658ea8752b979102627e2fede7b7ddfbb67ba6c9
> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

