From: Igor Munkin via Tarantool-patches
Reply-To: Igor Munkin
To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
Date: Mon, 12 Apr 2021 13:26:07 +0300
Message-ID: <20210412102607.GV29703@tarantool.org>
References: <20210330221631.GA29703@tarantool.org> <20210406213704.GL29703@tarantool.org> <20210407163113.GO29703@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 luajit 16/30] test: adapt PUC Lua test for %q in fmt for LuaJIT
List-Id: Tarantool development patches
User-Agent: Mutt/1.10.1 (2018-07-13)

Sergey,

On 08.04.21, Sergey Kaplun wrote:
> Igor,
>
> On 07.04.21, Igor Munkin wrote:
> > Sergey,
> >
> > On 07.04.21, Sergey Kaplun wrote:
> > > Igor,
> > >
> > > > > > > string.format(): %q reversible.
> > > > > > > See also https://luajit.org/extensions.html#lua52.
> > > > > > >
> > > > > > > In Lua 5.1 string.format() does not accept string values containing
> > > > > > > embedded zeros, except as arguments to the '%q' option.
> > > > > > > In Lua 5.2 '\0' is not handled differently from other
> > > > > > > control chars in string.format('%q', ...).
> > > > > > > See commit 7cc981c14067d4b0e774a6bfb0acfc2f5c911f0d
> > > > > > > (string.format("%q", str) is now fully reversible
> > > > > > > (from Lua 5.2).).
> > > > > >
> > > > > > Well, I honestly don't understand what is changed in *semantics*. I've
> > > > > > tried the following command with Lua 5.2, Lua 5.1 and LuaJIT 2.0.5 as an
> > > > > > interpreter being tested
> > > > > > | -e 'print(string.format("%q", "\0"))'
> > > > > >
> > > > > > I understand the semantics of "%q", but was it just a bug in Lua 5.1?
> > > > >
> > > > > A bug with the test for it???
> > > >
> > > > Well, I can remind you the bug with we fixed the last year.
> > > > There might be no test for it though, but all in all it has not been
> > > > fixed in Lua 5.1.
> > >
> > > I mean that this behaviour is verified by the test. When behaviour is
> > > changed the test is changed too.
> > >
> > > > > > What does "fully reversible" mean in this context?
> > > >
> > > > This question is left unaddressed.
> > >
> > > I don't know what Mike means by this.
> >
> > Then it would be great to describe the changes on your own. I can hardly
> > split the comment into your words and ones taken from Lua Reference
> > manual or git log, but you are writing that "In Lua 5.2 '\0' is not
> > handled differently from other control chars". Could you please clarify
> > the difference you are talking about?
> > Or provide the links describing it? Maybe some parts from PIL?
> >
> > > > > > I understand only the fact the behaviour differs and you reimplemented
> > > > > > the test assertion according to Lua 5.2 testing suite. That's all.
> > > > > >
> > > > > > I found not a single word regarding this issue in Lua bugs[1] page,
> > > > > > except invalid handling of \r[2]. Is there any issue/page with a more
> > > > >
> > > > > It looks unrelated to these changes.
> > > > >
> > > > > > verbose explanation what has been changed in 7cc981c?
> > > > >
> > > > > I just read these lines in Lua 5.1 reference manual :):
> > > > > | This function does not accept string values containing embedded
> > > > > | zeros, except as arguments to the q option.
> > > >
> > > > So what? This means literally nothing to me... BTW, I can pass such
> > > > string to the function and it can yield any bullshit the developer
> > > > wanted to. That's why we decided to comment such places in a clear and
> > > > verbose way, didn't we?
> > >
> > > Don't get your point here. AFAIU it means that `%q` is the only
> > > option that can contain embedded zeros, so it handles it in the special
> > > way.
> >
> > My point relates to the fact, nobody except you can understand the
> > comment near the change. Hence one needs to make the similar
> > investigation you made. Then what is the sense of commenting the changes
> > in suite?
> >
> > > > > As for me, it is just new behaviour of Lua 5.2 -- patterns now accept
> > > > > '\0' as a regular character (see
> > > > > 4541243355a299a9b75042d207feb87295872c3a (patterns now accept '\0' as a
> > > > > regular character) from Lua repository). So, according to commit
> > > > > 658ea8752b979102627e2fede7b7ddfbb67ba6c9 (no need to handle '\0'
> > > > > differently from other control chars in format '%q') from Lua
> > > > > repository, this behaviour is excess.
> > > > >
> > > > > Also, it is mentioned here [2].
> > > >
> > > > I see nothing regarding this change there.
> > >
> > > I am talking about this part:
> > >
> > > | Character class %z in patterns is deprecated, as now patterns may
> > > | contain '\0' as a regular character.
> >
> > Patterns, not options, right?
>
> I haven't found any more verbose description of the %q specifier
> in the PIL, the Lua Reference Manual, the Lua mailing list, or
> the LuaJIT mailing list.
> So, I am appealing to the commits in the Lua repository.
>
> Going back to the definition of specifier %q from Lua 5.1 Reference
> manual:
> | The q option formats a string in a form suitable to be safely read
> | back by the Lua interpreter: ...
>
> Answering your previous question -- "reversible" means (in my
> understanding now) that `string.format("%q", binary)` should return
> the same-looking `binary` string that was presented to the
> interpreter and `string.format()` first (perhaps, it can expand
> control characters like '\r', '\n', and turn symbol "\65" into "A").
> This is why I was referencing zero-bytes in patterns (unfortunately,
> wrongly).
>
> Historically, in Lua 5.2, control characters are written as \nnn
> when needed, see d62a21b9d379a576bae7426c80039ca1a4d2bb07 ("when
> formatting with '%q', all control characters are coded as \nnn.") [1].
> In this patch, %q specifier starts writing control characters
> to a new string in the same way through \n, instead of their binary
> representation as it is. If the control character is followed by a
> digit, then for the correct work of the parser, it is necessary to
> extend the escape sequence to 3 significant characters (otherwise,
> the transition "\0002" -> "\02" corrupts string).
> For this patch, the expansion is performed unconditionally (the check
> for the next symbol is not performed) if this first symbol is a zero
> byte.
> You may notice that the additional check for '\0' is
> unnecessary - if after it comes a digit, then a branch with expanding
> the format to 3 significant characters will be selected. Consistent
> work with the zero byte processing was added in the commit
> 658ea8752b979102627e2fede7b7ddfbb67ba6c9 ("no need to handle '\0'
> differently from other control chars in format '%q'") [2]. Note that
> this patch does not change the semantics of %q specifier - the new
> string can still be "safely read back by the Lua interpreter".
>
> As we know, in Lua 5.1, '\0' was converted to "\000" unconditionally.
> This was partly done to avoid unnecessary logic, as I suppose, since
> before the d62a21b commit, all other characters were transmitted
> as-is in the new string, and there was no inconsistency.
>
> Is it worth mentioning the first commit clarifying the cause of the
> inconsistency, because it does not directly relate to changes? The
> initial comment, in my opinion, reflects the reason for the failing
> test -- '\0' began to be processed in the same way as other
> control characters in Lua 5.2. LuaJIT has adopted this practice in
> a related commit. The curious reader can independently search the
> commit history for these changes. Please let me know, if you think
> these clarifications are necessary to describe the problem.

IMHO, it's worth mentioning everything above! Thanks for such deep
digging and clarification: now I understand the original semantics and
your change. It was totally unclear that Lua converts \0 to \000
considering it as an octet. Please enrich the comment with some parts
above (honestly, you can put the whole explanation to the comment if you
want), and after this change, the patch LGTM. Again, thanks a lot for
your investigation!
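
For reference, the "reversible" property discussed above can be checked
with a small script. This is only a minimal sketch, not part of the
patch: it formats a few strings with %q, reads the result back with the
interpreter, and compares the bytes. The helper names `load_chunk` and
`roundtrip` are made up for this example, and the `loadstring or load`
fallback is an assumption so that the same snippet can run under
Lua 5.1, LuaJIT and Lua 5.2+. The exact escaped text differs between
interpreters, so only the round trip is asserted.

```lua
-- Hypothetical helpers, introduced only for this illustration.
-- Lua 5.1 and LuaJIT provide loadstring(); Lua 5.2+ loads strings via load().
local load_chunk = loadstring or load

-- Format `s` with %q and let the interpreter read the literal back.
-- Returns the string that was read back and the quoted literal itself.
local function roundtrip(s)
  local quoted = string.format("%q", s)
  local chunk = assert(load_chunk("return " .. quoted))
  return chunk(), quoted
end

-- "\0002" is a zero byte followed by the digit '2': if the escape were
-- shortened to "\0" here, the parser would read "\02" as a single byte.
local samples = { "\0", "\0002", "\0012", "a\0b", "line\nbreak", 'quote"d' }

for _, s in ipairs(samples) do
  local back, quoted = roundtrip(s)
  assert(back == s, "not reversible: " .. quoted)
  print(#s, quoted)
end
```

The printed literals are where the interpreters are expected to differ
(for example in how a zero byte or other control characters are
escaped), while the assertion should hold for all of them.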
>
> > > > > > [1]: https://www.lua.org/bugs.html#5.1
> > > > > > [2]: https://www.lua.org/bugs.html#5.1-4
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > IM
> > > > >
> > > > > [1]: https://www.lua.org/manual/5.1/manual.html#pdf-string.format
> > > > > [2]: https://www.lua.org/manual/5.2/manual.html#8.2
> > >
> > > --
> > > Best regards,
> > > Sergey Kaplun
> >
> > --
> > Best regards,
> > IM
>
> [1]: https://github.com/lua/lua/commit/d62a21b9d379a576bae7426c80039ca1a4d2bb07
> [2]: https://github.com/lua/lua/commit/658ea8752b979102627e2fede7b7ddfbb67ba6c9
>
> --
> Best regards,
> Sergey Kaplun

--
Best regards,
IM