[tarantool-patches] Re: [PATCH v2 5/5] lua: introduce utf8 built-in globaly visible module

Alexander Turenko alexander.turenko at tarantool.org
Sat May 5 03:18:15 MSK 2018


Vlad,

Thanks for the fixes. You rock!

I want to clarify two things, please see below.

WBR, Alexander Turenko.

On Sat, May 05, 2018 at 02:32:27AM +0300, Vladislav Shpilevoy wrote:
> Hello. Thanks for review.
> 
> On 05/05/2018 01:33, Alexander Turenko wrote:
> > Vlad,
> > 
> > Did you try running the tests in utf8.lua from [1]?
> > 
> > [1]: https://www.lua.org/tests/lua-5.3.4-tests.tar.gz
> > 

Do you think such testing would be redundant? I don't insist, I just want
to know your explicit position.

> > > +
> > > +/**
> > > + * Calculate length of a UTF8 string. Length here is symbol count.
> > > + * Works like utf8.len in Lua 5.3.
> > > + * @param String to get length.
> > > + * @param Start byte offset. Must point to the start of symbol. On
> > > + *        invalid symbol an error is returned. Can be negative.
> > 
> > Can be 1 <= |start| <= #str + 1, right? Is it worth documenting? Such
> > offset juggling is not very intuitive (at least for me).
> 
> No, start can be any value, and so can end.
> 

It does not look that way:

tarantool> print(utf8.len('abc', 0))
nil     position is out of string

tarantool> print(utf8.len('abc', 5))
nil     position is out of string

tarantool> print(utf8.len('abc', 1, 4))
nil     position is out of string

That matches Lua 5.3 behaviour, but contradicts what you said above.
So the question is about writing a proper doxygen-style comment.
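
To make the ranges concrete, here is a minimal sketch of how Lua 5.3-style
position handling could be expressed in C. It is modelled on my reading of
the Lua 5.3 reference behaviour, not on the code in this patch; the function
names normalize_pos, start_is_valid and end_is_valid are hypothetical and
exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert a 1-based, possibly negative
 * Lua-style position into a non-negative one. Negative values
 * count from the end (-1 is the last byte). A negative value
 * that reaches before the string start collapses to 0.
 */
static long long
normalize_pos(long long pos, size_t len)
{
	if (pos >= 0)
		return pos;
	if (0ULL - (unsigned long long)pos > len)
		return 0;
	return (long long)len + pos + 1;
}

/*
 * Assumed rule for the start offset: after normalization it
 * must lie in [1, len + 1].
 */
static bool
start_is_valid(long long start, size_t len)
{
	long long pos = normalize_pos(start, len);
	return pos >= 1 && pos <= (long long)len + 1;
}

/*
 * Assumed rule for the end offset: after normalization it
 * must not exceed len. Values before the string start are
 * accepted and simply yield an empty range.
 */
static bool
end_is_valid(long long end, size_t len)
{
	return normalize_pos(end, len) <= (long long)len;
}
```

Under these assumptions, for 'abc' (len = 3) the start offsets 0 and 5 and
the end offset 4 are rejected, which agrees with the three calls shown
above, while start = 4 and end = 0 are accepted.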

> > 
> > > + * @param End byte offset, can be negative. Can point to the
> > > + *        middle of symbol.
> > 
> > We need to clarify that a symbol under the end offset is included
> > in the resulting count (inclusive range).
> > 
> > I would also explicitly state that -1 is the end byte.
> > 
> > Is it worth documenting the allowed range (0 <= |end| <= #str, right?)?
> 
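
The inclusive-end rule discussed above can be illustrated with a small
sketch. It assumes the input is already valid UTF-8, that start and end are
already normalized to 1-based byte positions within the string, and that
start points to the first byte of a symbol. The function name
count_symbols_inclusive is hypothetical and this is not the code from the
patch: it does no validation and simply counts the bytes in the range that
begin a symbol.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count symbols that start inside the
 * inclusive 1-based byte range [start, end]. A symbol whose
 * first byte is at or before 'end' is counted even if its
 * remaining bytes lie beyond 'end'.
 */
static size_t
count_symbols_inclusive(const char *str, size_t start, size_t end)
{
	size_t count = 0;
	for (size_t i = start; i <= end; i++) {
		unsigned char byte = (unsigned char)str[i - 1];
		/* Continuation bytes look like 10xxxxxx; skip them. */
		if ((byte & 0xC0) != 0x80)
			count++;
	}
	return count;
}
```

With this sketch an end offset that points into the middle of a multi-byte
symbol still counts that symbol once, and a range where start is greater
than end gives 0.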



