[Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer

Thu Dec 24 01:34:54 MSK 2020

Igor,

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please, consider the comments below.
> 
> On 16.12.20, Sergey Kaplun wrote:
> > This patch introduces module for reading and writing leb128 compression.
> > It will be used for streaming profiling events writing, that will be
> > added at the next patches.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> >  src/Makefile       |   5 +-
> >  src/Makefile.dep   |   1 +
> >  src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
> >  src/utils/leb128.h |  55 ++++++++++++++++++++
> >  4 files changed, 183 insertions(+), 2 deletions(-)
> >  create mode 100644 src/utils/leb128.c
> >  create mode 100644 src/utils/leb128.h
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index caa49f9..be7ed95 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> 
> Please, adjust these changes considering the comments to the first
> patch. I propose to use either <lj_utils.*> or <lj_utils_leb128.*> for
> the name.

OK, no problem.

> 
> > @@ -468,6 +468,7 @@ endif
> 
> <snipped>
> 
> > diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> > new file mode 100644
> > index 0000000..921e5bc
> > --- /dev/null
> > +++ b/src/utils/leb128.c
> > @@ -0,0 +1,124 @@
> > +/*
> > +** Working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#include <stdint.h>
> > +#include <stddef.h>
> 
> Why do you include this again instead of using leb128.h?

It's enough. Definitions from <leb128.h> is redundant here.

> 
> > +
> > +#define LINK_BIT          (0x80)
> > +#define MIN_TWOBYTE_VALUE (0x80)
> > +#define PAYLOAD_MASK      (0x7f)
> > +#define SHIFT_STEP        (7)
> > +#define LEB_SIGN_BIT      (0x40)
> > +
> > +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */
> > +
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +  buffer[i++] = (uint8_t)value;
> > +
> > +  return i;
> > +}
> > +
> > +size_t write_leb128(uint8_t *buffer, int64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> 
> What is 0x40? If this is <LEB_SIGN_BIT>, then just use the constant
> here. Otherwise create a new one with the comment. Please, do not use
> magic numbers.

This necessary bit propagation for correct encoding. I'll drop comment
about it in the next version.

> 
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
> > +
> > +  return i;
> > +}
> > +
> > +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */
> > +
> > +/*
> > +** NB! For each LEB128 type (signed/unsigned) we have two versions of read
> 
> Minor: It's better to use XXX for these cases in comments. We already
> discussed this with Vlad here[1] (search for "What is 'XXX'?").

OK, IINM I've already seen "NB:" comment somewhere in LuaJIT sources.
But "XXX" is good to me.

> 
> > +** functions: The one consuming unlimited number of input octets and the one
> > +** consuming not more than given number of input octets. Currently reading
> > +** is not used in performance critical places, so these two functions are
> > +** implemented via single low-level function + run-time mode check. Feel free
> > +** to change if this becomes a bottleneck.
> 
> Well, you can also add LJ_AINLINE for a low-level function, or simply
> add a similar hint by yourself (I personally prefer the first one).

OK.

> 
> > +*/
> > +
> > +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded,
> > +			    size_t n)
> 
> AFAICS, <n> argument is used only in case <guarded> is set to 1.
> Moreover, <n> can't be 0 when <guarded> is set, otherwise this is a
> nilpotent function. So it seems you can drop the <guarded> parameter in
> favour of the following contract for <n>:
> * n == 0 is for guarded == 0 && n == 0
> * n > 0  is for guarded == 1 && n > 0
> 
> This also relates to <_read_leb128>.

Yes, good point, thanks!

> 
> > +{
> > +  size_t i = 0;
> > +  uint64_t value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t octet;
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +    octet = buffer[i++];
> > +    value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +  }
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
> 
> <snipped>
> 
> > +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded,
> > +			   size_t n)
> > +{
> > +  size_t i = 0;
> > +  int64_t  value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t  octet;
> 
> A mess with whitespace above.

Fixed.

> 
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +    octet  = buffer[i++];
> > +    value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +  }
> > +
> > +  if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) {
> > +    value |= -(1 << shift);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
> 
> <snipped>
> 
> > diff --git a/src/utils/leb128.h b/src/utils/leb128.h
> > new file mode 100644
> > index 0000000..46d90bc
> > --- /dev/null
> > +++ b/src/utils/leb128.h
> > @@ -0,0 +1,55 @@
> > +/*
> > +** Interfaces for working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#ifndef _LJ_UTILS_LEB128_H
> > +#define _LJ_UTILS_LEB128_H
> > +
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +
> > +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
> > +#define LEB128_U64_MAXSIZE 10
> 
> The naming looks odd to me. Considering my comment for the first patch,
> I propose to use something matching "lj_u?leb128_(read|write)(_n)?".
> 
> By the way, the order of the interfaces is also odd.

I'll rewrite naming considering new naming of this translation unit.

> 
> > +
> > +/*
> > +** Writes a value from an unsigned 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value);
> > +
> > +/*
> > +** Writes a value from an signed 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_leb128(uint8_t *buffer, int64_t value);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
> 
> If "buffer overflow" stands for "reading out of bounds", please reword
> this. Otherwise, I don't get it.

Yep, fixed, thanks!

> 
> > +*/
> > +size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a int64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +*/
> > +size_t read_leb128(int64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +** If more than n bytes is about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +** If more than n bytes is about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +#endif
> > -- 
> > 2.28.0
> > 
> 
> [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun