Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Sergey Kaplun <skaplun@tarantool.org>
To: Igor Munkin <imun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
Date: Thu, 24 Dec 2020 01:34:54 +0300	[thread overview]
Message-ID: <20201223223454.GF9101@root> (raw)
In-Reply-To: <20201220224433.GQ5396@tarantool.org>

Igor,

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please, consider the comments below.
> 
> On 16.12.20, Sergey Kaplun wrote:
> > This patch introduces module for reading and writing leb128 compression.
> > It will be used for streaming profiling events writing, that will be
> > added at the next patches.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> >  src/Makefile       |   5 +-
> >  src/Makefile.dep   |   1 +
> >  src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
> >  src/utils/leb128.h |  55 ++++++++++++++++++++
> >  4 files changed, 183 insertions(+), 2 deletions(-)
> >  create mode 100644 src/utils/leb128.c
> >  create mode 100644 src/utils/leb128.h
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index caa49f9..be7ed95 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> 
> Please, adjust these changes considering the comments to the first
> patch. I propose to use either <lj_utils.*> or <lj_utils_leb128.*> for
> the name.

OK, no problem.

> 
> > @@ -468,6 +468,7 @@ endif
> 
> <snipped>
> 
> > diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> > new file mode 100644
> > index 0000000..921e5bc
> > --- /dev/null
> > +++ b/src/utils/leb128.c
> > @@ -0,0 +1,124 @@
> > +/*
> > +** Working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#include <stdint.h>
> > +#include <stddef.h>
> 
> Why do you include this again instead of using leb128.h?

It's enough. Definitions from <leb128.h> is redundant here.

> 
> > +
> > +#define LINK_BIT          (0x80)
> > +#define MIN_TWOBYTE_VALUE (0x80)
> > +#define PAYLOAD_MASK      (0x7f)
> > +#define SHIFT_STEP        (7)
> > +#define LEB_SIGN_BIT      (0x40)
> > +
> > +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */
> > +
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +  buffer[i++] = (uint8_t)value;
> > +
> > +  return i;
> > +}
> > +
> > +size_t write_leb128(uint8_t *buffer, int64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> 
> What is 0x40? If this is <LEB_SIGN_BIT>, then just use the constant
> here. Otherwise create a new one with the comment. Please, do not use
> magic numbers.

This necessary bit propagation for correct encoding. I'll drop comment
about it in the next version.

> 
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
> > +
> > +  return i;
> > +}
> > +
> > +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */
> > +
> > +/*
> > +** NB! For each LEB128 type (signed/unsigned) we have two versions of read
> 
> Minor: It's better to use XXX for these cases in comments. We already
> discussed this with Vlad here[1] (search for "What is 'XXX'?").

OK, IINM I've already seen "NB:" comment somewhere in LuaJIT sources.
But "XXX" is good to me.

> 
> > +** functions: The one consuming unlimited number of input octets and the one
> > +** consuming not more than given number of input octets. Currently reading
> > +** is not used in performance critical places, so these two functions are
> > +** implemented via single low-level function + run-time mode check. Feel free
> > +** to change if this becomes a bottleneck.
> 
> Well, you can also add LJ_AINLINE for a low-level function, or simply
> add a similar hint by yourself (I personally prefer the first one).

OK.

> 
> > +*/
> > +
> > +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded,
> > +			    size_t n)
> 
> AFAICS, <n> argument is used only in case <guarded> is set to 1.
> Moreover, <n> can't be 0 when <guarded> is set, otherwise this is a
> nilpotent function. So it seems you can drop the <guarded> parameter in
> favour of the following contract for <n>:
> * n == 0 is for guarded == 0 && n == 0
> * n > 0  is for guarded == 1 && n > 0
> 
> This also relates to <_read_leb128>.

Yes, good point, thanks!

> 
> > +{
> > +  size_t i = 0;
> > +  uint64_t value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t octet;
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +    octet = buffer[i++];
> > +    value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +  }
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
> 
> <snipped>
> 
> > +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded,
> > +			   size_t n)
> > +{
> > +  size_t i = 0;
> > +  int64_t  value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t  octet;
> 
> A mess with whitespace above.

Fixed.

> 
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +    octet  = buffer[i++];
> > +    value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +  }
> > +
> > +  if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) {
> > +    value |= -(1 << shift);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
> 
> <snipped>
> 
> > diff --git a/src/utils/leb128.h b/src/utils/leb128.h
> > new file mode 100644
> > index 0000000..46d90bc
> > --- /dev/null
> > +++ b/src/utils/leb128.h
> > @@ -0,0 +1,55 @@
> > +/*
> > +** Interfaces for working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#ifndef _LJ_UTILS_LEB128_H
> > +#define _LJ_UTILS_LEB128_H
> > +
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +
> > +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
> > +#define LEB128_U64_MAXSIZE 10
> 
> The naming looks odd to me. Considering my comment for the first patch,
> I propose to use something matching "lj_u?leb128_(read|write)(_n)?".
> 
> By the way, the order of the interfaces is also odd.

I'll rewrite naming considering new naming of this translation unit.

> 
> > +
> > +/*
> > +** Writes a value from an unsigned 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value);
> > +
> > +/*
> > +** Writes a value from an signed 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_leb128(uint8_t *buffer, int64_t value);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
> 
> If "buffer overflow" stands for "reading out of bounds", please reword
> this. Otherwise, I don't get it.

Yep, fixed, thanks!

> 
> > +*/
> > +size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a int64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +*/
> > +size_t read_leb128(int64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +** If more than n bytes is about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +** If more than n bytes is about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +#endif
> > -- 
> > 2.28.0
> > 
> 
> [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

next prev parent reply	other threads:[~2020-12-23 22:35 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building Sergey Kaplun
2020-12-20 21:27   ` Igor Munkin
2020-12-23 18:20     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Sergey Kaplun
2020-12-20 22:44   ` Igor Munkin
2020-12-23 22:34     ` Sergey Kaplun [this message]
2020-12-24  9:11       ` Igor Munkin
2020-12-25  8:46         ` Sergey Kaplun
2020-12-23 16:50   ` Sergey Ostanevich
2020-12-23 22:36     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module Sergey Kaplun
2020-12-21  9:24   ` Igor Munkin
2020-12-24  6:46     ` Sergey Kaplun
2020-12-24 15:45       ` Sergey Ostanevich
2020-12-24 21:20         ` Sergey Kaplun
2020-12-25  9:37           ` Igor Munkin
2020-12-25 10:13             ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module Sergey Kaplun
2020-12-21 10:30   ` Igor Munkin
2020-12-24  7:00     ` Sergey Kaplun
2020-12-24  9:36       ` Igor Munkin
2020-12-25  8:45         ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates Sergey Kaplun
2020-12-25 11:07   ` Sergey Ostanevich
2020-12-25 11:23     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 06/11] core: introduce new mem_L field Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API Sergey Kaplun
2020-12-20 22:46   ` Igor Munkin
2020-12-24  6:50     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 08/11] profile: introduce memory profiler Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for " Sergey Kaplun
2020-12-24 16:32   ` Sergey Ostanevich
2020-12-24 21:25     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory Sergey Kaplun
2020-12-20 22:46   ` Igor Munkin
2020-12-24  6:47     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser Sergey Kaplun
2020-12-24 23:09   ` Igor Munkin
2020-12-25  8:41     ` Sergey Kaplun
2020-12-21 10:43 ` [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Igor Munkin
2020-12-24  7:02   ` Sergey Kaplun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201223223454.GF9101@root \
    --to=skaplun@tarantool.org \
    --cc=imun@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox