From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp32.i.mail.ru (smtp32.i.mail.ru [94.100.177.92]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 5ADB24765E0 for ; Thu, 24 Dec 2020 01:35:40 +0300 (MSK) Date: Thu, 24 Dec 2020 01:34:54 +0300 From: Sergey Kaplun Message-ID: <20201223223454.GF9101@root> References: <7f5108768b70c9ffd2561f89c4974379085921e4.1608142899.git.skaplun@tarantool.org> <20201220224433.GQ5396@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201220224433.GQ5396@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Munkin Cc: tarantool-patches@dev.tarantool.org Igor, Thanks for the review! On 21.12.20, Igor Munkin wrote: > Sergey, > > Thanks for the patch! Please, consider the comments below. > > On 16.12.20, Sergey Kaplun wrote: > > This patch introduces module for reading and writing leb128 compression. > > It will be used for streaming profiling events writing, that will be > > added at the next patches. > > > > Part of tarantool/tarantool#5442 > > --- > > src/Makefile | 5 +- > > src/Makefile.dep | 1 + > > src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++ > > src/utils/leb128.h | 55 ++++++++++++++++++++ > > 4 files changed, 183 insertions(+), 2 deletions(-) > > create mode 100644 src/utils/leb128.c > > create mode 100644 src/utils/leb128.h > > > > diff --git a/src/Makefile b/src/Makefile > > index caa49f9..be7ed95 100644 > > --- a/src/Makefile > > +++ b/src/Makefile > > Please, adjust these changes considering the comments to the first > patch. I propose to use either or for > the name. OK, no problem. > > > @@ -468,6 +468,7 @@ endif > > > > > diff --git a/src/utils/leb128.c b/src/utils/leb128.c > > new file mode 100644 > > index 0000000..921e5bc > > --- /dev/null > > +++ b/src/utils/leb128.c > > @@ -0,0 +1,124 @@ > > +/* > > +** Working with LEB128/ULEB128 encoding. > > +** > > +** Major portions taken verbatim or adapted from the LuaVela. > > +** Copyright (C) 2015-2019 IPONWEB Ltd. > > +*/ > > + > > +#include > > +#include > > Why do you include this again instead of using leb128.h? It's enough. Definitions from is redundant here. > > > + > > +#define LINK_BIT (0x80) > > +#define MIN_TWOBYTE_VALUE (0x80) > > +#define PAYLOAD_MASK (0x7f) > > +#define SHIFT_STEP (7) > > +#define LEB_SIGN_BIT (0x40) > > + > > +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */ > > + > > +size_t write_uleb128(uint8_t *buffer, uint64_t value) > > +{ > > + size_t i = 0; > > + > > + for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) { > > + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT); > > + } > > The braces are excess. Fixed. > > > + buffer[i++] = (uint8_t)value; > > + > > + return i; > > +} > > + > > +size_t write_leb128(uint8_t *buffer, int64_t value) > > +{ > > + size_t i = 0; > > + > > + for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) { > > What is 0x40? If this is , then just use the constant > here. Otherwise create a new one with the comment. Please, do not use > magic numbers. This necessary bit propagation for correct encoding. I'll drop comment about it in the next version. > > > + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT); > > + } > > The braces are excess. Fixed. > > > + buffer[i++] = (uint8_t)(value & PAYLOAD_MASK); > > + > > + return i; > > +} > > + > > +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */ > > + > > +/* > > +** NB! For each LEB128 type (signed/unsigned) we have two versions of read > > Minor: It's better to use XXX for these cases in comments. We already > discussed this with Vlad here[1] (search for "What is 'XXX'?"). OK, IINM I've already seen "NB:" comment somewhere in LuaJIT sources. But "XXX" is good to me. > > > +** functions: The one consuming unlimited number of input octets and the one > > +** consuming not more than given number of input octets. Currently reading > > +** is not used in performance critical places, so these two functions are > > +** implemented via single low-level function + run-time mode check. Feel free > > +** to change if this becomes a bottleneck. > > Well, you can also add LJ_AINLINE for a low-level function, or simply > add a similar hint by yourself (I personally prefer the first one). OK. > > > +*/ > > + > > +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded, > > + size_t n) > > AFAICS, argument is used only in case is set to 1. > Moreover, can't be 0 when is set, otherwise this is a > nilpotent function. So it seems you can drop the parameter in > favour of the following contract for : > * n == 0 is for guarded == 0 && n == 0 > * n > 0 is for guarded == 1 && n > 0 > > This also relates to <_read_leb128>. Yes, good point, thanks! > > > +{ > > + size_t i = 0; > > + uint64_t value = 0; > > + uint64_t shift = 0; > > + uint8_t octet; > > + > > + for(;;) { > > + if (guarded && i + 1 > n) { > > + return 0; > > + } > > The braces are excess. Fixed. > > > + octet = buffer[i++]; > > + value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift; > > + shift += SHIFT_STEP; > > + if (!(octet & LINK_BIT)) { > > + break; > > + } > > The braces are excess. Fixed. > > > + } > > + > > + *out = value; > > + return i; > > +} > > + > > > > > +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded, > > + size_t n) > > +{ > > + size_t i = 0; > > + int64_t value = 0; > > + uint64_t shift = 0; > > + uint8_t octet; > > A mess with whitespace above. Fixed. > > > + > > + for(;;) { > > + if (guarded && i + 1 > n) { > > + return 0; > > + } > > The braces are excess. Fixed. > > > + octet = buffer[i++]; > > + value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift; > > + shift += SHIFT_STEP; > > + if (!(octet & LINK_BIT)) { > > + break; > > + } > > The braces are excess. Fixed. > > > + } > > + > > + if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) { > > + value |= -(1 << shift); > > + } > > The braces are excess. Fixed. > > > + > > + *out = value; > > + return i; > > +} > > + > > > > > diff --git a/src/utils/leb128.h b/src/utils/leb128.h > > new file mode 100644 > > index 0000000..46d90bc > > --- /dev/null > > +++ b/src/utils/leb128.h > > @@ -0,0 +1,55 @@ > > +/* > > +** Interfaces for working with LEB128/ULEB128 encoding. > > +** > > +** Major portions taken verbatim or adapted from the LuaVela. > > +** Copyright (C) 2015-2019 IPONWEB Ltd. > > +*/ > > + > > +#ifndef _LJ_UTILS_LEB128_H > > +#define _LJ_UTILS_LEB128_H > > + > > +#include > > +#include > > + > > +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */ > > +#define LEB128_U64_MAXSIZE 10 > > The naming looks odd to me. Considering my comment for the first patch, > I propose to use something matching "lj_u?leb128_(read|write)(_n)?". > > By the way, the order of the interfaces is also odd. I'll rewrite naming considering new naming of this translation unit. > > > + > > +/* > > +** Writes a value from an unsigned 64-bit input to a buffer of bytes. > > +** Buffer overflow is not checked. Returns number of bytes written. > > +*/ > > +size_t write_uleb128(uint8_t *buffer, uint64_t value); > > + > > +/* > > +** Writes a value from an signed 64-bit input to a buffer of bytes. > > +** Buffer overflow is not checked. Returns number of bytes written. > > +*/ > > +size_t write_leb128(uint8_t *buffer, int64_t value); > > + > > +/* > > +** Reads a value from a buffer of bytes to a uint64_t output. > > +** Buffer overflow is not checked. Returns number of bytes read. > > If "buffer overflow" stands for "reading out of bounds", please reword > this. Otherwise, I don't get it. Yep, fixed, thanks! > > > +*/ > > +size_t read_uleb128(uint64_t *out, const uint8_t *buffer); > > + > > +/* > > +** Reads a value from a buffer of bytes to a int64_t output. > > +** Buffer overflow is not checked. Returns number of bytes read. > > Ditto. Yep, fixed, thanks! > > > +*/ > > +size_t read_leb128(int64_t *out, const uint8_t *buffer); > > + > > +/* > > +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more > > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read. > > Ditto. Yep, fixed, thanks! > > > +** If more than n bytes is about to be consumed, returns 0 without touching out. > > +*/ > > +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n); > > + > > +/* > > +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more > > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read. > > Ditto. Yep, fixed, thanks! > > > +** If more than n bytes is about to be consumed, returns 0 without touching out. > > +*/ > > +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n); > > + > > +#endif > > -- > > 2.28.0 > > > > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html > > -- > Best regards, > IM -- Best regards, Sergey Kaplun