From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 28D8E4765E0 for ; Mon, 21 Dec 2020 01:44:39 +0300 (MSK) Date: Mon, 21 Dec 2020 01:44:33 +0300 From: Igor Munkin Message-ID: <20201220224433.GQ5396@tarantool.org> References: <7f5108768b70c9ffd2561f89c4974379085921e4.1608142899.git.skaplun@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <7f5108768b70c9ffd2561f89c4974379085921e4.1608142899.git.skaplun@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Sergey, Thanks for the patch! Please, consider the comments below. On 16.12.20, Sergey Kaplun wrote: > This patch introduces module for reading and writing leb128 compression. > It will be used for streaming profiling events writing, that will be > added at the next patches. > > Part of tarantool/tarantool#5442 > --- > src/Makefile | 5 +- > src/Makefile.dep | 1 + > src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++ > src/utils/leb128.h | 55 ++++++++++++++++++++ > 4 files changed, 183 insertions(+), 2 deletions(-) > create mode 100644 src/utils/leb128.c > create mode 100644 src/utils/leb128.h > > diff --git a/src/Makefile b/src/Makefile > index caa49f9..be7ed95 100644 > --- a/src/Makefile > +++ b/src/Makefile Please, adjust these changes considering the comments to the first patch. I propose to use either or for the name. > @@ -468,6 +468,7 @@ endif > diff --git a/src/utils/leb128.c b/src/utils/leb128.c > new file mode 100644 > index 0000000..921e5bc > --- /dev/null > +++ b/src/utils/leb128.c > @@ -0,0 +1,124 @@ > +/* > +** Working with LEB128/ULEB128 encoding. > +** > +** Major portions taken verbatim or adapted from the LuaVela. > +** Copyright (C) 2015-2019 IPONWEB Ltd. > +*/ > + > +#include > +#include Why do you include this again instead of using leb128.h? > + > +#define LINK_BIT (0x80) > +#define MIN_TWOBYTE_VALUE (0x80) > +#define PAYLOAD_MASK (0x7f) > +#define SHIFT_STEP (7) > +#define LEB_SIGN_BIT (0x40) > + > +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */ > + > +size_t write_uleb128(uint8_t *buffer, uint64_t value) > +{ > + size_t i = 0; > + > + for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) { > + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT); > + } The braces are excess. > + buffer[i++] = (uint8_t)value; > + > + return i; > +} > + > +size_t write_leb128(uint8_t *buffer, int64_t value) > +{ > + size_t i = 0; > + > + for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) { What is 0x40? If this is , then just use the constant here. Otherwise create a new one with the comment. Please, do not use magic numbers. > + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT); > + } The braces are excess. > + buffer[i++] = (uint8_t)(value & PAYLOAD_MASK); > + > + return i; > +} > + > +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */ > + > +/* > +** NB! For each LEB128 type (signed/unsigned) we have two versions of read Minor: It's better to use XXX for these cases in comments. We already discussed this with Vlad here[1] (search for "What is 'XXX'?"). > +** functions: The one consuming unlimited number of input octets and the one > +** consuming not more than given number of input octets. Currently reading > +** is not used in performance critical places, so these two functions are > +** implemented via single low-level function + run-time mode check. Feel free > +** to change if this becomes a bottleneck. Well, you can also add LJ_AINLINE for a low-level function, or simply add a similar hint by yourself (I personally prefer the first one). > +*/ > + > +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded, > + size_t n) AFAICS, argument is used only in case is set to 1. Moreover, can't be 0 when is set, otherwise this is a nilpotent function. So it seems you can drop the parameter in favour of the following contract for : * n == 0 is for guarded == 0 && n == 0 * n > 0 is for guarded == 1 && n > 0 This also relates to <_read_leb128>. > +{ > + size_t i = 0; > + uint64_t value = 0; > + uint64_t shift = 0; > + uint8_t octet; > + > + for(;;) { > + if (guarded && i + 1 > n) { > + return 0; > + } The braces are excess. > + octet = buffer[i++]; > + value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift; > + shift += SHIFT_STEP; > + if (!(octet & LINK_BIT)) { > + break; > + } The braces are excess. > + } > + > + *out = value; > + return i; > +} > + > +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded, > + size_t n) > +{ > + size_t i = 0; > + int64_t value = 0; > + uint64_t shift = 0; > + uint8_t octet; A mess with whitespace above. > + > + for(;;) { > + if (guarded && i + 1 > n) { > + return 0; > + } The braces are excess. > + octet = buffer[i++]; > + value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift; > + shift += SHIFT_STEP; > + if (!(octet & LINK_BIT)) { > + break; > + } The braces are excess. > + } > + > + if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) { > + value |= -(1 << shift); > + } The braces are excess. > + > + *out = value; > + return i; > +} > + > diff --git a/src/utils/leb128.h b/src/utils/leb128.h > new file mode 100644 > index 0000000..46d90bc > --- /dev/null > +++ b/src/utils/leb128.h > @@ -0,0 +1,55 @@ > +/* > +** Interfaces for working with LEB128/ULEB128 encoding. > +** > +** Major portions taken verbatim or adapted from the LuaVela. > +** Copyright (C) 2015-2019 IPONWEB Ltd. > +*/ > + > +#ifndef _LJ_UTILS_LEB128_H > +#define _LJ_UTILS_LEB128_H > + > +#include > +#include > + > +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */ > +#define LEB128_U64_MAXSIZE 10 The naming looks odd to me. Considering my comment for the first patch, I propose to use something matching "lj_u?leb128_(read|write)(_n)?". By the way, the order of the interfaces is also odd. > + > +/* > +** Writes a value from an unsigned 64-bit input to a buffer of bytes. > +** Buffer overflow is not checked. Returns number of bytes written. > +*/ > +size_t write_uleb128(uint8_t *buffer, uint64_t value); > + > +/* > +** Writes a value from an signed 64-bit input to a buffer of bytes. > +** Buffer overflow is not checked. Returns number of bytes written. > +*/ > +size_t write_leb128(uint8_t *buffer, int64_t value); > + > +/* > +** Reads a value from a buffer of bytes to a uint64_t output. > +** Buffer overflow is not checked. Returns number of bytes read. If "buffer overflow" stands for "reading out of bounds", please reword this. Otherwise, I don't get it. > +*/ > +size_t read_uleb128(uint64_t *out, const uint8_t *buffer); > + > +/* > +** Reads a value from a buffer of bytes to a int64_t output. > +** Buffer overflow is not checked. Returns number of bytes read. Ditto. > +*/ > +size_t read_leb128(int64_t *out, const uint8_t *buffer); > + > +/* > +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read. Ditto. > +** If more than n bytes is about to be consumed, returns 0 without touching out. > +*/ > +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n); > + > +/* > +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read. Ditto. > +** If more than n bytes is about to be consumed, returns 0 without touching out. > +*/ > +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n); > + > +#endif > -- > 2.28.0 > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html -- Best regards, IM