From: Sergey Kaplun <skaplun@tarantool.org>
To: Igor Munkin <imun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
Date: Thu, 24 Dec 2020 01:34:54 +0300
Message-ID: <20201223223454.GF9101@root>
In-Reply-To: <20201220224433.GQ5396@tarantool.org>

Igor,

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
>
> Thanks for the patch! Please, consider the comments below.
>
> On 16.12.20, Sergey Kaplun wrote:
> > This patch introduces module for reading and writing leb128 compression.
> > It will be used for streaming profiling events writing, that will be
> > added at the next patches.
> >
> > Part of tarantool/tarantool#5442
> > ---
> >  src/Makefile       |   5 +-
> >  src/Makefile.dep   |   1 +
> >  src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
> >  src/utils/leb128.h |  55 ++++++++++++++++++++
> >  4 files changed, 183 insertions(+), 2 deletions(-)
> >  create mode 100644 src/utils/leb128.c
> >  create mode 100644 src/utils/leb128.h
> >
> > diff --git a/src/Makefile b/src/Makefile
> > index caa49f9..be7ed95 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
>
> Please, adjust these changes considering the comments to the first
> patch. I propose to use either <lj_utils.*> or <lj_utils_leb128.*> for
> the name.

OK, no problem.

>
> > @@ -468,6 +468,7 @@ endif
>
> <snipped>
>
> > diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> > new file mode 100644
> > index 0000000..921e5bc
> > --- /dev/null
> > +++ b/src/utils/leb128.c
> > @@ -0,0 +1,124 @@
> > +/*
> > +** Working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#include <stdint.h>
> > +#include <stddef.h>
>
> Why do you include this again instead of using leb128.h?

It's enough. Definitions from <leb128.h> are redundant here.

>
> > +
> > +#define LINK_BIT (0x80)
> > +#define MIN_TWOBYTE_VALUE (0x80)
> > +#define PAYLOAD_MASK (0x7f)
> > +#define SHIFT_STEP (7)
> > +#define LEB_SIGN_BIT (0x40)
> > +
> > +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */
> > +
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
>
> The braces are excess.

Fixed.

>
> > +  buffer[i++] = (uint8_t)value;
> > +
> > +  return i;
> > +}
> > +
> > +size_t write_leb128(uint8_t *buffer, int64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
>
> What is 0x40? If this is <LEB_SIGN_BIT>, then just use the constant
> here. Otherwise create a new one with the comment. Please, do not use
> magic numbers.

This is the bit propagation necessary for correct encoding. I'll add a
comment about it in the next version.

>
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
>
> The braces are excess.

Fixed.

>
> > +  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
> > +
> > +  return i;
> > +}
> > +
> > +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */
> > +
> > +/*
> > +** NB! For each LEB128 type (signed/unsigned) we have two versions of read
>
> Minor: It's better to use XXX for these cases in comments. We already
> discussed this with Vlad here[1] (search for "What is 'XXX'?").

OK, IINM I've already seen an "NB:" comment somewhere in the LuaJIT sources.
But "XXX" is fine with me.

>
> > +** functions: The one consuming unlimited number of input octets and the one
> > +** consuming not more than given number of input octets. Currently reading
> > +** is not used in performance critical places, so these two functions are
> > +** implemented via single low-level function + run-time mode check.
> > +** Feel free to change if this becomes a bottleneck.
>
> Well, you can also add LJ_AINLINE for a low-level function, or simply
> add a similar hint by yourself (I personally prefer the first one).

OK.

>
> > +*/
> > +
> > +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded,
> > +                            size_t n)
>
> AFAICS, <n> argument is used only in case <guarded> is set to 1.
> Moreover, <n> can't be 0 when <guarded> is set, otherwise this is a
> nilpotent function. So it seems you can drop the <guarded> parameter in
> favour of the following contract for <n>:
> * n == 0 is for guarded == 0 && n == 0
> * n > 0 is for guarded == 1 && n > 0
>
> This also relates to <_read_leb128>.

Yes, good point, thanks!

>
> > +{
> > +  size_t i = 0;
> > +  uint64_t value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t octet;
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
>
> The braces are excess.

Fixed.

>
> > +    octet = buffer[i++];
> > +    value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
>
> The braces are excess.

Fixed.

>
> > +  }
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
>
> <snipped>
>
> > +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded,
> > +                           size_t n)
> > +{
> > +  size_t i = 0;
> > +  int64_t value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t octet;
>
> A mess with whitespace above.

Fixed.

>
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
>
> The braces are excess.

Fixed.

>
> > +    octet = buffer[i++];
> > +    value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
>
> The braces are excess.

Fixed.

>
> > +  }
> > +
> > +  if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) {
> > +    value |= -(1 << shift);
> > +  }
>
> The braces are excess.

Fixed.

>
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
>
> <snipped>
>
> > diff --git a/src/utils/leb128.h b/src/utils/leb128.h
> > new file mode 100644
> > index 0000000..46d90bc
> > --- /dev/null
> > +++ b/src/utils/leb128.h
> > @@ -0,0 +1,55 @@
> > +/*
> > +** Interfaces for working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#ifndef _LJ_UTILS_LEB128_H
> > +#define _LJ_UTILS_LEB128_H
> > +
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +
> > +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
> > +#define LEB128_U64_MAXSIZE 10
>
> The naming looks odd to me. Considering my comment for the first patch,
> I propose to use something matching "lj_u?leb128_(read|write)(_n)?".
>
> By the way, the order of the interfaces is also odd.

I'll rework the naming to match the new name of this translation unit.

>
> > +
> > +/*
> > +** Writes a value from an unsigned 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value);
> > +
> > +/*
> > +** Writes a value from an signed 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_leb128(uint8_t *buffer, int64_t value);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
>
> If "buffer overflow" stands for "reading out of bounds", please reword
> this. Otherwise, I don't get it.

Yep, fixed, thanks!

>
> > +*/
> > +size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a int64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
>
> Ditto.

Yep, fixed, thanks!

>
> > +*/
> > +size_t read_leb128(int64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
>
> Ditto.

Yep, fixed, thanks!

>
> > +** If more than n bytes is about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
>
> Ditto.

Yep, fixed, thanks!

>
> > +** If more than n bytes is about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +#endif
> > --
> > 2.28.0
> >
>
> [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html
>
> --
> Best regards,
> IM

--
Best regards,
Sergey Kaplun
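
For reference, two points from the review above (what the 0x40 in
write_leb128() is for, and dropping <guarded> in favour of a contract
on <n>) can be sketched for the signed case roughly as follows. This is
only an illustration, not the code from the next version of the patch:
the sketch_* names are hypothetical, and so is the exact form of the
"n == 0 means unlimited" contract. The sign extension is done in 64-bit
unsigned arithmetic on purpose, since the quoted -(1 << shift) shifts a
plain int, which is undefined once shift reaches 31 on platforms with a
32-bit int.

```c
#include <stddef.h>
#include <stdint.h>

#define LINK_BIT     0x80 /* Set on every octet except the last one. */
#define PAYLOAD_MASK 0x7f /* The low 7 bits carry the payload. */
#define SHIFT_STEP   7
#define LEB_SIGN_BIT 0x40 /* Bit 6 of the last octet is the sign. */

/* Hypothetical name. Returns the number of octets written. */
size_t sketch_leb128_write(uint8_t *buffer, int64_t value)
{
  size_t i = 0;
  /*
  ** A value fits into a single octet iff it lies in [-64, 63].
  ** Adding LEB_SIGN_BIT maps that range onto [0, 127], so one
  ** unsigned comparison checks both bounds at once.
  */
  while ((uint64_t)value + LEB_SIGN_BIT >= 0x80) {
    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
    value >>= SHIFT_STEP; /* Assumes an arithmetic shift for negatives. */
  }
  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
  return i;
}

/*
** Hypothetical name. Assumed contract: n == 0 reads without a limit,
** n > 0 consumes at most n octets and returns 0 (leaving *out
** untouched) if the value does not end within them.
*/
size_t sketch_leb128_read(int64_t *out, const uint8_t *buffer, size_t n)
{
  size_t i = 0;
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t octet;

  for (;;) {
    if (n != 0 && i >= n)
      return 0;
    octet = buffer[i++];
    if (shift < 64) /* Ignore payload bits beyond 64 in overlong input. */
      value |= (uint64_t)(octet & PAYLOAD_MASK) << shift;
    shift += SHIFT_STEP;
    if (!(octet & LINK_BIT))
      break;
  }

  /* Fill the bits above the last payload bit with the sign. */
  if ((octet & LEB_SIGN_BIT) && shift < 64)
    value |= ~(uint64_t)0 << shift;

  *out = (int64_t)value; /* Assumes two's complement conversion. */
  return i;
}
```

Under this contract sketch_leb128_read(&v, buf, 0) would play the role
of read_leb128(), and sketch_leb128_read(&v, buf, n) with n > 0 the
role of read_leb128_n().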