From: Igor Munkin <imun@tarantool.org> To: Sergey Kaplun <skaplun@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Date: Mon, 21 Dec 2020 01:44:33 +0300 [thread overview] Message-ID: <20201220224433.GQ5396@tarantool.org> (raw) In-Reply-To: <7f5108768b70c9ffd2561f89c4974379085921e4.1608142899.git.skaplun@tarantool.org> Sergey, Thanks for the patch! Please, consider the comments below. On 16.12.20, Sergey Kaplun wrote: > This patch introduces module for reading and writing leb128 compression. > It will be used for streaming profiling events writing, that will be > added at the next patches. > > Part of tarantool/tarantool#5442 > --- > src/Makefile | 5 +- > src/Makefile.dep | 1 + > src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++ > src/utils/leb128.h | 55 ++++++++++++++++++++ > 4 files changed, 183 insertions(+), 2 deletions(-) > create mode 100644 src/utils/leb128.c > create mode 100644 src/utils/leb128.h > > diff --git a/src/Makefile b/src/Makefile > index caa49f9..be7ed95 100644 > --- a/src/Makefile > +++ b/src/Makefile Please, adjust these changes considering the comments to the first patch. I propose to use either <lj_utils.*> or <lj_utils_leb128.*> for the name. > @@ -468,6 +468,7 @@ endif <snipped> > diff --git a/src/utils/leb128.c b/src/utils/leb128.c > new file mode 100644 > index 0000000..921e5bc > --- /dev/null > +++ b/src/utils/leb128.c > @@ -0,0 +1,124 @@ > +/* > +** Working with LEB128/ULEB128 encoding. > +** > +** Major portions taken verbatim or adapted from the LuaVela. > +** Copyright (C) 2015-2019 IPONWEB Ltd. > +*/ > + > +#include <stdint.h> > +#include <stddef.h> Why do you include this again instead of using leb128.h? > + > +#define LINK_BIT (0x80) > +#define MIN_TWOBYTE_VALUE (0x80) > +#define PAYLOAD_MASK (0x7f) > +#define SHIFT_STEP (7) > +#define LEB_SIGN_BIT (0x40) > + > +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */ > + > +size_t write_uleb128(uint8_t *buffer, uint64_t value) > +{ > + size_t i = 0; > + > + for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) { > + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT); > + } The braces are excess. > + buffer[i++] = (uint8_t)value; > + > + return i; > +} > + > +size_t write_leb128(uint8_t *buffer, int64_t value) > +{ > + size_t i = 0; > + > + for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) { What is 0x40? If this is <LEB_SIGN_BIT>, then just use the constant here. Otherwise create a new one with the comment. Please, do not use magic numbers. > + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT); > + } The braces are excess. > + buffer[i++] = (uint8_t)(value & PAYLOAD_MASK); > + > + return i; > +} > + > +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */ > + > +/* > +** NB! For each LEB128 type (signed/unsigned) we have two versions of read Minor: It's better to use XXX for these cases in comments. We already discussed this with Vlad here[1] (search for "What is 'XXX'?"). > +** functions: The one consuming unlimited number of input octets and the one > +** consuming not more than given number of input octets. Currently reading > +** is not used in performance critical places, so these two functions are > +** implemented via single low-level function + run-time mode check. Feel free > +** to change if this becomes a bottleneck. Well, you can also add LJ_AINLINE for a low-level function, or simply add a similar hint by yourself (I personally prefer the first one). > +*/ > + > +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded, > + size_t n) AFAICS, <n> argument is used only in case <guarded> is set to 1. Moreover, <n> can't be 0 when <guarded> is set, otherwise this is a nilpotent function. So it seems you can drop the <guarded> parameter in favour of the following contract for <n>: * n == 0 is for guarded == 0 && n == 0 * n > 0 is for guarded == 1 && n > 0 This also relates to <_read_leb128>. > +{ > + size_t i = 0; > + uint64_t value = 0; > + uint64_t shift = 0; > + uint8_t octet; > + > + for(;;) { > + if (guarded && i + 1 > n) { > + return 0; > + } The braces are excess. > + octet = buffer[i++]; > + value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift; > + shift += SHIFT_STEP; > + if (!(octet & LINK_BIT)) { > + break; > + } The braces are excess. > + } > + > + *out = value; > + return i; > +} > + <snipped> > +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded, > + size_t n) > +{ > + size_t i = 0; > + int64_t value = 0; > + uint64_t shift = 0; > + uint8_t octet; A mess with whitespace above. > + > + for(;;) { > + if (guarded && i + 1 > n) { > + return 0; > + } The braces are excess. > + octet = buffer[i++]; > + value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift; > + shift += SHIFT_STEP; > + if (!(octet & LINK_BIT)) { > + break; > + } The braces are excess. > + } > + > + if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) { > + value |= -(1 << shift); > + } The braces are excess. > + > + *out = value; > + return i; > +} > + <snipped> > diff --git a/src/utils/leb128.h b/src/utils/leb128.h > new file mode 100644 > index 0000000..46d90bc > --- /dev/null > +++ b/src/utils/leb128.h > @@ -0,0 +1,55 @@ > +/* > +** Interfaces for working with LEB128/ULEB128 encoding. > +** > +** Major portions taken verbatim or adapted from the LuaVela. > +** Copyright (C) 2015-2019 IPONWEB Ltd. > +*/ > + > +#ifndef _LJ_UTILS_LEB128_H > +#define _LJ_UTILS_LEB128_H > + > +#include <stddef.h> > +#include <stdint.h> > + > +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */ > +#define LEB128_U64_MAXSIZE 10 The naming looks odd to me. Considering my comment for the first patch, I propose to use something matching "lj_u?leb128_(read|write)(_n)?". By the way, the order of the interfaces is also odd. > + > +/* > +** Writes a value from an unsigned 64-bit input to a buffer of bytes. > +** Buffer overflow is not checked. Returns number of bytes written. > +*/ > +size_t write_uleb128(uint8_t *buffer, uint64_t value); > + > +/* > +** Writes a value from an signed 64-bit input to a buffer of bytes. > +** Buffer overflow is not checked. Returns number of bytes written. > +*/ > +size_t write_leb128(uint8_t *buffer, int64_t value); > + > +/* > +** Reads a value from a buffer of bytes to a uint64_t output. > +** Buffer overflow is not checked. Returns number of bytes read. If "buffer overflow" stands for "reading out of bounds", please reword this. Otherwise, I don't get it. > +*/ > +size_t read_uleb128(uint64_t *out, const uint8_t *buffer); > + > +/* > +** Reads a value from a buffer of bytes to a int64_t output. > +** Buffer overflow is not checked. Returns number of bytes read. Ditto. > +*/ > +size_t read_leb128(int64_t *out, const uint8_t *buffer); > + > +/* > +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read. Ditto. > +** If more than n bytes is about to be consumed, returns 0 without touching out. > +*/ > +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n); > + > +/* > +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read. Ditto. > +** If more than n bytes is about to be consumed, returns 0 without touching out. > +*/ > +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n); > + > +#endif > -- > 2.28.0 > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html -- Best regards, IM
next prev parent reply other threads:[~2020-12-20 22:44 UTC|newest] Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building Sergey Kaplun 2020-12-20 21:27 ` Igor Munkin 2020-12-23 18:20 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Sergey Kaplun 2020-12-20 22:44 ` Igor Munkin [this message] 2020-12-23 22:34 ` Sergey Kaplun 2020-12-24 9:11 ` Igor Munkin 2020-12-25 8:46 ` Sergey Kaplun 2020-12-23 16:50 ` Sergey Ostanevich 2020-12-23 22:36 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module Sergey Kaplun 2020-12-21 9:24 ` Igor Munkin 2020-12-24 6:46 ` Sergey Kaplun 2020-12-24 15:45 ` Sergey Ostanevich 2020-12-24 21:20 ` Sergey Kaplun 2020-12-25 9:37 ` Igor Munkin 2020-12-25 10:13 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module Sergey Kaplun 2020-12-21 10:30 ` Igor Munkin 2020-12-24 7:00 ` Sergey Kaplun 2020-12-24 9:36 ` Igor Munkin 2020-12-25 8:45 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates Sergey Kaplun 2020-12-25 11:07 ` Sergey Ostanevich 2020-12-25 11:23 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 06/11] core: introduce new mem_L field Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API Sergey Kaplun 2020-12-20 22:46 ` Igor Munkin 2020-12-24 6:50 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 08/11] profile: introduce memory profiler Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for " Sergey Kaplun 2020-12-24 16:32 ` Sergey Ostanevich 2020-12-24 21:25 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory Sergey Kaplun 2020-12-20 22:46 ` Igor Munkin 2020-12-24 6:47 ` Sergey Kaplun 2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser Sergey Kaplun 2020-12-24 23:09 ` Igor Munkin 2020-12-25 8:41 ` Sergey Kaplun 2020-12-21 10:43 ` [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Igor Munkin 2020-12-24 7:02 ` Sergey Kaplun
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20201220224433.GQ5396@tarantool.org \ --to=imun@tarantool.org \ --cc=skaplun@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox