[Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
Igor Munkin
imun at tarantool.org
Mon Dec 21 01:44:33 MSK 2020
Sergey,
Thanks for the patch! Please, consider the comments below.
On 16.12.20, Sergey Kaplun wrote:
> This patch introduces module for reading and writing leb128 compression.
> It will be used for streaming profiling events writing, that will be
> added at the next patches.
>
> Part of tarantool/tarantool#5442
> ---
> src/Makefile | 5 +-
> src/Makefile.dep | 1 +
> src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
> src/utils/leb128.h | 55 ++++++++++++++++++++
> 4 files changed, 183 insertions(+), 2 deletions(-)
> create mode 100644 src/utils/leb128.c
> create mode 100644 src/utils/leb128.h
>
> diff --git a/src/Makefile b/src/Makefile
> index caa49f9..be7ed95 100644
> --- a/src/Makefile
> +++ b/src/Makefile
Please, adjust these changes considering the comments to the first
patch. I propose to use either <lj_utils.*> or <lj_utils_leb128.*> for
the name.
> @@ -468,6 +468,7 @@ endif
<snipped>
> diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> new file mode 100644
> index 0000000..921e5bc
> --- /dev/null
> +++ b/src/utils/leb128.c
> @@ -0,0 +1,124 @@
> +/*
> +** Working with LEB128/ULEB128 encoding.
> +**
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#include <stdint.h>
> +#include <stddef.h>
Why do you include this again instead of using leb128.h?
> +
> +#define LINK_BIT (0x80)
> +#define MIN_TWOBYTE_VALUE (0x80)
> +#define PAYLOAD_MASK (0x7f)
> +#define SHIFT_STEP (7)
> +#define LEB_SIGN_BIT (0x40)
> +
> +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */
> +
> +size_t write_uleb128(uint8_t *buffer, uint64_t value)
> +{
> + size_t i = 0;
> +
> + for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> + }
The braces are excess.
> + buffer[i++] = (uint8_t)value;
> +
> + return i;
> +}
> +
> +size_t write_leb128(uint8_t *buffer, int64_t value)
> +{
> + size_t i = 0;
> +
> + for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
What is 0x40? If this is <LEB_SIGN_BIT>, then just use the constant
here. Otherwise create a new one with the comment. Please, do not use
magic numbers.
> + buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> + }
The braces are excess.
> + buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
> +
> + return i;
> +}
> +
> +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */
> +
> +/*
> +** NB! For each LEB128 type (signed/unsigned) we have two versions of read
Minor: It's better to use XXX for these cases in comments. We already
discussed this with Vlad here[1] (search for "What is 'XXX'?").
> +** functions: The one consuming unlimited number of input octets and the one
> +** consuming not more than given number of input octets. Currently reading
> +** is not used in performance critical places, so these two functions are
> +** implemented via single low-level function + run-time mode check. Feel free
> +** to change if this becomes a bottleneck.
Well, you can also add LJ_AINLINE for a low-level function, or simply
add a similar hint by yourself (I personally prefer the first one).
> +*/
> +
> +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded,
> + size_t n)
AFAICS, <n> argument is used only in case <guarded> is set to 1.
Moreover, <n> can't be 0 when <guarded> is set, otherwise this is a
nilpotent function. So it seems you can drop the <guarded> parameter in
favour of the following contract for <n>:
* n == 0 is for guarded == 0 && n == 0
* n > 0 is for guarded == 1 && n > 0
This also relates to <_read_leb128>.
> +{
> + size_t i = 0;
> + uint64_t value = 0;
> + uint64_t shift = 0;
> + uint8_t octet;
> +
> + for(;;) {
> + if (guarded && i + 1 > n) {
> + return 0;
> + }
The braces are excess.
> + octet = buffer[i++];
> + value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
> + shift += SHIFT_STEP;
> + if (!(octet & LINK_BIT)) {
> + break;
> + }
The braces are excess.
> + }
> +
> + *out = value;
> + return i;
> +}
> +
<snipped>
> +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded,
> + size_t n)
> +{
> + size_t i = 0;
> + int64_t value = 0;
> + uint64_t shift = 0;
> + uint8_t octet;
A mess with whitespace above.
> +
> + for(;;) {
> + if (guarded && i + 1 > n) {
> + return 0;
> + }
The braces are excess.
> + octet = buffer[i++];
> + value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
> + shift += SHIFT_STEP;
> + if (!(octet & LINK_BIT)) {
> + break;
> + }
The braces are excess.
> + }
> +
> + if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) {
> + value |= -(1 << shift);
> + }
The braces are excess.
> +
> + *out = value;
> + return i;
> +}
> +
<snipped>
> diff --git a/src/utils/leb128.h b/src/utils/leb128.h
> new file mode 100644
> index 0000000..46d90bc
> --- /dev/null
> +++ b/src/utils/leb128.h
> @@ -0,0 +1,55 @@
> +/*
> +** Interfaces for working with LEB128/ULEB128 encoding.
> +**
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#ifndef _LJ_UTILS_LEB128_H
> +#define _LJ_UTILS_LEB128_H
> +
> +#include <stddef.h>
> +#include <stdint.h>
> +
> +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
> +#define LEB128_U64_MAXSIZE 10
The naming looks odd to me. Considering my comment for the first patch,
I propose to use something matching "lj_u?leb128_(read|write)(_n)?".
By the way, the order of the interfaces is also odd.
> +
> +/*
> +** Writes a value from an unsigned 64-bit input to a buffer of bytes.
> +** Buffer overflow is not checked. Returns number of bytes written.
> +*/
> +size_t write_uleb128(uint8_t *buffer, uint64_t value);
> +
> +/*
> +** Writes a value from an signed 64-bit input to a buffer of bytes.
> +** Buffer overflow is not checked. Returns number of bytes written.
> +*/
> +size_t write_leb128(uint8_t *buffer, int64_t value);
> +
> +/*
> +** Reads a value from a buffer of bytes to a uint64_t output.
> +** Buffer overflow is not checked. Returns number of bytes read.
If "buffer overflow" stands for "reading out of bounds", please reword
this. Otherwise, I don't get it.
> +*/
> +size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
> +
> +/*
> +** Reads a value from a buffer of bytes to a int64_t output.
> +** Buffer overflow is not checked. Returns number of bytes read.
Ditto.
> +*/
> +size_t read_leb128(int64_t *out, const uint8_t *buffer);
> +
> +/*
> +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
Ditto.
> +** If more than n bytes is about to be consumed, returns 0 without touching out.
> +*/
> +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
> +
> +/*
> +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
> +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
Ditto.
> +** If more than n bytes is about to be consumed, returns 0 without touching out.
> +*/
> +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
> +
> +#endif
> --
> 2.28.0
>
[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html
--
Best regards,
IM
More information about the Tarantool-patches
mailing list