Re: [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Alexander Turenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Nikiforov <void@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance
Date: Thu, 21 Jan 2021 18:31:09 +0300	[thread overview]
Message-ID: <20210121153109.5rgx6wttrkkmcrc5@tkn_work_nb> (raw)
In-Reply-To: <b3f6a37aef7bc59668e9f1e6b9638920a8f52846.1610357625.git.void@tarantool.org>

On Mon, Jan 11, 2021 at 12:45:01PM +0300, Sergey Nikiforov wrote:
> Unnecessary checks were removed from internal loops.
> Benchmark shows that performance is now ~1.19 times higher
> (release build, Intel Core I7-9700K, only one thread).
> ---
> 
> Branch: https://github.com/tarantool/tarantool/tree/void234/gh-3069-fix-base64-memory-overrun-v6
> 
>  third_party/base64.c | 35 ++++++++++++++++++++++++++---------
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/third_party/base64.c b/third_party/base64.c
> index 3ecd0843a..74260971b 100644
> --- a/third_party/base64.c
> +++ b/third_party/base64.c
> @@ -258,10 +258,10 @@ base64_decode_block(const char *in_base64, int in_len,
>  		{
>  	case step_a:
>  			do {
> -				if (in_pos == in_end || out_pos >= out_end)
> +				if (in_pos >= in_end)

Okay, it is the loop invariant and so we can move it outside the loop.
But why we sink it down rather than hoisting up? We know that there is
no space in the output buffer -- so there is no sense to read the input,
right? Or it will break 'correctness' of the hypotetical use of the
`state`?

So we should imagine how the state would be used if it would be
possible. From which input position the processing would be resumed if
the code would provide such ability? How a caller supposed to
distinguish end of the output buffer from successful processing the
whole input?

Aside of this, skipping unrecognized input bytes looks as another
incorrectness. We should be ready to work with untrusted input, perform
appropriate validation and report errors.

We can skip newlines, but I guess there is no sense to do this in a
loop: only one newline is a row is allowed (but it worth to verify the
guess with the standard).

It is hard to discuss an optimization until we'll define how the result
should look in the corner cases like 'short output buffer', 'unexpected
end of an input buffer' and so on. Current code does not report errors:
it silently skips invalid input, accepts trimmed input, silently trims
output at the buffer boundary. I don't understand how it supposed to
work with untrusted input.

I propose to stop attempts to do some 'quick fixups' and either work on
the code or don't.

If we would define behaviour first, I would spend MUCH less time trying
to deduce it.

That's really hard to discuss correctness of dead code. To be honest, it
looks easier to me to finish the chained processing implementation, test
it and remove than rather than trying to interpter the code (both
existing and imaginary) in the mind.

BTW, it seems we able to entirely eliminate the output buffer length
checks for the first (out_len * 3 / 4) input bytes and than process the
tail with full checks. (After we'll done with strict and correct API.)

next prev parent reply	other threads:[~2021-01-21 15:32 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-11  9:44 [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Sergey Nikiforov via Tarantool-patches
2021-01-11  9:45 ` [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) Sergey Nikiforov via Tarantool-patches
2021-01-21  2:16   ` Alexander Turenko via Tarantool-patches
2021-01-26 16:37     ` Sergey Nikiforov via Tarantool-patches
2021-01-26 16:48       ` Sergey Nikiforov via Tarantool-patches
2021-02-20 11:30         ` Alexander Turenko via Tarantool-patches
2021-02-20 12:49       ` Alexander Turenko via Tarantool-patches
2021-02-25  8:25         ` Sergey Nikiforov via Tarantool-patches
2021-02-27 18:47           ` Alexander Turenko via Tarantool-patches
2021-01-11  9:45 ` [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance Sergey Nikiforov via Tarantool-patches
2021-01-21 15:31   ` Alexander Turenko via Tarantool-patches [this message]
2021-01-26 16:37     ` Sergey Nikiforov via Tarantool-patches
2021-02-20 12:51       ` Alexander Turenko via Tarantool-patches
2021-01-12 19:39 ` [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Alexander Turenko via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210121153109.5rgx6wttrkkmcrc5@tkn_work_nb \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=alexander.turenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --cc=void@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox