[Tarantool-patches] [PATCH 00/20] Rewrite performance critical parts of net.box in C

Vladimir Davydov vdavydov at tarantool.org
Fri Jul 23 14:07:10 MSK 2021


https://github.com/tarantool/tarantool/tree/vdavydov/net-box-optimization

This patch set rewrites performance-critical parts of net.box (response
dispatching and IO loop) in C. It shouldn't introduce any user-visible
changes or changes in the logic, because the C version was basically
created by rewriting Lua code line-by-line. The goal of this work is to
improve performance of CPU-bound applications that use net.box, such as
vshard.router.

To check that this patch set meets expectations, I ran a simple
benchmark [tnt-bench.lua], which issues multiple concurrent requests in
a loop. The test measures RPS per wall time (WALL) and per processor
time (PROC). Concurrency is implemented with either fibers or futures.
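
As a rough illustration of the two metrics (this is not the actual
benchmark, which is the Lua script linked as [tnt-bench.lua]), here is a
minimal Python sketch. The names measure_rps and do_request are
hypothetical, and the request is replaced by a CPU-bound stand-in. It
shows that the same request count is divided by elapsed wall time in one
case and by consumed processor time in the other.

```python
import time


def measure_rps(do_request, count):
    """Call do_request() count times and return (rps_wall, rps_proc).

    rps_wall: requests per second of elapsed (wall-clock) time.
    rps_proc: requests per second of CPU time consumed by this process.
    """
    wall_start = time.perf_counter()
    proc_start = time.process_time()
    for _ in range(count):
        do_request()
    wall = time.perf_counter() - wall_start
    proc = time.process_time() - proc_start
    # Guard against a zero reading on clocks with coarse resolution.
    wall = max(wall, 1e-9)
    proc = max(proc, 1e-9)
    return count / wall, count / proc


if __name__ == "__main__":
    # Hypothetical stand-in for a request: a small CPU-bound computation.
    rps_wall, rps_proc = measure_rps(lambda: sum(range(1000)), 2000)
    print("RPS (WALL): %.0f, RPS (PROC): %.0f" % (rps_wall, rps_proc))
```

With these definitions, RPS[WALL] / RPS[PROC] equals processor time
divided by wall time, that is, the share of the run during which the
process was busy on the CPU.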

There are a few test cases that issue different kinds of requests:

 - REPLACE({k, 'bar', i + k})
 - UPDATE({k}, {{'=', 2, 'bar'}, {'=', 3, i + k}})
 - SELECT({k})
 - CALL('bench_func', {1, 2, 3, 'foo', 'bar'})

where i and k are integers and bench_func is defined as follows:

  function bench_func(...) return {...} end

The test was run on my laptop (i5-10210U 1.60GHz) for the following
Tarantool versions built with CMAKE_BUILD_TYPE=RelWithDebInfo:

 - master: 2.9.0-165-ga02cfe60cf23
 - patched: master + this patch set
 - poc: master + [tarantool-net-box-call-in-c.patch]

The last one is a proof-of-concept version that I created before
starting to work on this patch set.

The results are below.


/// USING FIBERS (SYNCHRONOUS) ///

---------+-----------------------------++-----------------------------+
         |      KRPS (WALL TIME)       ||      KRPS (PROC TIME)       |
         +---------+---------+---------++---------+---------+---------+
         |  master | patched |   poc   ||  master | patched |   poc   |
---------+---------+---------+---------++---------+---------+---------+
 REPLACE | 162.628 | 268.349 |   N/A   || 221.402 | 459.965 |   N/A   |
 UPDATE  | 126.905 | 195.835 |   N/A   || 173.635 | 334.609 |   N/A   |
 SELECT  | 187.742 | 353.043 |   N/A   || 207.605 | 427.147 |   N/A   |
 CALL    | 163.700 | 290.717 | 375.412 || 213.560 | 481.349 | 761.238 |


/// USING FUTURES (ASYNCHRONOUS) ///

---------+-----------------------------++-----------------------------+
         |      KRPS (WALL TIME)       ||      KRPS (PROC TIME)       |
         +---------+---------+---------++---------+---------+---------+
         |  master | patched |   poc   ||  master | patched |   poc   |
---------+---------+---------+---------++---------+---------+---------+
 REPLACE | 191.529 | 249.810 |   N/A   || 277.648 | 413.360 |   N/A   |
 UPDATE  | 155.116 | 173.850 |   N/A   || 231.603 | 273.624 |   N/A   |
 SELECT  | 238.657 | 286.699 |   N/A   || 269.040 | 333.706 |   N/A   |
 CALL    | 192.041 | 241.571 |   N/A   || 261.085 | 365.139 |   N/A   |


So the patch set increases the RPS of synchronous net.box.call, which is
the primary method used by vshard.router, by about 75%. Other
synchronous methods show an improvement of between 50% and 90%. The
number of requests per processor second is roughly doubled by the patch
set, which means that it also reduces CPU usage during the test: judging
by the KRPS[WALL]/KRPS[PROC] ratio for CALL, it decreases from about 75%
to about 60%.
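
The figures in this paragraph can be recomputed from the synchronous
table. The sketch below is only a sanity check in Python: the numbers
are copied from the table, and the helper names gain_pct and
cpu_usage_pct are hypothetical. CPU usage is taken to be
KRPS[WALL] / KRPS[PROC], i.e. processor seconds spent per wall second.

```python
# KRPS from the fibers (synchronous) table, as (master, patched).
WALL = {
    "REPLACE": (162.628, 268.349),
    "UPDATE": (126.905, 195.835),
    "SELECT": (187.742, 353.043),
    "CALL": (163.700, 290.717),
}
PROC = {
    "REPLACE": (221.402, 459.965),
    "UPDATE": (173.635, 334.609),
    "SELECT": (207.605, 427.147),
    "CALL": (213.560, 481.349),
}


def gain_pct(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


def cpu_usage_pct(krps_wall, krps_proc):
    """Share of wall time spent on the CPU, in percent."""
    return krps_wall / krps_proc * 100.0


for name in WALL:
    wall_master, wall_patched = WALL[name]
    proc_master, proc_patched = PROC[name]
    print(
        "%-7s wall gain %5.1f%%  proc ratio %.2fx  CPU usage %4.1f%% -> %4.1f%%"
        % (
            name,
            gain_pct(wall_master, wall_patched),
            proc_patched / proc_master,
            cpu_usage_pct(wall_master, proc_master),
            cpu_usage_pct(wall_patched, proc_patched),
        )
    )
```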

Asynchronous calls don't show as much of an improvement as synchronous
ones, because for each asynchronous call we still have to create a
'future' object in Lua. Still, the improvement is quite noticeable: 30%
for REPLACE, 10% for UPDATE, 20% for SELECT, 25% for CALL.
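
The same kind of check applies to the asynchronous numbers. This short
Python sketch (the helper name gain_pct is hypothetical) recomputes the
wall-time improvement from the futures table; the percentages quoted
above are these values rounded to the nearest 5%.

```python
# KRPS (wall time) from the futures (asynchronous) table: (master, patched).
ASYNC_WALL = {
    "REPLACE": (191.529, 249.810),
    "UPDATE": (155.116, 173.850),
    "SELECT": (238.657, 286.699),
    "CALL": (192.041, 241.571),
}


def gain_pct(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


for name, (master, patched) in ASYNC_WALL.items():
    print("%-7s %5.1f%%" % (name, gain_pct(master, patched)))
```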

What is surprising is that the PoC version still outperforms the patched
version by about 30% and shows even lower CPU usage (50% vs 60%). This
is probably caused by the IO loop implementation. I'm going to look into
that separately.


Links:

[tnt-bench.lua] https://gist.github.com/locker/7faeb39129a2421a85568c512288208f
[tarantool-net-box-call-in-c.patch] https://gist.github.com/locker/cd357f9482bfd207ffe7df610c4b2fba


For more information about net.box performance, see

 - C/C++ vs Net.Box Connector Performance
   https://docs.google.com/document/d/1v-d-qQ9zilOdDgDJZWTzs0cSJ9XVXLWRfoQnxNfYttc

 - vshard.router.call performance analysis
   https://docs.google.com/document/d/1VwMzs75Umi5IhFw-r54wj0b8s_d9WDCYFZ3lRMFzfB8


Vladimir Davydov (20):
  net.box: fix console connection breakage when request is discarded
  net.box: wake up wait_result callers when request is discarded
  net.box: do not check worker_fiber in request:result,is_ready
  net.box: remove decode_push from method_decoder table
  net.box: use decode_tuple instead of decode_get
  net.box: rename request.ctx to request.format
  net.box: use integer id instead of method name
  net.box: remove useless encode optimization
  net.box: rewrite request encoder in C
  lua/utils: make char ptr Lua CTIDs public
  net.box: rewrite response decoder in C
  net.box: rewrite error decoder in C
  net.box: rewrite send_and_recv_{iproto,console} in C
  net.box: rename netbox_{prepare,encode}_request to {begin,end}
  net.box: rewrite request implementation in C
  net.box: store next_request_id in C code
  net.box: rewrite console handlers in C
  net.box: rewrite iproto handlers in C
  net.box: merge new_id, new_request and encode_method
  net.box: do not create request object in Lua for sync requests

 src/box/lua/net_box.c                         | 1714 ++++++++++++++---
 src/box/lua/net_box.lua                       |  733 ++-----
 src/lib/core/errinj.h                         |    1 +
 src/lua/utils.c                               |    4 +-
 src/lua/utils.h                               |    2 +
 test/box/access.result                        |   24 +-
 test/box/access.test.lua                      |   20 +-
 test/box/errinj.result                        |    1 +
 ...net.box_console_connections_gh-2677.result |    2 +-
 ...t.box_console_connections_gh-2677.test.lua |    2 +-
 .../net.box_discard_console_request.result    |   62 +
 .../net.box_discard_console_request.test.lua  |   19 +
 test/box/net.box_discard_gh-3107.result       |   11 +
 test/box/net.box_discard_gh-3107.test.lua     |    3 +
 .../net.box_incorrect_iterator_gh-841.result  |    9 +-
 ...net.box_incorrect_iterator_gh-841.test.lua |    9 +-
 test/box/net.box_iproto_hangs_gh-3464.result  |    2 +-
 .../box/net.box_iproto_hangs_gh-3464.test.lua |    2 +-
 .../net.box_long-poll_input_gh-3400.result    |   13 +-
 .../net.box_long-poll_input_gh-3400.test.lua  |    8 +-
 test/box/suite.ini                            |    2 +-
 21 files changed, 1735 insertions(+), 908 deletions(-)
 create mode 100644 test/box/net.box_discard_console_request.result
 create mode 100644 test/box/net.box_discard_console_request.test.lua

-- 
2.25.1


