From: Vladimir Davydov via Tarantool-patches
Reply-To: Vladimir Davydov
To: tarantool-patches@dev.tarantool.org
Date: Fri, 23 Jul 2021 14:07:10 +0300
Subject: [Tarantool-patches] [PATCH 00/20] Rewrite performance critical parts of net.box in C
X-Mailer: git-send-email 2.25.1

https://github.com/tarantool/tarantool/tree/vdavydov/net-box-optimization

This patch set rewrites performance-critical parts of net.box (response
dispatching and IO loop) in C. It shouldn't introduce any user-visible
changes or changes in the logic, because the C version was basically
created by rewriting Lua code line-by-line.

The goal of this work is to improve performance of CPU-bound
applications that use net.box, such as vshard.router.

To check that the patch set meets this goal, I ran a simple benchmark
[tnt-bench.lua], which issues multiple concurrent requests in a loop.
The test measures requests per second (RPS) of wall time (WALL) and of
processor time (PROC). Concurrency is implemented with either fibers or
futures. There are a few test cases that issue different kinds of
requests:

 - REPLACE({k, 'bar', i + k})
 - UPDATE({k}, {{'=', 2, 'bar'}, {'=', 3, i + k}})
 - SELECT({k})
 - CALL('bench_func', {1, 2, 3, 'foo', 'bar'})

where i and k are integers and bench_func is defined as follows:

  function bench_func(...) return {...} end

The test was run on my laptop (i5-10210U 1.60GHz) for the following
Tarantool versions built with CMAKE_BUILD_TYPE=RelWithDebInfo:

 - master: 2.9.0-165-ga02cfe60cf23
 - patched: master + this patch set
 - poc: master + [tarantool-net-box-call-in-c.patch]

The latter is a proof-of-concept version that I created before starting
to work on this patch set.

The results are below.

                  /// USING FIBERS (SYNCHRONOUS) ///

---------+-----------------------------++-----------------------------+
         |       KRPS (WALL TIME)      ||       KRPS (PROC TIME)      |
         +---------+---------+---------++---------+---------+---------+
         |  master | patched |   poc   ||  master | patched |   poc   |
---------+---------+---------+---------++---------+---------+---------+
 REPLACE | 162.628 | 268.349 |   N/A   || 221.402 | 459.965 |   N/A   |
 UPDATE  | 126.905 | 195.835 |   N/A   || 173.635 | 334.609 |   N/A   |
 SELECT  | 187.742 | 353.043 |   N/A   || 207.605 | 427.147 |   N/A   |
 CALL    | 163.700 | 290.717 | 375.412 || 213.560 | 481.349 | 761.238 |

                 /// USING FUTURES (ASYNCHRONOUS) ///

---------+-----------------------------++-----------------------------+
         |       KRPS (WALL TIME)      ||       KRPS (PROC TIME)      |
         +---------+---------+---------++---------+---------+---------+
         |  master | patched |   poc   ||  master | patched |   poc   |
---------+---------+---------+---------++---------+---------+---------+
 REPLACE | 191.529 | 249.810 |   N/A   || 277.648 | 413.360 |   N/A   |
 UPDATE  | 155.116 | 173.850 |   N/A   || 231.603 | 273.624 |   N/A   |
 SELECT  | 238.657 | 286.699 |   N/A   || 269.040 | 333.706 |   N/A   |
 CALL    | 192.041 | 241.571 |   N/A   || 261.085 | 365.139 |   N/A   |

So the patch set increases RPS of synchronous net.box.call, which is the
primary method used by vshard.router, by about 75%. The other
synchronous methods improve by between 50% and 90%. Requests per
processor second roughly double with the patch, which means that it also
reduces CPU usage during the test: judging by the KRPS[WALL]/KRPS[PROC]
ratio, CPU usage drops from about 75% to about 60%.

Asynchronous calls don't show as much of an improvement as synchronous
ones, because for each asynchronous call we still have to create a
'future' object in Lua. Still, the improvement is quite noticeable:
30% for REPLACE, 10% for UPDATE, 20% for SELECT, 25% for CALL.

What is surprising is that the PoC version still outperforms the patched
version by about 30% and shows even lower CPU usage (50% vs 60%).
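
As a cross-check, the percentages quoted above can be re-derived from
the two tables. The Python sketch below is not part of the patch set or
of tnt-bench.lua; the helper names (gain_percent, cpu_usage_percent,
report) are made up for illustration, and it assumes, as the text does,
that CPU usage can be approximated by KRPS[WALL]/KRPS[PROC].

```python
# Illustrative only: re-derives the percentages quoted in the analysis
# from the benchmark tables. All helper names here are hypothetical.

# KRPS as (wall, proc) per build, copied from the tables.
SYNC = {
    "REPLACE": {"master": (162.628, 221.402), "patched": (268.349, 459.965)},
    "UPDATE":  {"master": (126.905, 173.635), "patched": (195.835, 334.609)},
    "SELECT":  {"master": (187.742, 207.605), "patched": (353.043, 427.147)},
    "CALL":    {"master": (163.700, 213.560), "patched": (290.717, 481.349),
                "poc": (375.412, 761.238)},
}
ASYNC = {
    "REPLACE": {"master": (191.529, 277.648), "patched": (249.810, 413.360)},
    "UPDATE":  {"master": (155.116, 231.603), "patched": (173.850, 273.624)},
    "SELECT":  {"master": (238.657, 269.040), "patched": (286.699, 333.706)},
    "CALL":    {"master": (192.041, 261.085), "patched": (241.571, 365.139)},
}


def gain_percent(before, after):
    """Relative throughput improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


def cpu_usage_percent(wall_krps, proc_krps):
    """CPU usage estimate: requests per wall second over requests per CPU second."""
    return wall_krps / proc_krps * 100.0


def report(title, table):
    """Print master -> patched gains and CPU usage estimates for each method."""
    print(title)
    for method, builds in table.items():
        m_wall, m_proc = builds["master"]
        p_wall, p_proc = builds["patched"]
        print(f"  {method:8s} wall {gain_percent(m_wall, p_wall):+6.1f}%  "
              f"proc {gain_percent(m_proc, p_proc):+6.1f}%  "
              f"cpu {cpu_usage_percent(m_wall, m_proc):.0f}% -> "
              f"{cpu_usage_percent(p_wall, p_proc):.0f}%")


if __name__ == "__main__":
    report("fibers (synchronous)", SYNC)
    report("futures (asynchronous)", ASYNC)
    # PoC is only available for synchronous CALL.
    poc_wall, poc_proc = SYNC["CALL"]["poc"]
    patched_wall, _ = SYNC["CALL"]["patched"]
    print(f"poc vs patched (sync CALL): "
          f"wall {gain_percent(patched_wall, poc_wall):+.1f}%, "
          f"cpu {cpu_usage_percent(poc_wall, poc_proc):.0f}%")
```

The figures in the text are these values rounded to convenient numbers;
for example, synchronous CALL comes out at roughly 78% and asynchronous
UPDATE at roughly 12%.
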
The PoC's remaining lead is probably caused by the IO loop
implementation. I'm going to look into that separately.

Links:

[tnt-bench.lua] https://gist.github.com/locker/7faeb39129a2421a85568c512288208f
[tarantool-net-box-call-in-c.patch] https://gist.github.com/locker/cd357f9482bfd207ffe7df610c4b2fba

For more information about net.box performance, see

 - C/C++ vs Net.Box Connector Performance
   https://docs.google.com/document/d/1v-d-qQ9zilOdDgDJZWTzs0cSJ9XVXLWRfoQnxNfYttc
 - vshard.router.call performance analysis
   https://docs.google.com/document/d/1VwMzs75Umi5IhFw-r54wj0b8s_d9WDCYFZ3lRMFzfB8

Vladimir Davydov (20):
  net.box: fix console connection breakage when request is discarded
  net.box: wake up wait_result callers when request is discarded
  net.box: do not check worker_fiber in request:result,is_ready
  net.box: remove decode_push from method_decoder table
  net.box: use decode_tuple instead of decode_get
  net.box: rename request.ctx to request.format
  net.box: use integer id instead of method name
  net.box: remove useless encode optimization
  net.box: rewrite request encoder in C
  lua/utils: make char ptr Lua CTIDs public
  net.box: rewrite response decoder in C
  net.box: rewrite error decoder in C
  net.box: rewrite send_and_recv_{iproto,console} in C
  net.box: rename netbox_{prepare,encode}_request to {begin,end}
  net.box: rewrite request implementation in C
  net.box: store next_request_id in C code
  net.box: rewrite console handlers in C
  net.box: rewrite iproto handlers in C
  net.box: merge new_id, new_request and encode_method
  net.box: do not create request object in Lua for sync requests

 src/box/lua/net_box.c                         | 1714 ++++++++++++++---
 src/box/lua/net_box.lua                       |  733 ++-----
 src/lib/core/errinj.h                         |    1 +
 src/lua/utils.c                               |    4 +-
 src/lua/utils.h                               |    2 +
 test/box/access.result                        |   24 +-
 test/box/access.test.lua                      |   20 +-
 test/box/errinj.result                        |    1 +
 ...net.box_console_connections_gh-2677.result |    2 +-
 ...t.box_console_connections_gh-2677.test.lua |    2 +-
 .../net.box_discard_console_request.result    |   62 +
 .../net.box_discard_console_request.test.lua  |   19 +
 test/box/net.box_discard_gh-3107.result       |   11 +
 test/box/net.box_discard_gh-3107.test.lua     |    3 +
 .../net.box_incorrect_iterator_gh-841.result  |    9 +-
 ...net.box_incorrect_iterator_gh-841.test.lua |    9 +-
 test/box/net.box_iproto_hangs_gh-3464.result  |    2 +-
 .../box/net.box_iproto_hangs_gh-3464.test.lua |    2 +-
 .../net.box_long-poll_input_gh-3400.result    |   13 +-
 .../net.box_long-poll_input_gh-3400.test.lua  |    8 +-
 test/box/suite.ini                            |    2 +-
 21 files changed, 1735 insertions(+), 908 deletions(-)
 create mode 100644 test/box/net.box_discard_console_request.result
 create mode 100644 test/box/net.box_discard_console_request.test.lua

-- 
2.25.1