From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by dev.tarantool.org (Postfix) with ESMTPS id 8161942F4C7
    for ; Mon, 11 Nov 2019 21:35:18 +0300 (MSK)
Received: by mail-wm1-f65.google.com with SMTP id z26so359743wmi.4
    for ; Mon, 11 Nov 2019 10:35:18 -0800 (PST)
Date: Mon, 11 Nov 2019 21:35:15 +0300
From: Konstantin Osipov
Message-ID: <20191111183515.GA25103@atlas>
References: <20191107010455.64457-1-korablev@tarantool.org>
    <20191107010455.64457-14-korablev@tarantool.org>
    <20191110234029.GA15733@atlas>
    <20191111105355.GC82024@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20191111105355.GC82024@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Nikita Pettik
Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org

* Nikita Pettik [19/11/11 21:27]:
> Today Kirill came to me and said that there's request from solution
> team to make cache global: according to them they use many connections
> to execute the same set of queries. That turns out global cache to be
> reasonable. However, to allow execute the same prepared statement via
> different sessions we should keep trace of original query - its hash
> value. So the new proposal is:

A back-of-the-envelope calculation shows that with 40 nodes, 32 cores
each, and 100 connections on each instance, a per-session cache gives
you 40*32*100*5 MB = 625 GB of prepared statement cache! Why does
someone else have to point it out to the core team?

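To double-check the figure above, here is a minimal sketch that multiplies
it out. The variable names are illustrative, and reading "5 MB" as the
per-session cache size with 1 GB = 1024 MB is an assumption; it is the
reading under which the product comes out to exactly 625.

```python
# Back-of-the-envelope estimate of total per-session statement cache.
# All names are illustrative; the numbers are the ones quoted above.
nodes = 40
instances_per_node = 32          # assumed: one instance per core
connections_per_instance = 100
cache_mb_per_session = 5         # assumed: per-session cache size, in MB

total_mb = (nodes * instances_per_node
            * connections_per_instance * cache_mb_per_session)
total_gb = total_mb / 1024       # assumed: binary units, 1 GB = 1024 MB

print(f"{total_mb} MB = {total_gb:.0f} GB")
```
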
> - Each session holds map
> - There's one global hash
> - Each statement now has reference counter: each "prepare" call via
>   different session bumps its value; each "unprepare" call results in
>   its decrement. Disconnect of session leads to counter decrements of all
>   related statements. Statement is not immediately deallocated if its
>   counter is 0: we maintain global list of statements to be freed when
>   memory limit is reached
> - On "prepare" call we firstly check if there's enough space for statement.
>   If memory limit has reached, we traverse list of statements to be freed
>   and release all occupied resources. It would allow us to solve possible
>   overhead on disconnect event.

Here's how making the cache global is connected to using string
identifiers: if you make the cache opaque to the client, you don't
have this mess with explicit referencing and dereferencing from
sessions. Besides, as I already mentioned multiple times, a single
session may hold multiple references to the same prepared statement,
since the protocol is fully asynchronous.

The cache has to be pure LRU, with automatic allocation and
expiration of objects. This will save CPU cycles on disconnect as
well as make the implementation simpler. You can't do this if you
have numeric identifiers, because you must keep the object around as
long as the identifier is around.

-- 
Konstantin Osipov, Moscow, Russia
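
To illustrate the pure-LRU point above, here is a minimal sketch of a
cache keyed by the query text itself, with a byte budget and automatic
eviction. It is not Tarantool code: StatementCache, compile_fn and
fake_compile are hypothetical names, and the "size" of a statement is
faked as the length of its text. The point is only that sessions hold no
references, so nothing has to be done on disconnect.

```python
from collections import OrderedDict


class StatementCache:
    """LRU cache of compiled statements keyed by query text (hypothetical)."""

    def __init__(self, max_bytes, compile_fn):
        self.max_bytes = max_bytes
        self.compile_fn = compile_fn   # hypothetical: text -> (stmt, size)
        self.used_bytes = 0
        self.entries = OrderedDict()   # query text -> (stmt, size), oldest first

    def get(self, query):
        """Return the compiled statement, compiling and caching on a miss."""
        if query in self.entries:
            self.entries.move_to_end(query)    # mark as most recently used
            return self.entries[query][0]
        stmt, size = self.compile_fn(query)
        # Evict least recently used entries until the new one fits.
        while self.entries and self.used_bytes + size > self.max_bytes:
            _, (_, old_size) = self.entries.popitem(last=False)
            self.used_bytes -= old_size
        # A statement larger than the whole budget is returned uncached.
        if size <= self.max_bytes:
            self.entries[query] = (stmt, size)
            self.used_bytes += size
        return stmt


compiled = []                          # records every real compilation


def fake_compile(query):
    """Stand-in compiler: the 'size' is just the length of the text."""
    compiled.append(query)
    return ("plan:" + query, len(query))


cache = StatementCache(max_bytes=20, compile_fn=fake_compile)
cache.get("SELECT 1")   # miss
cache.get("SELECT 2")   # miss
cache.get("SELECT 1")   # hit, no recompilation
cache.get("SELECT 3")   # miss, evicts "SELECT 2" (least recently used)
```

A client that asks for an evicted statement simply pays for one more
compilation; with numeric identifiers the same eviction would leave a
dangling id behind.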