From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id EC0C142F4C3 for ; Mon, 11 Nov 2019 13:53:57 +0300 (MSK) Date: Mon, 11 Nov 2019 13:53:55 +0300 From: Nikita Pettik Message-ID: <20191111105355.GC82024@tarantool.org> References: <20191107010455.64457-1-korablev@tarantool.org> <20191107010455.64457-14-korablev@tarantool.org> <20191110234029.GA15733@atlas> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20191110234029.GA15733@atlas> Subject: Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Konstantin Osipov , tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org On 11 Nov 02:40, Konstantin Osipov wrote: > * Nikita Pettik [19/11/07 04:33]: > > This patch introduces cache (as data structure) to handle prepared > > statements and a set of interface functions (insert, delete, find, > > erase) to operate on it. Cache under the hood is hash table with integer > > ids as keys and prepared statements (struct sql_stmt which is an alias > > for struct Vdbe) as values. Size of cache is regulated by quota via > > box.cfg{sql_cache_size} parameter. Cache is supposed to be attached to > > session, which means it is destroyed when session is ended. To erase > > session manually, there's special handle - session's method > > sql_cache_erase(). Default cache size is assumed to be 5 Mb (like in > > PosgreSQL). > > I admire the depth of architecture vision here - let's just do what postgresql does!! It's just default value, it can be changed manually to whatever value user wants. Feel free to suggest better option. > If tarantool connections are going to be as slow as postgresql > connections (I can imagine iterating over vdbe objects in a > single-threaded environment to free them up on disconnect, > especially in a thundering herd way when a bunch of connections is > established or dropped), why use tarantool at all? > > The whole idea of using string identifiers was that the cache is > global and there is zero overhead on connect or disconnect. How kind of identifiers is related to cache locality? I don't understand how global cache could help us to solve problem of cleaning up VM objects on disconect (on connect there's no overhead anyway). Today Kirill came to me and said that there's request from solution team to make cache global: according to them they use many connections to execute the same set of queries. That turns out global cache to be reasonable. However, to allow execute the same prepared statement via different sessions we should keep trace of original query - its hash value. So the new proposal is: - Each session holds map - There's one global hash - Each statement now has reference counter: each "prepare" call via different session bumps its value; each "unprepare" call results in its decrement. Disconect of session leads to counter decrements of all related statements. Statement is not immediately deallocated if its counter is 0: we maintaint global list of statements to be freed when memory limit is reached - On "prepare" call we firstly check if there's enough space for statement. If memory limit has reached, we traverse list of statements to be freed and release all occupied resources. It would allow us to solve possible overhead on disconnect event. I'll update RFC according to this proposal if everyone is OK with it.