[Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statements

Konstantin Osipov kostja.osipov at gmail.com
Mon Nov 11 21:35:15 MSK 2019


* Nikita Pettik <korablev at tarantool.org> [19/11/11 21:27]:
> Today Kirill came to me and said that there's a request from the solution
> team to make the cache global: according to them, they use many connections
> to execute the same set of queries. That makes a global cache
> reasonable. However, to allow executing the same prepared statement via
> different sessions, we should keep track of the original query, i.e. its hash
> value. So the new proposal is:

A back-of-the-envelope calculation: 40 nodes with 32 cores each,
100 connections on each instance and a 5 MB cache per connection
give you

40*32*100*5 MB = 640000 MB = 625 GB of prepared statement cache!
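The arithmetic can be checked with a few lines of Python. This is only a restatement of the estimate above; reading "32 cores" as 32 instances per node (one instance per core) is an assumption, and the result uses binary units (1 GB = 1024 MB).

```python
# Back-of-the-envelope size of per-session prepared statement caches.
# Figures come from the estimate above; treating "32 cores" as 32
# instances per node (one instance per core) is an assumption.
NODES = 40
INSTANCES_PER_NODE = 32
CONNECTIONS_PER_INSTANCE = 100
CACHE_MB_PER_SESSION = 5


def total_cache_mb() -> int:
    """Total cache memory if every session gets its own 5 MB cache."""
    return (NODES * INSTANCES_PER_NODE
            * CONNECTIONS_PER_INSTANCE * CACHE_MB_PER_SESSION)


def total_cache_gb() -> float:
    # Binary units (1 GB = 1024 MB) give the 625 GB figure.
    return total_cache_mb() / 1024


print(total_cache_mb(), "MB =", total_cache_gb(), "GB")
```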

Why does someone else have to point it out to the core team?

> - Each session holds a map <stmt_id : query_hash>
> - There's one global hash <query_hash : stmt>
> - Each statement now has a reference counter: each "prepare" call via a
>   different session bumps its value; each "unprepare" call decrements
>   it. Disconnecting a session decrements the counters of all
>   related statements. A statement is not deallocated immediately when its
>   counter reaches 0: we maintain a global list of statements to be freed
>   when the memory limit is reached
> - On a "prepare" call we first check if there's enough space for the
>   statement. If the memory limit is reached, we traverse the list of
>   statements to be freed and release all occupied resources. This would
>   allow us to avoid possible overhead on the disconnect event.
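A minimal model of the proposed scheme, to make the bookkeeping explicit. It is a sketch under stated assumptions, not the patch's implementation: the class and method names (`GlobalCache`, `Session`, `acquire`, `release`) are hypothetical, statement size is passed in by the caller, and hash collisions are ignored. It shows per-session `<stmt_id : query_hash>` maps, one global `<query_hash : stmt>` map, reference counts, and lazy freeing of unreferenced statements only when the memory limit is hit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stmt:
    sql: str
    size: int      # memory the compiled statement occupies (hypothetical)
    refs: int = 0  # number of sessions that currently hold it


class GlobalCache:
    """One global <query_hash : stmt> map shared by all sessions."""

    def __init__(self, mem_limit: int) -> None:
        self.mem_limit = mem_limit
        self.mem_used = 0
        self.stmts: Dict[int, Stmt] = {}
        self.to_free: List[int] = []  # hashes whose refs dropped to 0

    def _release_unused(self) -> None:
        # Runs only when the limit is reached, so disconnect stays cheap.
        for h in self.to_free:
            stmt = self.stmts.get(h)
            if stmt is not None and stmt.refs == 0:
                self.mem_used -= stmt.size
                del self.stmts[h]
        self.to_free.clear()

    def acquire(self, sql: str, size: int) -> int:
        h = hash(sql)  # hash collisions are ignored in this sketch
        stmt = self.stmts.get(h)
        if stmt is None:
            if self.mem_used + size > self.mem_limit:
                self._release_unused()
            if self.mem_used + size > self.mem_limit:
                raise MemoryError("prepared statement cache is full")
            stmt = self.stmts[h] = Stmt(sql, size)
            self.mem_used += size
        stmt.refs += 1
        return h

    def release(self, h: int) -> None:
        stmt = self.stmts[h]
        stmt.refs -= 1
        if stmt.refs == 0:
            self.to_free.append(h)  # not freed immediately


class Session:
    """Each session keeps its own <stmt_id : query_hash> map."""

    def __init__(self, cache: GlobalCache) -> None:
        self.cache = cache
        self.ids: Dict[int, int] = {}
        self.next_id = 1

    def prepare(self, sql: str, size: int) -> int:
        for stmt_id, h in self.ids.items():
            if self.cache.stmts[h].sql == sql:
                return stmt_id  # already prepared in this session
        stmt_id, self.next_id = self.next_id, self.next_id + 1
        self.ids[stmt_id] = self.cache.acquire(sql, size)
        return stmt_id

    def unprepare(self, stmt_id: int) -> None:
        self.cache.release(self.ids.pop(stmt_id))

    def disconnect(self) -> None:
        # Disconnect decrements the counter of every related statement.
        for h in self.ids.values():
            self.cache.release(h)
        self.ids.clear()
```

Even in this toy form, every session has to release what it acquired, and a statement stays pinned while any session still holds its numeric id.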

Here's how making the cache global relates to using string
identifiers:

If you make the cache opaque to the client you don't have
this mess with explicit referencing and dereferencing from
sessions.

Besides, as I have already mentioned several times, a single session
may hold multiple references to the same prepared statement, since
the protocol is fully asynchronous.

The cache has to be pure LRU, with automatic allocation and
expiration of objects. This will save CPU cycles on disconnect
as well as make the implementation simpler.

You can't do this if you have numeric identifiers, because you
must keep the object around as long as the identifier is around.
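For contrast, a sketch of an opaque, pure LRU cache keyed by the statement's SQL text. This assumes the string identifier is the SQL text itself, and `compile_fn` is a hypothetical stand-in for the SQL compiler returning `(stmt, size)`. Because no client holds a handle that pins an entry, eviction is always safe: an evicted statement is simply recompiled on its next use, and disconnect requires no work at all.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class LRUStmtCache:
    """Global prepared-statement cache that is opaque to clients.

    Statements are keyed by their SQL text (an assumption of this
    sketch), so nothing a client holds can pin an entry. `compile_fn`
    is hypothetical and returns (stmt, size_in_bytes).
    """

    def __init__(self, mem_limit: int,
                 compile_fn: Callable[[str], Tuple[object, int]]) -> None:
        self.mem_limit = mem_limit
        self.mem_used = 0
        self.compile_fn = compile_fn
        self.entries: "OrderedDict[str, Tuple[object, int]]" = OrderedDict()

    def get(self, sql: str) -> object:
        entry = self.entries.get(sql)
        if entry is not None:
            self.entries.move_to_end(sql)  # mark as most recently used
            return entry[0]
        stmt, size = self.compile_fn(sql)
        # Evict least recently used entries until the new one fits; an
        # entry larger than the whole limit is still cached on its own.
        while self.entries and self.mem_used + size > self.mem_limit:
            _, (_, old_size) = self.entries.popitem(last=False)
            self.mem_used -= old_size
        self.entries[sql] = (stmt, size)
        self.mem_used += size
        return stmt
```

Several in-flight requests from one session can use the same statement concurrently, since there is no per-session reference to take or drop.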

-- 
Konstantin Osipov, Moscow, Russia

