Tarantool development patches archive
 help / color / mirror / Atom feed
From: Georgy Kirichenko <georgy@tarantool.org>
To: tarantool-patches@dev.tarantool.org
Cc: v.shpilevoy@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets
Date: Tue, 12 Nov 2019 10:54:54 +0300	[thread overview]
Message-ID: <4071647.IRq02QQX9c@localhost> (raw)
In-Reply-To: <20191111183515.GA25103@atlas>

[-- Attachment #1: Type: text/plain, Size: 4924 bytes --]

On Monday, November 11, 2019 9:35:15 PM MSK Konstantin Osipov wrote:
> * Nikita Pettik <korablev@tarantool.org> [19/11/11 21:27]:
> > Today Kirill came to me and said that there's request from solution
> > team to make cache global: according to them they use many connections
> > to execute the same set of queries. That turns out global cache to be
> > reasonable. However, to allow execute the same prepared statement via
> > different sessions we should keep trace of original query - its hash
> 
> > value. So the new proposal is:
> A back of the envelope calculation shows that if you have 40
> nodes, 32 cores, 100 connections on each instance, it gives you
> 
> 40*32*100*5mb = 625 GB of prepared statement cache!
Why did you make this fake up?
1. why did you pack instances so dense?
2. why did you use 5mb instead of 1-2-5kb, for instance.
4. why did you count connections because the only thing is matter is count of 
unique sql statements.
4. why did you use the word `cache`.
Prepared statements and statement cache are the completely different 
substances. Obviously, if somebody prepared a statement then they wish the 
statement to do not be evicted from the prepared statements list.
> 
> Why does someone else have to point it out to the core team?
Seriously?
> 
> > - Each session holds map <stmt_id : query_hash>
> > - There's one global hash <query_hash : stmt>
> > - Each statement now has reference counter: each "prepare" call via
> > 
> >   different session bumps its value; each "unprepare" call results in
> >   its decrement. Disconect of session leads to counter decrements of all
> >   related statements. Statement is not immediately deallocated if its
> >   counter is 0: we maintaint global list of statements to be freed when
> >   memory limit is reached
> > 
> > - On "prepare" call we firstly check if there's enough space for
> > statement.
> > 
> >   If memory limit has reached, we traverse list of statements to be freed
> >   and release all occupied resources. It would allow us to solve possible
> >   overhead on disconnect event.
> 
> Here's how having cache global is connected to using string
> identifiers:
> 
> If you make the cache opaque to the client you don't have
> this mess with explicit referencing and dereferencing from
> sessions.
User should explicitly unprepare any statement they prepared before. Or it 
would be done by on_session_stop trigger. As we do not want to evict a 
prepared statement from prepared statement list, really.
> 
> Besides, as I already mentioned multiple times, a single session
> may use multiple references to the same prepared statement - since
> the protocol is fully asynchronous.
Wrong. The connection could but, please, do not forget about streams (which 
would encapsulate a single user session) we want to introduce in the near 
future.
> 
> The cache has to be pure LRU, with automatic allocation and
> expiration of objects. This will save CPU cycles on disconnect
> as well as make the implementation simpler.
One more time: we do not talk about statement cache but we discuss the 
prepared statement.
In any case we should prefer MPI instead of LRU and the discuss hot/cold zones 
politics. 
> 
> You can't do this if you have numeric identifiers, because you
> must keep the object around as long as the identifier is around.
Yes, and it is the feature requirement - if a statement was prepared then it 
should be prepared until user did not request the statement unprepare.

So I would like to explain my own vision of the feature:
1. Each instance has its own prepared statements table from which a statement 
vm could be fetched using the statement key (sql source string hash, for 
instance). Also the table may contain some meta: list of accessed objects (to 
prevent us from access checking on each invocation), count of invocation, sql 
source, reference counter.
2. Each session has a limit which limits the amount of resources, count of 
prepared statement for instance.
3. Each session contain the list of prepared statement identifiers used by this 
session.
4. Statement cache could be used by the feature but it is not essential. At 
least it relaxes the feature implementation complexity - we should not 
implement a good statement cache, eviction politics and other stuff like this 
right now.
5. If a client loses their session (crash, network failure) then it should 
reestablish a session and reprepare all statement it wish to use. It is more 
simply to implement in comparison with any caching, eviction, tracking and 
other things you suggested.
6. Right now a connection could contain only one session and it is not a stop-
factor - the only thing we should keep in mind - prepared statement relates to 
a session not to a connection. Then, when we will implement streams and 
multiple sessions per connection, we should not do any more changes to support 
prepared statements in this circumstances.

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-11-12  7:55 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07  1:04 [Tarantool-patches] [PATCH 00/15] sql: prepared statements Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 01/15] sql: remove sql_prepare_v2() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 02/15] sql: refactor sql_prepare() and sqlPrepare() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 03/15] sql: move sql_prepare() declaration to box/execute.h Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 04/15] sql: rename sqlPrepare() to sql_compile() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 05/15] sql: move sql_finalize() to execute.h Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 06/15] port: increase padding of struct port Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 07/15] port: add dump format and request type to port_sql Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 08/15] sql: resurrect sql_bind_parameter_count() function Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 09/15] sql: resurrect sql_bind_parameter_name() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 10/15] sql: add sql_schema_version() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 11/15] sql: introduce sql_stmt_sizeof() function Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 12/15] box: increment schema_version on ddl operations Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets Nikita Pettik
2019-11-10 23:40   ` Konstantin Osipov
2019-11-11 10:53     ` Nikita Pettik
2019-11-11 18:35       ` Konstantin Osipov
2019-11-12  7:54         ` Georgy Kirichenko [this message]
2019-11-12  8:50           ` Konstantin Osipov
2019-11-12  9:30             ` Georgy Kirichenko
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 14/15] box: introduce prepared statements Nikita Pettik
2019-11-10 23:42   ` Konstantin Osipov
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 15/15] netbox: " Nikita Pettik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4071647.IRq02QQX9c@localhost \
    --to=georgy@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox