Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Georgy Kirichenko <georgy@tarantool.org>
To: Konstantin Osipov <kostja.osipov@gmail.com>
Cc: v.shpilevoy@tarantool.org, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets
Date: Tue, 12 Nov 2019 12:30:38 +0300	[thread overview]
Message-ID: <38384932.slqLeNPVtO@localhost> (raw)
In-Reply-To: <20191112085052.GD25103@atlas>

[-- Attachment #1: Type: text/plain, Size: 8712 bytes --]

On Tuesday, November 12, 2019 11:50:52 AM MSK Konstantin Osipov wrote:
> * Georgy Kirichenko <georgy@tarantool.org> [19/11/12 11:03]:
> > > > value. So the new proposal is:
> > > A back of the envelope calculation shows that if you have 40
> > > nodes, 32 cores, 100 connections on each instance, it gives you
> > > 
> > > 40*32*100*5mb = 625 GB of prepared statement cache!
> > 
> > Why did you make this fake up?
> > 1. why did you pack instances so dense?
> 
> What you should have done is looked at the worst case.
> 
> The case above is not the worst. The desnity doesn't matter much
> - halve it and you get 300GB, even 50G for a cache on 40 nodes
> is too much.
> 
> > 2. why did you use 5mb instead of 1-2-5kb, for instance.
> 
> That's the default cache size. And it's a small size, too, a
> single compiled AST for a SELECT can easily take 16-64kb (measure
> it!), so it's a cache for ~100-200 statements.
I do not talk about cache size. Which cache, why cache? I consider only a 
prepared statement resource footprint and count of prepared statements on a 
single instance. There is no cache yet.
> 
> > 4. why did you count connections because the only thing is matter is count
> > of unique sql statements.
> 
> With a per-session cache, you need to count connections. With a
> global cache the numbers are not bad.
We will have more than one session per connection, please keep it in mind. All 
statements in a session would execute sequentially otherwise there is no 
transaction consistency in a session boundaries. 
> 
> > 4. why did you use the word `cache`.
> > Prepared statements and statement cache are the completely different
> > substances. Obviously, if somebody prepared a statement then they wish the
> > statement to do not be evicted from the prepared statements list.
> 
> This is some kind of 1980's myth. 
And it does not mean that this approach is bad.
> The reason everybody does it
> this way is that virtually all databases that support prepared
> statements are vertically scalable process-per-connection.
> 
> Otherwise, the approach has no benefits. In particular, there is
> no value in making the user explicitly deallocate a prepared
> statement. It's a pain to not be able to re-use a prepared
> statement handle across connections.
Your approach suggested there is no more prepared statements because there is 
no difference between prepared and unprepared except its argument marshaling 
which could be done using client library. Because simple statement also should 
be put into a statement cache. And if I prepared a statement I definitely want 
my prepared statement still persist without any dependencies how many other 
statement were executed.  
Anyway, in case on reconnect a session should be reestablished so there is no 
big deal to reprepare all statement. And it is no a big deal to iterate over 
session prepared statement maps to decrement all corresponding statement 
reference counts.
Also my approach allows to reuse a statement prepared by other session because 
they both would share the prepared statement reference (sql source hash).
Additionally, it is a very strange opinion that a client should be able to 
reuse prepared statement event in cases when a server crashes or a client 
initiates a connection to another server or event cluster. Otherwise there is 
no difference between prepared statements and simple unprepared statements.
> 
> There are some risks in having a global transparent cache, too:
> i.e. one has to watch out for security issues, if the same
> statement can be reused across connections with different
> credentials. But these risks do not materialize in case of
> tarantool.
Wrong, please keep in mind multisession connection, vshard and proxy.
Also a prepared statement sql could contain session-local variables. 
> 
> > > If you make the cache opaque to the client you don't have
> > > this mess with explicit referencing and dereferencing from
> > > sessions.
> > 
> > User should explicitly unprepare any statement they prepared before. Or it
> > would be done by on_session_stop trigger. As we do not want to evict a
> > prepared statement from prepared statement list, really.
> 
> There is no reason for this, except copying someone else's design.
> You're just making users do more work, and doing more work on the
> server side on disconnect.
Amount of work I suggest to do is much less in comparison with a good 
statement cache (simple LRU is completely not enough), statement repreparation 
if it is unknown or evicted, cache statistic, global cache sharing and access 
checking. My approach is straightforward and predictable.
> 
> > Yes, and it is the feature requirement - if a statement was prepared then
> > it should be prepared until user did not request the statement unprepare.
> There is no such requirement.
> 
> > So I would like to explain my own vision of the feature:
> > 1. Each instance has its own prepared statements table from which a
> > statement vm could be fetched using the statement key (sql source string
> > hash, for instance). Also the table may contain some meta: list of
> > accessed objects (to prevent us from access checking on each invocation),
> > count of invocation, sql source, reference counter.
> > 2. Each session has a limit which limits the amount of resources, count of
> > prepared statement for instance.
> 
> So users have double trouble: not only they need to explicitly
> deallocate prepare, they also need to be able to handle an
> out-of-resources error, such as
> out-of-prepared-statements-cache-memory.
Yes, and it is completely Ok.
> 
> This "vision" happens to copy all the bugs of Oracle.
You compare my approach with the most popular business database, thanks a lot.
> 
> - connect/disconnect is slow.
Please estimate costs of disconnect in terms of network pings, sql planning 
and wal latency (the only thing the server should do is to decrement around 
100-200 integers)
> - users have to do more work
Please estimate amount of work in comparison with users business logic 
implementation
> - there is a new kind of error - out of statement cache - which
I would like repeat: we do not talk about statement cache.
>   one has to be prepared for. There is no way to work this error
>   around on application side
> - the server side still has to handle concurrent execution
>   of the same prepared statement, the protocol allows it today,
>   so the implementaiton either has to ban it or create a copy on
>   the fly (i.e. implicitly prepare another vdbe anyway).
I do not see how did you make it up. Any prepared statement implementation 
should be ably to handle many prepared statement executions in parallel.
The other thing my approach suggests: vdbe should be able executed in 
parallel, but this limitation is applicable anyway.
> 
> The only benefit of explicit allocate/deallocate is that there is
> some kind of guarantee that the statement is not going to get
> parsed again when it is executed. This is very little value: I'm
> not aware of cases when this is needed, and one can create a view instead.
Please, make a proof.
> 
> > 3. Each session contain the list of prepared statement identifiers used by
> > this session.
> > 4. Statement cache could be used by the feature but it is not essential.
> > At
> > least it relaxes the feature implementation complexity - we should not
> > implement a good statement cache, eviction politics and other stuff like
> > this right now.
> > 5. If a client loses their session (crash, network failure) then it should
> > reestablish a session and reprepare all statement it wish to use. It is
> > more simply to implement in comparison with any caching, eviction,
> > tracking and other things you suggested.
> 
> No, it's not simple to implement, certainly not simpler than a
> global cache with its own opaque eviction strategy. The simplest
> implementation of the global cache is not caching anything at all
> and preparing before every execute - which is good enough for JDBC
> and ODBC.
Well, so you suggested not to implement prepared statements.
> 
> > 6. Right now a connection could contain only one session and it is not a
> > stop- factor - the only thing we should keep in mind - prepared statement
> > relates to a session not to a connection. Then, when we will implement
> > streams and multiple sessions per connection, we should not do any more
> > changes to support prepared statements in this circumstances.
> 
> If you suggest that a single session can execute only one
> statement concurrently, you should be explicit about it - and it's
> a huge limitation.
Please make a distinction between connection and session, we already discussed 
it some month before.


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

next prev parent reply	other threads:[~2019-11-12  9:30 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07  1:04 [Tarantool-patches] [PATCH 00/15] sql: prepared statements Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 01/15] sql: remove sql_prepare_v2() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 02/15] sql: refactor sql_prepare() and sqlPrepare() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 03/15] sql: move sql_prepare() declaration to box/execute.h Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 04/15] sql: rename sqlPrepare() to sql_compile() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 05/15] sql: move sql_finalize() to execute.h Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 06/15] port: increase padding of struct port Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 07/15] port: add dump format and request type to port_sql Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 08/15] sql: resurrect sql_bind_parameter_count() function Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 09/15] sql: resurrect sql_bind_parameter_name() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 10/15] sql: add sql_schema_version() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 11/15] sql: introduce sql_stmt_sizeof() function Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 12/15] box: increment schema_version on ddl operations Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets Nikita Pettik
2019-11-10 23:40   ` Konstantin Osipov
2019-11-11 10:53     ` Nikita Pettik
2019-11-11 18:35       ` Konstantin Osipov
2019-11-12  7:54         ` Georgy Kirichenko
2019-11-12  8:50           ` Konstantin Osipov
2019-11-12  9:30             ` Georgy Kirichenko [this message]
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 14/15] box: introduce prepared statements Nikita Pettik
2019-11-10 23:42   ` Konstantin Osipov
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 15/15] netbox: " Nikita Pettik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=38384932.slqLeNPVtO@localhost \
    --to=georgy@tarantool.org \
    --cc=kostja.osipov@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox