Tarantool development patches archive
 help / color / mirror / Atom feed
From: Konstantin Osipov <kostja.osipov@gmail.com>
To: Georgy Kirichenko <georgy@tarantool.org>
Cc: v.shpilevoy@tarantool.org, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets
Date: Tue, 12 Nov 2019 11:50:52 +0300	[thread overview]
Message-ID: <20191112085052.GD25103@atlas> (raw)
In-Reply-To: <4071647.IRq02QQX9c@localhost>

* Georgy Kirichenko <georgy@tarantool.org> [19/11/12 11:03]:
> > > value. So the new proposal is:
> > A back of the envelope calculation shows that if you have 40
> > nodes, 32 cores, 100 connections on each instance, it gives you
> > 
> > 40*32*100*5mb = 625 GB of prepared statement cache!
> Why did you make this fake up?
> 1. why did you pack instances so dense?

What you should have done is looked at the worst case. 

The case above is not the worst. The desnity doesn't matter much 
- halve it and you get 300GB, even 50G for a cache on 40 nodes 
is too much.

> 2. why did you use 5mb instead of 1-2-5kb, for instance.

That's the default cache size. And it's a small size, too, a
single compiled AST for a SELECT can easily take 16-64kb (measure
it!), so it's a cache for ~100-200 statements.

> 4. why did you count connections because the only thing is matter is count of 
> unique sql statements.

With a per-session cache, you need to count connections. With a
global cache the numbers are not bad.

> 4. why did you use the word `cache`.
> Prepared statements and statement cache are the completely different 
> substances. Obviously, if somebody prepared a statement then they wish the 
> statement to do not be evicted from the prepared statements list.

This is some kind of 1980's myth. The reason everybody does it
this way is that virtually all databases that support prepared
statements are vertically scalable process-per-connection.

Otherwise, the approach has no benefits. In particular, there is
no value in making the user explicitly deallocate a prepared
statement. It's a pain to not be able to re-use a prepared
statement handle across connections. 

There are some risks in having a global transparent cache, too:
i.e. one has to watch out for security issues, if the same
statement can be reused across connections with different
credentials. But these risks do not materialize in case of
tarantool.

> > 
> > If you make the cache opaque to the client you don't have
> > this mess with explicit referencing and dereferencing from
> > sessions.
> User should explicitly unprepare any statement they prepared before. Or it 
> would be done by on_session_stop trigger. As we do not want to evict a 
> prepared statement from prepared statement list, really.

There is no reason for this, except copying someone else's design.
You're just making users do more work, and doing more work on the
server side on disconnect.

> Yes, and it is the feature requirement - if a statement was prepared then it 
> should be prepared until user did not request the statement unprepare.

There is no such requirement. 

> So I would like to explain my own vision of the feature:
> 1. Each instance has its own prepared statements table from which a statement 
> vm could be fetched using the statement key (sql source string hash, for 
> instance). Also the table may contain some meta: list of accessed objects (to 
> prevent us from access checking on each invocation), count of invocation, sql 
> source, reference counter.
> 2. Each session has a limit which limits the amount of resources, count of 
> prepared statement for instance.

So users have double trouble: not only they need to explicitly
deallocate prepare, they also need to be able to handle an
out-of-resources error, such as
out-of-prepared-statements-cache-memory. 

This "vision" happens to copy all the bugs of Oracle. 

- connect/disconnect is slow. 
- users have to do more work
- there is a new kind of error - out of statement cache - which
  one has to be prepared for. There is no way to work this error
  around on application side
- the server side still has to handle concurrent execution 
  of the same prepared statement, the protocol allows it today,
  so the implementaiton either has to ban it or create a copy on
  the fly (i.e. implicitly prepare another vdbe anyway).

The only benefit of explicit allocate/deallocate is that there is
some kind of guarantee that the statement is not going to get
parsed again when it is executed. This is very little value: I'm
not aware of cases when this is needed, and one can create a view instead. 

> 3. Each session contain the list of prepared statement identifiers used by this 
> session.
> 4. Statement cache could be used by the feature but it is not essential. At 
> least it relaxes the feature implementation complexity - we should not 
> implement a good statement cache, eviction politics and other stuff like this 
> right now.
> 5. If a client loses their session (crash, network failure) then it should 
> reestablish a session and reprepare all statement it wish to use. It is more 
> simply to implement in comparison with any caching, eviction, tracking and 
> other things you suggested.

No, it's not simple to implement, certainly not simpler than a
global cache with its own opaque eviction strategy. The simplest
implementation of the global cache is not caching anything at all
and preparing before every execute - which is good enough for JDBC
and ODBC.

> 6. Right now a connection could contain only one session and it is not a stop-
> factor - the only thing we should keep in mind - prepared statement relates to 
> a session not to a connection. Then, when we will implement streams and 
> multiple sessions per connection, we should not do any more changes to support 
> prepared statements in this circumstances.

If you suggest that a single session can execute only one
statement concurrently, you should be explicit about it - and it's
a huge limitation.


-- 
Konstantin Osipov, Moscow, Russia

  reply	other threads:[~2019-11-12  8:50 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07  1:04 [Tarantool-patches] [PATCH 00/15] sql: prepared statements Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 01/15] sql: remove sql_prepare_v2() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 02/15] sql: refactor sql_prepare() and sqlPrepare() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 03/15] sql: move sql_prepare() declaration to box/execute.h Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 04/15] sql: rename sqlPrepare() to sql_compile() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 05/15] sql: move sql_finalize() to execute.h Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 06/15] port: increase padding of struct port Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 07/15] port: add dump format and request type to port_sql Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 08/15] sql: resurrect sql_bind_parameter_count() function Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 09/15] sql: resurrect sql_bind_parameter_name() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 10/15] sql: add sql_schema_version() Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 11/15] sql: introduce sql_stmt_sizeof() function Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 12/15] box: increment schema_version on ddl operations Nikita Pettik
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets Nikita Pettik
2019-11-10 23:40   ` Konstantin Osipov
2019-11-11 10:53     ` Nikita Pettik
2019-11-11 18:35       ` Konstantin Osipov
2019-11-12  7:54         ` Georgy Kirichenko
2019-11-12  8:50           ` Konstantin Osipov [this message]
2019-11-12  9:30             ` Georgy Kirichenko
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 14/15] box: introduce prepared statements Nikita Pettik
2019-11-10 23:42   ` Konstantin Osipov
2019-11-07  1:04 ` [Tarantool-patches] [PATCH 15/15] netbox: " Nikita Pettik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191112085052.GD25103@atlas \
    --to=kostja.osipov@gmail.com \
    --cc=georgy@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox