From: Konstantin Osipov <kostja.osipov@gmail.com> To: Georgy Kirichenko <georgy@tarantool.org> Cc: v.shpilevoy@tarantool.org, tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets Date: Tue, 12 Nov 2019 11:50:52 +0300 [thread overview] Message-ID: <20191112085052.GD25103@atlas> (raw) In-Reply-To: <4071647.IRq02QQX9c@localhost> * Georgy Kirichenko <georgy@tarantool.org> [19/11/12 11:03]: > > > value. So the new proposal is: > > A back of the envelope calculation shows that if you have 40 > > nodes, 32 cores, 100 connections on each instance, it gives you > > > > 40*32*100*5mb = 625 GB of prepared statement cache! > Why did you make this fake up? > 1. why did you pack instances so dense? What you should have done is looked at the worst case. The case above is not the worst. The desnity doesn't matter much - halve it and you get 300GB, even 50G for a cache on 40 nodes is too much. > 2. why did you use 5mb instead of 1-2-5kb, for instance. That's the default cache size. And it's a small size, too, a single compiled AST for a SELECT can easily take 16-64kb (measure it!), so it's a cache for ~100-200 statements. > 4. why did you count connections because the only thing is matter is count of > unique sql statements. With a per-session cache, you need to count connections. With a global cache the numbers are not bad. > 4. why did you use the word `cache`. > Prepared statements and statement cache are the completely different > substances. Obviously, if somebody prepared a statement then they wish the > statement to do not be evicted from the prepared statements list. This is some kind of 1980's myth. The reason everybody does it this way is that virtually all databases that support prepared statements are vertically scalable process-per-connection. Otherwise, the approach has no benefits. In particular, there is no value in making the user explicitly deallocate a prepared statement. It's a pain to not be able to re-use a prepared statement handle across connections. There are some risks in having a global transparent cache, too: i.e. one has to watch out for security issues, if the same statement can be reused across connections with different credentials. But these risks do not materialize in case of tarantool. > > > > If you make the cache opaque to the client you don't have > > this mess with explicit referencing and dereferencing from > > sessions. > User should explicitly unprepare any statement they prepared before. Or it > would be done by on_session_stop trigger. As we do not want to evict a > prepared statement from prepared statement list, really. There is no reason for this, except copying someone else's design. You're just making users do more work, and doing more work on the server side on disconnect. > Yes, and it is the feature requirement - if a statement was prepared then it > should be prepared until user did not request the statement unprepare. There is no such requirement. > So I would like to explain my own vision of the feature: > 1. Each instance has its own prepared statements table from which a statement > vm could be fetched using the statement key (sql source string hash, for > instance). Also the table may contain some meta: list of accessed objects (to > prevent us from access checking on each invocation), count of invocation, sql > source, reference counter. > 2. Each session has a limit which limits the amount of resources, count of > prepared statement for instance. So users have double trouble: not only they need to explicitly deallocate prepare, they also need to be able to handle an out-of-resources error, such as out-of-prepared-statements-cache-memory. This "vision" happens to copy all the bugs of Oracle. - connect/disconnect is slow. - users have to do more work - there is a new kind of error - out of statement cache - which one has to be prepared for. There is no way to work this error around on application side - the server side still has to handle concurrent execution of the same prepared statement, the protocol allows it today, so the implementaiton either has to ban it or create a copy on the fly (i.e. implicitly prepare another vdbe anyway). The only benefit of explicit allocate/deallocate is that there is some kind of guarantee that the statement is not going to get parsed again when it is executed. This is very little value: I'm not aware of cases when this is needed, and one can create a view instead. > 3. Each session contain the list of prepared statement identifiers used by this > session. > 4. Statement cache could be used by the feature but it is not essential. At > least it relaxes the feature implementation complexity - we should not > implement a good statement cache, eviction politics and other stuff like this > right now. > 5. If a client loses their session (crash, network failure) then it should > reestablish a session and reprepare all statement it wish to use. It is more > simply to implement in comparison with any caching, eviction, tracking and > other things you suggested. No, it's not simple to implement, certainly not simpler than a global cache with its own opaque eviction strategy. The simplest implementation of the global cache is not caching anything at all and preparing before every execute - which is good enough for JDBC and ODBC. > 6. Right now a connection could contain only one session and it is not a stop- > factor - the only thing we should keep in mind - prepared statement relates to > a session not to a connection. Then, when we will implement streams and > multiple sessions per connection, we should not do any more changes to support > prepared statements in this circumstances. If you suggest that a single session can execute only one statement concurrently, you should be explicit about it - and it's a huge limitation. -- Konstantin Osipov, Moscow, Russia
next prev parent reply other threads:[~2019-11-12 8:50 UTC|newest] Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-11-07 1:04 [Tarantool-patches] [PATCH 00/15] sql: prepared statements Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 01/15] sql: remove sql_prepare_v2() Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 02/15] sql: refactor sql_prepare() and sqlPrepare() Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 03/15] sql: move sql_prepare() declaration to box/execute.h Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 04/15] sql: rename sqlPrepare() to sql_compile() Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 05/15] sql: move sql_finalize() to execute.h Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 06/15] port: increase padding of struct port Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 07/15] port: add dump format and request type to port_sql Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 08/15] sql: resurrect sql_bind_parameter_count() function Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 09/15] sql: resurrect sql_bind_parameter_name() Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 10/15] sql: add sql_schema_version() Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 11/15] sql: introduce sql_stmt_sizeof() function Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 12/15] box: increment schema_version on ddl operations Nikita Pettik 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets Nikita Pettik 2019-11-10 23:40 ` Konstantin Osipov 2019-11-11 10:53 ` Nikita Pettik 2019-11-11 18:35 ` Konstantin Osipov 2019-11-12 7:54 ` Georgy Kirichenko 2019-11-12 8:50 ` Konstantin Osipov [this message] 2019-11-12 9:30 ` Georgy Kirichenko 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 14/15] box: introduce prepared statements Nikita Pettik 2019-11-10 23:42 ` Konstantin Osipov 2019-11-07 1:04 ` [Tarantool-patches] [PATCH 15/15] netbox: " Nikita Pettik
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20191112085052.GD25103@atlas \ --to=kostja.osipov@gmail.com \ --cc=georgy@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 13/15] sql: introduce cache for prepared statemets' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox