[server-dev] [RFC] Interactive transactions in IProto

Thu Nov 15 21:04:03 MSK 2018

On Thursday, November 15, 2018 5:48:51 PM MSK Konstantin Osipov wrote:
> * Георгий Кириченко <georgy at tarantool.org> [18/11/15 14:58]:
> > > > > > > Definitely, it should be a global limit, a new box.cfg setting
> > > > > > > (as much as I hate settings, there is no way we can do without it
> > > > > > > here, afaiu).
> > > > > > 
> > > > > > But why? We don't limit the number of incoming connections. Why
> > > > > > should we bother with limiting streams?
> > > > > 
> > > > > The number of incoming connections is limited implicitly by
> > > > > ulimit. A connection doesn't take database resources, it only
> > > > > consumes memory buffers and file descriptors. An open transaction
> > > > > potentially holds a lot more resources. E.g. a typical graph-based
> > > > > deadlock detector has O(N^2) complexity in the number of
> > > > > transactions.
> > > > 
> > > > Streams have nothing to do with deadlock resolvers or transactions.
> > > > Even without streams, and even now, I can create thousands of active
> > > > transactions. Streams are at a lower level than transactions. For
> > > > some reason you think that stream == transaction, but that is false.
> > > 
> > > You forget the reason we're adding streams. We can't use today's
> > > connections since they are mostly stateless. We need to add a
> > > state to the connection - an open transaction. And to be able to
> > > multiplex multiple states over the same connection, we're adding
> > > streams.
> > > 
> > > If you look at where this is heading, it's full support of the
> > > SQL features related to the current session.
> > > 
> > > The spec says very little about how changes to the current session
> > > made in one stream affect another stream, and those effects can be
> > > dramatic. What if, for example, I change sql_default_engine in one
> > > stream: will it impact another? What about the current user?
> > > 
> > > In SQL, there are the following attributes of the session:
> > > 
> > > - current user
> > > - transaction
> > > - transaction isolation level
> > > - client character set
> > > - state of the diagnostics area
> > > - temporary table data
> > > 
> > > Are these going to be shared between streams? In other words, are
> > > you going to only make "the current transaction" a server side
> > > context of the stream, and share everything else? Then I think you
> > > will stumble over the very next ANSI requirement we eventually have
> > > to implement. Besides, proxying won't work as intended.
> > 
> > The worst thing is that an SQL connection is strictly synchronous: the
> > next call must not start processing until the previous one has finished.
> > But this breaks the current Tarantool network batching. And if you plan
> > to rely on the transaction 'is open' state, you do not even know whether
> > a request that has just started will finish with an open transaction or
> > not. This might also break current behavior.
> > 
> > One of the biggest limitations of all well-known SQL servers (Oracle,
> > MySQL, SQL Server) is that only one transaction per connection is
> > allowed. This is the root cause of so many connection pools existing.
> > It also requires a dedicated connection behind a proxy for each client.
> > 
> > Also, returning a transaction id not only breaks backward compatibility
> > but raises a lot of questions about how it should be done and how the
> > server should react to various kinds of misuse. There are also a lot of
> > undefined behaviors and semantic questions: for example, what is the
> > state of the connection after two calls are batched, each call yields
> > some number of times, and then starts a transaction? Or should we reset
> > a transaction if a call yields? It is easy to see that the streams
> > paradigm does not have these issues, because it defines very simple rules.
> > 
> > A transaction appears to be pinned to a stream and is not shareable
> > between streams, even within one connection. A stream should be
> > maintained only if a transaction was opened or its tx queue is not empty
> > right now. Also, if users use streams, they consciously change the
> > request processing principles, and Tarantool might rely on that fact and
> > preserve the transaction for future
> > use. So a long-lived transaction survives only within a stream.
> > Obviously, streams allow us to preserve backward compatibility without
> > any client changes.
> > 
> > Streams allow us to do all of these things in an easy and clean manner.
> > Yes, there are questions about the exact server behavior, but those are
> > more technical questions, like limits and error handling.
> 
> I agree, but let's simply stop pretending streams are "just about
> the order", and say something like:
> 
> - IPROTO_BEGIN/COMMIT/ROLLBACK only works if IPROTO_STREAM_ID is non-zero
> - if stream id is zero, then dangling transactions are rolled back
>   as they are now
> - all requests inside a stream are strictly sequential
I agree with the points you suggest.
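
A minimal sketch of these rules, in Python purely for illustration. The request kinds, the `Server`/`Stream`/`Request` classes and the `finish()` callback are hypothetical and are not IProto's real types or codes. Transaction control is rejected when the stream id is zero, stream-less requests start immediately as today, and each stream runs its own requests strictly one at a time in arrival order while different streams proceed independently.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical request kinds; NOT the real IProto request type codes.
TX_CONTROL = {"BEGIN", "COMMIT", "ROLLBACK"}


@dataclass
class Request:
    stream_id: int  # 0 means "not in a stream"
    kind: str       # "BEGIN", "COMMIT", "ROLLBACK" or "CALL"


@dataclass
class Stream:
    tx_open: bool = False
    queue: deque = field(default_factory=deque)  # head = request in progress


class Server:
    def __init__(self):
        self.streams = {}   # stream_id -> Stream
        self.started = []   # (stream_id, kind) in the order requests start

    def submit(self, req):
        if req.stream_id == 0:
            if req.kind in TX_CONTROL:
                raise ValueError(f"{req.kind} requires a non-zero stream id")
            # Stream-less requests start immediately (batching is kept); a
            # transaction left open by such a request is rolled back when
            # it ends, as today (not modelled here).
            self.started.append((0, req.kind))
            return
        stream = self.streams.setdefault(req.stream_id, Stream())
        stream.queue.append(req)
        if len(stream.queue) == 1:
            self._start(req)  # the stream was idle: start right away

    def finish(self, stream_id):
        """Called when the request in progress on `stream_id` completes."""
        stream = self.streams[stream_id]
        req = stream.queue.popleft()
        if req.kind == "BEGIN":
            stream.tx_open = True
        elif req.kind in ("COMMIT", "ROLLBACK"):
            stream.tx_open = False
        if stream.queue:
            # Strictly sequential: the next request starts only now.
            self._start(stream.queue[0])

    def _start(self, req):
        self.started.append((req.stream_id, req.kind))
```

For example, submitting BEGIN and CALL on stream 1 and then CALL on stream 2 starts BEGIN and the stream 2 CALL at once; the stream 1 CALL starts only after `finish(1)`, and by then the stream's transaction is open.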

> - a stream owns its own diagnostics, transaction, transaction
>   isolation level, and possibly authenticated user (see below).
> - better yet, IPROTO_SQL_EXECUTE is only available if stream id is
>   non-zero
> 
> Then we need to decide how to manage streams. Since a
> stream may have a lot of state, I don't like the idea of implicit
> open/close of a stream. Imagine the server implicitly closes a stream
> containing a session-local temporary table. The user
> may get confused. Why not add a separate command to create a
> stream, or extend IPROTO_AUTH with an option to create a stream?

OK, now I think it might be useful to require an explicit stream open/close
API, because it seems worthwhile to preserve the value returned by an executed
function and pass it to the next call, as well as to save current context
parameters such as the isolation level. This also allows us to provide a
better error-handling environment. In that case we are able to provide
different credentials for different streams.
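
A minimal sketch of explicit stream management, again in Python and under assumptions: `Connection`, `StreamState`, `open_stream()`, `close_stream()`, `begin()`, `commit()` and `call()` are hypothetical names, not an existing Tarantool API. `open_stream()` hands out a fresh non-zero id and binds a user and an isolation level to it; each stream keeps its own transaction flag, diagnostics and the last returned value; `close_stream()` drops that state, rolling back a transaction that is still open.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class StreamState:
    # Per-stream context as listed in the thread; field names are hypothetical.
    user: str
    isolation: str                  # isolation level name (placeholder)
    tx_open: bool = False
    diagnostics: list = field(default_factory=list)
    last_result: object = None      # value returned by the previous call


class Connection:
    def __init__(self, default_user):
        self.default_user = default_user
        self.streams = {}               # stream id -> StreamState
        self._ids = itertools.count(1)  # 0 stays reserved for "no stream"

    def open_stream(self, user=None, isolation="default"):
        sid = next(self._ids)
        self.streams[sid] = StreamState(user or self.default_user, isolation)
        return sid

    def close_stream(self, sid):
        """Explicit close; returns True if an open transaction was rolled back."""
        state = self._get(sid)
        del self.streams[sid]  # dropping the state discards the open tx
        return state.tx_open

    def begin(self, sid):
        self._get(sid).tx_open = True

    def commit(self, sid):
        self._get(sid).tx_open = False

    def call(self, sid, func, *args):
        # func receives the previous call's result, so state carries over.
        state = self._get(sid)
        try:
            state.last_result = func(state.last_result, *args)
        except Exception as exc:
            state.diagnostics.append(str(exc))  # only this stream's diagnostics
            raise
        return state.last_result

    def _get(self, sid):
        if sid not in self.streams:
            raise LookupError(f"stream {sid} is not open")
        return self.streams[sid]
```

Because streams are never closed implicitly here, a request on a stream that was not opened (or was already closed) fails loudly instead of silently losing session state.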