From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id A711520D55 for ; Wed, 24 Jul 2019 18:41:30 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Q8SA2D8tdHgF for ; Wed, 24 Jul 2019 18:41:30 -0400 (EDT) Received: from smtp47.i.mail.ru (smtp47.i.mail.ru [94.100.177.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 4A6B220CCB for ; Wed, 24 Jul 2019 18:41:30 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH] net.box: ignore absence of _vcollation References: <20190708121338.76309-1-roman.habibov@tarantool.org> <20190712095549.ac4hhaxhxzgaik5i@tarantool.org> <20190720190714.5pejxoy2ae43nssp@tkn_work_nb> <20190724212743.jwrti2iuclleogjq@tkn_work_nb> From: Vladislav Shpilevoy Message-ID: Date: Thu, 25 Jul 2019 00:43:22 +0200 MIME-Version: 1.0 In-Reply-To: <20190724212743.jwrti2iuclleogjq@tkn_work_nb> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: Alexander Turenko , Roman Khabibov Cc: tarantool-patches@freelists.org, Konstantin Osipov , Kirill Yukhin Hi! Yeah, I remember that I should explain my vision of how that task should be solved. See below a long-read. Here are my explanations on how netbox works, and on how should this issue be fixed (perhaps). Netbox for each connection creates a worker fiber and a metadata object keeping connect timeout, reconnect timeout, status, etc. The metadata object is known as a netbox connection, and is the object with which a user works. It is a result of 'netbox.connect()'. Worker fiber does real work: establishes a tcp connection, sends/receives data. But there appears a problem of cyclic references. The thing is that the worker fiber needs some metadata from the netbox object. At least, 'reconnect_timeout'. On the other hand, the netbox object needs some results from the worker fiber - responses on the requests, connection status. In other words, both the worker fiber and the netbox object need something from each other. If the worker fiber and the netbox object would keep references to each other, they would never be deleted - Lua GC is not that smart. So there is a channel of communication between them, which references them both weakly. https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L834-L852 This function is called when a netbox object is created. It takes some mandatory meta like host, port, and a so called 'callback'. It is a closure defined here: https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L910-L954 The closure has access to the netbox object https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L909 and returns certain parameters of it. The worker fiber uses a weak reference to this closure as a proxy to read metadata of the netbox object: https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L837-L840. For example, when the worker fiber needs to obtain reconnect_timeout, it calls the callback https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L443. The callback here is a weak reference to the closure https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L948-L953. And vice versa - when the worker fiber wants to return something back to the netbox object, it calls the same callback, but with a different argument: https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L399. How does that callback help to GC the connection? Lets assume, that there are no references to a netbox object in user code. It is not referenced in the worker fiber either, because the only reference is callback, and it is weak. So it is deleted. Together with https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L844-L846. It is a standard way of setting your own callbacks on GC of a table - you create a dummy cdata object with a custom GC callback. On deletion of that gc-hook a function https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L473-L481 is called. It cancels the worker fiber. Now both the netbox object and the worker fiber are deleted. Talking of this issue: your situation is simpler, as I see now, when I dived a bit into the problem. You have nothing to do about the netbox object. Your target is the worker fiber only. And in such a case I don't see any magic solution that would shrink this patch https://github.com/tarantool/tarantool/commit/abc88ef8d38e21a6c7faf925a19105ce65d45467 to a couple of lines. The only obvious thing you can do here is to avoid passing version id to iproto_auth_sm and iproto_schema_sm functions, and Alexander proposed. Just make it a global value here https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L232 together with the others, if you still will need. Your case is different from reconnect_timeout and other callback-stuff, because these parameters are set by a user out of the worker fiber, and can be changed by him. Callback allows the worker fiber to read actual values always. Version id originates from the worker fiber, and so can be maintained and kept here. Talking of how to remove branching - it has nothing to do with netbox specific code, or GC, or anything application dependent. It is a pure technical matter. You have a classic problem already solved in box/lua/upgrade.lua. The problem is that you have a pipeline of versions, each adding, or removing, or changing something. And it is logical to have a pipeline of functions building that schema step by step. IMO, you should have several functions to fetch and to install schema. Foe example: function fetch_schema_2_2_1() ... end function fetch_schema_2_2_2() fetch_schema_2_2_1() -- + some v_collation things. end function fetch_schema_2_2_3() fetch_schema_2_2_3() --- + ... end And you have a global table where key is version id, value is a schema fetch function. Here https://github.com/tarantool/tarantool/commit/abc88ef8d38e21a6c7faf925a19105ce65d45467#diff-a9dd9be71f8af749f288abb6304a5185R733 you take a fetcher from the table like 'fetcher = fetchers[version_id]', and just call it. All these encode selects https://github.com/tarantool/tarantool/commit/abc88ef8d38e21a6c7faf925a19105ce65d45467#diff-a9dd9be71f8af749f288abb6304a5185R754 should move to these schema fetchers. Of course, it wouldn't be that simple. At least, because schema fetch requests are sent in parallel, as you can see here: https://github.com/tarantool/tarantool/blob/0529e2f0ef7dd4c19283b9e8e6b0e7e43bf400a3/src/box/lua/net_box.lua#L743-L748 And still the idea is the same. Perhaps you will have not only fetcher for each version, but two functions: start_fetch/end_fetch, or begin_fetch/end_fetch. A first function sends a request (encode_select), a second one waits for a response. Or let these fetchers return an asynchronous id, by which you will wait for responses in one place. Don't know what to chose. If I were you, and I had time, I would try to implement these step-wise fetchers. They would be quite easy to support and to extend for other future schema changes. Perhaps, it will remove the branching. Up to you what to do, to Alexander, and to other managers. My only wish is to look at the final patch when it will be ready.