Tarantool development patches archive
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Oleg Babin <olegrok@tarantool.org>,
	tarantool-patches@dev.tarantool.org,
	yaroslav.dynnikov@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH vshard 11/11] router: introduce map_callrw()
Date: Sat, 27 Feb 2021 00:58:06 +0100
Message-ID: <85b54259-b7c2-475b-a5ed-da4fec44e7ca@tarantool.org>
In-Reply-To: <f3760ddc-55d7-b6db-718b-d6969bb66085@tarantool.org>

>>> On 23.02.2021 03:15, Vladislav Shpilevoy wrote:
>>>> Closes #147
>>> Will read-only map-reduce functions be done in the scope of a separate issue/patch?
>>>
>>> I know about #173, but it seems we need to keep information about the map_callro function.
>> Perhaps there will be a new ticket, yes. Until 173 is fixed, ro refs seem
>> pointless. The implementation depends on how exactly to fix 173.
>>
>> At the moment I haven't even designed it yet. I am thinking about whether I
>> need to account ro storage refs separately from rw refs in the scheduler or
>> not. The implementation heavily depends on this.
>>
>> If I go for accounting ro refs separately, it adds a third group of operations
>> to the scheduler, complicating it significantly, and obviously a second type
>> of refs to the refs module.
>>
>> It would allow sending buckets and marking them as GARBAGE/SENT while keeping
>> ro refs. But it would complicate the garbage collector a bit, as it would need
>> to consider storage ro refs.
>>
>> On the other hand, it would heavily complicate the code in the scheduler.
>> Like really heavily, I suppose, but I can be wrong.
>>
>> Also the win will be zeroed if we ever implement rebalancing of writable
>> buckets. Besides, the main reason I am more into unified type of refs is
>> that they are used solely for map-reduce. Which means they are taken on all
>> storages. This means if you have SENDING readable bucket on one storage,
>> you have RECEIVING non-readable bucket on another storage. Which makes
>> such ro refs pointless for full cluster scan. Simply won't be able to take
>> ro ref on the RECEIVING storage.
>>
>> If we go for unified refs, it would allow keeping the scheduler relatively
>> simple, but it requires an urgent fix of 173. Otherwise you can't be sure your
>> data is consistent. Even now you can't be sure with normal requests, which
>> terrifies me and raises a huge question - why the fuck does nobody care? I
>> think I already had a couple of nightmares about 173.
>
> Yes, it's a problem, but rebalancing is not a frequent operation. So in some
> cases routeall() was enough. Anyway, we didn't have any alternatives. But map-reduce
> is a really frequent operation: any request over a secondary index has to scan the whole cluster.

Have you tried building secondary indexes over buckets? There is an
algorithm, in case it is something perf sensitive.

You can store the secondary index in another space and shard it independently.
There are ways to deal with the inability to update it atomically together
with the main space. So you will almost always need at most 2 network hops to
at most 2 nodes to find the primary key, regardless of cluster size.
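
To make the idea more concrete, here is a minimal sketch in plain Lua. It
only simulates the layout: plain tables stand in for storages and spaces, and
every name in it (storage_of, users, email_index, find_by_email) is made up
for the illustration and is not vshard API. It also ignores the non-atomic
update problem mentioned above.

```lua
-- Pure-Lua simulation; plain tables stand in for storages and spaces.
-- All names here are hypothetical and are not part of vshard.
local STORAGE_COUNT = 3

-- Toy hash: sum of the key's bytes mapped onto a storage number.
local function storage_of(key)
    local sum = 0
    for i = 1, #key do
        sum = sum + key:byte(i)
    end
    return sum % STORAGE_COUNT + 1
end

local storages = {}
for i = 1, STORAGE_COUNT do
    storages[i] = {users = {}, email_index = {}}
end

-- Counts simulated network hops made by the lookups.
local hops = 0

-- The main space is sharded by primary key, the index space by email.
local function insert_user(id, email)
    storages[storage_of(id)].users[id] = {id = id, email = email}
    storages[storage_of(email)].email_index[email] = id
end

local function find_by_email(email)
    -- Hop 1: the node owning the index entry for this email.
    hops = hops + 1
    local id = storages[storage_of(email)].email_index[email]
    if id == nil then
        return nil
    end
    -- Hop 2: the node owning the tuple with that primary key.
    hops = hops + 1
    return storages[storage_of(id)].users[id]
end
```

A lookup such as find_by_email('b@example.com') touches at most two of the
simulated storages no matter how large STORAGE_COUNT is, which is the point
of sharding the index space on its own key.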

>> Another issue - ro refs won't stop rebalancer from working and don't
>> even participate in the scheduler, if we fix 173 in the simplest way -
>> just invalidate all refs on the replica if a bucket starts moving. If
>> the rebalancer works actively, nearly all your ro map-reduces will return
>> errors during data migration because buckets will move constantly. No
>> throttling via the scheduler.
>>
>> One way to go - switch all replica weights to defaults for the time of
>> data migration manually. So map_callro() will go to master nodes and
>> will participate in the scheduling.
>>
>> Another way to go - don't care and fix 173 in the simplest way. Weights
>> are used super rarely anyway, AFAIK.
>>
>> Third way to go - implement some smarter fix of 173. I have many ideas here.
>> But none of them is quick.
>>
>>>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>>>> index 97bcb0a..8abd77f 100644
>>>> --- a/vshard/router/init.lua
>>>> +++ b/vshard/router/init.lua
>>>> @@ -44,6 +44,11 @@ if not M then
>>>>            module_version = 0,
>>>>            -- Number of router which require collecting lua garbage.
>>>>            collect_lua_garbage_cnt = 0,
>>>> +
>>>> +        ----------------------- Map-Reduce -----------------------
>>>> +        -- Storage Ref ID. It must be unique for each ref request
>>>> +        -- and therefore is global and monotonically growing.
>>>> +        ref_id = 0,
>>> Maybe 0ULL?
>> Wouldn't break anyway - doubles are precise until 2^53. But an integer
>> should be faster I hope.
>>
>> Changed to 0ULL.

I reverted it. I asked Igor, and he reminded me that 0ULL is cdata, so it
involves heavy stuff with metatables and shit. It is cheaper to simply
increment a plain number. I didn't measure it, though.
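
For reference, a small sketch of why the plain number is safe here. The
table M below is only a stand-in for the router's global state, and
next_ref_id is a hypothetical helper, not a function from the patch. Even at
a million refs per second it would take roughly 285 years to reach 2^53.

```lua
-- Hypothetical stand-in for the router's global state table.
local M = {ref_id = 0}

-- Return the next unique ref ID: a plain number increment, no cdata.
local function next_ref_id()
    M.ref_id = M.ref_id + 1
    return M.ref_id
end

-- Doubles represent integers exactly up to 2^53. Above that an
-- increment by one is silently lost to rounding.
local limit = 2^53
local exact_below = (limit - 1) + 1 == limit
local lost_above = limit + 1 == limit
```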

>>>> +router_map_callrw = function(router, func, args, opts)
>>>> +    local replicasets = router.replicasets
>>> It would be great to filter here replicasets with bucket_count = 0 and weight = 0.
>> Information on the router may be outdated. Even if bucket_count is 0,
>> it may still mean the discovery didn't get to there yet. Or the
>> discovery is simply disabled or dead due to a bug.
>>
>> Weight 0 also does not say anything certain about buckets on the
>> storage. It can be that config on the router is outdated, or it is
>> outdated on the storage. Or someone simply does not set the weights
>> for router configs because they are not used here. Or the weights are
>> really 0 and set everywhere, but rebalancing is in progress - buckets
>> move from the storage slowly, and the scheduler allows to squeeze some
>> map-reduces in the meantime.
>>
>>> If such "dummy" replicasets are disabled, we get a "connection refused" error.
>> I need more info here. What is 'disabled' replicaset? And why would
>> 0 discovered buckets or 0 weight lead to the refused connection?
>>
>> In case this is some cartridge specific shit, this is bad. As you
>> can see above, I can't ignore such replicasets. I need to send requests
>> to them anyway.
>>
>> There is a workaround though - even if an error has occurred, continue
>> execution if even without the failed storages I still cover all the buckets.
>> Then having 'disabled' replicasets in the config would result in some
>> unnecessary faulty requests for each map call, but they would work.
>>
>> Although I don't know how to 'return' such errors. I don't want to log them
>> on each request, and don't have a concept of a 'warning' object or something
>> like this. Weird option - in case of this uncertain success return the
>> result, error, uuid. So the function could return
>>
>> - nil, err[, uuid] - fail;
>> - res              - success;
>> - res, err[, uuid] - success but with a suspicious issue;
>>
> Seems I didn't fully state the problem. Maybe it's not even a relevant issue, and users
> who do this just create their own problems. But there was a case when our customers added
> a new replicaset (in fact a single instance) to the cluster. This instance didn't have any data and had weight=0.
> Then they just turned this instance off after some time. And all requests that perform map-reduce started to fail with a
> "Connection refused" error.
>
> It raises a question: "Why do our requests fail if we disable an instance that doesn't have any data?"
>
> Yes, the weight=0 requirement is obviously not enough, because if rebalancing is in progress, a replicaset with weight=0
> could still contain some data.
>
> As for the option to finish requests if some storage failed, I'm not sure that it's really needed.
> Usually we don't need a partial result (especially if it adds workload).

I want to emphasize - it won't be partial. It will be the full result. If the
'disabled' node does not have any data, the requests to the other nodes will
see that they covered all the buckets. So the loss of this 'disabled' node is
not critical.

Map-Reduce, at least as I understand it, is about providing access to all
buckets. You can succeed at this even if you didn't manage to look into each
node, as long as the nodes that didn't respond had no data.

But it sounds like a crutch for an outdated config. If this is not something
regular that I need to support, I'd better leave it as is for now. A normal fail.
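
For the record, the workaround discussed above could look roughly like the
sketch below. It is not what the patch does. The helper name map_finish and
the shape of the 'results' table are assumptions made for the illustration.
The return values follow the res / nil, err[, uuid] / res, err[, uuid]
convention from the quoted text.

```lua
-- Hypothetical helper, not part of the patch. 'results' maps a
-- replicaset uuid to {ok = true, buckets = N, data = ...} on success
-- or to {ok = false, err = ...} on failure. 'total_buckets' is the
-- configured bucket count of the cluster.
local function map_finish(results, total_buckets)
    local uuids = {}
    for uuid in pairs(results) do
        table.insert(uuids, uuid)
    end
    -- Sort so that the reported error does not depend on pairs() order.
    table.sort(uuids)
    local covered, res = 0, {}
    local err, err_uuid
    for _, uuid in ipairs(uuids) do
        local r = results[uuid]
        if r.ok then
            covered = covered + r.buckets
            res[uuid] = r.data
        elseif err == nil then
            -- Remember only the first failure.
            err, err_uuid = r.err, uuid
        end
    end
    if covered ~= total_buckets then
        -- Some buckets were not visited: a normal fail.
        return nil, err or 'not all buckets are covered', err_uuid
    end
    -- Full result. err is non-nil only for the suspicious success.
    return res, err, err_uuid
end
```

With a failed storage that owned no buckets, this returns the full result
together with the error and the uuid. If the responding storages do not add
up to total_buckets, it is a plain failure.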

