Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] box: add binary search for _session settings space
@ 2020-01-10  7:46 Chris Sosnin
  2020-01-15 20:24 ` Vladislav Shpilevoy
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Sosnin @ 2020-01-10  7:46 UTC (permalink / raw)
  To: tarantool-patches

As it is mentioned in implementation, it is important
that _session_settings options are sorted by name, so
there is no need in linear lookup and we can replace it
with binary search.

Closes #4712
---
branch: https://github.com/tarantool/tarantool/tree/ksosnin/gh-4712-search-settings
issue: https://github.com/tarantool/tarantool/issues/4712

 src/box/session_settings.c | 10 ++++++----
 src/lib/core/util.c        | 22 ++++++++++++++++++++++
 src/trivia/util.h          |  3 +++
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/src/box/session_settings.c b/src/box/session_settings.c
index a969d3d10..ec87641be 100644
--- a/src/box/session_settings.c
+++ b/src/box/session_settings.c
@@ -265,9 +265,10 @@ session_settings_index_get(struct index *base, const char *key,
 	struct session_setting_module *module = &session_setting_modules[0];
 	struct session_setting_module *end = module + session_setting_type_MAX;
 	int sid = 0;
+	int count = module->setting_count;
 	for (; module < end; ++module, sid = 0) {
-		if (session_settings_next_in_module(module, &sid, key, true,
-						    true) == 0)
+		if ((sid = str_bin_search(module->settings, key,
+					  count)) != count)
 			goto found;
 	}
 	*result = NULL;
@@ -372,9 +373,10 @@ session_settings_space_execute_update(struct space *space, struct txn *txn,
 	key = tt_cstr(key, key_len);
 	struct session_setting_module *module = &session_setting_modules[0];
 	struct session_setting_module *end = module + session_setting_type_MAX;
+	int count = module->setting_count;
 	for (; module < end; ++module, sid = 0) {
-		if (session_settings_next_in_module(module, &sid, key, true,
-						    true) == 0)
+		if ((sid = str_bin_search(module->settings, key,
+					  count)) != count)
 			goto found;
 	}
 	*result = NULL;
diff --git a/src/lib/core/util.c b/src/lib/core/util.c
index ffa1d2b51..6553d6626 100644
--- a/src/lib/core/util.c
+++ b/src/lib/core/util.c
@@ -84,6 +84,28 @@ strnindex(const char **haystack, const char *needle, uint32_t len, uint32_t hmax
 	return hmax;
 }
 
+/**
+ * Same as strindex(), but assuming the array is sorted and it's
+ * size is known.
+ */
+uint32_t
+str_bin_search(const char **haystack, const char *needle, uint32_t hmax) {
+	int cmp;
+	uint32_t index, start = 0;
+	uint32_t end = hmax;
+	while (start < end) {
+		index = (start + end) / 2;
+		cmp = strcmp(haystack[index], needle);
+		if (cmp < 0)
+			start = index + 1;
+		else
+			end = index;
+	}
+	if (start < hmax && strcmp(haystack[start], needle) == 0)
+		return start;
+	return hmax;
+}
+
 void
 close_all_xcpt(int fdc, ...)
 {
diff --git a/src/trivia/util.h b/src/trivia/util.h
index 8a3d22b38..5c1e3bd5d 100644
--- a/src/trivia/util.h
+++ b/src/trivia/util.h
@@ -98,6 +98,9 @@ strindex(const char **haystack, const char *needle, uint32_t hmax);
 uint32_t
 strnindex(const char **haystack, const char *needle, uint32_t len, uint32_t hmax);
 
+uint32_t
+str_bin_search(const char **haystack, const char *needle, uint32_t hmax);
+
 #define nelem(x)     (sizeof((x))/sizeof((x)[0]))
 #define field_sizeof(compound_type, field) sizeof(((compound_type *)NULL)->field)
 #ifndef lengthof
-- 
2.21.0 (Apple Git-122.2)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: add binary search for _session settings space
  2020-01-10  7:46 [Tarantool-patches] [PATCH] box: add binary search for _session settings space Chris Sosnin
@ 2020-01-15 20:24 ` Vladislav Shpilevoy
  2020-01-16 13:13   ` Chris Sosnin
  0 siblings, 1 reply; 4+ messages in thread
From: Vladislav Shpilevoy @ 2020-01-15 20:24 UTC (permalink / raw)
  To: Chris Sosnin, tarantool-patches

Hi! Thanks for the patch!

On 10/01/2020 08:46, Chris Sosnin wrote:
> box: add binary search for _session settings space

There is no '_session' space. Please, add '_' between
'_session' and 'settings'.

> As it is mentioned in implementation, it is important
> that _session_settings options are sorted by name, so
> there is no need in linear lookup and we can replace it
> with binary search.
> 
> Closes #4712
> ---
> branch: https://github.com/tarantool/tarantool/tree/ksosnin/gh-4712-search-settings
> issue: https://github.com/tarantool/tarantool/issues/4712
> 
>  src/box/session_settings.c | 10 ++++++----
>  src/lib/core/util.c        | 22 ++++++++++++++++++++++
>  src/trivia/util.h          |  3 +++
>  3 files changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/src/box/session_settings.c b/src/box/session_settings.c
> index a969d3d10..ec87641be 100644
> --- a/src/box/session_settings.c
> +++ b/src/box/session_settings.c
> @@ -265,9 +265,10 @@ session_settings_index_get(struct index *base, const char *key,
>  	struct session_setting_module *module = &session_setting_modules[0];
>  	struct session_setting_module *end = module + session_setting_type_MAX;
>  	int sid = 0;
> +	int count = module->setting_count;

1. 'Module' is different on each iteration, because I do '++module'
each time. Here you took count only for the first module, but use it
for all of them.

The same below.

>  	for (; module < end; ++module, sid = 0) {
> -		if (session_settings_next_in_module(module, &sid, key, true,
> -						    true) == 0)
> +		if ((sid = str_bin_search(module->settings, key,
> +					  count)) != count)
>  			goto found;
>  	}
>  	*result = NULL;
> @@ -372,9 +373,10 @@ session_settings_space_execute_update(struct space *space, struct txn *txn,
>  	key = tt_cstr(key, key_len);
>  	struct session_setting_module *module = &session_setting_modules[0];
>  	struct session_setting_module *end = module + session_setting_type_MAX;
> +	int count = module->setting_count;
>  	for (; module < end; ++module, sid = 0) {
> -		if (session_settings_next_in_module(module, &sid, key, true,
> -						    true) == 0)
> +		if ((sid = str_bin_search(module->settings, key,
> +					  count)) != count)
>  			goto found;
>  	}
>  	*result = NULL;
> diff --git a/src/lib/core/util.c b/src/lib/core/util.c
> index ffa1d2b51..6553d6626 100644
> --- a/src/lib/core/util.c
> +++ b/src/lib/core/util.c
> @@ -84,6 +84,28 @@ strnindex(const char **haystack, const char *needle, uint32_t len, uint32_t hmax
>  	return hmax;
>  }
>  
> +/**
> + * Same as strindex(), but assuming the array is sorted and it's
> + * size is known.
> + */
> +uint32_t
> +str_bin_search(const char **haystack, const char *needle, uint32_t hmax) {

2. We don't really need this function anywhere except session_settings.c.
Besides, even in that file it is needed only in session_settings_next_in_module().
So you can just patch the latter function.

> +	int cmp;
> +	uint32_t index, start = 0;
> +	uint32_t end = hmax;
> +	while (start < end) {
> +		index = (start + end) / 2;
> +		cmp = strcmp(haystack[index], needle);
> +		if (cmp < 0)
> +			start = index + 1;
> +		else
> +			end = index;
> +	}
> +	if (start < hmax && strcmp(haystack[start], needle) == 0)
> +		return start;
> +	return hmax;
> +}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: add binary search for _session settings space
  2020-01-15 20:24 ` Vladislav Shpilevoy
@ 2020-01-16 13:13   ` Chris Sosnin
  2020-01-16 20:27     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Sosnin @ 2020-01-16 13:13 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1463 bytes --]

Hi! Thank you very much for the review!

> There is no '_session' space. Please, add '_' between
> '_session' and 'settings’.

I agree, it’s a typo.

> 1. 'Module' is different on each iteration, because I do '++module'
> each time. Here you took count only for the first module, but use it
> for all of them.
> 
> The same below.

I agree here too, thank you for the catch!

> 2. We don't really need this function anywhere except session_settings.c.
> Besides, even in that file it is needed only in session_settings_next_in_module().
> So you can just patch the latter function.


I don’t think it’s a good idea, consider the following:

tarantool> box.space._session_settings:select('a', {iterator = 'GE’})

With the current version, we will find the first element which is GE with linear lookup,
and the rest loops will consist of one iteration (overall it will always be the number of elements
in the array). If we change session_settings_next_in_module() to use binary search, 
however, it will highly increase the number of comparisons, because, even though we know
that the next element is greater or equal, we are still looking for it in the array.

My initial patch takes advantage of the array being sorted for update and get methods,
leaving the case from above untouched. Perhaps I could try to make the first lookup 
with binary search, and the rest with linear. What do you think?

Best regards,
Chris.

[-- Attachment #2: Type: text/html, Size: 10246 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: add binary search for _session settings space
  2020-01-16 13:13   ` Chris Sosnin
@ 2020-01-16 20:27     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2020-01-16 20:27 UTC (permalink / raw)
  To: Chris Sosnin; +Cc: tarantool-patches

Hi!

>> 2. We don't really need this function anywhere except session_settings.c.
>> Besides, even in that file it is needed only in session_settings_next_in_module().
>> So you can just patch the latter function.
> 
> I don’t think it’s a good idea, consider the following:
> 
> tarantool> box.space._session_settings:select('a', {iterator = 'GE’})
> 
> With the current version, we will find the first element which is GE with linear lookup,
> and the rest loops will consist of one iteration (overall it will always be the number of elements
> in the array). If we change session_settings_next_in_module() to use binary search, 
> however, it will highly increase the number of comparisons, because, even though we know
> that the next element is greater or equal, we are still looking for it in the array.
> 
> My initial patch takes advantage of the array being sorted for update and get methods,
> leaving the case from above untouched. Perhaps I could try to make the first lookup 
> with binary search, and the rest with linear. What do you think?

Hm, you are right. session_settings_next_in_module() perhaps is not the
best place.

Yes, I think all linear places should be fixed and become binary
search. For the iterators case you can patch
session_settings_index_create_iterator() so as it initializes
module_id and setting_id with a first matching module and setting
(or last, depending on the iterator type). Although not sure if it
is correct, because usually creation of an iterator does not
position the iterator. At least it does not happen in all our other
iterators.

Alternative solution - for forward iterator initialize module_id
with -1. In session_settings_iterator_next() check, that if module_id
is -1, then find a first setting with a binary search.
For backward iterator the same - init module_id with session_setting_type_MAX,
and use a binary search first time in session_settings_iterator_prev(),
when you see module_id == MAX.

Motivation is that even though 'update' and 'get' are the most
important places, and you fixed them, some users may still use :select()
even when they want to select just one tuple, with EQ iterator and a
full key. And it will be linear, because select() is always an iterator.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-01-16 20:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-10  7:46 [Tarantool-patches] [PATCH] box: add binary search for _session settings space Chris Sosnin
2020-01-15 20:24 ` Vladislav Shpilevoy
2020-01-16 13:13   ` Chris Sosnin
2020-01-16 20:27     ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox