[Tarantool-patches] [PATCH] box: allow upgrading from version 1.6

Serge Petrenko sergepetrenko at tarantool.org
Wed Aug 11 15:36:03 MSK 2021



10.08.2021 17:43, Sergey Ostanevich пишет:
> Hi!
>
> Thanks for the patch! See my 7 comments below
>
> Sergos

Thanks for the review!

>
> 1. I don’t see any doc references to
> https://www.tarantool.io/en/doc/latest/book/admin/upgrades/#admin-upgrades
> where obviously some changes are needed: “Meanwhile Tarantool 2.x may have
> incompatible changes when migrating from Tarantool 1.6. to 2.x directly."
>

I think the doc team already has the ticket, let me check.

Here's the discussion we need:
https://github.com/tarantool/doc/issues/1807#issuecomment-825558865

>> On 10 Aug 2021, at 14:28, Serge Petrenko <sergepetrenko at tarantool.org> wrote:
>>
>> Direct upgrade support from pre-1.7.5 versions was removed in commit
>> 7d3b80e78206c479cab75b4893629cfa1932252e
>> (Forbid upgrade from Tarantool < 1.7.5 and refactor upgrade.lua)
>> The reason for that was the mandatory space format checks introduced
>> back then. Later on, more incompatible changes were made, and due to these
>> changes, Tarantool failed to recover from pre-1.7.5 snapshots.
>>
>> Actually this is rather inconvenient. Now one has to perform an extra
>> upgrade step when upgrading from, say, 1.6 to 2.x: instead of performing a
>> direct upgrade one has to do 1.6 -> 1.10 -> 2.x upgrade which takes
>> twice the time.
>>
> 2. I didn’t get the part before this line - if it part of the commit message?
> I don’t think we need that deep explanation, partially covered later in
> description how the issue is addressed.

I'll clarify this bit. Thanks! I'd better leave it in the commit message 
though.
It will be helpful for anyone digging the git log in the future.

>
>> Make it possible to boot from snapshots coming from Tarantool version 1.6.8 and
>> above.
>>
> 3. Not sure if it’s too long, but it looks like.

Ok, fixed.

>
>> In order to do so, introduce before_replace triggers on system spaces,
>> which work during snapshot/xlog recovery. The triggers will set tuple
>> formats to the ones supported by current Tarantool (2.x). This way the
>> recovered data will have the correct format for a usual schema upgrade.
>>
>> Also add upgrade_to_1_7_5() handler, which finishes transformation of
>> old schema to 1.7.5. The handler is fired together with other
>> box.schema.upgrade() handlers, so there's no user-visible behaviour
>> change.
>>
>> Side note: it would be great to use the same technique to allow booting
>> from pre-1.6.8 snapshots. Unfortunately, this is not possible.
>>
>> Current triggers don't break the order of schema upgrades, so 1.7.1
>> upgrades come before 1.7.2 and 1.7.5. This is because all the upgrades
>> in these versions are replacing existing tuples and not inserting new
>> ones, so the upgrades may be handled by the before_replace triggers.
>>
>> Upgrade to 1.6.8 requires inserting new tuples: creating sysviews, like
>> _vspace, _vuser and so on. This can't be done from the before_replace
>> triggers, so we would have to run triggers for 1.7.x first which would
>> allow Tarantool to recover the snapshot, and then run an upgrade handler for
>> 1.6.8. This looks really messy.
>>
>> Closes #5894
>> ---
>> https://github.com/tarantool/tarantool/issues/5894
>> https://github.com/tarantool/tarantool/tree/sp/gh-5894-1.6-upgrade
>> src/box/lua/load_cfg.lua |   9 ++
>> src/box/lua/upgrade.lua  | 251 ++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 258 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>> index 4df70c210..ea73337b7 100644
>> --- a/src/box/lua/load_cfg.lua
>> +++ b/src/box/lua/load_cfg.lua
>> @@ -695,6 +695,12 @@ local function load_cfg(cfg)
>>          return box.error() -- re-throw exception from check_cfg()
>>      end
>>
>> +    local snap_dir = box.cfg.memtx_dir
> 4. It should be covered with comment on the matter - why is it memtx only,
> what versions are expected here, what triggers will do.

Ok. Please see the incremental diff below.

>
>> +    local snap_version = private.get_snapshot_version(snap_dir)
>> +    if snap_version and private.need_recovery_triggers(snap_version) then
>> +        private.set_recovery_triggers(snap_version)
>> +    end
>> +
>>      -- NB: After this point the function should not raise an
>>      -- error.
>>      --
>> @@ -722,6 +728,9 @@ local function load_cfg(cfg)
>>      -- This call either succeeds or calls panic() / exit().
>>      private.cfg_load()
>>
>> +    if snap_version and private.need_recovery_triggers(snap_version) then
>> +        private.reset_recovery_triggers()
>> +    end
> 5. So, after this point the formats will be up to 2.x. If we make a snapshot
> at this moment, the new snap file will become incompatible with 1.6.8?

No. After this point the formats are ~ 1.7.5. The user still has to
call box.schema.upgrade() + box.snapshot() to finish the upgrade.
Just like during a normal upgrade. From, say, 1.6 to 1.10.

Once the user creates a snapshot (either after box.cfg() or after
box.cfg() + box.schema.upgrade()), this snapshot is incompatible
with 1.6, and it has to be removed before returning to 1.6.

> If an 2.x instance will bootstrap from this snap, will it be able to
> be a replica in 1.6.8 cluster?

No, it won't be able to participate in 1.6 cluster. I don't remember the 
exact
reason for that, but I believe, we've discussed it verbally.

As far as I know the RB guys are ok with an upgrade which involves a 
downtime.

>   
>
>>      -- This block does not raise an error: all necessary checks
>>      -- already performed in private.cfg_check(). See <dynamic_cfg>
>>      -- comment.
>> diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
>> index 97afc0b4d..c1f3a027a 100644
>> --- a/src/box/lua/upgrade.lua
>> +++ b/src/box/lua/upgrade.lua
>> @@ -1,6 +1,8 @@
>> local log = require('log')
>> local bit = require('bit')
>> local json = require('json')
>> +local fio = require('fio')
>> +local xlog = require('xlog')
>>
>> -- Guest user id - the default user
>> local GUEST = 0
>> @@ -86,6 +88,41 @@ local function set_system_triggers(val)
>>      foreach_system_space(function(s) s:run_triggers(val) end)
>> end
>>
>> +-- Get schema version, stored in _schema system space, by reading the latest
>> +-- snapshot file from the snap_dir. Useful to define schema_version before
>> +-- recovering the snapshot, because some schema versions are too old and cannot
>> +-- be recovered normally.
>> +local function get_snapshot_version(snap_dir)
>> +    local SCHEMA_ID = 272
> 6. Can we reuse the global definition at schema_def.h:72 ?

Ok, fixed. I had to put get_snapshot_version() call lower for that,
because box.* isn't accessible until box's metatable is reset.
Please find the corresponding changes below.

>
>> +    local snap_pattern = snap_dir..'/'..string.rep('[0-9]', 20)..'.snap'
>> +    local snap_list = fio.glob(snap_pattern)
>> +    table.sort(snap_list)
>> +    local snap = snap_list[#snap_list]
>> +    if snap == nil then
>> +        return nil
>> +    end
>> +    local version = nil
>> +    for _, row in xlog.pairs(snap) do
>> +        local sid = row.BODY and row.BODY.space_id
>> +        if sid == SCHEMA_ID then
>> +            local tuple = row.BODY.tuple
>> +            if tuple and tuple[1] == 'version' then
>> +                local major, minor, patch = tuple[2], tuple[3], tuple[4] or 0
>> +                if major and minor and patch and type(major) == 'number' and
>> +                   type(minor) == 'number' and type(patch) == 'number' then
>> +                    version = mkversion(major, minor, patch)
>> +                    break
>> +                end
>> +            end
>> +        elseif sid and sid > SCHEMA_ID then
>> +            -- Exit early if version wasn't found in _schema space.
>> +            -- Snapshot rows are ordered by space id.
>> +            break
>> +        end
>> +    end
>> +    return version
>> +end
>> +
>> --------------------------------------------------------------------------------
>> -- Bootstrap
>> --------------------------------------------------------------------------------
>> @@ -131,6 +168,135 @@ local function create_sysview(source_id, target_id)
>>      end
>> end
>>
>> +--------------------------------------------------------------------------------
>> +-- Tarantool 1.7.1
>> +--------------------------------------------------------------------------------
>> +local function user_trig_1_7_1(_, tuple)
>> +    if tuple[3] == 'guest' and not tuple[5] then
>> +        local auth_method_list = {}
>> +        auth_method_list["chap-sha1"] = box.schema.user.password("")
>> +        tuple = tuple:update{{'=', 5, auth_method_list}}
>> +        log.info("Set empty password to user 'guest'")
>> +    end
>> +    return tuple
>> +end
>> +
>> +--------------------------------------------------------------------------------
>> +-- Tarantool 1.7.2
>> +--------------------------------------------------------------------------------
>> +local function index_trig_1_7_2(_, tuple)
>> +    local field_types_v16 = {
>> +        num = 'unsigned',
>> +        int = 'integer',
>> +        str = 'string',
>> +    }
>> +    local parts = tuple[6]
>> +    local changed = false
>> +    for _, part in pairs(parts) do
>> +        local field_type = part[2]:lower()
>> +        if field_types_v16[field_type] ~= nil then
>> +            part[2] = field_types_v16[field_type]
>> +            changed = true
>> +        end
>> +    end
>> +    if changed then
>> +        log.info("Updating index '%s' on space '%s': set parts to %s", tuple[3],
>> +                 box.space[tuple[1]].name, json.encode(parts))
>> +        tuple = tuple:update{{'=', 6, parts}}
>> +    end
>> +    return tuple
>> +end
>> +
>> +--------------------------------------------------------------------------------
>> +-- Tarantool 1.7.5
>> +--------------------------------------------------------------------------------
>> +local function create_truncate_space()
>> +    local _truncate = box.space[box.schema.TRUNCATE_ID]
>> +
>> +    log.info("create space _truncate")
>> +    box.space._space:insert{_truncate.id, ADMIN, '_truncate', 'memtx', 0, setmap({}),
>> +                            {{name = 'id', type = 'unsigned'}, {name = 'count', type = 'unsigned'}}}
>> +
>> +    log.info("create index primary on _truncate")
>> +    box.space._index:insert{_truncate.id, 0, 'primary', 'tree', {unique = true}, {{0, 'unsigned'}}}
>> +
>> +    local _priv = box.space[box.schema.PRIV_ID]
>> +    _priv:insert{ADMIN, PUBLIC, 'space', _truncate.id, box.priv.W}
>> +end
>> +
>> +local function upgrade_to_1_7_5()
>> +    create_truncate_space()
> 7. Is it an intended nesting of a single-call function?

Yep. I wanted to keep the same structure as for other handlers.

>
>> +end
>> +
>> +local function user_trig_1_7_5(_, tuple)
>> +    if not tuple[5] then
>> +        tuple = tuple:update{{'=', 5, setmap({})}}
>> +        log.info("Set empty password to %s '%s'", tuple[4], tuple[3])
>> +    end
>> +    return tuple
>> +end
>> +

Incremental diff:

=================================

diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index ea73337b7..2b7c7656c 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -695,12 +695,6 @@ local function load_cfg(cfg)
          return box.error() -- re-throw exception from check_cfg()
      end

-    local snap_dir = box.cfg.memtx_dir
-    local snap_version = private.get_snapshot_version(snap_dir)
-    if snap_version and private.need_recovery_triggers(snap_version) then
-        private.set_recovery_triggers(snap_version)
-    end
-
      -- NB: After this point the function should not raise an
      -- error.
      --
@@ -725,6 +719,17 @@ local function load_cfg(cfg)
              __call = locked(reload_cfg),
          })

+    -- Check schema version of the snapshot we're about to recover, if any.
+    -- Some schema versions (below 1.7.5) are incompatible with 
Tarantool 2.x
+    -- When recovering from such an old snapshot, special recovery 
triggers on
+    -- system spaces are needed in order to be able to recover and upgrade
+    -- the schema then.
+    local snap_dir = box.cfg.memtx_dir
+    local snap_version = private.get_snapshot_version(snap_dir)
+    if snap_version and private.need_recovery_triggers(snap_version) then
+        private.set_recovery_triggers(snap_version)
+    end
+
      -- This call either succeeds or calls panic() / exit().
      private.cfg_load()

diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
index c1f3a027a..8ee6a114f 100644
--- a/src/box/lua/upgrade.lua
+++ b/src/box/lua/upgrade.lua
@@ -93,7 +93,6 @@ end
  -- recovering the snapshot, because some schema versions are too old 
and cannot
  -- be recovered normally.
  local function get_snapshot_version(snap_dir)
-    local SCHEMA_ID = 272
      local snap_pattern = snap_dir..'/'..string.rep('[0-9]', 20)..'.snap'
      local snap_list = fio.glob(snap_pattern)
      table.sort(snap_list)
@@ -104,7 +103,7 @@ local function get_snapshot_version(snap_dir)
      local version = nil
      for _, row in xlog.pairs(snap) do
          local sid = row.BODY and row.BODY.space_id
-        if sid == SCHEMA_ID then
+        if sid == box.schema.SCHEMA_ID then
              local tuple = row.BODY.tuple
              if tuple and tuple[1] == 'version' then
                  local major, minor, patch = tuple[2], tuple[3], 
tuple[4] or 0
@@ -114,7 +113,7 @@ local function get_snapshot_version(snap_dir)
                      break
                  end
              end
-        elseif sid and sid > SCHEMA_ID then
+        elseif sid and sid > box.schema.SCHEMA_ID then
              -- Exit early if version wasn't found in _schema space.
              -- Snapshot rows are ordered by space id.
              break


=================================

-- 
Serge Petrenko



More information about the Tarantool-patches mailing list