Tarantool development patches archive
From: "Alexander V. Tikhonov" <avtikhon@tarantool.org>
To: Oleg Piskunov <o.piskunov@tarantool.org>,
	Sergey Bronnikov <sergeyb@tarantool.org>,
	Alexander Turenko <alexander.turenko@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: [Tarantool-patches] [PATCH v1 1/3] Add metafiles cleanup routines at S3 pack script
Date: Mon, 30 Mar 2020 08:38:04 +0300
Message-ID: <f570f79198fef1a9d55f28a43bc4d862d7639f85.1585546306.git.avtikhon@tarantool.org>
In-Reply-To: <cover.1585546306.git.avtikhon@tarantool.org>

Added cleanup functionality for the meta files.
The script handles the following situations:

 - package files were removed from S3, but are still registered:
   the script stores and registers the new packages at S3 and
   removes all the other registered blocks for the same
   files from the meta files.

 - package files already exist at S3 with the same hashes:
   the script skips them with a warning message.

 - package files already exist at S3 with old hashes:
   the script fails without the force flag, otherwise it stores
   and registers the new packages at S3 and removes all the other
   registered blocks for the same files from the meta files.

Added the '-s|--skip_errors' option flag to skip errors on changed
packages, so that the script does not exit on them.

Follow-up #3380
---
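Editorial note: the three situations from the commit message, together with the '-f' and '-s' flags, can be read as a small decision table. The sketch below is only an illustration of that table as the diff appears to implement it; `decide_action` and its action names are hypothetical and are not part of tools/update_repo.sh.

```shell
# Hypothetical helper (not in tools/update_repo.sh): maps the state of one
# package to the action described in the commit message.
#   $1 - 1 if the file name is registered in the saved meta file
#   $2 - 1 if the package file itself is present in S3
#   $3 - 1 if the registered hash equals the hash of the new package
#   $4 - 1 if -f|--force was given
#   $5 - 1 if -s|--skip_errors was given
decide_action() {
    if [ "$1" != "1" ]; then
        echo "register"           # new package: store and register it
    elif [ "$2" != "1" ]; then
        echo "cleanup+register"   # stale meta data: drop old blocks, register again
    elif [ "$3" = "1" ]; then
        echo "skip"               # already published: warn and pass
    elif [ "$4" = "1" ]; then
        echo "cleanup+register"   # changed package, overwrite was requested
    elif [ "$5" = "1" ]; then
        echo "warn"               # changed package, keep the run going
    else
        echo "fail"               # changed package, stop the run
    fi
}

# Example: a changed package that still exists in S3, run with -s only.
decide_action 1 1 0 "" 1
```

The ordering of the checks matters: the "file is missing in S3" case is tested before the hashes are compared, which mirrors how the `file_exists` variable gates the early `return`/`continue` in the patch.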
 tools/update_repo.sh | 204 +++++++++++++++++++++++++++++++++----------
 1 file changed, 159 insertions(+), 45 deletions(-)

diff --git a/tools/update_repo.sh b/tools/update_repo.sh
index f49569b73..ddc44d118 100755
--- a/tools/update_repo.sh
+++ b/tools/update_repo.sh
@@ -9,6 +9,7 @@ ws_prefix=/tmp/tarantool_repo_s3
 alloss='ubuntu debian el fedora'
 product=tarantool
 force=
+skip_errors=
 # the path with binaries either repository
 repo=.
 
@@ -82,6 +83,8 @@ EOF
          Product name to be packed with, default name is 'tarantool'
     -f|--force
          Force updating the remote package with the local one despite the checksum difference
+    -s|--skip_errors
+         Skip failing on changed packages
     -h|--help
          Usage help message
 EOF
@@ -114,6 +117,9 @@ case $i in
     -f|--force)
     force=1
     ;;
+    -s|--skip_errors)
+    skip_errors=1
+    ;;
     -h|--help)
     usage
     exit 0
@@ -169,6 +175,9 @@ function update_deb_packfile {
 function update_deb_metadata {
     packpath=$1
     packtype=$2
+    packfile=$3
+
+    file_exists=''
 
     if [ ! -f $packpath.saved ] ; then
         # get the latest Sources file from S3 either create empty file
@@ -185,38 +194,94 @@ function update_deb_metadata {
         # find the hash from the new Sources file
         hash=$(grep '^Checksums-Sha256:' -A3 $packpath | \
             tail -n 1 | awk '{print $1}')
+        # check if the file already exists in S3
+        if $aws ls "$bucket_path/$packfile" ; then
+            echo "WARNING: DSC file already exists in S3!"
+            file_exists=$bucket_path/$packfile
+        fi
         # search the new hash in the old Sources file from S3
         if grep " $hash .* .*$" $packpath.saved ; then
             echo "WARNING: DSC file already registered in S3!"
-            return
+            echo "File hash: $hash"
+            if [ "$file_exists" != "" ] ; then
+                return
+            fi
         fi
         # check if the DSC file already exists in old Sources file from S3
         file=$(grep '^Files:' -A3 $packpath | tail -n 1 | awk '{print $3}')
-        if [ "$force" == "" ] && grep " .* .* $file$" $packpath.saved ; then
-            echo "ERROR: the file already exists, but changed, set '-f' to overwrite it: $file"
-            echo "New hash: $hash"
-            # unlock the publishing
-            $rm_file $ws_lockfile
-            exit 1
+        if grep " .* .* $file$" $packpath.saved ; then
+            if [ "$force" == "" -a "$file_exists" != "" ] ; then
+                if [ "$skip_errors" == "" ] ; then
+                    echo "ERROR: the file already exists, but changed, set '-f' to overwrite it: $file"
+                    echo "New hash: $hash"
+                    # unlock the publishing
+                    $rm_file $ws_lockfile
+                    exit 1
+                else
+                    echo "WARNING: the file already exists, but changed, set '-f' to overwrite it: $file"
+                    echo "New hash: $hash"
+                    return
+                fi
+            fi
+            hashes_old=$(grep '^Checksums-Sha256:' -A3 $packpath.saved | \
+                grep " .* .* $file" | awk '{print $1}')
+            # NOTE: a damaged file may contain more than one entry for
+            #       a single file name; to fix it, all found entries
+            #       of this file need to be removed
+            # find and remove all package blocks for the bad hashes
+            for hash_rm in $hashes_old ; do
+                echo "Removing from $packpath.saved file old hash: $hash_rm"
+                pcregrep -Mi -v "(?s)Package: (\N+\n)+(?=^ ${hash_rm}).*?^$" \
+                    $packpath.saved >$packpath.saved_new
+                mv $packpath.saved_new $packpath.saved
+            done
         fi
         updated_dsc=1
     elif [ "$packtype" == "deb" ]; then
         # check if the DEB file already exists in old Packages file from S3
         # find the hash from the new Packages file
-        hash=$(grep '^SHA256: ' $packpath)
+        hash=$(grep '^SHA256: ' $packpath | awk '{print $2}')
+        # check if the file already exists in S3
+        if $aws ls "$bucket_path/$packfile" ; then
+            echo "WARNING: DEB file already exists in S3!"
+            file_exists=$bucket_path/$packfile
+        fi
         # search the new hash in the old Packages file from S3
         if grep "^SHA256: $hash" $packpath.saved ; then
             echo "WARNING: DEB file already registered in S3!"
-            return
+            echo "File hash: $hash"
+            if [ "$file_exists" != "" ] ; then
+                return
+            fi
         fi
         # check if the DEB file already exists in old Packages file from S3
         file=$(grep '^Filename:' $packpath | awk '{print $2}')
-        if [ "$force" == "" ] && grep "Filename: $file$" $packpath.saved ; then
-            echo "ERROR: the file already exists, but changed, set '-f' to overwrite it: $file"
-            echo "New hash: $hash"
-            # unlock the publishing
-            $rm_file $ws_lockfile
-            exit 1
+        if grep "Filename: $file$" $packpath.saved ; then
+            if [ "$force" == "" -a "$file_exists" != "" ] ; then
+                if [ "$skip_errors" == "" ] ; then
+                    echo "ERROR: the file already exists, but changed, set '-f' to overwrite it: $file"
+                    echo "New hash: $hash"
+                    # unlock the publishing
+                    $rm_file $ws_lockfile
+                    exit 1
+                else
+                    echo "WARNING: the file already exists, but changed, set '-f' to overwrite it: $file"
+                    echo "New hash: $hash"
+                    return
+                fi
+            fi
+            hashes_old=$(grep -e "^Filename: " -e "^SHA256: " $packpath.saved | \
+                grep -A1 "$file" | grep "^SHA256: " | awk '{print $2}')
+            # NOTE: a damaged file may contain more than one entry for
+            #       a single file name; to fix it, all found entries
+            #       of this file need to be removed
+            # find and remove all package blocks for the bad hashes
+            for hash_rm in $hashes_old ; do
+                echo "Removing from $packpath.saved file old hash: $hash_rm"
+                pcregrep -Mi -v "(?s)Package: (\N+\n)+(?=SHA256: ${hash_rm}).*?^$" \
+                    $packpath.saved >$packpath.saved_new
+                mv $packpath.saved_new $packpath.saved
+            done
         fi
         updated_deb=1
     fi
@@ -248,9 +313,6 @@ function pack_deb {
         exit 1
     fi
 
-    # prepare the workspace
-    prepare_ws ${os}
-
     # set the subpath with binaries based on literal character of the product name
     proddir=$(echo $product | head -c 1)
 
@@ -297,7 +359,7 @@ EOF
             for packages in dists/$loop_dist/$component/binary-*/Packages ; do
                 # copy Packages file to avoid of removing by the new DEB version
                 # update metadata 'Packages' files
-                update_deb_metadata $packages deb
+                update_deb_metadata $packages deb $locpackfile
                 [ "$updated_deb" == "1" ] || continue
                 updated_files=1
             done
@@ -316,7 +378,8 @@ EOF
             echo "Regenerated DSC file: $locpackfile"
             # copy Sources file to avoid of removing by the new DSC version
             # update metadata 'Sources' file
-            update_deb_metadata dists/$loop_dist/$component/source/Sources dsc
+            update_deb_metadata dists/$loop_dist/$component/source/Sources dsc \
+                $locpackfile
             [ "$updated_dsc" == "1" ] || continue
             updated_files=1
             # save the registered DSC file to S3
@@ -398,11 +461,6 @@ EOF
         # 4. sync the latest distribution path changes to S3
         $aws_sync_public dists/$loop_dist "$bucket_path/dists/$loop_dist"
     done
-
-    # unlock the publishing
-    $rm_file $ws_lockfile
-
-    popd
 }
 
 # The 'pack_rpm' function especialy created for RPM packages. It works
@@ -426,9 +484,6 @@ function pack_rpm {
         exit 1
     fi
 
-    # prepare the workspace
-    prepare_ws ${os}_${option_dist}
-
     # copy the needed package binaries to the workspace
     ( cd $repo && cp $pack_rpms $ws/. )
 
@@ -460,29 +515,76 @@ function pack_rpm {
     for hash in $(zcat repodata/other.xml.gz | grep "<package pkgid=" | \
         awk -F'"' '{print $2}') ; do
         updated_rpm=0
+        file_exists=''
         name=$(zcat repodata/other.xml.gz | grep "<package pkgid=\"$hash\"" | \
             awk -F'"' '{print $4}')
+        file=$(zcat repodata/primary.xml.gz | \
+            grep -e "<checksum type=" -e "<location href=" | \
+            grep "$hash" -A1 | grep "<location href=" | \
+            awk -F'"' '{print $2}')
+        # check if the file already exists in S3
+        if $aws ls "$bucket_path/$repopath/$file" ; then
+            echo "WARNING: RPM file already exists in S3!"
+            file_exists=$bucket_path/$repopath/$file
+        fi
         # search the new hash in the old meta file from S3
         if zcat repodata.base/filelists.xml.gz | grep "pkgid=\"$hash\"" | \
             grep "name=\"$name\"" ; then
             echo "WARNING: $name file already registered in S3!"
             echo "File hash: $hash"
-            continue
+            if [ "$file_exists" != "" ] ; then
+                continue
+            fi
         fi
         updated_rpms=1
         # check if the hashed file already exists in old meta file from S3
-        file=$(zcat repodata/primary.xml.gz | \
-            grep -e "<checksum type=" -e "<location href=" | \
-            grep "$hash" -A1 | grep "<location href=" | \
-            awk -F'"' '{print $2}')
-        # check if the file already exists in S3
-        if [ "$force" == "" ] && zcat repodata.base/primary.xml.gz | \
+        if zcat repodata.base/primary.xml.gz | \
                 grep "<location href=\"$file\"" ; then
-            echo "ERROR: the file already exists, but changed, set '-f' to overwrite it: $file"
-            echo "New hash: $hash"
-            # unlock the publishing
-            $rm_file $ws_lockfile
-            exit 1
+            if [ "$force" == "" -a "$file_exists" != "" ] ; then
+                if [ "$skip_errors" == "" ] ; then
+                    echo "ERROR: the file already exists, but changed, set '-f' to overwrite it: $file"
+                    echo "New hash: $hash"
+                    # unlock the publishing
+                    $rm_file $ws_lockfile
+                    exit 1
+                else
+                    echo "WARNING: the file already exists, but changed, set '-f' to overwrite it: $file"
+                    echo "New hash: $hash"
+                    continue
+                fi
+            fi
+            hashes_old=$(zcat repodata.base/primary.xml.gz | \
+                grep -e "<checksum type=" -e "<location href=" | \
+                grep -B1 "$file" | grep "<checksum type=" | \
+                awk -F'>' '{print $2}' | sed 's#<.*##g')
+            # NOTE: a damaged file may contain more than one entry for
+            #       a single file name; to fix it, all found entries
+            #       of this file need to be removed
+            for metafile in repodata.base/other \
+                            repodata.base/filelists \
+                            repodata.base/primary ; do
+                up_full_lines=''
+                if [ "$metafile" == "repodata.base/primary" ]; then
+                    up_full_lines='(\N+\n)*'
+                fi
+                packs_rm=0
+                # find and remove all <package> tags for the bad hashes
+                for hash_rm in $hashes_old ; do
+                    echo "Removing from ${metafile}.xml.gz file old hash: $hash_rm"
+                    zcat ${metafile}.xml.gz | \
+                        pcregrep -Mi -v "(?s)<package ${up_full_lines}\N+(?=${hash_rm}).*?package>" | \
+                        gzip - >${metafile}_new.xml.gz
+                    mv ${metafile}_new.xml.gz ${metafile}.xml.gz
+                    packs_rm=$(($packs_rm+1))
+                done
+                # reduce number of packages in metafile counter
+                gunzip ${metafile}.xml.gz
+                packs=$(($(grep " packages=" ${metafile}.xml | \
+                    sed 's#.* packages="\([0-9]*\)".*#\1#g')-${packs_rm}))
+                sed "s# packages=\"[0-9]*\"# packages=\"${packs}\"#g" \
+                    -i ${metafile}.xml
+                gzip ${metafile}.xml
+            done
         fi
     done
 
@@ -554,22 +656,34 @@ EOF
 
     # update the metadata at the S3
     $aws_sync_public repodata "$bucket_path/$repopath/repodata"
-
-    # unlock the publishing
-    $rm_file $ws_lockfile
-
-    popd
 }
 
 if [ "$os" == "ubuntu" -o "$os" == "debian" ]; then
+    # prepare the workspace
+    prepare_ws ${os}
     pack_deb
+    # unlock the publishing
+    $rm_file $ws_lockfile
+    popd
 elif [ "$os" == "el" -o "$os" == "fedora" ]; then
     # RPM packages structure needs different paths for binaries and sources
     # packages, in this way it is needed to call the packages registering
     # script twice with the given format:
     # pack_rpm <packages store subpath> <patterns of the packages to register>
+
+    # prepare the workspace
+    prepare_ws ${os}_${option_dist}
     pack_rpm x86_64 "*.x86_64.rpm *.noarch.rpm"
+    # unlock the publishing
+    $rm_file $ws_lockfile
+    popd
+
+    # prepare the workspace
+    prepare_ws ${os}_${option_dist}
     pack_rpm SRPMS "*.src.rpm"
+    # unlock the publishing
+    $rm_file $ws_lockfile
+    popd
 else
     echo "USAGE: given OS '$os' is not supported, use any single from the list: $alloss"
     usage
-- 
2.17.1
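
Editorial note: the DEB branch of the patch removes every 'Package:' block that carries an old hash by running `pcregrep -Mi -v` over the saved meta file. The same idea can be sketched with awk in paragraph mode, which treats each blank-line-separated stanza as one record. This is a different technique from the one used in the patch, shown only to make the block removal easier to follow; `remove_stanza_by_hash` is a hypothetical name, and the sketch assumes stanzas are separated by empty lines.

```shell
# Hypothetical helper (not in tools/update_repo.sh): print a Debian
# 'Packages' index without the stanzas whose SHA256 field equals the hash.
#   $1 - path to the Packages file
#   $2 - SHA256 hash of the stanza(s) to drop
remove_stanza_by_hash() {
    awk -v hash="$2" '
        BEGIN { RS = ""; ORS = "\n\n" }   # paragraph mode: one record per stanza
        {
            n = split($0, lines, "\n")
            for (i = 1; i <= n; i++)
                if (lines[i] == "SHA256: " hash)
                    next                  # drop the whole stanza
            print
        }
    ' "$1"
}

# Example usage, writing through a temporary file the way the patch does:
#   remove_stanza_by_hash Packages.saved "$hash_rm" >Packages.saved_new
#   mv Packages.saved_new Packages.saved
```

Comparing the whole line against "SHA256: <hash>" avoids matching the hash as a substring of another field.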

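Editorial note: after removing `<package>` tags from the RPM meta files, the patch also lowers the `packages="N"` counter in each file. A minimal sketch of that step on an already uncompressed file is given below; `decrease_packages_counter` is a hypothetical name, and the sketch assumes the counter attribute appears once, on the root element.

```shell
# Hypothetical helper (not in tools/update_repo.sh): print the XML meta file
# with its packages="N" counter reduced by the number of removed packages.
#   $1 - path to the uncompressed meta file (e.g. primary.xml)
#   $2 - number of removed <package> entries
decrease_packages_counter() {
    # read the current counter value from the first line that carries it
    old=$(sed -n 's#.* packages="\([0-9]*\)".*#\1#p' "$1" | head -n 1)
    new=$((old - $2))
    # write the file to stdout with the updated counter
    sed "s# packages=\"[0-9]*\"# packages=\"${new}\"#" "$1"
}

# Example usage:
#   decrease_packages_counter primary.xml 2 >primary_new.xml
```
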
Thread overview: 7+ messages
2020-03-30  5:38 [Tarantool-patches] [PATCH v1 0/3] extend packages pushing to S3 script Alexander V. Tikhonov
2020-03-30  5:38 ` Alexander V. Tikhonov [this message]
2020-03-30  5:38 ` [Tarantool-patches] [PATCH v1 2/3] Enable script for saving packages in S3 for modules Alexander V. Tikhonov
2020-03-30  5:38 ` [Tarantool-patches] [PATCH v1 3/3] Add help instruction on 'product' option Alexander V. Tikhonov
2020-03-30 14:40 ` [Tarantool-patches] [PATCH v1 0/3] extend packages pushing to S3 script Sergey Bronnikov
2020-04-11  5:04   ` Oleg Piskunov
2020-04-15 13:51 ` Kirill Yukhin
