[Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing


* [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing
@ 2025-10-24 10:50 Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 01/41] perf: add LuaJIT-test-cleanup perf suite Sergey Kaplun via Tarantool-patches
                   ` (40 more replies)
  0 siblings, 41 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patchset introduces a performance testing suite for LuaJIT. It
takes the benchmarks from the LuaJIT-test-cleanup repository [1] and
adapts them to use a custom benchmark module that reports results in a
Google Benchmark-like format. All results are collected and reported to
InfluxDB, as is done for Tarantool's tests.
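
For readers unfamiliar with the Google Benchmark output style, here is a
minimal sketch of the idea: time a payload over a fixed number of
iterations and print one line per benchmark with the name, the time per
iteration, and the iteration count. The names `run_bench` and
`format_result` are hypothetical and are not the API of
<perf/utils/bench.lua>; the sketch also uses only `os.clock()` (CPU
time), whereas a real harness would probably want a monotonic wall clock
as well.

```lua
-- Hypothetical sketch, NOT the interface of perf/utils/bench.lua.
-- Runs `payload` `iterations` times and measures the CPU time spent.
local function run_bench(name, payload, iterations)
  local start = os.clock()
  for _ = 1, iterations do
    payload()
  end
  local elapsed = os.clock() - start
  return {
    name = name,
    iterations = iterations,
    -- Average CPU time per iteration, in nanoseconds.
    cpu_time_ns = elapsed / iterations * 1e9,
  }
end

-- Formats a result as a single "name  time  iterations" report line.
local function format_result(res)
  return string.format("%-24s %14.0f ns %12d",
    res.name, res.cpu_time_ns, res.iterations)
end

local res = run_bench("sum_loop", function()
  local s = 0
  for i = 1, 1000 do s = s + i end
  return s
end, 100)

print(format_result(res))
```

The iteration count is fixed here to keep the sketch short; choosing it
adaptively until the run is long enough to measure is a common
alternative.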

The results of the following benchmarks are not very stable. This should
be investigated later (I appreciate any help with this):

* array3d
* binary-trees
* euler14-bit
* k-nucleotide
* nsieve (most unstable)
* nsieve-bit
* spectral-norm
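
One way to put a number on "not very stable" is the coefficient of
variation (sample standard deviation divided by the mean) over several
runs of the same benchmark. The sketch below is only an illustration of
that metric: the function name `run_stats` is hypothetical, it is not
part of this patchset, and it assumes at least two samples with a
non-zero mean.

```lua
-- Hypothetical helper, not part of the patchset.
-- Takes an array of at least two timings and returns the mean,
-- the sample standard deviation, and the coefficient of variation.
local function run_stats(samples)
  local n = #samples
  assert(n >= 2, "need at least two samples")
  local sum = 0
  for i = 1, n do
    sum = sum + samples[i]
  end
  local mean = sum / n
  local sq = 0
  for i = 1, n do
    local d = samples[i] - mean
    sq = sq + d * d
  end
  -- Sample (n - 1) variance, since the runs are a sample of all runs.
  local stddev = math.sqrt(sq / (n - 1))
  return mean, stddev, stddev / mean
end

local mean, stddev, cv = run_stats({ 1, 2, 3 })
print(mean, stddev, cv)
```

A benchmark whose coefficient of variation stays above some agreed
threshold across runs would be a candidate for the investigation
mentioned above.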

I've also added notes to the commits where I'm not sure that the
implementation or solution is a good one. Any insights are welcome :).

[1]: https://github.com/LuaJIT/LuaJIT-test-cleanup/tree/014708b/bench

Sergey Kaplun (41):
  perf: add LuaJIT-test-cleanup perf suite
  perf: introduce clock module
  perf: introduce bench module
  perf: adjust array3d in LuaJIT-benches
  perf: adjust binary-trees in LuaJIT-benches
  perf: adjust chameneos in LuaJIT-benches
  perf: adjust coroutine-ring in LuaJIT-benches
  perf: adjust euler14-bit in LuaJIT-benches
  perf: adjust fannkuch in LuaJIT-benches
  perf: adjust fasta in LuaJIT-benches
  perf: adjust k-nucleotide in LuaJIT-benches
  perf: adjust life in LuaJIT-benches
  perf: adjust mandelbrot-bit in LuaJIT-benches
  perf: adjust mandelbrot in LuaJIT-benches
  perf: adjust md5 in LuaJIT-benches
  perf: adjust meteor in LuaJIT-benches
  perf: adjust nbody in LuaJIT-benches
  perf: adjust nsieve-bit-fp in LuaJIT-benches
  perf: adjust nsieve-bit in LuaJIT-benches
  perf: adjust nsieve in LuaJIT-benches
  perf: adjust partialsums in LuaJIT-benches
  perf: adjust pidigits-nogmp in LuaJIT-benches
  perf: adjust ray in LuaJIT-benches
  perf: adjust recursive-ack in LuaJIT-benches
  perf: adjust recursive-fib in LuaJIT-benches
  perf: adjust revcomp in LuaJIT-benches
  perf: adjust scimark-2010-12-20 in LuaJIT-benches
  perf: move <scimark_lib.lua> to <libs/> directory
  perf: adjust scimark-fft in LuaJIT-benches
  perf: adjust scimark-lu in LuaJIT-benches
  perf: add scimark-mc in LuaJIT-benches
  perf: adjust scimark-sor in LuaJIT-benches
  perf: adjust scimark-sparse in LuaJIT-benches
  perf: adjust series in LuaJIT-benches
  perf: adjust spectral-norm in LuaJIT-benches
  perf: adjust sum-file in LuaJIT-benches
  perf: add CMake infrastructure
  perf: add aggregator helper for bench statistics
  perf: add a script for the environment setup
  perf: provide CMake option to setup the benchmark
  ci: introduce the performance workflow

 .github/actions/setup-performance/README.md  |   10 +
 .github/actions/setup-performance/action.yml |   18 +
 .github/workflows/performance.yml            |  110 ++
 .gitignore                                   |    5 +
 .luacheckrc                                  |    1 +
 CMakeLists.txt                               |   11 +
 perf/CMakeLists.txt                          |  119 +++
 perf/LuaJIT-benches/CMakeLists.txt           |   52 +
 perf/LuaJIT-benches/PARAM_arm.txt            |   29 +
 perf/LuaJIT-benches/PARAM_mips.txt           |   29 +
 perf/LuaJIT-benches/PARAM_ppc.txt            |   29 +
 perf/LuaJIT-benches/PARAM_x86.txt            |   29 +
 perf/LuaJIT-benches/SUMCOL_1.txt             | 1000 ++++++++++++++++++
 perf/LuaJIT-benches/TEST_md5sum.txt          |   20 +
 perf/LuaJIT-benches/array3d.lua              |   74 ++
 perf/LuaJIT-benches/binary-trees.lua         |  105 ++
 perf/LuaJIT-benches/chameneos.lua            |   82 ++
 perf/LuaJIT-benches/coroutine-ring.lua       |   53 +
 perf/LuaJIT-benches/euler14-bit.lua          |   42 +
 perf/LuaJIT-benches/fannkuch.lua             |   81 ++
 perf/LuaJIT-benches/fasta.lua                |   29 +
 perf/LuaJIT-benches/k-nucleotide.lua         |  129 +++
 perf/LuaJIT-benches/libs/fasta.lua           |   98 ++
 perf/LuaJIT-benches/libs/scimark_lib.lua     |  297 ++++++
 perf/LuaJIT-benches/life.lua                 |  188 ++++
 perf/LuaJIT-benches/mandelbrot-bit.lua       |   61 ++
 perf/LuaJIT-benches/mandelbrot.lua           |   49 +
 perf/LuaJIT-benches/md5.lua                  |  196 ++++
 perf/LuaJIT-benches/meteor.lua               |  246 +++++
 perf/LuaJIT-benches/nbody.lua                |  140 +++
 perf/LuaJIT-benches/nsieve-bit-fp.lua        |   62 ++
 perf/LuaJIT-benches/nsieve-bit.lua           |   52 +
 perf/LuaJIT-benches/nsieve.lua               |   46 +
 perf/LuaJIT-benches/partialsums.lua          |   44 +
 perf/LuaJIT-benches/pidigits-nogmp.lua       |  121 +++
 perf/LuaJIT-benches/ray.lua                  |  159 +++
 perf/LuaJIT-benches/recursive-ack.lua        |   23 +
 perf/LuaJIT-benches/recursive-fib.lua        |   31 +
 perf/LuaJIT-benches/revcomp.lua              |   59 ++
 perf/LuaJIT-benches/scimark-2010-12-20.lua   |  415 ++++++++
 perf/LuaJIT-benches/scimark-fft.lua          |   18 +
 perf/LuaJIT-benches/scimark-lu.lua           |   19 +
 perf/LuaJIT-benches/scimark-mc.lua           |   19 +
 perf/LuaJIT-benches/scimark-sor.lua          |   19 +
 perf/LuaJIT-benches/scimark-sparse.lua       |   19 +
 perf/LuaJIT-benches/series.lua               |   42 +
 perf/LuaJIT-benches/spectral-norm.lua        |   58 +
 perf/LuaJIT-benches/sum-file.lua             |   25 +
 perf/helpers/aggregate.lua                   |  124 +++
 perf/helpers/setup_env.sh                    |  135 +++
 perf/utils/bench.lua                         |  509 +++++++++
 perf/utils/clock.lua                         |   35 +
 52 files changed, 5366 insertions(+)
 create mode 100644 .github/actions/setup-performance/README.md
 create mode 100644 .github/actions/setup-performance/action.yml
 create mode 100644 .github/workflows/performance.yml
 create mode 100644 perf/CMakeLists.txt
 create mode 100644 perf/LuaJIT-benches/CMakeLists.txt
 create mode 100644 perf/LuaJIT-benches/PARAM_arm.txt
 create mode 100644 perf/LuaJIT-benches/PARAM_mips.txt
 create mode 100644 perf/LuaJIT-benches/PARAM_ppc.txt
 create mode 100644 perf/LuaJIT-benches/PARAM_x86.txt
 create mode 100644 perf/LuaJIT-benches/SUMCOL_1.txt
 create mode 100644 perf/LuaJIT-benches/TEST_md5sum.txt
 create mode 100644 perf/LuaJIT-benches/array3d.lua
 create mode 100644 perf/LuaJIT-benches/binary-trees.lua
 create mode 100644 perf/LuaJIT-benches/chameneos.lua
 create mode 100644 perf/LuaJIT-benches/coroutine-ring.lua
 create mode 100644 perf/LuaJIT-benches/euler14-bit.lua
 create mode 100644 perf/LuaJIT-benches/fannkuch.lua
 create mode 100644 perf/LuaJIT-benches/fasta.lua
 create mode 100644 perf/LuaJIT-benches/k-nucleotide.lua
 create mode 100644 perf/LuaJIT-benches/libs/fasta.lua
 create mode 100644 perf/LuaJIT-benches/libs/scimark_lib.lua
 create mode 100644 perf/LuaJIT-benches/life.lua
 create mode 100644 perf/LuaJIT-benches/mandelbrot-bit.lua
 create mode 100644 perf/LuaJIT-benches/mandelbrot.lua
 create mode 100644 perf/LuaJIT-benches/md5.lua
 create mode 100644 perf/LuaJIT-benches/meteor.lua
 create mode 100644 perf/LuaJIT-benches/nbody.lua
 create mode 100644 perf/LuaJIT-benches/nsieve-bit-fp.lua
 create mode 100644 perf/LuaJIT-benches/nsieve-bit.lua
 create mode 100644 perf/LuaJIT-benches/nsieve.lua
 create mode 100644 perf/LuaJIT-benches/partialsums.lua
 create mode 100644 perf/LuaJIT-benches/pidigits-nogmp.lua
 create mode 100644 perf/LuaJIT-benches/ray.lua
 create mode 100644 perf/LuaJIT-benches/recursive-ack.lua
 create mode 100644 perf/LuaJIT-benches/recursive-fib.lua
 create mode 100644 perf/LuaJIT-benches/revcomp.lua
 create mode 100644 perf/LuaJIT-benches/scimark-2010-12-20.lua
 create mode 100644 perf/LuaJIT-benches/scimark-fft.lua
 create mode 100644 perf/LuaJIT-benches/scimark-lu.lua
 create mode 100644 perf/LuaJIT-benches/scimark-mc.lua
 create mode 100644 perf/LuaJIT-benches/scimark-sor.lua
 create mode 100644 perf/LuaJIT-benches/scimark-sparse.lua
 create mode 100644 perf/LuaJIT-benches/series.lua
 create mode 100644 perf/LuaJIT-benches/spectral-norm.lua
 create mode 100644 perf/LuaJIT-benches/sum-file.lua
 create mode 100644 perf/helpers/aggregate.lua
 create mode 100755 perf/helpers/setup_env.sh
 create mode 100644 perf/utils/bench.lua
 create mode 100644 perf/utils/clock.lua

-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 01/41] perf: add LuaJIT-test-cleanup perf suite
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 02/41] perf: introduce clock module Sergey Kaplun via Tarantool-patches
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch introduces the LuaJIT-test-cleanup bench suite [1] into the
source tree of our LuaJIT fork. To provide reliable, reproducible
results, several benchmarks need to be adjusted. However, to be sure
that we start from the valid suite, everything in the
<perf/LuaJIT-benches> directory is imported intact.

[1]: https://github.com/LuaJIT/LuaJIT-test-cleanup/tree/014708b/bench
---
 .luacheckrc                                |    1 +
 perf/LuaJIT-benches/PARAM_arm.txt          |   29 +
 perf/LuaJIT-benches/PARAM_mips.txt         |   29 +
 perf/LuaJIT-benches/PARAM_ppc.txt          |   29 +
 perf/LuaJIT-benches/PARAM_x86.txt          |   29 +
 perf/LuaJIT-benches/SUMCOL_1.txt           | 1000 ++++++++++++++++++++
 perf/LuaJIT-benches/TEST_md5sum.txt        |   20 +
 perf/LuaJIT-benches/array3d.lua            |   59 ++
 perf/LuaJIT-benches/binary-trees.lua       |   47 +
 perf/LuaJIT-benches/chameneos.lua          |   68 ++
 perf/LuaJIT-benches/coroutine-ring.lua     |   42 +
 perf/LuaJIT-benches/euler14-bit.lua        |   22 +
 perf/LuaJIT-benches/fannkuch.lua           |   50 +
 perf/LuaJIT-benches/fasta.lua              |   95 ++
 perf/LuaJIT-benches/k-nucleotide.lua       |   58 ++
 perf/LuaJIT-benches/life.lua               |  111 +++
 perf/LuaJIT-benches/mandelbrot-bit.lua     |   33 +
 perf/LuaJIT-benches/mandelbrot.lua         |   23 +
 perf/LuaJIT-benches/md5.lua                |  183 ++++
 perf/LuaJIT-benches/meteor.lua             |  220 +++++
 perf/LuaJIT-benches/nbody.lua              |  119 +++
 perf/LuaJIT-benches/nsieve-bit-fp.lua      |   37 +
 perf/LuaJIT-benches/nsieve-bit.lua         |   27 +
 perf/LuaJIT-benches/nsieve.lua             |   21 +
 perf/LuaJIT-benches/partialsums.lua        |   29 +
 perf/LuaJIT-benches/pidigits-nogmp.lua     |  100 ++
 perf/LuaJIT-benches/ray.lua                |  135 +++
 perf/LuaJIT-benches/recursive-ack.lua      |    8 +
 perf/LuaJIT-benches/recursive-fib.lua      |    7 +
 perf/LuaJIT-benches/revcomp.lua            |   37 +
 perf/LuaJIT-benches/scimark-2010-12-20.lua |  400 ++++++++
 perf/LuaJIT-benches/scimark-fft.lua        |    1 +
 perf/LuaJIT-benches/scimark-lu.lua         |    1 +
 perf/LuaJIT-benches/scimark-sor.lua        |    1 +
 perf/LuaJIT-benches/scimark-sparse.lua     |    1 +
 perf/LuaJIT-benches/scimark_lib.lua        |  297 ++++++
 perf/LuaJIT-benches/series.lua             |   34 +
 perf/LuaJIT-benches/spectral-norm.lua      |   40 +
 perf/LuaJIT-benches/sum-file.lua           |    6 +
 39 files changed, 3449 insertions(+)
 create mode 100644 perf/LuaJIT-benches/PARAM_arm.txt
 create mode 100644 perf/LuaJIT-benches/PARAM_mips.txt
 create mode 100644 perf/LuaJIT-benches/PARAM_ppc.txt
 create mode 100644 perf/LuaJIT-benches/PARAM_x86.txt
 create mode 100644 perf/LuaJIT-benches/SUMCOL_1.txt
 create mode 100644 perf/LuaJIT-benches/TEST_md5sum.txt
 create mode 100644 perf/LuaJIT-benches/array3d.lua
 create mode 100644 perf/LuaJIT-benches/binary-trees.lua
 create mode 100644 perf/LuaJIT-benches/chameneos.lua
 create mode 100644 perf/LuaJIT-benches/coroutine-ring.lua
 create mode 100644 perf/LuaJIT-benches/euler14-bit.lua
 create mode 100644 perf/LuaJIT-benches/fannkuch.lua
 create mode 100644 perf/LuaJIT-benches/fasta.lua
 create mode 100644 perf/LuaJIT-benches/k-nucleotide.lua
 create mode 100644 perf/LuaJIT-benches/life.lua
 create mode 100644 perf/LuaJIT-benches/mandelbrot-bit.lua
 create mode 100644 perf/LuaJIT-benches/mandelbrot.lua
 create mode 100644 perf/LuaJIT-benches/md5.lua
 create mode 100644 perf/LuaJIT-benches/meteor.lua
 create mode 100644 perf/LuaJIT-benches/nbody.lua
 create mode 100644 perf/LuaJIT-benches/nsieve-bit-fp.lua
 create mode 100644 perf/LuaJIT-benches/nsieve-bit.lua
 create mode 100644 perf/LuaJIT-benches/nsieve.lua
 create mode 100644 perf/LuaJIT-benches/partialsums.lua
 create mode 100644 perf/LuaJIT-benches/pidigits-nogmp.lua
 create mode 100644 perf/LuaJIT-benches/ray.lua
 create mode 100644 perf/LuaJIT-benches/recursive-ack.lua
 create mode 100644 perf/LuaJIT-benches/recursive-fib.lua
 create mode 100644 perf/LuaJIT-benches/revcomp.lua
 create mode 100644 perf/LuaJIT-benches/scimark-2010-12-20.lua
 create mode 100644 perf/LuaJIT-benches/scimark-fft.lua
 create mode 100644 perf/LuaJIT-benches/scimark-lu.lua
 create mode 100644 perf/LuaJIT-benches/scimark-sor.lua
 create mode 100644 perf/LuaJIT-benches/scimark-sparse.lua
 create mode 100644 perf/LuaJIT-benches/scimark_lib.lua
 create mode 100644 perf/LuaJIT-benches/series.lua
 create mode 100644 perf/LuaJIT-benches/spectral-norm.lua
 create mode 100644 perf/LuaJIT-benches/sum-file.lua

diff --git a/.luacheckrc b/.luacheckrc
index 19098dd9..35824875 100644
--- a/.luacheckrc
+++ b/.luacheckrc
@@ -16,6 +16,7 @@ files['test/tarantool-tests/'] = {
 -- test suites and need to be coherent with the upstream.
 exclude_files = {
   'dynasm/',
+  'perf/LuaJIT-benches/',
   'src/',
   'test/LuaJIT-tests/',
   'test/PUC-Rio-Lua-5.1-tests/',
diff --git a/perf/LuaJIT-benches/PARAM_arm.txt b/perf/LuaJIT-benches/PARAM_arm.txt
new file mode 100644
index 00000000..a07fd010
--- /dev/null
+++ b/perf/LuaJIT-benches/PARAM_arm.txt
@@ -0,0 +1,29 @@
+array3d 200
+binary-trees 13
+chameneos 1e6
+coroutine-ring 3e6
+euler14-bit 5e6
+fannkuch 10
+fasta 2e6
+k-nucleotide 5e5 FASTA_500000
+life
+mandelbrot 2000
+mandelbrot-bit 2000
+md5 5000
+nbody 1e6
+nsieve 9
+nsieve-bit 9
+nsieve-bit-fp 9
+partialsums 2e6
+pidigits-nogmp 2000
+ray 4
+recursive-ack 9
+recursive-fib 37
+revcomp 1e6 FASTA_1000000
+scimark-fft 2000
+scimark-lu 300
+scimark-sor 5000
+scimark-sparse 5e3
+series 1500
+spectral-norm 1000
+sum-file 1000 SUMCOL_1000
diff --git a/perf/LuaJIT-benches/PARAM_mips.txt b/perf/LuaJIT-benches/PARAM_mips.txt
new file mode 100644
index 00000000..e6bcadba
--- /dev/null
+++ b/perf/LuaJIT-benches/PARAM_mips.txt
@@ -0,0 +1,29 @@
+array3d 50
+binary-trees 10
+chameneos 5e4
+coroutine-ring 2e5
+euler14-bit 2e4
+fannkuch 8
+fasta 2e4
+k-nucleotide 1e4 FASTA_10000
+life
+mandelbrot 150
+mandelbrot-bit 150
+md5 10
+nbody 1e4
+nsieve 4
+nsieve-bit 4
+nsieve-bit-fp 2
+partialsums 5e4
+pidigits-nogmp 150
+ray 2
+recursive-ack 7
+recursive-fib 29
+revcomp 5e4 FASTA_50000
+scimark-fft 20
+scimark-lu 3
+scimark-sor 40
+scimark-sparse 100
+series 50
+spectral-norm 100
+sum-file 100 SUMCOL_100
diff --git a/perf/LuaJIT-benches/PARAM_ppc.txt b/perf/LuaJIT-benches/PARAM_ppc.txt
new file mode 100644
index 00000000..c8319a15
--- /dev/null
+++ b/perf/LuaJIT-benches/PARAM_ppc.txt
@@ -0,0 +1,29 @@
+array3d 200
+binary-trees 13
+chameneos 1e6
+coroutine-ring 4e6
+euler14-bit 1e6
+fannkuch 9
+fasta 5e5
+k-nucleotide 1e5 FASTA_100000
+life
+mandelbrot 800
+mandelbrot-bit 800
+md5 500
+nbody 1e5
+nsieve 8
+nsieve-bit 8
+nsieve-bit-fp 8
+partialsums 5e5
+pidigits-nogmp 800
+ray 5
+recursive-ack 9
+recursive-fib 34
+revcomp 1e6 FASTA_1000000
+scimark-fft 500
+scimark-lu 100
+scimark-sor 1000
+scimark-sparse 3000
+series 1000
+spectral-norm 200
+sum-file 1000 SUMCOL_1000
diff --git a/perf/LuaJIT-benches/PARAM_x86.txt b/perf/LuaJIT-benches/PARAM_x86.txt
new file mode 100644
index 00000000..87088d7b
--- /dev/null
+++ b/perf/LuaJIT-benches/PARAM_x86.txt
@@ -0,0 +1,29 @@
+array3d 300
+binary-trees 16
+chameneos 1e7
+coroutine-ring 2e7
+euler14-bit 2e7
+fannkuch 11
+fasta 25e6
+k-nucleotide 5e6 FASTA_5000000
+life
+mandelbrot 5000
+mandelbrot-bit 5000
+md5 20000
+nbody 5e6
+nsieve 12
+nsieve-bit 12
+nsieve-bit-fp 12
+partialsums 1e7
+pidigits-nogmp 5000
+ray 9
+recursive-ack 10
+recursive-fib 40
+revcomp 5e6 FASTA_5000000
+scimark-fft 50000
+scimark-lu 5000
+scimark-sor 50000
+scimark-sparse 15e4
+series 10000
+spectral-norm 3000
+sum-file 5000 SUMCOL_5000
diff --git a/perf/LuaJIT-benches/SUMCOL_1.txt b/perf/LuaJIT-benches/SUMCOL_1.txt
new file mode 100644
index 00000000..956aba14
--- /dev/null
+++ b/perf/LuaJIT-benches/SUMCOL_1.txt
@@ -0,0 +1,1000 @@
+276
+498
+-981
+770
+-401
+702
+966
+950
+-853
+-53
+-293
+604
+288
+892
+-697
+204
+96
+408
+880
+-7
+-817
+422
+-261
+-485
+-77
+826
+184
+864
+-751
+626
+812
+-369
+-353
+-371
+488
+-83
+-659
+24
+524
+-21
+840
+-757
+-17
+-973
+-843
+260
+858
+-389
+-521
+-99
+482
+-561
+-213
+630
+766
+932
+112
+-419
+-877
+762
+266
+-837
+170
+834
+746
+764
+922
+-89
+576
+-63
+90
+684
+316
+506
+-959
+708
+70
+252
+-747
+342
+-593
+-895
+-937
+-707
+350
+588
+-201
+-683
+-113
+-511
+-867
+322
+202
+472
+150
+-9
+-643
+28
+336
+86
+-925
+836
+-473
+-451
+-971
+-805
+-619
+84
+-67
+806
+270
+366
+334
+-555
+-557
+-331
+-409
+-553
+-145
+-71
+528
+490
+492
+828
+628
+-961
+536
+-859
+-271
+974
+-671
+-749
+414
+-257
+778
+56
+598
+-437
+-899
+-785
+-987
+32
+-999
+132
+-821
+-209
+402
+-543
+194
+-967
+294
+-943
+-285
+-483
+-97
+660
+-481
+-829
+-309
+-597
+-855
+80
+-355
+192
+-823
+436
+916
+282
+-629
+612
+-329
+-535
+780
+-47
+706
+110
+756
+-857
+-933
+-345
+-523
+718
+-31
+902
+678
+540
+698
+456
+-399
+126
+412
+-563
+-321
+-487
+-641
+-195
+-199
+-955
+772
+570
+18
+-217
+886
+984
+-721
+-995
+46
+-989
+946
+64
+716
+-719
+-869
+-579
+776
+450
+936
+980
+-439
+-977
+-455
+-997
+6
+268
+-269
+-421
+328
+352
+578
+-575
+476
+976
+-57
+-469
+544
+582
+-43
+510
+-939
+-581
+-337
+-203
+-737
+-827
+852
+-279
+-803
+-911
+-865
+548
+48
+-75
+416
+-275
+688
+-255
+-687
+-461
+-233
+420
+912
+-901
+-299
+12
+568
+694
+-411
+-883
+-327
+-361
+-339
+646
+-137
+-905
+670
+686
+-131
+-849
+-825
+256
+228
+-841
+68
+368
+-909
+242
+298
+118
+10
+222
+954
+-493
+-459
+-445
+608
+-765
+34
+468
+-715
+690
+-185
+-551
+-571
+-241
+292
+92
+768
+-923
+956
+614
+8
+730
+208
+-417
+300
+136
+-59
+-251
+-539
+166
+798
+866
+454
+-391
+-317
+668
+502
+-15
+994
+854
+-189
+666
+446
+-565
+-5
+42
+-227
+-87
+-779
+26
+312
+354
+754
+396
+-515
+220
+872
+654
+88
+-667
+250
+572
+952
+72
+982
+972
+-529
+-471
+-533
+-427
+538
+154
+-457
+-819
+750
+152
+452
+-41
+838
+-489
+418
+-649
+-637
+-197
+74
+394
+-653
+-727
+-435
+-23
+348
+638
+-611
+914
+-357
+-743
+-685
+580
+-247
+-577
+54
+-931
+-3
+558
+-793
+-443
+-759
+162
+-811
+384
+720
+-117
+900
+-519
+-39
+744
+432
+286
+-873
+380
+-167
+-283
+430
+-155
+-755
+206
+100
+364
+-677
+332
+-567
+382
+-605
+-181
+676
+-475
+-845
+910
+546
+14
+398
+616
+-769
+424
+992
+-235
+-239
+774
+478
+-919
+168
+-771
+-773
+-69
+-509
+930
+550
+-463
+178
+-861
+-761
+-795
+234
+-831
+-61
+-979
+-851
+-665
+-709
+896
+742
+-123
+590
+-693
+-887
+-379
+144
+-717
+20
+174
+82
+464
+30
+-969
+-349
+-531
+-799
+-661
+-647
+-623
+878
+148
+-545
+238
+-259
+554
+726
+-37
+-797
+98
+78
+-591
+-975
+962
+120
+906
+-207
+656
+-171
+652
+188
+672
+-133
+-91
+224
+818
+-333
+-839
+-499
+22
+-739
+142
+378
+-403
+-315
+370
+284
+122
+230
+-527
+-127
+442
+534
+160
+722
+262
+-657
+304
+258
+-103
+960
+-495
+-265
+634
+-101
+480
+-363
+308
+76
+-949
+-585
+904
+146
+-703
+164
+850
+246
+732
+-725
+566
+274
+-163
+-935
+-681
+-229
+254
+-733
+-547
+-273
+-903
+736
+-711
+794
+392
+-655
+-549
+808
+-429
+484
+-701
+-617
+804
+36
+-775
+-335
+-927
+714
+-177
+-325
+-413
+-963
+114
+-253
+-789
+-645
+40
+434
+898
+924
+-19
+738
+788
+280
+-121
+594
+-913
+426
+816
+-373
+-45
+340
+-109
+-323
+58
+-249
+940
+-297
+988
+998
+-607
+-745
+-633
+-115
+996
+-893
+696
+400
+848
+500
+-263
+562
+-807
+-105
+-603
+658
+-73
+-863
+448
+680
+-157
+-161
+728
+814
+-477
+-375
+1000
+-631
+-991
+362
+156
+-187
+-705
+-917
+-449
+-741
+556
+440
+-589
+-11
+-359
+-891
+-801
+-153
+-381
+938
+-173
+-243
+618
+-599
+-497
+486
+128
+790
+460
+-27
+-305
+-205
+-215
+324
+-341
+50
+458
+52
+-621
+874
+386
+560
+-569
+-51
+802
+786
+920
+-425
+466
+444
+-507
+-915
+346
+622
+-679
+784
+-689
+388
+508
+-613
+-313
+-447
+564
+-897
+-211
+-225
+-615
+-367
+186
+894
+-65
+-453
+-245
+602
+496
+-651
+-601
+820
+226
+-695
+-119
+372
+180
+94
+214
+542
+648
+-871
+592
+584
+824
+796
+374
+-945
+-311
+516
+942
+-221
+-433
+200
+-465
+-953
+870
+868
+-879
+518
+356
+-223
+682
+990
+-191
+-541
+-951
+-921
+-319
+-169
+-291
+-289
+792
+876
+306
+-491
+326
+-885
+62
+514
+-929
+318
+-231
+632
+44
+-107
+644
+-267
+-343
+-847
+934
+734
+-505
+-351
+574
+-627
+636
+-93
+-431
+-835
+428
+-183
+-151
+2
+-813
+-595
+958
+-141
+692
+-385
+610
+-179
+376
+948
+198
+-675
+964
+-907
+918
+-165
+-1
+406
+748
+-111
+532
+-55
+-281
+740
+504
+236
+-29
+662
+-713
+-537
+196
+-587
+822
+-135
+700
+-35
+674
+-407
+240
+-673
+-669
+-393
+470
+-525
+-875
+-383
+-625
+296
+-85
+-147
+-277
+800
+-691
+-143
+16
+-983
+-303
+290
+-139
+172
+320
+512
+596
+640
+664
+-791
+-783
+-387
+-735
+-467
+-301
+810
+134
+216
+278
+176
+606
+140
+-787
+978
+586
+890
+882
+-753
+-13
+970
+-941
+-175
+-777
+-809
+-441
+-347
+-377
+390
+-423
+842
+642
+190
+302
+438
+704
+310
+-49
+124
+-781
+-287
+724
+-767
+830
+620
+-295
+244
+-159
+-307
+-397
+66
+-237
+314
+-79
+624
+710
+272
+-365
+928
+856
+138
+-479
+520
+832
+862
+760
+846
+-81
+106
+-513
+-193
+650
+782
+-517
+944
+218
+712
+-663
+-559
+462
+-635
+-25
+182
+530
+844
+330
+-833
+102
+-881
+108
+-947
+-763
+-405
+232
+410
+104
+-729
+-149
+-889
+888
+360
+968
+908
+116
+-815
+-129
+522
+-723
+-993
+860
+-503
+926
+-219
+-415
+60
+158
+-609
+-501
+986
+-699
+-583
+884
+212
+210
+-957
+526
+-985
+552
+344
+-395
+-95
+338
+248
+494
+130
+404
+358
+600
+-639
+-125
+-33
+-965
+752
+474
+-731
+758
+-573
+4
+38
+264
diff --git a/perf/LuaJIT-benches/TEST_md5sum.txt b/perf/LuaJIT-benches/TEST_md5sum.txt
new file mode 100644
index 00000000..15aa8a1c
--- /dev/null
+++ b/perf/LuaJIT-benches/TEST_md5sum.txt
@@ -0,0 +1,20 @@
+binarytrees	10	7202f4e13df7abc5ad8c07f05fe9d644
+chameneos	1e5	a629ce12f63050c6656bce175258cf8f
+cheapconcr	1000	d29799d1e263810a4db7bbf43ca66499
+cheapconcw	1000	d29799d1e263810a4db7bbf43ca66499
+fannkuch	8	51e5e372cbc5471ea8940b20ad782319
+fasta	1e5	78cd327de6f0a5667da0aa9349888279
+knucleotide	x	88efb24c1fed533959ed84bb32c88142 <FASTA_10000
+mandelbrot	200	cc65e64bd553ed18896de1dfe7fae3e5
+meteor	3000	9a65bb4b0a735ace1eaa4f2628f01026
+nbody	1e4	e0361c898ba747117ec177f7b3b3359c
+nsieve	4	767e02c93624995732e151932fa5f304
+nsievebits	4	767e02c93624995732e151932fa5f304
+partialsums	1e5	33efb41c72f8ecfb5b36c99e32189a3f
+pidigits	200	173a11a77bb1e72dd31254a760317428
+recursive	4	07a47c2d2cf50503b16efda789f84916
+regexdna	x	fdf3e6e9c599754e1eec3e524ea13fed <FASTA_10000
+revcomp	x	47de276e2f72519b57b82da39f4c7592 <FASTA_10000
+spectralnorm 200	25f44bd552ccd9faa0ee2ae5617947e2
+sumfile	x	2ebd3caa45b31a2e74e436b645eab4b0 <SUMCOL_100
+
diff --git a/perf/LuaJIT-benches/array3d.lua b/perf/LuaJIT-benches/array3d.lua
new file mode 100644
index 00000000..c10b09b1
--- /dev/null
+++ b/perf/LuaJIT-benches/array3d.lua
@@ -0,0 +1,59 @@
+
+local function array_set(self, x, y, z, p)
+  assert(x >= 0 and x < self.nx, "x outside PA")
+  assert(y >= 0 and y < self.ny, "y outside PA")
+  assert(z >= 0 and z < self.nz, "z outside PA")
+  local pos = (z*self.ny + y)*self.nx + x
+  local image = self.image
+  if self.packed then
+    local maxv = self.max_voltage
+    if p > maxv then self.max_voltage = p*2.0 end
+    local oldp = image[pos] or 0.0 -- Works with uninitialized table, too
+    if oldp > maxv then p = p + maxv*2.0 end
+    image[pos] = p
+  else
+    image[pos] = p
+  end
+  self.changed = true
+  self.changed_recently = true
+end
+
+local function array_points(self)
+  local y, z = 0, 0
+  return function(self, x)
+    x = x + 1
+    if x >= self.nx then
+      x = 0
+      y = y + 1
+      if y >= self.ny then
+	y = 0
+	z = z + 1
+	if z >= self.nz then
+	  return nil, nil, nil
+	end
+      end
+    end
+    return x, y, z
+  end, self, 0
+end
+
+local function array_new(nx, ny, nz, packed)
+  return {
+    nx = nx, ny = ny, nz = nz,
+    packed = packed, max_voltage = 0.0,
+    changed = false, changed_recently = false,
+    image = {}, -- Preferably use a fixed-type, pre-sized array here.
+    set = array_set,
+    points = array_points,
+  }
+end
+
+local dim = tonumber(arg and arg[1]) or 300 -- Array dimension dim^3
+local packed = arg and arg[2] == "packed"   -- Packed image or flat
+local arr = array_new(dim, dim, dim, packed)
+
+for x,y,z in arr:points() do
+  arr:set(x, y, z, x*x)
+end
+assert(arr.image[dim^3-1] == (dim-1)^2)
+
diff --git a/perf/LuaJIT-benches/binary-trees.lua b/perf/LuaJIT-benches/binary-trees.lua
new file mode 100644
index 00000000..bf040466
--- /dev/null
+++ b/perf/LuaJIT-benches/binary-trees.lua
@@ -0,0 +1,47 @@
+
+local function BottomUpTree(item, depth)
+  if depth > 0 then
+    local i = item + item
+    depth = depth - 1
+    local left, right = BottomUpTree(i-1, depth), BottomUpTree(i, depth)
+    return { item, left, right }
+  else
+    return { item }
+  end
+end
+
+local function ItemCheck(tree)
+  if tree[2] then
+    return tree[1] + ItemCheck(tree[2]) - ItemCheck(tree[3])
+  else
+    return tree[1]
+  end
+end
+
+local N = tonumber(arg and arg[1]) or 0
+local mindepth = 4
+local maxdepth = mindepth + 2
+if maxdepth < N then maxdepth = N end
+
+do
+  local stretchdepth = maxdepth + 1
+  local stretchtree = BottomUpTree(0, stretchdepth)
+  io.write(string.format("stretch tree of depth %d\t check: %d\n",
+    stretchdepth, ItemCheck(stretchtree)))
+end
+
+local longlivedtree = BottomUpTree(0, maxdepth)
+
+for depth=mindepth,maxdepth,2 do
+  local iterations = 2 ^ (maxdepth - depth + mindepth)
+  local check = 0
+  for i=1,iterations do
+    check = check + ItemCheck(BottomUpTree(1, depth)) +
+            ItemCheck(BottomUpTree(-1, depth))
+  end
+  io.write(string.format("%d\t trees of depth %d\t check: %d\n",
+    iterations*2, depth, check))
+end
+
+io.write(string.format("long lived tree of depth %d\t check: %d\n",
+  maxdepth, ItemCheck(longlivedtree)))
diff --git a/perf/LuaJIT-benches/chameneos.lua b/perf/LuaJIT-benches/chameneos.lua
new file mode 100644
index 00000000..78b64c3f
--- /dev/null
+++ b/perf/LuaJIT-benches/chameneos.lua
@@ -0,0 +1,68 @@
+
+local co = coroutine
+local create, resume, yield = co.create, co.resume, co.yield
+
+local N = tonumber(arg and arg[1]) or 10
+local first, second
+
+-- Meet another creature.
+local function meet(me)
+  while second do yield() end -- Wait until meeting place clears.
+  local other = first
+  if other then -- Hey, I found a new friend!
+    first = nil
+    second = me
+  else -- Sniff, nobody here (yet).
+    local n = N - 1
+    if n < 0 then return end -- Uh oh, the mall is closed.
+    N = n
+    first = me
+    repeat yield(); other = second until other -- Wait for another creature.
+    second = nil
+    yield() -- Be nice and let others meet up.
+  end
+  return other
+end
+
+-- Create a very social creature.
+local function creature(color)
+  return create(function()
+    local me = color
+    for met=0,1000000000 do
+      local other = meet(me)
+      if not other then return met end
+      if me ~= other then
+        if me == "blue" then me = other == "red" and "yellow" or "red"
+        elseif me == "red" then me = other == "blue" and "yellow" or "blue"
+        else me = other == "blue" and "red" or "blue" end
+      end
+    end
+  end)
+end
+
+-- Trivial round-robin scheduler.
+local function schedule(threads)
+  local resume = resume
+  local nthreads, meetings = #threads, 0
+  repeat
+    for i=1,nthreads do
+      local thr = threads[i]
+      if not thr then return meetings end
+      local ok, met = resume(thr)
+      if met then
+        meetings = meetings + met
+        threads[i] = nil
+      end
+    end
+  until false
+end
+
+-- A bunch of colorful creatures.
+local threads = {
+  creature("blue"),
+  creature("red"),
+  creature("yellow"),
+  creature("blue"),
+}
+
+io.write(schedule(threads), "\n")
diff --git a/perf/LuaJIT-benches/coroutine-ring.lua b/perf/LuaJIT-benches/coroutine-ring.lua
new file mode 100644
index 00000000..1e8c5ef6
--- /dev/null
+++ b/perf/LuaJIT-benches/coroutine-ring.lua
@@ -0,0 +1,42 @@
+-- The Computer Language Benchmarks Game
+-- http://shootout.alioth.debian.org/
+-- contributed by Sam Roberts
+-- reviewed by Bruno Massa
+
+local n         = tonumber(arg and arg[1]) or 2e7
+
+-- fixed size pool
+local poolsize  = 503
+local threads   = {}
+
+-- cache these to avoid global environment lookups
+local create    = coroutine.create
+local resume    = coroutine.resume
+local yield     = coroutine.yield
+
+local id        = 1
+local token     = 0
+local ok
+
+local body = function(token)
+  while true do
+    token = yield(token + 1)
+  end
+end
+
+-- create all threads
+for id = 1, poolsize do
+  threads[id] = create(body)
+end
+
+-- send the token
+repeat
+  if id == poolsize then
+    id = 1
+  else
+    id = id + 1
+  end
+  ok, token = resume(threads[id], token)
+until token == n
+
+io.write(id, "\n")
diff --git a/perf/LuaJIT-benches/euler14-bit.lua b/perf/LuaJIT-benches/euler14-bit.lua
new file mode 100644
index 00000000..537f2bf3
--- /dev/null
+++ b/perf/LuaJIT-benches/euler14-bit.lua
@@ -0,0 +1,22 @@
+
+local bit = require("bit")
+local bnot, bor, band = bit.bnot, bit.bor, bit.band
+local shl, shr = bit.lshift, bit.rshift
+
+local N = tonumber(arg and arg[1]) or 10000000
+local cache, m, n = { 1 }, 1, 1
+if arg and arg[2] then cache = nil end
+for i=2,N do
+  local j = i
+  for len=1,1000000000 do
+    j = bor(band(shr(j,1), band(j,1)-1), band(shl(j,1)+j+1, bnot(band(j,1)-1)))
+    if cache then
+      local x = cache[j]; if x then j = x+len; break end
+    elseif j == 1 then
+      j = len+1; break
+    end
+  end
+  if cache then cache[i] = j end
+  if j > m then m, n = j, i end
+end
+io.write("Found ", n, " (chain length: ", m, ")\n")
diff --git a/perf/LuaJIT-benches/fannkuch.lua b/perf/LuaJIT-benches/fannkuch.lua
new file mode 100644
index 00000000..2a4cd426
--- /dev/null
+++ b/perf/LuaJIT-benches/fannkuch.lua
@@ -0,0 +1,50 @@
+
+local function fannkuch(n)
+  local p, q, s, odd, check, maxflips = {}, {}, {}, true, 0, 0
+  for i=1,n do p[i] = i; q[i] = i; s[i] = i end
+  repeat
+    -- Print max. 30 permutations.
+    if check < 30 then
+      if not p[n] then return maxflips end	-- Catch n = 0, 1, 2.
+      io.write(unpack(p)); io.write("\n")
+      check = check + 1
+    end
+    -- Copy and flip.
+    local q1 = p[1]				-- Cache 1st element.
+    if p[n] ~= n and q1 ~= 1 then		-- Avoid useless work.
+      for i=2,n do q[i] = p[i] end		-- Work on a copy.
+      local flips = 1			-- Flip ...
+      while true do
+	local qq = q[q1]
+	if qq == 1 then				-- ... until 1st element is 1.
+	  if flips > maxflips then maxflips = flips end -- New maximum?
+	  break
+	end
+	q[q1] = q1
+	if q1 >= 4 then
+	  local i, j = 2, q1 - 1
+	  repeat q[i], q[j] = q[j], q[i]; i = i + 1; j = j - 1; until i >= j
+	end
+	q1 = qq
+	flips=flips+1
+      end
+    end
+    -- Permute.
+    if odd then
+      p[2], p[1] = p[1], p[2]; odd = false	-- Rotate 1<-2.
+    else
+      p[2], p[3] = p[3], p[2]; odd = true	-- Rotate 1<-2 and 1<-2<-3.
+      for i=3,n do
+	local sx = s[i]
+	if sx ~= 1 then s[i] = sx-1; break end
+	if i == n then return maxflips end	-- Out of permutations.
+	s[i] = i
+	-- Rotate 1<-...<-i+1.
+	local t=p[1]; for j=i+1,1,-1 do p[j],t=t,p[j] end
+      end
+    end
+  until false
+end
+
+local n = tonumber(arg and arg[1]) or 1
+io.write("Pfannkuchen(", n, ") = ", fannkuch(n), "\n")
diff --git a/perf/LuaJIT-benches/fasta.lua b/perf/LuaJIT-benches/fasta.lua
new file mode 100644
index 00000000..7ce60804
--- /dev/null
+++ b/perf/LuaJIT-benches/fasta.lua
@@ -0,0 +1,95 @@
+
+local Last = 42
+local function random(max)
+  local y = (Last * 3877 + 29573) % 139968
+  Last = y
+  return (max * y) / 139968
+end
+
+local function make_repeat_fasta(id, desc, s, n)
+  local write, sub = io.write, string.sub
+  write(">", id, " ", desc, "\n")
+  local p, sn, s2 = 1, #s, s..s
+  for i=60,n,60 do
+    write(sub(s2, p, p + 59), "\n")
+    p = p + 60; if p > sn then p = p - sn end
+  end
+  local tail = n % 60
+  if tail > 0 then write(sub(s2, p, p + tail-1), "\n") end
+end
+
+local function make_random_fasta(id, desc, bs, n)
+  io.write(">", id, " ", desc, "\n")
+  loadstring([=[
+    local write, char, unpack, n, random = io.write, string.char, unpack, ...
+    local buf, p = {}, 1
+    for i=60,n,60 do
+      for j=p,p+59 do ]=]..bs..[=[ end
+      buf[p+60] = 10; p = p + 61
+      if p >= 2048 then write(char(unpack(buf, 1, p-1))); p = 1 end
+    end
+    local tail = n % 60
+    if tail > 0 then
+      for j=p,p+tail-1 do ]=]..bs..[=[ end
+      p = p + tail; buf[p] = 10; p = p + 1
+    end
+    write(char(unpack(buf, 1, p-1)))
+  ]=], desc)(n, random)
+end
+
+local function bisect(c, p, lo, hi)
+  local n = hi - lo
+  if n == 0 then return "buf[j] = "..c[hi].."\n" end
+  local mid = math.floor(n / 2)
+  return "if r < "..p[lo+mid].." then\n"..bisect(c, p, lo, lo+mid)..
+         "else\n"..bisect(c, p, lo+mid+1, hi).."end\n"
+end
+
+local function make_bisect(tab)
+  local c, p, sum = {}, {}, 0
+  for i,row in ipairs(tab) do
+    c[i] = string.byte(row[1])
+    sum = sum + row[2]
+    p[i] = sum
+  end
+  return "local r = random(1)\n"..bisect(c, p, 1, #tab)
+end
+
+local alu =
+  "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"..
+  "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"..
+  "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"..
+  "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"..
+  "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"..
+  "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"..
+  "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"
+
+local iub = make_bisect{
+  { "a", 0.27 },
+  { "c", 0.12 },
+  { "g", 0.12 },
+  { "t", 0.27 },
+  { "B", 0.02 },
+  { "D", 0.02 },
+  { "H", 0.02 },
+  { "K", 0.02 },
+  { "M", 0.02 },
+  { "N", 0.02 },
+  { "R", 0.02 },
+  { "S", 0.02 },
+  { "V", 0.02 },
+  { "W", 0.02 },
+  { "Y", 0.02 },
+}
+
+local homosapiens = make_bisect{
+  { "a", 0.3029549426680 },
+  { "c", 0.1979883004921 },
+  { "g", 0.1975473066391 },
+  { "t", 0.3015094502008 },
+}
+
+local N = tonumber(arg and arg[1]) or 1000
+make_repeat_fasta('ONE', 'Homo sapiens alu', alu, N*2)
+make_random_fasta('TWO', 'IUB ambiguity codes', iub, N*3)
+make_random_fasta('THREE', 'Homo sapiens frequency', homosapiens, N*5)
diff --git a/perf/LuaJIT-benches/k-nucleotide.lua b/perf/LuaJIT-benches/k-nucleotide.lua
new file mode 100644
index 00000000..0bfb41be
--- /dev/null
+++ b/perf/LuaJIT-benches/k-nucleotide.lua
@@ -0,0 +1,58 @@
+
+local function kfrequency(seq, freq, k, frame)
+  local sub = string.sub
+  local k1 = k - 1
+  for i=frame,#seq-k1,k do
+    local c = sub(seq, i, i+k1)
+    freq[c] = (freq[c] or 0) + 1
+  end
+end
+
+local function count(seq, frag)
+  local k = #frag
+  local freq = {}
+  for frame=1,k do kfrequency(seq, freq, k, frame) end
+  io.write(freq[frag] or 0, "\t", frag, "\n")
+end
+
+local function frequency(seq, k)
+  local freq = {}
+  for frame=1,k do kfrequency(seq, freq, k, frame) end
+  local sfreq, sn, sum = {}, 1, 0
+  for c,v in pairs(freq) do sfreq[sn] = c; sn = sn + 1; sum = sum + v end
+  table.sort(sfreq, function(a, b)
+    local fa, fb = freq[a], freq[b]
+    return fa == fb and a > b or fa > fb
+  end)
+  for _,c in ipairs(sfreq) do
+    io.write(string.format("%s %0.3f\n", c, (freq[c]*100)/sum))
+  end
+  io.write("\n")
+end
+
+local function readseq()
+  local sub = string.sub
+  for line in io.lines() do
+    if sub(line, 1, 1) == ">" and sub(line, 2, 6) == "THREE" then break end
+  end
+  local lines, ln = {}, 0
+  for line in io.lines() do
+    local c = sub(line, 1, 1)
+    if c == ">" then
+      break
+    elseif c ~= ";" then
+      ln = ln + 1
+      lines[ln] = line
+    end
+  end
+  return string.upper(table.concat(lines, "", 1, ln))
+end
+
+local seq = readseq()
+frequency(seq, 1)
+frequency(seq, 2)
+count(seq, "GGT")
+count(seq, "GGTA")
+count(seq, "GGTATT")
+count(seq, "GGTATTTTAATT")
+count(seq, "GGTATTTTAATTTATAGT")
diff --git a/perf/LuaJIT-benches/life.lua b/perf/LuaJIT-benches/life.lua
new file mode 100644
index 00000000..911d9fe1
--- /dev/null
+++ b/perf/LuaJIT-benches/life.lua
@@ -0,0 +1,111 @@
+-- life.lua
+-- original by Dave Bollinger <DBollinger@compuserve.com> posted to lua-l
+-- modified to use ANSI terminal escape sequences
+-- modified to use for instead of while
+
+local write=io.write
+
+ALIVE="¥"	DEAD="þ"
+ALIVE="O"	DEAD="-"
+
+function delay() -- NOTE: SYSTEM-DEPENDENT, adjust as necessary
+  for i=1,10000 do end
+  -- local i=os.clock()+1 while(os.clock()<i) do end
+end
+
+function ARRAY2D(w,h)
+  local t = {w=w,h=h}
+  for y=1,h do
+    t[y] = {}
+    for x=1,w do
+      t[y][x]=0
+    end
+  end
+  return t
+end
+
+_CELLS = {}
+
+-- give birth to a "shape" within the cell array
+function _CELLS:spawn(shape,left,top)
+  for y=0,shape.h-1 do
+    for x=0,shape.w-1 do
+      self[top+y][left+x] = shape[y*shape.w+x+1]
+    end
+  end
+end
+
+-- run the CA and produce the next generation
+function _CELLS:evolve(next)
+  local ym1,y,yp1,yi=self.h-1,self.h,1,self.h
+  while yi > 0 do
+    local xm1,x,xp1,xi=self.w-1,self.w,1,self.w
+    while xi > 0 do
+      local sum = self[ym1][xm1] + self[ym1][x] + self[ym1][xp1] +
+                  self[y][xm1] + self[y][xp1] +
+                  self[yp1][xm1] + self[yp1][x] + self[yp1][xp1]
+      next[y][x] = ((sum==2) and self[y][x]) or ((sum==3) and 1) or 0
+      xm1,x,xp1,xi = x,xp1,xp1+1,xi-1
+    end
+    ym1,y,yp1,yi = y,yp1,yp1+1,yi-1
+  end
+end
+
+-- output the array to screen
+function _CELLS:draw()
+  local out="" -- accumulate to reduce flicker
+  for y=1,self.h do
+   for x=1,self.w do
+      out=out..(((self[y][x]>0) and ALIVE) or DEAD)
+    end
+    out=out.."\n"
+  end
+  write(out)
+end
+
+-- constructor
+function CELLS(w,h)
+  local c = ARRAY2D(w,h)
+  c.spawn = _CELLS.spawn
+  c.evolve = _CELLS.evolve
+  c.draw = _CELLS.draw
+  return c
+end
+
+--
+-- shapes suitable for use with spawn() above
+--
+HEART = { 1,0,1,1,0,1,1,1,1; w=3,h=3 }
+GLIDER = { 0,0,1,1,0,1,0,1,1; w=3,h=3 }
+EXPLODE = { 0,1,0,1,1,1,1,0,1,0,1,0; w=3,h=4 }
+FISH = { 0,1,1,1,1,1,0,0,0,1,0,0,0,0,1,1,0,0,1,0; w=5,h=4 }
+BUTTERFLY = { 1,0,0,0,1,0,1,1,1,0,1,0,0,0,1,1,0,1,0,1,1,0,0,0,1; w=5,h=5 }
+
+-- the main routine
+function LIFE(w,h)
+  -- create two arrays
+  local thisgen = CELLS(w,h)
+  local nextgen = CELLS(w,h)
+
+  -- create some life
+  -- about 1000 generations of fun, then a glider steady-state
+  thisgen:spawn(GLIDER,5,4)
+  thisgen:spawn(EXPLODE,25,10)
+  thisgen:spawn(FISH,4,12)
+
+  -- run until break
+  local gen=1
+  write("\027[2J")	-- ANSI clear screen
+  while 1 do
+    thisgen:evolve(nextgen)
+    thisgen,nextgen = nextgen,thisgen
+    write("\027[H")	-- ANSI home cursor
+    thisgen:draw()
+    write("Life - generation ",gen,"\n")
+    gen=gen+1
+    if gen>2000 then break end
+    --delay()		-- no delay
+  end
+end
+
+LIFE(40,20)
diff --git a/perf/LuaJIT-benches/mandelbrot-bit.lua b/perf/LuaJIT-benches/mandelbrot-bit.lua
new file mode 100644
index 00000000..91d96975
--- /dev/null
+++ b/perf/LuaJIT-benches/mandelbrot-bit.lua
@@ -0,0 +1,33 @@
+
+local bit = require("bit")
+local bor, band = bit.bor, bit.band
+local shl, shr, rol = bit.lshift, bit.rshift, bit.rol
+local write, char, unpack = io.write, string.char, unpack
+local N = tonumber(arg and arg[1]) or 100
+local M, buf = 2/N, {}
+write("P4\n", N, " ", N, "\n")
+for y=0,N-1 do
+  local Ci, b, p = y*M-1, -16777216, 0
+  local Ciq = Ci*Ci
+  for x=0,N-1,2 do
+    local Cr, Cr2 = x*M-1.5, (x+1)*M-1.5
+    local Zr, Zi, Zrq, Ziq = Cr, Ci, Cr*Cr, Ciq
+    local Zr2, Zi2, Zrq2, Ziq2 = Cr2, Ci, Cr2*Cr2, Ciq
+    b = rol(b, 2)
+    for i=1,49 do
+      Zi = Zr*Zi*2 + Ci; Zi2 = Zr2*Zi2*2 + Ci
+      Zr = Zrq-Ziq + Cr; Zr2 = Zrq2-Ziq2 + Cr2
+      Ziq = Zi*Zi; Ziq2 = Zi2*Zi2
+      Zrq = Zr*Zr; Zrq2 = Zr2*Zr2
+      if band(b, 2) ~= 0 and Zrq+Ziq > 4.0 then b = band(b, -3) end
+      if band(b, 1) ~= 0 and Zrq2+Ziq2 > 4.0 then b = band(b, -2) end
+      if band(b, 3) == 0 then break end
+    end
+    if b >= 0 then p = p + 1; buf[p] = b; b = -16777216; end
+  end
+  if b ~= -16777216 then
+    if band(N, 1) ~= 0 then b = shr(b, 1) end
+    p = p + 1; buf[p] = shl(b, 8-band(N, 7))
+  end
+  write(char(unpack(buf, 1, p)))
+end
diff --git a/perf/LuaJIT-benches/mandelbrot.lua b/perf/LuaJIT-benches/mandelbrot.lua
new file mode 100644
index 00000000..0ef595a2
--- /dev/null
+++ b/perf/LuaJIT-benches/mandelbrot.lua
@@ -0,0 +1,23 @@
+
+local write, char, unpack = io.write, string.char, unpack
+local N = tonumber(arg and arg[1]) or 100
+local M, ba, bb, buf = 2/N, 2^(N%8+1)-1, 2^(8-N%8), {}
+write("P4\n", N, " ", N, "\n")
+for y=0,N-1 do
+  local Ci, b, p = y*M-1, 1, 0
+  for x=0,N-1 do
+    local Cr = x*M-1.5
+    local Zr, Zi, Zrq, Ziq = Cr, Ci, Cr*Cr, Ci*Ci
+    b = b + b
+    for i=1,49 do
+      Zi = Zr*Zi*2 + Ci
+      Zr = Zrq-Ziq + Cr
+      Ziq = Zi*Zi
+      Zrq = Zr*Zr
+      if Zrq+Ziq > 4.0 then b = b + 1; break; end
+    end
+    if b >= 256 then p = p + 1; buf[p] = 511 - b; b = 1; end
+  end
+  if b ~= 1 then p = p + 1; buf[p] = (ba-b)*bb; end
+  write(char(unpack(buf, 1, p)))
+end
diff --git a/perf/LuaJIT-benches/md5.lua b/perf/LuaJIT-benches/md5.lua
new file mode 100644
index 00000000..fdf6b4a7
--- /dev/null
+++ b/perf/LuaJIT-benches/md5.lua
@@ -0,0 +1,183 @@
+
+local bit = require("bit")
+local tobit, tohex, bnot = bit.tobit or bit.cast, bit.tohex, bit.bnot
+local bor, band, bxor = bit.bor, bit.band, bit.bxor
+local lshift, rshift, rol, bswap = bit.lshift, bit.rshift, bit.rol, bit.bswap
+local byte, char, sub, rep = string.byte, string.char, string.sub, string.rep
+
+if not rol then -- Replacement function if rotates are missing.
+  local bor, shl, shr = bit.bor, bit.lshift, bit.rshift
+  function rol(a, b) return bor(shl(a, b), shr(a, 32-b)) end
+end
+
+if not bswap then -- Replacement function if bswap is missing.
+  local bor, band, shl, shr = bit.bor, bit.band, bit.lshift, bit.rshift
+  function bswap(a)
+    return bor(shr(a, 24), band(shr(a, 8), 0xff00),
+	       shl(band(a, 0xff00), 8), shl(a, 24));
+  end
+end
+
+if not tohex then -- (Unreliable) replacement function if tohex is missing.
+  function tohex(a)
+    return string.sub(string.format("%08x", a), -8)
+  end
+end
+
+local function tr_f(a, b, c, d, x, s)
+  return rol(bxor(d, band(b, bxor(c, d))) + a + x, s) + b
+end
+
+local function tr_g(a, b, c, d, x, s)
+  return rol(bxor(c, band(d, bxor(b, c))) + a + x, s) + b
+end
+
+local function tr_h(a, b, c, d, x, s)
+  return rol(bxor(b, c, d) + a + x, s) + b
+end
+
+local function tr_i(a, b, c, d, x, s)
+  return rol(bxor(c, bor(b, bnot(d))) + a + x, s) + b
+end
+
+local function transform(x, a1, b1, c1, d1)
+  local a, b, c, d = a1, b1, c1, d1
+
+  a = tr_f(a, b, c, d, x[ 1] + 0xd76aa478,  7)
+  d = tr_f(d, a, b, c, x[ 2] + 0xe8c7b756, 12)
+  c = tr_f(c, d, a, b, x[ 3] + 0x242070db, 17)
+  b = tr_f(b, c, d, a, x[ 4] + 0xc1bdceee, 22)
+  a = tr_f(a, b, c, d, x[ 5] + 0xf57c0faf,  7)
+  d = tr_f(d, a, b, c, x[ 6] + 0x4787c62a, 12)
+  c = tr_f(c, d, a, b, x[ 7] + 0xa8304613, 17)
+  b = tr_f(b, c, d, a, x[ 8] + 0xfd469501, 22)
+  a = tr_f(a, b, c, d, x[ 9] + 0x698098d8,  7)
+  d = tr_f(d, a, b, c, x[10] + 0x8b44f7af, 12)
+  c = tr_f(c, d, a, b, x[11] + 0xffff5bb1, 17)
+  b = tr_f(b, c, d, a, x[12] + 0x895cd7be, 22)
+  a = tr_f(a, b, c, d, x[13] + 0x6b901122,  7)
+  d = tr_f(d, a, b, c, x[14] + 0xfd987193, 12)
+  c = tr_f(c, d, a, b, x[15] + 0xa679438e, 17)
+  b = tr_f(b, c, d, a, x[16] + 0x49b40821, 22)
+
+  a = tr_g(a, b, c, d, x[ 2] + 0xf61e2562,  5)
+  d = tr_g(d, a, b, c, x[ 7] + 0xc040b340,  9)
+  c = tr_g(c, d, a, b, x[12] + 0x265e5a51, 14)
+  b = tr_g(b, c, d, a, x[ 1] + 0xe9b6c7aa, 20)
+  a = tr_g(a, b, c, d, x[ 6] + 0xd62f105d,  5)
+  d = tr_g(d, a, b, c, x[11] + 0x02441453,  9)
+  c = tr_g(c, d, a, b, x[16] + 0xd8a1e681, 14)
+  b = tr_g(b, c, d, a, x[ 5] + 0xe7d3fbc8, 20)
+  a = tr_g(a, b, c, d, x[10] + 0x21e1cde6,  5)
+  d = tr_g(d, a, b, c, x[15] + 0xc33707d6,  9)
+  c = tr_g(c, d, a, b, x[ 4] + 0xf4d50d87, 14)
+  b = tr_g(b, c, d, a, x[ 9] + 0x455a14ed, 20)
+  a = tr_g(a, b, c, d, x[14] + 0xa9e3e905,  5)
+  d = tr_g(d, a, b, c, x[ 3] + 0xfcefa3f8,  9)
+  c = tr_g(c, d, a, b, x[ 8] + 0x676f02d9, 14)
+  b = tr_g(b, c, d, a, x[13] + 0x8d2a4c8a, 20)
+
+  a = tr_h(a, b, c, d, x[ 6] + 0xfffa3942,  4)
+  d = tr_h(d, a, b, c, x[ 9] + 0x8771f681, 11)
+  c = tr_h(c, d, a, b, x[12] + 0x6d9d6122, 16)
+  b = tr_h(b, c, d, a, x[15] + 0xfde5380c, 23)
+  a = tr_h(a, b, c, d, x[ 2] + 0xa4beea44,  4)
+  d = tr_h(d, a, b, c, x[ 5] + 0x4bdecfa9, 11)
+  c = tr_h(c, d, a, b, x[ 8] + 0xf6bb4b60, 16)
+  b = tr_h(b, c, d, a, x[11] + 0xbebfbc70, 23)
+  a = tr_h(a, b, c, d, x[14] + 0x289b7ec6,  4)
+  d = tr_h(d, a, b, c, x[ 1] + 0xeaa127fa, 11)
+  c = tr_h(c, d, a, b, x[ 4] + 0xd4ef3085, 16)
+  b = tr_h(b, c, d, a, x[ 7] + 0x04881d05, 23)
+  a = tr_h(a, b, c, d, x[10] + 0xd9d4d039,  4)
+  d = tr_h(d, a, b, c, x[13] + 0xe6db99e5, 11)
+  c = tr_h(c, d, a, b, x[16] + 0x1fa27cf8, 16)
+  b = tr_h(b, c, d, a, x[ 3] + 0xc4ac5665, 23)
+
+  a = tr_i(a, b, c, d, x[ 1] + 0xf4292244,  6)
+  d = tr_i(d, a, b, c, x[ 8] + 0x432aff97, 10)
+  c = tr_i(c, d, a, b, x[15] + 0xab9423a7, 15)
+  b = tr_i(b, c, d, a, x[ 6] + 0xfc93a039, 21)
+  a = tr_i(a, b, c, d, x[13] + 0x655b59c3,  6)
+  d = tr_i(d, a, b, c, x[ 4] + 0x8f0ccc92, 10)
+  c = tr_i(c, d, a, b, x[11] + 0xffeff47d, 15)
+  b = tr_i(b, c, d, a, x[ 2] + 0x85845dd1, 21)
+  a = tr_i(a, b, c, d, x[ 9] + 0x6fa87e4f,  6)
+  d = tr_i(d, a, b, c, x[16] + 0xfe2ce6e0, 10)
+  c = tr_i(c, d, a, b, x[ 7] + 0xa3014314, 15)
+  b = tr_i(b, c, d, a, x[14] + 0x4e0811a1, 21)
+  a = tr_i(a, b, c, d, x[ 5] + 0xf7537e82,  6)
+  d = tr_i(d, a, b, c, x[12] + 0xbd3af235, 10)
+  c = tr_i(c, d, a, b, x[ 3] + 0x2ad7d2bb, 15)
+  b = tr_i(b, c, d, a, x[10] + 0xeb86d391, 21)
+
+  return tobit(a+a1), tobit(b+b1), tobit(c+c1), tobit(d+d1)
+end
+
+-- Note: this is copying the original string and NOT particularly fast.
+-- A library for struct unpacking would make this task much easier.
+local function md5(msg)
+  local len = #msg
+  msg = msg.."\128"..rep("\0", 63 - band(len + 8, 63))
+	   ..char(band(lshift(len, 3), 255), band(rshift(len, 5), 255),
+		  band(rshift(len, 13), 255), band(rshift(len, 21), 255))
+	   .."\0\0\0\0"
+  local a, b, c, d = 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
+  local x, k = {}, 1
+  for i=1,#msg,4 do
+    local m0, m1, m2, m3 = byte(msg, i, i+3)
+    x[k] = bor(m0, lshift(m1, 8), lshift(m2, 16), lshift(m3, 24))
+    if k == 16 then
+      a, b, c, d = transform(x, a, b, c, d)
+      k = 1
+    else
+      k = k + 1
+    end
+  end
+  return tohex(bswap(a))..tohex(bswap(b))..tohex(bswap(c))..tohex(bswap(d))
+end
+
+assert(md5('') == 'd41d8cd98f00b204e9800998ecf8427e')
+assert(md5('a') == '0cc175b9c0f1b6a831c399e269772661')
+assert(md5('abc') == '900150983cd24fb0d6963f7d28e17f72')
+assert(md5('message digest') == 'f96b697d7cb7938d525a2f31aaf161d0')
+assert(md5('abcdefghijklmnopqrstuvwxyz') == 'c3fcd3d76192e4007dfb496cca67e13b')
+assert(md5('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789') ==
+       'd174ab98d277d9f5a5611c2c9f419d9f')
+assert(md5('12345678901234567890123456789012345678901234567890123456789012345678901234567890') ==
+       '57edf4a22be3c955ac49da2e2107b67a')
+
+local N = tonumber(arg and arg[1]) or 10000
+
+  -- Credits: William Shakespeare, Romeo and Juliet
+local txt = [[Rebellious subjects, enemies to peace,
+Profaners of this neighbour-stained steel,--
+Will they not hear? What, ho! you men, you beasts,
+That quench the fire of your pernicious rage
+With purple fountains issuing from your veins,
+On pain of torture, from those bloody hands
+Throw your mistemper'd weapons to the ground,
+And hear the sentence of your moved prince.
+Three civil brawls, bred of an airy word,
+By thee, old Capulet, and Montague,
+Have thrice disturb'd the quiet of our streets,
+And made Verona's ancient citizens
+Cast by their grave beseeming ornaments,
+To wield old partisans, in hands as old,
+Canker'd with peace, to part your canker'd hate:
+If ever you disturb our streets again,
+Your lives shall pay the forfeit of the peace.
+For this time, all the rest depart away:
+You Capulet; shall go along with me:
+And, Montague, come you this afternoon,
+To know our further pleasure in this case,
+To old Free-town, our common judgment-place.
+Once more, on pain of death, all men depart.]]
+  txt = txt..txt..txt..txt
+  txt = txt..txt..txt..txt
+
+for i=1,N do
+  res = md5(txt)
+end
+assert(res == 'a831e91e0f70eddcb70dc61c6f82f6cd')
+
diff --git a/perf/LuaJIT-benches/meteor.lua b/perf/LuaJIT-benches/meteor.lua
new file mode 100644
index 00000000..80588ab5
--- /dev/null
+++ b/perf/LuaJIT-benches/meteor.lua
@@ -0,0 +1,220 @@
+
+-- Generate a decision tree based solver for the meteor puzzle.
+local function generatesolver(countinit)
+  local pairs, ipairs, format = pairs, ipairs, string.format
+  local byte, min, sort = string.byte, math.min, table.sort
+
+  -- Cached position to distance lookup.
+  local dist = setmetatable({}, { __index = function(t, xy)
+    local x = xy%10; local y = (xy-x)/10
+    if (x+y)%2 == 1 then y = y + 1; x = 10 - x end
+    local d = xy + 256*x*x + 1024*y*y; t[xy] = d; return d
+  end})
+
+  -- Lookup table to validate a cell and to find its successor.
+  local ok = {}
+  for i=0,150 do ok[i] = false end
+  for i=99,0,-1 do
+    local x = i%10
+    if ((i-x)/10+x)%2 == 0 then
+      ok[i] = i + (ok[i+1] and 1 or (ok[i+2] and 2 or 3))
+    end
+  end
+
+  -- Temporary board state for the island checks.
+  local islands, slide = {}, {20,22,24,26,28,31,33,35,37,39}
+  local bbc, bb = 0, {}
+  for i=0,19 do bb[i] = false; bb[i+80] = false end
+  for i=20,79 do bb[i] = ok[i] end
+
+  -- Recursive flood fill algorithm.
+  local function fill(bb, p)
+    bbc = bbc + 1
+    local n = p+2; if bb[n] then bb[n] = false; fill(bb, n) end
+    n = p-2; if bb[n] then bb[n] = false; fill(bb, n) end
+    n = p-9; if bb[n] then bb[n] = false; fill(bb, n) end
+    n = p-11; if bb[n] then bb[n] = false; fill(bb, n) end
+    n = p+9; if bb[n] then bb[n] = false; fill(bb, n) end
+    n = p+11; if bb[n] then bb[n] = false; fill(bb, n) end
+  end
+
+  -- Generate pruned, sliding decision trees.
+  local dtrees = {{}, {}, {}, {}, {}, {}, {}, {}, {}, {}}
+  local rot = { nil, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {} }
+  for k=0,9 do
+    -- Generate 10 initial pieces from line noise. :-)
+    local t = { 60, 62, byte("@BMBIK@KT@GPIKR@IKIKT@GK@KM@BG", k*3+1, k*3+3) }
+    rot[1] = t
+    for i,xy in ipairs(t) do
+      local x = xy%10; local y = (xy-x-60)/10
+      -- Add 11 more variations by rotating and flipping.
+      for j=2,12 do
+	if j == 7 then y = -y else x,y = (x+3*y)/2, (y-x)/2 end
+	rot[j][i] = x+10*y
+      end
+    end
+    for r,v in ipairs(rot) do
+      -- Exploit symmetry and leave out half of the orientations of one piece.
+      -- The selected piece gives the best reduction of the solution space.
+      if k ~= 3 or r%2 == 0 then
+	-- Normalize to origin, add distance, sort by distance from origin.
+	local m = min(v[1], v[2], v[3], v[4], v[5])
+	for i=1,5 do v[i] = dist[v[i]-m] end
+	sort(v)
+	local v2, v3, v4, v5 = v[2]%256, v[3]%256, v[4]%256, v[5]%256
+	-- Slide the piece across 2 rows, prune the tree, check for islands.
+	for j,p in ipairs(slide) do
+	  bb[p] = false
+	  if ok[p+v2] and ok[p+v3] and ok[p+v4] and ok[p+v5] then -- Prune.
+	    for i=p+1,79 do bb[i] = ok[i] end -- Clear remaining board.
+	    bb[p+v2] = false; bb[p+v3] = false -- Add piece.
+	    bb[p+v4] = false; bb[p+v5] = false
+	    bbc = j -- Flood fill and count the filled positions.
+	    if bb[71] then bb[71] = false; fill(bb, 71) end -- Lower left.
+	    if bb[79] then bb[79] = false; fill(bb, 79) end -- Lower right.
+	    local di = 0
+	    if bbc < 22 then bbc = 26
+	    elseif bbc < 26 then -- Island found, locate it, fill from above.
+	      for i=p+2,79 do if bb[i] then di = i-p; break end end
+	      for i=p-9,p-1 do if ok[i] then fill(bb, i) bbc = bbc - 1 end end
+	    end
+	    if bbc == 26 then -- Prune boards with static islands.
+	      local tb = dtrees[j] -- Build decision tree in distance order.
+	      local ta = tb[v2]; if not ta then ta = {}; tb[v2] = ta end
+	      tb = ta[v3]; if not tb then tb = {}; ta[v3] = tb end
+	      ta = tb[v4]; if not ta then ta = {}; tb[v4] = ta; islands[ta] = di
+	      elseif islands[ta] ~= di then islands[ta] = 0 end
+	      ta[v5] = di*10+k -- Leaves hold island check and piece number.
+	    end
+	  end
+	end
+      end
+    end
+  end
+
+  local s = "local u0,u1,u2,u3,u4,u5,u6,u7,u8,u9" -- Piece use flags.
+  for p=0,99 do if ok[p] then s = s..",b"..p end end -- Board cells.
+  s = s.."\n"..[[
+local countinit = ...
+local count = countinit
+local bmin, bmax, pcs = 9, 0, {}
+local smin, smax
+local write, reverse = io.write, string.reverse
+
+-- Print min/max boards.
+local function printboard(s)
+  local flip = true
+  for x in string.gmatch(string.gsub(s, ".", "%1 "), "..........") do
+    write(x, flip and "\n " or "\n")
+    flip = not flip
+  end
+  write("\n")
+end
+
+-- Print result.
+local function printresult()
+  write(countinit-count, " solutions found\n\n")
+  printboard(smin)
+  printboard(smax)
+end
+
+-- Generate piece lookup array from the order of use.
+local function genp()
+  local p = pcs
+  p[u0] = "0" p[u1] = "1" p[u2] = "2" p[u3] = "3" p[u4] = "4"
+  p[u5] = "5" p[u6] = "6" p[u7] = "7" p[u8] = "8" p[u9] = "9"
+  return p
+end
+
+-- Goal function.
+local function f91(k)
+  if k ~= 10 then return end
+  count = count - 2 -- Need to count the symmetric solution, too.
+  repeat
+    -- Quick precheck before constructing the string.
+    local b0, b99 = b0, b99
+    if b0 <= bmin then bmin = b0 elseif b0 >= bmax then bmax = b0
+    elseif b99 <= bmin then bmin = b99 elseif b99 >= bmax then bmax = b99
+    else break end
+    -- Translate the filled board to a string.
+    local p = genp()
+    local s = p[b0] ]]
+  for p=2,99 do if ok[p] then s = s.."..p[b"..p.."]" end end
+  s = s..[[
+    -- Remember min/max boards, ditto for the symmetric board.
+    if not smin then smin = s; smax = s
+    elseif s < smin then smin = s elseif s > smax then smax = s end
+    s = reverse(s)
+    if s < smin then smin = s elseif s > smax then smax = s end
+  until true
+  if count <= 0 then error() end -- Early abort if max count given.
+end
+local f93 = f91
+]]
+
+  -- Recursively convert the decision tree to Lua code.
+  local function codetree(tree, d, p, pn)
+    local found, s = false, ""
+    d = d + 1
+    for a,t in pairs(tree) do
+      local b = p+a
+      if b < 100 then -- Prune the tree at the lower border.
+	local pp = b ~= pn and pn or ok[b] -- Find maximum successor function.
+	if d >= 5 then -- Try to place the last cell of a piece and advance.
+	  found = true
+	  local u = t%10
+	  local di = (t-u)/10
+	  if di ~= 0 and d == 5 then
+	    di = di + p; if pp == di then pp = ok[di] end
+	    s = format("%sif b%d and not u%d and not b%d then b%d=k u%d=k f%d(k) u%d=N b%d=N end\n",
+		       s, di, u, b, b, u, pp, u, b)
+	  else
+	    s = format("%sif not u%d and not b%d then b%d=k u%d=k f%d(k) u%d=N b%d=N end\n",
+		       s, u, b, b, u, pp, u, b)
+	  end
+	else -- Try to place an intermediate cell.
+	  local di = d ~= 4 and 0 or islands[t]
+	  if di == 0 then
+	    local st = codetree(t, d, p, pp)
+	    if st then
+	      found = true
+	      s = format("%sif not b%d then b%d=k\n%sb%d=N end\n", s, b, b, st, b)
+	    end
+	  else -- Combine island checks.
+	    di = di + p; if pp == di then pp = ok[di] end
+	    local st = codetree(t, 6, p, pp)
+	    if st then
+	      found = true
+	      s = format("%sif b%d and not b%d then b%d=k\n%sb%d=N end\n", s, di, b, b, st, b)
+	    end
+	  end
+	end
+      end
+    end
+    return found and s
+  end
+
+  -- Embed the decision tree into a function hierarchy.
+  local j = 5
+  for p=88,0,-1 do
+    local pn = ok[p]
+    if pn then
+      s = format("%slocal function f%d(k)\nlocal N if b%d then return f%d(k) end k=k+1 b%d=k\n%sb%d=N end\n",
+	    s, p, p, pn, p, codetree(dtrees[j], 1, p, pn), p)
+      j = j - 1; if j == 0 then j = 10 end
+    end
+  end
+
+  -- Compile and return solver function and result getter.
+  return loadstring(s.."return f0, printresult\n", "solver")(countinit)
+end
+
+-- Generate the solver function hierarchy.
+local solver, printresult = generatesolver(tonumber(arg and arg[1]) or 10000)
+
+-- The optimizer for LuaJIT 1.1.x is not helpful here, so turn it off.
+if jit and jit.opt and jit.version_num < 10200 then jit.opt.start(0) end
+
+-- Run the solver protected to get partial results (max count or ctrl-c).
+pcall(solver, 0)
+printresult()
diff --git a/perf/LuaJIT-benches/nbody.lua b/perf/LuaJIT-benches/nbody.lua
new file mode 100644
index 00000000..e0ff8f77
--- /dev/null
+++ b/perf/LuaJIT-benches/nbody.lua
@@ -0,0 +1,119 @@
+
+local sqrt = math.sqrt
+
+local PI = 3.141592653589793
+local SOLAR_MASS = 4 * PI * PI
+local DAYS_PER_YEAR = 365.24
+local bodies = {
+  { -- Sun
+    x = 0,
+    y = 0,
+    z = 0,
+    vx = 0,
+    vy = 0,
+    vz = 0,
+    mass = SOLAR_MASS
+  },
+  { -- Jupiter
+    x = 4.84143144246472090e+00,
+    y = -1.16032004402742839e+00,
+    z = -1.03622044471123109e-01,
+    vx = 1.66007664274403694e-03 * DAYS_PER_YEAR,
+    vy = 7.69901118419740425e-03 * DAYS_PER_YEAR,
+    vz = -6.90460016972063023e-05 * DAYS_PER_YEAR,
+    mass = 9.54791938424326609e-04 * SOLAR_MASS
+  },
+  { -- Saturn
+    x = 8.34336671824457987e+00,
+    y = 4.12479856412430479e+00,
+    z = -4.03523417114321381e-01,
+    vx = -2.76742510726862411e-03 * DAYS_PER_YEAR,
+    vy = 4.99852801234917238e-03 * DAYS_PER_YEAR,
+    vz = 2.30417297573763929e-05 * DAYS_PER_YEAR,
+    mass = 2.85885980666130812e-04 * SOLAR_MASS
+  },
+  { -- Uranus
+    x = 1.28943695621391310e+01,
+    y = -1.51111514016986312e+01,
+    z = -2.23307578892655734e-01,
+    vx = 2.96460137564761618e-03 * DAYS_PER_YEAR,
+    vy = 2.37847173959480950e-03 * DAYS_PER_YEAR,
+    vz = -2.96589568540237556e-05 * DAYS_PER_YEAR,
+    mass = 4.36624404335156298e-05 * SOLAR_MASS
+  },
+  { -- Neptune
+    x = 1.53796971148509165e+01,
+    y = -2.59193146099879641e+01,
+    z = 1.79258772950371181e-01,
+    vx = 2.68067772490389322e-03 * DAYS_PER_YEAR,
+    vy = 1.62824170038242295e-03 * DAYS_PER_YEAR,
+    vz = -9.51592254519715870e-05 * DAYS_PER_YEAR,
+    mass = 5.15138902046611451e-05 * SOLAR_MASS
+  }
+}
+
+local function advance(bodies, nbody, dt)
+  for i=1,nbody do
+    local bi = bodies[i]
+    local bix, biy, biz, bimass = bi.x, bi.y, bi.z, bi.mass
+    local bivx, bivy, bivz = bi.vx, bi.vy, bi.vz
+    for j=i+1,nbody do
+      local bj = bodies[j]
+      local dx, dy, dz = bix-bj.x, biy-bj.y, biz-bj.z
+      local mag = sqrt(dx*dx + dy*dy + dz*dz)
+      mag = dt / (mag * mag * mag)
+      local bm = bj.mass*mag
+      bivx = bivx - (dx * bm)
+      bivy = bivy - (dy * bm)
+      bivz = bivz - (dz * bm)
+      bm = bimass*mag
+      bj.vx = bj.vx + (dx * bm)
+      bj.vy = bj.vy + (dy * bm)
+      bj.vz = bj.vz + (dz * bm)
+    end
+    bi.vx = bivx
+    bi.vy = bivy
+    bi.vz = bivz
+    bi.x = bix + dt * bivx
+    bi.y = biy + dt * bivy
+    bi.z = biz + dt * bivz
+  end
+end
+
+local function energy(bodies, nbody)
+  local e = 0
+  for i=1,nbody do
+    local bi = bodies[i]
+    local vx, vy, vz, bim = bi.vx, bi.vy, bi.vz, bi.mass
+    e = e + (0.5 * bim * (vx*vx + vy*vy + vz*vz))
+    for j=i+1,nbody do
+      local bj = bodies[j]
+      local dx, dy, dz = bi.x-bj.x, bi.y-bj.y, bi.z-bj.z
+      local distance = sqrt(dx*dx + dy*dy + dz*dz)
+      e = e - ((bim * bj.mass) / distance)
+    end
+  end
+  return e
+end
+
+local function offsetMomentum(b, nbody)
+  local px, py, pz = 0, 0, 0
+  for i=1,nbody do
+    local bi = b[i]
+    local bim = bi.mass
+    px = px + (bi.vx * bim)
+    py = py + (bi.vy * bim)
+    pz = pz + (bi.vz * bim)
+  end
+  b[1].vx = -px / SOLAR_MASS
+  b[1].vy = -py / SOLAR_MASS
+  b[1].vz = -pz / SOLAR_MASS
+end
+
+local N = tonumber(arg and arg[1]) or 1000
+local nbody = #bodies
+
+offsetMomentum(bodies, nbody)
+io.write( string.format("%0.9f",energy(bodies, nbody)), "\n")
+for i=1,N do advance(bodies, nbody, 0.01) end
+io.write( string.format("%0.9f",energy(bodies, nbody)), "\n")
diff --git a/perf/LuaJIT-benches/nsieve-bit-fp.lua b/perf/LuaJIT-benches/nsieve-bit-fp.lua
new file mode 100644
index 00000000..3971ec1f
--- /dev/null
+++ b/perf/LuaJIT-benches/nsieve-bit-fp.lua
@@ -0,0 +1,37 @@
+
+local floor, ceil = math.floor, math.ceil
+
+local precision = 50 -- Maximum precision of lua_Number (minus safety margin).
+local onebits = (2^precision)-1
+
+local function nsieve(p, m)
+  local cm = ceil(m/precision)
+  do local onebits = onebits; for i=0,cm do p[i] = onebits end end
+  local count, idx, bit = 0, 2, 2
+  for i=2,m do
+    local r = p[idx] / bit
+    if r - floor(r) >= 0.5 then -- Bit set?
+      local kidx, kbit = idx, bit
+      for k=i+i,m,i do
+        kidx = kidx + i
+        while kidx >= cm do kidx = kidx - cm; kbit = kbit + kbit end
+        local x = p[kidx]
+        local r = x / kbit
+        if r - floor(r) >= 0.5 then p[kidx] = x - kbit*0.5 end -- Clear bit.
+      end
+      count = count + 1
+    end
+    idx = idx + 1
+    if idx >= cm then idx = 0; bit = bit + bit end
+  end
+  return count
+end
+
+local N = tonumber(arg and arg[1]) or 1
+if N < 2 then N = 2 end
+local primes = {}
+
+for i=0,2 do
+  local m = (2^(N-i))*10000
+  io.write(string.format("Primes up to %8d %8d\n", m, nsieve(primes, m)))
+end
diff --git a/perf/LuaJIT-benches/nsieve-bit.lua b/perf/LuaJIT-benches/nsieve-bit.lua
new file mode 100644
index 00000000..820a3726
--- /dev/null
+++ b/perf/LuaJIT-benches/nsieve-bit.lua
@@ -0,0 +1,27 @@
+
+local bit = require("bit")
+local band, bxor, rshift, rol = bit.band, bit.bxor, bit.rshift, bit.rol
+
+local function nsieve(p, m)
+  local count = 0
+  for i=0,rshift(m, 5) do p[i] = -1 end
+  for i=2,m do
+    if band(rshift(p[rshift(i, 5)], i), 1) ~= 0 then
+      count = count + 1
+      for j=i+i,m,i do
+	local jx = rshift(j, 5)
+	p[jx] = band(p[jx], rol(-2, j))
+      end
+    end
+  end
+  return count
+end
+
+local N = tonumber(arg and arg[1]) or 1
+if N < 2 then N = 2 end
+local primes = {}
+
+for i=0,2 do
+  local m = (2^(N-i))*10000
+  io.write(string.format("Primes up to %8d %8d\n", m, nsieve(primes, m)))
+end
diff --git a/perf/LuaJIT-benches/nsieve.lua b/perf/LuaJIT-benches/nsieve.lua
new file mode 100644
index 00000000..6de0524f
--- /dev/null
+++ b/perf/LuaJIT-benches/nsieve.lua
@@ -0,0 +1,21 @@
+
+local function nsieve(p, m)
+  for i=2,m do p[i] = true end
+  local count = 0
+  for i=2,m do
+    if p[i] then
+      for k=i+i,m,i do p[k] = false end
+      count = count + 1
+    end
+  end
+  return count
+end
+
+local N = tonumber(arg and arg[1]) or 1
+if N < 2 then N = 2 end
+local primes = {}
+
+for i=0,2 do
+  local m = (2^(N-i))*10000
+  io.write(string.format("Primes up to %8d %8d\n", m, nsieve(primes, m)))
+end
diff --git a/perf/LuaJIT-benches/partialsums.lua b/perf/LuaJIT-benches/partialsums.lua
new file mode 100644
index 00000000..46bb9da3
--- /dev/null
+++ b/perf/LuaJIT-benches/partialsums.lua
@@ -0,0 +1,29 @@
+
+local n = tonumber(arg[1])
+local function pr(fmt, x) io.write(string.format(fmt, x)) end
+
+local a1, a2, a3, a4, a5, a6, a7, a8, a9, alt = 1, 0, 0, 0, 0, 0, 0, 0, 0, 1
+local sqrt, sin, cos = math.sqrt, math.sin, math.cos
+for k=1,n do
+  local k2, sk, ck = k*k, sin(k), cos(k)
+  local k3 = k2*k
+  a1 = a1 + (2/3)^k
+  a2 = a2 + 1/sqrt(k)
+  a3 = a3 + 1/(k2+k)
+  a4 = a4 + 1/(k3*sk*sk)
+  a5 = a5 + 1/(k3*ck*ck)
+  a6 = a6 + 1/k
+  a7 = a7 + 1/k2
+  a8 = a8 + alt/k
+  a9 = a9 + alt/(k+k-1)
+  alt = -alt
+end
+pr("%.9f\t(2/3)^k\n", a1)
+pr("%.9f\tk^-0.5\n", a2)
+pr("%.9f\t1/k(k+1)\n", a3)
+pr("%.9f\tFlint Hills\n", a4)
+pr("%.9f\tCookson Hills\n", a5)
+pr("%.9f\tHarmonic\n", a6)
+pr("%.9f\tRiemann Zeta\n", a7)
+pr("%.9f\tAlternating Harmonic\n", a8)
+pr("%.9f\tGregory\n", a9)
diff --git a/perf/LuaJIT-benches/pidigits-nogmp.lua b/perf/LuaJIT-benches/pidigits-nogmp.lua
new file mode 100644
index 00000000..63a1cb0e
--- /dev/null
+++ b/perf/LuaJIT-benches/pidigits-nogmp.lua
@@ -0,0 +1,100 @@
+
+-- Start of dynamically compiled chunk.
+local chunk = [=[
+
+-- Factory function for multi-precision number (mpn) operations.
+local function fmm(fa, fb)
+  return loadstring([[
+    return function(y, a, ka, b, kb)
+      local carry, n = 0, #a ]]..(fb == 0 and "" or [[
+      local na, nb = n, #b -- Need to adjust lengths. 1 element suffices here.
+      if na > nb then b[na] = 0 elseif na < nb then a[nb] = 0; n = nb end
+    ]])..[[
+      for i=1,n do -- Sum up all elements and propagate carry.
+        local x = a[i] ]]..(fa == 2 and "*ka" or "")..
+          (fb == 2 and "+b[i]*kb" or (fb == 1 and "+b[i]" or ""))..[[ + carry
+        if x < RADIX and x >= 0 then carry = 0; y[i] = x -- Check for overflow.
+        else local d = x % RADIX; carry = (x-d) / RADIX; y[i] = d end
+      end
+      y[n+1] = nil -- Truncate target. 1 element suffices here.
+      if carry == 0 then while n > 0 and y[n] == 0 do y[n] = nil end
+      elseif carry == -1 then y[n] = y[n] - RADIX else y[n+1] = carry end
+    ]]..(fb == 0 and "" or [[ -- Undo length adjustment.
+      if na > nb then b[na] = nil elseif na < nb and y ~= a then a[nb] = nil end
+    ]])..[[
+      return y
+    end]])()
+end
+
+-- Generate needed mpn functions.
+local mm_kk, mm_k1, mm_k0, mm_11 = fmm(2, 2), fmm(2, 1), fmm(2, 0), fmm(1, 1)
+
+-- Choose the most efficient mpn function for y = a*ka + b*kb at run-time.
+local function mm(y, a, ka, b, kb)
+  local f = mm_kk
+  if kb == 0 or #b == 0 then if ka == 1 then return a else f = mm_k0 end
+  elseif kb == 1 then if ka == 1 then f = mm_11 else f = mm_k1 end end
+  return f(y, a, ka, b, kb)
+end
+
+-- Compose matrix with numbers on the right.
+local function compose_r(aq,ar,as,at, bq,br,bs,bt)
+  mm(ar, ar,bq, at,br) mm(at, at,bt, ar,bs)
+  mm(as, as,bt, aq,bs) mm(aq, aq,bq, nil,0)
+end
+
+-- Compose matrix with numbers on the left.
+local function compose_l(aq,ar,as,at, bq,br,bs,bt)
+  mm(ar, ar,bt, aq,br) mm(at, at,bt, as,br)
+  mm(as, as,bq, at,bs) mm(aq, aq,bq, nil,0)
+end
+
+-- Extract one digit.
+local u, v, jj = {}, {}, 0
+local function extract(q,r,s,t, j)
+  local u = j == jj + 1 and mm(u, u,1, q,1) or mm(u, q,j, r,1); jj = j
+  local v = mm(v, t,1, s,j)
+  local nu, nv, y = #u, #v
+  if nu == nv then
+    if nu == 1 then y = u[1] / v[1]
+    else y = (u[nu]*RADIX + u[nu-1]) / (v[nv]*RADIX + v[nv-1]) end
+  elseif nu == nv+1 then y = (u[nu]*RADIX + u[nv]) / v[nv]
+  else return 0 end
+  return math.floor(y)
+end
+
+-- Coroutine which yields successive digits of PI.
+return coroutine.wrap(function()
+  local q, r, s, t, k = {1}, {}, {}, {1}, 1
+  repeat
+    local y = extract(q,r,s,t, 3)
+    if y == extract(q,r,s,t, 4) then
+      coroutine.yield(y)
+      compose_r(q,r,s,t,  10, -10*y, 0, 1)
+    else
+      compose_l(q,r,s,t,   k, 4*k+2, 0, 2*k+1)
+      k = k + 1
+    end
+  until false
+end)
+
+]=] -- End of dynamically compiled chunk.
+
+local N = tonumber(arg and arg[1]) or 27
+local RADIX = N < 6500 and 2^36 or 2^32 -- Avoid overflow.
+
+-- Substitute radix and compile chunk.
+local pidigit = loadstring(string.gsub(chunk, "RADIX", tostring(RADIX)))()
+
+-- Print lines with 10 digits.
+for i=10,N,10 do
+  for j=1,10 do io.write(pidigit()) end
+  io.write("\t:", i, "\n")
+end
+
+-- Print remaining digits (if any).
+local n10 = N % 10
+if n10 ~= 0 then
+  for i=1,n10 do io.write(pidigit()) end
+  io.write(string.rep(" ", 10-n10), "\t:", N, "\n")
+end
diff --git a/perf/LuaJIT-benches/ray.lua b/perf/LuaJIT-benches/ray.lua
new file mode 100644
index 00000000..2acc24c0
--- /dev/null
+++ b/perf/LuaJIT-benches/ray.lua
@@ -0,0 +1,135 @@
+local sqrt = math.sqrt
+local huge = math.huge
+
+local delta = 1
+while delta * delta + 1 ~= 1 do
+  delta = delta * 0.5
+end
+
+local function length(x, y, z)  return sqrt(x*x + y*y + z*z) end
+local function vlen(v)          return length(v[1], v[2], v[3]) end
+local function mul(c, x, y, z)  return c*x, c*y, c*z end
+local function unitise(x, y, z) return mul(1/length(x, y, z), x, y, z) end
+local function dot(x1, y1, z1, x2, y2, z2)
+  return x1*x2 + y1*y2 + z1*z2
+end
+
+local function vsub(a, b)        return a[1] - b[1], a[2] - b[2], a[3] - b[3] end
+local function vdot(a, b)        return dot(a[1], a[2], a[3], b[1], b[2], b[3]) end
+
+
+local sphere = {}
+function sphere:new(centre, radius)
+  self.__index = self
+  return setmetatable({centre=centre, radius=radius}, self)
+end
+
+local function sphere_distance(self, origin, dir)
+  local vx, vy, vz = vsub(self.centre, origin)
+  local b = dot(vx, vy, vz, dir[1], dir[2], dir[3])
+  local r = self.radius
+  local disc = r*r + b*b - vx*vx-vy*vy-vz*vz
+  if disc < 0 then return huge end
+  local d = sqrt(disc)
+  local t2 = b + d
+  if t2 < 0 then return huge end
+  local t1 = b - d
+  return t1 > 0 and t1 or t2
+end
+
+function sphere:intersect(origin, dir, best)
+  local lambda = sphere_distance(self, origin, dir)
+  if lambda < best[1] then
+    local c = self.centre
+    best[1] = lambda
+    local b2 = best[2]
+    b2[1], b2[2], b2[3] =
+      unitise(
+        origin[1] - c[1] + lambda * dir[1],
+        origin[2] - c[2] + lambda * dir[2],
+        origin[3] - c[3] + lambda * dir[3])
+  end
+end
+
+local group = {}
+function group:new(bound)
+  self.__index = self
+  return setmetatable({bound=bound, children={}}, self)
+end
+
+function group:add(s)
+  self.children[#self.children+1] = s
+end
+
+function group:intersect(origin, dir, best)
+  local lambda = sphere_distance(self.bound, origin, dir)
+  if lambda < best[1] then
+    for _, c in ipairs(self.children) do
+      c:intersect(origin, dir, best)
+    end
+  end
+end
+
+local hit = { 0, 0, 0 }
+local ilight
+local best = { huge, { 0, 0, 0 } }
+
+local function ray_trace(light, camera, dir, scene)
+  best[1] = huge
+  scene:intersect(camera, dir, best)
+  local b1 = best[1]
+  if b1 == huge then return 0 end
+  local b2 = best[2]
+  local g = vdot(b2, light)
+  if g >= 0 then return 0 end
+  hit[1] = camera[1] + b1*dir[1] + delta*b2[1]
+  hit[2] = camera[2] + b1*dir[2] + delta*b2[2]
+  hit[3] = camera[3] + b1*dir[3] + delta*b2[3]
+  best[1] = huge
+  scene:intersect(hit, ilight, best)
+  if best[1] == huge then
+    return -g
+  else
+    return 0
+  end
+end
+
+local function create(level, centre, radius)
+  local s = sphere:new(centre, radius)
+  if level == 1 then return s end
+  local gr = group:new(sphere:new(centre, 3*radius))
+  gr:add(s)
+  local rn = 3*radius/sqrt(12)
+  for dz = -1,1,2 do
+    for dx = -1,1,2 do
+      gr:add(create(level-1, { centre[1] + rn*dx, centre[2] + rn, centre[3] + rn*dz }, radius*0.5))
+    end
+  end
+  return gr
+end
+
+
+local level, n, ss = tonumber(arg[1]) or 9, tonumber(arg[2]) or 256, 4
+local iss = 1/ss
+local gf = 255/(ss*ss)
+
+io.write(("P5\n%d %d\n255\n"):format(n, n))
+local light = { unitise(-1, -3, 2) }
+ilight = { -light[1], -light[2], -light[3] }
+local camera = { 0, 0, -4 }
+local dir = { 0, 0, 0 }
+
+local scene = create(level, {0, -1, 0}, 1)
+
+for y = n/2-1, -n/2, -1 do
+  for x = -n/2, n/2-1 do
+    local g = 0
+    for d = y, y+.99, iss do
+      for e = x, x+.99, iss do
+        dir[1], dir[2], dir[3] = unitise(e, d, n)
+        g = g + ray_trace(light, camera, dir, scene)
+      end
+    end
+    io.write(string.char(math.floor(0.5 + g*gf)))
+  end
+end
diff --git a/perf/LuaJIT-benches/recursive-ack.lua b/perf/LuaJIT-benches/recursive-ack.lua
new file mode 100644
index 00000000..fad30589
--- /dev/null
+++ b/perf/LuaJIT-benches/recursive-ack.lua
@@ -0,0 +1,8 @@
+local function Ack(m, n)
+  if m == 0 then return n+1 end
+  if n == 0 then return Ack(m-1, 1) end
+  return Ack(m-1, (Ack(m, n-1))) -- The parentheses are deliberate.
+end
+
+local N = tonumber(arg and arg[1]) or 10
+io.write("Ack(3,", N ,"): ", Ack(3,N), "\n")
diff --git a/perf/LuaJIT-benches/recursive-fib.lua b/perf/LuaJIT-benches/recursive-fib.lua
new file mode 100644
index 00000000..ef9950de
--- /dev/null
+++ b/perf/LuaJIT-benches/recursive-fib.lua
@@ -0,0 +1,7 @@
+local function fib(n)
+  if n < 2 then return 1 end
+  return fib(n-2) + fib(n-1)
+end
+
+local n = tonumber(arg[1]) or 10
+io.write(string.format("Fib(%d): %d\n", n, fib(n)))
diff --git a/perf/LuaJIT-benches/revcomp.lua b/perf/LuaJIT-benches/revcomp.lua
new file mode 100644
index 00000000..34fe347b
--- /dev/null
+++ b/perf/LuaJIT-benches/revcomp.lua
@@ -0,0 +1,37 @@
+
+local sub = string.sub
+iubc = setmetatable({
+  A="T", C="G", B="V", D="H", K="M", R="Y",
+  a="T", c="G", b="V", d="H", k="M", r="Y",
+  T="A", G="C", V="B", H="D", M="K", Y="R", U="A",
+  t="A", g="C", v="B", h="D", m="K", y="R", u="A",
+  N="N", S="S", W="W", n="N", s="S", w="W",
+}, { __index = function(t, s)
+  local r = t[sub(s, 2)]..t[sub(s, 1, 1)]; t[s] = r; return r end })
+
+local wcode = [=[
+return function(t, n)
+  if n == 1 then return end
+  local iubc, sub, write = iubc, string.sub, io.write
+  local s = table.concat(t, "", 1, n-1)
+  for i=#s-59,1,-60 do
+    write(]=]
+for i=59,3,-4 do wcode = wcode.."iubc[sub(s, i+"..(i-3)..", i+"..i..")], " end
+wcode = wcode..[=["\n")
+  end
+  local r = #s % 60
+  if r ~= 0 then
+    for i=r,1,-4 do write(iubc[sub(s, i-3 < 1 and 1 or i-3, i)]) end
+    write("\n")
+  end
+end
+]=]
+local writerev = loadstring(wcode)()
+
+local t, n = {}, 1
+for line in io.lines() do
+  local c = sub(line, 1, 1)
+  if c == ">" then writerev(t, n); io.write(line, "\n"); n = 1
+  elseif c ~= ";" then t[n] = line; n = n + 1 end
+end
+writerev(t, n)
diff --git a/perf/LuaJIT-benches/scimark-2010-12-20.lua b/perf/LuaJIT-benches/scimark-2010-12-20.lua
new file mode 100644
index 00000000..353acb7c
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark-2010-12-20.lua
@@ -0,0 +1,400 @@
+------------------------------------------------------------------------------
+-- Lua SciMark (2010-12-20).
+--
+-- A literal translation of SciMark 2.0a, written in Java and C.
+-- Credits go to the original authors Roldan Pozo and Bruce Miller.
+-- See: http://math.nist.gov/scimark2/
+------------------------------------------------------------------------------
+
+local SCIMARK_VERSION = "2010-12-20"
+local SCIMARK_COPYRIGHT = "Copyright (C) 2006-2010 Mike Pall"
+
+local MIN_TIME = 2.0
+local RANDOM_SEED = 101009 -- Must be odd.
+local SIZE_SELECT = "small"
+
+local benchmarks = {
+  "FFT", "SOR", "MC", "SPARSE", "LU",
+  small = {
+    FFT		= { 1024 },
+    SOR		= { 100 },
+    MC		= { },
+    SPARSE	= { 1000, 5000 },
+    LU		= { 100 },
+  },
+  large = {
+    FFT		= { 1048576 },
+    SOR		= { 1000 },
+    MC		= { },
+    SPARSE	= { 100000, 1000000 },
+    LU		= { 1000 },
+  },
+}
+
+local abs, log, sin, floor = math.abs, math.log, math.sin, math.floor
+local pi, clock = math.pi, os.clock
+local format = string.format
+
+------------------------------------------------------------------------------
+-- Select array type: Lua tables or native (FFI) arrays
+------------------------------------------------------------------------------
+
+local darray, iarray
+
+local function array_init()
+  if jit and jit.status and jit.status() then
+    local ok, ffi = pcall(require, "ffi")
+    if ok then
+      darray = ffi.typeof("double[?]")
+      iarray = ffi.typeof("int[?]")
+      return
+    end
+  end
+  function darray(n) return {} end
+  iarray = darray
+end
+
+------------------------------------------------------------------------------
+-- This is a Lagged Fibonacci Pseudo-random Number Generator with
+-- j, k, M = 5, 17, 31. Pretty weak, but same as C/Java SciMark.
+------------------------------------------------------------------------------
+
+local rand, rand_init
+
+if jit and jit.status and jit.status() then
+  -- LJ2 has bit operations and zero-based arrays (internally).
+  local bit = require("bit")
+  local band, sar = bit.band, bit.arshift
+  function rand_init(seed)
+    local Rm, Rj, Ri = iarray(17), 16, 11
+    for i=0,16 do Rm[i] = 0 end
+    for i=16,0,-1 do
+      seed = band(seed*9069, 0x7fffffff)
+      Rm[i] = seed
+    end
+    function rand()
+      local i = band(Ri+1, sar(Ri-16, 31))
+      local j = band(Rj+1, sar(Rj-16, 31))
+      Ri, Rj = i, j
+      local k = band(Rm[i] - Rm[j], 0x7fffffff)
+      Rm[j] = k
+      return k * (1.0/2147483647.0)
+    end
+  end
+else
+  -- Better for standard Lua with one-based arrays and without bit operations.
+  function rand_init(seed)
+    local Rm, Rj = {}, 1
+    for i=1,17 do Rm[i] = 0 end
+    for i=17,1,-1 do
+      seed = (seed*9069) % (2^31)
+      Rm[i] = seed
+    end
+    function rand()
+      local j, m = Rj, Rm
+      local h = j - 5
+      if h < 1 then h = h + 17 end
+      local k = m[h] - m[j]
+      if k < 0 then k = k + 2147483647 end
+      m[j] = k
+      if j < 17 then Rj = j + 1 else Rj = 1 end
+      return k * (1.0/2147483647.0)
+    end
+  end
+end
+
+local function random_vector(n)
+  local v = darray(n+1)
+  for x=1,n do v[x] = rand() end
+  return v
+end
+
+local function random_matrix(m, n)
+  local a = {}
+  for y=1,m do
+    local v = darray(n+1)
+    a[y] = v
+    for x=1,n do v[x] = rand() end
+  end
+  return a
+end
+
+------------------------------------------------------------------------------
+-- FFT: Fast Fourier Transform.
+------------------------------------------------------------------------------
+
+local function fft_bitreverse(v, n)
+  local j = 0
+  for i=0,2*n-4,2 do
+    if i < j then
+      v[i+1], v[i+2], v[j+1], v[j+2] = v[j+1], v[j+2], v[i+1], v[i+2]
+    end
+    local k = n
+    while k <= j do j = j - k; k = k / 2 end
+    j = j + k
+  end
+end
+
+local function fft_transform(v, n, dir)
+  if n <= 1 then return end
+  fft_bitreverse(v, n)
+  local dual = 1
+  repeat
+    local dual2 = 2*dual
+    for i=1,2*n-1,2*dual2 do
+      local j = i+dual2
+      local ir, ii = v[i], v[i+1]
+      local jr, ji = v[j], v[j+1]
+      v[j], v[j+1] = ir - jr, ii - ji
+      v[i], v[i+1] = ir + jr, ii + ji
+    end
+    local theta = dir * pi / dual
+    local s, s2 = sin(theta), 2.0 * sin(theta * 0.5)^2
+    local wr, wi = 1.0, 0.0
+    for a=3,dual2-1,2 do
+      wr, wi = wr - s*wi - s2*wr, wi + s*wr - s2*wi
+      for i=a,a+2*(n-dual2),2*dual2 do
+	local j = i+dual2
+	local jr, ji = v[j], v[j+1]
+	local dr, di = wr*jr - wi*ji, wr*ji + wi*jr
+	local ir, ii = v[i], v[i+1]
+	v[j], v[j+1] = ir - dr, ii - di
+	v[i], v[i+1] = ir + dr, ii + di
+      end
+    end
+    dual = dual2
+  until dual >= n
+end
+
+function benchmarks.FFT(n)
+  local l2n = log(n)/log(2)
+  if l2n % 1 ~= 0 then
+    io.stderr:write("Error: FFT data length is not a power of 2\n")
+    os.exit(1)
+  end
+  local v = random_vector(n*2)
+  return function(cycles)
+    local norm = 1.0 / n
+    for p=1,cycles do
+      fft_transform(v, n, -1)
+      fft_transform(v, n, 1)
+      for i=1,n*2 do v[i] = v[i] * norm end
+    end
+    return ((5*n-2)*l2n + 2*(n+1)) * cycles
+  end
+end
+
+------------------------------------------------------------------------------
+-- SOR: Jacobi Successive Over-Relaxation.
+------------------------------------------------------------------------------
+
+local function sor_run(mat, m, n, cycles, omega)
+  local om4, om1 = omega*0.25, 1.0-omega
+  m = m - 1
+  n = n - 1
+  for i=1,cycles do
+    for y=2,m do
+      local v, vp, vn = mat[y], mat[y-1], mat[y+1]
+      for x=2,n do
+	v[x] = om4*((vp[x]+vn[x])+(v[x-1]+v[x+1])) + om1*v[x]
+      end
+    end
+  end
+end
+
+function benchmarks.SOR(n)
+  local mat = random_matrix(n, n)
+  return function(cycles)
+    sor_run(mat, n, n, cycles, 1.25)
+    return (n-1)*(n-1)*cycles*6
+  end
+end
+
+------------------------------------------------------------------------------
+-- MC: Monte Carlo Integration.
+------------------------------------------------------------------------------
+
+local function mc_integrate(cycles)
+  local under_curve = 0
+  local rand = rand
+  for i=1,cycles do
+    local x = rand()
+    local y = rand()
+    if x*x + y*y <= 1.0 then under_curve = under_curve + 1 end
+  end
+  return (under_curve/cycles) * 4
+end
+
+function benchmarks.MC()
+  return function(cycles)
+    local res = mc_integrate(cycles)
+    assert(math.sqrt(cycles)*math.abs(res-math.pi) < 5.0, "bad MC result")
+    return cycles * 4 -- Way off, but same as SciMark in C/Java.
+  end
+end
+
+------------------------------------------------------------------------------
+-- Sparse Matrix Multiplication.
+------------------------------------------------------------------------------
+
+local function sparse_mult(n, cycles, vy, val, row, col, vx)
+  for p=1,cycles do
+    for r=1,n do
+      local sum = 0
+      for i=row[r],row[r+1]-1 do sum = sum + vx[col[i]] * val[i] end
+      vy[r] = sum
+    end
+  end
+end
+
+function benchmarks.SPARSE(n, nz)
+  local nr = floor(nz/n)
+  local anz = nr*n
+  local vx = random_vector(n)
+  local val = random_vector(anz)
+  local vy, col, row = darray(n+1), iarray(nz+1), iarray(n+2)
+  row[1] = 1
+  for r=1,n do
+    local step = floor(r/nr)
+    if step < 1 then step = 1 end
+    local rr = row[r]
+    row[r+1] = rr+nr
+    for i=0,nr-1 do col[rr+i] = 1+i*step end
+  end
+  return function(cycles)
+    sparse_mult(n, cycles, vy, val, row, col, vx)
+    return anz*cycles*2
+  end
+end
+
+------------------------------------------------------------------------------
+-- LU: Dense Matrix Factorization.
+------------------------------------------------------------------------------
+
+local function lu_factor(a, pivot, m, n)
+  local min_m_n = m < n and m or n
+  for j=1,min_m_n do
+    local jp, t = j, abs(a[j][j])
+    for i=j+1,m do
+      local ab = abs(a[i][j])
+      if ab > t then
+	jp = i
+	t = ab
+      end
+    end
+    pivot[j] = jp
+    if a[jp][j] == 0 then error("zero pivot") end
+    if jp ~= j then a[j], a[jp] = a[jp], a[j] end
+    if j < m then
+      local recp = 1.0 / a[j][j]
+      for k=j+1,m do
+	local v = a[k]
+	v[j] = v[j] * recp
+      end
+    end
+    if j < min_m_n then
+      for i=j+1,m do
+	local vi, vj = a[i], a[j]
+	local eij = vi[j]
+	for k=j+1,n do vi[k] = vi[k] - eij * vj[k] end
+      end
+    end
+  end
+end
+
+local function matrix_alloc(m, n)
+  local a = {}
+  for y=1,m do a[y] = darray(n+1) end
+  return a
+end
+
+local function matrix_copy(dst, src, m, n)
+  for y=1,m do
+    local vd, vs = dst[y], src[y]
+    for x=1,n do vd[x] = vs[x] end
+  end
+end
+
+function benchmarks.LU(n)
+  local mat = random_matrix(n, n)
+  local tmp = matrix_alloc(n, n)
+  local pivot = iarray(n+1)
+  return function(cycles)
+    for i=1,cycles do
+      matrix_copy(tmp, mat, n, n)
+      lu_factor(tmp, pivot, n, n)
+    end
+    return 2.0/3.0*n*n*n*cycles
+  end
+end
+
+------------------------------------------------------------------------------
+-- Main program.
+------------------------------------------------------------------------------
+
+local function printf(...)
+  io.write(format(...))
+end
+
+local function fmtparams(p1, p2)
+  if p2 then return format("[%d, %d]", p1, p2)
+  elseif p1 then return format("[%d]", p1) end
+  return ""
+end
+
+local function measure(min_time, name, ...)
+  array_init()
+  rand_init(RANDOM_SEED)
+  local run = benchmarks[name](...)
+  local cycles = 1
+  repeat
+    local tm = clock()
+    local flops = run(cycles, ...)
+    tm = clock() - tm
+    if tm >= min_time then
+      local res = flops / tm * 1.0e-6
+      local p1, p2 = ...
+      printf("%-7s %8.2f  %s\n", name, res, fmtparams(...))
+      return res
+    end
+    cycles = cycles * 2
+  until false
+end
+
+printf("Lua SciMark %s based on SciMark 2.0a. %s.\n\n",
+       SCIMARK_VERSION, SCIMARK_COPYRIGHT)
+
+while arg and arg[1] do
+  local a = table.remove(arg, 1)
+  if a == "-noffi" then
+    package.preload.ffi = nil
+  elseif a == "-small" then
+    SIZE_SELECT = "small"
+  elseif a == "-large" then
+    SIZE_SELECT = "large"
+  elseif benchmarks[a] then
+    local p = benchmarks[SIZE_SELECT][a]
+    measure(MIN_TIME, a, tonumber(arg[1]) or p[1], tonumber(arg[2]) or p[2])
+    return
+  else
+    printf("Usage: scimark [-noffi] [-small|-large] [BENCH params...]\n\n")
+    printf("BENCH   -small         -large\n")
+    printf("---------------------------------------\n")
+    for _,name in ipairs(benchmarks) do
+      printf("%-7s %-13s %s\n", name,
+	     fmtparams(unpack(benchmarks.small[name])),
+	     fmtparams(unpack(benchmarks.large[name])))
+    end
+    printf("\n")
+    os.exit(1)
+  end
+end
+
+local params = benchmarks[SIZE_SELECT]
+local sum = 0
+for _,name in ipairs(benchmarks) do
+  sum = sum + measure(MIN_TIME, name, unpack(params[name]))
+end
+printf("\nSciMark %8.2f  [%s problem sizes]\n", sum / #benchmarks, SIZE_SELECT)
+io.flush()
+
diff --git a/perf/LuaJIT-benches/scimark-fft.lua b/perf/LuaJIT-benches/scimark-fft.lua
new file mode 100644
index 00000000..c05bb69a
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark-fft.lua
@@ -0,0 +1 @@
+require("scimark_lib").FFT(1024)(tonumber(arg and arg[1]) or 50000)
diff --git a/perf/LuaJIT-benches/scimark-lu.lua b/perf/LuaJIT-benches/scimark-lu.lua
new file mode 100644
index 00000000..7636d994
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark-lu.lua
@@ -0,0 +1 @@
+require("scimark_lib").LU(100)(tonumber(arg and arg[1]) or 5000)
diff --git a/perf/LuaJIT-benches/scimark-sor.lua b/perf/LuaJIT-benches/scimark-sor.lua
new file mode 100644
index 00000000..e537e986
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark-sor.lua
@@ -0,0 +1 @@
+require("scimark_lib").SOR(100)(tonumber(arg and arg[1]) or 50000)
diff --git a/perf/LuaJIT-benches/scimark-sparse.lua b/perf/LuaJIT-benches/scimark-sparse.lua
new file mode 100644
index 00000000..01a2258d
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark-sparse.lua
@@ -0,0 +1 @@
+require("scimark_lib").SPARSE(1000, 5000)(tonumber(arg and arg[1]) or 150000)
diff --git a/perf/LuaJIT-benches/scimark_lib.lua b/perf/LuaJIT-benches/scimark_lib.lua
new file mode 100644
index 00000000..aeffd75a
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark_lib.lua
@@ -0,0 +1,297 @@
+------------------------------------------------------------------------------
+-- Lua SciMark (2010-03-15).
+--
+-- A literal translation of SciMark 2.0a, written in Java and C.
+-- Credits go to the original authors Roldan Pozo and Bruce Miller.
+-- See: http://math.nist.gov/scimark2/
+------------------------------------------------------------------------------
+
+
+local SCIMARK_VERSION = "2010-03-15"
+
+local RANDOM_SEED = 101009 -- Must be odd.
+
+local abs, log, sin, floor = math.abs, math.log, math.sin, math.floor
+local pi, clock = math.pi, os.clock
+
+local benchmarks = {}
+
+------------------------------------------------------------------------------
+-- This is a Lagged Fibonacci Pseudo-random Number Generator with
+-- j, k, M = 5, 17, 31. Pretty weak, but same as C/Java SciMark.
+------------------------------------------------------------------------------
+
+local rand, rand_init
+
+if jit and jit.status and jit.status() then
+  -- LJ2 has bit operations and zero-based arrays (internally).
+  local bit = require("bit")
+  local band, sar = bit.band, bit.arshift
+  local Rm, Rj, Ri = {}, 0, 0
+  for i=0,16 do Rm[i] = 0 end
+  function rand_init(seed)
+    Rj, Ri = 16, 11
+    for i=16,0,-1 do
+      seed = band(seed*9069, 0x7fffffff)
+      Rm[i] = seed
+    end
+  end
+  function rand()
+    local i = band(Ri+1, sar(Ri-16, 31))
+    local j = band(Rj+1, sar(Rj-16, 31))
+    Ri, Rj = i, j
+    local k = band(Rm[i] - Rm[j], 0x7fffffff)
+    Rm[j] = k
+    return k * (1.0/2147483647.0)
+  end
+else
+  -- Better for standard Lua with one-based arrays and without bit operations.
+  local Rm, Rj = {}, 1
+  for i=1,17 do Rm[i] = 0 end
+  function rand_init(seed)
+    Rj = 1
+    for i=17,1,-1 do
+      seed = (seed*9069) % (2^31)
+      Rm[i] = seed
+    end
+  end
+  function rand()
+    local j, m = Rj, Rm
+    local h = j - 5
+    if h < 1 then h = h + 17 end
+    local k = m[h] - m[j]
+    if k < 0 then k = k + 2147483647 end
+    m[j] = k
+    if j < 17 then Rj = j + 1 else Rj = 1 end
+    return k * (1.0/2147483647.0)
+  end
+end
+
+local function random_vector(n)
+  local v = {}
+  for x=1,n do v[x] = rand() end
+  return v
+end
+
+local function random_matrix(m, n)
+  local a = {}
+  for y=1,m do
+    local v = {}
+    a[y] = v
+    for x=1,n do v[x] = rand() end
+  end
+  return a
+end
+
+------------------------------------------------------------------------------
+-- FFT: Fast Fourier Transform.
+------------------------------------------------------------------------------
+
+local function fft_bitreverse(v, n)
+  local j = 0
+  for i=0,2*n-4,2 do
+    if i < j then
+      v[i+1], v[i+2], v[j+1], v[j+2] = v[j+1], v[j+2], v[i+1], v[i+2]
+    end
+    local k = n
+    while k <= j do j = j - k; k = k / 2 end
+    j = j + k
+  end
+end
+
+local function fft_transform(v, n, dir)
+  if n <= 1 then return end
+  fft_bitreverse(v, n)
+  local dual = 1
+  repeat
+    local dual2 = 2*dual
+    for i=1,2*n-1,2*dual2 do
+      local j = i+dual2
+      local ir, ii = v[i], v[i+1]
+      local jr, ji = v[j], v[j+1]
+      v[j], v[j+1] = ir - jr, ii - ji
+      v[i], v[i+1] = ir + jr, ii + ji
+    end
+    local theta = dir * pi / dual
+    local s, s2 = sin(theta), 2.0 * sin(theta * 0.5)^2
+    local wr, wi = 1.0, 0.0
+    for a=3,dual2-1,2 do
+      wr, wi = wr - s*wi - s2*wr, wi + s*wr - s2*wi
+      for i=a,a+2*(n-dual2),2*dual2 do
+	local j = i+dual2
+	local jr, ji = v[j], v[j+1]
+	local dr, di = wr*jr - wi*ji, wr*ji + wi*jr
+	local ir, ii = v[i], v[i+1]
+	v[j], v[j+1] = ir - dr, ii - di
+	v[i], v[i+1] = ir + dr, ii + di
+      end
+    end
+    dual = dual2
+  until dual >= n
+end
+
+function benchmarks.FFT(n)
+  local l2n = log(n)/log(2)
+  if l2n % 1 ~= 0 then
+    io.stderr:write("Error: FFT data length is not a power of 2\n")
+    os.exit(1)
+  end
+  local v = random_vector(n*2)
+  return function(cycles)
+    local norm = 1.0 / n
+    for p=1,cycles do
+      fft_transform(v, n, -1)
+      fft_transform(v, n, 1)
+      for i=1,n*2 do v[i] = v[i] * norm end
+    end
+    return ((5*n-2)*l2n + 2*(n+1)) * cycles
+  end
+end
+
+------------------------------------------------------------------------------
+-- SOR: Jacobi Successive Over-Relaxation.
+------------------------------------------------------------------------------
+
+local function sor_run(mat, m, n, cycles, omega)
+  local om4, om1 = omega*0.25, 1.0-omega
+  m = m - 1
+  n = n - 1
+  for i=1,cycles do
+    for y=2,m do
+      local v, vp, vn = mat[y], mat[y-1], mat[y+1]
+      for x=2,n do
+	v[x] = om4*((vp[x]+vn[x])+(v[x-1]+v[x+1])) + om1*v[x]
+      end
+    end
+  end
+end
+
+function benchmarks.SOR(n)
+  local mat = random_matrix(n, n)
+  return function(cycles)
+    sor_run(mat, n, n, cycles, 1.25)
+    return (n-1)*(n-1)*cycles*6
+  end
+end
+
+------------------------------------------------------------------------------
+-- MC: Monte Carlo Integration.
+------------------------------------------------------------------------------
+
+local function mc_integrate(cycles)
+  local under_curve = 0
+  local rand = rand
+  for i=1,cycles do
+    local x = rand()
+    local y = rand()
+    if x*x + y*y <= 1.0 then under_curve = under_curve + 1 end
+  end
+  return (under_curve/cycles) * 4
+end
+
+function benchmarks.MC()
+  return function(cycles)
+    local res = mc_integrate(cycles)
+    assert(math.sqrt(cycles)*math.abs(res-math.pi) < 5.0, "bad MC result")
+    return cycles * 4 -- Way off, but same as SciMark in C/Java.
+  end
+end
+
+------------------------------------------------------------------------------
+-- Sparse Matrix Multiplication.
+------------------------------------------------------------------------------
+
+local function sparse_mult(n, cycles, vy, val, row, col, vx)
+  for p=1,cycles do
+    for r=1,n do
+      local sum = 0
+      for i=row[r],row[r+1]-1 do sum = sum + vx[col[i]] * val[i] end
+      vy[r] = sum
+    end
+  end
+end
+
+function benchmarks.SPARSE(n, nz)
+  local nr = floor(nz/n)
+  local anz = nr*n
+  local vx = random_vector(n)
+  local val = random_vector(anz)
+  local vy, col, row = {}, {}, {}
+  row[1] = 1
+  for r=1,n do
+    local step = floor(r/nr)
+    if step < 1 then step = 1 end
+    local rr = row[r]
+    row[r+1] = rr+nr
+    for i=0,nr-1 do col[rr+i] = 1+i*step end
+  end
+  return function(cycles)
+    sparse_mult(n, cycles, vy, val, row, col, vx)
+    return anz*cycles*2
+  end
+end
+
+------------------------------------------------------------------------------
+-- LU: Dense Matrix Factorization.
+------------------------------------------------------------------------------
+
+local function lu_factor(a, pivot, m, n)
+  local min_m_n = m < n and m or n
+  for j=1,min_m_n do
+    local jp, t = j, abs(a[j][j])
+    for i=j+1,m do
+      local ab = abs(a[i][j])
+      if ab > t then
+	jp = i
+	t = ab
+      end
+    end
+    pivot[j] = jp
+    if a[jp][j] == 0 then error("zero pivot") end
+    if jp ~= j then a[j], a[jp] = a[jp], a[j] end
+    if j < m then
+      local recp = 1.0 / a[j][j]
+      for k=j+1,m do
+	local v = a[k]
+	v[j] = v[j] * recp
+      end
+    end
+    if j < min_m_n then
+      for i=j+1,m do
+	local vi, vj = a[i], a[j]
+	local eij = vi[j]
+	for k=j+1,n do vi[k] = vi[k] - eij * vj[k] end
+      end
+    end
+  end
+end
+
+local function matrix_alloc(m, n)
+  local a = {}
+  for y=1,m do a[y] = {} end
+  return a
+end
+
+local function matrix_copy(dst, src, m, n)
+  for y=1,m do
+    local vd, vs = dst[y], src[y]
+    for x=1,n do vd[x] = vs[x] end
+  end
+end
+
+function benchmarks.LU(n)
+  local mat = random_matrix(n, n)
+  local tmp = matrix_alloc(n, n)
+  local pivot = {}
+  return function(cycles)
+    for i=1,cycles do
+      matrix_copy(tmp, mat, n, n)
+      lu_factor(tmp, pivot, n, n)
+    end
+    return 2.0/3.0*n*n*n*cycles
+  end
+end
+
+rand_init(RANDOM_SEED)
+
+return benchmarks
diff --git a/perf/LuaJIT-benches/series.lua b/perf/LuaJIT-benches/series.lua
new file mode 100644
index 00000000..f766cb32
--- /dev/null
+++ b/perf/LuaJIT-benches/series.lua
@@ -0,0 +1,34 @@
+
+local function integrate(x0, x1, nsteps, omegan, f)
+  local x, dx = x0, (x1-x0)/nsteps
+  local rvalue = ((x0+1)^x0 * f(omegan*x0)) / 2
+  for i=3,nsteps do
+    x = x + dx
+    rvalue = rvalue + (x+1)^x * f(omegan*x)
+  end
+  return (rvalue + ((x1+1)^x1 * f(omegan*x1)) / 2) * dx
+end
+
+local function series(n)
+  local sin, cos = math.sin, math.cos
+  local omega = math.pi
+  local t = {}
+
+  t[1] = integrate(0, 2, 1000, 0, function() return 1 end) / 2
+  t[2] = 0
+
+  for i=2,n do
+    t[2*i-1] = integrate(0, 2, 1000, omega*i, cos)
+    t[2*i] = integrate(0, 2, 1000, omega*i, sin)
+  end
+
+  return t
+end
+
+local n = tonumber(arg and arg[1]) or 10000
+local tm = os.clock()
+local t = series(n)
+tm = os.clock() - tm
+assert(math.abs(t[1]-2.87295) < 0.00001)
+io.write(string.format("size %d, %.2f s, %.1f iterations/s\n",
+                       n, tm, (2*n-1)/tm))
diff --git a/perf/LuaJIT-benches/spectral-norm.lua b/perf/LuaJIT-benches/spectral-norm.lua
new file mode 100644
index 00000000..ecc80112
--- /dev/null
+++ b/perf/LuaJIT-benches/spectral-norm.lua
@@ -0,0 +1,40 @@
+
+local function A(i, j)
+  local ij = i+j-1
+  return 1.0 / (ij * (ij-1) * 0.5 + i)
+end
+
+local function Av(x, y, N)
+  for i=1,N do
+    local a = 0
+    for j=1,N do a = a + x[j] * A(i, j) end
+    y[i] = a
+  end
+end
+
+local function Atv(x, y, N)
+  for i=1,N do
+    local a = 0
+    for j=1,N do a = a + x[j] * A(j, i) end
+    y[i] = a
+  end
+end
+
+local function AtAv(x, y, t, N)
+  Av(x, t, N)
+  Atv(t, y, N)
+end
+
+local N = tonumber(arg and arg[1]) or 100
+local u, v, t = {}, {}, {}
+for i=1,N do u[i] = 1 end
+
+for i=1,10 do AtAv(u, v, t, N) AtAv(v, u, t, N) end
+
+local vBv, vv = 0, 0
+for i=1,N do
+  local ui, vi = u[i], v[i]
+  vBv = vBv + ui*vi
+  vv = vv + vi*vi
+end
+io.write(string.format("%0.9f\n", math.sqrt(vBv / vv)))
diff --git a/perf/LuaJIT-benches/sum-file.lua b/perf/LuaJIT-benches/sum-file.lua
new file mode 100644
index 00000000..c9e618fd
--- /dev/null
+++ b/perf/LuaJIT-benches/sum-file.lua
@@ -0,0 +1,6 @@
+
+local sum = 0
+for line in io.lines() do
+  sum = sum + line
+end
+io.write(sum, "\n")
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 02/41] perf: introduce clock module
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 01/41] perf: add LuaJIT-test-cleanup perf suite Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 03/41] perf: introduce bench module Sergey Kaplun via Tarantool-patches
                   ` (38 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This module contains 2 functions:
- `realtime()` -- returns the time represented by the wall clock.
- `process_cputime()` -- returns the time consumed by all threads of
  the process.

Both functions are implemented via an FFI call to `clock_gettime()`.
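
A minimal usage sketch for the two clocks: take both readings around a
call and subtract. The `measure` helper and the `fallback_clock` table
are hypothetical and not part of this patch; the fallback uses only the
standard library (with one-second wall-clock resolution) so that the
snippet runs without the FFI. With the module on `LUA_PATH`, the result
of `require('clock')` could be passed instead.

```lua
-- Hypothetical helper: time a single call of f() with both clocks.
-- `clock` is any table providing realtime() and process_cputime().
local function measure(clock, f)
  local start_real = clock.realtime()
  local start_cpu = clock.process_cputime()
  f()
  local delta_real = clock.realtime() - start_real
  local delta_cpu = clock.process_cputime() - start_cpu
  return delta_real, delta_cpu
end

-- Stand-in clock built on the standard library (an assumption made
-- only to keep the sketch self-contained).
local fallback_clock = {
  realtime = os.time,
  process_cputime = os.clock,
}

local real, cpu = measure(fallback_clock, function()
  local s = 0
  for i = 1, 1e6 do s = s + i end
  return s
end)
print(string.format('real: %.2f s, cpu: %.2f s', real, cpu))
```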
---
 perf/utils/clock.lua | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 perf/utils/clock.lua

diff --git a/perf/utils/clock.lua b/perf/utils/clock.lua
new file mode 100644
index 00000000..57385967
--- /dev/null
+++ b/perf/utils/clock.lua
@@ -0,0 +1,35 @@
+local ffi = require('ffi')
+
+ffi.cdef[[
+struct timespec {
+  long tv_sec; /* Seconds. */
+  long tv_nsec; /* Nanoseconds. */
+};
+
+int clock_gettime(int clockid, struct timespec *tp);
+]]
+
+local C = ffi.C
+
+-- Wall clock.
+local CLOCK_REALTIME = 0
+-- CPU time consumed by the process.
+local CLOCK_PROCESS_CPUTIME_ID = 2
+
+-- All functions below return the corresponding `clock_gettime()`
+-- value in seconds.
+local M = {}
+
+local timespec = ffi.new('struct timespec[1]')
+
+function M.realtime()
+  C.clock_gettime(CLOCK_REALTIME, timespec)
+  return tonumber(timespec[0].tv_sec) + tonumber(timespec[0].tv_nsec) / 1e9
+end
+
+function M.process_cputime()
+  C.clock_gettime(CLOCK_PROCESS_CPUTIME_ID, timespec)
+  return tonumber(timespec[0].tv_sec) + tonumber(timespec[0].tv_nsec) / 1e9
+end
+
+return M
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 03/41] perf: introduce bench module
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 01/41] perf: add LuaJIT-test-cleanup perf suite Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 02/41] perf: introduce clock module Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 04/41] perf: adjust array3d in LuaJIT-benches Sergey Kaplun via Tarantool-patches
                   ` (37 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This module provides functionality to run custom benchmark workloads
defined by the following syntax:

| local bench = require('bench').new(arg)
|
| -- f_* are functions, n_* are numbers.
| bench:add({
|   setup = f_setup,
|   payload = f_payload,
|   teardown = f_teardown,
|   items = n_items_processed,
|
|   checker = f_checker,
|   -- Or instead:
|   skip_check = true,
|
|   iterations = n_iterations,
|   -- Or instead:
|   min_time = n_seconds,
| })
|
| bench:run_and_report()

The checker function receives the single value returned by the payload
function and performs all checks related to the test. If it returns a
true value, the check is considered passed. Before the main workload,
the payload is run once as a warm-up, and the checker is called on its
result. Generally, you should always provide the checker function to be
sure that your benchmark is still correct after optimizations. If that
is impossible for some reason, you may specify the `skip_check` flag
instead. In that case, the warm-up run is skipped as well.
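
The warm-up and check step described above can be sketched as follows.
This is a simplified illustration, not the module's code: `warm_up` and
the `sample` description table are hypothetical names.

```lua
-- Run the payload once and validate its result, unless the bench asks
-- to skip the check. Returns true if the warm-up was performed.
local function warm_up(bench)
  if bench.skip_check then
    return false
  end
  local result = bench.payload()
  assert(bench.checker(result), 'checker failed')
  return true
end

local sample = {
  payload = function()
    local s = 0
    for i = 1, 100 do s = s + i end
    return s
  end,
  -- The checker gets the payload's return value.
  checker = function(sum) return sum == 5050 end,
}

assert(warm_up(sample) == true)
-- With skip_check set, the payload is not run at all.
assert(warm_up({skip_check = true, payload = error}) == false)
```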

The tests are run in the order they were added. The module measures the
real time and CPU time needed to run either `iterations` repetitions of
the test or, if `iterations` is not set, as many repetitions as it takes
to run for at least `min_time` seconds (4 by default). From these it
calculates the items-per-second metric (more is better). The total
number of items equals `n_items_processed * n_iterations`. The `items`
field of the description table may also be set from inside the payload
function. The results (real time, CPU time, iterations, items/s) are
reported in a format similar to the Google Benchmark suite [1].
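
As a worked example of the metric, assuming the result is rounded down
to an integer: a test declaring 1000 items per payload call that runs 10
iterations in 4 seconds of real time yields 2500 items/s. The
`items_per_second` helper below is a hypothetical name for this
calculation.

```lua
-- Total items processed divided by the elapsed real time, rounded down.
local function items_per_second(items, iterations, real_time)
  return math.floor(items * iterations / real_time)
end

local ips = items_per_second(1000, 10, 4)
print(ips) -- 2500
```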

Each test may be run from the command line as follows:
| LUA_PATH="..." luajit test_name.lua [flags] arguments

The supported flags are:
| -j{off|on}                 Disable/Enable JIT for the benchmarks.
| --benchmark_color={true|false|auto}
|                            Enables the colorized output for the
|                            terminal (not the file).
| --benchmark_min_time={number} Minimum seconds to run the benchmark
|                            tests.
| --benchmark_out=<file>     Places the output into <file>.
| --benchmark_out_format={console|json}
|                            The format used when saving the results to the
|                            file. The default format is JSON.
| -h, --help                 Display help message and exit.

These options are similar to the Google Benchmark command line options,
but with a few changes:
1) If an output file is given, there is no output in the terminal.
2) The min_time option supports only number values. There is no support
   for the iterations number (by the 'x' suffix).
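
The `--option=value` form, including the number-only restriction on
`min_time`, can be illustrated with a small sketch. `split_long_option`
is a hypothetical stand-alone helper; it only mimics the flag syntax
described above and relies on the standard `tonumber()` conversion.

```lua
-- Split '--name=value' into its name and value parts.
local function split_long_option(a)
  -- Drop the leading dashes.
  local opt = a:sub(3)
  if opt:find('=', 1, true) then
    return opt:match('^([^=]+)=(.*)$')
  end
  -- No '=': the value, if any, comes as the next argument.
  return opt, nil
end

local name, value = split_long_option('--benchmark_min_time=2.5')
assert(name == 'benchmark_min_time' and tonumber(value) == 2.5)

-- A Google Benchmark style iteration count such as '10x' is not a
-- number, so it would be rejected as an invalid min time.
local _, bad = split_long_option('--benchmark_min_time=10x')
assert(tonumber(bad) == nil)
```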

[1]: https://github.com/google/benchmark
---
 perf/utils/bench.lua | 509 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 509 insertions(+)
 create mode 100644 perf/utils/bench.lua

diff --git a/perf/utils/bench.lua b/perf/utils/bench.lua
new file mode 100644
index 00000000..68473215
--- /dev/null
+++ b/perf/utils/bench.lua
@@ -0,0 +1,509 @@
+local clock = require('clock')
+local ffi = require('ffi')
+-- Require 'cjson' only on demand for formatted output to file.
+local json
+
+local M = {}
+
+local type, assert, error = type, assert, error
+local format, rep = string.format, string.rep
+local floor, max, min = math.floor, math.max, math.min
+local table_remove = table.remove
+
+local LJ_HASJIT = jit and jit.opt
+
+-- Argument parsing. ---------------------------------------------
+
+-- XXX: Make options compatible with Google Benchmark, since most
+-- probably it will be used for the C benchmarks as well.
+-- Compatibility isn't full: there is no support for environment
+-- variables (since they are not so useful) and the output to the
+-- terminal is suppressed if the --benchmark_out flag is
+-- specified.
+
+local HELP_MSG = [[
+ Options:
+   -j{off|on}                 Disable/Enable JIT for the benchmarks.
+   --benchmark_color={true|false|auto}
+                              Enables the colorized output for the terminal (not
+                              the file). 'auto' means to use colors if the
+                              output is being sent to a terminal and the TERM
+                              environment variable is set to a terminal type
+                              that supports colors.
+   --benchmark_min_time={number}
+                              Minimum seconds to run the benchmark tests.
+                              4.0 by default.
+   --benchmark_out=<file>     Places the output into <file>.
+   --benchmark_out_format={console|json}
+                              The format used when saving the results to the
+                              file. The default format is JSON.
+   -h, --help                 Display this message and exit.
+
+ There are a bunch of suggestions on how to achieve the most
+ stable benchmark results:
+ https://github.com/tarantool/tarantool/wiki/Benchmarking
+]]
+
+local function usage(ctx)
+  local header = format('USAGE: luajit %s [options]\n', ctx.name)
+  io.stderr:write(header, HELP_MSG)
+  os.exit(1)
+end
+
+local function check_param(check, strfmt, ...)
+  if not check then
+    io.stderr:write(format(strfmt, ...))
+    os.exit(1)
+  end
+end
+
+-- Values that disable colors: 'false'/'no'/'0'.
+-- Any other value, including an invalid one, enables them.
+local function set_color(ctx, value)
+  if value == 'false' or value == 'no' or value == '0' then
+    ctx.color = false
+  else
+    -- In case of an invalid value, the Google Benchmark uses
+    -- 'auto', which is true for the stdout output (the only
+    -- colorizable output). So just set it to true by default.
+    ctx.color = true
+  end
+end
+
+local DEFAULT_MIN_TIME = 4.0
+local function set_min_time(ctx, value)
+  local time = tonumber(value)
+  check_param(time, 'Invalid min time: "%s"\n', value)
+  ctx.min_time = time
+end
+
+local function set_output(ctx, filename)
+  check_param(type(filename) == "string", 'Invalid output value: "%s"\n',
+              filename)
+  ctx.output = filename
+end
+
+-- Determine the output format for the benchmark.
+-- Supports only 'console' and 'json' for now.
+local function set_output_format(ctx, value)
+  local output_format = tostring(value)
+  check_param(output_format, 'Invalid output format: "%s"\n', value)
+  output_format = output_format:lower()
+  check_param(output_format == 'json' or output_format == 'console',
+              'Unsupported output format: "%s"\n', output_format)
+  ctx.output_format = output_format
+end
+
+local function set_jit(ctx, value)
+  check_param(value == 'on' or value == 'off',
+             'Invalid jit value: "%s"\n', value)
+  if value == 'off' then
+    ctx.jit = false
+  elseif value == 'on' then
+    ctx.jit = true
+  end
+end
+
+local function unrecognized_option(optname, dashes)
+  local fullname = dashes .. (optname or '=')
+  io.stderr:write(format('unrecognized command-line flag: %s\n', fullname))
+  io.stderr:write(HELP_MSG)
+  os.exit(1)
+end
+
+local function unrecognized_long_option(_, optname)
+  unrecognized_option(optname, '--')
+end
+
+local function unrecognized_short_option(_, optname)
+  unrecognized_option(optname, '-')
+end
+
+local SHORT_OPTS = setmetatable({
+  ['h'] = usage,
+  ['j'] = set_jit,
+}, {__index = unrecognized_short_option})
+
+local LONG_OPTS = setmetatable({
+  ['benchmark_color'] = set_color,
+  ['benchmark_min_time'] = set_min_time,
+  ['benchmark_out'] = set_output,
+  -- XXX: For now support only JSON encoded and raw output.
+  ['benchmark_out_format'] = set_output_format,
+  ['help'] = usage,
+}, {__index = unrecognized_long_option})
+
+local function is_option(str)
+  return type(str) == 'string' and str:sub(1, 1) == '-' and str ~= '-'
+end
+
+local function next_arg_value(arg, n)
+  local opt_value = nil
+  if arg[n] and not is_option(arg[n]) then
+    opt_value = arg[n]
+    table_remove(arg, n)
+  end
+  return opt_value
+end
+
+local function parse_long_option(arg, a, n)
+  local opt_name, opt_value
+  -- Remove dashes.
+  local opt = a:sub(3)
+  -- --option=value
+  if opt:find('=', 1, true) then
+    -- May match empty option name and/or value.
+    opt_name, opt_value = opt:match('^([^=]+)=(.*)$')
+  else
+    -- --option value
+    opt_name = opt
+    opt_value = next_arg_value(arg, n)
+  end
+  return opt_name, opt_value
+end
+
+local function parse_short_option(arg, a, n)
+  local opt_name, opt_value
+  -- Remove the dash.
+  local opt = a:sub(2)
+  if #opt == 1 then
+    -- -o value
+    opt_name = opt
+    opt_value = next_arg_value(arg, n)
+  else
+    -- -ovalue.
+    opt_name = opt:sub(1, 1)
+    opt_value = opt:sub(2)
+  end
+  return opt_name, opt_value
+end
+
+local function parse_opt(ctx, arg, a, n)
+  if a:sub(1, 2) == '--' then
+    local opt_name, opt_value = parse_long_option(arg, a, n)
+    LONG_OPTS[opt_name](ctx, opt_value)
+  else
+    local opt_name, opt_value = parse_short_option(arg, a, n)
+    SHORT_OPTS[opt_name](ctx, opt_value)
+  end
+end
+
+-- Process the options and update the benchmark context.
+local function argparse(arg, name)
+  local ctx = {name = name}
+  local n = 1
+  while n <= #arg do
+    local a = arg[n]
+    if is_option(a) then
+      table_remove(arg, n)
+      parse_opt(ctx, arg, a, n)
+    else
+      -- Just ignore it.
+      n = n + 1
+    end
+  end
+  return ctx
+end
+
+-- Formatting. ---------------------------------------------------
+
+local function format_console_header()
+  -- Use a similar format to the Google Benchmark, except for the
+  -- fixed benchmark name length.
+  local header = format('%-37s %12s %15s %13s %-28s\n',
+    'Benchmark', 'Time', 'CPU', 'Iterations', 'UserCounters...'
+  )
+  local border = rep('-', #header - 1) .. '\n'
+  return border .. header .. border
+end
+
+local COLORS = {
+  GREEN = '\027[32m%s\027[m',
+  YELLOW = '\027[33m%s\027[m',
+  CYAN = '\027[36m%s\027[m',
+}
+
+local function format_name(ctx, name)
+  name = format('%-37s ', name)
+  if ctx.color then
+     name = format(COLORS.GREEN, name)
+  end
+  return name
+end
+
+local function format_time(ctx, real_time, cpu_time, time_unit)
+  local timestr = format('%10.2f %-4s %10.2f %-4s ', real_time, time_unit,
+                         cpu_time, time_unit)
+  if ctx.color then
+     timestr = format(COLORS.YELLOW, timestr)
+  end
+  return timestr
+end
+
+local function format_iterations(ctx, iterations)
+  iterations = format('%10d ', iterations)
+  if ctx.color then
+     iterations = format(COLORS.CYAN, iterations)
+  end
+  return iterations
+end
+
+local function format_ips(ips)
+  local ips_str
+  if ips / 1e6 > 1 then
+    ips_str = format('items_per_second=%.3fM/s', ips / 1e6)
+  elseif ips / 1e3 > 1 then
+    ips_str = format('items_per_second=%.3fk/s', ips / 1e3)
+  else
+    ips_str = format('items_per_second=%d/s', ips)
+  end
+  return ips_str
+end
+
+local function format_result_console(ctx, r)
+  return format('%s%s%s%s\n',
+    format_name(ctx, r.name),
+    format_time(ctx, r.real_time, r.cpu_time, r.time_unit),
+    format_iterations(ctx, r.iterations),
+    format_ips(r.items_per_second)
+  )
+end
+
+local function format_results(ctx)
+  local output_format = ctx.output_format
+  local res = ''
+  if output_format == 'json' then
+    res = json.encode({
+      benchmarks = ctx.results,
+      context = ctx.context,
+    })
+  else
+    assert(output_format == 'console', 'Unknown format: ' .. output_format)
+    res = res .. format_console_header()
+    for _, r in ipairs(ctx.results) do
+      res = res .. format_result_console(ctx, r)
+    end
+  end
+  return res
+end
+
+local function report_results(ctx)
+  ctx.fh:write(format_results(ctx))
+end
+
+-- Tests setup and run. ------------------------------------------
+
+local function term_is_color()
+  local term = os.getenv('TERM')
+  return (term and term:match('color') or os.getenv('COLORTERM'))
+end
+
+local function benchmark_context(ctx)
+  return {
+    arch = jit.arch,
+    -- Google Benchmark reports a date in ISO 8601 format.
+    date = os.date('%Y-%m-%dT%H:%M:%S%z'),
+    gc64 = ffi.abi('gc64'),
+    host_name = io.popen('hostname'):read(),
+    jit = ctx.jit,
+  }
+end
+
+local function init(ctx)
+  -- Array of benches to proceed with.
+  ctx.benches = {}
+  -- Array of the corresponding results.
+  ctx.results = {}
+
+  if ctx.jit == nil then
+    if LJ_HASJIT then
+      ctx.jit = jit.status()
+    else
+      ctx.jit = false
+    end
+  end
+  ctx.color = ctx.color == nil and true or ctx.color
+  if ctx.output then
+    -- Don't bother with manual file closing. It will be closed
+    -- automatically when the corresponding object is
+    -- garbage-collected.
+    ctx.fh = assert(io.open(ctx.output, 'w+'))
+    ctx.output_format = ctx.output_format or 'json'
+    -- Always without color.
+    ctx.color = false
+  else
+    ctx.fh = io.stdout
+    -- Always console output to the terminal.
+    ctx.output_format = 'console'
+    if ctx.color and term_is_color() then
+      ctx.color = true
+    else
+      ctx.color = false
+    end
+  end
+  ctx.min_time = ctx.min_time or DEFAULT_MIN_TIME
+
+  if ctx.output_format == 'json' then
+    json = require('cjson')
+  end
+
+  -- Google Benchmark's context, plus benchmark info.
+  ctx.context = benchmark_context(ctx)
+
+  return ctx
+end
+
+local function test_name()
+  return debug.getinfo(3, 'S').short_src:match('([^/\\]+)$')
+end
+
+local function add_bench(ctx, bench)
+  if bench.checker == nil and not bench.skip_check then
+    error('Bench requires a checker to prove the results', 2)
+  end
+  table.insert(ctx.benches, bench)
+end
+
+local MAX_ITERATIONS = 1e9
+-- Determine the number of iterations for the next benchmark run.
+local function iterations_multiplier(min_time, get_time, iterations)
+  -- When the last run is at least 10% of the required time, the
+  -- maximum expansion should be 14x.
+  local multiplier = min_time * 1.4 / max(get_time, 1e-9)
+  local is_significant = get_time / min_time > 0.1
+  multiplier = is_significant and multiplier or 10
+  local new_iterations = max(floor(multiplier * iterations), iterations + 1)
+  return min(new_iterations, MAX_ITERATIONS)
+end
+
+-- https://luajit.org/running.html#foot.
+local JIT_DEFAULTS = {
+  maxtrace = 1000,
+  maxrecord = 4000,
+  maxirconst = 500,
+  maxside = 100,
+  maxsnap = 500,
+  hotloop = 56,
+  hotexit = 10,
+  tryside = 4,
+  instunroll = 4,
+  loopunroll = 15,
+  callunroll = 3,
+  recunroll = 2,
+  sizemcode = 32,
+  maxmcode = 512,
+}
+
+-- Basic setup for all tests to clean up after a previous
+-- executor.
+local function luajit_tests_setup(ctx)
+  -- Reset the JIT to the defaults.
+  if ctx.jit == false then
+    jit.off()
+  elseif LJ_HASJIT then
+    jit.on()
+    jit.flush()
+    jit.opt.start(3)
+    for k, v in pairs(JIT_DEFAULTS) do
+      jit.opt.start(k .. '=' .. v)
+    end
+  end
+
+  -- Reset the GC to the defaults.
+  collectgarbage('setstepmul', 200)
+  collectgarbage('setpause', 200)
+
+  -- Collect all garbage at the end. Twice to be sure that all
+  -- finalizers are run.
+  collectgarbage()
+  collectgarbage()
+end
+
+local function run_benches(ctx)
+  -- Process the tests in the predefined order with ipairs.
+  for _, bench in ipairs(ctx.benches) do
+    luajit_tests_setup(ctx)
+    if bench.setup then bench.setup() end
+
+    -- The first run is used as a warm-up, plus results checks.
+    local payload = bench.payload
+    -- Generally you should never skip any checks. But sometimes
+    -- a bench may generate so much output in one run that it is
+    -- overkill to save the result in the file and test it.
+    -- So to avoid double time for the test run, just skip the
+    -- check.
+    if not bench.skip_check then
+      local result = payload()
+      assert(bench.checker(result))
+    end
+    local N
+    local delta_real, delta_cpu
+    -- Iterations are specified manually.
+    if bench.iterations then
+      N = bench.iterations
+
+      local start_real = clock.realtime()
+      local start_cpu  = clock.process_cputime()
+      for _ = 1, N do
+        payload()
+      end
+      delta_real = clock.realtime() - start_real
+      delta_cpu  = clock.process_cputime() - start_cpu
+    else
+      -- Iterations are determined dynamically, adjusting to fit
+      -- the minimum time to run for the benchmark.
+      local min_time = bench.min_time or ctx.min_time
+      local next_iterations = 1
+      repeat
+        N = next_iterations
+        local start_real = clock.realtime()
+        local start_cpu  = clock.process_cputime()
+        for _ = 1, N do
+          payload()
+        end
+        delta_real = clock.realtime() - start_real
+        delta_cpu  = clock.process_cputime() - start_cpu
+        next_iterations = iterations_multiplier(min_time, delta_real, N)
+      until delta_real > min_time or N == next_iterations
+    end
+
+    if bench.teardown then bench.teardown() end
+
+    local items = N * bench.items
+    local items_per_second = math.floor(items / delta_real)
+    table.insert(ctx.results, {
+      cpu_time = delta_cpu,
+      real_time = delta_real,
+      items_per_second = items_per_second,
+      iterations = N,
+      name = bench.name,
+      time_unit = 's',
+      -- Fields below are used only for the Google Benchmark
+      -- compatibility. We don't use them really.
+      run_name = bench.name,
+      run_type = 'iteration',
+      repetitions = 1,
+      repetition_index = 1,
+      threads = 1,
+    })
+  end
+end
+
+local function run_and_report(ctx)
+  run_benches(ctx)
+  report_results(ctx)
+end
+
+function M.new(arg)
+  assert(type(arg) == 'table', 'given argument should be a table')
+  local name = test_name()
+  local ctx = init(argparse(arg, name))
+  return setmetatable(ctx, {__index = {
+    add = add_bench,
+    run = run_benches,
+    report = report_results,
+    run_and_report = run_and_report,
+  }})
+end
+
+return M
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 04/41] perf: adjust array3d in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (2 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 03/41] perf: introduce bench module Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 05/41] perf: adjust binary-trees " Sergey Kaplun via Tarantool-patches
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The number of iterations is fixed for this test to avoid OOM errors
for the non-GC64 builds.
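
A back-of-the-envelope estimate shows why each iteration is expensive in
memory. This is a sketch under assumptions: the default `dim` of 300 and
a flat Lua table storing one 8-byte value slot per point; the real
footprint depends on the table growth strategy and on the packed mode.

```lua
local dim = 300
local SLOT_SIZE = 8 -- Assumed bytes per stored value.
local points = dim * dim * dim
local bytes = points * SLOT_SIZE
print(string.format('%d points, at least %d MB per array',
                    points, bytes / 1e6))
```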
---
 perf/LuaJIT-benches/array3d.lua | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/perf/LuaJIT-benches/array3d.lua b/perf/LuaJIT-benches/array3d.lua
index c10b09b1..75ab5b01 100644
--- a/perf/LuaJIT-benches/array3d.lua
+++ b/perf/LuaJIT-benches/array3d.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local function array_set(self, x, y, z, p)
   assert(x >= 0 and x < self.nx, "x outside PA")
@@ -50,10 +51,24 @@ end
 
 local dim = tonumber(arg and arg[1]) or 300 -- Array dimension dim^3
 local packed = arg and arg[2] == "packed"   -- Packed image or flat
-local arr = array_new(dim, dim, dim, packed)
 
-for x,y,z in arr:points() do
-  arr:set(x, y, z, x*x)
-end
-assert(arr.image[dim^3-1] == (dim-1)^2)
+bench:add({
+  name = "array3d",
+  checker = function(arr)
+    assert(arr.image[dim^3-1] == (dim-1)^2)
+    return true
+  end,
+  payload = function()
+    local arr = array_new(dim, dim, dim, packed)
+    for x,y,z in arr:points() do
+      arr:set(x, y, z, x*x)
+    end
+    return arr
+  end,
+  items = dim * dim * dim,
+  -- Limit the number of iterations to avoid OOM errors for
+  -- non-GC64 builds.
+  iterations = 5,
+})
 
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 05/41] perf: adjust binary-trees in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (3 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 04/41] perf: adjust array3d in LuaJIT-benches Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 06/41] perf: adjust chameneos " Sergey Kaplun via Tarantool-patches
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The test cases are split by the different types of trees:
1) stretched tree,
2) long-lived tree,
3) batches of short-lived trees, one batch per depth from `mindepth` to
   `maxdepth` with a step of 2,
4) all batches from the third test case in a single payload.

The number of items is the number of top-level `ItemCheck()` calls
performed in the payload.
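
As a cross-check of the item count for the combined case, the per-depth
numbers of top-level `ItemCheck()` calls can be summed directly. A
minimal sketch, assuming the defaults from this patch (`mindepth = 4`
and `N = 16`, hence `maxdepth = 16`); the helper name `count_items` is
hypothetical.

```lua
-- Sum the number of trees built over all depths: each depth runs
-- 2^(maxdepth - depth + mindepth) iterations with two trees each.
local function count_items(mindepth, maxdepth)
  local total = 0
  for depth = mindepth, maxdepth, 2 do
    local iterations = 2 ^ (maxdepth - depth + mindepth)
    total = total + iterations * 2
  end
  return total
end

local total = count_items(4, 16)
print(string.format('%d', total)) -- 174752
```

For these defaults the sum is a geometric series of 7 terms with ratio
4 and smallest term 32, that is 32 * (4^7 - 1) / 3 = 174752.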
---

I'm not sure that we should distinguish different subtests here.
OTOH, how else can the number of items be calculated correctly for the
whole test?

 perf/LuaJIT-benches/binary-trees.lua | 94 ++++++++++++++++++++++------
 1 file changed, 76 insertions(+), 18 deletions(-)

diff --git a/perf/LuaJIT-benches/binary-trees.lua b/perf/LuaJIT-benches/binary-trees.lua
index bf040466..9d4dc7b4 100644
--- a/perf/LuaJIT-benches/binary-trees.lua
+++ b/perf/LuaJIT-benches/binary-trees.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local function BottomUpTree(item, depth)
   if depth > 0 then
@@ -18,30 +19,87 @@ local function ItemCheck(tree)
   end
 end
 
-local N = tonumber(arg and arg[1]) or 0
+local N = tonumber(arg and arg[1]) or 16
 local mindepth = 4
 local maxdepth = mindepth + 2
 if maxdepth < N then maxdepth = N end
 
-do
-  local stretchdepth = maxdepth + 1
-  local stretchtree = BottomUpTree(0, stretchdepth)
-  io.write(string.format("stretch tree of depth %d\t check: %d\n",
-    stretchdepth, ItemCheck(stretchtree)))
-end
+local stretchdepth = maxdepth + 1
+
+bench:add({
+  name = "stretch_depth_" .. tostring(stretchdepth),
+  payload = function()
+    local stretchtree = BottomUpTree(0, stretchdepth)
+    local check = ItemCheck(stretchtree)
+    return check
+  end,
+  items = 1,
+  checker = function(check)
+    return check == -1
+  end,
+})
 
-local longlivedtree = BottomUpTree(0, maxdepth)
+-- This tree is created once, in the setup of the first tree test.
+local longlivedtree
 
-for depth=mindepth,maxdepth,2 do
+for depth = mindepth, maxdepth, 2 do
   local iterations = 2 ^ (maxdepth - depth + mindepth)
-  local check = 0
-  for i=1,iterations do
-    check = check + ItemCheck(BottomUpTree(1, depth)) +
-            ItemCheck(BottomUpTree(-1, depth))
-  end
-  io.write(string.format("%d\t trees of depth %d\t check: %d\n",
-    iterations*2, depth, check))
+  local tree_bench
+  tree_bench = {
+    name = "tree_depth_" .. tostring(depth),
+    setup = function()
+      if not longlivedtree then
+        longlivedtree = BottomUpTree(0, maxdepth)
+      end
+      tree_bench.items = iterations * 2
+    end,
+    checker = function(check)
+      return check == -iterations * 2
+    end,
+    payload = function()
+      local check = 0
+      for i = 1, iterations do
+        check = check + ItemCheck(BottomUpTree(1, depth)) +
+                ItemCheck(BottomUpTree(-1, depth))
+      end
+      return check
+    end,
+  }
+
+  bench:add(tree_bench)
 end
 
-io.write(string.format("long lived tree of depth %d\t check: %d\n",
-  maxdepth, ItemCheck(longlivedtree)))
+bench:add({
+  name = "longlived_depth_" .. tostring(maxdepth),
+  payload = function()
+    local check = ItemCheck(longlivedtree)
+    return check
+  end,
+  items = 1,
+  checker = function(check)
+    return check == -1
+  end,
+})
+
+bench:add({
+  name = "all_in_once",
+  payload = function()
+    for depth = mindepth, maxdepth, 2 do
+      local iterations = 2 ^ (maxdepth - depth + mindepth)
+      local tree_bench
+      local check = 0
+      for i = 1, iterations do
+        check = check + ItemCheck(BottomUpTree(1, depth)) +
+                ItemCheck(BottomUpTree(-1, depth))
+      end
+      assert(check == -iterations * 2)
+    end
+  end,
+  -- Geometric progression: 2 * 2^mindepth trees at maxdepth, 4x
+  -- more per step down (assumes even maxdepth - mindepth).
+  items = 2 * 2 ^ mindepth * (4 ^ ((maxdepth - mindepth) / 2 + 1) - 1) / 3,
+  -- Correctness is checked in the payload function.
+  skip_check = true,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 06/41] perf: adjust chameneos in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (4 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 05/41] perf: adjust binary-trees " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 07/41] perf: adjust coroutine-ring " Sergey Kaplun via Tarantool-patches
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
 perf/LuaJIT-benches/chameneos.lua | 32 ++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/perf/LuaJIT-benches/chameneos.lua b/perf/LuaJIT-benches/chameneos.lua
index 78b64c3f..c1002041 100644
--- a/perf/LuaJIT-benches/chameneos.lua
+++ b/perf/LuaJIT-benches/chameneos.lua
@@ -1,8 +1,10 @@
+local bench = require("bench").new(arg)
 
 local co = coroutine
 local create, resume, yield = co.create, co.resume, co.yield
 
-local N = tonumber(arg and arg[1]) or 10
+local N = tonumber(arg and arg[1]) or 1e7
+local N_ATTEMPTS = N
 local first, second
 
 -- Meet another creature.
@@ -57,12 +59,24 @@ local function schedule(threads)
   until false
 end
 
--- A bunch of colorful creatures.
-local threads = {
-  creature("blue"),
-  creature("red"),
-  creature("yellow"),
-  creature("blue"),
-}
+bench:add({
+  name = "chameneos",
+  items = N_ATTEMPTS,
+  checker = function(meetings) return meetings == N_ATTEMPTS * 2 end,
+  payload = function()
+    -- A bunch of colorful creatures.
+    local threads = {
+      creature("blue"),
+      creature("red"),
+      creature("yellow"),
+      creature("blue"),
+    }
 
-io.write(schedule(threads), "\n")
+    local meetings = schedule(threads)
+    -- XXX: Restore meetings for the next iteration.
+    N = N_ATTEMPTS
+    return meetings
+  end,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 07/41] perf: adjust coroutine-ring in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (5 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 06/41] perf: adjust chameneos " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 08/41] perf: adjust euler14-bit " Sergey Kaplun via Tarantool-patches
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
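
A note on the checker formula: after k resumes the token equals k and
the current index is k % poolsize + 1, provided the index starts at 1.
A minimal sketch of that reasoning on a small ring; `ring` is a
hypothetical helper, and the coroutine body is assumed to increment and
yield the token:

```lua
-- Hypothetical helper: pass a token around a ring of coroutines and
-- return the index of the thread that produced the final token.
local function ring(n, poolsize)
  local create, resume, yield =
    coroutine.create, coroutine.resume, coroutine.yield

  -- Assumed body: each thread increments the token and yields it.
  local body = function(token)
    while true do
      token = yield(token + 1)
    end
  end

  local threads = {}
  for i = 1, poolsize do
    threads[i] = create(body)
  end

  -- The index is local here, so every call starts from 1.
  local id, token, ok = 1, 0, nil
  repeat
    if id == poolsize then
      id = 1
    else
      id = id + 1
    end
    ok, token = resume(threads[id], token)
  until token == n
  return id
end

local last = ring(10, 4)
```

The formula holds only when the index starts from 1 on every run, so
the sketch keeps it local to the function.
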
 perf/LuaJIT-benches/coroutine-ring.lua | 45 ++++++++++++++++----------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/perf/LuaJIT-benches/coroutine-ring.lua b/perf/LuaJIT-benches/coroutine-ring.lua
index 1e8c5ef6..1b86a5ba 100644
--- a/perf/LuaJIT-benches/coroutine-ring.lua
+++ b/perf/LuaJIT-benches/coroutine-ring.lua
@@ -1,3 +1,5 @@
+local bench = require("bench").new(arg)
+
 -- The Computer Language Benchmarks Game
 -- http://shootout.alioth.debian.org/
 -- contributed by Sam Roberts
@@ -7,7 +9,6 @@ local n         = tonumber(arg and arg[1]) or 2e7
 
 -- fixed size pool
 local poolsize  = 503
-local threads   = {}
 
 -- cache these to avoid global environment lookups
 local create    = coroutine.create
@@ -15,7 +16,6 @@ local resume    = coroutine.resume
 local yield     = coroutine.yield
 
 local id        = 1
-local token     = 0
 local ok
 
 local body = function(token)
@@ -24,19 +24,30 @@ local body = function(token)
   end
 end
 
--- create all threads
-for id = 1, poolsize do
-  threads[id] = create(body)
-end
-
--- send the token
-repeat
-  if id == poolsize then
-    id = 1
-  else
-    id = id + 1
-  end
-  ok, token = resume(threads[id], token)
-until token == n
+bench:add({
+  name = "coroutine_ring",
+  payload = function()
+    local token     = 0
+    -- create all threads
+    local threads   = {}
+    for id = 1, poolsize do
+      threads[id] = create(body)
+    end
+
+    -- send the token
+    repeat
+      if id == poolsize then
+        id = 1
+      else
+        id = id + 1
+      end
+      ok, token = resume(threads[id], token)
+    until token == n
+    return id
+  end,
+  checker = function(id) return id == (n % poolsize + 1) end,
+  items = n,
+})
+
+bench:run_and_report()
 
-io.write(id, "\n")
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 08/41] perf: adjust euler14-bit in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (6 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 07/41] perf: adjust coroutine-ring " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 09/41] perf: adjust fannkuch " Sergey Kaplun via Tarantool-patches
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
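
For readers unfamiliar with the bit expression in the inner loop: it is
a branchless Collatz step (even j is halved, odd j becomes 3*j + 1).
A minimal sketch of the same cached search in plain arithmetic, without
the bit library; `longest_chain` and its variable names are
hypothetical and not part of the patch:

```lua
-- Hypothetical helper: find the start value <= limit with the longest
-- Collatz chain, counting terms the same way the benchmark does.
local function longest_chain(limit)
  -- cache[i] is the chain length (in terms) starting from i.
  local cache, best_len, best_start = { 1 }, 1, 1
  for i = 2, limit do
    local j, len, total = i, 0, nil
    while true do
      -- One Collatz step.
      if j % 2 == 0 then
        j = j / 2
      else
        j = 3 * j + 1
      end
      len = len + 1
      local known = cache[j]
      if known then
        total = known + len
        break
      end
    end
    cache[i] = total
    -- Strict comparison: ties keep the smaller start value.
    if total > best_len then
      best_len, best_start = total, i
    end
  end
  return best_start, best_len
end

local start, length = longest_chain(10)
```

For limit = 10 this yields start 9 with a chain of 20 terms, which is a
convenient small input for a manual check.
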
 perf/LuaJIT-benches/euler14-bit.lua | 52 ++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/perf/LuaJIT-benches/euler14-bit.lua b/perf/LuaJIT-benches/euler14-bit.lua
index 537f2bf3..7c521deb 100644
--- a/perf/LuaJIT-benches/euler14-bit.lua
+++ b/perf/LuaJIT-benches/euler14-bit.lua
@@ -1,22 +1,42 @@
+local bench = require("bench").new(arg)
 
 local bit = require("bit")
 local bnot, bor, band = bit.bnot, bit.bor, bit.band
 local shl, shr = bit.lshift, bit.rshift
 
-local N = tonumber(arg and arg[1]) or 10000000
-local cache, m, n = { 1 }, 1, 1
-if arg and arg[2] then cache = nil end
-for i=2,N do
-  local j = i
-  for len=1,1000000000 do
-    j = bor(band(shr(j,1), band(j,1)-1), band(shl(j,1)+j+1, bnot(band(j,1)-1)))
-    if cache then
-      local x = cache[j]; if x then j = x+len; break end
-    elseif j == 1 then
-      j = len+1; break
+local DEFAULT_N = 2e7
+local N = tonumber(arg and arg[1]) or DEFAULT_N
+local drop_cache = arg and arg[2]
+
+bench:add({
+  name = "euler14_bit",
+  payload = function()
+    local cache, m, n = { 1 }, 1, 1
+    if drop_cache then cache = nil end
+    for i=2,N do
+      local j = i
+      for len=1,1000000000 do
+        j = bor(band(shr(j,1), band(j,1)-1), band(shl(j,1)+j+1, bnot(band(j,1)-1)))
+        if cache then
+          local x = cache[j]; if x then j = x+len; break end
+        elseif j == 1 then
+          j = len+1; break
+        end
+      end
+      if cache then cache[i] = j end
+      if j > m then m, n = j, i end
+    end
+    return {n = n, m = m}
+  end,
+  checker = function(res)
+    if N ~= DEFAULT_N then
+      -- Test only for the default.
+      return true
+    else
+      return res.n == 18064027 and res.m == 623
     end
-  end
-  if cache then cache[i] = j end
-  if j > m then m, n = j, i end
-end
-io.write("Found ", n, " (chain length: ", m, ")\n")
+  end,
+  items = N,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 09/41] perf: adjust fannkuch in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (7 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 08/41] perf: adjust euler14-bit " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 10/41] perf: adjust fasta " Sergey Kaplun via Tarantool-patches
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---

I'm not sure that the number of permutations is the correct item count.
Do you have any other suggestions?
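
To make the question concrete, a minimal sketch of what this item count
means for the reported throughput, assuming items are divided by the
elapsed time as in Google Benchmark's items-per-second counter.
`items_per_second` is a hypothetical helper, not a function of the
bench module:

```lua
-- Number of permutations of n elements.
local function factorial(n)
  local fact = 1
  for i = 2, n do
    fact = fact * i
  end
  return fact
end

-- Hypothetical helper: throughput for a run that took `seconds`.
local function items_per_second(items, seconds)
  return items / seconds
end

local items = factorial(11)               -- default n in this patch
local rate = items_per_second(items, 2)   -- assume a 2-second run
```

With the default n = 11 the item count is 39916800, so a 2-second run
would be reported as about 2e7 items per second.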

 perf/LuaJIT-benches/fannkuch.lua | 37 +++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/perf/LuaJIT-benches/fannkuch.lua b/perf/LuaJIT-benches/fannkuch.lua
index 2a4cd426..c963c66f 100644
--- a/perf/LuaJIT-benches/fannkuch.lua
+++ b/perf/LuaJIT-benches/fannkuch.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local function fannkuch(n)
   local p, q, s, odd, check, maxflips = {}, {}, {}, true, 0, 0
@@ -6,7 +7,7 @@ local function fannkuch(n)
     -- Print max. 30 permutations.
     if check < 30 then
       if not p[n] then return maxflips end	-- Catch n = 0, 1, 2.
-      io.write(unpack(p)); io.write("\n")
+      -- io.write(unpack(p)); io.write("\n")
       check = check + 1
     end
     -- Copy and flip.
@@ -46,5 +47,35 @@ local function fannkuch(n)
   until false
 end
 
-local n = tonumber(arg and arg[1]) or 1
-io.write("Pfannkuchen(", n, ") = ", fannkuch(n), "\n")
+local n = tonumber(arg and arg[1]) or 11
+
+-- Precomputed numbers taken from:
+-- https://dl.acm.org/doi/pdf/10.1145/382109.382124
+local FANNKUCH = { 0, 1, 2, 4, 7, 10, 16, 22, 30, 38, 51, 65, 80 }
+
+local function factorial(n)
+  local fact = 1
+  for i = 2, n do
+    fact = fact * i
+  end
+  return fact
+end
+
+bench:add({
+  name = "fannkuch",
+  payload = function()
+    return fannkuch(n)
+  end,
+  checker = function(res)
+    if n > #FANNKUCH then
+      -- Not precomputed, so can't check.
+      return true
+    else
+      return res == FANNKUCH[n]
+    end
+  end,
+  -- Assume that we count permutations here.
+  items = factorial(n),
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 10/41] perf: adjust fasta in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (8 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 09/41] perf: adjust fannkuch " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 11/41] perf: adjust k-nucleotide " Sergey Kaplun via Tarantool-patches
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The output of this benchmark (generated with a different input
parameter value) is used as input by other benchmarks
(<k-nucleotide.lua> and <revcomp.lua>). Therefore, the original script
is kept as a library (inside the <libs/> subdirectory) with the updated
default input value, and it now returns the number of items processed.
The output of the benchmark itself is suppressed and not checked, since
it is impractical to store such huge files in the repository for
testing.
---
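
The reload trick used by the payload can be sketched in isolation.
Here `demo_fasta` is a hypothetical stand-in module registered via
package.preload; it only counts how often its chunk runs:

```lua
-- Hypothetical stand-in for <libs/fasta.lua>: the chunk has a side
-- effect (the counter) and returns a number of processed items.
local runs = 0
package.preload["demo_fasta"] = function()
  runs = runs + 1
  return 10 * runs
end

local function payload()
  -- The first require() executes the chunk and caches its result.
  local items = require("demo_fasta")
  -- Drop the cached result so the next require() runs the chunk again.
  package.loaded["demo_fasta"] = nil
  return items
end

local first = payload()
local second = payload()
```

Without the reset of package.loaded, the second call would return the
cached value and the chunk would not run again.
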
 perf/LuaJIT-benches/fasta.lua      | 120 +++++++----------------------
 perf/LuaJIT-benches/libs/fasta.lua |  98 +++++++++++++++++++++++
 2 files changed, 125 insertions(+), 93 deletions(-)
 create mode 100644 perf/LuaJIT-benches/libs/fasta.lua

diff --git a/perf/LuaJIT-benches/fasta.lua b/perf/LuaJIT-benches/fasta.lua
index 7ce60804..d0dc005d 100644
--- a/perf/LuaJIT-benches/fasta.lua
+++ b/perf/LuaJIT-benches/fasta.lua
@@ -1,95 +1,29 @@
-
-local Last = 42
-local function random(max)
-  local y = (Last * 3877 + 29573) % 139968
-  Last = y
-  return (max * y) / 139968
-end
-
-local function make_repeat_fasta(id, desc, s, n)
-  local write, sub = io.write, string.sub
-  write(">", id, " ", desc, "\n")
-  local p, sn, s2 = 1, #s, s..s
-  for i=60,n,60 do
-    write(sub(s2, p, p + 59), "\n")
-    p = p + 60; if p > sn then p = p - sn end
-  end
-  local tail = n % 60
-  if tail > 0 then write(sub(s2, p, p + tail-1), "\n") end
-end
-
-local function make_random_fasta(id, desc, bs, n)
-  io.write(">", id, " ", desc, "\n")
-  loadstring([=[
-    local write, char, unpack, n, random = io.write, string.char, unpack, ...
-    local buf, p = {}, 1
-    for i=60,n,60 do
-      for j=p,p+59 do ]=]..bs..[=[ end
-      buf[p+60] = 10; p = p + 61
-      if p >= 2048 then write(char(unpack(buf, 1, p-1))); p = 1 end
-    end
-    local tail = n % 60
-    if tail > 0 then
-      for j=p,p+tail-1 do ]=]..bs..[=[ end
-      p = p + tail; buf[p] = 10; p = p + 1
-    end
-    write(char(unpack(buf, 1, p-1)))
-  ]=], desc)(n, random)
-end
-
-local function bisect(c, p, lo, hi)
-  local n = hi - lo
-  if n == 0 then return "buf[j] = "..c[hi].."\n" end
-  local mid = math.floor(n / 2)
-  return "if r < "..p[lo+mid].." then\n"..bisect(c, p, lo, lo+mid)..
-         "else\n"..bisect(c, p, lo+mid+1, hi).."end\n"
-end
-
-local function make_bisect(tab)
-  local c, p, sum = {}, {}, 0
-  for i,row in ipairs(tab) do
-    c[i] = string.byte(row[1])
-    sum = sum + row[2]
-    p[i] = sum
-  end
-  return "local r = random(1)\n"..bisect(c, p, 1, #tab)
-end
-
-local alu =
-  "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"..
-  "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"..
-  "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"..
-  "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"..
-  "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"..
-  "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"..
-  "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"
-
-local iub = make_bisect{
-  { "a", 0.27 },
-  { "c", 0.12 },
-  { "g", 0.12 },
-  { "t", 0.27 },
-  { "B", 0.02 },
-  { "D", 0.02 },
-  { "H", 0.02 },
-  { "K", 0.02 },
-  { "M", 0.02 },
-  { "N", 0.02 },
-  { "R", 0.02 },
-  { "S", 0.02 },
-  { "V", 0.02 },
-  { "W", 0.02 },
-  { "Y", 0.02 },
-}
-
-local homosapiens = make_bisect{
-  { "a", 0.3029549426680 },
-  { "c", 0.1979883004921 },
-  { "g", 0.1975473066391 },
-  { "t", 0.3015094502008 },
+local bench = require("bench").new(arg)
+
+local stdout = io.output()
+
+local benchmark
+benchmark = {
+  name = "fasta",
+  -- XXX: The result file may take up to 278 MB for the default
+  -- settings. To check the correctness of the script, run it as
+  -- is from the console.
+  skip_check = true,
+  setup = function()
+    io.output("/dev/null")
+  end,
+  payload = function()
+    -- Run the benchmark as is from the file.
+    local items = require("fasta")
+    -- Remove it from the cache to be sure the benchmark will run
+    -- at the next iteration.
+    package.loaded["fasta"] = nil
+    benchmark.items = items
+  end,
+  teardown = function()
+    io.output(stdout)
+  end,
 }
 
-local N = tonumber(arg and arg[1]) or 1000
-make_repeat_fasta('ONE', 'Homo sapiens alu', alu, N*2)
-make_random_fasta('TWO', 'IUB ambiguity codes', iub, N*3)
-make_random_fasta('THREE', 'Homo sapiens frequency', homosapiens, N*5)
+bench:add(benchmark)
+bench:run_and_report()
diff --git a/perf/LuaJIT-benches/libs/fasta.lua b/perf/LuaJIT-benches/libs/fasta.lua
new file mode 100644
index 00000000..9c72c244
--- /dev/null
+++ b/perf/LuaJIT-benches/libs/fasta.lua
@@ -0,0 +1,98 @@
+
+local Last = 42
+local function random(max)
+  local y = (Last * 3877 + 29573) % 139968
+  Last = y
+  return (max * y) / 139968
+end
+
+local function make_repeat_fasta(id, desc, s, n)
+  local write, sub = io.write, string.sub
+  write(">", id, " ", desc, "\n")
+  local p, sn, s2 = 1, #s, s..s
+  for i=60,n,60 do
+    write(sub(s2, p, p + 59), "\n")
+    p = p + 60; if p > sn then p = p - sn end
+  end
+  local tail = n % 60
+  if tail > 0 then write(sub(s2, p, p + tail-1), "\n") end
+end
+
+local function make_random_fasta(id, desc, bs, n)
+  io.write(">", id, " ", desc, "\n")
+  loadstring([=[
+    local write, char, unpack, n, random = io.write, string.char, unpack, ...
+    local buf, p = {}, 1
+    for i=60,n,60 do
+      for j=p,p+59 do ]=]..bs..[=[ end
+      buf[p+60] = 10; p = p + 61
+      if p >= 2048 then write(char(unpack(buf, 1, p-1))); p = 1 end
+    end
+    local tail = n % 60
+    if tail > 0 then
+      for j=p,p+tail-1 do ]=]..bs..[=[ end
+      p = p + tail; buf[p] = 10; p = p + 1
+    end
+    write(char(unpack(buf, 1, p-1)))
+  ]=], desc)(n, random)
+end
+
+local function bisect(c, p, lo, hi)
+  local n = hi - lo
+  if n == 0 then return "buf[j] = "..c[hi].."\n" end
+  local mid = math.floor(n / 2)
+  return "if r < "..p[lo+mid].." then\n"..bisect(c, p, lo, lo+mid)..
+         "else\n"..bisect(c, p, lo+mid+1, hi).."end\n"
+end
+
+local function make_bisect(tab)
+  local c, p, sum = {}, {}, 0
+  for i,row in ipairs(tab) do
+    c[i] = string.byte(row[1])
+    sum = sum + row[2]
+    p[i] = sum
+  end
+  return "local r = random(1)\n"..bisect(c, p, 1, #tab)
+end
+
+local alu =
+  "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"..
+  "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"..
+  "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"..
+  "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"..
+  "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"..
+  "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"..
+  "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"
+
+local iub = make_bisect{
+  { "a", 0.27 },
+  { "c", 0.12 },
+  { "g", 0.12 },
+  { "t", 0.27 },
+  { "B", 0.02 },
+  { "D", 0.02 },
+  { "H", 0.02 },
+  { "K", 0.02 },
+  { "M", 0.02 },
+  { "N", 0.02 },
+  { "R", 0.02 },
+  { "S", 0.02 },
+  { "V", 0.02 },
+  { "W", 0.02 },
+  { "Y", 0.02 },
+}
+
+local homosapiens = make_bisect{
+  { "a", 0.3029549426680 },
+  { "c", 0.1979883004921 },
+  { "g", 0.1975473066391 },
+  { "t", 0.3015094502008 },
+}
+
+local N = tonumber(arg and arg[1]) or 25e6
+
+make_repeat_fasta('ONE', 'Homo sapiens alu', alu, N*2)
+make_random_fasta('TWO', 'IUB ambiguity codes', iub, N*3)
+make_random_fasta('THREE', 'Homo sapiens frequency', homosapiens, N*5)
+
+return N*2 + N*3 + N*5
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 11/41] perf: adjust k-nucleotide in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (9 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 10/41] perf: adjust fasta " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 12/41] perf: adjust life " Sergey Kaplun via Tarantool-patches
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The benchmark reads its input from stdin, which is redirected from the
<FASTA_5000000> file generated by `libs/fasta.lua 5e6`. The output of
the benchmark is redirected to /dev/null. The checker compares the
results with the values precomputed for the aforementioned file.
---
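
The payload rewinds stdin so that the next iteration can read the same
data again. A minimal sketch of that pattern, using a temporary file as
a stand-in for the redirected input; `read_all` is a hypothetical
helper:

```lua
-- Stand-in for stdin redirected from a regular file.
local input = assert(io.tmpfile())
input:write(">THREE demo\n", "ggta\n", "ttaa\n")
input:seek("set", 0)

-- Hypothetical helper: read every line, then rewind for the next run.
local function read_all(file)
  local lines = {}
  for line in file:lines() do
    lines[#lines + 1] = line
  end
  -- Reset the read position to the beginning of the file.
  file:seek("set", 0)
  return table.concat(lines, "\n")
end

local first = read_all(input)
local second = read_all(input)
```

Note that rewinding works only for seekable input: if stdin is a pipe,
seek() returns nil plus an error message instead of rewinding.
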
 perf/LuaJIT-benches/k-nucleotide.lua | 93 ++++++++++++++++++++++++----
 1 file changed, 82 insertions(+), 11 deletions(-)

diff --git a/perf/LuaJIT-benches/k-nucleotide.lua b/perf/LuaJIT-benches/k-nucleotide.lua
index 0bfb41be..ae51dae9 100644
--- a/perf/LuaJIT-benches/k-nucleotide.lua
+++ b/perf/LuaJIT-benches/k-nucleotide.lua
@@ -1,3 +1,4 @@
+local bench = require('bench').new(arg)
 
 local function kfrequency(seq, freq, k, frame)
   local sub = string.sub
@@ -12,7 +13,8 @@ local function count(seq, frag)
   local k = #frag
   local freq = {}
   for frame=1,k do kfrequency(seq, freq, k, frame) end
-  io.write(freq[frag] or 0, "\t", frag, "\n")
+  return freq[frag]
+  -- io.write(freq[frag] or 0, "\t", frag, "\n")
 end
 
 local function frequency(seq, k)
@@ -24,10 +26,13 @@ local function frequency(seq, k)
     local fa, fb = freq[a], freq[b]
     return fa == fb and a > b or fa > fb
   end)
+  local res = {}
   for _,c in ipairs(sfreq) do
-    io.write(string.format("%s %0.3f\n", c, (freq[c]*100)/sum))
+    -- io.write(string.format("%s %0.3f\n", c, (freq[c]*100)/sum))
+    res[c] = freq[c]*100/sum
   end
-  io.write("\n")
+  -- io.write("\n")
+  return res
 end
 
 local function readseq()
@@ -48,11 +53,77 @@ local function readseq()
   return string.upper(table.concat(lines, "", 1, ln))
 end
 
-local seq = readseq()
-frequency(seq, 1)
-frequency(seq, 2)
-count(seq, "GGT")
-count(seq, "GGTA")
-count(seq, "GGTATT")
-count(seq, "GGTATTTTAATT")
-count(seq, "GGTATTTTAATTTATAGT")
+local function check_freq(res, expected)
+  for k,v in pairs(expected) do
+    assert(string.format("%0.3f", res[k]) == v,
+           "Incorrect frequency for fragment " .. k)
+  end
+end
+
+-- The input is generated by `fasta.lua 5e6'. The checker below
+-- expects exactly this input.
+local N = 5e6
+-- See <libs/fasta.lua> for the details.
+local items = N * 5
+bench:add({
+  name = "k_nucleotide",
+  payload = function()
+    local seq = readseq()
+    local sfreq1 = frequency(seq, 1)
+    local sfreq2 = frequency(seq, 2)
+    local GGT  = count(seq, "GGT")
+    local GGTA = count(seq, "GGTA")
+    local GGTATT = count(seq, "GGTATT")
+    local GGTATTTTAATT = count(seq, "GGTATTTTAATT")
+    local GGTATTTTAATTTATAGT = count(seq, "GGTATTTTAATTTATAGT")
+
+    local res = {
+      sfreq1 = sfreq1,
+      sfreq2 = sfreq2,
+      GGT  = GGT,
+      GGTA = GGTA,
+      GGTATT = GGTATT,
+      GGTATTTTAATT = GGTATTTTAATT,
+      GGTATTTTAATTTATAGT = GGTATTTTAATTTATAGT,
+    }
+    -- XXX: Reset input for the non-check iteration.
+    io.stdin:seek("set", 0)
+    return res
+  end,
+  checker = function(res)
+    check_freq(res.sfreq1, {
+      A = "30.296",
+      T = "30.149",
+      C = "19.800",
+      G = "19.754",
+    })
+    check_freq(res.sfreq2, {
+      AA = "9.177",
+      TA = "9.132",
+      AT = "9.130",
+      TT = "9.091",
+      CA = "6.002",
+      AC = "6.001",
+      AG = "5.987",
+      GA = "5.984",
+      CT = "5.971",
+      TC = "5.971",
+      GT = "5.957",
+      TG = "5.956",
+      CC = "3.917",
+      GC = "3.911",
+      CG = "3.909",
+      GG = "3.902",
+    })
+
+    assert(res.GGT == 294331)
+    assert(res.GGTA == 89290)
+    assert(res.GGTATT == 9462)
+    assert(res.GGTATTTTAATT == 178)
+    assert(res.GGTATTTTAATTTATAGT == 178)
+    return true
+  end,
+  items = items,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 12/41] perf: adjust life in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (10 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 11/41] perf: adjust k-nucleotide " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 13/41] perf: adjust mandelbrot-bit " Sergey Kaplun via Tarantool-patches
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file.

The output is redirected to /dev/null. The checker tests the resulting
field after the exact number of generations for the fixed field size
(as declared in the original benchmark).
---
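
The idea behind the checker can be sketched on a tiny field. The names
`expected_grid` and `grids_match` are hypothetical; the sketch assumes,
as the checker does, that a cell is alive when its value is greater
than zero:

```lua
-- Build a w x h grid of booleans from a list of {y, x} alive cells.
local function expected_grid(w, h, alive)
  local grid = {}
  for y = 1, h do
    grid[y] = {}
    for x = 1, w do
      grid[y][x] = false
    end
  end
  for _, cell in ipairs(alive) do
    grid[cell[1]][cell[2]] = true
  end
  return grid
end

-- Compare a numeric field with the expected booleans. On mismatch,
-- also return the coordinates of the first differing cell.
local function grids_match(actual, expected, w, h)
  for y = 1, h do
    for x = 1, w do
      if (actual[y][x] > 0) ~= expected[y][x] then
        return false, x, y
      end
    end
  end
  return true
end

-- A vertical "blinker" on a 3x3 field.
local field = {
  { 0, 1, 0 },
  { 0, 1, 0 },
  { 0, 1, 0 },
}
local expected = expected_grid(3, 3, { { 1, 2 }, { 2, 2 }, { 3, 2 } })
local ok = grids_match(field, expected, 3, 3)
```

Returning the coordinates instead of asserting inside the loop is only
a design choice of this sketch.
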
 perf/LuaJIT-benches/life.lua | 79 +++++++++++++++++++++++++++++++++++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/perf/LuaJIT-benches/life.lua b/perf/LuaJIT-benches/life.lua
index 911d9fe1..d0e4dc98 100644
--- a/perf/LuaJIT-benches/life.lua
+++ b/perf/LuaJIT-benches/life.lua
@@ -3,6 +3,8 @@
 -- modified to use ANSI terminal escape sequences
 -- modified to use for instead of while
 
+local bench = require('bench').new(arg)
+
 local write=io.write
 
 ALIVE="¥"	DEAD="þ"
@@ -106,6 +108,81 @@ function LIFE(w,h)
     if gen>2000 then break end
     --delay()		-- no delay
   end
+  return thisgen
 end
 
-LIFE(40,20)
+-- Result of the LIFE(40, 20) after 2000 generations.
+--[[
+----------------------------------------
+----------------------------------------
+--OO--------------------------O---------
+-OO--------------------------O-O--------
+---O--------------------------O---------
+----------------------------------------
+----------------------------------------
+----------------------------------------
+----------------------------------------
+----------------------------------------
+----------------------------------------
+----------------------------------------
+---O------------------------------------
+--O-O-----------------------------------
+--O-O-----------------------------------
+---O------------------------------------
+----------------------------------------
+-------OO-------------------------------
+-------OO-------------------------------
+----------------------------------------
+]]
+
+local function check_life(thisgen, w, h)
+  local expected_cells = ARRAY2D(w, h)
+  for y = 1, h do
+    for x = 1, w do
+      expected_cells[y][x] = false
+    end
+  end
+  local alive_cells = {
+    {3, 3}, {3, 4}, {3, 31},
+    {4, 2}, {4, 3}, {4, 30}, {4, 32},
+    {5, 4}, {5, 31},
+    {13, 4},
+    {14, 3}, {14, 5},
+    {15, 3}, {15, 5},
+    {16, 4},
+    {18, 8}, {18, 9},
+    {19, 8}, {19, 9},
+  }
+  for _, cell in ipairs(alive_cells) do
+    local y, x = cell[1], cell[2]
+    expected_cells[y][x] = true
+  end
+  for y = 1, h do
+    for x = 1, w do
+      assert(thisgen[y][x] > 0 == expected_cells[y][x],
+             ('Incorrect value for cell (%d, %d)'):format(x, y))
+    end
+  end
+  return true
+end
+
+local stdout = io.output()
+
+bench:add({
+  name = 'life',
+  setup = function()
+    io.output('/dev/null')
+  end,
+  payload = function()
+    return LIFE(40, 20)
+  end,
+  teardown = function()
+    io.output(stdout)
+  end,
+  checker = function(res)
+    return check_life(res, 40, 20)
+  end,
+  items = 2000 * 40 * 20,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 13/41] perf: adjust mandelbrot-bit in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (11 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 12/41] perf: adjust life " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 14/41] perf: adjust mandelbrot " Sergey Kaplun via Tarantool-patches
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The output is redirected to /dev/null. The check is skipped because
verifying the binary output is inconvenient, especially since the
output depends on the input parameter.
---
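
The setup/teardown pair boils down to swapping the default output file
around the payload. A minimal sketch of that pattern, with a temporary
file instead of /dev/null so that the result can be inspected;
`with_output_to` is a hypothetical helper, not part of the bench
module:

```lua
-- Hypothetical helper: run `payload` with the default output set to
-- `sink`, then restore the previous default output.
local function with_output_to(sink, payload)
  local saved = io.output()       -- remember the current default output
  io.output(sink)                 -- "setup"
  local ok, err = pcall(payload)
  io.output(saved)                -- "teardown", done even on error
  if not ok then
    error(err, 0)
  end
end

local sink = assert(io.tmpfile())
with_output_to(sink, function()
  -- Same header format as the benchmark, for a 2x2 image.
  io.write("P4\n", 2, " ", 2, "\n")
end)

-- Read back what the payload wrote.
sink:seek("set", 0)
local captured = sink:read("*a")
```

Here the payload writes only the image header, so `captured` holds just
those two lines.
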
 perf/LuaJIT-benches/mandelbrot-bit.lua | 86 +++++++++++++++++---------
 1 file changed, 57 insertions(+), 29 deletions(-)

diff --git a/perf/LuaJIT-benches/mandelbrot-bit.lua b/perf/LuaJIT-benches/mandelbrot-bit.lua
index 91d96975..a6b5e1f8 100644
--- a/perf/LuaJIT-benches/mandelbrot-bit.lua
+++ b/perf/LuaJIT-benches/mandelbrot-bit.lua
@@ -1,33 +1,61 @@
-
 local bit = require("bit")
-local bor, band = bit.bor, bit.band
-local shl, shr, rol = bit.lshift, bit.rshift, bit.rol
-local write, char, unpack = io.write, string.char, unpack
-local N = tonumber(arg and arg[1]) or 100
-local M, buf = 2/N, {}
-write("P4\n", N, " ", N, "\n")
-for y=0,N-1 do
-  local Ci, b, p = y*M-1, -16777216, 0
-  local Ciq = Ci*Ci
-  for x=0,N-1,2 do
-    local Cr, Cr2 = x*M-1.5, (x+1)*M-1.5
-    local Zr, Zi, Zrq, Ziq = Cr, Ci, Cr*Cr, Ciq
-    local Zr2, Zi2, Zrq2, Ziq2 = Cr2, Ci, Cr2*Cr2, Ciq
-    b = rol(b, 2)
-    for i=1,49 do
-      Zi = Zr*Zi*2 + Ci; Zi2 = Zr2*Zi2*2 + Ci
-      Zr = Zrq-Ziq + Cr; Zr2 = Zrq2-Ziq2 + Cr2
-      Ziq = Zi*Zi; Ziq2 = Zi2*Zi2
-      Zrq = Zr*Zr; Zrq2 = Zr2*Zr2
-      if band(b, 2) ~= 0 and Zrq+Ziq > 4.0 then b = band(b, -3) end
-      if band(b, 1) ~= 0 and Zrq2+Ziq2 > 4.0 then b = band(b, -2) end
-      if band(b, 3) == 0 then break end
+
+local bench = require("bench").new(arg)
+
+local N = tonumber(arg and arg[1]) or 5000
+
+local function payload()
+  -- These values must live in stack slots (locals), not upvalues.
+  local N = N
+  local bor, band = bit.bor, bit.band
+  local shl, shr, rol = bit.lshift, bit.rshift, bit.rol
+  local write, char, unpack = io.write, string.char, unpack
+
+  local M, buf = 2/N, {}
+  write("P4\n", N, " ", N, "\n")
+  for y=0,N-1 do
+    local Ci, b, p = y*M-1, -16777216, 0
+    local Ciq = Ci*Ci
+    for x=0,N-1,2 do
+      local Cr, Cr2 = x*M-1.5, (x+1)*M-1.5
+      local Zr, Zi, Zrq, Ziq = Cr, Ci, Cr*Cr, Ciq
+      local Zr2, Zi2, Zrq2, Ziq2 = Cr2, Ci, Cr2*Cr2, Ciq
+      b = rol(b, 2)
+      for i=1,49 do
+        Zi = Zr*Zi*2 + Ci; Zi2 = Zr2*Zi2*2 + Ci
+        Zr = Zrq-Ziq + Cr; Zr2 = Zrq2-Ziq2 + Cr2
+        Ziq = Zi*Zi; Ziq2 = Zi2*Zi2
+        Zrq = Zr*Zr; Zrq2 = Zr2*Zr2
+        if band(b, 2) ~= 0 and Zrq+Ziq > 4.0 then b = band(b, -3) end
+        if band(b, 1) ~= 0 and Zrq2+Ziq2 > 4.0 then b = band(b, -2) end
+        if band(b, 3) == 0 then break end
+      end
+      if b >= 0 then p = p + 1; buf[p] = b; b = -16777216; end
     end
-    if b >= 0 then p = p + 1; buf[p] = b; b = -16777216; end
-  end
-  if b ~= -16777216 then
-    if band(N, 1) ~= 0 then b = shr(b, 1) end
-    p = p + 1; buf[p] = shl(b, 8-band(N, 7))
+    if b ~= -16777216 then
+      if band(N, 1) ~= 0 then b = shr(b, 1) end
+      p = p + 1; buf[p] = shl(b, 8-band(N, 7))
+    end
+    write(char(unpack(buf, 1, p)))
   end
-  write(char(unpack(buf, 1, p)))
 end
+
+local stdout = io.output()
+
+bench:add({
+  name = "mandelbrot_bit",
+  items = N,
+  -- XXX: It is inconvenient to keep a binary file in the
+  -- repository for comparison. If the check is needed, run
+  -- the payload manually.
+  skip_check = true,
+  setup = function()
+    io.output("/dev/null")
+  end,
+  teardown = function()
+    io.output(stdout)
+  end,
+  payload = payload,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 14/41] perf: adjust mandelbrot in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (12 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 13/41] perf: adjust mandelbrot-bit " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 15/41] perf: adjust md5 " Sergey Kaplun via Tarantool-patches
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The output is redirected to /dev/null. The check is skipped since it is
inconvenient to verify the binary output, especially as the output
depends on the parameter.
---
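Note: the redirection described above can be sketched as follows. This
is only an illustration; `with_output` is a hypothetical helper that is
not part of the patch, and the "/dev/null" path assumes a POSIX system.

```lua
-- Hypothetical helper (not part of the patch): run fn with the default
-- output redirected to the given file, then restore the previous output.
local function with_output(path, fn, ...)
  local saved = io.output()    -- Remember the current default output.
  io.output(path)              -- Open path for writing; raises on failure.
  local ok, err = pcall(fn, ...)
  io.output():close()          -- Close the sink file.
  io.output(saved)             -- Restore the original default output.
  if not ok then error(err, 0) end
end

local calls = 0
with_output("/dev/null", function()
  io.write("P4\n", 4, " ", 4, "\n")  -- Discarded by the sink.
  calls = calls + 1
end)
io.write("back on the original output\n")
```

The patch itself splits the same two steps between the setup and
teardown callbacks instead of wrapping the payload.
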
 perf/LuaJIT-benches/mandelbrot.lua | 64 +++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/perf/LuaJIT-benches/mandelbrot.lua b/perf/LuaJIT-benches/mandelbrot.lua
index 0ef595a2..51e0dd4f 100644
--- a/perf/LuaJIT-benches/mandelbrot.lua
+++ b/perf/LuaJIT-benches/mandelbrot.lua
@@ -1,23 +1,49 @@
+local bench = require("bench").new(arg)
 
-local write, char, unpack = io.write, string.char, unpack
-local N = tonumber(arg and arg[1]) or 100
-local M, ba, bb, buf = 2/N, 2^(N%8+1)-1, 2^(8-N%8), {}
-write("P4\n", N, " ", N, "\n")
-for y=0,N-1 do
-  local Ci, b, p = y*M-1, 1, 0
-  for x=0,N-1 do
-    local Cr = x*M-1.5
-    local Zr, Zi, Zrq, Ziq = Cr, Ci, Cr*Cr, Ci*Ci
-    b = b + b
-    for i=1,49 do
-      Zi = Zr*Zi*2 + Ci
-      Zr = Zrq-Ziq + Cr
-      Ziq = Zi*Zi
-      Zrq = Zr*Zr
-      if Zrq+Ziq > 4.0 then b = b + 1; break; end
+local N = tonumber(arg and arg[1]) or 5000
+
+local function payload()
+  -- These values must be stack slots (locals), not upvalues.
+  local N = N
+  local write, char, unpack = io.write, string.char, unpack
+  local M, ba, bb, buf = 2/N, 2^(N%8+1)-1, 2^(8-N%8), {}
+  write("P4\n", N, " ", N, "\n")
+  for y=0,N-1 do
+    local Ci, b, p = y*M-1, 1, 0
+    for x=0,N-1 do
+      local Cr = x*M-1.5
+      local Zr, Zi, Zrq, Ziq = Cr, Ci, Cr*Cr, Ci*Ci
+      b = b + b
+      for i=1,49 do
+        Zi = Zr*Zi*2 + Ci
+        Zr = Zrq-Ziq + Cr
+        Ziq = Zi*Zi
+        Zrq = Zr*Zr
+        if Zrq+Ziq > 4.0 then b = b + 1; break; end
+      end
+      if b >= 256 then p = p + 1; buf[p] = 511 - b; b = 1; end
     end
-    if b >= 256 then p = p + 1; buf[p] = 511 - b; b = 1; end
+    if b ~= 1 then p = p + 1; buf[p] = (ba-b)*bb; end
+    write(char(unpack(buf, 1, p)))
   end
-  if b ~= 1 then p = p + 1; buf[p] = (ba-b)*bb; end
-  write(char(unpack(buf, 1, p)))
 end
+
+local stdout = io.output()
+
+bench:add({
+  name = "mandelbrot",
+  items = N,
+  -- XXX: It is inconvenient to have the binary file in the
+  -- repository for the comparison. If the check is needed, run
+  -- the payload manually.
+  skip_check = true,
+  setup = function()
+    io.output("/dev/null")
+  end,
+  teardown = function()
+    io.output(stdout)
+  end,
+  payload = payload,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 15/41] perf: adjust md5 in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (13 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 14/41] perf: adjust mandelbrot " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 16/41] perf: adjust meteor " Sergey Kaplun via Tarantool-patches
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
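Note: to show how the fields of a benchmark description fit together,
here is a minimal sketch of a runner. `run_benchmark` is a hypothetical
stand-in written for this note; the real "bench" module from this series
may work differently. Only the field names (name, setup, payload,
teardown, checker, skip_check, items) are taken from the patches.

```lua
-- Hypothetical stand-in for a benchmark runner (an assumption, not the
-- "bench" module from this series).
local function run_benchmark(b)
  if b.setup then b.setup() end
  local start = os.clock()
  local res = b.payload()            -- The timed part.
  local elapsed = os.clock() - start
  if b.teardown then b.teardown() end
  if not b.skip_check and b.checker then
    -- The checker receives whatever the payload returned.
    assert(b.checker(res), b.name .. ": check failed")
  end
  return { name = b.name, time = elapsed, items = b.items }
end

local N = 1000
local report = run_benchmark({
  name = "sum",
  payload = function()
    local s = 0
    for i = 1, N do s = s + i end
    return s
  end,
  checker = function(res)
    return res == N * (N + 1) / 2
  end,
  items = N,
})
print(report.name, report.items)
```
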
 perf/LuaJIT-benches/md5.lua | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/perf/LuaJIT-benches/md5.lua b/perf/LuaJIT-benches/md5.lua
index fdf6b4a7..5ec67527 100644
--- a/perf/LuaJIT-benches/md5.lua
+++ b/perf/LuaJIT-benches/md5.lua
@@ -1,5 +1,6 @@
-
 local bit = require("bit")
+local bench = require("bench").new(arg)
+
 local tobit, tohex, bnot = bit.tobit or bit.cast, bit.tohex, bit.bnot
 local bor, band, bxor = bit.bor, bit.band, bit.bxor
 local lshift, rshift, rol, bswap = bit.lshift, bit.rshift, bit.rol, bit.bswap
@@ -147,7 +148,7 @@ assert(md5('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789') ==
 assert(md5('12345678901234567890123456789012345678901234567890123456789012345678901234567890') ==
        '57edf4a22be3c955ac49da2e2107b67a')
 
-local N = tonumber(arg and arg[1]) or 10000
+local N = tonumber(arg and arg[1]) or 20000
 
   -- Credits: William Shakespeare, Romeo and Juliet
 local txt = [[Rebellious subjects, enemies to peace,
@@ -176,8 +177,20 @@ Once more, on pain of death, all men depart.]]
   txt = txt..txt..txt..txt
   txt = txt..txt..txt..txt
 
-for i=1,N do
-  res = md5(txt)
-end
-assert(res == 'a831e91e0f70eddcb70dc61c6f82f6cd')
-
+bench:add({
+  name = 'md5',
+  payload = function()
+    local res
+    for i=1,N do
+      res = md5(txt)
+    end
+    return res
+  end,
+  checker = function(res)
+    assert(res == 'a831e91e0f70eddcb70dc61c6f82f6cd')
+    return true
+  end,
+  items = N,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 16/41] perf: adjust meteor in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (14 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 15/41] perf: adjust md5 " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 17/41] perf: adjust nbody " Sergey Kaplun via Tarantool-patches
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The arguments to the script can still be
provided on the command line. However, values greater than the maximum
number of solutions do not affect the execution time of this
benchmark. Hence, the number of items to process is taken to be
constant: the maximum possible number of solutions.
---
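Note: a miniature of the idea above, as a sketch only. The generated
chunk returns several functions that share the same counters, and a
limit above the total number of solutions changes nothing. The chunk
text, the TOTAL constant and the `solve` helper are hypothetical; the
real solver is far larger.

```lua
local loadstring = loadstring or load  -- Lua 5.2+ compatibility.

-- Hypothetical miniature of a generated solver chunk.
local src = [[
local countinit = ...
local count = countinit
local TOTAL = 3  -- Assumed number of solutions in this toy puzzle.
local function f0()
  for _ = 1, TOTAL do
    count = count - 1
    -- Abort the search once the requested number is reached.
    if count == 0 then error("") end
  end
end
local function getresult()
  return countinit - count
end
]]

local function solve(limit)
  local f0, getresult =
    assert(loadstring(src .. "return f0, getresult\n", "solver"))(limit)
  pcall(f0)  -- Protected call: the abort above is not an error for us.
  return getresult()
end

print(solve(2), solve(10))
```

With a limit of 2 the toy solver stops after two solutions; with a limit
of 10 it still finds only the three that exist.
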
 perf/LuaJIT-benches/meteor.lua | 46 ++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/perf/LuaJIT-benches/meteor.lua b/perf/LuaJIT-benches/meteor.lua
index 80588ab5..f3962820 100644
--- a/perf/LuaJIT-benches/meteor.lua
+++ b/perf/LuaJIT-benches/meteor.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 -- Generate a decision tree based solver for the meteor puzzle.
 local function generatesolver(countinit)
@@ -118,6 +119,10 @@ local function printresult()
   printboard(smax)
 end
 
+local function getresult()
+  return countinit-count, smin, smax
+end
+
 -- Generate piece lookup array from the order of use.
 local function genp()
   local p = pcs
@@ -141,7 +146,7 @@ local function f91(k)
     local s = p[b0] ]]
   for p=2,99 do if ok[p] then s = s.."..p[b"..p.."]" end end
   s = s..[[
-    -- Remember min/max boards, dito for the symmetric board.
+    -- Remember min/max boards, ditto for the symmetric board.
     if not smin then smin = s; smax = s
     elseif s < smin then smin = s elseif s > smax then smax = s end
     s = reverse(s)
@@ -206,15 +211,36 @@ local f93 = f91
   end
 
   -- Compile and return solver function and result getter.
-  return loadstring(s.."return f0, printresult\n", "solver")(countinit)
+  return loadstring(s.."return f0, printresult, getresult\n", "solver")(countinit)
 end
 
--- Generate the solver function hierarchy.
-local solver, printresult = generatesolver(tonumber(arg and arg[1]) or 10000)
-
--- The optimizer for LuaJIT 1.1.x is not helpful here, so turn it off.
-if jit and jit.opt and jit.version_num < 10200 then jit.opt.start(0) end
+local N = tonumber(arg and arg[1]) or 10000
+
+bench:add({
+  name = "meteor",
+  setup = function()
+    -- The optimizer for LuaJIT 1.1.x is not helpful here, so turn it off.
+    if jit and jit.opt and jit.version_num < 10200 then jit.opt.start(0) end
+  end,
+  payload = function()
+    -- Generate the solver function hierarchy.
+    local solver, printresult, getresult = generatesolver(N)
+
+    -- Run the solver protected to get partial results (max count or ctrl-c).
+    pcall(solver, 0)
+
+    local n, smin, smax = getresult()
+    return {n = n, smin = smin, smax = smax}
+  end,
+  checker = function(res)
+    if N >= 2097 then
+      assert(res.n == 2098, "Incorrect number of solutions")
+      assert(res.smin == "00001222012661126155865558633348893448934747977799")
+      assert(res.smax == "99998966856688568255777257472014220144031400311333")
+    end
+    return true
+  end,
+  items = 2098,
+})
 
--- Run the solver protected to get partial results (max count or ctrl-c).
-pcall(solver, 0)
-printresult()
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 17/41] perf: adjust nbody in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (15 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 16/41] perf: adjust meteor " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 18/41] perf: adjust nsieve-bit-fp " Sergey Kaplun via Tarantool-patches
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
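Note: the default-argument handling mentioned above can be sketched as
follows. `pick_n` and `should_check` are hypothetical helpers for this
note; the point is that a reference value is only meaningful when the
script runs with the default argument.

```lua
local DEFAULT_N = 5e6

-- Hypothetical helper: take N from an argument vector, falling back to
-- the default when the argument is missing or not a number.
local function pick_n(argv)
  return tonumber(argv and argv[1]) or DEFAULT_N
end

-- Hypothetical helper: the stored reference result is valid only for
-- the default argument.
local function should_check(n)
  return n == DEFAULT_N
end

print(should_check(pick_n(nil)), should_check(pick_n({ "1000" })))
```
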
 perf/LuaJIT-benches/nbody.lua | 127 ++++++++++++++++++++--------------
 1 file changed, 74 insertions(+), 53 deletions(-)

diff --git a/perf/LuaJIT-benches/nbody.lua b/perf/LuaJIT-benches/nbody.lua
index e0ff8f77..f01c20a3 100644
--- a/perf/LuaJIT-benches/nbody.lua
+++ b/perf/LuaJIT-benches/nbody.lua
@@ -1,56 +1,12 @@
+local bench = require("bench").new(arg)
 
 local sqrt = math.sqrt
 
 local PI = 3.141592653589793
 local SOLAR_MASS = 4 * PI * PI
 local DAYS_PER_YEAR = 365.24
-local bodies = {
-  { -- Sun
-    x = 0,
-    y = 0,
-    z = 0,
-    vx = 0,
-    vy = 0,
-    vz = 0,
-    mass = SOLAR_MASS
-  },
-  { -- Jupiter
-    x = 4.84143144246472090e+00,
-    y = -1.16032004402742839e+00,
-    z = -1.03622044471123109e-01,
-    vx = 1.66007664274403694e-03 * DAYS_PER_YEAR,
-    vy = 7.69901118419740425e-03 * DAYS_PER_YEAR,
-    vz = -6.90460016972063023e-05 * DAYS_PER_YEAR,
-    mass = 9.54791938424326609e-04 * SOLAR_MASS
-  },
-  { -- Saturn
-    x = 8.34336671824457987e+00,
-    y = 4.12479856412430479e+00,
-    z = -4.03523417114321381e-01,
-    vx = -2.76742510726862411e-03 * DAYS_PER_YEAR,
-    vy = 4.99852801234917238e-03 * DAYS_PER_YEAR,
-    vz = 2.30417297573763929e-05 * DAYS_PER_YEAR,
-    mass = 2.85885980666130812e-04 * SOLAR_MASS
-  },
-  { -- Uranus
-    x = 1.28943695621391310e+01,
-    y = -1.51111514016986312e+01,
-    z = -2.23307578892655734e-01,
-    vx = 2.96460137564761618e-03 * DAYS_PER_YEAR,
-    vy = 2.37847173959480950e-03 * DAYS_PER_YEAR,
-    vz = -2.96589568540237556e-05 * DAYS_PER_YEAR,
-    mass = 4.36624404335156298e-05 * SOLAR_MASS
-  },
-  { -- Neptune
-    x = 1.53796971148509165e+01,
-    y = -2.59193146099879641e+01,
-    z = 1.79258772950371181e-01,
-    vx = 2.68067772490389322e-03 * DAYS_PER_YEAR,
-    vy = 1.62824170038242295e-03 * DAYS_PER_YEAR,
-    vz = -9.51592254519715870e-05 * DAYS_PER_YEAR,
-    mass = 5.15138902046611451e-05 * SOLAR_MASS
-  }
-}
+local bodies
+local nbody
 
 local function advance(bodies, nbody, dt)
   for i=1,nbody do
@@ -110,10 +66,75 @@ local function offsetMomentum(b, nbody)
   b[1].vz = -pz / SOLAR_MASS
 end
 
-local N = tonumber(arg and arg[1]) or 1000
-local nbody = #bodies
+local DEFAULT_N = 5e6
+local N = tonumber(arg and arg[1]) or DEFAULT_N
 
-offsetMomentum(bodies, nbody)
-io.write( string.format("%0.9f",energy(bodies, nbody)), "\n")
-for i=1,N do advance(bodies, nbody, 0.01) end
-io.write( string.format("%0.9f",energy(bodies, nbody)), "\n")
+bench:add({
+  name = "nbody",
+  payload = function()
+    bodies = {
+      { -- Sun
+        x = 0,
+        y = 0,
+        z = 0,
+        vx = 0,
+        vy = 0,
+        vz = 0,
+        mass = SOLAR_MASS
+      },
+      { -- Jupiter
+        x = 4.84143144246472090e+00,
+        y = -1.16032004402742839e+00,
+        z = -1.03622044471123109e-01,
+        vx = 1.66007664274403694e-03 * DAYS_PER_YEAR,
+        vy = 7.69901118419740425e-03 * DAYS_PER_YEAR,
+        vz = -6.90460016972063023e-05 * DAYS_PER_YEAR,
+        mass = 9.54791938424326609e-04 * SOLAR_MASS
+      },
+      { -- Saturn
+        x = 8.34336671824457987e+00,
+        y = 4.12479856412430479e+00,
+        z = -4.03523417114321381e-01,
+        vx = -2.76742510726862411e-03 * DAYS_PER_YEAR,
+        vy = 4.99852801234917238e-03 * DAYS_PER_YEAR,
+        vz = 2.30417297573763929e-05 * DAYS_PER_YEAR,
+        mass = 2.85885980666130812e-04 * SOLAR_MASS
+      },
+      { -- Uranus
+        x = 1.28943695621391310e+01,
+        y = -1.51111514016986312e+01,
+        z = -2.23307578892655734e-01,
+        vx = 2.96460137564761618e-03 * DAYS_PER_YEAR,
+        vy = 2.37847173959480950e-03 * DAYS_PER_YEAR,
+        vz = -2.96589568540237556e-05 * DAYS_PER_YEAR,
+        mass = 4.36624404335156298e-05 * SOLAR_MASS
+      },
+      { -- Neptune
+        x = 1.53796971148509165e+01,
+        y = -2.59193146099879641e+01,
+        z = 1.79258772950371181e-01,
+        vx = 2.68067772490389322e-03 * DAYS_PER_YEAR,
+        vy = 1.62824170038242295e-03 * DAYS_PER_YEAR,
+        vz = -9.51592254519715870e-05 * DAYS_PER_YEAR,
+        mass = 5.15138902046611451e-05 * SOLAR_MASS
+      }
+    }
+    nbody = #bodies
+
+    offsetMomentum(bodies, nbody)
+
+    assert(energy(bodies, nbody) == -0.16907516382852447179,
+           "Incorrect start energy")
+    for i=1,N do advance(bodies, nbody, 0.01) end
+  end,
+  checker = function()
+    if N == DEFAULT_N then
+      assert(energy(bodies, nbody) == -0.16908313397890917251,
+             "Incorrect result energy")
+    end
+    return true
+  end,
+  items = N,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 18/41] perf: adjust nsieve-bit-fp in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (16 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 17/41] perf: adjust nbody " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 19/41] perf: adjust nsieve-bit " Sergey Kaplun via Tarantool-patches
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
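Note: the number of items for this benchmark is the sum of the three
sieve sizes, so it depends on N. A small sketch of the arithmetic
(`total_items` is a hypothetical helper name): for N = 12 the sizes are
40960000, 20480000 and 10240000, which add up to 71680000.

```lua
-- Hypothetical helper: total number of sieve slots processed for a
-- given N, using the same sizes as the benchmark loop.
local function total_items(N)
  local items = 0
  for i = 0, 2 do
    items = items + (2^(N-i))*10000
  end
  return items
end

print(total_items(12))
```
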
 perf/LuaJIT-benches/nsieve-bit-fp.lua | 35 +++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/perf/LuaJIT-benches/nsieve-bit-fp.lua b/perf/LuaJIT-benches/nsieve-bit-fp.lua
index 3971ec1f..d0ab23d2 100644
--- a/perf/LuaJIT-benches/nsieve-bit-fp.lua
+++ b/perf/LuaJIT-benches/nsieve-bit-fp.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local floor, ceil = math.floor, math.ceil
 
@@ -27,11 +28,35 @@ local function nsieve(p, m)
   return count
 end
 
-local N = tonumber(arg and arg[1]) or 1
+local DEFAULT_N = 12
+local N = tonumber(arg and arg[1]) or DEFAULT_N
 if N < 2 then N = 2 end
 local primes = {}
 
-for i=0,2 do
-  local m = (2^(N-i))*10000
-  io.write(string.format("Primes up to %8d %8d\n", m, nsieve(primes, m)))
-end
+local benchmark
+benchmark = {
+  name = "nsieve_bit_fp",
+  payload = function()
+    local res = {}
+    local items = 0
+    for i=0,2 do
+      local m = (2^(N-i))*10000
+      items = items + m
+      res[i] = nsieve(primes, m)
+    end
+    benchmark.items = items
+
+    return res
+  end,
+  checker = function(res)
+    if N == DEFAULT_N then
+      assert(res[0] == 2488465)
+      assert(res[1] == 1299069)
+      assert(res[2] == 679461)
+    end
+    return true
+  end,
+}
+
+bench:add(benchmark)
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 19/41] perf: adjust nsieve-bit in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (17 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 18/41] perf: adjust nsieve-bit-fp " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 20/41] perf: adjust nsieve " Sergey Kaplun via Tarantool-patches
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
 perf/LuaJIT-benches/nsieve-bit.lua | 35 +++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/perf/LuaJIT-benches/nsieve-bit.lua b/perf/LuaJIT-benches/nsieve-bit.lua
index 820a3726..4858e9e2 100644
--- a/perf/LuaJIT-benches/nsieve-bit.lua
+++ b/perf/LuaJIT-benches/nsieve-bit.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local bit = require("bit")
 local band, bxor, rshift, rol = bit.band, bit.bxor, bit.rshift, bit.rol
@@ -17,11 +18,35 @@ local function nsieve(p, m)
   return count
 end
 
-local N = tonumber(arg and arg[1]) or 1
+local DEFAULT_N = 12
+local N = tonumber(arg and arg[1]) or DEFAULT_N
 if N < 2 then N = 2 end
 local primes = {}
 
-for i=0,2 do
-  local m = (2^(N-i))*10000
-  io.write(string.format("Primes up to %8d %8d\n", m, nsieve(primes, m)))
-end
+local benchmark
+benchmark = {
+  name = "nsieve_bit",
+  payload = function()
+    local res = {}
+    local items = 0
+    for i=0,2 do
+      local m = (2^(N-i))*10000
+      items = items + m
+      res[i] = nsieve(primes, m)
+    end
+    benchmark.items = items
+
+    return res
+  end,
+  checker = function(res)
+    if N == DEFAULT_N then
+      assert(res[0] == 2488465)
+      assert(res[1] == 1299069)
+      assert(res[2] == 679461)
+    end
+    return true
+  end,
+}
+
+bench:add(benchmark)
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 20/41] perf: adjust nsieve in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (18 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 19/41] perf: adjust nsieve-bit " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 21/41] perf: adjust partialsums " Sergey Kaplun via Tarantool-patches
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
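Note: reference prime counts for other sieve sizes can be cross-checked
with a plain sieve. The sketch below is an independent, simplified
version written for this note (the name `count_primes` is hypothetical);
it is not the benchmarked function.

```lua
-- Simplified sieve of Eratosthenes: count the primes in [2, m].
local function count_primes(m)
  local p = {}
  for i = 2, m do p[i] = true end
  local count = 0
  for i = 2, m do
    if p[i] then
      -- Cross out every multiple of the prime i.
      for k = i + i, m, i do p[k] = false end
      count = count + 1
    end
  end
  return count
end

print(count_primes(100), count_primes(10000))
```

There are 25 primes up to 100 and 1229 primes up to 10000.
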
 perf/LuaJIT-benches/nsieve.lua | 35 +++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/perf/LuaJIT-benches/nsieve.lua b/perf/LuaJIT-benches/nsieve.lua
index 6de0524f..2d1b66c8 100644
--- a/perf/LuaJIT-benches/nsieve.lua
+++ b/perf/LuaJIT-benches/nsieve.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local function nsieve(p, m)
   for i=2,m do p[i] = true end
@@ -11,11 +12,35 @@ local function nsieve(p, m)
   return count
 end
 
-local N = tonumber(arg and arg[1]) or 1
+local DEFAULT_N = 12
+local N = tonumber(arg and arg[1]) or DEFAULT_N
 if N < 2 then N = 2 end
 local primes = {}
 
-for i=0,2 do
-  local m = (2^(N-i))*10000
-  io.write(string.format("Primes up to %8d %8d\n", m, nsieve(primes, m)))
-end
+local benchmark
+benchmark = {
+  name = "nsieve",
+  payload = function()
+    local res = {}
+    local items = 0
+    for i=0,2 do
+      local m = (2^(N-i))*10000
+      items = items + m
+      res[i] = nsieve(primes, m)
+    end
+    benchmark.items = items
+
+    return res
+  end,
+  checker = function(res)
+    if N == DEFAULT_N then
+      assert(res[0] == 2488465)
+      assert(res[1] == 1299069)
+      assert(res[2] == 679461)
+    end
+    return true
+  end,
+}
+
+bench:add(benchmark)
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 21/41] perf: adjust partialsums in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (19 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 20/41] perf: adjust nsieve " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 22/41] perf: adjust pidigits-nogmp " Sergey Kaplun via Tarantool-patches
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
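Note: two of the sums have simple closed forms, which gives a quick
sanity check for small n. The identities are standard ones and are not
stated in the patch: the geometric sum (starting from 1) equals
3*(1 - (2/3)^(n+1)), and the sum of 1/(k*(k+1)) telescopes to n/(n+1).
The helper name `partial` is hypothetical.

```lua
-- Hypothetical helper: compute two of the partial sums for a small n.
local function partial(n)
  local a1, a3 = 1, 0
  for k = 1, n do
    a1 = a1 + (2/3)^k      -- Geometric series.
    a3 = a3 + 1/(k*k + k)  -- Telescoping series.
  end
  return a1, a3
end

local a1, a3 = partial(1000)
-- Compare with the closed forms, allowing for rounding error.
print(math.abs(a1 - 3*(1 - (2/3)^1001)) < 1e-9)
print(math.abs(a3 - 1000/1001) < 1e-9)
```
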
 perf/LuaJIT-benches/partialsums.lua | 69 ++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 27 deletions(-)

diff --git a/perf/LuaJIT-benches/partialsums.lua b/perf/LuaJIT-benches/partialsums.lua
index 46bb9da3..ab24b30a 100644
--- a/perf/LuaJIT-benches/partialsums.lua
+++ b/perf/LuaJIT-benches/partialsums.lua
@@ -1,29 +1,44 @@
+local bench = require("bench").new(arg)
 
-local n = tonumber(arg[1])
-local function pr(fmt, x) io.write(string.format(fmt, x)) end
+local DEFAULT_N = 1e7
+local n = tonumber(arg[1]) or DEFAULT_N
 
-local a1, a2, a3, a4, a5, a6, a7, a8, a9, alt = 1, 0, 0, 0, 0, 0, 0, 0, 0, 1
-local sqrt, sin, cos = math.sqrt, math.sin, math.cos
-for k=1,n do
-  local k2, sk, ck = k*k, sin(k), cos(k)
-  local k3 = k2*k
-  a1 = a1 + (2/3)^k
-  a2 = a2 + 1/sqrt(k)
-  a3 = a3 + 1/(k2+k)
-  a4 = a4 + 1/(k3*sk*sk)
-  a5 = a5 + 1/(k3*ck*ck)
-  a6 = a6 + 1/k
-  a7 = a7 + 1/k2
-  a8 = a8 + alt/k
-  a9 = a9 + alt/(k+k-1)
-  alt = -alt
-end
-pr("%.9f\t(2/3)^k\n", a1)
-pr("%.9f\tk^-0.5\n", a2)
-pr("%.9f\t1/k(k+1)\n", a3)
-pr("%.9f\tFlint Hills\n", a4)
-pr("%.9f\tCookson Hills\n", a5)
-pr("%.9f\tHarmonic\n", a6)
-pr("%.9f\tRiemann Zeta\n", a7)
-pr("%.9f\tAlternating Harmonic\n", a8)
-pr("%.9f\tGregory\n", a9)
+bench:add({
+  name = "partialsums",
+  payload = function()
+    local a1, a2, a3, a4, a5, a6, a7, a8, a9, alt = 1, 0, 0, 0, 0, 0, 0, 0, 0, 1
+    local sqrt, sin, cos = math.sqrt, math.sin, math.cos
+    for k=1,n do
+      local k2, sk, ck = k*k, sin(k), cos(k)
+      local k3 = k2*k
+      a1 = a1 + (2/3)^k
+      a2 = a2 + 1/sqrt(k)
+      a3 = a3 + 1/(k2+k)
+      a4 = a4 + 1/(k3*sk*sk)
+      a5 = a5 + 1/(k3*ck*ck)
+      a6 = a6 + 1/k
+      a7 = a7 + 1/k2
+      a8 = a8 + alt/k
+      a9 = a9 + alt/(k+k-1)
+      alt = -alt
+    end
+    return {a1, a2, a3, a4, a5, a6, a7, a8, a9}
+  end,
+  checker = function(a)
+    if n == DEFAULT_N then
+      assert(a[1] == 2.99999999999999866773)
+      assert(a[2] == 6323.09512394020111969439)
+      assert(a[3] == 0.99999989999981531152)
+      assert(a[4] == 30.31454593111029183206)
+      assert(a[5] == 42.99523427973661426904)
+      assert(a[6] == 16.69531136585727182364)
+      assert(a[7] == 1.64493396684725956547)
+      assert(a[8] == 0.69314713056010635039)
+      assert(a[9] == 0.78539813839744787582)
+    end
+    return true
+  end,
+  items = n,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 22/41] perf: adjust pidigits-nogmp in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (20 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 21/41] perf: adjust partialsums " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 23/41] perf: adjust ray " Sergey Kaplun via Tarantool-patches
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The output is redirected to /dev/null. The check is skipped since it is
inconvenient to store a huge file with the reference value in the
repository.
---
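Note: the payload of this benchmark compiles its chunk after
substituting the radix into the source text. A tiny sketch of that
technique follows; the chunk body and the value 1000 are made up for
illustration.

```lua
local loadstring = loadstring or load  -- Lua 5.2+ compatibility.

-- Hypothetical chunk with a placeholder, much smaller than the real one.
local chunk = [=[
return function(x) return x % RADIX end
]=]

local RADIX = 1000
-- Keep only the first result of gsub (the substituted string).
local src = string.gsub(chunk, "RADIX", tostring(RADIX))
local wrap = assert(loadstring(src))()

print(wrap(12345))
```

Since the constant is part of the compiled source, the compiler sees a
literal rather than an upvalue.
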
 perf/LuaJIT-benches/pidigits-nogmp.lua | 49 ++++++++++++++++++--------
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/perf/LuaJIT-benches/pidigits-nogmp.lua b/perf/LuaJIT-benches/pidigits-nogmp.lua
index 63a1cb0e..e96b3e45 100644
--- a/perf/LuaJIT-benches/pidigits-nogmp.lua
+++ b/perf/LuaJIT-benches/pidigits-nogmp.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 -- Start of dynamically compiled chunk.
 local chunk = [=[
@@ -80,21 +81,41 @@ end)
 
 ]=] -- End of dynamically compiled chunk.
 
-local N = tonumber(arg and arg[1]) or 27
+local N = tonumber(arg and arg[1]) or 5000
 local RADIX = N < 6500 and 2^36 or 2^32 -- Avoid overflow.
 
--- Substitute radix and compile chunk.
-local pidigit = loadstring(string.gsub(chunk, "RADIX", tostring(RADIX)))()
+local stdout = io.output()
 
--- Print lines with 10 digits.
-for i=10,N,10 do
-  for j=1,10 do io.write(pidigit()) end
-  io.write("\t:", i, "\n")
-end
+bench:add({
+  name = "pidigits_nogmp",
+  -- Skip the check here, since it is not very convenient.
+  -- If you want to check the behaviour, drop the setup
+  -- function.
+  skip_check = true,
+  setup = function()
+    io.output("/dev/null")
+  end,
+  payload = function()
+    -- Substitute radix and compile chunk.
+    local pidigit = loadstring(string.gsub(chunk, "RADIX", tostring(RADIX)))()
 
--- Print remaining digits (if any).
-local n10 = N % 10
-if n10 ~= 0 then
-  for i=1,n10 do io.write(pidigit()) end
-  io.write(string.rep(" ", 10-n10), "\t:", N, "\n")
-end
+    -- Print lines with 10 digits.
+    for i=10,N,10 do
+      for j=1,10 do io.write(pidigit()) end
+      io.write("\t:", i, "\n")
+    end
+
+    -- Print remaining digits (if any).
+    local n10 = N % 10
+    if n10 ~= 0 then
+      for i=1,n10 do io.write(pidigit()) end
+      io.write(string.rep(" ", 10-n10), "\t:", N, "\n")
+    end
+  end,
+  teardown = function()
+    io.output(stdout)
+  end,
+  items = N,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 23/41] perf: adjust ray in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (21 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 22/41] perf: adjust pidigits-nogmp " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 24/41] perf: adjust recursive-ack " Sergey Kaplun via Tarantool-patches
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The output is redirected to /dev/null. The check is skipped since it is
inconvenient to verify the binary output, especially as the output
depends on the parameters.
---
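Note: this patch also moves the computation of `delta` into the payload.
A sketch of what that loop computes: the first power of two, halving
down from 1, whose square is lost when added to 1 in floating point.
The function name `compute_delta` is hypothetical.

```lua
-- Halve delta until delta^2 no longer changes 1 when added to it.
local function compute_delta()
  local delta = 1
  while delta * delta + 1 ~= 1 do
    delta = delta * 0.5
  end
  return delta
end

local delta = compute_delta()
print(delta > 0, delta * delta + 1 == 1)
```

The exact value depends on the floating-point format in use, so the
sketch only checks the defining property.
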
 perf/LuaJIT-benches/ray.lua | 76 ++++++++++++++++++++++++-------------
 1 file changed, 50 insertions(+), 26 deletions(-)

diff --git a/perf/LuaJIT-benches/ray.lua b/perf/LuaJIT-benches/ray.lua
index 2acc24c0..f7b76d0a 100644
--- a/perf/LuaJIT-benches/ray.lua
+++ b/perf/LuaJIT-benches/ray.lua
@@ -1,10 +1,8 @@
+local bench = require("bench").new(arg)
+
 local sqrt = math.sqrt
 local huge = math.huge
-
-local delta = 1
-while delta * delta + 1 ~= 1 do
-  delta = delta * 0.5
-end
+local delta
 
 local function length(x, y, z)  return sqrt(x*x + y*y + z*z) end
 local function vlen(v)          return length(v[1], v[2], v[3]) end
@@ -110,26 +108,52 @@ end
 
 
 local level, n, ss = tonumber(arg[1]) or 9, tonumber(arg[2]) or 256, 4
-local iss = 1/ss
-local gf = 255/(ss*ss)
-
-io.write(("P5\n%d %d\n255\n"):format(n, n))
-local light = { unitise(-1, -3, 2) }
-ilight = { -light[1], -light[2], -light[3] }
-local camera = { 0, 0, -4 }
-local dir = { 0, 0, 0 }
-
-local scene = create(level, {0, -1, 0}, 1)
-
-for y = n/2-1, -n/2, -1 do
-  for x = -n/2, n/2-1 do
-    local g = 0
-    for d = y, y+.99, iss do
-      for e = x, x+.99, iss do
-        dir[1], dir[2], dir[3] = unitise(e, d, n)
-        g = g + ray_trace(light, camera, dir, scene) 
+
+local stdout = io.output()
+
+bench:add({
+  name = "ray",
+  -- Skip the check here, since it is not very convenient.
+  -- If you want to check the behaviour, drop the setup
+  -- function.
+  skip_check = true,
+  setup = function()
+    io.output("/dev/null")
+  end,
+  payload = function()
+    local iss = 1/ss
+    local gf = 255/(ss*ss)
+
+    delta = 1
+    while delta * delta + 1 ~= 1 do
+      delta = delta * 0.5
+    end
+
+    io.write(("P5\n%d %d\n255\n"):format(n, n))
+    local light = { unitise(-1, -3, 2) }
+    ilight = { -light[1], -light[2], -light[3] }
+    local camera = { 0, 0, -4 }
+    local dir = { 0, 0, 0 }
+
+    local scene = create(level, {0, -1, 0}, 1)
+
+    for y = n/2-1, -n/2, -1 do
+      for x = -n/2, n/2-1 do
+        local g = 0
+        for d = y, y+.99, iss do
+          for e = x, x+.99, iss do
+            dir[1], dir[2], dir[3] = unitise(e, d, n)
+            g = g + ray_trace(light, camera, dir, scene)
+          end
+        end
+        io.write(string.char(math.floor(0.5 + g*gf)))
       end
     end
-    io.write(string.char(math.floor(0.5 + g*gf)))
-  end
-end
+  end,
+  teardown = function()
+    io.output(stdout)
+  end,
+  items = n * n * level,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 24/41] perf: adjust recursive-ack in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (22 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 23/41] perf: adjust ray " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 25/41] perf: adjust recursive-fib " Sergey Kaplun via Tarantool-patches
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
 perf/LuaJIT-benches/recursive-ack.lua | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
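
The closed form used for `items` in this patch can be cross-checked by
counting calls directly for small N. The sketch below is only an
illustration: the counter `calls`, the helper `expected_calls`, and the
table `results` are names introduced here, not part of the patch.

```lua
-- Count the calls made while evaluating Ack(3, N) and compare the
-- count with the closed form used for `items`.
local calls = 0

local function Ack(m, n)
  calls = calls + 1
  if m == 0 then return n + 1 end
  if n == 0 then return Ack(m - 1, 1) end
  return Ack(m - 1, Ack(m, n - 1))
end

-- Same expression as the `items` field in the patch.
local function expected_calls(N)
  return 128 * ((4 ^ N - 1) / 3) - 40 * (2 ^ N - 1) + 3 * N + 15
end

local results = {}
for N = 0, 4 do
  calls = 0
  local res = Ack(3, N)
  results[N] = { res = res, calls = calls }
  -- Same identity as the checker in the patch.
  assert(res == 2 ^ (N + 3) - 3)
  assert(calls == expected_calls(N))
end
```

The formula follows from the recurrence
C3(N) = 1 + C3(N - 1) + C2(2^(N + 2) - 3) with C2(k) = 2k^2 + 7k + 5.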

diff --git a/perf/LuaJIT-benches/recursive-ack.lua b/perf/LuaJIT-benches/recursive-ack.lua
index fad30589..1172d4b3 100644
--- a/perf/LuaJIT-benches/recursive-ack.lua
+++ b/perf/LuaJIT-benches/recursive-ack.lua
@@ -1,3 +1,5 @@
+local bench = require("bench").new(arg)
+
 local function Ack(m, n)
   if m == 0 then return n+1 end
   if n == 0 then return Ack(m-1, 1) end
@@ -5,4 +7,17 @@ local function Ack(m, n)
 end
 
 local N = tonumber(arg and arg[1]) or 10
-io.write("Ack(3,", N ,"): ", Ack(3,N), "\n")
+
+bench:add({
+  name = "recursive_ack",
+  -- Total number of calls made while evaluating Ack(3, N).
+  items = 128 * ((4 ^ N - 1) / 3) - 40 * (2 ^ N - 1) + 3 * N + 15,
+  payload = function()
+    return Ack(3, N)
+  end,
+  checker = function(res)
+    return res == 2 ^ (N + 3) - 3
+  end,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 25/41] perf: adjust recursive-fib in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (23 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 24/41] perf: adjust recursive-ack " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 26/41] perf: adjust revcomp " Sergey Kaplun via Tarantool-patches
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
 perf/LuaJIT-benches/recursive-fib.lua | 28 +++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)
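
The relation `items = res * 2 - 1` used in this patch can be checked by
instrumenting the recursion for small n. This is a sketch only: the
counter `calls` and the helper `fib_iter` (which mirrors the checker
loop) are names introduced here for illustration.

```lua
-- Count the calls of the naive recursion and compare with 2 * fib(n) - 1.
local calls = 0

local function fib(n)
  calls = calls + 1
  if n < 2 then return 1 end
  return fib(n - 2) + fib(n - 1)
end

-- Iterative reference, same convention: fib(0) = fib(1) = 1.
local function fib_iter(n)
  local km1, k = 1, 1
  for _ = 2, n do
    km1, k = k, k + km1
  end
  return k
end

for n = 0, 20 do
  calls = 0
  local res = fib(n)
  assert(res == fib_iter(n))
  -- C(n) = 1 + C(n - 1) + C(n - 2) with C(0) = C(1) = 1.
  assert(calls == 2 * res - 1)
end
```

Since the number of calls is derived from the result, it is only known
after the payload has run, which is why the patch assigns `items` inside
the payload.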

diff --git a/perf/LuaJIT-benches/recursive-fib.lua b/perf/LuaJIT-benches/recursive-fib.lua
index ef9950de..99af3f9e 100644
--- a/perf/LuaJIT-benches/recursive-fib.lua
+++ b/perf/LuaJIT-benches/recursive-fib.lua
@@ -1,7 +1,31 @@
+local bench = require("bench").new(arg)
+
 local function fib(n)
   if n < 2 then return 1 end
   return fib(n-2) + fib(n-1)
 end
 
-local n = tonumber(arg[1]) or 10
-io.write(string.format("Fib(%d): %d\n", n, fib(n)))
+local n = tonumber(arg[1]) or 40
+
+local benchmark
+benchmark = {
+  name = "recursive_fib",
+  checker = function(res)
+    local km1, k = 1, 1
+    for _ = 2, n do
+      local tmp = k + km1
+      km1 = k
+      k = tmp
+    end
+    return k == res
+  end,
+  payload = function()
+    local res = fib(n)
+    -- Number of calls.
+    benchmark.items = res * 2 - 1
+    return res
+  end,
+}
+
+bench:add(benchmark)
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 26/41] perf: adjust revcomp in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (24 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 25/41] perf: adjust recursive-fib " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 27/41] perf: adjust scimark-2010-12-20 " Sergey Kaplun via Tarantool-patches
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The benchmark input is supplied by redirecting the <FASTA_5000000> file
generated by `libs/fasta.lua 5e6` to stdin. The output of the benchmark
is redirected to /dev/null. Checks are skipped since the output is
huge, and storing it in the repository would be overkill.
---
 perf/LuaJIT-benches/revcomp.lua | 72 +++++++++++++++++++++------------
 1 file changed, 47 insertions(+), 25 deletions(-)
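
The payload rewinds stdin so that it can be run repeatedly. The effect
of the rewind can be sketched with a temporary file standing in for the
redirected FASTA input. The file contents and the helper `count_lines`
are illustrative assumptions. Note that seeking works for a regular
file but not for a pipe, so the input has to be redirected from a file.

```lua
-- Sketch: re-read the same seekable input on every run.
-- A temporary file stands in for the redirected FASTA input.
local input = assert(io.tmpfile())
input:write(">ONE\nACGT\n>TWO\nTTGA\n")

local function count_lines(file)
  file:seek("set", 0)   -- rewind; without this a second pass reads nothing
  local n = 0
  for _ in file:lines() do
    n = n + 1
  end
  return n
end

local first = count_lines(input)
local second = count_lines(input)
input:close()
```

Both passes see the same four lines, which is the property the
benchmark relies on when the payload is executed more than once.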

diff --git a/perf/LuaJIT-benches/revcomp.lua b/perf/LuaJIT-benches/revcomp.lua
index 34fe347b..2b1ffa5c 100644
--- a/perf/LuaJIT-benches/revcomp.lua
+++ b/perf/LuaJIT-benches/revcomp.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local sub = string.sub
 iubc = setmetatable({
@@ -9,29 +10,50 @@ iubc = setmetatable({
 }, { __index = function(t, s)
   local r = t[sub(s, 2)]..t[sub(s, 1, 1)]; t[s] = r; return r end })
 
-local wcode = [=[
-return function(t, n)
-  if n == 1 then return end
-  local iubc, sub, write = iubc, string.sub, io.write
-  local s = table.concat(t, "", 1, n-1)
-  for i=#s-59,1,-60 do
-    write(]=]
-for i=59,3,-4 do wcode = wcode.."iubc[sub(s, i+"..(i-3)..", i+"..i..")], " end
-wcode = wcode..[=["\n")
-  end
-  local r = #s % 60
-  if r ~= 0 then
-    for i=r,1,-4 do write(iubc[sub(s, i-3 < 1 and 1 or i-3, i)]) end
-    write("\n")
-  end
-end
-]=]
-local writerev = loadstring(wcode)()
+local stdout = io.output()
 
-local t, n = {}, 1
-for line in io.lines() do
-  local c = sub(line, 1, 1)
-  if c == ">" then writerev(t, n); io.write(line, "\n"); n = 1
-  elseif c ~= ";" then t[n] = line; n = n + 1 end
-end
-writerev(t, n)
+bench:add({
+  name = "revcomp",
+  -- Comparing against a reference output file is inconvenient.
+  skip_check = true,
+  setup = function()
+    io.output("/dev/null")
+  end,
+  payload = function()
+    local wcode = [=[
+    return function(t, n)
+      if n == 1 then return end
+      local iubc, sub, write = iubc, string.sub, io.write
+      local s = table.concat(t, "", 1, n-1)
+      for i=#s-59,1,-60 do
+        write(]=]
+    for i=59,3,-4 do wcode = wcode.."iubc[sub(s, i+"..(i-3)..", i+"..i..")], " end
+    wcode = wcode..[=["\n")
+      end
+      local r = #s % 60
+      if r ~= 0 then
+        for i=r,1,-4 do write(iubc[sub(s, i-3 < 1 and 1 or i-3, i)]) end
+        write("\n")
+      end
+    end
+    ]=]
+    local writerev = loadstring(wcode)()
+
+    local t, n = {}, 1
+    for line in io.lines() do
+      local c = sub(line, 1, 1)
+      if c == ">" then writerev(t, n); io.write(line, "\n"); n = 1
+      elseif c ~= ";" then t[n] = line; n = n + 1 end
+    end
+    writerev(t, n)
+    -- Rewind stdin so that the payload can run several times.
+    io.stdin:seek("set", 0)
+  end,
+  teardown = function()
+    io.output(stdout)
+  end,
+  -- Number of symbols in the input file.
+  items = 5e6,
+})
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 27/41] perf: adjust scimark-2010-12-20 in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (25 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 26/41] perf: adjust revcomp " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 28/41] perf: move <scimark_lib.lua> to <libs/> directory Sergey Kaplun via Tarantool-patches
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The time for each subsequent benchmark is increased up to 4 seconds,
according to the defaults in the "bench" framework. The main difference
between this test and the others that will be added in the next commits
is the usage of FFI arrays instead of plain Lua tables.
---
 perf/LuaJIT-benches/scimark-2010-12-20.lua | 93 +++++++++++++---------
 1 file changed, 54 insertions(+), 39 deletions(-)
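
The benchmark descriptions in this patch are declared with `local b`
first and assigned afterwards, because the payload stores the flop count
back into the same table. A minimal sketch of that pattern, with a
hypothetical `fake_run` standing in for a SciMark kernel:

```lua
-- Hypothetical stand-in for a SciMark kernel: returns a flop count.
local function fake_run(cycles)
  return cycles * 10   -- pretend each cycle costs 10 flops
end

-- Declare the local first so that the closure below captures it.
-- With `local b = { ... }` in one statement, `b` inside the function
-- would refer to a global, since the local is not yet in scope.
local b
b = {
  name = "demo",
  payload = function()
    b.items = fake_run(5)   -- known only after the run
  end,
}

b.payload()
```

After the payload has run, `b.items` holds the value returned by the
kernel, so a reporter reading the field afterwards sees the real count.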

diff --git a/perf/LuaJIT-benches/scimark-2010-12-20.lua b/perf/LuaJIT-benches/scimark-2010-12-20.lua
index 353acb7c..3fb627fa 100644
--- a/perf/LuaJIT-benches/scimark-2010-12-20.lua
+++ b/perf/LuaJIT-benches/scimark-2010-12-20.lua
@@ -9,25 +9,26 @@
 local SCIMARK_VERSION = "2010-12-10"
 local SCIMARK_COPYRIGHT = "Copyright (C) 2006-2010 Mike Pall"
 
-local MIN_TIME = 2.0
+local bench = require("bench").new(arg)
+
 local RANDOM_SEED = 101009 -- Must be odd.
 local SIZE_SELECT = "small"
 
 local benchmarks = {
   "FFT", "SOR", "MC", "SPARSE", "LU",
   small = {
-    FFT		= { 1024 },
-    SOR		= { 100 },
-    MC		= { },
-    SPARSE	= { 1000, 5000 },
-    LU		= { 100 },
+    FFT		= { params = { 1024 }, cycles = 50000, },
+    SOR		= { params = { 100 }, cycles = 50000, },
+    MC		= { params = { }, cycles = 15e7, },
+    SPARSE	= { params = { 1000, 5000 }, cycles = 15e4, },
+    LU		= { params = { 100 }, cycles = 5000, },
   },
   large = {
-    FFT		= { 1048576 },
-    SOR		= { 1000 },
-    MC		= { },
-    SPARSE	= { 100000, 1000000 },
-    LU		= { 1000 },
+    FFT		= { params = { 1048576 }, cycles = 25, },
+    SOR		= { params = { 1000 }, cycles = 500, },
+    MC		= { params = { }, cycles = 15e7, },
+    SPARSE	= { params = { 100000, 1000000 }, cycles = 1500, },
+    LU		= { params = { 1000 }, cycles = 50, },
   },
 }
 
@@ -342,48 +343,51 @@ local function fmtparams(p1, p2)
   return ""
 end
 
-local function measure(min_time, name, ...)
+local function measure(name, cycles, ...)
   array_init()
   rand_init(RANDOM_SEED)
   local run = benchmarks[name](...)
-  local cycles = 1
-  repeat
-    local tm = clock()
-    local flops = run(cycles, ...)
-    tm = clock() - tm
-    if tm >= min_time then
-      local res = flops / tm * 1.0e-6
-      local p1, p2 = ...
-      printf("%-7s %8.2f  %s\n", name, res, fmtparams(...))
-      return res
-    end
-    cycles = cycles * 2
-  until false
+  local flops = run(cycles, ...)
+  return flops
 end
 
-printf("Lua SciMark %s based on SciMark 2.0a. %s.\n\n",
-       SCIMARK_VERSION, SCIMARK_COPYRIGHT)
+-- printf("Lua SciMark %s based on SciMark 2.0a. %s.\n\n",
+--        SCIMARK_VERSION, SCIMARK_COPYRIGHT)
 
 while arg and arg[1] do
   local a = table.remove(arg, 1)
-  if a == "-noffi" then
+  if a == "noffi" then
     package.preload.ffi = nil
-  elseif a == "-small" then
+  elseif a == "small" then
     SIZE_SELECT = "small"
-  elseif a == "-large" then
+  elseif a == "large" then
     SIZE_SELECT = "large"
   elseif benchmarks[a] then
-    local p = benchmarks[SIZE_SELECT][a]
-    measure(MIN_TIME, a, tonumber(arg[1]) or p[1], tonumber(arg[2]) or p[2])
+    local cycles = benchmarks[SIZE_SELECT][a].cycles
+    local p = benchmarks[SIZE_SELECT][a].params
+    local b
+    b = {
+      name = a,
+      -- XXX: The description of tests for each function is too
+      -- inconvenient.
+      skip_check = true,
+      payload = function()
+        local flops = measure(a, cycles, tonumber(arg[1]) or p[1],
+                              tonumber(arg[2]) or p[2])
+        b.items = flops
+      end,
+    }
+    bench:add(b)
+    bench:run_and_report()
     return
   else
-    printf("Usage: scimark [-noffi] [-small|-large] [BENCH params...]\n\n")
-    printf("BENCH   -small         -large\n")
+    printf("Usage: scimark [noffi] [small|large] [BENCH params...]\n\n")
+    printf("BENCH   small         large\n")
     printf("---------------------------------------\n")
     for _,name in ipairs(benchmarks) do
       printf("%-7s %-13s %s\n", name,
-	     fmtparams(unpack(benchmarks.small[name])),
-	     fmtparams(unpack(benchmarks.large[name])))
+	     fmtparams(unpack(benchmarks.small[name].params)),
+	     fmtparams(unpack(benchmarks.large[name].params)))
     end
     printf("\n")
     os.exit(1)
@@ -393,8 +397,19 @@ end
 local params = benchmarks[SIZE_SELECT]
 local sum = 0
 for _,name in ipairs(benchmarks) do
-  sum = sum + measure(MIN_TIME, name, unpack(params[name]))
+  local cycles = params[name].cycles
+  local b
+  b = {
+    name = name,
+    -- XXX: The description of tests for each function is too
+    -- inconvenient.
+    skip_check = true,
+    payload = function()
+      local flops = measure(name, cycles, unpack(params[name].params))
+      b.items = flops
+    end,
+  }
+  bench:add(b)
 end
-printf("\nSciMark %8.2f  [%s problem sizes]\n", sum / #benchmarks, SIZE_SELECT)
-io.flush()
 
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 28/41] perf: move <scimark_lib.lua> to <libs/> directory
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (26 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 27/41] perf: adjust scimark-2010-12-20 " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 29/41] perf: adjust scimark-fft in LuaJIT-benches Sergey Kaplun via Tarantool-patches
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This keeps the library from being picked up when the suite scans for
test files.
---
 perf/LuaJIT-benches/{ => libs}/scimark_lib.lua | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename perf/LuaJIT-benches/{ => libs}/scimark_lib.lua (100%)

diff --git a/perf/LuaJIT-benches/scimark_lib.lua b/perf/LuaJIT-benches/libs/scimark_lib.lua
similarity index 100%
rename from perf/LuaJIT-benches/scimark_lib.lua
rename to perf/LuaJIT-benches/libs/scimark_lib.lua
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 29/41] perf: adjust scimark-fft in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (27 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 28/41] perf: move <scimark_lib.lua> to <libs/> directory Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 30/41] perf: adjust scimark-lu " Sergey Kaplun via Tarantool-patches
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-fft.lua | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
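
The default handling for the cycle count relies on `tonumber()`
returning nil for a missing or non-numeric argument. A small sketch of
that idiom, where the helper name `cycles_from` is introduced here only
for illustration:

```lua
-- Return the first argument as a number, or the default when the
-- argument table is absent, empty, or does not hold a number.
local function cycles_from(argv, default)
  return tonumber(argv and argv[1]) or default
end

local from_nil = cycles_from(nil, 50000)          -- no `arg` table at all
local from_empty = cycles_from({}, 50000)         -- no arguments given
local from_number = cycles_from({ "100" }, 50000) -- explicit cycle count
local from_junk = cycles_from({ "abc" }, 50000)   -- not a number
```

The `argv and argv[1]` guard matters when the script is loaded from an
environment that does not define the global `arg`.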

diff --git a/perf/LuaJIT-benches/scimark-fft.lua b/perf/LuaJIT-benches/scimark-fft.lua
index c05bb69a..96535774 100644
--- a/perf/LuaJIT-benches/scimark-fft.lua
+++ b/perf/LuaJIT-benches/scimark-fft.lua
@@ -1 +1,18 @@
-require("scimark_lib").FFT(1024)(tonumber(arg and arg[1]) or 50000)
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 50000
+local benchmark
+benchmark = {
+  name = "scimark_fft",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").FFT(1024)(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 30/41] perf: adjust scimark-lu in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (28 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 29/41] perf: adjust scimark-fft in LuaJIT-benches Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:01   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 31/41] perf: add scimark-mc " Sergey Kaplun via Tarantool-patches
                   ` (10 subsequent siblings)
  40 siblings, 2 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-lu.lua | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/perf/LuaJIT-benches/scimark-lu.lua b/perf/LuaJIT-benches/scimark-lu.lua
index 7636d994..4f521e0b 100644
--- a/perf/LuaJIT-benches/scimark-lu.lua
+++ b/perf/LuaJIT-benches/scimark-lu.lua
@@ -1 +1,19 @@
-require("scimark_lib").LU(100)(tonumber(arg and arg[1]) or 5000)
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 5000
+
+local benchmark
+benchmark = {
+  name = "scimark_lu",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").LU(100)(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 31/41] perf: add scimark-mc in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (29 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 30/41] perf: adjust scimark-lu " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:02   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 32/41] perf: adjust scimark-sor " Sergey Kaplun via Tarantool-patches
                   ` (9 subsequent siblings)
  40 siblings, 2 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adds the aforementioned test using the benchmark framework
introduced earlier. The default arguments are adjusted according to the
number of cycles in the <scimark-2010-12-20.lua> file. The arguments to
the script can be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-mc.lua | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 perf/LuaJIT-benches/scimark-mc.lua

diff --git a/perf/LuaJIT-benches/scimark-mc.lua b/perf/LuaJIT-benches/scimark-mc.lua
new file mode 100644
index 00000000..d26b6e48
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark-mc.lua
@@ -0,0 +1,19 @@
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 15e7
+
+local benchmark
+benchmark = {
+  name = "scimark_mc",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").MC()(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 32/41] perf: adjust scimark-sor in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (30 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 31/41] perf: add scimark-mc " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:02   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 33/41] perf: adjust scimark-sparse " Sergey Kaplun via Tarantool-patches
                   ` (8 subsequent siblings)
  40 siblings, 2 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-sor.lua | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/perf/LuaJIT-benches/scimark-sor.lua b/perf/LuaJIT-benches/scimark-sor.lua
index e537e986..9bcdb0ad 100644
--- a/perf/LuaJIT-benches/scimark-sor.lua
+++ b/perf/LuaJIT-benches/scimark-sor.lua
@@ -1 +1,19 @@
-require("scimark_lib").SOR(100)(tonumber(arg and arg[1]) or 50000)
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 50000
+
+local benchmark
+benchmark = {
+  name = "scimark_sor",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").SOR(100)(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 33/41] perf: adjust scimark-sparse in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (31 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 32/41] perf: adjust scimark-sor " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 10:50 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:03   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 34/41] perf: adjust series " Sergey Kaplun via Tarantool-patches
                   ` (7 subsequent siblings)
  40 siblings, 2 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 10:50 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-sparse.lua | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/perf/LuaJIT-benches/scimark-sparse.lua b/perf/LuaJIT-benches/scimark-sparse.lua
index 01a2258d..a855cc22 100644
--- a/perf/LuaJIT-benches/scimark-sparse.lua
+++ b/perf/LuaJIT-benches/scimark-sparse.lua
@@ -1 +1,19 @@
-require("scimark_lib").SPARSE(1000, 5000)(tonumber(arg and arg[1]) or 150000)
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 150000
+
+local benchmark
+benchmark = {
+  name = "scimark_sparse",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").SPARSE(1000, 5000)(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 30/41] perf: adjust scimark-lu in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 30/41] perf: adjust scimark-lu " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:01   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-lu.lua | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/perf/LuaJIT-benches/scimark-lu.lua b/perf/LuaJIT-benches/scimark-lu.lua
index 7636d994..4f521e0b 100644
--- a/perf/LuaJIT-benches/scimark-lu.lua
+++ b/perf/LuaJIT-benches/scimark-lu.lua
@@ -1 +1,19 @@
-require("scimark_lib").LU(100)(tonumber(arg and arg[1]) or 5000)
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 5000
+
+local benchmark
+benchmark = {
+  name = "scimark_lu",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").LU(100)(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 31/41] perf: add scimark-mc in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 31/41] perf: add scimark-mc " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:02   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adds the aforementioned test using the benchmark framework
introduced earlier. The default arguments are adjusted according to the
number of cycles in the <scimark-2010-12-20.lua> file. The arguments to
the script can be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-mc.lua | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 perf/LuaJIT-benches/scimark-mc.lua

diff --git a/perf/LuaJIT-benches/scimark-mc.lua b/perf/LuaJIT-benches/scimark-mc.lua
new file mode 100644
index 00000000..d26b6e48
--- /dev/null
+++ b/perf/LuaJIT-benches/scimark-mc.lua
@@ -0,0 +1,19 @@
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 15e7
+
+local benchmark
+benchmark = {
+  name = "scimark_mc",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").MC()(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Tarantool-patches] [PATCH v1 luajit 32/41] perf: adjust scimark-sor in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 32/41] perf: adjust scimark-sor " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:02   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced earlier. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

Checks are omitted since they were not present in the original suite,
and the precise result value depends on the input parameter.
---
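Note on the `local benchmark` / `benchmark = {...}` split used in the
scimark benches: the payload stores the measured flops back into the
benchmark table, so the table name has to be visible inside the closure.
The sketch below illustrates the Lua scoping rule behind this. The names
`bad` and `good` are made up for the illustration and are not part of
the patch.

```lua
-- Single-statement form: inside the function, `bad` is resolved as a
-- global, because a local is not in scope until its declaration
-- statement ends.
local bad = {
  payload = function() bad.items = 42 end,
}
-- Indexing the (nil) global fails, so pcall reports an error.
local ok_bad = pcall(bad.payload)

-- Two-statement form, as in the patch: the local is declared first,
-- so the closure captures it as an upvalue.
local good
good = {
  payload = function() good.items = 42 end,
}
local ok_good = pcall(good.payload)

print(ok_bad, ok_good, good.items)
```

This assumes that no global named `bad` is defined elsewhere. It also
suggests that the bench module reads `items` only after the payload has
run, which is an assumption about the module rather than something shown
in this patch.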
 perf/LuaJIT-benches/scimark-sor.lua | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/perf/LuaJIT-benches/scimark-sor.lua b/perf/LuaJIT-benches/scimark-sor.lua
index e537e986..9bcdb0ad 100644
--- a/perf/LuaJIT-benches/scimark-sor.lua
+++ b/perf/LuaJIT-benches/scimark-sor.lua
@@ -1 +1,19 @@
-require("scimark_lib").SOR(100)(tonumber(arg and arg[1]) or 50000)
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 50000
+
+local benchmark
+benchmark = {
+  name = "scimark_sor",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").SOR(100)(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 33/41] perf: adjust scimark-sparse in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 33/41] perf: adjust scimark-sparse " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:03   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

Checks are omitted since they were not present in the original suite,
plus the precise result value depends on the input parameter.
---
 perf/LuaJIT-benches/scimark-sparse.lua | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/perf/LuaJIT-benches/scimark-sparse.lua b/perf/LuaJIT-benches/scimark-sparse.lua
index 01a2258d..a855cc22 100644
--- a/perf/LuaJIT-benches/scimark-sparse.lua
+++ b/perf/LuaJIT-benches/scimark-sparse.lua
@@ -1 +1,19 @@
-require("scimark_lib").SPARSE(1000, 5000)(tonumber(arg and arg[1]) or 150000)
+local bench = require("bench").new(arg)
+
+local cycles = tonumber(arg and arg[1]) or 150000
+
+local benchmark
+benchmark = {
+  name = "scimark_sparse",
+  -- XXX: The description of tests for the function is too
+  -- inconvenient.
+  skip_check = true,
+  payload = function()
+    local flops = require("scimark_lib").SPARSE(1000, 5000)(cycles)
+    benchmark.items = flops
+  end,
+}
+
+bench:add(benchmark)
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 34/41] perf: adjust series in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (32 preceding siblings ...)
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 33/41] perf: adjust scimark-sparse " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 35/41] perf: adjust spectral-norm " Sergey Kaplun via Tarantool-patches
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
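Note on how the `payload`, `checker` and `items` fields fit together:
the sketch below is a hypothetical single-run driver, not the actual
bench module (which is introduced in an earlier patch and may work
differently). The function `run_once` and the toy benchmark are
assumptions made for the illustration.

```lua
-- Hypothetical runner: the real bench module may work differently.
local function run_once(b)
  local start = os.clock()
  local result = b.payload()
  local elapsed = os.clock() - start
  if not b.skip_check then
    assert(b.checker(result), b.name .. ": check failed")
  end
  return {
    name = b.name,
    cpu_time = elapsed,
    -- `items` is read after the payload has run.
    items = b.items,
    -- Avoid dividing by a zero interval on very fast payloads.
    items_per_second = elapsed > 0 and b.items / elapsed or nil,
  }
end

local n = 1000
local report = run_once({
  name = "toy_sum",
  payload = function()
    local s = 0
    for i = 1, n do s = s + i end
    return s
  end,
  checker = function(res) return res == n * (n + 1) / 2 end,
  items = n,
})
print(report.name, report.items)
```

The old script asserted on the result inline and printed iterations per
second itself. With the table form both responsibilities move to the
runner, so the benchmark only declares what to run and what to expect.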
 perf/LuaJIT-benches/series.lua | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/perf/LuaJIT-benches/series.lua b/perf/LuaJIT-benches/series.lua
index f766cb32..3dc970c5 100644
--- a/perf/LuaJIT-benches/series.lua
+++ b/perf/LuaJIT-benches/series.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local function integrate(x0, x1, nsteps, omegan, f)
   local x, dx = x0, (x1-x0)/nsteps
@@ -26,9 +27,16 @@ local function series(n)
 end
 
 local n = tonumber(arg and arg[1]) or 10000
-local tm = os.clock()
-local t = series(n)
-tm = os.clock() - tm
-assert(math.abs(t[1]-2.87295) < 0.00001)
-io.write(string.format("size %d, %.2f s, %.1f iterations/s\n",
-                       n, tm, (2*n-1)/tm))
+
+bench:add({
+  name = "series",
+  checker = function(res)
+    return math.abs(res[1]-2.87295) < 0.00001
+  end,
+  payload = function()
+    return series(n)
+  end,
+  items = 2 * n - 1,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 35/41] perf: adjust spectral-norm in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (33 preceding siblings ...)
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 34/41] perf: adjust series " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 36/41] perf: adjust sum-file " Sergey Kaplun via Tarantool-patches
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.
---
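Note on `items = 40 * N * N`: each AtAv() call performs two
matrix-vector products of N * N element evaluations each, and the
`for i=1,10` loop calls AtAv() twice per iteration, which gives
10 * 2 * 2 * N * N. The sketch below counts the evaluations directly.
The bodies of A(), Av() and Atv() are not shown in the hunks, so the
ones here are assumed from the well-known form of this benchmark.

```lua
local calls = 0

-- Matrix element generator, instrumented to count evaluations.
-- The return expression is assumed (only the first line is in the diff).
local function A(i, j)
  calls = calls + 1
  local ij = i+j-1
  return 1.0 / (ij * (ij-1) * 0.5 + i)
end

-- Assumed shape of the multiplication helpers.
local function Av(x, y, N)
  for i=1,N do
    local a = 0
    for j=1,N do a = a + x[j] * A(i, j) end
    y[i] = a
  end
end

local function Atv(x, y, N)
  for i=1,N do
    local a = 0
    for j=1,N do a = a + x[j] * A(j, i) end
    y[i] = a
  end
end

local function AtAv(x, y, t, N)
  Av(x, t, N)
  Atv(t, y, N)
end

local N = 8
local u, v, t = {}, {}, {}
for i=1,N do u[i] = 1 end
for i=1,10 do AtAv(u, v, t, N) AtAv(v, u, t, N) end

-- Both numbers are expected to match.
print(calls, 40 * N * N)
```

The final reduction loop over `u` and `v` adds only N more steps, so
leaving it out of `items` seems reasonable for the default N.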
 perf/LuaJIT-benches/spectral-norm.lua | 40 +++++++++++++++++++--------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/perf/LuaJIT-benches/spectral-norm.lua b/perf/LuaJIT-benches/spectral-norm.lua
index ecc80112..6e63cd47 100644
--- a/perf/LuaJIT-benches/spectral-norm.lua
+++ b/perf/LuaJIT-benches/spectral-norm.lua
@@ -1,3 +1,4 @@
+local bench = require("bench").new(arg)
 
 local function A(i, j)
   local ij = i+j-1
@@ -25,16 +26,33 @@ local function AtAv(x, y, t, N)
   Atv(t, y, N)
 end
 
-local N = tonumber(arg and arg[1]) or 100
-local u, v, t = {}, {}, {}
-for i=1,N do u[i] = 1 end
+local N = tonumber(arg and arg[1]) or 3000
 
-for i=1,10 do AtAv(u, v, t, N) AtAv(v, u, t, N) end
+bench:add({
+  name = "spectral_norm",
+  checker = function(res)
+    -- XXX: Empirical value.
+    if N > 66 then
+      assert(math.abs(res - 1.27422) < 0.00001)
+    end
+    return true
+  end,
+  payload = function()
+    local u, v, t = {}, {}, {}
+    for i=1,N do u[i] = 1 end
 
-local vBv, vv = 0, 0
-for i=1,N do
-  local ui, vi = u[i], v[i]
-  vBv = vBv + ui*vi
-  vv = vv + vi*vi
-end
-io.write(string.format("%0.9f\n", math.sqrt(vBv / vv)))
+    for i=1,10 do AtAv(u, v, t, N) AtAv(v, u, t, N) end
+
+    local vBv, vv = 0, 0
+    for i=1,N do
+      local ui, vi = u[i], v[i]
+      vBv = vBv + ui*vi
+      vv = vv + vi*vi
+    end
+    return math.sqrt(vBv / vv)
+  end,
+  -- Operations inside `for i=1,10` loop.
+  items = 40 * N * N,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 36/41] perf: adjust sum-file in LuaJIT-benches
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (34 preceding siblings ...)
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 35/41] perf: adjust spectral-norm " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 37/41] perf: add CMake infrastructure Sergey Kaplun via Tarantool-patches
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adjusts the aforementioned test to use the benchmark
framework introduced before. The default arguments are adjusted
according to the <PARAM_x86.txt> file. The arguments to the script can
still be provided on the command line.

The input for the test is redirected from the generated file
<SUMCOL_5000.txt>, which is <SUMCOL_1.txt> concatenated 5000 times.
---
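Note on the rewind and the constants: <SUMCOL_1.txt> has 1000 lines with
a total of 500, so 5000 copies give 1000 * 5000 = 5e6 lines and a sum of
500 * 5000 = 2500000. The payload rewinds stdin so that each iteration
reads the same data, which relies on stdin being a regular file (as with
the `<` redirection) rather than a pipe. The sketch below shows the same
read-then-rewind idea on a temporary file. The helper `sum_lines` and
the 10-line input are made up for the illustration.

```lua
-- Build a small stand-in for the input: the numbers 1..10, one per line.
local f = assert(io.tmpfile())
for i = 1, 10 do f:write(i, "\n") end
f:seek("set", 0)

-- Sum all lines, then rewind so the next call sees the same data.
local function sum_lines(fh)
  local sum = 0
  for line in fh:lines() do
    sum = sum + tonumber(line)
  end
  fh:seek("set", 0)
  return sum
end

local first = sum_lines(f)
local second = sum_lines(f)
f:close()

-- Both passes are expected to yield the same sum.
print(first, second)
```

Without the rewind, the second pass would start at end-of-file and
return 0, so a checker comparing against a precomputed sum would fail on
any iteration after the first.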
 perf/LuaJIT-benches/sum-file.lua | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/perf/LuaJIT-benches/sum-file.lua b/perf/LuaJIT-benches/sum-file.lua
index c9e618fd..270c1865 100644
--- a/perf/LuaJIT-benches/sum-file.lua
+++ b/perf/LuaJIT-benches/sum-file.lua
@@ -1,6 +1,25 @@
+local bench = require("bench").new(arg)
 
-local sum = 0
-for line in io.lines() do
-  sum = sum + line
-end
-io.write(sum, "\n")
+-- XXX: The input file is generated from <SUMCOL_1.txt> by
+-- repeating it 5000 times. The <SUMCOL_1.txt> contains 1000 lines
+-- with the total sum of 500.
+bench:add({
+  name = "sum_file",
+  payload = function()
+    local sum = 0
+    for line in io.lines() do
+      sum = sum + line
+    end
+    -- Allow several iterations.
+    io.stdin:seek("set", 0)
+    return sum
+  end,
+  checker = function(res)
+    -- Precomputed result.
+    return res == 2500000
+  end,
+  -- Fixed size of the file.
+  items = 5e6,
+})
+
+bench:run_and_report()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 37/41] perf: add CMake infrastructure
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (35 preceding siblings ...)
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 36/41] perf: adjust sum-file " Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 38/41] perf: add aggregator helper for bench statistics Sergey Kaplun via Tarantool-patches
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This commit introduces CMake build scripts for the benches introduced
earlier. The benchmarks are enabled only if the `LUAJIT_ENABLE_PERF`
option is set. For each suite (LuaJIT-benches in this patch set) the
`AddBenchTarget()` macro generates 2 targets:
* Target to run all benches and store results in the
  perf/output/<suite_name> directory.
* Target to run all benches via CTest and inspect results in the
  console.

For the LuaJIT-benches there are 2 generated files:
* FASTA_5000000 -- is used as an input for <k-nucleotide.lua> and
                   <revcomp.lua>.
* SUMCOL_5000.txt -- is used as an input for <sum-file.lua>.

These files and the <perf/output> directory are added to the .gitignore
file.
---
 .gitignore                         |  5 ++
 CMakeLists.txt                     | 11 ++++
 perf/CMakeLists.txt                | 99 ++++++++++++++++++++++++++++++
 perf/LuaJIT-benches/CMakeLists.txt | 52 ++++++++++++++++
 4 files changed, 167 insertions(+)
 create mode 100644 perf/CMakeLists.txt
 create mode 100644 perf/LuaJIT-benches/CMakeLists.txt

diff --git a/.gitignore b/.gitignore
index c26a7eb8..bfc7d401 100644
--- a/.gitignore
+++ b/.gitignore
@@ -28,3 +28,8 @@ luajit-parse-memprof
 luajit-parse-sysprof
 luajit.pc
 *.c_test
+
+# Generated by the performance tests.
+FASTA_5000000
+SUMCOL_5000.txt
+perf/output/
diff --git a/CMakeLists.txt b/CMakeLists.txt
index c0da4362..73f46835 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -464,6 +464,17 @@ if(LUAJIT_USE_TEST)
 endif()
 add_subdirectory(test)
 
+# --- Benchmarks source tree ---------------------------------------------------
+
+# The option to enable performance tests for the LuaJIT.
+# Disabled by default, since commonly it is used only by LuaJIT
+# developers and run in the CI with the specially set-up machine.
+option(LUAJIT_ENABLE_PERF "Generate <perf> target" OFF)
+
+if(LUAJIT_ENABLE_PERF)
+  add_subdirectory(perf)
+endif()
+
 # --- Misc rules ---------------------------------------------------------------
 
 # XXX: Implement <uninstall> target using the following recipe:
diff --git a/perf/CMakeLists.txt b/perf/CMakeLists.txt
new file mode 100644
index 00000000..cc3c312f
--- /dev/null
+++ b/perf/CMakeLists.txt
@@ -0,0 +1,99 @@
+# Running various bench suites against LuaJIT.
+
+include(MakeLuaPath)
+
+if(CMAKE_BUILD_TYPE STREQUAL "Debug")
+  message(WARNING "LuaJIT and perf tests are built in the Debug mode. "
+                  "Timings may be affected.")
+endif()
+
+set(PERF_OUTPUT_DIR ${PROJECT_BINARY_DIR}/perf/output)
+file(MAKE_DIRECTORY ${PERF_OUTPUT_DIR})
+
+# List of paths that will be used for each suite.
+make_lua_path(LUA_PATH_BENCH_BASE
+  PATHS
+    # Use of the bench module.
+    ${CMAKE_CURRENT_SOURCE_DIR}/utils/?.lua
+    # Simple usage with `jit.dump()`, etc.
+    ${LUAJIT_SOURCE_DIR}/?.lua
+    ${LUAJIT_BINARY_DIR}/?.lua
+)
+
+make_lua_path(LUA_CPATH
+  PATHS
+    # XXX: Some arches may have installed the cjson module here.
+    /usr/lib64/lua/5.1/?.so
+)
+
+# Produce the pair:
+# Target to run for reporting and target to inspect from the
+# console, runnable by the CTest.
+macro(AddBenchTarget perf_suite)
+  file(MAKE_DIRECTORY "${PERF_OUTPUT_DIR}/${perf_suite}/")
+  message(STATUS "Add perf suite ${perf_suite}")
+  add_custom_target(${perf_suite})
+  add_custom_target(${perf_suite}-console
+    COMMAND ${CMAKE_CTEST_COMMAND}
+      -L ${perf_suite}
+      --parallel 1
+      --verbose
+      --output-on-failure
+      --no-tests=error
+  )
+  add_dependencies(${perf_suite}-console luajit-main)
+endmacro()
+
+# Add the bench to the pair of targets created by the call above.
+macro(AddBench bench_name bench_path perf_suite LUA_PATH)
+  set(bench_title "perf/${perf_suite}/${bench_name}")
+  get_filename_component(bench_name_stripped  ${bench_name} NAME_WE)
+  set(bench_out_file
+    ${PERF_OUTPUT_DIR}/${perf_suite}/${bench_name_stripped}.json
+  )
+  set(bench_command "${LUAJIT_BINARY} ${bench_path}")
+  if(${ARGC} GREATER 4)
+    set(input_file ${ARGV4})
+    set(bench_command "${bench_command} < ${input_file}")
+  endif()
+  set(BENCH_FLAGS
+    "--benchmark_out_format=json --benchmark_out=${bench_out_file}"
+  )
+  set(bench_command_flags ${bench_command} ${BENCH_FLAGS})
+  separate_arguments(bench_command_separated UNIX_COMMAND ${bench_command})
+  add_custom_command(
+    COMMAND ${CMAKE_COMMAND} -E env
+      LUA_PATH="${LUA_PATH}"
+      LUA_CPATH="${LUA_CPATH}"
+        ${bench_command_separated}
+          --benchmark_out_format=json
+          --benchmark_out="${bench_out_file}"
+    OUTPUT ${bench_out_file}
+    DEPENDS luajit-main
+    COMMENT
+      "Running benchmark ${bench_title} saving results in ${bench_out_file}."
+  )
+  add_custom_target(${bench_name} DEPENDS ${bench_out_file})
+  add_dependencies(${perf_suite} ${bench_name})
+
+  # Report in the console.
+  add_test(NAME ${bench_title}
+    COMMAND sh -c "${bench_command}"
+  )
+  set_tests_properties(${bench_title} PROPERTIES
+    ENVIRONMENT "LUA_PATH=${LUA_PATH}"
+    LABELS ${perf_suite}
+    DEPENDS luajit-main
+  )
+  unset(input_file)
+endmacro()
+
+add_subdirectory(LuaJIT-benches)
+
+add_custom_target(${PROJECT_NAME}-perf
+  DEPENDS LuaJIT-benches
+)
+
+add_custom_target(${PROJECT_NAME}-perf-console
+  DEPENDS LuaJIT-benches-console
+)
diff --git a/perf/LuaJIT-benches/CMakeLists.txt b/perf/LuaJIT-benches/CMakeLists.txt
new file mode 100644
index 00000000..d9909f36
--- /dev/null
+++ b/perf/LuaJIT-benches/CMakeLists.txt
@@ -0,0 +1,52 @@
+set(PERF_SUITE_NAME LuaJIT-benches)
+set(LUA_BENCH_SUFFIX .lua)
+
+AddBenchTarget(${PERF_SUITE_NAME})
+
+# Input for the k-nucleotide and revcomp benchmarks.
+set(FASTA_NAME ${CMAKE_CURRENT_BINARY_DIR}/FASTA_5000000)
+add_custom_target(FASTA_5000000
+  COMMAND ${LUAJIT_BINARY}
+    ${CMAKE_CURRENT_SOURCE_DIR}/libs/fasta.lua 5000000 > ${FASTA_NAME}
+  BYPRODUCTS ${FASTA_NAME}
+  DEPENDS luajit-main
+  COMMENT "Generate ${FASTA_NAME}."
+)
+
+make_lua_path(LUA_PATH
+  PATHS
+    ${LUA_PATH_BENCH_BASE}
+    ${CMAKE_CURRENT_SOURCE_DIR}/libs/?.lua
+)
+
+# Input for the <sum-file.lua> benchmark.
+set(SUM_NAME ${CMAKE_CURRENT_BINARY_DIR}/SUMCOL_5000.txt)
+# Remove possibly existing file.
+file(REMOVE ${SUM_NAME})
+
+set(SUMCOL_FILE ${CMAKE_CURRENT_SOURCE_DIR}/SUMCOL_1.txt)
+file(READ ${SUMCOL_FILE} SUMCOL_CONTENT)
+foreach(_unused RANGE 4999)
+  file(APPEND ${SUM_NAME} "${SUMCOL_CONTENT}")
+endforeach()
+
+file(GLOB benches "${CMAKE_CURRENT_SOURCE_DIR}/*${LUA_BENCH_SUFFIX}")
+foreach(bench_path ${benches})
+  file(RELATIVE_PATH bench_name ${CMAKE_CURRENT_SOURCE_DIR} ${bench_path})
+  set(bench_title "perf/${PERF_SUITE_NAME}/${bench_name}")
+  if(bench_name MATCHES "k-nucleotide" OR bench_name MATCHES "revcomp")
+    AddBench(${bench_name}
+      ${bench_path} ${PERF_SUITE_NAME} "${LUA_PATH}" ${FASTA_NAME}
+    )
+    add_dependencies(${bench_name} FASTA_5000000)
+  elseif(bench_name MATCHES "sum-file")
+    AddBench(${bench_name}
+      ${bench_path} ${PERF_SUITE_NAME} "${LUA_PATH}" ${SUM_NAME}
+    )
+  else()
+    AddBench(${bench_name} ${bench_path} ${PERF_SUITE_NAME} "${LUA_PATH}")
+  endif()
+endforeach()
+
+# We need to generate the file before we run tests.
+add_dependencies(${PERF_SUITE_NAME}-console FASTA_5000000)
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 38/41] perf: add aggregator helper for bench statistics
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (36 preceding siblings ...)
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 37/41] perf: add CMake infrastructure Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 39/41] perf: add a script for the environment setup Sergey Kaplun via Tarantool-patches
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adds a helper script to aggregate the benchmark results from
JSON files into the InfluxDB line protocol format [1].

All JSON files from each suite in the <perf/output> directory are
treated as benchmark results and aggregated into the
<perf/output/summary.txt> file, which can be posted to InfluxDB. The
results are aggregated via the new target LuaJIT-perf-aggregate.

[1]: https://docs.influxdata.com/influxdb/v2/reference/syntax/line-protocol/
---
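Note on the output format: each benchmark becomes one line of the form
`<measurement>,<tag_set> <field_set> <timestamp>`. The sketch below
builds such a line. Unlike the helper in this patch, it sorts the keys
so that the output is deterministic (`pairs()` gives no ordering
guarantee). The sorting, the explicit `timestamp` parameter and all
sample values are assumptions made for the illustration.

```lua
-- Join a table into "k1=v1,k2=v2" with keys in sorted order.
local function influx_kv(tab)
  local keys = {}
  for k in pairs(tab) do keys[#keys + 1] = k end
  table.sort(keys)
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = ('%s=%s'):format(k, tostring(tab[k]))
  end
  return table.concat(parts, ',')
end

-- Compose one line: measurement, tags, fields, timestamp.
local function influx_line(measurement, tags, fields, timestamp)
  return ('%s,%s %s %d'):format(measurement, influx_kv(tags),
                                influx_kv(fields), timestamp)
end

-- Sample values only; string fields are quoted as in the patch.
local line = influx_line('series',
  {branch = 'master', suite = 'LuaJIT-benches', arch = 'x64'},
  {commit = '"0123abc"', iterations = 3},
  1700000000)
print(line)
```

Sorting is not required for correctness, since the key order within the
tag set and the field set does not change the meaning of a line. It only
makes two runs over the same data produce identical files.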
 perf/CMakeLists.txt        |  13 ++++
 perf/helpers/aggregate.lua | 124 +++++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)
 create mode 100644 perf/helpers/aggregate.lua

diff --git a/perf/CMakeLists.txt b/perf/CMakeLists.txt
index cc3c312f..68e561fd 100644
--- a/perf/CMakeLists.txt
+++ b/perf/CMakeLists.txt
@@ -97,3 +97,16 @@ add_custom_target(${PROJECT_NAME}-perf
 add_custom_target(${PROJECT_NAME}-perf-console
   DEPENDS LuaJIT-benches-console
 )
+
+set(PERF_SUMMARY ${PERF_OUTPUT_DIR}/summary.txt)
+add_custom_target(${PROJECT_NAME}-perf-aggregate
+  BYPRODUCTS ${PERF_SUMMARY}
+  COMMENT "Aggregate performance test results into ${PERF_SUMMARY}"
+  COMMAND ${CMAKE_COMMAND} -E env
+    LUA_CPATH="${LUA_CPATH}"
+      ${LUAJIT_BINARY} ${CMAKE_CURRENT_SOURCE_DIR}/helpers/aggregate.lua
+        ${PERF_SUMMARY}
+        ${PERF_OUTPUT_DIR}
+  WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+  DEPENDS luajit-main
+)
diff --git a/perf/helpers/aggregate.lua b/perf/helpers/aggregate.lua
new file mode 100644
index 00000000..12a8ab89
--- /dev/null
+++ b/perf/helpers/aggregate.lua
@@ -0,0 +1,124 @@
+local json = require('cjson')
+
+-- File to aggregate the benchmark results from JSON files to the
+-- format parsable by the InfluxDB line protocol [1]:
+-- <measurement>,<tag_set> <field_set> <timestamp>
+--
+-- <tag_set> and <field_set> have the following format:
+-- <key1>=<value1>,<key2>=<value2>
+--
+-- The reported tag set is a set of values that can be used for
+-- filtering data (i.e., branch or benchmark name).
+--
+-- luacheck: push no max comment line length
+--
+-- [1]: https://docs.influxdata.com/influxdb/v2/reference/syntax/line-protocol/
+--
+-- luacheck: pop
+
+local output = assert(arg[1], 'Output file is required as the first argument')
+local input_dir = arg[2] or '.'
+
+local out_fh = assert(io.open(output, 'w+'))
+
+local function exec(cmd)
+  return io.popen(cmd):read('*all'):gsub('%s+$', '')
+end
+
+local commit = os.getenv('PERF_COMMIT') or exec('git rev-parse --short HEAD')
+assert(commit, 'can not determine the commit')
+
+local branch = os.getenv('PERF_BRANCH') or
+  exec('git rev-parse --abbrev-ref HEAD')
+assert(branch, 'can not determine the branch')
+
+-- Not very robust, but OK for our needs.
+local function listdir(path)
+  local handle = io.popen('ls -1 ' .. path)
+
+  local files = {}
+  for file in handle:lines() do
+    table.insert(files, file)
+  end
+
+  return files
+end
+
+local tag_set = {branch = branch}
+
+local function table_plain_copy(src)
+  local dst = {}
+  for k, v in pairs(src) do
+    dst[k] = v
+  end
+  return dst
+end
+
+local function read_all(file)
+  local fh = assert(io.open(file, 'rb'))
+  local content = fh:read('*all')
+  fh:close()
+  return content
+end
+
+local REPORTED_FIELDS = {
+  'cpu_time',
+  'items_per_second',
+  'iterations',
+  'real_time',
+}
+
+local function influx_kv(tab)
+  local kv_string = {}
+  for k, v in pairs(tab) do
+    table.insert(kv_string, ('%s=%s'):format(k, v))
+  end
+  return table.concat(kv_string, ',')
+end
+
+local time = os.time()
+local function influx_line(measurement, tags, fields)
+  return ('%s,%s %s %d\n'):format(measurement, influx_kv(tags),
+          influx_kv(fields), time)
+end
+
+for _, suite_name in pairs(listdir(input_dir)) do
+  -- May list the report file, but will be ignored by the
+  -- condition below.
+  local suite_dir = ('%s/%s'):format(input_dir, suite_name)
+  for _, file in pairs(listdir(suite_dir)) do
+    -- Skip files in which we are not interested.
+    if not file:match('%.json$') then goto continue end
+
+    local data = read_all(('%s/%s'):format(suite_dir, file))
+    local bench_name = file:match('([^/]+)%.json')
+    local bench_data = json.decode(data)
+    local benchmarks = bench_data.benchmarks
+    local arch = bench_data.context.arch
+    local gc64 = bench_data.context.gc64
+    local jit = bench_data.context.jit
+
+    for _, bench in ipairs(benchmarks) do
+      local full_tag_set = table_plain_copy(tag_set)
+      full_tag_set.name = bench.name
+      full_tag_set.suite = suite_name
+      full_tag_set.arch = arch
+      full_tag_set.gc64 = gc64
+      full_tag_set.jit = jit
+
+      -- Save the commit as a field, since we don't want to filter
+      -- benchmarks by the commit (one point of data).
+      local field_set = {commit = ('"%s"'):format(commit)}
+
+      for _, field in ipairs(REPORTED_FIELDS) do
+        field_set[field] = bench[field]
+      end
+
+      local line = influx_line(bench_name, full_tag_set, field_set)
+      out_fh:write(line)
+    end
+    ::continue::
+  end
+end
+
+out_fh:close()
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 39/41] perf: add a script for the environment setup
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (37 preceding siblings ...)
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 38/41] perf: add aggregator helper for bench statistics Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 40/41] perf: provide CMake option to setup the benchmark Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 41/41] ci: introduce the performance workflow Sergey Kaplun via Tarantool-patches
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adds a script for setting up the environment before running
performance tests. Most of the settings are taken from the Tarantool
wiki page dedicated to benchmarking [1].

[1]: https://github.com/tarantool/tarantool/wiki/Benchmarking
---
 perf/helpers/setup_env.sh | 135 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100755 perf/helpers/setup_env.sh

diff --git a/perf/helpers/setup_env.sh b/perf/helpers/setup_env.sh
new file mode 100755
index 00000000..043d3c88
--- /dev/null
+++ b/perf/helpers/setup_env.sh
@@ -0,0 +1,135 @@
+#!/bin/sh
+
+# The script sets up a Linux operating system before running
+# LuaJIT benchmarks. See more details in [1].
+#
+# [1]: https://github.com/tarantool/tarantool/wiki/Benchmarking
+
+set -eu
+
+uid=$(id -u)
+if [ "$uid" -ne 0 ]; then
+  echo "Please run as root."
+  exit 1
+fi
+
+###
+# Helpers.
+###
+
+cpu_vendor="unknown"
+cpuinfo_vendor=$(awk '/vendor_id/{ print $3; exit }' < /proc/cpuinfo)
+if [ "$cpuinfo_vendor" = "GenuineIntel" ]; then
+  cpu_vendor="intel"
+elif [ "$cpuinfo_vendor" = "AuthenticAMD" ]; then
+  cpu_vendor="amd"
+else
+  echo "Unknown CPU vendor '$cpuinfo_vendor'"
+  exit 1
+fi
+
+FAILURE_MSG="WARNING"
+SUCCESS_MSG="CHECKED"
+SKIPPED_MSG="SKIPPED"
+
+set_kernel_setting() {
+  desc_msg="$1"
+  file_path="$2"
+  value="$3"
+
+  if [ -f "$file_path" ]; then
+    sh -c "echo $value > $file_path" && status="$SUCCESS_MSG" || status="$FAILURE_MSG"
+  else
+    status="$SKIPPED_MSG"
+  fi
+  echo "$desc_msg $status"
+}
+
+kernel_setting_is_nonzero() {
+  desc_msg="$1"
+  file_path="$2"
+  hint_msg="$3"
+
+  if [ -f "$file_path" ]; then
+    value=$(cat "$file_path")
+    if [ -n "$value" ]; then
+      status="$SUCCESS_MSG"
+    else
+      status="$FAILURE_MSG (hint: $hint_msg)"
+    fi
+  else
+    status="$SKIPPED_MSG"
+  fi
+  echo "$desc_msg $status"
+}
+
+###
+# Kernel command line parameters.
+###
+
+desc_msg="Disable AMD SMT or Intel Hyperthreading "
+sysfs_path="/sys/devices/system/cpu/smt/active"
+if [ -f "$sysfs_path" ]; then
+  is_set=$(cat $sysfs_path)
+  err_msg="$FAILURE_MSG (hint: set 'nosmt' kernel parameter)"
+  [ "$is_set" = 0 ] && status="$SUCCESS_MSG" || status="$err_msg"
+else
+  status="$SKIPPED_MSG"
+fi
+echo "$desc_msg $status"
+
+kernel_setting_is_nonzero \
+  "Isolate CPUs for benchmarking" \
+  "/sys/devices/system/cpu/isolated" \
+  "set 'isolcpus' kernel parameter"
+
+kernel_setting_is_nonzero \
+  "Offload interrupts from the isolated CPUs" \
+  "/proc/irq/default_smp_affinity" \
+  "set 'irqaffinity' kernel parameter"
+
+kernel_setting_is_nonzero \
+  "Disable scheduling on single-task isolated CPUs" \
+  "/sys/devices/system/cpu/nohz_full" \
+  "set 'nohz_full' kernel parameter"
+
+set_kernel_setting \
+  "Disable transparent huge pages" \
+  "/sys/kernel/mm/transparent_hugepage/enabled" \
+  "never"
+
+set_kernel_setting \
+  "Disable direct compaction of transparent huge pages" \
+  "/sys/kernel/mm/transparent_hugepage/defrag" \
+  "never"
+
+# Disable ASLR for the repeatable LuaJIT behaviour.
+set_kernel_setting \
+  "Disable ASLR" \
+  "/proc/sys/kernel/randomize_va_space" \
+  "0"
+
+###
+# System tuning.
+###
+
+if [ "$cpu_vendor" = "amd" ]; then
+  sysfs_path="/sys/devices/system/cpu/cpufreq/boost"
+  value=0
+elif [ "$cpu_vendor" = "intel" ]; then
+  sysfs_path="/sys/devices/system/cpu/intel_pstate/no_turbo"
+  value=1
+fi
+set_kernel_setting \
+  "Disable TurboBoost" \
+  "$sysfs_path" \
+  "$value"
+
+ncpu=$(getconf _NPROCESSORS_ONLN)
+for cpu_id in $(seq 0 1 $((ncpu-1))); do
+  sysfs_path_cpu="/sys/devices/system/cpu/cpu$cpu_id/cpufreq/scaling_governor"
+  set_kernel_setting \
+    "Stabilize the frequency of CPU $cpu_id" \
+    "$sysfs_path_cpu" \
+    "performance"
+done
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 40/41] perf: provide CMake option to setup the benchmark
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (38 preceding siblings ...)
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 39/41] perf: add a script for the environment setup Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 41/41] ci: introduce the performance workflow Sergey Kaplun via Tarantool-patches
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch introduces the `LUAJIT_BENCH_INIT` option, which specifies a
shell command to be prepended to each benchmark run. It may be useful
for pinning the benchmark to a CPU with taskset, etc.
---
 perf/CMakeLists.txt | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/perf/CMakeLists.txt b/perf/CMakeLists.txt
index 68e561fd..c315597f 100644
--- a/perf/CMakeLists.txt
+++ b/perf/CMakeLists.txt
@@ -7,6 +7,13 @@ if(CMAKE_BUILD_TYPE STREQUAL "Debug")
                   "Timings may be affected.")
 endif()
 
+# The shell command needs to be run before benchmarks are started.
+if(LUAJIT_BENCH_INIT)
+  message(STATUS
+    "The following command will run before benchmarks: '${LUAJIT_BENCH_INIT}'."
+  )
+endif()
+
 set(PERF_OUTPUT_DIR ${PROJECT_BINARY_DIR}/perf/output)
 file(MAKE_DIRECTORY ${PERF_OUTPUT_DIR})
 
@@ -51,7 +58,7 @@ macro(AddBench bench_name bench_path perf_suite LUA_PATH)
   set(bench_out_file
     ${PERF_OUTPUT_DIR}/${perf_suite}/${bench_name_stripped}.json
   )
-  set(bench_command "${LUAJIT_BINARY} ${bench_path}")
+  set(bench_command "${LUAJIT_BENCH_INIT} ${LUAJIT_BINARY} ${bench_path}")
   if(${ARGC} GREATER 4)
     set(input_file ${ARGV4})
     set(bench_command "${bench_command} < ${input_file}")
-- 
2.51.0



* [Tarantool-patches] [PATCH v1 luajit 41/41] ci: introduce the performance workflow
  2025-10-24 10:50 [Tarantool-patches] [PATCH v1 luajit 00/41] LuaJIT performance testing Sergey Kaplun via Tarantool-patches
                   ` (39 preceding siblings ...)
  2025-10-24 11:00 ` [Tarantool-patches] [PATCH v1 luajit 40/41] perf: provide CMake option to setup the benchmark Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:00 ` Sergey Kaplun via Tarantool-patches
  40 siblings, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:00 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

This patch adds the workflow to run benchmarks from various suites,
aggregate their results, and send statistics to the InfluxDB to be
processed later.

The workflow contains a matrix to measure GC64 and non-GC64 modes with
JIT enabled/disabled for the x64 architecture.
---
 .github/actions/setup-performance/README.md  |  10 ++
 .github/actions/setup-performance/action.yml |  18 +++
 .github/workflows/performance.yml            | 110 +++++++++++++++++++
 3 files changed, 138 insertions(+)
 create mode 100644 .github/actions/setup-performance/README.md
 create mode 100644 .github/actions/setup-performance/action.yml
 create mode 100644 .github/workflows/performance.yml
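
Note: the workflow passes a hexadecimal CPU affinity mask to taskset.
A minimal sketch for decoding such a mask into CPU indices follows;
`cpus_in_mask` is a hypothetical helper, not something the patch
provides. It may be useful for checking which CPUs a mask selects,
since the comment in the workflow mentions 0xef while the configure
step uses 0xfe.

```shell
#!/bin/sh
# Print the CPU indices selected by a taskset-style hexadecimal mask.
# cpus_in_mask is a hypothetical helper used only for illustration.
cpus_in_mask() {
  mask=$(($1))
  cpu=0
  list=""
  while [ "$mask" -gt 0 ]; do
    # Bit N of the mask corresponds to CPU N.
    if [ $((mask & 1)) -eq 1 ]; then
      list="${list:+$list }$cpu"
    fi
    mask=$((mask >> 1))
    cpu=$((cpu + 1))
  done
  echo "$list"
}

cpus_in_mask 0xfe   # the mask used in the configure step
cpus_in_mask 0xef   # the mask mentioned in the comment
```

The first call lists CPUs 1 to 7 (CPU 0 is excluded), the second one
lists CPUs 0 to 3 and 5 to 7 (CPU 4 is excluded).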

diff --git a/.github/actions/setup-performance/README.md b/.github/actions/setup-performance/README.md
new file mode 100644
index 00000000..4c4bbdab
--- /dev/null
+++ b/.github/actions/setup-performance/README.md
@@ -0,0 +1,10 @@
+# Setup performance
+
+This action sets up the performance environment on Linux runners.
+
+## How to use the GitHub Action from a GitHub workflow
+
+Add the following code to the running steps before LuaJIT configuration:
+```
+- uses: ./.github/actions/setup-performance
+```
diff --git a/.github/actions/setup-performance/action.yml b/.github/actions/setup-performance/action.yml
new file mode 100644
index 00000000..24d07440
--- /dev/null
+++ b/.github/actions/setup-performance/action.yml
@@ -0,0 +1,18 @@
+name: Setup performance
+description: The Linux machine setup for running LuaJIT benchmarks
+runs:
+  using: composite
+  steps:
+    - name: Setup CI environment (Linux)
+      uses: ./.github/actions/setup-linux
+    - name: Install dependencies for the LuaJIT benchmarks
+      run: |
+        apt -y update
+        apt install -y luarocks curl
+      shell: bash
+    - name: Install Lua modules
+      run: luarocks install lua-cjson
+      shell: bash
+    - name: Run script to setup Linux environment
+      run: sh ./perf/helpers/setup_env.sh
+      shell: bash
diff --git a/.github/workflows/performance.yml b/.github/workflows/performance.yml
new file mode 100644
index 00000000..bfb6be97
--- /dev/null
+++ b/.github/workflows/performance.yml
@@ -0,0 +1,110 @@
+name: Performance
+
+on:
+  push:
+    branches-ignore:
+      - '**-noperf'
+      - 'tarantool/release/**'
+      - 'upstream-**'
+    tags-ignore:
+      - '**'
+  schedule:
+    # Once a day at 03:00 to avoid clashing with runs for the
+    # Tarantool benchmarks at midnight.
+    - cron: '0 3 * * *'
+
+concurrency:
+  # An update of a developer branch cancels the previously
+  # scheduled workflow run for this branch. However, the default
+  # branch, and long-term branch (tarantool/release/2.11,
+  # tarantool/release/2.10, etc) workflow runs are never canceled.
+  #
+  # We use a trick here: define the concurrency group as 'workflow
+  # run ID' + 'workflow run attempt' because it is a unique
+  # combination for any run. So it effectively discards grouping.
+  #
+  # XXX: we cannot use `github.sha` as a unique identifier because
+  # pushing a tag may cancel a run that works on a branch push
+  # event.
+  group: ${{ startsWith(github.ref, 'refs/heads/tarantool/')
+    && format('{0}-{1}', github.run_id, github.run_attempt)
+    || format('{0}-{1}', github.workflow, github.ref) }}
+  cancel-in-progress: true
+
+jobs:
+  performance-luajit:
+    # The 'performance' label _must_ be set only for the single
+    # runner to guarantee that results are not dependent on the
+    # machine.
+    runs-on:
+      - self-hosted
+      - Linux
+      - x86_64
+      - 'performance'
+
+    env:
+      PERF_BRANCH: ${{ github.ref_name }}
+      PERF_COMMIT: ${{ github.sha }}
+
+    strategy:
+      fail-fast: false
+      matrix:
+        GC64: [ON, OFF]
+        JOFF: [ON, OFF]
+      # Run each job sequentially.
+      max-parallel: 1
+    name: >
+      LuaJIT
+      GC64:${{ matrix.GC64 }}
+      JOFF:${{ matrix.JOFF }}
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          submodules: recursive
+      - name: setup performance environment
+        uses: ./.github/actions/setup-performance
+      - name: configure
+        # taskset alone pins all the process threads to a
+        # single (random) isolated CPU, see
+        # https://bugzilla.kernel.org/show_bug.cgi?id=116701.
+        # The workaround is to run the isolated task under the
+        # realtime scheduler using chrt, e.g.:
+        # sudo taskset 0xef chrt 50.
+        # But this makes the process use the non-standard,
+        # real-time round-robin scheduling mechanism.
+        run: >
+          cmake -S . -B ${{ env.BUILDDIR }}
+          -DCMAKE_BUILD_TYPE=RelWithDebInfo
+          -DLUAJIT_ENABLE_PERF=ON
+          -DLUAJIT_BENCH_INIT="taskset 0xfe chrt 50"
+          -DLUAJIT_DISABLE_JIT=${{ matrix.JOFF }}
+          -DLUAJIT_ENABLE_GC64=${{ matrix.GC64 }}
+      - name: build
+        run: cmake --build . --parallel
+        working-directory: ${{ env.BUILDDIR }}
+      - name: perf
+        run: make LuaJIT-perf
+        working-directory: ${{ env.BUILDDIR }}
+      - name: aggregate benchmark results
+        run: make LuaJIT-perf-aggregate
+        working-directory: ${{ env.BUILDDIR }}
+      - name: send statistics to InfluxDB
+        # --silent -o /dev/null: Prevent dumping any reply part
+        # in the output in case of an error.
+        # --fail: Exit with the 22 error code if status >= 400.
+        # --write-out: See the reason for the failure, if any.
+        # --retry, --retry-delay: Retry sending the results
+        # several times to avoid losing them after such a long
+        # job.
+        run: >
+          curl --request POST
+          "${{ secrets.INFLUXDB_URL }}/api/v2/write?org=tarantool&bucket=luajit-performance&precision=s"
+          --write-out "%{http_code}"
+          --retry 5
+          --retry-delay 5
+          --connect-timeout 120
+          --fail --silent -o /dev/null
+          --header "Authorization: Token ${{ secrets.INFLUXDB_TOKEN }}"
+          --data-binary @./perf/output/summary.txt
+        working-directory: ${{ env.BUILDDIR }}
-- 
2.51.0



* Re: [Tarantool-patches] [PATCH v1 luajit 30/41] perf: adjust scimark-lu in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 30/41] perf: adjust scimark-lu " Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:01   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:01 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

The second copy of the letter was sent by mistake. Feel free to ignore
it.

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH v1 luajit 31/41] perf: add scimark-mc in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 31/41] perf: add scimark-mc " Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:02   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:02 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

The second copy of the letter was sent by mistake. Feel free to ignore
it.

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH v1 luajit 32/41] perf: adjust scimark-sor in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 32/41] perf: adjust scimark-sor " Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:02   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:02 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

The second copy of the letter was sent by mistake. Feel free to ignore
it.

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH v1 luajit 33/41] perf: adjust scimark-sparse in LuaJIT-benches
  2025-10-24 10:50 ` [Tarantool-patches] [PATCH v1 luajit 33/41] perf: adjust scimark-sparse " Sergey Kaplun via Tarantool-patches
  2025-10-24 11:00   ` Sergey Kaplun via Tarantool-patches
@ 2025-10-24 11:03   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 50+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-10-24 11:03 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

The second copy of the letter was sent by mistake. Feel free to ignore
it.

-- 
Best regards,
Sergey Kaplun

