Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua
@ 2019-05-15 13:33 Vladislav Shpilevoy
  2019-05-18 18:41 ` [tarantool-patches] " Vladislav Shpilevoy
  2019-05-18 19:12 ` Konstantin Osipov
  0 siblings, 2 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2019-05-15 13:33 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Static allocator gives memory blocks from cyclic BSS memory
block of 3 pages 4096 bytes each. It is much faster than
malloc, when a temporary buffer is needed.

This commit exposes the allocator to Lua, which suffers from
lack of ability to pass values by pointers into FFI functions,
nor has a stack to allocate small buffers like 'char[256]'.
Also these allocations complicate and slow down GC job.

Static allocator solves most of these problems provindg a way to
swiftly allocate temporary memory blocks. Besides, for the most
annoying problem with ffi.new(<type>[1]) the buffer module now
exposes a statically allocated array of scalar values able to fit
any 'int', 'char', 'size_t' ... and ready at hand.

A simple micro benchmark showed, that ffi.new() vs
buffer.static_alloc() is ~100 times slower, even on small
allocations of 1Kb, and ~1.5 times slower on tiny allocations
of 10 bytes. The results do not account GC. It is remarkable,
buffer.static_alloc() speed does not depend on size, while
ffi.new() strongly depends.

The commit is motivated by one another place where
ffi.new('int[1]') appeared, in SWIM module, to obtain payload
size as an out parameter of a C function.
---
Branch: https://github.com/tarantool/tarantool/tree/gerold103/static-allocator-lua

 extra/exports            |  2 ++
 src/CMakeLists.txt       |  1 +
 src/box/lua/schema.lua   |  3 +-
 src/lua/buffer.c         | 42 +++++++++++++++++++++++++++
 src/lua/buffer.lua       | 61 ++++++++++++++++++++++++++++++++++++++++
 src/lua/crypto.lua       | 22 +++++++--------
 src/lua/digest.lua       |  7 +++--
 src/lua/fio.lua          |  3 +-
 src/lua/init.c           |  2 +-
 src/lua/msgpackffi.lua   | 50 +++++++++++++-------------------
 src/lua/socket.lua       | 36 ++++++++++++++----------
 src/lua/string.lua       | 10 ++++---
 src/lua/uri.lua          |  7 +++--
 src/lua/uuid.lua         |  9 +++---
 test/app/buffer.result   | 53 ++++++++++++++++++++++++++++++++++
 test/app/buffer.test.lua | 23 +++++++++++++++
 16 files changed, 256 insertions(+), 75 deletions(-)
 create mode 100644 src/lua/buffer.c
 create mode 100644 test/app/buffer.result
 create mode 100644 test/app/buffer.test.lua

diff --git a/extra/exports b/extra/exports
index 4f41a17b3..5375a01e4 100644
--- a/extra/exports
+++ b/extra/exports
@@ -85,6 +85,8 @@ tnt_EVP_MD_CTX_free
 tnt_HMAC_CTX_new
 tnt_HMAC_CTX_free
 
+lua_static_aligned_alloc
+
 # Module API
 
 _say
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index a6a18142b..492b8712e 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -116,6 +116,7 @@ set (server_sources
      lua/utf8.c
      lua/info.c
      lua/string.c
+     lua/buffer.c
      ${lua_sources}
      ${PROJECT_SOURCE_DIR}/third_party/lua-yaml/lyaml.cc
      ${PROJECT_SOURCE_DIR}/third_party/lua-yaml/b64.c
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index f31cf7f2c..fa6c7c9a4 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -7,6 +7,7 @@ local fun = require('fun')
 local log = require('log')
 local fio = require('fio')
 local json = require('json')
+local static_alloc = require('buffer').static_alloc
 local session = box.session
 local internal = require('box.internal')
 local function setmap(table)
@@ -2103,7 +2104,7 @@ box.schema.user = {}
 
 box.schema.user.password = function(password)
     local BUF_SIZE = 128
-    local buf = ffi.new("char[?]", BUF_SIZE)
+    local buf = static_alloc('char', BUF_SIZE)
     builtin.password_prepare(password, #password, buf, BUF_SIZE)
     return ffi.string(buf)
 end
diff --git a/src/lua/buffer.c b/src/lua/buffer.c
new file mode 100644
index 000000000..5fd349261
--- /dev/null
+++ b/src/lua/buffer.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include "small/static.h"
+
+/**
+ * Static inline functions like in static.h can't be exported.
+ * Here they are given a symbol to export.
+ */
+
+void *
+lua_static_aligned_alloc(size_t size, size_t alignment)
+{
+	return static_aligned_alloc(size, alignment);
+}
diff --git a/src/lua/buffer.lua b/src/lua/buffer.lua
index a72d8d1f9..7fd9f46ed 100644
--- a/src/lua/buffer.lua
+++ b/src/lua/buffer.lua
@@ -33,6 +33,37 @@ ibuf_reinit(struct ibuf *ibuf);
 
 void *
 ibuf_reserve_slow(struct ibuf *ibuf, size_t size);
+
+void *
+lua_static_aligned_alloc(size_t size, size_t alignment);
+
+/**
+ * Scalar is a buffer to use with FFI functions, which usually
+ * operate with pointers to scalar values like int, char, size_t,
+ * void *. To avoid doing 'ffi.new(<type>[1])' on each such FFI
+ * function invocation, a module can use one of attributes of the
+ * scalar union.
+ *
+ * Naming policy of the attributes is easy to remember:
+ * 'a' for array type + type name first letters + 'p' for pointer.
+ *
+ * For example:
+ * - int[1] - <a>rray of <i>nt - ai;
+ * - const unsigned char *[1] -
+ *       <a>rray of <c>onst <u>nsigned <c>har <p> pointer - acucp.
+ */
+union scalar {
+    size_t as[1];
+    void *ap[1];
+    int ai[1];
+    char ac[1];
+    const unsigned char *acucp[1];
+    unsigned long aul[1];
+    uint16_t u16;
+    uint32_t u32;
+    uint64_t u64;
+    int64_t i64;
+};
 ]]
 
 local builtin = ffi.C
@@ -174,8 +205,38 @@ local function ibuf_new(arg, arg2)
     errorf('Usage: ibuf([size])')
 end
 
+--
+-- Allocate a chunk of static BSS memory, or use ordinal ffi.new,
+-- when too big size.
+-- @param type C type - a struct, a basic type, etc. Should be a
+--        string: 'int', 'char *', 'struct tuple', etc.
+-- @param size Optional argument, number of elements of @a type to
+--        allocate. 1 by default.
+-- @return Cdata pointer to @a type.
+--
+local function static_alloc(type, size)
+    size = size or 1
+    local bsize = size * ffi.sizeof(type)
+    local ptr = builtin.lua_static_aligned_alloc(bsize, ffi.alignof(type))
+    if ptr ~= nil then
+        return ffi.cast(type..' *', ptr)
+    end
+    return ffi.new(type..'[?]', size)
+end
+
+--
+-- Sometimes it is wanted to use several temporary scalar cdata
+-- values. Then one union object is not enough - its attributes
+-- share memory.
+--
+local scalar_array = ffi.new('union scalar[?]', 2)
+
 return {
     ibuf = ibuf_new;
     IBUF_SHARED = ffi.C.tarantool_lua_ibuf;
     READAHEAD = READAHEAD;
+    static_alloc = static_alloc,
+    scalar_array = scalar_array,
+    -- Fast access when only one variable is needed.
+    scalar = scalar_array[0],
 }
diff --git a/src/lua/crypto.lua b/src/lua/crypto.lua
index e76370517..7db6afac9 100644
--- a/src/lua/crypto.lua
+++ b/src/lua/crypto.lua
@@ -2,6 +2,7 @@
 
 local ffi = require('ffi')
 local buffer = require('buffer')
+local scalar = buffer.scalar
 
 ffi.cdef[[
     int tnt_openssl_init(void);
@@ -84,7 +85,6 @@ local function digest_new(digest)
         digest = digest,
         buf = buffer.ibuf(64),
         initialized = false,
-        outl = ffi.new('int[1]')
     }, digest_mt)
     self:init()
     return self
@@ -114,10 +114,10 @@ local function digest_final(self)
         return error('Digest not initialized')
     end
     self.initialized = false
-    if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, self.outl) ~= 1 then
+    if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, scalar.ai) ~= 1 then
         return error('Can\'t finalize digest: ' .. openssl_err_str())
     end
-    return ffi.string(self.buf.wpos, self.outl[0])
+    return ffi.string(self.buf.wpos, scalar.ai[0])
 end
 
 local function digest_free(self)
@@ -156,9 +156,7 @@ local function hmac_new(digest, key)
     local self = setmetatable({
         ctx = ctx,
         digest = digest,
-        buf = buffer.ibuf(64),
         initialized = false,
-        outl = ffi.new('int[1]')
     }, hmac_mt)
     self:init(key)
     return self
@@ -188,10 +186,11 @@ local function hmac_final(self)
         return error('HMAC not initialized')
     end
     self.initialized = false
-    if ffi.C.HMAC_Final(self.ctx, self.buf.wpos, self.outl) ~= 1 then
+    local buf = buffer.static_alloc('char', 64)
+    if ffi.C.HMAC_Final(self.ctx, buf, scalar.ai) ~= 1 then
         return error('Can\'t finalize HMAC: ' .. openssl_err_str())
     end
-    return ffi.string(self.buf.wpos, self.outl[0])
+    return ffi.string(buf, scalar.ai[0])
 end
 
 local function hmac_free(self)
@@ -254,7 +253,6 @@ local function cipher_new(cipher, key, iv, direction)
         direction = direction,
         buf = buffer.ibuf(),
         initialized = false,
-        outl = ffi.new('int[1]')
     }, cipher_mt)
     self:init(key, iv)
     return self
@@ -279,10 +277,10 @@ local function cipher_update(self, input)
         error("Usage: cipher:update(string)")
     end
     local wpos = self.buf:reserve(input:len() + self.block_size - 1)
-    if ffi.C.EVP_CipherUpdate(self.ctx, wpos, self.outl, input, input:len()) ~= 1 then
+    if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then
         return error('Can\'t update cipher:' .. openssl_err_str())
     end
-    return ffi.string(wpos, self.outl[0])
+    return ffi.string(wpos, scalar.ai[0])
 end
 
 local function cipher_final(self)
@@ -291,11 +289,11 @@ local function cipher_final(self)
     end
     self.initialized = false
     local wpos = self.buf:reserve(self.block_size - 1)
-    if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, self.outl) ~= 1 then
+    if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, scalar.ai) ~= 1 then
         return error('Can\'t finalize cipher:' .. openssl_err_str())
     end
     self.initialized = false
-    return ffi.string(wpos, self.outl[0])
+    return ffi.string(wpos, scalar.ai[0])
 end
 
 local function cipher_free(self)
diff --git a/src/lua/digest.lua b/src/lua/digest.lua
index 8f199c0af..6ed91cfa2 100644
--- a/src/lua/digest.lua
+++ b/src/lua/digest.lua
@@ -3,6 +3,7 @@
 local ffi = require('ffi')
 local crypto = require('crypto')
 local bit = require('bit')
+local static_alloc = require('buffer').static_alloc
 
 ffi.cdef[[
     /* internal implementation */
@@ -179,7 +180,7 @@ local m = {
         end
         local blen = #bin
         local slen = ffi.C.base64_bufsize(blen, mask)
-        local str  = ffi.new('char[?]', slen)
+        local str  = static_alloc('char', slen)
         local len = ffi.C.base64_encode(bin, blen, str, slen, mask)
         return ffi.string(str, len)
     end,
@@ -190,7 +191,7 @@ local m = {
         end
         local slen = #str
         local blen = math.ceil(slen * 3 / 4)
-        local bin  = ffi.new('char[?]', blen)
+        local bin  = static_alloc('char', blen)
         local len = ffi.C.base64_decode(str, slen, bin, blen)
         return ffi.string(bin, len)
     end,
@@ -228,7 +229,7 @@ local m = {
         if n == nil then
             error('Usage: digest.urandom(len)')
         end
-        local buf = ffi.new('char[?]', n)
+        local buf = static_alloc('char', n)
         ffi.C.random_bytes(buf, n)
         return ffi.string(buf, n)
     end,
diff --git a/src/lua/fio.lua b/src/lua/fio.lua
index 38664a556..fce79c277 100644
--- a/src/lua/fio.lua
+++ b/src/lua/fio.lua
@@ -275,8 +275,7 @@ fio.dirname = function(path)
     if type(path) ~= 'string' then
         error("Usage: fio.dirname(path)")
     end
-    path = ffi.new('char[?]', #path + 1, path)
-    return ffi.string(ffi.C.dirname(path))
+    return ffi.string(ffi.C.dirname(ffi.cast('char *', path)))
 end
 
 fio.umask = function(umask)
diff --git a/src/lua/init.c b/src/lua/init.c
index be164bdfb..303369841 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -126,9 +126,9 @@ static const char *lua_modules[] = {
 	"errno", errno_lua,
 	"fiber", fiber_lua,
 	"env", env_lua,
+	"buffer", buffer_lua,
 	"string", string_lua,
 	"table", table_lua,
-	"buffer", buffer_lua,
 	"msgpackffi", msgpackffi_lua,
 	"crypto", crypto_lua,
 	"digest", digest_lua,
diff --git a/src/lua/msgpackffi.lua b/src/lua/msgpackffi.lua
index 609334cff..1e4786707 100644
--- a/src/lua/msgpackffi.lua
+++ b/src/lua/msgpackffi.lua
@@ -21,19 +21,10 @@ float
 mp_decode_float(const char **data);
 double
 mp_decode_double(const char **data);
-union tmpint {
-    uint16_t u16;
-    uint32_t u32;
-    uint64_t u64;
-};
 ]])
 
 local strict_alignment = (jit.arch == 'arm')
-
-local tmpint
-if strict_alignment then
-   tmpint = ffi.new('union tmpint[1]')
-end
+local scalar = buffer.scalar
 
 local function bswap_u16(num)
     return bit.rshift(bit.bswap(tonumber(num)), 16)
@@ -70,10 +61,10 @@ end
 local encode_u16
 if strict_alignment then
     encode_u16 = function(buf, code, num)
-        tmpint[0].u16 = bswap_u16(num)
+        scalar.u16 = bswap_u16(num)
         local p = buf:alloc(3)
         p[0] = code
-        ffi.copy(p + 1, tmpint, 2)
+        ffi.copy(p + 1, scalar, 2)
     end
 else
     encode_u16 = function(buf, code, num)
@@ -86,11 +77,10 @@ end
 local encode_u32
 if strict_alignment then
     encode_u32 = function(buf, code, num)
-        tmpint[0].u32 =
-            ffi.cast('uint32_t', bit.bswap(tonumber(num)))
+        scalar.u32 = ffi.cast('uint32_t', bit.bswap(tonumber(num)))
         local p = buf:alloc(5)
         p[0] = code
-        ffi.copy(p + 1, tmpint, 4)
+        ffi.copy(p + 1, scalar, 4)
     end
 else
     encode_u32 = function(buf, code, num)
@@ -104,10 +94,10 @@ end
 local encode_u64
 if strict_alignment then
     encode_u64 = function(buf, code, num)
-        tmpint[0].u64 = bit.bswap(ffi.cast('uint64_t', num))
+        scalar.u64 = bit.bswap(ffi.cast('uint64_t', num))
         local p = buf:alloc(9)
         p[0] = code
-        ffi.copy(p + 1, tmpint, 8)
+        ffi.copy(p + 1, scalar, 8)
     end
 else
     encode_u64 = function(buf, code, num)
@@ -324,9 +314,9 @@ end
 local decode_u16
 if strict_alignment then
     decode_u16 = function(data)
-        ffi.copy(tmpint, data[0], 2)
+        ffi.copy(scalar, data[0], 2)
         data[0] = data[0] + 2
-        return tonumber(bswap_u16(tmpint[0].u16))
+        return tonumber(bswap_u16(scalar.u16))
     end
 else
     decode_u16 = function(data)
@@ -339,10 +329,10 @@ end
 local decode_u32
 if strict_alignment then
     decode_u32 = function(data)
-        ffi.copy(tmpint, data[0], 4)
+        ffi.copy(scalar, data[0], 4)
         data[0] = data[0] + 4
         return tonumber(
-            ffi.cast('uint32_t', bit.bswap(tonumber(tmpint[0].u32))))
+            ffi.cast('uint32_t', bit.bswap(tonumber(scalar.u32))))
     end
 else
     decode_u32 = function(data)
@@ -356,9 +346,9 @@ end
 local decode_u64
 if strict_alignment then
     decode_u64 = function(data)
-        ffi.copy(tmpint, data[0], 8);
+        ffi.copy(scalar, data[0], 8);
         data[0] = data[0] + 8
-        local num = bit.bswap(tmpint[0].u64)
+        local num = bit.bswap(scalar.u64)
         if num <= DBL_INT_MAX then
             return tonumber(num) -- return as 'number'
         end
@@ -385,8 +375,8 @@ end
 local decode_i16
 if strict_alignment then
     decode_i16 = function(data)
-        ffi.copy(tmpint, data[0], 2)
-        local num = bswap_u16(tmpint[0].u16)
+        ffi.copy(scalar, data[0], 2)
+        local num = bswap_u16(scalar.u16)
         data[0] = data[0] + 2
         -- note: this double cast is actually necessary
         return tonumber(ffi.cast('int16_t', ffi.cast('uint16_t', num)))
@@ -403,8 +393,8 @@ end
 local decode_i32
 if strict_alignment then
     decode_i32 = function(data)
-        ffi.copy(tmpint, data[0], 4)
-        local num = bit.bswap(tonumber(tmpint[0].u32))
+        ffi.copy(scalar, data[0], 4)
+        local num = bit.bswap(tonumber(scalar.u32))
         data[0] = data[0] + 4
         return num
     end
@@ -419,9 +409,9 @@ end
 local decode_i64
 if strict_alignment then
     decode_i64 = function(data)
-        ffi.copy(tmpint, data[0], 8)
+        ffi.copy(scalar, data[0], 8)
         data[0] = data[0] + 8
-        local num = bit.bswap(ffi.cast('int64_t', tmpint[0].u64))
+        local num = bit.bswap(scalar.i64)
         if num >= -DBL_INT_MAX and num <= DBL_INT_MAX then
             return tonumber(num) -- return as 'number'
         end
@@ -552,7 +542,7 @@ end
 -- element. It is significally faster on LuaJIT to use double pointer than
 -- return result, newpos.
 --
-local bufp = ffi.new('const unsigned char *[1]');
+local bufp = scalar.acucp;
 
 local function check_offset(offset, len)
     if offset == nil then
diff --git a/src/lua/socket.lua b/src/lua/socket.lua
index b2700e0c0..1ddafee2f 100644
--- a/src/lua/socket.lua
+++ b/src/lua/socket.lua
@@ -10,6 +10,9 @@ local fiber = require('fiber')
 local fio = require('fio')
 local log = require('log')
 local buffer = require('buffer')
+local scalar = buffer.scalar
+local scalar_array = buffer.scalar_array
+local static_alloc = buffer.static_alloc
 
 local format = string.format
 
@@ -472,9 +475,9 @@ local function socket_setsockopt(self, level, name, value)
     end
 
     if info.type == 1 then
-        local value = ffi.new("int[1]", value)
+        scalar.ai[0] = value
         local res = ffi.C.setsockopt(fd,
-            level, info.iname, value, ffi.sizeof('int'))
+            level, info.iname, scalar.ai, ffi.sizeof('int'))
 
         if res < 0 then
             self._errno = boxerrno()
@@ -516,8 +519,10 @@ local function socket_getsockopt(self, level, name)
     self._errno = nil
 
     if info.type == 1 then
-        local value = ffi.new("int[1]", 0)
-        local len = ffi.new("size_t[1]", ffi.sizeof('int'))
+        local value = scalar_array[0].ai
+        value[0] = 0
+        local len = scalar_array[1].as
+        len[0] = ffi.sizeof('int')
         local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
 
         if res < 0 then
@@ -532,8 +537,9 @@ local function socket_getsockopt(self, level, name)
     end
 
     if info.type == 2 then
-        local value = ffi.new("char[256]", { 0 })
-        local len = ffi.new("size_t[1]", 256)
+        local value = static_alloc('char', 256)
+        local len = scalar.as
+        len[0] = 256
         local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
         if res < 0 then
             self._errno = boxerrno()
@@ -555,8 +561,9 @@ local function socket_linger(self, active, timeout)
     local info = internal.SO_OPT[level].SO_LINGER
     self._errno = nil
     if active == nil then
-        local value = ffi.new("linger_t[1]")
-        local len = ffi.new("size_t[1]", 2 * ffi.sizeof('int'))
+        local value = static_alloc('linger_t')
+        local len = scalar.as
+        len[0] = 2 * ffi.sizeof('linger_t')
         local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
         if res < 0 then
             self._errno = boxerrno()
@@ -581,9 +588,10 @@ local function socket_linger(self, active, timeout)
         iactive = 0
     end
 
-    local value = ffi.new("linger_t[1]",
-        { { active = iactive, timeout = timeout } })
-    local len = 2 * ffi.sizeof('int')
+    local value = static_alloc('linger_t')
+    value[0].active = iactive
+    value[0].timeout = timeout
+    local len = ffi.sizeof('linger_t')
     local res = ffi.C.setsockopt(fd, level, info.iname, value, len)
     if res < 0 then
         self._errno = boxerrno()
@@ -798,8 +806,7 @@ local function get_recv_size(self, size)
             -- them using message peek.
             local iflags = get_iflags(internal.SEND_FLAGS, {'MSG_PEEK'})
             assert(iflags ~= nil)
-            local buf = ffi.new('char[?]', 1)
-            size = tonumber(ffi.C.recv(fd, buf, 1, iflags))
+            size = tonumber(ffi.C.recv(fd, scalar.ac, 1, iflags))
             -- Prevent race condition: proceed with the case when
             -- a datagram of length > 0 has been arrived after the
             -- getsockopt call above.
@@ -836,8 +843,7 @@ local function socket_recv(self, size, flags)
     end
 
     self._errno = nil
-    local buf = ffi.new("char[?]", size)
-
+    local buf = static_alloc('char', size)
     local res = ffi.C.recv(fd, buf, size, iflags)
 
     if res == -1 then
diff --git a/src/lua/string.lua b/src/lua/string.lua
index 8216ace6a..6ed6be8bf 100644
--- a/src/lua/string.lua
+++ b/src/lua/string.lua
@@ -1,4 +1,6 @@
 local ffi = require('ffi')
+local buffer = require('buffer')
+local static_alloc = buffer.static_alloc
 
 ffi.cdef[[
     const char *
@@ -13,8 +15,8 @@ ffi.cdef[[
 ]]
 
 local c_char_ptr     = ffi.typeof('const char *')
-local strip_newstart = ffi.new("unsigned long[1]")
-local strip_newlen   = ffi.new("unsigned long[1]")
+local strip_newstart = buffer.scalar_array[0].aul
+local strip_newlen   = buffer.scalar_array[1].aul
 
 local memcmp  = ffi.C.memcmp
 local memmem  = ffi.C.memmem
@@ -290,7 +292,7 @@ local function string_hex(inp)
         error(err_string_arg:format(1, 'string.hex', 'string', type(inp)), 2)
     end
     local len = inp:len() * 2
-    local res = ffi.new('char[?]', len + 1)
+    local res = static_alloc('char', len + 1)
 
     local uinp = ffi.cast('const unsigned char *', inp)
     for i = 0, inp:len() - 1 do
@@ -333,7 +335,7 @@ local function string_fromhex(inp)
     end
     local len = inp:len() / 2
     local casted_inp = ffi.cast('const char *', inp)
-    local res = ffi.new('char[?]', len)
+    local res = static_alloc('char', len)
     for i = 0, len - 1 do
         local first = hexadecimals_mapping[casted_inp[i * 2]]
         local second = hexadecimals_mapping[casted_inp[i * 2 + 1]]
diff --git a/src/lua/uri.lua b/src/lua/uri.lua
index d2946cd2d..5967c8bf2 100644
--- a/src/lua/uri.lua
+++ b/src/lua/uri.lua
@@ -1,6 +1,7 @@
 -- uri.lua (internal file)
 
 local ffi = require('ffi')
+local static_alloc = require('buffer').static_alloc
 
 ffi.cdef[[
 struct uri {
@@ -32,12 +33,11 @@ uri_format(char *str, size_t len, struct uri *uri, bool write_password);
 
 local builtin = ffi.C;
 
-local uribuf = ffi.new('struct uri')
-
 local function parse(str)
     if str == nil then
         error("Usage: uri.parse(string)")
     end
+    local uribuf = static_alloc('struct uri')
     if builtin.uri_parse(uribuf, str) ~= 0 then
         return nil
     end
@@ -59,6 +59,7 @@ local function parse(str)
 end
 
 local function format(uri, write_password)
+    local uribuf = static_alloc('struct uri')
     uribuf.scheme = uri.scheme
     uribuf.scheme_len = string.len(uri.scheme or '')
     uribuf.login = uri.login
@@ -75,7 +76,7 @@ local function format(uri, write_password)
     uribuf.query_len = string.len(uri.query or '')
     uribuf.fragment = uri.fragment
     uribuf.fragment_len = string.len(uri.fragment or '')
-    local str = ffi.new('char[1024]')
+    local str = static_alloc('char', 1024)
     builtin.uri_format(str, 1024, uribuf, write_password and 1 or 0)
     return ffi.string(str)
 end
diff --git a/src/lua/uuid.lua b/src/lua/uuid.lua
index 956ad6e36..f8418cf4d 100644
--- a/src/lua/uuid.lua
+++ b/src/lua/uuid.lua
@@ -1,6 +1,7 @@
 -- uuid.lua (internal file)
 
 local ffi = require("ffi")
+local static_alloc = require('buffer').static_alloc
 local builtin = ffi.C
 
 ffi.cdef[[
@@ -33,7 +34,6 @@ extern const struct tt_uuid uuid_nil;
 local uuid_t = ffi.typeof('struct tt_uuid')
 local UUID_STR_LEN = 36
 local UUID_LEN = ffi.sizeof(uuid_t)
-local uuidbuf = ffi.new(uuid_t)
 
 local uuid_tostring = function(uu)
     if not ffi.istype(uuid_t, uu) then
@@ -69,9 +69,8 @@ local uuid_tobin = function(uu, byteorder)
         return error('Usage: uuid:bin([byteorder])')
     end
     if need_bswap(byteorder) then
-        if uu ~= uuidbuf then
-            ffi.copy(uuidbuf, uu, UUID_LEN)
-        end
+        local uuidbuf = static_alloc('struct tt_uuid')
+        ffi.copy(uuidbuf, uu, UUID_LEN)
         builtin.tt_uuid_bswap(uuidbuf)
         return ffi.string(ffi.cast('char *', uuidbuf), UUID_LEN)
     end
@@ -114,10 +113,12 @@ local uuid_new = function()
 end
 
 local uuid_new_bin = function(byteorder)
+    local uuidbuf = static_alloc('struct tt_uuid')
     builtin.tt_uuid_create(uuidbuf)
     return uuid_tobin(uuidbuf, byteorder)
 end
 local uuid_new_str = function()
+    local uuidbuf = static_alloc('struct tt_uuid')
     builtin.tt_uuid_create(uuidbuf)
     return uuid_tostring(uuidbuf)
 end
diff --git a/test/app/buffer.result b/test/app/buffer.result
new file mode 100644
index 000000000..10494b812
--- /dev/null
+++ b/test/app/buffer.result
@@ -0,0 +1,53 @@
+test_run = require('test_run').new()
+---
+...
+buffer = require('buffer')
+---
+...
+ffi = require('ffi')
+---
+...
+-- Scalar.
+scalar = buffer.scalar
+---
+...
+scalar.u16 = 100
+---
+...
+u16 = ffi.new('uint16_t[1]')
+---
+...
+ffi.copy(u16, scalar, 2)
+---
+...
+u16[0]
+---
+- 100
+...
+u16[0] = 200
+---
+...
+ffi.copy(scalar, u16, 2)
+---
+...
+scalar.u16
+---
+- 200
+...
+-- Alignment.
+_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
+---
+...
+p = buffer.static_alloc('int')
+---
+...
+ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
+---
+- true
+...
+-- Able to allocate bigger than static buffer - such allocations
+-- are on the heap.
+type(buffer.static_alloc('char', 13000))
+---
+- cdata
+...
diff --git a/test/app/buffer.test.lua b/test/app/buffer.test.lua
new file mode 100644
index 000000000..6b1a974ca
--- /dev/null
+++ b/test/app/buffer.test.lua
@@ -0,0 +1,23 @@
+test_run = require('test_run').new()
+buffer = require('buffer')
+ffi = require('ffi')
+
+-- Scalar.
+scalar = buffer.scalar
+scalar.u16 = 100
+u16 = ffi.new('uint16_t[1]')
+ffi.copy(u16, scalar, 2)
+u16[0]
+
+u16[0] = 200
+ffi.copy(scalar, u16, 2)
+scalar.u16
+
+-- Alignment.
+_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
+p = buffer.static_alloc('int')
+ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
+
+-- Able to allocate bigger than static buffer - such allocations
+-- are on the heap.
+type(buffer.static_alloc('char', 13000))
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tarantool-patches] Re: [PATCH 1/1] buffer: port static allocator to Lua
  2019-05-15 13:33 [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua Vladislav Shpilevoy
@ 2019-05-18 18:41 ` Vladislav Shpilevoy
  2019-05-18 19:12 ` Konstantin Osipov
  1 sibling, 0 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2019-05-18 18:41 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Still waiting a review.

On 15/05/2019 16:33, Vladislav Shpilevoy wrote:
> Static allocator gives memory blocks from cyclic BSS memory
> block of 3 pages 4096 bytes each. It is much faster than
> malloc, when a temporary buffer is needed.
> 
> This commit exposes the allocator to Lua, which suffers from
> lack of ability to pass values by pointers into FFI functions,
> nor has a stack to allocate small buffers like 'char[256]'.
> Also these allocations complicate and slow down GC job.
> 
> Static allocator solves most of these problems provindg a way to
> swiftly allocate temporary memory blocks. Besides, for the most
> annoying problem with ffi.new(<type>[1]) the buffer module now
> exposes a statically allocated array of scalar values able to fit
> any 'int', 'char', 'size_t' ... and ready at hand.
> 
> A simple micro benchmark showed, that ffi.new() vs
> buffer.static_alloc() is ~100 times slower, even on small
> allocations of 1Kb, and ~1.5 times slower on tiny allocations
> of 10 bytes. The results do not account GC. It is remarkable,
> buffer.static_alloc() speed does not depend on size, while
> ffi.new() strongly depends.
> 
> The commit is motivated by one another place where
> ffi.new('int[1]') appeared, in SWIM module, to obtain payload
> size as an out parameter of a C function.
> ---
> Branch: https://github.com/tarantool/tarantool/tree/gerold103/static-allocator-lua
> 
>  extra/exports            |  2 ++
>  src/CMakeLists.txt       |  1 +
>  src/box/lua/schema.lua   |  3 +-
>  src/lua/buffer.c         | 42 +++++++++++++++++++++++++++
>  src/lua/buffer.lua       | 61 ++++++++++++++++++++++++++++++++++++++++
>  src/lua/crypto.lua       | 22 +++++++--------
>  src/lua/digest.lua       |  7 +++--
>  src/lua/fio.lua          |  3 +-
>  src/lua/init.c           |  2 +-
>  src/lua/msgpackffi.lua   | 50 +++++++++++++-------------------
>  src/lua/socket.lua       | 36 ++++++++++++++----------
>  src/lua/string.lua       | 10 ++++---
>  src/lua/uri.lua          |  7 +++--
>  src/lua/uuid.lua         |  9 +++---
>  test/app/buffer.result   | 53 ++++++++++++++++++++++++++++++++++
>  test/app/buffer.test.lua | 23 +++++++++++++++
>  16 files changed, 256 insertions(+), 75 deletions(-)
>  create mode 100644 src/lua/buffer.c
>  create mode 100644 test/app/buffer.result
>  create mode 100644 test/app/buffer.test.lua
> 
> diff --git a/extra/exports b/extra/exports
> index 4f41a17b3..5375a01e4 100644
> --- a/extra/exports
> +++ b/extra/exports
> @@ -85,6 +85,8 @@ tnt_EVP_MD_CTX_free
>  tnt_HMAC_CTX_new
>  tnt_HMAC_CTX_free
>  
> +lua_static_aligned_alloc
> +
>  # Module API
>  
>  _say
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
> index a6a18142b..492b8712e 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -116,6 +116,7 @@ set (server_sources
>       lua/utf8.c
>       lua/info.c
>       lua/string.c
> +     lua/buffer.c
>       ${lua_sources}
>       ${PROJECT_SOURCE_DIR}/third_party/lua-yaml/lyaml.cc
>       ${PROJECT_SOURCE_DIR}/third_party/lua-yaml/b64.c
> diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
> index f31cf7f2c..fa6c7c9a4 100644
> --- a/src/box/lua/schema.lua
> +++ b/src/box/lua/schema.lua
> @@ -7,6 +7,7 @@ local fun = require('fun')
>  local log = require('log')
>  local fio = require('fio')
>  local json = require('json')
> +local static_alloc = require('buffer').static_alloc
>  local session = box.session
>  local internal = require('box.internal')
>  local function setmap(table)
> @@ -2103,7 +2104,7 @@ box.schema.user = {}
>  
>  box.schema.user.password = function(password)
>      local BUF_SIZE = 128
> -    local buf = ffi.new("char[?]", BUF_SIZE)
> +    local buf = static_alloc('char', BUF_SIZE)
>      builtin.password_prepare(password, #password, buf, BUF_SIZE)
>      return ffi.string(buf)
>  end
> diff --git a/src/lua/buffer.c b/src/lua/buffer.c
> new file mode 100644
> index 000000000..5fd349261
> --- /dev/null
> +++ b/src/lua/buffer.c
> @@ -0,0 +1,42 @@
> +/*
> + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "small/static.h"
> +
> +/**
> + * Static inline functions like in static.h can't be exported.
> + * Here they are given a symbol to export.
> + */
> +
> +void *
> +lua_static_aligned_alloc(size_t size, size_t alignment)
> +{
> +	return static_aligned_alloc(size, alignment);
> +}
> diff --git a/src/lua/buffer.lua b/src/lua/buffer.lua
> index a72d8d1f9..7fd9f46ed 100644
> --- a/src/lua/buffer.lua
> +++ b/src/lua/buffer.lua
> @@ -33,6 +33,37 @@ ibuf_reinit(struct ibuf *ibuf);
>  
>  void *
>  ibuf_reserve_slow(struct ibuf *ibuf, size_t size);
> +
> +void *
> +lua_static_aligned_alloc(size_t size, size_t alignment);
> +
> +/**
> + * Scalar is a buffer to use with FFI functions, which usually
> + * operate with pointers to scalar values like int, char, size_t,
> + * void *. To avoid doing 'ffi.new(<type>[1])' on each such FFI
> + * function invocation, a module can use one of attributes of the
> + * scalar union.
> + *
> + * Naming policy of the attributes is easy to remember:
> + * 'a' for array type + type name first letters + 'p' for pointer.
> + *
> + * For example:
> + * - int[1] - <a>rray of <i>nt - ai;
> + * - const unsigned char *[1] -
> + *       <a>rray of <c>onst <u>nsigned <c>har <p> pointer - acucp.
> + */
> +union scalar {
> +    size_t as[1];
> +    void *ap[1];
> +    int ai[1];
> +    char ac[1];
> +    const unsigned char *acucp[1];
> +    unsigned long aul[1];
> +    uint16_t u16;
> +    uint32_t u32;
> +    uint64_t u64;
> +    int64_t i64;
> +};
>  ]]
>  
>  local builtin = ffi.C
> @@ -174,8 +205,38 @@ local function ibuf_new(arg, arg2)
>      errorf('Usage: ibuf([size])')
>  end
>  
> +--
> +-- Allocate a chunk of static BSS memory, or use ordinal ffi.new,
> +-- when too big size.
> +-- @param type C type - a struct, a basic type, etc. Should be a
> +--        string: 'int', 'char *', 'struct tuple', etc.
> +-- @param size Optional argument, number of elements of @a type to
> +--        allocate. 1 by default.
> +-- @return Cdata pointer to @a type.
> +--
> +local function static_alloc(type, size)
> +    size = size or 1
> +    local bsize = size * ffi.sizeof(type)
> +    local ptr = builtin.lua_static_aligned_alloc(bsize, ffi.alignof(type))
> +    if ptr ~= nil then
> +        return ffi.cast(type..' *', ptr)
> +    end
> +    return ffi.new(type..'[?]', size)
> +end
> +
> +--
> +-- Sometimes it is wanted to use several temporary scalar cdata
> +-- values. Then one union object is not enough - its attributes
> +-- share memory.
> +--
> +local scalar_array = ffi.new('union scalar[?]', 2)
> +
>  return {
>      ibuf = ibuf_new;
>      IBUF_SHARED = ffi.C.tarantool_lua_ibuf;
>      READAHEAD = READAHEAD;
> +    static_alloc = static_alloc,
> +    scalar_array = scalar_array,
> +    -- Fast access when only one variable is needed.
> +    scalar = scalar_array[0],
>  }
> diff --git a/src/lua/crypto.lua b/src/lua/crypto.lua
> index e76370517..7db6afac9 100644
> --- a/src/lua/crypto.lua
> +++ b/src/lua/crypto.lua
> @@ -2,6 +2,7 @@
>  
>  local ffi = require('ffi')
>  local buffer = require('buffer')
> +local scalar = buffer.scalar
>  
>  ffi.cdef[[
>      int tnt_openssl_init(void);
> @@ -84,7 +85,6 @@ local function digest_new(digest)
>          digest = digest,
>          buf = buffer.ibuf(64),
>          initialized = false,
> -        outl = ffi.new('int[1]')
>      }, digest_mt)
>      self:init()
>      return self
> @@ -114,10 +114,10 @@ local function digest_final(self)
>          return error('Digest not initialized')
>      end
>      self.initialized = false
> -    if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, self.outl) ~= 1 then
> +    if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, scalar.ai) ~= 1 then
>          return error('Can\'t finalize digest: ' .. openssl_err_str())
>      end
> -    return ffi.string(self.buf.wpos, self.outl[0])
> +    return ffi.string(self.buf.wpos, scalar.ai[0])
>  end
>  
>  local function digest_free(self)
> @@ -156,9 +156,7 @@ local function hmac_new(digest, key)
>      local self = setmetatable({
>          ctx = ctx,
>          digest = digest,
> -        buf = buffer.ibuf(64),
>          initialized = false,
> -        outl = ffi.new('int[1]')
>      }, hmac_mt)
>      self:init(key)
>      return self
> @@ -188,10 +186,11 @@ local function hmac_final(self)
>          return error('HMAC not initialized')
>      end
>      self.initialized = false
> -    if ffi.C.HMAC_Final(self.ctx, self.buf.wpos, self.outl) ~= 1 then
> +    local buf = buffer.static_alloc('char', 64)
> +    if ffi.C.HMAC_Final(self.ctx, buf, scalar.ai) ~= 1 then
>          return error('Can\'t finalize HMAC: ' .. openssl_err_str())
>      end
> -    return ffi.string(self.buf.wpos, self.outl[0])
> +    return ffi.string(buf, scalar.ai[0])
>  end
>  
>  local function hmac_free(self)
> @@ -254,7 +253,6 @@ local function cipher_new(cipher, key, iv, direction)
>          direction = direction,
>          buf = buffer.ibuf(),
>          initialized = false,
> -        outl = ffi.new('int[1]')
>      }, cipher_mt)
>      self:init(key, iv)
>      return self
> @@ -279,10 +277,10 @@ local function cipher_update(self, input)
>          error("Usage: cipher:update(string)")
>      end
>      local wpos = self.buf:reserve(input:len() + self.block_size - 1)
> -    if ffi.C.EVP_CipherUpdate(self.ctx, wpos, self.outl, input, input:len()) ~= 1 then
> +    if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then
>          return error('Can\'t update cipher:' .. openssl_err_str())
>      end
> -    return ffi.string(wpos, self.outl[0])
> +    return ffi.string(wpos, scalar.ai[0])
>  end
>  
>  local function cipher_final(self)
> @@ -291,11 +289,11 @@ local function cipher_final(self)
>      end
>      self.initialized = false
>      local wpos = self.buf:reserve(self.block_size - 1)
> -    if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, self.outl) ~= 1 then
> +    if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, scalar.ai) ~= 1 then
>          return error('Can\'t finalize cipher:' .. openssl_err_str())
>      end
>      self.initialized = false
> -    return ffi.string(wpos, self.outl[0])
> +    return ffi.string(wpos, scalar.ai[0])
>  end
>  
>  local function cipher_free(self)
> diff --git a/src/lua/digest.lua b/src/lua/digest.lua
> index 8f199c0af..6ed91cfa2 100644
> --- a/src/lua/digest.lua
> +++ b/src/lua/digest.lua
> @@ -3,6 +3,7 @@
>  local ffi = require('ffi')
>  local crypto = require('crypto')
>  local bit = require('bit')
> +local static_alloc = require('buffer').static_alloc
>  
>  ffi.cdef[[
>      /* internal implementation */
> @@ -179,7 +180,7 @@ local m = {
>          end
>          local blen = #bin
>          local slen = ffi.C.base64_bufsize(blen, mask)
> -        local str  = ffi.new('char[?]', slen)
> +        local str  = static_alloc('char', slen)
>          local len = ffi.C.base64_encode(bin, blen, str, slen, mask)
>          return ffi.string(str, len)
>      end,
> @@ -190,7 +191,7 @@ local m = {
>          end
>          local slen = #str
>          local blen = math.ceil(slen * 3 / 4)
> -        local bin  = ffi.new('char[?]', blen)
> +        local bin  = static_alloc('char', blen)
>          local len = ffi.C.base64_decode(str, slen, bin, blen)
>          return ffi.string(bin, len)
>      end,
> @@ -228,7 +229,7 @@ local m = {
>          if n == nil then
>              error('Usage: digest.urandom(len)')
>          end
> -        local buf = ffi.new('char[?]', n)
> +        local buf = static_alloc('char', n)
>          ffi.C.random_bytes(buf, n)
>          return ffi.string(buf, n)
>      end,
> diff --git a/src/lua/fio.lua b/src/lua/fio.lua
> index 38664a556..fce79c277 100644
> --- a/src/lua/fio.lua
> +++ b/src/lua/fio.lua
> @@ -275,8 +275,7 @@ fio.dirname = function(path)
>      if type(path) ~= 'string' then
>          error("Usage: fio.dirname(path)")
>      end
> -    path = ffi.new('char[?]', #path + 1, path)
> -    return ffi.string(ffi.C.dirname(path))
> +    return ffi.string(ffi.C.dirname(ffi.cast('char *', path)))
>  end
>  
>  fio.umask = function(umask)
> diff --git a/src/lua/init.c b/src/lua/init.c
> index be164bdfb..303369841 100644
> --- a/src/lua/init.c
> +++ b/src/lua/init.c
> @@ -126,9 +126,9 @@ static const char *lua_modules[] = {
>  	"errno", errno_lua,
>  	"fiber", fiber_lua,
>  	"env", env_lua,
> +	"buffer", buffer_lua,
>  	"string", string_lua,
>  	"table", table_lua,
> -	"buffer", buffer_lua,
>  	"msgpackffi", msgpackffi_lua,
>  	"crypto", crypto_lua,
>  	"digest", digest_lua,
> diff --git a/src/lua/msgpackffi.lua b/src/lua/msgpackffi.lua
> index 609334cff..1e4786707 100644
> --- a/src/lua/msgpackffi.lua
> +++ b/src/lua/msgpackffi.lua
> @@ -21,19 +21,10 @@ float
>  mp_decode_float(const char **data);
>  double
>  mp_decode_double(const char **data);
> -union tmpint {
> -    uint16_t u16;
> -    uint32_t u32;
> -    uint64_t u64;
> -};
>  ]])
>  
>  local strict_alignment = (jit.arch == 'arm')
> -
> -local tmpint
> -if strict_alignment then
> -   tmpint = ffi.new('union tmpint[1]')
> -end
> +local scalar = buffer.scalar
>  
>  local function bswap_u16(num)
>      return bit.rshift(bit.bswap(tonumber(num)), 16)
> @@ -70,10 +61,10 @@ end
>  local encode_u16
>  if strict_alignment then
>      encode_u16 = function(buf, code, num)
> -        tmpint[0].u16 = bswap_u16(num)
> +        scalar.u16 = bswap_u16(num)
>          local p = buf:alloc(3)
>          p[0] = code
> -        ffi.copy(p + 1, tmpint, 2)
> +        ffi.copy(p + 1, scalar, 2)
>      end
>  else
>      encode_u16 = function(buf, code, num)
> @@ -86,11 +77,10 @@ end
>  local encode_u32
>  if strict_alignment then
>      encode_u32 = function(buf, code, num)
> -        tmpint[0].u32 =
> -            ffi.cast('uint32_t', bit.bswap(tonumber(num)))
> +        scalar.u32 = ffi.cast('uint32_t', bit.bswap(tonumber(num)))
>          local p = buf:alloc(5)
>          p[0] = code
> -        ffi.copy(p + 1, tmpint, 4)
> +        ffi.copy(p + 1, scalar, 4)
>      end
>  else
>      encode_u32 = function(buf, code, num)
> @@ -104,10 +94,10 @@ end
>  local encode_u64
>  if strict_alignment then
>      encode_u64 = function(buf, code, num)
> -        tmpint[0].u64 = bit.bswap(ffi.cast('uint64_t', num))
> +        scalar.u64 = bit.bswap(ffi.cast('uint64_t', num))
>          local p = buf:alloc(9)
>          p[0] = code
> -        ffi.copy(p + 1, tmpint, 8)
> +        ffi.copy(p + 1, scalar, 8)
>      end
>  else
>      encode_u64 = function(buf, code, num)
> @@ -324,9 +314,9 @@ end
>  local decode_u16
>  if strict_alignment then
>      decode_u16 = function(data)
> -        ffi.copy(tmpint, data[0], 2)
> +        ffi.copy(scalar, data[0], 2)
>          data[0] = data[0] + 2
> -        return tonumber(bswap_u16(tmpint[0].u16))
> +        return tonumber(bswap_u16(scalar.u16))
>      end
>  else
>      decode_u16 = function(data)
> @@ -339,10 +329,10 @@ end
>  local decode_u32
>  if strict_alignment then
>      decode_u32 = function(data)
> -        ffi.copy(tmpint, data[0], 4)
> +        ffi.copy(scalar, data[0], 4)
>          data[0] = data[0] + 4
>          return tonumber(
> -            ffi.cast('uint32_t', bit.bswap(tonumber(tmpint[0].u32))))
> +            ffi.cast('uint32_t', bit.bswap(tonumber(scalar.u32))))
>      end
>  else
>      decode_u32 = function(data)
> @@ -356,9 +346,9 @@ end
>  local decode_u64
>  if strict_alignment then
>      decode_u64 = function(data)
> -        ffi.copy(tmpint, data[0], 8);
> +        ffi.copy(scalar, data[0], 8);
>          data[0] = data[0] + 8
> -        local num = bit.bswap(tmpint[0].u64)
> +        local num = bit.bswap(scalar.u64)
>          if num <= DBL_INT_MAX then
>              return tonumber(num) -- return as 'number'
>          end
> @@ -385,8 +375,8 @@ end
>  local decode_i16
>  if strict_alignment then
>      decode_i16 = function(data)
> -        ffi.copy(tmpint, data[0], 2)
> -        local num = bswap_u16(tmpint[0].u16)
> +        ffi.copy(scalar, data[0], 2)
> +        local num = bswap_u16(scalar.u16)
>          data[0] = data[0] + 2
>          -- note: this double cast is actually necessary
>          return tonumber(ffi.cast('int16_t', ffi.cast('uint16_t', num)))
> @@ -403,8 +393,8 @@ end
>  local decode_i32
>  if strict_alignment then
>      decode_i32 = function(data)
> -        ffi.copy(tmpint, data[0], 4)
> -        local num = bit.bswap(tonumber(tmpint[0].u32))
> +        ffi.copy(scalar, data[0], 4)
> +        local num = bit.bswap(tonumber(scalar.u32))
>          data[0] = data[0] + 4
>          return num
>      end
> @@ -419,9 +409,9 @@ end
>  local decode_i64
>  if strict_alignment then
>      decode_i64 = function(data)
> -        ffi.copy(tmpint, data[0], 8)
> +        ffi.copy(scalar, data[0], 8)
>          data[0] = data[0] + 8
> -        local num = bit.bswap(ffi.cast('int64_t', tmpint[0].u64))
> +        local num = bit.bswap(scalar.i64)
>          if num >= -DBL_INT_MAX and num <= DBL_INT_MAX then
>              return tonumber(num) -- return as 'number'
>          end
> @@ -552,7 +542,7 @@ end
>  -- element. It is significally faster on LuaJIT to use double pointer than
>  -- return result, newpos.
>  --
> -local bufp = ffi.new('const unsigned char *[1]');
> +local bufp = scalar.acucp;
>  
>  local function check_offset(offset, len)
>      if offset == nil then
> diff --git a/src/lua/socket.lua b/src/lua/socket.lua
> index b2700e0c0..1ddafee2f 100644
> --- a/src/lua/socket.lua
> +++ b/src/lua/socket.lua
> @@ -10,6 +10,9 @@ local fiber = require('fiber')
>  local fio = require('fio')
>  local log = require('log')
>  local buffer = require('buffer')
> +local scalar = buffer.scalar
> +local scalar_array = buffer.scalar_array
> +local static_alloc = buffer.static_alloc
>  
>  local format = string.format
>  
> @@ -472,9 +475,9 @@ local function socket_setsockopt(self, level, name, value)
>      end
>  
>      if info.type == 1 then
> -        local value = ffi.new("int[1]", value)
> +        scalar.ai[0] = value
>          local res = ffi.C.setsockopt(fd,
> -            level, info.iname, value, ffi.sizeof('int'))
> +            level, info.iname, scalar.ai, ffi.sizeof('int'))
>  
>          if res < 0 then
>              self._errno = boxerrno()
> @@ -516,8 +519,10 @@ local function socket_getsockopt(self, level, name)
>      self._errno = nil
>  
>      if info.type == 1 then
> -        local value = ffi.new("int[1]", 0)
> -        local len = ffi.new("size_t[1]", ffi.sizeof('int'))
> +        local value = scalar_array[0].ai
> +        value[0] = 0
> +        local len = scalar_array[1].as
> +        len[0] = ffi.sizeof('int')
>          local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
>  
>          if res < 0 then
> @@ -532,8 +537,9 @@ local function socket_getsockopt(self, level, name)
>      end
>  
>      if info.type == 2 then
> -        local value = ffi.new("char[256]", { 0 })
> -        local len = ffi.new("size_t[1]", 256)
> +        local value = static_alloc('char', 256)
> +        local len = scalar.as
> +        len[0] = 256
>          local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
>          if res < 0 then
>              self._errno = boxerrno()
> @@ -555,8 +561,9 @@ local function socket_linger(self, active, timeout)
>      local info = internal.SO_OPT[level].SO_LINGER
>      self._errno = nil
>      if active == nil then
> -        local value = ffi.new("linger_t[1]")
> -        local len = ffi.new("size_t[1]", 2 * ffi.sizeof('int'))
> +        local value = static_alloc('linger_t')
> +        local len = scalar.as
> +        len[0] = 2 * ffi.sizeof('linger_t')
>          local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
>          if res < 0 then
>              self._errno = boxerrno()
> @@ -581,9 +588,10 @@ local function socket_linger(self, active, timeout)
>          iactive = 0
>      end
>  
> -    local value = ffi.new("linger_t[1]",
> -        { { active = iactive, timeout = timeout } })
> -    local len = 2 * ffi.sizeof('int')
> +    local value = static_alloc('linger_t')
> +    value[0].active = iactive
> +    value[0].timeout = timeout
> +    local len = ffi.sizeof('linger_t')
>      local res = ffi.C.setsockopt(fd, level, info.iname, value, len)
>      if res < 0 then
>          self._errno = boxerrno()
> @@ -798,8 +806,7 @@ local function get_recv_size(self, size)
>              -- them using message peek.
>              local iflags = get_iflags(internal.SEND_FLAGS, {'MSG_PEEK'})
>              assert(iflags ~= nil)
> -            local buf = ffi.new('char[?]', 1)
> -            size = tonumber(ffi.C.recv(fd, buf, 1, iflags))
> +            size = tonumber(ffi.C.recv(fd, scalar.ac, 1, iflags))
>              -- Prevent race condition: proceed with the case when
>              -- a datagram of length > 0 has been arrived after the
>              -- getsockopt call above.
> @@ -836,8 +843,7 @@ local function socket_recv(self, size, flags)
>      end
>  
>      self._errno = nil
> -    local buf = ffi.new("char[?]", size)
> -
> +    local buf = static_alloc('char', size)
>      local res = ffi.C.recv(fd, buf, size, iflags)
>  
>      if res == -1 then
> diff --git a/src/lua/string.lua b/src/lua/string.lua
> index 8216ace6a..6ed6be8bf 100644
> --- a/src/lua/string.lua
> +++ b/src/lua/string.lua
> @@ -1,4 +1,6 @@
>  local ffi = require('ffi')
> +local buffer = require('buffer')
> +local static_alloc = buffer.static_alloc
>  
>  ffi.cdef[[
>      const char *
> @@ -13,8 +15,8 @@ ffi.cdef[[
>  ]]
>  
>  local c_char_ptr     = ffi.typeof('const char *')
> -local strip_newstart = ffi.new("unsigned long[1]")
> -local strip_newlen   = ffi.new("unsigned long[1]")
> +local strip_newstart = buffer.scalar_array[0].aul
> +local strip_newlen   = buffer.scalar_array[1].aul
>  
>  local memcmp  = ffi.C.memcmp
>  local memmem  = ffi.C.memmem
> @@ -290,7 +292,7 @@ local function string_hex(inp)
>          error(err_string_arg:format(1, 'string.hex', 'string', type(inp)), 2)
>      end
>      local len = inp:len() * 2
> -    local res = ffi.new('char[?]', len + 1)
> +    local res = static_alloc('char', len + 1)
>  
>      local uinp = ffi.cast('const unsigned char *', inp)
>      for i = 0, inp:len() - 1 do
> @@ -333,7 +335,7 @@ local function string_fromhex(inp)
>      end
>      local len = inp:len() / 2
>      local casted_inp = ffi.cast('const char *', inp)
> -    local res = ffi.new('char[?]', len)
> +    local res = static_alloc('char', len)
>      for i = 0, len - 1 do
>          local first = hexadecimals_mapping[casted_inp[i * 2]]
>          local second = hexadecimals_mapping[casted_inp[i * 2 + 1]]
> diff --git a/src/lua/uri.lua b/src/lua/uri.lua
> index d2946cd2d..5967c8bf2 100644
> --- a/src/lua/uri.lua
> +++ b/src/lua/uri.lua
> @@ -1,6 +1,7 @@
>  -- uri.lua (internal file)
>  
>  local ffi = require('ffi')
> +local static_alloc = require('buffer').static_alloc
>  
>  ffi.cdef[[
>  struct uri {
> @@ -32,12 +33,11 @@ uri_format(char *str, size_t len, struct uri *uri, bool write_password);
>  
>  local builtin = ffi.C;
>  
> -local uribuf = ffi.new('struct uri')
> -
>  local function parse(str)
>      if str == nil then
>          error("Usage: uri.parse(string)")
>      end
> +    local uribuf = static_alloc('struct uri')
>      if builtin.uri_parse(uribuf, str) ~= 0 then
>          return nil
>      end
> @@ -59,6 +59,7 @@ local function parse(str)
>  end
>  
>  local function format(uri, write_password)
> +    local uribuf = static_alloc('struct uri')
>      uribuf.scheme = uri.scheme
>      uribuf.scheme_len = string.len(uri.scheme or '')
>      uribuf.login = uri.login
> @@ -75,7 +76,7 @@ local function format(uri, write_password)
>      uribuf.query_len = string.len(uri.query or '')
>      uribuf.fragment = uri.fragment
>      uribuf.fragment_len = string.len(uri.fragment or '')
> -    local str = ffi.new('char[1024]')
> +    local str = static_alloc('char', 1024)
>      builtin.uri_format(str, 1024, uribuf, write_password and 1 or 0)
>      return ffi.string(str)
>  end
> diff --git a/src/lua/uuid.lua b/src/lua/uuid.lua
> index 956ad6e36..f8418cf4d 100644
> --- a/src/lua/uuid.lua
> +++ b/src/lua/uuid.lua
> @@ -1,6 +1,7 @@
>  -- uuid.lua (internal file)
>  
>  local ffi = require("ffi")
> +local static_alloc = require('buffer').static_alloc
>  local builtin = ffi.C
>  
>  ffi.cdef[[
> @@ -33,7 +34,6 @@ extern const struct tt_uuid uuid_nil;
>  local uuid_t = ffi.typeof('struct tt_uuid')
>  local UUID_STR_LEN = 36
>  local UUID_LEN = ffi.sizeof(uuid_t)
> -local uuidbuf = ffi.new(uuid_t)
>  
>  local uuid_tostring = function(uu)
>      if not ffi.istype(uuid_t, uu) then
> @@ -69,9 +69,8 @@ local uuid_tobin = function(uu, byteorder)
>          return error('Usage: uuid:bin([byteorder])')
>      end
>      if need_bswap(byteorder) then
> -        if uu ~= uuidbuf then
> -            ffi.copy(uuidbuf, uu, UUID_LEN)
> -        end
> +        local uuidbuf = static_alloc('struct tt_uuid')
> +        ffi.copy(uuidbuf, uu, UUID_LEN)
>          builtin.tt_uuid_bswap(uuidbuf)
>          return ffi.string(ffi.cast('char *', uuidbuf), UUID_LEN)
>      end
> @@ -114,10 +113,12 @@ local uuid_new = function()
>  end
>  
>  local uuid_new_bin = function(byteorder)
> +    local uuidbuf = static_alloc('struct tt_uuid')
>      builtin.tt_uuid_create(uuidbuf)
>      return uuid_tobin(uuidbuf, byteorder)
>  end
>  local uuid_new_str = function()
> +    local uuidbuf = static_alloc('struct tt_uuid')
>      builtin.tt_uuid_create(uuidbuf)
>      return uuid_tostring(uuidbuf)
>  end
> diff --git a/test/app/buffer.result b/test/app/buffer.result
> new file mode 100644
> index 000000000..10494b812
> --- /dev/null
> +++ b/test/app/buffer.result
> @@ -0,0 +1,53 @@
> +test_run = require('test_run').new()
> +---
> +...
> +buffer = require('buffer')
> +---
> +...
> +ffi = require('ffi')
> +---
> +...
> +-- Scalar.
> +scalar = buffer.scalar
> +---
> +...
> +scalar.u16 = 100
> +---
> +...
> +u16 = ffi.new('uint16_t[1]')
> +---
> +...
> +ffi.copy(u16, scalar, 2)
> +---
> +...
> +u16[0]
> +---
> +- 100
> +...
> +u16[0] = 200
> +---
> +...
> +ffi.copy(scalar, u16, 2)
> +---
> +...
> +scalar.u16
> +---
> +- 200
> +...
> +-- Alignment.
> +_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
> +---
> +...
> +p = buffer.static_alloc('int')
> +---
> +...
> +ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
> +---
> +- true
> +...
> +-- Able to allocate bigger than static buffer - such allocations
> +-- are on the heap.
> +type(buffer.static_alloc('char', 13000))
> +---
> +- cdata
> +...
> diff --git a/test/app/buffer.test.lua b/test/app/buffer.test.lua
> new file mode 100644
> index 000000000..6b1a974ca
> --- /dev/null
> +++ b/test/app/buffer.test.lua
> @@ -0,0 +1,23 @@
> +test_run = require('test_run').new()
> +buffer = require('buffer')
> +ffi = require('ffi')
> +
> +-- Scalar.
> +scalar = buffer.scalar
> +scalar.u16 = 100
> +u16 = ffi.new('uint16_t[1]')
> +ffi.copy(u16, scalar, 2)
> +u16[0]
> +
> +u16[0] = 200
> +ffi.copy(scalar, u16, 2)
> +scalar.u16
> +
> +-- Alignment.
> +_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
> +p = buffer.static_alloc('int')
> +ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
> +
> +-- Able to allocate bigger than static buffer - such allocations
> +-- are on the heap.
> +type(buffer.static_alloc('char', 13000))
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tarantool-patches] Re: [PATCH 1/1] buffer: port static allocator to Lua
  2019-05-15 13:33 [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua Vladislav Shpilevoy
  2019-05-18 18:41 ` [tarantool-patches] " Vladislav Shpilevoy
@ 2019-05-18 19:12 ` Konstantin Osipov
  2019-05-18 22:15   ` Vladislav Shpilevoy
  1 sibling, 1 reply; 4+ messages in thread
From: Konstantin Osipov @ 2019-05-18 19:12 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/05/15 16:34]:

1) scalar looks clearly as a separate patch. while I like the idea
   I don't appreciate the urgency and a huge patch.

2) The static alloc patch is OK to push

3) Since you care about performance so much, I don't understand
   why you do concat (..) in Lua static alloc. Did you check it's
   a compile-time, not a run-time concat? 
> +    if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then

I don't understand, if you need a scalar and an array of scalars,
and you have scalar and scalar_array variables, why would you ever
use an array in scalar? Wouldn't it be better to have two unions
instead of one for the sake of clarity?


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tarantool-patches] Re: [PATCH 1/1] buffer: port static allocator to Lua
  2019-05-18 19:12 ` Konstantin Osipov
@ 2019-05-18 22:15   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2019-05-18 22:15 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches



On 18/05/2019 22:12, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/05/15 16:34]:
> 
> 1) scalar looks clearly as a separate patch. while I like the idea
>    I don't appreciate the urgency and a huge patch.

Done in a separate commit, see v2 thread. Urgency is motivated by
SWIM Lua API dependency on that - I do not want to push SWIM Lua
with ugly "ffi.new('int[1]')" construction.

> 
> 2) The static alloc patch is OK to push
> 
> 3) Since you care about performance so much, I don't understand
>    why you do concat (..) in Lua static alloc. Did you check it's
>    a compile-time, not a run-time concat? 

I did not check whether it is compile time, because Lua has no a
compiler. But I made a benchmark - even with concat it is orders of
magnitude faster.

Nonetheless, probably you could be right, and it would be better to
avoid concatenation - then impact would be even greater.

Firstly, I tried to avoid concatenation and pass the array type
always. But I faced a problem, that ffi.cast does not allow some
C conversions. I needed to be able to do
ffi.cast('type[size]', <cdata void *>) - in C I would not have
problems, but in Lua I do. I started looking into FFI implementation,
and found how to allow that conversion, but then I decided that
it is not worth doing without a benchmark: concact vs no concat.

Appeared, that noconcat version is ~5% faster. In the summary,
original patch makes small allocations +10000% faster, without
concat it would be +10005%. Not sure, if this difference is worth
splitting static_alloc() into multiple functions, nor moving
ffi.sizeof() + ffi.cast() operations onto the caller's shoulders.

So I decided to keep it is as. Otherwise it becomes too complicated.

>> +    if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then
> 
> I don't understand, if you need a scalar and an array of scalars,
> and you have scalar and scalar_array variables, why would you ever
> use an array in scalar?

I use array types [1] in scalar, because it is the only way to
get a pointer onto a type in Lua.

> Wouldn't it be better to have two unions
> instead of one for the sake of clarity?

I have - scalar_array is two unions. I thought about their
separation in something like buffer.scalar1, buffer.scalar2, but
looks ugly, unnatural.

But here I do not insist. If you want, I can split them into two
variables. Should I?

See V2 thread.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-05-18 22:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-15 13:33 [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua Vladislav Shpilevoy
2019-05-18 18:41 ` [tarantool-patches] " Vladislav Shpilevoy
2019-05-18 19:12 ` Konstantin Osipov
2019-05-18 22:15   ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox