* [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua
@ 2019-05-15 13:33 Vladislav Shpilevoy
2019-05-18 18:41 ` [tarantool-patches] " Vladislav Shpilevoy
2019-05-18 19:12 ` Konstantin Osipov
0 siblings, 2 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2019-05-15 13:33 UTC (permalink / raw)
To: tarantool-patches; +Cc: kostja
Static allocator gives memory blocks from cyclic BSS memory
block of 3 pages 4096 bytes each. It is much faster than
malloc, when a temporary buffer is needed.
This commit exposes the allocator to Lua, which suffers from
lack of ability to pass values by pointers into FFI functions,
nor has a stack to allocate small buffers like 'char[256]'.
Also these allocations complicate and slow down GC job.
Static allocator solves most of these problems provindg a way to
swiftly allocate temporary memory blocks. Besides, for the most
annoying problem with ffi.new(<type>[1]) the buffer module now
exposes a statically allocated array of scalar values able to fit
any 'int', 'char', 'size_t' ... and ready at hand.
A simple micro benchmark showed, that ffi.new() vs
buffer.static_alloc() is ~100 times slower, even on small
allocations of 1Kb, and ~1.5 times slower on tiny allocations
of 10 bytes. The results do not account GC. It is remarkable,
buffer.static_alloc() speed does not depend on size, while
ffi.new() strongly depends.
The commit is motivated by one another place where
ffi.new('int[1]') appeared, in SWIM module, to obtain payload
size as an out parameter of a C function.
---
Branch: https://github.com/tarantool/tarantool/tree/gerold103/static-allocator-lua
| 2 ++
src/CMakeLists.txt | 1 +
src/box/lua/schema.lua | 3 +-
src/lua/buffer.c | 42 +++++++++++++++++++++++++++
src/lua/buffer.lua | 61 ++++++++++++++++++++++++++++++++++++++++
src/lua/crypto.lua | 22 +++++++--------
src/lua/digest.lua | 7 +++--
src/lua/fio.lua | 3 +-
src/lua/init.c | 2 +-
src/lua/msgpackffi.lua | 50 +++++++++++++-------------------
src/lua/socket.lua | 36 ++++++++++++++----------
src/lua/string.lua | 10 ++++---
src/lua/uri.lua | 7 +++--
src/lua/uuid.lua | 9 +++---
test/app/buffer.result | 53 ++++++++++++++++++++++++++++++++++
test/app/buffer.test.lua | 23 +++++++++++++++
16 files changed, 256 insertions(+), 75 deletions(-)
create mode 100644 src/lua/buffer.c
create mode 100644 test/app/buffer.result
create mode 100644 test/app/buffer.test.lua
--git a/extra/exports b/extra/exports
index 4f41a17b3..5375a01e4 100644
--- a/extra/exports
+++ b/extra/exports
@@ -85,6 +85,8 @@ tnt_EVP_MD_CTX_free
tnt_HMAC_CTX_new
tnt_HMAC_CTX_free
+lua_static_aligned_alloc
+
# Module API
_say
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index a6a18142b..492b8712e 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -116,6 +116,7 @@ set (server_sources
lua/utf8.c
lua/info.c
lua/string.c
+ lua/buffer.c
${lua_sources}
${PROJECT_SOURCE_DIR}/third_party/lua-yaml/lyaml.cc
${PROJECT_SOURCE_DIR}/third_party/lua-yaml/b64.c
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index f31cf7f2c..fa6c7c9a4 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -7,6 +7,7 @@ local fun = require('fun')
local log = require('log')
local fio = require('fio')
local json = require('json')
+local static_alloc = require('buffer').static_alloc
local session = box.session
local internal = require('box.internal')
local function setmap(table)
@@ -2103,7 +2104,7 @@ box.schema.user = {}
box.schema.user.password = function(password)
local BUF_SIZE = 128
- local buf = ffi.new("char[?]", BUF_SIZE)
+ local buf = static_alloc('char', BUF_SIZE)
builtin.password_prepare(password, #password, buf, BUF_SIZE)
return ffi.string(buf)
end
diff --git a/src/lua/buffer.c b/src/lua/buffer.c
new file mode 100644
index 000000000..5fd349261
--- /dev/null
+++ b/src/lua/buffer.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the
+ * following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include "small/static.h"
+
+/**
+ * Static inline functions like in static.h can't be exported.
+ * Here they are given a symbol to export.
+ */
+
+void *
+lua_static_aligned_alloc(size_t size, size_t alignment)
+{
+ return static_aligned_alloc(size, alignment);
+}
diff --git a/src/lua/buffer.lua b/src/lua/buffer.lua
index a72d8d1f9..7fd9f46ed 100644
--- a/src/lua/buffer.lua
+++ b/src/lua/buffer.lua
@@ -33,6 +33,37 @@ ibuf_reinit(struct ibuf *ibuf);
void *
ibuf_reserve_slow(struct ibuf *ibuf, size_t size);
+
+void *
+lua_static_aligned_alloc(size_t size, size_t alignment);
+
+/**
+ * Scalar is a buffer to use with FFI functions, which usually
+ * operate with pointers to scalar values like int, char, size_t,
+ * void *. To avoid doing 'ffi.new(<type>[1])' on each such FFI
+ * function invocation, a module can use one of attributes of the
+ * scalar union.
+ *
+ * Naming policy of the attributes is easy to remember:
+ * 'a' for array type + type name first letters + 'p' for pointer.
+ *
+ * For example:
+ * - int[1] - <a>rray of <i>nt - ai;
+ * - const unsigned char *[1] -
+ * <a>rray of <c>onst <u>nsigned <c>har <p> pointer - acucp.
+ */
+union scalar {
+ size_t as[1];
+ void *ap[1];
+ int ai[1];
+ char ac[1];
+ const unsigned char *acucp[1];
+ unsigned long aul[1];
+ uint16_t u16;
+ uint32_t u32;
+ uint64_t u64;
+ int64_t i64;
+};
]]
local builtin = ffi.C
@@ -174,8 +205,38 @@ local function ibuf_new(arg, arg2)
errorf('Usage: ibuf([size])')
end
+--
+-- Allocate a chunk of static BSS memory, or use ordinal ffi.new,
+-- when too big size.
+-- @param type C type - a struct, a basic type, etc. Should be a
+-- string: 'int', 'char *', 'struct tuple', etc.
+-- @param size Optional argument, number of elements of @a type to
+-- allocate. 1 by default.
+-- @return Cdata pointer to @a type.
+--
+local function static_alloc(type, size)
+ size = size or 1
+ local bsize = size * ffi.sizeof(type)
+ local ptr = builtin.lua_static_aligned_alloc(bsize, ffi.alignof(type))
+ if ptr ~= nil then
+ return ffi.cast(type..' *', ptr)
+ end
+ return ffi.new(type..'[?]', size)
+end
+
+--
+-- Sometimes it is wanted to use several temporary scalar cdata
+-- values. Then one union object is not enough - its attributes
+-- share memory.
+--
+local scalar_array = ffi.new('union scalar[?]', 2)
+
return {
ibuf = ibuf_new;
IBUF_SHARED = ffi.C.tarantool_lua_ibuf;
READAHEAD = READAHEAD;
+ static_alloc = static_alloc,
+ scalar_array = scalar_array,
+ -- Fast access when only one variable is needed.
+ scalar = scalar_array[0],
}
diff --git a/src/lua/crypto.lua b/src/lua/crypto.lua
index e76370517..7db6afac9 100644
--- a/src/lua/crypto.lua
+++ b/src/lua/crypto.lua
@@ -2,6 +2,7 @@
local ffi = require('ffi')
local buffer = require('buffer')
+local scalar = buffer.scalar
ffi.cdef[[
int tnt_openssl_init(void);
@@ -84,7 +85,6 @@ local function digest_new(digest)
digest = digest,
buf = buffer.ibuf(64),
initialized = false,
- outl = ffi.new('int[1]')
}, digest_mt)
self:init()
return self
@@ -114,10 +114,10 @@ local function digest_final(self)
return error('Digest not initialized')
end
self.initialized = false
- if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, self.outl) ~= 1 then
+ if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, scalar.ai) ~= 1 then
return error('Can\'t finalize digest: ' .. openssl_err_str())
end
- return ffi.string(self.buf.wpos, self.outl[0])
+ return ffi.string(self.buf.wpos, scalar.ai[0])
end
local function digest_free(self)
@@ -156,9 +156,7 @@ local function hmac_new(digest, key)
local self = setmetatable({
ctx = ctx,
digest = digest,
- buf = buffer.ibuf(64),
initialized = false,
- outl = ffi.new('int[1]')
}, hmac_mt)
self:init(key)
return self
@@ -188,10 +186,11 @@ local function hmac_final(self)
return error('HMAC not initialized')
end
self.initialized = false
- if ffi.C.HMAC_Final(self.ctx, self.buf.wpos, self.outl) ~= 1 then
+ local buf = buffer.static_alloc('char', 64)
+ if ffi.C.HMAC_Final(self.ctx, buf, scalar.ai) ~= 1 then
return error('Can\'t finalize HMAC: ' .. openssl_err_str())
end
- return ffi.string(self.buf.wpos, self.outl[0])
+ return ffi.string(buf, scalar.ai[0])
end
local function hmac_free(self)
@@ -254,7 +253,6 @@ local function cipher_new(cipher, key, iv, direction)
direction = direction,
buf = buffer.ibuf(),
initialized = false,
- outl = ffi.new('int[1]')
}, cipher_mt)
self:init(key, iv)
return self
@@ -279,10 +277,10 @@ local function cipher_update(self, input)
error("Usage: cipher:update(string)")
end
local wpos = self.buf:reserve(input:len() + self.block_size - 1)
- if ffi.C.EVP_CipherUpdate(self.ctx, wpos, self.outl, input, input:len()) ~= 1 then
+ if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then
return error('Can\'t update cipher:' .. openssl_err_str())
end
- return ffi.string(wpos, self.outl[0])
+ return ffi.string(wpos, scalar.ai[0])
end
local function cipher_final(self)
@@ -291,11 +289,11 @@ local function cipher_final(self)
end
self.initialized = false
local wpos = self.buf:reserve(self.block_size - 1)
- if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, self.outl) ~= 1 then
+ if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, scalar.ai) ~= 1 then
return error('Can\'t finalize cipher:' .. openssl_err_str())
end
self.initialized = false
- return ffi.string(wpos, self.outl[0])
+ return ffi.string(wpos, scalar.ai[0])
end
local function cipher_free(self)
diff --git a/src/lua/digest.lua b/src/lua/digest.lua
index 8f199c0af..6ed91cfa2 100644
--- a/src/lua/digest.lua
+++ b/src/lua/digest.lua
@@ -3,6 +3,7 @@
local ffi = require('ffi')
local crypto = require('crypto')
local bit = require('bit')
+local static_alloc = require('buffer').static_alloc
ffi.cdef[[
/* internal implementation */
@@ -179,7 +180,7 @@ local m = {
end
local blen = #bin
local slen = ffi.C.base64_bufsize(blen, mask)
- local str = ffi.new('char[?]', slen)
+ local str = static_alloc('char', slen)
local len = ffi.C.base64_encode(bin, blen, str, slen, mask)
return ffi.string(str, len)
end,
@@ -190,7 +191,7 @@ local m = {
end
local slen = #str
local blen = math.ceil(slen * 3 / 4)
- local bin = ffi.new('char[?]', blen)
+ local bin = static_alloc('char', blen)
local len = ffi.C.base64_decode(str, slen, bin, blen)
return ffi.string(bin, len)
end,
@@ -228,7 +229,7 @@ local m = {
if n == nil then
error('Usage: digest.urandom(len)')
end
- local buf = ffi.new('char[?]', n)
+ local buf = static_alloc('char', n)
ffi.C.random_bytes(buf, n)
return ffi.string(buf, n)
end,
diff --git a/src/lua/fio.lua b/src/lua/fio.lua
index 38664a556..fce79c277 100644
--- a/src/lua/fio.lua
+++ b/src/lua/fio.lua
@@ -275,8 +275,7 @@ fio.dirname = function(path)
if type(path) ~= 'string' then
error("Usage: fio.dirname(path)")
end
- path = ffi.new('char[?]', #path + 1, path)
- return ffi.string(ffi.C.dirname(path))
+ return ffi.string(ffi.C.dirname(ffi.cast('char *', path)))
end
fio.umask = function(umask)
diff --git a/src/lua/init.c b/src/lua/init.c
index be164bdfb..303369841 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -126,9 +126,9 @@ static const char *lua_modules[] = {
"errno", errno_lua,
"fiber", fiber_lua,
"env", env_lua,
+ "buffer", buffer_lua,
"string", string_lua,
"table", table_lua,
- "buffer", buffer_lua,
"msgpackffi", msgpackffi_lua,
"crypto", crypto_lua,
"digest", digest_lua,
diff --git a/src/lua/msgpackffi.lua b/src/lua/msgpackffi.lua
index 609334cff..1e4786707 100644
--- a/src/lua/msgpackffi.lua
+++ b/src/lua/msgpackffi.lua
@@ -21,19 +21,10 @@ float
mp_decode_float(const char **data);
double
mp_decode_double(const char **data);
-union tmpint {
- uint16_t u16;
- uint32_t u32;
- uint64_t u64;
-};
]])
local strict_alignment = (jit.arch == 'arm')
-
-local tmpint
-if strict_alignment then
- tmpint = ffi.new('union tmpint[1]')
-end
+local scalar = buffer.scalar
local function bswap_u16(num)
return bit.rshift(bit.bswap(tonumber(num)), 16)
@@ -70,10 +61,10 @@ end
local encode_u16
if strict_alignment then
encode_u16 = function(buf, code, num)
- tmpint[0].u16 = bswap_u16(num)
+ scalar.u16 = bswap_u16(num)
local p = buf:alloc(3)
p[0] = code
- ffi.copy(p + 1, tmpint, 2)
+ ffi.copy(p + 1, scalar, 2)
end
else
encode_u16 = function(buf, code, num)
@@ -86,11 +77,10 @@ end
local encode_u32
if strict_alignment then
encode_u32 = function(buf, code, num)
- tmpint[0].u32 =
- ffi.cast('uint32_t', bit.bswap(tonumber(num)))
+ scalar.u32 = ffi.cast('uint32_t', bit.bswap(tonumber(num)))
local p = buf:alloc(5)
p[0] = code
- ffi.copy(p + 1, tmpint, 4)
+ ffi.copy(p + 1, scalar, 4)
end
else
encode_u32 = function(buf, code, num)
@@ -104,10 +94,10 @@ end
local encode_u64
if strict_alignment then
encode_u64 = function(buf, code, num)
- tmpint[0].u64 = bit.bswap(ffi.cast('uint64_t', num))
+ scalar.u64 = bit.bswap(ffi.cast('uint64_t', num))
local p = buf:alloc(9)
p[0] = code
- ffi.copy(p + 1, tmpint, 8)
+ ffi.copy(p + 1, scalar, 8)
end
else
encode_u64 = function(buf, code, num)
@@ -324,9 +314,9 @@ end
local decode_u16
if strict_alignment then
decode_u16 = function(data)
- ffi.copy(tmpint, data[0], 2)
+ ffi.copy(scalar, data[0], 2)
data[0] = data[0] + 2
- return tonumber(bswap_u16(tmpint[0].u16))
+ return tonumber(bswap_u16(scalar.u16))
end
else
decode_u16 = function(data)
@@ -339,10 +329,10 @@ end
local decode_u32
if strict_alignment then
decode_u32 = function(data)
- ffi.copy(tmpint, data[0], 4)
+ ffi.copy(scalar, data[0], 4)
data[0] = data[0] + 4
return tonumber(
- ffi.cast('uint32_t', bit.bswap(tonumber(tmpint[0].u32))))
+ ffi.cast('uint32_t', bit.bswap(tonumber(scalar.u32))))
end
else
decode_u32 = function(data)
@@ -356,9 +346,9 @@ end
local decode_u64
if strict_alignment then
decode_u64 = function(data)
- ffi.copy(tmpint, data[0], 8);
+ ffi.copy(scalar, data[0], 8);
data[0] = data[0] + 8
- local num = bit.bswap(tmpint[0].u64)
+ local num = bit.bswap(scalar.u64)
if num <= DBL_INT_MAX then
return tonumber(num) -- return as 'number'
end
@@ -385,8 +375,8 @@ end
local decode_i16
if strict_alignment then
decode_i16 = function(data)
- ffi.copy(tmpint, data[0], 2)
- local num = bswap_u16(tmpint[0].u16)
+ ffi.copy(scalar, data[0], 2)
+ local num = bswap_u16(scalar.u16)
data[0] = data[0] + 2
-- note: this double cast is actually necessary
return tonumber(ffi.cast('int16_t', ffi.cast('uint16_t', num)))
@@ -403,8 +393,8 @@ end
local decode_i32
if strict_alignment then
decode_i32 = function(data)
- ffi.copy(tmpint, data[0], 4)
- local num = bit.bswap(tonumber(tmpint[0].u32))
+ ffi.copy(scalar, data[0], 4)
+ local num = bit.bswap(tonumber(scalar.u32))
data[0] = data[0] + 4
return num
end
@@ -419,9 +409,9 @@ end
local decode_i64
if strict_alignment then
decode_i64 = function(data)
- ffi.copy(tmpint, data[0], 8)
+ ffi.copy(scalar, data[0], 8)
data[0] = data[0] + 8
- local num = bit.bswap(ffi.cast('int64_t', tmpint[0].u64))
+ local num = bit.bswap(scalar.i64)
if num >= -DBL_INT_MAX and num <= DBL_INT_MAX then
return tonumber(num) -- return as 'number'
end
@@ -552,7 +542,7 @@ end
-- element. It is significally faster on LuaJIT to use double pointer than
-- return result, newpos.
--
-local bufp = ffi.new('const unsigned char *[1]');
+local bufp = scalar.acucp;
local function check_offset(offset, len)
if offset == nil then
diff --git a/src/lua/socket.lua b/src/lua/socket.lua
index b2700e0c0..1ddafee2f 100644
--- a/src/lua/socket.lua
+++ b/src/lua/socket.lua
@@ -10,6 +10,9 @@ local fiber = require('fiber')
local fio = require('fio')
local log = require('log')
local buffer = require('buffer')
+local scalar = buffer.scalar
+local scalar_array = buffer.scalar_array
+local static_alloc = buffer.static_alloc
local format = string.format
@@ -472,9 +475,9 @@ local function socket_setsockopt(self, level, name, value)
end
if info.type == 1 then
- local value = ffi.new("int[1]", value)
+ scalar.ai[0] = value
local res = ffi.C.setsockopt(fd,
- level, info.iname, value, ffi.sizeof('int'))
+ level, info.iname, scalar.ai, ffi.sizeof('int'))
if res < 0 then
self._errno = boxerrno()
@@ -516,8 +519,10 @@ local function socket_getsockopt(self, level, name)
self._errno = nil
if info.type == 1 then
- local value = ffi.new("int[1]", 0)
- local len = ffi.new("size_t[1]", ffi.sizeof('int'))
+ local value = scalar_array[0].ai
+ value[0] = 0
+ local len = scalar_array[1].as
+ len[0] = ffi.sizeof('int')
local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
if res < 0 then
@@ -532,8 +537,9 @@ local function socket_getsockopt(self, level, name)
end
if info.type == 2 then
- local value = ffi.new("char[256]", { 0 })
- local len = ffi.new("size_t[1]", 256)
+ local value = static_alloc('char', 256)
+ local len = scalar.as
+ len[0] = 256
local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
if res < 0 then
self._errno = boxerrno()
@@ -555,8 +561,9 @@ local function socket_linger(self, active, timeout)
local info = internal.SO_OPT[level].SO_LINGER
self._errno = nil
if active == nil then
- local value = ffi.new("linger_t[1]")
- local len = ffi.new("size_t[1]", 2 * ffi.sizeof('int'))
+ local value = static_alloc('linger_t')
+ local len = scalar.as
+ len[0] = 2 * ffi.sizeof('linger_t')
local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
if res < 0 then
self._errno = boxerrno()
@@ -581,9 +588,10 @@ local function socket_linger(self, active, timeout)
iactive = 0
end
- local value = ffi.new("linger_t[1]",
- { { active = iactive, timeout = timeout } })
- local len = 2 * ffi.sizeof('int')
+ local value = static_alloc('linger_t')
+ value[0].active = iactive
+ value[0].timeout = timeout
+ local len = ffi.sizeof('linger_t')
local res = ffi.C.setsockopt(fd, level, info.iname, value, len)
if res < 0 then
self._errno = boxerrno()
@@ -798,8 +806,7 @@ local function get_recv_size(self, size)
-- them using message peek.
local iflags = get_iflags(internal.SEND_FLAGS, {'MSG_PEEK'})
assert(iflags ~= nil)
- local buf = ffi.new('char[?]', 1)
- size = tonumber(ffi.C.recv(fd, buf, 1, iflags))
+ size = tonumber(ffi.C.recv(fd, scalar.ac, 1, iflags))
-- Prevent race condition: proceed with the case when
-- a datagram of length > 0 has been arrived after the
-- getsockopt call above.
@@ -836,8 +843,7 @@ local function socket_recv(self, size, flags)
end
self._errno = nil
- local buf = ffi.new("char[?]", size)
-
+ local buf = static_alloc('char', size)
local res = ffi.C.recv(fd, buf, size, iflags)
if res == -1 then
diff --git a/src/lua/string.lua b/src/lua/string.lua
index 8216ace6a..6ed6be8bf 100644
--- a/src/lua/string.lua
+++ b/src/lua/string.lua
@@ -1,4 +1,6 @@
local ffi = require('ffi')
+local buffer = require('buffer')
+local static_alloc = buffer.static_alloc
ffi.cdef[[
const char *
@@ -13,8 +15,8 @@ ffi.cdef[[
]]
local c_char_ptr = ffi.typeof('const char *')
-local strip_newstart = ffi.new("unsigned long[1]")
-local strip_newlen = ffi.new("unsigned long[1]")
+local strip_newstart = buffer.scalar_array[0].aul
+local strip_newlen = buffer.scalar_array[1].aul
local memcmp = ffi.C.memcmp
local memmem = ffi.C.memmem
@@ -290,7 +292,7 @@ local function string_hex(inp)
error(err_string_arg:format(1, 'string.hex', 'string', type(inp)), 2)
end
local len = inp:len() * 2
- local res = ffi.new('char[?]', len + 1)
+ local res = static_alloc('char', len + 1)
local uinp = ffi.cast('const unsigned char *', inp)
for i = 0, inp:len() - 1 do
@@ -333,7 +335,7 @@ local function string_fromhex(inp)
end
local len = inp:len() / 2
local casted_inp = ffi.cast('const char *', inp)
- local res = ffi.new('char[?]', len)
+ local res = static_alloc('char', len)
for i = 0, len - 1 do
local first = hexadecimals_mapping[casted_inp[i * 2]]
local second = hexadecimals_mapping[casted_inp[i * 2 + 1]]
diff --git a/src/lua/uri.lua b/src/lua/uri.lua
index d2946cd2d..5967c8bf2 100644
--- a/src/lua/uri.lua
+++ b/src/lua/uri.lua
@@ -1,6 +1,7 @@
-- uri.lua (internal file)
local ffi = require('ffi')
+local static_alloc = require('buffer').static_alloc
ffi.cdef[[
struct uri {
@@ -32,12 +33,11 @@ uri_format(char *str, size_t len, struct uri *uri, bool write_password);
local builtin = ffi.C;
-local uribuf = ffi.new('struct uri')
-
local function parse(str)
if str == nil then
error("Usage: uri.parse(string)")
end
+ local uribuf = static_alloc('struct uri')
if builtin.uri_parse(uribuf, str) ~= 0 then
return nil
end
@@ -59,6 +59,7 @@ local function parse(str)
end
local function format(uri, write_password)
+ local uribuf = static_alloc('struct uri')
uribuf.scheme = uri.scheme
uribuf.scheme_len = string.len(uri.scheme or '')
uribuf.login = uri.login
@@ -75,7 +76,7 @@ local function format(uri, write_password)
uribuf.query_len = string.len(uri.query or '')
uribuf.fragment = uri.fragment
uribuf.fragment_len = string.len(uri.fragment or '')
- local str = ffi.new('char[1024]')
+ local str = static_alloc('char', 1024)
builtin.uri_format(str, 1024, uribuf, write_password and 1 or 0)
return ffi.string(str)
end
diff --git a/src/lua/uuid.lua b/src/lua/uuid.lua
index 956ad6e36..f8418cf4d 100644
--- a/src/lua/uuid.lua
+++ b/src/lua/uuid.lua
@@ -1,6 +1,7 @@
-- uuid.lua (internal file)
local ffi = require("ffi")
+local static_alloc = require('buffer').static_alloc
local builtin = ffi.C
ffi.cdef[[
@@ -33,7 +34,6 @@ extern const struct tt_uuid uuid_nil;
local uuid_t = ffi.typeof('struct tt_uuid')
local UUID_STR_LEN = 36
local UUID_LEN = ffi.sizeof(uuid_t)
-local uuidbuf = ffi.new(uuid_t)
local uuid_tostring = function(uu)
if not ffi.istype(uuid_t, uu) then
@@ -69,9 +69,8 @@ local uuid_tobin = function(uu, byteorder)
return error('Usage: uuid:bin([byteorder])')
end
if need_bswap(byteorder) then
- if uu ~= uuidbuf then
- ffi.copy(uuidbuf, uu, UUID_LEN)
- end
+ local uuidbuf = static_alloc('struct tt_uuid')
+ ffi.copy(uuidbuf, uu, UUID_LEN)
builtin.tt_uuid_bswap(uuidbuf)
return ffi.string(ffi.cast('char *', uuidbuf), UUID_LEN)
end
@@ -114,10 +113,12 @@ local uuid_new = function()
end
local uuid_new_bin = function(byteorder)
+ local uuidbuf = static_alloc('struct tt_uuid')
builtin.tt_uuid_create(uuidbuf)
return uuid_tobin(uuidbuf, byteorder)
end
local uuid_new_str = function()
+ local uuidbuf = static_alloc('struct tt_uuid')
builtin.tt_uuid_create(uuidbuf)
return uuid_tostring(uuidbuf)
end
diff --git a/test/app/buffer.result b/test/app/buffer.result
new file mode 100644
index 000000000..10494b812
--- /dev/null
+++ b/test/app/buffer.result
@@ -0,0 +1,53 @@
+test_run = require('test_run').new()
+---
+...
+buffer = require('buffer')
+---
+...
+ffi = require('ffi')
+---
+...
+-- Scalar.
+scalar = buffer.scalar
+---
+...
+scalar.u16 = 100
+---
+...
+u16 = ffi.new('uint16_t[1]')
+---
+...
+ffi.copy(u16, scalar, 2)
+---
+...
+u16[0]
+---
+- 100
+...
+u16[0] = 200
+---
+...
+ffi.copy(scalar, u16, 2)
+---
+...
+scalar.u16
+---
+- 200
+...
+-- Alignment.
+_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
+---
+...
+p = buffer.static_alloc('int')
+---
+...
+ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
+---
+- true
+...
+-- Able to allocate bigger than static buffer - such allocations
+-- are on the heap.
+type(buffer.static_alloc('char', 13000))
+---
+- cdata
+...
diff --git a/test/app/buffer.test.lua b/test/app/buffer.test.lua
new file mode 100644
index 000000000..6b1a974ca
--- /dev/null
+++ b/test/app/buffer.test.lua
@@ -0,0 +1,23 @@
+test_run = require('test_run').new()
+buffer = require('buffer')
+ffi = require('ffi')
+
+-- Scalar.
+scalar = buffer.scalar
+scalar.u16 = 100
+u16 = ffi.new('uint16_t[1]')
+ffi.copy(u16, scalar, 2)
+u16[0]
+
+u16[0] = 200
+ffi.copy(scalar, u16, 2)
+scalar.u16
+
+-- Alignment.
+_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
+p = buffer.static_alloc('int')
+ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
+
+-- Able to allocate bigger than static buffer - such allocations
+-- are on the heap.
+type(buffer.static_alloc('char', 13000))
--
2.20.1 (Apple Git-117)
^ permalink raw reply [flat|nested] 4+ messages in thread
* [tarantool-patches] Re: [PATCH 1/1] buffer: port static allocator to Lua
2019-05-15 13:33 [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua Vladislav Shpilevoy
@ 2019-05-18 18:41 ` Vladislav Shpilevoy
2019-05-18 19:12 ` Konstantin Osipov
1 sibling, 0 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2019-05-18 18:41 UTC (permalink / raw)
To: tarantool-patches; +Cc: kostja
Still waiting a review.
On 15/05/2019 16:33, Vladislav Shpilevoy wrote:
> Static allocator gives memory blocks from cyclic BSS memory
> block of 3 pages 4096 bytes each. It is much faster than
> malloc, when a temporary buffer is needed.
>
> This commit exposes the allocator to Lua, which suffers from
> lack of ability to pass values by pointers into FFI functions,
> nor has a stack to allocate small buffers like 'char[256]'.
> Also these allocations complicate and slow down GC job.
>
> Static allocator solves most of these problems provindg a way to
> swiftly allocate temporary memory blocks. Besides, for the most
> annoying problem with ffi.new(<type>[1]) the buffer module now
> exposes a statically allocated array of scalar values able to fit
> any 'int', 'char', 'size_t' ... and ready at hand.
>
> A simple micro benchmark showed, that ffi.new() vs
> buffer.static_alloc() is ~100 times slower, even on small
> allocations of 1Kb, and ~1.5 times slower on tiny allocations
> of 10 bytes. The results do not account GC. It is remarkable,
> buffer.static_alloc() speed does not depend on size, while
> ffi.new() strongly depends.
>
> The commit is motivated by one another place where
> ffi.new('int[1]') appeared, in SWIM module, to obtain payload
> size as an out parameter of a C function.
> ---
> Branch: https://github.com/tarantool/tarantool/tree/gerold103/static-allocator-lua
>
> extra/exports | 2 ++
> src/CMakeLists.txt | 1 +
> src/box/lua/schema.lua | 3 +-
> src/lua/buffer.c | 42 +++++++++++++++++++++++++++
> src/lua/buffer.lua | 61 ++++++++++++++++++++++++++++++++++++++++
> src/lua/crypto.lua | 22 +++++++--------
> src/lua/digest.lua | 7 +++--
> src/lua/fio.lua | 3 +-
> src/lua/init.c | 2 +-
> src/lua/msgpackffi.lua | 50 +++++++++++++-------------------
> src/lua/socket.lua | 36 ++++++++++++++----------
> src/lua/string.lua | 10 ++++---
> src/lua/uri.lua | 7 +++--
> src/lua/uuid.lua | 9 +++---
> test/app/buffer.result | 53 ++++++++++++++++++++++++++++++++++
> test/app/buffer.test.lua | 23 +++++++++++++++
> 16 files changed, 256 insertions(+), 75 deletions(-)
> create mode 100644 src/lua/buffer.c
> create mode 100644 test/app/buffer.result
> create mode 100644 test/app/buffer.test.lua
>
> diff --git a/extra/exports b/extra/exports
> index 4f41a17b3..5375a01e4 100644
> --- a/extra/exports
> +++ b/extra/exports
> @@ -85,6 +85,8 @@ tnt_EVP_MD_CTX_free
> tnt_HMAC_CTX_new
> tnt_HMAC_CTX_free
>
> +lua_static_aligned_alloc
> +
> # Module API
>
> _say
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
> index a6a18142b..492b8712e 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -116,6 +116,7 @@ set (server_sources
> lua/utf8.c
> lua/info.c
> lua/string.c
> + lua/buffer.c
> ${lua_sources}
> ${PROJECT_SOURCE_DIR}/third_party/lua-yaml/lyaml.cc
> ${PROJECT_SOURCE_DIR}/third_party/lua-yaml/b64.c
> diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
> index f31cf7f2c..fa6c7c9a4 100644
> --- a/src/box/lua/schema.lua
> +++ b/src/box/lua/schema.lua
> @@ -7,6 +7,7 @@ local fun = require('fun')
> local log = require('log')
> local fio = require('fio')
> local json = require('json')
> +local static_alloc = require('buffer').static_alloc
> local session = box.session
> local internal = require('box.internal')
> local function setmap(table)
> @@ -2103,7 +2104,7 @@ box.schema.user = {}
>
> box.schema.user.password = function(password)
> local BUF_SIZE = 128
> - local buf = ffi.new("char[?]", BUF_SIZE)
> + local buf = static_alloc('char', BUF_SIZE)
> builtin.password_prepare(password, #password, buf, BUF_SIZE)
> return ffi.string(buf)
> end
> diff --git a/src/lua/buffer.c b/src/lua/buffer.c
> new file mode 100644
> index 000000000..5fd349261
> --- /dev/null
> +++ b/src/lua/buffer.c
> @@ -0,0 +1,42 @@
> +/*
> + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the
> + * following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "small/static.h"
> +
> +/**
> + * Static inline functions like in static.h can't be exported.
> + * Here they are given a symbol to export.
> + */
> +
> +void *
> +lua_static_aligned_alloc(size_t size, size_t alignment)
> +{
> + return static_aligned_alloc(size, alignment);
> +}
> diff --git a/src/lua/buffer.lua b/src/lua/buffer.lua
> index a72d8d1f9..7fd9f46ed 100644
> --- a/src/lua/buffer.lua
> +++ b/src/lua/buffer.lua
> @@ -33,6 +33,37 @@ ibuf_reinit(struct ibuf *ibuf);
>
> void *
> ibuf_reserve_slow(struct ibuf *ibuf, size_t size);
> +
> +void *
> +lua_static_aligned_alloc(size_t size, size_t alignment);
> +
> +/**
> + * Scalar is a buffer to use with FFI functions, which usually
> + * operate with pointers to scalar values like int, char, size_t,
> + * void *. To avoid doing 'ffi.new(<type>[1])' on each such FFI
> + * function invocation, a module can use one of attributes of the
> + * scalar union.
> + *
> + * Naming policy of the attributes is easy to remember:
> + * 'a' for array type + type name first letters + 'p' for pointer.
> + *
> + * For example:
> + * - int[1] - <a>rray of <i>nt - ai;
> + * - const unsigned char *[1] -
> + * <a>rray of <c>onst <u>nsigned <c>har <p> pointer - acucp.
> + */
> +union scalar {
> + size_t as[1];
> + void *ap[1];
> + int ai[1];
> + char ac[1];
> + const unsigned char *acucp[1];
> + unsigned long aul[1];
> + uint16_t u16;
> + uint32_t u32;
> + uint64_t u64;
> + int64_t i64;
> +};
> ]]
>
> local builtin = ffi.C
> @@ -174,8 +205,38 @@ local function ibuf_new(arg, arg2)
> errorf('Usage: ibuf([size])')
> end
>
> +--
> +-- Allocate a chunk of static BSS memory, or use ordinal ffi.new,
> +-- when too big size.
> +-- @param type C type - a struct, a basic type, etc. Should be a
> +-- string: 'int', 'char *', 'struct tuple', etc.
> +-- @param size Optional argument, number of elements of @a type to
> +-- allocate. 1 by default.
> +-- @return Cdata pointer to @a type.
> +--
> +local function static_alloc(type, size)
> + size = size or 1
> + local bsize = size * ffi.sizeof(type)
> + local ptr = builtin.lua_static_aligned_alloc(bsize, ffi.alignof(type))
> + if ptr ~= nil then
> + return ffi.cast(type..' *', ptr)
> + end
> + return ffi.new(type..'[?]', size)
> +end
> +
> +--
> +-- Sometimes it is wanted to use several temporary scalar cdata
> +-- values. Then one union object is not enough - its attributes
> +-- share memory.
> +--
> +local scalar_array = ffi.new('union scalar[?]', 2)
> +
> return {
> ibuf = ibuf_new;
> IBUF_SHARED = ffi.C.tarantool_lua_ibuf;
> READAHEAD = READAHEAD;
> + static_alloc = static_alloc,
> + scalar_array = scalar_array,
> + -- Fast access when only one variable is needed.
> + scalar = scalar_array[0],
> }
> diff --git a/src/lua/crypto.lua b/src/lua/crypto.lua
> index e76370517..7db6afac9 100644
> --- a/src/lua/crypto.lua
> +++ b/src/lua/crypto.lua
> @@ -2,6 +2,7 @@
>
> local ffi = require('ffi')
> local buffer = require('buffer')
> +local scalar = buffer.scalar
>
> ffi.cdef[[
> int tnt_openssl_init(void);
> @@ -84,7 +85,6 @@ local function digest_new(digest)
> digest = digest,
> buf = buffer.ibuf(64),
> initialized = false,
> - outl = ffi.new('int[1]')
> }, digest_mt)
> self:init()
> return self
> @@ -114,10 +114,10 @@ local function digest_final(self)
> return error('Digest not initialized')
> end
> self.initialized = false
> - if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, self.outl) ~= 1 then
> + if ffi.C.EVP_DigestFinal_ex(self.ctx, self.buf.wpos, scalar.ai) ~= 1 then
> return error('Can\'t finalize digest: ' .. openssl_err_str())
> end
> - return ffi.string(self.buf.wpos, self.outl[0])
> + return ffi.string(self.buf.wpos, scalar.ai[0])
> end
>
> local function digest_free(self)
> @@ -156,9 +156,7 @@ local function hmac_new(digest, key)
> local self = setmetatable({
> ctx = ctx,
> digest = digest,
> - buf = buffer.ibuf(64),
> initialized = false,
> - outl = ffi.new('int[1]')
> }, hmac_mt)
> self:init(key)
> return self
> @@ -188,10 +186,11 @@ local function hmac_final(self)
> return error('HMAC not initialized')
> end
> self.initialized = false
> - if ffi.C.HMAC_Final(self.ctx, self.buf.wpos, self.outl) ~= 1 then
> + local buf = buffer.static_alloc('char', 64)
> + if ffi.C.HMAC_Final(self.ctx, buf, scalar.ai) ~= 1 then
> return error('Can\'t finalize HMAC: ' .. openssl_err_str())
> end
> - return ffi.string(self.buf.wpos, self.outl[0])
> + return ffi.string(buf, scalar.ai[0])
> end
>
> local function hmac_free(self)
> @@ -254,7 +253,6 @@ local function cipher_new(cipher, key, iv, direction)
> direction = direction,
> buf = buffer.ibuf(),
> initialized = false,
> - outl = ffi.new('int[1]')
> }, cipher_mt)
> self:init(key, iv)
> return self
> @@ -279,10 +277,10 @@ local function cipher_update(self, input)
> error("Usage: cipher:update(string)")
> end
> local wpos = self.buf:reserve(input:len() + self.block_size - 1)
> - if ffi.C.EVP_CipherUpdate(self.ctx, wpos, self.outl, input, input:len()) ~= 1 then
> + if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then
> return error('Can\'t update cipher:' .. openssl_err_str())
> end
> - return ffi.string(wpos, self.outl[0])
> + return ffi.string(wpos, scalar.ai[0])
> end
>
> local function cipher_final(self)
> @@ -291,11 +289,11 @@ local function cipher_final(self)
> end
> self.initialized = false
> local wpos = self.buf:reserve(self.block_size - 1)
> - if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, self.outl) ~= 1 then
> + if ffi.C.EVP_CipherFinal_ex(self.ctx, wpos, scalar.ai) ~= 1 then
> return error('Can\'t finalize cipher:' .. openssl_err_str())
> end
> self.initialized = false
> - return ffi.string(wpos, self.outl[0])
> + return ffi.string(wpos, scalar.ai[0])
> end
>
> local function cipher_free(self)
> diff --git a/src/lua/digest.lua b/src/lua/digest.lua
> index 8f199c0af..6ed91cfa2 100644
> --- a/src/lua/digest.lua
> +++ b/src/lua/digest.lua
> @@ -3,6 +3,7 @@
> local ffi = require('ffi')
> local crypto = require('crypto')
> local bit = require('bit')
> +local static_alloc = require('buffer').static_alloc
>
> ffi.cdef[[
> /* internal implementation */
> @@ -179,7 +180,7 @@ local m = {
> end
> local blen = #bin
> local slen = ffi.C.base64_bufsize(blen, mask)
> - local str = ffi.new('char[?]', slen)
> + local str = static_alloc('char', slen)
> local len = ffi.C.base64_encode(bin, blen, str, slen, mask)
> return ffi.string(str, len)
> end,
> @@ -190,7 +191,7 @@ local m = {
> end
> local slen = #str
> local blen = math.ceil(slen * 3 / 4)
> - local bin = ffi.new('char[?]', blen)
> + local bin = static_alloc('char', blen)
> local len = ffi.C.base64_decode(str, slen, bin, blen)
> return ffi.string(bin, len)
> end,
> @@ -228,7 +229,7 @@ local m = {
> if n == nil then
> error('Usage: digest.urandom(len)')
> end
> - local buf = ffi.new('char[?]', n)
> + local buf = static_alloc('char', n)
> ffi.C.random_bytes(buf, n)
> return ffi.string(buf, n)
> end,
> diff --git a/src/lua/fio.lua b/src/lua/fio.lua
> index 38664a556..fce79c277 100644
> --- a/src/lua/fio.lua
> +++ b/src/lua/fio.lua
> @@ -275,8 +275,7 @@ fio.dirname = function(path)
> if type(path) ~= 'string' then
> error("Usage: fio.dirname(path)")
> end
> - path = ffi.new('char[?]', #path + 1, path)
> - return ffi.string(ffi.C.dirname(path))
> + return ffi.string(ffi.C.dirname(ffi.cast('char *', path)))
> end
>
> fio.umask = function(umask)
> diff --git a/src/lua/init.c b/src/lua/init.c
> index be164bdfb..303369841 100644
> --- a/src/lua/init.c
> +++ b/src/lua/init.c
> @@ -126,9 +126,9 @@ static const char *lua_modules[] = {
> "errno", errno_lua,
> "fiber", fiber_lua,
> "env", env_lua,
> + "buffer", buffer_lua,
> "string", string_lua,
> "table", table_lua,
> - "buffer", buffer_lua,
> "msgpackffi", msgpackffi_lua,
> "crypto", crypto_lua,
> "digest", digest_lua,
> diff --git a/src/lua/msgpackffi.lua b/src/lua/msgpackffi.lua
> index 609334cff..1e4786707 100644
> --- a/src/lua/msgpackffi.lua
> +++ b/src/lua/msgpackffi.lua
> @@ -21,19 +21,10 @@ float
> mp_decode_float(const char **data);
> double
> mp_decode_double(const char **data);
> -union tmpint {
> - uint16_t u16;
> - uint32_t u32;
> - uint64_t u64;
> -};
> ]])
>
> local strict_alignment = (jit.arch == 'arm')
> -
> -local tmpint
> -if strict_alignment then
> - tmpint = ffi.new('union tmpint[1]')
> -end
> +local scalar = buffer.scalar
>
> local function bswap_u16(num)
> return bit.rshift(bit.bswap(tonumber(num)), 16)
> @@ -70,10 +61,10 @@ end
> local encode_u16
> if strict_alignment then
> encode_u16 = function(buf, code, num)
> - tmpint[0].u16 = bswap_u16(num)
> + scalar.u16 = bswap_u16(num)
> local p = buf:alloc(3)
> p[0] = code
> - ffi.copy(p + 1, tmpint, 2)
> + ffi.copy(p + 1, scalar, 2)
> end
> else
> encode_u16 = function(buf, code, num)
> @@ -86,11 +77,10 @@ end
> local encode_u32
> if strict_alignment then
> encode_u32 = function(buf, code, num)
> - tmpint[0].u32 =
> - ffi.cast('uint32_t', bit.bswap(tonumber(num)))
> + scalar.u32 = ffi.cast('uint32_t', bit.bswap(tonumber(num)))
> local p = buf:alloc(5)
> p[0] = code
> - ffi.copy(p + 1, tmpint, 4)
> + ffi.copy(p + 1, scalar, 4)
> end
> else
> encode_u32 = function(buf, code, num)
> @@ -104,10 +94,10 @@ end
> local encode_u64
> if strict_alignment then
> encode_u64 = function(buf, code, num)
> - tmpint[0].u64 = bit.bswap(ffi.cast('uint64_t', num))
> + scalar.u64 = bit.bswap(ffi.cast('uint64_t', num))
> local p = buf:alloc(9)
> p[0] = code
> - ffi.copy(p + 1, tmpint, 8)
> + ffi.copy(p + 1, scalar, 8)
> end
> else
> encode_u64 = function(buf, code, num)
> @@ -324,9 +314,9 @@ end
> local decode_u16
> if strict_alignment then
> decode_u16 = function(data)
> - ffi.copy(tmpint, data[0], 2)
> + ffi.copy(scalar, data[0], 2)
> data[0] = data[0] + 2
> - return tonumber(bswap_u16(tmpint[0].u16))
> + return tonumber(bswap_u16(scalar.u16))
> end
> else
> decode_u16 = function(data)
> @@ -339,10 +329,10 @@ end
> local decode_u32
> if strict_alignment then
> decode_u32 = function(data)
> - ffi.copy(tmpint, data[0], 4)
> + ffi.copy(scalar, data[0], 4)
> data[0] = data[0] + 4
> return tonumber(
> - ffi.cast('uint32_t', bit.bswap(tonumber(tmpint[0].u32))))
> + ffi.cast('uint32_t', bit.bswap(tonumber(scalar.u32))))
> end
> else
> decode_u32 = function(data)
> @@ -356,9 +346,9 @@ end
> local decode_u64
> if strict_alignment then
> decode_u64 = function(data)
> - ffi.copy(tmpint, data[0], 8);
> + ffi.copy(scalar, data[0], 8);
> data[0] = data[0] + 8
> - local num = bit.bswap(tmpint[0].u64)
> + local num = bit.bswap(scalar.u64)
> if num <= DBL_INT_MAX then
> return tonumber(num) -- return as 'number'
> end
> @@ -385,8 +375,8 @@ end
> local decode_i16
> if strict_alignment then
> decode_i16 = function(data)
> - ffi.copy(tmpint, data[0], 2)
> - local num = bswap_u16(tmpint[0].u16)
> + ffi.copy(scalar, data[0], 2)
> + local num = bswap_u16(scalar.u16)
> data[0] = data[0] + 2
> -- note: this double cast is actually necessary
> return tonumber(ffi.cast('int16_t', ffi.cast('uint16_t', num)))
> @@ -403,8 +393,8 @@ end
> local decode_i32
> if strict_alignment then
> decode_i32 = function(data)
> - ffi.copy(tmpint, data[0], 4)
> - local num = bit.bswap(tonumber(tmpint[0].u32))
> + ffi.copy(scalar, data[0], 4)
> + local num = bit.bswap(tonumber(scalar.u32))
> data[0] = data[0] + 4
> return num
> end
> @@ -419,9 +409,9 @@ end
> local decode_i64
> if strict_alignment then
> decode_i64 = function(data)
> - ffi.copy(tmpint, data[0], 8)
> + ffi.copy(scalar, data[0], 8)
> data[0] = data[0] + 8
> - local num = bit.bswap(ffi.cast('int64_t', tmpint[0].u64))
> + local num = bit.bswap(scalar.i64)
> if num >= -DBL_INT_MAX and num <= DBL_INT_MAX then
> return tonumber(num) -- return as 'number'
> end
> @@ -552,7 +542,7 @@ end
> -- element. It is significally faster on LuaJIT to use double pointer than
> -- return result, newpos.
> --
> -local bufp = ffi.new('const unsigned char *[1]');
> +local bufp = scalar.acucp;
>
> local function check_offset(offset, len)
> if offset == nil then
> diff --git a/src/lua/socket.lua b/src/lua/socket.lua
> index b2700e0c0..1ddafee2f 100644
> --- a/src/lua/socket.lua
> +++ b/src/lua/socket.lua
> @@ -10,6 +10,9 @@ local fiber = require('fiber')
> local fio = require('fio')
> local log = require('log')
> local buffer = require('buffer')
> +local scalar = buffer.scalar
> +local scalar_array = buffer.scalar_array
> +local static_alloc = buffer.static_alloc
>
> local format = string.format
>
> @@ -472,9 +475,9 @@ local function socket_setsockopt(self, level, name, value)
> end
>
> if info.type == 1 then
> - local value = ffi.new("int[1]", value)
> + scalar.ai[0] = value
> local res = ffi.C.setsockopt(fd,
> - level, info.iname, value, ffi.sizeof('int'))
> + level, info.iname, scalar.ai, ffi.sizeof('int'))
>
> if res < 0 then
> self._errno = boxerrno()
> @@ -516,8 +519,10 @@ local function socket_getsockopt(self, level, name)
> self._errno = nil
>
> if info.type == 1 then
> - local value = ffi.new("int[1]", 0)
> - local len = ffi.new("size_t[1]", ffi.sizeof('int'))
> + local value = scalar_array[0].ai
> + value[0] = 0
> + local len = scalar_array[1].as
> + len[0] = ffi.sizeof('int')
> local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
>
> if res < 0 then
> @@ -532,8 +537,9 @@ local function socket_getsockopt(self, level, name)
> end
>
> if info.type == 2 then
> - local value = ffi.new("char[256]", { 0 })
> - local len = ffi.new("size_t[1]", 256)
> + local value = static_alloc('char', 256)
> + local len = scalar.as
> + len[0] = 256
> local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
> if res < 0 then
> self._errno = boxerrno()
> @@ -555,8 +561,9 @@ local function socket_linger(self, active, timeout)
> local info = internal.SO_OPT[level].SO_LINGER
> self._errno = nil
> if active == nil then
> - local value = ffi.new("linger_t[1]")
> - local len = ffi.new("size_t[1]", 2 * ffi.sizeof('int'))
> + local value = static_alloc('linger_t')
> + local len = scalar.as
> + len[0] = 2 * ffi.sizeof('linger_t')
> local res = ffi.C.getsockopt(fd, level, info.iname, value, len)
> if res < 0 then
> self._errno = boxerrno()
> @@ -581,9 +588,10 @@ local function socket_linger(self, active, timeout)
> iactive = 0
> end
>
> - local value = ffi.new("linger_t[1]",
> - { { active = iactive, timeout = timeout } })
> - local len = 2 * ffi.sizeof('int')
> + local value = static_alloc('linger_t')
> + value[0].active = iactive
> + value[0].timeout = timeout
> + local len = ffi.sizeof('linger_t')
> local res = ffi.C.setsockopt(fd, level, info.iname, value, len)
> if res < 0 then
> self._errno = boxerrno()
> @@ -798,8 +806,7 @@ local function get_recv_size(self, size)
> -- them using message peek.
> local iflags = get_iflags(internal.SEND_FLAGS, {'MSG_PEEK'})
> assert(iflags ~= nil)
> - local buf = ffi.new('char[?]', 1)
> - size = tonumber(ffi.C.recv(fd, buf, 1, iflags))
> + size = tonumber(ffi.C.recv(fd, scalar.ac, 1, iflags))
> -- Prevent race condition: proceed with the case when
> -- a datagram of length > 0 has been arrived after the
> -- getsockopt call above.
> @@ -836,8 +843,7 @@ local function socket_recv(self, size, flags)
> end
>
> self._errno = nil
> - local buf = ffi.new("char[?]", size)
> -
> + local buf = static_alloc('char', size)
> local res = ffi.C.recv(fd, buf, size, iflags)
>
> if res == -1 then
> diff --git a/src/lua/string.lua b/src/lua/string.lua
> index 8216ace6a..6ed6be8bf 100644
> --- a/src/lua/string.lua
> +++ b/src/lua/string.lua
> @@ -1,4 +1,6 @@
> local ffi = require('ffi')
> +local buffer = require('buffer')
> +local static_alloc = buffer.static_alloc
>
> ffi.cdef[[
> const char *
> @@ -13,8 +15,8 @@ ffi.cdef[[
> ]]
>
> local c_char_ptr = ffi.typeof('const char *')
> -local strip_newstart = ffi.new("unsigned long[1]")
> -local strip_newlen = ffi.new("unsigned long[1]")
> +local strip_newstart = buffer.scalar_array[0].aul
> +local strip_newlen = buffer.scalar_array[1].aul
>
> local memcmp = ffi.C.memcmp
> local memmem = ffi.C.memmem
> @@ -290,7 +292,7 @@ local function string_hex(inp)
> error(err_string_arg:format(1, 'string.hex', 'string', type(inp)), 2)
> end
> local len = inp:len() * 2
> - local res = ffi.new('char[?]', len + 1)
> + local res = static_alloc('char', len + 1)
>
> local uinp = ffi.cast('const unsigned char *', inp)
> for i = 0, inp:len() - 1 do
> @@ -333,7 +335,7 @@ local function string_fromhex(inp)
> end
> local len = inp:len() / 2
> local casted_inp = ffi.cast('const char *', inp)
> - local res = ffi.new('char[?]', len)
> + local res = static_alloc('char', len)
> for i = 0, len - 1 do
> local first = hexadecimals_mapping[casted_inp[i * 2]]
> local second = hexadecimals_mapping[casted_inp[i * 2 + 1]]
> diff --git a/src/lua/uri.lua b/src/lua/uri.lua
> index d2946cd2d..5967c8bf2 100644
> --- a/src/lua/uri.lua
> +++ b/src/lua/uri.lua
> @@ -1,6 +1,7 @@
> -- uri.lua (internal file)
>
> local ffi = require('ffi')
> +local static_alloc = require('buffer').static_alloc
>
> ffi.cdef[[
> struct uri {
> @@ -32,12 +33,11 @@ uri_format(char *str, size_t len, struct uri *uri, bool write_password);
>
> local builtin = ffi.C;
>
> -local uribuf = ffi.new('struct uri')
> -
> local function parse(str)
> if str == nil then
> error("Usage: uri.parse(string)")
> end
> + local uribuf = static_alloc('struct uri')
> if builtin.uri_parse(uribuf, str) ~= 0 then
> return nil
> end
> @@ -59,6 +59,7 @@ local function parse(str)
> end
>
> local function format(uri, write_password)
> + local uribuf = static_alloc('struct uri')
> uribuf.scheme = uri.scheme
> uribuf.scheme_len = string.len(uri.scheme or '')
> uribuf.login = uri.login
> @@ -75,7 +76,7 @@ local function format(uri, write_password)
> uribuf.query_len = string.len(uri.query or '')
> uribuf.fragment = uri.fragment
> uribuf.fragment_len = string.len(uri.fragment or '')
> - local str = ffi.new('char[1024]')
> + local str = static_alloc('char', 1024)
> builtin.uri_format(str, 1024, uribuf, write_password and 1 or 0)
> return ffi.string(str)
> end
> diff --git a/src/lua/uuid.lua b/src/lua/uuid.lua
> index 956ad6e36..f8418cf4d 100644
> --- a/src/lua/uuid.lua
> +++ b/src/lua/uuid.lua
> @@ -1,6 +1,7 @@
> -- uuid.lua (internal file)
>
> local ffi = require("ffi")
> +local static_alloc = require('buffer').static_alloc
> local builtin = ffi.C
>
> ffi.cdef[[
> @@ -33,7 +34,6 @@ extern const struct tt_uuid uuid_nil;
> local uuid_t = ffi.typeof('struct tt_uuid')
> local UUID_STR_LEN = 36
> local UUID_LEN = ffi.sizeof(uuid_t)
> -local uuidbuf = ffi.new(uuid_t)
>
> local uuid_tostring = function(uu)
> if not ffi.istype(uuid_t, uu) then
> @@ -69,9 +69,8 @@ local uuid_tobin = function(uu, byteorder)
> return error('Usage: uuid:bin([byteorder])')
> end
> if need_bswap(byteorder) then
> - if uu ~= uuidbuf then
> - ffi.copy(uuidbuf, uu, UUID_LEN)
> - end
> + local uuidbuf = static_alloc('struct tt_uuid')
> + ffi.copy(uuidbuf, uu, UUID_LEN)
> builtin.tt_uuid_bswap(uuidbuf)
> return ffi.string(ffi.cast('char *', uuidbuf), UUID_LEN)
> end
> @@ -114,10 +113,12 @@ local uuid_new = function()
> end
>
> local uuid_new_bin = function(byteorder)
> + local uuidbuf = static_alloc('struct tt_uuid')
> builtin.tt_uuid_create(uuidbuf)
> return uuid_tobin(uuidbuf, byteorder)
> end
> local uuid_new_str = function()
> + local uuidbuf = static_alloc('struct tt_uuid')
> builtin.tt_uuid_create(uuidbuf)
> return uuid_tostring(uuidbuf)
> end
> diff --git a/test/app/buffer.result b/test/app/buffer.result
> new file mode 100644
> index 000000000..10494b812
> --- /dev/null
> +++ b/test/app/buffer.result
> @@ -0,0 +1,53 @@
> +test_run = require('test_run').new()
> +---
> +...
> +buffer = require('buffer')
> +---
> +...
> +ffi = require('ffi')
> +---
> +...
> +-- Scalar.
> +scalar = buffer.scalar
> +---
> +...
> +scalar.u16 = 100
> +---
> +...
> +u16 = ffi.new('uint16_t[1]')
> +---
> +...
> +ffi.copy(u16, scalar, 2)
> +---
> +...
> +u16[0]
> +---
> +- 100
> +...
> +u16[0] = 200
> +---
> +...
> +ffi.copy(scalar, u16, 2)
> +---
> +...
> +scalar.u16
> +---
> +- 200
> +...
> +-- Alignment.
> +_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
> +---
> +...
> +p = buffer.static_alloc('int')
> +---
> +...
> +ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
> +---
> +- true
> +...
> +-- Able to allocate bigger than static buffer - such allocations
> +-- are on the heap.
> +type(buffer.static_alloc('char', 13000))
> +---
> +- cdata
> +...
> diff --git a/test/app/buffer.test.lua b/test/app/buffer.test.lua
> new file mode 100644
> index 000000000..6b1a974ca
> --- /dev/null
> +++ b/test/app/buffer.test.lua
> @@ -0,0 +1,23 @@
> +test_run = require('test_run').new()
> +buffer = require('buffer')
> +ffi = require('ffi')
> +
> +-- Scalar.
> +scalar = buffer.scalar
> +scalar.u16 = 100
> +u16 = ffi.new('uint16_t[1]')
> +ffi.copy(u16, scalar, 2)
> +u16[0]
> +
> +u16[0] = 200
> +ffi.copy(scalar, u16, 2)
> +scalar.u16
> +
> +-- Alignment.
> +_ = buffer.static_alloc('char') -- This makes buffer pos unaligned.
> +p = buffer.static_alloc('int')
> +ffi.cast('int', p) % ffi.alignof('int') == 0 -- But next alloc is aligned.
> +
> +-- Able to allocate bigger than static buffer - such allocations
> +-- are on the heap.
> +type(buffer.static_alloc('char', 13000))
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* [tarantool-patches] Re: [PATCH 1/1] buffer: port static allocator to Lua
2019-05-15 13:33 [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua Vladislav Shpilevoy
2019-05-18 18:41 ` [tarantool-patches] " Vladislav Shpilevoy
@ 2019-05-18 19:12 ` Konstantin Osipov
2019-05-18 22:15 ` Vladislav Shpilevoy
1 sibling, 1 reply; 4+ messages in thread
From: Konstantin Osipov @ 2019-05-18 19:12 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tarantool-patches
* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/05/15 16:34]:
1) scalar looks clearly as a separate patch. while I like the idea
I don't appreciate the urgency and a huge patch.
2) The static alloc patch is OK to push
3) Since you care about performance so much, I don't understand
why you do concat (..) in Lua static alloc. Did you check it's
a compile-time, not a run-time concat?
> + if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then
I don't understand, if you need a scalar and an array of scalars,
and you have scalar and scalar_array variables, why would you ever
use an array in scalar? Wouldn't it be better to have two unions
instead of one for the sake of clarity?
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
^ permalink raw reply [flat|nested] 4+ messages in thread
* [tarantool-patches] Re: [PATCH 1/1] buffer: port static allocator to Lua
2019-05-18 19:12 ` Konstantin Osipov
@ 2019-05-18 22:15 ` Vladislav Shpilevoy
0 siblings, 0 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2019-05-18 22:15 UTC (permalink / raw)
To: Konstantin Osipov; +Cc: tarantool-patches
On 18/05/2019 22:12, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/05/15 16:34]:
>
> 1) scalar looks clearly as a separate patch. while I like the idea
> I don't appreciate the urgency and a huge patch.
Done in a separate commit, see v2 thread. Urgency is motivated by
SWIM Lua API dependency on that - I do not want to push SWIM Lua
with ugly "ffi.new('int[1]')" construction.
>
> 2) The static alloc patch is OK to push
>
> 3) Since you care about performance so much, I don't understand
> why you do concat (..) in Lua static alloc. Did you check it's
> a compile-time, not a run-time concat?
I did not check whether it is compile time, because Lua has no a
compiler. But I made a benchmark - even with concat it is orders of
magnitude faster.
Nonetheless, probably you could be right, and it would be better to
avoid concatenation - then impact would be even greater.
Firstly, I tried to avoid concatenation and pass the array type
always. But I faced a problem, that ffi.cast does not allow some
C conversions. I needed to be able to do
ffi.cast('type[size]', <cdata void *>) - in C I would not have
problems, but in Lua I do. I started looking into FFI implementation,
and found how to allow that conversion, but then I decided that
it is not worth doing without a benchmark: concact vs no concat.
Appeared, that noconcat version is ~5% faster. In the summary,
original patch makes small allocations +10000% faster, without
concat it would be +10005%. Not sure, if this difference is worth
splitting static_alloc() into multiple functions, nor moving
ffi.sizeof() + ffi.cast() operations onto the caller's shoulders.
So I decided to keep it is as. Otherwise it becomes too complicated.
>> + if ffi.C.EVP_CipherUpdate(self.ctx, wpos, scalar.ai, input, input:len()) ~= 1 then
>
> I don't understand, if you need a scalar and an array of scalars,
> and you have scalar and scalar_array variables, why would you ever
> use an array in scalar?
I use array types [1] in scalar, because it is the only way to
get a pointer onto a type in Lua.
> Wouldn't it be better to have two unions
> instead of one for the sake of clarity?
I have - scalar_array is two unions. I thought about their
separation in something like buffer.scalar1, buffer.scalar2, but
looks ugly, unnatural.
But here I do not insist. If you want, I can split them into two
variables. Should I?
See V2 thread.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-05-18 22:15 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-15 13:33 [tarantool-patches] [PATCH 1/1] buffer: port static allocator to Lua Vladislav Shpilevoy
2019-05-18 18:41 ` [tarantool-patches] " Vladislav Shpilevoy
2019-05-18 19:12 ` Konstantin Osipov
2019-05-18 22:15 ` Vladislav Shpilevoy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox