From: "Timur Safin"
Date: Thu, 28 May 2020 23:11:56 +0300
Message-ID: <048601d6352c$3cdb81a0$b69284e0$@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 04/10] crc32: align memory access
To: 'Vladislav Shpilevoy', tarantool-patches@dev.tarantool.org, alyapunov@tarantool.org, korablev@tarantool.org

: -----Original Message-----
: From: Vladislav Shpilevoy
: Subject: [PATCH v2 04/10] crc32: align memory access
:
: diff --git a/src/cpu_feature.c b/src/cpu_feature.c
: index 98567ccb3..9bf6223de 100644
: --- a/src/cpu_feature.c
: +++ b/src/cpu_feature.c
: @@ -50,7 +51,7 @@
:
:
:  static uint32_t
: -crc32c_hw_byte(uint32_t crc, unsigned char const *data, unsigned int length)
: +crc32c_hw_byte(uint32_t crc, char const *data, unsigned int length)
:  {
:  	while (length--) {
:  		__asm__ __volatile__(
: @@ -68,6 +69,26 @@ crc32c_hw_byte(uint32_t crc, unsigned char const *data, unsigned int length)
:  uint32_t
:  crc32c_hw(uint32_t crc, const char *buf, unsigned int len)
:  {
: +	const int align = alignof(unsigned long);
: +	unsigned long addr = (unsigned long)buf;
: +	unsigned int not_aligned_prefix =
: +		((addr - 1 + align) & ~(align - 1)) - addr;

Hmm, hmm... Isn't it simply `addr % align`? Or even `addr & (align - 1)`?
(A quick standalone check is below.)

: +	/*
: +	 * Calculate CRC32 for the prefix byte-by-byte so as to
: +	 * then use aligned words to calculate the rest. This is
: +	 * twice less loads, because every load takes exactly one
: +	 * word from memory. Not 2 words, which would need to be
: +	 * partially merged then.
: +	 * But the main reason is that unaligned loads are just
: +	 * unsafe, because this is an undefined behaviour.
: +	 */
: +	if (not_aligned_prefix < len) {
: +		crc = crc32c_hw_byte(crc, buf, not_aligned_prefix);
: +		buf += not_aligned_prefix;
: +		len -= not_aligned_prefix;
: +	} else {
: +		return crc32c_hw_byte(crc, buf, len);
: +	}
:  	unsigned int iquotient = len / SCALE_F;
:  	unsigned int iremainder = len % SCALE_F;
:  	unsigned long *ptmp = (unsigned long *)buf;
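
A tiny throw-away sketch to double-check the arithmetic; it is not part
of the patch, and the only assumption is that `align` is a power of two,
which `alignof(unsigned long)` always is:

#include <assert.h>
#include <stdalign.h>
#include <stdio.h>

int
main(void)
{
	const unsigned long align = alignof(unsigned long);
	for (unsigned long addr = 0; addr < 4 * align; addr++) {
		/* The patch: round addr up to the next boundary, subtract. */
		unsigned long prefix =
			((addr - 1 + align) & ~(align - 1)) - addr;
		/* Distance to the next aligned address, compact form. */
		unsigned long dist = -addr & (align - 1);
		/* Offset of addr inside its word. */
		unsigned long off = addr % align;
		assert(prefix == dist);
		printf("addr=%2lu prefix=%lu addr%%align=%lu\n",
		       addr, prefix, off);
	}
	return 0;
}

If I am not mistaken, the patch's expression is the distance to the next
aligned address, i.e. `-addr & (align - 1)`, while `addr % align` is the
offset into the current word; the two coincide only when the offset is
0 or align / 2.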