Classes
struct	Combination_step_executor
	The body of the unrolled loop that combines the partial results from each slice into the final hash of the whole chunk. In the i-th iteration it takes the crc of the i-th slice and "rolls it forward" by virtually processing as many zero bytes as there are from the end of the i-th slice to the end of the chunk.

struct	crc32_impl
	The collection of functions implementing hardware accelerated updating of the CRC32-C hash by processing a few (1, 2, 4 or 8) bytes of input.

struct	Loop
	A helper template to statically unroll a loop with a fixed number of iterations, where the iteration number itself is constexpr.

struct	Loop< 0 >

struct	Update_step_executor
	The body of the unrolled loop used to process slices in parallel. In the i-th iteration it processes 8 bytes from the i-th slice of data, where each slice has slice_len bytes.

struct	use_pclmul
	Implementation of polynomial_mul_rev<w>(rev_u) function which uses hardware accelerated polynomial multiplication to compute rev(w*u), where rev_u=rev(u).

struct	use_unrolled_loop_poly_mul
	Implementation of polynomial_mul_rev<w>(rev_u) function which uses a simple loop over i: if(w>>i&1)result^=rev_u<<(32-i), which is equivalent to w * flip_at_32(rev_u), which in turn is equivalent to rev(rev(w) * rev_u).

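The Loop helper described above can be pictured with a short sketch. The signatures below are assumptions made for illustration (the page does not show the members of Loop), and Append_digit and digits_in_order are hypothetical names that exist only in this example.

```cpp
#include <cstddef>

// Hypothetical sketch of a statically unrolled loop. The names Loop and
// Step_executor follow the documentation; the member signatures are assumed.
template <std::size_t iterations>
struct Loop {
  template <template <std::size_t> class Step_executor, typename... Args>
  static void run(Args &...args) {
    // First run iterations 0 .. iterations-2, then the last one, so that the
    // steps execute in increasing order of their compile-time index.
    Loop<iterations - 1>::template run<Step_executor>(args...);
    Step_executor<iterations - 1>::run(args...);
  }
};

// Terminates the recursion: a loop with zero iterations does nothing.
template <>
struct Loop<0> {
  template <template <std::size_t> class Step_executor, typename... Args>
  static void run(Args &...) {}
};

// Example step (hypothetical): appends its compile-time index as a digit.
template <std::size_t i>
struct Append_digit {
  static void run(unsigned &acc) { acc = acc * 10 + static_cast<unsigned>(i); }
};

// Runs steps 0, 1 and 2 in order on a single accumulator.
inline unsigned digits_in_order() {
  unsigned acc = 0;
  Loop<3>::run<Append_digit>(acc);
  return acc;
}
```

Because the iteration index is a template parameter, each step can use it in constant expressions, for example to select a slice offset at compile time.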
Functions
bool	can_use_crc32 ()
	Checks if hardware accelerated crc32 instructions are available to this process right now.

bool	can_use_poly_mul ()
	Checks if hardware accelerated polynomial multiplication instructions are available to this process right now.

constexpr uint32_t	compute_x_to_8len (size_t len)
	Computes x^(len*8) modulo the CRC32-C polynomial, which is useful when you need to conceptually append len bytes of zeros to an already computed hash.

constexpr uint64_t	flip_at_32 (uint32_t w)
	Produces a 64-bit result by moving the i-th bit of the 32-bit input to the (32-i)-th position (zeroing the other bits).

template<size_t len, typename algo_to_use >
static uint64_t	roll (uint32_t crc)
	Rolls the crc forward by len bytes, that is, updates it as if 8*len zero bits were processed.

template<typename algo_to_use >
static uint32_t	fold_64_to_32 (uint64_t big)
	Takes a 64-bit reversed representation of a polynomial, and computes the 32-bit reversed representation of it modulo CRC32-C.

template<size_t slice_len, size_t slices_count, typename algo_to_use >
static uint32_t	consume_chunk (uint32_t crc0, const unsigned char *data)
	Updates the crc checksum by processing slices_count*slice_len bytes of data.

template<size_t slice_len, size_t slices_count, typename algo_to_use >
static void	consume_chunks (uint32_t &crc, const byte *&data, size_t &len)
	Updates the crc checksum by processing at most len bytes of data.

template<typename Chunk , typename algo_to_use >
static void	consume_pow2 (uint32_t &crc, const byte *&data, size_t len)
	Updates the crc checksum by processing a Chunk (1, 2 or 4 bytes) of data, but only when the len of the data provided, decomposed into powers of two, contains a term equal to the size of Chunk.

template<typename algo_to_use >
static uint32_t	crc32 (uint32_t crc, const byte *data, size_t len)
	The hardware accelerated implementation of CRC32-C exploiting within-core parallelism on reordering processors, by consuming the data in large chunks split into 3 independent slices each.

uint32_t	crc32_using_pclmul (const byte *data, size_t len)
	The specialization of crc32<> template for use_pclmul and 0 as initial value of the hash.

uint32_t	crc32_using_unrolled_loop_poly_mul (const byte *data, size_t len)
	The specialization of crc32<> template for use_unrolled_loop_poly_mul and 0 as initial value of the hash.

Function Documentation

◆ can_use_crc32()

bool hardware::can_use_crc32 ( )

Checks if hardware accelerated crc32 instructions are available to this process right now.

◆ can_use_poly_mul()

bool hardware::can_use_poly_mul ( )

Checks if hardware accelerated polynomial multiplication instructions are available to this process right now.

◆ compute_x_to_8len()

constexpr uint32_t hardware::compute_x_to_8len ( size_t len )

constexpr

Computes x^(len*8) modulo the CRC32-C polynomial, which is useful when you need to conceptually append len bytes of zeros to an already computed hash.

Parameters

[in] len The value of len in the x^(len*8) mod CRC32-C

Returns: the remainder of x^(len*8) mod CRC32-C, with the most significant coefficient of the remainder (the one for x^31) stored as the most significant bit of the uint32_t (the one at 1U<<31)
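To make the representation of the return value concrete, here is a minimal sketch that computes the same remainder by repeated multiplication by x. It is a naive O(len) loop written for illustration and is not claimed to be the method used by compute_x_to_8len; the names CRC32C_POLY, mul_by_x and compute_x_to_8len_sketch are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// CRC32-C generator polynomial without its implicit x^32 term.
// Bit k holds the coefficient of x^k, so x^31 sits at 1U<<31.
constexpr uint32_t CRC32C_POLY = 0x1EDC6F41U;

// Multiplies v by x modulo the CRC32-C polynomial: shift left by one and,
// if an x^32 term would appear, replace it with the low terms of the polynomial.
constexpr uint32_t mul_by_x(uint32_t v) {
  return (v << 1) ^ ((v & 0x80000000U) ? CRC32C_POLY : 0U);
}

// Naive sketch: start from x^0 = 1 and multiply by x exactly 8*len times.
constexpr uint32_t compute_x_to_8len_sketch(std::size_t len) {
  uint32_t v = 1;
  for (std::size_t i = 0; i < 8 * len; ++i) v = mul_by_x(v);
  return v;
}

static_assert(compute_x_to_8len_sketch(0) == 1U, "x^0 is 1");
static_assert(compute_x_to_8len_sketch(4) == CRC32C_POLY,
              "x^32 reduces to the low terms of the polynomial");
```

For len below 4 no reduction happens, so the result is simply a single bit at position 8*len.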

◆ consume_chunk()

template<size_t slice_len, size_t slices_count, typename algo_to_use >

static uint32_t hardware::consume_chunk	(	uint32_t	crc0,
		const unsigned char *	data
	)

inline static

Updates the crc checksum by processing slices_count*slice_len bytes of data.

The chunk is processed as slices_count independent slices of length slice_len, and the results are combined at the end to compute the correct result.

Parameters

[in]	crc0	initial value of the hash
[in]	data	data over which to calculate CRC32-C

Returns: The value of crc0 updated by processing the range data[0]...data[slices_count*slice_len-1].
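The combination step can be checked in plain software. The sketch below assumes the update is the raw bit-reflected CRC32-C register update with no initial or final inversion, and it uses naive bitwise loops instead of hardware instructions and polynomial multiplication; update_bytes, roll_naive and consume_chunk_sketch are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Bit-reflected CRC32-C polynomial, for a register whose least significant
// bit holds the x^31 coefficient.
constexpr uint32_t CRC32C_POLY_REFLECTED = 0x82F63B78U;

// Software stand-in for the accelerated update: raw, without any inversion.
inline uint32_t update_bytes(uint32_t crc, const unsigned char *data,
                             std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ ((crc & 1U) ? CRC32C_POLY_REFLECTED : 0U);
  }
  return crc;
}

// Rolls crc forward as if len zero bytes were processed (naive version).
inline uint32_t roll_naive(uint32_t crc, std::size_t len) {
  for (std::size_t i = 0; i < 8 * len; ++i)
    crc = (crc >> 1) ^ ((crc & 1U) ? CRC32C_POLY_REFLECTED : 0U);
  return crc;
}

// Sketch: hash each slice independently, roll each partial result forward by
// the number of bytes between the end of its slice and the end of the chunk,
// and XOR the rolled values together.
template <std::size_t slice_len, std::size_t slices_count>
uint32_t consume_chunk_sketch(uint32_t crc0, const unsigned char *data) {
  uint32_t combined = 0;
  for (std::size_t i = 0; i < slices_count; ++i) {
    // Only the first slice continues from crc0; the others start from 0.
    const uint32_t slice_crc =
        update_bytes(i == 0 ? crc0 : 0U, data + i * slice_len, slice_len);
    combined ^= roll_naive(slice_crc, (slices_count - 1 - i) * slice_len);
  }
  return combined;
}
```

The slices here are processed one after another; the point of the sketch is only that the combined value matches a single sequential pass, which follows from the update being linear over GF(2).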

◆ consume_chunks()

template<size_t slice_len, size_t slices_count, typename algo_to_use >

static void hardware::consume_chunks	(	uint32_t &	crc,
		const byte *&	data,
		size_t &	len
	)

inline static

Updates the crc checksum by processing at most len bytes of data.

The data is consumed in chunks of size slice_len*slices_count, and processing stops when no more full chunks fit into len bytes. Each chunk is processed as slices_count independent slices of length slice_len, and the results are combined at the end to compute the correct result.

Parameters

[in,out]	crc	initial value of the hash. Updated by this function by processing the first B*(slice_len*slices_count) bytes of data, where B = floor(len / (slice_len * slices_count)).
[in,out]	data	data over which to calculate CRC32-C. Advanced by this function to point to unprocessed part of the buffer.
[in,out]	len	data length to be processed. Updated by this function to be len % (slice_len * slices_count).
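The effect on the three in/out parameters can be sketched as a simple loop. This is an illustration under assumptions: consume_chunk_stub is a hypothetical bytewise stand-in for consume_chunk, unsigned char stands in for the byte type, and the raw bit-reflected CRC32-C update is assumed.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint32_t CRC32C_POLY_REFLECTED = 0x82F63B78U;

// Hypothetical stand-in for consume_chunk: hashes chunk_len bytes bytewise.
template <std::size_t chunk_len>
uint32_t consume_chunk_stub(uint32_t crc, const unsigned char *data) {
  for (std::size_t i = 0; i < chunk_len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ ((crc & 1U) ? CRC32C_POLY_REFLECTED : 0U);
  }
  return crc;
}

// Sketch: consume as many full chunks as fit, updating crc, data and len.
template <std::size_t slice_len, std::size_t slices_count>
void consume_chunks_sketch(uint32_t &crc, const unsigned char *&data,
                           std::size_t &len) {
  constexpr std::size_t chunk_len = slice_len * slices_count;
  while (len >= chunk_len) {
    crc = consume_chunk_stub<chunk_len>(crc, data);
    data += chunk_len;  // now points at the unprocessed part of the buffer
    len -= chunk_len;   // ends up as the original len % chunk_len
  }
}
```

With slice_len = 8, slices_count = 3 and len = 50, two chunks of 24 bytes are consumed, data advances by 48 bytes, and len becomes 2.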

◆ consume_pow2()

template<typename Chunk , typename algo_to_use >

static void hardware::consume_pow2	(	uint32_t &	crc,
		const byte *&	data,
		size_t	len
	)

inline static

Updates the crc checksum by processing a Chunk (1, 2 or 4 bytes) of data, but only when the len of the data provided, decomposed into powers of two, contains a term equal to the size of Chunk.

This is used to process the prefix of the buffer to get to a position which is aligned mod 8, and to process the remaining suffix which starts at a position aligned mod 8 but has fewer than 8 bytes.

Parameters

[in,out]	crc	initial value of the hash. Updated by this function by processing Chunk pointed by data.
[in,out]	data	data over which to calculate CRC32-C. Advanced by this function to point to unprocessed part of the buffer.
[in]	len	data length allowed to be processed.
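The power-of-two decomposition can be sketched as a test of a single bit of len. The code below is an illustration under assumptions: it uses a bytewise software update in place of the hardware instructions, unsigned char in place of the byte type, and the hypothetical names update_bytes, consume_pow2_sketch and consume_tail.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint32_t CRC32C_POLY_REFLECTED = 0x82F63B78U;

// Software stand-in for the accelerated update: raw, without any inversion.
inline uint32_t update_bytes(uint32_t crc, const unsigned char *data,
                             std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ ((crc & 1U) ? CRC32C_POLY_REFLECTED : 0U);
  }
  return crc;
}

// Sketch: consume sizeof(Chunk) bytes only if that power of two appears in
// the binary decomposition of len. len itself is passed by value.
template <typename Chunk>
void consume_pow2_sketch(uint32_t &crc, const unsigned char *&data,
                         std::size_t len) {
  if (len & sizeof(Chunk)) {
    crc = update_bytes(crc, data, sizeof(Chunk));
    data += sizeof(Chunk);
  }
}

// Handles any len below 8 with at most three steps of 1, 2 and 4 bytes.
inline uint32_t consume_tail(uint32_t crc, const unsigned char *data,
                             std::size_t len) {
  consume_pow2_sketch<uint8_t>(crc, data, len);
  consume_pow2_sketch<uint16_t>(crc, data, len);
  consume_pow2_sketch<uint32_t>(crc, data, len);
  return crc;
}
```

For len = 5 (binary 101) the 1-byte and 4-byte steps run and the 2-byte step is skipped, so exactly 5 bytes are consumed.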

◆ crc32()

template<typename algo_to_use >

static uint32_t hardware::crc32	(	uint32_t	crc,
		const byte *	data,
		size_t	len
	)

static

The hardware accelerated implementation of CRC32-C exploiting within-core parallelism on reordering processors, by consuming the data in large chunks split into 3 independent slices each.

It's optimized for handling buffers of lengths typical for 16 KB pages and redo log blocks, but it works correctly for any len and alignment.

Parameters

[in]	crc	initial value of the hash (0 for first block, or the result of CRC32-C for the data processed so far)
[in]	data	data over which to calculate CRC32-C
[in]	len	data length

Returns: CRC-32C (polynomial 0x11EDC6F41)

◆ crc32_using_pclmul()

uint32_t hardware::crc32_using_pclmul	(	const byte *	data,
		size_t	len
	)

The specialization of crc32<> template for use_pclmul and 0 as initial value of the hash.

Used on platforms which support hardware accelerated polynomial multiplication. It's non-static so it can be unit-tested.

Parameters

[in]	data	data over which to calculate CRC32-C
[in]	len	data length

Returns: CRC-32C (polynomial 0x11EDC6F41)

◆ crc32_using_unrolled_loop_poly_mul()

uint32_t hardware::crc32_using_unrolled_loop_poly_mul	(	const byte *	data,
		size_t	len
	)

The specialization of crc32<> template for use_unrolled_loop_poly_mul and 0 as initial value of the hash.

Used on platforms which do not support hardware accelerated polynomial multiplication. It's non-static so it can be unit-tested.

Parameters

[in]	data	data over which to calculate CRC32-C
[in]	len	data length

Returns: CRC-32C (polynomial 0x11EDC6F41)

◆ flip_at_32()

constexpr uint64_t hardware::flip_at_32 ( uint32_t w )

constexpr

Produces a 64-bit result by moving the i-th bit of the 32-bit input to the (32-i)-th position (zeroing the other bits).

Please note that, in particular, this moves the 0-th bit to the 32-nd position and the 31-st bit to the 1-st position, so the range in which the data resides is not only mirrored, but also shifted by one bit. Such an operation is useful for implementing polynomial multiplication when one of the operands is given in reverse and we need the result reversed, too (as is the case in CRC32-C):

rev(w * v) = rev(w) * flip_at_32(v)

Proof:

rev(w * v)[i] = (w * v)[63-i]
= sum(0<=j<=31){w[j]*v[63-i-j]}
= sum(0<=j<=31){rev(w)[31-j]*v[63-i-j]}
= sum(0<=j<=31){rev(w)[31-j]*flip_at_32(v)[32-63+i+j]}
= sum(0<=j<=31){rev(w)[31-j]*flip_at_32(v)[i-(31-j)]}
= sum(0<=j<=31){rev(w)[j]*flip_at_32(v)[i-j]}
= (rev(w) * flip_at_32(v))[i]

So, for example, if crc32=rev(w) is the variable storing the CRC32-C hash of a buffer, and you want to conceptually append len bytes of zeros to it, then you can precompute v = compute_x_to_8len(len), and are interested in rev(w*v), which you can achieve by crc32 * flip_at_32(compute_x_to_8len(len)).

Parameters

[in] w The input 32-bit polynomial

Returns: The polynomial flipped and shifted, so that i-th bit becomes 32-i-th.
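The identity rev(w * v) = rev(w) * flip_at_32(v) can be checked numerically with a software carry-less multiplication. The sketch below is an illustration, not the library's code: flip_at_32 is written directly from the description, while rev32, rev64 and clmul are hypothetical helpers.

```cpp
#include <cstdint>

// Moves bit i of the 32-bit input to bit 32-i of the 64-bit result.
constexpr uint64_t flip_at_32(uint32_t w) {
  uint64_t r = 0;
  for (int i = 0; i < 32; ++i)
    if ((w >> i) & 1U) r |= uint64_t{1} << (32 - i);
  return r;
}

// Reverses the bit order of a 32-bit value (bit i goes to bit 31-i).
constexpr uint32_t rev32(uint32_t w) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    if ((w >> i) & 1U) r |= uint32_t{1} << (31 - i);
  return r;
}

// Reverses the bit order of a 64-bit value (bit i goes to bit 63-i).
constexpr uint64_t rev64(uint64_t w) {
  uint64_t r = 0;
  for (int i = 0; i < 64; ++i)
    if ((w >> i) & 1U) r |= uint64_t{1} << (63 - i);
  return r;
}

// Carry-less (polynomial over GF(2)) multiplication, truncated to 64 bits.
// The operands used here never produce terms above x^63.
constexpr uint64_t clmul(uint64_t a, uint64_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 64; ++i)
    if ((a >> i) & 1U) r ^= b << i;
  return r;
}

static_assert(flip_at_32(1U) == (uint64_t{1} << 32), "bit 0 moves to bit 32");
static_assert(flip_at_32(0x80000000U) == 2U, "bit 31 moves to bit 1");
```

With these helpers, rev64(clmul(w, v)) and clmul(rev32(w), flip_at_32(v)) agree for any 32-bit w and v, which is the statement proved above.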

◆ fold_64_to_32()

template<typename algo_to_use >

static uint32_t hardware::fold_64_to_32 ( uint64_t big )

inline static

Takes a 64-bit reversed representation of a polynomial, and computes the 32-bit reversed representation of it modulo CRC32-C.

Parameters

[in] big The 64-bit representation of polynomial w, with the most significant coefficient (the one for x^63) stored at the least significant bit (the one at 1<<0).

Returns: The 32-bit representation of w mod CRC32-C, in which the most significant coefficient (the one for x^31) is stored at the least significant bit (the one at 1<<0).
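The reversed representations of the input and output are easiest to see in a bit-by-bit reduction. This is a slow illustrative sketch and not the algo_to_use-based implementation; fold_64_to_32_sketch is a hypothetical name.

```cpp
#include <cstdint>

// Bit-reflected low terms of the CRC32-C polynomial: bit 31-j holds the
// coefficient of x^j, for j in 0..31.
constexpr uint64_t CRC32C_POLY_REFLECTED = 0x82F63B78ULL;

// Sketch: in the reversed 64-bit representation, bit k holds the coefficient
// of x^(63-k). Bits 0..31 therefore hold x^63..x^32, which must be eliminated
// by subtracting (XORing) suitable multiples of the polynomial.
constexpr uint32_t fold_64_to_32_sketch(uint64_t big) {
  for (int k = 0; k < 32; ++k) {
    if ((big >> k) & 1U) {
      big ^= uint64_t{1} << k;                  // cancel the x^(63-k) term
      big ^= CRC32C_POLY_REFLECTED << (k + 1);  // add the low terms, shifted
    }
  }
  // Bits 32..63 now hold x^31..x^0, so after the shift x^31 is at 1<<0.
  return static_cast<uint32_t>(big >> 32);
}

static_assert(fold_64_to_32_sketch(uint64_t{1} << 32) == 1U,
              "x^31 is already reduced and lands at the least significant bit");
static_assert(fold_64_to_32_sketch(uint64_t{1} << 31) == 0x82F63B78U,
              "x^32 reduces to the low terms of the polynomial");
```

Inputs whose low 32 bits are zero are already reduced, so the function simply returns their high half.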

◆ roll()

template<size_t len, typename algo_to_use >

static uint64_t hardware::roll ( uint32_t crc )

inline static

Rolls the crc forward by len bytes, that is, updates it as if 8*len zero bits were processed.

Parameters

[in] crc initial value of the hash

Returns: Updated value of the hash: rev(rev(crc)*(x^{8*len} mod CRC32-C))
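The equivalence between feeding zero bits and multiplying by x^{8*len} can be checked in software. The sketch below differs from the documented function in two labeled ways: it reduces the product modulo CRC32-C immediately and returns 32 bits, whereas roll() is declared to return uint64_t, and it uses naive loops rather than algo_to_use. All helper names are hypothetical, and the raw update without inversion is assumed.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint32_t CRC32C_POLY = 0x1EDC6F41U;            // bit k holds x^k
constexpr uint32_t CRC32C_POLY_REFLECTED = 0x82F63B78U;  // bit 31-k holds x^k

// Reverses the bit order of a 32-bit value.
constexpr uint32_t rev32(uint32_t w) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    if ((w >> i) & 1U) r |= uint32_t{1} << (31 - i);
  return r;
}

// Multiplies v by x modulo the CRC32-C polynomial (non-reversed form).
constexpr uint32_t mul_by_x(uint32_t v) {
  return (v << 1) ^ ((v & 0x80000000U) ? CRC32C_POLY : 0U);
}

// x^(8*len) mod CRC32-C, computed naively.
constexpr uint32_t x_to_8len(std::size_t len) {
  uint32_t v = 1;
  for (std::size_t i = 0; i < 8 * len; ++i) v = mul_by_x(v);
  return v;
}

// a * b mod CRC32-C by Horner's scheme over the bits of b.
constexpr uint32_t mul_mod(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (int i = 31; i >= 0; --i) {
    r = mul_by_x(r);
    if ((b >> i) & 1U) r ^= a;
  }
  return r;
}

// Rolling expressed as a multiplication, reduced right away in this sketch.
constexpr uint32_t roll_by_multiplication(uint32_t crc, std::size_t len) {
  return rev32(mul_mod(rev32(crc), x_to_8len(len)));
}

// Rolling expressed as literally processing 8*len zero bits.
constexpr uint32_t roll_by_zero_feed(uint32_t crc, std::size_t len) {
  for (std::size_t i = 0; i < 8 * len; ++i)
    crc = (crc >> 1) ^ ((crc & 1U) ? CRC32C_POLY_REFLECTED : 0U);
  return crc;
}
```

The multiplication form matters because x^{8*len} depends only on len, so it can be precomputed once and applied to any crc value without touching len bytes of zeros.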
