MySQL 8.4.0
Source Code Documentation
hardware Namespace Reference

Classes

struct  Combination_step_executor
 The body of unrolled loop used to combine partial results from each slice into the final hash of whole chunk, which in i-th iteration takes the crc of i-th slice and "rolls it forward" by virtually processing as many zeros as there are from the end of the i-th slice to the end of the chunk. More...
 
struct  crc32_impl
 The collection of functions implementing hardware accelerated updating of CRC32-C hash by processing a few (1,2,4 or 8) bytes of input. More...
 
struct  Loop
 A helper template to statically unroll a loop with a fixed number of iterations, where the iteration number itself is constexpr. More...
 
struct  Loop< 0 >
 
struct  Update_step_executor
 The body of unrolled loop used to process slices in parallel, which in i-th iteration processes 8 bytes from the i-th slice of data, where each slice has slice_len bytes. More...
 
struct  use_pclmul
 Implementation of polynomial_mul_rev<w>(rev_u) function which uses hardware accelerated polynomial multiplication to compute rev(w*u), where rev_u=rev(u). More...
 
struct  use_unrolled_loop_poly_mul
 Implementation of polynomial_mul_rev<w>(rev_u) function which uses a simple loop over i: if(w>>i&1)result^=rev_u<<(32-i), which is equivalent to w * flip_at_32(rev_u), which in turn is equivalent to rev(rev(w) * rev_u),. More...
 

Functions

bool can_use_crc32 ()
 Checks if hardware accelerated crc32 instructions are available to this process right now. More...
 
bool can_use_poly_mul ()
 Checks if hardware accelerated polynomial multiplication instructions are available to this process right now. More...
 
constexpr uint32_t compute_x_to_8len (size_t len)
 Computes x^(len*8) modulo CRC32-C polynomial, which is useful, when you need to conceptually append len bytes of zeros to already computed hash. More...
 
constexpr uint64_t flip_at_32 (uint32_t w)
 Produces a 64-bit result by moving i-th bit of 32-bit input to the 32-i-th position (zeroing the other bits). More...
 
template<size_t len, typename algo_to_use >
static uint64_t roll (uint32_t crc)
 Rolls the crc forward by len bytes, that is updates it as if 8*len zero bits were processed. More...
 
template<typename algo_to_use >
static uint32_t fold_64_to_32 (uint64_t big)
 Takes a 64-bit reversed representation of a polynomial, and computes the 32-bit reversed representation of it modulo CRC32-C. More...
 
template<size_t slice_len, size_t slices_count, typename algo_to_use >
static uint32_t consume_chunk (uint32_t crc0, const unsigned char *data)
 Updates the crc checksum by processing slices_count*slice_len bytes of data. More...
 
template<size_t slice_len, size_t slices_count, typename algo_to_use >
static void consume_chunks (uint32_t &crc, const byte *&data, size_t &len)
 Updates the crc checksum by processing at most len bytes of data. More...
 
template<typename Chunk , typename algo_to_use >
static void consume_pow2 (uint32_t &crc, const byte *&data, size_t len)
 Updates the crc checksum by processing Chunk (1,2 or 4 bytes) of data, but only when the len of the data provided, when decomposed into powers of two, has a Chunk of this length. More...
 
template<typename algo_to_use >
static uint32_t crc32 (uint32_t crc, const byte *data, size_t len)
 The hardware accelerated implementation of CRC32-C exploiting within-core parallelism on reordering processors, by consuming the data in large chunks split into 3 independent slices each. More...
 
uint32_t crc32_using_pclmul (const byte *data, size_t len)
 The specialization of crc32<> template for use_pclmul and 0 as initial value of the hash. More...
 
uint32_t crc32_using_unrolled_loop_poly_mul (const byte *data, size_t len)
 The specialization of crc32<> template for use_unrolled_loop_poly_mul and 0 as initial value of the hash. More...
 

Function Documentation

◆ can_use_crc32()

bool hardware::can_use_crc32 ( )

Checks if hardware accelerated crc32 instructions are available to this process right now.

◆ can_use_poly_mul()

bool hardware::can_use_poly_mul ( )

Checks if hardware accelerated polynomial multiplication instructions are available to this process right now.

◆ compute_x_to_8len()

constexpr uint32_t hardware::compute_x_to_8len ( size_t  len)
constexpr

Computes x^(len*8) modulo CRC32-C polynomial, which is useful, when you need to conceptually append len bytes of zeros to already computed hash.

Parameters
[in]lenThe value of len in the x^(len*8) mod CRC32-C
Returns
the rest of x^(len*8) mod CRC32-C, with the most significant coefficient of the resulting rest - the one which is for x^31 - stored as the most significant bit of the uint32_t (the one at 1U<<31)

◆ consume_chunk()

template<size_t slice_len, size_t slices_count, typename algo_to_use >
static uint32_t hardware::consume_chunk ( uint32_t  crc0,
const unsigned char *  data 
)
inlinestatic

Updates the crc checksum by processing slices_count*slice_len bytes of data.

The chunk is processed as slice_count independent slices of length slice_len, and the results are combined together at the end to compute correct result.

Parameters
[in]crc0initial value of the hash
[in]datadata over which to calculate CRC32-C
Returns
The value of _crc updated by processing the range data[0]...data[slices_count*slice_len-1].

◆ consume_chunks()

template<size_t slice_len, size_t slices_count, typename algo_to_use >
static void hardware::consume_chunks ( uint32_t &  crc,
const byte *&  data,
size_t &  len 
)
inlinestatic

Updates the crc checksum by processing at most len bytes of data.

The data is consumed in chunks of size slice_len*slices_count, and stops when no more full chunks can be fit into len bytes. Each chunk is processed as slice_count independent slices of length slice_len, and the results are combined together at the end to compute correct result.

Parameters
[in,out]crcinitial value of the hash. Updated by this function by processing data[0]...data[B*(slice_len * slices_count)], where B = floor(len / (slice_len * slices_count)).
[in,out]datadata over which to calculate CRC32-C. Advanced by this function to point to unprocessed part of the buffer.
[in,out]lendata length to be processed. Updated by this function to be len % (slice_len * slices_count).

◆ consume_pow2()

template<typename Chunk , typename algo_to_use >
static void hardware::consume_pow2 ( uint32_t &  crc,
const byte *&  data,
size_t  len 
)
inlinestatic

Updates the crc checksum by processing Chunk (1,2 or 4 bytes) of data, but only when the len of the data provided, when decomposed into powers of two, has a Chunk of this length.

This is used to process the prefix of the buffer to get to the position which is aligned mod 8, and to process the remaining suffix which starts at position aligned mod 8, but has less than 8 bytes.

Parameters
[in,out]crcinitial value of the hash. Updated by this function by processing Chunk pointed by data.
[in,out]datadata over which to calculate CRC32-C. Advanced by this function to point to unprocessed part of the buffer.
[in,out]lendata length, allowed to be processed.

◆ crc32()

template<typename algo_to_use >
static uint32_t hardware::crc32 ( uint32_t  crc,
const byte data,
size_t  len 
)
static

The hardware accelerated implementation of CRC32-C exploiting within-core parallelism on reordering processors, by consuming the data in large chunks split into 3 independent slices each.

It's optimized for handling buffers of length typical for 16kb pages and redo log blocks, but it works correctly for any len and alignment.

Parameters
[in]crcinitial value of the hash (0 for first block, or the result of CRC32-C for the data processed so far)
[in]datadata over which to calculate CRC32-C
[in]lendata length
Returns
CRC-32C (polynomial 0x11EDC6F41)

◆ crc32_using_pclmul()

uint32_t hardware::crc32_using_pclmul ( const byte data,
size_t  len 
)

The specialization of crc32<> template for use_pclmul and 0 as initial value of the hash.

Used on platforms which support hardware accelerated polynomial multiplication. It's non-static so it can be unit-tested.

Parameters
[in]datadata over which to calculate CRC32-C
[in]lendata length
Returns
CRC-32C (polynomial 0x11EDC6F41)

◆ crc32_using_unrolled_loop_poly_mul()

uint32_t hardware::crc32_using_unrolled_loop_poly_mul ( const byte data,
size_t  len 
)

The specialization of crc32<> template for use_unrolled_loop_poly_mul and 0 as initial value of the hash.

Used on platforms which do not support hardware accelerated polynomial multiplication. It's non-static so it can be unit-tested.

Parameters
[in]datadata over which to calculate CRC32-C
[in]lendata length
Returns
CRC-32C (polynomial 0x11EDC6F41)

◆ flip_at_32()

constexpr uint64_t hardware::flip_at_32 ( uint32_t  w)
constexpr

Produces a 64-bit result by moving i-th bit of 32-bit input to the 32-i-th position (zeroing the other bits).

Please note that in particular this moves 0-th bit to 32-nd, and 31-st bit to 1-st, so the range in which data resides is not only mirrored, but also shifted one bit. Such operation is useful for implementing polynomial multiplication when one of the operands is given in reverse and we need the result reversed, too (as is the case in CRC32-C): rev(w * v) = rev(w)*flip_at_32(v) proof: rev(w * v)[i] = (w * v)[63-i] = sum(0<=j<=31){w[j]*v[63-i-j]} = sum(0<=j<=31){rev(w)[31-j]*v[63-i-j]} = sum(0<=j<=31){rev(w)[31-j]*flip_at_32(v)[32-63+i+j]} = sum(0<=j<=31){rev(w)[31-j]*flip_at_32(v)[i-(j-31)]} = sum(0<=j<=31){rev(w)[j]*flip_at_32(v)[i-j]} = rev(w)*flip_at_32(v)[i] So, for example, if crc32=rev(w) is the variable storing the CRC32-C hash of a buffer, and you want to conceptually append len bytes of zeros to it, then you can precompute v = compute_x_to_8len(len), and are interested in rev(w*v), which you can achieve by crc32 * flip_at_32(compute_x_to_8len(len)).

Parameters
[in]wThe input 32-bit polynomial
Returns
The polynomial flipped and shifted, so that i-th bit becomes 32-i-th.

◆ fold_64_to_32()

template<typename algo_to_use >
static uint32_t hardware::fold_64_to_32 ( uint64_t  big)
inlinestatic

Takes a 64-bit reversed representation of a polynomial, and computes the 32-bit reversed representation of it modulo CRC32-C.

Parameters
[in]bigThe 64-bit representation of polynomial w, with the most significant coefficient (the one for x^63) stored at least significant bit (the one at 1<<0).
Returns
The 32-bit representation of w mod CRC-32, in which the most significant coefficient (the one for x^31) stored at least significant bit (the one at 1<<0).

◆ roll()

template<size_t len, typename algo_to_use >
static uint64_t hardware::roll ( uint32_t  crc)
inlinestatic

Rolls the crc forward by len bytes, that is updates it as if 8*len zero bits were processed.

Parameters
[in]crcinitial value of the hash
Returns
Updated value of the hash: rev(rev(crc)*(x^{8*len} mod CRC32-C))