|
struct | Combination_step_executor |
| The body of the unrolled loop used to combine the partial results from each slice into the final hash of the whole chunk. In the i-th iteration it takes the crc of the i-th slice and "rolls it forward" by virtually processing as many zeros as there are from the end of the i-th slice to the end of the chunk.
|
|
struct | crc32_impl |
| The collection of functions implementing hardware-accelerated updating of the CRC32-C hash by processing a few (1, 2, 4 or 8) bytes of input.
|
|
struct | Loop |
| A helper template to statically unroll a loop with a fixed number of iterations, where the iteration number itself is constexpr.
|
|
struct | Loop< 0 > |
|
struct | Update_step_executor |
| The body of the unrolled loop used to process slices in parallel. In the i-th iteration it processes 8 bytes from the i-th slice of data, where each slice has slice_len bytes.
|
|
struct | use_pclmul |
| Implementation of the polynomial_mul_rev<w>(rev_u) function which uses hardware-accelerated polynomial multiplication to compute rev(w*u), where rev_u=rev(u).
|
|
struct | use_unrolled_loop_poly_mul |
| Implementation of the polynomial_mul_rev<w>(rev_u) function which uses a simple loop over i: if(w>>i&1)result^=rev_u<<(32-i), which is equivalent to w * flip_at_32(rev_u), which in turn is equivalent to rev(rev(w) * rev_u).
|
|
|
bool | can_use_crc32 () |
| Checks if hardware-accelerated crc32 instructions are available to this process right now.
|
|
bool | can_use_poly_mul () |
| Checks if hardware-accelerated polynomial multiplication instructions are available to this process right now.
|
|
constexpr uint32_t | compute_x_to_8len (size_t len) |
| Computes x^(len*8) modulo the CRC32-C polynomial, which is useful when you need to conceptually append len bytes of zeros to an already computed hash.
|
|
constexpr uint64_t | flip_at_32 (uint32_t w) |
| Produces a 64-bit result by moving the i-th bit of the 32-bit input to the (32-i)-th position (zeroing the other bits).
|
|
template<size_t len, typename algo_to_use > |
static uint64_t | roll (uint32_t crc) |
| Rolls the crc forward by len bytes, that is, updates it as if 8*len zero bits were processed.
|
|
template<typename algo_to_use > |
static uint32_t | fold_64_to_32 (uint64_t big) |
| Takes a 64-bit reversed representation of a polynomial and computes the 32-bit reversed representation of it modulo the CRC32-C polynomial.
|
|
template<size_t slice_len, size_t slices_count, typename algo_to_use > |
static uint32_t | consume_chunk (uint32_t crc0, const unsigned char *data) |
| Updates the crc checksum by processing slices_count*slice_len bytes of data.
|
|
template<size_t slice_len, size_t slices_count, typename algo_to_use > |
static void | consume_chunks (uint32_t &crc, const byte *&data, size_t &len) |
| Updates the crc checksum by processing at most len bytes of data.
|
|
template<typename Chunk , typename algo_to_use > |
static void | consume_pow2 (uint32_t &crc, const byte *&data, size_t len) |
| Updates the crc checksum by processing a Chunk (1, 2 or 4 bytes) of data, but only when len, decomposed into powers of two, contains a term equal to the size of Chunk.
|
|
template<typename algo_to_use > |
static uint32_t | crc32 (uint32_t crc, const byte *data, size_t len) |
| The hardware-accelerated implementation of CRC32-C, which exploits within-core parallelism on out-of-order processors by consuming the data in large chunks, each split into 3 independent slices.
|
|
uint32_t | crc32_using_pclmul (const byte *data, size_t len) |
| The specialization of the crc32<> template for use_pclmul, with 0 as the initial value of the hash.
|
|
uint32_t | crc32_using_unrolled_loop_poly_mul (const byte *data, size_t len) |
| The specialization of the crc32<> template for use_unrolled_loop_poly_mul, with 0 as the initial value of the hash.
|
|
template<size_t slice_len, size_t slices_count, typename algo_to_use >
static uint32_t hardware::consume_chunk |
( |
uint32_t |
crc0, |
|
|
const unsigned char * |
data |
|
) |
| |
|
inline static
Updates the crc checksum by processing slices_count*slice_len bytes of data.
The chunk is processed as slices_count independent slices of length slice_len, and the partial results are combined at the end to produce the correct result.
- Parameters
-
[in] | crc0 | initial value of the hash |
[in] | data | data over which to calculate CRC32-C |
- Returns
- The value of crc0 updated by processing the range data[0]...data[slices_count*slice_len-1].
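The slice-and-combine idea described above can be illustrated with a software-only sketch. It assumes a bitwise stand-in for the hardware crc32 instruction and rolls each slice forward by literally feeding zero bytes instead of using polynomial multiplication. All names ending in `_sketch`, plus `step_byte`, `raw_update` and `roll_by_zeros`, are hypothetical helpers introduced for illustration and are not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reflected form of the CRC32-C polynomial (software stand-in, assumption).
constexpr uint32_t kPolyReflected = 0x82F63B78U;

// Advances the raw CRC register by one byte (no initial/final inversion).
inline uint32_t step_byte(uint32_t crc, unsigned char byte) {
  crc ^= byte;
  for (int bit = 0; bit < 8; ++bit) {
    crc = (crc >> 1) ^ (kPolyReflected & (0U - (crc & 1U)));
  }
  return crc;
}

// Sequential update over len bytes; used here as the reference result.
inline uint32_t raw_update(uint32_t crc, const unsigned char *data, size_t len) {
  for (size_t i = 0; i < len; ++i) crc = step_byte(crc, data[i]);
  return crc;
}

// "Rolls" the register forward as if len zero bytes had been processed.
inline uint32_t roll_by_zeros(uint32_t crc, size_t len) {
  for (size_t i = 0; i < len; ++i) crc = step_byte(crc, 0);
  return crc;
}

// Each slice is hashed independently (the first one starts from crc0, the
// others from 0), rolled forward past the bytes that follow it in the chunk,
// and the rolled values are XOR-ed together.
template <size_t slice_len, size_t slices_count>
uint32_t consume_chunk_sketch(uint32_t crc0, const unsigned char *data) {
  uint32_t combined = 0;
  for (size_t i = 0; i < slices_count; ++i) {
    const uint32_t start = (i == 0) ? crc0 : 0U;
    const uint32_t slice_crc =
        raw_update(start, data + i * slice_len, slice_len);
    const size_t bytes_after = (slices_count - 1 - i) * slice_len;
    combined ^= roll_by_zeros(slice_crc, bytes_after);
  }
  return combined;
}

// The sliced result should match a plain sequential pass over the chunk.
inline void check_consume_chunk_sketch() {
  unsigned char buf[24];
  for (size_t i = 0; i < sizeof(buf); ++i) {
    buf[i] = static_cast<unsigned char>(i * 37 + 11);
  }
  const uint32_t sliced = consume_chunk_sketch<8, 3>(0xFFFFFFFFU, buf);
  const uint32_t sequential = raw_update(0xFFFFFFFFU, buf, sizeof(buf));
  assert(sliced == sequential);
}
```

The combination works because the raw CRC register update is linear over GF(2), so the contribution of each slice can be computed separately and XOR-ed in once it has been shifted to the end of the chunk.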
template<size_t slice_len, size_t slices_count, typename algo_to_use >
static void hardware::consume_chunks |
( |
uint32_t & |
crc, |
|
|
const byte *& |
data, |
|
|
size_t & |
len |
|
) |
| |
|
inline static
Updates the crc checksum by processing at most len bytes of data.
The data is consumed in chunks of size slice_len*slices_count, and processing stops when no more full chunks fit into the remaining len bytes. Each chunk is processed as slices_count independent slices of length slice_len, and the partial results are combined at the end to produce the correct result.
- Parameters
-
[in,out] | crc | initial value of the hash. Updated by this function by processing data[0]...data[B*(slice_len * slices_count)-1], where B = floor(len / (slice_len * slices_count)). |
[in,out] | data | data over which to calculate CRC32-C. Advanced by this function to point to unprocessed part of the buffer. |
[in,out] | len | data length to be processed. Updated by this function to be len % (slice_len * slices_count). |
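The in/out contract of the three parameters can be sketched as follows. This is a minimal illustration that assumes a bitwise software update (`raw_update`, a hypothetical helper) in place of the real per-chunk routine; it only shows how crc, data and len are expected to change, not how the library implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise raw CRC32-C register update (hypothetical software stand-in).
inline uint32_t raw_update(uint32_t crc, const unsigned char *data, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1U)));
    }
  }
  return crc;
}

// Consumes as many full chunks as fit into len bytes, then stops.
template <size_t slice_len, size_t slices_count>
void consume_chunks_sketch(uint32_t &crc, const unsigned char *&data,
                           size_t &len) {
  constexpr size_t chunk_size = slice_len * slices_count;
  while (len >= chunk_size) {
    crc = raw_update(crc, data, chunk_size);  // stand-in for consume_chunk
    data += chunk_size;                       // advance past the chunk
    len -= chunk_size;                        // ends as len % chunk_size
  }
}

// With 100 bytes and 24-byte chunks, 4 chunks (96 bytes) are consumed.
inline void check_consume_chunks_sketch() {
  unsigned char buf[100] = {};
  const unsigned char *p = buf;
  size_t len = sizeof(buf);
  uint32_t crc = 0xFFFFFFFFU;
  consume_chunks_sketch<8, 3>(crc, p, len);
  assert(p == buf + 96);
  assert(len == 4);
  assert(crc == raw_update(0xFFFFFFFFU, buf, 96));
}
```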
template<typename Chunk , typename algo_to_use >
static void hardware::consume_pow2 |
( |
uint32_t & |
crc, |
|
|
const byte *& |
data, |
|
|
size_t |
len |
|
) |
| |
|
inline static
Updates the crc checksum by processing a Chunk (1, 2 or 4 bytes) of data, but only when len, decomposed into powers of two, contains a term equal to the size of Chunk.
This is used to process the prefix of the buffer to get to the position which is aligned mod 8, and to process the remaining suffix which starts at position aligned mod 8, but has less than 8 bytes.
- Parameters
-
[in,out] | crc | initial value of the hash. Updated by this function by processing the Chunk pointed to by data. |
[in,out] | data | data over which to calculate CRC32-C. Advanced by this function to point to unprocessed part of the buffer. |
[in] | len | data length allowed to be processed. It is passed by value, so the caller's copy is not modified. |
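The power-of-two decomposition can be sketched like this, assuming the check is a simple bit test of len against the size of Chunk and using a bitwise software update (`raw_update`, hypothetical) instead of the hardware instruction. Calling the sketch for 1-, 2- and 4-byte chunks in turn consumes exactly len bytes for any len below 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise raw CRC32-C register update (hypothetical software stand-in).
inline uint32_t raw_update(uint32_t crc, const unsigned char *data, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1U)));
    }
  }
  return crc;
}

// Processes sizeof(Chunk) bytes only if that power of two is present in the
// binary decomposition of len; otherwise leaves crc and data untouched.
template <typename Chunk>
void consume_pow2_sketch(uint32_t &crc, const unsigned char *&data, size_t len) {
  if (len & sizeof(Chunk)) {
    crc = raw_update(crc, data, sizeof(Chunk));
    data += sizeof(Chunk);
  }
}

// len = 5 = 1 + 4: the 1-byte and 4-byte steps fire, the 2-byte step does not.
inline void check_consume_pow2_sketch() {
  const unsigned char buf[8] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'};
  const unsigned char *p = buf;
  uint32_t crc = 0xFFFFFFFFU;
  const size_t len = 5;
  consume_pow2_sketch<uint8_t>(crc, p, len);
  consume_pow2_sketch<uint16_t>(crc, p, len);
  consume_pow2_sketch<uint32_t>(crc, p, len);
  assert(p == buf + 5);
  assert(crc == raw_update(0xFFFFFFFFU, buf, 5));
}
```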
template<typename algo_to_use >
static uint32_t hardware::crc32 |
( |
uint32_t |
crc, |
|
|
const byte * |
data, |
|
|
size_t |
len |
|
) |
| |
|
static |
The hardware-accelerated implementation of CRC32-C, which exploits within-core parallelism on out-of-order processors by consuming the data in large chunks, each split into 3 independent slices.
It is optimized for buffer lengths typical of 16 KB pages and redo log blocks, but it works correctly for any len and alignment.
- Parameters
-
[in] | crc | initial value of the hash (0 for first block, or the result of CRC32-C for the data processed so far) |
[in] | data | data over which to calculate CRC32-C |
[in] | len | data length |
- Returns
- CRC-32C (polynomial 0x11EDC6F41)
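For cross-checking an accelerated implementation, a slow bitwise CRC32-C reference is often handy. The sketch below is such a reference, not the library's code. It assumes the common convention of inverting the crc on entry and on exit, which is what makes passing 0 for the first block and chaining the result for later blocks work; whether hardware::crc32 follows exactly this convention is an assumption here. The name `crc32c_reference` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reflected form of the CRC32-C polynomial 0x11EDC6F41.
constexpr uint32_t kCastagnoliReflected = 0x82F63B78U;

// Bit-at-a-time CRC32-C. Pass crc = 0 for the first block, or the value
// returned for the data processed so far.
inline uint32_t crc32c_reference(uint32_t crc, const unsigned char *data,
                                 size_t len) {
  crc = ~crc;  // assumed convention: invert on entry...
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (kCastagnoliReflected & (0U - (crc & 1U)));
    }
  }
  return ~crc;  // ...and on exit
}

inline void check_crc32c_reference() {
  const unsigned char msg[] = {'1', '2', '3', '4', '5', '6', '7', '8', '9'};
  // Standard CRC-32C check value for the ASCII string "123456789".
  assert(crc32c_reference(0, msg, sizeof(msg)) == 0xE3069283U);
  // Processing the same data in two calls gives the same result.
  const uint32_t first = crc32c_reference(0, msg, 4);
  assert(crc32c_reference(first, msg + 4, 5) == 0xE3069283U);
}
```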
constexpr uint64_t hardware::flip_at_32 |
( |
uint32_t |
w | ) |
|
|
constexpr |
Produces a 64-bit result by moving the i-th bit of the 32-bit input to the (32-i)-th position (zeroing the other bits).
Note in particular that this moves the 0-th bit to the 32-nd position and the 31-st bit to the 1-st, so the range in which the data resides is not only mirrored but also shifted by one bit. Such an operation is useful for implementing polynomial multiplication when one of the operands is given in reverse and the result is needed in reverse too (as is the case in CRC32-C):
rev(w * v) = rev(w)*flip_at_32(v)
Proof:
rev(w * v)[i] = (w * v)[63-i]
= sum(0<=j<=31){w[j]*v[63-i-j]}
= sum(0<=j<=31){rev(w)[31-j]*v[63-i-j]}
= sum(0<=j<=31){rev(w)[31-j]*flip_at_32(v)[32-63+i+j]}
= sum(0<=j<=31){rev(w)[31-j]*flip_at_32(v)[i-(j-31)]}
= sum(0<=j<=31){rev(w)[j]*flip_at_32(v)[i-j]}
= (rev(w)*flip_at_32(v))[i]
So, for example, if crc32=rev(w) is the variable storing the CRC32-C hash of a buffer and you want to conceptually append len bytes of zeros to it, you can precompute v = compute_x_to_8len(len). The quantity of interest is then rev(w*v), which you can obtain as crc32 * flip_at_32(compute_x_to_8len(len)).
- Parameters
-
[in] | w | The input 32-bit polynomial |
- Returns
- The polynomial flipped and shifted, so that the i-th bit becomes the (32-i)-th.
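The bit mapping and the zero-appending use case can be checked with a small software sketch. It assumes plain loops in place of hardware carry-less multiplication, and every name here (`flip_at_32_sketch`, `rev32`, `rev64`, `clmul`, `compute_x_to_8len_sketch`, `fold_64_to_32_sketch`, `roll_by_zero_bytes`, `append_zeros_sketch`) is a hypothetical helper written for illustration, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Moves bit i of w to bit 32-i of the 64-bit result.
constexpr uint64_t flip_at_32_sketch(uint32_t w) {
  uint64_t result = 0;
  for (int i = 0; i < 32; ++i) {
    if ((w >> i) & 1U) result |= uint64_t{1} << (32 - i);
  }
  return result;
}

// Plain bit reversal of a 32-bit and of a 64-bit word.
constexpr uint32_t rev32(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    if ((x >> i) & 1U) r |= uint32_t{1} << (31 - i);
  }
  return r;
}
constexpr uint64_t rev64(uint64_t x) {
  uint64_t r = 0;
  for (int i = 0; i < 64; ++i) {
    if ((x >> i) & 1U) r |= uint64_t{1} << (63 - i);
  }
  return r;
}

// Carry-less (GF(2) polynomial) multiplication; b must fit in 33 bits.
constexpr uint64_t clmul(uint32_t a, uint64_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 32; ++i) {
    if ((a >> i) & 1U) r ^= b << i;
  }
  return r;
}

// x^(8*len) modulo the CRC32-C polynomial, in the non-reversed representation.
constexpr uint32_t compute_x_to_8len_sketch(size_t len) {
  uint32_t v = 1;
  for (size_t i = 0; i < len * 8; ++i) {
    const bool overflow = (v & 0x80000000U) != 0;
    v <<= 1;
    if (overflow) v ^= 0x1EDC6F41U;
  }
  return v;
}

// One zero bit fed into the reflected CRC register (multiplication by x).
constexpr uint32_t step_zero_bit(uint32_t crc) {
  return (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1U)));
}

// Reduces a 64-bit reversed polynomial to 32 bits: the low half holds the
// coefficients of x^32..x^63, so it is multiplied by x^32 modulo the
// polynomial and XOR-ed with the high half.
constexpr uint32_t fold_64_to_32_sketch(uint64_t big) {
  uint32_t low = static_cast<uint32_t>(big);
  for (int bit = 0; bit < 32; ++bit) low = step_zero_bit(low);
  return low ^ static_cast<uint32_t>(big >> 32);
}

// Reference: feed 8*len zero bits one at a time.
constexpr uint32_t roll_by_zero_bytes(uint32_t crc, size_t len) {
  for (size_t i = 0; i < len * 8; ++i) crc = step_zero_bit(crc);
  return crc;
}

// Appends len zero bytes with a single multiplication, as in the text above.
constexpr uint32_t append_zeros_sketch(uint32_t crc, size_t len) {
  return fold_64_to_32_sketch(
      clmul(crc, flip_at_32_sketch(compute_x_to_8len_sketch(len))));
}

static_assert(flip_at_32_sketch(0x1U) == (uint64_t{1} << 32), "bit 0 -> 32");
static_assert(flip_at_32_sketch(0x80000000U) == 0x2U, "bit 31 -> 1");

inline void check_flip_at_32_sketch() {
  const uint32_t w = 0x12345678U;
  const uint32_t v = 0x9ABCDEF0U;
  // rev(w * v) = rev(w) * flip_at_32(v)
  assert(rev64(clmul(w, v)) == clmul(rev32(w), flip_at_32_sketch(v)));
  // One multiplication gives the same result as feeding zeros bit by bit.
  assert(append_zeros_sketch(0xDEADBEEFU, 5) ==
         roll_by_zero_bytes(0xDEADBEEFU, 5));
}
```

The bit-by-bit roll costs time proportional to len, while the multiplication costs the same regardless of len once x^(8*len) has been precomputed, which is why the latter suits fixed slice lengths known at compile time.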