MySQL 8.3.0
Source Code Documentation
Filesort_buffer Class Reference

Buffer used for storing records to be sorted. More...

#include <filesort_utils.h>

Public Member Functions

 Filesort_buffer ()
 
size_t sort_buffer (Sort_param *param, size_t num_input_rows, size_t max_output_rows)
 Sort me... More...
 
void reset ()
 Prepares the buffer for the next batch of records to process. More...
 
Bounds_checked_array< uchar > get_next_record_pointer (size_t min_size)
 Where should the next record be stored? More...
 
void commit_used_memory (size_t num_bytes)
 
bool preallocate_records (size_t num_records)
 Removes any existing rows and allocates num_records maximum-sized rows (call get_sorted_record() to get their pointers). More...
 
size_t max_size_in_bytes () const
 
size_t peak_memory_used () const
 The maximum amount of memory that has been allocated (counting both the sort buffer and the record pointers) since the last call to clear_peak_memory_used(). More...
 
void clear_peak_memory_used ()
 See peak_memory_used. More...
 
void set_max_size (size_t max_size, size_t record_length)
 Set the memory limit for the sort buffer before starting to add records. More...
 
void free_sort_buffer ()
 Frees all memory. More...
 
uchar ** get_sort_keys ()
 Get the list of record pointers as a contiguous array. More...
 
uchar * get_sorted_record (size_t ix)
 Gets sorted record number ix. More...
 
Bounds_checked_array< uchar > get_contiguous_buffer ()
 Clears all rows, then returns a contiguous buffer of maximum size. More...
 

Private Member Functions

bool allocate_block (size_t num_bytes)
 Allocate a new block with space for at least num_bytes bytes. More...
 
bool allocate_sized_block (size_t num_bytes)
 Allocate a new block of exactly num_bytes bytes and set it as the current block. More...
 
void update_peak_memory_used () const
 See m_peak_memory_used. More...
 
Filesort_buffer & operator= (const Filesort_buffer &rhs)=delete
 
Filesort_buffer & operator= (Filesort_buffer &&rhs)=default
 

Private Attributes

uchar * m_next_rec_ptr
 The next record will be inserted here. More...
 
uchar * m_current_block_end
 The limit of the current block, exclusive. More...
 
std::vector< unique_ptr_my_free< uchar[]> > m_blocks
 The memory blocks used for the actual data. More...
 
std::vector< uchar * > m_record_pointers
 Pointer to the beginning of each record. More...
 
size_t m_max_record_length
 Worst-case length of each record. More...
 
size_t m_max_size_in_bytes
 Maximum number of bytes we are allowed to allocate in all. More...
 
size_t m_current_block_size
 The size of the current memory block (m_blocks.back()), in bytes (or 0 if no block). More...
 
size_t m_space_used_other_blocks
 The total size of all blocks except the current one, not including record pointers. More...
 
size_t m_peak_memory_used {0}
 The largest amount of total memory we've been using since last call to clear_peak_memory_used(). More...
 

Detailed Description

Buffer used for storing records to be sorted.

The records are stored in a series of buffers that are allocated incrementally, growing 50% each time, similar to how a MEM_ROOT works. This allows the user to set a large maximum buffer size without getting huge allocations for sorting small result sets. It means that if you actually do use the entire buffer, there will be more allocations than with one large allocation up-front, but this is a worthwhile tradeoff (those allocations will tend to disappear into the cost of actually getting all the rows and sorting them).

In addition, Filesort_buffer stores a vector of pointers to the beginning of each record. It is these pointers that are actually sorted in filesort. If the records are small, this can add up to overhead on top of the amount of memory the user expected to use. We do take already allocated pointers into account when calculating how big a new block can be, so the amount of badness is bounded:

Assume that we have set maximum record size to infinity, but that in practice, records are about the smallest size possible (4-byte sort key plus 4-byte rowid) and that we are on a 64-bit system. Then, the worst possible overhead is that we use as much space on pointers as the size of the last (largest) block. We can look at the two possible extremes:

  • Smallest possible sort buffer (32 kB): 32 kB overhead.
  • A huge sort buffer (x kB): If the last block is y kB, the total size will be y + 2/3y + (2/3)²y + ... = 3y, which means the last block is 1/3 of the total size. Thus, pointer overhead will be no worse than 33%.

In most practical cases, it will be much better than this. In particular, even when allocating a block (where we don't yet know how many records will fit), we allow space for the record pointers we'd need given maximum-sized rows.

The buffer must be kept available for multiple executions of the same sort operation, so one can call reset() for reuse. Again similar to MEM_ROOT, this keeps the last (largest) block and discards the others.
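
As a rough illustration of the intended call sequence, here is a minimal sketch. It assumes a build inside the MySQL source tree, an already-configured Sort_param, and the array()/size() accessors of Bounds_checked_array; produce_next_record() and consume_record() are hypothetical stand-ins for the caller's row handling.

  #include "sql/filesort_utils.h"  // Filesort_buffer, Bounds_checked_array

  // Sketch only: fill the buffer with variable-length records, sort, read back.
  size_t fill_and_sort(Filesort_buffer *buf, Sort_param *param) {
    buf->set_max_size(256 * 1024, /*record_length=*/64);  // example limits

    size_t num_rows = 0;
    for (;;) {
      // Ask for space for one record; a null array means the limit was reached.
      Bounds_checked_array<uchar> dest = buf->get_next_record_pointer(64);
      if (dest.array() == nullptr) break;

      size_t used = produce_next_record(dest.array(), dest.size());  // hypothetical
      if (used == 0) break;           // no more input rows
      buf->commit_used_memory(used);  // keep only what was actually written
      ++num_rows;
    }

    // Sort the record pointers; the result is the row count after deduplication.
    size_t num_sorted = buf->sort_buffer(param, num_rows, num_rows);
    for (size_t i = 0; i < num_sorted; ++i)
      consume_record(buf->get_sorted_record(i));  // hypothetical

    buf->reset();  // keep the largest block around for the next batch
    return num_sorted;
  }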

Constructor & Destructor Documentation

◆ Filesort_buffer()

Filesort_buffer::Filesort_buffer ( )
inline

Member Function Documentation

◆ allocate_block()

bool Filesort_buffer::allocate_block ( size_t  num_bytes)
private

Allocate a new block with space for at least num_bytes bytes.

Returns
true if the allocation failed (including if m_max_size_in_bytes was exceeded).

◆ allocate_sized_block()

bool Filesort_buffer::allocate_sized_block ( size_t  num_bytes)
private

Allocate a new block of exactly num_bytes bytes and set it as the current block.

Does not check m_max_size_in_bytes.

Returns
true if the allocation failed

◆ clear_peak_memory_used()

void Filesort_buffer::clear_peak_memory_used ( )
inline

See peak_memory_used.

◆ commit_used_memory()

void Filesort_buffer::commit_used_memory ( size_t  num_bytes)
inline

◆ free_sort_buffer()

void Filesort_buffer::free_sort_buffer ( )

Frees all memory.

Unlike reset(), which keeps one block for future use, this actually releases all blocks. It is intended to release memory in an error situation, for final shutdown, or if even the largest block will not be large enough for future allocations.

You do not need to call this if you are destroying the object anyway.

◆ get_contiguous_buffer()

Bounds_checked_array< uchar > Filesort_buffer::get_contiguous_buffer ( )

Clears all rows, then returns a contiguous buffer of maximum size.

(This may or may not involve allocation.) This is for reusing the memory for merge buffers, which requires the memory to be a single contiguous chunk; one could in theory adjust merging to allow using multiple buffers like sorting does, but once we need to merge, that means we've hit disk anyway (or at the very least, need to talk to the OS' buffer cache), and the cost of a single allocation is small compared to I/O.

If you use this memory area, you cannot also use the Filesort_buffer to store sort records (get_next_record_pointer etc.); that would use the same memory.

Can return nullptr if allocation fails.
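
A short sketch of that reuse pattern, with run_merge_passes() standing in for the actual merge code and the array()/size() accessors of Bounds_checked_array assumed:

  // Sketch: hand the whole sort buffer to the merge step as one chunk.
  Bounds_checked_array<uchar> merge_area = buf->get_contiguous_buffer();
  if (merge_area.array() == nullptr) return true;  // allocation failed

  // While the merge owns this area, do not store sort records in buf.
  run_merge_passes(merge_area.array(), merge_area.size());  // hypothetical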

◆ get_next_record_pointer()

Bounds_checked_array< uchar > Filesort_buffer::get_next_record_pointer ( size_t  min_size)
inline

Where should the next record be stored?

If a block is returned, it is always at least "min_size" bytes long. If the returned block is not large enough for your purposes, call get_next_record_pointer() again with a larger value of min_size than the size you got back. Just increasing the size by one byte is fine; the class will still try to make exponentially larger blocks each time.

If there's no room for a record of the given size, returns nullptr.

After you've written data to the given record, call commit_used_memory() with the number of bytes you've actually written. This ensures it will not get reused for subsequent records.
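
For example, a caller that only discovers the final record length while packing it could follow the grow-and-retry pattern below; pack_record() is a hypothetical helper that returns false and reports the size it needs when the destination is too small.

  bool store_one_record(Filesort_buffer *buf) {
    size_t min_size = 32;  // initial guess
    for (;;) {
      Bounds_checked_array<uchar> dest = buf->get_next_record_pointer(min_size);
      if (dest.array() == nullptr) return true;  // memory limit reached

      size_t needed;
      if (pack_record(dest.array(), dest.size(), &needed)) {
        buf->commit_used_memory(needed);  // reserve what was actually written
        return false;                     // success
      }
      min_size = needed;  // too small: retry with the size the packer reported
    }
  }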

◆ get_sort_keys()

uchar ** Filesort_buffer::get_sort_keys ( )
inline

Get the list of record pointers as a contiguous array.

Will be invalidated by calling get_next_record_pointer() or otherwise changing the number of records.

◆ get_sorted_record()

uchar * Filesort_buffer::get_sorted_record ( size_t  ix)
inline

Gets sorted record number ix.

See also
get_sort_keys()

Only valid after the buffer has been sorted!

◆ max_size_in_bytes()

size_t Filesort_buffer::max_size_in_bytes ( ) const
inline

◆ operator=() [1/2]

Filesort_buffer & Filesort_buffer::operator= ( const Filesort_buffer &  rhs)
private delete

◆ operator=() [2/2]

Filesort_buffer & Filesort_buffer::operator= ( Filesort_buffer &&  rhs)
private default

◆ peak_memory_used()

size_t Filesort_buffer::peak_memory_used ( ) const
inline

The maximum amount of memory that has been allocated (counting both the sort buffer and the record pointers) since the last call to clear_peak_memory_used().

Note in particular that reset() and free_sort_buffer() do not zero this counter.
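
A sketch of how the counter can bracket a single batch (run_one_sort_batch() is a hypothetical driver):

  buf->clear_peak_memory_used();
  run_one_sort_batch(buf);                      // fills, sorts and reads the buffer
  size_t high_water = buf->peak_memory_used();  // sort buffer + record pointers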

◆ preallocate_records()

bool Filesort_buffer::preallocate_records ( size_t  num_records)

Removes any existing rows and allocates num_records maximum-sized rows (call get_sorted_record() to get their pointers).

This is somewhat more efficient than calling reset() and then get_next_record_pointer() repeatedly, as it guarantees that at most one allocation is needed.

Returns
true on memory allocation error, including if the allocated size would exceed max_size_in_bytes().
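
For fixed-size rows whose count is known up front, the intended usage looks roughly like this (fill_fixed_size_record() is a hypothetical writer):

  // Sketch: one up-front allocation for num_records maximum-sized rows.
  if (buf->preallocate_records(num_records))
    return true;  // allocation failed or would exceed max_size_in_bytes()

  for (size_t i = 0; i < num_records; ++i) {
    uchar *rec = buf->get_sorted_record(i);  // i-th preallocated row
    fill_fixed_size_record(rec, i);          // writes at most record_length bytes
  }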

◆ reset()

void Filesort_buffer::reset ( )

Prepares the buffer for the next batch of records to process.

◆ set_max_size()

void Filesort_buffer::set_max_size ( size_t  max_size,
size_t  record_length 
)
inline

Set the memory limit for the sort buffer before starting to add records.

If trying to allocate space for a new row (in get_next_record_pointer) would take us past the set limit, allocation will fail. Note that we can go a bit over this limit due to having to store record pointers; see the class comment.

Parameters
max_size  Maximum size of the sort buffer, in bytes.
record_length  Worst-case size of each record, in bytes.

◆ sort_buffer()

size_t Filesort_buffer::sort_buffer ( Sort_param *  param,
size_t  num_input_rows,
size_t  max_output_rows 
)

Sort me...

Returns
Number of records, after any deduplication

◆ update_peak_memory_used()

void Filesort_buffer::update_peak_memory_used ( ) const
private

See m_peak_memory_used.

Member Data Documentation

◆ m_blocks

std::vector<unique_ptr_my_free<uchar[]> > Filesort_buffer::m_blocks
private

The memory blocks used for the actual data.

◆ m_current_block_end

uchar* Filesort_buffer::m_current_block_end
private

The limit of the current block, exclusive.

◆ m_current_block_size

size_t Filesort_buffer::m_current_block_size
private

The size of the current memory block (m_blocks.back()), in bytes (or 0 if no block).

If nothing has been allocated from the block yet, the invariant m_next_rec_ptr + m_current_block_size == m_current_block_end holds.

◆ m_max_record_length

size_t Filesort_buffer::m_max_record_length
private

Worst-case length of each record.

◆ m_max_size_in_bytes

size_t Filesort_buffer::m_max_size_in_bytes
private

Maximum number of bytes we are allowed to allocate in all.

◆ m_next_rec_ptr

uchar* Filesort_buffer::m_next_rec_ptr
private

The next record will be inserted here.

◆ m_peak_memory_used

size_t Filesort_buffer::m_peak_memory_used {0}
mutable private

The largest amount of total memory we've been using since last call to clear_peak_memory_used().

This is updated lazily so that we don't need to do the calculations for every record (and thus is mutable). The only point where it must be explicitly updated (by calling update_peak_memory_used()), except when being asked for the value, is right before we deallocate memory, as otherwise, there could be a peak we had forgotten.

◆ m_record_pointers

std::vector<uchar *> Filesort_buffer::m_record_pointers
private

Pointer to the beginning of each record.

◆ m_space_used_other_blocks

size_t Filesort_buffer::m_space_used_other_blocks
private

The total size of all blocks except the current one, not including record pointers.

Used for bookkeeping how far away we are from reaching m_max_size_in_bytes.

