#include <trx0sys.h>

Classes
struct	Trx_track_hash

Public Member Functions
By_id const &	by_id () const

trx_id_t	min_id () const

trx_t *	get (trx_id_t trx_id) const

void	insert (trx_t &trx)

void	erase (trx_id_t trx_id)

Static Public Member Functions
static trx_id_t	get_cheap_lower_bound_for_already_active_id ()
	Returns a value which is lower or equal to id of any transaction for which insert(id) happened before the call started, and erase(id) has not happened before the start of the call. More...

static trx_id_t	get_better_lower_bound_for_already_active_id ()

Private Types
using	By_id = std::unordered_map< trx_id_t, trx_t *, Trx_track_hash >

Static Private Member Functions
static void	limit_to (std::atomic< trx_id_t > &a, trx_id_t upper_bound)
	Performs an equivalent of if(upper_bound < a) a=upper_bound atomically, ignoring, but preserving the UPDATING_LOWER_BOUND flag. More...

Private Attributes
By_id	m_by_id

std::atomic< trx_id_t >	m_min_id {TRX_ID_MAX}
	For observers which use Trx_shard::mutex protection: each transaction id in the m_by_id is guaranteed to be at least m_min_id. More...

Static Private Attributes
static std::atomic< trx_id_t >	s_lower_bound [2] = {0, 0}
	A "lower bound" is a value which is guaranteed to be smaller or equal than any id in any of the shards of transactions which finished calling insert(id) and have not yet started a call to erase(id). More...

static constexpr trx_id_t	UPDATING_LOWER_BOUND = trx_id_t{1} << 63

static std::atomic< bool >	s_updating_lower_bound {false}
	This is used during get_better_lower_bound_for_already_active_id() to announce that it is trying to establish new value for s_lower_bound. More...

Member Typedef Documentation

◆ By_id

using Trx_by_id_with_min::By_id = std::unordered_map<trx_id_t, trx_t *, Trx_track_hash>

private

Member Function Documentation

◆ by_id()

By_id const & Trx_by_id_with_min::by_id ( ) const

inline

◆ erase()

void Trx_by_id_with_min::erase ( trx_id_t trx_id )

inline

◆ get()

trx_t * Trx_by_id_with_min::get ( trx_id_t trx_id ) const

inline

◆ get_better_lower_bound_for_already_active_id()

trx_id_t Trx_by_id_with_min::get_better_lower_bound_for_already_active_id ( )

static

See also: get_cheap_lower_bound_for_already_active_id() from which this function differs by executing a tighter estimation. If it is indeed better, then as a side effect it will bump the value of s_lower_bound used by get_cheap_lower_bound_for_already_active_id()

◆ get_cheap_lower_bound_for_already_active_id()

static trx_id_t Trx_by_id_with_min::get_cheap_lower_bound_for_already_active_id ( )

inlinestatic

Returns a value which is lower or equal to id of any transaction for which insert(id) happened before the call started, and erase(id) has not happened before the start of the call.

See also: s_lower_bound Note that this value never increases unless someone calls; get_better_lower_bound_for_already_active_id()

◆ insert()

void Trx_by_id_with_min::insert ( trx_t & trx )

inline

◆ limit_to()

static void Trx_by_id_with_min::limit_to	(	std::atomic< trx_id_t > &	a,
		trx_id_t	upper_bound
	)

inlinestaticprivate

Performs an equivalent of if(upper_bound < a) a=upper_bound atomically, ignoring, but preserving the UPDATING_LOWER_BOUND flag.

Parameters

[in]	a	The atomic we want to limit to upper_bound
[in]	upper_bound	The upper_bound we want to impose on a

◆ min_id()

trx_id_t Trx_by_id_with_min::min_id ( ) const

inline

Member Data Documentation

◆ m_by_id

By_id Trx_by_id_with_min::m_by_id

private

◆ m_min_id

std::atomic<trx_id_t> Trx_by_id_with_min::m_min_id {TRX_ID_MAX}

private

For observers which use Trx_shard::mutex protection: each transaction id in the m_by_id is guaranteed to be at least m_min_id.

Writes are protected with Trx_shard::mutex. Reads can be performed without any latch before accessing m_by_id, but care must be taken to interpret the result -

See also: trx_rw_is_active for details.

◆ s_lower_bound

std::atomic< trx_id_t > Trx_by_id_with_min::s_lower_bound = {0, 0}

staticprivate

A "lower bound" is a value which is guaranteed to be smaller or equal than any id in any of the shards of transactions which finished calling insert(id) and have not yet started a call to erase(id).

I.e. a shard can already contain an id smaller than this value, if insert(id) has still not finished. This is sufficient guarantee, if you only care about "active" transactions in the sense that insert(id) for them has happened before the call and erase(id) hasn't started. For example, when you want to check if a record you look at could have been modified by any of active transactions, then this is a valid assumption as creating a record happens after insert(id).

Each of the two values in this array may be used as a lower bound if its highest bit (UPDATING_LOWER_BOUND) is not set. At most one of them has the highest bit set at any given time. If both of them can be used as a lower bound, it follows, that you can use maximum of the two, as the best lower bound estimate. Note that this value may be way lower than actual minimum, as it is only updated from time to time by get_better_lower_bound_for_already_active_id(). The highest bit being set for a given entry means it is currently undergoing the process of updating and its value should not be used until its finished.

Transactions performing insert(id) should ensure that for each i=0,1: (s_lower_bound[i]&~UPDATING_LOWER_BOUND) <= id

The property we wish to prove is:

Claim 1: The value returned by get_cheap_lower_bound_for_already_active_id() is lower or equal to any trx_id for which return from insert(trx->id) has happened-before the call to get_cheap_lower_bound_for_already_active_id() has started and a call to erase(trx->id) (if any at all) will happened-after the return from get_cheap_lower_bound_for_already_active_id().

Note how weak this property is: it's the burden of the person who wants to use this Claim 1 for anything, to first establish the happens-before relations by other means, and only then the Claim 1 can do any useful work at all. The reason we need only such a weak claim, is because in practice we use get_cheap_lower_bound_for_already_active_id() only when already having in our hands a record which is a proof of activity of some transaction which must have happened after it has already called insert(trx->id) successfully - this establishes the needed happens-before relation: insert(trx->id) happens before a write of the record, which happens-before our read, which happens before the call to get_cheap_lower_bound_for_already_active_id(). As for the quite strong requirement for erase() to happen-after, it is justified, because it is fine for get_cheap_lower_bound_for_already_active_id() to ignore a trx which is "in the middle" of erase(trx->id), as it means the trx is already committing, and the call to erase() happens under shard mutex and after the state was already changed to TRX_STATE_COMMITTED_IN_MEMORY under trx->mutex, so whoever else is interested in the question "is trx active?" and asks it under shard mutex or trx->mutex will arrive at "no", so if we answer "no" as well, there's no discrepancy, and if we answer "yes", then we err on the safe side which is fine as well, as the caller will then double check. Hence we don't care about cases where erase(trx->id) has already started.

The proof of Claim 1, depends on a simpler claim:

Claim 2: for any moment which happens-after insert(trx_id) and happens-before erase(trx_id), the value of s_lower_bound[i] (for each i) either has the UPDATING_LOWER_BOUND flag or is lower-or-equal to trx_id.

Claim 2 implies Claim 1, because get_cheap_lower_bound_for_already_active_id() returns a value which it loaded from s_lower_bound[i] and had no UPDATING_LOWER_BOUND flag.

In what follows, it is important that all atomic operations (load, store, compare_exchange_weak, fetch_xor) use memory_order::seq_cst ordering, so there is a single order S in which all of them happen, which is consistent with the happens-before relation (established in the Claim's assumptions, and by shard's mutex).

The proof of Claim 2 is constructive. For each trx shard separately, we use induction over the trx_id-s in the order they are insert()ed to the shard (which happen under shard.active_rw_trxs.mutex, so are ordered by happens-before, too).

The thread which does insert(trx_id) experiences following events in following order:

shard.m_min_id.load() Case A) saw shard.m_min_id <= trx_id and did nothing Case B) saw shard.m_min_id > trx_id and did:
stored shard.m_min_id = trx_id
saw that s_lower_bound[i] <= trx_id holds already, OR enforced this inequality by itself by modifying s_lower_bound[i] with CAS

As all operations on shard.m_min_id and s_lower_bound[i] are ordered in S, it is meaningful to look at the sorted list of all the operations from "2.", to the moment get_cheap_lower_bound_for_already_active_id() load()s the value from it. There are only few ways in which s_lower_bound[i] can be modified:

a) calls to limit_to(s_lower_bound[i], x) from insert(x) or get_better_lower_bound_for_already_active_id(). They can only make the value smaller than it was

b) s_lower_bound[index_to_update].store(min_seen |UPDATING_LOWER_BOUND) in get_better_lower_bound_for_already_active_id() which can make it (let's say: arbitrarily) larger, but also sets the flag

c) s_lower_bound[index_to_update].fetch_xor(UPDATING_LOWER_BOUND) which clears the flag, but doesn't change the lower 63 bits

So, we have a sequence of operations of these three types. Also (b)s which sets the flag and (c)s which clear it, appear in alternating fashion, so that it is meaningful to talk about periods when the flag is present and those where it is not present.

We will show that no mater which Case, A) or B) occured, Claim 2 will hold.

Case A) - the m_min_id was already small so we did nothing.

There are two interesting sub-cases to consider:

A.I) the "1." falls between (b) and (c), i.e. when the flag was present

Here, we need to distinguish two sub-sub-cases:

A.I.1) the "1." happens-before the second for() loop in get_cheap_lower_bound_for_already_active_id() does trx_sys->shards[i].active_rw_trxs.peek().min_id().

In this case, we are fine, because from moment "1." onwards it holds that m_min_id <= trx_id, and thus it will be noticed, and taken into account before (c) clears the flag. From then on, any later (a) can only make it smaller, and future (b) will also happen after shard.m_min_id was already as we need, so will take it into account.

A.I.2) the "1." happens-after the second for() loop loaded V from shard.m_min_id.

If V <= trx_id, everything is fine, as the final value published by (c) will be smaller or equal to trx_id. If V > trx_id, and we've just seen at "1." that it was <= trx_id, it means it somehow decreased between the load() done by the second for() loop, and load() done in "1." in insert(trx_id) by us. The only places which decrease it, are calls to insert(..), so it must be the case that another insert(trx_id') has happened for trx_id'<=trx_id which decreased the m_min_id, and executed the logic for limit_to(s_lower_bound[i],trx_id'), before finishing insert(trx_id') which happened before moment "1.", which in turn happened before (c), which means, that the lower 63 bits of s_lower_bound[i] are <= trx_id' <= trx_id already at moment "1.", and thus the value revealed by (c) will be fine.

A.II) the "1." falls outside of any (b)–(c) window, i.e. when the flag was missing

This is easy case, as at "1." we saw m_min_id <= trx_id, which means (thanks to shard's mutex) that insert(m_min_id) has happened-before "1." and erase(m_min_id) has not yet happened, so we can use inductive assumption, to show the s_shard_bound[i] had to be <= m_min_id at moment "1." as the flag was not present, and thus it is also <= trx_id, and will stay like that through all (a) operations, and any future (b) operation will happen after we've already ensured shard's m_min_id <= trx_id, so will be noticed by scan over shards, before doing (c) to clear the flag.

Case B) the m_min_id was too large, so we did "2." and "3."

There are two interesting sub-cases to consider:

B.I) the "3." falls between (b) and (c), i.e. when the flag was present

It means the lower 63 bits are already <= trx_id at moment "3.", and will stay so until (c), which will only clear the flag. Operation (a) also can't make it larger. So, it is only another (b) in future which could make it larger, but that future (b) will happen after "2." which ensured shard's m_min_id is <= trx_id, and any thread doing (b)–(c) end-to-end has to check all the shards, to take them into account.

B.II) the "3." falls outside of any (b)–(c) window, i.e. when the flag was missing

Here, similarly, at moment "3." the s_lower_bound[i] is <= trx_id, and subsequent (a)s can't violate it. Also, if (b) sets the flag in future, then it will only be cleared by (c) after scanning all shards, which will happen after moment "2.", so will take the shard's m_min_id into account.

So, Claim 2 holds in all these cases.

◆ s_updating_lower_bound

std::atomic< bool > Trx_by_id_with_min::s_updating_lower_bound {false}

staticprivate

This is used during get_better_lower_bound_for_already_active_id() to announce that it is trying to establish new value for s_lower_bound.

This value is false if no such process is under way, and changed to true by the only thread chosen to perform it, thus serves the purpose of "mutex". The reason we don't use an std::mutex, is that we don't wish to wait, nor spin, we just want to give up when somebody else already works on it.

◆ UPDATING_LOWER_BOUND

constexpr trx_id_t Trx_by_id_with_min::UPDATING_LOWER_BOUND = trx_id_t{1} << 63

staticconstexprprivate

The documentation for this class was generated from the following files:

storage/innobase/include/trx0sys.h
storage/innobase/trx/trx0sys.cc

Classes

Public Member Functions

Static Public Member Functions

Private Types

Static Private Member Functions

Private Attributes

Static Private Attributes

Member Typedef Documentation

◆ By_id

Member Function Documentation

◆ by_id()

◆ erase()

◆ get()

◆ get_better_lower_bound_for_already_active_id()

◆ get_cheap_lower_bound_for_already_active_id()

◆ insert()

◆ limit_to()

◆ min_id()

Member Data Documentation

◆ m_by_id

◆ m_min_id

◆ s_lower_bound

◆ s_updating_lower_bound

◆ UPDATING_LOWER_BOUND