MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
InnoDB Data Locking - Part 2 "Locks"

In InnoDB Data Locking – Part 1 “Introduction” we’ve described the difficulties Lock System tries to solve using metaphor of people trying to concurrently edit spreadsheets. While it might be useful metaphor to get some intuitions about the problem, to talk about solutions it helps to know at least a little about the “reality” this metaphor maps to. In this post I’ll talk about how statements we’ve seen before such as “Alice request Read access to file A but has to wait for Basil to release his Write rights first” map to InnoDB’s reality of tables, rows, locks, lock queues etc.

The metaphor

The metaphor used in previous post, was supposed to be like this:

  • file on a shared drive → database
  •  spreadsheet tab inside a file → table
  • row inside a spreadsheet → row
  • a column in a spreadsheet → column
  • person → client
  • person’s plan of action → transaction
  • access right request → a lock
  • access right → a lock mode
  • requesting an access right → locking

However, to make my story resemble what might happen in real life office it involved scenarios like “Alice plans to Read file A” which would translate to “Client A performs a transaction which starts by locking database A for share” which is something which doesn’t happen in real life in InnoDB, as transactions do not lock whole databases: that would be too heavy handed. To achieve more parallelism, in InnoDB, transactions request much more fine grained access – usually at the level of individual rows. However, a story like “Alice plans to Read row 2 in the first tab of fileA.ods” felt a bit too verbose to me for the purpose of previous blog post, which was meant to convey some general intuitions and not drown you with details. I hope, it doesn’t make the previous post “wrong”, it just makes it “silly” – nobody really locks entire databases to operate on a single cell, but if they did, then they would face the problems described there. Moreover, these problems are direct analogue of the problems you will face at the lower level of granularity.

What is a database “lock”?

One thing I found very confusing when familiarizing myself with database lingo, was that the word “lock” has a different meaning in databases than in programming.
In programming if you have “a lock” then it is a single object in memory stored under some address, and then there are multiple threads which try “to lock” it and either succeed or wait till they succeed. So there’s one lock per resource, and the action of “locking” is something a thread do and you can catch the moment it happens using a debugger, but there are no memory objects (other than call stacks) explicitly documenting the fact that a given thread tried or succeeded to get the lock.
In InnoDB above concept is called “a latch” to repurpose the word “lock” for something else entirely. In InnoDB’s Lock System, “a lock” is really more like “a request for a specific kind of access right to specific resource by a specific transaction”. So, for a particular resource, there might exist hundreds of “locks” if there are many transactions which request access to it, and there can even be more than one for a single transaction if it needed to access it using different lock modes. A “lock” can be waiting, or granted and documents an access right for a given transaction to a given resource. You can think about it as a piece of paper form, that you have to file to get permission for something, which waits for approval stamp in some official’s drawer and eventually gets granted, and serves as a certificate documenting your rights. So, a lock is more like a tuple documenting a state of a request:

for example (simplifying greatly) a Lock System could contain following locks related to single resource (row#2 in table ‘report’) at the same time:

One of the benefits of explicitly modelling who requested what as objects in memory is that you can inspect the situation by looking at those objects. In particular you can query performance_schema.data_locks table to see all the locks created by your active transactions within InnoDB engine:

I’ll explain exact meaning of all possible values in the LOCK_MODE column later in the “Record locks” section of the article, for now it suffices to have an intuition that (loosely speaking) S and X correspond to shared and exclusive.

(If you start to wonder about the paradox of keeping locks protecting access to tables inside yet another table, let me comfort you: this is not a real InnoDB table. There is some magic which makes it look like a table, but it is actually a result of scanning actual low-level data structures in server’s memory and presenting them as neat rows.)

Actually, these are just the explicit locks – for performance reasons InnoDB tries to avoid explicitly representing access rights which it can deduce implicitly from the state of a row itself. You see, each time a transaction modifies a row it puts own id in the header of the row to indicate that it was the last one to modify it – if this transaction has still not committed, then it implies that it still has exclusive access right to this record (it had to have it to modify the row, and locks in “two phase locking” are released only when committing) without wasting space for storing this information explicitly. Such implicit locks will not show up in performance_schema.data_locks (that would require a scan over the undo logs to identify all implicit locks). The most common reason implicit locks are created is an INSERToperation: successfully inserted rows are not visible to other transactions until the inserting transaction commits, and it is a common situation that a single transaction inserts many rows, so it is cheaper to not create explicit locks for newly inserted rows, instead just implicitly assume that the inserting transaction has exclusive access rights to all of them simply because its id is written in their headers. As will be explained in Part 3 “Deadlocks”, it is important to properly model and monitor who waits for who, so whenever the Lock System identifies that an implicit lock can be a reason that another transaction has to wait, it converts the implicit lock to an explicit lock, so that it can be properly analysed, monitored, reported etc. This is referred to as implicit-to-explicit conversion, and semantically doesn’t change anything – it merely changes the representation of the lock.

Table locks

The (confusing!) interaction with Server’s table locks

As said before, in InnoDB most of the locking happens at the granularity of a row. This increases opportunities for parallelism, because multiple transactions can work on disjoint sets of rows concurrently and the server is still able to pretend that one happened after the other in a serializable sequence. There are also table-level locks, which let you lock entire tables. These are rather rare sight, because of the way InnoDB is integrated with the Server. InnoDB sits below the Server, which also has its own locking mechanisms, so most of the time InnoDB doesn’t even know that a transaction has locked a table as it happens “above its head”. Frankly, I’m a bit torn if we should talk about table locks now: in some sense they are much simpler than record locks, on the other hand the way InnoDB and Server coordinate access to tables each trying to do it in its own way makes understanding of what has happened more complicated. In particular it means that performance_schema.data_locks does not report locks maintained by the Server itself. So, on a default configuration you’ll see some confusing things happening (I know, I know, there are no “confusing things”, just confused agents with wrong mental models. I hope to provide you with correct mental model soon), for example:

You might expect that your transaction has locked table t, but you can’t see any locks:

and moreover you can’t even see the transaction!

This is because by default the Server and InnoDB are configured in such a way, that Server will COMMIT the current transaction if you do LOCK TABLES. Indeed, even though we have not issued COMMIT instruction ourselves, any other client can already see the row we’ve INSERTed , which means the transaction is considered committed already (oops!):

The “LOCK TABLES mechanism” is somewhat separate from “transactions mechanism” and the default behaviour is to finish one before starting to play with another. You can think about LOCK TABLES + UNLOCK TABLES as bracketing a critical section, and you can think same way about BEGIN + COMMIT . But, by default you can’t interleave the two.

You can verify that the LOCK TABLES t READ worked by consulting a completely different table, named performance_schema.metadata_locks (note the “meta” in the name):

The architecture of MySQL is that Server and InnoDB are quite separate and I wouldn’t like to pretend I know much about Server’s internals. So, let me just say that this table shows locks taken by the Server, and that they indeed prevent other clients from trying to modify the table:

will wait, and you can verify it by:

Note however, that this has nothing to do with InnoDB Lock System: actually there is no ongoing transaction that InnoDB knows about at the moment:

And as soon as you try to start a transaction in con1 which did the LOCK TABLES

the insert in con3 will proceed and succeed:

(These 3 min is how long it took me to get to type BEGIN into con1 after I typed INSERT into con3)

So, it looks like starting a transaction with BEGIN implicitly does UNLOCK TABLES. Indeed: by default if you start to play with transactions you finish playing with locking tables.

One more thing to note here, is that con3 did not have to use any LOCK TABLES statement before issuing an INSERT and yet the mechanism of preventing con3 from action until con1 releases a table worked. This means that participating in this Servel-level table locking mechanism is obligatory and implicit, and not something you have to opt-in and can somehow miss. We say this locking is mandatory, not adivisory.

Also, please note, that the lock type required for INSERT statement was SHARED_WRITE which might sound confusing as up till now we usually equated “shared” with “reading” and “exclusive” with “write”. The correct interpretation here is that there might be more than one transaction editing rows, in the same table. So, they can share access to the table with each other, even though each of them wants to write, as long as they will write into distinct rows. So it is “shared” (at table level) and “write” (at row level) at the same time.

However, SHARED_WRITE conflicts with SHARED_READ_ONLY which also makes sense, because con1 wanted prevent any writes to the whole table. Such SHARED_READ_ONLY has “shared” in the name, because it can be compatible with other transactions which also want SHARED_READ_ONLY, as their interests align: they all want to prevent modifications.

Table locks in InnoDB (hopefully less confusing now!)

OK, so these were locks maintained by Server, but this series of blog posts is intended to talk about InnoDB’s Lock System. How can we create a table-level lock in InnoDB?

The documented trick is to taboo the word BEGIN (and its synonym START TRANSACTION  ) which implicitly cause UNLOCK TABLES. Instead, we will disable autocommit, so that it is implicit that all that we do is always a part of a transaction. We will be inside transaction without having to explicitly start it. Clever! (backward compatibility has its price, and value, I guess)

So, we now have what we wanted: an active InnoDB transaction (with id 3851 within InnoDB)  which is in possession of explicit table locks within InnoDB which correspond to the locks held by the corresponding Server thread (with id 49). Yes, the tables are locked twice: at the level of Server and InnoDB:

table lock held by Server layer lock held inside InnoDB engine
t SHARED_READ_ONLY S
t1 SHARED_NO_READ_WRITE X
Table Intention locks in InnoDB (hopefully least confusing)

Another way to acquire a table lock within InnoDB, which happens very often and without any LOCK TABLES or autocommit trickery, is when you simply try to read or modify some portion of the table, that is, when you try to SELECT a row, or UPDATE an existing row or INSERT a new row.

For example, assume we start a completely new transaction, and insert a new row:

To be able to even try to insert anything into to table t, this transaction will need to acquire a particular right to the table:

InnoDB’s IX corresponds to SHARED_WRITE at the Server layer, that we saw before. To continue this example, let’s say this transaction performs a read from t1:

Surprisingly, there is no table-level lock taken by InnoDB in this case. This makes some sense: the rows were not locked neither, because it was a non-locking select, and the query enjoyed protection at the Server level:

Behaviour is different, if you try to lock a part of the table for reading by performing a locking select (SELECT...FOR SHARE/UPDATE), such as:

This time, as we are trying to read and lock a part of the table t1, we request IS for it.

One thing that emerges here is a distinction between “whole table” and “part of the table” when we try to specify required access rights at the table level. You can imagine following combinations:

  • X → I want to be the only one who has access to the whole table
  • S → I want to be able assume that whole table is protected from modification
  • IX → I intend to modify some part of the table
  • IS → I intend to read some part of the table

(These names (X, S, IX, IS) is how InnoDB talks about table locks)

Let’s take a second to figure out which pairs of lock requests are compatible with each other, and which can not be granted at the same time, as if we were to design this ourselves. Such a compatibility relation between access rights can be neatly summarized in a form of a compatibility matrix which has one row for each possible access right you might want to request, and one column for each possible access right another transaction already holds:

↓my request \ held by other→ X S IX IS AUTO_INC
X ? ? ? ? ?
S ? ? ? ? ?
IX ? ? ? ? ?
IS ? ? ? ? ?
AUTO_INC ? ? ? ? ?

Let’s figure out how to feel in the above matrix, with ⌛(new request has to wait) and ✅ (new request can proceed).

Clearly X seems incompatible with anything else. S seems compatible with other S and IS, but it can not cope with another thread doing modifications even to a small part of the table, so it’s conflicts with IX.

↓my request \ held by other→ X S IX IS AUTO_INC
X
S ?
IX ? ? ?
IS ? ? ?
AUTO_INC ? ? ? ?

Should IX conflict with other IX or IS? No – the whole point of having such a granular system was to permit concurrent modifications to the table. Sure, we have to somehow ensure that two transactions do not modify conflicting set of rows, but this can be dealt with at the lower level of granularity, when they try to request access to individual rows. All the transaction requesting IX is asking for is “a permission to ask for access to rows in future”. This “asking for permission to ask” might sound stupid, but it serves at least two purposes:

  • we can save everybody trouble by quickly responding “nope, your IS have to wait because somebody has X locked the whole table” before transaction even starts to search for actual rows to access
  • a granted IS or IX lock is a clear sign that there is work going on inside the table, which any other transaction trying to lock a whole table will have to take into account, and thus it might have to wait till it finishes (“Danger, workers below!”)

One could imagine a different design, in which intention locks (IS and IX) do not exist, and each time a transaction tries to lock individual row, it first has to check if there are conflicting S or X table locks, and each time a transaction tries to X or S lock a table it has to check if there are any conflicting record-level locks first. One benefit of specifying intentions upfront is that in general it leads to less deadlocks (or exposes them sooner). Another is that if you think the design with “check if there are existing record-level locks first” through, you’ll realize you probably want to cache the answer to this question, avoid costly lookups, minimize synchronization effort to update this info, and have some sane way of reporting what is happening,… and you’ll end up with some equivalent of IS and IX locks (or at least keep track of their “count”).

So, we end up with following compatibility matrix:

↓my request \ held by other→ X S IX IS AUTO_INC
X
S
IX
IS
AUTO_INC

(I’ve put AUTO_INC locks not mentioned yet into this matrix to make it complete for future reference. I hope you now have enough intuitions to figure out yourself why AUTO_INC lock has to conflict with S and why it is slightly different from IX in that it conflicts with itself. SPOILER:
AUTO_INC is taken when inserting a row at the end of the table and assigning a  key to it via AUTO INCREMENT, so, naturally, care must be taken to synchronize two transactions doing this in parallel, so they do not end up with the same key
select the preceding white lines to see the spoiler)

Notice that this matrix has a nice property of being symmetric: if A conflicts with B, then also B conflicts with A.  We will see a matrix without this property when dealing with record-level locks, and you’ll learn to appreciate how soothing it is to be able to work with symmetric conflict relation and say things like “A and B conflict with each other” carelessly not specifying the direction.

Yet another perspective to look at table locks is to generalize it to an arbitrary hierarchy of nested scopes (say: datacenter > database > table > partition > index > row > field) and try to figure out a system where you can lock any of these scopes in a such a way that conflicts are discovered. Say, I want to drop a partition, while someone else is trying to make a snapshot of the whole database? How to model that to keep track of what is going on and figure out if someone should wait?  Our idea is to allow people to request X or S lock at the given lower level, only after they’ve acquired IX or IS (respectively) for all the levels above. So, to drop a partition, you definitely want X access right to it, but you first need IX to the table, and IX to the database and IX to the datacenter. And if someone wants to take a snapshot of a database, they need S access right to it, and IS to the datacenter. A conflict between S and IX at the level of the database is quickly detected and someone will have to wait. In InnoDB we only have only two levels of this hierarchy: tables and rows. (Actually, if you find this “nested scopes” analogy helpful, then you might enjoy a perspective in which “gap before the row” is also a scope,  S,GAP and X,GAP locks are “S locks on the level of gap”, and INSERT_INTENTION locks are like “IX locks at the level of gap”. Note the “INTENTION” in the name, it’s not a coincidence!)

Short note on AUTO_INC locks

They are different than anything else. There is a lot of special-case code and logic to make inserting a lot of rows as performant as possible. You may assume that whatever I wrote in this series doesn’t necessarily apply to them, unless I said so. For starters, they are often not taken at all – a short-lived latch protecting the sequence counter is acquired for duration of increment, and released ASAP. And if they are, they might get released at the end of the statement instead of being kept till the end of transaction. See AUTO_INCREMENT Handling in InnoDB in our Reference Manual for more details.

Record locks

As said earlier, most of the locking activity in InnoDB happens at the record level, but I find InnoDB table locks easier to explain as there are less possible lock modes (just 5: X, S, IS, IX, AUTO_INC) and the conflict relation is symmetric, which makes it easier to understand necessary concepts.

InnoDB is a huge piece of software, so necessarily one has to talk about some abstraction of what is going on, not to get drowned in details. Thus, please forgive me following huge oversimplification: we will imagine that a row in an index is just a point on an axis. That is, each index is modelled as a separate axis, and if you list the rows from the index in their ascending order you get some discrete set of points from left to right along this axis:

Could be conceptualized as:

Our mental image should consist of points and gaps between them:

The rightmost gap is a special one in that it is not before any real row. You can imagine a pseudo-record “at infinity”, which is larger than any other record, and thus the right-most gap is “before the pseudo-record”. (In real, not-oversimplified, InnoDB this problem happens within each data page: sometimes we need to talk about the gap after the last record on this particular page. Yes, this is conceptually the same gap as the gap before the first record on the next page. But, often we are in a situation where we don’t have access to this next page, yet need to somehow talk/identify/operate on this gap. Therefore each page in InnoDB has a supremum pseudo-record. There is some widespread misconception that “supremum pseudo-record” marks end of the whole index. No, there is one in each leaf of the index)

Even without knowing too much about how databases like InnoDB operate, we can guess, that sometimes the operation involves just the record, sometimes the gap before a record, and at yet another times we need to access both, the record and a gap. One way to model that, would be to consider records and gaps to be two different kinds of resources which you can lock independently. Current InnoDB implementation takes a different approach: there is just one resource for each point, but there are multiple kinds of access right you can request for it, and the access right specifies if you need the row, the gap or both parts. One benefit of this is that it is optimized for the most common case where you need both.

There are many different access rights defined currently in InnoDB, which are denoted in performance_schema.data_locks.lock_mode column by using following literals:

  • S,REC_NOT_GAP → shared access to the record itself
  • X,REC_NOT_GAP → exclusive access to the record itself
  • S,GAP → right to prevent anyone from inserting anything into the gap before the row
  • X,GAP → same as above. Yes, “S” and “X” are short for “shared” and “exclusive”, but given that the semantic of this access right is to “prevent insert from happening” several threads can all agree to prevent the same thing without any conflict, thus currently InnoDB treats S,GAP and X,GAP (or *,GAP locks, for short) the same way: as conflicting just with *,INSERT_INTENTION
  • S → is like a combination of S,REC_NOT_GAP and S,GAP at the same time. So it is a shared access right to the row, and prevents insert before it.
  • X → is like a combination of X,REC_NOT_GAP and X,GAP at the same time. So it is an exclusive access right to the row, and prevents insert before it.
  • X,GAP,INSERT_INTENTION → right to insert a new row into the gap before this row. Despite “X” in its name it is actually compatible with others threads trying to insert at the same time.
  • X,INSERT_INTENTION → conceptually same as above, but only happens for the “supremum pseudo-record” which is imaginary record “larger than any other record on the page” so that the gap “before” “it” is actually “gap after the last record”.

The above list is an implementation detail, subject to change in the future. What will probably remain is the idea that there are many “lock modes” and a set of rules to decide if a request for access in mode A has to wait for a transaction accessing the resource in mode B to finish. This can be given by a matrix similar to this:

↓requested \ held→ S,REC_NOT_GAP X,REC_NOT_GAP *,GAP S X *,INSERT_INTENTION
S,REC_NOT_GAP
X,REC_NOT_GAP
*,GAP
S
X
*,INSERT_INTENTION

Some things to notice:

  • nobody cares about already granted INSERT_INTENTION. This is because this access right is immediately “consumed” right after being granted: the transaction immediately inserts a new record into the database, which causes the gap before the (old) row to split into two gaps, so in some sense the old access right is no longer needed/valid and thus is ignored.
  • *,GAP locks are immediately granted no matter what. This is heavily used in “lock splitting” technique I will describe later
  • in particular, INSERT_INTENTION has to wait for *,GAP, but not the other way around – the conflict relation is not symmetrical!
  • INSERT_INTENTION has to wait for S, and S has to wait for X,REC_NOT_GAP, yet INSERT_INTENTION does not have to wait for X,REC_NOT_GAP – the conflict relation is not transitive!

Once again: these are implementation details, which might change in future versions. What matters is to realize that you can have a database engine with much more complex set of access rights than simply Read and Write and that the conflict relation between them can be arbitrary (not even symmetrical or transitive).

Would you like to know more?

This article is already rather long and low-level, but in case you’d like to know more implementation details you might want to read an auxiliary article InnoDB Data Locking – Part 2.5 “Locks” (Deeper dive) about:

  • Record Locks’ compression and how one lock_t struct represents many logical lock requests
  • The way performance_schema.data_locks handles LOCK_DATA column and why it sometimes shows NULL
  • Lock splitting, a new feature in MySQL 8.0.18 which helps  avoid some deadlocks
  • Locks on secondary indexes and how they are inferred from implicit locks

Or you can skip this low-level stuff, and go straight to the next article: InnoDB Data Locking – Part 3 “Deadlocks”.

Thank you for using MySQL!