One of the more frequently asked questions about the native JSON data type is: what size can a JSON document be? The short answer is that the maximum size is 1GB. However, JSON often changes how data modeling is done, and deserves a slightly longer response.
Background
When using JSON, schemas tend to be modeled just a little bit differently. To use an example:
- We have the requirement to store customers
- Customers have addresses (one or more)
Independent of JSON, there are two ways to store this. I will call them normalized and denormalized:
- In the normalized way you have a table called customers and another table called customer_addresses. The column customer_addresses.customer_id relates to customers.id.
- In the denormalized way, you have one table called customers. The customer has a set of columns named address_1, address_2, address_3, which store each address.
Okay, so the denormalized way is less common because it breaks 1NF, but what if I instead had a JSON column which accepted an array of addresses? This seems more natural, since it doesn’t require an arbitrary number of columns to be created in advance, and may become more common in usage. A sketch of both schemas follows below.
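To make this concrete, here is a minimal sketch of both approaches; the table and column names are my own illustration, not a prescribed design:

```sql
-- Normalized: addresses live in their own table,
-- one row per address, linked back by customer_id.
CREATE TABLE customers (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

CREATE TABLE customer_addresses (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  address     VARCHAR(255) NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers (id)
);

-- Denormalized with JSON: all addresses travel with the
-- customer row, as an array inside a single JSON column.
CREATE TABLE customers_json (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name      VARCHAR(100) NOT NULL,
  addresses JSON NOT NULL
);

INSERT INTO customers_json (name, addresses)
VALUES ('Acme Corp', '["123 Main St", "45 Harbour Rd"]');
```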
Which leads me to assert that JSON will lead to more denormalization, and this will generate larger rows on average. I think this is fine, given that the following considerations are respected:
Locking
JSON documents are stored in individual rows inside of MySQL, and the default InnoDB storage engine uses row-level locking.
That is to say that in the example above, there will be contention in the denormalized schema when attempting to update several addresses of the same customer at once.
This contention is a function of granularity, not size. By storing more information in the same row, we reduce the granularity of what can be isolated and locked.
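Here is a minimal sketch of that contention, reusing the hypothetical customers_json table from above (JSON_SET has been available since MySQL 5.7):

```sql
-- Session 1: update the first address; InnoDB locks the whole row.
START TRANSACTION;
UPDATE customers_json
   SET addresses = JSON_SET(addresses, '$[0]', '99 New St')
 WHERE id = 1;

-- Session 2: updating the *second* address of the same customer
-- blocks until session 1 commits, because both target the same row.
UPDATE customers_json
   SET addresses = JSON_SET(addresses, '$[1]', '7 Side Ave')
 WHERE id = 1;

-- In the normalized schema each address is its own row in
-- customer_addresses, so these two updates would not block each other.
```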
Storage
The JSON data type is stored similarly to how the BLOB/TEXT data types are stored in MySQL. In the default configuration[1], when the full row is less than about 8KB it will be stored on-page with the rest of the row. If the row is larger than 8KB, the JSON document may be moved to overflow space, with a pointer left in the row to locate it.
Whenever a JSON document is modified, the whole document is currently rewritten. This can create a form of write amplification, where making a small modification to a very large JSON document leads to a large write being performed.
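One way to get a feel for the potential amplification is to inspect how large your documents actually are. JSON_STORAGE_SIZE() (available from MySQL 5.7.22) reports the bytes used by the binary representation; the table name below is my illustration from earlier:

```sql
-- How many bytes does each customer's address document occupy?
-- A small JSON_SET against a multi-megabyte document still rewrites
-- the full value, so large numbers here imply amplified writes.
SELECT id,
       JSON_STORAGE_SIZE(addresses) AS doc_bytes
  FROM customers_json
 ORDER BY doc_bytes DESC
 LIMIT 10;
```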
Memory Efficiency
The InnoDB Buffer Pool (the main cache for data) stores content in pages. Being too denormalized can cause performance problems, because for all the “good rows” that you need to be stored in cache, all of this verbose auxiliary data is dragged along into the cache with them. You can think of this as a sort of cache dilution, where you may require more memory because you cannot pack as many rows per page as before[2].
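To observe this dilution, you can count how many buffer pool pages a table occupies via information_schema.INNODB_BUFFER_PAGE. Treat this as a diagnostic sketch: querying this table is expensive, so avoid running it frequently on a busy production server.

```sql
-- Pages held in the buffer pool per table; a denormalized table with
-- fat JSON rows tends to occupy more pages for the same row count.
SELECT TABLE_NAME,
       COUNT(*)       AS pages,
       SUM(DATA_SIZE) AS data_bytes
  FROM information_schema.INNODB_BUFFER_PAGE
 WHERE TABLE_NAME LIKE '%customers%'
 GROUP BY TABLE_NAME;
```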
Conclusion
There is a fixed maximum (1GB), but the ideal size is going to be well below this value. If you need a number, my recommendation is usually to try to keep JSON documents to a few MB in size. This keeps the locking, write-amplification, and cache-efficiency trade-offs described above manageable.
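If you want to enforce such a budget at write time, one option is a CHECK constraint (assuming MySQL 8.0.16+, where CHECK constraints are enforced). This is a sketch against my illustrative table, and the 4MB limit is an arbitrary example:

```sql
-- Reject address documents larger than ~4MB at write time.
ALTER TABLE customers_json
  ADD CONSTRAINT chk_addresses_size
  CHECK (JSON_STORAGE_SIZE(addresses) < 4 * 1024 * 1024);
```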
[1] The page size, and whether compression is used, can change this behavior.
[2] The same problem can actually happen with a normalized schema too, since splitting data across two tables means at least two distinct pages need to be in memory. This would be exacerbated if the hot spots in the data are very widely distributed, since even though more rows are packed per page in a normalized schema, only a few of those rows actually needed to be in cache.