Documentation Home
MySQL 5.7 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 35.7Mb
PDF (A4) - 35.8Mb
PDF (RPM) - 34.8Mb
EPUB - 8.7Mb
HTML Download (TGZ) - 8.5Mb
HTML Download (Zip) - 8.5Mb
HTML Download (RPM) - 7.3Mb
Eclipse Doc Plugin (TGZ) - 9.3Mb
Eclipse Doc Plugin (Zip) - 11.5Mb
Man Pages (TGZ) - 203.7Kb
Man Pages (Zip) - 309.0Kb
Info (Gzip) - 3.4Mb
Info (Zip) - 3.4Mb
Excerpts from this Manual

MySQL 5.7 Reference Manual  /  Data Types  /  Data Type Storage Requirements

12.8 Data Type Storage Requirements

The storage requirements for table data on disk depend on several factors. Different storage engines represent data types and store raw data differently. Table data might be compressed, either for a column or an entire row, complicating the calculation of storage requirements for a table or column.

Despite differences in storage layout on disk, the internal MySQL APIs that communicate and exchange information about table rows use a consistent data structure that applies across all storage engines.

This section includes guidelines and information for the storage requirements for each data type supported by MySQL, including the internal format and size for storage engines that use a fixed-size representation for data types. Information is listed by category or storage engine.

The internal representation of a table has a maximum row size of 65,535 bytes, even if the storage engine is capable of supporting larger rows. This figure excludes BLOB or TEXT columns, which contribute only 9 to 12 bytes toward this size. For BLOB and TEXT data, the information is stored internally in a different area of memory than the row buffer. Different storage engines handle the allocation and storage of this data in different ways, according to the method they use for handling the corresponding types. For more information, see Chapter 16, Alternative Storage Engines, and Section C.10.4, “Limits on Table Column Count and Row Size”.

Storage Requirements for InnoDB Tables

See Section 15.8.3, “Physical Row Structure of InnoDB Tables” for information about storage requirements for InnoDB tables.

Storage Requirements for NDB Tables

Important

NDB tables use 4-byte alignment; all NDB data storage is done in multiples of 4 bytes. Thus, a column value that would typically take 15 bytes requires 16 bytes in an NDB table. For example, in NDB tables, the TINYINT, SMALLINT, MEDIUMINT, and INTEGER (INT) column types each require 4 bytes storage per record due to the alignment factor.

Each BIT(M) column takes M bits of storage space. Although an individual BIT column is not 4-byte aligned, NDB reserves 4 bytes (32 bits) per row for the first 1-32 bits needed for BIT columns, then another 4 bytes for bits 33-64, and so on.

While a NULL itself does not require any storage space, NDB reserves 4 bytes per row if the table definition contains any columns defined as NULL, up to 32 NULL columns. (If a MySQL Cluster table is defined with more than 32 NULL columns up to 64 NULL columns, then 8 bytes per row are reserved.)

Every table using the NDB storage engine requires a primary key; if you do not define a primary key, a hidden primary key is created by NDB. This hidden primary key consumes 31-35 bytes per table record.

You can use the ndb_size.pl Perl script to estimate NDB storage requirements. It connects to a current MySQL (not MySQL Cluster) database and creates a report on how much space that database would require if it used the NDB storage engine. See Section 19.4.25, “ndb_size.pl — NDBCLUSTER Size Requirement Estimator” for more information.

Storage Requirements for Numeric Types

Data TypeStorage Required
TINYINT1 byte
SMALLINT2 bytes
MEDIUMINT3 bytes
INT, INTEGER4 bytes
BIGINT8 bytes
FLOAT(p)4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53
FLOAT4 bytes
DOUBLE [PRECISION], REAL8 bytes
DECIMAL(M,D), NUMERIC(M,D)Varies; see following discussion
BIT(M)approximately (M+7)/8 bytes

Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the leftover digits require some fraction of four bytes. The storage required for excess digits is given by the following table.

Leftover DigitsNumber of Bytes
00
11
21
32
42
53
63
74
84

Storage Requirements for Date and Time Types

For TIME, DATETIME, and TIMESTAMP columns, the storage required for tables created before MySQL 5.6.4 differs from tables created from 5.6.4 on. This is due to a change in 5.6.4 that permits these types to have a fractional part, which requires from 0 to 3 bytes.

Data TypeStorage Required Before MySQL 5.6.4Storage Required as of MySQL 5.6.4
YEAR1 byte1 byte
DATE3 bytes3 bytes
TIME3 bytes3 bytes + fractional seconds storage
DATETIME8 bytes5 bytes + fractional seconds storage
TIMESTAMP4 bytes4 bytes + fractional seconds storage

As of MySQL 5.6.4, storage for YEAR and DATE remains unchanged. However, TIME, DATETIME, and TIMESTAMP are represented differently. DATETIME is packed more efficiently, requiring 5 rather than 8 bytes for the nonfractional part, and all three parts have a fractional part that requires from 0 to 3 bytes, depending on the fractional seconds precision of stored values.

Fractional Seconds PrecisionStorage Required
00 bytes
1, 21 byte
3, 42 bytes
5, 63 bytes

For example, TIME(0), TIME(2), TIME(4), and TIME(6) use 3, 4, 5, and 6 bytes, respectively. TIME and TIME(0) are equivalent and require the same storage.

For details about internal representation of temporal values, see MySQL Internals: Important Algorithms and Structures.

Storage Requirements for String Types

In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.

Data TypeStorage Required
CHAR(M)M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set. See Section 15.8.3, “Physical Row Structure of InnoDB Tables” for information about CHAR data type storage requirements for InnoDB tables.
BINARY(M)M bytes, 0 <= M <= 255
VARCHAR(M), VARBINARY(M)L + 1 bytes if column values require 0 − 255 bytes, L + 2 bytes if values may require more than 255 bytes
TINYBLOB, TINYTEXTL + 1 bytes, where L < 28
BLOB, TEXTL + 2 bytes, where L < 216
MEDIUMBLOB, MEDIUMTEXTL + 3 bytes, where L < 224
LONGBLOB, LONGTEXTL + 4 bytes, where L < 232
ENUM('value1','value2',...)1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum)
SET('value1','value2',...)1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum)

Variable-length string types are stored using a length prefix plus data. The length prefix requires from one to four bytes depending on the data type, and the value of the prefix is L (the byte length of the string). For example, storage for a MEDIUMTEXT value requires L bytes to store the value plus three bytes to store the length of the value.

To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multibyte characters. In particular, when using a utf8 Unicode character set, you must keep in mind that not all characters use the same number of bytes. utf8mb3 and utf8mb4 character sets can require up to three and four bytes per character, respectively. For a breakdown of the storage used for different categories of utf8mb3 or utf8mb4 characters, see Section 11.1.9, “Unicode Support”.

VARCHAR, VARBINARY, and the BLOB and TEXT types are variable-length types. For each, the storage requirements depend on these factors:

  • The actual length of the column value

  • The column's maximum possible length

  • The character set used for the column, because some character sets contain multibyte characters

For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).

The effective maximum number of bytes that can be stored in a VARCHAR or VARBINARY column is subject to the maximum row size of 65,535 bytes, which is shared among all columns. For a VARCHAR column that stores multibyte characters, the effective maximum number of characters is less. For example, utf8mb3 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8mb3 character set can be declared to be a maximum of 21,844 characters. See Section C.10.4, “Limits on Table Column Count and Row Size”.

InnoDB encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.

The NDB storage engine supports variable-width columns. This means that a VARCHAR column in a MySQL Cluster table requires the same amount of storage as would any other storage engine, with the exception that such values are 4-byte aligned. Thus, the string 'abcd' stored in a VARCHAR(50) column using the latin1 character set requires 8 bytes (rather than 5 bytes for the same column value in a MyISAM table).

TEXT and BLOB columns are implemented differently in the NDB storage engine, wherein each row in a TEXT column is made up of two separate parts. One of these is of fixed size (256 bytes), and is actually stored in the original table. The other consists of any data in excess of 256 bytes, which is stored in a hidden table. The rows in this second table are always 2,000 bytes long. This means that the size of a TEXT column is 256 if size <= 256 (where size represents the size of the row); otherwise, the size is 256 + size + (2000 − (size − 256) % 2000).

The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values. See Section 12.4.4, “The ENUM Type”.

The size of a SET object is determined by the number of different set members. If the set size is N, the object occupies (N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes. A SET can have a maximum of 64 members. See Section 12.4.5, “The SET Type”.


User Comments
  Posted by Marc Bouffard on August 4, 2004
Had a lot of trouble finding the maximum table size in bytes for capacity planning. More specifically it was InnoDB tables that I had a problem with. Average row size is good, but I wanted maximum row size.

I checked several products and could not find what I wanted. Some of the tables I deal with are 300+ fields and so manual calculation was not practical.

So I wrote a little perl script that does it. Thought it might be of some use, so I include it here...it does all field types except enum/set types. It does not calculate anything regarding index size.

Just do a mysqldump -d (just the schema) of your DB to a file, and run this perl script specifying the schema file as the only argument.
----------------------------------------------------------------
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;

my %DataType =
("TINYINT"=>1,
"SMALLINT"=>2,
"MEDIUMINT"=>3,
"INT"=>4,
"BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3,
"DATETIME"=>8,
"TIMESTAMP"=>4,
"TIME"=>3,
"YEAR"=>1,
"CHAR"=>'$M',
"VARCHAR"=>'$M+1',
"TINYBLOB"=>'$M+1',
"TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2',
"TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3',
"MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4',
"LONGTEXT"=>'$M+4');

my $D;
my $M;
my $dt;

my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
open (TABLEFILE,"< $ARGV[0]");
LOGPARSE:while (<TABLEFILE>)
{
chomp;
if ( $_ =~ s/create table[ ]*([a-zA-Z_]*).*/$1/i )
{
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";

  Posted by Rich Tomasso on April 5, 2005
Here's a modification of Marc's script above that also handles ENUM's. Enjoy.

#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;

my %DataType =
("TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3,
"INT"=>4, "BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "TIME"=>3, "YEAR"=>1,
"CHAR"=>'$M', "VARCHAR"=>'$M+1',
"ENUM"=>1,
"TINYBLOB"=>'$M+1', "TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2', "TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3', "MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4', "LONGTEXT"=>'$M+4');

my ($D, $M, $dt);

my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;

open (TABLEFILE,"< $ARGV[0]");

LOGPARSE:while (<TABLEFILE>) {
chomp;
if ( $_ =~ s/create table[ ]`*([a-zA-Z_]*).*`/$1/i ) {
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|ENUM[ ]*\(*['A-Za-z_,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/\(.*\)//g;
$dt =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";

  Posted by Alex Gheorghiu on January 30, 2009
The above scripts are not taking into account several important information (so they are outdated)

1. the database/table encoding.
If you have an UTF8 encoding for a varchar(100) that it will take up 300 bytes (3 bytes per UTF symbol)
"[...]As of MySQL 4.1, to calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes and can require up to three bytes per character."

2. enum can have either 1 or 2 bytes
"[...]The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values."

  Posted by Rafał Michalski on February 19, 2009
Here I wrote another script based on Marc's, that takes into account what Alex wrote and more.
It calculates VARCHAR/CHAR/TEXT taking CHARSET or COLLATION into account, calculates properly SET and ENUM size, DECIMAL/NUMERIC is calculated according to >5.0.3 packed standard.
Calculates also least row byte size for dynamic row length tables.
It uses "mysql" and "mysqldump" tools internally.
Any argument to this script is provided as an argument for mysqldump.
Example: {scriptname} --all-databases
Please report any bug, especially when it comes to size calculations. Enjoy.

----------- copy here --------------
#!/usr/bin/perl
use strict;
$| = 1;

my %DataType = (
"TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3, "INT"=>4, "INTEGER"=>4, "BIGINT"=>8,
"FLOAT"=>'$M<=24?4:8', "DOUBLE"=>8,
"DECIMAL"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"NUMERIC"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"BIT"=>'($M+7)>>3',
"DATE"=>3, "TIME"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "YEAR"=>1,
"BINARY"=>'$M',"CHAR"=>'$M*$CL',
"VARBINARY"=>'$M+($M>255?2:1)', "VARCHAR"=>'$M*$CL+($M>255?2:1)',
"ENUM"=>'$M>255?2:1', "SET"=>'($M+7)>>3',
"TINYBLOB"=>9, "TINYTEXT"=>9,
"BLOB"=>10, "TEXT"=>10,
"MEDIUMBLOB"=>11, "MEDIUMTEXT"=>11,
"LONGBLOB"=>12, "LONGTEXT"=>12
);

my %DataTypeMin = (
"VARBINARY"=>'($M>255?2:1)', "VARCHAR"=>'($M>255?2:1)'
);

my ($D, $M, $S, $C, $L, $dt, $dp ,$bc, $CL);
my $fieldCount = 0;
my $byteCount = 0;
my $byteCountMin = 0;
my @fields = ();
my $fieldName;
my $tableName;
my $defaultDbCL = 1;
my $defaultTableCL = 1;
my %charsetMaxLen;
my %collationMaxLen;

open (CHARSETS, "mysql -B --skip-column-names information_schema -e 'select CHARACTER_SET_NAME,MAXLEN from CHARACTER_SETS;' |");
%charsetMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <CHARSETS>);
close CHARSETS;

open (COLLATIONS, "mysql -B --skip-column-names information_schema -e 'select COLLATION_NAME,MAXLEN from CHARACTER_SETS INNER JOIN COLLATIONS USING(CHARACTER_SET_NAME);' |");
%collationMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <COLLATIONS>);
close COLLATIONS;

open (TABLEINFO, "mysqldump -d --compact ".join(" ",@ARGV)." |");

while (<TABLEINFO>) {
chomp;
if ( ($S,$C) = /create database.*?`([^`]+)`.*default\scharacter\sset\s+(\w+)/i ) {
$defaultDbCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : 1;
print "Database: $S".($C?" DEFAULT":"").($C?" CHARSET $C":"")." (bytes per char: $defaultDbCL)\n\n";
next;
}
if ( /^create table\s+`([^`]+)`.*/i ) {
$tableName = $1;
@fields = ();
next;
}
if ( $tableName && (($C,$L) = /^\)(?:.*?default\scharset=(\w+))?(?:.*?collate=(\w+))?/i) ) {
$defaultTableCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultDbCL);
print "Table: $tableName".($C||$L?" DEFAULT":"").($C?" CHARSET $C":"").($L?" COLLATION $L":"")." (bytes per char: $defaultTableCL)\n";
$tableName = "";
$fieldCount = 0;
$byteCount = 0;
$byteCountMin = 0;
while ($_ = shift @fields) {
if ( ($fieldName,$dt,$dp,$M,$D,$S,$C,$L) = /\s\s`([^`]+)`\s+([a-z]+)(\((\d+)(?:,(\d+))?\)|\((.*)\))?(?:.*?character\sset\s+(\w+))?(?:.*?collate\s+(\w+))?/i ) {
$dt = uc $dt;
if (exists $DataType{$dt}) {
if (length $S) {
$M = ($S =~ s/(\'.*?\'(?!\')(?=,|$))/$1/g);
$dp = "($M : $S)"
}
$D = 0 if !$D;
$CL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultTableCL);
$bc = eval($DataType{$dt});
$byteCount += $bc;
$byteCountMin += exists $DataTypeMin{$dt} ? $DataTypeMin{$dt} : $bc;
} else {
$bc = "??";
}
$fieldName.="\t" if length($fieldName) < 8;
print "bytes:\t".$bc."\t$fieldName\t$dt$dp".($C?" $C":"").($L?" COLL $L":"")."\n";
++$fieldCount;
}
}
print "total:\t$byteCount".($byteCountMin!=$byteCount?"\tleast: $byteCountMin":"\t\t")."\tcolumns: $fieldCount\n\n";
next;
}
push @fields, $_;
}
close TABLEINFO;

  Posted by Jake Drew on May 26, 2011
It appears that TEXT fields with no length specified default to a length of 10 Bytes in your script output. However, information_schema.columns.character_maximum_length lists all my text fields as 65535?

ex:
bytes: 10 abstract TEXT COLL utf8_unicode_ci

Is this a space calculation bug in the script?
  Posted by Jake Drew on June 9, 2011
Here is an SQL script that can be used to determine maximum space per row for InnoDB tables using the COMPACT row format.

I have tested the results against my database structures loaded with maximum length records @ 100,000 , 500,000 , and 1,000,000 records. The results seem to be fairly accurate.

I based the maximum space calculations for fields using the following MySQL reference above. I based the calculations for InnoDB Compact row format primary and secondary index record headers using the following MySQL reference:

http://dev.mysql.com/doc/refman/5.1/en/innodb-physical-record.html

Notes:

The SQL produces all sizes in Bytes. If the SQL encounters an unknown data type, it assigns a byte value of 999999999999999 Bytes for that field. You must update TABLE_SCHEMA = 'Your Schema Name' in two places. The query add no overhead factor to it's results. Any overhead factor must be added to the results produced by this query.

SQL Below:

SELECT B.TABLE_SCHEMA
, B.TABLE_NAME
, (CASE WHEN SUM(PK_BYTES) = 0 THEN 6 ELSE SUM(PK_BYTES) END) + 18 AS PK_BYTES_TOT -- 18 = Index Record Header (5) + Transaction ID (6) + Roll Pointer (7)
, SUM(FIELD_BYTE_SPACE) AS FIELD_BYTES_TOT
, SUM(IX_BYTES) AS IX_FIELD_BYTES_TOT
, SUM(CASE WHEN IX_BYTES > 0 THEN 1 ELSE 0 END) AS IX_FIELD_COUNT
, ((CASE WHEN SUM(PK_BYTES) = 0 THEN 6 ELSE SUM(PK_BYTES) END) + 18) + SUM(FIELD_BYTE_SPACE) + SUM(IX_BYTES) AS TABLE_BYTES_TOT
FROM
(
SELECT A.*
, CASE WHEN COLUMN_KEY = 'PRI'THEN FIELD_BYTE_SPACE ELSE 0 END AS PK_BYTES
, CASE WHEN A.COLUMN_KEY <> 'PRI'
AND A.COLUMN_KEY <> '' THEN (PK_BYTE_SPACE + FIELD_BYTE_SPACE) ELSE 0 END AS IX_BYTES
FROM (
SELECT PK_SP.TABLE_SCHEMA
, PK_SP.TABLE_NAME
, PK_SP.COLUMN_NAME
, DATA_TYPE
, CHARACTER_MAXIMUM_LENGTH
, NUMERIC_PRECISION
, IS_NULLABLE
, COLUMN_KEY
, CHARACTER_SET_NAME
, CHARACTER_OCTET_LENGTH
, (CASE -- CHARACTER FIELDS
WHEN DATA_TYPE = 'varchar' THEN CHARACTER_MAXIMUM_LENGTH + 1
WHEN DATA_TYPE = 'char' THEN CHARACTER_MAXIMUM_LENGTH
WHEN DATA_TYPE = 'tinyblob'
OR DATA_TYPE = 'tinytext' THEN CHARACTER_MAXIMUM_LENGTH + 1
WHEN DATA_TYPE = 'blob'
OR DATA_TYPE = 'text' THEN CHARACTER_MAXIMUM_LENGTH + 2
WHEN DATA_TYPE = 'mediumblob'
OR DATA_TYPE = 'mediumtext' THEN CHARACTER_MAXIMUM_LENGTH + 3
WHEN DATA_TYPE = 'largeblob'
OR DATA_TYPE = 'largetext' THEN CHARACTER_MAXIMUM_LENGTH + 4
-- NUMERIC FIELDS
WHEN DATA_TYPE = 'tinyint' THEN 1
WHEN DATA_TYPE = 'smallint' THEN 2
WHEN DATA_TYPE = 'mediumint' THEN 3
WHEN DATA_TYPE = 'int'
OR DATA_TYPE = 'integer' THEN 4
WHEN DATA_TYPE = 'bigint' THEN 8
WHEN DATA_TYPE = 'float'
AND (NUMERIC_PRECISION <= 24
OR NUMERIC_PRECISION IS NULL) THEN 4
WHEN DATA_TYPE = 'float'
AND NUMERIC_PRECISION > 24 THEN 8
WHEN DATA_TYPE = 'bit' THEN (NUMERIC_PRECISION + 7) / 8
WHEN DATA_TYPE = 'double'
OR DATA_TYPE = 'numeric' THEN
(FLOOR(NUMERIC_PRECISION/9)*4) + ROUND((NUMERIC_PRECISION- FLOOR(NUMERIC_PRECISION/9)*9)*.5,0)
-- DATETIME FIELDS
WHEN DATA_TYPE = 'date'
OR DATA_TYPE = 'time' THEN 3
WHEN DATA_TYPE = 'datetime' THEN 8
WHEN DATA_TYPE = 'timestamp' THEN 4
WHEN DATA_TYPE = 'year' THEN 1
-- BINARY FIELDS
WHEN DATA_TYPE = 'binary' THEN CHARACTER_MAXIMUM_LENGTH
ELSE 999999999999999 END) +
(CASE WHEN IS_NULLABLE = 'YES' THEN 1 ELSE 0 END) AS FIELD_BYTE_SPACE
, CASE WHEN PK_BYTE_SPACE IS NULL THEN 6 + 18 ELSE PK_BYTE_SPACE + 18 END AS PK_BYTE_SPACE
FROM information_schema.columns AS PK_SP
LEFT OUTER JOIN
(SELECT TABLE_SCHEMA
, TABLE_NAME
, SUM((CASE -- CHARACTER FIELDS
WHEN DATA_TYPE = 'varchar' THEN CHARACTER_MAXIMUM_LENGTH + 1
WHEN DATA_TYPE = 'char' THEN CHARACTER_MAXIMUM_LENGTH
WHEN DATA_TYPE = 'tinyblob'
OR DATA_TYPE = 'tinytext' THEN CHARACTER_MAXIMUM_LENGTH + 1
WHEN DATA_TYPE = 'blob'
OR DATA_TYPE = 'text' THEN CHARACTER_MAXIMUM_LENGTH + 2
WHEN DATA_TYPE = 'mediumblob'
OR DATA_TYPE = 'mediumtext' THEN CHARACTER_MAXIMUM_LENGTH + 3
WHEN DATA_TYPE = 'largeblob'
OR DATA_TYPE = 'largetext' THEN CHARACTER_MAXIMUM_LENGTH + 4
-- NUMERIC FIELDS
WHEN DATA_TYPE = 'tinyint' THEN 1
WHEN DATA_TYPE = 'smallint' THEN 2
WHEN DATA_TYPE = 'mediumint' THEN 3
WHEN DATA_TYPE = 'int'
OR DATA_TYPE = 'integer' THEN 4
WHEN DATA_TYPE = 'bigint' THEN 8
WHEN DATA_TYPE = 'float'
AND (NUMERIC_PRECISION <= 24
OR NUMERIC_PRECISION IS NULL) THEN 4
WHEN DATA_TYPE = 'float'
AND NUMERIC_PRECISION > 24 THEN 8
WHEN DATA_TYPE = 'bit' THEN (NUMERIC_PRECISION + 7) / 8
WHEN DATA_TYPE = 'double'
OR DATA_TYPE = 'numeric' THEN
(FLOOR(NUMERIC_PRECISION/9)*4) + ROUND((NUMERIC_PRECISION- FLOOR(NUMERIC_PRECISION/9)*9)*.5,0)
-- DATETIME FIELDS
WHEN DATA_TYPE = 'date'
OR DATA_TYPE = 'time' THEN 3
WHEN DATA_TYPE = 'datetime' THEN 8
WHEN DATA_TYPE = 'timestamp' THEN 4
WHEN DATA_TYPE = 'year' THEN 1
-- BINARY FIELDS
WHEN DATA_TYPE = 'binary' THEN CHARACTER_MAXIMUM_LENGTH
ELSE 999999999999999 END) +
(CASE WHEN IS_NULLABLE = 'YES' THEN 1 ELSE 0 END)) AS PK_BYTE_SPACE
FROM information_schema.columns COL_SP
WHERE COLUMN_KEY = 'PRI'
AND TABLE_SCHEMA = 'studypods_dev'
GROUP BY TABLE_SCHEMA
, TABLE_NAME) AS IX_SP
ON PK_SP.TABLE_SCHEMA = IX_SP.TABLE_SCHEMA
AND PK_SP.TABLE_NAME = IX_SP.TABLE_NAME
WHERE PK_SP.TABLE_SCHEMA = 'studypods_dev') AS A
) AS B
GROUP BY B.TABLE_SCHEMA
, B.TABLE_NAME
  Posted by Rick James on September 15, 2012
The formulas above apply to MyISAM. For InnoDB data, the quick answer is to calculate for MyISAM, then double or triple that value.

The more complex way is something like:
Step 1: Compute basic length of each field (without length field for VAR fields); add 1 or 2 to that length. (1 if all the fields are 'short')
Step 2: Add those together, plus 29 bytes for record overhead.
Step 3: Add 40% for the blocks not being full.
Step 4: Multiply by the number of rows.

That contorted computation can easily be off by a significant amount, either way.
  Posted by Sean Nolan on October 17, 2012
This page lists the BLOB and TEXT types and gives a formula for calculating the storage required, but it does not give the different maximum sizes. Here they are:

TINYTEXT - 255 bytes
TEXT - 65535 bytes
MEDIUMTEXT - 16,777,215 bytes (2^24 - 1)
LONGTEXT - 4G bytes (2^32 – 1)

TINYBLOB - 255 bytes
BLOB - 65535 bytes
MEDIUMBLOB - 16,777,215 bytes (2^24 - 1)
LONGBLOB - 4G bytes (2^32 – 1)
Sign Up Login You must be logged in to post a comment.