MySQL Internals Manual  /  ...  /  LOAD DATA INFILE Events

20.9.1 LOAD DATA INFILE Events

LOAD DATA INFILE is not written to the binary log like other statements. It is written as one or more events in a packed format, not as a cleartext statement in the binary log. The events indicate what options are present in the statement and how to process the data file.

Historically, there seem to have been at least four event sequences for representing LOAD DATA INFILE operations.

1) In MySQL 3.23, there was only one event: Load_log_event (type code LOAD_EVENT = 6). Load_log_event only contains the filename, not the file itself. When the slave sees a Load_log_event, it requests that the master send the file in a separate connection. This has the drawback that the binary log is not self-contained: If the file has been removed on the master or the slave cannot access the master, the file transfer fails.

2) In MySQL 4.0.0, the file contents were included in the binary log. Several new event types were introduced: Create_file_log_event (type code CREATE_FILE_EVENT = 8), Append_block_log_event (type code APPEND_BLOCK_EVENT = 9), Execute_load_log_event (type code EXEC_LOAD_EVENT = 10), and Delete_file_log_event (type code DELETE_FILE_EVENT = 11). The event sequence is:

  • Create_file_log_event : 1 instance

  • Append_block_log_event : 0 or more instances

  • Execute_load_log_event (success) or Delete_file_log_event (failure): 1 instance

The Create_file_log_event contains the options to LOAD DATA INFILE. This was a design flaw since the file cannot be loaded until the Exec_load_log_event is seen. To work around this, the slave, when executing the Create_file_log_event, wrote the Create_file_log_event to a temporary file. When the Execute_load_log_event was seen, this temporary file was read back so that the LOAD DATA INFILE statement could be constructed.

Append_block_log_event is used for files larger than a threshold. In this case, the file is split and the pieces are sent in separate events. The threshold is around 2^17 = 131072 bytes.

CREATE_FILE_EVENT tells the slave to create a temporary file and fill it with a first data block. Later, zero or more APPEND_BLOCK_EVENT events append blocks to this temporary file. EXEC_LOAD_EVENT tells the slave to load the temporary file into the table, or DELETE_FILE_EVENT tells the slave not to do the load and to delete the temporary file. DELETE_FILE_EVENT occurs when the LOAD DATA failed on the master: On the master we start to write loaded blocks to the binary log before the end of the statement. If for some reason there is an error, we must tell the slave to abort the load.

MySQL 4.0.0 also introduced the NEW_LOAD_EVENT = 12 type code.

If a slave reads a NEW_LOAD_EVENT from a binlog, it will use it as a LOAD_EVENT (but allowing longer separator names). Lost in the mysteries of time is the knowledge of whether there was ever a server version capable of writing NEW_LOAD_EVENT.

3) The original Load_log_event used one character for each of the delimiters (FIELDS TERMINATED BY, and so forth). At an unknown point in the version history, the format was modified to allow multiple-character strings as separators. This uses the same class, Load_log_event, but has the type code NEW_LOAD_EVENT = 12. This affects Create_file_log_event, since that inherits from <code class="literal">Load_log_event. So the new feature in Load_log_event allows Create_file_log_event to use multiple-character delimiters.

[A guess for when this occurred would be MySQL 4.0.0: That is when the sql_ex structure that holds the single-character field/line options was renamed to old_sql_ex and a new sql_ex structure was created that allows multiple-character values.]

4) In 5.0.3, the event sequence was changed again with the addition of two new event types: Begin_load_query_log_event (type code BEGIN_LOAD_QUERY_EVENT = 17) and Execute_load_query_log_event (type code EXECUTE_LOAD_QUERY_EVENT = 18). The event sequence is:

  • Begin_load_query_log_event : 1 instance

  • Append_block_log_event : 0 or more instances

  • Execute_load_query_log_event (success) or Delete_file_log_event (failure): 1 instance

With the new sequence, information about the options to LOAD DATA INFILE is moved from the first event to the last event. Consequently, Begin_load_query_log_event is almost the same as Append_file_log_event (it contains only file data), whereas Execute_load_query_log_event contains the text of the LOAD DATA INFILE statement. The revised event sequence fixes the design flaw in the 4.0 format.

Also, the temp file that stores the parameters to LOAD DATA INFILE is not needed anymore. There is still a temp file containing all the data to be loaded.

Here is a concrete example (it applies to MySQL 4.0 and 4.1):

On the master we have a file named /m/tmp/u.txt that contains these lines:


And we issue this statement on the master while the default database is test:

load data infile '/m/tmp/u.txt' replace into table x fields
terminated by ',' optionally enclosed by '"' escaped by '\\'
lines starting by '>' terminated by '\n' ignore 2 lines (a,b,c);

Then in the master's binary log we have this CREATE_FILE_EVENT (hexadecimal dump):

00000180:                     db4f 153f 0801 0000  .........O.?....
00000190: 006f 0000 0088 0100 0000 0004 0000 0000  .o..............
000001a0: 0000 0002 0000 0001 0403 0000 0003 0000  ................
000001b0: 0001 2c01 2201 0a01 3e01 5c06 0101 0161  ..,."...>.\....a
000001c0: 0062 0063 0078 0074 6573 7400 2f6d 2f74  .b.c.x.test./m/t
000001d0: 6d70 2f75 2e74 7874 003e 312c 322c 330a  mp/u.txt.>1,2,3.
000001e0: 3e34 2c35 2c36 0a3e 372c 382c 390a 3e31  >4,5,6.>7,8,9.>1
000001f0: 302c 3131 2c31 32db 4f15 3f0a 0100 0000  0,11,12.O.?.....
00000200: 1700 0000 f701 0000 0000 0300 0000       ..............
  • Line 180:

    • Timestamp (db4f 153f)

    • Event type (08)

    • Server ID (01 0000 00)

  • Line 190:

    • Event size (6f 0000 00)

    • Position in the binary log (88 0100 00) (that's 392 in decimal base)

    • Flags (00 00)

    • Thread ID (04 0000 00)

    • Time it took (00 0000 00)

  • Line 1a0:

    • Number of lines to skip at the beginning of the file (02 0000 00)

    • Length of the table name (01)

    • Length of the database name (04)

    • Number of columns to load (03 0000 00)

    • The file ID (03 0000 00)

  • Line 1b0:

    • Length of the field terminating string (01)

    • Field terminating string (2c = ,)

    • Length of the field enclosing string (01)

    • Field enclosing string (22 = ")

    • Length of the line terminating string (01)

    • Line terminating string (0a = newline)

    • Length of the line starting string (01)

    • Line starting string (3e = >)

    • Length of the escaping string (01)

    • Escaping string (5c = \)

    • Flags (06) (that's OPT_ENCLOSED_FLAG | REPLACE_FLAG)

    • Length of the name of the first column to load (01)

    • Length of the name of the second column to load (01)

    • Length of the name of the third column to load (01)

    • Name of the first column to load (61 00 = "a")

  • Line 1c0:

    • Name of the second column to load (62 00 = "b")

    • Name of the third column to load (63 00 = "c")

    • Name of the table to load (78 00 = "x"), name of the database that contains the table (74 6573 7400 = "test")

    • Name of the file on the master (2f6d 2f74 6d70 2f75 2e74 7874 00 = "/m/tmp/u.txt")

  • Line 1d0 and following:

    • Raw data to load (3e 312c 322c 330a 3e34 2c35 2c36 0a3e 372c 382c 390a 3e31 302c 3131 2c31 32)

The next byte in the file is the beginning of the EXEC_LOAD_EVENT event.